Spot Welding Parameter Tuning for Weld Defect Prevention in Automotive Production Lines: An ML-Based Approach

Bayır, Musa; Yücel, Ertuğrul; Kaya, Tolga; Yıldırım, Nihan

doi:10.3390/info14010050



by Musa Bayır, Ertuğrul Yücel, Tolga Kaya and Nihan Yıldırım ^*

Management Faculty, Istanbul Technical University, Istanbul 36626, Turkey

^* Author to whom correspondence should be addressed.

Information 2023, 14(1), 50; https://doi.org/10.3390/info14010050

Submission received: 25 October 2022 / Revised: 20 December 2022 / Accepted: 30 December 2022 / Published: 13 January 2023

(This article belongs to the Special Issue Predictive Analytics and Data Science)


Abstract:

Spot welding is a critical joining process which presents specific challenges in early defect detection, has high rework costs, and consumes excessive amounts of materials, hindering effective, sustainable production. Especially in automotive manufacturing, the welding source’s quality needs to be controlled to increase the efficiency and sustainable performance of the production lines. Using data analytics, manufacturing companies can control and predict the welding parameters causing problems related to resource quality and process performance. In this study, we aimed to define the root cause of welding defects and solve the welding input value range problem using machine learning algorithms. In an automotive production line application, we analyzed real-time IoT data and created variables regarding the best working range of welding input parameters required in the inference analysis for expulsion reduction. The results will help to provide guidelines and parameter selection approaches to model ML-based solutions for the optimization problems associated with welding.

Keywords: spot welding; machine learning; tree algorithms; expulsion reduction; welding parameters

1. Introduction

In the digitalization of production lines, manufacturing companies are strongly oriented toward minimizing defects in production processes by adopting intelligent solutions enabled by digital tools. In this context, data connectivity, predictive analytics using machine learning models, intelligent manufacturing systems, and big data solutions act as primary tools to improve production efficiency and profitability. In particular, intelligent control systems, which include learning process and pattern recognition algorithms, offer opportunities to detect rare quality defects in the automotive industry, acting as a leader and first adopter of smart manufacturing worldwide [1,2]. Artificial intelligence and machine learning (ML) techniques for process simulation are enablers of “Time to Market” and “Do It Right the First Time” expectations in the automotive industry [3]. Hence, the quality of the welding process urgently needs to be improved to enable higher resource efficiency and cost reduction [4]. In practice, weld quality checks are usually conducted at the end of a welding operation. Hence, welding defects can only be identified after the welding is completed, causing delays in defect identification and necessitating reworks and excessive material usage, which hinder sustainable production [5].

In this context, the use of optimized parameter combinations significantly reduces the possibility of scrap in the welding process. However, setting parameters from complex welding parameter sets is a challenge in machine learning applications. Mechanical, thermal, and metallurgical processes have severe effects on each other in expulsion; for example, the electric current determines the heat input rate affecting the source’s temperature [6]. On the other hand, the estimation or control of expulsion is a complicated process which requires the application of various techniques. The input parameters of the welding process are the most critical factors affecting welding success. In this context, finding the optimum values of these input parameters increases the welding process’s efficiency. Machine learning algorithms can be used to select the variables for machine learning applications in production systems. In a study by Gujre and Anand [7], which aimed to find the input parameters’ optimum values, the welding input’s values and reference values were determined by the experts in a production line. Although this application reduced the rate of defective output from the welding process, high defective output rates remained in some product types, which made the identification of the root cause of the problem challenging.

It is crucial to analyze production data and prepare it for ML to solve problems related to quality. However, due to the lack of solutions in industry, there is a huge demand for defect identification with multi-sensory system applications. As presented in Section 2, many previous studies provided valuable examples of machine learning applications to welding processes. However, few studies focused on precise welding process parameter selection in expulsion prediction with the usage of a welding robot.

In addition, setting parameters requires a thorough understanding of the inter-relationships between potential welding process parameters, reflected by their correlations. In this context, this paper focuses on a machine learning application to define the welding process quality parameters and predict the variables which cause expulsion defects in a case study from the automobile production line of a major Turkish automotive producer. In this article, we discuss the input parameters that may cause defective outputs and the methods that can be applied to improve the process. A detailed literature review was conducted regarding the welding parameters affecting process quality based on the theoretical background. In addition, we revisited previous research regarding machine learning applications on similar welding processes. After defining the parameters that cause expulsion, we examined product types with parameter values that cause more defects. Accordingly, we revised the values given to the welding robot for system adjustment.

In this paper, we also present a procedure for the identification of the best working range of welding input parameters and the most probable set of variables that might cause defect problems. To design a prediction model, domain knowledge regarding machine welding parameters, multiple machine learning algorithms, and the welding robot’s working principle were considered. The analysis was based on real-time data exported from the factory production line’s process databases. Then, new and more practical variables were derived from the raw data set. With a tree-based machine learning algorithm, model outputs provided data to the welding robot in the automotive production line. The model was used to predict expulsion from the distance of the welding parameters from their reference values. The parameter values were measured instantaneously, and the formation of expulsion with sudden breakages was detected.

2. Background

The utilization of AI to address challenges and problems in welding has been extensively discussed in the literature [8]. Previous research focused on increasing production line efficiency, which benefited from the application of many analytical approaches to predict and detect problems regarding the maintenance time, defective products, welding defects, and root cause analyses of the product and process quality problems. In all production line processes, to produce a product with appropriate quality, precise combinations of input process conditions are needed [9]. In welding processes, poorly controlled welding parameters and weld geometry have been main topics of research regarding process forecasting precision for quality improvement [8]. KNN (k-Nearest Neighbor), ANNs (artificial neural networks), backpropagation (BP) in neural networks, random forest algorithms, SVMs (support vector machines), logistic regression and its regularized versions, AECs (Auto-Encoder Classifiers), and sequence tagging are among the models applied to solve problems regarding welding quality in the literature. In an initial study, Haapalainen et al. [10] used processed and finalized output samples without defects to select the welding process parameters, using the KNN algorithm to compare each sample’s parameters which entered the welding process with the previously concluded observations’ parameter values without defects. They aimed to choose the most representative variables to find similarity between the observations and increase efficiency; however, the calculation of the similarity in outlier observations was a limitation to the application of this method. 
In the following publications, researchers applied ANN (artificial neural network) modeling to predict the depth of penetration and weld bead width from the weld pool’s infrared thermal image [11], the random forest algorithm and logistic regression for variable selection and model building to predict low-weld-quality tubes [7], and J48 and random forest classification to determine the root cause of weld defects associated with input parameters [12]. Gujre and Anand [7] concluded that process parameters’ safe operating ranges increase the efficiency of the welding process.

In the following years, researchers also combined digital technologies with machine learning applications. Rather than manual methods, intelligent systems began to be used to more accurately detect welding defects in production lines. During this period, CNN algorithms and multiple classification approaches were used to classify defect images [13]; the SURF (Speeded-Up Robust Features) method was found to distinguish weld defects most effectively, and the AEC (Auto-Encoder Classifier) was used to solve the classification problem [14]; and tree-based machine learning methods such as the XGBoost and random forest algorithms were used to detect and predict potential failures from real-time data [15]. Focusing on the root cause of weld defects associated with input parameters and on high precision in the welding process parameters’ values, Rahmatov et al. [16] used industrial image processing and machine-learning-based image classification to automate quality controls and detect abnormal products. To determine the most effective variables, Escobar et al. [1] applied the L1-regularized logistic regression learning algorithm and a pattern recognition method to the welding quality problem, framed as a binary classification, and obtained 100% accuracy in the detection of defective products.
In other studies, the SVM algorithm was utilized to examine maintenance times and find a cutting tool’s correct replacement time in the production line [17], and Agglomerative Cluster Analysis methods were used to cluster undesired product states [18]; the backpropagation (BP) feature in neural networks was used to improve defect detection accuracy [19]; sequence tagging and logistic regression were used on various artificially created weld defects to generate various signals [5]; unsupervised methods such as PCA (Principal Component Analysis) were used to reduce the dimensionality of the welding process feature value data set, and k-means was used to classify subsets, which both outperformed the static BP neural network in the prediction of the quality of all types of welded joints [2]. Pereverzev et al. [20] developed intelligent control algorithms to determine the quality of welded joints and process performance by applying artificial intelligence to arc welding processes and direct arc growth under disruptive conditions. Table 1 shows a summary of a comparative analysis of these studies, including their problems, approaches, and results. Additionally, Afroz et al. [21] developed a wearable speed monitoring device (SMD) by fusing optical and inertial sensor data to measure the handheld device speed of the welder, with the aim of tuning the parameters affecting the performance of the SMD prototype in a real industrial environment. Czimmerman et al. [22] conducted a survey among producers and concluded that neural networks represent a powerful technique which is often employed in artificial image processing, as they can be used to solve nearly every classification problem; however, their main drawback is the large number of training samples required. They also noted that, due to the lack of solutions in industry, there is a huge demand for an increase in defect identification efficiency with multi-sensory system applications [22].
On the other hand, unsupervised learning methods which are used for density estimation, dimensionality reduction, and clustering problems in manufacturing processes have lower efficiencies than supervised learning methods [22].

3. Methodology

This paper’s methodology is split into two sections regarding problem identification, parameter selection, and the prediction of expulsion as a problem related to the quality of the welding process. In the study, we focused on problems in the welding process in the production line of a major Turkish Automotive Producer. Firstly, the Welding Process Parameters and Data Explanation is provided to present a procedure for the identification of the best working range of welding input parameters and the most probable set of variables that might cause welding defect problems. Secondly, to design the prediction model, the machine welding parameters, the machine learning algorithms, data domain knowledge, and the welding robot’s working principle were defined. Six ML prediction models were applied to real-time data exported from the factory database which provided the raw data in the studied automotive production line.

The methodological process and the ML model flowchart used to predict spot welding quality are provided in Figure 1.

3.1. Welding Process Parameters and Data Explanation

Figure 2 presents the architecture of the expulsion prediction system and the studied automotive production system’s data flow in the robot welding application. This is a scalable process for the production line in which welding and tracking robots provide the prediction models’ data sets.

The manufacturing robot fed the IoT hub, and the system used IoT to collect data online from the robot and this hub. The collected data were stored in a database, including the variables to be analyzed for spot welding resistance. After data modeling, the system reflected the outputs on a dashboard with a user-friendly interface (Figure 2). The data set covered 32,609 welding counts in which 30,192 had no expulsions.

“Spot Resistance Welding” is electrical resistance welding, which successfully fuses materials to be combined [23]. This welding process is mainly controlled by variables such as the welding current, welding time, electrode force, contact resistance, electrode material properties, and surface condition. The weld current, voltage, gas flow rate, heat input, and the AE energy accumulation rate (extracted from time-driven AE) can be used as welding parameters in an acoustic emission system [5].

By revisiting some basic definitions, we can provide better understanding of the welding parameters:

(a): Concerning the welding current, the basic principle of spot resistance welding refers to providing a large enough current in a sufficient time from the point to be welded. A small electrical current may cause a point not to melt adequately and thus to not be welded, while a large current can cause melting at the welding point and explosions or electrode distortions. The current continuously increases until expulsion is formed between the metal sheets. When determining the current to be used, the current temporal changes should be taken into account, and a gradual increase in current is preferred [23].
(b): Resistance welding processes, which include spot welding and seam welding, are utilized to combine metals; resistance is a characteristic of the material between the weld surfaces and interacts dynamically with other parameters such as current and force [24].
(c): Welding time is directly proportional to heat generation. Generally, the theoretical minimum current and time required for welding are insufficient to weld materials due to various losses; thus, the determination of weld time is one of the most challenging stages in the welding process [6]. As the welding time is related to the welding point requirements, it is not easy to provide exact values for optimum welding [25].

Afroz et al. [21] mentioned that heat input is used to ensure the quality of submerged metal arc welding, one of the most common metal joining procedures in the manufacturing industry. This heat parameter is dependent on the current and voltage, defined by the specific welding procedure, along with the welding speed, a feature that is strongly dependent on the hand movement of the welding operator [21].

(d): The surface coating protects the material from corrosion or other reactions; however, it makes resistance welding more difficult and complicates the process, because separate electrode and welding parameter settings exist for each coating type [26].
(e): Electrode force compresses the metal sheets to be joined. If the welding quality is low, a large electrode force is required, causing other problems [6]. There is an inverse relationship between heat energy and electrode force, meaning that higher electrode strength requires a higher welding current [23].
(f): Holding time refers to the time for which the electrodes are applied to cool the source after welding (the welding ingot must solidify, making the cooling time necessary before releasing the welded parts). A long hold time and a higher proportion of carbon content elements may result in the weld becoming brittle [23].
(g): Welding voltage is a parameter developed together with the heat development formula and determines the phase mode of the welding process without significantly affecting the heat [27].

In the welding process, weld current, weld resistance, weld time, surface coating, electrode force, hold time, and welding voltage have severe effects on each other. Figure 3 shows an example of the appropriate welding range for a welding lobe due to the evaluation of parameters such as welding time and welding current, as adapted from Hwang et al. [6].

No expulsion occurs if a suitable welding time is applied while working at a lower current. According to Wan et al. [26], the acceptable current range can behave more flexibly under electrode force conditions; however, the current range is limited. The pulse-type welding current waveform control could be used to reduce weld expulsion and increase the acceptable welding current range. Many studies suggested the use of electrode forces to solve the problem of expulsion reduction [26].

Spot welding is widely used to join low-carbon steel components for cars, furniture, and similar products. Stainless steel, aluminum, and copper alloys are also spot-welded materials [28,29].

On the other hand, resistance spot welding can be examined as a multi-input and output process, because welding quality is directly affected by input parameters; in fact, the aim of this study was to try to optimize these parameters [27].

3.2. Predicting the Expulsion in the Welding Process Outputs of Automotive Production Line

The material used in this study was welding robot data and process information from the studied automotive production line. Figure 4 shows the analytical process including the data collection, feature elimination, classification modeling via learning algorithms, and evaluation of model results.

In the case of an application in an automotive manufacturing line, the dependent variable was defined as the “expulsion” in the welding process’s outputs. The values of the variables used in the root-cause analysis were taken from the welding robot’s database. The application also used the robots’ log data, including the observations logged to the database on factory servers by the welding robots after each operation. This application aimed to predict the classes of dependent variables belonging to observations with machine learning methods.

3.2.1. Variable Setting and Feature Selection

First, the welding process inputs were examined, and the variables with a correlation rate of over 70% were eliminated. Then, we attempted to obtain new variables via variable transformation and evaluated the results using a machine learning model. The welding robot calculated the expulsion using a complex algorithm, resulting in classification with an accuracy of 100% during the defined test period. The robot’s prediction results were used as the dependent variable. In this model, we aimed to converge true negative (TN) and true positive (TP) values to the values classified by the robot. After selecting the model and variables with the most potent representation of the robot, we explored the problem’s root cause. Table 2 shows the raw features taken from the welding machine database system.

The explanations of these variables are presented below:

ID: The values which are assigned for each row.
Çapak (Expulsion): “Çapak” is the dependent variable that was analyzed throughout the project. Each observation was assigned to either the “expulsion” or the “no expulsion” category. Being assigned to the “expulsion” class meant that an observation was problematic. By using this model, we aimed to make predictions and analyses on this variable (“Çapak”) by using independent variables which had high explanatory power over the dependent variable. The categories were assigned entirely by the robot. The robot’s categorization was checked, and its prediction rate was found to be 100% correct. The “Çapak” class was assigned to an observation if a sudden change in resistance was seen, as shown in Figure 5, which was taken from the robot training document.

Timer: The name of the robot that was used for the analysis.
Date/Time: The exact time that the operation was carried out.
Program: One of the categorical variables in which each program referred to a different point where the welding occurred, meaning that each program represented a different category. The difference between programs varied depending on factors such as where the welding occurred, the thickness of the material, and the type of material, such as aluminum.

Spot: Each spot type corresponds to one program type, and each program has one spot number. Like “program”, spot is a categorical variable.
Wear: The order in which the spot weld occurs in each cycle. The wear variable increases over time, as it indicates the number of spot welds made in that cycle.
Actual Voltage (Act. Volt.): The amount of voltage the robot provides during the welding period for each observation unit. This is a parameter developed together with the heat development formula and does not have a significant effect on heat.
Reference voltage (Ref. Volt.): Decided through previous studies, the reference voltage value actually indicates the optimum voltage value for each observation, aiming to complete the process without encountering any spatter problems. These reference values are given to the robot to work with during the process.
Actual current (Act. Curr.): The amount of current the robot provides for the duration of the welding for each observation unit. If a welding current is excessive, cracks may occur due to difficulties in the flow of the current from the electrodes to the material.
Reference current (Ref. Curr.): Decided through previous studies, the reference current value actually indicates the optimum current value for each observation, aiming to complete the process without encountering any expulsion-related problems. These values are also given to the robot to work with during the process.
Actual welding time (Act. Weld time): The duration of the welding process for each observation. Heat production is directly proportional to the welding time. Determining the welding time is one of the most difficult stages of the welding process.
Reference weld time (Ref. Weld Time): Decided through previous studies, this value indicates the optimum weld time for each observation. The aim is to complete the process without encountering any expulsion-related problems. These values are also given to the robot to work with during the process.
Actual energy (Act. Energy): The amount of energy given during the process for each observation. This value is taken from the welding robot with formulations based on other variables.
Reference energy (Ref. Energy): Decided through previous studies, the reference energy value indicates the optimum energy value for each observation. The aim is to complete the process without encountering any expulsion-related problems. These values are calculated using the formulations, and the energy values are dependent on the other variables.
Actual heat (Act. Heat.): These heat values are given for each observation while welding is carried out, aiming to bring these values as close to the optimum as possible.
Reference heat (Ref. Heat): Determined by the previous experiments, the reference heat value indicates the optimum heat value for each observation. The aim is to complete the process without encountering any expulsion-related problems.
Actual resistance (Act. Res.): The resistance force that occurs when the electrodes join together and perform the welding process. There is also a formulation connection between actual resistance values and actual volt values.
Reference resistance (Ref. Res.): Determined by the previous experiments, the reference resistance value indicates the optimum resistance value for each observation. The aim is to complete the process without encountering any spatter-related problems.

Some of the features listed in Table 3 were transformed by using the raw variables given in Table 2. These three variables show the effect of the actual and reference values on the model. In the welding process, the robot was fed with the reference values, and it converged to these reference values for each product type. The defect rate decreased as the robot converged to the reference value. It was found that reducing the difference between the actual and reference values is the most effective way to optimize the welding process.

At this stage, instead of using variables directly, the interaction of the reference and actual values was used as variables to observe the effect of the reference values on them. Table 4 shows a summary of the numerical variables in our data set.
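The transformation described above can be sketched as follows. This is a minimal illustration only: the field names (`act_curr`, `ref_curr`, and so on), the sign convention (actual minus reference), and the sample values are assumptions made for the example, not the names or data used in Table 2 and Table 3.

```python
# Hypothetical field names; the real ones come from the welding robot's database.
def add_difference_features(row):
    """Return a copy of one weld record with actual-minus-reference features added."""
    out = dict(row)
    for name in ("curr", "weld_time", "energy"):
        out[f"{name}_diff"] = row[f"act_{name}"] - row[f"ref_{name}"]
    return out

record = {
    "act_curr": 9.2, "ref_curr": 9.0,            # illustrative values only
    "act_weld_time": 310, "ref_weld_time": 300,
    "act_energy": 4100, "ref_energy": 4000,
}
features = add_difference_features(record)
print(features["weld_time_diff"])  # 10
```

With this convention, a negative difference means that the robot stayed below its reference value for that weld.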

Regarding the selected variables in Table 4, there were no null or missing values in the process database. However, some negative values for the “Time Diff.” variable were identified, possibly due to measurement errors. The process data also revealed an “energy” loss due to expulsion, negatively affecting the machine efficiency. “Volt” values had a very narrow range due to the regulator used in the welding machines, ensuring the efficiency of the equipment. The “Current” value, which depends on voltage and resistance, had a narrow range as it was determined by the spot point (material properties) and the voltage. The “Weld Time” variable had a very wide range and was a critical parameter. The chart below (Figure 6a,b) shows that the process normally takes place with a lower weld time and takes longer when expulsion occurs. The measured “Energy” and “Heat” values also had wide ranges: in the case of expulsion, more energy was consumed than in the normal process, but the amount of heat measured was lower because of the loss of heat due to the expulsion.

The “Resistance” variable, which is one of the most important variables, was fixed to a narrow range because it depended on the material. Our main purpose in expulsion calculation was to detect sudden changes in resistance. The “Proc. Stabler.” variable varied between 44 and 99. The distribution in Figure 7 is skewed to the left (negative skew) for both the “expulsion” (blue) and “no-expulsion” (orange) values.
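The idea of flagging a weld when the resistance changes suddenly can be illustrated with a very simple rule. The robot’s own algorithm is described only as complex and is not given in this paper, so the sketch below is a stand-in under stated assumptions: the function name `has_sudden_drop`, the threshold `max_drop`, the choice to look for a drop rather than any change, and the sample curves are all hypothetical.

```python
def has_sudden_drop(resistance, max_drop):
    """Return True if any consecutive pair of samples falls by more than max_drop."""
    return any(prev - curr > max_drop
               for prev, curr in zip(resistance, resistance[1:]))

# Illustrative resistance curves sampled during one weld (arbitrary units).
smooth = [180, 176, 171, 167, 162, 158]   # gradual decrease
abrupt = [180, 176, 171, 140, 138, 135]   # drop of 31 between two samples

print(has_sudden_drop(smooth, max_drop=15))  # False
print(has_sudden_drop(abrupt, max_drop=15))  # True
```

A production rule would need a threshold calibrated per program, since each program welds a different material and thickness.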

As can be seen in Table 5, there were no missing data in these variables. We have 30,192 data points without expulsion out of the total of 32,609 data points. There was only one “Timer” value, as the measurements of a single robot were evaluated. Among the 61 programs in the database, the most frequent was “Program-7”. Among the 61 spots in the database, the most frequent was “11693_00_1”. Additionally, this spot point matched “Program-7”. Among the 140 wear values, most were “Wear-1”, with a frequency of 140. However, there were data in which the same time was measured, meaning there were duplicated data for 144 rows. Table 5 shows the number of expulsions in each program. There were 61 unique programs, and each was assigned to a specific spot point. The programs with the most expulsions among the 2417 expulsion points in our database were “19, 12, 15, 10, 18”. The total expulsion ratio in these programs was 69.9%. Additionally, 11 programs showed no expulsion. Table 6 also shows the average wear values for each program (program–wear relationship) and the average wear values for the observations labeled as “expulsion” in each program (program–wear with expulsion).
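The class counts quoted above can be cross-checked with a few lines of arithmetic. The two input figures are taken from the text; the variable names are chosen for this example.

```python
total_welds = 32_609     # all data points reported in the text
no_expulsion = 30_192    # data points without expulsion

expulsion = total_welds - no_expulsion
expulsion_rate = expulsion / total_welds

print(expulsion)                # 2417
print(f"{expulsion_rate:.1%}")  # 7.4%
```

Roughly one weld in fourteen is labeled as expulsion, which is why the data set is treated as imbalanced in the modeling steps.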

As the wear–expulsion graphic (Figure 8) shows, expulsion was generally seen in the first wear points. To solve this problem, the parameters in the programs may need to be optimized. The graph shows that spot-welding-related problems can exist in the wear variable. After discussions with the production line supervisors in the company, and based on the literature research, the wear variable was labeled as “ineffective”.

Finally, before the data preparation and the selection of the ML application, the correlation values for the numeric actual features and reference features of the data set were explored. As can be seen from Table 7, there was a highly positive correlation between the actual energy and actual weld time variables, as expected and referred to in the literature. Further features were correlated with each other, as can be seen in Table 7. The pairs actual heat and actual voltage, and actual resistance and actual voltage, had the highest correlations.

In addition, Table 8 shows the reference values’ correlation table. The correlations in this table were similar to the actual features’ correlations in Table 7, and it can be concluded that the reference values matched the actual values, providing guidelines for feature selection.
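The correlation-based screening can be sketched in plain Python. The 0.7 threshold follows the 70% rule stated in Section 3.2.1; everything else is an assumption made for illustration: the greedy rule (keep the first column of a correlated pair), the column names, and the toy values. The paper does not state which member of a correlated pair was retained.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.7):
    """Greedily keep a column only if |r| <= threshold with every column kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy data with hypothetical column names.
columns = {
    "act_weld_time": [300, 320, 310, 400, 450, 305],
    "act_energy":    [4000, 4300, 4150, 5400, 6100, 4080],  # tracks weld time closely
    "act_curr":      [9.0, 8.7, 9.3, 9.1, 8.8, 9.2],
}
selected = drop_correlated(columns)
print(selected)  # ['act_weld_time', 'act_curr']
```

Because the rule is order dependent, listing the features in order of domain importance decides which one of a correlated pair survives.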

3.2.2. Data Preparation and Selection for ML Application

As the heat and energy variables are dependent on each other (Heat = Volt (V) ∗ Current (I); Energy = Heat ∗ T = V ∗ I ∗ T, where T is the weld time), they were excluded from the model, and the underlying variables were used in the model studies instead. Additionally, due to their high correlation, the volt and resistance variables were not included in the model. In addition, the fact that each program had a unique reference value had an impact on the features; hence, the program categorical variable was also not used.

Before the model was trained, the data set was divided into a training set and a test set using different methods [30]. After deciding on the new variable set, the data set was divided using the stratifying method depending on the target and product type variables. As a result, 70 percent of the data set was the training set, and the remainder was the test set.
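A stratified 70/30 split of this kind can be sketched with the standard library alone. The function name `stratified_split`, the record layout, the seed, and the toy data are assumptions for this example; the stratification key combines the target with the product type, as described above.

```python
import random
from collections import defaultdict

def stratified_split(rows, key, train_fraction=0.7, seed=0):
    """Split rows so that each stratum (value of key(row)) contributes
    roughly train_fraction of its members to the training set."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for group in strata.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test

# Toy data: 100 welds from one program, 10 of them labeled as expulsion.
rows = [{"program": 7, "expulsion": i < 10} for i in range(100)]
train, test = stratified_split(rows, key=lambda r: (r["expulsion"], r["program"]))
print(len(train), len(test))  # 70 30
```

Stratifying keeps the rare expulsion class at the same proportion in both sets, so the test metrics are not distorted by an unlucky draw.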

3.2.3. ML Application: Models Selection and Findings

Two classical, frequently used, and well-known prediction algorithms are k-NN (k-Nearest Neighbor) and support vector machines (SVMs) [31]. SVM algorithm calculations are complex and costly, while the k-NN algorithm is non-parametric. As the aim of our study was to explain the root cause of the problem, these algorithms were not preferred. On the other hand, logistic regression (LR) is often used in classification problems as it aids in parameter estimation based on the likelihood principle and its mathematical and optimal properties [1]. Decision trees could also be favorable for use in similar classification problems because they support the differences between classes and reveal similarities [32]. Asif et al. [5] reported prediction accuracies of 91.18% and 82.35% using sequence tagging and logistic regression algorithms, respectively, in welding defect detection. Logistic regression predicted each data point separately. The adversarial sequence tagging method predicted the presence of the following weld states: good, excessive penetration, burn-through, porosity, and porosity–excessive penetration.

The random forest method is often used for extensive data [33]. For this reason, the random forest algorithm was also used in this study to compare the results. Natekin and Knoll [34] noted that a Gradient Boosting Machine (GBM) model built from thousands of trees is more difficult to interpret than a simple decision tree. However, various tools have been designed to support the interpretation of GBM models. As a result, models created with the GBM can provide us with the necessary information about variables. In particular, visualizing variable importance is a key method of interpretation. Wan et al. [26] reported that XGBoost performs effectively on imbalanced data. As the data set in this study was not balanced, the XGBoost algorithm’s advantages were utilized.

4. Results

The F1 score, the harmonic mean of precision and recall, was used as the success metric for our study. Although accuracy is frequently used, the F1 score is preferred for measuring model success on imbalanced data sets [35], because it is less sensitive to class imbalance than accuracy. Table 4 shows the results of the models used.
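To make the motivation for the F1 score concrete, the following sketch computes precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical and are not taken from the study's tables; the function name `precision_recall_f1` is likewise an illustrative choice.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive class from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced case: 1000 welds, 50 with expulsion.
# A classifier that always predicts "no expulsion" has tp = 0, fp = 0, fn = 50, tn = 950.
accuracy_trivial = 950 / 1000
_, _, f1_trivial = precision_recall_f1(tp=0, fp=0, fn=50)

# A classifier that finds 40 of the 50 expulsions with 10 false alarms.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=10)
print(accuracy_trivial, f1_trivial, round(f1, 2))
```

The trivial classifier reaches high accuracy while its F1 score for the expulsion class is zero, which is why accuracy alone can be misleading on imbalanced data.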

4.1. Logistic Regression

In logistic regression [36], a logistic function with outputs between 0 and 1 was used as follows:

p (X) = \frac{e^{β_{0} + β_{1} X}}{1 + e^{β_{0} + β_{1} X}}

(1)

The model was fitted using the maximum likelihood method. The logistic function always generates an S-shaped curve, providing a reasonable estimate of the probability. After the manipulation of the logistic function, the following equation was obtained:

\log (\frac{p (X)}{1 - p (X)}) = β_{0} + β_{1} X_{1} + \dots + β_{p} X_{p}

(2)

At the center of the logistic regression analysis in this equation is the prediction of the log odds of an event. Mathematically, logistic regression estimates a multiple linear regression function for the log odds, defined as above.

The K-nearest neighbors (KNN) algorithm classifies a given observation into the class with the highest estimated probability by estimating the conditional distribution of Y given X. Given a positive integer K and a test observation x₀, the KNN classifier identifies the K points in the training data closest to x₀, represented by N₀. Then, the conditional probability for class j is estimated as the fraction of the points in N₀ whose response values equal j:

\Pr (Y = j | X = x_{0}) = \frac{1}{K} \sum_{i \in N_{0}} I (y_{i} = j)

(3)

Finally, KNN applies the Bayes rule and classifies the test observation x₀ into the class with the highest probability.
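Equations (1)–(3) can be checked numerically with a short sketch. The coefficient values, the toy training points, and the function names (`logistic`, `log_odds`, `knn_class_probabilities`) are assumptions for illustration only and do not come from the fitted models in this study.

```python
import math
from collections import Counter

def logistic(x, beta0, beta1):
    """Equation (1) for a single predictor: p(X) = e^z / (1 + e^z), z = beta0 + beta1 * x."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)

def log_odds(p):
    """Left-hand side of Equation (2): log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def knn_class_probabilities(x0, train_x, train_y, k):
    """Equation (3): fraction of the k nearest training points that fall in each class."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x0))
    counts = Counter(train_y[i] for i in order[:k])
    return {label: n / k for label, n in counts.items()}

# Hypothetical coefficients: the log odds recover the linear predictor beta0 + beta1 * x.
p = logistic(1.5, beta0=-1.0, beta1=2.0)
recovered = log_odds(p)

# Hypothetical two-feature training data with binary labels.
train_x = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1]
probs = knn_class_probabilities((0.9, 1.0), train_x, train_y, k=3)
predicted = max(probs, key=probs.get)
print(round(recovered, 6), probs, predicted)
```

The first part shows that taking the log odds of the logistic output returns the linear predictor, which is the manipulation that leads from Equation (1) to Equation (2). The second part assigns the test point to the class holding the majority among its three nearest neighbors.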

When making predictions with logistic regression, the transformed variables were used. The model output can be seen in Table 9, in which the F statistics and p values showed that the model variables and the output of the model were significant. However, the R² value of the model, which shows how well the regression line approaches the real data points, was 0.309, well below the desired value (Table 9). The VIF scores for the OLS regression results were 2.09 for the constant, 1.79 for Dif. Res. Prop, 1.69 for Dif. Curr. Prop, and 1.14 for Dif. Weld time. Prop. Hence, it is concluded that the VIF values for the features were low enough to evaluate this model without concern about collinearity.

4.2. Support Vector Machine (SVM) Model

In SVM applications [31], the model learns the decision boundary between the two classes during training. The data points located at the boundary between classes are essential for defining the decision boundary. These are called support vectors and are used to classify new points. A classification decision is made based on the distances to the support vectors and the importance of the support vectors learned during training. The distance between data points is measured by the Gaussian kernel:

k_{rbf} (x_{1}, x_{2}) = \exp (- γ {||x_{1} - x_{2}||}^{2})

(4)

Here, x₁ and x₂ are data points, ‖x₁ − x₂‖ denotes the Euclidean distance, and γ (gamma) is a parameter that controls the width of the Gaussian kernel.

When building the SVM model, hyperparameter optimization was carried out to increase the success of the model. The best parameter values, which are fixed before the SVM model is trained, are shown below.

{‘c’: 10, ‘gamma’: 5, ‘kernel’: ‘rbf’}

The confusion matrix for the SVM model is shown in Table 10. The recall rate was found to be 0.42, and the F1 score was 0.57 for the SVM model, with a precision rate of 0.86. Given the precision and recall values for Expulsion = 1, the model did not reach the desired level of success, which could lead to quality-related problems.
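As a small illustration of the Gaussian (RBF) kernel, the sketch below evaluates exp(−γ‖x₁ − x₂‖²), the standard form of this kernel, using the gamma value of 5 reported above. The data points and the function name `rbf_kernel` are hypothetical.

```python
import math

def rbf_kernel(x1, x2, gamma):
    """Gaussian kernel: exp(-gamma * squared Euclidean distance between x1 and x2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

gamma = 5  # value reported in the best-parameter set above
same = rbf_kernel((0.2, 0.4), (0.2, 0.4), gamma)  # identical points
near = rbf_kernel((0.2, 0.4), (0.3, 0.4), gamma)  # squared distance 0.01
far = rbf_kernel((0.2, 0.4), (1.2, 0.4), gamma)   # squared distance 1.0
print(same, round(near, 4), round(far, 4))
```

The kernel equals 1 for identical points and decays toward 0 as the distance grows; a larger gamma makes the decay faster, so each support vector influences a narrower neighborhood.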

4.3. GBM Algorithms

The central idea of the GBM algorithm is to construct each new base learner so that it is maximally correlated with the negative gradient of the loss function, which is a core component of the GBM [34]. In the classification problem, the response variable comes from the Bernoulli distribution, so the Bernoulli loss is used, and the class probabilities can be predicted by minimizing the negative log-likelihood associated with the class labels:

ψ {(y, f)}_{Bern} = \log (1 + \exp (- 2 \bar{y} f))

(5)
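The Bernoulli loss in Equation (5) and the negative gradient that each new base learner is fitted to can be sketched as follows. The sketch assumes, following the convention of the cited tutorial [34], that the 0/1 label y is mapped to ȳ = 2y − 1; the function names are illustrative.

```python
import math

def bernoulli_loss(y, f):
    """Equation (5): log(1 + exp(-2 * y_bar * f)), with y_bar = 2y - 1 (assumed mapping)."""
    y_bar = 2 * y - 1
    return math.log(1.0 + math.exp(-2.0 * y_bar * f))

def negative_gradient(y, f):
    """Negative derivative of the loss with respect to f: 2 * y_bar / (1 + exp(2 * y_bar * f))."""
    y_bar = 2 * y - 1
    return 2.0 * y_bar / (1.0 + math.exp(2.0 * y_bar * f))

# Check the analytic gradient against a central finite difference at an arbitrary point.
y, f, h = 1, 0.3, 1e-6
numeric = -(bernoulli_loss(y, f + h) - bernoulli_loss(y, f - h)) / (2 * h)
analytic = negative_gradient(y, f)
print(round(analytic, 6), round(numeric, 6))
```

At each boosting iteration, these negative gradients play the role of pseudo-residuals: the next tree is trained to approximate them, so the ensemble moves in the direction that reduces the loss.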

The GBM can produce highly optimized results when working with big data. The parameters used to increase model success are very important when designing this algorithm. With the help of the GridSearchCV function, the best parameter values were selected as shown below:

{‘learning_rate’: 0.2,
‘max_depth’: 5,
‘min_samples_split’: 2,
‘n_estimators’: 500}
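The exhaustive search behind a grid search can be sketched in a few lines. This is not the library implementation: the candidate grid is hypothetical, and `toy_score` is a stand-in for the cross-validated score that a real search would compute by training and validating a model for each parameter combination.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate grid (3 * 3 * 2 = 18 combinations).
param_grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 500],
}

def toy_score(params):
    # Stand-in for a cross-validated F1 score; its peak location is arbitrary.
    return -(
        abs(params["learning_rate"] - 0.2)
        + abs(params["max_depth"] - 5)
        + abs(params["n_estimators"] - 500) / 1000
    )

best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

The cost grows multiplicatively with the number of values per parameter, which is why grids are usually kept small.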

The results in Table 11, obtained from the model built with the GBM algorithm, summarize the model success very well. The precision, recall, and F1 score values for the prediction of expulsion were more satisfactory than those of the previous models, showing that the algorithm used in the robot was effectively analyzed by the GBM.

4.4. Decision Tree Model and Random Forest Model

Decision tree models predict that each observation belongs to the most common class of training observations in the region to which it belongs. When the results of a classification tree are interpreted, both the class prediction corresponding to a terminal node region and the class proportions among the training observations falling in that region are of interest [36]. The Gini index in (6) is a measure of the total variance across the K classes, where p̂ₘₖ is the proportion of training observations in the m-th region that belong to class k. If the Gini index has a small value, the node mainly contains observations from a single class [36].

G = \sum_{k = 1}^{K} {\hat{p}}_{mk} (1 - {\hat{p}}_{mk})

(6)
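The Gini index can be computed directly from the class labels that fall into a node. The following sketch uses hypothetical label lists; the function name `gini_index` is illustrative.

```python
from collections import Counter

def gini_index(labels):
    """Gini index of a node: sum over classes of p_k * (1 - p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

pure = gini_index([0] * 10)               # all observations from one class
mostly = gini_index([0] * 9 + [1])        # 90% / 10% split
balanced = gini_index([0] * 5 + [1] * 5)  # 50% / 50% split
print(pure, round(mostly, 2), balanced)
```

A pure node has a Gini index of zero, and for two classes the index reaches its maximum of 0.5 at an even split, so splits that lower the index produce more homogeneous nodes.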

From the results in Table 12, the decision tree model we created achieved modeling success very close to the technical analysis of the robot in the production line. The precision rate was 0.91, the F1 score was 0.90, and the recall rate was 0.88 for the prediction of expulsion via the decision tree model.

The random forest algorithm can produce highly optimized results while working with big data. The parameters used to increase model success are very important when designing this algorithm. In random forest models [37], a series of decision trees is built in a way that decorrelates the trees: each time a split is considered, a random sample of m predictors is chosen as split candidates from the complete set of p predictors. A new sample of m predictors is taken at each split. The number of predictors considered at each split is approximately equal to the square root of the total number of predictors (m ≈ √p). Consequently, most predictors are not even considered at a given split, which gives moderately strong predictors a chance to be used alongside the strongest ones.
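The per-split sampling of predictors can be sketched as follows. The predictor names are placeholders rather than the study's feature set, and `candidate_predictors` is an illustrative helper, not part of any library.

```python
import math
import random

def candidate_predictors(predictors, rng):
    """Draw m ≈ sqrt(p) predictors, without replacement, as candidates for one split."""
    m = max(1, round(math.sqrt(len(predictors))))
    return rng.sample(predictors, m)

# Hypothetical predictor names (p = 9, so m = 3).
predictors = [f"feature_{i}" for i in range(9)]
rng = random.Random(0)

# A fresh sample is drawn at every split, so different splits see different candidates.
split_candidates = [candidate_predictors(predictors, rng) for _ in range(4)]
for candidates in split_candidates:
    print(candidates)
```

Because only three of the nine predictors are eligible at any split, a single dominant predictor cannot be chosen at every split, which reduces the correlation between trees.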

With the help of the GridSearchCV function, the best parameter values were selected as shown below.

{‘min_samples_split’: 3,
‘n_estimators’: 200}

Within the scope of such technical, production-based projects, models are created to improve the operation of the robot in the line and thereby increase the efficiency of the production line. In effect, the model attempted to learn how the robot operates, and the results below indicate that a good estimator was developed. The random forest model predicted expulsion with a higher precision rate (0.94) and recall rate (0.89), and an F1 score of 0.92, as can be seen from Table 13.

4.5. XGBoost

XGBoost is an optimized distributed algorithm designed to be highly efficient, flexible, and portable, implementing machine learning algorithms under the Gradient Boosting framework [38]. The purpose of XGBoost is to minimize the loss function [19].

L (F) = \sum_{i} l ({\hat{y}}_{i}, y_{i}) + \sum_{k} Γ (f_{k})

(7)

where

Γ (f_{k}) = γ T + λ {||w||}^{2}

(8)

In L(F), l(ŷᵢ, yᵢ) represents the loss function between the actual label of the data and the predicted label. The latter function, Γ(fₖ), is the penalizing term. T is the number of leaves in the tree, and γ and λ are two parameters that control the complexity of the tree.
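The structure of Equations (7) and (8) can be illustrated with a short sketch. It assumes that w denotes the vector of leaf weights of a tree and uses the logistic (log) loss as one possible choice for l; the numerical values are hypothetical. Some formulations place a factor of ½ on the λ term, whereas the sketch follows Equation (8) as written.

```python
import math

def log_loss(y_hat, y):
    """One possible choice for l: the log loss of a predicted probability y_hat for label y."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def tree_penalty(leaf_weights, gamma, lam):
    """Equation (8): gamma * T + lambda * ||w||^2, where T is the number of leaves."""
    num_leaves = len(leaf_weights)
    return gamma * num_leaves + lam * sum(w * w for w in leaf_weights)

def objective(y_hats, ys, trees, gamma, lam):
    """Equation (7): total data loss plus the penalty summed over all trees."""
    data_term = sum(log_loss(p, y) for p, y in zip(y_hats, ys))
    penalty_term = sum(tree_penalty(w, gamma, lam) for w in trees)
    return data_term + penalty_term

# Hypothetical predictions, labels, and one tree with three leaves.
penalty = tree_penalty([0.5, -0.5, 1.0], gamma=1.0, lam=2.0)
total = objective([0.9, 0.2], [1, 0], [[0.5, -0.5, 1.0]], gamma=1.0, lam=2.0)
print(penalty, round(total, 4))
```

Increasing γ penalizes trees with many leaves, while increasing λ shrinks the leaf weights, so both parameters push the model toward simpler trees.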

The XGBoost algorithm optimizes the model very well when working with big data because it offers many parameters through which the model can be adjusted. Therefore, the parameters used to increase model success are very important when designing this algorithm. With the help of the GridSearchCV function, the best parameter values were selected as shown below:

{‘eval_metric’: ‘auc’,
‘learning_rate’: 0.5,
‘max_depth’: 5,
‘min_samples_split’: 2,
‘n_estimators’: 500,
‘reg’: ‘logistic’}

In this case, too, models are created to improve the operation of the robot and thereby increase production line efficiency. In effect, how the robot operates is learned with this model, and the results in Table 14 indicate that the best-estimator model serves as a good reference for the robot’s operating system.

4.6. Evaluation of the Results from All Models

Table 15 shows that the accuracy scores were relatively high. Because the data in this study were imbalanced, the F1 score was also evaluated. The GBM model was the most successful model in terms of the F1 score. Figure 9 represents the confusion matrix for the GBM, the algorithm that obtained the best F1 score.

To elaborate on the importance of the variables in terms of their impact on the efficiency of the welding process, the decision tree of the welding process parameters offers a practical view that gives insight into the process problem. Looking at the top of the tree, weld time is the most important variable, followed by the current feature. According to the decision tree (Figure 10), when the weld time and current variables differ from the reference values given for each program, the possibility of expulsion always increases. This means that the reference values given for the programs by the team at the factory provided the optimum level for most of the programs.

In the feature importance plot (see Figure 11), “Dif.weld time.prop” is the most critical feature, followed by “Dif.Res.prop” (difference ratio between actual and reference resistance value) and “Dif.Curr.prop” (difference ratio between the actual and reference welding current value).

The significance level of the selected model variables was examined using TN and TP values. The significance levels of these variables were proportional to the Gini scores calculated in Equation (6). The variable that decreased the entropy most, “Dif.Weld time.prop” (difference ratio between actual and reference weld time value), was considered the most powerful feature in the model and was the independent variable that best explained the dependent variable. Therefore, it was used in the root-cause analysis of the expulsion-related problem.

5. Conclusions and Discussion

Welding processes in the manufacturing industry have quality-related problems such as expulsion, which cannot be controlled by manual mechanisms due to its dependence on various complex factors. The prediction of these quality-related problems and defects is important in terms of the prevention of quality costs and the achievement of sustainable production targets. Although previous research provided valuable examples of machine learning applications on welding processes, few studies focused on precise parameter selection in a welding process to be used in expulsion prediction. In addition, to set parameters, a thorough understanding of the inter-relationships between potential welding process parameters, reflected by their correlations, is required. Another important step in constructing the prediction model is the data preparation and the selection of the machine learning algorithm for the special data structure of the welding process robot. In this context, this paper focused on the application of machine learning to define the quality parameters in the welding process and predict the variables causing expulsion defects in a case study from the automobile production line of a major Turkish automotive producer.

The high accuracy rate of the model proved that expulsion-based defects can be minimized with the help of machine learning techniques applied in manufacturing big data architecture. In this paper, we also proposed methods to improve welding processes through a root-cause analysis based on outputs from the analytical model. The optimization of the input parameters during the welding process could help to identify the root causes of problems in the welding process in production lines. Taking everything into consideration, firms in the production sector use various smart systems to solve problems in production lines. One of the biggest quality-related problems companies in the automotive industry face is during the welding process. However, it is very difficult to control the processes and find solutions with traditional methods, because many factors affect this process. One of the problems that arises during the welding process is the expulsion problem. Supervised learning methods, due to their capabilities, are preferred for use in classification in the industry, but in many cases, they are time-consuming to train and require large data sets [22].

Rather than using the selected variables directly, we derived the interaction of the reference and actual values as variables to observe the effect of reference values on variables, following the work of He and Garcia [35]. Aligned with the previous literature [1,5,19,31,32,34,36,37], the logistic regression, KNN, SVM, decision tree, GBM, and XGBoost learning algorithms were applied to the process data from the welding robot. Our data were imbalanced; hence, our study validated the advantage of XGBoost on such data [37], differing from the work of Sumesh et al. [12], who reported that the highest precision level was achieved with J48 and random forest classification algorithms with sound signal data, and Asif et al. [5], who applied sequence tagging and logistic regression algorithms on weld defect frequencies. To compensate for the imbalance in the data, we also evaluated the F1 score, as He and Garcia [35] recommended. The Gradient Boosting Machine (GBM) model was the most successful model in terms of the F1 score. Although the GBM has been described as difficult to interpret in previous research [34], with R and Python tools the GBM provided us with the necessary information about variables for a meaningful interpretation. The XGBoost method was the second most accurate model. From this finding, we conclude that boosting methods can effectively be adapted to ML models to predict welding process practices. It must also be noted that the random forest model was tested; however, it was not suitable for our data set due to its need for extensive data [31].

This study is unique in its use of differences between actual and reference values of weld time, resistance, and welding current, differing from the studies by Asif et al. [5], who focused on an acoustic emission system, reporting frequencies to monitor gas metal arc weld defects, and Sumesh et al. [12], who studied sound signals.

The findings of the study revealed that the “Dif.Weldtime.prop” (difference ratio between actual and reference weld time value) variable decreased the entropy most in the studied spot welding process. As the feature with the highest power over the model, this ratio was the independent variable that best explained the dependent variable. Focusing on the weld time, its relationship with the speed of the welding process has also been underlined in previous studies, such as by Afroz et al. [21], who studied the optimization of a wearable speed monitoring device for welding applications. Other vital features are “difference ratio between actual and reference resistance value” and “difference ratio between the actual and reference welding current value”. Hence, in spot welding processes, these variables should be monitored and kept under strict control to achieve higher product quality and cost reductions.

The application from the case study showed that expulsion problems can be minimized with the help of machine learning techniques applied on process data; however, the availability of smart systems, such as the robotic application in the studied company, is crucial for such practices. The optimization of the input parameters during the welding process helps production line engineers to reach the root cause, as provided by the decision tree model in Figure 10.

The presented ML application can be used as a case study providing a solution model to increase the defect identification efficiency with multi-sensory system applications, as mentioned by Czimmermann et al. [22]. Researchers and practitioners who aim to locate and solve weld-defect-related problems with machine learning methods can utilize this study’s procedures and findings to optimize the input parameters in the welding process. In further research, clustering methods can be utilized for sub-data sets; the inclusion of the energy loss feature in data can ensure the prediction of sustainability performance. In addition, to detect and classify defects that can occur during welding, future studies can utilize deep learning methods.

Author Contributions

Conceptualization, M.B., E.Y., T.K. and N.Y.; methodology, M.B., E.Y. and T.K.; software, M.B. and E.Y.; validation, M.B., E.Y., T.K. and N.Y.; formal analysis, M.B. and E.Y.; resources, M.B., E.Y. and N.Y.; data curation, M.B. and E.Y.; writing—original draft preparation, M.B. and E.Y.; writing—review and editing, N.Y. and T.K.; visualization, M.B. and E.Y.; supervision, T.K. and N.Y.; project administration, T.K. and N.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Due to the nature of this research, participants in this study did not agree to their data being shared publicly; therefore, supporting data are not available.

Acknowledgments

We would like to acknowledge and thank Agile Tribe Lead, Data Science and AI Lead Haydar Vural and AI-Data Science Chapter Lead Şirin Altıok at TOFAŞ Türk Otomobil Fabrikası A.Ş. for their contributions, facilitation, outstanding collaboration and for all of the opportunities we were given, especially in the provision of the data set used in this research. The authors also thank Çağatay Bahadır for his support in the preparation of the article.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Escobar, C.; Morales-Menendez, R. Machine learning techniques for quality control in high conformance manufacturing environment. Adv. Mech. Eng. 2018, 10, 1687814018755519.
2. Chen, G.; Sheng, B.; Luo, R.; Jia, P. A parallel strategy for predicting the quality of welded joints in automotive bodies based on machine learning. J. Manuf. Syst. 2022, 62, 636–649.
3. Romero-Hdz, J.; Saha, B.; Toledo-Ramirez, G.; Beltran-Bqz, D. Welding Sequence Optimization Using Artificial Intelligence Techniques, an Overview. Int. J. Comput. Sci. Eng. 2016, 3, 90–95.
4. Restecka, M.; Wolniak, R. IT systems in aid of welding processes quality management in the automotive industry. Arch. Metall. Mater. 2016, 61, 1785–1792.
5. Asif, K.; Zhang, L.; Derrible, S.; Indacochea, J.E.; Ozevin, D.; Ziebart, B. Machine learning model to predict welding quality using air-coupled acoustic emission and weld inputs. J. Intell. Manuf. 2020, 33, 881–895.
6. Hwang, I.; Kanga, M.; Kima, D. Expulsion Reduction in Resistance Spot Welding by Controlling of welding Current Waveform. Procedia Eng. 2011, 10, 2777–2880.
7. Gujre, S.; Anand, R. Machine learning algorithms for failure prediction and yield improvement during electric resistance welded tube manufacturing. J. Exp. Theor. Artif. Intell. 2019, 32, 601–622.
8. Gyasi, E.A.; Handroos, H.; Kah, P. Survey on artificial intelligence (AI) applied in welding: A future scenario of the influence of AI on technological, economic, educational and social changes. Procedia Manuf. 2019, 38, 702–714.
9. Maarif, M.R.; Listyanda, R.F.; Kang, Y.-S.; Syafrudin, M. Artificial Neural Network Training Using Structural Learning with Forgetting for Parameter Analysis of Injection Molding Quality Prediction. Information 2022, 13, 488.
10. Haapalainen, E.; Laurinen, P.; Junno, H.; Tuovinen, L.; Roning, J. Feature Selection For Identification of Spot Welding Processes. In Informatics in Control Automation and Robotics; Lecture Notes Electrical Engineering; Cetto, J.A., Ferrier, J.L., Costa dias Pereira, J., Filipe, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Volume 15.
11. Chokkalingham, S.; Chandrasekhar, N.; Vasudevan, M. Predicting the depth of penetration and weld bead width from the infrared thermal image of the weld pool using artificial neural network modelling. J. Intell. Manuf. 2012, 23, 1995–2001.
12. Sumesh, A.; Rahmeshkumar, K.; Mohandas, K.; Shyam, R. Use of Machine Learning Algorithms for Weld Quality Monitoring using Acoustic Signature. In Proceedings of the 2nd International Symposium on Big Data and Cloud Computing (ISBCC’15), Coimbatore, India, 12–13 March 2015.
13. Khumaidi, A.; Yuniarno, M.E.; Purnomo, H.M. Welding defect classification based on convolution neural network (CNN) and Gaussian kernel. In Proceedings of the International Seminar on Intelligent Technology and Its Applications (ISITIA), Surabaya, Indonesia, 28–29 August 2017.
14. Selvi, V.K.; Aravindar, D.J. An Industrial Inspection Approach for Weld Defects Using Machine Learning Algorithm. Int. J. Adv. Signal Image Sci. 2019, 5, 15–21.
15. Ayvaz, S.; Alpay, K. Predictive Maintenance System for Production Lines in Manufacturing: A Machine Learning Approach Using Iot Data in Real-Time. Expert Syst. Appl. 2020, 173, 114598.
16. Rahmatov, N.; Paul, A.; Saeed, F.; Hong, W.; Seo, H.; Kim, J. Machine Learning–Based Automated Image Processing for Quality Management in Industrial Internet of Things. Int. J. Distrib. Sens. Netw. 2019, 15, 1550147719883551.
17. Lee, W.J.; Wu, H.; Yun, H.; Kim, H.; Jun, M.B.; Sutherland, J.W. Predictive Maintenance of Machine Tool Systems Using Artificial Intelligence Techniques Applied to Machine Condition Data. In Proceedings of the 26th CIRP Life Cycle Engineering (LCE) Conference, West Lafayette, IN, USA, 7–9 May 2019.
18. Wuest, T.; Irgens, C.; Thoben, K.D. An approach to monitoring quality in manufacturing using supervised machine learning on product state data. J. Intell. Manuf. 2014, 25, 1167–1180.
19. Chen, S.; Wu, N.; Xiao, J.; Li, T.; Lu, Z. Expulsion Identification in Resistance Spot Welding by Electrode Force Sensing Based on Wavelet Decomposition with Multi-Indexes and BP Neural Networks. Appl. Sci. 2019, 9, 4028.
20. Pereverzev, A.E.; Ivanova, I.V.; Maestro, A.; Zarubind, I.A.; Panfaye, W. The use of artificial intelligence to control the processes of welding and direct arc growth under the influence of disturbing factors. IOP Conf. Ser. Mater. Sci. Eng. 2019, 666, 012013.
21. Afroz, A.S.; Digiacomo, F.; Pelliccia, R.; Inglese, F.; Stefanini, C.; Milazzo, M. Optimization of a wearable speed monitoring device for welding applications. Int. J. Adv. Manuf. Technol. 2020, 110, 1285–1293.
22. Czimmermann, T.; Ciuti, G.; Milazzo, M.; Chiurazzi, M.; Roccella, S.; Oddo, C.M.; Dario, P. Visual-based defect detection and classification approaches for industrial applications—A survey. Sensors 2020, 20, 1459.
23. Raut, M.; Achwal, V. Optimization of Spot Welding Process Parameters for Maximum Tensile Strength. Int. J. Mech. Eng. Robot. Res. 2014, 3, 506–517.
24. Hashmi, S. Comprehensive Materials Processing, 1st ed.; Elsevier: Amsterdam, The Netherlands, 2014.
25. Mikno, Z.; Pilarczyk, A.; Korzeniowski, M.; Kustron, P.l.; Ambroziak, A. Analysis of resistance welding processes and expulsion of liquid metal from the weld nugget. Arch. Civ. Mech. Eng. 2018, 18, 522–531.
26. Wan, X.; Wang, Y.; Fang, C. Welding Defects Occurrence and Their Effects on Weld Quality in Resistance Spot Welding of AHSS Steel. ISIJ Int. 2014, 54, 1883–1889.
27. Sedani, C.; Gawai, B. Optimization of Process Parameters for Resistance Spot Welding Process of HR E-34 Using Response Surface Method. Int. J. Sci. Res. 2016, 3, 2002–2008.
28. Aidun, D.K.; Bennett, R.W. Effect of resistance welding variables on the strength of spot welded 6061-T6 aluminum alloy. Weld. J. 1985, 64, 15–22.
29. Dai, H.; Wang, L.; Dong, B.; Miao, J.; Lin, S.; Chen, S. Microstructure and high-temperature mechanical properties of new-type heat-resisting aluminum alloy Al6.5Cu2Ni0.5Zr0.3Ti0.25V under the T7 condition. Mater. Lett. 2023, 332, 133503.
30. Reitermanová, Z. Data Splitting. In Proceedings of the 19th Annual Conference of Doctoral Students, Prague, Czech Republic, 1–4 June 2010.
31. McLoone, S.; Pampuri, S.; Schirru, A.; Susto, G. Machine Learning for Predictive Maintenance: A Multiple Classifier Approach. IEEE Trans. Ind. Inform. 2015, 11, 812–820.
32. Moisen, G.G. Classification and regression trees. In Encyclopedia of Ecology; Jorgensen, E.S., Fath, B.D., Eds.; Elsevier: Oxford, UK, 2008; Volume 1, pp. 582–588.
33. Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly: Beijing, China, 2017.
34. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21.
35. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
36. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R; Springer Texts in Statistics; Springer Science and Business Media: New York, NY, USA, 2013.
37. Wang, C.; Deng, C.; Wang, S. Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost. Pattern Recognit. Lett. 2019, 136, 190–197.
38. Zhang, L. Machine Learning in Rock Facies Classification: An Application of XGBoost. In Proceedings of the International Geophysical Conference, Qingdao, China, 17–20 April 2017; pp. 1371–1374.

Figure 1. Methodological process and ML Model Flowchart to predict welding quality.

Figure 2. Expulsion prediction architecture.

Figure 3. Controlling the welding current waveform (adapted from Hwang et al., [6]).

Figure 4. ML analytical process.

Figure 5. Resistance and weld time relationship.

Figure 6. Expulsion’s Relationship with other parameters.

Figure 7. Expulsion and heat relationship.

Figure 8. Wear–expulsion relationship.

Figure 9. Decision tree example for the parameters of welding process problems.

Figure 10. Decision tree example for the parameters of welding process problems.

Figure 11. Feature importance plot for GBM algorithm.

Table 1. Literature review regarding machine learning applications for weld quality.

Author	Problem Definition	Approach to Solve Problem	Limitation	Result
Haapalainen et al. (2006) [10]	Reducing welding feature dimension space and selection of welding parameters	Detecting similarity between observations with KNN algorithm	Detection of outlier observations	Improvement in spot welding process identification system
Chokkalingham et al. (2012) [11]	Predicting the depth of penetration and weld bead width from the infrared thermal image of the weld pool	Using artificial neural network modeling	Manipulation of numeric data	Predicting bead width and depth of penetration with good accuracy
Sumesh et al. (2015) [12]	Studying the root cause of weld defects	Extracting features from sound signals and using them as input in the classification algorithm	Data governance	Using arc sound for effective signature in weld quality monitoring
Wuest et al. (2014) [18]	Clustering undesired product states, increasing product quality in the production line	SVM for classification and Agglomerative Cluster Analysis		Desired accuracy via SVM
Khumaidi et al. (2017) [13]	Examining the welding defect types through image processing and automation of the manual examinations	CNN algorithm to classify defect images	Only one algorithm	Solved the welding defect problem with a multiple classification approach and achieved 95.83% accuracy
Escobar et al. (2018) [1]	Welding quality—defective detection problem	L1-regularized logistic regression learning algorithm and pattern recognition method	-	100% accuracy in detecting defective products
Gujre & Anand, (2019) [7]	Predicting weak-weld-quality tubes	Using model true negative outputs via intelligent classifier algorithm	Model adaptivity	Finding safe operating ranges
Lee et al. (2019) [17]	Determining cutting tools’ correct replacement time in the production line.	SVM algorithm	Only observed the maintenance times	SVM could precisely predict the correct replacement time
Chen et al. (2019) [19]	Improving defect detection accuracy	Backpropagation (BP) feature of neural networks	Only NNs applied	Improved defect detection accuracy
Pereverzev et al. (2019) [20]	The quality of welded joints and process performance	Artificial intelligence to arc welding processes and direct arc growth under disruptive conditions.	The welding process’s non-linearity and time-varying nature challenged a clear mathematical relationship between system’s input and output parameters	Intelligent control algorithms on the quality of welded joints and process performance. Nature of welding process required various approaches
Selvi et al. (2019) [14]	Reduction in welding defects with image classification by selecting the variables that can distinguish the weld defects most effectively	SURF (Speeded-up Robust Features) method; solving the classification problem with AEC (Auto-Encoder Classifier)		AEC classified the weld images differentiating the number of neurons in hidden layers at a rate of 98 per cent accuracy
Rahmatov et al. (2019) [16]	Detection of the abnormal products through image classification	Machine-learning-based approaches		Obtained 92 % accuracy from the model
Ayvaz et al. (2020) [15]	Detection and prediction of signals of potential failures using real-time data	Tree-based machine learning methods such as XGBoost and Random Forest algorithm		The factory production line efficiency can be increased by telling the operators to take preventive actions
Asif et al. (2020) [5]	Acoustic emission (AE) system designed to cover a wide range of frequencies as a real-time monitoring method for gas metal arc weld defects	Sequence tagging and logistic regression algorithms	Integration among different software systems	Deployment of real-time weld quality monitoring
Chen et al. (2022) [2]	To examine the operating status of the welding robot under the current parameter settings and to assess the welding quality of electrode caps under different types of plates in real time with large data sizes	PCA and K-means-based dynamic classification		Proposal of an adaptive parallel machine learning strategy for solder joint quality prediction, relying on noise reduction and classification of the weld process feature value data set

Table 2. Raw variables as candidate features for spot welding parameter prediction.

Variables
Expulsion	Act. Res.	Energy Diff.	Ref. Heat	Spot
ac.pr.st	Act. Volt.	Energy Excd.	Ref. Res.	Time Diff.
Act. Curr.	Act. Weld Time	Product Type	Ref. Volt.	Time Exclusive
Act. Energy	Date/Time	Ref. Curr	Ref. Weld Time	Timer
Act. Heat	Date/Time. 1	Ref. Energy	ref.pr.st	Wear

Table 3. Transformed features.

Variables
Expulsion	Act. Res.	Energy Diff.	Ref. Heat	Spot
ac.pr.st	Act. Volt.	Energy Excd.	Ref. Res.	Time Diff.
Act. Curr.	Act. Weld Time	Product Type	Ref. Volt.	Time Exclusive
Act. Energy	Date/Time	Ref. Curr	Ref. Weld Time	Timer
Act. Heat	Date/Time. 1	Ref. Energy	ref.pr.st	Wear

Table 4. Summary of numerical variables in data set.

	Count	Mean	std	Min	25%	50%	75%	Max
Time Diff	32,609	0.00035	0.00059	−0.004	0.000067	0.00015	0.000433	0.005017
Energy Diff	32,609	269.979985	323.483262	−639.5281	53.60913	177.9087	418.9768	2585.665
Act. Volt.	32,609	1.218594	0.112988	0.88	1.13	1.22	1.3	1.61
Ref. Volt.	32,609	1.21599	0.107169	1.04	1.1	1.2	1.3	1.56
Act. Curr.	32,609	7.782808	0.567765	5.96	7.42	7.84	8.14	9.62
Ref. Curr.	32,609	7.584849	0.421653	6.66	7.35	7.7	7.89	9
Act. Weld Time	32,609	445.757521	77.636114	260	377	450	497	749
Ref. Weld Time	32,609	424.754914	73.38584	260	370	440	460	700
Act. Energy	32,609	4232.302812	691.65053	2714.884	3811.024	4188.688	4518.706	8230.167
Ref. Energy	32,609	3962.322801	654.567956	2713.692	3584.036	3988.157	4215.521	6559.188
Act. Heat	32,609	9480.890787	1095.57221	6261.675	8887.01	9338.058	10,201.37	14,127.79
Ref. Heat	32,609	9260.637848	1038.939809	7263.215	8627.472	9063.936	10,314.32	12,918.02
Act. Res.	32,609	161.933209	17.734518	108	150	165	175	209
Ref. Res.	32,609	162.574228	14.739853	132	149	164	176	197
Act. Proc. Stab.	32,609	87.541722	9.052932	44	84	90	94	99
Ref. Proc. Stab.	32,609	100	0	100	100	100	100	100

Table 5. Summary of categorical variables in data set.

	Expulsion (Capak)	Timer	Program	Spot	Wear
count	32,609	32,609	32,609	32,609	32,609
unique	2	1	61	61	140
top	No Expulsion	UNB0160WB02	7	11693_00_1	1
freq	30,192	32,609	2781	2781	276

Table 6. Program, wear, and expulsion relationships.

Program	Mean Wear	Capak	Program	Mean Wear Expulsion	Program	Capak
1	69.806729	0	1	20	1	8
2	50.836559	1	2	70.6	2	5
3	51.829023	2	3	63.428571	3	14
4	52.857245	3	4	67	4	15
7	54.302769	4	7	68.416667	7	36

Table 7. Correlation matrix for actual features.

	Act. Volt.	Act. Curr.	Act. Weld Time	Act. Energy	Act. Heat	Act. Res.
Act. Volt.	1	−0.0486259	−0.284968	0.1444119	0.798258	0.784336
Act. Curr.	−0.0486259	1	−0.441936	−0.159323	0.560379	−0.5922
Act. Weld Time	−0.284968	−0.441936	1	0.840589	−0.494969	−0.00686755
Act. Energy	0.1444119	−0.159323	0.840589	1	0.0292443	0.155015
Act. Heat	0.798258	0.560379	−0.494969	0.0292443	1	0.289094
Act. Res.	0.784336	−0.5922	−0.00686755	0.155015	0.289094	1

Table 8. Correlation matrix for reference features.

	Ref. Volt.	Ref. Curr.	Ref. Weld time	Ref. Energy	Ref. Heat	Ref. Res.
Ref. Volt.	1	0.0584367	−0.205209	0.267899	0.870679	0.79766
Ref. Curr.	0.0584367	1	−0.522035	−0.310667	0.516001	−0.498392
Ref. Weld time	−0.205209	−0.522035	1	0.831765	−0.410598	0.0584696
Ref. Energy	0.267899	−0.310667	0.831765	1	0.0918778	0.319594
Ref. Heat	0.870679	0.516001	−0.410598	0.0918778	1	0.412025
Ref. Res.	0.79766	−0.498392	0.0584696	0.319594	0.412025	1

Table 9. OLS regression results.

		OLS Regression Results
Dep. Variable:		Capak		R-squared:	0.309
Model:		OLS		Adj. R-squared:	0.309
Method:		Least Squares		F-statistic:	3404
Date:		Monday, 22 June 2020		Prob (F-statistic):	0
Time:		23:27:47		Log-Likelihood:	2407.8
No. Observations:		22,826		AIC:	−4808
Df Residuals:		22,822		BIC:	−4775
Df Model:		3
Covariance Type:		nonrobust
	coef	std err	t	P > |t|	[0.025, 0.975]
const	0.0219	0.002	10.533	0	0.018, 0.026
Dif.weld time.prop	2.0697	0.024	87.28	0	2.023, 2.116
Dif.Curr.prop	−1.9644	0.055	−35.549	0	−2.073, −1.856
Dif.Res.prop	−0.7303	0.026	−28.25	0	−0.781, −0.680
Omnibus:	9474.822	Durbin–Watson:			1.964
Prob(Omnibus):	0	Jarque–Bera (JB):			60,064.871
Skew:	1.884	Prob(JB):			0
Kurtosis:	9.996	Cond. No.			40.1

Table 10. Confusion matrix score for SVM.

	Precision	Recall	F1-Score	Support
0	0.96	0.99	0.97	9058
1	0.86	0.42	0.57	725
accuracy			0.95	9783
macro avg	0.91	0.71	0.77	9783
weighted avg	0.95	0.95	0.94	9783
SVM
References	Prediction: FALSE	Prediction: TRUE
No Expulsion	9010	48
Expulsion	420	305
Count of training-set observations	22,826
Count of test-set observations	9783

Table 11. Confusion matrix for GBM.

	Precision	Recall	F1-Score	Support
0	0.99	1	1	9058
1	0.97	0.93	0.95	725
accuracy			0.99	9783
macro avg	0.98	0.96	0.97	9783
weighted avg	0.99	0.99	0.99	9783
GBM
References	Prediction: FALSE	Prediction: TRUE
No Expulsion	9037	21
Expulsion	50	675
Count of training-set observations	22,826
Count of test-set observations	9783

Table 12. Confusion matrix for decision tree.

	Precision	Recall	F1-Score	Support
0	0.99	0.99	0.99	9058
1	0.91	0.88	0.9	725
accuracy			0.98	9783
macro avg	0.95	0.94	0.94	9783
weighted avg	0.98	0.98	0.98	9783
Decision Tree
References	Prediction: FALSE	Prediction: TRUE
No Expulsion	8992	66
Expulsion	84	641
Count of training-set observations	22,826
Count of test-set observations	9783

Table 13. Confusion matrix for random forest.

	Precision	Recall	F1-Score	Support
0	0.99	1	0.99	9058
1	0.94	0.89	0.92	725
accuracy			0.99	9783
macro avg	0.97	0.94	0.96	9783
weighted avg	0.99	0.99	0.99	9783
Random Forest
References	Prediction: FALSE	Prediction: TRUE
No Expulsion	9017	41
Expulsion	77	648
Count of training-set observations	22,826
Count of test-set observations	9783

Table 14. Confusion matrix for XG Boost.

	Precision	Recall	F1-Score	Support
0	1	1	1	9058
1	1	1	1	725
accuracy			1	9783
macro avg	1	1	1	9783
weighted avg	1	1	1	9783
XGBoost
References	Prediction: FALSE	Prediction: TRUE
No Expulsion	9058	0
Expulsion	0	725
Count of training-set observations	22,826
Count of test-set observations	9783

Table 15. Classification precision, recall, and F1 scores of the ML models.

Model	Precision	Recall	F1	Accuracy
Logistic Regression	0.65	0.30	0.41	0.94
KNN	0.86	0.79	0.83	0.98
SVM	0.86	0.42	0.57	0.95
Decision Tree	0.91	0.88	0.90	0.98
Random Forest	0.94	0.89	0.92	0.99
GBM	0.97	0.93	0.95	0.99
XGBoost	0.94	0.93	0.92	0.99
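
As a consistency check, the class-1 (expulsion) scores reported in Table 15 can be recomputed from the confusion-matrix counts in Tables 11–13. The following minimal Python sketch assumes that the rows of each confusion matrix are the reference classes and the columns are the predictions, which is consistent with the row totals matching the Support column (9058 and 725). The function name `class_metrics` and the dictionary layout are illustrative and are not part of the original study.

```python
def class_metrics(tn, fp, fn, tp):
    """Return (precision, recall, f1, accuracy) for the positive (expulsion) class."""
    precision = tp / (tp + fp)          # share of predicted expulsions that are real
    recall = tp / (tp + fn)             # share of real expulsions that are detected
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, f1, accuracy


# Test-set counts copied from Tables 11-13.
# Assumption: rows are reference classes, columns are predictions.
confusion = {
    "GBM": dict(tn=9037, fp=21, fn=50, tp=675),
    "Decision Tree": dict(tn=8992, fp=66, fn=84, tp=641),
    "Random Forest": dict(tn=9017, fp=41, fn=77, tp=648),
}

for name, counts in confusion.items():
    p, r, f1, acc = class_metrics(**counts)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={acc:.2f}")
```

Under these assumptions, the rounded values agree with the GBM, decision tree, and random forest rows of Table 15.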

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

