Integrating Multivariate Statistical Analysis Into Six Sigma DMAIC Projects: A Case Study on AISI 52100 Hardened Steel Turning

DMAIC (define, measure, analyze, improve and control) is one of the most utilized methods for guiding practitioners in the decision-making process of quality improvement projects. Industrial processes commonly deal with multiple critical-to-quality (CTQ) characteristics. When these characteristics are correlated, multivariate statistical techniques should be applied. This paper aims to propose a domain-specific Six Sigma method, the MDMAIC (multivariate DMAIC). The new stepwise procedure helps practitioners not only to reduce problem dimension but also to take account of the correlation structure among CTQs during the decision-making process. Principal component analysis has been applied for assessing the measurement system, analyzing process stability and capability, as well as modeling and optimizing multivariate manufacturing processes. A hardened steel turning case has been presented for proposal validation. The result analysis has shown that the MDMAIC was very successful in leading the practitioner during the steps and phases of the quality improvement project. The multivariate capability index of the enhanced process emphasized the substantial economic improvement.


I. INTRODUCTION
Continuous improvement has been implemented in many firms as a quality management strategy due to its capacity to provide greater competitive advantages [1]-[3]. Currently, Six Sigma has been adopted as a refined continuous improvement philosophy to improve organizational efficiency and customer satisfaction by decreasing operating costs and increasing profits [4]-[6]. It is defined by Linderman et al. [7] as an organized and systematic methodology for not only improving a strategic process but also developing new products and services. With this methodology, significant reductions in defect rates are often achieved using statistical and scientific methods.
Initially used as a method to reduce variation, DMAIC (define, measure, analyze, improve and control) has been implemented in practice as a generic approach for problem solving [5], [8], [9]. This method is, like any other problem-solving approach, subject to a power/generality trade-off, which has resulted first in an evolution towards greater generality and later in a large number of domain-specific adaptations. De Mast & Lokkerbol [10] have concluded that the DMAIC method is applicable to a wide range of well-structured and semi-structured problems. It serves as a routine to organize problems, in order to turn them into well-structured problems.
The associate editor coordinating the review of this manuscript and approving it for publication was Seyedali Mirjalili.
Several studies have applied the DMAIC method as a structured procedure for solving manufacturing problems with multiple CTQs. Some applications are summarized as follows: automotive [11]-[14], casting [15], direct selling [16], extrusion [17], iron ore [18], printed circuit boards [19], [20], textile [21], [22], touch panel [23], white goods [24], services [1], [25], [26] and education [27]. In addition to the aforementioned papers, the book "World class application of Six Sigma: real world examples of success", by Antony et al. [28], presents other interesting manufacturing applications. A paper worth mentioning is Chang et al. [29], which describes the application of a Six Sigma project, using DMAIC, for integrating statistical process control (SPC) with engineering process control. In the analyze phase, the authors used multivariate Hotelling $T^2$ control charts to evaluate six quality characteristics of a curing process for high-pressure hose products. However, a Six Sigma project cannot be restricted to SPC techniques.
Although industrial processes commonly deal with multiple critical-to-quality (CTQ) characteristics [14], few studies have combined multivariate approaches with the DMAIC procedure to solve manufacturing problems. In such complex systems, the correlation among CTQs cannot be neglected due to its influence on the optimization results [30]. This effect destabilizes the mathematical models, producing errors in the regression coefficients. As a result, the estimated models are unable to represent the objective or constraint functions properly [31]-[34].
This research aims to propose a domain-specific DMAIC, the MDMAIC (Multivariate: Define, Measure, Analyze, Improve, Control), for solving manufacturing problems with multiple correlated CTQs. Principal component analysis (PCA) has been utilized for integrating a multivariate approach into the Six Sigma method.

II. MULTIVARIATE SIX SIGMA METHOD
The generic Six Sigma stepwise strategy proposed by De Koning & De Mast [8] has been modified to conceive a method for dealing with multivariate processes. Fig. 1 presents the proposed multivariate method, the MDMAIC. The following subsections present PCA and the main PCA-based multivariate statistical techniques used in the MDMAIC method.

A. PRINCIPAL COMPONENT ANALYSIS
PCA has been extensively used to summarize the common patterns of variation among variables [35], [36]. Algebraically, the principal components are linear combinations of the q random variables $CTQ_1, CTQ_2, \ldots, CTQ_q$. Geometrically, these combinations determine a new coordinate system obtained by rotating the original system, which has $CTQ_1, CTQ_2, \ldots, CTQ_q$ as coordinate axes [37], [38]. The new axes represent the directions of maximum variability. The principal components are uncorrelated and depend only on the variance-covariance matrix $\Sigma$, or the correlation matrix R, of the original variables. PCA development does not require the assumption of multivariate normality. PCA provides eigenvalue-eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_q, e_q)$, where $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_q \ge 0$ are the eigenvalues used to obtain the percentage of variation explained by each principal component and $e_i$ are the eigenvectors used to estimate the component scores by (1).
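As a minimal sketch of the decomposition described above, the following Python code carries out PCA on the correlation matrix for the special case of q = 2 CTQs, where the eigenvalue-eigenvector pairs have a closed form. The roughness values and all function names are hypothetical and serve only as illustration; they are not data from this study.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardize(x):
    """Center and scale a sample with its (n - 1) standard deviation."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / s for a in x]

def pca_two_ctqs(x, y):
    """PCA on R = [[1, r], [r, 1]]; closed form valid for q = 2 only."""
    r = correlation(x, y)
    sign = 1.0 if r >= 0 else -1.0
    # Eigenvalues of a 2x2 correlation matrix, sorted in descending order.
    lam = (1 + abs(r), 1 - abs(r))
    c = 1 / math.sqrt(2)
    e1 = (c, sign * c)
    e2 = (c, -sign * c)
    zx, zy = standardize(x), standardize(y)
    # Component scores: PC_i = e_i' z for each standardized observation z.
    pc1 = [e1[0] * a + e1[1] * b for a, b in zip(zx, zy)]
    pc2 = [e2[0] * a + e2[1] * b for a, b in zip(zx, zy)]
    explained = tuple(l / 2 for l in lam)  # share of total variation
    return lam, (e1, e2), (pc1, pc2), explained

# Hypothetical roughness readings (micrometers), for illustration only.
ra = [0.25, 0.31, 0.28, 0.40, 0.35, 0.22, 0.45, 0.38]
ry = [1.40, 1.75, 1.60, 2.20, 1.95, 1.30, 2.50, 2.05]

lam, vectors, scores, explained = pca_two_ctqs(ra, ry)
print("eigenvalues:", lam)
print("explained by PC1:", explained[0])
```

For more than two CTQs the closed form no longer applies, and a general eigenvalue routine would be needed instead.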

B. DEFINE PHASE
Initially, the relevant process should be mapped in order to provide the same level of knowledge for the project's team. SIPOC (suppliers-input-process-output-customers) is a simple and useful tool for identifying suppliers, inputs, the high-level process flow, outputs, and customers. Moreover, the project charter should be created, stating the problem, objectives, goals, scope, schedule, team, and potential financial benefits of the project [28].

C. MEASURE PHASE
1) SELECT CTQS AND VALIDATE MEASUREMENT SYSTEM
After selecting the critical-to-quality characteristics (CTQs), the measurement system should be validated. The ANOVA (analysis of variance) method for GR&R studies can be applied only to univariate data. When dealing with multiple correlated CTQs, multivariate methods are more suitable for estimating the evaluation indices of these measurement systems [36], [39], [40]. A multivariate GR&R model using q quality characteristics, p parts, o operators, and r replicates can be written as (2) [36], [40], where $\mu$ is a constant and $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ and $\varepsilon_{ijk}$ are independent normal random variables with zero mean and variances $\sigma^2_{\alpha}$, $\sigma^2_{\beta}$, $\sigma^2_{\alpha\beta}$ and $\sigma^2_{\varepsilon}$ for the part-to-part (process), operator, part × operator interaction, and error terms, respectively. As in the univariate model, these variance components can be translated into GR&R notation as (3). After the variance components have been calculated for a multivariate GR&R study, the $R\&R_{PC_i}$ and $ndc_{PC_i}$ indices for v (with $v \le q$) principal components can be estimated by using (4) and (5). The multivariate evaluation indices can then be aggregated as (6) and (7).

2) CURRENT PROCESS CAPABILITY
Process capability indices have been widely used to determine a supplier's ability to deliver quality products [41]. Nevertheless, capability should be evaluated only when the process is under statistical control. Control charts are statistical tools commonly applied to assess process stability. According to Montgomery [42], Hotelling $T^2$ is the most familiar procedure for monitoring and controlling the mean vector of the process. Using $i = 2, 3, \ldots, n$ sample size, $j = 1, 2, \ldots, q$ quality characteristics and $k = 1, 2, \ldots, m$ subgroups, the test statistic $T^2$ is computed from $\overline{CTQ}$, the mean of the $CTQ_{ijk}$ values; $\overline{\overline{CTQ}}$, the mean vector of $\overline{CTQ}_j$ (mean of the $\overline{CTQ}_{jk}$ values); and S, the sample covariance matrix.
For retrospective analyses (phase 1), the control limits for the $T^2$ control chart are estimated from the F distribution, denoted by F. Process variability can be monitored by the sample generalized variance, $|S|$. This statistic, which is the determinant of the sample covariance matrix, can be used as a measure of multivariate dispersion [42]. The control limits for $|S|$ are given in (10), with the constants they use defined in (11) and (12). In (10), if the calculated LCL is less than zero, the lower control limit is assumed to be zero.
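The monitoring statistics above can be sketched in Python for q = 2 CTQs. The sketch assumes the textbook forms: $T^2_k = n(\bar{x}_k - \bar{\bar{x}})^T S^{-1} (\bar{x}_k - \bar{\bar{x}})$ with S taken as the average of the subgroup covariance matrices, and a phase 1 upper limit of $\frac{q(m-1)(n-1)}{mn-m-q+1} F_{\alpha,\,q,\,mn-m-q+1}$. Whether these match the paper's expressions term by term is an assumption. The subgroup data and function names are hypothetical, and the F quantile must be supplied by the user (for example, from a statistical table).

```python
def mean(values):
    return sum(values) / len(values)

def subgroup_stats(sub):
    """Mean vector and 2x2 sample covariance (sxx, sxy, syy) of one subgroup."""
    n = len(sub)
    mx = mean([p[0] for p in sub])
    my = mean([p[1] for p in sub])
    sxx = sum((p[0] - mx) ** 2 for p in sub) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sub) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sub) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def hotelling_t2(subgroups):
    """Return the T^2 value and generalized variance |S_k| per subgroup."""
    n = len(subgroups[0])
    stats = [subgroup_stats(s) for s in subgroups]
    gx = mean([s[0][0] for s in stats])  # grand mean of CTQ 1
    gy = mean([s[0][1] for s in stats])  # grand mean of CTQ 2
    # Pooled covariance: average of the subgroup covariance matrices.
    sxx = mean([s[1][0] for s in stats])
    sxy = mean([s[1][1] for s in stats])
    syy = mean([s[1][2] for s in stats])
    det = sxx * syy - sxy ** 2
    t2 = []
    for (mx, my), _ in stats:
        dx, dy = mx - gx, my - gy
        # Quadratic form d' S^-1 d written out for the 2x2 case.
        t2.append(n * (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    gen_var = [c[0] * c[2] - c[1] ** 2 for _, c in stats]
    return t2, gen_var

def t2_phase1_ucl(q, m, n, f_quantile):
    """Phase 1 UCL (textbook form); f_quantile = F(alpha; q, mn - m - q + 1)."""
    return q * (m - 1) * (n - 1) / (m * n - m - q + 1) * f_quantile

# Hypothetical subgroups of (Ra, Ry) readings, for illustration only.
subgroups = [
    [(0.30, 1.70), (0.32, 1.82), (0.28, 1.65)],
    [(0.31, 1.78), (0.29, 1.69), (0.33, 1.90)],
    [(0.27, 1.60), (0.30, 1.75), (0.31, 1.72)],
    [(0.32, 1.85), (0.30, 1.74), (0.29, 1.66)],
]
t2_values, generalized_variances = hotelling_t2(subgroups)
print(t2_values)
print(generalized_variances)
```

A subgroup whose $T^2$ value exceeds the chosen upper limit would signal a shift in the mean vector.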
For in-control systems, process capability indices (PCIs) provide numerical measures of whether or not a manufacturing process is capable of meeting a predetermined level of production specification [41], [43]. $C_p$, $C_{pk}$, $C_{pm}$ and $C_{pmk}$ are the most used PCIs. They are calculated from the following quantities: USL and LSL are the upper and lower specification limits, respectively; T is the target value; $\overline{CTQ}$ is the process mean; $\sigma$ is the process standard deviation; $M = (USL + LSL)/2$ is the midpoint of the specification interval; and $d = (USL - LSL)/2$ is the half-length of the specification interval. For the multivariate case, (1) is used to determine the specification limits of the i-th principal component [42], where LSL, USL and T must be standardized specification limits if the correlation matrix is used. Wang & Chen [44] proposed the MPCIs (multivariate process capability indices) $MC_p$, $MC_{pk}$, $MC_{pm}$ and $MC_{pmk}$. In $MC_{pk}$, $C_{pk;PC_i}$ is the univariate measure of capability for the i-th principal component, $\sigma_{PC_i} = \sqrt{\lambda_i}$, and v denotes the number of principal components used to assess the capability. Similarly, they defined $MC_p$, $MC_{pm}$ and $MC_{pmk}$ by replacing $C_{pk;PC_i}$ with $C_{p;PC_i}$, $C_{pm;PC_i}$ and $C_{pmk;PC_i}$, respectively, for $i = 1, 2, \ldots, v$.
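A short Python sketch may help to make the four univariate indices concrete. It assumes the common textbook definitions written in terms of d and M: $C_p = d/(3\sigma)$, $C_{pk} = (d - |\overline{CTQ} - M|)/(3\sigma)$, $C_{pm} = d/(3\tau)$ and $C_{pmk} = (d - |\overline{CTQ} - M|)/(3\tau)$ with $\tau = \sqrt{\sigma^2 + (\overline{CTQ} - T)^2}$. The aggregation over principal components is assumed to be a geometric mean, as in Wang & Chen's proposal. All numbers and function names are hypothetical.

```python
import math

def capability_indices(mean, sigma, usl, lsl, target):
    """Univariate Cp, Cpk, Cpm and Cpmk (assumed textbook forms)."""
    d = (usl - lsl) / 2.0             # half-length of the specification interval
    mid = (usl + lsl) / 2.0           # midpoint M of the specification interval
    offset = abs(mean - mid)          # distance of the mean from the midpoint
    tau = math.sqrt(sigma ** 2 + (mean - target) ** 2)
    return {
        "Cp": d / (3 * sigma),
        "Cpk": (d - offset) / (3 * sigma),
        "Cpm": d / (3 * tau),
        "Cpmk": (d - offset) / (3 * tau),
    }

def multivariate_index(per_pc_values):
    """Geometric mean of a univariate index over v principal components.

    Assumes every per-component value is positive.
    """
    return math.prod(per_pc_values) ** (1.0 / len(per_pc_values))

# Hypothetical process: centered on target, then shifted by one sigma.
centered = capability_indices(mean=7.0, sigma=1.0, usl=10.0, lsl=4.0, target=7.0)
shifted = capability_indices(mean=8.0, sigma=1.0, usl=10.0, lsl=4.0, target=7.0)
mc_example = multivariate_index([1.2, 0.9])

print(centered)
print(shifted)
print(mc_example)
```

For the centered process all four indices coincide; once the mean shifts, $C_{pk}$, $C_{pm}$ and $C_{pmk}$ fall below $C_p$, which is the behavior the indices are designed to capture.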

D. ANALYZE PHASE
In this phase, key process variables that cause defects should be identified. Design of experiments (DOE) along with hypothesis testing, analysis of variance (ANOVA) and Pareto chart are effective statistical tools for process modeling and optimization [45]. In order to determine which factors are statistically significant, the analysis of variance in Table 1 can be estimated.
In Table 1, a is the number of levels in factor A; b is the number of levels in factor B; n is the number of observations; p is the number of factors; $\overline{PC}_{i..}$ is the mean of the i-th level of factor A; $\overline{PC}_{...}$ is the overall mean of the principal component scores; $\overline{PC}_{.j.}$ is the mean of the j-th level of factor B; $PC_{ijk}$ is the principal component score at the i-th level of factor A, j-th level of factor B, and k-th replicate; and $\overline{PC}_{ij.}$ is the mean of the i-th level of factor A and j-th level of factor B.
After estimating each component in the ANOVA table, each factor and interaction is evaluated by its p-value, taking 0.05 as the significance level. After the screening stage, more comprehensive designs should be implemented in order to build a mathematical model and then find the factor settings that produce optimal process performance [45].

E. IMPROVE PHASE
1) QUANTIFY RELATIONSHIP BETWEEN XS, CTQS AND PCS
After "the vital few" controllable factors have been identified, response surface designs are useful for process optimization. If there is curvature in the experimental region, an approximating function such as the second-order model in (18) is usually employed [35].
In this model, $\beta$ denotes the polynomial coefficients, x the controlled factors, k the number of factors and $\varepsilon$ the random error term. The ordinary least squares (OLS) method is utilized to estimate the $\beta$ coefficients from X, the matrix of independent variables, and CTQ, the dependent variable. Curvature is assessed by the analysis of center points in the experimental design. Using (1), v principal components can also be fitted by a second-order model according to Eqs. (20) and (21).
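The OLS step can be illustrated with a small, dependency-free Python sketch. It assumes the standard estimator $\hat{\beta} = (X^T X)^{-1} X^T y$ and a full second-order model in k = 2 coded factors. The design, the "true" coefficients and the function names are hypothetical and chosen only so that the fit can be checked against known values.

```python
import math

def model_row(x1, x2):
    """Row of X for a full second-order model in two coded factors."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def fit_ols(points, responses):
    """Estimate beta from the normal equations (X'X) beta = X'y."""
    x = [model_row(*p) for p in points]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, responses)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical two-factor central composite design with three center points.
ALPHA = math.sqrt(2.0)
DESIGN = [(-1, -1), (1, -1), (-1, 1), (1, 1),
          (-ALPHA, 0), (ALPHA, 0), (0, -ALPHA), (0, ALPHA),
          (0, 0), (0, 0), (0, 0)]
TRUE_BETA = [0.5, 1.2, 0.4, 0.3, 0.1, 0.05]  # hypothetical coefficients

# Noise-free responses, so the fit should recover TRUE_BETA.
responses = [sum(b * t for b, t in zip(TRUE_BETA, model_row(*p))) for p in DESIGN]
beta_hat = fit_ols(DESIGN, responses)
print(beta_hat)
```

With real, noisy responses the same routine applies; the same fit can be repeated with principal component scores as the response.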

2) OPTIMIZE PROCESS THROUGH PCS
According to Montgomery & Woodall [46], one of the main features of Six Sigma is the focus on variability reduction around the process target. This information can be translated into a multi-objective optimization problem (MOOP) as MSE (mean square error) functions for simultaneous optimization of mean and variance. The multivariate version of the mean square error function (MMSE), based on PCA, is given in [38], [47]. The objective is to obtain $X^*$ that minimizes not only the distance of the expected mean ($PC_i$) from the target ($T_{PC_i}$) but also the process variability ($\lambda_i$). The objective function is subject only to the constraint of the experimental region of interest, defined by $X^T X \le \rho^2$. For a central composite design, a logical choice for $\rho$ is the axial distance of the design [35], [38]. To solve this constrained nonlinear optimization problem, GRG (generalized reduced gradient) is one of the most robust optimization algorithms [47], [48]. Validation experiments are required to verify whether the optimal solution is feasible.
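The structure of this optimization can be sketched in Python for a single principal component and two coded factors. The sketch assumes an objective of the form $(PC_1(x) - T_{PC_1})^2 + \lambda_1$ and replaces GRG with a plain polar grid search over the region $X^T X \le \rho^2$, which is a deliberate simplification and not the algorithm used in the study. The model coefficients and the target are hypothetical; only $\rho$ = 1.682 and $\lambda_1$ = 1.966 are taken from the numerical example.

```python
import math

RHO = 1.682        # radius of the experimental region (from the example)
LAMBDA_1 = 1.966   # eigenvalue of PC1 (from the example)
TARGET = -2.0      # hypothetical target for PC1

def pc1_model(x1, x2):
    """Hypothetical reduced second-order model for PC1 in coded units."""
    return 0.1 + 1.2 * x1 + 0.4 * x2 + 0.3 * x1 * x1 + 0.1 * x2 * x2 + 0.05 * x1 * x2

def mmse(x1, x2):
    """Squared distance of PC1 from its target plus its variance."""
    return (pc1_model(x1, x2) - TARGET) ** 2 + LAMBDA_1

def grid_search(steps_r=200, steps_theta=720):
    """Evaluate the objective on a polar grid inside the disk of radius RHO."""
    best = (mmse(0.0, 0.0), 0.0, 0.0)
    for i in range(1, steps_r + 1):
        r = RHO * i / steps_r
        for j in range(steps_theta):
            theta = 2 * math.pi * j / steps_theta
            x1, x2 = r * math.cos(theta), r * math.sin(theta)
            value = mmse(x1, x2)
            if value < best[0]:
                best = (value, x1, x2)
    return best

best_value, best_x1, best_x2 = grid_search()
print(best_value, best_x1, best_x2)
```

Because $\lambda_1$ is a constant when a single component is retained, minimizing this objective amounts to bringing $PC_1$ as close to its target as the region allows.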

F. CONTROL PHASE
Before implementing ongoing measures and actions to sustain improvement, process capability analysis must be conducted in order to check the optimized process capability. Finally, a phase 2 control chart study can be used for monitoring the mean vector of future production. The control limits follow [42].

III. NUMERICAL EXAMPLE
A. DEFINE PHASE
Hardened steel turning is a highly productive and cost-effective precision machining process [49]. $\chi_r = 95^{\circ}$, has been adopted. Fig. 2 illustrates the AISI 52100 hardened steel turning.

B. MEASURE PHASE
1) SELECT CTQS AND VALIDATE MEASUREMENT SYSTEM
Roughness parameters such as $R_a$ (arithmetic average) are widely used in most manufacturing processes for assessing the surface finish quality of a workpiece [39]. However, $R_a$ alone is incapable of describing a surface completely. Hence, $R_y$ (maximum roughness), which provides information about the deterioration of the vertical surface part, has also been adopted as a critical-to-quality characteristic.
To validate the measurement system, the multivariate GR&R study used p = 10 parts, o = 1 operator, and r = 3 replicates (Table 2). A portable roughness checker, set to a cut-off length of 0.25 mm, was utilized (Fig. 2). Table 3 shows the PCA applied to the $R_a$ and $R_y$ roughness parameters using the correlation matrix. $PC_1$ explained 97.91% of the total variation in the original CTQs and was the only principal component whose scores were evaluated. These scores were fitted by analysis of variance, according to the model in (2). Table 4 presents the square roots of the variance components for this multivariate GR&R study. Equations (4)-(7) were used to calculate the evaluation indices of the measurement system. $\%R\&R_m = 7.74\%$ and $ndc_m = 18$ suggest that the measurement system is acceptable (guidelines for an acceptable measurement system: $\%R\&R_m < 10\%$ and $ndc_m > 5$ [29]).
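The two reported indices can be cross-checked with a few lines of Python. The sketch assumes the conventional definitions $\%R\&R = 100\,\sigma_{GRR}/\sigma_{total}$ and $ndc = \sqrt{2}\,\sigma_{P}/\sigma_{GRR}$ (truncated to an integer), with $\sigma^2_{total} = \sigma^2_{P} + \sigma^2_{GRR}$, and further assumes that with a single retained principal component the multivariate indices reduce to those of $PC_1$. The function name is hypothetical.

```python
import math

def ndc_from_percent_rr(percent_rr):
    """Number of distinct categories implied by a given %R&R value."""
    ratio = percent_rr / 100.0                      # sigma_GRR / sigma_total
    part_to_gauge = math.sqrt(1.0 - ratio ** 2) / ratio  # sigma_P / sigma_GRR
    return int(math.sqrt(2.0) * part_to_gauge)      # truncated, as is customary

ndc = ndc_from_percent_rr(7.74)
print(ndc)  # prints 18
```

Under these assumptions, a %R&R of 7.74% implies 18 distinct categories, which agrees with the reported $ndc_m$.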

2) CURRENT PROCESS CAPABILITY
Before assessing process capability, control charts should be used to verify process stability. Using (8)-(12), Fig. 3 shows the Hotelling $T^2$ and $|S|$ control charts for checking mean and covariance stability, respectively. As can be seen from these charts, the multivariate process is under statistical control.
Turning now to the process capability analysis, Table 5 summarizes some descriptive statistics and specification limits. Table 3 shows that $PC_1$ explains 96.04% of the total variation in the original CTQs, so $PC_1$ was the only principal component considered. For this particular case, there are only upper specification limits; therefore, $MC_{pk}$ was the only multivariate process capability index estimated. The specification limit for $PC_1$ was calculated by using (17). After that, (18) and (19) were used to obtain the multivariate process capability index. $MC_{pk} = 0.66$ indicates that about 2% of defects are expected in this multivariate process. The goal is to increase this MPCI to at least 1.33.
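The link between the capability index and the expected defect rate can be sketched as follows. The sketch assumes normally distributed $PC_1$ scores and a one-sided (upper) specification, so that the expected fraction beyond the limit is $P(Z > 3\,MC_{pk})$. The function name is hypothetical.

```python
import math

def upper_tail_defect_rate(cpk):
    """P(Z > 3 * Cpk) for a standard normal Z, via the complementary error function."""
    z = 3.0 * cpk
    return 0.5 * math.erfc(z / math.sqrt(2.0))

baseline = upper_tail_defect_rate(0.66)  # current process
goal = upper_tail_defect_rate(1.33)      # project goal

print(f"MCpk = 0.66 -> {100 * baseline:.2f}% expected defects")
print(f"MCpk = 1.33 -> {100 * goal:.4f}% expected defects")
```

Under these assumptions, an index of 0.66 corresponds to roughly 2.4% of parts beyond the upper limit, in line with the "about 2%" quoted above, while the goal of 1.33 corresponds to a few parts per hundred thousand.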

C. ANALYZE PHASE
In this phase, influencing factors and causes that affect the behavior of the CTQs are identified and the most significant ones are selected. Table 6 presents the control variables with their respective levels for building a central composite design. This response surface design adopted 8 corner points, 6 axial points, 5 center points and $\rho = 1.682$. The sequential set of experimental runs was conducted and recorded in Table 7. Table 3 provides the PCA results, with the first principal component accounting for 98.29% of the total variation.
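The run counts quoted above follow directly from the structure of a three-factor central composite design, as the Python sketch below shows. It assumes the rotatable choice of axial distance, $\rho = (2^k)^{1/4}$, which reproduces the reported value of 1.682 for k = 3; whether the authors selected $\rho$ by this rule is an assumption. The function name is hypothetical, and all points are in coded units.

```python
from itertools import product

def central_composite_design(k, n_center):
    """Corner, axial and center points of a CCD in coded units."""
    rho = (2 ** k) ** 0.25  # rotatable axial distance (assumed rule)
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for s in (-rho, rho):
            point = [0.0] * k
            point[i] = s
            axial.append(tuple(point))
    centers = [tuple([0.0] * k) for _ in range(n_center)]
    return rho, corners, axial, centers

rho, corners, axial, centers = central_composite_design(k=3, n_center=5)
total_runs = len(corners) + len(axial) + len(centers)

print(round(rho, 3), len(corners), len(axial), len(centers), total_runs)
```

The design therefore comprises 8 + 6 + 5 = 19 runs for the three control variables.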
Before fitting a response surface model for $PC_1$, an analysis of variance was performed to assess the adequacy of a full quadratic model. As can be seen from Table 8, the full quadratic model included several non-significant terms. Additionally, Fig. 4 illustrates how significant each factor was. It is essential to highlight that most of the variation in $PC_1$ is due to the feed rate factor. In order to find the best fit for this turning process, several models were analyzed, taking into account the lack-of-fit test, the Anderson-Darling normality test for residuals and the adjusted coefficient of determination ($R^2_{Adj}$). The cutting speed factor had no significant impact on $PC_1$. Therefore, this factor was removed from the final reduced response surface model, as seen in Table 8.

D. IMPROVE PHASE
1) QUANTIFY RELATIONSHIP BETWEEN XS, CTQS AND PCS
Response surface models for $R_a$, $R_y$ and $PC_1$ were built by using (18) and (19). The reduced models described in (25)-(27) can be illustrated through contour and surface plots. As shown in Fig. 5, a low feed rate level minimizes the scores of $PC_1$.

2) OPTIMIZE PROCESS THROUGH PCS
Because $PC_1$ is positively related to $R_a$ and $R_y$ (see the eigenvectors in Table 3), minimizing $PC_1$ is the same as minimizing both $R_a$ and $R_y$. Therefore, the original constrained multi-objective optimization problem can be simplified into a constrained single-objective problem, using (22) and (23), as follows: … 864)]$^2$ + 1.966, subject to $X^T X \le 1.682^2$ (28). In this particular case, the target for each CTQ was obtained from $T_{CTQ} = \mathrm{Min}$ … Applying the GRG algorithm to Eq. (29), the optimal setting in coded units was (-1.480; -0.799) for this multivariate process. This solution (F = 0.152 mm/rev and D = 0.165 mm in uncoded units), which satisfies all the constraints, must be validated by a pilot test in order to compare the optimized process capability to the baseline.
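A quick Python check confirms that the reported optimum respects the spherical constraint. Only the coded coordinates and the radius quoted above are used; the variable names are hypothetical.

```python
import math

RHO = 1.682                 # radius of the experimental region
x_opt = (-1.480, -0.799)    # reported optimum in coded units

norm_sq = sum(v * v for v in x_opt)   # X'X at the optimum
inside = norm_sq <= RHO ** 2          # constraint X'X <= rho^2
distance = math.sqrt(norm_sq)         # distance from the design center

print(inside, round(norm_sq, 6), round(RHO ** 2, 6), round(distance, 4))
```

The squared norm (about 2.8288) is just below $\rho^2$ (about 2.8291), so the solution lies on the boundary of the experimental region to within rounding; in other words, the constraint is active at the optimum.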

E. CONTROL PHASE
Hotelling $T^2$ and $|S|$ control charts were applied, by using (8)-(12), to verify mean and covariance stability, respectively. As can be seen from Fig. 6, the multivariate process is in control. Turning to process capability, Table 5 summarizes some descriptive statistics and specification limits for the validation test. Table 3 shows that $PC_1$ explains 95.61% of the total variation in the original roughness parameters; thus $PC_1$ was the only principal component taken into account. The specification limit for $PC_1$ was calculated by using (17) and $MC_{pk}$ by using (18) and (19). According to the index $MC_{pk} = 4.55$, the probability of producing defective parts was drastically reduced, to practically 0.00%. Further, phase 2 control charts, using Eq. (21) as the control limit, could be applied for sustaining improvements.
The computational complexity of the proposed method is increased by having to conduct PCA before performing the measurement system analysis (MSA), SPC, DOE and MOOP studies. On the other hand, the dimensionality reduction provided by PCA reduces the computational effort of these subsequent studies, because the number of principal components retained is usually smaller than the number of original correlated variables.
Finally, this numerical example showed how to define, measure, analyze, improve and control processes with multiple CTQs using Six Sigma/DMAIC and PCA. When evaluating systems with multiple CTQs, univariate methods often generate inconclusive results for validating measurement systems, determining process capability and defining the optimum operating condition. Multivariate methods reduce the size of the problem and provide conclusive results when conducting MSA, SPC, DOE, and MOOP studies.

IV. CONCLUSIONS
Complex industrial processes generally deal with multiple correlated critical-to-quality characteristics. Thus, multivariate statistical techniques are required to measure, analyze, improve and control such applications. The literature presents several papers applying multivariate approaches to either MSA, SPC, DOE or MOOP problems. Nevertheless, combining these approaches with a well-structured procedure is required to adequately solve problems of multivariate manufacturing processes. The domain-specific MDMAIC method was proposed to integrate contemporary multivariate techniques into the decision-making process of Six Sigma projects. The numerical example has shown how a multivariate process can be properly assessed and optimized. Additionally, the following conclusions are drawn:
• PCA effectively reduced the problem dimension while applying the multivariate versions of the MSA, SPC, DOE and MOOP techniques;
• According to the evaluation indices $\%R\&R_m$ and $ndc_m$, the measurement system that assesses the roughness parameters was validated by using the multivariate GR&R study;
• Multivariate process capability indices were calculated in order to determine the economic losses before and after process improvement;
• The multivariate manufacturing process was enhanced by adopting the RSM-PCA approach coupled with MSE functions for simultaneously optimizing the mean and variance of multiple correlated CTQs. The $MC_{pk}$ increased from 0.66 to 4.55.
Finally, the numerical example of the hardened steel turning process has shown that the MDMAIC method was efficient and effective in leading the practitioner to the problem solution.

TARCÍSIO G. BRITO received the degree in mechanical engineering, specialization in environmental engineering, the master's degree in mechanical engineering, and the Ph.D. degree in production engineering from the Federal University of Itajubá, in 2015.
He is currently an Adjunct Professor C1 with the Federal University of Itajubá, Brazil. He has experience in mechanical and production engineering, working on the following subjects: manufacturing processes, technical drawing I and II, machine elements, machine dynamics, and quality control. He works in the area of design and analysis of experiments and operational research.
ANDERSON P. PAIVA received the degree in mechanical engineering, the master's degree in production engineering, and the Ph.D. degree in mechanical engineering from UNIFEI. He is currently an Associate Professor III with the Federal University of Itajubá (UNIFEI / IEPG). He works in the area of Design and experiment analysis, multivariate statistics and optimization methods. His main line of research is manufacturing process optimization.