Efficient Estimation and Classification Methods for High-Dimensional Data Using a Robust Iteratively Reweighted SIMPLS Algorithm Based on nu-Support Vector Regression

The statistically inspired modification of the partial least squares (SIMPLS) is the most commonly used algorithm to solve a partial least squares regression problem when the number of explanatory variables (<inline-formula> <tex-math notation="LaTeX">$p$ </tex-math></inline-formula>) is larger than the sample size (<inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula>). Nonetheless, in the presence of irregular points (outliers), this method is no longer efficient. Therefore, the robust iteratively reweighted SIMPLS (RWSIMPLS), which is an improvement of the SIMPLS algorithm, was put forward to remedy this problem. However, the RWSIMPLS is still not very efficient with regard to its parameter estimates and outlier diagnostics, and it suffers from long computational times. This paper proposes a new robust SIMPLS that incorporates a new weight function constructed from <italic>nu</italic>-Support Vector Regression. We call this method the robust iteratively reweighted SIMPLS based on <italic>nu</italic>-Support Vector Regression, denoted SVR-RWSIMPLS. To avoid misclassification of observations, a new diagnostic plot is proposed to classify observations into regular observations, vertical outliers, good leverage points (GLPs), and bad leverage points (BLPs). The numerical results clearly indicate that the SVR-RWSIMPLS is more efficient, more robust and has shorter computational running times than the RWSIMPLS when multiple leverage points and vertical outliers exist. The proposed diagnostic plot is also very successful in classifying observations into the correct groups.


I. INTRODUCTION
The ordinary least squares (OLS) technique is a widely used method to estimate the parameters of a linear regression model. Under certain assumptions, the OLS has fine and attractive properties. Nonetheless, OLS estimates are inefficient, as their standard errors become inflated when multicollinearity is present in the data. Multicollinearity occurs when two or more independent variables are correlated, and this scenario often arises when dealing with high dimensional data where p > n. In this situation, OLS fails to produce estimates because the X′X matrix becomes singular.
The associate editor coordinating the review of this manuscript and approving it for publication was Yanbo Chen.

As a solution to this problem, the Partial Least Squares Regression (PLSR) method was developed to build a regression model with multicollinear data in high dimensional space. The main aim is to extract uncorrelated latent variables (components) using iterative algorithms [1]. The Non-Linear Iterative Partial Least Squares (NIPALS) and the statistically inspired modification of the PLS (SIMPLS) [2] are very popular algorithms used to iteratively extract such components. When dealing with a univariate or multivariate response, partial least squares regression is commonly referred to as PLS1 or PLS2, respectively. This article focuses only on PLS1 regression and SIMPLS.
It is now evident that the SIMPLS algorithm is very sensitive to outliers due to its use of the OLS estimation technique and a non-robust covariance matrix in acquiring the components. Several robust PLSR algorithms have been proposed to deal with this problem. Wakeling and Macfie [3] were the first to introduce a robust algorithm, replacing the univariate OLS regression steps with robust alternatives; however, their approach was criticized for employing non-robust initial weights. The iteratively reweighted algorithm of Cummins and Andrews [4] also received criticism because it could not handle outlying observations in the X-direction, popularly known as high leverage points (HLPs) or simply leverage points. Hubert and Branden [5] then introduced a robust covariance matrix into the SIMPLS algorithm, followed by Serneels et al. [6], who developed the partial robust M-regression estimator (PRM). Alin and Agostinelli [7] developed the Robust Iteratively Reweighted SIMPLS (RWSIMPLS) by employing a weight function introduced by Markatou et al. [8] that relies on the response variable, the chosen model distribution and the sample empirical distribution. They found that, despite having longer computational times, the RWSIMPLS outperformed the SIMPLS and PRM in terms of having the smallest MSEs, the smallest empirical influence function values and the highest empirical breakdown values. However, using this weight function has shortcomings, as there is no specific rule to indicate which observations are the outliers.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Moreover, their diagnostic plots for classifying observations into four categories (regular observations; vertical outliers, i.e., outlying observations in the Y-space; good leverage points, i.e., outlying observations in the X-space that follow the pattern of the rest of the data; and bad leverage points, i.e., outlying observations in both the X-space and the Y-space) were constructed by plotting the robust standardized residuals versus the leverage values. This turns out not to be a good approach, since it is now evident that leverage values are not very successful in identifying HLPs [9]. Motivated by the fact that employing a weighting function in the RWSIMPLS algorithm has been shown to be more efficient than the SIMPLS and PRM, our primary aim is to modify the existing RWSIMPLS algorithm by integrating a weight function based on our newly proposed nu-Support Vector Regression method for the detection of HLPs [10], so as to produce more efficient estimates with less computing time. Our secondary aim is to propose a better approach for classifying observations into the four categories mentioned above. The paper is organized as follows. The Robust Iteratively Reweighted SIMPLS Algorithm (RWSIMPLS) is briefly discussed in Section II, followed by the discussion of our proposed method in Section III. Section IV presents the results of a simulation study and two real data sets, followed by the conclusion of the study in Section V.

II. THE ROBUST ITERATIVELY REWEIGHTED SIMPLS ALGORITHM (RWSIMPLS)
Several types of PLSR algorithms have been introduced recently, with the main aim of building a multivariate (multiresponse) linear regression model

Y = Xβ + ε, (1)

where Y = (y_1, y_2, ..., y_m) is the n × m matrix of m response variables, X = (x_1, x_2, ..., x_p) is the n × p matrix of p predictors, β = (β_1, ..., β_m) is the (p+1) × m matrix of coefficients, and ε = (ε_1, ..., ε_m) is the n × m matrix of random errors. The n × A component score matrices of X and Y are represented by T = XW = (t_1, ..., t_A) and U = YC = (u_1, ..., u_A), respectively, where the number of components, denoted A, is smaller than or equal to p. The p × A and m × A loading matrices of X and Y are represented by K = (k_1, ..., k_A) and Q = (q_1, ..., q_A), respectively. The p × A matrix W and the m × A matrix C are the weight matrices. The aim of PLSR is to build a relation between X and Y via uncorrelated latent variables extracted such that the components of the response and predictor variables have maximum covariance. Adding the constraint that the score vectors have length 1 produces a unique maximization. The initial score vectors t_1 and u_1 are the solutions to the maximization problem

max_{w,c} cov(Xw, Yc) subject to ||w|| = ||c|| = 1. (4)

The singular value decomposition can be used to obtain the weight vectors w and c: the maximum covariance in Equation (4) is attained by the vectors w_1 and c_1 corresponding to the largest singular value of X′Y [11]. Additional constraints need to be added to obtain subsequent score vectors; orthogonality of the score vectors is usually imposed, i.e., t_a′t_j = 0 and u_a′u_j = 0 for a ≠ j. At the outset of SIMPLS, the empirical covariance matrix of the mean-centered data, which is highly sensitive to outliers, is deflated.
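For a single response (PLS1), the covariance-deflation idea described above can be sketched in a few lines. The following is a minimal illustrative implementation in the spirit of de Jong's SIMPLS, not the authors' code; the variable names are ours, and it uses the classical (non-robust) mean-centering that the text goes on to criticize.

```python
import numpy as np

def simpls1(X, y, A):
    """Minimal SIMPLS sketch for PLS1: extract A components by repeatedly
    deflating the cross-covariance vector s = Xc' yc."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # classical mean-centering (non-robust)
    yc = y - y.mean()
    s = Xc.T @ yc                      # p-vector of cross-covariances
    R = np.zeros((p, A))               # weights such that T = Xc @ R
    P = np.zeros((p, A))               # X-loadings
    q = np.zeros(A)                    # y-loadings
    T = np.zeros((n, A))               # orthonormal score vectors
    for a in range(A):
        w = s / np.linalg.norm(s)      # weight maximizing cov(Xc w, yc)
        t = Xc @ w
        norm_t = np.linalg.norm(t)
        t = t / norm_t                 # unit-length score (constraint in (4))
        T[:, a] = t
        R[:, a] = w / norm_t
        P[:, a] = Xc.T @ t             # X-loading for component a
        q[a] = yc @ t                  # y-loading for component a
        # deflate s against the loadings found so far (keeps scores orthogonal)
        V = P[:, :a + 1]
        s = s - V @ np.linalg.lstsq(V, s, rcond=None)[0]
    beta = R @ q                       # regression coefficients (centered data)
    return beta, T
```

The deflation of s against the accumulated loadings is what keeps the extracted score vectors mutually orthogonal without ever deflating X itself.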
The regression coefficient matrices of X and Y on their corresponding score matrices are represented by the loadings K and Q, respectively. All A components are iteratively extracted by utilizing the OLS estimation technique and a non-robust covariance matrix, which makes this algorithm very sensitive to outliers and less effective with non-normal errors.
To rectify this problem, Alin and Agostinelli [7] established the Robust Iteratively Reweighted SIMPLS (RWSIMPLS) by utilizing the weight function constructed by Markatou et al. [8]. The weight is integrated into the RWSIMPLS algorithm with the main aim of down-weighting influential observations. The weight function relies on the response variable, the chosen model distribution and the sample empirical distribution, and is defined in terms of the ith residual r_i(β) and a residual adjustment function A; it takes values in [0, 1]. Markatou et al. [8], [12] developed a weighted likelihood estimation based on these weights. They noted that A(δ) = δ gives weights equal to 1 and corresponds to the maximum likelihood estimates, whereas A(δ) = 2(δ + 1)^{1/2} − 1 corresponds to the Hellinger distance estimate, which is more stable than the maximum likelihood estimate when certain assumptions fail. Alin and Agostinelli [7] utilized the Hellinger residual adjustment function to calculate the weights. In the first step of the RWSIMPLS algorithm, the median is employed to center the variables X and Y. The RWSIMPLS algorithm is based on deflating the weighted covariance matrix S_w = (X_w)′Y_w, where X_w and Y_w are the weighted matrices obtained by multiplying every row by the square root of its weight, w_ris being the weight of the ith residual corresponding to the sth response. Please refer to Alin and Agostinelli [7] for details of the RWSIMPLS algorithm.

III. THE PROPOSED SVR-RWSIMPLS ESTIMATOR BASED ON nu-SVR
A. CHOICE OF WEIGHT FOR THE PROPOSED METHOD
In regression, outliers can be classified into vertical outliers, residual outliers, and leverage points [9]. Vertical outliers are observations that are outlying in the Y-space, while residual outliers are observations that have large residuals. Outlying observations in the X-space are referred to as leverage points, which can be classified further into good and bad leverage points. Among the three types of outliers, leverage points have the most detrimental effect on the computed values of various estimates, which leads to misleading conclusions about the fitted regression model [13]. Hence, robust methods that are able to reduce or eliminate the effect of leverage points should be used as an alternative to the least squares method. Coakley and Hettmansperger [14] proposed a new class of Generalized-M estimators (GM6) to reduce the effect of multiple leverage points in regression models. The shortcoming of this method is that its efficiency tends to decrease as the number of good leverage points increases. Dhhan et al. [15] improved the efficiency of the GM6 estimator by developing a GM estimator based on the fixed-parameters ε-tube support vector regression (FP-ε-SVR), which is highly efficient and has good resistance against multiple leverage points. Other robust methods that are able to eliminate the effect of multiple leverage points with high robustness and computational efficiency when applied to energy system data can be found in [16], [17]. The RWSIMPLS algorithm employs weights with the main aim of reducing the effect of influential observations (IOs). According to Belsley et al. [18], IOs are observations which, either individually or together with several other observations, have an adverse effect on the computed values of various estimates. In this connection, vertical outliers and bad leverage points can be considered IOs. As noted in [9], [19], [20], good leverage points are not considered IOs because they follow the pattern of the majority of the data.
They may even contribute to more precise estimates, since they have little or no impact on OLS estimates. As such, to obtain efficient estimates, only BLPs and VOs, not GLPs, should be down-weighted [15]. Hence, an efficient weight function should be formulated so that only genuine IOs are assigned smaller weights while GLPs are assigned weight 1. The RWSIMPLS algorithm improved on the performance of the SIMPLS and PRM algorithms. Nonetheless, the use of the weighting function of [8] in Equation (6) has its shortcomings, owing to the absence of specific rules indicating which observations are the IOs. In view of this, we propose a weight function based on an efficient diagnostic method for identifying bad leverage points and vertical outliers in high dimensional data; a good weight function depends on a diagnostic method that is able to correctly identify VOs and BLPs.
Firstly, we developed a robust diagnostic method to identify outliers (vertical outliers and bad leverage points) based on nu-SVR. Then we minimized their effects by assigning them lower weights and incorporating this weighting function into our newly proposed RWSIMPLS algorithm, denoted SVR-RWSIMPLS. The SVR is a non-parametric approach that follows statistical learning theory and the concept of structural risk minimization. Dhhan et al. [21] developed the fixed-parameters ε-tube SVR (FP-ε-SVR) to identify vertical outliers and bad leverage points; however, this method was not very successful in detecting outliers. Motivated by their work, Abdullah et al. [10] developed a nu-SVR method that proved to be highly effective in identifying multiple outliers (VOs and BLPs) under various settings. The method manages to identify the actual number of outliers in a data set by controlling the free parameters (C = 200, nu = 0.22 and the kernel parameter h = 0.05).
In the nu-tube-SVR, the non-sparseness property of the ε-tube loss function is used to detect abnormal points in the data. As noted by [22], [23], the estimation problem can be stated as the convex quadratic program

minimize (1/2)||w||² + C(nu·ε + (1/n) Σ_{i=1}^{n} (ξ_i + ξ_i*))

subject to y_i − (w′φ(x_i) + b) ≤ ε + ξ_i, (w′φ(x_i) + b) − y_i ≤ ε + ξ_i*, and ξ_i, ξ_i*, ε ≥ 0, whose dual yields the final regression function

f(x) = Σ_{i=1}^{n} (α_i − α_i*) K(x_i, x) + b,

where α_i* and α_i are the Lagrange multipliers, K(x_i, x) is the kernel function, and b is the bias term of the SVR model. Controlling the values of the parameters of the nu-SVR approach reduces the effect of outliers on the solution, as mentioned by Rojo-Álvarez et al. [24]. Moreover, the largest Lagrange multipliers belong to the outlying points in the data set; as a result, the values of the Lagrange multipliers, which depend on the cost parameter C, are affected by any change in the cost value. The non-parametric approach is advantageous due to its flexibility, which allows a significant change in the estimated value corresponding to an abnormal point with only a small change in the estimated values corresponding to the rest of the data. To be precise, the parameter nu automatically minimizes the value of the ε parameter and controls the number of support vectors, allowing all observations to be used; in return, it assists in diagnosing the outliers from the first iteration. A large cost parameter C produces a high variance among the Lagrange multipliers (α_i* and α_i), so that outliers correspond to large values of the diagnostic statistic Z_i, which can be identified using a specific cut-off criterion. Since the distribution of Z_i is intractable, following Habshah et al. [13], we propose the cut-off point (CP)

CP = median(Z_i) + 3 × MAD(Z_i), where MAD = median{|Z_i − median(Z_i)|}/d.

The Median Absolute Deviation (MAD) is used as a robust alternative to the standard deviation as a measure of scale; Maronna et al. [25] suggested the use of d = 0.6745 to make the MAD comparable to the standard deviation for normally distributed data. Any observation whose Z_i is larger than CP is flagged as an HLP (good or bad leverage point). As indicated by Dhhan et al. [15], a weight function is formulated based on the diagnostic method for identifying the outliers, and its effectiveness relies on the diagnostic method correctly detecting them. Hence, a weight function based on nu-SVR is formulated as

w_i = 1 if Z_i ≤ CP, and w_i = CP/Z_i if Z_i > CP, (10)

where weight 1 is given to regular observations and good leverage points, and weight CP/Z_i is given to vertical outliers and bad leverage points. We propose incorporating this weight into the algorithm of the SVR-RWSIMPLS. Since this diagnostic method has already been proven to be very efficient in detecting genuine outliers in high dimensional space (p > n) with no masking effects and negligible swamping effects [10], we expect the SVR-RWSIMPLS to be more efficient than the RWSIMPLS.
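The MAD and the weight function above can be sketched as follows. This is an illustrative sketch, not the authors' code: the diagnostic statistics Z_i (obtained from the Lagrange multipliers of the nu-SVR fit, following [10]) and the cut-off point CP are taken as given inputs rather than recomputed here.

```python
import numpy as np

def mad(z, d=0.6745):
    """Median absolute deviation divided by d = 0.6745 so that it is
    comparable to the standard deviation under normality [25]."""
    z = np.asarray(z, dtype=float)
    return np.median(np.abs(z - np.median(z))) / d

def svr_weights(z, cp):
    """Weight w_i = 1 for observations with Z_i <= CP (regular points and
    good leverage points) and w_i = CP / Z_i for flagged observations
    (vertical outliers and bad leverage points)."""
    z = np.asarray(z, dtype=float)
    w = np.ones_like(z)
    flagged = z > cp
    w[flagged] = cp / z[flagged]
    return w
```

Because the weight decays as CP/Z_i rather than dropping to zero, grossly outlying points are strongly down-weighted while borderline points are only mildly penalized.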

B. ALGORITHM OF THE PROPOSED SVR-RWSIMPLS ESTIMATOR
This section describes how the new version of the RWSIMPLS based on nu-SVR, denoted SVR-RWSIMPLS, is developed. The SVR-RWSIMPLS is established by integrating the weighting function obtained from nu-SVR, which requires less computing time than the RWSIMPLS. Moreover, at the outset, the matrices X and Y are centered using the Decile Mean (DM) instead of the median, which has low efficiency at the normal distribution. Rana et al. [26] showed that the accuracy of the decile mean is better than that of the median, as it has smaller bias and smaller standard errors in a variety of situations. The centered variables are scaled as

x̃ = (x − DM_x)/wsd_x, ỹ = (y − DM_y)/wsd_y, (11)

where DM is the decile mean, defined as the sum of all deciles divided by the number of deciles, DM = (D_1 + D_2 + · · · + D_9)/9, and wsd_x and wsd_y are the weighted standard deviations of the variables X and Y based on the nu-SVR weights given in Equation (10).
The weighted standard deviations can be considered a robust alternative to the ordinary standard deviations because the effect of vertical outliers and bad leverage points has been reduced by assigning smaller weights to them. The steps of our proposed algorithm can be summarized as follows:
Step 1: Calculate the initial weights based on nu-SVR as in Equation (10).
Step 2: Both matrices of the predictor and response variables are scaled as in Equation (11).
Step 3: Using the weights from Step 1, compute the weighted predictor variables X_w and response variable Y_w by multiplying each row by the square root of its weight.
Step 4: Choose a small sample of size 5 without replacement, say X_w^z and Y_w^z, from X_w and Y_w, then obtain the starting coefficient estimates β̂ = (β̂_1, ..., β̂_m) using the ordinary steps of SIMPLS on X_w^z and Y_w^z.
Step 5: Compute the residuals from the current coefficient estimates β̂.
Step 6: Compute the robust scale of the residuals, r̃_i = (r_i − DM_r)/wsd_r, where DM_r and wsd_r are the decile mean and the weighted standard deviation of the residuals, respectively, calculated as in Equation (11).
Step 7: Compute the weighted covariance matrix S_w = X_w′ Y_w.
Step 8: Employ the SIMPLS method on S w to estimate the parameters.
Step 9: Repeat Steps 5-8 until convergence, i.e., until the maximum absolute difference between two consecutive β̂'s is less than 0.00001.
Step 10: Repeat Steps 3-9 several times, say c = 5 times. The final parameter estimates β̂ are the set, among the c candidate coefficient sets, that yields the minimum robust residual scale.
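The control flow of Steps 3-10 can be sketched as follows. This is a structural sketch only, under stated assumptions: ordinary least squares (`np.linalg.lstsq`) stands in for the inner SIMPLS fit to keep it short, a simple weighted scale stands in for the robust scale of Step 6, and the residual-based reweighting inside the loop is our assumption about how the weights evolve across iterations.

```python
import numpy as np

def irls_skeleton(X, y, w0, fit, n_starts=5, subsample=5, tol=1e-5,
                  max_iter=50, seed=0):
    """Sketch of Steps 3-10: draw a small starting subsample (Step 4),
    alternate reweighting and weighted fits until consecutive coefficient
    vectors agree (Steps 5-9), restart n_starts times and keep the fit
    with the smallest scale (Step 10).  fit(X, y) returns a coefficient
    vector; the paper plugs SIMPLS in here."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    best_beta, best_scale = None, np.inf
    for _ in range(n_starts):                                  # Step 10
        idx = rng.choice(n, size=subsample, replace=False)     # Step 4
        beta = fit(X[idx], y[idx])
        scale = np.inf
        for _ in range(max_iter):                              # Steps 5-9
            resid = y - X @ beta                               # Step 5
            # Step 6 stand-in: weighted residual scale
            scale = np.sqrt(np.sum(w0 * resid ** 2) / np.sum(w0))
            # assumed reweighting: keep the initial nu-SVR weights w0 and
            # additionally down-weight large scaled residuals
            w = w0 * np.minimum(1.0, 2.5 * scale /
                                np.maximum(np.abs(resid), 1e-12))
            sw = np.sqrt(w)
            new_beta = fit(X * sw[:, None], y * sw)            # Steps 7-8
            if np.max(np.abs(new_beta - beta)) < tol:          # Step 9
                beta = new_beta
                break
            beta = new_beta
        if scale < best_scale:                                 # keep best restart
            best_scale, best_beta = scale, beta
    return best_beta
```

Because the fitting routine is a parameter, the same skeleton runs with any inner estimator; swapping in a SIMPLS routine recovers the structure of the proposed algorithm.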
In the next section, the performance of the proposed SVR-RWSIMPLS is assessed using a simulation study and real data sets.

IV. RESULTS
This section includes a Monte Carlo simulation study and real data examples to evaluate the performance of the SVR-RWSIMPLS method compared to the SIMPLS and RWSIMPLS methods. All analyses were done using the R program [27].

A. MONTE CARLO SIMULATION STUDY
1) STANDARD NORMAL AND SKEWED ERROR DISTRIBUTIONS
To investigate the efficiency of the estimators, we compared the proposed method with the SIMPLS and the recently proposed RWSIMPLS method [7]. We followed the simulation design of [6], [7] by considering the linear regression model in Equation (1). The true values of the regression parameters are randomly drawn from a normal distribution with µ = 0 and σ² = 0.001. The explanatory variable matrix X is computed as X = TK′ so that the rank of X equals the number of components A, which is the number of columns of T. The columns of T and K are generated from the standard normal distribution, k_a ∼ N(0, 1) and t_a ∼ N(0, 1), for a = 1, ..., A. Moreover, this simulation study considers different error distributions ε_i: the standard normal, Student-t with 2 degrees of freedom (t_2) and 5 degrees of freedom (t_5), and the Cauchy and Slash distributions (heavy-tailed distributions). The performance of the proposed method was evaluated using the MSE. The mean squared errors of the β̂ estimates were computed as in Equation (14),

MSE(β̂) = (1/S_rep) Σ_{s=1}^{S_rep} ||β̂_s − β||², (14)

where S_rep is the number of replications used in the simulation studies. This study considered different sizes of p and n, and under each setting the simulation was repeated 1000 times, i.e., S_rep = 1000 [7]. The results are presented in Table 1.
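Equation (14) is, in our reading of the elided formula, the average squared Euclidean distance between the estimated and true coefficient vectors over the S_rep replications; a minimal sketch:

```python
import numpy as np

def mse(beta_hats, beta_true):
    """MSE over replications: (1/S_rep) * sum_s ||beta_hat_s - beta_true||^2.
    beta_hats has shape (S_rep, p); beta_true has shape (p,)."""
    diffs = np.asarray(beta_hats, dtype=float) - np.asarray(beta_true, dtype=float)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```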
It is interesting to observe that, under normal errors, the SIMPLS, which is supposed to be the most efficient estimator, performed less efficiently than the SVR-RWSIMPLS. In fact, the SVR-RWSIMPLS showed better performance than the SIMPLS and RWSIMPLS, attaining the smallest MSE values except for n = 24, p = 6 and A = 2; even in this case, the MSE of the SVR-RWSIMPLS is only slightly larger than that of the RWSIMPLS. The scenario changed dramatically for the SIMPLS when the errors were skewed: its performance became worst under the Cauchy and Slash error distributions, where its MSE values increased significantly. Under the skewed error distributions, the SVR-RWSIMPLS provides the best results, showing the smallest MSE values of the three methods.

2) CONTAMINATED DATA WITH HIGH LEVERAGE POINTS
The performance of our SVR-RWSIMPLS is further assessed by investigating its behavior in the presence of outliers in both the X and Y spaces (bad leverage points). The simulation design is similar to the previous one, except that we only considered the standard normal error distribution N(0, 1), with p explanatory variables (p = 10, 200, and 400), various sample sizes (n = 30, 50, 100, 150, and 200) and different percentages of contamination θ = (10%, 15%, and 20%). Following Dhhan et al. [15], at each step a certain percentage of good X and Y values were deleted and replaced with large values equal to 50. The SVR-RWSIMPLS, RWSIMPLS, and SIMPLS were then applied to the data. The results are exhibited in Tables 2-4 and Figures 1-5. As can be expected, the SIMPLS produced poor results in all scenarios. Let us first focus our attention on Table 2, for low dimensional data, p = 10. At 10% contamination with n = 30 and 50, the SVR-RWSIMPLS method has the smallest MSE values; at higher contaminations, the SVR-RWSIMPLS again outperformed the SIMPLS and RWSIMPLS, irrespective of the sample size n. The performances can clearly be seen in Figure 1. Tables 3 and 4 and Figures 2 and 4 illustrate the MSE of the three methods, along with the computational running times, for high dimensional data (p = 200 and p = 400). It is interesting to observe that the SVR-RWSIMPLS consistently provides the smallest MSE and the shortest computational times, irrespective of the sample size and percentage of contamination. The computation time of the SIMPLS was not compared because this method performed poorly, producing the largest MSE values. For a quick interpretation of the results, refer to Figures 3 and 5.

B. REAL EXAMPLES
1) OCTANE DATA SET
The octane data set is the first real example used to illustrate the performance of our proposed SVR-RWSIMPLS compared to the commonly used SIMPLS and RWSIMPLS methods. This data set consists of near-infrared absorbance spectra with p = 226 wavelengths collected on n = 39 gasoline samples. Esbensen et al. [28] also used this data set in their study. References [29], [30] noted that this data set contains outliers in samples 25, 26, 36, 37, 38, and 39, which contain added ethanol. The SVR-RWSIMPLS, SIMPLS, and RWSIMPLS were applied to the data set with the number of components A = 2, following [7], [31]. Following Alin and Agostinelli [7], the empirical influence function and empirical breakdown values of the three methods were constructed. To compute the empirical influence function, the Y variable was first scaled before being contaminated by replacing the first value of the X variables (x_11 = −0.060193) as well as the first value of the Y variable (y_1 = −1.2271) with values between −40 and 40. The empirical influence values were then computed using Equation (15), where β̂_c is the coefficient estimate for the contaminated data and β̂ is the coefficient estimate for the clean data; the results are displayed in Figure 6(a). Simultaneously changing y_1 and x_11 in the data had the greatest impact on the SIMPLS coefficient estimates. However, all three methods had the lowest empirical influence values when contaminated at (y_1 = 0, x_11 = 0). A large spike can be observed when the first value of y is contaminated with −4, which has a strong negative effect on the SIMPLS estimates. Even though the empirical influence values of the RWSIMPLS are smaller than those of the SIMPLS, the RWSIMPLS cannot outperform the SVR-RWSIMPLS, and moderate fluctuations can be observed in the RWSIMPLS estimates. In contrast, the proposed SVR-RWSIMPLS was impressive, having the lowest empirical influence values without showing any fluctuations in its estimates.
The robustness of the three methods was investigated further by studying the empirical breakdown values.
To construct an empirical breakdown plot, the response variable of the octane data was contaminated up to 50% by deleting the original y_i and replacing it with 20 for i = 1, 2, ..., 20, whereby the first value of y is contaminated first (1/n × 100% contamination), followed by the first two values of y (2/n × 100% contamination), and so on. Then, at each amount of contamination, the difference ||β̂_c − β̂|| between the contaminated coefficient estimate vector β̂_c and the clean-data coefficient estimate vector β̂ was computed. The results are displayed in Figure 6(b). It can be seen that the SIMPLS performs poorly, as it cannot cope even when there is only one outlier in the data. The results also signify that the RWSIMPLS can tolerate slightly over 33% contamination. The proposed SVR-RWSIMPLS method has the smallest empirical breakdown values, and it can withstand up to 48% contamination.
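The empirical-breakdown computation just described can be sketched generically. This is an illustrative sketch, not the authors' code: the fitting routine is pluggable (in the paper it would be SIMPLS, RWSIMPLS or SVR-RWSIMPLS; ordinary least squares is used in the test purely as a stand-in), and the contamination value 20 follows the text.

```python
import numpy as np

def breakdown_curve(X, y, fit, contam_value=20.0, max_frac=0.5):
    """Empirical breakdown sketch: replace the first k responses with a
    fixed outlying value and record ||beta_contaminated - beta_clean||
    for k = 1, ..., floor(max_frac * n).  fit(X, y) returns a
    coefficient vector for the supplied data."""
    n = X.shape[0]
    beta_clean = fit(X, y)
    norms = []
    for k in range(1, int(max_frac * n) + 1):
        yc = y.copy()
        yc[:k] = contam_value          # contaminate the first k responses
        norms.append(np.linalg.norm(fit(X, yc) - beta_clean))
    return np.array(norms)
```

Plotting the returned curve against k/n reproduces the style of breakdown plot used in Figures 6(b) and 8(b): a robust estimator keeps the curve flat until its breakdown fraction is reached.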
The performance of the RWSIMPLS method was further assessed with regard to its detection of vertical outliers and leverage points. Alin and Agostinelli [7] constructed a diagnostic plot for identifying vertical outliers and good and bad leverage points by plotting the RWSIMPLS standardized residuals against the leverage values, i.e., the diagonal elements of the hat matrix H = X(X′X)^{−1}X′, to identify HLPs. Any observation whose leverage value exceeds the cut-off point 2p/n is considered an HLP. As pointed out by [13], leverage values are not successful in detecting multiple HLPs; as such, using leverage values to diagnose HLPs will produce misleading results. As shown in Figure 7(a), the SVR-RWSIMPLS method successfully diagnoses the actual number of outliers in the octane data set without showing any masking or swamping effects, whereas the RWSIMPLS method suffers badly from both. Additionally, the computational running time for the octane data is 3.51 seconds for the SVR-RWSIMPLS method and 4.09 seconds for the RWSIMPLS method. This indicates that the SVR-RWSIMPLS is faster than the RWSIMPLS, in agreement with the results of the simulation study.

2) GAS DATA SET
The second real example in this paper is the gas data set [32], which has been studied by several researchers, e.g., [7], [31]. According to Branden and Hubert [31], it is a clean data set containing near-infrared (NIR) spectra of gasoline, with sample size n = 60 and p = 409 explanatory variables, which are NIR spectra values measured in 2-nm intervals from 900 nm to 1700 nm. The response variable is the octane number, with values between 83.4 and 89.6.
We followed the same process as described for the octane data in order to further assess the performance of our proposed method by constructing the empirical influence function, empirical breakdown values, and diagnostic plot.
To compute the empirical influence function, the response variable Y needs to be scaled at the outset in order to obtain a similar range to the predictor variables. After that, the data were contaminated by concurrently substituting the first value of the first predictor variable (x_11 = −0.05019) and the first value of the response variable (y_1 = −1.2271) with values between −40 and 40. The empirical influence values were computed using Equation (15) and are displayed in Figure 8(a). Similar to the octane data, it can be observed from Figure 8(a) that the SIMPLS method has higher empirical influence values as a result of the contamination, indicating a high impact on the SIMPLS algorithm. When the data were contaminated with 0 (x_11 = 0, y_1 = 0), all three methods proved equally good, with the smallest empirical influence values. However, when the first y value was contaminated with a value close to −4, the SIMPLS produced very bad results with a very large spike. The RWSIMPLS has smaller empirical influence values but indicates moderate instability in the parameter estimates, as evidenced by small fluctuations in the empirical influence values. On the other hand, the proposed SVR-RWSIMPLS has the smallest empirical influence values with no significant fluctuations, which shows the robustness of the SVR-RWSIMPLS estimates.
Next, the empirical breakdown values of the three methods were computed for the gas data set. The response variable was contaminated by up to 50% by substituting y_i with 20 for i = 1, 2, ..., 30; that is, the first value of y is contaminated first (1/n × 100% contamination), followed by the first two values (2/n × 100% contamination), and so on. For each amount of contamination, the difference between the contaminated coefficient estimate vector β̂_c and the clean-data coefficient estimate vector β̂ was computed, i.e., ||β̂_c − β̂||. The results are displayed in Figure 8(b), which clearly shows that the SIMPLS method is extremely sensitive even in the presence of one outlier in the response variable. The empirical breakdown values of the SVR-RWSIMPLS are smaller than those of the RWSIMPLS; however, both can withstand up to 48% contamination.
Finally, the diagnostic plots based on the RWSIMPLS and SVR-RWSIMPLS were constructed for the clean gas data. The plots are shown in Figure 9(a-b). As seen in Figure 9(a), the plot based on the SVR-RWSIMPLS method does not flag any outlier in the data set, consistent with Branden and Hubert [31]. On the contrary, the plot based on the RWSIMPLS method, presented in Figure 9(b), declares many good observations as outliers (swamping effect), which would lead to misleading conclusions. We then purposely contaminated the data by replacing observations (cases 1, 2, 3, 4, and 5) in both the X and Y directions (bad leverage points) with large values (150, 151, 152, 153, and 154). Furthermore, to create vertical outliers, two observations in the Y direction (cases 6 and 7) were substituted with large values (100 and 105). The results are shown in Figures 9(c) and 9(d). A good method is one that can successfully detect observations 1-5 as bad leverage points and observations 6 and 7 as vertical outliers. It is interesting to note from Figure 9(c) that the diagnostic plot based on the SVR-RWSIMPLS correctly diagnosed the abnormal points in the X and Y spaces as bad leverage points and those in the Y space as vertical outliers. Nonetheless, the plot based on the RWSIMPLS method failed to correctly diagnose those observations and suffered from masking and swamping problems, as shown in Figure 9(d). Furthermore, the computational running time for the gas data is 3.48 seconds for the RWSIMPLS and 3.36 seconds for the SVR-RWSIMPLS.

V. CONCLUSION
The aim of this study was to develop a robust iteratively reweighted version of the SIMPLS algorithm (RWSIMPLS) based on weights obtained from our newly proposed nu-SVR outlier detection method, denoted SVR-RWSIMPLS. The decile mean was employed instead of the median to center the response and predictor variables. We also established a new diagnostic plot for classifying observations into regular observations, vertical outliers, and good and bad leverage points. The numerical results show that the SVR-RWSIMPLS is more efficient and more robust than the RWSIMPLS and SIMPLS when multiple leverage points and vertical outliers exist, as it has the smallest MSE, the smallest empirical influence function values and the smallest empirical breakdown values, and it can withstand up to 48% contamination. The computational running time of the SVR-RWSIMPLS is also shown to be shorter. The proposed diagnostic plot based on the SVR-RWSIMPLS is very successful in classifying observations into the correct categories, whereas the plot based on the RWSIMPLS fails to correctly identify the actual outliers (multiple leverage points and vertical outliers). Hence, the newly developed SVR-RWSIMPLS method is a suitable alternative for modeling multicollinear high dimensional data.