Parametric Software Effort Estimation Based on Optimizing Correction Factors and Multiple Linear Regression

Context: Effort estimation is one of the essential phases that must be accurately predicted in the early stage of software project development. Currently, solving problems that affect the estimation accuracy of Use Case Points-based methods is still a challenge to be addressed. Objective: This paper proposes a parametric software effort estimation model based on Optimizing Correction Factors and Multiple Regression Models to minimize the estimation error and the influence of unsystematic noise, which has not been considered in previous studies. The proposed method takes advantage of the Least Squared Regression models and Multiple Linear Regression models on the Use Case Points-based elements. Method: We have conducted experimental research to evaluate the estimation accuracy of the proposed method and compare it with three previous related methods, i.e., 1) the baseline estimation method – Use Case Points, 2) Optimizing Correction Factors, and 3) Algorithmic Optimization Method. Experiments were performed on datasets (Dataset D1, Dataset D2, and Dataset D3). The estimation accuracy of the methods was analysed by applying various unbiased evaluation criteria and statistical tests. Results: The results proved that the proposed method outperformed the other methods in improving estimation accuracy. Statistically, the results proved to be significantly superior to the three compared methods based on all tested datasets. Conclusion: Based on our obtained results, the proposed method has a high estimation capability and is considered a helpful method for project managers during the estimation phase. The correction factors are considered in the estimation process.


I. INTRODUCTION
Software project development has become extremely complicated, and the competence required in this industry is high, demanding highly qualified people. In past decades, to complete a project and deliver it to the customer on time and within budget, project managers had to estimate the cost of the software product, the effort, and the project duration or defect density [1]. The 2018 Standish Group CHAOS report showed that many software companies could not give a correct practical software cost and completed their projects behind schedule and over budget (48%-65%) or failed to complete them at all (48%-56%) [2]. The results indicated that, for most projects, the actual effort and schedule exceeded the estimates. The project budget plays a role in competitiveness, which means that using an effort estimation method in a software company is mandatory.
The associate editor coordinating the review of this manuscript and approving it for publication was Giuseppe Destefanis.
Software Development Effort Estimation (SDEE) is a crucial activity in the early stages of software development that plays an important role in the project's overall success. SDEE supports the management of project activities before the project begins, specifically designing the project plan and managing the budget. To obtain accurate estimates, a project manager must select an appropriate method and then customize or configure it to suit the type of software project that the organization will perform. However, SDEE cannot be expected to produce absolutely correct results [3], [4]. Accurate effort estimation is still an open issue. An effort estimation method is used to minimize the project's risks, or to reduce the risk of surprises during the project as far as possible. It helps project managers make sound control decisions so that the right amount of effort is allocated to the various activities during the project's development life cycle. As a result, many researchers have investigated more accurate SDEE methods [5], [6]. Existing research efforts related to SDEE can be classified into three main groups [3], [7], [8]:
1. Non-algorithmic models, also called non-parametric models, include Expert judgement, Analogue-based, Price-to-win, Top-down, Bottom-up, and Wideband Delphi. These models develop an estimate by using an expert's previous experience or historical projects to estimate software development costs. Descriptions of the non-algorithmic models are presented in
Some models are based on nonlinear properties and can learn from historical data and be trained to better estimate effort [9]. Recently, these models have been used in combination with or as an alternative to algorithmic models.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the requirements phase of the software lifecycle, Use Cases can be useful to measure the estimated effort at an early stage of a software project before obtaining the essential information [21]. As a result, the use of the Use Case for SDEE is widespread. In particular, a survey by Neil and Laplante [22] focused on the techniques used for the requirements elicitation, description, and modelling phases and found that over 50% of software projects used use cases in the early phase. The results of another review by Azzeh et al. [23] found that most studies focus on assessing Use Case Point (UCP) as a possible method for early SDEE. Researchers show interest in UCP-based approaches that are used as functional size metrics for effort estimation. UCP is used for object-oriented projects based on a structured scenario and actor analysis of the Use Case Model (UCM) [19].

A. PROBLEM FORMULATION
According to a systematic review of studies [23]-[26], the UCP is a promising effort estimation method during the early stages of software development and has many advantages for the software industry. However, from the project manager's point of view, there are still two well-known issues in applying UCP methods that could be improved.
First, there is no standard for the specification of use cases. Specifically, use cases are written in natural language, and there is no rigorous procedure to examine the quality or fragmentation of use cases. As a result, the number of steps in a use case may vary, which affects the accuracy of the estimation. In addition, the accuracy of the estimate may be affected if there is more than one scenario in a use case. Therefore, to achieve accuracy in estimation, use cases need to be adjusted or calibrated. Almost all previous UCP-based methods for software effort estimation focus on constructing the method, reevaluating the complexity of the use case model, and reevaluating the complexity weights [27]-[37]. For example, researchers have extended the UCP model by specifying new complexity levels for use case and actor weights [27], [28] or by refining existing complexity levels into more detailed options for effort prediction [29], [30]. Other studies calibrate complexity weights into different complexity levels [31]-[35]. Other approaches calculate the use case complexity based on transactions and paths [36], [37]. A transaction is defined as a stimulus and response event between an actor and the system. Paths are computed based on a cyclomatic complexity metric from the text representation of the use cases.
Second, the evaluation of Technical Complexity Factors (TCF) and Environmental Complexity Factors (ECF) depends on the experience of experts, which introduces a certain degree of uncertainty [28], [38], [39]-[45]. It is difficult to assign an appropriate value to an ECF because of a lack of relevant information. The reason is that an ECF is linked to the level of information and experience of a particular software development team. There are similar problems with the value assignment for a TCF. In particular, factor T10 (Concurrent) is difficult to assess. This technical factor could express parallel processing, parallel programming, or whether the system works independently or interacts with several other parties. The values assigned to this factor may not be accurate, as there are no guidelines in the UCP that explain this factor precisely. Huanca and Ore [39] recognized that the main factors affecting the estimation accuracy of the UCP are the ECFs and TCFs. They emphasized that the correction factors need to be reevaluated. Nassif et al. [46] also pointed out the necessity to refine these correction factors.
Combining machine learning with the original UCP formula to build SDEE models could be a solution to enhance its accuracy. Some approaches [47]-[56] have also explored variant models, particularly using regression models to optimize estimation accuracy based on historical data. These approaches bring improvements that minimize the influence of human error during the analysis of the UCM and simplify the original principles of the UCP.
The main drawback of the above methods is that none of them is comprehensive or provides better accuracy in estimating software effort under all situations. We developed the Optimizing Correction Factors (OCF) method [38]. The method uses the Least Absolute Shrinkage and Selection Operator (LASSO) method [57], [58] to determine the technical and environmental complexity factors that most significantly affect the estimation accuracy of the UCP method. The OCF method can help project managers reduce risks in evaluating correction factors and produce estimation results close to the actual effort [38]. The method has shown that the Sum of Squared Errors (SSE) is improved by more than 16% compared to the UCP estimation method. The SSE difference was also tested at the 5% significance level, and the p-value (0.0245) was below this threshold. In terms of the Percentage of Prediction within 25% (PRED (0.25)), the UCP method has a PRED (0.25) of 0.38, while the OCF method reaches a PRED (0.25) of 0.66. Our method is considered the first step for more intensive research to evaluate the technical and environmental complexity factors in the UCP method. We believe that the accuracy of the OCF method may be different when performed with various other datasets, and therefore, a bottom-up experiment is performed in this paper.
However, the OCF method does not currently provide a highly significant refinement to the estimation. Our goal in modifying the OCF method is to achieve more accurate estimates. The proposed method is inspired by the possibilities of using a standard estimation procedure for solving the problems discussed above. Therefore, in this work, we aim to apply the Least Squared Regression (LSR) models or the Multiple Linear Regression (MLR) models to improve the ability of the OCF method to estimate the software size and minimize the prediction error. Our approach uses MLR on historical project data points to build regression models and minimize errors in the integration process or recursion.
This study proposes a parametric software effort estimation model based on the OCF method and MLR for SDEE, the Extension of Optimizing Correction Factors (ExOCF) method, to minimize the estimation error more efficiently. The research questions answered are as follows:
RQ1: Is it possible to modify the OCF method so that its estimation accuracy improves?
RQ2: Does the proposed method outperform a baseline UCP method and another tested method?
RQ3: Is the difference in the accuracy of the estimate using different methods statistically significant?
To answer the research questions, we conducted an experimental study to evaluate the estimation accuracy of the proposed method and compared it with three methods used in the literature. Each method is run on four different historical datasets (D1, D2, D3, and D4) and assessed with the evaluation criteria (28)-(34). In this paper, we used statistical pairwise t-test comparisons to validate the accuracy of the proposed method. The following statistical hypotheses were tested:
H0: There is no significant difference in estimation capability between the proposed method and other estimation methods. This means that the estimation accuracy of the proposed method is not significantly different from that of the other methods.
H1: There is a significant difference in estimation capability between the proposed and other estimation methods. This means that the estimation accuracy of the proposed method is significantly better than that of other methods.

B. CONTRIBUTIONS
The main contributions of this study are as follows:
1) Investigation of the LASSO algorithm's use in exploring the best environmental and technical complexity factors on different datasets that improve the UCP size metric.
2) Machine learning techniques (LSR or MLR models) are combined with the OCF method to obtain better results in effort estimation. In this method, the software effort is a function of the OCF variables. The MLR formulation was created to estimate software effort values.
3) The results obtained by the proposed method are compared with three different estimation methods used in the literature. The methods are tested using the k-fold cross-validation technique. The training and testing datasets are the same for all methods. The datasets were obtained from the industry datasets of three data donors. To validate the accuracy of these methods, accuracy measures are chosen to avoid bias. The measurement criteria listed in Section 5 show how the evaluation metrics were selected. The experimental results show that the accuracy of the proposed method outperforms the other models.
The remaining sections are divided as follows: Section 2 introduces the related work. Section 3 presents the background of the methods used. The proposed effort estimation methods to achieve the research objectives are presented in Section 4. Section 5 describes the research methodology, including the presentation of the four datasets used in our experiments, the normalization of the data, the procedure of the experiments, and the evaluation criteria/metrics. The results of the experiments are presented in Section 6. Section 7 describes the threats to validity. Section 8 presents the conclusions. In the last section, we present future work.

II. RELATED WORK
Some problems related to the UCP model were presented in the previous section. In particular, many authors focused on adding more complexity levels for use case weight, actor weight, or both, discretizing the existing complexity levels, and calibrating the complexity weights. Kirmani and Wahid [27] added actor and use case weighting in the Re-UCP. They also added one extra rating level to the use case weighting system in UCP Sizing. Nunes et al. [28] identified six actor weightings in the iUCP. Wang et al. [29] integrated fuzzy set theory and Bayesian belief networks into the UCP model to extend the complexity levels of use cases. Periyasamy and Ghode [30] changed the actor complexity levels and reclassified the use case complexity in the e-UCP method. The UCPabc [31] approach applies an activity-based costing method to all variables in the UCP method, except the productivity factor is changed to 8.2 person-hours. An adjustment approach to the UCP, called Adapted UCP (AUCP) [32], is applied for incremental development estimations in large-scale projects. Braz and Vergilio [33] proposed two methods: Use Case Size Points (USP), and Fuzzy Use Case Size Points (FUSP), by calibrating the internal level of the use case. A USP introduces new components by considering the structures of a use case, the number and weight of scenarios, actors, preconditions, and postconditions. A FUSP is an extended version of a USP that uses the Fuzzy Set theory to reduce some use case classification problems. Qi et al. [34] improved the estimation accuracy of the UCP by using Bayesian analysis to calibrate the case complexity weights. Rak et al. [35] proposed a model for effort estimation called Use Case Reusability (UCR). The method gives a new classification for use cases based on their reusability. References [36] and [37] proposed an improvement method by computing paths from the cyclomatic complexity of the use case scenario. 
Although there is a small difference in precision, these approaches show that paths and transactions can be useful in computing the UCP.
In terms of SDEE methods based on machine learning techniques, we categorized them into three groups as follows. The first group uses neural network models such as Cascade Correlation Neural Network (CCNN) model, Multilayer Perceptron (MLP), Fuzzy Logic, or Artificial Neural Network (ANN) to estimate software effort, as shown in [46], [48], [53], [54]. Nassif et al. [46] proposed a UCP-based effort estimation model using fuzzy logic and neural networks to increase estimation accuracy. Reference [48] introduced a regression model using the Sugeno Fuzzy Inference System (FIS) approach to improve the estimation accuracy. The results show that an MMRE improvement of 11% can be obtained. Reference [53] proposed the CCNN model for use case diagrams. The proposed model was evaluated against the MLR and the UCP model with promising results as an alternative approach for SDEE. Iraji and Motameni [54] presented the Adaptive Neuro-Fuzzy Use Case Size Point (ANFUSP) model to estimate the effort for object-oriented software projects. The model results have less error than the UCP method.
The second group uses soft computing techniques with analogue-based estimation, such as [47], [55], [56]. Nassif et al. [47] proposed a model combining fuzzy logic and neural networks to increase the estimation accuracy of the UCP method. Here, the fuzzy logic used ten degrees for the complexity of the use cases, and the neural network was used to represent the input vectors of the UCP model. Bardsiri et al. [55] proposed a hybrid model based on Analogy Based Estimation (ABE) and Particle Swarm Optimization (PSO) algorithm. The model creates an attribute system that is weighted differently depending on the cluster. The results of the proposed model showed significantly improved accuracy of the estimates. Chiu and Huang [56] studied the effect of a genetic algorithm for adjusting the reused effort based on the distance between pairs of projects.
The last group applies regression models such as linear, nonlinear, and stepwise models [49]- [52]. Regression models can provide higher accuracy for effort estimation by examining the validity of UCP variables. Specifically, Nassif et al. [49] proposed a regression model based on the use case point size. The model considers the nonlinear relationship between software size in the UCP (Size) and the effort in person-hours (Effort), as well as the impact of the environmental complexity factors of a project on the productivity factor. The equation of the model is presented in (1). The results show that PRED (0.25) and PRED (0.35) were improved by 16.5% and 25%, respectively.
where the productivity value is between 0.4 and 1.3. Jorgensen [50] reported all variables included in the models to illustrate the accuracy and bias variation of the SDEE methods using regression analysis. Ochodek et al. [51] simplified the UCP method by discarding the UAW, measuring the UCP based on steps, or calculating the total number of steps in use cases.
Silhavy et al. [52] developed the Algorithmic Optimisation Method (AOM) to increase the accuracy of the correction coefficients of the effort estimation process. The proposed method uses multiple least squares regression with all UCP elements. The equation of the AOM method is presented in (2).
where $\alpha_1$, $\alpha_2$ are coefficient parameters from the regression model applied to historical projects.
The authors then conducted several experiments to investigate the significance of the UCP variables on two different datasets [9]. Residual analysis and stepwise multiple linear regression models were used to examine the influence of model complexity through correlation analysis. They proved that all UCP parameters were associated with the dependent variable to varying degrees and had significant estimation accuracy.
The regression equation is shown in (3) and (4), which contains an intercept, linear terms, and squared terms.
The next part discusses the latest developments (2016 onward) in effort estimation accuracy achieved using the UCP. TABLE 3 lists studies on estimation methods related to our work. The table also shows that the datasets used come from three industrial sources, namely Ochodek et al. [51], Nassif et al. [46], and Silhavy et al. [9], and from educational projects. Moreover, most studies focus on developing new estimation models for the original UCP method or evaluating the accuracy of existing methods using historical datasets.
The accuracy measures used in these studies (2016 onward) are summarized in
Multiple regression models are used in effort estimation applications where there is more than one independent variable [3], [24], [50]. The purpose is to obtain the best-fit line that minimizes the regression model's sum of squared residuals [75]. The regression model is a linear equation between a dependent variable and a set of p independent variables $X_1, X_2, \dots, X_p$, as follows:
$$ y_i = \alpha_0 + \alpha_1 X_{i1} + \alpha_2 X_{i2} + \dots + \alpha_p X_{ip} + \varepsilon_i, \quad i = 1, \dots, m \tag{5} $$
where $y_i$ is the dependent variable, $X_{i1}, \dots, X_{ip}$ are the independent variables, $\alpha_0$ is the intercept parameter, and $\alpha_1, \dots, \alpha_p$ are the regression coefficients. These parameters are unknown constants that must be estimated from the dataset, and $\varepsilon_i$ are the error residuals. Equation (5) can be rewritten as follows:
$$ y = X\alpha + \varepsilon $$
where vector $y$ and vector $\varepsilon$ are column vectors of length m, vector $\alpha$ is a column vector of length p + 1, and matrix $X$ is an m by p + 1 matrix. Using LSR, vector $\alpha$ is calculated as follows:
$$ \hat{\alpha} = (X^{T} X)^{-1} X^{T} y $$
Polynomial regression is a multiple regression in which the relationship between the dependent variable and p independent variables is illustrated as a polynomial of degree n.
Based on the polynomial equation, a model can obtain a minimum error or minimum cost function. The model gives the best approximation of the relationship between the dependent and independent variables [55].
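The least-squares estimate described above can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation: it simply builds the normal equations $(X^{T}X)\alpha = X^{T}y$ with an added intercept column and solves them by Gauss-Jordan elimination. The function name `fit_least_squares` and the tolerance used to detect a singular system are assumptions made for this illustration.

```python
def fit_least_squares(rows, targets):
    """Return [alpha_0, alpha_1, ..., alpha_p] minimising the sum of squared residuals.

    rows    -- list of m observations, each a list of p predictor values
    targets -- list of m observed values of the dependent variable
    """
    # Prepend a column of ones so that alpha_0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Augmented matrix [X^T X | X^T y] of the normal equations.
    A = [
        [sum(x[r] * x[c] for x in X) for c in range(k)]
        + [sum(x[r] * y for x, y in zip(X, targets))]
        for r in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("X^T X is singular: predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]
```

For data generated exactly by a linear rule, the function recovers the generating coefficients up to rounding. With real project data a numerically more robust solver (for example a QR-based least-squares routine) would normally be preferred over explicit normal equations.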

B. USE CASE POINTS
The original UCP method [19] is based on assigning weights to clustered actors and use cases (complexity weights). The elements of the UCP are shown in FIGURE 1.
The actor and use case employ three cluster classes (simple, average, and complex), as shown in TABLES 5 and 6. The weighted sums of the actors and of the use cases give the Unadjusted Actor Weight (UAW) and the Unadjusted Use Case Weight (UUCW), which are calculated by using (10) and (11), respectively.
$$ UAW = \sum_{i=1}^{3} \alpha t_i \times w_i \tag{10} $$
$$ UUCW = \sum_{j=1}^{3} uc_j \times w_j \tag{11} $$
where $\alpha t_i$ is the number of actors of actor type i, $w_i$ is the complexity weight of actor type i, $uc_j$ is the number of use cases of use case type j, and $w_j$ is the complexity weight of use case type j. Correction factors, i.e., TCFs and ECFs, are used to describe the experience level of the software development team. The technical and environmental factors are shown in TABLES 7 and 8. The technical factors are calculated using (12), and the environmental factors are calculated using (13), as follows:
$$ TCF = 0.6 + 0.01 \times \sum_{i=1}^{13} T_i \times Wt_i \tag{12} $$
$$ ECF = 1.4 - 0.03 \times \sum_{j=1}^{8} E_j \times We_j \tag{13} $$
where $T_i$ is the value of TCF i, $Wt_i$ is the complexity weight of technical factor i, $E_j$ is the value of ECF j, and $We_j$ is the complexity weight of environmental factor j. The UCP is calculated using (14) as follows:
$$ UCP = (UAW + UUCW) \times TCF \times ECF \tag{14} $$
For SDEE, Karner suggested a factor of 20 man-hours per UCP to measure work effort. This is presented in (15).
$$ Effort = UCP \times 20 \tag{15} $$
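The UCP calculation described in this subsection can be traced end to end with a small Python sketch. Since TABLES 5-8 are not reproduced in this text, the weight tables below are assumptions: they use the values commonly cited for Karner's original method (actor weights 1/2/3, use case weights 5/10/15, thirteen technical and eight environmental factor weights). The function name `ucp_effort` and its dictionary-based interface are likewise hypothetical.

```python
# Assumed weights (commonly cited for Karner's UCP); replace with TABLES 5-8 if they differ.
ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}
TCF_WEIGHTS = [2, 1, 1, 1, 1, 0.5, 0.5, 2, 1, 1, 1, 1, 1]  # T1..T13
ECF_WEIGHTS = [1.5, 0.5, 1, 0.5, 1, 2, -1, -1]             # E1..E8


def ucp_effort(actors, use_cases, t_values, e_values, hours_per_ucp=20):
    """Compute UAW, UUCW, TCF, ECF, UCP and effort for one project.

    actors, use_cases  -- dicts mapping 'simple'/'average'/'complex' to counts
    t_values, e_values -- factor ratings in [0, 5], one per technical/environmental factor
    """
    if len(t_values) != len(TCF_WEIGHTS) or len(e_values) != len(ECF_WEIGHTS):
        raise ValueError("expected 13 technical and 8 environmental ratings")

    uaw = sum(ACTOR_WEIGHTS[kind] * count for kind, count in actors.items())
    uucw = sum(USE_CASE_WEIGHTS[kind] * count for kind, count in use_cases.items())
    tcf = 0.6 + 0.01 * sum(t * w for t, w in zip(t_values, TCF_WEIGHTS))
    ecf = 1.4 - 0.03 * sum(e * w for e, w in zip(e_values, ECF_WEIGHTS))
    ucp = (uaw + uucw) * tcf * ecf

    return {
        "UAW": uaw,
        "UUCW": uucw,
        "TCF": tcf,
        "ECF": ecf,
        "UCP": ucp,
        "effort_hours": ucp * hours_per_ucp,  # 20 man-hours per UCP by default
    }
```

As a worked case under these assumed weights, a project with 2 simple, 1 average and 1 complex actor, 3 simple, 2 average and 1 complex use case, and every factor rated 3 gives UAW = 7, UUCW = 50, TCF = 1.02, ECF = 0.995, and therefore UCP = 57 × 1.02 × 0.995 ≈ 57.85.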

IV. THE PROPOSED METHOD
The ExOCF method can be divided into two phases. The first phase (Model Selection Phase) focuses on determining which of the technical and environmental complexity factors significantly affect the accuracy of the UCP based on the feature selection model. Then, two new regression formulas are created to calculate the selected factors through MLR models. The second phase (Fine-Tuning Phase) is conducted to optimize the OCF element obtained from phase 1. A detailed illustration of the ExOCF method is shown in FIGURE 2.

A. MODEL SELECTION PHASE
The Least Absolute Shrinkage and Selection Operator (LASSO) regression model [57], [58] is used to determine the factors selected in the regression analysis.
The LASSO estimate, denoted $\hat{\beta}(\lambda)$, is determined as follows:
where $\lambda \geq 0$ is the LASSO parameter that controls the strength of the penalty. The LASSO parameter $\lambda$ is determined by the Leave One Out Cross-Validation (LOO-CV) method [76], [77]. This parameter is chosen to give the lowest possible prediction errors and a lack of bias towards the correction factors of the samples in the training set. The LASSO parameter relates directly to the number of selected correction factors via the number of nonzero $\beta$'s. The number of nonzero $\beta$ values can be changed by modifying the model parameter shown as t in (16).
The LASSO-based n selected technical factors are named LaTF. A LASSO-technical factor (LaTF) can be described as follows:
where $LaT_i$ is a technical factor that takes values from the interval [0, 5]. A value of "0" means that the technical complexity factor is irrelevant, while a value of "5" means that it is essential. $WLt_i$ is the weight of technical factor i. $\alpha_0$, $\alpha_i$ are regression coefficient parameters that are obtained from the MLR model.
The LASSO-based m selected environmental factors are named LaEF. A LASSO-environmental factor (LaEF) can be determined as follows:
where $LaE_j$ is an environmental factor that corresponds to the environmental factors. $WLe_j$ is the weight of environmental factor j. $\alpha_0$, $\alpha_i$ are regression coefficient parameters that are obtained from the MLR model.
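To make the selection step more concrete, the following Python sketch shows one standard way to fit a LASSO model (cyclic coordinate descent with soft-thresholding) and to pick $\lambda$ by leave-one-out cross-validation. It is an illustration under stated assumptions, not the authors' implementation: the objective is assumed to be $\frac{1}{2n}\sum_i (y_i - \beta_0 - x_i^{T}\beta)^2 + \lambda \sum_j |\beta_j|$, whose scaling may differ from (16), and the names `lasso_fit`, `choose_lambda_loocv` and `selected_factors`, the fixed number of sweeps, and the candidate grid are all hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, sweeps=200):
    """Cyclic coordinate descent for the assumed objective; returns (intercept, beta)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred predictors
    resid = [v - y_mean for v in y]                             # residuals for beta = 0
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if z[j] == 0.0:  # constant column carries no information
                continue
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, row))


def choose_lambda_loocv(X, y, grid):
    """Return the lambda in grid with the smallest leave-one-out squared error."""
    best_lam, best_err = None, None
    for lam in grid:
        err = 0.0
        for i in range(len(X)):
            model = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
            err += (y[i] - predict(model, X[i])) ** 2
        if best_err is None or err < best_err:
            best_lam, best_err = lam, err
    return best_lam


def selected_factors(beta):
    """Indices of the factors kept by LASSO, i.e. those with a nonzero coefficient."""
    return [j for j, b in enumerate(beta) if b != 0.0]
```

In this setting each column of `X` would hold the ratings of one correction factor across the historical projects, and the indices returned by `selected_factors` would identify the factors retained for LaTF or LaEF. Larger values of `lam` drive more coefficients to exactly zero, which is the link between the LASSO parameter and the number of selected factors noted above.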

B. THE FINE-TUNING PHASE
In this phase, the effort estimation model is built using MLR as follows:
where $\gamma_1$, $\gamma_2$ are obtained in two steps. First, the historical data points $(P_1, \dots, P_n)$ are collected. The UAW, UUCW, LaTF, and LaEF elements for each project are identified. The result of this step is the collection of values $(x_{i1}, x_{i2}, y_i)$, $i = 1 \dots n$, where $y_i$ is the actual size (Real_P20 value) of the software project from a historical dataset.
The LSR model is then used to obtain the regression coefficients $\gamma_1$, $\gamma_2$ as follows:
Because $y_i$ is a real value from a historical dataset, the values of the regression coefficients $\gamma_1$, $\gamma_2$ can vary between datasets. This means that when the historical dataset changes, this phase needs to be performed again to obtain new regression coefficient values. The second step of this phase calculates the UAW, UUCW, LaTF, and LaEF of the current project, and (21) is applied with the values $\gamma_1$, $\gamma_2$ from step 1 to estimate the UCP.
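The two steps of this phase (fit the coefficients on historical projects, then apply them to a new project) can be sketched in Python as follows. The exact form of (21) is not reproduced in this text, so the sketch assumes a two-term linear model without intercept, $y \approx \gamma_1 x_1 + \gamma_2 x_2$, where $x_1$ and $x_2$ stand for whatever two predictors are derived from the UAW, UUCW, LaTF, and LaEF elements. The function names `fit_two_term` and `estimate_size` are hypothetical.

```python
def fit_two_term(history):
    """Least-squares fit of y ~ gamma1 * x1 + gamma2 * x2 (assumed form, no intercept).

    history -- iterable of (x1, x2, y) tuples, one per historical project,
               where y is the actual size (Real_P20 value).
    """
    s11 = s12 = s22 = s1y = s2y = 0.0
    for x1, x2, y in history:
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        s1y += x1 * y
        s2y += x2 * y

    # Solve the 2x2 normal equations by Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # assumed tolerance
        raise ValueError("predictors are collinear; coefficients are not identifiable")
    gamma1 = (s1y * s22 - s2y * s12) / det
    gamma2 = (s2y * s11 - s1y * s12) / det
    return gamma1, gamma2


def estimate_size(gammas, x1, x2):
    """Step 2: apply the fitted coefficients to the predictors of the current project."""
    gamma1, gamma2 = gammas
    return gamma1 * x1 + gamma2 * x2
```

Because the coefficients depend on the historical data, `fit_two_term` has to be rerun whenever the historical dataset changes, which mirrors the remark above about repeating this phase.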

V. RESEARCH METHODOLOGY
In this section, we describe the empirical analysis of our research methodology. The section begins with a description of the datasets for the experiment, including the statistical characteristics in the four datasets and data normalization. The next part is the process of setting up the experiment to evaluate the software effort estimation methods.

A. DATASET DESCRIPTION
The proposed method was evaluated using a dataset that the authors collected and used [9]. The dataset is based on three data donations (D1, D2, and D3). The projects from each data donor differ in size (measured by the UCP). The data donors work in different sectors: government, health, and business. The projects were developed in the Java and C# programming languages. After analysing the dataset, we noticed that the Real_P20 of some projects varied extensively. FIGURE 3 presents a boxplot of Real_P20 in each dataset. Real_P20 is the real effort in man-hours divided by the productivity factor (PF, man-hours per 1 UCP).
We observed a substantial difference in Real_P20 between the data donors. The distribution of Real_P20 is observed according to the data donors. In particular, data donor D1 had the largest projects, while data donor D3 had the smallest projects. The significant difference in Real_P20 makes the dataset heterogeneous. Therefore, applying the same model to all projects was not effective. We grouped projects according to data donors, making the datasets more homogeneous. Datasets (D1, D2, and D3) were provided by data vendors. Projects in each dataset may be understood as being local data for each of the companies.
In addition, we also evaluate the effect of mixing projects with different data providers, and a fourth dataset (D4) was also added, which combined all three datasets.
Statistical characteristics of the Real_P20 of the four datasets are described in TABLE 9 and FIGURES 4-7. Median person-hours represent the workforce value of the project development period, which was applied from the project's start date to acceptance date. The median Real_P20 shows the same value divided by PF = 20. It assumes that 20 person-hours correspond to 1 UCP [19]. This transformation was made because data donors did not provide estimations using the UCP. The minimum Real_P20 and maximum Real_P20 describe the smallest and largest project sizes, respectively. The Real_P20 range describes the difference between the minimum Real_P20 and maximum Real_P20. The last column (n) indicates the number of projects in the dataset.

B. DATA NORMALIZATION
All variables in the four datasets were standardized using Min-Max normalization [78], [79] to ensure that they had the same degree of influence. Variables usually have different ranges, which may have a negative impact on the learning step. Using (26) and (27), the variables are rescaled from $(x_{min}, x_{max})$ to $(New_{min}, New_{max})$.
$$ x' = \frac{x - x_{min}}{x_{max} - x_{min}} \, (New_{max} - New_{min}) + New_{min} \tag{26} $$
$$ x_{max} = \max_{1 \le j \le N} x_j, \quad x_{min} = \min_{1 \le j \le N} x_j \tag{27} $$

C. EVALUATION CRITERIA
In SDEE, different criteria are needed to evaluate the estimation accuracy of methods. The Mean Magnitude of Relative Error (MMRE) and the Mean Magnitude of Error Relative to the estimate (MMER) [15], [48], [80] are the most commonly used accuracy metrics. However, these metrics may be biased [81], [82]. In their systematic review, Azzeh et al. [23] recommend discarding biased measures such as MMRE and MMER. Therefore, to evaluate the proposed estimation method, we use alternative criteria that produce an unbiased and symmetric distribution, as follows:
• Mean Absolute Error (MAE)
• Root Mean Square Error (RMSE)
• Mean Balance Relative Error (MBRE)
• Mean Inverted Balance Relative Error (MIBRE)
• Median of Magnitude of Relative Error (MdMRE)
where n is the number of observations, $y_i$ is the real known value, $\hat{y}_i$ is the predicted value, and $\varepsilon$ is the prediction error value.
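As a compact illustration of these criteria, the Python sketch below computes them for a list of actual and predicted values, together with the SSE and PRED(x) measures that the paper also uses. The formulas are assumptions based on the definitions usually found in the effort estimation literature (absolute error divided by the smaller value for MBRE, by the larger value for MIBRE, and by the actual value for MRE); the paper's equations (28)-(34) may differ in detail. The function name `accuracy_report` is hypothetical, and all values are assumed to be strictly positive.

```python
import math
from statistics import median


def accuracy_report(actual, predicted, pred_level=0.25):
    """Return the accuracy measures for paired lists of actual and predicted values."""
    n = len(actual)
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mre = [e / a for e, a in zip(abs_err, actual)]  # magnitude of relative error
    sse = sum(e * e for e in abs_err)

    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sse / n),
        # Balanced relative error divides by the smaller of actual and predicted.
        "MBRE": sum(e / min(a, p) for e, a, p in zip(abs_err, actual, predicted)) / n,
        # Inverted balanced relative error divides by the larger of the two.
        "MIBRE": sum(e / max(a, p) for e, a, p in zip(abs_err, actual, predicted)) / n,
        "MdMRE": median(mre),
        "SSE": sse,
        # Share of projects whose relative error is within pred_level.
        "PRED": sum(1 for m in mre if m <= pred_level) / n,
    }
```

Lower values are better for every measure except PRED, where a higher share of projects within the tolerance indicates a more accurate method.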
On the other hand, we also used two measures to evaluate the accuracy of the estimation models, such as (33) Sum of Squares Errors (SSE) and (34) Percentage of Prediction within x% (PRED(x)). In particular, SSE is an important metric to estimate the variation in modelling error [75]. It is used because of its ability to describe errors for selected datasets. Second, PRED(x) is less biased towards underestimation and generally determines the same best method as the Standardized Accuracy (SA). According to the empirical evaluation of Idri et al. [83], an SDEE method that has high

D. EXPERIMENTAL SETUP
In this section, we present the experimental setup used to evaluate the effectiveness of the software effort estimation methods (see FIGURE 8). In step 1, the following method is implemented for the experiments:
• ExOCF (proposed in Section 4)
The results are compared with the following estimation methods:
• OCF [38]
• UCP [19]
• AOM [52]
To evaluate the estimation accuracy, we experimented with five different runs (5-fold cross-validation). The comparison of the effort estimation accuracy of the methods is then based on the average results of these five runs.
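The 5-fold procedure can be sketched in Python as below. This is a minimal illustration, not the authors' experimental code: the shuffling with a fixed seed is an assumption chosen so that every compared method sees identical training and testing folds, and the names `k_fold_indices` and `cross_validate`, as well as the convention that the last element of each project record is the actual value, are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n projects."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of projects")
    order = list(range(n))
    random.Random(seed).shuffle(order)      # fixed seed: same folds for every method
    folds = [order[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def cross_validate(projects, fit, predict, score, k=5, seed=0):
    """Average a score over k folds.

    projects -- list of records whose last element is the actual value
    fit      -- callable(training_records) -> model
    predict  -- callable(model, record) -> predicted value
    score    -- callable(actual_list, predicted_list) -> float
    """
    scores = []
    for train, test in k_fold_indices(len(projects), k, seed):
        model = fit([projects[i] for i in train])
        actual = [projects[i][-1] for i in test]
        predicted = [predict(model, projects[i]) for i in test]
        scores.append(score(actual, predicted))
    return sum(scores) / len(scores)
```

Calling `cross_validate` once per method with the same `seed` keeps the comparison fair, since any difference in the averaged score then comes from the method rather than from the split.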

VI. RESULTS AND DISCUSSIONS
This section presents the empirical results obtained from the analysis of the correction factors that significantly affect the
We also performed paired samples t-test comparisons [84]-[87] to confirm the evaluation conclusions by investigating whether the ExOCF method is significantly different from the other methods. Three notations, the last of which is ≈, are used to express the empirical conclusion based on the p-value; they indicate the statistical superiority, inferiority, and similarity of the ExOCF method compared to each of the other methods, respectively. When the p-value ≤ 0.05, we can conclude that the difference in estimation accuracy between the ExOCF method and the other method is significant. In this work, we use the SSE, PRED (0.25), MdMRE, MAE, MBRE, MIBRE, and RMSE results as the sample test set for each method.
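The paired comparison can be illustrated with a dependency-free Python sketch of the paired samples t-test. It is an assumption-laden illustration rather than the authors' procedure: the two-sided p-value is obtained by integrating the Student t density numerically with Simpson's rule, and the function name `paired_t_test` and the number of integration steps are hypothetical. In practice a statistics library would normally be used instead.

```python
import math


def _t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(a, b, steps=2000):
    """Return (t, df, two_sided_p) for paired samples a and b of equal length.

    steps must be even because Simpson's rule is used for the integration.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean / math.sqrt(var / n)
    df = n - 1

    # Simpson's rule for the area under the density between 0 and |t|.
    upper = abs(t)
    h = upper / steps
    area = _t_pdf(0.0, df) + _t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    area *= h / 3

    p = max(0.0, 1.0 - 2.0 * area)  # two-sided p-value
    return t, df, p
```

Here `a` and `b` would hold the paired results of two methods, for example the seven accuracy measures named above, and the null hypothesis H0 is rejected at the 5% level when the returned p-value is at most 0.05.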

B. RQ1
Is it possible to modify the OCF method so that its estimation accuracy improves?
The empirical validation results for the two methods over the four datasets are given in TABLES 12-15. The results show that the proposed ExOCF method produces the best SSE, MdMRE, MAE, MBRE, MIBRE, RMSE, and PRED (0.25) values, which indicates that it is possible to modify the OCF method to improve its estimation accuracy.
Specifically, the average SSE results of the ExOCF method are 3.02 times, 1.3 times, 3.1 times, and 1.55 times lower than those of the OCF method on datasets D1, D2, D3, and D4, respectively (see FIGURE 25). Similarly, compared to the OCF method, the ExOCF method increases the average PRED (0.25) values by 2.18 times, 1.64 times, 2.4 times, and 1.33 times on datasets D1, D2, D3, and D4, respectively (see FIGURE 26). The average MdMRE results of the ExOCF method are 2.08 times, 1.17 times, 1.92 times, and 1.28 times lower than those of the OCF method (see FIGURE 27). The average MAE results of the ExOCF method are 2.08 times, 1.17 times, 1.92 times, and 1.28 times lower than those of the OCF method (see FIGURE 28). The average MBRE results of the ExOCF method are 2.72 times, 1.13 times, 1.71 times, and 1.31 times lower than those of the OCF method (see FIGURE 29).
The average MIBRE results of the ExOCF method are 2.22 times, 1.1 times, 1.59 times, and 1.25 times lower than those of the OCF method (see FIGURE 30). Finally, the average RMSE results of the ExOCF method are 1.76 times, 1.14 times, 1.78 times, and 1.25 times lower than those of the OCF method (see FIGURE 31).
Overall, these results indicate that applying the MLR model to the OCF variables is effective.
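The "times lower" and "times higher" comparisons above are ratios of average criterion values. The following sketch shows how the seven criteria and such a ratio could be computed. It assumes the commonly used definitions of the criteria (the paper's own equations take precedence where they differ), reports PRED(x) as a fraction rather than a percentage, and uses made-up efforts; `baseline_pred` and `proposed_pred` are hypothetical names.

```python
import math
from statistics import mean, median

def accuracy_metrics(actual, predicted, x=0.25):
    """Compute the seven criteria for paired lists of positive efforts."""
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mre = [e / a for e, a in zip(abs_err, actual)]  # magnitude of relative error
    sse = sum(e * e for e in abs_err)
    triples = list(zip(abs_err, actual, predicted))
    return {
        "SSE": sse,
        "RMSE": math.sqrt(sse / len(actual)),
        "MAE": mean(abs_err),
        "MdMRE": median(mre),
        # Balanced relative error: divide by the smaller of actual and predicted.
        "MBRE": mean(e / min(a, p) for e, a, p in triples),
        # Inverted balanced relative error: divide by the larger of the two.
        "MIBRE": mean(e / max(a, p) for e, a, p in triples),
        # Share of projects whose MRE does not exceed x.
        "PRED": sum(1 for m in mre if m <= x) / len(mre),
    }

def times_lower(baseline_value, proposed_value):
    """How many times lower the proposed value is than the baseline value."""
    return baseline_value / proposed_value

# Illustrative (made-up) efforts in person-hours.
actual = [1000, 2000, 1500, 3000]
baseline_pred = [1500, 1400, 1800, 2000]
proposed_pred = [1100, 1900, 1500, 2700]

base = accuracy_metrics(actual, baseline_pred)
new = accuracy_metrics(actual, proposed_pred)
print(round(times_lower(base["SSE"], new["SSE"]), 2))  # error criterion: lower is better
print(new["PRED"] / base["PRED"])                      # PRED: higher is better
```

For the error criteria the ratio is baseline over proposed, whereas for PRED(x) it is proposed over baseline, which is why the text reports a decrease for the former and an increase for the latter.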

C. RQ2
Does the proposed method outperform a baseline UCP method and another tested method?
We measured the accuracy improvements achieved by the proposed ExOCF method over the baseline UCP method and another tested method, the AOM method. First, we consider the SSE and PRED (0.25) results of the experimental methods (see FIGURE 32 and FIGURE 33).
Above all, we can confidently confirm that the proposed method works better than the UCP and AOM methods.

D. RQ3
Is the difference in the accuracy of the estimate using different methods statistically significant?
To answer RQ3, we examined the statistical properties of the estimates produced by the methods using paired t-test comparisons, as shown in TABLES 16-19. The tables report the average p-values for the SSE, PRED (0.25), MdMRE, MAE, MBRE, MIBRE, and RMSE over five different runs, together with the final statistical conclusions. The results confirm that the difference between the ExOCF method and the previous methods is statistically significant at the 95% confidence level. Therefore, we are inclined to accept the alternative hypothesis (H1), which is also consistent with the results presented above.

VII. THREATS TO VALIDITY
The threats to the validity of this study, particularly to internal, external, and construct validity, can be summarized as follows:

A. INTERNAL VALIDITY
There is no single best approach for determining the regularization parameter λ used to extract the selected variable set, as shown in (16), before applying LASSO regression. In practice, the tuning parameter λ, which controls the strength of the penalty, has an important effect. In particular, if λ is sufficiently large, some coefficients are shrunk to exactly zero, which reduces the dimensionality. The larger the parameter λ is, the greater the number of coefficients reduced to zero. Thus, we determined the λ value using the LOO-CV technique, selecting the value at which the R-squared reaches its highest value. This technique is used because of its deterministic property and suitability for small datasets.
The dataset summarizes data from three donors over a long time period. Independent variables were partly submitted by the data vendors. The complete process of the use case point calculation, mainly regarding the factor weights, is not known. This may influence data quality and comparability between data donors. In past publications, the datasets used were preprocessed, which may also have an impact on reliability.
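The selection of λ by LOO-CV described above can be illustrated with a small, self-contained sketch. It rests on several assumptions that are not taken from the paper: the LASSO objective is written as (1/2n)·||y − b0 − Xb||² + λ·||b||₁ and solved by coordinate descent, the predictors are centered but not scaled, the R-squared is computed from the leave-one-out predictions, and both the data and the λ grid are made up.

```python
from statistics import mean

def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    col_means = [mean(row[j] for row in X) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta

def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, row))

def loo_r2(X, y, lam):
    """R-squared computed from leave-one-out predictions."""
    preds = []
    for i in range(len(X)):
        model = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        preds.append(predict(model, X[i]))
    y_bar = mean(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, preds))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def select_lambda(X, y, grid):
    """Pick the grid value with the highest leave-one-out R-squared."""
    return max(grid, key=lambda lam: loo_r2(X, y, lam))

# Made-up data: y depends on the first two columns; the third is irrelevant.
X = [[1, 5, 2], [2, 3, 7], [3, 8, 1], [4, 1, 5],
     [5, 6, 3], [6, 2, 8], [7, 7, 4], [8, 4, 6]]
y = [13.1, 11.8, 25.15, 13.9, 27.05, 22.2, 34.85, 31.95]
grid = [0.0, 0.05, 0.1, 0.5, 1.0, 5.0]
best_lam = select_lambda(X, y, grid)
print(best_lam, lasso_fit(X, y, best_lam)[1])
```

Because leave-one-out folds are fixed by the data, repeating the selection always yields the same λ, which matches the deterministic property mentioned above. In practice the predictors are usually standardized before the penalty is applied, so that λ acts on all coefficients on a comparable scale.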

B. CONSTRUCT VALIDITY
Construct validity concerns generalizing the results. In this study, the goal of the experiments was to minimize the estimation error. The process is based on a common procedure for tuning an estimation model. Using 5-fold cross-validation on four datasets allows us to generalize the results. To avoid mono-method bias, several evaluation criteria were used. Unbiased evaluation criteria that do not have an asymmetric error distribution, namely the SSE, PRED (0.25), MAE, MdMRE, MBRE, MIBRE, and RMSE, together with paired t-tests, were used to confirm the validity of the results. Thus, we can conclude that the experimental results of this study are highly generalizable.

C. EXTERNAL VALIDITY, NAMELY, THE EXPERIMENTAL DATA
Our experiments are based on a collection of publicly available datasets, so the conclusions should be convincing. These datasets are a small part of all datasets in the real world. Therefore, the conclusions about these datasets may not be appropriate for other datasets.

VIII. CONCLUSION
In this paper, our goal was to obtain more accurate estimates by modifying our OCF method. The proposed ExOCF method is inspired by the possibility of using a standard estimation procedure to reduce the influence of human errors during the analysis of the UCM, while keeping the simplification of the original UCP principles that the OCF method offers. Specifically, we used MLR models on historical project data points to build regression models and minimize errors in the integration process or recursion. The proposed method improves the OCF method's ability to estimate software size and minimizes the prediction error. This paper analysed important research questions related to the proposed method, as mentioned in Section 1. Regarding RQ1, based on the empirical validation of both the OCF and ExOCF methods in terms of the SSE, PRED (0.25), MdMRE, MAE, MBRE, MIBRE, and RMSE, we can confirm that the ExOCF method is superior to the OCF method over four datasets. Applying the MLR model to the OCF variables in the ExOCF method has improved the estimation accuracy. For RQ2, the proposed method outperforms the UCP and AOM methods. For RQ3, to confirm the validity of the empirical results, we analysed the statistical properties based on paired t-test comparisons. It can be concluded that the proposed method is statistically significantly superior to the other methods.
In conclusion, we believe that the results can also be understood as beneficial for industrial application, as they demonstrate that the proposed method leads to more accurate estimates of software size and effort.

IX. FUTURE WORK
In this paper, we proposed a parametric software effort estimation method based on Optimizing Correction Factors and Multiple Linear Regression for use in the early stages of software development. The ExOCF method uses the weighting of technical and environmental complexity factors as defined in the original UCP. These factors reflect approximately how much productivity is affected. One direction for our future work is to calibrate the weighting values of the correction factors to reflect the latest trends in the software engineering industry and improve the accuracy of the ExOCF method. Therefore, we plan to calibrate the weights of the correction factors in the ExOCF model using an artificial neural network [40].
Another concern relates to an important aspect of deriving MLR models: the heterogeneity of the historical data. This could lead to an increase in the estimation error for SDEE. There are many solutions performed in the preprocessing step, such as outlier elimination, which is considered a solution performed in MLR-based effort estimation. However, the estimation accuracy is not significantly better because the difference in the distribution of historical data points cannot be resolved [56], [60], [88]. The use of clustering approaches is considered a solution to improve the estimation accuracy of the ExOCF method in our future work.