Multivariable case adaptation method of case-based reasoning based on multi-case clusters and multi-output support vector machine for equipment maintenance cost prediction

Case adaptation is crucial for case-based reasoning (CBR) because the solutions of old cases are not always the ideal answer for a newly encountered problem. It solves new problems by utilizing the adaptation knowledge extracted from similar problems encountered in the past. The traditional adaptation method solves a new problem on the principle of the k-nearest neighbor (k-NN) algorithm, with the adaptation model built from the k similar cases. Yet the k similar cases retrieved for a new case may be located in different case clusters when the case base is composed of multiple clusters. This article presents a new case adaptation method that combines multiple adaptation engines from different case clusters to improve adaptation accuracy. First, the input and output of the cluster-based adaptation engine are established from the old cases to distill the adaptation knowledge in each case cluster. Then, a multivariable CBR adaptation engine based on multi-output support vector regression (MSVR) is built for case adaptation. Furthermore, inspired by the fact that training samples composed of close cases provide more useful information than others and reduce the impact of outliers, this study adds a hybrid weight into MSVR, allocating high weights to the information provided by samples with high density and high similarity during the multi-dimensional regression estimation. Finally, the solution of the target case is obtained by incorporating the outputs of the different adaptation engines. The proposed method was applied to equipment maintenance cost prediction and compared with traditional statistical-based and machine-learning-based methods. Empirical comparison results indicate that the proposed adaptation method achieves the best performance by utilizing the adaptation knowledge in different clusters under a multi-case-cluster environment.


I. INTRODUCTION
Case-based reasoning (CBR) is a problem-solving paradigm that remembers previous similar situations (or cases) and reuses information and knowledge about the stored cases for dealing with new problems [1]. The situation of a new problem is always somewhat different from the historical cases in the case base, so the old solutions of similar cases cannot normally be applied directly to the new problem [2]. Consequently, a case adaptation technique is necessary for the CBR system to generate a solution that suits the new case. However, existing CBR systems are generally characterized by a sophisticated case retrieval mechanism without a well-developed case adaptation engine. This results from the fact that case adaptation needs to be guided by domain knowledge or specialists' experience, while adaptation knowledge is not always accessible and available [3]. Therefore, how to automatically acquire knowledge from subsets of the case base during the adaptation process has become an urgent problem to be solved.
How to automate case adaptation is a challenging problem for the CBR methodology. Previous studies of case adaptation adopted statistical methods, e.g., the Equal Mean (EM) [4], the Weighted Mean (WM) [5] and multiple linear regression analysis (MRA) [6], to utilize the solution values of the retrieved cases. It is not surprising that these methods lack accuracy, because the information of the similar cases is not fully utilized [7]. Recently, various machine learning (ML) methods have been applied to perform automatic, knowledge-lean adaptation, such as neural networks (NNs) [8] and support vector machines (SVMs) [9,10], as well as optimization algorithms such as the genetic algorithm (GA) [11][12][13] and particle swarm optimization (PSO) [14]. The ML-based adaptation methods reuse the information of similar cases to build a 'black box' model that manifests the adaptation knowledge, and implement automatic case adaptation by inputting the situation part of the target case into the trained model. The solution is then modified or adapted to fit the target problem.
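As a concrete illustration of the statistical adaptation family mentioned above, the Weighted Mean combines the solutions of the retrieved cases weighted by their similarity to the new problem. This is a minimal sketch (function name and data are illustrative, not from the paper):

```python
import numpy as np

def weighted_mean_adaptation(solutions, similarities):
    """Weighted Mean (WM) adaptation: combine the solution vectors of the
    k retrieved cases, weighting each by its similarity to the new problem."""
    w = np.asarray(similarities, dtype=float)
    w = w / w.sum()                      # normalize similarity weights
    return w @ np.asarray(solutions)     # (k,) @ (k, n_outputs) -> (n_outputs,)

# three retrieved cases with 2-dimensional solution parts
sols = [[10.0, 1.0], [12.0, 2.0], [20.0, 3.0]]
sims = [0.9, 0.8, 0.3]
adapted = weighted_mean_adaptation(sols, sims)
```

As the surrounding text notes, such a combination ignores most of the structure in the similar cases, which is what motivates the ML-based adaptation engines discussed next.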
Among the above-mentioned ML methods, SVM has been proved superior to classical NNs. The reason is that SVM implements the structural risk minimization principle and tries to minimize an upper bound on the generalization error instead of merely minimizing the misclassification error or deviation from the correct solution on the training data [3]; compared with NNs, it overcomes the shortcoming of easily falling into a local optimum. But the standard formulation of SVM can only be used as a univariate modeling technique for CBR adaptation due to its inherent single-output structure, which limits its application in multi-solution-value adaptation. Therefore, the multi-output SVM (MSVM) [15] has been introduced into the multivariable adaptation model, and some studies have employed MSVM for case adaptation with various degrees of success [3,10]. MSVM adopts an L2-norm-based insensitive loss function vector to consider the errors of the different output channels simultaneously; hence the traditional single-output SVM regression model is extended to a multi-output structure that achieves an optimal network structure for multi-output regression tasks. Furthermore, MSVM eventually yields better generalization performance for multi-input multi-output datasets with fewer network parameters.
When applying an ML-based adaptation method, adaptation knowledge acquisition is performed on the similar case set obtained from the case retrieval process. However, the similar cases retrieved for a target case (new problem) may come from different clusters in the case base. On the one hand, adaptation rules or knowledge should be further excavated in the different case clusters to which the similar cases belong, and case adaptation in the diverse clusters should be operated separately, because the adaptation knowledge in different case clusters may be disparate or even divergent. On the other hand, when applying the knowledge acquired from the diverse clusters to implement automatic case adaptation, the weights of the adaptation information from different clusters differ, on account of the different numbers of similar cases in the clusters. Therefore, the combination and employment of the adaptation knowledge need to be further explored by utilizing the implicit knowledge in the case base, so that an improvement of the adaptation accuracy can be achieved.
To sum up, focusing on the multivariable adaptation problem in CBR adaptation, this paper proposes a novel case adaptation strategy based on multiple related clusters. The case clusters related to the target case can be utilized for adaptation knowledge discovery and rule extraction. Then, the training and testing sets of the MSVR-based case adaptation models are established using case pairs in the diverse case clusters. Finally, the adaptation engines based on the different related clusters are coordinated for the target case adaptation. The related works are discussed in the following sections, and the paper is structured as follows. Section II gives a brief introduction to the case retrieval and reuse phases of CBR and proposes a retrieval strategy based on multiple related clusters. Section III describes the framework of case retrieval and adaptation and the corresponding algorithm; in this section, the training samples of the case adaptation model are constructed from the retrieved cases, and an improved MSVR is introduced. Section IV introduces the simulation experiments, including the background, experimental conditions and obtained results. Finally, Section V presents the conclusion and further work.

A. CASE RETRIEVAL AND REUSE IN CBR
CBR embeds knowledge and experience in cases, then retrieves and reuses similar ones for reasoning [16]. Generally, before the case retrieval process, a traditional clustering algorithm is introduced to divide the case base into small clusters. The similarity between the target case and each clustering center is calculated, and the target case is assigned to the cluster with the largest similarity. In the case retrieval stage, similar cases are searched within that most similar cluster. In the case reuse process, the retrieved similar cases are chosen as training samples to build an adaptation model for case reuse and adaptation knowledge acquisition, and the solution for the target case is derived through the constructed adaptation model. The flow diagram of the traditional CBR is shown in Figure 1.
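The traditional flow above (pre-cluster the case base, assign the target to the most similar cluster, then retrieve only inside that cluster) can be sketched as follows; the function name, the use of Euclidean distance, and the assumption that cluster labels and centers have been pre-computed are all illustrative:

```python
import numpy as np

def retrieve_in_best_cluster(cases, labels, centers, target, k):
    """Traditional CBR retrieval sketch: the case base is pre-partitioned
    into clusters (labels, centers); the target case is assigned to the
    closest (most similar) cluster center, and the k-NN search then runs
    inside that single cluster only."""
    cid = int(np.argmin(np.linalg.norm(centers - target, axis=1)))
    idx = np.where(labels == cid)[0]                  # cases in chosen cluster
    d = np.linalg.norm(cases[idx] - target, axis=1)   # distances inside cluster
    nearest = idx[np.argsort(d)[:k]]                  # k most similar cases
    return cid, nearest
```

The limitation discussed later in the paper is visible here: cases just outside the chosen cluster are never considered, even if they are close to the target.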
Case retrieval and case reuse (or adaptation) are the parts that most affect the reasoning results, so improvements usually focus on these two phases. Case retrieval generally falls into one of three categories: k-nearest neighbor (k-NN), inductive learning, and knowledge-guided retrieval. Among these, the k-NN search method is most frequently used for selecting similar past cases in a CBR system; it retrieves k cases based on the weighted sum of case feature differences between the problem case and the cases in memory [17]. Several learning-based adaptation methods have been employed in case reuse, such as NNs, SVR and the genetic algorithm. Conventional adaptation methods are influenced not only by their own learning mechanism, but also by the value of k. If k is too high, the CBR will retrieve too many unrelated cases, which may lead to poor adaptation results; by contrast, if k is too small, the adaptation method may lack sufficient reference cases to make a correct decision [18]. The adaptation methods need to remain usable with small training samples because the value of k is often small. SVM has advantages in modeling small-sample data: its solution is a global optimum, and overfitting is unlikely to occur. However, the performance of the learning machine is still closely related to the selection of similar case samples; different similarity measurement metrics may lead to diverse results, and the accuracy and robustness of the CBR system are thus constrained.

B. CASE RETRIEVAL BASED ON RELATED CASE CLUSTERS
The process of case retrieval always operates within a certain cluster of the case base, because the target has been classified to that cluster in advance. However, in the high-dimensional feature space formed by the condition attributes, different case clusters are distributed compactly due to the "curse of dimensionality". Because of the different techniques for calculating attribute weights, the target case is likely to be categorized into different clusters, which may lead to opposite results, especially when there are many case clusters in the case base. Naturally, we can start with case retrieval rather than with the classification of the target case, and observe which clusters the retrieved similar cases belong to. The clusters of these similar cases are called related case clusters (RCCs) in this paper. As a result, two situations arise in practice: (1) a single RCC; (2) multiple RCCs. In the first scenario, shown in Figure 2, all similar cases are in the same cluster. Therefore, the sampling for case adaptation can be operated within that one cluster; the similar cases retrieved for the target case are denoted direct similar cases (DSCs), and the remaining cases in the same cluster are indirect similar cases (ISCs). In the second scenario, shown in Figure 3, the similar cases are affiliated with different clusters in the case base, so case adaptation should not be confined to a single cluster. The mining of case adaptation knowledge is then based on multiple RCCs: it is necessary to integrate and combine the adaptation knowledge extracted from the multiple case clusters to solve the new problem and obtain the final solution. Furthermore, under the multi-RCC scenario the number of DSCs differs between case clusters, hence the case adaptation knowledge based on the different RCCs should not be applied to the new problem equivalently.
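The RCC idea above (retrieve first, then see which clusters the retrieved cases fall into) can be sketched as a small helper; the function name, the Euclidean distance, and the returned data layout are illustrative assumptions:

```python
import numpy as np

def find_rccs(cases, labels, target, k):
    """Identify related case clusters (RCCs): retrieve the k nearest cases
    over the WHOLE case base, then group them by the cluster they belong to.
    DSCs are the retrieved cases in a cluster; ISCs are that cluster's
    remaining cases."""
    d = np.linalg.norm(cases - target, axis=1)
    dsc_idx = np.argsort(d)[:k]                       # direct similar cases
    rccs = {}
    for cid in np.unique(labels[dsc_idx]):            # each related cluster
        members = np.where(labels == cid)[0]
        dscs = [i for i in dsc_idx if labels[i] == cid]
        iscs = [i for i in members if i not in dsc_idx]
        rccs[int(cid)] = (dscs, iscs)
    return rccs
```

With k = 3 and three clusters, a target near cluster 0 may still pull one DSC from cluster 1; that second cluster then also becomes an RCC and contributes its own adaptation engine, which is the multi-RCC scenario of Figure 3.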
The adaptation modules generated in the RCCs need to be embedded into the final case adaptation model, so that the adaptation knowledge in each RCC can be applied in due measure.
Inspired by this, this paper introduces a modularization technique into CBR adaptation: the adaptation engine is applied to multiple case clusters according to the results of case retrieval. To improve the capability and efficiency of the adaptation engine, the training samples of the adaptation model are generated from pairs of DSCs and the corresponding ISCs. Then, an MSVM-based adaptation engine for each RCC can be constructed. Eventually, the adaptation model assembled from the individual adaptation engines is used to derive the target case solution.

A. FRAMEWORK OF CASE ADAPTATION
The framework of case adaptation is shown in Figure 4, which consists of case retrieval using the k-NN algorithm, generation of the adaptation sample sets, and adaptation engine construction by MSVM. The k-NN algorithm is used to search for the DSCs of the target case in the case base, and further to obtain the ISCs from the different related case clusters. After finding the DSCs and ISCs within each RCC, the information of these similar cases is used to design the corresponding case adaptation sample sets. Each RCC has its own adaptation engine constructed by MSVM, whose output is the modified solution values. The information of the target case is entered into the multi-adaptation engine to output suitable solution values.

Figure 4. Framework of case adaptation: historical case base; Step I, case retrieval using the k-NN algorithm; Step II, generation of the adaptation sample sets (direct and indirect similar case sets); Step III, adaptation engine construction by MSVM; Step IV, combination of the multi-adaptation engines to produce the case solution for the target case.
B. CASE RETRIEVAL BASED ON K-NEAREST NEIGHBOR METHOD
For convenience of expression, the case adaptation problem is described using the knowledge representation of rough sets. The information system of CBR is denoted as $S=(U, C\cup D, V, f)$, where $U$ is the case base, $C$ is the condition attribute set, $D$ is the solution (decision) attribute set, $f(x_N, c_j)$ stands for the feature value of the target case $x_N$ on condition attribute $c_j$, and $f(x_N, d_j)$ is its feature value on solution attribute $d_j$. Thus, the historical case base is expressed by $U=\{x_1, x_2, \ldots, x_n\}$. There is a similarity-based mapping relationship between the search results in the case base and the retrieval of the target case, denoted as $F: x_N \to U$. The feature value difference between $x_k$ and $x_N$ on condition attribute $c_j$ can be expressed as $\delta_j(x_N, x_k) = d\big(f(x_N, c_j), f(x_k, c_j)\big)$, where $d(\cdot,\cdot)$ is the distance measure on attribute values. The case similarity between $x_N$ and $x_k$ on the condition attribute set is defined as
$$sim(x_N, x_k) = 1 - \langle \boldsymbol{\omega}, \boldsymbol{\delta}(x_N, x_k) \rangle,$$
where $\boldsymbol{\omega}$ denotes the vector of condition attribute weights, calculated by the Entropy Weight Method (EWM) [19], and $\langle \cdot,\cdot \rangle$ refers to the inner product. According to the case similarity, the DSC set retrieved by the k-NN algorithm is obtained. Denote the RCC to which case $x_i$ belongs as $U(x_i)$; the ISCs are the remaining cases of that cluster. It should be noted that different DSCs may belong to the same RCC; for each related case cluster $U(x_N)$, both the DSCs and the ISCs are obtained.
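A minimal sketch of the entropy-weighted similarity just defined. The EWM formula follows the standard entropy-weight construction; the range-normalization of the attribute differences and the function names are assumptions for illustration:

```python
import numpy as np

def entropy_weights(X):
    """Entropy Weight Method (EWM): attributes whose values vary more
    across the case base carry more information and get larger weights."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = X / X.sum(axis=0)                          # column-wise proportions
    P = np.where(P > 0, P, 1e-12)                  # avoid log(0)
    e = -(P * np.log(P)).sum(axis=0) / np.log(n)   # entropy per attribute
    w = (1 - e) / (1 - e).sum()
    return w

def weighted_similarity(x_new, x_old, w, ranges):
    """Similarity as 1 minus the weighted sum of range-normalized
    attribute differences, i.e. 1 - <w, delta>."""
    delta = np.abs(np.asarray(x_new) - np.asarray(x_old)) / ranges
    return 1.0 - float(w @ delta)
```

A constant attribute gets zero entropy weight (it carries no discriminating information), while identical cases get similarity 1, matching the definition above.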

C. SAMPLE GENERATION OF ADAPTATION ENGINE BASED ON RELATED CASE CLUSTER
In order to express the case adaptation knowledge in an RCC effectively, the information of the similar cases should be reconstructed according to the assumption in the literature [20]: the differences that occur between cases in the case base are representative of the differences that will occur between future problems and the case base. The DSCs and ISCs in the same RCC are used to build a sample pattern with the same inputs and outputs. Denote the target case as $x_N$, as shown in Figure 5. For a DSC $x_h$ and an ISC $x_q$ in the same case cluster, the input of a training sample can be described as $\big(sim(x_h, x_q), D_q\big)$, i.e., the pair similarity together with the solution values $D_q$ of the ISC, and the output is $D_h$, the solution values of the DSC. Therefore, the formal equation of the case adaptation model can be described as $D_h = g\big(sim(x_h, x_q), D_q\big)$. Obviously, when there are $H$ DSCs in the RCC and $Q$ ISCs in the cluster, a total of $H \times Q$ adaptation samples can be used to train the adaptation engine. Such a quantity of adaptation samples not only guarantees sufficient data for adaptation engine construction, but also improves the flexibility of training and ensures the robustness of the training results. Once the adaptation engine based on a related cluster is established, as shown in Figure 6, the solution values of the target case are output by the adaptation engine, with $\big(sim(x_N, x_k), D_k\big)$ as the input. When there are multiple DSCs in a related case cluster, the solution values of the target case are adapted by the different solutions of the DSCs, and the solution output of the adaptation engine based on the $i$th RCC is given by
$$\hat{D}_N^{(i)} = \frac{1}{E} \sum_{k=1}^{E} g_i\big(sim(x_N, x_k), D_k\big),$$
where $E$ is the number of DSCs in the $i$th RCC, $x_k$ denotes a DSC in the $i$th RCC, and $i = 1, \ldots, M$ with $M$ the number of related case clusters. By constructing the case adaptation knowledge mapping between the condition attribute similarity (i.e., their value differences) and the decision attribute values, the adaptation model is established. Since the output dimension is consistent with that of the traditional multi-output adaptation model, the computational complexity does not increase significantly.
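The H×Q sample-generation scheme can be sketched as below. The exact input layout (pair similarity concatenated with the ISC's solution values, DSC's solution values as the target) follows the reconstruction above and should be read as an assumed layout; the function name is illustrative:

```python
import numpy as np

def build_adaptation_samples(sim, sol, dscs, iscs):
    """Build the H*Q training pairs for one RCC.  Each input couples the
    similarity between a DSC/ISC pair with the ISC's solution values; the
    training target is the DSC's solution values (assumed sample layout)."""
    X, Y = [], []
    for h in dscs:                 # H direct similar cases
        for q in iscs:             # Q indirect similar cases
            X.append(np.concatenate(([sim(h, q)], sol[q])))
            Y.append(sol[h])
    return np.array(X), np.array(Y)
```

With H = 1 DSC and Q = 2 ISCs this yields 2 samples, each of dimension 1 + (number of solution features), matching the H×Q count in the text.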
On the other hand, using the solution attributes of the DSCs as part of the input of the adaptation knowledge is also in line with the subjective thinking mode of case adaptation: through certain adaptation knowledge or rules, the solution attribute values of the new case are modified based on the solution parts of the similar cases. Furthermore, using the case similarity as an input of the adaptation model avoids the weight calculation of a combined output value.
Significantly, the proposed method of adaptation sample generation is suitable not only for a single RCC but also for multiple RCCs. Compared with the traditional adaptation method in a single RCC, the case information in each cluster can be utilized more fully. Moreover, using the case similarity as an input can also alleviate the impact of samples with different degrees of similarity. Thus, the accuracy of case adaptation can be improved.

1) MULTI-OUTPUT SVM FORMULATION
As a machine learning algorithm based on statistical learning theory, the support vector machine (SVM) can find the best compromise between model complexity and learning capability through Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle under small-sample circumstances [21]. The standard formulation of SVM can only be used as a univariate modeling technique for regression due to its inherent single-output structure, which would require the construction of a parallel SVM-based prediction module for each dependent variable, rapidly increasing the computing cost and reducing efficiency when the problem has many variables. But perhaps most importantly, the parallel SVM-based structure lacks consideration of the relevance between the dependent variables. Hence, in this paper, we introduce a generalization of SVR to handle multiple output variables in forecasting the ship equipment maintenance expenses. Denote the sample set of case adaptation as $\{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{l}$, with inputs $\mathbf{x}_i \in \mathbb{R}^m$ and outputs $\mathbf{y}_i \in \mathbb{R}^Q$. The hybrid weighted MSVR solves
$$\min_{W, \mathbf{b}} \; \frac{1}{2}\sum_{j=1}^{Q} \|\mathbf{w}^j\|^2 + C \sum_{i=1}^{l} s_i L(u_i), \qquad u_i = \|\mathbf{e}_i\| = \big\|\mathbf{y}_i - \varphi(\mathbf{x}_i)^{T} W - \mathbf{b}\big\|, \tag{3}$$
with the $L_2$-norm $\varepsilon$-insensitive loss $L(u) = 0$ for $u < \varepsilon$ and $L(u) = (u - \varepsilon)^2$ otherwise, where $s_i$ is the weight of the training sample, the value of which is determined by the density of the neighborhood samples and the similarity to the target case sample, as discussed in the next section. Considering all outputs jointly through $u_i$ yields a robust predictive model for every output.

2) CALCULATION OF SAMPLE WEIGHT BASED ON SAMPLE DENSITY AND SIMILARITY
The calculation of the weights should focus on two aspects. First, the higher the similarity of a sample to the target case, the larger the weight it should receive: if high-similarity sample information is under-utilized while low-similarity information is over-utilized, the performance of case adaptation eventually deteriorates. Second, the loss function in MSVR is built on the $L_2$ norm, which makes the model highly sensitive to outlier samples. Hence, this paper adopts a hybrid weighting strategy to minimize the impact of both issues. The similarity between the target sample $C_0$ and a training sample $C_i$, and the neighbor sample density of $C_i$, are both computed from the Euclidean distance (6); a geometric mean is then adopted to determine the hybrid weight of training sample $C_i$:
$$s_i = \sqrt{sim(C_0, C_i) \cdot \rho(C_i)}, \tag{7}$$
where $\rho(C_i)$ is the neighbor sample density of $C_i$. The training samples are weighted by formula (7). First, to reduce the influence of outlier samples on the model, the weight function is constructed from the neighbor sample density: the greater the neighbor density of a sample, the greater its weight in training, so the sensitivity of the adaptation engine to outliers is impaired. Meanwhile, considering that cases with higher similarity provide more information, the similarity function of the case and the neighbor sample density function are combined into the blended weight of the sample, which gives the model better fitting performance and lets it better excavate the adaptation knowledge in the corresponding case cluster during the regression process.

3) RESOLUTION OF THE HYBRID WEIGHTED MULTI-OUTPUT SVM
The optimization problem is solved using an iterative reweighted least squares (IRWLS) procedure that relies at each iteration $k$ on the previous solution $W^k$ and $\mathbf{b}^k$ to obtain the next one, until the optimal solution is reached. To construct the IRWLS procedure, a first-order Taylor expansion of $L(u)$ over the previous solution is used to approximate formula (3) by the quadratic form
$$\frac{1}{2}\sum_{j=1}^{Q}\|\mathbf{w}^j\|^2 + \frac{1}{2}\sum_{i=1}^{l} a_i u_i^2 + CT, \qquad a_i = \begin{cases} 0, & u_i^k < \varepsilon, \\ \dfrac{2 C s_i (u_i^k - \varepsilon)}{u_i^k}, & u_i^k \ge \varepsilon, \end{cases} \tag{9}$$
where $CT$ is a sum of constant terms that depend on neither $W$ nor $\mathbf{b}$, and which presents the same value and gradient as (3) at the previous solution. Each iteration then (i) computes the errors $u_i^k$ and the weights $a_i$, (ii) computes the solution of the resulting weighted least-squares problem, and (iii) moves to the next iterate by a line search between the previous and the new solution. The literature [15] has proved that this algorithm reaches convergence; the optimal solution of (3) is continuously approached through the IRWLS algorithm, and the parameter vector $\boldsymbol{\beta}^j$ and bias $b^j$ corresponding to each output channel can be obtained individually. The line search algorithm can be readily expressed in terms of $\boldsymbol{\beta}^j$.
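The IRWLS loop can be sketched compactly in kernel form. This is a simplified sketch, not the reference implementation: it uses a full update step in place of the line search of [15], an RBF kernel, and illustrative function names and default parameters; the weighted least-squares system follows from the stationarity conditions of the quadratic approximation ($\boldsymbol{\beta}_{ij} = a_i e_{ij}$ and $\sum_i \boldsymbol{\beta}_{ij} = 0$):

```python
import numpy as np

def rbf(X1, X2, sigma):
    """Gaussian RBF kernel matrix."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hw_msvr_fit(X, Y, s, C=10.0, eps=0.01, sigma=1.0, n_iter=100, tol=1e-6):
    """IRWLS sketch for the hybrid-weighted MSVR, model Y_hat = K @ Beta + b.
    Each iteration reweights samples by their joint (L2-norm) error over all
    outputs and solves a weighted least-squares system; a full update step
    replaces the line search of the reference algorithm."""
    l, Q = Y.shape
    K = rbf(X, X, sigma)
    Beta, b = np.zeros((l, Q)), np.zeros(Q)
    for _ in range(n_iter):
        E = Y - (K @ Beta + b)                 # residuals, shape (l, Q)
        u = np.linalg.norm(E, axis=1)          # joint error norm per sample
        a = np.zeros(l)
        m = u > eps
        a[m] = 2.0 * C * s[m] * (u[m] - eps) / u[m]
        sv = np.where(a > 0)[0]                # support vectors
        if len(sv) == 0:
            break                              # all samples inside the eps-tube
        n_sv = len(sv)
        A = np.zeros((n_sv + 1, n_sv + 1))
        A[:n_sv, :n_sv] = K[np.ix_(sv, sv)] + np.diag(1.0 / a[sv])
        A[:n_sv, n_sv] = 1.0                   # bias column
        A[n_sv, :n_sv] = 1.0                   # constraint: sum(beta_j) = 0
        rhs = np.vstack([Y[sv], np.zeros((1, Q))])
        sol = np.linalg.solve(A, rhs)
        Beta_new = np.zeros((l, Q))
        Beta_new[sv] = sol[:n_sv]
        b_new = sol[n_sv]
        done = np.abs(Beta_new - Beta).max() < tol
        Beta, b = Beta_new, b_new
        if done:
            break
    return Beta, b

def hw_msvr_predict(X_train, X_new, Beta, b, sigma=1.0):
    """Evaluate the fitted multi-output model at new inputs."""
    return rbf(X_new, X_train, sigma) @ Beta + b
```

On smooth multi-output data the training residuals settle near the ε-tube; the full-step variant is adequate for illustration, while the line search of [15] guarantees monotone descent.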

A. NAVAL VESSELS MAINTENANCE COST PREDICTION MODELS BASED ON CBR
The maintenance of naval vessels (NV) plays a crucial role in the whole life cycle of NV equipment support, and accurately predicting the maintenance cost (MC) of NV is of great significance to improving the effectiveness of NV equipment maintenance expenditure. In recent years, research on MC prediction models can in general be summarized in two types: statistical models based on historical data and related time-series information, and artificial intelligence models based on intelligent approaches such as neural networks and expert systems. Although these models can achieve good results in certain environments or situations, the result is often sensitive to parameter settings and sample quality. With the development of artificial intelligence, ML-based models such as NNs and SVM have been introduced for forecasting the MC of NV. In this paper, we introduce a forecasting model based on the improved case adaptation method of CBR with multi-case clusters and multi-output SVM. A NV-MC case is made up of two parts, problem and solution features, corresponding to the condition and decision attributes. According to the needs of actual problems, 14 problem features are screened from the NV-MC cases according to the degree of correlation between them and 6 solution features, as listed in Table I and Table II.
To verify the validity of the case adaptation method proposed in this paper, simulation experiments are conducted on actual maintenance cost data from Chinese naval vessels (CNV-MC). For the experiment, the maintenance cost data of CNV over the past ten years were collected into the case base; according to ship type and maintenance level, the cases are divided a priori into nine categories. A structured case base can improve the efficiency and accuracy of case retrieval, and it also facilitates the expansion of the case base. Generally, the ship type is divided into three categories: combat ships, auxiliary ships, and submarines; the maintenance type is divided into three levels: dock repair, light maintenance, and medium maintenance, as shown in Figure 7. Moreover, according to the condition attribute values of each case, the DBSCAN algorithm [22] is used to cluster the cases and obtain the subdivided clustering under the different categories, further improving the efficiency of case retrieval. Ten-fold cross-validation was employed as the validation technique for assessing the adaptation performance in each NCC in this research: the design samples are randomly divided into ten folds of the same size, and one fold is used in turn as the test fold. The mean absolute percentage error (MAPE) is used to examine the adaptation error,
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|,$$
and the accuracy of the adaptation model is given as $\mathrm{accuracy} = (1 - \mathrm{MAPE}) \times 100\%$.
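The validation protocol above can be sketched as small helpers (function names are illustrative; the accuracy definition follows the (1 − MAPE) convention stated in the text):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error over the solution values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

def adaptation_accuracy(y_true, y_pred):
    """Adaptation accuracy as used here: (1 - MAPE) * 100%."""
    return (1.0 - mape(y_true, y_pred)) * 100.0

def ten_folds(n, seed=0):
    """Randomly split n sample indices into ten folds of (near-)equal size;
    each fold serves once as the test fold."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 10)
```

For example, a prediction of 90 against a true cost of 100 gives MAPE 0.1 and accuracy 90%.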

B. CONSTRUCTION OF DIFFERENT CASE ADAPTATION ENGINE IN DIFFERENT RELATED CASE CLUSTERS
There are three parameters to be optimized: $C$, $\sigma$ and $\varepsilon$. Following previous studies [10], the performance is insensitive to $\varepsilon$ and a reasonable value is $\varepsilon = 0.01$, while the best values of $C$ and $\sigma$ for a given problem are not known beforehand. In order to obtain an approximately optimal solution quickly, the grid search method with ten-fold cross-validation is used to optimize $(C, \sigma)$. Because the adaptation results of the target cases are greatly influenced by the results of case retrieval, the k-NN algorithm is used to select the k DSCs whose problem feature values are closest to those of the target case, and these k DSCs may come from different clusters. Adaptation knowledge based on the different RCCs should be used to generate the corresponding target case adaptation solutions; consequently, the adaptation engines in the different RCCs are combined to generate the final adaptation solution of the target case, as shown in Figure 11. The final solution values of the target case are the combination of the different adaptation engines trained on the different RCCs. Furthermore, we carried out adaptation processes using various k values: ten comparative adaptation experiments were performed in different RCCs under the 3-NN, 5-NN, 7-NN, 9-NN and 11-NN algorithms independently to investigate the adaptation performance. In each experiment, one case was randomly chosen as the target case from the different RCCs, its solution part was compared with the final adaptation solution, and the mean adaptation accuracy of each experiment group was calculated as $\bar{\eta} = \frac{1}{10}\sum_{t=1}^{10}\eta_t$, where $\eta_t$ is the adaptation accuracy of the $t$th run. The mean adaptation accuracies of the proposed case adaptation method over 30 experiments using different k values (k = 3, 5, 7, 9, 11) are illustrated in Figure 12; the weights of the problem features are determined by the Entropy Weight Method (EWM) [19].
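One simple way to combine the per-RCC engine outputs into the final solution is to weight each engine by the share of the k retrieved DSCs that fall in its cluster, reflecting the remark earlier that clusters with more DSCs should count for more. This combination rule and the function name are assumptions for illustration:

```python
import numpy as np

def combine_engine_outputs(outputs, dsc_counts):
    """Combine per-RCC adaptation engine outputs into the final solution.
    Each engine's output vector is weighted by the fraction of the k
    retrieved DSCs belonging to its cluster (assumed combination rule)."""
    w = np.asarray(dsc_counts, dtype=float)
    w = w / w.sum()                       # DSC-count shares sum to 1
    return w @ np.asarray(outputs, dtype=float)

# two RCC engines with 2-dimensional solutions; 3 of 4 DSCs lie in RCC-1
final = combine_engine_outputs([[10.0, 2.0], [20.0, 4.0]], [3, 1])
```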
Under the different k-NN principles, the fluctuation trend of the case adaptation accuracy in the different RCCs is basically the same, and a moderate k yields adaptation accuracy significantly higher than the other values: a higher k introduces case information with low similarity into the model and leads to a certain deviation, while a smaller k results in an insufficient number of available similar cases and makes the case adaptation model more likely to fall into a local optimum, so it is imperative to choose a reasonable value of k when searching for similar cases. In addition, the mean and deviation of the case adaptation accuracies in the different RCCs under diverse k values are depicted in Figure 13. We find that choosing a reasonable k value not only allows the adaptation model to maintain good adaptation performance, but also ensures the robustness of the model. Furthermore, even under the same clustering algorithm on the case base, the optimal k value may differ between case clusters.

C. COMPARISON EXPERIMENT IN ADAPTATION ENGINE
The target of this empirical research is to investigate whether the proposed HW-MSVR method can achieve higher adaptation performance than traditional adaptation generators using statistical and ML-based methods under the same retrieval results. For the statistical methods, WM and MRA are adopted for comparison, and the weights of the similar cases are calculated by similarity measures, i.e., the Euclidean distance and EWM are employed to compute the similarity values between the target case and the similar cases. For the ML-based methods, the traditional MSVR, MSVR-SW and the backpropagation neural network (BPNN) are employed. Therefore, there are six examined methods in total: MSVR, BPNN, MSVR-SW [10], WM, MRA and the proposed HW-MSVR. The kernel functions of MSVR and MSVR-SW applied in this comparison experiment were RBF. For the sake of simplicity, the optimal parameters of these models were produced via parallel grid search as mentioned in Section IV.B, and the final parameters of MSVR, MSVR-SW and HW-MSVR in NCC-I, II and III are given in Table III. For BPNN, the corresponding adaptation engine has $m + n + 1 = 21$ input nodes and 6 output nodes; we set 30 hidden nodes, and the learning epochs, learning rate and momentum term of BPNN are 1000, 0.1 and 0.7 respectively. In addition, the MRA method derives a linear combination of independent variables to model the relations between the problem features and solution features according to their values in the similar cases. Figure 15 describes the trends of the adaptation accuracies of all comparative methods with increasing k values in the different RCCs. Along with increasing k, the performances of HW-MSVR, MSVR and MSVR-SW all improve, but the increment of HW-MSVR is larger than that of the other comparative methods.
The performances of all MSVR-based methods are better than those of BPNN and the statistical methods, because MSVR uses regularization to reduce over-fitting in a small-sample environment, which gives it better generalization than the NN-based method. Although the statistical methods can synthesize the adaptation information of all similar cases, they are more sensitive to abnormal samples, and cases with a low degree of similarity are likely to have a large impact on their adaptation accuracy. Therefore, MSVR-based methods are more suitable for case adaptation when the size of each case cluster is small. The HW-MSVR method can diminish the influence of outliers on the accuracy of the model, but the optimal k values should be determined automatically to make the method more practical.