A Transfer Learning Approach Utilizing Combined Artificial Samples for Improved Robustness of Model to Estimate Heavy Metal Contamination in Soil

Benefiting from nanoscale sampling intervals and subtle spectral information in the visible and near-infrared bands, hyperspectral technology is considered an efficient means of monitoring soil heavy metal contamination, and the robustness of the prediction model is driven by the increase in the spectral dimension used in model analysis. Considering the positive correlation between sample size and spectral dimension, this study focuses on a novel way of enlarging the sample size to improve model performance by i) preparing artificial samples, which offer more flexibility and control in the laboratory environment than collecting wild samples, and ii) using a transfer learning method called transfer component analysis (TCA) to reduce the spectral feature differences caused by soil heterogeneity, so that the model is trained on data with the same distribution. The proposed approach was tested on three heavy metals, namely copper (Cu), cadmium (Cd) and lead (Pb), using wild samples collected in a mining area located in the Xiangjiang Basin, Hunan Province, China. The experiments showed that the initial model constructed from a small number of wild samples was highly sensitive to changes in the training samples. In contrast, a modified model with TCA showed good robustness and predictive ability: the average coefficient of determination (R2) and ratio of prediction to deviation (RPD) improved to 0.73 and 1.90, 0.74 and 1.92, and 0.72 and 1.73, respectively. The results illustrate a potentially more reliable, low-cost modeling method for predicting soil heavy metals based on hyperspectral analysis.


I. INTRODUCTION
In recent years, frequent incidents related to excessive heavy metal content in grain have indicated the serious situation of soil contamination around the world [1]- [3]. Studies in soil remediation have confirmed that heavy metals are difficult to degrade effectively due to their complicated physicochemical properties [4], [5]. Thus, heavy metals in agricultural land are prone to accumulate for a long time, which leads to great risk in food security as soil quality declines [6], [7]. More importantly, public nuisance diseases, in which heavy metals enter the human body through the food chain and accumulate continuously, have presented a challenge to health care [8]- [10]. Therefore, monitoring the heavy metal concentration in the pedosphere has always been a focus of soil environmental research.
The associate editor coordinating the review of this manuscript and approving it for publication was Yilun Shang.
Visible and near-infrared reflectance (VNIR) spectroscopy has been applied to the quantitative estimation of heavy metals in soil for nearly two decades [11]. Traditionally, most researchers prepared experimental data by collecting a large number of soil samples in the wild, then analyzed the quantitative relationship between heavy metal concentration and spectral reflectance with a statistical model; ultimately, the constructed model was expected to make valid predictions in the real environment for practical application [12], [13]. Many researchers have investigated this in detail. For example, Choe [14] applied the stepwise multiple linear regression (SMLR) model to predict the content of several heavy metals (i.e., lead (Pb), zinc (Zn), arsenic (As)) in the Rodalquilar mining area in Spain, and this work also showed the feasibility of mapping the heavy metal distribution in combination with hyperspectral remote sensing. Likewise, Sun [15] presented a combination of partial least squares regression and a genetic algorithm (GA-PLSR) to predict soil nickel (Ni) concentration in two different regions of Hunan Province, China, and the spectral bands associated with organic matter and clay minerals were deemed more effective in the prediction model than the entire VNIR spectrum. Moreover, Malmir et al. [16] extracted data from hyperspectral images to develop partial least squares regression (PLSR) models to predict micro-elements in sieved and ground air-dried soils, and this study revealed that copper (Cu), magnesium (Mg) and Zn, which are associated with soil carbon and nitrogen, could be predicted with excellent accuracy.
Although the aforementioned studies have shown satisfactory predictability for different heavy metals, most of them adopted statistical models such as PLSR, SMLR and support vector regression (SVR) as the main approach to quantitative prediction [17]. In this case, it should be noted that statistical models tend to predict well only when the training samples are a good representation of the test samples [18]- [20]. As a result, numerous soil samples must be prepared at the beginning of model construction to ensure the reliability and robustness of the experimental results. However, this does not reflect the advantages of visible and near-infrared spectroscopy in monitoring heavy metal contamination in soil. On the one hand, collecting a large number of soil samples in the field increases capital and time costs, and may well lead to the same predicament as the immature geospatial interpolation techniques of early research. On the other hand, the reflectance spectrum measured under field conditions is strongly interfered with by the many components of soil, which may make it difficult to extract the reflectance features of heavy metals [21]. Furthermore, some studies attempted to demonstrate model robustness with small sample data; nevertheless, these statistical models usually showed neither favorable robustness nor high prediction accuracy in this context [22], [23]. In addition, models trained with small sample data were likely to be sensitive to the test samples, because insufficient sample data tends to cover only part of the distribution in the training set. To overcome this limitation of existing methods, this paper proposes artificially preparing soil samples contaminated with heavy metals and spiking them into the training samples, with a view to expanding the experimental data set at low cost.
Artificially prepared spiked samples have been used to increase the amount of experimental data, which has proved feasible in the analysis of different soil elements [24], [25]. A good case in point is soil organic carbon investigation, in which large national soil spectral libraries were used to increase sample density in regional-scale calibrations [26], [27]. In model training, spiking calibration samples from libraries decreased the bias, which effectively improved prediction accuracy compared with models constructed with small sample data. However, this approach is rarely adopted in soil heavy metal assessment. The prime reason is that there is a significant difference in spectral reflectance between artificial samples prepared in the laboratory and natural samples from the wild. To address this issue, we propose a transfer learning method called transfer component analysis (TCA) to reduce the spectral differences. For one thing, transfer learning is an important branch of machine learning that is well suited to transferring knowledge from a previous domain to a new one. This is similar to learning the spectral characteristics shared by the wild and laboratory environments. For another, transfer component analysis, a classic transfer learning method, projects data from different domains into the same feature space, so that original data sets with different characteristics follow the same distribution as closely as possible in that space. So long as the soil spectra from the wild and the laboratory show similarities, artificially prepared samples can be spiked into a training set consisting of wild samples to enhance model robustness.
Therefore, this article takes a mining area located in the Xiangjiang River Basin, Hunan Province, China as a case study, and considers three different heavy metals, namely cadmium (Cd), copper (Cu) and lead (Pb), as research examples. It aims to: (i) analyze the sensitivity to the test set of statistical models trained with small sample data collected in the wild; (ii) compare the soil spectral reflectance data from the laboratory and the field environment, and reduce the characteristic differences through TCA transformation; (iii) use artificially prepared sample data to spike the wild samples and train the model after spectral transformation with the TCA method; (iv) compare the prediction accuracy and robustness of models constructed with different training samples to provide a more reliable modeling approach for practical environments.

A. ARTIFICIAL PREPARATION OF SOIL SAMPLES CONTAMINATED WITH HEAVY METAL
In this article, we prepared three different types of heavy metal-spiked soil samples, namely cadmium, copper and lead. They were divided into two groups of experiments according to the source of the background soil. One group included cadmium, and the other group covered copper and lead.
To prepare the Cd-spiked soil samples, around 30 kg of clean background soil was collected in Hengyang City, Hunan Province, China. After removing stones, air drying, fine grinding and sieving through a 100-mesh polyethylene sieve, the processed background soil was artificially contaminated with a heavy metal standard solution. More specifically, the soil was divided equally into 66 parts of 100 g each. One part was used to determine the initial Cd concentration in the background soil, and the remaining sixty-five parts were sequentially spiked with the standard solution (see Table 1).
In the design of the artificially prepared soil samples, the spiked cadmium content value was determined according to Equation (1), where C represents the standard solution volume (mL) added to the artificially prepared soil samples, A represents the design content value in the Cd-spiked soil samples (mg/kg), B represents the initial Cd concentration in the collected background soil, and ρ represents the Cd standard solution concentration (in our experimental preparation, ρ = 100 ug/L). Unlike the Cd-spiked samples, the background soil spiked with Cu and Pb came from Changsha City, Hunan Province, China. Nevertheless, the background soil types in both cities were dominated by latosol and red soil according to the Chinese Soil Taxonomy (GBT 17296-2009). For the Cu-spiked and Pb-spiked samples, the concentration design followed the work directive of reference materials (3) (GB/T 15000.3-2008/ISO Guide 35:2006). The preparation process otherwise followed the procedure described above.
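Equation (1) itself is not reproduced in this text. As a rough illustration only, the sketch below assumes a simple mass balance that is consistent with the symbol definitions above, C = (A − B)·m/ρ, where m is the mass of one soil portion. The function name, the unit choices (ρ expressed in mg/mL) and the numbers in the example are hypothetical and are not taken from the original work.

```python
def spike_volume_ml(target_mg_per_kg, background_mg_per_kg,
                    soil_mass_kg, solution_mg_per_ml):
    """Volume of standard solution (mL) needed to reach a design content.

    Assumed mass balance (not the paper's Equation (1) verbatim):
        C = (A - B) * m / rho
    with A, B in mg/kg, m in kg and rho in mg/mL.
    """
    if target_mg_per_kg < background_mg_per_kg:
        raise ValueError("design content is below the background content")
    # Mass of metal (mg) that must be added to the soil portion
    added_metal_mg = (target_mg_per_kg - background_mg_per_kg) * soil_mass_kg
    return added_metal_mg / solution_mg_per_ml


# Hypothetical example: a 100 g portion, background 0.5 mg/kg, design 2.0 mg/kg
volume = spike_volume_ml(target_mg_per_kg=2.0, background_mg_per_kg=0.5,
                         soil_mass_kg=0.1, solution_mg_per_ml=0.1)
print(f"solution volume: {volume:.2f} mL")
```

Keeping the units explicit in the parameter names makes it easier to check the calculation against whatever concentration unit the standard solution is actually labelled with.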
A closer look at Table 1 shows that the collection of soil samples in the field was independent of the preparation of the artificial heavy metal-spiked samples. This meant that sample interpolation could in theory be implemented even though the concentration distribution of heavy metals in the wild samples was indeterminate. In addition, the background soil varied with the type of heavy metal in the artificial preparation, which provided experimental data for exploring whether artificial sample preparation could break through regional limitations.

B. OVERVIEW OF THE STUDY AREA
The research area in this paper is located in the Xiangjiang River Basin, Hunan Province, China. Hunan Province is in the front rank of non-ferrous metal production and has a long mining history, and a large number of mining areas in the Xiangjiang River Basin support economic development [28]- [30]. Given the significant role of the Xiangjiang River in agricultural production and economic construction, heavy metal contamination in this basin deserves more attention for the sake of river ecosystem health. A total of 40 samples were collected in a lead-zinc mining area as the wild data set. The sampling range was from 112°35′24″ E to 113°36′37″ E and from 26°32′37″ N to 25°34′12″ N. According to the Chinese soil taxonomic criteria (GBT 17296-2009), red soil and latosol were the main soil types in the region. It should be noted that the soil samples collected in the field came from Hengyang City, the same region as the background soil of the Cd-spiked samples, but a different region from the background soil of the Cu-spiked and Pb-spiked artificial samples.

C. SPECTRAL DATA PREPROCESSING
First, impurities such as debris and plant residues were removed from the collected samples after air-drying. Second, the samples were ground in a ceramic bowl and passed through a 100-mesh sieve. Finally, each processed sample was divided into two equal parts, which were used for spectral measurement and heavy metal concentration determination in the laboratory, respectively. A PSR-3500 spectrometer (spectral range: 350 to 2500 nm) was used to measure the soil spectral reflectance. Because the measurements were taken in a darkroom, a 1000 W halogen lamp served as the light source for the bare-fiber probe. The illumination direction of the light source was at an angle of 15° to the vertical, the distance of the light source was set to 30 cm, and the probe was kept at 45° to the plane of the soil sample. Before the measurement, the instrument was calibrated and optimized against the standard whiteboard; each soil sample was then measured 5 times, and the arithmetic mean was taken as the reflectance spectrum of the soil sample. The spectral resolution was 1.5 nm at 350-1000 nm, 3.8 nm at 1000-1900 nm, and 2.5 nm at 1900-2500 nm. In total, each soil spectrum contained 1024 band values. Meanwhile, graphite furnace atomic absorption spectrophotometry was adopted to determine the concentrations of cadmium and lead in soil (GB/T 17141-1997); the detection limits specified in the standard, calculated for a 0.5 g sample digested and made up to a fixed volume of 50 mL, were 0.01 mg/kg and 0.1 mg/kg, respectively. Furthermore, the concentration of copper was determined by flame atomic absorption spectrophotometry (GB/T 17138-1997), and the detection limit, calculated in the same way, was 1 mg/kg.
FIGURE 2. Soil spectral curves from different samples. Original spectral reflectance in the wild (a) and in the artificial samples spiked with heavy metals Cd (b), Cu (e), and Pb (f), respectively; spectral reflectance after first-order differential transformation in the wild (c) and in the artificial samples spiked with heavy metals Cd (d), Cu (g), and Pb (h), respectively.
The spectral reflectance of the soil samples spiked with the three heavy metals in the laboratory was compared with that of the wild samples. As anticipated, the spectral curves of the artificially prepared soil samples tended to be similar, whereas the spectra of the wild samples measured in the laboratory differed significantly. In order to improve the signal-to-noise ratio, a first-order difference was applied to heighten the potential response characteristics of heavy metals in the soil spectrum. The spectrum was clearly enhanced at wavelengths of approximately 600 nm, 700 nm, 900 nm and 2200 nm. These bands coincide with the spectral response intervals of total iron, clay minerals and organic matter, and might contain potential heavy metal signatures because of statistically significant correlations.
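The first-order difference transformation mentioned above can be sketched as follows. This is a minimal illustration assuming a finite difference taken with respect to wavelength; the function name and the toy spectra are hypothetical, and the exact differencing scheme used in the original work is not specified in the text.

```python
import numpy as np


def first_order_difference(reflectance, wavelengths):
    """Finite-difference derivative of each spectrum with respect to wavelength.

    reflectance: array of shape (n_samples, n_bands)
    wavelengths: array of shape (n_bands,), in nm
    Returns an array of shape (n_samples, n_bands - 1).
    """
    reflectance = np.asarray(reflectance, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    # Dividing by the band spacing keeps the result comparable across the
    # spectral regions, which have different sampling intervals.
    return np.diff(reflectance, axis=-1) / np.diff(wavelengths)


# Toy example: two synthetic spectra that are linear in wavelength
wl = np.array([350.0, 351.5, 353.0, 354.5])
spectra = np.vstack([0.10 + 0.002 * (wl - 350.0),
                     0.20 - 0.001 * (wl - 350.0)])
derivative = first_order_difference(spectra, wl)
print(derivative.shape)
```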
Although the artificially prepared samples behaved similarly in spectral reflectance, they showed obvious discrepancies compared with the spectra of the wild samples. For instance, the number of band peaks was not the same in the wavelength range of 500 nm to 900 nm. In general, soil spectra from different sources may vary in their feature distribution, which makes it difficult to use artificially prepared samples in model training. To this end, this paper proposes transfer component analysis as an additional spectral preprocessing step.

D. TRANSFER COMPONENT ANALYSIS
The aim of transfer learning is to recognize and apply the knowledge or patterns learned in a previous domain to different but related domains [31]. In particular, transfer component analysis (TCA) is a classic transfer learning method proposed in recent years [32], [33]. The main purpose of using TCA in this paper is to reduce the difference in spectral characteristics between two domains; wild samples and artificially prepared samples are defined as the source and target domains, respectively. The principle of TCA is to project the spectral reflectance data of the source and target domains into a Reproducing Kernel Hilbert Space (RKHS), and to minimize the Maximum Mean Discrepancy (MMD), which serves as the criterion for estimating the distance between distributions. The mathematical process is explained as follows. The data set from the source domain is defined as $X_S = \{x_{S_1}, \cdots, x_{S_{n_1}}\}$, where $x_{S_i} \in \chi$ corresponds to the spectral reflectance of a wild sample and $n_1$ represents the number of samples in the data set; similarly, the data set from the target domain is defined as $X_T = \{x_{T_1}, \cdots, x_{T_{n_2}}\}$, where $x_{T_i} \in \chi$ corresponds to the spectral reflectance of an artificially prepared sample and $n_2$ represents the number of artificially prepared samples spiked with a given heavy metal. Meanwhile, $P(X_S)$ and $Q(X_T)$ (or $P$ and $Q$ for short) are defined as the marginal probability distributions of $X_S$ and $X_T$, respectively. When the data characteristics of the two domains are inconsistent, usually $P \neq Q$. The key objective in our experiment is to make $P$ as similar as possible to $Q$. Borgwardt et al. [34] proposed that the distance between two distributions $P$ and $Q$ could be measured by the squared distance between the empirical means of the two domains, and defined it as the MMD to estimate the distance between domains mapped into an RKHS. The calculation formula is as follows:

$$\mathrm{Dist}(X_S, X_T) = \left\| \frac{1}{n_1} \sum_{i=1}^{n_1} \phi(x_{S_i}) - \frac{1}{n_2} \sum_{i=1}^{n_2} \phi(x_{T_i}) \right\|_{\mathcal{H}}^{2} \tag{2}$$

where $\mathcal{H}$ represents the feature space of the RKHS, and $\phi : \chi \rightarrow \mathcal{H}$.
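To make the MMD criterion concrete, the sketch below estimates the squared MMD between two sample sets through kernel evaluations alone, using the standard identity that the squared distance between the mapped means equals mean(K_SS) − 2·mean(K_ST) + mean(K_TT). The function names, the choice of kernels and the toy data are assumptions made for illustration; they are not taken from the original experiments.

```python
import numpy as np


def linear_kernel(a, b):
    """Linear kernel: the feature map is the identity."""
    return a @ b.T


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is a hypothetical bandwidth setting."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd_squared(xs, xt, kernel=linear_kernel):
    """Biased empirical estimate of the squared MMD between xs and xt."""
    return (kernel(xs, xs).mean()
            - 2.0 * kernel(xs, xt).mean()
            + kernel(xt, xt).mean())


# Toy "source" and "target" samples (rows are samples, columns are bands)
xs = np.array([[0.0, 1.0], [2.0, 3.0]])
xt = np.array([[1.0, 1.0], [3.0, 5.0], [2.0, 3.0]])

mmd_linear = mmd_squared(xs, xt)
mean_gap = np.sum((xs.mean(axis=0) - xt.mean(axis=0)) ** 2)
print(mmd_linear, mean_gap)
```

With the linear kernel the estimate reduces to the squared Euclidean distance between the two sample means, which is a convenient sanity check before switching to a nonlinear kernel.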
Due to the difficulty of finding the nonlinear mapping $\phi$, the TCA proposed by Pan et al. [35], Pan and Yang [31] transformed this problem into a kernel learning problem by defining a kernel function $k(x_i, x_j) = \phi(x_i)^{\top} \phi(x_j)$. Hence, Equation (2) can be reformulated as:

$$\mathrm{Dist}(X_S, X_T) = \mathrm{tr}(KL) \tag{3}$$

where $K$ is the $(n_1 + n_2) \times (n_1 + n_2)$ kernel matrix computed on the combined source and target data, and $L$ is the coefficient matrix with $L_{ij} = 1/n_1^2$ if $x_i, x_j \in X_S$, $L_{ij} = 1/n_2^2$ if $x_i, x_j \in X_T$, and $L_{ij} = -1/(n_1 n_2)$ otherwise. Instead of learning $k(\cdot, \cdot)$, this problem can be addressed by learning the kernel matrix $K$, which ultimately leads to a semidefinite program (SDP) [31]; then, Principal Component Analysis (PCA) is applied to the learned kernel matrix $K$ to obtain a low-dimensional latent space across domains. To sum up, the TCA method can reduce the spectral differences between the two domains in a low-dimensional space while retaining the original spectral characteristics as much as possible. In this paper, the spectral dimension of the source and target domains dropped from 1024 to around 100 in the spectral transformation for each of the three heavy metals. It is also worth noting that the decrease in spectral dimension avoided the redundancy of high-dimensional data in model calibration.
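As an illustration of how such a shared low-dimensional space can be obtained in practice, the sketch below follows the closed-form variant of TCA that is widely used in the literature, in which the projection is given by the leading eigenvectors of (KLK + μI)⁻¹KHK, with H the centering matrix. This is a sketch under stated assumptions rather than the implementation used in the original work: the linear kernel, the trade-off parameter `mu`, the function name and the random toy data are all hypothetical choices.

```python
import numpy as np


def tca_embed(xs, xt, n_components, mu=1.0):
    """Project source (xs) and target (xt) samples into a shared latent space.

    Closed-form TCA sketch: leading eigenvectors of
    (K L K + mu * I)^-1 K H K, using a linear kernel (assumption).
    """
    x = np.vstack([xs, xt])
    n1, n2 = len(xs), len(xt)
    n = n1 + n2

    k = x @ x.T  # kernel matrix on the combined data

    # MMD coefficient matrix L: 1/n1^2 (source pairs), 1/n2^2 (target pairs),
    # -1/(n1*n2) (cross pairs), built as an outer product.
    e = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, -1.0 / n2)])[:, None]
    l = e @ e.T

    h = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix H

    a = k @ l @ k + mu * np.eye(n)  # domain-distance term plus regularization
    b = k @ h @ k                   # variance-preserving term
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(a, b))

    # Keep the eigenvectors with the largest eigenvalues
    order = np.argsort(-eigvals.real)[:n_components]
    w = eigvecs[:, order].real

    z = k @ w  # embedded samples, shape (n, n_components)
    return z[:n1], z[n1:]


# Toy data standing in for wild (source) and artificial (target) spectra
rng = np.random.default_rng(0)
wild = rng.normal(loc=0.0, size=(6, 10))
artificial = rng.normal(loc=0.5, size=(8, 10))
z_wild, z_artificial = tca_embed(wild, artificial, n_components=3)
print(z_wild.shape, z_artificial.shape)
```

Because the kernel matrix has one row per sample, the latent dimension cannot exceed the total number of source and target samples.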

E. MODEL CALIBRATION AND EVALUATION
Benefiting from the advantages of principal component analysis and canonical correlation analysis [36], partial least squares regression (PLSR), with the number of latent variables (LVs) determined by cross-validation, was applied for model calibration. We developed three types of models for the different purposes mentioned in the introduction.
(I) Initial model: To verify the robustness of a model constructed with a small number of samples, we set up three initial models based on the samples collected in the wild: a Cd model, a Cu model and a Pb model. For each heavy metal, we carried out 50 experiments, randomly shuffling the sample order of the whole data set each time in order to avoid chance effects in sample selection. In every experiment, one third of the samples were used as the calibration set to build the model, one third were used as the validation set to determine the optimal number of LVs by leave-one-out cross-validation and so avoid over-fitting, and the remaining third were used as the test set to evaluate the prediction accuracy of the initial models.
(II) Modified model without TCA: To explore the effect of spiked samples without TCA spectral transformation, we set up this model as a control, in which artificially prepared samples were added to the calibration set. There are again three heavy metal-spiked experimental groups, namely the Cd, Cu and Pb modified models without TCA. In each group, a total of 50 experiments were carried out for comparison with the initial models, with the same validation and test sets as before.
(III) Modified model with TCA: To study the feasibility of TCA spectral transformation between field and artificially prepared samples, we trained the model on the spectral data set whose dimensionality had been reduced by the TCA method. For comparison, the calibration, validation and test sets remained the same as in the modified model without TCA in the three heavy metal experiments; only the TCA spectral preprocessing differed.
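The sampling protocol shared by the three model types can be sketched as follows. The sketch only builds the index sets for each random experiment; it assumes 38 usable wild samples split 13/13/12 (the calibration size of thirteen matches the text, while the validation and test sizes are an assumption) and 65 artificial samples added to the calibration set of the modified models. All function and key names are hypothetical.

```python
import random


def three_way_split(n_samples, n_cal, n_val, seed):
    """Randomly split sample indices into calibration, validation and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return (indices[:n_cal],
            indices[n_cal:n_cal + n_val],
            indices[n_cal + n_val:])


def build_experiments(n_wild=38, n_artificial=65, n_runs=50):
    """Index sets for the repeated random experiments.

    The same validation and test sets are shared by all three model types;
    only the calibration set changes between the initial and modified models.
    """
    runs = []
    for seed in range(n_runs):
        cal, val, test = three_way_split(n_wild, n_cal=13, n_val=13, seed=seed)
        wild_cal = [("wild", i) for i in cal]
        spiked = [("artificial", j) for j in range(n_artificial)]
        runs.append({
            "initial_cal": wild_cal,            # model (I)
            "modified_cal": wild_cal + spiked,  # models (II) and (III)
            "val": [("wild", i) for i in val],
            "test": [("wild", i) for i in test],
        })
    return runs


experiments = build_experiments()
first = experiments[0]
print(len(experiments), len(first["initial_cal"]), len(first["modified_cal"]))
```

Fixing the seed per run keeps the validation and test sets identical across the three model types, so that differences in accuracy can be attributed to the calibration data and the TCA step.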
The evaluation indexes of model accuracy are the coefficient of determination (R 2 ) (4), the root mean square error (RMSE) (5), and the ratio of prediction to deviation (RPD) (6). Specifically, R 2 describes the degree of correlation between predicted and measured values; the higher the correlation between them, the closer R 2 is to 1. RMSE represents the deviation between predicted and measured values; because it is sensitive to outliers, a small RMSE indicates small variability in the prediction error. The RPD is the ratio between the standard deviation (S.D) and the RMSE, and was used to evaluate the reliability of prediction in the external test.

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}} \tag{5}$$

$$\mathrm{RPD} = \frac{\mathrm{S.D}}{\mathrm{RMSE}} \tag{6}$$

where $y_i$ = measured value, $\hat{y}_i$ = predicted value, $\bar{y}$ = mean measured value, and $n$ = the number of samples in the test set with $i = 1, 2, \ldots, n$. According to the model accuracy assessment theory proposed by Williams [37] and Saeys [38], the criteria used to evaluate the prediction ability of a model were as follows: R 2 greater than 0.9 and RPD greater than 3.0 indicate excellent prediction; good prediction is defined by R 2 from 0.82 to 0.9 and RPD from 2.5 to 3.0; approximate prediction corresponds to R 2 from 0.66 to 0.81 and RPD from 2.0 to 2.5; the possibility to distinguish high and low values corresponds to R 2 from 0.50 to 0.65 and RPD from 1.5 to 2.0. A model with R 2 greater than 0.5 and RPD greater than 1.5 is regarded as successful. A model with good robustness is expected to show high R 2 and RPD values and low RMSE values. All model construction and accuracy evaluation were performed in MATLAB 2014a.
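The three accuracy indexes and the success criterion described above (R 2 greater than 0.5 and RPD greater than 1.5) can be computed as in the following sketch. It assumes the common definitions R 2 = 1 − SSE/SST and RPD = S.D/RMSE, with S.D taken as the sample standard deviation of the measured test values; the function names and the toy numbers are hypothetical and do not come from the original experiments.

```python
import math


def evaluate(measured, predicted):
    """Return R2, RMSE and RPD for one test set."""
    n = len(measured)
    mean_y = sum(measured) / n
    sse = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    sst = sum((y - mean_y) ** 2 for y in measured)
    rmse = math.sqrt(sse / n)
    # Sample standard deviation of the measured values (assumption:
    # some works divide by n instead of n - 1).
    sd = math.sqrt(sst / (n - 1))
    rpd = sd / rmse if rmse > 0 else math.inf
    return {"r2": 1.0 - sse / sst, "rmse": rmse, "rpd": rpd}


def is_successful(metrics):
    """Success criterion used in the text: R2 > 0.5 and RPD > 1.5."""
    return metrics["r2"] > 0.5 and metrics["rpd"] > 1.5


def success_rate(all_metrics):
    """Fraction of random experiments that meet the success criterion."""
    return sum(is_successful(m) for m in all_metrics) / len(all_metrics)


# Toy example with made-up values
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
good = evaluate(measured, [1.1, 1.9, 3.2, 3.8, 5.0])
poor = evaluate(measured, [3.0, 3.0, 3.0, 3.0, 3.0])
print(good, poor, success_rate([good, poor]))
```

The `success_rate` helper corresponds to the kind of summary reported over repeated random experiments, where each run contributes one set of test metrics.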

A. CONTAMINATION ANALYSIS OF SOIL SAMPLE
To investigate model performance under small-data conditions, a total of 40 samples were collected in the wild for experimental data preparation. Two of them were eliminated due to spectral noise interference in the measurement, leaving 38 samples for model analysis in the following experiments. According to the soil environmental quality risk control standard for soil contamination of agricultural land promulgated by the Chinese government (GB15618-2018), the risk screening value is the threshold concentration of a heavy metal above which crop growth and the soil ecological environment may be threatened. For Cd, Cu and Pb, these values were 0.3 mg/kg, 50 mg/kg and 70 mg/kg in acidic soil, respectively. The data in Table 2 indicated that heavy metal contamination in the case area was a serious challenge, because the mean concentrations of the three heavy metals greatly exceeded the defined risk screening values. Table 2 presents data on three kinds of sample preparation, namely Wild Samples (WS), Artificially Prepared Samples (APS), and Wild Samples plus Artificially Prepared Samples (WS+APS). As the heavy metal-spiked samples (i.e., APS) were prepared without knowledge of the concentration distribution of the field samples, the maximum values of heavy metals in WS and APS were inconsistent, as were the minimum values. It is also worth noting that the standard deviations were high and differed among the three types of sample data. There might be two reasons for this: first, the heavy metal concentrations of field samples usually do not follow a normal distribution and vary widely because of complicated contamination conditions; second, the chief purpose of APS was to spike the sample data from the field environment in order to increase the data size. In theory, there was no requirement that the spiked samples (i.e., APS) themselves follow a normal distribution.
Figure 3 details the differences in the concentration distributions of the three heavy metals, from which two conclusions can be drawn: (1) In the wild samples, most concentration values fell in the low-pollution range. Although a few samples lay in the high-concentration range, their number was insufficient for modeling; (2) The role of the artificially prepared samples was to add high-concentration samples to the training set to improve the learning capability of the model. Accordingly, three types of model were examined for robustness, namely the initial model, the modified model without TCA, and the modified model with TCA.

B. PREDICTIVE ANALYSIS OF THREE TYPES OF MODEL FOR HEAVY METAL Cd
As the background soil used to prepare the Cd-spiked samples came from the same region as the samples collected in the wild, Cd was studied first, and PLSR was trained as the statistical model to predict the heavy metal content in the test set. For comparison with the modified models with TCA, the spectral reflectance after first-order transformation over all bands was used as the independent variable (X) in the initial model and the modified model without TCA, and the heavy metal content after logarithmic transformation was used as the dependent variable (Y), because the standard deviations of the three heavy metals in the different soil samples were large (Table 2). Meanwhile, as is common in machine learning, we set up a validation set to determine the optimal parameter of PLSR (i.e., the optimal number of LVs) and prevent overfitting. For every type of model, we carried out 50 experiments to guard against chance results; in each experiment, different samples were randomly selected as the training, validation and test sets to build the model and measure predictive performance. In particular, the standard deviations of the 50 accuracy results were calculated as a reference for evaluating the robustness of the three models. Table 3 presents the average prediction results of the 50 random experiments. A closer look at the table shows that the accuracy of the modified model with TCA was superior to that of the other two, whose average R 2 and RPD values were below 0.5 and 1.5, respectively. This indicated that the first two models were unsuccessful in predicting the test set. In addition, the RMSE of the modified model with TCA was nearly half of those of the initial model and the modified model without TCA, confirming the advantage of the third model in reducing prediction deviation.
As far as model robustness was concerned, Table 3 shows that the standard deviations of R 2 , RPD and RMSE tended to decrease gradually across the three models; the smaller the standard deviation of the prediction results, the more stable the prediction model. More specifically, across the 50 random experiments, a robust model should perform consistently on different data sets. Based on the accuracy criteria for a successful prediction, the pie charts in Figure 4 show that the highest success rate was achieved by the modified model with TCA, at exactly 90%, whereas the success rates of the initial model and the modified model without TCA were below 20%, meaning that most of their Cd predictions had low accuracy and were classed as unsuccessful. Three main factors could account for this phenomenon. First and foremost, with only thirteen training samples in the initial model, our experimental results showed that it was hard to guarantee that the training set could effectively capture the spectral features as the samples in the test set varied. Moreover, although the number of training samples increased to 78 in the modified model without TCA, the spectral characteristics of the training set could not represent those of the test set when artificially prepared samples were added directly. Last but not least, once the spectra of both the training and test sets had been transformed via TCA, the model showed a significant improvement in predictive results, which supported the feasibility of the modified model with TCA and justified further study with different heavy metals.

C. PREDICTIVE ANALYSIS OF THREE TYPES OF MODEL FOR HEAVY METAL Cu AND Pb
For practical application, it would be more convenient if the background soil for the artificial samples did not have to come from a specific research area. To test this, we prepared Cu-spiked and Pb-spiked samples using background soil from Changsha City. The purpose of this section was twofold: (1) to verify whether the three types of model discussed above performed consistently for different heavy metals; (2) to explore whether the background soil of the artificial spiked samples could come from a different area than the wild samples. To this end, following the random experiment procedure, we estimated soil Cu and Pb concentrations with the three types of model in a very similar way. Table 4 gives the average prediction results for Cu and Pb in 50 random experiments. Compared with the statistical results for Cd listed in Table 3, three similar experimental phenomena could be summarized: (1) unlike the low-precision performance of the first two models, the values of R 2 and RPD increased dramatically in the modified model with TCA; (2) in contrast, there was a rapid decrease in the RMSE of the third model; for instance, the RMSE dropped from 1.09 to 0.54 in the prediction of Cu; (3) in the 50 random experiments, the standard deviations of R 2 and RMSE showed a gradual downward trend across the three types of model. From the above, we might reasonably conclude that only the modified model with TCA could make valid predictions with higher accuracy. Otherwise, even if quite a few artificially prepared samples were spiked into the training set, the model still showed neither feasibility nor robustness in predicting the test set.
The pie charts in Figure 5 show the prediction success rates of the three models in 50 random experiments. For the sample data collected in the wild, detailed in Table 2, the standard deviations of the Cu and Pb concentrations were considerably large, which exacerbated the drawbacks of modeling with small samples. More specifically, the prediction statistics for the initial model in Fig. 5(a) and Fig. 5(d) show that it was practically impossible to model Cu and Pb successfully with thirteen training samples, while the success rate rose only slightly in the modified model without TCA. However, the modified model with TCA showed a dramatic increase, similar to the statistical results of the Cd experiment. The pie chart in Fig. 5(c) shows that the majority of results were successful predictions, at exactly 94%; similarly, for Pb, the success rate was 84%.
From the results above, we might safely draw three conclusions. First, with the small sample data collected in the natural environment, the training samples could not represent the data features of the test set, so it was difficult to construct an efficient prediction model. As the training samples changed, our experiments showed that the predictive capability of the initial model varied significantly, and it was virtually impossible to predict extreme values such as high concentrations of Pb in wild samples. This work demonstrated that estimating heavy metals in soil was not appropriate with small sample data; a considerable number of samples had to be added to the training set. Second, if the artificially prepared heavy metal samples were added directly to the training set, there was little improvement in prediction accuracy. Because of the obvious differences in spectral reflectance between the two kinds of samples, the data features learned from artificial samples were not compatible with the wild test samples. Therefore, to allow artificial samples to participate effectively in model training, a spectral transformation had to be performed to make the spectral reflectance of artificial and field samples as similar as possible. Third, TCA performed well in soil spectrum conversion. Our experiments showed that, after the TCA transformation, artificially prepared samples could be added to the training set and dramatically improved the prediction accuracy of the model. Compared with the initial model and the modified model without TCA, the modified model with TCA achieved high-precision prediction with good robustness, in spite of random variation of the test samples. Since the above conclusions were based on the average prediction results over 50 random experiments, the following sections provide more detailed experimental analysis to support these viewpoints.
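As a rough illustration of the spectral transformation discussed above, the following Python sketch implements a simplified form of TCA with a linear kernel. The kernel choice, the regularization value `mu`, the number of components and the function name `tca_linear` are assumptions made for illustration only; they are not taken from the experiments reported here.

```python
import numpy as np


def tca_linear(Xs, Xt, n_components=2, mu=1.0):
    """Simplified TCA sketch with a linear kernel (illustrative assumption).

    Xs: source spectra, e.g. artificial samples, shape (ns, n_bands)
    Xt: target spectra, e.g. wild samples, shape (nt, n_bands)
    Returns the source and target samples projected onto shared components.
    """
    Xs = np.asarray(Xs, dtype=float)
    Xt = np.asarray(Xt, dtype=float)
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])

    # Linear kernel matrix over all samples (assumed kernel choice).
    K = X @ X.T

    # L encodes the distance between the source and target sample means.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)

    # Centering matrix, used to preserve the variance of the embedded data.
    H = np.eye(n) - np.full((n, n), 1.0 / n)

    # Leading eigenvectors of (K L K + mu I)^-1 K H K give the projection.
    A = K @ L @ K + mu * np.eye(n)
    B = K @ H @ K
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(A, B))
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real

    Z = K @ W
    return Z[:ns], Z[ns:]
```

In such a setup, a regression model would be trained on the transformed source samples together with the transformed wild training samples, and evaluated on the transformed wild test samples.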

D. SENSITIVITY ANALYSIS OF INITIAL MODEL TO TRAINING SAMPLES
With a small sample of data, how much did the choice of training samples affect the predictive capacity of the model? We examined this question with two groups of experiments using samples collected in the wild. The first group was similar to that described in Section B: samples were randomly selected by means of the MATLAB randperm function. In the second group, the collected samples were sorted according to the laboratory concentration analysis, and the training samples were selected to be as consistent as possible with the test set in concentration range. Each of the two experiments was repeated as 50 random tests to record the best and worst predictions as the training and test samples changed. Table 5 compared the worst and best prediction results in the 50 experiments based on different training and test samples for estimating the content of the three heavy metals. An interesting phenomenon was that, whichever way the training samples were selected, the constructed model had similar prediction performance on the test set. For example, for Cd, the accuracy evaluation factors R², RPD and RMSE fluctuated slightly around 0.7, 1.7 and 0.7, respectively, in the best prediction. A further point worth noting was that there were significant differences between the worst and best prediction results. For Cd, the differences in R², RPD and RMSE under the two sample selection approaches were around 0.7, 0.8 and 0.5, respectively. Although this situation could be improved to a certain extent by better statistical models, the negative effects caused by a small number of training samples could not be ignored in a real environment.
In other words, due to the complexity of pollution causes in the natural environment, large concentration differences were inevitable in the collected wild samples. This implied that satisfactory accuracy could not be achieved on the test set unless a large number of training samples with the same feature distribution were used for modeling.
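The two selection strategies compared in this section can be sketched in Python as follows. This is a minimal illustration and not the MATLAB code used in the experiments: `random_split` mimics a randperm-style shuffle, while `concentration_matched_split` is one hypothetical way to keep the training and test sets in a similar concentration range, namely by picking training samples evenly along the sorted concentrations. The function names and the even-spacing rule are assumptions.

```python
import numpy as np


def random_split(n_samples, n_train, rng):
    """Randomly split sample indices, analogous to a randperm-based selection."""
    order = rng.permutation(n_samples)
    return np.sort(order[:n_train]), np.sort(order[n_train:])


def concentration_matched_split(conc, n_train):
    """Pick training samples evenly along the sorted concentrations (assumed rule).

    The lowest and highest concentrations always fall in the training set,
    so every test sample lies inside the training concentration range.
    """
    conc = np.asarray(conc, dtype=float)
    n = len(conc)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    order = np.argsort(conc)
    # Evenly spaced positions in the sorted list, from first to last.
    pos = (np.arange(n_train) * (n - 1)) // (n_train - 1)
    is_train = np.zeros(n, dtype=bool)
    is_train[order[pos]] = True
    return np.flatnonzero(is_train), np.flatnonzero(~is_train)
```

Repeating `random_split` with different seeds reproduces the idea of the 50 random tests, whereas `concentration_matched_split` is deterministic for a given set of concentrations and would need an additional random element to generate 50 distinct splits.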

E. ROBUSTNESS ANALYSIS OF THE MODIFIED MODEL WITH TCA
Huber systematically defined the concept of robustness at three levels from the perspective of robust statistics [39]. First, the model should show relatively high accuracy or effectiveness; in our research, a robust model was one with good predictive ability according to the evaluation criteria. Second, a small disturbance, such as the spectral noise that inevitably appears in measurement, should have only a small influence on the performance of the model. For the heavy metal prediction model, this meant that the spectral differences between wild samples and artificially prepared samples should not substantially affect the accuracy on the test set. Third, a large deviation from the model assumptions, for instance outliers, should not have a catastrophic impact on model prediction; specifically, in each random experiment, high concentration values of heavy metals in the training set should not change the prediction accuracy of the model to a large extent.
The charts showed the distribution of the accuracy assessment factors over 50 random experiments. According to the evaluation criteria, if the values of R² and RPD were less than 0.5 and 1.5, respectively, the model prediction was deemed unsuccessful in our experiments. In the bar charts shown in Figure 5, the values of R², as well as RPD, had similar distributions in the initial model and the modified model without TCA. Specifically, among the predicted results for the three heavy metals, the values of R² were mostly concentrated in the range below 0.5, and the values of RPD were almost all distributed in the range below 1.5. Such low prediction accuracy indicated that the two models lacked good robustness in terms of the first level mentioned above.
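The evaluation criteria above can be expressed as a short Python sketch. The formulas are the commonly used definitions, assumed here rather than quoted from the text: R² as one minus the ratio of residual to total sum of squares, RMSE as the root mean squared error, and RPD as the sample standard deviation of the measured values divided by RMSE. Treating a prediction as successful only when both thresholds are met is one reading of the criterion, and the function names are hypothetical.

```python
import numpy as np


def evaluate(y_true, y_pred, r2_min=0.5, rpd_min=1.5):
    """Compute R2, RMSE and RPD and flag whether the prediction is successful."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred

    rmse = np.sqrt(np.mean(residual ** 2))
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Assumed definition: standard deviation of measured values over RMSE.
    rpd = np.std(y_true, ddof=1) / rmse

    # Assumed reading: success requires both thresholds to be met.
    success = bool(r2 >= r2_min and rpd >= rpd_min)
    return {"r2": r2, "rmse": rmse, "rpd": rpd, "success": success}


def success_rate(results):
    """Fraction of experiments flagged as successful."""
    return float(np.mean([r["success"] for r in results]))
```

Applying `evaluate` to each of the 50 random experiments and passing the results to `success_rate` would give the kind of success percentage summarized in the pie charts.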
In contrast, the modified model with TCA showed higher accuracy and better effectiveness in the result statistics. In particular, the values of R² were chiefly distributed in the effective range of 0.66 to 0.81; likewise, most RPD values were concentrated in the ranges of 1.5 to 2.0 and 2.0 to 2.5. Compared with the modified model without TCA, an important distinction was that the robustness of the modified model with TCA was not sharply affected by the spectral differences in the training set, which relates to the second level of robustness mentioned above.
However, whether it had better capability in the prediction of outliers deserved further discussion. The scatter plots in Figure 7 showed the best prediction among the 50 random experiments for the three types of model and the three heavy metals. According to the third-level definition mentioned earlier, a model with good robustness ought to be insensitive to the parameter perturbation caused by changes in the training samples. In reality, the presence of outliers usually gave rise to large fluctuations in the prediction results. Since the heavy metal content in the experimental sample data used in this article varied greatly in the high concentration range, particular attention needed to be paid to the prediction of high concentration values. It was clear from the scatter plots that the modified model with TCA was superior to the initial model in the prediction of high concentration values. Specifically, the predicted points were closer to the regression line in the high concentration interval, so its RMSE was expected to be the lowest among the three models, which indicated that the prediction capability of the modified model with TCA was, to a certain extent, not affected by abnormal values such as high concentrations.
In particular, according to the evaluation criteria of model prediction capability, the modified model with TCA gave an excellent prediction in the Cu experiment, with R² and RPD values of exactly 0.93 and 3.23, whereas the R² and RPD values were around 0.5 and 1.5, respectively, in the best results of the other two models. In practical application, a model with good robustness would enhance the reliability of the predicted results. The modified model based on TCA spectral transformation was proposed in this paper through experimental exploration; it compensated for the insufficient number of samples collected in the natural environment by supplementing them with samples artificially prepared in the laboratory. The robustness of the model was demonstrated by the improved prediction accuracy at low cost. In further study, it will be verified more fully whether the artificially prepared samples can be applied over a wider area.

IV. CONCLUSION
Because of the high sampling cost in the natural environment, studies on soil heavy metal contamination were usually based on small sample data, which called the predictive accuracy and robustness of the models into question. In this paper, three heavy metals, namely cadmium, copper and lead, were explored experimentally, leading to the conclusion that a model trained on a small number of samples did not have good prediction accuracy or robustness. To address this, samples artificially prepared in the laboratory were proposed to increase the number of training samples. Because there was a certain spectral difference between artificial and wild samples due to soil heterogeneity, a transfer learning method called TCA was innovatively used to transform the spectra so that their data features became similar. A modified model with TCA was thus presented to improve prediction accuracy and enhance robustness when the training set was supplemented with artificially prepared samples. The research in this article showed that, when only a small number of natural samples were available, artificially prepared samples could be adopted for model training by virtue of spectral transformation. A model with good robustness and prediction accuracy could thus be expected at low cost.