Spectroscopic Identification of Environmental Microplastics

Spectroscopic techniques are widely used to identify the categories of microplastics (MPs) because they are non-destructive, rapid, and require no pretreatment. Spectral categories are usually recognized by matching against a spectral reference library. This works well when the material is in the library, but it fails to blindly identify environmental MPs of unknown origin. In this work, a robust classifier was proposed to differentiate the chemical types of environmental MPs samples, with a recognition rate higher than 0.97. The classifier introduces an adaptive estimator into a k-nearest neighbor (kNN) model as a hard threshold for classifying environmental MPs, so that the interference of spectral distortion and sample diversity is effectively eliminated. The method improves the ability to interpret the spectra of realistic environmental MPs samples.


I. INTRODUCTION
Microplastics (MPs) are plastic fragments smaller than five millimeters, and in recent years they have gained much attention in environmental pollution research. Plastic waste management is an important task for global environmental safety [1]-[4]. Identifying plastics is a precondition of waste management and recycling, and accurate identification of their chemical composition is especially important. Vibrational spectroscopic measurements, including infrared absorption [5]-[7], near-infrared diffuse reflectance [8], [9] and Raman scattering, are widely used because they are non-destructive and require only simple sample preparation. The technology is reliable because it provides molecular structural information [10], [12]. In practical environments, plastic samples display large diversity: MPs samples present various functional groups and contain chemical contaminants [13], [14], and, more importantly, some environmental samples are in different degradation states [15], [16]. Although infrared spectral technology is not sensitive to external interference, all the influences mentioned above lead to considerable modification of the spectra and make identification more challenging. It is difficult to find adequate matches among reference spectra with universal library searching methods [17], and rechecking unmatched spectra manually is very time-consuming. These factors hinder automatic identification and increase manual labor during spectral analysis. Because of the spectral distortions caused by the diversity of environmental plastic samples, it is not easy to build an effective model to analyze sample composition [18], [19].
The associate editor coordinating the review of this manuscript and approving it for publication was Guido Lombardi.
Several reports have proposed recognition algorithms for the automatic identification of MPs categories, such as Principal Component Analysis (PCA) [20] or Random Decision Forest (RDF) [21]. However, the correct classification rate needs further improvement. Hence, an automatic method is urgently required to decrease classification errors and manual work. Some efficient automated identification models have been proposed to analyze unmatched environmental samples, for example by developing an in-house spectral library [22] or adjusting the threshold of the identification model [23]. In addition to traditional threshold-based approaches, machine learning is now widely used to identify MPs through spectral and image recognition. Because machine learning can automatically extract valid information from complex data sets, researchers have applied methods such as CNN to high-dimensional data analysis, for example MPs image recognition [24]. Similarly, classic machine learning methods such as kNN, PCA and SVM [25] have been used for one-dimensional data analysis such as spectral recognition. This paper proposes a new robust identification method to improve the robustness and stability of the conventional model. The k-nearest neighbor (kNN) classifier, a successful machine learning classifier, and its robust version were used to identify diverse environmental plastic debris collected from a wide geographical range of beaches [26]-[29]. The proposed model consists of a simple adaptive estimation of the confounding factors from environmental sample spectra with a low Ratio of Signal to Noise (RSN), and achieves a classification accuracy of up to 90%.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. ROBUST CLASSIFIER
kNN is a widely used classification model for recognizing object classes in pattern recognition [26]. In this model, the boundary of each class is built from the reference (training) data set, and each new test sample is assigned to the closest class in the training data set. The closest class is defined by a nearness metric such as the Euclidean distance or the Mahalanobis distance. However, the kNN classification model with a classic distance estimator is not robust to diverse samples, and the value of the distance metric is sensitive to spectral distortion, which is mainly caused by the diversity of the environment [27]-[30]. These two factors (the diversity and the distortion of spectra) decrease the classification accuracy.
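The classic kNN rule described above can be illustrated with a minimal sketch. It assumes Euclidean distance and majority voting; the function names and the three-point toy "spectra" are hypothetical and are not data from this study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training spectra closest to query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy three-point "spectra" (hypothetical values, not measured data).
train_X = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
train_y = ["PE", "PE", "PS", "PS"]
prediction = knn_predict(train_X, train_y, [0.95, 0.15, 0.05], k=3)
```

Because every training spectrum contributes equally to the distance ranking, a few distorted spectra can shift the vote, which is the weakness the robust version is meant to address.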
In this paper, the proposed robust classifier was built on Least Trimmed Squares (LTS), a typical estimator with a trimming weight function [27]-[30], rather than on adding a Bayes function to the kNN framework [31]. The model keeps the smallest residual values and discards the rest, so only a fraction of the data is used to estimate the mean and covariance of the distance. The approach is more robust to distorted spectra because the distortions are discarded and do not enter the classification solution, which makes it practical for classifying environmental MPs samples [27]-[30]. The sum of squared residual distances from a spectrum to the center of its spectral cluster was determined by Eq. (1), and the resulting distance scalars were then ranked from low to high as shown in Eq. (2). The least trimmed squares estimator was obtained by Eq. (3). In these equations, x_i is the vector of the wavelengths, y_i is the output class, n is the number of wavelengths, RSS is the residual sum of squares, β̂_LTS is the least trimmed squares estimator, h is the number of selected wavelengths, and h/n is the degree of the trimming fraction. This robust method is widely applicable and can easily be transferred to other machine learning classifiers that rely on a distance estimator [29], [30].
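For reference, Eq. (1)-(3) can be written in the textbook LTS form. This is a sketch under that assumption: the residual symbol r_i(β) and its linear definition are the standard formulation and may differ in notation from the original equations.

```latex
\begin{align}
\mathrm{RSS}(\beta) &= \sum_{i=1}^{n} r_i^{2}(\beta),
  \qquad r_i(\beta) = y_i - x_i^{\top}\beta \tag{1}\\
r_{(1)}^{2}(\beta) &\le r_{(2)}^{2}(\beta) \le \dots \le r_{(n)}^{2}(\beta) \tag{2}\\
\hat{\beta}_{\mathrm{LTS}} &= \arg\min_{\beta} \sum_{i=1}^{h} r_{(i)}^{2}(\beta),
  \qquad h \le n \tag{3}
\end{align}
```

With h = n the trimmed sum in Eq. (3) reduces to the ordinary RSS of Eq. (1), so the classic estimator is recovered as a special case.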

B. SAMPLING PROCEDURE
The environmental MPs were sampled at Longwan beach (120°53'34"E, 27°52'19"N, Wenzhou City, China), where various types of daily waste had drifted onto the beach. A stainless-steel shovel was used to collect the upper surface layer of the sampling area. All potential MPs samples were stored in glass containers and transported to the laboratory. The samples were immersed in 30% H2O2 in the dark for 24 hours to digest potential organic matter and biological materials. The samples were then transferred into metallic trays and oven-dried at 60 °C for 0.5 hour. MPs samples with sizes of 1-5 mm and a thickness of about 1 mm were pre-selected. Samples of each type were selected by their external shape and color to maximize the variance of the samples for subsequent analysis. Almost 600 samples were collected from the sampling area.

C. COLLECTION OF FT-IR SPECTRA
A Fourier Transform Infrared spectrometer (FT-IR, Vertex 70, Bruker, Germany) was used in Attenuated Total Reflectance (ATR) acquisition mode to collect the spectra of the MPs, which represent their compositional information. Each sample piece was measured individually on the table-top spectrometer. The spectral background was measured against air with the same settings, and the spectra were collected with OPUS 7.5 software. The spectral range was 4000-500 cm⁻¹ with a spectral resolution of 4 cm⁻¹. Each sample was scanned 10 times in succession at different sites with a small shift, and these 10 spectra were averaged to give the displayed spectral curve. Pretreatments such as smoothing and baseline correction were used to improve the RSN of the spectral dataset, minimizing the influence of uncertain noise arising from physical scattering effects, spectral transformation and manual operations.
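The averaging and pretreatment steps above can be sketched as follows. The text does not specify which smoothing or baseline algorithms were applied, so the moving average and the straight-line baseline used here are illustrative assumptions, and all function names and numbers are hypothetical.

```python
def average_scans(scans):
    """Point-wise mean of repeated scans of the same sample."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def subtract_linear_baseline(spectrum):
    """Remove the straight line joining the first and last points (assumed baseline model)."""
    n = len(spectrum)
    first, last = spectrum[0], spectrum[-1]
    return [y - (first + (last - first) * i / (n - 1))
            for i, y in enumerate(spectrum)]


def moving_average(spectrum, window=3):
    """Smooth with a centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        segment = spectrum[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Two toy scans of one sample (the study averaged 10 scans per sample).
scans = [[1.0, 2.0, 5.0, 2.0, 3.0], [1.0, 4.0, 7.0, 4.0, 3.0]]
mean_spectrum = average_scans(scans)  # [1.0, 3.0, 6.0, 3.0, 3.0]
corrected = subtract_linear_baseline(mean_spectrum)
smoothed = moving_average(corrected, window=3)
```

In practice a Savitzky-Golay filter or a polynomial baseline fit would be common alternatives; the simple versions are used here only to keep the sketch dependency-free.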

D. SPECTRAL VALIDATION
The spectra of the samples were validated with OPUS software. Each spectrum was opened in OPUS and compared with the assigned reference spectral library, and the plastic type of the sample was identified according to the presence of its characteristic peaks and the similarity of the spectral trend. The automatic matching rule of the spectral library was applied in the analysis, which requires that the highest match score of the matched category exceed 80%. Spectra that were not matched to the plastic spectral library in the software could additionally be analyzed manually with expert knowledge, but they were removed in this work. In this automatic matching process, the 500 prepared samples were analyzed with OPUS version 7.5, and four kinds of plastic were recognized: Polyethylene (PE, n = 132), Polyethylene terephthalate (PET, n = 103), Polypropylene (PP, n = 142) and Polystyrene (PS, n = 98). The remaining 25 samples failed to match the reference library and were discarded.
The 475 matched spectra were regarded as belonging to known plastic categories and formed a new spectral dataset. Each category of plastic samples was randomly divided into two subsets with a ratio of 4:3, and these subsets were then combined separately to form two new datasets. One was the calibration subset, containing 271 samples, used to optimize the classifier's parameters and develop the robust classifier. The other was the prediction subset, containing 204 samples, used to validate the performance of the resulting classifier. Before calibration, both spectral subsets were pre-processed to improve the RSN.
Fig. 2 shows the averaged spectra of the four types of plastic samples. Their characteristic peaks varied between the different types of MPs, so the categories appeared easy to classify. In the FT-IR analysis, each plastic showed its own characteristic groups of reflectance features. Fig. 3 shows the FT-IR spectra of the PE samples, which have the most diverse characteristics. These spectra were trimmed to between 500 and 2450 cm⁻¹, and the functional-group form of the PE spectrum was preserved; FT-IR spectra of PE have characteristic features around 2920, 1468 and 720 cm⁻¹. Nevertheless, there were some unique spectral features in the environmental samples. These features were associated with samples in different degradation stages, or were caused by experimental errors with instrument or measurement noise related to, for example, sample thickness, surface roughness or surface contamination. Sometimes the differences between corresponding spectra are not highly characteristic, but in some cases visible differences in obvious properties were present. Compared with the standard spectra of PE plastic, various distortions were found, such as decreased peak intensity at 730 and 715 cm⁻¹; in some cases the peaks disappeared (Fig. 2).
The spectral distortion of PE plastic also demonstrated that a new group was generated through environmental degradation, such as an alkyne bond (C-H) at 1435 cm⁻¹ in some samples. Other absorption bands were observed in some samples at 835 and 637 cm⁻¹ and were attributed to unknown additives. Overall, the differences in absorption and the unavoidable interference in these spectra make the recognition of spectral categories more challenging.
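The per-category 4:3 division into calibration and prediction subsets described in this subsection can be sketched as a stratified random split. The rounding rule and the random seed are assumptions; with the class sizes reported in the text, rounding each class to the nearest integer gives the reported subset sizes of 271 and 204.

```python
import random


def stratified_split(labels, ratio=(4, 3), seed=0):
    """Split sample indices per class so that calibration:prediction is about ratio."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    calibration, prediction = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Nearest-integer rounding of the calibration share (assumed rule).
        n_cal = round(len(indices) * ratio[0] / sum(ratio))
        calibration.extend(indices[:n_cal])
        prediction.extend(indices[n_cal:])
    return calibration, prediction


# Class sizes reported in the text.
counts = {"PE": 132, "PET": 103, "PP": 142, "PS": 98}
labels = [name for name, n in counts.items() for _ in range(n)]
calibration, prediction = stratified_split(labels)
```

Splitting within each category keeps the class proportions of the two subsets close to those of the full dataset.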

III. RESULTS AND DISCUSSION
Four different datasets of MPs samples (PE, PET, PP and PS) were used to validate the universality and effectiveness of the proposed classification method. First, the kNN and the parameter-adjusting kNN were calibrated separately, and their performances were validated on these four datasets and compared. Then the effect of the trimming fraction parameter on the performance of the parameter-adjusting kNN classifier was analyzed. Three evaluation parameters, accuracy (ACC), sensitivity (SEN) and specificity (SPE), were used to assess the performance of the classification models. The higher the value of these evaluation parameters, the better the performance of the classification model. Table 1 shows the comparative performance of the two kNN classifiers. The classic kNN model did not perform well on samples with distorted features and was very sensitive to the environmental samples. It obtained poor performance, with accuracy below 0.925, which was attributed to the diverse or contaminated samples unavoidably present in the datasets. The findings also showed that when the calibration set contained diverse samples, the kNN model gave its poorest accuracy, specificity and sensitivity on almost all datasets tested. In the robust kNN with the adjusting parameter, the trimming fraction was set to 0.5 for classification, which means that half of the covariance in the distance estimation was controlled so as to keep the smallest residual values. Obvious improvements were observed with the robust kNN model, and good performance was obtained on all datasets for both the calibration and the prediction sets: the training and testing accuracies were in the range of 0.958-1.000 (an accuracy approaching 1 indicates a very good classification result).
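The three evaluation parameters can be computed from confusion counts. The sketch below assumes the standard one-vs-rest definitions (the text does not spell them out), and the labels in the toy example are hypothetical rather than results from Table 1.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """ACC, SEN and SPE for one class treated as positive against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "ACC": (tp + tn) / len(pairs),  # share of correct decisions
        "SEN": tp / (tp + fn),          # true positive rate
        "SPE": tn / (tn + fp),          # true negative rate
    }


# Hypothetical labels for ten samples (not data from Table 1).
y_true = ["PE"] * 4 + ["PP"] * 2 + ["PS"] * 2 + ["PET"] * 2
y_pred = ["PE", "PE", "PE", "PP", "PP", "PE", "PS", "PS", "PET", "PET"]
pe_metrics = one_vs_rest_metrics(y_true, y_pred, "PE")
```

The sketch assumes that both the positive class and at least one other class occur in `y_true`; otherwise SEN or SPE is undefined.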
For the PE samples, the accuracy, specificity and sensitivity of the prediction set obtained by the kNN classifier were 0.892, 0.867 and 0.900, respectively, which were lower than those obtained by the robust model; the comparisons reported for the calibration and prediction sets were accuracy (0.958 vs. 0.892), specificity (0.967 vs. 0.900) and sensitivity (0.956 vs. 0.923). The accuracy for PP and PS increased by at most 5.2% and 5%, respectively; this smaller gain was attributed to the datasets of these two kinds of environmental samples, which have large diversity. Although a 5% improvement is not remarkable, a high accuracy of 0.958 was achieved, which is a high classification rate for the recognition of environmental MPs. This indicates that the robust kNN is considerably more robust than the classic kNN when validating diverse or contaminated environmental samples.
Next, the effect of the trimming fraction parameter on the performance of the robust kNN model was analyzed. Table 2 shows the performance of the robust kNN with four trimming fractions (0, 0.15, 0.35 and 0.45) in addition to the existing 0.5. A trimming fraction of 0.5 means that half of the covariance was controlled in the distance estimation to keep the smallest residual values, whereas 0 means that all of the covariance was used in the distance estimation. Compared with the kNN model (trimming fraction = 0), the ACC, SEN and SPE of the robust kNN model on the prediction set increased clearly. However, the robust kNN model was sensitive to the trimming fraction on all datasets. The results also showed that for the PE, PP and PET datasets, the robust kNN with a fraction of 0.45 achieved the best prediction accuracy. Specifically, the average accuracy, specificity and sensitivity of the PE, PP and PET prediction sets validated by the developed robust kNN were (0.981; 1.000; 0.975), (0.975; 1.000; 0.967) and (1.000; 1.000; 1.000), respectively. When the robust model was compared across different fractions, the specificity decreased from 1.000 to 0.922. The robust kNN with suitable parameters is therefore highly efficient in identifying environmental samples. For all datasets, as the fraction decreased, the robust kNN (fraction 0.15) showed worse accuracy but was still better than the classical kNN (trimming fraction 0.0) in accuracy (0.933 vs. 0.892), specificity (0.900 vs. 0.867) and sensitivity (0.944 vs. 0.900) on the calibration and prediction sets.
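The role of the trimming fraction can be illustrated with a deliberately simplified stand-in for the robust model. The sketch assumes that the trimming fraction is the share of spectra discarded per class, estimates each class center from the retained spectra only, and assigns a query to the nearest center by Euclidean distance. The covariance-based distance and the kNN voting of the actual model are omitted, and all names and toy values are hypothetical.

```python
def mean_spectrum(spectra):
    """Point-wise mean of a list of spectra."""
    n = len(spectra)
    return [sum(col) / n for col in zip(*spectra)]


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trimmed_center(spectra, trim=0.5, n_iter=10):
    """Class center from the (1 - trim) share of spectra with the smallest residuals."""
    n = len(spectra)
    h = max(1, n - int(trim * n))  # number of spectra retained
    center = mean_spectrum(spectra)
    for _ in range(n_iter):
        # Rank residual distances from low to high and keep the h smallest.
        ranked = sorted(spectra, key=lambda s: squared_distance(s, center))
        new_center = mean_spectrum(ranked[:h])
        if new_center == center:
            break
        center = new_center
    return center


def classify(query, centers):
    """Assign query to the class whose center is nearest."""
    return min(centers, key=lambda label: squared_distance(query, centers[label]))


# Toy two-point "spectra"; the last two PE entries mimic distorted samples.
pe = [[1.0, 0.0], [1.1, 0.1], [0.9, 0.0], [1.0, 0.1], [5.0, 5.0], [6.0, 4.0]]
ps = [[0.0, 1.0], [0.1, 1.1], [0.0, 0.9], [0.1, 1.0]]
query = [0.8, 0.4]  # a PE-like sample

classic = {"PE": trimmed_center(pe, trim=0.0), "PS": trimmed_center(ps, trim=0.0)}
robust = {"PE": trimmed_center(pe, trim=0.5), "PS": trimmed_center(ps, trim=0.5)}
classic_label = classify(query, classic)
robust_label = classify(query, robust)
```

In this toy case the two distorted PE spectra pull the untrimmed PE center away from the clean cluster, so the query is assigned to PS; with trimming the distorted spectra are discarded and the query is assigned to PE.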
According to the above analysis, although the robust kNN classification model is sensitive to the trimming fraction, its overall performance makes it a satisfactory model for accurately identifying environmental MPs samples. For the PE samples, the prediction accuracy of the robust kNN decreased noticeably and more quickly when the fraction was less than 0.35; the average accuracy (0.981 vs. 0.933), specificity (1.000 vs. 0.933) and sensitivity (0.975 vs. 0.933) of the prediction set are shown in Table 2. Some contaminants may have negatively influenced the calibration of the classifier. For the PET and PP datasets, when the fraction was 0.15 or 0.25, the robust kNN was comparable to the kNN. In Table 2, the accuracy for the PS samples is very stable as the trimming fraction changes, which may be explained by their lower diversity or low level of contamination. It can be concluded that the robust kNN is an effective model for dealing with samples that carry contaminant features, and satisfactory results were achieved even when the sample spectra were subject to many interferences. The results showed that the robust kNN model can not only avoid interference in spectral identification effectively but also enhance the accuracy of spectral recognition. Thus, it is a useful tool for the robust identification of diverse environmental MPs samples.

IV. CONCLUSION
This study set out to develop a simple and effective classifier to identify the category of environmental MPs. A robust classifier was proposed that adjusts an adaptive distance estimation in the conventional kNN to overcome the negative influence of spectral distortions caused by environmental contaminants or plastic degradation. Four types of MPs and more than 400 spectra were used to verify the effectiveness of the robust kNN model. The results demonstrated that the proposed method performs better than the classical kNN, with the average identification accuracy increasing from 0.919 (kNN) to 0.975 (robust kNN). Given these advantages, the proposed method appears more suitable than the existing kNN model for classifying diverse samples. The results illustrate that spectral technology combined with a robust kNN classifier has a significant ability to identify environmental MPs. This work further suggests that the limitations of blind identification of diverse environmental plastics by spectral techniques can be overcome with this robust classifier.
XI CHEN received the Ph.D. degree in electrical system from Xiamen University, China, in 2014. From 2014 to 2015, she was a Visiting Scholar with Nanyang Tech University. Since 2015, she has been an Assistant Professor with the College of Electrical and Electronic Engineering, Wenzhou University. Her research interests include spectroscopy-based identification of microplastics (>1 mm) in coastal environments, fundamental study of spectral detection, and data mining.
JIANCHENG ZHOU received the B.S. degree in electric automation in 2019. He is currently pursuing the M.S. degree with the College of Electrical and Electronic Engineering, Wenzhou University, China. His research interests include spectrum detection, mathematical modeling, and data mining based on infrared, Raman, and chromatographic signals.
LEI-MING YUAN received the B.S., M.S., and Ph.D. degrees from Jiangsu University, China, in 2010, 2013, and 2016, respectively. Since 2016, he has been a Lecturer with the College of Electrical and Electronic Engineering, Wenzhou University. His research interests include photoelectric detection, data mining, agricultural product quality and safety, applied to the quality and safety inspection and non-destructive rapid inspection of agricultural products.
GUANGZAO HUANG received the Ph.D. degree from Xiamen University, China, in 2018. Since 2019, he has been a Lecturer with the College of Electrical and Electronic Engineering, Wenzhou University. His research interests include data analysis, algorithm design and mathematical modeling, mainly used in machine learning, spectral analysis, and chemometrics.