Improving Time Series Reconstruction by Fixing Invalid Values and its Fidelity Evaluation

MODIS time series data have been widely used in the research of regional and global ecosystems and climate change. For vegetation monitoring, vegetation indices such as NDVI (normalized difference vegetation index), EVI (enhanced vegetation index) and NBR (normalized burn ratio), are usually derived from MODIS reflectance data. However, noise usually makes it difficult to generate reliable time series of vegetation indices. Although some methods have been developed for reconstructing NDVI time series data, they still suffer from some limitations. First, there is no reliable approach for detecting and dealing with low-quality data, resulting in poor outcomes. Second, no effective evaluation of the fidelity of the corrected data to the original data has been discussed. For these reasons, we developed a new time series reconstruction approach, named Fixing Invalid Value (FIV) method. The proposed method assumes that the noise in surface reflectance data stems from invalid data, such as clouds, ice, and missing values. The FIV method first uses the spatially and temporally neighboring pixels to estimate the invalid values and then applies morphology operations to remove the residual noise. Finally, the Savitzky-Golay (S-G) filter is employed to generate the final results. The FIV method is tested on 8-day composite MODIS surface reflectance time series data from 2001 to 2012 in Jiangxi and Fujian provinces, China. The results show that the FIV method outperforms the conventional S-G filter and the HANTS method both in terms of visual inspection and quantitative evaluation. Furthermore, the fidelity evaluation reveals that the proposed FIV method produces high-quality time series data under all weather conditions.


I. INTRODUCTION
In the recent three decades, Earth observation satellites have played a central role in monitoring land cover dynamics and ecologic environment. With its high temporal resolution and multi-spectral information, the Moderate resolution Imaging Spectroradiometer (MODIS) data have enhanced our capabilities for monitoring regional and global vegetation dynamics, ecosystem changes, and land disturbances [1]- [3]. MODIS time series data are commonly used to track dynamic changes in the landscape over time. An example is the monitoring of forest by means of time series analysis with vegetation indices (VIs) [4]- [6]. These VIs, including NDVI, EVI, NBR, and other self-defined indices, represent the absorptive and The associate editor coordinating the review of this manuscript and approving it for publication was Peng Liu . reflective characteristics of vegetation in the electromagnetic spectrum. However, MODIS reflectance data unavoidably contain large amounts of noise caused by cloud contamination, atmospheric variability, and aerosol effects. For this reason, the sensors are not always able to obtain the actual information about the land surface, introducing low-quality or missing data. VIs are important indicators of vegetation growth state and are widely used for forest change monitoring by detecting abrupt changes in the time series data. Unfortunately, noise causes inaccurate or severely biased VI breaks, resulting in the distortion of time series and significantly affects the accuracy of terrestrial monitoring.
Comparative studies of these methods have shown that each approach has strengths and limitations and that no method is significantly superior to others [23], [26]- [31]. The MVC method has been recognized by scholars as being essential to the pre-processing of NDVI data, such as the 16-day NDVI data synthesized from MODIS [32] and the 15-day GIMMS NDVI synthesized from the Advanced Very High-Resolution Radiometer (AVHRR) [33]. Although the MVC can eliminate the influence of clouds to some extent, it cannot avoid the presence of some residual noise. The BISE method eliminates noise with a sliding window and a threshold that depends on experience; the method is also timeconsuming. The Fourier-based method does not perform well on irregular or asymmetric time series since it is based on symmetric sine and cosine functions. Some methods, such as the HANTS, the A-G, and the D-L, require the determination of several key parameters that are subjective and depend on experience. The more the parameters, the more complicated the methods' implementation, and the more errors they will introduce.
Apart from the drawbacks of these methods themselves, pre-processing is not sufficiently taken into account. Noise can be caused by different reasons and manifests different characteristics, which is not considered by conventional methods. For instance, the S-G filtering method does not take into account invalid data and treats them without discrimination, which may affect the quality of the original data and result in poor fidelity.
In summary, although time series reconstruction methods have shown the effectiveness in current applications, they still suffer from the following weaknesses: 1) these methods rely on some sort of mathematical principles or formulas and do not make full use of information that comes along with the data itself [34]; 2) they require high-quality original input data and are often significantly influenced by invalid values [35]; and 3) some methods require many input parameters that can be difficult to determine. For instance, A-G and D-L require seven and six parameters, respectively, and HANTS requires five parameters. These parameters are often set manually according to the experience of the user rather than based on established rules [18].
To overcome these limitations, new gap filling methods have been reported in recent studies [11], [19], [36], [37]. To generate high-quality time series data, this paper proposes a compound filtering method based on fixing invalid values (FIV) and S-G filtering. Rather than a completely new method, this is a strategy that tries to reduce as much as possible the negative impacts caused by invalid data. S-G is selected as the basic filtering method since it is still widely reported in the recent literature [30], [38]- [41]. Its popularity is mainly attributed to its effectiveness in removing noise and ability to preserve the shape of peaks in the signal waveform. In particular, the S-G method requires only two parameters, namely the window size m(2 ∼ 7) and the polynomial degree d(2 ∼ 4), which makes it relatively easy to use. Our method has the following characteristics: 1) it combines the MODIS quality control file and cloud detection algorithm to identify invalid pixels; 2) it proposes a 2-dimensional interpolation method to fix the invalid data points in the temporal domains of year and month; and 3) it uses the mathematical morphological method to eliminate residual abnormal concave points in the time series before applying S-G filtering.
In addition to developing a simple and easy to implement approach for time series reconstruction, another goal of this paper is to conduct a fidelity evaluation on the reconstructed results. Whether the invalid data points cause significant effects on the valid data and the magnitude of the effects are rarely investigated. Although a quantitative evaluation framework has been adopted in [27], their assessment is based on synthetic NDVI data, not on real observation. In this paper, we selected an uncontaminated subset of data as the reference data, on which the assessment was performed. Therefore, the main objectives of this paper are summarized as follows: 1) To develop a simple and easily used method to reconstruct high-quality time series, reducing the negative impacts as much as possible; and 2) to conduct a comprehensive fidelity evaluation on the reconstructed results.
The remainder of this paper is organized as follows. In section II, study area and data are introduced. The proposed method and some details are discussed in section III. Results and validation are presented in section IV. Finally, conclusions are given in section V.

II. STUDY AREA AND DATA A. STUDY AREA
The study area consists of two provinces, Jiangxi and Fujian, in the southeast part of China. This region ranges in latitude from 23 o 33 N to 30 o 05 N and longitude from 113 o 34 E to 120 o 40 E (Fig. 1), and it is dominated by hilly and mountainous terrain with a forest cover rate of about 60%. The two provinces are characterized by a subtropical monsoon climate, which is humid and hot in the summer and cold in the winter. Due to high rates of evaporation, this area has high cloud coverage throughout the year, and snow precipitation is occasional in winter months.

B. DATA
For vegetation monitoring, the MODIS VI products (MOD13) provide consistent NDVI and EVI time series data. However, instead of the MOD13 products, we used the MOD09A1 data (surface reflectance products) because of the following reasons: 1) MOD13 provides only NDVI and EVI time series products, while with MOD09A1 we can calculate additional vegetation indices (e.g. NBR). 2) In the study, we found that some cloud covered pixels were not identified by the state flag files, thus we adopted another algorithm based on reflectance data to detect clouds. 3) The NDVI data provided by the MOD13 products have a temporal resolution of 16-days, whereas we can generate VI data with 8-day resolution from reflectance data. Although the MOD09A1 products require more processing, they provide additional benefits. The MOD9A1 products provide 8-day composite surface reflectance data with 7 bands (band 1 -7) at a spatial resolution of 500 m. This product has undergone atmospheric correction to eliminate the influences of atmospheric scattering, ozone, aerosol, and thin clouds as well as geometric correction. In the MOD09A1 reflectance data, band 1 is the red channel, band 2 the near-infrared channel, band 3 the blue channel, and band 4 the green channel. In this dataset, each pixel contains the best possible L2G observation during an eight-day period, and missing pixels have been marked as -28,672. For each image, the MOD09A1 product also provides a state file that indicates the state information (cloud, fog, ice, and snow) for each pixel.

III. METHODOLOGY
The goal of this study was to reconstruct VI time series data for vegetation change detection and monitoring. As we have mentioned before, the VI can be NDVI, EVI, NBR, or other vegetation index. Experiments show that these VIs are highly correlated and display similar patterns in the time series curves. For simplicity, in our experiments we use only the NDVI time series.
The idea of this paper was to eliminate as many lowquality pixels as possible before filtering. Fig. 2 shows the overall flowchart of the whole process. First, the missing pixels were identified and interpolated using linear regression in both the spatial and temporal domains. This processing restored reflectance images as well as provided more valid values for fixing invalid pixels in the later steps. Subsequently, cloud, ice and snow pixels were extracted according to the state flag files, and then residual cloud pixels were identified using a cloud detection method called IBCD algorithm. Thus, these contaminated pixels, also called invalid data points in this paper, have been marked. In the following step, these marked invalid pixels were fixed according to the neighboring valid pixels in the temporal domains of year and month. Finally, the morphological closing operation was employed to remove residual noise and was followed by the S-G filtering. The remainder of this paper details each process.

A. FILLING MISSING PIXELS
This processing is only necessary when the data contain missing pixels. Although these missing pixels could be fixed along with the invalid pixels, we processed them in advance because missing pixels tend to only occur in some small regions and can be reliably estimated in both the spatial and the temporal domains. Once the missing pixels have been filled, more valid values are available when fixing invalid points in the later process. Filling missing pixels consists of the following two steps.

1) MARKING MISSING PIXELS
Due to unknown reasons, the MOD09A1 products in our study area contain some missing pixels, whose values have been set to -28,672. These invalid data points can be easily identified by scanning the whole dataset. Fig. 3 illustrates the basic principle of fixing invalid pixels. Image 1 is the current image containing invalid pixels, in which region A (dark color) indicates the invalid pixels and the outer circular region B is the buffer of A. The pixels in buffer B are valid. Image 2 is the reference data (cloudfree image) that is closest in temporal domain to Image 1, either from the previous or the next image acquisition. Pixels in both regions A' and B' are valid. Because Image 1 and Image 2 are very close in the temporal domain, there is a very strong correlation between the corresponding pixels in region B and region B'. The same correlation also exists between the pixels in region A and A'. If the linear regression relationship between the pixels in region B and B' is established, as shown in (1), the invalid pixels in region A can be inferred from the pixels in A'.

2) INTERPOLATING MISSING PIXELS BASED ON LINEAR REGRESSION
To do this, we first calculate the coefficients α and β by introducing the pixels from region B and region B' into (1). After obtaining α and β, the pixel values in region A can be computed by introducing the pixels in region A' into (1). This replaces the invalid data. To obtain better results, we first calculate the correlation between region B and region B' using both adjacent images from the previous and next acquisitions, and then select the one having a higher correlation with the reference data. In summary, the algorithm can be described as follows: • Step 1: Choose the previous cloud-free image (denoted as Img Pre ) and the next cloud-free image (denoted as Img Next ) that are closest in temporal order to the current image Img as the reference data.
• Step 2: Determine the invalid region A and buffer region B in Img and then calculate the corresponding buffer regions B in Img Pre and B in Img Next .
• Step 3: Calculate the correlation index R using the pixels in B and B, and R between B and B.
• Step 4: If R > R , use Img Pre to fix Img; otherwise, use Img Next to fix Img. It is worth mentioning that this method may not be accurate when applied to large missing areas or when the terrestrial features change over a short period of time. In spite of this limitation, this method was still applicable to our study because the amount of missing data was small. Later experiments also indicated that the results were satisfactory. On the other hand, even if the values are not accurate, the fixed pixels are less biased than the original, thus still reducing negative influences to some extent.

B. PROCESSING INVALID PIXELS
The NDVI time series used in our experiment was derived from the eight-day composite MODIS product. Theoretically, the time series of each pixel in forest areas should reveal a smooth curve that reflects the real phenological information of vegetation. However, because of clouds, snow or ice, and atmospheric conditions, the NDVI curves unavoidably contain noise, failing to correctly represent the actual vegetation seasonality. Thus, the contaminated NDVI values can result in a misinterpretation of forest change and should be fixed.
Before filtering the NDVI time series, we need to identify and fix the invalid pixels. The fixing step is necessary because the invalid values are severely biased and will cause a considerable negative influence on the effectiveness of the filtering algorithm, as shown in Fig. 4, in which the red square boxes indicate the cloud contaminated pixels. Although this sample was chosen from evergreen forest land cover, the S-G filtered result presented abrupt drops in NDVI values during the summer months in each year. This example shows that the S-G filtering fails to remove severely biased noise (invalid values). It is worth mentioning that fixing invalid pixels aims to obtain less biased values rather than accurate values. Following [42], the invalid pixels are only estimated using valid pixels from adjacent acquisitions within halfmonth forward or backward. This reduces the influence of vegetation growth fluctuations.
In this study, we dealt with two categories of contaminated pixels: 1) ice and snow coverage; and 2) cloud obstructions. These contaminated values were defined as invalid data in our study and were fixed in later steps.

1) MASKING ICE AND SNOW
Along with the surface reflectance data, the MOD09A1 product also provides a state flag file for each frame image, using a 16-bit unsigned integer (flag value) to indicate the state of each pixel. For instance, if the binary value of the 15th bit of the flag value (from right to left) is 1, it means that this pixel is covered with ice or snow. According to the state data of the whole image, we can extract the ice and snow cover mask.

2) MASKING CLOUDS
Cloud cover prevents the MODIS sensor from acquiring accurate surface reflectance information of the terrestrial objects, resulting in missing data or severely biased values at these locations. In this study, we identified clouds and fog by using VOLUME 8, 2020 the state flag file as well as using a cloud detection algorithm. Similarly to the missing data points, a portion of cloud points was easily extracted according to the state flag file.

3) CLOUD DETECTION
Although the MOD09A1 state files can identify some cloud contaminated pixels, some residual pixels that cannot be identified may still exist. To address the problem, we used an existing cloud detection method, named inflection-based cloud detection (IBCD) algorithm [43], to identify the remaining cloud pixels. This algorithm is specifically targeted at MOD09A1 time series data, and it searches inflection points that can divide cloudy and clear-sky observations on ascending sorted time series of B3 (the 3rd band) and R37 (the ratio of B3 to the 7th band B7) of MOD09 data (for details, see [43]). The inflection point was used to identify contaminated pixels.

4) FIXING INVALID PIXELS
After the above-mentioned steps, invalid pixels were extracted and marked. To fix these invalid pixels, we proposed a folding interpolation method that reformats 1-dimensional signals into 2-dimensional data by folding the VI time series, as shown in Fig. 5. The NDVI time series in Fig. 5(a) covered a period of 12 years (from 2001 to 2012). Due to the periodical characteristic of vegetation growth, each section within one year had a similar shape; the NDVI values did not vary too much within the same season. After rearranging all the 12 sections into two dimensions, as shown in Fig. 5(b), we obtained a 2-dimensional image in Fig. 5(c), in which the invalid pixels are shown in black. In this way, the original 1-dimensional signal was reformatted into a 2-dimensional (month and year) signal, in which the neighboring pixels had a strong positive correlation. Based on this fact, the invalid pixels were interpolated from the valid neighboring pixels. Each pixel in Fig. 5(c) was processed as follows: where, V o (i, j) presents the original value at location (i, j) in the image; M (i, j) is the 2-D average smoothing operation; w is the radius of the neighboring window and set to 2 in our experiments; and V (i, j) is the final value at (i, j). It is noted that the invalid pixels are not involved in the average smoothing operation M (i, j), implying that only the valid pixels are used to estimate the values that will replace the invalid pixels. Experiments show that this proposed operation can correct invalid pixels as well as remove isolated noise, as shown in Fig. 5(d). After interpolating the invalid pixels, we concatenated the 12 sections to form a new VI time series, in which invalid pixels had been fixed. A comparison between 1-and 2-dimensional interpolation is demonstrated by Fig. 5(e), in which some invalid points that cannot be The above-mentioned correction operation can fill a majority of the invalid values, but experiments show that a small number of abnormal data points still exist in the corrected time series. These abnormal values are obviously lower than their neighbors, and thus form concave breaks in the curves, as shown in Fig. 6(a). As mentioned in some studies [12], [19], since vegetation shows a nearly linear profile over a short period, these lower values should be considered noise rather than vegetation growth changes. To fill these breaks, we applied a basic mathematical morphology operation called closing filtering [44]. Fig. 6(b) gives an intuitive explanation of how closing filtering works. It is achieved by placing an infinite number of identical disks (also called structural elements) in contact with the top of curve A along all the curve and taking the lower boundary of the disks [45]. It is obvious that closing filtering suppresses valleys on the curve while retaining the peaks. For practical use, a linear structural element is usually used instead of a disk. In this paper, the size of the linear structural element is set to 3. As shown in Fig. 6(c), after the closing operation, the curve retains its overall shape; however, two concave points have been eliminated.

2) S-G FILTERING
Since the S-G filtering was proposed, it has received significant attention from researchers. Essentially, S-G filtering is a convolution that can be understood as a weighted average algorithm with a moving window [11]. Its weighting coefficients are not constant; rather, they are obtained by conducting least-squares fitting for the given high-order polynomial in the sliding window. The least-squares fitting can minimize the root-mean-square error, so as to preserve higher moments within the data. The least-squares convolution is defined as follows: In (3), Y * j is the value of the time series data after filtering, namely, the fitted value of the vegetation index; Y j+i is the original value of the time series data, namely, the original value of the vegetation index; C i is the filtering coefficient; m is the size of the moving window; and N represents the number of convolutions, which is generally equal to the number of data points contained in the sliding window.
The S-G filtering is a process of continuously approaching the upper envelope line of the vegetation index, and therefore it can reconstruct a smooth time series. S-G filtering is so widely used that it has been integrated into image processing software such as ENVI, R, MatLab, and TIMESAT. For practical use, the S-G filtering algorithm requires two parameters: the size of the moving window m and the order of the polynomial d. Larger m or smaller d generates smoother curves and vice versa.

IV. RESULTS AND VALIDATION
The experiments consist of two parts: 1) missing data fixing and evaluation; and 2) time series data reconstructing and fidelity evaluation. were chosen. The missing pixels in the image of day 81 were estimated using the valid images of day 73 (the previous acquisition, Fig. 7(a)) and of day 89 (the next acquisition, Fig. 7(c)). For comparison, besides the proposed method, we also applied two other approaches, namely triangulated irregular network (TIN) spatial interpolation and averaging method, to estimate the missing pixels.
The TIN interpolation method directly uses the single phase of the image (day 81) to interpolate the missing pixels; the averaging method fills the missing pixels with the average value of the images of day 73 and 89. The linear regression method estimates the missing pixels according to the image of day 89 or 73.
A visual inspection shows that the result of the TIN interpolation method was not satisfactory due to a blurring effect. As shown in Fig. 7(d), this method only produced piecewisesmooth patches, failing to restore the texture. On the contrary, both the averaging method ( Fig. 7(e)) and the linear regression method (Fig. 7(f)) produced satisfactory results, with clear boundaries and discernible terrestrial objects.
To carry out a quantitative evaluation on the three methods, we conducted a simulation experiment on three consecutive image acquisition. We first selected a valid region from the image of day 81 ( Fig. 8(a)), and then created an artificial missing area by deleting some pixels ( Fig. 8(b)). Finally, the TIN interpolation method, averaging method, and linear regression method were applied to the missing area. The results are presented in Fig. 8 Table 1.
We can observe that the TIN interpolation method produced a larger min_err and mean_err, whereas the averaging method and the linear regression method produced smaller errors. Although the latter two methods achieved comparable results, the averaging method required two adjacent valid images whereas the linear regression method required only one (either the previous or subsequent acquisition). Therefore, the linear regression method had lower data requirements and was easier to use.

B. TIME SERIES RECONSTRUCTION AND EVALUATION 1) RESULTS AND VISUAL INSPECTION
The NDVI time series data used in this experiment were derived from the MOD09 surface reflectance products from 2001 to 2012, in Jiangxi and Fujian provinces. The dataset consisted of 46 image acquisitions per year, altogether adding up to 552 acquisitions. For comparison, the S-G filtering and the HANTS methods integrated in TIMESAT were also applied to the time series data.
Chen [11] noted that in most cases, when the window size (m) and the degree of the polynomial (d) are set to 4 and 6, respectively, the S-G filtering usually yields the best results. The two parameters are suggested when the interval in the MODIS data is 16 days. In our study, eight-day composite data were used. In this case, optimized results can be obtained when m is set to 8 and d 6. For the HANTS filter, the parameters are set as follows: fit error tolerance (ERR_TOL = 0.02), number of frequencies (NUM_FREQ = 36), degree of overdeterminedness (DOD = 5), min valid data (MIN = 0.0), and max valid data (MAX = 1.0). Fig. 9 presents the reconstructed images on day 161 (in the middle of May) of 2008; Fig. 9(a) is the color composite image, in which some areas were covered by thick clouds, and (b) is the corresponding NDVI image. Fig. 9(c), (d) and (e) are the NDVI images reconstructed by S-G, FIV+SG (the proposed method in this paper), and HANTS, respectively. The results show that the three methods obtained significant gains in details of the occluded terrestrial objects and the edges of rivers and lakes were restored. Compared with Fig. 9(c), the terrestrial objects in Fig. 9(d) and (e) are more discernible, indicating that the S-G method is more affected by the contaminated pixels and yields poorer results than the two other methods.
To better compare the results in the temporal dimension, we randomly selected some forest sample points from the study area, depicting the original and the reconstructed NDVI curves. Because of space limitations, only 4 samples are presented here, as shown in Fig. 10. Since these samples were from evergreen forest, their NDVI curves should have exhibited strong seasonality. However, because of cloud and snow cover, the original NDVI time series curves was contaminated, resulting in abrupt drops. Due to the severe influence of the invalid data, the S-G method produced NDVI curves (blue curves) that fell far below the upper envelope and deviated from the seasonal trend of forest growth. By contrast, the two other methods produced better results, approximating the upper envelopes of the original curves. Comparing the results of FIV+SG and HANTS, it can be observed that the FIV+SG method was less affected by invalid data because it had eliminated the majority of the invalid data before filtering, making the filtered results more consistent with the seasonal trend of the forest. The results also demonstrate that the curves reconstructed by FIV+SG were closer to the upper envelope line of the original NDVI curves, whereas the HANTS method did not approximate the upper envelope in some parts of the time series.

2) ERROR EVALUATION
Time series reconstruction aims to restore contaminated values. To assess the performance of the proposed and other existing methods, ground truth NDVI data are needed. However, since obtaining ground truth data was not possible for this study, the assessment has been carried out on synthetic data instead of real data. We followed the quantitative analysis framework adopted in [12], [19]. Ideal NDVI data were synthesized using the average of the reconstructed results generated by the three methods and were regarded as ground truth data. Then, five levels of noise were simulated by randomly selecting 10%, 30%, 50%, 70%, and 70% of the points. The values of these selected points were randomly reduced by 10% -90%, simulating cloud contaminations. Subsequently, the individual reconstructed data were compared against the ideal data, and the absolute errors were calculated on these modified points. Finally, the statistics (maximum, mean, and standard deviation) of the errors were calculated on 116 random samples of the study area drawn from different land use types, including forest, farmland, bare land, cities, and VOLUME 8, 2020   other mixed types, as shown in Fig. 9(f). Table 2 shows the statistical results under different levels of noise.
One can notice that S-G always produced the largest maximum errors under all levels of noise, whereas FIV+SG and HANTS produced smaller maximum errors. FIV+SG did not always outperform HANTS under all levels of noise. It is worth mentioning that maximum errors may just occur on several abnormal points, and thus they are not always reliable for performance evaluation. By contrast, mean errors are more reliable. As expected, FIV+SG always produced the smallest mean errors under all levels of noise, and S-G the largest. Additionally, FIV+SG outperformed HANTS both in terms of mean errors and standard deviations. Further analysis on the mean errors indicated that the performance of S-G degraded rapidly with increasing levels of noise, whereas FIV+SG was stable and robust, as shown in Fig. 11. Compared with HANTS, FIV+SG was more effective and produced smaller and more concentrated errors.

3) FIDELITY EVALUATION
An excellent reconstruction algorithm should effectively restore the contaminated data as well as preserve the original valid data. To evaluate how well the algorithm can retain the original high-quality data points, we conducted another quantitative analysis by comparing the reconstructed results with the original data. This assessment can be conducted on the real data instead of the synthetic data, because some pixels are acquired under clear weather and can be considered 7566 VOLUME 8, 2020  high-quality data. If the algorithm preserves high fidelity, these high-quality pixels are expected to remain unchanged after applying the reconstruction algorithm. Based on this idea, the fidelity was evaluated on the difference between the original high-quality pixels and the corresponding reconstructed results. Fig. 12(a) and (b) show an example of a subset of a high-quality image on day 113 of 2012. Comparing the three results in Fig. 12(c), we can see that the color tone of forest areas in the image reconstructed with S-G was darker than that of the original data in Fig. 12(b), indicating that the S-G filtering resulted in a decline of the NDVI values in forest areas. In contrast, the HANTS and FIV+SG results were consistent with the original data. Fig. 12(d) illustrates the residuals between the original data and the three reconstructed results. In the residual error images, red and orange indicate larger errors, and blue and purple imply smaller errors. Comparing the three residual error images, we can notice that HANTS and FIV+SG produced smaller overall residuals than the S-G method.
To obtain a comprehensive quantitative evaluation, we selected 12 high-quality images from the entire dataset through visual inspection. These images were acquired under clear weather conditions, and several of them had only a negligible portion of cloud cover (Fig. 13). To reduce the cloud influence to the utmost extent, these cloudy areas were detected and ignored during the evaluation. Therefore, the rest of the chosen images were regarded as the reference data that reflected the exact information from the terrestrial surface, and the NDVI images derived from these reference data were considered to be of high quality. The reconstructed  NDVI images were then compared against these reference NDVI images to check how well the algorithm retained the original high-quality data. Table 3 presents the evaluation results, in which the summary statistics, namely, maximum, mean, and standard deviation (stdev) are derived from the residuals of each image. According to the magnitude of the mean residuals, the 12 images can be divided into two groups: A and B. Comparing the statistics of S-G with that of FIV+SG and HANTS, we can notice that, in both groups, the three methods produced comparable maximum residuals. Experiments show that maximum errors often occur on several abnormal points and make no sense for performance evaluation, whereas mean errors can provide valuable information. As shown in Table 3, in terms of mean residuals and standard deviations in group A and B, FIV+SG and HANTS always outperformed S-G. Smaller means and standard deviations indicate that the residuals produced by FIV+SG and HANTS were more concentrated around zero. This was further confirmed by the histograms of the mean residuals in Fig. 14 and Fig. 15, from which we can notice that all the histograms of FIV+SG and HANTS were narrower and that their peaks were closer to zero. On the contrary, S-G had a flatter histogram in each image, especially in group B.
It is worth noting that in group A, FIV+SG and HANTS produced comparable mean residuals and standard deviations, indicating that both methods worked well in this case. In group B, FIV+SG always outperformed HANTS in terms of either mean residuals or standard deviations. Fig. 15 also demonstrates that FIV+SG had narrower histograms than HANTS.
From Fig. 14 and 15, we can further notice that S-G produced relatively steeper histograms in group A but very flat ones in group B, whereas FIV+SG always produced steep histograms in both groups. By inspecting Table 3, we can notice that all the mean residuals produced by FIV+SG were   less than 0.1, whereas S-G produced mean residuals that were less than 0.1 in group A and that were larger than 0.1 in group B. By further inspection of the time series, we discovered that the previous and subsequent images in group A were also acquired under clear weather conditions or slightly cloud contaminated, whereas those in group B were severely contaminated. Fig. 16(a), presenting 12 consecutive image subsets from the 17th to the 105th day of 2005, is a typical example of a dense evergreen broad-leaved forest located in Sanming District, Fujian Province. Among these images, five of them (2005-017,065,089,097,105) were acquired under clear weather conditions, and the others were severely cloud contaminated. Fig. 16(b) shows the original and recon- structed NDVI curves at the center point of this region. The clear data  indicate that the NDVI values should be about 0.7, but the reconstructed NDVI value in image 2005-065 decreased to about 0.3 after the S-G filtering, due to the influence of the previous and subsequent contaminated NDVIs. By contrast, the FIV+SG method produced the best result and restored the contaminated data while retaining the original uncontaminated data. This example further proved that FIV+SG can obtain satisfactory results under bad weather conditions, whereas S-G is significantly influenced and HANTS is moderately influenced by cloud cover.
To further clarify this fact, we plotted the mean residuals of S-G, FIV+SG, and HANTS in group A and B, as shown in Fig. 17. In group A, S-G did not produce severely degraded results compared to FIV+SG and HANTS. However, in group B, the results of S-G significantly degraded due to the influence of previous or subsequent contaminated data. The results show that, under bad weather conditions, the overall residuals produced by the S-G was about 5.5-9.2 times larger than that of FIV+SG. Similarly, the HANTS method also performed well under clear weather conditions, but could not rival FIV+SG under bad weather conditions. According to the above fidelity evaluation results, we can notice that the S-G method is severely influenced by weather conditions and cannot retain the original high-quality data. This means that applying the S-G filtering to the time series will cause a decline in the values of the original highquality NDVI. Especially when cloudy weather conditions are observed over consecutive acquisitions, the S-G results are severely degraded. In contrast, the FIV+SG and HANTS methods produce better results than S-G under either clear or bad weather conditions. Although both HANTS and FIV+SG produce comparable results under clear weather conditions, FIV+SG outperforms HANTS in the case of severe cloud contamination. In summary, compared with the S-G filtering and the HANTS method, our proposed approach produces time series data with high fidelity under both clear and cloudy weather conditions.

V. CONCLUSION
High-quality VI time series data are the key prerequisite to the long-term monitoring of forests. Although some methods for reconstructing time series data have been proposed, none of them has been recognized as the optimal solution. This paper proposed a simple and easily implemented approach that has achieved high fidelity on the severely contaminated MODIS data of Jiangxi and Fujian provinces in Southeast China. Instead of being a completely new method, the basic idea behind the proposed approach is to mark and fix invalid data points before applying the S-G filter. This paper also proposed a method for fidelity evaluation. The experimental results suggest the following conclusions: 1) Due to severe cloud obstructions in Southeast China, it is difficult to obtain high-quality MODIS reflectance data. Applying the S-G filtering algorithm directly to the VIs time series data will unavoidably cause large residuals, resulting in inaccurate time series data. Fixing the invalid data before applying the filtering operation is an effective way to address this problem.
2) To fill in the missing data, we can employ the averaging method or the linear regression method. Both methods can achieve satisfactory results. However, the averaging method requires at least two cloud-free images, one of which is acquired before the target image and the other after; therefore, its application is restricted. Comparatively, the linear regression method requires only one valid image, which lowers the data requirements, making it more applicable.
3) The simulation experiments indicate that the proposed method works well in restoring the contaminated time series data and produces smaller mean errors than the S-G filtering and the HANTS methods. S-G filtering degrades rapidly with the increase of noise levels, whereas HANTS is more stable and the proposed method is the best of the three methods.
4) The fidelity evaluation shows that the proposed method preserves the original high-quality pixels better than the S-G filtering and the HANTS method. In terms of fidelity, the proposed method produces mean residuals that are less than 0.035 under any weather conditions, whereas the S-G method produces mean residuals ranging from 0.040 to 0.072 under clear weather conditions and ranging from 0.111 to 0.202 under cloudy weather conditions. The proposed method rivals the HANTS method under clear weather conditions, whereas, outperforms HANTS under cloudy weather conditions.
The key factor for the excellent performance of this approach is that the proposed FIV+SG method fixes a large number of invalid data points and uses mathematical morphology to further remove abnormal points before applying the S-G filter, which prevents these severely biased invalid data from adversely influencing the valid data. This strategy is not only applicable to S-G filtering, but it can also be extended to other filtering methods.