Spectra-GANs: A New Automated Denoising Method for Low-S/N Stellar Spectra

Numerous spectra can be obtained from sky surveys such as the Sloan Digital Sky Survey and the Large Sky Area Multi-Object Fibre Spectroscopic Telescope. However, a considerable fraction of these spectra, although still valuable for astronomical research, are of low quality, most notably possessing a low signal-to-noise ratio (low-S/N). Principal component analysis is widely used to process such low-S/N spectra, but it cannot efficiently describe the non-linear properties within them. Wavelets are also often used to denoise low-S/N spectra; however, the optimal wavelet basis must be determined for each type of spectrum, which makes wavelet analysis difficult to use in practice. The restricted Boltzmann machine is a non-linear algorithm, but it performs poorly when applied to low-S/N spectra. The Denoising Convolutional Neural Network (DnCNN) is a promising denoiser, yet its performance is unsatisfactory owing to the lack of a suitable noise model. To better exploit low-S/N spectra, we propose a new method, the Spectra Generative Adversarial Nets (Spectra-GANs), which yields better denoised spectra than those obtained using other methods. Spectra-GANs is a feedforward neural network that learns the difference between the input vector and the target by minimising a loss function, and it can be applied directly to spectral denoising. Spectra-GANs performs better than the other methods in denoising the spectra, especially in processing extremely low-S/N spectra. Thus, the proposed Spectra-GANs is a suitable alternative to previously used methods for spectral denoising.


I. INTRODUCTION
Multi-object spectroscopy has substantially improved the effectiveness of observation, and large data sets have been obtained from modern spectroscopic surveys such as the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST or Guo Shoujing Telescope) [1], [2]. Such large data sets provide a robust basis for astronomical research. However, a considerable fraction of these data are of low quality. Such spectra show significant defects, including large noise, indistinct spectral-line features, low signal-to-noise ratio (low-S/N), continuum anomalies, and splicing anomalies, of which low-S/N spectra account for a great part. Many factors result in low-S/N spectra, such as shot noise, skylight background, cosmic-ray noise, readout noise, and dark-current noise. The processing and analysis of these low-S/N spectra are of great significance for improving spectral utilisation, enabling multi-band cross-validation, and discovering rare celestial bodies. Historically, these spectra were inspected by humans. However, systematic processing by experts is too time-consuming to be a practical proposition for large-scale surveys. Therefore, the efficient and automatic processing of these low-S/N spectra is one of the main difficulties to be overcome.
In this paper, we introduce a new method, the Spectra Generative Adversarial Nets (Spectra-GANs), to denoise low-S/N spectra. Generative Adversarial Nets (GANs), as a promising class of generative models, are among the most popular neural networks. Goodfellow et al. first proposed GANs, and its variants have produced remarkable results in various applications, including image synthesis, semantic image editing, image style conversion, and image super-resolution [3]. Since then, several studies on GANs have been conducted. Mirza et al. proposed the Conditional Generative Adversarial Networks (CGANs), which introduced conditional variables in both the generation model G and the discriminant model D [4]. Gulrajani et al. proposed the Wasserstein GAN, which theoretically analysed the reasons for the difficulty of training traditional GANs and formulated the Wasserstein distance [5]. Zhang et al. proposed the Stacked Generative Adversarial Networks (StackGANs) to generate photo-realistic images conditioned on text descriptions [6]. Because convolutional neural networks have substantial advantages with regard to image processing, GANs such as LAPGAN [7] and DCGAN [8] also use the structure of convolutional neural networks. Similar to LAPGAN, SimGAN can make images more realistic by using real unlabelled images [9]. To better achieve style transfer between different domains, Zhu et al. presented an approach called the Cycle-consistent Generative Adversarial Nets (Cycle-GANs), which learns how an image can be translated from a source domain to a target domain [10]. In astronomy, CGANs have been used to recover features in astrophysical images of galaxies beyond the deconvolution limit [11]. Stark et al. also used CGANs trained on galaxy images to recover point sources and host-galaxy magnitudes with less systematic error and lower average scatter [12].
In this study, we developed a method to denoise low-S/N spectra based on GANs. Our method is inspired by Cycle-GANs. Similar to Cycle-GANs, Spectra-GANs attempts to capture the special characteristics of one data collection and determine how these characteristics can be translated into another data collection. Therefore, Spectra-GANs can convert a spectrum from one representation to another, such as from a low-S/N representation to a high signal-to-noise ratio (high-S/N) representation. To fairly assess our method for denoising stellar spectra, we compare it with three typical stellar spectral denoising algorithms, namely Wavelet, RBM, and PCA. In addition, we also compare our method with a promising supervised denoising method called the Denoising Convolutional Neural Network (DnCNN) [32].
The layout of this paper is as follows. In Section III, we provide a brief introduction to Spectra-GANs. In Section IV, we describe the spectral data used in the experiment. In Section V, we compare the performance of Spectra-GANs with those of PCA, Wavelet, RBM, and DnCNN with regard to spectral denoising. In Sections VI and VII, we discuss the limitations and possible applications of Spectra-GANs. Finally, we conclude the paper in Section VIII.

II. RELATED WORK
Principal Component Analysis (PCA) is among the most widely used techniques for processing astronomical data [13]. Deeming first applied PCA to process astronomical spectra [14]. Since then, PCA has proven to be a viable tool in astronomy, for example, in the classification and detection of atypical spectra by reducing the dimensionality of the original spectral data [15]-[22]. In addition, it is also widely used for processing low-quality spectra such as incomplete data. Connolly et al. used PCA to address the problem of incomplete and noisy galaxy spectra and demonstrated that this method could derive an optimal interpolation that reconstructs the underlying galaxy spectral energy distributions in the regions of missing data [23]. PCA has also been applied to reconstruct missing regions for the gap correction of quasar and galaxy spectra: the gaps were first repaired by other methods, a set of eigenspectra was then constructed from the gap-repaired spectra, and finally the gaps in the original spectra were corrected via a linear combination of the eigenspectra [24]. Re Fiorentin et al. applied PCA to accurately estimate stellar atmospheric parameters and found that it could act as an effective filter for denoising and recovering missing fluxes of the spectra [25].
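As a hedged sketch of this kind of PCA denoising (the function and data names are ours, not from the cited works), a set of spectra can be projected onto the leading eigenspectra and reconstructed from the truncated basis:

```python
import numpy as np

def pca_denoise(spectra, n_components):
    """Denoise spectra by projecting onto the first n_components
    eigenspectra and reconstructing (illustrative helper, not the
    authors' exact pipeline)."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Eigenspectra = principal axes of the flux covariance matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    eigenspectra = vt[:n_components]      # (k, n_pixels)
    coeffs = centred @ eigenspectra.T     # (n_spectra, k)
    return mean + coeffs @ eigenspectra   # low-rank reconstruction

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 8 * np.pi, 200))
noisy = clean + 0.3 * rng.standard_normal((50, 200))
denoised = pca_denoise(noisy, n_components=3)
```

Because the noise is spread over many principal components while the signal concentrates in the leading ones, the low-rank reconstruction suppresses most of the noise.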
In addition to PCA, many other methods have also been used in the pre-processing of astronomical spectra. Lu et al. used wavelets to extract features for denoising stellar spectra by removing higher-frequency components [26]. Bu et al. proposed the Restricted Boltzmann machine (RBM) as a substitute for PCA for repairing incomplete spectra, spectral denoising, and spectral dimensionality reduction [27]. To improve the efficiency of spectral pre-processing, Wang et al. applied a deep neural network to recover defective spectra [28].

III. METHOD

A. GANs
To elucidate our method more clearly, we first introduce GANs. GANs are generative models consisting of two neural networks: a generator G and a discriminator D. The instances x obtained from the true data distribution p_data(x) are considered real, and the candidates G(z) produced by the generator G from some noise z ∼ p(z) are considered fake. The training goal of the generator G is to minimise the discrimination accuracy of the discriminator D. In contrast, the training objective of the discriminator D is to maximise its discriminative accuracy. That is, the discriminator D tries to classify the instances x drawn from p_data(x) as real and the synthetic data G(z) as fake, whereas the generator G tries to make the discriminator D classify G(z) as real, as shown in Fig. 1. The loss function is formulated as

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]. (1)

We trained D to maximise the probability of assigning the correct label to both training examples and samples from G:

max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))]. (2)
Simultaneously, we trained G to minimise log(1 − D(G(z))):

min_G E_{z∼p(z)}[log(1 − D(G(z)))]. (3)

In other words, D and G played a two-player minimax game.
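As a toy numerical illustration of this value function (the one-parameter sigmoid discriminator here is ours, purely for demonstration), V(D, G) can be evaluated on samples treated as real and fake:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def value_fn(d_real, d_fake):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

rng = np.random.default_rng(0)
w = 2.0                                 # toy linear discriminator D(u) = sigmoid(w * u)
real = rng.normal(1.0, 0.1, size=100)   # samples from p_data
fake = rng.normal(-1.0, 0.1, size=100)  # samples produced by G(z)
d_real, d_fake = sigmoid(w * real), sigmoid(w * fake)

v = value_fn(d_real, d_fake)  # D tries to maximise this; G tries to minimise it
```

A discriminator that separates the two sample sets attains a higher V than an uninformative one (w = 0, where V = 2 log 0.5), which is exactly the maximisation step of the minimax game.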
To better achieve style transfer between different domains, Cycle-GANs was proposed. Our method is inspired by Cycle-GANs; therefore, we briefly introduce it in the following section.

B. CYCLE-GANs
In contrast with GANs, Cycle-GANs combines two GANs in different directions into one model. Thus, it has two generators, G: X → Y and F: Y → X, as well as two discriminators, D_X and D_Y, where D_X aims to distinguish between images {x} and translated images {F(y)}, and D_Y aims to discriminate between images {y} and {G(x)}. The adversarial loss consists of two parts:

L_GAN(G, D_Y, X, Y) = E_{y∼p_data(y)}[log D_Y(y)] + E_{x∼p_data(x)}[log(1 − D_Y(G(x)))], (4)
L_GAN(F, D_X, Y, X) = E_{x∼p_data(x)}[log D_X(x)] + E_{y∼p_data(y)}[log(1 − D_X(F(y)))]. (5)

Meanwhile, to further regularise the generators, Cycle-GANs introduces two cycle-consistency losses, which require that each domain return to its initial state after passing through the two generators: (a) the forward cycle-consistency loss, x → G(x) → F(G(x)) ≈ x, and (b) the backward cycle-consistency loss, y → F(y) → G(F(y)) ≈ y. These two cycle-consistency losses can be combined into one function:

L_cyc(G, F) = E_{x∼p_data(x)}[||F(G(x)) − x||_1] + E_{y∼p_data(y)}[||G(F(y)) − y||_1]. (6)

The full objective function is then

L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F), (7)

where λ controls the relative importance of the two objectives.
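As a toy illustration of cycle consistency (the "generators" here are trivial shifts, purely for demonstration), the L1 cycle loss vanishes exactly when the two generators invert each other:

```python
import numpy as np

def l1(a, b):
    """Expected L1 distance, E[||a - b||_1], over a batch of vectors."""
    return np.mean(np.sum(np.abs(a - b), axis=1))

def cycle_loss(x, y, G, F):
    """L_cyc(G, F): forward term F(G(x)) ~ x plus backward term G(F(y)) ~ y."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)

# Toy invertible "generators": G shifts a spectrum up, F shifts it back down.
G = lambda s: s + 1.0
F = lambda s: s - 1.0

x = np.random.default_rng(0).random((4, 8))
y = G(x)
loss = cycle_loss(x, y, G, F)  # near zero: the cycle returns to its start
```

If F only partially undoes G, the loss becomes strictly positive, which is what drives the regularisation during training.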

C. SPECTRA-GANs
The Spectra-GANs method is inspired by Cycle-GANs. Similar to Cycle-GANs, Spectra-GANs is also composed of two generators (G and F) that satisfy both the forward mapping X → Ŷ and the reverse mapping Ŷ → X̂. In fact, this resembles the idea behind the variational autoencoder [29], which is to adapt to different inputs to produce different outputs and to improve the convergence speed of the algorithm.
However, unlike Cycle-GANs, Spectra-GANs learns the spectral translation using paired samples. As shown in Fig. 2, X represents a low-S/N spectrum, and Y represents the corresponding high-S/N spectrum that has the same right ascension (R.A.) and declination (Decl.) as the low-S/N one. Then, owing to our objective of denoising, there is only one cycle, that is, X → Ŷ → X̂, and only one discriminator D_Y. The objective of our model comprises three loss functions: the adversarial loss, the cycle-consistent loss, and the generation-consistent loss: • The adversarial loss, as shown in Eq. (4), consists of the adversarial loss for G.
• The cycle-consistent loss is similar to Eq. (6) and is used to prevent mode collapse. As there is only one cycle, our formula contains only the first half of Eq. (6), that is:

L_cyc(G, F) = E_{x∼p_data(x)}[||F(G(x)) − x||_1]. (8)

• The generation-consistent loss is shown in Eq. (9). To further reduce the space of possible mapping functions of the generator and to make the forward mapping of the cycle converge toward the objective more quickly, we added this generation-consistent loss function (as shown in Fig. 2):

L_gen(G) = E_{x,y∼p_data(x,y)}[||G(x) − y||_1]. (9)
Finally, the full loss function is the sum of these three loss functions:

L(G, F, D_Y) = L_GAN(G, D_Y, X, Y) + λ_1 L_cyc(G, F) + λ_2 L_gen(G), (10)

where λ_1 and λ_2 are hyper-parameters that are set equal to each other in this study.
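Putting the three terms together, the full objective can be sketched as follows (our own illustrative implementation; the λ values are placeholders, since the paper only states that λ_1 = λ_2):

```python
import numpy as np

def full_loss(d_fake, y_hat, x_hat, x, y, lam1=10.0, lam2=10.0):
    """Sum of the three terms of the full objective: the generator's
    adversarial term, the one-directional cycle-consistent L1 term,
    and the generation-consistent L1 term."""
    l_gan = np.mean(np.log(1.0 - d_fake))  # G wants D_y(G(x)) -> 1, so it minimises this
    l_cyc = np.mean(np.abs(x_hat - x))     # X -> Y_hat -> X_hat should return to X
    l_gen = np.mean(np.abs(y_hat - y))     # Y_hat should match the paired high-S/N Y
    return l_gan + lam1 * l_cyc + lam2 * l_gen

x = np.ones((2, 4))
y = 2.0 * np.ones((2, 4))
# Perfect generator: y_hat == y and x_hat == x, so only the adversarial term remains.
loss = full_loss(d_fake=np.full(2, 0.5), y_hat=y, x_hat=x, x=x, y=y)
```

With a perfect generator, both L1 terms vanish and the loss reduces to the adversarial term alone, illustrating how the paired L1 penalties pull the generator toward the ground-truth high-S/N spectrum.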
Therefore, compared to Cycle-GANs, the tasks of the discriminator remained unchanged; however, the generator was tasked to not only fool the discriminator but also to be near the ground truth output in an L1 sense.

IV. DATA
LAMOST is a quasi-meridian reflecting Schmidt telescope located at the Xinglong Station of the National Astronomical Observatories, Chinese Academy of Sciences. It is a special reflecting Schmidt telescope with 4000 fibres and can be used to obtain spectra of celestial objects as well as sky background and calibration sources. The spectra used in this study are from the fifth data release of LAMOST (LAMOST DR5) [30], [31]. By July 2017, LAMOST had completed the first five years of its regular surveys, which began in September 2012, and these spectra have been released through LAMOST DR5. After this six-year survey (including the pilot survey), a total of 8 952 297 spectra were obtained, including those of 7 930 178 stars, 152 608 galaxies, 50 132 quasars, and 819 379 unknown objects. These spectra have different S/N values. In LAMOST, the S/N is defined as

S/N = (1/N) Σ_{n=1}^{N} Flux_n √(φ_n), (11)

where Flux_n represents the spectral flux, φ_n stands for the inverse variance of one flux point, and N is the number of flux points. In LAMOST, the calculation of the inverse variance requires error propagation over many variables, which is difficult to reproduce in a deep-learning setting, so the S/N is not used as a metric to evaluate the denoising effect in this paper. The data used in this experiment were divided into three groups: D1, which consisted of spectra with 10≤S/N≤15 together with their counterparts with S/N≥50; D2, which consisted of spectra with 5≤S/N≤10 together with their counterparts with S/N≥50; and D3, which consisted of spectra with 2≤S/N≤5 together with their counterparts with S/N≥50 (see Table 1). Note that each group consisted of two parts: the first part consisted of high-S/N spectra and the second part of low-S/N spectra. For each high-S/N spectrum in the first part, there was a corresponding low-S/N spectrum in the second part; these were spectra of the same celestial body observed by LAMOST at different times (see, e.g., Table 2 and Fig. 3). The difference between D1, D2, and D3 was that the spectra in the second part covered different S/N ranges.
The performance of our method on different datasets demonstrated the ability of our method to process spectra with different S/Ns.
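Under the per-pixel form of the S/N definition above (an assumption on our part; LAMOST's pipeline applies fuller error propagation), the computation is simply an average of flux times the square root of its inverse variance:

```python
import numpy as np

def lamost_snr(flux, ivar):
    """S/N as reconstructed from the text: the mean over flux points of
    Flux_n * sqrt(phi_n), where phi_n is the inverse variance.
    Illustrative only; the production formula is more involved."""
    return np.mean(flux * np.sqrt(ivar))

flux = np.full(100, 10.0)
ivar = np.full(100, 0.25)    # per-pixel sigma = 2, so flux/sigma = 5
snr = lamost_snr(flux, ivar)
```

With a constant flux of 10 and a per-pixel standard deviation of 2, this yields an S/N of 5, i.e. a spectrum at the faint end of group D3/D2.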
The resolution of the LAMOST spectra is R ∼ 1800, and the wavelength coverage is 3700-9100 Å. We only used the wavelength range between 4000 Å and 8096 Å, which covers the important spectral lines necessary for our analysis. We did not take any further pre-processing steps such as continuum subtraction, except normalising the fluxes of each spectrum to [0, 1] using the following equation:

Flux'_n = (Flux_n − min(Flux)) / (max(Flux) − min(Flux)). (12)

Each of the three aforementioned groups was divided into two subsets: a training set and a test set. The training sets of D1, D2, and D3 contained 6000, 7000, and 5000 spectra, respectively, and all the test sets consisted of 1000 spectra. During training, the number of epochs was set to 20000 and the batch size was 5.
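The [0, 1] flux normalisation described above amounts to a standard min-max scaling per spectrum:

```python
import numpy as np

def normalise(flux):
    """Min-max scale a spectrum's fluxes to [0, 1], as in the text."""
    lo, hi = flux.min(), flux.max()
    return (flux - lo) / (hi - lo)

spec = np.array([2.0, 4.0, 6.0, 10.0])
scaled = normalise(spec)  # minimum maps to 0, maximum maps to 1
```

Because the scaling is affine, relative line depths within each spectrum are preserved while the dynamic range becomes uniform across the training set.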

V. EXPERIMENTS AND RESULTS
In the experiment, the reconstructed low-S/N spectra were compared with the corresponding high-S/N ones to verify the denoising effect. Spectra with S/N≥50 were used as the high-S/N ones, and the corresponding spectra with 2≤S/N≤5, 5≤S/N≤10, and 10≤S/N≤15 were used as the low-S/N ones. Each group of data was processed by Spectra-GANs, and as a means of comparison, we also used PCA, Wavelet, RBM, and DnCNN to denoise the spectra. For PCA, we obtained eigenspectra from the test spectra in D1, D2, and D3. Considering the denoising effect, the final eigenspectra used were the first 3 eigenspectra of D1, the first 8 eigenspectra of D2, and the first 83 eigenspectra of D3, which yielded a relevant variance contribution rate greater than 99% [19]. With regard to the Wavelet, the Haar wavelet is a simple and effective basis often used in denoising astronomical spectra [26]; therefore, we used Haar as the wavelet basis. We experimentally found that the 2nd-level decomposition retains the spectral-line features well while reducing the noise, so we reconstructed the spectrum from the coefficients of the 2nd-level decomposition. For the RBM (see Appendix A), the hidden-layer vectors are similar to the PC vectors given by PCA. We experimentally set the size of the hidden layer of the RBM to 500 and the number of epochs to 1000 [27]. We set the DnCNN (see Appendix B) parameters similar to those described in [32]. In addition, the DnCNN needs pure data and pure noise to train itself, so we used Kurucz synthetic spectra [33] as the pure data and White Gaussian Noise (WGN) as the pure noise (details are given in Appendix B).
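For illustration, a minimal Haar decomposition with reconstruction from the 2nd-level approximation (our own sketch, not the authors' implementation; in practice a library such as PyWavelets would be used) can be written as:

```python
import numpy as np

def haar_level(s):
    """One level of the Haar transform: approximation and detail coefficients."""
    s = s[: len(s) // 2 * 2]              # drop a trailing sample if length is odd
    a = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    d = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return a, d

def inv_haar_level(a, d):
    """Invert one Haar level from approximation and detail coefficients."""
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2.0)
    out[1::2] = (a - d) / np.sqrt(2.0)
    return out

def haar_denoise(flux, levels=2):
    """Discard the detail coefficients of the first `levels` scales and
    rebuild, i.e. reconstruct from the level-2 approximation."""
    a, zeroed = flux, []
    for _ in range(levels):
        a, d = haar_level(a)
        zeroed.append(np.zeros_like(d))   # drop high-frequency detail
    for dz in reversed(zeroed):
        a = inv_haar_level(a, dz)
    return a
```

With the Haar basis, this reconstruction is equivalent to averaging over blocks of four pixels, which suppresses high-frequency noise while keeping slowly varying features.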
We applied these five methods on the low-S/N spectra and compared how well the reconstructed spectra matched the original high-S/N spectra both quantitatively and qualitatively.
A. QUALITATIVE COMPARISON
Figure 4, Figure 5, and Figure 6 show the denoised results of Spectra-GANs, PCA, DnCNN, RBM, and Wavelet on spectra selected from D1, D2, and D3, respectively. In these figures, the five coloured lines represent the denoised results of the five denoising methods. Among these, PCA, Wavelet, and RBM are unsupervised methods, while Spectra-GANs and DnCNN are supervised methods. These figures indicate that our method (the results of which are shown in orange) obtains better denoised results than the other methods.
Furthermore, compared with the other methods, our method was able to not only preserve the spectral lines consistently, but also effectively correct inaccurate continuum, such as the undesired continuum slope in Fig. 4. From this point of view, our method can perform many other tasks, such as recovering defective spectra, repairing incomplete data, and so on. We plan to further explore the application of Spectra-GANs in processing other types of low-quality spectra in the near future.
As shown in these figures, PCA was able to reconstruct the most important spectral lines and eliminate most skylights, but the width and depth of the spectral lines did not match well. Moreover, PCA could not correct the inaccurate continuum, i.e., the continuum of the reconstructed spectrum was not consistent with the original one. For the DnCNN, the denoising effect on the observed spectra was slightly worse than that of the other methods. Although the DnCNN is a promising denoiser, our experiments show that it tends to produce over-smooth spectra. The most likely reason is that the DnCNN in our experiment was trained using WGN owing to the lack of a suitable noise model, whereas the real noise of the observed spectra is more complex. Therefore, its performance was not as good as expected. The RBM and Wavelet also did not produce satisfactory denoising effects.
To conclude, our method outperforms the other methods. The results suggest that our method may have learned the noise model of the spectra and can effectively convert spectra with low-S/N values to those with high-S/N values.

FIGURE 4. Comparison of denoised results obtained using Spectra-GANs, PCA, DnCNN, RBM, and Wavelet from D1. The green line represents the original low-S/N spectra, the blue line the original high-S/N spectra, the orange line the denoised spectra using Spectra-GANs, the red line PCA, the purple line DnCNN, the brown line RBM, and the pink line Wavelet. The figure shows that all methods can recover the most important lines, whereas only Spectra-GANs can better correct the inaccurate continuum, such as the undesired continuum slope.

FIGURE 5. Same as Fig. 4, except the results are of spectra from D2. The figure shows that Spectra-GANs can almost perfectly recover the line intensities and continuum. PCA was able to recover the most important spectral lines, but their width and depth did not match well. DnCNN tended to produce over-smooth spectra. RBM and Wavelet could not denoise the spectra.

B. QUANTITATIVE COMPARISON
For the quantitative analysis, we measured the mean absolute error (MAE) of the Lick/IDS indices and of the full spectrum between the original high-S/N spectrum and the reconstructed high-S/N spectrum. The MAE was defined as

MAE = (1/M) Σ_{i=1}^{M} |e_i|, (13)

where e_i is the residual of the corresponding flux (or Lick index) between the original high-S/N spectrum and the reconstructed high-S/N spectrum, and M represents the number of flux points in the spectrum. The MAE is the average difference between the original high-S/N spectrum and the reconstructed high-S/N spectrum. Hence, when the MAE was high, the reconstructed spectrum was significantly different from the original spectrum, and the performance of the method was unsatisfactory; otherwise, the performance was considered good. The value of the MAE can therefore be used to measure the performance of our method.

FIGURE 6. Same as Fig. 4, except the results are of spectra from D3. As described in Fig. 5, Spectra-GANs performed better than the other methods in the cases of extremely low-S/N spectra.
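A minimal sketch of this MAE computation (the flux values are illustrative):

```python
import numpy as np

def mae(reconstructed, original):
    """Mean absolute error between reconstructed and original high-S/N fluxes."""
    e = reconstructed - original       # residual at each flux point
    return np.mean(np.abs(e))

high_snr = np.array([1.0, 2.0, 3.0, 4.0])
denoised = np.array([1.1, 1.9, 3.2, 4.0])
score = mae(denoised, high_snr)        # (0.1 + 0.1 + 0.2 + 0.0) / 4
```

The same function applies to vectors of Lick indices instead of fluxes; only the interpretation of the residuals changes.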

1) PERFORMANCE ON LICK INDEX
In this subsection, we compare the performance of Spectra-GANs, PCA, DnCNN, RBM, and Wavelet in recovering the Lick indices. The Lick/IDS index system is a set of absorption-line indices, defined by the Lick group for low-resolution (∼8 Å) spectra with wavelength coverage between 4000 Å and 6400 Å [34]-[39], which is often used to measure the atmospheric parameters of stars and galaxies.
In this experiment, we first denoised the spectra with Spectra-GANs, PCA, DnCNN, RBM, and Wavelet, and then computed the Lick indices of the denoised spectra. The derived Lick indices were then compared with those derived from the corresponding high-S/N spectra, and the MAE between these two sets of Lick indices was used to measure the performance of the different methods in denoising the spectra. The wavelength definitions of the Lick indices are shown in Table 3. Figure 7 shows the MAEs for the three test sets D1, D2, and D3 derived with Spectra-GANs and the other four methods. The plots show that Spectra-GANs performs significantly better than the other methods in terms of Lick-index recovery, especially for the Balmer lines (Hβ, Hδ_A, Hγ_A, Hδ_F, and Hγ_F). However, PCA performs slightly better than Spectra-GANs when recovering Fe4531, Fe4668, and Fe5709 from spectra with 5≤S/N≤10. Nevertheless, Spectra-GANs was able to recover spectral lines better than the other methods, even for extremely low-S/N spectra (2≤S/N≤5).

2) PERFORMANCE ON THE FULL SPECTRUM
In this subsection, we compare the full spectra denoised by Spectra-GANs with those denoised by PCA, DnCNN, RBM, and Wavelet. We first denoised the spectra with the different methods and then computed the MAE between the denoised spectra and the corresponding high-S/N spectra. Evidently, a method with smaller MAE values performs better than one with larger MAE values. The results are shown in Fig. 8, Fig. 9, and Fig. 10. As indicated in these figures, our method presents the smallest mean and variance values compared to the other methods, and the DnCNN produces the largest ones owing to the lack of a suitable noise model. Furthermore, the mean and variance values of the other three methods are almost two times larger than those obtained with our method.

FIGURE 11. Four example spectra on which Spectra-GANs failed. The green line represents the original low-S/N spectra, the blue line represents the original high-S/N spectra, and the orange line represents the denoised spectra using Spectra-GANs. The reason for the failure in the top panels was that there were too many skylights in some low-S/N spectra. The reason for the failure in the bottom panels was faulty correspondence; that is, the high-S/N spectra and the corresponding low-S/N spectra (as shown in Table 2) were not from the same star.
In general, the MAE obtained by Spectra-GANs was significantly smaller and more concentrated than that obtained by the other methods, indicating that the denoised spectra obtained by Spectra-GANs were closer to the corresponding true high-S/N ones than those of the other methods.

C. ANALYSIS OF TIME COMPLEXITY
In this subsection, we analyse the time complexity of Spectra-GANs, PCA, DnCNN, RBM, and Wavelet. The experiments were carried out in the PyCharm 2019.2.4 environment on a workstation with an Intel(R) Core(TM) i9-9900X (3.50 GHz) CPU, 64 GB of memory, and a Windows 10 operating system.
We measured the denoising time on the test sets of D1, D2, and D3, each of which contains 1000 stellar spectra. From Table 4, we can conclude that the time complexity of our method is almost equal to that of the DnCNN and higher than those of PCA, RBM, and Wavelet. The reason is that, as deep-learning algorithms, the models of Spectra-GANs and DnCNN are more complex, so their time complexity exceeds that of the other three methods. In addition, Spectra-GANs and DnCNN must be trained before denoising, and the time complexity of training is much higher than that of the denoising process. However, in the processing of astronomical spectra, most stellar spectra do not require real-time processing, so time complexity is not an important evaluation metric. Meanwhile, compared to the other methods, another advantage of our method is that the trained Spectra-GANs can be directly used to denoise LAMOST low-S/N stellar spectra in the same S/N range as the training set.

VI. FAILURE ANALYSIS
Although in most cases Spectra-GANs performed better than the other methods in denoising the spectra, it failed on some spectra. In this section, we show some examples where Spectra-GANs faced difficulties. There were three major failure cases: (1) As shown in the top panels of Fig. 11, there were too many skylights in some low-S/N spectra; the randomness of these skylights was beyond the scope of the effective input, which led to reconstruction failure. (2) As shown in the bottom panels of Fig. 11, there were some mismatches; that is, the high-S/N spectra and the corresponding low-S/N spectra (as shown in Table 2) were not from the same star. This case could have led to certain training and test errors. (3) In addition to these two cases, Spectra-GANs may fail on rare objects that were absent or scarce in the training set, because the performance of our method depends on the training set; Spectra-GANs is unable to reconstruct features that are absent from the training set. It is worth noting that although Spectra-GANs did not perform well on some spectra, such cases were very rare. More examples of the results are included in Appendix C, which demonstrate that in most cases our proposed method denoises the spectra well.

VII. DISCUSSION
The proposed method learns how to derive characteristics from a collection of spectra (such as those with low-S/Ns) and determines how these characteristics can be translated into other types of spectra (such as those with high-S/Ns).
The results demonstrate that Spectra-GANs can effectively denoise low-S/N stellar spectra from LAMOST, especially stellar spectra with extremely low S/N (2≤S/N≤5). Our method can also be applied to process other types of low-quality spectra, such as defective spectra, incomplete spectra, and so on. We also expect our method to be applied to spectra from other spectroscopic surveys such as the SDSS and 2dF. Although our method works very well when dealing with low-S/N spectra, there are still some shortcomings. The main limitation of our method is that the training set ultimately limits the ability of Spectra-GANs to recover features. We trained on spectra in pairs and applied the trained model to spectra in a similar S/N range; if we intend to apply our method to spectra in another S/N range, we should train the model using spectra in that S/N range with an appropriate number of epochs. Another limitation is that sophisticated details such as weak spectral lines are impossible to recover perfectly, as the underlying probability model is complex. Furthermore, if rare objects are absent from the training set, the method may fail, as illustrated in Section VI.

VIII. CONCLUSION
In this study, we propose a method called Spectra-GANs to automatically improve the spectral quality of LAMOST spectra. Our method does not assume any prior knowledge, nor does it require any manual intervention. Hence, it is not task-specific but rather a general-purpose method that can be used for different spectral-processing tasks. As verified by the experimental results, compared to other unsupervised and supervised methods, Spectra-GANs achieves a better generalisation performance on spectral denoising. Spectra-GANs also has better scalability for various low-S/N spectra than the other methods. These improvements are expected to significantly increase our ability to classify stellar populations, estimate stellar distances and ages, and solve many other scientific challenges. Thus, Spectra-GANs is a promising method for processing low-S/N spectra. Certainly, Spectra-GANs also has limitations. For example, PCA takes less time to extract principal components or obtain decomposition coefficients, whereas Spectra-GANs requires a long training time.

APPENDIX A RBM
In this section, we provide a brief introduction to the RBM. The RBM is a stochastic neural network rooted in statistical mechanics; it is a two-layered structure composed of a visible layer and a hidden layer. The network has no connections within each layer and is fully connected between layers (Fig. 12). Its neurons are random neurons whose outputs have only two states (inactive and activated), generally represented by the binary values 0 and 1; the value of the state is determined according to statistical rules. The RBM is an energy-based model. The energy of the joint configuration of the visible variables v and the hidden variables h is

E(v, h; θ) = −Σ_i b_i v_i − Σ_j a_j h_j − Σ_{i,j} v_i W_{ij} h_j, (14)

where θ represents the parameters {W, a, b}, W is the weight of the edge between the visible and hidden units, and b and a are the biases of the visible and hidden units, respectively. With the energy of the joint configuration of v and h, we can obtain their joint probability:

P(v, h; θ) = e^{−E(v,h;θ)} / Z(θ), (15)

where Z(θ) = Σ_{v,h} e^{−E(v,h;θ)} is the normalisation factor, also known as the partition function. According to Eq. (14), Eq. (15) can be written as

P(v, h; θ) = (1/Z(θ)) e^{Σ_i b_i v_i + Σ_j a_j h_j + Σ_{i,j} v_i W_{ij} h_j}. (16)

The likelihood function P(v) of the observed data is maximised. P(v) can be obtained by calculating the marginal distribution of P(v, h) over h in Eq. (16):

P(v; θ) = (1/Z(θ)) Σ_h e^{−E(v,h;θ)}. (17)

The parameters of the RBM are obtained by maximising P(v), which is equivalent to maximising the log-likelihood L(θ) = log P(v; θ). (18)
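A tiny numerical check of the energy and joint probability above (brute-force partition function over all binary states; the network size and parameters are illustrative):

```python
import numpy as np

def energy(v, h, W, b, a):
    """E(v, h) = -b.v - a.h - v.W.h, with b the visible biases and
    a the hidden biases, matching the notation in the text."""
    return -b @ v - a @ h - v @ W @ h

def joint_prob(v, h, W, b, a):
    """P(v, h) = exp(-E(v, h)) / Z, with Z summed over all binary states.
    Feasible only for tiny RBMs; used here to verify normalisation."""
    nv, nh = len(b), len(a)
    states = lambda n: [np.array(list(np.binary_repr(i, n)), dtype=float)
                        for i in range(2 ** n)]
    Z = sum(np.exp(-energy(vv, hh, W, b, a))
            for vv in states(nv) for hh in states(nh))
    return np.exp(-energy(v, h, W, b, a)) / Z
```

For realistic layer sizes the partition function is intractable, which is why RBMs are trained with approximations such as contrastive divergence rather than exact likelihood maximisation.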

APPENDIX B DnCNN
In this section, we give a brief introduction to DnCNN and its training sets. DnCNN is a deep convolutional neural network for Gaussian denoising that uses residual learning and batch normalisation to achieve fast convergence and good performance in deep networks for image restoration. This neural network has 17 layers (Fig. 13), and its structure is as follows: Conv+ReLU: in the first layer, 64 filters of size 3 × 3 × c are used, followed by a ReLU activation; Conv+BN+ReLU: in the middle layers, 64 filters of size 3 × 3 × 64 are used, with batch normalisation added between convolution and ReLU; Conv: in the last layer, the c-dimensional image is reconstructed as output through c filters of size 3 × 3 × 64.
BN represents Batch Normalization, which normalises the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1. The goal of BN is thus to achieve a stable distribution of activation values throughout training, which can significantly accelerate convergence and avoid gradient explosion. ReLU stands for the Rectified Linear Unit, an activation function given by

ReLU(x) = max(0, x). (19)

Rectifier units help to find better minima than other activation functions during training [44].
In addition, the receptive field size of a network of depth d is (2d + 1) × (2d + 1); with the network depth d = 17, the receptive field is therefore 35 × 35. The input is y = x + v, where y is the noisy observation, x is the clean image, and v is the noise (the residual image). The network learns the residual mapping R(y) ≈ v, i.e., it predicts the noise rather than the clean image. The loss function is the averaged mean squared error between the predicted residual and the true residual:

\ell(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \| R(y_i; \theta) - (y_i - x_i) \|^2, \quad (20)

where θ represents the trainable parameters to be learned and N represents the number of training images. DnCNN needs pure data and pure noise to train itself. The pure data used in this paper are synthetic spectra from the KURUCZ model [33], calculated using the SPECTRUM program; the resolution of the spectra is convolved to 2000, which is basically consistent with the resolution of the LAMOST spectra. The wavelength range is 3000 Å∼10000 Å. In order to facilitate the subsequent processing by Spectra-GANs, we select the wavelength range from 4000 Å to 8095 Å.
The stellar spectra contain many sources of noise, such as shot noise, skylight background, cosmic-ray noise, readout noise, dark-current noise, and so on, so the noise model is very complicated. In order to simplify the calculation, WGN is commonly used as the noise model for stellar spectra, so the pure noise used in this paper is also WGN. Therefore, in the DnCNN, the pure data set and the pure noise set consist of synthetic spectra and WGN, respectively, and the synthetic low-S/N spectrum is the superposition of a synthetic spectrum and WGN (as shown in Fig. 14). The S/N of the synthetic spectrum is calculated by the following formula:

S/N = \bar{F} / \sigma, \quad (21)

where \bar{F} is the mean flux of the spectrum and σ is the standard deviation of the noise, so the WGN at a given S/N is calculated as:

\mathrm{WGN} = \frac{\bar{F}}{S/N} \cdot G, \quad (22)

where G represents a one-dimensional random array drawn from a standard Gaussian distribution. In order to keep the data distribution consistent with the training sets of Spectra-GANs (as shown in Table 1), we build three data sets for training DnCNN (as shown in Table 5).
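Under the S/N definition of Eq. (21), generating WGN at a target S/N per Eq. (22) can be sketched as follows (an illustrative NumPy sketch; the flat "spectrum" is a stand-in for a synthetic KURUCZ spectrum, not real data):

```python
import numpy as np

def add_wgn(spectrum, target_snr, rng):
    """Superpose white Gaussian noise on a spectrum so that
    S/N = mean(flux) / sigma_noise equals target_snr (Eqs. (21)-(22))."""
    sigma = spectrum.mean() / target_snr
    noise = sigma * rng.standard_normal(spectrum.size)
    return spectrum + noise, noise

rng = np.random.default_rng(3)
clean = np.full(4096, 100.0)     # stand-in for a synthetic spectrum
noisy, noise = add_wgn(clean, target_snr=5.0, rng=rng)
measured_snr = clean.mean() / noise.std()
```

Pairing each clean spectrum with such a noisy copy at several target S/N levels yields the (noisy input, noise target) pairs DnCNN trains on.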

APPENDIX C LICK/IDS INDEX
In this section, we briefly introduce the Lick index. The Lick/IDS index system measures 25 optical absorption features, 19 of which are atomic indices and 6 of which are molecular bands (as shown in Table 3). Each index is measured according to the following scheme: two pseudo-continuum bandpasses are defined, one on each side of a central bandpass; a straight line representing the continuum is drawn between the midpoints of the two flanking bandpasses, and the flux difference between this line and the flux in the central bandpass determines the index. There are two kinds of indices: molecular bands, expressed in magnitudes, and atomic features, expressed in angstroms of equivalent width. In a pseudo-continuum (P) bandpass (λ1, λ2), we calculate the average bandpass flux from the spectrum:

F_P = \frac{\int_{\lambda_1}^{\lambda_2} F_\lambda \, d\lambda}{\lambda_2 - \lambda_1}. \quad (23)

A molecular band measured in magnitudes is then

\mathrm{Mag} = -2.5 \log_{10} \left[ \frac{1}{\lambda_2 - \lambda_1} \int_{\lambda_1}^{\lambda_2} \frac{F_{I\lambda}}{F_{C\lambda}} \, d\lambda \right], \quad (24)

and an equivalent width as the atomic feature is

\mathrm{EW} = \int_{\lambda_1}^{\lambda_2} \left( 1 - \frac{F_{I\lambda}}{F_{C\lambda}} \right) d\lambda, \quad (25)

where F_{Iλ} and F_{Cλ} are the flux per unit wavelength in the index passband and the straight-line continuum flux in the index passband, respectively.

[Displaced figure caption: Denoising examples from D1. The green line represents the original low-S/N spectra, the blue line the original high-S/N spectra, the orange line the denoised spectra using Spectra-GANs, the red line PCA, the purple line DnCNN, the brown line RBM, and the pink line Wavelet.]
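The scheme for an atomic index (Eq. (25)) can be sketched in NumPy (a simplified sketch using a toy Gaussian absorption line and hypothetical bandpass limits, not an official Lick bandpass definition):

```python
import numpy as np

def lick_ew(wavelength, flux, blue, red, central):
    """Equivalent width of an atomic feature, Eq. (25): average the flux
    in the blue and red pseudo-continuum bandpasses, draw a straight-line
    continuum between their midpoints, and integrate 1 - F_I/F_C over
    the central bandpass. Assumes a uniform wavelength grid."""
    def band_mean(lo, hi):
        m = (wavelength >= lo) & (wavelength <= hi)
        return wavelength[m].mean(), flux[m].mean()

    lam_b, f_b = band_mean(*blue)
    lam_r, f_r = band_mean(*red)
    m = (wavelength >= central[0]) & (wavelength <= central[1])
    # Straight-line continuum through the two pseudo-continuum midpoints.
    f_c = f_b + (f_r - f_b) * (wavelength[m] - lam_b) / (lam_r - lam_b)
    step = wavelength[1] - wavelength[0]
    return np.sum(1.0 - flux[m] / f_c) * step

# Toy spectrum: flat continuum with a Gaussian absorption line at 5000 A
# (depth 0.5, sigma 5 A), on a 0.1 A grid.
wl = np.linspace(4900.0, 5100.0, 2001)
fl = 1.0 - 0.5 * np.exp(-0.5 * ((wl - 5000.0) / 5.0) ** 2)
ew = lick_ew(wl, fl, blue=(4920.0, 4950.0), red=(5050.0, 5080.0),
             central=(4975.0, 5025.0))
```

For this toy line the analytic equivalent width is 0.5 · σ · √(2π) ≈ 6.27 Å, which the numerical integral recovers.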

APPENDIX D DENOISING EXAMPLES
In this section, we present some randomly selected denoising examples obtained with our method and the other methods in Fig. 15, Fig. 16 and Fig. 17.