Carbon Sinks and Variations of pCO2 in the Southern Ocean From 1998 to 2018 Based on a Deep Learning Approach

The Southern Ocean comprises 25% of the global ocean surface area, accounts for nearly half of the total carbon sink of the global oceans, and is a place that significantly reduces the impacts of anthropogenic CO2 emissions. Due to the sparsity of observational data, the changes in Southern Ocean carbon sinks over time remain uncertain. In this study, we integrated correlation analysis and a feedforward neural network to improve the accuracy of carbon flux estimations in the Southern Ocean. Based on observation data from 1998–2018, we reconstructed the Southern Ocean's pCO2 grid data during this period. The root-mean-square error obtained by fitting the observation data was 8.86 μatm, indicating that the results were better than those of the two primary statistically based models in the Surface Ocean pCO2 mapping intercomparison. The results also showed that the Southern Ocean's capacity to act as a carbon sink has gradually increased since 2000; it reduced during 2010–2013 but increased significantly after that. The Southern Ocean's seasonality is characterized by minimum carbon uptake in winter due to increased upwelling; this is followed by a rapid increase toward maximum uptake in summer, which is mainly biologically driven. There is an apparent double-ring structure in the Southern Ocean, as noted in other studies. This study confirms that the inner ring (50–70°S) is a carbon source area gradually transforming into a carbon sink, while the outer ring (35–50°S) continues to serve as a carbon sink.

has absorbed ∼30% of anthropogenic carbon dioxide emissions [2]. The ocean plays a crucial role in absorbing atmospheric CO 2 ; without this absorption, earth's surface temperature would be much higher [3], [4]. Past observational data and model estimates have indicated that the Southern Ocean is responsible for absorbing up to half of all anthropogenic CO 2 emissions [2], [5]. Therefore, the Southern Ocean is inextricably linked to the global carbon cycle and climate change [6], [7]. Although the Southern Ocean is essential, due to a lack of observational data, past studies in this area have not provided accurate calculations of the trends in CO 2 changes, let alone an understanding of the driving factors of such changes [8].
Traditional estimates, such as atmospheric inversions approaches, were limited by the sparse atmospheric CO 2 measurements [9], [10]. Some spatial and temporal interpolation based on empirical relationships between CO 2 and proxy variables [11], [12] were largely focused on the relatively observation-rich regions [13]. The emergence of biogeochemical models temporarily provided more credible solutions for predicting CO 2 fluxes in this region [14], but this model simulation relied on the correctness of the process simulation structure [15].
In recent years, neural network algorithms have been widely employed to reconstruct surface pCO 2 data. Compared with statistical interpolations, neural network approaches are not limited by sparse observations and can adequately represent the interannual variability [15]. Gregor et al. [16] used a support vector machine (SVM) and random forest (RF) to reconstruct the pCO 2 data of the Southern Ocean. The resultant root-meansquare errors (RMSEs) were 16.45 and 24.04 μatm. Meanwhile, Landschützer et al. [17] merged a self-organizing map (SOM) and feedforward neural network (FFNN) to construct the SOM-FFNN method to reconstruct the pCO 2 data of the Southern Ocean. The inputs include sea surface temperature (SST), sea surface salinity (SSS), the depth of the mixed layer (MLD), chlorophyll concentration (CHL), and other parameters. Their results indicated that the Southern Ocean carbon sink stagnated or even decreased between 1980 and 2000, but then gradually recovered its original strength in ∼2002. Both data products showed good interannual and seasonal cyclical changes, but the SOM-FFNN algorithm offers better performance than the machine learning algorithm (SVM and RF). Denvil-Sommer et al. [18] employed the Laboratory of Climate and Environmental Sciences (LSCE)-FFNN method to reconstruct global This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ pCO 2 data, which maintained consistency with observational results. However, compared with the observed data, the Southern Ocean's reconstructed data have a larger error than other regions with more in situ observations.
In this study, using the Surface Ocean CO 2 ATLAS (SOCAT V.6) data of the Southern Ocean from 1998 to 2018, we applied the correlation analysis (CA)-FFNN method to reconstruct the monthly and 1°× 1°pCO 2 data of the Southern Ocean. FFNN have advantages over other algorithms because it produces data more stable in sparse regions [15] and interpolates the sparse pCO 2 data with small bias [19]. This method is primarily divided into two steps. First, the correlation index of each parameter is calculated and sorted. Second, using the parameters with relatively high correlation coefficients as the input variables of the FFNN, the correlation matrix between pCO 2 data and other observed variables is constructed, and correlation parameters are used to reconstruct the pCO 2 data in the blank area of the Southern Ocean. This method facilitates data analysis and reconstruction for specific regions and improves the current situation wherein sites with limited observation data have higher RMSE values. Therefore, this approach may be generalizable for reconstructing regional data. Finally, we analyzed the seasonal, interannual, and interdecadal variations of pCO 2 in the Southern Ocean.

A. Data
The parameters used in the CA of the pCO 2 data included SST, SST anomaly (SSTA), SSS, and SSS anomaly (SSSA); these parameter data were all from the gridded dataset of Global Ocean Heat Content Change [20], while anomaly data were obtained by subtracting the average data values from the climatic state data of each month. Chlorophyll concentrations (Chl-a) were based on satellite remote sensing data from the European Space Agency's Glob Colour Project, and MLD data were from the French Research Institute for Exploitation of the Sea. The uand v-components of the wind field at 10 m above sea level (a.s.l.) were taken from the European Centre for Medium-Range Weather Forecasts. All these data except MLD are monthly averages over a 1°× 1°lat/lon box. MLD data are monthly averages over 0.5°× 0.5°.
In this study, the measured pCO 2 data used for FFNN training and validation were converted from the SOCAT grid fCO 2 dataset. The conversion relationship between fCO 2 and pCO 2 is as follows [21]: where p is the atmospheric pressure (Pa), R is the gas constant (8.314 JK −1 mol −1 ), SST is the sea surface temperature (K), T subskin is the subskin temperature, and B and δ are the correction coefficients, which are calculated as The partial pressure of atmospheric CO 2 at each grid point was calculated by the following formula [7]: where xCO 2 is the dry air mixing ratio of atmospheric CO 2 .
The relevant data were collected from the United States National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratories marine boundary layer reference. Additionally, P eq is the pressure at equilibrium, and VP(H 2 O) is the steam of seawater at a given temperature [22] where is the T subskin is subskin temperature. To reduce the computational difficulty and influence of excessively large datasets on neural network learning, we normalized all data by Since the Chl-a data in this study did not include relevant records before the SeaWiFS launch in 1997, our study period is between 1998 and 2018. The spatial resolution of all parameter data was 1°× 1°. Longitude (Lon) and latitude (Lat) are in 360°and 180°coordinate systems, and trigonometric conversion functions were used to ensure continuity and normalization.

B. FFNN Construction
The data collected and the corresponding pCO 2 observation data were used as the inputs for CA, and a covariance matrix was constructed using (8) and (9) as follows, to calculate the correlation coefficient, as shown in Fig. 1: Here, u is the mean of the value, β is standard deviation of the value, Cov(X,Y) is the calculated covariance matrix, and ρ is the correlation coefficient. Through CA, the parameters with correlation coefficients >0.1 were used as the input parameters, considering the relevance of chemical effects between SST and pCO 2 , the SST should be used as input parameter [23]. After correlation analysis, the selected parameters were the SST, SSSA, MLD,CHL, the u-component (U) of the sea level wind field, and the partial pressure of atmospheric CO 2 (pCO 2a ). The established correlation equations between pCO 2 and other parameters are summarized in   . k-fold cross validation, in this study divided into fourfold, in each fold, 25% data for test and others for training to build the optimal neural network. The yellow shape is test data and blue shape is train data.
The FFNN was used to construct a nonlinear regression model. Although the output data of an FFNN improve and become more accurate as the number of layers and neurons in the FFNN increases, the model's size also depends on the amount of data used for model training. Since the amount of observational data available for the Southern Ocean is significantly less than that for other regions, we constructed a relatively simple FFNN, the neutral network structure of which is shown in Fig. 2 , the final model at Step 2 has eight layers (six hidden layers), and the number of each layer's input tensor is marked on the figure. The grey square is the dropout layer, and the dropout rate is 0.5.
The k-fold cross validation is used to determine the hyperparameters of the neural network (see Fig. 3). The Southern Ocean data were divided into 25%/75% portions used for testing/training sets. The trained neural network contained eight layers, and the middle layer had three fully connected hidden layers. Simultaneously, to prevent the FFNN from overfitting, we added three dropout layers and gave each layer's dropout ratio 0.5. Through many tests and detailed analyses, the hyperbolic tangent (Tanh) was selected as the activation function of the neuron, and the mean-squared error (MSE) as the loss function where observed i is the observation data, and predicted i is the data predicted by the FFNN model using RMSProp as the optimization function [24]. By adjusting the adaptive learning rate ρ, the amount of information obtained was controlled. The CA-FFNN was then formed by combining a factor analysis of the main influencing factors and the FFNN.

C. Computation of Sea-Air CO 2 Fluxes
The formula for calculating the CO 2 flux at the air-sea interface is [25] where a is the solubility of CO 2 in seawater (molkg −1 atm −1 ), calculated by Weiss [21] ln a = −60.2409 + 93.4517 100 T subskin + 23.3585 In (12), a subskin is calculated by the subskin temperature, a skin is calculated by the skin temperature. f CO 2w is the fugacity of subskin seawater CO 2 , f CO 2w is the fugacity of subskin seawater CO 2 , f CO 2a is the fugacity of atmospheric CO 2 , and K is the exchange rate, which is usually considered as a function of wind speed [26] Here, Sc is the Schmidt number of CO 2 in seawater at a given T subskin temperature, such that where U is the monthly mean wind speed (m/s) at 10 m height from the cross-calibrated multiplatform ocean surface wind vector analysis product and Γ is the scale factor which was evaluated based on different wind speed products (e.g., 0.39, 0.251, 0.31, etc.) and have been used in other studies [7], [27], [28]. Based on an average wind speed of 6.38 ms −1 in the ECMWF product the scale factor of 0.31 was used to reach a global mean transfer velocity of 16 cm h −1 , consistent with the new radiocarbon-based constraints.

D. Evaluation
As limited observational data exist for the Southern Ocean, and the dataset used for validation will be very small, the division of this dataset will result in substantial variance in the RMSE and mean-absolute error (MAE). To ensure reliable  model validation, we used 100% of the data for model training, testing, and validation, and to continuously optimize both the neural network model and the internal weight. The final neural network was used to predict the area of the existing observations. The calculated RMSE is 8.86, while MAE is 5.01. Fig. 4 shows that the predicted values are very close to the observed values and R2 = 0.93. In Table I, SOM-FFNN merged a SOM and FFNN, and the RMSE is 12.24. LSCE-FFNN employed the Laboratory of Climate and Environmental Sciences, and the RMSE is 17.40. We conclude that the CA-FFNN-based models outperform both the SOM-FFNN and LSCE-FFNN.

A. Seasonal Variation in Southern Ocean Sea Surface pCO 2
Based on our new data, we find a clear pCO 2 seasonal variation in the Southern Ocean. This observation is highly consistent with the seasonal changes observed in the reconstructed data products of other studies [29]- [31]. The seasonal mean amplitude of surface pCO 2 was 13.02 μatm; our products have a similar seasonal trend compared with a station-observed actual data in the Southern Ocean [32], the pCO 2 reaching its maximum in winter, and becoming smaller in summer (see Fig. 5), and the seasonal changes of pCO 2 in the Southern Ocean resulted from the combined effects of biological and physical factors [33]. In winter, due to the strengthening of the Southern Ocean wind field, as shown in Fig. 6, the Ekman transport caused by the   wind field also intensifies [34], [35], strengthening upwelling and improving the efficiency of the biological pump.
The transfer of dissolved inorganic carbon from the bottom layer to the surface causes the pCO 2 of the surface layer to increase during this period, with upwelling playing a leading role. As sea ice melts in summer, primary productivity is enhanced, the Chl-a concentration increases, as shown in Fig. 7, and the ability of organisms to consume CO 2 is gradually restored [36], resulting in a decrease in surface CO 2 concentrations. Biological factors dominate this period.

B. Annual Variation in Southern Ocean Sea Surface pCO 2
From the calculated results of the model, the mean pCO 2 of the surface waters of the Southern Ocean increased from 351.88 μatm in 1998 to 372.65 μatm in 2018-a total increase of 20.77 μatm in 21 years and an annual mean increase of 0.99 μatm(yr) −1 . This high growth rate has been maintained, as shown in Fig. 8 .    By calculating the rate of change of every grid point in the Southern Ocean, the overall pCO 2 was found to be gradually increasing, as shown in Fig. 9. In terms of its regional distribution, a relatively high growth rate has been maintained from 35-55°S.  The overall rate is relatively uniform. Previous studies have also shown that the pCO 2 of the Southern Ocean has maintained a strong growth momentum since 2002 [37], and our model captures this phenomenon well.

C. Variability in Sea-Air CO 2 Flux
Regarding the rate of change of ΔfCO 2 over space, the Southern Ocean is developing into a carbon sink. The red/black dots in Fig. 10 represent ΔfCO 2 regions toward positive/negative trends with high change rate. Based on the pCO 2 distribution of the Southern Ocean since 1998, the inner ring region (50-70°S) gradually changed from a carbon source to a carbon sink, while the outer ring (35-50°S) has been a carbon sink, and the sinking has become more strengthened. By reconstructing the Southern Ocean, the CA-FFNN-based model was able to fully reproduce the temporal and spatial variations in the CO 2 flux and maintain consistency with other models concerning the evolution of intensity [38]. Using (12) to calculate the CO 2 flux, the Southern Ocean's carbon sink was found to have changed substantially over the past two decades. The absorption of surface CO 2 in the Southern Ocean also displayed prominent  seasonal characteristics, with the weakest absorption at the end of winter and the strongest in early summer (see Fig. 11). In this region, carbon sinks were saturated in the 1990s and restored to their original strengths in the early 21st century [39]. The data products could reconstruct this recovery in intensity and capture seasonal and regional changes in flux.
In terms of interannual changes, the carbon sink of the Southern Ocean increased from −0.21 PgC(yr) −1 in 1998 to −1.67 PgC(yr) −1 in 2018.
One standard deviation was used as an indicator of error where x i is the actual value, x i is the mean value of x, n is number of data, and the error range was within ±0.0.087 PgC(yr) −1 .
We found that the carbon sinks in this region have strengthened since 2000, except for when their development gradually slowed from 2010 to 2013; since then, they have recovered to their previous strengths. As shown in Fig. 12, this phenomenon was observed across different models [40].
The last period of stagnation occurred in the 1990s and there is much evidence to suggest that the stagnation during this period was closely related to the changes in the southern annular mode (SAM) [41]. However, the modeled stagnation was not strongly correlated with the SAM. Stability during this period was mainly due to the weakening of the carbon sink intensity from 35-50°S.Changes in this region have also been attributed to the barometric asymmetry of the zontal waves 3 (ZW3) model [42]. Due to the lack of observational data, it is difficult for traditional models that rely on observations to capture such large interannual changes.
From the analysis of changes in the regional CO 2 flux, an obvious double-ring structure was observed in the fluxes of the Southern Ocean, as shown in Fig. 13, which is not always a carbon sink. The outer ring is the main carbon sink, at 35-50°S, and undertakes most CO 2 absorption. The outer ring is the main carbon sink, at 35.50°S, and undertakes most CO 2 absorption (Fig 14). In April, May, June, July, August, and September, the region serves as a carbon source, emitting CO 2 into the atmosphere. In October, November, December, January, February, and March, it absorbs CO 2 , as shown in Fig. 15. Based on our analysis of the interannual changes, this area is still a carbon sink.
However, this ring structure is gradually disappearing. Fig.  16 shows that, from 1998 to 2018, the carbon source in the inner ring area was disappearing each year, and the carbon sink in the outer ring was strengthening. As shown in Fig. 13, most Southern Ocean regions become carbon sinking regions, because the ΔfCO 2 in the Southern Ocean decreases significantly since 1998.

IV. CONCLUSION
In this study, we propose a deep-learning-based method for reconstructing pCO 2 data in the Southern Ocean that is generalizable for reconstructing regional data. The procedure is carried out in two steps. First, we collected the parameters that may affect changes in pCO 2 from the literature and experimental data and constructed a covariance matrix for all parameters. The correlation coefficients were then calculated, and the parameters with higher correlations and having an effect on the process change of pCO 2 were retained as the input FFNN in the second step after continuous and iterative calculation and optimization, the final model was constructed and used to reconstruct the pCO 2 data of the Southern Ocean with a monthly temporal resolution and a spatial resolution of 1°× 1°. First of all, we screen the parameters that affected pCO 2 changes according to the environmental characteristics of the Southern Ocean. Second, because the neural network has the advantage of interpolation in sparse regions, we build a new model based on the Southern Ocean's environmental data and neural network methods. Finally, in the Southern Ocean, the models have a better fitting than other models; compared with the measured data, the model maintained a favorable RMSE of 8.86 μatm.
Analyses of the reconstructed results showed that the surface layer pCO 2 in the Southern Ocean changes seasonally and has increased yearly since 1998, such that the carbon sink is also increasing. The regional changes in flux exhibited a double-ring structure from 35-50°S, which is the main carbon sink area; seasonal carbon sources and sinks alternately appeared south of 50°S. Although our results are consistent with previous studies' results, due to the limited number of observational data in the Southern Ocean, the reconstructed surface pCO 2 products still need continuous validation. With the expanding observational granularity of the Southern Ocean, the algorithm will be further improved.

ACKNOWLEDGMENT
The pCO 2 data were provided by SOCAT (https://www.socat. info/), the SST data and SSS data were provided by gridded dataset of Global Ocean Heat Content Change (http://159.226. 119.60/cheng/), the Chl-a data were provided by European Space Agency's GlobColour Project (https://hermes.acri.fr/), the MLD data were provided by French Research Institute for Exploitation of the Sea (http://www.ifremer.fr/), the wind data were provided by the European Centre for Medium-Range Weather Forecasts (https://www.ecmwf.int/), and the xCO 2 data were provided by United States National Oceanic and Atmospheric Administration (NOAA) Earth System Research Laboratories marine boundary layer reference (https://www.esrl. noaa.gov/).