NeSNet: A Deep Network for Estimating Near-Surface Pollutant Concentrations

With the threat of atmospheric pollution on the rise in recent years, round-the-clock monitoring of the concentration of atmospheric gases has become utterly necessary. As opposed to traditional in situ measurement strategies, satellite monitoring offers a convenient alternative for truly global coverage. However, satellite measurements do not provide information about the vertical profile of concentration, and estimation methods must be used to deduce near-surface concentration. Existing works that address this problem often adopt approaches that use auxiliary variables such as meteorological parameters and population density information along with vertical column density (VCD) measurements. In remote areas where such information is not available, these methods are likely to fail. In our work, we propose a near-surface network, a convolutional neural network that has been designed to perform the estimation of near-surface concentrations of atmospheric trace gases using only VCD values. We demonstrate the working of our method for nitrogen dioxide (NO<inline-formula><tex-math notation="LaTeX">$_{2}$</tex-math></inline-formula>), sulfur dioxide (SO<inline-formula><tex-math notation="LaTeX">$_{2}$</tex-math></inline-formula>), and ozone (O<inline-formula><tex-math notation="LaTeX">$_{3}$</tex-math></inline-formula>). The proposed method shows RMSE scores of 6.272, 7.20, and 16.03 <inline-formula><tex-math notation="LaTeX">$\mu$</tex-math></inline-formula>g<inline-formula><tex-math notation="LaTeX">$/\text{m}^{3}$</tex-math></inline-formula> for SO<inline-formula><tex-math notation="LaTeX">$_{2}$</tex-math></inline-formula>, NO<inline-formula><tex-math notation="LaTeX">$_{2}$</tex-math></inline-formula>, and O<inline-formula><tex-math notation="LaTeX">$_{3}$</tex-math></inline-formula>, respectively. We also perform a detailed analysis of the impact of various factors on model performance. 
In the future, this method can also be used to determine the concentration of additional air pollutants, including PM<inline-formula><tex-math notation="LaTeX">$_{2.5}$</tex-math></inline-formula> and PM<inline-formula><tex-math notation="LaTeX">$_{10}$</tex-math></inline-formula>. To possibly improve the effectiveness of the model, meteorological variables such as temperature, relative humidity, wind speed, and wind direction can be incorporated.

the onset of which is already being observed, keeping a check on atmospheric pollutant levels is now more important than ever. Trace gases play a major role in atmospheric chemistry, and many of them are also regarded as major atmospheric pollutants. The concentrations of gases such as SO 2 , O 3 , and NO 2 are indicators of air quality. Therefore, in this study, we primarily concern ourselves with the concentrations of NO 2 , SO 2 , and O 3 .
Oxidation of atmospheric NO 2 results in the formation of nitrogen aerosols, which affect particulate matter (PM) concentrations. Moreover, NO 2 is a precursor to O 3 . NO 2 is itself an abundant atmospheric pollutant and has been proven to be harmful to humans: exposure may lead to various cardiovascular and respiratory ailments [1]. A high concentration of NO 2 also contributes to acid rain, which causes corrosion. Stratospheric ozone is responsible for blocking UV rays. However, in the lower atmosphere, excess ozone reduces agricultural yield [2] and adversely affects human health. Like NO 2 , sulfur dioxide also causes the formation of secondary pollutants. SO 2 causes respiratory illness, especially in children and the elderly, and contributes to the formation of smog and acid rain.
The near-surface concentrations of gases including NO 2 , SO 2 , and O 3 have traditionally been measured at ground monitoring stations for accuracy [3]. This approach, however, has drawbacks. The greatest among them is that monitoring concentrations in an area requires all necessary equipment to be set up in that locality, which is a challenge in remote corners of the world. Moreover, even for areas that have monitoring stations, the measurements are accurate only up to a certain distance from the stations. Therefore, high-resolution concentration measurements are usually only available in urban areas [4], [5]. In such a scenario, satellite measurements offer an attractive alternative [6], [7]. Atmosphere observation instruments such as the Ozone Monitoring Instrument (OMI) aboard NASA's Aura satellite [8] and the Tropospheric Monitoring Instrument (TROPOMI) aboard ESA's Sentinel 5-p satellite provide round-the-clock measurements of atmospheric trace gas concentrations [9], [10]. The measurements are in the form of vertical column density (VCD) [11]. Although VCD values correlate strongly with near-surface concentration, they do not provide information about the vertical distribution of concentration. This means that the exact surface concentration of atmospheric gases cannot be measured directly by satellite instruments. For this reason, various methods have been proposed for estimating the near-surface concentration of atmospheric gases from tropospheric or total VCD. Early works in this area were mostly based on atmospheric chemistry models and simple regression models [12], [13]. In the last decade, the use of machine learning methods for the estimation of various atmospheric gas concentrations has been on the rise, and in recent years, deep learning has also been making its way into the area [14], [15], [16].
In the work by Lamsal et al. [17], the GEOS-Chem atmospheric chemistry model is used to simulate surface NO 2 concentration over the United States and Canada. Gu et al. [18] use the RAMS-CMAQ air quality modeling system for a similar study over China. In the last few years, the use of machine learning models for the estimation of near-surface NO 2 concentration has garnered attention. Most of the proposed methods use a number of features along with the tropospheric VCD of NO 2 to perform the estimation. For example, Kang et al. [19] used a large number of satellite data variables along with auxiliary variables such as population density, wind speed, and wind direction to construct estimation models based on support vector regression (SVR), random forest (RF), XGBoost, and LightGBM. The introduction of deep learning to this field is relatively new. Li and Wu [20] proposed the use of residual deep neural networks for imputing missing data in the tropospheric VCD of NO 2 and for estimating near-surface concentration. Chan et al. [21] proposed a neural-network-based method for the estimation of NO 2 concentration over Germany. A few other deep learning methods have been proposed for similar tasks [22], [23], [24].
In this work, we propose the Near-Surface Network (NeSNet), a deep learning method using convolutional neural networks (CNNs) for the estimation of the near-surface concentration of nitrogen dioxide, sulfur dioxide, and ozone from tropospheric/total VCD measurements made by TROPOMI aboard ESA's Sentinel 5-p satellite. The study is carried out over the landmass of Ireland. Although CNNs have been used in other domains in remote sensing [25], [26], to the best of our knowledge, they have not been applied to this particular task. The novel contributions of this work are as follows: 1) application of convolutional neural networks in the domain of satellite-based remote sensing; 2) an extensive study of the performance of deep learning models for the estimation of near-surface concentrations of multiple trace gases; 3) independence of the proposed method from ground-based measurements, which makes it suitable for prediction over remote locations and, therefore, gives true global coverage; 4) a univariate approach to the estimation problem, i.e., we only use VCD values as inputs to our model. In addition, we experiment with using satellite altitude as an auxiliary input to the model and record the change in performance; altitude is incorporated to investigate how the model might behave at higher altitudes.

II. DATASET
In this work, we consider the area over Ireland for the estimation of the near-surface concentration of NO 2 , SO 2 , and O 3 using satellite measurements. The dataset created for this purpose, therefore, has two facets: ground monitoring station measurements and satellite measurements of tropospheric or total VCD. The estimation is carried out using data over a period of 16 months, from January 1, 2020, to May 1, 2021, with a separate region for each of the three pollutants. The level of pollution depends on the location and the type of pollutant. Therefore, we have studied the three pollutants in different regions, according to where each has a higher impact on air quality. The data preparation strategies for satellite and ground data are described individually in the following sections.

A. Satellite Data
Satellite measurements of tropospheric VCD for NO 2 and total VCD for SO 2 and O 3 are obtained from TROPOMI aboard ESA's Sentinel 5-p satellite [27]. TROPOMI is a space-borne, nadir-viewing imaging spectrometer that measures in wavelength bands between the ultraviolet and the shortwave infrared. It operates in a push-broom configuration and has a swath width of 2600 km on the earth's surface. The L2-processed products for each of these gases are available in the form of netCDF files with a hierarchical data structure. For NO 2 , tropospheric VCD values are available for various ground pixels for each day in the period of study. However, for SO 2 and O 3 , only total VCD data are available for the period. Therefore, for these two gases, estimation is made with total VCD instead of tropospheric VCD. We also note that, for a single day, multiple sets of measurements may be available at different times of the day for a given area. In our work, we ignore concentration variations within the day and only concern ourselves with the estimation of daily average concentrations. To meet our needs, we extract the VCD values and perform a regridding procedure to obtain the data in the form of a matrix that represents a geospatial grid over the area of study. The exact geospatial coordinates that mark the area of study vary for each of these gases and are chosen according to the available ground monitoring stations.
For NO 2 , the coordinates of the lower-left and upper-right corners of the bounding box are (51.8285° N, −9.4003° E) and (54.323° N, −6.032° E), respectively. Inside this bounding box, a grid is defined such that each cell in the grid represents a certain geographic area. The resolution of the grid is fixed at 0.05° × 0.05°. Therefore, the grid has 49 rows and 67 columns. Algorithm I gives a step-by-step description of the subsequent procedure.
With this procedure, a dataset of 485 matrices is obtained where each matrix represents gridded, satellite-measured VCD values.
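As a rough illustration of the gridding step, the following sketch derives the grid dimensions from a bounding box and averages the VCD samples that fall into each cell. The function names (`grid_shape`, `regrid_daily_mean`), the use of floor division to count whole cells, and the choice of `None` for empty cells are assumptions made for illustration; they are not taken from Algorithm I.

```python
import math

def grid_shape(lower_left, upper_right, res=0.05):
    """Number of whole (rows, cols) cells that fit inside the bounding box."""
    (lat0, lon0), (lat1, lon1) = lower_left, upper_right
    rows = math.floor((lat1 - lat0) / res)
    cols = math.floor((lon1 - lon0) / res)
    return rows, cols

def regrid_daily_mean(samples, lower_left, upper_right, res=0.05):
    """Average all (lat, lon, vcd) samples of one day into a rows x cols grid."""
    rows, cols = grid_shape(lower_left, upper_right, res)
    lat0, lon0 = lower_left
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for lat, lon, vcd in samples:
        r = math.floor((lat - lat0) / res)
        c = math.floor((lon - lon0) / res)
        if 0 <= r < rows and 0 <= c < cols:  # ignore pixels outside the box
            sums[r][c] += vcd
            counts[r][c] += 1
    # Cells without any sample are left as None (assumption).
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(cols)] for r in range(rows)]

# Bounding boxes quoted in the text: (lat, lon) of lower-left and upper-right.
NO2_BOX = ((51.8285, -9.4003), (54.323, -6.032))
SO2_BOX = ((51.795, -9.0893), (55.004, -6.105))

print(grid_shape(*NO2_BOX))  # (49, 67)
print(grid_shape(*SO2_BOX))  # (64, 59)
```

Under these assumptions, the computed shapes match the 49 × 67 and 64 × 59 grids reported for NO 2 and SO 2 .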
For SO 2 , the lower-left and upper-right corners of the bounding box for the area of study are (51.795° N, −9.0893° E) and (55.004° N, −6.105° E), respectively. Defining a grid with resolution 0.05° × 0.05° gives 64 rows and 59 columns in the grid. The regridding procedure described for NO 2 is followed to obtain daily average VCD matrices of size 64 × 59.

B. Ground Data
Since the problem is one of estimating near-surface concentrations of atmospheric trace gases, we require ground monitoring station data to train our model. These data are obtained from the monitoring stations of the Environmental Protection Agency of Ireland. For NO 2 , data from 29 active monitoring stations across Ireland are available for the period from January 1, 2020, to May 1, 2021. For SO 2 , 14 stations are available, whereas for O 3 , 20 stations are available. After obtaining the data for the entire period of study for all three gases, a regridding procedure similar to the one used for the satellite data is applied. The bounding boxes mentioned in the previous section are used to delimit the geographical area of interest. A grid with resolution 0.05° × 0.05° is defined for this area.
The concentration values are then assigned to the grid cells by using geographical distance to determine the nearest monitoring station. In the end, we obtain 485 matrices each for NO 2 , SO 2 , and O 3 having sizes 49 × 67, 64 × 59, and 71 × 86, respectively.
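The nearest-station assignment can be sketched as follows. This is only an illustration under stated assumptions: the haversine formula is used as the geographical distance, distances are measured from cell centers, and the two stations in the usage example are hypothetical; the text does not specify these details.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def ground_grid(stations, lower_left, rows, cols, res=0.05):
    """Fill every grid cell with the value of its nearest station.

    stations: list of (lat, lon, concentration) tuples for a single day.
    """
    lat0, lon0 = lower_left
    grid = []
    for r in range(rows):
        lat_c = lat0 + (r + 0.5) * res  # latitude of the cell center
        row = []
        for c in range(cols):
            lon_c = lon0 + (c + 0.5) * res  # longitude of the cell center
            nearest = min(
                stations,
                key=lambda s: haversine_km(lat_c, lon_c, s[0], s[1]),
            )
            row.append(nearest[2])
        grid.append(row)
    return grid

# Two hypothetical stations (lat, lon, concentration in ug/m3).
stations = [(51.9, -9.3, 10.0), (54.2, -6.1, 30.0)]
grid = ground_grid(stations, (51.8285, -9.4003), rows=49, cols=67)
print(grid[0][0], grid[-1][-1])  # 10.0 30.0
```

A consequence of this scheme is that the ground-truth matrix is piecewise constant, with one region per station.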

III. METHOD
Upon inspection of near-surface concentrations, we observe that certain local patterns are exhibited, i.e., close geographical locations seem to have similar concentration values. Exploiting these local patterns in concentration may prove to be beneficial for an estimation model. Based on this, we propose a deep convolutional neural network to perform a regression of the near-surface concentration values from VCD measurements.

A. Convolutional Neural Networks
Convolutional neural networks, popularized by LeCun et al. [28], have emerged as the go-to algorithm for image tasks. They are based on the principle of filters whose weights are shared across the input within each layer. A single convolutional layer may contain several learnable filters. These filters are applied to an image by convolution: a sliding window moves over all regions of the input image in steps whose size is called the stride, producing outputs known as feature maps. The weights of the filters are learned during model training. Multiple convolutional layers can be stacked to build a hierarchical feature extractor.
The convolutional operation between the input feature maps and a convolutional layer within the CNN architecture is given by

$$h_j^{(n)} = \sum_{k} h_k^{(n-1)} * w_{jk}^{(n)} + b_j^{(n)}$$

where $*$ denotes a 2-D convolution, $h_j^{(n)}$ is the $j$th feature map output in the $n$th hidden layer, $h_k^{(n-1)}$ is the $k$th channel in the $(n-1)$th hidden layer, $w_{jk}^{(n)}$ is the $k$th channel in the $j$th filter in the $n$th layer, and $b_j^{(n)}$ is its corresponding bias term. CNNs owe their performance superiority mainly to the following characteristics: 1) extraction of local patterns in a hierarchical form by means of stacked convolution layers; 2) translation invariance by virtue of the convolution operation; 3) a reduced number of parameters compared to feedforward networks owing to weight sharing. In light of these characteristics, we use a convolutional neural network model for further experiments on our task. The architecture of this model is described in the subsequent section.
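To make the operation concrete, the sketch below computes a single output feature map from two input channels in plain Python. It is an illustration only: the helper names are hypothetical, stride 1 with no padding is assumed, and the kernel is applied without flipping (cross-correlation), which is the convention most deep learning libraries use for their convolution layers.

```python
def conv2d_valid(channel, kernel):
    """Slide `kernel` over `channel` with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [[sum(channel[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def feature_map(prev_channels, filter_channels, bias):
    """Sum the per-channel responses of one filter and add its bias."""
    maps = [conv2d_valid(h, w) for h, w in zip(prev_channels, filter_channels)]
    return [[sum(m[i][j] for m in maps) + bias
             for j in range(len(maps[0][0]))] for i in range(len(maps[0]))]

# Two 3x3 input channels and one filter with a 2x2 kernel per channel.
channels = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
print(feature_map(channels, kernels, bias=0.5))
# [[10.5, 12.5], [16.5, 18.5]]
```

The same kernel weights are reused at every window position, which is the weight sharing referred to in characteristic 3).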

B. Proposed Network Architecture
The proposed network architecture is depicted in Fig. 1 and consists of four convolutional layers, a max-pooling layer, and two dense layers. All convolutional layers use filters of size 3 × 3. The input layer feeds the satellite measurement matrix to the first convolutional layer, which has 16 filters. The second convolutional layer also has 16 filters. Its output is passed to a max-pooling layer with pool size 2 and stride 2, which effectively halves each spatial dimension of the feature map. The output of the pooling layer is then fed to another convolutional layer with 16 filters, followed by a convolutional layer with 32 filters. Together, these layers make up the convolutional block of the model. The output of the convolutional block is flattened to make it suitable for feed-forward processing and is then passed to a dense layer with 100 units and a linear activation function. Finally, to regress the ground concentration values, a dense layer with as many units as there are grid cells in the ground concentration matrix is used. Hence, the lengths of the output vectors of the models for NO 2 , SO 2 , and O 3 are 3283, 3776, and 6106, respectively.
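The layer sequence can be traced with a short shape calculation. This is a sketch rather than the authors' implementation: it assumes 'same' padding in every convolutional layer (the padding mode is not stated in the text) and floor division in the pooling layer, and the function name `trace_shapes` is hypothetical.

```python
def trace_shapes(rows, cols, n_outputs):
    """Return (layer name, output shape) pairs for the described layer stack."""
    h, w = rows, cols
    shapes = [("input", (h, w, 1))]
    shapes.append(("conv 3x3, 16 filters", (h, w, 16)))
    shapes.append(("conv 3x3, 16 filters", (h, w, 16)))
    h, w = h // 2, w // 2  # max pooling with pool size 2 and stride 2
    shapes.append(("max pool 2x2", (h, w, 16)))
    shapes.append(("conv 3x3, 16 filters", (h, w, 16)))
    shapes.append(("conv 3x3, 32 filters", (h, w, 32)))
    shapes.append(("flatten", (h * w * 32,)))
    shapes.append(("dense, 100 units, linear", (100,)))
    shapes.append(("dense output", (n_outputs,)))
    return shapes

# Grid sizes reported for the three gases.
GRIDS = {"NO2": (49, 67), "SO2": (64, 59), "O3": (71, 86)}

for gas, (rows, cols) in GRIDS.items():
    layers = trace_shapes(rows, cols, rows * cols)
    print(gas, "output length:", layers[-1][1][0])
# NO2 output length: 3283
# SO2 output length: 3776
# O3 output length: 6106
```

The output lengths follow directly from the grid sizes (49 × 67, 64 × 59, and 71 × 86); the intermediate shapes depend on the padding assumption.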

IV. RESULTS
In this section, we describe the procedure for training the model. Subsequently, we discuss model validation and compare the performance of the proposed method against previously used methods. We also perform a detailed analysis of the effect of various parameters on model performance. The Python code for this work has been documented in a reproducible manner and is publicly available at https://github.com/Bibhash123/NESNet.

A. Model Training and Validation
In this study, gridded satellite VCD data are used as input to the CNN model to estimate the near-surface concentrations of NO 2 , SO 2 , and O 3 in the areas around the monitoring stations. Models with the proposed architecture are trained individually for NO 2 , SO 2 , and O 3 . For each of these models, the size of the output layer equals the number of grid cells for which the concentration is to be estimated. The mean-squared error is chosen as the loss function because we want the estimated values to be close to the true concentration values and are, therefore, interested in minimizing the deviation. During training, we also monitor the root-mean-squared error (RMSE) at each epoch. To achieve proper convergence of the model, an Adam optimizer with a learning rate of 0.001 is used. The Adam optimizer has gained widespread popularity in recent years due to its reliable performance.
As mentioned previously, our dataset for each of these gases has 485 samples. In order to train and validate our models with this data, a fivefold cross-validation strategy is used. This implies that in each fold, 80% of the data is used for training and 20% is used for validation. The model is trained for 50 epochs. However, most of the time, the model seems to start overfitting to the training data after 30 epochs. To tackle this problem, model checkpoints are used to save the model with the best validation root-mean-squared error. We also use early stopping with a patience of 5 to stop the training if the validation RMSE does not improve for 5 consecutive epochs. For NO 2 , the trained model gives a cross-validation RMSE score of 7.20 µg/m 3 . For SO 2 , this score is 6.272 µg/m 3 , whereas for O 3 , it is 16.03 µg/m 3 . We also validate our models using a few other metrics. The proposed method shows a mean absolute error (MAE) of 4.94 µg/m 3 , 2.965 µg/m 3 , and 11.804 µg/m 3 for NO 2 , SO 2 , and O 3 , respectively. Apart from observing the deviation of the predicted near-surface concentrations from the true values, we are also interested in analyzing how well the predicted values correlate with the true values. Therefore, Pearson's correlation coefficient seems to be an appropriate metric. However, in our case, we are trying to predict the daily average near-surface concentration because of which the satellite and ground concentration data are related for individual days. We, therefore, feel that it is more reasonable to calculate Pearson's correlation coefficient between predicted and true values individually for each day. To report the performance, all of these coefficients are averaged out. Throughout this article, we refer to this modified correlation metric as R2 DAvg . With the proposed method, R2 DAvg of 0.648, 0.779, and 0.734 are obtained for NO 2 , SO 2 , and O 3 , respectively. 
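The evaluation metrics can be written down compactly. The sketch below follows the description in the text, with the day-wise metric computed as the mean of per-day Pearson coefficients; the function names and the toy data are hypothetical, and days whose values are all identical (for which the coefficient is undefined) are assumed not to occur.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(x, y):
    """Pearson's correlation coefficient of two sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def daily_avg_correlation(true_days, pred_days):
    """Average of the per-day coefficients; each day is a flat list of grid values."""
    coeffs = [pearson(t, p) for t, p in zip(true_days, pred_days)]
    return sum(coeffs) / len(coeffs)

# Toy example with two "days" of four grid cells each.
true_days = [[1, 2, 3, 4], [2, 4, 6, 8]]
pred_days = [[1.5, 2.5, 3.5, 4.5], [8, 6, 4, 2]]
print(round(daily_avg_correlation(true_days, pred_days), 6))  # 0.0
```

In the toy example the first day is perfectly correlated and the second perfectly anticorrelated, so the day-wise average is zero even though each day taken alone shows a strong linear relationship.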
Traditionally, various regression algorithms, boosting algorithms, and simple neural networks have been used for the task of concentration estimation. For example, Kang et al. [19] used XGBoost and LightGBM to perform an estimation using satellite VCD data as well as other auxiliary variables. Therefore, the performance of our model is compared to that of other algorithms on different metrics in Table I. It is clear that, in the case of univariate estimation, the proposed method outperforms all other previously used methods. Near-surface air pollutants such as NO 2 , SO 2 , and O 3 have a significantly negative effect on human health and sustainable development. To characterize the level of air pollution, government agencies have defined the air quality index (AQI), which is calculated from the levels of several air pollutants. In this work, we employ the CNN model to estimate the concentrations of near-surface air pollutants from which the AQI is calculated. Fig. 2 shows the satellite VCD, the ground concentration, and the predicted concentration of the three pollutants near the surface for the proposed model as well as for linear regression, XGBoost regression, and LightGBM regression. We clearly observe that local patterns in near-surface concentrations are preserved in the predictions of the proposed method. No such correspondence between ground concentration and predicted concentration can be observed for linear regression, XGBoost, or LightGBM.

B. Impact of Filter Layers on Model Performance
In the network architecture presented in this article, four convolutional layers with a max-pooling layer between the second and third convolutional layers are used. This architecture was arrived at via an experiment. In this analysis, we perform two subexperiments. In both of these, the number of filter layers in the model is gradually increased and the model is retrained. The cross-validation RMSE score is recorded in each case. To vary the filter layers initially, a model with one convolutional layer with 32 filters and 2 dense layers is defined. On top of this block, we gradually keep adding 16 filtered convolutional layers. In the first subexperiment, only convolutional layers are added, whereas in the second subexperiment after every three network layers, a max-pooling layer is added. The variations of the RMSE score obtained in both these cases are shown in Fig. 3(a). In the no max-pooling layer case, we can see that in general, the RMSE   On the other hand, fluctuations in the RMSE score are observed in the case of max-pooling. However, we note that optimum performance is obtained in the case where three 16-filtered layers are used. As shown in Fig. 3(b), this architecture corresponds to the one that has been presented as the proposed architecture in this work.

C. Impact of Dataset Size on Model Performance
In the majority of past applications of deep learning models, it has very well been established that the size of the training dataset is pivotal to the model performance. To validate this idea for our proposed model, we carry out an experiment to observe variations in the performance of the model with changing dataset size. Our dataset originally has 485 samples. In this experiment, we perform a random sampling of the data to select training sets with varying sizes and then retrain the model for each sampled dataset using a threefold cross-validation strategy. The steps of the experiment are detailed as follows.
1) 10% of the total dataset is selected by means of random sampling.
2) A model with the proposed architecture is trained and validated on the sampled data using a threefold cross-validation strategy. The average RMSE score is recorded.
3) Steps 1) and 2) are repeated 20 times. In each repetition, the randomly sampled dataset varies. Therefore, a spread of RMSE score values is obtained.
4) The values obtained in step 3) are plotted in a box plot.
5) Steps 1)-4) are repeated by gradually varying the sampling size from 10% to 100% in steps of 10%.
The box plots obtained in this manner for NO 2 , SO 2 , and O 3 are shown in Fig. 4. It is evident from the plots for all three cases that, as the dataset size increases, the spread of the RMSE score decreases. This means that, for larger training dataset sizes, the model is more robust to variations in the data, with the most robust model being obtained when the entire dataset is used. Numerical evidence of this observation is shown in Table II. The interquartile range (IQR) is the difference between the RMSE values Q3 and Q1, where Q1 represents the point below which 25% of the cases lie and Q3 represents the point below which 75% of the cases lie. The IQR, therefore, gives a measure of the spread of the RMSE score. It is clear that, as the dataset size increases, the IQR decreases.
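The experiment loop and the IQR computation can be sketched as below. The training itself is abstracted behind a user-supplied `score_fn` callback, which is a hypothetical placeholder for the threefold cross-validated training run; the dummy callback in the usage example exists only to make the sketch executable and does not model the behaviour of the network. The IQR is taken as the difference between the 75th and 25th percentiles, here with the inclusive quantile method.

```python
import random
import statistics

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def dataset_size_experiment(n_samples, score_fn, fractions=None, repeats=20, seed=0):
    """Return {fraction: IQR of the scores over `repeats` random subsets}.

    score_fn(indices) must return the cross-validated RMSE obtained when
    training on the samples with the given indices (supplied by the user).
    """
    rng = random.Random(seed)
    fractions = fractions or [i / 10 for i in range(1, 11)]
    spread = {}
    for frac in fractions:
        size = max(1, round(frac * n_samples))
        scores = [score_fn(rng.sample(range(n_samples), size))
                  for _ in range(repeats)]
        spread[frac] = iqr(scores)
    return spread

def dummy_score(indices):
    """Stand-in for a real training run (hypothetical, for demonstration only)."""
    return 10.0 + statistics.fmean(indices) / 100.0

spread = dataset_size_experiment(485, dummy_score)
print(iqr([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 4.0
```

Replacing `dummy_score` with a function that trains and validates the model on the selected samples would yield the per-fraction spread of RMSE scores.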

D. Inclusion of Altitude Parameter on Model Performance
Throughout this work, we have explored a univariate approach to the estimation of near-surface concentrations of NO 2 , SO 2 , and O 3 . In our proposed method, we have only considered VCD values as inputs to our model. In this section, we discuss the possibility of including other variables such as temperature, relative humidity, wind speed, and wind direction as predictors as well. To that end, we conduct an experiment by including the altitude of the satellite at respective scan lines as another input in our model. Here, we incorporated altitude to investigate how the model might behave at higher altitudes. However, in the data available to us, the altitude values were only available for NO 2 . Therefore, the results of this experiment are shown for NO 2 .
TABLE III: PERFORMANCE EVALUATION WHEN SATELLITE ALTITUDE DATA IS USED AS AN AUXILIARY VARIABLE FOR MODEL TRAINING COMPARED TO WHEN ONLY VCD DATA WITHOUT ALTITUDE IS USED
For the experiment, a slightly modified architecture of the proposed neural network is considered. We take two independent convolutional blocks with the same architecture as the one in the proposed method. The inputs to the first convolutional block are the VCD matrices, and the inputs to the second convolutional block are the satellite altitude matrices. These satellite altitude matrices are obtained with the same gridding procedure that is used for obtaining the VCD matrices (described in the dataset section). The outputs of both convolutional blocks are then concatenated and passed to a block of fully connected layers with the same structure as in the proposed model. This approach gives us a model with two input heads and a single output head. One more aspect considered here is that the scales of the VCD and altitude values do not match, and altitude values are particularly high in magnitude. To bring both types of inputs to a common scale, standardization is used. The necessity of standardization was ascertained by the observation that model performance deteriorated significantly when it was not applied. The results obtained in this experiment are shown in Table III.
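A minimal sketch of the standardization step is given below. It assumes z-scoring with the mean and population standard deviation computed over the whole stack of matrices for each input type; the text does not state how the statistics were computed, and the numerical values in the example are hypothetical.

```python
import statistics

def standardize(matrices):
    """Z-score every value using the mean and std of the whole stack."""
    flat = [v for m in matrices for row in m for v in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    return [[[(v - mean) / std for v in row] for row in m] for m in matrices]

# Hypothetical inputs on very different scales: one 2x2 matrix each.
vcd = [[[1e-5, 3e-5], [5e-5, 7e-5]]]
altitude = [[[824000.0, 826000.0], [828000.0, 830000.0]]]

vcd_std = standardize(vcd)
alt_std = standardize(altitude)
print([round(v, 4) for row in vcd_std[0] for v in row])
print([round(v, 4) for row in alt_std[0] for v in row])
```

After standardization, both inputs have zero mean and unit variance, so neither dominates the concatenated features purely because of its magnitude.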

V. CONCLUSION
In this work, we have proposed a convolutional-neural-network-based method for the estimation of the near-surface concentration of NO 2 , SO 2 , and O 3 using measurements of VCD from Sentinel 5-p TROPOMI. Our method performs this estimation using only satellite VCD measurements and does not rely on any ground-based measurements. In our experiments, we have shown that our method performs significantly better than methods that have traditionally been used in this domain. Our model gives RMSE scores of 6.272, 7.20, and 16.03 µg/m 3 for SO 2 , NO 2 , and O 3 , respectively. We also establish that exploiting local patterns in concentrations can give good performance even when a univariate estimation is performed. While the proposed method gives a good correlation between predicted and true near-surface concentrations for individual days, other methods show drastically poorer correlations in the univariate estimation case.
In the future, this work may be extended to perform similar estimations for other atmospheric pollutants, such as PM 2.5 and PM 10 . To possibly improve the effectiveness of the model, meteorological variables such as temperature, relative humidity, wind speed, and wind direction can be incorporated. Because the VCD data contain no meteorological information, this would require gathering meteorological data from ground-based sensors. In the future, we plan to develop an algorithm to relate meteorological data to the VCD data.