Terrain-Guided Flatten Memory Network for Deep Spatial Wind Downscaling

High-resolution wind analysis plays an essential role in pollutant dispersion and renewable energy utilization. This article focuses on spatial wind downscaling. Specifically, a novel terrain-guided flatten memory network (abbreviated as TIGAM) with axial similarity constraint is proposed. TIGAM consists of three elaborately designed blocks, i.e., the similarity block, the reconstruction block, and the denoise block. To achieve long-spatial dependence, the similarity block interpolates low-resolution data to high resolution in an axial attention manner. Meanwhile, the reconstruction block aims to obtain a clearer high-resolution representation in closed form. Taking both of the meteorological prior and network design principle into consideration, this article also proposes a flatten memory module with learnable input for high-resolution denoising. Furthermore, for accurate detail reconstruction, a terrain-guided enhanced loss is presented benefitting from the high-resolution remote sensing data. This loss function integrates wind spatial distribution and terrain elegantly. Extensive quantitative and qualitative experiments demonstrate the superiority of the proposed TIGAM.


I. INTRODUCTION
ACCURATE and fine-grained meteorological forecasts have always been urgently needed for both scientific research and intelligent services. Thanks to the development of remote sensing techniques, researchers are capable of obtaining high-resolution observations. The high computational cost of numerical weather simulations, however, limits the availability of fine-scale meteorological predictions. As a consequence, spatial downscaling, a promising technique for generating high-resolution meteorological data from low-resolution data, has become a hot research topic in atmospheric research. As an important component of meteorological analysis [1], [2], [3], [4], [5], spatial wind downscaling [6], [7] plays a substantial role in areas including oceanographic research [8], [9], climate analysis [10], renewable energy generation [11], and accurate meteorological forecasting [12].
Spatial downscaling, which aims to provide meteorological reanalyses/forecasts that are as precise as possible for given areas, has been studied for decades. Various downscaling methods have been proposed to improve spatial resolution in meteorological research, and they can be categorized into dynamical downscaling and statistical downscaling [13]. For dynamical downscaling, high-resolution numerical models are employed to simulate subgrid-scale physical processes based on the large-scale circulation predicted by coarser-scale models [14], [15]. However, dynamical downscaling is well known for its high requirements on computing resources. In contrast, statistical methods can produce competitive results at a much lower computational cost. A large number of statistical methods have emerged to enhance the spatial resolution of meteorological variables, such as bilinear interpolation, nearest neighbor [16], support vector machines [17], and weather generators [18]. For example, Jia et al. [19] develop a multiple linear regression model for downscaling spatial precipitation fields by exploiting the relationships between precipitation and other environmental factors such as topography and vegetation. Lima et al. [20] propose a Bayesian Kriging model to downscale daily Global Climate Model rainfall onto a fine-resolution grid, which can successfully reproduce the spatial variability in the observed rainfall. Moreover, based on a nonlinear relationship between precipitation at high resolution and covariates at coarse/fine resolution, adaptable random forests have been utilized for spatial precipitation downscaling [21]. Empirical analysis indicates that this approach can outperform bilinear interpolation and replicate the spatial and temporal distribution of observed precipitation fields. Recently, Jing et al. [22] propose an end-to-end network, which consists of a global cross-attention module, a multifactor cross-attention module, and a residual convolutional module, for satellite precipitation downscaling. Wang et al. [23] propose a new algorithm based on the Taylor expansion for land surface temperature downscaling. Results show that this method obtains the best downscaled results when the acquisition time of the land surface temperature is consistent with the time of the empirical concavity factor.
As for spatial wind downscaling, diagnostic models, such as California Meteorology (CALMET) [24], are commonly used to generate high-resolution wind fields with a horizontal resolution of several hundred meters [25], [26], [27]. Höhlein et al. [6] use a convolutional neural network (CNN)-based model for downscaling low-resolution wind forecast simulations to a higher spatial resolution. Kirchmeier et al. [28] propose a probabilistic statistical downscaling approach to predict a daily-varying probability distribution of local-scale wind speed, conditioned on the large-scale wind speed. Moreover, wind downscaling is extremely complicated over complex topography. Helbig et al. [29] apply local, fine-scale topographic parameters to a surface wind speed downscaling method for numerical weather prediction (NWP) simulation. Based on coarse-scale NWP wind speeds, the method can efficiently reproduce distribution statistics in the near-surface wind speed measurements. Winstral et al. [30] introduce a novel optimization scheme that can capture the local terrain structure for gridded wind speed downscaling. By accounting for the wind-terrain interaction, the downscaling method effectively improves the error metrics and the spatial distribution correlation between the downscaled wind field and observations. Based on the CNN, Dujardin and Lehning [31] propose a near-surface wind field downscaling approach that considers the state of the atmosphere on various scales and its interaction with high-resolution topography. The downscaling method performs well in generating 50-m resolution wind fields, especially under the effects of complex topography such as ridge acceleration, sheltering, and deflection.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recent advances in machine learning, especially deep learning, have achieved great success in meteorological applications. Specifically, the widely investigated image super-resolution task in the field of computer vision shares a similar objective with meteorological spatial downscaling, i.e., minimizing the reconstruction loss as far as possible. Recently, many super-resolution methods based on deep learning have achieved state-of-the-art performance. The super-resolution convolutional neural network (SRCNN) [32] is the first deep learning method for single image super-resolution. It is a three-layer CNN for patch extraction, nonlinear mapping, and high-resolution reconstruction. Basically, the SRCNN preprocesses the low-resolution image with bicubic interpolation, while the fast super-resolution CNN (FSRCNN) [33] applies deconvolution to reconstruct the corresponding high-resolution image from the low-resolution one directly. Recently, the efficient subpixel CNN (ESPN) [34] has become the common upsampling strategy and it has been widely used in state-of-the-art methods, e.g., enhanced deep residual super-resolution (EDSR) [35], residual channel attention network (RCAN) [36], hybrid residual attention network (HRAN) [37], and super-resolution recursive fractal network (SRRFN) [38]. There are also methods that try to extract deeper nonlinear information with more convolutional layers, e.g., the very deep super-resolution network (VDSR) [39]. Because different convolutional layers extract distinct features, Liu et al. [40] propose a residual feature aggregation framework that fully exploits this information for more efficient feature extraction. This framework groups several residual modules together and directly forwards the features on each local residual branch by adding skip connections. Therefore, it is capable of aggregating these informative residual features to produce more representative features.
However, different from the model-based methods that can handle image super-resolution with different scale factors under a unified framework, the learning-based methods generally lack such flexibility. To address this issue, Zhang et al. [41] propose an end-to-end trainable unfolding network that leverages both learning-based methods and model-based methods. By unifying diverse scale factors into a joint end-to-end framework, image super-resolution can be divided into several subproblems that can be solved iteratively. Recent contributions [42], [43], [44], [45] have also started to seek better performance with the widely used transformer [46] mechanism. Apart from convolutional methods, there are pioneering contributions focusing on recurrent super-resolution. Most of them are devoted to video super-resolution [47], [48], [49], because convolutional recurrent networks are inherently suitable for spatial-temporal data. There are also studies concentrating on recurrent single-image super-resolution. For example, Yang et al. [50] introduce a deep-edge-guided recurrent residual network to progressively recover the high-frequency details. Yang et al. [51] propose a deep recurrent fusion network, through which multilevel features with a large receptive field can be obtained. Han et al. [52] propose a dual-state recurrent network for single-image super-resolution and prove that many state-of-the-art super-resolution techniques can be reformulated as a single-state recurrent network.
Regardless of the resemblance between meteorological downscaling and image super-resolution, we claim that directly applying a method designed for super-resolution to downscaling faces the following four challenges, abbreviated as ASCL (apriority, similarity, coupling, and locality), which must be thoroughly considered.
1) Apriority: Atmospheric science has been investigated for hundreds of years, and it has now established a mature theory with multiple physically constrained differential equations, including the motion equation, the continuity equation, the energy equation, the state equation, and the potential temperature equation. By solving these equations, the current and future meteorological state can be obtained. In other words, the causality of meteorological downscaling is embedded in prior expert knowledge. However, this is not the case for image super-resolution. From the perspective of optimization, image super-resolution is an ill-posed problem with multiple feasible solutions. Imposing prior knowledge on a data-driven model, i.e., the image super-resolution methods, is also a challenging direction. Consequently, in order to benefit from the existing image super-resolution techniques, the apriority must be taken into consideration.
2) Similarity: Spatially, the meteorological state at a particular location is highly correlated with that of its neighbors. Temporally, driven by the rotation and revolution of the earth, the same meteorological state can occur periodically. More importantly, the meteorological state at the same latitude or longitude often tends to be consistent. In a natural image, taking a cat as an example, the object can appear anywhere in the image at any time, without much apriority. This is not the case for meteorological data. For example, typhoons typically occur over the ocean or coastal areas during particular seasons. This means that, for accurate spatial downscaling, the high similarity correlation must be handled elaborately. From the perspective of computer vision, the axial attention mechanism [53], which explores long-range spatial dependencies, can provide an insightful solution to this problem.
3) Coupling: Even though the relationship among different meteorological elements is extremely complex, they have a great impact on each other. Taking the wind as a simple example, the pressure difference p between two locations causes the wind w. Both the temperature T and the air mass are key factors affecting p. In addition, the humidity h strongly influences the air mass. The wind w, in turn, can also change h and T. In order to tackle this challenge, a common practice is to impose additional weather variables as input. However, treating weather factors as input data directly has several drawbacks. On one hand, because weather factors are typically obtained via professional meteorological sensors or observation stations, it is impractical and costly to record extra weather data. On the other hand, it is somewhat unreasonable to integrate weather factors together directly. In other words, different weather factors vary in data distribution, and how to integrate these factors remains an open research question. For generality, a substitutable latent learnable input with spatial memory should make sense.
4) Locality: Distinct from most computer-vision tasks, the meteorological problem is typically scale sensitive. Typically, a reasonable RGB image might be colorful and uniform with feasible saturation and contrast on a large scale for human perception. However, being affected by the terrain, extreme weather, e.g., tornado, thunder, hail, etc., often occurs locally on a relatively small scale. For better performance, the high-resolution terrain should be a key indicator of local meteorological variation.
Fig. 3. Illustration about the concepts "cleaner" and "clearer."
To address the aforementioned issues, this article proposes the terrain-guided flatten memory network (TIGAM), an elaborately designed deep convolutional network with meteorological prior guidance for spatial wind downscaling.
Specifically, taking the atmospheric apriority into consideration, this article proposes a novel flatten memory module with learnable position-sensitive input. Benefitting from this mechanism, TIGAM is capable of simulating the prior meteorological motion equations and depicting the multielement coupling implicitly. Spatially, the axial attention policy is embedded into TIGAM for better describing the long-range spatial similarity. Moreover, through a comprehensive statistical analysis, we find that the winds are highly correlated to terrain. Consequently, this article further presents a new enhanced loss. This auxiliary guidance, which calculates the standard deviation among local areas and treats the standard deviation of terrain as targets, restricts its spatial distribution analogous to the corresponding terrain. Experiments compared with three basic interpolation techniques and 20 state-of-the-art deep learning methods demonstrate the effectiveness of the proposed TIGAM.
The remainder of this article is structured as follows. Section II presents the study area and the used data. The proposed method, i.e., TIGAM, is illustrated in detail in Section III. Section IV formulates the experimental configuration and the results. Details are discussed in Section V, while conclusions are finally drawn in Section VI.

A. Study Areas
The study region consists of the central and southern Guangdong Province of China and part of the South China Sea, covering 20.51°N-24.50°N, 111.50°E-115.49°E (see Fig. 1). The topography of this area exhibits spatial variability, with arc-shaped mountain ranges (e.g., Jiulian Mountain) in the north, a plain area in the middle, and the ocean area in the south. The wind field in this area is strongly affected by the interaction of complex topography, land-sea distribution, and weather systems (such as typhoons, low-level jets, cold fronts, etc.), making spatial wind downscaling in this region a problem worth studying.
Fig. 4. Employed meteorological prior guided denoise block. This block consists of four types of modules, i.e., the basic conv module, the downconv module, the flatten memory module, and the upconv module. Given a reconstructed wind image X R , the denoise block works in a cascaded ResUNet manner. Specifically, a flatten memory module is designed to fit the atmospheric differential equation, i.e., (8), in an implicit way. To imitate the multifactor atmospheric theory, an extra learnable input X L is adopted. Intuitively, the flatten memory module can be regarded as a spatial GRU network, and that is the reason why we call this module a flatten memory module.

B. Data Description
This article evaluates the proposed method on the hourly China Atmospheric Real-time Analysis System-surface analysis at 1-km spatial resolution (CARAS-SUR1km) reanalysis data. CARAS-SUR1km is a refined real-time professional meteorological product generation system developed by the Public Meteorological Service Center, China Meteorological Administration. It provides surface real-time professional service products (temperature, relative humidity, u-wind, v-wind, etc.) every hour with 1 km × 1 km spatial resolution. This system integrates observation data from more than 60 000 ground stations, CMA-GRAPES regional typhoon prediction model (GRAPES-TYM) data, high-precision terrain and underlying surface data, and other multisource observation data by using the upgraded multigrid variational method. Therefore, the CARAS-SUR1km products have good temporal and spatial continuity, and can reflect the local refined topographic features well. Following the typical strategy for single image super-resolution in computer vision, the high-resolution CARAS-SUR1km data are treated as ground truth and are downsampled to 2 km as the low-resolution model input. The high-resolution (1 km) orography data are also employed as auxiliary guidance information. Specifically, the CARAS-SUR1km data from 2016 to 2018 are used for training, while the data of 2019 are employed for validation and those of 2020 for testing.

C. Data Preprocessing
For better convergence, the wind components are preprocessed utilizing standard normalization as
$$\hat{x} = \frac{x - \mu}{\sigma} \tag{1}$$
where $\mu$ and $\sigma$ denote the mean and standard deviation of the variable $x$, while the terrain is first divided by 1500 and then normalized as in (1). Following the typical normalization trick in deep convolutional networks, the global mean and standard deviation are computed.
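The preprocessing above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the synthetic arrays, and the choice of a single scalar mean and standard deviation per variable are assumptions, not details taken from the released code.

```python
import numpy as np

def fit_stats(data):
    """Return the global mean and standard deviation of an array."""
    return float(data.mean()), float(data.std())

def standardize(data, mean, std):
    """Apply (x - mean) / std with precomputed global statistics."""
    return (data - mean) / std

def preprocess_terrain(terrain, mean, std, scale=1500.0):
    """Divide terrain height by 1500 first, then standardize."""
    return standardize(terrain / scale, mean, std)

# Synthetic stand-ins for real data (assumption, for illustration only).
rng = np.random.default_rng(0)
u_train = rng.normal(2.0, 3.0, size=(10, 16, 16))   # fake U-wind samples
terrain = rng.uniform(0.0, 1200.0, size=(16, 16))   # fake heights

u_mean, u_std = fit_stats(u_train)
u_norm = standardize(u_train, u_mean, u_std)

t_mean, t_std = fit_stats(terrain / 1500.0)
t_norm = preprocess_terrain(terrain, t_mean, t_std)
```

In practice the statistics would be fitted on the training years only and reused unchanged for validation and testing.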

A. Problem Formulation
This article focuses on spatial wind downscaling. Different from precipitation or temperature, winds are vectors with both magnitude $v$ and direction $\omega$. For better understanding, winds are typically decomposed into two orthogonal components, i.e., the U-wind $v\cos\omega$ and the V-wind $v\sin\omega$. Analogous to color images, this article regards these two components as two related channels. Unless otherwise stated, this two-channel data is referred to as a wind image.
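As a small illustration of this decomposition, the following NumPy sketch builds a two-channel wind image from magnitude and direction and inverts it again. The function names are hypothetical, and the angle is taken in the mathematical convention implied by the formulas above (radians, U = v cos ω, V = v sin ω), which may differ from the meteorological "direction the wind blows from" convention.

```python
import numpy as np

def to_wind_image(speed, direction_rad):
    """Stack U = v*cos(w) and V = v*sin(w) as the two channels."""
    u = speed * np.cos(direction_rad)
    v = speed * np.sin(direction_rad)
    return np.stack([u, v], axis=-1)

def from_wind_image(wind_image):
    """Recover magnitude and direction (in (-pi, pi]) from the channels."""
    u, v = wind_image[..., 0], wind_image[..., 1]
    return np.hypot(u, v), np.arctan2(v, u)

speed = np.array([[1.0, 2.0], [3.0, 4.0]])
direction = np.array([[0.0, np.pi / 2], [-np.pi / 3, 2.5]])

wind_image = to_wind_image(speed, direction)
speed_back, direction_back = from_wind_image(wind_image)
```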
Accordingly, given a low-resolution wind image $X_t \in \mathbb{R}^{w \times h \times c}$ of a specific area at time $t$, where $w$ and $h$ represent the spatial width and height and $c = 2$ denotes the two orthogonal components, spatial wind downscaling aims at recovering a relatively high-resolution and accurate wind image $\hat{Y}_t = f(\theta; X_t, s) \in \mathbb{R}^{sw \times sh \times c}$, which is as similar as possible to the ground-truth wind image $Y_t \in \mathbb{R}^{sw \times sh \times c}$. Here, $s$ represents the downscaling ratio and $f$ denotes the downscaling function defined by parameter $\theta$, which can be mathematically obtained as
$$\theta = \arg\min_{\theta} \sum_{t} \mathcal{L}\big(f(\theta; X_t, s),\, Y_t\big) \tag{2}$$
where $\mathcal{L}$ is a specified loss function, such as the L1 loss, the L2 loss, or others.
Fig. 6. Wind standard deviation versus terrain standard deviation. To be specific, the high-resolution terrain and the corresponding wind image are first split into small nonoverlapping patches. Then, the standard deviations are calculated within each patch. Furthermore, a linear regression model is fitted to the two standard deviations. Even though this relationship cannot be described by a fixed-parameter model, these two figures indicate that the wind standard deviation is highly correlated with the terrain standard deviation.
Fig. 7. Mechanism of the terrain-guided enhanced loss. In addition to the reconstruction loss, an auxiliary terrain-guided loss, which restricts the relationship between terrain and wind spatial distribution, is employed.
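To make the shapes in this formulation concrete, the sketch below builds a low-resolution input from a high-resolution target, applies a trivial stand-in for f, and evaluates an L1 loss. Block averaging as the coarsening operator and nearest-neighbor upsampling as the downscaling function are assumptions chosen for brevity; the text does not state how the 2-km input is sampled, and TIGAM itself is of course far more elaborate.

```python
import numpy as np

def block_average(hr, s):
    """Coarsen an (H, W, C) array by averaging non-overlapping s x s blocks."""
    H, W, C = hr.shape
    assert H % s == 0 and W % s == 0
    return hr.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))

def nearest_upsample(lr, s):
    """Trivial stand-in for f(theta; X, s): repeat each pixel s times per axis."""
    return lr.repeat(s, axis=0).repeat(s, axis=1)

def l1_loss(pred, target):
    """Mean absolute difference, one possible choice of the loss in (2)."""
    return float(np.abs(pred - target).mean())

rng = np.random.default_rng(0)
s = 2
Y = rng.normal(size=(8, 8, 2))      # synthetic high-resolution wind image
X = block_average(Y, s)             # low-resolution input
Y_hat = nearest_upsample(X, s)      # downscaled estimate
loss = l1_loss(Y_hat, Y)
```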

B. TIGAM Architecture
To achieve spatial wind downscaling, this article proposes TIGAM network. An intuitive illustration can be found in Fig. 2. Basically, TIGAM consists of three types of blocks, i.e., the similarity block, the reconstruction block, and the denoise block.
Given a low-resolution wind image, the similarity block first interpolates it to the desired size, and then, employs axial attention to discover the long-range spatial dependence.
After that, the reconstruction block aims to find a clearer high-resolution wind image, and the denoise block is devoted to obtaining a cleaner one. We should also note that, inspired by the deep unfolding super-resolution network (USRNet) [41], the reconstruction block and the denoise block are executed iteratively in an RNN manner for better performance. We then present the details of each block.
Analysis: Taking an RGB image as an example, a "clearer" image emphasizes clarity with details, while a "cleaner" image emphasizes purity without noise. Fig. 3 presents an intuitive illustration. In Fig. 3, subfigure (a) is noisy and blurred, subfigure (b) is clear but noisy, subfigure (c) is clean but blurred, and subfigure (d) is clear and clean. Both processes are typically necessary. To be specific, given a low-resolution input, the reconstruction block aims to reconstruct a relatively high-resolution one. Nevertheless, this process inevitably introduces noise due to the inherent property of solving an ill-posed super-resolution (downscaling) problem. Both types of block are therefore required.

C. Similarity Block: Exploring Long-Range Spatial Similarity
As illustrated in Section III-B, the similarity block roughly interpolates a low-resolution wind image $X_t$ to high resolution while taking long-range spatial dependence into consideration through the axial attention mechanism of [53]. Without loss of generality, we omit the subscript $t$ for simplicity (i.e., $X$ is equivalent to $X_t$ unless otherwise stated). In the mathematical formulation of this block, $q$, $k$, $v$, and $r$ are the corresponding query, key, value, and relative position encoding, respectively, $\mathrm{supersample}_s(\cdot)$ represents interpolating with scale factor $s$, and $\mathcal{N}$ denotes the corresponding neighbors. $X_S \in \mathbb{R}^{sw \times sh \times c}$ is the output of the similarity block.
Analysis: Let us denote $w' = sw$, $h' = sh$, and $C = c$ for simplicity, i.e., $X' \in \mathbb{R}^{w' \times h' \times C}$. When conducting width-axis attention, i.e., calculating $q$, $k$, and $v$ using $X'$, the learnable weight matrices are typically set as $W_{hq} \in \mathbb{R}^{C \times d}$, $W_{hk} \in \mathbb{R}^{C \times d}$, and $W_{hv} \in \mathbb{R}^{C \times 2d}$, where $d$ is the hidden dimension. Taking $q$ as an example, the width-axis multiplication is executed on the two matrices $X'_i \in \mathbb{R}^{h' \times C}$ and $W_{hq} \in \mathbb{R}^{C \times d}$ for each $i = 1, \ldots, w'$, and the resulting matrix is $q_i = X'_i W_{hq} \in \mathbb{R}^{h' \times d}$. The same operation is implemented on $k$ and $v$. In practice, we often employ an integrated learnable weight matrix for efficiency, so that $q$, $k$, and $v$ are computed jointly, and the results for all $i = 1, \ldots, w'$ are then collected. As for the relative position encoding $r$, it is a learnable input vector predefined in the axial attention layer that can be treated as relevant to the input data, i.e., $X'$. Through the above formulation, the proposed similarity block can exploit long-range spatial dependence, i.e., address the similarity challenge, via the axial attention mechanism.
TABLE I. OVERALL PERFORMANCE OF THE PROPOSED TIGAM AND OTHER STATE-OF-THE-ART METHODS
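The width-axis projection and attention described in the analysis can be sketched as follows. This is a simplified, single-head NumPy illustration under stated assumptions: the relative position encoding r and the neighborhood restriction are omitted (each position attends over the whole axis), and the function and variable names are hypothetical rather than those of [53] or of the released code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def width_axis_attention(X, W_q, W_k, W_v):
    """Attention within each slice X[i] of an array of shape (w', h', C).

    W_q, W_k: (C, d); W_v: (C, 2d). Slices never exchange information here.
    """
    q = X @ W_q                                 # (w', h', d)
    k = X @ W_k                                 # (w', h', d)
    v = X @ W_v                                 # (w', h', 2d)
    scores = q @ k.transpose(0, 2, 1)           # (w', h', h')
    return softmax(scores, axis=-1) @ v         # (w', h', 2d)

rng = np.random.default_rng(0)
w_p, h_p, C, d = 6, 5, 2, 4
X = rng.normal(size=(w_p, h_p, C))
W_q = rng.normal(size=(C, d))
W_k = rng.normal(size=(C, d))
W_v = rng.normal(size=(C, 2 * d))

out = width_axis_attention(X, W_q, W_k, W_v)

# Perturbing one slice only changes the output of that slice.
X_perturbed = X.copy()
X_perturbed[0] += 1.0
out_perturbed = width_axis_attention(X_perturbed, W_q, W_k, W_v)
```

Attention along the other axis can be obtained by calling the same function on `X.transpose(1, 0, 2)` and transposing the result back; applying the two passes in sequence is what lets every position be influenced by every other position at a cost far below full 2-D attention.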

D. Reconstruction Block: Recovering a Clearer Wind Image
Intuitively, even though the similarity block takes both latitude and longitude relevance into consideration through axial attention, the resulting wind image $X_S$ is relatively coarse. Following the strategy implemented in USRNet [41], this article also decouples the reconstruction process [i.e., (2)] into two cascaded half-quadratic subproblems. The first subproblem, which attempts to recover a clearer wind image $X_R$ in a closed-form solution, can be solved utilizing the fast Fourier transform without optimizing any trainable parameters. The second subproblem, i.e., the denoising problem, which focuses on achieving a cleaner wind image $\hat{Y}$, can be solved via the denoise block with a flatten memory strategy proposed below.

E. Denoise Block: Reconstructing a Cleaner Wind Image
The denoise block is designed in a ResUNet framework. A detailed description can be found in Fig. 4. Specifically, the denoise block contains a basic Conv module for feature transformation, three DownConv modules for high-level feature extraction, a flatten memory module for fitting the atmospheric prior, three UpConv modules for feature reconstruction, and a basic Conv module for wind image reconstruction.
For efficiency, the basic Conv module contains only a single convolutional layer without any normalization or activation, i.e., $X_B = \mathrm{Conv}(X_R)$. The DownConv module consists of three convolutional layers with a ReLU activation and a residual connection. Note that the last convolutional layer uses a stride for downsampling. In the formulation of the DownConv module, $*$ represents convolution with parameter $W$ and $*_S$ denotes convolution with stride. We omit the bias term for simplicity.
Atmospherically, the wind is highly correlated with many essential factors as
$$\frac{d\vec{V}}{dt} = -\frac{1}{\rho}\nabla p - 2\vec{\Omega} \times \vec{V} + \vec{g} + \vec{N} \tag{8}$$
where $\vec{V}$, $p$, $\rho$, $\vec{\Omega}$, $\vec{g}$, and $\vec{N}$ indicate wind, air pressure, air density, the rotation of the Earth (which gives rise to the Coriolis force), gravitational acceleration, and air friction, respectively. Therefore, according to (8), the interactive influence among different meteorological elements must be thoroughly considered for accurate spatial wind downscaling.
Specifically, a novel flatten memory module is proposed to deal with this issue. Suppose the output of the last DownConv module is $X_D$; the flatten memory module then fits (8) and takes an extra learnable input $X_L$ to imitate other meteorological factors.
Generally, $X_D$ and $X_L$ are first concatenated and processed via a single convolutional layer with sigmoid activation. After that, a hidden memory representation $X_H$ is obtained, from which the final output of the flatten memory module, $X_M$, is computed. More details can also be found in Fig. 5.
The UpConv module recovers the high-level feature representation in a cascaded mechanism. In its mathematical formulation, $*_T$ denotes transposed convolution.
Denoting the output of the last UpConv module as $X_U$, another basic convolutional layer is implemented to reconstruct the high-resolution wind image as $\hat{Y} = \mathrm{Conv}(X_U)$.
Here, $X_L$ represents the learnable input, $X_D$ denotes the output of the last DownConv module, and $X_M$ stands for the output of the flatten memory module. As for $X_Z$, $X_R$, and $X_H$ in the flatten memory module, we follow their original definitions in GRU [54] or ConvGRU [55] as the update gate, reset gate, and hidden state, respectively.
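Since the module is described as GRU-like, one possible reading of its gating can be sketched in NumPy. Everything specific here is an assumption made for illustration: the standard GRU update is used with $X_D$ playing the role of the input and $X_L$ the role of the previous state, the convolutions are reduced to 1 × 1 (per-pixel linear maps) instead of spatial kernels, and all names are hypothetical. The exact equations of the paper may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def flatten_memory_cell(X_D, X_L, W_z, W_r, W_h):
    """GRU-style gating on feature maps of shape (H, W, C).

    W_z, W_r, W_h have shape (2C, C) and act as 1x1 convolutions
    (assumption; a real implementation would use spatial kernels).
    """
    both = np.concatenate([X_D, X_L], axis=-1)             # (H, W, 2C)
    gate_z = sigmoid(both @ W_z)                           # update gate
    gate_r = sigmoid(both @ W_r)                           # reset gate
    candidate_in = np.concatenate([X_D, gate_r * X_L], axis=-1)
    hidden = np.tanh(candidate_in @ W_h)                   # hidden state
    return (1.0 - gate_z) * X_L + gate_z * hidden          # module output

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3
X_D = rng.normal(size=(H, W, C))          # stand-in for DownConv features
X_L = rng.normal(size=(H, W, C))          # would be a trainable parameter
W_z, W_r, W_h = (rng.normal(size=(2 * C, C)) for _ in range(3))

X_M = flatten_memory_cell(X_D, X_L, W_z, W_r, W_h)
```

In a trainable implementation, `X_L` would be registered as a parameter with the same spatial size as `X_D` and updated by back-propagation, which is what distinguishes it from an input computed in the forward pass.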
Analysis: On one hand, atmospheric science has been investigated for hundreds of years, and it has now established a mature theory. On the other hand, the emergence of deep learning has facilitated many related areas, such as meteorology. We believe that the established meteorological theory provides constructive guidance for deep-learning-based meteorological methods. We therefore attempt to explore an elaborate network combined with the established meteorological theory. Consequently, the flatten memory module is proposed.
Specifically, the flatten memory module is designed to deal with the apriority and coupling challenges. In the case of apriority, it is hard to design a deep network under the physical constraint [see (8)] directly. However, (8) can be roughly divided into two parts, i.e., wind and other variables. To imitate (8), two input branches corresponding to wind and other variables are essential. For simplicity and generality, we define the other variables as a learnable term. Putting this term aside, (8) contains only basic operations such as addition and multiplication. A single convolutional layer is then capable of imitating this relationship. Nevertheless, instead of using a single-layer network, we employ a memory unit to fit this function. Meanwhile, the coupling challenge is addressed through the aforementioned learnable input.
The reason why we call it a flatten memory module is twofold. First, this module is mainly inspired by the gated recurrent unit (GRU), which consists of multiple temporal memory units for sequential learning. The basic structure illustrated in Fig. 5 is similar to this memory unit. That is the reason why we still call it a memory module. Second, instead of exploiting temporal dependence, the newly proposed module tries to explore spatial relevance. This is equivalent to flattening a recurrent module spatially while still preserving the property of memory, even though this memory stores spatial relevance rather than temporal dependence. That is the reason why we call it a flatten module. Consequently, we call it a flatten memory module.
Although ConvGRU and the proposed flatten memory module share the same basic architecture, these two components are distinct at the cell level. The ConvGRU cell takes the original fixed sequential snapshots as input, while the flatten memory module cell operates on a learnable spatial dimension. More specifically, the only input for ConvGRU is the given sequence X, whereas for the flatten memory module, apart from the given input $X_D$, there is an additional learnable position-sensitive part $X_L$. This learnable part is independent of $X_D$. We call it learnable because $X_L$ is obtained via backward optimization instead of forward computation. We call it position sensitive because the size of $X_L$ is similar to that of $X_D$. From this point of view, the ConvGRU cell is a special case of the flatten memory module cell. In other words, when $X_L$ is frozen during backward optimization and made dependent on $X_D$ in the forward process, the flatten memory module degrades to ConvGRU.

F. Terrain-Guided Enhanced Loss
Through the strategy illustrated above, the network is capable of dealing with the first three ASCL challenges (Apriority, Similarity, and Coupling) and obtaining a relatively high-resolution wind image. As for Locality, we observe that the standard deviation of wind within local areas is highly relevant to the local terrain. Fig. 6 reveals this finding.
Specifically, after an investigation of early contributions and a statistical analysis, we found that the standard deviation of the local terrain is an important indicator for clear wind image reconstruction. Therefore, a terrain-guided enhanced loss is elaborately designed (see Fig. 7).
To be specific, the reconstructed wind image $\hat{Y}$ is first split into nonoverlapping windows of size $k \times k$, and the standard deviation is then calculated within each window. The same operations are also conducted on the terrain $T$. The enhanced loss $L_{en}$ is then defined on these two sets of window-wise standard deviations, where $\xi$ is a hyperparameter, and the final loss $L$ is a combination of both the reconstruction loss $L_{re} = \|\hat{Y} - Y\|_q$ and the enhanced loss $L_{en}$, where $\delta$ is a hyperparameter for tradeoff. The involved parameters can be optimized via back-propagation.
Analysis: According to early works [29], [30], [31] and after a statistical analysis, we argue that there is a high correlation between terrain and wind, especially between the terrain standard deviation and the wind standard deviation. Unfortunately, the mathematical form of this relationship is unknown. To investigate this correlation, we denote $x = \mathrm{std}(\hat{Y}_{k \times k})$ and $y = \mathrm{std}(T_{k \times k})$, where $\hat{Y}$, $T$, and $k$ represent the reconstructed high-resolution wind image, the terrain, and the nonoverlapping window size, respectively. We can define this correlation abstractly via a nonlinear map as $y = f(x)$. According to the theory of Taylor expansion, this map can be reformulated as a series in $x$, i.e., (15), and the rewritten form of (15) agrees with the proposed terrain-guided enhanced loss (13). We must note that this enhanced loss is quite coarse and limited. However, it is supposed to be a reasonable approximation of $f(x)$, and the experimental results also demonstrate its validity.
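The window-wise computation behind the enhanced loss can be sketched as follows. The window standard deviation follows the description above; the specific penalty (mean absolute difference between the wind standard deviation and ξ times the terrain standard deviation), the additive combination with weight δ, the use of q = 1 for the reconstruction loss, and all function names are assumptions made for illustration and should not be read as the exact definitions in (13) and (14).

```python
import numpy as np

def window_std(field, k):
    """Standard deviation inside non-overlapping k x k windows of an (H, W) array."""
    H, W = field.shape
    assert H % k == 0 and W % k == 0
    return field.reshape(H // k, k, W // k, k).std(axis=(1, 3))

def enhanced_loss(wind_pred, terrain, k=8, xi=1.0):
    """Assumed form: mean |std(wind) - xi * std(terrain)|, averaged over channels."""
    target = xi * window_std(terrain, k)
    per_channel = [
        np.abs(window_std(wind_pred[..., c], k) - target).mean()
        for c in range(wind_pred.shape[-1])
    ]
    return float(np.mean(per_channel))

def total_loss(wind_pred, wind_true, terrain, k=8, xi=1.0, delta=0.05):
    """Assumed combination: L1 reconstruction loss plus delta times enhanced loss."""
    reconstruction = float(np.abs(wind_pred - wind_true).mean())
    return reconstruction + delta * enhanced_loss(wind_pred, terrain, k, xi)

rng = np.random.default_rng(0)
terrain = rng.uniform(0.0, 1200.0, size=(16, 16))     # synthetic terrain
wind_true = rng.normal(size=(16, 16, 2))              # synthetic target
wind_pred = wind_true + 0.1 * rng.normal(size=(16, 16, 2))

# A wind field proportional to the terrain gives zero enhanced loss for xi = 0.5.
wind_like_terrain = np.stack([0.5 * terrain, 0.5 * terrain], axis=-1)
zero_case = enhanced_loss(wind_like_terrain, terrain, k=8, xi=0.5)
loss_value = total_loss(wind_pred, wind_true, terrain)
```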

A. Experiment Setting
TIGAM is implemented in PyTorch. Code and sample data are available at https://github.com/Tsingzao/TIGAM. Unless otherwise stated, the downscaling factor is set to 2, the numbers of reconstruction and denoise blocks are both set to 8, the hidden channel size is 64, and the convolutional kernel size is 3 × 3. There are three DownConv and three UpConv modules in the denoise block. The nonoverlapping window size is k = 8, and the tradeoff hyperparameter is δ = 0.05.
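For reference, the settings listed above can be gathered into one configuration object. The class and field names below are hypothetical and do not come from the released code; only the values are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TigamConfig:
    # Field names are hypothetical; values are the ones stated in the text.
    scale_factor: int = 2                # downscaling factor
    num_reconstruction_blocks: int = 8
    num_denoise_blocks: int = 8
    hidden_channels: int = 64
    kernel_size: int = 3                 # 3 x 3 convolutions
    num_down_convs: int = 3              # DownConv modules in the denoise block
    num_up_convs: int = 3                # UpConv modules in the denoise block
    window_size: int = 8                 # nonoverlapping window size k
    delta: float = 0.05                  # tradeoff hyperparameter


config = TigamConfig()
```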

B. Evaluation Metrics
For fair comparison, this article employs the standard mean absolute error (MAE) metric for quantitative evaluation. Suppose Y_t and Ŷ_t denote the t-th ground-truth high-resolution wind image and the downscaled one, respectively; MAE is then the mean absolute difference between Y_t and Ŷ_t. Apart from that, the widely used structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) are also evaluated. In their formulations, I_max denotes the maximum value of wind, c_1 and c_2 are constants, and μ and σ represent the corresponding mean and variance, respectively. PSNR is computed from the root-mean-square error (RMSE) between Y_t and Ŷ_t. For MAE, smaller is better, while for SSIM and PSNR, larger is better.
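As an illustration, the metrics can be sketched with their commonly used textbook definitions. These are assumptions rather than the article's exact formulations: the function names are hypothetical, PSNR is taken as 20 log10(I_max / RMSE), and SSIM is computed from global image statistics instead of the sliding local windows that most library implementations average over.

```python
import math


def _flatten(image):
    return [value for row in image for value in row]


def mae(y_true, y_pred):
    """Mean absolute error over all pixels."""
    a, b = _flatten(y_true), _flatten(y_pred)
    return sum(abs(p - t) for t, p in zip(a, b)) / len(a)


def rmse(y_true, y_pred):
    """Root-mean-square error over all pixels."""
    a, b = _flatten(y_true), _flatten(y_pred)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(a, b)) / len(a))


def psnr(y_true, y_pred, i_max):
    """ASSUMED form: 20 * log10(i_max / RMSE); infinite for identical images."""
    err = rmse(y_true, y_pred)
    return math.inf if err == 0 else 20.0 * math.log10(i_max / err)


def ssim_global(y_true, y_pred, c1, c2):
    """Simplified SSIM using statistics of the whole image (no local windows)."""
    a, b = _flatten(y_true), _flatten(y_pred)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Each function takes two images as nested lists of equal shape; averaging the per-image values over all time stamps t would give dataset-level scores.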
More concretely, early contributions such as SRCNN [32], SubPixel [34], VDSR [39], FSRCNN [33], and EDSR [35] perform slightly worse than the other methods, mainly because these models consist of simple stacked convolutional layers without much specific design for downscaling. The recent methods SAN [62], USRNET [41], and the newly proposed TIGAM are superior to the remaining methods. WindTopo [31] outperforms most of the single-image super-resolution methods, and the performance of Restormer [45] is quite competitive with TIGAM. From this table, the proposed TIGAM consistently outperforms the other methods on all of the evaluated metrics (i.e., MAE, SSIM, and PSNR) for both the U and V components, demonstrating the effectiveness of the proposed method.
We should note that WindTopo is an elaborately designed downscaling network for a particular data format, so it cannot be applied to our data directly. Therefore, we modify the originally released code in five aspects. First, we change the station-based data format to a grid-based one. Second, the reimplemented WindTopo employs nonlocal convolution instead of multiple-zoom patch-based convolution for efficiency. Third, the network is altered to be fully pixel-in pixel-out. Fourth, for grid-based downscaling, we apply the widely used pixel-shuffle operation. Fifth, limited by the available data, we take only the u-wind, v-wind, DEM, slope, and aspect as model input.
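The pixel-shuffle operation mentioned in the fourth modification rearranges channels into spatial resolution. The sketch below is a plain-Python illustration that follows the channel ordering of PyTorch's `nn.PixelShuffle`; whether the reimplemented WindTopo uses exactly this ordering is an assumption, and the function name is hypothetical.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C * r * r, H, W) nested list into (C, H * r, W * r).

    Output pixel (c, h * r + i, w * r + j) is read from input channel
    c * r * r + i * r + j at position (h, w).
    """
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    assert channels_in % (r * r) == 0, "channel count must be divisible by r^2"
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                source = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = source[h][w]
    return out


# Four 1 x 1 channels become one 2 x 2 image when r = 2.
upsampled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
```

With a downscaling factor of 2, a network that outputs four channels per wind component at low resolution can therefore produce one channel at twice the resolution without any interpolation step.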

D. Qualitative Results
To give a more intuitive comparison, this subsection presents qualitative comparisons between TIGAM and other state-of-the-art methods. Specifically, the results at time stamp 2020-8-19 08:00 (BJT) are shown, when Higos, the seventh typhoon of 2020, struck the study area. The maximum wind speed reached 126 km/h and caused severe damage along the typhoon's path. Fig. 8 exhibits the differences between the downscaling results and the ground truth. Since the proposed terrain-guided enhanced loss takes the terrain effect into consideration, it enables our model to achieve better spatial wind downscaling performance in terrain-affected areas. It is worth noting that other meteorological variables beyond wind, e.g., temperature and precipitation, are also highly correlated with the terrain. Therefore, our proposed model can be extended to these meteorological variables. This is a challenging topic and will be explored in our future research.

A. Ablation Study
Ablation studies are first conducted to systematically and comprehensively analyze TIGAM. Table II demonstrates the performance of the newly proposed blocks, including the axial similarity block (Axial), the flatten memory strategy (FM), and the terrain-guided enhanced loss (TL). For efficiency, both SRCNN and SwinIR are employed as the baselines. SRCNN/SwinIR + * denotes adding the corresponding * block to SRCNN/SwinIR.
According to Table II, the three proposed blocks all have positive effects, illustrating that the newly proposed strategies are reasonable for spatial wind downscaling. On one hand, the axial similarity block improves the downscaling results because meteorological elements are highly correlated spatially. On the other hand, the local spatial distribution constraint, which is embedded into the terrain-guided enhanced loss, also facilitates spatial downscaling. Finally, benefitting from the flatten memory strategy, a position-sensitive learnable input is taken into account and it can fit the motion (8) indirectly.
In addition, using SwinIR as the baseline outperforms using SRCNN. The reason is straightforward: SwinIR consists of more elaborately designed modules for feature extraction, feature transformation, and detail reconstruction. The three proposed blocks improve model performance when they are embedded into SwinIR, illustrating that these blocks are feasible for wind downscaling. However, we should also note that the performance improvement obtained with SwinIR as the baseline is smaller than that obtained with SRCNN.

B. Sensitivity Analysis
As illustrated in Section III-B, the denoise block is executed iteratively in an RNN manner, so there is in fact only one denoise block. Memory units such as LSTM often run recursively, and the flatten memory module can be regarded as a special case of such units; consequently, running the denoise block iteratively is reasonable. We then conduct a sensitivity analysis on the number of recurrent blocks. The results are presented in Fig. 9. As the number of iterative denoise blocks grows, the model performance first increases and then drops, while the model efficiency decreases. Taking both model performance and efficiency into consideration, we eventually select eight blocks.
We also conduct a sensitivity analysis on the tradeoff parameter δ using SwinIR as the baseline model. The results are presented in Table III. Specifically, the model achieves its best performance when the tradeoff parameter is set to 0.05.

C. Maximum Wind
Furthermore, Fig. 10 presents the daily maximum wind during January, April, July, and October, which represent winter, spring, summer, and autumn, respectively. The results indicate that TIGAM can successfully reproduce the temporal variability in the observed daily maximum field. The daily maximum wind speed obtained by TIGAM is basically consistent with the observation. Compared with other methods, TIGAM has better or competitive performance in terms of the bias of daily maximum wind in spatial wind downscaling.
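A daily maximum series of this kind can be sketched as follows for a single grid point. The sketch assumes hourly U and V components and takes the wind speed as sqrt(U² + V²); the function name and the exact aggregation used for Fig. 10 are assumptions.

```python
import math


def daily_max_speed(u_hourly, v_hourly, hours_per_day=24):
    """Daily maximum wind speed from hourly U and V components at one point."""
    speeds = [math.hypot(u, v) for u, v in zip(u_hourly, v_hourly)]
    return [max(speeds[start:start + hours_per_day])
            for start in range(0, len(speeds), hours_per_day)]
```

Applying the function to both the observed and the downscaled series, and then subtracting the two results, gives the bias of the daily maximum wind discussed above.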

D. Bias Analysis
Fig. 11 gives an intuitive view of the wind speed bias. The results of TIGAM exhibit high correlation with the ground truth, especially for extreme winds. The mean bias error (MBE) between TIGAM and the observed winds is −0.0032, which indicates that TIGAM slightly underestimates the wind speed relative to the ground truth.
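The sign convention behind this reading of the MBE can be made explicit with a short sketch. It assumes the bias is taken as prediction minus observation, which is consistent with a negative value meaning underestimation; the function name is hypothetical.

```python
def mean_bias_error(predicted, observed):
    """Mean of (prediction - observation); negative values mean underestimation."""
    assert len(predicted) == len(observed)
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)
```

Unlike MAE, positive and negative errors cancel in the MBE, so a value close to zero shows the absence of systematic bias rather than the absence of error.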

E. Deviation Analysis
Regression-based downscaling approaches tend to underestimate extremes because of the frequently used L1 or L2 loss. To avoid these underestimations and to better describe extreme details, we propose the terrain-guided enhanced loss. Figs. 12 and 13 present the local deviations. Specifically, we display these deviations in two manners, i.e., temporally and spatially.
For temporal deviation, we select the central point of the study area as an example and calculate its standard deviation over time (from 2020-01-01 00:00 to 2020-12-31 23:00). There is no significant difference between the ground-truth observation and the TIGAM prediction, indicating that TIGAM performs consistently over time.
For spatial deviation, we calculate the standard deviation over the study area hourly. To illustrate TIGAM's performance clearly, Fig. 13 shows the deviation difference between the ground-truth observation and the TIGAM prediction. There is no remarkable difference between them, illustrating that TIGAM is a feasible method for extreme spatial downscaling.
Table IV shows the model complexity and running time of all compared methods as reimplemented. Specifically, the running time is measured on a single 24 G TITAN Xp GPU. We should note that running time and model complexity are not linearly correlated. Compared with simple architectures such as SRCNN, FSRCNN, SUBPIXEL, and VDSR, both the model complexity and the running time of TIGAM are larger. Compared with relatively complex methods such as RCAN, RDN, SAN, and USRNET, both are comparable. Taking the model complexity, the running time, and the model performance all into consideration, TIGAM is superior to the other methods.

VI. CONCLUSION
This article proposed a novel TIGAM network for spatial wind downscaling. Specifically, a similarity block with an axial attention mechanism is proposed for exploiting long-range spatial dependence. To investigate the advanced meteorological prior, this article also presents a denoise block with a flatten memory module. Benefitting from this module, a position-sensitive input can be learned to fit the complex influence among different meteorological elements. Furthermore, a terrain-guided enhanced loss is proposed to depict the details of downscaling results. Qualitative and quantitative results demonstrate the superiority of the proposed TIGAM.
Generally, spatial downscaling amounts to generating unknown values conditioned on limited information, and the widely explored generative adversarial networks provide a possible solution. Consequently, future work will focus on learning a deep generative model for meteorological downscaling.
Besides, in view of the strong temporal dependence among different meteorological elements, such as temperature, relative humidity, air pressure, and winds, a multimodal spatial-temporal downscaling algorithm with strict atmospheric restriction is also under construction. Apart from that, this article tries to solve the deep-learning-based atmospheric problem indirectly. We leave the direct method as a future research topic.