Reduced-Complexity Multirate Remote Sensing Data Compression With Neural Networks

One of the main limitations to the adoption of deep learning for image compression is the need to train multiple models to compress at multiple rates. In the case of onboard remote sensing data compression, another limitation is the computational cost of the neural networks. Addressing both limitations, this letter presents a new reduced-complexity architecture for multirate compression of remote sensing images. The proposed architecture enables compression at a precise user-selected rate while maintaining competitive lossy compression performance on different sets of remote sensing data. The proposed approach is amenable to onboard deployment.


I. INTRODUCTION
THERE is a lot to see from above, as illustrated by the approximately 5500 satellites that are currently in orbit around the Earth, of which more than 1100 are dedicated to Earth observation, according to the Union of Concerned Scientists as of May 2022 [1]. With 19% of those Earth observation satellites launched in the last two years, it is clear that interest in remote sensing remains strong today, with ever more data being sensed and requiring transmission down to Earth.
Remote sensing data compression is crucial, given the vast volumes of captured data and the satellite's limited downlink capacity. In particular, lossy compression is often considered in order to fit the bitrate requirements of the mission [2], [3]. Furthermore, computational capabilities are severely limited on board, introducing yet another key requirement for a remote sensing data compression algorithm. As a result, remote sensing data compression is an active field of research, with many new proposals and developments every year.
Machine learning (ML) has produced a breakthrough in lossy compression for natural images in the last six years, surpassing techniques such as JPEG [4], JPEG 2000 [5], and intraframe HEVC [6] in lossy compression [7], [8], [9], [10], [11], [12]. ML compression has also been applied to remote sensing data [13], [14], [15], [16], [17], [18], [19]. These contributions have employed models presented in [7] and [9] as baseline architectures. Regarding architectures dedicated to single-band images, Alves de Oliveira et al. [15] showed that applying the architecture proposed in [9] outperforms JPEG 2000 [5] for satellite image lossy compression, and further proposed a reduced-complexity version of that architecture competitive with the baseline models. This reduced-complexity design was later used for compression and denoising of panchromatic satellite images [20] and as part of a 1-D + 2-D framework for onboard compression of hyperspectral satellite images [21]. Other works published on ML compression of single-band remote sensing images include those by Xu et al. [22] and by Di et al. [23], both of which are also based on [9].
A crucial barrier to the practical adoption of models like those cited above is that they are trained for a specific rate-distortion trade-off, regulated by a parameter in their loss function. As a result, multiple models have to be trained to allow for compression at multiple rates. This is not only costly to train, but also has computational implications (storing multiple models in memory and loading and unloading them for compression at different rates), not to mention the fact that they do not allow for a continuous choice of rates. Numerous authors have tackled this problem in order to propose multirate neural image compression [12], [24], [25], [26], [27], [28].
In this letter, we propose a novel multirate variant of the reduced-complexity compression architecture for remote sensing data from [15]. The proposed method features compression at a user-defined bitrate, a novel capability with respect to other multirate compression neural architectures. To the best of our knowledge, this is the first application of such methods to remote sensing data, as well as the first attempt at complexity reduction of such multirate architectures, and a first in practical compression at a user-defined bitrate. It is demonstrated that the proposed multirate compression architecture performs on par with other more complex existing multirate compression architectures and with the multimodel baseline. In the following, multimodel baseline refers to a model trained multiple times, once for each rate-distortion trade-off.
The rest of this letter is structured as follows. Section II introduces the end-to-end optimized transform coding paradigm this work is based upon, going into detail on some of the multirate compression techniques proposed to date. Section III describes the proposed ML method and the associated architecture. Section IV reports experimental results. Finally, Section V provides a discussion of our findings.

II. END-TO-END OPTIMIZED TRANSFORM CODING
End-to-end optimized transform coding is the state-of-the-art approach for lossy image compression based on ML. Just like in classical transform coding, it consists of encoding the image by transforming it to a latent domain, quantizing it, and entropy-encoding it. For decompression, the bitstream is entropy-decoded and transformed back to the original image domain. Setting this paradigm apart from classical transform coding, here, two neural networks act as the encoder and decoder transforms, respectively, and are jointly trained to minimize the rate-distortion trade-off [11].
Entropy coding in ML image compression can be achieved, for instance, by an arithmetic coder with some probability distribution known to both coder and decoder. In [7] this distribution is fixed, but rate-distortion performance can be greatly improved by adapting the distribution to the input data. This was investigated in [9], introducing a hyperprior, which consists of an additional neural network that processes the latent representation to extract and encode some of its parameters, such as its standard deviation. These parameters have to be encoded and sent to the receiver as side information. We will refer hereafter to the autoencoder architecture presented in [9] as the Ballé2018 architecture. This lossy compression paradigm using autoencoders has been refined over time and state-of-the-art ML image codecs today use increasingly complex versions of this concept, such as a Gaussian Mixture entropy model and an expanded residual CNN as the main transform [29], or an Asymmetric Gaussian entropy model with a large hyperprior network [12].
The loss function to be optimized in training by these autoencoders is

$$L = R(\hat{y}) + \lambda \, D(x, \hat{x})$$

where R(·) stands for the rate of the quantized latent representation $\hat{y}$, D(·, ·) stands for the distortion between the original image $x$ and the reconstructed image $\hat{x}$, and λ is a parameter set during training that regulates the rate-distortion trade-off. Since the model is optimized for a specific rate-distortion trade-off, different models have to be trained in order to allow for compression at different rates.
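As a minimal illustration of how λ steers this trade-off, the following sketch evaluates a loss of the form rate + λ · distortion for two operating points. The function names and the two (rate, distortion) pairs are hypothetical values chosen for illustration, not measurements from this letter, and the weighting of the distortion term by λ is assumed to match the convention used later in Section III-C, where a larger λ yields a higher bitrate.

```python
def rd_loss(rate, distortion, lam):
    """Rate-distortion loss: rate plus lambda-weighted distortion (assumed form)."""
    return rate + lam * distortion


# Hypothetical operating points: name -> (rate in bits per sample, mse distortion).
OPERATING_POINTS = {
    "coarse": (0.5, 40.0),  # few bits, high distortion
    "fine": (2.0, 5.0),     # many bits, low distortion
}


def preferred_point(lam, points=OPERATING_POINTS):
    """Return the name of the operating point with the smallest loss for this lambda."""
    return min(points, key=lambda name: rd_loss(points[name][0], points[name][1], lam))


if __name__ == "__main__":
    for lam in (0.01, 0.1):
        print(lam, preferred_point(lam))
```

With a small λ the rate term dominates and the low-rate point is preferred; with a larger λ the distortion term dominates and the high-rate point wins. This is why a model trained with one fixed λ settles on a single trade-off.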
To overcome the practical limitation imposed by having to train multiple models to compress images at different rates, a multirate architecture would allow for continuous bitrate choices and require the training of a single model, greatly reducing cost in time and resources.
Modulation is one of the most relevant techniques to achieve learned multirate image compression, and aims to mimic the adjustment of quantization step size performed by classical transform codecs. A modulated autoencoder is an autoencoder together with an auxiliary neural network (the modulating neural network) which, given some parameter (in this case λ), adjusts the activations in the main autoencoder network to produce a different output. For autoencoders used in image compression, a simple version of this consists in learning a scaling of the latent representation, which amounts to varying the quantization step of the encoder in relation to the rate-distortion multiplier λ, regulating that trade-off as in the fixed-rate case. In this method, which we will refer to as bottleneck modulation, it is preferable to jointly optimize the main autoencoder and the modulating network, in which case the modulated network practically matched the rate-distortion performance of the multimodel baseline [24], [28].
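The channel-wise scaling behind bottleneck modulation can be sketched as follows. This is a toy example under stated assumptions, not the network of Fig. 1: the one-layer exponential mapping in `channel_scales`, its weights, and the nested-list latent layout (channels, rows, columns) are all hypothetical and serve only to show how a per-channel scale changes the effective quantization step.

```python
import math


def channel_scales(lam, weights, biases):
    """Toy modulating network (assumption): map a scalar lambda to one
    positive scale per latent channel via exp(w * lambda + b)."""
    return [math.exp(w * lam + b) for w, b in zip(weights, biases)]


def modulate(latent, scales):
    """Scale every channel of a (C, H, W) nested-list latent by its own scale."""
    return [
        [[value * scale for value in row] for row in channel]
        for channel, scale in zip(latent, scales)
    ]


def quantize(latent):
    """Round every latent value to the nearest integer (uniform quantization)."""
    return [[[round(value) for value in row] for row in channel] for channel in latent]


if __name__ == "__main__":
    latent = [[[0.4, 1.2]], [[-0.7, 2.6]]]  # 2 channels, 1 x 2 samples each
    print(quantize(modulate(latent, [1.0, 1.0])))  # unit scales
    print(quantize(modulate(latent, [1.0, 2.0])))  # second channel scaled up
```

Scaling a channel up before rounding is equivalent to quantizing it with a smaller step, so the quantized symbols spread over a wider range and generally cost more bits; scaling down has the opposite effect.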
Various more complex modulated autoencoders have also been proposed, which generally modulate the outputs of every layer in the encoder and decoder, not just the latent representation. We will refer to this approach as full modulation. It has been shown that full modulation can more closely match the performance of the multimodel baseline than bottleneck modulation [26]. Full modulation can be extended further to include the parametrization of the entropy model [25].

III. PROPOSED METHOD
To address the specific needs of remote sensing data compression using this ML paradigm, we propose a novel method that can feasibly run on onboard hardware, allows compression at a user-selected bitrate, and is competitive with current standards and techniques. To the best of our knowledge, this is the first reduced-complexity multirate neural compression method for remote sensing data.

A. Reduced-Complexity Bottleneck-Modulated Compressive Autoencoder
A novel neural network architecture for remote sensing data compression is proposed, based on [9] and shown in Fig. 1. This design's complexity is reduced as in [15] by using a reduced number of filters in the hidden layers. A bottleneck-modulating network [24], [28] is included, which allows us to finely scale the features in the latent space for multirate compression. The proposed architecture also incorporates range-adaptive normalization as proposed in [21] for data sources with widely varying sample distributions, as is common for remote sensing images. The choice of bottleneck modulation over full modulation as advocated by Yang et al. [26] is motivated by two reasons: complexity reduction and precise bitrate allocation. Regarding complexity, bottleneck modulation clearly requires far fewer operations than full modulation and, although this negatively impacts performance, the difference is small, as is indeed found by Dumas et al. [24] and Yang et al. [26].

TABLE I DETAILED COMPLEXITY OF THE REDUCED-COMPLEXITY BOTTLENECK-MODULATED ARCHITECTURE (PROPOSED AND BALLÉ 2018)

B. Complexity Analysis
Just as in [15], we compute the complexity of our proposed architecture and that of the Ballé2018 baseline by counting the number of operations per pixel of each layer in Table I. In particular, we specify the number of operations of the encoder since this is the part subject to onboard constraints. Note that the number of filters per layer is detailed in Section IV. Since the modulating network's size is fixed regardless of the input data size, it is assumed in the calculation that the size is 128 × 128 pixels for the floating-point operations per pixel (FLOP/pixel) calculation. Clearly, the contribution of bottleneck modulation to the number of operations is practically negligible, and the overall complexity and number of parameters in the network is almost identical to that of the equivalent fixed-rate model. As shown in Table I, our proposed method requires 64% fewer operations than the Ballé2018 architecture in encoding.
The number of encoder operations of the proposed architecture, around 15 kFLOP/pixel, is compatible with an embedded implementation on board using hardware such as the Movidius Myriad 2 [15], and, by extension, with more capable and efficient state-of-the-art hardware, where the method can be run in a short enough time that it does not create a backlog of data to be processed. For completeness, we also mention that this complexity is two orders of magnitude higher than that of the CCSDS-122.0 or JPEG 2000 standards [30].
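The per-layer counting described above can be sketched as follows. This is an illustrative sketch only and does not reproduce Table I: the layer configuration (four 5 × 5 convolutions with stride 2 and a single input band) is an assumption about a Ballé2018-style encoder, normalization layers and the hyperprior are ignored, and the function names are hypothetical. Counts are in multiply-accumulate operations (MACs) per input pixel.

```python
def conv_macs_per_pixel(kernel, c_in, c_out, cumulative_stride):
    """MACs per input pixel of a strided convolution whose output grid is
    downsampled by `cumulative_stride` in each dimension w.r.t. the input."""
    return kernel * kernel * c_in * c_out / cumulative_stride ** 2


def dense_macs_per_pixel(n_in, n_out, height=128, width=128):
    """MACs of a dense layer amortized over an image of height x width pixels.
    The dense layer runs once per image, so its per-pixel cost is tiny."""
    return n_in * n_out / (height * width)


def encoder_macs_per_pixel(n, m, in_channels=1, kernel=5, stride=2, num_layers=4):
    """Sum conv MACs per pixel for an assumed encoder: `num_layers` strided
    convolutions with `n` filters each, except the last, which has `m`."""
    total = 0.0
    c_in = in_channels
    cumulative = 1
    for layer in range(num_layers):
        c_out = m if layer == num_layers - 1 else n
        cumulative *= stride  # each layer halves the spatial resolution
        total += conv_macs_per_pixel(kernel, c_in, c_out, cumulative)
        c_in = c_out
    return total


if __name__ == "__main__":
    print(encoder_macs_per_pixel(64, 192))   # reduced number of filters
    print(encoder_macs_per_pixel(128, 192))  # larger number of filters
    print(dense_macs_per_pixel(1, 192))      # one amortized dense layer
```

Because the cost of a convolution grows with the product of input and output channels, halving the number of filters in the hidden layers cuts the convolutional cost by well over half, while a dense layer amortized over 128 × 128 pixels contributes a small fraction of one operation per pixel.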

C. Precise Bitrate Allocation
Using a bottleneck-modulated network makes precise bitrate allocation computationally practical, unlike using a fully modulated network. This feature is a first in multirate neural image compression. Indeed, the λ parameter used to regulate the rate-distortion trade-off does not guarantee a fixed bitrate independent of image content. With the Ballé2018 architecture, trained with a single λ value, one may find widely different bitrates for different images depending on their contents. Using a bottleneck-modulated autoencoder, however, the choice of λ may be adjusted iteratively to achieve a user-selected bitrate with a single feed-forward pass of the main encoder, since only the modulation of the latent representation has to be repeated. The small computational cost of the modulating network and the scaling of the latent representation compared to the rest of the encoding scheme, as illustrated in Table I, makes this viable. This strategy of actively adapting λ to approximate a given bitrate would require multiple feed-forward passes if we used a fully modulated autoencoder, which makes it infeasible in practice.
In the proposed method, precise bitrate allocation is implemented as an iterative process using binary search: starting from the minimum and maximum λ values used in training, $\lambda_{\min}$ and $\lambda_{\max}$, the bitrate at both ends and at their arithmetic mean, $\lambda_{\mathrm{mid}} = (\lambda_{\min} + \lambda_{\max})/2$, is computed. If the target bitrate is above that produced by $\lambda_{\mathrm{mid}}$, $\lambda_{\min} \leftarrow \lambda_{\mathrm{mid}}$ is set, and otherwise $\lambda_{\max} \leftarrow \lambda_{\mathrm{mid}}$ is set. The process is repeated until the bitrate obtained with $\lambda_{\min}$ or $\lambda_{\max}$ is within a given precision tolerance of the target.
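The binary search described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: `bitrate_fn` is a hypothetical callable that returns the bitrate obtained for a given λ (in practice it would modulate, quantize, and entropy-code an already computed latent representation), the bitrate is assumed to be non-decreasing in λ, and `toy_bitrate` is an invented stand-in used only to make the example runnable. The stopping test is applied to the midpoint, which becomes one of the two ends in the next step.

```python
import math


def find_lambda(bitrate_fn, target, lam_min, lam_max, tol=0.005, max_iter=50):
    """Binary search for a lambda whose bitrate is within `tol` of `target`.

    Returns (lambda, bitrate, iterations). If the target lies outside the
    range reachable with [lam_min, lam_max], the nearest end is returned.
    """
    rate_min, rate_max = bitrate_fn(lam_min), bitrate_fn(lam_max)
    if target <= rate_min:
        return lam_min, rate_min, 0
    if target >= rate_max:
        return lam_max, rate_max, 0

    lam_mid, rate_mid = lam_min, rate_min
    for iteration in range(1, max_iter + 1):
        lam_mid = (lam_min + lam_max) / 2
        rate_mid = bitrate_fn(lam_mid)
        if abs(rate_mid - target) <= tol:
            return lam_mid, rate_mid, iteration
        if target > rate_mid:
            lam_min = lam_mid  # need more bits: move towards larger lambda
        else:
            lam_max = lam_mid  # need fewer bits: move towards smaller lambda
    return lam_mid, rate_mid, max_iter


def toy_bitrate(lam):
    """Invented monotone bitrate curve (bits per sample), for illustration only."""
    return math.log2(1.0 + 100.0 * lam)


if __name__ == "__main__":
    lam, rate, iterations = find_lambda(toy_bitrate, 1.0, 1e-4, 0.1)
    print(lam, rate, iterations)
```

Each iteration costs one evaluation of `bitrate_fn`, so the approach is only practical when that evaluation is cheap, which is the case when the expensive encoder transform does not have to be rerun for every candidate λ.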

IV. EXPERIMENTAL RESULTS
To assess the proposed method, a number of models are trained using the bottleneck-modulated architecture from Fig. 1 or the equivalent fixed-rate multimodel baseline (same backbone transform without modulation). Either mean squared error (mse) or the structural similarity index measure (SSIM) is used as the distortion metric in the loss function of the models, which are optimized using Adam [31]. The reduced-complexity models use N = 64 and M = 192 filters per layer, while the Ballé2018 architecture is as in [9] (N = 128, M = 192), including range-adaptive normalization instead of uniform normalization. The proposed model is compared to said multimodel baseline and to JPEG 2000 [5], and code and visual examples are available at a GitHub repository at the time of submission. Three different remote sensing datasets are used in our experiments to show the general validity of the proposal as follows.
1) 12-bit simulated panchromatic Pléiades images of 50 cm resolution. A total of 96 images of 820 × 820 pixels are used in training and 32 images of 820 × 820 pixels are used in testing.
Fig. 2 reports the rate-distortion performance of the different models we tested, trained for either mse or SSIM. As is clear from those diagrams, our reduced-complexity bottleneck-modulated models performed on par with the equivalent reduced-complexity multimodel baseline, on par with the Ballé2018 multimodel baseline, and decisively surpassed JPEG 2000 in all datasets, both under PSNR and SSIM as target or evaluation distortion metrics. As expected, our models performed more competitively under the metric they were optimized for. These results show that the architecture simplification from [15] and bottleneck modulation as we propose do not significantly compromise compression performance on a variety of datasets with different characteristics when optimized for the mse and SSIM metrics.
Beyond mse and SSIM quantitative metrics, Figs. 3 and 4 provide some qualitative assessment. Fig. 3 shows a visual comparison of a Pléiades image compressed at low bitrate using mse or SSIM-optimized models of the proposed architecture. It can be observed that the SSIM-optimized model recovered certain features more accurately, such as the grooves in the earth near the plane, and generally produced more clearly defined edges. See our GitHub for more visual examples for Landsat 8 and AVIRIS datasets. Fig. 4 reports the average spectral angle (SA) loss between the original spectral pixel ($x \in \mathbb{R}^n$) and the distorted spectral pixel ($\hat{x} \in \mathbb{R}^n$), computed as $\mathrm{SA} = \arccos\left(x^{T}\hat{x} / (\|x\| \, \|\hat{x}\|)\right)$, for multispectral Landsat 8 and hyperspectral AVIRIS images. As shown, our reduced-complexity single model once more surpasses JPEG 2000 and performs on par with the other learned models. Thus, despite our models only compressing in the 2-D domain, they remain competitive under spectral loss metrics, notably those optimized for mse.
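The spectral angle between an original and a distorted spectral pixel can be computed as in the following sketch. The function name and the clamping of the cosine to [-1, 1] (a guard against floating-point rounding) are additions made for illustration; the formula itself is the arccosine of the normalized inner product given above.

```python
import math


def spectral_angle(x, x_hat):
    """Spectral angle in radians between two spectral pixels of equal length.

    Raises ValueError for zero-norm pixels, where the angle is undefined.
    """
    dot = sum(a * b for a, b in zip(x, x_hat))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_x_hat = math.sqrt(sum(b * b for b in x_hat))
    if norm_x == 0.0 or norm_x_hat == 0.0:
        raise ValueError("spectral angle is undefined for a zero-norm pixel")
    cosine = dot / (norm_x * norm_x_hat)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
    return math.acos(cosine)


if __name__ == "__main__":
    original = [120.0, 340.0, 560.0, 410.0]   # hypothetical 4-band pixel
    distorted = [118.0, 345.0, 552.0, 415.0]  # hypothetical reconstruction
    print(spectral_angle(original, distorted))
```

The angle is invariant to a common scaling of all bands, so it measures changes in spectral shape rather than in overall brightness, which complements per-band metrics such as mse.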
Finally, a runtime comparison between the learned models is carried out to assess the complexity difference theoretically estimated before. The models for Pléiades images are evaluated, measuring the average time to compress all 688 test images. This experiment was conducted on an NVIDIA GeForce RTX 3060 Ti GPU, and Table II lists those times. As expected, the reduced-complexity multimodel baseline was the fastest method, followed by the reduced-complexity modulated model with a single quality input (hence modulating only once). Using a target bitrate in the modulated model yielded a slower runtime than the Ballé2018 multimodel baseline. Our multirate proposal requires an average of only 12 iterations to converge to the target bitrate with a precision of ±0.005 bps.

V. CONCLUSION
A reduced-complexity neural multirate compression architecture for remote sensing data is proposed, which can compress images at multiple and varying bitrates in a single execution and introduces, for the first time, a scheme that allows compression at a user-defined bitrate. Experimental results show the proposed method performs on par with the multimodel baseline and is superior to the current JPEG 2000 standard in compression of remote sensing images of varying sources and resolutions. Finally, as was the case for the reduced-complexity baseline [15], the proposed encoder could feasibly be run on board using currently available hardware.

Fig. 1. Proposed architecture. Blocks labeled "Conv N × k × k/s" indicate convolution with N filters using k × k kernels with a stride of length s, and the arrow indicates downsampling or upsampling. Blocks labeled "Dense k" indicate a dense feed-forward layer with k nodes. GDN stands for Generalized Divisive Normalization, and ReLU stands for Rectified Linear Unit. The product between a tensor and a vector is carried out by scaling every channel in the tensor by its corresponding entry in the vector.

Fig. 2. Rate-distortion performance of our mse- and SSIM-optimized models on simulated Pléiades panchromatic images, frame-by-frame Landsat 8 OLI images, and frame-by-frame AVIRIS images. Rate is measured in bits per sample (bps), also known as bits per pixel per component (bpppc).

Fig. 4. SA performance of our models in frame-by-frame AVIRIS images and frame-by-frame Landsat 8 OLI images.