Panchromatic and Hyperspectral Image Fusion: Outcome of the 2022 WHISPERS Hyperspectral Pansharpening Challenge

This article presents the scientific outcomes of the 2022 Hyperspectral Pansharpening Challenge organized by the 12th IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (IEEE WHISPERS 2022). The challenge aims at fusing a panchromatic image with hyperspectral data to obtain a hyperspectral cube with the same spatial resolution as the panchromatic image while preserving the spectral information of the hyperspectral data. Four datasets acquired by the PRISMA mission, owned and managed by the Italian Space Agency, have been prepared for participants and are made available for the benefit of the scientific community. Each dataset contains a panchromatic image and a hyperspectral cube at different spatial resolutions. More than 100 registrations were received for the event, and four teams submitted their outcomes. Since no team actually outperformed the baseline provided by the organizers, the challenge was declared inconclusive and no winner was recognized.


I. INTRODUCTION
In the design of optical remote sensing sensors, you cannot have your cake and eat it too: there are tradeoffs between the signal-to-noise ratio (SNR), spectral resolution, and spatial resolution. Therefore, optical remote sensing images typically offer either high spatial resolution with limited bands (e.g., panchromatic or Red-Green-Blue (RGB) images) or high spectral resolution with lower spatial resolution (e.g., multi-/hyperspectral images). To obtain a super image, with both high spectral and high spatial resolution, researchers have developed many methods, ranging from image super-resolution (combining multiple hyperspectral images to enhance their spatial resolution) to pansharpening. Pansharpening aims at fusing a panchromatic image with a multi-/hyperspectral one to generate an image with the spatial resolution of the panchromatic data and the spectral resolution of the multi-/hyperspectral image. Many applications have benefited from pansharpening, such as visual interpretation in Google Earth, land-cover and land-use mapping, and so forth.
HyperSpectral (HS) images are widely used for several tasks thanks to their very appealing spectral features. The other side of the coin is their coarse spatial resolution, often limited to 30 m for satellite-based observations. To overcome this issue, recent missions, such as the Hyperspectral Precursor and Application Mission (PRISMA) owned and managed by the Italian Space Agency, have been designed to simultaneously acquire both an HS cube and a Panchromatic (Pan) image. Leveraging their different spatio-spectral features, hyperspectral pansharpening relies upon the fusion of the abovementioned products with the aim of combining the best of both.
This research topic has been strongly debated in the last decade, leading to an extensive review paper in 2015 [1] presenting a qualitative and quantitative comparison of different HS pansharpening algorithms, considering both methods originally developed for multispectral pansharpening [2] and techniques specifically designed for HS pansharpening. This study highlighted two important aspects that are still relevant today: 1) the tradeoff between computational cost (critical for images with hundreds of bands) and fusion performance; and 2) the effect of a residual space-varying registration error between the Pan band and the HS cube. When such misregistration occurs, classical pansharpening methods, mainly those based on component substitution or some specifically adapted multiresolution analysis methods, are still competitive thanks to their robustness to these issues [3], [4]. Excluding previous works on multisensor classification that did not adopt any HS pansharpening algorithm, the pioneering study on HS pansharpening was published in 2007 [5]. It proposed an optimized component substitution method, which was formalized in 2008 for multispectral pansharpening [6], and compared it with existing pansharpening algorithms. To the best of the authors' knowledge, it was the first paper presenting experiments on real Hyperion HS images sharpened by the concurrent Advanced Land Imager (ALI) panchromatic acquisition. Novel HS pansharpening methods appeared in the following years, based either on quality index optimization [7] or on spectral preservation constraints [8]. An interesting study on pansharpening performance with HS/Pan data acquired by the same platform or by different platforms was presented in [2]. New classical methods were then proposed, based on the guided filter [9], variational approaches [10], and a component substitution technique improved by saliency analysis [11]. Recently, the number of research papers on HS pansharpening has grown dramatically, in particular those related to deep learning [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]. Among them, it is worth citing the algorithm proposed in [21], which has also been tested at the original Pan scale on a real Hyperion/ALI dataset.
Only very few works on HS pansharpening of PRISMA images have been published so far. An application-oriented work using pansharpened PRISMA data was presented in 2021 [25]. The goal of the proposed challenge has been to boost research on hyperspectral pansharpening, pushing researchers toward more challenging issues involving the use of new data. Hence, four datasets acquired by the PRISMA mission have been prepared. Each dataset contains a Pan image and an HS cube. The spatial resolution of the Pan image is 5 m, while the HS sensor acquires about 250 spectral bands with a spatial resolution of 30 m.
The contest has been organized in conjunction with the 12th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). IEEE WHISPERS aims at bringing together all the people involved in HS data processing, from acquisition and calibration to analysis (image processing, signal processing, feature extraction, dimension reduction, unmixing and source separation, classification). The event has also been supported by the Italian Space Agency, the Geoscience and Remote Sensing Society, and the Image Analysis and Data Fusion Technical Committee. The organizers observed good interest, with more than one hundred registrations. However, the proposed task appeared to be a difficult one, especially because of the unusual ratio between the spatial resolutions (6, while most traditional pansharpening problems deal with a ratio of 4) and probably also because of the novelty of the sensor and its spectral characteristics. Eventually, only four teams submitted their results for final assessment. The four teams addressed the proposed issue by exploiting innovative solutions relying upon machine learning and variational optimization-based methodologies. Despite the use of state-of-the-art methodologies, the teams did not obtain outstanding results compared with some baseline methods proposed decades ago for multispectral pansharpening. For this reason, the committee decided to close the contest and declare it inconclusive (no winner).
This article presents the four datasets exploited for the challenge together with a description of the data preparation procedures (coregistration, band selection, etc.). The baseline approaches used for performance assessment of the participants' outcomes are also described. Moreover, the article focuses on the protocols (both at reduced resolution and at full resolution) and the related quality metrics adopted to assess the performance. Afterward, the quantitative results (both at reduced resolution and at full resolution) and a qualitative analysis are shown. Finally, some further (and more general) considerations about the results of the contest and the new trends regarding the use of artificial intelligence solutions are proposed.
The rest of this article is organized as follows. Section II presents the Hyperspectral Pansharpening Challenge with the related datasets, baseline methods, and protocols for performance assessment. Section III is devoted to the description of both the quantitative and qualitative outcomes. A general discussion about hyperspectral pansharpening is presented in Section IV. Finally, Section V concludes the article.

II. IEEE WHISPERS 2022: THE HYPERSPECTRAL PANSHARPENING CHALLENGE
The Italian Space Agency's PRISMA (Hyperspectral Precursor of the Application Mission) satellite was launched from the European space base in Kourou (French Guiana) on March 22, 2019. In 2020, the authors of this article conceived the idea of proposing an international contest for PRISMA data exploitation. The contest, to be organized in conjunction with the Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), was intended to boost research on hyperspectral pansharpening, thanks to the spatial and spectral capabilities of the HS and Pan sensors mounted on the PRISMA platform. After the pandemic emergency, the concept became reality in conjunction with the 2022 edition of IEEE WHISPERS, taking place in Rome in September 2022. Thanks to the Italian Space Agency (ASI), four datasets have been authorized for public distribution and prepared for the participants. The 2022 WHISPERS Hyperspectral Pansharpening Challenge was finally launched in February 2022.

A. Datasets
Four datasets are distributed for the PRISMA contest, namely FR1 and FR2 for pansharpening at full spatial resolution (5 m for the Pan channel, P, and 30 m for the HS image, HS), and RR1 and RR2 for pansharpening at reduced spatial resolution (30 m for the reduced resolution Pan image, P↓, and 180 m for the reduced resolution HS cube, HS↓).
The datasets have been prepared for the challenge participants through the following preprocessing steps.
1) Downloading the level-2-D image data product from the ASI PRISMA portal for data distribution [26]. The level-2-D product refers to the geocoded at-surface (Bottom-of-Atmosphere) reflectance data [27]. Specific information on the PRISMA HDF5 format is available in the PRISMA products specification document [28]. The document also indicates different tools for data reading, both for commercial remote sensing software packages and for Python usage.
3) Removing atmospheric water absorption bands, low signal-to-noise ratio (SNR) bands, and bands affected by severe striping, which is due to a temporary unbalanced response of a VNIR or SWIR detector.
4) Assembling the selected VNIR and SWIR bands into a single image cube spanning the VNIR and SWIR wavelength range. The final number of bands varies among the four datasets, as reported in Table I.
5) Correcting any residual space-varying misalignment between the HS bands and the Pan band through local correlation computation and nonrigid transformation (a rough sketch of this displacement estimation follows below). This operation has been performed by interpolating the HS bands at the 5-m panchromatic scale through bicubic interpolation and by using the temporally closest Sentinel-2 acquisition as reference. Sentinel-2 band 4 at 665 nm is bicubically interpolated first at 5 m and then used as a common reference for displacement estimation of the 660 nm PRISMA band and the Pan band. It should be noted that the estimated displacement between the 660-nm band of PRISMA and the Sentinel-2 B4 band is finally applied to all VNIR and SWIR bands of PRISMA, which are perfectly coregistered in the original ASI product.
The images P and HS of the full resolution datasets FR1 and FR2 have been obtained by extracting a 12 km × 12 km portion (2400 × 2400 pixels for P and 400 × 400 × N pixels for HS) from the original 30 km × 30 km PRISMA acquisition, after accurate coregistration.
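As a rough illustration of the displacement estimation in step 5, the following Python sketch samples a space-varying shift field between two images on a grid of windows. It is a simplified stand-in, assuming phase correlation in place of the local correlation computation actually used during data preparation; the window size, step, and function names are illustrative, and the fitting of the final nonrigid transformation is not shown.

```python
import numpy as np
from skimage.registration import phase_cross_correlation

def local_shifts(reference, target, win=64, step=64):
    """Sample a space-varying displacement field between two single-band
    images on a coarse grid of windows. A nonrigid transformation would
    then be fitted to these samples and applied to all bands (not shown)."""
    rows, cols = reference.shape
    samples = []
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            shift, _, _ = phase_cross_correlation(
                reference[r:r + win, c:c + win],
                target[r:r + win, c:c + win],
                upsample_factor=10)  # subpixel shift estimate
            samples.append((r + win // 2, c + win // 2, shift[0], shift[1]))
    return np.asarray(samples)  # rows of (row, col, dy, dx)
```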
The 900 × 900 P↓ image and the coregistered 150 × 150 × N HS↓ cube denote the 30 m resolution Pan image and the N-band 180 m resolution HS image on a geographical area of 27 km × 27 km. P↓ has been obtained from the original P by using an ideal antialiasing low-pass filter, while HS↓ has been produced from the original HS by applying spatial filters matching the sensor's Modulation Transfer Functions (MTFs) of the VNIR and SWIR bands.
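A minimal sketch of this kind of MTF-matched degradation is given below, assuming a Gaussian approximation of the sensor MTF with gain gnyq at the Nyquist frequency of the coarse grid; the value 0.3 is a placeholder, not the actual PRISMA per-band gain.

```python
import numpy as np
from scipy import ndimage

def mtf_degrade(band, ratio=6, gnyq=0.3):
    """Low-pass a band with a Gaussian matched to an assumed MTF gain
    `gnyq` at the Nyquist frequency of the coarse grid, then decimate."""
    # Solve exp(-2 * pi**2 * sigma**2 * f**2) = gnyq at f = 1 / (2 * ratio)
    sigma = 2 * ratio * np.sqrt(-np.log(gnyq) / (2 * np.pi ** 2))
    smoothed = ndimage.gaussian_filter(band, sigma, mode="nearest")
    # Keep one sample near the center of each ratio-by-ratio block
    return smoothed[ratio // 2::ratio, ratio // 2::ratio]
```

For the Pan image, an almost ideal low-pass filter would take the place of the Gaussian, in line with the protocol described in Section II-C.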
For easy portability, images are in ENVI format, that is, a flat binary raster file in 16-bit unsigned integer data format and Band Sequential (BSQ) interleave type, with an accompanying ASCII header file. Each HS image contains N bands selected from the original VNIR-SWIR PRISMA bands. The ASCII header also contains the values of the central wavelengths of the HS bands. Table I reports the main characteristics of the four datasets in terms of spatial size, number of bands, and geographical extension. Fig. 1 shows the Pan bands and true-color composites from the HS images of the four datasets.
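For readers working outside the usual remote sensing toolchains, a BSQ cube of this kind can be loaded with plain numpy. The sketch below is a minimal reader in which the file name, dimensions, and little-endian byte order are assumptions to be checked against the ASCII header of each dataset.

```python
import numpy as np

def read_envi_bsq(path, rows, cols, bands):
    """Minimal BSQ reader; rows/cols/bands come from the ASCII header.
    Assumes 16-bit unsigned integers and little-endian byte order."""
    raw = np.fromfile(path, dtype="<u2")
    # BSQ interleave: one complete spatial plane per band
    return raw.reshape(bands, rows, cols).transpose(1, 2, 0)

# Hypothetical usage; the actual file names and band count N are
# dataset-specific (see Table I):
# hs = read_envi_bsq("FR1_HS.bsq", 400, 400, 159)
```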
FR1 denotes the PRISMA acquisition over the city of Bologna, Italy, and its surroundings, on November 7, 2020. The FR2 dataset was acquired over the Florence area (Italy) on June 27, 2020. For the abovementioned cases, the spatial resolutions of the Pan and HS bands are those of the original acquisitions, i.e., 5 and 30 m, respectively. RR1 denotes the acquisition over the city of Barcelona (Spain) on January 24, 2020, while RR2 corresponds to the PRISMA acquisition on August 13, 2020, over the suburbs of Milan, Italy. The latter two datasets are composed of spatially degraded images, whose characteristics are recalled in Table I.
The four datasets, together with a Matlab toolbox for testing and evaluating the baseline methods, are publicly available at [29].

B. Baseline Methods
Five methods are exploited as baseline solutions in this challenge. The code of the baseline solutions is made available to the scientific community. They have been borrowed from the pansharpening literature [2] and subsequently used for addressing the hyperspectral pansharpening task [1].
The first two methods belong to the component substitution class [30], [31]. More specifically, the Gram-Schmidt (GS) approach [32] is considered. This rather dated solution relies upon the substitution of the first component of the HS cube after the GS transformation. The high resolution Pan image (with adjusted statistics) is substituted to get the resolution enhancement, and the fused image is obtained by the inverse GS transformation on the new set of transformed components. The second technique, GS Adaptive (GSA) [33], is an enhanced version of the GS based on the same transformation but improving it through a linear model for the intensity (first) component: it synthesizes an intensity component using a linear model, properly estimating its weights through the relationship between the HS image and a low-pass filtered and decimated version of the Pan image (a minimal sketch of this idea follows this paragraph).
The other three baseline methods are representative of the multiresolution analysis class [30], [31]. The third method is the classical Additive Wavelet Luminance Proportional (AWLP) [31], [34]. The wavelet planes of the Pan image are added to the luminance component of the HS image. The adopted injection rule is based on the idea of proportionality between each HS band and the injected details, thus preserving the spectral signature. The fourth technique is the MTF-Generalized Laplacian Pyramid (MTF-GLP) [31], [35]. It exploits GLPs to extract details from the Pan image. The Gaussian filters are designed to match the HS sensor's MTFs [35]. A linear regression model [31] is used to address the detail injection problem. Finally, the last method is based on Morphological Filters (MF) [36]. It relies upon a nonlinear decomposition scheme using half-gradient MFs. A multiplicative injection model is exploited to complete the fusion procedure [37].
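As a rough illustration of the component substitution idea behind the GSA, the following Python sketch synthesizes an intensity by least-squares regression and injects the Pan details with covariance-based gains. It is only a simplified sketch under stated assumptions (global rather than local regression, Gaussian MTF approximation), not the reference implementation of [33].

```python
import numpy as np
from scipy import ndimage

def gsa_like_fuse(hs_up, pan, ratio=6, gnyq=0.3):
    """GSA-flavored fusion sketch. `hs_up` is the HS cube interpolated
    to the Pan grid, shape (rows, cols, N); `pan` is the Pan band."""
    rows, cols, n_bands = hs_up.shape
    # Degrade Pan toward the HS resolution to estimate regression weights
    sigma = 2 * ratio * np.sqrt(-np.log(gnyq) / (2 * np.pi ** 2))
    pan_low = ndimage.gaussian_filter(pan, sigma)
    # Synthesize the intensity: pan_low ~ sum_k w_k * HS_k + w_0
    X = np.column_stack([hs_up.reshape(-1, n_bands), np.ones(rows * cols)])
    w, *_ = np.linalg.lstsq(X, pan_low.ravel(), rcond=None)
    intensity = (X @ w).reshape(rows, cols)
    # Match Pan statistics to the synthetic intensity, extract details
    pan_eq = (pan - pan.mean()) / pan.std() * intensity.std() + intensity.mean()
    details = pan_eq - intensity
    # Band-wise injection gains from the covariance with the intensity
    fused = np.empty_like(hs_up)
    for k in range(n_bands):
        band = hs_up[..., k]
        g = np.cov(band.ravel(), intensity.ravel())[0, 1] / np.var(intensity, ddof=1)
        fused[..., k] = band + g * details
    return fused
```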
It is worth remarking that machine learning-based approaches are not adopted as baseline methods. Indeed, their use with data having different spatio-spectral features (e.g., a different number of spectral bands) often requires changes in the original network architectures and their retraining to properly address the problem at hand. Thus, the application of these approaches to the fusion of PRISMA data is not straightforward.

C. Protocols
The assessment of image fusion products is a hard problem with a nontrivial solution. For more than 30 years, researchers have studied the topic, proposing ways to address it considering its ill-posed nature. Indeed, assessment at the working (full) resolution suffers from the absence of a reference (ground-truth) image, that is, the image that the HS sensor would observe at the highest spatial resolution (i.e., that of the Pan sensor). This is the so-called synthesis property of Wald's protocol [38], which is hard to measure without a reference. Thus, several indexes, inspired by the quality without reference (QNR) protocol [39], have been proposed in the literature. These metrics have the advantage of working at full resolution, but pay for it with often inaccurate evaluations because of the absence of a reference (ground-truth) image. Thus, to complete the assessment, an evaluation at reduced resolution is crucial (with the aim of implementing Wald's protocol) to generate a reference image (i.e., the original HS cube). The assumption underlying this kind of assessment is an invariance across scales of the performance of the pansharpening approach under evaluation, i.e., a method showing some performance at reduced resolution should behave the same at finer resolution. Obviously, this hypothesis is not always valid, and the process of reducing input data resolutions can represent a degree of freedom, often unacceptable for a validation procedure (even though steps forward have been made by imposing the design of filters that take into account the spatial models of the acquisition sensors [35]). Thus, considering the pros and cons of both procedures, this challenge relies upon a protocol accounting for both reduced and full resolution assessments to have a complete evaluation of the submitted outcomes. Finally, a visual inspection of fused products has also been performed to highlight local patterns and distortions, in particular for full resolution outcomes.
Reduced resolution assessment measures the similarity of the fused product to an ideal reference, i.e., the original HS image. This is possible by degrading the resolutions of both the original HS and Pan images and by performing fusion from those degraded data. Clearly, the choice of the filter is crucial in this validation protocol. The filter is defined to ensure the consistency property [38] of the pansharpening process. Thus, the resolution reduction of the HS image should be done by exploiting spatial filters matching the HS sensor's MTFs [30]. In addition, the filter used to degrade the Pan image should be designed to preserve the details that would have been seen if the image were acquired at reduced resolution. Accordingly, a common choice is the use of an almost ideal filter [30]. The more similar the obtained pansharpened image is to the original HS image, the higher the measured quality. Such a similarity degree can easily be computed through score indexes that compare two multiband images. In this challenge, we use the following set of well-established metrics [30], [31].
1) Spectral Angle Mapper (SAM) [30], [31], [40]. Given two spectral vectors, $\mathbf{v}$ and $\hat{\mathbf{v}}$, both having $N$ components, in which $\mathbf{v}$ is the reference spectral pixel vector and $\hat{\mathbf{v}}$ is the test spectral pixel vector, the SAM denotes the absolute value of the spectral angle between the two vectors:
$$\mathrm{SAM}(\mathbf{v}, \hat{\mathbf{v}}) = \cos^{-1}\left(\frac{\langle \mathbf{v}, \hat{\mathbf{v}} \rangle}{\|\mathbf{v}\|_2 \, \|\hat{\mathbf{v}}\|_2}\right)$$
where $\langle \cdot, \cdot \rangle$ indicates the dot product, $\cos^{-1}$ denotes the arccosine function, and $\|\cdot\|_2$ is the $\ell_2$ norm. The SAM is usually expressed in degrees. The lower the value, the better the quality. The SAM is equal to zero if and only if the test vector is spectrally identical to the reference vector, i.e., the two vectors are parallel and may differ only by their moduli. A global spectral dissimilarity, or distortion, index is obtained by averaging the index over the whole scene.
2) ERGAS [30], [31], [41]. The index, whose French acronym stands for relative dimensionless global error in synthesis, is a normalized dissimilarity index that offers a global indication of the distortion of a test multiband image with respect to the reference:
$$\mathrm{ERGAS} = 100 \, \frac{d_h}{d_l} \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\frac{\mathrm{RMSE}(n)}{\mu(n)}\right)^2}$$
where $d_h/d_l$ is the ratio between the pixel sizes of Pan and HS; $\mu(n)$ is the mean (average) of the $n$th band of the reference; $\mathrm{RMSE}(n)$ is the root mean square error (RMSE) calculated between the fused and reference images for the $n$th spectral band; and $N$ is the number of bands. Low values of the ERGAS indicate high similarity between fused and reference HS data. The ideal value is zero.
3) $Q2^n$ [30], [31], [42]. It is the multiband extension of the Universal Image Quality Index (UIQI). Each pixel of an image with $N$ spectral bands is accommodated into a HyperComplex (HC) number with one real part and $N-1$ imaginary parts. Let $z = z(m,n)$ and $\hat{z} = \hat{z}(m,n)$ denote the HC representations of the reference and test spectral vectors at pixel $(m,n)$. Analogously to UIQI, namely, $Q2^0 = Q$, $Q2^n$ can be written as the product of three terms:
$$Q2^n = \frac{|\sigma_{z\hat{z}}|}{\sigma_z \, \sigma_{\hat{z}}} \cdot \frac{2\,\sigma_z \, \sigma_{\hat{z}}}{\sigma_z^2 + \sigma_{\hat{z}}^2} \cdot \frac{2\,|\bar{z}|\,|\bar{\hat{z}}|}{|\bar{z}|^2 + |\bar{\hat{z}}|^2}$$
where $\sigma_{\cdot,\cdot}$, $\sigma_{\cdot}$, and $\bar{\cdot}$ denote the covariance, the standard deviation, and the mean operators, respectively, and $|\cdot|$ represents the modulus of a vector. The first term is the modulus of the HC Correlation Coefficient (HCCC) between $z$ and $\hat{z}$. The second and third terms measure contrast changes and mean bias, respectively, on all the bands simultaneously. Statistics are calculated on blocks, typically $32 \times 32$, and $Q2^n$ is averaged over the blocks of the whole image to yield the global score index. $Q2^n$ takes values in $[0, 1]$ and is equal to 1 if and only if $z = \hat{z}$ for all the pixels.
Full resolution assessment infers the quality of the pansharpened image at the Pan resolution without resorting to a single reference image, which is not available. Consequently, the problem of assessing the quality of pansharpened products at full resolution is intrinsically ill-posed. To solve this issue, new distortion measurements have been introduced, such that they do not depend on the unavailable high resolution HS cube. The $Q^*$ index [43] is defined as
$$Q^* = \left(1 - D_\lambda^k\right)^{\alpha} \left(1 - D_S^*\right)^{\beta}$$
which is composed of the product of terms involving $D_\lambda^k$ and $D_S^*$, quantifying the spectral and spatial distortions, respectively, exploiting the weights $\alpha$ and $\beta$ (both experimentally set to 1). The higher the $Q^*$ index, the better the quality of the fused product. The maximum theoretical value of this index is 1, reached when both $D_\lambda^k$ and $D_S^*$ are equal to zero. The spectral distortion index is calculated as follows [30], [43]:
$$D_\lambda^k = 1 - \frac{1}{N} \sum_{n=1}^{N} Q\left(\widehat{\mathrm{HS}}{}^{\downarrow}_n, \widetilde{\mathrm{HS}}_n\right)$$
where $\widehat{\mathrm{HS}}{}^{\downarrow}$ is the MTF-filtered pansharpened HS image considering a resolution ratio equal to $R$; $\widetilde{\mathrm{HS}}$ is the original HS image interpolated to the Pan scale; and $Q$ is the UIQI, here averaged over the HS spectral bands.
The spatial consistency, $1 - D_S^*$, proposed in [44], is instead defined through the multivariate linear regression modeling the relationship between the original high resolution Pan image and the pansharpened HS bands. The figure of merit of the matching between the abovementioned images is given by the coefficient of determination, which is used to measure the spatial consistency [44]. The choice of this combination of metrics for measuring spatial and spectral qualities has also been corroborated for classical pansharpening in [43]. It is worth remarking that the original implementation of $D_\lambda^k$ is based on the $Q2^n$ index instead of the average of the $Q$ metrics calculated for each spectral band. This modification has been introduced for computational reasons, since the high number of HS bands leads to a very slow evaluation of the $Q2^n$ index; comparable performance can be obtained with a relevant advantage in computational burden. The code for assessing the performance both at reduced resolution and at full resolution is made available to the scientific community.
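To make the protocol concrete, the sketch below implements the SAM, the ERGAS, and the regression-based spatial consistency $1 - D_S^*$ in plain numpy, following the definitions above. It is not the official evaluation code released for the challenge, and the block-wise $Q2^n$ computation is omitted for brevity.

```python
import numpy as np

def sam_degrees(ref, test, eps=1e-12):
    # Mean spectral angle in degrees; ref and test have shape (rows, cols, N)
    dot = np.sum(ref * test, axis=-1)
    norms = np.linalg.norm(ref, axis=-1) * np.linalg.norm(test, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return np.degrees(angles).mean()

def ergas(ref, test, ratio=6):
    # ratio = (HS pixel size) / (Pan pixel size) = 6 here, so d_h/d_l = 1/ratio
    n_bands = ref.shape[-1]
    acc = 0.0
    for n in range(n_bands):
        rmse = np.sqrt(np.mean((test[..., n] - ref[..., n]) ** 2))
        acc += (rmse / ref[..., n].mean()) ** 2
    return 100.0 / ratio * np.sqrt(acc / n_bands)

def spatial_consistency(pan, fused):
    # 1 - D_S*: coefficient of determination (R^2) of the multivariate
    # regression of the Pan image onto the pansharpened HS bands
    rows, cols, n_bands = fused.shape
    X = np.column_stack([fused.reshape(-1, n_bands), np.ones(rows * cols)])
    y = pan.ravel()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return 1.0 - residuals.var() / y.var()
```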

III. OUTCOMES OF THE IEEE WHISPERS 2022 HYPERSPECTRAL PANSHARPENING CHALLENGE
This section is devoted to the presentation of the results of the IEEE WHISPERS 2022 challenge on hyperspectral pansharpening. The quantitative assessment is presented first, under the protocol defined in Section II-C. Four datasets (two at reduced and two at full resolution) were shared with the participants, and the quality metrics have been calculated for both the baseline methods and the outcomes provided by the teams joining the challenge. Moreover, a qualitative assessment of the delivered products is proposed in this section to support the evaluation, in particular at full resolution. Finally, in Section III-C, a general discussion of the obtained outcomes is reported.

A. Quantitative Assessment
Four datasets have been considered for this challenge. All the images were captured by the sensors onboard the PRISMA platform. Two datasets refer to the reduced resolution assessment (named RR1 and RR2), while the other two are at full resolution (called FR1 and FR2). The baseline approaches presented in Section II-B have been run on the four datasets. The performance at reduced resolution and at full resolution, for both the baseline approaches and the outcomes provided by the participants, is measured according to the protocol described in Section II-C.
The results at reduced resolution are reported in Tables II and III for the RR1 and RR2 datasets, respectively. Looking at the overall $Q2^n$ index, we can remark very similar results between the two datasets. Indeed, the best approach is always the GSA, followed by Team 3's one, while the worst method is always the one provided by Team 1. The ERGAS index, even though measuring a different quantity (a radiometric distortion in the $\ell_2$ sense), is generally in agreement with the findings of the $Q2^n$. Again, the best approach is the GSA and the worst results are those of Team 1. Focusing on the teams' outcomes, Team 1 is the worst, followed by Team 2; more comparable performance is obtained by Teams 3 and 4, which represent the best results among the teams. The last consideration is about the spectral distortion measured by the SAM index. Generally speaking, these results seem to be much more dependent on the scenario, and an overall evaluation turns out to be more complicated. Focusing just on the teams, again, Team 1 is the worst, showing a relevant spectral distortion in its results. Among the other methods, the highest SAM values are obtained by Team 4, while the products of Team 2 and Team 3 seem to be more spectrally consistent.
The results at full resolution are instead reported in Tables IV and V for the FR1 and FR2 datasets, respectively. As for the assessment at reduced resolution, focusing on the overall quality metric, $Q^*$, we got quite similar results on FR1 and FR2. Indeed, the best approach is always the one of Team 4, followed by two baseline techniques, i.e., GSA and MTF-GLP. Comparable performance is reported for the techniques provided by Teams 2 and 3. Again, Team 1 got the worst overall performance, while showing the best spatial consistency, thanks to the estimation of linear-based spatial and spectral responses following the approach proposed in [45]. However, the best performance on this index is not enough to reach good overall performance because of the relevant spectral distortion. From a spectral distortion point of view, the best approach is simple upsampling with bicubic interpolation (named EXP), thanks to the fact that no injection of spatial details is performed, thus avoiding the introduction of spectral distortion in this phase (as happens for the other compared methods).
The overall results, exploited to define the ranking in Table VII, are obtained taking into account the overall quality indexes both at reduced resolution and at full resolution. More specifically, the $Q2^n$ is taken at reduced resolution, representing a good candidate to give a big picture considering both radiometric and spectral distortions in the fused products. At full resolution, the unique metric representing an overall accuracy is the $Q^*$ index, obtained by combining the spatial ($D_S^*$) and spectral ($D_\lambda^k$) distortions. Since we combined two Q-like metrics (i.e., $Q2^n$ and $Q^*$), the final overall accuracy is simply obtained by averaging these metrics calculated for RR1 and RR2 ($Q2^n$) and for FR1 and FR2 ($Q^*$), as illustrated in the snippet after this subsection. The results are reported in Table VI. The best results are reached by the GSA method, followed by the MTF-GLP approach. Teams 3 and 4 obtained high performance, reaching the third and fourth positions, respectively. Team 3 is the best among the participants' approaches, showing a good balance of performance and robustness across assessment procedures and scenarios under test.
Team 4, instead, has the best performance at full resolution, paid for by a lower accuracy at reduced resolution (in particular on the RR1 test case). Medium performance is shown by the outcomes provided by Team 2. Finally, the worst performance (worse than the simple upsampling approach, EXP) is reached by Team 1.
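For concreteness, the ranking criterion reduces to a simple average of the four overall scores, as in the toy snippet below; the numbers are placeholders, not actual challenge scores (see Tables II-VI for those).

```python
import numpy as np

# Placeholder scores for one hypothetical method (not actual results):
# Q2^n on the reduced resolution datasets, Q* on the full resolution ones
q2n_rr1, q2n_rr2 = 0.90, 0.88
qstar_fr1, qstar_fr2 = 0.85, 0.83

overall = np.mean([q2n_rr1, q2n_rr2, qstar_fr1, qstar_fr2])
print(f"overall accuracy used for the Table VI ranking: {overall:.3f}")
```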

B. Qualitative Assessment
The results on the four datasets of the challenge, namely FR1 and FR2 for the full resolution assessment and RR1 and RR2 for the reduced resolution assessment, are displayed and compared in this section for a qualitative, visual evaluation. Fig. 2 shows a 512 × 512 fragment of the FR1 scene. A true-color representation, a color composition in the SWIR and NIR wavelengths, both at the original 30-m resolution, and the 5-m Pan band are reported. The true-color pansharpening result of a baseline method, namely the GSA [33], is also shown in Fig. 2. This color composite is displayed with the same visualization parameters as the true-color upsampled image to allow a visual evaluation of the spectral quality of the fused image. Since a reference image is not available for full resolution assessment, the GSA result, which achieved the best score indexes among the tested algorithms (Tables VI and VII), is reported in Fig. 2 for useful comparison with the challenge results illustrated in Fig. 3. Fig. 3 shows the results obtained by the four teams on the FR1 dataset. Each image is displayed with linear stretching between 1% and 99% of the histogram range for each band. The same visualization parameters used for the true-color composites in Fig. 2 cannot be applied to the fusion results produced by the four teams, due to a mean bias (significant for Team 1) between original and pansharpened bands. As an example, the histograms of the 478-nm band before and after Team 1's pansharpening are shown in Fig. 4.
In this way, Fig. 3 allows for a fair comparison of the results from the different teams under their best visualization conditions. Team 1's result appears sharp but presents spectral distortions in the vegetated area, particularly evident in the true-color composition. Comparisons with Fig. 2 confirm the numerical results of Table IV. Superior spectral quality is shown by Team 4, both in the true-color and the false-color compositions. (Fig. 2 is likewise displayed with linear stretching between 1% and 99% of the histogram range for each band.)
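The display convention used throughout the figures is easy to reproduce; the following numpy sketch implements the 1%-99% percentile stretch.

```python
import numpy as np

def stretch_1_99(band):
    # Linear stretch between the 1st and 99th percentiles of a band,
    # the visualization convention used for the figures in this article
    lo, hi = np.percentile(band, (1, 99))
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)
```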
A common problem of the products by Teams 1, 2, and 3, but not Team 4, is the injection of residual striping artifacts into the displayed bands. This problem is evident in the true-color representation.
The results for the FR2 dataset are reported in Figs. 5 and 6. The spectral distortion introduced by Team 1's pansharpening algorithm can be appreciated on the rooftops of the residential area in the lower part of the image, both in the true-color and false-color representations. On the other hand, the spatial detail injection of the Team 1 product is effective; for comparison, see the bottom images of Teams 2-4, which appear less sharpened. As for the FR1 dataset, the FR2 visual assessment also reveals that Team 4 better preserves the spectral information.
The results for the RR1 dataset are shown in Figs. 8 and 9. Here, a direct visual comparison of the challenge results from the four teams with the original (reference) image in Fig. 8 is possible. Spectral distortion is severe for the pansharpened images by Teams 1, 3, and 4. As an example, the spectral information of the river appears highly distorted in the products provided by Teams 1 and 4, particularly in the false-color representation at (2053, 1555, 855) nm wavelengths (bottom of Fig. 9). The Team 2 outcome shows insufficient spatial detail injection, which is more evident in the true-color representation. Fig. 7 instead shows the spectral signatures of the ground-truth and those obtained by the compared approaches for three selected points representing three different kinds of landscape (i.e., urban, vegetated, and mixed). The closer the spectral signatures are to the ground-truth, the better the results. It is easy to see that the best results are obtained by the EXP method followed by the GSA, in agreement with the SAM values in Table II.
Figs. 10 and 11, concerning the RR2 dataset, confirm the visual analysis on the RR1 dataset. Team 1's result is severely spectrally distorted, while Team 2's pansharpened image suffers from underenhancement of the spatial details extracted from the Pan image. Team 3 and 4's products show significant spectral distortions, more evident in the true-color image for Team 3 and in the false-color one (one NIR and two SWIR wavelengths) for Team 4.

C. Discussion on Challenge Results
Spectral information is of paramount importance in hyperspectral imaging. A discussion of the challenge results presented by the four teams should start from specific considerations about the preservation of the spectral information of the original HS image after the spatial enhancement through hyperspectral pansharpening. This kind of assessment is straightforward on the two datasets at reduced resolution, since a reference HS image, the original image at 30 m resolution, is available. It is evident from the objective assessment in Tables II and III and the visual analysis in Figs. 8 and 9 for the RR1 dataset, and Figs. 10 and 11 for the RR2 dataset, that the Team 1 algorithm introduces significant spectral distortions, while Teams 2, 3, and 4 are almost equivalent, since Team 4 outperforms the others on the RR1 dataset but not on the RR2 dataset (see, e.g., the poor spectral fidelity of the false-color composite in Fig. 11). In summary, the results by all the teams are spectrally insufficient, since the lowest values of the SAM index for the RR1 and RR2 datasets (and the most spectrally consistent color compositions in the visible, NIR, and SWIR wavelengths) are provided by the baseline GSA method. This conclusion is confirmed by the results at full resolution, even though the Team 4 results show an acceptable spectral behavior.
Similarly to spectral quality, spatial enhancement can be easily assessed on the reduced resolution datasets, RR1 and RR2, through direct comparison with the original 30-m image and computation of multiband quality indexes. Team 3 provides the best results in terms of injection of spatial details. This is proved by the score indexes in Tables II and III and confirmed by Tables IV and V. As pointed out in Section III-A, all the teams presented pansharpened HS images with lower overall quality with respect to the baseline GSA method.

IV. HYPERSPECTRAL PANSHARPENING: WHERE ARE WE NOW? A DISCUSSION ON THE CHALLENGE
This contest put a spotlight on the hyperspectral pansharpening problem, pushing researchers to find solutions to a well-known problem in the image fusion literature, but involving new data. Indeed, PRISMA images are not widely used in the related scientific community, and this represents a further challenge, i.e., the development of the best hyperspectral pansharpening approach for a new set of data. This implies that there is no prior knowledge about the particular problem at hand, together with the absence of pretrained models for machine learning-based approaches.
In this context, the four teams addressed the issue by proposing innovative solutions relying upon machine learning and variational optimization-based methodologies, sometimes borrowed from the related multispectral and hyperspectral image fusion literature. Nevertheless, despite the use of state-of-the-art methodologies, such as deep networks (which have already demonstrated high performance for closely related tasks), the teams did not obtain outstanding results when compared with baseline methods proposed decades ago in the multispectral pansharpening literature (see, e.g., the GSA and the MTF-GLP). For this reason, the committee decided to close the contest and declare it inconclusive (no winner).
Generally speaking, the organizers observed good interest in this challenge (with more than 100 registrations), but just four submissions. One reason could be that some researchers are more prone to tune fusion approaches with trial and error procedures, and this contest did not provide this opportunity because of the absence of ground-truth samples. The lack of ground-truth samples was even more critical in this challenge because of the specific nature of the data (a new sensor with specific spectral coverage and a specific ratio between the spatial resolutions), hence preventing deep networks pretrained with data provided by other satellites from providing useful estimates. In particular, machine learning approaches usually require a tuning phase for hyperparameters that is often addressed treating the networks as black boxes and driving the tuning process by improving a quality metric on ground-truth samples. Unfortunately, an evaluation server (which continuously assesses the performance on test cases) was not set up for this challenge, as done for other well-known contests such as the Data Fusion Contest organized by the IEEE GRSS Image Analysis and Data Fusion Technical Committee. In the opinion of the organizers, the absence of ground-truth samples (related to the testing scenarios) and of an evaluation server (following the line drawn by other contests) limited the possibility of fine-tuning the proposed approaches, thus reducing the performance.
This last point opens the door to some general considerations about the use of machine learning-based approaches to solve image sharpening problems. Indeed, it is not uncommon to see deep networks with millions of parameters working after a lightweight training involving few samples compared to the number of parameters (biases and weights) to estimate. It is worth remarking that training is an estimation problem in which a number of parameters related to the particular network configuration is estimated starting from the available samples (examples) provided during the training phase. Some constraints can sometimes be added to facilitate the training. A classical example is provided by convolutional neural networks (widely exploited for computer vision tasks), which implicitly limit connections to a neighborhood, thus reducing the size of the receptive field with respect to fully connected neural networks. However, estimating millions of parameters requires a huge amount of data that is often unavailable for remote sensing image sharpening. Hence, a clear reduction of the training phase, because of the absence of data, leads to the adaptation of neural networks to the particular problem presented in the training phase, thus decreasing their generalization ability. This problem, which is a hot topic for multispectral pansharpening, is even more critical for hyperspectral sharpening, because hundreds of spectral bands are fused instead of the tens of spectral bands of the multispectral case. Indeed, more bands mean more network inputs, leading to more complex networks (with more parameters) and thus requiring more data for training.
Fig. 11. Pansharpening results on a 400 × 400 portion of the RR2 dataset with color composites displayed in Fig. 10. Linear stretching between 1% and 99% of the histogram range for each band is used for visualization.
We want to conclude this section by stating that this contest opened our eyes to new problems. The scientific community is moving faster and faster, addressing more challenging issues every day. New potential has recently been identified in artificial intelligence solutions for facing remote sensing image sharpening. Nevertheless, the community should also consider the development of remote sensing-based approaches, strongly exploiting knowledge about the problem at hand and integrating it into approaches that are not just borrowed from other scientific communities (such as computer vision). Indeed, the exchange of information among communities will surely be the key to success for developing new solutions that address more challenging research problems with outstanding performance.

V. CONCLUSION
This article presented the scientific outcomes of the 2022 Hyperspectral Pansharpening Challenge organized at the 12th IEEE WHISPERS. The article first described the four datasets used for this challenge, also detailing the data preparation procedures. Afterward, the baseline approaches forming the benchmark were detailed together with the protocols (both at reduced resolution and at full resolution) for performance assessment of participants' outcomes. Finally, quantitative and qualitative results were shown, with some discussion about them and some thoughts on the state of research in the field of hyperspectral pansharpening and the new trends in image sharpening.
All in all, the organizers observed good interest (with more than 100 registrations), but just four teams joined the challenge by submitting their outcomes. The participants proposed innovative solutions relying upon machine learning and variational optimization-based methodologies. Nevertheless, despite the use of state-of-the-art approaches, the teams did not obtain outstanding results when compared with baseline methods proposed decades ago in the multispectral pansharpening literature (see, e.g., the GSA and the MTF-GLP). For this reason, the committee decided to close the contest and declare it inconclusive (no winner).