High spectral resolution imaging based on dual-camera system with filter wheel (November 2021)

High spectral and high spatial resolution are both desirable in optical imaging. However, a filter wheel imaging system composed of band-pass filters obtains only discrete spectral information: its spatial resolution is high, but its spectral resolution is low. This paper proposes a novel high spectral resolution imaging approach based on a dual-camera system with a filter wheel, which reconstructs high spectral and high spatial resolution images from a limited number of spectral images. The proposed architecture comprises three key components: the design of the dual-camera system with filter wheel (DCS-FW), the establishment of the mapping relationship between the RGB image and the spectral images, and spectrum reconstruction by interpolation. The proposed system combines useful information from both cameras, and conservation of energy is used to derive the mapping relationship between the RGB image and the spectral images, on which we build an interpolating-compensation computational reconstruction. Both simulations and experiments are adopted to verify the validity of our approach; the results show that our method achieves accurate and robust high spectral resolution reconstruction while maintaining spatial resolution.


I. INTRODUCTION
Humans can observe the colorful world because the cone cells in the human visual system respond to light across the visible spectrum. Conventional cameras imitate the human eye by using RGB color filters to record RGB measurements on the sensor, which leads to a great loss of spectral detail. Spectral imaging technology can record the reflection response of natural light in different spectral bands, capturing physical and chemical properties of substances that are difficult to obtain from RGB images, such as the material, morphology, and surface smoothness of objects. These attributes are widely applied in many fields, including remote sensing [1], [2], military reconnaissance [3], medical diagnosis [4], agriculture [5], and object detection and tracking [6], [7]. Meanwhile, high spatial resolution images provide a wealth of spatial detail, which cannot be ignored.
Spectral imaging techniques can be divided into dispersion, interference, and filter types according to the spectroscopic principle. The dispersive components used in dispersive imaging spectrometers include prisms and gratings. The dispersive spectrometer has a simple structure, but it is difficult to obtain both high spectral resolution and high luminous flux: to improve the spectral resolution, the luminous flux has to be sacrificed. Unlike the dispersive spectrometer, the interference spectrometer [8] has high luminous flux; however, because of its scanning mirror structure, it has poor structural stability, which means a small vibration may cause a large imaging error. There are many implementation schemes for filter-type spectrometers [9], [15], among which the tunable filter method and the physical filter wheel method are widely used. In the filter wheel structure, spectral images of different wavelengths, which have high spatial resolution and retain a large amount of spatial information, can be obtained by controlling the rotation of the wheel. But there are also great limitations. On the one hand, the spectral information of the obtained images is relatively limited: only discrete spectral reflectance curves can be obtained, which leads to low spectral resolution. On the other hand, the time loss brought by the rotation of the filter wheel cannot be ignored.
In this paper, we propose a novel computational high spectral resolution imaging method, named High Spectral Resolution Imaging using Interpolating Compensation based on a Filter Wheel Dual-Camera (HSRIIC-FWDC), to obtain high spectral resolution images without using learning-based methods. Specifically, we construct the mapping relationship between the RGB channels and the spectral data based on energy conservation. The basic idea is that the RGB image carries the information of the three-channel spectral sensitivity curves of the RGB camera, which reflect the same physical quantity as the spectral reflectance curve; thus, we can establish the corresponding relationship. Then, to obtain a high resolution spectral reflectance curve, we adopt an interpolating compensation method. Moreover, the time efficiency is further improved by optimizing the filter wheel control system. Fig. 1 presents an overview of the proposed method, which contains three steps. First, the RGB image and spectral images are captured by our system. Then, with the known camera spectral sensitivity, we establish the mapping relationship between the RGB channels and the spectral data. Finally, the spectral reflectance is recovered by interpolating compensation, so that we achieve high spectral resolution imaging. Additionally, the spatial resolution of the imaging system comes from the RGB camera.
In particular, we summarize the major features of our method as follows:
- It completes the reconstruction using the captured images, instead of using large numbers of spectral images to form spectral datasets, saving resources on dataset acquisition.
- The corresponding relationship between the RGB image and the spectral images is calculated according to energy conservation, ensuring high accuracy.
- The recovery of spectral reflectance is based on an interpolating compensation method, which has low computational complexity and operates on the original data captured by our imaging system; it is therefore efficient and reliable.
- The dual-camera structure enables our imaging system to achieve high spectral resolution imaging while maintaining high spatial resolution.

II. RELATED WORK
A. SPECTROMETERS FOR SPECTRAL IMAGING
Spectral imaging has long interested researchers for its great potential in various applications, and a central task is capturing three-dimensional spectral datasets using a two-dimensional sensor. Traditional spectral imaging systems involve spatial dimension scanning approaches [10], [11] and spectral dimension filtering approaches [12], [13]. Spatial dimension scanning approaches include whisk-broom and push-broom. Whisk-broom spectrometers, such as the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [10], capture the spectrum of a single scene point at a time and move the scanning system to record the spectra of all scene points. Push-broom spectrometers, such as the Hyperspectral Digital Imagery Collection Experiment (HYDICE) [11], record the spectrum of a slit at a time, which improves the scanning efficiency. Spectral dimension filtering spectrometers use multiple color band-pass filters to record the spectrum in different spectral bands. For example, a spatially variant color filter is distributed over the sensor, which requires multiple exposures to record the spectra of all scene points [14], or a tunable spectral filter [12] or rotating filter wheel [15] is placed in front of the camera lens to change the spectral band of the captured images.
The above conventional spectral imaging methods struggle to obtain high temporal and spectral resolution at the same time. Compared with traditional spectral imaging, computational spectral imaging can obtain the entire high-dimensional spectral data cube in a single exposure. Spectral images can be tomographically reconstructed by the Computed Tomography Imaging Spectrometer (CTIS) [16], [17], which projects the high-dimensional data onto the sensor with a computer-generated hologram disperser, but the CTIS has the inevitable problem of undersampling. Coded Aperture Snapshot Spectral Imaging (CASSI) [18], [19] is based on compressive sensing and replaces the aperture with a two-dimensional random encoding device; the basic assumption is that the natural scene spectrum has multi-scale inherent sparseness. The CASSI system can be improved by acquiring multiple snapshots recorded from a coded mask shifting on a piezo stage [20]. Meanwhile, CASSI also has some obvious disadvantages. First, because CASSI rests on the assumption that the natural scene is inherently sparse, there are inevitable errors in the reconstruction. Second, the reconstruction algorithm of CASSI has high computational complexity, which makes it impossible to reconstruct the high-dimensional spectral data matrix in real time. The Prism-Mask Multispectral Video Imaging System (PMVIS) [21] uses spatial sampling components, such as a mask or a microlens array, combined with the traditional dispersion method, to capture the dispersed spectrum with a high-resolution sensor. Aiming at the low spatial resolution of the PMVIS system, methods that capture high spatial resolution spectral images through a dual-channel system [22], [23] have been proposed; these methods propagate the multispectral data to the other pixels according to color similarity and spatial proximity.
The light-field spectrometer couples spectral information to the angular information of light [24], which makes it easy to integrate, but the improvement of its spectral resolution comes at the sacrifice of spatial resolution.

B. RECONSTRUCTION OF SPECTRAL REFLECTANCE
Spectrometers based on deep learning have been widely used in recent years; their core technical point is to establish the mapping relationship between RGB values and the high-dimensional spectral data and thereby reconstruct the spectral reflectance. Several learning-based methods for obtaining this mapping have been widely researched, including Radial Basis Functions (RBF) [27], sparse coding [28]-[30], and Convolutional Neural Networks (CNN) [32]-[36]. To address low spectral resolution, a large set of tristimulus values paired with surface reflectance spectra has been used to acquire the principal components of the spectrum [25]. Abed et al. [26] proposed a method that uses local linear interpolating compensation to estimate the reflectance curve from tristimulus values. Nguyen et al. [27] used a radial basis function network to map RGB values to the reflectance spectra of the scene. A hyperspectral prior can be collected and pre-processed once, using tools from the sparse representation literature, to recover hyperspectral signatures from RGB measurements [29]. Fu et al. [30] presented an approach that learns multiple non-negative sparse coding dictionaries from training spectral datasets according to clustering results. Otsu et al. [31] reformulated the conversion of tristimulus colors to spectra via principal component analysis and proposed a greedy clustering algorithm that minimizes reconstruction error. Xiong et al. [34] proposed a deep learning framework, HSCNN, which is one of the first CNN-based methods for hyperspectral recovery from a single RGB image. However, the upsampling operation in HSCNN requires knowledge of an explicit spectral response function, which restricts its applicability when the spectral response function is unknown or difficult to obtain. Shi et al. [35] improved HSCNN by removing the hand-engineered upsampling step and using a deep residual network and a densely connected network to achieve more accurate results. Fubara et al. [32] proposed a CNN-based strategy for learning the RGB-to-hyperspectral-cube mapping by jointly learning a set of basis functions and weights and using both to reconstruct the hyperspectral signatures of RGB data.
The above spectral reconstruction methods, which map RGB values to high-dimensional spectral data, need large prior datasets. Considering that such prior datasets are sensitive to scene and illumination, this paper proposes a mapping method that is suitable for any real-world scene. We use the same model of camera to capture the RGB images and the spectral images, and obtain the mapping relationship by matching the energy of the RGB image and the spectral images based on conservation of energy.

III. OUR METHOD
In this section, we elaborate on the working principle of DCS-FW, and then demonstrate the basic principle of HSRIIC-FWDC by mathematically formulating the mapping relationship between the RGB values and the high-dimensional spectral data. In addition, we show that the spectrum of each scene point can be reconstructed efficiently.
Compared with CASSI and PMVIS, our method is characterized by high spatial resolution: in our system, the spatial information provided by the RGB image enables high spatial resolution imaging.

A. DUAL CAMERA SYSTEM WITH FILTER WHEEL
The dual-camera system with filter wheel is composed of a filter wheel, a beam splitter, and two cameras with the same basic parameters, one of which captures RGB images and the other spectral images, as Fig. 1 shows. To ensure that the two cameras image the same scene, we design the optical path of the dual-camera system precisely. The angle between the plane of the beam splitter and the plane of the filter is controlled to 45°, and the angle between the plane of the RGB camera objective and the plane of the filter is controlled to 90°. Further, the two cameras are placed at the same optical path length from the plane of the beam splitter. The incident light is separated into two rays that carry the same information and the same energy according to the spectroscopic characteristic of the beam splitter, and the rays captured by the two cameras are theoretically equal according to the light propagation principle. By coordinating the rotation speed of the wheel and the acquisition rate of the cameras, we can obtain the images efficiently, guaranteeing high temporal resolution.

B. MAPPING RELATIONSHIP BETWEEN RGB IMAGE AND SPECTRAL IMAGES

Let $h(x, y, \lambda)$ denote the 3D spectral data, where $(x, y)$ are the 2D spatial coordinates and $\lambda$ is the spectral dimension. In conventional photography, an RGB camera captures a 2D image $i(x, y)$ without the spectral dimension; for brevity we write $i$ for $i(x, y)$. The RGB camera obtains RGB values through its camera spectral sensitivity, and the relationship between the RGB values and the camera spectral sensitivity can be formulated as

$$i_n = \int s_n(\lambda)\, r(\lambda)\, d\lambda, \quad n \in \{R, G, B\}, \qquad (1)$$

where $s_n(\lambda)$ is the spectral sensitivity of channel $n$ and $r(\lambda)$ is the spectral reflectance. In practice, the spectral information is discretized across wavelength, and (1) can be rewritten as

$$i_n = \sum_{b=1}^{B} s_n(\lambda_b)\, r(\lambda_b), \qquad (2)$$

where $\lambda_b$ is the discrete representation of wavelength $\lambda$ and $B$ is the number of spectral bands.
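As a quick illustration of the discrete forward model (2), the sketch below simulates three channel responses for a toy reflectance. The Gaussian sensitivity curves and the reflectance are hypothetical stand-ins, not the measured curves of our cameras.

```python
import numpy as np

# Discrete forward model of Eq. (2): each channel value is the
# sensitivity-weighted sum of the spectral reflectance over B bands.
# Sensitivities below are illustrative Gaussians (assumed, not measured).
wavelengths = np.arange(400, 701, 10)          # B = 31 bands, 400-700 nm

def gaussian(center, width):
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# S: 3 x B matrix of channel sensitivities s_n(lambda_b)
S = np.stack([gaussian(600, 40),               # "R" channel
              gaussian(540, 40),               # "G" channel
              gaussian(460, 40)])              # "B" channel

r = 0.5 + 0.3 * np.sin(wavelengths / 50.0)     # toy reflectance r(lambda_b)

i_rgb = S @ r          # Eq. (2): i_n = sum_b s_n(lambda_b) r(lambda_b)
print(i_rgb.shape)     # (3,) -- one value per RGB channel
```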
Our DCS-FW can capture only a limited number of spectral images of different bands; in other words, the number of spectral bands $B$ is relatively small, which leads to low spectral resolution.
We further express (2) in matrix form as

$$\mathbf{I} = \mathbf{S}\mathbf{R}. \qquad (3)$$

Our task is to reconstruct the spectral reflectance $\mathbf{R}$ from the limited number of spectral images, and the estimation can be formulated as

$$\min_{\mathbf{R}} \; \|\mathbf{I} - \mathbf{S}\mathbf{R}\|_2^2, \qquad (4)$$

where $\|\cdot\|_2$ is the L2 norm. Here, we propose a method that uses weighted coefficients of the camera spectral sensitivities of the RGB channels to reconstruct the spectral reflectance based on conservation of energy. In our system, only narrow bands of light can pass through the filters, which causes a large energy loss, whereas the RGB camera suffers no such loss. Energy matching is therefore essential to the subsequent steps.
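A minimal numerical sketch of (3) and (4): with only three RGB measurements and $B$ unknown bands, the least-squares problem is underdetermined, which motivates the basis constraint introduced next. The matrices here are random stand-ins, not system data.

```python
import numpy as np

# Matrix form I = S R of Eq. (3) and the objective of Eq. (4).
# With 3 measurements and B = 31 unknowns the system is
# underdetermined; lstsq returns the minimum-norm solution, so the
# data are fit exactly even though r_hat generally differs from r_true.
rng = np.random.default_rng(0)
B = 31
S = np.abs(rng.normal(size=(3, B)))    # stand-in 3 x B sensitivity matrix
r_true = np.abs(rng.normal(size=B))    # stand-in reflectance
I = S @ r_true                         # simulated RGB measurements

r_hat, *_ = np.linalg.lstsq(S, I, rcond=None)

print(np.allclose(S @ r_hat, I))       # True: measurements reproduced
```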
Given several spectral images of different bands captured by our spectral camera, we can obtain a discrete spectral reflectance curve. According to previous studies [37], the spectral reflectance at each pixel can be approximately represented using a small basis as

$$r(\lambda) \approx \sum_{j=1}^{J} \alpha_j\, d_j(\lambda), \qquad (5)$$

where $d_j$ denotes the $j$-th basis function for spectral reflectance, $\alpha_j$ denotes the corresponding coefficient, and $J$ denotes the number of bases used.
In our method, we use the camera spectral sensitivities of the RGB channels $\{s_n\}$ to act as the $d_j$ in order to meet the requirement of energy matching, because $\{s_n\}$ has the same physical character as spectral reflectance. Furthermore, because the RGB camera has three channels, we set $J = 3$. Thus, (5) can be rewritten as

$$r(\lambda) \approx \sum_{n=1}^{3} \alpha_n\, s_n(\lambda). \qquad (6)$$

In practice, the number of spectral bands $B$ equals the number of filters in our system. Hence, the mapping relationship between the RGB image and the spectral images is established. We then use an efficient interpolating reconstruction method to pursue high spectral resolution imaging.
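The fit of (6) can be sketched as follows, under the assumption that the three coefficients $\alpha_n$ are obtained by least squares from the discrete filter-band samples; the Gaussian sensitivities and sample values are illustrative, not measured data.

```python
import numpy as np

# Sketch of Eq. (6): represent the reflectance as a weighted sum of
# the three channel sensitivities, r(lambda) ~= sum_n alpha_n s_n(lambda).
# Here alpha is fit by least squares to the B = 6 filter-band samples
# (an assumption for illustration; sensitivities are toy Gaussians).
fine = np.arange(400, 701)                          # 1 nm output grid
centers = np.array([450, 470, 540, 560, 630, 680])  # filter wheel bands

def sens(wl, center, width=45.0):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

S_fine = np.stack([sens(fine, c) for c in (460, 540, 600)], axis=1)     # (301, 3)
S_bands = np.stack([sens(centers, c) for c in (460, 540, 600)], axis=1)  # (6, 3)

r_bands = np.array([0.2, 0.25, 0.5, 0.55, 0.4, 0.3])  # discrete reflectance

alpha, *_ = np.linalg.lstsq(S_bands, r_bands, rcond=None)  # J = 3 coefficients
r_fine = S_fine @ alpha            # continuous estimate of r(lambda)
print(r_fine.shape)                # (301,)
```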

C. HIGH SPECTRAL RESOLUTION IMAGING BASED ON INTERPOLATING RECONSTRUCTION
The spectral images captured by our system yield a discrete spectral reflectance curve, as shown in Fig. 1; the interpolating reconstruction then compensates the fitted curve so that it agrees with these discrete measurements up to a residual error $\varepsilon$.
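As a rough sketch of the interpolating compensation (the exact formula is not reproduced in this section), one can interpolate the residual between a smooth basis fit and the discrete measurements so that the reconstructed curve passes through every measured band. The curves below are hypothetical stand-ins.

```python
import numpy as np

# Assumed form of interpolating compensation: correct a smooth basis
# fit by linearly interpolating its residual epsilon at the measured
# bands, so the reconstruction reproduces the measured samples exactly.
centers = np.array([450.0, 470.0, 540.0, 560.0, 630.0, 680.0])
r_meas = np.array([0.2, 0.25, 0.5, 0.55, 0.4, 0.3])   # measured samples

fine = np.arange(450, 681)                 # reconstruct between bands

# Stand-in smooth estimate (any smooth curve serves the sketch);
# epsilon is its residual at the measured bands.
r_basis_fine = 0.35 + 0.1 * np.sin(fine / 60.0)
r_basis_bands = 0.35 + 0.1 * np.sin(centers / 60.0)
epsilon = r_meas - r_basis_bands

# Compensate: spread the residual across wavelengths by interpolation.
r_hat = r_basis_fine + np.interp(fine, centers, epsilon)

# The compensated curve reproduces every measured sample exactly.
idx = np.searchsorted(fine, centers)
print(np.allclose(r_hat[idx], r_meas))     # True
```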

IV. EXPERIMENTS
In this section, the implementation details and analysis are provided. First, we describe the images captured by our spectral system and the image preprocessing we implement. Then, we use the proposed approach to recover the spectral reflectance.

A. SYSTEM SETTINGS AND IMAGES ACQUISITION
We capture the RGB images and spectral images with the DCS-FW, whose prototype is shown in Fig. 2. The RGB camera and the grayscale camera both use FL2-20S4 series sensors, which have a maximum resolution of 1624×1224 at 15 fps and a pixel size of 4.4 μm; the objective lenses have a focal length of 35 mm and an aperture of F1.4. We use the filter wheel control system to set the rotation speed of the filter wheel to 2 revolutions per second. Meanwhile, considering the frame rate of the grayscale camera, we set its sampling rate to 12 fps, and we are thus able to capture two sets of images per second, maintaining temporal resolution. The average time for our method to reconstruct the spectral reflectance of a whole image is 0.256 s, much less than the acquisition time of a set of images. Therefore, in our imaging system, the acquisition time of the spectral cube depends mainly on the acquisition time of the images; in other words, the acquisition rate is 2 cubes/s. In this section, we use narrowband filters with center wavelengths of 450 nm, 470 nm, 540 nm, 560 nm, 630 nm, and 680 nm to obtain spectral images, and set the spatial dimensions of the RGB images and spectral images to 1024×1024. To investigate the impact of illumination on the reconstruction performance, we conduct our experiments under CIE D65 illumination and sunlight. The captured images are shown in Fig. 3: the top three rows are captured indoors under CIE D65 illumination, and the bottom three rows are captured outdoors under sunlight. We select representative points of the scenes as sampling points for testing and show their discrete spectral reflectance curves in Fig. 4.
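The timing figures above can be checked with a few lines of arithmetic:

```python
# Acquisition-timing arithmetic for the reported settings: a 6-filter
# wheel at 2 rev/s, a 12 fps grayscale camera, and 0.256 s of
# reconstruction per image.
filters = 6
wheel_rev_per_s = 2
grayscale_fps = 12
reconstruction_s = 0.256

spectral_frames_per_s = wheel_rev_per_s * filters  # one frame per filter pass
sets_per_s = grayscale_fps / filters               # complete 6-band sets per second

print(spectral_frames_per_s)              # 12, matching the 12 fps sampling rate
print(sets_per_s)                         # 2.0 spectral cubes per second
print(reconstruction_s < 1 / sets_per_s)  # True: reconstruction is not the bottleneck
```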

B. RECOVERY OF SPECTRAL REFLECTANCE
Here, we set the residual error $\varepsilon$ to 1, which is an empirical value, and we recover the spectral reflectance of sampling points in regions of interest that are meaningful for demonstrating the algorithm. The reconstruction results of spectral reflectance are shown in Fig. 5.

V. ANALYSIS AND DISCUSSION
In this section, we use objective evaluation metrics to analyze the accuracy and robustness of our method, and comparative experiments to verify its effectiveness.
We use three objective evaluation metrics to evaluate the quantitative performance of spectral reflectance reconstruction: root mean square error (RMSE) as in (9) and mean relative absolute error (MRAE) as in (10) to measure the error, and spectral angle mapper (SAM) as in (11) to measure the similarity. Smaller values of RMSE and MRAE indicate better performance, whereas larger values of SAM indicate better performance; SAM lies in the range 0 to 1.
where $n$ denotes the total number of data points in the spectral cube, and $R_i$ and $R_i^{gt}$ denote the reconstructed and ground-truth spectra. Table I shows the RMSE, MRAE, and SAM between the reconstructed spectral reflectance and the original discrete curve. We can see that the spectral reflectance reconstructed by our method is based on the original values and maintains their features. Combining Fig. 5 and Table I, we can conclude that the error between the results of our method and the original values is small, indicating that our method has good fidelity. To further analyze the accuracy of our method, we use three comparison methods: principal component analysis based on a greedy clustering algorithm proposed by Otsu et al. [31], the sparse representation-based method (SR) [29], and a learning-based method, HSCNN-D [35].
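For reference, the three metrics can be implemented as below. These are the standard definitions, with SAM reported here as a cosine similarity so that it lies in [0, 1] with larger values better, matching the convention stated above; the paper's formulas (9)-(11) may differ in detail.

```python
import numpy as np

# Standard metric definitions (assumed consistent with Eqs. (9)-(11)).
def rmse(r_hat, r_gt):
    return np.sqrt(np.mean((r_hat - r_gt) ** 2))

def mrae(r_hat, r_gt):
    return np.mean(np.abs(r_hat - r_gt) / r_gt)

def sam(r_hat, r_gt):
    # Cosine similarity between spectra: 1.0 means identical direction.
    num = np.dot(r_hat, r_gt)
    den = np.linalg.norm(r_hat) * np.linalg.norm(r_gt)
    return num / den

gt = np.array([0.2, 0.25, 0.5, 0.55, 0.4, 0.3])
est = gt + 0.01                    # nearly perfect reconstruction

print(round(rmse(est, gt), 3))     # 0.01
print(sam(est, gt) > 0.99)         # True
```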

A. ACCURACY ANALYSIS
Because dictionary-learning-based methods require a large training set, we use the CAVE [38] spectral dataset as the training set for the comparison methods. The CAVE dataset contains 32 real-world objects; each object comprises an RGB image and 31 spectral images, with wavelengths from 400 nm to 700 nm at 10 nm intervals and a spatial resolution of 512×512. To show the performance of each method clearly, we plot the recovered spectral reflectance of all methods for the same scene in the same coordinate system, as shown in Fig. 6.
We also put the spectral reflectance obtained by PMVIS into the coordinate system as the reference. In this paper, our default premise is that the spectral reflectance obtained by PMVIS has a small error from the ground truth, which has been proved in previous research [22].
From Fig. 6 we can see that in each scene, the reconstructed spectral reflectance curves have roughly the same trend as the ground truth, which indicates that our method and the comparison methods are feasible. In addition, the spectral reflectance curves reconstructed by our method are much closer to the ground truth in most scenes. Table II shows the RMSE, MRAE, and SAM results of the recovered spectral reflectance for the different methods under the different illuminations. We can see from Table II that our method performs better than the comparison methods in most cases. Comparing the average values of each metric, we can see that our method performs well under both illuminations, which indicates better capability in terms of accuracy and fidelity.

B. ROBUSTNESS ANALYSIS
To analyze the robustness of our method, we compare the actual reconstruction results of eight patches of the color checker in Fig. 7; the image is taken indoors under CIE D65 illumination. The quantitative errors of these patches are shown in Table III. Specifically, in Fig. 7(b) and Fig. 7(h) we can see that the reconstruction results of our method, Otsu, and HSCNN-D follow roughly the same trend as the ground truth, whereas the SR method does not perform well in this respect. Compared with Otsu and HSCNN-D, our method recovers more curve detail: in Fig. 7(b), like the ground truth, the curve reconstructed by our method has a peak near 620 nm that is absent from the curves reconstructed by Otsu and HSCNN-D, and in Fig. 7(h) the curve reconstructed by our method has peaks near 600 nm and 670 nm that match the ground truth and are not found in the comparison methods. These details show that our method performs better in recovering fine spectral features. From items (b) and (h) in Table III, we can see that our method achieves good results on all three objective evaluation metrics, better overall than SR, Otsu, and HSCNN-D; in other words, its reconstructions have lower error. Combining Fig. 7 and Table III, we conclude that the reconstruction results of our method have lower error and higher shape fidelity.
Taken together, these experiments show that our method is robust: it achieves good reconstruction under different illuminations and for different objects.

C. FURTHER DISCUSSION
Our method combines the captured discrete spectral reflectance curve with the detailed features of the RGB spectral sensitivities, which ensures that the reconstructed spectral reflectance retains more detail and yields better reconstruction results. The comparison methods rely on training datasets, whose quality directly affects the reconstruction results. We can see from Table II that under daylight illumination the comparison methods perform worse than under CIE D65 in most cases, because their training datasets were captured under CIE D65 illumination, and the reconstruction of spectral reflectance is sensitive to the environments and illuminations of the training hyperspectral image datasets. If the training datasets contained different illuminations, the performance would be better; but since collecting hyperspectral image datasets is laborious, the effectiveness of learning-based methods is limited.
Since the measured spectral reflectance of an object is determined by both its material composition and the spectral properties of the illumination, illumination is very influential; our method captures images directly under each illumination, which avoids the errors caused by illumination mismatch.

VI. CONCLUSION
In this paper, we present an effective spectral reconstruction method based on the mapping relationship between spectral images and an RGB image. The spectral images and RGB image are first captured by a dual-camera system with a filter wheel. To achieve high spectral resolution imaging, the camera spectral sensitivity and conservation of energy are used to establish the mapping relationship, which is the key to spectral reconstruction. Compared with previous works, our method does not require a hyperspectral prior built from hyperspectral image datasets. From the experimental results under different illuminations, we show that the spectral reflectance can be recovered with low RMSE, low MRAE, and high SAM, which demonstrates that our method has good accuracy and robustness. We believe that the proposed approach can facilitate a wide range of spectral imaging applications.
A limitation of our work is that the temporal resolution of our method is closely tied to the acquisition speed of the spectral images; in the future, methods to improve temporal resolution should be further explored. Another limitation is the capability of the cameras, which may cause additional errors: the cameras in this paper are close to cut-off near 700 nm, so we aim to address this problem from the perspectives of both algorithm and equipment. Since the three-channel spectral sensitivity curves of the RGB camera span the visible band, our method can reconstruct the visible spectrum. Reconstruction of spectra beyond the wavelength range of the RGB camera will be studied in future work, by using cameras with a wider spectral range or by combining with learning-based methods; how to integrate these technologies is a key point of future research.