High-Quality Object Reconstruction From One-Dimensional Compressed Encrypted Signal Based on Multi-Network Mixed Learning

Conventional optical image encryption methods based on phase-shifting interferometry need at least two interferograms, and storing or transmitting these interferograms consumes considerable resources. At the same time, the low quality of reconstructed complex natural images has long been a main limiting factor in the application of optical image security. In this paper, a method for high-quality object reconstruction from a one-dimensional compressed encrypted signal based on multi-network mixed learning is proposed. First, an encrypted interferogram is obtained using the double random phase encoding (DRPE) method. Then, the one-dimensional compressed sampling signal of the encrypted hologram is obtained on the photodiode using a single-pixel compressive holographic imaging method. Finally, the mapping from the 1D signal to the 2D object image is learned using multiple neural network models. Numerical simulation results show that complex natural images can be reconstructed with high quality at a lower sampling ratio using the proposed method.


I. INTRODUCTION
With the rapid development of cloud computing, the Internet of Things (IoT) and digital communication technology, the security and efficiency of information storage and transmission have received increasing attention [1]-[5]. In recent decades, information security technology based on optical theory and methods has become a research hotspot, benefiting from the parallel processing, high-speed operation and multi-dimensional capabilities of optical information processing [6]. Since the introduction of double random phase encoding (DRPE) [7] based on the optical 4f system, various optical image encryption technologies have been developed [8]-[12]. However, the DRPE-based 4f system is essentially a linear system, which makes the functional relationship between the key, plaintext, and ciphertext interdependent, so it is vulnerable to various attacks, which limits the security of the system [13]. Reference [14] proposed a photoelectric information encryption technology based on phase-shifting interferometry (PSI), which can perfectly reconstruct the complex amplitude information of an object using four-frame interferograms. However, because of the large volume of interferogram data, storing or transmitting the interferograms consumes a large amount of media resources, which greatly limits practical applications. An application of two-step PSI to image encryption was proposed in [15], reducing the data redundancy of the interferograms. Compressed sensing (CS) theory can effectively reconstruct a signal at a lower sampling rate by solving an underdetermined system of equations [16]. To further reduce the data volume, many methods combining CS theory and PSI have been proposed [17]-[19].
(The associate editor coordinating the review of this manuscript and approving it for publication was Liangtian Wan.)
Taking compressed optical image encryption [17] as an example, a Mach-Zehnder interferometer is used to obtain three-frame encrypted interferograms of the object image using the DRPE encryption method. Then, according to CS theory, the encrypted image is highly compressed into a one-dimensional signal, and the interferogram is reconstructed by the Two-Step Iterative Shrinkage/Thresholding (TwIST) algorithm [20]. Finally, the object image is reconstructed using the three-step phase-shifting method. This scheme can effectively reconstruct a binary image or a simple grayscale image at a lower sampling rate, but when reconstructing a complex natural image at a lower sampling rate, the high-frequency detail of the object image cannot be recovered, and the result is strongly affected by noise in the system.
FIGURE 1. Schematic of high-quality object image reconstruction from a one-dimensional compressed encrypted signal based on the multi-network mixed learning method. The whole training process includes three steps: two linear regression models (LR1 and LR2) and a CGAN model. P1 and P2, random phase plates; Ĩ, reconstructed hologram in the LR1 model training process; Î, the hologram reconstructed by the optimized LR1 model; T̃, the reconstructed object image in the LR2 model training process; T̂, the low-quality object image reconstructed by the optimized LR2 model; Õ, the high-quality reconstructed object image in the CGAN model training process; Ô, the reconstructed object image during the test process. The measurement matrix and the network optimization process are also marked in the figure.
In recent years, learning-based methods, both linear and nonlinear, have been widely used in optical signal processing. Some optical systems, such as a random-phase-encoded optical cryptosystem [21] and blind reconstruction for single-pixel imaging [22], can be viewed as linear systems overall; when the phase masks or illumination patterns are unknown, they can be regarded as linear black-box systems. Such black-box systems are well suited to training-based neural network methods, so a trained linear regression (LR) model can be used to reconstruct the target object [23]. Deep learning (DL), a nonlinear learning method, is widely used in optical image processing, including cryptography [24], holographic reconstruction (phase retrieval) [25]-[28], computational ghost imaging [29], [30], and super-resolution [31]-[34]. LR-based methods have certain advantages over DL when only a small number of training samples is available and the images are complex natural images. However, LR cannot effectively reconstruct high-frequency image detail and cannot handle nonlinear systems. DL handles nonlinear systems and high-frequency detail reconstruction better than LR, but it requires a larger training set.
Based on these observations, a method for high-quality object image reconstruction from a one-dimensional compressed encrypted signal based on multi-network mixed learning is proposed for the first time. In the image encryption process, the proposed method needs only one interferogram, so no PZT is required to produce phase shifts, which greatly reduces the experimental burden. Multiple neural network models are then developed to obtain the secret two-dimensional image directly from the one-dimensional compressive sampling data, which makes full use of the high-speed and multi-dimensional parallel processing ability of optical methods and greatly reduces the amount of data stored and transmitted. In addition, an important key, the measurement matrix, is added to increase the security of the system. The overall flow diagram is shown in Fig. 1. First, an LR model is used to reconstruct the hologram by learning the physical relationship between the one-dimensional compressed signal and the hologram. Next, another LR model is used to obtain an initial reconstruction of the object image by learning the relationship between the reconstructed hologram and the object image. In this step, however, the quality of the reconstructed object image is low and the high-frequency detail of the image cannot be recovered. Therefore, a DL model based on a conditional generative adversarial network (CGAN) [35] is used to reconstruct the object image with high quality. By cascading the three models, the 2D object image can be reconstructed directly and quickly from the one-dimensional compressed signal. In addition, this method can efficiently reconstruct complex natural images at a lower sampling ratio.

A. THE APPROACH OF COMPRESSIVE OPTICAL IMAGE ENCRYPTION
The compressive optical image encryption process is shown under ''Data Processing'' in Fig. 1, and the corresponding optical setup is shown in Fig. 2. The laser beam emitted from a He-Ne laser (REO/30989) with a wavelength λ of 633 nm is divided into an object beam and a reference beam. First, the image is loaded onto the spatial light modulator (SLM) and illuminated by the object beam, and the image is then encrypted using the DRPE method through two random phase masks P1 and P2. In the other arm, the reference beam is reflected directly by the mirror. Then, the two beams overlap on the digital micromirror device (DMD) to form an interference pattern, the DMD modulates the encrypted complex light field, and the compressive sampling data are obtained through the photodiode detector. Finally, the compressive sampling data are transmitted to the computer, where the trained model directly decrypts and reconstructs the object image. To produce the training data, a CCD needs to be placed at the position of the DMD to collect the interferograms used to train the model. Once the model is trained, the object image can be reconstructed directly from the one-dimensional compressed sampling data measured with the DMD.
For simplicity, the real amplitude of the reference plane wave is denoted by $R$, and the complex object field in plane $P_1$ is denoted by $U_0(x_0, y_0)$. $\exp[i2\pi \cdot p_1(x_0, y_0)]$ and $\exp[i2\pi \cdot p_2(x_1, y_1)]$ represent the complex amplitude transmittances of $P_1$ and $P_2$, respectively, where $p_1(x_0, y_0)$ and $p_2(x_1, y_1)$ are two independent white noises uniformly distributed in $[0, 1]$. $d_1$ is the distance between $P_1$ and $P_2$, and $d_2$ is the distance between $P_2$ and the DMD. $\mathrm{FR}_d$ represents the Fresnel transform over distance $d$. Then, the complex object field on the DMD plane can be expressed as
$$U(x, y) = \mathrm{FR}_{d_2}\left\{ \mathrm{FR}_{d_1}\left\{ U_0(x_0, y_0) \exp[i2\pi \cdot p_1(x_0, y_0)] \right\} \exp[i2\pi \cdot p_2(x_1, y_1)] \right\}.$$
The hologram $I$ of the complex amplitude field on the DMD plane is expressed as
$$I(x, y) = |U(x, y) + R|^2 = I_0 + R\,U(x, y) + R\,U^*(x, y),$$
where $I_0$ is the zero-order light given by
$$I_0 = |U(x, y)|^2 + R^2.$$
Finally, when the complex amplitude field is modulated by the measurement matrix loaded into the DMD, the compressed sampling signal is coupled to the photodiode detector through the lens, and the one-dimensional compressed sampling data of the encrypted hologram are obtained. Mathematically, each measurement is
$$y_m = \Phi_m \otimes I,$$
where $\Phi_m$ is the measurement matrix loaded into the DMD for the $m$-th measurement and $\otimes$ denotes the inner product. The final compressed sampling data $Y \in \mathbb{R}^{M \times 1}$ on the photodiode are obtained after repeating the process $M$ times.
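The encryption and sampling chain described above can be sketched numerically. The following is a minimal NumPy illustration, not the authors' simulation code: the Fresnel transform is implemented with the transfer-function method, and the pixel pitch, the distances `d1` and `d2`, the reference amplitude and the binary random measurement patterns are all assumed values chosen for illustration (only the 633 nm wavelength and the 64 × 64 image size come from the text).

```python
import numpy as np

def fresnel(u, wavelength, d, pitch):
    """Fresnel transform FR_d of a square field u (transfer-function method)."""
    n = u.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)              # spatial frequencies in 1/m
    fxx, fyy = np.meshgrid(fx, fx)
    k = 2 * np.pi / wavelength
    h = np.exp(1j * k * d) * np.exp(-1j * np.pi * wavelength * d * (fxx**2 + fyy**2))
    return np.fft.ifft2(np.fft.fft2(u) * h)

def encrypt_and_sample(obj, ratio, rng, wavelength=633e-9, d1=0.05, d2=0.05,
                       pitch=8e-6, ref_amp=1.0):
    """DRPE in the Fresnel domain, one hologram, then M single-pixel measurements.

    d1, d2, pitch and ref_amp are assumed values, not taken from the paper.
    """
    n = obj.shape[0]
    p1 = rng.random((n, n))                      # random phase mask P1, uniform in [0, 1)
    p2 = rng.random((n, n))                      # random phase mask P2, uniform in [0, 1)
    u1 = fresnel(obj * np.exp(2j * np.pi * p1), wavelength, d1, pitch)
    u = fresnel(u1 * np.exp(2j * np.pi * p2), wavelength, d2, pitch)
    hologram = np.abs(u + ref_amp) ** 2          # interference with a plane reference wave
    m = round(ratio * n * n)                     # number of measurements M
    phi = rng.integers(0, 2, size=(m, n * n)).astype(float)  # assumed binary DMD patterns
    y = phi @ hologram.ravel()                   # one inner product per measurement
    return hologram, phi, y

rng = np.random.default_rng(0)
obj = rng.random((64, 64))                       # stand-in for a 64 x 64 grayscale image
hologram, phi, y = encrypt_and_sample(obj, 0.20, rng)
print(hologram.shape, phi.shape, y.shape)
```

Each row of `phi` plays the role of one pattern loaded into the DMD, so `y` is the one-dimensional signal that would be recorded by the photodiode.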

B. LEARNING-BASED IMAGE DECRYPTION AND RECONSTRUCTION
An LR model can be considered a single-layer fully connected neural network; with an i.i.d. prior over its parameters, such a network is equivalent to a Gaussian process (GP) [36]. For a linear optical system, such as a random-phase-encoded optical cryptosystem [21] or blind reconstruction for single-pixel imaging [22], we simply assume that the function of the system is expressed as $y = f(x) = w \cdot x$, where $x$ represents the input signal, $y$ represents the output signal, and $w$ represents the weighting matrix that corresponds to the linear optical system. Given a large number of labeled training samples ($x_n$ and $f_n(x)$, $n = 1, 2, \ldots, N$), the optical system can likely be learned by training on these samples. We first initialize $w$ randomly and then use the back-propagation algorithm [37] to continuously optimize $w$ during training; when the model converges, the optimized $w$ approximates the optical system. A nonlinear deep learning network can be regarded as a multilayer fully connected neural network, and its learning process is similar. The proposed method first employs an LR model to reconstruct the encrypted hologram directly from the one-dimensional compressed sampling data. The process can be described as
$$\tilde{I} = \mathrm{LR1}\{Y\},$$
where $\tilde{I}$ is the reconstructed hologram in the training process and $\mathrm{LR1}\{\cdot\}$ represents the function mapping from the sampling signal $Y$ to the encrypted hologram. This mapping can be learned by training an LR model on $N$ pairs of labeled training data, each pair consisting of a known hologram $I_n$ and a sampling signal $Y_n$, where $n = 1, 2, \ldots, N$. The training is an optimization process and can be expressed as
$$\widehat{\mathrm{LR1}} = \arg\min \sum_{n=1}^{N} \left\| I_n - \tilde{I}_n \right\|, \qquad \hat{I} = \widehat{\mathrm{LR1}}\{Y\},$$
where $\widehat{\mathrm{LR1}}\{\cdot\}$ represents the optimized LR1 model, $\hat{I}$ represents the hologram reconstructed by the optimized model, and $\|\cdot\|$ is a loss function measuring the error between $I_n$ and $\tilde{I}_n$.
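The learning processes of the other two models, LR2 and CGAN, are similar and can be expressed as
$$\tilde{T} = \mathrm{LR2}\{\hat{I}\}, \qquad \widehat{\mathrm{LR2}} = \arg\min \sum_{n=1}^{N} \left\| O_n - \tilde{T}_n \right\|, \qquad \hat{T} = \widehat{\mathrm{LR2}}\{\hat{I}\},$$
$$\tilde{O} = \mathrm{CGAN}\{\hat{T}\}, \qquad \widehat{\mathrm{CGAN}} = \arg\min \sum_{n=1}^{N} \left\| O_n - \tilde{O}_n \right\|, \qquad \hat{O} = \widehat{\mathrm{CGAN}}\{\hat{T}\},$$
where $\mathrm{LR2}\{\cdot\}$ represents the function mapping from $\hat{I}$ to the label $O$, $\tilde{T}$ is the reconstructed object image in the LR2 model training process, $\widehat{\mathrm{LR2}}\{\cdot\}$ represents the optimized LR2 model, $\hat{T}$ is the low-quality object image reconstructed by the optimized LR2 model, $\mathrm{CGAN}\{\cdot\}$ represents the function mapping from $\hat{T}$ to the label $O$, $\tilde{O}$ is the high-quality reconstructed object image in the CGAN model training process, and $\widehat{\mathrm{CGAN}}\{\cdot\}$ represents the optimized CGAN model. Once these three optimized models are obtained, they are cascaded to reconstruct the object $\hat{O}$ directly from the sampling signal $Y$ during testing. The average time to reconstruct an object image from the compressed data is 0.03 seconds at a sampling ratio of 20%.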
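The idea of learning an unknown linear system $f(x) = w \cdot x$ from labeled pairs, as described in this subsection, can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than the authors' training code: it uses plain NumPy gradient descent on a squared-error loss in place of PyTorch back-propagation, and the dimensions, learning rate and synthetic data are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_samples = 8, 16, 200        # toy sizes, not the paper's M and 4096

w_true = rng.normal(size=(n_out, n_in))    # stands in for the unknown linear optical system
x = rng.normal(size=(n_samples, n_in))     # training inputs (role of the signals Y_n)
f = x @ w_true.T                           # training labels (role of the holograms I_n)

w = rng.normal(size=(n_out, n_in)) * 0.01  # random initialization of the weights
lr = 0.05                                  # toy learning rate
for step in range(500):
    pred = x @ w.T                         # forward pass of the linear model
    # Gradient of the squared error, averaged over the training samples.
    grad = 2.0 * (pred - f).T @ x / n_samples
    w -= lr * grad                         # gradient-descent update

mse = np.mean((x @ w.T - f) ** 2)
print(mse, np.max(np.abs(w - w_true)))
```

After training, `w` approximates `w_true`, which mirrors the statement that the optimized weighting matrix approximates the optical system.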

C. THE DESIGNED NETWORK ARCHITECTURE AND TRAINING PROCESS
The detailed network structure is shown in Fig. 3. The LR model is equivalent to a fully connected neural network without a nonlinear activation function, shown as ''LR1 model'' and ''LR2 model'' in Fig. 3. In the LR1 model, the input and output of the network are the compressed data and the reconstructed hologram, respectively. The input layer has M × 1 neurons, and the 4096 × 1 neurons of the output layer are reshaped into an image of size 64 × 64. In the LR2 model, the input and output of the network are the hologram reconstructed by the optimized LR1 model and the low-quality reconstructed object image, respectively. The input layer and the output layer both consist of 4096 × 1 neurons. A CGAN model, composed of a generator network and a discriminator network, is used to reconstruct the high-quality object image. The input of the generator network is the low-quality object image produced by the optimized LR2 model, which passes through 6 downsampling convolutional layers and 6 upsampling convolutional layers to reconstruct the high-quality object image. The discriminator network adopts the Markovian discriminator (PatchGAN) structure, which only penalizes structure at the scale of patches and makes the image generated by the generator network more similar to the object image in semantics and texture [35]. One input of the discriminator is the input of the generator, and the other is either the output of the generator or the label image. After four convolutional layers, the discriminator outputs a 14 × 14 feature map that is used to calculate the discriminator loss and optimize the generator. The detailed parameters of these two networks are given in Fig. 3. During training, we employ the mean square error (MSE) as the loss function to optimize the LR1 and LR2 models.
The least absolute deviations loss (L1_Loss) and the binary cross entropy loss (BCE_Loss) are employed to optimize the generator network and the discriminator network in the CGAN model, respectively. We set the learning rate to 0.00002 for the LR1 and LR2 models and to 0.0002 for the CGAN model, and use the stochastic gradient descent (SGD) and Adam optimizers to update the model parameters. The number of training steps is 500. All programs run under PyTorch in a Python 3.7 environment, and computations are accelerated using an NVIDIA GeForce GTX 1080 Ti GPU.
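For reference, the three loss functions named above can be written out explicitly. The following NumPy sketch uses the standard definitions with mean reduction; the function names are hypothetical, and it is assumed (not stated in the text) that these correspond to the mean-reduced MSE, L1 and BCE losses as commonly provided in PyTorch.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean square error, as used for the LR1 and LR2 models."""
    return float(np.mean((pred - target) ** 2))

def l1_loss(pred, target):
    """Least absolute deviations (L1) loss, as used for the generator."""
    return float(np.mean(np.abs(pred - target)))

def bce_loss(prob, label, eps=1e-12):
    """Binary cross entropy between predicted probabilities and 0/1 labels."""
    prob = np.clip(prob, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob)))

pred = np.array([0.0, 2.0])
target = np.array([0.0, 0.0])
print(mse_loss(pred, target), l1_loss(pred, target))
print(bce_loss(np.array([0.5]), np.array([1.0])))
```

For a PatchGAN-style discriminator, `prob` would be the 14 × 14 map of per-patch probabilities and `label` an array of ones (real) or zeros (generated) of the same shape.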

D. PREPARATION OF TRAINING DATA
Two complex natural image datasets, Faces-LFW (Labeled Faces in the Wild) [38] and Dog [39], are used to train and test the proposed method. The image backgrounds in the Dog dataset are more complex than those in the Faces-LFW dataset. From each dataset, we randomly select 6000 object images as training samples and 600 object images for testing; the test images do not participate in the training process. In the simulation, the images are converted to grayscale and resized to 64 × 64, and the same number of encrypted holograms and compressed sampling data are then generated. For comparison, we generate 5 sets of data with sampling ratios of 50%, 30%, 20%, 10% and 5%.
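To make the sampling ratios concrete, the number of single-pixel measurements M for a 64 × 64 hologram can be computed as follows. This is a small illustrative calculation; rounding to the nearest integer is an assumption, since the text does not state how M is derived from the ratio.

```python
n_pixels = 64 * 64                              # 4096 pixels per hologram
ratios = [0.50, 0.30, 0.20, 0.10, 0.05]         # sampling ratios used in the comparison

# M = ratio x number of pixels, rounded to the nearest integer (assumed convention).
counts = {ratio: round(ratio * n_pixels) for ratio in ratios}

for ratio, m in counts.items():
    print(f"sampling ratio {ratio:.0%}: M = {m}")
```

Under this convention, M is also the number of input neurons of the LR1 model for each dataset.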

III. RESULTS AND ANALYSIS

A. RESULTS
For convenience, we refer to the proposed model based on multi-network mixed learning as the LR + CGAN model. The qualitative results of the proposed method are shown in column (a) of Fig. 4 and Fig. 5; the size of the reconstructed images is 64 × 64. The reconstruction efficiency and quality of the encrypted hologram depend not only on the reconstruction algorithm and the measurement matrix, but also on the system parameters and the accuracy of the optical key. Most CS reconstruction algorithms rely heavily on two conditions, the sparsity condition and the incoherence condition of the image [40]. Because the encrypted image resembles noise, traditional optical decryption methods, such as our previous work [17], [19], often fail to obtain satisfactory results. In terms of hologram storage and transmission efficiency, the proposed scheme needs only one encrypted hologram to reconstruct the object image, whereas the all-optical method [17] needs three, so the hologram storage is reduced to one third. Visually, the two-dimensional complex natural object image can still be successfully reconstructed from the one-dimensional sampling data using the LR + CGAN model even when the sampling rate is as low as 5%. When the sampling rate increases to 20%, the object image can be reconstructed almost perfectly, and high-frequency information is retained. In addition, noise has almost no effect on the reconstructed image. For comparison, the all-optical scheme [17] was also simulated, and the results are shown in column (b) of Fig. 4 and Fig. 5. For the all-optical scheme, the contour of the object is reconstructed successfully once the sampling rate reaches 30%, and most of the low-frequency information of the image can be reconstructed when the sampling rate increases to 50%.
We also found that, for the all-optical method, the quality of image reconstruction can be improved by increasing the image resolution; the reconstruction results at different resolutions are shown in Fig. 6. In contrast, the proposed method is hardly limited by image resolution and can still reconstruct images well even when the resolution is as low as 64 × 64.
For quantitative analysis, two indicators, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) [41], are calculated and averaged over the 600 reconstructed images in the test set (all quantitative results below are likewise averages over these 600 images). None of the images in the test set participated in the training process. The PSNR and SSIM curves at different sampling ratios are plotted in Fig. 7. The curves show that the reconstruction quality decreases as the sampling ratio decreases, and that the more complex the objects, the lower the reconstruction quality, as expected.
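As an illustration of the first indicator, PSNR and its average over a test set can be computed as below. This is a minimal sketch using the standard PSNR definition; the peak value of 255 assumes 8-bit grayscale images, and the function names are hypothetical. SSIM is more involved and is defined in [41].

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = reference.astype(float) - reconstructed.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def mean_psnr(references, reconstructions, peak=255.0):
    """Average PSNR over a set of image pairs, e.g. the 600 test images."""
    return float(np.mean([psnr(r, o, peak) for r, o in zip(references, reconstructions)]))

ref = np.zeros((64, 64))
rec = ref + 25.5                            # uniform error of 10% of the peak value
print(psnr(ref, rec))
```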
To further evaluate the performance of the proposed model, we test image reconstruction with different numbers of training samples, from 300 to 6000, at a sampling rate of 20%. The results are presented in Fig. 8. As can be seen from Fig. 8(a), the image can still be successfully reconstructed when the number of training samples is as low as 300, and the reconstruction quality improves as the number of samples increases. The corresponding quantitative analysis is shown in Fig. 8(b) and (c). The PSNR and SSIM values of the reconstructed target image differ little between 3000 and 6000 training samples. Even with 300 training samples, the PSNR of the reconstructed object image reaches 17.8 dB. These results show that the proposed model can achieve good results with a small number of training samples.

B. ANALYSIS OF THE DESIGN OF THE NETWORK MODEL
The design of the model plays a vital role in the performance of a machine learning system. To verify the superiority of the proposed method, we compare the proposed LR + CGAN model with two other feasible models. The first uses only linear regression models to train and reconstruct images and is named LR. The second adds two linear regression layers to the CGAN model as its first two fully connected layers and is named csCGAN. The comparison results are shown in Fig. 9. Compared with the LR model and the csCGAN model, the proposed LR + CGAN model performs best. Because the LR model cannot learn the nonlinear relationship needed to filter out random noise, noise has a strong influence on the results reconstructed with the LR model. Furthermore, the facial expression in the csCGAN-based reconstruction is changed and distorted. The main reason is that the DL model readily learns the high-frequency information of the image but tends to miss the low-frequency information, and under this parameter-sharing training mechanism it easily falls into a local optimum and misses the global optimum. Quantitatively, the PSNR and SSIM distributions obtained using the three models at different sampling ratios are presented in Fig. 10, which further demonstrates the superiority of the proposed method.
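The limitation of the LR-only model is a consequence of linearity: cascading fully connected layers without activation functions still yields a single linear map, so no nonlinear relationship can be represented. The toy NumPy check below illustrates this; the weights are random placeholders and the sizes are reduced (16 × 16 images, 51 measurements) purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, side = 51, 16                              # toy sizes (the paper uses M inputs and 64 x 64 images)
n_pix = side * side

w1 = rng.normal(size=(n_pix, m)) * 0.01       # hypothetical LR1 weights: signal -> hologram
w2 = rng.normal(size=(n_pix, n_pix)) * 0.01   # hypothetical LR2 weights: hologram -> object
y = rng.normal(size=m)                        # stand-in for a compressed sampling signal

hologram = (w1 @ y).reshape(side, side)       # LR1 output reshaped to an image
cascade = (w2 @ hologram.ravel()).reshape(side, side)   # LR2 applied to the LR1 output
single = ((w2 @ w1) @ y).reshape(side, side)  # one equivalent linear layer

print(np.allclose(cascade, single))
```

Adding a nonlinear model such as the CGAN after the linear stages is what allows the overall mapping to depart from this single linear map.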

C. ROBUSTNESS
Because the propagation of optical information may be affected by noise, we also study the robustness of the proposed method to noise. Gaussian white noise is added to the detection signal at different signal-to-noise ratios (SNR), where the SNR can be expressed as
$$\mathrm{SNR} = 10 \log_{10} \left( \frac{\frac{1}{M} \sum_{m=1}^{M} x_m^2}{P_{\mathrm{noise}}} \right),$$
where $\frac{1}{M} \sum_{m=1}^{M} x_m^2$ and $P_{\mathrm{noise}}$ represent the powers of the signal $x$ and the noise, respectively. For comparison, the quantitative analysis results obtained by the proposed method and the LR method at a sampling ratio of 50% are shown in Fig. 11 and Fig. 12. The results show that the image quality improves as the SNR increases. Even when the SNR is below 30 dB, the image can still be successfully reconstructed using the proposed method, indicating that the proposed method has a strong anti-noise ability. In addition, compared with the LR model, the LR + CGAN model improves the PSNR of the reconstructed images by 2-3 dB and the SSIM by 0.1 on average.
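The noise test can be reproduced in outline by scaling the noise variance to a target SNR. The sketch below assumes the usual definition of SNR as ten times the base-10 logarithm of the ratio of mean signal power to noise power; the function name and the synthetic signal are hypothetical.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Add zero-mean Gaussian white noise to signal x at a target SNR in dB."""
    p_signal = np.mean(x ** 2)                    # mean signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))  # noise power for the target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), size=x.shape)
    return x + noise

rng = np.random.default_rng(0)
x = rng.random(200000)                            # stand-in for a detected signal
noisy = add_awgn(x, 30.0, rng)

# Empirical SNR of the noisy signal, which should be close to the 30 dB target.
measured = 10.0 * np.log10(np.mean(x ** 2) / np.mean((noisy - x) ** 2))
print(measured)
```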

D. GENERALIZATION
Generalization ability is an important criterion for judging whether a model is practical. We use the model trained on the Faces-LFW dataset to reconstruct object images from five other disjoint datasets; the quantitative analysis results at sampling ratios of 20% and 50% are shown in Fig. 13. Although the model is trained on the Faces-LFW dataset, which contains only images of people, it can still successfully reconstruct other types of objects, including handwritten digits, clothes, animals and handwritten characters. This means that the proposed model learns the entire optical system rather than merely fitting a particular dataset.

E. SECURITY
We also studied the security of the proposed compressed optical image encryption method. Among the keys, the principal key $p_2(x_1, y_1)$ and the measurement matrix play critical roles in the optical image encryption system. If either of these keys is incorrect, the retrieved object image resembles noise and is fully unrecognizable, as shown in Fig. 14(a) and (b). When the wavelength λ of the He-Ne laser or the first diffraction distance $d_1$ of the object image has a relative error, the reconstructed images cannot be recognized, as shown in Fig. 14(c) and (d). The simulation results show that the decrypted image is very sensitive to the keys. The correct decrypted image is obtained only when all optical keys are correct, which indicates that the proposed compressive optical encryption system is secure.
FIGURE 14. Retrieved images with incorrect keys in the decryption process: (a) when the principal key $p_2(x_1, y_1)$ is incorrect; (b) when the measurement matrix is incorrect; (c) when λ has a relative error of 3%; and (d) when $d_1$ has a relative error of 1%.

IV. CONCLUSION
In this paper, a method for high-quality object reconstruction from a one-dimensional compressed encrypted signal based on multi-network mixed learning is proposed for the first time. By combining optimized linear regression and deep learning models, complex natural images can be reconstructed with high quality directly from the one-dimensional compressed sampling data of the encrypted information at a lower sampling ratio, and only a small number of training samples is needed. In addition, we compare the reconstruction quality of different models and show that the proposed LR + CGAN model is the most suitable of the models tested for the compressive optical image encryption system. Further, the strong anti-noise ability, the generalization ability and the security of the method are verified, showing that the proposed method can overcome the limitation imposed by the hologram data volume in current optical image encryption systems and improve the efficiency of information storage and transmission.

ACKNOWLEDGMENT
(Yuhui Li and Jiaosheng Li contributed equally to this work.)