Tree-Structured Dilated Convolutional Networks for Image Compressed Sensing

To better recover a sparse image signal that carries redundant information from far fewer measurements than the Nyquist-Shannon sampling theorem requires, convolutional neural networks (CNNs) can be used to emulate the compressed sensing (CS) process. However, existing CNN-based CS methods suffer from high computational complexity and unsatisfactory reconstruction quality. This study aims to present a faster CNN-based algorithm that recovers images with finer texture details from CS measurements. A tree-structured dilated convolutional network (TDCN) for image CS is proposed. To extract multi-scale image features as fully as possible for better reconstruction, the TDCN combines tree-structured residual blocks (TSRBs) built from three dilated convolution layers with different dilation factors; the output of each dilated convolution layer is fed into a fusion layer to eliminate the information loss caused by cascading multiple dilated convolutions. Moreover, L1 loss is employed as the optimization objective instead of L2 loss to improve the training results of the network and achieve better convergence. Extensive CS experiments demonstrate that the proposed TDCN outperforms existing state-of-the-art methods in terms of both PSNR and SSIM at different sampling rates while maintaining a fast computational speed. Our code and the trained model are available at https://github.com/UHADS/TDCN.

In the CS framework, a signal x ∈ R^N is sampled as y = Φx through a measurement matrix Φ ∈ R^{M×N} with M ≪ N. In contrast to the Nyquist-Shannon sampling theorem [3], CS theory shows that a sparse signal can be recovered from far fewer measurements by exploiting the sparsity of the signal.

In addition, the CS process can be viewed as random subsampling of a signal. A reconstruction algorithm can eliminate the artifacts caused by random subsampling, so the original signal can be accurately recovered. As a result, CS subsampling reduces the demand for transmission bandwidth and storage space. Because it samples and compresses simultaneously, CS also offers low-cost on-sensor data compression [4]. CS has been applied in a variety of real-world settings, including but not limited to radar image acquisition [5], [6], [7], novel imaging devices [8], [9], magnetic resonance imaging (MRI) [10], [11], and wireless telemonitoring [12].

It is well known that the information carried by images is redundant and can be represented sparsely in suitable transform domains. Therefore, images can be compressed and accurately reconstructed according to CS theory. The goal of image CS is to ensure that an image can be accurately reconstructed from very few measurements. Two main challenges must be solved to meet this goal: selecting the sampling matrix and designing an appropriate reconstruction method.

Most studies [13], [14], [15], [16] select a random matrix, binary matrix, or structured matrix as the sampling matrix. However, these sampling matrices are image-independent and ignore image features.

(The associate editor coordinating the review of this manuscript and approving it for publication was Wen Chen.)
To make full use of image features and design a sampling matrix highly correlated with the image so as to achieve high-quality results, a convolutional layer was proposed in CSNet [17] to simulate the CS sampling process and adaptively learn the sampling matrix from the training images.

For the design of the reconstruction method, some studies employ CNNs to recover the image block by block [24], [4]. However, these CNN-based methods ignore the linkages between blocks and rely only on intra-block information to recover the image, so blocking artifacts appear in the reconstructed results.

To enhance the quality of the reconstructed images, the proposed TDCN consists of a sampling network and a reconstruction network. The sampling network adopts the same network structure as CSNet [17] and obtains CS measurements through a sampling matrix that is trained adaptively on the training datasets. The reconstruction network, which is established to learn an end-to-end mapping from CS measurements to reconstructed images, contains a linear preliminary reconstruction network and a nonlinear deep reconstruction network. The preliminary reconstruction network produces a preliminary recovered image through a deconvolutional layer, whereas the deep reconstruction network uses several TSRB modules to further refine the preliminary reconstruction and obtain better recovery quality.
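The tree-structured residual block described above can be illustrated with a minimal single-channel NumPy sketch. The exact layer widths, activation placement, and fusion operator of the paper's TSRB are not specified here, so the averaging fusion and the choice of dilation factors (1, 2, 3) below are assumptions for illustration only:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 2-D convolution with a dilated kernel (single channel)."""
    k = kernel.shape[0]
    eff = dilation * (k - 1) + 1          # effective receptive-field size
    pad = eff // 2
    xp = np.pad(x, pad, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def tsrb(x, kernels, dilations=(1, 2, 3)):
    """Tree-structured residual block sketch: cascade three dilated
    convolutions, keep the output of every branch, fuse them, and add
    the identity shortcut (residual connection)."""
    feats, h = [], x
    for kern, d in zip(kernels, dilations):
        h = np.maximum(dilated_conv2d(h, kern, d), 0.0)  # conv + ReLU
        feats.append(h)                                   # keep each branch
    fused = sum(feats) / len(feats)   # stand-in for the learned fusion layer
    return x + fused                  # residual connection
```

Note how every intermediate feature map is retained and fused, which is what prevents shallow information from being lost as the dilated convolutions cascade.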

In addition, instead of the mean square error (MSE) or L2 loss, the mean absolute error (MAE) or L1 loss is used as the loss function in the image reconstruction network, because the literature [29] suggests that L1 loss can potentially achieve better training results on many occasions.

The experimental results show that the proposed TDCN achieves higher PSNR and SSIM values than most existing image CS methods because of the following contributions.

2) To quickly obtain recovered images from CS measurements, we introduce dilated convolution into the TSRB modules and arrange the dilated convolutions within each TSRB as a tree structure. The TSRB can therefore easily obtain multi-scale image features and ensure that the extracted shallow information is not lost in the deep network.

3) We use the L1 loss function in TDCN instead of the L2 loss function. Experiments show that L1 loss yields recovered images with more detail and better visual quality while achieving better convergence.

The remainder of this paper is organized as follows. Section II introduces the background of the model. Section III introduces the proposed TDCN method. In Section IV, the performance of TDCN is discussed and compared with that of some state-of-the-art methods through experiments, and we conclude the paper in Section V.

The TSRB process can be described by Equations (1) and (2).

In CNN-based image CS, the choice of loss function is also essential, and an appropriate loss function can help the model achieve the best and fastest convergence.

L2 loss is the most widely used loss function in image recovery and directly matches the main performance measure (PSNR) for these problems. However, research [29] reported that training with L2 loss does not guarantee better performance in terms of PSNR and SSIM. In their experiments, L1 loss was used as the loss function, and the network trained with L1 loss performed better than the one trained with L2 loss.

The L2 loss function is the mean squared error (MSE) between the predicted value f(x_i) and the target value x_i, which is defined in Equation (3):

L_2 = \frac{1}{N} \sum_{i=1}^{N} \left( f(x_i) - x_i \right)^2    (3)

The L1 loss function is the mean absolute error (MAE) between the predicted value f(x_i) and the target value x_i, which is defined in Equation (4):

L_1 = \frac{1}{N} \sum_{i=1}^{N} \left| f(x_i) - x_i \right|    (4)

where N is the total number of images.
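The two loss functions are straightforward to express in NumPy. This small sketch mirrors Equations (3) and (4), averaging over a batch of predictions:

```python
import numpy as np

def l2_loss(pred, target):
    """Mean squared error, as in Equation (3)."""
    return np.mean((pred - target) ** 2)

def l1_loss(pred, target):
    """Mean absolute error, as in Equation (4)."""
    return np.mean(np.abs(pred - target))
```

Because L1 loss penalizes large residuals linearly rather than quadratically, it is less dominated by a few large errors, which is one common explanation for the better detail preservation reported in [29].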

The TDCN proposed in this study imitates the image CS process, as shown in FIGURE 3. Similar to block-based compressive sensing (BCS), TDCN uses a CNN to complete three operations: compressive sampling, preliminary reconstruction, and deep reconstruction. TDCN has a sampling network and a reconstruction network: the sampling network obtains CS measurements through a learned sampling matrix, and the reconstruction network recovers the reconstructed images from the CS measurements. The reconstruction network consists of a preliminary reconstruction network and a deep reconstruction network. The preliminary reconstruction network is a linear operation that produces an initial reconstruction from the CS measurements, whereas the deep reconstruction network is a nonlinear operation that further improves the quality of the preliminary reconstructed images.

The sampling convolution layer can be described as

y_samp = W_samp ∗ x,

where ∗ represents the convolution operation, y_samp is the CS measurement of the input image x, and W_samp denotes the convolution kernels that play the role of the sampling matrix.
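A stride-B convolution whose B×B kernels have no bias is equivalent to multiplying each vectorized non-overlapping B×B block by a sampling matrix. The following sketch shows that equivalence in matrix form; the matrix name `phi` and its shape (M, B*B) are illustrative assumptions, not identifiers from the paper:

```python
import numpy as np

def sample_blocks(img, phi, B=32):
    """Block-based CS sampling: every non-overlapping BxB block is
    vectorized and multiplied by the sampling matrix `phi` of shape
    (M, B*B). This is the matrix form of a stride-B convolution whose
    kernels are the rows of `phi` reshaped to BxB."""
    H, W = img.shape
    rows, cols = H // B, W // B
    y = np.zeros((phi.shape[0], rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = img[r * B:(r + 1) * B, c * B:(c + 1) * B].reshape(-1)
            y[:, r, c] = phi @ block
    return y
```

For example, at a sampling rate of 0.1 with B = 32, the matrix has M = round(0.1 × 1024) = 102 rows, so a 64×64 image yields a 102×2×2 measurement tensor.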

According to CS theory [2], the image can only be reconstructed from measurements under sparsity conditions. We design a preliminary reconstruction network to produce an initial estimate of the image from the CS measurements.

In the deep reconstruction network, we cascade multiple TSRB modules to increase the nonlinearity of the network.

To avoid losing the contextual information learned by each TSRB, we extract the feature maps from every TSRB and fuse them at the ''Concat'' layer. To reduce memory cost and increase running speed, we add two convolutional layers that reduce the output dimensions of the feature-fusion layer. The output of these two convolutional layers is then added to the output of the first convolution layer of the deep reconstruction network, forming a global residual module. A feature aggregation operation is finally used to obtain the output images.
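The fusion-and-residual step can be sketched as follows. Channel counts, the use of 1×1 kernels for the two dimension-reducing convolutions, and the ReLU placement are assumptions made for this illustration, since the paper does not restate them here:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def fuse_and_residual(tsrb_feats, first_feat, w1, w2):
    """Concatenate the outputs of all TSRBs along the channel axis
    (the ''Concat'' layer), shrink them with two convolutions, then add
    the output of the first convolution layer of the deep reconstruction
    network as a global residual shortcut."""
    concat = np.concatenate(tsrb_feats, axis=0)     # ''Concat'' layer
    reduced = np.maximum(conv1x1(concat, w1), 0.0)  # first reduction + ReLU
    reduced = conv1x1(reduced, w2)                  # second reduction
    return reduced + first_feat                     # global residual add
```

The global shortcut means the fused branch only has to model a residual correction, which typically eases optimization in deep reconstruction networks.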

The above process can be expressed as

y_out = y_int + y_TSRB,

where y_out is the final recovered high-quality image, y_int is the low-quality output image of the preliminary reconstruction network, W_out and B_out are the kernel and biases of the feature aggregation operation that produces y_TSRB, and y_TSRB is the residual between the low-quality image y_int and the high-quality image y_out. The final TDCN is shown in FIGURE 3.

For the purpose of comparison, the network parameters of TDCN are set as follows: the block size in the sampling process is the same as that of CSNet, that is, B = 32 and l = 1. We initialize the weights using the method described in [35], which is reasonable and effective for networks with the ReLU activation function. Training is performed by optimizing Equation (4) using adaptive moment estimation (Adam) [36], with the default settings used to initialize the other Adam parameters.

All our experiments are conducted by training the network on a common image super-resolution dataset, DIV2K [37]. The DIV2K dataset includes 800 training images and 100 validation images, saved as ''.png'' files. Similar to CSNet, data augmentation is applied to enlarge the training dataset [17]. We crop the training images with a stride of 32 into sub-images of 96×96 pixels and randomly choose 96,000 sub-images for network training. A total of 100 epochs are trained, each with 3,000 iterations and a batch size of 32. We set the initial learning rate to 0.0004 and halve it every 10 epochs. Different sampling rates are used to measure the images. We use Set5 [30], Set11 [24], Set14 [31], and BSD100 [32] as test datasets. All experiments are performed on a platform with an i9-9900K CPU and an NVIDIA RTX 2080Ti GPU.

parameters. The tests on these algorithms are performed on dataset Set11. Note that the four traditional algorithms use a random matrix as the sampling matrix, whereas the proposed TDCN uses a convolution layer.
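The training-set preparation described above (cropping 96×96 sub-images with a stride of 32, before the random selection and augmentation steps) can be sketched as:

```python
import numpy as np

def crop_subimages(img, size=96, stride=32):
    """Slide a size x size window over the image with the given stride
    and collect every sub-image, mirroring the sub-image extraction
    used to build the training set."""
    H, W = img.shape[:2]
    crops = [img[r:r + size, c:c + size]
             for r in range(0, H - size + 1, stride)
             for c in range(0, W - size + 1, stride)]
    return np.stack(crops)
```

A 160×128 image, for instance, yields ((160 − 96)/32 + 1) × ((128 − 96)/32 + 1) = 3 × 2 = 6 overlapping sub-images.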

As shown in TABLE 1, TDCN consistently performs better than all the compared algorithms at all sampling rates on dataset Set11. In terms of PSNR, the proposed TDCN outperforms TV, MH, GSR, and D-AMP by 5.72 dB, 3.03 dB, 1.43 dB, and 6.54 dB on average, respectively.

Experimental results show that TDCN_TFA performs better than TDCN_TFA+ and TDCN_TFA++. However, our proposed TDCN outperforms the other variants in PSNR value, which proves that using DConv only is the best option among the configurations compared in TABLE 4.

In this study, we propose a tree-structured dilated convolutional network for image compressed sensing. The algorithm uses tree-structured residual blocks to fully recover detailed image features in the deep reconstruction network. Meanwhile, we use L1 loss rather than L2 loss to train the network. All experimental results demonstrate that the reconstructed images of TDCN contain more detailed structural information and have a sharper appearance. The proposed TDCN outperforms current algorithms in both the PSNR and SSIM metrics, and its running speed is comparable to that of current algorithms. In the future, we will consider applying TDCN to CS of hyperspectral remote sensing images and study an algorithm that utilizes inter-spectral correlation to obtain higher reconstruction quality.