An Adaptive Distributed Compressed Video Sensing Algorithm Based on Normalized Bhattacharyya Coefficient for Coal Mine Monitoring Video

Compared with traditional video surveillance systems, wireless video sensor systems are better suited to emergency application scenarios, such as underground coal mine disaster rescue, owing to their low power consumption and rapid deployment. Considering the limited computing power and transmission bandwidth of video sensor nodes, we propose an adaptive compression and hybrid multiple-hypothesis residual reconstruction algorithm based on the normalized Bhattacharyya coefficient (NBCAC-MHRR) to achieve high-efficiency video coding (HEVC) in underground coal mines. First, a low-complexity adaptive sampling rate allocation method is applied on the encoding side. Second, by integrating the idea of background subtraction, we combine high-quality reconstruction of the foreground with multi-hypothesis residual reconstruction of the background to improve the overall reconstruction of the video sequence. Simulation results show that the proposed algorithm achieves higher reconstruction quality and efficiency than previous methods, especially in underground coal mine application scenarios.


I. INTRODUCTION
Because of the limited transmission bandwidth of wireless communication in underground coal mines, reconstructing high-quality video sequences from a limited number of samples has become a central research problem. Traditional video compression algorithms, such as MPEG-2/4 and H.26x, require substantial additional computation on the encoding side, which imposes a heavy energy burden on wireless video sensor nodes. In addition to the design of micro-power-consumption multiple-input multiple-output devices [1], transferring the computational load from the encoding end to the decoding end is another feasible solution. Distributed compressed video sensing (DCVS) [2] combines the ''independent coding and joint decoding'' characteristic of distributed video coding [3] with the low coding complexity of compressed sensing (CS) [4], making low coding complexity and high reconstruction quality simultaneously possible. In DCVS, an important research direction is to obtain more accurate side information (SI) from the reference frame. Tramel and Fowler [5] obtained higher reconstruction quality by using iterative pixel-domain motion estimation (ME)/motion compensation (MC) and Tikhonov regularization. Chen et al. [6] proposed a hybrid hypothesis prediction method combining single-hypothesis and multiple-hypothesis (MH) prediction. Ou et al. [7] developed a two-stage MH reconstruction scheme that successively performs ME/MC iteration and residual reconstruction in the measurement domain and pixel domain, improving reconstruction quality at the expense of some reconstruction efficiency. Li et al. [8] presented a multihypothesis-based residual reconstruction scheme (MR-MHRR) that generates hypothesis blocks in the residual domain and calculates the linear prediction weights in the measurement domain.
Chen et al. [9] presented an iterative reweighted Tikhonov-regularized scheme for MH prediction reconstruction and a Bhattacharyya coefficient-based stopping criterion to avoid over-iteration. References [5]–[9] mainly use pixel-domain/measurement-domain ME/MC methods to generate hypotheses as accurately as possible, thereby improving the performance of residual reconstruction.
Besides, the HEVC standard introduces more compression and coding techniques, such as variable-size block transformation, multi-directional intra prediction, and multi-frame MC. Li et al. [10] proposed a new ME method called High Efficiency Video Coding Motion Estimation (HEVC-ME), which uses coding units of different sizes; the calculated residuals are first subjected to a Hadamard transform, and the sum of their absolute values serves as the rate-distortion function to generate more accurate SI. Zhang et al. [11] used 3D block-matching approximate message passing [12] to reconstruct key frames and then integrated a motion vector prediction method to propose a new HEVC-ME algorithm, which combines the concept of segmentation with ME to obtain more accurate SI; finally, the l1-l1 minimization model is employed for joint high-quality reconstruction of non-key frames.
In general application scenarios, reconstruction quality is a widely studied performance indicator. Tian et al. [13] presented a dictionary-learning-based reconstruction method to jointly optimize sparse representation and signal reconstruction. Oishi and Kuroki [14] used the l1-norm error instead of the l2-norm to increase robustness against outliers, and then applied an alternating direction method of multipliers to minimize the cost function. However, the situation differs slightly when applied to underground coal mines. Surveillance videos in the underground environment usually suffer from poor illumination and low resolution, and their frames are close to grayscale images. Jiang et al. [15] indicated that underground coal mine monitoring requires rapid response to various anomalies in a complex environment.
The idea of distinguishing the foreground and background in target detection algorithms [16] is also applicable to video compression. However, most background difference methods will increase the computational burden and energy consumption of video sensor nodes, which is contrary to the DCVS theoretical framework.
In terms of adaptive sampling, Stanković et al. [17] proposed a sparsity prediction method based on previous reference frames that are sampled conventionally. Image blocks are simply divided into sparse and non-sparse blocks; sparse blocks are compressively sampled while non-sparse blocks are fully sampled. Wu and Zhu [18] proposed a dynamic measurement rate allocation method that adaptively adjusts measurement rates by estimating the sparsity of each block via feedback information.
Vijayanagar et al. [19] introduced an adaptive measurement rate allocation algorithm using rate-distortion optimization and utilized a simple block classification and frame-differencing module to reduce encoding complexity. The methods proposed in [17]–[19] have limitations such as high coding complexity or reliance on feedback information, which makes them usually unsuitable for practical application scenarios.
Motivated by adaptive sampling strategies that reduce background redundancy by exploiting temporal correlation, we present an adaptive compression and hybrid multiple-hypothesis residual reconstruction algorithm based on the normalized Bhattacharyya coefficient (NBCAC-MHRR). The NBCAC-MHRR makes two contributions: 1) The normalized Bhattacharyya coefficient shifts the focus of video compression from the frame level to the block level, which facilitates adaptive compression and reconstruction. 2) A hybrid reconstruction framework that combines traditional reconstruction methods with multihypothesis-based residual reconstruction is used for the subsequent reconstruction, which improves the overall reconstruction performance.

II. RELATED WORK
In this section, we briefly introduce the basic MH prediction and residual reconstruction framework and review the research on adaptive sampling.

A. MH PREDICTION AND RESIDUAL RECONSTRUCTION
In DCVS, the block-based compressed sensing method is used for sampling video frames. The initial frames are divided into b × b blocks, and each block is independently sampled as

y^k_{i,j} = B x^k_{i,j}

where x^k_{i,j} denotes a rasterized vector of the block in row i, column j of frame k, and B is an m × b^2 orthonormal Gaussian random measurement matrix, so that the block sampling rate is S = m/b^2. How to obtain a hypothesis as close as possible to the original image block is a key issue in the MH prediction scheme. The reconstructed frames participate, as a priori information in the form of SI, in the subsequent reconstruction of non-key frames, and the hypothesis set H^k_{i,j} is generated from the obtained SI. The MH prediction is then conducted by:

x̂^k_{i,j} = H^k_{i,j} ŵ^k_{i,j}    (1)

where x̂^k_{i,j} is an approximate estimate of x^k_{i,j}, H^k_{i,j} is a b^2 × P matrix whose columns h_1, h_2, · · · , h_P are the rasterizations of hypothesis blocks generated from the SI, and ŵ^k_{i,j} is a linear prediction weight vector obtained by the following optimization:

ŵ^k_{i,j} = argmin_w || x^k_{i,j} − H^k_{i,j} w ||_2^2    (2)

Since x^k_{i,j} is unavailable at the decoder, the measurement-domain counterpart of (2) is as follows:

ŵ^k_{i,j} = argmin_w || y^k_{i,j} − B H^k_{i,j} w ||_2^2    (3)

For the ill-posedness of the solution in (3), the most common remedy is Tikhonov regularization:

ŵ^k_{i,j} = argmin_w || y^k_{i,j} − B H^k_{i,j} w ||_2^2 + λ^2 || Γ w ||_2^2    (4)

where λ is a regularization factor and Γ is the Tikhonov matrix defined as:

Γ = diag( || y^k_{i,j} − B h_1 ||_2 , · · · , || y^k_{i,j} − B h_P ||_2 )    (5)

In fact, for each block of the video frame, ŵ^k_{i,j} can be directly calculated by the following formula:

ŵ^k_{i,j} = ( (B H^k_{i,j})^T B H^k_{i,j} + λ^2 Γ^T Γ )^{−1} (B H^k_{i,j})^T y^k_{i,j}    (6)

Finally, as an alternative to directly reconstructing from y^k_{i,j}, the measurement-domain residual

r^k_{i,j} = y^k_{i,j} − B H^k_{i,j} ŵ^k_{i,j}    (7)

is reconstructed and added back to the prediction to obtain the final result:

x̂^k_{i,j} = H^k_{i,j} ŵ^k_{i,j} + reconstruct( r^k_{i,j}, B )    (8)

where reconstruct(•) represents a certain CS image reconstruction algorithm.
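As a concrete illustration, the closed-form Tikhonov-regularized weight calculation of (6) can be sketched in NumPy. The dimensions, the value of λ, and the random test data below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def mh_weights(y, B, H, lam=0.1):
    """Closed-form Tikhonov-regularized MH weights, as in eq. (6).

    y : (m,) block measurements; B : (m, b2) measurement matrix;
    H : (b2, P) hypothesis blocks as columns; lam is an illustrative factor.
    """
    BH = B @ H                                   # hypotheses in measurement domain
    # Tikhonov matrix: distance of each hypothesis from the measurements, eq. (5)
    gamma = np.diag(np.linalg.norm(y[:, None] - BH, axis=0))
    A = BH.T @ BH + lam**2 * gamma.T @ gamma
    return np.linalg.solve(A, BH.T @ y)

# toy usage: a block that lies in the span of its hypotheses
rng = np.random.default_rng(0)
b2, m, P = 64, 16, 8
B = rng.standard_normal((m, b2)) / np.sqrt(m)
H = rng.standard_normal((b2, P))
x = H @ rng.dirichlet(np.ones(P))
y = B @ x
w = mh_weights(y, B, H)
x_pred = H @ w                                   # MH prediction, eq. (1)
```

Since the regularized objective at ŵ can be no worse than at w = 0, the prediction never increases the measurement-domain residual norm above that of the measurements themselves.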
In recent years, scholars have conducted extensive research on reconstruction algorithms for compressed sensing. Wen et al. [20] proved sufficient conditions for l1-l2 minimization based on mutual coherence. Li et al. [21] presented a sufficient condition for the exact support recovery of K-row sparse matrices in the noisy case. The smoothed projected Landweber (SPL) reconstruction algorithm is widely used because of its simplicity and efficiency. On this basis, Fowler et al. [22] proposed SPL-DDWT, which applies the dual-tree DWT (DDWT) [23] as the sparsifying transform with bivariate shrinkage [24] to enforce sparsity. Compared with directly applying a reconstruction algorithm, residual reconstruction can usually achieve higher accuracy.

B. ADAPTIVE SAMPLING SCHEME
Considering the limited storage and computing power at the encoding end, CS-based video encoding methods usually compress the captured video data directly without temporarily storing the original data. Therefore, it is difficult to allocate the sampling rate without access to the original data.
The method proposed in [17] performs full sampling of key frames and then applies the DCT to each block. Sparse blocks are selected in the following manner. Let C denote a small positive constant, T the average number of non-significant DCT coefficients over all blocks, and D_{i,j} the number of DCT coefficients in block x_{i,j}. If T < |D_{i,j}| < C, the block x_{i,j} is selected as a reference for compressive sampling. This method of using DCT coefficients to determine the sparsity of image blocks requires full sampling of the references and additional transform operations, and a sampling rate allocation that only decides between partial and full sampling is relatively coarse.
In response to the problems in [17], an adaptive sampling rate allocation method based on feedback information was proposed in [18]. This method performs sparsity estimation at the decoding end and sends it back to the encoding end in the form of feedback information. The detailed description is shown in Fig. 1.
Although this method of using feedback information can reduce the calculation and storage burden of the encoding end, it does not consider the complex transmission structure and is not suitable for non-point-to-point application scenarios, such as wireless video sensor networks.
Reference [19] proposed a measurement rate computation method based on rate-distortion optimization. The method first performs a DCT on the original signal, then calculates the difference between consecutive DC measurements and binarizes the residual using a k-th-order Exp-Golomb code. Let the size of the compressed DC measurements be denoted by R_DC; the maximum size for encoding the AC measurements can then be calculated by:

R_AC = R_total − R_DC − R_map    (9)

where R_total represents the bit budget available for compression and R_map is the size of the classification map. Assume that the set of all CS blocks is defined as X = {X_1, X_2, · · · , X_P}, and L = {L_1, L_2, · · · , L_P} represents the set of all measurement rates; the problem can then be transformed into the following constrained optimization problem:

min_L D(X, L)  subject to  R(X, L) ≤ R_AC    (10)

where R(X, L) represents the total sample number of the AC measurements, and D(X, L) is the distortion obtained by calculating the sum of the mean square errors between the original CS blocks and the reconstructed blocks. Equation (10) can be rewritten as an unconstrained Lagrangian formulation:

min_{L_p} Σ_{p=1}^{P} [ D(X_p, L_p) + λ R(X_p, L_p) ]    (11)

where λ ≥ 0 is the Lagrangian multiplier, and D(X_p, L_p) and R(X_p, L_p) represent the distortion and total sample number of block X_p, respectively. This rate-distortion optimization provides an idea for adaptive sampling. However, calculating the distortion D(X, L) requires reconstructed information at the decoding end, which is difficult to obtain in practical application scenarios such as coal mine video surveillance systems.
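The per-block Lagrangian minimization of (11) can be sketched as follows. The discrete candidate rates and the distortion/rate tables are hypothetical placeholders, since in practice the distortion values would have to come from the decoder:

```python
def rd_allocate(D, R, lam):
    """For each block p, pick the rate index minimizing D + lam * R,
    i.e. the unconstrained Lagrangian of eq. (11) solved block by block.

    D[p][l], R[p][l]: distortion and sample count of block p at rate index l.
    """
    return [min(range(len(Dp)), key=lambda l: Dp[l] + lam * Rp[l])
            for Dp, Rp in zip(D, R)]

# toy tables: higher rate indices lower the distortion but cost more samples
D = [[10.0, 4.0, 1.0], [8.0, 5.0, 4.5]]
R = [[1, 2, 5], [1, 2, 5]]
print(rd_allocate(D, R, lam=1.0))    # -> [1, 1]
```

Lowering λ makes rate cheaper relative to distortion, so the chosen indices move toward the high-rate end of each table.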

III. ADAPTIVE COMPRESSION AND MULTIHYPOTHESIS-BASED RESIDUAL RECONSTRUCTION ALGORITHM
Although many researchers have proposed various methods to optimize the accuracy of SI, differences remain between MH predictions and actual frames. From a block perspective, these differences shrink for background blocks and grow for foreground blocks; in particular, they vanish for completely still background blocks.
The NBCAC-MHRR algorithm is an adaptive sampling method that uses the normalized Bhattacharyya coefficient (NBC) to reduce background redundancy and make full use of the limited samples. It uses an additional codebook to associate the encoding and decoding ends, and then applies a hybrid scheme for adaptive reconstruction. By integrating the advantages of adaptive sampling and multihypothesis-based residual reconstruction, the proposed method mitigates the problem of inaccurate SI to a certain extent.

A. NORMALIZED BHATTACHARYYA COEFFICIENT
The Bhattacharyya coefficient is widely used in similar image recognition because of its characteristic of intuitively reflecting the degree of similarity between images. In the proposed NBCAC-MHRR, the NBC is used to distinguish between foreground and background.
For each block of the current frame, the NBC is calculated as follows:

Step 1: Initialize a histogram array of size 256 for the current frame and the preceding frame, respectively, and then perform frequency distribution statistics based on pixel values.

Step 2: Use the obtained histogram arrays in the following formula:

N^k_{i,j} = 1 − Σ_{h=0}^{255} sqrt( I^k_{i,j}(h) · I^{k−1}_{i,j}(h) )

where N^k_{i,j} represents the NBC between x^k_{i,j} and x^{k−1}_{i,j}, and I_{i,j}(h) is the normalized frequency of the pixel value h in the given block.
The closer the NBC value to zero, the smaller the interframe variation of the given block. According to this characteristic, an adaptive compression method will be described in detail later.
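Under the assumption that the NBC is one minus the Bhattacharyya coefficient of two normalized 256-bin histograms (which matches the property that static blocks score zero), the computation can be sketched as:

```python
import numpy as np

def nbc(block_cur, block_prev):
    """Normalized Bhattacharyya coefficient between two grayscale blocks.

    Assumes NBC = 1 - sum(sqrt(p * q)) over normalized 256-bin histograms,
    so identical blocks give 0 and blocks with disjoint histograms give 1.
    """
    p, _ = np.histogram(block_cur, bins=256, range=(0, 256))
    q, _ = np.histogram(block_prev, bins=256, range=(0, 256))
    p = p / p.sum()
    q = q / q.sum()
    return 1.0 - np.sum(np.sqrt(p * q))

# a static block has NBC 0; a uniformly brightened block scores 1
blk = np.full((32, 32), 40, dtype=np.uint8)
print(nbc(blk, blk))          # -> 0.0 (no inter-frame variation)
print(nbc(blk, blk + 100))    # -> 1.0 (disjoint histograms)
```

Because only normalized histograms enter the formula, the measure is insensitive to block position and cheap enough for the encoding side.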

B. ADVANTAGES OF THE PROPOSED METHOD USING NBC
For grayscale images, the method of using the NBC for sampling rate allocation has the following advantages over the methods proposed in [17]–[19]:

1) STORAGE BURDEN
For each block of the preceding frame, only a histogram array of size 256 needs to be stored. Compared with storing entire key frames, the NBC-based method saves about three-fourths of the storage space, since a 32 × 32 block contains 1024 pixel values versus a 256-bin histogram.

2) COMPUTATIONAL BURDEN
Compared with the complicated transformation, quantization, and coding operations, the proposed method using NBC is relatively simple.

3) DATA TRANSMISSION
A fixed-size codebook used to record the number of adaptive samples needs to be additionally transmitted. The size of the codebook depends on the number of blocks, and usually does not bring additional transmission burden. Besides, the one-way transmission method is more suitable for practical application scenarios than the two-way transmission structure that requires feedback information from the decoder.

C. ADAPTIVE COMPRESSION AND RECONSTRUCTION BASED ON NBC
In the proposed scheme, a major constraint is that the actual total number of samples matches the target and that the number of samples per block does not exceed the block sampling limit. The sampling rate allocation can therefore be described as the following constrained optimization problem:

L_{i,j} = f( s_{1,1}, · · · , s_{i,j}, · · · , s_{P,Q} )
subject to  L_{1,1} + · · · + L_{i,j} + · · · + L_{P,Q} = L,  0 ≤ L_{i,j} ≤ V × b^2

where L_{i,j} represents the number of samples in block x_{i,j}, L represents the total number of target samples, f is a function mapping the NBC proportions to numbers of samples, V is the maximum block sampling rate used to prevent oversampling, b is the block size, and s is the NBC proportion calculated as:

s_{i,j} = N_{i,j} / ( Σ_{r=1}^{r_max} Σ_{c=1}^{c_max} N_{r,c} )

where r_max and c_max represent the number of block rows and columns, respectively.
The specific adaptive compression scheme is as follows:

Step 1: Calculate the NBC between x^k_{i,j} and x^{k−1}_{i,j}.

Step 2: Initialize the number of samples for each block according to the NBC proportion:

L^k_{i,j} = ⌊ s_{i,j} × L ⌋

Step 3: For every block whose L^k_{i,j} > V × b^2, set W = W + (L^k_{i,j} − V × b^2) and L^k_{i,j} = V × b^2, where W is the accumulated oversampling number, whose initial value is zero.
Step 4: For all blocks whose L^k_{i,j} < V × b^2, distribute the oversampling number W according to the NBC proportion. Repeat Steps 3 and 4 until W becomes zero.
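Steps 2 to 4 can be sketched as follows. Redistributing the excess one sample at a time, largest NBC first, is an illustrative simplification of the proportional redistribution described above, but it preserves the budget and cap constraints:

```python
import numpy as np

def allocate_samples(nbc_map, L, V=0.8, b=32):
    """Allocate L samples across blocks in proportion to their NBC,
    capping each block at V*b*b and redistributing the excess (Steps 2-4).
    Blocks with NBC zero (static background) receive no samples."""
    cap = int(V * b * b)
    s = nbc_map.ravel() / nbc_map.sum()          # NBC proportions
    alloc = np.minimum(np.floor(s * L).astype(int), cap)
    W = int(L - alloc.sum())                      # leftover samples
    order = np.argsort(-s)                        # largest NBC first
    while W > 0:
        progressed = False
        for idx in order:
            if W == 0:
                break
            if s[idx] > 0 and alloc[idx] < cap:
                alloc[idx] += 1
                W -= 1
                progressed = True
        if not progressed:                        # every eligible block capped
            break
    return alloc.reshape(nbc_map.shape)
```

For example, with V = 0.5 and b = 4 the per-block cap is 8 samples; a block whose NBC is zero keeps an allocation of zero while the remaining budget flows to the most active blocks.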
For a completely static background block, the number of adaptive samples is zero, and the reconstructed block at the corresponding position of the previous frame is used as the reconstruction result. Also, since blocks with large inter-frame variation are assigned a higher sampling rate, a special reconstruction framework is used for blocks whose adaptive sampling number reaches the preset upper limit. First, an orthogonal matching pursuit [25] algorithm with a discrete wavelet transform as the sparse basis performs the initial reconstruction of the image block. Then, a total variation denoising algorithm [26] is used to improve the visual quality of the reconstructed blocks. The multi-hypothesis residual reconstruction method for blocks with small inter-frame motion is described in the next section.
Through the above steps, more samples are allocated to blocks with larger inter-frame motion to obtain better reconstruction quality. Compared with traditional uniform sampling, the proposed method is particularly suitable for coal mine monitoring scenarios due to its non-uniform sampling and is more conducive to reconstructing relatively clear images at lower sampling rates.

D. MULTIHYPOTHESIS-BASED RESIDUAL RECONSTRUCTION
Methods based on multiple reference frames trade a large amount of extra computation for better reconstruction quality, which is unsuitable for underground coal mine monitoring. A compromise is to use reconstructed frames that contain more SI as references. In this section, a measurement-domain multi-hypothesis residual reconstruction scheme is used to reconstruct background blocks. The detailed process is as follows:

Step 1: Extend the reference frame. A block to be reconstructed at the edge of the image cannot generate a complete search window; a feasible remedy is to extend the reference frame. For all blocks at edge positions, a mirror projection step is conducted to obtain an extended reference.
Step 2: In the extended reference, a search window is generated at the same position as the current block. The size of the search window is generally related to the degree of motion between frames.
Step 3: In the search window, all candidate blocks are obtained by raster traversal and vectorized to form the search matrix. The best matching block is then selected by minimizing the measurement-domain residual:

match_block = argmin_{s_k} || y^k_{i,j} − B s_k ||_2

Algorithm 1 NBCAC Algorithm
Input: Current frame x^k, the histogram arrays I^k and I^{k−1}, the number of available samples L
Output: Adaptive sampling number matrix L^k
1: Calculate the NBC for each block (Step 1); initialize every L^k_{i,j} by NBC proportion (Step 2); W ← 0
2: for i = 1 to r_max, j = 1 to c_max do
3:     if L^k_{i,j} > V × b^2 then
4:         W ← W + (L^k_{i,j} − V × b^2); L^k_{i,j} ← V × b^2
5:     end if
6: end for
7: while W > 0 do
8:     distribute W over the blocks with L^k_{i,j} < V × b^2 by NBC proportion (Step 4)
9:     repeat lines 2–6 to re-accumulate any oversampling
10: end while
11: return L^k

where s_k represents the kth column of the search matrix.
Step 4: Generate a hypothesis-set generation window centered on the position corresponding to match_block; the hypothesis set H_1 is then obtained by raster traversal.

Step 5: Obtain the hypothesis set H_2 from the other reference, and combine H_1 and H_2 to form the final hypothesis set H_{i,j}.

Then, the measurement-domain linear prediction weight is calculated by (6) and the MH prediction is obtained by (1). The SPL-DDWT algorithm is used for the final residual reconstruction.
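The measurement-domain matching of Step 3 can be sketched as below; the matrix sizes and random data are illustrative, and the candidate columns stand in for the rasterized search-window blocks:

```python
import numpy as np

def best_match(y, B, search_matrix):
    """Pick the column s_k of the search matrix minimizing ||y - B s_k||_2,
    i.e. block matching carried out entirely in the measurement domain."""
    residuals = y[:, None] - B @ search_matrix    # one column per candidate
    return int(np.argmin(np.linalg.norm(residuals, axis=0)))

# toy usage: the true block is candidate column 3, so its residual is zero
rng = np.random.default_rng(1)
B = rng.standard_normal((16, 64))
cands = rng.standard_normal((64, 9))              # 9 candidate blocks
y = B @ cands[:, 3]
print(best_match(y, B, cands))                    # -> 3
```

Matching in the measurement domain avoids reconstructing candidate blocks into pixels, which keeps the search cheap when the sampling rate m/b^2 is low.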

IV. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, extensive experimental evaluations based on peak signal-to-noise ratio (PSNR) and reconstruction time are conducted to demonstrate the superiority of the proposed NBCAC-MHRR.

A. CONFIGURATION AND SETTINGS
Experimental environment: an Intel(R) Core(TM) 4 CPU (∼3.6 GHz) with 8 GB of memory; the programming software is Matlab R2019b, and the operating system is Microsoft Windows 10. The PSNR is calculated as:

PSNR = 10 log10( f_max^2 / MSE )

where f_max represents the maximum gray value, with a default of 255, and MSE is the mean square error between the original and reconstructed frames. One set of standard test video images and three sets of underground coal mine surveillance video images are used to verify the actual performance of the algorithm. The size of all test video frames is 288 × 352, the block size is 32 × 32, the DDWT decomposition level is 3, and the maximum sampling rate of all blocks is set to 0.8. All schemes use the Gaussian random matrix as the measurement matrix and SPL-DDWT [22] as the reconstruction algorithm.
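The PSNR metric used above is the standard definition rather than code from the paper, and can be computed directly:

```python
import numpy as np

def psnr(ref, rec, f_max=255.0):
    """PSNR in dB between a reference frame and its reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(f_max**2 / mse)

# a uniform error of 16 gray levels gives an MSE of 256
ref = np.zeros((288, 352), dtype=np.uint8)
rec = np.full((288, 352), 16, dtype=np.uint8)
print(round(psnr(ref, rec), 2))    # -> 24.05
```

Casting to float64 before subtraction avoids uint8 wrap-around, a common pitfall when evaluating 8-bit frames.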
Besides, we use a 99-frame underground coal mine surveillance video to verify the performance of the proposed algorithm when applied to video sequences. The size of the GOP is set to 9, where the first frame is regarded as a key frame and the rest are CS frames. The sampling rate of key frames is set to 0.6, and BCS-SPL [27] is used as the compression and reconstruction scheme.

B. PERFORMANCE OF NBCAC-MHRR
In this subsection, we verify the superiority of the proposed NBCAC-MHRR through multiple sets of comparative experiments. We compare it with other DCVS schemes: 1) MH-BCS-SPL [5], which uses intra-frame multiple hypotheses to iteratively optimize the initial reconstruction results. 2) MR-MHRR [8], which uses a multi-hypothesis residual reconstruction framework based on multiple reference frames. 3) HEVC-ME [10], which generates more accurate SI by performing motion estimation with coding units of different sizes. Fig. 5 shows the original images used for the experiments and analysis. We select test image sets from the video sequences ''Akiyo'' and ''Walk'', which include an eye-opening motion and a leg-raising motion, respectively. Fig. 6 shows the reconstructed images of the different algorithms at a sampling rate of 0.1. The overall reconstruction effects of the four schemes are relatively close, but there are differences in the areas containing motion details. In these areas, MR-MHRR and HEVC-ME lose part of the detailed information, so the reconstructed image is closer to the reference image than to the original image. The proposed algorithm retains relatively complete motion information due to its non-uniform sampling. For a completely still background area, the number of adaptive samples is zero, while for regions with large inter-frame motion, the adaptive sampling number is much larger than the fixed sampling number, compensating for the uneven quality of the SI. In this way, the problem of poor reconstruction quality of image details due to the limited total number of samples is alleviated, while the reconstruction quality of the background is not significantly reduced thanks to the relatively accurate SI obtained from the reference frame.
To further illustrate the superiority of the adaptive sampling scheme and enhance the contrast, we conduct the remaining experiments and analysis at a sampling rate of 0.2. Fig. 7 shows the distribution of the number of samples at this rate. Compared with fixed sampling, the adaptive sampling method makes full use of the limited number of samples, thereby improving the reconstruction quality of the moving area. Table 1 shows the PSNR and reconstruction time of the algorithms at different sampling rates. In terms of reconstruction quality, MH-BCS-SPL only uses intra-frame multi-hypothesis prediction without considering inter-frame correlation, resulting in poor performance at low sampling rates. In terms of reconstruction efficiency, the coding unit segmentation used by HEVC-ME consumes extra time, resulting in lower overall reconstruction efficiency. On the other hand, the amount of calculation required for inter-frame MH prediction grows with the sampling rate, so the reconstruction time of the proposed method and MR-MHRR increases as the sampling rate increases. It is worth noting that the proposed NBCAC-MHRR has obvious advantages when the sampling rate is low. As shown in Fig. 8, we compare the PSNR of the ''Walk'' sequence reconstructed by the four algorithms at a sampling rate of 0.2. MH-BCS-SPL performs the same intra-frame multi-hypothesis operation on all video frames, so its reconstruction quality is relatively stable. However, such a method, which ignores inter-frame correlation, is greatly limited by the sampling rate, resulting in lower overall reconstruction quality.
The other three algorithms all use high-quality reconstruction of key frames and multi-hypothesis residual reconstruction of non-key frames; this kind of scheme causes the reconstruction quality of non-key frames within the same GOP to fade to different degrees. HEVC-ME optimizes motion estimation by calculating the rate-distortion cost between the current frame and the reference frame. This method is largely limited by the accuracy of the reference frame, which causes its reconstruction quality to lag behind the multi-reference-based schemes. The proposed NBCAC-MHRR significantly improves the overall reconstruction quality thanks to the adaptive sampling scheme.
The relatively slow fading curve shows that this scheme can effectively reduce the influence caused by error transmission.
On the other hand, the adaptive sampling scheme allocates more samples to the foreground area, which inevitably degrades the reconstruction quality of the background area. In most cases, this decline does not affect the visual quality of the reconstructed image, but it still lowers the PSNR. As we can see in Fig. 8, a few points in the PSNR curve of MR-MHRR are better than the proposed NBCAC-MHRR, but its sharply attenuating curve shows that the scheme cannot maintain this advantage. Fig. 9 shows the reconstruction time curves of the different algorithms; compared with the other three, the proposed NBCAC-MHRR has an obvious advantage.
In general, our simulation experiments show that the proposed NBCAC-MHRR can bring a certain improvement in reconstruction quality and reconstruction efficiency. This non-uniform sampling scheme is particularly suitable for monitoring application scenarios.

V. CONCLUSION
Herein, an adaptive sampling rate allocation and hybrid multihypothesis-based residual reconstruction algorithm based on the normalized Bhattacharyya coefficient has been proposed. The proposed algorithm uses the NBC for adaptive sampling at the encoding side and then combines pixel-domain ME with measurement-domain linear prediction weight calculation to perform the residual reconstruction. In particular, a method combining OMP reconstruction and TV denoising is used to reconstruct blocks whose number of samples reaches the preset upper limit. The simulation results show that the proposed algorithm improves the reconstruction performance of video compression in complex surveillance scenarios and achieves relatively high-efficiency and high-quality reconstruction at low sampling rates. The algorithm is of great significance for realizing high-performance compression and reconstruction of surveillance video and for establishing wireless video surveillance systems in underground coal mines.

GANG HUA received the B.S. degree from Southeast University, Nanjing, and the M.S. and Ph.D. degrees from the China University of Mining and Technology, Xuzhou, China, in 2000 and 2004, respectively. He is currently a Professor with the Department of Information and Electronic Engineering, China University of Mining and Technology. His research interests include the control and supervision of mining safety, signal processing, compressed sensing, and network technology.
JIANWEI CHENG received the Ph.D. degree from West Virginia University, USA. He is currently an Associate Professor with the China University of Mining and Technology and an Australia ''Endeavour'' Research Fellow with Curtin University. He teaches and researches in the field of underground mining safety.