Intra Prediction-Based Measurement Coding Algorithm for Block-Based Compressive Sensing Images

A block-based compressive image sensor (BCIS) captures light and represents it as compressed data called measurement. It has the potential to revolutionize conventional image and video acquisition systems, which are built upon high-complexity and redundant processes. However, when the compression performance of the two systems is compared, BCIS cannot reduce the bitrate by a factor similar to that achieved by pixel-based compression algorithms. It still requires an enormous number of bits to store and transmit data. In this work, we introduce an intra prediction-based measurement coding algorithm that gives extra compression performance to the measurement. Importantly, there is a requirement that the sensing matrix for BCIS must not be derived from a non-uniform distribution, in order to control prediction accuracy and quality. Therefore, we use a structural sensing matrix made of the sequency-ordered Walsh-Hadamard matrix. Furthermore, it allows boundary pixels of adjacent blocks to be accessible through the measurement, which helps intra prediction generate its candidates accurately. The algorithm encodes the prediction error between the target measurement and multiple prediction candidates, resulting in a smaller data size. This work reduces bpp by 10.90% and simultaneously increases PSNR by 3.95 dB compared to state-of-the-art works. Moreover, we implemented the proposal on an FPGA, which gave 10 times higher throughput than software. The core consumes 50 mW and operates at 88 MHz when processing $3840\times2160$ pixels at a sampling rate of 1/4.


I. INTRODUCTION
Over the past few years, the block-based compressive image sensor (BCIS) has gained significant interest in imaging technology. It can solve analog-to-digital converter (ADC) problems in conventional image sensors, such as slow pixel readout time and power consumption.
A single-pixel camera was successfully developed by using a digital micromirror device (DMD) array [1]. It is useful for microscopy and microanalysis applications [2]. Nevertheless, it requires a long sensing time when the resolution is relatively high. Robucci et al. proposed a separable-transform BCIS [3]. It could capture images faster than the single-pixel camera. Nevertheless, it was limited in frames per second (fps).
The associate editor coordinating the review of this manuscript and approving it for publication was Li Minn Ang.
Later, Oike and El Gamal proposed a programmable BCIS with a per-column Sigma-Delta ADC to reduce sensing time and increase the frame rate up to 1920 fps [4], which overcame the problems of [1] and [3].
The authors of [5] and [6] argued that BCIS can revolutionize image and video capturing and compression schemes, where the bitrate could be varied depending on the preferred quality. However, at this stage, it is not suitable for consumer devices because they do not have enough computational resources for decoding the measurement. In the meantime, a suitable application for BCIS is the wireless surveillance system, because the measurement can be decoded at monitoring sites with unlimited resources [7], [8].
To transmit the measurement wirelessly, it must be compressed into a more compact format, resulting in lower transmission costs [9], [10]. Therefore, in this work, we propose a four-mode intra prediction-based measurement coding (IPMC) algorithm with upper, left, average, and no-prediction modes. However, there is a requirement that the sensing matrix for BCIS must not be derived from a non-uniform distribution, in order to control prediction accuracy and quality. Therefore, we use a structural sensing matrix (SSM) made of the sequency-ordered Walsh-Hadamard (SoWH) matrix. It allows boundary pixels of adjacent blocks to be accessible through the measurement, which helps intra prediction generate its candidates accurately. The algorithm encodes the prediction error between the target measurement and multiple prediction candidates, resulting in a smaller data size.
To demonstrate the applicability and versatility, we evaluate the proposal on numerous video datasets in 4K resolution using peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and bits per pixel (bpp). Moreover, we increase compression throughput by extending the proposal from software to hardware using an FPGA.
We establish notation and provide a brief background of compressive sensing theory in Subsection A and review related works on measurement coding in Subsection B. Section II presents the proposed IPMC algorithm for BCIS and its hardware architecture. Section III provides extensive simulation results and compares them with state-of-the-art works, and also reports a hardware implementation summary. Section IV provides conclusions.

A. COMPRESSIVE SENSING THEORY
Compressive sensing (CS) is built upon two fundamental conditions, sparsity and incoherence [11], [12]. Both are essential in order to apply CS. First, the signal $x$ must be sparse when expressed in a specific orthonormal transform basis (see Appendix A-A). Next, the sensing matrix $\Phi \in \mathbb{R}^{m\times n}$ can be drawn from a random distribution, in which case it is called a random sensing matrix (RSM). The number of rows $m$ must be less than $n$, and the ratio $m/n$ is known as the sampling rate (SR).
The measurement $y \in \mathbb{R}^{m\times 1}$ is obtained by projecting $x$ with $\Phi$, that is, $y = \Phi x$. Nevertheless, traditional CS is not suitable for large-scale problems because it requires a long sensing time. The work in [13] proposed a partitioning approach to traditional CS that divides an entire frame into multiple non-overlapping blocks. Instead of $n$ being equal to the frame size, $n$ is now equal to $b \times b$, where $b$ is the block size. Hence, $x$ is sampled with a smaller $\Phi$, resulting in faster projection. It can be expressed mathematically as:
$$y_i = \Phi x_i$$
where $y_i = \{\dot{y}_1, \dot{y}_2, \dot{y}_3, \ldots, \dot{y}_m\}$ is the measurement of the compressible signal $x_i = \{\dot{x}_1, \dot{x}_2, \dot{x}_3, \ldots, \dot{x}_n\}$ and $i$ is the block order through raster scanning, as shown in Figure 1.
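As a minimal sketch of the block-based sampling described above, the following Python fragment partitions a toy frame into non-overlapping blocks in raster order and projects each vectorized block with one shared sensing matrix. The helper names `split_into_blocks` and `measure`, the 8 × 8 toy frame, the 4 × 4 block size, and the random binary placeholder matrix are all assumptions made for illustration; they are not taken from the original work.

```python
import random


def split_into_blocks(frame, b):
    """Vectorize non-overlapping b x b blocks of a 2-D frame in raster order."""
    rows, cols = len(frame), len(frame[0])
    blocks = []
    for r0 in range(0, rows, b):
        for c0 in range(0, cols, b):
            blocks.append([frame[r0 + r][c0 + c] for r in range(b) for c in range(b)])
    return blocks


def measure(phi, x):
    """Compute y = Phi x for one block (phi is m x n, x has n entries)."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


rng = random.Random(0)
b = 4                      # block size (assumed for this toy example)
n = b * b                  # pixels per block
m = n // 4                 # sampling rate m / n = 1/4
# Placeholder random binary sensing matrix, shared by every block.
phi = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]

frame = [[r * 8 + c for c in range(8)] for r in range(8)]   # toy 8 x 8 frame
blocks = split_into_blocks(frame, b)
measurements = [measure(phi, x) for x in blocks]
```

Each block yields only `m` values instead of `n`, and the projection cost per block depends on the block size rather than the frame size.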
To guarantee a good reconstructed image, the sensing matrix should satisfy the restricted isometry property (see Appendix A-B). In this work, we recover $x$ from $y$ by using a classical convex optimization method called $\ell_1$-minimization (see Appendix A-C).

B. RELATED WORKS ON MEASUREMENT CODING
Up to the present time, most of the CS literature has been devoted to studying the recovery of sparse signals from a small number of measurements, and less to measurement coding algorithms. Legacy vector compression algorithms were introduced for lossless coding [24] and later extended to lossy coding [25]. Although it is possible to use these legacy approaches to encode the measurement, they require a precise design for each specific system, which is not convenient.
Scalar quantization (SQ) provides a straightforward approach to compressing the measurement. Compared with vector compression algorithms, SQ gives higher performance and versatility. Nevertheless, it has been established that SQ is highly inefficient in terms of information-theoretic rate-distortion (RD) performance [26]-[30]. Additionally, it requires an iterative recovery algorithm to predict the corrupted quantized measurement, such as quantized iterative hard thresholding (QIHT) [31], quantized compressed sampling matching pursuit (QCoSaMP), and adaptive outlier pursuit for quantized iterative hard thresholding (AOP-QIHT) [32].
Next, differential pulse-code modulation (DPCM) was introduced to reduce bitrate [33]. DPCM uses a single prediction candidate to predict the target measurement. However, its disadvantage is that the single prediction candidate may contain information irrelevant to the target measurement, resulting in unstable bitrate reduction.
Afterward, spatially directional predictive coding (SDPC) was introduced in [34]. This work was implemented based on DPCM. It gave higher compression performance than SQ and DPCM while improving image quality. However, this work used RSM as the sensing matrix, resulting in unstable quality and unstable bitrate when coding the same image. Importantly, this kind of sensing matrix cannot be embedded in a hardware device that can handle only binary signal sources.
Later, intra prediction-based measurement coding with a modified RSM was introduced in [35]. This work was inspired by the intra prediction concept in conventional pixel-based compression algorithms, which use boundary pixels of adjacent blocks to predict the target block. Imitating the conventional approach, the authors modified the sensing patterns of the RSM to obtain boundary pixels of adjacent blocks, yielding a so-called hybrid sensing matrix (HSM). They used that boundary pixel information to generate intra prediction candidates. This work reduced the bitrate below that of SQ, DPCM, and SDPC. Nevertheless, it introduced sampling artifacts into the image due to the modification of the sensing matrix.

FIGURE 2. The overall architecture of BCIS and the IPMC algorithm, where the measurement of each block is denoted by $y_1$, $y_2$, $y_3$, and $y_m$, respectively. Each element of a measurement, denoted by $\dot{y}_1$, $\dot{y}_2$, $\dot{y}_3$, and $\dot{y}_m$, is sampled by $\Phi_{SoWH_1}$, $\Phi_{SoWH_2}$, $\Phi_{SoWH_3}$, and $\Phi_{SoWH_m}$, respectively. To demonstrate the data flow of the coding process, we illustrate an example using $y_1$, which refers to the first block.
In our previous work [36], we adopted an SSM made of the natural-ordered Hadamard (NoH) matrix to obtain boundary information. We proposed four-mode intra prediction comprising upper, left, average, and no prediction. This work significantly reduced bitrate and improved image quality compared to other works in the literature.

II. PROPOSED INTRA PREDICTION BASED MEASUREMENT CODING FOR BCIS
In this work, we improve coding performance and image quality over our previous work in [36]. The overall architecture, including BCIS and the IPMC algorithm, is shown in Figure 2. Three primary signals control BCIS: the column selector, the row selector, and the pixel selector, where each pixel is selected according to the sensing matrix.
Subsequently, we adopt SoWH as the SSM. It can gather information more efficiently than RSM, HSM, and the SSM made of NoH due to its higher orderliness, resulting in better image quality. Furthermore, it allows the boundary pixel information of neighboring blocks to be accessible through the measurement without modifying the sensing matrix.
After the measurement is obtained, it is transferred to the IPMC algorithm for compression. We use boundary pixels of adjacent blocks to deliver four-mode intra prediction. We encode the target measurement by finding the minimum distortion among multiple prediction candidates, resulting in a smaller data size.
Next, we apply quantization to reduce the number of distinct symbols for Huffman coding. In addition, we include inverse quantization inside the transmitter to estimate the prediction candidates as they appear at the receiver after quantization loss. Subsequently, we use those estimated prediction candidates to predict the next target measurement. Otherwise, the decoder at the receiver acts as an error accumulator, in which a single corrupted measurement can propagate recovery error to the whole image.
Moreover, to increase coding performance and throughput, and to realize the IPMC algorithm in real-world applications, we implement the proposal at the hardware level and evaluate it on an FPGA.

A. SEQUENCY-ORDERED WALSH-HADAMARD SENSING MATRIX
To use BCIS to capture light, there is an implementation constraint that the entries of the sensing matrix must be in $\{0, 1\}$, because the pixel selector can handle only digital signals (i.e., low (0) and high (1)). Let $\Phi_{NoH}$ be a matrix of order $n$. It is said to be an NoH matrix if its transpose $\Phi_{NoH}^{T}$ is closely related to its inverse, which can be expressed as:
$$\Phi_{NoH}\,\Phi_{NoH}^{T} = n I_{n\times n}$$
where $I_{n\times n}$ is the identity matrix and $\Phi_{NoH}^{T}$ is the transpose of the matrix. Applying Sylvester's construction to $\Phi_{NoH}$ results in the Walsh-Hadamard matrix, denoted by $\Phi_{WH}$, as follows:
$$\Phi_{WH_{2^k}} = \Phi_{WH_2} \otimes \Phi_{WH_{2^{k-1}}}$$
for $2 \le k \in \mathbb{N}$, where $\otimes$ denotes the Kronecker product. Subsequently, we apply bit-reversal and Gray-code permutation, resulting in the sequency order of $\Phi_{WH}$, denoted by $\Phi_{SoWH}$. This sensing matrix satisfies the RIP with a probability …

In general, boundary pixels of adjacent blocks carry information that is closely related to the target block. Hence, we use that boundary information to deliver four-mode intra prediction comprising upper, left, average, and no prediction.
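The SoWH construction in this subsection can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it builds the natural-ordered matrix with Sylvester's recursion and then, instead of applying bit-reversal and Gray-code permutation explicitly, sorts the rows by their number of sign changes, which gives the same sequency ordering because each row of a Sylvester Hadamard matrix has a distinct sign-change count. The mapping of -1 to 0 in `to_binary` is an assumption about how the $\{0, 1\}$ constraint is met, and all function names are hypothetical.

```python
def sylvester_hadamard(n):
    """Natural-ordered Hadamard matrix of order n (a power of two), entries +1/-1."""
    h = [[1]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sign_changes(row):
    """Number of sign changes along a row (its sequency)."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_ordered(n):
    """Rows of the Sylvester Hadamard matrix sorted by increasing sequency."""
    return sorted(sylvester_hadamard(n), key=sign_changes)


def to_binary(matrix):
    """Map +1 -> 1 and -1 -> 0 (assumed mapping to satisfy the {0, 1} constraint)."""
    return [[(v + 1) // 2 for v in row] for row in matrix]


sowh4 = to_binary(sequency_ordered(4))
# sowh4 == [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
```

In the binary form, the first row reads every pixel and the second row reads only the first half of the vectorized block, which is what makes the boundary sums in the following paragraphs recoverable by subtraction.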
First, for prediction parameter preparation, each row of $\Phi_{SoWH}$ corresponds to a different sensing pattern, which is used to obtain one element of $y$. These patterns offer several features that allow boundary information to be accessible from the measurement. Hence, we can trace back which pixels in the block were read. In this work, there are three significant sensing patterns, as shown in Figure 3, where white and black squares indicate pixels that are read and skipped, respectively.
For instance, multiplying $x$ by $\Phi_{SoWH_1}$ gives the first element $\dot{y}_1$, which is the summation of all $4 \times 4$ pixels. The second element $\dot{y}_2$ is obtained by multiplying $x$ by $\Phi_{SoWH_2}$ and is the summation of the upper-half $2 \times 4$ pixels. The third pattern gives $\dot{y}_{32}$, obtained by multiplying $x$ by $\Phi_{SoWH_{32}}$, which is the summation of the left-half $4 \times 2$ pixels.
However, the parameters necessary for generating prediction candidates are located in the black squares, on the opposite sides of $\Phi_{SoWH_2}$ and $\Phi_{SoWH_{32}}$. Since $\dot{y}_1$ is the summation of all pixels in the block, the data in the black squares can be obtained by subtracting $\dot{y}_2$ from $\dot{y}_1$ and $\dot{y}_{32}$ from $\dot{y}_1$, resulting in the sum of the bottom-half $2 \times 4$ pixels and the sum of the right-half $4 \times 2$ pixels, respectively.
To make the concept clearer, we present the subtraction process in terms of sensing pattern subtraction, as shown in Figure 4. We note that this method delivers the same results as modifying the sensing matrix to obtain boundary pixels of adjacent blocks. Moreover, unlike the work in [35], the image quality is not degraded. Further, we show the groups of pixels obtained after subtraction over an image in Figure 5.
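The subtraction trick can be checked numerically. The sketch below assumes a $4 \times 4$ block vectorized row by row and a binary SoWH matrix obtained by mapping -1 to 0 (both assumptions for illustration; the helper `sowh_binary` and the pixel values are made up). For a $b \times b$ block, the left-half pattern has $2b - 1$ sign changes, so it sits at 1-based sequency index $2b$: index 8 for this toy block, while index 32 would correspond to $b = 16$.

```python
def sowh_binary(n):
    """Binary sequency-ordered Walsh-Hadamard matrix of order n (power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [r + r for r in h] + [r + [-v for v in r] for r in h]
    # Sort rows by number of sign changes to obtain sequency order.
    h.sort(key=lambda r: sum(a != b for a, b in zip(r, r[1:])))
    return [[(v + 1) // 2 for v in r] for r in h]


b = 4
phi = sowh_binary(b * b)

# Toy 4 x 4 block of pixel values (made up), vectorized row by row.
block = [[10, 11, 12, 13],
         [20, 21, 22, 23],
         [30, 31, 32, 33],
         [40, 41, 42, 43]]
x = [v for row in block for v in row]
y = [sum(p * v for p, v in zip(row, x)) for row in phi]

total = y[0]              # all pixels
upper = y[1]              # upper-half rows
left = y[2 * b - 1]       # left-half columns (0-based index 2b - 1)

bottom = total - upper    # sum of the bottom-half rows
right = total - left      # sum of the right-half columns
```

The values `bottom` and `right` are exactly the sums over the pixels skipped by the second and left-half patterns, so no change to the sensing matrix is needed to reach them.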
At this stage, each parameter represents multiple pixels. It is necessary to average them by dividing by the number of active pixels (i.e., in the case of $2 \times 4$ pixels and $4 \times 2$ pixels, the number of active pixels equals 8). Afterward, we multiply the averaged parameters by $\Phi_{SoWH}$ to generate a vector known as an intra prediction candidate. The candidate generation procedure yields one candidate per mode: $y_u$ for the up mode, $y_l$ for the left mode, and $y_{avg}$ for the average mode. The final prediction candidate $y_p$ is estimated by finding the minimum error between the target measurement $y$ and the prediction candidates $y_c \in \{y_u, y_l, y_{avg}\}$. In addition, in case no prediction candidate is selected from $y_c$, $y_p$ is equal to zero, which means no prediction. The residual measurement $y_r$ is calculated by subtracting $y_p$ from $y$:
$$y_r = y - y_p$$
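The following Python sketch illustrates one way the candidate generation and mode selection could look. Several details are assumptions rather than facts from the text: a candidate is formed by projecting a constant block filled with the averaged parameter, the average mode is taken as the element-wise mean of the up and left candidates, the error is measured as a sum of absolute differences, and the zero vector competes as the no-prediction mode. The toy $2 \times 2$ block, the boundary sums, and all function names are hypothetical.

```python
def candidate_from_parameter(phi, boundary_sum, active_pixels):
    """Project a constant block filled with the averaged boundary parameter."""
    average = boundary_sum / active_pixels
    return [average * sum(row) for row in phi]


def select_prediction(y, candidates):
    """Return (mode, y_p) with the smallest sum of absolute differences to y.

    The zero vector ("none") is used when no candidate beats it.
    """
    best_mode, best = "none", [0.0] * len(y)
    best_cost = sum(abs(v) for v in y)
    for mode, cand in candidates.items():
        cost = sum(abs(a - c) for a, c in zip(y, cand))
        if cost < best_cost:
            best_mode, best, best_cost = mode, cand, cost
    return best_mode, best


# Binary sequency-ordered matrix for a toy 2 x 2 block (n = 4).
phi = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0]]
x = [50, 52, 51, 53]                                  # target block (made up)
y = [sum(p * v for p, v in zip(row, x)) for row in phi]

y_u = candidate_from_parameter(phi, boundary_sum=98, active_pixels=2)
y_l = candidate_from_parameter(phi, boundary_sum=104, active_pixels=2)
y_avg = [(u + l) / 2 for u, l in zip(y_u, y_l)]

mode, y_p = select_prediction(y, {"up": y_u, "left": y_l, "average": y_avg})
y_r = [a - p for a, p in zip(y, y_p)]                 # residual measurement
```

With these toy numbers the average candidate is closest to the target, and the residual has much smaller magnitudes than the measurement itself, which is what makes the subsequent quantization and entropy coding cheaper.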

C. SCALAR QUANTIZATION
We further reduce the bitrate and the number of distinct symbols of $y_r$ using SQ. It maps the residual measurement $y_r$ into a finite sequence of codewords with a quantization step equal to $\Delta$. It can be expressed by:
$$y_q = \frac{y_r}{\Delta} \quad (10)$$
where $Q_b$ is the quantization bit depth and the quantized measurement is denoted by $y_q$. Subsequently, inverse quantization maps $y_q$ into $y_{iq}$, which is an approximation of $y_r$. It can be expressed by:
$$y_{iq} = y_q \Delta$$
In this work, we fixed $Q_b$ at 4 bits, which is sufficient to reduce the bitrate and the number of distinct symbols. Furthermore, we include inverse quantization inside the transmitter to estimate the prediction candidates as they appear at the receiver after quantization loss. Subsequently, we use those estimated prediction candidates to predict the next target measurement. If both sides do not have the same prediction candidate information, the decoder acts as an error accumulator, in which a single corrupted measurement can ruin the whole image. We note that $y_q$ needs to be transmitted to the receiver along with 2 bits of side information for the prediction mode and the $\Delta$ of each block.
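A small Python sketch of the quantization and inverse quantization pair is given here for illustration. The rounding rule (round half up), the symmetric clamping of codewords to the range representable with $Q_b$ bits, and the way $\Delta$ is passed in as a plain argument are assumptions of this sketch; the text does not specify them, and the function names are hypothetical.

```python
import math


def quantize(y_r, delta, q_bits=4):
    """Map residuals to integer codewords with step delta.

    Assumptions: round-half-up rounding and clamping to the signed range
    that fits in q_bits bits.
    """
    lo = -(2 ** (q_bits - 1))
    hi = 2 ** (q_bits - 1) - 1
    return [min(hi, max(lo, math.floor(v / delta + 0.5))) for v in y_r]


def dequantize(y_q, delta):
    """Inverse quantization: approximate residuals from codewords."""
    return [q * delta for q in y_q]


y_r = [9, -3, 13, 1]          # toy residual measurement (made up)
delta = 4                     # toy quantization step
y_q = quantize(y_r, delta)
y_iq = dequantize(y_q, delta)
errors = [abs(a - b) for a, b in zip(y_r, y_iq)]
```

As long as a residual is not clamped, the reconstruction error of each element stays within half a quantization step, which is the loss the transmitter has to mirror when it prepares prediction parameters.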

D. HARDWARE IMPLEMENTATION OF PROPOSED IPMC ALGORITHM FOR BCIS
In this section, we extend the IPMC algorithm from software to hardware to increase throughput. The hardware architecture is shown in Figure 6. It can be placed next to BCIS. Hence, the target measurement $y$ can be encoded and transmitted immediately. The hardware procedure for obtaining and coding the measurement can be described by the following steps:
Step 1: Send the block coordinate, denoted by rows_addr and columns_addr, to BCIS.
Step 2: Obtain $y$ from BCIS according to rows_addr and columns_addr.
Step 3: Fetch prediction parameters from registers. We note that when the block coordinate is rows_addr = 1 and columns_addr = 1, $y$ goes straight to quantization without prediction. This is a special case in the IPMC algorithm because there are no prediction parameters available for the first block.
Step 4: Average the prediction parameters and multiply them by $\Phi_{SoWH}$ to generate $y_u$ and $y_l$.
Subsequently, we report the RD-curve performance for various settings of $Q_b$, as shown in Figure 10.
Step 6: Subtract $y_p$ from $y$, resulting in $y_r$.
Step 7: Apply quantization to $y_r$, resulting in $y_q$.
Nevertheless, $y$ is a vector. Without optimization, it requires at least $m - 1$ clock cycles to encode $y$. Therefore, we optimize the vector summation module using a tree-like pipeline technique, as shown in Figure 7a, in contrast to the non-pipelined version in Figure 7b, so that the number of clock cycles is shortened from $m - 1$ to $\log_2(m)$. As a consequence, it requires slightly more resources than the non-pipelined version.
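The cycle-count argument can be illustrated with a software model of an adder tree. This is only a behavioural sketch in Python, not the hardware description used in the work: each loop iteration stands for one tree level in which all pairwise additions happen in parallel, so the number of levels plays the role of the clock-cycle count. The function name `tree_sum` and the zero padding for odd-length levels are assumptions.

```python
def tree_sum(values):
    """Sum values by pairwise reduction; return (total, number_of_levels)."""
    level = list(values)
    levels = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)          # pad so every element has a partner
        # All additions in this comprehension would run in parallel in hardware.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels += 1
    return level[0], levels


m = 64
total, levels = tree_sum(range(m))
sequential_additions = m - 1         # one adder reused m - 1 times
```

For `m = 64` the tree needs 6 levels instead of 63 sequential additions, at the cost of instantiating more adders at once, which matches the resource trade-off noted above.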
Subsequently, it is necessary to prepare the prediction parameters for the next target measurement. The procedure of prediction parameter preparation can be described by the following steps:
Step 1. Apply inverse quantization to $y_q$, resulting in $y_{iq}$.
Step 2. Decode $y_{iq}$ by adding $y_p$, resulting in $\hat{y}$.
Step 5. Store the results in registers for the next prediction.
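To see why the encoder must predict from the reconstructed value $\hat{y}$ rather than from the original $y$, consider the simplified scalar model below. It is a DPCM-style toy, not the vector IPMC pipeline: each value is predicted from the previous one, the residual is quantized with round-half-up (an assumption), and the two variants differ only in which reference the encoder uses. The function names and sample values are made up for illustration.

```python
import math


def quantize(value, delta):
    """Round-half-up scalar quantizer (assumed rounding rule)."""
    return math.floor(value / delta + 0.5)


def encode_decode(values, delta, closed_loop):
    """Return what the decoder reconstructs for a predictive coder.

    closed_loop=True : encoder predicts from its own reconstruction.
    closed_loop=False: encoder predicts from the original previous value.
    """
    enc_ref = 0                      # reference used by the encoder
    dec_ref = 0                      # reference available to the decoder
    decoded = []
    for v in values:
        q = quantize(v - enc_ref, delta)
        dec_ref = dec_ref + q * delta        # decoder: inverse quantize + add
        decoded.append(dec_ref)
        enc_ref = dec_ref if closed_loop else v
    return decoded


values = [10, 13, 16, 19, 22]
delta = 4
closed = encode_decode(values, delta, closed_loop=True)
opened = encode_decode(values, delta, closed_loop=False)
closed_err = [abs(a - b) for a, b in zip(values, closed)]
open_err = [abs(a - b) for a, b in zip(values, opened)]
```

In the closed-loop variant the error stays bounded by half a step, whereas in the open-loop variant it grows from block to block, which is the error-accumulation behaviour the parameter preparation steps are designed to avoid.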

III. EXPERIMENTAL RESULTS
In this section, we report the performance of the IPMC algorithm using PSNR, SSIM, and bpp. The simulation results were produced in MATLAB using $\ell_1$-minimization via the primal-dual interior-point method [18]. We used multiple 4K datasets [37], consisting of Beauty, ReadySetGo, Bosphorus, and HoneyBee. Lastly, we report hardware implementation results in terms of device specification and throughput.
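For reference, two of the metrics used in this section can be computed as in the following Python sketch. It assumes 8-bit pixels (peak value 255) given as flat lists and a known total bit count for the coded frame; the function names and sample values are hypothetical, and SSIM is omitted because it needs windowed statistics.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf              # identical images
    return 10.0 * math.log10(peak * peak / mse)


def bits_per_pixel(total_bits, width, height):
    """Average number of coded bits spent per pixel of the frame."""
    return total_bits / (width * height)


reference = [100, 120, 140]
reconstructed = [101, 119, 141]      # every pixel off by one (made up)
quality_db = psnr(reference, reconstructed)
rate = bits_per_pixel(total_bits=8_294_400, width=3840, height=2160)
```

A lower bpp at the same or higher PSNR indicates better compression performance, which is how the comparisons in this section are read.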
same SR, reflecting a higher ability to gather compressible signals.

2) OVERALL PERFORMANCE COMPARISON WITH STATE-OF-THE-ART WORKS
As shown in Table 1, we first compared our proposal with works that use SQ to code the measurement, such as Bernoulli + SQ [26], NoH + SQ [27], and Gaussian + SQ [28]. Our proposal outperformed them with higher PSNR and SSIM and lower bpp. NoH + SQ gave a remarkably large bpp reduction because its measurement is highly uncorrelated, so equation (9) returns a large $\Delta$. Thus, SQ causes severe image degradation, as can be seen from the artifacts. Because the performance of SQ is uncontrollable and varies depending on the sensing matrix, we consider SQ an inefficient coding method, which corresponds to the opinion stated in the most recent literature. Next, we compared our proposal with state-of-the-art works that utilize measurement coding and SQ, such as DPCM + SQ [33], SDPC + SQ [34], and Intra Pred. + HSM [35]. This work significantly outperformed them, reducing bpp by 10.90% and increasing PSNR and SSIM by 3.95 dB and 10.17%, respectively.
These results support our view that compression performance can be increased by designing a good pair of measurement coding algorithm and sensing matrix. The measurement sampled by the SSM has higher data structure consistency, which enables the coding algorithm to perform better, resulting in higher compression performance. Hence, the most important elements of the measurement are encoded and are not ruined by quantization, resulting in an improvement in PSNR and SSIM.
Lastly, we provide a visual quality comparison of reconstructed images in Figure 9. This work provides better image quality than state-of-the-art works, without compression artifacts at object edges.
It can be seen that this work gives remarkable coding performance, where the data are encoded and not ruined by quantization even at a low $Q_b$.

3) RESULTS OF HARDWARE IMPLEMENTATION
We report the hardware specification of the IPMC algorithm in Table 2. The full block diagram and schematic in the Altera Quartus tools are shown in Figure 11. The IPMC algorithm uses 5,948 of 41,910 logic elements and 2,138 registers. The throughput of this algorithm is 5 Gpixels/s at an operating frequency of 88 MHz. This architecture consumes 50 mW for encoding 3840 × 2160 pixels, where the SR is fixed at 1/4. Lastly, we provide a top-level timing diagram of BCIS and the IPMC algorithm in Figure 12.
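The reported figures can be put in perspective with some back-of-the-envelope arithmetic. The Python fragment below only combines numbers stated above (5 Gpixels/s, 88 MHz, 3840 × 2160 pixels, SR of 1/4); the derived quantities are not results from the paper, and they assume that "Gpixels" means $10^9$ pixels and that the throughput is sustained.

```python
frame_pixels = 3840 * 2160                  # pixels in one 4K frame
throughput = 5e9                            # pixels per second (reported)
clock_hz = 88e6                             # operating frequency (reported)
sampling_rate = 1 / 4                       # SR used in the experiments

frames_per_second = throughput / frame_pixels
pixels_per_clock = throughput / clock_hz
measurements_per_frame = int(frame_pixels * sampling_rate)
```

Under these assumptions, the encoder would handle several hundred 4K frames per second, processing a few dozen pixels' worth of data in each clock cycle.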

IV. CONCLUSION
BCIS is an innovative approach that turns the conventional image and video system upside down. In this work, we narrowed the gap in compression performance between BCIS and conventional systems. Our proposal capitalizes on a good pair of sensing matrix and the IPMC algorithm, which gives extra compression performance to the traditional CS paradigm. Further, our proposal gave the highest compression performance compared to state-of-the-art works, which gradually closes the gap toward replacing image and video acquisition systems with BCIS and a novel coding algorithm. Moreover, we implemented the proposal on an FPGA, where the IPMC algorithm is simpler and less complex than conventional algorithms.

APPENDIX A COMPRESSED SENSING THEORY

A. SPARSE SIGNAL CHARACTERISTIC
This characteristic implies that only a few coefficients contain the majority of the signal information. It can be expressed by:
$$\theta = \Psi x$$
where $x \in \mathbb{R}^{n\times 1}$ is the vectorized signal and $\theta \in \mathbb{R}^{n\times 1}$ is the sparse vector that contains the projection of $x$ in the transform basis $\Psi \in \mathbb{R}^{n\times n}$.

B. RESTRICTED ISOMETRY PROPERTY
To guarantee a good reconstructed image, reconstruction hinges on a characterization of the sensing matrix called the restricted isometry property [14]. The lower bound on the dimension $m$ for a non-uniformly distributed sensing matrix can be determined in terms of the restricted isometry constant $\delta_s \in (0, 1)$.

C. SPARSE SIGNAL RECOVERY VIA CONVEX OPTIMIZATION
CS recovers the signal by solving an ill-posed linear inverse problem via convex optimization. It states that if the signal $x$ is compressible by a sparse transform $\Psi$, and $\Psi$ is highly incoherent with $\Phi$, the signal can be accurately recovered from the $m$-dimensional incomplete measurement in the coefficient domain as:
$$\hat{x} = \Psi^{-1}\theta$$
where $\Psi^{-1}$ is the inverse transform [15]-[17]. In this work, we recover $x$ from $y$ by using a classical convex optimization method called $\ell_1$-minimization [18] as follows:
$$\hat{x} = \arg\min_{x} \frac{1}{2}\|\Phi x - y\|_2^2 + \lambda \|x\|_1 \quad (15)$$
Further, greedy-based recovery algorithms, including orthogonal matching pursuit (OMP) [19] and its extensions stagewise OMP [20], A*OMP [21], CoSaMP [22], and TwIST2 [23], have been proposed for CS. Compared with $\ell_1$-minimization, greedy-based methods are generally faster because they take advantage of the sparsity structure by minimizing a sequence of subspace problems. However, they require more prior knowledge and a sufficient number of measurements to produce a good reconstructed image.
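To give a feel for how an objective of the form in (15) can be minimized, the sketch below uses the iterative shrinkage-thresholding algorithm (ISTA). This is a deliberately simple substitute chosen for illustration; it is not the primal-dual interior-point solver used for the reported results. The toy sensing matrix, the measurement, the value of $\lambda$, and all function names are assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def matvec(a, x):
    return [sum(p * q for p, q in zip(row, x)) for row in a]


def objective(phi, x, y, lam):
    """0.5 * ||Phi x - y||_2^2 + lam * ||x||_1"""
    r = [a - b for a, b in zip(matvec(phi, x), y)]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)


def ista(phi, y, lam, iterations=500):
    m, n = len(phi), len(phi[0])
    # Step 1 / ||Phi||_F^2 is a safe (conservative) step size.
    step = 1.0 / sum(v * v for row in phi for v in row)
    x = [0.0] * n
    for _ in range(iterations):
        r = [a - b for a, b in zip(matvec(phi, x), y)]
        grad = [sum(phi[i][j] * r[i] for i in range(m)) for j in range(n)]
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x


phi = [[1, 0, 1, 0],
       [0, 1, 1, 1]]                 # toy 2 x 4 sensing matrix (m < n)
y = [3.0, 3.0]                       # measurement of the sparse signal [0, 0, 3, 0]
lam = 0.1
x_hat = ista(phi, y, lam)
```

Even though the system is underdetermined, the $\ell_1$ penalty steers the solution toward the sparse signal, concentrating the energy on the third coefficient in this toy case.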