A Lossless Electrocardiogram Compression System Based on Dual-Mode Prediction and Error Modeling

Long-term electrocardiogram (ECG) monitoring requires high-ratio lossless compression techniques to reduce data transmission energy and data storage capacity. In this paper, we have proposed a high-ratio ECG compression system with low computational complexity. Firstly, as the morphologies of the ECG change over time, we divide the signal of each heartbeat cycle into two regions. To achieve high prediction accuracy, a 1st order linear predictor and a combination of the template predictor and 3rd order linear predictor are applied in the two regions respectively. Secondly, we introduce a context-based error modeling module to the system, which cancels the statistical bias of the prediction algorithm and further improves the prediction accuracy. Thirdly, we modify the Golomb-Rice encoding algorithm to adaptively encode the prediction errors, while preserving a code space for packaging the information that is necessary for prediction. We evaluate the proposed system by using the MIT-BIH Arrhythmia Database (ARRDB). The experimental results show that with memory requirements as low as 444 to 14556 total variables this system achieves a compression ratio (CR) from 2.975 to 3.040, suggesting that it is highly applicable to both the low-power design and the cloud.


I. INTRODUCTION
Electrocardiogram (ECG) indicates the electrical activity of the heart and it is the most commonly used method to monitor the heartbeat. With wireless and wearable healthcare devices, long-term ECG data can be recorded continuously for monitoring and diagnosis. However, collecting such a large amount of data requires excessive transmission energy or storage capacity, which significantly increases the cost of long-term ECG applications [1], [2]. Therefore, an effective and efficient data compression method for ECG signals is required.
ECG compression methods include lossless compression and lossy compression. In lossless systems, the reconstructed ECG can be exactly the same as the original ECG, which is generally more useful for cardiac disease diagnosis. Because the lossy compression discards some morphological The associate editor coordinating the review of this manuscript and approving it for publication was Charith Abhayaratne . information, and it has not been approved by medical bodies in most countries and cannot be used in commercial devices [3]. Therefore, this paper focuses on lossless ECG compression.
A good lossless ECG compression algorithm must achieve a high CR with low computational complexity and a low number of variables, which are closely related to the hardware requirement of the algorithm. Especially for lowpower application-specific integrated circuit (ASIC) designs or embedded system designs, low computational complexity and a low number of variables can decrease the chip area, thus reducing the chip cost and power consumption. Conventionally, the lossless ECG compression technique first predicts or transforms the signal to reduce the signal entropy, then entropy encodes the signal to remove redundant bits [4]. To date, various lossless ECG compression methods have been proposed, including algorithms such as data pulse code modulation (DPCM) + Golomb-Rice encoding [5], adaptive linear predictor + modified Huffman encoding [6], [7], adaptive VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ linear predictor + variable-length encoding [8], short term linear predictor + fixed-length encoding [3], and adaptive linear predictor + Golomb-Rice encoding [9]. These algorithms are relatively simple, but they do not utilize sufficient ECG morphologies hence causing poor CRs. Some other methods achieve high CRs. Among which, Miaou and Chao [10] applied the combination of distortionconstrained codebook replenishment (DCCR), set partitioning in hierarchical tree (SPIHT), and bit-plane coding (BPC) algorithms, but this method contains recursive operation and requires many variables. Zhou [11] applied k-means cluster predictor + Huffman encoding, and Tseng et al. [12] applied Takagi-Sugeno fuzzy neural network + arithmetic encoding, but they require multiply-accumulate operation many times when predicting each point. Tsai and Tsai [13] applied adaptive linear predictor + Golomb-Rice encoding, and Rzepka [14] applied selective and multichannel linear predictor + asymmetric numeral systems encoding, but they all require integer division operation hence increasing the computational complexity. Therefore, the methods of [10]- [14] are difficult to be implemented on wearable ECG monitoring devices.
This study proposes a lossless ECG compression algorithm based on dual-mode prediction with context error modeling. The features of the proposed method are: 1) Given the feature that the morphologies of the ECG change over time, we divide the signal of each heartbeat cycle into two regions. A 1 st order linear predictor and a combination of the template predictor and 3 rd order linear predictor are applied respectively to achieve high prediction accuracy. 2) A context-based error modeling module is added after the predictors to cancel the statistical bias of the prediction errors and further improve the prediction accuracy. 3) A modified Golomb-Rice encoding algorithm is designed for entropy encoding. This algorithm adaptively encodes the prediction error according to the current error amplitudes to improve the CR. Besides, it preserves a code space for packaging the information which is necessary for prediction. 4) By adjusting the size of the templates and contexts, this system can balance the number of variables and compression performance, in this way enabling compression applications from the edge to the cloud.
The remainder of this paper is organized as follows. Section II briefly introduces the database that applies to this research. Section III presents the proposed lossless ECG compression system. The proposed method is evaluated in Section IV. Section V discusses the experimental results. Finally, section VI draws the conclusion.

II. DATABASE
This paper uses the MIT-BIH Arrhythmia Database (ARRDB) to evaluate the proposed lossless ECG compression system. The ARRDB contains 48 excerpts of twochannel ambulatory ECG recordings, which are obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. 23 recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample. The recordings were digitized at 360 samples per second per channel with an 11-bit resolution over a 10 mV range, and there are in total 650,000 points in each channel. Two or more cardiologists independently annotated each record; disagreements were resolved to obtain the computer-readable reference annotations for each beat (approximately 110,000 annotations in all) included in the database [15], [16]. Figure 1 shows the block diagram of the proposed method. In the compression part, this method split each sampled ECG point into the prediction value and error value, i.e.

III. METHOD
where x [n] represents the ECG samples at the n th point,x [n] represents the prediction value which is derived from the past samples by using the dual-mode prediction and contextbased error modeling methods, and [n] represents the error value. Then, we package the error codes with the information which is necessary for prediction to form the final bitstream.
In the decompression part, we use the same prediction and error modeling methods to get the same prediction values. By adding the error values with the prediction values, the lossless reconstruction ECG can be obtained.

A. DUAL-MODE PREDICTION
The normal ECG comprises a sequence of P, Q, R, S, and T wave. Among them, the Q, R, S waves compose the QRScomplex which is the main spike seen on the ECG, while the waveform out of the QRS-complex is flatter as shown in Fig.  2.
According to the different morphologies of the ECG in different periods, we divide the ECG signal into the QRS regions and the non-QRS regions first for separate prediction. The QRS duration of a normal ECG is between 60-109ms. However, when there are some abnormalities in the ECG, such as the premature ventricular contractions (PVCs), the QRS duration may be ≥120ms [17]. To obtain relatively accurate division results for both normal and abnormal ECG, we set the length of the QRS region to 100ms and the R-peak position as the midpoint of the QRS region. So the number of sampling points in the QRS region W qrs is calculated according to where f s represents the ECG sampling rate. We apply the Rpeak detection algorithm proposed by Ieong et al. [2] to locate  the QRS region since this algorithm achieves a high R-peak detection accuracy with low computational complexity. For the flat non-QRS region, the difference between the two adjacent points is small. Therefore, we predict the current signal by using the previous sampled point, which is called 1 st order linear prediction, i.e.
wherex [n] is the prediction value of the n th point, and x [n − 1] is the sampled value of the (n − 1) th point. Koski [4] applied one template to predict the points belonging to the QRS complex since the morphologies such as the directions and the slope values of continuous heartbeats are similar. However, in general, more templates lead to higher prediction performance, because 1) Some ECG signals contain multiple QRS morphologies, as shown in Fig. 2 (b). More templates can cover more QRS morphologies. 2) Even if the morphology of two QRS-complexes is the same macroscopically, there may be some differences in details. For example, the amplitude of the 5 th R-peak is smaller than that of the 4 th R-peak in Fig. 2(a). More templates can help to reduce prediction errors. In this system, we introduce multiple templates to predict the signal in the QRS regions. However, it should be noted that more templates will increase the number of variables and computation amount, thus making it difficult to be applied to low-power design. Therefore, it is necessary to use different numbers of templates to balance the memory requirement, computation amount, and CR for different scenarios. For narrative purposes, we assume that there are N t templates in the system. So, the templates have a total of W qrs × N t variables. Since the ECG is vulnerable to the interference of the baseline drift noise, which causes some bias [18], to make the templates not affected by the signal bias, the templates only store the difference between two adjacent points, i.e.
where v is the value that will be saved to the template. Besides, on considering the fluctuation feature of the QRS, we also use the 3 rd order linear prediction as a supplementary predictor to predict the data of the QRS region more accurately when 1) there is no template for prediction in the initial stage, or 2) the templates do not contain this QRS morphology. The prediction formula is given in (5).
There are N t + 1 predictors for the QRS region, including N t template predictors and a 3 rd order linear predictor.   Block diagram of the context-based error modeling, where e is the prediction error, E e|C is the expectation of the prediction errors, which is also used as the correction value for the current context,x is the corrected prediction value, and is the prediction error after correction.
The predictor with the minimum total prediction error is used to predict this QRS region, then the difference data of the newest QRS region will be saved as a template to replace the ''Least Recently Used'' (LRU) template. Therefore, a realtime updated LRU table is required to record the order of the used templates. The detailed flow chart of the proposed dualmode prediction algorithm is shown in Fig. 3. The index of the predictor used for each QRS region will be packed into the final bitstream to use the same predictor when decompressing. To make full use of the code space, the length of the index code B idx is set according to (6).
The context-based error modeling technique that captures the statistical information of the prediction errors can be used to improve the CR [19]- [23]. This technique first classifies the prediction errors based on their contexts. Then the expectation of the prediction errors from each category is determined and this value will be added to the prediction value to correct the prediction, as shown in Fig. 4. Compared with the original prediction values, the corrected prediction values are more accurate, thus increasing the CR.
A simple but effective context classification model is based on the binarized differences of past points, i.e.
where W c is the number of points used for context classification. c (x [n]) is the category index of the n th point. Q ( x [n]) is the binarized differences of the n th point, and the formula is given in (8).
The total number of contexts is 2 W c . In general, more contexts capture more accurate statistical information, thus improving the CR. However, more context also increases the number of variables. So, it is necessary to use different numbers of contexts for different scenarios.
Memon et al. [22], Sriraam and Eswaran [23], Chua and Fang [5] also applied the context-based error modeling technique for lossless biosignal compression. They estimate the expectation of each context category by dividing the sum of the prediction errors with the occurrence counter. However, this algorithm has two drawbacks. First, it requires the ''divide'' operation, which is difficult to be implemented on the low-power ASICs or embedded systems. Second, it is sensitive to outliers, thus reducing the accuracy of the expectations. Weinberger et al. [24] designed another error modeling algorithm for image compression. It only requires the ''add'' and ''compare'' operations and reduces the influence of the outliers. Therefore, we select this algorithm for error modeling to achieve higher CRs with lower computational complexity. In this algorithm, there are 3 variables in each context category, i.e. COR, CNT , RES. COR stores the expectation and also the correction values of the prediction errors; CNT stores the occurrence number; and RES stores the sum of errors after bias cancellation. By limiting the range of RES, this algorithm reduces the influence of outliers. COR is updated by comparing CNT and RES to make the mean value of the corrected prediction errors close to ''0''. The specific pseudocode of this error modeling algorithm is shown in Fig 5. C. ENCODING Figure 6 shows the prediction errors after dual-mode prediction and context-based error modeling. It can be seen that the prediction errors fluctuates around ''0'', and the occurrence probability of small values is much higher than large values. Golomb-Rice code is quite useful to encode such prediction errors as it has optimal prefix code for this distribution [25] and it has low computational complexity.
The Golomb-Rice encoding algorithm requires mapping the integer prediction errors into the non-negative integer first, i.e.
where M [n] 2 k can be calculated by using the ''shift right'' operation in the ASIC or embedded system.
The efficiency of the code is sensitive to k value. Specifically, when k is set to log 2 M [n] , the code length of M [n] is the shortest. Considering that the prediction errors of neighbor points distribute in similar ranges, as shown in Fig. 6, we propose a k value prediction mechanism. First, we add a variable t to predict the value of M [n]. Actually, since only integer operations can be performed in low-power devices, we set t to 4 times the predicted value to improve the precision. Second, k is calculated by (11) to encode M [n]. For low power ASICs or embedded systems, log 2 t 4 can be calculated by searching the most significant bit of t. When k = 0, the code length increases significantly with the increase of M [n], thus causing a great penalty when t 4 is lower than M [n]. Therefore, we force k ≥ 1 to reduce the penalty. Third, after coding, t is updated by (12) to predict the next data. This update method gives a higher weight to the data closer to the uncoded data and improves the prediction accuracy. The initial values of t and k are set to 64 and 4 to avoid excessive code length due to the large value of the first uncoded data.
Besides, the Golomb-Rice code has covered all the coding space and made it impossible to add the R-peak position information to the code. Therefore, we add a bit ''1'' to all the unary codes with quotient ≥ 8 to reserve the code ''111111110'' as the R-peak indication code. Table. 1 shows several comparison examples between the Golomb-Rice code and the modified Golomb-Rice code. This encoding method makes the QRS code length relatively short while only increasing the code length of data with large quotient (which rarely appears) by 1 bit.

D. PACKAGING
In the packaging process, all the information needed for the decoder to reconstruct the ECG will be packaged into the final bitstream. It should be noted that the sampling frequency and resolution of the ECG sensor, the template number and context width of the proposed compression system, and the initial values of all the variables should already be known to the decoder. Therefore, the final bitstream doesn't contain this information. Figure 7 shows the packaging format. First, we package the first three signals by using the original binary codes so that the decoder can predict the following points according to the known points. After that, for each heartbeat cycle, the data is packaged in the following order to compress and decompress the signal in real-time: 1) Package the modified Golomb-Rice codes of the current non-QRS region. 2) When entering the QRS region, the R-peak indication code ''111111110'' and the prediction index for current QRS region are first packaged to inform the decoder. 3) Package the modified Golomb-Rice codes of the current QRS-region.  Table.
2 shows an example of using the proposed packaging format to package the ECG. The sampling frequency and resolution of the example are 360Hz and 11-bit respectively, which are the same as those of the ARRDB. The N t value for the compression system used in Table. 2 is set to 7. The 51 st to 86 th sampling points in this example belongs to the QRS region of cycle 0.

A. EVALUATION CRITERIA
We take all the recordings from the ARRDB as our test set. CR is used as the evaluation criteria for each ECG recording, which is calculated according to where B o is the total bit number of the original ECG recording, N p indicates the number of sampling points in the recording, and R indicates the resolution. For ARRDB, N p = 650, 000 and R = 11, So B o = 7, 150, 000.
B c is the total bit number after compressing. In this paper, B c is calculated according to (15) where B init is the bit number for initialization, N hb is the heartbeat number, B qrs is the R-peak indication code length, and B code is the overall bit length for modified Golomb-Rice code. For ARRDB, B init = 33, B qrs = 9, and B idx is calculated according to (6). The final CR is obtained by calculating the average CR of all the recordings in the database.

B. N t , W c SELECTION AND COMPRESSION RESULTS
To select N t , we remove the context-based error modeling module and set N t to 1, 3, 7, 15, 31, 63, 127, and 255 for alternatives. At this time, the total number of the predictors for the QRS region is exactly an integer power of 2, thus maximizing the utilization of the index code. Figure 8 shows the CRs obtained by using different N t . It can be seen that the CR improves significantly as N t increases from 1 to 7, and achieves the maximum value at 63. When N t continues to increase, CR begins to fall. It is because 1) when the number of templates reaches a certain level, the templates contain enough QRS morphologies that are currently required, 2) using more templates increases B idx value, thus decreasing the CR. Since the low-power ASICs or embedded systems are sensitive to the memory requirement, we select N t = 7 for low-power design. For the cloud ECG compression scenario, we select N t = 63.
To select W c , we set W c from 1 to 15 for alternatives and do experiments under the condition of N t = 7 or N t = 63. Figure 9 shows the results. It can be seen that the CR improves significantly as W c increases from 1 to 6, and achieves the maximum value at 12. As W c continues to increase, CR begins to fall. It is because that the correction values need to be updated during bias cancellation to converge to the statistical expectations. With the increase of the contexts, the points of each context become insufficient for converging,  thus reducing the performance of bias cancellation. On both considering the memory requirement and the CR, we select W c = 6 for low-power design. For the cloud ECG compression scenario, we select W c = 12.
For convenience, we use ''S'' and ''L'' to represent the proposed compressing system for low-power design (which has a small number of variables) and the cloud (which has a large number of variables). Table. 3 shows the specific variable numbers and CRs on the ARRDB by using the proposed S system or L system. Table. 4 compares the CR of the proposed system with several other existing methods. The proposed method achieves the CR second only to Miaou's method [10]. However, Miaou's method contains a codebook with a size of 1024 × 64, meaning that there are as many as 65536 variables in the codebook. Moreover, Miaou's method requires recursive operations, so that the computational complexity and the variable number of Miaou's method are higher than that of the proposed method. We have reproduced Miaou's method and modified the size of the codebook to explore the relationship between the CR and the variable number. Figure 10 shows the comparison of the proposed method and Miaou's method with different variable numbers. It can be seen that the CR of the proposed method is higher than Miaou's method when the variable numbers are similar and less than 16384. References [11]- [14] also achieve relatively high CR. However, the K-means cluster of Zhou's method [11] needs to square each point when matching templates, and the Huffman codebook in Zhou's method contains 2048 items the compress the ECG signal of the ARRDB (which has 11-bit resolution), more than the variable number of the proposed S system; the Takagi-Sugeno Fuzzy Neural Network in [12] contains many multiplyaccumulate operations when predicting each point; and both [13] and [14] need integer division operations. So the computational complexity of [11]- [14] are higher than that of the proposed method. References [3], [5]- [9] used simple prediction and entropy encoding algorithms, but the proposed S system achieves significantly higher CR than these methods so that this method can save more transmission power or storage space.

A. EFFECT OF DIFFERENT R-PEAK DETECTION ALGORITHMS ON CR
In the proposed system, the R-peak detection algorithm used for the QRS region location can be replaced by other algorithms. We select two R-peak detection algorithms, i.e. Pan-Tompkins (PT) algorithm [26] and quadratic spline wavelet transform (QSWT) algorithm [27] as examples to explore the CRs of the proposed system when combined with   other R-peak detection algorithms. Among them, the Pan-Tompkins algorithm is a popular R-peak detection algorithm proposed in 1985, whose detection accuracy is a bit lower than Ieong's algorithm. The QSWT algorithm has a bit higher detection accuracy than Ieong's algorithm. Besides, we also combine the proposed method with the annotations from the ARRDB to obtain the reference CRs. The results are shown in Table. 5. It can be seen that although the CR using the PT algorithm is the lowest, the performance decrement is not significant. The CR of the PT algorithm is at most 0.010 lower than the highest CR, which is obtained by using the annotation. The CRs of the QSWT algorithm and Ieong's algorithm only decrease 0.003 and 0.004 at most. It is because the false detected or false missed R-peaks are generally noisy or have atypical morphologies, which have similar prediction performance by using the QRS region predictor or non-QRS region predictor, as shown in Fig. 11. At present, many Rpeak detection algorithms or hardware implementations with higher detection performance than PT algorithm or QSWT algorithm have been published, such as [3], [28]- [31]. It can be inferred that using the proposed compression system FIGURE 11. (a) A piece of the ECG from the ARRDB recording 108. It contains some sharp noises enclosed in red boxes. (b) Prediction errors of (a) by using the annotations. (c) Prediction errors of (a) by using the PT algorithm. (d) Prediction errors of (a) by using the QSWT algorithm. (e) A piece of the ECG from the ARRDB recording 210. There is an atypical R-peak morphology enclosed in a red box. (f) Prediction errors of (e) by using the annotations. (g) Prediction errors of (e) by using the PT algorithm. (h) Prediction errors of (e) by using the QSWT algorithm. The green backgrounds in (a),(b),(e),(f) indicate the QRS region obtained by the annotation, and the green backgrounds in (c),(d),(g),(h) indicate the QRS region obtained by the respective R-peak detection methods. combined with these R-peak detection methods can also achieve similar CRs.

B. LIMITATIONS
Though the proposed ECG compression system achieves high CR based on simple prediction and encoding algorithms, its TABLE 5. CR of each channel on the ARRDB by using the proposed system combined with the R-peak annotations, PT algorithm and QSWT algorithm.
limitations should be recognized. First, this method needs to cooperate with an R-peak detection algorithm. Second, this method contains two arrays, QRS templates, and contexts, which occupy some storage space when compressing.

VI. CONCLUSION
In this paper, we have proposed a lossless ECG compression system based on dual-mode prediction and error modeling. The ECG signal is firstly divided into the QRS regions and the non-QRS regions. The 1 st order linear predictor is applied to predict the non-QRS region, and the combination of the 3 rd order linear predictor and QRS templates predictor is applied to predict the QRS region. After prediction, a context-based error modeling module is added to cancel the statistical bias of the prediction errors. By packaging the modified Golomb-Rice codes of prediction errors, R-peak indication codes, and QRS prediction indexes, the final compressed bitstream is obtained. The proposed system is evaluated on the ARRDB. Experiment results show that the proposed system achieves a CR from 2.975 to 3.040 with memory requirements as low as 444 to 14556 variables. The proposed ECG compression method is highly applicable to both the low-power ECG monitor and the cloud.