A New Approach for Secure Cloud-Based Electronic Health Record and its Experimental Testbed

The tremendous development of the Internet of Things (IoT) technology in the last decades has fostered advancement in automatic medical assistive devices to support the existing Electronic Health Record (EHR) system. As an integral part of the EHR electronic model, public cloud servers store medical data. Unfortunately, public cloud servers are prone to security and privacy breach. This paper introduces a novel non-cryptographic approach to preserve electrocardiograph (ECG) data confidentiality and integrity in the EHR environment. The main objective of the proposed anonymization algorithm is to obscure the patient’s cardiac information during transmission and to protect information stored in the cloud database. Although we focus on ECG data, generalization to other types of clinical data can be derived using the proposed method. Performance evaluation of the proposed scheme showed that the algorithm conceals both fiducial and non-fiducial features of the data. Therefore, confidentiality feature is preserved. This paper examined confidentiality of the anonymized data using the Percentage Residual Difference (PRD) and investigated the integrity of the reconstructed data in terms of cross-correlation. Security analysis carried out using the PRD, brute force attack, and performance comparison between the proposed algorithm and existing methods. Evaluation showed that the proposed scheme offers a secure non-cryptographic model for transmission and storing clinical data in the cloud. Moreover, in terms of processing time, the proposed algorithm is ten times faster than the existing wavelet packet method when processing long ECG data, 65,536 sample points. In a real-time experimental testbed, the implemented proposed system was successful.


I. INTRODUCTION
With the advent of Internet of Things (IoT) technology and the development of medical Internet-based applications and devices, we have witnessed remarkable progress over the last two decades [1]- [6]. For example, according to the Global Market Insights, the Electronic Health Records (EHR) market The associate editor coordinating the review of this manuscript and approving it for publication was Ilsun You . size in 2018 was valued at over 25.5 billion USD around the world and it is forecasted that the market size will become 38 billion USD in 2025 [7].
The proliferation of the current information and communication technology has changed the way medical data recorded and stored. These medical data may include several personal medical data such as: medical histories and medication and laboratory test results. Additionally, it contains vital information such as an electrocardiograph (ECG), phonocardiograph (PCG), electroencephalogram (EEG), blood pressure, respiration rate, and body temperature (the so-called human vital signs). The vital signs data were previously recorded in paper-based format. The transformation from paper-based electronic documents has gradually become a compulsory requirement to provide efficient and flexible healthcare services. EHRs have several advantages, including enabling data sharing across various stakeholders for future clinical analysis and decision making, easy data access at any time and anywhere, reduced time for locating physical records, reduced physical storage, and less medical costs [8].
Further, the widespread use of cloud computing technology for a broad range of applications is another key factor that has attributed to the fast growth of EHR in the healthcare industry. There are several features that make cloud computing different from traditional computing techniques, including on-demand self-service, broad network access, resource pooling, rapid elasticity, and measure service [9], [10]. These features make cloud services an important part of the deployment of the EHR, since they reduce costs associated with storing and maintaining data, improve accuracy of processing, and allow data to be exchanged among stakeholders.
While the cloud computing paradigm offers tremendous change to the way EHRs are implemented, it also poses particular security and privacy threats [11], [12]. Due to the loss of full control over their clinical data in this cloud network model, users are more concerned about security and privacy. The escalating threats to this cloud system may include spoofing identity, tampering with clinical data, and the disclosure of sensitive information 13]. Moreover, cloud virtualization that enables multiple users to share the same physical storage can lead to a problem of multi-tenancy, whereas malicious users may access data that belongs to other users [9], [17]. In addition to the possibility of external threats, special attention needs to be paid to internal threats from person with authority over data such as database administrators, software engineers, programmers, or probably key managers.
Several attempts have been made to prevent security breaches in the cloud-based EHR system. However, the existing security and privacy techniques do not offer adequate protection for healthcare services [13]- [15]. For example, the well-accepted methods such as attribute-based encryption (ABE), identity-based encryption (IBE), searchable encryption, symmetric key encryption (SKE), were found to be inefficient due to their computational complexity [12]. In contrast, the public key encryption (PKE) techniques have a relatively slow operation due to larger key sizes.
Recently, blockchain technology has emerged as an alternative method for protecting medical records and other sensitive healthcare-related information [16]- [19]. As the number of users of the blockchain grows linearly, it forms a growing distributed ledger of records. Each node in this ledger structure is commonly called a block. Each block is digitally signed by cryptographic hashes and validated by network peers. Due to this chain structure, blockchain can be a secure way of storing medical records [20]. Nevertheless, blockchain is still in its initial phase and lack defined standards [21].
In this paper, we introduce a new non-cryptographic scheme to secure clinical data. Although we focus on electrocardiograph (ECG) data processing, generalization to other types of clinical data such as PCG and EEG data, can be derived using the proposed method. The main objective of the proposed algorithm is to obscure patient's cardiac information during transmission of ECG data and to protect information storing in the cloud database of the EHR environment. ECG data have been used for object evaluation because they embody cardiac health information of a patient and that information is highly unique for every individual [22]. As a result of this discovery, several companies are currently exploring ECG biometric that will be used for personal attribute authentication and plan to deploy it for enterprise applications [23], [24]. The unique features, however, also make ECG data vulnerable to malicious attacks and make it crucial for any service providers that handle clinical data to provide maximum security.
The main contributions of this paper can be summarized as follows: i We introduce a practical non-cryptographic approach to preserve ECG data confidentiality and integrity in the EHR environment. It should be noted that the approach can convey any types of clinical data, not just ECG data. ii The ECG data is anonymized using signal processing algorithms, and then stored in the cloud system as a secure data. Therefore, the proposed algorithm conceals the cardiovascular features of a patient fulfilling the Health Information Protection and Privacy Act (HIPPA) regulations (1996) [25]. iii We establish a reconstruction algorithm to retrieve the original ECG data from the anonymized ECG. iv We have implemented the proposed system in a real-time testbed e.g., wireless sensor node and ECG signal processing module to evaluate the practicality and attractiveness of the algorithm to support the existing EHR system. The proposed algorithm is based on our previous development of the anonymization algorithm in [26] with significant improvement in this paper to enhance the efficiency and robustness of the algorithm. In common medical terms, anonymization refers to a process that removes personal data, e.g., name, address, postcode, so that a data subject can no longer be identified directly or indirectly [27]. In this paper, a similar term, anonymization will be used to describe an algorithm to obfuscate the structure of ECG data so that it cannot be identified without reconstruction. The term was originally used in [28].
The rest of the paper is organized as follows. In the first section, we elaborated on the crucial need for security and privacy in the cloud-based EHR system followed by main contributions of this work. Next, we will explore the VOLUME 10, 2022 existing anonymization algorithms and evaluate their performance in Section II. Section III describes the detail of the proposed algorithm including the anonymization and reconstruction methods as well as security analysis of the proposed algorithm. Evaluation of the algorithm and its experimental testbed will be carried out in Section IV and finally, conclusions will be drawn in the last section.

II. RELATED WORKS A. REVIEW ON ANONYMIZATION METHODS
Several studies for securing ECG data with anonymization techniques were widely available in the literature. In this section, we will first discuss two approaches to ECG anonymization. Both algorithms decompose the ECG data using wavelet packet techniques. Later, we will explain the fast Fourier Transform (FFT) algorithm that consumes less energy in real-time implementation.
The authors of [28] suggested using the wavelet packet method to transform the ECG data from the time domain into the frequency domain and decompose the data to subband coefficients. In this method, lower frequency components of the data were then removed and replaced with zeros that corresponded to the distorted coefficients. Subsequently, all coefficients from the ECG data were reconstructed, including those associated with the zeros in the lower subband and those associated with higher frequency coefficients, and transformed back into the time domain to yield anonymized ECG data. Here, the removal of the lower subband coefficients in the frequency domain distorted the structure of the ECG data. Thus, the anonymized ECG data strongly differed from the original. It was done purposefully in order to deform the time domain structure of the original ECG data. Furthermore, for distribution of the data across the networks, the camouflaged ECGs were sent over the Internet to a certain medical center database, while the low frequency subband was sent to an authorized person as a secret key. Finally, the original ECG was recovered by combining the secret key and the distorted ECG data on the receiver side using the reconstruction method. Careful examination in [29] showed that the method in [28] does not fully conceal the fiducial features of the ECG. The reason is that the RR-interval is still present in the anonymized ECG data and can still be identified easily; consequently, heart rate variability of a patient can be revealed using this anonymized data.
A short review about the fiducial and non-fiducial features of ECG data is as follows. The fiducial-based feature representation exploits the characteristic points on the ECG data to reveal amplitude, distance, envelope, slope, time/interval, and area features. The characteristic points are referring to the locations of the peaks and boundaries of P, Q, R, S, and T waves on the ECG waveform. In contrast, the nonfiducial-based feature representation extracts the distinctive information within the ECG by way of the autocorrelation coefficient, Fourier coefficient, and wavelet coefficients.
As a result of the anonymization, it was shown in [28] that the waveforms of the anonymized ECG data were very similar to the original ECG data. Therefore, malicious users might be able to recognize the anonymized data without any difficulties and use it for their purpose. Fortunately, despite the drawback conceived in the algorithm, it showed some significant results. For example, the examination in the paper revealed that the size of the secret key achieved only 5.8% of the original ECG data size. Secondly, the algorithm established the use of compression and encryption techniques to secure the secret key before distribution.
Subsequently, an algorithm that was introduced in [29] showed significant performance improvement over the previous work by using a slightly different approach. The security algorithm made use of the generalized wavelet packet method to decompose the ECG data in several subbands encompassing low-frequency components to highfrequency components. In contrast to the work in [28], the paper claimed that the proposed ECG anonymization method could successfully obfuscate intrinsic features such as fiducial and non-fiducial features. The algorithm had thoroughly examined over normal and abnormal ECG data.
The algorithm in [29] removed the lower subband of ECG data points and treated them as a secret key. This secret key was distributed separately to a medical center server whereas the anonymized ECG data was transformed to time domain and stored in a public server. At the receiver end, only authorized persons with the ability to access the secret key and the reversible function would have the authority to reconstruct the original ECG from the anonymized ECG. The performance of the proposed framework had been evaluated using fiducial features such as the cross-correlation analysis, power spectral density, and the percentage residual difference (PRD). The paper argued that the reconstructed ECG data was highly correlated with the original one. In other words, the proposed method has successfully achieved a lossless reconstruction of the ECG data and proved its robustness. However, the real-time practicality of both works in [28] and [29] have not been examined and tested in a hardware testbed.
The use of the wavelet transforms to manipulate the ECG data as in the described papers leads to the increment of overall computation complexity. This is mainly due to the ECG data decomposition and reconstruction processes with the help of the wavelet-packet algorithm that consumes almost 90% of all anonymization processing time. A study by way of computer simulation in [26] showed that replacing the wavelet decomposition and reconstruction procedures using the fast Fourier transform (FFT) method could achieve the speed of processing 5 times faster than the wavelet-packet based algorithm. For example, this study showed that the proposed method could anonymize ECG data with a length of 16,384 points in 3 ms only, while in contrast to that, the wavelet-packet transform as in [29] required approximately 33 ms to complete the whole anonymization process using the same evaluated ECG data. Therefore, the proposed ECG security method in [26] could be considered as the most suitable algorithm (compared to the existing ones) for implementation of the whole set of systems in the IoT environment, where some constraints like power and computation limitations could be substantial factors.

B. EVALUATION ON THE EXISTING METHODS
In this sub-section, the anonymization of ECG data based on the FFT described in [26] will be investigated thoroughly. The investigation mainly focuses on the weaknesses of the algorithm in order to find an appropriate solution that will mitigate the vulnerability that exists.
Assume a time-domain ECG data sequence, x[n], represented by a mathematical form {x[n]: n = 1, 2, . . . , N }, where N is the ECG sequence length. This ECG sequence is then transformed into its frequency domain using the fast Fourier transform (FFT) algorithm. The mathematical form of the frequency domain signal is expressed as {X [k]: k = 1, 2, . . . , N -1}. The frequency domain form is written in capital letters as is typical of many literary works. Fig. 1 depicts the block diagram of the anonymization algorithm developed in [26]. ECG data anonymization approach [26], [29].
The most important part of this method is the partition of the frequency domain ECG signal into two subbands, i.e., the low-frequency subband and the high-frequency subband as described in (1), In (1), X 1 [0, . . . , P] represents the low-frequency component of the ECG data and X 2 [P + 1, . . . , Q] represents the high-frequency component.
The structure of the ECG data was then modified using the two steps. Firstly, the algorithm removed X 1 [0, . . . , P] from X [k] and called it a secret key, κ[k]. This secret key was subsequently encrypted, compressed, and sent as a secure key to an authorized healthcare provider database (see Fig. 2). Secondly, the algorithm manipulated the high-frequency component, X 2 [P+1, . . . , Q] with a reversible function, [k] according to (2), where X 2 [k] and [k] are an element-wise multiplication and [k] is a vector of where the offset value was defined as offset = |min(κ)|+η and element η is a constant value to prevent division by zero. The anonymized signal, x 2 [n], was finally produced after taking It should be noted that the element-wise multiplication in (2) can be replaced by an element-wise division. This is because both multiplication and division inherit the same computational complexity notated by O(n 2 ).
In its implementation, the anonymized signal, x 2 [n], shown in Fig. 2 was transmitted and stored in a public cloud database whereas K was stored separately in the healthcare provider server. Therefore, a potential attacker who is somehow able to get access to this public database can only see the anonymized ECG data, not the original data. To retrieve the original ECG data on the receiver side, an authorized medical doctor/specialist performed a reconstruction method using the secret key taken from a healthcare provider's server and the anonymized signal taken from a public cloud database. VOLUME 10, 2022 Careful examination of (2) shows that the element-wise multiplication of vector X 2 [k] by a vector [k] requires both vectors to have the same size. However, we see in (3) that the size of vector [k] is in fact smaller than the size of vector X 2 [k]. Therefore, the vector [k] is mandatorily repeated several times until it has the same size as X 2 [k] to maintain the valid operation of (2). Our further studies showed that the modification of the X 2 [k] by way of element-wise multiplication or division of the repetitive vector [k] has eventually produced an obvious vulnerability for the anonymized ECG data itself. This defect is clearly shown in Fig. 3 whereas the power spectral density peaks appear every 125 Hz for both element-wise multiplication and element-wise division operations. Fig. 3 shows the frequency response in terms of power spectral density of the anonymized ECG data (i.e., patient245, signal s0474 from PTB database) with sampling frequency f s = 1, 000Hz. The frequency response of the ECG in Fig. 3a has been modified using an element-wise multiplication of vector X 2 [k] and vector [k] as shown in (2). On the other hand, Fig. 3b depicts the frequency response of the anonymized ECG data that has been modified using elementwise division. The two figures obviously show repetition of the magnitude peak and subband of the frequency response every 125 Hz period. Although signal repetition in Fig. 3b is more subtle than the one in Fig. 3a., careful examination shows that there is a clear individual peak started at every 125 Hz span (i.e., 125 Hz, 250 Hz, 375 Hz) followed by several ripples.
Frequency domain recurrence in Fig. 3 directly associates with the repetition of the vector [k] as a result of the operation in (2) and (3). Therefore, a malicious attacker can potentially detect the structure of vector [k] and use it to exploit the ECG anonymization due to this vulnerability. An attacker who has access to the secret key, can easily interpret and reconstruct the anonymized ECG data by simply repeating the secret key every 125 Hz period followed by a reverse function operation of the vector [k] and vector X 2 [k], subsequently.

III. THE PROPOSED METHOD
Modification of the existing anonymization algorithm will be presented in this sub-section. Besides removing frequency domain repetition, the new algorithm should comply with sensor devices limitation, such as low energy consumption and low computational processing. Therefore, vector operation and other mathematical operations in the proposed algorithm were set to obey such behavior while at the same time improve the security of data transmission. Fig. 4 displays a block diagram of the proposed method.

A. THE PROPOSED ANONYMIZATION ALGORITHM
Assume ECG data sequence in the form of {x[n]: n = 0, . . . , N -1}. In the first step of the algorithm, we apply the FFT to the ECG data sequence to obtain the frequency domain representation that is represented by Secondly, the frequency domain segmentation takes place after transformation from time to frequency domain in the first step. At this stage, we split the frequency domain vector, X [k], into two subbands, i.e., X 1 [k] and X 2 [k]. The first subband, X 1 [k] represents low-frequency components of the ECG data whereas X 2 [k] signifies high-frequency components. Segmentation of this frequency domain ECG data can be seen in (4), To eliminate the recurrence behavior of the existing algorithm as shown in Fig. 3, we employ individual operation for each segment in the vector X 2 [k] without repeating vector [k]. Suppose the segmentation of X 2 [k] is represented by (5), where R is the number of segments or sub-subbands created from X 2 [k]. Next, considerẌ [k] as a Root Mean Square (RMS) of the ECG data computed in the frequency domain and denoted as in (6).Ẍ The RMS represents the overall energy level contained in the ECG data across a frequency range and will be used to define the offset values in the anonymization process. Utilization of RMS as in (6) will guarantee non-zero values for the offset as the total energy is preserved in the signal. Nevertheless, several experiments showed that this RMS value requires a scaling operation to obtain acceptable offset values for the anonymization process. Therefore, the offset values are defined according to (7), The element η in (7) is a constant to prevent division by zero in both the anonymization and reconstruction processes.
Subsequently, the vector [k] can now be redefined as in (8) where κ[k] is a secret key which has been defined above as We define the core of the anonymization process as an element-wise multiplication between X 2.r [k] and r [k] as shown in (9).
Element-wise multiplications in (9) serve as a reversible operation in the anonymization procedure. Consequently, it will become a division betweenX 2.r and r [k] in the reconstruction process to retrieve the original data back.
The individual segment multiplication in (9) results in the modification of each subband of the ECG data where the secret key performs as a modifier. After the multiplication of each segment, the process continues to the next step that merges each segment into one single vector, i.e. X 2 [k]. It becomes: In the final step, the inverse fast Fourier transform (IFFT) algorithm performs its action to transform the modified frequency domain ECG data, X 2 [k], into its time-domain data as in (11). The result is the anonymized ECG data, Fig. 4 shows that the proposed algorithm produces two sets of vectors, i.e., the secret key, κ[k] stacked with the offset, and the anonymized data, x 2 [n]. It is advisable that for security purposes the two vectors should be stored in different locations/cloud systems. Furthermore, the key security will be achieved by encrypting the secret key and the offset vectors according to (12), where E is the encryption operator and is a collection of offset defined in (13), = {offset 1 , offset 2 , · · · , offset r : r = 1, 2, · · · , R} . (13) After encryption, the K can be securely distributed to the healthcare providers' storage while at the same time the anonymized data stored in the public cloud server. See Fig. 1 for clarity. Both data are stored with a unique identification number and specific metadata for each ECG data.

B. THE PROPOSED RECONSTRUCTION ALGORITHM
An expert or doctor resided in a particular healthcare provider needs to employ a reconstruction algorithm to retrieve the original ECG data based on the anonymized ECG stored in the digital storage. Here we elaborate on the reconstruction algorithm as a backward procedure for the previous anonymization algorithm.
On the first step, the device on an expert or a doctor side requires to download both the encrypted secret key and from the healthcare provider storage and the anonymized ECG from the public cloud server. It is then followed by decryption of the secret key and according to (14), where D (.) is the decryption operator. Secondly, the algorithm transforms the anonymized ECG, x 2 [n], from the time domain to frequency domain representation using the well-known FFT algorithm. As a result, we get the segment X 2 [k] that contains the modified frequency domain ECG as defined in (10).
In order to retrieve the high frequency subband ECG data, X 2 [k], it is compulsory to do a reverse operation of X 2 [k] shown in (9) towards the vector r [k]. Therefore, when vector operation in (9) performs element-wise multiplication, the reconstruction algorithm should enforce element-wise division as shown in (15). Conversely, when vector operation in (9) takes element-wise division, the reconstruction algorithm in (15) should impose an element-wise multiplication operation.
The last procedure in the reconstruction algorithm is combining the reconstructed vector X 2 [k] and the secret key κ[k]= X 1 [k] to becomeX [k] as shown in (16), Subsequently, an inverse FFT operation should be applied to (16) to obtain the estimation of the time domain original ECG data,x[n].

C. SECURITY ANALYSIS
Within the EHR system, we include security aspects that emphasize confidentiality and integrity in our proposed algorithm to ensure that the ECG data is secure throughout the cycle. Confidentiality ensures protection of the ECG data from being exposed to unauthorized individuals or parties whereas integrity refers to protection of the data from being modified intentionally or unintentionally.
The confidentiality aspect is shown by relationship between the original and the anonymized data in terms of the Percentage Residual Difference (PRD). The PRD has been used by several papers [26], [29] to quantify the distinction between the original ECG data and the anonymized ECG data. It is defined according to where x[i] denotes the original ECG data, x 2 [i] is the anonymized ECG data, and i = 1, . . . , N , N is the total number of samples in the ECG data. A careful examination of formula (17) will reveal that PRD = 0 when the two time series are identical. Therefore, PRD > 0 measures the degree of distortion among the two data sequences.
The PRD of the ECG data with identification number bs10089603 as in Fig. 6 for different values of η is shown in Table 1. Our experiment shows that the PRDs are larger than 0 for all η. Based on this table, it can be interpreted that the anonymized ECG is significantly different from the original ECG data. The malicious attackers who forcefully gain access to this anonymized data from the public cloud servers will find it nearly impossible to unravel the fiducial features of the ECG data. According to this analysis, the proposed algorithm protects the confidentiality of the data well. Despite confidentiality, brute force attacks are commonly used to compromise data. The current strong computing power enables this attack to guess passwords or encryption keys by guessing combinations of characters.
In our proposed scheme, the secret key, κ[k], comprises of analog numbers with certain length. For example, a key length of 1024 contains 1024 floating-point numbers representing the X 1 [k] vector as in (4). Computers normally comply with the IEEE 754 standard to adequately store this secret key into the single precision (32 bit) or double precision (64 bit) digital form. Consequently, for the case of the single precision format and a key length of 1024, a secret key in our model can be represented by 32.768 bits length in the IEEE 754 standard. Therefore, it may take 2 32.768 combinations to brute force the secret key.
We examine processing time for brute forcing the secret key with a key length of 1024. However, due to limitation in our computer, we considered a brute force simulation by injecting as many as 10 9 key combinations (it approximately equals to 2 30 combinations) out of 2 32.768 combinations that ran for several types of Microprocessors, i.e., Intel Core i7 memory 16GB, Intel Core i7 memory 8GB, and Intel Core i5 memory 12 GB. Fig. 5 shows that the slowest processor took approximately 1976.2 s (32.94 minutes), whereas the fastest processor spent approximately 1031.2 s (17.2 minutes) to complete the simulation. Therefore, assuming the processing time grows linearly with the number of trials for brute force attack, it will take approximately 1031.2 × 2 32.738 s for the fastest computer in the study to attempt all combinations. In other words, it is extremely difficult to accomplish this task using today's fastest processor within acceptable time.
To conclude the security analysis of the proposed algorithm, Table 2 presents performance comparison study among the existing methods for electronic healthcare cloud security and the proposed algorithm in terms of their characteristics, strength, and weakness.

IV. RESULTS AND DISCUSSIONS
This section provides an evaluation of the proposed algorithm for securing ECG data both in transmission and storing in the cloud database. The first evaluation includes the algorithm ability to obscure fiducial and non-fiducial features and its ability to maintain ECG data integrity. We then compare the processing time of the algorithm with the similar function of the existing methods, i.e., the wavelet packet-based and the FFT-based algorithms. Secondly, we will elaborate on a real-time implementation of the proposed algorithm in a small device. The hardware implementation of the proposed algorithm serves as an experimental testbed that observes its applicability in the future.

A. RESULTS AND ANALYSIS
Simulations and comparisons with existing methods were used to observe the performance of the proposed algorithm. Both time domain representation and frequency domain representation of the ECG data were evaluated in terms of power spectral density (PSD).
The ECG data were collected from a real-time measurement of 32 healthy volunteers with careful supervision from a medical doctor (volunteers' mean age is 23 ± 2 years). The process of capturing ECG signals in this study was a non-invasive process, thus it did not involve any harmful procedure. However, all volunteers consented before the signal was retrieved. Each segment of the ECG data has 10 seconds duration with a sampling frequency of 1000 Hz. Therefore, the number of samples in each segment is 10,000 samples. To ensure the broad application of the proposed algorithm, we also applied the algorithm to anonymize ECG data obtained from the publicly available PTB database [30].
In this evaluation, the size of the secret key as in (1) was set to P = 1, 024, associated with a frequency range of 0 Hz to 125 Hz. Based on application of (4) to the ECG data, the frequencies between 0 Hz and 125 Hz are removed and retained for use as the secret key. Meanwhile, the frequency components larger than 125 Hz will be anonymized and uploaded to a public cloud server after being transformed into its time-domain data. In the anonymization phase, the ECG data was adjusted using (7) to (9) with a constant value of η = 0.3. This value was chosen to provide moderate PRD shown in Table 1. Fig. 6 shows a time-domain representation of original ECG data from a subject with identification number: bs10089603 and its anonymized ECG. It can be seen in Fig. 6b that the algorithm has removed fiducial features of the original ECG wave, i.e., the P, Q, R, S, and T peaks. The peaks have been concealed by high fluctuations of signals. Fig. 7 (a) and (b) exhibit frequency domain representation in terms of Welch's power spectral density (PSD) estimation for both the original ECG and the anonymized data, respectively. Both diagrams show the PSD in dB/Hz as a function of frequency ranging from 0 Hz to one-half the sampling rate, i.e., 500 Hz. It can be seen in Fig. 6b that the non-fiducial features of the ECG data have been removed utterly by the anonymization process of the proposed algorithm. For example, repetition of the peaks and subbands displayed in Fig. 3 has completely disappeared. Additionally, multiplication in (9) gives an effect on leveraging the magnitude of the PSD of all frequencies above 125Hz. Therefore, modification in the frequency domain is directly associated with the presence of the high-frequency components in the anonymized ECG as shown in Fig. 6b. Fig. 6b also infers that the original ECG data cannot be interpreted properly after being anonymized. The anonymized data appear to be random and unstructured. If a malicious attacker successfully penetrates a cloud system to acquire the anonymized ECG data, he/she cannot elucidate the structure of the anonymized ECG data without reconstructing it. The anonymized data bears meaningless information in the absence of the reconstruction algorithm and the secret key. Due to the secret key being stored in a different database system, e.g., in the healthcare provider's server, it will be difficult for attackers to access both servers. Therefore, the proposed scheme guarantees a secure and robust model for protecting patient's ECG data. Fig. 8a shows the reconstructed ECG data and Fig. 8b depicts the cross-correlation between the original ECG data and the reconstructed data. The cross-correlation at lag 0 is 1 that clearly indicates the reconstructed signal is exactly the same as the original ECG. It proves that the proposed reconstruction algorithm explained in III.B works remarkably well to retrieve the original ECG data without losing any information. In this way, the proposed algorithm guarantee integrity of patient's ECG data during transmission and storing in the public cloud database. Fig. 9 shows the anonymization process of the ECG data when the algorithm was executed using element-wise division in (9). Fig. 9a reveals that the amplitude of the anonymized signal is relatively small due to the element-wise division. However, it can also be noticed that the fiducial structures VOLUME 10, 2022   of the ECG data are successfully disguised by the algorithm. For example, the R-to-R peaks of the ECG data can not be detected from the figure. Fig. 9b shows the frequency response of the anonymized ECG and Fig. 9c shows the reconstructed data. A strong correlation between the original and the reconstructed ECG is indicated by a value of 1 at lag 0 in Fig. 9d. Moreover, a comparison between Fig. 9a and Fig. 6b concludes that element-wise division operation in (9) generally produces anonymized ECG data with lower average amplitude than the operation of element-wise multiplication.
To observe the computational complexity of the algorithm, we examined the processing time between the proposed algorithm and the existing anonymization schemes [26], [29]. We have used the ECG data that were obtained from the PhysioNet PTB Database [30] (i.e., patient180, signal s0475). Fig. 10 draws a comparison graph of the processing time (ms) as a function of ECG data sequence length ( 2 log(N )) among several anonymization methods. The figure clearly shows that the processing time of the proposed algorithm is comparable  to the FFT-based anonymization algorithm [26]. However, when ECG data sequence length is increasing up to N = 2 16 = 65, 536 data, the processing time of the proposed algorithm is slightly longer than the anonymization algorithm in [26]. This is true because the algorithm requires time to compute the RMS in (5) and the offset values in (7) whereas the FFT-based algorithm does not undergo those processes.
In contrast, the wavelet packet-based anonymization algorithm spends the longest processing time among others. For example, the wavelet-based algorithm has approximately 10 times slower process than the proposed algorithm for processing with 65,536 points of ECG data. Surprisingly, for processing with N = 2 12 = 4, 096 data, the proposed algorithm VOLUME 10, 2022 FIGURE 10. Processing time of the proposed algorithm for several key lengths compared to the existing anonymization algorithms including (FFT-based) in [26] and wavelet packet-based algorithm in [29]. performed 150 times faster than the wavelet packet-based algorithm processing time. The detail of the averaged processing time over 1000 trials for all schemes is shown in Table 3.

B. EXPERIMENTAL TESTBED
To conclude this section, we describe the experimental testbed development of the proposed scheme. The ECG data from an observed person were captured using a three-electrode ECG sensor that was directly connected to an integrated single-lead ECG signal conditioning module, i.e., AD8232 as shown in Fig. 11 whereas the circuit diagram connection between the AD8232 and the microcontroller unit is illustrated in Fig. 12. The AD8232 module provides high signal gain (G = 100) with dc blocking capabilities. It is also equipped with a two-pole adjustable high pass filter for reducing motion artifacts, a three-pole adjustable low pass filter for eliminating any additional noise, and a right leg drive (RLD) amplifier. The RLD is commonly used to eliminate interference noise from any biological signals.
Volunteers were positioned in the seated posture as shown in Fig. 13 with the ECG electrode clamps positioned at the left arm, right arm, and right leg to obtain single-lead ECG data. Referring to Fig. 12, the output pin of the AD8232 module was connected to the analog input of a microcontroller for analog to digital conversion. After this process, the microcontroller transmitted the data to the ECG security processing module. In this paper, we made use of the Arduino Uno microcontroller board that has an embedded 10-bit analog to digital conversion (ADC) module which converted analog input from the AD8232 directly to digital form. At this point, the single-lead ECG signal could be directly monitored using a display module attached to the board.
The ECG security processor in Fig. 11 performs the anonymization algorithm. In terms of hardware specification, this processor can be anything ranging from single-board computers like a Raspberry Pi or a dedicated computer/server that is able to receive multiple connections from several sensor nodes. Due to this reason, we decided to separate the sensor node and the ECG security processing. The scheme allows the ECG security processor to execute the proposed algorithm for several ECG signals from different sensor nodes. At the same time, it can maintain mobility activities of the sensor nodes when each of the sensor nodes is equipped with wireless connection capabilities like Bluetooth or WiFi. As part of our experiment, we utilized a Raspberry Pi 3 single-board computer with a processor speed of 1.2GHz and Random-Access Memory (RAM) of 1GB, which is more than enough to run the anonymization algorithm. Using Raspberry Pi has another advantage in our case since it has onboard Bluetooth and 802.11 wireless LAN connections, which should support scalability and mobility for the nodes in the future.
The ECG security processor produces the anonymized ECG data as in (11), a sequence of the secret key in (8), and an offset vector, , in (13). Because the secret key and the offset vector were left unprotected by the algorithm, we instructed the processor to run the AES-256 encryption to encrypt the secret key and the vector altogether.
Finally, the gateway received the anonymized ECG data, the encrypted secret key, and the offset vector as shown in the rightmost of Fig. 11. Upon receiving these data, the gateway transmitted the anonymized ECG data to a public cloud server and at the same time transmitted the encrypted secret key and offset vector to the healthcare provider server as shown in Fig. 1.
On the receiver side, an authorized medical doctor or expert equipped with a mobile application could display and observe the reconstructed ECG data after reconstruction. Reconstruction's backend process is described as follows. Firstly, the healthcare provider's server performed decryption of both the secret key and offset vector. Secondly, the server downloaded the anonymized data from public cloud database. Finally, it executed the reconstruction algorithm for the anonymized data using the decrypted secret key and offset vector. Fig. 14 presents a mobile phone display of ECG data from a subject with identification number ss25029503. The left part of the figure displays the anonymized ECG data taken from the public cloud server without reconstruction. This figure is an example of ECG data that was being exposed by an attacker that was somehow able to break through the cloud server security and got access to the data. The figure shows that the mobile phone displays a mere noisy like data instead of the expected ECG data. Only the authorized medical doctor or expert who has the secret key and the offset vector (stored in the healthcare provider server) can     successfully obtain, reconstruct, and display the original ECG data. The reconstructed ECG data is shown in the right part of Fig. 14.
To conclude the evaluation, we measured averaged processing time for the anonymization and reconstruction algorithms independently to acquire clear figure of the algorithm time response in a real environment. Table 4 shows the averaged processing time (over 100 trials) that appears to be constant against variation of the key length. The results are in agreement with the results of our previous examination in Table 3. Due to the efficacy of the FFT and iFFT algorithms used in the proposed model, the single-board computer would not be able to detect small differences in processing times between key lengths. Furthermore, the table reveals that the anonymization and reconstruction phases consume similar amount of time. The only difference is that the anonymization algorithms was performed in a single-board computer (see the ECG security processor in Fig. 11) whereas the reconstruction algorithms was executed in the healthcare provider's server.

V. CONCLUSION
This paper established a non-cryptographic approach to preserve ECG data confidentiality and integrity in the EHR environment. The proposed algorithm has been shown to perform well to preserve confidentiality of patient's cardiac information (both fiducial and non-fiducial features) during transmission of ECG data and to protect information storing in the cloud database. We have examined confidentiality of the anonymized data using the PRD. In order to prove the integrity of the reconstructed ECG data, we have evaluated both the original and the reconstructed data in terms of the cross-correlation. On the other hand, security analysis has been carried out using PRD, brute force attack, and performance comparison between the proposed algorithm and the existing methods. Based on this evaluation, it can be concluded that the proposed scheme offers a secure and robust model for protecting patient's privacy in both transmission and storing patient's clinical data in the cloud. Additionally, performance evaluation in terms of processing time proved that the proposed algorithm approximately 10 times faster than the existing wavelet packet-based algorithm for processing with 65,536 points of ECG data and it approximately 150 times faster for processing with 4,096 points of data. Finally, the experimental testbed has been accomplished to show that the proposed scheme works well to support the existing EHR system. Evaluation on the processing time of both the anonymization and reconstruction algorithms shows advantages of the proposed model in a way that the processing speed is not affected by variation of the key length. He was appointed as an Associate Professor by Shantou University, in 2019. He is the part of the research and teaching faculty with the Department of Biomedical Engineering, Shantou University. He has eight years of teaching experience and has taught various courses in electrical engineering, including signal processing in biomedical engineering with the Department of Electrical and Electronic Technology, Lincoln College International, Saudi Arabia. Prior to this, he was a Research Fellow with RMIT University. He has 11 years of professional working experience at Digital Signals Division of Future Fibre Technologies (FFT) Ltd., Melbourne. As the Leader of the research team at FFT, he developed a variety of innovative algorithms and identification technologies for FFT's optical fiber intrusion detection system. His work on machine learning for perimeter intrusion detection system has led to three international patents. He has been involved in a number of research projects, has a number of patents, published more than 20 journal articles which have been cited more than 40 times, and published one book and three book chapters. He is also serving in the Editorial Board for the Journal of the Computer Science (JCS) and a reviewer for a number of IEEE and Elsevier journals.
ROY LAURENS (Member, IEEE) received the B.S. degree in computer science from Sekolah Tinggi Teknik Surabaya, Indonesia, in 1996, and the M.S. degree in information network from Carnegie-Mellon University, USA, in 1998. He is currently pursuing the Ph.D. degree in computer science with the University of Central Florida, USA. He is also a Distance-Learning Lecturer with the Department of Computer Engineering, Dinamika University, Surabaya. His research interests include payment card fraud and network security.
Since 2012, he has been the System Architect for the payment card fraud prevention at Sola Fide, Inc., in Tampa, FL, USA. He is also the director of the several non-profit organizations in the Central FL area. He has a U.S. patent for his work in mobile and wireless environment. His research interests include automating fraud detection using both rule-based machine learning and anomaly detection to identify fraudulent transaction while reducing customer friction.
MUSLEH ALSULAMI received the B.Sc. degree in computer science from Imam University, Saudi Arabia, in 2004, the M.Sc. degree in information technology, in 2010, and the Ph.D. degree in information systems from Monash University, Australia, in 2017. He is currently an Assistant Professor in information systems with Umm Al-Qura University. His current research interests include enterprise resources planning (ERP), cloud ERP, ERP life cycle, ERP implementation stakeholders' conflicts, digital transformation in government organization, software quality, and human-computer interaction.
QIANG FANG (Member, IEEE) received the B.S. degree in applied physics form Tsinghua University, Beijing, China, in 1991, and the Ph.D. degree in biomedical engineering from Monash University, Melbourne, Australia, in 2000. He is currently the Founding Chair of the Biomedical Engineering Department and a Full Professor with Shantou University, China. Before, he moved to Shantou University, in 2017, he was a permanent Academic Staff of RMIT University, Melbourne, for 15 years. His research interests include intelligent and miniaturized medical instrumentation, wearable and implantable body sensor networks, and pervasive computing technologies. VOLUME 10, 2022