Robust Peak Detection for Holter ECGs by Self-Organized Operational Neural Networks

Although numerous R-peak detectors have been proposed in the literature, their robustness and performance can deteriorate significantly on low-quality, noisy signals acquired from mobile electrocardiogram (ECG) sensors, such as Holter monitors. Recently, this issue has been addressed by deep 1-D convolutional neural networks (CNNs), which have achieved state-of-the-art performance on Holter monitors. However, their high complexity requires a special parallelized hardware setup for real-time processing, and their performance deteriorates when a compact network configuration is used instead. This is an expected outcome, as recent studies have demonstrated that the learning performance of CNNs is limited by their strictly homogeneous configuration with the sole linear neuron model. In this study, to further boost peak detection performance while retaining high computational efficiency, we propose 1-D Self-Organized Operational Neural Networks (Self-ONNs) with generative neurons. The most crucial advantage of 1-D Self-ONNs over conventional operational neural networks (ONNs) is their self-organization capability. This capability removes the need to search for the best operator set per neuron, since each generative neuron can create the optimal operator during training. The experimental results over the China Physiological Signal Challenge-2020 (CPSC) dataset, with more than one million ECG beats, show that the proposed 1-D Self-ONNs can significantly surpass the state-of-the-art deep CNN with less computational complexity. The proposed solution achieves a 99.10% F1-score, 99.79% sensitivity, and 98.42% positive predictivity on the CPSC dataset, which is the best R-peak detection performance reported on this dataset to date.


I. INTRODUCTION
An electrocardiogram (ECG) records the heartbeat sequence over time, displaying the electrical depolarization-repolarization patterns of the heart. The ECG signal is characterized by its QRS complexes, which correspond to ventricular depolarization, and it carries essential information about the status of the heart. Among many other tools, the ECG is still the most significant noninvasive tool for cardiac monitoring and clinical diagnosis. R-peak detection is the primary operation that usually precedes any kind of ECG analysis, such as ECG beat classification and cardiac arrhythmia detection [1]–[21]. Conventional Holter monitors and the recent introduction of low-cost, low-power mobile ECG sensors make robust, real-time detection of R-peak locations both a strong need and a significant challenge. Robustness, in particular, is a key issue: a recent study [22] reported that R-peak detection performance can deteriorate severely when the ECG acquisition is poor and corrupted by a high level of noise.
Numerous R-peak detection algorithms have been proposed in the literature, particularly for clinical ECG recordings. One of the first and most widely used algorithms was proposed by Pan and Tompkins [23], and it has served as the benchmark method for more than three decades. Afterward, several other popular signal-processing-based methods emerged, such as the wavelet transform [24], [25], the Hilbert transform [26], and ensemble empirical mode decomposition [27]. Hybrid methods combining traditional signal processing and machine learning followed, e.g., R-peak detection by radial basis functions (RBFs) [28] and hidden Markov models (HMMs) [29]. The common approach in these classical detectors is to first enhance the R-peaks with signal processing techniques, such as filter banks and spectral analysis, and then to apply peak detection. A crucial advantage of such methods is that they are very fast. However, they are all designed for clinical ECG recordings with a clean, almost noise-free signal. All of them were evaluated on the benchmark Massachusetts Institute of Technology-Beth Israel Hospital (MIT-BIH) arrhythmia dataset [35] or on other similar datasets with high-quality clinical ECG recordings. Their performance deteriorates significantly when the ECG signal quality is poor [22]; thus, they are not suitable for low-power mobile ECG sensors. In fact, proper evaluation is a major problem in general. Even though a few public datasets [30], [31] contain some noisy ECG signals with ground-truth R-peak locations, they are limited in size and duration. This has become the main bottleneck even for recent peak detectors based on deep learning paradigms, including long short-term memory (LSTM) networks [32] and convolutional neural networks (CNNs) [33], [34]. Both approaches in [32] and [33] aimed to improve robustness, but only against artificial (additive) noise. For this purpose, they added certain types of noise, such as baseline wander and motion artifacts, to the ECG records of the MIT-BIH dataset and reported a reasonable level of robustness. However, such artificial noise does not represent the actual degradations that occur in a Holter monitor. There, besides severe and varying noise levels and occasional glitches, the baseline dc level varies drastically and frequently, along with the dynamic range of the QRS complexes, as shown in Fig. 1. Another issue with evaluations over the MIT-BIH dataset is that this benchmark does not contain enough beats and variations to truly test a deep network, which raises the risk of bias and of overfitting the results.

Fig. 1. Typical Holter ECG segments from the record of patient 6 in the CPSC dataset. Red circles represent the R-peaks detected by the Pan-Tompkins method [23], where several FPs and FNs are visible.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, the study in [34] proposed a novel method based on deep 1-D CNNs that was applied to a real Holter ECG repository, the China Physiological Signal Challenge (2020) dataset (CPSC), with more than 1 million ECG beats. This method was tested over the entire dataset and achieved state-of-the-art R-peak detection performance by a significant margin. However, it still suffers from two major drawbacks: 1) the 12-layer deep model poses a significant computational complexity, and 2) both false positives (FPs) and false negatives (FNs) are still too high, especially on arrhythmia beats. In this study, we address both issues with a novel network model.
According to recent studies [42]–[50], the main drawback of conventional multilayer perceptrons (MLPs) and their derivatives, CNNs, is that both rely on the classical McCulloch-Pitts neuron model [41]. It is a well-established neuroscientific fact that mammalian neural systems are highly heterogeneous and consist of diverse (nonlinear) neuron types with specialized electrophysiological and biochemical properties [42], [43]. The linear neuron model is only a crude, simplified model of its biological counterpart. As a consequence, MLPs and CNNs with an entirely homogeneous configuration of linear neurons can learn relatively simple and linearly separable problems; however, they often fail to do so whenever the solution space of the problem is highly nonlinear and complex [42]–[50]. To address this major deficiency, generalized operational perceptrons (GOPs) [42]–[49] and, later, operational neural networks (ONNs) [50] have been proposed. Both models are heterogeneous and can incorporate any nonlinear neuron model. This gives them the diversity needed to learn highly complex and multimodal functions or spaces with minimal network complexity and training data. The operational neurons in GOPs and ONNs mimic biological neurons with nodal operators (corresponding to the synaptic connections) and pool operators (corresponding to the integration in the soma). An "operator set" is the combination of nodal, pool, and activation operators, and an operator set library is formed in advance to store all potential operator sets. However, ONNs can only achieve a limited level of heterogeneity, since one operator set has to be assigned to all neurons of each hidden layer. Furthermore, they are bound to the limited number of operators in the operator set library. The latter can be a serious bottleneck for learning performance when the operator set needed for the learning problem at hand is missing from the library. Finally, ONNs pose a high computational complexity, especially during training, because the right operator sets have to be searched for in advance with several backpropagation (BP) runs.
In this study, to address the aforementioned issues and drawbacks, we propose 1-D self-organized ONNs (Self-ONNs) with the generative neuron model for robust, real-time R-peak detection in Holter ECG. The generative neuron model provides the self-organization capability of Self-ONNs [51]–[55], where the nodal operators are iteratively generated during BP training to maximize learning performance. The ability to create any nonlinear nodal operator enables superior operational diversity and flexibility. Therefore, Self-ONNs neither need an operator set library in advance nor require any prior search to find the optimal nodal operator. Recent studies on 2-D Self-ONNs [50], [54] have shown that, even with a few neurons, they can achieve superior learning performance in various image processing and regression tasks, and that their performance gap over ONNs and CNNs widens further. Hence, in this study, our primary goal is to achieve superior R-peak detection performance compared with the deep 1-D CNN [34] while significantly reducing the network complexity and depth for real-time application. Besides [34], we also perform comparative evaluations against earlier state-of-the-art methods [23], [56]–[59] for an overall validation. In summary, the novel and significant contributions of this article are as follows.
1) This is the first study in which 1-D Self-ONNs¹ have been proposed for ECG peak detection, evaluated over the largest ECG benchmark dataset, the CPSC (2020) dataset, with more than one million ECG beats.
2) Thanks to the heterogeneous network model with generative neurons, 1-D Self-ONNs exhibit a superior learning capability and achieve state-of-the-art peak detection performance despite having half the network depth and more than four times fewer neurons.
3) The most important contribution of the proposed approach is the crucial reduction in missed detections (false negatives) of arrhythmia beats compared with [34] and other major peak detection methods.
4) Along with superior R-peak detection performance compared with the deep 1-D CNN [34], the proposed 1-D Self-ONNs enable a significant reduction in the depth and complexity of the deep CNN model proposed in [34].
5) Finally, this is the first article in which raw-vectorized BP formulations are presented for 1-D Self-ONNs, along with the corresponding computational complexity analysis.
The rest of the article is organized as follows. Section II presents 1-D Self-ONNs with generative neurons and formulates forward propagation and BP training. Section III outlines the methodology followed in this article. Section IV describes the ECG dataset and the experimental setup used for testing and evaluation. The same section also provides the experimental results and comparative evaluations against several state-of-the-art techniques using standard performance metrics, followed by a detailed complexity analysis. Finally, Section V concludes the article and suggests topics for future research.

II. 1-D SELF-ORGANIZED OPERATIONAL NEURAL NETWORKS
In this section, we first revisit how ONNs generalize the 1-D convolution operation. Then, the mathematical model of the proposed generative-neuron-based 1-D Self-ONN is presented. Finally, we discuss a simplification of the generative neuron that can significantly reduce the computational cost by enabling fast vectorized operations.
ONNs are derived from GOPs in the same way that CNNs are derived from MLPs, i.e., with two restrictions: limited connectivity and weight sharing. GOPs were proposed in [42] and [45] to replace the basic (linear) McCulloch-Pitts neuron model from the 1940s [41], aiming to address the well-known limitations and drawbacks of MLPs. Recently, GOPs have outperformed not only MLPs but even the latest variants of extreme learning machines (ELMs) [46]–[49]. Derived directly from GOPs, ONNs [50] are heterogeneous networks that encapsulate neurons with linear and nonlinear operators and, hence, have a closer link to biological systems. In brief, ONNs generalize the linear convolution of convolutional neurons by means of nodal and pool operators.
Let us consider the $k$th neuron in the $l$th layer of a 1-D CNN. For brevity, we assume the same convolution operation with unit stride and the required amount of zero padding. The output of this neuron can be formulated as follows:

$$x_k^l = b_k^l + \sum_{i=1}^{N_{l-1}} x_{ik}^l \tag{1}$$

where $b_k^l$ is the bias associated with this neuron, $N_{l-1}$ is the number of neurons in the $(l-1)$th layer, and $x_{ik}^l$ is defined as

$$x_{ik}^l = \mathrm{conv1D}\left(w_{ik}^l,\, y_i^{l-1}\right). \tag{2}$$

Here, $w_{ik}^l \in \mathbb{R}^K$ is the kernel connecting the $i$th neuron of the $(l-1)$th layer to the $k$th neuron of the $l$th layer, $x_{ik}^l \in \mathbb{R}^M$ is the corresponding input map, and $y_i^{l-1} \in \mathbb{R}^M$ is the output of the $i$th neuron of the $(l-1)$th layer. By definition, the convolution operation of (2) can be expressed as

$$x_{ik}^l(m) = \sum_{r=0}^{K-1} w_{ik}^l(r)\, y_i^{l-1}(m+r). \tag{3}$$

The core idea behind an operational neuron is a generalization of (3) as follows:

$$x_{ik}^l(m) = P_k^l\left(\psi_k^l\left(w_{ik}^l(0),\, y_i^{l-1}(m)\right), \ldots, \psi_k^l\left(w_{ik}^l(K-1),\, y_i^{l-1}(m+K-1)\right)\right) \tag{4}$$

where the nodal function $\psi_k^l(\cdot)$ maps each weight-input pair of the kernel window to a scalar, producing $K$ values, and the pool function $P_k^l(\cdot): \mathbb{R}^K \rightarrow \mathbb{R}$ reduces them to a single value; both are assigned to the $k$th neuron of the $l$th layer. In a heterogeneous ONN configuration, every neuron has uniquely assigned $\psi$ and $P$ operators. Owing to this, an ONN has the flexibility to incorporate any nonlinear transformation suitable for the given learning problem. However, hand-crafting a suitable library of possible operators and searching for the optimal one for each neuron introduces a significant overhead, which grows exponentially with network complexity. Moreover, the ideal operator for a given learning problem may not be expressible in terms of well-known functions. To resolve this key limitation, a composite nodal function is required that is iteratively created and tuned during BP. A straightforward choice would be to use a weighted combination of all operators in the operator set library and to learn the weights during training. However, such a formulation would be susceptible to instability because of the different dynamic ranges of the individual functions. In addition, it would still rely on the manual selection of suitable functions to populate the operator set library. Therefore, to formulate a nodal transformation that does not require any preselection or manual assignment of operators, we make use of Taylor-series-based function approximation.
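To make (1)–(4) concrete, the following minimal Python sketch (standard library only) implements the convolutional neuron of (1)–(3) and an operational neuron of (4) with pluggable nodal and pool callables. The function names are hypothetical, and zero-extending the input on the right is an assumed padding choice that lets (3) be applied literally; a centered "same" padding would only shift the indices.

```python
import math
from typing import Callable, List, Sequence


def zero_extend(y: Sequence[float], K: int) -> List[float]:
    # Append K - 1 zeros so y(m + r) exists for every m < M and r < K
    # (unit stride, output length M; the padding side is an assumption).
    return list(y) + [0.0] * (K - 1)


def conv1d(w: Sequence[float], y: Sequence[float]) -> List[float]:
    # Eq. (3): x(m) = sum_r w(r) * y(m + r), the correlation form used by DL libraries.
    K, M = len(w), len(y)
    yp = zero_extend(y, K)
    return [sum(w[r] * yp[m + r] for r in range(K)) for m in range(M)]


def operational(w: Sequence[float], y: Sequence[float],
                nodal: Callable[[float, float], float],
                pool: Callable[[List[float]], float]) -> List[float]:
    # Eq. (4): apply the nodal operator to every (weight, input) pair in the
    # kernel window, then reduce the K nodal outputs with the pool operator.
    K, M = len(w), len(y)
    yp = zero_extend(y, K)
    return [pool([nodal(w[r], yp[m + r]) for r in range(K)]) for m in range(M)]


def neuron_input(bias: float, maps: List[List[float]]) -> List[float]:
    # Eq. (1): x_k = b_k + sum_i x_ik over all incoming connections i.
    return [bias + sum(col) for col in zip(*maps)]


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    w = [0.5, -1.0, 2.0]
    print(conv1d(w, y))                                  # [4.5, 6.0, -2.5, 2.0]
    # Multiplication as nodal and summation as pool recovers the convolution:
    print(operational(w, y, lambda a, b: a * b, sum))   # [4.5, 6.0, -2.5, 2.0]
    # A hypothetical nonlinear operator set: sinusoidal nodal, max pool.
    print(operational(w, y, lambda a, b: math.sin(a * b), max))
```

The sketch also shows why an ONN needs an operator search: `nodal` and `pool` must be chosen by hand from some library before the neuron can be evaluated.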
The Taylor series expansion of an infinitely differentiable function $f(x)$ about a point $x = a$ is given as

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n. \tag{5}$$

The $Q$th-order truncated approximation of (5), formally known as the Taylor polynomial, takes the form of the following finite summation:

$$f(x)^{(Q)} = \sum_{n=0}^{Q} \frac{f^{(n)}(a)}{n!}(x-a)^n. \tag{6}$$

This formulation approximates $f(x)$ sufficiently well in the close vicinity of $a$. If the coefficients $f^{(n)}(a)/n!$ are tuned and the inputs are bounded, (6) can be used to generate a wide variety of nonlinear transformations. This is the key idea behind the generative neurons that form Self-ONNs. Specifically, in terms of the notation used in (4), with $a = 0$ and the constant term absorbed into the bias, the nodal transformation of a generative neuron takes the following general form:

$$\psi_k^l\left(w_{ik}^{l(Q)}(r),\, y_i^{l-1}(m+r)\right) = \sum_{q=1}^{Q} w_{ik}^{l(Q)}(r,q)\left(y_i^{l-1}(m+r)\right)^q. \tag{7}$$

In (7), $Q$ is a hyperparameter that controls the degree of the Taylor series approximation, and $w_{ik}^{l(Q)}$ is a learnable kernel of the network. A key difference of (7) compared with the convolutional (3) and operational (4) models is that $\psi_k^l$ is not fixed. Rather, it is a distinct operator applied to each individual output $y_i^{l-1}$, and thus, it requires $Q$ times more parameters. Therefore, the $K \times 1$ kernel vector $w_{ik}^l$ is replaced by a $K \times Q$ kernel matrix $w_{ik}^{l(Q)} \in \mathbb{R}^{K \times Q}$. The input map of the generative neuron, $x_{ik}^l$, can now be expressed as

$$x_{ik}^l(m) = P_k^l\left(\sum_{q=1}^{Q} w_{ik}^{l(Q)}(0,q)\left(y_i^{l-1}(m)\right)^q, \ldots, \sum_{q=1}^{Q} w_{ik}^{l(Q)}(K-1,q)\left(y_i^{l-1}(m+K-1)\right)^q\right). \tag{8}$$

During training, as $w_{ik}^{l(Q)}$ is iteratively tuned by BP, customized nodal transformation functions are generated through (8), each uniquely tailored to the $ik$th connection. This enhanced flexibility provides three key benefits. First, the need to manually define a list of suitable nodal operators and to search for the optimal operator for each neuron connection is naturally eliminated. Second, the heterogeneity is not limited to each neuron connection $i \rightarrow k$ but extends down to each kernel element, since each row $w_{ik}^{l(Q)}(r,:)$ defines its own polynomial nodal function. As shown in Fig. 2, such diversity is not achievable even with the flexible operational neuron model of ONNs. Third, in generative neurons, the heterogeneity is driven only by the values of the weights $w_{ik}^{l(Q)}$, while the core operations (multiplication and summation) are the same for all neurons in a layer, as shown in (8). Owing to this, unlike in ONNs, the generative neurons inside a Self-ONN layer can be parallelized much more efficiently, which leads to a considerable reduction in computational complexity and time. Moreover, a special case of (8) can also be expressed in terms of the widely used convolutional model.
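The role of the truncation order in (6) and of the per-element polynomial in (7) can be illustrated with a short sketch. It is only an illustration under stated assumptions: $f = \sin$ expanded about $a = 0$ is an arbitrary example target, and the inputs are restricted to $[-1, 1]$, which matches the input normalization described in Section III. The function names are hypothetical.

```python
import math
from typing import Sequence


def taylor_sin(x: float, Q: int) -> float:
    # Eq. (6) for f = sin about a = 0: only odd orders n contribute,
    # with f^(n)(0) = (-1)^((n - 1) / 2).
    return sum((-1) ** ((n - 1) // 2) * x ** n / math.factorial(n)
               for n in range(1, Q + 1, 2))


def max_abs_error(Q: int, xs: Sequence[float]) -> float:
    # Worst-case truncation error over the sampled (bounded) inputs.
    return max(abs(math.sin(x) - taylor_sin(x, Q)) for x in xs)


def generative_nodal(w_r: Sequence[float], y: float) -> float:
    # Eq. (7) for one kernel element r: psi(y) = sum_{q=1..Q} w(r, q) * y**q.
    # w_r[0] multiplies y**1, w_r[1] multiplies y**2, and so on; the constant
    # (q = 0) term is left to the neuron bias.
    return sum(c * y ** (q + 1) for q, c in enumerate(w_r))


if __name__ == "__main__":
    xs = [i / 100 for i in range(-100, 101)]          # bounded inputs in [-1, 1]
    for Q in (1, 3, 5, 7):
        print(Q, max_abs_error(Q, xs))                # error shrinks as Q grows
    # Two kernel elements of one neuron with different coefficient rows
    # realize different nonlinearities on the same input:
    print(generative_nodal([1.0, 0.0, -1 / 6], 0.5))  # cubic truncation of sin(0.5)
    print(generative_nodal([0.0, 1.0], 0.5))          # y**2 -> 0.25
```

In a generative neuron the coefficient rows are not derived from a known $f$ as here; they are free parameters learned by BP, so each kernel element can settle on whatever polynomial shape helps the task.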

A. Representation in Terms of Convolution
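If the pool operator $P_k^l$ is fixed to summation, $x_{ik}^l$ is defined as

$$x_{ik}^l(m) = \sum_{r=0}^{K-1} \sum_{q=1}^{Q} w_{ik}^{l(Q)}(r,q)\left(y_i^{l-1}(m+r)\right)^q. \tag{9}$$

Exploiting the commutativity of the summation operations in (9), we can alternatively write

$$x_{ik}^l(m) = \sum_{q=1}^{Q} \sum_{r=0}^{K-1} w_{ik}^{l(Q)}(r,q)\left(y_i^{l-1}(m+r)\right)^q. \tag{10}$$

Using (2) and (3), the formula in (10) can be further simplified as follows:

$$x_{ik}^l = \sum_{q=1}^{Q} \mathrm{conv1D}\left(w_{ik}^{l(Q)}(:,q),\, \left(y_i^{l-1}\right)^{\circ q}\right) \tag{11}$$

where $(\cdot)^{\circ q}$ denotes element-wise exponentiation. Hence, the formulation can be accomplished by applying $Q$ 1-D convolution operations. If $Q$ is set to 1, (11) reduces to the convolutional formulation of (3). Therefore, just as a CNN is a special case of an ONN corresponding to a specific operator set, it is also a special case of a Self-ONN with $Q = 1$ for all neurons.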
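The equivalence between the double sum of (9) and the $Q$ convolutions of (11) is easy to check numerically. The following sketch (standard library only, hypothetical names, kernel values chosen arbitrarily) evaluates both forms for a $K = 2$, $Q = 3$ kernel. It also confirms the $Q = 1$ special case against a plain convolution.

```python
from typing import List, Sequence


def conv1d(w: Sequence[float], y: Sequence[float]) -> List[float]:
    # Eq. (3), with the input zero-extended on the right (assumed padding side).
    K, M = len(w), len(y)
    yp = list(y) + [0.0] * (K - 1)
    return [sum(w[r] * yp[m + r] for r in range(K)) for m in range(M)]


def generative_direct(W: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    # Eq. (9): double sum over kernel elements r and orders q = 1..Q.
    # W is K x Q; W[r][q - 1] multiplies y(m + r) ** q.
    K, Q, M = len(W), len(W[0]), len(y)
    yp = list(y) + [0.0] * (K - 1)
    return [sum(W[r][q - 1] * yp[m + r] ** q
                for r in range(K) for q in range(1, Q + 1))
            for m in range(M)]


def generative_as_convs(W: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    # Eq. (11): one ordinary convolution per order q on the element-wise power
    # y**q; zero padding commutes with the power because 0.0 ** q == 0.0.
    K, Q = len(W), len(W[0])
    out = [0.0] * len(y)
    for q in range(1, Q + 1):
        column = [W[r][q - 1] for r in range(K)]
        powered = [v ** q for v in y]
        out = [a + b for a, b in zip(out, conv1d(column, powered))]
    return out


if __name__ == "__main__":
    W = [[0.5, 0.1, -0.2], [1.0, 0.0, 0.3]]   # hypothetical K = 2, Q = 3 kernel
    y = [0.2, -0.5, 0.9, 0.4]
    print(generative_direct(W, y))
    print(generative_as_convs(W, y))          # same values up to rounding
```

The convolution form is what makes a generative neuron easy to build on top of existing convolution primitives. Its cost, however, is $Q$ separate convolution calls, which the vectorized formulation below merges into a single matrix product.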

B. Vectorized Notation
Expressing explicit loops in terms of matrix and vector manipulations is the key idea behind vectorization, a major driving factor behind fast modern neural network implementations. In this section, we first show how vectorized notation can express the 1-D convolution operation inside a neuron. Afterward, the same principles are exploited to express the generative neuron formulation of (9) as a single matrix-vector product.
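First, we present an alternative formulation of the operation in (3). We introduce a transformation $\delta(\cdot, K)$ that rearranges $y_i^{l-1}$ so that the values inside each $K$-sample kernel window are stacked as rows, forming a matrix $Y_i^{l-1} \in \mathbb{R}^{M \times K}$. The process is visually depicted in Fig. 3 for $K = 3$ and mathematically expressed as

$$Y_i^{l-1} = \delta\left(y_i^{l-1}, K\right) = \begin{bmatrix} y_i^{l-1}(0) & y_i^{l-1}(1) & \cdots & y_i^{l-1}(K-1) \\ y_i^{l-1}(1) & y_i^{l-1}(2) & \cdots & y_i^{l-1}(K) \\ \vdots & \vdots & \ddots & \vdots \\ y_i^{l-1}(M-1) & y_i^{l-1}(M) & \cdots & y_i^{l-1}(M+K-2) \end{bmatrix} \tag{12}$$

i.e., $Y_i^{l-1}(m, r) = y_i^{l-1}(m+r)$, where samples beyond the signal boundary are zero-padded. Second, we construct a matrix $W_{ik}^l \in \mathbb{R}^{M \times K}$ whose rows are repeated copies of $\left(w_{ik}^l\right)^T$:

$$W_{ik}^l = \mathbf{1}_M \left(w_{ik}^l\right)^T \tag{13}$$

where $\mathbf{1}_M$ is the $M$-dimensional vector of ones. We now consider the Hadamard product of these two matrices:

$$Z_{ik}^l = Y_i^{l-1} \odot W_{ik}^l. \tag{14}$$

Summing across each row, we get

$$x_{ik}^l(m) = \sum_{r=0}^{K-1} Z_{ik}^l(m, r) \tag{15}$$

which is equivalent to (3). We also note that

$$\sum_{r=0}^{K-1} Z_{ik}^l(m, r) = Y_i^{l-1}(m,:)\, w_{ik}^l. \tag{16}$$

Therefore,

$$x_{ik}^l = Y_i^{l-1} w_{ik}^l. \tag{17}$$

Hence, the 1-D convolution operation can be represented as a single matrix-vector product. This operation lies at the heart of conventional explicit general matrix multiplication (GEMM)-based convolution implementations and enables efficient use of parallel computational resources, such as GPU cores.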
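A minimal sketch of (12)–(17) in plain Python follows. The reshuffling $\delta$ is the familiar im2col step; the sketch zero-extends the signal on the right, which is an assumed padding choice, and all function names are hypothetical. Both the Hadamard-and-row-sum route of (13)–(15) and the matrix-vector route of (17) reproduce the convolution of (3).

```python
from typing import List, Sequence


def delta(y: Sequence[float], K: int) -> List[List[float]]:
    # Eq. (12): row m is the window y(m), ..., y(m + K - 1), giving an M x K
    # matrix; the signal is zero-extended on the right (assumed padding).
    yp = list(y) + [0.0] * (K - 1)
    return [yp[m:m + K] for m in range(len(y))]


def conv_via_hadamard(y: Sequence[float], w: Sequence[float]) -> List[float]:
    Y = delta(y, len(w))
    W = [list(w) for _ in Y]                                       # (13): copies of w^T
    Z = [[a * b for a, b in zip(ry, rw)] for ry, rw in zip(Y, W)]  # (14): Hadamard
    return [sum(row) for row in Z]                                  # (15): row sums


def conv_via_matvec(y: Sequence[float], w: Sequence[float]) -> List[float]:
    # (16)-(17): the same result as one (M x K) matrix times (K,) vector product.
    Y = delta(y, len(w))
    return [sum(a * b for a, b in zip(row, w)) for row in Y]


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]
    w = [0.5, -1.0, 2.0]
    print(delta(y, 3))               # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 0.0], [4.0, 0.0, 0.0]]
    print(conv_via_hadamard(y, w))   # [4.5, 6.0, -2.5, 2.0]
    print(conv_via_matvec(y, w))     # [4.5, 6.0, -2.5, 2.0]
```

In a real GPU implementation, the explicit $W$ copies of (13) are never materialized. The row-sum of the Hadamard product is simply the matrix-vector product of (17), which a BLAS or cuBLAS GEMM routine evaluates in parallel.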

C. Forward Propagation Through a 1-D Self-ONN Neuron
Equation (11) shows how the Self-ONN formulation of (10) can be represented as a summation of $Q$ individual convolution operations. Moreover, as shown in (12)–(17), a convolution operation can be represented as a matrix-vector product. We now combine these two formulations to represent the transformation of (11) as a single convolution operation, and consequently as a single matrix-vector product, instead of $Q$ separate ones.
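We start by introducing

$$Y_i^{l-1(Q)} = \left[\, Y_i^{l-1},\; \left(Y_i^{l-1}\right)^{\circ 2},\; \ldots,\; \left(Y_i^{l-1}\right)^{\circ Q} \,\right] \in \mathbb{R}^{M \times KQ} \tag{18}$$

where $(\cdot)^{\circ n}$ is the Hadamard exponentiation operator. The $m$th row of $Y_i^{l-1(Q)}$ can be expressed as

$$Y_i^{l-1(Q)}(m,:) = \left[\, y_i^{l-1}(m), \ldots, y_i^{l-1}(m+K-1),\; \left(y_i^{l-1}(m)\right)^2, \ldots, \left(y_i^{l-1}(m+K-1)\right)^2,\; \ldots,\; \left(y_i^{l-1}(m+K-1)\right)^Q \,\right] \tag{19}$$

i.e., $Y_i^{l-1(Q)}(m, (q-1)K + r) = \left(y_i^{l-1}(m+r)\right)^q$. Moreover, we construct the vector $\mathbf{w}_{ik}^{l(Q)} \in \mathbb{R}^{KQ}$ by stacking the $Q$ columns of $w_{ik}^{l(Q)}$ along the row dimension, as expressed in the following:

$$\mathbf{w}_{ik}^{l(Q)} = \begin{bmatrix} w_{ik}^{l(Q)}(:,1) \\ \vdots \\ w_{ik}^{l(Q)}(:,Q) \end{bmatrix}, \tag{20}$$

together with the matrix whose rows are repeated copies of its transpose:

$$W_{ik}^{l(Q)} = \mathbf{1}_M \left(\mathbf{w}_{ik}^{l(Q)}\right)^T. \tag{21}$$

Taking the Hadamard product of $Y_i^{l-1(Q)}$ and $W_{ik}^{l(Q)}$ gives

$$Z_{ik}^{l(Q)} = Y_i^{l-1(Q)} \odot W_{ik}^{l(Q)}. \tag{22}$$

Summing the $m$th row of (22) yields

$$\sum_{j=0}^{KQ-1} Z_{ik}^{l(Q)}(m, j) = \sum_{q=1}^{Q} \sum_{r=0}^{K-1} w_{ik}^{l(Q)}(r,q)\left(y_i^{l-1}(m+r)\right)^q. \tag{23}$$

This is equivalent to (10). Therefore, one can now express

$$x_{ik}^l(m) = \sum_{j=0}^{KQ-1} Z_{ik}^{l(Q)}(m, j). \tag{24}$$

In addition, analogously to (16), we can write

$$\sum_{j=0}^{KQ-1} Z_{ik}^{l(Q)}(m, j) = Y_i^{l-1(Q)}(m,:)\, \mathbf{w}_{ik}^{l(Q)}. \tag{25}$$

Combining (24) and (25) gives

$$x_{ik}^l(m) = Y_i^{l-1(Q)}(m,:)\, \mathbf{w}_{ik}^{l(Q)}. \tag{26}$$

Finally, stacking (26) over all $m$, we can infer that

$$x_{ik}^l = Y_i^{l-1(Q)}\, \mathbf{w}_{ik}^{l(Q)}. \tag{27}$$

The formulation of (27) provides a key computational benefit, as forward propagation through the generative neuron is accomplished by a single matrix-vector multiplication. Hence, if the computational cost and memory requirement of constructing the matrices are considered negligible, a generative neuron, like a convolutional neuron, is computed by a single matrix-vector product. The only difference is that its matrix has $KQ$ instead of $K$ columns: it requires $Q$ times more multiply-accumulate operations, but these are computed by the same highly parallel GEMM routines. Finally, to complete forward propagation, using (1), we can express

$$x_k^l = b_k^l + \sum_{i=1}^{N_{l-1}} Y_i^{l-1(Q)}\, \mathbf{w}_{ik}^{l(Q)}. \tag{28}$$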
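The vectorized forward pass of (27) and (28) can be sketched as follows. The sketch builds the matrix of stacked Hadamard powers, flattens each $K \times Q$ kernel column by column in the matching order, and performs one matrix-vector product per input map. It checks the result against the explicit double sum of (9). This is a plain-Python illustration with hypothetical names and assumed right-side zero padding, not the FastONN implementation used in the experiments.

```python
from typing import List, Sequence


def delta(y: Sequence[float], K: int) -> List[List[float]]:
    # Eq. (12): Y(m, r) = y(m + r), zero-extended on the right (assumed padding).
    yp = list(y) + [0.0] * (K - 1)
    return [yp[m:m + K] for m in range(len(y))]


def delta_q(y: Sequence[float], K: int, Q: int) -> List[List[float]]:
    # Hadamard powers of Y side by side: entry (m, (q-1)K + r) = y(m + r) ** q.
    return [[v ** q for q in range(1, Q + 1) for v in row] for row in delta(y, K)]


def stack_columns(W: Sequence[Sequence[float]]) -> List[float]:
    # Stack the Q columns of a K x Q kernel into one KQ-vector, matching delta_q.
    K, Q = len(W), len(W[0])
    return [W[r][q] for q in range(Q) for r in range(K)]


def generative_neuron(ys: List[List[float]], Ws: List[List[List[float]]],
                      bias: float) -> List[float]:
    # (27)-(28): x_k = b_k + sum_i Y_i^(Q) w_ik^(Q); one mat-vec per input map,
    # each output sample costing K*Q multiply-accumulates per input map.
    K, Q = len(Ws[0]), len(Ws[0][0])
    x = [bias] * len(ys[0])
    for y, W in zip(ys, Ws):
        YQ, wq = delta_q(y, K, Q), stack_columns(W)
        x = [xm + sum(a * b for a, b in zip(row, wq)) for xm, row in zip(x, YQ)]
    return x


def generative_direct(W: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    # Reference: eq. (9) with explicit loops; W[r][q - 1] multiplies y(m + r) ** q.
    K, Q, M = len(W), len(W[0]), len(y)
    yp = list(y) + [0.0] * (K - 1)
    return [sum(W[r][q - 1] * yp[m + r] ** q for r in range(K) for q in range(1, Q + 1))
            for m in range(M)]


if __name__ == "__main__":
    ys = [[0.2, -0.5, 0.9, 0.4], [0.1, 0.3, -0.7, 0.6]]   # two hypothetical input maps
    Ws = [[[0.5, 0.1, -0.2], [1.0, 0.0, 0.3]],            # K = 2, Q = 3 kernels
          [[-0.4, 0.2, 0.1], [0.3, -0.1, 0.0]]]
    print(generative_neuron(ys, Ws, 0.1))
```

The ordering convention matters: the column index $(q-1)K + r$ produced by `delta_q` must agree with the stacking order used by `stack_columns`, or the product silently pairs powers with the wrong weights.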

D. Backpropagation
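We now derive the BP formulation for the generative neuron model of the 1-D Self-ONN using the vectorized notation introduced in Section II-C. To backpropagate the error through the generative neuron, given the derivative of the loss with respect to the neuron's output, $\partial L/\partial x_{ik}^l$ (which equals $\partial L/\partial x_k^l$ by (28)), we aim to find $\partial L/\partial y_i^{l-1}$, $\partial L/\partial \mathbf{w}_{ik}^{l(Q)}$, and $\partial L/\partial b_k^l$. We start by taking the derivative of (27) with respect to $Y_i^{l-1(Q)}$ as follows:

$$\frac{\partial x_{ik}^l(m)}{\partial Y_i^{l-1(Q)}(m,:)} = \left(\mathbf{w}_{ik}^{l(Q)}\right)^T. \tag{29}$$

Using (29), we can now apply the chain rule to get

$$\frac{\partial L}{\partial Y_i^{l-1(Q)}} = \frac{\partial L}{\partial x_{ik}^l}\left(\mathbf{w}_{ik}^{l(Q)}\right)^T \in \mathbb{R}^{M \times KQ}. \tag{30}$$

Given $\partial L/\partial Y_i^{l-1(Q)}$, we aim to find the derivative of the loss with respect to $Y_i^{l-1}$, which is obtained from the previous layer's output $y_i^{l-1}$ through (12). We know from (19) that

$$Y_i^{l-1(Q)}\left(m, (q-1)K + r\right) = \left(Y_i^{l-1}(m, r)\right)^q \tag{31}$$

and hence,

$$\frac{\partial Y_i^{l-1(Q)}\left(m, (q-1)K + r\right)}{\partial Y_i^{l-1}(m, r)} = q\left(Y_i^{l-1}(m, r)\right)^{q-1}. \tag{32}$$

Using this, we can write

$$\frac{\partial L}{\partial Y_i^{l-1}(m, r)} = \sum_{q=1}^{Q} \frac{\partial L}{\partial Y_i^{l-1(Q)}\left(m, (q-1)K + r\right)}\, q\left(Y_i^{l-1}(m, r)\right)^{q-1}. \tag{33}$$

Finally, we can calculate the derivative of the loss with respect to $y_i^{l-1}$ as follows:

$$\frac{\partial L}{\partial y_i^{l-1}(\tilde{m})} = \sum_{m=0}^{M-1} \sum_{r=0}^{K-1} \frac{\partial L}{\partial Y_i^{l-1}(m, r)}\, \frac{\partial Y_i^{l-1}(m, r)}{\partial y_i^{l-1}(\tilde{m})}. \tag{34}$$

From (34) and (12), we can see that the row derivative $\partial Y_i^{l-1}(m,:)/\partial y_i^{l-1}(\tilde{m})$ can be nonzero only when $m \le \tilde{m} \le m + K - 1$. Moreover, as there are no repeated entries in any row of $Y_i^{l-1}$, at most one element of this row derivative is nonzero; it equals 1 and is located at column $r = \tilde{m} - m$, since $Y_i^{l-1}(m, r) = y_i^{l-1}(m+r)$. Based on these two observations, we can infer the following:

$$\frac{\partial L}{\partial y_i^{l-1}(\tilde{m})} = \sum_{r=0}^{\min(K-1,\, \tilde{m})} \frac{\partial L}{\partial Y_i^{l-1}(\tilde{m} - r, r)}. \tag{35}$$

The only other partial derivative needed to complete BP is that of the loss with respect to the weights of the neuron, $\mathbf{w}_{ik}^{l(Q)}$. Again, by the chain rule, we can write

$$\frac{\partial L}{\partial \mathbf{w}_{ik}^{l(Q)}} = \sum_{m=0}^{M-1} \frac{\partial L}{\partial x_{ik}^l(m)}\, \frac{\partial x_{ik}^l(m)}{\partial \mathbf{w}_{ik}^{l(Q)}} \tag{36}$$

where $\partial x_{ik}^l(m)/\partial \mathbf{w}_{ik}^{l(Q)}$ can be calculated by taking the derivative of (27) with respect to $\mathbf{w}_{ik}^{l(Q)}$ as follows:

$$\frac{\partial x_{ik}^l(m)}{\partial \mathbf{w}_{ik}^{l(Q)}} = \left(Y_i^{l-1(Q)}(m,:)\right)^T \;\Rightarrow\; \frac{\partial L}{\partial \mathbf{w}_{ik}^{l(Q)}} = \left(Y_i^{l-1(Q)}\right)^T \frac{\partial L}{\partial x_{ik}^l}. \tag{37}$$

For the bias, we can use (28) to write

$$\frac{\partial L}{\partial b_k^l} = \sum_{m=0}^{M-1} \frac{\partial L}{\partial x_k^l(m)}. \tag{38}$$

Finally, assuming stochastic gradient descent (SGD)-based optimization, the weights and biases can be updated as follows:

$$\mathbf{w}_{ik}^{l(Q)}(t+1) = \mathbf{w}_{ik}^{l(Q)}(t) - \varepsilon(t)\, \frac{\partial L}{\partial \mathbf{w}_{ik}^{l(Q)}} \tag{39}$$

$$b_k^l(t+1) = b_k^l(t) - \varepsilon(t)\, \frac{\partial L}{\partial b_k^l} \tag{40}$$

where $\varepsilon(t)$ is the learning factor at iteration $t$.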
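The BP steps above can be verified numerically on a single generative connection. The following sketch implements the forward pass of (27) and (28) for one input map, then computes the weight, bias, and input gradients by the chain-rule steps derived above. It compares each gradient with a central finite difference. Assumptions: the squared-error loss is hypothetical and serves only to produce $\partial L/\partial x$, the padding is right-side zeros, and all names are illustrative.

```python
from typing import Callable, List, Sequence


def delta(y: Sequence[float], K: int) -> List[List[float]]:
    # Eq. (12): Y(m, r) = y(m + r), zero-extended on the right (assumed padding).
    yp = list(y) + [0.0] * (K - 1)
    return [yp[m:m + K] for m in range(len(y))]


def delta_q(y: Sequence[float], K: int, Q: int) -> List[List[float]]:
    # Stacked Hadamard powers: entry (m, (q-1)K + r) = Y(m, r) ** q.
    return [[v ** q for q in range(1, Q + 1) for v in row] for row in delta(y, K)]


def forward(y, w, b, K, Q) -> List[float]:
    # Single input map: x = Y^(Q) w^(Q) + b, as in (27)-(28).
    return [b + sum(a * c for a, c in zip(row, w)) for row in delta_q(y, K, Q)]


def loss(y, w, b, t, K, Q) -> float:
    # Hypothetical squared-error loss, used only to exercise the gradients.
    return 0.5 * sum((xm - tm) ** 2 for xm, tm in zip(forward(y, w, b, K, Q), t))


def backward(y, w, b, t, K, Q):
    M = len(y)
    g = [xm - tm for xm, tm in zip(forward(y, w, b, K, Q), t)]           # dL/dx
    YQ, Y = delta_q(y, K, Q), delta(y, K)
    dw = [sum(g[m] * YQ[m][j] for m in range(M)) for j in range(K * Q)]  # (36)-(37)
    db = sum(g)                                                          # (38)
    dy = [0.0] * M
    for m in range(M):
        for r in range(K):
            if m + r >= M:          # zero-padding entries carry no gradient to y
                continue
            # (30) and (33): g(m) * w((q-1)K + r) * q * Y(m, r)**(q-1), summed over q.
            dY = sum(g[m] * w[(q - 1) * K + r] * q * Y[m][r] ** (q - 1)
                     for q in range(1, Q + 1))
            dy[m + r] += dY         # (35): entry (m, r) holds y(m + r)
    return dw, db, dy


def numeric_grad(f: Callable[[List[float]], float], v: Sequence[float],
                 i: int, h: float = 1e-6) -> float:
    # Central finite difference of f with respect to component i of v.
    vp, vm = list(v), list(v)
    vp[i] += h
    vm[i] -= h
    return (f(vp) - f(vm)) / (2 * h)


if __name__ == "__main__":
    K, Q = 3, 3
    y = [0.3, -0.7, 0.5, 0.1, -0.2]
    w = [0.2, -0.1, 0.05, 0.3, 0.0, -0.2, 0.1, 0.15, -0.05]   # stacked K*Q weights
    b, t = 0.05, [0.1, 0.0, -0.1, 0.2, 0.0]
    dw, db, dy = backward(y, w, b, t, K, Q)
    print(max(abs(dw[j] - numeric_grad(lambda v: loss(y, v, b, t, K, Q), w, j))
              for j in range(K * Q)))                          # close to zero
```

The update rules (39) and (40) then amount to `w - lr * dw` and `b - lr * db`; in the experiments the Adam optimizer is used instead of plain SGD.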

III. METHODOLOGY
In this study, we aim to achieve superior R-peak detection performance compared with the state-of-the-art method in [34], which is based on a 12-layer 1-D CNN with 448 neurons, while significantly reducing the network complexity and depth. Therefore, as shown in Fig. 4, the same R-peak detection approach as in [34] is followed for a fair comparison. However, the architectural complexity of the network is reduced by using a six-layer 1-D Self-ONN with fewer than 100 neurons. For the same reason, the network model of [34], a U-Net, is also used for the Self-ONNs. As shown in Fig. 4, the peak detection problem is converted into a regression task. This task learns the peak locations by transforming the original (normalized) ECG segment into a sequence of five-sample-wide pulses, where the center of each pulse corresponds to an R-peak location. To evaluate its effect, the order of the Taylor polynomials, Q, is varied in {3, 5, 7} over three networks. Each 20-s (8000-sample) ECG segment is linearly normalized to the range [−1, 1] and then used as the input of the 1-D Self-ONNs. The same optimizer (Adam) is used to train the networks with the same number of epochs and hyperparameters. The details of the experimental setup and network parameters are presented in Section IV.
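The input and target preparation described above can be sketched as follows. Assumptions: "linearly normalized" is read here as min-max scaling to [−1, 1], and the peak-extraction step (taking the center of each run of outputs above a threshold) is a hypothetical post-processing rule, not necessarily the one used in [34]. Only the 400-Hz rate, the 20-s segment length, and the five-sample pulse width come from the text.

```python
from typing import List, Optional, Sequence

FS_HZ = 400                        # CPSC sampling frequency
SEGMENT_S = 20                     # segment duration used in the text
SEGMENT_LEN = FS_HZ * SEGMENT_S    # 8000 samples per input segment


def normalize(segment: Sequence[float]) -> List[float]:
    # Linear (min-max) mapping to [-1, 1]; the min-max choice is an assumption.
    lo, hi = min(segment), max(segment)
    if hi == lo:
        return [0.0] * len(segment)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in segment]


def pulse_target(r_peaks: Sequence[int], length: int, width: int = 5) -> List[float]:
    # Regression target: a `width`-sample pulse of ones centred on each R-peak
    # (pulses are clipped at the segment boundaries).
    half = width // 2
    target = [0.0] * length
    for p in r_peaks:
        for n in range(max(0, p - half), min(length, p + half + 1)):
            target[n] = 1.0
    return target


def pulses_to_peaks(output: Sequence[float], threshold: float = 0.5) -> List[int]:
    # Hypothetical post-processing: report the centre of each run above threshold.
    peaks: List[int] = []
    start: Optional[int] = None
    for n, v in enumerate(list(output) + [float("-inf")]):   # sentinel closes last run
        if v > threshold and start is None:
            start = n
        elif v <= threshold and start is not None:
            peaks.append((start + n - 1) // 2)
            start = None
    return peaks


if __name__ == "__main__":
    target = pulse_target([3, 10], 14)
    print(target)
    print(pulses_to_peaks(target))   # [3, 10]
```

With the binary cross-entropy loss used in Section IV, the network output can be read as a per-sample probability of lying inside a pulse, which is why a simple threshold is a natural (if hypothetical) way to recover peak positions.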

IV. EXPERIMENTAL RESULTS
In this section, we first introduce the benchmark ECG dataset used in this study, CPSC 2020, and then present the experimental setup used to test and evaluate the proposed R-peak detector based on 1-D Self-ONNs. An extensive set of R-peak detection experiments and comparative evaluations against recent methods over the benchmark CPSC-DB Holter dataset is presented next. Finally, the computational complexity of 1-D Self-ONNs and 1-D CNNs is analyzed in detail.

A. China Physiological Signal Challenge-2020
The CPSC-DB dataset consists of ten single-lead ECG recordings collected from arrhythmia patients, each lasting about 24 h. Table I presents the patient information in detail [41]. The other properties of the CPSC-DB (Holter) ECG dataset can be summarized as follows. All ECG data were acquired by a unified wearable ECG device with a sampling frequency of 400 Hz, and the total number of beats is 1 026 095. The recordings include irregular heart rhythms, supraventricular premature beats (SPBs or S beats), and premature ventricular contraction (PVC or V) beats. All recordings are provided in MATLAB format with the corresponding S and V beat annotations. The R-peak of each ECG cycle was annotated by a team of biomedical researchers. Because CPSC-DB is a real-world Holter dataset containing numerous ECG contaminations and artifacts, it is well suited to demonstrating the robustness of an R-peak detector against noise and other artifacts.

B. Experimental Setup
For all experiments, as in [34], shallow training is employed, with the number of BP epochs limited to 50. We set the initial learning rate, $\varepsilon(0)$, to $10^{-3}$, and the Adam optimizer is used to minimize the binary cross-entropy (BCE) loss. We performed three individual BP runs for training over each patient's data and report the best detection performance for the comparative evaluations.
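The evaluation protocol used in this section can be sketched as follows. Each detector is trained on nine patients and tested on the held-out one. Precision (Ppr), sensitivity (Sen), and F1 are then computed from TP, FP, and FN counters cumulated over the ten folds, with TN omitted. The function names are hypothetical. The "relative loss" helper encodes an assumed reading of the performance-loss measure used in the comparisons: the percentage of a method's misses that the best method avoids.

```python
from typing import List, Tuple


def leave_one_patient_out(patients: List[int]) -> List[Tuple[List[int], int]]:
    # Each patient is held out once for testing; the others form the training set.
    return [([p for p in patients if p != test], test) for test in patients]


def f1_score(ppr: float, sen: float) -> float:
    # Harmonic mean of precision (Ppr) and recall (Sen).
    return 2 * ppr * sen / (ppr + sen)


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    # Ppr = TP/(TP+FP), Sen = TP/(TP+FN); TN is undefined for peak detection.
    ppr = tp / (tp + fp)
    sen = tp / (tp + fn)
    return ppr, sen, f1_score(ppr, sen)


def relative_loss(miss_x: int, miss_best: int) -> float:
    # Assumed reading of the "performance loss": percentage of method X's
    # misses (FNs or FPs) that the best method avoids.
    return 100.0 * (miss_x - miss_best) / miss_x


if __name__ == "__main__":
    folds = leave_one_patient_out(list(range(1, 11)))
    print(len(folds), folds[0])
    # Consistency check with the Ppr and Sen reported for 1-D Self-ONNs:
    print(round(100 * f1_score(0.9842, 0.9979), 2))   # 99.1
```

With the Ppr (98.42%) and Sen (99.79%) reported in the abstract, `f1_score` gives about 99.10%, which matches the reported F1-score. Cumulating the counters before computing the ratios weights each patient by the number of beats, rather than averaging ten per-patient scores.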
The proposed R-peak detector based on 1-D Self-ONNs is compared against two detectors based on 1-D CNNs: 1) the deep 1-D CNN from [34], and 2) a 1-D CNN with the same network configuration as the 1-D Self-ONN. Moreover, comparative evaluations are also performed against five earlier state-of-the-art detectors from the literature [23], [56]–[59]. R-peak detection experiments are performed over the CPSC dataset. Each detector is trained over nine patients and tested over the remaining (unseen) patient. Therefore, tenfold cross validation is performed for the comparative evaluations over all ten patients, with 1 026 095 beats in total. From the cumulated TP, FP, and FN counters, the standard performance metrics are computed: Precision or Positive Predictivity (Ppr), Recall or Sensitivity (Sen), and the F1-score, which is the harmonic mean of Precision and Recall. True negatives are not defined in peak detection (TN = 0) and are therefore omitted. The expressions of these performance metrics in terms of the hit/miss counters are as follows:

$$\mathrm{Ppr} = \frac{TP}{TP + FP}, \qquad \mathrm{Sen} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2\,\mathrm{Ppr}\,\mathrm{Sen}}{\mathrm{Ppr} + \mathrm{Sen}}. \tag{41}$$

C. Peak Detection Performance Evaluation
R-peak detection results over the entire CPSC dataset are presented in Table II. The (performance) loss (%) in FN (or FP, or any other "miss" metric) of a method X compared with the method with the minimum FN can be defined as follows:

$$\mathrm{Loss}_{FN}(X) = \frac{FN(X) - FN_{\min}}{FN(X)} \times 100\%. \tag{42}$$

It is clear that 1-D Self-ONNs achieve a significant performance gap over the 1-D CNN with the same number of learning units (neurons) and the same depth. Overall, Self-ONNs with Q = 3 and Q = 5 reduced the FPs and FNs in peak detection by more than 43% and 64%, respectively. Even compared with the state-of-the-art 1-D CNN configuration in [34], which has twice the depth and more than four times as many neurons, the detection errors (FPs and FNs) were reduced by more than 7% and 37%, respectively. Such a substantial reduction, especially in FNs, over both 1-D CNN configurations shows that 1-D Self-ONNs indeed have a superior learning capability for detecting the actual peaks. The performance loss is significantly higher for the earlier methods [23], [56]–[59]. Finally, the results indicate that the best performance is obtained by 1-D Self-ONNs with either Q = 3 or Q = 5. We found the former setting to be the better choice because of its higher F1-score, significantly lower FPs, and lower computational cost. In peak detection, detecting arrhythmic beats, i.e., supraventricular premature (S) and premature ventricular contraction (V) beats, is crucial, since R-peak detection precedes automated ECG beat classification and arrhythmia detection. These tasks cannot succeed if the peak detector misses an abnormal beat. Therefore, we next focus on the peak detection performance over the arrhythmia (S and V) beats.
As shown in Table III, 1-D Self-ONNs once again outperform both 1-D CNN configurations in detecting both S and V beats. This time, the performance loss of even the deep 1-D CNN exceeds 52% and 58% for the S and V beats, respectively, and the gap widens further for the 1-D CNN with the same configuration. Finally, the loss of the earlier methods [23], [56]–[59] exceeds 90%, which shows that these methods are not robust for Holter ECG.
For visual comparison, Fig. 5 shows the R-peak detection results of 1-D Self-ONNs and the two 1-D CNN configurations over four ECG segments. These typical results show that both 1-D CNN models yield numerous FPs and FNs, especially when the R-peak is corrupted by strong noise or when an abrupt change occurs in its close vicinity, e.g., abrupt baseline shifts or occasional voltage glitches. Notably, the deep CNN model [34] missed both arrhythmic peaks in Fig. 5(d), which is a substantial error.

D. Computational Complexity
In this section, we provide the formulation for calculating the total number of multiply-accumulate operations (MACs) and the total number of parameters (PARs) of a generative neuron of a 1-D Self-ONN. To calculate the number of trainable parameters, recall from Section II that the generative neuron has Q times more learnable parameters per kernel connection. Cumulatively, the number of trainable parameters, $n_k^l$, of the $k$th neuron of the $l$th layer is given by the following formulation:

$$n_k^l = N_{l-1}\, K_k^l\, Q_k^l + 1. \tag{43}$$

In (43), $N_{l-1}$ is the number of neurons in layer $l-1$, $K_k^l$ is the kernel size used in the neuron, $Q_k^l$ is the approximation order selected for this neuron, and the last term accounts for the bias. Finally, to calculate the total number of MAC operations, note from (26) that producing a single element of the output $x_{ik}^l$ requires $K_k^l Q_k^l$ MAC operations for each output map $y_i^{l-1}$ of the previous layer. Generalizing this, we can write the following:

$$\mathrm{MAC}_k^l = \left|x_k^l\right|\, N_{l-1}\, K_k^l\, Q_k^l \tag{44}$$

where $|\cdot|$ is the cardinality operator, i.e., the number of samples in the output map. For notational convenience, the bias term and the cost of the Hadamard exponentiation are omitted from (44). We implemented the proposed 1-D Self-ONN using Python and the FastONN [61] library, which is based on PyTorch [62]. All experiments reported in this article were run on a 2.2-GHz Intel Core i7-8750H with 16 GB of RAM and an NVIDIA GeForce GTX 1060 graphics card. Both training and evaluation of the detectors were processed by CUDA kernels. Along with the average time complexity, we use (43) and (44) to report the overall PARs and MACs for both network models in Table IV. As shown in Table IV, there is a significant gap between the 1-D Self-ONN and deep CNN models in terms of the total number of parameters and the average computation time. The Self-ONN network requires the largest number of multiply-accumulate operations. However, a large proportion of these operations are independent, and thus, parallelizable. Therefore, an efficient implementation using the formulation of (27) results in an actual running time that is lower than that of the deep CNN and only marginally (3.1%) higher than that of the equivalent 1-D CNN. This overhead is negligible considering the crucial gain in detection performance.
¹An optimized PyTorch implementation of Self-ONNs is publicly shared in [63].

V. CONCLUSION
In this study, 1-D Self-ONNs are proposed for R-peak detection, especially for poor-quality ECG signals, e.g., those acquired by Holter monitors and low-power mobile ECG sensors. The primary goal is to achieve state-of-the-art R-peak detection performance with high computational efficiency for real-time applications on such low-power ECG devices. As a new-generation network model, a Self-ONN is a highly heterogeneous network composed of generative neurons. This yields the crucial advantage of optimizing the nodal operator function of each kernel element; thus, Self-ONNs can achieve the utmost heterogeneity, which maximizes network diversity and learning performance. As a result, the traditional weight optimization of conventional CNNs is turned into an operator generation process via optimization. Despite its highly nonlinear kernel elements, each Self-ONN layer can still be implemented by a single 1-D convolution, which allows a parallelized implementation similar to that of conventional CNNs.
We performed tenfold comparative evaluations over the benchmark CPSC dataset with more than 1M beats. Against the current state-of-the-art method proposed in [34], which uses a 12-layer CNN, 1-D Self-ONNs significantly reduced both FPs and FNs despite having half the depth and more than four times fewer neurons. Against the 1-D CNN with the equivalent configuration, the performance gap widens further. The most crucial advantage is that, compared with the deep CNN, 1-D Self-ONNs reduce the misdetections of S and V arrhythmia beats by more than 52% and 58%, respectively. Finally, the 1-D Self-ONN model used in this study is computationally more efficient than the deep 1-D CNN. Thus, especially for low-power mobile devices such as Holter monitors, the proposed approach can conveniently be used as a real-time R-peak detector. The optimized implementation of the proposed peak detector and the benchmark CPSC dataset are shared in [64].

Fig. 2. Visual comparison of the nodal transformation profiles entailed by the kernel of a convolutional neuron, an operational neuron, and the proposed generative neuron of order Q. The generative neuron model enables enhanced nonlinearity and heterogeneity within the kernels.

Fig. 3. Reshuffling operation used to convert $y_i^{l-1}$ to $Y_i^{l-1}$.

Fig. 5. (a)–(d) Visualization of R-peak detection results over four ECG segments. For each segment, the rows show (top to bottom) the ground truth, the output of the 12-layer CNN [34], the output of the six-layer CNN version of [34], and the output of the proposed 1-D Self-ONN. Typical FPs and FNs are visible in the detections of both 1-D CNN models. Ground-truth peak locations are shown in the top plot, where red and black dashed lines correspond to normal and arrhythmic peak locations, respectively.

TABLE I. PATIENT INFORMATION ON THE ECG DATA FROM CPSC-DB

TABLE II. OVERALL PEAK DETECTION PERFORMANCE OF THE CLASSIFIERS. 1-D CNN (*) AND 1-D SELF-ONN HAVE THE SAME CONFIGURATION. THE BEST RESULTS ARE PRESENTED IN BOLD.

TABLE III. PEAK DETECTION PERFORMANCE OF THE CLASSIFIERS OVER THE ARRHYTHMIA (S AND V) BEATS. 1-D CNN (*) AND 1-D SELF-ONN HAVE THE SAME CONFIGURATION. THE BEST RESULTS ARE PRESENTED IN BOLD.

TABLE IV. NETWORK MODELS AND THEIR COMPUTATIONAL COMPLEXITIES. THE AVERAGE TIME CORRESPONDS TO THE TIME TO DETECT THE R-PEAK LOCATIONS OF A 20-s ECG SEGMENT.
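The counting rules of Section IV-D for a generative neuron can be turned into a small calculator. Each incoming connection carries $K \cdot Q$ weights, and each output sample costs $K \cdot Q$ MACs per input map, with the bias and the Hadamard powers left out of the MAC count. The sketch makes two assumptions. It counts one bias parameter per neuron, and the layer description used in the example is purely hypothetical; it is not the network configuration behind Table IV.

```python
from typing import List, NamedTuple, Tuple


class GenLayer(NamedTuple):
    # Hypothetical description of one 1-D Self-ONN layer.
    n_in: int    # N_{l-1}: number of neurons (maps) in the previous layer
    n_out: int   # number of neurons in this layer
    K: int       # kernel size
    Q: int       # Taylor order (Q = 1 gives an ordinary convolutional layer)
    M: int       # output map length |x_k^l| in samples


def neuron_params(layer: GenLayer) -> int:
    # Q * K weights per incoming connection, plus one bias per neuron (assumed).
    return layer.n_in * layer.K * layer.Q + 1


def neuron_macs(layer: GenLayer) -> int:
    # K * Q MACs per output sample per input map; bias and powers are ignored.
    return layer.M * layer.n_in * layer.K * layer.Q


def network_totals(layers: List[GenLayer]) -> Tuple[int, int]:
    # Sum PARs and MACs over all neurons of all layers.
    pars = sum(ly.n_out * neuron_params(ly) for ly in layers)
    macs = sum(ly.n_out * neuron_macs(ly) for ly in layers)
    return pars, macs


if __name__ == "__main__":
    first = GenLayer(n_in=1, n_out=16, K=3, Q=3, M=8000)   # hypothetical first layer
    print(network_totals([first]))                          # (160, 1152000)
    print(network_totals([first._replace(Q=1)]))            # CNN with the same shape
```

For identical layer shapes, the MAC count grows linearly with Q. This is why the Self-ONN can require more MACs than a CNN of the same configuration while still running nearly as fast, provided the extra products are executed in parallel through the single matrix-vector formulation.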