Shock Decision Algorithms for Automated External Defibrillators Based on Convolutional Networks

Automated External Defibrillators (AED) incorporate a shock decision algorithm that analyzes the patient’s electrocardiogram (EKG), allowing lay persons to provide life saving defibrillation therapy to out-of-hospital cardiac arrest (OHCA) patients. The most accurate shock decision algorithms are based on deep learning, but these algorithms have not been trained and tested using OHCA data. In this study we propose novel deep learning architectures for shock decision algorithms based on convolutional and residual networks. EKG electronic recordings from a cohort of 852 OHCA cases (4216 AED EKG analyses) were used in the study. EKGs were annotated by a pool of six expert clinicians resulting in 3718 nonshockable and 498 shockable EKGs. Data were partitioned patient wise in a stratified way to train and test the models using 10-fold cross validation, and the procedure was repeated 100 times for statistical evaluation. Performance was assessed using sensitivity (shockable), specificity (non-shockable) and accuracy, and the analysis was conducted for EKG segments of decreasing duration. The best model had median (interdecile range) accuracies of 98.6 (98.5–98.7)%, 98.4 (98.2–98.6)%, 98.2 (97.9–98.4)%, and 97.6 (97.4–97.8)%, for 4, 3, 2 and 1 second EKG segments, respectively. The minimum 90% sensitivity and 95% specificity requirements established by the American Heart Association for shock decision algorithms were met, and the best model presented significantly greater accuracy (p<0.05 McNemar test) than previous deep learning solutions for all segment durations. Moreover, the first AHA compliant shock decision algorithm using 1-s segments was demonstrated. This should contribute to a combined optimization of defibrillation and cardiopulmonary resuscitation therapy to improve OHCA survival.


I. INTRODUCTION
Cardiac arrest is the unexpected sudden cessation of cardiac function, and occurs mostly in a pre-hospital setting. Out-of-hospital cardiac arrest (OHCA) constitutes a major global health problem. In the US alone, an estimated one thousand OHCA events occur daily, with survival rates around 10% [1]. Two therapies are key for OHCA survival: defibrillation, to restore the normal function of the heart; and cardiopulmonary resuscitation (CPR), to generate an artificial blood flow and deliver oxygen to the vital organs when the heart cannot be defibrillated [2]. Electrical defibrillation can be provided by non-medical staff through automated external defibrillators (AEDs), which are equipped with a shock decision algorithm that automatically interprets the patient's electrocardiogram (EKG). These algorithms must have a high sensitivity (Se) to detect shockable heart rhythms, i.e. malignant ventricular arrhythmia like ventricular fibrillation (VF) and ventricular tachycardia (VT). The specificity (Sp) must also be high to avoid inappropriate shocks that may deteriorate the rhythm and cause myocardial damage [3].
FIGURE 1. EKG samples from the OHCA study dataset. These samples illustrate the variability in waveforms and EKG characteristics of VF, ORG and AS rhythms. VF varies from coarse in amplitude and dominant frequency (top) to almost AS like waveforms (bottom). ORG rhythms can present well defined and narrow QRS complexes at normal heart-rates (top) but also aberrant QRS waveforms (bottom). Typically AS is characterized by a flat line (top), but it can present some activity in the form of isolated heartbeats, tremors and disorganized low amplitude activity that can be confused with VF.
The associate editor coordinating the review of this manuscript and approving it for publication was Carmen C. Y. Poon.
To guarantee a safe and efficient use of the device, the American Heart Association (AHA) establishes that the Se and Sp of these algorithms must be above 90% and 95%, respectively, when tested on OHCA data [3]. In addition, CPR must be interrupted to analyze the patient's EKG because chest compression artifacts in the EKG may confound the algorithm [4]. These time periods without chest compressions (no-flow intervals) for AED rhythm analysis can take from 5 s to 30 s [5], and have a negative impact on OHCA survival [6]. Therefore, there is a need to develop accurate shock decision algorithms capable of interpreting very short EKG segments to minimize interruptions in CPR [7], [8].
Research on shock decision algorithms has been framed traditionally as a VF detection problem [9]. Initial advances included analyses from EKG signal processing experts in the time, frequency and time-frequency domains [10]- [12], from which classification features were proposed and heuristic decision algorithms designed [13]. Later, machine learning algorithms like support vector machines or ensemble methods were introduced [14]- [16]. These approaches effectively combined the systematic and comprehensive extraction of EKG features [17], and the selection of the optimal feature subsets for VF detection [18]. Over the last years, deep learning has superseded traditional machine learning in many physiological signal analysis realms [19], [20]. This includes various EKG applications [21], ranging from heartbeat classification [22] to the detection of arrhythmia like atrial fibrillation [23], or even general cardiologist-level arrhythmia classification [24]. Recently, deep learning methods for VF detection have been described, either using fully convolutional neural networks (CNN) [25], CNNs in combination with ensemble methods [26], or CNNs mixed with recurrent networks to identify the time dependencies in the data [27].
One of the caveats of deep learning solutions is the need for large annotated datasets to adjust the thousands or even millions of trainable network parameters. OHCA data with quality controlled rhythm annotations is scarce [17]. Consequently, most machine and deep learning solutions for VF detection have been demonstrated using Holter EKG data, recorded at the onset of the arrest, and available from public repositories like the MIT-BIH database [14], [15], [28]. However, the performance of shock decision algorithms degrades when trained/tested using OHCA data from defibrillators [18], [27], which are generally recorded minutes after the onset of the arrest. As the arrest progresses myocardial perfusion deteriorates [29], and so does the electrical activity of the heart. VF waveform characteristics like amplitude, dominant frequency and waveform complexity decrease over time [30]. Nonshockable rhythms with organized electrical activity (ORG), present lower heart-rates and more aberrant heartbeat waveforms (longer QRS complex durations) [31], [32]. Finally asystole (AS), the absence of electrical activity, becomes prevalent [31], [33], and the recommendation is not to shock AS (Sp > 95%) and to resume CPR immediately [2], [3]. Fig 1 shows some examples extracted from the study dataset that illustrate the variability in waveform morphology in OHCA rhythms.
There are few studies on shock decision algorithms using OHCA data gathered from defibrillators [7], [8], [18]. Among the VF detection algorithms based on deep learning, only one study included OHCA data [27], and all the studies excluded AS from their datasets [25]- [27]. But as shown in Fig. 1 the EKG may present electrical activity during AS that can be confused with a shockable rhythm, leading to an electrical shock that would worsen the prognosis of the patient. Moreover, AHA compliant algorithms using very short EKG segments (less than 3 seconds) have not been demonstrated, and they could be of importance to optimize defibrillation/CPR therapies. This study covers those knowledge gaps. First, it is supported on a large dataset of OHCA rhythms that includes AS, which was obtained using AEDs during treatment and annotated by a pool of expert clinicians. Second, the dataset was used to develop and test optimized convolutional network architectures for shock/no-shock classification capable of analyzing the rhythm using EKG segments as short as 1-s.

II. MATERIALS
The study dataset was collected between June 2013 and December 2015 by the Emergency Medical Services (EMS) of the Basque Country. It comprises OHCA cases treated by emergency medical technicians working in Basic Life Support (BLS) ambulances [34]. The Basque EMS serves a 2.2 million population with an estimated annual incidence of 39 EMS treated cases per 100,000 inhabitants [35]. Around 60% of the cases treated by the BLS ambulances in the study period were included, totaling 852 defibrillator files obtained from Lifepak 1000 AEDs (Stryker, Kalamazoo MI, US). The electronic data comprised the defibrillator messages, with treatment information like AED analysis intervals, and the signals recorded from the defibrillator pads: the EKG to analyze the heart rhythm, and the thorax impedance to monitor CPR activity. The Lifepak 1000 AED records the EKG with a sampling frequency of F_s = 125 Hz, a resolution of 4.8 µV per least significant bit, and a typical AED bandwidth of 0.5–21 Hz [36]. This narrow EKG bandwidth, typical of AEDs, ensures very low levels of the main EKG noise sources such as baseline wander or power line interference.
Defibrillator data were converted from a proprietary file format to an open Matlab (Mathworks, Natick MA, US) format, and a custom tool was prepared to annotate the rhythm in the analysis periods of the AED. Six clinicians specialized in OHCA treatment blindly reviewed the EKG and annotated the rhythm. The clinicians adhered to the following rhythm definitions [37], [38]: VF (disorganized ventricular rhythms with amplitudes above 200 µV) and VT (regular ventricular rhythms with rates above 150 min⁻¹) in the shockable class; and ORG (rhythms with visible QRS complexes and heart rates above 12 min⁻¹), and AS (rhythms with peak-to-peak amplitudes below 100 µV or heart rates below 12 min⁻¹) in the nonshockable class. The majority vote from the pool of six clinicians was adopted as the final ground truth rhythm annotation. The final composition of the annotated dataset is shown in Table 1. A total of 4216 AED analysis intervals were reviewed, of which 498 were shockable and 3718 nonshockable. The inter-rater agreement measured using the Fleiss kappa score (κ) [39] was excellent among the six clinicians, with κ = 0.895. Moreover, the sub-pool of three clinicians with highest agreement had κ = 0.955. The quality control of the annotations guaranteed robust ground truth shock/no-shock labels for the development and evaluation of the deep learning classification models.
The AED analysis intervals were extended to cover the full EKG interval without artifacts, that is from cessation of CPR to resumption of CPR or defibrillation. If rhythm transitions occurred the interval was extended up to those transitions, to ensure a unique rhythm per interval. The intervals included time delays to push the rhythm analysis button, pre-shock charging, and delays in resumption of CPR which typically occur during OHCA treatment. The median (interquartile range, IQR) duration of the extended intervals was 11.8 (9.1 -16.6) s for all rhythms, 20.1 (17.6 -22.5) s for shockable and 11.2 (8.9 -14.6) s for nonshockable rhythms. These intervals were then divided into non-overlapping segments of 512 (4.096 s), 384 (3.072 s), 256 (2.048 s) and 128 (1.024 s) samples. Fig. 2 shows how an AED analysis interval was first extended and then divided into segments of different duration. In what follows these segments will be denoted by their approximate durations: 4 s, 3 s, 2 s and 1 s. Table 1 lists the number and proportions of segments grouped into shock/no-shock categories and further into rhythm types.
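The segmentation step can be illustrated with a minimal Python sketch. The function name `split_segments`, the zero-filled interval and the choice to drop trailing samples that do not fill a whole segment are assumptions made for illustration; the text does not state how remainders were handled.

```python
FS_HZ = 125  # sampling frequency reported for the Lifepak 1000 EKG


def split_segments(interval, seg_len):
    """Return the non-overlapping segments of seg_len samples in interval.

    Trailing samples that do not fill a whole segment are dropped
    (an assumption, not a detail given in the text).
    """
    n_full = len(interval) // seg_len
    return [interval[k * seg_len:(k + 1) * seg_len] for k in range(n_full)]


# Hypothetical extended interval of 1475 samples (11.8 s at 125 Hz).
interval = [0.0] * 1475
segment_lengths = (512, 384, 256, 128)

# Number of segments obtained for each segment length.
counts = {n: len(split_segments(interval, n)) for n in segment_lengths}
# Exact duration in seconds of each segment length.
durations = {n: n / FS_HZ for n in segment_lengths}
```

Shorter segments yield more training and test samples per interval, which is why the segment counts in Table 1 grow as the duration decreases.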

III. METHODS
A. CONVOLUTIONAL NETWORKS
For classification each segment can be represented as $(x_i, y_i)$, where $x_i \in \mathbb{R}^{1\times D}$ is the signal segment of $D$ samples, and $y_i \in \{0, 1\}$ is the class label inherited from the analysis interval (0 ≡ no-shock, and 1 ≡ shock). The data were partitioned into train and test (see section III-B) to optimize the convolutional models, and to obtain the performance metrics, respectively. Two architectures were studied (see Fig. 3), a CNN network and a residual network (ResNet).
FIGURE 2. Example of the extraction of the non-overlapping EKG segments from the extended analysis interval. The figure shows a shock delivered to a patient in VF. First CPR is stopped to analyze the rhythm (start of extended analysis interval), then after approximately 4 s the rescuer pushes the AED analysis button and VF is identified, then the AED is charged and finally the shock is delivered (end of extended analysis segment). After the shock CPR is resumed.

1) CNN ARCHITECTURE
The CNN architecture consists of four blocks, each comprising a convolutional layer, a batch normalization (BN) layer, a max-pooling layer, and a rectified linear unit (ReLU) non-linear activation layer. The convolutional layers linearly transform an input $X \in \mathbb{R}^{M\times D}$, consisting of $M$ signal channels of $D$ samples, to produce an output $Y \in \mathbb{R}^{N\times D'}$. The input is convolved with a set of $N$ filters $W^{n} \in \mathbb{R}^{M\times L}$, where $L$ is the filter size, and then shifted channel-wise by a bias $b_n$. The individual elements $y_{n,d}$ of $Y$ can be obtained as
$$y_{n,d} = b_n + \sum_{m=1}^{M}\sum_{l=1}^{L} w^{n}_{m,l}\, x_{m,d+l-1},$$
where the filter weights $w^{n}_{m,l}$ and biases $b_n$ are learnable parameters, adjusted during training. In our network, all convolutional layers had a filter size $L$ of 16 and an increasing number of filters $N = \{8, 16, 32, 64\}$. Input channels were zero-padded symmetrically so that $D' = D$, and bias terms were removed. The BN layers adjust the output of the preceding layer so that the data distribution forwarded to the next block does not depend on complex cross-layer weight interactions. This lessens the need for fine parameter tuning and weight initialization, and speeds up training by allowing the use of larger learning rates [40]. It also improves generalization, reducing overfitting. For an input batch $\mathcal{I}$ with data samples $i \in \{1, \dots, I\}$, a BN layer computes the channel-wise means $\mu_{\mathcal{I},n}$ and variances $\sigma^{2}_{\mathcal{I},n}$ and normalizes each channel by
$$\hat{y}^{\,i}_{n,d} = \frac{y^{i}_{n,d} - \mu_{\mathcal{I},n}}{\sqrt{\sigma^{2}_{\mathcal{I},n} + \epsilon}},$$
with a small value $\epsilon$ added for numerical stability. The normalized channels are then scaled and shifted to make the most use of the following non-linear activation. The outputs $z^{i}_{n,d}$ are thus given as
$$z^{i}_{n,d} = \gamma_n\, \hat{y}^{\,i}_{n,d} + \beta_n,$$
where $\gamma_n$ and $\beta_n$ are trainable parameters. After training, channel-wise means and variances were computed for the full training set, and then used during test to normalize the data. Max-pooling layers downsample input data by selecting the largest element in blocks of $K$ elements along the time dimension $d$, such that
$$y_{n,d} = \max_{k \in \{1, \dots, K\}} x_{n,K(d-1)+k},$$
where $K = 2$ for all blocks in our architecture.
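The layer operations described above can be sketched in plain Python to make the index handling concrete. This is a toy illustration, not the authors' Matlab implementation: the function names, the 0-based indexing, the split of the zero padding between both ends for an even filter size, and the value of `eps` are all assumptions.

```python
import math


def conv1d_same(x, w, b=None):
    """x: M channels of D samples; w: N filters of shape M x L. Returns N x D."""
    M, D = len(x), len(x[0])
    N, L = len(w), len(w[0][0])
    left = (L - 1) // 2  # assumed split of the L-1 padding zeros
    padded = [[0.0] * left + list(ch) + [0.0] * (L - 1 - left) for ch in x]
    out = []
    for n in range(N):
        bias = b[n] if b is not None else 0.0  # biases were removed in the paper
        out.append([bias + sum(w[n][m][l] * padded[m][d + l]
                               for m in range(M) for l in range(L))
                    for d in range(D)])
    return out


def batch_norm(batch, gamma, beta, eps=1e-5):
    """batch: I samples, each N channels of D values; statistics are per channel."""
    I, N = len(batch), len(batch[0])
    out = [[None] * N for _ in range(I)]
    for n in range(N):
        vals = [v for i in range(I) for v in batch[i][n]]
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        for i in range(I):
            out[i][n] = [gamma[n] * (v - mu) / math.sqrt(var + eps) + beta[n]
                         for v in batch[i][n]]
    return out


def max_pool(y, K=2):
    """Keep the largest value in each block of K consecutive samples."""
    return [[max(ch[d * K:(d + 1) * K]) for d in range(len(ch) // K)] for ch in y]


def relu(y):
    return [[max(0.0, v) for v in ch] for ch in y]


# Toy example: one input channel, one filter of size 2 with unit weights.
x = [[1.0, 2.0, 3.0, 4.0]]
w = [[[1.0, 1.0]]]
y = conv1d_same(x, w)
normalized = batch_norm([y], gamma=[1.0], beta=[0.0])
pooled = relu(max_pool(y))
```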
This reduces the computational burden of the network by increasing the field of view of the filters without increasing their size. Finally, the ReLU layers introduce nonlinearity in the network through the activation function f(x) = max{0, x}, which allows learning complex nonlinear mappings. The output of the last convolutional block (N = 64 filters) was flattened to produce a feature vector. The feature vector was input to a dense network composed of two fully connected (FC) layers with 10 neurons (ReLU activation) and 1 neuron (sigmoid activation for classification), respectively, to produce the binary shock/no-shock decision.
FIGURE 3. Architecture of the convolutional networks designed for the shock decision algorithms. The left-most architecture is a fully CNN architecture, and the right-most one is the ResNet architecture (with its expanded residual block to the right).
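As a rough check of the dimensions involved, the following sketch tracks the signal length through the four blocks and counts the convolutional and FC weights. It assumes a single input channel, ignores the BN parameters, and counts the FC weights without biases, which matches the count of (D/2⁴)·64·10 + 10 quoted in the discussion; the function name `cnn_summary` is hypothetical.

```python
FILTERS = (8, 16, 32, 64)  # filters per convolutional block
FILTER_SIZE = 16           # filter size L
POOL = 2                   # max-pooling factor K
HIDDEN = 10                # neurons in the hidden FC layer


def cnn_summary(D):
    """Return feature-vector size and weight counts for segments of D samples."""
    length, channels, conv_weights = D, 1, 0
    for n in FILTERS:
        conv_weights += channels * n * FILTER_SIZE  # M*N*L, no bias terms
        channels = n
        length //= POOL                             # max-pooling halves the length
    features = length * channels                    # flattened feature vector
    fc_weights = features * HIDDEN + HIDDEN         # hidden layer plus output neuron
    return {"features": features,
            "conv_weights": conv_weights,
            "fc_weights": fc_weights}


summary = {D: cnn_summary(D) for D in (512, 384, 256, 128)}
```

Under these assumptions the FC layers hold a sizeable share of the weights for the longer segments, which is consistent with the later replacement of the hidden FC layer by global average pooling.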

2) ResNet ARCHITECTURE
The second architecture we studied was a residual network or ResNet [41], which addresses the problem of performance degradation as layers are stacked and the depth of the network increases. Given H(x), the transformation applied by a series of stacked layers to the input data x, residual network design argues that it is easier to learn the residual transformation F(x) := H(x) − x. This is achieved in practice by enabling a secondary shortcut path which directly connects the input x to the main path's output. An element-wise addition of both paths is then performed for an effective mapping of H(x) = F(x) + x. Our ResNet architecture (see Fig. 3b) was designed to mimic that of the CNN, increasing the network's depth while maintaining a recognizable structure. Each convolutional block was replaced by two residual blocks, preserving spatial length D and channel depths N (see Fig. 3b). For each residual block the main path consisted of a sequence of convolutional, BN and ReLU layers, following the improved pre-activation configuration described in Han et al. [42]. Pooling layers were replaced by strided convolutions, which skip every other step in the filtering process. When length and depth had to be adjusted, the shortcut path included a strided convolution to produce a linear projection of the input. Finally, the hidden fully connected layer was replaced by a global average pooling (GAP) layer which outputs the mean value of each input channel. This was meant to reduce overfitting and improve robustness to spatial translations of data [43].
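The shortcut addition and the GAP layer can be illustrated with a small single-channel sketch. The helper names and the toy main and shortcut paths are hypothetical; in the actual network both paths are built from convolutional, BN and ReLU layers.

```python
def residual_block(x, main_path, shortcut=None):
    """Return main_path(x) + shortcut(x), element-wise.

    With no shortcut function the input itself is added (identity shortcut).
    """
    fx = main_path(x)
    sx = shortcut(x) if shortcut is not None else x
    if len(fx) != len(sx):
        raise ValueError("main path and shortcut must produce equal lengths")
    return [a + b for a, b in zip(fx, sx)]


def global_average_pool(channels):
    """Replace each channel by its mean value."""
    return [sum(ch) / len(ch) for ch in channels]


x = [1.0, 2.0, 3.0, 4.0]

# If the main path learns a zero residual, the block reduces to the identity.
identity_out = residual_block(x, main_path=lambda v: [0.0] * len(v))

# When the main path halves the length (stride 2), the shortcut must be
# downsampled as well; plain subsampling stands in for the strided projection.
strided_out = residual_block(x,
                             main_path=lambda v: [0.5 * s for s in v[::2]],
                             shortcut=lambda v: v[::2])

gap_out = global_average_pool([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
```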

3) CONFIGURATION OF THE NETWORK OPTIMIZER
The networks were trained using stochastic gradient descent (SGD) with momentum (m = 0.9), and an initial learning rate of 0.05. The mini-batch size was set to 256 for segments of 4 s. For the other segment durations, the batch size was adjusted to produce a similar number of training iterations, that is, 341, 512 and 1024 for 3 s, 2 s and 1 s, respectively. As few as 10 epochs were sufficient for the networks to converge. A piecewise learning rate decay factor of 0.5 per epoch was applied, allowing the solver to transition from coarser to finer optimization steps over a short training process. All calculations were performed using Matlab's Deep Learning Toolbox, in a multi-GPU setting with 4 Nvidia GeForce RTX 2080 Ti GPUs.
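The optimizer settings above can be summarized in a short sketch. The scaling rule used for the batch size (inverse to the segment length, truncated) reproduces the reported values but is an inference from the numbers, and the assumption that the decay is first applied after the first epoch is ours; the function names are hypothetical.

```python
BASE_BATCH = 256  # mini-batch size for 4 s segments
BASE_LEN = 512    # samples in a 4 s segment


def batch_size(seg_len):
    """Scale the batch size inversely with segment length (truncated)."""
    return BASE_BATCH * BASE_LEN // seg_len


def learning_rate(epoch, lr0=0.05, factor=0.5):
    """Piecewise decay: halve the rate after every epoch (epoch is 0-indexed)."""
    return lr0 * factor ** epoch


def sgd_momentum_step(w, v, grad, lr, m=0.9):
    """One SGD-with-momentum update for a scalar weight w and velocity v."""
    v_new = m * v - lr * grad
    return w + v_new, v_new


batch_sizes = [batch_size(n) for n in (512, 384, 256, 128)]
schedule = [learning_rate(e) for e in range(10)]
w1, v1 = sgd_momentum_step(w=1.0, v=0.0, grad=2.0, lr=schedule[0])
```

Since shorter segments produce proportionally more training samples, larger batches keep the number of iterations per epoch roughly constant.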

B. DATA PARTITIONING AND MODEL EVALUATION
Both architectures were evaluated using a 10-fold cross-validation (CV) strategy. Data partitioning was conducted patient-wise and in a quasi-stratified way, so that the analysis intervals contained in each fold would present a rhythm distribution close to that of the whole dataset. Patient-wise data partitioning ensured that the training and test patients did not overlap, and thus may have different chest configurations, physiological characteristics and/or defibrillator pad placements. Since the results could depend on how data were partitioned, 100 different random CV partitions were used. This allowed the statistical characterization of the performance of the models.
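A minimal sketch of a patient-wise, quasi-stratified fold assignment is given below. The stratification criterion (whether a patient has at least one shockable interval), the round-robin assignment and the toy cohort are assumptions for illustration; the exact procedure used in the study is not detailed in the text.

```python
import random


def patient_wise_folds(patient_labels, n_folds=10, seed=0):
    """Assign every patient to exactly one fold.

    patient_labels maps a patient id to the list of 0/1 labels of that
    patient's analysis intervals. Patients with and without shockable
    intervals are shuffled and dealt out separately, so each fold gets a
    similar share of both groups.
    """
    rng = random.Random(seed)
    with_shock = [p for p, labels in patient_labels.items() if any(labels)]
    without = [p for p, labels in patient_labels.items() if not any(labels)]
    folds, offset = {}, 0
    for group in (with_shock, without):
        rng.shuffle(group)
        for k, patient in enumerate(group):
            folds[patient] = (k + offset) % n_folds
        offset += len(group)
    return folds


# Hypothetical cohort: 40 patients, every fourth one has a shockable interval.
cohort = {f"p{i}": ([1, 0] if i % 4 == 0 else [0, 0]) for i in range(40)}
folds = patient_wise_folds(cohort, seed=1)

shock_per_fold = [sum(1 for p, f in folds.items() if f == k and any(cohort[p]))
                  for k in range(10)]
size_per_fold = [sum(1 for f in folds.values() if f == k) for k in range(10)]
```

Repeating the call with 100 different seeds would give 100 random partitions, in the spirit of the repeated CV described above.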
Data augmentation was used to increase the available number of shockable segments. This addressed two problems. First, the large class imbalance of the dataset, with a nonshockable to shockable class proportion in excess of 4:1 (see Table 1). Second, the low number of VT samples. An almost four-fold increase in the number of available segments was achieved by extracting segments with a 75% overlap. Augmented data were used only during training; all testing was carried out using the original non-overlapping segments.
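The effect of the overlap on the number of segments can be sketched as follows. The function name and the 20 s zero-filled interval are hypothetical; only the 75% overlap and the segment length come from the text.

```python
def extract_segments(interval, seg_len, overlap=0.0):
    """Extract segments of seg_len samples with the given fractional overlap."""
    step = max(1, round(seg_len * (1.0 - overlap)))
    last_start = len(interval) - seg_len
    return [interval[s:s + seg_len] for s in range(0, last_start + 1, step)]


# Hypothetical 20 s shockable interval sampled at 125 Hz.
interval = [0.0] * 2500

test_segments = extract_segments(interval, 512)                 # no overlap
train_segments = extract_segments(interval, 512, overlap=0.75)  # augmented
gain = len(train_segments) / len(test_segments)
```

With a 75% overlap the step between segment starts is one quarter of the segment length, hence the roughly four-fold gain.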
A shock decision algorithm is a binary classification problem, in which shockable rhythms are the positive class and nonshockable rhythms the negative class. The standard performance metrics are Se, Sp and accuracy (Acc). However, in OHCA nonshockable rhythm prevalences are much higher than those of shockable rhythms. Since the AHA requires high Se/Sp values, the balanced accuracy (BAC) was also computed [17]:
$$\mathrm{BAC} = \frac{\mathrm{Se} + \mathrm{Sp}}{2}.$$

IV. RESULTS
Fig. 4 shows the performance metrics for each model as a function of the segment duration. Performance dropped as the duration of the segment shortened. Still, AHA performance goals were met for all segment durations, with median Acc and BAC scores above 97.4% for segments as short as 1 s. CNN models achieved slightly higher median Se, although the differences were not significant. The ResNet models had higher Sp and therefore higher Acc, given the much larger prevalence of nonshockable rhythms in the dataset. When compared using the McNemar test, the ResNet models were significantly more accurate than the CNN models (p < 0.05) only for 4 s and 3 s. Table 2 shows the detailed shock/no-shock classification results for the different rhythm types. The worst performance was obtained for VT, which was by far the least prevalent rhythm. However, all models met the AHA's 75% sensitivity performance goal for VT. The differences between the CNN and ResNet models were small. ResNet models outperformed CNN models in ORG rhythms for longer segments, and the converse occurred for VT. For shorter segments both models performed similarly in all rhythm types.
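The performance metrics defined in section III-B follow directly from the confusion counts. The sketch below (function name hypothetical) also shows why BAC is reported: a trivial classifier that never advises a shock, applied to the class proportions of the annotated intervals (498 shockable, 3718 nonshockable), reaches a high Acc but a BAC of only 50%.

```python
def shock_metrics(y_true, y_pred):
    """Se, Sp, Acc and BAC for binary labels (1 = shock, 0 = no-shock)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {"Se": se,
            "Sp": sp,
            "Acc": (tp + tn) / len(y_true),
            "BAC": (se + sp) / 2}


def meets_aha(metrics, min_se=0.90, min_sp=0.95):
    """Check the minimum Se/Sp values quoted in the text."""
    return metrics["Se"] >= min_se and metrics["Sp"] >= min_sp


# Class proportions of the annotated analysis intervals.
y_true = [1] * 498 + [0] * 3718
never_shock = shock_metrics(y_true, [0] * len(y_true))
perfect = shock_metrics(y_true, y_true)
```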

B. DATA CLUSTERIZATION AND ERROR SOURCES
A dimensionality reduction technique was applied to visualize how each network separated the data, and to identify potential sources of error. Models were trained and tested using the full dataset. Then, the activations of the last convolutional block were retained and projected to a 2-D map using the t-distributed stochastic neighbor embedding algorithm (t-SNE), a non-linear dimensionality reduction method well suited for high-dimensional data representation [44]. Fig. 5 shows the resulting t-SNE scatter plots for both networks and 4 s segments.
VF samples formed a well-separated cluster, similar for both architectures, ending in a boundary connecting to AS (borderline AS/VF rhythms). The boundary between both regions could be intuitively associated with the amplitude thresholds commonly accepted for the definition of coarse VF (>200 µV) and AS (<100 µV) [37], [38]. Moreover, this region also showed an overlap between VF and ORG rhythms, associated with ventricular arrhythmia with slower ventricular rates and ORG rhythms with aberrant QRS complexes (borderline idioventricular rhythms).
It is noteworthy that although the networks were not trained to differentiate AS and ORG, the 2-D maps show a good clustering of AS and ORG rhythms. The borderline cases could be intuitively associated with the heart-rate thresholds customarily used in the definitions of AS (<12 min⁻¹) and ORG (>12 min⁻¹) [17], [45]. Finally, VT samples were spread along the VF cluster, and were therefore not well separated.
The largest concentration of shock/no-shock classification errors occurred at the interface of the AS/ORG/VF clusters shown in Fig. 5. To better understand the reasons for these errors, the different rhythm classes were parametrized using standard measures applicable to each rhythm type [8], [32], [37], [46]. For AS we computed the mean power of the signal [8], [37], for ORG rhythms the mean heart rate [32], and for VF the amplitude spectrum area (AMSA), which is a weighted sum of spectral amplitudes correlated with myocardial perfusion [46]. Fig. 6 shows the t-SNE plot of the CNN model graded by the typical parameter of each rhythm type. As shown in the figure, VF and AS mix when the power in AS is moderate-to-high (> 10⁻³ mW), or AMSA in VF is low, indicating those VF rhythms have lower fibrillation frequencies and lower amplitudes. ORG rhythms mix with VF when heart rates are moderate and amplitudes are low (AS/VF/ORG boundary). These samples correspond, in most cases, to idioventricular rhythms, that is, slow ventricular rhythms with wide QRS complexes. This is better appreciated in the examples of classification errors shown in Fig. 7. The figure also shows why shock decisions are especially challenging for very short EKG segments: short transient low amplitude intervals during VF or short periods of fast disorganized activity during nonshockable rhythms may result in incorrect decisions (see lower panel in Fig. 7).
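Two of the rhythm parameters mentioned above, the mean power and the AMSA, can be sketched with a naive discrete Fourier transform. The amplitude scaling, the 2–21 Hz band limits and the function names are assumptions chosen for illustration; the cited references may use different conventions.

```python
import cmath
import math


def amplitude_spectrum(x, fs):
    """Naive DFT: return frequencies (Hz) and scaled amplitudes up to fs/2."""
    n = len(x)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        amps.append(abs(coeff) / n)  # assumed scaling
    return freqs, amps


def amsa(x, fs, f_lo=2.0, f_hi=21.0):
    """Frequency-weighted sum of spectral amplitudes within [f_lo, f_hi]."""
    freqs, amps = amplitude_spectrum(x, fs)
    return sum(a * f for f, a in zip(freqs, amps) if f_lo <= f <= f_hi)


def mean_power(x):
    """Mean of the squared samples."""
    return sum(v * v for v in x) / len(x)


FS = 125
N = 250  # 2 s of signal
slow = [math.sin(2 * math.pi * 4 * t / FS) for t in range(N)]  # 4 Hz tone
fast = [math.sin(2 * math.pi * 8 * t / FS) for t in range(N)]  # 8 Hz tone

amsa_slow, amsa_fast = amsa(slow, FS), amsa(fast, FS)
power_fast = mean_power(fast)
```

Both tones have the same amplitude and power, yet the faster one has twice the AMSA, which illustrates how AMSA grades VF by both amplitude and fibrillation frequency.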

C. COMPARISONS WITH PREVIOUS MODELS
The convolutional models developed in this study were compared with the published deep learning algorithms for shock decision [22], [25]- [27]. These architectures include a CNN combined with a recurrent network [27], and three designs based on CNNs: Acharya et al. [25] for shock decision, Kiranyaz et al. [22] for heartbeat classification (better suited for short segments), and Nguyen et al. [26] with a multi-channel input using the EKG and two components obtained from the variational mode decomposition [47] of the EKG. For the analyses, all the models were trained and tested as described in section III-B, and using the optimizer described in section III-A3. This optimizer configuration allowed much faster training and yielded better preliminary results. BN layers were also added at the output of each convolutional layer, which reduced overfitting and improved convergence. Fig. 8 shows the accuracy and BAC of our best model (ResNet) compared to those of the previous deep learning algorithms. Our best model had a significantly higher accuracy (p < 0.05 in the McNemar test) for all segment durations. The accuracy of the ResNet model was greater by 0.27 points (16.5% of remaining errors corrected) for 4 s, 0.22 points (12.0%) for 3 s, 0.25 points (12.1%) for 2 s and 0.56 points (18.7%) for 1 s. The second best model was a recent design combining convolutional networks and a recurrent network, but its performance degraded for short EKG segments.
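The paired comparison of two classifiers on the same test segments can be sketched with an exact (binomial) McNemar test. Which variant of the test the authors used is not stated, so the exact two-sided form below is an assumption, and the toy predictions are invented for illustration.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs.

    Returns (b, c, p) where b counts segments only model A classifies
    correctly and c counts segments only model B classifies correctly.
    """
    b = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a == t and o != t)
    c = sum(1 for t, a, o in zip(y_true, pred_a, pred_b) if a != t and o == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example: model A is right on all 20 segments, model B misses 10 of them.
y_true = [1] * 20
pred_a = [1] * 20
pred_b = [0] * 10 + [1] * 10

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
_, _, p_same = mcnemar_exact(y_true, pred_a, pred_a)
```

Only the segments on which the two models disagree enter the test, which is why it suits comparisons where both models are already highly accurate.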

V. DISCUSSION
This study presents a comprehensive analysis of shock decision algorithms based on convolutional networks, introducing a new CNN architecture and a ResNet architecture to assess the benefits of deeper networks for this task. Our results show that a rhythm analysis compliant with AHA specifications is possible using EKG segments as short as 1 s, which would help shorten interruptions in CPR for the analysis of the rhythm. Most importantly, all the experiments were done using OHCA data gathered using AEDs, and with quality controlled rhythm annotations, thus ensuring the results are meaningful for the clinical scenario at hand.
Our models had a significantly better accuracy (p < 0.05) than the next best deep learning models of previous studies (see Fig. 8). The accuracy of our best model was 0.2 points greater for any segment duration than that of the next best model, an architecture mixing convolutional and recurrent layers, and was greater by 0.4 points than any previous convolutional design. This means that at least 10% (20% for convolutional designs) of errors were corrected. Moreover, the advantages were larger for 1 s segments (see Fig. 8), for which AHA compliant shock decision algorithms had not yet been demonstrated. Two reasons may explain this improvement. First, the architecture was inspired by a recent approach using ResNets for general purpose EKG arrhythmia detectors [24]. We used filters of width 16, which produced the best preliminary results, an increase in channels per block in powers of 2 (with max-pooling layers of 2), and a smaller sampling rate better adapted to AED EKG bandwidths than Acharya et al. [25] or Kiranyaz et al. [22]. The use of narrower filters with larger sampling rates as in [25] may produce poorer representations, and using very wide filters with larger downsampling (max-pooling) as in [26] produces representations that are too coarse. Second, we introduced BN layers [40], which were not present in previous solutions [22], [25], [26]. These layers stabilize the learning process and add a regularization effect, thus avoiding overfitting and allowing larger learning rates and fewer epochs to train the networks. These design characteristics were particularly important as the segment duration increased because they allowed the network to learn subtle signal details that differentiate borderline shockable and nonshockable rhythms (see Fig. 5), and explain why our ResNet design was only significantly more accurate (p < 0.05) than our CNN design for segment durations of 4 s and 3 s.
From a clinical perspective, our models met, for all segment durations, the minimum Se/Sp values recommended by the AHA. Even for VT, the least prevalent rhythm class, the algorithms were over 9-points (see Table 2) past the minimum 75% AHA recommendation. Our results show, for the first time, that an AHA compliant shock decision is possible analyzing 1-s EKG segments, which is over 3-times shorter than the typical segment duration used in AEDs [7], [8], [48], [49]. Even for 1-s segments, Se was above 97% and Sp was above 97.5%, that is 7-points and 2.5-points over the minimum values established by the AHA, respectively. This ensures a safe (high Sp) and efficient (high Se) use of the algorithm for very short EKG segments, opening the possibility of a combined optimization of CPR therapy with AED use. In addition to shortening no-flow intervals for rhythm analysis, two other possibilities open up. First, a fully automatic rhythm analysis during 30:2 CPR (30 compressions to 2 ventilations) during ventilation intervals [50], an application that involves the use of existing algorithms to detect pauses in compressions based on the impedance recorded by AEDs [51], [52]. Second, the algorithm could be used to improve current methods for rhythm analysis during CPR [38], which have an unsafe positive predictivity due to an excess of false positives. So when a shockable rhythm is suspected during CPR, the AED could instruct the rescuer to stop CPR for a short confirmatory analysis [7], and then use a very short EKG segment without artifacts to confirm the decision using a very accurate algorithm.
Typically AED memory and computation resources are limited because the devices are equipped with low end processors that handle many other tasks in parallel [53]. So it is important to design simple shock decision algorithms, with low computation and storage demands unlike those of CNN architectures. In order to simplify our architecture we developed thin-CNN and thin-ResNet solutions based on depth-wise separable convolutional layers [54], which separate the filtering process in two steps: first, a 1-D filter of size L is applied to each of the M input channels; and then, N linear combinations of the results are computed as output channels. This reduces the number of trainable weights from M·N·L to M·L + N·M. The CNN architecture could be further simplified by replacing the hidden fully connected layer by a global average pooling layer, as done in the ResNet architecture (see Fig. 3), which removes the (D/2⁴)·64·10 + 10 weights of the FC layers. These modifications reduced the number of weights by at least 90% (worst case of 1 s segments).
FIGURE 8. Comparison of the ResNet architecture of Fig. 3b to the deep learning solutions proposed in Kiranyaz et al. [22], Acharya et al. [25], Nguyen et al. [26] and Picon et al. [27]. Note that Nguyen et al. [26] cannot be implemented for T = 1 s due to filter and max-pooling sizes.
In addition, the number of element-wise products in the convolutional layers is reduced from N ·L·D·M to L·D·M + N ·D·M , a reduction of over 90%, which could make a big difference for low-end hardware not specially designed for matrix computations. Fig. 9 shows the results for the thin solutions compared to the complete solutions of Fig 3. The thin solutions were significantly less accurate than the complete solutions (p < 0.05), but there were no significant differences in accuracy between the thin-ResNet and the best previous deep learning solutions. Moreover, the thin-ResNet outperforms all previous convolutional solutions. Our thin solutions are considerably lighter than all previous deep learning solutions [22], [25], [26]. Their performance is above AHA specifications even for segments as short as 1-s, making them an implementable deep learning solution in the type of microprocessors customarily found in AEDs.
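The savings quoted above can be checked with a short count over the four convolutional layers. The sketch assumes a single input channel, filters N = {8, 16, 32, 64} of size L = 16, and a signal length that halves after each block; the function name is hypothetical and the FC layers are left out of the count.

```python
FILTERS = (8, 16, 32, 64)  # output channels N of each convolutional layer
L = 16                     # filter size


def conv_costs(D, separable):
    """Return (trainable weights, element-wise products) of the conv layers."""
    weights = products = 0
    M, length = 1, D
    for N in FILTERS:
        layer_weights = M * L + N * M if separable else M * N * L
        weights += layer_weights
        products += layer_weights * length  # every weight is used once per sample
        M = N
        length //= 2                        # downsampling between blocks
    return weights, products


full_w, full_p = conv_costs(512, separable=False)
thin_w, thin_p = conv_costs(512, separable=True)

weight_reduction = 1 - thin_w / full_w
product_reduction = 1 - thin_p / full_p
```

Under these assumptions both reductions come out slightly above 90% for the convolutional layers, in line with the figures given in the text.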
Finally, this study originates from a dataset of EKG from OHCA patients recorded using AEDs, and annotated by a pool of specialized clinicians. By including all OHCA events in a two and a half year period of data collection, we made sure the dataset included a diverse set of over 850 patients that properly represented the differences in gender or age, chest configurations and physiological characteristics. Moreover, by partitioning the data patient-wise we also made sure that different patients were used to learn the characteristics of the OHCA rhythm classes, and to test the accuracy of our algorithms. And very importantly, we made sure our dataset included all OHCA rhythm types, including AS. Our results show that shock decisions for AS were as difficult as for ORG rhythms (see Table 2). All previous studies using deep learning methods had dismissed the use of AS in the design of their algorithms, arguing that AS could be safely identified using a simple power-threshold [26], [27]. Our clustering analysis shows (see Fig. 5) that borderline AS/VF rhythms pose one of the greatest challenges to the accuracy of the algorithms, and that AS cannot be safely identified using a power threshold (see Fig. 6). In OHCA research, obtaining quality audited annotated signal datasets involves large time and money investments to recruit pools of specialized clinicians [36], [45], [48]. For instance, the annotation and quality review of the data for this project was a year-long project involving data collection, the development of easily deployable annotation tools, and collecting and auditing all the clinicians' annotations [34].
FIGURE 9. Performance degradation for the thin models compared to the complete architectures introduced in this study. The thin architectures are better suited for low-end processors by reducing the storage needs and number of products to compute by at least 90%.
Since the need for large quality audited datasets is one of the limitations of deep learning solutions, more complex algorithmic applications like the classification of OHCA rhythms into five classes (four rhythm types plus presence of pulse) using deep learning may be a challenge [17], [38]. However, our data dimensionality reduction analysis shows that our architectures found richer structures in the data than the shock/no-shock classes. Samples were grouped into three well defined clusters corresponding to the three main rhythm types (VF/AS/ORG), and graded by meaningful waveform characteristics (see Fig. 6). It is therefore likely that future deep learning solutions for multi-class OHCA rhythm annotation could be developed using smaller OHCA datasets and the networks developed in this study as base models for transfer learning [55].
This study has some limitations. First, data were collected using a single device model, and other AED models may have different EKG acquisition circuitry. However, the two fundamental EKG acquisition characteristics of the device, the EKG bandwidth (0.5–21 Hz) and amplitude resolution, are typical of AEDs [36]. Consequently, the algorithms would most likely be usable in any AED after EKG resampling (Fs = 125 Hz), or fine-tuning of the convolutional networks (with small datasets) if the data had a more restrictive bandwidth. If larger bandwidths were used, such as in monitor-defibrillators, the EKG could be first filtered to the 0.5–21 Hz band, and then fed to the convolutional networks. Second, the effects of typical EKG noise sources, such as baseline wander or power-line interference, were not analyzed. However, given the AED analysis bandwidth, these noise sources would be considerably attenuated. The EKG data used for the study were taken from the device after preprocessing, and all the AED analysis intervals of the patient cohort were used. Only intervals with chest compression activity were discarded because CPR must be interrupted for AED rhythm analysis [4]. That is, data were not discarded if the typical EKG noise sources were present, thus ensuring the algorithms are accurate regardless of these noise sources.

VI. CONCLUSION
New convolutional architectures were proposed for AED shock decision algorithms. The algorithms were trained and tested using OHCA data recorded using AEDs and annotated by a pool of specialized clinicians. The accuracy of our methods improves on that of previous solutions, and we demonstrated the possibility of an AHA compliant shock decision with EKG segments as short as 1 s. This should contribute to a combined optimization of defibrillation and CPR to improve OHCA survival.
His fields of expertise are biomedical signal processing, machine learning, and data management in the fields of pre-hospital emergency medicine and cardiac arrest. In this field, he has collaborated with leading European and U.S. clinicians, and he has published over 40 articles in SCI-IF journals and contributed over 100 communications in scientific conferences.
ELISABETE ARAMENDI (Member, IEEE) was born in Azkoitia, Spain, in 1969. She received the M.Sc. degree in telecommunications engineering from the University of the Basque Country (UPV/EHU) in 1993. She has been an Assistant Professor with the UPV/EHU, since 1994, and an Associate Professor, since 2002. She is a Founding Member and the Director of the Bioengineering and Resuscitation Research Group, recognized as outstanding by the Basque Science System. Her fields of expertise are statistical signal processing, biomedical signal processing, and data management in the fields of pre-hospital emergency medicine and cardiac arrest. In this field, she has collaborated with leading researchers of the Resuscitation Outcomes Consortium (ROC), and she has published over 30 articles in SCI-IF journals and contributed over 100 communications in scientific conferences.
BEATRIZ CHICOTE received the B.Sc. degree in industrial electronics engineering, the M.Sc. degree in advanced electronic systems, and the Ph.D. degree in electronics and telecommunication from the University of the Basque Country (UPV/EHU), in 2013, 2015, and 2019, respectively. In 2019, she joined the Lortek Technological Center, IK4-Lortek, a member of the Basque Research and Technology Alliance, where she is currently a member of the Digital Department of Smart Manufacturing. Her research was focused on developing new processing techniques for shock outcome prediction in the fields of pre-hospital emergency medicine and cardiac arrest.
DANIEL ALONSO was born in Donostia, Spain, in 1961. He received the degree in nursing from the University of the Basque Country (UPV/EHU), in 1982. He is currently with the Basque Emergency Medical System (Emergentziak-Osakidetza), Basque Health Service, Hospital Donostia, where he is also the Scientific Coordinator of the Basque Cardiac Arrest Registry. He is a part of the Spanish initiative for a nation-wide cardiac arrest registry and contributes to the European registry (Eureka initiative). He has participated in numerous research projects from competitive calls, has published eight articles in SCI-IF journals, and has participated in many national conferences on emergency medicine. His fields of research include the management of cardiac arrest patients, new technologies in the treatment of cardiac arrest, and the management of cardiac arrest registries.
ANDIMA LARREA was born in Etxebarri, Spain, in 1966. He received the degree in medicine and surgery from the University of the Basque Country (UPV/EHU), in 1992, and the degree in epidemiology and public health, in 1999. He has been working as an Associate Doctor with the Emergency Unit of the Basque Government's Department of Health, Basque Health Service, since 1993. He is a member of the CPR Group. He is a member of the Board of Directors of the Spanish Society of Emergency Medicine and Emergencies (SEMES), Basque Country, and the Spanish Committee of International Trauma Life Support (ITLS). He is an expert in the field of pre-hospital emergency medicine, where he has taught many courses in workshops, participated in forums, contributed SCI-IF articles, and made many contributions to scientific conferences.
CARLOS CORCUERA was born in Bilbao, Spain, in 1971. He received the degree in general medicine from the University of the Basque Country (UPV/EHU), in 1996. He has been working in emergency medicine since 2000, and as an Associate Doctor with the Emergency Medical System (Emergentziak-Osakidetza) (Basque EMS), Basque Health Service, since 2001. He teaches internal resident doctors and actively collaborates in research on the management and treatment of OHCA.