Generalized Rational Variable Projection With Application in ECG Compression

In this paper we develop an adaptive transform-domain technique based on rational function systems. It is of general importance in several areas of signal theory, including filter design, transfer function approximation, system identification, control theory etc. The construction of the proposed method is discussed in the framework of a general mathematical model called variable projection. First we generalize this method by adding dimension type free parameters. Then we deal with the optimization problem of the free parameters. To this order, based on the well-known particle swarm optimization (PSO) algorithm, we develop the multi-dimensional hyperbolic PSO algorithm. It is designed especially for the rational transforms in question. As a result, the system along with its dimension is dynamically optimized during the process. The main motivation was to increase the adaptivity while keeping the computational complexity manageable. We note that the proposed method is of general nature. As a case study the problem of electrocardiogram (ECG) signal compression is discussed. By means of comparison tests performed on the PhysioNet MIT-BIH Arrhythmia database we demonstrate that our method outperforms other transformation techniques.


Generalized Rational Variable Projection With Application in ECG Compression
Péter Kovács , Sándor Fridli , and Ferenc Schipp Abstract-In this paper we develop an adaptive transformdomain technique based on rational function systems.It is of general importance in several areas of signal theory, including filter design, transfer function approximation, system identification, control theory etc.The construction of the proposed method is discussed in the framework of a general mathematical model called variable projection.First we generalize this method by adding dimension type free parameters.Then we deal with the optimization problem of the free parameters.To this order, based on the well-known particle swarm optimization (PSO) algorithm, we develop the multi-dimensional hyperbolic PSO algorithm.It is designed especially for the rational transforms in question.As a result, the system along with its dimension is dynamically optimized during the process.The main motivation was to increase the adaptivity while keeping the computational complexity manageable.We note that the proposed method is of general nature.

As a case study the problem of electrocardiogram (ECG) signal compression is discussed. By means of comparison tests performed on the PhysioNet MIT-BIH Arrhythmia database we demonstrate that our method outperforms other transformation techniques.
Index Terms-Variable projection, Model selection, Separable nonlinear least squares, Nonlinear regression, Rational functions, Particle swarm optimization, ECG compression.

I. INTRODUCTION
A NALYSIS of signals by means of mathematical trans- formations proved to be an effective method in various aspects.For instance, dimensionality reduction methods are strongly related to the problems of compression and noise suppression of the original signal.Moreover, the transform can also be used for extracting features in classification tasks.Many of these transform-domain techniques are generated by fixed basic functions like the trigonometric functions in the Fourier transform, Walsh functions in the Walsh-Fourier transform, mother wavelet function for the wavelet transform, etc.In order to surpass the limitations in compression ratio, reconstruction error, adaptivity and computational complexity of these algorithms, the dictionary based methods such as matching and basis pursuit were proposed by many authors, see e.g.[1], [2].In that case, an overcomplete set of base functions was applied to increase the adaptivity.Also, the wavelet packet transform (WPT), which utilizes different tilings of the time-frequency plane, was introduced by Coifman et al. [3].Although these led to improved adaptivity, the performance remained limited because of the lack of free parameters.
There is a trade-off between overfitting and underfitting, which is controlled by the number of parameters, i.e., the model order.Information theory provides many order selection rules, which quantify the optimized models according to the Bayesian information criterion (BIC), Akaike information criterion (AIC), etc.In this case, the BIC and/or the AIC of the candidate models are precalculated for various parameter setups, then the best among them is selected according to the given criterion [4].Basis pursuit can also be applied to make an initial assumption on the number of basic function, which is followed by an additional optimization step to find the best nonlinear parameters.
The orthogonal systems of rational functions play distinguished role and proved to be very effective in several areas and applications.Although their properties, like simplicity and the high variety of such systems, make them promising candidates for adapting transform domain technique there are only few examples for their application in signal processing.Moreover, the methods used in those examples do not employ the capacity of such systems completely.
In this paper we systematically develop a new method that fully utilizes the versatility of rational systems.To this end, we start from the well-known variable projection functional [5] and generalize it by introducing a new cost function.In the generalized variable projection method we integrate system dimensionality into the set of free parameters.In contrast to previous approaches, we jointly optimize both the accuracy and the complexity of the model.By means of rational systems we demonstrate that the increased computational demand is manageable, and the generalized method is efficient.Toward this we set up a new architecture space that supports the structure of the parameter space (number of poles, multiplicities) of the rational functions.We develop the hyperbolic version, based on the Poincaré model, of the stochastic Multi-Dimensional Particle Swarm Optimization (MDPSO) [6], [7] for the nonlinear optimization problem of system parameters.Finally, ECG compression with the analysis of the parameters and This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4.0/comparisons with the state-of-the-art methods is taken as casestudy.The comparison tests show that our method outperforms other transformation algorithms.
Regarding the application, the main advantage of the proposed method is its adaptivity.State-of-the-art ECG processing methods usually fix the function system a priori based on the shape similarity between the ECGs and the basic functions.For this reason, Hermite functions, B-splines, and wavelets became popular in this field.Although the shape of the basic functions correlates very well with the normal ECG morphology, it is difficult to represent abnormal beat classes.The proposed method automatically scales up the number of free parameters in order to approximate abnormal beats with an acceptable level of error.
The paper is organized as follows.Section II contains the projection methods and the corresponding function systems.In Section III we develop the generalized variable projection model.In order to solve the corresponding optimization problem we extend the basic and multi-dimensional PSO algorithm using the Poincaré model of the hyperbolic geometry in Sections IV-V.In Section VI we apply our technique for the construction of an ECG compression method using optimized rational functions.In Section VII a comparative study of different adaptive transform-domain based techniques evaluated on the MIT-BIH arrhythmia database [8] is provided.The discussion of the results can be found in Section VIII.Finally, Section IX is a summary of conclusions and future plans.

II. PROJECTION METHODS
This section serves as background and framework, along with examples, for the construction of generalized variable projection method presented in Section III.

A. Non-Variable Projection Method
Let us start with the classical non-variable projection method.By projections we will always mean orthogonal projection, which is an important transform-domain technique.It is strongly related to approximation theory in Hilbert spaces.Namely, the space of the appropriate signals is usually considered to be a Hilbert space H with scalar product •, • .Then, a signal f ∈ H is modeled by its orthogonal projection onto a closed subspace S ⊂ H.In practical applications S is a finite N ∈ N + dimensional subspace spanned by a linearly independent function system Θ := { Θ j ∈ H : 0 ≤ j < N }.It is well-known that for any signal f ∈ H the best approximation f ∈ S uniquely exists: is the usual one induced by the scalar product.The orthogonal projection from H onto S, which is a continuous linear operator, will be denoted by P N Θ .Then, a signal f ∈ H is represented in S as follows: where the coefficients c j ∈ R (j = 0, . . ., N − 1) are the solution of the system of linear equations Gc = b.In this G is the identity matrix and c i = f, Θ i is the ith Fourier coefficient of f .A typical example for the space of appropriate real valued signals is H = L 2 w (Ω) with a positive weight function w.Ω is usually an interval, bounded or unbounded, for analog signals and a proper countable set of real numbers for discrete-time signals.Various types of basic functions have been used so far depending on the particular problem.For instance, trigonometric functions are applied in MP3 coding, while wavelets in JPEG 2000 standards.In these cases the shapes of the basic functions are fixed, they cannot be adjusted to the individual signal.This limitation proved to be important, especially in dynamically changing environments, where we need to adjust the system to the signal.For instance, in case of ECGs, normal beats usually dominate the signal, and even the non-adaptive techniques work well on them.On the other hand, abnormal beats can have rather complex waveforms for which those techniques provide poor results.These phenomena raise the need for more sophisticated models, in which the basic functions can be adapted to the shape of the signal due to their free parameters.

B. Variable Projection Method
The theory of variable projection methods was laid down by Golub and Pereyra in [5].They provided formulas for theoretical and numerical derivation for the related Gauss-Newton type algorithms.Since then these methods have found applications in many areas including neural networks, telecommunications, dynamical systems, etc.A good summary on them are given in [9], [10].We point out that several well-known transformations, e.g.B-splines, orthogonal polynomials, can be understood as variable projections.As a consequence the related results can be interpreted in a unified framework, which has avoided the attention so far.The above-mentioned special models are discussed in the next subsection.
Let us now suppose that instead of a fixed function system we have a collection of systems { Θ(a) : a ∈ Γ 1 } where Γ 1 is an index set.We will assume that for every a ∈ Γ 1 the function system Θ(a) is of the form Θ(a) := { Θ j (•, a) ∈ H : 0 ≤ j < N }, where N ∈ N is fixed and the Θ j (•, a)'s are linearly independent.Then the general nonlinear model is where the c j 's are real or complex coefficients.In order to get the best model for a particular analog signal f ∈ H, we need to find the optimal values for c and a, i.e., to minimize the nonlinear functional Let us denote the a dependent Gram matrix by G(a).For a fixed parameter a the optimalization with respect to c is linear and the result is the orthogonal projection denoted by P N Θ(a) f .We note that c(a) can be calculated by solving G(a)c(a) = b(a).
Using this fact, the original problem in Eq. ( 3) can be reduced to minimize which following Golub and Pereyra [5] is called variable pojection functional.
In applications, we work with discrete signals f ∈ R M and with matrices Θ(a) ∈ R M ×N representing discrete function systems.The formulas developed above for the analog case can easily be adjusted to obtain their discrete versions.Namely, the coefficient vector c(a) ∈ R N can be calculated via c(a) = Θ(a) + f , where Θ(a) + is the Moore-Penrose generalized inverse of Θ(a).Moreover, the discrete variant of the orthogonal variable projection operator is P N Θ(a) = Θ(a)Θ(a) + .Then the discrete variable projection functional is of the form where • 2 stands for the usual Euclidean norm in R M .In [5], Golub and Pereyra showed that the functionals r(c, a) and r 2 (a) have the same global minima in both the analog and the discrete versions.Furthermore, they demonstrated the fact that iterative nonlinear algorithms converge faster on the reduced r 2 (a) problem.

C. Examples for Variable Projections
Now we show examples that can be understood as special variable projections.Consequently the optimization problems involving them can be reduced to the form given in Eq. (5).Although the theoretical framework is the same in the discussed cases the corresponding algorithms can be different in several respects, such as efficiency, complexity.We demonstrate the pros and cons using the example of ECG compression.
1) B-Splines: We consider the optimization of B-splines B j, (•, a) (n ∈ N + , j = 0, . . ., n − 1, a ∈ R n ) of degree ∈ N, which are defined by where Let us take the B-splines with fixed degree .Then the base functions in the variable projection model are Θ j (t, a) = B j, (t, a), and the free parameter is the vector of free knots a = (a 0 , a 1 , . . ., a n−1 ) T .The problem is to find the optimal locations of the knots.Unfortunately, the reduced functional r 2 (a) has several stationary points in this case, which makes the optimization quite a hard task.This phenomenon, called lethargy problem, was thoroughly studied by Jupp in [11].We note that the B-spline model with free knots was adapted to ECG data compression tasks by Karczewicz and Gabbouj in [12].Here, the optimization process is an iterative method, which removes the least significant knot at each step.Because of its high flexibility both the approximation and compression properties of the spline algorithm are comparable with the recent methods (see e.g.Table IV).On the other hand, the computational cost is high due to lack of orthogonality.
2) Hermite Orthogonal Polynomials: Orthogonal polynomials are widely used in signal processing.In particular, Hermite polynomials have found many applications in ECG compression [13], [14], classification [15], [16], feature analysis [17] and QRS detection [18].Classical Hermite polynomials are defined by the following recursion: where H 0 (t) = 1 and H 1 (t) = 2t.The so-called Hermite functions are constructed from them by dilation: These functions form a complete orthonormal system in L 2 (R).
In the Hermite type model of ECG signals, three individual variable projection methods are combined.Each of them is based on a dilated Hermite system Θ j (t, a i ) := ϕ j (t, a i ) (0 ≤ i < 3) and is used to represent the corresponding segments P, QRS, T of the heart beat.The best value of each dilation parameter a i is determined via optimization.It is worth mentioning that in a recent paper [14] on discrete Hermite functions a significant improvement is presented in terms of compression of the QRS complex.In Section VII we enhanced the original algorithm in [13] by combining it with [14] and used it in our comparative study.For the theory of orthogonal polynomials and numerical algorithms we refer to the fundamental books [19], [20].
3) Wavelets: The discrete wavelet transform (DWT) is one of the most popular transforms in signal processing [21], [22].The construction of wavelet transforms (WT) is based on a so-called scaling function φ for which the translates {φ(t − k)} k∈Z form an unconditional orthonormal basis in the initial subspace V 0 ⊂ L 2 (R).V 0 is supposed to be invariant with respect to integer translations.It generates the multiresolution Then, the original signal is decomposed according to the subspaces V i and W i , where W i is the orthogonal complement of V i with respect to V i+1 .The corresponding bases in V i and W i are respectively where ψ is the so-called mother wavelet induced by φ.
In practice we deal with discrete wavelet transform for finite signals, which can be viewed as periodic signals.There the scaling function φ and the mother wavelet ψ are completely characterized by a compactly supported low-pass filter h as follows: where g k = (−1) k h L−1−k is a high-pass filter and L is the filter length.Although the constraints of orthogonality restrict the values of the filter's coefficients, we still have L 2 − 1 degreeof-freedom to choose h k (see e.g.Sect.5.9 in [23]).We note that, for ECG modeling [24], [25] the filter dimension is usually L = 6.Then there are two free parameters a 1 and a 2 which determine the filter's coefficients h k [23]: Now we insert the DWT into our framework by defining the vector index j = (i, k).Then, for a fixed L = 6 dimension we have Θ j (t, a) := ψ j (t, a), where a = (a 1 , a 2 ) T .4) Rational Functions: In our last example we consider rational function systems, which can be viewed as the generalization of polynomial systems [26].To this order let C stand for the set of complex numbers, for the unit circle (or torus).For a sequence a = {a n ∈ D} n∈N the elements of the corresponding orthogonal system, which is called Malmquist-Takenaka (MT) system [27], [28], can be given in an explicit form as follows: where B(z, a) is the so-called Blaschke function defined by The parameter a is called inverse pole, where 1/a is the pole in the usual sense.
For the finite-dimensional version let a = (a 0 , . . ., a n−1 ) T ∈ D n be a vector of distinct inverse poles with multiplicities m = (m 0 , . . ., m n−1 ) T ∈ N n + .Then we will consider the MT system that corresponds to the inverse pole vector: where We note that although the MT system itself depends on the order of the inverse poles, the generated subspace and so the projection is invariant with respect to it.In this setting a signal f belonging to the Hardy space H 2 (D) is modeled by taking Θ j (t, b) = Φ j (e it , b) in Eq. ( 1) with which are called the MT -Fourier coefficients.Applying appropriate discretization algorithms such as those in [29], [30], the integral above can be substituted by finite sums (see Theorem 2. in [31]).We note that this model was effectively used in QRS modeling [32], system identification [33], EEG seizure classification [34]- [36], sleep stage classification [37], etc.

III. GENERALIZED VARIABLE PROJECTION METHOD
Let us start with the variable projection model and the corresponding functional given in Eq. ( 4).We note that in this model the dimension N of the subspace is a priori fixed.Our aim is to develop a generalization by dropping this constraint, i.e., adding a new free parameter related to the dimension of the subspace.Toward the definition of the generalized variable projection method we define the new index set as In simple cases Γ 2 = N represents the dimension itself.This is the situation in Section II-C on Hermite functions.In that case increasing N results in nested subspaces and so in a better minimum of the nonlinear functional Eq. ( 4).Consequently the optimization would terminate at the highest possible value of the dimension.On the other hand, high dimensions are not desired in real applications because it increases the complexity of the model.For controlling the dimension we introduce a penalty function Λ(N ) that is monotonically increasing.For the rest of the paper we always assume that f is of zero mean with unit variance.Then the generalized variable projection functional including the penalty term is defined as follows: ) There are however more complex cases when the parameters in Γ 2 are not simply dimensions, and the subspaces are not embedded into each other.In order to address this problem we modify the definition above to obtain the final form for the generalized variable projection functional (10) where Θ(a, d) := { Θ j (•, a, d) ∈ H : 0 ≤ j < N(d) } and Λ(d) increasing in d measures the complexity of the corresponding system in some sense.It may depend not only on the system but also on the specific task.
It is easy to see that the variable projection functional can be considered as a special case in which Γ 2 has exactly one element.We note that the main advantage of the generalized method is the simultaneous optimization with respect to the system and the dimension parameters.On the other hand, it makes sense only if all of the following conditions hold for the system: 1) it is flexible enough but easy to parametrize; 2) the complexity function is properly designed; 3) an efficient optimization can be constructed.(11) We want to underline the fact that the rational functions are satisfy all these conditions.We will demonstrate the feasibility of rational function systems for generalized variable projection via signal compression problems.One of the key questions is to find an effective optimization algorithm.A generalized PSO type algorithm will serve our purpose.In the next section we establish the construction of it by starting from the basic version.

A. Basic PSO Algorithm
The basic PSO algorithm was introduced by Eberhart and Kennedy [38] as a population based stochastic optimization technique.In case of n dimensional search space, the method is initialized by a random population where S ∈ N + denotes the size of the swarm and every x k is a potential solution for the optimization problem.The x k 's correspond to inverse pole configurations in our problem.For every x k ∈ R n let y k ∈ R n denote the personal and let y ∈ R n denote the global best solutions achieved so far.In each step, both the position and the velocity of the particles are updated in the following way: where the learning factors c 1 , c 2 are predefined constants and r 1 , r 2 ∈ (0, 1) are uniformly distributed random numbers.The inertia weight w was introduced later [39] in order to control the overall behavior of the swarm.For instance, one can favor exploration in the first few steps by increasing the value of w .Arbitrary large jumps are usually inhibited in the search space.
To this end, the velocities and the positions are restricted to a certain interval defined by the parameters, V max , X min , X max .We note that following the standard we use this algorithm by setting c 1 := 1.5, c 2 := 2.Moreover, w is linearly decreasing from 0.8 to 0.2.For other strategies of the parameter selection and convergence analysis we refer to [40], [41].

B. Hyperbolic PSO Algorithm for Single-Pole Problems
In this section we develop the hyperbolic variant, inspired by single-pole rational optimization, of the PSO method.In this case the particles contain only two coordinates, i.e., the real and imaginary parts of the inverse pole.If the algorithm terminates in the kth optimal particle, then the optimal inverse pole is x k,1 + ix k,2 .Furthermore, as we know from Section II-C4, the inverse poles of the MT system must belong to the open unit disc D. Hence, the search space is D. This implies the idea to use the Poincaré model of the hyperbolic geometry to keep the particles within the search space.According to this idea, we will replace the arithmetic operators in Eq. ( 12) by their hyperbolic variants.We note that in this way the constrained optimization problem converts to non-constrained one.Moreover, by using proper mappings it can be applied to regions more general than the unit disc.[42] serves as a general reference work in this section.
1) Hyperbolic Scaling: Using the terminology of Euclidean geometry, the vector scalar multiplication of the hyperbolic space can be defined in a similar way.Namely, it means the scaling of a hyperbolic vector by keeping its direction.In this case, the geodesics of this space are represented by arcs of circles that are orthogonal to the torus.We recall the definition of the hyperbolic metric for which (D, ρ) is a complete metric space.This metric space is invariant with respect to the Blaschke transforms B(t, a) := B(t, a), where a := (a, ) ∈ D × T .We will use the fact that the hyperbolic segments can be defined via B(t, a), which maps the interval [0, p] onto the hyperbolic segment connecting w 1 , and w 2 , where Now the hyperbolic vector − −− → w 1 w2 can be defined as a directed segment with B(0, a) = w 1 and B(p, a) = w 2 .Let us consider the scaling of a hyperbolic vector − −− → w 1w2 by the factor λ ∈ R. The new endpoint w λ of the solution vector − −− → w 1 w λ is In summary, the hyperbolic scaling λ − −− → w 1 w2 := − −− → w 1 w λ can be evaluated with w λ = B(s λ , a) as Eqs.( 13)-( 14) for any λ ∈ R.
2) Hyperbolic Addition: It turned out that the right way to define the hyperbolic addition is based on the compositions of Blaschke functions.Namely, it can be shown that the collection of Blaschke transforms B := {B(•, a) : a ∈ D × T } is closed for composition.Moreover, (B, •) is a subgroup of the well-known Möbius transformations, which maps the unit disc onto itself.In particular, if a 1 = (w 1 , 1) and a 2 = (w 2 , 1), then B(•, a 1 ) • B(•, a 2 ) = B(•, a), where a = (w, ) with = 1 + w 1 w 2 1 + w 1 w 2 and w = w 1 + w 2 1 + w 1 w 2 .The latter formula can be interpreted as a vector addition in the hyperbolic space for vectors with initial point at zero and endpoints at w 1 , w 2 (see Sections 3.4-3.5 in [43]).Following this, we will use the operations Then the hyperbolic PSO (HPSO) is defined by replacing •, +, − in Eq. ( 12) with their hyperbolic variants , ⊕, .This algorithm can be applied directly in the single-pole case, i.e., when there is only one inverse pole, in other words n = 1 in Eq. ( 7) and its multiplicity is fixed.

C. Hyperbolic PSO Algorithm for Multi-Pole Problems
The multi-pole problem is to determine the optimal inverse pole combination a = (a 0 , . . ., a n−1 ) T ∈ D n with fixed multiplicities m = (m 0 , . . ., m n−1 ) T ∈ N n + .For most of the heartbeats three poles are well separated according to the natural segmentation (P, T waves, QRS complex).Because of the localization property of basic rational functions the interference between terms with different poles is relatively small [44].Therefore, we perform the optimization separately on each complex coordinate min a i r 2 (a 0 , . . ., a i , . . ., a n−1 ) (i = 0, . . ., n − 1).(15) Then the solution of the multi-pole problem is reduced to successive applications of the single-pole optimization.It is a natural consequence of the hyperbolic model that the swarm cannot leave the unit circle during the algorithm.This makes the constraints X min , X max used in the original Euclidean algorithm unnecessary.
In [45] we showed that the HPSO outperforms the well-known Nelder-Mead simplex algorithm in terms of reconstruction error and stability.The latter was proved to be important, especially when the MT system is used in classification problems [35]- [37].

V. GENERALIZED RATIONAL VARIABLE PROJECTION WITH OPTIMIZATION BY MULTI-DIMENSIONAL HPSO
In this section we give a non-trivial example for generalized variable projection inspired by a real applications, namely by signal compression.Keeping our eye on conditions (11) in Section III, we first choose a proper system.This will be the system of rational functions; the reasons behind it were provided in Section II-C4.

A. Cost Function
The first term in Eq. (10), which measures how well the signal is represented by the projection, is set.Furthermore, it is natural to assume that the penalty term is strongly connected with the compression ratio in this case.According to this we specify the cost function as the linear combination of the approximation error and the reciprocal compression ratio (RCR) as follows: where f ∈ R M is a discrete signal with M samples, and Recall that f is of zero mean with unit variance, n is the number of distinctive poles and N is the sum of multiplicities (see Eq. ( 7)).The approximation error is computed as the usual percent root mean square difference (PRD).We point out that the second term containing RCR plays two roles.Namely, n + N is obviously inversely proportional to the compression ratio on the algorithmic level.On the other hand, RCR can be viewed as the measure of complexity (dimension) of the system.The consequence of the penalty term is that the gain in PRD must reach a certain level in order to move to a higher dimension.The role of the regularization parameter α ∈ (0, 1] is to customize the method to different applications.The proper value of α for ECG compression is given in Section VI-C2.Since α is constant in a given problem, the cost function can be divided by 100 • α to obtain The minimization of the cost function is designed to find the optimal pole configuration and the corresponding optimal pole positions.In order to do that we need to define an architecture space designed especially for the ECG signal compression problem.

B. Architecture Space
The pole configurations are described by the parameter vector m in Eq. (7).We note that the structure of the configuration set is rather complicated in the sense that the subspaces generated by different configurations are not comparable even if the numbers of the inverse poles along with the total dimensions of the subspaces, which are the sums of the multiplicities, are the same (see e.g. the cases (2, 6, 2) and (2,4,4)).This is a more complicated situation than the case of nested subspaces, which naturally induces a linear ordering and leads to the problem in Eq. ( 9).In the next step we will simplify the parameter set, i.e., we assign a virtual dimension d to each parameter vector m, in such a way that it follows the system complexity n + N .Based on our experience we chose a set of 30 configurations that are worth considering in the ECG compression problem.Table I contains the pole configurations m, system complexity n + N and the virtual dimension d.We note that there is however another expectation concerning dimensions.Namely, it is natural to expect that the increase of d results in better PRD.Figs. 1 show that taking records 117 and 119 from the dataset [8] our ordering defined purely on the basis of system complexity n + N behaves adequately in this respect.It is easy to see that by introducing the virtual dimension d, Eq. ( 17) can even formally be considered as an example for the generalized variable projection functional in Eq. (10) with

M
, where the connection between d and n + N is given in Table I.
In former works [32], [46], [47], it turned out that three inverse poles are sufficient for accurate representation of the heartbeats.These inverse poles are well separated and their multiplicities reflects the natural segmentation (P, T waves, QRS complex) of the heartbeat.Therefore, the inverse pole that corresponds to the QRS complex usually dominates the approximation, i.e., its multiplicity is higher than the others.The second and third most significant inverse poles correspond to the T and the P wave, respectively.Of course, in case of abnormal heartbeats some of these waves can be missing.These observations justify the pole configurations in Table I that we chose especially for ECG compression.

C. Multi-Dimensional HPSO
After having defined the cost function, i.e., the generalized variable projection functional, we turn to the problem of optimization.To this order we take the so-called multi-dimensional (MD) PSO algorithm introduced by Kiranyaz et al. [6], and we adapt it to our case.MDPSO is a generalization of PSO, which along with the process of adaptation was established in Section IV.PSO based algorithms are constructed for static environments, but many practical problems change dynamically.This motivated the generalization of PSO to MDPSO, in which the dimensions are not fixed a priori.Then the optimization becomes a mixed integer nonlinear programming (MINLP) problem.The native structure of the swarm was extended by dimensional parameters.Thus, the particles can seek both positional and dimensional optima.The MDPSO was originally developed to evolve Artificial Neural Networks (ANN) for supervised learning [48], where the weights and bias of the network should be determined in order to minimize the classification error.We emphasize that since there is no penalty term concerning the structure of the ANNs, this optimization problem is still a variable projection method (see Section 3.2 in [48] and Section 1 in [9]).The original MDPSO algorithm [6] is the following Position updates: Dimension updates: where [.] is the integer rounding operator.The main changes compared to PSO given in Eq. ( 12) are the dimensional indices d k , d k , d ∈ I = {d min , . . ., d max }, which denote the current, personal and global best dimensions, respectively.In this case, every particle has a certain position and velocity in each dimension.For instance, x d k k denotes the position of the kth particle at the dimension d k ∈ I.The dimensions are kept within I by using so-called clamping operator.Note that in this algorithm the dimension parameter is a natural number assigned to every ANN structure.
Following the reasoning given in Section IV we adapt MDPSO to rational systems by replacing the arithmetic operations in the position update equations Eq. ( 18) by their hyperbolic variants.We call this modification MDHPSO and provide its pseudocode Alg. 1 in the Appendix.Recall that the vector m in Eq. ( 7) is the natural characterization of the complexity of the rational system.This was converted in the sequence 1, . . ., 30 in Table I.The real explanation of this conversion was to make our 2. Approximation of biomedical signals with α = 0.5 by using the first four channels of the record slp02a from the MIT-BIH/slpdb database [8].
optimization problem compatible with MDHPSO.In this way we made sure that our generalized rational variable projection constructed for ECG meets all the three conditions in (11).
Fig. 2 shows examples where the multi-dimensional rational variable model along with MDHPSO is applied to different types of signals such as blood pressure (BP), respiration (RESP), ECG and EEG.It is transparent that the method automatically adjusts the number of the inverse poles and the coefficients to the complexity of the signal.For this reason, only two inverse poles are used to represent the BP and RESP signals with 6 and 8 coefficients, and dimensional parameters d = 1, 4 respectively.In case of ECG and EEG signals the optimal dimensions of the architecture space are increased.The number of inverse poles changes from two to three, the number of coefficients are 12 and 18, and the dimensional parameters are d = 14, 23.We note that the solution produced by the MDHPSO can be refined by fast local methods such as Gauss-Newton algorithms staying within the optimal dimension.

VI. ECG COMPRESSION
Biomedical monitoring of the human body is one of the most important tools for patient's diagnosis.It is for instance principal for proactive prevention of diseases.We note that long-term recordings such as ECG Holter-monitoring or 24 hours multi-channel electroencephalogram (EEG) recordings generate a large amount of data.This explains the need for compression in such cases.Of course the variable projection can also be used for various tasks other than compression including features extraction for classification, person identification, etc.Here we choose the task of compression of ECG to demonstrate the efficiency of our method.
A wide range of algorithms have been proposed in this field.They can be classified into three categories [49], namely, parameter extraction algorithms, direct time-domain methods, and transform-domain techniques.Here we are considering the latter one, which can be interpreted as projections of the signal to low dimensional subspaces.Existing methods use sinusoids, wavelets, wavelet packets, Walsh functions, orthogonal polynomials, splines, principal components, etc. [13], [50]- [53].For instance, Karczewicz and Gabbouj [12] proposed an algorithm that approximates the signal by linear combinations of B-splines with free knots.Although the algorithm is highly adaptive the computational cost is quite large due to the lack of orthogonality.In order to overcome this problem, orthogonal polynomials (in particular Hermite functions) were applied [13], [15], [16].As a consequence of orthogonality, the coefficients of the Hermite representation can be easily calculated by using scalar products of the corresponding Hilbert space.Efficient implementations for both the continuous and the discrete cases are presented in [14], [17].In contrast with the B-splines, Hermite-based compression schemes contain only one free parameter, namely the dilation of the base functions.Analogously, parametrized orthogonal wavelets can also be applied for signal compression.Then a wavelet decomposition with L/2 − 1 degrees-offreedom is obtained, where L is the length of the wavelet filter.Theoretically the number of free parameters is infinite, but for a large number of parameters the formulas become unmanageable.For this reason, in most of the signal processing applications the length of the filter is restricted to L = 6, which means only two degrees-of-freedom (see e.g.[24], [25], [54]).We note that only a few methods utilize rational functions [32], [46], [47], [55], [56], and even in those cases, the complexity of the models is fixed a priori.We will show that our approach is essentially different from them, and the proposed algorithm outperforms these methods.

A. Preprocessing Stage 1) Beat Detection/Normalization:
The compression method is based on successive evaluations of the MDHPSO algorithm on each heartbeat.The QRS complexes should be detected first to identify the heartbeats [57].Then we follow [58] to get the segmentation.Namely, the original signal is cut at every 130th sample before each QRS peak.In the next step the linear correction is applied to avoid jumps at the endpoints: where f = (f 0 , f 1 , . . ., f M −1 ) T is the discrete signal segment and M is the number of samples.Then we apply normalization For the reconstruction of f the values f 0 , f M −1 and f * 2 should be stored as well.
2) Hilbert Transform: We will apply a rational transform, given in Section II-C4, for ECG records.Recall that the signal in Eq. ( 8) belongs to H 2 (D).This implies that the real function representing a heartbeat should be extended to a complex valued function in H 2 (D).It can be preformed by means of the wellknown Hilbert transform.Therefore we will employ the discrete Hilbert transform H to f to obtain F := f + iH f .

B. Compressing Stage
For the compression of F obtained during preprocessing we will apply the multi-dimensional generalized rational variable projection method along with MDHPSO developed in Section V. Note that the original MDPSO was successfully applied in optimization problems related to dynamical environments [6].From this point of view, the problem of ECG compression is similar due to the physiological behavior of the human heart.Although, the ECG signals are characterized by strong interbeat correlation, the segments are influenced by several factors including the respiration rate.Namely, the heart rate (HR) increases during inhalation and decreases during exhalation.In some cases only a few coefficients are needed for a good approximation.In other cases more coefficients are required to store the significant diagnostic information.Typical examples are the abnormal heartbeats, in which sudden changes are present.
1) Basic Algorithm: After the preprocessing stage, the MDHPSO algorithm and the corresponding rational projection are executed on each heartbeat.The result is an optimal inverse pole configuration, which is quantized and stored together with the related coefficients.The architecture space in Table I is also saved in the header of the compressed file.It serves as a look-up table for the pole configurations.The structure of the compressed file and the block diagram of the method can be seen in Table II and in Fig. 5.
2) Aligned Algorithm: As mentioned above, ECG signals have strong interbeat redundancy caused by cardiac cycles.Taking advantage of this phenomenon, compression results can be highly improved in certain situations.For this reason, we apply the average beat subtraction technique [59].In this approach, the length of the heartbeats are equalized by zero-padding after segmentation and beat alignment.Then, the average of the first 30 beats in a record is approximated by using the pole configuration of the highest dimension of the architecture space.It provides an accurate representation of the mean cycle which is subtracted from all the segments of the ECG.Finally, the basic compression algorithm is applied on the residual signal.Although, the parameters of the average beat should also be stored in the header, higher accuracy/compression ratio is expected due to the interbeat correlation.

C. Parameter Estimation 1) Initial Swarm:
In order to speed up the convergence of the optimization, we use a starting estimate for the system parameters.Namely, we keep the optimal pole configurations of the previous segments in the initial swarm of the MDH-PSO algorithm.More precisely, MDHPSO is initialized with a random swarm in which the first anb particles contain the optimal pole configurations of the previous anb beats.anb is the average number of beats in a respiratory cycle computed by anb = [HR/brc], where brc denotes the number of breaths counted in a minute.In our implementation we set anb = 5.
2) Calibration of Parameter α: We investigated the influence of the regularization parameter α in the cost function f cost on CR and PRD.Our goal was to achieve the highest CR by keeping PRD within the "very good range" according to Table III (cf.[60]).To this end, we took 24 records (see e.g.Table IV) of the MIT-BIH Arrhythmia database from Physionet [8].Both the basic and the aligned versions of the proposed algorithm were applied on the signals by setting α = j/10, (j = 1, . . ., 10).By Fig. 3 we conclude that 0.5 meets our requirements for these records.Indeed, we evaluated the normalized quality scores (QSN) in Eq. ( 21), which indicates that the optimal value of α is around 0.5.Although the performance is slightly better in terms of QSN for α = 0.6, the corresponding PRD degrades to the "good" quality range.Therefore, we performed the tests for α = 0.5.The results in Table IV show that this value is a good choice generally.
3) Quantization: The optimal inverse poles and the corresponding coefficients are linearly quantized and stored in the compressed file.In order to find the optimal number of bits for the representations, we use the records 117 and 119.These are extremely irregular ECGs, which are widely used for evaluating the performance of ECG compression algorithms (see e.g.[25], [58]).We need k bits to represent the arguments ∠a j , ∠c j and the absolute values |a j |, |c j | if the quantization steps are 2π/2 k and 1/2 k , respectively.Indeed, each beat is divided by its 2 norm, so the energy of the signal is equal to 1.As a consequence of the orthogonality of the MT system and the Parseval's theorem, the constraint |c j | ≤ 1 holds.Moreover, a j ∈ D, so 1/2 k is a proper quantization step for both |a j | and |c j |.In order to find the optimal quantization step, we executed the MDHPSO 100 times on the first four minutes of the records 117 and 119 for each quantization step.The average reconstruction errors  of the runs for k = 2, . . ., 10 is shown in Fig. 4. It is evident that 4 bits are enough to represent the angles and the absolute values of the inverse poles.Similarly, it is not worth to store the coefficients using more than 7 bits, since it does not improve the PRD significantly.The number of bits assigned to the quantities included in the compressed file are displayed in Table II.
We note that the quantization of the inverse poles is especially important, since it directly influences the optimization.In order to demonstrate this effect, we examined the compression performance on the same 24 records as in Table IV by using two quantization configurations: p3c7 and p4c7.The abbreviaton pxcy denotes the number of bits we used to store both the absolute values and angles of the poles (p) and the coefficients (c), e.g.p4c7 denotes the optimal setup.As was to be expected, the sub-optimal setup p3c7 yielded worse reconstruction error (PRD) and compression ratio (CR) compared to p4c7.The reason is twofold.On one hand, storing the parameters on a better resolution decreases the PRD.On the other hand, in case of p3c7, it turned out that the low resolution of the inverse poles were compensated by an increased number of coefficients.In the optimal setup p4c7, the optimization choose less complex rational function systems to minimize the cost function which means less coefficients to store, i.e., better CR.This phenomenon can be seen in Fig. 6, where we displayed the histograms of the best dimension indices at the final iteration of the optimization.The figure shows that the optimization for p4c7 terminated more frequently in lower dimensions than its counterpart p3c7.

VII. EXPERIMENTS
For comparison tests we used the MIT/BIH arrhythmia database [8], which has been widely used for testing ECG compression techniques.The dataset contains 48 half-hour long ECG recordings, which was digitized to 11-bit resolution.We note that in many papers the tests were performed for only a small portion (couple of records, minutes) of the whole set.Here we take those 24 records suggested in [61], and compress the first channel of the entire signal.It results in a total of 12 hours long raw ECG data which was used to compare the proposed algorithm and those described in Section II-C.We note that we implemented the recent versions of the corresponding algorithms.Namely, the Hermite expansions of the QRS complexes were calculated by discrete orthogonal polynomials [14] and the mother wavelet parametrization was applied along with wavelet packets optimization [24].In the latter case, the wavelet domain was represented via the modified run-length coding recommended in [25].
Although the reconstruction error is usually expressed in terms of PRD, we cannot ignore the distortion measures designed especially for ECG signals.Unfortunately, only a few distortion criteria are available in ECG compression for performance evaluations.One of them is the so-called weighted diagnostic distortion (WDD) measure [62].On one hand, it correlates very well with mean opinion scores (MOS) of the clinical experts.On the other hand, there is no standard code for computing the WDD.Besides, the measure is unstable due to the requirement of accurate classification for characteristic features of the ECG signal.For this reason, we will use another diagnostic distortion measure called the wavelet-based weighted (WW) PRD [60].In that case, the signal is decomposed into five sub-bands which are weighted regarding their cardiological significance.Let us denote the coefficient vectors of the jth wavelet level of the original and the reconstructed signals with zero mean by c j and ≈ c j , respectively.Then the WWPRD can be defined as follows: The sequence of weights are w = 6 27 , 9 27 , 7 27 , 3 27 , 1 27 , 1 27 which were heuristically assigned to each sub-band emphasizing their diagnostic significance.We note that both the PRD and the WWPRD measures provide five quality groups corresponding to different ranges of the values, which are summarized in Table III (see Table VII in [60]).According to [63], the PRD should be computed for zero mean signals.Then it is not affected by the mean of the signal.In order to distinguish the two types of PRD we will use the following notations: where ≈ f is the reconstructed signal, and PRDN stands for the normalized PRD, i.e., PRD of a signal with zero mean.
We performed the comparison tests in two stages.First, the proposed algorithm, both the basic and the aligned variants, were executed by using the optimal parameters given in Section VI-C.Second, we took the four other variable projection methods discussed in Section II-C: B-splines [12], Hermite functions [13], [14], wavelets [25], wavelet packets [24], [64], [65].In order to make a fair comparison, we executed these four algorithms iteratively till their PRDNs became close to those of our basic method for each record.We increased the number of coefficients gradually, and we stopped the iteration if the PRDN of our method (the values in the fourth column of Table IV) or the maximum number of coefficients was reached.As a consequence, sometimes the PRDN was not even close to that of our method (see e.g. the Hermite representation for the record 102).One can see the results of the experiment in Table IV, where CR denotes the ratio of the number of bits of the original and the compressed files.Note that we used the normalized definition of the PRDN which is independent of the mean of the signal.In addition, we calculated the PRDN for each beat and presented their average for the in Table IV.

VIII. DISCUSSION
First, we compare the basic and the aligned variations of the proposed compression algorithm.As it was shown in [59], the regularity of ECG cycles facilitates the application of beat subtraction techniques.This complies with the significant improvement that can be seen for more than half of the records: 100, 103, 105,115,201,202,205,209,213,214,215,217,219,232.Namely, in these cases the mean of the differences in PRDNs and WWPRDs are 0.41% and 0.62%, and the compression ratio is 1.38 times better in the aligned case, on the average.These results agree with those in [59], especially for the records 103, 201, 202, 213, 214, 215, 217, 219 (see e.g.Fig. 8 in [59]).The standard deviations of PRDN, WWPRD and CR for the aligned and the basic algorithms are close to each other and quite low.This indicates the stability of the algorithms.Moreover, according to Table III, the average PRDN  and WWPRD of the reconstructed ECG signals fall within the range of very good and good quality for the basic and the aligned variation of the proposed method.On the other hand the aligned method is not always preferable, see e.g. the records 104, 207, 208 in Table IV.In these cases, the average beat estimation is poor due to the high presence of abnormal beats and irregular cardiac cycles.According to [59], the compression performance of the simple average beat subtraction technique highly depends on the similarity between the estimated average beat and the beats to be compressed.Therefore, a big improvement can be achieved for those recordings that present one type of beats in a regular rhythm.In order to analyze the performance of the aligned algorithm, we computed the average Pearson correlation coefficient ρ x,y between the estimated average beat and the beats to be compressed.As we discussed, the rhythm is also important, hence we calculated the standard deviation σ RR of the R to R peak distances.Since the compression performance is proportional to 1/σ RR and ρ x,y , we analyzed these measures along with the normalized quality score differences: where QSN is defined in Eq. (21).Fig. 7 shows that the average beat subtraction improved the compression performance for almost every record, except 104, 207 and 208.In case of 207 and 208, both 1/σ RR and ρ x,y are low, i.e., the average beat estimations are bad and the rhythms are irregular.In Fig. 8, we displayed the first 30 beats of the record 207.The set comprises two completely different beat types, thus their simple average will be a bad estimate of any of them.Record 104 is another example, which contains a lot of paced (No 1373) and fusion (No 664) beats.The difference between these two beat types can be seen in Fig. 9(a).Also, Fig. 9(b) shows the first 30 beats of the record 104.Although the estimated average beat is quite similar to the paced beats, it is still very different from the fused ones.This is why the resulting QSN is lower for the aligned method, despite of the high 1/σ RR and ρ x,y values.We note that there are other more sophisticated beat subtraction techniques including average beat code books.Based on the results on regular ECG records, we expect the proposed method can be further improved by using these techniques.This will be a part of future work.Now, we compare the proposed method with the other four ECG models discussed in Section II-C.We note that also these algorithms can be modified by beat subtraction techniques.Therefore, in order to make a fair comparison, we exclude the aligned method from the analysis.Moreover, as mentioned in Section VII, every method was adjusted to have PRDNs similar to the basic method.This is reflected in the "Reconstruction error" section of Table IV.Comparing the WWPRDs we conclude that the basic method is superior to the others both in terms of average and standard deviation.The same holds for most of the individual records as well.It is worth mentioning that the averages of both PRDN and WWPRD fall in the very good quality category.The best values for the standard deviations indicate the reliability of getting high quality compressed signals with CR 1:15.It is evident that the highest CR, which was our ultimate motivation, was achieved by our algorithm.The CR values for the others, except for the B-spline case, are significantly lower.In terms of WWPRD and CR the B-spline method turned to be the second best.We note that it has much higher computational cost compared to the basic method due to the iterative knot removal algorithm and the lack of orthogonality.
The variable projection via Hermite functions is one of the least effective methods in sense of PRDN, and WWPRD.In fact, the desired level of PRDN could not be reached, within the predefined low compression ratio in several cases.This is due to the segment based representation.We get a good approximation of a wave/spike if the main peak is close to the middle of the segment.On the other hand, if the predictions of the endpoints of the P, QRS, T waves are not good enough, then the approximation error is higher.The three free dilation parameters available cannot compensate the effect of bad segmentation.
We note that the optimization of AWT with respect to the two free parameters always terminated near the values a 1 = 1.3598, a 2 = −0.7821.The average differences are less than 7.7 • 10 −3 and 1.1 • 10 −2 , respectively.These are the parameters This complies with the experience in [24].The idea behind the optimal behavior of the db3 wavelets for ECG signals lie in the facts that they have three vanishing moments (see e.g.[23], [66]), and ECG curves are quite smooth signals.The latter one explains the tendency of using B-splines, orthogonal polynomials in this field.
The optimization of the adaptive wavelet packet transform (AWPT) is carried out in two steps.Following [24], we selected the best basis from the wavelet packet tree with depth 7 according to the Shannon entropy.Then, we changed the parameters of the mother wavelet using Eq. ( 6).As shown in Table IV, the CR values of AWPT are worse than those of AWT.This is due to the fact that the optimal parameters were again close to the wavelet basis db3.In contrast to the simple AWT, we have to store the best wavelet basis as well, which explains the low CR values.Our observation again coincides with the results of [24].In that work, the same adaptive wavelets and wavelet packets were applied as in Section II-C for compressing ECG and EMG signals.Although they achieved a high improvement for EMG signals, the results on ECG curves were very close to the db3 wavelet basis.Hence, they concluded that the gain in performance is strongly depended on the signal type.We call the attention to the fact that the diagnostic distortion (WWPRD), which was not examined in previous works like [24], [64], [65], is significantly higher for AWPT than for AWT.
We also compare the proposed algorithm to state-of-theart methods that are based on various approaches including wavelets [25], [67], [68], wavelet packets [74], wavelet encoding [69], [71], discrete cosine transform [70], image compression [73], and deep neural networks [72].In order to evaluate the performance of these methods the so-called quality scores were introduced in [61]: QS = CR/PRD, QSN = CR/PRDN.(21) The higher the QS or QSN, the better the performance.In Table V, we summarized the results for those records that are common in the cited papers and in our work.The results show that the aligned variation of the basic algorithm outperforms all the others.In addition, even the basic procedure is very close to the competing methods.Particularly, the QS is very high for [55], [56], which are also utilizing rational functions.However, in those works, the complexity of the model, i.e., the number of inverse poles n, and the subspace dimension N were fixed a priori, while these parameters are found automatically in our algorithm.It is also worth mentioning that most of the competing methods in Table V apply entropy based lossless compression techniques as a final step.These algorithms can be utilized for the proposed method as well, and it would further improve our results.Although the proposed generalized rational variable projection algorithm is quite complex numerically the time complexity is manageable.Let N it stand for the number of iteration of the MDHPSO, and recall the notations for the swarm size S and for the number of dimensions |I|.At the initialization step we need to evaluate the cost function in Eq. ( 16) at each particle and for each dimension.It is followed by S number of function evaluation for each iteration.Therefore the overall number of function evaluation is equal to S • |I| + S • N it .In our experiments the algorithms were implemented in MATLAB.An Intel(R) Core(TM) i7-6700 @ 3.40 GHz CPU was used.The size of the swarm and the number of iteration were set to 30 and 20 in the optimization, respectively.Then the average execution time of a 30 minutes long ECG record is about 23 minutes for both the basic and the aligned algorithms.The only competitive algorithm in terms of WWPRD and CR is the B-spline method, for which the average execution time is about 91 minutes.Note that the B-spline method utilizes a knot removal algorithm in which the initial number of knots is proportional to the number of samples in a beat.Hence, high sampling rate increases the execution time.In contrary, the computational complexity of our method depends on the number of function evaluations and the number of iterations only, which are predefined manually.In addition, one can reduce the number of relevant inverse pole configurations in Table I based a priori information of specific class of signals.The execution time can be further decreased by applying parallel implementations of the PSO and the MDHPSO algorithms, but these are beyond the scope of this work.

IX. CONCLUSION
In the first part of this paper we considered a mathematical model called variable projection and showed that several known signal processing methods can be understood as special case of that.Then we studied the rational function systems in this framework.We generalized the rational variable projection model by adding the model complexity as free parameters.For the optimization of the system parameters we developed the multi-dimensional hyperbolic variant of the well-known PSO algorithm.Based on our experience we believe that our method can be effectively used for various problems in signal processing including data fitting, feature extraction, segmentation, etc.In this work the problem of compressing ECG signals was chosen for demonstrating its efficiency.It turned out that the generalized rational variable projection algorithm outperforms the previously known ones.The MATLAB implementations of all the algorithms included in this paper and the test results are available at the website [75], [76].We emphasize that the proposed algorithm is of general nature.Namely it is flexible and can be adjusted to different types of signals like EEG [35].

Fig. 4 .
Fig. 4. Quantized arguments and absolute values of the inverse poles and the coefficients.

Fig. 7 .
Fig. 7. Analysis of the compression performance of the basic and the aligned algorithm.

Fig. The first 30
Fig. The first 30 beats in the record 207.

TABLE I ARCHITECTURE
SPACE OF THE MDHPSO Fig. 1.Average PRDs of each pole configuration for the records 117, 119 of the PhysioNet MIT-BIH Arrhythmia dataset [8].

TABLE II DATA
STRUCTURE OF THE COMPRESSED FILE

TABLE III PREDICTION
RANGES OF PRDN AND WWPRD Fig. 3. Influence of the parameter α on the PRDN, CR, and QSN.

TABLE V COMPARISON
OFPROPOSED METHODS WITH THE STATE-OF-THE-ART of the Daubechies wavelet db3.This means that the advantage of the adaptivity of this method is negligible in ECG signals.