Maximum Feasible Subsystem Algorithms for Recovery of Compressively Sensed Speech

The goal in signal compression is to reduce the size of the input signal without a significant loss in the quality of the recovered signal. One way to achieve this goal is to apply the principles of compressive sensing, but this has not been particularly successful for real-world signals that are insufficiently sparse, such as speech. We present three new algorithms based on solutions for the maximum feasible subsystem problem (MAX FS) that improve on the state of the art in recovery of compressed speech signals: more highly compressed signals can be successfully recovered with greater quality. The new recovery algorithms deliver sparser solutions when compared with those obtained using traditional compressive sensing recovery algorithms. When tested by recovering compressively sensed speech signals in the TIMIT speech database, the recovered speech has better perceptual quality than speech recovered using traditional compressive sensing recovery algorithms.


I. INTRODUCTION
A SPARSE solution is one in which most of the variables have the value zero. The few variables that take nonzero values are called the support. Sparse solution estimation, or sparse recovery, is an important part of Compressive Sensing (CS) and plays a major role in reconstructing a compressively acquired signal.
Sparse recovery can be cast as an instance of the Maximum Feasible Subsystem problem (MAX FS): given an infeasible set of linear constraints, find the largest cardinality subset that admits a feasible solution. This is the same as the minimum unsatisfied linear relation problem (MIN ULR) of finding the minimum number of constraints in an infeasible linear system such that its complement is feasible [2]. Finding a maximum feasible subsystem has applications in a wide variety of fields, including machine learning [3], misclassification minimization [4], and training of neural networks [2].

In compressive sensing, a sparse input signal a of size n × 1 having S nonzeros (S-sparse) is compressed by multiplying it by an m × n measurement matrix Φ, where m << n, to yield the compressed signal y (also called the measurement vector) of size m × 1, i.e. y = Φa, where Φ is typically a random matrix. Random matrices are considered in compressive sensing as they have the Restricted Isometry Property [11], which is required for signal recovery. The compressed signal y can now be transmitted or stored much more efficiently because of its greatly reduced size.
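As a concrete sketch of the acquisition step y = Φa (illustrative code, not from the paper; the 1/√m scaling of the Gaussian entries is a common normalization choice):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, S = 256, 64, 8          # signal length, number of measurements, sparsity

# Construct an S-sparse input signal a (random support, random values).
a = np.zeros(n)
support = rng.choice(n, size=S, replace=False)
a[support] = rng.standard_normal(S)

# Random Gaussian measurement matrix Phi with m << n; such matrices
# satisfy the Restricted Isometry Property with high probability.
Phi = rng.standard_normal((m, n)) / np.sqrt(m)

# Compression: y = Phi @ a reduces the signal from n to m samples.
y = Phi @ a
print(y.shape)
```

The compressed vector y is what would be transmitted or stored; recovery must later infer the n-dimensional sparse a from these m numbers.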
The goal of the signal recovery process is to recreate the input signal a given the compressed signal y and Φ. This is an underdetermined system that has multiple solutions, but knowing that the input signal is sparse, the recovery process also tries to return a sparse signal. Unfortunately, recovering a sparse solution from an underdetermined system of linear equations is NP-hard [12], but the sparsity of the recovered signal should be close to the sparsity of the input signal so that the "sparse approximation" is almost an exact recovery. Mathematically, the sparse approximation problem is to find x = arg min_x ||x||_0 subject to y = Φx, where the number of nonzeros in a vector is commonly expressed as the zero "norm" ||x||_0. Because the recovery is NP-hard, most algorithms instead minimize some other norm ||x||_p = (Σ_{i=1}^{n} |x_i|^p)^{1/p}, p ≥ 1. Baraniuk [13] evaluated sparse recovery based on ℓp norm minimization at different values of p. Not all norms provide sparse recovery reliably. For instance, ℓ2-minimization performs poorly.
ℓ0 minimization is a difficult nonconvex problem. Donoho and Huo [14], [15] developed a convex optimization approach called Basis Pursuit (BP) which minimizes the ℓ1 norm of x. Basis Pursuit is effective in returning an x that matches the input a when a is very sparse [15]-[17]; that is, BP has small critical sparsity (the maximum sparsity at which the algorithm returns sparse solutions reliably). Beyond the critical sparsity, the recovered signal will usually have more nonzero elements than the original sparse signal, and hence will lead to a poor approximation.
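The ℓ1 minimization behind BP can be posed as a linear program by splitting x = u − v with u, v ≥ 0, so that ||x||_1 = Σ(u_i + v_i). A minimal sketch using scipy's LP solver (an illustration, not the implementation used in the paper):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """min ||x||_1 s.t. Phi @ x = y, via the standard split x = u - v."""
    m, n = Phi.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])      # Phi @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v

# A very sparse input is recovered exactly (here m is comfortably large).
rng = np.random.default_rng(1)
n, m = 60, 30
a = np.zeros(n)
a[[3, 17, 42]] = [1.5, -2.0, 0.7]
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x = basis_pursuit(Phi, Phi @ a)
print(np.allclose(x, a, atol=1e-4))
```

When the input is not sparse enough relative to m, the same LP returns a denser x, which is exactly the critical-sparsity failure mode described above.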
It has been shown empirically [18] that using ℓp norm minimization when p < 1 requires fewer measurements (i.e. greater compression) than for p = 1. Chartrand and Yin proposed the nonconvex Iteratively Reweighted Least Squares (IRWLS) algorithm [19] and showed that it needs fewer measurements and has a larger critical sparsity. It can correctly recover less sparse input signals than can be recovered by the unregularized versions of other nonconvex algorithms.
A small critical sparsity means that the recovery algorithm needs a longer measurement vector if it is to return the input vector accurately, so the compressed vector must be larger. BP and greedy algorithms such as Matching Pursuit (MP) [20] and Orthogonal Matching Pursuit (OMP) [21] are relatively fast, but their low critical sparsity means that they may fail to recover the input signal accurately when the compressed signal is not long enough relative to the sparsity of the input signal. They are thus inappropriate for use with more highly compressed signals. Plumbley [22] proposed the greedy technique Polytope Faces Pursuit (PFP) to obtain better recovery of compressed signals which are difficult for MP. This technique is based on the geometry of the polar polytope and uses BP to approximate the sparse solution.
The main issues in sparse recovery are: (i) the small critical sparsities of many widely used recovery algorithms, and (ii) the quality of the recovered signals. Existing algorithms can recover the input signal exactly with high probability only when the input signal is very sparse and it is not compressed much [23]; otherwise the recovered signal is of low quality. In practical applications, a sparse solution is needed even if these conditions are not met [23]. In practice, the input signal sparsity is not known during the recovery phase: it is either estimated or assumed.

We investigate the compression of speech signals as they are not sparse by nature [25], and hence are challenging for CS. As a main contribution, we demonstrate that MAX FS-based solution algorithms are able to accurately recover more highly compressed speech signals with better quality, though they require more computation. This is less of an issue in recent years due to the easy availability of computational resources, e.g. via cloud computing.
Our experiments show that the critical sparsities for BP, OMP, PFP, MP and IRWLS require measurement vectors of length m > 3.2S, 2.8S, 3.2S, 6.4S, and 8.5S respectively. In contrast, the MAX FS solution algorithms require only m > 2S for accurate recovery of low pass speech segments, a reduction of 37.5%, 28.6%, 37.5%, 68.7% and 76.5% in the length of the compressed signal with respect to BP, OMP, PFP, MP, and IRWLS. They require m > 2.6S for accurate recovery of high pass speech segments, still better than the existing algorithms. We also observe higher quality in the recovered signals. The MAX FS-based sparse recovery algorithms perform well in finding both the positions and the values of the nonzeros. We believe that it is time to consider MAX FS-based solutions for CS recovery.
The remainder of the paper is organized as follows. Section II gives a brief overview of CS and existing CS sparse recovery algorithms, as well as background about MAX FS. New MAX FS solution algorithms for sparse recovery are developed in Section III. The CS-based process for speech signals is provided in Section IV. Experimental setup and empirical results are presented in Sections V and VI. Section VII concludes the paper and outlines our future work.

II. BACKGROUND

1) Signal Acquisition and Sparsification
CS compression requires that the input signal be sufficiently sparse. When it is not sparse, the input signal can be sparsified by applying a suitable basis to produce an S-sparse signal a. Many real-world signals can be sparsified by applying the DCT (Discrete Cosine Transform) or DWT (Discrete Wavelet Transform), in which the basis coefficient weights satisfy a power law decay. More precisely, if the given input in the time domain, f, is sparsified using the basis Ψ as f_{n×1} = Ψ_{n×n} a_{n×1} and the coefficients are sorted in descending order such that |a_1| ≥ |a_2| ≥ ... ≥ |a_n|, then the signal is compressible if it satisfies

|a_i| ≤ Const · i^(−q)    (1)

where Const is a constant and q > 0. To obtain an S-sparse signal, all but the S largest coefficients are set to zero.
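The sparsification step can be sketched as follows (illustrative Python; the smooth test signal is our own example of a compressible input):

```python
import numpy as np
from scipy.fft import dct, idct

def sparsify(f, S):
    """Keep the S largest-magnitude DCT coefficients, zero the rest."""
    a = dct(f, norm="ortho")             # f = Psi @ a with an orthonormal DCT basis
    idx = np.argsort(np.abs(a))[::-1]    # coefficients sorted by decaying magnitude
    a_S = np.zeros_like(a)
    a_S[idx[:S]] = a[idx[:S]]
    return a_S

# A smooth signal is compressible: a few DCT coefficients carry most energy.
t = np.linspace(0, 1, 256, endpoint=False)
f = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
a_S = sparsify(f, S=8)
f_S = idct(a_S, norm="ortho")            # reconstruction from the S-sparse vector
print(np.count_nonzero(a_S), np.linalg.norm(f - f_S) / np.linalg.norm(f))
```

Because the discarded coefficients decay by a power law, the relative reconstruction error stays small even though only 8 of 256 coefficients are kept.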
To ensure good recovery, the number of measurements m is often determined as m ≥ Const · μ²(Φ, Ψ) · S · log n, where μ(Φ, Ψ) is the coherence between the measurement and sparsifying bases (close to 1 for random measurement matrices). After determining m, the compressed measurement vector, y_{m×1}, is obtained by multiplying the signal, a_{n×1}, by Φ_{m×n} in the last step of signal acquisition to achieve compression.
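This measurement-count heuristic can be illustrated numerically. The coherence definition μ(Φ, Ψ) = √n · max_{k,j} |⟨φ_k, ψ_j⟩| and the constant Const = 1 follow standard CS references and are assumptions here, not values taken from the paper:

```python
import numpy as np
from scipy.fft import idct

n, S = 256, 8
Psi = idct(np.eye(n), norm="ortho", axis=0)        # orthonormal DCT basis, one column each
rng = np.random.default_rng(0)
Phi = rng.standard_normal((n, n))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit-norm measurement rows

# Coherence mu(Phi, Psi) = sqrt(n) * max |<phi_k, psi_j>|; always in [1, sqrt(n)].
mu = np.sqrt(n) * np.max(np.abs(Phi @ Psi))

Const = 1.0   # illustrative only; the theory leaves the constant unspecified
m = int(np.ceil(Const * mu**2 * S * np.log(n)))
print(mu, m)
```

Lower coherence (random rows vs. the DCT basis) means fewer measurements are needed for a given sparsity S.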
2) Sparse Recovery
Sparse recovery algorithms can be broadly classified into three categories: convex relaxations, greedy algorithms, and nonconvex optimization techniques [27]. We compare the proposed methods with one example algorithm in each class. BP and IRWLS use convex relaxation and a nonconvex optimization technique, respectively, while MP, OMP, and PFP are greedy algorithms. These algorithms are known to provide sparse solutions having good reconstructed signal quality. We review the main steps in these algorithms to clarify their approaches.
After updating the support, the new residual vector and the sparse solution are calculated using Eqn. 8 and Eqn. 9.
The algorithm halts when the stopping condition is achieved (e.g., ||r_t||_2 ≤ ε).

c) Orthogonal Matching Pursuit (OMP): Orthogonal Matching Pursuit (OMP) [21] is an improvement of MP. In each iteration, the residual vector r_t is orthogonal to the columns already selected; therefore, no column is selected twice. The inputs to this greedy algorithm are the measurement matrix Φ and the measurement vector y [28]. A new element is selected at each step, and Φ_{support_t} has full column rank. The OMP algorithm is summarized as follows:
1) Initialization:
• Iteration counter: t ← 1.
6) If ||r_t|| > threshold, increment t and go to Step 2.
Output:
• T-sparse signal, a_T.
The goal is obtaining an output signal having a sparsity T as close as possible to S. In OMP, the sparsity of the input signal S can be given to the algorithm as an input. If S is specified, the maximum iteration count t will be equal to S. Otherwise, the algorithm stops when r_t reaches the defined error tolerance.

d) Polytope Faces Pursuit (PFP): This algorithm [22] performs BP to find the sparse solution of the dual LP max_c {y^T c | Φ^T c ≤ 1}. In the style of the MP algorithm it adds one new basis vector at each step. PFP adopts a path-following method through the relative interior of the faces of the polar polytope P* = {c | Φ^T c ≤ 1} associated with the dual LP problem and searches for the vertex c* ∈ P* that maximizes y^T c. The notation (·)† denotes the pseudo-inverse matrix. The steps of the PFP algorithm are summarized below [22]:
2) Find face: support_t ← support_{t−1} ∪ {j}, x_t ← (Φ_{support_t})† y
5) c_t ← (Φ_{support_t})†ᵀ 1, ŷ_t ← Φ_{support_t} x_t, r_t ← y − ŷ_t
6) If the termination condition is met (e.g. sparsity or residual) then exit; else go to Step 2.
Output:
• T-sparse signal, a_T.
The algorithm stops when the size of the support reaches the maximum sparsity S (i.e., if specified in the initialization stage, t = S) or if max_i φ_i^T r_t is smaller than the minimum residual condition, θ_min.

e) Iteratively Reweighted Least Squares (IRWLS): A nonconvex variant of BP [18] has been shown to provide exact recovery with fewer measurements. The ℓ1 norm is replaced by the ℓp norm:

min_x ||x||_p^p  subject to  y = Φx    (10)

where 0 < p < 1. Only p ≥ 1 was studied before Rao and Kreutz-Delgado [29] considered p < 1, replacing the ℓp cost function in Eqn. 10 by a weighted ℓ2 norm:

min_x Σ_i w_i x_i²  subject to  y = Φx    (11)
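The weighted-ℓ2 reformulation admits a closed-form solve at each pass, which is the heart of IRWLS. The following is a hedged sketch: the weight formula w_i = (x_i² + ε)^(p/2 − 1) and the ε-annealing schedule are common choices in the IRWLS literature and stand in for the paper's Eqn. 12, which is not reproduced here:

```python
import numpy as np

def irwls(Phi, y, p=0.5, iters=30):
    """IRWLS sketch for min ||x||_p^p s.t. Phi @ x = y: each pass solves
    the weighted-l2 problem min sum(w_i * x_i^2) s.t. Phi @ x = y, whose
    closed form is x = W^-1 Phi^T (Phi W^-1 Phi^T)^-1 y, W = diag(w)."""
    x = np.linalg.pinv(Phi) @ y                 # least-norm starting point
    eps = 1.0
    for _ in range(iters):
        w = (x ** 2 + eps) ** (p / 2.0 - 1.0)   # small x_i get large weights
        Q = Phi / w                             # Phi @ W^{-1}
        x = Q.T @ np.linalg.solve(Q @ Phi.T, y)
        eps = max(eps / 10.0, 1e-9)             # anneal the smoothing term
    return x

rng = np.random.default_rng(6)
n, m, S = 100, 40, 6
a = np.zeros(n)
a[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x = irwls(Phi, Phi @ a)
print(np.linalg.norm(x - a))
```

Each pass satisfies the constraint y = Φx exactly by construction; the annealed weights progressively push mass onto a few coordinates.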
In [19], Φ is assumed to have the unique representation property (any m columns are linearly independent) [30]. This property leads to a unique solution of Φx = y having sparsity ||x||_0 = S. The approach finds weights based on Eqn. 12 for each iteration t, where ε_t is a sequence converging to zero, ε_t ∈ (0, 1), 0 ≤ p < 2, and y = Φx. A unique solution of the convex optimization problem in Eqn. 11 is then found at each iteration.

The MAX FS solution algorithms of [31] are used in this paper for CS sparse recovery. The algorithms may return a support having superfluous members. Some can be removed by postprocessing [23] as follows. First, all non-support x_i are set to zero (or removed from the model) in y = Φx. Next, each remaining variable is temporarily forced to zero in turn: if there is a feasible solution, then that variable is removed from the support.
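The postprocessing step just described can be sketched directly: each support member is tentatively removed, and it is dropped for good if y remains exactly representable without it. This is illustrative code, with the feasibility test implemented via least squares as our own choice:

```python
import numpy as np

def prune_support(Phi, y, support, tol=1e-8):
    """Remove superfluous support members: force each variable to zero in
    turn; if the reduced system is still feasible, drop that variable."""
    support = list(support)
    for j in list(support):
        trial = [k for k in support if k != j]
        if not trial:
            continue
        coef, *_ = np.linalg.lstsq(Phi[:, trial], y, rcond=None)
        if np.linalg.norm(Phi[:, trial] @ coef - y) < tol:   # still feasible
            support = trial
    return support

rng = np.random.default_rng(3)
m, n = 20, 50
Phi = rng.standard_normal((m, n))
true_support = [4, 9, 33]
y = Phi[:, true_support] @ np.array([1.0, -2.0, 0.5])
bloated = [4, 9, 33, 7, 21]           # a recovery returned two extra indices
pruned = prune_support(Phi, y, bloated)
print(sorted(pruned))
```

The two superfluous indices are removed because y lies in the span of the remaining columns, while the three true members are kept.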
The values of the support variables are found by solving a final LP. The subsystem of y = Φx containing only the columns of Φ corresponding to the support variables is constructed. Then an LP is solved to obtain the values of u_i and v_i.
Go to STEP 2.
OUTPUT: SupportSet is a small number of variables forming a support for the system of equations.
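A sketch of such a final LP on the support columns, assuming the common x_i = u_i − v_i split with u_i, v_i ≥ 0 and a minimize-sum objective (the paper's exact objective is not reproduced here, so the objective is an assumption):

```python
import numpy as np
from scipy.optimize import linprog

def support_values_lp(Phi, y, support):
    """Solve an LP for the support variable values, writing x_i = u_i - v_i
    with u_i, v_i >= 0 so all LP variables are nonnegative."""
    A = Phi[:, support]                    # subsystem restricted to the support
    k = A.shape[1]
    c = np.ones(2 * k)                     # assumed objective: min sum(u + v)
    A_eq = np.hstack([A, -A])              # A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:k], res.x[k:]
    return u - v

rng = np.random.default_rng(4)
m, n = 15, 40
Phi = rng.standard_normal((m, n))
support = [2, 11, 30]
x_true = np.array([0.8, -1.2, 2.0])
vals = support_values_lp(Phi, Phi[:, support] @ x_true, support)
print(vals)
```

Because the support columns have full column rank, the equality constraints pin down u − v uniquely, so the recovered values match x_true regardless of the objective chosen.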

IV. SPEECH PROCESSING VIA CS AND MAX FS
Speech is a challenging input for CS as it is not typically sparse and any sparsity varies greatly over time [32]. Our process for speech processing using CS with MAX FS sparse approximation has these main steps:
• Signal Acquisition: … returning x as an approximation to a.
• Speech Signal Recovery:
1) Apply a reverse DCT transform to x to recover the speech segment in the time domain.
2) Concatenate all recovered segments to obtain the reconstructed speech signal, f̂.
The silent portions of a signal contain no useful information, so removing them decreases processing time and increases recovery accuracy. In our experiments, the word transcription information in the dataset is used to identify the silent parts of the input. Based on [26], by using a proper sparsifying orthonormal basis Ψ, we have ||f − f_S||_2 = ||a − a_S||_2, where f_S = Ψ a_S.
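The two recovery steps can be sketched as (illustrative Python):

```python
import numpy as np
from scipy.fft import idct

def reconstruct_speech(recovered_dct_segments):
    """Speech Signal Recovery: inverse-DCT each recovered sparse coefficient
    vector, then concatenate the time-domain segments."""
    frames = [idct(x, norm="ortho") for x in recovered_dct_segments]
    return np.concatenate(frames)

# Two recovered 256-coefficient segments -> one 512-sample signal.
segs = [np.zeros(256), np.zeros(256)]
segs[0][3] = 1.0                # a single low-frequency DCT component
segs[1][[1, 7]] = [0.5, -0.25]  # two components in the second segment
f_hat = reconstruct_speech(segs)
print(f_hat.shape)
```

Each segment is processed independently, so the concatenation step is trivial once every sparse coefficient vector has been recovered.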
When a is sparse or compressible, a is well estimated by using a_S and, consequently, the error ||f − f_S||_2 is small, so all except the S largest components of the compressible signal a can be removed without much loss [26]. Here, to obtain a_S, the DCT coefficients of each segment are sorted in descending order of magnitude; these decay rapidly to zero if the signal is compressible. The S largest coefficients are selected by thresholding. The threshold used here is 1.3 times the mean of all DCT coefficients in a segment and was fixed after examining over 100 different speech segments from the database used in this work.
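A sketch of the thresholding rule, assuming the 1.3× factor is applied to the mean coefficient magnitude (that reading is an assumption; the test tone and noise level are illustrative):

```python
import numpy as np
from scipy.fft import dct

def threshold_segment(segment, factor=1.3):
    """Select the largest DCT coefficients by thresholding at `factor`
    times the mean coefficient magnitude (assumed reading of the rule)."""
    a = dct(segment, norm="ortho")
    thresh = factor * np.mean(np.abs(a))
    a_S = np.where(np.abs(a) >= thresh, a, 0.0)
    return a_S, int(np.count_nonzero(a_S))   # S = resulting sparsity

rng = np.random.default_rng(5)
t = np.arange(256) / 16000.0                      # a 256-sample frame at 16 kHz
segment = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(256)
a_S, S = threshold_segment(segment)
print(S)
```

Note that S varies per segment under this rule, which is why the paper sums the per-segment sparsities to report the sparsity of a whole utterance.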

A. SPEECH SAMPLES
Examples are drawn from the TIMIT database of speech samples, which includes time-aligned orthographic, phonetic and word transcriptions and speech waveforms sampled at 16 kHz.
In the signal acquisition stage, two types of random measurement matrices Φ are used to compress the speech signal: Random Normalized Matrices (RNM) and Random Gaussian Matrices (RGM). For the first set of experiments, the speech signals were first divided into two groups: signals that have energy concentration in the "low frequency region" (low pass) and signals with energy concentration in the "high frequency region" (high pass). A speech signal is low pass if the first 100 DCT coefficients (low frequencies) contribute more to the total energy in the signal than the rest. A speech signal is high pass if the components after the 100th coefficient contribute significantly to the total energy of the signal (say 95% of the total energy). 10 low pass and 10 high pass male and female speech segments were selected for this experiment. Examples of low pass and high pass segments are shown in Fig. 3.

For the first set of experiments, the recovered signal sparsity is compared with the input signal sparsity. The speech recovery is successful if T, the number of nonzeros in the recovered sparse vector, is identical to S, the number of nonzeros in the DCT input signal. We record the average T-sparsity of the recovered DCT signals over 10 trials, T_average, at various values of S. The number of successful recoveries is recorded. The Geometric Mean (GM) of the average T-sparsity (Eqn. 15) is used to compare algorithms, following [23]:

GM = (∏ T_average)^(1/E_tot)    (15)

where E_tot is the total number of entries.
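The low pass test and the geometric-mean score of Eqn. 15 can be sketched as follows; the 16 kHz rate and the two test tones are illustrative choices, not data from the paper:

```python
import numpy as np
from scipy.fft import dct

def is_low_pass(segment, cutoff=100):
    """Low pass per the paper's rule: the first `cutoff` DCT coefficients
    carry more energy than the remaining ones."""
    a = dct(segment, norm="ortho")
    return bool(np.sum(a[:cutoff] ** 2) > np.sum(a[cutoff:] ** 2))

def geometric_mean(t_averages):
    """GM of average recovered T-sparsities (Eqn. 15); smaller is sparser."""
    t = np.asarray(t_averages, dtype=float)
    return float(np.exp(np.mean(np.log(t))))   # stable form of (prod t)^(1/E_tot)

t = np.arange(256) / 16000.0
low = np.sin(2 * np.pi * 300 * t)      # energy near DCT index ~10
high = np.sin(2 * np.pi * 6000 * t)    # energy near DCT index ~192
print(is_low_pass(low), is_low_pass(high))
print(geometric_mean([10, 20, 40]))
```

The log-domain form avoids overflow when many T_average entries are multiplied, while returning exactly the (∏ T_average)^(1/E_tot) value of Eqn. 15.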

All algorithms are implemented in
The second experiment also evaluates algorithm performance based on the quality of the recovered speech signals, as measured by the Relative Squared Error (RSE) and the Perceptual Evaluation of Speech Quality (PESQ). PESQ is a standardized algorithm recommended by the International Telecommunication Union (ITU) [39] and used to assess the quality of speech [38]. PESQ constructs a loudness spectrum by applying an auditory transform, which is a psychoacoustic model that projects the signals into a representation of perceived loudness in time and frequency [38]. The loudness spectra of the original input signal are then compared with those of the recovered signal to produce a single number in the range 1 (Bad) to 5 (Excellent) corresponding to the prediction of the perceptual mean opinion score.
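A minimal RSE helper, assuming the common definition RSE = ||f − f̂||² / ||f||² (the exact normalization used in the paper's equation is an assumption here):

```python
import numpy as np

def relative_squared_error(f, f_hat):
    """RSE between original f and recovered f_hat, assuming the common
    definition ||f - f_hat||^2 / ||f||^2; 0 means perfect recovery."""
    f, f_hat = np.asarray(f, dtype=float), np.asarray(f_hat, dtype=float)
    return float(np.sum((f - f_hat) ** 2) / np.sum(f ** 2))

f = np.array([1.0, -2.0, 3.0, 0.5])
print(relative_squared_error(f, f))            # identical signals
print(relative_squared_error(f, np.zeros(4)))  # all-zero recovery
```

Under this definition an all-zero recovery scores exactly 1, giving a natural scale for comparing the algorithms' fidelity.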

A. CRITICAL SPARSITY OF THE SPARSE RECOVERY ALGORITHMS
A signal recovery algorithm is successful if the recovered signal is exactly the same as the original input signal. Successful recovery becomes harder as the fraction of nonzeros in the input signal increases (i.e. the input is not sparse enough). In our experiments, it is observed that if the output signal T-sparsity equals the input signal S-sparsity, then the signals are also identical, so we use the matching of the signal sparsities as our measure of success. Failures are declared if T > S.
The concentration of the DCT coefficients in low and high frequency intervals affects the success of sparse recovery heuristics, so results are analysed for low pass and high pass segments separately.
The results for both RNM and RGM measurement matrices and for low pass and high pass segments are summarized in Table 1 and Table 2. Each cell shows the average output T-sparsity T_average over 10 segments at given values of input S-sparsity, with the number of successes shown in parentheses. The input S-sparse signal is constructed by retaining only the S largest DCT coefficients among the 256 input positions. Complete success occurs when T = S in all 10 trials and is indicated in boldface. The last three rows in the tables have the following meanings: "Tot Succ." shows the total number of successes, "Min M" shows the minimum number of measurements required for each algorithm, and "GM" indicates the geometric mean over each column. Algorithms having smaller GMs provide sparser solutions.

Table 1 shows that all algorithms except IRWLS and MP perform very well for S ≤ 35. MP succeeds completely only when S ≤ 20 and the measurement matrix is RGM. IRWLS fails for all S for RNM, and its critical sparsity is 15 while using RGM. Failures increase with larger S, as expected. The three MAX FS algorithms produce the sparsest solutions in geometric mean and fail only when S > 65. The general outcome is similar in Table 2, though the algorithms are less successful for the high pass segments. Methods B, M and C again provide better results than the others. The geometric means from Tables 1 and 2 are summarized in Fig. 4 to compare the effect of RNM vs. RGM. Existing algorithms show better performance on signals compressed using RGM. In contrast, the very best performance is seen for the MAX FS algorithm C when using RNM.

B. QUALITY OF THE RECOVERED SPEECH SIGNALS
The quality of a recovered speech signal depends on the recovery of the sparse DCT coefficients as described previously. 48 male and 48 female speech signals of different lengths are considered. Although RNM provides better results for the MAX FS algorithms, Φ is RGM since this is preferred by the existing sparse recovery algorithms.
Each speech signal is segmented into frames of length 256. After taking the DCT of each segment, the S largest coefficients are selected by thresholding, where the threshold in each segment is 1.3 times the mean of all of its DCT coefficients. The sparsity of the entire speech signal is the sum of the sparsities of all of its segments. The speech inputs are compressed at CR = 50% and then recovered. The performances of the algorithms in approximating the input sparsities of the complete speech signals are shown in Fig. 7. The black box shows the sparsity of all 96 uncompressed speech signals. Blue boxes show the estimated sparsities returned by the recovery algorithms. The sparsities are shown as box-and-whisker plots with the median sparsity as the central mark in the box and the 25th and 75th percentiles as the box boundaries. The whiskers extend to the most extreme sparsities not considered outliers, and the outliers are plotted using the '+' symbol.
The median sparsities are also listed in the inset text. The MAX FS methods have recovered sparsities that are only slightly larger than the input sparsities, and similar ranges. The median recovered sparsities obtained using OMP and the 25th percentile of BP are in the upper quartile of the original sparsity level. MP returns the worst result among all algorithms, and its lower extreme of recovered sparsity is higher than the upper extreme of the original. The MAX FS algorithms are more successful at recovering the original sparsity of the speech signals at 50% compression than any other algorithm considered. They outperform existing sparse recovery methods in estimating sparsity in real-world speech signals, even when the signal is longer than considered in the previous section.

To recover the complete speech signal, all segments are concatenated after taking the inverse DCT. The 96 recovered signals are evaluated using the Relative Squared Error in Fig. 8. The RSE values for the MAX FS methods are very small. They provide higher fidelity recovered signals even though their solutions are sparser than those of the other algorithms.
Fig. 9 evaluates the quality of the returned signals using the Perceptual Evaluation of Speech Quality (PESQ). The average PESQ score for recovered female speech signals is better than that for recovered male speech signals. For both male and female speech signals, the MAX FS algorithms outperform the others, providing the highest PESQ score of 4.3 for female speech signals. OMP provides the highest average PESQ score among the traditional recovery algorithms.

The spectrograms and the frequency responses of the linear predictor coefficients of the recovered and original speech signals of a randomly selected male and a randomly selected female speaker are presented in Figs. 10-12. These figures compare the MAX FS methods with OMP, since OMP provided the smallest RSE and sparsity among the existing algorithms, as shown in Fig. 7 and Fig. 8. The spectrograms of female sample FDRW0-SA1 and male sample MCAL0-SX58, both the original signal and the recovered signal, are obtained by using Hamming windowing at 16 ms. To improve the FFT performance, a length that is an exact power of two is chosen: the number of data points used for the FFT in each block is 1024. Fig. 12 shows the good performance of the MAX FS methods in recovering the spectrum of the original signals FDRW0-SA1 and MCAL0-SX58. The first three formants of the recovered signals follow the first three formants of both the female and male original signals. Table 3 compares the recovered sparsity and the formants of the MAX FS methods and OMP with the sparsity and formants of the original signals.

VII. CONCLUSIONS
This paper describes a technique that uses MAX FS solutions for sparse recovery in compressed sensing speech processing. It shows that MAX FS solution algorithms recover the input signal better than recovery methods commonly used in compressive sensing. MAX FS-based techniques require fewer measurements (on the order of m ≥ 2.5S) for sparse recovery to succeed. Thus, when the recovery algorithms are MAX FS-based, higher compression can be used in the measurement phase of compressive sensing. MAX FS-based recovery requires more computation than most existing recovery algorithms, but its ability to recover more highly compressed signals with higher quality means that it is especially useful for applications such as archiving, where it is important to minimize storage size and recovery need not be done in real time. We plan to work towards speeding up the algorithms to give them wider applicability.
We also plan to investigate the application of these new techniques in non-speech applications, e.g. medical uses such as compression and recovery of ECG signals. We are also studying how to adapt the technique to handle noisy signals.