Optimal Non-Uniform Sampling by Branch-and-Bound Approach for Speech Coding

Speech coding plays a significant role in voice communication, improving network bandwidth efficiency for applications that require long-distance communication or storage space utilization. Non-uniform sampling (NUS) is one such technique, which performs data reduction by sampling at irregular intervals. In the literature, researchers use the structural properties of the speech waveform to derive various NUS methods, such as LCSS, MMD, IPD, and zero-crossing sampling. In this paper, by contrast, we consider the speech signal's statistical properties to propose an optimal NUS approach. The proposed technique statistically analyzes the speech signal to sample the abrupt changes over a time frame and approximates the signal with minimal reconstruction error, using a cost function and a linear penalty function to avoid the over-fitting problem. The technique further optimizes the selected samples using branch-and-bound. To evaluate the proposed NUS, we design a speech waveform encoder called Block Adaptive Amplitude Sampling (BAAS). A BAAS encoder directly performs statistical analysis on the speech waveform to select the data samples corresponding to the most significant changes in the signal. The decoder approximates the eliminated values using linear interpolation. We experimentally study the proposed technique using various metrics and measures, such as the POLQA and MUSHRA tests. The evaluation shows that the proposed NUS technique retains only 25% of the data samples to regenerate a signal of acceptable quality. In addition, comparative studies with MMD and IPD show that the proposed algorithm performs 1.6% better, with 30% lower MSE scores.

modern wireless communication channels. Non-uniform sampling (NUS) can be a solution to the above problem. Thus, in this paper, we focus on a non-uniform waveform coder. NUS (variable-rate sampling) relies on the idea that linear segments can approximate any signal over a short time frame, removing redundancy in the sampled data. The various applications of NUS include the efficient computation of atmospheric turbulence spectra [25], mechanical scanning radar [26], non-invasive blood pressure (NIBP) oscillometry [27], and a discrete-time sampling rate converter [28]. However, none of the above approaches use audio data.
NUS is also a popular and widely studied technique in the field of speech coding. There are many approaches for signal reconstruction using non-uniform samples, such as zero-crossing points [29], random points [30], level-crossing points [31], local maxima and minima [32], or inflection points [33]. The main limitation of random sampling is that it cannot skip more than two consecutive samples, which may result in redundancy. Level-crossing detection (LCD) is based on pre-determined thresholds, while sampling the speech signal at local maxima and minima (MMD) results in imprecision due to under-sampling [32]. To overcome this problem, extra samples are added based on a pre-determined distance value between the samples [34]. Further, in the inflection point detection (IPD) algorithm [33], the extra points added along with the local maxima and minima are points of abrupt change in the slope of the signal. The idea is to exploit the geometrical structure of the speech signal. Researchers have also studied NUS in the amplitude domain. The authors in [35] present an iterative algorithm to recover a bandlimited signal from samples generated by amplitude sampling. In amplitude sampling, the amplitude of the signal is quantized, i.e., samples are only taken whenever the signal crosses a predefined set of amplitude levels, while time is represented with infinite precision [36], [37].
Recently, researchers have also proposed adaptive level-crossing sampling (LCSS) for next-generation data-driven applications [24] and mobile applications [23]. The LCSS technique only samples the amplitudes crossing a fixed threshold level, i.e., the samples above a pre-defined threshold are non-uniformly spaced along the time axis. In all the above approaches, the sample selection methodology relies on the structural property of the waveform, and to the best of our knowledge, few attempts have been made to optimize the number of samples used for reconstructing the speech signal. In [34], the authors analyze the quantization properties of non-uniform sampling to optimize the speech samples using dynamic programming. That paper emphasizes quantization, i.e., the optimal number of samples, and not the perfect reconstruction of the signal.
In this paper, however, we propose an optimal non-uniform sampling method based on the changes in the shape of a signal, studying its statistical properties in the amplitude domain to reduce bit rates while retaining the advantages of irregular sampling. Compared to the previous approaches, the proposed approach selects the time instances (or change points) of sudden changes in the amplitude of a signal that affect the marginal distribution of its statistical properties, such as the mean, the standard deviation, and the linear statistical parameter (LSP, considering both mean and slope). The intuition is to capture time intervals where the signal differs from its statistical property within an upper bound. For this, we define a cost function (C) that minimizes the number of selected points based on the changes in the statistical property. In other words, the proposed approach analyzes the speech signal statistically to sample the abrupt changes over a time frame and reconstructs the speech signal via interpolation. The signal is approximated with minimal reconstruction error using the cost function (C) and a linear penalty function to avoid the over-fitting problem.
More specifically, the proposed methodology samples the time instances where the amplitude distribution of the signal differs with respect to at least one statistical criterion, such as the mean, variance, or slope. The approach relies on approximating the signal with minimal reconstruction error, as shown in equation 1. It also defines a cost function C and a linear penalty function to avoid the over-fitting problem. To further remove redundancies in the sampled data, the selected points are optimized using the branch-and-bound approach. The challenges of the proposed methodology are as follows:
* Optimal Samples: The number of samples selected must be optimal, i.e., the samples should not inherit the redundancies of the waveform.
* Block Size: The block size selected must be small enough that the signal appears linear over the time segment.
* Signal Quality: Optimization of the samples must not affect the quality of the reconstructed signal. That is, the decoded speech signal's quality should not be compromised by the partial data approximation of the signal.
As an application of the proposed non-uniform sampling approach, a waveform coder is designed, referred to as Block Adaptive Amplitude Sampling (BAAS). The coder divides the waveform into fixed-size blocks. Then, each block is encoded using the most significant linear changes of mean and slope in the amplitude domain. For the selected sample points, the encoder transmits the time instances with their corresponding amplitudes. The decoder reconstructs the speech signal using linear interpolation in the amplitude domain.
We evaluate the performance of BAAS for various statistical parameters, such as the mean, variance, and standard deviation. The results show that the number of samples retained by the proposed methodology depends upon the selected statistical parameter. That is, the mean and standard deviation retain 90% and 50% of a signal's sample points on average, respectively, whereas a linear combination of mean and slope retains only 25% of the sample points. Furthermore, we study the quality of BAAS on the five-point MOS (mean opinion score) scale using POLQA for various statistical parameters [41], and also evaluate the subjective quality MUSHRA score [42]. We also compare BAAS with other non-uniform time-sampled waveform coders [32]-[34], [39], [40]. The evaluation shows that the number of change points selected by the proposed methodology is significantly lower than in the previous methods, and the reconstructed signal is a close approximation of the original sound signal.
The rest of the paper is organized as follows. Section II presents the related work. Section III presents the methodology of the proposed non-uniform sampling approach. In section IV, we discuss the experimental setup and simulation results. Finally, section V concludes the paper.

II. RELATED WORK
Figure 1 shows a broad classification of speech coding techniques. Parametric coders analyze the speech signal to derive acoustic parameters such as pitch, fundamental tones, transients, and harmonics, and use them to re-synthesize the speech signal. The output speech signal from these coders is an approximation of the input speech signal [43]. The main advantage of a parametric coder is that it re-synthesizes a reasonable-quality signal at a very low bit rate [44]. On the other hand, waveform coders attempt to code the shape of the speech waveform with minimal error, ignoring the details of the speech production mechanism and speech perception. These are low-complexity coders that can produce a high-quality speech signal at a high data rate (≥ 16 kbps [44]). Hybrid coders combine the advantages of both parametric and waveform coders to produce a medium-rate, high-quality speech signal. However, according to [45], the waveform coder is widely used as it can produce high-quality speech signals at low computational complexity.

A. WAVEFORM CODERS WITH NON-UNIFORM SAMPLING
Since the proposed methodology is a waveform coder, in the following we focus our discussion on the same. The most prevalent works on waveform coders are based on the frequency [11], [12] and time domains [13]-[15]. Waveform coders commonly use uniform time sampling to retrieve the exact shape of the speech signal. The disadvantage of time sampling is that the memory requirement increases with the increasing size of the signal [46].
As an alternative to uniform sampling, researchers have proposed various non-uniform coding algorithms, such as the polynomial predictor [47] and an interpolator using the PAN-algorithm [48]. The non-uniform sampling in [47] is based on the observation that a signal may be approximated by a combination of linear segments in a small interval of time. Similarly, in [30], the authors propose a non-uniform sampling technique that performs data reduction by eliminating random sample points obtained from uniform sampling. The algorithm establishes a linear relationship between known and estimated samples, reducing it to sets of coefficients with which the known samples can be weighted to recover the skipped samples. Although the technique proposed in [30] can reduce the sampling rate (below the Nyquist rate), the method works under the restriction of not skipping more than two consecutive samples. In [31], the authors propose a signal reconstruction method that samples the data at level-crossing points (LCD) by defining threshold levels (or pre-determined levels). This pre-determination in the LCD method results in roughly sampled data for a signal changing rapidly between the predefined levels.
Later, in [32], [49], the authors approximate the speech signal using a non-uniform sampling technique based on local extreme values (local maxima and minima, MMD). The primary goal of [32] is to establish the importance of the local extrema for the speech signal. The reconstruction is performed using a sequence of complex numbers called a skeleton, each a pair (amplitude, distance). The study defines the limit of quantization for amplitudes and distances that preserves the intelligibility of the reconstructed speech signal. The method results in an imprecise sampling of the signal [34].
To reduce this imprecision and overcome the sparseness problem of [32], extra samples are inserted in [34]. These extra samples are selected using a predefined threshold, i.e., if the distance between two consecutive sample locations is above a threshold, a mid-point is added. Recently, in [33], [39], [40], the authors proposed the concept of the inflection point in non-uniform sampling: instead of adding extra samples to overcome the sparseness problem of the MMD method, the algorithm uses the geometrical structure of the speech signal. For this, the authors sample the local maxima and minima and the points where the slope of the signal changes abruptly. The reconstruction of the speech signal is performed using interpolation. The non-uniform sampling techniques in [33], [39], [40] are variants of the inflection point detection (IPD) method, which mainly uses the inflection points, i.e., the points where the slope of the signal changes abruptly.
In another approach, level-crossing sampling is combined with the peaks of the signal [38]. The authors sample only the points closely located to the signal peaks. The reconstruction of the speech signal is done via linear interpolation. Recently, researchers proposed the level-crossing NUS method [23] for A/D converters to reduce the bandwidth and computational complexity of various data processing applications by reducing the sampling rate of the signal. The technique's sampling is directly proportional to the activity in the signal, i.e., the sampling frequency is higher if the signal activity is higher, and vice versa. V. Ravuri et al. [24] and Senay et al. [50] propose an adaptive level-crossing sampling scheme (ALCSS), challenging traditional uniform-sampling analog-to-digital (A/D) converters, which sample at a constant rate that results in oversampling and makes data processing computationally intensive. Table 1 gives a brief overview of the current state-of-the-art non-uniform sampling algorithms.

B. SUMMARY
In comparison to the above approaches, the proposed methodology considers the statistical change points of the speech signal to perform non-uniform sampling. Figure 1 and Table 1 show the distinction between the proposed and previous methodologies. Our intuition for using statistical change points for NUS is to detect the lack of homogeneity (in terms of statistical distributions) in the sequence of a speech signal, in order to find an optimized number of change points for reconstructing a speech signal of acceptable quality. Each selected change point is checked against a bound condition to reduce redundant samples, resulting in better compression. The approach further optimizes the selected sample points using the branch-and-bound solution to include the minimum number of sample points based on the cost function (C). As suggested in [45], statistical change points can model time-series data, which is useful for application areas such as medical condition monitoring, climate change detection, speech and image analysis, and human activity analysis.

III. PROPOSED METHODOLOGY
As mentioned above, non-uniform sampling is a technique of selecting the samples of time-series data at irregular intervals and approximating the original signal by a sequence of polynomial functions [47]. In the following, we discuss the proposed non-uniform sampling methodology and its optimization using the branch-and-bound approach. The approach analyzes the speech signal statistically to sample the abrupt changes over a time frame. The selected data samples are optimized to reconstruct the speech signal via interpolation. The sampling of discrete-time instances is performed under the assumption that the statistical property remains constant between consecutive data samples. We also discuss the extension of the proposed non-uniform sampling methodology to design a waveform coder, called block adaptive amplitude sampling (BAAS).

A. STATISTICAL ANALYSIS TO PERFORM NUS
In the following, we start with a broad definition of statistical change points as instances where a statistical property of the signal changes significantly. That is, given time-series data x(n) = {x_1, x_2, ..., x_N}, a change point occurs at instant τ if the distributions of {x_1, x_2, ..., x_(τ−1)} and {x_(τ+1), x_(τ+2), ..., x_N} differ with respect to at least one criterion, such as the mean, standard deviation, or linear property (mean and slope). For sampling, we consider the fact that the distribution of x_i changes at unknown time instances t_1 < t_2 < ... < t_k, where t_1 is the first instance of change, t_2 the second, and so on. Although the characteristics of a signal can change suddenly at an unknown time instance t_j, within the interval (t_{j−1} : t_j), (1 ≤ j ≤ k), the change is bounded by a very small threshold (th). These time instances can be sampled to approximate the source signal using interpolation.
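As an illustration, a change point under the mean criterion can be located by scanning the candidate split instants and selecting the one where the means of the two resulting segments differ the most, then checking that difference against the threshold th. This is a minimal sketch of the idea; the function name and the toy signal are ours, not the paper's:

```python
import numpy as np

def mean_change_point(x, th):
    """Return the split instant tau where the means of x[:tau] and x[tau:]
    differ the most, provided the difference exceeds threshold th."""
    x = np.asarray(x, dtype=float)
    taus = range(1, len(x))
    diffs = [abs(x[:t].mean() - x[t:].mean()) for t in taus]
    t_best = 1 + int(np.argmax(diffs))
    return t_best if diffs[t_best - 1] > th else None

# A signal whose mean jumps from about 0 to about 1 at index 4
x = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]
print(mean_change_point(x, th=0.5))  # 4
```

The same scan generalizes to the other criteria by replacing the mean with the standard deviation or the linear estimate of each segment.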
For example, considering the mean as the statistical parameter, we show the binary segmentation of a signal in figure 2. In the figure, the vertical line marks the value of τ, i.e., the change point for the signal, and the statistical property remains within a threshold th for the intervals {x_1, x_2, ..., x_τ} and {x_(τ+1), x_(τ+2), ..., x_N}. The change point (τ) is the instance that satisfies the following condition, where (th) is the threshold used for selecting the change points of the signal.
Similarly, equation 3 defines the linear statistical property (LSP) by combining the mean and slope of the signal; it calculates the LSP of the segment (t_1, t_2). The change point τ can then be calculated using equation 4: a change point is selected if the difference between the statistical estimates is above the threshold.
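One plausible reading of the LSP of a segment is the pair (mean, slope), with the slope taken from a least-squares line fit. The following sketch illustrates this reading only; equation 3 may combine the two quantities differently, and the function name is ours:

```python
import numpy as np

def lsp(x, t1, t2):
    """Linear statistical parameter of segment x[t1:t2]: the (mean, slope)
    pair, with the slope obtained from a least-squares line fit."""
    seg = np.asarray(x[t1:t2], dtype=float)
    t = np.arange(len(seg))
    slope, _intercept = np.polyfit(t, seg, 1)  # degree-1 fit: [slope, intercept]
    return seg.mean(), slope

print(lsp([0.0, 1.0, 2.0, 3.0], 0, 4))  # approximately (1.5, 1.0)
```

Comparing the LSP pairs of adjacent segments, rather than the means alone, lets the change-point test react to both level shifts and slope breaks.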
Note that the threshold (th) in the above equations depends on the signal and the statistical parameter. It can be determined as the average of the minimum values in the segment. In other words, it depends on the rate of variability of the signal with respect to the studied statistical parameter.
In the literature, similar definitions of change points are used for ECG anomaly detection [51], machine anomaly detection [52], climate change detection [53], and several other applications [54]. As shown in equations 2 and 4, the change points are usually selected based on a pre-defined threshold value, which is application-dependent. In this paper, we use the concept of change points to perform non-uniform sampling of the speech signal and use the sampled data to design a waveform coder.

B. NAIVE STATISTICAL NUS
To minimize the threshold (th), we associate a cost function (C) with every change point. This is important for reducing the redundancies in the sampled data. The cost function C(x_{t1:t2}), defined on the interval [t_1 : t_2], is computed as the sum of the deviations of each point in the interval from the selected statistical estimate (x̄_{t1:t2}) of the interval.
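A minimal sketch of such a segment cost, here using the mean as the statistical estimate and squared deviations (the squaring is our choice; the paper's C may use absolute deviations):

```python
import numpy as np

def segment_cost(x, t1, t2):
    """Cost of segment x[t1:t2]: sum of squared deviations of each point
    from the segment's statistical estimate (here, its mean)."""
    seg = np.asarray(x[t1:t2], dtype=float)
    return float(((seg - seg.mean()) ** 2).sum())

print(segment_cost([1, 1, 1, 5], 0, 4))  # 12.0
```

A homogeneous segment thus has a cost near zero, while a segment straddling a change point is expensive, which is exactly what the segmentation step below exploits.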
To find a change point in an interval [1 : n], we divide the signal into two parts at some instant τ such that the sum of the costs of the first and second segments, [1 : τ] and [(τ + 1) : n], is less than the total cost of the signal, as shown in equation 5.
Similarly, we can use equation 5 to find all K change points of a signal using a bottom-up approach. As a base case, we consider the first data point as a change point and iteratively add/remove subsequent points using the cost function stated above. The K change points result in K + 1 segments of the signal, with the i-th segment containing all points in the interval x_{(t_{i−1}+1):t_i}.
Notice that equation 5 can, in the worst case, sample all the data points, particularly for a rapidly changing signal. To control the number of change points, we introduce a linear penalty term αg(K) = αK in the equation above, where α is a constant that controls the rate of growth of the penalty and, in turn, the number of change points. The problem of finding the K change points can then be formulated as equation 6, where α is a penalty added to the cost function (C) each time a new change point is detected.
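The penalized objective of equation 6 can be minimised exactly by dynamic programming over the location of the last change point. The sketch below is our illustrative formulation, with a squared-deviation cost and a toy signal of our choosing:

```python
import numpy as np

def segment_cost(x, a, b):
    """Sum of squared deviations of x[a:b] from its mean."""
    seg = np.asarray(x[a:b], dtype=float)
    return float(((seg - seg.mean()) ** 2).sum()) if b > a else 0.0

def penalized_changepoints(x, alpha):
    """Minimise total segment cost plus a penalty alpha per change point,
    by dynamic programming over the last change-point location."""
    n = len(x)
    H = [0.0] * (n + 1)     # H[i]: optimal penalized cost of x[:i]
    H[0] = -alpha           # cancels the penalty of the first segment
    last = [0] * (n + 1)
    for i in range(1, n + 1):
        cands = [H[t] + segment_cost(x, t, i) + alpha for t in range(i)]
        t_best = int(np.argmin(cands))
        H[i], last[i] = cands[t_best], t_best
    cps, i = [], n          # backtrack the chain of segment boundaries
    while i > 0:
        i = last[i]
        if i > 0:
            cps.append(i)
    return sorted(cps)

print(penalized_changepoints([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], alpha=1.0))  # [3]
```

Raising alpha makes each new change point more expensive and therefore yields fewer, longer segments; lowering it approaches the unpenalized behaviour of equation 5.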
[Algorithm 1: statistical change-point selection. Input: the time-series audio samples (x_1, x_2, ..., x_n); output: the set of change points cp(n).]
[Algorithm 2: Optimization by Branch-and-Bound. Input: the list of change points cp and the currently selected change point t; for all t ∈ cp(i − 1), the bound condition H(t) + C(x_{t+1:t'}) + α < H(t') decides inclusion; output: the optimized change points cp(n).]
Notice that if α > 1, the total cost function (C) increases significantly, resulting in redundant sampling, and vice versa. Thus, the linear penalty term is used to optimize the number of non-uniform data samples. In the experimental section, the cost function C is evaluated for various statistical parameters, such as the mean, variance, standard deviation, and slope of the signal.
In equation 7, H(s) is the minimization of equation 6. The intuition is to choose a time instance (τ) such that the cost function (C) is reduced with respect to the deviation from the statistical estimate of the signal, as given in equation 5. The idea is to reduce the cost function whenever a new change point is introduced into the sampled data set. Next, we discuss the elimination of certain sampled data points using the branch-and-bound approach to improve computational efficiency and remove redundancy in the signal.

C. OPTIMIZATION BY BRANCH-AND-BOUND
To further optimize the non-uniform sampled data, we propose a branch-and-bound solution to remove the redundancies in the sample. The branch-and-bound algorithmic paradigm solves combinatorial optimization problems: each selected state of the solution is checked against the bound conditions defined in the algorithm to determine the next state of the optimal solution [55]. In our algorithm, we use this concept to optimize the number of sample points. For this, we treat the subset of change points selected at any instance as the current state of the algorithm. Based on the optimality principle, we re-write H(s) as equation 8. Next, we define the bound condition for eliminating the over-fitting points. Let us assume that, at some time instance τ, the selected change points are T_τ = {t_1 < t_2 < ... < t_τ < τ}; that is, the current state of the algorithm is T_τ. To determine whether the next data point (τ* = τ + 1) can be included in the sample, we use the bound condition shown in equation 9: the point is included in the sample if the cost of the next state (H(τ*)) is less than the cost of the previous state.
Algorithm 1 shows the implementation of the proposed non-uniform sampling technique, selecting the change points based on statistical deviation. It takes a time-series signal x as input and generates the set of change points cp(n). The algorithm begins with the initial values cp(0) = NULL and H(0) = −α, where α is a constant. In each i-th iteration, the algorithm selects the change point t with minimum deviation from the statistical estimate as a non-uniform data sample (step 4 of algorithm 1). Algorithm 2 addresses the sample over-fitting problem of algorithm 1. The procedure MinSamples_Bound is similar to algorithm 1; however, it checks the previous state (cp(i − 1)) and the newly selected t against a bound condition (step 14 of algorithm 2) to optimize the sampled data. The procedure BBOpt of algorithm 2 decides the inclusion of a newly selected change point by checking the bound condition (step 4 of algorithm 2), and is used for optimizing the selected change points.
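The bound-based pruning can be sketched as a search in which candidate previous states that can no longer lead to an optimal solution are discarded before the next iteration. This is our illustrative PELT-style reading of the bound condition of equation 9, not the authors' exact implementation:

```python
import numpy as np

def segment_cost(x, a, b):
    """Sum of squared deviations of x[a:b] from its mean."""
    seg = np.asarray(x[a:b], dtype=float)
    return float(((seg - seg.mean()) ** 2).sum()) if b > a else 0.0

def pruned_changepoints(x, alpha):
    """Penalized change-point search with a bound condition that prunes
    previous states which cannot beat the current optimum."""
    n = len(x)
    H = {0: -alpha}
    last = {0: 0}
    candidates = [0]        # surviving previous states (the live "branches")
    for i in range(1, n + 1):
        costs = {t: H[t] + segment_cost(x, t, i) + alpha for t in candidates}
        t_best = min(costs, key=costs.get)
        H[i], last[i] = costs[t_best], t_best
        # bound: discard states that can no longer improve on the optimum
        candidates = [t for t in candidates if costs[t] <= H[i] + alpha]
        candidates.append(i)
    cps, i = [], n
    while last[i] > 0:
        i = last[i]
        cps.append(i)
    return sorted(cps)

print(pruned_changepoints([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], alpha=1.0))  # [3]
```

The pruning step is what turns the exhaustive search over all previous states into a bounded one: once a state's cost exceeds the current optimum by more than the penalty, no future split can rescue it.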

D. BLOCK ADAPTIVE AMPLITUDE SAMPLING (BAAS)
To evaluate the proposed non-uniform sampling methodology, we design a waveform coder for speech signals. The non-uniform sampling is performed in the amplitude domain, based on the statistical changes in the signal's amplitude. One challenge in designing an encoder using the proposed methodology is to define the block size so that the sample points can reconstruct a speech signal of acceptable quality. Another challenge is to determine the statistical parameter that can most effectively reconstruct the signal.

Algorithm 3 (BAAS encoder):
2: signal, fs ← read audio file
3: denoised ← denoise the signal using DWT
4: fsize ← size of the frame, fs/2
5: nof ← number of frames of frame size fsize
6: for i ← 1 to nof do
7:   frames(i) ← get the samples of length fsize
8:   find the optimal change points (cp) using algorithm 2
9:   amp(i) ← amplitudes corresponding to cp(i)
10: end for
11: for i ← 1 to length(cp) do
12:   j ← cp(i)
13:   bitstream ← add 0 from the current index to j − 1
14:   bitstream(j) ← add 1 followed by the 16-bit binary number of the amplitude
15: end for
16: return (bitstream)
17: end procedure

Algorithm 4 (BAAS decoder, fragment):
11: for i ← 1 to length(slopes) do
12:   p ← encodedsignal(i)
13:   q ← encodedsignal(i + 1)
14:   k ← 1
15:   for p to q do
16:     y ← slope(i) × k + p
17:     k ← k + 1
18:   end for
19:   return (y)  — y is the decoded speech signal
20: end for
21: end procedure

1) BAAS Encoder
Encoding is a technique of converting the source data into symbols that can further be used for communication or storage. In the following, we use the proposed non-uniform sampling technique to design a waveform coder. Figure 3 gives a general flow of the proposed waveform coder, considering the transmission of the speech signal over the network between two parties, i.e., the sender and the receiver. The encoder selects the data samples corresponding to the most significant changes in the signal using the algorithm proposed above. The selected data samples form the encoded signal, which is sent over the network. At the other end, the receiver decodes the signal using linear stylization. Although we begin with denoising the signal, it is important to note that the analysis can also be performed directly on the raw speech signal. Algorithm 3 shows the steps followed by the encoder to compress a speech signal. First, we de-noise the input signal using the discrete wavelet transform (DWT): the coefficients of the transformed signal are adjusted according to their local features, all coefficients below the threshold value are eliminated, and the signal is reconstructed from the remaining coefficients using the inverse wavelet transform. To retrieve stationarity, framing is performed in the subsequent step, since a speech signal can be considered stationary over a short period. The frame size selected is f_s/2, where f_s is the sampling frequency of the speech signal. Finally, the encoder performs the non-uniform sampling discussed in section III-A to reduce the size of the input signal. Figure 4 shows the structure of a BAAS speech encoder. At the transmitter end, the encoder detects the change points using algorithm 2 and transmits the time instances of the change points and the corresponding amplitudes (16 bits). For the rest of the samples, only the time instances are transmitted.
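The framing step described above can be sketched as follows, with the frame size f_s/2 taken from the text; the zero-padding of the final partial frame is our assumption:

```python
import numpy as np

def frame_signal(x, fs):
    """Split the (denoised) signal into fixed blocks of fs/2 samples,
    zero-padding the tail so the last block is complete."""
    fsize = fs // 2
    n = int(np.ceil(len(x) / fsize))
    padded = np.pad(np.asarray(x, dtype=float), (0, n * fsize - len(x)))
    return padded.reshape(n, fsize)

frames = frame_signal(np.arange(10.0), fs=8)  # fsize = 4, so 3 frames
print(frames.shape)  # (3, 4)
```

Each row of the result is one block on which the change-point search of algorithm 2 is then run independently.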
To distinguish between the two cases, we use a flag field. Figure 5 shows the structure of the bit-stream, where flag '0' marks an interpolation point and flag '1' marks a change point followed by a 16-bit binary number (the corresponding amplitude). For example, consider a bitstream generated by an encoder, < 1 a_1 1 a_2 0 0 0 0 0 0 1 a_3 >, where 1/0 represents the flag and each a_i (16 bits) is the amplitude of a selected change point. The bitstream begins with flag 1, which indicates that the next 16 bits are the amplitude of a change point. Similarly, the next flag bit (following a_1) is 1 and indicates that the next 16 bits (a_2) are a change point. Following this, the flag field contains a sequence of six zeros, and the next change point is represented by flag 1 and amplitude a_3.
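A minimal sketch of this flag-based packing, assuming the amplitudes are already quantized to 16-bit integers (the function name and toy values are ours):

```python
def encode_frame(change_points, amplitudes, frame_len):
    """Build the flag bitstream of figure 5: '1' + a 16-bit amplitude at
    each change point, a lone '0' at every interpolated sample."""
    bits = []
    cp = dict(zip(change_points, amplitudes))
    for i in range(frame_len):
        if i in cp:
            bits.append('1' + format(cp[i] & 0xFFFF, '016b'))
        else:
            bits.append('0')
    return ''.join(bits)

# Change points at samples 0 and 3 of a 5-sample frame
print(encode_frame([0, 3], [5, 9], 5))
```

Since interpolated samples cost a single bit while change points cost 17, the bit rate of a frame is governed directly by how few change points the optimizer retains.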
We can also modify the proposed BAAS encoder to design a fixed-rate waveform encoder. For this, we fix the number of samples extracted per frame using algorithm 1. The algorithm adapts the selected subset by pruning the excess samples using equation 9. The input to the algorithm is the raw speech signal. The algorithm iteratively selects the change points for each frame using the changes in the statistical parameters, as discussed in algorithm 3. The retrieved change points and the corresponding amplitudes are stored in an array for transmission or for later use.

2) BAAS Decoder
The encoded speech signal is approximated using linear interpolation between two consecutive non-redundant sample points [47]. In the literature, researchers have studied various ways of approximating a signal, including quadratic approximation [32], spline approximation [30], and linear approximation [39]. In this paper, we use linear approximation to decode the original speech signal. The resulting waveform has a saw-tooth shape in the voiced segments, with a good approximation. The intervals of silence and unvoiced speech contain randomness and are represented by a straight line with zero slope. Algorithm 4 shows the steps followed by the decoder: on receiving the bitstream, the decoder reconstructs the speech signal.
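The linear-interpolation reconstruction can be sketched in a few lines; this is a generic interpolation of the retained (time, amplitude) pairs, not the authors' slope-by-slope listing of algorithm 4:

```python
import numpy as np

def decode(change_points, amplitudes, n):
    """Reconstruct n samples by linear interpolation between the retained
    (time, amplitude) pairs; values outside the first/last change point
    are held constant, i.e., a straight line with zero slope."""
    t = np.arange(n)
    return np.interp(t, change_points, amplitudes)

y = decode([0, 4], [0.0, 8.0], 5)
print(y)  # [0. 2. 4. 6. 8.]
```

Between two change points the output is exactly the straight segment joining them, which is what produces the saw-tooth shape in the voiced regions.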

IV. EXPERIMENT AND RESULTS
For the experimental evaluation, we use speech samples from the LibriSpeech ASR corpus [56] and the EUSTACE speech corpus [57]. LibriSpeech is a corpus of English speech containing approximately 1000 hours of 16 kHz audio samples, prepared by Vassil Panayotov with the assistance of Daniel Povey. EUSTACE is the Edinburgh University speech timing archive and corpus of English.

A. PARAMETER SELECTION
In this section, we study the effect of the block size on the performance of the proposed methodology. The block size ensures that the number of samples selected is optimal, i.e., that they do not inherit the waveform's redundancy. We also investigate the performance of the various statistical parameters and their effect on the reconstructed signal's quality, selecting the statistical parameter for which the decoded speech signal is of acceptable quality, i.e., for which the partial data approximation does not compromise the quality of the decoded signal.

1) Block Size Selection
In this experiment, we empirically determine the effect of the block size on BAAS by encoding and decoding the speech signal with various block sizes (fs, fs/2, fs/4, fs/8, and fs/16), using the sample reduction rate (SRR) and the mean square error (MSE) to select the appropriate parameter. Figure 6b shows the SRR for the various block sizes. From the figure, we find that the SRR decreases linearly with the block size; in other words, as the block size decreases, the number of data samples selected for the approximation increases. In figure 6a, we use the MSE to test the accuracy of the decoded signal with decreasing block size. The graph shows that the MSE is low for fs/2 and nearly constant thereafter. We select the block size fs/2 for the following set of experiments, as it gives a better compression rate with an acceptable loss in audio quality.
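The two selection metrics can be computed as follows; reading the SRR as the fraction of samples removed is our assumption, since the paper does not give its formula here:

```python
import numpy as np

def srr(n_original, n_retained):
    """Sample reduction rate: fraction of the original samples removed."""
    return 1.0 - n_retained / n_original

def mse(x, y):
    """Mean square error between the original and decoded signals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(((x - y) ** 2).mean())

print(srr(1000, 250))             # 0.75, i.e., 25% of samples retained
print(mse([1, 2, 3], [1, 2, 5]))  # 1.333...
```

A good block size is the one that keeps the SRR high while the MSE of the decoded signal stays flat, which is the trade-off figure 6 illustrates.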

2) Statistical Parameters Selection
The statistical NUS is performed with the aim of recording the instances of significant change in the statistical parameter within an interval. The statistical parameters studied in this paper are the mean, the standard deviation, and the linear statistical parameter (LSP, considering both mean and slope). Figure 7 shows the data sampled by the various statistical parameters. The figure shows that the average number of samples retained by the proposed algorithm depends upon the chosen statistical parameter. The mean and standard deviation retain about 90% and 50% of the signal on average, respectively, whereas the linear combination of mean and slope retains only about 25% of the samples on average. Figure 8a shows the average MSE (mean square error) for all three statistical parameters. In figure 8b, we study the perceptual objective listening quality assessment, ITU-T P.863 (POLQA) [41]. This is an intrusive objective quality test based on the comparison of the original signal with the transferred signal. Figure 8 shows that the average POLQA score for the mean is 3.0, while it retains approximately 90% of the data samples. The standard deviation has a lower quality score of around 2.3 for 50% of the change points. In the following experimental evaluations, we use the LSP, as it retains only about 25% of the data samples and generates a good-quality signal. Figure 9 shows the percentage of NUS samples retained (using the LSP as the statistical parameter) together with the POLQA score of each signal, with green dotted lines showing the average POLQA score for the LibriSpeech corpus [56]. Furthermore, figure 10 shows the variation in the SegSNR of each signal, and figure 11 compares the original and reconstructed signals using the chosen parameter.

B. FIXED RATE SAMPLING
One of the primary disadvantages of non-uniform sampling is that it selects a variable number of sample points per frame, which may not be suitable for communication applications [58]. To solve this, we modify BAAS to select a fixed number of samples per frame. Figure 12 shows the resulting reconstruction quality for different fixed sampling rates.
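One simple way to realize the fixed-rate modification is to rank samples by significance within each frame and keep exactly the top k, so every frame emits the same budget. The ranking rule below (deviation from the frame's best-fit line) is an assumption of this sketch, not the paper's exact criterion, and `fixed_rate_sample` is an invented name.

```python
import numpy as np

def fixed_rate_sample(x, frame, k):
    """Fixed-rate NUS sketch: every frame keeps its two endpoints plus the
    k samples that deviate most from the frame's best-fit line, giving a
    (near-)constant number of samples per frame."""
    keep = np.zeros(len(x), dtype=bool)
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        t = np.arange(len(seg))
        if len(seg) > 1:
            slope, intercept = np.polyfit(t, seg, 1)
        else:
            slope, intercept = 0.0, seg[0]
        resid = np.abs(seg - (slope * t + intercept))
        keep[start + np.argsort(resid)[-k:]] = True   # k most significant
        keep[start] = keep[start + len(seg) - 1] = True
    return keep

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 300 * t) * np.exp(-3 * t)
keep = fixed_rate_sample(x, frame=fs // 2, k=100)
```

Each frame then contributes at most k + 2 samples, so the code rate per frame is bounded regardless of the signal content.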

C. COMPARISONS WITH OTHER SAMPLING TECHNIQUES
In this section, we begin with a preliminary investigation of the number of non-uniform sample points selected for interpolation by the MMD and IPD methods and by the proposed methodology. Further, we also compare the proposed sampling technique with ADPCM [59] and with the level crossing algorithm.

1) Comparison with other NUS Techniques
The MMD method samples the signal at its local maxima and minima, marked by solid blue circles in figure 13. The IPD method additionally selects abrupt changes in the signal's slope, along with the local maxima and minima, to overcome the sparseness problem of the MMD method, as shown by diamonds. Finally, the proposed methodology selects the data samples using statistical analysis of the signal, marked by the squares in figure 13. We use equation 3 to mark the change points using the linear statistical parameter (LSP) of the signal. The figure shows that the proposed methodology selects fewer change points than the previous methods. Figure 14 shows a close approximation of the original signal, where we use the retrieved change points to approximate the original signal using linear interpolation. Figure 15a shows the average number of data samples generated by the three methods on the Libre Speech Corpus. It is evident from the figure that BAAS retains comparatively fewer samples than the IPD method. Similarly, comparing the three algorithms in terms of average POLQA [41] scores, as shown in figure 15b, we find that the performance of the MMD and IPD methods is close, whereas BAAS performs better than the IPD method while achieving better compression. Furthermore, we compare the BAAS, MMD, and IPD methods using the EUSTACE speech corpus [57]. Table 2 shows the average results of the three methods: BAAS performs better, with lower MSE and higher POLQA scores than MMD and IPD.
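The baseline selectors are easy to state in code. The sketch below detects MMD points as sign changes of the first difference (local extrema) and IPD points as the MMD set plus sign changes of the second difference (slope/inflection changes); reconstruction is the same linear interpolation used throughout. Function names are ours.

```python
import numpy as np

def mmd_points(x):
    """MMD: retain the local maxima and minima (plus the endpoints)."""
    d = np.diff(x)
    ext = np.flatnonzero(np.sign(d[:-1]) * np.sign(d[1:]) < 0) + 1
    return np.unique(np.r_[0, ext, len(x) - 1])

def ipd_points(x):
    """IPD: additionally retain abrupt slope changes (inflection points),
    detected as sign changes of the second difference."""
    d2 = np.diff(x, 2)
    infl = np.flatnonzero(np.sign(d2[:-1]) * np.sign(d2[1:]) < 0) + 1
    return np.unique(np.r_[mmd_points(x), infl])

def reconstruct(x, idx):
    """Linearly interpolate between the retained samples."""
    return np.interp(np.arange(len(x)), idx, x[idx])

t = np.arange(400) / 400.0
x = np.sin(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 23 * t)
mse_mmd = np.mean((x - reconstruct(x, mmd_points(x))) ** 2)
mse_ipd = np.mean((x - reconstruct(x, ipd_points(x))) ** 2)
```

Counting `len(mmd_points(x))` versus `len(ipd_points(x))` versus the LSP change points reproduces the sample-count comparison of figure 15a in miniature.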

2) Comparison of BAAS with ADPCM
We compare the proposed BAAS with the ADPCM speech codec (Adaptive Differential Pulse Code Modulation). ADPCM takes advantage of the high correlation between consecutive speech samples by encoding the difference between a predicted sample and the actual speech sample. In contrast, the proposed approach samples the changes in the statistical property of the signal following the block adaptive approach of BAAS. We use the following parameters for evaluating the speech quality of the two algorithms: segmental SNR (segSNR) [60], weighted-slope spectral distance (WSS) [60], POLQA [41], and log-likelihood ratio (LLR) [60]. The results are shown in table 3. The table shows that the proposed BAAS algorithm produces an acceptable quality signal with an average POLQA score of 3.0. Moreover, we can achieve POLQA scores comparable to the standard waveform coder ADPCM by marginally increasing the sampling rate, as the quality of the signal reconstructed by BAAS is directly proportional to the percentage of retained samples (as seen in figure 12a).
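Of the objective measures listed, segSNR is the simplest to compute directly. The sketch below implements the customary definition (frame-wise SNR in dB, clamped to a fixed range before averaging); the frame length and the clamp values of -10 and 35 dB are common conventions assumed here, not taken from the paper.

```python
import numpy as np

def seg_snr(ref, deg, frame=256, floor=-10.0, ceil=35.0):
    """Segmental SNR in dB: frame-wise SNR, clamped to [floor, ceil] as is
    customary for speech, then averaged over frames."""
    n = min(len(ref), len(deg))
    snrs = []
    for s in range(0, n - frame + 1, frame):
        sig = np.sum(ref[s:s + frame] ** 2)
        err = np.sum((ref[s:s + frame] - deg[s:s + frame]) ** 2) + 1e-12
        snrs.append(float(np.clip(10.0 * np.log10(sig / err + 1e-12),
                                  floor, ceil)))
    return float(np.mean(snrs))

t = np.arange(4096) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.05 * np.random.default_rng(0).standard_normal(len(clean))
```

A perfect reconstruction saturates at the 35 dB ceiling, while a degraded one scores lower, which is the behavior the table 3 comparison relies on.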

3) Comparison with Level Crossing Sampling
In this section, we compare the performance of BAAS with the recently proposed level crossing sampling algorithms [23], [24]. For this, we use 16-level linear quantization of the amplitude and 256-level µ-law quantization. The other parameters are similar to the ones discussed above. Table 4 shows the average results of BAAS and level crossing sampling. The comparative analysis suggests that the proposed technique has a lower MSE score, and thus the regenerated speech waveform is a closer copy of the original signal. Moreover, the decoding cost grows only linearly with n, the number of retained samples, since the algorithm computes the intermediate points using basic arithmetical calculations.
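The level crossing baseline and the µ-law companding it uses can be sketched as follows. The standard µ-law formula is assumed (signal normalized to [-1, 1], µ = 255 for 256 levels); the crossing detector marks a sample whenever the waveform crosses one of the uniformly spaced levels. Both function names are ours.

```python
import numpy as np

def mu_law(x, mu=255.0):
    """256-level µ-law companding of a signal normalized to [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def level_cross_sample(x, n_levels=16):
    """Level crossing sampling: place n_levels uniform levels across the
    signal range and retain a sample wherever the waveform crosses a
    level (endpoints are always kept for interpolation)."""
    levels = np.linspace(x.min(), x.max(), n_levels)
    keep = np.zeros(len(x), dtype=bool)
    keep[0] = keep[-1] = True
    for lv in levels:
        s = np.sign(x - lv)
        keep[np.flatnonzero(s[:-1] * s[1:] < 0) + 1] = True
    return keep

t = np.arange(2000) / 2000.0
x = np.sin(2 * np.pi * 7 * t)
keep = level_cross_sample(mu_law(x), n_levels=16)
```

Companding before crossing detection concentrates the levels near small amplitudes, which is why µ-law level crossing tracks low-level speech detail better than linear levels.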
We also perform the MUSHRA listening test (Multiple Stimuli with Hidden Reference and Anchor), following the ITU-R BS.1534-3 recommendation [42], to test the performance of BAAS. The experiment uses a web-based platform [61], to which we have added an anchor signal (a low-pass filtered version of the original signal with a cut-off frequency of 3.5 kHz). In the experiment, we consider three test scenarios for three different audio samples. In each test scenario, we evaluate five conditions: the reference signal (REF), the low-anchor (Anchor35) signal, and the ADPCM, BAAS, and level crossing sampling speech codecs. The audience consisted of 10 trained normal-hearing adults, aged between 25 and 35 years.
Each speech sample is played twice before rating. Figure 17 shows the results of the MUSHRA test. The first column shows the value averaged over all three scenarios, and the remaining three columns show the individual ratings per test scenario. In the experimental results, we mark the average MUSHRA scores with a 95% confidence interval. The figure shows the performance of BAAS in comparison to the other sampling algorithms: BAAS has a subjective score higher than 92 on average, which indicates an acceptable quality of the regenerated signal.
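The averages and 95% confidence intervals plotted in figure 17 follow the usual small-sample calculation. The sketch below uses the two-sided Student-t critical value for 9 degrees of freedom (10 listeners); the listener scores shown are hypothetical values for illustration only, not the experiment's data.

```python
import math
import statistics

def mean_ci95(scores, t_crit=2.262):
    """Average listener score with a 95% confidence interval; t_crit is
    the two-sided Student-t value for n - 1 = 9 degrees of freedom."""
    m = statistics.mean(scores)
    half = t_crit * statistics.stdev(scores) / math.sqrt(len(scores))
    return m, (m - half, m + half)

# Hypothetical ratings from the 10 listeners for one condition.
baas_scores = [94, 91, 95, 90, 93, 92, 96, 89, 94, 91]
m, (lo, hi) = mean_ci95(baas_scores)
```

Each bar in the figure corresponds to one such mean, with the interval drawn as the error bar.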
Finally, we compare the performance of BAAS on two different corpora, i.e., the Libre Speech Corpus [56] and the EUSTACE speech corpus [57]. We perform the simulation with the block size set to half of the sampling frequency (fs). The encoder uses the proposed algorithm to retain the statistical changes in the signal. We use the following parameters for comparing the speech quality on the two datasets: segmental SNR (segSNR) [60], weighted-slope spectral distance (WSS) [60], POLQA [41], and log-likelihood ratio (LLR) [60]. We also verify the quality of the decoded signal using various quality scores: Csig, a mean opinion score (MOS) predictor of signal distortion [63]; Cbak, a MOS predictor of background-noise intrusiveness [63]; and Covl, a MOS predictor of overall signal quality [63]. Table 5 shows the average results, making it evident that the proposed NUS technique gives an acceptable quality of signal regeneration on both corpora.

V. CONCLUSION
In this paper, we propose an approach that statistically analyzes the speech signal to sample the abrupt changes over a time frame and approximates the signal with minimal reconstruction error, using cost and linear penalty functions to avoid the over-fitting problem. To further optimize the selected change points, we also propose a branch-and-bound algorithm. The methodology approximates the eliminated samples by linear interpolation. The proposed NUS technique retains only 25% of the data samples while regenerating a signal of acceptable quality. The idea of using non-uniform sampling in the amplitude domain is to capture the time instances of sudden changes in the signal that affect the marginal distribution of the signal's mean, variance, or spectral distribution. These time instances retrieve the structural property of the speech signal with minimal error. To evaluate the proposed non-uniform sampling methodology, we design a waveform coder, viz. Block Adaptive Amplitude Sampling (BAAS). Compared to conventional non-uniform sampling methods that detect local maxima, minima, and inflection points, the proposed method reduces the number of retained samples and improves subjective quality. A primary disadvantage of non-uniform sampling is its variable code rate, which is unsuitable for a fixed-rate communication channel. We therefore also propose a fixed-rate NUS based on BAAS and evaluate its performance on different standard databases.