Machine Learning-Based Scoring System to Predict the Risk and Severity of Ataxic Speech Using Different Speech Tasks

The assessment of speech in Cerebellar Ataxia (CA) is time-consuming and requires clinical interpretation. In this study, we introduce a fully automated objective algorithm that uses significant acoustic features from time, spectral, cepstral, and non-linear dynamics present in microphone data obtained from different repeated Consonant-Vowel (C-V) syllable paradigms. The algorithm builds machine-learning models to support a 3-tier diagnostic categorisation: distinguishing Ataxic Speech from healthy speech, rating the severity of Ataxic Speech, and providing nomogram-based scoring charts for Ataxic Speech diagnosis and severity prediction. Features were selected using a combination of mass univariate analysis and elastic net regularization for the binary outcome, and Spearman’s rank-order correlation criterion for the ordinal outcome. The algorithm was developed and evaluated using recordings from 126 participants: 65 individuals with CA and 61 controls (i.e., neurotypical individuals without ataxia). For Ataxic Speech diagnosis, the reduced feature set yielded an area under the curve (AUC) of 0.97 (95% CI 0.90-1), a sensitivity of 97.43%, a specificity of 85.29%, and a balanced accuracy of 91.2% in the test dataset. The mean AUC for severity estimation was 0.74 for the test set. The high C-indexes of the prediction nomograms for identifying the presence of Ataxic Speech (0.96) and estimating its severity (0.81) in the test set indicate the efficacy of this algorithm. Decision curve analysis demonstrated the value of incorporating acoustic features from two repeated C-V syllable paradigms. The strong classification ability of the specified speech features supports the framework’s usefulness for identifying and monitoring Ataxic Speech.


I. INTRODUCTION
THE cerebellum integrates information from a range of sensory inputs with the aim of aiding the production of coordinated movement. Cerebellar Ataxia (CA) refers to the uncoordinated movement resulting from dysfunction of the cerebellum; it is caused by many processes, including neurodegeneration, multiple sclerosis, stroke and trauma. As the cerebellum regulates many aspects of movement, CA results in uncoordinated movements of the limbs, trunk, gait and eyes. Speech is also regulated by the cerebellum, and dysfunction in relevant cerebellar regions [1] can result in cerebellar ataxia of speech, sometimes referred to as Ataxic Dysarthria but referred to here as 'Ataxic Speech'. Clinically, Ataxic Speech is recognised by impaired and highly variable timing, highly variable syllables (large variation in the duration of syllables, the interval between syllables and the loudness of individual syllables), articulatory imprecision [2], variations in pause durations [3], [4] and variations in peak amplitude [5].
An important tool for emphasising these features of Ataxic Speech is to ask the subject to repeat a pair of syllables consisting of a Consonant followed by a Vowel (C-V) [6]. Interestingly, at the bedside, the distinction between the speech of individuals with ataxia and controls (i.e., neurotypical individuals without ataxia) is more readily achieved using C-V repetition than sentence utterances [7], [8]. Most of the studies in the literature used time-based measurements (perturbation measures, jitter, shimmer), frequency features (fundamental frequency, low-to-high frequency components ratio, harmonics-to-noise ratio (HNR) and pitch) and sound pressure level to distinguish Ataxic Speech from healthy speech [9], [10]. A few studies considered cepstral-spectral measurements [11], [12], which included signal measurements such as the cepstral peak prominence (CPP) and smoothed CPP (CPPs). These methods were examined in relatively small cohorts and scored using perceptual assessments, in which a human tags or labels features in the recorded speech signal in the time or frequency domain to aid the analysis of speech.
This clinical evaluation of Ataxic Speech consists of a qualitative assessment of the individual's performance on various phonetic tasks [16], [17] and has remained largely unchanged since the early 1900s. To enhance repeatability and reduce variability caused by individual differences in interpretation, it is necessary to establish an objective measurement method for Ataxic Speech rather than relying solely on subjective or perceptual assessments. Although the state-of-the-art research on ataxia covers speech ataxia in a range of diagnoses, such as Spinocerebellar and Friedreich ataxia [3], [18], Multiple system atrophy [19], and Multiple sclerosis [20], [21], our study specifically investigated Pure (central) CA [22], CA with Bilateral Vestibulopathy (CABV) [23], [24], and CABV with Somatosensory impairment [25]. Additionally, all patients in our study exhibited pure CA without any co-occurring dysarthria (Table I). The literature review in Table II shows that there is currently no fully automated system for identifying Ataxic Speech and estimating its severity with probability scoring charts. In our previous studies, we explored the effectiveness of time-based [10] and cepstral-based measurements [13] in two separate repeated C-V syllable paradigm tasks and designed models for identifying Ataxic Speech and estimating its severity. Building on this previous work, interpretive visual scoring charts for estimating the probability of Ataxic Speech risk and Ataxic Speech severity are now required.
In this study, we developed a fully automated speech monitoring and scoring system for identifying Ataxic Speech and estimating its severity with probability scoring charts. Time-domain, spectral-domain and cepstral-domain features were used along with non-linear features, because linear features, which are obtained from the traditional linear source-filter model of speech production, do not take into account the nonlinear phenomena of 3D fluid dynamics produced during speech. While non-linear dynamic descriptions have been used to analyse other pathological speech [26], [27], there is no evidence of their use in Ataxic Speech. The study employed two distinct C-V syllable paradigm tasks, with the assumption that incorporating multiple speech tasks would illuminate unique features of Ataxic Speech that may not be as readily discernible through a single speech task. By combining the outcomes of these two tasks, the identification of Ataxic Speech and the assessment of its severity were expected to be enhanced. The features were then used in nomograms to generate interpretive visual scores that indicate the likelihood of Ataxic Speech being present or absent in a particular individual, and its severity.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
To summarise, in this study, four speech domains (time, spectral and cepstral domains along with non-linear dynamics) were quantitatively assessed by objective acoustic analyses to determine specific dysarthric features and estimate their reliability in separating Ataxic Speech from control subjects' speech and estimating the severity of Ataxic Speech, expressing both in terms of probability. The aims of this study can be summarised as follows:
1) Examine speech features in the temporal, spectral, cepstral and non-linear dynamics domains to extract a complementary dataset that effectively captures different signalling attributes of Ataxic Speech.
2) Assess whether the combined use of linear and non-linear speech measures improves the distinction of Ataxic Speech from control subjects' speech and the estimation of its severity.
3) Investigate the potential effectiveness of two separate speech tasks in the assessment of Ataxic Speech, with the objective of describing variations in Ataxic Speech features across tasks.
4) Design a fully automated quantitative system based on machine learning to separate Ataxic Speech from control speech and develop interpretive visual scoring charts (using nomograms) for estimating the probability of Ataxic Speech risk and Ataxic Speech severity. The nomograms are designed based on the binary and ordinal logistic regression models.

A. Participants and Speech Corpus
Speech data from 126 Australian native English speakers were collected. The patients were 65 individuals previously diagnosed with ataxia. Our study specifically investigated Pure (central) CA [22], CA with Bilateral Vestibulopathy (CABV) [23], [24], and CABV with Somatosensory impairment [25]. Additionally, all patients in our study exhibited pure CA without any co-occurring dysarthria. The controls were 61 volunteers with no history of speech difficulties or neurological disease. The demographics of the participants are summarised in Table I. None of the participants (with or without ataxia) had undergone speech therapy prior to the study.

B. Ethics Approval and Informed Consent
This study was approved by the Human Research and Ethics Committee, Royal Victorian Eye and Ear Hospital, Australia (HREC Reference Number: 11/994H/16) and funded by the National Health and Medical Research Council (NHMRC) Grant: GNT1101304. The study was conducted according to the NHMRC's "Australian Code for the Responsible Conduct of Research, 2018" [28] and written consent was sought from all the participants prior to their enrolment. The subject shown in Figure 1 provided informed consent to publish their image.

C. Comprehensive Objective Speech Assessment System
The Ataxic Speech assessment was performed using the following steps:
1) Speech Inputs: Participants performed the following two speech tasks:
a) Speech Task 1: Participants were instructed to recite the phrase British Constitution (BC) three times. This phrase is a well-established benchmark for assessing the characteristics of Ataxic Speech. The acoustic measurements were obtained by taking the average of the three recorded instances for each individual.
b) Speech Task 2: Participants repeated the syllable /ta/ continuously for five seconds, producing a syllabic sequence of /ta/-/ta/-/ta/. This task will be referred to as the Repeated Ta (RT) task.
Speech intelligibility in the individuals with ataxia was perceptually scored by an experienced clinician in accordance with the Scale for Assessment and Rating of Ataxia (SARA) [29] from the subject's performance of the above two tasks. The SARA speech item ranges from 0 to 6 and indicates the severity of Ataxic Speech, where "0" is normal speech, "1" is disturbed speech, "2" is distorted speech that is still simple to comprehend, "3" is speech in which words are sometimes difficult to understand, "4" is speech in which several words are difficult to understand, "5" represents only single words being comprehensible and "6" represents unintelligible speech. The evaluation of the SARA speech item was performed by the same clinician for all patients. The evaluator was blinded to other patients' conditions and was not informed about the specific diagnosis or treatment status of the patients, to reduce the risk of bias and ensure that the evaluations were as objective as possible.
2) The speech recordings were captured by a condenser microphone clipped at an average distance of 10 cm from the subject's lips, in a quiet room with low ambient noise. The recording was conducted using the BioKinMobi™ [30] application on an Android phone, under the supervision of a trained investigator. These speech tasks resulted in 252 speech recordings, with 126 recordings corresponding to each speech task.
3) The recordings were transmitted wirelessly to a blockchain-based distributed cloud network [31], where the proposed machine learning scoring framework (Section II-D) was applied.
4) The data analysis results were transformed into a clinically relevant format.
A pictorial representation of the assessment platform is illustrated in Figure 1(A).

D. Machine Learning Scoring Framework
In this study, a fully automated framework (Figure 1(B)) for the scoring system was developed through the following stages.
1) Data Representation: Let $M$ ($M = 126$) be the total number of individuals enrolled in this study. The $k$-th individual, where $1 \le k \le M$, is instructed to perform $T$ ($T = 2$) speech tasks in a day. Distinctive acoustic features are extracted from each speech task. The $j$-th speech task, where $1 \le j \le T$, consists of $N_j$ acoustic features, where $N_j$ is 24 for RT and 77 for BC. Let $\sum_{j=1}^{T} N_j = F$, so that $F = 101$ is the total number of features extracted in this study. For each individual $k$, the feature set consisting of all the features extracted from the $T$ speech tasks can be represented through the row matrix

$$X_k = \left[x^k_{11}, \ldots, x^k_{1N_1}, \ldots, x^k_{T1}, \ldots, x^k_{TN_T}\right],$$

where $X_k \in \mathbb{R}^{1 \times (N_1 + N_2 + \ldots + N_T)}$. The $i$-th feature of the $j$-th speech task measured for the $k$-th individual is denoted $x^k_{ji}$, where $1 \le k \le M$, $1 \le j \le T$ and $1 \le i \le N_j$. Therefore, the total feature matrix $X$ for the $M$ individuals in this study has a size of $M \times F$.
For a model consisting of $M$ individuals, the input feature set $X$ and the output response set $Y$ can be denoted as the column matrices

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_M \end{bmatrix}, \qquad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_M \end{bmatrix}.$$

The dimensions of $X$ are $M \times (N_1 + N_2 + \ldots + N_T)$, the dimensions of the responses $Y$ are $M \times 1$ and the dimensions of the obtained dataset combining features and responses are $M \times (N_1 + N_2 + \ldots + N_T + 1)$. The algorithm's output, $Y_k$, can be either binary (dichotomous), used for Ataxia-Control classification, or ordinal, used for determining the severity of Ataxic Speech based on SARA speech scores. In the upcoming section, a machine-learning algorithm using the framework illustrated in Figure 1(B) is presented.
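As a minimal sketch of this data layout, the following Python snippet assembles one row per participant by concatenating the per-task feature lists. The feature counts (24 for RT, 77 for BC) and M = 126 come from the text; the helper name `build_row` and the zero placeholder values are illustrative assumptions, not the authors' implementation.

```python
# Feature counts and cohort size as stated in the text.
N_RT, N_BC = 24, 77
M = 126

def build_row(rt_features, bc_features):
    """Concatenate the per-task feature lists into one row X_k of length F."""
    if len(rt_features) != N_RT or len(bc_features) != N_BC:
        raise ValueError("unexpected number of features for a task")
    return list(rt_features) + list(bc_features)

# Placeholder values only; real entries would be the extracted acoustic measures.
X = [build_row([0.0] * N_RT, [0.0] * N_BC) for _ in range(M)]
Y = [0] * M  # binary labels (0 = control, 1 = ataxia) or ordinal SARA groups

F = N_RT + N_BC
shape_X = (len(X), len(X[0]))            # M x F
shape_dataset = (len(X), len(X[0]) + 1)  # features plus one response column
```

The same layout serves both outcomes: only the contents of `Y` change between the binary and the ordinal model.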
2) Feature Set Construction: In this study, the time-based features were extracted from speech task 1 [10]; the spectral and cepstral [13] features were extracted from speech task 2; and the traditional and non-linear features were extracted from both speech tasks (1 & 2). Supporting Document Table I presents a brief description of all 101 acoustic features extracted in this study. Six acoustic measures, namely RT Duration Regularity (RT_Dr50), RT Gap Regularity (RT_Gr), Average RT Peak Prominence (RT_PPa), Average RT Compensation (RT_Ca), RT Damping Ratio (RT_DR75) and RT Resonant Frequency (RT_RF50), were extracted from the Repeated Ta (RT) data for an individual k according to the topographic-prominence-based algorithm of our previous study [10].
c) Spectral features: Spectral descriptors [14], [15] have been widely used for the perceptual analysis of audio signals in machine learning applications. The spectral features used in this study to extract features for an individual k are summarised in Supporting Document Table I along with their brief mathematical description. For each of these features extracted from the RT and BC data, we further calculated the descriptive statistics (mean (M), standard deviation (SD) and interquartile range (IQR)).
d) Classical features: Fundamental frequency and parameters describing the variability of frequency in time (jitter) [33], [34], amplitude disruptions (shimmer) [35] and pitch perturbations [36] are the most common features describing disturbances in the vibratory characteristics of the vocal folds. These classical characteristics are good candidates for speech tremor quantification [37]. We included harmonic ratio, shimmer, jitter and their variants [9], [38] in this study.
e) Non-linear dynamic features: Abnormal speech patterns can lead to significant variations in vocal fold tension, resulting in irregular, aperiodic, and noise-like voices. This deviation may involve sub-harmonics and chaos, which can hinder analysis with conventional speech signal processing techniques. Nevertheless, recent studies have indicated that nonlinear dynamical analysis can adequately explain these signal patterns [26], [27].
• Largest Lyapunov Exponent (LLE): Following Rosenstein's method, the LLE is calculated as the average divergence rate of neighbouring trajectories in the attractor. The algorithm first finds the nearest neighbour of each point on the trajectory; to qualify as a nearest neighbour, a point must have a temporal separation from the reference point greater than the "period" of the time series [27]. The separation of points on a trajectory then follows the expression

$$d(t) = C e^{\lambda_1 t},$$

where $\lambda_1$ is the maximum Lyapunov exponent, $d(t)$ is the average divergence at time $t$, and $C$ is a normalization constant. Assuming that the $j$-th pair of nearest neighbours diverges at a rate of approximately $\lambda_1$, taking logarithms gives

$$\ln d_j(t) \approx \ln C_j + \lambda_1 t,$$

where $\lambda_1$ is the slope of the average line obtained when this expression is drawn on the logarithmic plane.
Building on the Rosenstein method adopted in our study, we also incorporated Liu et al.'s [39] first-order correction of the exponential divergence of trajectories in the state space of the original noisy speech signal, to improve the signal-to-noise ratio while estimating the LLE. With this correction, the largest Lyapunov exponent can be estimated using nonlinear least squares fitting.
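The divergence-slope idea behind the LLE can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' pipeline: it uses an embedding dimension of 1, a brute-force neighbour search, an ordinary least-squares line fit, and it omits the first-order noise correction of Liu et al. The function names, the logistic-map test signal (whose theoretical LLE is ln 2) and the parameters `min_sep` and `n_steps` are hypothetical choices for the example.

```python
import math

def logistic_map_series(n, x0=0.1234, burn_in=100):
    """Chaotic test signal x_{t+1} = 4 x_t (1 - x_t); its theoretical LLE is ln 2."""
    x = x0
    out = []
    for t in range(n + burn_in):
        x = 4.0 * x * (1.0 - x)
        if t >= burn_in:
            out.append(x)
    return out

def rosenstein_lle(series, min_sep=10, n_steps=5, dt=1.0):
    """Estimate the largest Lyapunov exponent of a scalar series (embedding dimension 1)."""
    n_ref = len(series) - n_steps  # points that can be followed for n_steps
    # Step 1: nearest neighbour of each point, excluding temporally close points.
    neighbours = []
    for i in range(n_ref):
        best_j, best_d = None, float("inf")
        for j in range(n_ref):
            if abs(i - j) <= min_sep:
                continue
            d = abs(series[i] - series[j])
            if d < best_d:
                best_j, best_d = j, d
        neighbours.append(best_j)
    # Step 2: average log-divergence of all neighbour pairs after k steps.
    mean_log_div = []
    for k in range(n_steps + 1):
        logs = []
        for i, j in enumerate(neighbours):
            d = abs(series[i + k] - series[j + k])
            if d > 0.0:  # skip exact coincidences, whose logarithm is undefined
                logs.append(math.log(d))
        mean_log_div.append(sum(logs) / len(logs))
    # Step 3: least-squares slope of mean ln d against time.
    times = [k * dt for k in range(n_steps + 1)]
    t_bar = sum(times) / len(times)
    y_bar = sum(mean_log_div) / len(mean_log_div)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, mean_log_div))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den

lle = rosenstein_lle(logistic_map_series(600))
```

For real speech, the scalar series would first be delay-embedded and `min_sep` set from the dominant period of the signal; a positive slope indicates exponential divergence of nearby trajectories.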
• Detrended Fluctuation Analysis (DFA): DFA can be used to estimate the scaling exponent $\alpha$ of the stochastic portion of the voice signal, treated as a non-stationary time series. The scaling exponent may take any value on the real line; however, it is more convenient to represent it on a finite scale from zero to one, which requires a mapping function $g : \mathbb{R} \to (0, 1)$. The logistic function $g(x) = (1 + \exp(-x))^{-1}$ is one such function that finds common use in statistical and pattern recognition applications, so the normalised scaling exponent is

$$\alpha_{norm} = \frac{1}{1 + \exp(-\alpha)}.$$

On this scale, each sound will lie somewhere between the extremes of zero and one, according to the self-similarity properties of the stochastic part of the dynamics. Speech sounds with $\alpha_{norm}$ closer to one are characteristic of speech disorders.
3) Feature Selection: To select features from the high-dimensional feature set for Ataxic Speech classification, we employed a two-stage feature selection process (Figure 1(B)).
a) Stage 1 (for binary outcome): We used a mass univariate approach to eliminate features that were not significantly related to the binary outcome of ataxia or control, as determined in the development dataset.To do this, we conducted a feature-wise KS test to examine whether each feature varied significantly across the two groups.The resulting features were then further pruned using regularisation, with the elastic net method [40], [41] being selected due to its balance between interpretability and parsimony.
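The mass univariate screening step can be illustrated with a pure-Python two-sample KS test, assuming that "KS" refers to the Kolmogorov-Smirnov test. The p-value below uses the standard asymptotic Kolmogorov series with a small-sample correction, which is an approximation and may differ from the exact routine used in MATLAB or R. The function names and the two toy features are hypothetical.

```python
import math

def ks_two_sample(a, b):
    """Two-sample KS statistic D and an approximate two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk through the pooled sorted values, tracking both empirical CDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution (approximation).
    n_eff = math.sqrt(n * m / (n + m))
    lam = (n_eff + 0.12 + 0.11 / n_eff) * d
    if lam < 0.2:  # the series does not converge here; the p-value is effectively 1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def screen_features(features, labels, alpha=0.01):
    """Keep the names of features whose distributions differ between groups."""
    kept = []
    for name, values in features.items():
        ataxia = [v for v, y in zip(values, labels) if y == 1]
        control = [v for v, y in zip(values, labels) if y == 0]
        _, p = ks_two_sample(ataxia, control)
        if p < alpha:
            kept.append(name)
    return kept

# Toy data: one feature that separates the groups and one that does not.
labels = [1] * 20 + [0] * 20
features = {
    "separated": [10 + 0.1 * i for i in range(20)] + [0.1 * i for i in range(20)],
    "overlapping": [0.1 * i for i in range(20)] * 2,
}
selected = screen_features(features, labels)
```

Features that survive this screen would then be passed to the elastic net stage for further pruning.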
b) Stage 2 (for ordinal outcome): For the ordinal outcome, we assessed the correlation between each selected feature and the SARA speech score using Spearman's rank correlation test. We found that the selected features were significantly correlated with the outcome, but not necessarily with each other. This suggests that the heterogeneous groups of features from the two speech tests may each explain a proportion of the outcome's variability.
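A minimal sketch of the rank correlation used in Stage 2 is given below. It computes Spearman's coefficient as the Pearson correlation of average ranks, which handles the ties that are unavoidable with an ordinal score such as the SARA speech item. The function names and the toy values are illustrative assumptions; the significance test is not included.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Toy example: a hypothetical feature against ordinal severity scores with ties.
feature = [0.2, 0.5, 0.4, 0.9, 1.3, 1.1]
sara = [0, 1, 1, 2, 3, 3]
rho = spearman_rho(feature, sara)
```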
4) Handling Multicollinearity: We also used the correlation matrix (correlation plot) and the Variance Inflation Factor (VIF) to detect multicollinearity. A threshold of VIF < 10 was used in our experiment. The application of the elastic net regularization method in the selection of features for the binary outcome (as discussed in Section II-D.3.a) effectively addressed the issue of multicollinearity in our high-dimensional dataset. This was accomplished by removing highly correlated predictors from the data [42].
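The VIF check can be sketched as follows, using the identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. This is a small stdlib-only illustration with hypothetical toy columns, not the routine used in the study; the threshold of 10 is the one stated above.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centred, norms)]
            for ci, ni in zip(centred, norms)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Toy predictors: b2 is nearly a copy of a, while b is uncorrelated with a.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b2 = [1.1, 1.9, 3.2, 3.9, 5.0]
b = [1.0, -1.0, 0.0, -1.0, 1.0]
vifs = vif([a, b2, b])
flagged = [v >= 10 for v in vifs]  # predictors exceeding the VIF threshold
```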
5) Model Building and Ataxic Speech Assessment: This section involved designing a 3-Tier Ataxic Speech automated assessment architecture as depicted in Figure 1 (B): (i) classify speech as ataxic or control, (ii) estimate the severity of Ataxic Speech, and then (iii) use a nomogram based scoring chart to predict Ataxic Speech risk and probability of Ataxic Speech severity.
To construct models for (i) and (ii), we employed binary and ordinal logistic regression, respectively. Subsequently, nomograms were developed for (iii) utilizing the binary and ordinal logistic regression models. Nomograms have been widely used in different clinical settings [43], [44] to indicate the probability of an event, such as death or the presence of a disease, primarily by reducing statistical predictive models to a single numerical estimate tailored to the individual patient profile.
Therefore, the ordinal logistic regression models take the form

$$\operatorname{logit}\left[P(Y_k \le j)\right] = \ln\left(\frac{\pi_1 + \cdots + \pi_j}{1 - (\pi_1 + \cdots + \pi_j)}\right) = \alpha_j + \beta X_k^{\top},$$

where $\pi_j$, $j = 1, 2, 3$, are the category probabilities, the covariate $X_k = [X_{k1}, \ldots, X_{kp}]$ consists of the selected $p$ features from the speech recordings of the individual $k$, the slope $\beta = [\beta_1, \ldots, \beta_p]$ is a vector of regression coefficients and $\alpha_j$ is an intercept that depends on $j$. The equations for the different categories share the same coefficient $\beta$ for the covariate $X_k$. However, the intercept $\alpha$ varies and is denoted by different annotations ($\alpha_1$ to $\alpha_3$). The intercepts can be used to calculate the predicted probability of patients with a given set of characteristics being in a specific category.
The probability of a patient being in a category $j$, where $j = 1, 2, 3$, can be computed as

$$P(Y_k = j) = P(Y_k \le j) - P(Y_k \le j - 1),$$

where $P(Y_k \le j)$ is the cumulative probability of $Y_k$ being less than or equal to a particular category, $j = 1, \ldots, J - 1$.
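The step from cumulative to category probabilities can be sketched as below. The snippet assumes the cumulative-logit convention logit[P(Y ≤ j)] = α_j + βX; some software (for example, models parameterised in terms of P(Y ≥ j)) uses the opposite sign, so the convention should be checked against the fitting routine. The intercepts, coefficients and feature values are hypothetical numbers chosen only for illustration.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(intercepts, beta, x):
    """Return P(Y = j) for j = 1..J, given J-1 increasing intercepts.

    Assumes logit[P(Y <= j)] = alpha_j + beta . x (sign convention is an assumption).
    """
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [expit(a + eta) for a in intercepts] + [1.0]  # P(Y <= J) = 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j-1)
        previous = c
    return probs

# Hypothetical four-category model with two selected features.
alphas = [-1.0, 0.5, 2.0]
beta = [0.8, -0.3]
x_k = [0.5, 1.0]
probs = category_probabilities(alphas, beta, x_k)
```

Because the intercepts are increasing and β is shared, the cumulative probabilities are ordered and every category probability is non-negative.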

E. Data Distribution, Statistics and Model Performance Metrics
To evaluate the generalization performance of the proposed framework on previously unseen data, the study utilized a randomly selected 20% (26 participants) holdout subset of the dataset for testing the trained models, while the remaining 80% (100 participants) of the observations were used for development and validation of the models (Table I). All statistical tests were two-tailed, and a p-value < 0.01 was considered statistically significant. To tune the hyperparameters (α and λ) during the elastic net regularization (Section II-D.3.a), a nested cross-validation scheme was applied in the development-validation dataset and the number of folds (k) was varied from 2 to 100. The process was iterated 100 times for each fold, with the development-validation dataset randomly permuted in each iteration. All data management and statistical analyses were carried out using MATLAB and R version 4.1 (R Foundation for Statistical Computing, Vienna, Austria).
The prediction performance of the Ataxic Speech assessment models was evaluated using the concordance index (C-index) on a scale from 0 to 1 with a 95% confidence interval (CI), as well as the area under the curve (AUC) of the receiver operating characteristic (ROC) plot [45]. Generally, a C-index > 0.70 and an AUC > 0.80 indicate that the model discriminates well. The nomogram was validated by measuring calibration curves both internally (validation set) and externally (test set). The calibration was analysed using the Hosmer-Lemeshow goodness-of-fit test (HL test), which assesses how well the speech pattern in the data under analysis is described; non-significant p-values indicated that the fit of the model was good [46]. For the calibration curves, the results in the development dataset were validated using the bootstrap method (bootstrap = 500). Agreement between observed and predicted outcomes was also assessed using the Mean Absolute Error (MAE), Mean Squared Error (MSE), Somers' Dxy (Dxy), and Nagelkerke R2 index (R2). A good discriminative model has a high value of R2 (>0.7) and low values of MSE and MAE. Large values (tending towards −1 or 1) of Somers' Dxy indicate that the model has good predictive capacity. Goodness-of-fit tests, such as the likelihood ratio test, show model suitability, and the Wald statistics evaluate the significance of individual independent variables. The Brier score was also calculated; complete precision [47] is indicated by a Brier score of 0. Discrimination and calibration [48] do not measure clinical effectiveness, that is, the potential to make better decisions with a model than without; therefore, we used an alternative method [48] to perform a decision-curve analysis in our study.
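Several of the metrics named above can be computed directly from labels and predicted probabilities. The sketch below uses toy values (not study data) and hypothetical function names; the AUC is computed through its rank (Mann-Whitney) interpretation, which for a binary outcome coincides with the C-index.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc_rank(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

# Toy example: 1 = ataxia, 0 = control, classified at a 0.5 threshold.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]

sens, spec = sensitivity_specificity(y_true, y_pred)
balanced_accuracy = (sens + spec) / 2.0
auc = auc_rank(y_true, probs)
brier = brier_score(y_true, probs)
```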

III. EXPERIMENTAL RESULTS
This section describes the results of implementing our machine learning framework for Ataxic Speech assessment.
Recently published epidemiological data indicated that 26/100,000 children and 2.7/100,000 adults are diagnosed with a dominantly inherited cerebellar ataxia [49], [50]. These studies also report the prevalence of recessively inherited cerebellar ataxias as 3.3/100,000. Hence, for a given large effect size (AUC = 0.97 and Cohen's d of 2.7), a minimum sample size of 50 (for each group) was calculated by power analysis, with the error probability set at 0.05 and the false negative rate set at 0 (that is, a power of 1).
To retain more response information while ensuring each response category has a relatively large number of respondents, an ordinal scaled version (four groups) of the original SARA-rated version (five groups) was analysed. The four-category ordinal version is grouped as: Group 1 (SARA '0'), Group 2 (SARA '1'), Group 3 (SARA '2') and Group 4 (SARA '3' & '4'), with 17, 17, 16 and 15 (10 & 5) ataxic speakers respectively. Our results revealed that there was no significant difference in age between the control and ataxia speaker groups in either the development-validation dataset (t(98) = 1.25, p = 0.215) or the test dataset (t(24) = 0.87, p = 0.393). Therefore, we concluded that age was not a confounding variable in our study.

A. Feature Extraction and Selection
A total of 101 spectral, cepstral, time-domain, non-linear and traditional features were initially extracted from speech recordings of the two speech tasks. Correlation heatmaps for the original feature set (101 features) and the reduced feature sets are provided in Supporting Document Figure 1. Further, a box plot distribution of the 101 features is presented in Supporting Document Figure 2, where the blue shaded plots represent the features that were statistically significantly different for ataxic and control (KS test, p≤0.01) in elastic net regularisation (α = 0.2).
This initial feature set, when subjected to the two-stage feature selection approach (Section II-D.3), resulted in the following two reduced feature subsets (FS). Figures 2(A) and 2(B) show the trace plots and corresponding cross-validated deviance of the elastic net fit. Figure 2(C) indicates the performance as α (the elastic-net mixing parameter) is varied from 0 to 1. When α increases, the number of selected features reduces. By the principle of parsimony [51], eight features were selected in this step at k = 10, α = 0.9, λ (regularization penalty) = 0.0609 (Accuracy = 93%, Sensitivity = 94%, Specificity = 91.8%), as indicated by the red dotted line in Figure 2(C, D).
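For reference, the roles of α and λ can be made explicit by writing out a penalised logistic regression objective. The form below is the parameterisation commonly used by elastic net software, in which α = 1 gives the lasso and α = 0 gives ridge regression; it is stated here as an assumption, since the exact objective (including any constant scaling of the loss) used in the study is not given in the text. The intercept symbol $\beta_0$ is introduced only for this illustration.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{M}\sum_{k=1}^{M}\Big[\,Y_k\,(\beta_0 + \beta X_k^{\top})
  - \ln\!\big(1 + e^{\beta_0 + \beta X_k^{\top}}\big)\Big]
\;+\; \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
  + \alpha\,\lVert\beta\rVert_1\Big]
```

Under this form, increasing α shifts weight from the ridge term to the lasso term, which drives more coefficients exactly to zero and is consistent with the observation that fewer features are selected as α increases.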
TABLE III MODEL STATISTICS FOR ATAXIC SPEECH DIAGNOSTIC

2) Decision Curve Analysis: The standardised net benefits for Model 1 (all features from FS 1), Model 2 (RT features from FS 1), and Model 3 (BC features from FS 1) were plotted against the threshold probability for categorising a subject's speech as Ataxic Speech (Figure 3(B)). The "all" line shows the net benefit of detecting all subjects with ataxia, and the "none" line is the net benefit of detecting controls. The plot demonstrates that Model 1 is superior to Models 2 and 3 across almost all threshold probabilities (0.1-0.85), with the largest difference at a threshold probability of around 0.6; at that threshold, the net benefit (for ataxic speech) is 0.70, 0.20 and 0.80 for Models 2, 3 and 1 respectively. At that threshold, according to the net benefit concept [48], using Model 1 rather than Model 2 or Model 3 allows about 80 - 70 = 10 and 80 - 20 = 60 more beneficial treatments, respectively, in every 100 subjects with ataxia; these treatments go to subjects who would otherwise be left untreated (i.e., net of false positives). Hence, Model 1 had superior performance compared to Models 2 and 3, as its net benefit surpassed that of the other two models across the threshold probability range (0.1-0.85). This result indicates the effectiveness of using both speech tasks, rather than only one, for detecting subjects with ataxia.
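The net benefit underlying a decision curve can be sketched as follows, using the standard definition NB = TP/n − (FP/n) · p_t/(1 − p_t) and its standardised form NB divided by prevalence. The function names and the toy labels and probabilities are illustrative assumptions and do not reproduce the study's models or data.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating everyone whose predicted risk is at least the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def standardised_net_benefit(y_true, probs, threshold):
    """Net benefit divided by prevalence, so that a perfect model scores 1."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, probs, threshold) / prevalence

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every subject as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Toy example: 1 = ataxia, 0 = control.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

nb = net_benefit(y_true, probs, 0.5)
snb = standardised_net_benefit(y_true, probs, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
# A decision curve evaluates the same quantity over a grid of thresholds.
curve = [(t / 100, standardised_net_benefit(y_true, probs, t / 100))
         for t in range(10, 90, 5)]
```

Comparing two models at one threshold then amounts to subtracting their net benefits and scaling by 100, as in the 80 - 70 = 10 example in the text.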
3) Experiment 2 - Ataxic Speech Severity Estimation: The four acoustic features in FS 2, namely RT_PPa, RT_Gr, BC_SpecKurt_M, and BC_SpecSkew_M, were significantly correlated (p<0.01) with the SARA Speech scores and were included in the severity estimation model design. An increase in the time-domain feature (RT_Gr) extracted from RT was associated with an increase in the odds of a high Ataxic Speech severity, with an odds ratio of 1.46, Wald z = 2.46, p = 0.01 (Table IV).
4) Experiment 3: This experiment explored the effectiveness of a combined feature set, integrating features from our original Ataxic-Control Classification (Experiment 1) and Ataxic Speech Severity Estimation (Experiment 2). It aimed to determine whether features selected for severity estimation could also accurately classify ataxic and control speakers. Results from the validation phase showed that the combined features achieved an AUC(ROC) of 0.98 (95% CI 0.88-1), with sensitivity and specificity rates of 88% and 86%, respectively. These rates were comparable to those observed in Experiment 1, which achieved an AUC(ROC) of 0.98 (95% CI 0.89-1), alongside a sensitivity of 90% and a specificity of 84%. However, performance on the test sets revealed a slightly reduced accuracy for Experiment 3, with an AUC(ROC) of 0.95 (95% CI 0.90-1), sensitivity of 92%, and specificity of 83%, compared to an AUC(ROC) of 0.97 (95% CI 0.90-1), a sensitivity of 97.43%, and a specificity of 85.29% in Experiment 1. Notably, the sensitivity was consistently higher in Experiment 1 across both sets. Considering the critical balance between sensitivity and specificity in medical diagnostics, where high sensitivity is paramount for effective screening [52], the results suggest that the feature set from Experiment 1 is preferable, particularly given its greater parsimony and efficiency in classification. These findings corroborate the necessity for task-specific optimized feature sets, reinforcing our assertion that separate models are warranted for diagnosis versus severity prediction of ataxic speech. This result aligns with similar observations from previous studies in other domains of Cerebellar ataxia [53], [54], and Friedreich ataxia [55], where diagnostic and severity prediction tasks have yielded different sets of optimal features.

TABLE IV MODEL STATISTICS FOR ATAXIC SPEECH SEVERITY PREDICTION

Fig. 4. Nomogram scoring charts for predicting the probability of A. Ataxic Speech risk and B. Ataxic Speech Severity. To use the nomogram scoring chart, an individual subject's value from the test dataset is located on each variable axis, and a vertical line (in red) is drawn upward to read the number of "Points" corresponding to each variable value. The sum of these scores is located on the Total Points axis, and a line is drawn vertically down (in blue) to find the probability of Ataxic Speech risk or of Ataxic Speech severity greater than or equal to a specific category. We demonstrate the use of the chart with an empirical example. A. A subject from the validation cohort with speech features located on the variable axes of the chart corresponds to the following values on the Points axis: RT_PPa = 15, RT_JitterSD = 20, BC_MGDCC1 = 30, BC_JitterRAP_SD = 50, BC_HR_M = 60, BC_SpecRoP_SD = 10, BC_SpecFlux_IQR = 5, BC_SpecKurt_IQR = 20, which add up to Total Points (250), and their Ataxic Speech risk probability is more than 0.63; indeed, the subject has ataxia, as confirmed by medical diagnosis. B. A subject from the validation cohort with speech features located on the variable axes RT_PPa, RT_Gr, BC_SpecKurt_M and BC_SpecSkew_M will sum up to Points (5+10+15+20+35) and Total Points (80), and their probabilities of being in CA Severity ≥1, ≥2 and ≥3 are 0.78, 0.50 and 0.23 respectively; their SARA score is 2, as confirmed by the clinician's assessment. The higher colour intensity bar in the CA severity score plot indicates a higher severity of Ataxic Speech.

C. Nomogram Scoring Charts
The nomogram scoring charts for predicting the probability of Ataxic Speech risk and Ataxic Speech Severity for an individual subject from the test dataset are presented in Figure 4.
1) To Stratify Ataxic Speech Risk: In the test dataset, the nomogram was used to evaluate the risk of Ataxic Speech for all subjects; the probability indicated by the nomogram was then compared with the probability of the regression model. For the test set, the AUC for the nomogram was 0.95 (95% CI, 0.88-1, P<0.0001) and the AUC for the model was 0.97 (95% CI, 0.90-1, P<0.0001; Figure 3(A)), indicating that the logistic regression model and nomogram performed very well in predicting the risk of Ataxic Speech.
a) Calibration curves: Evaluation of the nomogram model on the validation set yielded a C-index value of 0.98 (95% CI, 0.89-1, P<0.0001), which was identical to the AUC of the regression model (Figure 3(A)). The calibration curve for the validation dataset showed strong agreement between nomogram predictions and actual observations for predicted probabilities ranging from 0 to 0.7 (Figure 3(C)). Data for the test group yielded a C-index value of 0.96 (95% CI, 0.89-1, P<0.0001), approximately similar to its AUC (Figure 3(A)). The MAE and MSE values for the validation dataset and test dataset were (0.092, 0.03) and (0.12, 0.01), respectively. Further, the HL test demonstrated no statistically significant difference between the calibration curves and the ideal curves in both the validation and testing cohorts (Figure 3(C, D)).
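For a binary outcome, the C-index is the proportion of (ataxic, control) pairs in which the ataxic subject receives the higher predicted probability, which is why it coincides with the ROC AUC. The following is a minimal sketch of that pairwise definition; the function name `c_index_binary` and the example values are hypothetical and are not data or code from this study.

```python
def c_index_binary(scores, labels):
    """Concordance index for a binary outcome (equals the ROC AUC).

    Over all (positive, negative) pairs, counts how often the positive
    case receives the higher score; tied scores count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Illustrative values only (assumed), not data from the study.
example_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
example_labels = [1, 1, 0, 1, 0]
example_c = c_index_binary(example_scores, example_labels)
```

In this toy example five of the six (positive, negative) pairs are ordered correctly, so the C-index is 5/6.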
2) To Predict the Probability of Ataxic Speech Severity: Following the regression analysis in Section III-B.3, the same four acoustic features were incorporated into the nomogram scoring chart to predict the ordinal probabilities. Only Ataxic Speech data were used to construct and calibrate the nomogram. On the basis of the individual scores of these four variables, a user may calculate the total score and obtain a particular probability of Ataxic Speech severity. We illustrate the use of the ordinal nomogram scoring chart, detailed scores of all variables and their interpretation using a participant from the test dataset in Figure 4. According to the nomogram, BC_SpecSkew_M had the greatest influence on estimating the probability of the severity of Ataxic Speech, followed by the features BC_SpecKurt_M, RT_Gr and RT_PPa. The mean areas under the multiclass ROC curves of the regression models in the validation and testing groups were 0.84 and 0.74, respectively (Figure 5(B)).
a) Calibration curves: The C-index of the nomogram model in the validation group was 0.81 (Table IV). In the validation dataset, calibration plots showed better consistency between the nomogram projections and the actual observations for Ataxic Speech Severity ≥ 2 and = 3 than for Ataxic Speech Severity ≥ 1 (Figure 5(A)).
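The ordinal nomogram reports cumulative probabilities of the form P(severity ≥ j). Per-category probabilities follow by differencing adjacent cumulative values. The sketch below applies this to the three probabilities quoted in the Fig. 4(B) example (0.78, 0.50, 0.23); the helper name `category_probabilities` is hypothetical, and the last entry equals P(severity = 3) only under the assumption that 3 is the highest category in the chart.

```python
def category_probabilities(cumulative):
    """Convert [P(Y>=1), ..., P(Y>=J)] into [P(Y=0), ..., P(Y=J-1), P(Y>=J)]."""
    if any(not 0.0 <= p <= 1.0 for p in cumulative):
        raise ValueError("probabilities must lie in [0, 1]")
    if any(a < b for a, b in zip(cumulative, cumulative[1:])):
        raise ValueError("cumulative probabilities must be non-increasing")
    # P(Y>=0) is 1 by definition; the trailing 0 closes the last interval.
    bounds = [1.0] + list(cumulative) + [0.0]
    return [upper - lower for upper, lower in zip(bounds, bounds[1:])]


# Cumulative probabilities for severity >=1, >=2, >=3 from the Fig. 4(B) example.
fig4b_cumulative = [0.78, 0.50, 0.23]
fig4b_categories = category_probabilities(fig4b_cumulative)
```

Up to floating-point rounding, this yields 0.22, 0.28, 0.27 and 0.23 for the four categories, which sum to 1.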

IV. DISCUSSION
Ataxic Speech is a crucial aspect of the clinical manifestation of CA, yet automated methods for Ataxic Speech evaluation are currently lacking. Thus, this study aimed to establish a comprehensive and objective evaluation of Ataxic Speech, providing a probability of its presence and severity using a nomogram connected to the SARA. A total of 126 participants, including 65 with CA and 61 controls, were recruited to obtain speech samples. The algorithm utilized feature selection and elastic-net regularization to develop regression models that distinguish Ataxic Speech from controls and detect severity, followed by nomogram-based reports to predict Ataxic Speech probability and severity. It is important to note that identifying the presence of speech ataxia in individuals with CA and distinguishing it from control speech does not constitute a clinical diagnosis. Instead, it enables discrimination between controls and those with Ataxic Speech across a wide range of severity levels. Diagnosis of ataxia usually occurs early in the disease when symptoms are mild, and the current tool is not intended for clinical diagnosis. Instead, it could provide insights into developing tools that can differentiate early or emerging Ataxic Speech from unaffected individuals' speech.
In this implementation, we extensively investigated a variety of speech features that have been previously reported as relevant to Ataxic Speech, including time, spectral and cepstral measurements, to extract 101 acoustic features. This allowed us to explore a wide range of acoustic and phonetic characteristics that may be related to Ataxic Speech and select the most informative features. This study also investigated non-linear speech features to determine their relevance to Ataxic Speech specifically. While non-linear acoustic features have been used previously in the analysis of pathological speech [26], [27], their capability to characterise Ataxic Speech has not been explored. Their effectiveness and complementary behaviour in differentiating individuals with ataxia from controls and objectively estimating Ataxic Speech severity were studied here. While the feature sets FS1 (RT_PPa, RT_JitterSD, BC_MGDCC1, BC_JitterRAP_SD, BC_HR_M, BC_SpecRoP_SD, BC_SpecFlux_IQR, BC_SpecKurt_IQR) and FS2 (RT_PPa, RT_Gr, BC_SpecKurt_M, BC_SpecSkew_M), used for diagnosing speech ataxia and predicting its severity, respectively, are indeed distinctive for Ataxic Speech, some of them, such as the spectral and classical features, have also been found to be altered in other types of dysarthria, such as hypokinetic dysarthria in Parkinson's disease [56]. However, it is important to note that the combination of these features, rather than any individual feature, provides the best separation accuracy between ataxia and controls, suggesting that these features are uniquely informative for Ataxic Speech and capture specific acoustic and phonetic properties that are characteristic of Ataxic Speech, such as changes in pitch, timing, and spectral properties.
Also, the fact that the features leading to the best separation between ataxia and controls are largely different from those selected for the prediction of Ataxic Speech severity (only RT_PPa is shared between FS1 and FS2) may seem counterintuitive. However, this can be explained by the fact that the two tasks have different goals. The goal of the first task (separation between ataxia and controls) was to identify the most distinctive features that can differentiate Ataxic Speech from normal speech. The goal of the second task (prediction of Ataxic Speech severity) was to identify the features that are most strongly associated with the severity of Ataxic Speech, regardless of whether they are also altered in other types of dysarthria. Therefore, it is possible that some of the features that are distinctive for Ataxic Speech do not necessarily correlate with the severity of the disorder, and vice versa.
Two separate repeated C-V syllable paradigms were assessed with the expectation that variations in Ataxic Speech features might be found in the two tasks. The rationale was that in each speech task, different features of Ataxic Speech may be more pronounced, and a cross-task comparison may be helpful in isolating and documenting aspects of abnormalities. The comparative plots for Model 1 (all features from FS 1), Model 2 (RT features from FS 1) and Model 3 (BC features from FS 1) in the decision curve analysis confirmed this rationale. Model 1 outperformed Models 2 and 3, as its net benefit surpassed that of the other two models across the threshold probability range (0.1-0.85). This decision curve analysis affirmed the clinical usefulness of our selected model.
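The net benefit compared in a decision curve analysis is conventionally defined, at a threshold probability p_t, as TP/n - (FP/n) * p_t/(1 - p_t), and the "detect all" reference curve is the same quantity when every subject is classed as positive. The sketch below implements that standard definition; the function names and the toy predictions are hypothetical and do not reproduce the study's models or data.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of classifying as positive when probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_detect_all(labels, threshold):
    """Reference strategy: every subject is classed as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predictions (assumed values), not data from the study.
toy_probs = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_nb = net_benefit(toy_probs, toy_labels, 0.5)
toy_all = net_benefit_detect_all(toy_labels, 0.5)
```

The "detect none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its curve lies above both zero and the "detect all" curve.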
In regression models, the actual 'effect' determines the value of the outcome; more specifically, in logistic regression, it is the 'log-odds ratio'. We introduced nomogram-based scoring models and charts in our study to accentuate this idea and translate it into an easily interpretable visual score. Two predictive nomograms were developed using the independent factors from feature sets FS 1 and FS 2 to generate indicators for estimating the risk of Ataxic Speech and its severity, respectively. To predict disease severity with repeatability and reliability in different populations, severity scoring systems should be valid, well calibrated and discriminative. The supporting predictive scoring chart in our study will provide a combined quantitative tool for clinicians to assess the risk of Ataxic Speech and the individual probability of its severity. Further, to test and validate the prognostic accuracy of the nomogram model, adequate discrimination and calibration were performed. Our assessment models performed better than the objective assessments of Ataxic Speech from previous literature, as shown in Table II.
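One common way to turn logistic-regression log-odds into nomogram points is to rescale each predictor's contribution so that the predictor with the largest effect range spans 0 to 100 points, and then to map the total points back to a probability through the logistic function. The sketch below illustrates this under stated assumptions: the intercept, coefficients, value ranges and the names `x1` and `x2` are hypothetical, and this is not necessarily the exact construction used for the charts in Fig. 4.

```python
import math


def build_nomogram(intercept, coefs, ranges):
    """coefs: {name: beta}; ranges: {name: (min_value, max_value)}."""
    spans = {k: abs(b) * (ranges[k][1] - ranges[k][0]) for k, b in coefs.items()}
    max_span = max(spans.values())
    if max_span == 0:
        raise ValueError("at least one predictor must have a non-zero effect range")
    # Reference value: the end of the range where beta * x is lowest (0 points).
    refs = {k: (ranges[k][0] if b >= 0 else ranges[k][1]) for k, b in coefs.items()}
    base_lp = intercept + sum(coefs[k] * refs[k] for k in coefs)
    return {"coefs": coefs, "refs": refs, "base_lp": base_lp,
            "points_per_logit": 100.0 / max_span}


def points(nomogram, name, value):
    """Points for one predictor value (non-negative within the stated range)."""
    beta = nomogram["coefs"][name]
    return beta * (value - nomogram["refs"][name]) * nomogram["points_per_logit"]


def probability_from_total(nomogram, total_points):
    """Map total points back to the log-odds, then to a probability."""
    log_odds = nomogram["base_lp"] + total_points / nomogram["points_per_logit"]
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical model: log-odds = -2.0 + 0.5 * x1 - 1.0 * x2 (assumed values).
demo = build_nomogram(-2.0, {"x1": 0.5, "x2": -1.0}, {"x1": (0.0, 10.0), "x2": (0.0, 2.0)})
demo_points = {"x1": points(demo, "x1", 4.0), "x2": points(demo, "x2", 0.5)}
demo_prob = probability_from_total(demo, sum(demo_points.values()))
```

Because the points are a linear rescaling of the log-odds, reading the probability from the total-points axis gives the same value as evaluating the regression model directly.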
In conclusion, this study demonstrates the effectiveness of incorporating the complementary behaviour of objective speech measures extracted from the four domains (time, spectral, cepstral and non-linear dynamics) in differentiating individuals with ataxia from controls and objectively estimating Ataxic Speech severity. Additionally, the use of a combination of speech features from different speech tasks highlighted specific aspects of Ataxic Speech that are less easily identified by a single task, and improved the identification of Ataxic Speech and the estimation of its severity. The findings show that the automated analysis of meaningful acoustic features from recordings of the two repeated C-V syllable paradigms selected (RT and BC) can be a reliable tool for monitoring CA-associated vocalisation deficits. Furthermore, nomogram-based scoring charts assist in offering an accurate individualized prediction of the presence of Ataxic Speech and its severity while highlighting the important prognostic information that can be gleaned from simple speech tests. We believe that this technique can be further developed and translated to other motor evaluations, as well as to the entire spectrum of motor speech disorders manifested by other neurodegenerative disorders. The presented pilot study results offer new possibilities for future research on motor speech disorders, ranging from conventional laboratory-based analyses and monitoring of the impact of therapy and of longitudinal disease progression to high-throughput screening.

Fig. 1. A. Comprehensive Objective Speech Assessment System, B. Workflow of the machine learning scoring algorithm.
a) Time-based features: Let $S_k$ be the set of (RT) syllables for an individual $k$, where $S_k = [S_{k1}, S_{k2}, \ldots, S_{kn}]$. Similarly, their corresponding measures of full prominence, widths/time-durations at half prominence, and positions of peaks are denoted by $P_k$, $Wh_k$ and $Pk_k$, respectively, where $P_k = [P_{k1}, P_{k2}, \ldots, P_{kn}]$, $Wh_k = [Wh_{k1}, Wh_{k2}, \ldots, Wh_{kn}]$ and $Pk_k = [(Pk_{k1}, Tp_{k1}), (Pk_{k2}, Tp_{k2}), \ldots, (Pk_{kn}, Tp_{kn})]$; $Pk_k$ represents the elevation and $Tp_k$ represents the time-point. The variable $n$ varies from one RT recording of an individual to another.
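To make the notion of peak prominence concrete, the sketch below locates local maxima in a short amplitude envelope and computes each peak's prominence as its height above the higher of the two lowest points reached before the signal rises above the peak on either side. This is a pure-Python illustration of the usual definition, which is also the one used by SciPy's `scipy.signal.peak_prominences`; the function names and the toy envelope are assumptions, and the paper does not state which routine was used.

```python
def local_peaks(signal):
    """Indices of strict local maxima among interior samples."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1]]


def prominence(signal, peak):
    """Height of a peak above the higher of its left and right base levels."""
    height = signal[peak]
    # Walk left until a strictly higher sample or the signal edge is reached.
    left_min = height
    i = peak - 1
    while i >= 0 and signal[i] <= height:
        left_min = min(left_min, signal[i])
        i -= 1
    # Walk right in the same way.
    right_min = height
    j = peak + 1
    while j < len(signal) and signal[j] <= height:
        right_min = min(right_min, signal[j])
        j += 1
    return height - max(left_min, right_min)


# Toy envelope with three syllable-like peaks (assumed values).
envelope = [0, 2, 1, 5, 2, 3, 0]
peak_indices = local_peaks(envelope)
peak_prominences = [prominence(envelope, i) for i in peak_indices]
# Pairs of (elevation, sample index), analogous to (Pk, Tp) above.
peak_positions = [(envelope[i], i) for i in peak_indices]
```

Dividing the sample indices by the sampling rate would give the time-points of the peaks; widths at half prominence require an additional interpolation step that is omitted here.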

6) Regression Models and Nomogram Construction (Scoring Chart): Let $Y_k$ denote the response of an individual $k$ belonging to an outcome category $J$.
a) Construction of nomogram (binary responses): When estimating the probability of Ataxic Speech over control speech, the outcome categories, $J$ ($J = 2$), have the responses 1 (ataxic) or 0 (control). The odds of $Y_k$ being equal to either 1 or 0 are denoted, respectively, by,

Fig. 2. Elastic net fit plots. A. Trace plots of coefficients fit by elastic net. B. Cross-validated deviance of the elastic net fit. C. Accuracy estimation and the number of features selected when the elastic-net mixing parameter alpha takes values from 0 to 1. D. Eight features were selected in this step with α = 0.9 (Accuracy = 93%, Sensitivity = 94%, Specificity = 91.8%), as indicated by the red dotted line.
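The mixing parameter α controls how the elastic net penalty is split between the lasso (L1) and ridge (L2) terms. A minimal sketch of the penalty follows, assuming the widely used parameterisation λ[α‖β‖₁ + ((1 − α)/2)‖β‖₂²]; the source does not state which software convention was used, and the coefficient values and λ below are hypothetical.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + 0.5 * (1 - alpha) * L2^2)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


# Hypothetical coefficients and regularisation strength (assumed values).
demo_coefs = [1.0, -2.0, 0.0]
penalty_at_09 = elastic_net_penalty(demo_coefs, lam=0.1, alpha=0.9)
penalty_lasso = elastic_net_penalty(demo_coefs, lam=0.1, alpha=1.0)
penalty_ridge = elastic_net_penalty(demo_coefs, lam=0.1, alpha=0.0)
```

With α = 0.9, as in panel D, the penalty is dominated by the L1 term, which drives many coefficients exactly to zero and therefore yields a sparse feature set.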

Fig. 3. A. The areas under the curve (AUC) of the receiver operating characteristic (ROC) plots of Ataxic Speech diagnosis for the validation and test datasets are 0.98 and 0.97, respectively. B. Decision curve analysis for Model 1 (all features from FS 1), Model 2 (RT features from FS 1), and Model 3 (BC features from FS 1) for diagnosing CA. The three curves were compared to the curves of detecting none (black) and all (grey) individuals with ataxia. The calibration curves for the Ataxic Speech risk prediction nomogram for C. the validation dataset and D. the test dataset. Performance statistics, namely Mean Absolute Error (MAE) and Mean Squared Error (MSE), are indicated for both plots (C and D). The x-axis shows the predicted probability of Ataxic Speech risk. The y-axis shows the actual probability of Ataxic Speech risk. The solid line shows the performance of the nomogram. A line close to the diagonal dotted line indicates good predictive capability.

Fig. 5. A. The calibration curves for the Ataxic Speech Severity prediction nomogram for the validation dataset. The x-axis shows the predicted probability of Ataxic Speech severity. The y-axis shows the actual probability of Ataxic Speech severity. The solid line shows the performance of the nomogram. A line close to the diagonal dotted line indicates good predictive capability. B. The areas under the curve (AUC) of the receiver operating characteristic (ROC) plots of Ataxic Speech Severity prediction for the validation and test datasets are 0.84 and 0.74, respectively.