Novel Multi Center and Threshold Ternary Pattern Based Method for Disease Detection Using Voice

Smart health is one of the most popular and important components of smart cities. It is a relatively new context-aware healthcare paradigm influenced by several fields of expertise, such as medical informatics, communications and electronics, bioengineering, and ethics, to name a few. Smart health is used to improve healthcare by providing services such as patient monitoring and early diagnosis of disease. The artificial neural network (ANN), support vector machine (SVM) and deep learning models, especially the convolutional neural network (CNN), are the most commonly used machine learning approaches, and they have proved to perform well in most cases. Voice disorders are spreading rapidly, especially with the development of medical diagnostic systems, although they are often underestimated. Smart health systems can offer easy and fast support for voice pathology detection. An algorithm that discriminates between pathological and healthy voices with higher accuracy is needed to obtain a smart and precise mobile health system. The main contribution of this paper is a multiclass pathologic voice classification method that uses a novel multileveled textural feature extraction with an iterative feature selector. Our approach is a simple and efficient voice-based algorithm in which a multi-center and multi-threshold based ternary pattern (MCMTTP) is used. More compact multileveled features are then obtained by sample-based discretization techniques, and Neighborhood Component Analysis (NCA) is applied to select features iteratively. These features are finally integrated with MCMTTP to achieve accurate voice-based feature detection. Experimental results of six classifiers on three diseases (frontal resection, cordectomy and spastic dysphonia) show that the fused features are more suitable for voice-based disease detection.


I. INTRODUCTION
The voice is one of the most important means of communication between people. Voice and speaking skills are the easiest way to express thought, and they are the most important feature that distinguishes humans from other living things. The voice is part of the personality and character of almost every person. We can also detect diseases using voices because some diseases directly affect the human voice [1]-[3].
The associate editor coordinating the review of this manuscript and approving it for publication was Kin Fong Lei.
Diagnosing conditions such as frontal lobe resection, spasmodic dysphonia and cordectomy from patient voice data has become possible with today's technologies. Little is known about the neuropsychological outcome after frontal resection [2]. However, detection of disease from voice data is still a very new field of study. Spasmodic dysphonia (SD) is a neurogenic, centrally originated focal laryngeal dystonia that occurs during speech, with intermittent or continuous spasm of the vocal cords, and a muffled sound quality is observed.
Cordectomy is the excision of the vocal cord by performing a thyrotomy [3].
Disease detection using voice has become increasingly popular in the literature over the past few years with the development of medical diagnostic systems [4]. Computer-processed voices are used everywhere these days, for example in Alexa, Siri, Google Assistant and other services [5], [6]. In traditional medical diagnosis, a medical doctor's report is required, which takes the patient's neurological history into account and examines various motor skills [7], [8]. It may be necessary to obtain information from many laboratory results for an accurate diagnosis. Therefore, avoiding misdiagnosis in clinical situations is one of the biggest areas of interest for medical expert systems. In addition, smart medical diagnostic systems have the potential to optimize medical decisions, improve medical treatments, and reduce waiting times and financial costs. Voice recordings have been considered as a potential (noninvasive and low cost) biomarker to diagnose some voice related diseases [9]-[11].

A. MOTIVATION
Voice and sound classification are hot research topics in machine learning. In biomedical engineering applications, automated voice-based disease detection methods have been widely used. To develop intelligent health assistant systems, automated voice-based disease detection methods should be developed and enhanced. In the literature, binary (two-class) classification methods have generally been proposed. Our main motivation is to perform multiclass pathologic voice classification by using a novel multileveled textural feature extraction and an iterative feature selector.

B. RELATED WORKS
This work is about voice signal recognition in biomedical engineering. The main aim of this work is to reach a high accuracy rate by using a naïve method. In this section, some works on pathological voice classification and detection are listed in Table 1.

C. OUR METHOD
In this study, we present a novel one-dimensional TP for feature extraction, which we call MCMTTP because it uses a multi-center and multi-threshold based ternary pattern. The main objective of the proposed MCMTTP is to extract features comprehensively. To generate multileveled features, we use maximum pooling, which is widely used in deep learning networks. The extracted features are selected by using an iterative neighborhood component analysis, which we call INCA. In the classification phase, six classifiers were selected to obtain a comprehensive benchmark for evaluating eight cases.

D. CONTRIBUTIONS
The contributions of the proposed multileveled MCMTTP based disease classification method are as follows.
1) TP has been widely used in the literature, but it has several drawbacks that need to be tackled. TP is a parametric feature extractor and it uses the 5th value as the center pixel. However, more diverse and valuable features can be extracted by using a variable center and a variable threshold. A novel TP-like microstructure (MCMTTP) is proposed to tackle the problems of determining the optimal threshold value, center value selection, and medium-high level feature extraction. MCMTTP has a better feature generating capability than classical TP because different choices of center values and thresholds generate features comprehensively, as in an inception network.
2) Feature selection is one of the most important steps in machine learning, especially when the number of features is huge. To optimize this phase, an iterative neighborhood component analysis (INCA) feature selector is used. The main aim of the proposed INCA is to automatically select the most discriminating features.
3) Voice based disease detection is a hard problem for machine learning based methods. Binary classification has generally been used in voice disease detection methods in the literature because multiclass classification is very hard for voice-based disease detection [16], [17], [51]. In this work, frontal resection, cordectomy and spastic dysphonia are used for performance evaluation, and eight cases are created from these diseases. High success rates are reached by using the proposed MCMTTP based multileveled method.
VOLUME 8, 2020

II. BACKGROUND
In this article, a local ternary pattern (LTP) like feature extraction method is used to extract features. Therefore, the Local Binary Pattern (LBP), which is a basic textural feature extractor, and the one-dimensional LTP are explained in this section.

A. LOCAL BINARY PATTERN
Local Binary Pattern was proposed in 1994 and is a commonly used texture identifier [28]. It is widely used because it is an effective feature extractor [29]. As shown in Figure 1, LBP divides the image into 3 × 3 sized overlapping blocks. Then the center pixel and neighborhood pixels are compared using the signum function. Mathematical notation of the binary feature extraction is given in Eq. 1 [30], [31].
The extracted binary features are converted to a decimal value, and the histogram of the newly constructed image is used as the feature vector. LBP also has a one-dimensional version for signal and voice processing [32].
As seen from Fig. 2, overlapping blocks of size 9 are used in 1D-LBP to extract features. In the binary feature extraction phase, 8 bits are extracted. Therefore, LBP extracts 256 features [16], [33].
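As a minimal sketch of the 1D-LBP encoding described above, the following Python function turns one 9-sample block into an 8-bit code. The function name `lbp_code`, the use of the 5th sample as the center, and the least-significant-bit-first ordering are assumptions made for illustration; the paper's Eq. 1 and Fig. 2 may fix a different bit order.

```python
def lbp_code(block):
    """Encode one 9-sample block as an 8-bit LBP code in the range 0..255.

    Assumptions (not taken from the paper): the 5th sample is the center
    and the first neighbor maps to the least significant bit.
    """
    block = list(block)
    if len(block) != 9:
        raise ValueError("1D-LBP expects blocks of 9 samples")
    center = block[4]
    neighbors = block[:4] + block[5:]
    # Signum-style comparison: 1 when the neighbor is not below the center.
    bits = [1 if value - center >= 0 else 0 for value in neighbors]
    return sum(bit << k for k, bit in enumerate(bits))


print(lbp_code([5, 1, 7, 3, 4, 4, 9, 0, 6]))  # 181
```

Sliding this block over the signal one sample at a time and counting how often each code occurs would give the 256-bin histogram mentioned in the text.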

B. TERNARY PATTERN
Local Ternary Pattern (LTP) is one of the LBP-like feature extractors [34]. The main difference between LBP and LTP is that LTP uses a ternary function instead of the signum function for feature extraction [35]. The mathematical definition of the ternary pattern is given in Eq. 2 [18].
\[ tr = ter(IM_{t,k}, center, thr) = \begin{cases} -1, & IM_{t,k} < center - thr \\ 0, & center - thr \le IM_{t,k} \le center + thr \\ 1, & IM_{t,k} > center + thr \end{cases} \qquad (2) \]
where tr represents the ternary value and thr represents the threshold value.
Eq. 2 shows that the ternary function generates the values -1, 0 and 1. To convert these ternary values to bits, Eqs. 3 and 4 are used.
As seen from Eqs. 3-4, the 8 lower and 8 upper bits are generated from the ternary values of Eq. 2. By converting these bits to decimal values, upper and lower signals are generated. The histograms of these signals are concatenated, and 512 features are obtained by using TP. A graphical explanation of the TP is shown in Fig. 3 [36], [37].
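Eqs. 3 and 4 are not reproduced in this text. A common formulation of the upper and lower bit extraction is sketched below as an assumption, not as the authors' exact notation; the symbols $bit^{upper}$, $bit^{lower}$, $tr_k$ and the index range $k = 1, \dots, 8$ are introduced here for illustration only.

```latex
\[
bit^{upper}(k) =
\begin{cases} 1, & tr_k = 1 \\ 0, & \text{otherwise} \end{cases}
\qquad
bit^{lower}(k) =
\begin{cases} 1, & tr_k = -1 \\ 0, & \text{otherwise} \end{cases}
\]
\[
upper = \sum_{k=1}^{8} bit^{upper}(k)\, 2^{k-1},
\qquad
lower = \sum_{k=1}^{8} bit^{lower}(k)\, 2^{k-1}
\]
```

Under this formulation each of the two decimal values lies in 0..255, so the two histograms have 256 bins each, which is consistent with the 512 features stated above.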

III. THE PROPOSED MULTI CENTER AND MULTI THRESHOLD BASED TERNARY PATTERN: MCMTTP
As we have seen in Section II-B, LTP is efficient and easy to use for feature extraction in binary classification. Here we propose a novel feature extraction method called MCMTTP, which simultaneously uses a variable center value and a variable threshold value to extract the final features from the image and thereby improve the performance of LTP. The pseudo code of TP is described in Algorithm 1.
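Algorithm 1 is not reproduced in this text. The following Python sketch shows one way a parametric one-dimensional TP with a selectable center position and threshold could look. The function name `ternary_pattern`, its parameters, the default threshold and the bit ordering are assumptions for illustration and are not taken from the authors' pseudo code.

```python
def ternary_pattern(signal, center=4, thr=0.1, block=9):
    """Return 512 features: a 256-bin upper and a 256-bin lower histogram.

    center is the 0-based position of the reference sample inside each
    overlapping block; thr is the ternary threshold. Both are free
    parameters here (assumed defaults, not values from the paper).
    """
    signal = list(signal)
    upper_hist = [0] * 256
    lower_hist = [0] * 256
    for start in range(len(signal) - block + 1):
        window = signal[start:start + block]
        reference = window[center]
        neighbors = window[:center] + window[center + 1:]
        upper = lower = 0
        for k, value in enumerate(neighbors):
            diff = value - reference
            if diff > thr:        # ternary value +1 sets an upper bit
                upper |= 1 << k
            elif diff < -thr:     # ternary value -1 sets a lower bit
                lower |= 1 << k
        upper_hist[upper] += 1
        lower_hist[lower] += 1
    return upper_hist + lower_hist


features = ternary_pattern([0, 0, 0, 0, 0, 1, 1, 1, 1], center=4, thr=0.1)
print(len(features))  # 512
```

Calling the function with different `center` and `thr` values on the same signal yields different 512-element vectors, which is the property the multi-center, multi-threshold idea relies on.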

IV. THE PROPOSED VOICE-BASED DISEASE CLASSIFICATION METHOD
In this work, we propose an algorithm (MCMTTP) whose goal is to discriminate between pathological and healthy voices with a high accuracy rate by using multileveled feature extraction, hybrid feature selection and classification. Our approach comprises three components: (1) multileveled feature extraction, (2) iterative NCA based feature selection and (3) a panoply of classifiers for prediction. The flowchart of the algorithm is shown in Fig. 4 and detailed in the following Sections A, B and C.

A. MULTILEVELED FEATURE EXTRACTION USING MCMTTP
In this phase, a novel multileveled feature extraction network is proposed. To create features at different levels, the maximum pooling method, a down-sampling technique used to reduce features, is applied [38], [39]. This method has nine levels.
In each level, the center and threshold values are iteratively changed. Low, medium and high-level features are extracted by using the proposed multileveled MCMTTP feature extraction method. The mathematical explanations and steps of the proposed multileveled method are given below. A schematic explanation of the proposed MCMTTP method is also shown in Fig. 4.
Step 1: Create a loop from 1 to 9.
Step 2: Extract features by using the parametric TP function (see Algorithm 1). TP is a textural feature extractor that is used for both images and signals. For images, texture classification and analysis is one of the most studied areas of research. By using TP, 512 textural features are generated from a signal as follows.
Step 6: Concatenate the extracted features and obtain a feature vector of size 4608.
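The level loop can be sketched as follows. This is a hedged illustration, not the authors' implementation: the intermediate steps are not available in this text, so the pooling window of 2, pooling between consecutive levels, and the `extractor(signal, level)` callback interface are all assumptions. The names `max_pool` and `multilevel_features` are hypothetical.

```python
def max_pool(signal, size=2):
    """Non-overlapping maximum pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


def multilevel_features(signal, extractor, levels=9):
    """Concatenate the features produced by extractor at each pooling level.

    extractor(signal, level) is any function returning a list of features;
    receiving the level index lets it vary its center and threshold.
    """
    features = []
    current = list(signal)
    for level in range(levels):
        features.extend(extractor(current, level))
        current = max_pool(current)  # assumed: halve the signal per level
    return features


# Stand-in extractor that returns 512 placeholder values per level.
placeholder = lambda sig, level: [0] * 512
vector = multilevel_features([0.0] * 4096, placeholder, levels=9)
print(len(vector))  # 4608
```

With an extractor that returns 512 features per level, nine levels give 9 x 512 = 4608 features, which matches the vector size stated in Step 6.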

B. ITERATIVE NCA BASED FEATURE SELECTION
NCA is a distance based feature selection method which generates a positive weight for each feature [41]. One of the [...] [40]. To overcome this limitation, we propose an iterative NCA that is able to select optimal and non-redundant features. To this end, a feature range is defined (in this paper, we choose a range of 40 to 1000 features). Similar to ReliefF [42], NCA weights are calculated and sorted in descending order. At each iteration, a candidate feature subset is taken according to the range value and a loss value is calculated (here we use kNN as the loss function). The features with the smallest error value are selected. This process defines our iterative NCA. The steps of our automated feature selection process, INCA, are presented as follows:
Step 1: Normalize the data. We normalize the 4608-sized feature vector using min-max normalization, where X is the final feature vector of size 4608. NCA is a distance-based method and it uses stochastic gradient descent to update weights. Therefore, normalization should be applied to obtain optimal results.
Step 3: Use a nearest neighbor (NN) classifier to calculate the error value. In this work, NN with Manhattan distance is used.
Step 4: Sort the weights in descending order.
[minimum, index] = min(error)
Step 8: Select the optimal features (feat_final) by using the calculated index value.
The generated feat_final is forwarded to the classifiers.
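The selection loop can be sketched in plain Python as below. Several steps are missing from this text, so this is an illustration under stated assumptions: the NCA weights are taken as an input computed elsewhere by any NCA feature-weighting routine, the error is a leave-one-out 1-NN error with Manhattan distance (the paper's exact validation scheme is not given here), and the names `min_max_normalize`, `loo_1nn_error` and `inca_select` are hypothetical.

```python
def min_max_normalize(X):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in X]


def loo_1nn_error(X, y, feats):
    """Leave-one-out 1-NN error using Manhattan distance on selected features."""
    errors = 0
    for i, xi in enumerate(X):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum(abs(xi[f] - X[j][f]) for f in feats))
        errors += y[nearest] != y[i]
    return errors / len(X)


def inca_select(X, y, weights, lo=40, hi=1000):
    """Try the top-n weighted features for n in [lo, hi]; keep the best n."""
    X = min_max_normalize(X)
    # Feature indices sorted by descending weight (weights assumed given).
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    hi = min(hi, len(order))
    lo = min(lo, hi)
    best_feats, best_error = None, float("inf")
    for n in range(lo, hi + 1):
        feats = order[:n]
        error = loo_1nn_error(X, y, feats)
        if error < best_error:  # strict comparison keeps the smallest subset
            best_feats, best_error = feats, error
    return best_feats, best_error


X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.9], [0.9, 1.1]]
y = [0, 0, 1, 1]
print(inca_select(X, y, weights=[0.9, 0.1], lo=1, hi=2))  # ([0], 0.0)
```

In the toy example the second feature is noise, so the subset containing only the highest-weighted feature gives the lowest error and is selected.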

C. CLASSIFICATION
In the classification phase, six widely used conventional classifiers were employed. These classifiers are naïve Bayes (NB), k-nearest neighbors (kNN), linear discriminant (LD), decision tree (DT), support vector machine (SVM) and bagged tree (BT). 10-fold cross validation was selected as the training and testing strategy. We used conventional classifiers to obtain a cognitive method and to show the success of the proposed MCMTTP and INCA based feature selector.
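As a brief illustration of the 10-fold strategy, the sketch below generates train and test index lists for each fold. Whether the authors shuffled or stratified the folds is not stated, so the shuffling, the `seed` parameter and the function name `k_fold_indices` are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Samples are shuffled once (an assumption) and dealt into k folds;
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


for train, test in k_fold_indices(25, k=10):
    pass  # fit a classifier on train, evaluate it on test
```

Each sample appears in exactly one test fold, so the per-fold results can be pooled into a single confusion matrix.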

1) NB
NB is a widely used, simple, probability-based conventional classifier [43]. In this paper, a Gaussian kernel is used as the NB hyperparameter [44].

2) KNN
KNN is a distance based parametric classifier and is one of the simplest classifiers in the literature. In this paper, k and distance metrics are selected as 1 and Euclidean respectively.

3) LD
LD is a basic linear discriminant classifier that uses the mean and the covariance of each class to decide which class a data point belongs to. Therefore, no hyperparameter setting is used [45].

4) DT
DT, or decision tree classifier, is one of the widely preferred conventional classifiers and uses information entropy to classify observations. Some popular tree algorithms are C4.5, CART and ID3. Widely used split criteria for DT are Gini, Twoing and maximum deviance reduction. It is a parametric classifier. In this work, the Gini criterion is used in DT [46].
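For readers unfamiliar with the Gini criterion, the short sketch below computes the Gini impurity of a set of labels and the weighted impurity of a candidate split. It illustrates the standard definition only; the function names `gini_impurity` and `split_gini` are hypothetical and the code is not taken from the authors' setup.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


print(gini_impurity(["healthy", "healthy", "sick", "sick"]))  # 0.5
print(split_gini(["healthy", "healthy"], ["sick", "sick"]))   # 0.0
```

A tree grown with this criterion prefers the split whose weighted impurity is lowest, so the perfectly separating split in the example would be chosen.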

5) SVM
SVM, or support vector machine, is an optimization based conventional classifier that incorporates a variety of kernel methods such as radial basis functions, polynomial kernels or neural networks. We selected a 3rd degree polynomial kernel, which is called cubic SVM [47].
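A 3rd degree polynomial kernel can be written in a few lines. The sketch below uses the common form (x·y + c)^3 with an offset c = 1; the offset, the absence of a scale factor and the function name `cubic_kernel` are assumptions, since the paper does not list these settings.

```python
def cubic_kernel(x, y, coef0=1.0):
    """Polynomial kernel of degree 3: (x . y + coef0) ** 3."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** 3


print(cubic_kernel([1.0, 2.0], [3.0, 4.0]))  # 1728.0
```

The kernel value replaces the plain dot product inside the SVM, which lets a linear separator in the implicit feature space act as a cubic decision boundary in the original features.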

6) BT
BT, or bagged tree, is an ensemble model of the DT. The hyperparameters used in BT are as follows: the ensemble method is bag, and Gini's diversity index is used for random split selection [48].
The conspicuous attributes of the proposed method are summarized as follows:
• Comprehensive features are generated by using the MCMTTP method.
• A multileveled feature generating network is proposed to extract low, medium and high-level features by using the MCMTTP method and the maximum pooling technique.
• To demonstrate the general success and strength of the proposed feature generation network and INCA feature selector, six traditional classifiers are used.

V. PERFORMANCE ANALYSIS
To evaluate the performance of the proposed MCMTTP based voice classification method, the Saarbruecken Voice Database (SVD) [14], [49] was used. This dataset contains pathological voices, which users can download freely. These pathologic voices were collected from more than 2000 subjects in 70 classes (diseases) at a 50 kHz sampling frequency and 16-bit resolution. The recordings comprise vowels (/a/, /i/, /u/) and a sentence (''Guten Morgen, wie geht es Ihnen?''). Three diseases are used to create cases: frontal resection, cordectomy and spastic dysphonia. We collected the sentence recordings from SVD. Eight cases were defined. The defined cases are given below.
To study the performance of the classifiers, classification accuracy and geometric mean are used. These metrics are defined below [50].
\[ accuracy = \frac{tp + tn}{tp + tn + fp + fn} \qquad (14) \]
\[ gm = \sqrt{\frac{tp}{tp + fn} \times \frac{tn}{tn + fp}} \qquad (15) \]
where tp, tn, fp and fn are the numbers of true positives, true negatives, false positives and false negatives. The calculated results are listed in Tables 2 and 3.
As seen from Tables 2 and 3, Case 2 achieved 100.0% classification accuracy and geometric mean by using kNN and SVM. Cases 1-4 have two classes. Their best classification accuracies were at least 97%, and their best geometric means exceeded 95%. The chosen classifier was kNN for all cases. Cases 5-7 have three classes. Our proposed method provided classification rates of 90% or more for these cases. However, the best geometric mean obtained for Case 5 was 81.75%.
As we have seen, we used a heterogeneous dataset to study the performance of our algorithm. For Case 8, four classes are used, and voice-based disease detection was challenging with kNN and the other classifiers, except with SVM, which provided 89.95% accuracy and an 80.93% geometric mean.
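The two metrics of Eqs. 14 and 15 can be computed directly from the confusion counts of a binary case, as the following sketch shows. The function names and the example counts are illustrative only and are not results from the paper; how the geometric mean is extended to the three and four class cases is not described in this text.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples (Eq. 14)."""
    return (tp + tn) / (tp + tn + fp + fn)


def geometric_mean(tp, tn, fp, fn):
    """Square root of sensitivity times specificity (Eq. 15)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Illustrative counts only.
print(accuracy(tp=45, tn=40, fp=10, fn=5))  # 0.85
print(round(geometric_mean(tp=45, tn=40, fp=10, fn=5), 4))  # 0.8485
```

Because the geometric mean multiplies the per-class rates, it drops sharply when one class is recognized poorly, which is why it can be noticeably lower than accuracy on imbalanced cases.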

VI. DISCUSSIONS
In this study, we proposed a novel multileveled feature extraction method, called MCMTTP, which is a modified version of the TP. To create MCMTTP, a parametric TP function is presented. Nine feature extraction networks are created using maximum pooling to extract features at different levels. In addition, we use an effective feature selector, INCA, to select optimal features. We investigated the performance of the proposed algorithm on a pathologic voice dataset (SVD). Eight cases were created by using these voices. In the literature, machine learning based voice disease detection methods have been applied to discriminate between two classes. Here we proposed an algorithm that classifies three and four classes, also using a heterogeneous dataset. Tables 2 and 3 summarize the performance of our algorithm and show that the best classifiers are kNN and SVM, with a classification rate greater than 90% for all cases except Case 8, in which an 80.93% geometric mean was obtained. To show the success of the proposed MCMTTP based method, the results of other methods are listed in Table 4.
As seen from the classification results, the proposed MCMTTP based voice classification method achieved a higher geometric mean than the other selected works. Table 4 compares our algorithm with other approaches for cases 1 and 2 only (binary). We could not carry out a comparative study against other methods for the nonbinary cases. We summarize the advantages of the proposed MCMTTP and INCA based method as follows:
1) The automated feature selection problem is solved by using INCA.
2) By using a modified TP (MCMTTP) and the maximum pooling method, a multileveled feature extraction method is proposed, and high accuracy rates were achieved by using this feature extraction together with INCA based feature selection.
3) Six conventional classifiers were used to show the strength of the proposed voice-based disease detection method.
4) Eight cases were defined to obtain general results.
5) The proposed MCMTTP and INCA based pathologic voice classification method outperforms the other selected methods.

VII. CONCLUSION
In this work, a novel one-dimensional improved TP method was presented along with an automated voice-based disease classification method. In this method, a multileveled feature extraction network is constructed in which features are extracted from each level by a parametric TP. The extracted features are concatenated, and the proposed INCA selects the discriminative ones. We used the SVD pathologic voice dataset to create eight cases from three diseases: cordectomy, frontal resection and spastic dysphonia. This method reached 100.0% classification accuracy and geometric mean (perfect classification) for Case 2 (frontal resection detection). The proposed MCMTTP and INCA based method achieved high performance (see Tables 2-3). This method also achieved better results than the other selected methods (see Table 4). These results demonstrate that the proposed MCMTTP and INCA method is successful. The proposed MCMTTP and INCA methods are cognitive and lightweight. Therefore, our techniques are useful for most new-generation real-time applications, such as smart assistants in a chest diseases clinic.