A Comprehensive Review on Features Extraction and Features Matching Techniques for Deception Detection

Over the past few decades, a remarkable amount of research has been conducted in the field of speech signal processing, particularly on deception detection for security applications. In this study, a comprehensive review of recent machine learning approaches using verbal and non-verbal features is presented for deception detection. A brief overview of different feature extraction techniques, together with the recognition rates and computational times of the machine learning methods, is summarized in tabular format. In addition, the numerous datasets used as primary sources for deception detection in the reviewed articles are presented. Key findings from the reviewed articles are summarized and a few major issues related to deception detection approaches are examined. A statistical analysis, conducted by extracting the significant information from eighty-eight (88) scientific papers published over the last thirty (30) years, is also provided. The results emphasize the research trends in deception detection as well as further research opportunities for researchers as part of continuous progress.


I. INTRODUCTION
Detecting human emotion has piqued researchers' interest for generations. However, how well humans or machines ultimately perform the task of detecting deceptive speech remains a challenging question for criminal investigations. In the case of a speaker under stress, increased activation of the sympathetic or the parasympathetic nervous system is observed when the speaker is angry, fearful, or sad. This increased activation leads to changes in heart rate, blood pressure, and muscle activity [3]. Consequently, the articulatory and respiratory movements involved in speech production are affected [3]-[4]. Therefore, it is important to address these issues collectively for experts who work in law enforcement, education, health care, government agencies, border crossings, military screenings, regular job screenings, telecommunications, and with informants at embassies and consulates around the world [3,4,6]. It is well known that human speech encodes emotion and nonlinguistic information, of which deception is one example. In fact, deception is part of everyday interactions, yet it is challenging for both untrained and trained professionals to detect it accurately without the use of intrusive measures [1]-[2]. Nowadays, security is a requirement for all systems and is incorporated in everyday interactions. A need has arisen for more efficient artificial intelligence security systems that can execute larger and more powerful tasks at a higher productivity rate. So, what is deception? Deception can be described as intentionally causing an individual to accept false statements as true. From a psychological perspective, an individual who is being deceptive exhibits subconscious or conscious cues, including shortened length of speech, a flushed face, changes in voice frequency, avoidant eye contact, changes in pupil diameter, and a more rigid body [4]-[5].
Traditional methods have attempted to take advantage of these deceptive indicators to detect deception with a relatively high accuracy rate through the combined use of various devices such as the polygraph, monitors of cardiovascular activity governed by the sympathetic and parasympathetic nervous systems (i.e., blood pressure), heartbeat sensors, strain gauges to measure respiratory rate, and electrodermal activity sensors (i.e., sweatiness of the fingertips) [6]. The polygraph measures human responses such as respiratory rate, electrodermal activity, and heart rate through direct contact, which can lead to numerous challenges and complexities in terms of implementation. Monitoring systems of this nature are intrusive, require the subject to be cooperative, and require an experienced interviewer with years of training to operate the equipment and accurately perform the polygraph examinations.
The remainder of this paper is organized as follows. Section II presents the contemporary feature approaches in deception detection. Within Section II, Subsections 1 and 2 discuss nonverbal and verbal features (i.e., spectral energy and cepstral features) used for deception detection and how they are generally developed. Section III discusses the Principal Component Analysis (PCA) along with previous research results after it was applied to various data sets to improve recognition rates. Section IV presents various feature classification methods, their respective recognition rate results, and time duration for both non-verbal and verbal features used for deception detection. Section V concludes the paper and discusses potential future research directions.

II. CONTEMPORARY FEATURE APPROACHES IN DECEPTION DETECTION
In recent years, numerous avenues have been studied to test the possibility of more accurately detecting deception in humans with technological aid through the use of various non-verbal and verbal features. Researchers have been studying more innovative ways of detecting deception, such as computational methods like artificial intelligence (AI), and more specifically machine learning algorithms using non-verbal and verbal cues. Figure 1 shows a systematic approach and the key steps used to conduct this deception detection research.

Figure 1. A systematic approach and a general process of deception detection research

A. EXPERIMENTAL DATABASE
To conduct this comprehensive research, the authors used a database that is a collection of utterances from the audio recording of a male suspect under criminal investigation. The suspect was determined to have given deceptive statements under questioning during polygraph testing [3,4,6,12,14,88]. Audio recordings of three sessions of polygraph testing, with the same questions by the investigator and the same responses by the suspect, are used for analysis and synthesis. For reference, the two available pairs of truthful utterances, or ground truth (labeled Q7 and Q9), and deceptive utterances (labeled Q4 and Q5) of the word 'No' from each recording are selected for preliminary investigation. These utterances are sampled at a rate of 16,000 samples per second. Figure 2 shows the signals for all three sessions in the time domain.

B. NON-VERBAL FEATURE
Researchers have been learning about the multitude of ways in which humans present their deceptive ruses and how to detect them without the use of intrusive measures. Pavlidis and Levine studied thermal facial analysis [7]. Nugroho, Nasrun, and Setianingsih studied detecting deception using pupil dilation and eye blink analysis with a database consisting of 30 subjects [8]. Yap, Rajoub, Ugail, and Zwiggelaar studied various visual cues of facial behavior to detect deception [9]. Tsechpenakis et al. studied HMM-based visual cue analysis to detect deception [10]. Prosodic and nonlinear dynamic features were studied by Zhou, Zhao, Pan, and Shang to distinguish deception [15]. Barsever et al. studied detecting deception using text analysis with BERT [16]. Amir, Ahmed, and Chowdhry used interrogation data to study how well brain waves perform when used to detect deception [21]. Singh, Rajiv, and Chandra collected data from 5 subjects and studied how various eye blink patterns can be utilized to detect deception [22]. Using facial thermal analysis, Jain et al. used a database with data from 16 subjects to study detecting deception [23]. Thannoon, Ali, and Hashim studied the facial expressions of 43 subjects to examine how deception alters facial expressions [25]. George, Pai, Pai, and Praharaj studied eye blink count and eye blink duration using an intragender database consisting of 15 male subjects and 15 female subjects to examine how these measures can be utilized to detect deception [26].

C. VERBAL FEATURES
It is well known that listeners have a natural intuition to pick up on a speaker's emotions and purposeful language to detect some form of deception. When deciding whether a speaker is being deceptive or truthful, most listeners have access to facial expressions as well as involuntary muscle movements to consider besides the voice itself. Using speech to analyze deception provides a non-intrusive experience, especially because no sensors need to be attached to the body to chart and read blood pressure, respiration, and pulse. Additionally, speech-based analysis systems can be used to analyze prerecorded speech signals at any point in time, are inexpensive to produce, can be operated effortlessly, and can be designed as portable devices. Over the years, researchers have been studying methods to analyze and recognize speech signal features and characteristics to detect deception. Spectral energy features and cepstral features are two types of speech signal features used to detect deception.

1) SPECTRAL ENERGY FEATURES
Spectral energy features were developed using the psychoacoustic masking property of human speech perception [48]. This property is used to extract the spectral energy speech features by identifying the "irrelevant" speech signal information that typically goes undetected by the human ear [48]. By modeling the non-linear perception of human hearing, the process of extracting spectral information can be improved greatly [50]. Psychoacoustic principles including the absolute threshold of hearing, critical band frequency analysis, simultaneous masking, temporal masking, and the spread of masking along the basilar membrane were incorporated during the speech signal analysis process [48].

Reference | Non-verbal feature | Database
[7] | Thermal Facial Analysis | -
[8] | Pupil Dilation and Eye Blink Analysis | 30 Subjects
[9] | Visual Cues of Facial Behavior | -
[10] | Visual Cues (HMM-Based) | -
[15] | Prosodic Features, Nonlinear Dynamic Features | -
[16] | Text Analysis with BERT | -
[21] | Brain Waves | Interrogation Data
[22] | Eyeblink Pattern | 5 Subjects
[23] | Facial Thermal Imaging | 16 Subjects
[24] | Multimodal Deception | -
[25] | Facial Multimodal Features | -
[30] | Facial Expressions | -
[31] | Infrared Imaging using Time Domain Analysis | 11 Subjects
[31] | Infrared Imaging using Frequency Domain Analysis | 11 Subjects
[32] | Facial Micro-Expressions | -
[33] | Thermal Imaging | -
[34] | Brain Activities | 11 Male Subjects
[35] | Head Movement Analysis | 10 Subjects
[36] | Brain Activities | 5 Subjects
[37] | Thermal Imaging | -
[38] | Facial Expressions | -
[39] | Stress-Induced Facial Perspiration via Thermal Imaging | 40 Subjects
[77] | Keyboard Dynamics | 60 Participants
[78] | Unexpected Questions and Mouse Dynamics | 40 Participants
[79] | Keystroke Dynamics | 190 Subjects
[80] | Facial Expression | Videotaped Interviews
[81] | Facial Displays and Hand Gesture | 61 Trial Videos
[82] | Multimodal Feature (Lexical, Acoustic and Visual) | 121 Trial Videos
[83] | Improved Dense Trajectories and Microexpression | 104 Trial Videos
[84] | Involuntary Facial Expression | 12 People (344 Facial Images)
[85] | Audio, Visual, Textual (Static and Nonstatic) and Microexpression | 121 Trial Videos
[86] | Macro and Micro Facial Expression | High Stakes YouTube Videos
[87] | Facial Action Units | 121 Trial Videos

The absolute threshold of hearing can be described as the smallest level of a pure tone that can be detected by a listener in a noiseless environment [48]. An example is a listener hearing the ticking of a clock in an otherwise silent room.
Typically, the absolute threshold of hearing is expressed in terms of sound pressure level (SPL) in decibels (dB) [48]. The absolute (quiet) threshold, $T_q(f)$, is approximated [48,51] by the non-linear function shown in (1). This equation represents a listener with acute hearing, where $f$ is the frequency in Hz. The absolute threshold of hearing is also related to another acoustical metric known as the dB sensation level (dB SL) [48]. The SL denotes the intensity level difference of a stimulus relative to a listener's individual unmasked detection threshold for that stimulus [48], [52]. The SL is used because it quantifies listener-specific audibility instead of an absolute level [48].
Masking is a process in which one sound is rendered inaudible by the presence of another sound [48]. In auditory masking, a weaker but otherwise audible sound becomes undetectable to the human ear because of a louder sound [53] - [55]. Simultaneous masking is a type of auditory masking that occurs in the frequency domain when two sounds occur at the same time, one of which is unwanted and one of which is wanted but inaudible [48], [53]. From the perspective of the frequency domain, the phase relationships between stimuli and the relative shapes of the masker and maskee magnitude spectra determine to what extent the presence of particular spectral energy will mask the presence of other spectral energy [48]. Put another way, a stronger noise or tone masker effectively blocks the detection of a weaker signal by creating an excitation of adequate strength on the basilar membrane at the critical band location [48].
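As a worked illustration of the quiet threshold referenced in (1), the sketch below assumes the closed-form approximation commonly given in the perceptual audio coding literature, $T_q(f) = 3.64 (f/1000)^{-0.8} - 6.5 e^{-0.6 (f/1000 - 3.3)^2} + 10^{-3} (f/1000)^4$ dB SPL, with $f$ in Hz. The function name and the sample frequencies are illustrative choices and are not taken from the reviewed papers.

```python
import math


def absolute_threshold_db_spl(f_hz):
    """Approximate threshold in quiet (dB SPL) for a listener with acute hearing.

    Assumes the standard closed-form approximation; f_hz must be positive.
    """
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


if __name__ == "__main__":
    # Sample frequencies chosen only to show the shape of the curve.
    for f in (100, 1000, 3300, 10000):
        print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.2f} dB SPL")
```

Under this assumed approximation the threshold is lowest in the region around 3 to 4 kHz and rises toward both ends of the audible range.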
Non-simultaneous masking, or temporal masking, is another type of auditory masking that occurs when a sound is made inaudible by another sound that either immediately precedes or immediately follows it [53], [56]. Pre-masking is the type of temporal masking that occurs immediately before the onset of a masker [53]. Post-masking is the type of temporal masking that occurs immediately after the masker and obscures sound [53]. For the purposes of perceptual coding, when sudden audio signal transients generate pre-masking and post-masking regions in time, a listener will not perceive signals that fall below the elevated audibility thresholds produced by the masker [48].
A modified version of the absolute threshold of hearing, known as the detection threshold, is used for spectrally complex quantization noise [48]. Its shape fluctuates at any given time based on the stimuli present, so the detection threshold is a time-varying function of the input signal [48]. The threshold estimate is calculated based on how the human ear naturally performs spectral analysis [48].
First, a frequency-to-place transformation occurs along the basilar membrane in the cochlea, or inner ear [48], [57]. In this transformation, an acoustic stimulus produces a sound wave that moves the eardrum and the ossicular bones attached to it [48]. The mechanical vibrations are then transferred, at the oval window, to the cochlea, a fluid-filled, spiral-shaped structure that holds the coiled basilar membrane [48]. Once excited by the mechanical vibrations at its oval window, the cochlear structure produces traveling waves along the length of the basilar membrane [48]. At frequency-specific membrane positions, these traveling waves generate peak responses that "tune" the neural receptors connected along the length of the basilar membrane to different frequency bands depending on their locations [48]. For sinusoidal stimuli, the traveling wave propagates from the oval window along the basilar membrane until it nears the region with a resonant frequency close to the stimulus frequency [48]. There the wave slows, its magnitude rises to a peak, and it then decays rapidly beyond the peak [48]. The location of the peak is known as the "best place" for the stimulus frequency, and the frequency that best excites a particular place is known as the "best frequency" [48]. This completes the frequency-to-place transformation.
From a signal-processing perspective, the cochlea can be viewed as a bank of highly overlapping bandpass filters due to the frequency-to-place transformation [48]. The magnitude responses are nonlinear and asymmetric [48]. The bandwidth of the cochlear filter passbands is nonuniform and increases with increasing frequency [48]. Additionally, the cochlear filter passbands are quantified by the "critical bandwidth", which is a function of frequency [48]. The critical bandwidth can be characterized through loudness perception: for a narrow-band noise source at a constant SPL, the perceived loudness remains constant as the noise bandwidth is increased up to the critical bandwidth, and begins to increase once the bandwidth extends beyond it [48]. Essentially, the loudness remains constant as long as the noise energy stays within a single cochlear critical band, and it increases when the noise energy is forced into adjacent critical bands [48].
To quantify the cochlear filter passbands, the critical bandwidth is calculated as a function of frequency [48]. The critical bandwidth, $BW_c(f)$, is calculated approximately using the non-linear function shown in (2), where the frequency $f$ is in Hz.
Frequency in Hz is converted to the Bark scale, $z(f)$, for analysis purposes using (3), where one critical band distance is referred to as "one Bark" [48].
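Equations (2) and (3) can be exercised numerically. The sketch below assumes the forms commonly used in the perceptual coding literature, $BW_c(f) = 25 + 75\,[1 + 1.4 (f/1000)^2]^{0.69}$ Hz and $z(f) = 13 \arctan(0.00076 f) + 3.5 \arctan[(f/7500)^2]$ Bark. The function names are illustrative and do not come from the reviewed papers.

```python
import math


def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth (Hz) at centre frequency f_hz (assumed form of (2))."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def hz_to_bark(f_hz):
    """Convert frequency in Hz to the Bark scale (assumed form of (3))."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


if __name__ == "__main__":
    # Sample frequencies chosen only for illustration.
    for f in (100, 500, 1000, 4000, 8000):
        print(f"{f:>5} Hz -> {hz_to_bark(f):5.2f} Bark, "
              f"critical bandwidth ~ {critical_bandwidth_hz(f):6.1f} Hz")
```

With these assumed forms, the bandwidth stays near 100 Hz at low frequencies and widens as frequency increases.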
The first critical band started with the resolution frequency (DF) to exclude DC. Critical bandwidth tends to remain constant from about 100 Hz up to 500 Hz and increases to approximately 20 percent of the center frequency above 500 Hz [45,49]. Additionally, the non-simultaneous and simultaneous masking phenomena are induced by the auditory time-frequency analysis in the critical band filter bank to shape the coding distortion spectrum [48]. The perceptual models allocate bits for signal components so that the quantization noise is shaped to exploit the detection thresholds that are determined by the energy within a critical band, for a complex sound [48].
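To make the critical band filter bank more concrete, the following sketch groups the power spectrum of one frame into Bark bands, starting at the first non-DC bin. It is a hypothetical illustration only: the band assignment (integer part of the Bark value), the direct DFT, and all function names are assumptions, and this is not claimed to be the Bark energy feature as defined in [45].

```python
import cmath
import math


def hz_to_bark(f_hz):
    """Hz to Bark, using a commonly cited approximation (an assumption here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def power_spectrum(frame):
    """One-sided power spectrum via a direct DFT (O(N^2), fine for a short frame)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def bark_band_energies(frame, sample_rate):
    """Sum spectral power into bands one Bark wide, skipping the DC bin."""
    resolution = sample_rate / len(frame)  # spacing between DFT bins in Hz
    power = power_spectrum(frame)
    n_bands = int(hz_to_bark(sample_rate / 2.0)) + 1
    energies = [0.0] * n_bands
    for k in range(1, len(power)):  # bin 0 (DC) is excluded
        band = int(hz_to_bark(k * resolution))
        energies[band] += power[k]
    return energies


if __name__ == "__main__":
    fs = 16000  # sampling rate quoted for the experimental database
    # Synthetic 1 kHz tone standing in for a speech frame (illustrative only).
    tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(256)]
    energies = bark_band_energies(tone, fs)
    strongest = max(range(len(energies)), key=lambda b: energies[b])
    print("strongest Bark band:", strongest)
```

For the synthetic tone, nearly all of the energy falls into the single band that contains 1 kHz, which is the behavior a critical band analysis is meant to capture.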
Fan et al. studied detecting deception using speech signals by extracting the short-time energy (STE) feature among other features [41]. Cosetl and Lopez used a criminal interrogation database to extract the significant energy feature and used it to distinguish between deceptive and non-deceptive speech [43]. Ullah and Gopalan extracted the Bark energy and significant energy features from stressed speech signals to detect deception using a criminal interrogation database [45]. Srivastava and Dubey collected speech signal data from an interview they conducted in an isolated environment to study detecting deception using the fundamental frequency, zero-crossing rate, and energy features [46]. Tao et al. used the fundamental frequency, zero-crossing rate, and energy features to study how they affect deception detection using the Swiss Research Institute IDIAP WOLF data set [47]. Table 2 shows the spectral energy features and the databases used in previous research work for deception detection.
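Two of the features named above, short-time energy and zero-crossing rate, have simple frame-based definitions. The sketch below uses common textbook forms (sum of squared samples, and the fraction of adjacent sample pairs that change sign); the frame length, hop size, and function names are illustrative assumptions, and the reviewed studies may normalize these features differently.

```python
import math


def split_frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (trailing partial frame dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


if __name__ == "__main__":
    fs = 16000  # sampling rate quoted for the experimental database
    # Synthetic 1 kHz tone standing in for a real utterance (illustrative only).
    tone = [math.sin(2 * math.pi * 1000 * n / fs + 0.5) for n in range(320)]
    for frame in split_frames(tone, frame_len=160, hop=80):
        print(round(short_time_energy(frame), 3),
              round(zero_crossing_rate(frame), 4))
```

A 10 ms frame (160 samples at 16 kHz) with 50% overlap is used here purely as an example setting.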

2) CEPSTRAL FEATURES
Deception detection based on extracted cepstrum features was studied to understand how speech features can be used to detect human emotion and deception. The cepstral representation of an utterance provides a depiction of the local spectral properties of the signal [2], [7], [22]. When analyzed using cepstral speech characteristics, deceptive speech exhibits an increased amplitude, decreased speech duration, and increased fundamental frequency [4].
Research on detecting deception using cepstral features is limited. Adding delta cepstrum features to the static MFCC features strongly improves speech recognition [60]. Wang et al. studied speech features including MFCC and energy features using the CSC corpus [42]. Ullah et al. presented the results of detecting deception by analyzing human speech signals and their extracted cepstrum features [4], [6]. Chowdhury et al. studied the MFCCs and other speech features in their research work [14]. They also studied the effects of using characteristics of speech to detect deception and noted an increased duration of speech, fundamental frequency, and amplitude when a person was being deceptive [14]. Graciarena et al. studied the results of detecting deceptive and non-deceptive speech based on prosodic, lexical, and acoustic features using the CSC corpus [13]. P. Benson in [58] obtained and analyzed an audiotape of a pilot's speech during a serious aircraft malfunction, an engine failure of the single-engine F-16. This investigation revealed that speech under stress is shorter and simpler than normal speech. Using Benson's research as a steppingstone, Gopalan and Wenndt in [12] studied the initial results of analyzing speech features for speech under stress and for detecting deception from the speech utterances of a criminal suspect. Using the CSC corpus, Desai et al. extracted the MFCC speech feature among other features and used it for detecting deception [41]. In [46], Srivastava et al. extracted the MFCC speech features from a deception detection database that they created using data from interviews they conducted in an isolated environment. Tao et al. studied speech deception detection using MFCC features and the Swiss Research Institute data set [47]. In [59], Venkatesh et al. studied one hundred and twenty-one real-life trial videos to extract and examine the MFCC, Log-Energy of MFCC, and thirteen cepstral coefficient speech features to detect deception. Table 3 highlights previous research work for deception detection using various cepstral features.
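Delta cepstrum features, mentioned above as a complement to static MFCCs, describe how each coefficient changes over time. A minimal sketch follows, assuming the regression formula widely used in speech front ends, $d_t = \sum_{n=1}^{N} n\,(c_{t+n} - c_{t-n}) \,/\, (2 \sum_{n=1}^{N} n^2)$, with edge frames repeated for padding. The window size and function names are illustrative, and the reviewed studies may compute their delta or time-difference features differently.

```python
def delta_features(frames, window=2):
    """Regression-based delta coefficients for a sequence of feature vectors.

    frames: list of equal-length lists (one static feature vector per frame).
    Edge frames are handled by repeating the first and last vectors.
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    padded = [frames[0]] * window + list(frames) + [frames[-1]] * window
    deltas = []
    for t in range(window, window + len(frames)):
        vec = []
        for k in range(len(frames[0])):
            num = sum(n * (padded[t + n][k] - padded[t - n][k])
                      for n in range(1, window + 1))
            vec.append(num / denom)
        deltas.append(vec)
    return deltas


def append_deltas(frames, window=2):
    """Concatenate each static vector with its delta vector."""
    return [s + d for s, d in zip(frames, delta_features(frames, window))]


if __name__ == "__main__":
    # Toy "cepstral" trajectories that rise linearly over time (illustrative only).
    static = [[float(t), 2.0 * t] for t in range(6)]
    for row in append_deltas(static):
        print(row)
```

For a coefficient that grows linearly, the interior delta values equal the slope, which is a quick sanity check on the formula.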

III. PRINCIPAL COMPONENT ANALYSIS
While working on deception data, it was observed that large datasets often make it challenging to interpret the results accurately [6]. Incorporating principal component analysis (PCA) provided a solution by reducing the data dimensions while increasing the interpretability of the data and minimizing the loss of information [6]. The PCA is a dimensionality reduction method that treats the rows of the original dataset as elements in a high-dimensional space [61]. The rows are arranged along directions that characterize the optimal set of features [61]. By constructing a group of new latent variables, the PCA is then able to reduce the original data dimensions [6]. From the new mapping space, the main variation information is then extracted along with the statistical features [6]. The original data can then be used to construct a new representation of the spatial features [6]. To reduce the dimensions of the projection space, the variables in the new mapping space are composed of linear combinations of the original variables [6]. Because the statistical eigenvectors in the projection space are orthogonal to each other, the correlation between variables is eliminated and the analysis of the principal characteristics is simplified [62].
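The projection described above can be sketched in a few lines. The example below is a minimal illustration, not the procedure used in [6]: it centers the data, forms the covariance matrix, finds only the first principal direction by power iteration, and projects each row onto it. A full PCA would keep several orthogonal directions, typically through an eigendecomposition or SVD; the function names and toy data are assumptions.

```python
import math


def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of zero-mean rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def first_principal_direction(cov, iterations=100):
    """Power iteration: unit eigenvector of the largest eigenvalue of cov."""
    d = len(cov)
    v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has no variance")
        v = [x / norm for x in w]
    return v


def pca_1d(rows):
    """Project rows onto the first principal direction.

    Returns (direction, scores, explained_variance_ratio).
    """
    centered = center(rows)
    cov = covariance(centered)
    direction = first_principal_direction(cov)
    scores = [sum(x * a for x, a in zip(row, direction)) for row in centered]
    d = len(cov)
    variance_along = sum(direction[i] * cov[i][j] * direction[j]
                         for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return direction, scores, variance_along / total_variance


if __name__ == "__main__":
    # Toy two-feature data lying close to the line y = 2x (not from any reviewed study).
    data = [[0, 0.1], [1, 1.9], [2, 4.1], [3, 5.9], [4, 8.1], [5, 9.9]]
    direction, scores, ratio = pca_1d(data)
    print("direction:", direction)
    print("explained variance ratio:", ratio)
```

In this toy case two correlated features collapse to one score per row with almost no loss of variance, which is the kind of dimension reduction the text describes.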
Fernandes and Ullah proposed using the PCA to improve the recognition rates of their speech-based deception detection using various speech features [6]. Using the time-difference energy speech feature, they achieved an 8.34% increase in recognition rate, and using the time-difference cepstrum feature, they achieved an 8.33% increase [6]. Using the delta energy speech feature, they achieved a 12.5% increase in recognition rate, and using the delta cepstrum speech feature, a 29.17% increase [6]. Roopa and Asha proposed using the PCA to improve their diabetes disease prediction approach and achieved a 6.03% increase [61]. For their underwater image recognition study, Bi and Du proposed using the PCA to improve their image recognition rate and achieved a 20.3% increase after applying the PCA to their data [63]. To solve the irregular packing problem, Gua et al. proposed a packing algorithm based on the PCA methodology, which resulted in an increased filling rate, decreased packing time, and increased packing number as compared to the MGA method [64]. Zheng et al. proposed a PCA-based support vector classifier and noted an increased identification rate on their heart and adult data sets as compared to the conventional support vector classifier [65]. Table 4 shows the previous research work conducted using the PCA and how it affected the recognition rate results. Overall, applying the PCA does increase the recognition rate.
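The increases quoted for [6] read as differences, in percentage points, between the recognition rate after and before applying the PCA. As a small arithmetic check, the sketch below pairs three of the quoted gains with before and after rates reported in Section IV. The pairing of each gain with a specific feature matching technique is an inference made for illustration and is not stated in the source.

```python
def pca_gain(before_pct, after_pct):
    """Gain in percentage points from applying PCA, rounded to two decimals."""
    return round(after_pct - before_pct, 2)


# (feature, assumed matching technique, rate before PCA, rate after PCA)
reported = [
    ("time-difference energy", "LSTM", 91.66, 100.0),
    ("delta energy", "Levenberg-Marquardt", 75.0, 87.5),
    ("delta cepstrum", "BFGS Quasi-Newton", 70.83, 100.0),
]

for feature, technique, before, after in reported:
    print(f"{feature} ({technique}): +{pca_gain(before, after)} points")
```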

IV. FEATURE CLASSIFICATION METHODS USED IN DECEPTION DETECTION
Deep neural networks have been used successfully in many speech processing tasks, including speaker verification [66] - [67], speech enhancement [68] - [69], speech recognition [70,71,72], deception detection [4], [45], and emotion recognition [14]. Fan et al. constructed and used a Chinese corpus consisting of 15 male and 15 female recordings [1]. They extracted four kinds of speech features from the database, including STE, pitch, formant, and duration, for male and female subjects [1]. They used logistic regression (LR), the J48 decision tree, the multi-layer perceptron (MLP), SVM, and the gradient boosting decision tree (GBDT) to test the effectiveness of using a combination of the features for deception detection [1]. The highest rate for both genders was achieved using the GBDT classification method, with recognition rates ranging between 82% and 85% [1].
In [6], the time-difference spectral energy feature achieved a 79.16% recognition rate before applying the PCA and 100% after applying the PCA using the Levenberg-Marquardt feature matching technique. Using the LSTM feature matching technique, the feature achieved a 91.66% recognition rate before applying the PCA and 100% after. Using the BFGS Quasi-Newton feature matching technique, the feature achieved a 75% recognition rate before applying the PCA and 100% after. The delta spectral energy feature achieved a 75% recognition rate before applying the PCA and 87.5% after applying the PCA using the Levenberg-Marquardt feature matching technique. Using the LSTM feature matching technique, the feature achieved a 58.3% recognition rate before applying the PCA and 83.33% after. Using the BFGS Quasi-Newton feature matching technique, the feature achieved a 50% recognition rate before applying the PCA and 87.5% after.
For the cepstral features in [6], the time-difference cepstrum feature achieved an 83.33% recognition rate before applying the PCA and 79.16% after applying the PCA using the Levenberg-Marquardt feature matching technique. Using the LSTM feature matching technique, the feature achieved a 91.66% recognition rate before applying the PCA and 100% after. Using the BFGS Quasi-Newton feature matching technique, the feature achieved an 87.50% recognition rate before applying the PCA and 100% after. The delta cepstrum feature achieved a 79.16% recognition rate before applying the PCA and 91.66% after applying the PCA using the Levenberg-Marquardt feature matching technique. Using the LSTM feature matching technique, the feature achieved a 50% recognition rate before applying the PCA and 75% after. Using the BFGS Quasi-Newton feature matching technique, the feature achieved a 70.83% recognition rate before applying the PCA and 100% after.
Graciarena et al. reported on distinguishing deceptive speech from non-deceptive speech using the CSC corpus and various classification models and features [13]. They computed 215 prosodic features including pitch, energy, and duration; 20 lexical features including filled pause counts, dialog act labels, and syntax-based features; and acoustic features including spectral-based Mel cepstral features with energy, simple delta features, double delta features, and triple delta features [13]. Graciarena et al. achieved their highest recognition rate, 64.4%, using a combination of the acoustic and prosodic features as input to the GMM/SVM models [13]. Using relevance vector machine (RVM) and non-linear dynamic features, Zhou et al.
proposed an intra-gender deception detection approach [15]. With a combination of the prosodic and nonlinear dynamic features of a male and a female subject, they achieved top recognition rates of 70.3% and 70.15%, respectively, using the RVM as compared to the SVM or the radial basis function neural network (RBFNN) models [15]. Venkatesh et al. proposed the extraction of MFCC, cepstral coefficients, and log-energy of MFCC to detect deception using various classification methods including SVM and LSTM [59]. Using SVM, they achieved their highest recognition rate of 72% with the log-energy of MFCC feature, whereas using the LSTM, they achieved their highest recognition rate of 58% with the cepstral coefficients feature [59]. Ullah and Gopalan proposed extracting the Bark energy and significant energy features to detect deception [45]. Using the significant energy feature, they achieved a 66.67% recognition rate with the Levenberg-Marquardt classification method [45]. Using the Bark energy feature, they achieved a higher recognition rate of 83.33% with the same classification method [45]. Using the LSTM classification method, Desai et al. achieved recognition rates of 81.03% using the LexRNN features, 84.11% using the HybridRNN features, and 62.59% using the AudioRNN features [41]. Using the SVM classification method and the MFCC speech feature, Wang et al. achieved a 51.8% recognition rate, whereas using the LSTM classification method and the same feature, they achieved a 54.6% recognition rate [42]. Using the energy speech feature and the SVM classification method, Wang et al. achieved a 50.2% recognition rate, whereas using the LSTM classification method and the same speech feature, they achieved a 47% recognition rate [42]. Enos et al.
proposed detecting deception through the use of critical "hot spot" segments in the speech, achieving a recognition rate of 68.6% using a combination of bagging, AdaBoost, and J48 [73]. Using various lexical and speech features, Warnita and Lestari achieved a 50.45% recognition rate using the random forest decision tree (RFDT) classifier [74]. Tao et al. achieved an 82.47% recognition rate using an SVM classifier and a combination of various speech features including fundamental frequency, STE, zero-crossing rate, and MFCC as inputs [47]. Srivastava and Dubey achieved a recognition rate of 100% using SVM and a combination of the fundamental frequency, zero-crossing rate, MFCC, frames function, and energy [46]. They also achieved a recognition rate of 93.33% using an artificial neural network (ANN) for the same set of features [46]. Xue et al. proposed using MFCC, pitch, and energy features to detect deception using various classification methods [40]. Using SVM, Xue et al. achieved a 51.8% recognition rate using the MFCC feature [40]. They also achieved a 54.6% recognition rate using LSTM and the MFCC feature [40]. However, using the ensemble classification method, they achieved their highest recognition rate of 55.8% with the MFCC and energy features [40]. Using the Ripper rule induction classifier, Hirschberg et al. achieved a recognition rate of 66.4% using a combination of acoustic/prosodic, lexical, and speaker-dependent features [75]. Table 5 presents the previous deception detection studies along with the databases and classification methods used in the studies as well as their recognition rate results.

V. CONCLUSION
This study presented a comprehensive review of various verbal and non-verbal features extracted for deception detection, as well as the recognition rates of the various feature matching techniques used. Overall, the time-difference energy feature extracted and developed by Fernandes and Ullah in [6] showed the highest recognition rate, 100%, after applying the PCA with three different feature matching techniques. Thus, the time-difference energy feature is a promising candidate for speech-based deception detection. The authors suggested that further research using more speech utterances from a multitude of speakers, which were hard to obtain, could confirm the results of the proposed feature classification methods in detecting deception [6].
The limitations recognized in the previous research work include the size and the type of the database. A large intragender deception database was one recommended solution. Another limitation highlighted in certain studies was restricting the analysis to single-word utterances rather than full sentences.
Studying the use of a field-programmable gate array (FPGA) with the post-PCA data is one implementation direction in which the results of this comprehensive review could be extended, with the aim of creating a software-hardware product with improved accuracy for real-life applications.