LAD: A Hybrid Deep Learning System for Benign Paroxysmal Positional Vertigo Disorders Diagnostic

Herein, we introduce "Look and Diagnose" (LAD), a hybrid deep learning-based system that aims to support physicians in effectively diagnosing Benign Paroxysmal Positional Vertigo (BPPV). Given the body postures of the patient in the Dix-Hallpike and lateral head turn tests, the visual information of both eyes is captured and fed into LAD, which analyzes it and classifies it into one of six possible disorders that the patient might be suffering from. The proposed system consists of two streams: (1) an RNN-based stream that takes raw RGB images of both eyes, extracts visual features and the optical flow of each eye, and performs a ternary classification into left posterior canal (PC), right PC, or other; and (2) a pupil detector stream that, when the input is classified as Non-PC, detects the pupil and classifies the direction and strength of the beating to categorize the Non-PC types into the remaining four classes: Geotropic BPPV (left and right) and Apogeotropic BPPV (left and right). Experimental results show that, given the body postures of the patient, the system classifies BPPV into the six types of disorder with an accuracy of 91% on the validation set. The proposed method classifies the Posterior Canal disorders with an accuracy of 93% and the Geotropic and Apogeotropic disorders with an accuracy of 95%, indicating a potential direction for research with medical data.


I. INTRODUCTION
Benign Paroxysmal Positional Vertigo (BPPV) is one of the most common causes of vertigo and arises from a problem in the inner ear [1]. Its symptoms are repeated, brief periods of vertigo with movement, characterized by a spinning sensation upon changes in the position of the head [2]. Each episode lasts less than one minute, and nausea is commonly associated. Each part of the name depicts a key feature of the inner-ear disorder: Benign means that it is not very serious and the patient's life is not in danger; Paroxysmal indicates that the symptom occurs suddenly and lasts a short time; Positional implies that the vertigo is triggered by certain postures and head movements. BPPV is considered a common public health problem [3]. Its probable causes include head trauma, head injury, and some surgeries related to otology, oral, and maxillofacial cases [4]. The dizziness of BPPV is explained by loose calcium deposits (crystals or "ear rocks") in the semicircular canals of the inner ear [5]. When the patient's head moves, these crystals roll around the semicircular canal, which sends the brain information that conflicts with other signals (such as what the eyes are seeing), and these conflicting signals make the person dizzy. Around 5 to 6 million people suffer from this disorder annually [6]-[8]. In some cases, BPPV can become serious because it increases the patient's chance of losing balance and falling [9]. The abnormal movements of the eyes help the doctor determine the type of BPPV disorder. The symptoms of BPPV are usually accompanied by abnormal rhythmic eye movements, which include nystagmus (beating) and torsional motions of the eyes. Based on the different nystagmus movements of the eye, the doctor can categorize BPPV into different disorders and choose the appropriate treatment. Differentiating these disorders is a challenging task because it requires expert knowledge and careful observation of the patient. BPPV can occur in all age groups and in both genders. However, it is rare in people under 20 years old and quite common in the middle age group of 31-50 years [3].

FIGURE 1. Right eye beatings: horizontal beating detection for the right eye of a patient diagnosed as positive for type Lt_Geo_BPPV, over 9676 video frames. Red points are right beatings and blue points are left beatings. The horizontal axis is the frame number and the vertical axis is the velocity.
BPPV is often benign; however, it may be dangerous in certain cases. For example, a BPPV patient working on a ladder or on a roof may suddenly be hit by a vertiginous symptom and lose balance, which may lead to a serious accident. If the kind of disorder is accurately diagnosed, prompt treatment can be applied so that the patient is cured earlier and avoids the painful experience and future risks [10]. BPPV disorders are diagnosed by observing the visual motions of the human eyes in clinical settings [11]-[13].
Based on any abnormal movements of the eyes, otology experts can assign each case to a specific type for appropriate treatment. As the number of patients increases, this becomes a burden for medical staff. The cost of a BPPV diagnosis is estimated to be approximately $2000, and most patients suffering from this disorder may undergo unnecessary testing or other interventions, which leads to an associated cost of billions of dollars per year for diagnosing BPPV alone [2]. Deep learning has made a huge impact in computer vision, including image classification [14]-[16], video question answering [17]-[19], and object recognition [20], as well as in natural language processing [21]. It is helping to identify, classify, and quantify patterns in medical images. In [22], Inception V3 is used to extract new knowledge from retinal fundus images and infer multiple cardiovascular risk factors with a high area under the receiver operating characteristic curve (AUC). In [23], the fine-grained variability in the appearance of skin lesions is captured by a CNN with transfer learning to detect skin cancer with performance on par with clinical experts. Recent studies demonstrate the potential of deep learning for understanding human actions in videos [24]-[27]. These studies suggest that deep learning is capable of capturing torsional nystagmus patterns in the fine-grained motion of the eyes for BPPV analysis. In this paper, we present a deep learning system referred to as "Look and Diagnose" (LAD) that classifies BPPV into the six most common types: geotropic lateral canal (left and right), apogeotropic lateral canal (left and right), and posterior canal (left and right), based on the video of both eyes and seven postures of the patient. LAD classifies different types of beating (nystagmus), including torsional and horizontal beating, to effectively classify BPPV disorders. To develop LAD, we collected a large dataset of eye movement videos of both eyes from patients with BPPV disorders. More details are given in Section IV.

A. BPPV ANALYSIS
Nystagmus examination is a crucial step to diagnose BPPV disorder [...] to provoke the nystagmus for diagnosis and the BPPV can be treated effectively by applying repositional moving [10]. Previous studies show that the horizontal (lateral) canal type (5% to 15% of BPPV cases) is less common than the posterior canal type (85% to 95% of BPPV cases) [2].
Slama et al. [28], [29] used a multilayer neural network (MNN) with parameters recorded from videonystagmography (VNG) data to analyze nystagmus and diagnose whether a person has a vestibular disorder or is normal. Nystagmus signals from a VNG device have also been analyzed with a CNN-based method to classify two classes of vestibular disorder [30]. Lim et al. [31] introduced a more complete approach that not only classifies the nystagmus but also diagnoses the final BPPV class using various ad hoc techniques.
Lim et al. recorded tracking videos of the eyes at ten different postures of the patient. These videos were processed to obtain the cumulative transient velocity of the eye movement by simply subtracting eye coordinates between video frames. In contrast to [31], we collected videos from our clinical warehouse with seven postures of the patient (Fig. 2, Fig. 3). A bidirectional GRU-based network is used to learn, end-to-end, the torsional movements and nystagmus of the two eyes simultaneously from the given video input.
A subsequent feature-based classifier captures the horizontal beating of the eyes and recognizes the remaining four horizontal (lateral) canal BPPV disorder classes: geotropic BPPV (left and right) and apogeotropic BPPV (left and right).

B. ACTION RECOGNITION
Video understanding is an attractive and challenging task. Some earlier works extracted handcrafted features of the video via improved Dense Trajectories (iDT) and combined them with low-level video descriptors for video representation [24], [32], [33]. Deep learning has made a significant impact on human action recognition [34]-[40]. Two-stream networks extract relevant appearance and motion features from the RGB frames and the optical flow field for accurate action recognition [25], [27], [41].
Another approach to action recognition uses 3D-CNN networks to capture motion cues in the video directly through 3D convolution, without using an optical flow field [42], [43]. In this work, a two-stream system is adopted to extract motion features such as the torsional nystagmus of the eyes for accurate disorder classification.

C. EYE PUPIL TRACKING
Traditional computer vision methods use edge features, intensity gradient distributions, and intensity thresholds to detect the eye pupil [44], [45]. A more advanced feature-based technique uses edge segment selection and conditional segment combination for pupil detection with high accuracy [46]. Recently, deep learning-based networks have been used to robustly detect pupils even under extreme conditions with reflections and occlusion by eyelids [47], [48].
In this work, the pre-trained CNN model for localizing the pupil proposed in [48] is adapted, and the pupil position is monitored in the proposed system to classify horizontal beating so that the four types of lateral canal BPPV disorders can be categorized. To this end, an algorithm is proposed that processes the pupil location to detect when a beating happens and to measure the speed of horizontal beating for lateral canal BPPV.

III. METHODOLOGY
This section describes the details of the proposed BPPV disorder diagnosis system, referred to as LAD, which takes as input the video recording of the movement of both eyes together with posture labels. Seven posture labels aligned with the eye movement are provided. Fig. 3 shows the seven postures: sitting, head on the bed turned left, head on the bed turned right, head hanging 45° to the left, head hanging 45° to the right, lying down, and head bending in a sitting position.
Based on these positions, the system learns to capture the relevant features of both eyes from the RGB video and from the output of an optical flow CNN. A bidirectional gated recurrent unit (BiGRU) framework then categorizes the input video into the two types of posterior canal BPPV (PC) or other (not PC). If the video is classified as not PC, the beating feature of the eye is extracted to categorize it into one of the four remaining BPPV disorder types.
FIGURE caption: Here, ∆x1 is the distance the eye travels during AB in the direction of increasing pixel coordinate, and ∆x2 is the distance the eye moves in the direction of decreasing pixel coordinate. Up-beating and down-beating are determined similarly when the beating direction is vertical. A beating is counted if the beating velocity is above a threshold.
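The two-stage flow described above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: `stage1` and `stage2` are hypothetical callables standing in for the BiGRU classifier and the beating-based classifier, and the `"Not_PC"` label string is a placeholder, not an identifier taken from the system.

```python
from typing import Any, Callable

# Label strings for the posterior canal classes follow the paper's naming;
# "Not_PC" is a placeholder name chosen for this sketch.
PC_LABELS = ("Lt_PC_BPPV", "Rt_PC_BPPV")
NOT_PC = "Not_PC"


def diagnose(video: Any,
             stage1: Callable[[Any], str],
             stage2: Callable[[Any], str]) -> str:
    """Run the coarse ternary classifier first; fall through to the
    lateral canal classifier only when the result is not a PC type."""
    coarse = stage1(video)
    if coarse in PC_LABELS:
        return coarse
    # Not a posterior canal type: use the beating-based second stage.
    return stage2(video)
```

With stub classifiers, `diagnose("clip", lambda v: "Not_PC", lambda v: "Lt_Geo_BPPV")` returns the second-stage label, while a first-stage result of `"Rt_PC_BPPV"` is returned directly.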

ALGORITHM 1 (fragment).
...
9: else if flag_1 < 0 and flag_2 < 0 then
       if v_left < v_right then store beat_right
       else store beat_left,
       reset counters to 1.
...
17: end for

A. CLASSIFYING POSTERIOR AND LATERAL TYPES
Posterior canal disorder: It is specified by the torsional movement of the eye triggered under certain postures of the patient. The posterior BPPVs are categorized into two types: the first is the "left posterior canal" type (Lt_PC_BPPV), specified by clockwise torsional nystagmus along with a slight up-beating when lying with the head hanging, and the second is the "right posterior canal" type (Rt_PC_BPPV), with counter-clockwise torsional nystagmus along with a slight up-beating when lying with the head hanging (shown in Fig. 8). For a given video input, we first classify whether or not it belongs to the posterior canal types. To classify posterior BPPVs, we use a recurrent neural network with features extracted from a pre-trained CNN for the two eyes of the patient at five positions, as in Fig. 3c, d, e, f, and g. At each position, the eye's motion is captured by a bidirectional GRU followed by a fully connected layer with 128 neurons. The outputs of the five positions are concatenated into a 640-dimension vector and fed into a fully connected layer and a final softmax layer for classification (Fig. 7).
FIGURE caption: The whole architecture for detecting six BPPV disorders. In the first stage, the input video is categorized into three coarse classes: the two "posterior types" and "not posterior types". In the second stage, a CNN-based pupil detector is applied to track the eye pupils and detect the beating.
Lateral canal disorder: BPPV with horizontal beating of the eyes (shown in Fig. 4) should be distinguished from posterior canal BPPV (PC), which is associated with torsional nystagmus of the eye. Based on the beating direction of the patient's eyes relative to his/her body posture, it is categorized into geotropic and apogeotropic BPPV (next section). According to the Barany Society [49], horizontal canal (HC) BPPV is categorized into two types, canalolithiasis (HC-geo) and cupulolithiasis (HC-apo), for each ear (left and right). There are therefore four types of HC-BPPV in total: 1) right ear HC-geotropic (canalolithiasis), 2) right ear HC-apogeotropic (cupulolithiasis), 3) left ear HC-geotropic (canalolithiasis), and 4) left ear HC-apogeotropic (cupulolithiasis) [49]. To classify the different types of geotropic and apogeotropic BPPV, LAD extracts the number of beatings and the velocity magnitude of these beatings. This information is used to form a final 20-dimension feature vector for each given video input.

B. CLASSIFYING GEOTROPIC AND APOGEOTROPIC TYPES
Geotropic BPPV includes left and right lateral canal BPPV, which we denote as "Lt_Geo_BPPV" and "Rt_Geo_BPPV", respectively. In the same way, we have the apogeotropic BPPVs "Lt_Apogeo_BPPV" and "Rt_Apogeo_BPPV". The definitions of these disorders are summarized below. We performed experiments using an LSTM or a GRU for these types of BPPV and found that the results were no better than random guessing, which indicates that it is hard for an RNN to learn the horizontal beating. All four types of lateral canal BPPV are characterized by horizontal nystagmus, and end-to-end learning with CNN+RNN did not produce satisfactory classification results. Horizontal nystagmus occurs within a certain moment in time (20s-30s), and "strong beating" on each lesion side is the most important indicator for classifying these BPPV disorders (specialists perform the diagnosis by looking at particular characteristics of the horizontal nystagmus). In this case, the CNN+RNN model failed to capture the "strong beating" signal. The proposed Alg. 1 detects strong beating by measuring the speed of the beating, which is the most important indicator for classifying the different lateral canal types.
In the second stream, LAD extracts different types of features to characterize the horizontal beating and classifies the video into one of the four types of horizontal BPPV disorders.

1) Geotropic BPPVs
There are two geotropic BPPV types, defined as follows: left geotropic canal BPPV consists of geotropic nystagmus with stronger intensity on the left side, while right geotropic canal BPPV consists of geotropic nystagmus with stronger intensity on the right side. In both cases, the sides are considered from the patient's perspective for the postures.

2) Apogeotropic BPPVs
The two types of apogeotropic BPPV are differentiated in the opposite way to the geotropic types: left apogeotropic canal BPPV has apogeotropic nystagmus with stronger intensity on the right side, and right apogeotropic canal BPPV has apogeotropic nystagmus with stronger intensity on the left side.
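The four definitions above amount to a lookup from the nystagmus type and the side with stronger intensity to a disorder label. The following minimal sketch encodes that lookup; the function name and the string keys are illustrative choices, while the label names follow the notation used in this section.

```python
# Map (nystagmus type, side with stronger intensity) to the lateral canal
# BPPV label, exactly as defined in the text. Sides are from the patient's
# perspective.
LATERAL_CANAL_TABLE = {
    ("geotropic", "left"): "Lt_Geo_BPPV",
    ("geotropic", "right"): "Rt_Geo_BPPV",
    # Apogeotropic types are assigned to the side opposite the stronger beating.
    ("apogeotropic", "right"): "Lt_Apogeo_BPPV",
    ("apogeotropic", "left"): "Rt_Apogeo_BPPV",
}


def lateral_canal_type(nystagmus: str, stronger_side: str) -> str:
    """Return the lateral canal BPPV label for the given observation."""
    try:
        return LATERAL_CANAL_TABLE[(nystagmus, stronger_side)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {nystagmus!r}, {stronger_side!r}"
        ) from None
```

For example, `lateral_canal_type("apogeotropic", "right")` yields `"Lt_Apogeo_BPPV"`, reflecting the reversed side assignment of the apogeotropic types.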

C. EYE BEATING DETECTION
Normally, a video is recorded with all seven actions of the patient with the support of a skilled doctor. In some cases, "head bending" may not be included. We detect horizontal beating to acquire the visual features of the eyes for classifying the four types of lateral canal BPPV disorder. Different methods are available to detect and track the eye pupil, from classical feature-based techniques such as the Hough transform [45] to the more recent PuRe [46]. Recently, deep learning has become an attractive approach for tracking the pupil [47], [48]. Here we detect eye pupils with a CNN-based network inspired by Shaharam et al. [48]. In our dataset, described in Section IV, the clinical recordings of the two eyes are made in low-light conditions to trigger and clearly observe the symptoms of BPPV disorders. In addition, the size and shape of the pupils vary across patients, and in some cases the eyelid covers part or half of the pupil, for which the traditional pupil detector fails to detect the pupil correctly (producing many false positives). In practice, we find that the CNN-based detector [48] detects the pupil more robustly than traditional methods such as the Hough transform. Furthermore, the CNN runs on a GPU at high speed (∼120 FPS on a single NVIDIA Titan Xp GPU, compared with ∼70 FPS for the traditional Hough detector).
FIGURE caption: Lt_PC_BPPV type with clockwise torsion and upbeat. These characteristic eye movements are learned successfully by the proposed system, which is much more robust than the conventional template-matching-based method in [31].
The outputs are the coordinates (x, y) of the eye and the radius r of the detected pupil. The coordinates are used to calculate the velocity of the eye movement for beat detection. For each given video, the output is the set of detected beatings, each with a magnitude given by the velocity magnitude and attached to the frame ID at which the beating is detected. For each position of the patient, we obtain the total number of left beats, the total number of right beats, and the corresponding velocity magnitudes. This information is stacked to form the final feature of each video for distinguishing the four types of BPPV (geotropic and apogeotropic types). Algorithm 1 detects the horizontal beating using the coordinates of the eye pupil acquired from every video frame.
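To make the per-position bookkeeping concrete, the sketch below groups detected beats by patient position and reports, for each side, the beat count and the mean velocity magnitude. The input layout (tuples of frame ID, direction, and speed, plus a per-frame posture label sequence) is an assumption made for illustration, not the system's actual data format.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Beat = Tuple[int, str, float]  # (frame_id, "left" or "right", speed); assumed layout


def summarize_beats(beats: List[Beat],
                    posture_of_frame: Sequence[str]
                    ) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """Return {posture: {side: (beat_count, mean_speed)}}."""
    speeds = defaultdict(lambda: {"left": [], "right": []})
    for frame_id, direction, speed in beats:
        # The posture label aligned with the frame decides the bucket.
        speeds[posture_of_frame[frame_id]][direction].append(speed)

    summary = {}
    for posture, sides in speeds.items():
        summary[posture] = {
            side: (len(values), sum(values) / len(values) if values else 0.0)
            for side, values in sides.items()
        }
    return summary
```

A posture in which no beat was detected simply does not appear in the result, so a caller that needs fixed-length features would have to fill such entries with zeros.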
SPV and FPV are computed by Eq. 1. The coordinates (x, y) of the eye's pupil consist of x for the horizontal axis and y for the vertical axis. For horizontal canal BPPV, we focus on the horizontal component x only. Algorithm 1 illustrates how to detect one beating from the x coordinate of the eye. The beating velocity is computed by the following equation (Fig. 6):

$$v = \frac{\Delta x}{\Delta t} \tag{1}$$

where ∆x is the distance the eye travels within the space of the video frame during the time step ∆t. The beatings of the two eyes of the patient are plotted in Fig. 1 and Fig. 5; the behavior of both eyes is almost the same. If one of the eyes closes, we can still get the information from the other one. In the region from frame 1900 to frame 2200, when the patient turned his head right, right beating dominates (red points). A few moments later, when he turned left (frame 2500 to frame 3000), left beating appeared (blue points).
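As a rough illustration of Eq. 1 and of counting a beating when its velocity exceeds a threshold, the sketch below computes frame-to-frame horizontal velocities from pupil x coordinates and flags fast displacements. It is a simplified stand-in and not Algorithm 1 itself: the threshold value, the convention that increasing x means a right beat, and the function names are all assumptions. The default of 30 FPS matches the recording rate reported for the dataset.

```python
from typing import List, Sequence, Tuple


def frame_velocities(xs: Sequence[float], fps: float = 30.0) -> List[float]:
    """Eq. 1 per frame pair: v = dx / dt with dt = 1 / fps (pixels per second)."""
    return [(b - a) * fps for a, b in zip(xs, xs[1:])]


def detect_beats(xs: Sequence[float],
                 fps: float = 30.0,
                 threshold: float = 60.0) -> List[Tuple[int, str, float]]:
    """Return (frame_id, direction, speed) for every frame whose horizontal
    speed exceeds `threshold`. The threshold is an illustrative value."""
    beats = []
    for frame_id, v in enumerate(frame_velocities(xs, fps), start=1):
        if abs(v) > threshold:
            # Assumed convention: increasing pixel x is a right beat.
            direction = "right" if v > 0 else "left"
            beats.append((frame_id, direction, abs(v)))
    return beats


# Pupil x coordinates for five consecutive frames (made-up values).
print(detect_beats([10, 10, 15, 15, 11]))
# → [(2, 'right', 150.0), (4, 'left', 120.0)]
```

A real detector would also need to separate the slow and fast phases of each beat and to handle frames where the pupil is not found; this sketch ignores both.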

D. FEATURES FROM DETECTED BEATING
For each pair of left and right beatings, a softmax (Eq. 2) is applied to normalize the raw beating numbers to the range between 0 and 1. The pair of average beating velocity magnitudes is normalized in the same way. For horizontal canal disorders, we consider the beating at the positions in Fig. 3d, e, f, and g. With four positions of the patient, we have a total of eight normalized values of the left and right beating numbers. A binary value, 0 for the left side and 1 for the right side, denotes the side that has the maximum total beating, and a softmax normalization is applied to the pair formed by the maximum total beating on the left side and the maximum total beating on the right side:

$$\sigma(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{2} e^{x_j}} \tag{2}$$

where $i \in \{1, 2\}$ and $x = (x_1, x_2) \in \mathbb{R}^2$. After this step, we have three more features for positions d and e. Applying the same procedure to the average beating velocity of both sides gives three more features for positions d and e (lateral head turn test). Similarly, six features are extracted for positions f and g, when the patient is lying on the bed (with the head hanging left and right in the Dix-Hallpike maneuver). For the head bending (Fig. 3b) and lying down (Fig. 3c) positions, the beating direction is important, and the magnitude of beating is not considered when diagnosing lateral canal BPPV types.
Finally, we form a 20-dimension vector from each video that contains information on the horizontal beating of both affected sides of the patient. These feature vectors are used as input to a classifier such as a deep neural network or an SVM. The experimental results and analysis are described in the next section. We also conducted experiments with the method in [31], which is the most relevant to our work on nystagmus analysis for BPPV.
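The two-element softmax used for this normalization can be sketched as follows. The function name is illustrative, and subtracting the maximum before exponentiating is a standard numerical-stability step added here, not something stated in the text.

```python
import math
from typing import Tuple


def softmax_pair(left: float, right: float) -> Tuple[float, float]:
    """Normalize a (left, right) pair so both values lie in (0, 1) and sum to 1."""
    m = max(left, right)  # shift by the max to avoid overflow in exp
    e_left = math.exp(left - m)
    e_right = math.exp(right - m)
    total = e_left + e_right
    return e_left / total, e_right / total


# Equal beat counts on both sides give an even split.
print(softmax_pair(3, 3))  # → (0.5, 0.5)
```

Because the softmax exponentiates its inputs, a pair of raw counts that differ by more than a few beats is mapped very close to (1, 0) or (0, 1), so the normalized pair mostly encodes which side dominates.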

A. DATA COLLECTION
The videos were recorded in *.avi format at a resolution of 240 × 320 and 30 FPS. Each video includes three sub-videos: the left eye, the right eye, and the scene of the patient's whole body. As shown in [52], to observe the symptoms of the eyes clearly, the diagnostic tests were conducted in a dark environment. Two popular positional tests are generally used to diagnose BPPV disorder, and they are described in more detail below. The room is set up with low-light conditions to trigger BPPV symptoms in patients. The patient stays on the bed, and one doctor stands beside the bed to perform the Dix-Hallpike maneuvers and lateral canal tests. One camera is located 2.3 m from the patient to capture the whole scene of both patient and doctor. The first test is the Dix-Hallpike maneuver, which is used to diagnose the type of posterior canal BPPV [10]. This test is performed independently for the left and right sides. For the left side, the patient (equipped with a goggles camera attached to his head) is seated upright on the bed, and a doctor helps him turn his head 45° to the left and quickly lie down so that the patient's head hangs to the left off the bed (Fig. 3f). The right side test is similar, but the head of the patient is turned 45° to the right before lying down in a hanging position (Fig. 3g).
Each side test lasts less than thirty seconds, and the eyes of the patient are observed for any nystagmus that occurs during this period. In the Dix-Hallpike test, vertical-torsional nystagmus is often observed in either the clockwise or the counter-clockwise direction, which specifies the left posterior canal or right posterior canal BPPV type, respectively. The second test performed in the data collection is the lateral canal test [2]. This test is done by keeping the patient lying down (head on the bed) with the eyes looking straight at the ceiling and conducting two maneuvers. For the left side, the patient's head is quickly turned to the left (Fig. 3d). For the right side, the patient's head is quickly turned to the right (Fig. 3e). After each of these two maneuvers, the eyes of the patient are observed for less than thirty seconds to see whether any horizontal nystagmus occurs during this period.
In the lateral canal test, four types of BPPV are clearly observed: left and right geotropic lateral canal BPPV and left and right apogeotropic lateral canal BPPV. The patient's body (captured by a remote camera) and the two eyes (captured by goggles) were recorded simultaneously and stacked into one video with three sub-videos for the left eye, the right eye, and the body (to review the scene later). Each video belongs to a unique patient; the patients range from 10 to 70 years old and include both genders.
The two physical tests are recorded consecutively, resulting in a unified video with a duration of approximately three minutes and an average size of 15-20 MB. By observing the behavior of the patient's eyes in all the positions tested above, two otology experts in a hospital analyze and classify which kind of disorder the patient has. Recorded videos with subtle symptoms, low quality, or incomplete content were removed from the cohort. A total of 746 samples from patients with six classes of BPPV disorder were used in the experiments, with 406 samples for posterior canal types and 340 samples for lateral canal types.
Table 4 shows that posterior canal BPPV is more common and accounts for 26% and 29% of the samples for the left and right types, respectively. Table 3 compares different works on various data collections in BPPV analysis. Our dataset differs from previous datasets in that it includes the movements of both eyes aligned with the different postures of the patient (following the Dix-Hallpike test procedure).

B. TRAINING DETAILS
A pre-trained CNN, ResNeXt32x48d [53], is used as a feature extractor for both the spatial and temporal streams, which results in a 2048-dimension vector for each frame of the two eyes. We also tried Inception V3 as the feature extractor; however, its recognition accuracy was 5% lower than that of ResNeXt32x48d. For the classification of posterior BPPVs, we used a bidirectional GRU to learn the eye motions in the five corresponding postures of the patient.
In this part, each GRU network outputs a 128-dimension vector, and these outputs are stacked to form a final 640-dimension vector. A fully connected layer (512 neurons) and a softmax layer (3 neurons) were added to classify the given video into one of 3 classes: None_PC, "Lt_PC_BPPV", and "Rt_PC_BPPV". All of these networks used the ADAM optimizer [54] with default hyper-parameters for training and the cross-entropy loss for classification. The learning rate was set to 0.001.
For the classification of the four lateral canal types of BPPV, a 20-dimension vector was extracted from each video and then fed into a linear-kernel SVM classifier [55]. The SVM was a design choice: it performed similarly to FC+softmax and is simpler, requiring less memory.

C. EVALUATION METRICS
To evaluate performance, the following metrics were used: accuracy, precision, recall, and F1-score. Accuracy is the proportion of correctly predicted samples over the total data. Precision is the proportion of predicted samples that are correctly classified. Recall is the proportion of actual samples that are correctly classified. The F1-score measures the performance of the model over all data by balancing precision and recall; its best value is 1 and its worst value is 0. These standard measures are computed by the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP, TN, FP, and FN denote True Positive, True Negative, False Positive, and False Negative, respectively. The total of 746 data samples is split randomly into training and validation sets with a ratio of 90%/10% three times. The aforementioned metrics are calculated on the three validation sets and averaged. The split procedure follows the one used to obtain the splits of the JHMDB/HMDB datasets for action recognition [56], [57]. Instead of the 7:3 split of HMDB, we used a 9:1 split for the evaluation.
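For reference, these four measures can be computed from the confusion counts of a single class as in the sketch below. The function name and the choice to return 0.0 when a denominator is zero are assumptions of this example, and the counts in the usage line are made-up values.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion counts.
    Undefined ratios (zero denominator) are reported as 0.0 by convention here."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for one class, for illustration only.
print(classification_metrics(tp=3, tn=5, fp=1, fn=1))
```

For a multi-class problem such as the six BPPV types, these per-class values would be computed one class at a time (treating that class as positive) and then averaged; how the averaging was done in the experiments is not specified here.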

D. RESULTS
We show that the four lateral canal types of geotropic and apogeotropic BPPV can be classified well with the proposed method. Fig. 10 visualizes the feature vectors of the lateral BPPV types in a 2D plane using t-SNE [58]. It shows that these vectors are clearly separable and belong to four separate groups.
These feature vectors are constructed from the measured horizontal beating velocity magnitude and the beating intensity, which are the crucial clues for distinguishing the horizontal canal BPPVs. Table 6 shows a validation accuracy of 95% for the lateral canal types, 93% for the posterior canal types, and 91% overall. For posterior canal BPPV, the classification results are shown in Fig. 9, Table 6, and Table 5. To visualize how the gated recurrent unit (GRU) differentiates the posterior canal BPPV types, the features from the top layer of the model in Fig. 7 are extracted. The classification results and the t-SNE plot demonstrate that the aforementioned characteristic motion of the eyes is captured by the bidirectional GRU model (BiGRU) to distinguish posterior canal types (torsional movements) from lateral canal types (horizontal beatings). Overall, the whole system classifies the videos in the validation set with precision, recall, and F1 scores of 0.91, 0.90, and 0.90, respectively, as shown in Table 8. Table 7 compares the baselines with our method. It shows that the method in [31] has an accuracy of 48.7% on our dataset, which is much lower than our 91.0%. Low resolution, low-light conditions, and a partially obscured iris make this a challenging problem, and the simple template-matching-based method is not robust enough to capture the torsion in the iris. The comparison also shows that our algorithm performs far better than the template-matching method [31], which is not robust to sporadic eyelid closure.
Alg. 1 detects frames with beating, which is important information for specialists diagnosing lateral canal BPPV. It works like an attention mechanism that filters out all frames with no beating, a capability that the method in [31] does not have. The pupil detection is performed independently to mimic how experts perform beating (nystagmus) detection. Table 3 lists recent works in BPPV research. Each reference has a separate dataset and a different number of classes for nystagmus analysis. The work most related to ours is that of Lim et al. [31]. They collected data from ten different positions of the patient; however, they did not succeed in applying an RNN to classify the disorders and used handcrafted features as the feature descriptor. In contrast, our dataset contains 7 postures of the patient, and we successfully applied an RNN (a bidirectional GRU) to classify BPPV disorders and proposed an algorithm that detects horizontal beating and filters out all irrelevant frames without beating, which yields more accurate information for classifying the torsional movements as well as the horizontal nystagmus in the given video input. It is important to categorize the video input well in the first stage so that the next stage can use its result to diagnose the rest of the BPPV types. For comparison with the full feature-based method in [31], which is closest to our work, we reimplemented that technique with template matching for torsional categorization on our dataset (Fig. 14). The comparison results are reported in Table 7.

V. CONCLUSION
This paper proposes a deep learning-based system to support the doctor in automatically diagnosing BPPV disorder effectively based on the visual motions of the patient eyes in the standard maneuver tests.Abnormal eye behaviors are triggered and these features are captured and classified into six different types of BPPV.The torsional movements of the eyes are successfully captured by a BiGRU network, while the horizontal nystagmus is distinguished by the normalized beating feature that is extracted from the eye pupil's horizontal coordinate.
By accurately detecting the horizontal beating that occurs at specific timestamps in the video, the final features of the input video are extracted and mapped to the disorder types.

FIGURE 14. The process of converting an annulus in Cartesian coordinates to a rectangle in polar coordinates, as described in the method of [31]. Panel labels: annulus, rectangle in polar coordinates, template.
A two-stream deep network architecture is constructed to extract different types of features for accurate classification. The proposed method classifies the disorders with an accuracy of 93% for the posterior canal types and 95% for the geotropic and apogeotropic BPPV types.

VI. DISCUSSION AND FUTURE WORKS
Our work focuses on applying artificial intelligence to investigate BPPV disorders in medical data, where the disorders can be observed through visual movements in recorded videos. This can aid doctors significantly as the number of patients increases and, with it, the amount of video data collected every year. Recent advances in deep learning, such as CNNs, demonstrate powerful capabilities for learning image and video representations from carefully collected datasets [59], which also proved beneficial on a custom dataset such as ours.
Our work has some limitations. In particular, integrating features of the patient's body with the eye movement features did not give the desired performance. Therefore, we currently use labels that mark the exact posture of the patient at each moment in the video. To make the whole process a fully automatic diagnosis, the feature extractor for the human body should capture subtle details more accurately, such as when the head turns left or right, which is crucial in diagnosing BPPV disorders.
The more recent Vision Transformer (ViT) framework [60] has emerged as a promising approach that outperforms CNNs in some visual recognition tasks. It is a potential backbone for improving the automatic diagnosis of BPPV disorders. ViT is a powerful model but is computationally heavy, especially on video datasets. We leave it for future work.

FIGURE 2. Four examples from the collected video dataset. Following the Dix-Hallpike test [10], the patient's body was placed in different poses for the diagnosis of BPPV disorders, with visual observation of the corresponding abnormal eye movements. Each video contains three sub-videos, which record the left eye, the right eye, and the body pose. The black area does not contain information.

FIGURE 3. A normal procedure of BPPV diagnosis, which collects video data of the eyes associated with seven positions of the patient's body: (a) the patient sits upright; (b) the head is bent while sitting; (c) the patient lies down while looking straight at the ceiling; (d) while lying on the bed, the patient quickly turns the head to the left; (e) while lying on the bed, the patient quickly turns the head to the right; (f) from the sitting position, the patient turns the head 45° to the left and lies down with the head hanging over the edge of the bed; and (g) from the sitting position, the patient turns the head 45° to the right and lies down with the head hanging over the edge of the bed. Actions (b), (c), (d), and (e) are generally referred to as the "lateral head turns" test, while actions (a), (f), and (g) belong to the "Dix-Hallpike" test.
The extracted scalars are formed into a grid image for each video and fed into a standard CNN classifier. The method of Lim et al. is based on hand-crafted features for diagnosing BPPV disorders, which limits its performance.

FIGURE 5. Horizontal beating detection for the left eye of a patient diagnosed as positive for type Lt_Geo_BPPV, over 9676 video frames. Red points are right beatings and blue points are left beatings. The horizontal axis is the frame number and the vertical axis is the velocity.

FIGURE 6. The horizontal beating is determined by the slow phase velocity (SPV) and the fast phase velocity (FPV). Panel (a) illustrates a left-beating eye: the eye moves from point B to point C (decreasing pixel coordinate) faster than it moves from A to B. Panel (b) shows the right-beating behavior, where the eye moves from point A to point B faster than it moves from point B to point C. These two beating behaviors are opposite to each other. Here, ∆x1 is the distance the eye travels along AB in the direction of increasing pixel coordinate, and ∆x2 is the distance the eye travels in the direction of decreasing pixel coordinate. Up-beating and down-beating can be determined similarly when the beating direction is vertical. A beating is counted if the beating velocity is above a threshold.
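
The rule illustrated in Fig. 6 can be sketched in a few lines of Python. The sketch below is a simplified illustration and is not Alg. 1 itself: it splits a sequence of pupil x-coordinates (one per frame) into runs of constant direction, compares the velocity of each rising run (A to B) with that of the falling run that follows it (B to C), and labels the beat by the direction of the faster phase. The function names, the use of the fast phase velocity as the thresholded "beating velocity", and the pixel-per-frame unit are all assumptions.

```python
def monotonic_runs(xs):
    """Split pupil x-coordinates into runs of constant direction.

    Returns a list of (signed displacement in pixels, duration in frames).
    Frames without movement extend the current run.
    """
    runs = []
    disp, frames, sign = 0.0, 0, 0
    for prev, cur in zip(xs, xs[1:]):
        step = cur - prev
        step_sign = (step > 0) - (step < 0)
        # A change of direction closes the current run.
        if sign != 0 and step_sign != 0 and step_sign != sign:
            runs.append((disp, frames))
            disp, frames = 0.0, 0
        if step_sign != 0:
            sign = step_sign
        disp += step
        frames += 1
    if frames:
        runs.append((disp, frames))
    return runs


def detect_beats(xs, threshold):
    """Label each rise-then-fall pair of runs as a 'left' or 'right' beat.

    A pair is counted only if its faster phase reaches `threshold`
    (pixels per frame); this thresholding choice is an assumption.
    """
    beats = []
    runs = monotonic_runs(xs)
    for (d1, n1), (d2, n2) in zip(runs, runs[1:]):
        if d1 > 0 and d2 < 0:          # A->B rising, then B->C falling
            v_rise = d1 / n1           # delta-x1 per frame
            v_fall = -d2 / n2          # delta-x2 per frame
            if max(v_rise, v_fall) < threshold or v_rise == v_fall:
                continue
            # The beat direction follows the fast phase.
            beats.append("left" if v_fall > v_rise else "right")
    return beats
```

For example, a sawtooth trace that drifts slowly toward larger x and snaps back quickly, such as `[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0]`, is labeled as left-beating by this sketch, while the mirrored trace is labeled as right-beating.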

Alg. 1 (fragment, steps 12 to 14):
else if flag1 > 0 and flag2 < 0 then
12: Calculate slow phase velocity: v_left ← ∆x / counter1. Calculate fast phase velocity: v_right ← ∆x / counter2.
13: else
14:

FIGURE 7. The whole architecture for detecting six BPPV disorders. In the first stage, the input video is categorized into three coarse classes: the two posterior canal types and the non-posterior type. In the second stage, a CNN-based pupil detector is applied to track the eye pupils and detect the beating.
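
The two-stage decision flow of Fig. 7 can be summarized by the following minimal Python sketch. Here `stage1` and `stage2` are hypothetical callables that stand in for the BiGRU-based coarse classifier and the pupil-detector stream, respectively; their interfaces, and the helper `final_accuracy`, are assumptions made for illustration only.

```python
# Coarse labels that end the diagnosis after the first stage.
PC_LABELS = ("Lt_PC_BPPV", "Rt_PC_BPPV")


def diagnose(video, stage1, stage2):
    """Return the final label for one video.

    stage1(video) is assumed to return "Lt_PC_BPPV", "Rt_PC_BPPV", or a
    non-posterior label; stage2(video) is assumed to return one of the four
    lateral canal labels. Stage 2 runs only for non-posterior videos.
    """
    coarse = stage1(video)
    if coarse in PC_LABELS:
        return coarse
    return stage2(video)


def final_accuracy(videos, labels, stage1, stage2):
    """Fraction of videos whose final predicted label matches the true label."""
    correct = sum(
        diagnose(video, stage1, stage2) == label
        for video, label in zip(videos, labels)
    )
    return correct / len(labels)
```

Because the second stage only sees what the first stage passes on, an error in the first stage cannot be recovered later, which is why the quality of the coarse classification matters for the final accuracy.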

FIGURE 8. The posterior canal BPPV types: (a) Rt_PC_BPPV, characterized by counterclockwise torsion sometimes accompanied by slight upbeating, and (b) Lt_PC_BPPV, characterized by clockwise torsion and upbeating. These characteristic eye movements are learned successfully by the proposed system, which is much more robust than the conventional template-matching-based method in [31].
Fig. 1 shows the horizontal beating over time for a video of a patient who is positive for left lateral canal BPPV.

FIGURE 9. t-SNE plot for the first stage, which classifies videos into three BPPV groups: Lt_PC_BPPV, Rt_PC_BPPV, and None_PC. The 746 data points (the whole dataset) are scattered on a 2D plane. Best viewed in color.

FIGURE 11. Visualizing the learned features of the None_PC class.

FIGURE 13. Visualizing the learned features of the Rt_PC_BPPV class.

TABLE 3. Comparison of recent works on the analysis of nystagmus. The performance is reported on the dataset accompanying each work. The abbreviations are as follows: PCA: Principal Component Analysis, MNN: Multilayer Neural Network, VNG: Video-nystagmography, FLD: Fisher Linear Discriminant, CNN: Convolutional Neural Network, VOG: Video-oculography, F-CNN: Fully Convolutional Neural Network, BiGRU: Bidirectional Gated Recurrent Unit.

TABLE 4. The collected data were split into train and test sets with a ratio of 90% to 10%. "Per class rate" refers to the proportion of each class in the whole dataset.

TABLE 5. Input data are coarsely categorized into three groups using the BiGRU: non-posterior canal (None_PC_BPPV), left posterior canal (Lt_PC_BPPV), and right posterior canal (Rt_PC_BPPV). The evaluation is performed on three different validation splits and the results are averaged. The None_PC_BPPV data are then classified into the remaining four classes of lateral canal BPPV.

with three classes: Lt_PC_BPPV, Rt_PC_BPPV, and None_PC_BPPV, showing that the BPPV types with rotational features are well extracted.

TABLE 6. Accuracy of the two stages and of the final process. In Step 1, the data are categorized into the posterior canal types (Lt_PC_BPPV, Rt_PC_BPPV) and None_PC_BPPV. In Step 2, the None_PC_BPPV data are classified into the four types of lateral canal BPPV. Finally, the accuracy of the final predicted labels is calculated.

TABLE 7. Comparison of the baselines and the proposed method in terms of validation accuracy on our dataset. † indicates that we implemented the method of [31] on our dataset. The Non-Information Rate (NIR) is the proportion of the largest class in the collected dataset.
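
For reference, the NIR equals the accuracy obtained by always predicting the most frequent class, so it serves as a lower bar for any useful classifier. A minimal Python sketch follows; the function name `no_information_rate` is an assumption, and the labels in the usage note are toy values rather than the class counts of our dataset.

```python
from collections import Counter


def no_information_rate(labels):
    """Proportion of the most frequent class among `labels`."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)
```

With toy labels made of five samples of one class, three of a second, and two of a third, the function returns 0.5.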

TABLE 8. The classification results for the six classes of BPPV disorders. The results are averaged over three different validation splits. LC stands for "lateral canal" and PC stands for "posterior canal".
samples from three classes that have been learned by using the top layer trained in the proposed model. The top layer of the proposed model produces 1280-dimensional feature vectors, which are reshaped into 2D feature maps (10 × 128) for visualization. It shows that the classes "None_PC", "Lt_PC_BPPV", and "Rt_PC_BPPV" have been extracted