Deep Learning Based Walking Tasks Classification in Older Adults Using fNIRS

Decline in gait features is common in older adults and is an indicator of increased risk of disability, morbidity, and mortality. Under dual task walking (DTW) conditions, further degradation in the performance of both the gait and the secondary cognitive task was found in older adults, which was significantly correlated with falls history. Cortical control of gait, specifically in the pre-frontal cortex (PFC) as measured by functional near infrared spectroscopy (fNIRS), during DTW in older adults has recently been studied. However, the automatic classification of differences in cognitive activations under single and dual task gait conditions has not been extensively studied yet. In this paper, by considering single task walking (STW) as a lower attentional walking state and DTW as a higher attentional walking state, we formulate this as an automatic detection of low and high attentional walking states and leverage deep learning methods to perform their classification. We conduct an analysis of the data samples, which reveals characteristic differences between HbO2 and Hb values that are subsequently used as additional features. We perform feature engineering to formulate the fNIRS features as a 3-channel image and apply various image processing techniques for data augmentation to enhance the performance of deep learning models. Experimental results show that pre-trained deep learning models that are fine-tuned using the collected fNIRS dataset together with gender and cognitive status information can achieve around 81% classification accuracy, which is about 10% higher than the traditional machine learning algorithms. We present additional sensitivity metrics such as confusion matrix, precision and F1 score, as well as accuracy on two-way classification between condition pairings. We further performed an extensive ablation study to evaluate the contribution of factors such as the voxel locations, channels of input images, zero-padding and pre-training of the deep learning model to the classification task. Results showed that using a pre-trained model, all the voxel locations, and HbO2 - Hb as the third channel of the input image achieves the best classification accuracy.


I. INTRODUCTION
Mobility impairments are common in older adults, affecting their functional independence and leading to increased risk of disability, morbidity, and mortality [1]. Studies have shown that attention, which is sub-served by the pre-frontal cortex (PFC) and its related circuits, plays a key role in the higher order cognitive control of mobility [2]-[6]. Especially under complex and more taxing locomotion tasks such as dual task walking (DTW), allocating attention to competing task demands requires additional attentional resources [7], can result in degradation in both the gait and the secondary task performances, and is sensitive to aging, posing a key risk factor for incident frailty and falls [8], [9]. Hence, assessment and identification of cognitive resource allocation together with gait performance under simple and attention demanding walking conditions can be critical for incident risk assessment and prevention of falls in normal aging as well as in disease populations.
Motor control models of locomotion and robust associations between structural changes in frontal and subcortical brain regions and mobility outcomes have been established [10]-[12]. Even though converging evidence suggests a role for cognitive processes, specifically the executive functions, in explaining mobility performance and decline in older adults [2], [3], studies on the real time assessment and specific detection of functional neural correlates of simple and attention-demanding locomotion tasks are scarce. This gap could be in part due to the requirements of subject immobility and supine positioning during scanning procedures in traditional neuroimaging modalities, which make functional imaging of real, on the ground walking unattainable.
Recent studies have increasingly utilized an emerging neuroimaging modality, namely functional near infrared spectroscopy (fNIRS), to assess cortical control and functional correlates of mobility under simple and attention demanding dual task walking conditions in aging populations [4], [13]-[28]. fNIRS is an optics-based, non-invasive, safe, portable, and wearable neuroimaging technique [29]-[33], which can monitor relative changes in oxygenated-hemoglobin (HbO2) and deoxygenated-hemoglobin (Hb) associated with cognitive activity in real world tasks such as walking and talking.
While the tasks used in the investigation of functional brain mechanisms of mobility using fNIRS technology vary across studies, the most commonly implemented ones involve balance tasks, running, climbing the stairs, and STW and DTW conditions [4], [13]-[28]. Specifically, in prior fNIRS studies, reproducible and statistically significant increases have been found in HbO2 obtained from the PFC in DTW as compared to STW due to greater cognitive demands on attentional resources and gait performance that are inherent in the DTW condition [13]-[20], [27], [28]. Furthermore, it was found that cortical responses to task demands, specifically in the DTW condition, were moderated by age [28], gender and stress [13], fatigue level [14], medication use [16], and disease status including diabetes [17], Multiple Sclerosis (MS) [18], mild cognitive impairments [19], and neurological gait abnormalities [26].
Even though a growing number of studies that utilized fNIRS measures on older adults have repeatedly shown that hemodynamic biomarkers from the PFC can provide significant differences between STW and DTW conditions in healthy and disease populations, automatic classification of these tasks using machine learning algorithms has not yet been studied. Automatic detection of attentionally more demanding vs simple walking tasks using discriminative hemodynamic features extracted from HbO2 and Hb can provide information on an individual's use of his/her attentional resources during active walking. Such automated detection, indicative of attentional load during active walking, can help in real time identification or prediction of cognitive overload, loss of gait control, reduction in gait performance and even prevention of falls. Moreover, identification of selective features that can discriminate walking task conditions can also lead to further diagnosis, monitoring and automatic classification of different age-related disease conditions where PFC activations in DTW were found to differ.
fNIRS measures have been used in the classification of a wide range of tasks and disease populations in different age groups in prior studies. Some of these applications involve monitoring of mental workload, motor imagery, auditory and visual perception, various brain computer interfaces, pain assessment, anesthesia monitoring, attention deficit and hyperactivity disorder (ADHD) diagnosis, cognitive decline in traumatic brain injury, and diagnosis of various mental illnesses such as schizophrenia [34]-[42]. However, there are very few studies on the classification of gait related tasks. Existing studies primarily monitored motor areas and investigated classification of intention or preparation for different types of gait in healthy young adults, primarily for gait rehabilitation applications involving control of assistive devices, where classification accuracy was found to be in the range of about 80% [43]-[46]. In this small number of prior studies, fNIRS measures from the PFC during single and attentionally demanding dual task active walking conditions, which are indicative of different attentional states and cognitive load conditions in elderly populations, were not studied with machine learning models.
In this study, our aim is to achieve automatic classification of walking tasks requiring different levels of cognitive resources. Specifically, we develop a comprehensive pipeline for processing and engineering the collected fNIRS data to efficiently extract the features. We fine-tune pre-trained state-of-the-art deep learning models over the fNIRS dataset and obtain up to 81% accuracy, which is about 10% higher than the traditional machine learning algorithms. We also conduct ablation studies for identifying critical features when using fNIRS for classifying walking tasks of older adults.
This paper is organized as follows: In Section II, we introduce the information of the participants and our task protocol. In Section III, we explain our proposed methods in detail. We present the results of our comprehensive experiments in Section IV and finally, we provide concluding remarks in Section V. To the best of our knowledge, we are the first to apply deep learning methods in fNIRS-based walking task classification for older adults.

A. Participants
The study involved a total of n = 451 community dwelling older adults in Lower Westchester county, NY, of age 65 years and older (76.16 ± 6.67, 223 females) who were originally enrolled in a longitudinal cohort study entitled "Central Control of Mobility in Aging" (CCMA) [4], [25]. Recruitment procedures started with the identification of potential participants from population lists, followed by a structured telephone interview to obtain verbal assent and assess medical history, mobility and cognitive functioning. Participants with significant loss of vision and/or hearing, inability to ambulate independently, current or history of severe neurological or psychiatric disorders, and recent or anticipated medical procedures that may affect mobility were excluded from the study. Individuals who agreed to participate in the study, met the inclusion/exclusion criteria and passed the phone interview were invited to two annual in-person study visits, each lasting around 3 hours, at the research center at Albert Einstein College of Medicine, Bronx, NY. During these visits, participants received a structured neurological examination and comprehensive neuropsychological, psychological, functional, and mobility assessments. Functional brain monitoring using fNIRS during the STW and DTW protocol was completed in one session. Cognitive status was determined at consensus diagnostic case conferences [47]. The Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) was used to characterize the overall level of cognitive function [48]. The sample was relatively healthy (Global Health Status mean score = 1.62 ± 1.09) and in the average range of overall cognitive function (RBANS mean Index score = 91.77 ± 11.71). The work described in this manuscript has been executed in adherence with The Code of Ethics of the World Medical Association (Declaration of Helsinki) and the APA ethical standards set for research involving human participants. Written informed consents were obtained at the first clinic visit according to study protocols approved by the Institutional Review Board at Albert Einstein College of Medicine, Bronx, NY (Protocol #2010-224; Date: 03/03/2022).

B. Task Protocol
The task protocol used in this study involved two single task conditions and one dual-task condition presented in a counterbalanced order using a Latin-square design to minimize task order effects on the outcome measures. The single task conditions were 1) single-task walking (STW) and 2) the single task alpha cognitive interference task (STA). In the STW condition, participants were asked to walk at their "normal pace" around a 4 × 20 foot electronic walkway (Zenometrics system with Zeno electronic walkway using ProtoKinetics Movement Analysis Software (PKMAS), Zenometrics, LLC; Peekskill, NY). In the Alpha (STA) condition, participants were asked to stand still on the electronic walkway while reciting alternate letters of the alphabet out loud (A, C, E, ...) for 30 seconds. In the dual-task walking (DTW) condition, participants were required to perform the two single tasks at the same time by walking around the walkway at their normal pace while reciting alternate letters of the alphabet. Participants were specifically asked to pay equal attention to both the walking and cognitive interference tasks to minimize task prioritization effects. In both STW and DTW conditions, participants were asked to walk on the instrumented walkway in three continuous loops that consisted of six straight walks and five left-sided turns. The duration of each task condition varied depending on the individual's walking speed. Reliability and validity for this walking paradigm have been well established [49].

III. METHODS
An overview of the proposed methods utilized in this work is illustrated in Fig. 1. We have four major steps: data collection, data pre-processing, feature engineering and deep learning:
• Data Collection: Participants were asked to complete the task protocol as instructed, during which their hemodynamic activations were collected using fNIRS. In addition, we also collected subject-related data (gender and RBANS).
• Data Pre-processing: We applied different methods such as visual inspection, wavelet denoising, hemodynamic data conversion, and spline and low-pass filtering to obtain the HbO2 and Hb data of participants in the time domain for different task conditions.
• Feature Engineering: We formulated the pre-processed data as an image tensor. The first two channels of the image tensor represent the pre-processed Hb and HbO2 data, respectively. We also added another channel to the image tensor, which is the difference between HbO2 and Hb, i.e., HbO2 - Hb, referred to as the oxygen index [50].
• Deep Learning: We applied various deep learning algorithms using the PyTorch framework [51]. We used pre-trained deep neural network and vision transformer architectures that are available open-source, fine-tuned them with the engineered fNIRS data, and evaluated the model performance.

A. Data Collection
fNIRS System. We utilized the fNIRS Imager 1100 (fNIRS Devices, LLC, Potomac, MD) in this study to collect the hemodynamic activations in the PFC while participants were performing the task protocol [25], [29], [30], [52]. In this fNIRS device, the sensor consists of 4 LED light sources and 10 photodetectors configured as shown in Fig. 2, where each source-detector separation is set to 2.5 cm. The light sources on the sensor (Epitex Inc. type L4X730/4X805/4X850-40Q96-I) contain three built-in LEDs having peak wavelengths at 730, 805, and 850 nm, with an overall outer diameter of 9.2 ± 0.2 mm. The photodetectors (Texas Instruments, Inc., type OPT101) are monolithic photodiodes with a single supply transimpedance amplifier. With the given source-detector configuration and the serial data collection regime of the device, hemodynamic changes in the PFC can be monitored at a sampling rate of 2 Hz with 16 voxels as shown in Fig. 2.
During the fNIRS data collection procedure, the fNIRS sensor was first placed on the forehead of the recruited participants. A standardized sensor placement procedure based on landmarks from the international 10-20 system was implemented [52], [53], where the middle of the sensor was aligned with the nose horizontally and the bottom of the sensor was placed above the eyebrows vertically. Testing was conducted in a quiet room. Participants wore comfortable footwear and performed the task protocol with the fNIRS sensor attached to their forehead during the overall data collection period.

B. Data Pre-processing
First, visual inspection was performed on individual data from all voxels to identify and eliminate the ones with saturation, dark current conditions or extreme noise. Then, to eliminate spiky type noise, wavelet denoising with the Daubechies 5 (db5) wavelet was applied to the raw intensity measurements at 730 and 850 nm wavelengths as proposed in [54] and widely applied in fNIRS studies [55]. The artifact-removed raw intensity measurements were then converted to changes in HbO2 and Hb using the modified Beer-Lambert law (MBLL) [20], [30], [56]. In the MBLL, previously published values for the conversion parameters, i.e., wavelength and chromophore dependent molar extinction coefficients (ε) and age and wavelength adjusted differential pathlength factor (DPF), were used [20], [30], [57]. Finally, we applied spline filtering [58] followed by a finite impulse response low-pass filter with cut-off frequency at 0.08 Hz [20], [59] to the HbO2 and Hb data separately to remove possible baseline shifts and to suppress physiological artifacts such as respiration and Mayer waves.
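The final low-pass step can be illustrated with a minimal sketch. Only the 0.08 Hz cut-off and the 2 Hz sampling rate come from the text; the windowed-sinc design, the Hamming window, the filter length of 101 taps and the synthetic test signal are assumptions made for illustration and are not the filter used in the study.

```python
import numpy as np

FS_HZ = 2.0        # fNIRS sampling rate stated in the text
CUTOFF_HZ = 0.08   # low-pass cut-off stated in the text
NUM_TAPS = 101     # assumption: the filter length is not given in the text


def design_lowpass_fir(num_taps=NUM_TAPS, cutoff_hz=CUTOFF_HZ, fs_hz=FS_HZ):
    """Windowed-sinc low-pass FIR design (Hamming window), unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    fc = cutoff_hz / fs_hz                      # cut-off in cycles per sample
    taps = 2.0 * fc * np.sinc(2.0 * fc * n)     # ideal low-pass impulse response
    taps *= np.hamming(num_taps)                # taper to limit ripple
    return taps / taps.sum()                    # normalize so DC passes unchanged


def lowpass(signal, taps):
    """Filter a 1-D signal; mode='same' keeps the original number of samples."""
    return np.convolve(signal, taps, mode="same")


taps = design_lowpass_fir()
t = np.arange(0, 120, 1.0 / FS_HZ)              # 120 s of synthetic data
slow = np.sin(2 * np.pi * 0.02 * t)             # slow task-related trend (kept)
resp = 0.5 * np.sin(2 * np.pi * 0.3 * t)        # respiration-like component (suppressed)
filtered = lowpass(slow + resp, taps)
```

With these assumed settings the 0.3 Hz component falls well inside the stop band while the 0.02 Hz trend passes almost unchanged; a shorter filter would widen the transition band around 0.08 Hz.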
Data epochs corresponding to each task condition, STW, STA and DTW, were extracted to be used in further processing for feature extraction and machine learning model generation for automatic activity classification. fNIRS data acquisition and the electronic walkway system for gait analysis were synchronized using a central "hub" computer with E-Prime 2.0 software, where time stamps of start and end points for each baseline and task condition were marked and recorded [13]-[20], [27], [28]. In order to correctly extract the data epochs during the exact walking task execution periods, a second level processing time synchronization method was implemented. The HbO2 and Hb data epochs corresponding to the time interval between the first recorded foot contact with the walkway and the end of the 6th and final straight walk, algorithmically determined by PKMAS as previously described in [26], were extracted for STW and DTW conditions. Finally, proximal 10-second baselines administered prior to each experimental task were used to determine the relative task-related changes in the extracted HbO2 and Hb data epochs for each of the task conditions using the previously described baseline correction method (subtracting the average value of the proximal baseline region data from the following task epoch data) [13]-[20], [27], [28]. We then used HbO2 and Hb data epochs in DTW,
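The baseline correction described above (subtracting the mean of the proximal 10-second baseline from the task epoch) can be sketched as follows. Applying the mean separately per voxel, the array layout, and the synthetic values are assumptions for illustration.

```python
import numpy as np

FS_HZ = 2                      # sampling rate stated in the text
BASELINE_SAMPLES = 10 * FS_HZ  # 10-second proximal baseline -> 20 samples


def baseline_correct(baseline, task):
    """Subtract the per-voxel mean of the proximal baseline from the task epoch.

    baseline: array of shape (n_voxels, BASELINE_SAMPLES)
    task:     array of shape (n_voxels, n_task_samples)
    """
    baseline_mean = baseline.mean(axis=1, keepdims=True)
    return task - baseline_mean


# Synthetic example: 16 voxels, a 10 s baseline and a 45 s task epoch.
rng = np.random.default_rng(0)
baseline = 1.5 + 0.01 * rng.standard_normal((16, BASELINE_SAMPLES))
task = 2.0 + 0.01 * rng.standard_normal((16, 90))
relative = baseline_correct(baseline, task)
```

The corrected epoch then expresses task-related change relative to the immediately preceding rest period rather than an absolute level.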

C. Feature Engineering
In this sub-section, we first show the statistical analysis across and within subjects, then present the feature engineering, including input formulation for the subsequent deep learning algorithms.
Analysis across Subjects. We first illustrate the distribution of task completion time across subjects in Fig. 3. The task completion time differs between subjects and task conditions due to individual variability in normal walking pace. Note that the completion time of the STA is 30 seconds as defined in the task protocol. For the other two tasks, we notice that DTW on average requires more time to complete than STW, since DTW could be a more challenging task for older adults.
Additionally, we investigate the distribution of Hb and HbO2 values under the three task conditions as provided in the histogram plots in Fig. 4. For Hb levels, all three task conditions exhibit a Gaussian-like distribution where the averages are negative values close to 0, suggesting decreases relative to baseline conditions. The DTW condition shows a higher standard deviation than the two single tasks, suggesting more individual variability in this condition. For HbO2, the distribution of STW still shows a Gaussian-like pattern with an average at positive values close to 0. However, for DTW and STA, the distributions have shifted to more positive values, suggesting more cognitive activation relative to baseline in these task conditions as compared to STW. Comparatively, levels in DTW show a larger shift than in STA. Such distribution differences between Hb, HbO2 and HbO2 - Hb levels inspire us in this study to use them as additional features for the machine learning model.
Analysis within Subject. Although Fig. 4 indicates that the difference in hemodynamic activity levels of STW and DTW conditions can be leveraged as features, they could be different at the granularity of each voxel location for each individual subject. A representative subject's HbO2 recording under STW and DTW conditions at each voxel location is shown in Fig. 5. Note that, for this case, the time to complete DTW is higher than STW. We can observe from the plots that, although HbO2 levels in DTW are in general higher than those of STW, the range of values can vary drastically between voxel locations. In some channels and sample points, it can also be observed that HbO2 levels in STW are similar to or even higher than in DTW. Hence, there is a need to leverage the power of deep learning models beyond mere visual or statistical analysis to enable more accurate and individualized study, particularly to leverage the rich information revealed at each voxel location.
Input Formulation. Based on the observations from the analysis, we aggregate the HbO2, Hb and HbO2 - Hb levels together and formulate each sample as a 3-channel image as input to the subsequent machine learning model. The first and the second channel are the HbO2 and Hb levels, respectively, and the third channel is the difference between HbO2 and Hb levels. For the i-th sample in the dataset of each subject and task condition, the size is 3 × N_i × L_i, where N_i ≤ 16 refers to the number of forehead voxel locations and L_i = 2 × T_i refers to the number of sample points, which is twice the task completion time T_i (in seconds) since we use a sampling frequency of 2 Hz.
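The input formulation above can be sketched in a few lines. The channel order follows this sub-section (HbO2, Hb, HbO2 - Hb); the function name, the random data and the 40-second completion time are hypothetical.

```python
import numpy as np

FS_HZ = 2  # sampling rate stated in the text


def make_sample(hbo2, hb):
    """Stack HbO2, Hb and HbO2 - Hb into a (3, N_i, L_i) array."""
    if hbo2.shape != hb.shape:
        raise ValueError("HbO2 and Hb must share the same (voxels, samples) shape")
    return np.stack([hbo2, hb, hbo2 - hb], axis=0)


task_seconds = 40                         # hypothetical completion time T_i
n_voxels, n_points = 16, FS_HZ * task_seconds
rng = np.random.default_rng(0)
hbo2 = rng.standard_normal((n_voxels, n_points))
hb = rng.standard_normal((n_voxels, n_points))
sample = make_sample(hbo2, hb)            # one row per voxel, one column per time point
```

Each row of the resulting image is one voxel location and each column one sample point, so a longer task completion time directly produces a wider image.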
Although there are 16 voxel locations, fNIRS data from one or a few voxel locations are missing in some samples because of excessive artifacts that could not be cleaned. We screen the samples and accept those with no more than 3 missing voxel locations; any sample with more than 3 missing voxel locations is regarded as invalid and is not used in training or inference. The number of valid samples in the dataset after removing invalid samples is n = 1216. For the missing data, we use interpolation to reconstruct each sample into an image of size 3 × 16 × L_i.
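A minimal sketch of the screening and reconstruction step follows. The threshold of 3 missing voxels is from the text; encoding a missing voxel as an all-NaN row and filling it by linear interpolation over the voxel index are assumptions, since the interpolation scheme is not specified.

```python
import numpy as np

MAX_MISSING = 3   # screening threshold stated in the text
N_VOXELS = 16


def fill_missing_voxels(image):
    """Return a (3, 16, L) array with missing rows filled, or None if invalid.

    image: (3, 16, L) array in which a missing voxel is an all-NaN row
           (assumed encoding).
    """
    missing = np.isnan(image[0]).all(axis=1)
    if missing.sum() > MAX_MISSING:
        return None                                  # sample is screened out
    rows = np.arange(N_VOXELS)
    filled = image.copy()
    for c in range(image.shape[0]):
        for t in range(image.shape[2]):
            # Linear interpolation over the voxel index; missing edge voxels
            # take the nearest valid value because np.interp clamps at the ends.
            filled[c, missing, t] = np.interp(
                rows[missing], rows[~missing], image[c, ~missing, t]
            )
    return filled


# Toy image whose row values equal the voxel index, with voxel 5 missing.
image = np.tile(np.arange(N_VOXELS, dtype=float)[None, :, None], (3, 1, 4))
image[:, 5, :] = np.nan
repaired = fill_missing_voxels(image)
```

In the toy image the missing row is recovered from its two neighbors; a sample with four or more missing rows is rejected instead of repaired.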
Since the dataset is relatively small compared with general computer vision datasets, which usually contain more than tens of thousands of images, we use pre-trained deep learning models rather than training models afresh to prevent over-fitting. Additionally, due to the differences in task completion time, the dimensions of the images are not consistent. Therefore, we perform further image processing to conform the image dimensions to the input of deep learning models that are usually pre-trained with the ImageNet dataset [60], i.e., 3 × 224 × 224. Although such processing or resizing of inputs is common in deep learning, how the images are resized into 3 × 224 × 224 can potentially have an impact on the learning performance [61]. Note that, for each sample, data beyond the first 224 points are truncated while data with fewer than 224 points are padded with 0. We intentionally avoid row-wise interpolation (i.e., along the time axis) because we want the fNIRS data to keep the information on task completion time and pace of each subject, which can be an important marker.
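The truncate-or-pad rule for the time axis can be sketched as below. The target length of 224 and zero padding are from the text; padding at the end of the recording (rather than at the start or on both sides) is an assumption.

```python
import numpy as np

TARGET_LEN = 224  # input width expected by ImageNet-pretrained models


def fit_time_axis(image, target_len=TARGET_LEN):
    """Truncate or zero-pad the last axis of a (3, rows, L) array to target_len."""
    length = image.shape[-1]
    if length >= target_len:
        return image[..., :target_len]               # keep the first 224 points
    pad = [(0, 0)] * (image.ndim - 1) + [(0, target_len - length)]
    return np.pad(image, pad, mode="constant", constant_values=0.0)


short = np.ones((3, 16, 80))     # e.g. a 40 s task at 2 Hz
long = np.ones((3, 16, 300))     # e.g. a 150 s task at 2 Hz
short_fit = fit_time_axis(short)
long_fit = fit_time_axis(long)
```

Because no resampling takes place, the position where the zero padding starts still encodes how long the subject needed to complete the task.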
In order to find the optimal processing configuration, we sweep across 3 parameters that cover two commonly used resizing techniques for deep learning model inputs [61], namely interpolation and interleaved repetition. This yields 5 different configurations: bilinear or bicubic interpolation, each with or without corner alignment, plus interleave repeat. The 3 parameters are visualized in Fig. 6 and explained below:
• Interpolation mode: Interpolation inserts data between the original data points to enlarge the input size to 3 × 224 × 224. The available options are bilinear or bicubic interpolation.
• Corner alignment: This parameter decides whether to force corner alignment during interpolation. If true, the levels from the first and last voxel locations are used as corners so that no interpolated values appear at the corners of the processed image.
• Interleave repeat: As we have 16 voxel locations, we can repeat the data of each location 14 times in an interleaved manner to achieve the required dimension of 224 (row-wise) as an alternative to interpolation. Interleave repeat duplicates the data exactly row-wise, which appears visually as clear borders between the voxel locations, as illustrated in Fig. 6.
In summary, after feature engineering, we are able to process and enhance the original fNIRS samples, which have missing data and inconsistent dimensions, into images of shape 3 × 224 × 224 that are ready for the deep learning models.
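The voxel-axis resizing options listed above can be sketched as follows. Interleave repeat is shown exactly as described (16 rows repeated 14 times). For the interpolation options, only a linear variant along the voxel axis is shown, and the two coordinate mappings are the conventions commonly used for corner alignment in image resizing libraries; treating them as the mappings used in the study is an assumption, and bicubic interpolation is not implemented here.

```python
import numpy as np

N_VOXELS, TARGET_ROWS = 16, 224


def interleave_repeat(image):
    """Duplicate every voxel row 14 times: (3, 16, L) -> (3, 224, L)."""
    return np.repeat(image, TARGET_ROWS // N_VOXELS, axis=1)


def linear_resize_rows(image, align_corners):
    """Linearly interpolate along the voxel axis: (3, 16, L) -> (3, 224, L)."""
    dst = np.arange(TARGET_ROWS, dtype=float)
    if align_corners:
        # First and last output rows map exactly onto the first and last voxels.
        src = dst * (N_VOXELS - 1) / (TARGET_ROWS - 1)
    else:
        # Pixel-center convention; positions outside the grid are clamped.
        src = (dst + 0.5) * N_VOXELS / TARGET_ROWS - 0.5
    src = np.clip(src, 0, N_VOXELS - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, N_VOXELS - 1)
    w = (src - lo)[None, :, None]
    return (1 - w) * image[:, lo, :] + w * image[:, hi, :]


# The five configurations swept in the text.
configs = [(mode, aligned) for mode in ("bilinear", "bicubic")
           for aligned in (True, False)] + [("interleave", None)]
```

Interleave repeat keeps every original value untouched and produces block-like rows, whereas interpolation blends neighboring voxel rows into smooth transitions.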

D. Deep Learning
We leverage state-of-the-art deep learning models for machine vision and/or image classification. We first train the models afresh, i.e., without any pre-training. However, since the scale of the fNIRS dataset is rather small, the model easily overfits: the accuracy on the training set reaches more than 90% while the accuracy on the inference set is only around 70%. This is a common issue in medical imaging and bioinformatics applications, as clinical datasets are usually much smaller than large general image datasets. Thus, for domain specific applications with proprietary datasets, transfer learning [62] is usually favored: the model is first pre-trained using a general, public image dataset and then fine-tuned on the proprietary dataset. In this work, using ResNet as an example, we obtain an open-source, publicly available ResNet model that is trained on the ImageNet dataset, which has 1000 classes. We then change the dimensionality of the last classifier (the fully connected layer) from 1000 to 3 to match the three task conditions. We fine-tune this model using the processed images to fit the model on the fNIRS dataset for classifying the three task conditions. We implement our deep learning models using various architectures from two families: deep convolutional neural networks and vision transformer attention networks. Model details and configurations are introduced in the experimental results section, Sec. IV.
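The head replacement described above can be sketched in PyTorch. To keep the example self-contained, `TinyBackbone` is a hypothetical stand-in for an ImageNet-pretrained network with a 1000-way classifier; it is not one of the models used in the study. The Adam optimizer and learning rate of 0.001 follow the experimental setup in Sec. IV, while the cross-entropy loss and the random batch are assumptions.

```python
import torch
from torch import nn

NUM_CLASSES = 3  # STW, STA and DTW


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pre-trained network with a 1000-way head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, 1000)

    def forward(self, x):
        return self.fc(self.features(x))


model = TinyBackbone()  # in practice, pre-trained weights would be loaded here
# Replace the 1000-way classifier with a 3-way classifier.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()        # assumption: the loss is not stated

# One fine-tuning step on a synthetic batch of 3 x 224 x 224 images.
images = torch.randn(4, 3, 224, 224)
labels = torch.tensor([0, 1, 2, 0])
loss = criterion(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With the timm package mentioned in Sec. IV, a comparable effect is typically obtained by passing `pretrained=True` and `num_classes=3` to `timm.create_model`, which builds the model with a freshly initialized 3-way classifier.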

IV. EXPERIMENTAL RESULTS
In this section, we present the experimental results on the machine learning model for the classification of tasks in older adults.We evaluate the impact of different feature combinations as well as different machine learning algorithms on classification accuracy and computational efficiency.

A. Experimental Setup
Environment. We use PyTorch, a machine learning framework for Python [51], to implement the machine learning models. We split the samples into training and inference sets by 8:2, and the results are obtained with 5-fold cross validation. We use the Adam optimizer with a learning rate of 0.001 and halt fine-tuning when accuracy does not increase further; thus, the total number of epochs as well as the time for fine-tuning can vary across different models.
Models. We use state-of-the-art deep learning and/or machine vision models from two architectures: deep neural networks and vision transformer attention networks. We sweep across different architectures including deep convolutional neural networks such as ResNet [63], VGG [64], MobileNet [65], EfficientNet [66], and TinyNet [67]. We also evaluate recently emerging attention based vision transformer models [68]. All the pre-trained deep learning models are fetched from online open source repositories via the PyTorch Image Models (timm) package [69].
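The 8:2 split with 5-fold cross validation can be sketched as below. The sample count of 1216 comes from Sec. III-C; shuffling at the level of individual samples and the fixed seed are assumptions, since the text does not state how samples were assigned to folds.

```python
import numpy as np

N_SAMPLES = 1216   # number of valid samples reported in the text
N_FOLDS = 5        # each fold holds out about 20% of the data (8:2 split)


def five_fold_indices(n_samples=N_SAMPLES, n_folds=N_FOLDS, seed=0):
    """Yield (train_idx, inference_idx) pairs; each sample is held out once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, folds[k]


splits = list(five_fold_indices())
```

If samples from the same participant should never appear in both sets, the permutation could be drawn over participant identifiers instead of sample indices; the text does not say which variant was used.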
As a comparative study, we also implement several traditional machine learning models as baselines, namely decision tree, random forest (with 25 trees), and k-nearest neighbors (k = 5), using the scikit-learn package [70].
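The three baselines can be instantiated with scikit-learn as sketched below. The 25 trees and k = 5 are from the text; flattening each image into a feature vector, the fixed random seeds, and the small synthetic dataset are assumptions, since the text does not describe how the baselines consume the image-shaped inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

baselines = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=25, random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Synthetic stand-in for the fNIRS images: 60 samples of shape (3, 16, 20),
# three classes separated by a constant offset, flattened to vectors.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
images = rng.standard_normal((60, 3, 16, 20)) + labels[:, None, None, None]
features = images.reshape(len(images), -1)

train_acc = {}
for name, clf in baselines.items():
    clf.fit(features, labels)
    train_acc[name] = clf.score(features, labels)   # accuracy on the training data
```

An unpruned decision tree can memorize its training data, which is consistent with the over-fitting of the tree-based baselines reported in the classification results.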

B. Classification Results
Comparison with Baselines. Results of this classification task, including accuracy and error bars for the different learning models (baseline and deep learning), are presented in Table I.
We can first observe that deep learning models out-perform all the baseline traditional learning algorithms by at least 10% on the inference set. In particular, decision tree and random forest extremely over-fit the training data, with higher than 99% training accuracy but poor performance on the inference set. K-nearest neighbors can hardly learn efficiently, as its accuracy on the training set is only around 77%, which is considerably lower than that of all the other models. Some of the deep learning models also over-fit to different degrees. For larger models such as VGG, ResNet and ViT-Base, the accuracy on the training set is mostly over 94%, with VGG as the exception. Smaller models like MobileNetV2, TinyNet and EfficientNet usually over-fit less, as their accuracy on the training set is 90%-93%.
The top three models according to accuracy on the inference set are ResNet-18, VGG-13 and TinyNet-E. All three models achieve accuracy near 81%. We use these three models for the following ablation study and efficiency analysis.

C. Ablation Study
Dimension. We perform an ablation study to identify the contribution of features. First, we rank the contribution of each dimension (channel) of the input sample, i.e., Hb, HbO2 and the difference between HbO2 and Hb, to the classification performance. To do so, we remove one dimension of the image while keeping the data in the other two dimensions intact and observe the change in accuracy on the inference set. According to Fig. 7, removing any of the three dimensions causes accuracy degradation. However, model accuracy has different sensitivity towards different dimensions. Specifically, removing the HbO2 or the HbO2 - Hb dimension induces a larger accuracy drop than removing Hb. This aligns with the statistical analysis in Fig. 4: HbO2 levels differ in distribution across task conditions, which indicates more discriminative features, while Hb levels preserve a similar Gaussian-like distribution under the three task conditions and are hence not very selective.
Our results are also in line with the prior findings that HbO2, and hence the oxygenation, is more reliable and sensitive to locomotion-related changes in cerebral blood flow [71] and therefore provides the most distinctive features.
Fig. 7. "Full" refers to no removal of any data. "Hb", "HbO2" and "HbO2 - Hb" refer to removal of the "Hb", "HbO2" and "HbO2 - Hb" features from the overall analysis, respectively.
Image Processing. As shown in Fig. 6 (images are normalized for visualization purposes), we apply in total 5 different image processing configurations. As an ablation study, we analyze the performance impact of image processing by comparing the inference set accuracy under different configurations in Table II.
Hemisphere. We also try to characterize hemispheric contributions to the classification accuracy. We prune the input samples by removing all the data from voxel locations in one hemisphere (channels 1-8 for the left and 9-16 for the right), use only the remaining data for deep learning, and evaluate the model performance. Based on Fig. 8, we can observe that for the TinyNet-E model, data from the left hemisphere seem to contribute more, while for the ResNet-18 and VGG-13 models, removing data from the right hemisphere causes a larger accuracy drop. In general, removing data from either hemisphere results in accuracy degradation of up to 8%, indicating that it is preferable to use the data from all the voxel locations for better model performance.
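The channel and hemisphere ablations can be sketched as simple masks on a 3 × 16 × L sample. The channel order follows Sec. III-C and the voxel grouping follows the text; implementing "removal" by setting the values to zero is an assumption, since the text does not say whether removed data are zeroed or dropped from the input.

```python
import numpy as np

CHANNELS = {"HbO2": 0, "Hb": 1, "HbO2 - Hb": 2}              # order from Sec. III-C
HEMISPHERES = {"left": slice(0, 8), "right": slice(8, 16)}   # voxels 1-8 and 9-16


def drop_channel(image, name):
    """Zero one image channel of a (3, 16, L) array, leaving the other two intact."""
    out = image.copy()
    out[CHANNELS[name]] = 0.0
    return out


def drop_hemisphere(image, side):
    """Zero the eight voxel rows that belong to one hemisphere."""
    out = image.copy()
    out[:, HEMISPHERES[side], :] = 0.0
    return out


sample = np.ones((3, 16, 10))
without_hb = drop_channel(sample, "Hb")
without_left = drop_hemisphere(sample, "left")
```

Both helpers return copies, so the same original sample can be reused for every ablation variant before the resizing step.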

D. Model Efficiency
We also provide insights on the three models for their cost and overhead, including number of parameters, model size, time for model fine-tuning and inference, as well as the throughput (samples per second), in Table III. Time and throughput data are obtained with an NVIDIA Tesla P100 GPU.
ResNet-18 and VGG-13 use relatively larger model sizes and show lower throughput. Particularly for VGG-13, although the model is quite large with around 130M parameters, it does not show more competitive accuracy than the rest. For TinyNet-E, since the classifier layer (fully connected layer) output is reduced to 3, the model size becomes drastically compact and the throughput is nearly triple that of the other two models. However, based on our experiment, TinyNet-E requires more epochs to achieve comparable accuracy; thus, even though it is a smaller model, it takes more time than ResNet-18 to fine-tune.

V. CONCLUSION
Functional near infrared spectroscopy (fNIRS) is an optics-based, non-invasive neuroimaging modality, which is increasingly used as a safe and portable method to assess the cortical control of gait. Notably, fNIRS studies repeatedly show increased activation in the prefrontal cortex from single task walk (STW) to dual-task walk (DTW) conditions in older adults due to increased attentional demands in DTW, which is also an established risk factor for incident frailty, disability, and mortality. In this paper, we introduce and integrate emerging deep learning methods into the pipeline of using fNIRS measures based on oxygenated (HbO2) and deoxygenated hemoglobin (Hb) to detect and classify task conditions in older adults in order to assess their cognitive capabilities during single and dual task locomotion. We develop an extensive framework for data collection, pre-processing, feature engineering and deep learning, and leverage the outstanding learning capabilities of deep neural network models, which surpass traditional machine learning models by at least 10% in terms of classification accuracy. To the best of our knowledge, this is the first study to introduce deep learning methods in fNIRS-based single and dual task walking classification in older adults.

Fig. 1. Overview of the proposed framework. There are 4 major phases: 1) Data Collection: Subjects are asked to perform task protocols, and fNIRS data and task condition labels are collected. 2) Data Pre-processing: Signal processing methods are applied for de-noising, conversion and filtering of the raw collected fNIRS data. 3) Feature Engineering: fNIRS data are formulated and engineered into an image that is ready for the deep learning model to learn the features. 4) Deep Learning: Various deep learning models are trained or fine-tuned on the fNIRS dataset. The models are evaluated on a separate inference set on their task condition classification accuracy.

Fig. 2. fNIRS system of the sensor pad and the sensor placement on the forehead with 16 voxel locations.

Fig. 3. Histogram of task completion time of different subjects under three conditions.

Fig. 4. Histogram of Hb and HbO2 levels of different subjects under three task conditions.

As shown in Table II, although the accuracy can vary by up to 2% across the different image processing configurations, no single configuration dominates the others. Based on this observation, we conclude that the best configuration can vary between deep learning models and requires individual evaluation to select the one that best extracts the features from the fNIRS data.

TABLE II
TASK CLASSIFICATION ACCURACY COMPARISON UNDER DIFFERENT IMAGE PROCESSING CONFIGURATIONS
Models | bilinear, aligned | bilinear, not aligned | bicubic, aligned | bicubic, not aligned | interleave