A Siamese Convolutional Neural Network for Identifying Mild Traumatic Brain Injury and Predicting Recovery

Timely diagnosis of mild traumatic brain injury (mTBI) remains challenging due to the rapid recovery of acute symptoms and the absence of evidence of injury in static neuroimaging scans. Furthermore, while longitudinal tracking of mTBI is essential for understanding how the disease progresses or regresses over time and for enhancing personalized patient care, a standardized approach for this purpose is not yet available. Recent functional neuroimaging studies have provided evidence of brain function alterations following mTBI, suggesting that mTBI-detection models can be built based on these changes. Most of these models, however, rely on manual feature engineering, and the optimal set of features for detecting mTBI may be unknown. Data-driven approaches, on the other hand, can uncover hidden relationships in an automated manner, making them suitable for the problem of mTBI detection. This paper presents a data-driven framework based on a Siamese convolutional neural network (SCNN) to detect mTBI and to monitor the state of recovery from mTBI over time. The proposed framework is tested on cortical images of Thy1-GCaMP6s mice, obtained via widefield calcium imaging in a longitudinal study. Results show that the proposed model achieves a classification accuracy of 96.5%. To track the state of the injured brain over time, a reference distance map is constructed, which, together with the SCNN model, is employed to assess the recovery state in subsequent sessions after injury, revealing that recovery progress varies among subjects. These promising results suggest that a similar approach could potentially be applied to monitoring recovery from mTBI in humans.


I. INTRODUCTION
Depending on the severity level of the injury, traumatic brain injury (TBI) is classified into mild, moderate, and severe, with the majority of patients falling under the mild TBI (mTBI) category [1]. Undiagnosed mTBI can lead to various short- and long-term issues, including cognitive impairment, fatigue, depression, irritability, and headaches. Existing brain imaging tools, such as computed tomography (CT) and magnetic resonance imaging (MRI), are typically unable to detect early-stage mTBI [2]. Moreover, diagnosing mTBI based on subjective patient reporting and self-description may lead to uncertainty in clinical decisions. Hence, there is a critical need for the development of objective and accurate tools to diagnose mTBI.
Recently, functional neuroimaging techniques, such as magnetoencephalography (MEG), electroencephalography (EEG), and functional MRI (fMRI), have been employed to study mTBI in order to develop models or find neurophysiological biomarkers for mTBI diagnosis. In [3], a supervised machine learning method was employed, achieving an accuracy of 79% for predicting mTBI using power spectra from MEG measurements of 25 mTBI patients and 20 healthy controls. The results of the work in [4] suggested that beta oscillations could be potential biomarkers for diagnosing mTBI. In [5], a combination of a symptom questionnaire, behavioral tests, and resting-state EEG measurements was used to classify participants into injured or control groups with an accuracy of 91%. Another study [6] utilized the average power in different frequency sub-bands and the alpha:theta power ratio in EEG as features to distinguish between mice with mTBI and a control group during wake stages; the study reported an accuracy of 86% using a convolutional neural network (CNN). The review in [7] suggested that fMRI biomarkers of resting-state brain connectivity, acquired within one month after the injury, hold potential for predicting the outcome in mTBI. In [8], multifeature analysis, incorporating diffusion-weighted imaging, magnetic field correlation, resting-state fMRI, and volumetrics, showed promising results in accurately distinguishing mTBI patients from controls, with up to 86% accuracy using minimal-redundancy maximal-relevance feature selection and a support vector machine (SVM). In [9], it was shown that subjects with mTBI exhibited a significant rise in fMRI resting-state connectivity between the cerebellum and sensorimotor networks, as well as between the left angular gyrus and precuneus; an SVM model using resting-state functional connectivity features achieved a maximum accuracy of 84.1% for separating mTBI subjects from healthy controls. The work in [10] utilized widefield calcium imaging recordings from Thy1-GCaMP6s mice and demonstrated that mTBI-induced changes in cortical functional connectivity are frequency dependent.
The animal and human studies mentioned above provide evidence of brain function alterations following mTBI, and suggest that models for detecting mTBI can be developed using functional neuroimaging techniques. However, the models in most of these studies have required manual feature engineering, in which a set of features needs to be manually determined, selected, and extracted from neuroimaging data. Feature-based approaches, however, may not be ideal for a problem as complex as mTBI detection, where the optimal set of features is typically unknown or difficult to identify. A data-driven approach, on the other hand, offers the potential to find patterns, relationships, and interactions within the data that may not be apparent through manual feature selection. Compared to feature-based models, data-driven models may provide a more automated and unbiased way to capture non-linear relationships in the data, and thus may be more suitable for the problem of mTBI detection.
Additionally, tracking the progression or regression of the disease and monitoring the response to treatment in patients with mTBI over time is critical for understanding the long-term effects of the condition and optimizing patient care. However, gathering longitudinal data, which involves collecting information from the same individuals at multiple time points, can be a complex and resource-intensive process. A further challenge arises from the absence of a reference point against which to assess how the condition of the brain changes over time following mTBI. As such, limited work exists on this topic. The work in [11] examined various assessment methods, including neurocognitive tests, postural stability, and serum biomarkers, spanning from baseline to six months post-injury, suggesting that such investigations can contribute to the development of mTBI-recovery models. In [12] and [13], investigators presented evidence of abnormal functional and structural connectivity in the early phase of mTBI, followed by compensation in functional and structural connectivity in later stages, suggesting the involvement of compensatory mechanisms in brain function after mTBI.
This paper presents a data-driven approach for monitoring the recovery progress of subjects affected by mTBI. The proposed approach utilizes a Siamese CNN (SCNN) to distinguish subjects with mTBI from sham and healthy groups. In contrast to feature-based methods, the proposed data-driven SCNN model receives pairs of cortical images at its input, without the need for explicit feature extraction. The trained SCNN model is then used to form a reference distance map that establishes boundaries for identifying the healthy, sham, and injured groups. The reference map serves as a baseline for comparing and assessing changes in the brain's state over time following injury. The proposed framework is tested on cortical images of Thy1-GCaMP6s mice, obtained via widefield calcium imaging in a longitudinal study, to monitor the recovery progress of each animal's brain following mTBI. To the best of our knowledge, this is the first study to present a data-driven approach for tracking the progression/regression of mTBI using an SCNN.
The rest of the paper is organized as follows. Section II describes the data obtained from the longitudinal study along with the proposed framework. Results are presented in Section III, and discussions are given in Section IV. The paper is concluded in Section V.

II. METHOD
We first describe the experimental data used in the study and then discuss the proposed framework.

A. Description of the Dataset
Experiments were run at the Department of Cell Biology and Neuroscience of Rutgers University, and the data were shared with us. All procedures were approved by the Rutgers University Institutional Animal Care and Use Committee. Spontaneous cortical activity of Thy1-GCaMP6s transgenic mice was acquired via widefield calcium imaging, at a rate of 100 frames/s, in multiple recording sessions. This imaging technique offers high spatial resolution and enables longitudinal studies of brain function [14], [15], [16], [17], [18], [19], [20]. Each recording session consisted of 8 trials, each lasting 20.47 s.
Details of the experimental procedures were previously described in [10]. Briefly, mice underwent cortical Ca2+ transient activity recording via a transparent-skull transcranial window. On the day of injury, mice were anesthetized with isoflurane. A small craniotomy (approximately 1 mm in diameter) was performed on the left frontal motor cortex using a dental drill, while keeping the dura intact. Mice were randomly divided into two groups, the injury group and the control (sham) group, with 9 animals in each. In the injury group, trauma was induced in the motor cortex through the craniotomy by activating a controlled cortical impact (CCI) device calibrated for mild trauma. The sham group experienced probe activation without contact. Mice in both groups underwent similar procedures, except that the sham group did not experience the injury.
Each mouse in the sham group underwent two recording sessions: one recorded before craniotomy (session 1) and another after craniotomy (session 2). Each mouse in the injury group had a total of seven recording sessions: one session recorded 20 minutes before inducing the injury (session 1), and six sessions recorded at 20 minutes (session 2), 60 minutes, one day, three days, one week, and two weeks after inducing the injury. Data for the one-week and two-week post-injury sessions for animal 5, and for the two-week post-injury session for animal 9, were unavailable due to experimental issues. Fig. 1 visually summarizes the recording sessions obtained from each group.
With a camera frame rate of 100 frames/s, each trial generated 2047 (20.47 s × 100 frames/s) images of size 100 × 100 × 1. The pixel values of the acquired images were between 0 and 6000, with only 2% of pixels having a value greater than 5000. To standardize pixel values across images, a maximum limit of 5000 was applied, replacing any pixel value exceeding 5000 with 5000. Pixel values were then rescaled to lie in the range [0, 1] to enhance network stability and performance [21], [22]. Lastly, to improve the signal-to-noise ratio, each trial was represented by one image obtained by averaging all the images acquired during that trial.
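The preprocessing steps above (capping pixel values at 5000, rescaling to [0, 1], and trial averaging) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and array shapes are assumptions.

```python
import numpy as np

def preprocess_trial(frames, clip_max=5000.0):
    """Preprocess one widefield imaging trial as described in the text.

    frames: array of shape (n_frames, 100, 100) with raw pixel values
            (here roughly in [0, 6000]).
    Returns one trial-averaged image of shape (100, 100) in [0, 1].
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Cap pixel values at 5000 (only ~2% of raw pixels exceed this limit).
    clipped = np.minimum(frames, clip_max)
    # Rescale to [0, 1] for network stability and performance.
    scaled = clipped / clip_max
    # Represent the trial by the average image to improve the SNR.
    return scaled.mean(axis=0)
```

Averaging is done last so that each of the 8 trials per session yields exactly one 100 × 100 input image for the network.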

B. Proposed Model

1) Siamese Convolutional Neural Network (SCNN):
The Siamese neural network (SNN) [23], [24] is a class of neural network architectures in which the network learns to measure the similarity or dissimilarity between pairs of inputs [25]. The SNN architecture consists of two identical subnetworks with identical weights. By comparing the outputs of these subnetworks, the SNN can quantify the similarity between the two inputs; during training, the SNN learns a similarity function [26], [27]. SNNs are suitable for applications with small datasets [28], [29] and for tasks in which comparing and classifying pairs of data is crucial, and have been employed in various applications such as diagnosing spinal metastases [30], detecting diabetic retinopathy in retinal fundus photographs [31], tracking cardiac motion [32], diagnosing Alzheimer's disease [33], and visual tracking [34].
A Siamese convolutional neural network (SCNN) is a type of SNN that employs identical CNNs in its two subnetworks, making it well-suited for tasks involving image comparison [35], [36], [37]. An SCNN takes a pair of images as input and evaluates the similarity between them.
In the SCNN model used in this work (Fig. 2), we employed ResNet [38] for the CNNs. ResNet has been utilized in various medical imaging applications, demonstrating promising results [39], [40], [41]. The ResNet model was pre-trained on the ImageNet dataset [42], and the softmax layer at the end of the network was removed.
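The twin-subnetwork structure of Fig. 2 can be illustrated with a minimal sketch. Here a single shared linear map stands in for the shared pre-trained ResNet backbone; the class name, dimensions, and seed are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class TinySiamese:
    """Minimal Siamese sketch: a single shared linear embedding stands in
    for the shared ResNet backbone of Fig. 2. Both inputs pass through the
    SAME weight matrix, so weight sharing holds by construction; the output
    is the L2 distance between the two embeddings."""

    def __init__(self, in_dim=100 * 100, feat_dim=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((feat_dim, in_dim)) * 0.01  # shared weights

    def embed(self, image):
        # One branch of the twin network: flatten the image and project it.
        return self.W @ np.asarray(image, dtype=np.float64).reshape(-1)

    def distance(self, img_a, img_b):
        # Similarity score: L2 distance between the two branch outputs.
        return float(np.linalg.norm(self.embed(img_a) - self.embed(img_b)))
```

Because both branches call `embed` with the same `W`, identical inputs always map to distance 0, and the distance is symmetric in its two arguments.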
2) Contrastive Loss: The contrastive loss [43], [44] is a distance-based loss function used to make the model generate similar feature embeddings when the samples belong to the same class, and dissimilar feature embeddings when the samples belong to different classes. The contrastive loss function, L, is defined as

L = (1 − Y) (1/2) D_w^2 + Y (1/2) {max(0, m − D_w)}^2,    (1)

where Y represents the true label (i.e., Y = 0 indicates similar images and Y = 1 indicates dissimilar images), D_w is the distance between the feature embeddings of the input images, and m is a margin hyperparameter [45]. It can be seen from (1) that dissimilar pairs of samples contribute to the loss only if their distance is less than m.
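Eq. (1) can be written directly as a per-pair function; this is a plain scalar sketch (a training implementation would compute it batched in an autodiff framework).

```python
def contrastive_loss(D_w, Y, m=1.0):
    """Contrastive loss of Eq. (1) for a single pair.

    Y   : true label (0 = similar pair, 1 = dissimilar pair).
    D_w : distance between the two feature embeddings.
    m   : margin; dissimilar pairs contribute only while D_w < m.
    """
    similar_term = (1 - Y) * 0.5 * D_w ** 2
    dissimilar_term = Y * 0.5 * max(0.0, m - D_w) ** 2
    return similar_term + dissimilar_term
```

Note how the margin works: once a dissimilar pair is pushed farther apart than m, its loss drops to zero, so the model stops spending capacity on pairs that are already well separated.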
3) Training Procedure: Three classes were considered: healthy, sham, and injured. For the healthy class, data was selected from session 1 of the animals in the injury group. For the sham class, data was selected from session 2 of the animals in the sham group, and for the injured class, data was selected from session 2 of the animals in the injury group (Fig. 3). To form the training, validation, and testing sets for each class, 6 of the 8 available images for each session were selected for training, 1 for validation, and 1 for testing. Each class comprises a total of 9 mice; therefore, each class contributed 54 images for training, 9 images for validation, and 9 images for testing.
The input to the model is a pair of images with a binary label stating whether the two images are similar (i.e., belong to the same class) or dissimilar (i.e., belong to different classes). If both images belong to the same class, the label is 0; otherwise, it is 1. Pairing the 3 × 54 = 162 training images two at a time yields a total of C(162, 2) = 13,041 image pairs for training. The Adam optimizer [46] was used (with an initial learning rate of 0.0005, β1 = 0.9, β2 = 0.99) with a batch size of 32. During training, early stopping was applied if the validation loss did not improve after three consecutive epochs.
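The pair construction described above can be sketched as an exhaustive enumeration over the pooled training images; the function name and the use of string identifiers are illustrative assumptions.

```python
from itertools import combinations

def make_training_pairs(images_per_class):
    """Enumerate all unordered pairs over the pooled training images.

    images_per_class: dict mapping class name -> list of image identifiers.
    A pair is labeled 0 if both images come from the same class (similar),
    and 1 otherwise (dissimilar), matching the labeling used in the text.
    """
    pool = [(cls, img) for cls, imgs in images_per_class.items() for img in imgs]
    return [((img_a, img_b), 0 if cls_a == cls_b else 1)
            for (cls_a, img_a), (cls_b, img_b) in combinations(pool, 2)]
```

With 3 classes of 54 images each, this produces C(162, 2) = 13,041 pairs: 3 · C(54, 2) = 4,293 similar pairs and 3 · 54 · 54 = 8,748 dissimilar pairs.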

III. RESULTS

A. SCNN for Detecting mTBI
Here, we investigate how the choices of 1) the feature dimension (fd) in the final layer of the CNNs, 2) the distance function (df) used in the SCNN, and 3) the ResNet structure impact the accuracy (AC) of the SCNN model in correctly identifying the class of the images. The fd in the final layer of each CNN corresponds to the number of dimensions used to represent the learned features for each input image, and captures the important characteristics of the input that are relevant for the similarity/dissimilarity comparison in the Siamese network. We considered three values for fd: 3, 10, and 100. For the distance function used to assess the similarity between embedded feature vectors, we considered the L1 norm and the L2 norm. For the ResNet structure, we considered ResNet-18, ResNet-34, ResNet-50, and ResNet-101.
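The model selection above amounts to a small grid search over the three choices. The sketch below shows the structure of that search; `evaluate_model` is a hypothetical stand-in for training and validating one SCNN configuration.

```python
from itertools import product

# Hyperparameter grid explored in the text.
FEATURE_DIMS = (3, 10, 100)
DISTANCE_FNS = ("L1", "L2")
BACKBONES = ("ResNet-18", "ResNet-34", "ResNet-50", "ResNet-101")

def grid_search(evaluate_model):
    """Evaluate every (fd, df, backbone) combination and return the best
    configuration together with its accuracy."""
    results = {cfg: evaluate_model(*cfg)
               for cfg in product(FEATURE_DIMS, DISTANCE_FNS, BACKBONES)}
    best = max(results, key=results.get)
    return best, results[best]
```

The grid has 3 × 2 × 4 = 24 configurations, small enough to evaluate exhaustively rather than with a randomized search.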
Table I summarizes the accuracy results for each scenario. The highest accuracy is achieved when the model uses ResNet-34 as the CNN architecture, a feature dimension of 3, and the L2 norm as the distance function. With these choices, the SCNN achieves an accuracy of 96.5%.

B. SCNN to Predict Recovery Over Time
As discussed in Section II-A, the animals in the injury group (n = 9) went through six imaging sessions at different time points following injury. There is a potential for the animals to recover from the injury as time progresses. Here, we propose to use the trained SCNN model to track this recovery.

1) Reference Distance Map: To establish a baseline, the L2 norm distance between each of 5 anchor images, randomly selected from the healthy class of the training set, and each image in the testing set is computed using the trained SCNN; for each test image, the median of the 5 distances is used (Fig. 4). The resulting distances form a reference map (Fig. 5). As can be seen, three distinct intervals are identified in Fig. 5. The median L2 norm between anchor images and images taken from the healthy group is less than 1. This is expected, as both sets of images belong to the same healthy class; the non-zero result could be due to subject variability, since the anchor and testing images may belong to different animals. The median L2 norms between anchor images and images from the sham and injury groups lie within (1.1, 2.2) and (2.2, 3.6), respectively. Compared to the healthy group, the L2 norm distance from the anchor images is increased for the images from these two groups, with the images from the injury group showing the maximum distance. This is expected, since among the three classes, images from the injury group are expected to have the least similarity to images from the healthy group. This map, with the established boundaries, is then utilized as a reference to determine the label of images obtained in sessions following session 2 in animals from the injury group, as a means of tracking their recovery.
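The reference-map step can be sketched as follows. The `embed` callable stands in for one trained SCNN subnetwork, and the interval cut points are read off the description of Fig. 5; the exact values and names are assumptions for this sketch.

```python
import numpy as np

# Interval boundaries from the reference distance map of Fig. 5
# (healthy < 1, sham within (1.1, 2.2), injured within (2.2, 3.6));
# the exact cut points here are assumptions for the sketch.
BOUNDARIES = {"healthy": (0.0, 1.0), "sham": (1.1, 2.2), "injured": (2.2, 3.6)}

def median_anchor_distance(embed, test_image, anchor_images):
    """Median L2 distance between a test image and the anchor images,
    computed in the embedding space (cf. Fig. 4)."""
    e_test = embed(test_image)
    return float(np.median([np.linalg.norm(e_test - embed(a))
                            for a in anchor_images]))

def classify_by_reference_map(distance, boundaries=BOUNDARIES):
    """Assign a class label by locating the distance in the reference map."""
    for label, (lo, hi) in boundaries.items():
        if lo <= distance < hi:
            return label
    return "out_of_range"
```

Taking the median over the 5 anchor distances makes the per-image score robust to any single atypical anchor.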
2) Tracking Recovery Over Time: To determine how each animal in the injury group responds to injury over time, we propose to use the trained SCNN model along with the anchor images and the reference distance map (Fig. 5). The trained SCNN model is used to calculate the L2 norm distance between images from each post-injury session and the anchor images. These distances are compared with the reference distance map in Fig. 5 to predict the class label (healthy, sham, or injured) to which the images belong, which serves as a way of tracking post-mTBI recovery in each animal.
The results for each of the 9 animals in the injury group are shown in Fig. 6. The x axis indicates the sessions, with S1, S2, S3, S4, S5, S6, and S7 corresponding to recordings done at 20 minutes prior to, and 20 minutes, 60 minutes, one day, three days, one week, and two weeks after injury, respectively. The figure plots the L2 norm computed using the SCNN model, which receives images from each session paired with the anchor images. The distance boundaries for the three classes, based on the reference map (Fig. 5), are also shown. As discussed earlier, data for sessions S6 and S7 for animal 5 and session S7 for animal 9 were not available.
As seen in Fig. 6, for all animals, the computed L2 norm for images of session 1 is below 1. This is expected, since images from this session belong to the same class as the anchor images (healthy class). On the other hand, the computed L2 norm for images of session 2 is, for all animals, the maximum among all sessions. This is also expected, as images from session 2 (20 minutes after injury) are likely to be the least similar to the anchor images, which belong to the healthy class. For session 3, the L2 norm for images from all animals except animals 2 and 7 still falls within the injury boundary, suggesting that the impact of injury in most animals is still pronounced. As time progresses, however, the computed L2 norm decreases, indicating that the images are becoming more similar to the anchor images. By session 7, the L2 norm for all animals falls within the healthy/sham range, suggesting that the animals have recovered from the injury after two weeks. However, the recovery progress does not follow the same timeline for all, highlighting variability in the rate and trajectory of recovery among animals. The results suggest that the proposed approach could be an effective way to track the recovery process following injury.
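The per-animal tracking can be sketched as mapping each session's median anchor distance to a label. The distances in `example` are hypothetical values that merely mirror the trend described above (peak at S2, gradual decline), and the cut points are assumptions taken from the reference-map description.

```python
def classify(distance):
    """Map a median anchor distance to a class label using the reference
    intervals of Fig. 5 (healthy < 1, sham up to 2.2, injured beyond);
    the exact cut points are assumptions for this sketch."""
    if distance < 1.0:
        return "healthy"
    if distance < 2.2:
        return "sham"
    return "injured"

def track_recovery(session_distances):
    """Label each session by its distance, giving a per-animal recovery
    trajectory like the ones plotted in Fig. 6."""
    return [(session, classify(d)) for session, d in session_distances]

# Hypothetical distances mirroring the trend described in the text:
# S1 pre-injury, S2 peak right after injury, then a gradual decline.
example = [("S1", 0.4), ("S2", 3.2), ("S3", 2.8), ("S4", 2.0),
           ("S5", 1.6), ("S6", 1.2), ("S7", 0.8)]
```

Applied to `example`, the trajectory moves from healthy (S1) through injured (S2, S3) and sham-range sessions back to healthy by S7, matching the qualitative recovery pattern reported for most animals.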

IV. DISCUSSION
Compared to more severe cases of TBI, mTBI remains underdiagnosed due to the lack of standardized clinical criteria and the limitations of structural imaging, such as MRI or CT, in detecting injury caused by mTBI [47]. Recent studies have provided evidence that mTBI can lead to changes in brain function, suggesting that functional neuroimaging together with machine learning can be utilized to develop mTBI detection models. The majority of these studies, however, have used feature-based approaches, where a set of hand-crafted features from neuroimaging data had to be selected and extracted [48], [49]. Data-driven models, on the other hand, offer the advantage of automatically learning and extracting relevant features from the data, thereby eliminating the need for manual feature selection and extraction, which can be time-consuming and subject to bias. These approaches can reveal complex patterns and relationships in the data that may not be apparent through traditional feature-based approaches, leading to improved accuracy and robustness.
In this paper, we presented a data-driven model based on an SCNN that receives a pair of cortical images as input to identify whether an image is associated with mTBI. Since the SCNN works based on the concept of similarity/dissimilarity, a reference distance map was formed to identify boundaries for the distances between anchor images, belonging to the healthy class, and test images. For our experiments, we randomly selected 5 images from the training set of the healthy class as anchor images. Choosing other sets of 5 healthy images as anchors is not expected to yield significantly different results, as they all belong to the same (healthy) class. Indeed, Figs. 5 and 6 confirm this expectation, showing small distances between the randomly selected anchor images and other images of the healthy class, suggesting high similarity. Using the constructed reference distance map, one can assess the degree of similarity/dissimilarity of subsequent new test images with the anchor images and make predictions regarding the state of the brain. The reference map serves as a solution to the critical problem of the lack of ground truth regarding the state of the brain as time progresses following an injury. By incorporating the SCNN and leveraging the reference distance map, our data-driven model provides an effective tool for detecting mTBI and tracking recovery over time.

Fig. 6. Longitudinal tracking of mTBI recovery for each of the 9 animals in the injury group, over time. Labels S1, S2, S3, S4, S5, S6, and S7 correspond to recording sessions done at 20 minutes prior to, and 20 minutes, 60 minutes, one day, three days, one week, and two weeks after injury, respectively. The boundaries for the three classes of healthy, sham, and injured are set using the reference distance map in Fig. 5. Note that data for S6 and S7 for animal 5, and for S7 for animal 9, were not available.
Recent work [12], [13] has suggested the involvement of compensatory mechanisms of brain function following mTBI. When combined with longitudinal studies, recovery-tracking models such as the one proposed here have the potential to advance our understanding of such mechanisms. These models can also play a key role in assessing the efficacy of mTBI treatments over time, thereby benefiting patient outcomes and guiding clinical practice.
In our previous studies, we developed data-driven models for detecting mTBI from cortical activities of Thy1-GCaMP6s transgenic mice immediately following injury. In [50] and [51], we developed customized CNN models that directly accepted raw cortical images as inputs. These studies suggested that the spatial features of calcium imaging data learned via hierarchical layers are informative for discriminating healthy and injured brains. Additionally, we developed a convolutional autoencoder (CAE) model to extract the most informative features distinguishing images of injured and healthy brains [52]. In [53] and [54], we demonstrated that patch-level-based models can be employed for mTBI detection. These data-driven models, however, were limited to detecting mTBI immediately following injury, whereas the framework presented here extends this line of work to tracking of patients' recovery over time. The similarity or dissimilarity between newly acquired brain images and healthy brain images can serve as a metric for monitoring the recovery progress. However, thorough evaluation against established medical criteria and standards is essential to guarantee the effectiveness, reliability, and accuracy of this approach when applied to humans.
Due to the nature of animal experiments and particularly the complexities of longitudinal studies, this work, similar to other animal-based TBI studies [6], [55], [56], [57], [58], [59], [60], [61], had a relatively small number of subjects compared to human clinical trials. Nevertheless, our experiments yielded a large number of image pairs, which was sufficient to train the proposed SCNN model.
In the future, in addition to the spatial features, we plan to also include temporal information in the SCNN. In [50], we showed that considering both spatial and temporal features of calcium imaging data for identifying mTBI results in improved accuracy. We also plan to consider other loss functions, such as the triplet loss, to see if they improve performance. Furthermore, we plan to investigate the generalization capability of the proposed approach to data from subjects unseen during training.
Our study primarily utilized cortical images of mice. Animal models of TBI, especially those using transgenic mice, offer opportunities for studying mild, moderate, and severe brain injuries, due to the controlled nature of the injuries administered to the animals [57], [62], [63]. These models allow tracking through neuropathological and neurobehavioral metrics, which is critical for assessing the long-term effects of TBI and the efficacy of treatments [62]. Our study introduced the framework of employing a trained SCNN to monitor recovery over time after sustaining an injury. Additional validation and extension to human clinical trials are needed to ensure the applicability and generalizability of the proposed approach to humans.

V. CONCLUSION
In this paper, a framework based on an SCNN for detecting mTBI and tracking recovery was presented. The framework was tested on cortical images of Thy1-GCaMP6s mice obtained via widefield calcium imaging in a longitudinal study. A reference map was constructed to set the boundaries for the distances between anchor images (healthy group) and images from the healthy, sham, and injury groups. This reference map, along with the SCNN, was then utilized to identify the state of the brain of the animals in sessions following the injury. Results suggested that the animals recovered by the end of the longitudinal study; however, the recovery progress did not follow the same timeline for each animal. When used in longitudinal studies, the proposed data-driven framework has the potential to enhance our understanding of compensatory mechanisms in mTBI and enable the assessment of treatment effectiveness over time, ultimately improving patient outcomes.

Fig. 1 .
Fig. 1. Summary of the recording sessions obtained from the injury (n = 9) and sham (n = 9) groups. The animals in the sham group went through two recording sessions (one before and one after craniotomy), and the animals in the injury group went through seven recording sessions (one before injury, and six sessions at 20 minutes, 60 minutes, one day, three days, one week, and two weeks post-injury).

Fig. 2 .
Fig. 2. An overview of the SCNN architecture. A pair of images is given as input, and the two mirrored subnetworks, sharing the same weights (W), generate the feature vectors. The contrastive loss function calculates a loss value by assessing the distance (computed, for example, using the L1 or L2 norm) between the two feature vectors. During training, the networks' weights are adjusted through backpropagation.

Fig. 3 .
Fig. 3. Illustration of the selection of image pairs from the injury and sham groups for training the SCNN model. Three classes are considered: healthy (n = 9), sham (n = 9), and injured (n = 9). Images obtained from session 1 of the injury group form the "healthy" class. Images from session 2 of the control group form the "sham" class. Images from session 2 of the injury group (20 minutes after injury) form the "injury" class. In the training set, each class contains 54 images (9 subjects × 6 images), resulting in a total of C(3 × 54, 2) = 13,041 image pairs for training the SCNN model.

Fig. 4.
Fig. 4. The distance (L2 norm) between each of the 5 anchor images, randomly selected from the healthy class of the training set, and each of the 27 images (n = 9 per class) in the testing set is obtained. For each image in the testing set, 5 values of the L2 norm are obtained, and their median is then used.

Fig. 5 .
Fig. 5. Boxplot of the median L2 norms computed between each image in the testing set for each class (healthy n = 9, sham n = 9, and injured n = 9) and the 5 anchor images (randomly selected from the training set of the healthy class), using the SCNN. Three distinct intervals are identified, separating the classes. The intervals from this plot are then used as reference distances to monitor alterations in an injured brain in sessions following injury.

TABLE I
ACCURACY (AC) AND F1-SCORE OF THE SCNN MODEL FOR VARIOUS CHOICES OF FEATURE DIMENSION (fd), DISTANCE FUNCTION (df), AND ResNet STRUCTURE

Table II compares the contributions of this work with other data-driven mTBI identification models. In this study, we demonstrated the potential of using a trained SCNN to monitor recovery progress by comparing newly acquired brain images to healthy brain images. This application holds promise in clinical settings, enabling tracking of patients' recovery over time.