Personalizing Loneliness Detection Through Behavioral Grouping of Passive Sensing Data From College Students

Loneliness among college students is an increasingly prevalent issue. While technology-based methods for detection using behavioral patterns have been proposed, there remains an opportunity for improvement, as insufficient attention has been given to the individual behavioral differences among students. Loneliness is a highly subjective experience and people’s routines differ, making it a challenge for generic models to accurately determine its presence and severity. In response to this challenge, it is particularly helpful to identify subgroups within the population that exhibit similar behavioral characteristics, enabling a more nuanced understanding and detection of loneliness. This paper introduces a novel approach to loneliness detection, leveraging a dataset gathered through passive sensing using mobile phones, which provides a rich source of user behavioral data. We utilized unsupervised clustering to find subgroups of students exhibiting similar behavioral patterns over time within the dataset. This approach is essential for continuous monitoring, identifying changes in behavioral patterns, and facilitating the early detection of loneliness. Using data from 41 students’ smartphones, we created group-specific classification models to identify loneliness. The group-based prediction models showed a significant improvement in accuracy over generalized models. These findings can lead to the development of more effective, tailored methods for loneliness detection in diverse populations. This study emphasizes the importance of personalized approaches in mental health interventions and highlights the potential of passive sensing data in creating tailored loneliness detection methods.


I. INTRODUCTION
Loneliness has been identified as a growing public health concern globally and is considered a key contributor to a variety of chronic health conditions [1]. Loneliness is an experience in which a person perceives a lack of quality social relationships [2]. The negative health effects of loneliness range from sleeplessness to increased anxiety, sadness, and a weakened immune system. It is a common condition that most individuals encounter at some point in their lives; yet, it can be harmful when it becomes chronic [3]. Loneliness and mental health issues have a cyclical relationship. Those with mental health issues are more than twice as likely to experience loneliness as those in good mental health [4]. In light of the adverse impacts and prevalence of loneliness after the COVID-19 pandemic, there is a need for early detection of loneliness in order to mitigate its subsequent consequences.
The associate editor coordinating the review of this manuscript and approving it for publication was Md. Moinul Hossain.
Traditionally, clinical measures have been used to determine the level of loneliness in individuals. These assessments or scales are well-established and have shown reliable results in detecting loneliness. However, the scores from these scales may be affected by individuals' mental state and other circumstances. In recent years, researchers have accumulated evidence supporting the use of passive sensing to determine an individual's mental wellbeing [5]. Passive sensing is a way for smartphones and wearable sensors to collect information about individuals without their active participation. These sensing modalities may capture data that can be transformed into bio-indicators of users' mental health, which could aid in detecting behaviors or patterns linked with loneliness. A scoping review [6] on the detection of loneliness by passive sensing examined the idea and state of the art in using data streams from smartphone sensors to monitor users' daily lives and activities, which can subsequently be used as bioindicators of loneliness.
Current passive sensing-based techniques for detecting loneliness use generalized models trained with all available data at once. In this method, the model learns broader patterns that are common among observations, which are then utilized to predict loneliness. Because the daily living patterns of different people can vary considerably, and this variation can degrade the performance of a general model, it may be useful to find subgroups that share similar behavioral patterns. Our hypothesis is that determining the loneliness levels of members of these subgroups can provide a more nuanced understanding of loneliness and its complex relationship with behavioral indicators. Moreover, it is also important to analyze group dynamics over time to monitor behavioral changes among people, which helps to detect changes in real time and track the evolution of groups. This helps in identifying behavioral patterns and trends that are not visible through traditional methods.
Our study provides a novel empirical evaluation of the effectiveness of group-based prediction models in identifying loneliness in students. We utilized incremental clustering to identify groups of students with similar behavior from a dataset and to monitor changes in these groups over time. Subsequently, we developed a loneliness prediction model for each group to assess whether group-based models outperform generalized models. In addition, this study revealed group-specific behavioral patterns within students. While clustering approaches have been utilized in various fields of study, their application to the understanding and classification of loneliness, particularly in the context of college students, is relatively novel. Our method of using dynamic behavioral changes and transitions between groups to reveal insights about loneliness is a unique aspect of our study. Furthermore, the performance decrease in mixed groups observed in our study provides valuable insights into the complexities of loneliness detection and can contribute to the development of more nuanced, effective models in the future. Consequently, we believe our study offers a meaningful contribution to the field of mental health and loneliness research.
This group-based method for detecting loneliness among college students presents multiple potential real-world applications. Our approach for detecting loneliness by identifying and analyzing behavioral patterns in student populations could be integrated into existing mental health apps or student support services in universities. This could lead to the development of more targeted and personalized interventions that consider the unique behaviors and experiences of different student subgroups. Moreover, the applicability of our method is not restricted to students or to loneliness detection. The same approach could be harnessed to detect other mental health conditions, such as depression or anxiety, providing valuable insights that can aid in proactive mental health care. Beyond the academic setting, this system could be beneficial in various other environments. In workplaces, for instance, it could be used to monitor employee well-being and inform workplace wellness initiatives. For older populations, especially those living in isolation, the model could aid in the early detection of loneliness symptoms, triggering timely interventions. Therefore, our study not only contributes a novel methodology for detecting loneliness in students but also opens up a range of promising opportunities for the broader application of this method in addressing mental health and well-being issues.
The remainder of this paper is structured as follows: Section II discusses the current literature in this field; Section III presents the methodology, including the dataset, data preprocessing, student subgrouping using clustering algorithms, and binary classification for loneliness detection; Section IV presents the results; Section V discusses them; and Section VI draws conclusions and outlines future research directions.

II. RELATED WORK
Several research studies have used smartphone sensors and wearable devices to gather passive sensing data and monitor daily behaviors and other health indicators, which may subsequently be utilized to predict the association between behavior and mental health. However, these studies present several limitations which our research aims to address. The existing literature for loneliness detection through passive sensing has been examined in our review article touching on various aspects of prior work, especially population, privacy, and validation issues [6].
VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Pulekar et al. [7] used machine learning to identify loneliness in nine students by passively collecting and analyzing smartphone data. However, the sample size was small, making it difficult to generalize to a larger population. Doryab et al. [8] passively collected data from 160 individuals using smartphones and fitness bands. They investigated SMS and call logs, location data, Bluetooth and WiFi addresses, and fitness tracker health data, and used machine learning models to infer the loneliness levels of individuals. Three analyses were presented to determine the viability of passively detecting loneliness through smart devices: statistical analysis (based on UCLA responses), data mining analysis (presenting behavioral patterns using smartphone and fitness band data), and machine learning analysis (loneliness detection and change in loneliness level over an academic semester using smartphone and fitness band data). However, the study only included participants who identified as lonely and analyzed the data only after the end of the study rather than throughout it. Because the study employed a single, generalized model for all individuals, it was difficult for the model to work effectively for new, unseen participants with distinct behavioral traits.
In a study by Sarhaddi et al. [9], the authors employed a generic machine learning approach to predict maternal loneliness using all the data collected from wearable devices at once. The study provides valuable insights by focusing primarily on physiological data captured from wearables. While this approach has its merits, it leaves potential for further exploration of the vast behavioral data obtainable from alternative sources such as mobile phones, and the consideration of individual behavioral differences that could play a significant role in loneliness experiences. Furthermore, the use of only two machine learning models, decision tree and gradient boosting, may not fully capture the complexity of loneliness, which is a highly subjective and multifaceted experience. In contrast, our study leverages passive sensing data from mobile phones and employs unsupervised clustering to identify subgroups within a broader demographic, leading to more nuanced and accurate loneliness detection.
In a recent study [10], authors proposed an approach to detect loneliness using a combination of data from wearable devices and smartphones. The researchers employed a generic machine learning model, specifically a random forest algorithm, to predict loneliness based on a comprehensive set of behavioral and physiological data collected from these devices. However, while the study's comprehensive approach and the use of multimodal data are commendable, it falls short in considering the individual differences among people and groups. Loneliness, being a highly subjective experience, varies significantly among individuals and groups, and these variations can significantly impact the accuracy and effectiveness of loneliness detection models. The study's use of a generic machine learning model, which processes all available data at once, might not fully consider the key individual and group differences that are important in the context of loneliness.
Some studies have investigated the use of ambient sensing methods to identify loneliness in smart home environments [11], [12]. Smart home settings may collect data on many aspects of human activity using a variety of sensors, such as video cameras for in-home monitoring and body-worn tags. In contrast, ambient sensors provide a less intrusive method of monitoring activity, enabling data gathering without the need for user participation. Recent research has used ambient in-home sensing to uncover patterns in human behavior, including emotions, daily routines, and personality, that may be predictive of loneliness. To infer these patterns, data collected from ambient sensors has been analyzed using machine learning algorithms. Nonetheless, earlier research used generalized machine learning models and did not account for inter-subject differences by applying group-based models for loneliness detection.
Some studies have used clustering-based models, but they have focused on stress [13], depression [14], and diabetes [15]. Moreover, these studies have not specifically targeted a student-based population, a demographic that presents unique challenges and considerations in behavioral patterns and mental health issues. All prior research on loneliness detection is based on generalized machine learning models for all available data, which do not account for inter-subject variability or behavioral variations across groups. Addressing these gaps, our study introduces a novel approach to the existing literature by applying a group-based model to detect loneliness specifically within a student-based population. This approach, informed by students' distinct and varying behavioral patterns, allows us to account for individual variability and group-specific differences. Consequently, our study not only fills a significant gap in current research but also introduces an innovative framework for loneliness detection in populations with unique behavioral characteristics and mental health contexts. This represents a substantial contribution to advancing personalized mental health interventions and predictive models.

III. METHODOLOGY
In this study, we used part of the publicly available StudentLife dataset [16]. This dataset was collected from 48 participating college students at Dartmouth College over a period of ten weeks using a smartphone app. The study was approved by the Institutional Review Board at Dartmouth College [16]. In our study, we used data collected through the accelerometer, microphone, light sensor, GPS, and Bluetooth sensors, along with data on participants' personality traits, which are presented in Table 2. The selection of these particular sensors is based on their significance shown in the literature for detecting loneliness [6]. The overall methodology is presented in Fig. 1.
The dataset includes responses from the UCLA loneliness scale [17], a 20-item questionnaire intended to measure subjective experiences of loneliness. Ten of these items are positively worded, and the remaining items are negatively worded. This scale evaluates loneliness on a range from 20 to 100, with higher scores indicating stronger feelings of loneliness. We have used these scores as ground truth for loneliness experiences. Scores over 44 indicate a significant feeling of loneliness. In the sample data, the lowest score was 25, and the highest score was 64. The UCLA loneliness scale was administered to students at two different time points: at the beginning (pre) and end (post) of the semester. The goal of these measurements was to capture any changes in students' experiences of loneliness over the course of the semester and to see if these changes corresponded with changes in their behavior as captured by the smartphone app.

A. DATA PRE-PROCESSING
During data preprocessing, we excluded the data of students who did not complete the post-study loneliness questionnaire. This process resulted in a refined dataset consisting of 41 students' data that were used in our study. Given the ten-week period of data collection, our dataset encompassed multiple data points for each student, collected through various sensors at different timestamps. Thus, this provided a comprehensive behavioral snapshot of these 41 students over the course of the ten-week study. Using each participant's timezone information, we transformed the UNIX timestamps of each sensor's data into a human-readable local date and time format. We split a 24-hour period into three sessions, as students participate in different activities depending on the time of day: the day session (9 am to 6 pm), the evening session (6 pm to 12 am), and the night session (12 am to 9 am).
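The timestamp conversion and session split described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `to_local_session` and the fixed UTC offset are assumptions made for the example, whereas the study used each participant's own timezone information.

```python
from datetime import datetime, timedelta, timezone


def to_local_session(unix_ts, utc_offset_hours):
    """Convert a UNIX timestamp to local time and label its session.

    utc_offset_hours is a hypothetical stand-in for the participant's
    timezone information.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromtimestamp(unix_ts, tz=tz)
    if 9 <= local.hour < 18:
        session = "day"        # 9 am to 6 pm
    elif local.hour >= 18:
        session = "evening"    # 6 pm to 12 am
    else:
        session = "night"      # 12 am to 9 am
    return local, session


# Example: a reading taken at 14:30 UTC by a participant at UTC-4.
local_time, session = to_local_session(14 * 3600 + 30 * 60, -4)
```

Treating the boundaries as half-open intervals (9 am belongs to the day session, 6 pm to the evening session) is a choice of this sketch; the text does not state how boundary timestamps were assigned.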
With regard to the class distribution, our dataset after preprocessing consisted of two primary classes based on loneliness scores: those who reported significant feelings of loneliness (scoring above 44 on the UCLA loneliness scale) and those who did not. Out of the 41 students, 11 students fell into the 'lonely' category, while the remaining 30 did not, providing us with a clear class distribution to work with during our analysis.
To address missing values, we first eliminated all records containing outliers and then imputed missing data for each student: for continuous features, we used the per-session median of that feature, and for categorical features, the per-session mode. Moreover, we generated behavioral features for each student using the Reproducible Analysis Pipeline for Data Streams (RAPIDS) [18]. We generated digital biomarkers by quantifying the per-participant and per-session behavioral patterns (i.e., routines, irregularity, and variability) in these student datasets using basic counts, standard deviations, entropy, and the regularity index as features. All calculated participant features were combined on a per-epoch basis (day, evening, and night). We extracted 117 features from sensor data; the RAPIDS documentation contains descriptions of these features [18], [19]. Categorical features were transformed to a numeric representation using one-hot encoding. To accommodate differences in student data, feature normalization (min-max scaling) was performed on numerical data, converting each value to the range [0,1]. As clustering works better with fewer features, we applied principal component analysis (PCA) to the whole dataset in order to reduce dimensionality.
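The imputation and min-max scaling steps can be illustrated with a short sketch. It assumes that missing readings are represented as `None` and that the values passed in belong to one student, one feature, and one session; the function names are hypothetical. Outlier removal, one-hot encoding, and PCA are omitted.

```python
from statistics import median, mode


def impute_session_feature(values, categorical=False):
    """Fill missing entries (None) with the median, or the mode if categorical."""
    observed = [v for v in values if v is not None]
    fill = mode(observed) if categorical else median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Example with made-up readings for one feature in one session.
filled = impute_session_feature([1200.0, None, 800.0, 1000.0])
scaled = min_max_scale(filled)
```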

B. FINDING BEHAVIORAL GROUPS FROM STUDENT DATA
Clustering to group or categorize participants based on their behaviours is highly dependent on the type of data and the nature of the cluster; hence, no single algorithm can be employed to address all clustering problems. Multiple factors, including algorithmic parameters, the number of features, and the ordering of data presentation, might affect the clustering outcomes [13].
Before using clustering methods, we normalized the data to ensure that each feature has a similar range of values, making each feature contribute equally to the distance computation when identifying clusters. This can result in more balanced clustering and improved cluster separation.
We investigated which clustering method is most effective at grouping students based on their behavior. For this purpose, we evaluated four clustering algorithms: K-means [20], OPTICS (Ordering Points To Identify the Clustering Structure) [21], Partitioning Around Medoids (PAM) [22], and IDBSCAN [23], to cluster the features from the mobile sensing data and obtain behavioral representations. The primary purpose of these clustering methods is to discover subgroups within the dataset that exhibit similar behavioral characteristics. The four clustering approaches analyze similarity between points in distinct ways, hence reflecting unique ways in which behaviors may be compared with one another and producing different cluster outputs.
We have used the elbow method to identify the optimal number of clusters for the k-means and PAM algorithms, as they require a fixed number of clusters to be defined in advance. The elbow method is one of the most popular methods for selecting the best number of clusters by fitting the model with a range of K values [24].
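As an illustration of the elbow method, the sketch below fits a plain k-means (Lloyd's algorithm) for several values of K and records the within-cluster sum of squares for each. It is a simplified, assumed implementation for exposition only; the function names, the seeded random initialization, and the toy data are not taken from the study.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, iterations=50, seed=0):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_curve(points, k_values):
    """Inertia for each candidate K; the elbow is where the curve flattens."""
    return {k: kmeans_inertia(points, k) for k in k_values}


# Toy data: two well-separated blobs of three points each.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
curve = elbow_curve(toy, [1, 2, 3])
```

Plotting the returned inertia values against K and looking for the point after which further increases in K yield only small reductions gives the elbow.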
We used the Silhouette score [25] and the Xie-Beni Index [26] to evaluate the quality of clusters. The Silhouette score validates the consistency of the clusters generated by a clustering algorithm based on how well each sample matches its own cluster compared with other clusters. The Xie-Beni Index measures the ratio of within-cluster compactness to between-cluster separation, which makes it particularly useful for evaluating the compactness and separation of clusters in a dataset; lower values indicate better clustering. It is calculated by dividing the sum of squared distances between data points and their cluster centers by the product of the number of data points and the squared distance between the two closest cluster centers.
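The Xie-Beni calculation described above translates directly into code. The sketch below assumes hard cluster assignments (each point belongs to exactly one cluster) and that cluster centers are supplied by the caller; the function and argument names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni_index(points, labels, centers):
    """Compactness divided by (n * minimum squared center separation)."""
    # Numerator: sum of squared distances of points to their own center.
    compactness = sum(
        sq_dist(p, centers[lab]) for p, lab in zip(points, labels)
    )
    # Denominator term: squared distance between the two closest centers.
    min_separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * min_separation)


# Toy example: two clusters of two points each.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
xb = xie_beni_index(pts, [0, 0, 1, 1], [(0.0, 1.0), (10.0, 1.0)])
```

In the toy example every point lies at squared distance 1 from its center and the centers are at squared distance 100, so the index is 4 / (4 × 100) = 0.01; tighter or better-separated clusters would lower it further.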
Based on the quality of the clusters created by the different clustering algorithms, we selected the IDBSCAN algorithm for our analysis. IDBSCAN is a variation of the DBSCAN algorithm, and it is specifically designed for incremental clustering. The algorithm can learn from data as it arrives and can adapt to changes in the data distribution over time. IDBSCAN works by maintaining a cluster hierarchy, which represents the density structure of the data. Each new data point is compared to existing clusters, and if it is within a certain distance and density range of a cluster, it is added to that cluster. If the data point is not within the range of any existing cluster, it is marked as noise or assigned to a new cluster. Unlike traditional DBSCAN, the IDBSCAN algorithm can add new data points to existing clusters and split or merge clusters as the data changes, without having to retrain the model on the entire dataset. In our implementation, we used the Euclidean distance metric to calculate distances between data points. We also used a maximum distance parameter (eps) of 0.5 to define the neighborhood of each data point, and a minimum density parameter of 0.1 to determine whether a data point belongs to a cluster or is considered an outlier.
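To make the incremental assignment step concrete, the sketch below shows a heavily simplified stand-in: a new point joins the cluster of its nearest already-clustered point if that point lies within eps, and is otherwise marked as noise. This is not the IDBSCAN algorithm itself. It performs no density check, creates no new clusters, and never merges or splits clusters. Only the Euclidean metric and eps = 0.5 come from the text; the function name and data layout are assumptions.

```python
import math


def assign_incrementally(clustered_points, new_point, eps=0.5):
    """Return the cluster label for new_point, or -1 if it is treated as noise.

    clustered_points: list of (point, label) pairs already assigned,
    where label -1 marks points previously flagged as noise.
    """
    best_label, best_dist = -1, float("inf")
    for point, label in clustered_points:
        d = math.dist(point, new_point)  # Euclidean distance
        if label != -1 and d <= eps and d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical state after earlier weeks: two small clusters.
existing = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((5.0, 5.0), 1)]
label_a = assign_incrementally(existing, (0.3, 0.0))
label_b = assign_incrementally(existing, (2.5, 2.5))
```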

C. LONELINESS DETECTION FOR EACH GROUP
We addressed the detection of loneliness as a binary classification problem: 1 for lonely and 0 for not lonely. We have classified students as lonely or not lonely based on their UCLA survey scores. Students with a UCLA total score of 44 or above are regarded as lonely. Using the data of 41 students, we trained four distinct binary classifiers for each of the subgroups identified during the grouping process.
Using the synthetic minority oversampling technique (SMOTE) [27], which generates synthetic data for the minority class, we resolved the class imbalance in the training dataset for each identified group, resulting in a balanced dataset for training classification models. Balancing the class distribution in this way can improve the performance of classification models by reducing the bias towards the majority class [28]. Initially, our models struggled with the underrepresented minority class. After applying SMOTE, the models' minority class predictions improved significantly, confirming that the class imbalance had been affecting performance. Thus, SMOTE was key to enhancing our models' performance and generalizability.
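The core idea of SMOTE, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched as follows. This is an illustrative re-implementation under assumed settings (k = 2 neighbours, a fixed random seed, tuple-valued feature vectors), not the library routine or configuration used in the study.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with three samples; generate five synthetic ones.
minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_samples = smote_like_oversample(minority_samples, 5)
```

Because each synthetic point lies on a line segment between two real minority samples, oversampling of this kind should be applied to the training folds only, so that synthetic points derived from test samples do not leak into training.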
Due to the small sample size of each group in the StudentLife dataset, we used traditional machine learning methods for loneliness detection, as neural networks and other advanced techniques typically require larger datasets. These methods have been widely adopted and rigorously tested in various domains, including healthcare research, which enhances their credibility and reliability [29]. However, it is important to acknowledge that future studies with larger datasets could benefit from more advanced methods. We employed logistic regression, random forest, support vector machine, and XGBoost algorithms to train prediction models for loneliness detection. These models were trained for each behavioral group to detect loneliness for students of that group, rather than a generic model that uses all the available data at once for training.
FIGURE 2. Visualization of students' temporal behavioral group dynamics over a 10-week period (G1:W1:9 denotes Group 1 in Week 1 with 9 students).
In order to evaluate the performance of the classification models for loneliness detection, we used accuracy, precision, recall, and F1 score. Accuracy measures the overall correctness of the model in classifying students as either lonely or not. It is the ratio of correctly classified students to the total number of students in the dataset. Precision is the proportion of true positive predictions (i.e., the number of students correctly predicted as lonely) over the total number of positive predictions (i.e., the number of students predicted as lonely, regardless of whether the prediction is correct or not). Recall measures the proportion of true positive predictions over the total number of actual positive cases in the dataset. In the context of loneliness detection, it measures how well the model can identify all the lonely students in the dataset. The F1 score is the harmonic mean of precision and recall, providing a single score that reflects the model's overall performance.
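The four metrics defined above can be computed from binary labels as in the following sketch, where 1 denotes lonely and 0 denotes not lonely. The function name and the convention of returning 0.0 when a denominator is zero are choices made for this example.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = lonely)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for five students.
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```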
We used k-fold cross-validation with k = 10 for model training and evaluation, as it is a robust and widely used method for estimating the predictive performance of models on new, unseen data. It allows us to make the most of our relatively small dataset by repeatedly using different partitions for training and testing, ensuring that every data point is included in a test set exactly once. Moreover, it provides a balance between computational efficiency and variance reduction, delivering more reliable average model performance metrics, which is critical in exploratory studies like ours. By splitting the data into 10 subsets and iteratively using each subset as a test set while training the model on the remaining subsets, we can obtain a more reliable estimate of the model's performance on unseen data [30].
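The 10-fold partitioning can be sketched as below. The example splits a generic list of item indices; whether the folds in the study were formed over students or over individual observations is not stated here, so the count of 41 used in the example is only illustrative, as are the function name and the seeded shuffle.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 41 items split into 10 folds.
splits = list(k_fold_indices(41, k=10))
```

Each index appears in exactly one test fold, and the model for a given fold is trained only on the indices outside that fold.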

D. LONELINESS DETECTION USING ALL STUDENTS DATA
Comparing group-based models with generic models is an important step in understanding the effectiveness of the behavioural grouping approach for detecting loneliness in students. The comparison can reveal whether the group-based models perform better than the generic models, and if so, by how much. This can provide insights into the degree of heterogeneity in the population and the effectiveness of the clustering algorithm in identifying subgroups of students with similar behavioral patterns. Additionally, comparing the models can also provide insights into the extent to which the behavioral characteristics of the subgroups differ from those of the overall population, and the extent to which they are contributing to the effectiveness of the group-based models.
For this purpose, we trained generalized machine learning models utilizing the data of all 41 students at once. We used the same four binary classification algorithms as for subgroup classification: logistic regression, random forest, support vector machine, and XGBoost. For model evaluation and calculation of precision, recall, and the F1 score, we also used 10-fold cross-validation. The results of the

IV. RESULTS

A. EVALUATING CLUSTER QUALITY
In our evaluation, we found 4 to be the optimal number of clusters for the k-means and PAM models using the elbow method, as shown in Fig. 3. To determine which clustering algorithm performed best in terms of cluster consistency, we computed the Silhouette score for each algorithm. The box plots in Fig. 5 summarize the Silhouette score analysis. Each box plot on the x-axis corresponds to a clustering algorithm, and the y-axis shows the Silhouette score, which lies in [-1,1]. IDBSCAN has the highest Silhouette score, indicating that students are well matched to their own groups and that the groups are well separated from one another.
The output of the Xie-Beni Index is a numerical score that provides a measure of the quality of a clustering solution. The idea behind the Xie-Beni Index is to minimize the distance between points within a cluster, while maximizing the distance between different clusters. As shown in Fig. 4, IDBSCAN has the lowest Xie-Beni Index, suggesting that its clustering results are better than those of the other methods in terms of compactness and separation of the clusters.

B. TEMPORAL EVOLUTION OF STUDENT GROUPS AND BEHAVIORAL PROFILES
As the dataset covers 10 weeks, we applied grouping incrementally by adding each week's data to the same model in order to track changes in students' behavior and the evolution of the groups over time. This approach provides insight into the changes in behavioral patterns and the transition of students between different behavioral groups throughout the study period. Fig. 2 presents the evolution of the student groups over the 10-week period. From week 1 to week 6, the grouping results were stable, and three groups were identified. From week 7 onwards, however, the grouping results changed, and four groups were identified. Each group represented a distinct behavioral pattern, and students changed their behavioral pattern as they moved from one group to another during the study period.
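Once each student has a group label for every week, the transitions between groups can be tallied as in the sketch below. The student identifiers, the label sequences, and the function name are made up for illustration and do not reproduce the study's data.

```python
from collections import Counter


def count_transitions(weekly_groups):
    """Count moves between groups in consecutive weeks.

    weekly_groups maps a student id to that student's group label per week.
    """
    transitions = Counter()
    for labels in weekly_groups.values():
        for previous, current in zip(labels, labels[1:]):
            if previous != current:
                transitions[(previous, current)] += 1
    return transitions


# Hypothetical labels for three students over four weeks.
example_groups = {
    "s1": [1, 1, 2, 2],
    "s2": [3, 3, 3, 4],
    "s3": [1, 2, 2, 1],
}
moves = count_transitions(example_groups)
```

A tally of this kind is one way to summarize how many students moved from one behavioral group to another between consecutive weeks.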
In order to observe the behavioral profile of each group, the features of each group were averaged on a weekly basis. This approach provides a clear and concise way to understand the behavioral tendencies of groups, allowing for further analysis and interpretation of the results. This method enabled us to gain insights into the various behavioral patterns that emerge among students over time. Table 2 describes the behavioral profiles of each group based on activity levels, conversation, Bluetooth, phone usage, calls, and SMS activity. The ranges for each of these behaviors were as follows: activity duration ranged from 30 minutes to 5 hours, conversation duration ranged from 10 minutes to 3 hours, and phone usage ranged from 1 hour to 6 hours. Fig. 6 shows the activity duration pattern for each group over 10 weeks in a heatmap. Groups 1 and 3 have lower activity durations than Group 2. There was a gradual increase in the activity duration pattern for Group 1. Group 4 showed a higher pattern for the first 2 weeks, with reduced activity in the last 2 weeks. Fig. 7 shows the weekly conversation duration pattern of the groups. Group 1 has mostly lower values throughout the study period, while Group 2 has mostly high conversation durations around week 6 and in the last three weeks of the study. Group 3 showed an average conversation duration with a gradual increase towards the end of the study, whereas Group 4 had a more random pattern for conversation durations.
The phone usage patterns in Fig. 8 show that Group 1 has the highest phone usage throughout the study period. Group 2 shows an average to low pattern, while Group 3 has lower usage durations throughout the study period. Group 4 has a random pattern.
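The weekly group profiles discussed above rest on averaging each feature per group and per week. A minimal sketch of that aggregation is shown below; the record layout (group, week, value) and the numbers are assumptions for illustration only.

```python
from collections import defaultdict


def weekly_group_means(records):
    """Average a feature per (group, week).

    records: iterable of (group, week, value) tuples for one feature.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (group, week) -> [sum, count]
    for group, week, value in records:
        totals[(group, week)][0] += value
        totals[(group, week)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up activity durations in minutes.
activity = [(1, 1, 30.0), (1, 1, 50.0), (2, 1, 120.0), (1, 2, 60.0)]
profile = weekly_group_means(activity)
```

The resulting mapping from (group, week) to a mean value has the same shape as the data behind a group-by-week heatmap.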

C. LONELINESS DETECTION USING ALL STUDENT DATA
The performance of the generalized machine learning models, which were trained on the entire student dataset, is summarized in Table 3. Each of the four binary classification algorithms, logistic regression, random forest, support vector machine, and XGBoost, was applied to the data.
The XGBoost model showed strong performance, with an accuracy of 81%, a precision of 91%, a recall of 86%, and an F1 score of 88.5%. The random forest model achieved a similar accuracy of 82%, with slightly lower precision, recall, and F1 scores of 86%, 82%, and 84%, respectively.
The support vector machine model had an accuracy of 76.5%, precision of 75%, recall of 82%, and F1 score of 79%. Lastly, the logistic regression model performed at a slightly lower level, with an accuracy of 72%, precision of 67%, recall of 78%, and an F1 score of 72%.
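For readers less familiar with the four metrics reported here, the sketch below shows how they follow from the counts of a binary confusion matrix, with F1 as the harmonic mean of precision and recall. The function names and the example counts are assumptions for illustration; they are not taken from the study.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives
    and true negatives, with "lonely" treated as the positive class.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts, not results from the paper.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# F1 implied by the rounded XGBoost precision and recall quoted above.
xgb_f1 = f1_from(0.91, 0.86)
```

Applied to the rounded XGBoost precision and recall (0.91 and 0.86), the formula gives roughly 0.884, which agrees with the reported F1 score of 88.5% once rounding of the inputs is taken into account.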

D. IMPACT OF BEHAVIORAL GROUPING ON LONELINESS DETECTION
We trained classification algorithms to identify loneliness within each behavioral group. Table 4 reports the performance of the loneliness detection models for each of the four groups. The models performed well for all groups except Group 4.
The XGBoost algorithm outperformed the generalized model in terms of accuracy, precision, recall, and F1 score for three of the four groups. The highest accuracy of 89.5% was achieved for Group 2 using XGBoost, while the highest accuracy for the generalized model was 82% using Random Forest.
Among the generalized models, Random Forest achieved the best accuracy (82%), whereas XGBoost achieved better precision, recall, and F1 score despite a slightly lower accuracy of 81%.
These findings suggest that behavioural grouping of students can lead to improved performance of loneliness detection models. Furthermore, the choice of algorithm can have a significant impact on the performance of the model, with XGBoost showing promise as a potentially better option for loneliness detection in students.
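The comparison between one generalized model and one model per behavioral group can be sketched as below. This is a toy illustration under stated assumptions, not the study's pipeline: a small nearest-centroid classifier stands in for the algorithms used in the paper, the single feature and the group labels "A" and "B" are hypothetical, and the data are deliberately constructed so that the feature relates to the label in opposite ways in the two groups. Accuracy is computed on the training data only to keep the example short; a real evaluation would use held-out data or cross-validation.

```python
from collections import defaultdict


class NearestCentroid:
    """Stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(row)
            sums[label] = [s + v for s, v in zip(sums[label], row)]
            counts[label] += 1
        self.centroids_ = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def predict_one(self, row):
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(row, centroid))
        return min(self.centroids_, key=lambda lbl: sq_dist(self.centroids_[lbl]))


def fit_group_models(X, y, groups):
    """Fit one classifier per behavioral group; returns {group: model}."""
    by_group = defaultdict(lambda: ([], []))
    for row, label, group in zip(X, y, groups):
        by_group[group][0].append(row)
        by_group[group][1].append(label)
    return {g: NearestCentroid().fit(Xg, yg) for g, (Xg, yg) in by_group.items()}


def accuracy(predictions, y):
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Synthetic data: one feature (e.g. daily phone usage in hours), label 1 = lonely.
X = [[1.0], [2.0], [5.0], [6.0], [1.0], [2.0], [5.0], [6.0]]
y = [0, 0, 1, 1, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Generalized model: trained on all students at once.
general = NearestCentroid().fit(X, y)
general_acc = accuracy([general.predict_one(r) for r in X], y)

# Group-based models: each student is routed to the model of their group.
models = fit_group_models(X, y, groups)
group_acc = accuracy([models[g].predict_one(r) for r, g in zip(X, groups)], y)
```

In this constructed case the pooled data hide the relationship between the feature and the label, so the generalized model does no better than chance while the group-specific models separate the classes. Real sensing data will be far less clear-cut, which is why the paper's per-group results in Table 4 are the relevant evidence.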

V. DISCUSSION
Behavioral patterns vary significantly among different groups of students, especially in terms of activity and phone usage duration. The classification models trained on the identified groups performed better than the generalized model for three of the four groups, and these results are consistent with our earlier study [31]. The improved performance of the group-based models suggests that training the models on more homogeneous data may have reduced the noise and variability present in the data, resulting in better performing models. This reduced variability may have facilitated the identification of relevant patterns and enabled more accurate predictions. Additionally, the behavioral patterns identified by the clustering algorithm may have been more closely related to loneliness levels within each group than the patterns learned by the generalized model, which could have improved the performance of the group-based models. Another important observation is that the group-based models have the potential to identify and account for individual differences that exist within each behavioral subgroup, which may not be detectable in the generalized model. For instance, the generalized model may overlook the nuanced patterns of behavior and preferences of individuals, leading to a loss of information and a decrease in predictive accuracy.
The emergence of a new group around week 7 is an interesting insight. As more weeks of data were added, a broader range of behaviors and patterns may have emerged, making the dataset more complex. This increased complexity may have led to the discovery of a new group as the algorithm analyzed the data for various patterns. In addition, it is possible that around week 7 there were underlying shifts in student behavioral patterns that contributed to the formation of a new group. For instance, students may have started engaging in new activities, or their previous habits may have grown more evident, resulting in greater differences across groups. In effect, as data accumulated, the clustering may have become more sensitive to subtle changes in student behavior and better able to recognize separate groups.
The poor performance of Group 4 could be due to the high variability in the behavioral pattern among the students in that group. When there is a lot of variation in the behavioral patterns of the individuals in a group, it becomes challenging for a classification model to identify common features and make accurate predictions. In other words, the higher the heterogeneity of a group, the more difficult it is to accurately predict loneliness. This could be because the features that are associated with loneliness might not be present in all students in this group or could be masked by other features that are not related to loneliness. Additionally, the limited number of samples in this group could have also contributed to the poor performance of the classification model. These findings suggest that the performance of loneliness prediction models can be significantly affected by the level of variability in the behavioral patterns of individuals in a group, highlighting the importance of careful consideration of group heterogeneity when designing and evaluating prediction approaches.
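One simple way to put a number on the within-group heterogeneity discussed above is the mean pairwise distance between students' feature vectors. The sketch below is an assumption-laden illustration, not an analysis performed in the study: the function name, the choice of Euclidean distance, and the two made-up groups are hypothetical, and the features are assumed to have been standardized beforehand so that no single feature dominates the distance.

```python
from itertools import combinations
from math import dist
from statistics import mean


def mean_pairwise_distance(rows):
    """Average Euclidean distance between all pairs of feature vectors.

    Larger values indicate a more heterogeneous group. Returns 0.0 when
    the group has fewer than two members.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    return mean(dist(a, b) for a, b in pairs)


# Made-up, already standardized feature vectors for two hypothetical groups.
tight_group = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
spread_group = [(0.0, 0.0), (0.0, 4.0), (3.0, 0.0)]

tight_spread = mean_pairwise_distance(tight_group)
wide_spread = mean_pairwise_distance(spread_group)
```

Comparing such a dispersion score across groups, together with the number of samples per group, would give a rough check on whether a poorly performing group is also the most heterogeneous or the smallest one.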
This work makes several contributions to the area of student mental health research. The use of the StudentLife dataset and incremental clustering provides a distinctive way of identifying subgroups of students with different behavioral patterns, which might inform targeted interventions. This is particularly important given the heterogeneity of student behavior and mental health issues. This study contributes to the literature on the use of data-driven methodology to identify college students at risk of loneliness, a prevalent and significant mental health concern. The comparison of group-based models with a generalized model trained on all available data demonstrates the significance of considering behavioral subgroups when designing interventions to enhance student mental health. Lastly, our results imply that the incremental clustering approach might be a useful way of monitoring changes in student behavioral groups over time, offering a potential avenue for early detection and prevention of loneliness.
However, there are some limitations to this study. Despite the fact that these initial findings indicate improved performance for group-based prediction models for loneliness detection, this strategy must be evaluated on a larger, more diverse population sample. College students often share similar daily routines and other psychological pressures, which reduces the effect of a large array of possibly subtle factors, and thereby restricts the generalizability of our results.
While our study provides a novel perspective into the dynamics of loneliness among college students based on available data, we acknowledge that the StudentLife dataset may not encompass all relevant factors impacting loneliness. Certain aspects like personal relationships, family history, significant life events, mental health history, or personal coping mechanisms, which could significantly influence an individual's experience of loneliness, are not captured in this dataset. Therefore, while the results of our analysis shed light on the correlation between sensor data and feelings of loneliness, they should be interpreted within the context of these limitations. We recommend future research to consider integrating these additional factors for a more comprehensive understanding of loneliness among college students. Another limitation is reliance on self-reported loneliness data. The subjective nature of such reporting may sometimes lead to discrepancies between reported feelings and actual emotional states due to factors like response bias and mood variation. Thus, while our analysis provides important insights, it is crucial to note that these self-reported feelings might not always accurately represent the students' actual loneliness levels.
The StudentLife dataset was chosen for this study since it is the only publicly accessible dataset containing a clinical loneliness measurement scale. However, to address the limitations of the current study and expand its scope, we are in the process of collecting our own dataset that includes younger and older populations, thereby creating an opportunity to assess the efficacy of the proposed method across a more diverse range of demographic groups. In addition, we will integrate demographic characteristics, such as age and gender, to examine their impact on groups and loneliness detection performance.

VI. CONCLUSION
This study presents a novel group-based method for detecting loneliness among college students with similar behavioral patterns, a significant shift from traditional, generalized models. In our study, we first leveraged clustering-based techniques to discern distinct behavioral pattern groups within a subset of the StudentLife dataset. Subsequently, we traced the temporal evolution of these groups over the course of the study. Finally, we applied binary classification algorithms to each group to determine the presence of loneliness. Our findings reveal that grouping students based on their behavioral patterns significantly improves the performance of the loneliness detection models compared to a generalized model, underscoring the potential of personalized, group-based detection methods. This not only enhances our understanding of how behavioral pattern changes affect loneliness but also opens new doors for developing tailored loneliness detection systems. However, the small sample size and limited dataset may have limited the generalizability of our findings. Future research should aim to validate these results with larger, more diverse populations. Moreover, there is room to explore the integration of additional behavioral data or incorporating other machine learning models to further improve the prediction accuracy of loneliness detection. It would also be of interest to examine the application of this methodology to other mental health issues, expanding its scope beyond loneliness. Our study stands as a significant contribution to the field, offering an incremental approach to monitor and identify behavioral patterns associated with loneliness. It provides a promising avenue for developing more accurate and personalized loneliness detection systems, paving the way towards more effective mental health detection.
MALIK MUHAMMAD QIRTAS received the B.E. degree in information technology from the University of Engineering and Technology, Taxila, and the master's degree in computer science from COMSATS University Islamabad. He is a Computer Science Ph.D. Researcher with University College Cork, specializing in the intersection of computer science and social sciences. Through the use of passive sensing via smartphones and fitness bands, his research focuses on collecting user data to discern behavioral patterns, ultimately aiming to assess individuals' mental health. He brings practical experience as a software engineer to his multidisciplinary research, which encompasses human-computer interaction, technology-driven human behavioral sensing and modeling, sensory instrumentation, and mobile and ubiquitous computing.
ELEANOR BANTRY WHITE is a Senior Lecturer of social work with University College Cork, Ireland. Since her doctoral work with Oxford University, she has been researching social intervention primarily in the context of global population ageing. Her research applies mixed methods to examine the impacts of social and physical environments on ageing trajectories and experiences. She has a particular interest in wellbeing, social participation, arts engagement, and loneliness. Her current research examines the role of wearable and ambient sensing technology in identifying loneliness, thereby opening opportunities for early intervention to reduce the harms associated with loneliness.
EVI ZAFEIRIDI received the first bachelor's degree in psychology from London Metropolitan University, the second bachelor's degree in sociology from Panteion University, the M.Sc. degree in psychology and neuroscience of mental health from King's College London University, and the Ph.D. degree in psychology from the University of Hull. She received the Postgraduate Certificates in research training from the University of Hull and in brain imaging from the University of Nottingham. She is a Postdoctoral Researcher with the School of Computer Science and Information Technology and the School of Applied Social Studies, University College Cork. She also works as an Associate Lecturer of psychology with The Open University, U.K. Her research field is ageing. Her current research is focused on technology for monitoring older people's wellbeing.
DIRK PESCH (Senior Member, IEEE) received the Dipl.-Ing. (M.Sc.) degree in electrical and electronic engineering from RWTH Aachen University, Germany, and the Ph.D. degree in electrical and electronic engineering from the University of Strathclyde, Glasgow, Scotland, U.K. He is a Professor of computer science with University College Cork (UCC), Ireland. Prior to joining UCC, he was a Professor and the Founding Head of the Nimbus Research Centre, Munster Technological University. He is the Director of the Science Foundation Ireland Centre for Research Training in Advanced Networks for Sustainable Societies and a Principal Investigator in the SFI funded CONNECT Centre for Future Networks and the CONFIRM Centre for Smart Manufacturing. His research interests include architectures, design, algorithms, and performance evaluation of the Internet of Things and cyber-physical systems for applications in smart and connected communities, health and wellbeing, and smart manufacturing. He is on the editorial board of a number of international journals and contributes to international conference organization in his area of expertise.