Tracing the Emotional Roadmap of Depressive Users on Social Media Through Sequential Pattern Mining

Depression is one of the fastest-growing health disorders, generating social and economic problems. Affective computing models usually analyze individual user posts and do not observe temporal behavior patterns, which are essential for tracking changes in emotional behavior and user context, since such tracking involves the persistent analysis of feelings and characteristics over time. This article proposes the TROAD framework for longitudinal recognition of sequential patterns from depressive users on social media. The framework identifies the best interval to analyze every user activity, extracts emotional and contextual features from user data, and models the features into time windows to recognize sequential patterns from depressive user behavior. The main characteristics of the users found in the top-10 rules are negative emotions: violence, pain, shame, depression, sadness, and silence. We obtained strong sequence patterns with a minimum of 70% support, 81% confidence, and 69% sequential confidence when considering periods of silence between users’ posts. Without considering silent periods, the rules showed 70%, 86%, and 38% support, confidence, and sequential confidence, respectively. The TROAD computational approach is a promising tool for clinical specialists in human behavior.


I. INTRODUCTION
Depression is one of the most prominent and growing disorders in the world. The World Health Organization (WHO) [1] estimates that 20% of the population will experience mood disorders at some point in life. Depression has been characterized as a mood disorder in which emotional and motivational conditions are the most compromised components.
The associate editor coordinating the review of this manuscript and approving it for publication was Arianna Dulizia.
Depression comprises many symptoms, such as a persistent feeling of sadness and a decrease of interest in activities that would typically be pleasurable, with impairment of routine and daily activities for at least two weeks. There is also an increase in anxiety, reduced concentration, feelings of guilt, hopelessness, and worthlessness, and suicidal ideation, among others [2], [3].
Traditionally, mental health professionals have primarily used clinical examination techniques based on self-reporting of emotional, behavioral, and cognitive dimensions to diagnose and monitor the condition's evolution. Examples of examination techniques are questionnaires, interviews, inventories, and other standardized instruments. Even though there are successful therapies for mental illnesses, between 76 and 85 percent of people in low and middle-income countries do not receive proper mental care [4].
The lack of funding, qualified health providers, and the social stigma associated with mental illness are barriers to successful treatment. The imprecise evaluation is another obstacle to successful clinical care [3]. Although the processes and psychological instruments for detecting mood disorders have been improved and specialized to allow an earlier diagnosis, access to care is still limited.
In recent decades, the growing access to the internet and connected devices has encouraged users to interact and register their everyday feelings and opinions online. Among the most popular social networks are Reddit, Facebook, Twitter, and Instagram. Given the extensive sharing of personal user data online, online social networks are potential tools for specialists in human behavior, such as psychologists, psychiatrists, and therapists, since they can provide specialists with extra information about patients [5]-[9]. For example, a specialist could take advantage of an existing platform's emotional reports (with previous consent). Then, the specialist can combine the discovered information with the patient's evaluation in a face-to-face service (limited by the hour) to improve the evaluation of the patient's mental state. In this context, changes in online users' behavior can provide meaningful mental illness indicators and trigger health care systems to produce automatic analysis and reports to inform specialists and allow early interventions, even before feelings worsen [10], [11].
Computational support is still considered insufficient to provide a precise diagnostic or complete support of human behavior specialists [12], [13]. Works from the literature have focused on the extraction of emotional characteristics and identifying feelings from individual posts without considering aspects of temporal and longitudinal analysis, i.e., the user's timeline and history [6], [8], [14].
In contrast, organizations such as the World Health Organization (WHO) [1], [3] and the American Psychiatric Association (APA) [2], as well as the traditional evaluation protocols followed to identify mood disorders, strongly recommend observing patients' feelings over a period of time. Many psychological disorders involve perseverative patterns of cognition, affect, and behavior. In depressive disorders in particular, studies point to a propensity for persistent negative feelings in individuals' lives, regardless of circumstances and environment [15], [16]. This specific pattern, named emotional inertia [17], contributes to the severity of depressive symptoms in people diagnosed with depression [16] and can be considered a risk factor for the development of the mental disorder [18]. Therefore, if this characteristic tends to persist in the lives of depressed individuals, a pattern of negative responses may also persist in the different environments in which these people communicate, such as social networks.
Traditional methodologies provided by different associations point to the necessity for long-term evaluations. The methodologies focus on providing information that a post contains depressive emotions without matching aspects of the user's contexts, personality, or the user history on the network, that is, the user's timeline. The given information shows that a single user post can hardly predict mood disorders and other mental disorders, working only to inform the polarity of sentiment and emotion in a specific event. In contrast, analyzing user behavior changes over time can discover interesting information and indicators regarding different users.
Sequence pattern mining approaches can provide longitudinal analysis and predictions in different contexts and applications, e.g., stroke prediction [19], crime location prediction [20], prediction of cerebellar ataxia based on body activity detected by sensors [21], opinion analysis on Twitter (e.g., during the American pre-election campaign) [22], and stress detection using smartphones [23], among others. Temporal analysis of feelings in social networks has been the subject of studies aimed at extracting characteristics of feelings, mainly in specific contexts, such as students, regions, and workers, and at visual analysis over time, without considering specific sequence pattern mining approaches [24]-[29].
In this work, we take into account the need for longitudinal approaches to assess the depressive behavior of users in social networks. Differently from previous works, and considering the potential use of mental illness indicators by domain experts to meet the functional requirements of the behavior analysis area, we investigate the following research questions:
• Is it possible to recognize rules that describe the emotional behavior of depressive users on social media via sequence pattern mining with high sequential confidence?
• Which feelings and emotions occur sequentially?
To answer the research questions, we propose TROAD (Tracing the Roadmap of Depressive Users), a framework for the collection, preprocessing, modeling, and knowledge discovery of sequential patterns of user emotions from social networks.
Accordingly, the proposal of TROAD in this work contributes with:
• The collection of the timeline of depressive users in social media and the extraction of emotional features validated by specialists considering contextual aspects of users' posts.
• A flexible approach to model and discretize the users' timeline in the form of a time series, considering that (i) each user has a posting frequency and (ii) there is no common frequency pattern that describes all users.
• A methodology for discovering sequential patterns describing the behavior of depressive users on social networks. This methodology aims at supporting the observation of feelings and emotions occurring concomitantly and sequentially in a period of 15 days.
The remaining sections of this work are organized as follows. Section II presents related work. Section III describes the material and methods. Section IV details the experimental results. Section V presents the discussion and takeaways from the literature. Finally, Section VI gives the final remarks.

II. RELATED WORKS
This section presents previous studies focused on the temporal analysis of depressive users on social media. We use the following search string to find the related studies: ((''temporal'' or ''longitudinal'' or ''sequential'') and (''depression'' or ''mood'') and (''social media'' or ''social network'')).
In the studies [24], [25], De Choudhury et al. conducted a survey asking Twitter users in the United States about their feelings with questions from the CES-D (Center for Epidemiologic Studies Depression Scale) [30], [31], and collected the Twitter profile of each user. The authors analyzed the users' activities for a year, counted the frequency of posts and replies, and created a graph based on user relations. To extract the expression of emotions and linguistic style, they used the LIWC lexicon. The authors conducted a longitudinal analysis plotting the user features over time, and the results showed decreased social interaction, increased negative affect, strongly clustered ego networks, increased relational and medicinal issues, and increased religious involvement.
Seabrook et al. [26] used a mobile app to collect and analyze the frequency of negative expressions in texts associated with depression of 49 Twitter users and 29 Facebook users located in Australia. The authors also evaluated the connection between depression intensity and the volatility and consistency of emotion. They extracted features using LIWC and performed a temporal analysis, comparing the differences among distinct social media platforms. The results show that the instability of negative expressions is a predictor of severe depression on Facebook and lower depression severity on Twitter.
In study [27], Chen et al. analyzed sentiments, activities, and linguistic styles of users on Twitter to detect depression. The authors collected posts containing the expression ''I was/have been diagnosed with depression'' for four months, using the official API. They used the EMOTIVE ontology semantic model to recognize Ekman's basic expressions of emotions: sadness, surprise, anger, fear, disgust, and happiness [32]-[34]. As a result, the authors produced a descriptive temporal analysis plotting the features over time.
Aalbers et al. [35] conducted a survey with 125 Amsterdam students asking about their use of social media on smartphones. The goal was to measure passive use of social media, correlate it with depression symptoms using a multilevel auto-regressive time-series model, and model the data in network form. The authors concluded that features related to the passive use of social media, e.g., the time spent scrolling the feed page, are not a good predictor of depression or stress symptoms. However, much time in passive mode can predict high levels of fatigue, interest loss, and concentration problems, while fatigue and loneliness symptoms can predict the passive use of social media.
In the same research direction, Heffer et al. [36] conducted a survey and implemented an auto-regressive analysis to investigate the correlation between social media use by Canadian adolescents and depression symptoms over a long-term period. The study concluded that the use of social media does not predict depression in girls or boys, but severe depression predicts more intensive use of social media by girls.
Yao et al. [28] collected posts from a community of depressive users on the Chinese social network Sina Weibo (weibo.com), using the official API. The authors applied K-means [37] and Pearson's chi-square [38] over the data to investigate the frequent use of negative emotional terms and their correlation with depression. The results suggest that depressive users are more active and express more negative terms, and that depression is characterized by three times more negative terms, on average, during the first month of social media use, in which the depressive symptoms are more evident in comparison to other periods.
Wang et al. [39] analyzed a published database constructed from blogs and Twitter posts to identify and predict mental disorders in six different occupations (waiters, reporters, engineers, travelers, musicians, and comedians), with different levels of stress. The authors used the Sentiment140 lexicon [40] to extract emotional features and the FP-Growth [41] algorithm to discover frequent patterns. The results show that depression disorder contains the ''sadness'' emotion with an intensity of 0.86. Bipolar disorder is formed by ''sadness'' with an intensity of 0.298. However, the anxiety disorder contains mixed features occurring in the same sequence: ''sadness'' (0.088), ''despair, pain, hope, concentration, sadness'' (0.812), ''anxiety, intimacy, trust, pain, confidence, tired'' (0.89).
Table 1 summarizes the related studies with the main features considered in this work, including how each study performed data collection, the main approaches employed, whether the authors used social networks as information sources, and the media used for the analysis. The table also reports the context in which each work was conducted and how the temporal analysis was performed.
Among the works mentioned above, the studies [35] and [36] conducted a longitudinal analysis but did not analyze the content of social media. Only the study [27] approached a broad context, while the others are applied to specific locations or audiences. Studies [27] and [28] used specific APIs for the direct collection of social network data. The study [39] used a database published in previous work, while the other studies relied on questionnaires. Although the described studies conducted temporal analyses over social media data, only study [39] used an approach based on sequence pattern mining, while the others just arranged collected information and features over time.
The previous studies presented in this section only addressed the sequential component of the user behavior individually, describing a single pattern for all users, without considering the personality and individuality of each user. It is important to pay attention to individual patterns and multiple sources of information in order to detect changes in user behavior. To fill this gap, we propose a framework to investigate sequence patterns discovered from users' posts, comments, and replies over time. We model users' behavior in the form of time sequences, where each sequence is constituted by the emotional features extracted from the user's complete timeline on social media. We detail the proposed framework in the next section.

III. MATERIAL AND METHODS
This section describes the methodology employed to collect, model, filter, and process the data acquired from users over time. We propose the TROAD framework (Tracing the Roadmap of Depressive Users), which models the collected data into time windows, classifies emotions from users' posts, comments, and replies, and discovers sequential patterns from users' emotions.
A. DATA COLLECTION AND FILTERING
Figure 1 illustrates the pipeline for data acquisition and user filtering. We chose the Reddit social network because it has specific mutual support communities for diverse themes, including depression. We developed (i) a data crawler using the official Reddit API (https://www.reddit.com/dev/api/) and downloaded posts, comments, and replies from the /r/depression subreddit, publicly accessible at https://www.reddit.com/r/depression/.
In total, (ii) the crawler retrieved 415,459 user records, 778,822 posts, and 1,573,858 comments and replies from January 1st, 2009 until July 1st, 2019. We applied the PROMIS level 2 depression assessment form [2] to 317 adults to check the sample of depressive individuals in the municipality, and we found that 55.2% had severe depression, 31.7% moderate, and 7% mild, that is, 93.9% of respondents reported some level of depression.
The average number of posts and comments per user is 3.8, but this activity is not evenly distributed, since a large portion of the users posted or commented only once in the period in which we collected the data. Thus, (iii-iv) we filtered the collected data in order to keep only users that were active in the community for at least 15 days, resulting in 1,212 frequent users. The users were deidentified by generating random IDs, following the Ethics protocol defined for this research.
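The filtering and deidentification step can be illustrated with a minimal Python sketch. It is not the authors' crawler or filter: the function name `filter_active_users` is hypothetical, and "active for at least 15 days" is interpreted here, as an assumption, as the span between a user's first and last record.

```python
from collections import defaultdict
from datetime import datetime, timedelta
import uuid


def filter_active_users(records, min_days=15):
    """Keep users whose activity spans at least `min_days` days.

    `records` is an iterable of (user_name, timestamp) pairs.
    Returns a dict mapping a random ID to the user's sorted timestamps,
    so the original user names do not appear in the output.
    """
    by_user = defaultdict(list)
    for user_name, timestamp in records:
        by_user[user_name].append(timestamp)

    kept = {}
    for stamps in by_user.values():
        stamps.sort()
        # Assumption: activity span = time between first and last record.
        if stamps[-1] - stamps[0] >= timedelta(days=min_days):
            kept[uuid.uuid4().hex] = stamps  # random, non-reversible ID
    return kept
```

A user with a single post has a span of zero days and is therefore dropped, which mirrors the observation that most users posted or commented only once.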

B. THE TROAD FRAMEWORK
After collecting the data from Reddit, we employ TROAD to model the collected information and extract frequent patterns from the expressed emotions. Algorithm 1 details the main steps of TROAD, which receives as input:
• dataset: raw user data collected using the Reddit API
• intervalSize: number of days in which the algorithm will look for frequent activity of every user
• w: number of windows into which TROAD will split the time interval
As output, TROAD returns:
• sPatterns: sequential patterns discovered from users' timelines
The output includes the temporal rules with the items of emotional and contextual features, the corresponding support and confidence values, and also a visual representation of the rules.
The algorithm starts (Line 1) by initializing the set of features, which will receive the feature time series extracted from user data. For each user in the input dataset (Line 2), the algorithm extracts (Lines 3-10) the corresponding time series composed of emotion and context features. TROAD (Line 3) selects the data corresponding to the current user and (Line 5) preprocesses the information to remove tags, stopwords, and special characters (maintaining the emoticons) and to apply lemmatization and stemming. Then (Line 7), the algorithm processes the user data to select the best interval to analyze the user data based on a phase-shift heuristic. With the user timeline from the selected time interval (Line 9), TROAD extracts emotional and contextual features corresponding to the user activity, adding them (Line 10) to the set of features featuresFromAllUsers. Finally (Lines 13-14), TROAD employs a sequential pattern mining algorithm to discover patterns from all users' timelines, returning the patterns as output.

Algorithm 1 TROAD
Input:
  dataset: data from users acquired via API
  intervalSize: number of days to analyze the user behavior
  w: number of windows to split the user timeline
Output:
  sPatterns: sequential patterns discovered from all users
1: featuresFromAllUsers ← [] // Initialization
2: for each user in dataset do
3:   userData ← GetRawUserData(user, dataset)
4:   // Data cleaning and standardization
5:   userData ← PreprocessRawData(userData)
6:   // Data selection and transformation
7:   userTimeline ← ProcessTimeline(userData, intervalSize, w)
8:   // Extract emotional and contextual features
9:   userFeatures ← ExtractFeatures(userTimeline)
10:  featuresFromAllUsers.Add(userFeatures)
11: end for
12: // Extract patterns of user behavior
13: sPatterns ← GetSequentialPatterns(featuresFromAllUsers)
14: Return(sPatterns)

In the following, we detail every function of TROAD.
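For readers who prefer executable code, the control flow of Algorithm 1 can be sketched in Python. This is only an illustration of the loop structure: the four step functions are passed in as parameters because their concrete implementations are described in the subsections that follow, and all Python names here are hypothetical.

```python
def troad(dataset, interval_size, w,
          preprocess, process_timeline, extract_features, mine_patterns):
    """Control-flow sketch of Algorithm 1.

    `dataset` maps a user ID to that user's raw data; the four callables
    stand in for PreprocessRawData, ProcessTimeline, ExtractFeatures and
    the sequential pattern mining step.
    """
    features_from_all_users = []                      # Line 1
    for user, raw_user_data in dataset.items():       # Lines 2-3
        user_data = preprocess(raw_user_data)         # Line 5
        user_timeline = process_timeline(user_data, interval_size, w)  # Line 7
        user_features = extract_features(user_timeline)                # Line 9
        features_from_all_users.append(user_features)                  # Line 10
    return mine_patterns(features_from_all_users)     # Lines 13-14
```

Passing the steps as callables keeps the sketch independent of any particular lexicon or mining library.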

1) DATA CLEANING AND TRANSFORMATION (PREPROCESSRAWDATA)
We preprocessed the raw user textual data with the NLP-preprocessing library NLTK [42], following the literature's classic steps:
1) Removal of HTML formatting tags from the text, e.g., ''<b>I'm suffering</b>'' to ''I'm suffering''.
5) Stemming and lemmatization for textual normalization, in order to generate radicals and decide whether to remove the word suffix.
Emojis and emoticons in social media can contribute up to 57% to the sentiment analysis process [43], [44]. Accordingly, in Step 2, we kept a set of characters representing emojis and emoticons due to their potential to represent emotions and add context and feelings to user posts. To check whether a notation is an emoji or emoticon, we took as reference the respective categories in the database provided by Rodrigues et al. [45].
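As an illustration of the cleaning idea, the following standard-library Python sketch removes HTML tags, special characters, and stopwords while keeping emoticons. It deliberately does not use NLTK and omits stemming and lemmatization; the emoticon set and stopword list are tiny hypothetical stand-ins for the resources cited above.

```python
import re

# Hypothetical, tiny stand-ins for the emoticon database and stopword list.
EMOTICONS = {":)", ":(", ":'(", ":D"}
STOPWORDS = frozenset({"i", "im", "the", "a", "and"})
TAG_RE = re.compile(r"<[^>]+>")


def clean_post(text, stopwords=STOPWORDS, emoticons=EMOTICONS):
    """Return the list of tokens kept from one post."""
    text = TAG_RE.sub(" ", text)          # drop HTML formatting tags
    tokens = []
    for raw in text.split():
        if raw in emoticons:              # keep emoticons untouched
            tokens.append(raw)
            continue
        word = re.sub(r"[^a-z]", "", raw.lower())  # strip special characters
        if word and word not in stopwords:
            tokens.append(word)
    return tokens
```

For the example from the text, `clean_post("<b>I'm suffering</b> :(")` keeps the word "suffering" and the emoticon ":(".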

2) DATA SELECTION AND TRANSFORMATION (PROCESSTIMELINE)
This study assumes seasonal episodes in our emotional features since we restrict our analysis to a community focused on depressive users. The most significant difficulty when analyzing Reddit's data is the lack of uniformity in users' frequency of posts and comments. To work around this problem, we apply a cut filter to weigh the user activity frequency and select active users, which frequently post in a determined period. For every user, TROAD looks for the period with more activity in the social network, and the selected period is further divided into time windows. We call the windows with no user activity silent periods and classify each of them with ''silence'' in the feature extraction step, which we explain later in this section.
TROAD transforms the user data into timelines for pattern discovery with a three-step discretization process:
1) Frequency: First, we assess the impact of different time window sizes given the specifics of the user data. Very short windows preclude informative inferences from sequences since the collected data has few posts. Significantly increasing the number of windows requires the method to classify most of them as ''silence'', which adds little semantic value to the pattern discovery (''silence'' is not an emotion) and increases the complexity of generating rules with large itemsets. Additionally, with long windows, we lose the subtleties of the changes in emotions that we are interested in, because a user may have many posts per window, and thus we lose the information about the transitions between the user's moods. The average time interval between two posts in our selected series is two days and 8 hours. Accordingly, we modeled our approach to work with a time interval of 15 days, which is the minimum number necessary for diagnosing depression [1], [3], and a window size of 3 days. We consider a sequential analysis of users through 5 windows to meet the requirements of specialists in the analysis of human behavior, i.e., psychologists, psychiatrists, and psychotherapists, bearing in mind that traditional methodologies for inferring mood disorders usually employ questionnaires and interviews over at least 15 days. Specifically considering depression, the manifestation of sadness
Parameter φ ∈ R+ is the phase shift, which has a value between zero and the number of days in the window (i.e., 0 ≤ φ < 3 in our modeling).
In this step of TROAD, only users with at least 15 days of frequent and consecutive activity were maintained. We maintained users with silent windows (i.e., without user activity) within the period, but not at the first and last windows, since in this case the time series would have size ≤ 15. After applying the filter to the collected data, we selected a total of 1,212 users with a time series considered adequate to perform feature extraction and analysis.
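The windowing described above can be sketched as follows. This is a hedged illustration rather than the authors' phase-shift heuristic, whose exact formulation is not reproduced here: the function `best_interval`, the candidate shifts, and the scoring rule (prefer more non-silent windows, then more posts) are all assumptions. Post times are given as day offsets.

```python
def best_interval(days, interval_size=15, w=5, shifts=(0.0, 1.0, 2.0)):
    """Pick a 15-day interval split into w windows for one user.

    `days` is a sorted list of post times in days (floats). Each post time,
    moved back by a phase shift phi (0 <= phi < window size), is tried as
    the interval start. Intervals whose first or last window is silent are
    rejected. Returns (start, windows) or None if no interval qualifies.
    """
    size = interval_size / w                      # 3 days per window
    best = None
    for anchor in days:
        for phi in shifts:
            start = anchor - phi
            windows = [[] for _ in range(w)]
            for d in days:
                idx = int((d - start) // size)    # window index of this post
                if 0 <= idx < w:
                    windows[idx].append(d)
            if not windows[0] or not windows[-1]:
                continue                          # silent first/last window
            # Assumed score: non-silent windows first, then number of posts.
            score = (sum(1 for win in windows if win),
                     sum(len(win) for win in windows))
            if best is None or score > best[0]:
                best = (score, start, windows)
    return None if best is None else (best[1], best[2])
```

Empty inner windows in the returned list correspond to the silent periods that are later labeled ''silence''.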

3) EMOTIONAL FEATURE EXTRACTION (EXTRACTFEATURES)
All posts and comments were classified using the Empath lexicon [46]. Six judges selected a subset of the words from Empath's dictionary to classify users' timelines, observing the classes reported in the Empath library documentation. Three judges had experience in text mining and classification, and three judges were Psychology researchers. Initially, the project context was explained to the specialists, considering the main depression characteristics reported by WHO (e.g., persistent sentiments of pain, sadness, and a loss of interest in activities previously enjoyed). Finally, we applied the Kappa coefficient [47], [48] and selected the features (classes) with 80% agreement between the judges, listed in Table 2.
TROAD employs Empath with the selected classes to generate a feature vector from the user timeline corresponding to every window, separately. This results in a feature set $V = \{v_1, \ldots, v_w\}$, where $w$ is the number of windows and $v_i$, $1 \le i \le w$, is the feature vector extracted from the user timeline at window $i$.
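The per-window feature extraction can be illustrated with a small Python sketch. It does not call the Empath library: `TOY_LEXICON` is a hypothetical word-to-class mapping used only to show how each window becomes a set of classes and how a window without activity is labeled ''silence''.

```python
# Hypothetical toy lexicon standing in for the selected Empath classes.
TOY_LEXICON = {
    "hurt": "pain",
    "ache": "pain",
    "ashamed": "shame",
    "sad": "sadness",
    "hit": "violence",
}


def window_features(windows, lexicon=TOY_LEXICON):
    """Map each window (a list of tokenized posts) to a set of classes.

    A window with no posts is a silent period and yields {"silence"}.
    A window whose posts match no lexicon entry yields an empty set.
    """
    features = []
    for posts in windows:
        if not posts:
            features.append({"silence"})
            continue
        classes = {lexicon[token]
                   for post in posts
                   for token in post
                   if token in lexicon}
        features.append(classes)
    return features
```

The resulting list plays the role of the feature set V, with one entry per window.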

4) SEQUENTIAL PATTERN MINING (GETSEQUENTIALPATTERNS)
After obtaining the features from the timelines of all users, TROAD extracts patterns of user behavior. An association rule has the form A ⇒ B, which indicates that the set of items A implies the set of items B. In this work, an item can be any of Empath's classifications (see Table 2). Support and confidence are two measures of interest employed to discover association rules [49]. Let D be the dataset with all user features. The support of a rule is the probability of A and B occurring together in D, and is given by Equation 2.
The confidence of a rule assesses the degree of certainty of the rule, and is given by Equation 3.
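In the standard association rule formulation, which matches the verbal definitions given here, the two measures can be written as follows. The notation is an assumption of this sketch: $P(A \cup B)$ denotes the probability that an entry of D contains every item of both A and B.

```latex
\begin{align}
\mathrm{support}(A \Rightarrow B) &= P(A \cup B), \\
\mathrm{confidence}(A \Rightarrow B) &= P(B \mid A)
  = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}.
\end{align}
```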
Among the users presenting the feelings from set A, the confidence represents the probability of also having the feelings from set B, not necessarily in the next immediate window. However, we are interested in analyzing the user behavior within the time interval and in sequence. We employ the PrefixSpan [50] sequential pattern mining algorithm to discover association patterns between sets of items that occur concomitantly and sequentially. The measure of interest employed is the sequential confidence, which informs how likely a set of items A from window $w_i$ is to occur before a set of items B in the immediately following window $w_{i+1}$ (i.e., $A_{w_i} \Rightarrow B_{w_{i+1}}$).
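The difference between confidence and sequential confidence can be made concrete with a short Python sketch. The exact counting rules are assumptions of this illustration: a user "presents" an itemset if it is contained in at least one of the user's windows, and sequential confidence counts, among users presenting A, those for whom B appears in the window immediately after a window containing A.

```python
def contains(sequence, items):
    """True if some window of the sequence contains all the given items."""
    return any(items <= window for window in sequence)


def follows_immediately(sequence, a, b):
    """True if A occurs in window i and B in window i + 1 for some i."""
    return any(a <= sequence[i] and b <= sequence[i + 1]
               for i in range(len(sequence) - 1))


def rule_measures(sequences, a, b):
    """Return (support, confidence, sequential confidence) of A => B.

    `sequences` holds one list of windows per user; each window is a set
    of items. The definitions are the assumed ones described above.
    """
    if not sequences:
        return 0.0, 0.0, 0.0
    with_a = [s for s in sequences if contains(s, a)]
    both = [s for s in with_a if contains(s, b)]
    in_order = [s for s in with_a if follows_immediately(s, a, b)]
    support = len(both) / len(sequences)
    confidence = len(both) / len(with_a) if with_a else 0.0
    seq_confidence = len(in_order) / len(with_a) if with_a else 0.0
    return support, confidence, seq_confidence
```

Under these definitions the sequential confidence can never exceed the confidence, which is consistent with the gap between the two measures reported in the results.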
TROAD found strong sequential rules that describe the depressive behavior along time on social media. With the guidance of specialists, we defined a minimum support threshold of 70%, which resulted in a total of 356,258 sequential patterns. We present and discuss TROAD's discovered rules in the next section.
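To give an intuition for how a prefix-projection miner enumerates frequent sequences, the following Python sketch implements a simplified PrefixSpan. It is an assumption-laden simplification: each window is reduced to a single item (the original algorithm also handles itemsets per element), `min_support` is an absolute count of sequences, and the function name `prefixspan` is illustrative rather than a library API.

```python
def prefixspan(sequences, min_support):
    """Simplified PrefixSpan over sequences of single items.

    Returns a list of (pattern, count) pairs, where count is the number
    of input sequences that contain the pattern as a subsequence.
    """
    results = []

    def mine(prefix, projected):
        # Count, per item, the projected suffixes in which it appears.
        counts = {}
        for seq, start in projected:
            for item in set(seq[start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, count))
            # Project each suffix past the first occurrence of the item.
            next_projected = []
            for seq, start in projected:
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((seq, pos + 1))
                        break
            mine(pattern, next_projected)

    mine([], [(seq, 0) for seq in sequences])
    return results
```

Because a pattern is only extended while it stays frequent, the search space shrinks quickly as the support threshold rises, which is why a high threshold such as 70% keeps the mining tractable.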

IV. RESULTS
In this section, we present the sequential patterns discovered with the application of TROAD over the user data collected from Reddit. We consider a window size of 3 days for all experiments. However, we vary the number of windows from 1 (to check the occurrence of individual emotion expressions) to 5 (which corresponds to an interval of 15 days, suggested by specialists as the optimal time interval to infer consistent user behavior patterns). In Subsection IV-A, we analyze patterns discovered considering only emotions detected from users. Subsection IV-B analyzes the patterns considering emotions and silent periods, in which the user has no activity between two or more windows in the analyzed interval. In Subsection IV-C, we present the emotion transitions visually over time. Finally, in Subsection IV-D, we present a discussion regarding sequential patterns containing contextual information extracted from the users' posts.

A. SEQUENTIAL PATTERNS OVER USER EMOTIONS
First, we analyze the sequential patterns of user behavior without silent windows. Table 3 presents the top-10 discovered rules per the number of windows (from 1 to 5) and the corresponding support, confidence, and sequential confidence of the patterns.
For rules of w = 1, the only positive emotion expression was ''positive'', and most of the emotions occurred alone. The majority of rules generated with w = 2, w = 3, and w = 4 contain the emotional expression ''negative''. Notice that TROAD found only two frequent patterns for sequences of 5 windows (with sequential confidence of up to 0.4), mainly composed of the ''negative'' emotion. Although fewer rules were found than for smaller numbers of windows, the two generated rules are significant for the user behavior analysis since they correspond to patterns observed during 15 days, characterizing a recurrent pattern. At w = 5, the confidence is high, above 86%, but the sequential confidence is considerably lower, at 38%. This is expected, as people can manifest individual and different patterns of emotional expression, which may not follow a temporal order shared by the entire population analyzed. However, it can be an expressive pattern for 70% of users when the temporal order is disregarded. Besides, the successive use of negative terms over four time windows can predict a context of violence with high non-sequential confidence. We observe that ''pain'', ''shame'', and expressions of violence are always present in rules with negative emotions.
Regarding the emotions that occur in the same time window, ''pain'' and ''violence'', as well as ''pain'' and ''shame'', occur in the same period with w = 2. As the number of windows increases and, consequently, the analysis period lengthens, negative emotions become more evident in the rules. Emotions that appear together no longer occur in stricter rules. Rules with lower confidence can still be consulted because, even if they do not describe a typical pattern, they can describe the users' individuality.
Next, we analyze the generated patterns considering silent periods.

B. SEQUENTIAL PATTERNS OVER USER EMOTIONS AND SILENT PERIODS
When posting on social networks, many users may present intermittent activity. In this experiment, we address this behavior by including silent periods within the interval of postings analyzed for each user. Table 4 shows the top-10 discovered sequential rules per number of windows (from 1 to 5) and the corresponding support, confidence, and sequential confidence of the patterns. Again, the majority of items appearing in the frequent patterns relate to negative emotions. Interestingly, the item ''silence'' does not appear as one of the most frequent items for w = 1 and w = 2. However, for w ≥ 3, the silence period appears in all rules, which indicates that users frequently post content related to negative emotions interleaved with periods of silence, i.e., absence of social interaction in the community.
Considering only one window (w = 1), the emotions ''pain'', ''violence'', ''negative'', and ''shame'' appeared together and combined. In the span of two time windows (w = 2), the emotion ''violence'' appeared in most discovered rules, also co-occurring with ''pain'' within the same window. Considering w = 3 and w = 4, most of the discovered patterns show silence periods with the detected emotions. Finally, considering a period of 15 days (w = 5), most of the discovered rules show that users expressed themselves with posts, comments, and replies with the emotions ''negative'', ''pain'', and ''shame''. Again, the rules present periods of silence with no user activity.
The sequential rules of several lengths, that is, with w > 2, show in all periods that depressed users post intermittently, that is, their activity fluctuates over time. This contradicts the common myth that the depressed user does not post on the network or is just a passive user who scrolls the page. Although the user was silent, that is, they did not post or comment in the period, they may have participated in other ways, such as reacting to the content whether they like it or not (upvote or downvote on Reddit). Consequently, tracking periods of silence allowed us to extract solid sequential rules over a more extended period, contributing significantly to the sequence pattern mining. This fact is more evident with w = 5, because without periods of silence (Table 3), the algorithm extracted just two rules, with a minimum confidence of 0.86 and sequential confidence of 0.38. Considering periods of silence, as shown in Table 4, it is possible to extract the top-10 rules with a minimum confidence of 0.81 and minimum sequential confidence of 0.69. The rules showed varied characteristics such as ''pain'', ''shame'', ''violence'', and items co-occurring in the same window, such as ''pain'' and ''violence'' implying ''negative'' emotions and periods of silence.

TABLE 3. Discovered rules without silent periods: Top-10 sequential patterns discovered for different sequences of windows ordered by the sequential confidence.

Figure 3 depicts the connections of all sequential patterns discovered regarding the detected sentiments expressed by users in posts, comments, and replies. We visually represent the patterns using a Sankey diagram, where each node (vertical bar) corresponds to an item (i.e., a sentiment), the links represent the co-occurrence of items in the same pattern, and every color represents a window of three days.
The links between items of the same color (i.e., belonging to the same window) correspond to items that occurred together in the window period. Note that, although the windows are drawn with uneven widths to improve the visualization of the link transitions (e.g., w = 1 has more links and takes more space in the figure), all windows have the same size in the pattern discovery modeling (3 days). Links between items depicted in different colors correspond to an emotional transition from one window to the immediately following one.
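The support, confidence, and sequential confidence values quoted above can be illustrated with a minimal sketch. It assumes the definitions commonly used for sequential rules X → Y over itemset sequences: support is the fraction of sequences containing every item of X and Y, confidence divides that count by the number of sequences containing X, and sequential confidence counts only the sequences in which all items of X occur before all items of Y. The function names and the toy sequences are hypothetical and do not reproduce the implementation or the data used in this study.

```python
from typing import Dict, List, Set

Sequence = List[Set[str]]  # one set of items per 3-day window


def contains(seq: Sequence, items: Set[str]) -> bool:
    """True if every item appears somewhere in the sequence."""
    return items <= set().union(*seq)


def occurs_before(seq: Sequence, antecedent: Set[str], consequent: Set[str]) -> bool:
    """True if some cut point puts all antecedent items before all consequent items."""
    for k in range(1, len(seq)):
        head = set().union(*seq[:k])
        tail = set().union(*seq[k:])
        if antecedent <= head and consequent <= tail:
            return True
    return False


def rule_metrics(sequences: List[Sequence], antecedent: Set[str],
                 consequent: Set[str]) -> Dict[str, float]:
    """Assumed definitions of support, confidence, and sequential confidence."""
    n = len(sequences)
    n_x = sum(contains(s, antecedent) for s in sequences)
    n_xy = sum(contains(s, antecedent | consequent) for s in sequences)
    n_ordered = sum(occurs_before(s, antecedent, consequent) for s in sequences)
    return {
        "support": n_xy / n if n else 0.0,
        "confidence": n_xy / n_x if n_x else 0.0,
        "sequential_confidence": n_ordered / n_x if n_x else 0.0,
    }


# Toy user timelines (illustrative only, not data from the study).
toy = [
    [{"pain", "violence"}, {"silence"}, {"negative"}],
    [{"negative"}, {"pain"}],
    [{"pain"}, {"silence"}],
    [{"shame"}],
]
metrics = rule_metrics(toy, {"pain"}, {"negative"})
print(metrics)
```

In this toy example two of the three timelines containing ''pain'' also contain ''negative'', but only one of them has ''pain'' first, which is why confidence can be high while sequential confidence stays low, as observed for the rules without silent periods.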

C. VISUAL EVALUATION OF EMOTION TRANSITIONS
The majority of items relate to negative emotions. Within the first window, different pairs of items occurred together, such as ''violence'' and ''negative'', ''pain'' and ''shame'', and ''violence'' and ''shame''. Windows 2, 3, and 4 presented item transitions only between different windows. Notably, the ''silence'' item corresponds to a period (in this case, a 3-day window) in which the user has no activity in the social network. The ''silence'' item has the highest number of connections considering windows 2, 3, and 4. This pattern indicates that most users tend to go through periods of silence before and after posting content related to negative emotions, such as ''violence'', ''pain'', ''shame'', and ''negative''. Finally, pairs of items occurring together also appeared within window 5, such as ''negative'' and ''violence'', ''pain'' and ''violence'', and ''pain'' and ''shame''.
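The link structure described for the Sankey Diagram can be sketched as a simple aggregation. This is only an illustration under stated assumptions: each pattern is taken to be a list of item sets, one per window; a node is a hypothetical (window index, item) pair; items in the same window are linked pairwise, and every item in one window is linked to every item in the immediately following window. The function name `sankey_links` and the toy patterns are not from the study.

```python
from collections import Counter
from itertools import combinations, product
from typing import List, Set, Tuple

Node = Tuple[int, str]  # (window index starting at 1, item)


def sankey_links(patterns: List[List[Set[str]]]) -> Counter:
    """Count links between nodes over all patterns."""
    links: Counter = Counter()
    for pattern in patterns:
        # Links between items that co-occur inside the same window.
        for w, items in enumerate(pattern, start=1):
            for a, b in combinations(sorted(items), 2):
                links[((w, a), (w, b))] += 1
        # Links for transitions between consecutive windows.
        for w, (cur, nxt) in enumerate(zip(pattern, pattern[1:]), start=1):
            for a, b in product(sorted(cur), sorted(nxt)):
                links[((w, a), (w + 1, b))] += 1
    return links


# Toy patterns (illustrative only).
toy_patterns = [
    [{"pain", "violence"}, {"silence"}, {"negative"}],
    [{"violence"}, {"silence"}],
]
for (source, target), count in sorted(sankey_links(toy_patterns).items()):
    print(source, "->", target, count)
```

The resulting (source, target, count) triples are the usual input of Sankey plotting tools, with the window index available for coloring the nodes.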

D. EVALUATION OF CONTEXTUAL CONCEPTS OVER TIME
The top discovered rules relate mostly to negative feelings. However, one of the main features related to human behavior is contextual information. In this experiment, we focus on contextual features and select the top rules containing at least one of the contextual emotion expressions considered in our study:
• Contextual classifications: alcohol, crime, death, dispute, exercise, fight, friends, fun, healing, lust, neglect, pain, politeness, poor, sexual, violence, wedding, work
Since most of the top discovered rules presented in the sequential patterns analysis have ''violence'' as a frequent item (see Tables 3 and 4), we omit this class in this analysis when it appears alone in the windows, to broaden the discussion regarding other contextual features.
The contextual items appear 385,019 times in the discovered patterns, and in 260,106 patterns. Table 5 presents the top-10 discovered rules with contextual features. The top-10 rules showed features together in the same window, for w > 2, such as ''pain'' with ''violence'' and ''pain'' with ''shame''. The rules had a minimum confidence of 0.80 and sequential confidence of 0.68. The rules call attention to how evident depressive pain and shame are in the patterns: even when the surrounding narrative differs, these characteristics remain strongly connected.
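The selection of rules with contextual features can be sketched as a filter. The sketch assumes a hypothetical rule representation (a dictionary with a list of window item sets and a sequential confidence value) and interprets the omission of ''violence'' as ignoring windows that contain only that item; the ordering by sequential confidence is also an assumption. Only the list of contextual classes comes from the text above.

```python
from typing import Dict, List

# Contextual classes listed in this section.
CONTEXTUAL = frozenset({
    "alcohol", "crime", "death", "dispute", "exercise", "fight", "friends",
    "fun", "healing", "lust", "neglect", "pain", "politeness", "poor",
    "sexual", "violence", "wedding", "work",
})


def has_contextual_item(windows: List[set]) -> bool:
    """True if any window holds a contextual item other than 'violence' alone."""
    for window in windows:
        if window == {"violence"}:
            continue  # assumed reading: 'violence' alone is omitted
        if window & CONTEXTUAL:
            return True
    return False


def top_contextual_rules(rules: List[Dict], k: int = 10) -> List[Dict]:
    """Keep rules with contextual items, ordered by sequential confidence."""
    kept = [r for r in rules if has_contextual_item(r["windows"])]
    kept.sort(key=lambda r: r["sequential_confidence"], reverse=True)
    return kept[:k]


# Toy rules (illustrative values only, not results from the study).
toy_rules = [
    {"windows": [{"violence"}, {"silence"}], "sequential_confidence": 0.90},
    {"windows": [{"pain", "shame"}, {"silence"}], "sequential_confidence": 0.70},
    {"windows": [{"negative"}, {"silence"}], "sequential_confidence": 0.80},
    {"windows": [{"pain", "violence"}, {"negative"}], "sequential_confidence": 0.75},
]
for rule in top_contextual_rules(toy_rules):
    print(rule["windows"], rule["sequential_confidence"])
```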

V. DISCUSSION AND TAKEOUTS FROM LITERATURE
Even though the literature shows that depressive users express more negative emotions than other types of social network users [24], [25], the longitudinal pattern of responses of people with depressive disorders in these networks is still unclear [28]. Longitudinal evaluations could contribute to a more satisfactory diagnostic identification, as they allow monitoring the individuals' behaviors that are rarely identified in a clinical assessment of a few minutes [51]. Studies indicate that the responses of depressive users may present different patterns [28], [52], and the identification of the characteristics of users, which justify these differences, still needs scientific investigation [51]. This study showed an increasing number of posts related to negative emotions over the observed interval (15 days). In contrast, positive emotions were expressed in isolation and only in this investigation's initial period. Difficulty in managing negative emotions is, in fact, common among people with depression. In general, they think repeatedly and uncontrollably about their depressive symptoms, their failure, and their negative experiences [53], and they will be able to find spaces in virtual environments to express this experience where they feel familiar and safe [54]. In addition to this excessive focus on symptoms being able to trigger a spiral of negative thoughts and emotions [53], sharing them on the network can attenuate the individual's view of problems and strengthen victimization by other users in an attempt to provide emotion-focused support. Some authors explain that posts related to negative emotions can increase the chances that other users will also post messages related to these emotions [55], which contributes to the co-rumination, perpetuation, and intensification of symptoms of depression among users [56].
Posts of content related to negative emotions are interspersed with periods of silence (absence of interaction on the social network), suggesting possible fluctuations in mood. Considering that active participation in the social network, with a larger number of posts related to negative emotions, may indicate a period of intensification of depressive symptoms [54], the absence of activity could indicate a period of symptom stability. Some research indicates that mood instability is positively correlated with depression, contributing to the duration of the overall experience of depression and stressful events [57], [58]. Such instability can be part of a disordered depressed mood, or of other disorders, such as anxiety, borderline personality disorder, or substance abuse [58]-[60]. Therefore, the longitudinal assessment of users on the network is a way of measuring the overall experience of depressed mood, which is generally not identified during a regular clinical examination.
The contents that preceded the silence periods identified in this study were related to pain, violence, and shame. Once latent symptoms end up being expressed and identified on social networks, they are likely to be related to traumatic and stigmatizing experiences [61]. Depression is frequently observed among people who live with chronic pain [62] and people who are victims of violence [63]. Moreover, just as many people express their problems related to depression on social networks, they also post content about the difficulty of living with the experience of pain [64] and about relationships characterized by abuse and violence [65].
Among the users participating in this study, the word ''violence'' was related to the words ''negative'' and ''pain''. In [66] the authors point out that experiences of pain and violence may be related and also corroborate the theory that stressful events, such as child abuse and domestic violence, may contribute to the etiology or maintenance of chronic pain [66], [67]. The word ''shame'' was also prevalent in the posts and was related to the word ''pain''. Shame is related to depression, and because it includes forms of embarrassment and humiliation, it can be related to social pain [68]. Shame is identified in the literature as the leading cause of rumination, mainly when events focus on damage to self-esteem, lack of control, or social inadequacy.
The framework proposed in this study described a set of sequential patterns of behavior discovered from depressive users' activities in the social network Reddit, which were also reported and supported by literature on mental health as evident characteristics in people with clinical depression. This framework can be a valuable tool for identifying people who are enduring depressive episodes.

VI. CONCLUSION
In this work, we proposed TROAD, a framework to evaluate the behavior of depressive users on social media. The framework collects user activity (posts, comments, replies, and emojis) from the /r/depression subreddit, a Reddit community on depression, preprocesses the data, and models it into five consecutive windows of three days each. Accordingly, we analyzed the social activity of 1,212 users as timelines covering 15 consecutive days, a period considered adequate by specialists to evaluate possible mental disorders, such as depression [3].
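The modeling of a timeline into five consecutive three-day windows can be sketched as follows. This is a minimal illustration, not the TROAD implementation: the event format (a timestamp with a set of emotion or context labels), the choice of the start of the observation period, and the function name `to_windows` are assumptions. Empty windows receive a ''silence'' item, as in the variant that considers silent periods.

```python
from datetime import datetime
from typing import Iterable, List, Set, Tuple

Event = Tuple[datetime, Set[str]]  # (timestamp, labels detected in the activity)


def to_windows(events: Iterable[Event], start: datetime, n_windows: int = 5,
               window_days: int = 3, silence: str = "silence") -> List[Set[str]]:
    """Discretize one user's activity into fixed-size windows of item sets."""
    windows: List[Set[str]] = [set() for _ in range(n_windows)]
    for timestamp, labels in events:
        idx = (timestamp - start).days // window_days
        if 0 <= idx < n_windows:  # ignore activity outside the 15-day period
            windows[idx].update(labels)
    # A window without any activity is marked as a silent period.
    return [w if w else {silence} for w in windows]


# Toy timeline (illustrative only).
start = datetime(2021, 1, 1)
toy_events = [
    (datetime(2021, 1, 1, 10), {"pain", "violence"}),
    (datetime(2021, 1, 2, 9), {"negative"}),
    (datetime(2021, 1, 8, 20), {"shame"}),
    (datetime(2021, 1, 20), {"sadness"}),  # falls outside the observed period
]
print(to_windows(toy_events, start))
```

Dropping the final replacement step, so that empty windows stay empty, gives the variant without silent periods.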
TROAD discovered sets of sequential patterns from user emotions and contextual features. The most significant expressions in the top-10 discovered rules are ''negative'' emotion, ''violence'', ''pain'', ''shame'', ''depression'', and ''sadness''. We also modeled the rule extraction step of TROAD considering silent periods, in which the user presented no activity (e.g., post or comment) in the social network during the observed interval. In this scenario, ''silence'' appears in most of the discovered rules. The obtained top-10 patterns presented strong rules that describe the emotional and contextual aspects of depressive behavior, considering both the presence and the absence of user activity in specific periods. In summary, we highlight the following contributions of TROAD:
1) The framework can evaluate the emotional behavior of depressed users and provide emotional and contextual information from users' activities in a depressive community.
2) The flexible approach for user data discretization over time windows allows analyzing users' timelines as time series, considering the irregular non-normalized frequency of posts.
3) The framework provides a methodology for pattern discovery of sequential rules that describe the behavior of depressive users on social networks over time.
Regarding the difficulties faced and the lessons learned in this work, we highlight the following:

• Finding a Database With Posts of the Users Posted Sequentially: We observed that existing databases reported in studies from the literature do not address temporal aspects of posts. As we are interested in the complete timeline of a user, this was a relevant difficulty. Accordingly, we opted to collect the data from the Reddit social network, which has thematic communities and an official API with policies less restrictive than those of other social networks.
• Representing the Rules Graphically, Due to the Large Number of Patterns Discovered in the Data: Specialists can benefit from visualization tools such as the Sankey Diagram to support the understanding of temporal patterns. Further, alternative visualizations can be proposed to improve the readability of rules and sequential patterns, highlighting either the individual level or the collective one, regarding the entire community.
• Absence of User Posting in Intervals of Time: The absence of postings in some time windows is common when analyzing depressive users, but the absence of data in our modeling could impair the discovery of meaningful rules. To overcome this problem, we considered the ''silent'' windows in our modeling, which allowed us to include in our analysis users without posts in a few intervals of time.
As future work, we intend to:
• Employ the discovered rules to infer information from users on a Web or mobile platform in order to conduct a controlled experiment. In this case, with the users' consent, the application would synchronize the users' activity and allow their therapists to observe behavior changes over time. With user monitoring from public social media, the application would extract metrics that potentially assist specialists in assessing the effectiveness of the current therapy or even assist in changing the treatment approach. Likewise, it is possible to implement a general and public solution, considering that the major social networks do not allow the registration of official applications that perform sentiment analysis. If the system automatically detects any depressive pattern, it triggers warnings so that mental health specialists can get in touch with the user and conduct an intervention.
• Explore the proposed framework for analyzing other mental health disorders, such as anxiety, schizophrenia, and bipolar disorder, among others.
• Adapt TROAD to work in new environments and technologies to find depressive behavior patterns, such as in smart homes for indoor monitoring and mobile devices, to track outdoor behaviors.

Her research interests include database systems, image analysis, content-based retrieval, complex data, telemedicine and mHealth, and medical data analysis supported by image processing techniques.

ETHICS COMMITTEE APPROVAL
LUZIANE DE FÁTIMA KIRCHNER received the Ph.D. degree in psychology from the Federal University of São Carlos, the master's degree in behavior analysis from the State University of Londrina, and the bachelor's degree in psychology. She is currently a Professor in the undergraduate and postgraduate program with Catholic University Dom Bosco, and coordinates the Laboratory of Behavioral Studies in Health. She is interested in studying the relationship between behavior and health, focusing on chronic pain, professional-patient relationship, and health promotion in university students.
MARIA DE JESUS D. DOS REIS graduated in psychology from the University of Brasilia, in 1987. She received the master's degree in experimental psychology from the University of Brasilia, in 1989, and the Ph.D. degree in experimental psychology from the University of São Paulo, in 1997, with research carried out under a sandwich scholarship at the Shriver Center, Boston. She has been an Associate Professor with the Federal University of São Carlos, since 1989. She develops teaching, research and extension activities in behavior analysis, being the Counselor at the Postgraduate Program in Psychology. She leads a research group that investigates phenomena and processes pertinent to the relationship between behavior and health, focusing on aspects relevant to analytical-behavioral therapy.
AGMA J. M. TRAINA (Member, IEEE) is currently a Professor with the Department of Computer Science, Mathematics and Computer Science Institute, University of São Paulo at São Carlos. She has focused her research on medical applications supported by image processing techniques, and more recently on climate/agriculture and remote sensing data. Over the years, she has supervised over 50 Graduate students in these areas, and published more than 250 papers in journals and conferences. Her research interests include complex data indexing and retrieval by content, similarity queries to data visualization and visual data mining. She is a member of the Brazilian Computer Society, and the ACM and IEEE Computer Society.
ANDREW T. CAMPBELL spent ten years in the software industry leading the development of wireless packet networks and operating systems. He was a Tenured Associate Professor of electrical engineering with Columbia University, working on wireless networks. From 2016 to 2017, he joined Google to work on cardiovascular health, as a member of the Android Group. He has been a Visiting Professor with CMU Rwanda, the University of Salamanca, and Cambridge University. He is currently a Professor of computer science with Dartmouth College. At Dartmouth, he is a member of the Center for Technology and Behavioral Health and co-directs the DartNets Lab. He is a Visiting Research Scientist at Verily (formerly Google Life Sciences), working on mental health.
JÓ UEYAMA received the Ph.D. degree in computer science from the University of Lancaster, in 2006. He was a Research Fellow with the University of Kent, Canterbury. He is currently a Professor with the Institute of Mathematics and Computer Science (ICMC), University of São Paulo (USP). He is also a Brazilian Research Council (CNPq) Fellow. He has published 55 journal articles and more than 100 conference papers. His main research interests include computer networks, security, and blockchain.
VOLUME 9, 2021