Assessing Sleep Quality Using Mobile EMAs: Opportunities, Practical Consideration, and Challenges

Sleep is one of the most important factors in maintaining both physical and mental health. Sleep problems have many causes, and maintaining a healthy lifestyle is generally necessary to avoid them. In the medical field, information related to sleep problems, including lifestyle information, is obtained through interviews, but this approach is limited because it depends on the patient's memory. Thus, many studies have adopted ecological momentary assessments (EMAs) to collect data on patients' lifestyles, and some of them use smart devices to collect data effectively. However, these studies focused on specific factors such as smoking and exercise, so they are limited in their ability to reflect the complex narrative of lifestyle patterns. Therefore, we propose indicators built from EMA data for assessing everyday sleep quality; these indicators capture complex lifestyle contexts in a quantitative manner. First, we collected real-life data using a smartphone through a 4-week data collection experiment. Second, we developed a method for generating daily indexes that reflect geospatial and social habits, social condition, activity level, and emotional condition using self-report data. Third, we evaluated whether the daily indexes could supplement indicators comprising EMA-based features drawn from conventional sleep questionnaires. The prediction targets are five metrics of sleep quality that explain perceived sleep quality. The results indicate that features using both daily indexes and sleep questionnaires lead to better prediction of sleep quality. They also show the potential to generate indicators that identify complex human behaviors with the help of mobile devices and EMAs. Further research on user-friendly data acquisition methods and more diverse lifestyle information should be useful to support behavior decisions for better sleep in well-being services and in specialized medical fields.

affect sleep, while the complex factors that affect sleep quality are evaluated through questionnaires and interviews in the clinical area. Experts have also analyzed sleep diaries written by patients to identify the causes of sleep disorders. However, such diaries generally report on a very limited set of items related to sleep hygiene, such as nap times and the timing of alcohol intake prior to sleeping; sleep diaries alone are therefore limited in their ability to provide sufficient information on what causes sleep disturbance [4]. According to the forgetting curve of Hermann Ebbinghaus, humans remember only 33% of learned facts after a day, 28% after two days, and 21% after a month [5]. Therefore, complex factors such as lifestyle are difficult to obtain objectively through interviews, because information loss inevitably occurs in questionnaires about the state of the past week or month. In contrast, ecological momentary assessments (EMAs) were developed to collect in-the-moment data. EMAs have been used as the ground truth for sensor data in recognizing personalized context [6] and social anxiety [7]. In psychology, EMAs are an applied form of the diary method, which has been used for a long time. To support participants' long-term adherence, there was an attempt to collect more accurate contextual information by providing efficient recall [8]. Lifelogging, which is understood to reflect a different approach to digital self-tracking and the recording of everyday life [9], can capture lifestyle and support the analysis of human life by collecting physiological as well as behavioral data. However, existing methods analyze lifelogging data to construct knowledge for a specific purpose. For example, user behavior recognition in a house using video-based lifelogging technology offers useful information for ambient assistive living [10], but it does not consider complex lifestyle information.
Although lifelogging data are usually collected from sensors and analyzed automatically, self-reported data from users can also be a good source of information for describing an individual's situation and for analyzing other sensor data [11]. Information on a person's habits, emotional condition, and social condition can be a key source for caregivers, medical practitioners, and others. It also provides assistive means for important decisions about improved, long-lasting independent living [12]. Accordingly, we adopted lifelogging based on EMAs to collect complex behavioral and context data on people's lifestyles. An analysis that comprises information such as the activities one engages in, the people one meets, the time at which one wakes up, the surroundings, and timestamps is essential to predicting sleep quality. To the best of our knowledge, it is hard to find studies on sleep quality prediction that collect objective data and reflect an individual's everyday context. Therefore, we propose lifestyle-based indicators that affect sleep quality, derived from self-report data. We collected self-reported data with a smartphone app and propose a method for extracting daily indexes that reflect complex lifestyles, together with the domain knowledge-based features used in previous studies. The proposed models are then examined to find significant indicators for representing sleep quality for the next day. The predictive goal consists of five metrics of sleep quality: subjective sleep quality, sleep disturbance, dream issues, feeling after sleep, and body condition after sleep.
The contributions of this study are as follows. First, we derived our results from real-life EMA data, as opposed to data from an experimental environment. In this context, "real-life" refers to information that is obtained from a natural context and not from an artificially constructed environment. Second, we developed a method for generating reliable and complex information on personal lifestyle that does not depend on human memory. Finally, we explored which personal lifestyle indicators have a significant influence on sleep to suggest how individuals could improve their lifestyles for better sleep. The remainder of this paper is structured as follows. Section 2 presents previous studies using behavior data and electronic data for representing sleep quality. Section 3 describes the app design and experiment design for data collection. Section 4 introduces the research model framework, daily index development, and data analysis methods. In Section 5, the results of the data analysis are explained and the importance of the daily indexes proposed in this study is discussed. Finally, in Section 6, we discuss how the results of our study can support sleep management and how they can be applied in future studies. The limitations of our approach are also discussed.

II. PREVIOUS STUDIES AND BACKGROUND

A. SLEEP QUALITY AND BEHAVIORAL DATA
Studies on improving the quality of sleep have concluded that it is necessary to improve related lifestyle habits or develop exercise habits. For persons who have sleep disparities, behavioral change following proper guidelines such as Cognitive Behavioral Therapy for Insomnia (CBT-I) is imperative [13]. Exercise is also known to affect sleep quality, and its effects have been studied in patients with sleep disorders [14], [15]. Another important factor affecting sleep quality is lifestyle regularity [16]. A number of studies have also focused on the relationship between sleep quality and habits. Studies involving college students, for instance, have indicated that smoking, exercise, and study time affect sleep quality [17], [18]. These studies indicate, based on sleep questionnaires, that lifestyle habits have an effect on sleep quality. However, data collected from questionnaires cannot avoid the risk of retrospective bias. Other previous studies have focused primarily on analyzing major behaviors that affect sleep using self-report data. Because sleep is affected in a complex manner by various factors in daily life, it is necessary to identify more behavioral factors to obtain useful information for improving sleep quality. The increasing difficulty of managing individual lifestyles has also raised the demand for technological help. Thus, various studies have made efforts to better understand the lifestyle factors related to sleep. Baron et al. (2018) noted the accuracy problems inherent to the existing method involving the use of sleep diaries and the lack of a sufficient body of evidence-based research [19]. Patients who keep a sleep diary record the date of writing, the day of the week, the type of day (work, school, day off, or vacation), what they eat and drink, and the time they sleep every day.
However, these previous studies reveal that there has been little research on the relationship between sleep quality and comprehensive daily context information, such as work intensity, social condition, and visits to unfamiliar places. For example, although the results of previous studies suggest the usefulness of exercise or quitting smoking, they provide little insight into the various causes that can affect one's sleep. To obtain such information, it is necessary to collect and organize complex data that reflect an individual's everyday context, which can comprise information such as the activities one engages in, the people one meets, the time at which one wakes up, the surroundings, and the time of each event. Data-driven analysis of sleep diaries written by patients has been shown to be capable of identifying the causes of sleep disorders, which helps extract information that is difficult to obtain objectively through interviews. However, such diaries generally report on a very limited set of items related to sleep hygiene, such as nap times and the timing of alcohol intake prior to sleeping, and are therefore limited in their ability to provide sufficient information concerning the behaviors that cause sleep disturbance [4]. For this reason, we propose daily indexes that reflect complex lifestyle patterns from data collected in the wild.

B. SLEEP QUALITY PREDICTION USING MOBILE DEVICES
There are myriad studies on sleep monitoring using smart and wearable devices. For example, smartphone applications and sensors are widely used for sleep monitoring and for activity monitoring in behavior change strategies. These studies investigate sleep quality as measured by snoring, changes in breathing, movement, and illuminance during sleep [20], [21]. Nevertheless, studies that develop indicators representing the complex lifestyle and habits that affect sleep quality are scarce. EMAs modified with assisted recall have been utilized in healthcare [8]. For example, an evidence-based smartphone application was developed for identifying sleep and physical activity patterns to suggest personalized positive airway pressure therapy and behavior change strategies [22]. Its authors conducted a 60-day experiment with patients and collected physical data with a wristband to test the effectiveness of the personalized therapy and behavior strategy. Melbye et al. (2020) investigated associations between daily smartphone-based self-monitored mood, activity, and sleep for diagnostic evaluations and early interventions in patients with bipolar disorder [23]. The application prompted the user to fill out the self-monitoring form once a day, and it consisted of nine items: mood, sleep, activity, medicine, anxiety, stress, cognitive problems, irritability, and note. Using smartphone text messages, Garcia et al. (2014) studied perceived sleep and mood in Latina adolescents. They concluded that inadequate sleep is associated with negative outcomes in mental health, physical health, academic behavior, and other behavior problems [24]. A study that examined sleep before and during a chemotherapy cycle for breast cancer also adopted EMAs, using a personal palm computer [25]. The participants responded to beep signals four times a day. These studies collected electronic data, but the participants had to use devices provided by the researchers rather than their own smartphones.
An efficient way of logging data should be considered even though ease of use was not a major issue in that study. The sleep diary has been evaluated as a method that enables objective data collection on the patient's daily life [26]. Several studies use a mobile app for writing sleep diaries. Although such studies have identified the need to use technologies such as mobile apps for the efficient and reliable collection of data, current sensor-based studies for analyzing sleep quality focus on self-awareness of sleep activity and the amount of activity during the day rather than on lifestyle [27]-[29]. An efficient and secure method for EMA can provide immediate awareness of responses as well as daily monitoring of compliance. The study of lifestyle behaviors associated with sleep quality is needed for an understanding of the causal relationships to be achieved [30]. Therefore, we developed a smartphone app for collecting labels of the users' daily context.

III. DATA COLLECTION

A. BEHAVIOR CONTEXT LABEL DESIGN
Behavior label design plays a very important role in data collection because the data are entered through a smartphone app immediately after behaviors occur. Although various studies have been conducted on the human context of the concepts of "lifestyle" and "quality of life," definitions of these terms can be obtained from an examination of time use surveys conducted by the national statistical offices of various countries. In 69 countries around the world, time use surveys are used to obtain basic data for measuring lifestyle and quality of life in terms of the activities engaged in over the 24-hour course of a day [31], [32]. Each country's behavioral classification system has been developed in consideration of the international standards at the time of development, the unique characteristics of each country, such as cultural characteristics and social structure, and changes in life patterns arising from technological development. Therefore, the time use survey is a proper reference for defining the user behavior labels of this study. The Korean time use survey is conducted once every five years and, as of 2019, the most recently updated classification of the survey is the 2014 time use survey behavioral classification (Table 1).
In the time use survey, the definition of temporal location is considered to be important. by the Korea Statistical Office in 2014 were applied. In the case of social interaction items, subjects were recorded using a separate system for the convenience of the experimental participants in this study. We also added further items concerning emotion and location to reflect behaviors relevant to sleep quality. After an initial experiment in 2018, leisure satisfaction was added as a subjective well-being item to the Korea Statistical Office time use survey conducted in July 2019, along with information and communication technology device usage [34]. Considering the addition of these items, we recognized that it would be necessary to collect emotion-related items in our time use survey. Therefore, the emotion-related labels were designed to capture arousal and valence on seven-level Likert scales.

B. SLEEP LABEL DESIGN FROM SLEEP QUESTIONNAIRE
In this study, survey data were collected from participants each morning on the quality of their sleep the night before, and each night on daytime behaviors that are known to affect sleep at night. The morning questionnaire measured the subjectively evaluated quality of sleep using questions obtained from the Pittsburgh Sleep Quality Index (PSQI) [35]. The morning sleep questionnaire consists of five items: subjective sleep quality, sleep disturbance, dream issues, feeling after sleep, and body condition after sleep. These items are the predictive goals of this study. The evening questionnaire measured sleep hygiene behaviors [36], which are used as a behavioral treatment for sleep disorders, and behavior-related items. Sleep diaries [13] were used to collect data on behaviors known to affect sleep [37]. Our sleep questionnaire was constructed using items utilized in previous studies. However, while existing sleep questionnaires involve the recapitulation of sleep conditions over the past two weeks, the sleep questionnaire used in this study solicited responses each morning and evening to minimize memory-related distortion.

C. EMAs DATA COLLECTING APP DEVELOPMENT
We adopted ecological momentary assessments (EMAs), one of the primary methods for collecting in-the-moment data, to compensate for the shortcomings of the retrospective measurement and data collection of conventional sleep questionnaires. The smartphone app is designed to capture the user's context as it occurs. The set of behavioral context items comprised action, behavior, social state, conversation status, place, transportation, and smartphone location, selected by adding emotion and the place of behavior to the statistics items defined under the 2014 time-use survey conducted by Statistics Korea. The emotion item was designed for entering emotional arousal and valence on seven-level Likert scales. The data collecting app was developed for use on a Samsung Galaxy9, and Galaxy9 phones with the app installed were distributed to all participants. Since the experiment was conducted with Korean adult males and females, the language used in the smartphone app is Korean. The input screen consists of the morning sleep questionnaire, the behavior labels, and the night sleep questionnaire. To help readers understand the item composition, the Korean app design has been translated into English, as shown in Figure 1. The user can complete an entry just by tapping the item on the app screen that corresponds to the current behavior. Selected items are displayed in red. In the logging history, users can see the start and end times of the behaviors they entered. The app sends data to the MongoDB server every evening at 9 pm. Based on this data log, the researchers gave the subjects feedback on their data input so that the experiment was conducted properly.
To acquire data on sleep quality, items through which the experimental participants could answer questions on sleep quality and sleep-related behavior in both the morning and evening were included in the app. The items entered using the smartphone app are listed in Table 2.

D. DATA COLLECTION EXPERIMENT AND DATA STATISTICS
We submitted our research proposal to the e-IRB system of the Korea National Institute for Bioethics Policy, and it was approved with reference no. P01-201908-22-002. We collected data from 30 subjects for 14 days. All personal information that could identify an individual, such as the subject's name and phone number, was destroyed after the experiment. Subjects were informed in advance that data about their activity location and life pattern were being collected. The first data collection experiment was carried out on 15 subjects over two weeks starting on October 31, 2018. The second experiment was carried out on 15 subjects for two weeks starting on November 15, 2018. For at least 12 hours each day over the 14 days of the experiments, each participant selected answers to multiple-choice items regarding their activities and situation whenever a specific event occurred. Each morning, the participants answered a questionnaire regarding five metrics of sleep quality, and each evening, they answered a questionnaire reflecting domain knowledge related to their daily mood, stress level, etc. The procedure carried out each day (and repeated for 14 days) is summarized as follows: 1) Wake up in the morning and complete the morning sleep questionnaire with the smartphone app; 2) Start logging behavioral context labels with the smartphone app; 3) Engage in at least 12 hours of behavioral context label input; 4) End behavioral context label input; 5) Complete the night sleep questionnaire with the smartphone app and then go to sleep. During the two rounds of the experiment, 14 days each, the 30 participants entered a total of 6,081 behavioral context labels. Table 3 lists the distribution of the participants by age, gender, and occupational group. Those in their 20s accounted for the largest proportion of participants, at 63%. The ratio of males to females was 60:40, which was not significantly different.
In terms of occupation, the largest group was students, followed by workers in the education service and professional/science technology service industries. The job groups that produced the most valid data were, in order, students, education service, and professional/science and technology service.

IV. PROPOSED METHOD AND DATA PREPARATION

A. RESEARCH FRAMEWORK
The study involved data collection; data preparation, including data cleaning and preprocessing; definition of the predictive goal and extraction of features from the data; and finally analysis of the results using the extracted features with five proposed models (Figure 2).
The daily indexes and the domain knowledge-based (DKB) features are extracted as shown in Table 4. Each of the daily indexes was scaled as categorical variables based on a value distribution for each user with a three-level Likert-Scale. The daily routine feature is created by preprocessing the behavioral context labels to calculate the usuality index which is one of the daily indexes. Then, behavior, place, and emotion items are used to generate other daily indexes. The DKB features are extracted from the night questionnaires adjusted by a three-level Likert-Scale to match the level with the daily indexes. We establish five models to evaluate the daily indexes and DKB features. The following five models investigate the importance of the daily indexes in predicting sleep quality and the significance of all individual features used.
(DKB feature model) The DKB feature model was set as the baseline to assess the predictive power of data collected immediately after behavioral events occur rather than through a memory-based questionnaire.
(Daily index model) The daily indexes attempt to reflect the complexity of lifestyle by considering psychological, behavioral, location, and social routines. The daily index model excludes expert knowledge and considers only features obtained through data analysis. Therefore, the effect that can be obtained by data analysis alone can be examined.
(All-feature model) The all-feature model contains both the DKB features and the daily indexes. It is used to investigate the additional effect of adding the daily indexes to conventional features.
(Important-feature model) We also examined a model obtained by reducing the number of features based on feature importance, to compare the significance of the daily indexes and the DKB features. For the decision tree and random forest classifiers, the feature importance values of all features were sorted in descending order, and features with a cumulative sum of 0.8 were included in the model. When logistic regression was applied, features with p-values of 0.05 or below were selected.
(Lifestyle factor model) The last model is an attempt to enable a more intuitive understanding of the results. Factor analysis was applied to reduce the number of dimensions of the DKB features and daily indexes and thereby construct the lifestyle factor model. First, we evaluated "factorability," which indicates whether the data are suitable for factor analysis. We adopted Bartlett's test of sphericity as well as the Kaiser-Meyer-Olkin (KMO) test to assess factorability and sampling adequacy. The p-value of the Bartlett test is 0 and the overall KMO result for the data is 0.68, indicating that factor analysis could be properly applied to our dataset. The number of factors was then determined using a scree plot of the factors and their corresponding eigenvalues. Based on the scree plot, four to eight factors were considered. Three of the four factors (life stress, caffeine drink intake, and usual life pattern) had values of Cronbach's alpha, which represents the degree of intrinsic agreement among the variables adopted for a factor, of 0.6 or greater. Anderson and Gerbing (1988) argued that there is appropriate discriminant validity between constituent concepts when the correlation between their sub-factors is between 0.3 and 0.7 [38]. The correlation between these three factors was in the range of 0.34 to 0.71 at a statistically significant Pearson correlation level (p < 0.01), indicating discriminant validity under Anderson and Gerbing's criterion. Consequently, the three factors were defined as the factors for use in the lifestyle factor model.
(Sleep quality) The sleep quality metrics were constructed from items used in previous studies, as shown in Table 5. The points are scaled on a three-level Likert scale.
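The three-level scaling mentioned above can be illustrated with a minimal sketch. The text states only that values are scaled into three levels based on each user's value distribution; the use of per-user tertiles as cut points, the function name `to_three_levels`, and the sample values are assumptions made for illustration, not the authors' procedure.

```python
from statistics import quantiles


def to_three_levels(values):
    """Map one user's raw values to levels 1 (low), 2 (middle), 3 (high).

    Assumption: the cut points are that user's own tertiles. The paper
    only says the scaling follows each user's value distribution.
    """
    low_cut, high_cut = quantiles(values, n=3)  # two tertile cut points
    levels = []
    for v in values:
        if v <= low_cut:
            levels.append(1)
        elif v <= high_cut:
            levels.append(2)
        else:
            levels.append(3)
    return levels


# Hypothetical raw scores for one participant over nine days.
raw_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
scaled = to_three_levels(raw_scores)
```

Computing the cut points per user rather than over the pooled data keeps the levels relative to each person's own baseline, which is one plausible reading of "based on a value distribution for each user."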

V. PRELIMINARY CONCEPTS

A. USER EPISODE WITH BEHAVIOR CONTEXT
An episode representing a user's day can be expressed as a behavioral context sequence indexed by a timestamp. The behavior (B), social state (S), and place (P) items, where P corresponds to the place of activity, can be collected and used to identify the context related to the user's activity. A social behavioral context (social BC) and a geospatial behavioral context (geospatial BC) that describe the user's state can be generated using the B, S, and P items; each BC represents a combination of two items. The geospatial and social BCs for each day are generated as sequence data, where the sequence id is I = (1, 2, 3, ...). A user's daily episode is defined as the union of their behavioral contexts, which is expressed as sequential data. The user episode consists of the geospatial episode and the social episode. The geospatial episode is generated by a sequence of geospatial behavioral contexts. Similarly, the social episode is generated by a sequence of social behavioral contexts (Table 6).
If the data collection period is increased, user episodes can be generated by adding more emotion, transport, and communication label data to the behavior label data. In this study, only geospatial and social episodes from which meaningful results could be derived within the context of the two-week data collection period were generated.
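As a rough illustration of the episode construction described in this subsection, the sketch below builds a geospatial and a social episode from one day of logged labels. The `Record` type, the field names, and the sample labels are hypothetical. Pairing behavior with place for the geospatial BC and behavior with social state for the social BC is an assumption based on the statement that each BC combines two of the B, S, and P items.

```python
from collections import namedtuple

# Hypothetical record layout for one logged event.
Record = namedtuple("Record", "timestamp behavior social place")


def build_episodes(records):
    """Return the geospatial and social episodes for one day.

    Assumption: geospatial BC = (behavior, place) and
    social BC = (behavior, social state).
    """
    ordered = sorted(records, key=lambda r: r.timestamp)
    geospatial = [(r.timestamp, (r.behavior, r.place)) for r in ordered]
    social = [(r.timestamp, (r.behavior, r.social)) for r in ordered]
    return {"geospatial": geospatial, "social": social}


# ISO-formatted timestamps sort correctly as plain strings.
day_log = [
    Record("2018-10-31T09:00", "Working", "With colleagues", "Workplace/school"),
    Record("2018-10-31T07:30", "Housekeeping", "Alone", "Home"),
]
episode = build_episodes(day_log)
```

Keeping both episodes aligned on the same timestamps makes it straightforward to add further label types (emotion, transport, communication) as additional sequences when a longer collection period is available.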

B. DAILY ROUTINE EXTRACTION FROM USER EPISODE
An individual's daily routine determines the social rhythmicity of their everyday life and is therefore an indicator of their lifestyle [39]. The daily routine can be defined as a frequent pattern of the sequence data constituting episodes over a specific set of periods. Closed sequential pattern mining with time constraints from a time-extended sequence database was used to derive frequent patterns [40]. To apply this algorithm, geospatial and social BCs for each subject were derived from the data collected over the two-week experiment. Each BC item was numerically coded as shown in Figure 3, in which each line corresponds to a different experiment date, the separator '-1' marks the end of a BC sequence, and <n> delimits time slot n. This 14-line numeric coding, which expresses each participant's 14-day behavioral context sequence by time slot, is used as the input data for sequential analysis.
The closed sequential pattern mining was conducted with SPMF, an open-source data mining library developed by Fournier-Viger (2021) [41]. We set the minimum support to 40% and divided the data by day to derive patterns. The result of analyzing the input data is shown in Figure 4. #SUP is the support value of each sequential pattern and denotes the number of times the sequential pattern appears during the 14 experiment days. For example, the first row of Figure 4 represents the frequent sequence with a support value of 7. In this example, the pattern "Breakfast: Personal Services at Home → Housekeeping at Home → Transport at outdoor → Working at Workplace/school, lunch: Eating/Snack at Restaurant, afternoon: Housekeeping at Home → Transport at outdoor" is the most routine pattern of this participant.
The derived frequent geospatial and social BCs were then used to calculate the "usuality" indexes of the daily indexes.
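To make the notion of support concrete, the following simplified sketch counts on how many days a candidate pattern occurs in order and compares the result with the 40% minimum support. It is not the closed sequential pattern mining algorithm of [40] and does not call SPMF; time slots, time constraints, and the closedness check are all omitted, and the integer codes are hypothetical.

```python
def contains_in_order(day_sequence, pattern):
    """True if the pattern's items appear in the day in the same order
    (not necessarily adjacent)."""
    remaining = iter(day_sequence)
    # `item in remaining` advances the iterator until the item is found.
    return all(item in remaining for item in pattern)


def support(days, pattern):
    """Number of days on which the pattern occurs."""
    return sum(contains_in_order(day, pattern) for day in days)


def is_frequent(days, pattern, min_support=0.4):
    """Apply the relative minimum support threshold (40% in the text)."""
    return support(days, pattern) / len(days) >= min_support


# Hypothetical numerically coded BC sequences, one list per day.
coded_days = [[1, 2, 3], [1, 3], [2, 1, 3], [3, 1], [1, 2, 3]]
sup_13 = support(coded_days, [1, 3])
sup_123 = support(coded_days, [1, 2, 3])
```

With 14 experiment days, a 40% threshold means a pattern must occur on at least 6 days to be treated as part of the routine.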

C. DAILY INDEX
In this study, the daily indexes are proposed to quantitatively express the various contexts of an individual's lifestyle for sleep quality prediction, using the user episodes and daily routines extracted from self-report data. The daily indexes comprise five indexes: two usuality indexes and a sociality, an activity, and a valence index, all of which are defined on a time basis to express the user's lifestyle in multiple dimensions.

1) USUALITY INDEX
The usuality index indicates how similar an episode is to the user's daily routine. People tend to experience some days as following a "usual" pattern and others as differing from that pattern. "Usual" days tend to comprise a series of frequently experienced events, while on unusual days the individual will experience these frequently occurring events in either a different order or with a reduced frequency. The degree of usualness of a daily routine can be quantitatively calculated using the frequency of BC sequences:

$$\mathrm{UsualityIndex} = \frac{\mathrm{RoutineTime}}{\mathrm{TimeForAllBehaviors}} \tag{6}$$

where the routine time duration is the duration of events belonging to a routine, and the not-routine time duration is the duration of events not belonging to a routine. The routine and not-routine time durations are obtained by the daily routine extraction method described in subsection B. Accordingly, the geospatial usuality index and the social usuality index, using the geospatial BC and the social BC respectively, can be calculated in the same manner.
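A minimal sketch of the usuality computation in (6) is given below. The event layout, the function name `usuality_index`, and the sample day are hypothetical. Treating an event as "routine" whenever its BC appears in a mined frequent pattern is a simplifying assumption; the actual matching against frequent sequences may also take order and time slots into account.

```python
from datetime import datetime


def usuality_index(events, routine_contexts):
    """Share of logged time spent in routine behavioral contexts.

    events: list of (context, start, end) with datetime start/end.
    routine_contexts: set of contexts taken from the frequent patterns.
    """
    routine_minutes = 0.0
    total_minutes = 0.0
    for context, start, end in events:
        minutes = (end - start).total_seconds() / 60
        total_minutes += minutes
        if context in routine_contexts:
            routine_minutes += minutes
    return routine_minutes / total_minutes if total_minutes else 0.0


# Hypothetical geospatial BCs for one day: 60 + 180 routine minutes
# and 60 non-routine minutes.
day_events = [
    (("Personal Services", "Home"),
     datetime(2018, 10, 31, 7, 0), datetime(2018, 10, 31, 8, 0)),
    (("Working", "Workplace/school"),
     datetime(2018, 10, 31, 9, 0), datetime(2018, 10, 31, 12, 0)),
    (("Leisure", "Cafe"),
     datetime(2018, 10, 31, 12, 0), datetime(2018, 10, 31, 13, 0)),
]
routine = {("Personal Services", "Home"), ("Working", "Workplace/school")}
geo_usuality = usuality_index(day_events, routine)
```

The same function applies unchanged to social BCs, which gives the social usuality index.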

2) SOCIALITY INDEX
A user's social interaction can be expressed quantitatively as the ratio of the social time duration, that is, the time spent interacting with other people, to the overall time:

$$\mathrm{SocialityIndex} = \frac{\mathrm{SocialTime}}{\mathrm{TimeForAllBehaviors}} \tag{7}$$

The sociality index value derived in (7) corresponds to how social the individual's day is.

3) ACTIVITY INDEX
Studies of the relationship between activity level and depression have shown that less active behavior corresponds to a higher likelihood of depression [42]-[47]. Based on the results of these studies, we propose a method to quantify the activity level using self-report labels. The activity index is calculated as the proportion of dynamic activity time to overall time, where the episode is divided into dynamic activities, in which the subject moves frequently, concentrates on activities, or is socially active, and static activities, in which the subject is less active or is socially inactive. The behavior labels are classified as follows: the dynamic time consists of the behaviors "Transport, Eating/Snack, Commute, Personal Services, Leisure, Sports, Housekeeping, Culture/Traveling, Regular Activity, Hobbies, Meeting, Social Activity".
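The activity index can be sketched directly from the label classification above. The dynamic label set is copied from the text; the event layout, the function name, and the static label "Resting" in the example are hypothetical.

```python
# Dynamic behavior labels as listed in the text.
DYNAMIC_LABELS = {
    "Transport", "Eating/Snack", "Commute", "Personal Services",
    "Leisure", "Sports", "Housekeeping", "Culture/Traveling",
    "Regular Activity", "Hobbies", "Meeting", "Social Activity",
}


def activity_index(events):
    """Proportion of logged time spent on dynamic activities.

    events: list of (behavior_label, duration_in_minutes).
    Any label outside DYNAMIC_LABELS is treated as static.
    """
    total = sum(minutes for _, minutes in events)
    dynamic = sum(minutes for label, minutes in events
                  if label in DYNAMIC_LABELS)
    return dynamic / total if total else 0.0


# "Resting" is a hypothetical static label used only for illustration.
sample_day = [("Sports", 30), ("Meeting", 60), ("Resting", 90)]
activity = activity_index(sample_day)
```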

4) VALENCE INDEX
To quantitatively indicate the degree to which emotional positivity is maintained over a day, the valence index is defined as the percentage of time during which the user maintains positive emotion. In this study, the seven-level Likert scale for emotional valence is rescaled into three levels: positive, neutral, and negative. The positive time duration is the duration of labels with a valence score of 1, 2, or 3.
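The rescaling and the resulting index can be sketched as follows. The text specifies only that scores 1, 2, and 3 count as positive; mapping 4 to neutral and 5 to 7 to negative is an assumption, as are the function names and the sample data.

```python
def rescale_valence(score):
    """Collapse a seven-level valence score into three levels.

    Scores 1-3 are positive (as stated in the text). Treating 4 as
    neutral and 5-7 as negative is an assumption.
    """
    if score not in range(1, 8):
        raise ValueError("valence score must be between 1 and 7")
    if score <= 3:
        return "positive"
    if score == 4:
        return "neutral"
    return "negative"


def valence_index(events):
    """Fraction of logged time with positive valence.

    events: list of (valence_score, duration_in_minutes).
    Multiply by 100 to express the index as a percentage.
    """
    total = sum(minutes for _, minutes in events)
    positive = sum(minutes for score, minutes in events
                   if rescale_valence(score) == "positive")
    return positive / total if total else 0.0


# Hypothetical day: 120 positive, 60 neutral, 60 negative minutes.
valence = valence_index([(2, 120), (4, 60), (6, 60)])
```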

D. DKB FEATURES
Sleep disorders have been found to be related to lifestyle, albeit through a complex interaction of various factors. In previous studies, caffeine [48], alcohol [49], sedentary behavior [50], and poor exercise habits [51] were all found to affect sleep quality. The concept of sleep hygiene has been developed as a tool for providing medical advice to patients suffering from sleep disorders. Sleep hygiene involves recommendations on the consumption of contraindicated substances, diet, exercise, and changing the sleep environment. Some studies have also shown that sleep problems occur when stress [52] or emotional problems [53] occur. This study examined the behavior-related items that affect sleep using sleep behavior questionnaire items. Based on the responses, a DKB feature was constructed and defined as a medically valid sleep-related behavior indicator. The DKB feature comprised 12 components: feelings experienced at night, stress level, body condition at night, exercise intensity, breakfast satisfaction, lunch satisfaction, dinner satisfaction, nocturnal eating satisfaction, morning caffeine drink consumption, afternoon caffeine drink consumption, alcoholic drink consumption, and the occurrence of unusual events (Table 7).

E. SLEEP QUALITY
Doctors consider several items in diagnosing sleep disorders, and questionnaires using these items have been used in observational studies and evidence-based clinical research on patients with sleep disorders. Among these is the PSQI, which comprises items relating to the causes of sleep disorder and has been used in various sleep disorder studies [58]. However, because it requires respondents to report from memory on their sleep activities over the previous 30 days, the accuracy of the PSQI has been disputed. In this study, the PSQI items related to sleep disorder were entered each morning through the morning sleep questionnaire in the smartphone app (Table 8).

A. EXPERIMENTAL RESULT OF SLEEP QUALITY PREDICTION
To explore which model and classification method are appropriate for predicting sleep quality, logistic regression (LR), a decision tree (DT), and a random forest classifier (RF) were applied to each proposed model. We randomly sampled 70% of all user data as training data, and the remaining 30% were used for testing. The results of LR, RF, and DT were improved through a random search across 4,320 settings and a grid search over 288 combinations, with 100 iterations of threefold cross-validation performed in each case. For DT and RF, categorical features were processed via one-hot encoding. In summary, there are 60 analyses, in which each model predicts the five metrics of sleep quality and three analytic methods are applied to each, to investigate the appropriate model and analytic method. The accuracy results are shown in Table 9. Among the five models, the all-feature model showed the best accuracy in predicting every sleep quality metric with all analysis methods. The second-best accuracy was achieved by the important features model. This result indicates that both the daily indexes and the DKB features are significant in predicting sleep quality. The lifestyle factor model, which uses dimension reduction, may be premature, as it had the lowest accuracy in all analyses. In terms of the analysis method, RF showed the best accuracy. It can be concluded that the daily indexes have the potential to predict sleep quality. We examined the significance of the daily indexes and DKB features with the important features model. In the case of DT and RF, using the results of the all-feature model, the feature importance values of all features were sorted in descending order, and the features within a cumulative sum of 0.8 were included in the model.
On the other hand, when LR was applied, features with p-values of 0.05 or below were selected. There are five daily indexes and eleven DKB features. In Figure 5, the grey columns represent the total number of important features, and the green columns represent the number of important daily indexes among them. An exploration of the selected features by type revealed that the daily indexes were adopted as important features in all prediction results except the sleep disturbance prediction obtained by applying LR. Under DT and RF, in particular, at least three of the five daily indexes were adopted, empirically suggesting that the daily indexes are significant in predicting sleep quality. The important features model also achieved the second-best accuracy, which further indicates the significance of the daily indexes.
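The cumulative-importance selection used for DT and RF can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the function name, the feature names, and the importance values are made up, and including the feature that crosses the 0.8 threshold is one possible reading of the rule.

```python
def select_by_cumulative_importance(importances, threshold=0.8):
    """Pick top-ranked features until their importances sum to `threshold`.

    importances: dict mapping feature name -> importance, assumed to sum
    to 1 as tree-based importances usually do. The feature that crosses
    the threshold is included; that boundary choice is an ASSUMPTION.
    """
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        if cumulative >= threshold:
            break  # the features already selected reach the threshold
        selected.append(name)
        cumulative += value
    return selected


# Made-up importance values, for illustration only.
example = {"stress": 0.5, "usuality": 0.25, "valence": 0.125, "caffeine_am": 0.125}
print(select_by_cumulative_importance(example))
```

With these values the first three features are kept, because the running sum reaches 0.8 only after the third one is added.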
The sleep quality in this study is described by five categories: subjective sleep quality, sleep disturbance, dream issues, feeling after sleep, and body condition after sleep. Among the analysis methods, RF showed the best performance. Therefore, we examined the accuracy of each model to determine which sleep quality metric is predicted better when RF is applied as shown in predicting the other metrics of sleep quality with an accuracy of 0.97. The important features model had a pattern of accuracy similar to that of the all-feature model, with prediction accuracies ranging from 0.93 to 0.97 across all sleep quality metrics. Overall, the various models had slightly different prediction accuracies. The DKB features, daily indexes, and lifestyle factor models all performed worse in predicting body condition after sleep than in predicting the other metrics of sleep quality.
The model proposed in this study predicts tonight's sleep quality from features that represent an individual's lifestyle. Therefore, the analysis results are interpreted in terms of an individual's behaviors, feelings, and habits. In terms of analysis method, RF performed best for all analysis models. An examination of the decision trees created by applying RF indicates an advantage of the algorithm: it can track the lifestyle paths that lead to a given sleep quality. As an example, consider the decision tree generated by applying RF to the all-feature model to predict ''dream''. By investigating DTs such as this, it is possible to determine which lifestyles are good and bad in terms of predicting a given sleep quality. Each path can be interpreted as follows:
• (feeling not bad before sleep) & (usual behavior in usual place) & (drinking some booze) & (having some coffee in the morning) → satisfied after sleep: If a person has some coffee in the morning, does routine tasks at the usual places, drinks some alcohol, and does not feel bad at night, then the person will sleep well.
• (feeling bad before sleep) & (unsatisfied dinner) & (the day was full of negative feeling) & (meeting so many people in a day) → unsatisfied after sleep: If a person feels negative during the day, meets many people, is unsatisfied with dinner, and feels bad at night, then the person will not be able to sleep well.
This example shows how the lifestyle indicators of a day affect an individual's sleep quality, which gives useful information to support clinical or behavioral decisions for caregivers, medical practitioners, and healthcare service users.
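To make the path reading concrete, the first path above can be written as a conjunction of conditions and checked against one day's record. The feature names, the value encodings, and the dictionary format are hypothetical; the sketch only shows how a root-to-leaf path becomes a readable rule.

```python
# One root-to-leaf path written as (feature, required value) pairs.
# Feature names and values are HYPOTHETICAL encodings of the first path above.
SATISFIED_PATH = [
    ("feeling_before_sleep", "not bad"),
    ("usual_behavior_in_usual_place", True),
    ("alcohol", True),
    ("morning_coffee", True),
]


def follows_path(day, path):
    """Return True when a day's record satisfies every condition on the path."""
    return all(day.get(feature) == value for feature, value in path)


# Made-up day record, for illustration only.
today = {
    "feeling_before_sleep": "not bad",
    "usual_behavior_in_usual_place": True,
    "alcohol": True,
    "morning_coffee": True,
}
print(follows_path(today, SATISFIED_PATH))  # every condition holds for this record
```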

VII. CONCLUSION
This study proposed indicators consisting of EMA data for assessing everyday sleep quality; these indicators capture complex lifestyle contexts in a quantitative manner. EMA data are collected with an event-contingent method and are more objective and reliable than data subject to retrospection, such as interviews and questionnaires. Owing to privacy concerns, technical issues, and other constraints, behavior data representing complex lifestyle narratives have relied on long-term memory, gathered through questionnaires or interviews in clinical research. Many studies have attempted to collect reliable data from sensors and mobile phones and have demonstrated reliable models for recognizing activities and the environment. However, previous studies did not derive information that sufficiently reflected the complexity of human lifestyles. To overcome these gaps, this study focused on the collection of EMA data reflecting a variety of lifestyle items from 30 individuals over two weeks in their real-life settings. Based on questionnaires developed by medical experts, we developed a mobile app that enables behavior logging when an event occurs. The daily indexes were also proposed to reflect the complex lifestyle contexts related to psychological and social habits. Among the daily indexes, the usuality index reflects, through statistical analysis, the effect of behavioral sequences on an individual's daily routine. To investigate the predictive power of the proposed features, five models were constructed and analyzed. With the five metrics of sleep quality as predictive goals, analysis of the models using EMA data revealed that all five analysis models achieved their best prediction result for the ''dream'' issue; the models showed the next best results in predicting sleep disturbance, feeling after sleep, and subjective sleep quality.
In terms of the usefulness of the daily indexes proposed in this study, among the five models the all-feature model, which includes both the DKB features and the daily indexes, achieved the best performance overall. The DKB features are important in predicting sleep quality, and the daily indexes, which represent the complex lifestyle context and are derived from data analysis alone, are also critical in this respect. A comparison of the sleep quality prediction algorithms revealed that RF performed best for all models. Furthermore, the decision trees derived by applying RF can provide a user with information on the psychological, behavioral, location, and social paths leading to good and bad sleep quality. These results can be applied to related services by enabling behavior decision support for better sleep. However, the model proposed in this study has a limitation: because it predicts sleep on a daily basis, it cannot predict long-term trends. The results of this study open up the possibility of using a mobile app to obtain information about more complex lifestyle contexts through EMA data in the wild. Furthermore, our method should be useful for enabling the quantitative evaluation of qualitatively generated lifestyle habit data collected with EMAs. The features from the proposed method can identify indicators that predict sleep quality. Thus, this method can be applied to decision support for proper clinical intervention in sleep problems. However, the analysis results must contain more specific information to be useful to individuals or to clinical experts. Future studies will need to further increase the duration of data collection and add other factors not considered in this analysis, such as mobility and eating habit information.
As the data collected becomes more diverse and the collection period increases, securing the reliability of the collected data will become a significant issue, and further research on data collection methods will be necessary. Furthermore, since the self-report method also places a burden on users, research on how to collect complex lifestyle context data without user intervention should be considered in the future.