Ubiquitous Depression Detection of Sleep Physiological Data by Using Combination Learning and Functional Networks

Nowadays, depression has become a common mental disorder with high morbidity and mortality. Owing to the limitations of traditional interview-based depression detection, achieving objective, convenient, and fast detection has become an urgent problem. This study explores ubiquitous methods of depression detection based on combination learning and functional networks, using sleep physiological data. Sleep physiological data were collected with a portable physiological data instrument, then preprocessed, and several related features were extracted. We applied combination learning to identify the best sleep stage, the optimal feature subset, and the most effective classifier for detecting depression from these physiological features. Physiological features in the optimal feature subset were mapped to nodes, and a functional network was constructed based on Euclidean distance. The optimal feature subset, combined with the functional network attributes, was used as the input of the most effective classifier to obtain the final depression detection performance. Controlled trials based on ubiquitous sleep physiological data were conducted separately for each gender. Experiments show that the best results for males and females were obtained in slow wave sleep (SWS) and rapid eye movement (REM) sleep, with accuracies of 92.21% and 94.56% and AUCs of 0.944 and 0.971, respectively. Thus, our study may provide an effective and ubiquitous method for detecting depression.


I. INTRODUCTION
Depression is a common psychological disorder characterized by persistently slowed thinking, impaired brain function, and sleep disturbance. According to World Health Organization (WHO) statistics, about 340 million people worldwide have depression [1]. It is estimated that about 8,000 people die each year due to depression [2], and up to 53.7% of people who die by suicide suffer from depression [3]. Owing to its high morbidity and mortality, depression has quietly become the leading killer among mental disorders. In addition, compared with healthy people, patients with depression usually have a significantly increased risk of cardiovascular disease, cancer, stroke, and other diseases [4]. Given the consequences of these diseases, the potential harm of depression is considerable. Depression therefore threatens the well-being of millions of patients and their families.
The associate editor coordinating the review of this manuscript and approving it for publication was Nuno Garcia.
Clinical experience has shown that severe depression is very difficult to cure completely, so early diagnosis and early treatment to prevent exacerbation over time is the most effective way to deal with depression [5]. Traditional diagnosis of depression is mainly based on face-to-face clinical interviews and structured questionnaires, such as the Patient Health Questionnaire (PHQ-9) [6] and Beck's Depression Inventory (BDI) [7]. However, this approach is time-consuming and labor-intensive, taking about 30 minutes per patient [8]. In addition, subjective factors such as the doctor's experience and the patient's concealment make misdiagnosis likely. Thus, finding an objective, effective, and convenient method is an urgent problem for depression detection.
Considerable studies have shown that the physiological electrical signals, functional magnetic resonance imaging (fMRI) results, facial expressions, and voices of patients with depression differ significantly from those of healthy people. Although these physiological and behavioral abnormalities cannot directly help treat depression, they provide a new perspective for its detection. Many researchers have therefore explored detection methods based on speech speed and volume [5] and facial dynamics [9]. Others focus on physiological electrical signals [10] and fMRI [11], which prevent patients from deliberately concealing their real reactions during behavioral measurement. Abnormalities in physiological electrical signals or fMRI are external manifestations of depression. Although fMRI has high spatial resolution, it is hampered by relatively low temporal resolution; the higher temporal resolution of physiological electrical signals compensates for this deficiency. In addition, because detection based on physiological electrical signals is more widely applicable and avoids the inconvenience of fMRI measurements, this study focuses on this type of method.
Physiological electrical signal is the general term for the potential difference across the cell membrane produced by human cells at rest or under external stimulation. Typical physiological electrical signals include the electroencephalogram (EEG), electrooculogram (EOG), chin electromyogram (EMG), and electrocardiogram (ECG). Acquisition of physiological electrical signals has the advantages of being hard to disguise, non-invasive, safe, and low cost, so it is widely used to detect various diseases. For example, an EEG-based deep learning method has been applied to detect insomnia [12]; the severity of Parkinson's disease has been assessed by analyzing the instability of EMG on both shoulders with wavelet analysis [13]; and an ECG-based method for detecting emotional changes via empirical mode decomposition was proposed in [14]. Because physiological electrical signals are hard to disguise, research on their relationship with mental diseases, especially depression [15]-[17], has attracted increasing attention and has become a research hotspot in recent years.
To avoid interference from other factors, depression studies generally use resting-state physiological electrical signals such as EEG [18]. The resting state is essentially a state of quiet, relaxed wakefulness with the eyes closed. Similarly, sleep, the most basic need of humans and other mammals, is also a state of quiet, relaxation, and temporary separation from the surrounding environment [19]. The sleep state can be regarded as a steady state, even more stable than the resting state. Modern science has shown that depression is closely related to changes in sleep structure, and about 90% of patients with depression have sleep problems [20]. Many studies have also demonstrated that sleep structure and state are important indicators for detecting depression [21]-[23]. Therefore, the study of sleep physiological signals provides a potentially feasible approach for depression detection.
So far, the pathology of depression has not been fully explained, and the relationship between sleep physiological signals and depression is still at an exploratory stage. No single method for physiological-signal-based depression detection is best in all situations [15]. Combination learning [24] has a distinctive advantage in addressing this problem. In addition, the continuing development of functional network theory [25] and wearable technology [26] in mental disease detection provides a new perspective for depression detection. Based on the above review and analysis, this paper takes ubiquitous sleep physiological data as its starting point and proposes a new depression detection method based on combination learning and functional networks.

II. OVERVIEW OF UBIQUITOUS DEPRESSION DETECTION
To clarify the research approach of this paper, Figure 1 shows the framework of ubiquitous depression detection. Its simplified implementation process is as follows:
Step 1: An experiment is designed to collect sleep physiological data from subjects using a ubiquitous physiological data instrument.
Step 2: The sleep physiological data are preprocessed, and the physiological features they contain are extracted.
Step 3: Combination learning is adopted to explore the core information needed for depression detection. This core information has three aspects: the optimal feature subset, the most effective classifier, and the best sleep stage for depression detection in each gender.
Step 4: The features in the optimal feature subset are mapped to nodes, and these nodes are used to construct a functional network.
Step 5: The optimal feature subset, combined with the functional network attributes, is used as the input of the most effective classifier to complete depression detection for each gender.

III. UBIQUITOUS EXPERIMENTS
A. UBIQUITOUS PHYSIOLOGICAL DATA INSTRUMENT
Traditionally, sleep experiments have mostly adopted polysomnography (PSG) [27] to collect physiological data. PSG is a comprehensive recording of multiple concurrent physiological signals during sleep; a typical recording includes EEG, EOG, chin EMG, ECG, oxygen saturation (SpO2), respiration (Resp), and rectal body temperature. Data acquisition with such equipment requires considerable preparation and, because the procedure is complex, a professional physician to operate it; placing all the electrodes usually takes more than half an hour. In addition, the large number of PSG wires can interfere with the subjects' normal sleep posture and thus affect their normal sleep state. For these reasons, there is an urgent need for new sleep physiological data acquisition equipment.
While still meeting the requirements of data acquisition, new sleep physiological data acquisition equipment must use fewer electrodes to achieve convenient, wireless, mobile, fast, and low-cost ubiquitous data acquisition. Recent developments in sensor and ubiquitous electronic technology have made this goal achievable. In the experiment, we used the ubiquitous physiological data acquisition instrument shown in Figure 2(a), developed by the UAIS Laboratory of Lanzhou University. The instrument has the advantages of small size, wireless data transmission, easy electrode placement, and simple operation, and ordinary personnel can collect physiological data after brief training. In addition, the accuracy and real-time performance of the instrument have been verified in previous studies [2], [28], [29].
In general, the experimental requirements can be met as long as half of the standard sleep physiological signals are collected. Therefore, the electrode montage was simplified in this experiment to support ubiquitous application while still collecting sufficient physiological data. Two types of physiological data were collected at a sampling rate of 250 Hz: 2-channel EEG (C3-A2 and O1-A2) and 1-channel EOG (LOC-A2). The specific electrode placement is shown in Figure 2(b).

C. EXPERIMENT WORKFLOW
All subjects were selected according to screening criteria jointly formulated by the project researchers and sleep physicians using internationally standardized scales. These scales mainly include:
(1) Basic health screening: the Cornell Medical Index (CMI) self-assessment health questionnaire was used to screen the subjects' physical health.
(2) Sleep quality screening: the Insomnia Severity Index (ISI) and Pittsburgh Sleep Quality Index (PSQI) were used to evaluate sleep quality.
(3) Depression severity screening: the Mini International Neuropsychiatric Interview (MINI) and PHQ-9 were used to evaluate the severity of depression.
The screening criteria were mainly used to rule out other mental diseases and major physical diseases, so that the differences among the subjects were mainly caused by depression. Subjects who met the screening criteria and signed informed consent could participate in the experiment. Because subjects are very sensitive to external stimuli during sleep, the whole experiment was conducted in a dedicated room that was quiet, dimly lit, well ventilated, free of electromagnetic interference, and kept at moderate temperature and humidity. All experiments were performed at night, and the total sleep duration was approximately 7 to 8 hours. First, the experimental procedure and precautions were explained to the subjects. Then, the experimental staff fitted the ubiquitous physiological data acquisition instrument at the corresponding electrode positions, ensuring accurate electrode placement, good electrode contact, and subject comfort. Next, pretest data were collected for at least 10 minutes. Once the pretest data were all normal, the experiment formally started and lasted 7 to 8 hours. At the end of the experiment, the staff saved the data and removed the instrument from the subject. The whole experimental workflow is shown in Figure 3. In addition, no unrelated persons were allowed to enter the room during the experiment, and the experiment was stopped at any stage as soon as the subject asked or indicated that he or she was unwilling to continue.

D. EXPERIMENTAL DATA ACQUISITION
All experiments were carried out at Tianshui Third People's Hospital, China. Based on the research objectives and screening criteria, a total of 40 subjects participated in this experiment. First, 20 subjects (female/male = 12:8, IDs 03DP*****) who met the screening criteria for depressed patients (DPs) were selected from outpatients. Then, 20 normal control subjects (NCs, female/male = 12:8, IDs 03NC*****), whose gender ratio, age, and education background basically matched those of the DPs, were recruited from Tianshui city.
Demographic variables such as age and gender strongly influence sleep physiological data and must be controlled in depressive sleep studies [33]. Meanwhile, to ensure the accuracy and rigor of the experimental data, some abnormal data were removed based on data quality and on matching between the two groups. Finally, 16 females (DPs/NCs = 1:1) and 16 males (DPs/NCs = 1:1) were retained for further study. Ages ranged from 26 to 49 in the female group and from 31 to 52 in the male group. Table 1 lists the means and standard deviations (SD) of age, education background, and PHQ-9 score for the depression and normal groups. There is no significant difference in age between the two groups, whereas there is a significant difference in PHQ-9 score, which indicates that our study detects depression rather than age effects.
Before depression detection, the experimental data of the 32 subjects were scored by two experienced, independent sleep physicians according to the new guidelines of the American Academy of Sleep Medicine (AASM) [34]. Each 30-second data segment was assigned to one of five sleep stages: wakefulness (WA), non-rapid eye movement (NREM) sleep stage 1 (NREM1), NREM sleep stage 2 (NREM2), slow wave sleep (SWS), and rapid eye movement (REM) sleep. A segment's staging result was adopted only when the two sleep physicians agreed. Table 2 summarizes the number of physiological signal segments in each of the five sleep stages used in this study.
VOLUME 8, 2020
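The epoching and agreement rule above can be sketched as follows. This is a minimal illustration, assuming each recording is a NumPy array sampled at 250 Hz and that each physician's scores are supplied as one label per 30-second epoch; the helper names are hypothetical and are not taken from the original study.

```python
import numpy as np

FS = 250         # sampling rate (Hz), as in the acquisition settings
EPOCH_SEC = 30   # AASM scoring epoch length (s)

def split_into_epochs(signal, fs=FS, epoch_sec=EPOCH_SEC):
    """Cut a 1-D recording into non-overlapping epochs; a trailing partial epoch is dropped."""
    n = fs * epoch_sec
    n_epochs = len(signal) // n
    return np.asarray(signal[: n_epochs * n]).reshape(n_epochs, n)

def consensus_epochs(epochs, labels_a, labels_b):
    """Keep only the epochs to which both scorers assigned the same stage."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    agree = labels_a == labels_b
    return epochs[agree], labels_a[agree]
```

At 250 Hz each epoch holds 7,500 samples, so an 8-hour night yields at most 960 epochs per channel before the agreement filter is applied.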

A. DATA PREPROCESSING
Previous studies [35], [36] have shown that depression-related physiological signals mainly lie between 0.5 Hz and 30 Hz. Therefore, a band-pass Butterworth filter with a low cut-off frequency of 0.5 Hz and a high cut-off frequency of 30 Hz is applied in this study to eliminate low-frequency breathing waves and high-frequency electromyographic activity.
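A minimal sketch of the 0.5-30 Hz band-pass step using SciPy. The paper does not state the filter order or whether zero-phase filtering was used, so the fourth-order design and the forward-backward `sosfiltfilt` call below are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs=250.0, low=0.5, high=30.0, order=4):
    """Zero-phase Butterworth band-pass filter (the order is an assumption)."""
    # Second-order sections are numerically safer than (b, a) coefficients
    # for a narrow low cut-off such as 0.5 Hz at fs = 250 Hz.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering removes phase distortion of the waveforms.
    return sosfiltfilt(sos, np.asarray(x, dtype=float))
```

Forward-backward filtering doubles the effective attenuation and keeps waveform timing intact, which matters when amplitude-based features such as peak-to-peak are computed later.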
To ensure the reliability of subsequent analyses and reduce the impact of abnormal data, each channel's signal is normalized before feature extraction, as shown in Figure 4. The data of each subject are normalized per channel to the range [0, 1] using Min-Max normalization, as shown in Formula 1. This helps to extract features that are more comparable between subjects while retaining the variability of each channel.
$$X_{norm} = \frac{X - \min}{\max - \min} \quad (1)$$
where X represents the original value of each data point, X_norm represents its value after normalization, min represents the minimum value of all data in a channel, and max represents the maximum value of all data in a channel.
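Formula 1 applied channel by channel can be sketched as below, assuming one subject's data are stored as an array of shape (channels, samples). The guard for a constant channel is an addition not described in the text.

```python
import numpy as np

def minmax_per_channel(data):
    """Scale each row (channel) of a (n_channels, n_samples) array to [0, 1]."""
    data = np.asarray(data, dtype=float)
    mn = data.min(axis=1, keepdims=True)   # per-channel minimum
    mx = data.max(axis=1, keepdims=True)   # per-channel maximum
    # Avoid division by zero for a flat channel; such a channel maps to all zeros.
    span = np.where(mx > mn, mx - mn, 1.0)
    return (data - mn) / span
```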

B. FEATURE EXTRACTION
As is well known, the delta (0.5-4 Hz) wave is most common during slow-wave sleep and often occurs in the process of dreaming. The theta (4-8 Hz) wave is a typical waveform of drowsiness and early sleep. The alpha (8-13 Hz) wave mainly appears in a quiet, awake state with the eyes closed, but disappears immediately after the eyes open. The beta (13-30 Hz) wave is closely related to mental anxiety and tension. Therefore, to extract more effective features, the sleep physiological signals were filtered with a Hanning filter into the four bands above (delta, theta, alpha, and beta) for further feature extraction. Sleep is not a uniform state; the physiological signals recorded during sleep are time-varying, non-stationary, and weak. Analyses of sleep physiological signals have found that they exhibit distinct linear features [30]. Because physiological time series typically exhibit complex dynamics, the importance of their analysis has long been recognized in nonlinear feature research [37]. Nonlinear features such as entropy and the Lyapunov exponent have been computed for pathological signals and shown to be useful indicators of pathology [35], [38]. Finally, based on existing research and the objectives of this study, the following features were selected for extraction.
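The band split above can be sketched with Hann-windowed FIR filters. The paper only says a Hanning filter was used; the filter length (501 taps) and the zero-phase `filtfilt` call are assumptions, and the band edges are taken from the text.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

# Band edges (Hz) as given in the text.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def split_bands(x, fs=250.0, numtaps=501):
    """Return a dict of band-limited copies of x, one per EEG band."""
    x = np.asarray(x, dtype=float)
    out = {}
    for name, (lo, hi) in BANDS.items():
        # Band-pass FIR design using a Hann window (assumed filter length).
        taps = firwin(numtaps, [lo, hi], pass_zero=False, window="hann", fs=fs)
        # Zero-phase filtering so the bands stay time-aligned with the input.
        out[name] = filtfilt(taps, [1.0], x)
    return out
```

A long filter is used because the 0.5 Hz lower edge of the delta band needs fine frequency resolution; a 30-second epoch (7,500 samples) is comfortably longer than the padding `filtfilt` requires.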

1) LINEAR FEATURE
Linear features consist of frequency-domain and time-domain features [39].

a: FREQUENCY-DOMAIN FEATURE
The autoregressive (AR) model is a linear model of stationary time series, whereas the sleep physiological signal is a typical non-stationary signal. To obtain more accurate frequency-domain features, an adaptive AR model is adopted to calculate the power spectral features. The AR model is a common method in sleep physiological signal processing [40], but existing studies have mostly used fixed-order AR models [41] rather than adaptive ones. The coefficients of a fixed-order AR model can reflect changes in the physiological signal state. Fixed-order AR modeling represents the current signal value x(t) as a weighted sum of its previous values x(t − i) plus a residual term ε(t), as shown in Formula (2):
$$x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \varepsilon(t) \quad (2)$$
where a_i are the AR model coefficients and p is the fixed order of the AR model. In this study, Akaike's information criterion (AIC) [42] is used to adaptively obtain the best order p of the AR model in each 30-second segment, so as to calculate more accurate power spectrum features. The calculation of the best order p is shown in the following algorithm (Table 3).
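The adaptive AR step can be sketched as follows. Because the algorithm in Table 3 is not reproduced here, this sketch assumes an ordinary least-squares fit of Formula (2), the common AIC form n·ln(σ²) + 2p evaluated on the same set of target samples for every order, and a hypothetical maximum order of 20. The spectrum is the standard AR form σ² / (fs·|1 − Σ a_i e^(−j2πfi/fs)|²).

```python
import numpy as np

def fit_ar_ls(x, p, start):
    """Least-squares fit of x(t) = sum_{i=1..p} a_i x(t-i) + eps(t) for t >= start."""
    # Column i-1 holds the lagged values x(t-i) for every target index t.
    X = np.column_stack([x[start - i:len(x) - i] for i in range(1, p + 1)])
    y = x[start:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ a) ** 2)  # residual variance
    return a, sigma2

def select_order_aic(x, max_order=20):
    """Return (p, a, sigma2) for the order minimizing AIC = n*ln(sigma2) + 2p."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x) - max_order  # same targets for every order, so AIC values are comparable
    best = None
    for p in range(1, max_order + 1):
        a, sigma2 = fit_ar_ls(x, p, start=max_order)
        aic = n * np.log(sigma2) + 2 * p
        if best is None or aic < best[0]:
            best = (aic, p, a, sigma2)
    return best[1], best[2], best[3]

def ar_psd(a, sigma2, fs, n_freqs=512):
    """AR power spectral density evaluated on [0, fs/2]."""
    freqs = np.linspace(0.0, fs / 2, n_freqs)
    lags = np.arange(1, len(a) + 1)
    # A(f) = 1 - sum_i a_i exp(-j 2 pi f i / fs)
    A = 1 - np.exp(-2j * np.pi * np.outer(freqs, lags) / fs) @ a
    return freqs, sigma2 / (fs * np.abs(A) ** 2)
```

For one 30-second epoch at 250 Hz, `p, a, s2 = select_order_aic(epoch)` followed by `ar_psd(a, s2, fs=250.0)` gives the spectrum from which band features can be computed.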
Finally, frequency-domain features obtained by the adaptive AR model include: absolute power spectrum, relative power spectrum, max power spectrum, and center power spectrum.
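A minimal sketch of how the four frequency-domain features could be computed from a PSD sampled on a uniform frequency grid (for example, the AR spectrum). Two points are assumptions: relative power is taken with respect to 0.5-30 Hz, and "center power spectrum" is interpreted as the power-weighted mean (centroid) frequency of the band.

```python
import numpy as np

# Band edges (Hz) as given in the text.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_features(freqs, psd, band, total=(0.5, 30.0)):
    """Absolute, relative, and maximum power plus centroid frequency of one band."""
    freqs, psd = np.asarray(freqs, float), np.asarray(psd, float)
    df = freqs[1] - freqs[0]                        # uniform grid spacing
    in_band = (freqs >= band[0]) & (freqs < band[1])
    in_total = (freqs >= total[0]) & (freqs < total[1])
    abs_power = psd[in_band].sum() * df             # rectangle-rule integral
    return {
        "absolute": abs_power,
        "relative": abs_power / (psd[in_total].sum() * df),
        "max": psd[in_band].max(),
        "center": (freqs[in_band] * psd[in_band]).sum() / psd[in_band].sum(),
    }
```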

b: TIME-DOMAIN FEATURE
Time-domain features are the most intuitive external representation of a physiological signal. Therefore, we extracted time-domain features closely related to depression and pathologies [43], [44], including peak-to-peak amplitude, variance, skewness, kurtosis, and Hjorth parameters.
Peak-to-peak amplitude: the difference between the maximum and minimum amplitudes within a segment, calculated as:
$$PP = \max(X) - \min(X) \quad (3)$$
where X = {x_1, x_2, ..., x_n} denotes the set of signal amplitudes in a segment. Hjorth parameters are statistical indices proposed by Hjorth in 1970 [45] for time-domain physiological signal processing, and one of their original aims was to address sleep-related problems. Hjorth parameters provide dynamic temporal information about sleep physiological signals and mainly include activity, mobility, and complexity. Activity represents the signal power, that is, the variance of the time function. Mobility represents the mean frequency, or the proportion of the standard deviation of the power spectrum. Complexity represents the change in frequency. The simplified calculation of these three parameters is as follows:
$$\mathrm{Activity} = \mathrm{var}(x(t)) \quad (4)$$
$$\mathrm{Mobility} = \sqrt{\frac{\mathrm{var}(x'(t))}{\mathrm{var}(x(t))}} \quad (5)$$
$$\mathrm{Complexity} = \frac{\mathrm{Mobility}(x'(t))}{\mathrm{Mobility}(x(t))} \quad (6)$$
where x'(t) is the first derivative of the signal x(t).
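The peak-to-peak amplitude, moments, and Hjorth parameters described above can be sketched with NumPy and SciPy. Derivatives are approximated by first differences, so mobility is expressed in radians per sample; this discretization is an assumption rather than a detail given in the text.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def hjorth(x):
    """Hjorth activity, mobility, and complexity using first differences as derivatives."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)          # approximates x'(t)
    ddx = np.diff(dx)        # approximates x''(t)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    # Complexity = mobility of the derivative divided by mobility of the signal.
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def time_domain_features(x):
    """Time-domain features for one segment."""
    x = np.asarray(x, dtype=float)
    activity, mobility, complexity = hjorth(x)
    return {
        "peak_to_peak": np.ptp(x),   # max(X) - min(X)
        "variance": np.var(x),
        "skewness": skew(x),
        "kurtosis": kurtosis(x),     # Fisher definition: 0 for a Gaussian
        "activity": activity,
        "mobility": mobility,
        "complexity": complexity,
    }
```

For a pure sinusoid, complexity is close to 1 and mobility approaches 2·sin(πf/fs), which gives a quick sanity check on real recordings.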

2) NONLINEAR FEATURE
As a supplement to the linear features, the following nonlinear features were extracted for analysis in this study.
Correlation dimension: this feature is an important parameter for characterizing the nonlinear dynamic complexity of physiological signals. Correlation dimensions differ significantly between mental states, so their magnitude can be used to characterize and distinguish pathological states. The calculation process is as follows. Let the physiological signal be a time series {x_t | t = 1, 2, ..., N}, and embed it into an m-dimensional space to obtain a set of vectors (points):
$$X_n = \{x_n, x_{n+J}, \ldots, x_{n+(m-1)J}\}, \quad n = 1, 2, \ldots, N_m \quad (7)$$
where J is the time delay (lag) and N_m is the number of reconstructed vectors, which satisfies:
$$N_m = N - (m-1)J \quad (8)$$
Select any point X_i among the N_m points and calculate its distance to each of the remaining N_m − 1 points:
$$r_{ij} = \lVert X_i - X_j \rVert, \quad j \neq i \quad (9)$$
Repeating this process for all points yields the correlation integral:
$$C_m(r) = \frac{1}{N_m(N_m - 1)} \sum_{i=1}^{N_m} \sum_{j \neq i} \theta(r - r_{ij}) \quad (10)$$
where θ is the Heaviside function [46]. As r → 0, the correlation integral approximately satisfies:
$$\ln C_m(r) = \ln C + CD(m) \ln r \quad (11)$$
According to Equation (11), the correlation dimension of the reconstructed space can be expressed as:
$$CD(m) = \frac{\mathrm{d} \ln C_m(r)}{\mathrm{d} \ln r} \quad (12)$$
Kolmogorov entropy: this feature reflects the rate at which the signal loses information per unit time. In a nonlinear system, a larger K entropy indicates a greater information loss rate and a more complex system, so its value can also be used to characterize and distinguish pathological states. Kolmogorov entropy is defined as:
$$K = -\lim_{\tau \to 0} \lim_{\varepsilon \to 0} \lim_{d \to \infty} \frac{1}{d\tau} \sum_{i_1, \ldots, i_d} p(i_1, \ldots, i_d) \ln p(i_1, \ldots, i_d) \quad (13)$$
where the phase space is partitioned into boxes of size ε, the state is sampled at time intervals τ, and p(i_1, ..., i_d) is the joint probability that the trajectory visits boxes i_1, ..., i_d at successive sampling times.
C0-complexity: this feature reflects the proportion of non-linear components in the original physiological signal, so pathological states can be characterized and distinguished by analyzing this proportion. Assume that the original physiological signal is the time series {x(n) | n = 1, 2, ..., N}. C0-complexity is calculated as follows. First, perform the Fast Fourier Transform (FFT) on x(n):
$$X(k) = \sum_{n=1}^{N} x(n)\, e^{-j 2\pi (k-1)(n-1)/N}, \quad k = 1, 2, \ldots, N \quad (14)$$
Then compute the average amplitude of X(k):
$$M = \frac{1}{N} \sum_{k=1}^{N} |X(k)| \quad (15)$$
Components with |X(k)| less than or equal to M are replaced by 0, giving a new spectrum sequence:
$$Y(k) = \begin{cases} X(k), & |X(k)| > M \\ 0, & |X(k)| \le M \end{cases} \quad (16)$$
Applying the inverse FFT (IFFT) to Y(k) gives y(n), and the C0-complexity is obtained as:
$$C0 = \frac{A_0}{A_1} = \frac{\sum_{n=1}^{N} |x(n) - y(n)|^2}{\sum_{n=1}^{N} |x(n)|^2} \quad (17)$$
where A_0 measures the non-linear components of the physiological signal and A_1 measures the whole signal.
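The correlation dimension and C0-complexity described above can be sketched as follows. Assumptions: Euclidean distance between embedded vectors, the correlation dimension estimated as the least-squares slope of ln C_m(r) against ln r over a user-chosen range of small radii (following Equation (11)), and the C0 threshold taken as the mean spectral amplitude, as the text describes. The function names are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist

def delay_embed(x, m, J):
    """Rows are X_n = (x_n, x_{n+J}, ..., x_{n+(m-1)J}); there are N - (m-1)J rows."""
    x = np.asarray(x, dtype=float)
    n_m = len(x) - (m - 1) * J
    return np.column_stack([x[i * J:i * J + n_m] for i in range(m)])

def correlation_dimension(x, m, J, r_values):
    """Slope of ln C_m(r) versus ln r over the given small radii."""
    d = np.sort(pdist(delay_embed(x, m, J)))   # Euclidean distances r_ij, i < j
    # C_m(r): fraction of pairs with r_ij <= r, i.e. the Heaviside sum over pairs.
    C = np.searchsorted(d, r_values, side="right") / len(d)
    slope, _ = np.polyfit(np.log(r_values), np.log(C), 1)
    return slope

def c0_complexity(x):
    """Share of signal energy left after keeping only above-average spectral components."""
    x = np.asarray(x, dtype=float)
    X = np.fft.fft(x)
    M = np.mean(np.abs(X))                        # average spectral amplitude
    Y = np.where(np.abs(X) > M, X, 0)             # zero components with |X(k)| <= M
    y = np.real(np.fft.ifft(Y))                   # "regular" part of the signal
    return np.sum((x - y) ** 2) / np.sum(x ** 2)  # A0 / A1
```

The radii must be small enough for Equation (11) to hold yet large enough that every C_m(r) is non-zero; in practice they are chosen from the linear region of the ln C_m(r) versus ln r plot.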
Shannon entropy: this feature reflects the uncertainty of the physiological signal in a nonlinear system and is defined as:
$$H = -\sum_{i} p_i \log p_i \quad (18)$$
where p_i is the probability of the i-th possible signal value.
Largest Lyapunov exponent: the Lyapunov exponent [47] indicates how fast two nearby trajectories in phase space diverge or converge. An n-dimensional system has n Lyapunov exponents, the largest of which is called the largest Lyapunov exponent. It is an important index for distinguishing differences between physiological signals, and in this study it is used to characterize and differentiate pathological states. Its calculation is based on L(t_i), which represents the shortest distance from the reference point (point 0) at time t_i.
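Shannon entropy can be sketched on an amplitude histogram. The text does not state how the probabilities p_i were estimated, so the 64-bin histogram and the base-2 logarithm below are assumptions.

```python
import numpy as np

def shannon_entropy(x, n_bins=64):
    """Shannon entropy (bits) of the amplitude distribution of x."""
    counts, _ = np.histogram(np.asarray(x, dtype=float), bins=n_bins)
    p = counts[counts > 0] / counts.sum()   # empirical probabilities, empty bins dropped
    return float(-np.sum(p * np.log2(p)))
```

The value is bounded by log2(n_bins), so entropies are only comparable across subjects when the same number of bins is used.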
In summary, this study focuses on the linear and nonlinear features of ubiquitous physiological signals and extracted a total of 240 features (16 basic features × 5 frequency bands × 3 channels) from the delta, theta, alpha, beta, and full bands of the three channels. The 16 basic features comprise 4 frequency-domain, 7 time-domain, and 5 nonlinear features. These features are shown in Table 4.

C. COMBINATION LEARNING TO EXPLORE THE RELATED INFORMATION OF DEPRESSION
Sleep structure differs between depressed patients of different genders [48], [49], but it is unclear which sleep stage is most effective for recognizing depression. Moreover, no physiological feature or feature combination has been identified that can accurately distinguish depressed patients from normal controls, nor has an established depression recognition model. To solve this problem, combination learning was adopted to explore the information relevant to depression detection.
Combination learning has been widely applied in many biomedical research fields. In this study, four classical correlation analysis methods were tried in combination learning: Relief, gain ratio, principal component analysis (PCA), and correlation-based feature selection (CFS). These methods were chosen because they are widely acknowledged in depression research [50], [51]. Considering the time complexity, feasibility, and previous use of different classification mechanisms in depression studies, we tried five representative classification algorithms: Bayesian network (BN), based on a probabilistic graphical model; support vector machine (SVM), based on statistical learning; K-nearest neighbor (KNN), based on distance; an improved random forest (RF_imp) algorithm [52]; and multilayer perceptron (MLP), based on artificial neural networks.
Pair-wise combination tests of the four correlation analysis methods and five classification algorithms (Figure 5) were performed to calculate depression detection rates for subjects of each gender in the five sleep stages. After all test results were obtained, the differences in depression detection across sleep stages were analyzed to determine the best sleep stage for depression detection in each gender. Meanwhile, the corresponding optimal feature subset and most effective classifier were identified for that stage.
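The exhaustive pair-wise search can be sketched independently of any particular selector or classifier. In this sketch the stage, selector, and classifier names are plain labels, and `evaluate` is a hypothetical callable that returns a cross-validated score for one combination; the per-stage mean corresponds to an average performance over all selector and classifier pairs in that stage.

```python
from itertools import product
from statistics import mean

def combination_search(stages, selectors, classifiers, evaluate):
    """Score every (stage, selector, classifier) triple and pick the best stage and pair."""
    results = {
        (s, f, c): evaluate(s, f, c)
        for s, f, c in product(stages, selectors, classifiers)
    }
    # Average performance of each sleep stage over all selector/classifier pairs.
    stage_avg = {
        s: mean(score for (st, _, _), score in results.items() if st == s)
        for s in stages
    }
    best_stage = max(stage_avg, key=stage_avg.get)
    # Best selector/classifier pair within the best stage.
    best_combo = max((key for key in results if key[0] == best_stage), key=results.get)
    return best_stage, best_combo, stage_avg, results
```

With five stages, four selectors, and five classifiers this evaluates 100 combinations per gender, each of which would itself be a repeated cross-validation run.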

D. CONSTRUCTION OF FUNCTIONAL NETWORKS
A functional network can be represented by a graph consisting of nodes and the edges between them. In this study, the basic unit of sleep staging was 30 seconds, so each physiological feature in the optimal feature subset, computed over 30-second segments, was defined as an n-dimensional node, as shown in Figure 6. Each node represents the time series of a feature, and each edge represents the correlation strength between two nodes. To define the edges, the feature time series are first transformed into a geometric space, and the correlation strength between nodes is then evaluated. Typical measures of correlation include distance metrics and correlation coefficients. Because distance metrics are better suited to geometric space, the Euclidean distance [53] was used to measure correlation strength. The Euclidean distance between any two nodes in geometric space is defined as:
$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} \quad (20)$$
where A = (a_1, a_2, ..., a_n) and B = (b_1, b_2, ..., b_n) are two nodes. If d(A, B) ≤ δ, there is an edge between nodes A and B; otherwise there is none. In this study, δ is determined by the connectivity between nodes, and the calculation process is shown in Algorithm 2 (Table 5). Figure 7(1) gives a simple example of a functional network generated from the Euclidean distance and threshold δ. In this example, the two optimal connectivity rates are 0.25 and 0.30, that is, 25% (e = 9) to 30% (e = 10). Finally, the upper limit of the connectivity rate is selected to obtain δ_best = 10.
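A sketch of the network construction, assuming each node is one row of an array (the time series of one feature across 30-second segments). Algorithm 2 in Table 5 is not reproduced here, so `delta_for_rate` is a hypothetical stand-in that picks δ so that a target fraction of all possible edges is present.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_matrix(node_series):
    """Pairwise Euclidean distances between nodes; row i is the time series of node i."""
    return squareform(pdist(np.asarray(node_series, dtype=float), metric="euclidean"))

def adjacency(D, delta):
    """Binary adjacency: 1 if d(A, B) <= delta, else 0; no self-loops."""
    A = (D <= delta).astype(int)
    np.fill_diagonal(A, 0)
    return A

def delta_for_rate(D, rate):
    """Hypothetical helper: smallest delta giving about `rate` of all possible edges."""
    d = np.sort(D[np.triu_indices_from(D, k=1)])   # each node pair counted once
    n_edges = max(1, int(round(rate * len(d))))
    return d[n_edges - 1]
```

Because the network attributes depend only on which edges exist, the binary adjacency matrix is all that later steps need; the distances themselves are discarded after thresholding.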
The statistical attributes of the network depend only on the connectivity between nodes, not on edge weights. Therefore, the connectivity between nodes can be represented by Formula 21, in which 1 indicates an edge between two nodes and 0 indicates no edge:
$$a_{AB} = \begin{cases} 1, & d(A, B) \le \delta \\ 0, & d(A, B) > \delta \end{cases} \quad (21)$$
According to Formula 21, the adjacency matrix of Figure 7(1) is obtained as shown in Figure 7(2).

E. STATISTICAL ATTRIBUTES OF FUNCTIONAL NETWORKS
The human body is considered the most complex functional network system in the universe [55], and understanding its intricate structure is one of the most challenging areas of modern science. Statistical attributes of functional networks may contain valuable information for depression detection, and they can be obtained from the adjacency matrix of the functional network. Therefore, the following statistical attributes were calculated and used for further depression detection in this study.
(1) Degree distribution (DD): the degree of a node is the number of edges connected to it, which is one of the most important attributes of a network. The degree distribution [56] is the probability distribution of node degrees in a network and reflects their dispersion. The calculation and results of the degree distribution for adjacency matrix 7(2) are shown in Figure 8(1).
(2) Clustering coefficient (CC): this attribute is frequently used to describe the local or global structure of a functional network [57]. The clustering coefficient of a node is the proportion of pairs of its neighbors that are themselves adjacent, and the clustering coefficient of the whole network is the average over all nodes. In this study, the clustering coefficient of the whole network is used to characterize how tightly connected node neighborhoods are. The clustering coefficient of the whole network for adjacency matrix 7(2) is shown in Figure 8(2).
(3) Jaccard similarity coefficient (JSC): this attribute reflects the similarity between two nodes in a finite sample set of the functional network. It is calculated as the number of neighbors shared by the two nodes divided by the number of nodes in the union of their neighbor sets, that is, J(A, B) = |N(A) ∩ N(B)| / |N(A) ∪ N(B)|, where N(·) denotes a node's neighbor set. The Jaccard similarity coefficient of adjacency matrix 7(2) is shown in Figure 8(3).
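The three attributes can be sketched directly from a binary adjacency matrix. Assumptions: nodes with degree below 2 get a local clustering coefficient of 0, and the Jaccard coefficient is computed for one node pair at a time; how pair-wise values are aggregated into a network-level feature is not stated in the text.

```python
import numpy as np

def degree_distribution(A):
    """P(k) for k = 0..max degree."""
    degrees = np.asarray(A).sum(axis=1).astype(int)
    return np.bincount(degrees) / len(degrees)

def clustering_coefficient(A):
    """Network clustering coefficient (mean of local values) and the local values."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)
    # (A^3)_ii counts closed 3-walks through node i, i.e. twice its triangles.
    closed = np.diag(A @ A @ A)
    denom = k * (k - 1)
    local = np.divide(closed, denom, out=np.zeros_like(k), where=denom > 0)
    return local.mean(), local

def jaccard(A, i, j):
    """|N(i) & N(j)| / |N(i) | N(j)| for nodes i and j."""
    ni, nj = set(np.flatnonzero(A[i])), set(np.flatnonzero(A[j]))
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0
```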

F. PERFORMANCE EVALUATION
In this paper, k-fold cross-validation and the area under the receiver operating characteristic (ROC) curve (AUC) are used to evaluate the performance of the proposed method.
(1) k-fold cross-validation: a popular, classical method for evaluating the performance of a classification algorithm. Each dataset is stratified into k folds, of which k − 1 folds are used for training and the remaining fold for testing. This process is repeated k times so that every fold is tested once, and the overall accuracy is defined as:
$$\mathrm{Accuracy} = \frac{1}{k} \sum_{i=1}^{k} AC_i \quad (22)$$
where AC_i is the accuracy on the i-th fold, that is, the proportion of correctly predicted samples among all tested samples:
$$AC = \frac{TP + TN}{TP + TN + FP + FN} \quad (23)$$
Here, TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
(2) Area under the receiver operating characteristic curve (AUC): a common indicator of the quality of a prediction model. The closer the AUC is to 1.0, the better the model.
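The evaluation protocol can be sketched as below. `fit_predict` is a hypothetical callable standing in for any of the classifiers; folds are stratified by assigning each class's shuffled samples to folds in round-robin order, and AUC is computed with the rank-sum (Mann-Whitney) formula. The default of 100 repetitions of 10-fold cross-validation matches the protocol used in the experiments.

```python
import numpy as np
from scipy.stats import rankdata

def accuracy(y_true, y_pred):
    """(TP + TN) / (TP + TN + FP + FN)."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def stratified_folds(y, k, rng):
    """Fold index per sample; each class is spread evenly across the k folds."""
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        fold_of[idx] = np.arange(len(idx)) % k
    return fold_of

def repeated_cv_accuracy(X, y, fit_predict, k=10, repeats=100, seed=0):
    """Mean accuracy over k * repeats train/test splits."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    scores = []
    for _ in range(repeats):
        fold_of = stratified_folds(y, k, rng)
        for f in range(k):
            test = fold_of == f
            pred = fit_predict(X[~test], y[~test], X[test])
            scores.append(accuracy(y[test], pred))
    return float(np.mean(scores))

def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney rank-sum statistic (ties averaged)."""
    y_true = np.asarray(y_true).astype(bool)
    ranks = rankdata(scores)
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

With 16 subjects per gender, each test fold in 10-fold cross-validation holds only one or two subjects' worth of samples per class, which is one reason repeating the procedure with different shuffles stabilizes the estimate.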

V. RESULTS AND DISCUSSIONS
The experimental data analysis of this study is implemented in MATLAB 2017b, and the experimental hardware and software configuration is shown in Table 6.

A. STATISTICAL ANALYSIS RESULTS OF COMBINATION LEARNING
The 240 physiological features extracted from the ubiquitous experiment data were used as the input of combination learning, which applied the four correlation analysis methods and five classification algorithms pair-wise to detect depression in WA, NREM1, NREM2, SWS, and REM for males and females, respectively. To obtain an unbiased result, 10-fold cross-validation was adopted to evaluate detection performance. In addition, to obtain a statistically meaningful result, 10-fold cross-validation was repeated independently 100 times, meaning that classification was executed 1000 times and the average was taken as the final result. Tables 7 and 8 show the average performance and average number of features obtained from combination learning in different sleep stages for each gender. Because one of the main purposes of this study is to identify the sleep stage most informative for detecting depression, combination learning was applied to WA, NREM1, NREM2, SWS, and REM separately. In both tables: (1) AgP: the average performance of the five classifiers (BN, SVM, KNN, RF_imp, and MLP) in distinguishing depressed subjects from normal controls. (2) AgN: the average number of features with which the five classifiers achieve their highest performance under each correlation analysis method. (3) None: no feature selection algorithm was used.
From Tables 7 and 8, it can be seen that the best sleep stages for males and females are SWS and REM, with average performances of 84.62% and 85.72%, respectively. These results suggest that the SWS stage in males and the REM stage in females are closely related to the detection of depression. Although it is still an open question whether differences in sleep structure are state or trait markers of depression, previous studies have indicated a close relationship among sleep structure, depression, and gender, and have found significant differences in sleep structure between depression groups and normal groups [58]. Researchers [33], [59], [60] reported significant changes in SWS and REM between depressive patients and normal controls, which is to some extent consistent with our results. The percentage of SWS did not differ between genders among normal controls [61], whereas in the depression group males had significantly fewer SWS counts than females [48]. Together, these two studies indirectly explain why the SWS stage is the most effective for detecting depression in males.
In addition, Tables 7 and 8 show that the highest average performance of depression detection in both genders is obtained with CFS. Why is CFS superior to the other correlation analysis methods? We believe the main reason is as follows: CFS [62] searches for the optimal feature subset using a heuristic evaluation function. This function favors subsets containing features that are highly correlated with the class yet uncorrelated with each other. Irrelevant features are screened out because they have a low correlation with the class, and redundant features are removed because they are highly correlated with the remaining features. The basic principle by which CFS computes the optimal feature subset is shown in Figure 9. Owing to these advantages, CFS is widely recognized and applied in related fields such as depression detection [63] and sleep disorder recognition [64]. Table 9 lists the optimal feature subsets for depression detection based on CFS. For males, the optimal feature subset is mainly derived from the EEG delta wave. For females, apart from EOG features, it is mainly derived from the EEG alpha, beta and theta waves. The new guidelines developed by the AASM [34] state that the EEG delta wave is the main basis for determining the SWS stage, and that the EEG alpha, beta and theta waves, together with the EOG amplitude, are the main basis for determining the REM stage. These criteria coincide with the optimal feature subsets in Table 9, which further supports, from the perspective of sleep medicine, that SWS and REM are the best sleep stages for depression detection in males and females, respectively.
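The heuristic evaluation function of CFS is usually written as the merit of a k-feature subset S, Merit_S = k·r̄_cf / sqrt(k + k(k−1)·r̄_ff), where r̄_cf is the mean feature-class correlation and r̄_ff the mean feature-feature inter-correlation. The sketch below makes two simplifying assumptions: it uses absolute Pearson correlation (Hall's CFS typically uses symmetric uncertainty on discretized features), and it replaces the best-first search of the original algorithm with a greedy forward search that stops as soon as the merit no longer improves. Feature names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant vector."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)

def cfs_merit(subset: List[str], features: Dict[str, Sequence[float]],
              labels: Sequence[int]) -> float:
    """Merit_S = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(features: Dict[str, Sequence[float]],
                labels: Sequence[int]) -> Tuple[List[str], float]:
    """Greedy forward selection: add the feature that most raises the merit."""
    selected: List[str] = []
    best = 0.0
    remaining = list(features)
    while remaining:
        scored = [(cfs_merit(selected + [f], features, labels), f) for f in remaining]
        merit, f = max(scored, key=lambda t: t[0])  # first feature with top merit
        if merit <= best:  # no improvement: stop searching
            break
        selected.append(f)
        best = merit
        remaining.remove(f)
    return selected, best
```

The denominator is what penalizes redundancy: adding an exact copy of an already selected feature raises r̄_ff to 1 and leaves the merit unchanged, so the search does not keep it.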
In the above sections, we used the correlation analysis methods in combination learning to identify the best sleep stages for depression detection in each gender. However, this does not show which classification algorithm in combination learning contributes most to depression detection. Therefore, based on CFS, we further analyze the most effective classifiers for depression detection in the best sleep stage of each gender. The performance and ROC curves of the different classifiers are shown in Figures 10 and 11. Figure 10 shows that the performances of SVM and RF imp are very close for both males and females, and both exceed 86.5%. Moreover, Figure 11 shows that the AUCs of SVM and RF imp are also very close, and both exceed 0.920. Taking into account the performance, AUC and training time of SVM and RF imp (Table 10), this study ultimately uses RF imp for further depression detection.

VOLUME 8, 2020

Why is RF imp superior to the other classifiers? We believe the main reasons can be summarized as follows: (1) In this study, experimental data are used only when the sleep staging results of the two sleep physicians agree. In other words, the data used in this study are not continuous; they are a collection of sleep physiological data segments drawn from throughout the night. This data usage strategy matches the random sampling mechanism of RF imp [52], which gives RF imp strong generalization ability and allows it to uncover information hidden in the data, yielding higher depression detection performance.
(2) During RF imp training, the decision trees are built independently and can be trained in parallel [65], so the training time of RF imp is shorter than that of the other classification algorithms.
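The random sampling behind both reasons can be illustrated with the standard bagging scheme: each tree receives a bootstrap sample drawn with replacement plus a random subset of features, and on average about (1 − 1/n)^n ≈ 1/e ≈ 36.8% of the samples are left out of each bootstrap ("out-of-bag"). Because no tree's sample depends on another's, the trees can be trained in parallel. This is a sketch of generic random-forest sampling, not the RF imp variant of [52]; for brevity it draws one feature subset per tree (random-subspace style), whereas Breiman's random forest re-draws features at every split. `bootstrap_plan` and its parameters are hypothetical.

```python
import random
from typing import Dict, List

def bootstrap_plan(n_samples: int, n_features: int, n_trees: int,
                   max_features: int, seed: int = 0) -> List[Dict[str, List[int]]]:
    """Draw, independently for each tree, a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]  # with replacement
        cols = rng.sample(range(n_features), max_features)           # without replacement
        oob = sorted(set(range(n_samples)) - set(rows))              # rows this tree never sees
        plan.append({"rows": rows, "features": cols, "oob": oob})
    return plan

def mean_oob_fraction(plan: List[Dict[str, List[int]]], n_samples: int) -> float:
    """Average share of out-of-bag samples per tree (close to 0.368 for large n)."""
    return sum(len(t["oob"]) for t in plan) / (len(plan) * n_samples)
```

Since each entry of the plan is self-contained, the trees could be fitted concurrently (for example with `concurrent.futures`), which is the parallelism the second reason refers to.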

B. STATISTICAL ANALYSIS RESULTS BASED ON FUNCTIONAL NETWORKS
To investigate the effect of functional networks on depression detection performance, the optimal feature subsets in Table 9 are used to construct functional networks for each gender, following the method in Section 4.4. Two sets of controlled trials are then designed for the functional networks, and the experimental data are statistically analyzed. In the first set of experiments, only the three functional network attributes, namely degree distribution, clustering coefficient and Jaccard similarity coefficient, are used to detect depression. In the second set of experiments, the optimal feature subset in Table 9 is combined with the three functional network attributes to detect depression. The specific process and results are described in detail below:
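The network construction and the three attributes can be approximated as follows, under assumptions that the text here does not pin down: each feature of the optimal subset becomes a node represented by a vector of its values (for example, across signal segments), and two nodes are linked when their Euclidean distance falls below a hypothetical `threshold`. The attributes use their textbook definitions: the degree distribution P(k), the local clustering coefficient C_i = 2e_i / (k_i(k_i − 1)), and the Jaccard similarity |N(a) ∩ N(b)| / |N(a) ∪ N(b)| of two nodes' neighborhoods.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Set

Graph = Dict[str, Set[str]]

def build_network(node_vectors: Dict[str, Sequence[float]], threshold: float) -> Graph:
    """Link two feature nodes when their Euclidean distance is below `threshold`."""
    adj: Graph = {name: set() for name in node_vectors}
    for a, b in combinations(node_vectors, 2):
        if math.dist(node_vectors[a], node_vectors[b]) < threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree_distribution(adj: Graph) -> Dict[int, float]:
    """P(k): fraction of nodes whose degree equals k."""
    counts: Dict[int, int] = {}
    for nbrs in adj.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    return {k: c / len(adj) for k, c in sorted(counts.items())}

def clustering_coefficient(adj: Graph, node: str) -> float:
    """Share of the node's neighbor pairs that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def jaccard(adj: Graph, a: str, b: str) -> float:
    """Jaccard similarity of the neighbor sets of nodes a and b."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0
```

With a distance threshold, a smaller `threshold` yields a sparser network; the choice of threshold is a design parameter of this sketch, not a value taken from the study.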

1) ANALYSIS AND COMPARISON BASED ON FUNCTIONAL NETWORK ATTRIBUTES
Building on the results in Section 5.1, we analyze and compare the depression detection performance of functional network attributes in the following four scenarios:
Scenario 1: the degree distribution is calculated and used as the input of RF imp.
Scenario 2: the clustering coefficient is calculated and used as the input of RF imp.
Scenario 3: the Jaccard similarity coefficient is calculated and used as the input of RF imp.
Scenario 4: the degree distribution, clustering coefficient and Jaccard similarity coefficient are used together as the input of RF imp.
According to the experimental results shown in Figure 12, the depression detection ability of the Jaccard similarity coefficient is significantly lower than that of the degree distribution and the clustering coefficient for both males and females. We believe this is because the degree distribution and clustering coefficient differ significantly between depression patients and normal controls, which improves the detection ability of these two attributes. Similar conclusions have been reported in depression studies based on network attributes. The study in [66] reports that the degree distribution of functional networks in patients with depression clearly tends to be homogeneous. Hu's team [67] found that the clustering coefficients of depression patients were significantly lower than those of healthy controls. Sun et al. [18] reported that the clustering coefficient was significantly negatively correlated with the level of depression. In summary, the Jaccard similarity coefficient cannot serve as a key attribute for detecting depression, probably because of overlaps in network similarity. However, combining all three functional network attributes improves detection performance to some extent.

2) ANALYSIS AND COMPARISON BASED ON OPTIMAL FEATURE SUBSET AND FUNCTIONAL NETWORK ATTRIBUTES
Building on the previous section, we analyze and compare depression detection performance in the following three scenarios:
Scenario 1: the optimal feature subsets (OFS) in Table 9 are used as the input of RF imp.
Scenario 2: the three functional network attributes are combined as the input of RF imp.
Scenario 3: the OFS and the three functional network attributes are combined as the input of RF imp.
Figure 13 illustrates the comparison results of these three scenarios, and the best detection results are shown in Table 11. The results show that the optimal feature subset, as the key factor, can effectively detect depression, while the functional network attributes, as a secondary factor, can further improve detection performance. These findings reveal that: (1) the relationship between sleep physiological features and depression, that is, between features and classes, plays a dominant role in depression detection, since CFS selects features based on their correlation with the class; and (2) the relationships among sleep physiological features play a secondary role, since these features are the basis on which the functional networks are constructed.
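The scenarios in this section and the previous one differ only in which feature blocks are concatenated into the RF imp input vector. A minimal sketch, assuming each block is already a fixed-length list of numbers per subject; how the network attributes are summarized into fixed-length vectors is not specified here, so the block and scenario names below are hypothetical.

```python
from typing import Dict, List

def scenario_inputs(ofs: List[float], degree: List[float], clustering: List[float],
                    jaccard: List[float]) -> Dict[str, List[float]]:
    """Build the classifier input vector for each scenario by concatenating blocks."""
    network = degree + clustering + jaccard  # all three functional network attributes
    return {
        "degree only": degree,
        "clustering only": clustering,
        "jaccard only": jaccard,
        "network attributes": network,
        "OFS": ofs,
        "OFS + network attributes": ofs + network,
    }
```

Keeping the OFS block first in the combined vector makes it easy to compare feature importances between the "OFS" and "OFS + network attributes" models.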

C. DETECTION PERFORMANCE AND THE NUMBER OF PHYSIOLOGICAL SIGNAL SEGMENTS
In this section, we evaluate and analyze the relationship between depression detection performance and the number of physiological signal segments. The OFS and functional network attributes are used as the input of RF imp, and detection performance is compared for the following numbers of physiological signal segments: 1000, 2000, 3000 and 4000. Figure 14 shows the detection performance obtained with these four segment counts for each gender. The comparison shows that increasing the number of physiological signal segments improves detection performance, but beyond 3000 segments the improvement is relatively small. Therefore, we suggest using more than 3000 physiological signal segments when applying the proposed method to detect depression.
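The choice of 3000 segments was read off Figure 14. One way to make such a criterion explicit is to pick the smallest segment count after which the next increase gains less than a chosen tolerance. The function and its `min_gain` tolerance below are hypothetical and not part of the study.

```python
from typing import Dict

def saturation_point(perf_by_segments: Dict[int, float], min_gain: float) -> int:
    """Smallest segment count after which the next step improves by < min_gain."""
    counts = sorted(perf_by_segments)
    for prev, nxt in zip(counts, counts[1:]):
        if perf_by_segments[nxt] - perf_by_segments[prev] < min_gain:
            return prev  # adding more segments no longer pays off
    return counts[-1]  # performance was still improving at the largest count tested
```

The result depends on the tolerance: a stricter `min_gain` pushes the recommended segment count upward.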

VI. CONCLUSION
To explore convenient, fast, objective and reliable methods for detecting depression, this study performed automatic detection and discrimination of DPs from NCs based on combination learning and functional networks using ubiquitous physiological data. Through combination learning, we found that the best sleep stages for males and females are SWS and REM, respectively, each with a different optimal feature subset, and that the most effective classifier is RF imp. When functional network attributes were added to further improve detection performance for DPs, we found that, as a supplement to the optimal feature subset, they improve detection performance by at least 4.6%. In addition, we investigated the relationship between detection performance and the number of physiological signal segments. The experimental results show that detection performance remains almost unchanged once the number of physiological signal segments exceeds 3000. In conclusion, these findings provide insights into depression and tools for its detection.