Individual Commute Time Recognition Based on the Hierarchical Semantic Model

Individual commute time recognition is essential for traffic demand management. However, this problem has yet to be studied. In this study, we propose a hierarchical semantic model (HSM) to recognize individual commute time. To the best of our knowledge, this work is the first to integrate large-scale travellers' commute time prediction at an individual level. HSM consists of a low and a high semantic layer. The low semantic layer models spatial, temporal and environmental information, whereas the high semantic layer recognises commute time using the hidden Markov model on the basis of the low semantic layer outputs. Experimental results demonstrate the effectiveness of our proposed model for individual commute time recognition.


I. INTRODUCTION
The recognition of individual commute time can improve the efficiency of transportation systems. This topic is crucial to the study of traffic demand management.
Commuters have diverse commuting behaviour. Some of them have a fixed commuting mode, as shown in Fig. 1. The commute time of some workers, such as civil servants, doctors and company employees, does not change in a short period. Other workers, such as couriers, have no apparent characteristics. In this study, we aim to recognise the commute time of people with a fixed commuting pattern, for whom traffic demand can be forecast with a high degree of confidence at an individual level.
Datasets used in related research are concentrated on commuting behaviour at the group level instead of the individual level. For example, Toole et al. [1] used call detail records from mobile phones in conjunction with open and crowd sourced geospatial data, census records and surveys to estimate travel demand and infrastructure use. To predict the behaviour of travellers accurately, we use mobile phones as carriers and form our dataset by collecting the sensor data of the subjects for up to a month. We call this dataset commuting data from phone sensors (CPS) and apply it to our research.
The associate editor coordinating the review of this manuscript and approving it for publication was Jenny Mahoney.
From the sensor data, we can obtain low semantic data, which include the basic actions and frequent locations of individuals. High semantic data, which consist of home and work locations and meaningful commute-related indoor and outdoor transitions, are then extracted. We use hierarchical clustering to obtain frequently visited sites and use the Gaussian mixture model (GMM) and the hidden Markov model (HMM) to identify human actions simultaneously.
The main contributions of this study are as follows: (1) We collect a new dataset named CPS for individual commuting analysis based on phone sensor data.
(2) To the best of our knowledge, this study is the first to propose a hierarchical semantic model for commute time recognition at an individual level.
(3) An end-to-end framework is proposed to recognize an individual's commute time.
The rest of this paper is organized as follows: In Section II, related work is reviewed. In Section III and Section IV, we present our proposed method. In Section V, we discuss our data and experiments in detail. Lastly, we summarize and conclude the study in Section VI.

II. RELATED WORK
In the literature, a common method for predicting commute time is to study the major factors that affect commute time in a specific commute scenario and predict the average commute time of the corresponding group by modelling these factors. For example, Ford et al. [3] predicted commute time by studying the major factors affecting the commute time of people who use shared bicycles in Wall Street in New York City. This method discovers the major factors that affect the commute time of the group; thus, efficient traffic management measures are formulated from the group's perspective. However, this traditional method of prediction and recognition ignores the individual needs of commuters and cannot achieve commute time identification at the individual level. In our method, we overcome this limitation by modelling individual sensor data. This work builds upon the CPS dataset described in Section V because the data and method of commute time recognition have not been studied yet. Nevertheless, four topics are related to this study: (1) related datasets, (2) activity recognition, (3) GMM-HMM model and (4) commute location recognition. Therefore, we review them separately.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A. DATA
To the best of our knowledge, the most similar datasets to CPS are the Actitracker dataset published by WISDM Labs [3] and the Har dataset of UCI [4]. The Actitracker dataset collected accelerometer data from 36 mobile phone users at a frequency of 20 Hz, totalling 1,098,207 records. Meanwhile, the Har dataset involved 30 subjects aged 19-48 and has a total of 10,299 records. CPS is different in two major aspects. Firstly, CPS is an unlabelled dataset which simulates real life and is sampled in open scenarios, whereas Actitracker and Har are both labelled, and the mobile phones used for sampling must be fixed to a part of the subject's body. Secondly, CPS samples 13 kinds of sensors, thereby providing a data foundation to identify the subjects' environment; by contrast, Actitracker only samples accelerometer data, and Har samples accelerometer and gyroscope data. Table 1 shows the composition and relation information of the Actitracker, Har and CPS datasets.
Although no similar work has been conducted on commute time recognition, many datasets already include relevant information, such as travel survey databases of residents and public transportation datasets. The residents' travel survey databases, which comprise family, personal and travel information tables, are frequently used to construct time assignment models in travel demand management. Alexander et al. [5] generated a user travel matrix using the private survey data of the United States for the analysis and modelling of traffic planning and investment; Toole et al. [1] combined CDR and other datasets to estimate travel demand and infrastructure usage. As public transportation is one of the main commute modes, public transportation datasets make it possible to identify passengers' commute time. Chen et al. [6] used ATPS to analyse commuters' travel demand and optimise bus management strategies. CPS differs from the above datasets in the sense that (1) phone sensor data are easy to obtain, and (2) the data are flexible and can reflect the users' environment in detail. Although the microphone information of the user is collected and analysed in the CPS dataset, we only extract the loudness data of the environment and do not save the microphone recording; thus, we do not violate the code of ethics. Moreover, we informed the subjects of relevant information before the experiment and obtained their consent.

B. ACTIVITY RECOGNITION
Sensor-based activity recognition (AR) uses various sensors to measure the user's activity status, and the available hardware devices can be divided into three categories: (1) mobile phones, (2) wearable devices and (3) special sensors. San-Segundo et al. [7] used multiple sensors in mobile phones to identify six kinds of behavioural patterns, such as running and walking. Takahiro et al. [8] improved the accuracy of human activity recognition using ensemble learning based on single inertial measurement unit sensors. Ghosh et al. [9] used ultrasound sensor arrays to identify human activities. In addition to the expansion of sophisticated sensors, the authors in [10] developed a new strain sensor with excellent biocompatibility; the sensor can detect various human movements, including that of the wrist and fingers, breathing, speaking and swallowing.
Sensor-based AR algorithms have been continuously improved. Triboan et al. [11] proposed a semantics-based segmentation method for sensor data sequences to identify the complex activities of the elderly. In 2018, Hussein et al. [12] extracted features from mobile phone accelerometers worn by subjects, and further classified the features using random forest classifiers to identify human activities. Similarly, Reunanen et al. [13] presented a computational graph structure for human activity detection, which used accelerometers. Honghe et al. [14] proposed a wavelet tensor fuzzy clustering method for multi-sensor human activity recognition. Hsu et al. [15] described an activity recognition algorithm based on wearable devices, extracted relevant features after productive human activities and then used PCA to recognize social activities. Moreover, a multi-sensor classification and multi-layer fusion model based on entropy weight was proposed by Ming et al. [16] for human motion recognition with wearable devices.
In our work, we refer to the method of extracting behavioural patterns in [7] and identify six actions, namely, running, walking, climbing up, climbing down, static and unknown.

C. GMM-HMM MODEL
Our method uses the GMM-HMM model to calculate the sequence of indoor and outdoor states of individuals with time stamps and place labels.
The GMM-HMM model was introduced in the 1990s and is mainly used in the field of speech recognition. The authors of [17] first combined the output probability of the normalised neural network with the transmission probability of the HMM model. Furthermore, Renals et al. [18] presented a combination method of scale similarity generated by two-layer MLP and GMM-HMM. Zhenjun et al. [19] proposed a linear spectrum frequency conversion method based on HMM-HMM, which can improve the naturalness and clarity of speech conversion. Zweig et al. [20] combined PLP-GMM and hybrid systems using the SCARF framework. Pang et al. [21] improved system performance by embedding a competitive penalty learning mechanism under the hidden Markov state during model training. Yan et al. [22] used DNN for feature extension based on the GMM-HMM model. Wang et al. [23] used the GMM-HMM model for single-channel speech separation. Zimmermann et al. [24] applied the features learned by neural networks to the GMM-HMM speech recognition system, and their work improved the accuracy of speech recognition. Heck et al. [25] proposed the DPFMM-HMM acoustic unit recogniser to enhance the performance of the language model.
In addition to their application in speech recognition, GMM-HMM models are used in capturing remarkable changes in the state of a mobile robot's motion [26] and in segmenting human activities.

D. COMMUTER LOCATION RECOGNITION
Studies related to commuter location recognition are lacking. Ahas et al. [27] proposed the use of the passive location dataset provided by Estonia's mobile operator EMT in 2010 to determine the home and work location of users. They used the geographic coordinates of the data to determine the anchor points for the home and workplace on the basis of the standard deviation of the call's average-start-time and call-start-time to arrive at a specific location. In [6], the authors deduced the passengers' home and workplace by calculating the passengers' ride frequency and spatially clustering the passengers' bus stop coordinates based on the public bus system data (ATPS). However, our method is different from the above studies in the following aspects: (1) The GPS data we use are active positioning data, which can reflect the user's position accurately and in a timely manner. (2) To the best of our knowledge, our work is the first to cluster discrete GPS points and construct the time-place distribution map to determine the user's home and workplace.

III. HIERARCHICAL SEMANTIC MODEL
Before presenting the proposed model, we first formalize our task. The input is a time-stamped sequence of phone sensor data of length T, $\{G, M, A, G_y, R\}_{t=1}^{T}$, where $(G, M, A, G_y, R)$ represent GPS, microphone, accelerometer, gyroscope and rotation vector data, respectively. The output is a sequence of indoor and outdoor states of individuals with time stamps and place labels, $\{I^*_t\}$. By calculating the time interval between home and work location in $\{I^*_t\}$, we can obtain the individual commute time on the day.
FIGURE 2. Hierarchical semantic structural diagram. An end-to-end framework whose input is sensor data and output is individual commuting time.
As illustrated in Figure 2, the hierarchical semantic model is composed of a low semantic layer and a high semantic layer. The task of the low semantic layer is to recognise the individual's action state and relevant locations in accordance with the input sequence, and the high semantic layer fuses the two to obtain $\{I^*_t\}$.

A. LOW SEMANTIC LAYER
In this work, the function of the low semantic layer lies in data processing, individual activity recognition and regularly visited place recognition. We divide the GPS state into a strong state and a weak state in accordance with the number of satellites captured by GPS. Ambient noise is similarly divided into two states. Given that the original accelerometer data depend on the position of the mobile phone, we use the quaternion method to convert the raw data so that they fit the earth's coordinate system.
The recognition of individual activities can be described in three parts. Firstly, we segment the processed accelerometer and gyroscope data sequence using a sliding window and then form feature vectors by extracting features from each window. Considering the high density and serialisation of the input data and inspired by speech recognition algorithms, we select the GMM-HMM model to recognise individual activities from the group of feature vectors. The advantage of GMM-HMM over the HMM model is that the sample points projected by GMM are not assigned a definite classification marker but a classification probability, which allows the action state to be predicted accurately.
The locations that individuals visited are obtained by clustering GPS trajectories, and an individual's regularly visited places are recognised by counting those locations. The beginning and the end of a path reflect an individual's departure and destination; thus, we intercept the head and tail parts of the GPS track for clustering. Each intercepted part is treated as a set of points, and our goal is to find a centre point that can represent the set. To solve this task, we choose the AGglomerative NESting algorithm (AGNES) [28], a hierarchical clustering method, to cluster the points in this set, and the final clustered point is considered as the recognised location. We choose hierarchical clustering instead of other methods because those methods require central points to be designated manually before clustering, which may cause subjective effects.

B. HIGH SEMANTIC LAYER
The high semantic layer fuses the information of the individual's basic actions and frequent locations, which are mined from phone sensor data in the low semantic layer. In this layer, basic motions and the states of GPS and ambient noise are used to construct a hidden Markov group to recognise the indoor or outdoor states of individuals. Afterward, the state sequence is marked with location information on a time scale, and the locations of home and work are recognised in accordance with the time distribution of sites obtained in the low semantic layer.

A. LOW SEMANTIC INFORMATION RECOGNITION
The low semantic level focuses on individuals' basic activities and frequent location recognition.

1) INDIVIDUAL BASIC ACTION RECOGNITION
Individual activity recognition is divided into three parts: preprocessing of sensor data, feature extraction and activity recognition.

a: DATA PRE PROCESSING
The sensor data used for recognising individuals' activities are of three kinds: accelerometer, gyroscope and rotation vector sensor data. Before the system works, we use the quaternion method [29]-[31] to transform the accelerometer sensor data.
Let the quaternion given by the rotation vector sensor be $(p_1, p_2, p_3, \lambda)$, where $p_1$, $p_2$, $p_3$ represent the rotation vector components along the coordinate axes x, y and z, respectively, and $\lambda$ represents the value of the rotation vector. The raw accelerometer data, composed of the acceleration component on each axis, can be represented by the vector $[a_x, a_y, a_z]$. We can obtain the unaffected coordinates using Formula (1),
where M is a transformation matrix used in the quaternion method, given by Formula (2). We use matrix M to transform the coordinate system and convert the accelerometer data of mobile phones into carrier coordinates. For the derivation of the coefficients of matrix M, see Henderson et al. [32].
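The coordinate transformation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that $(p_1, p_2, p_3)$ is the vector part and $\lambda$ the scalar part of a unit quaternion, and it uses the standard rotation matrix for that convention, which may differ in sign or layout from Formula (2). The function names `quaternion_to_matrix` and `rotate` are hypothetical.

```python
def quaternion_to_matrix(p1, p2, p3, lam):
    """Standard rotation matrix for a quaternion with vector part (p1, p2, p3)
    and scalar part lam (assumed convention; normalised before use)."""
    norm = (p1 * p1 + p2 * p2 + p3 * p3 + lam * lam) ** 0.5
    x, y, z, w = p1 / norm, p2 / norm, p3 / norm, lam / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, vec):
    """Apply a 3x3 matrix to a 3-vector, e.g. raw acceleration [ax, ay, az]."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


# Example: a 90-degree rotation about the z axis maps the x axis onto the y axis.
half = 0.5 ** 0.5
M = quaternion_to_matrix(0.0, 0.0, half, half)
converted = rotate(M, [1.0, 0.0, 0.0])
```

With this convention, the identity quaternion (0, 0, 0, 1) leaves the acceleration vector unchanged, which is a quick sanity check for any device orientation data fed into the sketch.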

b: FEATURE EXTRACTION
The converted vector $[a_x, a_y, a_z]$ and the gyroscope sensor data are sampled at a 50 Hz rate and filtered for noise reduction [7]. A Butterworth low-pass filter with a cut-off frequency of 0.3 Hz is used to separate the gravitational and body motion components included in the sensor acceleration signals. The sampling rate is sufficient for capturing human body motion, given that more than 95% of its energy is contained below 15 Hz [33]. Then, the processed sequences are grouped into frames by fixed-width windows of 2.56 s with 50% overlap (128 samples per frame with an overlap of 64 samples). We use the method in [34] for feature extraction. For each frame, a feature vector is extracted by computing measurements from the time and frequency domains of the inertial signals. The feature vector consists of 561 features, including well-known standard measures [35], such as mean, correlation, signal magnitude area (SMA) and autoregression coefficients [36]. In [34], new features were included: energy of different frequency bands, frequency skewness, and the angle between vectors (e.g. mean body acceleration and vector). Further details are provided in [34].
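The framing step can be illustrated with a short sketch. It covers only the windowing (128 samples per frame, step of 64) and two of the standard measures named above; the Butterworth filtering and the full 561-feature set of [34] are omitted. The function names are hypothetical, and the SMA shown is one common definition (the per-frame mean of the summed absolute axis values), assumed here rather than taken from the paper.

```python
def frame_signal(samples, width=128, step=64):
    """Split a sequence into fixed-width frames; step = width / 2 gives 50% overlap."""
    return [samples[i:i + width] for i in range(0, len(samples) - width + 1, step)]


def frame_mean(frame):
    """Mean of one frame of a single-axis signal."""
    return sum(frame) / len(frame)


def signal_magnitude_area(ax, ay, az):
    """SMA of one frame: mean over the frame of |ax| + |ay| + |az| (assumed definition)."""
    return sum(abs(x) + abs(y) + abs(z) for x, y, z in zip(ax, ay, az)) / len(ax)


# Example: 256 samples at 50 Hz (5.12 s) give three overlapping frames of 2.56 s each.
frames = frame_signal(list(range(256)))
```

Trailing samples that do not fill a complete frame are dropped in this sketch; whether the original pipeline pads or discards them is not stated.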

c: ACTION STATE RECOGNITION
Complex action recognition is difficult; in general, the more basic the recognised actions are, the higher the accuracy obtained. We identify six kinds of basic actions (running, static, walking, climbing up, climbing down and unknown) using the GMM-HMM model [7]. Similar to the literature [7], we construct five models to recognise the five basic actions other than unknown. Frames described by 561 features are represented as feature vectors grouped by minute, in alignment with the division of an individual's indoor/outdoor states. The mixed Gaussian distribution generated by the feature vectors is processed using the GMM models and used as the input sequence of the HMM models; the probability of the input action under each model is obtained by calculating the joint probability of the sequence path of that model. The probabilities of unknown actions are set to negative infinity.
The probabilities obtained from the GMM-HMM models can be represented as $\{P_{run}, P_{motionlessness}, P_{walk}, P_{up}, P_{down}\}$, and the maximum value is expressed as follows:
$$P = \max\{P_{run}, P_{motionlessness}, P_{walk}, P_{up}, P_{down}\} \quad (3)$$
We recognise the action class label on the basis of the maximum value. However, if
$$P < T_{unknown} \quad (4)$$
where $T_{unknown}$ is a threshold set to recognise unknown actions, the action is recognised as unknown. The output value of unknown actions becomes very small (almost zero) after passing through the model group, whereas the probability of the corresponding action becomes high. We group running, climbing up and climbing down into other actions, so that the six recognised actions are further classified into four: static, walking, other actions and unknown actions.
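The decision rule can be sketched as below. The sketch assumes that each of the five models returns a score (for instance a log-likelihood) for the current minute and that an action is labelled unknown when the best score falls below the threshold; the exact comparison used in the paper is not recoverable from the text, so this is an assumption. The names `classify`, `ACTIONS` and `COARSE` are hypothetical.

```python
ACTIONS = ("run", "motionlessness", "walk", "up", "down")

# Grouping into the four classes used by the high semantic layer.
COARSE = {
    "run": "other",
    "up": "other",
    "down": "other",
    "motionlessness": "static",
    "walk": "walking",
}


def classify(scores, t_unknown):
    """Return (fine label, coarse label) for one minute of model scores.

    scores: dict mapping each action in ACTIONS to its model score.
    t_unknown: threshold below which the best score is treated as unknown (assumed rule).
    """
    best = max(ACTIONS, key=lambda action: scores[action])
    if scores[best] < t_unknown:
        return "unknown", "unknown"
    return best, COARSE[best]


# Example with made-up log-likelihood scores.
label = classify(
    {"run": -50.0, "motionlessness": -10.0, "walk": -30.0, "up": -60.0, "down": -70.0},
    t_unknown=-40.0,
)
```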

2) FREQUENT LOCATION RECOGNITION
We use GPS to recognise the locations that the individual frequently visits. The GPS data are downsampled to 1 Hz. Moreover, we use the AGNES hierarchical clustering algorithm to identify the sites that individuals often visit.
Step 1: We take the first r points and the last r points of each trajectory. The chronologically arranged points are represented as a sequence $N_1, N_2, \cdots, N_n$, and an initially empty set D is used to hold indoor location coordinates.
Step 2: We traverse the coordinate sequence $N_1, N_2, \cdots, N_n$. For each point $N_i$, if set D is empty, then the indoor location coordinates $d_1 = N_i$ are added to set D; otherwise, the indoor location coordinates $d_1, d_2, \cdots, d_m$ of set D are traversed, and the Euclidean distance $l_j$ to each indoor location $d_j$ is calculated using Formula (5).
Step 3: The minimum distance $l = \min_{j \in \{1, \cdots, m\}} l_j$ is taken, and $d_s$ denotes the corresponding indoor location. When the distance is within the confidence range, the coordinates of $d_s$ are updated using Formula (6), where k is the number of GPS coordinate points $N_i$ used in indoor location coordinate $d_s$. Then, the indoor location coordinates are added to set D.
In conclusion, set D contains the locations that individuals frequently visit during the sampling period.
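Steps 1 to 3 can be sketched as an incremental clustering loop. Several details are assumptions made for illustration: points are treated as planar (x, y) coordinates, the update of a matched location is taken to be a running mean over the k points already merged into it, a point outside the confidence range opens a new location, and the parameter `radius` stands in for the confidence range. The function name `cluster_locations` is hypothetical.

```python
import math


def cluster_locations(points, radius):
    """Merge each point into the nearest known location within `radius`,
    otherwise register it as a new location.

    Returns a list of [x, y, k] entries, where k counts the merged points.
    """
    centres = []
    for x, y in points:
        best, best_dist = None, float("inf")
        for centre in centres:
            dist = math.hypot(x - centre[0], y - centre[1])  # Euclidean distance
            if dist < best_dist:
                best, best_dist = centre, dist
        if best is not None and best_dist <= radius:
            k = best[2]
            best[0] = (k * best[0] + x) / (k + 1)  # running mean (assumed update rule)
            best[1] = (k * best[1] + y) / (k + 1)
            best[2] = k + 1
        else:
            centres.append([x, y, 1])
    return centres


# Example: two nearby points merge into one location, a distant point forms another.
locations = cluster_locations([(0, 0), (2, 0), (100, 100)], radius=5)
```

The count k kept with each location can also serve as the visit statistic used later to pick the two most frequently visited places.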

B. HIGH SEMANTIC INFORMATION RECOGNITION

1) INDOOR AND OUTDOOR STATE RECOGNITION
Indoor and outdoor state division is an essential part of our work. The locations' time distribution obtained from the state sequence can further be used to recognise the home and workplace. The hidden Markov model (HMM) [37] and the Viterbi algorithm [38] are used for indoor and outdoor state recognition. Moreover, two HMM models are used to make joint decisions and obtain a sequence with high accuracy.
HMM. As described in Section IV-A, the observable state of the HMM is composed of four basic actions (motionlessness, walking, other and unknown actions) and two states (strong and weak) each of GPS and ambient noise. We describe the observable state as a triple (A, G, V).
Our work focuses on the division of indoor and outdoor states, and the Viterbi algorithm is used to obtain the optimal prediction sequence of indoor and outdoor states. In the Viterbi recursion, $a_{ij}$ is the state transition probability, and $b_j(k)$ is the observation probability. Given no prior condition, we set the initial state probability vector $\pi = (0.5, 0.5)$. When the sequence computation terminates, we can obtain the probability $P^*$ of the optimal path and the corresponding terminal state $i^*_T$ under the path.
Then, we backtrack along the optimal path to obtain the state sequence, which can be represented as $I^* = (i^*_1, i^*_2, \ldots, i^*_T)$; this is the optimal prediction sequence of indoor and outdoor states.
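For reference, the textbook Viterbi recursion with backtracking can be sketched as follows. The initial vector (0.5, 0.5) is taken from the text; the transition and observation probabilities in the example are made up for illustration, and the observation alphabet is reduced to two symbols instead of the full (A, G, V) triples. The function name `viterbi` is hypothetical.

```python
def viterbi(observations, pi, a, b):
    """Most likely hidden-state path of a discrete HMM.

    pi[i]: initial probability of state i
    a[i][j]: probability of moving from state i to state j
    b[j][k]: probability of observing symbol k in state j
    Returns (path, probability of that path).
    """
    n = len(pi)
    delta = [pi[i] * b[i][observations[0]] for i in range(n)]
    backpointers = []
    for obs in observations[1:]:
        # Best predecessor of each state j, then the updated path probabilities.
        prev = [max(range(n), key=lambda i: delta[i] * a[i][j]) for j in range(n)]
        delta = [delta[prev[j]] * a[prev[j]][j] * b[j][obs] for j in range(n)]
        backpointers.append(prev)
    last = max(range(n), key=lambda i: delta[i])
    best_prob = delta[last]
    path = [last]
    for prev in reversed(backpointers):  # backtrack from the terminal state
        path.append(prev[path[-1]])
    path.reverse()
    return path, best_prob


# Example: state 0 = indoor, state 1 = outdoor; symbol 0 = weak GPS, symbol 1 = strong GPS.
pi = (0.5, 0.5)
a = [[0.9, 0.1], [0.1, 0.9]]  # illustrative values only
b = [[0.8, 0.2], [0.2, 0.8]]  # illustrative values only
path, prob = viterbi([0, 0, 1, 1], pi, a, b)
```

For day-long sequences of 1,440 observations, products of probabilities become very small, so a practical implementation would typically work with sums of log-probabilities instead.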
Misjudgment may occur in the models, typically during state transitions; thus, we simultaneously use two HMM models with different state transition matrices A to pursue high accuracy. The two sequences from the models are combined with WiFi and location information, which are provided by the mobile phone's WiFi module and GPS data, respectively. The WiFi name is the first condition for discriminating a change in state. If the WiFi name has not changed, then we deem the state of the individual as unaltered, and vice versa. If an individual's phone has no WiFi connection, we calculate the Euclidean distance between the actual and clustered locations to perceive the change of the individual's location; the actual location is obtained from GPS data, whereas the clustered locations are recognised by the hierarchical clustering algorithm in Section IV-A. In summary, the jointly judged sequences are the final state sequences that will be used to recognise commute time.

2) LOCATION RECOGNITION OF HOME AND WORK
The locations obtained in the sampling cycle in Section IV-A are counted. We recognise the home and workplace on the basis of the general situation wherein home and commute locations are the most common places for most people. The subjects we studied had distinct characteristics of day work and night rest. Therefore, we compute the time interval distribution of the two locations with the most statistics (as shown in Fig. 8). The place where time is concentrated in the daytime is recognised as the commute location. Similarly, the place where time is concentrated in the evening is recognised as the home location.

3) COMMUTE TIME RECOGNITION
The indoor/outdoor state sequence with time information is marked with the locations in set D (Section IV-A), and the home and commute location information is added. The time spent between the home and commute locations is recognised as commute time.
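As a rough illustration of this last step, the sketch below measures the gap between the last minute labelled with the origin and the next minute labelled with the destination in a per-minute label sequence. The label values "home" and "work", the use of `None` for minutes spent elsewhere, and the function name `commute_minutes` are all hypothetical; intermediate stops are not treated specially.

```python
def commute_minutes(labels, origin="home", destination="work"):
    """Durations (in minutes) of each trip from `origin` to `destination`.

    labels: one entry per minute; the place label for that minute, or None
    when the individual is not at a labelled place.
    """
    durations = []
    last_origin = None
    for minute, label in enumerate(labels):
        if label == origin:
            last_origin = minute
        elif label == destination and last_origin is not None:
            durations.append(minute - last_origin - 1)
            last_origin = None  # wait for the next departure
    return durations


# Example: 3 minutes at home, 25 minutes in transit, 4 minutes at work.
morning = ["home"] * 3 + [None] * 25 + ["work"] * 4
trips = commute_minutes(morning)
```

Calling the same function with `origin="work"` and `destination="home"` gives the return trips.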

A. DATASET
We provide a publicly available dataset, CPS. Table 1 shows details of the CPS. CPS contains data from ten individuals, covering 27 days on average, and the subjects include six graduate students and four young workers aged 22 to 30 years. We used a Huawei Mate 8 with the Android 6.0 system as the experimental device. The participants were tasked to collect data for more than three weeks. Throughout the experiment, WiFi and GPS remained enabled. We developed an app to collect the relevant sensor data automatically. Moreover, no restrictions were imposed on the usage of the provided smartphone. The website for obtaining the dataset is https://pan.baidu.com/s/12jKE18tpO4u2ihie4AwiNw. Please contact us to obtain the extraction code if you need to use the dataset.
A prior dataset (collected from subjects) was built for training the GMM-HMM and HMM models. The prior dataset consists of two parts: five kinds of physical activities (i.e., static, walk, run, upstairs, and downstairs) and three days of phone sensor data for each subject. The activity data were used for GMM-HMM pretraining, and the daily data were further divided into two segments for HMM training.
Thresholds need to be set manually in two places; we used heuristic rules to fix them: (1) The division of states of GPS and ambient noise. The state of GPS is determined by the number of satellites (when the number of satellites exceeds 8, it is judged as a strong state, and as a weak state otherwise). Similarly, the state of ambient noise is determined by a threshold of 60 dB; (2) Sequence length of location clustering for trajectory interception. In this study, the value of r is set to 10. Usually, 10 points are enough to determine the location. Too many points affect the clustering centre, whereas too few points may result in noise. In addition, the length of the input and output sequence of HMM is 1,440 because a day is divided into 1,440 min.
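The two heuristic thresholds can be written down as a small sketch. The satellite rule (more than 8 satellites means a strong state) follows the text; treating loudness above 60 dB as the strong noise state is an assumption, since the text gives only the threshold value. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # length of the HMM input and output sequences


def gps_state(num_satellites, threshold=8):
    """'strong' when the satellite count exceeds the threshold, otherwise 'weak'."""
    return "strong" if num_satellites > threshold else "weak"


def noise_state(loudness_db, threshold=60.0):
    """'strong' when loudness exceeds the threshold (assumed direction), otherwise 'weak'."""
    return "strong" if loudness_db > threshold else "weak"


# Example: one observation made up of a GPS state and an ambient noise state.
observation = (gps_state(11), noise_state(42.5))
```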

B. COMMUTE TIME RECOGNITION
The experiment was conducted on the dataset described in Section V-A. We present our experimental results in accordance with the structure of the hierarchical semantic model.

1) LOW SEMANTIC LAYER
The role of the lower layer is data fusion. We extract the location information and recognise the individual's actions from the sensor data.

a: FREQUENT LOCATION RECOGNITION
On the basis of the method described in Section IV (frequent location recognition), the location information about the frequently visited places of individuals is obtained by clustering GPS trajectory. According to the cluster location, Fig. 4 shows the quantitative distribution of the subject's recognised frequently visited sites.
We compare the identified location with the standard map coordinates of the location. Considering that the experimental and standard coordinates of the location are latitude and longitude coordinates, we use the Haversine formula, Formula (16), to calculate the coordinate deviation:
$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right) \quad (16)$$
In Formula (16), r is the radius of the sphere (in this study, the radius of the earth), d is the distance between the two points, $\varphi_1$ and $\varphi_2$ denote the latitudes of points 1 and 2, and $\lambda_1$ and $\lambda_2$ denote the longitudes of points 1 and 2. Table 2 details the experimental data. The deviation of some places in the table is relatively large, for two reasons: 1) After the subject enters a building, the GPS signal may be weak. In this case, we can only locate the periphery of the building, whereas the standard landmark is the coordinate of the core point of the building. The main factor in the location identification error is then the size of the building. Generally, a deviation within 100 meters (D-deviation less than 0.1) is within the normal error range. In some extreme cases, relatively large errors may occur. For example, when the actual location is an airport terminal and a coordinate point on the periphery of the terminal is identified, the identification error reaches the order of 100 m. 2) The system may recognise the endpoint of the track before the GPS signal is lost as a location, in which case the error between the target location and the recognised location is relatively large. Nevertheless, the purpose of location recognition is to identify the home and workplace; the other locations are secondary information, whose accuracy does not affect the recognition accuracy of the final commuting time. We use the two places with the most statistics as the home and workplace points. As shown in Table 2, we average the identification errors of the two points. The maximum average error of the experiment is 37 m, and the minimum average error is 2 m, both of which fall within the normal error range (within 50 m).
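The Haversine distance used for the coordinate deviation can be computed as in the sketch below. The mean Earth radius of 6,371,000 m is a commonly used value and is an assumption here, since the paper does not state which radius was used; the function name `haversine` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (assumed value)


def haversine(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


# Example: one degree of longitude along the equator is roughly 111 km.
deviation_m = haversine(0.0, 0.0, 0.0, 1.0)
```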

b: ACTION STATE RECOGNITION
By processing the data of the cell phone accelerometer, rotation vector sensor and gyroscope, we can identify the action of the subjects and generate the action sequence. The length of the action sequence is 1440, indicating that 1440 action states are generated for each individual per day. Fig. 5 shows the action recognition results of a certain week for a subject.

2) HIGH SEMANTIC LAYER
In this layer, we process the data output from the lower semantic level to identify the indoor and outdoor state sequence of individuals. By combining the sequence with time stamp and the location data, we can derive the location semantic information of an individual's home and workplace and further identify the commuting time of individuals.

a: INDOOR AND OUTDOOR STATE RECOGNITION
The HMM group is used to recognise the indoor/outdoor state of the individuals after location recognition. Fig. 6 depicts a section of the recognition results of the two HMM models in two different scenarios. HMM1, which has a critical transfer matrix, is sensitive to the changes in indoor/outdoor states. By contrast, HMM2, with a smaller transfer matrix, is less sensitive to the changes. The sequence of the indoor/outdoor state is divided into minutes. The indoor state is represented by value '1', and the outdoor state is represented by value '2'. Moreover, the marked numbers represent the clustering locations obtained in Section IV-A. The models obtain different results in some special situations, and the figure shows two different scenes that often happen in daily life. In the left picture, the outdoor state transition is ignored by HMM2, resulting in the unrecognised location change from 1 to 4 at the 500th minute of the sequence. Similarly, the right graph shows the misidentification caused by the faster transition rate of HMM1.

b: LOCATION RECOGNITION
The two places with the most statistics are taken as the candidates for home and workplace. Given that most commuters are in commute locations by day and at home by night, two different sites are given the semantics of the home and workplace in accordance with their time distribution.
We divide each day into two periods; period A is from the current day's 20:00 to the next day's 8:00, and period B is from the current day's 8:00 to 20:00. By comparing the time proportion of the two locations in the A/B periods, we assign the semantic labels of home (period A accounts for more time) and workplace (period B accounts for more time). Fig. 7 describes the time distribution of the two locations using the statistical proportions of sites over time. Fig. 7(a) shows the time distribution of the two places on working days, whereas Fig. 7(b) shows that on rest days. As shown in Fig. 7, the time distributions of the two places are remarkably different. One of the two locations is more concentrated in the evening, whereas the other location is concentrated during daytime.
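The labelling rule based on periods A and B can be sketched as follows. The input format (one `(location_id, minute_of_day)` pair per minute spent at a clustered location) and the function names are hypothetical; the sketch takes the two most frequent locations and labels the one with the larger share of period A as home. It assumes at least two distinct locations are present.

```python
from collections import Counter


def in_period_a(minute_of_day):
    """Period A runs from 20:00 to 8:00; period B runs from 8:00 to 20:00."""
    return minute_of_day >= 20 * 60 or minute_of_day < 8 * 60


def label_home_and_work(records):
    """Return (home_id, work_id) from (location_id, minute_of_day) records."""
    records = list(records)
    totals = Counter(loc for loc, _ in records)
    night = Counter(loc for loc, minute in records if in_period_a(minute))
    (first, _), (second, _) = totals.most_common(2)  # two most visited locations
    share_a = {loc: night[loc] / totals[loc] for loc in (first, second)}
    home = max((first, second), key=lambda loc: share_a[loc])
    work = second if home == first else first
    return home, work


# Example: location 1 is occupied at night, location 4 during the day, location 7 briefly.
records = (
    [(1, m) for m in range(0, 480)]
    + [(4, m) for m in range(540, 1080)]
    + [(7, m) for m in range(1100, 1111)]
    + [(1, m) for m in range(1200, 1440)]
)
home, work = label_home_and_work(records)
```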
AVG (1)(2) in Table 2 shows the recognition deviation of home and workplace locations. We identified the home and commute locations of eight participants successfully; however, those of the other two subjects were not identified because their commute locations were not fixed. As shown in Table 3, our method is suitable for identifying fixed sites.

c: COMMUTE TIME RECOGNITION
By combining the recognition results of the indoor/outdoor state sequence and the workplace and home locations, the time spent between the home and commute locations is recognised as commute time. Fig. 8 shows the results of commute time recognition for different subjects. The commute time from home to work is drawn in red, whereas that from the workplace to home is drawn in blue; the graph shown is not symmetrical. As shown in Fig. 8, many factors affect commute time. Although individuals differ considerably, the commute time of each individual fluctuates within a specific range. In addition, commute time is recognised as 0 min when the subjects go to work without the phone.
We thus provide a method to recognise the individual's commute time; this method essentially identifies the home and the workplace and calculates the time spent in between them. The commute time of commuters on regular working days is stable, and our experiment shows the same results.
Results show that our method is effective for commute time recognition.

VI. CONCLUSION
We propose a method for commute time recognition by using a hierarchical semantic model, which consists of two layers: low and high semantic layers. As a preliminary information fusion layer, the low semantic layer recognises the individual's basic actions and frequent locations. On this basis, we further identify home and commute locations to recognise meaningful commute-related indoor/outdoor state transition and commute time.
Our experiments show that the method is practical; however, many problems have yet to be solved. Our approach is only applicable to people who commute regularly, and it is not valid for people who work in an unstable workplace, have unpredictable working hours or whose working hours are mainly distributed in the evening. Moreover, the dataset only covers a small range of subjects; thus, whether age, ethnicity and other factors affect the motion recognition performance of the GMM-HMM models, and in turn commute time recognition, remains uncertain. To solve the above problems, we will conduct further research to achieve commute time recognition in complex situations.