Smartphone Zombie Context Awareness at Crossroads: A Multi-Source Information Fusion Approach

“Smartphone Zombie” (or Smombie) is a new term that describes pedestrians who play with their mobile phones while walking. Research has shown that mobile phone use distracts pedestrians, slows walking, and increases the risk of car accidents. Measures and laws have been introduced to curb the phenomenon; however, they are costly, and their effects are limited. The motivation of this study is therefore to develop an efficient approach to identify and remind Smombies at intersections in order to reduce accidents. In this approach, a framework for Smombie context awareness is proposed that integrates pedestrian behavior information from physical and virtual space. Within the framework, a set of sensors is employed to recognize the gesture of playing with a mobile phone, and a modified multi-information fusion algorithm that combines fuzzy mathematics and Dempster-Shafer (D-S) evidence theory is developed to recognize the physical context of Smombies. Experiments indicate that both the recall and the precision of the algorithm exceed 0.9, which means the proposed method can effectively identify Smombies at crossroads.


I. INTRODUCTION
With the popularity of mobile phones, more and more people have become addicted to them. A new term, ''Smartphone zombie'' (or ''Smombie''), is now used to describe pedestrians who amble without paying attention to their surroundings because they are focused on their smartphones [1]-[4]. According to an American observational study, approximately one-third of pedestrians displayed mobile-phone-distracted activity while crossing streets [5]. Texting while walking is particularly dangerous because a smartphone user's vision is estimated to be only 5% of that of an average pedestrian [5]. These distracted pedestrians may therefore encounter safety hazards, for example, tripping over curbs, bumping into other walkers, or even being hit by a car [6]. Based on the National Electronic Injury Surveillance System database, Nasar (2013) reported that the number of pedestrian injuries in the USA due to mobile phone usage increased from 559 in 2004 to 1,506 in 2010 [7]. The actual number of injuries is probably much higher because many people who suffer a slight injury may not go to the emergency room.

The associate editor coordinating the review of this manuscript and approving it for publication was Honghao Gao.
Many efforts have been made to improve the situation. For example, the city of Honolulu in the United States enacted a law in 2017 that prohibits pedestrians from looking at their phones while crossing the street and fines violators [8]. Augsburg in Germany [9] and Bodegraven in the Netherlands [10] used in-ground traffic lights to help smartphone zombies cross roads. In some cities, such as Antwerp in Belgium [11] and Chongqing in China [12], so-called ''mobile phone lanes'' have appeared.
To reduce Smombie behavior while crossing streets, it is critical to accurately identify whether a pedestrian is looking at his or her mobile phone; however, few approaches in the literature have been developed to achieve this goal. This study presents a multi-source information fusion approach to detect the Smombie context at crossroads. In this approach, the combination pattern of mobile phone sensor signals was used to indicate the Smombie context, and an extended Dempster-Shafer (D-S) evidence algorithm was developed to judge it.

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

The contributions of this article are as follows:
First, we proposed a framework for Smombie context-awareness that integrates pedestrian behavior information from physical and virtual space. The framework uses geographical location to trigger the context-awareness judgment, which reduces computation and the battery cost of running the other sensors. Moreover, the judgment considers not only physical postures but also virtual-space activities, and thus better depicts the user's situation.
Second, we proposed a modified multi-information fusion algorithm for judging physical postures. We validated a set of sensors that can recognize Smombie gestures and extracted 14 features from them. To fuse these features, the algorithm recognizes the Smombie context by combining fuzzy mathematics and D-S evidence theory. Compared with traditional Bayesian theory, D-S evidence theory can assign uncertainty or ignorance to propositions, thus retaining uncertain context information throughout the process, and fuzzy mathematics can cope with the uncertainty arising from data inaccuracy or vagueness in context-awareness.
Third, we conducted comprehensive experiments in several scenarios and compared our method with single-sensor-feature inference, the traditional D-S method, and Bayesian theory. Our approach performed best and demonstrated its feasibility, with precision and recall both higher than 0.9.
Related work is discussed in Section II, and methodologies for the proposed approach are described in Section III. The conducted experiments and their results are introduced in Section IV. Section V focuses on a discussion of the approach, its limitations, and future work. Section VI summarizes the conclusion of this article.

II. RELATED WORK
A. CONTEXT AWARENESS BASED ON MOBILE DEVICE SENSORS
The context of mobile devices is receiving increasing attention in the mobile and ubiquitous computing research communities [15]. Smartphones are often equipped with a series of embedded sensors (e.g., GPS, accelerometer, gyroscope, orientation sensor, magnetometer, camera, and microphone) and various communication interfaces (e.g., cellular, Wi-Fi, Bluetooth) [16]. These sensors are widely used to detect the context of mobile phones. Early research used a single sensor to detect context, and GPS was one of the most commonly used. For example, a virtual tour guide system used GPS to detect which areas tourists entered in order to show different guide notes [17]. Another application, ''e-graffiti,'' also used GPS to detect the user's location on a college campus and display different text notes [18]. The accelerometer is also widely used in activity monitoring; for example, Muller (2000) detected walking, running, sitting, and walking upstairs using the x- and y-axes of a single accelerometer [19]. However, a single sensor falls short in describing the characteristics of complicated environments.
In recent years, owing to the availability of sensors in smartphones, multiple sensors have been used together to recognize more comprehensive contexts. For example, Davis (2016) used a smartphone's accelerometer and gyroscope to infer six basic activities (i.e., walking, going up and down stairs, sitting, standing, and lying) of user groups in their natural surroundings [20]. Zhang et al. (2017) used an accelerometer and an orientation sensor to detect human falls [21]. Additionally, many studies [22]-[24] identified drivers' driving behaviors (i.e., ''Lane-Change,'' ''Turn,'' ''U-Turn'') through a combination of GPS, accelerometer, gyroscope, magnetometer, and orientation sensor.
In short, mobile phone sensors have been widely used to recognize human body posture and activity, but approaches that can identify when a user is looking at the mobile phone while crossing the road are still lacking.

B. MULTI-SENSOR INFORMATION FUSION
In recent years, multi-sensor systems have been widely adopted in various fields, such as intelligent health monitoring [25], education and sequential pattern mining [26], tourist guiding [27], smart shopping [28], and assisted pedestrian navigation [29]. In these systems, complementary observations from different sensors are fused to enhance performance. Many multi-sensor fusion algorithms have been applied in the literature, including the weighted average method [30], the Kalman filtering method [31], Bayesian inference [32], the BP neural network [33], Dempster-Shafer (D-S) evidence theory [34], and fuzzy reasoning [35].
Among these algorithms, the Dempster-Shafer (D-S) evidence approach stands out for producing more logical and realistic results when combining estimates of the probability of existence [36]. Compared with Bayesian theory, the Dempster-Shafer theory of evidence can assign uncertainty or ignorance to propositions [37], thus retaining uncertain information throughout the process [38]. It therefore often serves as an alternative to traditional Bayesian probabilistic models for modeling evidence and uncertainty [36]. Nevertheless, the basic probability assignment (BPA), a crucial step in the D-S evidence framework, remains a problem: in most systems, the ''probability'' numbers are simply assigned based on expert opinion [37].
Fuzzy set theory [39] was first proposed by Zadeh in 1965 to cope with uncertainty arising from data inaccuracy or vagueness. In the context-awareness field, situations are subjective, vague concepts that are difficult to measure precisely with conventional quantitative expressions. In addition, situations are not stable, which sometimes causes the sensor signals to fluctuate. In such cases, fuzzy set theory offers an efficient and simple solution, and it is widely used in multicriteria decision-making [40]-[42].
Some researchers combine D-S evidence theory with fuzzy mathematics, using fuzzy membership functions to assign values to the probability function in evidence theory. This solves the critical problem of establishing an assignment function model in D-S evidence theory and yields better information-fusion performance. For example, Liu et al. developed a fault diagnosis system by combining fuzzy mathematics and D-S evidence theory [43], Xu used the combined method to diagnose transformer insulation aging [44], and Wang et al. applied it to human fall detection in substations [45].
However, in these studies, the critical values of the fuzzy membership function mainly depended on subjective judgment, and outliers in the data might distort the estimation of these critical values. Thus, a new method is needed to determine them.

C. PHYSICAL AND VIRTUAL ACTIVITIES
In recent years, new kinds of information and communication technologies (ICT) have developed rapidly. The rapid diffusion of ICT has transformed how people carry out their activities [46], enabling people to conduct activities in virtual space and physical space simultaneously [47], [48]. According to Negroponte (1995), physical space can be defined as a material world made of atoms, while virtual space is a world composed of bits of information [49].
However, the two spaces are not separate [50], [51]. Research has shown that activities in physical space and virtual space can influence one another [52], [53]. On the one hand, information in virtual space may assist physical activities: for example, various modes of transportation move people around in physical space, while in virtual space, information and communication technologies provide the means for navigation [46]. On the other hand, virtual activities are sometimes non-auxiliary to physical activities and may distract people from them: for example, reading messages on a mobile phone may draw pedestrians' attention away from crossing an intersection [1]. Therefore, to identify the Smombie situation, we should pay attention not only to physical space but also to virtual space.

A. OVERALL FRAMEWORK
The methodological framework for Smombie context awareness is shown in Fig. 2. The procedure includes four main steps. First, the phone's multi-source data are collected, including GPS coordinates, sensor signals (i.e., gyroscope, accelerometer, and proximity), and app thread information (i.e., the app currently in use).
Second, the GPS coordinates are used to determine whether the user is within a buffer around an intersection. The third step starts only when the user is inside the buffer, which reduces computation and battery cost.
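The buffer test in the second step can be sketched as a great-circle distance check. This is a minimal sketch under stated assumptions: each buffer is modeled as a circle around an intersection center, the 30 m radius and the coordinates in the example are hypothetical (the text does not give the buffer size), and the haversine formula stands in for whatever geometry the original implementation used.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)
BUFFER_RADIUS_M = 30.0        # hypothetical buffer radius; not given in the text


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_intersection_buffer(lat, lon, centers, radius_m=BUFFER_RADIUS_M) -> bool:
    """True if the GPS fix lies within radius_m of any intersection center."""
    return any(haversine_m(lat, lon, c_lat, c_lon) <= radius_m
               for c_lat, c_lon in centers)
```

For example, with a hypothetical center at (0, 0), a fix 0.0001° of longitude away (about 11 m at the equator) is inside the 30 m buffer, whereas one 0.001° away (about 111 m) is not, so the later steps would not run.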
Third, the screen state and app thread information are used to identify virtual-space activities. If the phone screen is locked (i.e., the phone is not in use), the following steps are unnecessary. Otherwise, the package name of the app currently in use is captured, and a previously prepared category table determines whether the app belongs to the ''map and navigation'' category, which corresponds to an assisted-reality activity in virtual space. If it does not, the user is deemed likely to be distracted by the app while crossing the street, and the next step is triggered to identify the pedestrian's posture.
Finally, the sensor data are used to detect whether the user is playing with the mobile phone. In the feature extraction step, a preliminary experiment is conducted to identify the valid sensors and extract quantitative features. An improved algorithm based on fuzzy mathematics and D-S evidence theory then fuses the extracted features and determines whether the user is in the posture of playing with the phone. If so, a ''Smombie Situation'' is identified, and the phone sends a gentle reminder to the user through a pop-up window and vibration.
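The order of the four steps matters because each step gates the next, so the costly posture fusion runs only when every cheaper check passes. A hypothetical sketch of that gating logic follows; the function name, its parameters, and the package names are illustrative assumptions, and `posture_is_target` stands in for the fusion step described in Subsection C.

```python
from typing import Callable, Set


def is_smombie_situation(in_buffer: bool,
                         screen_locked: bool,
                         app_package: str,
                         navigation_packages: Set[str],
                         posture_is_target: Callable[[], bool]) -> bool:
    """Return True when a reminder should be issued (hypothetical sketch).

    Cheap checks run first; the sensor-fusion callback is invoked only
    when every earlier step passes, which saves computation and battery.
    """
    if not in_buffer:                       # step 2: outside every intersection buffer
        return False
    if screen_locked:                       # step 3: phone not in use
        return False
    if app_package in navigation_packages:  # step 3: assisted-reality (navigation) app
        return False
    return posture_is_target()              # step 4: posture recognition by fusion
```

Passing the posture check as a callable, rather than a precomputed boolean, is what lets the sketch skip sensor processing entirely when an earlier step already rules out a Smombie situation.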
In the framework, the most critical step is to identify the posture of a user using the mobile phone while walking, mentioned as ''the target posture'' in the following text. Previous studies mainly focused on the differences in the behavioral postures of users, such as walking, running, climbing stairs, sitting, and so on. However, there was a lack of relevant research on the postures of users playing on mobile phones while crossing the roads. The main contribution of this paper lies in this area.
Subsections B and C elaborate on how mobile phone sensor information is used to identify different gestures, in two steps: feature extraction and information fusion. The category table of virtual activities is given in Subsection D.

B. FEATURE EXTRACTION FOR THE TARGET POSTURE OF USING MOBILE PHONE WHILE WALKING
A mobile phone has numerous sensors. To provide a reliable source of evidence for information fusion, a crucial issue of this study is to determine which features of which sensors could be used to identify the target posture.
Common pedestrian walking postures are observationally classified into the following three styles:
Gesture1: Walk and play with the phone.
Gesture2: Hold the phone but swing it freely.
Gesture3: Put the phone in the pocket.
To determine which sensors can detect the target posture, a preliminary experiment was conducted in which the same user recorded data from different sensors in different environments. Only three sensors (gyroscope, accelerometer, and proximity) showed similar regularities for the same posture across environments, meaning that their readings were governed by the posture rather than by the environment; these sensors are therefore candidates for extracting features of the target posture. The coordinate system of the smartphone used in this paper is shown in Fig. 3, and the data format is listed in Table 1. The sensor data were processed and visualized in MATLAB, and the numerical differences of each sensor were compared. The proximity sensor clearly distinguishes Gesture1 from Gesture3, as shown in Fig. 4. The gyroscope, accelerometer, and proximity sensor show consistent differences between Gesture1 and Gesture2 across environments, as shown in Fig. 5, so they could be used to extract the features of the target posture.
In Gesture1, when a user is playing with the phone, the device is tilted upward for reading convenience and tends to remain stable, as shown in Fig. 3. In Gesture2, when a user holds the phone and swings it naturally, the phone swings back and forth with the walk, so its tilt angle, angular velocity, and distance to the body change constantly. Because the accelerometer can detect the tilt angle, the gyroscope can detect rotation, and the proximity sensor can detect obstructions, these three sensors can likely be combined to distinguish Gesture1 from Gesture2.
In Gesture3, when the phone is put in a pocket, its screen is covered. Therefore, Gesture1 can probably be distinguished from Gesture3 via the proximity sensor.
For these three sensors, we interpret the experimental results in light of the sensor principles in order to extract features and quantify descriptive indexes.

1) ACCELEROMETER
For a tri-axis accelerometer held steady, the axis values depend on the phone's orientation. If the phone lies flat on a desk, the X- and Y-axes read 0 and the Z-axis reads 9.81 m/s². If the phone is tilted to the left, the X-axis value is positive; if it is tilted to the right, the X-axis value is negative. Likewise, the upward or downward orientation of the phone is reflected in the sign of the Y-axis value. To find quantitative indicators for the axes, SPSS was used to extract descriptive statistics from the data; the results are shown in Table 2. For Gesture1, the mean value of the accelerometer's X-axis is about 0 with a small fluctuation range (small variance), the Y-axis is positive, and the mean value of the Z-axis is about 9.5. For Gesture2, the mean value of the X-axis is positive with a broad fluctuation range (large variance), the Y-axis is negative, and the mean value of the Z-axis is negative. The statistics are very close to the theoretical values; thus, these indicators can be adopted, allowing for fluctuation error in practice.
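Because a steady accelerometer essentially reads the gravity vector, a tilt angle can be derived from its three axes. The sketch below computes the angle between the phone's Z-axis and the measured gravity direction; it illustrates the principle described above and is not one of the features used in this paper, and it assumes quasi-static motion (the accelerations of walking are ignored).

```python
import math


def tilt_from_vertical_deg(ax: float, ay: float, az: float) -> float:
    """Angle (degrees) between the phone's Z-axis and the measured gravity direction.

    0 deg: phone lying flat, screen up; 90 deg: phone held upright.
    Only meaningful when the phone is roughly static, so gravity dominates the reading.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("zero acceleration vector")
    cos_tilt = max(-1.0, min(1.0, az / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_tilt))
```

A phone lying flat (0, 0, 9.81) gives 0°, one held upright (0, 9.81, 0) gives 90°, and equal Y and Z components give 45°.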

2) GYROSCOPE SENSOR
The gyroscope returns angular velocity data on the X-, Y-, and Z-axes, which represent the rotational motion of the phone relative to its coordinate system. If a user maintains a posture of looking at the phone, the device tends to be static relative to the user.
In our experimental results, as shown in Fig. 5(a2) and Fig. 5(b2), the red line representing Gesture1 shows only slight fluctuations on all three axes, indicating that the phone is stable and barely rotating. In contrast, the blue line representing Gesture2 shows that the angular velocity constantly shifts between positive and negative values, likely because the phone swings back and forth as the user walks while holding it naturally in the right hand.
SPSS was used to compile descriptive statistics from the data; the results appear in Table 3. For Gesture1, the mean value of each axis is approximately 0, and the fluctuation is slight (small variance). For Gesture2, the mean value of each axis is also approximately 0, but the fluctuation is strong (large variance). Therefore, the mean and variance of the gyroscope's three axes can be combined to identify different postures.

3) PROXIMITY SENSOR
The proximity sensor measures the distance between an object and the phone in centimeters. Some proximity sensors can only return two states, far and near. In our study, if the distance is greater than 5 cm, the proximity sensor returns 5; otherwise, it returns 0.
When people walk while playing with their phones, they tend to hold the phone slightly forward and uncovered, so the sensor reads 5 (far). However, when the phone is swinging, the screen is covered whenever it comes close to the body, and the reading drops sharply to 0 for a short time.
In the experiment results, as shown in Fig.5(a3) and Fig.5(b3), the value for Gesture1 is always 5, while Gesture2 has occasional 0 values. SPSS was used to extract descriptive indicators from the data. The results are displayed in Table 4, indicating that the mean and variance of the proximity sensor value can effectively distinguish the two different postures.
To sum up, a total of 14 effective descriptive indicators can be extracted from the above analysis: the mean and variance of each of the accelerometer's three axes, the gyroscope's three axes, and the proximity sensor. These indicators are used to identify the posture of a mobile phone user in the rest of this study.
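The 14 indicators (7 channels times 2 statistics) can be computed per window as shown below. This is a sketch under assumptions: the channel order and the short names (which mirror feature abbreviations used in the results, such as accxm and prxv) are illustrative, and the sample variance with an n - 1 denominator is assumed to match the SPSS output.

```python
from statistics import fmean, variance
from typing import Dict, Sequence

# Assumed channel order; names mirror the feature abbreviations used in the results.
CHANNELS = ("accx", "accy", "accz", "gyrx", "gyry", "gyrz", "prx")


def window_features(window: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Return the 14 indicators (mean 'm' and variance 'v' per channel) for one window.

    `window` holds one 7-value sample per row, ordered as CHANNELS.
    Variance uses the n - 1 denominator (assumed to match SPSS).
    """
    feats: Dict[str, float] = {}
    for j, name in enumerate(CHANNELS):
        column = [float(row[j]) for row in window]
        feats[name + "m"] = fmean(column)
        feats[name + "v"] = variance(column)
    return feats
```

For a 40-sample window in which the proximity reading stays at 5, the sketch returns prxm = 5.0 and prxv = 0.0, the pattern expected for Gesture1.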

C. AN IMPROVED INFORMATION FUSION ALGORITHM BASED ON FUZZY MATHEMATICS AND D-S EVIDENCE THEORY
How to fuse these 14 identified sensor features is the second key issue in this study. As mentioned in Section II, D-S evidence theory is a classical method to fuse information of multiple sensors.

1) THE TRADITIONAL D-S THEORY OF EVIDENCE
The D-S theory of evidence is a theory of reasoning under uncertainty, originally proposed by Dempster (1967) [54] and improved by Shafer (1976) [55]. It can jointly consider uncertain information from multiple sources, such as readings from multiple sensors or opinions from multiple experts, to solve a problem. The theory has three key components: the definition of the frame of discernment, the basic probability assignment, and Dempster's rule of combination.
The frame of discernment denotes the set of all possible states of a given event, $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$. In this paper, the frame of discernment is defined as $\Theta = \{\text{Yes}, \text{No}\}$, corresponding to ''the target posture'' and ''not the target posture.''
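The D-S theory of evidence assigns a belief mass to each subset of the frame; this is called the basic probability assignment (BPA). A mass function $m: 2^{\Theta} \rightarrow [0, 1]$ has two properties. First, the mass of the empty set is zero:

$$m(\emptyset) = 0 \tag{1}$$

Second, the masses of all subsets of the frame add up to 1:

$$\sum_{A \subseteq \Theta} m(A) = 1 \tag{2}$$

If $A$ is a subset of $\Theta$, the mass $m(A)$ expresses the proportion of all relevant and available evidence that supports the claim that the actual state belongs to $A$. To combine multiple evidence sources, Dempster's rule of combination can be used to obtain a joint mass $M$:

$$M(A) = \frac{1}{1-K} \sum_{A_1 \cap A_2 \cap \cdots \cap A_n = A} \; \prod_{i=1}^{n} m_i(A_i), \quad A \neq \emptyset, \qquad K = \sum_{A_1 \cap A_2 \cap \cdots \cap A_n = \emptyset} \; \prod_{i=1}^{n} m_i(A_i) \tag{3}$$

where $K$ is the conflict coefficient used for normalization, $m_i$ is the mass function of evidence $i$, and $A_i$ is a subset of the frame of discernment.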
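Because Dempster's rule is commutative and associative, Equation (3) can be applied pairwise to any number of sources. The sketch below represents a mass function as a dictionary from frozensets to masses; this representation and the function names are illustrative choices, not the paper's implementation.

```python
from functools import reduce
from itertools import product
from typing import Dict, FrozenSet, Iterable

Mass = Dict[FrozenSet[str], float]

YES = frozenset({"Yes"})
NO = frozenset({"No"})
THETA = YES | NO  # frame of discernment {Yes, No}


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule (eq. (3) with n = 2)."""
    joint: Mass = {}
    conflict = 0.0  # K: total mass that falls on empty intersections
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("total conflict (K = 1): sources cannot be combined")
    return {focal: mass / (1.0 - conflict) for focal, mass in joint.items()}


def combine_all(masses: Iterable[Mass]) -> Mass:
    """Fuse any number of sources by pairwise reduction."""
    return reduce(dempster_combine, masses)
```

For two illustrative sources, m1 = {Yes: 0.6, No: 0.3, Θ: 0.1} and m2 = {Yes: 0.7, No: 0.2, Θ: 0.1}, the conflict is K = 0.6·0.2 + 0.3·0.7 = 0.33, and the fused mass is M({Yes}) = 0.55 / 0.67 ≈ 0.821.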

2) IMPROVED APPROACHES OF D-S THEORY FOR SMOMBIE CONTEXT AWARENESS
The basic idea of this study is to use D-S evidence theory for fusion. However, considering the actual situation of this study, the following improvements were made.

(1) Definition of the basic probability function using the fuzzy number
In traditional D-S evidence theory, the BPA is usually given directly by experts as a precise number. In our study, however, the same posture may vary among different people, so it is hard to draw a clear boundary between two states; for example, how far the phone is tilted when a user plays with it is hard to define. Mobile phone status recognition can therefore be treated as a fuzzy classification problem, and we use a fuzzy number M to define the mass function m in equation (2).
A fuzzy number M is a generalization of a regular real number: it refers to a connected set of possible values, each of which has a weight between 0 and 1 [56]. A fuzzy number M is called a triangular fuzzy number if its membership function $\mu_M$ is

$$\mu_M(x) = \begin{cases} 0, & x < l \\ \dfrac{x-l}{m-l}, & l \le x \le m \\ \dfrac{u-x}{u-m}, & m < x \le u \\ 0, & x > u \end{cases} \tag{4}$$

where u is the upper value, l is the lower value, and m is the modal value of the support of M, with l < m < u [57].
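Equation (4) translates directly into code. The sketch below is a plain implementation of the triangular membership function; the function name is illustrative.

```python
def triangular_membership(x: float, l: float, m: float, u: float) -> float:
    """Membership of x in the triangular fuzzy number (l, m, u), following eq. (4)."""
    if not l < m < u:
        raise ValueError("a triangular fuzzy number requires l < m < u")
    if x <= l or x >= u:
        return 0.0
    if x <= m:
        return (x - l) / (m - l)  # rising edge: 0 at l, 1 at m
    return (u - x) / (u - m)      # falling edge: 1 at m, 0 at u
```

With (l, m, u) = (0, 5, 10), a value at the mode gets membership 1, values halfway up either edge (2.5 or 7.5) get 0.5, and values outside the support get 0.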

(2) Determination of the upper and lower limits of the triangular fuzzy number using box-plot analysis
Three parameters must be determined when triangular fuzzy numbers are used for the probability assignment: the upper limit u, the average value m, and the lower limit l. Outliers are very likely to occur when sensors collect data, and traditional methods do not account for them. Therefore, our study uses box-plot analysis from the field of statistics to exclude outliers and determine the values of l, m, and u, thus reducing errors.
The specific process is as follows:

Step1. Data segmentation. Data collected from volunteers are used as the training set. For each recorded sample, 100 random starting points are generated, and the data are segmented into two-second windows (a period long enough to detect posture changes in time without disturbing pedestrians too frequently).
Step2. Descriptive statistics. For each feature, such as the mean value of the gyroscope's X-axis, its distribution across the 2-second windows is computed, and the quartiles, upper limit, lower limit, and average value of the box-plot are calculated. An example is shown in Fig. 6.

Step3. BPA design. To determine the l, m, u parameters of the triangular fuzzy function, some adjustments are made according to the actual situation. The features extracted above fall into two groups: mean features and variance features.
For mean features, the mean value is assigned to m, and the upper and lower limits calculated by the box-plot are assigned to u and l. The membership function is then

$$\mu(x) = \begin{cases} \dfrac{x-l}{m-l}, & l \le x \le m \\ \dfrac{u-x}{u-m}, & m < x \le u \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

The function diagram is shown in Fig. 7. For variance features, the variance must be greater than or equal to 0. In theory, when the fluctuation is very small, the variance should approach 0, but in practice some fluctuation remains. Therefore, the lower limit is set to 0, the membership from 0 to m is taken as 1, and it decays from the mean to the upper limit. The membership function is adjusted as

$$\mu(x) = \begin{cases} 1, & 0 \le x \le m \\ \dfrac{u-x}{u-m}, & m < x \le u \\ 0, & x > u \end{cases} \tag{6}$$

and the function diagram is shown in Fig. 8.
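Steps 1 to 3 can be sketched end to end for a single feature. The sketch rests on several assumptions that the text does not fix: window start points are drawn uniformly at random (so windows may overlap), the box-plot limits are Tukey whisker ends at 1.5 times the interquartile range computed with inclusive quartiles, m is the mean of the non-outlier values, and the decay in Equation (6) is linear. The 40-sample window follows from the 20 Hz sampling rate reported in Section IV and the two-second window.

```python
import random
from statistics import fmean, quantiles
from typing import List, Sequence, Tuple

WINDOW_LEN = 40  # 2 s at 20 Hz


def random_windows(samples: Sequence[float], n_windows: int = 100,
                   window_len: int = WINDOW_LEN, seed=None) -> List[Sequence[float]]:
    """Step 1: cut n_windows windows starting at random indices (assumed uniform)."""
    if len(samples) < window_len:
        raise ValueError("recording shorter than one window")
    rng = random.Random(seed)
    starts = [rng.randrange(len(samples) - window_len + 1) for _ in range(n_windows)]
    return [samples[s:s + window_len] for s in starts]


def boxplot_params(values: Sequence[float], whisker: float = 1.5) -> Tuple[float, float, float]:
    """Step 2: (l, m, u) from a box-plot; values beyond the Tukey fences are excluded."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inliers = [v for v in values if low_fence <= v <= high_fence]
    return min(inliers), fmean(inliers), max(inliers)


def variance_membership(x: float, m: float, u: float) -> float:
    """Step 3, eq. (6): full membership on [0, m], linear decay to 0 at u."""
    if x < 0:
        raise ValueError("a variance cannot be negative")
    if x <= m:
        return 1.0
    if x >= u:
        return 0.0
    return (u - x) / (u - m)
```

For the illustrative per-window values 1, 2, ..., 9, 100, `boxplot_params` returns (1, 5.0, 9): the value 100 lies beyond the upper fence and is dropped, so it cannot stretch u.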
Finally, the 14 distribution functions can be determined, as shown in the results section.

(3) Elimination of the ''zero absolutization'' effect by adding a full frame
When information from different sensors is fused, the pieces of evidence may be highly conflicting. For example, if one sensor assigns a basic probability of 0 to the state ''Yes'' and no mass to the full frame, the final probability of ''Yes'' obtained by the D-S fusion rule is still 0, even if the probabilities given by the other sensors are extremely high, because every term of Dempster's rule that supports ''Yes'' includes this zero factor.
According to Xu's theory [58], the BPA values can be slightly modified to avoid this situation without changing their original meaning. Therefore, a mass of 0.0001 is assigned to the full frame Θ, and the remaining 0.9999 is allocated to the other elements of the frame, which avoids the ''zero absolutization'' effect. The resulting frame is shown in Table 5. Finally, the multi-sensor information is fused by the D-S combination rule to obtain the final result.
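The effect of the full frame can be checked numerically. Table 5 is not reproduced here, so the mapping from a membership value μ to a BPA below is a hypothetical assumption: m(Yes) = 0.9999·μ, m(No) = 0.9999·(1 − μ), and m(Θ) = 0.0001. The final decision rule (choose Yes when M(Yes) > M(No)) is likewise an assumption.

```python
from functools import reduce
from itertools import product
from typing import Dict, FrozenSet, Sequence

Mass = Dict[FrozenSet[str], float]
YES, NO = frozenset({"Yes"}), frozenset({"No"})
THETA = YES | NO
THETA_MASS = 0.0001  # mass kept on the full frame, as stated in the text


def bpa_from_membership(mu: float, theta_mass: float = THETA_MASS) -> Mass:
    """Hypothetical mapping from a fuzzy membership value to a BPA."""
    scale = 1.0 - theta_mass
    return {YES: scale * mu, NO: scale * (1.0 - mu), THETA: theta_mass}


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule for two sources (normalizes by 1 - K)."""
    joint: Mass = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        if a & b:
            joint[a & b] = joint.get(a & b, 0.0) + ma * mb
        else:
            conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict (K = 1)")
    return {s: v / (1.0 - conflict) for s, v in joint.items()}


def fuse(memberships: Sequence[float], theta_mass: float = THETA_MASS) -> Mass:
    """Fuse one BPA per sensor feature (14 in this study)."""
    return reduce(dempster_combine,
                  (bpa_from_membership(mu, theta_mass) for mu in memberships))


def is_target_posture(memberships: Sequence[float]) -> bool:
    """Assumed decision rule: Yes wins when its fused mass exceeds that of No."""
    fused = fuse(memberships)
    return fused.get(YES, 0.0) > fused.get(NO, 0.0)
```

With no mass on Θ, a single membership of 0 pins M(Yes) to exactly 0 no matter what the other sensors report; with 0.0001 on Θ, M(Yes) stays positive, so the remaining evidence can still influence the result.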

D. IDENTIFY THE VIRTUAL SPACE ACTIVITIES
In this study, virtual activities are divided into two categories: assisted-reality activities and non-assisted-reality activities. Assisted-reality activities for pedestrians in virtual space are defined as ''navigation activities,'' which manifest as the use of ''map and navigation'' applications. All other applications in use are deemed non-assisted-reality activities, which may distract pedestrians while they cross intersections.
To categorize the applications on mobile phones, a category table was prepared previously, as shown in Table 6.
If the user is in the posture of playing with a mobile phone, the package name of the app currently in use is captured and classified via Table 6. If the virtual activity is a non-assisted-reality activity, the situation is identified as a ''Smombie Situation.''
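The lookup against the category table amounts to a dictionary query. In the sketch below, the table entries and package names are hypothetical placeholders rather than the contents of Table 6, and treating packages missing from the table as non-assisted-reality is an assumption consistent with the rule that every non-navigation app counts as a distraction.

```python
# Hypothetical excerpt of the category table; these are placeholders, not Table 6.
APP_CATEGORY = {
    "com.example.maps": "map and navigation",
    "com.example.chat": "social",
    "com.example.video": "entertainment",
}

NAVIGATION = "map and navigation"


def is_non_assisted_reality(package_name: str) -> bool:
    """True for every app outside the navigation category.

    Packages missing from the table are treated as non-assisted-reality (assumption).
    """
    return APP_CATEGORY.get(package_name) != NAVIGATION
```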

IV. EXPERIMENTS AND RESULTS
This section mainly introduces the experimental area, collected data, and experimental results. The experimental results include the parameter values of the BPA function calculated from the training set and the final discrimination precision and recall obtained from the test set.

A. EXPERIMENTAL AREA AND DATA
The experiment was carried out on the campus of Wuhan University. Three intersections on the main roads of the campus were selected (see Fig. 9). Intersection A is a crossroad connecting the residential area with the teaching area, so it carries heavy traffic throughout the day. Intersection B is also a crossroad but has less traffic and fewer pedestrians than intersection A. Unlike intersections A and B, intersection C is a T-junction with narrower lanes and therefore carries the least traffic. Students commonly play with their mobile phones while crossing these intersections.
In this study, 20 volunteers used a mobile phone application written by the author to collect data at these intersections. The volunteers were divided into two groups: data from the first group formed the training set used to build the BPA functions, and data from the second group formed the test set used to calculate the precision and recall of the algorithm. Each volunteer collected two sets of data at each intersection:
Set1: Read messages on the phone while passing through the intersection. This set was used as positive samples in the test to calculate precision and recall.
Set2: Held the phone but swung it freely while crossing the road. This set was used as negative samples in the test to calculate the probability that the algorithm misidentifies a non-target posture as the target posture.
The data acquisition program ran on the Android operating system in the Redmi4 smartphone. The sensor data were sampled at a frequency of 20 Hz, and the collected data were stored as a separate ''.txt'' file for each repetition. The collected files were processed in MATLAB. The data format of this experiment is listed in Table 7.

1) PARAMETER VALUES OF THE BPA FUNCTION
Using the box-plot statistics of the first group's positive samples, the parameters l, m, and u were determined as shown in Table 8. The BPA functions were then obtained by substituting these parameters into formulas (5) and (6).

2) ALGORITHM PRECISION AND RECALL
The Group1 and Group2 test-set data were divided into 2-second segments, yielding 1,000 positive and 1,000 negative samples, which were classified by the proposed algorithm. Each classification has four possible outcomes (true positive, false positive, true negative, and false negative), as shown in Fig. 10.
Classification results were obtained from 10 repeated tests for Intersection A, as shown in Table 9. The average Precision and Recall were both above 0.9, which was a satisfactory classification result.
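For reference, the precision, recall, and F1 values used in this section follow the standard definitions over the four outcomes of Fig. 10. The counts in the usage example are illustrative only, not results from this study.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; true negatives do not enter any of the three metrics."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, 930 true positives, 50 false positives, and 70 false negatives out of 1,000 positive windows give precision ≈ 0.949, recall = 0.93, and F1 ≈ 0.939.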

3) COMPARISON WITH THE SINGLE-SENSOR-FEATURE BASED RESULTS
To evaluate the advantages of fusing information from multiple sensors, the results of our approach were compared with those calculated separately from each of the 14 single sensor features, as shown in Table 10 and Fig. 11.
According to the experimental results, the 14 sensor features fall into four categories. In the first category (such as prxm, prxv, gyrym, and gyrzm), both TP and FP are high; that is, a test sample is likely to be judged positive, so recall is high while precision is low. Such features mainly contribute valid information for identifying positive samples in the fusion. The second category (such as gyryv, accxm, accxv, accyv, and acczm) has high TN and FN, which means a test sample is prone to be judged negative; therefore, recall is low while precision is high. These features contribute to identifying negative samples in the fusion.
The third category (e.g., acczv) distinguishes poorly between positive and negative cases and therefore contributes little to the fusion. The weight of such features will be reduced in future research.
The fourth category (e.g., gyrxv, gyrzv, and accym) performs well in both precision and recall. These features provide credible evidence for the fusion, so their fusion weights should be increased in subsequent research.
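The four categories can be read as a simple rule on each feature's single-feature precision and recall. The sketch below is an illustrative reading only: the paper does not state a cut-off, so the `threshold` of 0.9 is hypothetical, and category 3 here simply collects features that are high on neither metric.

```python
def feature_category(precision, recall, threshold=0.9):
    """Hypothetical mapping of single-feature scores to the four categories above."""
    high_p = precision >= threshold
    high_r = recall >= threshold
    if high_p and high_r:
        return 4  # credible for both classes: candidate for a larger fusion weight
    if high_r:
        return 1  # leans positive (many TP and FP): high recall, low precision
    if high_p:
        return 2  # leans negative (many TN and FN): high precision, low recall
    return 3      # weak discriminator: candidate for a smaller fusion weight
```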
Although an individual sensor feature may outperform the fusion result in either precision or recall, the fusion result achieves a higher F1 score than any single sensor feature, reflecting a better trade-off between the two metrics. This indicates that our multi-source information fusion approach can effectively improve the identification of Smombie contexts.
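The F1 score is the harmonic mean of precision and recall, so it penalizes a feature that is strong on one metric and weak on the other. The sketch below evaluates it for the average precision (0.999) and recall (0.916) reported in the Conclusion, and for a hypothetical feature with perfect precision but a recall of 0.5 (illustrative numbers, not values from Table 10).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.999, 0.916), 3))  # 0.956
print(round(f1_score(1.0, 0.5), 3))      # 0.667
```

Even with perfect precision, the hypothetical skewed feature scores far below the balanced fusion result.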

4) COMPARISON WITH DIFFERENT FUSION APPROACHES
To demonstrate the advantage of the proposed approach, we also compared its results with those of the traditional D-S evidence theory and the Bayesian inference method. The results are shown in Table 11 and Fig. 12. Our method achieves the highest precision and recall of the three.
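For reference, the traditional D-S baseline combines two mass functions with Dempster's rule: the masses of every pair of focal elements are multiplied and assigned to their intersection, the mass falling on empty intersections is the conflict K, and the result is normalized by 1 - K. The sketch below implements that classical rule over a frame {S, N} (Smombie, not Smombie; labels are ours). It is not the authors' modified fusion algorithm, whose details are outside this section.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule of combination; focal elements are frozensets of hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


S, N = frozenset({"S"}), frozenset({"N"})
THETA = S | N  # the whole frame: "don't know"
m_feature1 = {S: 0.6, N: 0.1, THETA: 0.3}
m_feature2 = {S: 0.5, N: 0.2, THETA: 0.3}
fused = dempster_combine(m_feature1, m_feature2)
```

Here the conflict is 0.17, and the fused belief in S rises to about 0.76, above either input, because both features lean toward S. With more features, the rule can be folded pairwise (e.g., with `functools.reduce`).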

5) ADAPTABILITY EVALUATION OF DIFFERENT INTERSECTIONS
We used the same method to calculate the precision and recall for the data collected at each of the three intersections. The results are shown in Table 12 and Fig. 13. The precision and recall at all three intersections are above 0.9, which demonstrates the adaptability of the algorithm to different environments.

V. DISCUSSION
This approach could be used as a built-in context-awareness function on mobile phones. When the function is activated, the phone identifies whether the user is in a Smombie context and issues a reminder to reduce the user's risk at crossroads. However, several issues and limitations need further discussion before large-scale application.

A. BATTERY ISSUES
To minimize battery consumption, our framework reduces the cost of sensor acquisition and computation through several preconditions. The phone collects sensor data and runs the calculation program only when the user is within the intersection buffer, the screen is unlocked, and a non-navigation app is in use.
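The preconditions above amount to a cheap gate evaluated before any sensor sampling. The sketch below assumes the intersection buffer is a circle of radius `buffer_m` around a single (latitude, longitude) point and measures distance with the haversine formula; the function names, the circular buffer, and the boolean inputs for screen and app state are all hypothetical, since the source does not specify how these conditions are checked.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_run_detection(user_lat, user_lon, intersection, buffer_m,
                         screen_unlocked, foreground_is_navigation):
    """Hypothetical gate: sample sensors only if all three preconditions hold."""
    in_buffer = haversine_m(user_lat, user_lon, *intersection) <= buffer_m
    return in_buffer and screen_unlocked and not foreground_is_navigation
```

Checking the two state flags first would skip the distance computation entirely; the order shown simply mirrors the text.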
In addition, TalkingData [59] ran performance tests on multiple phones that continuously collected sensor data and performed real-time behavior recognition throughout the day, and found that battery consumption was about 1% per hour. Because our algorithm acquires data only intermittently, it should consume even less and thus should not affect the daily use of the phone.

B. PRIVACY ISSUES AND REAL-TIME PROCESSING
Because location, sensor data, and app thread information are sensitive, our study performs real-time computation directly on the smartphone to avoid privacy problems. The data are processed only on the phone and are never uploaded, which avoids data disclosure.
In practical applications, the real-time processing algorithm must compute quickly enough to remind pedestrians in time. We measured the calculation time over 1000 runs; the average was 0.0008 seconds per run, which is fast enough to meet practical requirements.
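A timing measurement of this kind can be reproduced with a simple harness that calls the classifier repeatedly and divides the elapsed wall-clock time by the number of calls. The sketch below is a generic Python harness; `average_runtime` is a hypothetical helper, and the 0.0008 s figure above was measured on the authors' own setup, so results from this sketch will depend on the device and implementation.

```python
import time


def average_runtime(fn, args, repeats=1000):
    """Mean wall-clock seconds per call of fn(*args) over `repeats` calls."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats
```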

C. ACCURACY LIMITATIONS
Smartphone sensor data can sometimes be of very low accuracy. For GPS, previous research reports an accuracy of about 0.05-10 m [60]-[62]. Because our position judgment includes a buffer zone, this positioning accuracy largely meets the demand. For mass deployment, however, the geofencing tools embedded in iOS or Android, which provide more accurate locations by fusing multiple positioning approaches [63], would be a better choice. For other sensors, accuracy is affected by sensor heterogeneity and pedestrian biomechanical effects. Different smartphone brands, models, and operating systems produce heterogeneous sensor data [64]. Meanwhile, a pedestrian's intermediate transition postures produce biomechanical effects, causing user-induced uncertainties in sensor data [65].
One shortcoming of this paper, caused by biomechanical effects, is that the algorithm currently works only when a pedestrian holds the phone in portrait orientation (e.g., while texting, reading, or watching). When a pedestrian holds the phone in landscape orientation (e.g., while playing mobile games), the posture may have different features and data uncertainties. However, because users hold their phones in portrait orientation in most cases, this has little effect on the application of this study. This shortcoming will be addressed in future work.
The fusion approach also affects the accuracy of the algorithm. Our approach is effective, but there is still room for improvement. In future work, we will test sensor heterogeneity across different smartphones, try different fusion approaches to improve accuracy, and develop adaptive algorithms that let each user record personalized gesture parameter data in advance, thereby reducing the effects of sensor heterogeneity and pedestrian biomechanics on our approach.

VI. CONCLUSION
With the increasing popularity of smartphones, more and more pedestrians are absorbed in their mobile phones while crossing the road. This research proposes a context-awareness method to identify this behavior so that warning reminders can be sent to pedestrians. This could help reduce traffic accidents and serve as a useful application for society.
In this study, a framework for Smombie context awareness that integrates pedestrian behavioral information in physical and virtual space is proposed. The framework uses geographical location to trigger the context-awareness judgment, which reduces the computation and battery cost of the other sensors. The judgment considers not only physical postures but also virtual-space activities. These sources of information complement and verify each other, making it possible to depict a complex situation such as the ''Smombie'' situation, which is a new direction for context-awareness applications.
Meanwhile, an improved multi-information fusion algorithm based on fuzzy mathematics and D-S evidence theory was designed. This algorithm can effectively identify whether users are playing with their smartphones while crossing a road, with an average precision of 0.999 and an average recall of 0.916.
We compared our approach with the results obtained from single sensor features. The fusion result outperforms every single sensor feature in the F1 index, achieving a trade-off between precision and recall. We also identified how different sensors contribute to the fusion, which helps analyze the performance of the proposed approach in theory.
To clarify the distinctive traits of our work, we also compared our results with those of the traditional D-S evidence theory and the Bayesian inference method, which showed that our method improves both recall and precision. We also evaluated the adaptability of the approach at different intersections. The precision and recall at all three intersections are above 0.9, which demonstrates the adaptability of the algorithm to different environments.
His research interests include the spatio-temporal modeling of human behavior, space-time GIS for transport, and intelligent navigation.