Making Connected Cars Untraceable via DSRC Radios

This paper shows the potential of using DSRC radios for protection against vehicle tracking. We focus on traffic data collection as our target application, where vehicles send their locations to a server while driving. Such vehicles are easily trackable, revealing their location history, because the application often requires frequent and accurate location updates. This paper presents PathCloak, a method that enables vehicles to report their locations while preventing the server from properly tracking them. PathCloak leverages vehicles' two network interfaces: in-car Internet (for accessing the server) and car-to-car DSRC (for creating path confusion). PathCloak-enabled vehicles exchange their kinematic information via DSRC radios and use it to generate plausible path segments for each other, making their paths indistinguishable from each other from the server's viewpoint. We demonstrate its feasibility via field experiments on real roads using our DSRC testbeds. Our evaluation shows that PathCloak offers strong privacy (tracking success ratio < 1%) while maintaining high utility for various traffic statistics.


I. INTRODUCTION
Dedicated short-range communication (DSRC) is a vehicular communication technology that has long been piloted worldwide [1]-[4]. These long-term tests showed that DSRC can improve road safety and reduce traffic accidents. The key instrument is the basic safety message, a DSRC beacon that contains the vehicle's position, heading, speed, and other information about its state and predicted path. Surrounding vehicles receive the message, and each estimates the risk (e.g., collision threats) posed by the transmitting vehicle. Such road-safety enhancement has long been tested as the first prominent use of DSRC on the roads.
In this work, we explore a new potential of DSRC: enhancing vehicle privacy using existing DSRC beacons. We focus on traffic data collection as our target application, where vehicles send their locations to a central server while driving. Such vehicles are easily trackable, as the application often requires frequent and accurate location updates. Specifically, the server that collects periodic user locations could use them to track the vehicles and reconstruct their location history. Even with anonymized location reports, server-side time-series analysis of such location samples (i.e., following the footsteps), combined with personally identifiable position data, could accumulate path information and eventually identify users' location history. This paper presents PathCloak, a method that enables vehicles to report their locations while preventing the server from properly tracking them. PathCloak leverages vehicles' two network interfaces: in-car Internet and car-to-car DSRC. Vehicles send locations to the server via in-car Internet while protecting against tracking via car-to-car DSRC. To this end, we utilize moving vehicles' kinematic information (position, speed, heading, etc.) that is already contained in a regular DSRC beacon (broadcast 10 times per second). By exchanging such information via DSRC radios, vehicles generate plausible, mutually obfuscating path segments for each other. Each path segment is created so that it diverges from one vehicle's actual path and joins the other's actual path. Such a PathCloak operation is brief and can thus be performed multiple times during a trip. As a result, from the viewpoint of observers, a vehicle's path keeps branching off.
PathCloak has several merits: (i) vehicles send their actual updates all the time, ensuring spatial-temporal accuracy; (ii) car-to-car DSRC (range of 300-400m) enables a flexible form of path confusion between vehicles moving in various directions; (iii) tracking protection is possible with a minimal number of false updates; (iv) the mutually beneficial condition naturally encourages user participation; and (v) location data collected at the server closely reflects the real traffic traces, offering high utility for traffic data statistics.
We demonstrate the feasibility of PathCloak via field experiments on real roads using our DSRC testbeds and via trace-driven evaluation. Our results show that PathCloak successfully creates path confusion in various real-world driving situations, and offers strong privacy (tracking success ratio < 1%) while retaining high utility for various traffic statistics such as congestion estimation and traffic flow modeling.
It is worth mentioning that the protection performance of PathCloak depends on the DSRC-equipped rate. In other words, the more vehicles are equipped with DSRC radios, the more PathCloak operations are likely to occur during a trip, and the greater the confusion. Thus, a high equipped rate is desirable for PathCloak. We do not expect PathCloak to be rolled out immediately, as the standard has not been fully adopted by vehicle manufacturers. In this work, we intend to report the new potential of DSRC via PathCloak, as cellular-based in-car Internet and DSRC are anticipated to be deployed in coexistence as essential features of soon-to-come connected cars [5]-[8].
In summary, we make the following contributions:
• We unlock the potential of DSRC to protect location privacy. This opens up a new opportunity for DSRC, which has so far been used predominantly for road safety.
• We build a full-fledged prototype and conduct real road experiments using PathCloak-enabled vehicles. Our field measurements provide insights into the conditions necessary for vehicles to perform PathCloak operations in reality.
• We examine PathCloak-generated traces from both privacy and utility perspectives. We evaluate privacy against a state-of-the-art deep-learning based tracker, and we assess utility using three different traffic data analysis tasks.

II. MOTIVATION
A. TRAFFIC DATA COLLECTION
Our work specifically focuses on location reporting applications that do not explicitly require a user identity.¹ In such applications, users only send their GPS coordinates to the server while driving. This type of application is intended to collect users' location data for various traffic statistics, ranging from traffic monitoring [9]-[11], to traffic congestion assessment [12]-[14], to mobility pattern estimation [15]-[18], to dashcam data reporting [19]-[21]. These applications aim to capture data that accurately reflects the real-world traffic situation. The spatial-temporal accuracy of user reports, as well as their reporting frequency, thus plays an important role in the quality of the data accumulated on the server. For this reason, some fine-grained, crowdsourced monitoring systems have users report as frequently as every second [19]. These location reports can be made anonymous via simple practices: (i) users can hide network identifiers via anonymization networks like Tor or I2P; and (ii) users can constantly change their externally visible network IDs (e.g., by switching Tor circuits). This prevents network IDs from being used to distinguish among users. However, such anonymized reports do not guarantee privacy, as discussed below.

B. THREAT MODEL
Our primary adversary in this work is a hostile server or anyone with access to its database. These attackers may wish to track users via the collected location samples. In particular, if a user sends location reports frequently (even with constantly changing network IDs), the reports form a trail of his/her locations, allowing a hostile server to easily follow the user's path.
Such location tracking can further reveal users' highly private information if the server can connect specific individuals to specific locations [22]-[24]. For example, an anonymous user at a hospital or another private location can eventually be identified if the user's path is tracked from his/her home (as the resident of a particular address). We assume a strong adversary that combines such identification and tracking capabilities: it cross-references the coordinates of the location reports with a database, and it uses powerful tracking techniques, e.g., state-of-the-art machine-learning based tools, to reconstruct a target user's path from anonymized location samples.

C. GOAL MODEL
Tracking protection. A collection server should not be able to trace a user's path. More specifically, given the start position of a user's trip, the server should not be able to determine its end position, and vice versa. Such untraceability should be retained even under frequent, high-accuracy location updates.
Utility preservation. Enabling tracking protection should not compromise the quality of collected statistics. For example, reporting too much fake location data will result in an excessively noisy database that suffers from serious utility losses. The servers should be able to maintain utility-preserving, high-quality databases.

Space-time intersection.
Mix-zones [25]-[29] make users' paths indistinguishable if they coincide in space and time. As shown in Fig. 1a [33]. However, it requires that false reports vastly outnumber the real ones. As in Fig. 1b, reducing an attacker's chance of picking the true path below 0.01 requires each individual user to report at least 100 different false paths. The more users adopt this approach, the more overwhelming the volume of "noisy" data accumulated in the system, severely reducing utility.
Differentially-private traces. Data collectors can publish trajectory data in the form of synthesized location traces that achieve differential privacy [34]-[36]. A recent work [36] further presents a high-utility trace synthesizer with differential privacy guarantees. However, these are inherently server-driven privacy techniques in which data collectors hold raw user trajectory data and process it for publication; user location history is thus still known to the data collector.
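The "at least 100 false paths" figure for independent false trips follows from a counting argument. The sketch below assumes, as a simplification, that the tracker picks uniformly at random among one true path and k false paths; the function name `min_false_paths` is hypothetical. Exact fractions avoid floating-point trouble at the boundary 1/100.

```python
from fractions import Fraction


def min_false_paths(target: Fraction) -> int:
    """Smallest k such that a tracker choosing uniformly among the 1 + k
    candidate paths (one true, k false) picks the true one with
    probability strictly below `target`."""
    k = 0
    while Fraction(1, k + 1) >= target:
        k += 1
    return k


k = min_false_paths(Fraction(1, 100))
print(k)  # 100: each user must fabricate 100 false paths for one real path
```

Because 1/100 is not strictly below 0.01, k = 99 is not enough and k = 100 is the first value that works. If every false path is about as long as the real one, this means roughly 100 false reports per real report, which is the utility cost that motivates the cooperative design.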

III. PATHCLOAK: COOPERATIVE PRIVACY
As mentioned earlier, false trips can create confusion but will incur serious utility losses. The main culprit is that each individual user produces a large number of false paths for his/her own protection (Fig. 1b). We suggest using inter-vehicle communication to do a cooperative form of tracking protection with minimal false data. With cooperation between vehicles (via DSRC radios), we can create confusion with pairs of short false paths (or false pathlets) to collectively realize the full-trajectory protection, as illustrated in Fig. 1c.

A. PATHCLOAK OVERVIEW
Here we briefly describe how PathCloak works (see Fig. 2). When vehicles A and B, both reporting to server S, hear each other's DSRC beacons (range of 200-400m), they determine whether to generate false pathlets for each other. If so, from this moment ($t_0$ in Fig. 2a), each vehicle (e.g., A) independently sends S two location reports (via two new network IDs): one is A's actual location, and the other is B's false location.
Once the actual and false pathlets converge (e.g., at $t_0+\alpha$ in Fig. 2b), the vehicle sends S only its actual locations (via a new network ID) and, whenever possible, performs the PathCloak process with others as it moves on.
Outcome of PathCloak. PathCloak achieves the following: (i) vehicles send their actual location reports all the time (spatial-temporal accuracy); (ii) the number of false reports during a trip is strictly less than that of actual ones (data quality); and (iii) continuously diverging paths prevent the server from determining vehicles' actual trajectories (privacy).
Compared with Mix-zones. Mix-zones create confusion about vehicle paths with help from RSUs (roadside units). However, they are specifically designed to protect against a passive adversary who installs its own, globally placed roadside receivers and tries to eavesdrop on vehicle location information in DSRC beacons. In other words, mix-zones are not intended (and thus unable) to provide privacy for location-reporting applications where vehicles continuously send their locations to a remote server. PathCloak, by contrast, is tailored for such applications, protecting against tracking via inter-vehicle cooperation even without help from RSUs.
Why cooperative? PathCloak ensures privacy only when vehicles generate false pathlets for each other. A selfish user may refuse to collaborate, i.e., not produce a false pathlet for the other vehicle that produces one for him/her. We argue, however, that such uncooperative behavior instead threatens the user's own privacy. Suppose that B in Fig. 2 does not generate a false pathlet for A. Then the server could easily recognize B's actual path via a posteriori reasoning: without A's false pathlet (created by B), there will be only a single path from A's position at time $t_0$ to $t_0+\alpha$. This negates B's false pathlet (created by A), and thus B's actual path will be identified. Such a mutually beneficial condition naturally encourages user participation.

1) Pairing Between Vehicles
Each vehicle A determines whether to perform PathCloak with another vehicle B based on received DSRC beacons, especially B's current location $loc^{B}_{cur}$ and speed $speed^{B}_{cur}$. A assesses whether its actual path and B's false pathlet could converge within a predefined time period $T_f$ (without significantly changing B's speed) according to Inequality (1), where $loc^{A}_{T_f}$ is A's estimated future position (after $T_f$) based on A's current speed and the route ahead provided by the in-car navigation system. $\mathrm{Dist}(loc^{B}_{cur} \to loc^{A}_{T_f})$ is the driving distance² from the start point of B's false pathlet ($loc^{B}_{cur}$) to the provisional convergence point ($loc^{A}_{T_f}$). $\lambda$ is a predefined margin that limits the short-term speed change on the false pathlet to avoid arousing trackers' suspicion. Note that A's exact future positions may differ from the estimated ones if its speed changes. Thus, satisfying Inequality (1) does not pinpoint the convergence time; it means that A's actual path and B's false pathlet are likely to converge in about $T_f$. If the beacon from B satisfies Inequality (1), A sends B a propose message. Likewise, B responds with an accept message if it confirms, based on A's beacon, that B's actual path and A's false pathlet can be joined. These PathCloak messages are contained in regular DSRC beacons and are thus broadcast 10 times per second. If no accept message from B is received within one second, A considers the proposal unsuccessful and proceeds with other candidates as it moves on. A one-second wait is chosen because it leaves enough time to recover from possible losses of DSRC beacons (i.e., propose/accept messages) without disrupting the periodic (second-scale) location reporting to the server.
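The text above does not reproduce Inequality (1) itself, so the following is only a sketch under an explicit assumption: that the condition is a band of relative width $\lambda$ around the distance B would cover at its current speed in $T_f$, i.e. $(1-\lambda)\, speed^{B}_{cur} T_f \le \mathrm{Dist}(loc^{B}_{cur} \to loc^{A}_{T_f}) \le (1+\lambda)\, speed^{B}_{cur} T_f$. The function names are hypothetical, and the driving distance is taken as an input because in the prototype it comes from the navigation API.

```python
def kmh_to_mps(kmh: float) -> float:
    """Convert km/h to m/s."""
    return kmh / 3.6


def can_pair(dist_b_to_conv_m: float, speed_b_mps: float,
             t_f_s: float = 30.0, lam: float = 0.3) -> bool:
    """Hypothetical joinability check that A runs on a beacon from B.

    ASSUMED form of Inequality (1):
        (1 - lam) * v_B * T_f <= Dist(loc_B_cur -> loc_A_Tf) <= (1 + lam) * v_B * T_f
    i.e. B's false pathlet could be driven in about T_f while changing B's
    current speed by at most the fraction lam. dist_b_to_conv_m is the
    driving distance a navigation system would return; here it is an input.
    Defaults follow the field-test settings (T_f = 30 s, lambda = 0.3).
    """
    nominal = speed_b_mps * t_f_s  # metres B covers in T_f at its current speed
    return (1 - lam) * nominal <= dist_b_to_conv_m <= (1 + lam) * nominal


# B drives at 40 km/h, so it nominally covers about 333 m in 30 s.
v_b = kmh_to_mps(40)
print(can_pair(400.0, v_b))  # within the +/-30% band: send a propose
print(can_pair(500.0, v_b))  # too long: B would have to speed up by ~50%
```

Under this reading, a candidate B is rejected whenever the false pathlet would force an implausible speed change, which is exactly what λ is meant to prevent.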
It is also worth mentioning that, despite several retransmissions within the one-second waiting time, a permanent loss of an accept message, if it ever occurs, can cause a "half-pairing": the vehicle that sent the accept message creates a false pathlet while the proposer does not. We point out, however, that such a case is not only rare but also tolerable: it does not permanently disrupt the full-trajectory protection, as the vehicles continue to perform PathCloak operations with others whenever possible as they move on.
² The driving distance between two points on a road map, especially nearby ones, can be obtained instantly from in-car navigation systems.
Once a mutual pairing is made, each vehicle starts to independently send the server two location reports (via two new network IDs): its actual location (for itself) and a false location (for the other). We use Tor to switch vehicles' network IDs: whenever new network IDs are needed, the vehicle uses new Tor circuits. Since building a new circuit takes 5-7 s, the vehicle prepares new circuits in advance while driving. Fig. 3 shows an example of such location reporting from the network-ID perspective of vehicle A. From the moment the vehicles are paired, their DSRC beacons contain a pairing-on message, indicating that they are currently unavailable for pairing with others. Fig. 4 shows the finite state machine that specifies a PathCloak-enabled vehicle.
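The pairing protocol described in this subsection can be sketched as a small state machine. This is a hypothetical reconstruction from the prose, not a transcription of Fig. 4: the state names, method names, and the (message, counterpart) return convention are illustrative assumptions. It captures the one-second accept timeout, the pairing-on/pairing-off signalling, and the half-pairing case in which the acceptor starts its false pathlet before knowing whether its accept arrived.

```python
from enum import Enum, auto
from typing import Optional, Tuple

Msg = Optional[Tuple[str, Optional[str]]]  # (message type, counterpart ID)


class State(Enum):
    AVAILABLE = auto()  # free to pair; beacons carry pairing-off (or nothing)
    PROPOSING = auto()  # propose sent, waiting up to 1 s for an accept
    PAIRED = auto()     # sending actual + false reports; beacons carry pairing-on


class PathCloakFSM:
    """Hypothetical per-vehicle state machine inferred from the text."""

    ACCEPT_TIMEOUT_S = 1.0  # give up on a proposal after one second

    def __init__(self, my_id: str) -> None:
        self.my_id = my_id
        self.state = State.AVAILABLE
        self.partner: Optional[str] = None
        self.deadline = 0.0

    def on_joinable_beacon(self, sender: str, now: float) -> Msg:
        """A beacon from `sender` passed the joinability check."""
        if self.state is State.AVAILABLE:
            self.state, self.partner = State.PROPOSING, sender
            self.deadline = now + self.ACCEPT_TIMEOUT_S
            return ("propose", sender)
        return None

    def on_propose(self, sender: str, joinable: bool) -> Msg:
        """A propose addressed to us; accept if our own check also passes."""
        free = self.state is State.AVAILABLE or (
            self.state is State.PROPOSING and self.partner == sender)
        if free and joinable:
            # The acceptor starts the proposer's false pathlet right away;
            # if this accept is lost for good, we get a "half-pairing".
            self.state, self.partner = State.PAIRED, sender
            return ("accept", sender)
        return None

    def on_accept(self, sender: str, now: float) -> Msg:
        """An accept from the vehicle we proposed to, within the timeout."""
        if (self.state is State.PROPOSING and sender == self.partner
                and now <= self.deadline):
            self.state = State.PAIRED
            return ("pairing-on", sender)
        return None

    def tick(self, now: float) -> None:
        """Abandon a proposal that received no accept within the timeout."""
        if self.state is State.PROPOSING and now > self.deadline:
            self.state, self.partner = State.AVAILABLE, None

    def on_converged(self) -> Msg:
        """Our actual path has met the partner's false pathlet at L_conv."""
        if self.state is State.PAIRED:
            self.state, self.partner = State.AVAILABLE, None
            return ("pairing-off", None)
        return None
```

A typical exchange is A proposing, B accepting, A confirming with pairing-on, and each vehicle returning to AVAILABLE (and advertising pairing-off) once its pathlets converge. Letting a PROPOSING vehicle accept a propose from its own candidate resolves the case where both vehicles propose to each other at the same time.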

2) Plausible Pathlet Generation
We describe the false pathlet generation by vehicle A; the procedure for B is reciprocal. Convergence point selection. A first chooses the convergence time $T_{conv}$. A picks $T_{conv}$ randomly among the values that satisfy Inequality (1) with $T_f$ replaced by $T_{conv}$, which fixes the convergence position $L_{conv}$ $(= loc^{A}_{T_{conv}})$ in advance. That is, A will reach $L_{conv}$ if it moves along its route at the default speed $speed^{A}_{dflt}$ $(= speed^{A}_{cur})$ for $T_{conv}$. Likewise, B's false position will be at $L_{conv}$ if it moves at the default speed $speed^{B}_{dflt}$ for $T_{conv}$, where

$$speed^{B}_{dflt} = \frac{\mathrm{Dist}(loc^{B}_{cur} \to L_{conv})}{T_{conv}}. \qquad (2)$$

Note that the resulting $speed^{B}_{dflt}$ is within the margin $\lambda$ of B's current speed $speed^{B}_{cur}$, as $\lambda$ is already reflected in $T_{conv}$. A is now ready to generate B's false location reports. Reporting locations. The key idea is that, even if A changes its speed, keeping the ratio $speed^{A}_{dflt} : speed^{B}_{dflt}$ (by adjusting the fake B's speed accordingly) will lead both to the convergence location $L_{conv}$ at the same time (not necessarily at $T_{conv}$). More specifically, at the $i$-th reporting time, where A's actual location is $loc^{A}_{i}$, B's false position $loc^{B}_{i}$ is created via the following proportionality:

$$\frac{\Delta d^{B}_{i}}{\Delta d^{A}_{i}} = \frac{speed^{B}_{dflt}}{speed^{A}_{dflt}},$$

where $\Delta d^{X}_{i} = \mathrm{Dist}(loc^{X}_{i-1} \to loc^{X}_{i})$ is the moving distance of $X \in \{A, B\}$ since the last reporting time. B's false locations are placed along the route of $\mathrm{Dist}(loc^{B}_{cur} \to L_{conv})$ obtained (in Eq. (2)) from the in-car navigation system. The resulting false pathlet thus follows a realistic route, with its coordinates variably spaced along it. A sends $loc^{A}_{i}$ and $loc^{B}_{i}$ via two different network IDs. Once the actual and false pathlets converge, i.e., when A reaches the convergence location $L_{conv}$, A sends only its actual locations, over a new network ID. From this moment, A's DSRC beacons contain a pairing-off message, indicating that it is available for pairing with others.
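The reporting rule above can be sketched in a few lines. This is a minimal illustration, assuming that routes are planar polylines in metres (a real implementation would use the navigation system's road geometry) and that A's per-report moving distances are already known; `point_along` and `false_locations` are hypothetical helper names. Scaling every step of A by $speed^{B}_{dflt}/speed^{A}_{dflt}$ keeps both pathlets at the same fraction of their total length, so they arrive at $L_{conv}$ together even when A speeds up or slows down.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y) in metres: a simplifying assumption


def point_along(route: Sequence[Point], s: float) -> Point:
    """Point at driving distance s along a polyline route (clamped to its end)."""
    for (x0, y0), (x1, y1) in zip(route, route[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if s <= seg:
            f = s / seg if seg > 0 else 0.0
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        s -= seg
    return route[-1]  # beyond the route: stay at L_conv


def false_locations(route_b: Sequence[Point], a_steps: Sequence[float],
                    speed_a_dflt: float, speed_b_dflt: float) -> List[Point]:
    """B's false location at each reporting time i.

    a_steps[i] is A's actual moving distance since the previous report;
    B's false step is that distance scaled by speed_b_dflt / speed_a_dflt.
    """
    ratio = speed_b_dflt / speed_a_dflt
    travelled_b = 0.0
    out: List[Point] = []
    for d_a in a_steps:
        travelled_b += ratio * d_a
        out.append(point_along(route_b, travelled_b))
    return out


# Illustrative numbers: A's route to L_conv is 300 m (10 m/s for T_conv = 30 s)
# and B's false route is 240 m (8 m/s). A changes speed, yet both arrive together.
route_b = [(0.0, 0.0), (0.0, 120.0), (120.0, 120.0)]
a_steps = [10.0] * 10 + [5.0] * 10 + [15.0] * 10  # sums to 300 m
fake_b = false_locations(route_b, a_steps, speed_a_dflt=10.0, speed_b_dflt=8.0)
print(fake_b[9], fake_b[-1])
```

When A slows down (the 5 m steps), the fake B slows down proportionally. The false pathlet therefore inherits A's realistic speed fluctuations instead of moving at a suspiciously constant speed.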

IV. THE COST OF PATHCLOAK
Communication. The format of a PathCloak message is as follows: message type (2 bits: propose, accept, pairing-on, or pairing-off), counterpart vehicle ID (6 bytes), and an LBS identifier (4 bytes).³ The length of our PathCloak message is thus only 11 bytes. It can be piggybacked on a DSRC beacon, the Basic Safety Message (BSM), whose size can reach nearly 300 bytes [37], as shown in Fig. 5.
False data. A PathCloak-enabled vehicle generates false data only while paired for the operation. Thus, its volume never exceeds that of the real data. Our result from the PathCloak experiment (in Section VI) shows that the total volume of false updates is typically 15-20%. Note that, in the same settings, independent false trips [30]-[33] would incur at least 100 times as many false updates as PathCloak to achieve the same degree of privacy.
Computation. Three key procedures require computation: (1) the pathlet joinability check (when pairing); (2) convergence point selection (upon pairing); and (3) false location creation (until pairing-off). Table 1 shows the time taken by each procedure when running our PathCloak implementation on a Raspberry Pi as well as on other platforms. Proc (1) takes more time than Procs (2) and (3) because it calls the in-car navigation API (we use the GraphHopper directions API [38]) to compute the driving distances of false pathlets. Still, the processing time on a Raspberry Pi is small enough to fit within the DSRC beacon interval of 100 ms. Note, in-vehicle computers
Prototype implementation. Our testbed consists of onboard DSRC units (deployed in a rear window), Raspberry Pis with an LTE module (for in-car Internet access), and a 7" touchscreen (for display), as depicted in Fig. 6. We implement PathCloak on the Raspberry Pis, each of which connects (via an Ethernet cable) to the DSRC OBU for inter-vehicle PathCloak message exchange.
For in-car navigation, we use GraphHopper [38], an open-source, fast directions API to set actual routes ahead and to compute false pathlets. All our source code is available at https://github.com/inclincs/pathcloak. Experiment settings. We conduct our experiments for six weeks in downtown/residential areas of Seoul, Korea.
We run experiments in various situations using two vehicles equipped with our PathCloak-enabled DSRC testbeds. The vehicles, whose locations, moving directions, and speeds we vary, send out PathCloak messages via 802.11p-based DSRC broadcast. These messages are contained in regular DSRC beacons and are thus broadcast 10 times per second. The transmission power is set to 14 dBm as recommended in [37]. In our field experiments, we use λ = 0.3 and $T_f$ = 30 s, the recommended values discussed in Section VI. The vehicles (acting as users) send their location reports every second via the LTE interface to our collection server.
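To make the 11-byte PathCloak message of Section IV concrete, the sketch below packs and unpacks one. The bit layout is an assumption (the text gives only the field sizes): the 2-bit type sits in the top bits of the first byte followed by 6 padding bits, then the 6-byte counterpart vehicle ID and the 4-byte LBS identifier in big-endian order. The numeric type codes are likewise hypothetical.

```python
from enum import IntEnum
from typing import Tuple

MSG_LEN = 11  # ceil((2 + 48 + 32) / 8) = ceil(82 / 8) bytes


class MsgType(IntEnum):
    # 2-bit codes; this numeric assignment is a hypothetical choice
    PROPOSE = 0
    ACCEPT = 1
    PAIRING_ON = 2
    PAIRING_OFF = 3


def pack(msg_type: MsgType, peer_id: bytes, lbs_id: int) -> bytes:
    """Type in the top 2 bits of byte 0 (6 padding bits), then the
    6-byte counterpart vehicle ID, then the 4-byte LBS identifier."""
    if len(peer_id) != 6:
        raise ValueError("counterpart vehicle ID must be 6 bytes")
    header = (int(msg_type) & 0b11) << 6
    return bytes([header]) + peer_id + lbs_id.to_bytes(4, "big")


def unpack(buf: bytes) -> Tuple[MsgType, bytes, int]:
    """Inverse of pack()."""
    if len(buf) != MSG_LEN:
        raise ValueError(f"expected {MSG_LEN} bytes, got {len(buf)}")
    return MsgType(buf[0] >> 6), bytes(buf[1:7]), int.from_bytes(buf[7:11], "big")


msg = pack(MsgType.ACCEPT, bytes.fromhex("0a1b2c3d4e5f"), lbs_id=42)
print(len(msg), unpack(msg))
```

The 6 spare bits in the first byte are what round 82 bits up to 11 bytes. At that size the message is a small fraction of a BSM that can approach 300 bytes.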

B. MEASUREMENT RESULTS
In the field tests, we use the following two metrics:
• Pairing success rate (P-rate): the ratio of the number of runs in which the two vehicles get paired and perform PathCloak operations to the total number of runs.
• Tracker's uncertainty: the normalized entropy of the probability that the attacker correctly picks an actual pathlet upon each individual PathCloak operation.

Experiment-Set 1: Vehicles Are Making Right Turns
The first experiment-set reflects a situation where vehicles make right turns in opposite directions (Fig. 9) [40], and our experiments have obeyed this speed limit. We start our vehicles at least 800m apart on opposite sides of the intersection. When making a turn, our vehicles temporarily reduced their speeds accordingly.

Figure legend: actual pathlet; false pathlet; pairing point; merge point. The same-colored pathlets are generated by the same vehicle.

Fig. 7 shows the pairing success rate (P-rate) from these experiments. We get a high P-rate (> 83%) in the first three cases and a reduced P-rate (47%) in the last case (30:50km/h). This suggests that the speed difference between vehicles affects the P-rate. The larger the speed difference, the greater the disparity in the lengths of the corresponding actual and false pathlets (due to the greater difference in their estimated moving distances in $T_f$; see Eq. (1)), making it harder to satisfy the PathCloak triggering condition. Table 2 shows the average vehicle distance at the time PathCloak is triggered; the pairing range reaches 430m. We also observe that, throughout the experiments, PathCloak operations do not require the paired vehicles to reach the intersection at the same time. This shows that PathCloak indeed works without a space-time intersection of vehicles. Fig. 9 depicts sample pathlets from the experiment.
⁴ We use the notation X:Y km/h to denote that vehicles A and B move at speeds of X and Y km/h, respectively.

Experiment-Set 2: One Moving Straight, The Other Turning Right
The second experiment-set reflects a situation where one vehicle moves straight while the other makes a right turn (both initially heading in the same direction), as depicted in Fig. 10. We again conduct a set of four experiments, but with different vehicle speeds: (i) 40:40km/h; (ii) 40:50km/h; (iii) 30:50km/h; and (iv) 20:40km/h. In each experiment, we make the slower vehicle take a right turn at the intersection. Fig. 8 shows the results. We get a high P-rate (> 75%) in the first three cases, even in the third one (30:50km/h), whereas only a 17% P-rate is attained in the last case (20:40km/h). This result indicates that, besides the speed difference between vehicles, their actual speeds also affect the P-rate. When a vehicle moves at a slower speed, the corresponding PathCloak triggering condition becomes tighter (Eq. (1)), making it harder to pair with others.
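One way to see why slow vehicles pair less often: if the triggering condition is a relative margin λ around the distance a vehicle covers in $T_f$ (an assumed form, since Eq. (1) is not reproduced here), then the absolute window of acceptable false-pathlet lengths shrinks in proportion to speed. The hypothetical helper below computes that window for the field-test settings ($T_f$ = 30 s, λ = 0.3).

```python
from typing import Tuple


def admissible_range_m(speed_kmh: float, t_f_s: float = 30.0,
                       lam: float = 0.3) -> Tuple[float, float]:
    """Pathlet lengths a vehicle could cover in T_f with at most a lam
    relative speed change (ASSUMED form of the triggering condition)."""
    nominal = speed_kmh / 3.6 * t_f_s  # metres covered at the current speed
    return (1 - lam) * nominal, (1 + lam) * nominal


for kmh in (20, 30, 40, 50):
    lo, hi = admissible_range_m(kmh)
    print(f"{kmh} km/h: {lo:.0f}-{hi:.0f} m (window {hi - lo:.0f} m)")
```

Under these assumptions the window is about 100 m wide at 20 km/h but about 200 m at 40 km/h, so a 20 km/h vehicle has roughly half the slack for a road-network detour to fit. This is consistent with the low P-rate observed in the 20:40km/h case.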

Experiment-Set 3: Vehicles Are Moving Straight in the Same Direction
The third experiment-set represents a situation where both vehicles move straight in the same direction. This time, we make both vehicles move at the same speed, which is typical for vehicles on the same road. Instead, we vary the vehicle distance: 50m, 100m, 150m, and 200m.  longer-distance cases (except 150m at 45km/h). The reason for the short pairing distance (cf. 300-400m in the other scenarios) is that, in this setting, the vehicle distance is directly added to (or subtracted from) the lengths of false pathlets, making it a limiting factor for the triggering condition. Fig. 11 depicts sample pathlets from the experiment. At first glance, this seems to offer little privacy, as both vehicles still move in the same direction. We point out, however, that every single PathCloak operation (including such lane-level confusion) contributes to full-trajectory tracking protection in that: (i) after the confusion, the tracker of a target vehicle needs to follow both vehicles; (ii) their eventual destinations are unlikely to be the same, so their paths will diverge afterwards; and (iii) each of them continues to perform PathCloak with others, doubling the degree of confusion. In this sense, each PathCloak operation serves as a building block that collectively realizes the full-trajectory protection.
Fig. 14 shows a high P-rate (> 78%) in the first three cases and a low P-rate (26%) in the last case, which signifies the impact of relative vehicle speed. Note that this result is only valid for intersections that allow U-turns. We have also conducted the same set of experiments on roads where U-turns are not permitted, and found that pairing is not possible: the navigation system does not provide false pathlets with U-turns on such roads. This indicates that traffic rules also affect the triggering condition in this situation. Fig. 18 depicts sample pathlets from the experiment.
Fig. 15 shows a high P-rate (> 73%) in the first three cases and a low P-rate (23%) in the last case, a trend similar to Ex-Set 4.
Note, we were able to obtain this result only in the intersections with U-turns. This signifies that, if either one of the vehicles moves straight in the opposite direction, the key requirement for pairing them is the allowance of U-turns.

Experiment-Set 6: One Moving Straight, The Other Turning Left
This reflects a situation where one vehicle moves straight while the other makes a left turn, as depicted in Fig. 20. Fig. 16 shows a high P-rate in the first three cases and a reduced P-rate (58%) in the last case, again displaying the effect of relative vehicle speed. Note that many intersections in Korea allow going straight and (protected) left turns at the same time. This set of experiments was performed under such traffic light control.
VOLUME 4, 2016

Nevertheless, the above results are still valid for other types of intersections. Unlike Ex-Set 5, this situation does not require specific traffic rules (e.g., U-turns) for the creation of false pathlets, so the speed difference is the major factor in this scenario. For example, with unprotected left turns, turning vehicles are likely to move much more slowly (or even stop) as they reach the intersection, while straight-moving vehicles pass through without slowing down. Such a speed difference would correspond to our last case (30:50km/h) or worse, hence the lower P-rate.

Experiment-Set 7: One Turning Right, The Other Turning Left
This represents a situation where one vehicle makes a right turn while the other takes a left turn, as depicted in Fig. 21. Fig. 17 shows a high P-rate in the first two cases, whereas the P-rate is 43% at 30:50km/h and 12% at 20:40km/h. This reconfirms that both the relative and the actual vehicle speeds are key factors for the triggering condition. Note that the above results are also applicable to any type of intersection, for the same reason as in Ex-Set 6.

Experiment-Set 8: Vehicles Are Approaching from Perpendicular Directions
This experiment-set reflects a situation where vehicles approach an intersection from perpendicular directions. In this situation, large buildings at the intersection corners, if any, may restrict DSRC communication between the vehicles, as the wireless signal could be blocked by the buildings. To assess the impact on the P-rate, we conduct the experiments in two different environments: line-of-sight (LOS) and non-line-of-sight (NLOS). Fig. 22a and 22b show the P-rate results at the LOS (Fig. 23a) and NLOS (Fig. 23b) intersections, respectively. In the LOS case, we get a high P-rate (> 75%) when the vehicles move at 40:40km/h and 40:50km/h, a trend similar to Ex-Set 7. In the NLOS case, by contrast, the P-rate decreases considerably: only 41% of runs pair successfully, even at 40:40km/h. We observe that, at the NLOS intersection, pairing is possible only when one vehicle passes the corner of the building toward the intersection (e.g., vehicle A in Fig. 24), reaching a LOS position relative to the other. Thus, pairing at an NLOS intersection is restricted to a specific timing, just prior to entering the intersection. This rules out pairing opportunities that would otherwise exist at other times in the absence of DSRC blockage. Our result indicates that, when vehicles approach from perpendicular directions, large obstacles (e.g., buildings) around the intersection have a significant impact on the likelihood of pairing. This observation also conforms to previous empirical and theoretical results reported in [41]-[43]. We thus expect such "perpendicular" pairing to be more likely in residential or suburban areas than in downtown areas with many tall buildings.

Plausibility Experiment: Tracker's Uncertainty
To examine the plausibility of false pathlets created in our experiments, we measure the tracker's uncertainty in actual   Table 3 shows the tracker's uncertainty when applying the prediction to PathCloak traces from Ex-Sets 1-8. We use the normalized entropy as the uncertainty index, so its value lies between 0 and 1. When the tracker is fairly unsure which pathlet is actual, the index is close to 1. Our result shows that the false pathlets from real-world driving are plausible enough to confuse the tracker.
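As a reading aid for the uncertainty index, the sketch below computes the normalized entropy of a tracker's belief over the n candidate pathlets of one PathCloak operation: Shannon entropy divided by log n, so it lies in [0, 1]. The function name is hypothetical, and how the belief itself is obtained (e.g., from a prediction model) is outside this sketch.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Entropy of the tracker's belief over n candidate pathlets,
    divided by log(n) so that the result lies in [0, 1]."""
    n = len(probs)
    if n < 2:
        return 0.0  # a single candidate leaves no uncertainty
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


print(normalized_entropy([0.5, 0.5]))  # tracker cannot tell the pathlets apart
print(normalized_entropy([0.6, 0.4]))  # a mild preference for one pathlet
print(normalized_entropy([1.0, 0.0]))  # tracker is certain
```

For two pathlets, a 50:50 belief gives 1, and a 60:40 belief already drops the index to about 0.971. An index above 0.98 therefore means the tracker's belief is no more lopsided than roughly 58:42.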

Summary of Measurement Results
The results of our field tests show that PathCloak works well (P-rate>80%) in many cases. Our general observation is that P-rate between cars becomes high when: (i) the speed difference is small; and/or (ii) their actual speeds are high. This has practical implications for real-world drivers because: (i) nearby cars usually move at similar speeds; and (ii) people tend to drive near the speed limit (rather than driving slow). The plausibility result is also positive. Given a PathCloak operation, the tracker has high uncertainty (>0.98) about which pathlet is actual. This demonstrates that each PathCloak operation generates highly plausible pathlets, successfully creating en-route confusion on real roads.

VI. THE PRIVACY OF PATHCLOAK
We now evaluate PathCloak in a large-scale environment, focusing on its full trajectory protection. Here we consider the collection servers as potential attackers with tracking capabilities. We experiment with two prominent tracking methods: Deep-learning based and Markov-based approaches.

A. ATTACKER'S TRACKING MODEL
Deep-learning based tracking. We use a spatial-temporal recurrent neural network (ST-RNN) model [44], [45], one of the most accurate location prediction methods today. We build it using Keras [46], a neural-network API running on top of TensorFlow [47]. We use the GeoLife dataset [48]-[50], a real-world location history dataset from Beijing, China, containing five years of driving records of 182 users (17K+ trajectories) with GPS positions logged every 1-5 seconds. We use these records as training input to the ST-RNN model, which internally learns spatio-temporal transition regularities for future location prediction.
Markov-based tracking. We also test a Markov-based model, which has traditionally been widely used for mobility prediction [51]-[53]. Using the GeoLife dataset as historical data, we generate a Markov transition matrix over a grid of blocks in a target area. For vehicle tracking, when a PathCloak operation is triggered, the attacker determines which pathlet belongs to the actual path by predicting the most probable next block using the Markov matrix.
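The Markov-based tracker can be sketched as follows. This is a minimal first-order version under stated assumptions, not the exact model used in the evaluation: positions are already projected to planar metres, the 100 m block size is hypothetical, and consecutive samples within the same block are collapsed so the matrix describes block-to-block moves only. When a PathCloak operation splits the target's path, the tracker bets on the pathlet whose next block is most probable.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, int]
Matrix = Dict[Cell, Dict[Cell, float]]


def to_cell(x_m: float, y_m: float, block_m: float = 100.0) -> Cell:
    """Map planar coordinates (metres) to a grid block; block size is hypothetical."""
    return (int(x_m // block_m), int(y_m // block_m))


def transition_matrix(trajectories: Sequence[Sequence[Cell]]) -> Matrix:
    """First-order estimate of P(next block | current block) from history."""
    counts: Dict[Cell, Counter] = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            if cur != nxt:  # skip samples that stay inside the same block
                counts[cur][nxt] += 1
    matrix: Matrix = {}
    for cur, row in counts.items():
        total = sum(row.values())
        matrix[cur] = {nxt: n / total for nxt, n in row.items()}
    return matrix


def pick_actual(matrix: Matrix, current: Cell, candidates: Sequence[Cell]) -> Cell:
    """Guess the actual pathlet: the one whose next block is most probable."""
    row = matrix.get(current, {})
    return max(candidates, key=lambda c: row.get(c, 0.0))


# Toy history: from block (0, 1), two of three past trips went to (0, 2).
history = [[(0, 0), (0, 1), (0, 2)],
           [(0, 0), (0, 1), (1, 1)],
           [(0, 0), (0, 1), (0, 2)]]
m = transition_matrix(history)
print(pick_actual(m, (0, 1), [(1, 1), (0, 2)]))
```

Ties (including blocks never seen in the history) resolve to the first candidate, so against a plausible false pathlet this tracker's success rate cannot rise much above a coin flip.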

B. PRIVACY METRICS
Location entropy. We measure the degree of privacy using location entropy, $H_t = -\sum_i p(i,t) \log_2 p(i,t)$, where $p(i,t)$ is the attacker's belief (probability) that location sample $l(i,t)$ from anonymous user $i$ at time $t$ belongs to the tracked vehicle. Lower values indicate more certainty, i.e., lower privacy.

Tracking success ratio. To give more intuitive privacy results, we also assess a tracking success ratio $S_t$, which measures the chance that the tracker's belief, when tracking a target over time $t$, is indeed true. Thus, $S_t$ is equivalent to $p(u,t)$ of the actual target $u$, since $\sum_i p(i,t) = 1$ at any time $t$. Note that $S_t$ is unknown to the tracker, who becomes unsure over time which $l(i,t)$ belongs to target $u$.
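Both metrics can be computed directly from the tracker's belief vector. The minimal sketch below assumes the beliefs $p(i,t)$ at one time step are given as a list of probabilities summing to 1; the function names are hypothetical. It uses base-2 logarithms so that X bits of entropy correspond to roughly 2^X equally likely locations.

```python
import math


def location_entropy(beliefs: list[float]) -> float:
    """H_t = -sum_i p(i,t) * log2 p(i,t); zero-probability samples contribute nothing."""
    if any(p < 0 for p in beliefs) or not math.isclose(sum(beliefs), 1.0, abs_tol=1e-9):
        raise ValueError("beliefs must be non-negative and sum to 1")
    return -sum(p * math.log2(p) for p in beliefs if p > 0)


def tracking_success_ratio(beliefs: list[float], target_index: int) -> float:
    """S_t = p(u,t): the belief mass the tracker assigns to the actual target's sample."""
    return beliefs[target_index]


def equivalent_locations(entropy_bits: float) -> float:
    """X bits of entropy correspond roughly to 2**X equally likely locations."""
    return 2.0 ** entropy_bits
```

For instance, a tracker spreading its belief uniformly over 16 samples has exactly 4 bits of entropy, and its success ratio for the true target is 1/16.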

C. PRIVACY EXPERIMENTS
We run PathCloak over the GeoLife dataset. To test with fewer or more participants, we experiment with 50-1000 vehicle trajectories in a 6×6 km² area of Beijing. Using them as baseline mobility patterns over the road network (extracted via OpenStreetMap [54]), we obtain PathCloak-enabled location traces via the ns-3 simulator.

Choice of λ and $T_f$. We first assess the privacy implications of λ and $T_f$, the two key parameters of the PathCloak triggering condition (see Eq. (1)), by running PathCloak while varying their values. Fig. 34 shows the prediction success rate, i.e., the tracker's probability of picking the actual pathlet in a single PathCloak operation, for different vehicle speeds. We see that (i) the larger the speed-change margin λ, the less plausible the false pathlets; and (ii) this trend is more pronounced on higher-speed roads. To make λ as large as possible (so that PathCloak operations are triggered more often) while limiting the tracker's prediction success, we use λ = 0.3 on urban roads and λ = 0.2 on highways. Figs. 25, 28, and 31 show the location entropy after 20 minutes of driving for different values of $T_f$. We observe diminishing returns in all scenarios. This is intuitive: a small $T_f$ restricts the triggering condition, while a large $T_f$ yields fewer PathCloak operations for a given trip time. To balance this trade-off, we set $T_f$ = 30 seconds, which yields the highest location entropy across all scenarios.
VOLUME 4, 2016
Low-density result. Fig. 26 shows the average entropy with 50-150 users (1.3-4.2 users/km²). After 20 minutes of driving, vehicles reach four bits of location entropy even in the sparse case of n = 50. In other words, the tracker of a given vehicle may suspect 16 different locations (X bits of entropy corresponds roughly to 2^X equally likely locations) without knowing the exact one. Fig. 27 shows the tracking success ratio in the same setting. It decreases to 0.2 within seven minutes of driving and drops below 0.1 within ten minutes.

Medium-density result. Fig. 29 shows the location entropy over 20 minutes of driving with 250-450 users (7.0-12.5 users/km²), a more participatory setting. Fig. 30 shows the tracking success ratio in the same setting. It drops below 0.1 within seven minutes of driving.
High-density result. Fig. 32 shows the entropy with 600-1000 users (16.6-27.7 users/km²). After 20 minutes of driving, vehicles reach more than seven bits of entropy, implying that the tracker of a given vehicle may suspect at least 128 different locations without knowing exactly where the vehicle is. In this setting, the tracking success ratio drops below 0.1 within five minutes of driving (Fig. 33). Without PathCloak, by contrast, it remains above 0.8 even after 20 minutes. This result shows (i) the privacy risk of anonymous location data in raw form and (ii) the protection against tracking provided by PathCloak-enabled traces.

VII. THE UTILITY OF PATHCLOAK
We now assess PathCloak from a utility perspective. We measure how closely PathCloak-enabled traces reflect the real (raw) data for various traffic statistics.

A. UTILITY REFERENCE MODEL
We use the following prominent traffic-data analysis tasks to evaluate utility of PathCloak-enabled traces.
Traffic congestion estimation. Location updates from vehicles provide real-time traffic information for predicting traffic volume on road segments. We use the speed performance index [14] to estimate the road traffic state. The index, a percentage in [0, 100], is computed as the ratio between the average vehicle speed and the maximum permissible road speed. Three thresholds (25, 50, 75) classify the road traffic level: index values in [0, 25], (25, 50], (50, 75], and (75, 100] indicate heavy congestion, congestion, smooth, and very smooth traffic, respectively.

Traffic mobility modeling. The collected vehicle trajectories also provide information about traffic dynamics in road networks. We use the Markov mobility model to capture vehicles' intra-trajectory movement patterns. In doing so, we compute the Markov transition matrix $P_t$ and the probability vector $\pi_t$. Mobility modeling at such an aggregate level helps discover traffic flow patterns over road segments, benefiting road network planning, optimal allocation of roadside units, and provision of public transportation services.

Points of interest (PoI) extraction. The goal is to discover locations that are frequently visited and thus of broad public interest. PoI extraction outputs the distribution of visits among locations, specifically the most visited locations (the top n popular places). This has various commercial benefits, including the geographic placement of advertisements or retail stores, spatial hotspot discovery, and travel recommendations.
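The speed performance index and its four-level classification can be sketched as below. This is an illustrative sketch, not the evaluation code: function names are hypothetical, speeds are assumed to be in km/h, and clamping the index to 100 when the average speed exceeds the limit is our own assumption. The `fraction_within` helper shows one way to compute the share of road segments whose index from LPPM-processed data lies within a tolerance of the index from real data.

```python
def speed_performance_index(avg_speed_kmh: float, max_speed_kmh: float) -> float:
    """Average speed as a percentage of the maximum permissible road speed, in [0, 100]."""
    if max_speed_kmh <= 0:
        raise ValueError("max_speed_kmh must be positive")
    # Assumption: clamp so that speeding vehicles do not push the index above 100.
    return min(100.0, max(0.0, 100.0 * avg_speed_kmh / max_speed_kmh))


def traffic_level(spi: float) -> str:
    """Classify with thresholds 25/50/75: [0,25], (25,50], (50,75], (75,100]."""
    if spi <= 25:
        return "heavy congestion"
    if spi <= 50:
        return "congestion"
    if spi <= 75:
        return "smooth"
    return "very smooth"


def fraction_within(real: dict[str, float], estimated: dict[str, float],
                    tol: float = 5.0) -> float:
    """Share of common road segments whose estimated SPI is within `tol` points of the real SPI."""
    roads = real.keys() & estimated.keys()
    if not roads:
        return 0.0
    return sum(abs(real[r] - estimated[r]) <= tol for r in roads) / len(roads)
```

For example, a segment with a 60 km/h limit and an average speed of 30 km/h has an index of 50, which falls in the "congestion" level because the (25, 50] interval is closed on the right.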

B. UTILITY MEASUREMENT
Analysis results from PathCloak-enabled traces may differ from those from the raw traces. We measure the difference between the two on the above analysis tasks.

Measurement settings. We extract 1000 vehicle trajectories from the GeoLife dataset and use them as our real traces. We obtain PathCloak-enabled traces over those raw trajectories in the road network. For comparison, we also generate traces using two other location-privacy preserving mechanisms (LPPMs): geo-indistinguishability (geo-indi) and false trips. Geo-indi [55] is a generalization of differential privacy. It adds planar Laplace noise to an actual location with parameter $\epsilon = l/r$, so that the user enjoys privacy level l within radius r, and the privacy level grows (i.e., protection weakens) with distance. We set l = ln(1.6) and r = 75 m. False trips [56] use many dummy routes: each vehicle sends k false trips along with the actual one. The parameter values of each LPPM are chosen to offer a level of privacy similar to that of PathCloak. (For false trips, we use k = 15 due to evaluation costs, which gives much lower privacy than PathCloak; even so, false trips still show lower utility than PathCloak.)

Traffic estimation result. We first compute the speed performance index of each road segment from the real data as well as from the LPPM-processed data, and compare them. Fig. 35 shows the cumulative distribution of the absolute difference in speed performance index between the real and the LPPM data. The PathCloak distribution leans toward zero, indicating few "errors" in traffic estimation. More specifically, PathCloak's results are identical to (or within 5% of) those from the real data on more than 90% of the roads. For geo-indi and false trips, such accurate estimation occurs on only 24% and 21% of the roads, respectively.
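For reference, planar Laplace noise with $\epsilon = l/r$ can be sampled as sketched below. This is a minimal sketch under stated assumptions, not the implementation used in our comparison: positions are assumed to be already projected to local planar coordinates in meters (GeoLife stores latitude/longitude), and the function name is hypothetical. The direction is uniform, and the distance follows a Gamma distribution with shape 2 and scale 1/ε, which is the radial marginal of the planar Laplace density $\frac{\epsilon^2}{2\pi} e^{-\epsilon d}$. The original geo-indistinguishability proposal draws the same radius by inverting its CDF with the Lambert W function; the Gamma form avoids that dependency.

```python
import math
import random


def planar_laplace_noise(x_m: float, y_m: float, l: float, r: float,
                         rng: random.Random) -> tuple[float, float]:
    """Perturb a planar position (meters) with geo-indistinguishable noise, eps = l / r."""
    eps = l / r
    theta = rng.uniform(0.0, 2.0 * math.pi)   # direction: uniform angle
    radius = rng.gammavariate(2.0, 1.0 / eps)  # distance: Gamma(shape=2, scale=1/eps)
    return x_m + radius * math.cos(theta), y_m + radius * math.sin(theta)
```

With l = ln(1.6) and r = 75 m, ε is about 0.0063 per meter, so the expected displacement 2/ε is roughly 319 m. This helps explain why geo-indi distorts per-segment speed statistics far more than PathCloak, which never moves reported samples off the vehicles' actual paths.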
As mentioned earlier, the traffic condition of each road segment can be classified into four congestion levels based on its speed performance index. Table 5 shows the resulting traffic levels. Across all cases, the traffic levels from PathCloak data match those from the real data on more than 91% of the roads, indicating a high level of preservation. By contrast, we observe significant discrepancies between the real and the estimated levels for the other two LPPMs. For example, 92.4% of the "Smooth" roads retain their actual level when estimated from PathCloak data, whereas only 34.8% and 30.8% do with geo-indi and false trips, respectively.

Mobility modeling result. We compare the traffic mobility model ($P_t$, $\pi_t$) from the real traces with the model ($P'_t$, $\pi'_t$) from the LPPM-applied traces. To measure the difference between two mobility models, we use the earth mover's distance (EMD) [57], which is widely used to evaluate the dissimilarity between two probability distributions. We compute the expected EMD between the two models, $\mathbb{E}[d_{EMD}(P_t(i,\cdot), P'_t(i,\cdot))]$ and $\mathbb{E}[d_{EMD}(\pi_t, \pi'_t)]$, where i ranges over regions and t over time. Table 4 presents the EMD results, with the uniform distribution as a baseline. For the transition matrix $P_t$, the mobility model from the PathCloak traces most closely resembles the actual model from the real traces. For $\pi_t$, the model from the geo-indi traces has a slightly smaller EMD than that from the PathCloak traces. This is because geo-indi, unlike PathCloak and false trips, produces no dummy location data. Still, because PathCloak minimizes the use of dummies, its mobility model closely reflects the actual model for $\pi_t$ as well.

Table 6 shows the coverage results on the top n most frequently visited locations. The coverage of the PathCloak traces is 100%, which is an intuitive result.
Such PoI-related information (e.g., the n most popular locations) is retained entirely intact in the PathCloak traces because PathCloak never creates fake destinations; instead, it creates en-route confusion using vehicles' actual paths.
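The two comparison metrics used for mobility modeling and PoI extraction can be sketched as follows. This is a simplified sketch, not the evaluation code, and the function names are hypothetical. Two assumptions matter. First, `emd_1d` treats regions as ordered along a single axis with unit ground distance, where EMD reduces to the summed absolute gap between the two cumulative distributions; our 2-D grid of regions would instead require a general transportation solver. Second, we take "coverage" to mean the fraction of the real top-n locations that also appear in the LPPM top-n, with visits counted per trip destination.

```python
from collections import Counter
from itertools import accumulate


def emd_1d(p: list[float], q: list[float]) -> float:
    """EMD between two distributions over the same ordered bins spaced one unit apart.

    In 1-D, EMD equals the summed absolute gap between the two CDFs.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same bins")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def expected_row_emd(P: list[list[float]], Q: list[list[float]]) -> float:
    """Mean EMD between matching rows P(i,*) and Q(i,*), weighting regions i equally."""
    return sum(emd_1d(pr, qr) for pr, qr in zip(P, Q)) / len(P)


def top_n_locations(visits: list[str], n: int) -> set[str]:
    """The n most visited locations (Counter breaks ties by first occurrence)."""
    return {loc for loc, _ in Counter(visits).most_common(n)}


def top_n_coverage(real_visits: list[str], lppm_visits: list[str], n: int) -> float:
    """Fraction of the real top-n locations that also rank in the LPPM top-n."""
    real_top = top_n_locations(real_visits, n)
    if not real_top:
        return 0.0
    return len(real_top & top_n_locations(lppm_visits, n)) / len(real_top)
```

Under this definition, a mechanism that leaves every trip's actual destination unchanged yields the same visit counts as the real data and hence a coverage of 1.0, consistent with the PathCloak result above. Dummy destinations, as in false trips, can displace real places from the top n.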
Overall, the statistics we have measured from the real traffic traces are well preserved in the PathCloak-enabled traces. These results indicate that PathCloak is suitable for traffic data collection and analysis, as it yields trajectory data that are both privacy-preserving and highly useful.

VIII. DISCUSSION
We discuss some remaining issues relating to PathCloak.

Use of multiple applications. Our description of PathCloak so far focuses on cases where vehicles use the same location reporting application. In reality, it is not uncommon to use multiple such applications at a time, not necessarily the same ones across users. This requires additional consideration because a PathCloak operation should be performed by vehicles reporting to the same server(s). In fact, this also applies to any existing method that creates obfuscation using other vehicles' paths, such as mix-zones [25]-[29] and path confusion [58]; for this reason, such methods limit their description to the single-application context. Still, they can work with multiple applications via proper user matching. In our case, this requires a slight modification to the PathCloak message: by specifying a list of the currently used location reporting applications (instead of a single app-id) in PathCloak messages, vehicles can be paired whenever their lists overlap. In this way, false pathlets from a single PathCloak operation can be used simultaneously for all the overlapping applications. We believe this modification does not significantly increase the size of PathCloak messages, as individual users tend to use only a limited number of reporting applications at a time.
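The pairing rule for multiple applications can be sketched as below. This is a hypothetical illustration of the proposed modification only: the message class and its `vehicle_id` field are our own placeholders, the kinematic fields of a real PathCloak message are omitted, and application identifiers are assumed to be plain strings. Two vehicles are paired when their application sets intersect, and the intersection lists the servers to which the resulting false pathlets can be reported.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PathCloakMessage:
    """Hypothetical subset of a PathCloak message (kinematic fields omitted)."""
    vehicle_id: str  # hypothetical pseudonymous identifier
    app_ids: frozenset[str] = field(default_factory=frozenset)  # replaces a single app-id


def shared_apps(a: PathCloakMessage, b: PathCloakMessage) -> frozenset[str]:
    """Applications (servers) for which pathlets from one joint operation can be reported."""
    return a.app_ids & b.app_ids


def can_pair(a: PathCloakMessage, b: PathCloakMessage) -> bool:
    """Two vehicles are paired whenever their application lists overlap."""
    return bool(shared_apps(a, b))
```

Encoding the list as a set keeps the overlap test a single intersection; the message grows only by one short identifier per additional application in use.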
Realism of false pathlets. While our false pathlets reflect realistic routes, traffic-signal awareness remains unaddressed: at present, PathCloak does not take traffic signals into account when creating false pathlets. Reflecting real-time traffic signals is one of the most challenging problems in false trip generation and is still largely unaddressed in the literature [30]-[33], [59]. One potential direction for PathCloak would be to recognize the state of nearby traffic lights from in-vehicle cameras using computer vision techniques [60], [61] and to use that state when creating false pathlets. Although not explored in this paper, we believe such traffic-signal awareness is necessary for increasing realism and merits separate research.

Navigation-assisted approach. The PathCloak operation requires the use of navigation systems while driving. In other words, we do not provide tracking protection for unequipped vehicles. We expect, however, that such vehicles will be uncommon in the near future. Indeed, in-car GPS navigation is not only popular today but also an essential feature of upcoming autonomous (self-driving) cars with auto-pilot features, which are expected to enter mass production by 2022 [62].

IX. CONCLUSION
In this work, we report the new potential of DSRC via PathCloak, which protects vehicles from location tracking by traffic-data collection servers. The key insight is to leverage vehicles' two network interfaces: in-car Internet (for accessing servers) and car-to-car DSRC (for obfuscating their paths). Our field experiments demonstrate that PathCloak-enabled vehicles (at ranges of up to 400 m) successfully create path confusion for each other under various real road situations by exchanging their kinematic information via existing DSRC beacons. The evaluation using real traffic traces shows that PathCloak provides strong privacy protection (tracking success ratio < 1%) while preserving high utility for prominent traffic-data analysis tasks. More broadly, our solution explores how to use untrusted location-collecting services in a privacy-preserving way without giving up the utility of the collected data statistics.