Introduction
Websites track their users for a variety of reasons, including targeted advertising, content personalization, and security [2]. Traditionally, tracking consists of assigning a unique identifier stored in a cookie. However, recent discussions and legislation have brought to light the privacy concerns raised by cookies, and more users have become sensitive to these issues. A study conducted by Microsoft in 2012 observed that they were unable to keep track of 32 % of their users using only cookies, as these were regularly deleted [26]. Cookie erasure is now common, as many browser extensions and private browsing modes automatically delete cookies at the end of a browsing session.
In 2010, Eckersley introduced a tracking technique called browser fingerprinting that leverages the user's browser and system characteristics to generate a fingerprint associated with the browser [8]. He showed that 83.6 % of visitors to the Panopticlick website could be uniquely identified from a fingerprint composed of only 8 attributes. Further studies have identified new attributes that increase browser fingerprint uniqueness [7], [10], [14], [18]–[20], while others have shown that websites use browser fingerprinting to regenerate deleted cookies [1].
However, fingerprint uniqueness, by itself, is insufficient for tracking because fingerprints change. One also needs to keep track of these evolutions in order to link new fingerprints to previous ones. Recent approaches exploit fingerprint uniqueness as a defense mechanism by adding randomness to break it [12], [13], [21], but they do not address linkability.
The goal of this paper is to link browser fingerprint evolutions and discover how long browsers can be tracked. More precisely, Fp-STALKER detects if two fingerprints originate from the same browser instance, which refers to an installation of a browser on a device. Browser instances change over time, e.g., they are updated or configured differently, causing their fingerprints to evolve. We introduce two variants of Fp-STALKER: a rule-based variant and a hybrid variant that combines rules with a random forest.
We evaluate our approach using 98,598 browser fingerprints originating from 1,905 browser instances, which we collected over two years. The fingerprints were collected using two browser extensions advertised on the AmIUnique website, one for Firefox and the other for Chrome. We compare both variants of Fp-STALKER with an implementation of the algorithm proposed by Eckersley [8]. In our experiments, we evaluate Fp-STALKER's ability to correctly link browser fingerprints originating from the same browser instance, as well as its ability to detect fingerprints that originate from unknown browser instances. Finally, we show that Fp-STALKER can link, on average, fingerprints from a given browser instance for more than 51 days, an improvement of 36 days over the closest algorithm from the literature.
In summary, this paper reports on four contributions:
We highlight the limits of browser fingerprint uniqueness for tracking purposes by showing that fingerprints change frequently (50 % of browser instances changed their fingerprints in less than 5 days, 80 % in less than 10 days);
We propose two variant algorithms to link fingerprints from the same browser instance, and to detect when a fingerprint comes from an unknown browser instance;
We compare the accuracy of our algorithms with the state of the art, and we study how browser fingerprinting frequency impacts tracking duration;
Finally, we evaluate the execution times of our algorithms, and we discuss the impact of our findings.
The remainder of this paper is organized as follows. Section II gives an overview of the state of the art. Section III analyzes how browser fingerprints evolve over time. Section IV introduces Eckersley's algorithm as well as both variants of Fp-STALKER. Section V reports on an empirical evaluation, a comparison to the state of the art, and a benchmark of our approach. Finally, we conclude in Section VI.
Background & Motivations
a) Browser Fingerprinting
Aims to identify web browsers without using stateful identifiers, like cookies [8]. A browser fingerprint is composed of a set of browser and system attributes. A script executed in the browser can reveal sensitive metadata, including the browser's parameters as well as operating system and hardware details. As this technique is completely stateless, it remains hard to detect and block, since no information is stored on the client side. Individually, these attributes may not give away much information but, when combined, they often form a unique signature, hence the analogy with a fingerprint. Most of the attributes in a fingerprint are collected through JavaScript APIs and HTTP headers, but extra information can also be retrieved through plugins like Flash. Table I illustrates a browser fingerprint collected from a Chrome browser running on Windows 10.
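To make the attribute composition concrete, the sketch below models a browser fingerprint as a flat Python mapping. The attribute names and values are illustrative only, not the exact schema of Table I.

```python
# Illustrative sketch of a browser fingerprint as a flat attribute mapping.
# Attribute names and values are hypothetical examples, not the paper's schema.
fingerprint = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "accept": "text/html,application/xhtml+xml",
    "encoding": "gzip, deflate, br",
    "languages": "en-US,en;q=0.9",
    "timezone": -60,                # minutes of offset from UTC
    "resolution": "1920x1080x24",
    "cookies_enabled": True,
    "do_not_track": "NC",           # "not communicated"
    "local_storage": True,
    "canvas": "a3f5e9",             # hash of the rendered canvas image
    "plugins": ["Chrome PDF Viewer"],
}

# Individually weak signals combine into a highly identifying whole:
print(len(fingerprint), "attributes collected")
```

Each attribute taken alone carries little entropy; it is the combination, serialized and compared across visits, that acts as the identifier.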
b) Browser Fingerprinting Studies
Have focused on uniquely identifying browsers. Mayer [17] was the first to point out, in 2009, that a browser's “quirkiness” that stems from its configuration and underlying operating system could be used for “individual identification”. In 2010, the Panopticlick study was the first large-scale demonstration of browser fingerprinting as an identification technique [8]. From about half a million fingerprints, Eckersley succeeded in uniquely identifying 83.6 % of browsers. Since then, numerous studies have examined different aspects of this tracking technique. As new features are included within web browsers to draw images, render 3D scenes or process sounds, new attributes have been discovered to strengthen the fingerprinting process [5], [7], [9], [10], [18]–[20]. Additionally, researchers have performed large crawls of the web that confirm a steady growth of browser fingerprinting [1], [2], [9], [22]. While most of these studies focused on desktops, others demonstrated they could successfully fingerprint mobile device browsers [11], [14]. Finally, a study we conducted in 2016 confirmed Eckersley's findings, but observed a notable shift in some attributes [14]. While the lists of plugins and fonts were the most revealing features in 2010, this has rapidly changed as the Netscape Plugin Application Programming Interface (NPAPI) has been deprecated in Chrome (September 2015) and Firefox (March 2017). Browser fingerprinting continuously adapts to evolutions in browser technologies, since highly discriminating attributes can change quickly.
c) Browser Fingerprinting Defenses
Have been designed to counter fingerprint tracking. The largest part of a browser fingerprint is obtained from the JavaScript engine. However, the values of these attributes can be altered to mislead fingerprinting algorithms. Browser extensions, called spoofers, replace browser-populated values, like the User-agent or the Platform, with pre-defined ones. The goal is to expose values that differ from the real ones. However, Nikiforakis et al. showed that spoofers may be harmful: these extensions “did not account for all possible ways of discovering the true identity of the browsers on which they are installed” and actually make a user “more visible and more distinguishable from the rest of the users, who are using their browsers without modifications” [22]. Torres et al. went a step further by providing the concept of separation of web identities with FP-BLOCK, where a browser fingerprint is generated for each encountered domain [24]. Every time a browser connects to the same domain, it returns the same fingerprint. However, it presents the same limitation as naive spoofers, since the modified values are incomplete and can be incoherent. Laperdrix et al. explored the randomization of media elements used in fingerprinting, such as canvas and audio, to break fingerprint linkability [12]. They add a slight random noise to canvas and audio, imperceptible to users, to defeat fingerprinting algorithms.
Finally, the Tor browser is arguably the best overall defense against fingerprinting. Its strategy is to have all users converge towards a normalized fingerprint. The Tor browser is a modified Firefox that integrates custom defenses [23]. In particular, plugins are removed, canvas image extraction is blocked by default, and well-known attributes have been modified to return the same information on all operating systems. It also defends against JavaScript font enumeration by bundling a set of default fonts with the browser. However, using Tor can degrade the user's experience (e.g., due to latency) and can break some websites (e.g., due to disabled features, or websites that block the Tor network). Furthermore, the normalized browser fingerprint remains fragile, as changes to the browser's configuration, or even resizing the window, can make the browser fingerprint unique again.
d) Browser Fingerprint Linkability
Is only partially addressed by existing studies. Eckersley tried to identify returning users on the Panopticlick website with a very simple heuristic based on string comparisons that made correct guesses 65 % of the time [8]. Although not related to browsers, the overall approach taken by Wu et al. to fingerprint Android smartphones from permissionless applications [25] is similar in nature to our work. They collected a 38-attribute fingerprint, including the list of system packages, the storage capacity of the device and the current ringtone. Using a naive Bayes classifier, they were able to successfully link fingerprints from the same mobile device over time. However, the nature of the data in [25] strongly differs from the focus of this work. In particular, the attributes in a browser fingerprint are not composed of strong identifiers, like the current wallpaper, and the browser does not share personal information from other parts of the system as do applications on Android. For these reasons, the results are not comparable.
To the best of our knowledge, beyond the initial contribution by Eckersley, no other studies have looked into the use of advanced techniques to link browser fingerprints over time.
Browser Fingerprint Evolutions
This paper focuses on the linkability of browser fingerprint evolutions over time. Using fingerprinting as a long-term tracking technique requires not only obtaining unique browser fingerprints, but also linking fingerprints that originate from the same browser instance. Most of the literature has focused on studying or increasing fingerprint uniqueness [7], [8], [14]. While uniqueness is a critical property of fingerprints, it is also critical to understand fingerprint evolution to build an effective tracking technique. Our study provides more insights into browser fingerprint evolution in order to demonstrate the effectiveness of such a tracking technique.
a) Input Dataset
The raw input dataset we collected contains 172,285 fingerprints obtained from 7,965 different browser instances. All browser fingerprints were obtained from AmIUnique extensions for Chrome and Firefox installed from July 2015 to early August 2017 by participants in this study. The extensions load a page in the background that fingerprints the browser. Compared to a fingerprinting website, the only additional information we collect is a unique identifier we generate per browser instance when the extension is installed. This serves to establish the ground truth. Moreover, we preprocess the raw dataset by applying the following rules:
We remove browser instances with fewer than 7 browser fingerprints, because studying the ability to track browsers requires instances that have been fingerprinted multiple times.
We discard browser instances with inconsistent fingerprints due to the use of countermeasures that artificially alter the fingerprints. To determine if a user installed such a countermeasure, we check whether the browser or OS changes and whether the attributes are consistent among themselves. Although countermeasures exist in the wild, they are used by a minority of users and, we argue, should be treated by a separate specialized anti-spoofing algorithm. We leave this task for future work.
After applying these rules, we obtain a final dataset of 98,598 fingerprints from 1,905 browser instances. All following graphs and statistics are based on this final dataset. Figure 1 presents the number of fingerprints and distinct browser instances per month over the two-year period.
Most users heard of our extensions through posts published on popular tech websites, such as Reddit, Hackernews or Slashdot. Users install the extension to visualize the evolution of their browser fingerprints over a long period of time, and also to help researchers understand browser fingerprinting in order to design better countermeasures. We explicitly state the purpose of the extension and the fact that it collects their browser fingerprints. Moreover, we received approval from the Institutional Review Board (IRB) of our research center for the collection and storage of these browser fingerprints. As a ground truth, the extension generates a unique identifier per browser instance. The identifier is attached to all fingerprints, which are automatically sent every 4 hours. In this study, the browser fingerprints we consider are composed of the standard attributes described in Table I.
Figure 2 illustrates the anonymity set sizes against the number of participants involved in this study. The long tail reflects that 99 % of the browser fingerprints are unique among all the participants and belong to a single browser instance, while only 10 browser fingerprints are shared by more than 5 browser instances.
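The anonymity-set computation behind Figure 2 can be reproduced in a few lines of Python. The serialized fingerprint values and instance ids below are invented for illustration.

```python
from collections import Counter

# Hypothetical observations: (serialized fingerprint, browser instance id).
observations = [
    ("fp-alpha", "browser-1"),
    ("fp-beta",  "browser-2"),
    ("fp-beta",  "browser-3"),  # same fingerprint shared by two instances
    ("fp-gamma", "browser-4"),
]

# Anonymity set size of a fingerprint = number of distinct browser
# instances that exhibit it; unique fingerprints have a set size of 1.
instances_per_fp = {}
for fp, instance in observations:
    instances_per_fp.setdefault(fp, set()).add(instance)

set_sizes = Counter(len(v) for v in instances_per_fp.values())
print(set_sizes)  # two fingerprints with set size 1, one with set size 2
```

In the paper's dataset, the long tail of set size 1 corresponds to the 99 % of fingerprints that are unique among participants.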
b) Evolution Triggers
Browser fingerprints naturally evolve for several reasons. We identified the following categories of changes:
Automatic evolutions happen automatically and without direct user intervention. They are mostly due to automatic software upgrades, such as the upgrade of a browser or a plugin, which may impact the user agent or the list of plugins;
Context-dependent evolutions are caused by changes in the user's context. Some attributes, such as resolution or timezone, are indirectly impacted by a contextual change, such as connecting a computer to an external screen or traveling to a different timezone; and
User-triggered evolutions require an action from the user. They concern configuration-specific attributes, such as cookies, do not track or local storage.
To know how long attributes remain constant, and whether their stability depends on the browser instance, we compute the average time, per browser instance, that each attribute does not change. Table II presents the median, 90th and 95th percentiles of the duration each attribute remains constant, on average, in browser instances. In particular, we observe that the User agent is rather unstable in most browser instances, as its value is systematically impacted by software updates. In comparison, attributes such as cookies, local storage and do not track rarely change, if ever. Moreover, we observe that attributes evolve at different rates depending on the browser instance. For example, canvas remains stable for 290 days in 50 % of the browser instances, whereas it changes every 17.2 days for 10 % of them. The same phenomenon can be observed for the screen resolution, where more than 50 % of the browser instances never see a change, while 10 % change every 3.1 days on average. More generally, this points to some browser instances being quite stable, and thus more trackable, while others are not.
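The per-instance stability statistic of Table II can be sketched as follows; the timestamps and attribute values are invented, and the last run of each attribute is simply counted up to the final observation.

```python
from datetime import datetime

# Hypothetical time-stamped fingerprints from one browser instance.
history = [
    (datetime(2017, 1, 1),  {"user_agent": "UA-57", "timezone": -60}),
    (datetime(2017, 1, 11), {"user_agent": "UA-57", "timezone": -60}),
    (datetime(2017, 1, 21), {"user_agent": "UA-58", "timezone": -60}),
    (datetime(2017, 1, 31), {"user_agent": "UA-58", "timezone": -60}),
]

def average_stability_days(history, attribute):
    """Average number of days the attribute keeps its value before changing."""
    durations, start = [], history[0][0]
    for (_, prev_fp), (cur_t, cur_fp) in zip(history, history[1:]):
        if cur_fp[attribute] != prev_fp[attribute]:
            durations.append((cur_t - start).days)
            start = cur_t
    durations.append((history[-1][0] - start).days)  # final (censored) run
    return sum(durations) / len(durations)

print(average_stability_days(history, "user_agent"))  # 15.0: runs of 20 and 10 days
print(average_stability_days(history, "timezone"))    # 30.0: never changed
```

Aggregating this per-instance average over all instances, and taking the median and percentiles, yields statistics of the kind reported in Table II.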
c) Evolution Frequency
Another key indicator to observe is the elapsed time before a fingerprint evolves, i.e., how long a browser instance keeps the exact same fingerprint before at least one of its attributes changes.
d) Evolution Rules
While it is difficult to anticipate browser fingerprint evolutions, we can observe how individual attributes evolve. In particular, evolutions of the User agent attribute are often tied to browser upgrades, while evolutions of the Plugins attribute refer to the addition, deletion or upgrade of a plugin (upgrades change its version). Nevertheless, not all attribute changes can be explained in this manner; some values are difficult to anticipate. For example, the value of the canvas attribute is the result of an image rendered by the browser instance and depends on many different software and hardware layers. The same applies, although to a lesser extent, to screen resolution, which can take unexpected values depending on the connected screen. Based on these observations, the accuracy of linking browser fingerprint evolutions depends on the inference of such evolution rules. The following section introduces the evolution rules we first identified empirically, and then learned automatically, to achieve an efficient algorithm to track browser fingerprints over time.
Fig. 3: CDF of the elapsed time before a fingerprint evolution, for all the fingerprints and averaged per browser instance.
Linking Browser Fingerprints
Fp-STALKER's goal is to determine if a browser fingerprint comes from a known browser instance (i.e., it is an evolution) or if it should be considered as coming from a new browser instance. Because fingerprints change frequently, and for different reasons (see Section III), a simple direct equality comparison is not enough to track browsers over long periods of time.
In Fp-STALKER, we have implemented two variant algorithms with the purpose of linking browser fingerprints, as depicted in Figure 4. The first variant is a rule-based algorithm that uses a static ruleset, and the second variant is a hybrid algorithm that combines rules and machine learning. We explain the details and the tradeoffs of both algorithms in this section. Our results show that the rule-based algorithm is faster, but the hybrid algorithm is more precise while still maintaining acceptable execution times. We also implemented a fully random forest-based algorithm, but the small increase in precision did not outweigh the large execution-time penalty, so we do not present it further in this paper.
A. Browser Fingerprint Linking
When collecting browser fingerprints, it is possible that a fingerprint comes from a previous visitor (i.e., a known browser instance) or from a new visitor (i.e., an unknown browser instance). The objective of fingerprint linking is to match fingerprints to their browser instance and follow the browser instance as long as possible by linking all of its fingerprint evolutions. In the case of a match, linked browser fingerprints are given the same identifier, which means the linking algorithm considers they originate from the same browser instance. If the browser fingerprint cannot be linked, the algorithm assigns a new identifier to the fingerprint.
More formally, given a set of known browser fingerprints and a newly collected fingerprint, the linking algorithm either returns the identifier of the browser instance the fingerprint is believed to originate from, or assigns it a new identifier.
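The linking loop just described can be sketched as follows. Here `matches_same_instance` is a hypothetical placeholder standing in for either Fp-STALKER variant, and its canvas-equality test is purely illustrative.

```python
import itertools

_new_ids = itertools.count(1)

def matches_same_instance(known_fp, unknown_fp):
    """Hypothetical stand-in for a Fp-STALKER variant (rule-based or hybrid)."""
    return known_fp["canvas"] == unknown_fp["canvas"]

def link(known, unknown_fp):
    """Return the id of the matching browser instance, or assign a new id.

    `known` maps an assigned identifier to the latest fingerprint in its chain.
    """
    for assigned_id, fp in known.items():
        if matches_same_instance(fp, unknown_fp):
            known[assigned_id] = unknown_fp  # chain grows with the new evolution
            return assigned_id
    new_id = f"id-{next(_new_ids)}"
    known[new_id] = unknown_fp
    return new_id

known = {}
a = link(known, {"canvas": "hash-A"})
b = link(known, {"canvas": "hash-B"})
print(a, b, link(known, {"canvas": "hash-A"}))  # id-1 id-2 id-1
```

A matched fingerprint extends an existing tracking chain; an unmatched one starts a new chain under a fresh identifier.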
B. Rule-Based Linking Algorithm
The first variant of Fp-STALKER is a rule-based algorithm that uses static rules obtained from the statistical analyses performed in Section III. The algorithm relies on rules designed from the attribute stability presented in Table II to determine if an unknown fingerprint f_u belongs to the same browser instance as a known fingerprint f_k. The rules are the following:
1) The OS, platform and browser family must be identical for any given browser instance. Even if this may not always be true (e.g., when a user upgrades from Windows 8 to 10), we consider it reasonable for our algorithm to lose track of a browser when such a large change occurs, since it is not frequent.
2) The browser version remains constant or increases over time. This would not hold in the case of a downgrade, but that is also not a common event.
3) Based on the results of our statistical analyses, we define a set of attributes that must not differ between two fingerprints from the same browser instance: local storage, Dnt, cookies and canvas. As observed in Table II, these attributes do not change often, if at all, for a given browser instance. In the case of canvas, even if it seldom changes for most users (see Table II), the changes are unpredictable, making them hard to model. Since canvas values are quite unique among browser instances [14], and do not change too frequently, it remains useful to require that canvas be identical between two fingerprints of the same browser instance.
Fig. 4: Fp-STALKER: overview of both algorithm variants. The rule-based algorithm is simpler and faster, but the hybrid algorithm leads to better fingerprint linking.
4) We impose a constraint on fonts: if both fingerprints have Flash activated (i.e., a list of fonts is available), then the fonts of f_k must either be a subset or a superset of the fonts of f_u, but not a disjoint set. That means that, between two fingerprints of a browser instance, we allow either deletions or additions of fonts, but not both.
5) We define a set of attributes that are allowed to change, but only within a certain similarity: their values must have a similarity ratio > 0.75, as computed by the Python function difflib.SequenceMatcher().ratio(). These attributes are the user agent, vendor, renderer, plugins, language and accept headers. We allow at most two changes of this kind.
6) We also define a set of attributes that are allowed to change, no matter their value: resolution, timezone and encoding. However, we only allow one change among these three attributes between two fingerprints.
7) Finally, the total number of changes from rules 5 and 6 must not exceed two.
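The similarity-bounded rules can be sketched directly with the measure the paper names, `difflib.SequenceMatcher`. The 0.75 threshold comes from the rules above; the exact change budget and the fingerprint values are illustrative assumptions.

```python
from difflib import SequenceMatcher

SIMILARITY_ATTRS = ["user_agent", "vendor", "renderer", "plugins",
                    "language", "accept", "headers"]
FREE_ATTRS = ["resolution", "timezone", "encoding"]

def similar(a, b):
    # Similarity ratio as named in rule 5 (threshold 0.75)
    return SequenceMatcher(None, a, b).ratio() > 0.75

def passes_change_rules(fk, fu):
    """Sketch of the change-bounded rules (budget of 2 assumed here)."""
    sim_changes = 0
    for attr in SIMILARITY_ATTRS:
        if fk[attr] != fu[attr]:
            if not similar(fk[attr], fu[attr]):
                return False          # too dissimilar to be an evolution
            sim_changes += 1
    free_changes = sum(fk[a] != fu[a] for a in FREE_ATTRS)
    # at most two similarity-bounded changes, one free change,
    # and a total change budget taken as <= 2 in this sketch
    return sim_changes <= 2 and free_changes <= 1 \
        and (sim_changes + free_changes) <= 2

fk = dict.fromkeys(SIMILARITY_ATTRS + FREE_ATTRS, "same")
fu = dict(fk, user_agent="samme")    # a small drift, highly similar
print(passes_change_rules(fk, fu))   # True
```

A version bump such as `UA-57` to `UA-58` stays above the 0.75 ratio, while an entirely different value fails the rule immediately.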
The order in which the rules are applied matters for performance: we ordered them from most to least discriminating, so that the first rules discard many candidates and reduce the total number of comparisons. In order to link an unknown fingerprint, the algorithm applies these rules against the known fingerprints to find a matching browser instance; if no candidate satisfies all of them, the fingerprint is assigned a new identifier.
On a side note, we established the rules using a simple univariate statistical analysis of attribute stability (see Table II), combined with some objective (e.g., rule 1) and some subjective (e.g., rule 4) decisions. Due to the difficulty of crafting complex yet effective rules by hand, the next subsection presents the use of machine learning to build a more effective algorithm.
C. Hybrid Linking Algorithm
The second variant of Fp-STALKER mixes the rule-based algorithm with machine learning to produce a hybrid algorithm. It reuses the first three rules of the previous algorithm, since we consider them constraints that should not be violated between two fingerprints of the same browser instance. However, for the last four rules, the situation is fuzzier: it is not as clear when to allow attributes to differ, how many of them may differ, and with what dissimilarity. Instead of manually crafting rules for each of these attributes, we propose to use machine learning to discover them. The interest of combining rules and machine learning is that rules are faster, while machine learning tends to be more precise. Thus, applying the rules first keeps only a subset of candidate fingerprints on which to run the machine learning algorithm.
1) Approach Description
The first step of this algorithm is to apply rules 1, 2 and 3 in order to filter out incompatible candidate fingerprints; the machine learning classifier is then applied to the remaining candidates.
2) Machine Learning
Computing the probability that two fingerprints originate from the same browser instance is framed as a binary classification problem, which we address with a random forest classifier.
In summary, given two fingerprints, the classifier outputs the probability that they belong to the same browser instance.
a) Input Feature Vector
To solve the binary classification problem, we provide an input vector of features computed from each pair of fingerprints being compared.
Most of these features are binary values (0 or 1) corresponding to the equality or inequality of an attribute, or similarity ratios between attribute values. We also include a number of changes feature that corresponds to the total number of attributes that differ between the two fingerprints.
In order to choose which attributes constitute the feature vector, we performed feature selection. Indeed, having too many features does not necessarily ensure better results: it may lead to overfitting, i.e., the algorithm correctly fits the training data but does not predict correctly on the test set. Moreover, having too many features also has a negative impact on performance. For the feature selection, we started with a model using all of the attributes in a fingerprint. Then, we looked at feature importance, as defined by [15], to determine the most discriminating features. In our case, feature importance is a combination of uniqueness, stability, and predictability (the possibility to anticipate how an attribute might evolve over time). We removed all components of the feature vector that had a negligible impact (feature importance < 0.002). Finally, we obtained a feature vector composed of the attributes presented in Table III. We see that the most important feature is the number of differences between two fingerprints, and the second most discriminating attribute is the list of languages. Although this may seem surprising, since the list of languages does not have high entropy, it does remain stable over time, as shown in Table II, which means that if two fingerprints have different languages, they often do not belong to the same browser instance. In comparison, screen resolution also has low entropy but changes more often than the list of languages, leading to low feature importance. This is mostly because, since screen resolution changes frequently, having two fingerprints with different resolutions does not add much information about whether they come from the same browser instance. Finally, we see a sharp drop in feature importance after rank 5 (from 0.083 to 0.010), which means that most of the information required for the classification is contained in the first five features.
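Building the pairwise feature vector might look like the sketch below. The attribute lists and the choice of which features are equality tests versus similarity ratios are illustrative assumptions, not the exact vector of Table III.

```python
from difflib import SequenceMatcher

# Illustrative split of attributes into equality and similarity features.
EQUALITY_ATTRS = ["languages", "resolution", "timezone", "canvas"]
SIMILARITY_ATTRS = ["user_agent", "plugins"]

def feature_vector(fp_a, fp_b):
    """Pairwise features for the same-instance classifier (illustrative)."""
    features = [float(fp_a[a] == fp_b[a]) for a in EQUALITY_ATTRS]
    features += [SequenceMatcher(None, fp_a[a], fp_b[a]).ratio()
                 for a in SIMILARITY_ATTRS]
    # "Number of changes": total attributes that differ between the pair
    all_attrs = EQUALITY_ATTRS + SIMILARITY_ATTRS
    features.append(float(sum(fp_a[a] != fp_b[a] for a in all_attrs)))
    return features

fp1 = {"languages": "en-US", "resolution": "1920x1080", "timezone": "-60",
       "canvas": "h1", "user_agent": "UA-57", "plugins": "pdf;flash"}
fp2 = dict(fp1, user_agent="UA-58")
vec = feature_vector(fp1, fp2)
print(vec)  # four equality flags, two ratios, then the change count
```

The resulting vectors, one per fingerprint pair, are what the random forest is trained and evaluated on.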
b) Training Random Forests
This phase trains the random forest classifier to estimate the probability that two fingerprints belong to the same browser instance. To do so, we split the input dataset introduced in Section III chronologically into two sets: a training set composed of the first 40 % of fingerprints in our input dataset, and a test set composed of the last 60 %. The random forest detects fingerprint evolutions from feature vectors computed between pairs of fingerprints; during the training phase, it needs to learn what correct evolutions look like from feature vectors computed on the training set. Algorithm 3 describes this training phase, which is split into two steps.
In Step 1, for every browser instance (id) of the training set, we compare each of its fingerprints with the subsequent fingerprint of the same instance, and label the resulting feature vector as a correct evolution.
While Step 1 teaches the random forest to identify fingerprints that belong to the same browser instance, it is also necessary to identify when they do not. Step 2 compares fingerprints from different browser instances. Since the number of fingerprints from different browser instances is much larger than the number of fingerprints from the same browser instance, we limit the number of comparisons to one for each fingerprint. This technique is called undersampling [16], and it reduces overfitting by adjusting the ratio of input data labeled as true (i.e., two fingerprints belong to the same browser instance) against the number of data labeled as false (i.e., two fingerprints belong to different browser instances).
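The two training steps reduce to generating labeled fingerprint pairs with undersampling, as sketched below on invented instance ids and fingerprints. A classifier (e.g., scikit-learn's RandomForestClassifier) would then be fit on feature vectors computed from these pairs.

```python
import random

random.seed(0)

# Hypothetical training set: browser instance id -> ordered fingerprints.
training = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
}

pairs = []

# Step 1: consecutive fingerprints of the same instance -> label True.
for instance, fps in training.items():
    for prev, cur in zip(fps, fps[1:]):
        pairs.append(((prev, cur), True))

# Step 2 (undersampling): one cross-instance comparison per fingerprint,
# so the False class does not swamp the True class.
all_fps = [(i, fp) for i, fps in training.items() for fp in fps]
for instance, fp in all_fps:
    _, other_fp = random.choice([x for x in all_fps if x[0] != instance])
    pairs.append(((fp, other_fp), False))

n_true = sum(label for _, label in pairs)
print(n_true, len(pairs) - n_true)  # 4 true pairs, 7 false pairs
```

Without undersampling, the false class would grow quadratically with the number of instances and dominate the training signal.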
c) Random Forest Hyperparameters
Concerning the number of trees in the random forest, there is a tradeoff between precision and execution time. Adding trees does improve the results, but follows the law of diminishing returns while increasing training and prediction times. Our goal is to balance precision and execution time. The number of features plays a role during the tree induction process: at each split, only a random subset of the features is considered as candidates.
After training our random forest classifier, we obtain a forest of decision trees that predict the probability that two fingerprints belong to the same browser instance. Figure 5 illustrates the first three levels of one of the decision trees. These levels rely on the languages, the number of changes and the user agent to make a decision. If an attribute has a value below its threshold, the decision path goes to the left child node; otherwise it goes to the right child node. The process is repeated until a leaf of the tree is reached. The prediction corresponds to the class (same or different browser instance) that has the most instances over all the reached leaf nodes.
d) Lambda Threshold Parameter
For each browser fingerprint in the test set, we compare it with its previous browser fingerprint and with another random fingerprint from a different browser, and compute the probability that it belongs to the same browser instance using our random forest classifier with the parameters determined previously. Using these probabilities and the true labels, we choose the lambda threshold that provides the best tradeoff between correctly linking fingerprints from the same browser instance and not linking fingerprints from different instances.
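Threshold selection over held-out probabilities can be sketched as a simple grid search; the probabilities, labels, and the use of accuracy as the selection criterion are illustrative assumptions.

```python
# Hypothetical held-out predictions: (probability of same instance, true label).
predictions = [(0.95, True), (0.80, True), (0.65, True),
               (0.40, False), (0.30, False), (0.10, False), (0.70, False)]

def accuracy_at(lam):
    """Accuracy when pairs with probability >= lam are linked."""
    correct = sum((p >= lam) == label for p, label in predictions)
    return correct / len(predictions)

# Grid-search the lambda threshold that maximizes held-out accuracy.
candidates = [i / 100 for i in range(0, 101, 5)]
best_lambda = max(candidates, key=accuracy_at)
print(best_lambda, accuracy_at(best_lambda))
```

The chosen threshold then governs, at linking time, whether the classifier's probability is high enough to attach a fingerprint to an existing chain.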
Empirical Evaluation of FP-STALKER
This section assesses Fp-STALKER's capacity to i) correctly link fingerprints from the same browser instance, and to ii) correctly predict when a fingerprint belongs to a browser instance that has never been seen before. We show that both variants of Fp-STALKER are effective in linking fingerprints and in distinguishing fingerprints from new browser instances. However, the rule-based variant is faster while the hybrid variant is more precise. Finally, we discuss the impact of the collect frequency on fingerprinting effectiveness, and we evaluate the execution times of both variants of Fp-STALKER.
Figure 6 illustrates the linking and evaluation process. Our database contains perfect tracking chains because of the unique identifiers our extensions use to identify browser instances. From there, we sample the database using different collection frequencies and generate a test set that removes the identifiers, resulting in a mix of fingerprints from different browsers. The resulting test set is then run through Fp-STALKER to reconstruct the best possible browser instance chains.
A. Key Performance Metrics
To evaluate the performance of our algorithms and measure how vulnerable users are to browser fingerprint tracking, we consider several metrics that represent the capacity to keep track of browser instances over time and to detect new browser instances. This section presents these evaluation metrics, as well as the related vocabulary. Figure 6 illustrates the different metrics with a scenario.
A tracking chain is a list of fingerprints that have been linked, i.e., fingerprints to which the linking algorithm assigned the same identifier. A chain may be composed of one or more fingerprints. With a perfect linking algorithm, each browser instance would have a unique tracking chain, i.e., all of its fingerprints would be grouped together and not mixed with fingerprints from any other browser instance. However, in reality, fingerprinting is a statistical attack and mistakes may occur during the linking process, which means that:
Fingerprints from different browser instances may be included in the same tracking chain.
Fingerprints from a given browser instance may be split into different tracking chains.
The lower part of Figure 6 shows examples of these mistakes. Chain 1 contains an incorrect fingerprint fpB1 from browser B, and chain 3 and chain 4 contain fingerprints from browser C that have not been correctly linked, i.e., fpC3 and fpC4 were not linked, leading to a split.
We present the tracking duration metric to evaluate the capacity of an algorithm to track browser instances over time. We define tracking duration as the period of time a linking algorithm matches the fingerprints of a browser instance within a single tracking chain. More specifically, the tracking duration of a browser instance in a given chain is the time elapsed between the first and the last of its fingerprints that are correctly linked in that chain.
The average tracking duration for a browser instance is the mean of its tracking durations over all the chains in which it appears.
The maximum tracking duration for a browser instance is its longest tracking duration over all the chains in which it appears.
The Number of assigned ids represents the number of different identifiers that have been assigned to a browser instance by the linking algorithm. It can be seen as the number of tracking chains in which a browser instance is present. For each browser instance, a perfect linking algorithm would group all of the browser's fingerprints into a single chain. Hence, each browser instance would have a number of assigned ids of 1. Figure 6 shows an imperfect case where browser C has been assigned 2 different ids (chain 3 and chain 4).
The ownership ratio reflects the capacity of an algorithm to not link fingerprints from different browser instances. The owner of a tracking chain chaink is defined as the browser instance with the most fingerprints in the chain; the ownership ratio of a chain is the fraction of its fingerprints that belong to its owner.
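Both the ownership ratio and the number of assigned ids can be computed from the true browser-instance id of each fingerprint and the chain id assigned by the linking algorithm. A small sketch (names and data layout are illustrative):

```python
from collections import Counter

def ownership_ratio(chain):
    """chain: the true browser-instance ids of the fingerprints in one
    tracking chain. The owner is the instance with the most fingerprints;
    ownership is the owner's share of the chain."""
    counts = Counter(chain)
    return counts.most_common(1)[0][1] / len(chain)

def assigned_ids(assignments, browser):
    """assignments: (true_browser_id, chain_id) pairs over all linked
    fingerprints. Returns the number of distinct chains in which the
    given browser instance appears."""
    return len({chain for b, chain in assignments if b == browser})

# Chain 1 from Figure 6: three fingerprints from A, one stray from B.
ownership_ratio(["A", "A", "B", "A"])        # 0.75
# Browser C split across chains 3 and 4, as in Figure 6.
assigned_ids([("C", 3), ("C", 3), ("C", 4)], "C")  # 2
```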
Overview of our evaluation process that allows testing the algorithms using different simulated collection frequencies.
B. Comparison with Panopticlick's Linking Algorithm
We compare FP-STALKER to the algorithm proposed by Eckersley [8] in the context of the Panopticlick project. To the best of our knowledge, there are no other algorithms to compare against. Although Eckersley's algorithm has been characterized as "naive" by its author, we use it as a baseline for our approach. The Panopticlick algorithm is summarized in Algorithm 4. It uses the following 8 attributes: user agent, accept, cookies enabled, screen resolution, timezone, plugins, fonts, and local storage. Given an unknown fingerprint, the algorithm searches the known fingerprints for candidates that differ in at most one of these attributes; if exactly one such candidate exists, the fingerprints are linked, otherwise a new identifier is assigned.
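Algorithm 4 is not reproduced here, but its conservative rule can be sketched as follows (a minimal illustration in the spirit of Eckersley's algorithm as described above; the attribute names and dictionary representation are our assumptions, not Panopticlick's actual code):

```python
ATTRIBUTES = ["user_agent", "accept", "cookies_enabled", "resolution",
              "timezone", "plugins", "fonts", "local_storage"]

def diff_count(fp_a, fp_b):
    """Number of the 8 attributes on which two fingerprints differ."""
    return sum(1 for a in ATTRIBUTES if fp_a[a] != fp_b[a])

def link(unknown, known):
    """Return the known fingerprint linked to `unknown`, or None when a
    new id must be assigned. Conservative: link only when exactly one
    candidate differs by at most one attribute."""
    candidates = [fp for fp in known if diff_count(unknown, fp) <= 1]
    if len(candidates) == 1:
        return candidates[0]
    return None  # zero or multiple candidates: assign a new id
```

This conservatism explains the behavior reported below: as soon as more than one attribute changes, or several candidates match, a new id is created.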
Example of the process to generate a simulated test set. The dataset contains fingerprints collected from browsers A and B, which we sample at a collect_frequency of 2 days to obtain a dataset that allows us to test the impact of collect_frequency on fingerprint tracking.
C. Dataset Generation Using Fingerprint Collect Frequency
To evaluate the effectiveness of FP-STALKER, we start from our test set of 59,159 fingerprints collected from 1,395 browser instances (60% of our input dataset, see Section IV-C2b). However, we do not directly use this set. Instead, by sampling the test set, we generate new datasets using a configurable collect frequency. Because our input dataset is fine-grained, it allows us to simulate the impact fingerprinting frequency has on tracking. The intuition is that if a browser is fingerprinted less often, it becomes harder to track.
To generate a dataset for a given collect frequency, we start from the test set of 59,159 fingerprints and, for each browser instance, we look at the collection date of its first fingerprint. Then, we iterate in time with a step of collect_frequency days and, at each step, recover the fingerprint the browser instance would have presented at that moment, i.e., its most recent fingerprint collected on or before the step date.
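This sampling step can be sketched as follows (a simplified illustration of the generation process for one browser instance; function and variable names are ours):

```python
from datetime import date, timedelta

def sample(fingerprints, collect_frequency):
    """fingerprints: chronologically ordered (collection_date, fp) pairs
    for one browser instance. At each step of collect_frequency days,
    keep the fingerprint the browser presented at that moment, i.e. the
    most recent one collected on or before the step date."""
    sampled = []
    start, end = fingerprints[0][0], fingerprints[-1][0]
    t = start
    while t <= end:
        current = [fp for d, fp in fingerprints if d <= t][-1]
        sampled.append((t, current))
        t += timedelta(days=collect_frequency)
    return sampled

# Fingerprints collected on Jan 1, 4 and 6, sampled every 2 days:
fps = [(date(2017, 1, 1), "fp0"), (date(2017, 1, 4), "fp1"),
       (date(2017, 1, 6), "fp2")]
sample(fps, 2)  # samples fp0 on Jan 1 and Jan 3, then fp1 on Jan 5
```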
The browser fingerprints in a generated test set are ordered chronologically. At the beginning of our experiment, the set of known fingerprints is empty; each fingerprint of the test set is then presented, in chronological order, to the linking algorithm.
Average tracking duration against simulated collect frequency for the three algorithms
D. Tracking Duration
Figure 8 plots the average tracking duration against the collect frequency for the three algorithms. On average, browser instances from the test set were present for 109 days, which corresponds to the maximum value our linking algorithm could potentially achieve. We see that the hybrid variant of FP-STALKER is able to keep track of browser instances for a longer period of time than the two other algorithms. In the case where a browser gets fingerprinted every three days, FP-STALKER can track it for 51.8 days, on average. More generally, the hybrid variant of FP-STALKER achieves an average tracking duration of about 9 days more than the rule-based variant and 15 days more than the Panopticlick algorithm.
Figure 9 presents the average maximum tracking duration against the collect frequency for the three algorithms. We see that the hybrid algorithm still outperforms the two other algorithms because it constructs longer tracking chains with fewer mistakes. On average, the maximum tracking duration for FP-STALKER's hybrid variant is on the order of 74 days, meaning that the longest period a browser instance was tracked lasted, on average, around 74 days.
Average maximum tracking duration against simulated collect frequency for the three algorithms. This shows averages of the longest tracking durations that were constructed.
Average number of assigned ids per browser instance against simulated collect frequency for the three algorithms (lower is better).
Figure 10 shows the number of ids each algorithm assigned, on average, to each browser instance. We see that Panopticlick's algorithm often assigns new browser ids, which is caused by its conservative nature. Indeed, as soon as there is more than one change, or multiple candidates for linking, Panopticlick's algorithm assigns a new id to the unknown browser instance. In contrast, we observe that FP-STALKER's hybrid and rule-based variants perform similarly.
Finally, Figure 11 presents the average ownership of tracking chains against the collect frequency for the three algorithms. We see that, despite its conservative nature, Panopticlick's ownership is 0.94, which means that, on average, 6% of a tracking chain consists of fingerprints that do not belong to the browser instance that owns the chain, i.e., it is contaminated with fingerprints from other browser instances. The hybrid variant of FP-STALKER has an average ownership of 0.985, against 0.977 for the rule-based variant.
When it comes to linking browser fingerprints, FP-STALKER's hybrid variant is better than, or as good as, the rule-based variant. The next paragraphs focus on a few more results obtained with the hybrid algorithm. Figure 12 presents the cumulative distribution of the average and maximum tracking duration when collect_frequency equals 7 days for the hybrid variant. We observe that, on average, 15.5% of the browser instances are tracked for more than 100 days. When it comes to the longest tracking chains, we observe that more than 26% of the browser instances have been tracked at least once for more than 100 days during the experiment. These numbers show how tracking may depend on the browser and its configuration. Indeed, while some browsers are never tracked for a long period of time, others may be tracked for multiple months. This is also due to the duration of presence of browser instances in our experiments: few browser instances were present for the whole experiment, most for a few weeks, and at best we can track a browser instance only as long as it was present. The graph also shows the results of the perfect linking algorithm, which can also be interpreted as the distribution of the duration of presence of browser instances in our test set.
Average ownership of tracking chains against simulated collect frequency for the three algorithms. A value of 1 means the tracking chain is constructed perfectly.
CDF of average and maximum tracking duration for a collect frequency of 7 days (FP-STALKER hybrid variant only).
The boxplot in Figure 13 depicts the number of ids generated by the hybrid algorithm for a collect frequency of 7 days. It shows that half of the browser instances have been assigned 2 identifiers, which corresponds to a single linking mistake, and more than 90% have been assigned fewer than 9 identifiers.
Finally, we also look at the distribution of chain ownership to see how often fingerprints from different browser instances are mixed together. For the FP-STALKER hybrid variant, more than 95% of the chains have an ownership above 0.8, and more than 90% have a perfect ownership of 1. This shows that only a small percentage of browser instances become highly mixed within chains, while the majority are properly linked into clean and relatively long tracking chains.
Distribution of the number of ids per browser for a collect frequency of 7 days (FP-STALKER hybrid variant only).
E. Benchmark/Overhead
This section presents a benchmark that evaluates the performance of Fp-STALKER's hybrid and rule-based variants. We start by providing more details about our implementation, then we explain the protocol used for this benchmark, demonstrate that our approach can scale, and we show how our two variants behave when the number of browser instances increases.
a) The Implementations: The implementations of FP-STALKER used for this benchmark are developed in Python, and the implementation of the random forest comes from the Scikit-Learn library. In order to study the scalability of our approach, we parallelized the linking algorithm to run on multiple nodes. A master node is responsible for receiving linkability requests; it then sends the unknown fingerprint to the slave nodes, each of which compares it against its local subset of known fingerprints and returns its best candidates.
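The following sketch illustrates this master/slave matching scheme in a single process using threads rather than multiple nodes, with a toy similarity function standing in for FP-STALKER's actual rule-based or random-forest scoring (all names are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def similarity(fp_a, fp_b):
    """Toy stand-in score: fraction of matching attributes. FP-STALKER's
    real scoring uses its rules or the random forest's probability."""
    keys = fp_a.keys()
    return sum(fp_a[k] == fp_b[k] for k in keys) / len(keys)

def best_local_match(unknown, shard):
    """Slave role: compare the unknown fingerprint against one shard of
    known fingerprints and return the best (score, fingerprint) pair."""
    best = max(shard, key=lambda fp: similarity(unknown, fp))
    return similarity(unknown, best), best

def link_parallel(unknown, shards, workers=4):
    """Master role: fan the request out to the workers, then keep the
    candidate with the highest global score."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: best_local_match(unknown, s),
                                shards))
    return max(results, key=lambda r: r[0])[1]
```

In the real deployment the shards live in the memory of separate slave processes and the master only merges their answers.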
b) The Experimental Protocol: The protocol aims to study scalability. We evaluate our approach on a standard Azure cloud instance and generate fake browser fingerprints to increase the test set size. Thus, this part does not evaluate the previous metrics, such as tracking duration, but only the execution times required to link synthetic browser fingerprints, as well as how well the approach scales across multiple processes.
The first step of the benchmark is to generate fake fingerprints from real ones. The generation process consists in taking a real fingerprint from our database and applying random changes to the canvas and timezone attributes. We apply only two random changes so that generated fingerprints are unique, but do not present too many differences, which would reduce the number of comparisons. This point is important because our algorithms include heuristics related to the number of differences: by applying only a small number of random changes, we ensure these heuristics do not discard all of the generated fingerprints, keeping the number of comparisons realistic.
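This mutation step can be sketched as follows (illustrative attribute names and value formats; the real implementation operates on our full fingerprint records):

```python
import copy
import random

def generate_fake(real_fp):
    """Mutate only the canvas and timezone attributes so the generated
    fingerprint is unique, yet differs from its source by just two
    attributes; heuristics on the number of differences therefore keep
    it among the comparison candidates."""
    fake = copy.deepcopy(real_fp)
    # Random 120-bit value makes canvas collisions practically impossible.
    fake["canvas"] = "canvas-%030x" % random.getrandbits(120)
    # UTC offsets in minutes (illustrative subset).
    fake["timezone"] = random.choice([-720, -480, -300, 0, 60, 120, 540])
    return fake
```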
Speedup of average execution time against number of processes for FP-STALKER's hybrid variant.
Once the fingerprints are stored in the slave processes' memory, we start our benchmark. We take 100 real fingerprints and try to link them with our generated fingerprints. For each fingerprint, we measure the execution time of the linking process while varying:
The number of fingerprints and browser instances.
The number of processes spawned.
We execute our benchmark on a Standard D16 v3 Azure instance with 16 virtual processors and 64 GB of RAM, which has an associated cost of $576 USD per month. Figure 14 shows the execution time speedup, in percentage, against the number of processes for the hybrid approach. We see that as the number of processes increases, we obtain a speedup in execution time: going from 1 to 8 processes yields a speedup of more than 80%. Figure 15 shows the execution time to link a fingerprint against the number of browser fingerprints for FP-STALKER's hybrid and rule-based variants, using 16 processes. The better tracking duration of the hybrid variant (see Section V-D) is obtained at the cost of execution speed. Indeed, for any given number of processes and browser instances, the rule-based variant links fingerprints about 5 times faster. That said, the results show that the hybrid variant still links fingerprints relatively quickly.
However, the raw execution times should not be used directly. The algorithm was implemented in Python, whose primary focus is not performance. Moreover, although we scaled by adding processes, it is possible to scale further by splitting the linking process (e.g., depending on the combination of OS and browser, send the fingerprint to more specialized nodes). In our current implementation, if an unknown fingerprint from a Chrome browser on Linux is being matched, it will be compared to fingerprints from Firefox on Windows, causing us to wait even though they have no chance of being linked. By adopting a hierarchical structure where nodes or processes are split depending on their OS and browser, it is possible to increase the throughput of our approach.
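Such a hierarchical split can be sketched with a simple index keyed by OS and browser (an illustrative sketch; our current implementation does not include this optimization):

```python
from collections import defaultdict

def partition(known_fps):
    """Index known fingerprints by (OS, browser) so an unknown
    fingerprint is only compared against plausible candidates, e.g.
    Chrome-on-Linux is never matched against Firefox-on-Windows."""
    index = defaultdict(list)
    for fp in known_fps:
        index[(fp["os"], fp["browser"])].append(fp)
    return index

def candidates(index, unknown):
    """Return only the known fingerprints sharing the unknown
    fingerprint's OS and browser."""
    return index[(unknown["os"], unknown["browser"])]
```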
Furthermore, the importance of raw execution speed depends highly on the use case. When fingerprinting is used as a way to regenerate cookies (e.g., to respawn deleted identifiers [1]), linking only needs to be performed when the cookie is absent, which makes slower execution times acceptable.
Execution times for FP-STALKER hybrid and rule-based variants to link a fingerprint using 16 processes. Time depends on the size of the test set. The increased effectiveness of the hybrid variant comes at the cost of slower execution times.
F. Threats to Validity
First, the results we report in this work depend on the representativeness of our browser fingerprint dataset. We developed extensions for Chrome and Firefox, the two most popular web browsers, and distributed them through standard channels. This provides long-term data and mitigates the bias we could have introduced by choosing a user population ourselves, but it is possible that the people interested in our extensions are not representative of the average Web user.
Second, there is a reliability threat due to the difficulty of replicating the experiments. Unfortunately, this is inherent to scientific endeavors in the area of privacy: such works must analyze personal data (browser fingerprints in our case), and the data cannot be publicly shared. However, the code to split the data, generate input data, train the algorithm, and evaluate it is publicly available online on GitHub.
Finally, a possible internal threat lies in our experimental framework. We did extensive testing of our machine learning algorithms and checked classification results as thoroughly as possible. We took care to split the data and generate a scenario close to what would happen in a real web application. However, as with any large-scale experimental infrastructure, there are surely bugs in this software. We hope that they only affect marginal quantitative results, and not the qualitative essence of our findings.
G. Discussion
This paper studies browser fingerprint linking in isolation, which is its worst-case scenario. In practice, browser fingerprinting is often combined with stateful tracking techniques (e.g., cookies, Etags) to respawn stateful identifiers [1]. In such cases, fingerprint linking is performed much less frequently since most of the time a cookie is sufficient and inexpensive to track users. Our work shows that browser fingerprinting can provide an efficient solution to extend the lifespan of cookies, which are increasingly being deleted by privacy-aware users.
Browser vendors and users would do well to minimize the differences that are so easily exploited by fingerprinters. Our results show that some browser instances have highly trackable fingerprints, to the point that very infrequent fingerprinting is quite effective. In contrast, other browser instances appear to be untrackable using the attributes we collect. Vendors should work to minimize the attack surfaces exploited by fingerprinters, and users should avoid customizing their browsers in ways that make them expose unique and linkable fingerprints.
Depending on the objectives, browser fingerprint linking can be tuned to be more conservative and avoid false positives (e.g., for second-tier security purposes), or more permissive (e.g., ad tracking). Tuning could also be influenced by how effective other tracking techniques are. For example, it could be tuned very conservatively and simply serve to extend cookie tracking in cases where privacy-aware users, which are in our opinion more likely to have customized (i.e., unique and linkable) browser configurations, delete their cookies.
Conclusion
In this paper, we investigated browser fingerprint evolution and proposed FP-STALKER as an approach to link fingerprint changes over time. We address the problem with two variants of FP-STALKER. The first builds on a ruleset identified from expert knowledge. The second, hybrid variant combines the most discriminating rules with a random forest to sort out the more subtle evolutions.
We trained the FP-STALKER hybrid variant with a training set of fingerprints that we collected over 2 years through browser extensions installed by 1,905 volunteers. By analyzing the feature importance of our random forest, we identified the number of changes, the languages, and the user agent as the three most important features.
We ran FP-STALKER on our test set to assess its capacity to link fingerprints, as well as to detect new browser instances. Our experiments demonstrate that the hybrid variant can correctly link fingerprint evolutions from a given browser instance for 54.48 consecutive days on average, against 42.3 days for the rule-based variant. When it comes to the maximum tracking duration, with the hybrid variant, more than 26% of the browsers can be tracked for more than 100 days.
Regarding the usability of Fp-STALKER, we measure the average execution time to link an unknown fingerprint when the number of known fingerprints is growing. We show that both our rule-based and hybrid variants scale horizontally.
ACKNOWLEDGMENT
We would like to thank the users of the AmIUnique extensions, whose contributions were essential to this study. We also want to thank our shepherd, Davide Balzarotti, and the anonymous reviewers for their valuable comments and feedback. Finally, this work would not have been possible without our long-term collaboration with Benoit Baudry.