Automatic Tuning of Privacy Budgets in Input-Discriminative Local Differential Privacy

Local differential privacy (LDP) and its variants have been recently studied to analyze personal data collected from Internet of Things (IoT) devices while strongly protecting user privacy. In particular, a recent study proposes a general privacy notion called input-discriminative LDP (ID-LDP), which introduces a privacy budget for each input value to deal with different levels of sensitivity. However, it is unclear how to set an appropriate privacy budget for each input value, especially in current situations where reidentification is considered a major risk, e.g., in GDPR. Moreover, the possible number of input values can be very large in IoT. Consequently, it is also extremely difficult to manually check whether the privacy budget for each input value is appropriate. In this article, we propose algorithms to automatically tune privacy budgets in ID-LDP so that obfuscated data strongly prevent reidentification. We also propose a new instance of ID-LDP called one-budget ID-LDP (OneID-LDP) to prevent reidentification with high utility. Through comprehensive experiments using four real data sets, we show that existing instances of ID-LDP lack either utility or privacy: they overprotect personal data or are vulnerable to reidentification attacks. Then, we show that our OneID-LDP mechanisms with our privacy budget tuning algorithms provide much higher utility than LDP mechanisms while strongly preventing reidentification.


I. INTRODUCTION
WITH the advancement of Internet of Things (IoT) devices, such as wearable devices, connected cars, smart homes, and activity monitoring systems, personal data are increasingly collected for various types of data analysis. For example, a large amount of location data collected from wearable devices or connected cars is analyzed to calculate a frequency distribution (geographic population distribution). The frequency distribution is useful for providing traffic information to users [1] or finding popular points of interest (POIs), such as restaurants and cultural landmarks [2]. For another example, person activity data from monitoring systems are analyzed to extract typical activity patterns of elderly people [3]. Power-consumption data from smart meters are analyzed to find typical daily consumption patterns [4] or the right customers to target for demand response programs [5]. Although these data are useful for industry and society, the disclosure of personal data can lead to serious privacy issues. Therefore, there is a need to develop algorithms to perform data analysis while strongly protecting user privacy.

Differential privacy (DP) [6], [7] is known as a gold standard for private analysis. It strongly protects user privacy against adversaries with any background knowledge. There are roughly two types of DP: 1) centralized DP and 2) local DP (LDP). Centralized DP assumes a centralized model where a central server has the personal data of all users and obfuscates analysis results, e.g., a frequency distribution. In this model, there is a risk that the personal data of all users are leaked from the server by illegal access [8]. In contrast, LDP assumes a local model where each user obfuscates her personal data and sends the obfuscated data to a data collector; i.e., it does not assume a trusted party. Thus, LDP does not suffer from the data breach issue and has been adopted by companies such as Google [9], Apple [10], and Microsoft [11].
LDP prevents an adversary from distinguishing any pair of input values and controls the indistinguishability by a parameter called a privacy budget ε. LDP regards all input values as equally sensitive and uses the same privacy budget for all pairs of input values. However, different input values have different levels of sensitivity in practice. For example, homes and hospitals are highly sensitive locations, whereas parks, restaurants, and sightseeing places would be less sensitive for most users. Cancers and HIV are highly sensitive diseases, whereas headache, sore throat, and stomachache would be less sensitive. In these scenarios, LDP mechanisms excessively obfuscate personal data and cause the loss of data utility.
To address this issue, a recent study proposed a general privacy notion called input-discriminative LDP (ID-LDP) [12]. ID-LDP deals with different levels of sensitivity in input values by introducing a privacy budget ε_x for each input value x. ID-LDP controls the indistinguishability of a pair of input values x and x′ as a function of the corresponding budgets ε_x and ε_{x′}. ID-LDP is general in that we can use any function for the pair. It includes MinID-LDP [12] and high-low LDP (HLLDP) [13], [14] as instances, both of which provide higher utility than LDP.
However, it is difficult to manually determine an appropriate privacy budget for each input value in practice. For example, it is well known that DP strongly protects user privacy when the privacy budget is small, e.g., ε ≤ 1 [15]. Thus, it is natural to allocate such small privacy budgets to sensitive locations, such as homes and hospitals. However, it is unclear how large the privacy budgets allocated to less sensitive locations, such as parks and restaurants, should be.
In particular, even if some input values are nonsensitive for users, disclosing them may lead to the reidentification of records [16]. For example, assume that Alice disclosed the fact that she went to a coffee shop, which was nonsensitive for her. An adversary who obtains this information may use it to reidentify another sensitive record (e.g., a hospital she regularly visits near the coffee shop) in a different database. Consequently, various kinds of personal data from different databases may be linked to make a user profile [17], and it might be sold on the dark Web [18]. Since reidentification is considered a major risk in the general data protection regulation (GDPR) [19], [20], we need to strongly protect all personal data, including nonsensitive data, from reidentification attacks.
Moreover, IoT devices can collect various data, and the possible number of input values can also be very large, e.g., larger than 10 000 in our experiments. Thus, it is extremely difficult to manually set a privacy budget for each input value and to manually check whether each budget is appropriate. Setting an appropriate privacy budget is also recognized as an important challenge in the Internet of Vehicles (IoV) [21].
In this article, we propose algorithms to automatically determine privacy budgets in ID-LDP so that obfuscated data strongly prevent reidentification. Note that in many practical scenarios of the local model, an adversary needs to perform reidentification attacks to link the obfuscated data to users [22]. For example, some applications (e.g., Foursquare, Google Maps, and YouTube recommendation) can be used without a login, i.e., without sending a user ID. For another example, a data collector pseudonymizes obfuscated data to reduce the risks to the users, as described in GDPR [19]. In both cases, an outsider adversary who obtains the obfuscated data needs to reidentify the data. Our algorithms automatically determine privacy budgets to strongly prevent this attack.
As a task for the data collector, we consider frequency estimation [9], [12], [23], which is a fundamental task in the local model. We show that our proposed algorithms strongly prevent the reidentification attack while providing much higher utility than LDP.
Our Contributions: Our contributions are as follows.
1) We propose privacy budget tuning algorithms for ID-LDP, which automatically determine privacy budgets so that obfuscated data prevent reidentification. To our knowledge, this work is the first to automatically determine a privacy budget for each input value to prevent reidentification (see Section II for details).
2) We also propose a new instance of ID-LDP called one-budget ID-LDP (OneID-LDP) to bound the reidentification risk with high utility. We prove that OneID-LDP upper bounds the reidentification accuracy for every obfuscated data and, hence, for every user.
3) Through comprehensive experiments using four real data sets (one location data set with six cities and three person activity data sets), we show that two existing instances of ID-LDP lack either utility or privacy: MinID-LDP [12] still overprotects personal data and lacks utility, and HLLDP [13], [14] is vulnerable to reidentification.
4) Finally, we show the effectiveness of our algorithms using the four data sets. Specifically, we show that our OneID-LDP mechanisms with our privacy budget tuning algorithms provide much higher utility than MinID-LDP and LDP mechanisms while preventing reidentification.

[TABLE I: Relationship between the existing work and our proposal. Our proposal is highlighted in bold.]
[TABLE II: Privacy and utility of the three privacy notions. We assume that our privacy budget tuning algorithms are applied to MinID-LDP and OneID-LDP.]

Novelty: Below, we explain the novelty of our work in more detail. As explained above, our proposal is twofold: 1) automatic tuning of privacy budgets and 2) OneID-LDP. Table I shows the relationship between the existing work and our proposal.
First and most importantly, the automatic tuning of privacy budgets (i.e., the third column of Table I) is a totally new research direction. All existing work on ID-LDP [12], [13], [14] manually sets a privacy budget ε_x for each input value x without theoretical justification; e.g., ε_x = ln 6 [12] or ∞ [13], [14] for nonsensitive data. In contrast, our privacy budget tuning algorithms automatically determine ε_x to provide theoretical guarantees against reidentification attacks.
Second, our OneID-LDP (i.e., the fourth row of Table I) is a new privacy notion. OneID-LDP is designed to prevent reidentification with much higher utility than MinID-LDP. We propose OneID-LDP with a manual setting of ε_x in Section IV-B. Then, we propose privacy budget tuning algorithms for OneID-LDP (resp., MinID-LDP) in Sections IV-D and IV-F (resp., Sections IV-E and IV-F). We show that both OneID-LDP and MinID-LDP prevent reidentification when using our privacy budget tuning algorithms. Then, we show that OneID-LDP provides much higher utility than MinID-LDP.
Note that our privacy budget tuning algorithms cannot be applied to HLLDP ("N/A" in Table I). This is because HLLDP always sets ε_x = ∞ for nonsensitive data. In contrast, our privacy budget tuning algorithms use a finite value of ε_x for some nonsensitive data to prevent reidentification. Thus, they are incompatible with HLLDP. Table II summarizes the privacy and utility of the three privacy notions. Here, we apply our privacy budget tuning algorithms to MinID-LDP and OneID-LDP. We say an algorithm provides "high privacy against reidentification" when it upper bounds the reidentification accuracy for every user by a desired value. Because HLLDP does not protect nonsensitive data at all (i.e., ε_x = ∞), it is vulnerable to reidentification. MinID-LDP lacks utility. In contrast, our OneID-LDP provides both high privacy and high utility. See Section V for details.
Remark on Privacy Risks: This article shows that our proposal (OneID-LDP with automatic tuning of privacy budgets) is secure against reidentification. There are other privacy risks in the privacy literature. Specifically, two types of information disclosure are known as privacy risks: 1) identity disclosure and 2) attribute disclosure [24]. Identity disclosure takes place when the adversary correctly links a user to a record in the database. Attribute disclosure takes place when the adversary correctly obtains some information about an attribute of a user.
Identity disclosure is caused by reidentification attacks and membership inference attacks [25], [26] as follows. The adversary first performs membership inference attacks, which determine who are members, i.e., users in the database. Then, the adversary performs reidentification attacks, which link one of the (inferred) members to each record in the database. The adversary succeeds in identity disclosure if she accurately performs both membership inference and reidentification. In this article, we assume that the adversary completely knows who are members/nonmembers when she performs reidentification. In other words, we consider a worst case scenario where the accuracy of membership inference is 100%. In practice, the accuracy of membership inference would be smaller than 100%. In that case, the accuracy of identity disclosure would be smaller than what is reported in our experiments.
Attribute disclosure is caused by attribute inference attacks [27], which infer an attribute of a user. LDP is a privacy notion to strongly prevent the inference of attributes from output data. Similarly, our OneID-LDP strongly prevents the inference of sensitive attributes from output data. One might think that the adversary might infer attributes from a frequency distribution. For example, assume that users in a certain area are likely to visit a hospital and that Alice lives in this area. Then, the adversary who obtains a frequency distribution estimated by the data collector would infer that Alice is likely to visit the hospital. This kind of attack is inevitable in any LDP (or ID-LDP) mechanism when the goal is to estimate a frequency distribution. In addition, this kind of inference is not considered a privacy violation in [28], because it is statistical inference. Thus, it is outside the scope of this article.
In summary, our proposal strongly prevents both identity and attribute disclosure other than statistical inference.
Paper Organization: The remainder of this article is organized as follows. In Section II, we review the previous work related to ours. In Section III, we explain some preliminaries for our work, such as basic notations, utility/privacy metrics, and randomized mechanisms. In Section IV, we propose our privacy budget tuning algorithms and OneID-LDP. We also prove that both OneID-LDP and MinID-LDP bound the reidentification accuracy when using our privacy budget tuning algorithms. In Section V, we show our experimental results. In Section VI, we conclude this article.
II. RELATED WORK
The limitation of LDP is that it requires too much noise; it is proven in [32] that LDP needs an extremely large number of users (e.g., dozens of millions [9]) to enable accurate data analysis due to the large noise. One reason for the low utility of LDP is that it regards all input data as equally sensitive.
Numerous variants of DP/LDP have been studied to overcome this limitation. A recent Systematization of Knowledge (SoK) paper [41] classifies these variants into seven categories, depending on which aspect of the original DP/LDP is modified. Out of the seven categories, we focus on the V (variation of privacy loss) category because it attempts to address the utility issue explained above; see [41] for details of the other six categories. The V category varies the privacy level of DP/LDP across inputs. The variants in this category include HLLDP [13], [14], MinID-LDP [12], and context-aware LDP [14].
The first variant of LDP in the V category was proposed by Murakami and Kawamoto [13]. They introduced the notion of HLLDP, which provides a privacy guarantee equivalent to LDP only for sensitive data. Then, they proposed a subclass of HLLDP called utility-optimized LDP (ULDP), which optimizes the utility within HLLDP. Later, Gu et al. [12] proposed the notion of ID-LDP, which includes HLLDP as a special case. They proposed an instance of ID-LDP called MinID-LDP and showed that it provides higher utility than LDP. In this article, we focus on ID-LDP because it is a general notion: it includes both HLLDP and MinID-LDP as instances.
Another interesting variant of LDP in the V category is context-aware LDP, proposed by Acharya et al. [14]. It allocates a privacy budget ε_{x,x′} to each pair of input values x and x′. This is also very general and includes various variants of LDP as special cases, e.g., HLLDP, geo-indistinguishability [43], and d_X-privacy [44]. They also introduced a new instance of context-aware LDP called block-structured LDP [14], which hides input values within the same group. We do not focus on context-aware LDP because our interest is in handling different levels of sensitivity in input values, as described in Section I. For this purpose, it is sufficient to use ID-LDP, which allocates a privacy budget ε_x to each personal datum x.
A crucial issue in these variants of LDP is how to set appropriate privacy budgets. As explained in Section I, the disclosure of nonsensitive input values may lead to the reidentification of records in another database. It is extremely difficult to manually set an appropriate privacy budget for each input value, as the possible number of input values can be very large. Unfortunately, none of the above studies [12], [13], [14] considers how to automatically set appropriate privacy budgets. Therefore, we propose privacy budget tuning algorithms and a new instance of ID-LDP called OneID-LDP to prevent reidentification while keeping high utility.
DP and Reidentification Risks: The relationship between DP and the reidentification risk was shown in recent studies by Cohen and Nissim [45], [46]. Specifically, they formally defined a concept of predicate singling out, which is weaker than singling out in GDPR. Security against predicate singling out is a necessary (but not sufficient) condition for security against singling out in GDPR. They showed that DP prevents predicate singling out (whereas k-anonymity does not) in an asymptotic setting where the number of users goes to ∞. However, they do not clarify the relationship between the privacy budget in DP and the reidentification risk. They also do not consider different levels of sensitivity in input values.
There are also some variants of DP related to the reidentification risk. For example, Gehrke et al. [47] proposed the notion of crowd-blending privacy, which weakens centralized DP so that each record is indistinguishable from at least k − 1 other records. Bindschaedler et al. [48] proposed a similar notion called plausible deniability and showed that a plausible deniability mechanism generates differentially private synthetic data under some conditions. Murakami and Takahashi [22] proposed personal information entropy (PIE) privacy as a relaxation of LDP to reduce the reidentification risk. None of these studies [22], [47], [48] considers different levels of sensitivity in input values.
Finally, we note that our work is totally different from a recently proposed shuffling technique [49], [50], [51]. Specifically, the shuffling technique reduces the privacy budget in DP by introducing an intermediate server (shuffler) that randomly shuffles obfuscated data. Our work is different from this technique in three ways. First, our algorithms upper bound the reidentification accuracy, whereas the shuffling technique does not consider it. Second, our work deals with different levels of sensitivity in input values, whereas the shuffling technique does not. Third, we show that reidentification is strongly prevented even if the privacy budget is ∞ in some cases (see Section IV-D for details), whereas the shuffling technique cannot reduce the privacy budget in this case.
In summary, our work is the first to automatically determine a privacy budget for each input value to prevent reidentification, to our knowledge.

III. PRELIMINARIES
In this section, we provide some preliminaries for our work. Section III-A introduces basic notations used in this article. Sections III-B, III-C, and III-D explain utility metrics, privacy metrics, and randomized mechanisms, respectively.

A. Notations
Let R, N, R≥0, and Z≥0 be the sets of real numbers, natural numbers, nonnegative real numbers, and nonnegative integers, respectively. Let U be a finite set of users who use an application (e.g., a wearable device or connected car). Let n ∈ N be the number of users, and let u_i ∈ U be the ith user; i.e., U = {u_1, ..., u_n}.
Let X be a finite set of personal data (e.g., locations and physical activities). We assume that continuous data are discretized into some bins; e.g., a location map is divided into smaller regions or POIs. We also assume that each user u_i sends a single datum (we discuss the case where a user sends multiple data in Section IV-B). Let X^(i) be a random variable representing the personal data of user u_i. Let X = {X^(1), ..., X^(n)} be the set of all personal data.
Let Y be a finite set of obfuscated data. Let Y^(i) be a random variable representing the obfuscated data of user u_i. Let Y = {Y^(1), ..., Y^(n)} be the set of all obfuscated data. Each user obfuscates her personal data using a randomized mechanism Q, which maps x ∈ X to y ∈ Y with probability Q(y|x), and sends the obfuscated data to a data collector.
We consider frequency estimation as a task of the data collector. Let c be a frequency distribution, whose element c(x) is the number of users who possess x; i.e., c(x) = Σ_{i=1}^n 1_{X^(i)=x}, where 1_{X^(i)=x} is an indicator function that takes 1 if X^(i) = x and 0 otherwise. The data collector estimates the frequency distribution c from the obfuscated data of all users. Let ĉ be an estimate of c. Table III shows the basic notations in this article.
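As a minimal sketch, the true frequency distribution c above can be computed as follows (the function and variable names are illustrative, not from the article):

```python
from collections import Counter

def frequency_distribution(data, domain):
    """c(x) = sum_i 1[X^(i) = x]: the number of users whose datum equals x."""
    counts = Counter(data)
    return {x: counts.get(x, 0) for x in domain}

# n = 5 users, domain X = {"park", "home", "cafe"}
data = ["park", "home", "park", "cafe", "park"]
c = frequency_distribution(data, ["park", "home", "cafe"])
# c == {"park": 3, "home": 1, "cafe": 1}
```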

B. Utility Metrics
In this article, we use the mean absolute error (MAE) and mean-squared error (MSE) as metrics of utility loss. The MAE and MSE are defined using the l1 loss and l2 loss, respectively.
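A hedged sketch of these standard definitions (averaging the l1 and squared l2 losses between c and its estimate ĉ over the domain; the exact normalization used in the article may differ):

```python
def mae(c, c_hat):
    """Mean absolute error: l1 loss between c and c_hat, averaged over X."""
    return sum(abs(c_hat[x] - c[x]) for x in c) / len(c)

def mse(c, c_hat):
    """Mean-squared error: squared l2 loss between c and c_hat, averaged over X."""
    return sum((c_hat[x] - c[x]) ** 2 for x in c) / len(c)

c = {"a": 3, "b": 1}
c_hat = {"a": 4, "b": 1}
# mae(c, c_hat) == 0.5 and mse(c, c_hat) == 0.5
```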

C. Privacy Metrics
LDP: LDP is defined as follows.
Definition 1 (ε-LDP [29]): Let ε ∈ R≥0 be a privacy budget. A randomized mechanism Q provides ε-LDP if and only if for any x, x′ ∈ X and any y ∈ Y, Q(y|x) ≤ e^ε Q(y|x′). Intuitively, LDP guarantees that an adversary who obtains obfuscated data y cannot determine, for any pair of input values x and x′, whether it comes from x or x′. This holds especially when the privacy budget ε is close to 0 because all of the input values in X are almost equally likely; i.e., Q(y|x) ≈ Q(y|x′) for any x and x′. Thus, LDP strongly protects user privacy when ε is small; e.g., ε ≤ 1 [15].
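To illustrate the definition, k-ary randomized response (a classic LDP mechanism, shown here only as a hedged example; it is not the mechanism used in this article) satisfies ε-LDP with equality: the worst-case ratio Q(y|x)/Q(y|x′) equals e^ε.

```python
import math
import random

def k_rr(x, domain, eps, rng=random):
    """k-ary randomized response: report x w.p. e^eps / (e^eps + k - 1),
    otherwise report a uniformly random other value in the domain."""
    k = len(domain)
    p_keep = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return x
    return rng.choice([v for v in domain if v != x])

def worst_case_ratio(k, eps):
    """max over x, x', y of Q(y|x) / Q(y|x'); equals e^eps, so eps-LDP is tight."""
    p = math.exp(eps) / (math.exp(eps) + k - 1)  # Pr[y = x | input x]
    q = (1 - p) / (k - 1)                        # Pr[y = x | input x' != x]
    return p / q
```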
ID-LDP: LDP regards all input values in X as equally sensitive. However, the sensitivity differs according to the input values in practice; e.g., hospitals and homes are highly sensitive locations, whereas other locations, such as parks and restaurants, are not sensitive for most users. Thus, LDP causes excessive obfuscation and a significant loss of utility.
To address this issue, Gu et al. [12] proposed ID-LDP. The feature of ID-LDP is that it introduces a privacy budget ε x for each input value x in X . Formally, ID-LDP is defined as follows.
Definition 2 ((E, r)-ID-LDP [12]): Let ε_x ∈ R≥0 be a privacy budget for personal data x ∈ X, and let E = {ε_x}_{x∈X}. Let r be a function that takes two privacy budgets as input and outputs a nonnegative value. A randomized mechanism Q provides (E, r)-ID-LDP if and only if for any x, x′ ∈ X and any y ∈ Y, Q(y|x) ≤ e^{r(ε_x, ε_{x′})} Q(y|x′). We refer to r(ε_x, ε_{x′}) as a pair budget for x and x′. ID-LDP is general in that we can use any function as r.
Definition 3 (E-MinID-LDP [12]): Let ε_x ∈ R≥0 be a privacy budget for personal data x ∈ X, and let E = {ε_x}_{x∈X}. A randomized mechanism Q provides E-MinID-LDP if and only if it provides (E, r)-ID-LDP with r(ε_x, ε_{x′}) = min(ε_x, ε_{x′}). MinID-LDP controls the adversary's capability of distinguishing x and x′ by using the minimum of ε_x and ε_{x′}. For example, assume that the set of personal data is X = {cancer, headache, sore throat}. We set ε_cancer = 1 and ε_headache = ε_sore throat = 2 because cancer is the most sensitive disease. Then, MinID-LDP adopts 1 as a pair budget for (cancer, headache) and 2 for (headache, sore throat). For a pair of nonsensitive input values x and x′, MinID-LDP can assign a large pair budget. However, it needs to use a small pair budget when either x or x′ is sensitive. Consequently, when we consider reidentification as a risk, MinID-LDP still overprotects personal data: the utility gain of MinID-LDP over LDP is limited, as shown in our experiments.
Definition 4 ((X_S, ε_S)-HLLDP [13], [14]): Let ε_x ∈ R≥0 be a privacy budget for personal data x ∈ X, and let E = {ε_x}_{x∈X}. Let X_S ⊆ X be a finite set of sensitive data, and let ε_S ∈ R≥0 be a privacy budget for sensitive data. A randomized mechanism Q provides (X_S, ε_S)-HLLDP if and only if it provides (E, r)-ID-LDP, where ε_x = ε_S if x ∈ X_S and ε_x = ∞ otherwise (2), and the pair budget uses only the budget of the input value, i.e., r(ε_x, ε_{x′}) = ε_x. Since HLLDP assigns ε_x = ∞ to nonsensitive data x, it provides much higher utility than LDP [13]. However, this comes at the expense of privacy: HLLDP is vulnerable to reidentification attacks, as shown in our experiments.
Remark on Sensitive Data: Note that the distinction between sensitive and nonsensitive data can be different from user to user; e.g., x 1 ∈ X is sensitive for Alice and Bob, whereas x 2 ∈ X is sensitive for only Carol. The study in [13] proposes a distribution estimation method under LDP in such a personalized scenario. Specifically, their method first maps sensitive data for each user to a bot symbol "⊥" and uses an ID-LDP mechanism with domain X ∪ {⊥}. After computing the frequency distribution of input data including ⊥, their method discards the frequency of ⊥ and normalizes the other frequencies so that the sum is n. It is shown in [13] that a distribution can be accurately estimated by using this method.
In our experiments, we assume that the set of sensitive data is common to all users, e.g., POIs with "home" and "hospital" categories in the location data set. However, our proposed methods are easily extended to the personalized scenario explained above by mapping each user's sensitive data to a bot ⊥ in the same way as [13].
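The ⊥-mapping and normalization steps described above can be sketched as follows (a simplified sketch; the actual method of [13] obfuscates the ⊥-mapped data and corrects for the noise before this normalization, and the function names are illustrative):

```python
def map_to_bot(x, user_sensitive_set, bot="⊥"):
    """Replace this user's sensitive value with the bot symbol before obfuscation."""
    return bot if x in user_sensitive_set else x

def discard_and_normalize(freq, n, bot="⊥"):
    """Discard the frequency of ⊥ and rescale the rest so the sum is n."""
    rest = {x: f for x, f in freq.items() if x != bot}
    total = sum(rest.values())
    return {x: f * n / total for x, f in rest.items()}

# A user for whom "hospital" is sensitive reports ⊥ instead:
# map_to_bot("hospital", {"hospital"}) == "⊥"
# discard_and_normalize({"park": 3, "cafe": 1, "⊥": 1}, 5) == {"park": 3.75, "cafe": 1.25}
```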

D. Randomized Mechanisms
UE: As a randomized mechanism Q providing LDP, we use the unary encoding (UE) mechanism [23]. The set of obfuscated data in UE is Y = {0, 1} |X | .
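A minimal sketch of UE, assuming the symmetric flip probabilities p = e^{ε/2}/(e^{ε/2}+1) and q = 1/(e^{ε/2}+1) from [23]:

```python
import math
import random

def unary_encode(x, domain):
    """Map x to the standard basis vector e_k in {0,1}^{|X|}."""
    return [1 if v == x else 0 for v in domain]

def ue_perturb(bits, eps, rng=random):
    """Symmetric UE: a 1-bit stays 1 w.p. p, a 0-bit becomes 1 w.p. q."""
    p = math.exp(eps / 2) / (math.exp(eps / 2) + 1)
    q = 1.0 / (math.exp(eps / 2) + 1)
    return [1 if rng.random() < (p if b else q) else 0 for b in bits]

# Each user encodes and perturbs her datum, then sends the bit vector.
y = ue_perturb(unary_encode("b", ["a", "b", "c"]), eps=1.0)
# y is a bit vector in {0,1}^3
```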
IDUE: As a randomized mechanism Q providing ID-LDP, we use the input-discriminative UE (IDUE) mechanism [12]. The IDUE mechanism is a modification of UE to provide ID-LDP.
For any k ∈ [|X|], the IDUE mechanism first maps x_k to the kth standard basis vector e_k ∈ {0, 1}^{|X|}. Let y ∈ {0, 1}^{|X|} be obfuscated data. Then, for each element i ∈ [|X|], the IDUE mechanism outputs 1 with the probabilities given in (3) and (4). As shown in (3) and (4), IDUE differs from UE in that it assigns different flip probabilities to different bits.
Proposition 1: The IDUE mechanism Q with the probabilities in (3) and (4) provides (E, r)-ID-LDP. Gu et al. [12] proposed an IDUE mechanism Q that minimizes the MSE. Assume that the input domain X is divided into t ∈ N subsets X_1, ..., X_t according to privacy budgets; i.e., all input values have the same privacy budget within each subset. Let m_i = |X_i|. Then, the optimization problem can be written as in (5) (see [12] for details). The objective function represents the upper bound of the MSE, and the constraints are imposed to satisfy ID-LDP. The optimization problem in (5) is nonconvex. In our experiments, we used FindMinimum in Mathematica as a solver for nonconvex optimization problems.
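Since the optimized probabilities are given in (3) and (4) of the full article, the following is only a structural sketch of IDUE: per-bit UE where bit i is perturbed according to its own budget ε_i. The symmetric per-bit probabilities below are an illustrative assumption, not the optimized values obtained by solving (5).

```python
import math
import random

def idue_perturb(bits, budgets, rng=random):
    """Per-bit UE sketch: bit i is perturbed with probabilities that depend on
    its own budget eps_i (illustrative symmetric choice, not (3) and (4))."""
    out = []
    for b, eps_i in zip(bits, budgets):
        p_i = math.exp(eps_i / 2) / (math.exp(eps_i / 2) + 1)  # keep a 1-bit
        q_i = 1.0 / (math.exp(eps_i / 2) + 1)                  # flip a 0-bit to 1
        out.append(1 if rng.random() < (p_i if b else q_i) else 0)
    return out

# A bit with a large budget (e.g., 8) is perturbed much less than one with budget 1.
y = idue_perturb([0, 1, 0], [1.0, 2.0, 8.0])
```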

IV. AUTOMATIC TUNING OF PRIVACY BUDGETS IN INPUT-DISCRIMINATIVE LDP
ID-LDP provides fine-grained protection for input values with different sensitivity. A crucial issue in ID-LDP is that appropriate values of the privacy budgets (|X| budgets in total) are unknown, as explained in Section I. Since the possible number of input values can be very large in IoT devices, it is also extremely difficult to manually set and check an appropriate privacy budget for each input value.
To address this issue, we propose algorithms to automatically determine privacy budgets in ID-LDP so that obfuscated data prevent reidentification, which is considered a major risk in GDPR [19]. We first introduce OneID-LDP as a new instance of ID-LDP and prove that OneID-LDP can be used to bound a reidentification risk. Then, we propose algorithms for automatically tuning privacy budgets in OneID-LDP to prevent reidentification.
Section IV-A describes the overview of our approach. Section IV-B introduces OneID-LDP. Section IV-C formalizes a reidentification risk. Section IV-D (resp., IV-E) shows the relationship between OneID-LDP (resp., MinID-LDP) and the reidentification risk. Section IV-F proposes our privacy budget tuning algorithms. The proofs of all statements in this section are given in Appendix A.

(Footnote to Section III-D: we also confirmed that FindMinimum provides higher utility than NMinimize, another solver for nonconvex optimization problems.)
A. Overview

Fig. 1 shows the overview of our approach. Our approach consists of two phases: 1) the privacy budget tuning phase and 2) the frequency estimation phase.
In the privacy budget tuning phase, a data collector calculates privacy budgets E = {ε x } x∈X using a privacy budget tuning algorithm proposed in this article. This algorithm outputs privacy budgets E such that obfuscated data prevent reidentification. It can optionally take some auxiliary data as input. We propose one budget tuning algorithm without any auxiliary data and two budget tuning algorithms with auxiliary data. We explain their details in Section IV-F. After calculating privacy budgets E, the data collector distributes E to each user.
In the frequency estimation phase, each user u i ∈ U uses a randomized mechanism providing OneID-LDP, which is introduced in Section IV-B. By using OneID-LDP as a privacy metric, we can strongly prevent reidentification, as explained in Sections IV-C and IV-D. Each user u i obfuscates her personal data X (i) using OneID-LDP with privacy budgets E and sends obfuscated data Y (i) to the data collector. Finally, the data collector calculates an estimateĉ of the frequency distribution c from the obfuscated data.

B. OneID-LDP
We now introduce OneID-LDP as a privacy metric. As described in Section III-C, MinID-LDP adopts the minimum of ε_x and ε_{x′} as a privacy budget for a pair of x and x′ (see Definition 3). In contrast, OneID-LDP uses only one privacy budget ε_x for this pair. Formally, it is defined as follows.

Definition 5 (E-OneID-LDP): Let ε_x ∈ R≥0 be a privacy budget for personal data x ∈ X, and let E = {ε_x}_{x∈X}. A randomized mechanism Q provides E-OneID-LDP if and only if it provides (E, r)-ID-LDP with r(ε_x, ε_{x′}) = ε_x. In other words, Q provides E-OneID-LDP if and only if for any x, x′ ∈ X and any y ∈ Y, Q(y|x) ≤ e^{ε_x} Q(y|x′).

Fig. 2 shows an example of the pair budgets r(ε_x, ε_{x′}) in MinID-LDP, HLLDP, and OneID-LDP when ε_1 = ε_2 = 1, ε_3 = 2, ε_4 = 4, ε_5 = 8, X_S = {x_1, x_2}, and ε_S = 1. In this example, x_1 and x_2 are more sensitive than the others, and x_5 is the least sensitive. MinID-LDP adopts the minimum of ε_x and ε_{x′} as a pair budget for x and x′. Consequently, it uses a small pair budget (= 1 or 2) for most pairs. Thus, MinID-LDP lacks utility, as shown in our experiments.
HLLDP is a special case of OneID-LDP where ε_x is set by (2). HLLDP sets the privacy budget of nonsensitive data (x_3, x_4, and x_5) to ∞. This is too drastic and leads to reidentification, as shown in our experiments.
In contrast, OneID-LDP provides more fine-grained protection for each input value and prevents reidentification attacks, as shown in Section IV-D. Moreover, OneID-LDP uses only one privacy budget ε_x for a pair of x and x′. Therefore, it uses large pair budgets for less sensitive data (x_3, x_4, and x_5). Consequently, OneID-LDP provides much higher utility than MinID-LDP while preventing reidentification.
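The pair budgets in the example above can be tabulated as follows (a small sketch; x_1, ..., x_5 and their budgets are taken from the example, and the function names are illustrative):

```python
eps = {"x1": 1, "x2": 1, "x3": 2, "x4": 4, "x5": 8}

def min_pair_budget(x, xp):
    """MinID-LDP: r(eps_x, eps_x') = min(eps_x, eps_x')."""
    return min(eps[x], eps[xp])

def oneid_pair_budget(x, xp):
    """OneID-LDP: only the budget of the true input x constrains the pair."""
    return eps[x]

# MinID-LDP uses the small budget 1 for the pair (x5, x1):
# min_pair_budget("x5", "x1") == 1
# whereas OneID-LDP uses the large budget 8 for the same direction:
# oneid_pair_budget("x5", "x1") == 8
```

This is why OneID-LDP can add far less noise to the least sensitive value x_5 than MinID-LDP does.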
It is well known that DP has basic properties, such as compositionality and immunity to post-processing [7], [15]. OneID-LDP also has these properties.
For example, assume that a user obfuscates k (> 1) data using a mechanism providing E-OneID-LDP, where E = {ε_x}_{x∈X}. Then, by Proposition 2, we obtain E*-OneID-LDP in total, where E* = {kε_x}_{x∈X}. By Proposition 3, this privacy guarantee is immune to any post-processing algorithm run by the data collector.
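A sketch of the sequential-composition argument behind this claim, assuming k independent invocations of a mechanism Q providing E-OneID-LDP on the same input:

```latex
\prod_{j=1}^{k} Q\bigl(y_j \mid x\bigr)
  \le \prod_{j=1}^{k} e^{\varepsilon_x}\, Q\bigl(y_j \mid x'\bigr)
  = e^{k\varepsilon_x} \prod_{j=1}^{k} Q\bigl(y_j \mid x'\bigr)
```

so the joint output (y_1, ..., y_k) satisfies the OneID-LDP inequality with budget kε_x for every x, i.e., E*-OneID-LDP with E* = {kε_x}_{x∈X}.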

C. Formalizing Reidentification Risk
Next, we formalize a reidentification risk. Let U be a random variable representing a user in U. Let Y be a random variable representing the obfuscated data of U. We assume that user U sends obfuscated data Y to a data collector and that Y is leaked to an adversary. Since each user sends a single datum, the prior distribution of U before obtaining Y is uniform for this adversary; i.e., Pr(U = u_i) = 1/n for any u_i ∈ U. Assume that Y takes a value y ∈ Y. The adversary attempts to determine whether U is u_1, u_2, ..., or u_n based on Y = y.
Let p_{U|Y=y} be the posterior distribution, whose element p_{U|Y=y}(u_i) represents the posterior probability that U is u_i; i.e., p_{U|Y=y}(u_i) = Pr(U = u_i | Y = y). Using the posterior distribution, we can define the reidentification accuracy of the Bayes classifier. Specifically, let Acc_{U|Y=y} be the following quantity:

Acc_{U|Y=y} = max_{u_i ∈ U} p_{U|Y=y}(u_i).

Acc_{U|Y=y} is the reidentification accuracy of the Bayes classifier after observing Y = y. In other words, it is the highest possible reidentification accuracy. Acc_{U|Y=y} is the reidentification risk caused by sending Y = y.
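The Bayes classifier's reidentification accuracy can be sketched directly from the definitions above: with a uniform prior, the posterior is proportional to the mechanism's likelihood, and Acc_{U|Y=y} is the maximum posterior over users. The mechanism matrix `Q` and the user assignment below are hypothetical, chosen only for illustration.

```python
# Uniform prior over users; Q[x][y] = Pr(Y = y | X = x).
# Hypothetical 2-input, 2-output mechanism for illustration.
Q = {"x1": {"y1": 0.7, "y2": 0.3},
     "x2": {"y1": 0.4, "y2": 0.6}}
user_data = {"u1": "x1", "u2": "x1", "u3": "x2"}  # each user's personal data

def posterior(y):
    # p_{U|Y=y}(u_i) is proportional to Pr(U = u_i) * Q(y | x_i);
    # the uniform prior 1/n cancels in the normalization.
    weights = {u: Q[x][y] for u, x in user_data.items()}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

def reident_accuracy(y):
    # Acc_{U|Y=y}: the maximum posterior probability over all users,
    # i.e., the accuracy of the Bayes classifier after observing Y = y.
    return max(posterior(y).values())

print(reident_accuracy("y1"))  # 0.7 / (0.7 + 0.7 + 0.4) ≈ 0.389
```

Note that a random guess would succeed with probability 1/3 here; the leaked output raises the adversary's accuracy to about 0.389.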

D. Relationship Between OneID-LDP and Reidentification Accuracy
We prove that OneID-LDP can be used to upper bound the reidentification accuracy Acc U|Y=y by a desired value.
Theorem 1: Let ε_x ∈ R_{≥0} be a privacy budget for personal data x ∈ X. Let E = {ε_x}_{x∈X}. Let Q be a randomized mechanism providing E-OneID-LDP, where

ε_x = log( γ(n − c(x)) / (n − γ c(x)) )  if c(x) < n/γ, and ε_x = ∞ otherwise   (7)

and γ ∈ [1, n]. Then, for any y ∈ Y output by Q,

Acc_{U|Y=y} ≤ γ/n.   (8)
Theorem 1 states that if we set the privacy budgets E = {ε_x}_{x∈X} by (7), then we can upper bound the reidentification accuracy by γ/n for any obfuscated data y and, hence, for any user. Note that even if the adversary randomly guesses U, the reidentification accuracy is 1/n. By (7), this accuracy is achieved when γ = 1, in which case ε_x = 0 for every x; i.e., no utility.
A study in [22] proposed a privacy notion that upper bounds an average reidentification accuracy over all users. However, this average notion is weak because some users can be victims; e.g., even if the average reidentification accuracy is 1%, the adversary may reidentify 1% of all users with high confidence. In contrast, OneID-LDP with E in (7) upper bounds the reidentification accuracy for every user; i.e., the adversary cannot reidentify any user with high confidence. Thus, OneID-LDP is very strong in that there are no victims.
The value γ should be larger than 1 and much smaller than n to guarantee a small reidentification risk for every user with high utility. For example, we set γ = 100 (≪ n) in our experiments. Then, by (7), ε_x ≈ log γ for an unpopular input value x whose frequency c(x) is much smaller than n/γ. ε_x is much larger or ∞ for a popular input value x whose frequency c(x) is close to n/γ or more. This means that for popular input values, we can strongly prevent reidentification with very little noise.
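The dependence of ε_x on c(x) can be sketched as follows. This assumes the form of (7) reconstructed from the bound in Appendix A, namely ε_x = log(γ(n − c(x))/(n − γc(x))) when c(x) < n/γ and ∞ otherwise; the numeric values of n and γ are only examples.

```python
import math

def budget(c_x, n, gamma):
    """Privacy budget eps_x from (7): finite when c(x) < n/gamma, infinite
    otherwise (reconstructed form, assuming
    eps_x = log(gamma * (n - c(x)) / (n - gamma * c(x)))."""
    if c_x < n / gamma:
        return math.log(gamma * (n - c_x) / (n - gamma * c_x))
    return math.inf

n, gamma = 2_000_000, 100
print(budget(0, n, gamma))       # log(gamma) ≈ 4.605: the unpopular-value budget
print(budget(15_000, n, gamma))  # larger budget as c(x) approaches n/gamma
print(budget(25_000, n, gamma))  # inf once c(x) >= n/gamma (= 20,000 here)
```

As the text notes, the budget is ≈ log γ for unpopular values and grows without bound as c(x) approaches n/γ, so popular values need almost no noise.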

E. Relationship Between MinID-LDP and Reidentification Accuracy
We prove that MinID-LDP can also upper bound the reidentification accuracy Acc U|Y=y by a desired value.
Proposition 4: Let ε_x ∈ R_{≥0} be a privacy budget for personal data x ∈ X. Let E = {ε_x}_{x∈X}. Let Q be a randomized mechanism providing E-MinID-LDP, where E is set by (7) and γ ∈ [1, n]. Then, for any y ∈ Y output by Q, Acc_{U|Y=y} ≤ γ/n.
By Theorem 1 and Proposition 4, the privacy budgets are the same between OneID-LDP and MinID-LDP. This means that our privacy budget tuning algorithms for OneID-LDP, which are proposed in Section IV-F, can also be used for determining privacy budgets in MinID-LDP to prevent reidentification.

F. Automatic Tuning of Privacy Budgets E
In Section IV-D, we showed that we can upper bound the reidentification accuracy Acc U|Y=y by using OneID-LDP with privacy budgets E = {ε x } x∈X in (7). However, E in (7) includes the true frequency distribution c. Unfortunately, the data collector cannot obtain c in advance, because the goal for the data collector is to estimate c.
Therefore, we propose three privacy budget tuning algorithms, none of which uses the true frequency distribution c. The three algorithms differ in the auxiliary data used as input. The first algorithm does not use any auxiliary data and determines the privacy budget ε_x based on the worst case value (i.e., the smallest possible value) of c(x). We refer to this algorithm as the worst case tuning algorithm. The second algorithm assumes that the data collector knows that c(x) is larger than or equal to some value for some input values x. It determines ε_x by using this prior knowledge as auxiliary data. We refer to this algorithm as the prior-based tuning algorithm. Note that this prior knowledge is weak in that the data collector does not know the value of c(x) itself. The third algorithm uses obfuscated data of some users, output by OneID-LDP mechanisms, as auxiliary data. It estimates a confidence interval of c(x) from the obfuscated data and determines ε_x based on the confidence interval. We refer to this algorithm as the confidence interval tuning algorithm. As explained in Section IV-E, all three algorithms can be applied to both OneID-LDP and MinID-LDP.
Below, we explain these algorithms in detail.

Worst Case Tuning: The worst case tuning algorithm uses the fact that ε_x in (7) takes the smallest value when c(x) = 0. Specifically, it outputs privacy budgets E = {ε_x}_{x∈X}, where

ε_x = log γ for every x ∈ X.   (9)

Then, the reidentification accuracy Acc_{U|Y=y} is bounded by γ/n for any y ∈ Y. The worst case tuning algorithm does not use any auxiliary data as input. The next two algorithms provide higher utility than this algorithm by using auxiliary data.
Prior-Based Tuning: The prior-based tuning algorithm uses some weak prior knowledge about the frequency count c(x). Specifically, it assumes that the data collector knows c(x) is larger than or equal to some threshold for some input values x. This assumption is reasonable in many practical scenarios. For example, suppose that the data collector wants to estimate a population distribution in 47 prefectures of Japan from two million users (n = 2 × 10 6 ). It is well known that more than 11% of people live in Tokyo. Thus, the data collector would know that c(x) ≥ 10 5 for Tokyo. Note that the data collector does not know the exact value of c(x) in Tokyo. We can use OneID-LDP mechanisms with this prior knowledge to accurately estimate the exact value of c(x) in Tokyo.
Formally, let X̂ ⊆ X be the set of personal data for which the data collector has prior knowledge; the data collector knows that c(x) is larger than or equal to a threshold λ(x) ∈ Z_{≥0} for x ∈ X̂. Then, the prior-based tuning algorithm assigns λ(x) to c(x) for x ∈ X̂ and 0 to c(x) for x ∉ X̂ in (7). That is, it takes λ(x) for x ∈ X̂ as input and outputs privacy budgets E = {ε_x}_{x∈X}, where ε_x is given by (7) with c(x) replaced by λ(x) for x ∈ X̂ and by 0 for x ∉ X̂. Note that c(x) ≥ λ(x) for x ∈ X̂ and c(x) ≥ 0 for x ∉ X̂. Since ε_x in (7) is monotonically increasing with respect to c(x), Acc_{U|Y=y} is bounded by γ/n for any y ∈ Y.
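The substitution of λ(x) into (7) can be sketched as below, using the Tokyo example from the text. The budget formula assumes the form of (7) reconstructed from Appendix A (ε_x = log(γ(n − c(x))/(n − γc(x))) when c(x) < n/γ, else ∞), and the abbreviated prefecture domain is hypothetical.

```python
import math

def budget(c_x, n, gamma):
    # eps_x from (7) (reconstructed form): finite iff c(x) < n/gamma.
    if c_x < n / gamma:
        return math.log(gamma * (n - c_x) / (n - gamma * c_x))
    return math.inf

def prior_based_tuning(domain, lam, n, gamma):
    """Prior-based tuning: substitute the known lower bound lambda(x) for
    c(x) where available, and 0 (the worst case) everywhere else."""
    return {x: budget(lam.get(x, 0), n, gamma) for x in domain}

# Example from the text: n = 2 million users, and the prior knowledge that
# c(x) >= 10^5 for Tokyo (more than 11% of people live there).
n, gamma = 2_000_000, 100
prefectures = ["Tokyo", "Osaka", "Kyoto"]  # abbreviated domain for illustration
E = prior_based_tuning(prefectures, {"Tokyo": 100_000}, n, gamma)
print(E["Tokyo"], E["Osaka"])  # Tokyo: inf (10^5 >= n/gamma); Osaka: log(gamma)
```

Because λ(x) is a lower bound on c(x) and ε_x is monotone in c(x), the resulting budgets never exceed the ones that the true counts would allow, so the γ/n bound is preserved.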
Confidence Interval Tuning: The prior-based tuning algorithm assumes weak prior knowledge about the frequency count c(x). Although this is reasonable in many practical scenarios, as explained above, the data collector may have no prior knowledge about c(x) in some cases. For example, the data collector may not have any prior knowledge about c(x) for health conditions collected from new wearable devices.
For this scenario, we propose the confidence interval tuning algorithm to estimate c more accurately than the worst case tuning algorithm. In the confidence interval tuning, we divide users into two groups: 1) the worst case group and 2) the confidence interval group. First, we use the worst case tuning algorithm for users in the worst case group and collect their obfuscated data, which provide OneID-LDP. Then, we estimate the confidence interval of c(x) from the obfuscated data. Based on the confidence interval, we determine privacy budgets E = {ε_x}_{x∈X} for users in the confidence interval group. Finally, we collect their obfuscated data, which also provide OneID-LDP, and calculate an estimate ĉ. All users are thus protected by OneID-LDP.
Formally, let U_0 ⊆ U be the worst case group. Without loss of generality, we assume that the worst case group is U_0 = {u_1, ..., u_{n_0}}, where n_0 ∈ [n]. Each user in the worst case group U_0 obfuscates her personal data using an IDUE mechanism (described in Section III-D) to provide OneID-LDP with E set by (9), i.e., worst case tuning. Then, she sends her obfuscated data to the data collector. Let Y_0 = {Y^{(1)}, ..., Y^{(n_0)}} be the set of obfuscated data of U_0. The data collector estimates a confidence interval of c(x) from Y_0 and determines E based on the interval.

Algorithm 1 Confidence Interval Tuning
1: z ← ZValue(α)
2: for i = 1 to |X| do
3:   t_i ← FrequencyCount(Y_0, i)
4:   estimate the minimum value of r(x_i) by the Wilson score interval (13)
5:   estimate the minimum value of c(x_i) by (11)
6:   if c(x_i) < n/γ then
7:     ε_{x_i} ← log( γ(n − c(x_i)) / (n − γ c(x_i)) )
8:   else
9:     ε_{x_i} ← ∞
10:  end if
11: end for

Algorithm 1 shows our confidence interval tuning algorithm. Assume that the input domain is X = {x_1, ..., x_{|X|}} without loss of generality. For i ∈ [|X|], the data collector estimates a confidence interval of c(x_i) and sets ε_{x_i} based on the interval (lines 2-11). Recall that the output range is Y = {0, 1}^{|X|}. Let t_i be the number of "1"s in the ith bit of the output data in Y_0 (output of FrequencyCount(Y_0, i) in line 3). The relative frequency of "1" (resp., "0") in the ith bit of the input data is c(x_i)/n (resp., 1 − c(x_i)/n). Thus, by (4), the probability that the ith bit of the output data is "1" can be written as a function of c(x_i); let r(x_i) denote this probability [(11) and (12)]. Since r(x_i) is the probability that the ith bit of the output data is "1," we can assume that the number t_i of "1"s in the ith bit of the output data is generated from the binomial distribution B(n_0, r(x_i)) with success probability r(x_i). A confidence interval for r(x_i) is known as the binomial proportion confidence interval [53], [54], [55]. It can be estimated from t_i and n_0 using estimators such as the normal approximation interval and the Wilson score interval. We estimate the confidence interval of r(x_i) from t_i and n_0; here, we use the Wilson score interval because it is accurate [53], [54], [55]. Specifically, let z ∈ R_{≥0} be the 1 − α/2 quantile of the standard normal distribution N(0, 1) corresponding to the significance level α (output of ZValue(α) in line 1). For example, for a 95% (resp., 99%) confidence interval, α = 0.05 and z = 1.96 (resp., 2.576).
Then, the Wilson score interval of r(x_i) is given by

r(x_i) ∈ [ ( t_i/n_0 + z²/(2n_0) ± z √( (t_i/n_0)(1 − t_i/n_0)/n_0 + z²/(4n_0²) ) ) / (1 + z²/n_0) ],   (13)

where n_0 = |Y_0|. By (11), we can calculate the confidence interval of c(x_i) corresponding to the interval of r(x_i) in (13) (if c(x_i) becomes negative, we set c(x_i) = 0). Since ε_x in (7) is a nondecreasing function of c(x), we adopt the minimum values of r(x_i) and c(x_i) in the intervals (lines 4 and 5); in other words, we consider the worst case about c(x_i) in the confidence interval. Then, we set E = {ε_{x_i} | 1 ≤ i ≤ |X|} by (7) (lines 6-10). Finally, each user in the confidence interval group U \ U_0 obfuscates her personal data using an IDUE mechanism to provide OneID-LDP with E output by the confidence interval tuning algorithm (Algorithm 1). The data collector estimates c from the obfuscated data Y. Note that the worst case group and the confidence interval group use different randomized mechanisms. To deal with this difference, we calculate an empirical estimate [12] for each group and then calculate an estimate ĉ of c by inverse-variance weighting [56] of the two empirical estimates.
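The lower end of the Wilson score interval, which the algorithm uses as the worst case value of r(x_i), can be sketched as follows. The formula is the standard Wilson score bound; the counts in the example (and the name `n_0` for the worst case group size) are illustrative assumptions.

```python
import math

def wilson_lower(t, n0, z):
    """Lower end of the Wilson score interval for a binomial proportion,
    given t successes out of n0 trials and z the 1 - alpha/2 quantile of
    the standard normal distribution."""
    p_hat = t / n0
    denom = 1 + z * z / n0
    center = p_hat + z * z / (2 * n0)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n0 + z * z / (4 * n0 * n0))
    return (center - margin) / denom

# Example: 300 "1"s in the i-th bit among n_0 = 1000 obfuscated records,
# 95% confidence (alpha = 0.05, z = 1.96).
lo = wilson_lower(300, 1000, 1.96)
print(lo)  # ≈ 0.272: the worst case (smallest) value of r(x_i) in the interval
```

Using this lower bound, and then the smallest consistent c(x_i), keeps the tuned budgets conservative: a smaller assumed count yields a smaller ε_{x_i}, so the reidentification bound is preserved as long as the true count lies inside the interval.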
Significance Level: The utility and privacy of our confidence interval tuning depend on the significance level α. If we set the significance level α to α = 0, then z = ∞ and the estimated minimum value of c(x i ) becomes 0. Thus, the confidence interval tuning with α = 0 is identical to the worst case tuning. As α is increased from 0, the privacy budgets E become larger and the utility is increased. However, the true frequency count c(x i ) can be smaller than the estimated minimum value with probability (α/2). Thus, the reidentification accuracy Acc U|Y=y may exceed (γ /n) when α is too large.
In our experiments, we set α = 0.05 and show that Acc U|Y=y does not exceed (γ /n) in this case.
Which Tuning Algorithm to Use? We have so far proposed three privacy budget tuning algorithms. Here, we provide a guideline for which algorithm to use in practice.
As we will show in our experiments, an appropriate tuning algorithm depends on the task of the data collector and the prior knowledge about the frequency count c(x). In some tasks, frequency counts of popular input values [e.g., c(x) ≥ n/γ] are especially important; e.g., they are used for finding popular POIs [2] and for automatic labeling of POIs, such as offices and schools [57]. For popular input values, the worst case tuning provides the lowest utility, and the prior-based tuning provides the highest utility. Thus, if we have some prior knowledge about c(x), we should use the prior-based tuning algorithm. Otherwise, the confidence interval tuning could be the best choice.
However, for unpopular input values [e.g., c(x) ≪ n/γ], our three tuning algorithms provide almost the same utility. Thus, if we want to estimate frequency counts of unpopular input values, the worst case tuning would be sufficient.

V. EXPERIMENTAL EVALUATION
In this section, we show through experiments that our algorithms provide much higher utility than LDP mechanisms while preventing reidentification. We also show that existing ID-LDP mechanisms (i.e., MinID-LDP and HLLDP mechanisms) lack either utility or privacy.
Section V-A explains our experimental setup. Section V-B reports our experimental results.

A. Experimental Setup
Data Set: We conducted experiments using the following four real data sets.
1) Foursquare Data Set: The Foursquare data set (Global-scale Check-in Data Set with User Social Networks) [58] is a large-scale location data set. It includes 90 048 627 check-ins all over the world, each of which is associated with a POI ID and a venue category (e.g., hospital, restaurant, park, and university). Following [58], we selected six cities with numerous check-ins and cultural diversity: Istanbul (denoted by IST), New York (NYK), Tokyo (TKY), São Paulo (SP), Kuala Lumpur (KL), and Jakarta (JK). We extracted one check-in from each user.

2) Localization Data Set: The Localization data set [3] (denoted by Local) is a person activity data set collected using wearable sensors. It includes 164 860 records, each of which has an activity value, such as walking, falling, lying, and on all fours (11 values in total).

3) ADL Data Set: The activities of daily living (ADL) data set [59] (denoted by ADL) is a person activity data set collected using a wireless sensor network. It includes 741 records, each of which has an activity value, such as toileting, sleeping, showering, and lunch (10 values in total).

4) RFID Data Set: The RFID-based activity recognition data set [60] (denoted by RFID) is a person activity data set collected from older people using RFID reader antennas around rooms. It includes 75 128 records, each of which has an activity value, such as sitting on a bed, lying on a bed, and ambulating (4 values in total).

In Local, ADL, and RFID, we assumed that each record is from a different user. In the Foursquare data set, we assumed that input values (POIs) with "home" or "hospital" categories are sensitive. In the person activity data sets, we assumed sleeping (or lying/lying down), toileting, and showering to be sensitive because they reveal detailed life patterns. We set the privacy budgets for these sensitive input values to 1 [15] to strongly protect them, as described in Section I.
Table IV shows the number of users, input values, and sensitive input values in each data set.

3) MinID-LDP Mechanism: A randomized mechanism providing E-MinID-LDP. To provide MinID-LDP, we used the optimal IDUE mechanism in Section III-D.

4) HLLDP Mechanism: A randomized mechanism providing (X_S, ε_S)-HLLDP. Specifically, we used the utility-optimized RAPPOR [13] as an HLLDP mechanism.

5) OneID-LDP Mechanism: Our E-OneID-LDP mechanism in Section IV-B. To provide OneID-LDP, we used the optimal IDUE mechanism.

Parameters: We set the privacy budgets for sensitive input values to 1, as explained above. Since LDP regards all input values as equally sensitive, we set ε = 1 for RAPPOR and OUE. For HLLDP, we set the privacy budget to ε_S = 1 for the sensitive input values X_S and to ∞ for the remaining input values. For the MinID-LDP and OneID-LDP mechanisms, we used our privacy budget tuning algorithms to determine the privacy budgets E.
Our privacy budget tuning algorithms have three parameters: γ, α, and n_0 (α and n_0 are used only in the confidence interval tuning algorithm). In our experiments, we set γ = 10 or 100; i.e., we set E so that the reidentification accuracy is smaller than 10/n or 100/n (see Theorem 1). In the prior-based tuning algorithm, we assumed that the data collector knows the popular personal data x whose frequency c(x) is larger than or equal to n/γ. In other words, we used the set of popular personal data as X̂ and set λ(x) = n/γ for x ∈ X̂. In the confidence interval tuning algorithm, we set the significance level α to 0.01 or 0.05 and assumed that 10% or 50% of users are in the worst case group; i.e., n_0 = 0.1n or 0.5n.
We set γ = 100, α = 0.05, and n 0 = 0.1n as default values. Then, we changed each of γ , α, and n 0 while fixing the other two to see how each parameter affects the performance.
Utility and Privacy: We evaluated the utility and privacy of the randomized mechanisms.
For utility loss, we evaluated the MAE and MSE over all input values, as described in Section III-B. We also evaluated the MAE and MSE over popular input values x whose frequency counts c(x) are larger than or equal to (n/100).
For privacy, we considered the following reidentification attack. In our experiments, the input domain is X = {x 1 , . . . , x |X | } and the output range is Y = {0, 1} |X | . Given obfuscated data y ∈ Y, the adversary extracts indices whose corresponding values in y are 1. Then, the adversary chooses an index i whose privacy budget ε x i is the largest among the extracted indices. Finally, the adversary outputs a user who has x i as a reidentification result (if multiple users have personal data with the largest privacy budget, then the adversary randomly outputs one user from them).
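The reidentification attack described above can be sketched as follows. This is a minimal illustration of the adversary's decision rule, not the paper's evaluation code; the 3-value domain, budgets, and user assignment are hypothetical.

```python
import random

def attack(y, budgets, users_by_value):
    """Attack from the text: among indices i with y[i] = 1, pick the one
    whose budget eps_{x_i} is largest, then output a user who holds x_i
    (uniformly at random among ties)."""
    ones = [i for i, bit in enumerate(y) if bit == 1]
    if not ones:
        return None
    i = max(ones, key=lambda j: budgets[j])
    candidates = users_by_value.get(i, [])
    return random.choice(candidates) if candidates else None

# Hypothetical 3-value domain: x_3 (index 2) has the largest budget, so a
# "1" in that bit dominates the adversary's guess.
budgets = [1.0, 2.0, 8.0]
users_by_value = {0: ["u1", "u2"], 1: ["u3"], 2: ["u4"]}
print(attack([1, 0, 1], budgets, users_by_value))  # "u4": bit 2 has budget 8
```

The rule exploits exactly what HLLDP gives away: bits with ε_x = ∞ are (almost) never flipped, so a "1" there pinpoints the input value, and unique input values pinpoint a user.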
Note that this adversary knows the values of privacy budgets and each user's personal data x i , i.e., maximum-knowledge attacker [61], [62]. The maximum-knowledge attacker model is useful for evaluating reidentification risks when we assume a worst case scenario about the adversary's background knowledge. It also poses a threat in some practical situations. For example, if a user sends some additional information (e.g., time and health condition) along with x i (e.g., location) from her wearable device, the adversary can link the additional information to the user by this reidentification attack. The linked information may also be used for reidentifying other databases or making a user profile, as described in Section I.
We implemented the above reidentification attack and evaluated a reidentification rate, which is the proportion of correctly identified data. For both utility and privacy, we ran a randomized mechanism 1000 times and evaluated the average performance.

B. Experimental Results
Utility: Figs. 3 and 4 show the MAE/MSE over popular input values and all input values, respectively (W, C, and P in the parentheses represent the worst case tuning, confidence interval tuning, and prior-based tuning, respectively). Here, we set γ = 100, α = 0.05, and n_0 = 0.1n (later, we will change the values of γ, α, and n_0). In JK, there are no popular input values such that c(x) ≥ n/100. Thus, we do not show results for JK in Fig. 3.
Figs. 3 and 4 show that the LDP mechanisms (RAPPOR and OUE) provide poor utility. This is because LDP regards all input values as equally sensitive. MinID-LDP provides utility similar to LDP, and the prior-based tuning does not improve the utility of MinID-LDP. This is because MinID-LDP uses a small pair budget when either of the two input values is sensitive; in other words, it still overprotects personal data. Figs. 3 and 4 also show that HLLDP provides the highest utility. However, this comes at the expense of privacy: later, we will show that HLLDP is vulnerable to reidentification attacks and cannot be used for our purpose of privacy protection.
Except for the insecure HLLDP, our OneID-LDP mechanisms provide the best performance. They outperform the LDP and MinID-LDP mechanisms by one or two orders of magnitude. For popular input values, OneID-LDP (C) outperforms OneID-LDP (W), and OneID-LDP (P) provides the highest utility (see Fig. 3). For example, the MSEs of OneID-LDP (W), OneID-LDP (C), and OneID-LDP (P) in IST were 4.43, 3.72, and 2.04, respectively. In contrast, for all input values, all three of our OneID-LDP mechanisms provide almost the same utility (see Fig. 4). This is because most of the input values are unpopular (c(x) ≪ n/γ) and ε_x ≈ log γ for these input values in all three of our OneID-LDP mechanisms. In other words, the worst case tuning is sufficient for estimating the frequency counts of unpopular input values.
Thus, an appropriate tuning method depends on the task and the prior knowledge about c(x). If we want to accurately estimate popular input values and have some prior knowledge about c(x), then we should use the prior-based tuning. If we want to estimate popular input values without any prior, then we could use the confidence interval tuning. Otherwise, the worst case tuning would be sufficient.
Privacy: Next, we evaluated the reidentification risk. Fig. 5 shows the reidentification rate for all users when γ = 100, α = 0.05, and n_0 = 0.1n (later, we will change γ, α, and n_0). We show the results for HLLDP and our three OneID-LDP mechanisms (W, C, and P). We do not show the results for RAPPOR, OUE, and MinID-LDP because they lack utility, as shown in Figs. 3 and 4.

Fig. 5 shows that all three of our OneID-LDP mechanisms (W, C, and P) keep the reidentification rate smaller than the required value (= γ/n). This is because our privacy budget tuning algorithms determine privacy budgets in OneID-LDP so that the reidentification accuracy is bounded by γ/n, as described in Section IV-F. In contrast, the reidentification rate of HLLDP is much higher than the required value in the Foursquare data set. This is because HLLDP assigns ε_x = ∞ to nonsensitive data and reveals the corresponding input values. In Local, ADL, and RFID, the number |X| of input values is very small, as shown in Table IV. Thus, many users have the same input value, and reidentification is difficult in these data sets. However, |X| is very large in the Foursquare data set, and consequently, many users have a "unique" input value; i.e., many input values are associated with only one user. Therefore, HLLDP is vulnerable to the reidentification attack in the Foursquare data set.

[Fig. 5. Reidentification rate for all users (W: worst case tuning, C: confidence interval tuning, P: prior-based tuning, γ = 100, α = 0.05, and n_0 = 0.1n).]

[Fig. 6. Reidentification rate for outliers in the Foursquare data set (W: worst case tuning, C: confidence interval tuning, P: prior-based tuning, γ = 100, α = 0.05, and n_0 = 0.1n). Outliers have a unique input value and an output value with at least one "1" in nonsensitive bits.]
To show the vulnerability of HLLDP more comprehensively, we also evaluated the reidentification rate for "outliers" who have a unique input value and an output value with at least one "1" in nonsensitive bits. Fig. 6 shows the results in the Foursquare data set. We observe that the reidentification rate of HLLDP is 100%. This is because, in the HLLDP mechanism in [13], every output value with at least one "1" in nonsensitive bits reveals the corresponding input value. These output data are called invertible data in [13]. Since the invertible data reveal the corresponding input values, HLLDP allows the adversary to perfectly reidentify the outliers. Thus, HLLDP cannot be used to prevent reidentification.
In contrast, our three OneID-LDP mechanisms keep the reidentification rate smaller than the required value (= [γ /n]) even for the outliers. This is because OneID-LDP upper bounds the reidentification accuracy by (γ /n) for any obfuscated data y ∈ Y, hence, any user (Theorem 1).
Note that MinID-LDP also upper bounds the reidentification accuracy by (γ /n) because MinID-LDP uses smaller privacy budgets than OneID-LDP. However, it comes at the cost of utility, as shown in Figs. 3 and 4.
Effects of Parameters: We also examined how the parameters γ, α, and n_0 in our privacy budget tuning algorithms affect the utility and privacy. Fig. 7 shows the MAE/MSE over all input values when we set γ = 100 or 10. In addition, Figs. 8 and 9 show the reidentification rate for all users and outliers, respectively, when we set γ = 100 or 10.
Figs. 7-9 show that γ controls the privacy-utility tradeoff: as γ decreases from 100 to 10, the privacy is improved at the cost of utility. Figs. 8 and 9 also show that our OneID-LDP mechanisms keep the reidentification rate smaller than the required value (= γ/n), irrespective of the value of γ. This result demonstrates that our privacy budget tuning algorithms successfully determine the privacy budgets so that obfuscated data prevent reidentification, as desired.
Finally, we examined the effect of the other two parameters α and n_0 in our confidence interval tuning algorithm. Figs. 10-12 show the MAE/MSE over popular input values, the MAE/MSE over all input values, and the reidentification rate, respectively, when we change α and n_0. Fig. 10 shows that as the significance level α decreases from 0.05 to 0.01, the utility becomes worse, especially in IST and KL. This result is expected, as the privacy budgets decrease with the decrease in α. Fig. 12 shows that the privacy is slightly improved with the decrease in α. However, our OneID-LDP mechanisms keep the reidentification rate smaller than the required value even when α = 0.05, as shown in Figs. 8 and 9. Similarly, as n_0 increases, the utility becomes worse, and the privacy is slightly improved. This is because the users in the worst case group have smaller privacy budgets. Note that when n_0 = n, our confidence interval tuning is equivalent to the worst case tuning. Our confidence interval tuning provides higher utility when there are many users in the confidence interval group.

[Fig. 9. Effect of the parameter γ on the reidentification rate for outliers (W: worst case tuning, C: confidence interval tuning, P: prior-based tuning, α = 0.05, and n_0 = 0.1n).]
Summary: In summary, our experimental results show that the existing instances of ID-LDP lack either utility or privacy. Specifically, MinID-LDP still overprotects personal data, and, therefore, its utility gain over LDP (i.e., RAPPOR and OUE) is limited. HLLDP assigns ε_x = ∞ to nonsensitive data and, therefore, is vulnerable to the reidentification attack.
In contrast, our OneID-LDP mechanisms with our privacy budget tuning algorithms provide much higher utility than the LDP and MinID-LDP mechanisms while keeping the reidentification accuracy smaller than the required value. Thus, our mechanisms can be used for accurate analysis of personal data collected from IoT devices while strongly preventing reidentification, which is considered a major risk in GDPR.
One limitation of our proposed methods is that our confidence interval tuning algorithm does not theoretically upper bound the reidentification accuracy. As described in Section IV-F ("Significance Level"), the reidentification accuracy may exceed the required value (γ /n) when the significance level α is too large. If we want to theoretically upper bound the reidentification accuracy, we should use the worst case tuning algorithm or the prior-based tuning algorithm.

VI. CONCLUSION
We proposed three privacy budget tuning algorithms for ID-LDP to provide high utility while preventing reidentification. We also proposed OneID-LDP as a new instance of ID-LDP and proved that it upper bounds the reidentification accuracy for every user. Through experiments using four real data sets, we showed that existing ID-LDP mechanisms lack either utility or privacy. Then, we showed that our OneID-LDP mechanisms with our privacy budget tuning algorithms provide much higher utility than LDP mechanisms while keeping the reidentification accuracy smaller than the required value.
In this article, we focused on frequency estimation of personal data such as locations and person activity data as a task of the data collector. As future work, we would like to develop ID-LDP mechanisms and privacy budget tuning algorithms for more complicated tasks, such as item recommendation [63] and subgraph counting in a social graph [34].

APPENDIX A PROOF OF THEOREM 1
Recall that U and Y are random variables representing a user and the obfuscated data of U, respectively. Let X be a random variable representing the personal data of U. Personal data X is uniquely determined given U. Let x be the personal data of u_i. Then, c(x) ≥ 1, and the posterior probability p_{U|Y=y}(u_i) can be written as

p_{U|Y=y}(u_i) = Q(y|x) / Σ_{j=1}^{n} Q(y|x_j),   (14)

where x_j is the personal data of u_j. Since Q provides E-OneID-LDP, we have Q(y|x) ≤ e^{ε_x} Q(y|x_j), i.e.,

Q(y|x_j) ≥ e^{−ε_x} Q(y|x).   (15)

By (14) and (15), we have

p_{U|Y=y}(u_i) ≤ Q(y|x) / ( c(x) Q(y|x) + (n − c(x)) e^{−ε_x} Q(y|x) ) = e^{ε_x} / ( e^{ε_x} c(x) + (n − c(x)) ).
Since this inequality holds for any u_i ∈ U, and since for ε_x set by (7) the right-hand side is at most γ/n (with equality when ε_x is finite), (8) holds.

APPENDIX C PROOF OF PROPOSITION 3
Let x, x′ ∈ X, y ∈ Y, and z ∈ Range(λ). Since Q provides E-OneID-LDP, we have

(λ ∘ Q)(z|x) = Σ_{y∈Y} Pr(λ(y) = z) Q(y|x) ≤ e^{ε_x} Σ_{y∈Y} Pr(λ(y) = z) Q(y|x′) = e^{ε_x} (λ ∘ Q)(z|x′).

Thus, λ ∘ Q provides E-OneID-LDP.

APPENDIX D PROOF OF PROPOSITION 4
We prove Proposition 4 via the following lemma. Lemma 1: Let ε x ∈ R ≥0 be a privacy budget for personal data x ∈ X . Let E = {ε x } x∈X . If a randomized mechanism Q provides E-MinID-LDP, then Q also provides E-OneID-LDP.
Proof of Lemma 1: If a randomized mechanism Q provides E-MinID-LDP, then Q provides (E, r)-ID-LDP, where r(ε_x, ε_{x′}) = min{ε_x, ε_{x′}} (see Definition 3). This means that for any x, x′ ∈ X and any y ∈ Y, we have

Q(y|x) ≤ e^{min{ε_x, ε_{x′}}} Q(y|x′) ≤ e^{ε_x} Q(y|x′).

Thus, Q provides E-OneID-LDP.
Proposition 4 is immediately derived from Theorem 1 and Lemma 1. Specifically, a randomized mechanism Q providing E-MinID-LDP for E in (7) also provides E-OneID-LDP for E in (7) (by Lemma 1). Therefore, Acc_{U|Y=y} ≤ γ/n holds for any y ∈ Y (by Theorem 1).