Secure, ID Privacy and Inference Threat Prevention Mechanisms for Distributed Systems

This paper investigates how to facilitate the remote collection of a patient’s data in a distributed system while protecting the security of the data, preserving the privacy of the patient’s ID, and preventing inference attacks. The paper presents a novel framework called SPID, which stands for Secure, ID Privacy, and Inference Threat Prevention Mechanisms for Distributed Systems. In designing this framework, we make the following contributions. First, SPID presents an architecture that supports the use of a distributed set of servers owned by different service providers. Second, SPID allows the patient to access these servers using certificates generated by the patient. Third, SPID allows the patient to select one server as the home server and a number of other servers as foreign servers. The patient uses the foreign servers to upload data. The home server is responsible for collecting the patient’s data from the foreign servers and sending them to the healthcare provider. Fourth, SPID proposes a method for efficiently verifying each request from the patient without searching the server’s database for the verification key. This is done by exploiting properties of Elliptic Curve Cryptography (ECC). SPID has been analyzed using a benchmarking tool and evaluated using queuing theory. The evaluation results indicate efficient performance as the number of servers increases. We use Shannon entropy to measure the likelihood of the inference attack.


I. INTRODUCTION
The Internet of Things (IoT) can be defined as the paradigm of connecting smart things (e.g. sensors, devices) together by means of information and communication technologies to build intelligent systems and services that obtain required information. The smart things can sense the surrounding environment and communicate with each other to exchange information. These features (i.e. sensing and communicating) facilitate many emerging applications. One of these applications is Patient Health Monitoring (PHM) systems. A PHM system involves the use of mobile computing and wireless communication technologies to regularly collect data from a patient for the purpose of analyzing the patient's health and making health-related decisions [1], [2]. A typical PHM system, as shown in Figure 1, consists of body sensors and a mobile or fixed device at the patient's end (e.g. home) and remote servers at the healthcare provider's end. The body sensors worn by the patient are connected wirelessly to the device, which is, in turn, connected to the servers via wireless and/or wired networks. The health-related data (e.g. heart rate, blood pressure, etc.) collected by the body sensors are sent to the device and are then delivered to the healthcare provider for further analysis and decision making. The operations performed by a PHM system typically fall into three stages: data collection, data analysis (i.e., the analysis of the collected data), and decision making (based on the outcome of the data analysis). The data collection stage is crucial to the correct running of a PHM system, as the correctness of the analysis and decision making depends on the correctness of the data collected [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Yan.
It is expected that future PHM systems will be built on infrastructure owned by a third-party service provider, such as Microsoft [5], [6], which has greater storage and data processing capabilities. This is because on-premises infrastructures, on which most healthcare providers rely today, might not be able to handle the volume of data generated by wearable devices. It is estimated that by 2021 more than 222 million wearable devices may have been released onto the market and three in five patients may use remote monitoring services. These devices can generate data at high frequencies (e.g. every 5 minutes), producing massive volumes of data. When the patient's data is collected by a third-party service provider, a number of security and privacy threats arise. These threats are authentication threats, data confidentiality and authenticity threats, ID privacy threats, and inference threats. The last threat comes from using a single service provider.
The inference attack [7] occurs when an unauthorized entity (e.g. a third-party service provider) can use some attributes, such as the pattern of communication (e.g. how many times per day a patient uploads data to the service provider), to link multiple transactions to the same user. For example, a patient uploads some data (such as blood pressure) at 1 pm every day and other data (such as heart rhythm) every 10 minutes. Then, even if that patient uses a different pseudonym in each transaction, over time and by observing the upload pattern (i.e., 1 pm and every 10 minutes), the service provider may infer that all the data uploaded at these times belongs to the same patient. This is considered a threat to the patient's privacy. Many studies [8], [9], [10], [11], [12], [13], [14], [15] have been devoted to investigating mechanisms that prevent and detect authentication, data confidentiality, and data authenticity threats.
To prevent the inference attack, some research has suggested using group signature schemes and a broadcasting strategy to protect contextual privacy. Boussada et al. [12] present a privacy-preserving aware data transmission protocol that preserves the privacy of a patient's data and the contextual data. To preserve the patient's data privacy, the patient's health data is encrypted, and to preserve the contextual privacy (i.e., to prevent the inference attack) the protocol uses pseudonym IDs combined with a broadcasting strategy. Liang [14] proposes a privacy-preserving emergency call scheme, called PEC, enabling patients in life-threatening emergencies to transmit emergency data to nearby helpers via mobile healthcare social networks (MHSNs). In PEC, the ID of the patient is preserved using a group signature. Lin et al. [15] proposed a strong privacy-preserving scheme against global eavesdropping, named SAGE, for eHealth systems. SAGE uses a broadcasting strategy to prevent an adversary from linking patients to their respective physicians. Marin et al. [16] proposed a secure and ID privacy-preserving data collection protocol. The protocol considers a number of security and ID privacy-preserving requirements. To protect the ID privacy of the patient, they used a combination of group signature and pseudonym IDs. However, the reliance on a fixed data concentrator for a group of patients would make it easy for that entity to analyze the metadata associated with the messages and, over time, build up sensitive information about each patient.
In this paper we present an innovative framework called Secure, ID Privacy and Inference Threat Prevention Mechanisms for Distributed Systems (SPID). SPID combines a system architecture and software components to solve the problem of the inference threat and to prevent the other threats. The idea is to allow the patient to use different service providers to collect his/her data rather than relying on one provider. The system architecture consists of a healthcare provider (HCP) and a number of service providers (SPs) (e.g. Amazon, Google, Yahoo). To be authorized, these service providers should first communicate with the HCP and request to participate in the health data collection service. The patient can upload the health data to any authorized SP. Each SP is responsible for storing the data temporarily and then sending them to the HCP for further analysis and decision making. The SP is not allowed to learn anything about the patient's health data or the patient's real ID. In more detail, the patient can select a number of authorized SPs every day. One of the SPs acts as the home provider and the others act as foreign providers. The patient uses the foreign providers to upload the encrypted data. The home provider is responsible for collecting the patient's data from the foreign providers, aggregating them, and sending them to the HCP. The set of SPs is dynamic rather than static: the patient changes the home and foreign servers every day. The HCP does not need to know anything about the selected SPs (i.e., it does not know the patient's home and foreign servers). The patient uses a swarming algorithm to upload to the foreign servers (the detailed design of this algorithm is not covered in this paper). Moreover, each SP can verify the authenticity of the patient and his/her data. The HCP can also verify the authenticity of the patient's data.
Regarding the software components, SPID allows the patient to generate pseudonyms and certificates to access foreign servers. The certificates are generated by the patient and signed blindly by the patient's home server. To protect against the data authenticity and confidentiality threats, the patient encrypts and signs the data before uploading. The SPID framework has been analyzed using a benchmarking tool and evaluated using queuing theory. The evaluation results indicate efficient performance as the number of service providers increases. We use Shannon entropy to measure the likelihood of the inference attack.

II. RELATED WORKS
VOLUME 11, 2023
The related works are divided into two broad categories: one studies health data collection systems and the other examines authentication in distributed systems. Regarding the health data collection systems, in [8] the authors proposed a federated cloud and Internet of Things (IoT) health monitoring system. The system fails to satisfy a number of security and privacy requirements, such as ensuring patients' data confidentiality and authenticity, and patients' ID privacy. Similar to [8], the authors in [9] proposed a secure fog-based smart health service. The patient data is collected by heterogeneous fog layers. This system supports data encryption and confidentiality, but the patients' ID privacy was not considered. In [11], the authors proposed a privacy-preserving priority classification scheme, called PPC. The proposed system does not consider the ID privacy of the patient. The authors in [12] presented a privacy-preserving aware data transmission protocol for IoT-based health applications. The proposed system preserves the privacy of a patient's data and the contextual data. To preserve contextual privacy, the proposed system uses a building path algorithm and a broadcasting strategy. The authors in [15] proposed SAGE, which stands for Strong privacy-preserving scheme Against Global Eavesdropping for e-health systems. SAGE protects the confidentiality of the health data and cuts off the relationship between the patient and his/her physician. SAGE preserves the content and contextual privacy. This is done by first encrypting the health data and then sending the encrypted data to a health centre. The centre is responsible for broadcasting the data to all physicians. Then, only the intended physicians will be aware of the data of their patients. In [17], the authors proposed a remote health monitoring system based on blockchain technology called Healthchain. The architecture of the system supports large-scale health data and has good scalability.
In Healthchain, the authors neglect preserving the patient's ID privacy. In [18], the authors proposed a decentralized privacy-preserving healthcare blockchain for IoT. In this work, the patient uploads confidential and authenticated data to a cloud server. The cloud server verifies the data, generates a data block, generates a hash for the data block, and sends the hash of the data block to the blockchain. For anonymous transactions, the authors proposed a ring signature scheme. Each block has information about the sender (the patient) and the receiver (the doctor). Thus, the blockchain network can link the patients to their doctors. In this work, the patient uploads to the same server every time.
The architecture of the proposed framework SPID resembles the architecture of Federated Identity Management (FIM) systems [19]. The FIM system has several models: the centralized identity model, the user-centric identity model, and the decentralized identity model. The centralized identity model [19] relies on a central entity to conduct the authentication between two entities. Shibboleth [20] is an example of federated identity management systems. One of the early federated identity management systems that applied this approach was PseudoID [21]. In this work, the identity provider issues the user a number of credentials that are signed blindly using the identity provider's private key. To prevent the identity provider from tracking the user, the work in [22] presented a solution called Crypt-book. Crypt-book is a layer between a number of identity providers and service providers, which hides the real identity of a user from the service providers. In the user-centric identity model, the user is involved in generating credentials to access different service providers. In [23], the user performs authentication with different service providers based on a set of certified attributes issued by the user's identity provider. The Blank Digital Signature (BDS) allows the user to generate a signature on a subset of attributes which can be verified by the service provider. Another framework, proposed in [24], is called the SPICE framework. In this framework, the authors presented a novel authentication mechanism where the main service provider issues only one credential to each user no matter how many service providers the user wants to use. For authentication, the user generates (based on the credential) many certificates to prove the possession of (different sets of) attributes required by different service providers, without asking the registrar to issue a new certificate each time.
In the decentralized identity model, the power is given to a collection of nodes that use distributed ledger technology (DLT) to store identity information about users, where no single node can change the content of the ledger. The trust between nodes is reached by a consensus mechanism. The DLT is built using blockchain technology. In Trustroam [25], the authors proposed an authentication method to provide cross-domain roaming authentication. This means that a user from institute A can use his/her identity credentials to access a network of institute B. The authentication of this proposed solution is based on the distributed consensus algorithms of the blockchain. However, this method allows institutes to link different accesses by the same user, as the user uses the same identity across domains. To solve the linkability problem, the authors of [26] proposed blockchain lightweight anonymous authentication (BLA). The proposed mechanism allows a vehicle to access services across distributed domains. This mechanism allows the vehicle to use a different pseudonym each time to prevent linking multiple requests sent by the same vehicle.
From the critical analysis of the related works, we find that none of them provides a system that prevents all the threats (i.e. security, ID privacy, and inference threats). Therefore, this paper presents an innovative framework, SPID, that prevents all of these threats with acceptable performance.

III. SPID ARCHITECTURE
In this section we present the SPID architecture. The SPID architecture is different from the one shown in Figure 1. The generic and detailed SPID architectures are shown in Figures 2 and 3, respectively. The SPID architecture consists of the HCP and different service providers.

A. SPID COMPONENTS AND INTERACTIONS
• Service Providers: Each service provider plays two roles: home provider and foreign provider. This means that the same service provider may act as a home provider for one patient, but as a foreign provider for another patient. The foreign provider is used by the patient to upload the health data. The foreign provider forwards the health data to the patient's home provider. The home provider is responsible for aggregating the data and forwarding them to the HCP. It should be noted that the home and foreign providers for the patient are not static but dynamic. Figure 3 illustrates the SPID architecture from the patient's perspective, i.e., the idea of the home and foreign providers. In the rest of this paper we may refer to the home provider and the foreign provider as the home server and the foreign server, respectively, or collectively as data collection servers.
• The Healthcare Provider Server (HCP): This runs the business logic of the system. It is the ultimate destination for all patients' data. It is where this data is stored, processed, and analyzed. The HCP receives the patients' data from their respective home providers. In addition, the HCP issues a certificate to each service provider that wants to participate in the data collection service. Then, the HCP adds the service provider to its open-access database.
• The Patient: The patient wears small devices integrated with low-power computation, communication, and storage modules to measure and monitor health data (e.g. ECG, blood pressure). The health data is sent to a mobile device (e.g. via Bluetooth) for processing. The mobile device classifies the data as normal or abnormal, and structures the data into a predefined format, called Patient-Generated Health Data (PGHD). The PGHD is uploaded to foreign providers. In detail, each patient registers with one service provider, which is called the patient's home provider. The other providers are called the patient's foreign providers. The patient changes the home and foreign providers daily. In the rest of this paper we use the terms patient and mobile device interchangeably.

IV. THREATS ANALYSIS
In this section we analyze the potential threats to the SPID architecture.

FIGURE 3. A detailed SPID architecture. Each patient has one home and many foreign servers.
• Authentication threats: A malicious service provider or an external adversary may try to impersonate a legitimate service provider or patient, respectively, to gain unauthorised access to the patient's data.
• Data authenticity threats: The patient's data may be delayed, replayed, or even modified. A malicious service provider or an external adversary may try to forge data to make it seem as if it comes from a legitimate patient.
• Data confidentiality threats: If the patient's data is not protected in transit or in storage, a malicious service provider or an external adversary may gain access to the patient's data, e.g. by eavesdropping on the channel.
• Patient's ID privacy threats: Even if the patient is using different artificial IDs when registering with different service providers, the patient's sessions with the same service provider may be linked if the patient does not change the artificial ID for each session.

A. DESIGN REQUIREMENTS
Here we specify a set of design requirements which facilitate the design of secure and ID privacy-preserving protocols for the SPID framework. The SPID protocols should provide strong protection against the threats identified above.
• The system should preserve the patient's ID privacy (P): To satisfy this requirement, only the HCP can learn the real ID of the patient. To prevent linkability across different service providers, each service provider should identify the patient under a different pseudonym ID. To prevent linkability across sessions with the same service provider, the patient should use a different pseudonym ID for each session.
• The system should support mutual entity authentication (S1): Entity authentication ensures that a communicating entity is indeed who it claims to be. This requirement should be satisfied without compromising the patient's ID privacy.
• The system should support end-to-end data authenticity (S2): Data authenticity assures that data are indeed from the claimed source and that they are the same as sent by the original sender. Each service provider should be able to verify that the uploaded data is authentic without compromising the ID privacy of the patient.
• The system should support end-to-end data confidentiality (S3): This requirement protects against unauthorized access to data in transit or in storage.

B. HIGH LEVEL IDEAS
In this section we list a number of novel ideas to satisfy the design requirements.
• Design multiple index pseudonyms and multiple request pseudonyms to satisfy ID privacy preservation requirements.
- First, with each provider, the patient should have a pseudonym ID called the 'index pseudonym'. The index pseudonym serves as an account name. This is to prevent multiple providers from colluding to identify the patient.
- Second, for each request with the same provider, the patient should use a fresh pseudonym called a request pseudonym. The request pseudonym should be generated based on the index pseudonym. This is to make the linking process feasible. The pseudonym generation and linkage algorithms are presented in Section VI.
• Design a double signing and encryption method to satisfy the security requirements (S2, S3): This method is explained in Section VI.
• Design an anonymous authentication method to satisfy the security requirement (S1): To ensure authorized use of the data collection service, the patient should be identified and authenticated before s/he is allowed to access the collection service, and this should be achieved without compromising the patient's ID privacy. To support such authentication in a seamless and scalable manner, we designed two anonymous authentication credentials as follows.
- A Home Pseudonym Certificate (HCert) for authenticating the patient with the home provider and Foreign Pseudonym Certificates (FCerts) for authenticating the patient with foreign providers. Figure 4 and Figure 5 show the HCert and FCert, respectively.
- The HCert is generated by the HCP to allow the patient to register with any provider as a home provider.
- The FCert is generated by the patient and blindly signed by the patient's home provider. The FCert allows the patient to use any foreign provider.

C. NOTATION
The notation used throughout the paper is summarized below.

V. CRYPTOGRAPHY BUILDING BLOCKS
In this section we introduce the cryptographic building blocks which are used in designing our methods and protocols. We use Rivest, Shamir, Adleman (RSA), Advanced Encryption Standard (AES), Elliptic Curve Cryptography (ECC), blind signature based on ECC, and the digital certificate [27]. We here present only the ECC and the blind signature [7], [28].

A. ELLIPTIC CURVE CRYPTOGRAPHY (ECC)
The following defines the elliptic curve field and equation.
The elliptic curve over a finite field $\mathbb{Z}_n$, where $n > 3$ is prime, is the set of all pairs $(x, y)$ in $\mathbb{Z}_n$ which satisfy the equation $E: y^2 \equiv x^3 + ax + b \pmod{n}$, and the following condition should be satisfied: $4a^3 + 27b^2 \not\equiv 0 \pmod{n}$. The domain parameters which define the elliptic curve are $(t, a, b, P, n)$, where $n$ is the prime modulus, $a$ and $b$ are the coefficients of the elliptic curve equation, $P$ is the generator point, and $t$ is the number of points in the field.
The ECC comprises four algorithms: a key generation algorithm (EKeyGen), a key exchange and agreement algorithm (EKeyExg), a signature generation algorithm (ESigGen), and a signature verification algorithm (ESigVer). We assume two entities A and B execute the following algorithms.

1) EKeyGen
• Selects an integer as a random private key (EPR), where $0 < EPR < t$.
• Computes a public key (EPK) as $EPK = EPR * P$, where the operation (*) means multiplication of a curve point by an integer.

2) EKeyExg
• Both entities A and B exchange their public keys, $EPK_A$ and $EPK_B$.
• A and B compute their shared secret key (S) as follows: $S = EPR_A * EPK_B = EPR_B * EPK_A$.
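
The EKeyGen and EKeyExg steps can be illustrated with a minimal Python sketch. The curve used here is an assumption made purely for illustration: the small textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{Z}_{17}$ with generator $P = (5, 1)$ of order $t = 19$, which is far too small for real use. The function names are hypothetical and are not taken from the SPID implementation.

```python
import secrets

# Toy domain parameters (assumed for illustration only, not secure):
# curve y^2 = x^3 + 2x + 2 over Z_17, generator P = (5, 1), order t = 19.
N_MOD, A_COEF = 17, 2
P_GEN, T_ORDER = (5, 1), 19


def point_add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % N_MOD == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        slope = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % N_MOD, -1, N_MOD)
    else:
        slope = (y2 - y1) * pow((x2 - x1) % N_MOD, -1, N_MOD)
    slope %= N_MOD
    x3 = (slope * slope - x1 - x2) % N_MOD
    y3 = (slope * (x1 - x3) - y1) % N_MOD
    return (x3, y3)


def scalar_mul(k, point):
    """Compute k * point using double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def ekeygen():
    """EKeyGen: pick 0 < EPR < t and compute EPK = EPR * P."""
    epr = secrets.randbelow(T_ORDER - 1) + 1
    return epr, scalar_mul(epr, P_GEN)


def ekeyexg(own_epr, peer_epk):
    """EKeyExg: shared secret S = own private key * peer public key."""
    return scalar_mul(own_epr, peer_epk)


epr_a, epk_a = ekeygen()
epr_b, epk_b = ekeygen()
shared_a = ekeyexg(epr_a, epk_b)
shared_b = ekeyexg(epr_b, epk_a)
print(shared_a == shared_b)
```

Both sides arrive at the same point because $EPR_A * (EPR_B * P) = EPR_B * (EPR_A * P)$. A deployment would use a standardized curve from a vetted library rather than hand-written arithmetic.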

3) ESigGen
• A selects an integer as a random private key ($EPR_r$), where $0 < EPR_r < t$. Here $h(x)$ is a hash function.

4) ESigVer
• The verifier computes $Q = u_1 * P + u_2 * EPK_B$. This results in a point $(x_Q, y_Q)$.
• The verifier checks whether $r = x_Q \pmod{n}$; if so, the signature is valid.

B. ECC BLIND SIGNATURE

1) BLIND MESSAGE (BlndMsgGen) ALGORITHM
Here a sender wants to use the proxy blind signature service provided by the proxy signer. The sender first generates a blinding value ($R^*$) using Equation (1), where a, b, and c are random blinding factors. It then computes a hash value of the message (m) using Equation (2). It then blinds the hash value $e^*$ using Equation (3). After that, it sends a request for a blind signature on the hash value to the proxy signer.

2) BLIND SIGNATURE GENERATION (BlndSigGen) ALGORITHM
Once the request is received by the proxy signer, it generates the blind signature ($S^*$) on (e) using Equation (4). Then it sends the blind signature ($S^*$) to the sender.

3) BLIND SIGNATURE DERIVATION (BlndSigDrv) ALGORITHM
After receiving the blind signature ($S^*$) from the proxy signer, the sender derives the unblinded version of the signature (S) using Equation (5). This signature (S) can be checked by any verifier using Equation (6).

4) THE BLIND SIGNATURE VERIFICATION (BlndSigVer) ALGORITHM
This algorithm is executed by a verifier. After receiving the blind signature, the verifier verifies the proxy blind signature using Equation (6).

VI. SPID METHODS
In this section we present a number of methods that are used in designing the SPID protocols.

A. DOUBLE SIGNING AND ENCRYPTION
To satisfy the data confidentiality and authenticity requirements we designed the double signing and encryption method. The method (as shown in Figure 6) can be explained as follows. The patient's data, which is uploaded to several foreign servers, takes two forms. The first is the Encrypted Message Authenticated Code and Patient Generated Health Data (EMPGHD), and the second is the Double Encrypted Message Authenticated Codes and Patient Generated Health Data (2EMPGHD). In EMPGHD, the patient generates a MAC on the PGHD using the shared key ($SK_i$). This shared key is known to the patient and the HCP. Then, the patient encrypts both the PGHD and the MAC with the same key. In 2EMPGHD, the patient generates a MAC on the EMPGHD using the home shared key ($HSK_i$), which is a key known to the patient and his/her home server. Then, the patient encrypts both the MAC and the EMPGHD using the home shared key. The result is the 2EMPGHD, which is uploaded to different foreign servers. By using this method, the patient's data is kept confidential and authenticated through encryption and MAC generation, respectively. Only the HCP can learn the unencrypted form of the patient's data and verify the authenticity of that data. Only the home server can verify the authenticity of the patient's data (i.e., EMPGHD) that is forwarded by foreign servers. By using this method, the HCP and the patient ensure that no entity can generate data on behalf of the patient and no entity can learn anything about the patient's data.
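
The layering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPID implementation: HMAC-SHA256 stands in for the MAC, and a toy SHA-256 keystream cipher stands in for the symmetric cipher (the paper uses AES) so that the sketch needs only the standard library. The toy cipher is not secure, and the names `seal` and `open_sealed` are hypothetical.

```python
import hashlib
import hmac
import secrets

MAC_LEN = 32    # HMAC-SHA256 tag length in bytes
NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (stand-in for AES; NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def seal(key: bytes, payload: bytes) -> bytes:
    """MAC the payload, then encrypt payload || MAC under the same key."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return toy_encrypt(key, payload + tag)


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Decrypt, split off the MAC, and verify it before returning the payload."""
    data = toy_decrypt(key, blob)
    payload, tag = data[:-MAC_LEN], data[-MAC_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC verification failed")
    return payload


sk_i = secrets.token_bytes(32)    # known to the patient and the HCP
hsk_i = secrets.token_bytes(32)   # known to the patient and the home server

pghd = b'{"heart_rate": 72}'
empghd = seal(sk_i, pghd)             # inner layer, readable only by the HCP
two_empghd = seal(hsk_i, empghd)      # outer layer, uploaded to foreign servers

forwarded = open_sealed(hsk_i, two_empghd)   # home server removes the outer layer
recovered = open_sealed(sk_i, forwarded)     # HCP removes the inner layer
print(recovered == pghd)
```

The home server only ever sees the EMPGHD, which it can authenticate but not read, while the HCP recovers and authenticates the PGHD itself.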

B. PSEUDONYM GENERATION AND LINKAGE
To satisfy the ID privacy and mutual authentication requirements, we designed pseudonym generation and linkage algorithms. We first explain the types of pseudonym used in SPID. Then, we explain how to generate and link them. There are four types of pseudonym: the Home Index Pseudonym (HIP), the Foreign Index Pseudonyms (FIPs), the Home Request Pseudonyms (HRPs), and the Foreign Request Pseudonyms (FRPs). The following algorithms show who generates the pseudonyms, how they are generated, and how they are linked.

1) HIP ALGORITHMS
The HIP generation (HIP-Gen) algorithm is executed by the patient. In this algorithm, the patient generates an elliptic curve public key (EPK) using the ECC key generation algorithm. This EPK is the patient's HIP, known to both the home server and the HCP. The HIP linkage (HIP-Lnk) algorithm is executed by the HCP. In this algorithm, the HCP links the HIP to the patient's real ID.

2) FIPs ALGORITHMS
The FIPs generation (FIPs-Gen) algorithm is executed by the patient using RSA encryption with the home server's RSA public key ($RPK_h$). The inputs to FIPs-Gen are the public key of the home server ($RPK_h$), the HIP, a random number (Rnd), and the current time (T). The output of FIPs-Gen is a foreign index pseudonym ($FIP_i^f$), i.e., $FIP_i^f = \mathrm{Enc}(RPK_h, HIP_i \,\|\, T \,\|\, Rnd)$, where Enc denotes RSA encryption and $\|$ is the concatenation symbol. The FIPs linkage (FIPs-Lnk) algorithm is the reverse process of FIPs-Gen and is executed by the patient's home server, i.e., $HIP_i \,\|\, T \,\|\, Rnd = \mathrm{Dec}(RPR_h, FIP_i^f)$, where Dec denotes RSA decryption.
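
As a rough illustration of FIPs-Gen and FIPs-Lnk, the sketch below uses textbook RSA without padding on a toy key built from two Mersenne primes. These choices, the fixed field lengths, and the function names are all assumptions made so that the example runs with the standard library alone. They are not secure and are not taken from the SPID implementation, which would use a vetted RSA library with proper padding.

```python
import secrets
import time

# Toy RSA key pair for the home server (assumed, NOT secure).
P_PRIME = 2**521 - 1
Q_PRIME = 2**607 - 1
MODULUS = P_PRIME * Q_PRIME
RPK_H = 65537                                              # public exponent
RPR_H = pow(RPK_H, -1, (P_PRIME - 1) * (Q_PRIME - 1))      # private exponent

# Hypothetical fixed field lengths in bytes for HIP, T, and Rnd.
HIP_LEN, T_LEN, RND_LEN = 64, 8, 16


def fips_gen(hip: bytes, timestamp: int) -> int:
    """FIP = Enc(RPK_h, HIP || T || Rnd); fresh Rnd gives a fresh FIP."""
    rnd = secrets.token_bytes(RND_LEN)
    message = hip + timestamp.to_bytes(T_LEN, "big") + rnd
    return pow(int.from_bytes(message, "big"), RPK_H, MODULUS)


def fips_lnk(fip: int):
    """HIP || T || Rnd = Dec(RPR_h, FIP); run by the home server."""
    total_len = HIP_LEN + T_LEN + RND_LEN
    message = pow(fip, RPR_H, MODULUS).to_bytes(total_len, "big")
    hip = message[:HIP_LEN]
    timestamp = int.from_bytes(message[HIP_LEN:HIP_LEN + T_LEN], "big")
    rnd = message[HIP_LEN + T_LEN:]
    return hip, timestamp, rnd


hip_i = secrets.token_bytes(HIP_LEN)   # stand-in for the patient's EC public key
now = int(time.time())

fip_1 = fips_gen(hip_i, now)
fip_2 = fips_gen(hip_i, now)

print(fip_1 != fip_2)                  # two pseudonyms for the same HIP differ
print(fips_lnk(fip_1)[0] == hip_i)     # the home server recovers the HIP
```

Because Rnd is drawn fresh for every call, two pseudonyms derived from the same HIP are unlinkable to anyone who lacks the private key, while the home server can map each of them back to the HIP by decryption alone.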

3) THE HRPs ALGORITHMS
The HRPs generation (HRPs-Gen) algorithm is executed by the patient. HRPs-Gen is used to generate a fresh HRP for each request carried out by the patient with his/her home server. The inputs to HRPs-Gen are the HIP, the RSA public key of the home server ($RPK_h$), an index nonce (Inc), a random number (Rnd), and a priority tag (PR), i.e., $HRP_{i,r} = \mathrm{Enc}(RPK_h, HIP_i \,\|\, PR \,\|\, Rnd \,\|\, Inc)$. The priority tag helps to prioritize the data.
The HRPs linkage (HRPs-Lnk) algorithm is the reverse process of HRPs-Gen and is executed by the patient's home server to link each HRP back to its home index pseudonym (HIP), i.e., $HIP_i \,\|\, PR \,\|\, Rnd \,\|\, Inc = \mathrm{Dec}(RPR_h, HRP_{i,r})$.

4) FRPs ALGORITHMS
The FRPs generation (FRPs-Gen) algorithm is executed by the patient to generate a fresh FRP for each uploading request carried out by the patient with the foreign server. The inputs to FRPs-Gen are a string format of the temporal public key ($stPK_i^f$), the RSA public key of the foreign server ($RPK_f$), an index nonce (Inc), a random number (Rnd), and the priority tag (PR), i.e., $FRP_{i,r}^f = \mathrm{Enc}(RPK_f, stPK_i^f \,\|\, PR \,\|\, Rnd \,\|\, Inc)$. The FRPs linkage (FRPs-Lnk) algorithm is the reverse process of FRPs-Gen and is executed by the patient's foreign server, i.e., $stPK_i^f \,\|\, PR \,\|\, Rnd \,\|\, Inc = \mathrm{Dec}(RPR_f, FRP_{i,r}^f)$.

VII. SPID PROTOCOLS
In this section we use the SPID methods and cryptographic algorithms to design the SPID protocols. In this paper we cover only the following protocols: the FCert Generation (FCertG), Foreign Server Registration (FSR), and data uploading protocols.

A. ASSUMPTIONS
We make the following assumptions.
• The HCP initializes the system by establishing the domain parameters which define the asymmetric, symmetric, and elliptic curve cryptosystems. All the entities of the system download these parameters from the HCP. Each patient and each service provider (SP) generates an ECC public/private key pair, $EPK_i/EPR_i$ and $EPK_{sp}/EPR_{sp}$, respectively. The public keys are signed by the HCP and certified in the form of digital certificates. In addition, each SP generates RSA public/private key pairs, $RPK_{sp}/RPR_{sp}$, of different sizes (i.e., 15360 and 2048 bits). These public keys are signed by the HCP and certified in digital certificates. A list of valid SP certificates is stored in an open-access database. The patient's mobile device can access and download the certificates of the SPs with which it wants to establish communication. It should be noted that, even if the SP is an intruder, it cannot learn anything about the patient's data, as the data is uploaded encrypted.
• The patient has registered with the HCP using the home index pseudonym (HIP) and the HCP has issued to the patient the home pseudonym certificate (HCert).
• The mobile device has a software agent that randomly selects a number of service providers; one is assigned as the home provider and the others as foreign providers. The home and foreign servers are dynamic for each patient (i.e., the home provider for the patient can be a foreign provider the next day).
• The patient has registered with the home provider using the HCert and, by using the EKeyExg algorithm, the patient and the home server have generated a home shared key (HSK).

B. FCert GENERATION (FCertG) PROTOCOL
In this protocol the patient generates a number of foreign pseudonym certificates (FCerts) and needs the home server to sign them blindly. The patient uses the FCerts to access foreign providers. The protocol below explains how the patient generates one FCert.
(1) The patient's mobile device performs the following.
• It generates a temporal elliptic curve key pair, consisting of a public key and a private key (tPK f i , tPR f i ), using the EKeyGen algorithm.
• It then generates a home request pseudonym (HRP i,r ) using the HRP-Gen algorithm. It also generates the foreign index pseudonym (FIP) using FIPs-Gen algorithm.
• It generates the remaining fields of the FCert data structure, namely the signature algorithm, the home ID, and the validity; the subject field is the foreign index pseudonym (FIP). It then uses the BlndMsgGen algorithm to generate the blind factors, hash the FCert data structure fields, and blind the hash result.
(2) The mobile device then constructs the blind signature request message (BlndSigReq), which contains the home request pseudonym (HRP i,r ), the ID of the home server (ID h ), and the blind hash (e). After that, the mobile device generates a MAC on the message using the home shared key (HSK i ) and sends the message to the home server, i.e., BlndSigReq = (ID h || HRP i,r || e || MAC). The home server receives the request and does the following.
(3) The home server uses the HRP-Lnk algorithm to learn the home index pseudonym. Then, to find the home shared key (HSK i ), the home server multiplies the home index pseudonym (an elliptic curve public key) by its elliptic curve private key (EPR h ). The result of this multiplication is the home shared key, which is used to verify the MAC.
(4) If the MAC is successfully verified, the home server validates the index nonce. This validation guarantees the freshness of the home request pseudonym (to protect against replay attacks). It then checks that the HCert of the patient (HCert i ) has not expired. If both verifications are correct, the home server moves to the next step.
(15) It verifies the MAC using the foreign shared key (FSK f i ).
(16) If the MAC is successfully verified, the mobile device uses the same key to decrypt the encrypted part of the message, i.e., Md = D(Me, FSK f i ). Then, it stores the index nonce (Inc) to be used later when requesting an uploading service. The patient repeats the registration process with each of the selected foreign servers. Then, the patient moves to the uploading process.
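The blind-signing exchange of the FCertG protocol can be illustrated with a minimal sketch. The text does not specify the internals of BlndMsgGen, so the sketch below assumes a textbook RSA (Chaum-style) blind signature, which may differ from the scheme actually used in SPID. The toy primes, the function names, and the placeholder field encoding are all hypothetical and are chosen only to keep the example self-contained.

```python
import hashlib
import math
import secrets

# Toy RSA key pair for the home server. The two Mersenne primes are
# hypothetical stand-ins and far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def hash_fields(fields: bytes) -> int:
    """Hash the FCert fields and reduce the digest into the RSA group."""
    return int.from_bytes(hashlib.sha256(fields).digest(), "big") % N


def blind(fields: bytes):
    """Patient side: return (blinded hash e, blinding factor r)."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    e = (hash_fields(fields) * pow(r, E, N)) % N
    return e, r


def sign_blinded(e: int) -> int:
    """Home server side: sign the blinded hash without seeing the fields."""
    return pow(e, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Patient side: strip the blinding factor to obtain the real signature."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(fields: bytes, sig: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(sig, E, N) == hash_fields(fields)


fcert_fields = b"sig-alg|home-id|validity|FIP"  # placeholder field encoding
e, r = blind(fcert_fields)
signature = unblind(sign_blinded(e), r)
print(verify(fcert_fields, signature))
```

In this sketch the home server only ever sees the blinded value e, so it cannot later recognise the FCert it signed, while a foreign server can still verify the unblinded signature with the home server's public key.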

D. DATA UPLOADING PROTOCOL
This protocol is executed between the patient and a foreign provider. The purpose of this protocol is to allow the patient to upload his/her data to the foreign provider. It should be noted that the patient has a number of foreign servers to upload to and switches randomly among these servers when uploading the data.
(1) The patient's mobile device generates the 2EMPGHD using a double signing and encryption method.
(2) Then, the patient's mobile device generates a foreign request pseudonym (FRP f i,r ) using the FRPs-Gen algorithm.
(3) Subsequently, it constructs an uploading request (UpReq) message. This message contains the foreign server ID (ID f ), the patient's ID (FRP f i,r ), and the patient's data (2EMPGHD i ).
(4) Then, the mobile device generates a MAC on the message by using the foreign shared key (FSK f i ).
(5) The mobile device then sends the uploading request (UpReq) message to the foreign server, i.e., UpReq = (ID f || FRP f i,r || 2EMPGHD i || MAC). The foreign server receives the request message and performs the following.
(6) It uses its RSA private key (RPR f ) as an input to the FRP-Lnk algorithm to decrypt the foreign request pseudonym (FRP f i,r ) and recover the temporal elliptic curve public key (tPK f i ) of the patient together with other information related to the request (i.e., the nonce and the priority of the data). The foreign server then multiplies its elliptic curve private key (EPR f ) by tPK f i . The result is the foreign shared key, which is used to verify the MAC.

(7) The foreign server next validates the index nonce. This validation is to guarantee the freshness of the foreign request pseudonym (i.e., protect against replay attack). It then checks that the foreign pseudonym certificate (FCert) of the patient has not expired (i.e., within 24 hours). If both verifications are correct, the foreign server sends an acknowledgement to the patient.
(8) Next, the foreign server checks the priority tag (PR). If the priority tag is set, it sends a notification to the patient's home server. Otherwise, it stores the patient's 2EMPGHD in its database. The patient can continue uploading on the same server or randomly select another server to upload to.

VIII. REQUIREMENT ANALYSIS
• The SPID framework supports mutual anonymous authentication. The patient uses anonymous credentials to access service providers. The patient requests a home pseudonym certificate (HCert) from the HCP to register with the selected home server, and uses a number of foreign pseudonym certificates (FCerts) to register with the foreign servers. These FCerts are generated by the patient and are blindly signed by the home server. Registration with any server is done by sending the certificate (i.e., HCert or FCert) along with a digital signature. For repeated authentication with the same server, the patient uses a fresh request pseudonym each time. The authentication of these pseudonyms is achieved by verifying the MAC associated with the message. To verify the MAC without looking up the key in the database, the SPID uses the following method. When a server receives the request pseudonym, it decrypts the request pseudonym to get the underlying index pseudonym. Recall that the underlying index pseudonym is an ECC public key. The server therefore multiplies the index pseudonym by its own ECC private key. This multiplication yields the shared key, which is used to verify the request.
• The SPID framework preserves the patient's ID privacy (P). The patient's real ID is protected because the patient uses an elliptic curve public key (EPK i ) as the home index pseudonym, so only the HCP knows the real ID of the patient. The foreign index pseudonyms that identify the patient across service providers can be linked to the patient's home index pseudonym only by the patient's home provider. The patient uses a new request pseudonym for each upload, which prevents the sessions from being linked by an unauthorized entity. Only the server that receives these request pseudonyms can link them to the index pseudonym that identifies the patient with that server. In addition, the patient spreads the uploads over a number of foreign servers instead of relying on a single server.
• The SPID framework supports end-to-end data authenticity (S2). In the data uploading protocol, the patient generates a MAC on the upload message before sending it to a foreign server. By using the MAC, the foreign server guarantees that the message has not been altered from the point at which it was dispatched. The key used to generate the MAC is generated from the private key of the patient. This guarantees that no other entity can generate the key and the message is from the claimed patient. In the forwarding protocol (i.e., from foreign to home), the patient's data is in the form of 2EMPGHD (i.e., double signed and encrypted). When the patient's data arrives at the home server, it can verify the authenticity of this data using a home shared key which is only known to the patient and the home server. In the second forwarding protocol (i.e., from home to HCP), the patient's data is still in the form of EMPGHD. When the patient's data arrives at the HCP, it can verify the authenticity of this data using a shared key which is only known to the patient and the HCP.
• The SPID framework supports end-to-end data confidentiality (S3). The patient's data that is uploaded to any foreign server is doubly encrypted using two keys. Therefore, no entity that lacks these keys can decrypt the patient's data or learn anything about it.
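The database-free MAC verification described in the first requirement above relies on the fact that both parties reach the same point when each multiplies the other's public key by its own private key. A minimal pure-Python sketch follows. The paper does not name its curve or key-derivation step, so secp256k1, the SHA-256 derivation, and all function names here are assumptions, and the arithmetic is neither constant-time nor suitable for production use.

```python
import hashlib
import hmac
import secrets

# secp256k1 domain parameters (an assumption: the paper does not name its curve).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    """Add two points on y^2 = x^3 + 7 over GF(P); None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def derive_mac_key(private_key, peer_public):
    """Hash the x-coordinate of private_key * peer_public into a MAC key."""
    shared = scalar_mult(private_key, peer_public)
    return hashlib.sha256(shared[0].to_bytes(32, "big")).digest()


# The patient's index pseudonym is a public key; the server holds its own key pair.
patient_priv = secrets.randbelow(N - 1) + 1
server_priv = secrets.randbelow(N - 1) + 1
index_pseudonym = scalar_mult(patient_priv, G)
server_pub = scalar_mult(server_priv, G)

# Patient side: MAC the request with the key derived from the server's public key.
request = b"upload request fields"
tag = hmac.new(derive_mac_key(patient_priv, server_pub), request, hashlib.sha256).digest()

# Server side: re-derive the key from the pseudonym alone, with no database lookup.
server_key = derive_mac_key(server_priv, index_pseudonym)
expected = hmac.new(server_key, request, hashlib.sha256).digest()
print(hmac.compare_digest(tag, expected))
```

A server holding a different private key derives a different key and cannot produce or verify the same tag, which is the property the man-in-the-middle argument in the security analysis depends on.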

IX. SECURITY ANALYSIS
In this section, the SPID is analysed against threats. Here, Alice is an authorised patient. Eve is an adversary.
• SPID is protected against an impersonation attack. Suppose that Eve learns Alice's certificate (i.e., HCert or FCert) and attempts to play Alice's role in order to deceive a data collection server. Eve sends the certificate to the data collection server, and the server accepts the certificate from Eve. The data collection server finds that the certificate has been sent before by Alice, and in this case asks Eve to prove knowledge of the private key which corresponds to the public key stated in the certificate. To prove that she is the owner of the certificate, Eve needs to generate a digital signature using that private key. Eve fails to generate the digital signature because she does not have the private key. Eve therefore fails to impersonate Alice.
• The SPID is protected against the man-in-the-middle attack. Suppose a malicious server intercepts the messages between the patient and the home server to make the patient believe it is the home server. The patient sends the HCert to the malicious server to register. The malicious server verifies both the HCP's signature on the HCert and Alice's signature. The malicious server generates a shared key by multiplying its elliptic curve private key by the patient's elliptic curve public key (stated in the HCert). By using the generated shared key, the malicious server generates a MAC on the response message and sends it to the patient. The patient receives the message from the malicious server and then verifies the MAC associated with the message using the home shared key. The verification step fails because the malicious server derived its key from its own private key, whereas the patient derived the home shared key from the public key of the genuine home server.
• The SPID is protected against linkability attack. As explained in the protocol section each type of protocol involves using different pseudonyms. In the registration protocol, the patient uses a different index pseudonym and certificate for each server. In addition each request carried out by the patient with any server involves using fresh request pseudonyms, one for each request. Thus linking these request pseudonyms to the same patient is quite difficult, as we will explain in the degree of anonymity section.
• SPID is protected against a data forgery attack launched by an internal entity. Suppose a home server tries to generate data for Alice. The home server cannot generate data for Alice. This is because the home server needs to obtain Alice's shared key (SK) (i.e., the key between Alice and the HCP) to generate the EMPGHD. Further if the foreign server tries to generate data for Alice, the foreign server needs to obtain two keys. The first key is the SK used to generate the EMPGHD and the second is the home shared key (HSK) (i.e., the one between the patient and home server) to generate the 2EMPGHD, which is not feasible.
• SPID is protected against data forgery attack launched by an external entity. Suppose Eve has captured Alice's uploading message; as explained previously the message contains the following fields: Alice's ID (i.e., request pseudonym), Alice's data (i.e. EMPGHD), and the MAC generated on the message fields. Suppose Eve generates the MAC on the request pseudonym and her data. The foreign server decrypts the request pseudonym, extracts the index pseudonym, and multiplies the index pseudonym after decoding it (recall this is a public key) with the foreign server's elliptical curve private key. This process results in a shared key. The key will not verify the MAC associated with the message so the server discards the message.
• SPID is protected against replay attack. Suppose Eve tries to replay uploading messages to disturb the server. Eve forwards Alice's old uploading messages to a foreign server. After the server performs all the verifications, the server verifies the freshness of the index nonce, and discovers that the messages sent by Eve are old and so discards the messages.
• SPID is protected against repudiation attacks. Messages exchanged between Alice and any server contain a MAC generated with Alice's shared key. As explained previously, generating the shared key involves using both parties' private keys. Recovering a private key from its public key requires solving the elliptic curve discrete logarithm problem (ECDLP), which is widely believed to be computationally infeasible. Thus, it is very hard for Eve to generate a message on behalf of Alice, and Alice cannot claim that another entity has generated a message on her behalf.

X. DEGREE OF ANONYMITY
The degree of anonymity provided by the pseudonym generation and linkage method is measured using the Shannon entropy method [29]. What we need to show is that the patients are indistinguishable to the attacker. If the attacker could determine, by any means, that a particular patient is the generator of a pseudonym, the anonymity provided by the pseudonym generation and linkage method would not be acceptable.
To calculate the degree of anonymity provided by the method, we assume that (X) is a discrete random variable which represents a pseudonym. This pseudonym can be correctly linked to a certain patient (i) from a set of (N) patients. In mathematics, this can be represented as (p i = Pr(X = i)), where Pr is the probability. The current entropy (H(X)) of the corresponding pseudonym can be calculated as:
$$H(X) = -\sum_{i=1}^{N} p_i \log_2 p_i$$
The maximum entropy of the system (H(M)) can be calculated as:
$$H(M) = \log_2 N$$
The degree of anonymity (d) provided by the method can be calculated as:
$$d = 1 - \frac{H(M) - H(X)}{H(M)} = \frac{H(X)}{H(M)}$$
1) When d = 0, the attacker knows the generator of the pseudonym (p i ) with probability 1.
2) When d = 1, all patients appear as being an originator with the same probability.
Suppose that we have 100 patients (i.e., N = 100). Each patient generates several pseudonyms and the attacker cannot distinguish one particular patient as the owner of these pseudonyms. However, the attacker can divide the patients into, say, two groups (G 1 and G 2 ), where G 1 has 40 patients and G 2 has 60 patients. Then the attacker assigns each group a probability of generating a set of pseudonyms as follows.
$$p_i = \begin{cases} \dfrac{p}{40}, & 1 \le i \le 40 \quad (i \in G_1) \\[2ex] \dfrac{1-p}{60}, & 41 \le i \le 100 \quad (i \in G_2) \end{cases} \qquad (10)$$
In Equation 10, we have two groups of patients, one with 40 patients and the other with 60 patients. Patients belonging to the same group are seen by the attacker as having the same probability of generating the pseudonyms. Figure 7 shows that the degree of anonymity (d) increases with the number of patients. When the number of patients is 100, the maximum degree of anonymity (d = 1) is achieved for the probability distribution (p = 0.41). The degree of anonymity is equal to 0.8 when one group is assigned the probability p = 0.93 and the number of patients is (N = 10,000). Moreover, the anonymity does not drop to zero even when there are only two patients in the system. This is because the attacker sees all patients as potential owners of the pseudonyms.
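The two-group example can be checked numerically with a short sketch. It assumes that the attacker spreads a probability p uniformly over the 40 patients of G 1 and the remaining 1 − p uniformly over the 60 patients of G 2, and that d is the ratio of the current entropy to the maximum entropy. The function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def degree_of_anonymity(probs):
    """d = H(X) / H(M), where H(M) = log2(N) is the maximum entropy."""
    return entropy(probs) / math.log2(len(probs))


def two_group_distribution(p, size_g1, size_g2):
    """Probability p shared evenly within G1 and 1 - p shared evenly within G2."""
    return [p / size_g1] * size_g1 + [(1 - p) / size_g2] * size_g2


for p in (0.1, 0.4, 0.7, 0.93):
    d = degree_of_anonymity(two_group_distribution(p, 40, 60))
    print(f"p = {p:.2f}  d = {d:.3f}")
```

Under these assumptions d reaches 1 exactly when p equals the share of patients in G 1 (p = 0.4 for the 40/60 split), because every patient then has the same probability 1/100, and it decreases as p moves away from that value.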

XI. SPID PERFORMANCE
In this section we are going to measure the performance of the SPID protocols theoretically using queuing theory.
• The software which is used to implement the SPID protocols is Java 2 Platform, Standard Edition (J2SE). Java provides the implementation of several cryptographic primitives and key management services required in our protocols.
• The performance metrics used to evaluate the SPID protocols are the average time a patient spends waiting in a queue and the average response time of the system. Both metrics should stay low for the framework to be practical.
• To measure the execution time for each protocol, we use a Java benchmarking tool called Java Microbenchmark Harness (JMH) [30].
• To prototype the protocols, a desktop computer running Windows 10 with a 1.99 GHz Intel Core i7 and 8GB of RAM is used. The timing results from the execution of the protocols presented here are based on this computer's specifications.
• Queuing theory [31] is used to predict the performance of each protocol.

A. QUEUING THEORY
Queuing theory [31] helps to predict the following: the average waiting time a user spends in the queue, the total response time of the system, the number of users in the system, and the average utilization of the system. We consider two queuing models, the single-server model and the multi-server model. The single-server queuing model (M/M/1) assumes there is only one server in the system, whereas the multi-server queuing model (M/M/C) assumes C servers. To predict the performance metrics of our protocols, we applied the following mathematical formulas from queuing theory.
Mathematical Formulas for the Single Queue Model:
• λ = mean arrival rate of patients (average number of patients arriving per unit of time).
• µ = mean service rate (average number of patients that can be served per unit of time).
• ρ = λ/µ = the average utilization of the system.
• L = λ/(µ − λ) = the average number of patients in the service system.
• L Q = ρ * L = the average number of patients waiting in line.
• W = 1/(µ − λ) = the average time spent in the system, including service time.
• W Q = ρ * W = the average time spent waiting in line.
Mathematical Formulas for the Multiple Server Model:
• s = the number of servers, and ρ = λ/(s * µ) = the average utilization of the system.
• $P_0 = \left[ \sum_{n=0}^{s-1} \frac{(\lambda/\mu)^n}{n!} + \frac{(\lambda/\mu)^s}{s!\,(1-\rho)} \right]^{-1}$ = the probability that there are no patients in the system.
• $L_Q = \frac{P_0\,(\lambda/\mu)^s\,\rho}{s!\,(1-\rho)^2}$ = the average number of patients waiting in line.
• W Q = L Q /λ = the average time spent waiting in line.
• W = W Q + 1/µ = the average time spent in the system, including service time.
• L = λ * W = the average number of patients in the service system.
To theoretically analyze the performance of a protocol, we first need to determine the following inputs: the service time (m), the arrival rate of the requests (λ), and the service rate (µ). The service time (m) is the time the server needs to fulfil each request. For example, the service time for a registration request is the sum of the execution times of the methods executed on the server side, listed in Table 3, which is equal to 0.75 seconds. The service rate µ can be calculated as (1/m).
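The queuing formulas can be evaluated with a short sketch. It uses the standard M/M/1 and M/M/s results; the function names and the arrival rate of 1.0 request per second are hypothetical, while the service time m = 0.75 seconds is the registration example given above.

```python
import math


def mm1(lam, mu):
    """Single-server (M/M/1) metrics for arrival rate lam and service rate mu."""
    if lam >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = lam / mu
    L = lam / (mu - lam)
    W = 1 / (mu - lam)
    return {"rho": rho, "L": L, "Lq": rho * L, "W": W, "Wq": rho * W}


def mms(lam, mu, s):
    """Multi-server (M/M/s) metrics for s identical servers."""
    if lam >= s * mu:
        raise ValueError("unstable queue: arrival rate must be below s * mu")
    a = lam / mu                      # offered load
    rho = a / s                       # utilization per server
    p0 = 1 / (sum(a**n / math.factorial(n) for n in range(s))
              + a**s / (math.factorial(s) * (1 - rho)))
    Lq = p0 * a**s * rho / (math.factorial(s) * (1 - rho) ** 2)
    Wq = Lq / lam
    W = Wq + 1 / mu
    return {"rho": rho, "L": lam * W, "Lq": Lq, "W": W, "Wq": Wq}


m = 0.75            # service time in seconds (registration example)
mu = 1 / m          # service rate in requests per second
lam = 1.0           # hypothetical arrival rate in requests per second

one = mm1(lam, mu)
three = mms(lam, mu, 3)
print(f"1 server : Wq = {one['Wq']:.3f} s, W = {one['W']:.3f} s")
print(f"3 servers: Wq = {three['Wq']:.3f} s, W = {three['W']:.3f} s")
```

With s = 1 the multi-server formulas reduce to the single-server ones, which gives a quick consistency check on both sets of formulas.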

1) FCertG PROTOCOL PERFORMANCE
Here we describe the performance of the FCertG protocol. The service time for the protocol is the summation of the execution time of the methods executed on the home server listed in Table 2. Table 1 shows the performance of our system using the mathematical formulas for the single and multiple queuing models. The overall average of the waiting time and response time of our system using one server are 0.88 seconds and 0.95 seconds respectively. In the case of 3 servers, the arrival rates should not exceed µ * s which is (333.3*3) = 999.9 requests per second. We selected the arrival rates to range from 100.5 to 900.5. The minimum waiting time and response time arise when the arrival rate is 100.5 requests per second and the maximum waiting time and response time arise when the arrival rate reaches 900.5 requests per second. The overall average of the waiting time and response time of our system using 3 servers are 0.17 seconds and 0.19 seconds respectively. Figure 8 shows the performance of the protocol.

2) FSR PROTOCOL PERFORMANCE
Here we describe the performance of the foreign server registration (FSR) protocol. The service time for the protocol is the summation of the execution times of the methods executed on the server, listed in Table 3. Table 4 shows the performance of our system calculated using the mathematical formulas for the single and multiple queuing models. The overall average waiting time and response time of our system using one server are 9.9 seconds and 10.6 seconds, respectively. In the case of 3 servers, the arrival rate should not exceed µ * s, which is (1.33*3) = 3.99 requests per second. We selected the arrival rates to range from 1.5 to 3.9. The minimum waiting time and response time arise when the arrival rate is 1.5 requests per second. The maximum waiting time and response time arise when the arrival rate reaches 3.9 requests per second. The overall average waiting time and response time of our system using 3 servers are 2.2 seconds and 2.9 seconds, respectively. Using 3 servers therefore reduces the average waiting time and response time by more than 70%.

3) DATA UPLOADING PROTOCOL PERFORMANCE
Here we describe the performance of the data uploading protocol. The service time for the protocol is the summation of the execution times of the methods executed on the server, listed in Table 5. Table 6 shows the performance of our system calculated using the mathematical formulas for the single and multiple queuing models. The overall average of the waiting time and response time of our system using one server are 6.8 seconds and 0.24 seconds respectively. In the case of 3 servers, the arrival rate should not exceed µ * s, which is (142.85*3) = 428.6 requests per second. We selected arrival rates ranging from 85.7 to 426.4. We can see that the waiting time and the response time are negligible.

XII. DISCUSSION
The SPID framework has the following features that are not found in other related works. The SPID allows the patient to select where to store the data and who is responsible for delivering the data to the final destination (HCP). The SPID allows the patient to generate certificates and pseudonym IDs to access any service provider. In SPID, the authentication of each upload can be done without searching the foreign provider's database for the verification key. This is achieved by using some properties of ECC. The SPID allows the patient to double sign and encrypt the data before upload, so no service provider can learn the content of the patient's data. The SPID prevents the patient from being recognized based on the communication pattern, because the patient uses different service providers and pseudonyms and is responsible for choosing a new set of service providers every day. The SPID framework uses a combination of different cryptographic building blocks to enhance performance. For example, the request pseudonym is generated from an elliptic curve public key (EPK) and some random text. The EPK is then encrypted using the server's RSA public key to form the request pseudonym. When the pseudonym arrives at the server, the server decrypts the pseudonym using its RSA private key to recover the EPK. Then, the server multiplies the EPK by its elliptic curve private key (EPR) to get the verification key and verify the message.

XIII. CONCLUSION
To protect security and ID privacy in distributed data collection systems, we designed a secure anonymous data collection framework called SPID. The proposed framework uses multiple service providers to collect a patient's data, which prevents a single provider from inferring the patient's identity based on the pattern of interactions. It allows the patient to generate pseudonym identities and certificates to access these service providers, and it allows each patient's home service provider to anonymously link the patient's data, which are scattered across different foreign service providers. The home provider of the patient then delivers the patient's data, after aggregation, to the healthcare provider. The requirement analysis shows that the SPID framework has advantages that are not supported in other related works. The security analysis shows the strength of the SPID in preventing a number of security attacks. The performance results show that the SPID performance is acceptable when the number of servers is increased. The Shannon entropy method showed that it is hard to distinguish one patient as the generator of a set of pseudonyms.