The Lattice-Based Path Privacy Protection Aggregation Scheme for Internet of Vehicles

Following several information leakage incidents in online ride-hailing, the information security of the Internet of Vehicles has attracted wide public attention. Once Internet of Vehicles information is exposed by an online ride-hailing platform or stolen by hackers, it poses a serious threat to the personal and property safety of users, and may even cause a major information security incident for society as a whole. Based on a post-quantum cryptosystem, this paper proposes a practical privacy protection scheme for ride-hailing route information. The scheme completes the statistical aggregation of routes and of trip frequencies from starting point to destination without exposing the data to the ride-hailing platform, and it protects the data privacy of each individual vehicle. Compared with representative multi-vehicle aggregation solutions, we not only guarantee message privacy, confidentiality, integrity, forward and backward security, and resistance to man-in-the-middle and replay attacks, but also achieve multi-dimensional aggregation, CCA security and resistance to quantum attacks. Experimental analysis shows that the cost of our scheme is reasonable, so the scheme is practical in this scenario.


I. INTRODUCTION
The Internet of Vehicles (IoV) is the application of the Internet of Things to vehicles. With the help of the new generation of information and communication technology, it connects cars with people, other cars, roads and cloud platforms, so as to raise the overall level of intelligent driving and provide users with a safe, comfortable, intelligent and efficient driving experience and traffic service. It also improves the efficiency of traffic operation and the intelligence of public traffic services. Meanwhile, owing to the recent explosive growth of the ride-hailing business, a large amount of private information is collected through the IoV, including users' starting points, vehicle conversation information and vehicle driving records. This information helps build more efficient and intelligent IoV solutions, which contributes significantly to improving the user experience and alleviating road traffic congestion.
However, this mass collection of information raises wide concern. It is feared that once IoV information is exposed by the online ride-hailing platform or stolen by hackers, it will pose a fatal threat to the personal and property safety of users, and may even cause a major information security accident for society as a whole. Therefore, privacy protection becomes particularly important, and it is necessary to build a multi-party data and information trust mechanism for users, platforms and government agencies.
The associate editor coordinating the review of this manuscript and approving it for publication was Tyson Brooks.
To do this, there are many privacy security challenges to address [16]. These include how to establish identity trust networks between vehicles, between vehicles and base stations, and between base stations and control centers, and how to prevent hackers from attacking the network in coalition with other IoV members, eavesdropping on or tampering with messages, or setting up pseudo base stations to intercept the sender's message and return forged query information.
Attempts have been made to tackle these challenges. Liu [12] proposed privacy-preserving data collection and aggregation solutions for simple sensing devices, mobile devices, vehicle-mounted devices, the smart grid and other application scenarios. These schemes support dynamic addition and deletion of members and allocation of calculations to each device, but they cannot meet other security requirements such as resistance to quantum computing and to chosen-ciphertext attacks. In addition, there is the problem known as ''the superior knows too much''. That is, operators may sell large amounts of users' personal information and behavioral data, resulting in serious leakage of user behavior information. For example, hackers could learn when a user traveled, from where to where, and how many times. To this end, we need to complete privacy-preserving data aggregation by integrating some of the techniques of previous solutions and by building trust among the government, control centers, base stations and vehicles.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Then there is the issue of overhead. High overhead of the traditional communication network is fatal to the IoV scenario that relies on wireless sensing and micro-base station devices with weak computing power. Therefore, we need to adopt a third-party trusted institution with certain authority to reduce the overhead.
We build on several existing schemes. Qian et al. [1], [2] proposed three lightweight, quantum-resistant, vector space-based privacy-preserving data aggregation schemes for the smart grid, which can aggregate multidimensional data. Based on these schemes, Lin and Qian [3] proposed a post-quantum SaaS platform that can also compute secure financial statistics. Building on this work, this paper proposes a more practical scheme that can simultaneously aggregate multiple departure points, the corresponding destinations and the corresponding times within a valid period. In addition, compared with representative IoV aggregation solutions, our scheme has the following innovations:
1. Our scheme not only realizes message privacy, confidentiality, integrity, forward and backward security, and resistance to man-in-the-middle and replay attacks, but also realizes multidimensional aggregation, CCA security and resistance to quantum attacks.
2. Under our scheme, CC only knows how many times TA travels from the origin A to the destination B in a certain period of time, without recording the details of authorization. It does not know the total number of times TA travels, nor the actual locations of A and B.
In addition, after TA authorization, the specific locations of A and B corresponding to the road sections with relatively large values can be decrypted. Therefore, so long as the specific time from A to B is protected, the frequency of the e-hailing vehicles passing through road sections can be known, thus bringing convenience to users and realizing more detailed data management and control.
The rest of this paper is organized as follows. Section II introduces the preliminaries. Section III introduces the symbols, the system model and the security model. Section IV presents our proposed scheme. Section V gives the security analysis and proofs of the scheme. Section VI shows through experiments that the overhead is low. Section VII concludes the paper.

II. PRELIMINARIES
A. RSA-OAEP CRYPTOGRAPHY
The RSA algorithm [13] is one of the most commonly used public-key encryption algorithms. Because of its homomorphic property, plain RSA cannot resist chosen-ciphertext attacks (CCA). RSA-OAEP [14], obtained by adding randomized padding (salt) to RSA, achieves CCA security, as its security reduces to the hardness of inverting a one-way trapdoor function.
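The homomorphic property mentioned above can be illustrated with a minimal sketch. It uses toy textbook-RSA parameters chosen only for illustration (they are assumptions, not the 1024-bit or 2048-bit keys used in the scheme) and shows that multiplying two ciphertexts yields a valid ciphertext of the product of the plaintexts, which is the kind of malleability a chosen-ciphertext attacker can exploit.

```python
# Toy textbook-RSA parameters (far too small for real use; illustration only).
p, q = 61, 53
n = p * q                    # modulus
phi = (p - 1) * (q - 1)      # Euler totient of n
e = 17                       # public exponent, coprime to phi
d = pow(e, -1, phi)          # private exponent via modular inverse (Python 3.8+)


def enc(m: int) -> int:
    """Unpadded (textbook) RSA encryption."""
    return pow(m, e, n)


def dec(c: int) -> int:
    """Unpadded (textbook) RSA decryption."""
    return pow(c, d, n)


m1, m2 = 7, 11
# Multiply the two ciphertexts without ever seeing the plaintexts.
c_product = (enc(m1) * enc(m2)) % n
# The result decrypts to the product of the plaintexts.
assert dec(c_product) == (m1 * m2) % n
```

Randomized padding such as OAEP breaks this algebraic relation, because the padded value that is actually exponentiated is no longer the bare message.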

B. LATTICE-BASED HOMOMORPHIC ENCRYPTION ALGORITHM BASED ON HLP PROBLEM
First, we introduce the Hidden Lattice Problem (HLP), which was defined by Aguilar-Melchor and Gaborit [5]. Wieschebrink demonstrated in [6] that it is NP-complete and resistant to quantum computing. Melchor et al. [4], [5] then gave two lattice-based homomorphic encryption schemes. Qian et al. [1] corrected some flawed formulas, the correctness verification process and the security proofs of these algorithms, and applied them to the smart grid to obtain a very lightweight privacy-preserving data aggregation scheme. The algorithm is shown in Table 1 (all symbols are consistent with [1], and the number of homomorphic operations can be regarded as sufficiently large).

III. SYMBOL, SYSTEM MODEL AND SECURITY MODEL
A. SYMBOL
All symbols used in the system model and scheme in this paper are shown in Table 2.

B. SYSTEM MODEL
Fig. 1 shows that the network system model in this paper consists of a government Trusted Authority (TA), a semi-honest Control Center (CC), multiple Base Stations (BS) and multiple cars (CAR), connected through a lightweight wireless network. Each CC connects to multiple BSs and each BS connects to multiple CARs. The functions and permissions of each entity are as follows:
1. TA holds the master key and authorizes CC to decrypt messages under the corresponding permission.
2. CC holds part of the key and can view the part of the information authorized by TA, for example, the total number of times cars passed between two authorized places and the total distance traveled by each car. For the rest of the information, such as user information and total driving distance, CC can only obtain ciphertext and has no permission to decrypt it.
3. Each BS has its own secret real ID and the corresponding hash value $h(ID)$, which are stored in a secure register along with other data. Only $h(h(ID) + r_{ID})$ is disclosed to the public. BS also counts and encrypts the number of times each vehicle has traveled between a starting point and a destination over a period of time.
4. Each CAR has its own private real ID, the corresponding hash value $h(ID)$ and a private random number $r_{ID}$, which are stored in a secure register with other data. Only $h(h(ID) + r_{ID})$ is public. The coordinates of the start and destination of each trip, as well as the corresponding times, must be encrypted, as must the distance traveled over a period of time.
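The public identifier $h(h(ID) + r_{ID})$ described for BS and CAR can be sketched as follows. The paper does not name the hash function or the encoding of the sum, so SHA-256, integer addition of the hash value and the random number, and the helper names `h`, `int_to_bytes` and `public_pseudonym` are all assumptions made for illustration.

```python
import hashlib
import secrets


def h(data: bytes) -> int:
    """Hash to an integer. SHA-256 is an assumption; the paper only writes h(.)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def int_to_bytes(x: int) -> bytes:
    """Big-endian encoding of a non-negative integer (at least one byte)."""
    return x.to_bytes((x.bit_length() + 7) // 8 or 1, "big")


def public_pseudonym(real_id: str, r_id: int) -> int:
    """Compute h(h(ID) + r_ID); only this value is disclosed publicly."""
    inner = h(real_id.encode("utf-8")) + r_id   # h(ID) + r_ID stays private
    return h(int_to_bytes(inner))


# Hypothetical identifier and a fresh private random number.
r_id = secrets.randbits(128)
pid = public_pseudonym("CAR-0001", r_id)
```

Because $r_{ID}$ is private, an observer who guesses the real ID still cannot recompute the public value without also knowing the random number.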

C. SECURITY MODEL
Assume that CC, BS and CAR are semi-honest (they cannot submit fake data during aggregation but may maliciously attack others). Based on the international security and privacy requirements for the cloud Internet of Things [15], we build our security model, which includes the following goals:
- Resistance to man-in-the-middle and replay attacks: If adversary A mounts a man-in-the-middle attack or a replay attack, it cannot obtain any valid information.
- CCA security of the public-key part: For the public-key part of the scheme, adversary A cannot obtain any valid information by mounting a chosen-ciphertext attack.
- Structural security: Adversary A cannot obtain valid information by intercepting messages and enumerating them in combination with the members' public parameters.
- Privacy, message authenticity, integrity and confidentiality: Adversary A cannot extract any useful information from intercepted messages, nor can it forge a real identity to participate in the aggregation process. If it tries to modify a message, the malicious behavior will be detected.
- Forward and backward security: When the shared key is leaked, the security of the data from earlier and later rounds is not affected.

IV. OUR PROPOSED SCHEME
A. INITIALIZATION PHASE
1. Each CAR generates its own public/private key pair for receiving data using the 2048-bit RSA-OAEP algorithm and publishes its public key.
2. TA first generates its own 1024-bit public-key encryption function $Enc_{TA}(\cdot)$ based on RSA-OAEP. Then the hash function $h$ and the IDs are distributed to all participating members, and $h(ID)$ is recorded. The IDs of BS and CAR $i$ are denoted $ID_{BS}$ and $ID_i$ respectively, and the corresponding hash values are $h(ID_{BS})$ and $h(ID_i)$. In addition, the members are given random numbers $r_{BS}$ and $r_{ID_i}$, which are also held by TA. CC and BS hold only $h(ID_{BS}) + r_{BS}$ and $h(ID_i) + r_{ID_i}$ respectively, and the members publish their ID values as $h(h(ID_{BS}))$ and $h(h(ID_i))$.
3. TA generates a large database in which the start and end points are arranged in order. $Enc_{TA}(\cdot)$ is used to encrypt the locations, and the database is synchronized to CC and BS in order to reduce their burden.
4. TA issues a random aggregation number $R_{BS}$ to each BS and tells CC the total of the $R_{BS}$ values.

B. AGGREGATION PHASE
1. The honest-but-curious CAR (with ID $ID_i$, denoted $CAR_{ID_i}$) prepares three parts: the starting point $Star_{ID_i,j}$ and its time, the end point $End_{ID_i,j}$ and its time, and the identity signature with the sending time. The messages are encrypted with TA's public key and then each signed with the CAR's own RSA-OAEP private key. After being XORed with the shared key, the message is sent from CAR to BS within 5 s to ensure the validity of the time $T$.
2. After receiving the message, BS first removes the XOR with the shared key, and then uses the public key $Enc_{CAR_i}(\cdot)$ and the identity signature $h(h(ID_i) + r_{ID_i} + T)$ to verify the authenticity of the data. If authentication fails, BS sends a notification and CAR retransmits the message (the transmission of the short message takes less than 45 s, so that the identity can still pass authentication).
3. After successful verification, BS first records and stores information from the car numbered $h(ID_i) + r_{ID_i}$, for example the $j$-th path and time period $Enc_{TA}(Star_{ID_i,j}, End_{ID_i,j}, T_{ID,j})$, and increments by 1 the trip count $Num_{(h(ID_i)+r_{ID_i},\ Enc_{TA}(Star_{ID_i,j}, End_{ID_i,j}))}$ on the corresponding section of the route. This count represents the total number of times that CAR $i$ has used these two places as starting and end points within the jurisdiction of BS. The information is then aggregated: the total count $Num_{(ID_{BS}, CStar, CEnd)}$ for the ciphertext path $(CStar, CEnd) = Enc_{TA}(Star_{ID_i,j}, End_{ID_i,j})$ recorded in the database is also incremented by 1, where $Num_{(ID_{BS}, CStar, CEnd)}$ is the number of rides that started and ended at these two places within BS's jurisdiction. After a period of time, the values $Num_{(h(ID_i)+r_{ID_i},\ Enc_{TA}(Star_{ID_i,j}, End_{ID_i,j}))}$ and $Num_{(ID_{BS}, CStar, CEnd)}$ are arranged in accordance with the order in the database. They are divided into $Z$ vectors of $N = 600$ entries each, and a random number is added (when the number of locations is greater than 600, block-matrix partitioning is used for storage). The Num matrix is then encrypted with the HLP-based lattice homomorphic cipher (Section II.B) to obtain a ciphertext matrix, and a random number $R$ is added to each component to obtain the matrix $CNum'$. The message is signed and concatenated, XORed with the shared key, and sent to CC. The accuracy of $T$ is within 1 h.
4. After receiving the message, CC first removes the XOR with the shared key, and then verifies that the time is within the valid period and that the message is authentic. If so, the message passes authentication and CC obtains a valid $CNum'$; CC then aggregates all $CNum'$. The aggregate equals $CNum$, the ciphertext matrix of the total number of trips between any two locations in the period, plus the total of the $R_{BS}$ values. Therefore, CC can recover $CNum$ and then use the lattice private key to decrypt it and obtain the count matrix Num.
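The time-bound identity check in steps 1 and 2 can be sketched as follows. This is a minimal illustration under several assumptions that are not in the paper: SHA-256 stands in for $h$, $T$ is an integer number of seconds, the XOR with the shared key is a simple repeating-key XOR (not secure on its own), and the RSA-OAEP encryption and signature layers are omitted. The names `identity_tag`, `xor_with_key` and `bs_accepts` are hypothetical.

```python
import hashlib
import hmac

MAX_DELAY_S = 5  # CAR-to-BS freshness window stated in the text


def h(x: int) -> bytes:
    """Hash of a non-negative integer. SHA-256 is an assumption."""
    return hashlib.sha256(x.to_bytes((x.bit_length() + 7) // 8 or 1, "big")).digest()


def identity_tag(masked_id: int, t: int) -> bytes:
    """h(h(ID_i) + r_ID_i + T), where masked_id stands for h(ID_i) + r_ID_i."""
    return h(masked_id + t)


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying it twice with the same key restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def bs_accepts(tag: bytes, masked_id: int, t_sent: int, t_now: int) -> bool:
    """BS recomputes the tag from the value it holds and checks freshness."""
    fresh = 0 <= t_now - t_sent <= MAX_DELAY_S
    return fresh and hmac.compare_digest(tag, identity_tag(masked_id, t_sent))


# Hypothetical values for one message.
masked_id, shared_key, t_sent = 123456789, b"shared-key", 1_700_000_000
wire = xor_with_key(identity_tag(masked_id, t_sent), shared_key)   # CAR side
received = xor_with_key(wire, shared_key)                          # BS side
ok_fresh = bs_accepts(received, masked_id, t_sent, t_sent + 3)
ok_stale = bs_accepts(received, masked_id, t_sent, t_sent + 10)
```

In this sketch a tag replayed after the window is rejected even though its hash still matches, which is the effect the timestamp in the identity signature is meant to provide.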
Note, the correctness of the scheme is easy to verify, so no further details are needed.
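The masking and recovery in steps 3 and 4 can be checked with a small numeric sketch. It works on plaintext counts only and leaves out the HLP-based lattice encryption, so it illustrates just the additive masking with $R_{BS}$; the modulus `MOD`, the sample counts and the function names are assumptions introduced for illustration.

```python
import secrets

MOD = 2**32  # hypothetical modulus for the masked counters


def mask(counts, r_bs):
    """BS side: add the station's random number R_BS to every component."""
    return [(c + r_bs) % MOD for c in counts]


def cc_aggregate(masked_vectors, r_total):
    """CC side: sum the masked vectors, then remove the total of all R_BS."""
    length = len(masked_vectors[0])
    sums = [sum(v[k] for v in masked_vectors) % MOD for k in range(length)]
    return [(s - r_total) % MOD for s in sums]


# Trip counts per route section at three hypothetical base stations.
bs_counts = [[3, 0, 1], [2, 5, 0], [0, 1, 4]]
r_values = [secrets.randbelow(MOD) for _ in bs_counts]   # issued by TA to each BS
r_total = sum(r_values) % MOD                            # the only value TA gives CC

masked = [mask(c, r) for c, r in zip(bs_counts, r_values)]
totals = cc_aggregate(masked, r_total)
```

CC sees only the masked vectors and the total of the random numbers, so it can recover the per-route totals but not the contribution of any single BS.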

C. QUERY PHASE
If CC wants to know the number of trips between Start and End, it cannot simply encrypt the pair with TA's public key and compare the result with the database (because the encryption is salted, ciphertexts cannot be matched without decryption). Therefore, CC must encrypt the query and send it to TA. TA decrypts the query and matches it against the database, then encrypts the corresponding serial number and sends it to CC. CC can then look up the number of trips for the period of interest.

D. DYNAMIC MEMBERSHIP PHASE
On the one hand, as the initialization process shows, adding or deleting a CAR in the network incurs little overhead.
On the other hand, when a BS is added, it is assigned a new random number $R_{new}$, delivered under its public key, and the total random number is updated as $R := R + R_{new}$. The scheme therefore supports dynamic changes in the number of members.

V. SECURITY ANALYSIS AND PROOF
This section presents the security analysis and proofs of the scheme, which are divided into the following parts. Structure Security is always true, so the system is safe.
Theorem 2: (Privacy, Message Authenticity, Integrity, and Confidentiality Theorem) Adversary A cannot break the privacy of the scheme, forge message authenticity, destroy data integrity, or compromise confidentiality.
Proof: First, according to Theorem 1, any message obtained by adversary A is encrypted with the shared key and random numbers, so privacy is guaranteed. In the process from CAR to BS, suppose the message of a round is intercepted. Since $h(h(ID_i) + r_{ID_i} + T)$ binds the timestamp and the ID and is signed with the private key, the sender's ID must be real and the message must be complete; otherwise authentication fails. Therefore, the authenticity and integrity of the message are guaranteed. In the process from BS to CC, suppose the message of a round is intercepted. Since the value signed with the private key covers $h(ID_{BS}) + r_{BS} + T$ and the ciphertext matrices, together with the timestamp, the ID and the message signature, the identity of the sender must be real and the message must be complete; otherwise verification fails. Adversary A cannot obtain CC's private key or the shared key, so it cannot impersonate CC and cannot break the privacy, integrity or confidentiality of CC's messages. CC and TA authenticate each other directly over a point-to-point secure channel; hence the above properties also hold between them.
Theorem 3: Adversary A is unable to perform a replay attack or a man-in-the-middle attack on this scheme.
Proof: Because of the automatic timestamp mechanism, the ID is dynamic, so adversary A cannot launch a replay attack on our scheme. In addition, because the signature cannot be verified even if the identity is leaked, adversary A cannot act as a man in the middle.
Specifically, even if the adversary holds the dynamic key of some round, this gives it no usable advantage against any single node. Therefore, the scheme resists replay attacks and man-in-the-middle attacks.
Theorem 4: This scheme satisfies the differential privacy property.
Proof: The scheme adopts salted RSA signatures and a lattice-based homomorphic cryptosystem to satisfy differential privacy. Therefore, the condition $Adv_{\varepsilon,A,M} < \varepsilon$ is satisfied with respect to the public-key cryptosystem. The same conclusion follows from the perspective of plaintext message collisions. Hence, this scheme satisfies the differential privacy property.
Theorem 5: (Forward and Backward Security Theorem) When the key is leaked, the security of the data from earlier and later rounds is not affected.
Proof: Assume that the difficulty of cracking the shared key is $\varepsilon$. Even if the adversary holds the secret shared key of some round, this gives it no usable advantage against any single node in other rounds. Therefore, the scheme achieves forward and backward security.
Theorem 6: (CCA Security Theorem) The public-key part of our scheme resists chosen-ciphertext attacks (on geographic location information, etc.), so its security reaches the CCA security level.
Proof: The security of the public-key part of the scheme depends on the security of the RSA-OAEP scheme, which reaches the CCA security level [10]. As a result, the public-key part of our scheme reaches the CCA security level.
Theorem 7: This scheme can resist differential attacks to some extent.
Proof: The adversary cannot obtain a valid random number $r$ from the value of $m + r$ when the message $m = 0$.
In addition, we compared the security of previous schemes with that of our own; the results are shown in Table 3. The indicators are resistance to quantum attacks (P1); privacy, entity authentication and message authenticity (P2); forward and backward security (P3); differential privacy (P4); resistance to man-in-the-middle attacks (P5); resistance to replay attacks (P6); and flexibility of network structure (P7). The comparison shows that our scheme provides more comprehensive security.

VI. PERFORMANCE EVALUATION
This section analyzes the communication and computational costs of our proposed scheme. We implemented our solution in a Python 3.7 environment on a computer with a 3.00 GHz Intel Core i5 CPU and 4 GB of RAM. See Table 4 for parameter settings.
The total computation overhead is
$$(\ldots \times n_{Message} \times n_{CAR} + t_{\text{HLP-ENC}} + t_{hash}) \times n_{BS} + t_{\text{HLP-DEC}}.$$
Then, the computation time after parallelization is
$$(t_{\text{RSA-TA-ENC}} + t_{\text{RSA-CAR-DEC}} + (t_{\text{RSA-CAR-ENC}} + t_{hash}) \times n_{CAR}) \times n_{Message} + t_{\text{HLP-ENC}} + t_{hash} + t_{\text{HLP-DEC}}.$$
Since the theoretical analysis alone is not conclusive, we also measured the overhead experimentally. With $n_{BS} = 10$ and $n_{Message} = 50$, the total computation cost is about 3175 s, and the computation time after parallelization is about 53 s, which is an acceptable parallel computation time.
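The parallelized overhead expression can be evaluated directly once the individual timings are measured. The sketch below simply encodes that expression; the function name and the sample timings are placeholders chosen for illustration, not the measurements behind the reported figures.

```python
def parallel_cost(t_rsa_ta_enc, t_rsa_car_dec, t_rsa_car_enc, t_hash,
                  t_hlp_enc, t_hlp_dec, n_car, n_message):
    """Parallelized computation time, following the expression in the text.

    All timing arguments are in seconds; n_car and n_message are counts.
    """
    per_message = t_rsa_ta_enc + t_rsa_car_dec + (t_rsa_car_enc + t_hash) * n_car
    return per_message * n_message + t_hlp_enc + t_hash + t_hlp_dec


# Placeholder timings (hypothetical, for illustration only).
example = parallel_cost(
    t_rsa_ta_enc=1.0, t_rsa_car_dec=2.0, t_rsa_car_enc=3.0, t_hash=4.0,
    t_hlp_enc=5.0, t_hlp_dec=6.0, n_car=2, n_message=10,
)
```

The expression is linear in both the number of messages and the number of cars, so the per-message RSA operations dominate once either count grows.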

VII. CONCLUSION
Based on a post-quantum cryptosystem, this paper proposes a practical privacy protection scheme for ride-hailing route information. The scheme completes the statistical aggregation of routes and of trip frequencies from starting point to destination without exposing the data to the ride-hailing platform, and it protects the data privacy of each individual vehicle. Compared with representative multi-vehicle aggregation solutions, we not only achieve message privacy, confidentiality, integrity, forward and backward security, and resistance to man-in-the-middle and replay attacks, but also achieve multi-dimensional aggregation, CCA security and resistance to quantum attacks. In addition, experimental analysis shows that the cost of our scheme is reasonable; thus, the scheme is practical in this scenario.

CHENG WEI received the bachelor's degree in software engineering from the Jiangxi University of Finance and Economics and the master's degree in computer science from the School of Software, Huazhong University of Science and Technology, in 2019. His current research interests include cryptography and artificial intelligence algorithms.

SHUO XIONG received the master's and Ph.D. degrees in information science from the Advanced Institute of Science and Technology, Japan. He is an Assistant Professor with the School of Journalism and Information Communication, Huazhong University of Science and Technology. His research interests include game informatics, game theory and AI, serious game, and game design.