Privacy Protection Framework for Android

The increase in the popularity of the Android platform in recent years has led to many innovative and smart Android applications (apps). Many of these apps are highly interactive and customizable, and require user data to provide their services. While convenient, they raise a primary concern: user privacy. There is no guarantee that these apps are not storing user data for their own needs or running data-harvesting algorithms on it. Android uses a system of permissions to provide security and protect user data. The user can grant a requested permission either at runtime or during installation. In practice, however, this system is often misused by apps that demand extra permissions that are not required to provide their services, and that stop functioning if all permissions are not granted. Therefore, in this paper, a privacy-preserving secure framework is proposed to prevent an app from stealing user data by restricting all unnecessary permissions. Unnecessary permissions are recognized by predicting the permissions a given app actually requires, using collaborative filtering and frequent permission set mining algorithms. The proposed model then interacts with the target application and modifies its permission data. Experimental results reveal that the proposed model not only protects user data but also ensures the proper functioning of the given application.


I. INTRODUCTION
Android is the most popular operating system (OS) for mobile platforms. According to Global Stats [1], Android enjoyed almost 75% of the mobile OS market share in June 2020, followed by iOS with about a 25% share. Users prefer Android because of its free and open-source nature and its support for many apps. Developers also select Android over the competing iOS because of its open-source nature. Applications for Android are written mainly in Java and are commonly referred to as 'apps'.
Security is a crucial aspect of apps. The nature of Android apps makes it difficult to rely on standard, traditional, and dynamic malware analysis systems [2]. Google launched the Google Play app security improvement program to provide security services to Google Play developers and improve the security of their apps [3]. Apps are scanned for potential malware before being uploaded to the Play Store. In 2017, Google worked on detecting malware and potentially harmful apps to improve security on devices and the Play Store using Google Play Protect [4]. In Android 9, Google restricted access to sensors in the background, so that apps running in the background cannot access the camera, microphone, or other sensors [5].
To protect user data and passwords, Google has provided a feature for hardware-backed keys. A Safe Browsing application programming interface (API) is also present for protection against deceptive websites. While there have been significant developments in platform security, application development security, and securing the Android OS, the apps taking user data can still be malicious. With over 2.7 million apps already present in the Google Play Store [6], it is hard to determine which app is malicious or which may take data to analyze behavior or sell it to a third party.
To address data security and malicious usage of applications, Android works on the principle of permissions.
The paper is organized into the following sections. Section II describes security measures that are available in Android. Sections III and IV describe the motivation and previous work done in this regard, respectively. The proposed framework and the communication scheme are discussed in Section V. The application of the proposed model, through a case study, is discussed in Section VI. Conclusions and future work are presented in Section VII.

II. SECURITY IN ANDROID

A. ANDROID ARCHITECTURE
Android is a Linux-based open-source Operating System (OS) [2]. Platform security is based on the OS's architecture (Fig. 1), which is achieved by separating resources and accessibility into successive layers. Each layer assumes that the layer below it is secure, and each subsequent layer becomes less accessible. The Linux kernel runs at the lowest level and is responsible for performing basic OS operations such as process management, memory management, and other device-related operations. Above the Linux kernel layer runs the Hardware Abstraction Layer (HAL), which provides a standard interface to hardware so that hardware vendors can build their hardware without affecting higher-level layers. The OS can thus be updated without changing or re-configuring hardware implementations. The HAL also promotes the principle of least privilege: a HAL running in its own process has access only to the permissions it needs, rather than the full set of permissions held by the rest of the process [10].
Above the HAL lies the services layer of the Android system, which consists of the Android runtime libraries and native C libraries. These can be accessed by the application layer to perform various Android-related functions, such as using the camera module to record, the media module for playback, the notification service for sending notifications, and various other Android-provided services.
The Java API layer is placed on the top of the service layer. It is an intermediary layer that makes it easy for the application layer to access Android services using Java-based APIs corresponding to them. The topmost layer is the application layer where all the applications installed on the device remain and make use of the Java API framework to access various required Android services.
The Android low-level security model is based on application sandboxing [4], [5]. Android sandboxing is the process of isolating an application in the system; it prevents outside influence on the layers mentioned in the architecture. Each application is assigned a unique user ID and has access rights only to its own files. This prevents outside malware and security threats: if an application experiences a security breach, other applications' operations are not affected.
Android provides hardware-backed key protection for cryptographic services. The stored keys provide a safe, secure channel for the authentication of user data. Verified Boot is used to check the state of the system when it starts [4]. It verifies whether the system is in a good state. In Android 8.0 (Oreo), Google introduced Project Treble to increase low-level security [4], [11]. Project Treble separates the open-source Android OS framework from the hardware code implementations at the vendor level. It has had a positive impact on device security and the speed of updates.

B. APPLICATION SECURITY
Android uses the permission model to prevent an app from using sensitive data and resources that are not required during runtime. Apps need corresponding permissions to use APIs to interact with the underlying system [12]- [14]. All permissions taken by an app must be specified in the application's manifest file [15].
Permissions are grouped into three categories corresponding to the risk and security level associated with resources and APIs: normal, dangerous, and signature permissions. Normal permissions cover cases where the application must interact with resources outside its sandbox that do not pose any threat to user privacy; examples include BLUETOOTH, KILL_BACKGROUND_PROCESSES, and INTERNET. Signature permissions are granted at install time and allow an application to use permissions defined by an app signed with the identical certificate; BIND_VPN_SERVICE is included in the signature permissions. Dangerous permissions are permissions that could pose a potential security threat or a threat to user privacy. The user is required to approve each of these permissions needed by the application after installation; SMS, storage, and camera permissions are included in this category.
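The three categories above can be illustrated with a small lookup table. The mapping below is a hand-picked, illustrative subset of Android permission names, not the full list:

```python
# Illustrative subset of Android permissions grouped by protection level.
PROTECTION_LEVELS = {
    "BLUETOOTH": "normal",
    "INTERNET": "normal",
    "KILL_BACKGROUND_PROCESSES": "normal",
    "BIND_VPN_SERVICE": "signature",
    "READ_SMS": "dangerous",
    "READ_EXTERNAL_STORAGE": "dangerous",
    "CAMERA": "dangerous",
}

def classify(permission: str) -> str:
    """Return the protection level of a permission, or 'unknown'."""
    return PROTECTION_LEVELS.get(permission, "unknown")
```

For example, classify("CAMERA") yields "dangerous", while classify("INTERNET") yields "normal".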
In earlier versions (until Android 5.0), users were not allowed to choose a subset of permissions: they had to accept all the permissions listed by the application in its manifest file in order to install it. In Android 6.0 (Marshmallow), Google introduced a new permission mechanism, called runtime permissions [16], in which users are notified of any dangerous permission at runtime and may choose not to grant it. This gives users a choice to understand the usage of the app and determine whether the requested permission is required for its proper functioning. In some cases, however, if permissions are not given, the app may not work correctly; ultimately, the user is forced to accept all the permissions to use the app.
Android's System Alert Window API was modified in Android 8.0. Apps may no longer draw the special overlay windows previously used to notify the user of critical messages. This has helped prevent the clickjacking attacks mounted by malicious apps creating overlays on the screen, and users are now able to tap a notification to hide overlays [16].
To protect user phone data, Android enforces strict policies for sensitive APIs. In Android 8.0 and above, the GET_ACCOUNTS permission is no longer sufficient to gain complete access to the list of accounts active on the device; for example, the user is now required to grant the Gmail app permission to access the Google account on the device, even though Google owns Gmail. As a concrete example, Settings.Secure.ANDROID_ID (SSAID) is an ID provided to all apps. To prevent misuse of the ANDROID_ID value, Android 8.0 keeps ANDROID_ID unchanged when an application is re-installed, as long as the package name and signing key are identical. Similarly, Build.getSerial() returns the actual serial number of the device as long as the caller holds the phone permission; Android 8.0 deprecated this API, protecting the device's serial number from being misused by applications.
Android has seen steady advancements in hardening its security policies. However, as of October 2020, only 40.35% of devices were running Android 10, and 22.59% were running Android 9 (Pie); more than 35 percent of users were using older versions of Android on their phones [17]. Due to a lack of user awareness and a lack of security in sensitive APIs, users are often manipulated into using over-privileged applications.

III. RELATED WORK
With the growth of Android in the market, malicious applications have also surfaced, which has driven many studies and research works. Iman and Aala [25] presented a comprehensive analysis of the Android permission system, providing important insight into its evolution over the years and showing how permission usage in top applications had increased by up to 73.33% by 2020. Sanz et al. [18] proposed a method to recognize malicious Android applications with the help of machine learning (ML) techniques. Permissions extracted from the Android manifest file of an app were used as features to categorize an application as malware. Karim et al. [15] predicted the permissions of an Android app using collaborative filtering, associative rule mining, and Bayesian text mining. This approach tried to predict the permissions that an application should use by making an association with similar applications. It was developed to help developers know what type of permissions their app might require; it does not include feedback from end-users.
Mathur et al. [27] presented NATICUSdroid, a malware detection framework for Android that classified apps as benign or malicious using statistically selected native and custom Android permissions as features for various ML classifiers. However, these approaches were limited to mobile resources for their processing and classification. Furthermore, they lacked learning abilities and dynamic processing, and they did nothing to stop malicious activities within the application, which limited their success. Azim and Neamtiu [19] used static dataflow analysis on app bytecode for systematic testing of Android apps, constructing a high-level control-flow graph among the various activities inside an app. They derived a depth-first flow among these activities that mimicked user actions. This approach showed good potential and formed a basis for dynamic analysis, though it could not learn by itself and used mobile processing power for the analysis, which has some shortcomings.
Ricardo et al. [20] worked on a framework for Android apps that instrumented an app with injections to keep track of any malicious activity it performs. This approach used dynamic analysis and is the basis of the proposed work; however, it was also unable to use previous results and did not learn from the apps. Shahriar et al. [21] proposed an approach to reduce the number of apps that need to be sandboxed to determine whether they are malicious. They used Latent Semantic Indexing (LSI) to identify malware apps, though this was limited to the identification of malware applications.
Sadeghi et al. [22] presented Terminator, a framework that provides an effective yet non-disruptive defense against permission-induced attacks by identifying the system's safe states and controlling permissions accordingly. It granted access to permissions and revoked identified unsafe permissions without modifying the app's implementation logic. Zhang et al. [23] presented VetDroid to analyze fine-grained causes of information leaks by capturing an app's sensitive behaviors through permission use graphs. It enables security analysts to examine an app's internal sensitive behaviors by reconstructing them after dangerous permissions have been granted. Although it has a lot of potential, it lacks a way to inform and educate users about security threats and does nothing to protect their data.
Wu et al. [24] proposed a system that achieved robust and interpretable classification of Android malware. Their work demonstrated state-of-the-art obfuscation-resilient malware analysis that works on obfuscated Android apps hiding their functionality. Mill et al. [26] proposed a way to classify both obfuscated and unobfuscated apps as malicious or benign. Qu et al. proposed Permizer, an automatic permission optimizer that recommends app permission configurations to users. Permizer builds a mapping between permissions and functionalities for each app and then regulates the relationship between permission and functionality based on the user's privacy preferences [28]. Xiao et al. proposed an approach to identify the minimum required permissions for an Android app. They used collaborative filtering to determine the app's initial minimum permissions, then found the permissions the app really requires for proper functioning using static analysis, evaluated the risk by inspecting the extra permissions requested, and finally generated a permission recommendation [29].
Gao et al. suggested an autonomous permission recommendation system, AutoPer+, which automatically recommends permission decisions to users at runtime. They proposed a deep semi-supervised learning approach to identify similar apps and explore privacy permission usage within a cluster of apps, which helps determine the correlation between a permission and an app and is used to generate permission recommendations [30]. Li et al. built an automatic fuzzing tool, CUPERFUZZER+, to detect vulnerabilities related to custom permissions in the existing Android OS, and gave general design guidelines for securing custom permissions [31].
From the literature, it is found that existing methods do very little to stop an application from behaving maliciously while still allowing its use; almost all of them relied on mobile processing power for the framework and did not learn over time. The one work in which permissions were revoked after use also does not prevent the application from using user data at runtime. This study addresses all these issues by providing a system that analyses an application's functionality permission by permission and prevents it from using user data for potentially malicious purposes. The results obtained from the server during analysis are processed and stored for all new applications.

IV. PROPOSED APPROACH
We propose an end-to-end framework to ensure the proper functioning of an app along with user privacy protection. A background data protection service is installed on the user's phone to capture dangerous API calls at runtime and return garbage data to the potentially malicious app. A dedicated server analyzes and instruments the apps the user is using to make them compatible with the proposed service. Table 1 shows the various keywords used in the proposed model.
The proposed approach uses two algorithms, which require an initial dataset, to determine malicious permissions requested by apps. The control flow of the proposed work is illustrated in Fig. 2. The framework consists of the following two components:

A. ANALYSIS AND INSTRUMENTATION OF THE APK
The analysis of an app is based on the resources and permissions it requires; the permissions are analyzed against the app's API calls and utility. Permission recommendations are used to predict whether the application is demanding extra permissions for stealing user data. For that purpose, collaborative filtering and frequent permission set mining algorithms are applied to the permissions. The training set is based on data collected for various applications across various categories (as described in a later section). The intersection of the results produced individually by the two algorithms, containing the unsafe and extra permissions, is sent to the instrumentation engine, which instruments the APK to support calling the Data Protection Service. Both algorithms classify the permissions of an app based on the category in which it lies in the Play Store and the acquired dataset.
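The final decision described above, keeping only the permissions that both recommenders mark safe and treating everything else the app requests as unsafe, can be sketched as follows (function and variable names are illustrative):

```python
# Sketch of the intersection step: permissions judged safe by BOTH
# recommenders are kept; every other requested permission is treated as
# unsafe and handed to the instrumentation engine.
def split_permissions(requested, safe_cf, safe_fpm):
    """requested, safe_cf, safe_fpm: sets of permission names."""
    safe = requested & safe_cf & safe_fpm   # intersection of both recommenders
    unsafe = requested - safe               # extra/unsafe permissions to instrument
    return safe, unsafe
```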

B. DATA PROTECTION SERVICE
To protect user privacy, the proposed service runs in the background on the user's phone and sends the APK file of an installed app to the server for analysis and instrumentation. It then facilitates the installation of the instrumented APK for the user. Finally, when the application starts running, the service provides garbage data to the app whenever an identified malicious API call is made. Garbage data is produced using a broadcast receiver. The Android framework allows apps to register for events using broadcast receivers in two ways, statically and dynamically, which determine the receiver's lifetime. In the dynamic case, the lifetime depends on the Context.registerReceiver() and Context.unregisterReceiver() calls on the app component. In the static case, the receiver is specified in AndroidManifest.xml and has the same lifetime as the app. The receiver uses a callback, BroadcastReceiver.onReceive(), to override SDK calls [27].
The approaches on which the proposed framework is built are described in detail below:

1) DATA COLLECTION
The data collection approach is divided into two parts. The initial data collection was done by developing and circulating a data collection app, which was downloaded by roughly 300 users and through which information about 1,000 unique applications was collected. From this data, the permissions requested by each app and the permissions actually granted by users were extracted, and the half probability rule was used to determine whether a permission is necessary or malicious. Afterward, whenever the algorithms run on the server, data on each new unique app is added to the database, which grows the dataset and helps the proposed framework learn over time.
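The half probability rule above can be sketched as follows, under the assumption that a permission is deemed necessary when at least half of the users who installed an app granted it (the exact threshold semantics are an assumption, and the function names are illustrative):

```python
# Sketch of the half probability rule: a permission granted by at least
# half of an app's users is labeled necessary; otherwise suspicious.
def label_permissions(grant_counts, num_users):
    """grant_counts: {permission: number of users who granted it}."""
    labels = {}
    for perm, granted in grant_counts.items():
        labels[perm] = "necessary" if granted >= num_users / 2 else "suspicious"
    return labels
```

For example, with 300 users, a permission granted by 280 users would be labeled necessary, while one granted by only 30 would be labeled suspicious.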

2) ANALYSIS AND INSTRUMENTATION
The engine on the server analyzes and instruments APKs (see Fig. 3). It begins by decompiling the app using Apktool, a tool for reverse engineering Android apps. Apktool decompiles the app into Smali code, i.e., the assembly code that runs on the Dalvik Virtual Machine (Android's Java Virtual Machine). The decompiled code then goes through the following stages in the parsing and instrumentation engine.
The application is then repackaged using Apktool. It is installed on the user's Android phone by the Data Protection Service.

a: STATIC ANALYSIS
The original Smali code is given as input to the proposed engine. Static analysis is performed by parsing the Smali code files. A map of the app is created, recording for each file its name, the methods it contains, class names, and API calls; this map serves as a clear reference for instrumentation in the later stages. The manifest file is parsed separately: all permissions are extracted from it, and the dangerous permissions from that superset are kept for analysis. A Python-based parser handles the manifest file and Smali code; it traverses the directory of the decompiled APK and maintains a record of class names, method names, and API calls for each file. A map is created for all method calls. During enforcement of the redefined permission policies, this map is used to locate the functions and files to be instrumented.

b: PERMISSION ANALYSIS
As discussed earlier, two approaches are used to identify the permissions required by an app: collaborative filtering and frequent permission set mining. The training set consists of apps from all categories available in the Google Play Store.

COLLABORATIVE FILTERING
Collaborative filtering is one of the most commonly used techniques in recommender systems. It utilizes the information contained in a group to make recommendations about a new entity related to that group. It is based on the idea that entities that agreed in their evaluations of certain items in the past are likely to agree again in the future. In this method, items are represented by feature vectors, which go through an evaluation process to find a similarity score.
(i) Finding the feature vector in the proposed engine
The app permissions are used as feature vectors for the collaborative filtering engine. The permissions of an app are extracted into a vector V = <P_1, P_2, ..., P_n>, where P_i takes a value from {0, 1} depending on whether the app requests that permission. Feature vectors are taken from the apps in the training dataset. The engine first extracts the apps in the same category as the test app, and then extracts all the permissions in their feature vectors for filtering and recommendation: A_i = <AP_1, AP_2, ..., AP_n>, where AP_i takes a value from {0, 1}.
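The feature-vector extraction can be sketched as a binary indicator over the union of permissions seen in the category (names are illustrative):

```python
# Sketch: turn an app's permission set into a binary feature vector
# V = <P_1, ..., P_n> over a fixed, sorted permission universe.
def feature_vector(app_permissions, all_permissions):
    """P_i = 1 iff the app requests the i-th permission."""
    return [1 if p in app_permissions else 0 for p in sorted(all_permissions)]
```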
(ii) Evaluation of similarity
A similarity score is a measure of how closely related two entities are. The similarity of the test app is calculated against all the apps in the same category using the Jaccard similarity score:

Sim(A_i, A_t) = F_11 / (F_11 + F_01 + F_10)

where A_i is an app from the dataset in the same category and A_t is the test app. F_11 is the number of permissions that match between A_i and A_t (1 in both), F_01 is the number of permissions that are 1 in A_i and 0 in A_t, and F_10 is the number of permissions that are 1 in A_t and 0 in A_i.
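Computed directly from the binary feature vectors, the Jaccard score counts matching 1s over the permissions requested by at least one of the two apps:

```python
# Sketch of the Jaccard similarity Sim = F11 / (F11 + F01 + F10)
# between two binary permission vectors of equal length.
def jaccard(v_i, v_t):
    f11 = sum(1 for a, b in zip(v_i, v_t) if a == 1 and b == 1)
    f01 = sum(1 for a, b in zip(v_i, v_t) if a == 1 and b == 0)
    f10 = sum(1 for a, b in zip(v_i, v_t) if a == 0 and b == 1)
    denom = f11 + f01 + f10
    return f11 / denom if denom else 0.0
```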
(iii) Recommendation of Permissions
A recommendation score is generated for each permission requested by the app A_t. It can be calculated as

RScore(P_j) = Σ_i Sim(A_i, A_t) · AP_ij / Σ_i Sim(A_i, A_t)

Here, majority voting is considered, with each vote's weight proportional to the similarity score generated above, and the resulting RScore is normalized. Depending on the score, a permission is marked safe if it is recommended for the app; otherwise, it is marked unsafe. The step-by-step flow of the permission segregator is presented in Algo. 1.

FREQUENT PERMISSION SET MINING
(i) Training of the Proposed Model
Applications belonging to the same category follow a pattern of frequently co-occurring permissions. Taking permissions in pairs <P_i, P_j>, the support of a co-occurring permission pair is calculated as

Support<P_i, P_j> = Freq<P_i, P_j> / N

where N is the total number of applications in the category and Freq<P_i, P_j> is the number of times the pair <P_i, P_j> is requested together.
(ii) Recommendation of permissions
The support calculated for all pairs is analyzed against a threshold t:

Recommend<P_i, P_j> = 1 if Support<P_i, P_j> > t

Permission pairs having a support value higher than the threshold are marked safe and recommended; the rest are marked unsafe, Recommend<P_i, P_j> = 0. After the safe permissions from each recommender are calculated, the intersection of the resulting permission sets forms the final permission set used for further processing. Algo. 2 presents the various steps of the permission miner.
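The miner's two steps, computing pair support within a category and applying the threshold, might be sketched as follows. The threshold value used here is an illustrative assumption:

```python
from itertools import combinations

# Sketch of the frequent permission set miner: support of each
# permission pair within a category, then the threshold test.
def pair_support(category_apps):
    """category_apps: list of permission sets; returns {pair: support}."""
    n = len(category_apps)
    counts = {}
    for perms in category_apps:
        for pair in combinations(sorted(perms), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / n for pair, c in counts.items()}

def recommend_pairs(support, t=0.5):
    """Recommend<P_i,P_j> = 1 if Support<P_i,P_j> > t, else 0."""
    return {pair: 1 if s > t else 0 for pair, s in support.items()}
```

For instance, in a category of three apps where two request CAMERA together with INTERNET, that pair has support 2/3 and is recommended at t = 0.5.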

C. INSTRUMENTATION
In this study, our primary focus is on dangerous permissions. The permissions suggested by the permission recommendation engine are fed into the instrumentation engine; the suggested policies are marked safe, and the instrumentation engine modifies the policies marked unsafe. The Smali code is instrumented to facilitate communication with the background service at runtime. Hence, through instrumentation, communication between the detected malicious app and the background service is enabled via broadcast receivers. Background processes are implemented as services on Android; these processes do not provide graphical components and carry out background activities for a given program. All services utilized by an app must be declared in the manifest [23]. Calls to permissions marked unsafe are injected with code that invokes the required services through broadcast receivers. Once all the unsafe policies are instrumented within the app, it is repackaged using Apktool.

1) DATA PROTECTION SERVICE
The Data Protection Service is installed on the user's phone and serves two purposes.

a: COMMUNICATION WITH THE SERVER
It is the job of the service installed on the device to communicate with the server over a secure communication channel when the user asks the service app to secure a suspicious app. The background service uploads the APK file of the app to the server for analysis and instrumentation. After instrumentation is completed, the server sends the instrumented APK file back to the service over the same secure channel. The Data Protection Service receives the instrumented app to be installed on the user's phone; on receiving it, the background service prompts the user to ''uninstall and install,'' i.e., uninstall the previous build and install the new, modified build. The service then runs a check on the app; if the app is found to work without issue, the modified APK configuration is approved and sent to the server for future use. Whenever an app is sent to the server for analysis, the server first checks its database for pre-existing records of that app. If found, it instruments the APK file using the pre-processed values; otherwise, the algorithms determine the required permission set, and instrumentation is done accordingly.

b: BACKGROUND SERVICE
Instrumented apps installed on the phone are able to communicate with the pre-installed background service. When a call to an unsafe policy is made, the API call triggers an intent to the background service running on the user's phone, and the service returns a garbage value; hence, the user's privacy is protected. The data received by the instrumented app is thus carefully monitored, and the broadcast receiver sends garbage data corresponding to the given permission to the malicious app, which treats it as real data. Hence, the proposed framework does not hamper the proper functioning of the app, solving the problem of apps malfunctioning when the user declines some unwanted permissions.

2) ANONYMOUS AUTHENTICATION AND KEY AGREEMENT SCHEME
To provide secure communication between mobile clients and the server, an anonymous authentication and key agreement scheme is also proposed. The notations used in this section are presented in Table 2.

a: PROPOSED SCHEME
In the proposed scheme, each mobile is treated as a client.
The proposed scheme provides client authentication without revealing the real identity of the client and session key agreement for secure communication between client and server. The scheme consists of four phases: system setup phase, client registration phase, authentication phase, and client's secret parameter updating phase. The working of the scheme is as follows:

SYSTEM SETUP PHASE
This phase generates the initial parameters for the client registration and authentication phases of the scheme. The system generates the initial parameters as follows:
(i) Choose two large prime numbers p and q, and an elliptic curve E over a prime field F_p (y^2 = x^3 + ax + b mod p, where a, b ∈ F_p and 4a^3 + 27b^2 ≠ 0). Define O as the point at infinity; P is the generator point of E with order q (where P ≠ O).
(ii) The server S chooses a random number s_m as its private key, computes P_pub = s_m.P, and selects three one-way hash functions h_1, h_2, and h_3.
(iii) Server S publishes the public parameters {E, P, p, q, P_pub, h_1, h_2, h_3}.

CLIENT REGISTRATION PHASE
When a client C_i wants to register with server S, the client and server perform the following steps:
(i) The client C_i generates a random number u_i and computes its identity ID_i from the user's email id EID_i and u_i. ID_i and a timestamp T_1 are encrypted with the server's public key to form the registration request C_1; only the server's private key can decrypt C_1.
(ii) On receiving the registration request <C_1>, the server S decrypts the message to obtain the client's identity and timestamp, (ID_i || T_1) = D_s_m(C_1). Server S checks the freshness of the timestamp; if it is not fresh, S drops the registration process. Otherwise, S checks for a collision in the verifier table. If a collision happens, the server S informs the client C_i to restart the registration process; else, the server S chooses a random number v_i and computes the client's secret key K_i = (v_i/(s_m.ID_i)).P, the client's anonymous identity AID_i = ID_i ⊕ h_2(v_i || s_m), and a symmetric key KT = h_2(ID_i || T_1) to communicate K_i and AID_i securely. After that, server S encrypts K_i and AID_i using KT, C_2 = E_KT(K_i || AID_i || T_2), and sends <C_2> to client C_i over a public channel. Server S stores {ID_i, v_i, AID_i} in its verifier table.
(iii) After receiving C_2 from the server S, the client C_i computes KT = h_2(ID_i || T_1) and decrypts the message, (K_i || AID_i || T_2) = D_KT(C_2). Client C_i checks the freshness of T_2; if T_2 fails the freshness test, C_i aborts the current registration attempt and starts the registration process again. Otherwise, C_i stores K_i and AID_i in memory.

AUTHENTICATION PHASE
In this phase, mutual authentication shall be accomplished between client C i and server S, and a session key will be generated. To achieve this, server and client perform the following steps. The details are illustrated in Fig. 4. (i) The client C i chooses a random number r i and computes  to the client C i through a public channel. Session key agreement suggests that the C i wants to communicate securely with server S, subsequent session work as an acknowledgment, and server S updates the old AID i with AID inew in the verifier table. If server S does not receive any message encrypted with SK from C i after mutual authentication and key agreement, the server S will know the client C i may have lost the message and stops AID i . (iii) After receiving the message < AID i , R s , M 2 , T 4 > from server S, client C i checks the freshness of T 4 .
If T_4 is not fresh, C_i drops the session; else, C_i computes R_cs = r_i · R_s (= r_i · r_s · P = R_sc) and the session key SK = h_3(ID_i ∥ R_cs ∥ K_i ∥ T_3 ∥ T_4), and checks whether M_2 =? h_3(ID_i ∥ R_s ∥ T_4 ∥ SK). If not, C_i terminates the session; otherwise, the client C_i accepts SK, computes the new anonymous identity AID_inew = AID_i ⊕ h_3(R_cs ∥ ID_i), and replaces the old AID_i with AID_inew.
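The client-side key derivation and verification above can be sketched as follows; this is a minimal model that stands in for h_3 with SHA-256 over concatenated byte strings and treats the protocol values as opaque bytes.

```python
import hashlib

def h3(*parts: bytes) -> bytes:
    """Model the scheme's one-way hash h_3 with SHA-256 over concatenated inputs."""
    return hashlib.sha256(b"||".join(parts)).digest()

def derive_session_key(id_i: bytes, r_cs: bytes, k_i: bytes,
                       t3: bytes, t4: bytes) -> bytes:
    """SK = h_3(ID_i || R_cs || K_i || T_3 || T_4). Both sides compute the same
    value because R_cs = r_i * R_s = r_s * R_i = r_i * r_s * P."""
    return h3(id_i, r_cs, k_i, t3, t4)

def verify_m2(m2: bytes, id_i: bytes, r_s: bytes, t4: bytes, sk: bytes) -> bool:
    """Client-side check M_2 =? h_3(ID_i || R_s || T_4 || SK)."""
    return m2 == h3(id_i, r_s, t4, sk)
```

If `verify_m2` fails, the client terminates the session, since a valid M_2 can only be produced by a party holding SK.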

CLIENT'S SECRET PARAMETER UPDATING PHASE
After authentication and agreement on the session key, the client C_i sends an update request to the server S, encrypted using the session key SK. On receiving the update request, S generates v_inew for C_i and computes K_inew = (v_inew/(s_m · ID_i)) · P and AID_inew = ID_i ⊕ h_2(v_inew ∥ s_m). After that, S sends K_inew and AID_inew with a timestamp to C_i, encrypted using the session key SK. Client C_i replaces K_i and AID_i with the received parameters K_inew and AID_inew and sends an acknowledgment to the server S. After receiving the acknowledgment from C_i, S replaces v_i with v_inew and AID_i with AID_inew in the verifier table.

b: SECURITY ANALYSIS
This section provides formal security verification using AVISPA and an informal security analysis to show that the scheme provides mutual authentication, client anonymity, and session key agreement, and is secure against known attacks.

FORMAL SECURITY VERIFICATION USING AVISPA
In the scheme execution, client C_i receives the start signal and sends its identity ID_i with timestamp T_1, encrypted with the server's public key, as the registration request. Afterward, client C_i receives the security parameter K_i and anonymous identity AID_i with timestamp T_2, encrypted with the symmetric key KT, from server S, and stores ID_i, K_i, and AID_i in memory. Server S stores ID_i, v_i, and AID_i in the verifier table.
During the authentication phase, client C_i sends <AID_i, R_i, M_1, T_3> to the server S. On receiving the message from client C_i, the server S computes the session key SK and a new anonymous identity AID_inew for C_i and sends the message <AID_i, R_s, M_2, T_4>. Afterward, C_i computes the session key SK using ID_i, R_sc, K_i, T_3, and T_4, where R_sc is a session-specific shared secret between C_i and S. The roles of C_i and S are given in Figs. 5 and 6, respectively.
The constants sec1, sec2, sec3, sec4, sec5, sec6, sec7, sec8, ctos, and stoc identify the secrecy and authentication goals in the goal section (see Fig. 9). The communication channel (dy) used in the implementation of this scheme follows the Dolev-Yao threat model, in which an intruder (i) can intercept, analyze, reroute, and modify messages. The HLPSL code has been simulated using SPAN (Security Protocol ANimator) to examine the results. The HLPSL code of the environment and session is given in Figs. 7 and 8, respectively. The simulation results for the OFMC and CL-AtSe back-ends are shown in Figs. 10 and 11, respectively, and show that the proposed scheme is safe.

INFORMAL SECURITY ANALYSIS
(i) Mutual authentication: In this scheme, the server S authenticates the client C_i by checking M_1; if M_1 is valid, S authenticates C_i. Similarly, by verifying M_2, C_i authenticates S. Thus, the proposed scheme attains mutual authentication. (ii) Client anonymity and privacy: In the proposed scheme, the only identity-related information used in communication is the anonymous identity AID_i. Initially, it is computed using the client's identity ID_i, the server's master key s_m, and the client-specific random number v_i, secured with the one-way hash function h_2. s_m and v_i are known only to S, and ID_i is never used over unsecured communication.
So, an attacker cannot link any communication to the client's identity. The anonymous identity AID_i is updated in every session using the shared computed session value R_sc and the client's identity ID_i, where R_sc is computed from the two random values r_i and r_s (R_sc = r_i · r_s · P). This guarantees the randomness of the anonymity, so it is computationally hard to distinguish whether two messages belong to the same client. (iii) Replay attack: On receiving a request, the server S first checks the freshness of the timestamp and, if it is valid, then checks the validity of M_1. If both are valid, the server S authenticates the client C_i. The attacker cannot change M_1 without knowledge of ID_i; as a result, the attacker cannot launch a successful replay attack. (iv) Malicious insider attack: If an attacker is also a registered client, they know their own secret information ID_e, K_e, and AID_e, where AID_e = ID_e ⊕ h_2(v_e ∥ s_m) and K_e = (v_e/(s_m · ID_e)) · P, as well as the messages communicated between server S and other clients on the unsecured channel, e.g., <AID_i, R_i, M_1, T_3> and <AID_i, R_s, M_2, T_4>, where AID_i is the anonymous identity of the client, R_i = r_i · P, and R_s = r_s · P. It is computationally hard to extract s_m from K_e and AID_e, r_i from R_i, and r_s from R_s. ID_i, K_i, and SK are secured with a one-way hash function. So, using these parameters, the attacker cannot extract any secret parameter of the server or of other clients. (v) Impersonation attack: To impersonate the client C_i, the attacker needs ID_i, K_i, and the current AID_i. These parameters are secured on the client device or can be computed only using the server's master key and the client-specific information stored on the server. Let us assume that the attacker knows the real and anonymous identities of the client C_i. To generate the request message, the attacker can compute R_ie = r_e · P, but computing M_1 requires the client's secret parameter K_i. To impersonate server S, an attacker needs the server's master key s_m and the client's identity ID_i, neither of which it has.
Hence, the attacker cannot impersonate a client or server.

V. RESULTS AND CASE STUDY
In this study, the behavior of nine permissions is analyzed. These permissions (listed in Table 3) are permissions that 'read' user data on the phone. The proposed framework begins with the analysis of the manifest file and Smali code with respect to these permissions. To check the working of the proposed framework, three apps were studied: Brightest Flashlight (goldenshorestechnologies.brightestflashlight.free), Peacock Flashlight (com.peacock.flashlight), and Flashlight (com.splendapps.torch). Each of the apps required a different set of permissions to run: the Brightest Flashlight app asked for various permissions, the Peacock Flashlight asked for location (loc) and storage permissions, and the Splendapps Flashlight asked for none. To obtain a minimum set of permissions and perform instrumentation, these apps were fed to the proposed framework. The apps were decompiled, and Smali code was generated for each APK. Here, AP and ACC represent android.permission and ACCESS, respectively.

A. STATIC ANALYSIS
The first step in the process is static analysis; it yields the permissions declared in AndroidManifest.xml as well as the classes and their respective methods from the Smali code, using the engine's Smali parsers. A list of all permissions of Brightest Flashlight processed by our engine is shown in Fig. 12, and the dangerous permissions among them are shown in Fig. 13. As discussed in Section V on analysis and instrumentation, a map showing method traces and data flow is obtained from the Python parser. The class names and method calls for these dangerous permissions of the Brightest Flashlight APK are shown in Fig. 14.
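The manifest half of this static analysis step can be sketched as below. This is an illustrative stand-in for the engine's parser, assuming apktool-style plain-text XML output; the `DANGEROUS` set lists only the five dangerous permissions named in the text, not the full Table 3.

```python
import xml.etree.ElementTree as ET

# Attributes in a decoded AndroidManifest.xml live in the android namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def declared_permissions(manifest_path: str) -> list[str]:
    """Return the permissions declared via <uses-permission> in a decoded
    (plain-text) AndroidManifest.xml, in declaration order."""
    root = ET.parse(manifest_path).getroot()
    return [elem.get(f"{ANDROID_NS}name") for elem in root.iter("uses-permission")]

# Dangerous permissions mentioned for the studied flashlight apps.
DANGEROUS = {
    "android.permission.CAMERA",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.READ_PHONE_STATE",
    "android.permission.WRITE_EXTERNAL_STORAGE",
}

def dangerous_permissions(manifest_path: str) -> list[str]:
    """Filter the declared permissions down to the dangerous subset."""
    return [p for p in declared_permissions(manifest_path) if p in DANGEROUS]
```

A binary APK's manifest is stored in a compiled (AXML) form, so a decode step such as apktool is assumed to have run first.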
The dangerous permissions declared in the Brightest Flashlight app are AP.CAMERA, AP.ACC_FINE_LOC, AP.ACC_COARSE_LOC, AP.READ_PHONE_STATE, and AP.WRITE_EXTERNAL_STORAGE, as shown in Fig. 13. Similar code analysis of the other two apps shows that the dangerous permissions in the Peacock Flashlight are AP.CAMERA, AP.WRITE_EXTERNAL_STORAGE, AP.ACC_FINE_LOC, and AP.ACC_COARSE_LOC. Meanwhile, the Splendapps flashlight takes only the AP.CAMERA permission, which is justified by its requirements.

B. PERMISSION ANALYSIS
The permissions parsed as described in the previous subsection are provided as input to the permission recommendation algorithms. The algorithms evaluate each permission and yield result vectors. Each result vector contains nine elements, each 0 or 1, and each value corresponds to a permission as given in Table 2. If the permission is marked safe to use and is required by the application, the corresponding value is 1; otherwise, it is 0.
The permission recommender was run for Brightest Flashlight with collaborative filtering, using a threshold value of 0.1, to obtain the result vector; here, r_p denotes resultPermissions. The RScoreCF of each permission was below the threshold value, signifying that this app requires none of these permissions, and all three permissions 'AP.ACC_FINE_LOC', 'AP.ACC_COARSE_LOC', and 'AP.READ_PHONE_ STATE' are classified as unsafe.
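The thresholding step that turns RScoreCF scores into the binary result vector can be sketched as follows; the function signature and names are illustrative, not the authors' code.

```python
def result_vector(scores: dict[str, float], requested: set[str],
                  permissions: list[str], threshold: float = 0.1) -> list[int]:
    """Build the binary result vector: a permission gets 1 only when the app
    requests it AND its RScoreCF reaches the threshold; otherwise 0 (unsafe
    or not needed). `permissions` fixes the element order of the vector."""
    return [1 if p in requested and scores.get(p, 0.0) >= threshold else 0
            for p in permissions]
```

For Brightest Flashlight, all RScoreCF values fall below 0.1, so every element of the vector comes out 0 and the location and phone-state permissions are flagged unsafe.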
From frequent permission set mining, each permission's support was computed and evaluated against the average value, as described in Section V, to obtain the result vector. Since the Splendapps flashlight took no extra permissions, no permission analysis was performed for this app.
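The support computation and average-based flagging used in frequent permission set mining can be sketched like this. The transaction model (one permission set per same-category app) and the function names are assumptions for illustration.

```python
def support(itemset: set[str], transactions: list[set[str]]) -> float:
    """Fraction of apps (transactions) whose requested-permission set
    contains every permission in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def flag_permissions(app_perms: set[str],
                     transactions: list[set[str]]) -> dict[str, int]:
    """Keep a permission (1) when its support among same-category apps is at
    least the average support of the app's requested permissions; else 0."""
    supports = {p: support({p}, transactions) for p in app_perms}
    avg = sum(supports.values()) / len(supports)
    return {p: int(s >= avg) for p, s in supports.items()}
```

Intuitively, a permission that almost every flashlight app requests (e.g. camera) has high support and survives, while a rarely requested one (e.g. fine location) falls below the average and is flagged.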

C. INSTRUMENTATION AND FINAL RESULTS
The permission analysis phase identified LOC and READ_PHONE_STATE permissions as unsafe for the flashlight applications. Brightest Flashlight and Peacock Flashlight were instrumented and installed on the target device. The instrumented apps interacted with the background service at runtime. Garbage location data was sent to the apps, and it was seen that the apps functioned properly after instrumentation.
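One way the location-reading calls could be redirected at the Smali level is sketched below. The stub class name `com.privacyfw.GarbageLocationStub` and the regex-based rewrite are hypothetical; the paper does not specify the instrumentation mechanics.

```python
import re

# Hypothetical stub injected into the APK: a static method that returns a
# garbage Location instead of the real one. Its first parameter takes the
# original receiver (the LocationManager), so register lists are unchanged.
STUB = ("Lcom/privacyfw/GarbageLocationStub;->getLastKnownLocation"
        "(Landroid/location/LocationManager;Ljava/lang/String;)"
        "Landroid/location/Location;")

# Matches the real Smali call site for LocationManager.getLastKnownLocation.
REAL_CALL = re.compile(
    r"invoke-virtual \{(?P<regs>[^}]+)\}, "
    r"Landroid/location/LocationManager;->getLastKnownLocation"
    r"\(Ljava/lang/String;\)Landroid/location/Location;")

def instrument(smali: str) -> str:
    """Rewrite direct location reads into static calls to the stub."""
    return REAL_CALL.sub(
        lambda m: f"invoke-static {{{m.group('regs')}}}, {STUB}", smali)
```

After rewriting, the APK would be rebuilt and re-signed; at runtime the app receives garbage location data while the rest of its behavior is untouched.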
After completing the whole process, the proposed framework gave the following results: 1) A flashlight app requires CAMERA permission for its operation.
2) The rest of the permissions that the two applications requested are classified as unsafe. The results of the applications were added to the dataset for use in the future.
3) Instrumenting and repackaging the applications restored their true use while protecting user data that could otherwise have been used for malicious activities. We took three apps in the same category and of the same utility to study their patterns of operation. The three apps, although of the same nature, behaved differently, as they requested different permissions that were not directly related to the actual functionality for which they were listed. The results obtained from the permission recommender show that two of the applications take extra permissions. After instrumentation, their operation was unaltered, which shows that their functioning was not impacted; at the same time, the user's location was protected from a potentially malicious Android application.
The above case study shows that the proposed framework can be used to analyze and instrument Android applications to prevent user data from being used maliciously.
The existing related works discussed in Section IV are compared to the proposed framework in Table 4. For the comparison, four features are considered: use of dynamic learning (DL), instrumentation (IN), permission recommendation (PR), and app-based dangerous permission detection (ADPD). The proposed framework supports both the analysis of an Android application and the prevention of user data theft, whereas the existing works focus on detecting whether an app is benign or malware. Terminator prevents an application from using extra permissions by revoking access to those permissions identified as dangerous, but it fails in scenarios where an application will not start without access to the permissions it requests. The proposed solution addresses this issue through instrumentation and ensures that the app functions as expected.

VI. CONCLUSION AND FUTURE DIRECTION
The smartphone market has grown extensively in recent years, and smartphones have become a repository for users' private data, making device security a big challenge. As technology advances, the risk of data breaches and invasions of privacy increases. Various research approaches have been presented to identify the malicious behavior of Android applications. In this paper, a privacy-preserving secure framework was proposed to prevent applications from stealing user data by restricting all unnecessary permissions through instrumentation and repackaging of the application. These permissions were recognized by predicting the permissions required by a given Android app using collaborative filtering and frequent permission set mining algorithms. Thus, the proposed model interacts with the target app and modifies the permission data inside it. A layer of security was added to the proposed framework to prevent attackers from intercepting communications. Therefore, the proposed framework is more secure and efficient than the competitive models. Experimental results have shown that the proposed model not only protects user data but also ensures the proper functioning of the given application.
However, this approach may achieve poor results for sealed, protected applications, which generally fall under the finance/payments category, as these applications come with additional security; such apps cannot be installed after they have been instrumented. In the future, the framework can be modified to make it resilient to these additional protections in the applications.
BHARAVI MISHRA received the master's degree from the Indian Institute of Information Technology, Allahabad, India, and the Ph.D. degree from the Indian Institute of Technology (BHU), Varanasi. He is working as the Assistant Professor with the Department of Computer Science and Engineering, The LNM Institute of Information Technology, Jaipur, India. He published more than 15 research articles in reputed journals and conferences. He also published three book chapters. His research interests include machine learning and its applications, security, and privacy.
AASTHA AGARWAL received the B.Tech. degree from The LNM Institute of Information Technology. Currently, she is working at VMware, India, as a Software Development Engineer. Her research interests include brain-computer interfaces with psychology, Android security, and machine learning.