
Runtime and Design Time Completeness Checking of Dangerous Android App Permissions Against GDPR


The Architecture of the Proof of Concept (PoC) for generating GDPR-compliant Dangerous Android Permission Policy Declarations (DAPD) using class relationship diagrams in ...


Abstract:

Data and privacy laws, such as the GDPR, require mobile apps that collect and process the personal data of their citizens to have a legally compliant policy. Because these mobile apps are hosted on app distribution platforms such as the Google Play Store and the App Store, these platforms also require developers who wish to submit a new app or change an existing one to be transparent about their privacy practices for handling sensitive user data that requires sensitive permissions such as calendar, camera and microphone. To verify compliance with privacy regulators and app distribution platforms, app privacy policies and permissions are investigated for consistency. However, little has been done to investigate GDPR completeness checking within the Android permission ecosystem. In this paper, we investigate design-time and runtime approaches to completeness checking of sensitive (‘dangerous’) Android permission policy declarations against GDPR. Leveraging the MPP-270 annotated corpus, which describes permission declarations in application privacy policies, we develop six natural language processing and language modelling algorithms to measure permission completeness during runtime, and a proof-of-concept Unified Modeling Language (UML) class diagram tool that generates GDPR-compliant permission policy declarations from UML diagrams during design time. This paper makes a significant contribution to identifying appropriate permission policy declaration methodologies that a developer can use to target particular GDPR laws, increasing GDPR compliance by 12% in cases during runtime using BERT word embedding; it also measures GDPR compliance in permission policy sentences and provides a UML-driven tool to generate compliant permission declarations.
Published in: IEEE Access ( Volume: 12)
Page(s): 1 - 22
Date of Publication: 25 December 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

The EU’s General Data Protection Regulation (GDPR) came into effect in 2018 and contains 99 articles and 173 recitals that apply to any company that processes or stores the personal data of EU citizens, even if the company is not based in the EU [1]. In the most serious cases, penalties for breaching the GDPR can reach €20 million or 4% of total worldwide annual turnover, whichever is higher. In less serious cases, penalties and fines can still lead to reprimands and restrictions on obtaining and processing personal data, which can be detrimental to a company or organization that needs to store personal information [2]. To protect access to sensitive information and actions, Android uses app permissions to support user privacy [3]. Android defines several base permission types, each characterized by a protection level that describes the risk the permission implies. Dangerous permissions (also known as runtime permissions) are one of the few permission types that the user must explicitly accept. The official Google developer documentation [4] describes a dangerous permission as “a higher-risk permission that would give a requesting application access to private user data or control over the device that can negatively impact the user”. Dangerous permissions therefore carry the risk of revealing personal information and the identity of the user, and their use requires a privacy policy by law [5]. The decision to access sensitive areas of a device to obtain personal information is taken by the application developer and must be declared in the application manifest file.¹ Developers are prone to errors when writing privacy policies that must declare the collection, usage, processing and transfer of personal information in a meaningful and transparent way [6].
Such mistakes could lead the developer to inadvertently breach the GDPR and incur a heavy fine, jeopardizing the company or organization they work for and undermining transparency towards consumers about the handling of personal data. Several studies [6], [7], [8], [9], [10], [11] have shown that developers struggle to embed privacy into software systems. These studies suggest that developers who design systems that collect and process sensitive user data have difficulty incorporating the privacy requirements and protocols of regulatory authorities into software applications. Developers cite the lack of decision support tools for applying data protection principles, privacy reasoning and user privacy verification in software design as the main deterrent to incorporating GDPR principles into software development practices [7], [8], [12].
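To make the manifest requirement concrete, the following minimal Python sketch parses a decoded AndroidManifest.xml and lists the requested permissions that fall in a dangerous set. The `DANGEROUS` set is a small illustrative subset, not the full list of 30 APIs used in MPP-270, and `dangerous_permissions` is a hypothetical helper. A manifest inside an APK is stored as binary XML and would first need to be decoded (for example with a tool such as apktool).

```python
import xml.etree.ElementTree as ET

# Attribute names in AndroidManifest.xml live in the Android XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Illustrative subset of dangerous permissions (assumption: not the full MPP-270 list).
DANGEROUS = {
    "android.permission.CAMERA",
    "android.permission.RECORD_AUDIO",
    "android.permission.READ_CALENDAR",
    "android.permission.READ_CONTACTS",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.SEND_SMS",
}

def dangerous_permissions(manifest_xml: str) -> list[str]:
    """Return the dangerous permissions requested via <uses-permission> tags."""
    root = ET.fromstring(manifest_xml)
    # ElementTree expands android:name to "{namespace-uri}name".
    requested = [elem.get(f"{{{ANDROID_NS}}}name") for elem in root.iter("uses-permission")]
    return sorted(p for p in requested if p in DANGEROUS)

sample = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <uses-permission android:name="android.permission.CAMERA"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
</manifest>"""

print(dangerous_permissions(sample))
# ['android.permission.CAMERA', 'android.permission.READ_CONTACTS']
```

Each permission returned here is one for which a DAPD would be expected in the app's privacy policy; INTERNET is dropped because it is not in the dangerous set.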

In evaluating Android permission completeness, [13] and [14] conducted a large-scale evaluation of 164,156 Android apps to investigate whether each app's privacy policy matches its dangerous permission requests. These investigations showed that app privacy policies and sensitive (or “dangerous”) permission requests are not always transparent. Prior studies [15], [16], [17], [18], [19], [20], [21], and [22] have demonstrated the discrepancies that exist in the Android and iOS ecosystems by evaluating sensitive data access through dangerous permissions, app code, third-party libraries, data dissemination practices, Android API usage, app privacy policies, library inclusion and other relevant metadata. The common denominator among these works is that they investigate the trustworthiness of apps' privacy policies from a privacy and regulatory point of view. The privacy compliance analyses of mobile apps in this literature conclude that questionable privacy policies, inconsistencies, lack of transparency and non-compliance with regulatory requirements are prevalent. Developers must comply with privacy laws, yet no established methodology exists to assist them in developing a privacy policy; as a result, they attempt to comply with regulations without knowing what language and explicit terms are needed to write dangerous Android permission-policy declarations (DAPD) [23]. This has led many mobile application developers to seek guidance on Stack Overflow for creating compliant privacy policies [24], [25], [26]. The challenge of creating GDPR-compliant privacy policies is compounded when developers, whether through confusion, convenience, misuse or disregard, request multiple permissions for the same information [27].

One way to mitigate these challenges is to create automated tools that help small to medium-sized teams generate permission-policy snippets that comply with privacy laws. Toward such developer-centric solutions, this study investigates the GDPR compliance of the dangerous Android permission-policy declarations used for each permission group in 270 mobile applications during runtime. The three-pronged approach investigates (i) the completeness of dangerous Android permissions in fulfilling GDPR obligations, (ii) the feasibility of generating GDPR-compliant policies for sensitive permission requirements extracted from UML diagrams at design time, and (iii) whether the GDPR is fit for purpose in describing Android permission categories, the sensitive data requested, sensitive APIs, the actions permissions represent and their semantic meaning. The GDPR contains articles and recitals that describe the data protection rules an individual or organisation must comply with, while Android permission-policy declarations are a developer’s attempt to convey transparently how an app accesses dangerous permissions to collect sensitive data. It is therefore necessary to investigate whether such permission-policy snippets are coherent, explicit, accurate, concise and transparent, using the GDPR as the benchmark. Dangerous Android permission-policy statements thus provide a verifiable basis for completeness checking of privacy policies and applications. The contributions of this research are highlighted below:

  • Completeness Checking of Sensitive Android Permissions and GDPR: To the best of our knowledge, this is the first work that evaluates completeness checking between the articles of the GDPR and sentences declaring the request and usage of sensitive Android permissions. Most prior works [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39] check the completeness of applications and privacy policies against GDPR requirements. We investigated how well permission policies and permission categories adhere to the GDPR, and further examined the GDPR’s suitability for verifying the accuracy of Android permissions.

  • Empirical Analysis of the Suitability of Diverse Natural Language Processing Techniques for Text Similarity: We evaluate six NLP algorithms to measure GDPR compliance in the language and declarations used by different dangerous Android permission declaration methodologies at multiple textual dimensions. The algorithms investigated are Universal Sentence Encoder (USE) [40], Sentence-BERT (SBERT) [41], GloVe [42], Bidirectional Encoder Representations from Transformers (BERT) word embedding [43], N-Grams, Vector Space Modelling (VSM) [44], [45] and Fuzzy String Matching (FSM).

  • Requirements Engineering: While other techniques have operationalized requirements from texts using statistical NLP [46], semantic frames [47], semantic parsing [48], domain-specific language [49], graphical modelling language [50], privacy-enhanced business process model and notation [51], and information-flow labels [21], we used statistical NLP and UML mapping to identify permission-related requirements.

  • Privacy Policy Generation at Design Time Using UML Diagrams: Using modelling languages for visualising a system at design time, we implement a solution that helps developers generate compliant sensitive-permission declarations from UML diagrams (class diagrams, activity diagrams, etc.) during design time. The tool scans the UML diagrams, determines which permissions are required based on the classes, attributes, operations and relationships between objects, and generates a privacy policy declaration for each such sensitive permission based on a specified threshold.
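The design-time step in the last contribution can be illustrated with a minimal, hypothetical sketch. It assumes a UML class has already been parsed into a name plus attribute and operation names, scores each permission group by the fraction of member names that contain one of its keywords, and emits a templated declaration only when the score reaches a threshold. The keyword lists, the 0.5 threshold, the scoring rule and the template wording are all illustrative assumptions, not the rules of the PoC tool.

```python
from dataclasses import dataclass, field

@dataclass
class UmlClass:
    """Hypothetical, already-parsed UML class element."""
    name: str
    attributes: list[str] = field(default_factory=list)
    operations: list[str] = field(default_factory=list)

# Illustrative keyword lists per dangerous permission group (assumption).
PERMISSION_KEYWORDS = {
    "CAMERA": {"camera", "photo", "capture", "scan"},
    "LOCATION": {"location", "gps", "latitude", "longitude"},
    "CONTACTS": {"contact", "addressbook", "phonebook"},
}

# Illustrative template; real wording would be aligned with specific GDPR articles.
TEMPLATE = ("We request the {perm} permission to provide {feature}. "
            "The data collected through this permission is used only for this purpose.")

def member_names(uml: UmlClass) -> set[str]:
    """Lower-case the class name and all attribute and operation names."""
    return {n.lower() for n in [uml.name, *uml.attributes, *uml.operations]}

def generate_declarations(uml: UmlClass, threshold: float = 0.5) -> dict[str, str]:
    """Return a templated declaration for each permission whose score meets the threshold."""
    names = member_names(uml)
    out = {}
    for perm, keywords in PERMISSION_KEYWORDS.items():
        # Score: fraction of member names containing at least one keyword of this group.
        hits = sum(any(k in n for k in keywords) for n in names)
        if hits / len(names) >= threshold:
            out[perm] = TEMPLATE.format(perm=perm, feature=uml.name)
    return out

scanner = UmlClass("QrScanner", ["cameraId", "lastPhoto"], ["capture", "decode"])
print(generate_declarations(scanner))
```

Here four of the five member names match CAMERA keywords (score 0.8), so only a CAMERA declaration is generated; a real tool would also use the relationships between classes, which this sketch ignores.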

In this paper, the term DAPD is used frequently. By DAPD, we mean statements in the app privacy policy that explicitly or implicitly describe access to dangerous (sensitive) Android permissions declared in the app’s manifest file. These statements are required to provide information about the sensitive data the application collects through dangerous permissions and how it will be processed. If the application accesses multiple sensitive areas of a user’s device, multiple DAPDs are expected in the app policy, since the app requires a permission for each area. By DAPD methodologies, we mean the different methods application developers use to provide these permission-policy snippets in their app privacy policies. We use completeness and compliance interchangeably. We are also aware of the debate around the terms privacy policy and privacy notice, which refer to two distinct documents. The argument has been that a privacy policy is internal, while a privacy notice is external and customer-facing: privacy notices explain to visitors (users) how their data will be used and what their privacy rights are, whereas privacy policies are the company’s guidelines for how employees should protect customer data.² For the purposes of this study, an external, customer-facing statement prepared by app developers that outlines how the app collects, uses and shares user data is referred to as the app privacy policy.

The rest of the paper is structured as follows. Section II reviews the literature on eliciting privacy and security requirements from GDPR for system compliance, completeness checking of privacy policies and applications, and natural language processing techniques for textual similarity in GDPR. Section III presents the methodology, including the key components of the proposed framework for runtime and design-time GDPR compliance checking using Android app permissions, the datasets used and the pre-processing steps, the textual similarity algorithms implemented and the similarity metric used. Section IV presents the results of experiments designed to evaluate the proposed compliance-checking methodology and discusses their practical implications from a developer and platform perspective. Section V discusses the limitations of the proposed approach and future directions, while Section VI concludes the work with a summary of the key findings and future work.

SECTION II.

Literature Review

While several works in the literature [50], [52], [53], [54], [55], [56], [57] have focused on extracting privacy-related and software requirements from GDPR, our work focuses on assisting developers with the compliance requirements associated with Android permission declarations and UML design, based on articles of the GDPR. We review two key areas related to our work: (i) completeness checking of privacy policies, and (ii) completeness checking of software (applications) against data protection regulations.

A. Completeness Checking of Privacy Policies

Completeness checking of privacy policies against GDPR was examined in [28] and [29] using a two-pronged approach that identifies privacy-related requirements in the GDPR and checks privacy policies against them using a conceptual model of metadata traceable to GDPR articles. Abualhaija et al. [30] proposed an automated question-answering approach for discovering legal text passages related to compliance requirements, to help requirements engineers embed privacy in the design of software systems. Lippi et al. [31] proposed CLAUDETTE, a web server that automates the detection of potentially unfair clauses in online contracts using machine learning and natural language processing on a corpus of 50 contracts, to accomplish AI-enabled consumer protection. Tesfay et al. [38] proposed PrivacyGuide, an end-user support tool for reading and understanding privacy policies using GDPR as the guide. Sanchez et al. [32] framed automated privacy policy compliance checking as a multilabel text classification task using SVM, in which each statement in a given policy is assessed and classified against each data protection goal listed in GDPR.

Using a dataset of 115 privacy policies, Mousavi et al. [39] used word embeddings, CNN and BERT for the multilabel classification of privacy policy paragraphs into predefined categories to produce a standard benchmark for privacy policy classification. By representing data practice descriptions in privacy statements as semantic frames, Bhatia et al. [33] proposed an approach for identifying incompleteness in data action instances such as collection, retention, usage and transfer. By modelling data-intensive applications (DIAs) as dataflows, Guerriero et al. [34] proposed a framework for defining, enforcing and checking privacy policies in large-scale DIAs. Elwany et al. [58] produced an Optical Character Recognition (OCR) mechanism to analyze legal documentation, leveraging a fine-tuned BERT model to understand and extract text from legal corpora. Elluri et al. [59] measured the semantic similarity of different GDPR laws with cloud privacy policies. Hegel et al. [60] used NLP and OCR on legal documents to extract visual features such as layout, style and text placement, extracting important pieces of information through enhanced contextual understanding. Other approaches have used crowdsourcing to investigate whether data practices and privacy goals can be reliably extracted from privacy policies for completeness checking [35], [36], [37].

The major advantage of approaches in this area is that they provide an automated way of verifying whether the content of a privacy policy is complete according to the provisions of relevant data protection regulations such as GDPR. By designing completeness criteria based on data privacy goals or privacy-related provisions in the GDPR, these approaches can detect violations in privacy policies. These approaches have some limitations. First, they do not investigate the problem at the level of personal data or sensitive user actions in the privacy policy. Solutions are developed by extracting metadata from the GDPR for completeness checking; for instance, a violation is flagged if a controller is not named in a privacy policy. Such a finding says little about which sensitive or personally identifiable information was compromised. Second, the criteria are constructed from a subjective interpretation and comprehension of GDPR articles. Third, the GDPR distinguishes personal data from special categories of data in its definitions, and these call for different processing requirements; however, current completeness-checking methodologies examine this distinction only broadly. Finally, the approaches are not generalizable, as the metadata identification and completeness approaches have not been evaluated across multiple domains. To replicate metadata-based completeness checking for other data protection regulations such as the California Consumer Privacy Act (CCPA), a new conceptual model of privacy-policy metadata, together with completeness checking criteria for CCPA derived through systematic qualitative analysis, would need to be developed before an automated solution could be built. The effort this requires hinders the replication of the proposed methodologies.

B. Completeness Checking of Applications

Users are concerned about the privacy of the applications they use, especially when sensitive user data is involved, as evidenced by user reviews of COVID-19 contact tracing apps [61]. Fan et al. [62] investigated GDPR compliance violations at the privacy-policy and code level in mobile health applications by verifying the completeness of privacy policies, the consistency of data collection and the security of data transmission. In an exploratory study, Kununka et al. [63] examined the data handling practices and privacy policy compliance of Android and iOS apps for discrepancies. Hatamian et al. [64] studied the extent to which COVID-19 contact tracing Android apps comply with the legal requirements of GDPR. Rahman et al. [13] proposed an automated machine learning solution to check the completeness of dangerous permissions in Android applications against their privacy policies, and highlighted the non-transparent state of permission-policy declarations for dangerous Android permissions. Shezan et al. [48] developed an NLP-driven approach, NLP2GDPR, to automatically extract text from Android applications and generate a GDPR-compliant feature. Slavin et al. [19] created an approach that identifies privacy promises in mobile application privacy policies and checks them against the code using information flow analysis, to see whether data leaves the application in violation of the privacy policy's declarations.

The approaches in this domain have made significant progress in compliance checking of applications, by investigating the compliance of different kinds of mobile applications with the legal requirements of the GDPR and by examining applications for discrepancies that indicate violations. These approaches also go beyond app privacy policies by checking for violations in app code and permissions. A major limitation, however, is that they have not combined permissions, app privacy policies and the GDPR into a three-pronged completeness check that provides a robust view. Apps require privacy policies, and those policies must be GDPR-compliant and disclose sensitive data access that requires dangerous permissions. This limitation motivated the proposed research. By examining related work on completeness checking of privacy policies against GDPR, completeness checking of applications, identification of privacy-related requirements and NLP for semantic similarity in legal documents, we observed that little or no empirical analysis has been conducted to measure the compliance of permission-policy statements for dangerous Android permissions with GDPR, or to generate GDPR-compliant permission-policy statements at design time using UML diagrams.

SECTION III.

Methodology

The methodology investigated in this study is NLP-based automated compliance checking of Android permission-policy declarations against GDPR. We describe the proposed methodology in terms of the framework, dataset collection and pre-processing, the language understanding algorithms used for similarity matching, and the similarity metric for measuring the distance between the vector representations of the permission policies and the GDPR corpus.

A. Overview of Framework

The proposed approach for checking runtime and design-time GDPR compliance using Android app permissions spans four tasks. In the first task, we extract and process the text of the GDPR using natural language processing algorithms. In the second task, we process the text of the annotated corpus that maps each dangerous Android permission to the declarations used by 270 mobile applications. In the third task, we perform completeness checking of the Android permission declarations against GDPR articles and recitals. In the final task, we extract permission requirements from UML diagrams to generate GDPR-compliant permission policies at design time. Overall, our approach enables implicit compliance checking of the software by checking the dangerous Android permission declarations and the class diagram against the articles and recitals of the GDPR. Our work focuses on automating all of these tasks. Figure 1 shows the framework for measuring the completeness of dangerous Android permission declarations in privacy policies against GDPR laws.

FIGURE 1. Completeness checking of dangerous Android permission declarations in app privacy policies against GDPR.

We propose a novel framework that leverages the MPP-270 annotated policy corpus, which maps permissions to privacy-policy snippets for all 10 dangerous permission categories. These snippets are compared with every GDPR article using six NLP algorithms at five textual dimensions to compute cosine similarity (CS) scores, as shown in Figure 1. The five textual dimensions are the sentence level (SBERT and USE), the word-embedding level (BERT), the pure string level (FSM), the vectorization level (VSM) and the N-gram level (BERT and GloVe vectorizations). For every algorithm and permission category, the DAPD with the highest cosine similarity to each GDPR article is extracted. The open-source MPP-270 annotated corpus was developed in [13] as a benchmark dataset for permission completeness, and we used it to investigate the GDPR compliance of permission-policy statements. Details of the dataset’s development and the human annotation process are available on the project website [14].
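The selection step described above, picking for each GDPR article the DAPD with the highest cosine similarity, can be sketched as follows. This is a minimal illustration that assumes the embeddings have already been produced by one of the algorithms; the toy 3-dimensional vectors, the article keys and the snippet texts are hypothetical placeholders.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def best_dapd_per_article(article_ids, article_vecs, dapd_texts, dapd_vecs):
    """For each GDPR article, return the most similar DAPD and its cosine score."""
    sims = cosine_matrix(article_vecs, dapd_vecs)  # shape: (n_articles, n_dapds)
    best = sims.argmax(axis=1)                     # index of the top DAPD per article
    return {
        art: (dapd_texts[j], float(sims[i, j]))
        for i, (art, j) in enumerate(zip(article_ids, best))
    }

# Hypothetical toy embeddings for one permission category (e.g. CAMERA).
articles = ["Art. 6", "Art. 13"]
article_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
dapds = ["We use your camera to scan QR codes.", "Photos are stored on your device."]
dapd_vecs = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]])

result = best_dapd_per_article(articles, article_vecs, dapds, dapd_vecs)
for art, (text, score) in result.items():
    print(f"{art}: {score:.3f} -> {text}")
```

Running this loop once per algorithm and per permission category yields the per-article best matches that the framework compares.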

B. Dataset

The framework requires two input datasets: the GDPR [1] and the MPP-270 [14] corpus. To measure DAPD compliance with GDPR laws, we built a GDPR corpus in a structured, analysis-ready format that contains every GDPR article number, title and text, together with the relevant recitals. The MPP-270 corpus, an annotated corpus of the methodologies used to declare DAPD in 10 permission categories across 270 Android applications, was used as the ground truth whose semantic similarity was matched against the text of every GDPR article. The annotated policy corpus records three key pieces of information about each app: (i) the app identifier, in this case the package name; (ii) its declared dangerous permissions, extracted from the app manifest file; and (iii) the permission-policy snippets extracted from the app privacy policy for each dangerous Android permission declared in the app manifest file. If the app did not declare a dangerous permission, the value ‘0’ is used in place of policy text, while ‘NOT_FOUND’ means that the annotators were unable to locate any permission-policy snippet for the declared dangerous permission [14]. These permission categories are CAMERA, MICROPHONE, PHONE_CALL, SENSOR, SMS, CALENDAR, CONTACTS, LOCATION, STORAGE and PERSISTENTID (cf. Table 1). The list of permissions considered is consistent with the 30 dangerous permission APIs categorized into 10 permission groups in MPP-270 [13], [14], an annotated policy corpus mapping dangerous Android permissions to privacy policies.
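The three cell states described above (‘0’, ‘NOT_FOUND’ and a policy text) can be handled explicitly when loading the corpus. The following sketch assumes a hypothetical CSV layout with one row per app, a `package` column and one column per permission group; the column names and sample rows are illustrative, not the published MPP-270 schema.

```python
import csv
import io

PERMISSION_GROUPS = ["CAMERA", "LOCATION", "CONTACTS"]  # subset for illustration

def classify_cell(value: str) -> str:
    """Map an annotated cell to its meaning in the corpus."""
    value = value.strip()
    if value == "0":
        return "not_declared"      # the app does not request this permission
    if value == "NOT_FOUND":
        return "missing_snippet"   # declared, but annotators found no snippet
    return "snippet"               # a permission-policy snippet is present

def load_dapds(csv_text: str) -> dict[str, list[str]]:
    """Collect usable snippets per permission group, skipping '0' and NOT_FOUND cells."""
    dapds = {group: [] for group in PERMISSION_GROUPS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for group in PERMISSION_GROUPS:
            cell = row.get(group) or "0"  # treat absent or empty cells as not declared
            if classify_cell(cell) == "snippet":
                dapds[group].append(cell.strip())
    return dapds

sample = """package,CAMERA,LOCATION,CONTACTS
com.example.scan,We use the camera to scan codes.,0,NOT_FOUND
com.example.maps,0,Your location is used for directions.,0
"""
print(load_dapds(sample))
```

Keeping ‘NOT_FOUND’ distinct from ‘0’ matters for completeness checking: the former marks a declared permission with no disclosure, which a compliance report would want to count separately.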

TABLE 1. 30 Dangerous Permission APIs Categorized Into 10 Permission Groups [65]

C. Dataset Preprocessing

The GDPR and MPP-270 corpora were pre-processed for the N-gram, VSM, FSM and BERT word embedding implementations. The preprocessing steps include removing all stop words and punctuation and applying lemmatization. Lemmatization was chosen over stemming because it preserves more semantic context. Context is important for semantic similarity, and stemming can reduce words to ambiguous or incorrect forms. Numbers were not removed, because some articles contain references to other laws and directives, which are considered important; for example, if a DAPD directly references a law or directive, its compliance should increase. For the SBERT and USE implementations, stop-word removal and lemmatization were not applied, in order to maximize effectiveness and improve accuracy, because SBERT considers the words to the left and right of each token in a sentence to understand the sentence's context. Additional measures were applied to the MPP-270 dataset to extract accurate information; for example, any cell whose value was 0 or that was empty was excluded from the analysis.
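A minimal sketch of the lexical preprocessing described above (lower-casing, punctuation and stop-word removal, lemmatization, with numbers retained). To stay self-contained, it uses a tiny hypothetical stop-word list and a toy lemma table in place of a full lemmatizer such as NLTK's `WordNetLemmatizer`; in practice those would be swapped in. SBERT and USE inputs would bypass this function, as the text explains.

```python
import re

# Tiny illustrative stop-word list (assumption; a real pipeline would use a full list).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "or", "is", "are",
              "we", "your", "in", "for", "with"}

# Toy lemma table standing in for a real lemmatizer such as WordNet (assumption).
LEMMAS = {"photos": "photo", "using": "use", "uses": "use",
          "directives": "directive", "processed": "process", "stored": "store"}

def preprocess(text: str) -> list[str]:
    """Lower-case, drop punctuation and stop words, lemmatize; numbers are kept."""
    # \w+ keeps alphanumeric runs, so references such as "95/46" survive as "95", "46".
    words = re.findall(r"\w+", text.lower())
    return [LEMMAS.get(w, w) for w in words if w not in STOP_WORDS]

print(preprocess("Photos are processed in accordance with Directive 95/46/EC."))
```

Keeping the digits means that a DAPD citing Directive 95/46/EC can still share tokens with a GDPR article that refers to the same directive.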

D. Semantic Similarity Algorithms

The goal of the semantic similarity algorithms is to extract textual entities at different textual dimensions from the GDPR and the MPP-270 annotated policy corpus. Depending on each algorithm's encoding method, the output takes one of these forms: sentences, word embeddings, strings, vectors or N-grams. We describe the choice and methodology of the six algorithms implemented in this research below.

1) Sentence Embedding

SBERT was implemented by encoding the meaning of each sentence and comparing it with the rest of the index for both the DAPD and GDPR corpora. The SBERT implementation used the pre-trained model ‘all-mpnet-base-v2’, which was trained on 1 billion training pairs. SBERT was chosen because it takes into account the semantic context of every word in a sentence [41]. Both USE and SBERT used the cosine similarity function in the SciPy toolkit to compute results. Other sentence embedding techniques such as InferSent were considered; however, the results in [41], obtained with the SentEval toolkit, show that SBERT outperforms InferSent. USE, on the other hand, was implemented to gauge the embedding interpretation derived from a question-and-answer pre-trained model. Embedding at the sentence dimension makes it possible to identify both the areas that fail to conform to aspects of GDPR laws and the most compliant areas.
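The sentence-level comparison can be sketched as below. The `encode_with_sbert` helper uses the `sentence-transformers` API (`SentenceTransformer(...).encode`) with the ‘all-mpnet-base-v2’ model named above, but it is only defined, not called, because it downloads model weights. SciPy's `scipy.spatial.distance.cosine` returns a cosine distance, so similarity is 1 minus that value; `cosine_similarity` computes the same quantity directly with NumPy so the example runs without a model. The toy vectors are placeholders.

```python
import numpy as np

def encode_with_sbert(sentences: list[str]) -> np.ndarray:
    """Embed sentences with SBERT (requires the sentence-transformers package)."""
    from sentence_transformers import SentenceTransformer  # imported lazily
    model = SentenceTransformer("all-mpnet-base-v2")
    return model.encode(sentences)  # NumPy array, one row per sentence

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Equivalent to 1 - scipy.spatial.distance.cosine(u, v)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for one DAPD embedding and one GDPR-article embedding.
dapd_vec = np.array([0.3, 0.4, 0.0])
gdpr_vec = np.array([0.6, 0.8, 0.0])
print(cosine_similarity(dapd_vec, gdpr_vec))
```

With real embeddings, `encode_with_sbert([dapd_text, article_text])` would return two rows that are passed to `cosine_similarity`.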

The sentence embedding techniques SBERT and USE implement the embedding-based encoding methodology. Their outputs differ in form: SBERT produces a vector embedding, while USE outputs a tensor object for each sentence. Universal Sentence Encoders have been used in [66] to encode the texts of GDPR articles and privacy-by-design principles for automated text similarity tasks. Sentence embedding models have been utilised to detect dangerous Android permissions in app privacy policies in [18]. USE and SBERT have yielded highly precise annotations in [67] for semantic matching between text associated with privacy controls and user queries.

2) Word Embedding

The BERT word embedding algorithm uses the pre-trained model ‘bert-base-uncased’, which has 110 million parameters and was pre-trained on uncased English text [43]. In this implementation, the context of each word is considered across the entire index of both text corpora using a tensor-based approach. Because BERT contextualizes each word using the words that surround it, two BERT implementations were investigated. The first was created without major preprocessing such as punctuation removal; stop words were retained to investigate whether they increase compliance through the additional context the algorithm may derive from the overall token sequence. The second is a preprocessed implementation that uses N-grams. BERT has been used in [58] for understanding and analyzing legal documents and in [30] for extracting compliance requirements from GDPR. The word embedding technique uses BERT with tokenized encoding, and its output is a tensor created from the sequence of input tokens, with each word tokenized individually.
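A sketch of obtaining contextual token embeddings from ‘bert-base-uncased’ with the Hugging Face `transformers` library (`AutoTokenizer`, `AutoModel`, `last_hidden_state`). The text does not state how token vectors are reduced to one vector per text, so the attention-mask-weighted mean pooling below is an assumption. `bert_token_embeddings` is defined but not called because it downloads weights; `mean_pool` is exercised on toy arrays.

```python
import numpy as np

def bert_token_embeddings(text: str):
    """Return (token_vectors, attention_mask) from bert-base-uncased as NumPy arrays."""
    import torch  # imported lazily; requires torch and transformers
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (1, seq_len, 768): one contextual vector per token.
    return outputs.last_hidden_state[0].numpy(), inputs["attention_mask"][0].numpy()

def mean_pool(token_vectors: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padded positions (assumed pooling strategy)."""
    mask = attention_mask[:, None].astype(float)
    return (token_vectors * mask).sum(axis=0) / mask.sum()

# Toy check: the padded third token (mask 0) is ignored.
vecs = np.array([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]])
print(mean_pool(vecs, np.array([1, 1, 0])))  # [2. 3.]
```

The two implementations described in the text would differ only in the string passed to `bert_token_embeddings`: raw text with stop words and punctuation, or the preprocessed N-gram form.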

The transformer architecture [68], which makes use of bidirectional self-attention, is the foundation of BERT. BERT’s attention mechanism, scaled dot-product attention, operates on matrices of queries ($Q$), keys ($K$) and values ($V$). Queries and keys have dimension $d_k$, while values have dimension $d_v$. The weights on the values are obtained with a softmax function, and the output matrix is computed as:
\begin{equation*} \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V \tag{1}\end{equation*}
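Eq. (1) can be checked numerically with a short NumPy sketch. This is a generic implementation of scaled dot-product attention for illustration, not BERT's own code; the random matrices stand in for learned projections of real token representations.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Eq. (1): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (n_queries, n_keys); rows sum to 1
    return weights @ V                                   # (n_queries, d_v)

rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 8)), rng.normal(size=(5, 8))  # d_k = 8
V = rng.normal(size=(5, 3))                               # d_v = 3
print(attention(Q, K, V).shape)  # (4, 3)
```

If all keys are identical, every query attends uniformly and the output for each query is simply the mean of the value rows, which is a quick sanity check on the softmax weighting.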

Each Head_{i} trains its attention map using a group of random parameter matrices on the queries, keys, and values since Multi-Head attention comprises several attention layers operating concurrently [68] as shown below:\begin{align*} \mathrm {Multi-Head}(Q, K, V)&=\mathrm {Concat}\left ({\mathrm {Head}_{1}, \ldots, { \text {Head }}_{\mathrm {h}}}\right) W^{O} \\ \quad { \text {where $Head_{i}$}}&=\mathrm {Attention}\left ({Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V}}\right) \\{}\tag{2}\end{align*}

where the projections W_{i}^{Q}, W_{i}^{K}, W_{i}^{V} and W^{O} are parameter matrices with W_{i}^{Q} \in \mathbb{R}^{d_{\text{model}} \times d_{k}}, W_{i}^{K} \in \mathbb{R}^{d_{\text{model}} \times d_{k}}, W_{i}^{V} \in \mathbb{R}^{d_{\text{model}} \times d_{v}} and W^{O} \in \mathbb{R}^{h d_{v} \times d_{\text{model}}}.
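Equation (2) can be sketched in the same way. In this toy example the projection matrices are drawn at random purely for illustration, whereas in a trained model they are learned; the sizes d_model = 8 and h = 2 are hypothetical (bert-base-uncased uses a hidden size of 768 and 12 heads).

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Equation (1): scaled dot-product attention."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V


def multi_head_attention(Q, K, V, W_q, W_k, W_v, W_o):
    """Equation (2): concatenate h attention heads and project with W_o.

    W_q and W_k are lists of h matrices of shape (d_model, d_k),
    W_v is a list of h matrices of shape (d_model, d_v) and
    W_o has shape (h * d_v, d_model).
    """
    heads = [attention(Q @ wq, K @ wk, V @ wv) for wq, wk, wv in zip(W_q, W_k, W_v)]
    return np.concatenate(heads, axis=-1) @ W_o


# Hypothetical toy sizes.
d_model, h = 8, 2
d_k = d_v = d_model // h
rng = np.random.default_rng(0)
X = rng.normal(size=(6, d_model))  # 6 token vectors; self-attention sets Q = K = V = X
# Random matrices stand in for the learned projections of a trained model.
W_q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_k = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_v = [rng.normal(size=(d_model, d_v)) for _ in range(h)]
W_o = rng.normal(size=(h * d_v, d_model))

Y = multi_head_attention(X, X, X, W_q, W_k, W_v, W_o)
print(Y.shape)
```

The output keeps one d_model-dimensional vector per token, which is what allows attention layers to be stacked.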

The BERT base model (uncased) uses Masked Language Modelling (MLM) and Next Sentence Prediction (NSP) as training objectives to learn bidirectional representations. Given a sentence s = (s_{1}, s_{2}, s_{3}, \ldots, s_{n}), MLM randomly masks 15% of the tokens and replaces them with a special symbol [MASK]. Let T be the set of masked positions, s_{T} the set of masked tokens, and s_{\backslash T} the sentence after masking. MLM pre-trains the model \theta by maximizing the following objective:\begin{equation*} \log P\left(s_{T}\mid s_{\backslash T};\theta\right) \approx \sum_{t \in T} \log P\left(s_{t} \mid s_{\backslash T}; \theta\right). \tag{3}\end{equation*}
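The masking step behind Equation (3) can be sketched as follows. This is a simplified illustration: it replaces every selected token with [MASK], whereas BERT’s original recipe replaces only 80% of the selected tokens with [MASK], swaps 10% for random tokens and leaves 10% unchanged. The function name mask_tokens and the example sentence are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, rate=0.15, seed=None):
    """Simplified MLM masking for Equation (3).

    Selects round(rate * n) positions (at least one) and replaces every
    selected token with [MASK]. Returns the masked sentence (s without T),
    the sorted masked positions T and the original masked tokens s_T.
    """
    rng = random.Random(seed)
    n = len(tokens)
    k = max(1, round(rate * n))
    T = sorted(rng.sample(range(n), k))           # masked positions
    s_T = [tokens[t] for t in T]                  # tokens the model must predict
    masked_positions = set(T)
    masked = [MASK if i in masked_positions else tok for i, tok in enumerate(tokens)]
    return masked, T, s_T


sentence = "we use your camera to take a profile photo when you upload an image".split()
masked, T, s_T = mask_tokens(sentence, seed=42)
print(" ".join(masked))
print(T, s_T)
```

The returned T and s_T correspond to the sets in Equation (3): the model is trained to predict each token in s_T from the masked sentence, and the objective sums the log-probabilities over the positions in T.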


3) N-Grams

The N-gram algorithm extracts the most common bi-grams (N = 2) in every GDPR law and DAPD. It then identifies the DAPD N-gram with the highest cosine similarity to each GDPR law. The most common N-grams and those with the highest cosine similarity are then embedded using GloVe. The GloVe implementation uses the pre-trained ‘glove.6B.300d.word2vec’ vectors, and the embedded N-grams are then compared semantically. The most common N-grams are also compared semantically across both corpora. N-grams were analyzed only at the bi-gram level because the Android permission category SENSOR, which is occasionally also interpreted as BODY_SENSORS, is the only category represented at the bi-gram level. A calculation to measure the semantic meaning of bi-grams was established in [69], where the benchmark synonymy value of two words was proposed to be 0.8025. Thus, 0.8025 is used as the threshold value to decide whether the bi-grams of the two corpora are compliant at the N-gram level.

GloVe and BERT were chosen over other word embedding techniques to embed the most common N-grams and obtain their cosine similarity. BERT was included to compare a contextualized N-gram implementation with a fixed-vector interpretation. The BERT results were converted from a tensor representation to floating-point values. Implementing N-grams and GloVe embeddings is consistent with state-of-the-art techniques for mapping privacy policies to GDPR laws. N-grams were used in [32] to learn the GDPR data protection goals for completeness checking of privacy policies under GDPR. Pre-trained GloVe word embeddings were used in [28] and [29] for vector-space representations of text in the completeness checking of privacy policies against GDPR. Word embedding models such as GloVe, word2vec and fastText were implemented in [70] to measure the semantic correlation between sensitive Android permissions and app textual descriptions.
The N-gram algorithm was implemented with three different encoding methods. The standard N-gram implementation tokenizes the most common N-grams and outputs each N-gram as a tuple. The N-gram BERT and GloVe implementations output the result as a tensor object and an array of vectors, respectively.
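The bi-gram extraction and the 0.8025 threshold check can be sketched in plain Python. The three-dimensional vectors are hypothetical stand-ins for the 300-dimensional GloVe vectors (real vectors could be loaded, for example, with gensim’s KeyedVectors.load_word2vec_format), and averaging the two word vectors of a bi-gram is an assumption about how a bi-gram is embedded, not necessarily the method used in this study.

```python
import math
from collections import Counter

SYNONYMY_THRESHOLD = 0.8025  # benchmark synonymy value proposed in [69]


def most_common_bigram(tokens):
    """Return the most frequent bi-gram (as a tuple) in a list of tokens."""
    counts = Counter(zip(tokens, tokens[1:]))
    return counts.most_common(1)[0][0]


def bigram_vector(bigram, embeddings):
    """Embed a bi-gram as the mean of its two word vectors (an assumption)."""
    vectors = [embeddings[word] for word in bigram]
    return [sum(components) / len(vectors) for components in zip(*vectors)]


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical 3-dimensional vectors standing in for 300-d GloVe vectors.
toy_embeddings = {
    "personal": [0.9, 0.1, 0.0],
    "data": [0.8, 0.3, 0.1],
    "subject": [0.6, 0.5, 0.2],
}

dapd_tokens = "we process personal data and store personal data securely".split()
gdpr_tokens = "data subject rights allow each data subject to object".split()

dapd_bigram = most_common_bigram(dapd_tokens)
gdpr_bigram = most_common_bigram(gdpr_tokens)
score = cosine(bigram_vector(dapd_bigram, toy_embeddings),
               bigram_vector(gdpr_bigram, toy_embeddings))
print(dapd_bigram, gdpr_bigram, round(score, 3), score >= SYNONYMY_THRESHOLD)
```

With toy vectors the score says nothing about real compliance; the sketch only shows the order of operations: extract the most common bi-gram from each text, embed it, compute the cosine similarity and compare it with the threshold.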

4) VSM TFIDF

A statistical vector space approach was used to describe the semantic similarity between the GDPR and sensitive Android permission-policy snippets, using a Term Frequency-Inverse Document Frequency (TFIDF) implementation within Vector Space Modelling (VSM). Shahmirzadi et al. [44] used VSM to extract metrics of patent-to-patent similarity and evaluated its performance with a variety of TFIDF variants and text similarity methodologies. According to their findings, the baseline TFIDF implementation for VSM is an appropriate choice for determining text similarity, while the other TFIDF variants were not beneficial. We implemented the algorithm because the findings of [44] demonstrated that TFIDF VSM is suitable for determining textual similarity at the vector level. The cosine similarity was computed with the Scikit-Learn pairwise similarity function. The VSM technique applies TFIDF weighting to tokenized vectors, and the output is an array containing the TFIDF weight of each tokenized word.
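The TFIDF VSM comparison can be illustrated without external dependencies. The sketch below mirrors the default weighting of scikit-learn’s TfidfVectorizer (smoothed IDF, raw term counts, L2 normalisation) but uses a simple lowercase whitespace tokenizer, which is an assumption; in practice, TfidfVectorizer().fit_transform(docs) followed by sklearn.metrics.pairwise.cosine_similarity gives the equivalent result with scikit-learn’s own tokenization. The example sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF vectors with smoothed IDF and L2 normalisation.

    Mirrors scikit-learn's TfidfVectorizer defaults
    (idf = ln((1 + n) / (1 + df)) + 1, raw counts, norm='l2'), but tokenises
    with a plain lowercase whitespace split, which is an assumption.
    """
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenised for tok in set(toks))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        raw = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # guard against empty docs
        vectors.append([x / norm for x in raw])
    return vocab, vectors


def cosine(x, y):
    """Vectors are already L2-normalised, so their dot product is the cosine."""
    return sum(a * b for a, b in zip(x, y))


docs = [
    "personal data concerning health",          # hypothetical GDPR snippet
    "we collect health data from body sensors",  # hypothetical DAPD
    "record audio with the microphone",          # hypothetical unrelated DAPD
]
vocab, (gdpr, dapd, unrelated) = tfidf_vectors(docs)
print(round(cosine(gdpr, dapd), 3), cosine(gdpr, unrelated))
```

Because only shared terms contribute to the dot product, two texts with no words in common score exactly 0, which is why VSM TFIDF yields low compliance when GDPR and a DAPD express the same idea in different words.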

5) Fuzzy String Matching

The FSM algorithm was implemented using the TheFuzz3 toolkit to interpret GDPR and DAPD at a pure string dimension. Because this algorithm can recognise small differences between two texts, it could assist in interpreting semantic similarity. Two FSM variants were chosen to measure text similarity: the FSM Set Ratio, which finds the ratio of common tokens and calculates a similarity score, and the FSM Partial Ratio, a Levenshtein distance approach in which each word is tokenised and the accumulated common words in both strings are compared. Partial Ratio was chosen because it is suited to comparing strings of different lengths, while Set Ratio was chosen for its flexibility in handling out-of-order words and textual homographs. FSM calculates the score between the two corpora and also compares the score between the most common N-grams extracted for every GDPR law and the most common N-gram from the most similar DAPD. Match similarity produced by Levenshtein-distance-based fuzzy string matching was used in [71] for verifying GDPR compliance based on informed consent and in [72] for analysing the impact of GDPR on website privacy policies.
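The two FSM variants can be approximated with Python’s standard difflib module. These are simplified, hypothetical re-implementations for illustration only: TheFuzz’s own fuzz.partial_ratio and fuzz.token_set_ratio use a Levenshtein-based backend and additional preprocessing, so their scores can differ from the ones produced here.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> int:
    """0-100 similarity from difflib's SequenceMatcher (a stdlib stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a: str, b: str) -> int:
    """Simplified partial ratio: best ratio between the shorter string and
    every window of the same length in the longer string."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0
    n = len(short)
    return max(ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


def token_set_ratio(a: str, b: str) -> int:
    """Simplified token set ratio: compare the sorted shared tokens with each
    string's shared-plus-remaining tokens, so word order and repeats are ignored."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(ta & tb))
    combined_a = f"{common} {' '.join(sorted(ta - tb))}".strip()
    combined_b = f"{common} {' '.join(sorted(tb - ta))}".strip()
    return max(ratio(common, combined_a), ratio(common, combined_b),
               ratio(combined_a, combined_b))


print(partial_ratio("camera", "we access your camera"))                   # short phrase inside a long one
print(token_set_ratio("personal data subject", "subject data personal"))  # same words, different order
```

The partial ratio rewards a short DAPD fragment that appears inside a long GDPR sentence, while the token set ratio ignores word order and repeated words.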

Other methods for representing textual dimensions, based on topic distribution and clustering algorithms, were investigated for their suitability. Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA) and the Hierarchical Dirichlet Process (HDP) were all experimented with. Jensen-Shannon distance, Wasserstein Distance (WD) and Euclidean distance were used as distance metrics to find similarities between textual entities in the corpora. However, these algorithms require a large corpus to produce dependable, stable, and consistent findings. Although the GDPR corpus is extensive, the MPP-270 corpus is small, so this strategy was quickly determined to be inadequate. Table 2 shows the NLP algorithm techniques, methodologies, encoding methods and output results.

TABLE 2 Comparing the Different Algorithms Applied for Semantic Textual Similarity Between DAPD and GDPR Compliance

E. Cosine Similarity

To measure the results of the USE, SBERT, BERT word embedding, N-gram and VSM algorithms, cosine similarity was adopted as the measure of similarity between indexes of the two corpora. The choice of cosine similarity for computing the statistical similarity between two textual entities is consistent with its effective use in document similarity tasks [73], [74], [75], [76]. Furthermore, the effectiveness of the cosine similarity measure has been validated in completeness checking tasks that map privacy policies against GDPR [29], [62], [77]. One of the aims of this study is to find permission-policy snippets that maximize integrability with GDPR compliance through semantic similarity, and cosine similarity appears best suited to this task. Other metrics such as Wasserstein Distance (WD) were considered, but WD assumes that its inputs are probability distributions, whereas the implemented algorithms represent text as embeddings and vectors.

The cosine similarity between two vectors is defined as follows:\begin{equation*} \cos(\mathbf{X}, \mathbf{Y}) = \frac{\mathbf{X} \cdot \mathbf{Y}}{\|\mathbf{X}\| \|\mathbf{Y}\|} = \frac{\sum_{i=1}^{n} X_{i} Y_{i}}{\sqrt{\sum_{i=1}^{n} X_{i}^{2}} \sqrt{\sum_{i=1}^{n} Y_{i}^{2}}} \tag{4}\end{equation*}

where X and Y are the vector representations of a permission-policy declaration and of a GDPR text, respectively. A high cosine value indicates that the permission declaration in the app privacy policy is closely related to an article in the GDPR, so a completeness and compliance judgement can be made about the permission.
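Equation (4) translates directly into code. The following is a minimal pure-Python sketch; production code would normally use a vectorised library routine instead.

```python
import math


def cosine_similarity(x, y):
    """Equation (4): (x . y) / (||x|| * ||y||) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))   # parallel vectors
print(cosine_similarity([1, 0], [0, 1]))         # orthogonal vectors
```

The value ranges from -1 to 1, which is why a slightly negative GloVe result can appear in the N-gram comparison: the two vectors point in slightly opposite directions.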

SECTION IV.

Evaluation

This study aims to answer three key questions and sub-questions that inform the experimental design.

  • RQ1: Do the declarations for sensitive Android permissions in the app permission policy contain meaningful and relevant information in line with GDPR articles and recitals about collecting and processing sensitive data?

    • Is the range of sensitive or dangerous Android app permissions supported by the Android ecosystem adequate and sufficient to fulfil GDPR obligations?

    • What is the level of GDPR compliance of DAPD used by developers?

    • Does GDPR use meaningful and relevant language to enhance the completeness checking of Android app permissions?

    • Are the declarations used in DAPD detailed enough?

  • RQ2: How can we generate privacy policies for the sensitive (dangerous) permissions requested by an app from UML diagrams in a way that clearly and specifically informs users about the sensitive data being requested, the actions the permissions represent or their semantic meaning, in line with data protection laws?

    • Is it feasible to assist mobile app developers in the automated generation of GDPR-compliant permission-policy snippets by extracting permission requirements from UML diagrams at design time?

  • RQ3: Can we adequately conduct compliance checking by matching GDPR laws with Android permission categories, APIs and permission-policy declarations?

    • To what extent is it possible to accurately classify dangerous Android permissions with GDPR?

To answer RQ1, experiments were conducted by mapping the permission-policy snippets of dangerous Android permissions for measuring completeness and compliance. To answer RQ2, UML diagrams in the form of XML data or raw PNG files are taken as input for requirements engineering and privacy policy generation for sensitive Android permissions for design time compliance. To answer RQ3, we analyse the results from the runtime analysis using permission-policy snippets in RQ1 and design time analysis using UML diagrams to measure the effectiveness of GDPR compliance at design and runtime using Android permissions.

In presenting the results, we use terms such as average declaration, cosine similarity average, and highest average identified. The average declaration is a metric calculated for each permission category: every GDPR article is matched with every DAPD methodology and the resulting cosine similarities are averaged for each GDPR article, and the cosine similarity average is then derived by averaging these per-article results over every GDPR and DAPD comparison. It can therefore be described as an average of averages. An equivalent FSM score is calculated in Table 4. The highest average identified declaration metric, on the other hand, takes the cosine similarity of the highest identified DAPD methodology for each GDPR article in each permission category.
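The difference between the average declaration metric and the highest average identified metric can be made concrete with a small sketch. The similarity matrix is hypothetical (two GDPR articles against three DAPD methodologies for one permission category), not data from this study.

```python
from statistics import mean


def average_of_averages(sim):
    """sim[g][d] is the cosine similarity between GDPR article g and DAPD
    methodology d for one permission category: average over the DAPD for
    each article, then average those per-article results."""
    return mean(mean(row) for row in sim)


def highest_identified_average(sim):
    """Keep only the best-scoring DAPD for each GDPR article, then average."""
    return mean(max(row) for row in sim)


def best_dapd_per_article(sim):
    """Index of the highest-scoring DAPD methodology for each GDPR article."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


# Hypothetical scores: 2 GDPR articles (rows) x 3 DAPD methodologies (columns).
sim = [
    [0.6, 0.8, 0.7],
    [0.5, 0.9, 0.7],
]
print(average_of_averages(sim), highest_identified_average(sim), best_dapd_per_article(sim))
```

In this toy case the average of averages is 0.70, while keeping only the best DAPD per article gives 0.85; the gap between the two metrics is the kind of compliance gain the following tables report.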

TABLE 3 VSM, USE, SBERT, GloVe N-gram and BERT N-gram Results of DAPD Compliance Against GDPR Laws Showing the Difference Between the Highest and the Average Cosine Similarity

A. RQ1: Completeness Checking of Sensitive Android Permissions and GDPR

To answer RQ1, the permission-policy snippets for the dangerous Android permissions with the highest cosine similarity are extracted for each GDPR law using all the textual similarity algorithms (cf Section III-D) for all permission categories (cf Table 1). The results of these experiments show the most compliant dangerous Android permission policy declarations to use for each GDPR law. Table 3 shows these results: the highest DAPD cosine similarity for every GDPR law is compared with the average cosine similarity for that law, to show the increase in GDPR compliance obtained by using the best-matching DAPD methodologies. Table 4 compares the average FSM DAPD compliance with GDPR to the highest-scoring FSM DAPD methodologies for every GDPR law. Note that the scale and sensitivity of the textual similarity score differ between algorithms. For example, a cosine similarity of 0.50 for GloVe might be considered very high given the nature of its vectorization, whereas a BERT word embedding cosine similarity of 0.60 would be described as low and 0.80 as high, because BERT is contextual and can find similarities between long-distance words.

TABLE 4 FSM Partial and Set Ratio Results of DAPD Compliance Against GDPR Laws Showing the Difference Between the Highest and the Average Cosine Similarity

Table 3 shows low compliance when the cosine similarity results are derived using VSM TFIDF. This shows that, at the vector level, the methodologies used to declare DAPD do not comply with GDPR laws. VSM TFIDF is the equivalent of a word-to-word similarity search and uses term frequency to weight how important certain words are. Compliance is low because certain words contained in GDPR laws are not used in the DAPD methodologies. Low compliance is expected at this textual dimension given how VSM TFIDF works; nevertheless, compliance in some categories increases significantly. In the STORAGE category, for example, compliance increased by 325% to a highest cosine similarity average of 0.34. Although this level is low, it still indicates a measurable overlap in words and their term frequencies between GDPR and the highest identified DAPD. Such results could indicate that, to raise DAPD compliance, a developer could use contextually similar words. Table 4 shows the results from both FSM algorithms: using the highest DAPD increases GDPR compliance substantially for both, although more reliable results were derived from the pre-processed word embedding technique. The USE results in Table 3 are poor, although this was expected because the pre-trained model was trained on a question-and-answer set.
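Reading the STORAGE figure as a relative increase over the category’s overall average (an assumption, since that baseline is not stated in this paragraph), the implied overall average a can be recovered as a quick check:\begin{equation*} \frac{0.34 - a}{a} = 3.25 \quad\Rightarrow\quad a = \frac{0.34}{1 + 3.25} = \frac{0.34}{4.25} = 0.08 \end{equation*}
In other words, under this reading the highest identified DAPD raise the VSM TFIDF cosine similarity average for STORAGE from roughly 0.08 to 0.34.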

In Table 5, the highest identified cosine similarity is the highest similarity value found between a GDPR law and the permission-policy declarations for each dangerous Android permission category. The average highest cosine similarity is the average of the highest cosine similarity results of every DAPD against GDPR. In contrast, the overall average cosine similarity is the average result of each DAPD against a corresponding GDPR law. The main issue identified is that, on average, the methodologies developers use to declare DAPD are not compliant with GDPR. The contribution of this research is the identification of the most compliant DAPD for a developer to use for each corresponding GDPR law in each dangerous permission category. The average cosine similarity results in Table 5 show that the CAMERA dangerous permission category has an overall average of 0.70 for DAPD compliance with GDPR, while the highest identified DAPD for every GDPR law in the category averages a cosine similarity score of 0.80. For the MICROPHONE permission category, using the highest identified DAPD methodologies for each GDPR law increases GDPR compliance to 0.81, a substantial increase from the overall average of 0.71. This trend continues for the PHONE_CALL category, in which compliance rises from an overall average of 0.72 to 0.77. The SENSOR and SMS categories show the smallest increases in DAPD GDPR compliance: SENSOR increases from an overall compliance result of 0.70 to 0.73, and SMS from an overall average of 0.72 to a highest identified average of 0.75. The CALENDAR permission category increases from 0.67 to 0.72.
The SMS, SENSOR and CALENDAR permission categories have the lowest compliance increase, suggesting that the overall quality of their DAPD is not as good as that of the other dangerous permission categories. The CONTACTS dangerous permission category increases from 0.70 to 0.81, while LOCATION shows a substantial increase from an overall average of 0.73 to 0.85 for the most compliant declarations. STORAGE increases from an overall average of 0.69 to 0.80, and the final dangerous permission category, PERSISTENT_ID, increases from an overall average of 0.73 to 0.83. In some categories, using the identified highest complying DAPD can increase the average compliance for every GDPR law by nearly 20%.
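As a quick arithmetic check, the relative gains implied by the category averages quoted in the two preceding paragraphs can be computed directly. The figures below are only those stated in the text; they are category-level averages, so individual GDPR laws can show larger or smaller changes.

```python
# (overall average, highest identified average) per category, as stated in the text.
reported = {
    "CAMERA": (0.70, 0.80),
    "MICROPHONE": (0.71, 0.81),
    "PHONE_CALL": (0.72, 0.77),
    "SENSOR": (0.70, 0.73),
    "SMS": (0.72, 0.75),
    "CALENDAR": (0.67, 0.72),
    "CONTACTS": (0.70, 0.81),
    "LOCATION": (0.73, 0.85),
    "STORAGE": (0.69, 0.80),
    "PERSISTENT_ID": (0.73, 0.83),
}


def relative_increase(before: float, after: float) -> float:
    """Fractional gain of the highest identified average over the overall average."""
    return (after - before) / before


increases = {category: relative_increase(*pair) for category, pair in reported.items()}
for category, gain in sorted(increases.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:<14} +{gain:.1%}")
```

On this relative scale, LOCATION shows the largest category-level gain (about 16%) and SMS and SENSOR the smallest (about 4%).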

TABLE 5 Comparison of the Cosine Similarity Results Between an Identified DAPD and the Corresponding GDPR Article

The SBERT results in Table 6 show that a DAPD complies differently with different sentences of a GDPR law. Another contribution of this research is the identification of the sections of GDPR sentences that are not covered or that reduce DAPD compliance. Not only are these sections identified, but the DAPD methodology that best complies with each such sentence is also identified. The results also expose compliance issues: even the best methodologies do not adequately cover certain sentences in parts of GDPR laws, which suggests that more in-depth methodologies may be needed to comply with all sentences of GDPR laws. Table 6 presents an example in which two similar sentences in the same GDPR law use the same highest identified DAPD. First, identifying the highest complying DAPD for each sentence significantly increases compliance with the GDPR law. The average cosine similarity for the first sentence in Table 6 is 0.29, while using the highest identified complying DAPD increases compliance to a cosine similarity of 0.62. The second sentence in Table 6 is very similar to the first but has a different context. It uses the same DAPD that achieved the higher level of compliance with the first sentence, yet its compliance value is low; this shows that more in-depth details are needed when declaring DAPD for every sentence of a GDPR law, and that a single declaration is not adequate to cover a GDPR law in its entirety. In theory, a DAPD could be mapped to each sentence of a GDPR law to derive the best level of GDPR compliance. The DAPD was not split into sentences, so that the investigation could focus on the parts of GDPR that lack compliance with the dangerous permission declaration methodology used.
Some declaration methodologies are also too short to compare at the sentence level; for example, ‘name and photo’ is used as a declaration methodology to declare DAPD in the CAMERA category.
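The idea that a DAPD could be mapped to each sentence of a GDPR law, and the best matches conjoined into one longer declaration, can be sketched as below. All names and example texts are hypothetical, the regular-expression sentence splitter is naive, and the Jaccard word-overlap score is only a placeholder for the SBERT cosine similarity used in this study; any function returning a similarity for two strings can be passed as sim.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a: str, b: str) -> float:
    """Word-overlap score used here only as a placeholder for SBERT cosine similarity."""
    ta, tb = set(re.findall(r"[a-z]+", a.lower())), set(re.findall(r"[a-z]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_dapd_per_sentence(gdpr_law: str, dapds: List[str],
                           sim: Callable[[str, str], float] = jaccard) -> Dict[str, str]:
    """Map every sentence of a GDPR law to the DAPD that scores highest against it."""
    return {sentence: max(dapds, key=lambda d: sim(sentence, d))
            for sentence in split_sentences(gdpr_law)}


def conjoined_declaration(mapping: Dict[str, str]) -> str:
    """Join the distinct best-matching DAPD, in sentence order, into one declaration."""
    chosen: List[str] = []
    for dapd in mapping.values():
        if dapd not in chosen:
            chosen.append(dapd)
    return " ".join(chosen)


# Hypothetical GDPR text and DAPD candidates, for illustration only.
law = ("The data subject shall have the right to object to processing. "
       "The controller shall no longer process the personal data.")
dapds = [
    "You can object to processing of camera data at any time.",
    "We no longer process personal data once you delete your account.",
]
mapping = best_dapd_per_sentence(law, dapds)
print(conjoined_declaration(mapping))
```

Each sentence here picks a different DAPD, so the conjoined declaration covers both parts of the law, which a single declaration would not.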

TABLE 6 Examples Comparing How Different Sentences in the Same GDPR Law Have Dissimilar Compliance for Similar Parts of the Law

Table 6 also shows two different sentences from the GDPR law ‘Right to Object’, each with the DAPD methodology that achieves the highest cosine similarity score for that sentence. The two sentences of the same GDPR law are best matched by different DAPD methodologies. The most important observation is the level of compliance of each methodology with its associated sentence. The first sentence of the ‘Right to Object’ law is associated with a detailed and in-depth DAPD methodology that achieves a cosine similarity score of 0.66, whereas the average cosine similarity of the DAPD methodologies for this sentence is only 0.31. The second sentence of the ‘Right to Object’ law in Table 6, by contrast, has a less in-depth highest-scoring DAPD methodology, which achieves a cosine similarity score of 0.48 against an average of 0.27. This example shows that not all methodologies are equal in quality and depth. It also suggests that the methodologies used to declare DAPD may not have enough variation to comply with all aspects of GDPR laws. Again, the results confirm that GDPR compliance increases when the identified highest-scoring DAPD methodologies are used.

Table 7 shows the most frequent N-grams found in the highest complying DAPD and the corresponding GDPR law for each permission category. These N-grams are compared using GloVe vectorization and BERT word embedding to determine the contextual and global vector similarity between them. The dangerous permission categories CAMERA, PHONE_CALL, SMS and LOCATION give identical or near-identical results for BERT and GloVe. For the other categories, the effect of BERT’s contextualized and sensitive approach is clearer. For MICROPHONE, GloVe gives a result of -0.07, meaning the N-grams are dissimilar, while BERT gives a value of 0.39. This reflects BERT’s sensitivity in finding at least some type of connection or context. A BERT result of 0.39 is close to the equivalent of a dissimilar GloVe result: because BERT’s pre-training data is so large and BERT is far more sensitive than GloVe, its results tend to be inflated. Interestingly, no correlation can be found between the DAPD and GDPR laws in Table 5 and the results derived from the N-grams of the same DAPD and GDPR laws in Table 7. Considering the synonymy threshold of 0.8025, proposed in [69] as a threshold value for a semantic correspondence between bi-grams, only the dangerous permission category LOCATION reaches this threshold between the bi-grams identified in the DAPD and the associated GDPR law. All the other categories fail to meet this threshold, although some show observable similarities, such as the CAMERA category, where the Android N-gram (‘personal’, ‘data’) and the GDPR N-gram (‘data’, ‘subject’) have cosine similarities of 0.81 for GloVe and 0.75 for BERT. This could indicate that the entire context and syntactic structure of the DAPD, rather than the use of similar words, is what increases compliance, and that context matters more than textual similarity.

TABLE 7 The Most Common N-Grams Found in the GDPR and DAPD Identified in Table 5

Based on the analysis using several algorithms, the textual similarity dimension with the highest similarity results was found to be the BERT word embedding implementation with the most accurate variant being the pre-processed implementation. Thus, a more in-depth analysis to compare GDPR and the associated text in the identified highest DAPD was conducted. The use of SBERT directly identifies where compliance is failing between each GDPR law and DAPD.

1) Developer Perspective

Do the sentences that developers use to declare DAPD in Android map to relevant and meaningful information in the GDPR articles? Using the highest identified policy methodologies in Table 5, each DAPD and its corresponding GDPR law is analyzed to investigate whether the mappings are meaningful. For some dangerous permissions, the mappings between the permission policy declaration and the corresponding GDPR articles are meaningful. The permission categories with meaningful mappings are PHONE_CALL, SENSOR, LOCATION and PERSISTENT_ID. What these permissions have in common is that they are all mapped to Article 4 GDPR - Definitions. This suggests that this section of the GDPR implicitly or explicitly refers to the permission categories, sensitive Android APIs, sensitive data requested, actions the permissions represent and the semantic meaning of the permissions. For the SENSOR permission category, there are sections in Article 4 that focus on genetic data, biometric data and data concerning health (Article 4(13), Article 4(14), Article 4(15)). For the LOCATION permission, location data is mentioned as part of personal and profiling data in Article 4(1) and Article 4(4). Article 4(1) uses online identifiers as an example of personal data, which matches well with the permission-policy declaration for PERSISTENT_ID. Further, Article 4 is linked with Recital 30 - Online Identifiers for Profiling and Identification, which explicitly mentions “online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags.”,4 which matches the sensitive data that PERSISTENT_ID provides and the PHONE_CALL permission policy description.

In other cases, the permission policy description and the GDPR text match contextually, but neither represents the sensitive data requested by the permissions or the actions they represent. Permissions in this category are CAMERA, MICROPHONE and STORAGE. Similarly, the CALENDAR, CONTACTS and SMS permission policy descriptions and their matched GDPR articles lack contextual similarity and do not represent the actions of the permissions. Overall, these findings imply that using SBERT to match permission policy declarations with GDPR articles by cosine similarity makes completeness checking of dangerous Android permission policy declarations against GDPR achievable and automatable.

Why do a large number of DAPD lack GDPR compliance? While the context of GDPR laws and of the highest identified DAPD is similar, some GDPR laws have different aims. From a developer perspective, the lower compliance of other methodologies for declaring dangerous Android permission policies may stem from a lack of context and from the difficulty of targeting certain GDPR laws, because no recitals or articles accurately capture the permission category. As demonstrated in [13], another possible reason for the lack of compliance is that some permissions, such as SMS, CALENDAR and CONTACTS, are difficult to declare explicitly or implicitly in privacy policies even though they have been declared in the app manifest file.

2) Platform Perspective

Is the range of permission categories used in the Android ecosystem sufficient? It is difficult for a developer to comply with every GDPR law given the limited range of dangerous permission categories. For example, it might be difficult for a developer to comply with articles on ‘Territorial Scope’ while declaring the usage of a dangerous permission category, which is more focused on complying with articles such as ‘Conditions for Consent’. Compliance with some GDPR laws is more crucial than with others. For instance, Article 4 GDPR - Definitions is very important because it defines the different kinds of personal sensitive data protected under GDPR. This suggests that the Android ecosystem could develop permissions around the sensitive data that apply to mobile applications, and ensure that the permission-policy description aligns with the GDPR provisions on collection, transparency and processing. On the other hand, increasing the number of dangerous permission categories may complicate and confuse the process of declaring compliant DAPD for developers. However, the advantage of expanding the number of permission categories is that certain categories could then target crucial GDPR laws. As the results in Table 5 show, carefully constructed DAPD can contextually comply with GDPR laws. One solution is for Google to create more dangerous permission categories based on selected GDPR laws, allowing developers to target sections of GDPR. Since Android permissions aim to support users’ privacy by protecting access to restricted actions and not just restricted data [3], the definition of the restricted actions could be informed by Chapter 2 (Art. 5-11) Principles and Chapter 3 (Art. 12-23) Rights of the Data Subject to create the required dangerous permissions. However, determining which articles and recitals should be targeted could lead to other compliance issues and misinterpretation.
As they neatly map to important categories of personal data in the GDPR, as shown in Table 8 where Y stands for Yes and N for No, we believe that the app permission categories supported by the Android ecosystem are sufficient. The metadata from Storage can be obtained to elicit Location information. Similarly, since Storage is also intended for storing any kind of media including images, then Biometric data can also map with storage. However, the recommendations on focusing on particular articles and recitals relevant to the app ecosystem could be implemented to make it easier for developers to comply with GDPR.

TABLE 8 Mapping Between Article 4 GDPR - Definitions and Permissions Category

Is the language used in GDPR laws not explicit enough? Table 5 shows that the larger articles, which cover more scope, tend to have higher compliance results. Article 4 GDPR - Definitions is one of the articles with the most depth and scope; its average cosine similarity across every permission category for the highest identified corresponding DAPD is 0.84. This may indicate that GDPR laws that are less explicit and have a larger scope make it easier for a developer to comply.

Do the declarations used for DAPD need to be longer and more detailed? The results in Table 5, which give the highest cosine similarity compliance results with the corresponding GDPR laws, and in Table 7, which details the cosine similarity between N-grams, indicate that the best way to create DAPD methodologies is to give them a syntactic and contextual structure similar to that of the GDPR law rather than to reuse the same words. As the results in Table 6 show, the methodologies used to declare DAPD do not comply well with all sections of the associated GDPR law. This may indicate that, to comply to a high standard, the highest identified DAPD for each sentence may need to be used and conjoined into a longer, more detailed declaration. The BERT sentence embedding results make this possible, since each sentence of each GDPR law can be assigned the DAPD with the highest identified compliance. An argument can also be made about mismatches between the explicit declarations used in GDPR and the terms used in DAPD policies. Table 5 shows that although the DAPD and the associated GDPR law are contextually consistent, the actual aim of the GDPR law is usually completely different. For example, the DAPD in the CAMERA dangerous permission category complies best with the GDPR law ‘Right to rectification’: contextually, the permission and the GDPR law are similar, but the aim of the GDPR law is different.

B. RQ2: Permission-Policy Generation at Design Time With UML Diagrams

To answer RQ2, a proof of concept was developed for a class relationship UML design time compliance tool that automatically generates dangerous Android permission policies. The approach targets developers who use UML and is implemented using BERT word embedding, since it produced the best completeness checking results (cf. Section IV-A). The sample class relationship UML image used for the Tesseract OCR Engine⁵ text extraction component of the design time tool is taken from a section of a large UML diagram used for an actual mobile application. For the XML input, the entire XML data of the UML diagram from which the image snippet was taken was used. UML diagrams are used because they support a design time approach, in which GDPR-compliant DAPD can be generated from information available during the development process rather than at the end of the development life cycle. This offers a new approach to developing GDPR-compliant applications. It saves developers time, reduces errors by developers who are not knowledgeable about data protection laws, reduces the likelihood that non-compliant DAPD methodologies are created for the privacy policy, removes costly legal fees associated with privacy policy creation, equips developers with a tool to streamline DAPD creation, and increases transparency about the compliant methodologies a developer should use to comply with each GDPR law.

Figure 2 describes the framework for the tool. A user (developer) uploads the UML diagram either as an image or as an XML file and is prompted to select a permission category. If the input is an image, the Tesseract OCR engine is used to detect all the words in the image; if the input is XML data, all the values associated with the text in the XML are extracted. Regardless of the input type, developer naming conventions such as camel casing and underscores are removed from the extracted text, which separates compound identifiers into their individual words. Both the input and every GDPR law are then preprocessed with stop word removal and lemmatization, and the highest identified DAPD for each GDPR law is loaded. BERT word embedding with the ‘bert-base-uncased’ pre-trained model is then used to tokenize, encode, and embed every GDPR law and the extracted UML data. Cosine similarity is used to calculate the syntactic, semantic, and contextual similarity between the UML data and every GDPR law; when the similarity for a law reaches a defined threshold value, the associated DAPD, taken from the highest identified DAPD for the permission category, is retrieved. The retrieved DAPD is then saved to a text file which the developer can use in a privacy policy. Because the tool uses the Tesseract OCR engine, the solution is not limited to class relationship diagrams: other structural and behavioural diagrams could be used individually or as a collection to generate GDPR-compliant DAPD.
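The naming-convention removal and stop word filtering steps described above can be sketched as follows. This is a minimal illustration, assuming that identifiers are split on underscores, non-word characters and case boundaries with regular expressions; the paper does not specify its exact rules. `split_identifier`, `preprocess` and the tiny `STOP_WORDS` set are hypothetical, and lemmatization is omitted for brevity.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a full pipeline would use a
# complete list (e.g. from an NLP library) and also apply lemmatization.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "get", "set"}


def split_identifier(token):
    """Split camelCase, PascalCase and snake_case identifiers into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", token):  # underscores and punctuation
        # Alternatives, in order: an acronym followed by a capitalised word
        # ("GPS" in "GPSLocation"), a (capitalised) lowercase run, an
        # all-caps run, and digits.
        words.extend(re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part))
    return [w.lower() for w in words]


def preprocess(extracted_text):
    """Turn raw OCR or XML text into a list of content words."""
    words = []
    for token in extracted_text.split():
        words.extend(w for w in split_identifier(token) if w not in STOP_WORDS)
    return words


print(preprocess("CameraManager takeProfilePhoto() get_user_location"))
```

Splitting identifiers before embedding matters because a tokenizer such as the one for ‘bert-base-uncased’ would otherwise see `takeProfilePhoto` as a single unfamiliar string rather than the words "take", "profile" and "photo".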

FIGURE 2. Proof of concept design time class relationship DAPD generator.

Table 9 demonstrates how UML data extracted from an image is transformed into generated permission declarations when compared with GDPR laws that reach a user-defined contextual similarity threshold. For each of the three laws identified, the tool generates the DAPD most contextually similar to the input UML data. The threshold used in this example was 0.09, meaning that each law needed a contextual similarity of at least 0.09 (roughly 9%) to automatically generate the DAPD. The tool can also be used in another way: once developers upload their UML, it automatically scans the UML against the dangerous Android permission categories and outputs both the dangerous Android permissions the application requires, based on the UML, and an optimal permission policy description, based on the MPP-270 corpus, that complies with GDPR. With these results, developers can target specific GDPR articles of interest for compliance and set thresholds according to the business needs of their requirements engineering. The proof of concept in Figure 2 and the results in Table 9 show that privacy policy generation for dangerous Android permissions can be automated from UML diagrams. A classification model was originally planned, but the NLP approach was more suitable given the small size of the MPP-270 corpus.
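The threshold step can be sketched as a simple filter over precomputed similarity scores. This is a hypothetical illustration, not the tool's code: it assumes the comparison is inclusive (`>=`), that laws are returned most similar first, and that the similarity scores have already been computed with BERT and cosine similarity. The law names, scores and placeholder declarations are made up, and writing the result to a text file is left out.

```python
def generate_dapd(similarities, best_dapd, threshold=0.09):
    """Select the DAPD of every GDPR law whose similarity with the UML text
    reaches the threshold, most similar law first.

    similarities: {law name: cosine similarity with the extracted UML text}
    best_dapd:    {law name: highest identified DAPD for the chosen category}
    """
    hits = [(law, score) for law, score in similarities.items() if score >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [(law, best_dapd[law]) for law, _ in hits]


def to_policy_text(selected):
    """Join the selected declarations into text a developer could paste into a policy."""
    return "\n".join(dapd for _, dapd in selected)


# Made-up scores and placeholder declarations, for illustration only.
scores = {"law_A": 0.12, "law_B": 0.095, "law_C": 0.05}
declarations = {law: f"<highest identified DAPD for {law}>" for law in scores}
print(to_policy_text(generate_dapd(scores, declarations)))
```

Raising `threshold` trades coverage for precision: fewer laws pass, so fewer, more closely matched declarations are generated.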

TABLE 9 Demonstration of How UML Data Can be Identified With GDPR Laws Using BERT to Generate Permission Declarations (Threshold Value = 0.09)

C. RQ3: Suitability of GDPR for Completeness Checking of Android Application Permission Policies

We have demonstrated that some dangerous Android permission categories, such as PHONE_CALL, PERSISTENT_ID, SENSOR and LOCATION, can be inferred from GDPR articles by measuring compliance with textual similarity algorithms. In the runtime analysis (cf. Section IV-A), using the highest compliant DAPD identified with BERT increased DAPD compliance with GDPR by 12%. In the design time analysis (cf. Section IV-B), GDPR laws were matched with text extracted from images or XML input of UML diagrams. We also combined the runtime and design time frameworks into an automated tool that generates GDPR-compliant permission-policy snippets for the permission requirements inferred from UML information.

To further corroborate the results in Table 5, we investigated whether the permission categories, the sensitive Android APIs they use, the sensitive data they request, the actions they represent, or their semantic meaning are implicitly or explicitly declared in the matched GDPR articles. Table 10 reports the results using three labels: NM (Not Mentioned), IM (Implicitly Mentioned) and EM (Explicitly Mentioned). It shows that some of the permissions in the Android ecosystem can be inferred and categorised from GDPR articles and recitals.
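The paper does not state how the NM/IM/EM labels in Table 10 were assigned, and they may well have been assigned by hand. Purely to make the three-level scheme concrete, the sketch below applies a hypothetical keyword heuristic: EM if a term that names the permission appears in the article, IM if only a broader related term appears, and NM otherwise. The term lists and `classify_mention` are assumptions; the quoted text is the GDPR Art. 4(14) definition of biometric data discussed in the next paragraph.

```python
from enum import Enum


class Mention(Enum):
    """The three labels used in Table 10."""
    NM = "Not Mentioned"
    IM = "Implicitly Mentioned"
    EM = "Explicitly Mentioned"


def classify_mention(article_text, explicit_terms, implicit_terms):
    """Hypothetical heuristic: EM if an explicit term occurs, else IM if an
    implicit term occurs, else NM. Plain substring matching can over-match
    (e.g. 'record' inside 'recorded'), so this is only a rough stand-in for
    expert judgement."""
    text = article_text.lower()
    if any(term in text for term in explicit_terms):
        return Mention.EM
    if any(term in text for term in implicit_terms):
        return Mention.IM
    return Mention.NM


# Art. 4(14) GDPR defines biometric data; it never names a camera explicitly.
ART_4_14 = (
    "'biometric data' means personal data resulting from specific technical "
    "processing relating to the physical, physiological or behavioural "
    "characteristics of a natural person, which allow or confirm the unique "
    "identification of that natural person, such as facial images or "
    "dactyloscopic data"
)

# Hypothetical term lists for the CAMERA permission category.
CAMERA_EXPLICIT = {"camera", "photograph"}
CAMERA_IMPLICIT = {"facial image", "biometric"}

print(classify_mention(ART_4_14, CAMERA_EXPLICIT, CAMERA_IMPLICIT))
```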

TABLE 10 Dangerous Permission Occurrence in Matched GDPR Articles

We argue that the GDPR is adequate for checking the completeness of sensitive Android permission declarations, as it either explicitly covers permission-relevant data or implicitly describes sensitive user data collection and processing. In some permission categories, the permission policy snippets can be matched explicitly with GDPR articles and recitals, while in others the permission category is only implicitly covered by the GDPR. There are several reasons for the implicit matches. Firstly, some permission-relevant information in MPP-270 could have matched GDPR articles and recitals explicitly if it had contained relevant information about sensitive data collection. For example, Article 4 GDPR - Definitions covers biometric data, which includes facial images and dactyloscopic data and should have mapped directly to the CAMERA permission category. However, due to the quality of the information MPP-270 provides for this permission category, the policy information matched with Article 16 GDPR - Right to Rectification, which does not describe the permission or the actions it represents (cf. Table 5).

Another reason for some of the Not Mentioned or Implicitly Mentioned cases in Table 10 is the language GDPR and Google use to define personal and sensitive user data. Table 11 shows the definitions of personal and sensitive user data given by Google⁶ and by GDPR (cf. Art. 4 GDPR - Definitions, Art. 9 GDPR - Processing of special categories of personal data). Voice is considered personal data under GDPR because it is information relating to an identified or identifiable natural person, and in some cases voice recordings may constitute biometric information under GDPR. While Google uses clear and direct language, defining microphone data as sensitive user data, GDPR lumps voice under PII or biometric data, which is ambiguous and generic. As a result, developers might find it easier to write permission policy snippets in language that complies with the Google Play Developer Programme Policies than with GDPR. Another case in point is the CALENDAR permission category, which allows an app to read, share and save a user’s calendar data. This data also falls under personal data because it is personal information stored on the user’s contact card and could contain PII; however, this permission and the action it represents are not explicitly covered in the GDPR. Table 10 and Table 11 further show that all the categories of personal and sensitive user data are covered, at least implicitly, in the GDPR; hence, compliance can be adequately checked by matching GDPR articles with Android permission categories, APIs and permission-policy declarations.

TABLE 11 Meaning of Personal Data

To further argue that GDPR is suitable for completeness and compliance checking using Android permissions, we align Google Privacy and Terms of Service (ToS) with GDPR to investigate how closely sections of Google Privacy & Terms match articles in the GDPR. Since the Android operating system, which supports the app permissions investigated in this study, is a platform owned by Google, investigating the completeness of Google Privacy & Terms against GDPR provides additional insight into the suitability of GDPR. The Google Privacy Policy and ToS contain 16 and 11 sections, respectively. BERT embeddings were used to match each section of the Google ToS and privacy policy to GDPR, and the closest matching GDPR article for each section was identified as the one with the highest cosine similarity. Table 12 shows the results of this analysis. An interesting insight is that, contextually, the sections of the Google ToS and privacy policy all match articles in the GDPR, with an average cosine similarity of 0.83, except for the Updates section of the ToS, which matches Repeal of Directive 95/46/EC. This may further show how contextually structuring a permission-policy declaration yields higher results, as Google has contextually structured the majority of its ToS and privacy policy towards GDPR Article 4 - Definitions, a key section of the GDPR that sets out the general provisions of the regulation. We therefore conclude that the GDPR is suitable for completeness checking of the Google Privacy Policy and Terms of Service, which can be cascaded down to Google platforms such as the Android operating system that supports app permissions.
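The matching step, in which each policy section is assigned the GDPR article with the highest cosine similarity and the best-match scores are averaged (as in the 0.83 figure above and the 0.84 figure in Section IV-A), can be sketched as follows. This is a minimal illustration under stated assumptions: `match_sections` is a hypothetical helper, the 2-dimensional vectors are toy stand-ins for BERT embeddings, and the section names are illustrative rather than taken from Table 12.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_sections(section_vecs, article_vecs):
    """Map each policy section to its most similar GDPR article and return the
    mapping together with the mean of the best-match scores."""
    matches = {}
    for section, s_vec in section_vecs.items():
        scored = ((article, cosine(s_vec, a_vec)) for article, a_vec in article_vecs.items())
        matches[section] = max(scored, key=lambda pair: pair[1])
    mean_best = sum(score for _, score in matches.values()) / len(matches)
    return matches, mean_best


# Toy 2-D vectors standing in for BERT embeddings (not real data).
sections = {"Information we collect": [1.0, 0.0], "Updates": [0.0, 1.0]}
articles = {
    "Art. 4 Definitions": [1.0, 0.1],
    "Art. 94 Repeal of Directive 95/46/EC": [0.0, 1.0],
}
matches, mean_best = match_sections(sections, articles)
for section, (article, score) in matches.items():
    print(f"{section} -> {article} ({score:.3f})")
print(f"mean best-match similarity: {mean_best:.3f}")
```

Note that the mean is taken over each section's best score only, so it measures how well every section is covered by some article rather than how similar the two documents are overall.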

TABLE 12 Completeness Checking of Google Privacy Policy and Terms of Service Against GDPR

SECTION V.

Limitation and Future Work

One of the limitations of the research is the examined annotated policy corpus. The systematic mapping between app privacy policies and Android permissions was done by manually annotating 270 Android application privacy policies, with apps selected by popularity as measured by number of downloads and user ratings. Firstly, there are currently over 2.65 million apps and games in the Google Play Store⁷, so 270 apps are not representative of the app distribution. Additionally, apps (including games) on the Google Play Store fall into 49 categories,⁸ whereas the top 270 apps used in the corpus cover only 13 app categories. The annotated policy corpus for mapping between permissions and privacy considered 30 dangerous permission APIs, whereas the official Android API documentation⁹ lists 42. Consequently, permission-policy compliance was not investigated for permission groups outside the 10 dangerous permission categories considered, or for permissions added in newer API releases. Finally, paid apps were not among the apps selected for policy annotation. Because of these selection biases, permission policy behaviour might vary between apps and games, popular and non-popular apps, paid and free apps, and evaluated and non-evaluated app categories. The transparency of the app privacy policies used to create the gold standard dataset could be biased by selection criteria that do not truly represent the app market. However, we argue that this limitation does not affect the findings of this research, because we focused on investigating the suitability of GDPR for completeness checking of permission-policy declarations.
Since the corpus depends on human annotators to find permission-policy snippets in app privacy policies for the permissions declared in the app manifest file, it is highly subjective in its interpretation of privacy policies for permission transparency. This is due to the nature of privacy policies, which are ambiguous and subject to multiple interpretations, even among privacy and legal experts [78], [79], [80].

The semantic relationship of textual description bi-grams using GloVe, word2vec and FastText was investigated in [70], and the results revealed inaccuracies in the way each algorithm handles semantic and context-driven disambiguation between entities. Other findings suggested that word embedding techniques in some cases struggled to produce accurate results for words with similar meanings, revealing the technique's limited ability to capture the context a human would infer from certain bi-grams. This issue may have affected the performance of the GDPR completeness checking approach for dangerous Android permission policy declarations, as the word embedding techniques may at times have misinterpreted semantic relationships between words in the sentence transformer or N-gram experiments. As the UML design time compliance tool is a proof of concept, it focuses solely on developers who use UML class relationship diagrams in the software development cycle. This could exclude a proportion of developers who do not use UML during development or do not have a UML class relationship diagram. For permission requirements engineering, design time sources beyond UML, such as UI textual descriptions, can be leveraged. UI textual descriptions have been employed in [70] for the semantic resolution of permission request patterns in Android apps. Such texts may describe access to restricted data or sensitive actions. For example, a UI text field can have a description such as “Upload supporting files”, “Take a photo” or “Start recording an audio message”, each of which accesses private user data or performs a sensitive action protected by the STORAGE, CAMERA or MICROPHONE permissions. Regardless of the source of design time information, whether UML diagrams or UI textual descriptions, we argue that our approach is relevant to automated permission policy generation.
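To show how UI text could feed the same pipeline as UML text, the sketch below maps the three example UI descriptions to permission categories. It is a deliberately naive, hypothetical keyword-prefix matcher, not the semantic resolution approach of [70]; `UI_KEYWORDS` and `infer_permissions` are illustrative names, and a realistic version would need embeddings or a much richer vocabulary.

```python
import re

# Hypothetical keyword table for illustration; a semantic approach such as the
# one in [70] would resolve meaning rather than match word prefixes.
UI_KEYWORDS = {
    "STORAGE": ("upload", "file", "download"),
    "CAMERA": ("photo", "camera", "scan"),
    "MICROPHONE": ("record", "audio", "voice"),
}


def infer_permissions(ui_text):
    """Return the permission categories whose keywords start any word in the UI text."""
    words = re.findall(r"[a-z]+", ui_text.lower())
    return [
        category
        for category, keywords in UI_KEYWORDS.items()
        if any(word.startswith(k) for word in words for k in keywords)
    ]


for label in ("Upload supporting files", "Take a photo", "Start recording an audio message"):
    print(label, "->", infer_permissions(label))
```

Prefix matching lets "files" match "file" and "recording" match "record" without a stemmer, at the cost of occasional false positives.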

The solution could also be extended to other permission-declaring platforms such as iOS applications and browser extensions. A similar permission compliance analysis could be carried out for other GDPR-like laws and standards, such as the California Consumer Privacy Act (CCPA) and the Payment Card Industry Data Security Standard (PCI DSS). For GDPR, an expanded empirical analysis could incorporate more textual representations, extending the scope beyond the textual dimensions considered in this study. The measurement of textual similarity was mapped in [81] in terms of textual distance and representation, highlighting the many textual and numerical combinations that could be used to reach a stronger conclusion about DAPD-GDPR compliance. Using machine learning to generate compliant, contextualized DAPD from UML information could also be investigated, although this approach would require large amounts of DAPD data to generate compliant contextualized declarations unique to each UML application. The UML design time tool concept could be extended with a browser extension plug-in that scans the DAPD in a privacy policy and detects inadequate DAPD. Application developers could then be alerted when such declarations fall below a compliance threshold. The Google Play Store could deploy this idea as part of the approval process for uploaded applications, requiring the privacy policy to contain compliant DAPD. This would require a substantial amount of training data, which is not yet available. The UML design time tool concept could also be expanded to include other structural and behavioural diagrams, such as flowcharts, entity relationship diagrams, and sequence and activity diagrams. Collecting information from these diagrams could help create more targeted and compliant DAPD.
Another future direction is investigating other pre-trained models for language understanding such as MPNet [82], which combines masked and permuted language modelling.

SECTION VI.

Conclusion

This paper investigates runtime and design time GDPR completeness checking of dangerous Android permissions. For runtime analysis, completeness checking was done by comparing the permission policy declaration for each permission category requested in the app privacy policy against GDPR articles. For design time analysis, UML class diagrams were used to extract permission requirements from class elements and generate GDPR-compliant permission-policy declarations. The results identify the most compliant permission policy declarations for each permission category. As previously highlighted, developers often lack the legal knowledge to write compliant permission policy declarations. This paper contributes to the state of the art with a tool that equips developers to automatically generate GDPR-compliant DAPD methodologies and avoid non-compliant DAPD, using design time requirements and without requiring legal knowledge from the developer. We also demonstrated that the completeness of permission policies with respect to GDPR articles could be substantially improved by giving the DAPD a contextual structure similar to the targeted GDPR law rather than reusing its exact words. Other state-of-the-art solutions focus on generating requirements or on textual analysis of existing privacy policies. This project combines NLP with semantic similarity to automatically generate compliant DAPD from requirements captured in UML class diagrams.

One area of future work we are keen on exploring is a usability analysis of the proposed UML tool. Since the goal is to help developers with privacy policy generation and with eliciting requirements for GDPR-compliant permission declarations from UML diagrams, a usability evaluation would help measure how easily developers can learn and use the tool to achieve compliance goals, especially for permission-declaring systems such as browser extensions and mobile apps. User satisfaction measured in the usability evaluation will feed back into the tool development process to improve its effectiveness, efficiency, flexibility and robustness. Such a usability study would contribute to building compliance tools that are developer-friendly and developer-centric. Another area of future work is creating a larger benchmark annotated policy corpus for permission completeness. In this study, we leveraged MPP-270, which maps the permissions requested (declared in the app manifest file) to permission-relevant information in app privacy policies and was created by manually annotating 270 Android application privacy policies. With a larger annotated corpus, a classification model built on machine learning algorithms could be integrated into our solution.
