The Effect of Text Ambiguity on creating Policy Knowledge Graphs | IEEE Conference Publication | IEEE Xplore

Abstract:

A growing number of web and cloud-based products and services rely on data sharing between consumers, service providers, and their subsidiaries and third parties. There is growing concern about the security and privacy of data in such large-scale shared architectures. Most organizations have a human-written privacy policy that discloses all the ways that data is shared, stored, and used, and these organizational privacy policies must also comply with government and administrative regulations. This poses a major challenge for providers as they launch new services, so they are moving toward automatic policy maintenance and regulatory compliance. This requires extracting policy from text documents and representing it in a semi-structured, machine-processable framework; the most popular approach is to extract policy information into a Knowledge Graph (KG). A significant body of work uses NLP approaches to convert text descriptions of regulations into policies that are expressed in languages such as OWL and XACML and grounded in the control-based schema. In this paper, we show that NLP-based approaches that extract knowledge from written policy documents and represent it in enforceable Knowledge Graphs fail when the text policies are ambiguous. Ambiguity can arise from lack of clarity, misuse of syntax, and/or the use of complex language. We describe a system that extracts the features of a policy document that affect its ambiguity and classifies documents by the level of ambiguity present, and we validate this approach using human annotators. We show that a large number of documents in a popular privacy policy corpus (OPP-115) are ambiguous, which affects the ability to automatically monitor privacy policies. We also show that NLP-based text segment classifiers are less accurate on policies that are more ambiguous according to our proposed measure.
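The abstract describes extracting ambiguity-related features from a policy document and classifying the document by its level of ambiguity, but does not list the features or the classifier. The sketch below only illustrates the general shape of such a pipeline under loudly stated assumptions: the vague-term lexicon (`VAGUE_TERMS`), the two features (share of vague tokens, average sentence length), the helper names `extract_features` and `classify`, and both thresholds are hypothetical stand-ins chosen for illustration, not the authors' feature set or measure.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL lexicon of vague/hedging words; not the paper's feature set.
VAGUE_TERMS = frozenset({
    "may", "might", "could", "some", "certain", "generally",
    "typically", "usually", "often", "sometimes", "reasonable", "appropriate",
})


@dataclass
class AmbiguityFeatures:
    vague_ratio: float          # fraction of tokens found in VAGUE_TERMS
    avg_sentence_length: float  # tokens per sentence (a rough complexity proxy)


def extract_features(text: str) -> AmbiguityFeatures:
    """Compute two illustrative ambiguity features for a policy text."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        return AmbiguityFeatures(0.0, 0.0)
    vague = sum(1 for t in tokens if t in VAGUE_TERMS)
    return AmbiguityFeatures(vague / len(tokens), len(tokens) / len(sentences))


def classify(features: AmbiguityFeatures,
             vague_threshold: float = 0.05,
             length_threshold: float = 25.0) -> str:
    """Map features to a coarse ambiguity level (thresholds are arbitrary)."""
    # Count how many features exceed their (hypothetical) thresholds.
    signals = int(features.vague_ratio > vague_threshold) + int(
        features.avg_sentence_length > length_threshold
    )
    return ("low", "medium", "high")[signals]


if __name__ == "__main__":
    clause = "We may share some data with certain partners."
    print(classify(extract_features(clause)))
```

A rule-based bucketing is used here only because it is short and transparent; a real system would more plausibly learn weights or thresholds from annotated documents, such as the human annotations the abstract mentions for validation.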
Date of Conference: 30 September 2021 - 03 October 2021
Date Added to IEEE Xplore: 22 December 2021
Conference Location: New York City, NY, USA

I. Introduction

Data sharing between consumers, service providers, and their subsidiaries and third parties is increasingly common [12]. Cloud-hosted services provide a low-maintenance alternative to hosting technology in-house. The cloud service architecture is built on a model of shared resources with a continuous flow of data. The resulting potential for inappropriate dissemination and use of a given consumer's private information has raised public concern [3], prompting the creation of a plethora of data protection regulations and standards such as the Payment Card Industry Data Security Standard (PCI DSS) [11], the European Union's General Data Protection Regulation (GDPR) [1], and the Children's Online Privacy Protection Act (COPPA) [10]. A key element of automating compliance with these regulations is extracting the policies from text and representing them as Knowledge Graphs [18], [22], [23].
