An Integrated Knowledge Graph to Automate Cloud Data Compliance

To address data protection concerns, authorities and standards bodies worldwide have released a plethora of regulations, guidelines, and software controls to be applied to Cloud data. As a result, service providers maintaining their end-user’s private attributes have seen a surge in compliance requirements. Since most of these regulations are not available in a machine-processable format, it requires significant manual effort to adhere to them. Often many of the laws have overlapping rules, but as they are not referencing each other, providers must duplicate efforts to comply with each regulation. We have done a detailed study of all the data protection regulations that apply to Cloud data. We have developed an integrated, semantically rich knowledge graph that captures these various data compliance regulations. It includes the data threats and security controls that are needed to mitigate the risks. In this paper, we present this knowledge graph in detail, along with the system that we have developed to evaluate it. We have validated our knowledge graph against the privacy policies of various Cloud service providers like Amazon, Google, IBM, and Rackspace. This knowledge graph is available in the public domain and can be used by organizations to automate their compliance processes and set their enterprise Cloud security policies.


I. INTRODUCTION
Cloud services increasingly maintain their consumers' confidential attributes, such as personal details, browsing patterns, and financial payment details, to facilitate a seamless user experience. A significant portion of this consumer data is often shared by Cloud service providers with their subsidiaries and third parties for further analysis to ensure customer retention and increase purchase volume. Hence, even though Cloud-based services provide cost savings and rapid provisioning/scaling, privacy and security of Cloud data remain a concern for most consumers [42]. Because of this surge in sensitive information on the Cloud, regulatory organizations the world over are formulating data protection legislation, such as the European Union's General Data Protection Regulation (EU GDPR) [63] and the Payment Card Industry Data Security Standard (PCI DSS) [64], to which Cloud service providers must adhere. Simultaneously, various security standards for Cloud data have been proposed, or are being developed, by standards organizations like the Cloud Security Alliance (CSA) [51], the International Organization for Standardization (ISO) [52], and the National Institute of Standards and Technology (NIST) [6]. Cloud providers are incorporating these regulations and standards in their solutions to make their systems robust and acceptable to consumers. This spurt in data protection regulations and security standards has resulted in overwhelming legal compliance challenges for Cloud services, and businesses often fixate on a single tree or branch in the forest of laws, regulations, and standards, and seldom step back to gain an overall view of the compliance forest [42].
The associate editor coordinating the review of this manuscript and approving it for publication was Longxiang Gao.
Data protection regulations are currently not machine-processable and are available only in a textual format, requiring significant manual effort to parse their rules and constraints. Therefore, it is nearly impossible to determine in real time whether a compliance violation has occurred. Another issue is that data protection policies often contain legal jargon that requires expert interpretation, resulting in increased compliance costs. Real-time tracking of data flow on the Cloud would ensure that any operation performed on consumer data, from the acquisition of the data to its manipulation or sharing to its end-state archival in an organization, can be verified and documented for future audits. We envision that an integrated, semantically rich, machine-processable knowledge graph (or ontology) that captures the various data compliance regulations, as they apply to Cloud data, will significantly help in automating an organization's data compliance processes. In addition to saving organizational resources dedicated to compliance adherence, it will also help in proactively identifying data breaches. Another advantage of building this integrated knowledge graph is that potentially contradictory policies in the organization can be identified and rectified as needed.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
As a first step towards this vision of a holistic data compliance knowledge graph (or ontology), we have created a semantically rich knowledge graph that captures the various compliance regulations and potential data threats, together with the corresponding CSA controls [44]. We have also developed a comprehensive representation of the rules encapsulated in PCI DSS and GDPR [44]. We used Semantic Web technologies, Natural Language Processing (NLP), and text mining techniques to create this ontology, which is machine-processable. Hence, it can also contribute significantly to automating the continuous monitoring of data operation, transfer, and sharing. In this paper, we describe this knowledge graph in detail, along with the methodology we used to build it. We have validated this knowledge graph against the data policies of five key vendors. The knowledge graph is available in the public domain [85], [86] and can be used to automate a significant part of data protection compliance in an organization.
We conducted a comprehensive study of the various compliance models and security controls that apply to Cloud-based services. We also reviewed the potential threats faced by Cloud consumers and determined the compliance models and security controls that should be in place to manage these risks [61]. For our study, we analyzed more than 20 compliance models for Cloud computing as well as for IT management. We also reviewed more than 100 Cloud providers for their security standards by examining the security-related whitepapers posted on their websites.
In this paper, we first discuss the related work in section II. In section III, we present our analysis of the various Cloud security compliance models and classify them according to their security domains. The Semantic Web ontology for Cloud security compliance and security standards is described in section IV. In section V, we describe our results and validation. We conclude in section VI and outline the future work planned.

II. RELATED WORK
A. CLOUD DATA COMPLIANCE
Data protection standards contain a set of rules or policies formulated by regulatory agencies or standards organizations [58]. Security and privacy compliance models, like ISO 27001, COBIT, etc., have been proposed for Cloud computing security to ensure data protection and user privacy. We have analyzed and categorized the various Cloud compliance models according to security controls implemented. The features of each compliance model relevant to Cloud security are discussed in section III.
Cloud security [56] mainly focuses on the policies and controls used to protect the data present in the Cloud. Both Cloud providers and consumers can face security issues. Cloud providers should ensure that consumers understand data protection requirements while using their services. To enforce security, Cloud providers implement various security controls, which can be categorized as Deterrent, Preventive, Detective, and Corrective controls [57].
While Cloud services and deployment models have been classified into different types, the security controls used to protect their environments are the same for all service types: SaaS, PaaS, and IaaS. Compliance models are applied based on security controls. We have to synchronize these models to ensure adequate Cloud security.
The IT compliance model focuses on electronic data processing, network, and IT infrastructure. The compliance model implements some rules and regulations across the various components of IT to make them work harmoniously. The security model is adopted based on these compliance models. One of our key contributions has been to associate the various compliance models and security controls. This transparency amongst the Cloud model, security control model and the compliance model will help the end-users achieve the data protection in a better way.
Before adopting a Cloud service, consumers should consider all potential threats that might compromise their data. CSA [2] lists threats like data breaches, data loss, account or service hijacking, insecure interfaces and APIs, denial of service, malicious insiders, abuse of Cloud services, insufficient due diligence, and shared technology vulnerabilities. Cloud providers understand the importance of these persistent issues and have implemented various security standards. Vendors like Amazon [3], Rackspace [4], and Google [5] specify the security standards that they have incorporated on their platforms. According to Spamina [37], there are more than 800 Cloud providers available all over the world. The question is, how many of them are using Cloud security standards and are capable of fighting potential threats [6], [2]?
In [49], the security issues of different Cloud services are defined. The authors also note that Cloud providers should state security issues in their SLAs (service level agreements), which gives Cloud consumers a clear idea of Cloud security issues. In 2013, CSA published CCM v3 (Cloud Controls Matrix version 3) [14], which consists of more than 135 security controls and related compliance models. The ISO 27001:2013 document [21], [7] consists of 114 security controls in 14 different groups. However, it does not have security controls like data encryption and media protection. NIST 800-53 [25] presented its list of security controls in 18 groups. The DoD (Department of Defense) has also published a list of eight information assurance areas and controls. There is a need to identify common security controls that are easy for consumers to comprehend, and our prototype system attempts to do just that.

B. SEMANTIC WEB ONTOLOGY
The Semantic Web extends the World Wide Web by providing standards to express relationships between pieces of web information, and it deals primarily with data instead of documents. It enables data to be annotated with machine-understandable metadata, allowing their retrieval and their usage in correct contexts to be automated [1], [45].
Semantic Web technologies include languages such as Resource Description Framework (RDF) and Web Ontology Language (OWL) for defining ontologies and describing metadata using these ontologies, as well as tools for reasoning over these descriptions [1], [17], [83], [84]. These technologies can be used to provide standard semantics of privacy information and policies, enabling all agents who understand basic Semantic Web technologies to communicate and use each other's data and services effectively [1], [17], [83], [84].
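To make the idea of RDF-style statements concrete, the following is a minimal sketch that stores subject-predicate-object statements as plain Python tuples and queries them with a wildcard pattern. It does not use an RDF library, and the identifiers (`ex:GDPR`, `ex:appliesTo`, and so on) are hypothetical names chosen only for illustration, not terms from the published ontology.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers, chosen only for this illustration.
GRAPH = [
    Triple("ex:GDPR", "rdf:type", "ex:Regulation"),
    Triple("ex:PCI_DSS", "rdf:type", "ex:Regulation"),
    Triple("ex:GDPR", "ex:appliesTo", "ex:Provider"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which resources are typed as regulations?
regulations = [
    t.subject for t in match(GRAPH, predicate="rdf:type", obj="ex:Regulation")
]
```

A real deployment would express the same statements in RDF/OWL so that standard reasoners can process them; the tuple form only shows the shape of the data.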

C. TEXT EXTRACTION
Researchers have applied Natural Language Processing (NLP) techniques to extract relevant information from large corpora of text documents. Rusu et al. [10] proposed a technique to extract information and relevant phrases in the form of subject-predicate-object triples: parse trees were generated from English sentences, and triples were extracted from the parse trees [17], [10]. Etzioni et al. [11] developed the KNOWITALL system, which automates the extraction of extensive collections of facts from the web in an unsupervised, domain-independent, and scalable manner, using pattern learning to address this challenge [17]. Another NLP technique applied to information extraction from unstructured text is noun phrase extraction [71]. Rusu et al. [10] showed how to create triples from noun phrases obtained via various part-of-speech taggers. Different automated techniques have been used for extracting permissions and obligations from legal documents [17], and text mining and semantic techniques have been explored by various authors in the past [17], [24], [25]. Kagal and Finin [19], [22] proposed an ontology-based policy framework to model conversation specifications and policies using obligations and permissions [17], [19], [22].

III. DATA PROTECTION REGULATIONS
As a first step towards building our integrated data compliance ontology, we did a detailed study of various security and privacy regulations and guidelines that apply to data managed by Cloud services. Figure 1 illustrates our high-level reference model for Cloud data security that we used to build our methodology. In this section, we list the key Cloud compliance standards along with the security controls that are needed for these regulations. In our prior work [12], we have analyzed the critical security threats faced by Cloud services consumers and related them to the security controls and compliance models that protect from these threats.
The following are the critical security controls that affect Cloud security. We have referenced the NIST and CSA security documents [28], [14], [54]. We also correlate them with security standards based on the description of controls.

A. DATA ENCRYPTION, KEY MANAGEMENT
Data encryption is necessary to provide data confidentiality and integrity. Encryption/decryption key management also allows users to securely access the data they are authorized for. Data encryption includes application encryption and network encryption. The compliance model for data encryption should be capable of preventing accidental exposure and misuse of the data in public domains. After analyzing several security standards, we found that data encryption standards like FIPS 140-2 and Vaultive fulfill these requirements. The CSA guide [14] suggests avoiding old security standards like DES (Data Encryption Standard). Key management is also an essential aspect of data encryption and can be done using KEKs (key-encrypting keys) [40].

B. MEDIA PROTECTION
Media protection includes the protection of entertainment content like music, movies, and software [61]. It is the responsibility of Cloud providers to protect the entertainment content of users from piracy [61]. It may contain pre-release material from creative arts to the software industry. Strong compliance models should be adhered to, and legal action should be taken against the attackers. If the media protection security control model is implemented correctly, more consumers will store the data in the Cloud. The MPAA compliance model is specially designed for media protection.

C. IDENTIFICATION, AUTHENTICATION, AND AUTHORIZATION
Identification consists not only of user identification but also of device and resource identification. Multi-tenancy requires that consumers share common resources in the public domain. Assigning the correct resources to authorized users is an essential aspect of this security control [61]. After identification, authentication of users also plays a crucial part in this model. Users should be authenticated by means such as keys and passwords. Cloud providers should also provide access controls to users so that they can give rights to other authorized users [61]. This is called authorization. Cloud providers should apply a compliance model that manages these three tasks. This will not only enforce data security but will also help to implement other security control models more effectively. Compliance models like OAuth and NIST 800-63 provide guidelines for valid authentication, identification, and authorization in a Cloud system.

D. VIRTUALIZATION AND RESOURCE ABSTRACTION
Virtualization in the Cloud can be used to achieve higher density through multi-tenancy and resource utilization, which makes the organization more efficient. Virtualization and Resource abstraction control models (mainly technology, architecture, and service models) should focus on new tools and techniques to improve visibility for security operators. Virtualization brings more specific Cloud security issues like inter-virtual machine attacks, hypervisor security, etc. It is recommended that a virtual machine setup should also include firewall implementation. PCI-DSS standard is not only focused on the payment card industry, but it also supports hypervisor security implementation.

E. PORTABILITY AND INTEROPERABILITY
Interoperability is the ability of the various components of a Cloud system to work together for higher performance. It is achieved by creating standards for application programming interfaces (APIs) through which all the components collaborate. Different platforms have different APIs, so a common standard is needed to make systems interoperable with each other. It is advised that OCCI (Open Cloud Computing Interface) [14] and libCloud be applied whenever possible. Portability is the reuse of the components of a Cloud system, which decreases the production cost. However, there must be a mechanism that allows a component to be reused between different systems while keeping the data secure. The security standards implemented on the Cloud system should enable information sharing with other systems; otherwise, additional expense and re-engineering will be incurred.

F. APPLICATION SECURITY
Application security is the overall security of the applications running on the Cloud. Achieving it requires attention to the following processes: a secure SDLC (software development lifecycle), authentication, and authorization. A secure SDLC can be achieved by implementing maturity models like the Systems Security Engineering Capability Maturity Model (SSE-CMM). Application security controls should also implement and verify mechanisms for validation and authentication.

G. SECURITY RISK ASSESSMENT AND MANAGEMENT
When security is added to the Cloud, the risk factors should also be considered. Cloud computing allows the sharing of resources across all the consumers at a low cost. However, Cloud providers should implement the authorization and risk assessment for utilizing shared resources. FedRAMP is a compliance model, which provides guidelines for risk assessment and management. It also differentiates between the shared authorization model and the system-centric authorization model.

H. PRIVACY, ELECTRONIC DISCOVERY, AND OTHER LEGAL ISSUES
Privacy and electronic discovery focus on managing the physical location of data and on accessing it confidentially. This control also implements privacy and confidentiality policies to ensure compliance. For this security control, documents such as terms of service and privacy policies should be reviewed. The EDRM-PSRRM compliance model provides security and risk reduction models for privacy and e-discovery.

I. CONTINGENCY PLANNING
It is the Cloud consumers' responsibility to understand the Cloud provider's contingency plans and Service Level Agreements (SLAs) to make sure that Cloud providers meet all the requirements. According to NIST 800-34, the steps of contingency planning are: develop the contingency planning policy statement, conduct a business impact analysis, identify preventive controls, create contingency strategies, develop a contingency plan, ensure testing, and plan for maintenance.

J. DATACENTER OPERATIONS, MAINTENANCE
Security controls should also have standards for maintaining data centers. Maintenance of data centers includes configuration and personnel security with a background check to enter secured data center location, physical privacy of data center, and authentication [61].

K. INCIDENT RESPONSE
Cloud providers should develop a response plan in case of any incident like data breaches or data loss. Computer forensics offers various tools and techniques for incident response. The incident response lifecycle consists of the following phases: preparation for the incident; detection and analysis of incidents; data sources, forensics, and other investigation support for incident analysis; and recovery from the incident [61].

L. COMPLIANCE, AUDIT, AND ACCOUNTABILITY
Cloud computing environments are dynamic and bring new opportunities for additional audit capabilities. These policies require the implementation of robust evaluation criteria. After implementing the compliances, regular audits should be conducted to ensure data security.

M. AWARENESS AND TRAINING
Cloud awareness and training programs should be provided for consumers who want to migrate their data to the Cloud but are not aware of all the threats and security controls.

Based on the security controls definition provided by NIST [28] and CSA [14], we try to relate the security compliance laws to the security controls. In Table 1, the security controls supported by NIST or CSA are listed, followed by the recommended Cloud compliance regulations.
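The association between security controls and compliance models discussed in this section can be sketched as a simple lookup structure. The mapping below only collects the models named in the subsections above; it is not a reproduction of Table 1, and the shortened control names and helper functions are assumptions made for this illustration.

```python
# Control -> compliance models, as named in the subsections of this section.
# Keys are shortened control names chosen for this sketch.
CONTROL_TO_MODELS = {
    "data encryption and key management": ["FIPS 140-2", "Vaultive"],
    "media protection": ["MPAA"],
    "identification, authentication, and authorization": ["OAuth", "NIST 800-63"],
    "virtualization and resource abstraction": ["PCI DSS"],
    "portability and interoperability": ["OCCI"],
    "application security": ["SSE-CMM"],
    "security risk assessment and management": ["FedRAMP"],
    "privacy and electronic discovery": ["EDRM-PSRRM"],
    "contingency planning": ["NIST 800-34"],
}


def models_for(control):
    """Return the compliance models associated with a control (empty if unknown)."""
    return list(CONTROL_TO_MODELS.get(control, []))


def controls_for(model):
    """Return, in sorted order, every control that a compliance model supports."""
    return sorted(
        control
        for control, models in CONTROL_TO_MODELS.items()
        if model in models
    )
```

Keeping the mapping in one place makes overlaps visible: a model that appears under several controls only needs to be assessed once.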

IV. COMPLIANCE KNOWLEDGE GRAPH
In this section, we describe our methodology in detail. We aim to present a rich policy-based knowledge representation of the data compliance regulations with the corresponding CSA controls. Figure 4 illustrates the integrated high-level ontology. The three phases of our methodology are:

A. PREPROCESSING STAGE
For the regulations, we extracted relevant chapters and key terms and then mapped them to the corresponding CSA controls. In the first stage of our system, we extracted the repository and checklist of GDPR [44] and PCI DSS [38], respectively. In our previous work [38], [44], we extracted the relevant key terms from the PCI DSS and GDPR documents and built the knowledge graph accordingly. In the preprocessing stage of that work [44], we extracted chapters 3 and 4 of the GDPR, which address Consumers and Providers. As in our previous research [38], [44], we obtained the key terms shown in Table 2 and Table 3.

B. KNOWLEDGE GRAPH/ONTOLOGY DEVELOPMENT
We have developed a comprehensive Data Compliance ontology (Figure 4) that integrates the knowledge representation of various Cloud regulations. For creating the knowledge graph, we utilized the Protégé toolset.
The main classes include:
• Stakeholder: This is the main class, representing the key organizations that are affected by the regulations. It has three main subclasses: Consumers, Providers, and Regulators. The Consumer class represents the data users and includes properties of end users. The Provider class represents the data providers and includes properties of the providing organization's Cloud policies. The Regulators class represents the regulatory bodies and includes all the details of the council.
• Regulations: This class captures details of a regulation, including its name, description, scope, and country. It is associated with one or more stakeholders. Individual regulations are captured by different subclasses. We have also integrated the knowledge graphs that we have already developed for various regulations, like GDPR [49], PCI DSS [38], and HIPAA, with this ontology. As part of our ongoing work, we are developing knowledge graphs for other regulations.
• The Regulations class is associated with the Cloud Security Controls and Cloud Threats classes.
• Cloud Security Control: This class represents the security controls recommended by the Cloud Security Alliance. In this paper, we have related all the regulations captured in the Regulations class to the Cloud Security Control class.
• Cloud Threats: The purpose of this class is to associate various Cloud threats with the appropriate regulation from the Regulations class. It captures the threat name and description as properties.
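The class structure listed above can be sketched in a few lines of Python. This is only an illustration of the hierarchy and associations described in the text, not an export of the Protégé ontology; the property names (`appliesTo`, `hasSecurityControl`, `addressesThreat`) are hypothetical.

```python
# Subclass relations described in the text (child -> parent).
SUBCLASS_OF = {
    "Consumer": "Stakeholder",
    "Provider": "Stakeholder",
    "Regulator": "Stakeholder",
    "GDPR": "Regulation",
    "PCI_DSS": "Regulation",
    "HIPAA": "Regulation",
}

# (domain class, property, range class); property names are hypothetical.
OBJECT_PROPERTIES = [
    ("Regulation", "appliesTo", "Stakeholder"),
    ("Regulation", "hasSecurityControl", "CloudSecurityControl"),
    ("Regulation", "addressesThreat", "CloudThreat"),
]


def ancestors(cls):
    """Walk up the subclass chain and return all superclasses of cls."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def properties_of(cls):
    """Return (property, range) pairs that cls declares or inherits."""
    lineage = [cls] + ancestors(cls)
    return [(prop, rng) for dom, prop, rng in OBJECT_PROPERTIES if dom in lineage]
```

For example, `properties_of("GDPR")` returns the three associations declared on `Regulation`, because every individual regulation inherits them from its parent class.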
Below are example statements from the privacy policies for the key term "controller".
"Microsoft: Identified which Microsoft entities are data controllers under the GDPR, how to contact us, and how to lodge a complaint" [43]
"WhatsApp: Partners (the data controllers) may submit personal information about their customers to WhatsApp using WhatsApp's Business Products." [57]
"Google: Additionally, for products where Google and the customer each act as independent controllers of personal data, we have updated our agreements or made available terms that reflect that status." [48]
"Facebook: A company is a data controller when it has the responsibility of deciding why and how (the 'purposes' and 'means') the personal data is processed." [46]
"AWS: 'the data exporter' means controller who transfers personal data" [47]
We then applied deontic logic and divided the whole set of rules into either Permissions or Obligations. Some of the statements from organizational policies are listed below.
"Facebook: Under the GDPR, data controllers must adopt compliance measures to cover how data is collected, what it's used for, and how long it's retained. They also need to make sure people can access the data about them" [46]
"WhatsApp: The data subject can enforce against the data exporter this Clause, Clause 4(b) to (i), Clause 5(a) to (e), and (g) to (j), Clause 7, Clause 8 (2), and Clauses 9 to 12 as third-party beneficiary" [57]
"AWS: This DPA shall continue in force until the termination of the Agreement (the 'Termination Date')." [47]
"Facebook: Data processed must be necessary for the Service and defined in the contract with the individual." [46]
We have also identified the key classes of a knowledge graph to represent the GDPR rules. We have referenced the GDPR regulation available at [37], [38] for this.
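The split of policy statements into Permissions and Obligations can be approximated with a simple modal-verb heuristic. The sketch below is an assumption for illustration only: the cue lists and the `classify` function are hypothetical and are not the deontic-logic procedure used to build the knowledge graph.

```python
import re

# Modal cues that typically signal each deontic category (illustrative lists).
OBLIGATION_CUES = ("must", "shall", "need to", "needs to", "required to")
PERMISSION_CUES = ("may", "can", "is allowed to", "are allowed to")


def _contains(text, cue):
    """True if cue occurs in text as a whole word or phrase."""
    return re.search(r"\b" + re.escape(cue) + r"\b", text) is not None


def classify(statement):
    """Label a policy statement; obligation cues take precedence over permission cues."""
    text = statement.lower()
    if any(_contains(text, cue) for cue in OBLIGATION_CUES):
        return "Obligation"
    if any(_contains(text, cue) for cue in PERMISSION_CUES):
        return "Permission"
    return "Unclassified"


label = classify("Data processed must be necessary for the Service.")
```

Checking obligations first is a deliberate choice: a sentence such as "providers may share data but must log it" still imposes a duty, so it is safer to treat it as an obligation.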

1) CONSUMERS AND PROVIDERS
The regulation splits the tasks and obligations of consumers and providers, obligating them to provide "adequate guarantees to implement suitable technical and organizational measures" to meet the regulation's policies and protect data subjects' rights [63].
The regulation provides specific guidance on what kinds of security actions should be considered "appropriate to the risk," including [63]:
• The pseudonymization and/or encryption of personal data.
• The ability to ensure the ongoing confidentiality, integrity, availability, and resilience of systems and services processing personal data.
• The ability to restore the availability of and access to data promptly in the event of a physical or technical incident.
• A process for regularly testing, assessing, and evaluating the effectiveness of technical and organizational measures for ensuring the security of the processing.

2) FINES AND ENFORCEMENT
A breach of compliance will result in fines of up to 4% of global revenue or €20m (equivalent to roughly $23.4m), whichever is greater. The amount depends on the severity of the breach and on the organization's ability to demonstrate that measures were in place to protect customer data.
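The "whichever is greater" rule for the upper bound of the fine can be written as a one-line computation. The sketch below assumes revenue is given in euros; the function and constant names are hypothetical, and the result is only the ceiling, since the actual fine depends on the severity of the breach.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("20000000")  # EUR 20 million
REVENUE_SHARE = Decimal("0.04")      # 4% of global revenue


def max_fine_eur(global_revenue_eur):
    """Upper bound of the fine: the greater of EUR 20m and 4% of global revenue.

    Pass the revenue as an int or a numeric string to avoid float rounding.
    """
    return max(FIXED_CAP_EUR, REVENUE_SHARE * Decimal(global_revenue_eur))
```

For an organization with EUR 1 billion in global revenue, the 4% share (EUR 40 million) exceeds the fixed amount; for one with EUR 100 million, the EUR 20 million figure applies.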

3) BREACH AND NOTIFICATION
In the event of a personal data breach, data consumers must inform the appropriate supervisory authority without undue delay and, where possible, not later than 72 hours after becoming aware of it. If the notification is not made within 72 hours, the consumer must provide a reasoned justification for the delay [37].
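A machine-processable form of the 72-hour rule could look like the following sketch, which computes the notification deadline and flags late notifications that need a reasoned justification. The function names are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at):
    """Latest time to notify the supervisory authority after learning of a breach."""
    return aware_at + NOTIFICATION_WINDOW


def needs_justification(aware_at, notified_at):
    """True if the notification came after the 72-hour window."""
    return notified_at > notification_deadline(aware_at)


# Example: breach discovered on 1 January 2020 at 09:00 UTC.
aware = datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc)
deadline = notification_deadline(aware)
```

A compliance monitor could evaluate `needs_justification` for every recorded incident and raise an alert before the deadline passes.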

4) DATA PROTECTION OFFICER
Whoever holds this position will be accountable for managing data protection and data privacy and will be free to give approvals or feedback without fear of negative repercussions. This role is required only if an organization handles large volumes of sensitive data; it is typically not applicable to small and medium-sized enterprises.

5) DATA SUBJECT
Individuals will have more information on how their data is handled, and this information should be available in a clear and understandable way. Consumers must inform data subjects, at the time of collection, about the period for which (or the reasons why) data will be retained. If data subjects subsequently wish to have their data removed, and the data is no longer required for the purposes for which it was collected, then it must be erased.
To develop the ontology, we used a mixture of top-down and bottom-up approaches by answering the following questions:
• What are the major obligations that will impact an organization?
• What are the specific entities that will be affected?
• Are there any common obligations for consumers and providers?
• Can we come up with a list of obligations that will affect consumers and providers individually?
• Is there a CSA code of conduct control associated with each obligation?
Upon answering the above questions, we could identify our classes, subclasses, and relationships for the ontology, as illustrated in Figure 2. We have identified the associated CSA Code of Conduct controls [79] for the GDPR articles and included them in our knowledge graph. Tables 4, 5, and 6 present the association between GDPR obligations and CSA controls.

D. PCI DSS KNOWLEDGE GRAPH
In our previous paper, we described the PCI DSS ontology that we developed based on the requirements defined by the PCI DSS council. The security controls and processes required by PCI DSS are vital for protecting cardholder account data, including the PAN, the primary account number printed on the front of a payment card [38]. This includes sensitive data that is printed on a card or stored on a card's magnetic stripe or chip, and personal identification numbers entered by the cardholder [38]. In general, if an organization deals in card transactions, then it must follow the policies listed below [38].

1) BUILD AND MAINTAIN A SECURE NETWORK
'Install and maintain a firewall configuration to protect cardholder data [1], [4]'. The network configuration and its security requirements should be shared by the IT team and Cloud service providers [38], [39]. 'Define the system password and its security parameters' [38], [39]. This means that all the default passwords supplied by the providers should be changed when a system is getting installed in the configured network [38], [39].

2) PROTECT CARDHOLDER DATA
'Protect stored cardholder data' [38], [39]. This means that only the necessary data should be stored, and any unnecessary data should be purged at least every quarter. PAN details should be masked when displayed; the first six and last four digits are the maximum number of digits that may be displayed [38], [39]. Also, PAN details must be made unreadable wherever they are stored [38], [39]. 'Encrypt transmission of cardholder data across open, public networks' [38], [39]. This rule of the PCI DSS policy asks the organization to use strong cryptography and encryption technologies such as SSL/TLS, SSH, or IPSec to safeguard sensitive cardholder data during transmission over such networks [38], [39].
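The PAN display rule (at most the first six and last four digits visible) translates directly into code. The following is a minimal sketch; the function name, the mask character, and the accepted length range of 13 to 19 digits are assumptions made for this example. It covers display masking only, not the separate requirement to render stored PANs unreadable.

```python
def mask_pan(pan, mask_char="*"):
    """Mask a PAN for display, keeping only the first six and last four digits.

    Spaces and dashes are ignored. The 13-19 digit length check is an
    assumption of this sketch.
    """
    digits = "".join(ch for ch in pan if ch.isdigit())
    if not 13 <= len(digits) <= 19:
        raise ValueError("unexpected PAN length")
    hidden = len(digits) - 10  # everything between the first six and last four
    return digits[:6] + mask_char * hidden + digits[-4:]


masked = mask_pan("4111 1111 1111 1111")
```

Applying the mask at the point of display, rather than in each user interface, keeps the rule in one place where it can be audited.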

3) MAINTAIN A VULNERABILITY MANAGEMENT PROGRAM
'Use and regularly update the anti-virus software or programs' [8], [39]. All systems and servers should have anti-virus software to prevent malicious activity. The anti-virus services should also run in the background and generate audit logs [38], [39]. 'Develop and maintain secure systems and applications' [38], [39]. This policy ensures that patches are installed promptly whenever vendors publish them [38], [39]. Any changes to system components or application code must go through proper change control procedures [38], [39]. Also, firewall protection should be ensured for any public-facing web applications [38], [39].

4) IMPLEMENT STRONG ACCESS CONTROL MEASURES
'Restrict access to cardholder data by business need to know' [38], [39]. This policy ensures that access to system components and cardholder data is limited. Also, an access control protocol for system components should be in place for multiple users; it must restrict access based on a user's needs and should be set to ''deny all'' unless specifically authorized [38], [39]. 'Assign a unique ID to each person with computer access'. This policy ensures that any person who accesses the data has a unique ID [4]. This will help in tracing an individual's activity in case of any violation or misuse [4]. Also, two-factor authentication should be required for remote logins to the network, for example by using an RSA token or another technology that facilitates two-factor authentication [38], [39]. 'Restrict physical access to cardholder data' [38], [39]. This ensures that proper facility controls are applied to the cardholder data environment, and that only individuals with proper authorization are allowed to access cardholder data [38], [39]. Visitors should be given a token with an expiry, and a visitor log must be maintained for tracking purposes [1], [4].
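The ''deny all'' default and the unique-ID requirement can be sketched together as follows. The class `AccessPolicy`, its method names, and the sample user and resource identifiers are hypothetical and serve only to illustrate the rule; they do not describe an implementation from this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AccessPolicy:
    # Unique user ID -> resources that user has been explicitly granted.
    grants: Dict[str, Set[str]] = field(default_factory=dict)
    # Every request is recorded against the unique ID so that an
    # individual's activity can be traced later.
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, user_id: str, resource: str) -> None:
        """Explicitly grant one user access to one resource."""
        self.grants.setdefault(user_id, set()).add(resource)

    def request(self, user_id: str, resource: str) -> bool:
        """Deny by default: allow only what was explicitly granted."""
        allowed = resource in self.grants.get(user_id, set())
        self.audit_log.append((user_id, resource, allowed))
        return allowed


policy = AccessPolicy()
policy.authorize("u-1001", "cardholder_db")
granted = policy.request("u-1001", "cardholder_db")   # explicitly authorized
refused = policy.request("u-2002", "cardholder_db")   # no grant, so denied
```

Because there is no code path that grants access implicitly, forgetting to configure a user results in a refusal rather than an exposure.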

5) REGULARLY MONITOR AND TEST NETWORKS
'Track and monitor all access to network resources and cardholder data' [38], [39]. This ensures that a process is established to link all access to system components to individual users [38], [39]. Logs of the system components must be reviewed daily, and audit trail history must be retained for at least one year, with at least three months of activity immediately available [38], [39]. 'Regularly test security systems and processes' [38], [39]. This ensures that test procedures are in place to detect access points and unauthorized users [38], [39]. Also, external and internal penetration testing, including network and application-layer penetration tests, should be performed at least annually [38], [39].
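The audit-trail retention rule can be expressed as a simple tiering function. This is a sketch under stated assumptions: the tier names and the approximations of ''three months'' as 90 days and ''one year'' as 365 days are choices made for illustration only.

```python
from datetime import date, timedelta

IMMEDIATE_DAYS = 90    # assumed approximation of "three months"
RETENTION_DAYS = 365   # assumed approximation of "one year"


def storage_tier(log_date: date, today: date) -> str:
    """Classify an audit log entry by age.

    "online"   : must be immediately available for review
    "archive"  : must still be retained, but may sit in slower storage
    "expired"  : past the minimum retention period
    """
    age_days = (today - log_date).days
    if age_days < 0:
        raise ValueError("log_date lies in the future")
    if age_days <= IMMEDIATE_DAYS:
        return "online"
    if age_days <= RETENTION_DAYS:
        return "archive"
    return "expired"


today = date(2020, 6, 1)
tier = storage_tier(today - timedelta(days=30), today)
```

An entry classified as "expired" has only passed the minimum retention period; whether it is actually deleted remains an organizational decision.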

6) MAINTAIN AN INFORMATION SECURITY POLICY
This ensures that the established, published, and maintained PCI DSS policies have clear, descriptive definitions of the procedures, that everyone in the system knows them thoroughly, and that the policy is reviewed at least once a year [38], [39].
Based on the PCI DSS repository, we created the knowledge graph. Our knowledge base consists of six classes, which incorporate the 12 requirements. Figure 3 illustrates our ontology. The main stakeholder entities are the PCI DSS Council, Educational Institutions, and Cloud Service Providers. In our ontology, each of the six classes has two or more subclasses. The classes are disjoint from one another, which means that an individual (or object) cannot be an instance of more than one of these six classes. Based on the security control definitions provided by NIST [28] and CSA [14], we relate the security compliance model to the security controls. In Table 1, the security control supported by NIST or CSA is listed, followed by the recommended Cloud compliance system.
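The grouping of the 12 requirements into six disjoint classes can be checked mechanically. The sketch below is a plain-Python stand-in for the ontology: the class identifiers and the shortened requirement labels are hypothetical names derived from the six policy groups described in this section, not the identifiers used in the published knowledge graph.

```python
from itertools import combinations

# Six classes (the policy groups above) and the requirements placed in each.
PCI_DSS_CLASSES = {
    "SecureNetwork": {
        "Firewall configuration",
        "System passwords and security parameters",
    },
    "CardholderDataProtection": {
        "Protect stored cardholder data",
        "Encrypt transmission over open, public networks",
    },
    "VulnerabilityManagement": {
        "Anti-virus software",
        "Secure systems and applications",
    },
    "AccessControl": {
        "Access by business need to know",
        "Unique ID per person",
        "Restrict physical access",
    },
    "MonitoringAndTesting": {
        "Track and monitor access",
        "Test security systems and processes",
    },
    "InformationSecurityPolicy": {
        "Maintain an information security policy",
    },
}


def overlapping_classes(classes):
    """Return every pair of class names that shares at least one individual."""
    return [
        (a, b)
        for a, b in combinations(classes, 2)
        if classes[a] & classes[b]
    ]


def total_individuals(classes):
    """Count the individuals across all classes."""
    return sum(len(members) for members in classes.values())
```

An empty result from `overlapping_classes` corresponds to the disjointness constraint, and `total_individuals` returning 12 confirms that every requirement has been placed exactly once.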
In our previous work [44], we identified the keywords that are associated with PCI-DSS. The key terms under PCI-DSS are ''maintain'', ''control'', ''establish'', ''access'', ''unauthorized'', and ''ensure''. To populate our ontology, we searched for these key terms in the individual organizational policies. Below are example statements from the privacy policies for the key term ''control''.
''AWS: Service providers now are required to detect and report on failures of critical security control systems.'' [47]
''eBay: You will maintain such compliance at all times during the term of the Terms. This requirement will survive the duration of the Terms until you return, destroy, or cause

E. CLOUD SECURITY ONTOLOGY
The ontology for Cloud computing security is illustrated in Figure 4. The Cloud computing security class is divided into Cloud security compliance models, Cloud security controls, and Cloud security threats. The relations between all the classes are described in the ontology. The ontology is further developed for each individual class and its subclasses. The Cloud security control class and its subclasses are illustrated in Figure 5.
Some of the compliance models and security standards are displayed to illustrate the relationship between the two classes. As discussed in section III, each Cloud security standard supports a type of compliance. For example, the security standard MPAA (Motion Picture Association of America) is used for protecting original content from piracy. It fulfills all the requirements stated in media protection compliance. Hence, we can show that MPAA supports Media protection compliance. Similarly, we can show the relation between the security standards and the Cloud security compliances mentioned in section III, Table 1. Figure 6 describes the class Cloud security compliances and its relationship with the security control class. The types of Cloud security compliances, explained in the Appendix, are represented in the ontology. Figure 7 illustrates the relation between security standards and security threats. The security standards can overcome the threats if they are applied correctly. For example, a data breach is a security threat to the Cloud, but it can be overcome by applying the compliance standard FedRAMP, which is specifically used for data security.
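The two kinds of relations described here, a standard supporting a compliance type and a standard overcoming a threat, can be represented as simple triples. In this sketch the predicate names `supports` and `mitigates` and the helper `standards_for` are hypothetical; only the two example facts come from the text.

```python
# (standard, relation, target) triples taken from the two examples in the text.
RELATIONS = [
    ("MPAA", "supports", "Media protection compliance"),
    ("FedRAMP", "mitigates", "Data breach"),
]


def standards_for(relations, predicate, target):
    """Return the standards linked to `target` through `predicate`."""
    return sorted(
        subject
        for subject, pred, obj in relations
        if pred == predicate and obj == target
    )


breach_standards = standards_for(RELATIONS, "mitigates", "Data breach")
```

In the ontology itself these links would be object properties between classes; the list of tuples only mirrors that structure for illustration.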

V. VALIDATION RESULTS
We have validated our knowledge graph with the privacy policies of various providers like Amazon, Google, IBM, and Rackspace. Figure 8 illustrates the PCI-DSS regulation instance with all the policies associated with it. Each regulation is associated with threat and control instances as well. Likewise, we have integrated all the regulations and built relationships with the Cloud standards, controls, and threat classes. Figure 9 shows the results for the Amazon instance from the provider class. End users can quickly see whether all the regulations are followed by their organization and act on any missing policies. We have listed the SWRL rules in Figure 10.
Based on the key terms extracted in the above sections, we populated the statements in the corresponding classes of our ontology. We then checked the regulations followed by organizations using SPARQL queries [70]. Below are the sample queries to check for the consumer and provider obligations under GDPR/PCI-DSS that are followed by an organization.
SPARQL query to check for GDPR provider obligations Amazon:
For the process of validation, we referred to the Cloud data policies of major Cloud data providers. We wanted to verify whether the key terms and obligations specified in these data policies can be populated as instances of our data compliance knowledge graph. First, we populate the ontology by utilizing the original GDPR and PCI-DSS policy documents. We then run SPARQL queries to identify the original policy statements under each class of our ontology. Results from these SPARQL queries are exported to compare with the results for organizational policies. A class that does not have any individuals indicates that the organization is not in compliance with the corresponding regulation under GDPR or PCI-DSS. This analysis will help an organization to quickly verify the results against the original regulation document. We found similar key terms in the organizational policies, along with the number of times each term occurred. The graph in Figure 11 is a snapshot of the key terms and their counts for various organizations. With the help of these terms, each organization's policies were populated as instances of our knowledge graph. The data policies are now available as an RDF graph and are machine-processable. It will now be possible to automate compliance validation by using policy reasoning engines that can alert on any potential compliance violation.
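Two steps of this validation, counting key terms in a policy and flagging classes that have no individuals, can be sketched in plain Python. This is not the SPARQL-based system described above: the function names, the exact-token matching rule, and the representation of instances as (individual, class) pairs are assumptions made for illustration.

```python
import re
from collections import Counter

# Key terms reported for PCI-DSS earlier in the paper.
KEY_TERMS = ("maintain", "control", "establish", "access", "unauthorized", "ensure")


def count_key_terms(policy_text, key_terms=KEY_TERMS):
    """Count exact, case-insensitive occurrences of each key term.

    Inflected forms such as "controls" are not counted; a real pipeline
    would likely add stemming, which this sketch leaves out.
    """
    tokens = Counter(re.findall(r"[a-z]+", policy_text.lower()))
    return {term: tokens[term] for term in key_terms}


def classes_without_individuals(class_names, instances):
    """Return the classes that no individual has been assigned to.

    `instances` is a list of (individual, class_name) pairs, a stand-in
    for the type assertions of the populated knowledge graph.
    """
    populated = {cls for _individual, cls in instances}
    return sorted(set(class_names) - populated)


statement = (
    "Service providers now are required to detect and report on "
    "failures of critical security control systems"
)
term_counts = count_key_terms(statement)
```

The classes returned by `classes_without_individuals` are the ones an organization would compare against the original regulation text, since an empty class indicates a missing policy.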

VI. CONCLUSION
We have developed the Cloud security comparator system for consumers who are planning to move their data to the Cloud but are uncertain due to security concerns as they may not be aware of various compliance models. This study also helped us determine the Cloud security controls and policies and quantify them in a comprehensive manner. As part of our ongoing work, we will further analyze other IT compliance models to improve our recommendation system.
As we discussed, the analysis will clarify the importance of security controls and compliance models. Also, the prototype will help Cloud consumers choose Cloud providers based on the security compliance model. In the future, we plan on refining the recommendation system by adding the cost of Cloud providers. The cost factor will give us the cognitive result to choose the best Cloud provider. Similarly, this prototype model can be implemented for IT compliance models other than security. We can also integrate this tool with e-commerce providers to find an optimized solution for B2B services.

VII. APPENDIX: ANALYSIS OF VARIOUS CLOUD DATA COMPLIANCE MODELS
After a detailed study of existing data protection regulations, we have identified the following standards that apply to Cloud-based services and applications.

1) ISO 27002
ISO 27002 is an ISO standard for information security controls [20]. It was initially published as ISO 17799. This standard advises how to implement various controls in an organization, but it does not focus on a particular compliance model.
Key features: Network security, incident management, security compliance review.

2) ISO 27001
ISO 27001 [21] is an auditable international standard for information security management systems (ISMS) and focuses on selecting adequate and appropriate security controls. Generally, a full assessment is done every three years, and a surveillance audit is performed every six months.
Key features: Compliance Audit, risk assessment, IT security management.

3) PAYMENT CARD INDUSTRY DATA SECURITY STANDARD (PCI-DSS)
This standard by the Payment Card Industry Security Standards Council (PCI-SSC) [39] aims to reduce credit card fraud. It applies to organizations that store, process, or transmit cardholder information. Note that even though a Cloud provider is PCI-DSS compliant, the Cloud consumer does not necessarily become PCI-DSS compliant.
Key features: Protection of credit and debit cardholder information, strong access control, firewall maintenance, anti-virus software maintenance.

4) STATEMENT ON STANDARDS FOR ATTESTATION ENGAGEMENTS (SSAE16)
This standard [8] was developed by the American Institute of Certified Public Accountants (AICPA) for reporting on controls at a service organization, replacing the Statement on Auditing Standards SAS70. SSAE16 has three kinds of Service Organization Controls (SOC) reports. A SOC1 report is required when audits are conducted over internal controls over financial reporting, management of the user organization, and management of the service organization. A SOC2 report is required when auditing the organization's security, availability, privacy, confidentiality, and processing integrity. A SOC3 report is given to a Cloud provider organization when there are restrictions on providing information about current and potential customers in auditing,

KARUNA PANDE JOSHI (Member, IEEE) received the bachelor's degree in computer engineering from the University of Mumbai, India, and the M.S. and Ph.D. degrees in computer science from the University of Maryland Baltimore County (UMBC), where she was twice awarded the IBM Ph.D. Fellowship. She is currently an Associate Professor of information systems at UMBC and the UMBC Site Director of the Center for Accelerated Real-Time Analytics (CARTA). She also directs the Knowledge Analytics Cognitive and Cloud (KnACC) Laboratory. She teaches courses in big data, database systems design, and software engineering. She has extensive experience of working in the industry primarily as an IT Program/Project Manager at the International Monetary Fund. She has published over 50 articles. Her research interests include data science, cloud computing, data security and privacy, and healthcare IT systems. Her research was supported by ONR, NSF, DoD, GE Research, and Cisco.
LAVANYA ELLURI (Graduate Student Member, IEEE) received the master's degree in management information systems from the University of Houston Clear Lake. She is currently pursuing the Ph.D. degree in information systems (IS) with the University of Maryland Baltimore County. In parallel, she is also a Senior Database Engineer at REI Systems, Sterling, VA, USA. Her research interests include cloud computing, semantic web, and text mining.
ANKUR NAGAR received the master's degree in August 2019. He is currently working for UBS on electronic trading business and client hub connectivity. He worked on the Automated Legal Document Analytics (ALDA) project, where he developed novel techniques for automating rules for Mobile Wallet processing. His research interests include cloud computing and financial services.