A Consent-Based Privacy-Compliant Personal Data-Sharing System

Personal data is becoming increasingly valuable in business, as the insights that can be obtained from data processing continue to improve. However, its use can also cause adverse effects on individuals. To improve data quality while satisfying privacy compliance, companies have focused on collecting informed consent from individuals so that they can handle personal data directly, without applying any privacy-preserving techniques. Even when companies obtain consent to use personal data, a system that complies with privacy requirements is still necessary to improve transparency and accountability and to ensure that companies handle personal data according to that consent. Therefore, this paper proposes a new consent-based privacy-compliant personal data-sharing system that considers personal data-sharing flows and requirements obtained from enterprises and privacy frameworks, respectively. By analyzing a general process and the roles of actors for data sharing in enterprise environments according to standard privacy frameworks, this paper proposes system requirements, a system architecture, and a detailed procedure for a consent-based privacy-compliant processing method that considers compliance checking as well as consent checking. To show the feasibility of the proposed system, this paper demonstrates a prototype and analyzes its performance in lab and real-world environments.


I. INTRODUCTION
Data is becoming increasingly valuable in business, as the insights that can be obtained from data processing continue to improve. Advancements in artificial intelligence and data processing technology [1] have enabled data-driven insights, uncovering business potentials and opportunities. Companies are now actively willing to utilize data to operate and expand their businesses.
According to a report on big data analytics market size [2], the global market size is projected to grow from 307 billion US dollars in 2023 to 745 billion US dollars in 2030. Therefore, the importance of having more data has led companies to not only collect but also actively share and trade data (especially, personal data) among business stakeholders [3]. By combining and synthesizing large amounts of high-quality data, companies can gain deeper insights and improve their predictive capabilities. In this aspect, among all types of data, personal data plays a key role in maximizing the value of data in business by giving a basis for understanding and predicting customers' behavior as well as market trends. Thus, companies are making efforts to gather more personal data for their business through various channels.
The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Yan.
A company's proactive sharing and utilization of personal data can benefit its own business growth. However, this can also cause adverse effects on data providers (or data subjects) such as privacy infringement, unwanted marketing, and an elevated risk of data breaches. Since it is challenging to have data-driven innovation while protecting privacy [4], the necessity of guidance that can mitigate negative impacts and risks, safeguard data subjects' rights, and enable corporate data utilization has increased.
In response to these issues, many governments have implemented legal and regulatory frameworks, including the General Data Protection Regulation (GDPR) in the European Union [5], California Consumer Privacy Act (CCPA) in the United States [6], etc. These laws and regulations aim to protect individuals' personal data by setting rules and guidelines for companies and organizations to collect, process, store, and share personal data. Companies are now obliged to have systems and procedures in place for the legitimate use of personal data.
With the advent of new or more stringent regulatory frameworks, one approach to securely use and share (personal) data is applying privacy-preserving techniques. Applying privacy-preserving technologies (e.g., data anonymization, differential privacy, secure multi-party computation, homomorphic encryption, etc.) [7], [8], [9] can make the sharing parties unable to recognize any personal data from the shared data so that the privacy regulations no longer apply. By making personal data unidentifiable among the data processing parties, privacy-preserving technology ensures the protection of sensitive or personal data from unauthorized access, disclosure, or misuse. However, such technologies require extra data processing resources and can diminish the value of the data due to the loss of information in quantity and quality. Therefore, it is necessary to handle personal data without privacy-preserving techniques (i.e., handling personal data as it is for obtaining better quality information) in a privacy-compliant manner that supports and protects individuals' rights.
To use and share personal data without compromising its quality and quantity while adhering to privacy regulations, it is necessary to consider various privacy-related compliance requirements for companies. Therefore, it is important to build a personal data-sharing system for supporting the data utilization stakeholders, which considers individuals' consent and other privacy compliance requirements. There are several ways to follow privacy compliance requirements (e.g., consent-based, privacy-preserving-based, legal-based, etc.); in particular, consent-based personal data handling mechanisms have drawn attention from both academia and industry since new data-related regulations and governance have emphasized individuals' rights and consent management [5], [6], [10].
In other words, if companies can collect informed consent from individuals with proper purposes of personal data utilization and sharing, they are able to directly handle personal data without applying any privacy-preserving techniques. Therefore, from companies' perspectives, methods are needed for consent management and consent-based data utilization, which properly match and audit consent information on the data utilization process.
Since consent management and consent-based personal data processing techniques have become important parts of the data utilization process, a consent-based privacy-compliant personal data-sharing system needs to be implemented to cover the data utilization processes among the data subjects (or data providers), the data controller, the data processor, and the third party (or data requester) according to standards and regulations [5], [10]. Note that, in this paper, the terms "data subject" and "data provider" are used interchangeably; both refer to an individual who provides personal data to the data controller and the data processor. Figure 1 (inspired by [10], [11]) shows a general sequence of consent-based privacy-compliant personal data utilization among the actors. The company, which is responsible for providing a service to data subjects, acts as a data controller (and usually as a data processor, too) and obtains personal data with the necessary consent from the data subjects (or data providers). While the data controller manages the collected personal data and consent from data providers, it receives many data-sharing requests from data requesters who want to utilize personal data for their own purposes (e.g., internal departments, contracted third parties, or regulatory authorities, etc.). Upon receiving data-sharing requests from data requesters, the data controller assesses the requests. If a data request is acceptable, the data controller instructs the data processor to process the requested dataset in accordance with the obtained consent, applicable requirements, and compliance. After the data processor reports the processing result, the data controller reviews and examines the processed dataset to make a decision. When the data controller decides to share, the data requester receives a valid dataset.
In the literature, apart from research on privacy-preserving techniques [12], [13], [14], [15], [16], consent-based approaches have mainly focused on obtaining consent from the data providers and/or managing data access mechanisms according to the obtained consent [17], [18], [19], [20], [21], and only a few studies [22], [23] have tackled consent-based personal data processing and sharing. However, many cases [24], [25], [26] show that violations of consent occur when companies utilize personal data. To have transparent and accountable use of the data shared by data subjects, companies now need a proper system that supports privacy-compliant data-sharing and ensures that the data controller actually follows the consent of data subjects, to prevent any violations of consent.
However, to the best of the authors' knowledge, no study has considered the entire process of consent-based personal data sharing (explained in Figure 1) to share personal data with others while complying with privacy requirements. Based on this background, this paper proposes a new consent-based privacy-compliant personal data-sharing system that considers personal data-sharing flows and requirements obtained from enterprises and privacy frameworks, respectively. By applying the proposed system, companies can easily record logs and pieces of evidence that prove they properly follow the consent when they share personal data with others [4].
The summary of contributions of this paper follows.
• This paper proposes a process for a privacy-compliant personal data-sharing system that should be considered for having transparency and accountability of personal data use of the companies according to the standard privacy frameworks [10], [27], which consists of three actors for personal data utilization: the data requester, the data controller, and the data processor. Note that the informed consent and raw data are already collected and stored on the data controller side.
• Based on the proposed privacy-compliant personal data-sharing process (i.e., data sharing request, request assessment, dataset processing, data sharing decision, and data use), this paper describes the major roles of the actors and proposes the requirements of each actor for a privacy-compliant data-sharing system according to the standard privacy frameworks [10], [27].
• Based on the requirements, this paper proposes a new privacy-compliant personal data-sharing system that can assess and match compliance and consent information to satisfy privacy requirements. Moreover, this paper proposes a procedure with pseudo-code for a privacy-compliant data processing method.
• By showing a demonstration of the implemented prototype for the proposed system, this paper presents how the proposed process is implemented as a real-world application. In addition, to show the feasibility of the proposed system, this paper analyses the performances of the proposed privacy-compliant personal data sharing in the lab and real-world environments.
The rest of this paper is organized as follows. The recent studies on personal data-sharing are introduced in Section II. After that, Section III illustrates a personal data-sharing flow in an enterprise system to identify the roles and requirements along with standard privacy frameworks. According to the identified requirements, the proposed consent-based privacy-compliant personal data-sharing system is described with the proposed consent-based privacy-compliant processing procedure in Section IV. Using the proposed system architecture, Section V shows the demonstration and performance analysis using the prototype of the proposed system for both simulation and real-world environments. In Section VI, this paper is concluded with several discussion points of the proposed system.

II. LITERATURE REVIEWS
To support data providers' rights and comply with privacy requirements for sharing personal data with other stakeholders, there are two ways mainly considered in the literature: 1) privacy-preserving-based sharing and 2) consent-based sharing. The main difference between the two approaches is whether they can share raw data or not. Using a privacy-preserving-based approach, only filtered/processed personal data (i.e., relatively low-quality data) are shared with others, and it is less complicated to comply with privacy regulations. On the other hand, using a consent-based approach, raw personal data (i.e., relatively high-quality data) can also be shared with others, but it is more complicated to comply with privacy regulations.

A. PRIVACY-PRESERVING-BASED SHARING
A privacy-preserving approach is a typical way to share data, which deletes/masks sensitive data using various techniques such as anonymization, pseudonymization, de-identification, etc. As a result, the shared dataset contains only non-sensitive data, and data providers are not deeply concerned about privacy infringement issues. There are several survey papers [7], [8], [9] that introduced various privacy-preserving big data management and exchange models.
One approach is applying the privacy-preserving technique from the data provider side. In other words, only the processed/filtered datasets can be collected from the data subjects (or data providers). Zhang and Dong [12] proposed a privacy-preserving data aggregation scheme to collect smart meter data in smart grid environments. By introducing a modified symmetric homomorphic encryption technique, the authors showed that the scheme is able to aggregate data more securely against tampering attacks from malicious aggregators. However, data encapsulated by homomorphic encryption makes it hard to apply complex data processing methods to extract good-quality information.
Shojaee et al. [13] proposed a deep learning-based privacy-preserving data distillation method to share only latent feature datasets extracted from the original dataset. Data consumers aggregate the distilled datasets and process the datasets to achieve their goals. However, it is hard to directly use latent features in general because they only contain abstracted information.
Applying the privacy-preserving techniques from the data provider side makes it hard for the data controller to provide good-quality service to the data providers. Therefore, applying the privacy-preserving techniques from the data controller and/or the data processor side is a more popular approach for data-sharing systems. In other words, the data controller collects raw data from the data providers and applies privacy-preserving techniques when it shares the data with others.
Xiao et al. [14] proposed a data privacy-preserving automation architecture for industrial data exchange in smart city environments, which is able to deliver privacy-preserved datasets. The authors also utilized the concept of an offset dataset that can restore the original datasets from the privacy-preserved ones. That work proposed a way to share the original datasets, but the authors did not consider compliance issues.
Wu et al. [15] introduced a decentralized privacy-preserved medical data-sharing system. To share and exchange medical data, the authors proposed a local differential privacy technique that can aggregate individual private data into a statistical information dataset. If a data requester's privacy requirement on a dataset matches that of a data publisher, the dataset can be exchanged.
To apply privacy-preserving techniques at the attribute level of datasets, Li and He [16] proposed local generalization and bucketization methods for applying anonymization techniques for aggregated datasets. By independently separating sensitive attributes in different spaces, the authors were able to comply with various privacy requirements for different types of attributes.
Privacy-preserved approaches address compliance issues, but they result in reduced information due to filtering sensitive data. In other words, the processed (filtered, anonymized, distilled, etc.) datasets contain less useful data/information than the original ones from the data consumers' point of view. Therefore, the necessity of using raw datasets in a safe and compliant manner to extract more useful information has increased. To overcome this issue, data providers' consent-based data-sharing approach has been considered.

B. CONSENT-BASED SHARING
As one approach, with the emergence of blockchain technology, many studies [28] have focused on utilizing the characteristics of the blockchain (i.e., immutability, traceability, etc.) to check and enforce the consent of data providers for managing data access.
Xu et al. [17] proposed a blockchain-based consent management model for financial service platforms. The authors utilized a consortium blockchain with a proof of authority mechanism to check both the informed consent of users and the certificates of regulators.
Roman-Martinez et al. [18] suggested a service-oriented architecture for consent management, access control, and auditing of health data usage with a blockchain. By utilizing two separate blockchains for checking consent and auditing events, the authors developed a system with service-oriented architecture, which shares personal health data.
However, blockchain-based approaches have performance issues (e.g., execution time, scalability, etc.) when applied to real-world systems, particularly large-scale enterprise systems. Therefore, other approaches have considered mapping consent information into the datasets (or databases) within the existing systems. In particular, some studies have focused on access management of datasets according to the consent information of data providers. In other words, data consumers are able to access only the consented dataset.
Olca and Can [19] proposed a semantic web-based domain-independent consent management method. By utilizing the proposed ontology models for consent management, service users are able to protect their personal data according to informed consent.
Pathmabandu et al. [20] proposed an informed consent management engine that can analyze risky permissions and potential risks of privacy policies and track and log events for auditing purposes. In addition, the authors provided a reference architecture of the proposed model with the demonstration in smart building environments.
Debackere et al. [21] proposed a technical architecture for enforcing consent in the Solid project (initiated by Tim Berners-Lee for empowering individuals by separating user data storage and data-using services) by improving the existing authorization and authentication process with the proposed policy-based scheme.
Those studies have focused on the consent-based access management of the entire dataset, so there are some limitations to satisfying requirements about accessing and sharing only some parts of datasets. Therefore, tuple- or attribute-level access management of databases has also been studied for more precise consent management, which can also be applied relatively easily to existing data management systems.
Drien et al. [22] proposed a method for managing consent and controlling data access in shared databases, which focused on the relational model and queries.
Konstantinidis et al. [23] proposed a database-level consent evaluation method. The authors proposed a method to evaluate consent information and apply it to the attributes in the tables of the database. By providing a formal specification to represent consent information from data providers, the system is able to automatically filter query results that violate consent.
The studies on attribute-level consent management showed the feasibility of accessing and sharing datasets stored in the existing database systems. However, it was ambiguous whether such attribute-level consent management is feasible in real-world systems. Therefore, this paper proposes a consent-based privacy-compliant personal data-sharing system that considers not only a personal data export process but also consent mapping methods for enterprise-level systems, and which is feasible to apply in real-world environments.

III. PERSONAL DATA-SHARING IN ENTERPRISE SYSTEM
When companies want to share data, they must comply with guidelines and regulations and be able to establish agreements on the rights and responsibilities of the parties.
To manage these requirements, a company needs to establish standard procedures and maintain an automated system for regulatory compliance on data-sharing. Therefore, this section describes a general personal data-sharing flow and proposes the system requirements to comply with various privacy requirements and regulations.

A. ENTERPRISE DATASET SHARING FLOW
This section describes a general personal data-sharing flow in an enterprise data management system. In order to share data based on a data provider's consent and comply with applicable regulations, it is important for a company to have the ability to 1) obtain and manage the data provider's consent and 2) process and evaluate the data sharing request by the privacy compliance requirements at the system level.
Therefore, the company implements a personal data processing system with consent management that not only allows a data provider to provide and revoke consent but also keeps records of the data provider's consent history. Any data processing activities should be done in accordance with the data provider's consent and only for the purposes that were explicitly disclosed within the announced retention period.
In terms of timing, data providers' consent and the data produced by them must be obtained before any of the data is shared with data requesters. A consent management method that guarantees the exercise of data providers' rights (i.e., the right to access or delete their data, the processing of their requests, etc.) is a must-have in this procedure. However, the event of obtaining and updating the data provider's consent and the event of processing the data-sharing requests do not occur simultaneously. From a system design perspective, having no temporal association means that the two procedures work independently, so they can be designed separately as long as they can be connected and operated together.
Therefore, this paper focuses on proposing a system that can process the requests of sharing data based on consent and regulatory requirements. In other words, the proposed system handles raw data and consent information, which are already gathered from the data providers. Note that a general procedure of data sharing with consent is performed among the four actors: data providers, data controllers, data processors, and data requesters (as illustrated in Figure 1). However, since consent management itself is not included in this paper's scope, the proposed personal data-sharing system is operated among the three actors: data controllers, data processors, and data requesters.
As represented in Figure 2, the workflow of a privacy-compliant enterprise data-sharing process is as follows. When a data requester needs data for a certain purpose, the data requester analyzes what to request and submits a data request to a data controller for data sharing. In the application, the data requester should explicitly state the purpose, duration, and region of use along with the items and subjects of the personal data (i.e., data sharing request). Upon the reception of the application, the data controller assesses the validity of the request according to the applicable regulations and compliance requirements. If the request is acceptable, then the data controller sends an order to a data processor for obtaining the requested dataset with data protection guidelines (i.e., request assessment). Then, the data processor carries out the order by selecting the proper data sources, matching the consent and compliance conditions, applying the guided de-identification or pseudonymization schemes, and generating a valid sharable dataset with a data processing report (i.e., data processing). Upon the reception of the processing report with some samples of the dataset, the data controller examines the dataset and makes a decision on whether to share the dataset as processed or to re-order processing under different conditions and guidance (i.e., data sharing decision). When the data controller decides to share the dataset, the data controller provides a response to the data requester with data access credentials and terms of use. By confirming the conditions for the usage of the shared dataset, the data requester can obtain the credentials to download the dataset (i.e., data use).
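The five stages named in the workflow above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Stage`, `TRANSITIONS`, and `advance` are hypothetical, and the `REJECTED` state is an assumption added to represent a request that fails assessment (the text only describes the accepted path explicitly).

```python
from enum import Enum, auto


class Stage(Enum):
    """Workflow stages, named after the labels used in the text."""
    DATA_SHARING_REQUEST = auto()
    REQUEST_ASSESSMENT = auto()
    DATA_PROCESSING = auto()
    DATA_SHARING_DECISION = auto()
    DATA_USE = auto()
    REJECTED = auto()  # assumption: terminal state for unacceptable requests


# Allowed transitions. The edge from DATA_SHARING_DECISION back to
# DATA_PROCESSING models the controller re-ordering processing under
# different conditions and guidance.
TRANSITIONS = {
    Stage.DATA_SHARING_REQUEST: {Stage.REQUEST_ASSESSMENT},
    Stage.REQUEST_ASSESSMENT: {Stage.DATA_PROCESSING, Stage.REJECTED},
    Stage.DATA_PROCESSING: {Stage.DATA_SHARING_DECISION},
    Stage.DATA_SHARING_DECISION: {Stage.DATA_USE, Stage.DATA_PROCESSING},
    Stage.DATA_USE: set(),
    Stage.REJECTED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return `target` if the workflow allows the move, otherwise raise."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Encoding the transitions as data rather than as scattered conditionals makes it straightforward to log each state change, which fits the record-keeping role the paper assigns to the system.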
To perform a privacy-compliant personal data-sharing process, the roles of each actor described in the workflow are as follows (according to [5], [27]).
Data requester: The following roles of the data requester are mandatory for the data controller to be able to decide whether the data processing for the requested data sharing can comply with any applicable legal frameworks. In GDPR [5], the related clauses are "Principles relating to processing of personal data", "Lawfulness of processing", "Conditions for consent", and "Processing of special categories of personal data". On the other hand, in ISO/IEC 27001 [27], the related clauses are "Planning", "Support", "Operation", etc.
• The data requester specifically clarifies the purpose, duration, and region of use along with the items and subjects of the personal data that they need;
• The data requester clearly fills in all the questions on the data sharing request application;
• The data requester should keep the terms of use while utilizing the dataset.
Data controller: All the roles of the data controller are related to making decisions and taking responsibility for whether any data processing activities are compliant with the applicable privacy frameworks. Under the GDPR, the data controller exercises overall control of the personal data being processed and is ultimately in charge of and responsible for the processing. In GDPR [5], the related clauses are "Responsibility of the controller", "Processing under the authority of the controller or processor", "Records of processing activities", etc. On the other hand, in ISO/IEC 27001 [27], the related clauses are "Context of the Organization", "Leadership", "Planning", "Support", "Performance Evaluation", etc.
• The data controller provides a proper data sharing request form that contains all the information needed to assess the availability of that sharing;
• The data controller has a thorough knowledge of the regulations and consents applicable in determining the acceptability of data-sharing requests;
• The data controller provides the data processor with detailed instructions about what to process and how to process the dataset;
• The data controller examines the dataset and makes decisions on the sharing;
• The data controller provides the terms of use (i.e., conditions or policy of use) on the dataset for the data requester.
Data processor: The roles and responsibilities of data processors under the supervision of the data controller are identified as follows. In GDPR [5], the related clauses are "Processor", "Cooperation with the supervisory authority", "Security of processing", etc. On the other hand, in ISO/IEC 27001 [27], the related clauses are "Planning", "Support", "Operation", "Performance Evaluation", etc.
• The data processor carries out the orders as requested in a transparent manner;
• The data processor follows privacy compliance and consent while creating a shareable dataset for the request;
• The data processor delivers the dataset and generates a report on the process and the result.

B. REQUIREMENTS FOR DATA SHARING OF ENTERPRISE
To enable the actors to perform their roles in the privacy-compliant personal data-sharing workflow, a system needs to be able to provide the following functionalities (according to [5], [27]):
• Provide ways of receiving data-sharing requests from data requesters;
• Instruct the data requesters on what information they need to provide (e.g., provide the application forms that contain all information that is required for data controllers to assess the availability of the request, such as the purpose of use, attributes of data, period, region, etc.);
• Manage the request and response states of data sharing requests;
• Provide information that is needed by the data controller to assess the requests (i.e., terms of consent, data sharing history, statistical information of data);
• Convey the data processing orders with detailed conditions and comments to comply with the applied regulations from the data controller to the data processors;
• Support the data processors to be able to process the data as instructed by the data controllers while keeping the privacy regulations applied for the data processor;
• Generate the report on data processing results, including detailed information, which is required for data controllers to examine and make decisions (e.g., the size of the data, de-identification level, statistical distribution, etc.);
• Support the data controller to examine the processed dataset;
• Record and deliver the data controller's decision and reasons on the requested data sharing;
• Provide the customized terms of use with detailed conditions on the use of the approved dataset;
• Apply the access control on the sharing dataset with the verification scheme.
According to the requirements obtained by assessing a data-sharing process, the next section proposes a new privacy-compliant personal data-sharing system.

IV. PRIVACY-COMPLIANT DATA-SHARING SYSTEM
According to the identified requirements and the process of enterprise dataset sharing, this section proposes a privacy-compliant data-sharing system that consists of five major functions: i) data request, ii) request assessment, iii) data process, iv) data sharing evaluation, and v) data export. Particularly, this section describes the important features and procedures of each function, and this paper proposes a new privacy-compliant model for safely processing datasets by checking compliance and consent information. An overview of the proposed system architecture is shown in Figure 3. This paper assumes that the system handles the table structure of databases (e.g., relational databases, etc.) so that each row becomes a data element, and each column becomes an attribute.

A. DATA REQUEST FUNCTION
The first step for personal data-sharing is that a data requester requests datasets with the proper request form. A data requester makes a data request by specifying the necessary data types (or target data types) ($T$), the purposes of using the dataset ($O$), etc. The data requester generates a data request $\{T, O\}$ to a data controller who manages datasets. Here, the sets $T = \{t_1, \cdots, t_k\}$ and $O = \{o_1, \cdots, o_w\}$ specify each necessary data type ($t_k$) and each purpose ($o_w$) of data use, respectively. From now on, the notation $|\cdot|$ is used to represent the number of elements in a set. For example, if a data requester specifies 5 data types (e.g., age, email, address, purchase history, etc.) with 2 purposes of data use (e.g., marketing, analysis, etc.), then $|T|$ becomes 5, and $|O|$ becomes 2.
Note that the data requester can query multiple independent data requests $R = \{\{T_1, O_1\}, \cdots, \{T_{|R|}, O_{|R|}\}\}$ where $|R|$ means the number of data requests; however, for simplicity, this section considers that one data request $\{T, O\} \in R$ is submitted by the data requester.
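As an illustration of the request structure described above, a data request can be modeled as a pair of sets. This is only a sketch: the class name `DataRequest` and its field names are hypothetical, and the fifth data type (`gender`) is an assumed placeholder, since the text lists only four of the five types in its example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class DataRequest:
    """One data request {T, O}: requested data types and purposes of use."""
    data_types: FrozenSet[str]  # T
    purposes: FrozenSet[str]    # O


# The example from the text: 5 data types and 2 purposes.
# "gender" is an assumed stand-in for the unnamed fifth type.
request = DataRequest(
    data_types=frozenset({"age", "email", "address", "purchase_history", "gender"}),
    purposes=frozenset({"marketing", "analysis"}),
)

# R: a requester may submit several independent requests;
# here only one is considered, as in the text.
requests: List[DataRequest] = [request]

print(len(request.data_types), len(request.purposes), len(requests))  # 5 2 1
```

Frozen sets are used so that a submitted request cannot be altered after the fact, which is convenient if requests are later logged as evidence.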

B. REQUEST ASSESSMENT FUNCTION
When the data requester submits a data request $\{T, O\}$ to the data controller of the system, the data controller needs to assess the incoming request. Particularly, the data controller validates each data request consisting of data type $T$ and purpose $O$ by checking regulations and/or rules of in-house compliance requirements $L = \{l_1, \cdots, l_n\}$.
Since both the requested data types and data usage purpose should comply with the compliance requirements, the data controller independently checks the compliance of the requested data types T and purposes of data use O specified by the data requester.
For checking compliance of the request in data types aspects, the data controller should check that each data type (t k ∈ T ) can be shared with the data requester by checking compliance information (l n ∈ L). By jointly considering the requested data type and compliance information using a compliance checking function f (l n , t k ) that produces either 0 (not matched) or 1 (matched), the data controller can check the validity of compliance l n for data type t k .
From now on, the function $f(l_n, t_k)$ is written as $l_n \cdot t_k$ for simplicity in matrix notation. Extending this, the compliance check for all data types of the request $T$ against all compliance information $L$ can be performed as follows.
$$L^{\top} \cdot T = \begin{bmatrix} l_1 \cdot t_1 & \cdots & l_1 \cdot t_k \\ \vdots & \ddots & \vdots \\ l_n \cdot t_1 & \cdots & l_n \cdot t_k \end{bmatrix} \qquad (1)$$
where the superscript $\top$ means the transpose of the matrix. To pass the compliance requirements, all elements in the matrix $L^{\top} \cdot T$ should be 1. Therefore, the data type request $T$ is compliant if the following equation (2) becomes true,
$$\prod_{l_n \in L} \prod_{t_k \in T} (l_n \cdot t_k) = 1. \qquad (2)$$
On the other hand, the data controller should check the compliance of each requested purpose ($o_w \in O$) with each compliance requirement ($l_n \in L$), similar to equation (1). As a result, the data controller can verify the compliance requirements of each purpose of data use using equation (3).
$$\prod_{l_n \in L} \prod_{o_w \in O} (l_n \cdot o_w) = 1. \qquad (3)$$
If the compliance checks for both data types (equation (2)) and data use purposes (equation (3)) hold, then the data controller decides that the data request $\{T, O\}$ satisfies the compliance requirements; otherwise, the data controller judges that the data request is invalid. Based on this decision, the data controller informs the data requester of the result. If the data request is valid, the data controller proceeds to the next step, which orders data processing from the data processors according to the data request.
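The request assessment can be illustrated with a minimal sketch. It assumes, purely for illustration, that each compliance rule is stored as a set of permitted data types and a set of permitted purposes, so that the checking function $f$ reduces to a membership test; the paper does not specify how $f$ is realized, and the rule names and the helper names `f`, `is_compliant`, and `assess_request` are hypothetical.

```python
from itertools import product


def f(rule, item, allowed):
    """Compliance check f(l, x): 1 if rule l permits item x, otherwise 0.

    Assumption: `allowed` maps each rule name to the set of items it permits.
    """
    return 1 if item in allowed.get(rule, set()) else 0


def is_compliant(rules, items, allowed):
    """True when every (rule, item) pair matches, i.e. all matrix entries are 1."""
    return all(f(rule, item, allowed) == 1 for rule, item in product(rules, items))


def assess_request(T, O, L, allowed_types, allowed_purposes):
    """Accept a request {T, O} only if both data types and purposes comply."""
    alpha = is_compliant(L, T, allowed_types)      # data types, cf. equation (2)
    beta = is_compliant(L, O, allowed_purposes)    # purposes, cf. equation (3)
    return alpha and beta


# Hypothetical rules used only for this example.
L = ["regulation", "in_house"]
allowed_types = {
    "regulation": {"age", "email"},
    "in_house": {"age", "email", "state"},
}
allowed_purposes = {
    "regulation": {"marketing"},
    "in_house": {"marketing", "analysis"},
}

accepted = assess_request(["age", "email"], ["marketing"], L, allowed_types, allowed_purposes)
rejected = assess_request(["age", "state"], ["marketing"], L, allowed_types, allowed_purposes)
```

In this toy setting the first request is accepted, while the second is rejected because one rule does not permit the data type "state".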

C. DATA PROCESS FUNCTION
After the data controller accepts the data request $\{T, O\}$ from the data requester, the data processor handles the data request to extract the available data.
According to the data request and orders from the data controller, the data processor selects the proper databases for raw data ($D$) and consent ($C$) about all data providers ($U$). More specifically, the raw data is defined as follows,
$$D = \{D^{U_1}, \cdots, D^{U_i}\}, \quad D^{U_i} = \{\{d^{U_i}_{1,1}, \cdots, d^{U_i}_{1,t_k}\}, \cdots, \{d^{U_i}_{j,1}, \cdots, d^{U_i}_{j,t_k}\}\} \qquad (4)$$
where $U = \{U_1, \cdots, U_i\}$ is a set of data providers in the system, $D^{U_i}$ is a set of all data related to a data provider $U_i$, $\{d^{U_i}_{j,1}, \cdots, d^{U_i}_{j,t_k}\}$ is the $j$-th data row of the data provider $U_i$, and $d^{U_i}_{j,t_k}$ is the data element of data type $t_k$ in the $j$-th data row of the data provider $U_i$. Note that this paper assumes the raw data $D$ already complies with the data use duration and/or data retention time requirements, and only the requested data types ($t_k \in T$) are processed.
VOLUME 11, 2023. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Similarly, the consent information of each data provider can be defined as follows.
$$C = \{C^{U_1}, \cdots, C^{U_i}\} \qquad (5)$$
where $U = \{U_1, \cdots, U_i\}$ is a set of data providers in the system, and $C^{U_i} = \{c^{U_i}_{1}, \cdots, c^{U_i}_{t_k}\}$ is a set of consents expressed by a data provider $U_i$ for all data types ($t_k \in T$).
To check the validity between the consent of data providers and the purpose of the data request, the data processor should check whether each data provider $U_i$ allows the purpose of the data request ($o_w \in O$) by checking the expressed consent of the data provider ($c^{U_i}_{t_k} \in C^{U_i}$). To map the requested data use purpose from the data requester to the consent from data providers, a consent checking function $g(c^{U_i}_{t_k}, o_w)$ is used, which produces either 0 (not matched) or 1 (matched). Similar to equation (1), the function $g(c^{U_i}_{t_k}, o_w)$ is written as $c^{U_i}_{t_k} \cdot o_w$. Extending this, a consent checking function $G(C^{U_i}, O)$ for the entire vectors of consents ($C^{U_i}$) and purposes ($O$) can be performed as follows.
$$G(C^{U_i}, O) = (C^{U_i})^{\top} \cdot O = \begin{bmatrix} c^{U_i}_{1} \cdot o_1 & \cdots & c^{U_i}_{1} \cdot o_w \\ \vdots & \ddots & \vdots \\ c^{U_i}_{t_k} \cdot o_1 & \cdots & c^{U_i}_{t_k} \cdot o_w \end{bmatrix} \qquad (6)$$
where $c^{U_i}_{t_k}$ and $o_w$ mean a consent expressed by a data provider $U_i$ for the data type $t_k$ and a purpose expressed by the data request for the data type $t_k$, respectively. Since, for one data type ($t_k \in T$), a specific purpose ($o_w \in O$) should comply with all consent information ($\forall t_k \; c^{U_i}_{t_k}$), a consent flag variable ($\sigma^{U_i}_{t_k}$) is used to represent whether the purpose $o_w$ complies with the user consent or not (i.e., $\sigma^{U_i}_{t_k} \in \{0, 1\}$). If $\sigma^{U_i}_{t_k}$ is 1, the purpose fits the data provider's consent; if $\sigma^{U_i}_{t_k}$ is 0, the purpose is not suitable. With the consent checking information produced by the function $G(C^{U_i}, O)$, the data processor can filter the dataset to extract shareable data by using a data filtering function $h(d^{U_i}_j, G(C^{U_i}, O))$. This data filtering function processes each data row $j$ related to a data provider $U_i$.
$$h(d^{U_i}_j, G(C^{U_i}, O)) = \{\hat{d}^{U_i}_{j,1}, \cdots, \hat{d}^{U_i}_{j,t_k}\} \qquad (7)$$
$$\hat{d}^{U_i}_{j,t_k} = \begin{cases} d^{U_i}_{j,t_k}, & \text{if } \sigma^{U_i}_{t_k} = 1 \\ \emptyset, & \text{otherwise} \end{cases} \qquad (8)$$
where $\hat{d}^{U_i}_{j,t_k}$ is a filtered data element with type $t_k$ of a data provider $U_i$ in data row $j$. If $\sigma^{U_i}_{t_k}$ is 1 (which means the data provider allows it), then $\hat{d}^{U_i}_{j,t_k}$ becomes the original data. Otherwise, $\hat{d}^{U_i}_{j,t_k}$ becomes empty (which means the data provider denies it).
Note that a filtered data row $\hat{d}^{U_i}_j$ with one or more denied data types (i.e., the consent check result ($\sigma^{U_i}_{t_k}$) is 0 (false) for some data type $t_k$) can be handled differently depending on the conditions or rules of the enterprise. In other words, the data processor can either export the data row with empty elements or discard the entire row.
Consequently, to filter all datasets, the following data filter function $H$ is proposed for processing the data of a data provider ($U_i$) in the system.
$$H(D^{U_i}, G(C^{U_i}, O)) = \{\{\hat{d}^{U_i}_{1,1}, \cdots, \hat{d}^{U_i}_{1,t_k}\}, \cdots, \{\hat{d}^{U_i}_{j,1}, \cdots, \hat{d}^{U_i}_{j,t_k}\}\} = \hat{D}^{U_i} \qquad (9)$$
where $\hat{d}^{U_i}_{j,k}$ is a filtered data element with type $t_k$ of a data provider $U_i$ in data row $j$, and $\hat{D}^{U_i}$ is the filtered dataset about the data provider $U_i$ resulting from consent checking. By iteratively applying the function $H(\cdot)$ for all data providers ($U_i \in U$), the system is able to obtain the filtered dataset $\hat{D} = \{\hat{D}^{U_1}, \cdots, \hat{D}^{U_i}\}$ that contains data about all data providers.
After obtaining the filtered dataset $\hat{D}$, the data processor should check whether de-identification and/or pseudonymization methods should be applied to some data types of the dataset according to the compliance rules (regardless of the existence of the allowed consent). Note that any de-identification and/or pseudonymization method that is applicable to the specific data type can be used (e.g., text data and image data should be processed by different de-identification and/or pseudonymization methods); however, the details of each method are out of the scope of this paper.
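As one possible illustration of this step, the sketch below pseudonymizes selected text attributes with a keyed hash (HMAC-SHA256). This is an example technique chosen here, not the method used in the paper, which leaves the choice open; the function names, the list of attributes to pseudonymize, and the secret key handling are all assumptions.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Replace a value with a keyed hash so equal inputs map to equal pseudonyms."""
    return hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def apply_compliance_rules(rows, types_to_pseudonymize, secret_key):
    """Pseudonymize the listed data types in every row; empty elements stay empty."""
    processed = []
    for row in rows:
        new_row = dict(row)  # copy so the filtered dataset itself is not modified
        for data_type in types_to_pseudonymize:
            if new_row.get(data_type) is not None:
                new_row[data_type] = pseudonymize(new_row[data_type], secret_key)
        processed.append(new_row)
    return processed


# Hypothetical filtered rows; None marks an element removed by consent checking.
filtered_rows = [
    {"name": "Alice", "state": "CA"},
    {"name": None, "state": "NY"},
]
key = b"example-key-kept-by-the-data-controller"
result = apply_compliance_rules(filtered_rows, ["name"], key)
```

A keyed hash keeps records linkable within one export while hiding the original value from recipients who do not hold the key; whether this is sufficient depends on the applicable compliance rules.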
As a summary, the procedure for the proposed consent-based privacy-compliant processing method is given in Procedure 1. Note that the symbol ▷ indicates a comment. For the request assessment, the system needs to check the required data types ($T$) and the purposes of data use ($O$) against the compliance information of the enterprise ($L$). Moreover, for data processing, the system needs to handle information about data providers ($U$), consent of data providers ($C$), and the original dataset ($D$). All input variables are defined in lines 1 to 6 of the procedure.
This procedure produces a shareable dataset ($\hat{D}$) as the output of the consent-based privacy-compliant processing method, which contains data that fits all privacy and compliance requirements and the data providers' consent (line 7).
The system assesses the data request ($\{T, O\}$) for both data types (line 8) and purposes (line 9) by comparing each element with the compliance requirements ($L$) using the compliance check function $f(\cdot)$ (explained in equations (2) and (3)). Note that temporary variables (i.e., $\alpha$ and $\beta$) are used to represent the results of the compliance checks in the procedure. If the data request passes the compliance assessment (i.e., $\alpha \wedge \beta$ becomes true), then the data processor handles the request to produce a shareable dataset; otherwise, the request is rejected (line 26).
When the data processor receives an order from the data controller to produce a shareable dataset for the request $\{T, O\}$, the data processor handles the dataset on a per-data-provider basis (line 11). All consents ($c^{U_i}_{t_k} \in C^{U_i}$) should agree to each purpose of data use ($o_w \in O$) from the data requester (lines 12 to 13). Therefore, the data processor checks the consent requirements by using the consent check function $g(\cdot)$ (explained in equation (6)) with $c^{U_i}_{t_k}$ and $o_w$, and the data processor creates a consent flag $\sigma^{U_i}_{t_k}$ for one purpose $o_w$ with all consents $C^{U_i}$. Note that a temporary variable $\gamma$ is used to store the result of $g(c^{U_i}_{t_k}, o_w)$ in the procedure (line 16). Finally, the data processor produces a set of consent flags ($\sigma^{U_i}$) for all purposes expressed in $O$ (line 18).
Then, the data processor selects a shareable data row ($\hat{d}^{U_i}_j$) with the filtering function $h(\cdot)$ (explained in equations (7) and (8)) by combining the $j$-th data row $d^{U_i}_j \in D^{U_i}$ with the consent flags $\sigma^{U_i}$ indicated by a data provider $U_i$ (line 20). By processing all rows in $D^{U_i}$, it produces a shareable dataset $\hat{D}^{U_i}$ about one data provider $U_i$ (line 22). By iterating the process for all data providers $\forall U_i \in U$, the data processor finally produces the entire shareable dataset $\hat{D}$ that contains all data that comply with the consent of all data providers (line 24).
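The consent checking and filtering steps described above can be sketched as follows. The sketch rests on assumptions that are not taken from the paper: each consent is modeled as the set of purposes a data provider agreed to for one data type, so that $g$ becomes a membership test; a consent flag is set to 1 only when every requested purpose is covered; and the result of the compliance assessment is passed in as a boolean. The names `consent_flags`, `filter_row`, and `process_request`, and the `drop_incomplete` switch for the two row-handling options, are hypothetical.

```python
def consent_flags(consents, purposes):
    """Return {data_type: 0 or 1}; 1 only if every requested purpose is consented.

    Assumption: `consents` maps a data type to the set of purposes agreed to.
    """
    return {t: int(all(o in agreed for o in purposes)) for t, agreed in consents.items()}


def filter_row(row, flags, types):
    """Filtering function h: keep an element if its flag is 1, otherwise leave it empty."""
    return {t: (row[t] if flags.get(t, 0) == 1 else None) for t in types}


def process_request(T, O, U, C, D, compliant, drop_incomplete=False):
    """Produce the shareable dataset per data provider, or None if the request is rejected."""
    if not compliant:
        return None  # request did not pass the compliance assessment
    shareable = {}
    for provider in U:
        flags = consent_flags(C[provider], O)
        rows = [filter_row(row, flags, T) for row in D[provider]]
        if drop_incomplete:
            # Enterprise rule option: discard rows containing any denied element.
            rows = [row for row in rows if None not in row.values()]
        shareable[provider] = rows
    return shareable


# Toy data used only for illustration.
T, O, U = ["age", "email"], ["marketing"], ["u1", "u2"]
C = {
    "u1": {"age": {"marketing"}, "email": {"marketing", "analysis"}},
    "u2": {"age": {"marketing"}, "email": set()},
}
D = {
    "u1": [{"age": 30, "email": "a@example.org"}],
    "u2": [{"age": 41, "email": "b@example.org"}],
}
with_empty = process_request(T, O, U, C, D, compliant=True)
dropped = process_request(T, O, U, C, D, compliant=True, drop_incomplete=True)
```

Here provider `u2` has not consented to marketing use of the email attribute, so the corresponding row is either exported with an empty email element or discarded, depending on the chosen option.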

D. DATA SHARING EVALUATION FUNCTION
Procedure 1: Consent-based privacy-compliant processing method (partially recovered)
Input:
10: if $\alpha \wedge \beta$ then ▷ Pass compliance
11: for $\forall U_i \in U$ do
12: for $\forall o_w \in O$ do
13: for $\forall c^{U_i}_{t_k} \in C^{U_i}$ do

After creating a shareable dataset $\hat{D}$, the data controller receives the results with reports from the data processor. First, the data controller evaluates the data processing results by checking the report, which contains various information, including the summary of the data request, the person in charge of data processing, the statistics of the shareable dataset, etc.
Moreover, the data controller performs a risk evaluation (e.g., risk of data re-identification) of the data request with the shareable dataset by considering the history of past data requests $R = \{\{T_1, O_1\}, \cdots, \{T_{|R|}, O_{|R|}\}\}$ and the potential risks when the dataset is actually delivered to the data requester. Various risk evaluation methods [29], [30] can be utilized; however, the detailed approaches are out of the scope of this paper, even though risk evaluation is also an important topic in this area.
After examining and verifying the produced shareable dataset, the data controller finally decides whether the dataset can be shared or not. When the data controller approves the data sharing, a data access method (an access key) for the shareable dataset is provided to the data requester with detailed information about the dataset. This access key is only available to the specific data requester. Note that any access control mechanism that can provide the access rights of the shareable dataset to the proper recipients can be applied to the data export function.

E. DATA EXPORT FUNCTION
After the data controller confirms the sharing of the dataset, the data requester receives the final results and the access key for downloading the dataset. The dataset is available until its retention date, which is set by consent and compliance requirements, and it can be accessed only by using the access key provided by the system.
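A minimal sketch of such an export check is given below, assuming a random access key bound to one data requester and one retention date. The class and method names are hypothetical, and the in-memory registry stands in for whatever access control mechanism an actual deployment would use.

```python
import secrets
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    requester: str
    dataset_id: str
    retention_date: date  # last day on which the dataset may be downloaded


class ExportRegistry:
    """Keeps track of issued access keys (in memory, for illustration only)."""

    def __init__(self):
        self._grants = {}

    def issue_key(self, requester, dataset_id, retention_date):
        """Issue a random access key tied to one requester and one dataset."""
        key = secrets.token_urlsafe(16)
        self._grants[key] = Grant(requester, dataset_id, retention_date)
        return key

    def can_download(self, key, requester, today):
        """Allow download only for the right requester and before retention expires."""
        grant = self._grants.get(key)
        return (
            grant is not None
            and grant.requester == requester
            and today <= grant.retention_date
        )


registry = ExportRegistry()
access_key = registry.issue_key("marketing_manager", "dataset_001", date(2020, 7, 31))
allowed = registry.can_download(access_key, "marketing_manager", date(2020, 7, 15))
```

The check combines the two conditions stated above: the key must belong to the specific data requester, and the request must fall within the retention period.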
In the next section, this paper introduces an actual implementation of the proposed privacy-compliant data-sharing system with a cloud computing environment.

V. DEMONSTRATION AND PERFORMANCE ANALYSIS
This section demonstrates an implementation for the proposed privacy-compliant data-sharing system in a cloud computing environment. Particularly, for the demonstration, the prototype of the proposed functions is presented. Moreover, this section also shows performance analysis results when a huge amount of data is processed with the proposed procedure (Procedure 1).

A. DEMONSTRATION
The detailed demonstrations of the proposed data-sharing system implementation are shown in Figure 4 for the proposed five functions: i) data request, ii) request assessment, iii) data process, iv) data sharing evaluation, and v) data export functions. The data-sharing system is implemented in JavaScript with Node.js and Express framework.
To demonstrate the prototype, the following data-sharing scenario is assumed: a multinational cosmetic corporation has two sets of enterprise resource planning (ERP) systems, one in the United States of America (USA) and one in the European Union (EU). A marketing manager (as a data requester) wants to create new online marketing strategies for targeting customers in the USA by analyzing customers in the EU. In this case, customer information in the EU should be transferred to the system in the USA. This demonstration shows how the proposed data-sharing system works to comply with regulations and privacy requirements for sharing datasets.

1) DATA REQUEST
First, a data requester accesses the system and fills out a form that consists of the purposes of the data request, required data types, and additional information for clarifying compliance rules (in Figure 4a). For describing the purposes of the data request, the system provides some pre-defined purposes that are commonly used (e.g., marketing, analysis, notice, etc.) and also provides a textbox where the data requester can add descriptions. For the scenario, the marketing manager chooses the data utilization purpose as "Marketing" and adds a description such as "Target user selection".
Figure 4. Demonstrations for the data-sharing system.
For the required data types, only a textbox is provided, where the requester describes the required data types in natural language that can be understood by a data controller and data processors. Since the data requester cannot know the exact names of the attributes of the raw data database, the data requester only mentions the required data types. By reading the request, the data processor finally selects the relevant data types when the request is actually processed. For the scenario, the marketing manager describes the required data types as "name, email, age, and state".
Moreover, the data requester should clarify the region (or location) and the time (or period) of data utilization in order to comply with data providers' consent and compliance requirements. By filling out the form, the data requester creates a formal data request describing his/her requirements. For the scenario, the marketing manager selects the location as the USA and one of the AWS regions in the USA for the cloud service within the period of July 2020. Moreover, the manager selects relevant personal data flags and consent information predefined in the system to clearly describe the required data types.

2) REQUEST ASSESSMENT
After the data requester submits a data request, a data controller reviews and assesses the data request (in Figure 4b).
The data controller reviews the submitted data request and checks whether it is acceptable or not. If the request is not acceptable, the data controller can decline the request and send it back to the data requester. Otherwise, the data controller can set guidelines for applicable compliance rules and data sources if necessary; by doing this, the data controller can point to the relevant raw data database according to the applied compliance rules, and the data processor can access only that limited set of data while producing the shareable dataset for the request. For the scenario, a data protection officer (DPO, as the data controller) sets the applicable compliance rules as "GDPR_USregion" and chooses "ZPROFILES", which contains customers' profile information, as the relevant raw data database.

3) DATA PROCESS
When the data controller decides to accept the request, the data processor is able to process the request according to the request from the data requester and the comments from the data controller (in Figure 4c). The data processor selects only the relevant data types from all attributes in the database. For the scenario, the data processor selects four data types out of 17 (i.e., "age", "email", "name", and "state").
When the data processor selects a shareable data type, each data type should be controlled by specific rules or regulations for compliance. For each data type, the data processor can see detailed information (e.g., metadata, relevant privacy terms, etc.) and can set the required compliance information, including company regulations and consent database matching. Once all properties for compliance checking (e.g., consent, purpose, and period under designated controls) are set, the system shows the consent checking result; that is, how much data can actually be shared with the data requester.
For the scenario, the data processor sets compliance information for the data type "name". The data processor sets the company regulation as "GDPR Personal ID control" and refers to the privacy terms with legal utilization period information. In addition, the data controller checks data providers' consent information in the "ZPROFILES_AGREE" table. By setting all compliance-related properties, the data processor gets 174,681 rows (out of 1.1M rows) for the data type "name" that can actually be shared with the data requester. On the right-hand side, the data processor can also see how many rows can be shared with the data requester when the data types are combined (i.e., the combination of "age", "email", "name", and "state"). In this case, only 127,205 rows can be shared as a result.
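The drop from the per-type count to the combined count can be understood as a set intersection: a row is shareable for a combination of data types only if it is shareable for every type in the combination. The following sketch illustrates this with a handful of made-up row identifiers; it assumes that entire rows are discarded when any selected data type is denied, and the function name is hypothetical.

```python
def shareable_rows(consented_by_type, selected_types):
    """Row ids shareable for a combination = rows consented for every selected type."""
    sets = [consented_by_type[t] for t in selected_types]
    return set.intersection(*sets) if sets else set()


# Toy example: row ids for which sharing is allowed, per data type.
consented_by_type = {
    "name": {1, 2, 3, 5},
    "email": {1, 2, 4, 5},
    "age": {1, 2, 3, 4, 5},
    "state": {2, 3, 5},
}

name_only = shareable_rows(consented_by_type, ["name"])
combined = shareable_rows(consented_by_type, ["age", "email", "name", "state"])
```

In this toy case four rows are shareable for "name" alone, but only two rows remain when all four data types are requested together, mirroring the reduction reported for the scenario.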

4) DATA SHARING EVALUATION
When the data processor has finished creating a shareable dataset, the data controller evaluates the processed result by checking a compliance report generated by the system, together with a sample of rows (in Figure 4d). In the report, all process results are summarized, including the approval process, data export summary, and compliance summary. After evaluating the report, the data controller generates an access key for the dataset if the data controller approves sharing the dataset.
For the scenario, the data controller checks the compliance report with the summary of the data request and the data process results. Particularly, for the compliance summary, the data controller is able to examine the properties of each data type along with how many rows are selected according to the compliance matching. For example, the data type "email" has "ID" level privacy sensitivity (i.e., it is personal data) and is processed under "GDPR Personal Data Control" rules without applying any de-identification method. Note that although 174,666 rows are available for the "email" data type, the total number of exported rows is 127,205 because the requirements must be satisfied for the combination of all data types. After checking the report, the data controller generates an access key "YN1EQ· · · " and finally approves sharing the dataset.

5) DATA EXPORT
The data requester can check the progress and the result of the data request (in Figure 4e). Once the data controller and the data processor have finished processing the data request, the data requester receives a notice with the access key issued by the data controller. After receiving the access key, the data requester can download the dataset created as a result of the data request process. For the scenario, the data requester checks a summary of the data request (i.e., "Target Customer Selection · · · ") and downloads the dataset by entering the access key "YN1EQ· · · " issued by the data controller and clicking the "download" button.
In this section, the implementations and demonstration of the entire proposed data-sharing process are explained. In the next section, the results of the performance analysis are shown to check the feasibility of the proposed system (i.e., the prototype) in the real world.

B. PERFORMANCE ANALYSIS
This section analyses the performance of the proposed privacy-compliant data-sharing system. Before checking the performance of the implemented prototype of the proposed system, the performance of the proposed compliance checking algorithm in Procedure 1 is checked based on the simulation in the localhost environment. Then, the performance of the prototype implemented in a cloud computing environment is analyzed with a huge amount of real-world data in an enterprise resource planning system.

1) SIMULATION BASED ANALYSIS
For the simulation, an SQL-based database is used in a localhost environment that has a 3.9 GHz CPU clock speed and 32 GB of memory. The size of the database is 1.5 GB with 200 attributes, consisting of a table of raw data with alphanumeric values and a table of consent information with boolean values. The maximum number of data rows is 1 million (1000K) for each table. The performance analysis is done in the same Node.js environment as the prototype demonstration. Figure 5 and Table 1 show the performance of the proposed privacy-compliant data-sharing method (Procedure 1), which processes raw data and matches consent information for privacy-compliant data sharing. The plotted results are average values over 100 independent simulation runs, and the measured time spans querying the data from the database and processing it with the proposed privacy-compliant method.
First, in Figure 5, the effects of the number of rows are checked. In other words, with the number of data types fixed at 200, the execution performance for different numbers of rows is observed. Figure 5a shows the execution times of the proposed privacy-compliant data-sharing method with respect to the number of rows for both raw data and consent information. The figure shows that when the number of data rows is 200K, 600K, and 1000K, it takes about 13.4, 49.8, and 94.1 seconds, respectively. This shows that the execution time increases faster than linearly when the rows of both raw data and consent grow together (about 7 times longer for 5 times as many rows). Note that the number of rows for consent is related to the number of data providers and the number of data request purposes for the system.
Moreover, Figure 5b shows the effects of the number of rows when either the number of raw data rows or the number of consent rows is fixed at 1000K. This experiment checks the effects of the number of raw data rows and consent rows individually. The figure shows that the amount of raw data affects the execution time more than the amount of consent does. This is because raw data is more complicated than consent in terms of both volume and types. In addition, since the slope of the case with the fixed number of raw data rows is more gradual than that of the case with the fixed number of consent rows, the consent checking procedure contributes less to the entire execution time than the raw data extraction does. This means the compliance checking function requires far fewer resources than the raw data extraction function.
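The growth rate can be estimated directly from the three execution times reported for Figure 5a by taking the slope of the logarithm of the time against the logarithm of the number of rows; an exponent of 1 would correspond to linear growth and an exponent of 2 to quadratic growth. The sketch below is only a rough check based on three points, and the function name is hypothetical.

```python
import math

# Execution times reported for Figure 5a (number of rows -> seconds).
reported = {200_000: 13.4, 600_000: 49.8, 1_000_000: 94.1}


def scaling_exponent(n1, t1, n2, t2):
    """Slope of log(time) against log(rows) between two measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


points = sorted(reported.items())

# Exponent between each pair of neighbouring measurements.
pairwise = [
    scaling_exponent(n1, t1, n2, t2)
    for (n1, t1), (n2, t2) in zip(points, points[1:])
]

# Exponent between the smallest and the largest measurement.
overall = scaling_exponent(*points[0], *points[-1])
```

With the reported values, the estimated exponent is roughly 1.2 for each interval, i.e., above linear growth.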
Next, the effects of the number of data types are observed in Table 1, which reports the execution time of the proposed privacy-compliant data-sharing method for various combinations of the number of data types and rows. For the case with 1000K rows, the execution time is about 14.82, 24.04, and 69.5 seconds when the number of data types is 30, 90, and 150, respectively. This shows that the execution time increases roughly in proportion to the number of data types (about 4.7 times longer for 5 times as many data types). On the other hand, similar to Figure 5a, for the case with 150 data types, the execution time is about 10.04, 35.5, and 69.5 seconds when the number of rows is 200K, 600K, and 1000K, respectively. The execution time again increases faster than linearly when the rows of both raw data and consent grow together.
In the next section, this paper shows a real-world based performance analysis with the implemented prototype with an enterprise resource planning system in a cloud computing environment.

2) REAL-WORLD BASED ANALYSIS
The setup for the prototype of the proposed system is shown in Figure 6, which is implemented on a public cloud service, AWS (Amazon Web Services). It is assumed that a company operates the databases (as the data source) and the data-sharing system separately; therefore, two virtual private clouds (VPCs) are used, one for the proposed privacy-compliant data-sharing system and the other for the data source. The two VPCs are connected by the peering function provided by AWS. The data-sharing system uses a secure database connector that supports end-to-end encryption (i.e., Transport Layer Security (TLS)). The data source is an SAP-based [31], [32] enterprise resource planning (ERP) system operated by a cosmetic company, which contains the raw data and consent databases. To ensure security, the proposed system follows the guidelines from the cloud service provider, and only the authorized data processor was able to access the data and consent databases for extracting data from the ERP system.
On the other hand, the implementation of the proposed system follows a two-tier architecture; that is, the main functionalities are implemented in the public network (i.e., a public subnet) and the databases for the data-sharing system are implemented in the private network (i.e., a private subnet), so that the system users can directly access only the functions of the data-sharing system, not the databases.
For analyzing the data processing performance of the proposed system, the execution environments are summarized in Table 2. For the datasets, this paper considers two types of datasets: Dataset 1 and Dataset 2. Dataset 1 contains 12 data types with 500M rows, and Dataset 2 contains 22 data types.
Table 3 shows the data processing results with the two datasets. With the given computing environment, it takes about 130 and 304 minutes to process and match compliance information of Dataset 1 with 12 data types and Dataset 2 with 22 data types, respectively. On the other hand, it is assumed that three identical attributes are selected for de-identification in both datasets; therefore, the de-identification processing time is almost the same in both cases, namely 71 minutes. As a consequence, the total processing times for Dataset 1 and Dataset 2 are 201.8 and 375 minutes, respectively.
The results show that the compliance matching time is related more to the number of data types considered for sharing than to the absolute size of the datasets. From Dataset 1 to Dataset 2, the size increases by 56%, but the processing time grows to about 232% of that of Dataset 1 (i.e., it increases by about 132%); moreover, the average processing rate is also different (i.e., 398MB/min. and 268MB/min. for Dataset 1 and Dataset 2, respectively). Since not only the number of data types but also the volume of the database can affect the execution time of the proposed system, handling Dataset 2 (which contains 22 attributes) takes much more time to match the consent and compliance information than handling Dataset 1 (which contains 12 attributes).
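These ratios can be cross-checked from the figures quoted in the text. The short calculation below assumes that the compliance matching time of each dataset equals its total processing time minus the 71-minute de-identification time, and it estimates the processed volume as the average processing rate multiplied by that time; the dataset sizes themselves are not restated here, so the volumes are estimates only.

```python
DEIDENTIFICATION_MINUTES = 71

# Values quoted in the text (total minutes, average MB per minute).
total_minutes = {"Dataset 1": 201.8, "Dataset 2": 375}
rate_mb_per_min = {"Dataset 1": 398, "Dataset 2": 268}

# Assumption: matching time = total time - de-identification time.
matching_minutes = {
    name: total - DEIDENTIFICATION_MINUTES for name, total in total_minutes.items()
}

# Estimated processed volume in MB (rate x matching time).
volume_mb = {name: rate_mb_per_min[name] * matching_minutes[name] for name in total_minutes}

time_ratio = matching_minutes["Dataset 2"] / matching_minutes["Dataset 1"]
size_ratio = volume_mb["Dataset 2"] / volume_mb["Dataset 1"]
```

Under these assumptions the time ratio is about 2.32 and the size ratio about 1.57, which is consistent with a size increase of roughly 56% and a matching time that is about 232% of the Dataset 1 value.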
The detailed analysis results of the computing resources used for processing each dataset are shown in Figure 7 and Figure 8. Both figures show that memory is highly utilized in the compliance matching phase, but storage I/O ("EBSRead/Write" in the figures) is relatively underutilized. On the other hand, in the de-identification phase, the storage I/O is also highly utilized. For both cases, the storage I/O speeds are limited to around 400-500Mbytes. Note that during the de-identification phase, only three data types are processed; therefore, the memory usage becomes lower. It can be interpreted that the network performance affects the performance of the entire system (particularly, the compliance matching) because the raw data databases are connected remotely.
Therefore, it is checked how the network throughputs affect the performance of the proposed system with the remote raw databases. For this experiment, all computing specifications except the network throughputs are the same, and it processes 10 million rows with 12 attributes. Figure 9 and Figure 10 show the performance metrics with the average 200Mbps and 400Mbps network throughputs, respectively. Note that identical de-identification methods are applied for both cases. It takes 12 and 6 minutes to process 10 million rows for the cases with 200Mbps and 400Mbps network throughput, respectively. Depending on the time consumed, the memory usage and disk read/write usage are also different. In the case of 200Mbps network throughput, it consumes more memory (about twice) and less disk input/output (about half) compared with the case of 400Mbps network throughput because bulk-based disk writes are performed for the data processing. This shows that it is important to consider not only computing capability but also networking capability for enterprise-level systems.

C. DISCUSSION
This paper observes and analyzes various aspects regarding the proposed consent-based privacy-compliant personal data-sharing system, and there are several points to be discussed for further understanding of the proposed system.
This paper has mainly focused on text-based datasets stored in relational databases. Accordingly, the proposed consent-based privacy-compliant processing method (Procedure 1) is for structured datasets stored in table-like databases. However, there are many different data storage methods and media, and companies usually utilize multiple data storage methods and media for managing their datasets depending on various requirements such as costs, delay, volume, etc. Moreover, they deal with various data types (e.g., not only text but also image, audio, video, document, etc.).
Moreover, consent discovery methods should be considered, which find relevant consent of data owners to match various raw data for processing personal data to generate shareable datasets for data requesters.
Therefore, consent discovery and matching methods that can handle various different types of data storage methods and media should be studied in the future.

2) PRIVACY-PRESERVING TECHNIQUES
Even though this paper proposes a consent-based personal data-sharing system that does not rely on privacy-preserving methods, there are cases in which privacy-preserving techniques must inevitably be applied to comply with certain privacy requirements, regardless of the existence of the allowed consent from data providers.
For the experiment, this paper adopts the same de-identification methods with the same type of personal data to minimize the impact of the privacy-preserving process (see Table 3). However, in reality, it is possible to apply various kinds of privacy-preserving techniques depending on the size, types (e.g., text, image, audio, video, etc.), or characteristics of datasets. The impact of privacy-preserving techniques on processing personal data with different characteristics should be analyzed further in the future.

VI. CONCLUSION
Since the issues of utilizing personal data while protecting privacy and data providers' rights have come into focus, many companies now require tools for safely handling personal data. In particular, since identifying whether data is personal data or not is becoming more difficult, a data provider's explicit consent to data utilization is becoming more important to companies that want to utilize personal data. Therefore, this paper has proposed a consent-based privacy-compliant data-sharing system. By analyzing a general process and the roles of actors for data sharing in an enterprise environment, this paper has proposed system requirements that can support a consent-based privacy-compliant personal data-sharing system. According to the identified requirements, this paper has proposed the system architecture and a detailed procedure for a consent-based privacy-compliant processing method that considers compliance checking as well as consent checking. For the demonstration, this paper has also presented a prototype implemented in a public cloud computing environment. Using the prototype, the performance analysis in the lab and real-world environments has shown that the proposed consent-based privacy-compliant personal data-sharing system is feasible for real-world application.