Mapping RDF Databases to Property Graph Databases

RDF triplestores and property graph databases are two approaches for data management which are based on modeling, storing and querying graph-like data. Despite this common principle, they present particular features that complicate the task of database interoperability. While there exist some methods to transform RDF graphs into property graphs, and vice versa, they lack compatibility and a solid formal foundation. This paper presents three direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a property graph database, including data and schema. We show that two of the proposed mappings satisfy the properties of semantics preservation and information preservation. The existence of both mappings allows us to conclude that the property graph data model subsumes the information capacity of the RDF data model.


I. INTRODUCTION
Database systems based on graph-oriented models are gaining relevance in industry due to their use in various application domains where complex data analytics is required [4]. RDF triple stores and graph database systems are two approaches for data management that are based on modeling, storing, and querying graph-like data.
RDF triplestores are based on the RDF data model [24], [41], their standard query language is SPARQL [19], and RDF Schema [11] allows describing classes of resources and properties (i.e. the data schema). On the other hand, most graph database systems are based on the Property Graph (PG) data model [2], and a standard query language for them is currently under development [23].
Although the RDF model and the PG model are based on a graph-oriented structure, they have particularities that complicate the task of data interoperability between them. Among the most important differences we can mention: a PG allows properties for nodes and edges, i.e. nodes and edges can have a set of name-value pairs which are used to introduce metadata; the RDF model uses elements with special syntax and semantics (e.g. IRIs, blank nodes, literals, namespaces, reification, collections).

A. MOTIVATION
Considering the intrinsic connection between RDF triple stores and PG databases, and their popularity for representing knowledge databases, it becomes necessary to develop methods to allow interoperability among these types of systems.
The term ''Interoperability'' was introduced in the area of information systems, and is defined as the ability of two or more systems or components to exchange information, and to use the information that has been exchanged [35]. In the context of data management, interoperability is concerned with the support of applications that share and exchange information across the boundaries of existing databases [32].
Database interoperability is relevant for several reasons: it promotes data exchange and data integration [30]; it facilitates the reuse of available systems and tools [26], [32]; it enables a fair comparison of database systems by using benchmarks [3], [37]; and it supports the success of emergent systems and technologies [32].

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

B. THE PROBLEM
To the best of our knowledge, the research about the interoperability between RDF and PG databases is very restricted (cf. Section VII). While there exist some system-specific approaches, most of them are restricted to data transformation and lack solid formal foundations.

C. OBJECTIVES & CONTRIBUTIONS
Database interoperability can be divided into syntactic interoperability (i.e. data transformations), semantic interoperability (i.e. data exchange via schema and instance mappings), and query interoperability (i.e. transformations among different query languages or data accessing methods) [5].
The main objective of this paper is to study the semantic interoperability between RDF and PG databases. Specifically, we propose three database mappings: a simple mapping, which allows transforming an RDF graph into a PG without considering schema restrictions (on either side); a generic mapping, which allows transforming an RDF graph (without RDF schema) into a PG that follows the restrictions defined by a generic PG schema; and a complete mapping, which allows transforming a complete RDF database into a complete PG database (i.e. schema and instance).
We study three desirable properties of the above database mappings: computability, semantics preservation, and information preservation. Based on this analysis, we argue that any RDF database can be transformed into a PG database. In terms of data modeling, we conclude that the PG data model subsumes the information capacity of the RDF data model.
The remainder of this paper is organized as follows: the formal background is presented in Section II; the simple database mapping is described in Section III; the generic database mapping is described in Section IV; the complete database mapping is described in Section V; the experimental evaluation of the mappings is presented in Section VI; the related work is discussed in Section VII; our conclusions are presented in Section VIII.

II. PRELIMINARIES
This section presents a formal background required to study the interoperability between RDF and PG databases. In particular, we formalize the notions of database mapping, RDF database, and PG database.

A. DATABASE MAPPINGS
In general terms, a database mapping is a method to translate databases from a source database model to a target database model. We can consider two types of database mappings: direct database mappings, which allow an automatic translation of databases without any input from the user [34]; and manual database mappings, which require additional information (e.g. an ontology) to conduct the database translation. In this paper, we focus on direct database mappings.

2) SCHEMA, INSTANCE, AND DATABASE MAPPING
A database mapping defines a way to translate databases from a ''source'' database model to a ''target'' database model. For the rest of this section, assume that M 1 and M 2 are the source and the target database models respectively. Considering that a database includes a schema and an instance, we first define the notions of schema mapping and instance mapping. A schema mapping from M 1 to M 2 is a function SM from the set of all database schemas in M 1 to the set of all database schemas in M 2 . Similarly, an instance mapping from M 1 to M 2 is a function IM from the set of all database instances in M 1 to the set of all database instances in M 2 .
A database mapping from M 1 to M 2 is a function DM from the set of all databases in M 1 , to the set of all databases in M 2 . Specifically, a database mapping is defined as the combination of a schema mapping and an instance mapping.
Definition 1 (Database Mapping): A database mapping is a pair DM = (SM, IM) where SM is a schema mapping and IM is an instance mapping.
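The compositional structure of Definition 1 can be sketched in code: a database mapping simply applies its schema mapping and instance mapping componentwise. The dictionary-based database representation below is an assumption for illustration only.

```python
# A sketch of Definition 1: a database mapping DM = (SM, IM) translates a
# database (schema, instance) componentwise.

def make_database_mapping(schema_mapping, instance_mapping):
    """Combine a schema mapping SM and an instance mapping IM into DM."""
    def database_mapping(database):
        schema, instance = database
        return (schema_mapping(schema), instance_mapping(instance))
    return database_mapping

# Two trivial mappings that rename keys, standing in for real translations.
def sm(schema):
    return {"node_types": schema.get("classes", [])}

def im(instance):
    return {"records": instance.get("triples", [])}

dm = make_database_mapping(sm, im)
```

The point of the sketch is only that DM is fully determined by the pair (SM, IM), as the definition states.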

a: PROPERTIES OF DATABASE MAPPINGS
Every data model allows structuring the data in a specific way, or using a particular abstraction. Such abstraction determines the conceptual elements that the data model can represent, i.e. its representation power or information capacity [22].
Given two database models M 1 and M 2 , the possibility to exchange databases between them depends on their information capacity. Specifically, we say that M 1 subsumes the information capacity of M 2 iff every database in M 2 can be translated to a database in M 1 . Additionally, we say that M 1 and M 2 have the same information capacity iff M 1 subsumes M 2 and M 2 subsumes M 1 .
The information capacity of two database models can be evaluated in terms of a database mapping satisfying some properties. In particular, we consider three properties: computability, semantics preservation, and information preservation.
Assume that D 1 is the set of all databases in a source database model M 1 , and D 2 is the set of all databases in a target database model M 2 .
Definition 2 (Computable Mapping): A database mapping DM : D 1 → D 2 is computable if there exists an algorithm A that, given a database D ∈ D 1 , computes DM(D).
The property of computability indicates the existence and feasibility of implementing a database mapping from M 1 to M 2 . This property also implies that M 2 subsumes the information capacity of M 1 .

Definition 3 (Semantics Preservation): A computable database mapping DM : D 1 → D 2 is semantics preserving if for every valid database D ∈ D 1 , there is a valid database D′ ∈ D 2 satisfying that D′ = DM(D).
Semantics preservation indicates that the output of a database mapping is always a valid database. Specifically, the output database instance satisfies the constraints defined by the output database schema. In this sense, we can say that this property evaluates the correctness of a database mapping.

Definition 4 (Information Preservation): A computable database mapping DM : D 1 → D 2 is information preserving if there exists a computable inverse mapping DM −1 : D 2 → D 1 satisfying that DM −1 (DM(D)) = D for every database D ∈ D 1 .
Information preservation indicates that, for a database mapping DM, there exists an ''inverse'' database mapping DM −1 which allows recovering any database transformed with DM. Note that the above definition implies the existence of both an ''inverse'' schema mapping SM −1 and an ''inverse'' instance mapping IM −1 .
Information preservation is a fundamental property because it guarantees that a database mapping does not lose information [34]. Moreover, it implies that the information capacity of the target database model subsumes the information capacity of the source database model.
Our goal is to define database mappings between the RDF data model and the PG data model. Hence, we next present formal definitions of the notions of instance, schema, and database for both models.

B. RDF DATABASES
An RDF database is an approach for data management oriented to describing information about Web resources by using Web models and languages. In this section we describe two fundamental standards used by RDF databases: the Resource Description Framework (RDF) [24], which is the standard data model to describe the data; and RDF Schema [11], which is a standard vocabulary to describe the structure of the data.

1) RDF GRAPH
Assume that I , B, and L are three disjoint infinite sets, corresponding to IRIs, blank nodes and literals respectively.
An IRI identifies a concrete web resource, a blank node identifies an anonymous resource, and a literal is a basic value (e.g. a string, a number or a date). We will use the term RDF resource to indicate any element in the set I ∪ B.
An RDF triple is a tuple t = (v 1 , v 2 , v 3 ) where v 1 ∈ I ∪ B is called the subject, v 2 ∈ I is called the predicate and v 3 ∈ I ∪ B ∪ L is called the object. Here, the subject represents a resource, the predicate represents a relationship of the resource, and the object represents the value of such relationship. Given a set of RDF triples S, we will use sub(S), pred(S) and obj(S) to denote the sets of subjects, predicates, and objects in S respectively.
There are different data formats to encode a set of RDF triples, including Notation3 (N3) [12], RDF/XML [17], N-Triples [14], Turtle [8] and N-Quads [13]. The following example shows a set of RDF triples encoded using the Turtle data format.
Example 2.1: The lines beginning with @prefix are prefix definitions and the rest are RDF triples. A prefix definition associates a prefix (e.g. voc) with an IRI (e.g. http://www.example.org/voc/). Hence, a full IRI like http://www.example.org/voc/Person can be abbreviated as a prefixed name voc:Person. We will use prefix(r) and name(r) to extract the prefix and the name of an IRI r respectively.
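As a concrete illustration of such a listing, the following Turtle snippet is consistent with the resources mentioned later in the text (voc:Person, voc:ceo, "Elon Musk", "46"^^xsd:int); the names ex:Tesla, voc:Company and voc:age are assumptions introduced here for illustration only.

```turtle
@prefix ex:  <http://www.example.org/> .
@prefix voc: <http://www.example.org/voc/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:Tesla     rdf:type  voc:Company ;
             voc:ceo   ex:Elon_Musk .
ex:Elon_Musk rdf:type  voc:Person ;
             voc:name  "Elon Musk" ;
             voc:age   "46"^^xsd:int .
```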
In order to facilitate readability, we will use prefixed names instead of full IRIs. Moreover, we will assume that there exists a standard way to transform a full IRI into a prefixed name, and vice versa (e.g. by using an internal index or an external service like DRPD [42]).
A blank node is usually represented as _: followed by a blank node label which is a series of name characters (e.g. _:b1). There are other ways to encode blank nodes (e.g. []), but we will use the above for simplicity. Given a blank node b, the function lab(b) returns the label of b.
We will consider two types of literals: a simple literal, which is a Unicode string (e.g. "Elon Musk"), and a typed literal, which consists of a string and a datatype IRI (e.g. "46"^^xsd:int). Numbers can be unquoted, and boolean values may be written as either true or false. Given a literal l, the function val(l) returns the string of l.
A set of RDF triples can be visualized as a graph where the nodes represent the resources, and the edges represent properties and values. However, the RDF model has a particular feature: the same IRI can occur both as the subject or object of a triple and as a predicate. For instance, the triple (voc:ceo, rdfs:label, "Chief Executive Officer") can be added to the graph shown in Example 2.1 to include metadata about the property voc:ceo. This implies that an RDF graph is not a traditional graph, because it allows edges between edges, and consequently an RDF graph cannot be visualized in a traditional way. Next, we introduce a formal definition of the RDF data model which is able to support this feature.
Definition 5 (RDF Graph): An RDF graph is defined as a tuple G R = (N R , N L , E O , E D , α R , α L , β O , β D , δ) where: • N R is a finite set of nodes representing RDF resources (i.e. resource nodes divided into IRI nodes and blank nodes); • N L is a finite set of nodes representing RDF literals (i.e. literal nodes), satisfying that N R ∩ N L = ∅; • E O is a finite set of edges called object property edges; • E D is a finite set of edges called datatype property edges, satisfying that E O ∩ E D = ∅; • α R : N R → I ∪ B is a total one-to-one function that associates each resource node with a resource identifier (i.e. either an IRI or a blank node identifier); • α L : N L → L is a total one-to-one function that associates each literal node with a single literal; • β O : E O → (N R × N R ) is a total function that associates each object property edge with a pair of resource nodes; • β D : E D → (N R × N L ) is a total function that associates each datatype property edge with a resource node and a literal node; • δ : (N R ∪ N L ∪ E O ∪ E D ) → I is a partial function that assigns a resource class label to each node or edge.
Note that the function δ has been defined as being partial in order to support a partial connection between schema and data (which is usual in real RDF datasets). However, it is possible to define the following simple procedure to make the function δ total: For each resource r ∈ N R , if r ∉ dom(δ) then assign δ(r) = rdfs:Resource. Therefore, we will assume that every resource in an RDF graph defines its resource class.
Concerning the issue of an IRI u occurring as both a resource and a property, note that u will occur as a resource node and as a property edge separately. In such a case, we will have a bipartite graph. The same applies to blank nodes.
Given a set of RDF triples S, the procedure to create a formal RDF graph G R = (N R , N L , E O , E D , α R , α L , β O , β D , δ) from S is defined as follows: • For every resource r ∈ sub(S) ∪ (obj(S) ∩ (I ∪ B)), there is a node n ∈ N R with α R (n) = r; - If (r, rdf:type, c) ∈ S then δ(n) = c, else δ(n) = rdfs:Resource; • For every literal l ∈ obj(S) ∩ L, there is a node n ∈ N L ; - If l is a simple literal then α L (n) = l and δ(n) = xsd:string; - If l is a typed literal of the form value^^datatype then α L (n) = value and δ(n) = datatype; • For every triple (r 1 , p, r 2 ) ∈ S with r 2 ∈ I ∪ B and p ≠ rdf:type, there is an edge e ∈ E O with β O (e) = (n 1 , n 2 ) and δ(e) = p, where n 1 , n 2 ∈ N R are the nodes for r 1 and r 2 respectively; • For every triple (r, p, l) ∈ S with l ∈ L, there is an edge e ∈ E D with β D (e) = (n, n l ) and δ(e) = p, where n ∈ N R is the node for r and n l ∈ N L is the node for l. Additionally, Figure 1 shows a graphical representation of the RDF graph described above. The IRI nodes are represented as ellipses, the blank nodes are represented as dotted ellipses and literal nodes are presented as rectangles. Each node is labeled with two IRIs: the inner IRI indicates the resource identifier, and the outer IRI indicates the resource class of the node. Each edge is labeled with an IRI that indicates its property class. We use balloons to indicate the object identifiers.
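The procedure above can be sketched in Python. The encoding below (resources as prefixed-name strings, literals as ("lit", value, datatype) tuples) is an assumption for illustration.

```python
# A sketch of the procedure: building the formal RDF graph structure from a
# set of triples. alpha_R is the identity here, so resource nodes are just a
# set of identifiers; literal nodes carry their value and datatype.

RDF_TYPE = "rdf:type"
RDFS_RESOURCE = "rdfs:Resource"
XSD_STRING = "xsd:string"

def is_literal(term):
    return isinstance(term, tuple) and term[0] == "lit"

def build_rdf_graph(triples):
    resource_nodes = set()   # N_R
    literal_nodes = []       # N_L as (value, datatype) pairs
    object_edges = []        # E_O as (subject, predicate, object)
    datatype_edges = []      # E_D as (subject, predicate, literal index)
    delta = {}               # resource class labels

    for s, p, o in triples:
        resource_nodes.add(s)
        if p == RDF_TYPE:
            delta[s] = o          # the rdf:type triple fixes the resource class
        elif is_literal(o):
            _, value, datatype = o
            literal_nodes.append((value, datatype or XSD_STRING))
            datatype_edges.append((s, p, len(literal_nodes) - 1))
        else:
            resource_nodes.add(o)
            object_edges.append((s, p, o))

    for r in resource_nodes:
        delta.setdefault(r, RDFS_RESOURCE)   # make delta total on resources
    return resource_nodes, literal_nodes, object_edges, datatype_edges, delta
```

Resources without an explicit rdf:type fall back to rdfs:Resource, mirroring the totalization of δ discussed earlier.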
2) RDF GRAPH SCHEMA
RDF Schema (RDFS) [11] defines a standard vocabulary (i.e., a set of terms, each having a well-defined meaning) which enables the description of resource classes and property classes. From a database perspective, RDF Schema can be used to describe the structure of the data in an RDF database. In order to describe classes of resources and properties, the RDF Schema vocabulary defines the following terms: rdfs:Class and rdf:Property represent the classes of resources and properties respectively; rdf:type can be used (as a property) to state that a resource is an instance of a class; rdfs:domain and rdfs:range allow defining the domain classes and the range classes of a property, respectively. Note that rdf: and rdfs: are the prefixes for RDF and RDFS respectively.
An RDF Schema description consists of a set of RDF triples, so it can be encoded using RDF data formats. The following example shows an RDF Schema description which describes the structure of the data shown in Example 2.1, using the Turtle data format.

Example 2.2:
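The schema listing itself can be sketched in Turtle as follows; the class and property names mirror the illustrative data discussed in this section and are assumptions.

```turtle
@prefix voc:  <http://www.example.org/voc/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

voc:Person  rdf:type rdfs:Class .
voc:Company rdf:type rdfs:Class .
voc:ceo rdf:type    rdf:Property ;
        rdfs:domain voc:Company ;
        rdfs:range  voc:Person .
voc:age rdf:type    rdf:Property ;
        rdfs:domain voc:Person ;
        rdfs:range  xsd:int .
```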
Note that: a resource class rc is defined by a triple of the form (rc rdf:type rdfs:Class); a property class pc is defined by a triple of the form (pc rdf:type rdf:Property); a triple (pc rdfs:domain rc 1 ) indicates that the resource class rc 1 is part of the domain of pc (i.e. a resource of class rc 1 could have an outgoing property pc); a triple (pc rdfs:range rc 2 ) indicates that the resource class rc 2 is part of the range of pc (i.e. a resource of class rc 2 could have an incoming property pc).
If the range of a property class pc is a resource class (defined by the user), then pc is called an object property (e.g. voc:ceo). If the range is a datatype class, defined by RDF Schema or another vocabulary, then pc is called a datatype property (e.g. age). The IRIs xsd:string, xsd:integer and xsd:dateTime are examples of datatypes defined by XML Schema [9]. Let I DT ⊂ I be the set of RDF datatypes.
Note that the RDF schema presented in Example 2.2 provides a complete description of resource classes and property classes. However, in practice, it is possible to find incomplete or partial RDF schema descriptions. In particular, a datatype might not be defined as a resource class, and a property might not define its domain or its range.
We will assume that a partial schema can be ''normalized'' to be a total schema. In this sense, we will use the term rdfs:Resource to complete the definition of properties without domain or range. For instance, suppose that our sample RDF Schema does not define the range of the property class voc:ceo. In such a case, we will include the triple (voc:ceo, rdfs:range, rdfs:Resource) to complete the definition of voc:ceo.
Now, we introduce the notion of RDF graph schema as a formal way to represent an RDF schema description. Assume that I V ⊂ I is the set that includes the RDF Schema terms rdf:type, rdfs:Class, rdf:Property, rdfs:domain and rdfs:range.
Definition 6 (RDF Graph Schema): An RDF graph schema is defined as a tuple S R = (N S , E S , φ, ϕ) where: • N S is a finite set of nodes representing resource classes; • E S is a finite set of edges representing property classes; • φ : (N S ∪ E S ) → I \ I V is a total function that associates each node or edge with an IRI representing a class identifier; • ϕ : E S → (N S × N S ) is a total function that associates each property class with a pair of resource classes.
Recall that I DT denotes the set of RDF datatypes. Given an RDF Schema description D, the procedure to create an RDF graph schema S R = (N S , E S , φ, ϕ) from D is given as follows: 1) Let C = {rc | (rc, rdf:type, rdfs:Class) ∈ D ∨ (pc, rdfs:domain, rc) ∈ D ∨ (pc, rdfs:range, rc) ∈ D}; 2) For each rc ∈ C, we create n ∈ N S with φ(n) = rc; 3) For each pair of triples (pc, rdfs:domain, rc 1 ) and (pc, rdfs:range, rc 2 ) in D, we create e ∈ E S with φ(e) = pc and ϕ(e) = (n 1 , n 2 ), satisfying that n 1 , n 2 ∈ N S , φ(n 1 ) = rc 1 and φ(n 2 ) = rc 2 . Following this procedure, the RDF schema shown in Example 2.2 can be represented as an RDF graph schema; Figure 2 shows a graphical representation of it.
Given an RDF graph schema S R = (N S , E S , φ, ϕ) and an RDF graph G R = (N R , N L , E O , E D , α R , α L , β O , β D , δ), we say that G R is valid with respect to S R , denoted as G R ⊨ S R , iff:
1) for each resource node n ∈ N R , there is a node ns ∈ N S satisfying that δ(n) = φ(ns);
2) for each object property edge e ∈ E O with β O (e) = (n 1 , n 2 ), there is an edge es ∈ E S with ϕ(es) = (ns 1 , ns 2 ) satisfying that δ(e) = φ(es), δ(n 1 ) = φ(ns 1 ) and δ(n 2 ) = φ(ns 2 );
3) for each datatype property edge e ∈ E D with β D (e) = (n, n l ), there is an edge es ∈ E S with ϕ(es) = (ns 1 , ns 2 ) satisfying that δ(e) = φ(es), δ(n) = φ(ns 1 ) and δ(n l ) = φ(ns 2 ).
Here, condition (1) validates that every resource node is labeled with a resource class defined by the schema; condition (2) verifies that each object property edge, and the pair of resource nodes that it connects, are labeled with the corresponding classes; and condition (3) verifies that each datatype property edge, and the pair of nodes that it connects (i.e. a resource node and a literal node), are labeled with the corresponding classes. Finally, we present the notion of RDF database.
Definition 7 (RDF Database): An RDF database D R is a pair (S R , G R ) where S R is an RDF graph schema and G R is an RDF graph satisfying that G R ⊨ S R .

C. PROPERTY GRAPH DATABASES
A Property Graph (PG) is a labeled directed multigraph whose main characteristic is that nodes and edges can contain a set (possibly empty) of name-value pairs referred to as properties. From the point of view of data modeling, each node represents an entity, each edge represents a relationship (between two entities), and each property represents a specific characteristic (of an entity or a relationship). Figure 3 presents a graphical representation of a PG. The circles represent nodes, the arrows represent edges, and the boxes contain the properties for nodes and edges.
Currently, there are no standard definitions for the notions of PG and PG Schema. However, we present formal definitions that resemble most of the features provided by current PG database systems.

1) PROPERTY GRAPH
Assume that L is an infinite set of labels (for nodes, edges and properties), V is an infinite set of (atomic or complex) values, and T is a finite set of data types (e.g. string, integer, date, etc.). A value in V will be distinguished as a quoted string. Given a value v ∈ V, the function type(v) returns the datatype of v. Given a set S, P + (S) denotes the set of non-empty subsets of S.
Definition 8 (Property Graph): A Property Graph is defined as a tuple G P = (N, E, P, Λ, ϒ, Γ, Σ) where: • N is a finite set of nodes, E is a finite set of edges, P is a finite set of properties, and N, E, P are mutually disjoint sets; • Λ : (N ∪ E) → L is a total function that associates each node or edge with a label; • ϒ : P → (L × V) is a total function that assigns a label-value pair to each property; • Γ : E → (N × N) is a total function that associates each edge with a pair of nodes; • Σ : (N ∪ E) → P + (P) is a partial function that associates a node or edge with a non-empty set of properties, satisfying that Σ(o 1 ) ∩ Σ(o 2 ) = ∅ for each pair of distinct objects o 1 , o 2 ∈ dom(Σ).
The above definition supports PGs with the following features: a pair of nodes can have zero or more edges; each node or edge has a single label; each node or edge can have zero or more properties; and a node or edge can have the same label-value pair one or more times.
On the other hand, the above definition does not support multiple labels for nodes or edges. We have two reasons to justify this restriction. First, this feature is not supported by all graph database systems. Second, it would make the definition of schema-instance consistency more complex.
Given two nodes n 1 , n 2 ∈ N and an edge e ∈ E, satisfying that Γ(e) = (n 1 , n 2 ), we will use e = (n 1 , n 2 ) as a shorthand representation for e, where n 1 and n 2 are called the ''source node'' and the ''target node'' of e respectively.
Hence, the PG presented in Figure 3 can be formally described in terms of the above definition.
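As an illustration of Definition 8, a PG can be held in a few dictionaries. The labels and properties below are invented examples and do not reproduce Figure 3.

```python
# A sketch of the property graph structure: node/edge labels (the Lambda
# function), edge endpoints (Gamma), and property sets (Sigma/Upsilon).

class PropertyGraph:
    def __init__(self):
        self.nodes = {}       # node id -> label
        self.edges = {}       # edge id -> (label, source node, target node)
        self.properties = {}  # node/edge id -> list of (label, value) pairs

    def add_node(self, nid, label, **props):
        self.nodes[nid] = label
        if props:
            self.properties[nid] = sorted(props.items())

    def add_edge(self, eid, label, src, tgt, **props):
        self.edges[eid] = (label, src, tgt)
        if props:
            self.properties[eid] = sorted(props.items())

g = PropertyGraph()
g.add_node("n1", "Person", name="Elon Musk", age=46)
g.add_node("n2", "Company", name="Tesla")
g.add_edge("e1", "ceo_of", "n1", "n2", since=2008)
```

Note that both nodes and edges can carry properties, which is exactly the feature that distinguishes PGs from plain labeled graphs.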

2) PROPERTY GRAPH SCHEMA
A Property Graph Schema defines the structure of a PG database. Specifically, it defines types of nodes, types of edges, and the properties for such types.
For instance, Figure 4 shows a graphical representation of a PG schema. The formal definition of PG schema is presented next.
Definition 9 (Property Graph Schema): A property graph schema is defined as a tuple S P = (N S , E S , P S , Θ, Π, Ψ, Ω) where: • N S is a finite set of node types; • E S is a finite set of edge types; • P S is a finite set of property types; • Θ : (N S ∪ E S ) → L is a total function that assigns a label to each node or edge type; • Π : P S → (L × T) is a total function that associates each property type with a property label and a data type; • Ψ : E S → (N S × N S ) is a total function that associates each edge type with a pair of node types; • Ω : (N S ∪ E S ) → P + (P S ) is a partial function that associates a node or edge type with a non-empty set of property types, satisfying that Ω(o 1 ) ∩ Ω(o 2 ) = ∅ for each pair of distinct objects o 1 , o 2 ∈ dom(Ω).
Hence, the PG schema shown in Figure 4 can be formally described in terms of the above definition. Given a PG schema S P = (N S , E S , P S , Θ, Π, Ψ, Ω) and a PG G P = (N, E, P, Λ, ϒ, Γ, Σ), we say that G P is valid with respect to S P , denoted as G P ⊨ S P , iff:
1) for each node n ∈ N, there is a node type nt ∈ N S satisfying that: a) Λ(n) = Θ(nt); b) for each property p ∈ Σ(n) with ϒ(p) = (l, v), there is a property type pt ∈ Ω(nt) with Π(pt) = (l, type(v));
2) for each edge e ∈ E with Γ(e) = (n 1 , n 2 ), there is an edge type et ∈ E S with Ψ(et) = (nt 1 , nt 2 ) satisfying that: a) Λ(e) = Θ(et), Λ(n 1 ) = Θ(nt 1 ) and Λ(n 2 ) = Θ(nt 2 ); b) for each property p ∈ Σ(e) with ϒ(p) = (l, v), there is a property type pt ∈ Ω(et) with Π(pt) = (l, type(v)).
Here, condition (1a) validates that every node is labeled with a node type defined by the schema; condition (1b) verifies that each node contains the properties defined by its node type; condition (2a) verifies that each edge, and the pair of nodes that it connects, are labeled with an edge type and the corresponding node types; and condition (2b) verifies that each edge contains the properties defined by the schema.
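The schema-instance check just described can be sketched as follows. Property-type checks (conditions 1b and 2b) are omitted for brevity, and the flat data layout is an assumption for illustration.

```python
# A sketch of the validity check: every node label must be a node type, and
# every edge must match an edge type whose endpoint types agree with the
# labels of the edge's endpoint nodes.

def is_valid(nodes, edges, node_types, edge_types):
    """nodes: {id: label}; edges: {id: (label, src, tgt)};
    node_types: set of labels; edge_types: {label: (src_label, tgt_label)}."""
    if any(label not in node_types for label in nodes.values()):
        return False                      # a node violates condition (1a)
    for label, src, tgt in edges.values():
        if edge_types.get(label) != (nodes[src], nodes[tgt]):
            return False                  # an edge violates condition (2a)
    return True
```

A real implementation would additionally compare each property's label and datatype against the property types declared for the matching node or edge type.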
Finally, we present the notion of PG database.
Definition 10 (Property Graph Database): A property graph database D P is a pair (S P , G P ) where S P is a PG schema and G P is a PG satisfying that G P ⊨ S P .

D. RDF DATABASES VERSUS PG DATABASES
Upon comparison of RDF graphs and PGs, we see that both share the main characteristics of a traditional labeled directed graph, that is, nodes and edges contain labels, the edges are directed, and multiple edges are possible between a given pair of nodes. However, there are also some differences between them: • An RDF graph allows three types of nodes (IRIs, blank nodes and literals), whereas a PG allows a single type of node; • Each node or edge in an RDF graph contains a single value (i.e. a label), whereas each node or edge in a PG can contain a label and multiple properties; • An RDF graph supports multi-valued properties, whereas a PG usually supports only mono-valued properties; • An RDF graph allows edges between edges, a feature which is not supported in a PG (by definition); • A node in an RDF graph could be associated with zero or more classes of resources, while a node in a PG usually has a single node type. In addition to the above structural differences, RDF Schema gives special semantics to the terms in its vocabulary. For example, the terms rdf:Statement, rdf:subject, rdf:predicate and rdf:object can be used to describe RDF statements explicitly. This feature, called ''reification'', is not studied in this article as it is rarely used in practice.
A very interesting feature of both RDF and PG databases is the support for schema-less databases, i.e. the databases need not have a fixed data structure. In the particular case of RDF, it is possible to find three types of datasets: datasets without schema definitions; datasets that merge data and schema; and datasets that separate schema and instance.
Depending on whether or not the input RDF dataset has a schema, the database mappings can be classified into two types: (i) schema-dependent, which generates a target PG schema from the input RDF graph schema, and then transforms the RDF graph into a PG; and (ii) schema-independent, which creates a generic PG schema (based on a predefined structure) and then transforms the RDF graph into a PG. In this paper, we develop both types of database mappings.

III. SIMPLE DATABASE MAPPING (SDM)
This section describes the schema-independent database mapping DM 1 , which allows transforming a schema-less RDF database into a schema-less PG database. DM 1 is composed only of an instance mapping which transforms the input RDF graph into a PG.
Given an RDF database D R = (∅, G R ), we define the database mapping DM 1 = (∅, IM 1 ) such that DM 1 (D R ) = (∅, G P ) where G P = IM 1 (G R ). The instance mapping IM 1 is defined as follows:
Definition 11 (Instance Mapping IM 1 ): Let G R = (N R , N L , E O , E D , α R , α L , β O , β D , δ) be an RDF graph and G P = (N, E, P, Λ, ϒ, Γ, Σ) be a PG. The instance mapping IM 1 (G R ) = G P is defined as follows: 1) For each r ∈ N R • There will be n ∈ N with Λ(n) = name(δ(r)); 2) For each op ∈ E O satisfying that β O (op) = (r, r′) where r, r′ ∈ N R • There will be e ∈ E with Λ(e) = name(δ(op)) and Γ(e) = (n, n′), where n, n′ ∈ N correspond to r and r′ respectively; 3) For each dp ∈ E D satisfying that β D (dp) = (r, l) where r ∈ N R and l ∈ N L • There will be p ∈ P with ϒ(p) = (name(δ(dp)), val(α L (l))) and p ∈ Σ(n), where n ∈ N corresponds to r.
FIGURE 5. Property graph obtained after applying the instance mapping IM 1 to the RDF graph shown in Figure 1.
In general terms, the instance mapping IM 1 creates PG nodes from resource nodes, PG properties from datatype properties, and PG edges from object properties. Nodes, edges and properties are labeled with the name of the corresponding resource class label (defined by the function δ) or the name of the resource identifier (when function δ is undefined).
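The mapping just described can be sketched as follows, reusing an ad-hoc encoding of the RDF graph (resource nodes, literal nodes, object edges, datatype edges, and the labeling function δ as a dictionary); the name() helper and the input layout are simplifying assumptions.

```python
# A sketch of IM1: resource nodes become PG nodes labelled with the name of
# their resource class, object property edges become PG edges, and datatype
# property edges become PG properties on the subject node.

def name(prefixed_iri):
    """Strip the namespace prefix from a prefixed name, e.g. 'voc:ceo' -> 'ceo'."""
    return prefixed_iri.split(":", 1)[-1]

def im1(resource_nodes, literal_nodes, object_edges, datatype_edges, delta):
    # PG nodes: one per resource node, labelled by its resource class name
    nodes = {r: name(delta[r]) for r in resource_nodes}
    # PG edges: one per object property edge, labelled by the property name
    edges = [(name(p), s, o) for s, p, o in object_edges]
    # PG properties: one per datatype property edge, attached to the subject
    properties = {}
    for s, p, lit_index in datatype_edges:
        value, _datatype = literal_nodes[lit_index]
        properties.setdefault(s, []).append((name(p), value))
    return nodes, edges, properties
```

Observe how name() discards the namespace part of each identifier, which is precisely why this mapping is not information preserving.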
For example, the PG obtained after applying IM 1 over the RDF graph shown in Figure 1 is depicted in Figure 5.

In the rest of this section we evaluate the properties of the database mapping DM 1 , i.e. computability, semantics preservation and information preservation. Recall that DM 1 contains just the instance mapping IM 1 , and that its output is a PG database without a PG schema.
Proposition 1: The database mapping DM 1 is computable.
It is easy to see that the procedure presented in Definition 11 can be implemented as an algorithm.

Proposition 2: The database mapping DM 1 is semantics preserving.
Note that DM 1 assumes that there is no RDF graph schema, i.e. no schema restrictions are considered. Moreover, the output PG database does not contain a PG schema. Hence, it is straightforward to see that DM 1 is semantics preserving.
Proposition 3: The database mapping DM 1 is not information preserving.
Note that the instance mapping IM 1 loses multiple pieces of information from the input RDF graph. In particular, it extracts simple labels from IRIs and blank nodes (e.g. by removing the namespace part of an IRI). Hence, it is not possible to define an inverse mapping which is able to reconstruct all the original information.
Although the database mapping DM 1 does not satisfy the information preservation property, it is a simple method to transform RDF datasets that contain a mix of data and schema. In particular, it works well with RDF graphs where each resource defines its resource class by means of the rdf:type term.

IV. GENERIC DATABASE MAPPING (GDM)
This section describes the schema-independent database mapping DM 2 , which allows transforming a schema-less RDF database into a complete PG database. DM 2 is composed of a schema mapping SM 2 and an instance mapping IM 2 , such that SM 2 generates a ''generic'' PG schema (always the same) and IM 2 generates a PG from the input RDF graph.
Given an RDF database D R = (∅, G R ), we define the database mapping DM 2 = (SM 2 , IM 2 ) such that DM 2 (D R ) = (S P , G P ) where S P is a generic PG schema and G P = IM 2 (G R ). The schema mapping SM 2 and the instance mapping IM 2 are defined next.

A. GENERIC PROPERTY GRAPH SCHEMA
First we introduce a property graph schema which is able to model any RDF graph.
Definition 12 (Generic Property Graph Schema): Let S * = (N S , E S , P S , Θ, Π, Ψ, Ω) be the PG schema defined as follows: • N S = {n 1 , n 2 } with Θ(n 1 ) = Resource and Θ(n 2 ) = Literal; • E S = {e 1 , e 2 } with Θ(e 1 ) = ObjectProperty and Θ(e 2 ) = DatatypeProperty; • Ψ(e 1 ) = (n 1 , n 1 ) and Ψ(e 2 ) = (n 1 , n 2 ); • P S = {p 1 , . . . , p 7 } with Π(p 1 ) = (iri, string), Π(p 2 ) = (id, string), Π(p 4 ) = (value, string) and Π(p 3 ) = Π(p 5 ) = Π(p 6 ) = Π(p 7 ) = (type, string); • Ω(n 1 ) = {p 1 , p 2 , p 3 }, Ω(n 2 ) = {p 4 , p 5 }, Ω(e 1 ) = {p 6 } and Ω(e 2 ) = {p 7 }.
In the above definition: the node type Resource will be used to represent RDF resources, the node type Literal will be used to represent RDF literals, the edge type ObjectProperty allows representing object properties (i.e. relationships between RDF resources), and the edge type DatatypeProperty allows representing datatype properties (i.e. relationships between an RDF resource and a literal). Figure 6 shows a graphical representation of the generic PG schema.
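In a program, the generic schema S * can be written down as plain data. The layout below (dictionaries keyed by type labels) is an assumption for illustration.

```python
# The generic PG schema S*: two node types with their property types, and two
# edge types with their endpoint node types. All datatypes are "string".

GENERIC_NODE_TYPES = {
    "Resource": [("iri", "string"), ("id", "string"), ("type", "string")],
    "Literal": [("value", "string"), ("type", "string")],
}
GENERIC_EDGE_TYPES = {
    "ObjectProperty": ("Resource", "Resource"),
    "DatatypeProperty": ("Resource", "Literal"),
}
```

Since the schema is fixed, the schema mapping SM 2 is a constant function returning this structure regardless of the input RDF graph.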

B. INSTANCE MAPPING IM 2
Now, we define the instance mapping IM 2 which takes an RDF graph and produces a PG following the restrictions established by the generic PG schema defined above.
Definition 13 (Instance mapping IM2): Let G_R = (N_R, N_L, E_O, E_D, α_R, α_L, β_O, β_D, δ) be an RDF graph and G_P = (N, E, P, Λ, ϒ, Σ, ρ) be a PG, where Λ assigns a label to each node and edge, ϒ assigns a name-value pair to each property, Σ assigns a set of properties to each node and edge, and ρ assigns a pair of source and target nodes to each edge. The instance mapping IM2(G_R) = G_P is defined as follows:
1) For each r ∈ N_R:
• There will be n ∈ N with Λ(n) = Resource
• There will be p ∈ P; if α_R(r) ∈ I then ϒ(p) = (iri, α_R(r)), and if α_R(r) ∈ B then ϒ(p) = (id, α_R(r))
• There will be p′ ∈ P with ϒ(p′) = (type, δ(r))
• Σ(n) = {p, p′}
2) For each l ∈ N_L:
• There will be n ∈ N with Λ(n) = Literal
• There will be p ∈ P with ϒ(p) = (value, α_L(l))
• There will be p′ ∈ P with ϒ(p′) = (type, δ(l))
• Σ(n) = {p, p′}
3) For each op ∈ E_O satisfying that β_O(op) = (r1, r2) where r1, r2 ∈ N_R:
• There will be e ∈ E with Λ(e) = ObjectProperty and ρ(e) = (n1, n2), where n1, n2 ∈ N correspond to r1, r2 ∈ N_R respectively
• There will be p ∈ P with ϒ(p) = (type, δ(op))
• Σ(e) = {p}
4) For each dp ∈ E_D satisfying that β_D(dp) = (r, l) where r ∈ N_R and l ∈ N_L:
• There will be e ∈ E with Λ(e) = DatatypeProperty and ρ(e) = (n1, n2), where n1, n2 ∈ N correspond to r and l respectively
• There will be p ∈ P with ϒ(p) = (type, δ(dp))
• Σ(e) = {p}
According to the above definition, the instance mapping IM2 creates PG nodes from resource nodes and literal nodes, and PG edges from object properties and datatype properties. The property type is used to maintain resource class identifiers and RDF datatypes. The property iri is used to store the IRI of RDF resources, the property id the identifier of blank nodes, and the property value a literal value.
For example, Figure 7 shows a graphical representation of the PG obtained after applying IM2 over the RDF graph shown in Figure 1.
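The mapping IM2 can be sketched in Python as follows (our own illustrative encoding with nodes as dicts of property name-value pairs; not the rdf2pg implementation, and the example IRIs are hypothetical):

```python
def im2(resources, literals, obj_edges, dt_edges):
    """Generic instance mapping IM2 (sketch): every RDF resource
    becomes a node labeled 'Resource', every literal a node labeled
    'Literal'; object and datatype properties become typed edges.

    resources: dict node_id -> (value, kind, rdf_class), kind in {'iri', 'id'}
    literals:  dict node_id -> (value, datatype)
    obj_edges: list of (src, dst, property_iri)
    dt_edges:  list of (src, dst, property_iri)
    """
    nodes, edges = {}, []
    for nid, (value, kind, cls) in resources.items():
        # 'iri' for IRIs, 'id' for blank nodes; 'type' keeps the class
        nodes[nid] = {"label": "Resource", kind: value, "type": cls}
    for nid, (value, dtype) in literals.items():
        nodes[nid] = {"label": "Literal", "value": value, "type": dtype}
    for s, o, prop in obj_edges:
        edges.append({"label": "ObjectProperty", "from": s, "to": o, "type": prop})
    for s, o, prop in dt_edges:
        edges.append({"label": "DatatypeProperty", "from": s, "to": o, "type": prop})
    return nodes, edges

nodes, edges = im2(
    {"r1": ("http://ex.org/Tesla_Inc", "iri", "http://ex.org/Company")},
    {"l1": ("Tesla, Inc.", "xsd:string")},
    [],
    [("r1", "l1", "http://ex.org/name")],
)
assert nodes["r1"] == {"label": "Resource",
                      "iri": "http://ex.org/Tesla_Inc",
                      "type": "http://ex.org/Company"}
assert edges[0]["label"] == "DatatypeProperty"
```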

C. PROPERTIES OF DM 2
In this section we evaluate the properties of the database mapping DM2. Recall that DM2 is formed by the schema mapping SM2 and the instance mapping IM2, where SM2 always creates the generic PG schema S*.

Proposition 4: The database mapping DM2 is computable.
It is not difficult to see that an algorithm can be created from the description of the instance mapping IM2 presented in Definition 13.
Lemma 1: The database mapping DM 2 is semantics preserving.
It is straightforward to see (by definition) that any PG graph created with the instance mapping IM 2 will be valid with respect to the generic PG schema S * .
Theorem 1: The database mapping DM 2 is information preserving.
In order to prove that DM2 is information preserving, we need to provide a database mapping DM2⁻¹ which transforms a PG database into an RDF database, and show that for every RDF database D_R, it holds that D_R = DM2⁻¹(DM2(D_R)). Recalling that the objective of this section is to provide a schema-independent database mapping, we will assume that for any RDF database D_R = (S_R, G_R), the RDF graph schema S_R is null or irrelevant to validate G_R. Hence, we just define an instance mapping IM2⁻¹ which transforms a PG into an RDF graph, such that for every RDF graph G_R, it must satisfy that G_R = IM2⁻¹(IM2(G_R)).
Definition 14 (Instance mapping IM2⁻¹): Let G_P = (N, E, P, Λ, ϒ, Σ, ρ) be a property graph and G_R = (N_R, N_L, E_O, E_D, α_R, α_L, β_O, β_D, δ) be an RDF graph. The instance mapping IM2⁻¹(G_P) = G_R is defined as follows:
1) For each n ∈ N satisfying that Λ(n) = Resource, p1, p2 ∈ Σ(n), ϒ(p1) = (iri, v1) and ϒ(p2) = (type, v2), there will be r ∈ N_R with α_R(r) = v1 and δ(r) = v2
2) For each n ∈ N satisfying that Λ(n) = BlankNode, p1, p2 ∈ Σ(n), ϒ(p1) = (id, v1) and ϒ(p2) = (type, v2), there will be r ∈ N_R with α_R(r) = v1 and δ(r) = v2
3) For each n ∈ N satisfying that Λ(n) = Literal, p1, p2 ∈ Σ(n), ϒ(p1) = (value, v1) and ϒ(p2) = (type, v2), there will be l ∈ N_L with α_L(l) = v1 and δ(l) = v2
4) For each e ∈ E satisfying that Λ(e) = ObjectProperty, p ∈ Σ(e), ϒ(p) = (type, v) and ρ(e) = (n1, n2), there will be op ∈ E_O with δ(op) = v and β_O(op) = (r1, r2), where r1 ∈ N_R corresponds to n1 ∈ N and r2 ∈ N_R corresponds to n2 ∈ N
5) For each e ∈ E satisfying that Λ(e) = DatatypeProperty, p ∈ Σ(e), ϒ(p) = (type, v) and ρ(e) = (n1, n2), there will be dp ∈ E_D with δ(dp) = v and β_D(dp) = (r1, r2), where r1 ∈ N_R corresponds to n1 ∈ N and r2 ∈ N_L corresponds to n2 ∈ N
Hence, the above method defines that each node labeled with Resource or BlankNode is transformed into a resource node, each node labeled with Literal is transformed into a literal node, each edge labeled with ObjectProperty is transformed into a resource-resource edge, and each edge labeled with DatatypeProperty is transformed into a resource-literal edge. Additionally, the property iri is used to recover the original IRI identifier for Resource nodes, the property id is used to recover the original identifier for BlankNode nodes, and the property type allows us to recover the IRI identifier of the resource class associated with each node or edge.
It is not difficult to verify that for any RDF graph G_R, we can produce a PG G_P = IM2(G_R), and then recover G_R by using IM2⁻¹(G_P).
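The inverse mapping IM2⁻¹ can be sketched similarly (an illustrative Python rendering with our own dict encoding of PGs; not the paper's implementation):

```python
def im2_inv(nodes, edges):
    """Inverse instance mapping IM2⁻¹ (sketch): nodes labeled
    'Resource' (carrying an 'iri' or 'id' property) become resource
    nodes, nodes labeled 'Literal' become literal nodes, and typed
    edges become object/datatype properties again."""
    resources, literals, obj_edges, dt_edges = {}, {}, [], []
    for nid, props in nodes.items():
        if props["label"] == "Resource":
            kind = "iri" if "iri" in props else "id"   # IRI vs blank node
            resources[nid] = (props[kind], kind, props["type"])
        else:  # 'Literal'
            literals[nid] = (props["value"], props["type"])
    for e in edges:
        bucket = obj_edges if e["label"] == "ObjectProperty" else dt_edges
        bucket.append((e["from"], e["to"], e["type"]))
    return resources, literals, obj_edges, dt_edges

# Round trip on a one-triple graph: the original RDF data is recovered.
pg_nodes = {
    "n1": {"label": "Resource", "iri": "http://ex.org/Tesla_Inc",
           "type": "http://ex.org/Company"},
    "n2": {"label": "Literal", "value": "Tesla, Inc.", "type": "xsd:string"},
}
pg_edges = [{"label": "DatatypeProperty", "from": "n1", "to": "n2",
             "type": "http://ex.org/name"}]
res, lits, oe, de = im2_inv(pg_nodes, pg_edges)
assert res["n1"] == ("http://ex.org/Tesla_Inc", "iri", "http://ex.org/Company")
assert lits["n2"] == ("Tesla, Inc.", "xsd:string")
assert de == [("n1", "n2", "http://ex.org/name")]
```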

V. COMPLETE DATABASE MAPPING (CDM)
This section describes the schema-dependent database mapping DM3, which transforms a complete RDF database into a complete PG database. DM3 is composed of a schema mapping SM3 and an instance mapping IM3, such that SM3 generates a PG schema from the input RDF graph schema and IM3 generates a PG from the input RDF graph.
Recall that I DT is the set of IRIs referencing RDF datatypes, and T is the set of PG datatypes. Assume that there is a total function f : I DT → T which maps RDF datatypes into PG datatypes. Additionally, assume that f −1 is the inverse function of f , i.e. f −1 maps PG datatypes into RDF datatypes.
Given an RDF database D R = (S R , G R ), we define the database mapping DM 3 = (SM 3 , IM 3 ) such that DM 3 (D R ) = (S P , G P ) where S P = SM 3 (S R ) and G P = IM 3 (G R ). The schema mapping SM 3 and the instance mapping IM 3 are defined next.

A. SCHEMA MAPPING SM 3
We define a schema mapping SM 3 which takes an RDF graph schema as input and returns a PG Schema as output.
Hence, the schema mapping SM3 creates a node type for each resource class (with the exception of RDF datatypes), creates a property type for each datatype property, and creates an edge type for each object property.
Assume that the function f is defined by the following datatype assignments: f (xsd:string) = String, f (xsd:int) = Integer and f (xsd:date) = Date. Hence, the PG schema obtained from the RDF graph schema shown in Figure 2 is given as follows:
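The function f and the behavior of SM3 can be sketched as follows (illustrative Python; the schema encoding and the example class and property names are our own, not taken from the paper's figures):

```python
# Datatype correspondence f and its inverse (extend as needed).
F = {"xsd:string": "String", "xsd:int": "Integer", "xsd:date": "Date"}
F_INV = {v: k for k, v in F.items()}

def sm3(classes, properties):
    """Schema mapping SM3 (sketch): a node type per resource class
    (RDF datatypes excluded), a property type per datatype property,
    and an edge type per object property.

    classes:    set of resource-class names
    properties: list of (prop, domain, range); a range that is a
                key of F marks a datatype property.
    """
    node_types = {c: [] for c in classes}
    edge_types = []
    for prop, dom, rng in properties:
        if rng in F:        # datatype property -> property type of the domain
            node_types[dom].append((prop, F[rng]))
        else:               # object property -> edge type between node types
            edge_types.append((prop, dom, rng))
    return node_types, edge_types

nt, et = sm3(
    {"Person", "Company"},
    [("name", "Person", "xsd:string"), ("worksFor", "Person", "Company")],
)
assert nt["Person"] == [("name", "String")]
assert et == [("worksFor", "Person", "Company")]
assert F_INV[F["xsd:int"]] == "xsd:int"
```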

B. INSTANCE MAPPING IM 3
Now, we define the instance mapping IM 3 which takes an RDF graph as input and returns a PG as output.
Definition 16 (Instance mapping IM3): Let G_R = (N_R, N_L, E_O, E_D, α_R, α_L, β_O, β_D, δ) be an RDF graph and G_P = (N, E, P, Λ, ϒ, Σ, ρ) be a PG. The instance mapping IM3(G_R) = G_P is defined as follows:
1) For each r ∈ N_R:
• There will be n ∈ N with Λ(n) = δ(r)
• There will be p ∈ P; if α_R(r) ∈ I then ϒ(p) = (iri, α_R(r)) and Σ(n) = {p}
2) For each op ∈ E_O satisfying that β_O(op) = (r1, r2):
• There will be e ∈ E with Λ(e) = δ(op) and ρ(e) = (n1, n2), where n1, n2 ∈ N correspond to r1, r2 ∈ N_R respectively
3) For each dp ∈ E_D satisfying that β_D(dp) = (r1, r2):
• There will be p ∈ P with ϒ(p) = (δ(dp), α_L(r2)) and Σ(n) = Σ(n) ∪ {p}, where n ∈ N corresponds to r1 ∈ N_R
According to the above definition, the instance mapping IM3 creates a node in G_P for each resource node, a property in G_P for each datatype property, and an edge in G_P for each object property.
For example, Figure 8 shows a graphical representation of the PG obtained after applying IM3 over the RDF graph shown in Figure 1.
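The mapping IM3 can be sketched as follows (our own illustrative dict encoding, not the rdf2pg implementation; the example IRIs and property names are hypothetical):

```python
def im3(resources, literals, obj_edges, dt_edges):
    """Schema-driven instance mapping IM3 (sketch): a node per RDF
    resource labeled with its class, a node property per datatype
    property (the literal is folded into the node), and an edge per
    object property.

    resources: dict node_id -> (iri, rdf_class)
    literals:  dict node_id -> literal value
    obj_edges: list of (src, dst, property_name)
    dt_edges:  list of (src_resource, dst_literal, property_name)
    """
    nodes = {nid: {"label": cls, "iri": iri}
             for nid, (iri, cls) in resources.items()}
    for s, l, prop in dt_edges:
        nodes[s][prop] = literals[l]   # literal value becomes a node property
    edges = [{"label": prop, "from": s, "to": o} for s, o, prop in obj_edges]
    return nodes, edges

nodes, edges = im3(
    {"r1": ("http://ex.org/Elon_Musk", "Person"),
     "r2": ("http://ex.org/Tesla_Inc", "Company")},
    {"l1": "Elon Musk"},
    [("r1", "r2", "ceoOf")],
    [("r1", "l1", "name")],
)
assert nodes["r1"] == {"label": "Person",
                       "iri": "http://ex.org/Elon_Musk",
                       "name": "Elon Musk"}
assert edges == [{"label": "ceoOf", "from": "r1", "to": "r2"}]
```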

C. PROPERTIES OF DM 3
In this section we will evaluate the properties of the database mapping DM 3 . Recall that DM 3 is formed by the schema mapping SM 3 and the instance mapping IM 3 .
Proposition 5: The database mapping DM 3 is computable.
It is straightforward to see that Definition 15 and Definition 16 can be transformed into algorithms to compute SM 3 and IM 3 respectively.
Lemma 2: The database mapping DM 3 is semantics preserving.
Note that the schema mapping SM3 and the instance mapping IM3 have been designed to create a PG database that maintains the restrictions defined by the source RDF database. On the one hand, the schema mapping SM3 transfers the structural and semantic restrictions from the RDF graph schema to the PG schema. On the other hand, any PG generated by the instance mapping IM3 will be valid with respect to the generated PG schema.
The following facts support the semantics preservation property of DM 3 : • We provide a procedure to create a complete RDF graph schema S R from a set of RDF triples describing an RDF schema, i.e. each property defines its domain and range resource classes.
• We provide a procedure to create an RDF graph G R from a set of RDF triples, satisfying that every node and edge in G R is associated with a resource class; it allows a complete connection between the RDF instance and the RDF schema.
• The schema mapping SM3 creates a node type for each user-defined resource class, a property type for each datatype property, and an edge type for each object property.
• Similarly, the instance mapping IM3 creates a node for each resource, a property for each resource-literal edge, and an edge for each resource-resource edge.
Theorem 2: The database mapping DM3 is information preserving.
In order to prove that DM3 is information preserving, we will define a database mapping DM3⁻¹ which transforms a PG database into an RDF database. The inverse mapping DM3⁻¹ must satisfy that D = DM3⁻¹(DM3(D)) for any RDF database D. Next we define the schema mapping SM3⁻¹ and the instance mapping IM3⁻¹.
Definition 17 (Schema mapping SM3⁻¹): Let S_P = (N_S, E_S, P_S, Λ, ϒ, Σ, ρ) be a PG schema and S_R = (C, Π, φ, ϕ) be an RDF graph schema, where C is a set of resource classes, Π a set of property classes, φ assigns an IRI to each class, and ϕ assigns a domain and a range to each property class. The schema mapping SM3⁻¹(S_P) = S_R is defined as follows. Let ω be a function that maps node types to resource classes:
1) For each node type nt ∈ N_S, there will be rc ∈ C with φ(rc) = Λ(nt); we set ω(nt) = rc
2) For each edge type et ∈ E_S with ρ(et) = (nt1, nt2), there will be pc ∈ Π with φ(pc) = Λ(et) and ϕ(pc) = (ω(nt1), ω(nt2))
3) For each node type nt ∈ N_S and each property type pt ∈ Σ(nt) with ϒ(pt) = (n, t), there will be pc ∈ Π with φ(pc) = n and ϕ(pc) = (ω(nt), f⁻¹(t))
In general terms, the schema mapping SM3⁻¹ creates a resource class for each node type, an object property for each edge type, and a datatype property for each property type. Given a PG schema S_P = SM3(S_R), the schema mapping SM3⁻¹ allows us to ''recover'' all the schema constraints defined by S_R, i.e. SM3⁻¹(S_P) = S_R. An issue of SM3⁻¹ is the existence of RDF datatypes which are not supported by PG databases. For example, rdfs:Literal has no equivalent datatype in PG database systems. The solution to this issue is to find a one-to-one correspondence between RDF datatypes and PG datatypes.
Definition 18 (Instance mapping IM3⁻¹): Let G_P = (N, E, P, Λ, ϒ, Σ, ρ) be a property graph and G_R = (N_R, N_L, E_O, E_D, α_R, α_L, β_O, β_D, δ) be an RDF graph. The instance mapping IM3⁻¹(G_P) = G_R is defined as follows:
1) For each n ∈ N, there will be r ∈ N_R where:
a) α_R(r) = v such that p ∈ Σ(n) and ϒ(p) = (iri, v) or ϒ(p) = (id, v)
b) δ(r) = Λ(n)
c) For each p ∈ Σ(n) satisfying that ϒ(p) = (lab, val) and lab ∉ {iri, id}, there will be l ∈ N_L and dp ∈ E_D with α_L(l) = val, δ(l) = f⁻¹(type(val)), δ(dp) = lab and β_D(dp) = (r, l)
2) For each e ∈ E with ρ(e) = (n1, n2), there will be op ∈ E_O with δ(op) = Λ(e) and β_O(op) = (r1, r2) such that r1, r2 correspond to n1, n2 respectively
Hence, the above method defines that each node in G_P is transformed into a resource node in G_R, each property in G_P is transformed into a datatype property in G_R, and each edge in G_P is transformed into an object property in G_R. Given a PG G_P = IM3(G_R), the instance mapping IM3⁻¹ allows us to ''recover'' all the data in G_R, i.e. IM3⁻¹(G_P) = G_R. Note that each RDF graph produced by the instance mapping IM3⁻¹ will be valid with respect to the schema produced with the corresponding schema mapping SM3⁻¹. Hence, any RDF database D_R can be transformed into a PG database by using the database mapping DM3(D_R), and D_R can be recovered by using the database mapping DM3⁻¹.
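A sketch of IM3⁻¹ under the same illustrative dict encoding (our own simplification; the fresh literal identifiers such as lit_n1_name are an assumption of the sketch, not part of the paper's definition):

```python
def im3_inv(nodes, edges):
    """Inverse instance mapping IM3⁻¹ (sketch): each PG node becomes
    a resource typed by its label; every node property other than
    'iri' becomes a datatype property with a fresh literal node;
    each edge becomes an object property."""
    resources, literals, obj_props, dt_props = {}, {}, [], []
    for nid, props in nodes.items():
        resources[nid] = (props["iri"], props["label"])
        for key, val in props.items():
            if key in ("iri", "label"):
                continue
            lid = f"lit_{nid}_{key}"       # fresh literal node identifier
            literals[lid] = val
            dt_props.append((nid, lid, key))
    for e in edges:
        obj_props.append((e["from"], e["to"], e["label"]))
    return resources, literals, obj_props, dt_props

res, lits, op, dp = im3_inv(
    {"n1": {"label": "Person", "iri": "http://ex.org/Elon_Musk",
            "name": "Elon Musk"}},
    [{"from": "n1", "to": "n2", "label": "ceoOf"}],
)
assert res["n1"] == ("http://ex.org/Elon_Musk", "Person")
assert lits["lit_n1_name"] == "Elon Musk"
assert dp == [("n1", "lit_n1_name", "name")]
assert op == [("n1", "n2", "ceoOf")]
```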

VI. EXPERIMENTAL EVALUATION
The objective of our experimental evaluation is to examine the performance and scalability of the database mappings presented in this work. This section includes a description of the implementation, the evaluation methodology, the experimental results, and the corresponding discussion.

A. IMPLEMENTATION
We have developed a Java application called rdf2pg which implements the mappings described in this article. The source code and the executable jar file of rdf2pg can be downloaded from GitHub (https://github.com/renzoar/rdf2pg). The tool can be executed from the command line by using an expression with the structure java -jar rdf2pg.jar <m> <i> <s>, where <m> indicates the database mapping (-sdm = simple database mapping, -gdm = generic database mapping, -cdm = complete database mapping), <i> indicates the input instance RDF graph file, and <s> indicates the input RDF schema file (in case of using -gdm or -cdm).
The output of the simple database mapping is a file encoding a PG. In addition, the generic and the complete instance mappings produce a second file containing the PG schema. The current implementation uses the YARS-PG [40] data format for both output files.
The rdf2pg API includes an interface named PGWriter which can be implemented to support other data formats. The use of PGWriter is very simple as it provides the methods WriteNode(PGNode node) and WriteEdge(PGEdge edge) which should be implemented with the corresponding instructions to write nodes and edges in the output data format.
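The idea behind PGWriter can be illustrated with a Python analogue (the real interface is Java; the TsvWriter format below is hypothetical and only shows the per-node/per-edge callback pattern):

```python
# Python analogue of the PGWriter idea: a writer only needs to
# implement per-node and per-edge serialization callbacks.
class PGWriter:
    def write_node(self, node: dict) -> str:
        raise NotImplementedError
    def write_edge(self, edge: dict) -> str:
        raise NotImplementedError

class TsvWriter(PGWriter):
    """Hypothetical writer emitting one tab-separated line per element."""
    def write_node(self, node):
        return "\t".join(["N", node["id"], node["label"]])
    def write_edge(self, edge):
        return "\t".join(["E", edge["from"], edge["to"], edge["label"]])

w = TsvWriter()
assert w.write_node({"id": "n1", "label": "Person"}) == "N\tn1\tPerson"
assert w.write_edge({"from": "n1", "to": "n2",
                     "label": "ceoOf"}) == "E\tn1\tn2\tceoOf"
```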
In order to support the processing of large RDF data files, rdf2pg uses the StreamRDF class provided by Apache Jena. rdf2pg also implements two methods for writing the output file: a memory-based method which creates a PG object (following the definition presented in Section II-C.1), and a disk-based method which writes the output by using a minimal set of structures.
Additionally, we have developed a Java application called rdfs-processor which provides three functionalities: analysis of an RDF Schema file to obtain basic information (i.e. the number of resource classes, number of property classes, and number of datatypes); normalization of an RDF Schema in the case of incomplete definitions (e.g. empty domains); and schema discovery from an RDF data file.
The functionality of schema discovery is very relevant for this paper because most of the available RDF datasets do not provide an RDF Schema file. Our method for schema discovery follows the approach described in [31]. In general terms, the method reads the set of RDF triples twice: in the first pass, it identifies resource classes and property classes; in the second pass, it determines the domain and range for each property class. The output is an RDF file containing a basic description of the RDF Schema by means of the terms rdf:type, rdfs:Class, rdf:Property, rdfs:domain and rdfs:range. The source code of rdfs-processor is available on GitHub (https://github.com/renzoar/rdfs-processor).
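The two-pass discovery described above can be sketched as follows (a simplified Python sketch under our own triple encoding; it ignores multi-typed resources and conflicting domains, which rdfs-processor may handle differently):

```python
def discover_schema(triples):
    """Two-pass schema discovery (sketch): pass 1 collects resource
    classes and property classes; pass 2 assigns each property a
    domain and a range class. triples: list of (s, p, o)."""
    RDF_TYPE = "rdf:type"
    cls_of, props = {}, set()
    for s, p, o in triples:               # pass 1: classes and properties
        if p == RDF_TYPE:
            cls_of[s] = o
        else:
            props.add(p)
    schema = {}
    for s, p, o in triples:               # pass 2: domains and ranges
        if p in props:
            # objects without a class are treated as literals
            schema[p] = (cls_of.get(s), cls_of.get(o, "rdfs:Literal"))
    return schema

triples = [
    ("ex:musk", "rdf:type", "ex:Person"),
    ("ex:tesla", "rdf:type", "ex:Company"),
    ("ex:musk", "ex:ceoOf", "ex:tesla"),
    ("ex:musk", "ex:name", "Elon Musk"),
]
assert discover_schema(triples)["ex:ceoOf"] == ("ex:Person", "ex:Company")
assert discover_schema(triples)["ex:name"] == ("ex:Person", "rdfs:Literal")
```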

B. METHODOLOGY AND EXPERIMENTAL SETUP
The experimental evaluation consists of a series of experiments that combine four variables: database mapping, data source, RDF graph size, and processing power. We evaluate the three database mappings defined in this paper: simple mapping, generic mapping, and complete mapping.
We consider four sources of RDF data whose characteristics (domain, nature, and structure) are shown in Table 1. We use nine RDF graphs (obtained from the data sources) whose sizes range from 328 triples to 41,191,235 triples, as shown in Table 2. The processing power variable indicates the use of machines with different hardware characteristics. In this case, we used four virtual machines hosted in the Google Cloud Platform, with a varying number of CPUs (Intel Skylake), main memory size (RAM), and secondary memory size (SSD). The technical specification of each machine is shown in Table 3. All the machines ran Debian GNU/Linux 9 (amd64, built on 20200309) as the operating system and Java OpenJDK 1.8.0_242 (64-bit) without a graphical environment.
Based on the above variables, we evaluated the database mappings in terms of performance and scalability. The performance is measured as the running time (or runtime) required to execute a mapping and construct the corresponding output database (schema graph and instance graph). To do this, the rdf2pg application uses the built-in Java function System.currentTimeMillis to register the runtime. The objective is to determine the computational complexity of the mappings in practice.
Each mapping is evaluated under two notions of scalability. First, we measure the scalability with respect to the size of the input data (i.e. the number of triples); the objective is to determine the behavior of the mappings with RDF graphs of different sizes. Second, we analyze the scalability with respect to the computational resources; the objective is to determine the dependency of each mapping on the hardware.

C. EXPERIMENTAL RESULTS
Our experimental evaluation begins with the extraction of the RDF Schema for each RDF graph. This task was performed by using the rdfs-processor tool described in Section VI-A. Table 4 shows information about the corresponding RDF Schemas. We can observe that: the SP2B graphs change little in terms of the number of classes and properties; the number of classes in GeoData and BSBM is larger than the number of properties; and a small number of datatypes are defined in the graphs.
Once we had the RDF Schema files for each dataset, we executed the experiments on the virtual machines. Every execution of the rdf2pg application was configured to use the maximum amount of primary memory allowed by the machine. To do this, we used the -Xmx parameter defined by Java. Table 5, Table 6 and Table 7 show the runtimes for the simple mapping, the generic mapping and the complete mapping respectively.
In general terms, we can observe that the mappings worked well with most of the graphs (i.e. G1 to G7). However, the task could not be completed for graphs G8 and G9 when running on virtual machines VM1 and VM2. Specifically, the execution of rdf2pg produced an error of ''insufficient memory for the Java Runtime Environment''. Hence, our current implementation is restricted when processing large input graphs on small-memory machines.
The above problem is related to the main memory (RAM) required to manage the intermediate objects used by the mappings. More specifically, the mappings create a HashMap to store all the nodes and their properties, and such a structure can become very large for some graphs. Note that the number of nodes is not directly related to the number of triples. For example, we observed that the number of nodes generated for G8 was higher than for G9, even though G9 has more triples than G8. This explains why the simple mapping was able to process G9, but not G8, both using VM2.
In order to analyze the scalability of the mappings with respect to the size of the input data, we selected the runtimes obtained with VM4. As shown in Figure 9, the execution time of all the mappings grows with the size of the input graphs, i.e., the larger the graph, the longer the runtime. Note also that the runtimes of the mappings stay under the baseline defined by the graph sizes. Hence, we can conclude that the complexity of the mappings is linear with respect to the size of the input.
In order to analyze the scalability of the mappings with respect to the computational power, we prepared a plot for each mapping showing the runtimes for all the virtual machines (see Figures 10, 11 and 12). The plots show that the runtimes decrease from VM1 to VM3; however, the runtimes for VM3 and VM4 are not very different. The latter implies that there is a threshold beyond which additional computational power does not reduce the execution time of the mapping.
As a general conclusion, we can say that the three database mappings presented in this work have efficient implementations that can process large datasets under mid-size computational resources. All the input and output files described in the above experiments are available in Figshare [6].

D. INTEROPERABILITY IN PRACTICE
In order to show the practical use of the database mappings proposed in this article, we conducted a complete ETL process that involves: extracting RDF data from an RDF dataset; transforming the extracted RDF data into PG data; and loading the transformed data into a property graph database system. Due to its popularity and its data loading features, we selected Neo4j as the target database system.
The main issue in this experiment is the configuration of rdf2pg to generate and encode property graphs in a data format that can be consumed by the Neo4j system. To do this, we created the Neo4jWriter class as an implementation of the PGWriter interface provided by rdf2pg. The Neo4jWriter class exports a property graph (such as the one shown in Figure 5) as a set of Cypher instructions that create nodes and edges.
To demonstrate the validity of our mappings, we used the property graph obtained by applying the simple database mapping over the RDF graph G2, i.e. the SP2B file containing 1,285 triples. The output file containing Cypher instructions was loaded into Neo4j Desktop 1.2.3 by using the browser-based user interface. The loading process took 7 ms, resulting in a property graph with 270 nodes, 348 relationships, 677 properties, and 260 labels. A graphical representation of the loaded property graph, obtained from the Neo4j browser, is shown in Figure 13.
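The kind of output produced by a Neo4jWriter-style exporter can be sketched as follows (our own simplified generator, not the actual Neo4jWriter class; it omits string escaping and non-string property values):

```python
def to_cypher(nodes, edges):
    """Emit one Cypher CREATE statement per node and per edge,
    reusing node variables when creating the relationships."""
    stmts = []
    for nid, props in nodes.items():
        attrs = ", ".join(f"{k}: '{v}'"
                          for k, v in props.items() if k != "label")
        stmts.append(f"CREATE ({nid}:{props['label']} {{{attrs}}})")
    for e in edges:
        stmts.append(f"CREATE ({e['from']})-[:{e['label']}]->({e['to']})")
    return "\n".join(stmts)

cy = to_cypher(
    {"n1": {"label": "Person", "name": "Elon Musk"},
     "n2": {"label": "Company", "name": "Tesla, Inc."}},
    [{"from": "n1", "to": "n2", "label": "ceoOf"}],
)
assert "CREATE (n1:Person {name: 'Elon Musk'})" in cy
assert "CREATE (n1)-[:ceoOf]->(n2)" in cy
```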
The above experiment was repeated for the generic and the complete database mappings. The generic mapping produced, after 55 ms, a property graph containing 949 nodes, 1285 relationships, 2911 properties, and 949 labels. The complete mapping took 8 ms, producing a property graph with the same number of elements produced by the simple mapping.
All the information related to this data loading experiment, including the output files and the charts of the property graphs, are available in Figshare [6].

VII. RELATED WORK
In this section we present the related work that targets the interoperability issue between RDF and property graphs. We group the efforts based on the direction of the mapping, i.e. RDF → PG and PG → RDF.

A. FROM RDF TO PROPERTY GRAPHS
Hartig and Thompson [20] propose two formal transformations between RDF* and PGs. RDF* is a conceptual extension of RDF which is based on reification. The first transformation maps any RDF triple to an edge in the resulting PG; each node has a ''kind'' property describing the node type (e.g. IRI). The second transformation distinguishes datatype and object properties: the former are transformed into node properties and the latter into edges of a PG. The shortcomings of this approach are that RDF* (i) does not support mapping an RDF graph schema, (ii) adds the extra step of an intermediate mapping, and (iii) is not supported by major RDF stores.
In S2X, Schätzle et al. [33] propose a GraphX-specific RDF-to-PG transformation. The mapping uses the attribute label to store node and edge identifiers, i.e. each triple t = (s, p, o) is represented using two vertices v_s, v_o, an edge (v_s, v_o), and labels v_s.label = s, v_o.label = o, (v_s, v_o).label = p. Apart from being GraphX-specific, this approach misses the concept of properties and also does not cover the RDF graph schema.
Nguyen et al. [28] propose the LDM-3N (labeled directed multigraph, three nodes) graph model. This data model represents each triple element as a separate node, hence the three nodes (3N). The LDM-3N graph model is used to address the Singleton Property (SP) [29] approach, based on reified RDF data. The problems with this approach are that it (i) adds an extra computation step (and 2n triples); (ii) does not cover the RDF graph schema; and (iii) misses the concept of properties.
Tomaszuk [39] proposes YARS serialization for transforming RDF data into PGs. This approach performs only a syntactic transformation between encoding schemes and does not cover RDF Schema.
Brandizi et al. [10] propose rdf2neo, a tool that can be used to map any RDF schema to a desired PG schema. This hybrid architecture facilitates access to knowledge networks based on shared data models. However, the disadvantage of this solution is that it requires maintaining a more complex infrastructure, which works well for the paper's use case but not for more general applications.
Another approach is presented in [16]. In this paper, the author presents a proposal for converting an RDF data store to a graph database by exploiting the ontology and the constraints of the source.

B. FROM PROPERTY GRAPHS TO RDF
There exist very few proposals for the PG-to-RDF transformation, such as Das et al. [15] and Hartig and Thompson [20], which mainly use RDF reification methods (including blank nodes) to convert node and edge properties in a PG to RDF data. While [20] proposes an indirect mapping that requires converting to the RDF* model (as mentioned earlier), [15] lacks a formal foundation. Neither approach considers the presence of a PG schema.
Another approach is Unified Relational Storage (URS) [43], which focuses on interchangeably managing RDF and PGs but is not a strict transformation method. Barrasa [7] proposes NSMNTX, a plugin that enables the use of RDF in Neo4j. This plugin allows the import and export of both schema and data. The problem with this approach is that NSMNTX is not formally defined and the mappings do not satisfy the property of information preservation.
Table 9 presents a summary of the related work and the features they address. It should be mentioned that some works have studied the problem of mapping RDF to PGs in the scope of specific use cases, e.g. disease networks [25], protein structure exploration [1], and Wikidata reification [21].

VIII. CONCLUSIONS
In this article, we have proposed a novel approach, which consists of three direct mappings, to transform RDF databases into PG databases. We demonstrate, empirically and formally, that the mappings have an efficient implementation to process large datasets.
We showed that two of the proposed mappings satisfy the property of information preservation, i.e. there exist inverse mappings that allow recovering the original databases without losing information. These results allow us to present the following conclusion about the information capacity of the PG data model with respect to the RDF data model.
Corollary 1: The property graph data model subsumes the information capacity of the RDF data model.
Although our methods assume some conditions on the input RDF databases, they are generic and can easily be extended (by overloading the mapping functions) to support features such as inheritance and reification. Furthermore, our formal definitions will be very useful to study query interoperability [36], [38] and query preservation between RDF and PG databases (i.e. transformations between SPARQL and PG query languages). Thus, with this paper, we take a substantial step by laying the core formal foundation for supporting interoperability between RDF and PG databases.
Among the limitations of the mappings presented in this paper we can mention the following: the simple mapping is not suitable for RDF datasets with complex vocabularies, as common names will be merged in the resulting property graph; the generic mapping works with any RDF dataset, but the size of the output property graph will be larger than with the other two mappings; and the complete mapping is suitable for any RDF dataset, but in practice it requires a special dictionary to map prefixes to namespaces. A general limitation of the three mappings is that they are not able to deal with the special semantics defined by the RDF model (e.g. reification) or the inference rules supported by RDF Schema (e.g. subclass, subproperty). We plan to study these features in the future.