
Graph-Based Profiling of Blockchain Oracles




Abstract:

The usage of blockchain technology has been significantly expanded with smart contracts and blockchain oracles. While smart contracts enable the automated execution of agreements between untrusted parties, oracles provide smart contracts with data external to a given blockchain, i.e., off-chain data. However, the validity and accuracy of such off-chain data can be questionable, which compromises the transparency and immutability characteristics of blockchain. Despite many studies on the trustworthiness of blockchain oracles, or more precisely of off-chain data, their solutions are often ‘short-sighted’ and dependent on binary decisions. In this paper, we present a novel graph-based profiling method to determine the trustworthiness of blockchain oracles. We construct a graph with oracles as nodes and cumulative average discrepancies of validity and accuracy of data as edge weights. Our profiling method continues to update the graph, edge weights in particular, to distinguish trustworthy oracles. Clearly, this discourages the provision of false and inaccurate data. We have conducted an evaluation study to assess the effectiveness of the proposed method, running the experiments on the Ethereum network, and have also calculated the cost of running these experiments. Our experiment results show that the proposed method achieves around 93% accuracy in identifying the trustworthiness of data sources.
Published in: IEEE Access ( Volume: 11)
Page(s): 24995 - 25007
Date of Publication: 09 March 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Blockchains have been increasingly adopted in many emerging decentralized applications, such as cryptocurrencies, supply chains and logistics, due primarily to their transparency and immutability characteristics [1], [2], [3], [4], [5], [6], [7], [8], [9]. These characteristics are in essence ensured by the fact that participating parties (users) hold identical copies of business transactions (the ledger) arranged as blocks. Blocks are linked to each other using their hash values: a newly inserted block also carries the hash value of the last block added. Because of this linking, any attempt to change a block's transactions breaks the chain, since the new hash value will no longer match the hash value stored in the next block.

In a blockchain, business logic is implemented using smart contracts. A smart contract is written code (a program) that automates the execution of an agreement between untrusted parties. However, the usability of smart contracts is primarily limited to data stored on the blockchain network (on-chain), without access to the external systems (off-chain) where real-world data and events reside. This limitation has been largely overcome by blockchain oracles, also known as data feeds. Blockchain oracles are third-party services that connect smart contracts to external systems (data sources), providing off-chain data such as currency exchange rates, stock prices, sports scores and livestock DNA fingerprinting (Figure 1). For example, the traceability data of meat products can be provided by different data sources, such as Internet of Things (IoT) devices in livestock farms, warehouse records and smart labels in supermarkets. However, the validity and accuracy of data provided by these third-party services is often hard to determine. In the above example, storage data from warehouses can be manipulated, and smart label scanning can be mixed up. In particular, the existence of the oracle raises security and privacy concerns [10], since oracles can be manipulated to feed the blockchain falsified information, and some may provide inaccurate information due to malfunctioning data sources. This problem is generally termed the blockchain oracle problem [11], [12].

FIGURE 1. A blockchain with an oracle. In this example, the oracle is connected to three data sources: hardware, software and people.

There have been several studies to address this problem, e.g., [13], [14], [15], [16]. A majority of these studies employ voting/reputation-based mechanisms inspired by the crowdsourcing concept (wisdom of the crowd). In this context, the end user (application) submits a question, and data sources submit their answers to the oracle(s). Oracles invoke the voting-based mechanism to filter out falsified or inaccurate responses from the data sources, and a rewarding mechanism is employed to incentivize data sources to act honestly. However, the applicability of these approaches is often limited to on-chain settings. Besides, the trustworthiness of oracles is largely determined by a simple, short-sighted binary decision process; that is, the provided data is judged either valid or invalid, accurate or inaccurate.

In this paper, we present a graph-based profiling method for oracles to determine the trustworthiness of their provided data. Our proposed method constructs a graph with oracles (data sources) as nodes and cumulative average discrepancies of validity and accuracy of data as edge weights. These weights are then used to partition the graph into two subgraphs, a trustworthy subgraph and a manipulative subgraph. Data sources that submit true and accurate answers form the trustworthy subgraph, whereas those with false and inaccurate answers constitute the manipulative subgraph. The existence of these two subgraphs depends on the distances between sources; for instance, sources located within a relatively short distance of each other are labelled as trustworthy. Specific contributions of this paper are as follows:

  • We develop a novel method to profile blockchain oracles based on their historical behaviors.

  • We devise a rewarding mechanism that incentivizes trustworthy data sources and penalizes manipulative ones.

  • We design our profiling method to be generic, enabling it to be implemented either on-chain or off-chain.

  • We implement our profiling method in Solidity (0.8.17) using the Hardhat development environment with the Ethereum blockchain and smart contracts.

  • We conduct an extensive evaluation study to analyze the cost of invoking the proposed method with varying performance parameters.

Our analysis shows that it costs almost 0.013 Ether to execute the proposed method when the number of involved sources is 20. Depending on the application scenario, such costs can be considered manageable. Additionally, several experiments have been conducted to investigate the performance of the proposed method. The results show that the proposed method can achieve an average accuracy of 93% in identifying the trustworthiness of data sources. Moreover, the performance of the proposed method is influenced by the probability distributions according to which the inaccurate sources' answers are generated; performance improves in situations where the sources that generate inaccurate answers are likely to generate inaccurate answers for future questions as well.

The rest of the paper is organized as follows: Section II reviews and discusses related work. Section III details the proposed graph-based profiling method. Section IV presents our evaluation study results. We conclude the paper in Section V.

SECTION II.

Related Work

There have been several studies to address the blockchain oracle problem [17], [18], [19], [20], [21], [22]. Mühlberger et al. [23] have classified oracles into inbound and outbound in terms of information flow. Inbound oracles transmit data from the outside world into the blockchain, whereas outbound oracles transmit data to the outside world. Additionally, the authors have provided a quantitative study that captures the characteristics of the identified patterns (classification) in terms of cost and latency. In [24], a framework that addresses the communication issues between the blockchain and the oracles is presented. In [12], the authors discussed and analyzed the current state of the proposed blockchain oracle solutions in terms of several aspects, including the trust model and design patterns. Therein, the authors highlighted several challenges that should be considered while designing a blockchain oracle solution. Our work focuses on the reliability of the employed trust model to address the blockchain oracle problem in a decentralized fashion, and this focus directs the discussion provided in this section.

Chainlink [25] is presented as a middleware layer that connects the blockchain infrastructure to the outside world. Chainlink is a distributed oracle framework that retrieves and processes data collected from participating sources. The data are filtered and processed through aggregation functions, whereby any misbehaving data source is penalized. The detection of misbehaving sources is established using a reputation mechanism that tracks the involvement of the participating sources. In Chainlink, the submitter of a query also identifies the oracles that can respond to the query, which may, consequently, introduce security issues. Inspired by the wisdom-of-the-crowd principle, Augur [26] is presented as a decentralized oracle solution for the prediction market. In Augur, once a market is created, participants can start trading shares. In this process, participants submit their answers to the question proposed in the market. Additionally, the participants have to determine and pay the amount of money they wish to invest in their submitted responses. Once a market is closed, the outcome is determined by a subset of the participants termed reporters, where reporters who try to manipulate the system by submitting falsified information lose their reputation tokens (REP). Augur has a market share-driven mechanism; however, it is assumed that sources receive the same share (profit).

Adler et al. [14] proposed a decentralized, voting-based blockchain oracle termed ASTRAEA that aims to ensure the trustworthiness of the participating data sources. In this framework, the authors assume that only Boolean propositions can be submitted, where the outcome of a proposition is either True or False. The participants in this framework are divided into three groups, i.e., submitters, voters, and certifiers. Submitters contribute to the framework by submitting propositions (queries) to the system. They also determine the allocated fund (reward) for each query. Voters have to reply to all submitted queries with True, False, or Unknown. Based on the outcome of the proposition, voters may receive (or lose) money. When a voter's reply matches the result, the voter receives the announced reward; in case of a mismatch, the voter loses the deposited stake (money). Voters who decide to send an Unknown reply neither gain nor lose money. In contrast, certifiers can select the propositions they wish to answer, but they can only reply with True or False. In this framework, the outcome is calculated using a majority voting mechanism, where the required number of votes to start the outcome calculation process is an input parameter. The available number of participants influences the efficiency of the ASTRAEA protocol. Several protocols have been presented in the literature as extensions of the ASTRAEA protocol [15], [16]. The mechanism proposed in this paper can also be considered voting-based. However, the proposed method builds its solution by considering all of the available historical data, whereas the ASTRAEA protocol and its variations do not use any previously available data.

Similarly, Nelaturu et al. [13] have also proposed voting-based, decentralized oracle protocols that address the binary market, where oracles respond with either True or False. The presented protocols are designed to address different market structures. For instance, one of the protocols can be used to manage a situation where the number of participants is relatively small, while others can be utilized where the number of participants is relatively high. Additionally, the authors have introduced a general mathematical model and proved that an honest Nash equilibrium exists for binary markets. Similar to the ASTRAEA protocol, the protocols proposed by Nelaturu et al. [13] can be considered memoryless: unlike the method proposed in this work, previous interactions with the sources are not considered when calculating the trust-related values. Moreover, this paper additionally addresses the situation where the expected answer from the sources is a real number.

Furthermore, Truong et al. [1] proposed a decentralized system that works as middleware between the blockchain infrastructure and any decentralized application. The authors proposed a modified version of the REK [28] trust model. The modified model uses the Experience and Reputation indicators to determine the trustworthiness of an entity's information. Accordingly, after an interaction between two entities, each can record its feedback about the other entity on the blockchain. The recorded feedback is then used to construct and maintain an Experience Network, which serves as the input to the reputation calculation mechanism. In this system, the trustworthiness of an entity is calculated based on its Reputation and Experience scores. The mechanism proposed in this work shares some similarities with the trust mechanism proposed by Truong et al. [1]: in both mechanisms, the trustworthiness of an entity (data source) takes its previous behavior into consideration. In this work, to determine such trustworthiness, all sources' responses (answers) are considered, and the obtained graph represents the overall similarity between the sources in terms of provided answers. In contrast, in [1], data sources have to submit feedback about each other as the main step for building the trust graph, and the graph is constructed in a peer-to-peer manner.

In [27], the authors proposed the DeepThought protocol, which can be considered an extended version of the ASTRAEA protocol. Similar to the ASTRAEA protocol, the DeepThought protocol is designed to address the binary market, where True or False is the only acceptable outcome for the submitted propositions. However, the DeepThought protocol incorporates a voter reputation component. Once a proposition is closed, if a voter's outcome matches the final outcome, the voter's reputation is incremented by one; in case of a mismatch, the voter's reputation is decremented by one. Besides not limiting its scope to the binary market, the method proposed in this paper calculates the voters' (data sources') reputations by considering the cumulative differences between the voters' outcomes. Additionally, the final outcome is not used as part of the reputation calculation, since if a high number of sources decide to act maliciously, the authenticity of the final outcome becomes questionable. In [29], the authors proposed a Bayesian-based reputation model that can be used to determine the cheapest oracles that can provide accurate answers.

Trust evaluation has been extensively studied in the vehicular energy network domain [30], [31], [32], [33]. In [30], a trust mechanism is employed to build a trust graph that is used to determine the trustworthiness of a participating entity (node). In the proposed mechanism, trust is calculated in a peer-to-peer fashion based on the feedback provided by the participants: once two participants have interacted with each other, each must provide feedback about its interaction with the other. Along the same line, Wang et al. [31] proposed a reputation evaluation mechanism where the participants' feedback is used to determine the trustworthiness of the participants. The authors have also investigated the “fading” effect on reputation evaluation. The fading effect highlights that the reputation of a participant is expected to change over time; thus, new feedback must have a higher impact on the reputation calculation than old feedback. In [33], the employed reputation calculation mechanism categorizes the participants' feedback into excellent (1), fair (0), and failed (−1). Accordingly, a participant's reputation score is calculated by summing the feedback values received regarding the participant. Unlike the problems investigated in [30], [31], and [33], in this paper, the trust graph is constructed from the end-user perspective, where data sources have no direct interaction with each other. Accordingly, the problem addressed in this paper has a different structure compared to the problems investigated in [30] and [31]. Additionally, the solution proposed in [30] is designed for permissioned networks, where on-chain computation is free of charge. In contrast, to support decentralization, the method proposed in this work is designed for a permissionless network (Ethereum), where any computation performed on-chain has an execution cost.

Table 1 summarizes the proposals discussed in this section with respect to the deployment environment and the adopted mechanism. The method proposed in this work distinguishes itself by adopting a voting-based mechanism that considers all of the available information about the sources. Additionally, as will be discussed throughout the remainder of the paper, the proposed method can be implemented on-chain or off-chain, depending on the application scenario.

TABLE 1 A Summary of the Discussed Blockchain Oracle Approaches

SECTION III.

The Proposed Method

Based on the target application scenario, the proposed method can be implemented on-chain or off-chain. On-chain implementation is desirable when the number of participants involved is relatively small. However, as will be seen in the experiments section, a cost feasibility analysis should be performed before adopting on-chain implementation. When the number of sources is expected to be large, an off-chain implementation can be adopted to reduce the execution cost.

A. Design Overview

Figure 2 shows the interaction between the system components in the proposed architecture. The entire process starts with the end-user submitting a query to the blockchain. This submission also specifies the proposed reward for correct answers, the participation fees, and the number of data sources required to answer the submitted query. The selected number of data sources depends on the nature of the query; for instance, the user may choose to involve many data sources to answer a highly sensitive query. Subsequently, as a second step, the smart contract deployed on the blockchain notifies the data sources about the new submission. The participating data sources are divided based on their expertise (knowledge domain); therefore, a selected data source is obligated to answer the submitted query, since the requested information falls under its knowledge domain. In the third step, the involved data sources submit their answers, along with the participation fees, to the query contract. Once the contract receives the involved data sources' responses, the proposed method starts analyzing the submitted solutions to identify the correct answers. When the proposed method is implemented off-chain, the query contract submits the sources' solutions to an off-chain version of the proposed method; when it is implemented on-chain, the analysis is performed within the smart contract. The last step of this process involves distributing rewards to the sources that submitted the correct answer, whereas sources that submitted incorrect answers lose their submitted fees.

FIGURE 2. A conceptual workflow of the proposed method.

The proposed method represents the participating data sources as a complete graph. In the presented graph, the set of nodes represents the data sources, and an edge between two nodes (participants) represents the cumulative differences between their answers to previous queries. Such representation aims to capture and detect biased and inaccurate query answers. Bias in the query answer occurs when there is a significant difference between the sources’ answers due to malicious behavior. Moreover, inaccurate answers occur when there is a small difference between sources’ answers due to a malfunctioning process. Once the participants’ answers are received, the edges will be updated to reflect the received answers. The obtained graph will then be used to determine the final query answer. The main idea of this process is to detect whether the participants’ sources are split into two groups based on their behavior (trustworthy and manipulative). In such a situation, the largest group will be considered the trustworthy group. Furthermore, the proposed method consists of the graph maintenance and query processing stages. These two stages work recursively to determine the outcome for each submitted query.

In the proposed method, several factors incentivize the sources to act honestly. The mandatory fees that each selected source has to pay encourage the sources to participate, while the risk of losing the deposited fees and the possibility of being blocked from participation motivate the participants to behave honestly. Such risks also reduce the chances of having lazy sources, i.e., sources that answer the submitted queries without performing all of the tasks expected to obtain an answer. Furthermore, the graph representation is not visible to the sources, and the sources are not aware of each other's existence. Therefore, the possibility of a source mirroring other sources is eliminated; source A mirrors source B when it copies source B's solution to the submitted query.

B. Graph Maintenance Stage

Data sources are grouped based on their domain knowledge, where the definition of the domain knowledge is application-dependent. For instance, in a currency exchange scenario, all data sources that can provide information about currency exchange rates are considered in the same knowledge domain. The participating data sources are represented as a complete graph $G = \langle V, E \rangle$. In this graph, the vertices (set $V$) represent the participants, and the edges (set $E$) represent the differences between the participants in terms of their reported answers. The weight of an edge $w(e(v_i, v_j))$, $\forall v_i, v_j \in V$, captures the cumulative difference between the sources connected by this edge. The initial weight for each edge is zero. Once the responses from all participants are received, the weight of the edge that connects participants $v_i, v_j \in V$ is calculated as follows:
\begin{equation*} w(e(v_{i},v_{j})) = dif(v_{i},v_{j}) + w(e(v_{i},v_{j})) \tag{1}\end{equation*}

where $dif(v_i, v_j)$ represents the absolute difference between the answers reported by $v_i$ and $v_j$. Such a representation also helps in situations where the sources selected to answer a query must fulfill pre-determined criteria. For instance, assume that the user is looking for $x$ sources; finding the $x$ sources that have the smallest total distance to each other fulfills the criteria.

Additionally, by considering the cumulative weight between the sources, the employed representation has the advantage of detecting sources that generate inaccurate answers. Sources that generate inaccurate answers once are expected to generate more inaccurate answers in the future, especially in situations where inaccurate answers are generated due to malfunctioning processes. To this effect, using cumulative weight reduces the required time to detect abnormality in sources’ answers.

Moreover, where inaccurate answers are expected to be generated in a stochastic manner, a memoryless version of the edge weight calculation can be employed. In such a situation, sources that provided inaccurate answers are not likely to generate inaccurate answers in the future. In the memoryless version, the weight of an edge that connects participants $v_i, v_j \in V$ can be calculated as follows:
\begin{equation*} w(e(v_{i},v_{j})) = dif(v_{i},v_{j}) \tag{2}\end{equation*}
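To make the update rule concrete, the following Python sketch maintains the complete graph as a symmetric weight map and applies either the cumulative update of Eq. (1) or the memoryless update of Eq. (2). It is an illustrative sketch, not the authors' Solidity implementation; the function and variable names are ours.

```python
from itertools import combinations

def update_weights(weights, answers, cumulative=True):
    """Update the edge weights of the complete source graph.

    weights    -- dict mapping frozenset({i, j}) of source ids to the current
                  edge weight w(e(v_i, v_j)); missing edges default to zero
    answers    -- dict mapping each participating source id to its reported answer
    cumulative -- True applies Eq. (1); False applies the memoryless Eq. (2)
    """
    for i, j in combinations(sorted(answers), 2):
        diff = abs(answers[i] - answers[j])              # dif(v_i, v_j)
        edge = frozenset((i, j))
        if cumulative:
            weights[edge] = weights.get(edge, 0) + diff  # Eq. (1)
        else:
            weights[edge] = diff                         # Eq. (2)
    return weights

# Example: three sources answering one query round
w = update_weights({}, {"A": 10.0, "B": 10.2, "C": 12.0})
```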

C. Query Processing Stage

To submit a query request, the user must specify the number of sources he/she is planning to use (SP). When the specified number is less than the total number of sources, the source selection step starts by identifying the source with the smallest distance (weight) from the rest of the sources. Consequently, using the shortest path strategy, the selection is expanded iteratively by selecting the nearest source to the currently selected sources. In situations where the number of specified sources is equal to the total number of sources, the entire set of sources will participate in answering the query. Once the answers from the involved sources are received, the query processing stage will be triggered to analyze the answers and remove abnormal sources’ answers from consideration.
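A possible greedy realization of this selection step is sketched below in Python: it seeds the selection with the source that has the smallest total weight to all other sources and then repeatedly adds the source nearest to the already-selected set. The names and tie-breaking choices are illustrative assumptions, not taken from the paper's contracts.

```python
def select_sources(weights, sources, sp):
    """Greedily pick `sp` sources with small mutual distance.

    weights -- dict mapping frozenset({i, j}) to the cumulative edge weight
    sources -- list of all registered source ids
    sp      -- number of sources requested by the end-user (SP)
    """
    if sp >= len(sources):
        return list(sources)

    def dist(i, j):
        return weights.get(frozenset((i, j)), 0)

    # Seed: the source with the smallest total distance to the rest.
    selected = [min(sources,
                    key=lambda s: sum(dist(s, o) for o in sources if o != s))]
    while len(selected) < sp:
        remaining = [s for s in sources if s not in selected]
        # Expand with the source nearest to the currently selected set.
        nxt = min(remaining, key=lambda s: min(dist(s, t) for t in selected))
        selected.append(nxt)
    return selected
```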

Algorithm 1 shows the steps of the query processing stage. The algorithm starts by calculating the average edge weight ($\mathrm{avg}_A$). Following this, the participating sources are divided into two groups, $p_1$ and $p_2$. This division aims to determine whether the participating sources are split in terms of behavior. The division is performed by identifying the edge with the largest weight $w(e(v_i,v_j))$, $\forall v_i, v_j \in V$ (line 2); the selection of this edge aims to determine whether different patterns in the data sources' answers can be recognized. Then, $v_i$ is added to the first group $p_1$, and $v_j$ is added to the second group $p_2$. Each remaining source is added to the nearest group based on the distance (weight) between the source and the first source inserted into each group ($v_i$ and $v_j$) (line 4). Additionally, the average weight between the groups ($\mathrm{avg}_P$) is calculated by dividing the total weight of the edges that connect the two groups by the number of these edges. In situations where $\mathrm{avg}_P \geq \mathrm{avg}_A$, the division is confirmed, since there is a clear, distinct difference between the grouped sources in terms of answers (line 7). Otherwise, where $\mathrm{avg}_P < \mathrm{avg}_A$, the division is not confirmed, since there is no clear, distinct difference between the sources' answers (line 11).

Algorithm 1. The Query Processing Stage

In situations where the division is confirmed, an Abnormality Score (AS) will be calculated as follows:
\begin{equation*} \text{AS} = \frac{\text{Max}_{p_{1},p_{2}}}{\left| |p_{1}| - |p_{2}| \right|} \tag{3}\end{equation*}

where $\text{Max}_{p_1,p_2}$ refers to the weight of the longest edge that connects a source from $p_1$ to a source from $p_2$. Dividing this weight by the difference between the groups in terms of size (number of sources) aims to highlight the criticality of the captured abnormality (difference). In situations where the sizes of the groups are significantly different, the reported abnormality can be due to a malfunctioning data source. However, in situations where the sizes of the groups are relatively similar, the captured abnormality should be further analyzed, since a significant number of sources may be trying to alter the final outcome. Thus, the abnormality score can be an indicator that alerts the user to the possibility of malfunctioning sources and/or malicious behavior.
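The following Python sketch mirrors the logic of Algorithm 1 and Eq. (3) as described above: split on the heaviest edge, assign the remaining sources to the nearest seed, confirm the split by comparing the cross-group average against the overall average, and compute the abnormality score. It is illustrative only; in particular, the handling of equally sized groups (a zero denominator in Eq. (3)) is our own assumption, as the paper does not specify it.

```python
def process_query(weights, sources):
    """Illustrative sketch of the query processing stage (cf. Algorithm 1, Eq. (3)).

    weights -- dict mapping frozenset({i, j}) to the cumulative edge weight
    sources -- list of the sources that answered the current query
    Returns (split_confirmed, p1, p2, abnormality_score).
    """
    def dist(i, j):
        return weights.get(frozenset((i, j)), 0)

    edges = [(dist(i, j), i, j)
             for idx, i in enumerate(sources)
             for j in sources[idx + 1:]]
    avg_a = sum(w for w, _, _ in edges) / len(edges)   # average edge weight avg_A

    # Seed the two groups with the endpoints of the heaviest edge, then assign
    # every other source to the nearest seed (lines 2 and 4 of Algorithm 1).
    _, vi, vj = max(edges)
    p1, p2 = [vi], [vj]
    for s in sources:
        if s in (vi, vj):
            continue
        (p1 if dist(s, vi) <= dist(s, vj) else p2).append(s)

    # Average weight of the edges crossing the two groups (avg_P).
    cross = [dist(a, b) for a in p1 for b in p2]
    avg_p = sum(cross) / len(cross)

    if avg_p < avg_a:
        return False, p1, p2, None                     # no distinct split (line 11)

    # Abnormality Score, Eq. (3); the equal-size edge case is our own choice.
    size_gap = abs(len(p1) - len(p2))
    abnormality = max(cross) / size_gap if size_gap else float("inf")
    return True, p1, p2, abnormality
```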

The participants in the largest group are highlighted as trustworthy sources. Accordingly, they receive the announced reward in full for answering the query (line 15). Participants who belong to the smallest group lose their deposited stakes unless their query response matches the final outcome; in that case, the participant does not receive the full reward, but keeps the deposited fees and receives a partial reward specified by the end-user. Partial payment is not expected to occur often, since sources labelled as manipulative are expected to be blocked and replaced. Additionally, the end-user should have the ability to reset the distance of any source if the factors behind submitting inaccurate answers no longer exist, for instance, when a source submitted inaccurate answers due to a hardware failure that has since been fixed. In such a situation, the distances between this source and the other sources are reinitialized.
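The reward rule just described can be summarized in the following illustrative sketch. Whether trustworthy sources also get their stake back, and how a "matching" real-valued answer is defined, are our assumptions, since the paper leaves these details application-dependent.

```python
def settle(p_trustworthy, p_other, answers, final_outcome,
           reward, partial_reward, stakes):
    """Illustrative payout rule for a processed query.

    p_trustworthy -- the larger (trustworthy) group of source ids
    p_other       -- the smaller group of source ids
    answers       -- dict of source id -> submitted answer
    final_outcome -- the aggregated query answer
    stakes        -- dict of source id -> deposited participation fee
    """
    payouts = {}
    for s in p_trustworthy:
        # Full reward; the stake is assumed to be returned as well.
        payouts[s] = stakes[s] + reward
    for s in p_other:
        if answers[s] == final_outcome:   # match criterion is application-dependent
            payouts[s] = stakes[s] + partial_reward
        else:
            payouts[s] = 0                # the deposited stake is lost
    return payouts
```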

D. Example and Discussion

Figure 3 shows different stages of the proposed method. This example assumes that five data sources have successfully reported their answers to five queries. The sources' answers are shown in Table 2, and the cumulative difference between the sources (weight) is shown in Table 3. From the tables, it is evident that data sources C and D have reported the same answers for all five queries processed, and therefore the distance between these two sources is equal to zero. Additionally, we can see that the edges with the largest weight (= 6) exist between sources D and A and between sources C and A. Accordingly, an abnormality between sources A, D, and C in the reported answers is recognized. This abnormality highlights the possibility that either source A or the group of sources D and C might be reporting inaccurate values. Such an abnormality can be confirmed based on the rest of the sources' behavior (reported answers). In situations where most sources are closer to source A in terms of cumulative weight, the value reported by source A can be recognized as the accurate answer; otherwise, the answer reported by sources C and D can be recognized as the accurate answer. In this example, the edge with the largest weight is the one that connects data source A to either source D or source C ($w = 6$). Data source A will be added to $p_1$, whereas source D will be added to $p_2$. Accordingly, based on the edge weights, $p_1$ will be expanded to include source B, and $p_2$ will be expanded to include sources C and E (Figure 3c).

TABLE 2 The Sources' Answers for the Five Rounds (Queries)

TABLE 3 The Cumulative Difference Between the Sources (Weight)
FIGURE 3. Examples to illustrate the query processing stage.

Once the two groups are identified, the query processing stage analyzes the detected abnormality and decides whether some sources' reported answers should be ignored (manipulated answers). This analysis starts by calculating the average edge weight, obtained by dividing the total weight of the edges by the number of edges:
\begin{equation*} \text{avg}_{A} = \frac{35}{10} = 3.5\end{equation*}
where 35 is the total edge weight and 10 is the number of edges (Table 3). Then, the average distance between the established groups is calculated as follows:
\begin{equation*} \text{avg}_{P} = \frac{26}{6} = 4.333\end{equation*}
where 26 is the total weight of the six edges that connect the two groups. In this example, $\mathrm{avg}_P$ is greater than $\mathrm{avg}_A$, and therefore a split in the data sources' reported answers is confirmed. Accordingly, since $p_2$ is the largest group, it is selected as the trustworthy group, and the answers of its data sources are used to answer the query (calculateOutcome()). In this example, the abnormality score is calculated as follows:
\begin{equation*} \text{AS} = \frac{6}{|3-2|} = 6\end{equation*}
where 6 is the largest edge weight and $|3-2|$ is the difference between the two partitions in terms of size. The abnormality score can be interpreted as an alarming indicator: a high score underlines the importance of taking a closer look at the sources' behavior in order to determine whether some of the participating sources should be removed. It is evident that the abnormality score is mainly influenced by the largest edge weight and the size difference between the two partitions. In situations where the size difference between the groups is small (close to one), the continued participation of the smaller group's members should be re-examined. This should also be the case when the largest edge's weight is significantly higher than that of the rest of the edges.
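As a usage example for the process_query sketch given earlier, the weight matrix below is a hypothetical one chosen to be consistent with the aggregate values stated in this example (total weight 35 over 10 edges, largest edge 6 on A-C and A-D, C-D equal to 0, cross-group total 26); the individual entries of Table 3 are not reproduced here, so these values are illustrative only.

```python
# Hypothetical cumulative weights consistent with the aggregates stated above;
# they are NOT the actual Table 3 values.
weights = {frozenset(pair): w for pair, w in {
    ("A", "B"): 3, ("A", "C"): 6, ("A", "D"): 6, ("A", "E"): 5,
    ("B", "C"): 4, ("B", "D"): 4, ("B", "E"): 1,
    ("C", "D"): 0, ("C", "E"): 3, ("D", "E"): 3,
}.items()}

confirmed, p1, p2, score = process_query(weights, ["A", "B", "C", "D", "E"])
# avg_A = 35/10 = 3.5, avg_P = 26/6 ~ 4.33 >= avg_A, so the split is confirmed;
# p1 = [A, B], p2 = [D, C, E], abnormality score = 6 / |2 - 3| = 6.
```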

To underline the importance of representing the weights of the edges as the cumulative difference between the sources, let us revisit the example shown in Figure 3 and assume that the weights of the complete graph are calculated based only on the current, fifth round (the memoryless version). Figure 4 shows the complete graph representation for this scenario. As can be seen from the figure, the difference between sources E and A (= 2) is exactly the same as the difference between sources E and D (or C). Therefore, in this situation, source E may end up in the same group as sources A and B, which would change the outcome. By representing the weights of the edges as the cumulative difference, however, it can be observed that source E behaves closer to sources D and C than to source A. Additionally, representing the weights of the edges as the cumulative difference helps in settings where a small difference between the reported values is expected; in such a situation, the answers of the trustworthy group's sources are expected to be aggregated in order to determine the final query solution.

FIGURE 4. Example without cumulative data representation.

Using the proposed method, the data submitted by the participants (data sources) are closely monitored to highlight any abnormality. The monitoring of the participants' behaviors emphasizes the advantages of using the proposed method in decentralized applications, where data sources are by nature heterogeneous. For instance, the proposed method can be used in the Metaverse application domain [34], where several wearable and IoT devices are expected to provide the application with the required data.

SECTION IV.

Experiments

The proposed method is implemented in Solidity (0.8.17) using the Hardhat development environment, wherein the optimizer feature is enabled to reduce the gas consumption. Each participant is assigned an Ethereum account in order to interact with the proposed method, where the Ethereum account is associated with a balance (Ether). Throughout the experiments, once a participant (address) invokes the method's contract, it is charged based on the amount of computation performed through this call; the cost of any performed computation (gas) is eventually converted to Ether. Table 4 summarizes the main parameters used in the presented experiments.

TABLE 4 Experiment Parameters
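Since all gas eventually translates into Ether, the conversion is simple arithmetic; the gas figure and gas price in the example below are hypothetical and are not the values measured in the experiments.

```python
def gas_to_ether(gas_used, gas_price_gwei):
    """Convert a transaction's gas usage into its Ether cost.

    cost_wei = gas_used * gas_price_wei, with 1 gwei = 1e9 wei and
    1 Ether = 1e18 wei.
    """
    return gas_used * gas_price_gwei * 1e9 / 1e18

# Hypothetical figures: 650,000 gas at a 20 gwei gas price costs 0.013 Ether.
print(gas_to_ether(650_000, 20))
```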

The proposed method's functionalities are implemented as two contracts: the Oracle contract and the identity (ID) contract. The ID contract handles the registration and de-registration of the data sources, while the Oracle contract is used to process all query-related tasks. Table 5 summarizes the functions implemented in both contracts. The functions listed in Table 5 are executed by either the data sources or the end-user. Data sources interact with the system via the submitAnswer() function, whereas the end-user executes and maintains the rest of the functions. The execution of the graph maintenance and query processing stages is performed by the end-user via the processQuery() function. Additionally, the end-user is responsible for the execution of the data sources' registration-related functions.

TABLE 5 Smart Contract Functions Summary

The interaction between the proposed method’s components and the participants (data sources and end-user) is illustrated in Figure 5. Once a query is received from the end user, an event will be created to inform the data sources about the presence of the new query. A data source is expected to answer the query within a predetermined timeline. Before storing the submitted answer by a data source, the identity of the data source will be verified. Once enough answers are received, the submitted answers will be processed in order to determine the query’s final answer and to pay the trustworthy data sources.

FIGURE 5. Sequence diagram showing the interactions between the participants and the smart contracts.

To evaluate the performance of the presented method, several sets of experiments are conducted to analyze the cost of running the proposed method on-chain and to investigate its performance. Accordingly, the results are divided into two sections: (1) the cost analysis; and (2) the performance analysis sections.

A. Cost Analysis

In this section, the charges incurred by each participant during the interaction with the smart contracts are explored. To highlight the impact of the number of data sources on the cost of executing the method, the experiment was run for different numbers of sources (5 up to 20). Table 6 shows the average cost paid by each participant to perform the highlighted transaction type in terms of gas usage and the corresponding cost in Ether. In the proposed method, the cost of running the graph maintenance and query processing stages (processQuery()) depends on the answers submitted by the involved sources. The lowest cost (number of required computations) occurs when all participants submit the same query answer; in such a situation, the weights of the edges are not updated. In contrast, the highest cost occurs when each participant submits a different, unique answer, so that every edge in the graph is updated. Accordingly, for each number of used sources (5 up to 20), we performed ten experiments. In each experiment, the probability that a source obtains an inaccurate answer was increased by 10% compared to the previous experiment. Thus, the results are averages over ten experiments.

TABLE 6 Cost of Transactions

From the table, we can see that the costs associated with the deployment of the smart contracts are the highest. This is an initialization cost, since the redeployment of the contracts will not occur unless the code of the contracts is changed. The costs of the registerDataSource() and submitAnswer() functions are relatively small and can therefore be neglected. The cost associated with the submitQuery() function is due to the storage requirement of this function (number of stored variables). The cost incurred by the end-user to maintain the graph and determine the query answer (processQuery()) is noticeable. Such cost highlights the importance of performing a cost feasibility study before adopting an on-chain implementation. On-chain implementation is expected to be more secure and transparent; it is desirable in situations where queries are published less frequently and the number of sources is small. When the number of participating sources is expected to be high, the proposed method can be implemented off-chain to reduce the operational cost. Accordingly, the graph maintenance stage and/or the query processing stage can be migrated outside the Ethereum blockchain.

The choice of which part to move outside the blockchain is application-dependent. The graph maintenance stage stores detailed information about the sources. Thus, it can be fully moved outside the blockchain when all participants agree on the new hosting environment. For instance, participants could agree to host this stage in a cloud environment, since all of the participants can easily obtain the characteristics of such an environment. However, in applications where the participants do not fully trust each other, and where the nature of the application can be classified as sensitive, keeping the graph maintenance stage on-chain is reasonable. With regard to the query processing stage, significantly fewer constraints apply to an off-chain implementation. In situations where the graph maintenance stage is implemented off-chain, the query processing stage is also expected to be implemented off-chain, since the maintenance stage is more restricted than the processing stage. Furthermore, adopting an off-chain implementation does not change the overall performance of the proposed method. To this effect, the off-chain components are called from the Ethereum blockchain via an API, and the query processing stage results are pushed back to the blockchain.

B. Performance Evaluation

This section starts by describing the implementation details of the proposed method. Subsequently, the results of the experiments are discussed in detail.

To evaluate the performance of the presented method, an extensive set of experiments are conducted. In particular, we are interested in evaluating the impact of the following factors on the proposed method’s performance:

  • Number of data sources

  • Percentage of malfunctioning sources

  • Source distributions: the distributions according to which the inaccurate sources' values are generated.

As the proposed method is designed to detect the sources that exhibit malicious behavior, our experiments with varying values of these parameters sufficiently evaluate the performance of our method. The presented experiments assume that data sources report temperatures between 30 and 40 degrees Celsius over 100 days. If a data source reported a value within $10^{-2}$ of the expected value, the reported value was considered accurate. The nature of the reported data does not influence the performance of the proposed method, since the method mainly uses the absolute difference between the reported values as the input to its mechanism. Each data source reports its captured temperature at the end of each day. The percentage of malfunctioning data sources that provide inaccurate temperature readings is varied based on each experiment's setting. An inaccurately reported value is expected to be at most 10% away from the original true value. Additionally, unless mentioned otherwise, the number of data sources used in each experiment is 50. The presented method classifies the sources' reported values into trustworthy (true) and manipulated (false); therefore, in this section, we are interested in measuring the performance of the presented method in terms of accuracy.
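The following Python sketch reconstructs this simulation setup under our own assumptions about the unspecified details (a uniformly drawn ground-truth temperature and uniformly drawn inaccuracy offsets); it is not the authors' experiment code.

```python
import random

def simulate_day(n_sources=50, malfunction_ratio=0.2, p_inaccurate=0.5):
    """Generate one day's temperature reports under the stated setup.

    A ground-truth temperature is drawn between 30 and 40 degrees Celsius;
    honest sources report it exactly, while each malfunctioning source reports
    a value up to 10% away with probability p_inaccurate.
    """
    true_value = random.uniform(30, 40)
    n_bad = int(n_sources * malfunction_ratio)
    reports = {}
    for s in range(n_sources):
        if s < n_bad and random.random() < p_inaccurate:
            # Inaccurate report: offset of at most 10% of the true value.
            reports[s] = true_value * (1 + random.uniform(-0.10, 0.10))
        else:
            reports[s] = true_value
    return true_value, reports

def is_accurate(reported, true_value, tol=1e-2):
    """A value within 10^-2 of the expected value counts as accurate."""
    return abs(reported - true_value) <= tol
```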

For benchmarking purposes, the problems discussed by Adler et al. [14] and Nelaturu et al. [13] are the closest to the problem addressed in this work. However, the solutions proposed by these authors are not directly comparable to the method presented here. The solutions proposed in [14] and [13] are designed for a binary market, where the expected answer to a query is either true or false. Additionally, the probability calculation in [13] does not fully address the situation where data sources obtain and submit inaccurate answers due to malfunctioning. Thus, to benchmark the proposed profiling method, we have used the memoryless version (ML-V) of the profiling method. To ensure the fairness of the comparison, once a malfunctioning source is detected by the profiling and/or memoryless methods, the weights of its edges are reinitialized.

1) Number of Sources

To clarify the impact of the number of sources on the overall performance, we ran the experiments while changing the number of sources. Figure 6 shows the results of this evaluation. In the presented experiments, 20% of the sources are selected at random to act as malfunctioning sources. On a daily basis, each one of these sources is expected to generate an inaccurate value with a probability of 50%. The figure shows that increasing the number of sources increases the gap between the profiling and memoryless methods, since it increases the expected number of inaccurate values generated by the malfunctioning sources. Additionally, the results show that the profiling method outperforms the memoryless method. The cumulative weight strategy employed by the proposed method has the advantage of tracking the sources' answers and therefore detecting inaccurate answers. This behavior becomes more noticeable as the number of sources increases, due to the increase in the number of inaccurate values.

FIGURE 6. The impact of the number of sources on the reported performance metrics.

2) Percentages of Malfunctioning Sources

The experiments were then run for different percentages of malfunctioning sources to evaluate the relationship between the expected percentage of malfunctioning sources and the reported accuracy score. Figure 7 shows the results of this experiment, where it can be observed that increasing this percentage degrades the performance of both the profiling and memoryless methods. Additionally, the results show that at a low percentage (10%), the memoryless method achieves almost the same accuracy as the profiling method. Increasing this percentage is expected to increase the number of sources with inaccurate submissions. Such an increase raises the probability of detecting abnormality in the reported values, since more sources are added to the manipulative group. Accordingly, the profiling method has the advantage, since malfunctioning sources are detected if they generate inaccurate values on more than one day.

FIGURE 7. The impact of the percentage of malfunctioning sources on the reported performance metrics.

3) Sources Distributions

To capture the impact of the distributions according to which the inaccurate sources' values are generated, experiments were run while assuming that the inaccurate values generated by the malfunctioning sources follow the Beta distribution. Under this distribution, on each day, each malfunctioning source is expected to generate an inaccurate value with a higher probability than on the previous day. In the presented experiments, it was assumed that 50% of the sources act as malfunctioning sources, where the total number of sources is equal to 50. Accordingly, on the last day, each malfunctioning source generates an inaccurate value with a probability of 50%, whereas the lowest probability occurs on the first day (0.5%); in general, the probability increases by 0.5% on a daily basis. Table 7 shows the results of this evaluation. The adopted graph-based strategy helps the proposed method by accurately capturing the cumulative behavior of the sources. Such behavior highlights the benefit of ensuring that the first few iterations (days) occur with a small percentage of inaccurate values, so as to stabilize the method's performance. Additionally, the results show that the abnormality score achieved by the profiling method is higher compared to the memoryless version. Besides the weight of the longest edge in the graph, the value of the abnormality score is impacted by the difference between the trustworthy and manipulative groups in terms of size (number of sources) (Eq. 3). Thus, obtaining a higher score indicates that the profiling method assigns fewer sources to the manipulative group. This emphasizes the accuracy of the proposed method, since it is able to correctly identify a higher percentage of manipulative and trustworthy sources.

TABLE 7 Accuracy and Abnormality Score Under the Beta Distribution
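One reading of the stated schedule is the linear ramp sketched below (0.5% on day 1, increasing by 0.5% per day up to 50% on day 100); the exact Beta-distribution parameters are not given in the text, so this reconstruction is an assumption.

```python
# Daily probability that a malfunctioning source produces an inaccurate value,
# as described in the text: 0.5% on day 1, rising by 0.5% per day to 50% on
# day 100. The exact Beta-distribution parameters are not given, so this
# linear ramp is an assumption.
def inaccuracy_probability(day, days=100, max_p=0.5):
    return max_p * day / days

assert abs(inaccuracy_probability(1) - 0.005) < 1e-9
assert inaccuracy_probability(100) == 0.5
```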

C. Discussion

The proposed method’s performance is mainly impacted by the pattern in which the malfunctioning sources generate inaccurate answers and the expected percentage of inaccurate answers in each round (day). The cumulative weight strategy becomes more valuable in situations where sources that generate inaccurate answers repeat the same behavior in subsequent rounds. Thus, increasing the randomness in terms of the sources that generate inaccurate answers is expected to limit the advantages of the employed cumulative weight strategy. For instance, if a source generates at most one inaccurate answer, the cumulative weight strategy becomes meaningless.

Regarding the expected percentage of inaccurate answers, increasing this percentage to more than 50% challenges the efficiency of the proposed method. However, in such a situation, the abnormality score becomes a crucial factor in detecting the abnormality in terms of reported answers. Increasing this percentage reduces the abnormality score since it reduces the gap between the manipulative and trustworthy groups in terms of size. Accordingly, a low abnormality score reduces the trust in the reported values. Defining what can be classified as a low abnormality score is application-dependent since it requires statistical information about the expected abnormality in the reported answers.

SECTION V.

Conclusion

In this paper, we have presented a graph-based profiling method to monitor and determine the trustworthiness of blockchain oracles (data sources). The graph representation of data sources, with their trustworthiness represented as weights on edges, enables the identification of sources that repeatedly provide inaccurate data. Our experiment results strongly support our claims regarding the costs of running the method and its accuracy in identifying the trustworthiness of sources. In particular, the results showed that under the Beta distribution, the method achieved its highest accuracy of 93%.

Our study has demonstrated that the use of a complete graph that partitions multiple data sources into two subgraphs, trustworthy and manipulative, encourages them to provide true and accurate data. The efficacy of our method is significantly leveraged by the historical information of data sources on the validity and accuracy of data. In the future, we plan to conduct a case study with real-world data, such as livestock data. We also plan to employ machine learning techniques, such as reinforcement learning, which can be applied to improve both the accuracy of profiling and the incentivization/penalty scheme. We further plan to investigate the impact of the adopted application on the performance of the proposed method; applications are expected to have different source distributions, and as discussed in the results section, such distributions impact the performance of the proposed method.

ACKNOWLEDGMENT

Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Higher Colleges of Technology.
