To incentivize or not: Impact of blockchain-based cryptoeconomic tokens on human information sharing behavior

Cryptoeconomic incentives in the form of blockchain-based tokens are seen as an enabler of the sharing economy that could shift society towards greater sustainability. Nevertheless, knowledge of the impact of these tokens on human sharing behavior is still limited and this poses a challenge to the design of effective cryptoeconomic incentives. This study applies the theory of self-determination to investigate the impact of such tokens on human behavior in an information-sharing scenario. By utilizing an experimental methodology in the form of a randomized control trial with a 2x2 factorial design involving 132 participants, the effects of two token incentives on human information-sharing behavior are analyzed. Individuals obtain these tokens in exchange for their shared information. Based on the collected tokens, individuals receive a monetary payment and build reputation. Besides investigating the effect of these incentives on the quantity of shared information, the study includes quality characteristics of the information, such as accuracy and contextualization. The focus on quantity while excluding quality has been identified as a limitation in previous work. In addition to confirming previously known effects such as a crowding-out of intrinsic motivation by incentives, which also exists for blockchain-based tokens, the findings of this paper point to a hitherto unreported interaction effect between multiple tokens when applied simultaneously. The findings are critically discussed and put into the context of recent work and ethical considerations. The theory-based-empirical study is of interest to those investigating the effect of cryptoeconomic tokens or digital currencies on human behavior and supports the community in the design of effective personalized incentives for sharing economies.


I. INTRODUCTION
Cryptoeconomic incentives in the form of blockchain-based tokens are seen as an enabler of the sharing economy [1,2] that could shift society toward greater sustainability [3,4]. One of the resources that is shared in these economies is information [5,6,7], which due to its growing utilization in data-intensive technologies [8] is becoming increasingly important [9,10]. This has resulted in the collection of large data sets by organizations [11]. Nevertheless, in this age of vast data quantities, obtaining high-quality information is a challenge [11,12] (e.g. the accuracy of the collected data is low). Moreover, because organizations collect massive amounts of unstructured data, such as customer behavior (product choices and sleeping patterns), opinions (e.g. Facebook likes), medical health records, or IoT data, the amount of data collected exceeds the processing power available to analyze it [13], possibly resulting in sampling biases. Furthermore, these "Big Data" approaches often involve the danger of collapsing the complexity of entire human personalities into assumptions constructed from simple data (e.g. website clicks) and usually miss the unique domain-specific knowledge users have [14]. Thus, it has been suggested that information providers should structure their input in a contextualized way when sharing their data, utilizing semantic web technologies [15], such as linked data and ontologies [16,17], and evaluate the quality of information shared by other providers [18]. Nevertheless, as this would require additional effort on the part of the information providers [18], incentives such as gamification [19], reputation [20], money [21], or auctions [22] are suggested to motivate the data providers and thus improve the characteristics of the collected information. However, previous work on the incentivization of information sharing focuses on the quantity of collected information while excluding quality characteristics such as accuracy or contextualization [23].
Increasingly, cryptoeconomic incentives in the form of blockchain-based tokens are proposed to be awarded to participants of information-sharing communities [24,25,26,27,28]. In these studies, the performance of the applied incentives is investigated using simulations [29], game-theoretical methodologies [30], and case studies [31]. Nonetheless, behavioral data of users that compare treatment groups with and without incentivization have not been collected, which limits these approaches as the utilized models cannot be calibrated with real-world data [32]. Controlled experiments are therefore required that investigate the impact of these cryptoeconomic incentives on human information-sharing behavior. In particular, such an empirical approach could assess and validate the accuracy of the utilized theoretical models [1]. Likewise, although the application of multiple token incentives has been proposed to improve the maintenance and sharing of a common resource [31,33] and has been investigated in games [30] and simulations [34], the impact of simultaneously applying these incentives has not been investigated empirically in a controlled experiment.
By testing hypotheses that are informed by self-determination theory [35,36] with an experimental methodology in the form of a randomized control trial utilizing a 2x2 factorial design, the impact of two types of cryptoeconomic incentives in the form of blockchain-based tokens on the information-sharing behavior of humans is investigated.
The contributions of this paper can be summarized as follows:
• A conceptual impact model (Figure 1) links cryptoeconomic incentives to human motivation and information-sharing behavior in consideration of self-determination theory [35,36].
• The living lab experimental methodology [37] is augmented with a 2x2 factorial design to investigate the impacts of blockchain-based cryptoeconomic incentives on human information-sharing behavior.
• Four effects of cryptoeconomic tokens on human behavior are identified: i) a hitherto unreported interaction effect between two types of cryptoeconomic tokens when applied simultaneously; ii) an internalization effect of cryptoeconomic tokens in the form of improved information-sharing behavior even after the incentivization period has ended; iii) a crowding-out effect on intrinsic motivation when cryptoeconomic tokens are applied; iv) a time effect resulting in a variation of the impact of cryptoeconomic incentivization over time.
• A novel high-quality dataset illustrates user information-sharing behavior under multiple token incentives and facilitates causal inferences about human behavior under cryptoeconomic incentivization.
• The work demonstrates how self-determination theory can be applied in the formulation and testing of hypotheses in Token Engineering and Token Economics.
• The implications of the findings for the design and engineering of one-dimensional and multi-dimensional token systems are discussed critically, taking into account ethical impacts.
Since these contributions inform an improved construction of blockchain-based incentives, they are of relevance for the community, which is increasingly utilizing and investigating such incentives in various application domains [38,39,40].
This paper is structured as follows: In Section II, related work in information sharing is discussed. The research methodology is introduced in Section III, while Section IV presents the evaluation. Section V summarizes the findings and discusses their implications. Finally, Section VI draws the conclusion and provides an outlook for future work.

A. Self-determination theory and incentives
Humans are intrinsically and extrinsically motivated to share information [41]. Intrinsic motivation refers to people performing a task such as information sharing for the pleasure they derive from the task itself, whereas extrinsic motivation stems from incentives, such as monetary payments, reputation gains, or punishments. When compared to extrinsic motivation for a specific task, intrinsic motivation leads to enhanced performance, persistence, creativity, learning capacity, and endurance in humans [36] and may therefore be more important than extrinsic motivation for specific scenarios such as contributing computer code [42] or sharing information [43].
Introduced by Deci and Ryan [35,44], self-determination theory illustrates the conditions under which humans are intrinsically motivated to work on a task: Three innate psychological needs must be satisfied: "competence", "autonomy" and "relatedness". In particular, a feeling of competence does not enhance intrinsic motivation unless accompanied by a sense of autonomy [36,45]. In this context, applying misaligned incentives may infringe on the perceived autonomy of humans and thereby reduce their intrinsic motivation [46]; this is referred to as the crowding-out effect [41]. However, competence-enhancing incentives may support intrinsic motivation [35,46]; this is referred to as internalization [35].
Figure 1 illustrates findings from literature about the dependencies between different types of incentives, the two types of motivation, and their impact on characteristics of information. Extrinsic and intrinsic motivation are not separate systems, but influence each other [46]. Extrinsic motivation can be integrated to become intrinsic motivation (Arrow I in Figure 1) [35]. People are extrinsically motivated to share a higher quantity of information by money [37,47], the access to the collected information [47,48], reputation [47], and their intrinsic motivation [47] (Arrows II and VII in Figure 1). In particular, the strongest increase in quantity is observed for money, followed by the access to information, reputation, and then intrinsic motivation [47]. In contrast to quantity, it has been observed that the quality of shared information remains unaffected under monetary incentives when measuring it in terms of the prediction accuracy of stock recommendations [49], or is even worse than before incentivization when measured by a word count index [50], usability (helpfulness of reviews) [50], or the quality of produced images [51] (Arrow III in Figure 1).
It has been found that monetary incentives decrease intrinsic motivation (Arrow IV in Figure 1), while they increase extrinsic motivation (Arrow V in Figure 1) [45,52,53], which might explain the impact of monetary incentives on quality: As intrinsic motivation has been observed to predict quality (Arrow VI in Figure 1) [42,54] and only to a lesser extent quantity (Arrow VII in Figure 1) [47], whereas extrinsic motivation predicts the quantity of shared information (Arrow VIII in Figure 1) [54], the use of monetary incentives would result in an increased extrinsic motivation and decreased intrinsic motivation, thereby leading to a higher quantity but lower quality of shared information.
Fig. 1: Impact of incentives (money/ access/ reputation) on motivation (intrinsic/ extrinsic) and information-sharing behavior (quality/ quantity). Impacts (arrows) are derived from related work (in parentheses).
In contrast to monetary incentives, because they may enhance an individual's feeling of competence [35,46], reputation systems impact extrinsic motivation positively [55], while also having a positive impact on intrinsic motivation [56] (Arrows IX and X in Figure 1).
Moreover, it has been found that rewarding each information-sharing action is more effective than summarized payments [57], and that small subgroups have shown moderate to strong aversion to incentives [58], which has been confirmed by Pournaras et al. [37].

B. Tokens for information sharing
Cryptoeconomic incentives in the form of blockchain-based tokens span a multi-dimensional incentive system [33] enabling a differentiated pricing of a broader spectrum of externalities [33,63]. This can result in the improved self-organization of society when compared to one-dimensional incentive systems such as the current monetary system [64,65]. These tokens are defined as "a unit of value issued within a DLT system [or blockchain system] and which can be used as a medium of exchange or unit of account" [66] and are increasingly utilized in communities to encourage the sharing of information.
Table I illustrates related work that utilizes blockchain-based tokens in information-sharing scenarios. All of these works contribute a conceptual framework about blockchain and tokens and how they can be applied to improve the information sharing in a community (Column Fram. in Table I). Four of these frameworks are implemented in a software artifact (Column Impl. in Table I): Naz et al. [28] implement a software artifact that integrates IPFS with a blockchain to improve the quality of shared data by incentivizing stakeholders with tokens to review the shared information. Similarly, Hunhevicz et al. [31] use tokens to incentivize high-quality datasets in a construction process by awarding those that provide complete and accurate information. Zhang et al. [61] use tokens in their prototype to incentivize the provision of credit data. Finally, Jaiman et al. [62] [...] I). Hülsemann and Tumasjan [59] apply an agent-based modeling approach to investigate three different types of tokens. Likewise, ImaniMehr and DehghanTakhtFooladi [30] investigate with a simulation the application of multiple tokens to incentivize the optimal utilization of video stream layers. Moreover, they analytically investigate their scenario with methods from game theory (Column Anly. in Table I).
Similarly, Jung et al. [27] utilize both methods from game theory/ mechanism design as well as simulations to evaluate their framework that improves the provision and maintenance of patient health records. Hunhevicz et al. [31] evaluate their framework and implementation with stakeholders in a workshop (Column Work.; ID 7 in Table I). Three frameworks are not evaluated (ID 1, 2 and 6 in Table I).
Although all of these contributions utilize tokens in their frameworks and implementations, only two of them investigate the implications of the introduced tokens on system properties (Column Imp.; ID 3 and 5 in Table I). Moreover, only one of the contributions illustrates the token design (Column Des.; ID 10 in Table I): The token is a modified ERC-721 token whose source of value is ownership/access rights to data; its supply is uncapped and the token is transferable. Nevertheless, a standard illustration as utilized by Dobler et al. [67] and Ballandies et al. [68] that would make different designs comparable has not been applied. Moreover, the impact of a specific token design on user information-sharing behavior is not rigorously investigated with a controlled experiment (Column Exp. in Table I). Assumptions thus have to be utilized in the above-mentioned simulations and analyses that limit the applicability of the findings to real-world scenarios. Two of the works utilize multiple tokens in their application scenario (Column Mult.; ID 1 and 5 in Table I): Pazaitis et al. [1] enable the setup of multiple tokens to capture the value created in different decentralised communities, and ImaniMehr and DehghanTakhtFooladi [30] use multiple tokens for optimal sharing of bandwidth. Finally, despite touching on sensitive application domains such as health or credit data, only one of the works discusses the ethical implications of its contributions and findings (Column Ethics; ID 1 in Table I).
This paper (ID 12 in Table I) addresses these limitations by evaluating the impact of two cryptoeconomic incentives on human information-sharing behavior in a controlled experiment involving 132 participants over four days (Section IV). The utilized token designs are illustrated (Section III-A3) and the (ethical) implications (Section V) of this work are discussed.

III. RESEARCH METHODOLOGY
The impact of two token incentives on human information-sharing behavior is investigated with an experimental methodology. The conducted experiment is explained below (Section III-A), followed by the measured variables (Section III-B), tested hypotheses (Section III-C) and the analysis methods (Section III-D).

A. Experiment
The experiment has been conducted by modifying the mixed-mode "living lab" [37] experimental methodology such that the randomized control trial is augmented with a 2x2 factorial pre-test/ post-test design that utilizes two token incentives (Figure 8).
The experiment consists of three phases (Figure 8): The entry and exit phases are facilitated by the ETH Decision Science Laboratory (DeSciL) of ETH Zurich, using its infrastructure and staff. The core phase is facilitated by the research team and a blockchain-based Web 3.0 application (Section III-A4). In the entry phase, participants provide their consent to the study and are instructed on the application of the software used in the core phase. Before the core phase, the participants answer an entry survey consisting of demographic questions. The core phase of the experiment consists of four days in which participants utilize the software artifact to share information. On the second and third day of the core phase, participants obtain token rewards in exchange for their shared information. In the exit phase, participants answer an exit survey and receive their financial compensation. The conducted experiment was granted ethical approval by the Decision Science Laboratory (DeSciL) as well as the Ethics Commission of ETH Zurich.
In the following, Section III-A1 illustrates the real-world information-sharing scenario of the core phase, followed by Section III-A2, where the model of the collected data is introduced. Section III-A3 illustrates the applied incentives and the treatment groups. Then, Section III-A4 provides the technical specifications of the utilized software artifact. Finally, Section III-A5 provides an overview of the recruitment process and the compensation paid to the participants.

1) Scenario:
The scenario of the experiment has been illustrated by Ballandies et al. [15]. A summary is given in the following and depicted in Figure 2: Participants of the experiment share solicited information via their personal devices (e.g. laptop or mobile phone) with an organization, using a Web 3.0 application (Section III-A4). In order to facilitate a realistic setup of the experiment and to comply with the anti-deception policy of DeSciL, the shared information is received by a real-world library organization that has an interest in obtaining feedback from customers and unaware-customers of their services. Furthermore, in order to study user behavior in a realistic setting, participants can choose the time of feedback provision such that it is best integrated into their daily routines. As an incentive for sharing information with the library organization, participants receive units of two types of blockchain-based tokens (Section III-A3). The amount of token units collected by other users and a subset of their shared information can be discovered in interactions.
The model of the shared information is illustrated in Section III-A2.
2) Data Model: Figure 4 illustrates the ecosystem of the collected information as an ontology [69]. The library formulated 274 survey questions, which they wanted to ask customers and unaware-customers of their services. These questions are of one of the following types: single-choice, multiple-choice, Likert scale, open text, or a combination thereof. In the experiment, participants take the role of customers and share information with the library in the form of answers to the given questions. Participants have the option to enrich their answer to a question with three types of contextualizations (Figure 6). They can state from their perspective how important the question is for the library to improve their services (Likert scale), how satisfied they are with the answer options to the question (Likert scale), and provide a comment (open text field). As an incentive to share information, participants obtain units of two types of cryptoeconomic incentives: Money token and Context token, which are illustrated in greater detail in Section III-A3.
Fig. 4: Data model of the experiment that visualizes the stakeholders (library organization and its customers), collected information, survey questions, and token incentives.
3) Incentives and treatment groups: Two types of cryptoeconomic incentives are utilized in this paper: The money token is a stable coin [70] that resembles the Swiss fiat currency. It is a capped, pre-mined, transferable, and non-burnable ERC-20 token whose units are pegged to the Swiss franc at an exchange rate of one token unit to 0.20 CHF. Users obtain a unit of this token whenever they provide an answer to a survey question (Figure 4).
The context token is a utility token [66] and models reputation in the system: It is an ERC-20 token that is uncapped, transferable, burnable, and not pre-mined. A token unit is created whenever a contextualization (Figure 4) is performed in the system and is awarded to the user who provided that information. The amount of context token units collected is visible to others on a leaderboard during the experiment (Figure 5) and thus constructs users' reputation, which functions as a source of value to this token. In particular, reputation is a widely adopted incentive mechanism that has been utilized to improve the quality of shared data [21]. Additionally, the context token can be utilized to access a privileged service in the form of voting actions [15], which further provides value to the token [21].
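To make the two designs easier to compare, their stated properties can be written down as plain records. The following is a minimal sketch in Python; the class name, field names, and constants are illustrative assumptions and do not reproduce the study's smart contracts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDesign:
    """Design properties of a token as stated in the text (field names are hypothetical)."""
    name: str
    standard: str
    capped: bool
    pre_mined: bool
    transferable: bool
    burnable: bool
    source_of_value: str
    awarded_for: str  # user action that is rewarded with one token unit


# Money token: stable coin pegged to the Swiss franc.
MONEY_TOKEN = TokenDesign(
    name="money", standard="ERC-20", capped=True, pre_mined=True,
    transferable=True, burnable=False,
    source_of_value="peg to the Swiss franc (0.20 CHF per unit)",
    awarded_for="answer",
)

# Context token: utility token that models reputation.
CONTEXT_TOKEN = TokenDesign(
    name="context", standard="ERC-20", capped=False, pre_mined=False,
    transferable=True, burnable=True,
    source_of_value="reputation (leaderboard) and access to voting actions",
    awarded_for="contextualization",
)
```

Listing the properties side by side shows that the two tokens share the token standard and transferability but differ in supply, burnability, and source of value.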
Figure 3 illustrates the 2x2 factorial design of the study, whereby the two token types awarded to experiment participants are varied: Group N is the control group that receives no token incentives; Group C obtains the context token; Group M obtains the money token; Group B obtains both, the money and the context token.
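The mapping from the two factors (money token, context token) to the four groups can be sketched as follows. This is a minimal illustration that assumes a balanced, seeded random assignment; the text does not state how the randomization was implemented, and the function name and parameters are hypothetical.

```python
import random

# Group label for each (money token, context token) combination of the 2x2 design.
GROUPS = {
    (False, False): "N",  # control group, no token
    (False, True): "C",   # context token only
    (True, False): "M",   # money token only
    (True, True): "B",    # both tokens
}


def assign_groups(participant_ids, seed=0):
    """Randomly assign participants to the four cells in near-equal numbers."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    ids = list(participant_ids)
    rng.shuffle(ids)
    cells = list(GROUPS)  # the four factor combinations
    return {pid: GROUPS[cells[k % 4]] for k, pid in enumerate(ids)}
```

With 132 participants, this sketch yields 33 participants per group.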
4) Technical infrastructure: This research applies the customer feedback system developed by Ballandies et al. [15] to enable users to share information with a library organization and to receive two cryptoeconomic incentives (Section III-A3) in exchange. The software artifact is a Web 3.0 app that utilizes the Finance 4.0 infrastructure [64] and the Ethereum (ETH) blockchain. It enables the collection of solicited and unsolicited feedback from users of an organization. Figure 6 illustrates how users can provide solicited feedback by answering questions posed by an organization. This feedback can be contextualized by i) stating the importance of the question to improve the organization's service (bottom left in Figure 6), or ii) stating the satisfaction with the answer options to the question (bottom center in Figure 6), or iii) providing further feedback via a comment field (bottom right in Figure 6). Figure 7 depicts how users can contextualize an answer with their satisfaction regarding the answer options. Figure 5 shows how reputation is facilitated in the system by comparing users based on the amount of collected context token units. Moreover, this view gives users an overview of their collected money and context token units.
5) Recruitment, compensation, and ethical approval: The participants were recruited by the ETH Decision Science Laboratory (DeSciL). Following its protocols and ethical standards, participants were guaranteed fair compensation, and information regarding their identity was separated from the experiment data, thereby preserving their anonymity. 150 participants were recruited, 132 of whom completed the exit phase (88% completion rate), which is a reasonable number that balances resources (compensation/ infrastructure), rigor, and control of the experimental process [37]. In particular, the mixed-mode experimental process preserves the realism of the scenario by involving a real-world organization that obtains the shared information, while facilitating controlled experimental conditions that result in a novel high-quality dataset to allow (causal) inferences about human behavior under cryptoeconomic incentivization.
Participants were recruited from the full UAST pool (no criterion was applied), which mainly consists of students and researchers of ETH Zurich and the University of Zurich, and thus is subject to sampling biases when making inferences about the behavior of the general population. Nevertheless, as these are exactly the customers and unaware-customers of the real-world library organization around which the use case of information collection in this experiment was constructed, the participants' profiles match the experimental scenario well. Consequently, the findings may be transferable to similar scenarios, where customers share information with an organization. Four recruitment sessions were performed within the period from May 17, 2021 to June 11, 2021. The DeSciL requires a fair minimum and average compensation of experiment participants. This is satisfied by compensating each participant i of a treatment group (N, C, M, B in Figure 3) in Swiss francs via the payout formulas p, in which
MT(i): amount of collected money token units of participant i,
T = N × 40 CHF: total available payout,
Σ_i p(M_i) + Σ_i p(B_i): total received payout by groups M and B,
N: number of participants,
N(j): number of participants in treatment group j.
This results in a minimum compensation of 20 CHF and an average compensation of 40 CHF (0.5 CHF/min) for the participants. The payout for participants of treatment groups that received the money token (groups M and B in Figure 3) depends on the amount of money token units collected. This amount is multiplied by 0.20 CHF and then awarded to the participants. The total payout is capped at 60 CHF per participant, resulting in a maximum of 150 questions for which a user can be rewarded per day. The payout for participants of the other treatment groups (groups N and C in Figure 3) depends on the payout of the treatment groups that receive the money token (groups M and B in Figure 3), such that the average compensation over all experiment participants is 40 CHF.
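The payout rule described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the remaining budget is split equally among participants of groups N and C, and all function and variable names are illustrative rather than taken from the study's software.

```python
CHF_PER_TOKEN = 0.20  # exchange rate of one money token unit, as stated in the text
AVERAGE_CHF = 40.0    # target average compensation over all participants
CAP_CHF = 60.0        # payout cap per participant


def payouts(groups, money_tokens):
    """Compute payouts in CHF.

    groups: dict mapping participant -> 'N' | 'C' | 'M' | 'B'
    money_tokens: dict mapping participant -> collected money token units
    """
    paid = {}
    # Groups M and B are paid according to their collected money token units.
    for pid, group in groups.items():
        if group in ("M", "B"):
            paid[pid] = min(money_tokens.get(pid, 0) * CHF_PER_TOKEN, CAP_CHF)
    total_budget = AVERAGE_CHF * len(groups)  # T = N x 40 CHF
    remainder = total_budget - sum(paid.values())
    # Assumption: groups N and C share the remaining budget equally.
    others = [pid for pid, group in groups.items() if group in ("N", "C")]
    share = remainder / len(others) if others else 0.0
    for pid in others:
        paid[pid] = share
    return paid
```

With equally sized groups, the lowest possible payout for groups N and C occurs when every participant of groups M and B reaches the 60 CHF cap, which gives (4 × 40 − 2 × 60) / 2 = 20 CHF and matches the stated minimum compensation.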

B. Variables and measures
Figure 8 illustrates the measured variables of this paper in the three phases of the experiment. The participants answered demographic questions in the entry phase and another survey in the exit phase. Two extrinsic incentives, the money token and the context token (Section III-A3), are manipulated as independent variables on the second and third day of the core phase. Several dependent variables are measured each day: The quantity of shared information is measured by the number of replies to survey questions. Moreover, two quality characteristics are measured: i) Contextualization is the number of contextualization actions performed by participants in response to survey questions (via the bottom buttons in Figure 6 as shown in Section III-A4). This is the amount of "metadata" that a user provides with an answer that contributes to the usability of information and is considered a quality dimension of information [11]. Further, ii) accuracy is a quality element of information that contributes to the reliability of information [11]. Applying the methodology of estimating choice variability [71,72], accuracy is operationalized in this paper as follows: With equal probability, survey questions are displayed more than once to participants. The average accuracy with which a participant answers a specific question is then calculated by taking the Jaccard similarity [73] between the answers provided to that question. The final accuracy for a user is then obtained by taking the average similarity over all questions and days.
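The accuracy measure can be illustrated with a short Python sketch. It assumes that each answer is represented as a set of selected options, that repeated answers to the same question are compared pairwise, and that questions shown only once are skipped; these representation choices and all names are assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two answers given as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty answers count as identical
    return len(a & b) / len(a | b)


def question_accuracy(answers):
    """Mean pairwise Jaccard similarity over repeated answers to one question."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return None  # question displayed only once: no accuracy estimate
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def user_accuracy(answers_by_question):
    """Average the per-question accuracies of one participant."""
    scores = [question_accuracy(v) for v in answers_by_question.values()]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None
```

For example, a participant who answers one repeated question identically (similarity 1.0) and another with the option sets {a, b} and {a} (similarity 0.5) obtains an accuracy of 0.75.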

C. Hypotheses
The hypotheses of this paper test five assumptions regarding the utilized token incentives (Section III-A3) and treatment group assembly. They are formulated by connecting these assumptions to the introduced conceptual impact model (Figure 1). In the following, the five assumptions are first illustrated (Section III-C1) before the hypotheses are introduced (Section III-C2).
1) Assumptions: Assumption 1: The money token (stable coin) is perceived as a monetary incentive and thus has a similar impact on the human information-sharing behavior as money. The money token utilized in the experiment is a stable coin that has a fixed exchange rate with the Swiss franc (Section III-A3) and thus resembles fiat money. The impact of fiat money on human behavior has been studied in information-sharing scenarios of related work (Section II). Due to this resemblance, it is hypothesized that the money token has a similar impact on human motivation and information-sharing behavior as monetary incentives. In particular, the money token impacts the extrinsic motivation positively and the intrinsic motivation negatively such that the quantity of shared information is increased and the quality is decreased, as illustrated in Figure 1.
Assumption 2: The context token impacts intrinsic motivation positively. The context token is a utility token that has reputation as its source of value (Section III-A3) and it is thus hypothesized that it is perceived as a competence-enhancing incentive [46] that increases the intrinsic motivation of individuals.
Assumption 3: The context token impacts extrinsic motivation positively. Since the context token shares some characteristics with money (it is transferable and collectible), it is hypothesized that it has a positive impact on extrinsic motivation, albeit to a lesser extent when compared to the money token.
Assumption 4: No interaction exists between the money and the context token. It is assumed that no interaction between the money and context token exists when they are applied simultaneously.
Assumption 5: No bias exists in the assembly of the treatment groups. It is assumed that each treatment group consists of a similar participant structure.
These assumptions are the basis for the hypotheses that are formulated in the following.
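Assumption 4 can be made concrete with the difference-of-differences contrast commonly used for 2x2 factorial designs: without an interaction, adding the context token changes the outcome by the same amount whether or not the money token is present. The following minimal sketch is an illustration of that idea, not the analysis procedure of this paper; the function name and the example group means are hypothetical.

```python
def interaction_contrast(mean_n, mean_c, mean_m, mean_b):
    """Difference of differences in a 2x2 design: (B - M) - (C - N).

    A value of zero means the context token effect is the same with and
    without the money token, i.e. the two effects are purely additive.
    """
    return (mean_b - mean_m) - (mean_c - mean_n)


# Hypothetical group means of a dependent variable (not data from the study).
additive = interaction_contrast(mean_n=10, mean_c=12, mean_m=20, mean_b=22)
non_additive = interaction_contrast(mean_n=10, mean_c=12, mean_m=20, mean_b=18)
```

In the first example the contrast is zero (additive effects); in the second it is negative, meaning the combined tokens perform worse than the sum of their individual effects would suggest.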
2) Hypotheses formulation: In order to formulate the hypotheses, the assumptions (Section III-C1) are linked to the conceptual impact model (Figure 1): Under the assumption of no biases in the assembly of the treatment groups (Assumption 5), it is hypothesized that on Day 1 of the experiment no difference in behavior among the treatment groups, measured in the quantity of answers or contextualizations, is observed, because no token incentives are applied on that day: Hypothesis 1: Day 1:
Due to the impact on extrinsic motivation (Assumptions 1 and 3), it is hypothesized from the arguments above that the M group (money token incentive) shares a greater quantity of information during incentivization days when compared to the C group (context token incentive), which in turn shares more information than the N group (control group). Moreover, under the assumption of no interactions (Assumption 4) and because both tokens contribute to extrinsic motivation (Assumptions 1 and 3), it is hypothesized that the B group (both token incentives) shares more information than the M group. Thus for Days 2 and 3, when incentives are applied, the following hypothesis is posed:
Also, it is hypothesized that because of the competence-enhancing effect of the context token that would increase the intrinsic motivation of individuals (Assumption 2), the C group shares information with greater quality characteristics such as contextualization or accuracy when compared to the N group. Moreover, because of the negative impact of the money token on intrinsic motivation (Assumption 1), the M group's quality characteristics are hypothesized to be worse than those of the N group. Finally, because i) the context token offsets the negative impact of the money token on intrinsic motivation (Assumptions 1 and 2), and ii) there is no interaction effect between the tokens (Assumption 4), it is hypothesized that the B group shares information with equal quality when compared to the N group, but with lower quality than the C group. Thus for Days 2 and 3, when incentives are applied, the following hypotheses are stated:
Since no incentives are applied on the fourth day of the experiment, only the intrinsic motivation of individuals affects the characteristics of shared information on that day (Figure 1). Thus, because it is assumed that the money token decreased (Assumption 1) while the context token incentive increased (Assumption 2) the intrinsic motivation, it is hypothesized that for the quality characteristics the C group outperforms the N group, which in turn outperforms the M group. Moreover, the N group and B group share an equal number of contextualizations: Hypothesis 5: Day 4:
In contrast, because intrinsic motivation only plays a minor role for the quantity of shared information (Figure 1), it is hypothesized that the number of answers given on Day 4 does not differ significantly between the groups: Hypothesis 6: Day 4:
The accuracy is measured over all four days. Consequently, a hypothesis about the daily differences among the groups cannot be drawn. Accuracy is a quality characteristic (Section III-B). Quality has been found to be positively impacted by intrinsic motivation (Section II). Thus, it is hypothesized that the averaged accuracy score of the C group is higher than that of the N group, which in turn has a higher score than the M group. Moreover, because the context token offsets the negative impact of the money token on intrinsic motivation, the N and B groups have a similar accuracy: Hypothesis 7: All days: accuracy(C) > accuracy(N) = accuracy(B) > accuracy(M)

D. Analysis methods
Figure 8 illustrates the methods that are applied to evaluate the hypotheses. The demographic information from the entry survey in the entry phase is utilized to illustrate the profiles of participants. Moreover, this information is applied in chi-squared (χ²) tests [74] to validate that no treatment group biases are present. In particular, the test is employed to test the null hypothesis that no relationship exists between the demographic variables and the treatment groups. Survey responses from the exit survey are used to validate the experimental setup, such as the rewards obtained by the participants. Histograms, Q-Q plots, the Shapiro-Wilk test [75,76], and the D'Agostino & Pearson test [77,78] are used to investigate the distribution of the dependent variables. In order to analyze treatment group differences, the Kruskal-Wallis one-way analysis of variance by ranks for independent samples (H-test) is utilized [79] and, for a post-hoc pairwise comparison test of mean rank sums, the Conover-Iman [80] and the Dunn [81] methods are applied. Furthermore, CDF plots are utilized to investigate differences in group behavior. The treatment effect of the applied incentives and the interaction effect among the incentives are analyzed via interaction plots.
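As an illustration of the group comparison step, the following minimal sketch computes the tie-corrected Kruskal-Wallis H statistic and the chi-squared approximation of its p-value using only the Python standard library. It is not the analysis code of this study: the function names (`midranks`, `chi2_sf`, `kruskal_wallis`) and the sample values are assumptions introduced for illustration, and the sketch assumes that not all observations are identical.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    total = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def kruskal_wallis(*groups):
    """Return (H, p) of the tie-corrected Kruskal-Wallis H-test."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: t is the number of observations sharing one value.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical daily answer counts per participant (NOT data from the study).
group_n = [5, 8, 12, 7, 10]
group_c = [15, 22, 18, 30, 25]
group_m = [60, 150, 90, 150, 120]
group_b = [58, 150, 61, 140, 150]
h_stat, p_value = kruskal_wallis(group_n, group_c, group_m, group_b)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

With four treatment groups the statistic is compared against a chi-squared distribution with three degrees of freedom; a significant result would then be followed by a pairwise post-hoc procedure such as Conover-Iman or Dunn, which is not covered by this sketch.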

IV. RESULTS
A. Demographics/Profiles of the participants
150 candidates were invited to participate in the study, 132 of whom completed all three phases (entry, core, and exit phase in Figure 8). The average age of the participants was 23.2 years. 62 were male, 68 were female, and 2 did not specify their gender. 36 users had used blockchain/crypto apps before the experiment (50% for 1-6 months, 16.6% for 6-12 months, 16.6% for 1-2 years, and 16.6% for more than 2 years). 65 participants were bachelor students, 56 were master students, and 11 were "other". 54.5% of the participants stated that they were active users of the services of the library that functioned as a use case for the living lab experiment methodology.

B. Treatment groups biases, experiment validation, and dependent variable distribution
Table II depicts the results of the chi-squared (χ²) test for the demographic questions and the treatment/wave grouping per treatment group/recruitment wave. No biases are identified in either the treatment group construction or the recruitment waves.
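The bias check can be sketched as a chi-squared test of independence on a contingency table of treatment group versus one demographic variable. The snippet below is a minimal standard-library illustration, not the code used in the study. The function name `chi2_independence` is hypothetical, and the cell counts are invented: only the column totals (65 bachelor, 56 master, 11 other) follow the reported demographics, while the equal group sizes and the split across groups are assumptions. The sketch also assumes that every row and column total is positive.

```python
def chi2_independence(table):
    """Return (statistic, dof, expected) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, expected


# Rows: groups N, C, M, B; columns: bachelor, master, other.
# The cell values are hypothetical and only serve as an example.
degree_by_group = [
    [16, 14, 3],
    [17, 13, 3],
    [16, 15, 2],
    [16, 14, 3],
]
stat, dof, _ = chi2_independence(degree_by_group)
# With 6 degrees of freedom, the 5% critical value is about 12.59;
# a statistic below it gives no evidence of a group assignment bias.
print(f"chi2 = {stat:.3f}, dof = {dof}")
```

A statistic close to zero means the observed group composition is close to what independence between treatment assignment and the demographic variable would produce.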
On average, the participants found the rewards fair (2.6/4) and the onboarding materials useful (2.8/4). In particular, it has been identified in earlier work that learning how to utilize the web application is perceived as easy by the participants [15]. Thus, the chosen compensation fulfilled the requirements of DeSciL and the chosen technology in the form of a blockchain-based web application did not restrict users in participating in information sharing.
Table III illustrates the distributions of the dependent variables utilized in the analysis. In the majority of cases, these variables are non-normally distributed, which calls for the Kruskal-Wallis test to analyze the group differences, as it does not assume normally distributed variables [79].

C. Group differences and interaction effects
Table V depicts the results of the Kruskal-Wallis test applied to the distributions of the dependent variables of the four treatment groups for each day/over all days. Tables VI and VII illustrate the post-hoc analysis that applies the Conover-Iman test for those days that exhibit significant differences in the Kruskal-Wallis analysis. Moreover, Figure 9 depicts the cumulative distribution for each treatment group and Figure 10 shows the interactions among the treatments for the analyzed dependent variables.
In the following, the observations for each dependent variable are illustrated in detail.
1) Quantity: The treatment group behaviors for the quantity variable are significantly different for Day 2 and Day 3 (Table V). Considering the post-hoc analysis (Table VI), it is possible to determine that for both days, all treatment group pairs are significantly different, except for the B-M (both token incentives-money token incentive) pair. The CDF plot illustrates this observation (Figure 9): The M and B groups have a similar, higher probability of providing more answers when compared to the C (context token incentive) and N (control group) groups (in this order). Moreover, the M and B distributions show two peaks, one around 60 answers and one around 150 answers, the latter being the maximum number of answers for which a payment is received on a given day (Section III-A3). These peaks are more clearly visible on Day 3 and are stronger for the B group when compared to the M group. Moreover, the CDF plot for Day 3 (Figure 9c) indicates a tendency for money token receivers to answer a higher number of questions. The plots in Figure 10 illustrate the median interaction effects. Similarly to the Kruskal-Wallis test, on Day 1 and Day 4 no effect of the incentives is identified (Figure 10a and 10d). On Days 2 and 3 (Figure 10b and 10c), both incentives result in an increase of questions answered when compared to the control group, whereby the money token leads to a considerably stronger increase than the context token. Moreover, on Day 3 an interaction is observed: When compared to the money group, the context token dampens the effect of the money token in the B group, resulting in fewer questions answered.
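The CDF plots referred to above show, for each value of a variable, the share of users who reach an equal or lower value. A minimal sketch of this empirical cumulative distribution is given below; the function name `ecdf` and the sample counts are assumptions for illustration and do not reproduce the study's data.

```python
from collections import Counter


def ecdf(sample):
    """Return the sorted distinct values and the share of the sample <= each value."""
    n = len(sample)
    counts = Counter(sample)
    xs = sorted(counts)
    cumulative, ys = 0, []
    for x in xs:
        cumulative += counts[x]
        ys.append(cumulative / n)
    return xs, ys


# Hypothetical answer counts of one treatment group on one day, with several
# users stopping at the daily payment cap of 150 answers.
answers = [20, 60, 60, 61, 90, 150, 150, 150]
for value, share in zip(*ecdf(answers)):
    print(f"{value:>4} answers or fewer: {share:.0%} of users")
```

In such a curve, a cluster of users at one value (for example at the payment cap) appears as a vertical jump, and a curve lying further to the right indicates a group that tends to provide more answers.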
2) Contextualization: In contrast to quantity, the treatment group behaviors are significantly different for all four days (Table VI). The interaction plot on Day 1 (Figure 10e) illustrates how the context token treatment resulted in a higher number of contextualizations. The CDF plot on Day 1 (Figure 9e) depicts a similar distribution of the treatment groups with a higher tendency of the context group to provide more contextualizations. On Day 2, all groups except the M-N pair are significantly different (Table VI). Nevertheless, on Day 3 no differences between the B-N and B-M pairs are observed any longer. An opposing trend is observed in the M-N and B-C pairs, where the p-values become smaller over the two days (Table VI).
The CDF plots for Days 2 and 3 (Figure 9f and 9g) illustrate these trends: On Day 2, the B group distribution is close to the C distribution. Nevertheless, on Day 3 it more closely resembles the M group distribution, where most individuals provide few contextualizations and few individuals provide many. Moreover, the difference between the M and N groups becomes stronger for individuals that provide few contextualizations. The interaction plots for Day 2 and Day 3 (Figure 10f and 10g) illustrate an interaction effect between the money and context token, resulting in fewer contextualizations when compared to the context token alone. In contrast to the money token, no trend in the interaction is observed over the four days.
After removing the incentives on Day 4, the pairs B-C, C-M, and M-N show a significantly different behavior (Table VI). This is in contrast to the observation for the quantity variable, where no distinct behavior on Day 4 is identified. No significant difference between treatments with the context token and the control group is identified. The CDF plot for Day 4 (Figure 9h) illustrates the similarity between the B and M groups and between the C and N groups, respectively, as well as the difference between these two pairs. The interaction plot (Figure 10h) illustrates how the C group provides the most contextualizations on Day 4, followed by the control group and then the other two groups.
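The interaction read off the interaction plots can be expressed as a difference-in-differences of group medians in the 2x2 design: the effect of adding the money token when the context token is present (B versus C) minus its effect when the context token is absent (M versus N). The sketch below is one possible formalization under that assumption; the function name `interaction_contrast` and the sample values are hypothetical and not taken from the study.

```python
from statistics import median


def interaction_contrast(n, c, m, b):
    """Difference-in-differences of medians for a 2x2 factorial design.

    n: no tokens, c: context token only, m: money token only, b: both tokens.
    A value of zero corresponds to parallel lines in an interaction plot.
    """
    money_effect_without_context = median(m) - median(n)
    money_effect_with_context = median(b) - median(c)
    return money_effect_with_context - money_effect_without_context


# Hypothetical numbers of contextualizations per participant (not study data).
group_n = [2, 3, 4, 5, 6]
group_c = [8, 10, 12, 14, 16]
group_m = [1, 3, 4, 4, 7]
group_b = [3, 5, 6, 8, 9]
print(interaction_contrast(group_n, group_c, group_m, group_b))
```

A negative contrast would correspond to the pattern described above, in which combining both tokens yields fewer contextualizations than the context token alone would suggest; the statistical significance of such a contrast would still have to be established separately.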
3) Accuracy: A significant difference among the group behaviors for the accuracy is identified over all four days (Table V). Table VII indicates that this difference originates from the pairs B-N and M-N. Nevertheless, the p-values of the pairs B-C and M-C are also almost significant (p-value = 0.052). These differences are illustrated in the CDF plot (Figure 9i), which depicts higher probabilities for the C and N groups to reach higher accuracy values when compared to the other two groups. Figure 10i shows that the control group reaches the highest accuracy in their answers, followed by the context, both, and money groups.

Fig. 10: Interaction plots among the treatments for the dependent variables over the four days/all days. The minus indicates that the token has not been applied, whereas the plus indicates a utilization of tokens. Thus, the dashed line connects the treatment groups that utilized the context token (left: treatment with context token; right: treatment with both tokens). The solid line connects the treatment groups that did not utilize the context token (left: treatment with no tokens (control group); right: treatment with money token).

V. DISCUSSION

A. Results
Table IV illustrates the findings from the Kruskal-Wallis and post-hoc analysis with regard to the hypotheses (Section III-C2). The results inform an adjustment to the assumptions (Section III-C1) that were utilized in the formulation of these hypotheses:

TABLE IV: Findings from the Kruskal-Wallis and Conover-Iman post-hoc analysis. Entries marked in light green accept the hypotheses stated in Section III-C. The following deviations can be explained by adjusting the assumptions of the hypotheses: i: Context token has a negligible effect on intrinsic motivation; ii: Extrinsic motivation has a considerable impact on contextualization; iii: Interaction effects between the tokens are present.

I) It was assumed that the context token has a positive impact on both the intrinsic and extrinsic motivation (Assumptions 2 and 3 in Section III-C). Yet, the findings provide evidence that this token has overall only a small positive or negligible impact on intrinsic motivation. This would explain the parity of the C group and N group for the context characteristic on Day 4 (ID 8/Column b in Table IV): No incentives are applied on that day, thus the extrinsic motivation is equally zero for both groups and only the intrinsic motivation defines the contextualization behavior. However, the median number of contextualizations on Day 4 (Figure 10h) is higher for the C group when compared to the N group, indicating a small positive impact that is also illustrated by the CDF plot (Figure 9h). The negligible impact is also illustrated in the parity between the B and M group on Day 4 (ID 8/Column d in Table IV). The money token reduced the intrinsic motivation in both groups and, because the context token did not offset this negative impact, it is the same for both groups on Day 4. Moreover, the parity of these pairs for the accuracy characteristic (ID 9/Column b and d in Table IV) can be explained thus: Since accuracy is mainly impacted by intrinsic motivation, which following the previous considerations is equally pronounced between the M and B groups, the group behaviors are equal. It also explains the inequality between the B and N group (ID 9/Column f in Table IV): Since the intrinsic motivation is reduced in the B group due to the money token, and the context token does not have a significant positive impact on intrinsic motivation, the intrinsic motivation in the B group is lower than in the N group and thus the shared information is of lesser accuracy.
II) Extrinsic motivation has a considerable impact on the context characteristic. i) This explains the parity between the M and N group for Days 2 and 3 (ID 6,7/Column c in Table IV). Although the intrinsic motivation is reduced due to the money token, it is replaced by the extrinsic motivation stemming from this incentive, resulting in a similar contextualization-sharing behavior. Consequently, on Day 4, when the incentive is removed, the M group shares fewer contextualizations (ID 8/Column c in Table IV). ii) It also explains the inequality between the B and N group on Day 2 (ID 6/Column f in Table IV). In contrast to the comparison between the M and N group, the context token in the B group adds to the extrinsic motivation such that the decrease in intrinsic motivation is exceeded, resulting in a greater motivation to share contextualizations when compared to the N group. iii) Moreover, following the same arguments, it also explains the inequality between the B and M group on Day 2 (ID 6/Column d in Table IV). iv) Finally, it also explains the inequality between the C and N group on Days 2 and 3 (ID 6,7/Column b in Table IV).
III) In contrast to Assumption 4, interactions between the money and context token incentive are observed. i) For Day 2, the B group shares more contextualizations than the monetary group (ID 6/Column d in Table IV). Nevertheless, on Day 3 no difference is observed (ID 7/Column d in Table IV), which indicates that over time the two tokens interact with each other, thereby decreasing their impact on the users' motivation. ii) This might also explain the parity between these groups for the quantity of information shared on Days 2 and 3 (ID 2,3/Column d in Table IV): Both tokens interfere such that their combined impact on users' motivation does not differ from a single token incentive. Furthermore, the interaction plot (Figure 10) even indicates a lower positive impact of the combined incentives when compared to the single money token incentive. In addition, the plot indicates that this interaction becomes stronger over the four days, which is also illustrated by the CDF plot. iii) This interaction also explains the shift from inequality to equality for the B and N group on Days 2 and 3 (ID 6,7/Column f in Table IV).
The findings further provide evidence that the money token crowds out intrinsic motivation. The interaction plots and CDF plots for the accuracy indicate a crowding-out of intrinsic motivation by the context token. Nevertheless, according to the Kruskal-Wallis test and its post-hoc analysis, these latter differences are not significant. In particular, for the number of contextualizations the context token even has a positive impact after incentivization ends, which might be explained by an internalization of the incentive (Section II-A) for this information dimension. Finally, a time effect is present in both single- and multiple-token scenarios, which indicates that the behavioral change can vary over time.

B. Implications
1) One-dimensional token systems: The internalization effect of the context token on contextualization actions after incentivization ends illustrates a potential advantage of blockchain-based cryptoeconomic incentives when compared to traditional approaches utilizing monetary incentives. The intrinsic motivation of users might be impacted positively by internalizing these incentives (Section II-A), thus resulting in an improvement of performance measured in the number of contextualizations provided, even after the incentivization period ends. However, the findings also indicate that this utility token induces a worse performance in the accuracy of shared data when compared to the control group (Figure 9i and 10i). Therefore, the identified internalization might be limited to information dimensions that are directly incentivized by a utility token.
Moreover, the findings indicate that stable coins such as the utilized money token crowd out intrinsic motivation, which in this work resulted in a reduction of information quality measured in accuracy and contextualization.
In order to design effective incentives, future work should evaluate the token designs and scenarios under which internalization or crowding-out are observed. In particular, as internalization or crowding-out can vary between different performance measures (e.g. contextualization and accuracy), one is advised to carefully evaluate all impacts a token might have before using it in real-world applications.
2) Multi-dimensional token systems: The findings provide evidence that applying multiple token-based incentives simultaneously can result in a combined improvement of several information characteristics (e.g., as shown for quantity and contextualization) and could therefore improve system performance when compared to a scenario where a single token is utilized. Nevertheless, the interaction effect between the two tokens identified in this paper indicates that designing multi-token systems is a non-trivial task, which has implications for systems in which the application of multiple tokens is considered. In particular, positive and negative impacts of tokens on human behavior may not simply add up. The findings also show that these effects may only become apparent over time. Thus, it is necessary to carefully analyze the interdependencies between combinations of tokens in longitudinal studies before they are utilized in real-world systems. The results of simulations and formal analyses of multi-dimensional token systems are limited if they do not consider these token interactions.
3) (Ethical) risks: Considering the observation that the current big data paradigm is not challenged by a lack of data [13], but by a lack of contextualized and accurate information, the findings of this paper raise the question of whether incentives in the form of blockchain-based tokens should be applied at all to motivate individuals of communities to share information. Such incentives may result in a further increase in the quantity of collected information while reducing its quality (e.g. accuracy).
However, incentives might work differently in data-sharing scenarios where the quality of shared data under different incentivizations is determined by decisions users take a priori, which are then executed a posteriori by an artificial intelligence, as studied by Pournaras et al. [82] and Asikis and Pournaras [83] with a computational methodology for privacy-utility decisions. Yet, neither the user acceptance of these decisions nor the impact they have on the trust of users in decision-support systems has been studied. Therefore, unknown effects could be present in these scenarios that bias users' behavior and limit the generalizability of findings from such simulations to real-world situations. Furthermore, because users have to make decisions a priori in these scenarios, such approaches might fail to capture the unique situational domain knowledge users possess or their creativity and intuition, which are required for some application domains such as the customer feedback provision analyzed in this paper.
Increasingly, token incentives are applied in various application domains of society such as construction [31], health [27], Covid-19 prevention measures [60], electricity production and consumption [32], car sharing [84,85], alleviating traffic congestion [86], book-keeping [38], decentralized access-control systems [40], or waste reduction [34]. Nevertheless, behavioral traits stemming from intrinsic motivation, such as creativity, joy, self-determination, purpose, and endurance, may be important for some of those application domains; these traits could be crowded out by cryptoeconomic incentives, which would result in reduced performance. For instance, endurance [87] and creativity [88] have been identified as important factors for addressing climate change.
Moreover, this increasing tokenization of areas of life that have not been tokenized before could reduce social relations and human interactions to transactions within a market-driven economy [1]. This might be in opposition to values that stakeholders in these systems hold [1]. In addition, it has been found that the measurement act itself, which is required in tokenization for quantifying and proving actions [64], can reduce intrinsic motivation and thus creativity and endurance in individuals [89].
In addition, the identified effects of this work question the assumptions of controversial token systems in the form of social credit systems [90,91], which are also discussed in Western democracies as tools for managing society [92]: Centrally designing and introducing token incentives may fail due to unknown, crowding-out, or interdependent effects that may cascade over time. Considering the large and complex design space of DLT systems [66], an iterative, local, and community-driven approach utilizing the wisdom of crowds and self-organization for token designs, as illustrated in Dapp et al. [64], might be the way to proceed in designing stable token systems. In particular, these principles have been found to enable communities to mitigate the tragedy of the commons and successfully share and maintain a common resource [93].
Thus, before applying token incentives in an application scenario, this author suggests rigorously considering the values of all stakeholders in the system construction process and analyzing whether applying such incentives could crowd out intrinsic motivation in the scenario under consideration. Only then should the system be iteratively constructed in scenarios that are locally bound. For this, the methodology of this paper in combination with value-sensitive design [94,95] and iterative design science research [96,97] methodologies can be applied, as demonstrated for token-based blockchain systems by Ballandies et al. [15,68].
4) Limitations: The experiment facilitates realism while enabling the laboratory-like testing of hypotheses [37]. Due to the realism sought, not all influences on users' information-sharing behavior could be controlled for, which may reduce the quality of the measurements and findings. In particular, the questions asked are formulated by the library organization, which had a real business interest in the answers. Thus, the questions are not standardized and hence some of the questions might be more difficult to answer. This could introduce bias to quality characteristics such as accuracy and may have resulted in the lower differentiation between treatment groups in this characteristic (ID 9 in Table IV). Furthermore, the accuracy was summarized over the four days. As a result, there is a lack of a granular daily view on the impact of token incentives on this characteristic.
5) Impact: The realistic setup of the experiment illustrates and underlines the importance of the findings of this paper for real-world organizations and communities. The identification of significant positive and negative effects of both token incentives on human sharing behavior and their observed interactions provide evidence that such effects are present in real-world sharing scenarios and should therefore be analyzed and evaluated by organizations and communities before the incentives are applied in their use cases. In particular, a token design may not be robust, with use of the token having a different impact than intended [98]. The methodology of this paper can be applied to analyze such effects in real-world systems.
The identified effects (interaction, internalization, time, and crowding-out) inform the Token Engineering and Token Economics community in the design of stable cryptoeconomies. Currently, methodologies in these fields mainly rely on game theory, mechanism design, and simulations [39,99,100,101,102,103]. Nevertheless, none of these approaches considers the identified effects of this work on human behavior in their assumptions. Consequently, including these effects could improve the correspondence of findings from these methodologies with reality. Thus, this paper demonstrates the importance of behavioral experiments in the field of Token Engineering and Token Economics.
In addition, this work illustrates the usability of self-determination theory to test hypotheses of token designs on human behavior.

VI. CONCLUSION AND OUTLOOK
This work evaluates the combined impact of multiple cryptoeconomic incentives in the form of blockchain-based tokens on human information-sharing behavior. By utilizing a rigorous experimental methodology with a 2x2 factorial design involving 132 participants, the impact is evaluated in a real-world information-sharing scenario involving a major Swiss organization and its customers. The identified interaction effect between the tokens and the potential crowding-out of intrinsic motivation by these cryptoeconomic incentives are important for researchers and practitioners to consider because they indicate that designing multi-token systems is a non-trivial task: The impacts of individual token incentives on human behavior are not independent from each other and a token design might not be sufficiently robust, with the impact of the token possibly differing from the intended effect. These impacts have to be considered when implementing, simulating, or mathematically analyzing token economies as presented in the Discussion (Section V-B5). In particular, they inform the assumptions taken in theoretical models, validate their accuracy, and may thus facilitate their improved connection with reality. Therefore, the methodology of this paper and the identified effects might be of use for organizations and communities that intend to apply (multiple) token incentives.
The results point to various avenues for future research. i) Since information quality is a multi-dimensional concept (Section III-B) and the impact of token incentives can vary between those dimensions (Section V-B1), the impact of the chosen token incentives on operationalizations of quality other than accuracy or contextualization can be evaluated to further quantify the impact of these tokens on human information-sharing behavior. ii) In general, considering the broad design space of tokens and blockchain systems, the impact of further instances of cryptoeconomic token incentives should be evaluated in experiments to identify conditions and scenarios that are impaired by or benefit from the introduction of cryptoeconomic incentives. iii) Due to the identified interaction effect and the complexity of potential system layouts, evaluating all these combinations in experimental setups might not be feasible. Thus, simulations should be employed to identify areas of interest in the design space, which, in a second step, are investigated in experiments. Modeling the determined effects of this work as emergent phenomena of a complex system could be a promising approach for these simulations. Finally, machine learning methods such as k-means or hierarchical clustering could be utilized to identify hidden patterns in the data that may impact human sharing behavior under incentivization.
Thus, to conclude, further research by the cryptoeconomics community is required to identify why, how, and in which situations cryptoeconomic incentives should be applied.

Fig. 2: Experiment scenario: Participants share information with a library institution and obtain blockchain-based tokens in return.Tokens collected by other users can be discovered in interactions.

Fig. 3: Treatment groups in the 2x2 factorial design experiment vary according to the received token incentive: N = no token incentives; C = context token incentive; M = money token incentive; B = both token incentives.

Fig. 5: Statistics view of the utilized software artifact.It depicts the amount of context token and money token units collected by a user (above).Moreover, the software artifact shows the leaderboard that compares users based on the collected context token units (below).

Fig. 6: Answer view of the utilized software artifact: Users can answer questions posed by the library and contextualize their answers with three contextualization options (importance, satisfaction, and comment).

Fig. 7: View of the satisfaction contextualization (Figure 6). Users can specify how satisfied they are with the answer options provided for a question.

Fig. 8: Measured variables in the three phases of the experiment and the applied analysis methods.

Fig. 9: Cumulative distribution plots for the treatments of the dependent variables over the four days/all days. The plot illustrates the cumulative percentage of users who reach an equal or lower value of the variable.

TABLE V: p-values obtained from the Kruskal-Wallis test when comparing the different treatment groups for the five dependent variables. Levels identifying significant differences among the treatment groups' distributions: ≤

TABLE II: Results of the chi-squared (χ²) test for the eight demographic questions and the treatment/wave grouping per treatment group/recruitment wave, illustrating that no bias is identified in the construction of the groups or the recruitment waves.

TABLE VI: Daily p-values of the Conover-Iman post-hoc analysis for the significant values of the Kruskal-Wallis test (Table V) for the quantity and contextualization variables.

TABLE VII: p-values of the Conover-Iman post-hoc analysis for the significant values of the Kruskal-Wallis test (Table V) for the accuracy variable over all days.