A Volunteer Computing Architecture for Computational Workflows on Decentralized Web

The number of computational devices accessible over the Internet offers an enormous but latent computational power. Nonetheless, the complexity of orchestrating and managing such devices requires dedicated architectures and tools and hinders the exploitation of this vast processing capacity. Over the last years, the paradigm of (Browser-based) Volunteer Computing has emerged as a distinctive approach to harnessing such computational capabilities, leveraging the idea of voluntarily offering resources. This article proposes VFuse, a novel architecture to exploit the Browser-based Volunteer Computing paradigm via a ready-to-access volunteer network. VFuse offers a modern multi-language programming environment for developing scientific workflows using WebAssembly technology, without requiring any local installation or configuration from the user. We equipped our architecture with a secure and transparent rewarding mechanism based on blockchain technology (Ethereum) and a distributed P2P file system (IPFS). Further, the use of Non-Fungible Tokens provides a unique, secure, and transparent methodology for recognizing the users’ participation in the network. We developed a prototype of the proposed architecture and four example applications implemented with our system. All code and examples are publicly available on GitHub.


I. INTRODUCTION
Over the past decade, personal computers (PCs) have become one of the most consolidated markets. In 2021, approximately 340 million PCs were shipped worldwide [1], marking a reversal of the trajectory observed since 2011 [2]. Still, smartphones are today the dominant technology, with around 1.5 billion devices sold per year over the last five years [3], [4].
Further, Internet users are currently growing at an annual rate of 4.0 percent, equating to an average of more than half a million new users each day [5]. This colossal number of computational devices represents an enormous opportunity from a computing perspective. According to the 2021 TOP500 rank [6], the most powerful supercomputer, FUGAKU, provides around 8 million cores. Based on these statistics, the computational power obtainable from even a tiny fraction of all possible Internet-accessible devices is significantly greater. Taking advantage of this huge (and mostly unused [7]) processing capacity represents a powerful opportunity for science and society.

The associate editor coordinating the review of this manuscript and approving it for publication was Fabrizio Messina.

The paradigm of Volunteer Computing [8] (VC) emerged as a prominent approach to harnessing the computational capabilities of such devices. VC is a type of distributed computing based on two pillars: computation and participation. The former refers to the ability of the network to orchestrate heterogeneous computational nodes to perform a given task. The latter is the cornerstone of the whole paradigm and refers to the mechanism by which people voluntarily donate their computing resources to the network to collaborate on a project. Although VC comes with peculiar technological challenges (e.g., managing nodes with heterogeneous hardware and software, high dynamicity of the environment, asynchronism), this paradigm provides researchers with lower-cost computing power and reduced energy consumption. To alleviate some intrinsic limitations of VC systems and encourage joining volunteer networks, the paradigm of Browser-Based Volunteer Computing (BBVC) [9] gained popularity, also thanks to improvements in the processing capacity of web browsers and the release of powerful software libraries (e.g., WebGL and TensorFlow.js) [10]. BBVC provides access to the volunteer network using web applications, which execute volunteer jobs in the background and transparently from the user's perspective. These systems inherit all benefits from web browsers, offering portability, flexibility, and ubiquity.
Contributions: Over the last four years, different BBVC platforms have been proposed. Still, there is room for improvement to make such platforms fully decentralized and scalable, offering users a trusted computing environment and providing complete control over the resources donated to the volunteer network. To take a step in this direction, this paper introduces VFuse, a fully distributed volunteer network based on a peer-to-peer (P2P) architecture. VFuse offers volunteers an easy and ready-to-use programming environment directly in their browsers through web-based interactive notebooks, which allow users to develop and monitor the requested computation and analyze its results. Further, our platform provides a secure and trustworthy execution environment and a reward mechanism, thanks to the adoption of blockchain technology to ensure the reliability of results. VFuse represents an effort to make the web decentralized [11], allowing users to run intensive computing tasks in a free-to-use and trusted environment.
The major contributions of this paper can be summarized as follows:
• The design of a novel architecture for BBVC defined over a fully distributed P2P network;
• The proposal of an innovative rewarding strategy based on blockchain technology to incentivize users to join the volunteer network;
• A user-friendly interface based on web notebooks to easily access the volunteer network and benefit from its computing capabilities transparently;
• An empowered multi-language programming environment offered via interactive web notebooks;
• A detailed description of four applications, presenting how VFuse can be exploited in such contexts;
• A prototype of the VFuse system, available on a public GitHub repository [12], which includes workflow orchestration functionalities, storage capabilities using IPFS, and two execution backends offered with JavaScript and Python programming languages.
The remainder of this paper is organized as follows.
Section II reviews the key ideas behind BBVC, describes the main existing frameworks in the context of BBVC and discusses the challenges these frameworks need to face.
Section III illustrates the main features of VFuse, the rationale behind the choice of the technologies used, and how they impacted the platform's design. Section IV details the architecture of VFuse, delineating its functionalities and internal mechanisms. Section V describes how VFuse can be exploited through four use cases. Section VI discusses a first performance evaluation and the current limitations of the VFuse prototype. Finally, Section VII concludes this work by delineating our current work and possible future directions.

VFuse is a volunteer distributed browser-based library to create and execute scientific workflows. To better clarify its position within the state-of-the-art, we first introduce the concept of VC, along with the most popular VC frameworks. Then, we focus on BBVC and its peculiar challenges. Finally, we present existing solutions, describe the peculiarities of our proposed system, and provide a detailed comparison among the available BBVC frameworks.

VC is a computational paradigm based on the willingness of people to donate idle computing resources to run computational and storage-intensive tasks [13]. VC shares many similarities with online community-based projects, in which people's desire to voluntarily contribute resources (such as knowledge, time, and skills) underpins the sustainability of such initiatives [8].

The idea of exploiting idle resources from volunteer computers came from the GIMPS (Great Internet Mersenne Prime Search) project in 1995 [14]. The project is still running, and it allowed the discovery of the 51st Mersenne prime in 2018, the largest known prime number at the time of writing. Other early projects include distributed.net [15], SETI@home [16], and Folding@home [17]. Today there are over 30 active projects.

Two of the most popular VC frameworks are BOINC [18] and XtremWeb [19]. Both frameworks exploit a centralized architecture for managing jobs and resources. Further, they require users to download and install a specialized client to execute project tasks. This approach has two critical drawbacks concerning the programmability and the trustworthiness of the applications. First, there is a constraint on the programming language used, as each application must be implemented with the same framework language. Second, adopting a centralized architecture caps the number of volunteer nodes, as their number cannot exceed the network and computational capabilities of the server node.

To address the issues related to the client-server architecture, other VC frameworks either rely on a P2P overlay network [20], [21] or a blockchain-based system [22], [23]. Specifically, the idea of using a blockchain in a VC framework is tailored to solve the issue of trust between devices and the lack of traceability, which makes it difficult for users to evaluate the contribution and credit of each volunteer [24]. Further, the characteristics of blockchain, such as decentralization and persistence, allow solving the problems of scalability
• Fault Tolerance. The BBVC platform should be tolerant

• Usability. The BBVC platform should be easy to deploy and use.

Along with the above desiderata, we also defined some additional features a BBVC system should have to encourage volunteers' engagement and improve the platform's reliability and functionalities.
• Task deployment and scheduling. Flexibility of the BBVC platform in supporting different deployment and scheduling policies.
• Result reliability. The BBVC platform must ensure the computed results' correctness and prevent any result manipulation or malicious execution.
• Supported programming languages. Flexibility of the BBVC platform in supporting different programming languages.
• Supported computational paradigms. Flexibility of the BBVC platform in supporting different computational paradigms.
• User resource usage. Possibility of configuring the amount of local volunteer resources, such as CPU or memory, to allocate for computing the task.
• Data management. Ability of the BBVC platform to support data operations, such as data gathering, manipulation, and storage.

2) BBVC FRAMEWORKS
BBVC frameworks rose to prominence over the last decade thanks to the rapid advancement in web technologies and the ever-increasing web usage. We can distinguish three generations of BBVC systems [9], which reflect the improvements in the web programming language, communication protocols, and thread support. In this paper, we specifically focus on reviewing and comparing the third-generation frameworks most similar to VFuse. For a comprehensive review of BBVC frameworks, we refer the reader to the survey of Fabisiak and Danilecki [9].
Madoop [32] leverages the power of WebAssembly [37] to implement a distributed MapReduce framework on browsers. The central server, which hosts the Hadoop software, handles the management of both jobs and results. Each job is written in C/C++ and compiled to WebAssembly to be run in the browser. The client web page, which runs a Madoop code snippet, requests a job from the main server and, upon completion of its execution, returns the results to the server. Then, the server sends the results back to the job initiator.
JSDoop [10] is a library for distributed collaborative high-performance computing in web browsers, based, like Madoop, on the MapReduce paradigm. Both JSDoop clients and servers are implemented in JavaScript. A queue server handles the task scheduling and the result management, while data are available in a centralized server. More queue servers could be used to guarantee load balancing.
BrowserCloud.js [33] proposes a decentralized architecture to find and utilize resources through a P2P overlay network. Participants join the network via a centralized rendezvous point; then, the message routing is handled via an adaptation of the Chord routing algorithm, designed for a P2P

Genet [35] is an evolution of Pando that tries to overcome its scalability problems, which are due to direct connections handled with WebRTC, by using a fat-tree overlay network (where processors are located on the leaves and internal nodes relay data for all their children). Genet differs from Pando in managing browser connections, switching a node's role from management to relay when its direct connections (children) reach a given threshold.
CollabChain [36] is a browser-based volunteer platform that relies on blockchain technology to provide a trusted environment and encourage users to make their resources available to the network. CollabChain is based on a P2P overlay network and defines three types of nodes: submitters, executors, and coordinators. Submitters request the execution of a task, i.e., a JavaScript function and its inputs, while executors compute it. A single coordinator acts as a bootstrap node and maintains a database of all tasks uploaded by submitters. The blockchain guarantees payment for the volunteers that complete their work and the honesty of results by matching the output computed by the volunteer against the pre-computed output described in the smart contract.

CollabChain currently represents the most similar work to VFuse. Nonetheless, several major points distinguish the two architectures. The first significant difference lies in how tasks are deployed and scheduled. If a VFuse node wants to submit a workflow (see Section III-A), it broadcasts the workflow over the P2P network and waits for the results. Even if the node disconnects, the network still gossips the workflow. Tasks of each workflow are then scheduled by each volunteer based on the associated priority. On CollabChain, submitters have to submit their tasks to the coordinator, and they are required to stay online even after delegating the process function and the inputs to the executors, in order to obtain the computed outputs. Executors directly choose tasks from the coordinator, and no scheduling policy is explicitly described. The second significant distinction regards the computing paradigm. VFuse offers users a platform to define the requested computation as a workflow, within which either dependent or parallelizable tasks may be specified and run by different volunteers. In contrast, CollabChain defines the requested computation via a JavaScript function that can be run by a single executor. Other main dissimilarities relate to (i) the management of the user resources offered to the volunteer network (CollabChain does not offer direct control), (ii) how data are handled (VFuse relies on IPFS, while on CollabChain submitters and coordinators need to exchange data directly), and (iii) the rewarding mechanism (VFuse exploits the rewarding mechanism to prioritize workflows, while CollabChain provides an actual currency).
Table 1 compares VFuse with the main BBVC frameworks, considering the desiderata described in Section II-B1. It is worth noting that all systems inherit accessibility, availability, heterogeneity, security, and usability requirements from browsers.

The VFuse architecture is built upon two cornerstones: (i) ensuring a high level of scalability and (ii) storing inputs, outputs, and authorship of users' tasks over a public blockchain. Table 2 describes the main characteristics of VFuse, clarifying how the platform addresses the challenges described in Section II-B1. The following sections illustrate the main design choices behind VFuse.

Distributed computing offers mechanisms and tools for orchestrating distributed computational workflows and resources for transparently solving problems. In this context, a critical issue is to use the proper programming model to define the distributed computation. Scientific workflow [38] is a commonly used paradigm to manage the coordinated execution of actions that can be repeatable and dependent on each other. This design enables the plugging of problem-solving components within the workflow to prove a scientific hypothesis. Such a paradigm brings several benefits, such as automation, scalability, resilience, and verifiability.

VFuse adopts the workflow pattern, allowing the design of the requested computation as a sequence of interdependent jobs (or tasks). In other words, the computation is divided into self-consistent jobs, whose execution may depend on the termination of other tasks. Hence, a VFuse application is defined by a pipeline of jobs that is modeled via a directed acyclic graph (DAG) (see Section IV-B5.a). VFuse provides operations to build workflows, add jobs, and describe dependencies between them. The generic nature of this approach also allows programmers to exploit other distributed paradigms in VFuse, such as fork/join and MapReduce.
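The workflow model described above can be illustrated with a minimal sketch. It is not the VFuse API: the `Workflow` class and its `add_job` method are hypothetical names chosen for illustration, and the standard-library `graphlib` module stands in for whatever ordering logic VFuse applies internally. The example encodes a fork/join pattern as a DAG and derives one valid execution order.

```python
from graphlib import TopologicalSorter


class Workflow:
    """Hypothetical container: jobs are nodes, dependencies are edges."""

    def __init__(self, name):
        self.name = name
        self.deps = {}  # job name -> set of job names it depends on

    def add_job(self, job, depends_on=()):
        # Register the job and the jobs that must terminate before it starts.
        self.deps[job] = set(depends_on)
        return job

    def execution_order(self):
        # Raises graphlib.CycleError if the dependencies are not acyclic.
        return list(TopologicalSorter(self.deps).static_order())


# Fork/join: one split job, three independent workers, one join job.
wf = Workflow("fork_join_demo")
wf.add_job("split")
workers = [wf.add_job(f"work_{i}", depends_on=["split"]) for i in range(3)]
wf.add_job("join", depends_on=workers)

order = wf.execution_order()
print(order)
```

In this sketch the three worker jobs have no mutual dependencies, so in a volunteer network they could be picked up by different nodes at the same time; only the join job must wait for all of them.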

We designed VFuse to support the development and execution of distributed applications via an interactive web notebook, à la Jupyter, CodePen, Gitpod, or JSFiddle. This choice offers VFuse volunteers a ready-to-use, quick programming environment without requiring software configuration and installation. Further, web notebooks provide a dynamic programming environment supporting multiple languages, workflow monitoring, submission, and visualization of results. Specifically, VFuse provides a set of asynchronous functions, which support different programming languages, to retrieve and store data from the network, build workflows, and add jobs to them, specifying input data and dependencies.

The many technological challenges we faced during the design of our system profoundly shaped the final architecture of VFuse. We exploited the following technologies to enable communications among several P2P-connected devices, guarantee the trustworthiness of job executions, manage distributed data, and support a multi-language programming environment. A description of the key technologies chosen and their impact on VFuse follows.
• WebAssembly [37] (Wasm) is a low-level assembly-like language runnable in web browsers. Wasm is designed as a portable compilation target for programming languages, meaning that it allows languages like C/C++, Rust, or Python to run on the web with near-native performance. Wasm is also designed to run alongside JavaScript, offering programmers a way to take advantage of WebAssembly's performance and power and JavaScript's expressiveness and flexibility in the same application. Among the other strengths of Wasm are its safety (memory-safe, sandboxed execution) and ease of debugging (textual format). Further, Wasm maintains the versionless, feature-tested, and backward-compatible nature of the web.
Architecture Insight: The use of Wasm as a core technology of VFuse is critical to improve Web Workers' performance and provide support for programming languages other than JavaScript. Currently, VFuse supports the development of workflows written in JavaScript or Python. The rationale behind the choice of Python comes from the plethora of libraries the language offers to manipulate and analyze data, which are well-suited to implementing scientific workflows. To implement Python Web Workers, we used Pyodide [39] as a Wasm-compiled Python interpreter. Specifically, Pyodide is a Python distribution for the browser and Node.js based on WebAssembly/Emscripten [40] that makes it possible to install and run Python packages in the browser with micropip, a lightweight pip-like package installer. Hence, many general-purpose and scientific Python packages (such as NumPy, pandas, SciPy, Matplotlib, and scikit-learn) can be used. Further, Pyodide allows the programmer to easily mix JavaScript and Python in the same code script thanks to a robust foreign function interface. The use of Wasm as the underlying technology ensures that VFuse can be easily expanded to support other programming languages.
• Libp2p [41] is a network framework supporting the development of decentralized P2P applications based on WebSocket or WebRTC to enable communication among nodes. Built upon the Kademlia DHT [42], a network protocol that allows the development of P2P network applications, Libp2p leverages public-key cryptography [43] to manage peer identities and enable secure communication. Libp2p offers NAT traversal, circuit relay, stream multiplexing, and addressing functionalities.
Architecture Insight: The use of Libp2p enables VFuse to be aware of the status of the network while hiding the underlying network communication protocol and the details of the routing tables. Libp2p also offers new VFuse volunteer devices the possibility to join the network through bootstrap nodes, whose number can be increased on demand based on the size of the network. Lastly, Libp2p enables direct communication between VFuse nodes, allowing them to inform other devices about a new workflow to be run. The flooding of the workflow within the network happens via a gossip strategy, which avoids a centralized orchestration, and terminates either when the Initiator ends its execution or when its time-to-live (TTL) expires (see Section IV-B4).
• IPFS [44] is a distributed file system built upon Libp2p

VFuse is a decentralized network that acts as a workflow manager accessible via browser for volunteer-based distributed computation. VFuse's primary purpose is to enable users to access a robust and secure volunteer network without requiring the installation and configuration of any additional software. VFuse users can define asynchronous workflows made up of functions (jobs or tasks) with possible temporal dependencies on their execution. Thanks to the asynchronous nature of VFuse workflows, users are free to leave the network while their required workflow is running and gather its results at any moment in the future.

The VFuse architecture is designed on top of the following innovative objectives:

The following sections illustrate each component in detail.

The VFuse application provides access to the user profile configurations (Profile page), workflow management features (My Workflows and Running Workflows pages), network monitoring (Network page), and a logging console (Console page).
• In the Profile tab, each user can set up the information related to the particular VFuse network to join by specifying the IP address of its bootstrap node, signal server, and pinning cluster.
• On the My Workflows page, users can create, develop, locally test, and submit a new workflow to the volunteer network.Figure 3a shows a snapshot of the VFuse client application detailing the programming IDE offered to manipulate workflows.Users can also visualize an interactive graphical representation of the computation using a Job DAG (see Figure 3b), which is automatically updated according to the computation's status and provides the computed results for each job.
• In the Running Workflows tab, users can visualize the queue of received and running workflows, i.e., all workflows that the user has received from the network and to which they are offering their computational resources.
• In the Network tab, users can continuously monitor the status of the VFuse network by listing the peers they are connected with.
• Finally, on the Console page, users can check VFuse logging messages.

2) API COMPONENT
The API Component provides external access to the functionalities offered by the VFuse platform. Further, this component offers programmers an API to manage workflows and develop them using different programming languages.
Specifically, the workflow API allows the programmer to (i) retrieve and store computation data asynchronously and (ii) manage new jobs and their dependencies.Table 3 briefly describes each function, along with the required parameters and return type.

4) NETWORK COMPONENT
The VFuse network component is based on the event-driven programming paradigm and enables VFuse peers to exchange workflows and data through GossipSub [46], a publish/subscribe protocol. GossipSub exploits the idea of gossiping [47]; namely, it floods the network with messages to ensure reliable communication of data and workflows in a dynamic environment. Messages to gossip are stored in the nodes' local Gossip queue and are forwarded only as long as their TTL has not expired and the Initiator has not stopped the workflow. The message payload is compressed using the LZ77 [48], [49] algorithm to preserve bandwidth and memory.
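The forwarding rule for the Gossip queue can be sketched as follows, assuming that a message is dropped once its TTL expires or the Initiator stops the workflow. The `GossipMessage` type and the helper names are hypothetical, not part of VFuse or GossipSub, and Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) is used as a stand-in for the LZ77 compression mentioned in the text.

```python
import json
import time
import zlib
from dataclasses import dataclass


@dataclass
class GossipMessage:
    workflow_id: str
    payload: bytes      # compressed workflow description
    expires_at: float   # absolute expiry time derived from the TTL


def pack(workflow_id, workflow, ttl_seconds, now=None):
    # zlib implements DEFLATE (LZ77 plus Huffman coding); it is a stand-in
    # for the LZ77 compression mentioned in the text.
    now = time.time() if now is None else now
    raw = json.dumps(workflow).encode("utf-8")
    return GossipMessage(workflow_id, zlib.compress(raw), now + ttl_seconds)


def unpack(message):
    return json.loads(zlib.decompress(message.payload).decode("utf-8"))


def should_forward(message, stopped_ids, now=None):
    # Forward only while the TTL is still valid and the Initiator has not
    # stopped the workflow.
    now = time.time() if now is None else now
    return now < message.expires_at and message.workflow_id not in stopped_ids


msg = pack("wf-1", {"jobs": ["map"] * 50}, ttl_seconds=60, now=1000.0)
print(should_forward(msg, stopped_ids=set(), now=1030.0))
```

Passing `now` explicitly keeps the sketch deterministic; a real node would rely on its own clock and on the stop notifications it receives from the network.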

Further, the communication component also provides (i) a configurable proxy to enable the system to use HTTPS and WebSocket Secure protocols and (ii) a built-in IPFS gateway, granting direct access to the IPFS resources via the HTTP protocol and avoiding the use of an external public gateway.

5) ENGINE COMPONENT
The Engine component is the core of the VFuse architecture. It comprises four interoperable modules, through which it defines (Workflow module) and orchestrates workflows and their computation (Computing module), stores users' profile information and preferences (Identity module), and regulates the reward mechanism (Reward module). A detailed description of each module follows.

A VFuse computation is defined using a computing workflow. Specifically, each workflow is a sequence of jobs, i.e., functions with properties and data. The programmer can define temporal dependencies between jobs and visualize them via the workflow DAG, representing jobs as nodes and dependencies as edges. Users can check the execution of
(yellow) when waiting for the termination of other jobs, (iii) Repeating (grey) when the job is (re)scheduled until the entire workflow stops or expires, (iv) Terminated (blue) or Errored (red) when it terminates with no errors or with one or more errors, respectively.

TABLE 4. VFuse events and data component messages.
Job Dependencies: The execution of a job j may depend on the termination of one or more other tasks. In this case, the status of the job j will switch from Waiting to Ready when all previous tasks have been completed by at least one node of the network.
Job Repeating: VFuse allows users to define repeating jobs (marked with the status Repeating) to (re)schedule the same job (hence, reiterate some workflow activities) until the entire workflow stops or expires. In practice, when a repeating job terminates, its status never changes to Terminated. The same dependency rules also apply in this case.
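The status rules above can be summarized in a small sketch. The status names follow the text, while the function names, the data layout, and the choice that an error takes precedence over the Repeating status are assumptions made for illustration only.

```python
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    READY = "ready"
    REPEATING = "repeating"
    TERMINATED = "terminated"
    ERRORED = "errored"


def refresh_statuses(statuses, deps, completed):
    """Promote Waiting jobs to Ready once all their dependencies completed.

    statuses:  job name -> Status
    deps:      job name -> set of job names it depends on
    completed: names of jobs completed by at least one node
    """
    for job, status in statuses.items():
        if status is Status.WAITING and deps.get(job, set()) <= completed:
            statuses[job] = Status.READY
    return statuses


def on_job_finished(statuses, job, repeating, failed=False):
    # A repeating job never becomes Terminated: it is marked Repeating so
    # that it can be scheduled again until the workflow stops or expires.
    # ASSUMPTION: an error takes precedence over the Repeating status.
    if failed:
        statuses[job] = Status.ERRORED
    elif job in repeating:
        statuses[job] = Status.REPEATING
    else:
        statuses[job] = Status.TERMINATED
    return statuses


deps = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
statuses = {"a": Status.READY, "b": Status.WAITING, "c": Status.WAITING}
on_job_finished(statuses, "a", repeating=set())
refresh_statuses(statuses, deps, completed={"a"})
print(statuses)
```

After job a terminates, job b becomes Ready because its only dependency is satisfied, while job c keeps waiting for b.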

b: COMPUTING MODULE
The Computing module takes care of all aspects related to orchestrating workflows over the VFuse volunteer network and exchanging the computed results among nodes.
Workflow Orchestration: Upon submitting the workflow (see Section IV-B5.a), the Initiator transparently broadcasts a new EXECUTION_REQUEST message.As this message is also used to update a workflow already existing on the network, every peer receiving it compares the CID of the local copy of the workflow (if existing) with the information received and updates the Gossip and Execution queues.If the node receives the workflow for the first time, it adds it to both queues.
A workflow terminates in one of the following three cases: (i) all jobs of the workflow have been computed at least once, (ii) the TTL associated with the workflow expires, or (iii) the Initiator stops the workflow. In this last case, the system broadcasts a DROP_REQUEST message, which forces each node receiving the message to remove the given workflow from its working queues. This message is then broadcast until the TTL of the workflow runs out.
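A minimal in-memory sketch of how a peer might react to these two messages is given below. Only the message names EXECUTION_REQUEST and DROP_REQUEST and the CID comparison come from the text; the `Node` class, its method names, and the string identifiers are hypothetical.

```python
class Node:
    """Hypothetical in-memory model of a node's two working queues."""

    def __init__(self):
        self.gossip_queue = {}     # workflow id -> (cid, workflow)
        self.execution_queue = {}  # workflow id -> (cid, workflow)

    def on_execution_request(self, workflow_id, cid, workflow):
        known = self.execution_queue.get(workflow_id)
        if known is not None and known[0] == cid:
            return "unchanged"  # same content identifier, nothing to update
        # First reception or updated workflow: (re)store it in both queues.
        self.gossip_queue[workflow_id] = (cid, workflow)
        self.execution_queue[workflow_id] = (cid, workflow)
        return "added" if known is None else "updated"

    def on_drop_request(self, workflow_id):
        # The Initiator stopped the workflow: remove it from both queues.
        self.gossip_queue.pop(workflow_id, None)
        self.execution_queue.pop(workflow_id, None)


node = Node()
first = node.on_execution_request("wf-1", "cid-A", {"jobs": ["j1"]})
again = node.on_execution_request("wf-1", "cid-A", {"jobs": ["j1"]})
changed = node.on_execution_request("wf-1", "cid-B", {"jobs": ["j1", "j2"]})
node.on_drop_request("wf-1")
print(first, again, changed)
```

Because a CID is derived from the content it identifies, comparing CIDs is enough to tell whether the received workflow differs from the local copy.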

Workflow and Job Selection: A VFuse node selects the next workflow to compute by choosing a candidate in the Execution queue. The final choice depends on three factors: (i) whether the workflow has at least one job in the status Ready, (ii) whether the workflow has at least one job that has not been selected for execution by another node, and (iii) the associated priority (see next paragraph). In particular, this value is proportional to the amount of reward owned by the Initiator of the workflow.
After selecting the workflow, the node chooses a job uniformly at random among all Ready jobs not present in the Selected Jobs List. This list keeps track of all jobs that will be run in the network. Specifically, each node broadcasts information about the jobs it is about to process with the

The number of NFTs (i.e., reward) owned by the Initiator impacts the priority of the requested workflow. This design choice gives workflows of users with higher rewards a higher probability of being scheduled before workflows submitted by users with lower rewards.
In other words, the more a user contributes to computing jobs on a VFuse network, the sooner its submitted workflow will be completed.To avoid the starvation of workflows requested by users with low rewards, we introduced a scaling factor based on the waiting time of the workflow, i.e., the time that a workflow has been waiting in the Execution queue before being selected.
Hence, the priority associated with each workflow takes into account (i) how much the Initiator of the requested workflow has already contributed to the volunteer network (i.e., amount of rewards) and (ii) the waiting time of the workflow, according to a rule defined over the following quantities:
• v is a volunteer node;
• w is a workflow;
• E_v is the Execution queue of a node v;
• u_w is the user that has submitted the workflow w;
• R(u_w) is the number of NFTs (reward) owned by the user u_w;
• t is the current scheduling time, such that t > t_w;
• t_w is the time when the client v has received the workflow w.
The priority function designed in this way balances the contribution previously given to the network (in terms of computing capabilities) by the Initiator with the time they have to wait to see their requested workflow completed. If (the Initiators of) two different workflows have the same reward, the workflow with a higher waiting time will be computed first. Clearly, if a user with a low reward submits a workflow and there is no competition, then the requested workflow will be immediately computed.
Each VFuse node computes the priority of each workflow in the status Ready in its Execution queue before the selection phase.
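The selection procedure can be sketched as follows. The priority rule used here, reward scaled by waiting time, is an assumption chosen only because it is consistent with the properties stated above (equal rewards are ordered by waiting time, and waiting mitigates starvation); it should not be read as the formula adopted by VFuse. The sketch also picks the highest-priority workflow deterministically, which simplifies the probabilistic wording of the text. All class and function names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class QueuedWorkflow:
    workflow_id: str
    initiator_reward: int   # R(u_w): NFTs owned by the Initiator
    received_at: float      # t_w: when this node received the workflow
    ready_jobs: list = field(default_factory=list)


def priority(w, now):
    # ASSUMPTION: reward scaled by waiting time. The "+ 1" keeps workflows
    # of users without rewards from being stuck at priority zero.
    return (1 + w.initiator_reward) * (now - w.received_at)


def select_job(execution_queue, selected_jobs, now, rng=random):
    """Pick the highest-priority workflow, then a random unselected Ready job."""
    candidates = []
    for w in execution_queue:
        free = [j for j in w.ready_jobs if (w.workflow_id, j) not in selected_jobs]
        if free:
            candidates.append((priority(w, now), w, free))
    if not candidates:
        return None
    _, best, free = max(candidates, key=lambda c: c[0])
    return best.workflow_id, rng.choice(free)


queue = [
    QueuedWorkflow("low", 1, 0.0, ["a"]),
    QueuedWorkflow("high", 5, 0.0, ["b", "c"]),
    QueuedWorkflow("old", 1, -100.0, ["d"]),
]
print(select_job(queue, selected_jobs=set(), now=10.0))
```

In this example the workflow that has waited longest is selected even though its Initiator holds few rewards, which is the anti-starvation effect the waiting-time factor is meant to provide.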
Job Results: A job may terminate because (i) its computation naturally ends by producing the desired result(s), (ii) its computation errored, or (iii) its computation time exceeded the maximum running time offered by the volunteer node.

Each VFuse node that has run a job informs the entire network of the computed result(s) (or obtained error(s)), broadcasting an EXECUTION_RESPONSE message and pinning them on the shared IPFS cluster. When a VFuse node receives the result(s) related to a job it has already computed, it compares its locally stored result(s) with the one(s) received. If they differ, the node issues a WARNING_RESULT message to inform the Initiator that there has been a divergence in the job result(s). This protocol guarantees that each workflow will eventually end (with the expected results), since the same job will not be scheduled again for the same client. Consequently, it allows two or more peers to compute results for the same task if they scheduled the same job concurrently.
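The comparison step can be sketched as follows, assuming results are JSON-serializable. Only the message name WARNING_RESULT comes from the text; the function names, the message fields, and the use of a SHA-256 digest over canonical JSON are illustrative assumptions.

```python
import hashlib
import json


def fingerprint(result):
    # Canonical JSON so that equal results always hash to the same digest.
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def on_execution_response(local_results, job_id, received_result):
    """Return the message a node would emit after receiving a peer's result.

    local_results maps job ids to results this node computed itself.
    """
    if job_id not in local_results:
        return None  # nothing to compare against
    if fingerprint(local_results[job_id]) != fingerprint(received_result):
        return {"type": "WARNING_RESULT", "job": job_id}
    return None


local = {"job-1": {"word": "the", "count": 42}}
agree = on_execution_response(local, "job-1", {"count": 42, "word": "the"})
differ = on_execution_response(local, "job-1", {"word": "the", "count": 41})
unknown = on_execution_response(local, "job-2", {"word": "a", "count": 1})
print(agree, differ, unknown)
```

Comparing digests rather than raw values keeps the check independent of key ordering; it only works as intended for deterministic jobs, since any randomness in a job would produce spurious warnings.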

Execution Backend: Each VFuse node offers its computational capabilities by providing a pool of Web Workers running different computational backends developed using WebAssembly. In more detail, Web Workers are threads running in a private scope that allow the code to be executed in a sandbox. Hence, they ensure that a VFuse client cannot be damaged by any malicious code. The Computing module automatically runs the Web Worker associated with the specific implementation language and asynchronously communicates with it via JavaScript promises. Web Workers directly enable this module to build the workflow, locally execute a workflow, and run jobs. These operations are piloted using the specific messages detailed in Table 5. It is worth noting that even though VFuse is designed to be executed in a web browser, it can also be run, without any changes, in other computing environments, such as desktop machines or servers running Node.js.

The Identity module is responsible for storing users' personal information within the browser using the Events and Data Component. Specifically, this module creates a new user environment inside the browser cache when the node is initialized. Table 6 lists the environment properties (which users can personalize), specifying the parameters for (i) entering a specific VFuse network, (ii) defining the computing capabilities offered to the network, and (iii) tuning the network performance for receiving and sending data.

occurrences for each word encountered (map function). Results are then collected and combined to compute the word with the highest number of occurrences (reduce function). Figure 4 depicts the resulting job DAG.

2) IMPLEMENTATION DETAILS
The implementation of the most common word in a text is shown in Listing 1.
The workflow starts by retrieving the size of the file using a fetch (Lines 25-27) to evenly split the text among the jobs (Lines 28-40). Each job runs a map function (Lines 1-18), implementing the core logic of the algorithm. This function receives as input the URL of the file and two byte indices delimiting the chunk to process. The map function uses the VFuse API getDataFromURL() to retrieve the associated chunk based on the input indices and computes how many times each word occurs, avoiding truncated words. Line 41 adds each job to the VFuse workflow and includes it in the group map_group. The reduce function collects the results calculated by these jobs (Line 43), finally computing the word with the maximum number of occurrences (Lines 20-23). It is worth highlighting how VFuse transparently handles the data dependencies between the map and reduce jobs. This mechanism is implemented via job groups; in this example, the function reduce waits for all jobs included in groups whose names match the regex ''^map_''.
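To make the chunking logic concrete, the following Python sketch reproduces the map and reduce steps on an in-memory string. It is an illustration under simplifying assumptions, not the code of Listing 1: the helper names (`split_ranges`, `word_boundary`, `map_chunk`, `reduce_counts`) are hypothetical, indices are character offsets rather than bytes, and the jobs are run sequentially instead of being submitted to a VFuse network.

```python
from collections import Counter


def split_ranges(size, n_jobs):
    """Evenly split [0, size) into contiguous (start, end) ranges."""
    step = max(1, -(-size // n_jobs))  # ceiling division, at least 1
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def word_boundary(text, i):
    """Move i forward until it no longer falls inside a word."""
    if i <= 0 or i >= len(text) or text[i - 1].isspace():
        return i
    while i < len(text) and not text[i].isspace():
        i += 1
    return i


def map_chunk(text, start, end):
    """Count the words that begin in [start, end), never truncating a word."""
    return Counter(text[word_boundary(text, start):word_boundary(text, end)].split())


def reduce_counts(partial_counts):
    """Merge per-chunk counters and return the most frequent (word, count)."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total.most_common(1)[0]


text = "the cat and the dog and the bird"
partials = [map_chunk(text, s, e) for s, e in split_ranges(len(text), 3)]
most_common = reduce_counts(partials)
```

Because both chunk borders are shifted with the same rule, a word straddling a border is counted exactly once, by the chunk in which it begins.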

B. ML ALGORITHMS COMPARISON
Binary classification is a common task studied in Machine Learning (ML) that aims to classify the instances of a dataset into two groups [51]. Binary classification problems are common, and several models exist to address them. Each model has a different performance depending on the application domain; thus, it may produce optimal results on some data while performing poorly on other data. Consequently, a crucial task in ML is to identify the best model for a specific problem. In this use case, we compare several binary classifiers available in the Python library scikit-learn.

First, the workflow loads the required libraries (Lines 1-3) with the standard Python syntax. Then, as in the previous use case, the workflow reads the data from a URL and returns a string (Line 5). In lines 22-25, the execution of each classification model is delegated to a different job (whose behavior is described by the function eval(), lines 10-15). The workflow then waits for the termination of all jobs and gets the computed data as input to compare the performance of all algorithms via the function compare() (Lines 27 and 10-15). The data transfer between the two computing functions (i.e., eval() and compare()) is automatically provided by VFuse.

C. PI ESTIMATION USING MONTE CARLO
Monte Carlo methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. One of the classic applications of Monte Carlo methods is the estimation of Pi. Specifically, this method considers a square space with a circle inscribed and generates several random points within this space. The fraction of points falling within the circle approximates the ratio between the two areas, that is, Pi/4; hence, multiplying the number of points within the circle by four and dividing by the total number of points yields an approximation of Pi. The random nature of the Monte Carlo algorithm implies that the more points are generated, the more accurate the result tends to be. Adopting a distributed approach would allow the generation of hundreds of thousands of points, resulting in a more accurate value.
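The estimation can be sketched in a few lines of Python. This is an illustrative sketch rather than the code of Listing 3: the function names, the seeds, and the number of points per job are assumptions. For simplicity, it samples the unit square and tests membership in the quarter circle of radius one, which has the same area ratio (Pi/4) as a circle inscribed in a square.

```python
import random


def count_inside(n_points, seed):
    """One job: draw n_points in the unit square and count those in the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(job_results):
    """Combine (inside, total) pairs from all jobs: pi is roughly 4 * inside / total."""
    inside = sum(i for i, _ in job_results)
    total = sum(t for _, t in job_results)
    return 4.0 * inside / total


# Four independent "jobs", each with its own seed (values chosen arbitrarily).
results = [(count_inside(50_000, seed), 50_000) for seed in range(4)]
pi_hat = estimate_pi(results)
```

Since each job only reports two counters, partial results from any number of peers can be merged at any time, which is what makes the problem a natural fit for a repeating job.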

1) VFuse WORKFLOW
The workflow implemented by this VFuse application is based on a Repeating job, which never reaches the status Terminated (see Section IV-B5.a). Every time a node publishes new random points, the application reevaluates the estimation of Pi.

LISTING 3. Estimating the value of Pi using Monte Carlo (JavaScript).

2) IMPLEMENTATION DETAILS
The implementation of the Monte Carlo estimation of Pi is shown in Listing 3. The application sets as a repeating job the function getPoints() (lines 9-10), which generates random points within a given interval (lines 1-4). In the same way, the application assigns the function estimatePi() (lines 5-8) to a job (line 11) and declares the job as repeating (line 12).

D. SEQUENCE ALIGNMENT WITH SMITH-WATERMAN
One of the most common procedures in molecular biology is searching for similarities in protein and DNA sequences [52].
The established method to perform sequence alignment is the Smith-Waterman algorithm, based on the dynamic programming approach developed by Temple F. Smith and Michael S. Waterman. The algorithm computes optimal local alignments of two sequences, identifying the two sub-sequences with maximal similarity score. The sequence comparison is made using the segments of all possible lengths instead of the entire sequence. The Smith-Waterman algorithm first determines the scoring matrix, then performs a trace-back step on this matrix to generate the segments with the highest similarity score. Although effective, this algorithm is computationally expensive since it requires a number of operations proportional to the product of the lengths of the sequences. Therefore, searching for similarities in large data sets requires a huge amount of time.
The use of distributed computing in sequence alignment can drastically reduce the time required: the data set to match with a sequence can be split among different workers to find the most similar one.
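The two phases of the algorithm (filling the scoring matrix and tracing back) can be sketched as follows. This is a minimal Python illustration, not the code of Listing 4: the scoring parameters (match = 2, mismatch = -1, linear gap penalty = -1) are assumed values, and only one optimal alignment is reported.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Phase 1: fill the scoring matrix; scores are clamped at 0 (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Phase 2: trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, seg_a, seg_b = smith_waterman("XXACGTXX", "YYACGTYY")
```

The nested loops visit every cell of the matrix once, which makes the cost proportional to the product of the two sequence lengths, as noted above.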

1) VFuse WORKFLOW
The VFuse workflow assigns a portion of the sequence data set to each job. Upon receiving the data, each job computes the similarity between the string to match and the sequences included in its assigned data chunk and returns the sequence with the highest similarity score. The workflow waits for the results computed by all jobs and compares their outcomes to find the best matching sequence.

VII. CONCLUSION AND FUTURE WORK
The paradigm of (BB)VC has gained attention over the last years as a potential tool for allowing researchers and companies to access the enormous computing capabilities available over the Internet for solving high-demand computational problems.
In this work, we proposed VFuse, a novel BBVC architecture that (i) offers a ready-to-access network through web browsers, (ii) provides a multi-language programming environment thanks to WebAssembly, (iii) stimulates users' participation by providing a secure and transparent rewarding mechanism based on Blockchain technology, and (iv) specifies an innovative definition of users' participation via NFTs that guarantee the user ownership of computing results. We demonstrated the advantages of VFuse and its added value by comparing our platform with the most common BBVC platforms and discussing four example applications. A prototype of VFuse and the presented examples are freely available on GitHub.
Currently, we are working on developing the rewarding mechanism implemented with the Ethereum blockchain and IPFS. In future work, we plan to perform systematic experiments to assess the performance gain of our platform against other well-known systems. We also aim to introduce workflows with an associated deadline, whose importance could be reflected in the workflow priority and the rewarding strategy. Another interesting future direction is integrating VFuse into standard web pages to allow visitors of websites, such as online news and game sites, blogs, and social networks, to participate in a VFuse volunteer network by visiting some dedicated pages offered by the owner of the service. Future research should also consider designing a more sophisticated scheduling algorithm to allow a fairer job execution, including more advanced security mechanisms, extending the workflow definition by adding more programming constructs for enriching the programming model, and developing other programming backends supported by WebAssembly.

of faults and disconnections.
• Heterogeneity. The BBVC platform should consider that volunteer machines could have different hardware, operating systems, and performance.
• Programmability. Easiness of developing new tasks on the BBVC platform.
• Scalability. The BBVC platform must handle a growing amount of connections.
• Security. The code run by the platform should not harm the volunteer machine.
provides a simple mechanism to define JavaScript functions, including inline data and the number of required peers to complete the task.

Pando [34] is a tool born with the intent of leveraging the potential of VC for personal projects. Its programming model corresponds to a streaming version of the functional map operation: Pando applies a given function on a series of input values to obtain a series of results. Pando relies on the pull-stream design pattern to manage the input stream for functions, which are written in JavaScript and can be combined in Unix pipelines. The task deployment is based on a Node.js master server (Stream Lender) responsible for scheduling functions and collecting their results.
to store large data files and support blockchain operations. The main characteristic of IPFS is how content is identified. Rather than associating a location with a resource (like what happens with URLs), IPFS uses an immutable hash code, called Content Identifier (CID), to identify resources in the network. To allow dynamic resource addressing, IPFS provides the Inter-Planetary Name System (IPNS) that leverages a unique hash pointer targeting different CIDs when the content changes. Further, IPFS ensures data distribution and replication to guarantee availability and fault tolerance.

Architecture Insight: The VFuse architecture is designed as a fully-distributed application running over a volunteer network of computational nodes. The use of IPFS as a component of VFuse ensures the unique identification of inputs and jobs' outputs, giving our platform the power of content-addressed storage. To safeguard the indefinite persistence of data and workflows on the VFuse network, these can be pinned to one or more IPFS nodes. Pinning gives the programmer control over disk space and data retention and guarantees that the pinned resources are not deleted during IPFS garbage collection. Further, the use of the IPFS Cluster, a distributed application that works as a sidecar to IPFS peers, enables VFuse to allocate, replicate, and track pinned resources among multiple peers, hence guaranteeing data redundancy and availability without compromising the distributed nature of the IPFS network. Finally, IPFS empowers the exploitation of blockchain functionalities to implement a secure and trustworthy mechanism (see Section IV-B5.d).

• Ethereum [45] is a decentralized, open-source blockchain platform establishing a P2P network that securely executes and verifies smart contracts. Smart contracts are event-driven distributed programs stored on the blockchain that run when predetermined conditions are met. They allow participants to transact with each other without a trusted central authority. Transaction records on Ethereum are immutable, verifiable, and securely distributed across the network, giving participants full ownership and visibility into transaction data. Ethereum allows the creation of unique and indivisible tokens, called non-fungible tokens (NFTs). NFTs represent ownership of unique items, such as a piece of art, digital content, or media. Each NFT can only have one official owner at a time, and the Ethereum blockchain secures them (e.g., no one can modify the record of ownership or copy existing NFTs). In other words, NFTs embody an irrevocable digital certificate of ownership and authenticity for a given digital or physical asset.

Architecture Insight: VFuse exploits the Ethereum blockchain to implement a rewarding strategy in the volunteer network. Every time a node contributes to the computation of a job, it receives a reward in the form of a VFuse NFT, which guarantees the ownership of the produced digital asset (namely, the result of the computation). The number of NFTs collected by VFuse clients is then used to prioritize their submitted workflows. Section IV-B5.b and Section IV-B5.d detail this process. It is worth stressing that we did not adopt a VFuse currency given the volunteer nature of the network itself: users do not have to pay to use the network, but, at the same time, they are encouraged to offer their computing resources in exchange for a faster termination of their submitted workflows.
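The interplay between content-addressed results and per-artifact rewards can be pictured with a toy Python model. Everything in it is a simplifying assumption made for illustration: `content_id` uses a plain SHA-256 hex digest (real IPFS CIDs use a different, self-describing encoding), `RewardLedger` is an in-memory dictionary rather than an Ethereum smart contract, the first node to register an artifact keeps it, and `prioritize` simply sorts workflows by the token count of their submitter.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Toy content identifier: identical content always yields the same id."""
    return hashlib.sha256(data).hexdigest()


class RewardLedger:
    """In-memory stand-in for an NFT contract: one token per artifact, one owner each."""

    def __init__(self):
        self.owner_of = {}  # artifact id -> node id

    def mint(self, node_id, result: bytes):
        cid = content_id(result)
        # setdefault keeps the first owner: an existing token is never reassigned.
        self.owner_of.setdefault(cid, node_id)
        return cid

    def balance(self, node_id):
        return sum(1 for owner in self.owner_of.values() if owner == node_id)


def prioritize(workflows, ledger):
    """Order (workflow_id, submitter) pairs by the submitter's token count, highest first."""
    return sorted(workflows, key=lambda w: ledger.balance(w[1]), reverse=True)


ledger = RewardLedger()
ledger.mint("B", b"result-1")
ledger.mint("B", b"result-2")
ledger.mint("D", b"result-3")
order = prioritize([("w1", "D"), ("w2", "B"), ("w3", "A")], ledger)
```

In this toy run, the workflow submitted by node B is served first because B has contributed the most artifacts, while node A, which has contributed none, comes last.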

FIGURE 1. Interactions between nodes within the VFuse network in a typical execution flow. A new computing node C may enter the network by connecting with a VFuse bootstrap node (Step 0). A node A which submits a workflow (Step 1) becomes the Initiator of the requested computation. Each computing node of the network, such as B and D, receives the workflow, executes some jobs based on the associated priority (Step 2), stores the results locally (Step 3), gets a reward (Step 4), and pins the result to the IPFS cluster (Step 5). Eventually, each node broadcasts an update workflow message (Step 6).

FIGURE 3. Snapshots of the VFuse web client application (My Workflows page).
component embodies the middleman between users and the VFuse system. It offers users access to the network through a web application (implemented via the ReactJS framework, https://reactjs.org/) that exploits the API component to use the functionalities provided by VFuse. Specifically, a user who wants to join the volunteer network sends an HTTPS request to the VFuse central server, hosting the VFuse web application. Once the browser renders the application, the user can then join a specific VFuse volunteer network by specifying the required (i) bootstrap node, (ii) signal server, and (iii) pinning cluster. It is worth stressing that the VFuse web application runs all code client-side or via the P2P volunteer network. Hence, the central server's only task is to serve the VFuse application.

3) EVENTS AND DATA COMPONENT
The VFuse Events and Data Component offers a software interface to orchestrate communication across components and manipulate local and remote data via the Event Module and the Data Management Module, respectively. The Event Module controls the inter-component communication and network updates through events, which transmit information about jobs' status and data asynchronously. This module also handles the initialization of each node's workspace, comprising local data (such as the user profile, workflows, and settings), the Gossip and Execution queues (see Section IV-B4 and Section IV-B5.b), and local web workers for communicating and running jobs.

VOLUME 10, 2022

message SELECTED_JOBS. Upon receiving this message, each node adds the received job ids in the Selected Jobs List, attaching to each of them the time the message arrived and a Selection_TTL. When the TTL expires, the job ID is removed from the list. The node repeats the whole process until the maximum number of concurrent jobs is reached.

Workflow Priority: Users offering their computational capabilities to a VFuse network collect VFuse NFTs, which are saved into a public blockchain (see Section IV-B5.d).
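The bookkeeping of the Selected Jobs List mentioned above can be sketched in Python as follows. The sketch rests on several assumptions: the class and function names are hypothetical, time is passed in explicitly as a number so that the behavior is deterministic, and the rule used by `pick_jobs` (skip jobs that are already running locally or currently claimed by a peer) is only one plausible reading of the selection process, not the VFuse scheduler itself.

```python
class SelectedJobsList:
    """Job ids announced by peers, each kept until its time-to-live expires."""

    def __init__(self, selection_ttl):
        self.selection_ttl = selection_ttl
        self._arrived = {}  # job_id -> arrival time of the SELECTED_JOBS message

    def on_selected_jobs(self, job_ids, now):
        # Record (or refresh) the arrival time of every announced job id.
        for job_id in job_ids:
            self._arrived[job_id] = now

    def purge(self, now):
        # Drop every entry whose TTL has expired.
        expired = [j for j, t in self._arrived.items() if now - t >= self.selection_ttl]
        for job_id in expired:
            del self._arrived[job_id]

    def is_taken(self, job_id, now):
        self.purge(now)
        return job_id in self._arrived


def pick_jobs(candidates, selected, running, max_concurrent, now):
    """Choose jobs not claimed by peers, up to the free concurrency slots."""
    free_slots = max_concurrent - len(running)
    picked = []
    for job_id in candidates:
        if len(picked) >= free_slots:
            break
        if job_id not in running and not selected.is_taken(job_id, now):
            picked.append(job_id)
    return picked


selected = SelectedJobsList(selection_ttl=10)
selected.on_selected_jobs(["j1", "j2"], now=0)
early = pick_jobs(["j1", "j2", "j3", "j4"], selected, set(), 2, now=5)
late = pick_jobs(["j1", "j2", "j3"], selected, {"j3"}, 2, now=10)
```

Before the TTL expires, the node avoids the jobs claimed by its peers; once the entries have expired, those jobs become eligible again, so a job abandoned by a disconnected peer is not lost.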

TABLE 6. VFuse user profile properties, which can be configured in the setting panel of the web client application.

d: REWARD MODULE
The Reward module handles all the utilities concerning the rewarding mechanism provided by VFuse, such as the definition of smart contracts and the integration with IPFS to implement the VFuse NFTs.
The VFuse rewarding strategy is based on the concept of artifacts. Every time a node computes a job, it stores its result on IPFS, which will assign to it a unique and immutable CID. We consider this identifier (and, hence, the result written on the IPFS cluster) as an artifact produced by a user's contribution to the computation of a workflow. A VFuse peer is rewarded with a VFuse NFT per artifact. Specifically, an NFT is a smart contract [50] assigning the ownership of a particular thing (in our case, a file on IPFS) to a specific user (a VFuse node).

V. EXAMPLE WORKFLOWS
This section presents four use cases addressed with our system, describing the design and the workflow of each VFuse application. Each listing highlights the use of the VFuse API, leaving out the implementation details of the specific algorithms. Readers interested in the complete version of the code can refer to the VFuse GitHub repository.

A. MOST COMMON WORD IN A TEXT
Finding the word which occurs the most in a text is a variation of the famous word count problem. This class of problems is often exploited to demonstrate the benefits of the distributed MapReduce paradigm, given their embarrassingly parallelizable nature.

1) VFuse WORKFLOW
The implemented VFuse workflow assigns to each job a different portion of the file that returns the number of

FIGURE 4. Job DAG of the WordCount example.

LISTING 1. Most common word in a text (JavaScript).

Specifically, this VFuse Python application trains and tests five different binary classifiers using the UCI PIMA Indian Diabetes dataset to predict whether a person has diabetes or not using the medical attributes provided. The algorithms used are Linear Discriminant Analysis (LDA), Decision tree classifier (specifically, CART), K-Neighbors Classifier (KNN), Naive Bayes (NB), and Support Vector Machine (SVM). All algorithms were run using their default parameter configurations.

1) VFuse WORKFLOW
The implemented VFuse application assigns each binary classifier to a separate job. Each job receives the training and testing data sets and returns the model's accuracy. Each algorithm is evaluated using 10-fold cross-validation with the same random seed to ensure consistency when splitting the training data.
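The role of the shared random seed can be illustrated with a dependency-free Python sketch of k-fold cross-validation. It is a stand-in written for illustration, not the scikit-learn code used in Listing 2: the functions `kfold_indices` and `cross_val_accuracy`, the seed value, and the majority-class baseline are all assumptions, and the sketch requires at least k samples.

```python
import random


def kfold_indices(n_samples, k=10, seed=7):
    """Yield (train_idx, test_idx) pairs; the same seed always gives the same folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def cross_val_accuracy(fit_predict, X, y, k=10, seed=7):
    """Mean accuracy of fit_predict(X_train, y_train, X_test) over the k folds."""
    scores = []
    for train, test in kfold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train, y_train, X_test):
    """Trivial model: always predict the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return [label] * len(X_test)


X = list(range(50))
y = [0] * 30 + [1] * 20
accuracy = cross_val_accuracy(majority_baseline, X, y)
```

Because every job derives its folds from the same seed, all classifiers are scored on identical train/test partitions, so differences in accuracy reflect the models rather than the split.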

2) IMPLEMENTATION DETAILS
The implementation of the comparison of ML models is shown in Listing 2.

2) IMPLEMENTATION DETAILS
The implementation of the standard Smith-Waterman for sequence alignment is shown in Listing 4.
First, the workflow loads the data set via the function getDataFromURL() (line 23). Then, it splits the data set and assigns each chunk to a different job (lines 28-33), which computes the Smith-Waterman algorithm on its input data (function sw(), lines 15-17). The functions similarity() (lines 2-5), fill_matrix() (lines 7-9), and trace_back() (lines 11-13) are three auxiliary functions used by the matching algorithm. Finally, the workflow waits for the termination of all jobs and gets the computed data as input to find the sequence with the maximum alignment score via the function compare() (lines 19-21).

We performed a preliminary evaluation of the VFuse prototype, experimenting with the Most Common Word example described in the previous section. Specifically, we benchmarked and analyzed the running time of computing the most frequent word among 64 equally-sized files for a total

FIGURE 5. Running time and relative speedup of the P2P BBVC VFuse platform compared against the centralized BBVC system Pando.

TABLE 1. Main characteristics of the BBVC platforms. N/C stands for not clarified.

TABLE 2. Answers of VFuse to the challenges of BBVC.
• designing a modular and expandable architecture that transparently exploits the underlying technologies;
• supporting a distributed volunteer network built on P2P communications, storage, and Blockchain technologies.

A. VFuse NODE TYPES AND THEIR INTERACTION
A VFuse node may be either a computing, bootstrap, or pinning node. Specifically, a computing node is a VFuse volunteer device that offers and may require network computing capabilities through a web browser. Being a P2P network, VFuse requires the presence of bootstrap nodes to allow new devices to join the network. These special nodes are responsible for the discoverability of the VFuse network as well as for the initialization of new connections. In particular, a VFuse bootstrap node runs the same software stack as computing nodes plus a WebRTC signal server (publicly accessible over the Internet and hosted on a NodeJS server) that allows direct connections with other peers, such as browser-to-browser communication. It is worth noting that every computing node may also act as a bootstrap node to

create, define, submit, and stop workflows inside interactive web notebooks, which enable the (remote/local) verification of workflow executions.

API Component. This component defines the synchronous and asynchronous function interfaces to let users access the VFuse platform.

Network Component. This component is responsible for providing the VFuse network communication protocol and data management (i.e., storage) functionalities by exploiting Libp2p and IPFS over HTTPS. It further

TABLE 3. VFuse API for developing workflows within the VFuse IDE panel.

Table 4
delete, and update data and workflows. Specifically, this module exploits the IPFS Mutable File System (MFS) to manage local user information, such as preferences, workflows, execution data, and publishing queues. Remote data, such as submitted workflows, shared data, and job results, are handled using the IPFS network API.