Machine Learning on Cloud With Blockchain: A Secure, Verifiable and Fair Approach to Outsource the Linear Regression

Linear Regression (LR) is a classical machine learning algorithm with many applications in cyber physical social systems (CPSS), which shape and simplify the way we live, work, and communicate. This paper focuses on data analysis for CPSS when LR is applied. The training process of LR is time-consuming because it involves complex matrix operations, especially when the training dataset in the CPSS is large. Thus, enabling devices to perform the training process of LR efficiently is of significant importance. To address this issue, we present a secure, verifiable and fair approach to outsourcing LR to an untrustworthy cloud server. In the proposed scheme, computation inputs and outputs are obscured so that sensitive information is protected against the cloud server. Meanwhile, the computation result returned by the cloud server is verifiable. Fairness is guaranteed by the blockchain, which ensures that the cloud server gets paid only if it correctly performed the outsourced workload. Based on the presented approach, we implemented a fair, secure outsourcing system on the Ethereum blockchain. We analyze the presented scheme both theoretically and experimentally, and the results indicate that the scheme is valid, secure and efficient.


I. INTRODUCTION
Cyber physical social systems (CPSS), as an emerging paradigm, provide efficient, convenient, and personalized services that benefit human lives. Machine Learning (ML) techniques have been widely implemented in CPSS to provide new predictive models for large-scale data analysis in various applications. Linear Regression (LR) is a classical supervised learning algorithm, which is widely used to establish the relationship between the target variable and the input variable based on a trained model. The training process of linear regression involves matrix multiplications and a matrix inversion, which are time-consuming operations. Moreover, in the age of big data, LR is often applied to a large-scale training dataset, which requires computation power that individuals cannot afford. Thus, how to enable devices to efficiently perform the training process of linear regression is a critical problem.
In the age of cloud computing, outsourcing the training process to an untrusted cloud can be an alternative solution. Cloud computing provides fast computing and data storage services over the internet. With cloud computing, users can avoid the upfront cost and complexity of owning and maintaining their own hardware, and instead simply pay for what they use. Although outsourcing the heavy workload to a cloud server has many advantages, it also brings some new challenges [1,2]. First, the outsourced data might include private information such as patients' health records, which should not be leaked to a public cloud. How to enable the cloud to perform the computation on privacy-preserved input is a critical challenge. Second, the cloud server may return invalid computation outputs, either deliberately or unintentionally. The incorrect results might be caused by a software bug, a malicious attack on the cloud, or a financial incentive to save computation power. Thus, how to enable the outsourcer to detect misbehavior of cloud servers is another challenge. Third, if the cloud server performs the computation task before the outsourcer pays the service fee, the outsourcer might not pay after receiving the results. If the outsourcer pays the service fee first, the cloud might not conduct the computation and might return random, invalid results. Thus, how to guarantee fairness for both the cloud server and the outsourcer is the last challenge. As far as we know, no existing research achieves secure, verifiable, and fair outsourcing for linear regression.
To address the above challenges, in this article we study how to safely run a linear regression model on an untrustworthy cloud in a way that guarantees fairness for both parties. Specifically, we propose a new technique based on elementary transformations to obscure the computation input and output. To achieve fairness, we take advantage of the decentralized, traceable nature of the blockchain and employ it as the middleman, which verifies the calculation results and guarantees fairness. Notice that the verification process of secure outsourcing schemes often involves private data of the outsourcer, while the blockchain is a public ledger whose data anyone can access. Thus, we propose a verification method that does not involve any private information (i.e., the proposed outsourcing scheme is publicly verifiable). Based on the designed scheme, we implement the fair, secure outsourcing system on the Ethereum blockchain [3], which includes a verification system and a payment system. We develop the smart contract as well as the graphical user interface and introduce the implementation details. To evaluate the presented scheme, we theoretically analyze its correctness, security and efficiency. We also perform experiments to assess its efficiency.
The rest of the paper is organized as follows: Section II introduces some essential preliminaries for the proposed scheme. Section III provides the system model. Section IV describes the design rationale, the generic framework and the detailed scheme, and analyzes the correctness, security and efficiency of the presented scheme. Section V introduces the implementation of the developed system in detail. Section VI evaluates the practical performance of the proposed scheme through experiments. In Section VII, we discuss some possible applications of our proposed algorithm. Section VIII overviews the related work. Finally, Section IX concludes the paper.

II. PRELIMINARIES
In this section, we introduce some background knowledge of Linear Regression, Blockchain and Ethereum smart contract.

A. Linear Regression
Linear regression is a machine learning model used for regression forecasting. It predicts a numerical value as accurately as possible by learning a linear model. It has a variety of applications, such as predicting the fuel efficiency of a car from its cylinders, displacement, horsepower, weight, etc. We are given a training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^{1 \times m}$ and $y_i \in \mathbb{R}$. A typical linear regression model in machine learning is defined as:
$y = X\omega$ (1)
where $y$ is the vector of the outputs $y_i$ in $D$, $X$ is the matrix whose rows are the inputs $x_i$, and $\omega$ is the coefficient vector. In this model, each vector $x_i$ in $X$ is taken as an input and the scalar $y_i$ is the output corresponding to $x_i$. We view $\omega$ as a set of weights that determines the degree of prediction accuracy. The optimal coefficient $\omega$ is calculated as:
$\omega = (X^T X)^{-1} X^T y$ (2)
To evaluate eq. (2), we need to execute a matrix inversion and two matrix multiplications. In the age of big data, the scale of the training set keeps growing. A client, especially a resource-constrained one, is not able to conduct such expensive computation locally.
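As a minimal sketch of eq. (2), the following Python code computes the optimal coefficient vector with NumPy. The function name `fit_linear_regression`, the toy sizes and the noise-free data are illustrative assumptions, not part of the proposed scheme.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return omega = (X^T X)^{-1} X^T y for a full-column-rank X."""
    gram = X.T @ X                      # first matrix multiplication
    R = np.linalg.inv(gram) @ X.T       # matrix inversion and second multiplication
    return R @ y                        # cheap matrix-vector product

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # toy data: 50 samples, 3 features
true_omega = np.array([2.0, -1.0, 0.5])
y = X @ true_omega                      # noise-free targets
omega = fit_linear_regression(X, y)
```

The matrix `R` is the expensive part of the computation; the final product with `y` is comparatively cheap.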

B. Blockchain and Ethereum Smart Contract
Since Nakamoto first presented the Bitcoin system in 2008 [4], blockchain has attracted extensive attention from many researchers and enterprises. The blockchain is a decentralized shared ledger composed of blocks linked in chronological order, which uses cryptography to guarantee tamper-resistance, traceability and unforgeability. In Bitcoin, each block is composed of a block header and a block body. The block header describes the information of the block, including Block Version, Time Stamp, Nonce, Parent Block Hash, Difficulty and Merkle Tree Root Hash. The block body is the set of transactions in the block.
Szabo first proposed the concept of the smart contract in 1995 [5]. A smart contract is a digital protocol intended to propagate, verify or execute a contract in a computerized way. Ethereum is the first platform that permits developers to deploy their own smart contracts [3]. It provides a smart contract programming language named Solidity and a smart contract execution environment named the Ethereum Virtual Machine (EVM). The EVM is the core innovation of Ethereum and is Turing-complete software that runs on the Ethereum network. Smart contract developers write smart contract code in Solidity and deploy it on the Ethereum blockchain, where the smart contracts are saved in a block. A smart contract is executed automatically once its specific trigger condition is met. Smart contract execution results are verified by all Ethereum nodes and stored on the Ethereum blockchain.

III. SYSTEM MODEL AND DEFINITIONS
In this section, we first provide the system model. Then, we describe the security definitions.

A. System Model
Fig. 1 shows the system model. As we can observe, a secure and fair outsourcing scheme includes three entities: the client C, the cloud server CS and the fair payment platform FPP. C cannot carry out heavy computation tasks on its computation-constrained devices and thus outsources them to CS. CS is an untrustworthy entity that offers computation services. FPP ensures the fairness of the transaction between C and CS. FPP has two subsystems: a verification system and a payment system. The verification system is used to verify results from CS, while the payment system is used to ensure the fairness of payment.
The workflow of the proposed scheme is as follows. C needs to hand a computation task F(x) to CS. First, C blinds the input x into x′ and uploads the computation task F(x′) to FPP. Meanwhile, C pays the service fee to FPP. CS accepts F(x′) on FPP and makes a deposit to FPP. Then, CS computes F(x′) and submits the result R′ to FPP. FPP verifies R′ from CS. If R′ is an invalid result, FPP transfers the service fee and the deposit to C. Otherwise, FPP transfers the service fee and the deposit to CS and sends R′ to C. After receiving R′ from FPP, C recovers the real result R from R′.

B. Security Definitions
We introduce some security definitions for secure outsourcing computation, including framework, privacy, checkability and efficiency. Researchers have given similar definitions of the security properties of secure outsourcing computation [6,7], covering security, verifiability and efficiency. Following these works, we summarize that a secure outsourcing algorithm satisfies the following properties:
Definition 1: A secure outsourcing computation algorithm SOC = (KeyGen, ProbGen, Compute, Verify, Recover) consists of five algorithms, defined as follows.
• KeyGen(F, λ) → (PK, SK): Given the security parameter λ, the randomized KeyGen algorithm generates a public key PK, which is used to encode the target function F, and a secret key SK, which is used to obfuscate the computation inputs.
• ProbGen_SK(x) → (σ_x, τ_x): Using the secret key SK, the ProbGen algorithm encodes the function input x as a public value σ_x, which is submitted to the server, and a secret value τ_x, which is kept private by the client.
Definition 2 (Privacy [6]): Privacy requires that the server cannot obtain any sensitive information from the encoded input/output of the client. We consider an experiment in which the adversary A is able to query an oracle on any input it desires. The oracle PubProbGen_SK(x) executes ProbGen_SK(x) to generate (σ_x, τ_x) and returns only the public part σ_x.
For a secure outsourcing computation algorithm SOC, we denote the advantage of an adversary A in this experiment by $Adv_A(SOC, \lambda)$. We say that SOC is private if, for any probabilistic polynomial-time adversary A,
$Adv_A(SOC, \lambda) \le negli(\lambda)$
where negli() is a negligible function of its input.
Definition 3 (α-Efficient [7]): A pair of algorithms (C, CS) is considered to be an α-efficient execution of an algorithm A if (1) the client and the cloud server correctly execute the algorithms and (2) for any input x, the execution time of $C^{CS}(x)$ is less than or equal to an α-multiplicative factor of the execution time of A(x).
Definition 4 (β-checkable [7]): A pair of algorithms (C, CS) is considered to be a β-checkable execution of an algorithm A if (1) the client and the cloud server correctly execute the algorithms and (2) for any input x, if a malicious server CS deviates from its prescribed functionality during the execution of $C^{CS}(x)$, C will catch the error with probability greater than or equal to β.

IV. PROPOSED SCHEME
First, we present the design rationale. Then, we describe our proposed scheme in detail.

A. Design Rationale
Our idea is to devise a novel scheme which allows the client to securely run a linear regression model on a cloud server. According to eq. (2), the matrix inversion and the matrix multiplications are the most time-consuming operations. Thus, we outsource $(X^T X)^{-1} X^T$ and leave the matrix-vector product to be calculated locally. To maintain the confidentiality of the inputs, we apply a series of elementary transformations to X. By doing so, the position and the value of each element in $X$ and $X^T$ are fully obscured. To guarantee the fairness of the outsourcing scheme, we develop a fair payment platform based on the blockchain. The client uploads computation tasks to the platform and the cloud server accepts computation tasks from the platform. The blockchain verifies the outputs calculated by the cloud server and guarantees fairness. However, the blockchain is a public ledger whose data anyone can access. Thus, we design a verification mechanism which does not involve any private information of the client. In other words, the proposed outsourcing scheme is publicly verifiable.

B. Detailed Scheme
Algorithm 1 shows our detailed proposed scheme EFP-SOLR. To protect the privacy of $X$ and $X^T$, the client uses $2k$ ($3 < k \ll \min(m, n)$) $n \times n$ elementary transformation matrices to blind $X$ and $X^T$. These elementary transformation matrices perform the following three types of operations on a matrix:
• Multiplication: A multiplication operation multiplies the i-th row (resp. column) of a matrix by a non-zero scalar.
• Permutation: A permutation operation exchanges the locations of two rows (resp. columns) of a matrix.
• Addition: An addition operation adds the j-th row (resp. column) multiplied by a non-zero scalar to the i-th row (resp. column).

Algorithm 1 The EFP-SOLR algorithm
Input: $X \in \mathbb{R}^{m \times n}$, a large-scale matrix; $X^T \in \mathbb{R}^{n \times m}$, the transpose of $X$; $y \in \mathbb{R}^{m \times 1}$, a vector.
Output: $\omega \in \mathbb{R}^{n \times 1}$, a vector such that $\omega = Ry$.
Step 5. Recover(SK, R′, y) → ω:
• Client calculates $R$ with the secret key $SK_P$ as: $R = P_1 P_2 \cdots P_k R'$
• Client calculates the real output $\omega$ with $R$ as: $\omega = R y$
These elementary transformation matrices are invertible and their inverses are easy to calculate. In our scheme, $P_1$ and $Q_1$ are the elementary matrices of Multiplication operations. The client chooses $2n$ random non-zero scalars $p_1, p_2, \ldots, p_n, q_1, q_2, \ldots, q_n$ and constructs the following elementary transformation matrices:
$P_1 = \mathrm{diag}(p_1, p_2, \ldots, p_n), \quad Q_1 = \mathrm{diag}(q_1, q_2, \ldots, q_n)$
$P_2$ and $Q_2$ are the elementary matrices of Permutation operations. The client randomly generates a permutation $\pi_1$ and constructs $P_2$ as:
• For each row $i$ in $P_2$, the value of the $\pi_1(i)$-th element is 1 and the value of the other elements is 0.
For a 4×4 matrix, for instance, $\pi_1$ is a permutation of $\{1, 2, 3, 4\}$, and every row and every column of the resulting $P_2$ contains exactly one 1. The construction of $Q_2$ is similar to that of $P_2$. $P_3, \ldots, P_k, Q_3, \ldots, Q_k$ are the elementary matrices of Addition operations. The client chooses $2k - 4$ random scalars $r_1, r_2, \ldots, r_{2k-4}$ and constructs $P_3$ as:
• Set the value of a randomly chosen element which is not on the main diagonal to $r_1$.
• Set the value of each element on the main diagonal to 1.
• Set the values of the other elements to 0.
For example, we use a 4×4 matrix to show the construction of $P_3$. Assuming $r_1 = 3$ and that the chosen off-diagonal element is in row 1, column 3, $P_3$ is constructed as follows:
$P_3 = \begin{pmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
The constructions of $P_4, \ldots, P_k, Q_4, \ldots, Q_k$ follow the same logic as $P_3$. The client then keeps these elementary transformation matrices as secret keys. To blind the computation inputs $X$ and $X^T$, the client computes $X_1$ and $X_2$ as in eq. (5) and eq. (6). Note that eq. (5) is calculated from left to right, while eq. (6) is calculated from right to left. Then, the client uploads the computation task $F(X_1, X_2)$ and pays the service fee to the fair payment platform. The cloud server accepts $F(X_1, X_2)$ on the fair payment platform and makes a deposit to the fair payment platform. Then, the cloud server performs the computation task as in eq. (7). After solving $F(X_1, X_2)$, the cloud server submits the result $R'$ to the fair payment platform.
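The key construction and blinding steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`scaling_matrix`, `permutation_matrix`, `addition_matrix`, `generate_key`), the sampling range of the scalars, the 0-based indexing, and the multiplication order $X_1 = X P_1 \cdots P_k$, $X_2 = Q_1 \cdots Q_k X^T$ are our own choices and are not fixed by the text above.

```python
import numpy as np

def scaling_matrix(scalars):
    """Multiplication type: diagonal matrix of non-zero scalars (P_1, Q_1)."""
    return np.diag(scalars)

def permutation_matrix(perm):
    """Permutation type: row i has a 1 in column perm[i] (0-based indices)."""
    n = len(perm)
    P = np.zeros((n, n))
    P[np.arange(n), perm] = 1.0
    return P

def addition_matrix(n, i, j, r):
    """Addition type: identity plus the scalar r at off-diagonal position (i, j)."""
    assert i != j
    P = np.eye(n)
    P[i, j] = r
    return P

def generate_key(n, k, rng):
    """Return a list of k elementary n x n matrices (hypothetical key format)."""
    keys = [scaling_matrix(rng.uniform(1.0, 2.0, size=n)),   # scalars in [1, 2) are non-zero
            permutation_matrix(rng.permutation(n))]
    for _ in range(k - 2):
        i, j = rng.choice(n, size=2, replace=False)          # two distinct indices
        keys.append(addition_matrix(n, i, j, rng.uniform(1.0, 2.0)))
    return keys

rng = np.random.default_rng(1)
m, n, k = 6, 4, 4
X = rng.normal(size=(m, n))
P_keys = generate_key(n, k, rng)
Q_keys = generate_key(n, k, rng)

X1 = X.copy()
for P in P_keys:              # left to right: X P_1 P_2 ... P_k
    X1 = X1 @ P
X2 = X.T.copy()
for Q in reversed(Q_keys):    # right to left: Q_1 Q_2 ... Q_k X^T
    X2 = Q @ X2
```

Here `addition_matrix(4, 0, 2, 3.0)` reproduces the 4×4 example with the scalar 3 in row 1, column 3 (1-based).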
On receiving $R'$ from the cloud server, the fair payment platform chooses a random vector $r \in \mathbb{R}^{1 \times n}$ and calculates $V_1$ and $V_2$ as in eq. (8) and eq. (9). Then the fair payment platform inspects the validity of $R'$ by checking whether eq. (10) holds. If the result $R'$ is valid, the fair payment platform transfers the service fee and the deposit to the cloud server and stores the result $R'$. Otherwise, the fair payment platform transfers the service fee and the deposit to the client. The client downloads the valid result $R'$ from the fair payment platform and recovers the result $\omega$ as $\omega = P_1 P_2 \cdots P_k R' y$.
Remark 1: Note that we use elementary transformation matrices to illustrate the elementary transformations. In fact, when implementing the scheme, we apply the elementary transformations directly to the matrix instead of multiplying the matrix by an elementary transformation matrix. The reason is that multiplying the input matrix by an elementary transformation matrix produces many unnecessary scalar multiplications. For example, if we multiply the input $X$ by $P_3$ in eq. (15), the elements of $X$ are multiplied by the elements 0 and 1 in $P_3$. These scalar multiplications are unnecessary and time-consuming. Thus, we directly add the elements of the first column multiplied by three to the elements of the third column.
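A minimal Python sketch of the compute, verify and recover steps is given below. It assumes that the check compares $r X_2 X_1 R'$ with $r X_2$, which is consistent with the parameters used in verification but is our reading rather than a quotation of eq. (8) to eq. (10). The triangular matrices `P` and `Q` are stand-ins for the secret products of elementary matrices, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 8, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

# Stand-ins for the secret key products; triangular with a positive
# diagonal, hence invertible (an assumption made for brevity).
P = np.triu(rng.uniform(1.0, 2.0, size=(n, n)))
Q = np.tril(rng.uniform(1.0, 2.0, size=(n, n)))

X1 = X @ P        # blinded X
X2 = Q @ X.T      # blinded X^T

def compute(X1, X2):
    """Cloud server: R' = (X2 X1)^{-1} X2."""
    return np.linalg.solve(X2 @ X1, X2)

def verify(X1, X2, R_prime, rng, tol=1e-8):
    """Platform: random-vector check that uses only public values."""
    r = rng.normal(size=(1, X2.shape[0]))
    v2 = r @ X2                   # 1 x m vector
    v1 = (v2 @ X1) @ R_prime      # 1 x m vector, never forms X2 X1 explicitly
    return bool(np.allclose(v1, v2, atol=tol))

def recover(P, R_prime, y):
    """Client: omega = P R' y."""
    return P @ (R_prime @ y)

R_prime = compute(X1, X2)
ok = verify(X1, X2, R_prime, rng)
omega = recover(P, R_prime, y)

forged = R_prime.copy()
forged[0, 0] += 1.0               # a tampered result
forged_ok = verify(X1, X2, forged, rng)
```

Evaluating the check as three vector-matrix products keeps the verifier's cost at about 3mn scalar multiplications.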
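The point of Remark 1 can be illustrated with a short Python sketch. The helper `add_column_inplace` is a hypothetical name; the example reuses the scalar 3 and the first and third columns mentioned in the remark, with 0-based indices.

```python
import numpy as np

def add_column_inplace(X, src, dst, r):
    """Add r times column src to column dst, touching only one column."""
    X[:, dst] += r * X[:, src]
    return X

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 4))

P3 = np.eye(4)
P3[0, 2] = 3.0                      # scalar 3 at row 1, column 3 (1-based)

via_matmul = X @ P3                 # full product: m * n * n scalar multiplications
via_inplace = add_column_inplace(X.copy(), src=0, dst=2, r=3.0)   # only m multiplications
```

Both paths give the same matrix, but the direct column update avoids the multiplications by 0 and 1.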

C. Correctness Analysis
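We denote by $P$ and $Q$ the products of the elementary matrices $P_1, \ldots, P_k$ and $Q_1, \ldots, Q_k$, respectively, so that $X_1 = XP$ and $X_2 = QX^T$. Notice that the client recovers the result $R$ from $R'$ as $R = P_1 P_2 \cdots P_k R' = P R'$, where
$R' = (X_2 X_1)^{-1} X_2 = (Q X^T X P)^{-1} Q X^T = P^{-1} (X^T X)^{-1} Q^{-1} Q X^T = P^{-1} (X^T X)^{-1} X^T$
Thus, $P R' = (X^T X)^{-1} X^T$, and the recovered $R$ is the original computation output. We now prove the correctness of the verification process. The blockchain inspects the validity of $R'$ by checking whether eq. (10) holds. If $R' = (X_2 X_1)^{-1} X_2$, then $X_2 X_1 R' = X_2$ and therefore $r X_2 X_1 R' = r X_2$ for every vector $r$, so a correctly computed result passes the check.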

D. Security Analysis
Theorem 1: The presented algorithm EFP-SOLR is input/output private.
Proof. We first analyze the privacy of the inputs in our scheme, which are $X$ and $X^T$. In our scheme, the client transforms $X$ into $X_1$ by multiplying it with a series of elementary transformation matrices. Note that these elementary transformation matrices are randomly chosen, so each element in $X$ is obscured. Thus, the cloud server is not able to retrieve $X$ from $X_1$ without the series of elementary transformation matrices. $X^T$ is obscured by elementary transformation matrices in the same way. We assume that the cloud server can correctly guess an element of the input/output matrices with a probability of $1/\delta$. In fact, this probability is very small because the elements of the input/output matrices are real numbers. Therefore, the probability that the cloud could correctly guess the input $X$ or $X^T$ is $negli(mn) = 1/\delta^{mn}$. For the output, $R$ is equal to $P R'$. The cloud server can reveal the real result $R$ only if it obtains the private key $SK_P$, which is stored at the client side. The probability that the cloud is able to correctly guess the result $R$ is $negli(mn) = 1/\delta^{mn}$. Hence, for any valid input $(X, X^T)$ and output $R$, the probability that the cloud correctly guesses the inputs or the outputs is a negligible function of $mn$. Thus, the probability that the cloud server reveals the input $(X, X^T)$ or the output $R$ is negligible.

E. Efficiency Analysis
Theorem 2: The proposed algorithm EFP-SOLR is an α-efficient secure outsourcing algorithm, where α is the ratio of the client's computation cost to the cost of the computation without outsourcing.
Proof. We analyze the efficiency of our scheme as shown in Table I. We denote a scalar multiplication by SM and an assignment operation by AS. We ignore light computations such as scalar additions. In practice, an AS consumes less time than an SM. In step ProbGen, the client executes $2(k-1)mn + (k-2)(m+n)$ SM and $2mn$ AS. In step Compute, the cloud server executes $m^2 n + n^3 + mn^2$ SM. In step Verify, the fair payment platform executes $3mn$ SM. In step Recover, the client executes $(k-1)mn + (k-2)n$ SM and $mn$ AS. Therefore, the client executes a total of $3(k-1)mn + (k-2)(m+2n)$ SM and $3mn$ AS. The cloud server executes a total of $m^2 n + n^3 + mn^2$ SM. The complexity of the client executing the algorithm without outsourcing is as high as the complexity of the cloud server executing step Compute. The fair payment platform executes $3mn$ SM. Thus, according to Definition 3 and counting scalar multiplications only, $\alpha = \frac{3(k-1)mn + (k-2)(m+2n)}{m^2 n + n^3 + mn^2}$.
Theorem 3: The proposed algorithm EFP-SOLR is a 100%-verifiable secure outsourcing algorithm.
Proof. According to Definition 4, we need to prove that if a cloud server misbehaves, the probability that the fair payment platform detects the misbehavior is 1. To run the Verify subalgorithm, the fair payment platform needs four parameters: $r$, $X_1$, $X_2$ and $R'$. Parameter $r$ is produced by the platform itself. Parameters $X_1$ and $X_2$ are produced by the client. Only parameter $R'$ is produced by the cloud server. Thus, the fair payment platform can detect the misbehavior by checking whether eq. (10) fails to hold.
Communication cost. Table II shows the communication cost of the EFP-SOLR algorithm. In the EFP-SOLR algorithm, the client sends the blinded matrices $X_1 \in \mathbb{R}^{m \times n}$ and $X_2 \in \mathbb{R}^{n \times m}$ to the cloud server in phase ProbGen. In phase Compute, the cloud server computes the result $R' \in \mathbb{R}^{n \times m}$ as $R' = (X_2 X_1)^{-1} X_2$ and returns $R'$ to the client. The client conducts phases Verify and Recover locally. Thus, there is no communication cost in Verify and Recover.
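The operation counts used in the efficiency analysis can be tabulated with a few lines of Python. The function names are hypothetical, and the sample values of m, n and k are illustrative only (the matrix size is taken from the smallest case in the evaluation, while k = 4 is an assumed choice).

```python
def probgen_sm(m, n, k):
    """Scalar multiplications executed by the client in ProbGen."""
    return 2 * (k - 1) * m * n + (k - 2) * (m + n)

def recover_sm(m, n, k):
    """Scalar multiplications executed by the client in Recover."""
    return (k - 1) * m * n + (k - 2) * n

def compute_sm(m, n):
    """Scalar multiplications executed by the cloud server in Compute."""
    return m * m * n + n ** 3 + m * n * n

def verify_sm(m, n):
    """Scalar multiplications executed by the fair payment platform in Verify."""
    return 3 * m * n

def client_ratio(m, n, k):
    """Client cost divided by the cost of computing without outsourcing."""
    return (probgen_sm(m, n, k) + recover_sm(m, n, k)) / compute_sm(m, n)

m, n, k = 2000, 1500, 4     # illustrative sizes; k = 4 is an assumption
client_total = probgen_sm(m, n, k) + recover_sm(m, n, k)
ratio = client_ratio(m, n, k)
```

Because the client's cost grows with mn while the outsourced computation grows cubically, the ratio shrinks as the matrices get larger.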

TABLE II: Communication cost
V. SYSTEM IMPLEMENTATION
In this section, we introduce the implementation details of the developed system. We first describe the system architecture, then we show an example of the developed smart contract.

A. System Implementation Architecture Diagram
Fig. 2 shows the architecture of the developed fair payment system, which consists of four layers. The front-end UI layer provides the graphical user interface for users. We develop a series of scripts written in AngularJS to realize Ajax interaction. Ajax uses HTTP requests to realize asynchronous data transmission between the browser and the web server. web3.js is a JavaScript library provided by Ethereum, which provides the interface for front-end JavaScript code to interact with the smart contract functions. We use MetaMask to manage the Ethereum account. MetaMask is a plug-in wallet for web browsers that allows users to interact with the Ethereum blockchain. The logical layer includes a series of functions. It receives requests from the UI layer and executes the corresponding functions. These functions are provided by the web server and the smart contract. We develop the web server in Node.js and the smart contract in Solidity. The web server interacts with the database, while the smart contract interacts with the Ethereum blockchain. The data storage layer provides data interaction to the logical layer. The data is stored in the database or on the Ethereum blockchain. Specifically, the system stores basic information (e.g., user-name, user-password and user-ID) in the database, and stores crucial calculation-related data (e.g., task-parameters and service fee) on the blockchain. Our fair payment system runs on Ubuntu 18.04 LTS. We use Ganache to simulate the Ethereum blockchain. Ganache is a private Ethereum blockchain for developers that can be used for local deployment of smart contracts.
Fig. 2: System Implementation Architecture
Listing 1: Solidity code of the Payment function
function Payment(uint taskid,
    uint CID, uint8 flag) payable {
    var task = Tasks[taskid];

}
Listing 1 shows the Solidity code of the function Payment, which takes three input parameters: taskid, CID and flag. taskid is the task ID and CID is the cloud server ID. flag is the verification outcome, which indicates whether the calculation result passed the verification. When the value of flag is 1, the smart contract transfers the service fee and the deposit to the cloud server. When the value of flag is 0, the smart contract transfers the service fee and the deposit to the client. The variable task is a struct which contains all the information of a task. status is an attribute of task, which represents the status of the task. When status equals 0, the task has been uploaded by the client and has not been taken by any cloud server yet. When status equals 1, the task has been taken but not solved yet. When status equals 2, the task has been correctly completed by a cloud server. When status equals 3, a cloud server returned an invalid result for the task. Tasks is a mapping from the type uint to the struct type. addr is a variable of type address, which represents the Ethereum address that receives the ether. e is an ether unit. SerPlusDep represents the sum of the service fee and the deposit.
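The payment logic described above can be modeled off-chain with a small Python sketch. This is not the deployed contract: the class `Escrow`, its method names, the string addresses and the status guards are hypothetical, and only the status codes and the flag semantics follow the description of Listing 1.

```python
from dataclasses import dataclass

OPEN, TAKEN, SOLVED, INVALID = 0, 1, 2, 3    # status codes described in the text

@dataclass
class Task:
    client: str
    service_fee: int
    server: str = ""
    deposit: int = 0
    status: int = OPEN

class Escrow:
    """Toy model of the fair payment logic (assumed structure)."""

    def __init__(self):
        self.tasks = {}       # task id -> Task, analogous to the Tasks mapping
        self.balances = {}    # address -> amount paid out so far

    def submit(self, taskid, client, service_fee):
        self.tasks[taskid] = Task(client=client, service_fee=service_fee)

    def accept(self, taskid, server, deposit):
        task = self.tasks[taskid]
        if task.status != OPEN:               # guard added for the sketch
            raise ValueError("task already taken")
        task.server, task.deposit, task.status = server, deposit, TAKEN

    def payment(self, taskid, flag):
        task = self.tasks[taskid]
        if task.status != TAKEN:              # guard added for the sketch
            raise ValueError("task is not awaiting a result")
        ser_plus_dep = task.service_fee + task.deposit
        addr = task.server if flag == 1 else task.client
        task.status = SOLVED if flag == 1 else INVALID
        self.balances[addr] = self.balances.get(addr, 0) + ser_plus_dep
        return addr

escrow = Escrow()
escrow.submit(1, client="client", service_fee=5)
escrow.accept(1, server="server", deposit=5)
winner = escrow.payment(1, flag=1)            # result passed verification
```

With `flag=0`, the same call would instead credit the client with the service fee plus the deposit and mark the task as invalid.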

B. System Demonstration
Fig. 3 demonstrates the developed system. To outsource a computation task, the client submits the computation task as shown in Fig. 3(a). The client inputs the service fee and two matrices. In this example, the client pays five ethers for an LR computation task. When the client clicks the Submit button, MetaMask packages the transaction as shown in Fig. 3(b). In Fig. 3(b), the client uses an Ethereum address (0x8fO61A...2963) to transfer five ethers as the service fee to the smart contract address (0xc86d77...1347). The client and the cloud server can view all tasks in the task list, as shown in Fig. 3(c). The task list shows the information of every task, including Taskid, Serve Fee and Matrix. The client and the cloud server can view a matrix in detail by clicking on the corresponding link. In Fig. 3(d), the cloud server inputs the task ID to claim a computation task. Then the fair payment system shows the deposit, which is equal to the service fee. When the cloud server clicks the CONFIRM button, MetaMask packages the transaction in the same way as in Fig. 3(b). In this transaction, the cloud server uses an Ethereum address (0xb2b04a...240B) to transfer five ethers as the deposit to the smart contract address (0xc86d77...1347). After completing the task, the cloud server submits the result as shown in Fig. 3(e). The cloud server inputs the task ID and uploads a txt file which stores the result. After the cloud server clicks the CONFIRM button, MetaMask again packages the transaction in the same way as in Fig. 3(b). Finally, the smart contract checks the validity of the result and transfers the deposit and the service fee to the corresponding address.

VI. EXPERIMENTAL PERFORMANCE EVALUATION
In this section, we evaluate the practical performance of our proposed scheme. We first describe the evaluation methodology, then show the evaluation results.

A. Evaluation Methodology
In our experiments, we simulate all phases of our scheme on a Windows machine. Specifically, the testbed runs Windows 10 on an i5-6500 CPU at 3.20 GHz with 8 GB of memory. We use Python to implement our proposed algorithms. We execute each experiment 20 times and calculate the average execution time.
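The measurement procedure can be sketched as follows. The helper `average_runtime` and the small matrix size are illustrative assumptions and do not reproduce the actual benchmark code.

```python
import time
import numpy as np

def average_runtime(fn, repeats=20):
    """Run fn `repeats` times and return the mean wall-clock time in seconds."""
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeats

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 150))     # far smaller than the matrices in the evaluation

# Baseline: the client computes (X^T X)^{-1} X^T locally, without outsourcing.
baseline = average_runtime(lambda: np.linalg.inv(X.T @ X) @ X.T)
```

The same helper could wrap each phase (ProbGen, Compute, Verify, Recover) to obtain per-phase averages.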

B. Evaluation Results
We present the evaluation results for EFP-SOLR in Fig. 4. The size of the matrix X ranges from 2000 × 1500 to 5500 × 5000. Fig. 4(a) compares the time cost of EFP-SOLR with that of conducting LR at the client without outsourcing. We can observe that our proposed EFP-SOLR costs much less time. Fig. 4(b) compares the time cost of the different phases. Phase ProbGen is the most time-consuming. The reason is that, as discussed in Section IV, the ProbGen phase requires more scalar multiplications than the other phases. In Fig. 4(c), we compare the time cost of the three elementary transformations. Multiplication and Addition are mainly composed of scalar multiplications, while Permutation is mainly composed of assignment operations. Multiplication performs far fewer scalar multiplications than Addition. Thus, there is a significant difference between Multiplication and Addition. Permutation consumes less time than Multiplication because an assignment operation consumes less time than a scalar multiplication.

VII. APPLICATIONS
In this section, we discuss possible applications of our proposed scheme. Linear regression has been applied in many applications, such as face recognition [8] and disease prediction [9]. In these applications, the data involved in machine learning often contain sensitive information. Thus, when users rely on a cloud server to accomplish time-consuming machine learning, data security faces critical challenges. Our proposed scheme can be applied to protect data security while leveraging the cloud server.
For example, Naseema et al. [8] proposed a novel approach to face identification by formulating the pattern recognition problem in terms of linear regression. The dataset is shown in Fig. 5. The features of a person's face are private information. Especially when face identification is applied in military uses, a leak of the training data may cause serious losses. Thus, to ensure data security, we can apply our proposed secure outsourcing algorithm to accomplish the face identification. By doing so, the user can efficiently accomplish the face identification with the help of a cloud server, while data privacy, result verifiability and payment fairness are ensured.
Fig. 6 shows a map of an Australian coastal site. Ali et al. [10] applied linear regression to develop a real-time significant wave height forecasting system. The geographic data involved in the forecasting system should be kept secret because of its commercial value. Thus, we can apply our proposed secure outsourcing algorithm to accomplish the significant wave height forecasting. Our algorithm can also be applied in many other applications where linear regression is used, so that data security in those applications can be protected.

VIII. RELATED WORK

A. Machine Learning And Linear Regression
Applying machine learning approaches to efficiently analyze large-scale matrix data has attracted increasing attention because these approaches facilitate pattern recognition, classification, and prediction. Regression and multilevel/hierarchical models are widely used for data processing with linear or nonlinear regression [11,12]. A number of machine learning and statistical methods have been proposed to extract meaningful information from different datasets [13][14][15][16][17][18][19][20][21]. For example, Huang et al. [22] presented a novel approach to computing label-specific features for multi-label data. They proposed a new augmented matrix using high-order label correlations and built a multi-label classifier simultaneously to enhance multi-label classification. Tu et al. [17] proposed a multiple label answer aggregation method which applied the Joint Matrix Factorization (JMF) to picky
There are also many research efforts focusing on improving the efficiency and effectiveness of linear regression models [13,16,23,24]. In a linear regression model, it is often assumed that the explanatory variables are independent. Lukmanand et al. [24] proposed estimators based on Hoerl and Kennard estimation techniques to improve the ridge parameter. Fasoranbaku et al. [16] evaluated the basis of six parameters, which helps to design more powerful experiments suited to better parameter estimation for LR. Chen et al. [25] presented an approach to securely performing linear regression on a cloud. However, their scheme cannot hide the number of zero elements when blinding the inputs. Zhou et al. [26] introduced a secure method for outsourcing linear regression. Their scheme protects the privacy of inputs and outputs.

B. Secure Outsourcing Computations
There are extensive research efforts on a variety of secure outsourcing schemes for scientific computations. For example, Atallah et al. in [27] first presented a generic structure for the secure outsourcing of scientific computations. However, their framework could not verify the correctness of the result computed by the cloud server. Chen et al. in [28] designed a secure outsourcing approach for large-scale linear equations. Their approach used special sparse matrices to blind the inputs and outputs, and it allowed the client to detect cheating behavior of cloud servers with a probability of 100%. Salinas et al. in [29] proposed a secure outsourcing method which allows resource-constrained devices to solve large-scale sparse linear systems of equations (SLSEs). The proposed method protects the privacy of inputs/outputs and is also efficient compared with other schemes. The computation cost of some basic cryptographic operations is too heavy for resource-constrained devices. To free these devices from such computations, a number of research efforts have been conducted on how to securely outsource cryptographic computations [30][31][32][33][34]. Hohenberger et al. in [32] proposed a security framework for outsourcing cryptographic computations. Based on the framework, they proposed two practical outsource-secure approaches. Zhang et al. in [35] proposed two practical algorithms to securely outsource Cipolla's algorithm. The two schemes enable IoT devices to execute Cipolla's algorithm efficiently. Also, IoT devices can detect the misbehavior of cloud servers with a probability of 1. Yu et al. in [2] designed a cloud storage auditing scheme which achieved the verifiable outsourcing of key updates. In their scheme, the key updates can be securely outsourced to an authorized entity, which reduces the key-update burden on the user.
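To make the idea of blinding a linear system with sparse matrices concrete, the following is a minimal sketch. It is not the construction of [28] or [29]; it assumes a generic masking in which the client hides A, b and x behind two random scaled permutation matrices P and Q and a random shift r, so that the server only sees A' = PAQ and b' = P(b + Ar). All function names are hypothetical, and exact rational arithmetic is used only to keep the check simple.

```python
import random
from fractions import Fraction

def random_scaled_permutation(n, rng):
    # Sparse matrix S with S[i][perm[i]] = scale[i]; stored as two lists (assumed layout).
    perm = list(range(n))
    rng.shuffle(perm)
    scale = [Fraction(rng.randint(1, 97)) for _ in range(n)]
    return perm, scale

def blind(A, b, rng):
    # Client side: build A' = P A Q and b' = P (b + A r) in O(n^2) operations.
    n = len(A)
    p_perm, p_scale = random_scaled_permutation(n, rng)
    q_perm, q_scale = random_scaled_permutation(n, rng)
    r = [Fraction(rng.randint(-50, 50)) for _ in range(n)]
    shifted = [b[i] + sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
    AQ = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            AQ[i][q_perm[j]] = A[i][j] * q_scale[j]
    A_blind = [[p_scale[i] * v for v in AQ[p_perm[i]]] for i in range(n)]
    b_blind = [p_scale[i] * shifted[p_perm[i]] for i in range(n)]
    return A_blind, b_blind, (q_perm, q_scale, r)

def server_solve(A, b):
    # Server side: Gauss-Jordan elimination on the blinded system (assumes A is invertible).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(k for k in range(col, n) if M[k][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for k in range(n):
            if k != col and M[k][col] != 0:
                f = M[k][col]
                M[k] = [a - f * c for a, c in zip(M[k], M[col])]
    return [M[i][n] for i in range(n)]

def recover(y, secret):
    # Client side: undo the masking, x = Q y - r.
    q_perm, q_scale, r = secret
    return [q_scale[j] * y[q_perm[j]] - r[j] for j in range(len(y))]

def verify(A, b, x):
    # Client side: check A x = b directly, which costs O(n^2).
    return all(sum(a * xj for a, xj in zip(row, x)) == bi for row, bi in zip(A, b))
```

In this sketch the client accepts the server's answer only if `verify(A, b, recover(y, secret))` holds, so a wrong solution to an invertible system is always rejected.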

C. Blockchain
Blockchain technology enables secure, trusted, and decentralized autonomous ecosystems for various scenarios. Blockchain technology has been widely leveraged in machine learning to securely and efficiently collect, organize and audit the extensive quantities of data needed for model building and accurate prediction [36][37][38][39][40][41][42][43]. Li et al. [37] presented a security mechanism for distributed cloud storage based on blockchain. In their framework, clients can split all their data into encrypted data blocks and send these blocks randomly to the blockchain network. Juneja et al. in [41] used blockchain technology to develop an access control system in which a classifier can safely store and access data during real-time retraining using Stacked Denoising Autoencoder (SDA) networks. Kurtulmus et al. in [43] proposed a blockchain-based model for exchanging machine learning models. They used the Ethereum blockchain to create contracts that offer a reward in exchange for a trained machine learning model for a particular data set.
Shafagh et al. in [44] proposed a blockchain-based auditable storage and sharing scheme for IoT data. Their scheme provides distributed access control and data management. Unlike existing trust models that delegate access control of user data to a centralized trusted authority, their scheme gives users ownership of their data. To provide a systematic review of blockchain technology in IoT, Christidis et al. [45] explored the combination of blockchain and the Internet of Things (IoT). They stated that combining blockchain and IoT has bright prospects and will lead to significant changes in multiple industries. In this paper, we employ the blockchain as the middleman, which verifies the calculation results, guarantees fairness, and further ensures the security and accuracy of our outsourcing algorithm. Lin et al. in [46] proposed a blockchain-based system to securely outsource the bilinear pairing. In their system, the cloud server gets paid for the computation task only if it correctly performs the workload outsourced by the client.

IX. CONCLUSION AND FUTURE WORK
In this paper, we designed a secure, verifiable and fair scheme to outsource a classical statistical machine learning model: linear regression. Similar practices can be applied to other statistical machine learning models. The presented scheme prevents the computation input and output from leaking to the cloud server, and the computation result is verifiable. Also, fairness is guaranteed by the blockchain. We described the presented approach in detail and analyzed its correctness, security and efficiency. In addition, we developed the fair, verifiable system on the Ethereum blockchain. To evaluate our scheme, we carried out experiments. The experimental data indicate that our algorithm is efficient.
Cloud-aided machine learning faces security challenges, including data privacy, result verifiability and payment fairness. In this work, we studied classic linear regression as an example to show how to address these challenges. To the best of our knowledge, there is no generic secure outsourcing approach for all machine learning algorithms. Fully homomorphic encryption (FHE) is a possible basis for a generic secure outsourcing approach, but FHE is currently too inefficient for an FHE-based approach to be practical. Thus, current research focuses on designing specific outsourcing approaches for particular machine learning algorithms. In our future work, we plan to explore outsourcing schemes for other specific machine learning algorithms, in which data privacy, result verifiability and payment fairness are guaranteed. To enable fair payment, we employ blockchain technology, whose role in our scheme is to verify the computation result and make a judgment accordingly. Notice that the blockchain is a public ledger whose content anyone can view. Thus, when designing such outsourcing algorithms, the task on the blockchain (the verification process) cannot involve any sensitive data. The outsourcing algorithms have to be publicly verifiable.

• $\mathrm{Compute}_{PK}(\sigma_x) \to \sigma_y$: Using the public key $PK$ and the encoded value $\sigma_x$, the server computes an encoded version $\sigma_y$ of the function's output $y = F(x)$.
• $\mathrm{Verify}_{SK}(\tau_x, \sigma_y) \to \{0, 1\}$: Given the secret key $SK$ and the secret decoding $\tau_x$, the Verify algorithm checks the correctness of $\sigma_y$. If the encoded output $\sigma_y$ is valid, this algorithm outputs 1. Otherwise it outputs 0.
• $\mathrm{Recover}(\tau_x, \sigma_y) \to y$: Using the secret key $SK$, the secret value $\tau_x$ and the encoded answer $\sigma_y$, the Recover algorithm recovers the original result $y = F(x)$.
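The Compute, Verify and Recover interface above can be illustrated with a toy instance. The sketch below is not the scheme of this paper: it assumes F(x) = Mx for a public matrix M, hides x with an additive mask r, and checks the server's answer with a secret random vector u (a Freivalds-style test). The `keygen` and `probgen` helpers are hypothetical stand-ins for the setup steps that do not appear in the list above, and the value M r kept in the decoding value would be precomputed offline in any practical setting.

```python
import random
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, x):
    return [dot(row, x) for row in M]

def keygen(M, rng):
    # Assumed setup: PK is M itself; SK is a secret nonzero vector u plus u^T M.
    u = [Fraction(rng.randint(1, 10**6)) for _ in M]
    uM = [sum(u[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]
    return M, (u, uM)

def probgen(M, x, rng):
    # Assumed encoding: sigma_x = x + r; tau_x keeps sigma_x and M r for later use.
    r = [Fraction(rng.randint(-10**6, 10**6)) for _ in x]
    sigma_x = [xi + ri for xi, ri in zip(x, r)]
    return sigma_x, (sigma_x, matvec(M, r))

def compute(pk, sigma_x):
    # Server: evaluates F on the encoded input, sigma_y = M sigma_x.
    return matvec(pk, sigma_x)

def verify(sk, tau_x, sigma_y):
    # Client: accept (1) iff u . sigma_y equals (u^T M) . sigma_x, else reject (0).
    u, uM = sk
    sigma_x, _ = tau_x
    return 1 if dot(u, sigma_y) == dot(uM, sigma_x) else 0

def recover(tau_x, sigma_y):
    # Client: y = sigma_y - M r, since M (x + r) = M x + M r.
    _, Mr = tau_x
    return [a - b for a, b in zip(sigma_y, Mr)]
```

A client would call `recover` only after `verify` returns 1. The check costs two inner products, which is what makes delegation worthwhile when M is large.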

    SerPlusDep = task.ServFee + task.Deposit;
    if (flag == 1) {
        task.status = 2;
        addr = task.Cloudaddr;
        addr.transfer(SerPlusDep * e);
    } else {
        task.status = 3;
        addr = task.Clientaddr;
        addr.transfer(SerPlusDep * e);
    }

Fig. 3: System Demonstration
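The settlement branch in the contract fragment above can be mirrored off-chain to see the fairness rule in isolation. The following is a minimal simulation, assuming that `flag == 1` means the verification succeeded, that status codes 2 and 3 mark the two outcomes as in the fragment, and that the multiplier `e` is a unit conversion factor (modelled here by the hypothetical `unit` parameter). The `Task` class, the initial status value and the `balances` dictionary are illustrative and not part of the deployed contract.

```python
from dataclasses import dataclass

@dataclass
class Task:
    serv_fee: int       # service fee locked by the client
    deposit: int        # deposit locked by the cloud server
    cloud_addr: str
    client_addr: str
    status: int = 1     # assumed "pending" code; the fragment only shows 2 and 3

def settle(task, flag, balances, unit=1):
    """Pay fee plus deposit to the cloud if verification passed, else to the client."""
    total = (task.serv_fee + task.deposit) * unit
    if flag == 1:
        task.status = 2
        payee = task.cloud_addr
    else:
        task.status = 3
        payee = task.client_addr
    balances[payee] = balances.get(payee, 0) + total
    return payee, total
```

Under these assumptions an honest server recovers its deposit and earns the fee, while a server that returns a wrong result forfeits its deposit to the client.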

TABLE I: Time cost of phases.