Efficient Sum-Check Protocol for Convolution

Many applications have recently adopted machine learning and deep learning techniques. Convolutional neural networks (CNNs) are composed of sequential operations including activation, pooling, convolution, and fully connected layers, and their computational cost is enormous, with the convolution and fully connected layers dominating. In general, a user with insufficient computing capacity delegates such tasks to a server with sufficient computing power, and the user may want to verify that the outputs are truly the machine learning model's predictions. In this paper, we are interested in verifying that the delegated evaluation of a CNN, one of the deep learning models for image recognition and classification, is correct. Specifically, we focus on the verifiable computation of matrix multiplications in a CNN convolutional layer. We use Thaler's idea (CRYPTO 2013) for validating matrix multiplication and present a predicate function based on the insight that the sequence of operations can be viewed as sequential matrix multiplication. Furthermore, we lower the proving cost by splitting a convolution operation into two halves. As a result, we obtain an efficient sum-check protocol for a convolution operation that, like the state-of-the-art zkCNN (ePrint 2021) approach, achieves asymptotically optimal proving cost. The proposed protocol is about 2× cheaper than zkCNN in terms of communication cost. We also propose a verifiable inference system with our protocol as the fundamental building block.


I. INTRODUCTION
Machine learning enables computers to improve themselves through experience with data. A convolutional neural network (CNN) is a machine learning technique that is particularly useful for recognizing and classifying images. In recent years, there has been massive progress in efficiency, which has led to many practical applications of machine learning [1]-[3]. However, one may be concerned that a malicious service provider manipulates the outputs; hence, service clients want to ensure that a model's results are true predictions. Thus, we can raise the following natural and meaningful question: how can a user efficiently ensure that the results are indeed the model's predictions?
The naïve approach is for the user to compute the model directly and compare the outputs. However, the machine learning model could be too complicated and a burden for a user with limited computational resources. Verifiable computation, formalized by Gennaro, Gentry, and Parno [4], is a powerful building block to resolve this issue. It enables outsourcing computations to an untrusted server while guaranteeing the integrity of the results. After the server computes some public function, it sends back the output together with a proof; the user can then verify the result efficiently, at a computational cost lower than that of performing the computation directly. Due to its strength, this concept, also known as verifiable delegation of computation, is widely employed in applications that require guaranteed results [4]-[6]. SafetyNet [7] and zkCNN [8] employ verifiable computation techniques to guarantee the results of deep learning inference, especially for CNNs. As is well known, a CNN has several layers, and each layer consists of multiple operations such as convolution, activation, and pooling; among these, the convolution operation dominates all others in terms of verifiable computation. Since a convolution operation can be expressed as a matrix multiplication, both works focus on reducing the computational cost of matrix multiplication. To achieve this, SafetyNet uses a variant of the GKR protocol [9] that is specialized for matrix multiplication, reducing the computational cost from $O(n^3)$ to $O(n^2)$, where $n$ is the dimension of the square matrix; however, an unnecessarily large $n$ is required because CNN data is naturally 3-dimensional, and thus $n$ becomes larger than the original dimension of the convolution operation. zkCNN proposes a new sum-check protocol for the convolution operation using the Fast Fourier Transform (FFT). Even though zkCNN accomplishes an asymptotically optimal proving cost, it requires three sum-check protocols, and its communication cost grows in proportion to their number.

A. OUR CONTRIBUTION
In this paper, we propose a new efficient sum-check protocol for the CNN convolution operation. Inspired by [9], we introduce a function for a matrix that takes binary vectors representing an index and outputs the corresponding element. When a convolution operation is translated into 2-dimensional matrices, the approach becomes impractical, since the operation then takes the form of a matrix multiplication whose naïve cost is cubic in the matrix dimension. Thus, we interpret the input and intermediate values as 3-dimensional matrices, which is a more intuitive approach. However, in this setting a binary full adder must be used to compute a convolution operation, and it does not support linearity. To overcome this barrier, we introduce a predicate function that allows a convolution operation to be linearized in each binary variable, which saves proving cost compared with the naïve approach. Moreover, we can reduce the cost further by carefully analyzing our linearized convolution operation. As a result, we achieve an asymptotically optimal proving cost for a convolution operation. Moreover, our scheme employs only the sum-check protocol [10] and the GKR protocol [6], and thus requires no trusted setup and no cryptographic assumption. Additionally, we present a new verifiable CNN by applying our construction.
We rigorously analyze the efficiency of the proposed sum-check protocol for a convolution layer in a CNN. Our analysis shows that our approach reaches the same optimal prover cost as zkCNN, which is substantially lower than that of SafetyNet. The verifier cost of our approach is much smaller than that of SafetyNet and larger than that of zkCNN. However, because zkCNN does not consider batch operations, we expect our approach to be more efficient than zkCNN for large batch sizes. Finally, the proof size of our approach is slightly larger than that of SafetyNet and approximately 2× smaller than that of zkCNN.

B. RELATED WORK
Verifiable Computation. In the verifiable delegation of computation, there are two participants: a prover and a verifier. The prover provides a computation result and a proof claiming the result is correct, and the verifier checks the proof efficiently to be convinced whether the result is correct [4]-[6]. Goldwasser et al. [6] proposed an interactive proof protocol for layered arithmetic circuits, called the GKR protocol, which lets the verifier be sure of computational integrity at a much lower cost than performing the computation itself. Several optimizations of the GKR protocol have been provided, including Cormode et al. [11] and Vu et al. [12] with practical implementations, and works assuming structured circuits [9], [13]. In [9], Thaler proposed a highly efficient sum-check protocol for matrix multiplication with linear prover time, which enables the matrix multiplication task to be delegated.
zkSNARK. Gennaro et al. [14] proposed a novel zero-knowledge succinct non-interactive argument system (zk-SNARK) by introducing an efficient encoding method called quadratic arithmetic programming (QAP). Subsequent works [15]-[18] improve on QAP-based zkSNARKs. Though these methods yield efficient verification and proof sizes suitable for practical applications, they inherently require a trusted setup to generate the structured reference string. The trusted setup issue is addressed in [19]-[21]. There are alternative techniques [20], [22]-[24] that construct zk-SNARKs using polynomial commitment schemes [25], [26]. In a polynomial commitment scheme, a prover provides the commitment of a polynomial and then proves the value of an evaluation of the polynomial at an arbitrary point chosen by the verifier. The security and assumptions of such constructions rely on the underlying polynomial commitment scheme.
Verifiable Matrix Multiplication for CNN. SafetyNet [7] is a verifiable computation scheme for neural network inference using an interactive proof protocol. It expresses each operation in the deep neural network in the form of a matrix multiplication and uses a variant of the GKR protocol specialized for matrix multiplication [9]. This approach reduces the verifier's computational cost for matrix multiplication from $O(n^3)$ to $O(n^2)$, assuming two square matrices. However, the 3-dimensional data must be modified to fit the 2-dimensional matrix representation of each operation. The convolution operation, in particular, exacerbates the performance loss, since one side of the matrix grows larger than in the original operation. We describe this further in Section III-A. It is worth noting that this gap introduces a slew of redundant procedures and increases the protocol's overall complexity.
ZEN [27] is a compiler that provides a verifiable inference model for neural networks while preserving the privacy of input data; its schemes are built on SNARKs. zkCNN [8] also proposes a zero-knowledge proof scheme for CNNs. First, it proposes a new sum-check protocol for the Fast Fourier Transform (FFT), whose proving cost is $O(N)$ for a vector of size $N$. Using it, zkCNN proposes a sum-check protocol for a convolution operation of an $n \times n$ input and a $w \times w$ kernel, which achieves an asymptotically optimal proving cost of $O(n^2 + w^2)$. Using this as a building block, the total proving cost for a CNN becomes $O((c_1 + c_2)n^2 + c_1 c_2 w^2)$, where $c_1$ and $c_2$ denote the numbers of input and output channels, respectively. However, zkCNN requires three sum-check protocols, for the FFT, the Hadamard product, and the inverse FFT. Since the proof size is linear in the number of sum-check protocols and our scheme needs only two, our proof size is smaller than that of zkCNN.

C. ORGANIZATION
The remainder of this paper is organized as follows. Section II provides some necessary background for the rest of the paper. In Section III, we present a main building block protocol for verifiable convolution operation. We then present how our technique for the convolution operation can be applied to a verifiable CNN in Section IV. Finally, we provide some concluding remarks in Section V.

II. BACKGROUNDS
Notations. We use $\mathbb{R}$ to denote the set of real numbers with fixed precision. For a security parameter $\lambda$ and a prime $p$ of length $\lambda$, $\mathbb{Z}_p$ denotes the field of integers modulo $p$, and $\mathbb{Z}_p^*$ denotes its multiplicative group. Let $\mathbb{Z}_p[X_1, \dots, X_n]$ be the set of $n$-variate polynomials over $\mathbb{Z}_p$. We use two constant vectors $(1, \dots, 1), (2^{i-1}, \dots, 2, 1) \in \mathbb{Z}^i$, denoted by $\mathbf{1}_i$ and $\mathbf{2}_i$, respectively. For two integers $j, k$ with $j < k$, we use the subscript $[j:k]$ to represent a vector whose components are indexed by consecutive integers, i.e., $x_{[j:k]} := (x_j, x_{j+1}, \dots, x_k)$. $F_i \in \mathbb{R}^{n_i \times n_i \times c_i \times b}$ represents a feature map, the input of the $i$-th operation in the CNN, where $n_i$ is the number of components in width and height, $c_i$ is the number of channels of $F_i$, and $b$ is the batch size. For simplicity, we let $N_i = n_i \cdot n_i \cdot c_i$. For a matrix $A \in \mathbb{Z}_p^{n \times m}$, we write $A_{x,y}$ for the $(x,y)$-th element of $A$ and interchangeably use $A$ as a function $A : \{0,1\}^{\log n} \times \{0,1\}^{\log m} \to \mathbb{Z}_p$ that takes binary vectors representing indices of $A$ as inputs and returns the corresponding element; that is, $A(\mathbf{x}, \mathbf{y}) = A_{x,y}$, where $\mathbf{x} \in \{0,1\}^{\log n}$ and $\mathbf{y} \in \{0,1\}^{\log m}$ are the binary representations of $x$ and $y$ with the higher bit first. We remark that such a function always exists for any matrix, so we overload the function notation for any given matrix without redefining it; this extends to higher-dimensional matrices. We denote the feature map space and weight space by $\mathcal{F}$ and $\mathcal{W}$, respectively. We define the convolution operation as $C : \mathcal{F} \times \mathcal{W} \to \mathcal{F}$, activation as $A : \mathcal{F} \to \mathcal{F}$, pooling as $P : \mathcal{F} \to \mathcal{F}$, and the fully connected operation as $FC : \mathcal{F} \to \mathcal{F}$.

A. CONVOLUTIONAL NEURAL NETWORKS 1) Fully Connected Neural Networks
The CNNs that we consider in this paper consist of several layers, such as fully connected layers and pooling layers. A fully connected layer is followed by an activation function $\sigma : \mathbb{R} \to \mathbb{R}$. The input to the network is $x_0 \in \mathbb{R}^{N_0 \times b}$, where $N_0$ is the input dimension and $b$ is the batch size.
Suppose that the $\ell$-th layer is a fully connected layer. A fully connected layer has $N_\ell$ output neurons, storing a weight matrix $W_{\ell-1} \in \mathbb{R}^{N_\ell \times N_{\ell-1}}$ and a bias $b_{\ell-1} \in \mathbb{R}^{N_\ell}$. It takes $x_{\ell-1} \in \mathbb{R}^{N_{\ell-1} \times b}$ as an input and outputs

$$x_\ell = \sigma(W_{\ell-1} \cdot x_{\ell-1} + b_{\ell-1}) \in \mathbb{R}^{N_\ell \times b}, \qquad (1)$$

where the bias is added to each column and the activation is applied element-wise.

Suppose that the $\ell$-th layer is a pooling layer. Pooling layers are commonly used to reduce the network size and hence avoid overfitting. Max pooling, average pooling, and stochastic pooling are all commonly used. The sum pooling method, which outputs the sum of activations in each local region, is the one considered in this paper. That is, the layer takes $x_{\ell-1} \in \mathbb{R}^{N_{\ell-1} \times b}$ as an input and outputs $x_\ell = P_{\ell-1} \cdot x_{\ell-1} \in \mathbb{R}^{N_\ell \times b}$, where $P_{\ell-1}$ is an $N_\ell \times N_{\ell-1}$ matrix consisting of zeros and ones. Note that sum pooling is almost as effective as average pooling and is used by SafetyNet [7].
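To make the two layer types concrete, the following minimal NumPy sketch (our illustration, not code from any of the cited systems) applies a fully connected layer with the quadratic activation adopted later in Section III-B and builds a 1-dimensional sum pooling matrix $P$; the function names and the `window` parameter are hypothetical:

```python
import numpy as np

def fully_connected(x_prev, W, bias):
    """x_l = sigma(W @ x_{l-1} + b) as in (1); sigma here is the quadratic
    activation of Section III-B, a stand-in for a generic sigma."""
    z = W @ x_prev + bias[:, None]      # bias broadcast across the batch
    return z ** 2

def sum_pool_matrix(n_in, window=2):
    """0/1 matrix P with P[i, j] = 1 iff input j lies in output region i."""
    P = np.zeros((n_in // window, n_in))
    for i in range(n_in // window):
        P[i, i * window:(i + 1) * window] = 1.0
    return P

x = np.arange(8.0).reshape(8, 1)        # N_{l-1} = 8, batch b = 1
print(sum_pool_matrix(8) @ x)           # [[1.], [5.], [9.], [13.]]
```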

2) Convolutional Neural Networks
CNNs additionally use convolution layers. The input to the network is $X_0 \in \mathbb{R}^{n_0 \times n_0 \times c_0 \times b}$, where $n_0 \times n_0 \times c_0$ is the input dimension and $b$ is the batch size. For example, a $32 \times 32$ image with RGB colors has dimension $32 \times 32 \times 3$.
Suppose that the $\ell$-th layer is a convolution layer having $n_\ell \times n_\ell \times c_\ell$ output neurons, storing a weight matrix $W_{\ell-1} \in \mathbb{R}^{d_{\ell-1} \times d_{\ell-1} \times c_{\ell-1} \times c_\ell}$ and a bias $B_{\ell-1} \in \mathbb{R}^{n_\ell \times n_\ell \times c_\ell}$. It takes $X_{\ell-1} \in \mathbb{R}^{n_{\ell-1} \times n_{\ell-1} \times c_{\ell-1} \times b}$ as an input and outputs $X_\ell$, obtained by adding the bias $B_{\ell-1}$ to the (2-dimensional) convolution $U = W_{\ell-1} * X_{\ell-1} \in \mathbb{R}^{n_\ell \times n_\ell \times c_\ell \times b}$, which is defined as

$$U_{\alpha,\beta,\gamma,\tau} = \sum_{r,s,t} X_{\alpha+r,\beta+s,t,\tau} \cdot W_{r,s,t,\gamma}. \qquad (2)$$
Note that we consider only the case of stride 1 with no padding for the sake of simplicity, and thus $n_\ell = n_{\ell-1} - d_{\ell-1} + 1$.
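For concreteness, here is a direct NumPy evaluation of (2) with stride 1 and no padding; the function name and shape conventions are our own:

```python
import numpy as np

def conv2d(X, W):
    """Direct evaluation of (2): U = W * X with stride 1 and no padding.
    X: (n, n, c_in, b) feature map; W: (d, d, c_in, c_out) kernel.
    Returns U with shape (n - d + 1, n - d + 1, c_out, b)."""
    n, _, c_in, b = X.shape
    d, _, _, c_out = W.shape
    m = n - d + 1
    U = np.zeros((m, m, c_out, b))
    for alpha in range(m):
        for beta in range(m):
            for gamma in range(c_out):
                for tau in range(b):
                    # sum over r, s, t of X[alpha+r, beta+s, t, tau] * W[r, s, t, gamma]
                    U[alpha, beta, gamma, tau] = np.sum(
                        X[alpha:alpha + d, beta:beta + d, :, tau] * W[:, :, :, gamma])
    return U

U = conv2d(np.random.rand(8, 8, 3, 2), np.random.rand(3, 3, 3, 4))
print(U.shape)   # (6, 6, 4, 2)
```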

3) Embedding Real Numbers into a Finite Field
Although neural networks can in principle be computed over the real numbers, proving the integrity of real-number operations, even with state-of-the-art approaches, is difficult. Instead, we quantize each real number and embed it into a finite field with a large characteristic.
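A minimal sketch of such an embedding, assuming a fixed-point scale of $2^{16}$ and the Mersenne prime $p = 2^{61} - 1$ used later in the paper's proof-size estimates (the scale and helper names are our choices):

```python
P_MOD = 2**61 - 1      # Mersenne prime used in the paper's proof-size estimates
SCALE = 2**16          # fixed-point scale; a hypothetical choice

def embed(x: float) -> int:
    """Quantize a real number and embed it into Z_p (negatives wrap around)."""
    return round(x * SCALE) % P_MOD

def extract(a: int, scale: int = SCALE) -> float:
    """Map a field element back to a signed real, assuming |value| < p / 2."""
    signed = a if a <= P_MOD // 2 else a - P_MOD
    return signed / scale

# A product of two embedded values carries scale SCALE^2:
prod = embed(1.5) * embed(-2.25) % P_MOD
print(extract(prod, SCALE * SCALE))   # -3.375
```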

B. INTERACTIVE ORACLE PROOFS 1) Sum-Check Protocol
The sum-check protocol solves the sum-check problem: proving that the sum of the evaluations of some $n$-variate polynomial over the Boolean hypercube agrees with a claimed value. More specifically, for $P(X_1, \dots, X_n) \in \mathbb{F}[X_1, \dots, X_n]$ and $y \in \mathbb{F}$, the sum-check problem is proving

$$y = \sum_{x_1 \in \{0,1\}} \sum_{x_2 \in \{0,1\}} \cdots \sum_{x_n \in \{0,1\}} P(x_1, \dots, x_n).$$

In general, one can verify this by evaluating $P$ at all $2^n$ points; however, this becomes inefficient as $n$ increases. The sum-check protocol requires only $O(n)$ polynomial evaluations from the verifier.
In the sum-check protocol, both parties $\mathcal{P}$ and $\mathcal{V}$ share a multivariate polynomial $P(X_1, \dots, X_n) \in \mathbb{F}[X_1, \dots, X_n]$ and an output $y$ claimed by $\mathcal{P}$. First, $\mathcal{P}$ constructs the univariate polynomial

$$P_1(X) = \sum_{x_2 \in \{0,1\}} \cdots \sum_{x_n \in \{0,1\}} P(X, x_2, \dots, x_n)$$

and sends $P_1(X)$ to $\mathcal{V}$. After receiving $P_1(X)$, $\mathcal{V}$ checks whether $y = P_1(0) + P_1(1)$ and, if it holds, sends a random element $r_1 \in \mathbb{F}$. Similarly, $\mathcal{P}$ constructs the univariate polynomial $P_2(X) = \sum_{x_3, \dots, x_n \in \{0,1\}} P(r_1, X, x_3, \dots, x_n)$ and sends $P_2(X)$ to $\mathcal{V}$. After receiving $P_2(X)$, $\mathcal{V}$ checks whether $P_1(r_1) = P_2(0) + P_2(1)$, sends a random element $r_2 \in \mathbb{F}$ to $\mathcal{P}$, and this procedure is repeated until the end of the protocol. More specifically, in each round $2 \le i \le n$, $\mathcal{P}$ sends the univariate polynomial $P_i(X) = \sum_{x_{i+1}, \dots, x_n \in \{0,1\}} P(r_1, \dots, r_{i-1}, X, x_{i+1}, \dots, x_n)$ and $\mathcal{V}$ checks whether $P_{i-1}(r_{i-1}) = P_i(0) + P_i(1)$. In the final round, $\mathcal{V}$ additionally checks whether $P(r_1, r_2, \dots, r_n) = P_n(r_n)$.
If all tests are passed, V accepts, otherwise V rejects.
The sum-check protocol satisfies perfect correctness and soundness with error $nd/|\mathbb{F}|$, where the degree of $P$ in each variable is at most $d$. If $\mathcal{P}$ follows the whole protocol honestly, $\mathcal{V}$ accepts with probability 1. However, if $\mathcal{P}$ deviates from the protocol, $\mathcal{V}$ accepts with probability at most $nd/|\mathbb{F}|$. The sum-check protocol consists of $n$ rounds. Let $\deg_i(P)$ be the degree of the variable $X_i$ in $P$. In each round, $\mathcal{P}$ sends $\deg_i(P) + 1$ elements of $\mathbb{F}$ and $\mathcal{V}$ sends one element of $\mathbb{F}$, except in the last round. Thus, the total communication cost is $O(n + \sum_{i=1}^{n} \deg_i(P))$ field elements. $\mathcal{V}$ evaluates each $P_i$ at $0$, $1$, and $r_i$, and additionally evaluates $P(r_1, \dots, r_n)$ in the last round; thus, the overall computation cost for $\mathcal{V}$ is $O(n + \sum_{i=1}^{n} \deg_i(P))$ evaluations of the univariate polynomials $P_i$. The sum-check protocol is a public-coin interactive protocol, and it can be converted into a non-interactive protocol via the Fiat–Shamir transformation in the random oracle model [31].
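The following self-contained Python sketch simulates the protocol above for a polynomial given as an evaluation oracle; each $P_i$ is "sent" as its evaluations at $0, \dots, d$ and reconstructed by Lagrange interpolation. All names and the single-function simulation of both parties are our own conventions:

```python
import random
from itertools import product

P_MOD = 2**61 - 1  # prime field modulus

def eval_poly_pts(pts, x):
    """Evaluate, via Lagrange interpolation, the unique polynomial of degree
    < len(pts) through (0, pts[0]), (1, pts[1]), ... at the point x."""
    acc = 0
    for i in range(len(pts)):
        num, den = 1, 1
        for j in range(len(pts)):
            if i != j:
                num = num * (x - j) % P_MOD
                den = den * (i - j) % P_MOD
        acc = (acc + pts[i] * num * pow(den, P_MOD - 2, P_MOD)) % P_MOD
    return acc

def sumcheck(P, n, d):
    """Simulate the sum-check protocol for y = sum of P over {0,1}^n.
    P: oracle taking n field elements; d: max degree per variable.
    Returns (y, accepted)."""
    y = sum(P(*x) for x in product((0, 1), repeat=n)) % P_MOD
    claim, rs = y, []
    for i in range(n):
        tail = n - 1 - i
        # Prover: P_i sent as its evaluations at t = 0, ..., d.
        pts = [sum(P(*rs, t, *x) for x in product((0, 1), repeat=tail)) % P_MOD
               for t in range(d + 1)]
        # Verifier: check P_{i-1}(r_{i-1}) = P_i(0) + P_i(1), then sample r_i.
        if (pts[0] + pts[1]) % P_MOD != claim:
            return y, False
        r = random.randrange(P_MOD)
        claim, rs = eval_poly_pts(pts, r), rs + [r]
    # Final round: one oracle query to P at (r_1, ..., r_n).
    return y, P(*rs) % P_MOD == claim

# Example: P(x1, x2, x3) = x1*x2 + 2*x3; the honest sum over the cube is 10.
f = lambda x1, x2, x3: (x1 * x2 + 2 * x3) % P_MOD
print(sumcheck(f, 3, 1))   # (10, True)
```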

2) Multilinear Extensions
For an $n$-dimensional vector $x \in \mathbb{F}^n$, we can construct a function $f : \{0,1\}^{\log n} \to \mathbb{F}$ that takes a binary vector representing an index of $x$ as an input and outputs the corresponding component of $x$. Multilinear extensions allow us to extend such an index-to-component function from the $\log n$-dimensional hypercube to all of $\mathbb{F}^{\log n}$: for any $f : \{0,1\}^{\log n} \to \mathbb{F}$, there is a unique multilinear extension (MLE) $\widetilde{f} : \mathbb{F}^{\log n} \to \mathbb{F}$ of $f$ [11]. By the uniqueness property, $n$ distinct field elements can be viewed as the MLE polynomial $\widetilde{f}$ of $f$, and vice versa. Using Lagrange interpolation, the MLE of $f$ can be represented as

$$\widetilde{f}(r) = \sum_{x \in \{0,1\}^{\log n}} f(x) \prod_{j=1}^{\log n} \big( x_j r_j + (1 - x_j)(1 - r_j) \big).$$
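A direct implementation of this Lagrange-basis formula, using the higher-bit-first index convention from the Notations paragraph (a naïve evaluation; the function name is ours):

```python
P_MOD = 2**61 - 1

def mle_eval(table, r):
    """Evaluate the MLE f~ of the vector `table` (length 2^m) at r in F^m:
    sum_x f(x) * prod_j (x_j r_j + (1 - x_j)(1 - r_j)), higher bit first."""
    m = len(r)
    assert len(table) == 1 << m
    acc = 0
    for idx, val in enumerate(table):
        term = val
        for j in range(m):
            bit = (idx >> (m - 1 - j)) & 1          # higher bit first
            term = term * ((r[j] if bit else 1 - r[j]) % P_MOD) % P_MOD
        acc = (acc + term) % P_MOD
    return acc

# On Boolean points the MLE agrees with the underlying table:
assert mle_eval([5, 7, 1, 9], [0, 1]) == 7          # index 0b01 = 1
assert mle_eval([5, 7, 1, 9], [1, 0]) == 1          # index 0b10 = 2
```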

3) Sum-Check Protocol for Matrix Multiplications
Thaler proposes an efficient sum-check protocol for matrix multiplication [9]. Let $W, X \in \mathbb{F}^{n \times n}$. Even though it costs $O(n^3)$ field operations to compute $X \cdot W$, one can verify the product with only $O(n^2)$ operations using the sum-check protocol due to Thaler [9]. To apply the sum-check protocol, the matrix multiplication must first be expressed as a sum-check instance.
Without loss of generality, assume that $n$ is a power of 2. Let $Y := X \cdot W$, i.e.,

$$Y(i, j) = \sum_{k \in \{0,1\}^{\log n}} X(i, k) \cdot W(k, j) \qquad (3)$$

for all $(i, j) \in \{0,1\}^{\log n} \times \{0,1\}^{\log n}$. Let $\widetilde{W}, \widetilde{X}, \widetilde{Y} : \mathbb{F}^{\log n} \times \mathbb{F}^{\log n} \to \mathbb{F}$ be the MLEs of $W$, $X$, and $Y$, respectively. Since the MLE of a function is unique, we conclude that

$$\widetilde{Y}(x, y) = \sum_{z \in \{0,1\}^{\log n}} \widetilde{X}(x, z) \cdot \widetilde{W}(z, y) \qquad (4)$$

is the MLE of (3). Instead of checking polynomial equivalence, it is enough to check the evaluation at a random point $(r_L, r_R) \in \mathbb{F}^{\log n} \times \mathbb{F}^{\log n}$; here, the soundness error is bounded by $\log n^2 / |\mathbb{F}|$ by the Schwartz–Zippel lemma [32], [33]. Putting the random point $(r_L, r_R)$ into (4), we obtain

$$\widetilde{Y}(r_L, r_R) = \sum_{z \in \{0,1\}^{\log n}} \widetilde{X}(r_L, z) \cdot \widetilde{W}(z, r_R). \qquad (5)$$

This is an instance of the sum-check problem. The last step is applying the sum-check protocol to (5).
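The following sketch checks the identity (5) numerically on a $4 \times 4$ example (the compact `mle_eval` restates the MLE sketch above). In the actual protocol, the prover runs sum-check over $z$ rather than the verifier computing the sum itself:

```python
import random
from itertools import product

P_MOD = 2**61 - 1

def mle_eval(table, r):
    """MLE of `table` (length 2^m) at r in F^m, as in the MLE sketch above."""
    m, acc = len(r), 0
    for idx, v in enumerate(table):
        t = v
        for j in range(m):
            bit = (idx >> (m - 1 - j)) & 1
            t = t * ((r[j] if bit else 1 - r[j]) % P_MOD) % P_MOD
        acc = (acc + t) % P_MOD
    return acc

n, m = 4, 2                         # 4 x 4 matrices, log n = 2
X = [[random.randrange(100) for _ in range(n)] for _ in range(n)]
W = [[random.randrange(100) for _ in range(n)] for _ in range(n)]
Y = [[sum(X[i][k] * W[k][j] for k in range(n)) % P_MOD for j in range(n)]
     for i in range(n)]             # Y = X * W, computed in O(n^3)

def mle2(A, rx, ry):
    """View the matrix as a function of (row bits, col bits); take its MLE."""
    return mle_eval([A[i][j] for i in range(n) for j in range(n)],
                    list(rx) + list(ry))

rL = [random.randrange(P_MOD) for _ in range(m)]
rR = [random.randrange(P_MOD) for _ in range(m)]

# Equation (5): Y~(rL, rR) = sum_{z in {0,1}^{log n}} X~(rL, z) * W~(z, rR)
rhs = sum(mle2(X, rL, z) * mle2(W, z, rR) for z in product((0, 1), repeat=m)) % P_MOD
assert mle2(Y, rL, rR) == rhs       # the sum over z is what sum-check proves
```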

III. AN EFFICIENT MATRIX REPRESENTATION FOR CNN
We remark that our goal is to construct an efficient proof system that proves the correct evaluation of a given CNN. To achieve this goal, we propose a new sum-check protocol specialized for image data. In this section, we explain how our approach achieves efficiency for each operation.

A. CONVOLUTION OPERATION
We consider stride 1 with no padding for the purpose of simplicity. As mentioned in Section II-A, the convolution operation differs from standard matrix multiplication. Because it performs dot products between the weights and the input feature maps, the convolution operation can be transformed into a matrix multiplication by reformulating the weights into a sparse matrix. After looking into how it can be expressed as a 2-dimensional matrix multiplication, we describe our approach and compare the efficiency of the two approaches. In Fig. 1, we illustrate the existing approach and our approach to expressing a convolution operation as a matrix multiplication.

1) Naïve Approach
Consider the direct approach from SafetyNet [7], which fits the convolution into the proof system for 2-dimensional matrix multiplication. Recall that $N_i = n_i \cdot n_i \cdot c_i$. In fact, the operations in (1) and (2) perform the same underlying operation, an inner product; hence, the convolution can be reduced to the form of a matrix multiplication. In the convolution operation in (2), the feature map $F_i$ is flattened and the weights are reformulated into a sparse matrix $W_i$; letting $U$ denote the resulting feature map of the convolution operation, we can express the convolution as the matrix multiplication of $F_i$ and $W_i$ in (6). After that, we can directly apply the sum-check protocol for 2-dimensional matrix multiplication to $\widetilde{U}$ as in (7), which holds by the fact that (6) is multilinear in each binary variable. However, this transformation causes redundant multiplications because it increases both side lengths of $W$ from $d_i$ to $n_i$, and in practical usage the parameters tend to satisfy $d_i \ll n_i$. Thus, the transformation results in significant performance degradation.
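The blow-up can be seen with a standard im2col-style flattening, which is the kind of transformation the reduction to (6) relies on (the exact index mapping in (6) may differ from this sketch):

```python
import numpy as np

def im2col(X, d):
    """Flatten each d x d x c window of X (n x n x c) into a row, turning the
    convolution (2) into a single 2-D matrix multiplication as in (6)."""
    n, _, c = X.shape
    m = n - d + 1
    rows = [X[a:a + d, b:b + d, :].flatten() for a in range(m) for b in range(m)]
    return np.stack(rows)                    # shape: (m^2, d^2 * c)

# The price of the transformation for a 32 x 32 x 16 input and a 5 x 5 kernel:
X = np.random.rand(32, 32, 16)
M = im2col(X, 5)
print(X.size, "->", M.size)                  # 16384 -> 313600, a ~19x blow-up
```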

2) Our Approach
We use a method that eliminates the transformation in order to avoid the blow-up in the size of the weight matrix. This approach is conceptually straightforward, but it is not simple to realize. The first issue is that the convolution operation cannot be represented as a polynomial that is linear in each index variable, so applying the MLE-based sum-check protocol is difficult. The second issue is that, even if we can describe the convolution operation linearly, this does not by itself ensure the efficiency of the sum-check protocol, whose performance is largely determined by how efficiently the MLE can be computed. Finally, because we are working with a deep neural network, the formula should be recursively applicable, so that the sum-check procedure can be applied repeatedly to the preceding operations.
Our Linearized Convolution Operation. Let $F \in \mathbb{Z}_p^{n_i \times n_i \times c_i \times b}$ and $W \in \mathbb{Z}_p^{d_i \times d_i \times c_i \times c_{i+1}}$ be the feature map and weight matrix of the $i$-th layer, respectively, and let $U = C(F, W) \in \mathbb{Z}_p^{n_{i+1} \times n_{i+1} \times c_{i+1} \times b}$ be the result of the convolution operation in the $i$-th layer. We then express the convolution operation as in (8), where $\boxplus$ denotes index addition computed by a binary full adder. However, a binary full adder does not support linearity. To express the convolution operation in a form that is linear in each binary variable representing an index, we introduce two additional functions $B_d$ and $J_d$ for some $d \in \mathbb{Z}$, defined over integers $m_1, m_2, m_3 \in \mathbb{Z}$ using $\mathbf{2}_i = (2^{i-1}, \dots, 2, 1)$ for some $i \in \mathbb{Z}$ and the inner product $\langle \cdot, \cdot \rangle$. Thus, we obtain the linear expression (9) for the convolution operation (8), for $\tau \in \{0,1\}^{\log b}$ and $\gamma \in \{0,1\}^{\log c_{i+1}}$.
Reducing the Cost of $J_0$. We linearize the convolution operation to execute the sum-check protocol by introducing the new function $J_0$, and the prover must show that $F$, $J_0$, and $W$ satisfy (9). Following [9], the computational complexity of the sum-check protocol depends entirely on the degree of the polynomial: constructing the multilinear extensions of $F$, $J_0$, and $W$ takes time linear in their respective sizes, and it takes about $O(n_i^2 c_i d_i^2)$ field operations to execute the sum-check protocol on (9). Because of $J_0$, this is still apart from the asymptotically optimal proving cost; thus, we need to express $J_0$ with a lower-degree polynomial.
Reducing the Cost of the Sum-Check Protocol. In the above paragraph, we reduced the computational complexity of constructing the multilinear extensions. However, with (10), it still takes about $O(4 n_i^2 c_i d_i^4)$ field operations to execute the sum-check protocol, which remains far from the asymptotically optimal complexity.
We run the sum-check protocol in two steps to reduce the cost of the sum-check protocol itself. More specifically, we divide Eq. (10) into two parts and apply a linear extension: we define an intermediate extension $\widetilde{F}$ as in (11), so that (10) is translated into (12). The equality in the two equations holds by the uniqueness of the MLE. The prover claims that the matrix $U$ is the result of a valid convolution operation. To check whether $U$ is computed correctly, the verifier uses (12) and (11) instead of (10). As noted in the discussion of MLEs in Section II-B, checking that the MLE $\widetilde{U}$ has the right value at a random point chosen by the verifier suffices to check the validity of the computation. Hence, the prover and the verifier first run the sum-check protocol for (12). In the final round, the verifier must know the values $\widetilde{F}((\alpha_1, p_0), (\beta_1, q_0), c_0, \tau)$, $\widetilde{J}_0(\alpha_2, a_0, \eta)$, $\widetilde{J}_0(\beta_2, b_0, \zeta)$, and $\widetilde{W}(\gamma, a_0, b_0, c_0)$, where $a_0, b_0, c_0, p_0, q_0$ are randomly chosen during the sum-check protocol. The values of $\widetilde{J}_0$ and $\widetilde{W}$ are easily checked by the verifier, and the value of $\widetilde{F}((\alpha_1, p_0), (\beta_1, q_0), c_0, \tau)$ is again a claim by the prover that must be proved. The parties therefore continue with the sum-check protocol for (11) to convince the verifier that $\widetilde{F}$ is derived from $F$. Since $F$ is the input, the verifier can finalize the continued sum-check protocol by computing $\widetilde{F}(p, q, c, \tau)$ from the input. We present our sum-check protocol for a CNN convolution operation in Protocol 1.
Theorem 1: The interactive protocol presented above has soundness error $\frac{2 \log(n_i^2 c_i d_i^2 (d_i+1)^2)}{p}$.

Proof: First, we consider the sum-check protocol for (12). In each round, the prover sends a univariate polynomial of degree at most 2. Hence, if there were only one round, the probability that a dishonest prover's claim passes the verification would be less than $\frac{2}{p}$. Denote $\nu := \log(c_i d_i^2 (d_i+1)^2)$ and $\mu := \log n_i^2$, let $f_i$ be the polynomial that a dishonest prover sends to the verifier in the $i$-th round and $\hat{f}_i$ the honest one, and let $E$ be the event that the verifier accepts an invalid statement. By a union bound over the rounds,

$$\Pr[E] \le \sum_{i=1}^{\nu} \Pr\big[f_i(\delta_i) = \hat{f}_i(\delta_i) \mid f_i \ne \hat{f}_i\big] \le \frac{2\nu}{p}.$$

Therefore, the soundness error for (12) is $\frac{2\nu}{p}$. Similarly, the soundness error for (11) is $\frac{2\mu}{p}$. By adding the two soundness errors, we obtain the claimed upper bound on the soundness error of Protocol 1.

3) Efficiency Analysis
In the analysis below, the unit of computation is a field operation.
Naïve Approach. We analyze the efficiency of the naïve approach first. Recall that during the protocol, the prover must substitute the verifier's random challenge for the previous round's variable and compute a univariate polynomial for the next round of the sum-check. As a result, the proving cost in the sum-check protocol is proportional to the number and degrees of the variables; the total proving cost for evaluating the sum-check protocol on (7) is $O(n_i^2 c_i n_{i+1}^2 c_{i+1})$. The verifier pays $O(\log(n_i^2 c_i))$ operations for checking the consistency of the polynomials computed by the prover. Since this term is dominated by the cost $O(n_i^2 c_i n_{i+1}^2 c_{i+1})$ of evaluating the MLE of the expanded weight matrix, we conclude that the total verifier cost is $O(n_i^2 c_i n_{i+1}^2 c_{i+1})$ for the $i$-th intermediate layer, $O(n_1^2 c_1 n_2^2 c_2 + n_1^2 c_1 b)$ for the first layer, and $O(n_i^2 c_i n_{i+1}^2 c_{i+1} + n_{i+1}^2 c_{i+1} b)$ for the last layer. The prover and the verifier run the sum-check protocol for equation (7), whose target polynomial has degree 2 in each of the $\log(n_i^2 c_i)$ variables; hence the prover sends $3 \log(n_i^2 c_i)$ field elements to the verifier, and the verifier sends $\log(n_i^2 c_i) - 1$ random values to the prover.

Our Approach. Now we consider our approach, which for a convolution layer consists of two sum-check protocols proving (11) and (12). (Protocol 1 summarizes the two steps: on common inputs $pp = (n_0, c_0, n_1, c_1, d, b)$, the input feature maps $F$, the weight matrix $W$, and an evaluation $y$ of the MLE of the output feature maps at random points $(\alpha, \beta, \gamma, \tau)$ with $\alpha = (\alpha_1 \,\|\, \alpha_2)$, $\beta = (\beta_1 \,\|\, \beta_2)$, and $k = \log d$, the parties run a $j$-round sum-check for (12) with $j := 4k + \log c_0 + 2$, followed by an $l$-round sum-check for (11) with $l := 2 \log n_0$; in each round the verifier checks $f_{i-1}(\delta_{i-1}) = f_i(0) + f_i(1)$, aborting on failure, and accepts only if all checks, including the final evaluations, hold.) First, to prove (12), the prover evaluates the MLEs of the resulting feature map $U \in \mathbb{Z}_p^{n_{i+1} \times n_{i+1} \times c_{i+1} \times b}$, the weight matrix $W \in \mathbb{Z}_p^{c_i \times d_i \times d_i \times c_{i+1}}$, and the two sparse matrices $\widetilde{F} \in \mathbb{Z}^{2n_i \times 2n_i \times c_i \times b}$ and $J_0$, the last at a cost of $O(2 d_i^3)$ field operations. Additionally, $O(4 d_i^4 c_i)$ field operations are required for constructing the univariate polynomials in the first sum-check protocol. Second, to prove (11), the prover evaluates the MLEs of the feature map matrix $F \in \mathbb{Z}_p^{n_i \times n_i \times c_i \times b}$ and the sparse matrix $J_{d_i} \in \mathbb{Z}_p^{n_{i+1}/d_i \times 2d_i \times n_i}$, which require $O(n_i^2 c_i b)$ and $O(2 n_i n_{i+1})$ field operations, respectively. Additionally, $O(n_i^2)$ operations are required for constructing the univariate polynomials in the second sum-check protocol. We conclude that the total prover cost for our protocol is $O(n_i^2 c_i b + d_i^4 c_i)$. The verifier's costs for checking polynomial consistency are $O(\log n_i^2)$ and $O(\log(4 d_i^4 c_i))$ for the sum-check protocols on (11) and (12), respectively. In the case of the first layer, the verifier must compute $\widetilde{F}$ at a random point, which costs an additional $O(n_i^2 c_i b)$ operations. In the case of the last layer, the verifier must compute $\widetilde{U}$ at a random point, which costs an additional $O(n_{i+1}^2 c_{i+1} b)$ operations. Note that for intermediate layers, the verifier does not have to evaluate $\widetilde{F}$.
Finally, during the sum-check protocols on (11) and (12), the prover sends $3 \log n_i^2$ and $3 \log(d_i^4 c_i) + 6$ field elements, respectively. The verifier sends $\log n_i^2$ field elements in Step 1, because the prover needs the last random elements to execute Step 2; in Step 2, the verifier sends $\log(d_i^4 c_i) + 1$ random elements. In total, the prover sends $3 \log(n_i^2 c_i d_i^4) + 6$ field elements and the verifier sends $\log(n_i^2 d_i^4 c_i) + 1$ random elements in the convolution sum-check protocol.

TABLE 1. Computation costs of the prover and the verifier for SafetyNet, zkCNN, and this work. For our approach and SafetyNet, both the non-batch and batch cases are presented, while zkCNN does not consider the batch case. For zkCNN, we do not include the prover and verifier costs of the polynomial commitment, which is required for zero knowledge.

TABLE 2. Proof sizes in field elements (non-batch case). For our approach and SafetyNet, both the non-batch and batch cases are presented, while zkCNN does not consider the batch case. For zkCNN, we do not include the communication cost of the polynomial commitment, which is required for zero knowledge.

  Protocol   | Proof size (non-batch)
  SafetyNet  | 6 log n_i + 3 log c_i
  zkCNN      | 6(2 log n_i + log c_i + log c_{i+1} + log d_i)
  This work  | 6 log n_i + 3 log c_i + 12 log d_i + 6

TABLE 3. Estimated proof sizes in bytes for the first convolution operation of each network. The base field is $\mathbb{F} = \mathbb{Z}_p$ for $p = 2^{61} - 1$, as considered in SafetyNet. The original convolutional layers of ResNet have a stride of 2; however, we consider a stride of 1 for convenience. For all networks, we consider only the non-batch case.

  Network  | n_0 | d | c_0 | c_1 | SafetyNet | zkCNN | This work
  LeNet-5  | 32  | 5 | 1   | 6   | 229       | 732   | 549
  VGG      | 224 | 3 | 3   | 64  | 412       | 1,190 | 641
  ResNet   | 224 | 7 | 3   | 64  | 412       | 1,236 | 732
Efficiency Comparison. We summarize our efficiency analysis in comparison with SafetyNet and zkCNN in Table 1 and Table 2. Table 1 shows that the computation cost for the prover in our approach is comparable with that of zkCNN and considerably smaller than that of SafetyNet. The computation cost for the verifier in our approach is smaller than that of SafetyNet and larger than that of zkCNN. However, we note that zkCNN does not explain how to deal with batch operations; thus, the computation cost for the verifier in zkCNN may increase in proportion to the batch size $b$. Table 2 shows that the proof size of our approach is highly efficient: it is comparable with that of SafetyNet and almost 2× smaller than that of zkCNN. We also note that the proof size of zkCNN may increase in proportion to the batch size $b$. We present the actual proof sizes for existing CNNs, i.e., LeNet-5 [1], VGG [34], and ResNet [35], in Table 3, where we estimate the proof sizes of SafetyNet, zkCNN, and our approach from Table 2 for the first convolution operation of each network.
We should note that $n_i$ is greater than $d_i$ in real applications. As a result, our methodology significantly reduces both the computational and communication complexity for the prover and the verifier. The reason is that while the naïve approach is effective for one-dimensional data, it does not suit the convolution operation, which deals with three-dimensional data. We point out that our method is the first sum-check protocol optimized for three-dimensional convolution operations, which are employed in image recognition, image classification, and other image-processing neural networks.
Finally, we remark that the computation cost of the verifier is comparable with that of zkCNN for a CNN with deep layers, such as VGG and ResNet. This is because $c_i$ becomes larger than $n_i$ in deep layers, while $n_i$ is larger than $c_i$ in early layers. For example, the last layer of VGG has parameters $n_{\ell-1} = 14$, $n_\ell = 7$, and $c_{\ell-1} = c_\ell = 512$.

B. ACTIVATION AND POOLING
Following the ideas used in SafetyNet [7], we adopt a quadratic activation and sum pooling so that all of the neural network's operations can be expressed using only arithmetic operations. Here, we address how to conduct the activation and sum pooling operations in our setting.

1) Quadratic Activation
The quadratic activation performs an element-wise operation. For our purpose, we should express the quadratic function as a function that is linear in each binary variable. Consider $U \in \mathbb{Z}_p^{n_{i+1} \times n_{i+1} \times c_{i+1} \times b}$ and let $S$ be the result of squaring $U$ element-wise. To show the correct execution of the quadratic operation, we observe that

$$S(x, \tau) = \sum_{y} I(x, y) \cdot U(y, \tau) \cdot U(y, \tau), \qquad (13)$$

where $I$ is the identity matrix. We emphasize that this allows us to take a linear extension. The prover and the verifier run the sum-check protocol on the MLE of (13). At the end of the sum-check protocol, the verifier obtains an assertion about $U$; to prove/verify that the assertion is correct, both parties continue the proof procedure until the verifier obtains an assertion about the input layer. The prover's computational complexity is $O(n_{i+1}^2 c_{i+1} b)$, the verifier's computational complexity is $O(\log(n_{i+1}^2 c_{i+1}))$ including the check in the last round, and they communicate $O(\log(n_{i+1}^2 c_{i+1}))$ field elements.
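A numeric check of (13) at a random point, writing the MLE of the identity matrix as the standard equality polynomial (our notation; the compact `mle_eval` restates the MLE sketch of Section II-B). In the protocol, the sum over $y$ is delegated to the prover via sum-check:

```python
import random
from itertools import product

P_MOD = 2**61 - 1

def mle_eval(table, r):
    """MLE of `table` (length 2^m) at r in F^m, as in the MLE sketch above."""
    m, acc = len(r), 0
    for idx, v in enumerate(table):
        t = v
        for j in range(m):
            bit = (idx >> (m - 1 - j)) & 1
            t = t * ((r[j] if bit else 1 - r[j]) % P_MOD) % P_MOD
        acc = (acc + t) % P_MOD
    return acc

def eq(r, y):
    """MLE of the identity matrix at (r, y): equals 1 iff r == y on {0,1}^m."""
    acc = 1
    for rj, yj in zip(r, y):
        acc = acc * ((rj * yj + (1 - rj) * (1 - yj)) % P_MOD) % P_MOD
    return acc

m = 3
U = [random.randrange(P_MOD) for _ in range(1 << m)]   # one slice of U
S = [u * u % P_MOD for u in U]                         # element-wise square

r = [random.randrange(P_MOD) for _ in range(m)]
# (13) at a random point: S~(r) = sum_y I~(r, y) * U(y) * U(y)
rhs = sum(eq(r, y) * U[i] * U[i]
          for i, y in enumerate(product((0, 1), repeat=m))) % P_MOD
assert mle_eval(S, r) == rhs
```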

2) Sum Pooling
The purpose of the pooling layer is to reduce the dimensions of the feature maps to prevent overfitting. For the sake of simplicity, we consider a 2-by-2 pooling layer, but it can be easily generalized to an $n$-by-$n$ pooling layer. Let $T \in \mathbb{Z}_p^{n_{i+1} \times n_{i+1} \times c_{i+1} \times b}$ be the input matrix of the pooling layer and $S \in \mathbb{Z}_p^{\frac{n_{i+1}}{2} \times \frac{n_{i+1}}{2} \times c_{i+1} \times b}$ be the output matrix of the pooling layer. We can represent sum pooling as a multiplication of three matrices by defining a sparse matrix $P \in \mathbb{Z}_p^{n_{i+1} \times \frac{n_{i+1}}{2}}$ such that $P(x, y) = 1$ if $\lfloor x/2 \rfloor = y$ and $P(x, y) = 0$ otherwise. If $S$ is a valid output of the pooling layer on the input matrix $T$, then $S_{\gamma,\tau} = P^{\top} \cdot T_{\gamma,\tau} \cdot P$ holds for every channel $\gamma \in [c_{i+1}]$ and batch $\tau \in [b]$, where $P^{\top}$ denotes the transpose of $P$. Using this matrix equality and the relation $P^{\top}(x, y) = P(y, x)$, we take a linear extension of the right-hand side and use the sum-check protocol. Note that at the end of the protocol, the verifier checks the correctness of the value $\widetilde{P}$ locally and continues with another sum-check protocol to check $T$. The prover's computational cost is $O(n_{i+1}^2 c_{i+1} b)$ and the verifier's computational complexity is $O(\log(n_{i+1}^2))$. They communicate $O(\log(n_{i+1}^2))$ field elements.
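A small NumPy check of the matrix form of sum pooling, assuming the block-summation matrix $P$ defined above (the transposition convention, $P^{\top} T P$ versus $P T P^{\top}$, depends on the orientation chosen for $P$):

```python
import numpy as np

def pool_matrix(n):
    """0/1 matrix P of shape (n, n/2) with P[x, y] = 1 iff floor(x/2) == y."""
    P = np.zeros((n, n // 2))
    for x in range(n):
        P[x, x // 2] = 1.0
    return P

n = 8
T = np.random.rand(n, n)    # one channel/batch slice of the pooling input
P = pool_matrix(n)
S = P.T @ T @ P             # each entry of S sums a 2 x 2 block of T
assert np.isclose(S[0, 0], T[0:2, 0:2].sum())
```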

IV. EXTENSION TO CNN
Now we describe our protocol for a simplified CNN consisting of convolution, activation, pooling, and fully connected layers. From this simple neural network, it is straightforward to extend to other common CNNs. We assume that the prover and the verifier agree on the computational model and parameters; in other words, model parameters such as weights, biases, and kernel sizes are publicly available. As described in Section III, we assume that all of the operations in the CNN are expressed as matrix multiplications.

A. HIGH LEVEL DESCRIPTION
Softmax is frequently used as the activation after the fully connected layer in classification tasks to make the result interpretable as a probability. However, proof generation for softmax is computationally demanding. Thus, we assume that the prover provides the outputs of the fully connected layer, rather than the probability outputs of the network. The prover then proves the claim about the fully connected layer using the sum-check protocol. Let $f_i \in \mathbb{Z}_p[X]$ be the polynomial that the prover sends to the verifier in the $i$-th round of the sum-check protocol for the fully connected layer. To finalize the claim about the fully connected layer, the verifier checks that the final-round value $f_{\log N_3}(r_{\log N_3})$ is consistent with $\widetilde{W}$ and $\widetilde{F}$ evaluated at the random point $r = (r_1, \dots, r_{\log N_3})$ chosen during the protocol. Here, $\widetilde{W}$ and $\widetilde{F}$ are the MLEs of the weights and the inputs of the fully connected layer, respectively.
To check the above claim, the verifier needs the MLE of $F$, which is the output of the pooling layer. Instead of evaluating it directly, the verifier checks that $F$ comes from the previous pooling layer using (15). Since $F$ is the output of the previous pooling layer, $F$ has the form of (14). Both parties run the sum-check protocol on $\widetilde{F}(r)$; this protocol guarantees that $F$ is a correct evaluation of the pooling layer. After that, the verifier needs to compute the MLE of the inputs of the pooling layer, which should be equal to the outputs of the activation layer; in this step, both parties run the sum-check protocol for the activation layer. As the last step, they run the convolution sum-check protocol. Since the verifier already knows the inputs of the network, the verifier can complete the convolution sum-check. It is crucial to improve the efficiency of the convolution part because the evaluation of the MLE of the weight matrix in each convolutional layer is the most computationally demanding work for the verifier. Thus, we reiterate that our approach to convolution is the key idea for an efficient verifiable CNN.
To reduce the prover cost, we adopt the two-step sum-check idea. At the end of the convolution sum-check, $\mathcal{V}$ computes $X$ using the input value of the CNN and then finalizes the protocol by outputting accept or reject.

Theorem 2: The interactive protocol for the simplified CNN described above has soundness error $\frac{1}{p} \big( \log sb + 2 \log N_3 + 6 \log n_2 + 2 \log N_1 + 2 \log(d^2 (d+1)^2 c_0) + 4 \log n_0 \big)$.

Proof: The protocol starts by reducing the prover's claim about the output matrix to a single point of its multilinear extension, which introduces a soundness error of $\frac{\log sb}{p}$. The sum-check is carried out over $\log N_3$ rounds for the fully connected layer, $2 \log n_2$ rounds for the pooling layer, and $\log N_1$ rounds for the quadratic activation layer. Finally, the two sum-check protocols for the convolutional layer take $\log(d^2 (d+1)^2 c_0) + 2 \log n_0$ rounds. The univariate polynomials transmitted by the prover during the sum-check protocol for the pooling layer have degree at most 3, and all the remaining univariate polynomials have degree at most 2; this causes the factors of 3 and 2 in the soundness error. Hence, as in the observation of Theorem 1, by adding up the soundness errors from all layers, we obtain the soundness error $\frac{1}{p} \big( \log sb + 2 \log N_3 + 6 \log n_2 + 2 \log N_1 + 2 \log(d^2 (d+1)^2 c_0) + 4 \log n_0 \big)$.
Since the sum-check for the pooling layer takes only $2 \log n_2$ rounds, $\log c_2$ does not appear in this term. However, for convenience, we write the term $\frac{6 \log n_2}{p}$ as $O\big(\frac{\log N_2}{p}\big)$.

V. CONCLUSION
In this paper, we construct a new efficient sum-check protocol for the convolution operation of a CNN. Existing protocols such as SafetyNet and zkCNN transform a convolution operation of a CNN into a 2-dimensional matrix multiplication. However, the resulting matrix multiplication naïvely takes cubic time in the matrix dimension, and those approaches must apply a sum-check protocol to large weight matrices. The idea behind our construction is to interpret the input and intermediate values as 3-dimensional matrices, which is a natural approach for convolution operations. Our approach is non-trivial because convolution operations in this setting do not satisfy linearity; we resolve this challenge by introducing predicate functions that linearize the convolution operation in each binary variable. Our construction provides an asymptotically optimal proving cost, as zkCNN does, which is much more efficient than SafetyNet, and the proof size of our construction is approximately 2× smaller than that of zkCNN. In addition, by applying our construction, we present a new verifiable CNN that employs a quadratic activation and sum pooling. In future work, we will continue to research applying our approach to CNNs with other widely used activation functions and pooling operations.