Quantum Dilated Convolutional Neural Networks

In recent years, with rapid progress in the development of quantum technologies, quantum machine learning has attracted considerable interest. In particular, a family of hybrid quantum-classical neural networks, consisting of both classical and quantum elements, has been extensively explored for the purpose of improving the performance of classical neural networks. In this paper, we propose a novel hybrid quantum-classical algorithm called quantum dilated convolutional neural networks (QDCNNs). Our method extends the concept of dilated convolution, which has been widely applied in modern deep learning algorithms, to the context of hybrid neural networks. The proposed QDCNNs are able to capture larger context during the quantum convolution process while reducing the computational cost. We perform empirical experiments on the MNIST and Fashion-MNIST datasets for the task of image recognition and demonstrate that QDCNN models generally achieve better performance in terms of both accuracy and computational efficiency compared to existing quantum convolutional neural networks (QCNNs).


I. INTRODUCTION
Convolutional neural networks (CNNs), proposed by Yann LeCun et al. [1] in 1989, are among the most powerful algorithms in deep learning. The main advantage of CNNs is that they use multiple feature extraction stages to automatically and accurately learn important features from the data without human supervision. Owing to this advantage, CNNs have been tremendously successful in a broad array of high-level computer vision problems, including image recognition [2]-[5], object detection [6]-[8], and image segmentation [9]-[11]. In recent years, with further developments in deep learning, CNNs have also demonstrated promising performance in other machine learning areas such as time series forecasting [12], [13], speech recognition [14], and recommendation systems [15].
In parallel, with recent achievements in quantum technologies (e.g., noisy intermediate-scale quantum (NISQ) processors are now available), the domain of quantum machine learning has attracted growing attention and triggered an enormous amount of work. Quantum machine learning is a research area that aims to exploit quantum mechanical effects such as superposition and entanglement to improve the performance of machine learning algorithms. Even though quantum machine learning is a young discipline, it has already produced a number of successful quantum extensions of classical machine learning methods, including support vector machines [16], clustering [17], [18], and principal component analysis [19].
Among quantum machine learning algorithms, quantum convolutional neural networks (QCNNs), also known as hybrid quantum-classical convolutional neural networks, are a family of variational quantum algorithms that have recently become a very active research field. The central idea of QCNNs is to construct a quantum convolutional layer within a neural network based on parameterized quantum circuits to estimate complex kernel functions in a high-dimensional Hilbert space. Inspired by CNNs, Liu et al. [20] proposed the first QCNN model and implemented it for image recognition. Afterwards, the QCNN model was investigated further in various works [21]-[26]. Recently, it has been demonstrated in [27] that QCNN models can also achieve promising results in speech recognition.
Despite these successes, QCNNs suffer from computational bottlenecks that make them time-consuming to train. Firstly, quantum operations applied to n-qubit quantum circuits require unitary matrices of size 2^n × 2^n, which scale exponentially with the size of the quantum circuit. Moreover, when QCNNs are trained on a real quantum device, the calculation of gradients via the parameter-shift rule [28], [29] results in additional quantum circuit executions. For example, a quantum filter with p trainable parameters adds 2p extra quantum circuit executions per training sample to compute the required gradients. Even though this problem can be mitigated when QCNNs are implemented on quantum simulators that support more efficient gradient computation methods such as back-propagation [30], [31] and the adjoint differentiation method [32], QCNNs inevitably face another challenge. In CNNs, a convolutional layer, due to local connectivity, performs a large number of element-wise matrix multiplication operations. For example, a 100 × 120 output feature map of a convolutional layer is obtained from 100 × 120 = 12,000 multiplication operations. The computational cost increases significantly with the feature map size. Fortunately, this computational issue in CNNs can be handled by vectorization techniques [33], [34]. QCNNs, as the counterpart of CNNs, have the same problem. However, unlike CNNs, most current quantum devices, including quantum hardware and quantum simulators, do not support vectorization. Even as more mature quantum devices become available in the NISQ era, executing such a large number of quantum circuits would in general remain impractical.
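The two-evaluations-per-parameter cost of the parameter-shift rule can be made concrete with a minimal single-qubit sketch (illustrative only; the function `expectation` stands in for a real device execution, and the analytic form ⟨Z⟩ = cos θ holds for the circuit RY(θ)|0⟩):

```python
import numpy as np

def expectation(theta):
    """<Z> for the single-qubit circuit RY(theta)|0>, which equals cos(theta)."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    Z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ Z @ state)

def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Gradient of a quantum expectation value via the parameter-shift rule:
    two extra circuit evaluations per trainable parameter."""
    return 0.5 * (f(theta + shift) - f(theta - shift))

theta = 0.3
grad = parameter_shift_grad(expectation, theta)
# analytically, d/dtheta cos(theta) = -sin(theta)
```

For a filter with p parameters, this rule is applied once per parameter, giving the 2p extra executions per sample quoted above.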
A few works have investigated how to reduce the runtime complexity of QCNNs. In the first family of works, the number of qubits required for the quantum circuit is kept small by using classical data pre-processing techniques to reduce the dimension of the input features fed into the quantum (convolutional) layer. For instance, Pramanik et al. [35] employ principal component analysis (PCA) to reduce the VGG-16 features for the variational quantum classifier (VQC), while Hur et al. [36] adopt autoencoding (AutoEnc) for dimensionality reduction. Nevertheless, the performance of a model trained in this way is likely to be compromised by the limited expressive power of the reduced features, as shown in [35]. The second family of works focuses on how to efficiently encode classical data into quantum states. Schuld and Killoran [37] propose and implement amplitude encoding for variational quantum circuits, which is explored further in [38] for the Flexible Representation of Quantum Images (FRQI). This type of encoding is efficient in terms of the qubits required for data encoding, but it relies on very deep quantum circuits, which is impractical on NISQ devices. In a different direction, some recent works [26], [39], [40] propose angle encoding (also referred to as qubit encoding) and its variants (e.g., dense angle encoding), which use a constant quantum circuit depth for state preparation. This encoding scheme requires one qubit to encode one (or a limited number) of the components of the input feature vector and is thus not efficient for high-dimensional input features from a resource perspective. To trade off these two encoding methods, Hur et al. [36] further develop a hybrid encoding approach that requires fewer qubits than angle encoding and a shallower quantum circuit than amplitude encoding. Moreover, Henderson et al. [22] employ a threshold-based encoding technique to reduce the input-state space, making it possible to obtain the output feature map through a look-up table during the quantum convolution process without repeatedly executing the same quantum circuit on image segments. This method is easy to implement, but it is infeasible on real quantum devices, as mentioned in [22].
Having reviewed these challenges and developments, in this work we propose a novel hybrid quantum-classical architecture, which we call the quantum dilated convolutional neural network (QDCNN). Our approach, motivated by dilated convolution in deep learning, is an extension of the architectures presented in [20] and [22], and reduces the computational cost of QCNNs in a different way than the aforementioned approaches. Dilated convolution, also known as atrous convolution, was originally developed for efficiently computing the undecimated discrete wavelet transform [41]. In recent years, dilated convolution has attracted more and more attention and is widely used in semantic segmentation [11], [42]-[46]. Following these successes, dilated convolution has also been adopted for a broader set of tasks, such as object localization [47], time series forecasting [12], [13], and sound classification [48]. The advantage of dilated convolution is that it effectively expands the field of view of the filters to capture larger context without increasing the number of parameters or the computational complexity. By virtue of dilated convolution, the proposed QDCNNs can generally improve the computational efficiency of existing QCNNs while achieving better task performance.
In summary, the contributions of our work are:
• We propose a novel architecture of quantum convolutional neural network based on the quantum dilated convolution operation. To the best of our knowledge, our work is the first attempt to combine the concept of dilated convolution with variational quantum circuits.
• We conduct experiments on the MNIST and Fashion-MNIST datasets and demonstrate the superior performance of QDCNN models over QCNN models.

1) Convolution Operation
The convolutional layer, which performs an operation called a "convolution", plays a central role in CNNs. In the context of convolutional networks, a convolution is a linear operation that involves the multiplication of a set of weights with the input. For a convolution operation, a kernel or filter is defined as a feature extractor, which is a two-dimensional (2-D) array of learnable weights. A filter is applied to a filter-size patch of the input image called the receptive field, and a dot product is performed between the pixels within the receptive field and the weight values in the filter. Afterwards, the filter shifts to the next patch according to a step size called the stride, and repeats the above process until it has swept across the entire image. The final output from the series of dot products between the filter weights and the values underneath the filter is called a feature map. Let us denote the output feature map by y and the input image by x. In the 2-D convolution process, the feature map y is obtained by applying a filter k to the input image x:

y(i, j) = Σ_m Σ_n x(i + m, j + n) k(m, n),   (1)

where i and j are location indices of y. Due to the convolution operation, the output feature map usually has a smaller spatial resolution than the input image. This reduction in dimensions can be avoided by employing the zero padding technique, namely adding a border of pixels with value zero around the edges of the input image before the application of a filter. A hyperparameter called padding determines how many zero values are added to the border of the image. Generally, the spatial resolution o_w and o_h of the resulting feature map, extracted from an i_w × i_h input image by an m × n kernel, can be calculated as

o_w = ⌊(i_w − m + 2p)/s⌋ + 1,   (2)
o_h = ⌊(i_h − n + 2p)/s⌋ + 1,   (3)

where p and s represent padding and stride, respectively.
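The convolution operation and output-size formulas above can be checked with a small NumPy sketch (an illustrative toy; the function name `conv2d` is ours, and it computes the cross-correlation form commonly used in deep learning):

```python
import numpy as np

def conv2d(x, k, stride=1, padding=0):
    """Plain 2-D convolution of image x with kernel k (no flipping),
    sliding the kernel with the given stride after optional zero padding."""
    if padding:
        x = np.pad(x, padding)
    m, n = k.shape
    ih, iw = x.shape
    oh = (ih - m) // stride + 1  # output height, as in the size formula
    ow = (iw - n) // stride + 1  # output width
    y = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + m, j * stride:j * stride + n]
            y[i, j] = np.sum(patch * k)  # dot product over the receptive field
    return y

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))
out = conv2d(image, kernel, stride=2)
# each output value is the sum of one non-overlapping 2x2 patch
```

With a 28 × 28 input, a 2 × 2 kernel, stride 2, and no padding, the same size formula gives a 14 × 14 feature map.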

2) Dilated Convolution
Dilated convolution is a type of convolution that expands the kernel by inserting holes (i.e. points with weight of zero) between the consecutive kernel elements. In simple terms, dilated convolution is just a convolution applied to the input with defined gaps. Compared to standard convolution, dilated convolution introduces an extra hyperparameter called dilation rate that determines the stride with which the input pixels are sampled. According to the definition of dilated convolution, r − 1 zero values are inserted between two consecutive filter values, if the dilation rate is denoted by r.
In this spirit, in the context of dilated convolution, (1) needs to be reformulated as

y(i, j) = Σ_m Σ_n x(i + r·m, j + r·n) k(m, n).   (4)

It can be seen from (4) that dilated convolution is able to capture a larger receptive field without introducing more learnable parameters, compared to a standard convolution with the same kernel size. Moreover, for dilated convolution, we also need to rewrite (2) and (3) as

o_w = ⌊(i_w − m − (m − 1)(r − 1) + 2p)/s⌋ + 1,   (5)
o_h = ⌊(i_h − n − (n − 1)(r − 1) + 2p)/s⌋ + 1,   (6)

which indicate that dilated convolution generally results in a smaller feature map than standard convolution for the same set of hyperparameters, since an m × n kernel behaves like one of effective size (m + (m − 1)(r − 1)) × (n + (n − 1)(r − 1)). It is worth noting that standard convolution can be regarded as a special case of dilated convolution with dilation rate r = 1.

3) Quantum Convolution
In contrast to classical convolution, quantum convolution is a new type of convolution based on quantum circuits, and it generally consists of three modules:
• ENCODING MODULE. In this module, classical data are encoded into a quantum state, which is further processed in the quantum convolutional circuit. There exist various encoding methods, such as angle encoding, amplitude encoding, and basis encoding; a summary of them can be found in the literature [49]. Among these methods, angle encoding is the most commonly used approach. In this scheme, the classical input is treated as the rotation angle of a single-qubit rotation gate (e.g., an RY rotation gate). For example, a classical variable or feature a can be encoded by RY(a) applied to some initial state (e.g., the vacuum state |0⟩). In this sense, we can say that the classical information a is encoded into the initial state of a qubit. This type of angle encoding is called one variable/qubit encoding, and it requires n qubits to encode n input variables. To reduce the number of required qubits, we can also encode multiple variables by sequential rotations applied to a single qubit. For example, input variables a_1, a_2, and a_3 can be encoded using RX(a_1), RY(a_2), and RZ(a_3) rotation gates applied successively to a single qubit. This angle encoding is called multiple variables/qubit encoding or dense angle encoding. In this paper, we focus on the one variable/qubit encoding method. Let us denote by E(x) the encoding operator, where x is the input vector. The encoded quantum state is then obtained as |x⟩ = E(x)|0⟩^⊗n. It is worth noting that E(x) usually contains Hadamard gates, which transform the initial state into a uniform superposition state.
• ENTANGLEMENT MODULE. In this module, a cluster of single- and multi-qubit gates is applied to the encoded quantum state obtained from the previous module. The multi-qubit gates are usually CNOT gates and parametric controlled rotation gates (e.g., CRZ(θ), where θ is a trainable parameter); they are used to generate correlated quantum states, namely entangled states. The single-qubit gates are mainly parametric rotation gates. This combination of single- and multi-qubit gates is referred to as a parameterized layer in a QCNN and is designed to extract task-specific features. The parameterized layer is usually repeated multiple times to extend the feature space. If we denote all unitary operations in the entanglement module by U(θ) for simplicity, the output quantum state is |x, θ⟩ = U(θ)|x⟩.
• DECODING MODULE. At this stage, a certain local observable A^⊗m (e.g., the Pauli Z operator σ_z^⊗m) is measured in the final quantum state |x, θ⟩ from the entanglement module, where m is less than or equal to the total number of qubits n in the quantum system. The expectation value of the chosen observable A^⊗m is obtained by repeated measurements: f(x, θ) = ⟨x, θ|A^⊗m|x, θ⟩. The purpose of this layer is therefore to extract a classical output vector f(x, θ) by mapping the quantum state to a classical vector. This classical vector f(x, θ) can be used as the input features for the subsequent layer in the QCNN.
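Putting the three modules together, one quantum convolution step can be sketched as a small state-vector simulation (an illustrative toy, not the exact circuit used in our experiments: one variable/qubit RY encoding, a fixed CNOT ring in place of a trainable entangling layer, and per-qubit ⟨Z⟩ expectations for decoding):

```python
import numpy as np

I2 = np.eye(2)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def kron_all(ops):
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def cnot(n, control, target):
    """CNOT on an n-qubit register, built as a permutation of basis states
    (qubit 0 is the most significant bit)."""
    dim = 2 ** n
    U = np.zeros((dim, dim), dtype=complex)
    for b in range(dim):
        bits = [(b >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control]:
            bits[target] ^= 1
        b2 = 0
        for bit in bits:
            b2 = (b2 << 1) | bit
        U[b2, b] = 1
    return U

def quantum_filter(patch):
    """One quantum convolution step on a flattened 2x2 patch (4 pixel values).
    Encoding: RY(x_i) per qubit; entanglement: ring of CNOTs;
    decoding: <Z_i> on each qubit -> 4 output channels."""
    n = 4
    state = kron_all([ry(x) @ np.array([1, 0], dtype=complex) for x in patch])
    for q in range(n):
        state = cnot(n, q, (q + 1) % n) @ state
    outs = []
    for q in range(n):
        obs = kron_all([Z if i == q else I2 for i in range(n)])
        outs.append(float(np.real(state.conj() @ obs @ state)))
    return outs
```

Sliding this filter over every patch of the image, with one ⟨Z_i⟩ per output channel, produces the multi-channel quantum feature map described above.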

B. QDCNN
The proposed QDCNN is designed in the same fashion as the QCNNs described in the literature [20], [22]. Our model integrates quantum layers with classical layers, and the quantum circuit ansatz can be placed anywhere in the model (e.g., at the beginning of the network or at intermediate layers).
The key difference between our method and existing QCNNs is that dilated convolution is employed for the quantum convolutional layer, so the quantum layer in QDCNNs is called a quantum dilated convolutional (QDC) layer. An example of a QDC layer is illustrated in Fig. 1. Due to the mechanism of dilated convolution, the quantum kernel in our model generally covers larger image patches (i.e., receptive fields). For example, a 2 × 2 quantum dilated convolution with a dilation rate of 3 has a receptive field of 4 × 4, whereas a standard quantum convolution with the same kernel size has a receptive field of only 2 × 2. It is noteworthy that even though quantum dilated convolution expands the receptive field, the number of data points fed into the quantum convolution circuit is the same as for standard quantum convolution. This means that quantum dilated convolution does not require more qubits than standard quantum convolution with the same kernel size.
Our QDCNN model has two main advantages. Firstly, thanks to the enlarged receptive field, the QDC layer reduces the number of times the quantum kernel slides across the image (given the same stride and no padding), compared to existing QCNN models. This can be understood by comparing (2) and (3) with (5) and (6), respectively. Therefore, using the QDC layer helps reduce the number of quantum circuit executions during the quantum convolution process. In the NISQ era, long training time is one of the biggest challenges facing QCNN models. This difficulty mainly stems from the large number of quantum circuit executions required by the quantum layers. In the quantum feature mapping process, due to its probabilistic nature, a quantum measurement is usually performed multiple times (e.g., 1024 shots) to obtain the expectation values of the observables, which can be regarded as the extracted quantum feature maps. Reducing the number of quantum circuit executions therefore plays a crucial role in mitigating the long-training-time problem of QCNNs. The proposed quantum dilation is a powerful tool to explicitly control the number of quantum circuit executions in the quantum layer.
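As a back-of-the-envelope illustration (assuming the 28 × 28 inputs, 2 × 2 kernel, and stride 2 used later in our experiments, and a hypothetical 1024-shot budget per expectation value), the saving in kernel positions compounds with the number of measurement shots:

```python
def circuit_executions(i_size, k_size, rate, stride):
    """Kernel positions per image = quantum circuit executions per image
    for one quantum filter (no padding); the dilated kernel has effective
    size k + (k - 1) * (rate - 1)."""
    effective = k_size + (k_size - 1) * (rate - 1)
    per_side = (i_size - effective) // stride + 1
    return per_side * per_side

SHOTS = 1024  # repeated measurements per expectation value on hardware
qcnn = circuit_executions(28, 2, rate=1, stride=2)      # standard convolution
qdcnn_r2 = circuit_executions(28, 2, rate=2, stride=2)  # dilation rate 2
saved_shots = (qcnn - qdcnn_r2) * SHOTS  # shots saved per image
```

Per image, the dilated layer needs 27 fewer circuit positions, i.e., tens of thousands fewer shots at this shot budget, before even counting the parameter-shift evaluations during training.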
The second advantage of our model is that it can improve the performance (e.g., classification accuracy) of existing QCNN models. Due to the expanded receptive field, the QDC layer generally reduces the spatial resolution of the resulting feature maps. However, these feature maps are extracted from larger receptive fields of the image and hence contain long-range context, which plays an essential role in many machine learning tasks such as image recognition and image segmentation.

III. EXPERIMENTS
In this section, we conduct two experiments to evaluate the performance of our proposed QDCNN model and compare it with the existing QCNN model. In Experiment A and Experiment B, we construct quantum convolutional models with non-trainable and trainable quantum filters, respectively.

1) Dataset
We choose the MNIST and Fashion-MNIST image benchmark datasets [50], [51] for our experiments. The MNIST dataset contains 10 classes of handwritten digits from '0' to '9', while the Fashion-MNIST dataset is a collection of 10 classes of fashion articles such as t-shirts, dresses, and shoes. Both datasets have 60,000 training samples and 10,000 test samples of 28-by-28 gray-scale pixel images. Due to the expensive training and validation, we pick two subsets of the full MNIST and Fashion-MNIST datasets, respectively, each consisting of 1,000 balanced training samples and 200 balanced test samples.

2) Tested Models
In this research, we consider two types of models:
• QDCNN Model. We employ the architecture of the most basic convolution-inspired hybrid quantum-classical neural network. Our QDCNN model consists of one QDC layer with one filter and one fully-connected layer with 10 neurons. Unless otherwise specified, the kernel size and stride for the QDC layer are 2 × 2 and 2, respectively. The quantum circuit ansatz of the QDC layer is designed as follows. The one variable/qubit encoding scheme is adopted to encode the input image. Specifically, 2 × 2 pixels are encoded into a 4-qubit state using RY rotation gates. Note that these 2 × 2 pixels are not adjacent to each other in the input image, due to the quantum dilated convolution. The resulting 4-qubit state is further transformed by a random parameterized quantum circuit, which may create entanglement. The decoding method follows the same spirit as [52], in which each expectation value is mapped to a different channel of a single output pixel. Consequently, even though there is only one filter, the quantum layer transforms the input 2-D image into four feature maps. This type of quantum layer may benefit the model performance, as it allows for correlation among the channels of the output feature maps. For both the MNIST and Fashion-MNIST datasets, the QDC layer extracts from the 28 × 28 input image a feature tensor of size 13 × 13 × 4, which is then transformed into 10 output probabilities by the fully-connected layer with softmax activation. To evaluate how the dilation rate impacts the model performance, we consider two QDCNN models with dilation rates r = 2 and r = 3. We refer to these two models as QDCNN_r2 and QDCNN_r3, respectively, for the rest of the paper.

FIGURE 1. An example of a quantum dilated convolutional (QDC) layer with kernel size = 2 × 2 and dilation rate = 2. In contrast to standard quantum convolution, dilated convolution is applied to the input with defined gaps (one gap in this example) to enlarge the receptive field. The encoding, entanglement, and decoding modules of the QDC layer are highlighted in green, purple, and yellow, respectively.
• QCNN Model. We choose the standard QCNN model as our benchmark. The QCNN model follows the same structure as our QDCNN model, with the only difference that it uses a standard quantum kernel rather than a dilated quantum kernel.
The random quantum circuit in each of these models consists of two 4-qubit random layers, each of which has four non-trainable or trainable parameters. For a fair comparison, all of these random circuits share the same architecture, generated by the same random seed.

3) Training Setup
In Experiment A, after applying the non-trainable quantum filter to transform the original image data into feature maps, we use a mini-batch size of 32 and the Adam optimizer with a learning rate of 0.01 to train each model for 30 epochs. In Experiment B, due to the computational cost of training the parametric quantum circuits involved in the trainable quantum filter, we reduce the batch size to four and train all models for 20 epochs, with the other hyperparameters unchanged.

4) Experimental Environment
Experiments are conducted on a local computer with a 6-core CPU (2.2 GHz) using PennyLane [53], Qulacs [54], and PyTorch [55]. PennyLane is an open-source Python-based framework that enables automatic differentiation for hybrid quantum-classical computations. It is compatible with mainstream machine learning frameworks such as TensorFlow [56] and PyTorch, and it has a large plugin ecosystem offering access to numerous quantum devices (i.e., simulators and hardware) from different vendors, including IBM, Google, Microsoft, Rigetti, and QunaSys. In Experiment A, we perform the quantum processing of the original image data using the Qulacs simulator [57], a high-performance C++ quantum simulator made available through the community-contributed PennyLane-Qulacs plugin [58]. In Experiment B, considering the large number of quantum circuit executions required by the parameter-shift rule, we instead train all hybrid models using the built-in PennyLane simulator default.qubit, which supports the back-propagation method for the PyTorch interface.

B. RESULTS
As demonstrated in Table 1 and Table 2, 13 × 13 = 169 quantum circuits need to be executed for the QDC layers with dilation rate 2 and dilation rate 3, while 14 × 14 = 196 quantum circuits are needed for the standard quantum convolutional layer. This means that QDCNN_r2 and QDCNN_r3 require 27 fewer quantum circuit executions per image than the QCNN model. Compared with Experiment A, it generally takes much longer to train the hybrid models in Experiment B, even though the quantum filter in each of them has only eight trainable parameters coming from the random circuit. This is mainly because PennyLane does not support vectorization of quantum circuit executions. Nevertheless, quantum dilated convolution can still help reduce the training time significantly in this case. Furthermore, it can also be seen from Table 1 and Table 2 that our QDCNN models generally enjoy higher recognition accuracy than the QCNN model. In particular, QDCNN_r3 achieves the best performance with regard to both validation loss and accuracy across all tasks. Owing to the QDC layer with a dilation rate of 3, QDCNN_r3 provides up to 31.74% lower validation loss and up to 3% higher validation accuracy compared with the QCNN model. This observed performance boost mainly stems from the larger-scale contextual information captured by the QDC layer.

IV. CONCLUSIONS
In this work, we propose the QDCNN model, which adapts the idea of dilated convolution from deep learning to quantum neural networks. We show through empirical evidence that the QDCNN model outperforms the recent QCNN method in terms of both computation time and recognition accuracy. In particular, we find that quantum dilated convolution with a larger dilation rate generally contributes to better model performance. Dilated convolution has been extensively studied in deep learning, but little work has been done to explore it in the context of quantum machine learning. Our work constitutes a first step in this direction. Given the promising results on both the MNIST and Fashion-MNIST datasets, our QDCNN approach deserves further investigation in the future.