I. Introduction
Convolutional Neural Network (CNN) is recently given great attention because of its extended applications in image classification [1], segmentation, and other computer vision problems. CNN usually consists of 2 parts: features extracting part – made of convolutional layers and pooling layers, and classifying part – which contains lots of stacked fully connected layers. In the first part, kernels in convolutional layers scan the input image step by step, multiplying the weights in each kernel by the pixels’ values and combining the sum to create a new image passed to the next layer. Pooling layers play a role in down-sampling to reduce the number of data and save computational resources. In the second part, the image first passes through a flatten layer to be converted to a one-dimensional array. The following fully connected layers use this array as input and produce the predicted label by applying the linear combination and the non-linear activation function.