A Frequency-Domain Convolutional Neural Network Architecture Based on the Frequency-Domain Randomized Offset Rectified Linear Unit and Frequency-Domain Chunk Max Pooling Method

It is of great importance to construct a convolutional neural network architecture in the frequency domain in order to explore the theory of deep learning in the frequency domain. However, the forward and backward pipelines needed to train a convolutional neural network in the frequency domain are complex to construct, which places high demands on the representation strategies for the frequency-domain activation function and pooling method in those pipelines. Therefore, to construct a full frequency-domain convolutional neural network architecture, it is necessary to design frequency-domain representation strategies that achieve high classification accuracy and excellent time performance. In this paper, based on a chunk decomposition mechanism and the construction principle of the frequency-domain unsaturated activation function, a frequency-domain convolutional neural network architecture is proposed. Two important representation strategies are introduced into the frequency-domain forward/backward pipeline: a frequency-domain randomized offset rectified linear unit and a frequency-domain chunk max pooling method. The former alleviates the vanishing and exploding gradient phenomena in the frequency-domain forward/backward pipeline and ensures the convergence of the architecture during frequency-domain training; the latter captures partial location information and the characteristic strength of the frequency-domain neurons and improves the classification performance of the convolutional neural network in the frequency domain. This full frequency-domain architecture improves the training accuracy of the convolutional neural network in the frequency-domain pipeline. The results show that with ResNet-50 as the backbone framework, an NVIDIA GeForce GPU with CUDA (Compute Unified Device Architecture) as the training platform, and 4×4 as the activation chunk size of the third-level output neuron's characteristic parameter matrix, the proposed architecture lowers the top-1 error from 24.90% to 17.95% and the top-5 error from 12.85% to 9.23%. Furthermore, when the batch size is 128 (the worst-case bandwidth usage scenario), the acceleration ratio of the proposed architecture still reaches 13.0375 with cuDNN as the reference model. Under the same backbone framework, the proposed architecture is tested on the MetData-1 dataset, and its classification accuracy reaches the maximum value; that is, the average difference is merely 0.18. These findings show that the proposed architecture can improve the accuracy of the deep learning-based frequency-domain convolutional neural network model without reducing its time performance, and that it extends the frequency-domain representation strategies for the activation function and pooling method.


I. INTRODUCTION
As an important deep learning framework, convolutional neural networks are widely used in many artificial intelligence fields, such as object classification, speech recognition, target tracking and automated driving [1]-[9]. How to improve the training accuracy and training speed of convolutional neural networks has therefore become an important research topic [10]-[12]. With the rapid development of artificial intelligence technology, the training accuracy and speed of time-domain convolutional neural networks have moved far beyond the original character-recognition scope; they now support more accurate semantic classification and faster target detection than before [13]-[16].
In the last decade, time-domain convolutional neural networks have served as a benchmark framework for large-scale data processing, complex computation and object semantic classification [17]-[22], and their construction theory and training and inference methods have steadily improved [23]-[25]. At present, the most popular time-domain convolutional neural networks are presented in references [26]-[28]. All three frameworks use a convolution operation to extract the feature parameters of neurons, and this operation occupies considerable GPU bandwidth and has high computational complexity. Training and inference of convolutional neural networks in the time domain therefore usually carry high computational costs. To improve the training accuracy and speed of time-domain convolutional neural networks, researchers proposed directly employing the fast Fourier transform to convert the convolution operation in each convolutional layer into a frequency-domain pointwise product; a convolutional neural network based on this method is called an FFT-based CNN. Widely used FFT-based frameworks include cuFFT proposed by NVIDIA, fbfft proposed by Facebook and the FFT-based fast training network proposed by Mathieu et al. [29]-[31]. The FFT-based CNN is only suitable for accelerating the training of shallow convolutional neural networks with large-scale filter kernels, whereas the most advanced frameworks mostly adopt deep architectures with small-scale filter kernels [32]-[35], which FFT-based acceleration serves poorly: because an FFT-based network must perform a fast Fourier transform and an inverse transform on the input and output of every convolution layer, it consumes substantial GPU memory bandwidth and greatly extends the training time. The FFT-based convolutional neural network is therefore unsuitable for high-precision classification tasks with strict real-time requirements, and researchers began to explore how to transfer the training process of convolutional neural networks from the time domain to the frequency domain. Rippel et al. [36] proposed a spectral representation strategy for convolutional neural networks that uses direct spectrum truncation to complete the downsampling of neuron characteristic parameters. This method improves the training speed by eliminating the high-frequency part of the neuron, but the training accuracy of the network is reduced accordingly [37]. Ko et al. [38] proposed a discrete synchronous interpolation method to accelerate the frequency-domain calculation of a convolutional neural network. In this method, the tanh or sigmoid function is selected as the activation function. Both are saturating functions: when performing high-precision classification tasks, the backpropagation accuracy of the weight gradient decreases, which easily causes gradient explosion in the frequency domain. In this case, classification accuracy must often be sacrificed in exchange for a fast training process.
To summarize, to build an effective full frequency-domain training architecture for convolutional neural networks, two accuracy-related tasks must be completed: first, building an effective frequency-domain pooling function [39]-[43] so that the classification accuracy of the frequency-domain architecture is not degraded; and second, building an effective unsaturated activation function to alleviate the vanishing gradient problem in frequency-domain backpropagation and thereby ensure that the training process of the convolutional neural network in the frequency domain converges readily [44]-[49].
Based on the above two points, this paper proposes a full frequency-domain convolutional neural network architecture that can be used in the high-precision training of frequency-domain convolutional neural networks under three backbone frameworks. The architecture comprises two important aspects. First, to improve classification accuracy, a high-precision frequency-domain unsaturated activation function and a frequency-domain pooling operation are constructed, which enrich the frequency-domain application of deep learning theory. Second, for training acceleration, a frequency-domain forward pipeline and a frequency-domain backward pipeline are constructed, and a fast frequency-domain training benchmark architecture is implemented to ensure a fast training process for the whole frequency-domain network. Specifically, we propose a frequency-domain randomized offset rectified linear unit and a frequency-domain chunk max pooling operation; based on these two representation strategies, an accurate full frequency-domain convolutional neural network architecture is constructed, as shown in Figure 1. Building on the chunk decomposition mechanism and the construction principle of the frequency-domain unsaturated activation function, the frequency-domain training process of the convolutional neural network is realized without reducing the classification precision or relying excessively on the fast Fourier transform. First, we introduce the frequency-domain randomized offset rectified linear unit into the frequency-domain forward and backward pipelines of the architecture to alleviate the vanishing and exploding gradient problems of the frequency-domain neuron characteristic parameters in those pipelines. Second, the frequency-domain chunk max pooling method is proposed to capture partial location information and the feature strength of the frequency-domain neurons, which improves the classification performance of the architecture. Finally, from the two representation strategies of the frequency-domain forward pipeline, the representation strategies of the frequency-domain backward pipeline are derived, yielding the frequency-domain inverse randomized offset rectified linear unit and the frequency-domain inverse chunk max pooling method. The rest of this paper is arranged as follows. In the second section, the proposed full frequency-domain convolutional neural network architecture, FCNN (frequency-domain convolutional neural network) for short, is summarized. In the third section, the structure of the FCNN's frequency-domain forward pipeline is given, and the frequency-domain randomized offset rectified linear unit (FRReLU) and the frequency-domain chunk max pooling method (Fcmp) are introduced. In the fourth section, the structure of the FCNN's frequency-domain backward pipeline is given, and the frequency-domain inverse randomized offset rectified linear unit (FRReLU-1) and the frequency-domain inverse chunk max pooling method (Fcmp-1) are presented. In the fifth section, the experiments are presented and analysed carefully. Finally, the conclusion is drawn in the sixth section.

II. OVERALL FRAMEWORK
The FCNN architecture is a full frequency-domain convolutional neural network: the time-frequency conversion of the neurons' characteristic parameters is completed only in the initialization and fully connected layers, while the point product operation, activation function and pooling operation on the neurons' characteristic parameters are all completed in the frequency domain. The FCNN architecture consists of a frequency-domain forward pipeline and a frequency-domain backward pipeline, as shown in Figure 2. The latter is divided into two sub-pipelines, F-b propagation pass-1 and F-b propagation pass-2: F-b propagation pass-1 handles the reverse transmission of the loss deviation value of the output neuron's characteristic parameter, and F-b propagation pass-2 handles the reverse transmission of the neuron weight parameter. In the frequency-domain forward pipeline, the FCNN architecture uses the frequency-domain randomized offset rectified linear unit (FRReLU) as the unsaturated activation function to extract the non-linear frequency-domain characteristics of the output neuron's characteristic parameters and uses the frequency-domain chunk max pooling method (Fcmp) as the frequency-domain downsampling operation to retain part of the location information and feature strength of the frequency-domain neurons. In the frequency-domain backward pipeline, the FCNN architecture uses the frequency-domain inverse randomized offset rectified linear unit (FRReLU-1) to transmit the output neuron's characteristic parameter gradient and the convolution kernel's parameter gradient to the convolution neurons of the previous layer, providing the numerical offset for parameter adjustment in the frequency-domain convolutional neural network, and uses the frequency-domain inverse chunk max pooling method (Fcmp-1) as the frequency-domain inverse pooling operation: the gradient chunks are propagated to the neurons of the frequency-domain backward pipeline to generate the loss deviation value in the frequency domain, and the weight parameters of the convolutional neural network in the frequency domain are then adjusted.
In the frequency-domain forward pipeline of the FCNN architecture, the output neuron's characteristic parameter matrix is the sum of the point products of the input neuron's characteristic parameter matrix and multiple filter kernels. In frequency-domain backward sub-pipeline 1 of the FCNN architecture, the loss deviation value of the input neuron's characteristic parameter (also called the loss gradient value) is the sum of the dot products of the transposition of the filter kernel and the loss deviation value of the output neuron's characteristic parameter. Similarly, in frequency-domain backward sub-pipeline 2 of the FCNN architecture, the loss gradient of the filter kernel is the sum of the point products of the input neuron's characteristic parameter and the loss gradient of the output neuron's characteristic parameter. It should be noted that the matrix dot products in the frequency-domain forward and backward pipelines of the FCNN architecture are converted from convolution operations in the time domain, and the conversion is based on the radix-2 fast Fourier transform. The FCNN architecture uses the fast Fourier transform only once during initialization and once in the fully connected layer, while the matrix dot product is used throughout the pipeline training process. This method inherits the advantages of the traditional FFT-based convolutional neural network and avoids the time-consuming overhead of applying an FFT repeatedly in each convolution layer. To realize the whole training process in the frequency domain, the core of the FCNN architecture is to design the activation function and the pooling layer entirely in the frequency domain. In this paper, four frequency-domain representation strategies are proposed: the frequency-domain activation function, the frequency-domain inverse activation function, the frequency-domain pooling operation and the frequency-domain inverse pooling operation, which are used in the training processes of the frequency-domain forward and backward pipelines of the FCNN. First, a frequency-domain randomized offset rectified linear unit (FRReLU) is proposed. FRReLU is used as the frequency-domain activation function of the FCNN architecture in the frequency-domain forward pipeline to extract the non-linear characteristics of the output neuron's characteristic parameters. Accordingly, a frequency-domain inverse randomized offset rectified linear unit (FRReLU-1) is proposed. FRReLU-1 is used as the frequency-domain inverse activation function of the FCNN architecture in the frequency-domain backward pipeline to extract the non-linear characteristics of the input neuron's characteristic parameter loss gradient. FRReLU and FRReLU-1, as the frequency-domain activation functions of the FCNN architecture in the frequency-domain forward and backward pipelines, respectively, can effectively alleviate the gradient explosion problem in the frequency-domain forward pipeline and the gradient disappearance problem in the frequency-domain backward pipeline, ensuring the convergence of the FCNN architecture.
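The three product rules stated at the beginning of this section can be made concrete with a minimal NumPy sketch (our illustration, not the paper's implementation; single channel, pre-padded arrays of a common shape, with the complex conjugate standing in for the transposed kernel):

```python
import numpy as np

def forward_product(Fx, Fw):
    """F(y): forward pipeline, pointwise product in place of convolution."""
    return Fx * Fw

def input_gradient(Fw, FdLdy):
    """F(dL/dx): backward sub-pipeline 1. The conjugate of F(w) plays the
    role of the transposed filter kernel (our assumption)."""
    return np.conj(Fw) * FdLdy

def weight_gradient(Fx, FdLdy):
    """F(dL/dw): backward sub-pipeline 2, product of the frequency-domain
    input and the output gradient."""
    return np.conj(Fx) * FdLdy
```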
Second, a frequency-domain chunk max pooling method (Fcmp) is proposed. As the frequency-domain pooling operation of the FCNN architecture in the frequency-domain forward pipeline, Fcmp is used to capture the coarse-grained location information and feature strength of the frequency-domain neurons. This coarse-grained location retention mechanism improves the classification performance of the FCNN architecture to a certain extent. Correspondingly, a frequency-domain inverse chunk max pooling method (Fcmp-1) is proposed. Fcmp-1 is used as the frequency-domain inverse pooling operation of the FCNN architecture in the frequency-domain backward pipeline. It samples the gradient values of the input neuron's characteristic parameters and extracts part of the position information and characteristic strength of the loss gradient of the frequency-domain neuron characteristic parameters to ensure that this loss gradient is effectively propagated in the frequency-domain backward pipeline. These four frequency-domain representation strategies ensure that the training and reasoning processes of the FCNN architecture are completed efficiently in the frequency domain. The FCNN's detailed implementation is described in the next section.

III. THE FREQUENCY-DOMAIN FORWARD PIPELINE (F-F PROPAGATION PASS)
In the frequency-domain training framework of a convolutional neural network, the frequency-domain pipeline that transfers the parameters of the input neurons of a convolution layer onward to the next hidden layer is called the frequency-domain forward propagation pass for short. The implementation of the frequency-domain forward pipeline is composed of three parts: (1) a radix-2 fast Fourier transform is performed on the input neuron parameter (input), converting it into the frequency-domain form, recorded as F(x); similarly, the convolution kernel parameter (filter) is converted into the frequency-domain form and recorded as F(w); the dot product of the two frequency-domain parameters is then calculated and recorded as F(y). (2) The frequency-domain randomized offset rectified linear unit (FRReLU) is used as the activation function of the frequency-domain forward pipeline, and the frequency-domain activation value of the dot product F(y) from the previous step is calculated, which gives the frequency-domain output neuron's characteristic parameter (output). Note that the characteristic parameters of the output neuron and the input neuron have the same scale in both the vertical and horizontal dimensions.
(3) Using the frequency-domain chunk max pooling method, the output neuron's characteristic parameters are pooled. The results of the pooling operation become the input parameters of the fully connected layer, and the frequency-domain classification results are finally obtained. In this section, we introduce the architecture and execution process of the frequency-domain forward pipeline in detail.

A. THE FREQUENCY-DOMAIN RANDOMIZED OFFSET RECTIFIED LINEAR UNIT (FRRELU)
The time-domain activation operation is the mathematical expression of the neuron's characteristic parameters, which is used to inject non-linear time-domain characteristics into convolutional neurons. At present, the representative time-domain activation functions are divided into saturated activation functions and unsaturated activation functions.
Saturated activation functions include the sigmoid function and the tanh function; unsaturated activation functions include the ReLU function and the ELU function. In the frequency-domain training framework, the frequency-domain variants of these activation functions can be used to extract the non-linear characteristics of the neurons' characteristic parameters, but they cannot support the training and testing of convolutional neural network pipelines entirely in the frequency domain. Therefore, we need to design an effective frequency-domain activation function to extract the non-linear frequency-domain characteristics of the output neuron's characteristic parameters. In this paper, a frequency-domain randomized offset rectified linear unit is used to inject non-linear frequency-domain characteristics into the convolution neurons of the frequency-domain pipelines.
Before we introduce the function, we define the variables. In a convolution layer, the input neuron's parameters are denoted by x_qp, where subscript q indexes the convolution layer and subscript p indexes the input neuron's parameters within layer q. The size of the input neuron's parameters is n_qp1 × n_qp2, and their number is f. The number of convolution layers contained in the frequency-domain forward pipeline is denoted by L. Similarly, the output neuron's characteristic parameters are denoted by y_qp, where subscript p indexes the output neuron's parameters within layer q; their size is m_qp1 × m_qp2, and their number is f. The convolution kernel parameters are denoted by w_qp, where subscript p indexes the kernel parameters within layer q; their size is k_qp1 × k_qp2. In the time-domain training framework, the randomized offset rectified linear unit adds linear activation values with different offsets to the negative semi-axis of the time domain, which represents the unsaturated state of the neuron's characteristic parameters and ensures their effective transmission in the time-domain forward pipeline. Based on this characteristic, we design the frequency-domain randomized offset rectified linear unit to represent the unsaturated state of the neuron's characteristic parameters in the frequency-domain forward pipeline. The function is defined in formula (1), where Y_qp is the dot product of the frequency-domain input neuron's parameters and the frequency-domain convolution kernel's parameters, i.e., the Fourier coefficients of the output neuron's characteristic parameters in the frequency domain. Here, we divide the frequency-domain randomized offset rectified linear unit into two parts, a positive semi-axis function and a negative semi-axis function: the former selects the Fourier coefficient terms whose values are positive, while the latter selects the Fourier coefficient terms whose values are negative. Therefore, the frequency-domain randomized offset rectified linear unit in formula (1) can be rewritten as

$$\mathrm{FRReLU}(Y_{qp}) = \mathrm{FR}_{+}(Y_{qp}) + \mathrm{FR}_{-}(Y_{qp}), \tag{2}$$

where FR_+(·) represents the positive semi-axis function and FR_−(·) represents the negative semi-axis function. Y_qp is the discrete Fourier transform value of the output neuron's characteristic parameter, so the Fourier expansion of Y_qp is

$$Y_{qp}(U) = \sum_{n=0}^{m_{qp}-1} y_{qp}(n)\, e^{-j\frac{2\pi}{m_{qp}} nU}, \quad U = 0, 1, \ldots, m_{qp}-1, \tag{3}$$

where m_qp is the size of the output neuron's parameter, that is, m_qp = m_qp1 × m_qp2, and U indexes the output neuron's parameters. As shown in Figure 3, the output of the randomized offset rectified linear unit saturates along the linearly increasing sequence of the positive semi-axis and the linearly decreasing sequence of the negative semi-axis. In the same way, the output of the frequency-domain randomized offset rectified linear unit saturates along the equiangular sampling points of the helix in the z-plane. In the z-plane, the time-domain eigenvalue y_qp is the linear distance from the origin to an equiangular sampling point and is also a Fourier coefficient term of the positive semi-axis function FR_+.
Therefore, to filter the positive coefficient terms of Y_qp, the FR_+ function is defined in formula (4), where |·| is the absolute value operator. FR_−(·) includes the negative coefficient terms of the output neuron's characteristic parameter in the frequency domain. When the negative coefficient terms of the output neuron's characteristic parameter contain negative random parts, equation (3) cannot characterize these random values; therefore, we redefine formula (3) as formula (5), where a is the randomized offset of the non-zero linear activation value added to the negative semi-axis. To filter the negative coefficient terms of Y_qp, the FR_− function is defined in formula (6). Finally, the frequency-domain randomized offset rectified linear unit (FRReLU) is derived by the superposition of formula (4) and formula (6), giving formula (7). The output of the frequency-domain randomized offset rectified linear unit adds the negative coefficient terms of the frequency-domain output neuron's characteristic parameter. Under the combined action of the positive and negative coefficients, it can effectively represent the unsaturated state of the neuron's characteristic parameters, ensure their effective transmission in the frequency-domain forward pipeline, and keep the computational complexity low. For example, in Figure 3, when FR_−(·) includes only the negative coefficient terms of the frequency-domain output neuron's characteristic parameter, the subscripts of the negative coefficient terms are 1, 3, 5, 6, 7, 10, 14, 19, 22, 26, 28, 30, 32 and 33, and the positive coefficient terms are not included; conversely, the subscripts of the positive coefficient terms are 0, 2, 4, 8, 9, 11, 12, 13, 15, 16, 17, 18, 20, 21, 23, 24, 25, 27, 29, 31, 34 and 35, and the negative coefficient terms are not included. Therefore, the negative semi-axis function proposed in this paper removes the positive parts of the output neuron's characteristic parameter, while the positive semi-axis function removes the negative parts. The proposed frequency-domain randomized offset rectified linear unit thus computes the positive and negative parts of the output neuron's characteristic parameters simultaneously. This strategy normalizes the negative coefficient terms of the output neuron's characteristic parameters, reduces the computational complexity in the frequency domain, and improves the learning accuracy of the frequency-domain forward pipeline.
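As a hedged illustration of FRReLU, the NumPy sketch below splits the Fourier coefficients of the pre-activation Y_qp by the sign of their real part, passes the positive terms through (FR+) and scales the negative terms by a randomized offset a (FR−). The sign test on the real part and the offset range are our assumptions, not specifications from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def frrelu(Y, a_low=0.05, a_high=0.25):
    """Frequency-domain randomized offset rectified linear unit (sketch)."""
    a = rng.uniform(a_low, a_high, size=Y.shape)  # randomized offset per coefficient
    pos = np.where(Y.real > 0, Y, 0)              # FR+: positive coefficient terms
    neg = np.where(Y.real <= 0, a * Y, 0)         # FR-: offset-scaled negative terms
    return pos + neg                              # superposition, as in formula (7)
```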

B. THE FREQUENCY-DOMAIN CHUNK MAX POOLING METHOD (FCMP)
Max pooling is the most common downsampling operation in the CNN framework. However, in the frequency-domain training framework of a convolutional neural network, the location information of the neurons' feature parameters in the frequency domain is completely lost in this step, and the strength information of repeated instances of the same neuron feature is also screened out. Therefore, building an effective frequency-domain pooling method that maintains part of the location information and feature strength of the frequency-domain neurons is an urgent task. In this paper, the frequency-domain chunk max pooling method is proposed to capture the coarse-grained location information and characteristic strength of neurons in the frequency domain. This mechanism can improve the classification performance to some extent (see Section V). By contrast, the traditional frequency-domain max pooling method extracts only the single most characteristic value from a group of neuron eigenvalues in each convolution layer, discards the location and strength information of the other neurons, and thus reduces the classification accuracy of the frequency-domain CNN. The frequency-domain chunk max pooling method segments all the output neuron's feature parameters of a single convolution layer and extracts a maximum feature value from each segment. The extracted feature values can represent cases in which the same type of feature appears multiple times, and the relative order of these feature values is also preserved. The output of the frequency-domain chunk max pooling operation is used as the input parameter of the fully connected layer, from which the frequency-domain classification result is finally obtained.
In the frequency-domain chunk max pooling layer, the output neuron's characteristic parameters (Y_qp) of each convolution layer are divided into a set of chunks (Y_qpk), where k indexes the chunks, the size of each chunk is l_qpk1 × l_qpk2, and the number of chunks is M_p. F(·) denotes the fast Fourier transform. The chunk max pooling layer is located behind the activation function of each layer to retain part of the location information and feature strength of the neurons in the frequency domain. The chunks contained in each layer are expressed in formula (8), where y_qpk is the time-domain output neuron's characteristic parameter chunk, which forms a mapping relationship with the frequency-domain output neuron's characteristic parameter chunk Y_qpk. The complete frequency-domain output neuron's characteristic parameter matrix Y_qp is expressed by formula (9). Therefore, when the initial frequency-domain output neuron's characteristic parameter matrix is divided into multiple chunks, the max pooling operation of each layer no longer acts on the whole output neuron's characteristic parameter matrix but performs the max pooling operation on each chunk separately. We rewrite formula (8) as formula (10), where A_qpk represents the activation value of each layer's output neuron's characteristic parameter chunk, called the activation chunk, and Fdown(·) represents the frequency-domain pooling operation. The frequency-domain pooling operation comprises max and average pooling, both of which reduce the dimensions of the training parameters and alleviate overfitting. This paper develops the theory using the frequency-domain max pooling operation as an example; the average pooling operation is similar. Because the characteristic parameters of the output neurons in the frequency domain are divided into several chunks, we need to build a unified frequency-domain max pooling operation for each activation chunk in the convolution layer. The frequency-domain chunk max pooling method extracts the Fourier coefficient with the largest value from the multiple neighbourhoods contained in each activation chunk and replaces the other coefficients in those neighbourhoods with this maximum value, as shown in Figure 4. Accordingly, formula (10) can be rewritten as formula (11), in which β indexes the Fourier coefficients whose eigenvalues are greater than zero in each activation chunk.
∩ is used to index the Fourier coefficient with the largest value extracted from the multiple neighbourhoods of each activation chunk. The size of the pooled activation chunk is ((l_qpk1 − k_qp1)/str1 + 1) × ((l_qpk2 − k_qp2)/str2 + 1), and U ∈ [0, (l_qpk − k_qp)/str]. l_qpk represents the size of the k-th activation chunk of the p-th output neuron's characteristic parameter in the q-th layer; that is, l_qpk = l_qpk1 × l_qpk2. str1 is the horizontal displacement of the activation chunk, str2 is the vertical displacement of the activation chunk, and str = str1 × str2.
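For concreteness, a worked instance of this size formula (the numbers are our own illustration, not values from the paper): an 8×8 activation chunk pooled with a 2×2 window and displacements str1 = str2 = 2 yields

$$\left(\frac{l_{qpk1}-k_{qp1}}{str_1}+1\right)\times\left(\frac{l_{qpk2}-k_{qp2}}{str_2}+1\right)=\left(\frac{8-2}{2}+1\right)\times\left(\frac{8-2}{2}+1\right)=4\times 4,$$

so each pooled activation chunk retains a 4×4 grid of maxima.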
Similarly, the frequency-domain chunk average pooling operation, denoted Fcap(·), can be rewritten in the same chunked form. In the frequency-domain chunk average pooling operation, the number of activation chunks corresponding to each output neuron's characteristic matrix is fixed regardless of the value of ∩; therefore, ∩ is set to a random constant, and the output of the frequency-domain chunk average pooling operation follows accordingly. In the frequency-domain forward pipeline, the frequency-domain chunk max pooling method is used to pool the output neuron's characteristic parameters of each layer, effectively capturing part of the location information and feature strength of the frequency-domain neurons. The output of the last pooling layer is the input parameter of the fully connected layer, from which the frequency-domain classification results are finally obtained. In addition, the size of the activation chunk of each layer is only two-thirds the size of the output neuron's characteristic parameter matrix, which makes the frequency-domain pooling method convenient to deploy on GPU and FPGA hardware (parallel acceleration environments). In other words, in the frequency-domain forward-training pipeline, the frequency-domain chunk max pooling method has two significant features: (1) it can extract the coarse-grained location information of the frequency-domain neurons and the number of occurrences of the same feature type (the feature strength); (2) it can retain the local neuron feature information.
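A minimal NumPy sketch of Fcmp follows. The chunking geometry is not fully specified in the text, so row-wise splitting into equal chunks and magnitude-based maxima are our assumptions:

```python
import numpy as np

def fcmp(A, num_chunks=4, pool=2):
    """Frequency-domain chunk max pooling (sketch): split the activated
    coefficient matrix A_qp into chunks and max-pool inside each chunk
    independently, so every chunk keeps its own maxima (coarse-grained
    location information)."""
    pooled_chunks = []
    for chunk in np.array_split(A, num_chunks, axis=0):
        h, w = chunk.shape[0] // pool, chunk.shape[1] // pool
        win = chunk[:h * pool, :w * pool].reshape(h, pool, w, pool)
        win = win.transpose(0, 2, 1, 3).reshape(h, w, pool * pool)
        idx = np.abs(win).argmax(axis=-1)  # largest coefficient per neighbourhood
        pooled_chunks.append(np.take_along_axis(win, idx[..., None], axis=-1)[..., 0])
    return pooled_chunks
```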
At the end of this section, based on FRReLU (see Section III-A) and Fcmp (see Section III-B), the working steps of the frequency-domain forward pipeline are presented as follows.
Step 1: To avoid aliasing of the convolution operation in the frequency domain, it is necessary to expand the dimensions of the input neuron's characteristic parameters and the filter kernel to a common size.
Step 2: Calculate the N-point FFT value of the filter kernel w_qp(n) in the forward pipeline, i.e., W_qp(U) = F(w_qp(n)) = FFT(w_qp(n)).
Step 3: Calculate the N-point FFT value of the input neuron's parameter x_qp(n), i.e., X_qp(U) = F(x_qp(n)) = FFT(x_qp(n)).
Step 4: Calculate the frequency-domain dot product of the results of steps 2 and 3 and sum over the input neuron's parameters, i.e., Y_qp(U) = X_qp(U) · W_qp(U) (see Section II).
Step 5: Take the result of step 4 as the input of FRReLU and calculate the activated neuron characteristic parameter matrix; that is, A_qp(U) = FRReLU(Y_qp(U)).
Step 6: Take the result of step 5 as the input of the frequency-domain chunk max pooling method (Fcmp) and calculate the activation chunks of the frequency-domain output neuron's characteristic parameter, namely, Y_qpk(U) = Fcmp(A_qpk(U)). Note that in the frequency-domain chunk max pooling layer, the output neuron's characteristic parameter (Y_qp(U)) is decomposed into a fixed number of activation chunks (Y_qpk(U)).
Step 7: In the frequency-domain forward pipeline, repeat steps 2 to 6 until the last layer; then perform the inverse Fourier transform on Y_qp(U) to obtain y_qp(n) and output it, i.e., y_qp(n) = F^{-1}(Y_qp(U)) = IFFT(Y_qp(U)).
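The sizing logic of Steps 1-4 and 7 can be sketched as follows (a minimal illustration; the power-of-two rounding of N is our assumption, chosen to suit the radix-2 FFT):

```python
import numpy as np

def expanded_fft_size(n, k):
    """Step 1 (sketch): the common N-point size for an n x n input and a
    k x k kernel must satisfy N >= n + k - 1 to avoid circular-convolution
    aliasing; we round N up to a power of two for the radix-2 FFT."""
    return 1 << int(np.ceil(np.log2(n + k - 1)))

N = expanded_fft_size(32, 3)                   # 32x32 input, 3x3 kernel -> N = 64
W = np.fft.fft2(np.ones((3, 3)), s=(N, N))     # Step 2: W_qp(U) = FFT(w_qp(n))
X = np.fft.fft2(np.ones((32, 32)), s=(N, N))   # Step 3: X_qp(U) = FFT(x_qp(n))
Y = X * W                                      # Step 4: frequency-domain dot product
y = np.fft.ifft2(Y).real                       # Step 7 (last layer only): IFFT
```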

IV. THE FREQUENCY-DOMAIN BACKWARD PIPELINE (F-B PROPAGATION PASS)
In the training framework of a convolutional neural network in the frequency domain, the frequency-domain pipeline that transfers the loss deviation value of the output neuron's characteristic parameters back to the input neuron's characteristic parameters is called the frequency-domain backward propagation pass for short. The implementation of the frequency-domain backward pipeline consists of three parts: (1) for the gradient of the output neuron's characteristic parameter, a radix-2 fast Fourier transform is executed, and the result is converted into the frequency-domain form, denoted by F(∂L/∂y); similarly, the transposition of the convolution kernel parameter is converted into the frequency-domain form; the dot product of these two frequency-domain terms is then calculated and denoted by F(∂L/∂x). (2) FRReLU-1 is used as the activation function of the frequency-domain backward pipeline, and the frequency-domain activation value of the dot product F(∂L/∂x) from the previous step is calculated, which is the gradient of the input neuron's characteristic parameter in the frequency domain. (3) Using the frequency-domain inverse chunk max pooling method, an inverse pooling operation is performed on the input neuron's characteristic parameter gradient. The result of the inverse pooling operation is used as the input offset value of the convolution layer to adjust its training parameters. In this section, we introduce the architecture and execution process of the frequency-domain backward pipeline in detail.
A. THE FREQUENCY-DOMAIN INVERSE RANDOMIZED OFFSET RECTIFIED LINEAR UNIT (FRRELU-1)
In the frequency-domain backward pipeline, the frequency-domain inverse activation function is used to alleviate the gradient explosion phenomenon of the convolutional neural network and to transfer the loss deviation value of the output neuron's characteristic parameters quickly in the reverse direction. In this paper, we propose a frequency-domain inverse randomized offset rectified linear unit, which transmits the output neuron's characteristic parameter gradient and the convolution kernel's parameter gradient to the convolution neurons of the previous layer and provides a numerical offset for the parameter adjustment of the frequency-domain convolutional neural network.
For a layer in the frequency-domain backward pipeline, the gradient of the input neuron's characteristic parameter is denoted by ∂L/∂x_qp. Similarly, the gradient of the output neuron's characteristic parameter is denoted by ∂L/∂y_qp, and the gradient of the convolution kernel is denoted by ∂L/∂w_qp. The frequency-domain inverse randomized offset rectified linear unit is expressed in formula (14), where ∂L/∂X_qp is the Fourier transform value of the gradient of the input neuron's characteristic parameter and FR^{-1}(·) is the inverse randomized offset rectified linear unit function. In the frequency-domain backward pipeline, the frequency-domain inverse randomized offset rectified linear unit is divided into two parts: the inverse positive and inverse negative semi-axis functions. The inverse positive semi-axis function selects the positive Fourier coefficient terms of the input neuron's characteristic parameter gradient. Therefore, the frequency-domain inverse randomized offset rectified linear unit can be rewritten as

$$\mathrm{FRReLU}^{-1}\!\left(\frac{\partial L}{\partial X_{qp}}\right) = \mathrm{FR}^{-1}_{+}\!\left(\frac{\partial L}{\partial X_{qp}}\right) + \mathrm{FR}^{-1}_{-}\!\left(\frac{\partial L}{\partial X_{qp}}\right). \tag{15}$$

The Fourier expansion of the gradient of the input neuron's characteristic parameter is

$$\frac{\partial L}{\partial X_{qp}}(U) = \sum_{n=0}^{n_{qp}-1} \frac{\partial L}{\partial x_{qp}}(n)\, e^{-j\frac{2\pi}{n_{qp}} nU}, \quad U = 0, 1, \ldots, n_{qp}-1, \tag{16}$$

where n_qp is the size of the input neuron's characteristic parameter, i.e., n_qp = n_qp1 × n_qp2. To filter the positive coefficient terms of ∂L/∂X_qp, the FR^{-1}_+(·) function is defined in formula (17), where ε_0 and ε_j index the positive Fourier coefficient terms of the input neuron's characteristic parameters. When the negative coefficient terms of the input neuron's characteristic parameter gradient contain negative random values, formula (17) cannot represent these values; therefore, we redefine equation (17) as formula (18), where a is the randomized offset of the non-zero linear activation value added to the negative semi-axis. To filter the negative coefficient terms of ∂L/∂X_qp, the FR^{-1}_−(·) function is defined in formula (20). Finally, the frequency-domain inverse randomized offset rectified linear unit (FRReLU^{-1}) is derived by the superposition of formula (18) and formula (20), giving formula (21). For example, when FR^{-1}_−(·) includes only the negative parts of the input neuron's characteristic parameter gradient in the frequency domain, its expansion consists only of offset-scaled negative coefficient terms, ending with a term of the form a(6)e^{−j(2π/36)·33}, and the positive coefficient terms are not included. In contrast, when FR^{-1}_+(·) includes the positive parts of the input neuron's characteristic parameter gradient in the frequency domain, its expansion begins with a^{-1}_{00} = 2.5e^{−j(2π/36)·0} and runs up to the term with subscript 35, and the negative coefficient terms are not included. Therefore, the inverse negative semi-axis function proposed in this paper removes the positive parts of the input neuron's characteristic parameter gradient, and the inverse positive semi-axis function removes the negative parts. The proposed inverse randomized offset rectified linear unit thus computes the positive and negative coefficient terms of the input neuron's characteristic parameter gradient simultaneously. This strategy normalizes the input neuron's characteristic parameter gradient, reduces the computational complexity in the frequency domain, and improves the learning accuracy of the frequency-domain backward pipeline.
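Mirroring the forward unit, a hedged sketch of FRReLU-1 follows: the positive coefficient terms of the incoming frequency-domain gradient pass through unchanged, and the negative terms are scaled by the randomized offset a. As before, the sign test on the real part and the offset range are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def frrelu_inv(dL_dX, a_low=0.05, a_high=0.25):
    """Frequency-domain inverse randomized offset rectified linear unit (sketch)."""
    a = rng.uniform(a_low, a_high, size=dL_dX.shape)
    pos = np.where(dL_dX.real > 0, dL_dX, 0)       # inverse positive semi-axis
    neg = np.where(dL_dX.real <= 0, a * dL_dX, 0)  # inverse negative semi-axis
    return pos + neg                               # superposition, as in formula (21)
```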

B. THE FREQUENCY-DOMAIN INVERSE CHUNK MAX POOLING METHOD (FCMP-1)
A frequency-domain inverse chunk max pooling layer is proposed to pool the gradient of the input neuron's characteristic parameters and to extract the coarse-grained location information and feature strength of the frequency-domain neuron's characteristic parameter gradient, ensuring the effective propagation of this gradient in the frequency-domain backward pipeline. In the frequency-domain inverse chunk max pooling method, the gradients of all the input neuron's characteristic parameters in a single convolution layer are segmented, and the loss deviation value (gradient chunk) in each segment is transferred to the corresponding area of the original input neuron's characteristic parameters. In addition, the frequency-domain inverse chunk max pooling method propagates the gradient chunks to the neurons of the frequency-domain backward pipeline, generates the loss deviation value in the frequency domain, and then adjusts the weight parameters of the convolutional neural network in the frequency domain.
In the frequency-domain inverse chunk max pooling layer, the input neuron's characteristic parameter gradient (∂L/∂X_qp) of each convolution layer is divided into a set of chunks (∂L/∂X_qpk), where k indexes the chunks, the size of each chunk is l_qpk1 × l_qpk2, and the number of chunks is M_p. The inverse chunk max pooling layer is located behind the fully connected layer and the FRReLU-1 layer to retain part of the location information and feature strength of the neurons' characteristic parameter gradients in the frequency domain.
The chunks contained in the characteristic parameter gradient of the neurons in the frequency domain are expressed in formula (22), where ∂L/∂x_qpk is the gradient chunk of the input neurons' characteristic parameters in the time domain. The characteristic parameter gradient ∂L/∂X_qp of an input neuron in the frequency domain can then be expressed as in formula (23). When the initial gradient matrix of the input neuron's characteristic parameters in the frequency domain is divided into multiple chunks, the inverse max pooling operation in the backward pipeline no longer acts on the whole gradient matrix but performs the inverse max pooling operation on each chunk separately. Therefore, we rewrite formula (22) as formula (24), where ∂L/∂X_qpk is the frequency-domain inverse pooling gradient chunk and A_qpk represents the inverse activation value of the gradient chunk of each layer's input neuron's characteristic parameter, called the inverse activation chunk for short. Fcmp-1 extracts the Fourier coefficient with the largest value from the multiple neighbourhoods contained in each input neuron's characteristic parameter gradient chunk and replaces the other coefficients in those neighbourhoods with this maximum value, as shown in Figure 6. The frequency-domain inverse chunk max pooling function in formula (24) can accordingly be rewritten as formula (25), where β^{-1} indexes the Fourier coefficients whose eigenvalues are greater than zero in each gradient chunk and ∩ indexes the Fourier coefficient with the largest value extracted from the multiple neighbourhoods of each gradient chunk. The size of the inverse pooling gradient chunk is ((l_qpk1 − k_qp1)/str1 + 1) × ((l_qpk2 − k_qp2)/str2 + 1), and U ∈ [0, (l_qpk − k_qp)/str]. l_qpk represents the size of the k-th gradient chunk of the p-th input neuron's characteristic parameter gradient matrix in the q-th layer; that is, l_qpk = l_qpk1 × l_qpk2. str1 is the horizontal displacement of the gradient chunk, str2 is the vertical displacement of the gradient chunk, and str = str1 × str2.
Similarly, the frequency-domain inverse chunk average pooling operation, denoted Fcap^{-1}(·), can be written in the same chunked form, where δ^{-1} represents the size of the neighbourhood contained in each gradient chunk. In the frequency-domain inverse chunk average pooling operation, the number of gradient chunks corresponding to each input neuron's characteristic parameter gradient matrix is fixed regardless of the value of ∩; therefore, ∩ is set to a random constant, and the output of the frequency-domain inverse chunk average pooling operation follows accordingly. The frequency-domain inverse chunk average pooling layer can retain part of the position information and feature strength of the input neuron's characteristic parameter gradient in the frequency domain, transmit it to the neurons in the frequency-domain backward pipeline to generate the loss deviation value in the frequency domain, and then adjust the weight parameters of the convolutional neural network in the frequency domain. Note that the frequency-domain inverse pooling operation proposed in this paper is a variant of the time-domain inverse pooling operation.
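A sketch of Fcmp-1 for a single gradient chunk is given below. It assumes, analogously to time-domain max unpooling, that the argmax positions of the forward Fcmp pass were recorded; this bookkeeping is our assumption, since the text does not spell out the routing:

```python
import numpy as np

def fcmp_inv(grad_pooled, argmax_idx, chunk_shape, pool=2):
    """Frequency-domain inverse chunk max pooling (sketch): route each pooled
    gradient value back to the coefficient position that won the forward max;
    all other positions in the neighbourhood receive zero."""
    h, w = grad_pooled.shape
    out = np.zeros((h, w, pool * pool), dtype=grad_pooled.dtype)
    np.put_along_axis(out, argmax_idx[..., None], grad_pooled[..., None], axis=-1)
    out = out.reshape(h, w, pool, pool).transpose(0, 2, 1, 3)
    return out.reshape(h * pool, w * pool)[:chunk_shape[0], :chunk_shape[1]]
```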

V. EXPERIMENTS AND ANALYSIS
A. DATASETS AND TRAINING CONFIGURATION
In this paper, an open-source dataset and an internal dataset are used to train the convolutional neural network model in the frequency domain. The open-source dataset is the widely used ImageNet dataset [50], from which 1000 classified feature images are used. The internal dataset (MetData-1) is a manually created dataset built by our research group: images with high recognizability and no ambiguity were selected by hand, giving 1800 images with classification labels. Before training, we further screened the combined 2800 feature images, deleted those with redundant features and those that could not be initialized successfully, and finally retained 2200 images with accurate feature labels. In this paper, the CUDA and Caffe frameworks are used to train and test the frequency-domain neural network model. In the training phase, CUDA and Caffe adopt the single max pooling method (without chunks) in the time domain. We therefore need to build a training scheme for the frequency-domain chunk max pooling layer on top of this chunk-free time-domain max pooling method to ensure that the frequency-domain chunk max pooling layer proposed in this paper can be realized on the CUDA and Caffe frameworks.
In the time-domain forward single max pooling stage, we need to divide the output neuron's characteristic parameter matrix into multiple chunks and calculate the size of these chunks in advance. These chunks are used as the input of the frequency-domain chunk max pooling layer, from which part of the location information of the output neuron's characteristic parameters is obtained. The output neuron's characteristic parameter matrix of each layer is of size m_qp1 × m_qp2, and the size of each chunk is l_qpk1 × l_qpk2. To train our frequency-domain chunk max pooling layer with multiple chunks, we treat each chunk as a single max pooling object in the frequency domain; that is, we use a sliding window to locate positions within the output neuron's characteristic parameter matrix and extract chunks of fixed size. Similarly, in the frequency-domain backward pipeline, we also use the sliding-window positioning method to divide the gradient matrix of the input neuron's characteristic parameters. Note that this paper divides the neuron characteristic parameter matrix only once, during initialization, so the sliding-window positioning operation is only effective in the current layer. In Table 1, with LeNet-5 as the backbone framework, the CUDA configuration of the frequency-domain chunk max pooling layer is given. LeNet-5 is composed of five convolution layers, and in each layer we choose a different sliding window to divide the output neuron's characteristic parameter matrix. FPool-8 indicates that the output neuron's characteristic parameter matrix of the first layer is divided into eight chunks, and the size of the activation chunk is 8×8. The second layer is divided into six chunks. The output neuron's characteristic parameter matrix of the last layer is one chunk; at this point, the frequency-domain chunk max pooling operation reduces to the frequency-domain single max pooling operation. The fully connected layer is located at the end of the LeNet-5 architecture and integrates the output of the last frequency-domain pooling layer.
In the frequency-domain training stage, we select three advanced convolutional neural network models as the backbone frameworks of our frequency-domain forward pipeline (see Section III) and frequency-domain backward pipeline (see Section IV) and construct the frequency-domain convolutional neural network training model. The frequency-domain forward pipeline of the FCNN embeds the unsaturated activation function (FRReLU) and the frequency-domain chunk max pooling function (Fcmp); the former extracts the non-linear frequency-domain features of the output neuron's characteristic parameters, and the latter retains part of the position information of the frequency-domain output neuron's characteristic parameters. The frequency-domain backward pipeline of the FCNN embeds the frequency-domain inverse randomized offset rectified linear unit (FRReLU-1) and the frequency-domain inverse chunk max pooling function (Fcmp-1); the former extracts the non-linear frequency-domain characteristics of the input neuron's characteristic parameter gradient, and the latter retains part of the position information of that gradient in the frequency domain. In the experimental part of this paper, the FCNN is set to accept two sizes of neuron characteristic parameters, 180×180 and 224×224. We do not use cropping or deformation to change the size of the initial neuron's characteristic parameter matrix; that is, the two neuron characteristic parameter sizes differ only in resolution, while the eigenvalues and spatial position information remain unchanged. We train the FCNN based on the frequency-domain sliding-window positioning method, as sketched below: the input neuron's characteristic parameter size is 180×180, the output neuron's characteristic parameter size is 32×32, the size of the activation chunk is l_qpk1 × l_qpk2, the frequency-domain sliding window size is max(32/l_qpk1, 32/l_qpk2), and the sliding window step size is max(32/l_qpk1, 32/l_qpk2). Although the FCNN is set to accept inputs of two sizes, its output still retains the information that the same type of feature appears multiple times, and the relative order of these feature values is also preserved. In summary, in the FCNN training stage, the frequency-domain chunk max pooling operation consists of several frequency-domain variants of the single time-domain max pooling operation, and the training parameters of these frequency-domain variants are shared. Therefore, in the CUDA framework used in the experimental part of this paper, we use a single time-domain max pooling layer with multiple shared parameters to simulate a single chunk max pooling operation in the frequency domain, and the range of the single time-domain max pooling operation is [1, 224].
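The sliding-window positioning described above can be sketched as follows (our reading: the window step equals the window size, so the chunks tile the matrix without overlap):

```python
import numpy as np

def extract_chunks(feature_map, chunk_h, chunk_w):
    """Split an output neuron's characteristic parameter matrix into
    fixed-size chunks with a sliding window whose step equals its size."""
    H, W = feature_map.shape
    return [feature_map[i:i + chunk_h, j:j + chunk_w]
            for i in range(0, H - chunk_h + 1, chunk_h)
            for j in range(0, W - chunk_w + 1, chunk_w)]

# e.g. a 32x32 output matrix with 8x8 activation chunks yields 16 chunks
chunks = extract_chunks(np.arange(1024.0).reshape(32, 32), 8, 8)
assert len(chunks) == 16
```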

B. THE FREQUENCY-DOMAIN IMAGE CLASSIFICATION EXPERIMENT
This section uses the internal dataset (MetData-1) and the open-source dataset (ImageNet) to train our frequency-domain convolutional neural network model (FCNN). Here, the fast Fourier transform algorithm is still used to initialize the characteristic parameters of the input neurons and convert them to the frequency-domain form. When the training error plateaus, the learning rate is decreased ten-fold. The frequency-domain image classification experiments in this section are run in the CUDA (Compute Unified Device Architecture) environment. CUDA 3.0/C++ programming is used to implement the frequency-domain randomized offset rectified linear unit algorithm and the frequency-domain chunk max pooling algorithm proposed in this paper. Caffe (Convolutional Architecture for Fast Feature Embedding) is used to train and test the proposed frequency-domain convolutional neural network model. In the NVIDIA GeForce RTX 2080 GPU (8 GB) environment, we spent approximately one month training the FCNN.

1) BACKBONE ARCHITECTURES
In the training phase of the frequency-domain forward/backward pipelines, three popular convolutional neural network architectures are used as the backbone architectures of the FCNN: AlexNet-7 [51], VGG-19 [52] and ResNet-50 [53]. The FCNN is a full frequency-domain convolutional neural network architecture; as the frequency-domain mapping of a time-domain convolutional neural network architecture, it requires frequency-domain variants of the time-domain architecture to construct the frequency-domain backbone for training and testing both the forward and backward pipelines. Therefore, we choose the above three time-domain convolutional architectures as the backbone architectures of the FCNN for the training phase. Here, we denote the FCNN integrated with AlexNet-7 by FCNN-Al, the FCNN integrated with VGG-19 by FCNN-VGG, and the FCNN integrated with ResNet-50 by FCNN-Res. In the frequency-domain forward pipeline of the three networks, the dimensions of the output neuron's characteristic parameter matrix of the last layer's frequency-domain chunk max pooling operation are 6×6, the dimensions of the output matrix of the fully connected layer are 4096×1, and the number of classification labels output by the softmax layer is 120.

2) THE CLASSIFICATION ACCURACY OF FCNN BASED ON THE METDATA-1 DATASET
In the frequency-domain forward pipeline, the frequency-domain randomized offset rectified linear unit, as the frequency-domain activation function of FCNN-Al, FCNN-VGG and FCNN-Res, is placed in each convolution layer. Similarly, in the frequency-domain backward pipeline, the frequency-domain inverse randomized offset rectified linear unit, as the frequency-domain inverse activation function of FCNN-Al, FCNN-VGG and FCNN-Res, is placed in each convolution layer to replace the initial time-domain inverse activation function, while the frequency-domain inverse chunk max pooling layer is placed in front of the frequency-domain inverse activation function. Note that we denote the frequency-domain network that does not integrate the FCNN core architecture by Non-FCNN. The FRReLU/FRReLU-1 and Fcmp/Fcmp-1 functions proposed in this paper are not embedded in the frequency-domain forward/backward pipeline of Non-FCNN; instead, variants of the initial time-domain activation function and pooling function are used. During training, the frequency-domain chunk max pooling operation is set to 5 levels, summarized in the configuration sketch below. The first pooling level of Fcmp is denoted by FPool-8; that is, the output neuron's characteristic parameter matrix of the first level is divided into 8 chunks, and the activation chunk size of FPool-8 is 8×8. The second pooling level is denoted by FPool-6; that is, the matrix of the second level is divided into 6 chunks, and the activation chunk size of FPool-6 is 6×6. The third pooling level is denoted by FPool-4; that is, the matrix of the third level is divided into four chunks, and the activation chunk size of FPool-4 is 4×4. The fourth pooling level is denoted by FPool-2; that is, the matrix of the fourth level is divided into two chunks, and the activation chunk size of FPool-2 is 2×2. The fifth pooling level is denoted by FPool-1; that is, the matrix of the fifth level is divided into one chunk, and the activation chunk size of FPool-1 is 1×1. The rated number of activation chunks is set to 100. The three frequency-domain networks corresponding to these five levels are denoted by the frequency-domain backbone architecture name and the pooling level; for example, the first pooling level of FCNN-Al is denoted by FCNN-Al-FPool-8.
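For reference, the five pooling levels can be written out as a small configuration table (our representation; the names follow the text), mapping each level to its number of chunks and activation chunk size:

```python
# (number of chunks, activation chunk size) per Fcmp pooling level
FCMP_LEVELS = {
    "FPool-8": (8, (8, 8)),
    "FPool-6": (6, (6, 6)),
    "FPool-4": (4, (4, 4)),
    "FPool-2": (2, (2, 2)),
    "FPool-1": (1, (1, 1)),  # reduces to single frequency-domain max pooling
}
RATED_ACTIVATION_CHUNKS = 100  # rated number of activation chunks, as stated above
```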
In Table 2, we give the training results of the three frequency-domain networks at the five pooling levels and compare them with the training results of Non-FCNN. Note that since the pooling layer of the frequency-domain network not embedded in the FCNN core architecture (Non-FCNN) adopts the classic frequency-domain max pooling operation, Non-FCNN has only one pooling level; that is, its chunk value is always equal to 1. We use AlexNet-7 as the backbone framework and the MetData-1 dataset as the training data. Even with the same pooling operation, the classification accuracy of the proposed full frequency-domain training network is still higher than that of the traditional frequency-domain network. The reason is the frequency-domain randomized offset rectified linear unit (FRReLU) used by the FCNN in the frequency-domain forward pipeline; the time performance of FRReLU is tested in the next section. In addition, under the AlexNet-7 backbone framework, regardless of the pooling level, the classification error rate of FCNN-Al is lower than that of Non-FCNN-Al, which shows that the classification accuracy of our full frequency-domain network is significantly higher than that of the frequency-domain network without the FCNN core framework. By analysing the classification error values of the five pooling levels, we can see that when the output neuron's characteristic parameter matrix of the third level is divided into four chunks, the activation chunk size of FPool-4 is $4 \times 4$, and the top-1 classification error value of FCNN-Al-FPool-4 is 24.30. Compared with Non-FCNN-Al, the top-1 error value of FCNN-Al-FPool-4 is reduced by 4.85 percentage points, a larger reduction than at any other pooling level. That is, when the pooling level is 3, the classification accuracy of FCNN-Al is the best. Therefore, when training on other datasets, we can set the pooling level to 3 to achieve the optimal classification accuracy. Note that when the pooling level is 1, the top-1 error value of FCNN-Al-FPool-8 is reduced by only 3.94 percentage points, a smaller reduction than at the fourth pooling level. This finding indicates that a lower pooling level (i.e., more chunks) does not guarantee higher classification accuracy: the larger the number of chunks is, the higher the calculation error rate becomes. The solution to this problem needs further study.
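The paragraph above attributes the accuracy gain to FRReLU, whose formal definition appears earlier in the paper. Purely as a reading aid, the sketch below shows one plausible form of a randomized-offset rectification in the frequency domain, in which a per-call random offset decides which coefficients survive. Both the offset range and the magnitude test are our assumptions, not the paper's definition.

```python
import numpy as np

def frrelu(z, lo=0.01, hi=0.1, rng=None):
    """Illustrative randomized-offset rectification of frequency-domain
    coefficients: a coefficient is kept when its magnitude exceeds a random
    offset drawn per call, and suppressed otherwise (assumed form only)."""
    if rng is None:
        rng = np.random.default_rng()
    offset = rng.uniform(lo, hi)                  # randomized offset, redrawn per call
    return np.where(np.abs(z) > offset, z, 0.0 + 0.0j)

z = np.fft.fft2(np.random.default_rng(2).standard_normal((4, 4)))
print(frrelu(z).dtype)                            # complex128
```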
Here, we evaluate the classification performance of each network by calculating the average precision (AP), the precision averaged over classification thresholds; it is a universal precision measure widely used in visual information processing. In Table 3, we give the average precision values of three time-domain networks under the three backbone frameworks and of the FCNN under the five pooling levels. In the frequency-domain forward pipeline, AlexNet-7, VGG-19 and ResNet-50 are used as backbone frameworks to train these four networks, and the following three results are obtained. (1) When the third-level output neuron's characteristic parameter matrix of the FCNN is divided into four chunks, the average precision of FCNN-Al is 76.50, while that of FCNN-VGG-FPool-1 is 76.65. Although the latter uses the more accurate backbone framework, its classification accuracy is not significantly higher than that of the former in the frequency-domain training pipeline. This finding shows that the FCNN is not limited by the accuracy of the backbone framework: even if a backbone framework with lower accuracy is selected, a reasonably chosen pooling level can still achieve a high classification accuracy. (2) Under the same backbone framework, the average precision of the FCNN is higher than that of the time-domain networks with that backbone framework, regardless of how many chunks the output neuron's characteristic parameter matrix is divided into. For example, under the AlexNet-7 backbone framework, when the fifth-level output neuron's characteristic parameter matrix is treated as one chunk, the average precision of the FCNN reaches its lowest value of 73.00, but this value is still higher than the highest average precision of the other three time-domain networks, which is 68.02. The same applies to the other two backbone frameworks. This finding shows that as long as the time-domain network and the frequency-domain network adopt the same backbone framework, the classification accuracy of the FCNN is higher than that of the time-domain networks even when the FCNN chooses the most basic max pooling function as its frequency-domain pooling function. (3) Across backbone frameworks, the deeper the backbone framework is, the higher the average precision of the frequency-domain network is and the more effective the frequency-domain chunk max pooling operation is. This finding shows that the frequency-domain chunk max pooling operation proposed in this paper is effective for backbone frameworks of any precision, especially high-precision backbone frameworks with deep convolution and small filter dimensions.
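Under the description above, AP averages precision over a sweep of classification thresholds. The following sketch computes it that way on synthetic scores; the threshold grid and the score model are assumptions for illustration.

```python
import numpy as np

def average_precision(scores, labels, thresholds):
    """Sketch of the AP measure as described above: the mean of the precision
    values obtained at a sweep of classification thresholds."""
    precisions = []
    for t in thresholds:
        pred = scores >= t
        if pred.sum() == 0:
            continue                      # precision undefined with no positives
        precisions.append((pred & labels).sum() / pred.sum())
    return float(np.mean(precisions))

rng = np.random.default_rng(3)
labels = rng.random(1000) < 0.5                       # synthetic ground truth
scores = np.clip(labels * 0.3 + rng.random(1000) * 0.7, 0, 1)
print(round(average_precision(scores, labels, np.linspace(0.1, 0.9, 9)), 4))
```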
We divide the MetData-1 dataset into three sets, namely, a training set, a validation set and a test set, which are used to train and test the FCNN model proposed in this paper. In this experiment, we choose the most widely used time-domain convolutional neural network architecture (cuDNN) and frequency-domain convolutional neural network architecture (koCNN) as comparison objects and compare them with the FCNN architecture proposed in this paper. We evaluate the classification accuracy of each neural network model by calculating the mean accuracy error (MAE) of the FCNN, cuDNN and koCNN models on the above three sets. The mean accuracy errors of the three models under the three backbone frameworks are represented by violin plots and scatter plots, as shown in Figure 7. A violin plot is a data visualization method that adds a rotated kernel density estimate to the neighbourhood of a box plot. We draw the violin plots of the FCNN model at the third pooling level (FPool-4) and the fifth pooling level (FPool-1) and draw the violin plots of cuDNN and koCNN. At the third pooling level, although three backbone frameworks with different precisions are selected, the violin plots of the FCNN show almost the same average error distribution. At the same time, in terms of the average error distribution over the three sets, the shapes of the violin plots of the FCNN under the three backbone frameworks are very similar. For example, the MAE values of the FCNN-Res-FPool-4 training set are distributed around 0.05, with an upper limit fluctuation range of no more than 0.0015 and a lower limit fluctuation range of no more than 0.002; the MAE values of the FCNN-Res-FPool-4 validation set are also distributed around 0.05, with an upper limit fluctuation range of no more than 0.002 and a lower limit fluctuation range of no more than 0.003; and the MAE values of the FCNN-Res-FPool-4 test set are likewise distributed around 0.05, with an upper limit fluctuation range of no more than 0.003 and a lower limit fluctuation range of no more than 0.004. The reason is that the FCNN integrates the frequency-domain chunk max pooling method in the forward pipeline. The FCNN divides the output neuron's characteristic parameters into four chunks at the third pooling level, and the extracted feature information can more accurately represent the local features of the dataset. At the fifth pooling level, Fcmp treats the characteristic parameters of the output neurons as one chunk, and the FCNN's pooling operation degenerates into the classic frequency-domain max pooling operation, which is the pooling method adopted by the koCNN architecture. Nevertheless, in terms of the average error distribution over the three sets, the violin plot of the FCNN is flat and complete. This finding shows that the error fluctuation of the FCNN is small and that the error values are concentrated at approximately 0.05. The reason is that the FCNN integrates FRReLU into the forward pipeline. FRReLU injects non-linear frequency-domain characteristics into the convolution neurons of the frequency-domain pipeline, which effectively reduces the overfitting of the full frequency-domain network. In contrast, the upper and lower limits of the cuDNN and koCNN violin plots are large, which shows that their average errors fluctuate greatly and that they suffer from overfitting. In particular, the MAE value of koCNN is generally approximately 0.65, exceeding that of the FCNN by nearly 20%.
This finding shows that the FCNN is more stable, accurate and robust than the traditional time-domain and frequency-domain architectures.
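The violin-plot comparison of Figure 7 can be reproduced in outline as follows. The error samples below are synthetic placeholders chosen only to show the plotting mechanics, not the measured MAE values reported above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
# Synthetic per-split error samples (illustrative stand-ins for real MAE values).
mae = {
    "FCNN-Res-FPool-4": 0.05 + rng.normal(0, 0.001, 200),   # narrow band near 0.05
    "cuDNN":            0.05 + rng.normal(0, 0.02, 200),    # wider fluctuation
    "koCNN":            0.065 + rng.normal(0, 0.02, 200),
}
fig, ax = plt.subplots()
ax.violinplot(list(mae.values()), showmeans=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(list(mae.keys()))
ax.set_ylabel("mean accuracy error (MAE)")
plt.show()
```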
In Figure 8, based on the MetData-1 dataset and the three backbone frameworks, we train the FCNN, cuDNN and koCNN models and give the average error values of the estimation of the neuron's characteristic parameters for the three models. In the frequency-domain forward pipeline, when ResNet-50 is selected as the backbone framework and the pooling level is 3, the average error value of the estimation of the output neuron's characteristic parameters of the FCNN is approximately 0.05, with an absolute error of no more than 0.055, which is the lowest average error value among the three models. In addition, the accuracy of the FCNN on the training set is higher than that on the test set under the same backbone framework. Even when a backbone framework with low accuracy is selected as the main body of the FCNN, the training-set accuracy of the FCNN is still higher than its test-set accuracy. For example, with VGG-19 as the backbone framework, the maximum error value of the training set of the FCNN-VGG model is 0.053, while the maximum error value of the test set of the FCNN-Res model is 0.055, so the training error accuracy of the FCNN-VGG model is higher than that of the FCNN-Res model.
To further test the classification performance of the FCNN model, in this group of experiments, we use the test set of MetData-1 to test the FCNN, koCNN and cuDNN models and compare the test results with the ground truth. Here, we use a scatter plot to visualize the comparison results, as shown in Figure 9. We plot the straight line from the origin as the identity line: its slope is 1, its starting-point coordinates are (0, 0), and its ending-point coordinates are (1, 1). The abscissa and ordinate of every point on the identity line are equal; that is, when a model's classification result lies on the identity line, the classification result value equals the ground truth value, and the classification accuracy of the model reaches its maximum. From a macroscopic point of view, no matter what pooling level is adopted in the frequency-domain chunk max pooling layer of the FCNN architecture, the classification results of the FCNN model (red area) are distributed along the identity line in the form of a narrow band, while most of the classification results of the koCNN and cuDNN models are distributed around the red area and the rest are covered by it, which shows that the classification results of the FCNN model are closer to the identity line and have higher classification accuracy. When the frequency-domain pooling level of the FCNN is 3 and ResNet-50 is selected as the backbone framework, the classification results of FCNN-Res-FPool-4 are closest to the identity line. Compared with FCNN-VGG-FPool-4 and FCNN-Al-FPool-4 under VGG-19 and AlexNet-7, the classification accuracy of the FCNN reaches its maximum value, and the average difference is merely 0.18. Compared with koCNN, the classification accuracy of FCNN-Res-FPool-4 is 15% higher than that of koCNN-Res, 11% higher than that of koCNN-VGG, and 10% higher than that of koCNN-Al. This shows that the FCNN can be used in frequency-domain classification tasks with backbone frameworks of various precisions, especially those with deep convolution levels and small filter kernels. When the frequency-domain pooling level of the FCNN is 5, the classification performance of the FCNN is still better than that of koCNN but lower than that of the FCNN at pooling level 3. This is because, in the frequency-domain chunk max pooling layer, the classification error value of level-3 pooling is 16.5% lower than that of level-5 pooling, so the classification accuracy of FCNN-Res-FPool-1 is lower than that of FCNN-Res-FPool-4. This further shows that the frequency-domain chunk max pooling method proposed in this paper is very important for improving the classification performance of the full frequency-domain training network. The above test procedures also apply to the time-domain cuDNN model, and the test results are shown in rows 3 and 4 of Figure 9.
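The identity-line reading of Figure 9 can be sketched as follows, again with synthetic stand-ins for the model outputs; the noise widths are assumptions chosen only to mimic a narrow band versus a scattered cloud.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
truth = rng.random(300)                                   # synthetic ground truth
fcnn  = np.clip(truth + rng.normal(0, 0.02, 300), 0, 1)   # narrow band near the line
kocnn = np.clip(truth + rng.normal(0, 0.10, 300), 0, 1)   # scattered around it

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], "k-", label="identity line (slope 1)")
ax.scatter(truth, kocnn, s=8, alpha=0.4, label="koCNN")
ax.scatter(truth, fcnn, s=8, color="red", alpha=0.4, label="FCNN-Res-FPool-4")
ax.set_xlabel("ground truth")
ax.set_ylabel("classification result")
ax.legend()
plt.show()
```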

3) THE CLASSIFICATION ACCURACY OF FCNN BASED ON THE IMAGENET DATASET
In the frequency-domain forward and backward pipelines, the same training parameter settings and notation as in Section 5.2.2 are used in this section. This section adopts the ImageNet dataset to train the network models, gives the training results of the FCNN at the five pooling levels under the three backbone frameworks, and compares them with the training results of Non-FCNN, as shown in Table 4. The analysis is as follows. (1) When ResNet-50 is selected as the backbone framework and the ImageNet dataset is used for training, the top-5 error value is reduced by 1.75 percentage points. This is attributed to FRReLU, the frequency-domain randomized offset rectified linear unit used by the FCNN in the frequency-domain forward pipeline. FRReLU can extract the non-linear frequency-domain characteristics of the output neuron's characteristic parameters and compute them entirely in the frequency domain. In contrast, the traditional frequency-domain architecture needs to convert the frequency-domain data into time-domain data after each layer of training and then extract the non-linear characteristics of the neuron characteristic parameters. This conversion process degrades the frequency-domain parameter feature extraction accuracy of Non-FCNN, and the non-linear feature extraction accuracy also declines. (2) When VGG-19 is selected as the backbone framework and the ImageNet dataset is used to train FCNN-VGG, the classification error values at the five pooling levels show that the classification error rate of FCNN-VGG is lower than that of Non-FCNN-VGG. For example, when the output neuron's characteristic parameter matrix of the third level is divided into four chunks, the activation chunk size of FPool-4 is $4 \times 4$, and the top-1 classification error value of FCNN-VGG-FPool-4 is 19.99. Compared with Non-FCNN-VGG, the top-1 error value of FCNN-VGG-FPool-4 is reduced by 4.76 percentage points, a larger reduction than at all other pooling levels. That is, when the pooling level is 3, the classification accuracy of FCNN-VGG is optimal, which is similar to the training result on the MetData-1 dataset (see Section 5.2.2). This shows that the classification accuracy of the FCNN is significantly higher than that of frequency-domain networks without the FRReLU and Fcmp methods.

Furthermore, we divide the ImageNet dataset into three sets, namely, a training set, a validation set and a test set, which are used to train and test the FCNN model proposed in this paper. Among them, the training set contains 681,000 images, the validation set contains 476,700 images, the test set contains 136,200 images, and the images are annotated with 300 classification labels. In this experiment, we use the frequency-domain backward pipeline to train the FCNN model and calculate the mean accuracy error (MAE) of the FCNN, cuDNN and koCNN models on the above three sets to evaluate the classification accuracy on open-source datasets. We again use violin plots to represent the average error precision of the three models under the three backbone frameworks, as shown in Figure 10. In Figure 10, for the test results of the three models on the ImageNet dataset, we draw the violin plots of the FCNN model at the third and fifth pooling levels, as well as the violin plots of cuDNN and koCNN. At the third pooling level of the frequency-domain backward pipeline, ResNet-50 is used as the backbone framework to train the FCNN model.
The error distribution of the training results is similar to that of the test results, and three-fifths (3/5) of the error values are distributed in a small neighbourhood of 0.05. For example, in the frequency-domain backward pipeline, the MAE values of the FCNN-Res-FPool-4 training set are distributed around 0.05, with an upper limit fluctuation range of no more than 0.0012 and a lower limit fluctuation range of no more than 0.0011; the MAE values of the FCNN-Res-FPool-4 validation set are distributed around 0.05, with an upper limit fluctuation range of no more than 0.0015 and a lower limit fluctuation range of no more than 0.002; and the MAE values of the FCNN-Res-FPool-4 test set are also distributed around 0.05, with an upper limit fluctuation range of no more than 0.0025 and a lower limit fluctuation range of no more than 0.0035. The reason is that the FCNN integrates the frequency-domain inverse chunk max pooling method (Fcmp-1) in the frequency-domain backward pipeline. Fcmp-1 divides the input neuron's characteristic parameter gradient matrix into four chunks at the third pooling level and extracts the coarse-grained location information and feature strength of the gradients of the frequency-domain neuron characteristic parameters, ensuring the effective propagation of these gradients in the frequency-domain backward pipeline. At the fifth pooling level of the frequency-domain backward pipeline, Fcmp-1 treats the characteristic parameter gradients of the output and input neurons as one chunk, and the FCNN pooling operation degenerates into the classic frequency-domain inverse max pooling operation, which is also adopted by the koCNN backward pipeline. According to the average error distribution over the three sets, the violin plots of the FCNN model trained with ResNet-50 and VGG-19 as the backbone frameworks are flat and complete, while the violin plot of the FCNN model trained with AlexNet-7 as the backbone framework is wide in terms of its aspect ratio. This finding shows that when the FCNN model is trained with a backbone framework with high classification accuracy, the error distribution range of the FCNN is narrow and the error values are concentrated in a small neighbourhood of 0.05; in contrast, when the FCNN model is trained with a backbone framework with lower classification accuracy, the error distribution range of the FCNN is wide and the error values of the training results are large. The experimental results show that the error distribution of the FCNN model with AlexNet-7 as the backbone framework is larger than that on the MetData-1 dataset. The FCNN integrates FRReLU-1, which injects non-linear frequency-domain characteristics into the convolution neurons of the frequency-domain backward pipeline and effectively reduces the overfitting of the full frequency-domain network; however, the frequency-domain inverse randomized offset rectified linear unit is limited by the backbone framework and will display different test results on different datasets. Therefore, when using the open-source ImageNet dataset to train the FCNN model's frequency-domain backward pipeline, we should select a backbone framework with high accuracy to avoid reducing the training accuracy of the backward pipeline.
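Assuming Fcmp-1 follows the standard max-pool backward rule (each pooled gradient flows back to the position whose coefficient won the forward max, and all other positions receive zero), a minimal sketch of the gradient routing is shown below; it reuses the per-chunk argmax `idx` recorded by the forward Fcmp sketch above.

```python
import numpy as np

def fcmp_backward(grad_out, idx, chunk, shape):
    """Sketch of inverse chunk max pooling (Fcmp-1) under the standard
    max-pool backward rule: route each pooled gradient back to the forward
    argmax position; zero elsewhere. `idx` comes from the forward pass."""
    h, w = shape
    grad_in = np.zeros((h // chunk, w // chunk, chunk * chunk), dtype=grad_out.dtype)
    np.put_along_axis(grad_in, idx[..., None], grad_out[..., None], axis=-1)
    grad_in = grad_in.reshape(h // chunk, w // chunk, chunk, chunk)
    return grad_in.swapaxes(1, 2).reshape(h, w)    # undo the forward tiling

g = np.ones((2, 2), dtype=complex)                 # gradient w.r.t. FPool-4 output
idx = np.random.default_rng(6).integers(0, 16, (2, 2))  # stand-in argmax record
print(fcmp_backward(g, idx, 4, (8, 8)).shape)      # (8, 8), mostly zeros
```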
In Figure 10, we also give the violin plots of another time-domain network and a frequency-domain network, compare them with the violin plots of the FCNN model, and obtain the following analysis results: (1) on the open-source ImageNet dataset, the upper and lower limits of koCNN's violin plot are still large, which shows that its average error fluctuates greatly and that there is an overfitting problem. (2) The error distribution of the cuDNN model trained with AlexNet-7 as the backbone framework has a large vertical-to-horizontal ratio. The reason is that cuDNN uses a time-domain convolution pipeline to train the neural network model, and the time-domain convolution pipeline is greatly affected by the accuracy of the backbone framework. Therefore, on the open-source dataset, the FCNN model is superior to other full frequency-domain network models when using a higher-precision network as the backbone framework, and it is superior to other time-domain network models when using a lower-precision network as the backbone framework. In practical applications, we can select backbone frameworks of different precisions according to the application scenario to train the FCNN's backward pipeline.
To further test the classification performance of the FCNN model in the frequency-domain backward pipeline, in this group of tests, we use the test set of ImageNet to test the FCNN, koCNN and fbFFT models and compare the test results with the ground truth values, as shown in Figure 11. From the error distribution diagram in Figure 11, it can be seen that no matter what pooling level is adopted in the frequency-domain inverse chunk max pooling layer of the FCNN architecture, the classification results of the FCNN model (red area) are distributed along the identity line in the form of a narrow band, while the classification results of the koCNN and fbFFT models are mostly distributed around the red area, with the remainder overlapping it. These results show that the FCNN model is closer to the identity line and has higher classification accuracy. When the frequency-domain pooling level of the FCNN is 3 and ResNet-50 is selected as the backbone framework of the FCNN model, the classification results of FCNN-Res-FPool-4 are closest to the identity line. Compared with FCNN-VGG-FPool-4 and FCNN-Al-FPool-4 under VGG-19 and AlexNet-7, respectively, the classification accuracy of the FCNN reaches its maximum value, and the average difference is 0.22. Compared with the test results on the MetData-1 dataset, the MAE of the FCNN on the ImageNet dataset is 4 percentage points higher. This finding shows that when the same backbone framework is selected but the test datasets differ, the test accuracy of the FCNN model on the open-source dataset is lower than that on the internal dataset. Compared with koCNN, the classification accuracy of FCNN-Res-FPool-4 is 18% higher than that of koCNN-Res, 17% higher than that of koCNN-VGG, and 15% higher than that of koCNN-Al. This finding shows that the FCNN can be used for classification in the frequency-domain backward pipeline under backbone frameworks of various precisions, especially under backbone frameworks with deep convolution levels and small filter kernels. When the frequency-domain pooling level of the FCNN is 5, the classification performance of the FCNN is still better than that of koCNN but lower than that of the FCNN at pooling level 3. The reason is that, in the frequency-domain inverse chunk max pooling layer, the classification error value of level-3 pooling is 18.1% lower than that of level-5 pooling, so the classification accuracy of FCNN-Res-FPool-1 is lower than that of FCNN-Res-FPool-4. This finding further shows that the frequency-domain inverse chunk max pooling method proposed in this paper is very important for improving the classification performance of the frequency-domain backward pipeline of the full frequency-domain training network. The above test procedures also apply to the frequency-domain fbFFT model, and the test results are shown in rows 3 and 4 of Figure 11; they are not discussed here.

C. TIME PERFORMANCE OF THE FCNN
We use two parameters to evaluate the time performance of the FCNN model: the acceleration ratio and throughput. Next, we will describe the evaluation process of the two parameters in detail. Note that in this group of experiments, we still select ResNet-50, VGG-19 and AlexNet-7 as the backbone frameworks and use the MetData-1 dataset to train the model.
The acceleration ratio is a parameter used to evaluate the training speed of a network model under a specific backbone framework. When the koCNN model is selected as the reference model, the acceleration ratio of the FCNN model is the training time of koCNN under a specific backbone framework divided by the training time of the FCNN model under that backbone framework. Because the FCNN is a full frequency-domain training network, it performs the Fourier transform operation only once in the initialization layer and once in the terminal fully connected layer and uses the product operation instead of the convolution operation in each convolution layer. Therefore, the training time of the FCNN model under a specific backbone framework is converted into the complex multiplication complexity of the FCNN model. In addition, the frequency-domain chunk max pooling layer of the FCNN divides the output neuron's characteristic parameters into several chunks of fixed size and then extracts their characteristic parameters. Each chunk can be transmitted to a CUDA parallel operation unit, and in the CUDA, the multiple chunks of each convolution layer perform complex multiplication in parallel. Here, we use the $\max(\cdot)$ function to represent the maximum complex representation length of the chunks in the frequency domain. For each convolution layer on the CUDA, the complex multiplication complexity of the FCNN model is $\bigl(\max(l^{Lp}_{k_1} \times l^{Lp}_{k_2}) + k^{qp}_{1} \times k^{qp}_{2} - 1\bigr)\bigl(1 + \log(\max(l^{Lp}_{k_1} \times l^{Lp}_{k_2}) + k^{qp}_{1} \times k^{qp}_{2} - 1)\bigr)$. The acceleration ratio formula of the FCNN is as follows:
$$R = \frac{T}{S\,\bigl(\max(l^{Lp}_{k_1} \times l^{Lp}_{k_2}) + k^{qp}_{1} \times k^{qp}_{2} - 1\bigr)\bigl(1 + \log(\max(l^{Lp}_{k_1} \times l^{Lp}_{k_2}) + k^{qp}_{1} \times k^{qp}_{2} - 1)\bigr)}, \tag{28}$$
where $T$ represents the training time of the koCNN model and $S$ is the minimum processing unit of the CUDA. In Figure 12 and Table 5, we use formula (28) to calculate two kinds of acceleration ratios of the FCNN model based on the three backbone frameworks, each with different reference models. To evaluate the time performance of the FCNN model more comprehensively, we select two time-domain reference models to calculate the first type of acceleration ratio and two frequency-domain reference models to calculate the second type. For the first type of acceleration ratio of the FCNN model, the reference models are cuDNN and Lavin's fast algorithm (FA). When the batch size increases from 1 to 128, the acceleration ratio of the FCNN model shows an upward trend. For example, when the batch size is 1 and the backbone framework is AlexNet-7, the acceleration ratio of the FCNN model is 5.7228 with the cuDNN reference model, while for the same backbone framework and the same reference model, when the batch size is 128, the acceleration ratio of the FCNN model is 6.8674. When the reference model is cuDNN, the acceleration ratio of the FCNN model under the AlexNet-7 backbone framework thus increases by 1.1446 points, and the average acceleration ratio of the FCNN model under the same backbone framework is 6.3667, which is 0.5007 less than that of the FCNN model when the batch size is 128. This finding indicates that the acceleration ratio of the FCNN model reaches its maximum value when the batch size is 128; that is, the acceleration ratio grows as the training level deepens. These results also show that the time performance of the reference cuDNN model will continue to decline, while the training time of the FCNN does not increase with the deepening of the training level.
The reason is that the FCNN architecture adopts the full frequency-domain training mode: the FCNN uses the fast Fourier transform operation only in the training initialization phase and the fully connected layer. In contrast, the time-domain convolutional neural network cuDNN uses the convolution operation throughout the training phase, and the time complexity of the convolution operation increases with the number of training layers. Therefore, the FCNN model is more suitable for training frameworks with deeper layers and smaller filters. For the second type of acceleration ratio of the FCNN model, the reference models are koCNN and fbFFT. When the batch size increases from 1 to 128, the acceleration ratio of the FCNN model shows a downward trend. For example, when the batch size is 1 and the backbone framework is ResNet-50, the acceleration ratio of the FCNN model is 24.7978 with the fbFFT reference model, while for the same backbone framework and the same reference model, when the batch size is 128, the acceleration ratio of the FCNN model is 6.1793. The acceleration ratio of the FCNN falls only four-fold because the FCNN adopts a full frequency-domain activation function, eliminating the conversion between the time domain and the frequency domain during training. When the reference model is koCNN, the acceleration ratio of the FCNN model with the ResNet-50 backbone framework decreases by 6.8958 points, and the average acceleration ratio of the FCNN model under this backbone framework is 3.9650, which is 1.6764 higher than that of the FCNN model when the batch size is 128; this shows that the acceleration ratio of the FCNN model reaches its minimum value when the batch size is 128. That is, as the training level deepens, the acceleration ratio of the FCNN model continues to decline until it falls below the average acceleration ratio. However, under this reference model, the acceleration ratio of the FCNN is still at least 2; that is, the maximum training time required by the FCNN under the ResNet-50 backbone framework is one-half that of the koCNN network. The reason is that the FCNN adopts the frequency-domain chunk max pooling method. In this pooling layer, the output neuron's characteristic parameters are divided into multiple chunks in the frequency-domain forward pipeline, and each chunk is processed in parallel by a CUDA unit, which reduces the algorithmic complexity of the FCNN model and shortens the bandwidth needed for FCNN training.
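A minimal numeric sketch of this bookkeeping, under our reading of formula (28) above (FCNN training time taken as $S$ times the summed per-layer complex multiplication complexity, with the log base assumed to be 2), is given below. All numbers are hypothetical; the chunk lengths, kernel sizes and $T$, $S$ values are placeholders, not measurements.

```python
import numpy as np

def layer_complexity(l1, l2, k1, k2):
    """Per-layer complex-multiplication complexity, following the expression
    above with max(l_k1 x l_k2) collapsed to a single chunk length l1*l2 and
    the log base assumed to be 2."""
    n = l1 * l2 + k1 * k2 - 1
    return n * (1 + np.log2(n))

def acceleration_ratio(T_ref, S, layers):
    """Our reading of formula (28): reference training time T divided by the
    FCNN time, taken as S (CUDA minimum processing unit) times the summed
    per-layer complexity. The exact form of (28) is reconstructed, not quoted."""
    return T_ref / (S * sum(layer_complexity(*layer) for layer in layers))

layers = [(56, 56, 3, 3)] * 16            # 16 layers of 56x56 chunks, 3x3 kernels
print(round(acceleration_ratio(T_ref=1.0, S=1e-6, layers=layers), 4))
```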
The throughput refers to the maximum data rate that the neural network model can receive and output correctly without data loss. The throughput of each convolution layer of the neural network model is calculated as the GPU's unit calculation performance divided by the GPU computing time required to train that layer. The unit calculation performance of the GPU is a fixed value measured in TFLOPS, and the peak throughput of the GPU used in this test is 8.92 TFLOPS. When the throughput of a convolution layer of a neural network model is greater than the GPU peak throughput, the convolution layer has a lower computational complexity in the training process; that is, less computation time is required to train the convolution layer, and the hardware consumption cost is lower. The total throughput of the neural network model is calculated as the overall calculation performance of the GPU divided by the total calculation time the GPU needs to train the network. The overall calculation performance and total calculation time of the GPU are determined by the single-layer throughput of the neural network model; that is, the computing throughput of the convolution layers is summed over the depth of the network model. In Figure 13, we select the double-precision real number as the calculation unit and calculate the single-layer throughput required by the FCNN model and the koCNN model in the training and testing stages. The training and testing stages are divided into three types: the frequency-domain forward pipeline (ffprop), the frequency-domain backward pipeline (fbprop) and the frequency-domain weighted backward pipeline (faccgrad). In the training stage, when the number of input neuron characteristic parameters is equal to 1 (F = 1), the training rate of the FCNN model is 2.8864 times that of the koCNN model; when the number of input neuron characteristic parameters is equal to 256 (F = 256), the training rate of the FCNN model is still 1.6240 times that of the koCNN model. The reason is that the activation function and pooling operation of the FCNN model are both full frequency-domain operations, and the pooling layer adopts the frequency-domain chunk max pooling operation: multiple chunks perform the feature activation operation in parallel on the CUDA, which reduces the computational complexity of the FCNN model. In addition, in the forward pipeline training process, when F = 1, the throughput of the FCNN model is 17.1982 TFLOPS; when F = 256, the throughput of the FCNN model is 14.2431 TFLOPS; that is, when the number of input neuron characteristic parameters increases 256-fold, the throughput of the FCNN model in the forward pipeline is reduced by only 2.9551 TFLOPS. In other words, under the AlexNet-7 framework, even the worst throughput of the FCNN model still exceeds the GPU peak throughput by 5.3231 TFLOPS. This finding shows that the FCNN model has a stable frequency-domain network architecture that can ensure the fast training of a large number of neuron characteristic parameters while maintaining the overall model's operational throughput.
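The throughput bookkeeping just described amounts to the following sketch: per-layer throughput is the GPU's unit calculation performance divided by the per-layer training time, and the total sums the per-layer values over the network depth. The per-layer times below are hypothetical placeholders.

```python
import numpy as np

UNIT_PERF_TFLOPS = 8.92   # fixed unit calculation performance, equal to the peak here

layer_times = np.array([0.52, 0.61, 0.70, 0.83])  # GPU seconds per layer (made up)
per_layer = UNIT_PERF_TFLOPS / layer_times        # TFLOPS for each convolution layer
total = per_layer.sum()                           # summed over the network depth

for i, tp in enumerate(per_layer, 1):
    # Above-peak throughput indicates lower computational complexity for that layer.
    note = "lower complexity" if tp > UNIT_PERF_TFLOPS else "higher complexity"
    print(f"layer {i}: {tp:6.2f} TFLOPS ({note})")
print(f"total: {total:.2f} TFLOPS")
```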
In the backward pipeline training process, when F' = 1, the operation throughput of the FCNN model is 16.7540 TFLOPS; when F' = 256, the operation throughput of the FCNN model is 13.8658 TFLOPS; that is, when the number of output neuron characteristic parameters increases 256-fold, the operation throughput of the FCNN model's backward pipeline is reduced by only 2.8882 TFLOPS, so the worst performance of the FCNN model is still better than the GPU peak throughput. This finding shows that under the AlexNet-7 framework, the computational complexities of the forward pipeline and backward pipeline of the FCNN model are equivalent, and the FCNN can be used in the frequency-domain training of multiple backbone frameworks with forward and backward pipeline architectures.
In the process of training and testing, the computational complexity of the FCNN model is generally lower than that of the koCNN model, which can be verified by the per-layer throughput and runtime results in Figure 13. However, when the size of the input neuron's characteristic parameters is very large, the computational complexity of koCNN decreases, while that of the FCNN increases. The reason is that, in the frequency-domain initialization stage, koCNN truncates the high-frequency components of the input neuron's characteristic parameters and retains only the low-frequency components, thus reducing the computational complexity. However, this strategy reduces the training accuracy of the whole network model, resulting in inaccurate training results. The frequency-domain architecture of the FCNN model can balance the training accuracy and training speed: the FCNN architecture maximizes the classification accuracy of the neural network without reducing the training speed. For example, when the output neuron's characteristic parameters of the frequency-domain chunk max pooling layer are divided into four chunks, the minimum throughput of the FCNN model can reach 10.6200 TFLOPS (1.7 points higher than the peak throughput), while the top-5 classification error is still 2.62 percentage points lower than that of the Non-FCNN architecture. This conclusion also holds when the input neuron has many characteristic parameters.

VI. CONCLUSION
In this study, we discuss how to combine the frequency-domain randomized offset rectified linear unit with the frequency-domain chunk max pooling operation to build a frequency-domain convolutional neural network architecture. Under three backbone frameworks with different precisions, the forward pipeline and backward pipeline can be trained and tested completely in the frequency domain, which shows that the proposed architecture has strong frequency-domain representation and learning abilities. The FCNN model based on the ResNet-50 backbone framework (FCNN-Res-FPool-4) achieves the best classification performance among all the models, and when the MetData-1 dataset is selected for training, the experiments show that FCNN-Res-FPool-4 has strong robustness and accuracy. In addition, the FCNN models under the other backbone frameworks also achieve good classification results. These findings show that the FCNN is a very promising frequency-domain representation framework that can provide researchers with a new way to explore frequency-domain deep learning theory and frequency-domain construction methods for artificial neural networks.