Differential Neural Networks (DNN)

In this work, we propose an artificial neural network topology to estimate the derivative of a function. This topology is called a differential neural network because it allows the estimation of the derivative of any of the network outputs with respect to any of its inputs. The main advantage of a differential neural network is that it reuses some of the weights of a multilayer neural network; therefore, a differential neural network does not need to be trained. First, a multilayer neural network is trained to find the set of weights that minimizes an error function. Second, the weights of the trained network and its neuron activations are used to build a differential neural network. Consequently, a multilayer artificial neural network can produce a specific output and, simultaneously, estimate the derivative of any of its outputs with respect to any of its inputs. Several computer simulations were carried out to validate the performance of the proposed method. The results showed that differential neural networks are capable of estimating the derivative of a function with good accuracy. The method was developed for an artificial neural network with two layers; however, it can be extended to more than two layers. Similarly, the analysis in this study is presented for two common activation functions. Nonetheless, other activation functions can be used as long as the derivative of the activation function can be computed.


I. INTRODUCTION
Artificial neural networks are used when a representation model of a system does not exist or when this model is too complex for practical purposes. The main advantage of artificial neural networks is that they can create a system model directly from a set of samples. Furthermore, artificial neural networks can operate even under noise or distortion in the experimental data [1].
In the last few decades, artificial neural networks have been used for pattern recognition, prediction, control, classification, and many other applications, see [2]-[5]. Current research related to artificial neural networks includes methods that can reduce training time [6]. Research also focuses on reducing human intervention for data preparation and processing [7]. In the last few years, deep learning has become popular in a broad range of fields to solve different kinds of problems, see [8]-[11]. For instance, the authors of [12] use feed-forward artificial neural networks to solve the nonlinear second-order corneal shape model. Thus, scientific efforts to extend the capabilities of artificial neural networks are very important.
In several fields, it is important to analyze how one variable changes over time or how this variable changes with respect to other variables. For instance, in physics, the derivative of the position with respect to time provides the velocity. In business and economics, it is possible to optimize profit using derivative information of other variables. In the same sense, Ordinary Differential Equations (ODEs) and Partial Differential Equations (PDEs) are used in a broad range of fields such as mathematics, physics, astronomy, chemistry, biology, economics, and many others. In past years, artificial neural networks, including functional link artificial neural networks, have been used to solve differential equations, [13]-[15].
In the same context, the authors of [16] analyzed the solution of linear fractional-order ordinary differential equations using artificial neural networks. Previous works applying artificial intelligence to solve differential equations include the methods in [17], [18]. Mall and Chakraverty [19] presented a review of how artificial neural networks have been used through the years to solve differential equations. The authors in [20] approximated the solution of fractional differential equations (FDEs) by using the fundamental properties of artificial neural networks. Recently, the authors in [21] have demonstrated the precision and effectiveness of artificial neural networks to solve high-order linear fractional differential equations. Despite all these works, little research has been performed on how to estimate the partial derivative of a signal using artificial neural networks.
In this article, we propose an artificial neural network topology called a differential neural network. This type of network can be used to estimate the partial derivative of a variable with respect to another one. In other words, the method proposed in this study can be used to compute the partial derivative of one output of an artificial neural network with respect to one of its inputs. The main contribution of this work is a method that allows the estimation of the partial derivative using an artificial neural network. Therefore, it is possible to use an artificial neural network to create a model directly from the data, and then use our method to create a second model for the partial derivative of the parameters involved in the process under analysis. For instance, if an artificial neural network is used to model the stock market, then a differential neural network can be used to estimate the rate of change of the stocks.

II. ARTIFICIAL NEURAL NETWORK
In this section, we present a quick review of feed-forward artificial neural networks. First, we begin by describing how the networks are organized in layers. Second, we will explain how scaling can be applied at the input and at the output of the network to normalize the range of the data in the training set. Third, we will describe some common activation functions used in neural networks. These concepts will be used throughout the paper to compute several partial derivatives in different parts of the network. The main objective of this work is to find an expression to compute the partial derivative of one output of the network with respect to one of its inputs.
A feed-forward artificial neural network is composed of a set of layers of neurons [22]. Each neuron in the network is connected to other neurons through a numeric value called weight. Additionally, each neuron produces an activation which is computed by taking a weighted sum of the layer inputs, and then, using a non-linear function, see [23]. During training, the weights are adjusted so that for each input applied to the neural network, each network output is as close as possible to a target value [24]. Figure 1 shows an artificial neural network with M inputs and R outputs. The neural network has two layers. The hidden layer has N neurons, and the output layer has R neurons. Each neuron has an adder and an activation function f (·).

A. INPUT SCALER
The neural network in Figure 1 has a set of input scalers to modify the range of the input values. The main purpose of input scaling is to reduce the space of possible values for the network weights, and thus ease the network's training. These scalers can be represented by the following set of linear equations

x̄_i = μ_i x_i + β_i, i = 1, 2, ···, M (1)

where μ_i and β_i are constant values determined from the input range.
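As an illustration, the scaling constants of Equation 1 can be derived from the input range. The following Python sketch (the function name is ours, not from the paper) assumes the scaler maps a known range [x_min, x_max] linearly onto [−1, 1], the convention used later for the sine experiment:

```python
# Sketch of an input scaler consistent with Equation 1: x_bar = mu*x + beta.
# The constants map the input range [x_min, x_max] onto [-1, 1].

def input_scaler_constants(x_min, x_max):
    """Return (mu, beta) so that mu*x + beta maps [x_min, x_max] to [-1, 1]."""
    mu = 2.0 / (x_max - x_min)
    beta = -(x_max + x_min) / (x_max - x_min)
    return mu, beta

mu, beta = input_scaler_constants(0.0, 6.2832)  # e.g. x in [0, 2*pi]
assert abs(mu * 0.0 + beta - (-1.0)) < 1e-12    # lower limit maps to -1
assert abs(mu * 6.2832 + beta - 1.0) < 1e-12    # upper limit maps to +1
```

The constant μ_i obtained this way is exactly the value needed later for the input derivative of Equation 6.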

B. OUTPUT SCALER
If the neurons in the output layer of the network have an activation function that limits the output values, it is necessary to scale the output of the network. For instance, if the activation function is the hyperbolic tangent (tanh), the range for z̄_j is from −1 to 1. In this case, the output scaler must transform each output z̄_j to the actual range of the target values. The artificial neural network in Figure 1 has a scaler for each neuron in the output layer. These scalers are represented by

z_j = δ_j z̄_j + λ_j, j = 1, 2, ···, R (2)

where δ_j and λ_j are constants determined using the range of the target values in the training set and the type of activation function of the neurons in the network.
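For example, assuming Equation 2 maps the usable activation range [a_min, a_max] onto the target range [t_min, t_max], the constants δ_j and λ_j can be computed as follows (a sketch with illustrative names):

```python
# Sketch of the output scaler of Equation 2: z = delta * z_bar + lam.
# delta and lam map the activation range [a_min, a_max] (e.g. [-0.9, 0.9]
# for tanh neurons) onto the target range [t_min, t_max].

def output_scaler_constants(a_min, a_max, t_min, t_max):
    delta = (t_max - t_min) / (a_max - a_min)
    lam = t_min - delta * a_min
    return delta, lam

delta, lam = output_scaler_constants(-0.9, 0.9, -1.0, 1.0)
assert abs(delta * (-0.9) + lam - (-1.0)) < 1e-12  # -0.9 maps to -1
assert abs(delta * 0.9 + lam - 1.0) < 1e-12        # +0.9 maps to +1
```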

C. ACTIVATION FUNCTION
The activation function is an important component of an artificial neural network because it affects the ability of the network to converge during training. An activation function is a mathematical expression that defines the output of a neuron using the input value [25], [26]. One common activation function is the logistic function, logsig; it has a sigmoid curve with equation

f(y) = 1 / (1 + e^(−y)) (3)

Another common activation function that has been used in artificial neural networks is

f(y) = tanh(a·y) (4)

where the constant a commonly takes a value of 1.5 [23].
The main difference between the logsig function and the tanh function is the output range produced by the neuron. Another very common activation function in deep learning is the Rectified Linear Unit, ReLU, described as

f(y) = max(0, y)

The ReLU function is computationally efficient and allows the artificial neural network to converge quickly, see [27].
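For reference, the activation functions above, together with the derivatives used later in the methodology, can be sketched in Python. The form f(y) = tanh(a·y) with a = 1.5 is the assumption consistent with the derivative a(1 − p²) employed in Section III:

```python
import math

# The activation functions of Equations 3 and 4 plus ReLU, with the
# derivatives (expressed in terms of the activation p) used in Section III.
A = 1.5  # constant a of Equation 4

def logsig(y):            # Equation 3
    return 1.0 / (1.0 + math.exp(-y))

def logsig_deriv(p):      # d f/d y = p * (1 - p), Equation 10
    return p * (1.0 - p)

def tanh_a(y):            # Equation 4
    return math.tanh(A * y)

def tanh_a_deriv(p):      # d f/d y = a * (1 - p**2), Equation 11
    return A * (1.0 - p * p)

def relu(y):              # ReLU
    return max(0.0, y)

p = logsig(0.0)
assert abs(p - 0.5) < 1e-12 and abs(logsig_deriv(p) - 0.25) < 1e-12
assert abs(tanh_a(0.0)) < 1e-12 and abs(tanh_a_deriv(0.0) - 1.5) < 1e-12
assert relu(-2.0) == 0.0 and relu(3.0) == 3.0
```

Expressing each derivative in terms of the activation p, rather than the net input y, is what later allows a differential neural network to reuse the activations already computed by the trained network.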

III. METHODOLOGY
For clarity, the discussion presented in this section is focused on the artificial neural network of Figure 1. However, our approach can be extended to networks with more than two layers. Consider the artificial neural network in Figure 2. In this case, we are interested in the estimation of the partial derivative of output z_j with respect to input x_i, ∂z_j/∂x_i. This derivative can be computed using the chain rule as

∂z_j/∂x_i = (∂x̄_i/∂x_i)(∂q_j/∂x̄_i)(∂z_j/∂q_j) (5)

The first partial derivative in Equation 5, ∂x̄_i/∂x_i, will be referenced as the input derivative. The second partial derivative, ∂q_j/∂x̄_i, will be referenced as the internal derivative. Finally, the last partial derivative, ∂z_j/∂q_j, will be referenced as the output derivative. Figure 2 shows these three partial derivatives; they are necessary to compute ∂z_j/∂x_i. The figure also shows those parts of the artificial neural network that are involved in the computation of each partial derivative.

A. INPUT DERIVATIVE
The input derivative, ∂x̄_i/∂x_i, for each input of the network can be easily computed from Equation 1. Thus, this set of input derivatives can be expressed as

∂x̄_i/∂x_i = μ_i (6)

where i = 1, 2, ···, M and the values of μ_i are scaling constants.

B. INTERNAL DERIVATIVE
In this section, the internal derivative ∂q_j/∂x̄_i indicated in Figure 2 is computed. From this figure, it can be seen that the values of q_j can be computed as

q_j = Σ_{k=1}^{N} h_jk p_k (7)

where j = 1, 2, ···, R. Thus, the partial derivative of q_j with respect to x̄_i can be computed as

∂q_j/∂x̄_i = Σ_{k=1}^{N} h_jk (∂p_k/∂x̄_i) (8)

By applying the chain rule in Equation 8 and Figure 2, we get

∂q_j/∂x̄_i = Σ_{k=1}^{N} h_jk (∂p_k/∂y_k)(∂y_k/∂x̄_i) (9)

The value of ∂p_k/∂y_k can be computed from the activation function used in the neurons of the artificial neural network. For the logsig activation function of Equation 3, this derivative can be expressed as

∂p_k/∂y_k = p_k(1 − p_k) (10)

where p_k is the activation produced by neuron k in the hidden layer of the network in Figure 2. However, if the neurons in the hidden layer use the activation function tanh of Equation 4, the value of ∂p_k/∂y_k can be expressed as

∂p_k/∂y_k = a(1 − p_k²) (11)

By substituting Equation 11 in Equation 9, we obtain

∂q_j/∂x̄_i = a Σ_{k=1}^{N} h_jk (1 − p_k²)(∂y_k/∂x̄_i) (12)

The last partial derivative in Equation 12, ∂y_k/∂x̄_i, can be obtained from the network in Figure 2 as

∂y_k/∂x̄_i = w_ki (13)

Consequently, a final expression for the internal derivative can be computed by substituting Equation 13 in Equation 12 to get

∂q_j/∂x̄_i = a Σ_{k=1}^{N} h_jk (1 − p_k²) w_ki (14)

where the values of w_ki are the weights in the hidden layer, i = 1, 2, ···, M and j = 1, 2, ···, R. That is, there is a value of i for each input in the network, and there is a value of j for each output.
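The internal derivative of Equation 14 can be verified numerically. The sketch below builds a hidden layer of tanh(a·y) neurons with random weights (illustrative stand-ins, not a trained network) and compares Equation 14 against a central finite difference of q_j:

```python
import math, random

# Numerical check of Equation 14: for tanh(a*y) hidden neurons,
#   dq_j/dx_bar_i = a * sum_k h[j][k] * (1 - p[k]**2) * w[k][i].
random.seed(1)
A, M, N, R = 1.5, 3, 5, 2
w = [[random.uniform(-1, 1) for _ in range(M)] for _ in range(N)]  # hidden
h = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(R)]  # output

def q(x_bar, j):
    p = [math.tanh(A * sum(w[k][i] * x_bar[i] for i in range(M)))
         for k in range(N)]
    return sum(h[j][k] * p[k] for k in range(N))

def internal_derivative(x_bar, j, i):                  # Equation 14
    p = [math.tanh(A * sum(w[k][m] * x_bar[m] for m in range(M)))
         for k in range(N)]
    return A * sum(h[j][k] * (1 - p[k] ** 2) * w[k][i] for k in range(N))

x_bar, j, i, eps = [0.2, -0.5, 0.7], 1, 0, 1e-6
xp = list(x_bar); xp[i] += eps
xm = list(x_bar); xm[i] -= eps
numeric = (q(xp, j) - q(xm, j)) / (2 * eps)
assert abs(internal_derivative(x_bar, j, i) - numeric) < 1e-6
```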

C. OUTPUT DERIVATIVE
In this section, we evaluate the last partial derivative of Equation 5, ∂z_j/∂q_j. This derivative can be determined by applying the chain rule to the neurons in the output layer of the network in Figure 2. Thus, we get

∂z_j/∂q_j = (∂z̄_j/∂q_j)(∂z_j/∂z̄_j) (15)

The first partial derivative of Equation 15, ∂z̄_j/∂q_j, can be computed using the activation function of the neurons in the output layer. If the neurons in the output layer use the activation function tanh of Equation 4, this derivative can be expressed as

∂z̄_j/∂q_j = a(1 − z̄_j²) (16)

where j = 1, 2, ···, R and z̄_j is the activation produced by neuron j in the output layer of the network in Figure 2.
The second partial derivative of Equation 15 can be computed directly from the equation of the output scaler. By using Equation 2, it can be expressed as

∂z_j/∂z̄_j = δ_j (17)

where j = 1, 2, ···, R and δ_j is a constant determined by the activation function used in neuron j of the output layer and the range of the target values for output j. Thus, the output derivative, ∂z_j/∂q_j, can be computed by substituting Equation 16 and Equation 17 in Equation 15 to obtain

∂z_j/∂q_j = a δ_j (1 − z̄_j²) (18)

where j = 1, 2, ···, R.
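Equation 18 can be checked in the same way; a minimal sketch with arbitrary illustrative constants:

```python
import math

# Check of Equation 18: with a tanh(a*q) output neuron followed by the
# scaler z = delta*z_bar + lam, dz/dq = a * delta * (1 - z_bar**2).
A, delta, lam = 1.5, 1.2, 0.3   # arbitrary illustrative values

def z(qv):
    return delta * math.tanh(A * qv) + lam

qv, eps = 0.4, 1e-6
z_bar = math.tanh(A * qv)
analytic = A * delta * (1 - z_bar ** 2)              # Equation 18
numeric = (z(qv + eps) - z(qv - eps)) / (2 * eps)
assert abs(analytic - numeric) < 1e-6
```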

D. DIFFERENTIAL NEURAL NETWORK (DNN)
Finally, the value of ∂z_j/∂x_i can be computed by substituting Equations 6, 14 and 18 in Equation 5 to get

∂z_j/∂x_i = μ_i [a Σ_{k=1}^{N} h_jk (1 − p_k²) w_ki][a δ_j (1 − z̄_j²)] (19)

By moving to the left of the summation those elements that do not depend on k, Equation 19 can be expressed as

∂z_j/∂x_i = a² μ_i δ_j (1 − z̄_j²) Σ_{k=1}^{N} h_jk (1 − p_k²) w_ki (20)

Equation 20 can be used to estimate the derivative of output j with respect to input i when the neurons use the activation function tanh. When the neurons use the logsig activation function, the derivative can be estimated using the following equation

∂z_j/∂x_i = μ_i δ_j z̄_j (1 − z̄_j) Σ_{k=1}^{N} h_jk p_k (1 − p_k) w_ki (21)

where i = 1, 2, ···, M and j = 1, 2, ···, R. The interesting fact about Equation 20 (or Equation 21) is that all the values in this equation are directly available from an artificial neural network. For instance, the weights in the hidden layer w_ki and the weights in the output layer h_jk are estimated during the normal training of the network. In the same sense, the activations of the neurons in the hidden layer p_k and the activations in the output layer z̄_j are computed when the complete artificial neural network is activated. This implies that it is possible to create an artificial neural network that can simultaneously compute the output and its derivative. Figure 3 shows the structure of a differential neural network to compute the derivative of output j with respect to input i. As can be seen from the diagram, the partial derivative ∂z_j/∂x_i only involves those weights that come from input i (w_1i, w_2i, ···, w_Ni) and those weights that go to output j (h_j1, h_j2, ···, h_jN). In this study, this topology or configuration is called a differential neural network, DNN. The hidden layer of a DNN is similar to the hidden layer of a feed-forward network. However, the output layer of a DNN is completely different from the output layer of the feed-forward network and requires the derivative of the activation function, as shown in Figure 3.
Figure 4 describes how to use the algorithm proposed in this article to estimate the derivative of a function using an artificial neural network. First, the weights of a multilayer neural network are estimated using an appropriate training set. When the training of the network is completed, each case in a dataset is applied to the network to compute the activations of the neurons in the hidden layer: p_1, p_2, ···, p_N, see Figures 1 and 3. At this point, the neuron activation z̄_j is also computed. Finally, Equation 20 (or Equation 21, depending on the activation function) is used to compute the desired derivative.
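The procedure of Figure 4 can be sketched end to end for the tanh case. In the sketch below, random weights stand in for a trained network; the point is only that Equation 20, evaluated from the forward-pass activations, matches a finite-difference estimate of ∂z_j/∂x_i:

```python
import math, random

# End-to-end sketch: forward-propagate one input through a two-layer
# tanh network (with the scalers of Equations 1 and 2), then reuse the
# weights and activations in Equation 20. Random weights are illustrative
# stand-ins for a trained network.
random.seed(7)
A, M, N, R = 1.5, 2, 4, 1
w = [[random.uniform(-1, 1) for _ in range(M)] for _ in range(N)]
h = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(R)]
mu = [0.5, 0.8]; beta = [0.1, -0.2]   # input scalers (Equation 1)
delta = [1.3]; lam = [0.05]           # output scalers (Equation 2)

def forward(x):
    x_bar = [mu[i] * x[i] + beta[i] for i in range(M)]
    p = [math.tanh(A * sum(w[k][i] * x_bar[i] for i in range(M)))
         for k in range(N)]
    z_bar = [math.tanh(A * sum(h[j][k] * p[k] for k in range(N)))
             for j in range(R)]
    z = [delta[j] * z_bar[j] + lam[j] for j in range(R)]
    return p, z_bar, z

def dnn_derivative(x, j, i):          # Equation 20
    p, z_bar, _ = forward(x)
    s = sum(h[j][k] * (1 - p[k] ** 2) * w[k][i] for k in range(N))
    return A * A * mu[i] * delta[j] * (1 - z_bar[j] ** 2) * s

x, j, i, eps = [0.3, -0.6], 0, 1, 1e-6
xp = list(x); xp[i] += eps
xm = list(x); xm[i] -= eps
numeric = (forward(xp)[2][j] - forward(xm)[2][j]) / (2 * eps)
assert abs(dnn_derivative(x, j, i) - numeric) < 1e-6
```

Note that dnn_derivative performs no training of its own: everything it needs (w, h, μ_i, δ_j, p_k, z̄_j) comes from the forward pass of the already trained network, which is the central claim of the method.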

IV. COMPUTER SIMULATIONS AND RESULTS
Several computer simulations were performed to establish the validity of the method proposed in this article. These computer simulations were performed using the simulator Neural Lab [28]. The analysis presented in this section focuses on three functions that have well-known derivatives: sin(x), x², and ln(x).

A. DERIVATIVE ESTIMATION OF THE SINE FUNCTION, sin(x)
The simulations to estimate the derivative of sin(x) began by creating a training set with 4000 uniformly distributed values of x in the range from zero to 2π. The respective target for each case was the value of sin(x). Once the training set was created, an artificial neural network with sixteen neurons in the hidden layer was trained using the parameters shown in Table 1. The artificial neural network had the structure shown in Figure 1 with one hidden layer and one output layer; however, the network had only one input and one output. The input scaler was designed to transform the value of x to the range [−1, 1]. Similarly, the output scaler was designed to transform the range [−0.9, 0.9] to [−1, 1]. From Table 1, it can be seen that the network was trained using the method of simulated annealing, and then the training was refined using the method of the conjugate gradient, see [29]-[31]. After the training was completed, a validation set with 800 random values of x in the range from zero to 2π was created. As the mean-squared error obtained during training was similar to the mean-squared error obtained during validation, it was concluded that the network was properly trained, see [32].
As the neurons of the artificial neural network had the hyperbolic tangent activation function of Equation 4, Equation 20 was used to estimate the derivative of sin(x). Figure 5 shows the estimation of the derivative of sin(x) using the artificial neural network. Additionally, Figure 6 shows the error of the estimation made by the artificial neural network. As can be seen from the figure, the derivative has an error that is less than 0.001 for most of the values in the range from zero to 2π.

B. DERIVATIVE ESTIMATION OF THE PARABOLA, x²
For this function, a training set with 4000 uniformly distributed values of x in the range from −2 to 2 was created. The target values of the training set were the values of x². Then, an artificial neural network with 16 neurons in the hidden layer was trained using the same training parameters shown in Table 1. After the network's training was completed, the network performance was validated using a data set with 800 random values of x in the range from −2 to 2. The computer simulations showed that the mean squared error for both training and validation was similar. The derivative of x² was estimated by substituting the weights and activations of the network in Equation 20. Figure 7 shows the derivative estimation for x²; as can be seen from the graph, the estimation of the derivative is very good for most of the range from −2 to 2. However, the estimation of the derivative presents an abnormal behavior around −2 and 2. As the artificial neural network was not trained with values outside this range, the network does not have enough information to properly estimate the derivative at the lower limit as well as at the upper limit. Figure 8 shows the difference between the actual value of the derivative, 2x, and the estimation produced by the network. From the graph, it can be seen that the error remains below 0.01 for most values of x in the range from −2 to 2. Observe that, in this case, it is possible to try to improve the performance of the network by increasing the number of training cases and the number of neurons in the hidden layer.

C. DERIVATIVE ESTIMATION FOR THE NATURAL LOGARITHM FUNCTION, ln(x)
The last function used to validate the method proposed in this work is the natural logarithm function. The training set was built with 4000 samples with uniformly distributed values in the range from 1 to 10. Once the training set was ready, an artificial neural network with sixteen neurons in the hidden layer was trained using the parameters in Table 1. Then, the artificial neural network's performance was validated using a dataset with 800 random samples in the range from 1 to 10. Figure 9 shows the estimation of the derivative of ln(x) computed using the artificial neural network. As can be seen from the graph in the figure, the estimation of the derivative is very good in most of the range. However, there is a slight deviation at the lower limit around x = 1. Figure 10 shows the difference between the exact value of the derivative and the estimation computed by the artificial neural network. For this function, the error is below 0.001 for most of the range.

D. DERIVATIVE ESTIMATION ACCURACY
In this section, we analyze the accuracy of the proposed method to estimate the derivative of a function. Computer simulations using twelve different neural networks were performed to measure the mean-squared error between the exact value of the derivative and the value estimated by the networks. Table 2 shows the results of these simulations. The first column in Table 2 indicates the number of neurons that were used in the hidden layer for each experiment. The second column in the same table shows the mean-squared error for the derivative of the sine function. The third column displays the mean-squared error for the derivative of the parabola x², while the last column in Table 2 shows the mean-squared error for the derivative of the natural logarithm ln(x). After inspecting the values in the table, it was concluded that the accuracy of the derivative estimation can be improved by increasing the number of neurons in the hidden layer.
Numerical differentiation is used to estimate the derivative of a mathematical function. One classic method to perform numerical differentiation is based on finite difference approximations. Finite differences are appropriate for some applications. However, the method proposed in this work has been designed for those applications where an artificial neural network model has already been created and where, additionally, an estimation of the derivative is required. Consider, for instance, an artificial neural network used to predict the weather; using a differential neural network, it is possible to predict the weather and, simultaneously, the rate at which the weather is changing.
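For comparison, the classic central-difference baseline mentioned above looks as follows (a generic sketch, not tied to the paper's simulator), shown for sin(x), whose exact derivative is cos(x):

```python
import math

# Central finite-difference approximation of a derivative:
#   f'(x) ~ (f(x + h) - f(x - h)) / (2 h)
def central_diff(f, x, hstep=1e-5):
    return (f(x + hstep) - f(x - hstep)) / (2 * hstep)

x = 1.0
assert abs(central_diff(math.sin, x) - math.cos(x)) < 1e-8
```

Unlike a differential neural network, this baseline requires direct access to the function f; the proposed method instead differentiates the model the network has already learned from data samples.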
This study presents a method that allows the computation of the derivative of any output of an artificial neural network with respect to any of its inputs. As mentioned before, our method's main advantage is that all the values required to build a differential neural network are readily available from a classic artificial neural network. Therefore, it is possible to simultaneously compute the output of the network as well as the derivative of any (or all) of its outputs. To the best of our knowledge, this is the first time that differential neural networks have been proposed.

V. CONCLUSION
In the last few decades, artificial neural networks have been applied to a broad range of fields. Artificial neural networks can learn from data samples, and therefore, can be used in those applications where a model is missing or where the model is too complex. In this work, we propose a new network topology called the differential neural network. This network topology can be used to estimate the partial derivative of a function. Differential neural networks use the same weights as a multi-layer neural network but in a different configuration. First, a multi-layer neural network is trained with a dataset to find the best set of weights that minimizes the network error. Second, the weights of this network and its neuron activations are used to estimate the derivative of any output of the network with respect to any of its inputs. Therefore, an artificial neural network can be extended to simultaneously estimate the partial derivative of any of its outputs with respect to any of its inputs.
Three different artificial neural networks were designed and trained. The first network was trained using a data set with samples from the sine function. The second network was trained with samples from a parabola, and the last network was trained with samples from the natural logarithm. Once the three networks were trained and validated, three different differential neural networks were built using values taken from the trained networks. The results from the computer simulations showed that a differential neural network could accurately estimate the partial derivative directly from data samples. The main advantage of the proposed method is that the same weights that were obtained during the training of a neural network are used for the differential neural network. Thus, a differential neural network does not need further training.