A Review of Optical Neural Networks

Optical neural networks (ONNs) can process information in parallel by using technology based on free-space and integrated platforms. Over the last half century, the development of integrated circuits has been constrained by Moore's law. Conventional neural networks run on digital computers that calculate sequentially, so most of them cannot be built into real-time processing systems. It is therefore necessary to develop ONNs for real-time processing and device miniaturization. In this paper, we review the progress of optical neural networks. First, starting from the principle of artificial neural networks, we explain the optical matrix multiplier that performs the linear operation. We then introduce optical neural networks realized by free-space optical interconnection and by waveguide optical interconnection. Finally, we discuss nonlinearity in optical neural networks. The gradual maturing of nanotechnology and the rapid advancement of silicon photonic integrated circuits have promoted the progress of integrated photonic neural networks. The construction of optical neural networks on future integrated photonic platforms therefore has potential application value.


I. INTRODUCTION
The concept of artificial neural networks (ANNs) originated in 1943, when Warren McCulloch and Walter Pitts [1] created a model based on mathematics and algorithms that simulated the principles and processes of biological nerve cells [2]. The model proved able to perform logical functions and opened the era of ANN research. After more than half a century of development, neural networks have made great progress and found extensive application, with abilities including self-learning [3], associative storage [4], high-speed search for optimal solutions [5], strong nonlinear fitting [6], [7], and mapping of arbitrarily complex nonlinear relations [7]. However, although ANNs are computational models inspired by signal processing in the brain, they are based on the von Neumann architecture and implemented electronically [8], which brings some inherent defects. For example, electronic signals easily interfere with each other [9], which causes difficulties for neural networks that need high-density connections. In addition, the energy demand is too high, resulting in large computing costs [10]. Such flaws in electronics have led to a shift in focus from conventional computers to so-called "light brains", which replace electrons or electricity with light in order to process large amounts of information at high speed. The basic component of such a system is the spatial light modulator (SLM) [11], which can convert incoherent light into coherent light. In the optical processing of images, coherent light is easier to modulate. In addition, the optical computer adopts optical interconnection technology [12], [13] to connect the computing and storage parts, breaking away from the traditional architecture in which the arithmetic unit, memory, and input and output devices are connected by a bus [14]. This review addresses an active topic in optical computing: building neural networks using optical methods.
The associate editor coordinating the review of this manuscript and approving it for publication was Vivek Kumar Sehgal.
An optical neural network (ONN) has a large number of linear layers with complex interconnections. Optical interconnection is highly parallel: beams can cross in space without crosstalk, and light propagates so fast that time delay and dispersion are negligible [15], [16]. In 1978, J. W. Goodman of Stanford University in the United States first proposed the theoretical model of the optical vector-matrix multiplier [17]. This was an important step in optical computing [18], [19], which promoted research on the optical matrix multiplier (OMM) [20], [21] and the development of photonic neural networks. With the rapid advancement of integrated photonic technology [22], [23] and the hardware implementation of nanophotonic processors [24], building neural networks on integrated photonic platforms has become increasingly popular.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this review, we describe the progress of ONNs over the last half century. The review is structured as follows. In Sec. II, we start from the basic theory of ANNs, explain the principle of the OMM and optical interconnection, and emphasize the significance of the nonlinear threshold operation. Optical interconnection is divided into free-space optical interconnection (FSOI) and waveguide optical interconnection (WOI). Accordingly, in Sec. III we introduce ONNs realized with the SLM, micro-lens array (MLA) and holographic element (HOE) from the perspective of FSOI, and in Sec. IV we introduce, in terms of WOI, the all-optical neural network (AONN) based on fiber optics and integrated photonic technology. Furthermore, we discuss optical nonlinearity in Sec. V. Finally, Sec. VI gives a summary and outlook for ONNs.

II. THE PRINCIPLE OF ONN
A. THE WORKING MECHANISM OF ANNs
ANNs are inspired by biological neural networks. A biological neuron has four basic units: dendrites, an axon, a cell body and synapses [25], as shown in Fig. 1(a). A neuron usually has multiple dendrites, which receive incoming information when the neuron is stimulated. The axon carries information from this neuron to others. Each neuron has only one axon, with a number of terminals at its end that can carry information to multiple other neurons. The axon terminals connect to the dendrites of other neurons and transmit signals to them. As shown in Fig. 1(b), artificial neurons are, like biological neurons, information processing units; each consists of the input signals, the weights W_i, the bias b, a linear summation, and a threshold function [26]. The interconnections of signals represent axons and dendrites (the input is the dendrite, the output is the axon), the threshold and linear summation correspond to the activation of the cell body, and the weights and memories correspond to synapses. As shown in Fig. 1(c), in a feed-forward neural network [27] the input x is sent to the artificial neurons, each input with a corresponding weight; the inputs are weighted and summed, the nonlinear activation is applied, and the output S is obtained. The purpose of training is to bring the output close to the ground truth: W and b are updated continually during training until the network converges. The output S takes the form
$$S = f\left(\sum_i W_i x_i + b\right) \qquad (1)$$
where f denotes the threshold (activation) function.
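The weighted sum and threshold described above can be illustrated with a minimal Python sketch. The sigmoid chosen for the threshold function, the function names and the numerical values are all assumptions made for illustration; the text does not prescribe a particular activation.

```python
import math


def sigmoid(z):
    """Smooth threshold function (an assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x, w, b, f=sigmoid):
    """Return S = f(sum_i w_i * x_i + b) for a single artificial neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Linear summation: weight each input and add the bias.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Nonlinear threshold applied to the weighted sum.
    return f(z)


# Illustrative values (hypothetical): the weighted sum is exactly zero here.
x = [1.0, 0.0, 2.0]
w = [0.5, -1.0, 0.25]
b = -1.0
S = neuron_output(x, w, b)
```

In an ONN the summation step is the part carried out optically, while the threshold step requires one of the nonlinear mechanisms discussed in Sec. V.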

B. OPTICAL REALIZATION OF LINEAR OPERATION
Linear operations in neural networks involve a large number of matrix multiplications. The design of an ONN needs to consider using optical methods to achieve such linear operations. The OMM can quickly realize optical interconnection. Next, we explain the principle of vector-matrix and matrix-matrix multiplication. In Fig. 2(a), the optical vector-matrix multiplier is composed of an array of LED light sources, a collimating lens L, a cylindrical lens CL1, an SLM, a cylindrical lens CL2 and a CCD [28]. Its working principle is as follows. Suppose the vector is $b = (1, 0, 2)$ and the matrix is
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
The luminous intensities of the LEDs represent vector b, and matrix A is loaded on the SLM. The beams pass through the collimating lens L, illuminate CL1 as parallel light, reach the SLM, where they are multiplied by matrix A, and are then focused by CL2 onto the vertically placed CCD detector. The intensity on the CCD can be measured to check whether the obtained vector is the expected product $bA = (3, 2, 3)$.
FIGURE 1. (a) Structure of biological neuron [25]. (b) Structure of artificial neuron [26]. (c) The model of feed-forward neural network [27].
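The numerical example of the vector-matrix multiplier can be checked with a short Python sketch. It models only the arithmetic, not the optics: the assignment of LED k to SLM row k and of detector j to SLM column j is an assumption chosen so that the result corresponds to the row vector b multiplied by A, and the function name is hypothetical.

```python
def optical_vector_matrix_multiply(led_intensities, slm_transmittance):
    """Idealized intensity model of the vector-matrix multiplier.

    Assumed geometry: LED k illuminates row k of the SLM, and detector j
    collects all light transmitted through column j.
    """
    rows = len(slm_transmittance)
    cols = len(slm_transmittance[0])
    if len(led_intensities) != rows:
        raise ValueError("one LED is needed per SLM row")
    detector = []
    for j in range(cols):
        # Each SLM pixel multiplies the incident intensity by its
        # transmittance; the second lens sums one column on one detector.
        detector.append(sum(led_intensities[k] * slm_transmittance[k][j]
                            for k in range(rows)))
    return detector


b = [1, 0, 2]
A = [[1, 0, 1],
     [2, 0, 1],
     [1, 1, 1]]
ccd_reading = optical_vector_matrix_multiply(b, A)
print(ccd_reading)
```

Because light intensities are non-negative, this idealized picture only covers non-negative vectors and matrices; how signed values are encoded is outside the scope of the sketch.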
Principle of the matrix-matrix multiplier: suppose matrix A has size m × p and matrix B has size p × n; then their product, matrix C, has size m × n.
The essence of multiplying A and B is to take the inner product of every row vector of A with every column vector of B. According to the convolution theorem [30], the convolution of two vectors α and β can be regarded as flipping one of them, α, by 180 degrees about the vertical axis and then sliding it along β. At each unit shift, the overlapping entries of the two vectors are multiplied and summed, until the two vectors no longer overlap. Matrix multiplication can therefore be obtained from convolution: if the row vectors of one matrix are transposed and reversed and then convolved with the other matrix, the matrix product appears among the entries of the convolution result. Take as an example any row vector α_i of matrix A convolved with matrix B to give D: the row vector of A is first transposed and reversed.
Matrix D has 2p-1 rows and n columns, and its middle row is the product of vector α_i and matrix B. To multiply matrix A by matrix B, A is split into its m row vectors, each of which is transposed, reversed, and padded with enough zero columns to form a submatrix. The m submatrices are arranged horizontally in order to form a new matrix, which is convolved with B; the result has (2p-1) rows and (mn+n-1) columns. The first mn elements of the middle row of the result are arranged in order into a matrix C of size m × n, which is the product of A and B. Based on this principle, an optical 4f system can be built to realize the matrix multiplier [29]. Matrix B is placed at the object plane as the input image, and matrix A is constructed into an optical matrix-multiplication filter h(x,y), whose Fourier transform is H(u,v). H(u,v) is placed at the spectrum plane, where it multiplies the spectrum of the object-plane image formed by the Fourier transform lens L1. An inverse Fourier transform then gives the matrix convolution, and the matrix product is obtained by taking part of the elements in the middle row.
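The construction above can be verified numerically. The following Python sketch builds the filter from the rows of A, convolves it with B, and reads the product from the middle row. It uses a direct spatial convolution as a stand-in for the Fourier-domain multiplication performed by the 4f system, and it assumes that each submatrix is padded to exactly n columns, which is the width consistent with the stated (mn+n-1) result columns. Function names are hypothetical.

```python
def full_convolve2d(kernel, image):
    """Direct 'full' 2D convolution of two matrices given as lists of lists."""
    kr, kc = len(kernel), len(kernel[0])
    ir, ic = len(image), len(image[0])
    out = [[0] * (kc + ic - 1) for _ in range(kr + ir - 1)]
    for a in range(kr):
        for b in range(kc):
            if kernel[a][b] == 0:
                continue  # zero padding contributes nothing
            for r in range(ir):
                for c in range(ic):
                    out[a + r][b + c] += kernel[a][b] * image[r][c]
    return out


def matmul_by_convolution(A, B):
    """Compute C = A B from the middle row of a single 2D convolution."""
    m, p, n = len(A), len(B), len(B[0])
    if any(len(row) != p for row in A):
        raise ValueError("A must have as many columns as B has rows")
    # Filter of size p x (m*n): block i holds row i of A, transposed and
    # reversed, in its first column, followed by n-1 zero columns.
    filt = [[0] * (m * n) for _ in range(p)]
    for i in range(m):
        for k in range(p):
            filt[p - 1 - k][i * n] = A[i][k]
    conv = full_convolve2d(filt, B)      # (2p-1) rows, (mn+n-1) columns
    middle = conv[p - 1]                 # middle row of the result
    # The first m*n entries of the middle row are C in row-major order.
    return [middle[i * n:(i + 1) * n] for i in range(m)], conv


A = [[1, 2], [3, 4], [5, 6]]   # m = 3, p = 2
B = [[1, 0, 2], [0, 1, 1]]     # p = 2, n = 3
C, conv = matmul_by_convolution(A, B)
print(C)
```

The shape of `conv` can be compared with the (2p-1) × (mn+n-1) size given in the text; for this example it is 3 × 11.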

C. THE NECESSITY OF NONLINEAR ACTIVATION
A deep neural network can be realized by adding hidden layers. However, if a deep neural network has no nonlinear activation, its hidden layers are equivalent to a single linear layer, so it cannot learn or identify nonlinear models. Nonlinearity is therefore crucial in neural networks. The lack of direct and effective nonlinearity in optics seriously limits the role of optics in deep learning computation. In recent years, many studies on optical nonlinear activation functions have appeared. We discuss them in Sec. V.
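The collapse of stacked linear layers into a single layer can be seen in a small Python sketch. The weight matrices, the input, and the choice of ReLU as the activation are illustrative assumptions; biases are omitted to keep the example short.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def relu(v):
    """Elementwise rectifier, used here as an example nonlinearity."""
    return [max(0, x) for x in v]


W1 = [[1, 2], [0, 1]]   # first "hidden" layer (hypothetical weights)
W2 = [[2, 0], [1, 1]]   # second layer
x = [1, -3]

# Without activation, two layers equal the single layer W2 @ W1.
two_linear_layers = matvec(W2, matvec(W1, x))
single_layer = matvec(matmul(W2, W1), x)


def nonlinear_net(v):
    """Two layers with a ReLU between them."""
    return matvec(W2, relu(matvec(W1, v)))


# Any linear map g satisfies g(x) + g(-x) = 0; the nonlinear network does not.
neg_x = [-xi for xi in x]
symmetry_sum = [a + b for a, b in zip(nonlinear_net(x), nonlinear_net(neg_x))]
```

Here `two_linear_layers` and `single_layer` coincide, while `symmetry_sum` is nonzero, so the network with the activation cannot be replaced by any single linear layer.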
The implementation of an ONN therefore needs to consider both the optical linear operation and the nonlinear activation. The optical linear operation can be realized by FSOI and WOI, in which the weights of the optical interconnection can be realized by an SLM, a holographic element, a 4f system, a phase change material (PCM) [31], etc. The optical nonlinear activation is indispensable and can be achieved by using a saturable absorber, a Kerr medium, or the nonlinear optical thermal effect [32].

III. USE FSOI TO IMPLEMENT ONN
FSOI refers to the interconnection mode in which the optical signal propagates in free space after leaving the emitter and reaches the receiver after optical components such as lenses and gratings have redirected and controlled the beam. Linear operations in an ONN can be implemented through FSOI. The common elements used in FSOI to realize an ONN are the SLM, MLA and HOE.

A. USE SLM TO BUILD ONN
ONNs originated in the 1980s, and their mathematical model was first proposed by Hopfield in 1982 [33]. During this period, emerging optical technologies such as optical bistability made it possible to perform the threshold operations needed to realize this model. The SLM is a device that modulates the spatial distribution of light waves and can modulate the beam in real time [11]. It is a key component of real-time optical information processing, optical computing and other systems. Next, we introduce how to use the SLM to realize optical interconnection and build an ONN.
In 1985, D. Psaltis and N. H. Farhat first reported the Hopfield neural network (HNN) based on an optical vector-matrix multiplier, namely the outer-product optical associative memory ONN system [34], which could improve the accuracy and robustness of optical information processing. However, such a system suffers from synapse saturation in the hardware associated with the connection matrix, and the network converges slowly. In the following year, H. K. Liu et al. introduced the inner-product optical array processor [35], which did not need the outer-product interconnection matrix to be calculated in advance and reduced the dimension of the matrix during calculation, so the network converged faster than the outer-product version. Both the outer-product and the inner-product ONNs are large in physical size; although they show certain advantages, this bulk hinders their large-scale adoption. Benefiting from the initial development of integrated optics, Ohta et al. designed a miniaturized integrated chip based on an ONN in 1989, using a dynamic SLM mask instead of a static mask [36]. It could simulate a system with 32 neurons and gave a preliminary realization of associative memory based on a neural network. This was the first integrated optical neural network to be developed. More recently, Ying Zuo et al. demonstrated an AONN with adjustable linear operations, using an SLM and Fourier lenses to carry out the linear process [37]. In the linear operation, the power of the incident light in different regions of the SLM represents different input nodes. After the SLM and the Fourier lens, all the beams diffracted in the same direction are superimposed and converge to a point on the focal plane, which achieves linear summation. On this basis, the AONN can obtain the linear matrix elements through an iterative feedback algorithm and achieve suitable precision.
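The outer-product associative memory that these early optical systems implemented can be sketched in Python as a standard Hopfield model. This is a textbook formulation with bipolar (+1/-1) states, a zeroed diagonal and synchronous updates, all of which are assumptions; it is not a description of the optical hardware in [34]. The stored patterns are hypothetical.

```python
def train_outer_product(patterns):
    """Build the connection matrix as a sum of outer products (zero diagonal)."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W


def recall(W, state, max_iters=10):
    """Iterate vector-matrix product plus threshold until the state is stable."""
    s = list(state)
    for _ in range(max_iters):
        nxt = []
        for i, row in enumerate(W):
            # Linear step (the part done by the optical multiplier).
            h = sum(w * sj for w, sj in zip(row, s))
            # Threshold step; keep the old value when the sum is exactly zero.
            nxt.append(1 if h > 0 else -1 if h < 0 else s[i])
        if nxt == s:
            break
        s = nxt
    return s


pattern_a = [1, 1, 1, 1, -1, -1, -1, -1]
pattern_b = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_outer_product([pattern_a, pattern_b])

# Corrupt the first element of pattern_a and let the network restore it.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(W, noisy)
```

The feedback loop in `recall` mirrors the thresholding and feedback that optical bistable devices or electronic detection provided in the early systems.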

B. USE MLA TO BUILD ONN
An MLA is an array of lenses whose clear aperture and relief depth are on the micron scale. It divides a complete laser wavefront into many small parts in space, each of which is focused by the corresponding lenslet. On the focal plane, the array of microlenses therefore produces a plane made up of a series of focal points. In this way the incident light wave can be modulated, and beams from different directions can be converged into points to achieve linear summation. Next, we introduce the use of the MLA to realize optical interconnection and an ONN.
In 1993, Yasunori Kuratomi proposed an ONN with vector feature extraction and introduced a vector feature extraction device (FEOND) [38]. A 2 × 2 lenslet array is used in the network to image the same four regions of the reticular pattern and focus these images on the FEOND, fulfilling the function of the first hidden layer. The network can correctly recognize handwritten letters and is easy to implement optically, but it has neither scale invariance nor shift invariance and is limited by linear separability. If the position or size of the input letters changes greatly, the characteristics of the vector also change. In addition, Taiwei Lu introduced a two-dimensional hybrid ONN using a high-resolution video monitor as a programmable associative memory [39], which improved the Hopfield model, realized optical interconnection and memory storage by using an MLA, and was strongly adaptive. However, because of the limitations of imaging aberration and light detection, the number of neurons that can be placed in the structure is severely limited. On this basis, Jianwen Yang's team proposed an ONN that uses lens arrays and liquid crystal light valves (LCLV) to perform subtraction in parallel: the subtraction between the positive and negative parts of the simulated neuron is transformed into the addition of the normal and inverted states of the LCLV [40]. It is difficult to process the state of a 2D input neuron because the input-dependent dynamic threshold is established on a single mask. To solve the aberration problem and process the input neurons in two dimensions, Yang's team put forward an experimental device with 32 × 32 neurons [41]. The interconnection of a large-scale ONN is realized by a coaxial lens array, and both the aberration and the light efficiency are clearly improved [42].

C. BUILD ONN WITH HOE
HOE is an optical element made according to the principle of holography, which is usually made on a photosensitive thin film material [43], [44]. It acts on the principle of diffraction and is a diffractive optical element (DOE).
In 1990, D. Psaltis et al. proposed the use of holography in neural networks [45], in which a conjugate mirror provides the threshold, feedback and gain for the pattern during hologram readout in order to realize the neural network model, as shown in Fig. 3(a). The information is encoded and distributed in the recording medium. The two-dimensional input and output planes are connected by interference fringes, and the pixels on the input and output planes are equivalent to neurons. In 2004, Sheng L. Yeh proposed a new optical method to realize the HNN by using matrix gratings. Fig. 3(b) shows the structure of the matrix gratings. The experimental results prove the effectiveness of this method and the stability of the optical efficiency of the different optical elements in the ONN [46]. In 2018, Xing Lin et al. invented the diffractive optical neural network [47], and the team made an in-depth analysis of this architecture [48]. As shown in Fig. 3(c), it consists of an input layer, a number of diffractive layers and an output layer. The input layer is illuminated by a terahertz light source, and the information can be encoded into the amplitude channel or the phase channel of the input surface. The beam incident on the input surface is diffracted, and the coherent superposition of the secondary wavelets changes the amplitude and phase of the input wave. A photodetector array placed at the output detects the intensity of the output light, and handwritten digits are recognized according to the difference in intensity among 10 different regions. Each diffractive layer is 3D printed from a model of the updated phase parameters. Training of this ONN, however, is still done by an electronic computer. Because the virtual diffractive layers must be parameterized and 3D printed, fast real-time programming is impossible. In addition, the experimental environment is limited by the use of a terahertz light source.
In 2019, the same team proposed a broadband diffractive neural network based on the above architecture [49]. The model no longer requires a monochromatic coherent light source, which expands the application scope of ONNs realized with this architecture, but the problems above remain. From the perspective of free-space implementation, the architecture of this diffractive neural network is highly innovative. Building on the diffractive neural network, a team from Tianjin University developed matrix gratings to replace the 3D printed diffractive layers and obtained higher recognition accuracy [50]. They use a carbon dioxide laser tube emitting infrared light at 10.6 µm and an HgCdTe detector array. The model parameters trained by the computer are converted into the parameter distribution of the matrix grating. The advantage of using an infrared light source is that the size of a single neuron can be reduced to 5 microns, so that a 1 mm × 1 mm matrix grating can contain 200 × 200 neurons. This kind of miniaturized matrix grating should be very beneficial for integration with silicon photonic platforms and for a wider range of applications.
In addition, J. Bueno et al. introduced a network of up to 2,025 diffractively coupled photonic nodes, forming a large-scale recurrent photonic neural network [51]. As shown in Fig. 3(d), a digital micro-mirror device (DMD) is used to realize reinforcement learning, and the results converge effectively. In this network of 2,025 nonlinear nodes, each of which is an SLM pixel, a DOE is used to realize the complex network connections. The large number of neurons in this architecture has stimulated researchers to consider the scalability of diffractively coupled photonic networks. Furthermore, Sheler Maktoobi et al. demonstrated that a diffractively coupled photonic network can contain 30,000 photonic nodes and described its scalability in detail [52]. They point out that diffraction itself is not the basic limit of the coupling; vignetting is the fundamental cause, and aberration as well as distortion are the limits of the actual optical system. If the diffraction orders deviate noticeably from the positions of the periodically arranged photonic nodes, the coupling is incomplete. The diffractive coupling efficiency can be improved by changing the position of the single-mode fiber and adjusting the tilt angle of the DOE.
The optical elements in the FSOI-based optical neurons described above are too large, which limits both the number of neurons and the number of layers in the network. Moreover, energy cannot be fully utilized when the beam propagates in free space. Because the network has few layers, large-scale complex nonlinear models cannot be learned and identified. In addition, an ONN realized by FSOI places strict requirements on the experimental environment, and optical diffraction, interference and other phenomena affect the experimental results to varying degrees. However, optical fiber interconnection technology and nanotechnology are gradually maturing, and PCM for neural computation and silicon photonic integrated circuits have developed rapidly. It is therefore possible to use optical waveguide technology to realize optical interconnection. In the next section we introduce ONNs based on optical waveguide technology and integrated photonic technology.

IV. USE WOI TO IMPLEMENT ONN
Optical waveguides are devices that guide the propagation of light waves, and they fall into two categories. One is the cylindrical optical waveguide, usually called optical fiber. The other is the integrated optical waveguide, which includes planar optical waveguides and strip dielectric optical waveguides.

A. ONN: BASED ON FIBER INTERCONNECTION
In 2012, F. Duport et al. proposed an all-optical implementation of reservoir computing based on a semiconductor optical amplifier array on a chip [53]. Off-line training is carried out using the fiber delay of a single nonlinear node, in which the nonlinearity is provided by the gain saturation of the semiconductor optical amplifier [54]. On this basis, T. Cheng et al. proposed a scheme for an ONN based on reservoir computing in 2019 [55]. The soma of the optical neurons can be built from directional couplers, fibers and amplifiers, and the amplifiers are used to realize the threshold, which is equivalent to nonlinear activation. The results indicate that the ONN can provide better performance and recognize the waveform of the input signal. We believe that the advantage of optical fiber is that it can be combined with the optical network of the 5G era, which makes it an engineering technology worth further study.

B. ONN: BASED ON INTEGRATED OPTICAL WAVEGUIDE PLATFORM
The integration of optical elements can greatly reduce the space occupied by neurons, which brings great changes to the further development of ONNs. Special materials such as PCM have been developed for nearly ten years. In 2011, C. David Wright introduced the use of PCM for arithmetic and bio-inspired computation [56] and provided the first experimental proof of principle of a "processor" based on PCM, demonstrating the four basic operations of addition, multiplication, division and subtraction while storing the results at the same time. In the same year, D. Kuzum reported new nanoscale electronic synapses based on PCM, a material used for optical data storage and non-volatile storage [57]. Continuous resistance transitions in PCM are used to simulate the properties of biological synapses in order to realize synaptic learning rules. According to a review of integrated optical quantum technology [58], integrated photonic platforms and devices developed rapidly from 2008 to 2018 and have become increasingly mature. Against this background, ONNs based on integrated optical waveguide interconnection technology have emerged.
In 2017, Alexander N. Tait of Princeton University published a paper on neuromorphic silicon photonics, introducing the world's first integrated photonic neural network [59], as shown in Fig. 4(a). It uses a neural compiler to program a silicon photonic neural network with 49 nodes. Each node operates at a specific wavelength; the light from each node is detected and summed before it is fed into the laser, and the output is then fed back to create a feedback loop with nonlinear characteristics. Tait et al. simulated traditional neural networks, demonstrated how photonic neural networks can solve differential equations, and found that photonic neural networks on silicon photonic platforms can be connected to ultra-fast information processing environments for radio control and scientific computation.
In the same year, Y. Shen experimentally demonstrated a new AONN architecture with a programmable nanophotonic processor (PNP) consisting of a cascaded array of 56 programmable Mach-Zehnder interferometers (MZIs) [60], as shown in Fig. 4(b). Any real matrix M can be decomposed by singular value decomposition as $M = U \Sigma V^{\dagger}$. The unitary matrices $U$ and $V^{\dagger}$ can be realized by using optical beam splitters and phase shifters, and the diagonal matrix $\Sigma$ can be realized by optical attenuators. By tuning the phase shifters integrated in the MZIs, it is possible to perform the computation on inputs of any size. Such nanophotonic integrated circuits hold out the promise of AONNs capable of meeting the upcoming big data and deep learning challenges, but directional couplers and phase modulators take up too much space for an ONN with more than 1,000 neurons. So far, the goal of a large-scale and fast programmable photonic neural network has not been achieved.
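The three-stage structure implied by this decomposition can be illustrated with a small Python sketch for a real 2 × 2 matrix. The 2 × 2 rotations are only stand-ins for the interferometer meshes (a real MZI mesh implements a general unitary matrix with complex entries), and the angles, singular values and input vector are hypothetical values chosen for illustration.

```python
import math


def rotation(theta):
    """Real 2x2 rotation, used as a stand-in for a lossless interferometer mesh."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Assumed factors of M = U * Sigma * V^T (all values hypothetical).
U = rotation(0.3)
V = rotation(-1.1)
singular_values = [0.9, 0.4]   # at most 1, so pure attenuation suffices
Sigma = [[singular_values[0], 0.0], [0.0, singular_values[1]]]
M = matmul(matmul(U, Sigma), transpose(V))

x = [1.0, 2.0]
# Stage 1: first mesh applies V^T.
after_v = matvec(transpose(V), x)
# Stage 2: one attenuator per channel applies the diagonal matrix.
after_sigma = [s * y for s, y in zip(singular_values, after_v)]
# Stage 3: second mesh applies U.
staged_output = matvec(U, after_sigma)

direct_output = matvec(M, x)
```

`staged_output` and `direct_output` agree to rounding error, which is the sense in which the cascade of mesh, attenuators and mesh implements the matrix M. Singular values greater than 1 would need gain or a global rescaling, which the sketch does not model.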
In 2019, J. Feldmann et al. proposed an all-optical synaptic system capable of both supervised and unsupervised learning [61]. By using wavelength division multiplexing (WDM), a scalable photonic neural network circuit is realized and pattern recognition in the optical domain is successfully carried out. This photonic synaptic network promises to achieve the high speed and bandwidth inherent in optical systems, enabling direct processing of optical communication and visual data. In Fig. 4(c), the red rectangular block at the input represents the PCM, and the four small circles are equivalent to multiplexers. If the sum of the four input pulses is insufficient to switch the state of the PCM, the probe pulse resonates with the ring resonator. The weight can be modified by adjusting the pulse power of the laser to change the state of the PCM, which then generates the output. Integrating PCM or other optically nonlinear materials into a ring resonator is an approach worth learning from.
In addition, E. Khoram introduced a nanophotonic medium that can be used for neural inference [62], as shown in Fig. 4(d). The host material of the medium is silicon dioxide, which contains dopants whose refractive index differs from that of the host, and the dopants include both linear and nonlinear particles. The ONN based on this nanophotonic neural medium has achieved high accuracy in handwriting recognition. This new type of nanoscale neural medium replaces the traditional layered network and can be used to realize photonic memristors and to study synaptic photonics [63].

V. OPTICAL NONLINEARITY AND ITS IMPLEMENTATION
Nonlinear optics studies the mechanisms that generate new phenomena in the interaction between coherent light and matter. The optical properties of a material change under irradiation by a laser beam, which in turn affects the properties of the beam. The nonlinear optical effect originates from the nonlinear polarization of molecules and materials. If there is an energy exchange between light and matter, the effect is called an active optical nonlinear effect. Such effects include nonlinear absorption, nonlinear refraction, nonlinear scattering, optical bistability, etc.

A. OPTICAL KERR EFFECT AND ITS APPLICATION
Kerr nonlinearity belongs to active optical nonlinearity. When the incident light is strong enough, the refractive indices for waves polarized parallel and perpendicular to the light field differ by an amount Δn, which is proportional to the intensity of the laser beam acting in the medium. Only the first- and third-order effects are considered when a high-power laser with frequency ω is incident on an isotropic medium.
One application of Kerr nonlinearity is to produce optical bistability, which requires two conditions: nonlinearity and positive feedback [64]. The simplest optical bistable device is a nonlinear medium placed in a Fabry-Perot (F-P) optical cavity. The Kerr medium is a dispersive medium. According to the formula $n = n_0 + n_2 I$, the refractive index of the medium depends nonlinearly on the intensity in the cavity, so the resonant wavelength of the cavity is related to I. The difference between the resonant wavelength and the incident laser wavelength affects the transmission coefficient of the incident light and changes the intensity in the cavity. This is therefore a feedback process, and the optical nonlinear activation generated by optical bistable devices can be added to the optical path.
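The feedback described above can be explored with a short Python sketch. It assumes the standard Airy transmission function of a lossless Fabry-Perot cavity, $T = 1 / (1 + F \sin^2(\phi/2))$, with a round-trip phase that shifts linearly with the intracavity intensity as implied by $n = n_0 + n_2 I$. The coefficient of finesse, the initial detuning and the Kerr phase coefficient are hypothetical, dimensionless values, and the intracavity intensity is normalized so that it equals the transmitted intensity.

```python
import math

FINESSE_COEFF = 50.0   # coefficient of finesse F (assumed)
PHASE_0 = -1.0         # round-trip detuning at zero intensity, in rad (assumed)
KERR_COEFF = 1.0       # phase shift per unit intracavity intensity (assumed)


def cavity_transmission(intracavity):
    """Airy transmission with an intensity-dependent round-trip phase."""
    phase = PHASE_0 + KERR_COEFF * intracavity
    return 1.0 / (1.0 + FINESSE_COEFF * math.sin(phase / 2.0) ** 2)


def input_for(intracavity):
    """Input intensity that sustains a given intracavity intensity."""
    return intracavity / cavity_transmission(intracavity)


# Sweep the intracavity intensity and record the required input intensity.
cavity_levels = [k * 0.01 for k in range(201)]
input_levels = [input_for(ic) for ic in cavity_levels]

# A falling segment of input vs. intracavity intensity signals bistability.
has_negative_slope = any(b < a for a, b in zip(input_levels, input_levels[1:]))


def count_states(target_input):
    """Count the intracavity states compatible with one input intensity."""
    return sum(1 for a, b in zip(input_levels, input_levels[1:])
               if (a - target_input) * (b - target_input) < 0)


states_at_1p5 = count_states(1.5)
```

Sweeping the intracavity intensity rather than the input avoids solving the implicit equation. For the assumed parameters one input level corresponds to several intracavity states, which is the S-shaped response that gives the device its threshold-like behaviour.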
Exploiting the nonlinearity of the Kerr medium, an AONN can be constructed from thin layers of Kerr material separated by free space to realize both weighted connection and nonlinear processing, as discussed in [65]. The nonlinear processing is caused by the Kerr effect (self-focusing or self-defocusing) and by interference of the optical signal in the device. The output error is propagated back optically to produce the error signal, which then changes the intensity on the nonlinear layer. Compared with other ONNs, which require separate linear and nonlinear operations, an AONN constructed from Kerr nonlinear materials can perform both functions simultaneously and with fast processing speed. Similarly, single-mode Kerr interaction is used to realize single-photon coherent nonlinearity [66]. As mentioned above, active optical nonlinearity includes nonlinear absorption, one form of which is two-photon absorption. Considering the fast response of Kerr nonlinear materials, researchers combined the Kerr effect with two-photon absorption to build a nonlinear mechanism and used it together with a ring resonator to achieve all-optical reservoir computing [67]. This is similar to [31], [61] in its implementation of optical nonlinear activation.

B. OTHER NONLINEAR OPTICAL ACTIVATION
Sec. II-C discussed the importance of optical nonlinearity, and many researchers have introduced nonlinear optical effects in recent years. R. Amin, J. George et al. pointed out that electro-optic absorption modulation can be used to realize nonlinear modulation of light waves [68]-[70]. They discuss how to map the nonlinear activation function onto the transfer function of the electro-optic modulator and point out that the sigmoid function, the ReLU function or other nonlinear activation functions can be realized by using different electro-optic materials. For example, the ReLU function can be realized by using the inverted state-filling light absorption mechanism of quantum dots (QDs) [71]. At present, most nonlinear activation units first detect the optical signal and convert it into an electrical signal, and then convert it back into an optical signal. In optical reservoir computing [72], the most commonly used optical nonlinear unit is a graphene saturable absorber or is based on two-photon absorption [73]; other research on nonlinearity is based on bistable switches and ring resonators [74], [75]. However, these methods do not reach the desired efficiency and speed. The optical nonlinearity designed by M. Miscuglio et al. relies on the reversible induced transparency caused by Fano resonance in a plasmonic oscillator system and on the reverse saturable absorption response of a buckyball (C60) film [76]. It achieves fast and efficient all-optical nonlinearity, improving the throughput of the neural network and reducing latency and power consumption. In addition, the team from Hong Kong designed an AONN with a nonlinear activation function [37]. Based on electromagnetically induced transparency (EIT), an optical quantum interference effect between atomic transitions, the AONN can accurately identify the phases of the Ising model.

C. IMPLEMENTATION: FREE SPACE AND INTEGRATED WAVEGUIDE
To sum up, SLMs, MLAs, and HOEs can be used to realize optical linear interconnection in free space. There are two ways to realize the nonlinearity: (1) The optical field after linear interconnection is recorded by a CCD and sent to a personal computer, which applies a nonlinear threshold and feeds the result back to the SLM. The Gerchberg-Saxton algorithm is used for the feedback iteration, and the weights on the SLM are updated in real time [32], [77]. (2) All-optical nonlinear activation materials are developed. Optical nonlinearity can be divided into active and passive, the difference being whether energy is exchanged between the light and the medium. Dyes and graphene with saturable absorption can be developed on the basis of active nonlinear refraction and nonlinear absorption, and the optical bistable devices mentioned above can also be added.
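The feedback iteration in approach (1) can be illustrated with the textbook form of the Gerchberg-Saxton algorithm, which alternates between the SLM plane and the camera plane and enforces the known amplitude in each. The sketch below assumes that the two planes are related by a Fourier transform (modeled with an FFT) and that the SLM is phase-only. It is not the exact procedure of [32], [77], and the function names and the test pattern are hypothetical.

```python
import numpy as np

def gerchberg_saxton(source_amp, target_amp, n_iter=50, seed=0):
    """Return an SLM phase whose far-field amplitude approximates `target_amp`."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0.0, 2.0 * np.pi, source_amp.shape)  # random initial phase
    for _ in range(n_iter):
        slm_field = source_amp * np.exp(1j * phase)           # enforce source amplitude
        far_field = np.fft.fft2(slm_field)                    # propagate to camera plane
        far_field = target_amp * np.exp(1j * np.angle(far_field))  # enforce target amplitude
        phase = np.angle(np.fft.ifft2(far_field))             # propagate back, keep phase only
    return phase

def far_field_error(source_amp, target_amp, phase):
    """Norm of the difference between unit-normalized achieved and target amplitudes."""
    achieved = np.abs(np.fft.fft2(source_amp * np.exp(1j * phase)))
    achieved = achieved / np.linalg.norm(achieved)
    target = target_amp / np.linalg.norm(target_amp)
    return np.linalg.norm(achieved - target)

# Hypothetical example: uniform illumination, bright square as the desired pattern.
source = np.ones((16, 16))
target = np.zeros((16, 16))
target[6:10, 6:10] = 1.0

phase_start = gerchberg_saxton(source, target, n_iter=0)
phase_final = gerchberg_saxton(source, target, n_iter=50)
print(far_field_error(source, target, phase_start),
      far_field_error(source, target, phase_final))
```

In a hybrid optical-electronic setup the FFT would be replaced by the physical propagation and the CCD measurement, with the computer performing only the amplitude replacement and the update of the SLM pattern.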
The free-space interconnection technology of ONNs often occupies a large volume. Moreover, the optical energy loss is large, and distortion arises along the light path.
An ONN realized with integrated optical interconnect technology can readily incorporate excellent nonlinear materials such as black phosphorus and graphene, so that its volume can be greatly reduced and the network can have more neurons and greater depth. Most importantly, integrated photonic neural chips fabricated with nanoscale technology can be programmed in real time and can be used for training and recognition with more complex nonlinear models.

VI. SUMMARY AND OUTLOOK
In this review, we introduced the implementation of ONNs in free space and in integrated waveguides, and emphasized the significance of optical nonlinear activation. We point out that, compared with ONNs realized in free space, integrated photonic neural networks are smaller and programmable, and therefore have potential application value. Future research can thus focus on exploring nano-optical elements and nonlinear materials that can be integrated into optical waveguides, such as black phosphorus [78] and nanolasers [79], and on designing integrated photonic circuits with higher density [80]. Of course, exploring the implementation of ONNs in free space is a prerequisite for their integration (we can call this ''experience''). In addition, the growing maturity of nanotechnology allows metamaterials to be applied to ONNs [81], [82], and such networks can be applied to image recognition, speech processing, target tracking, and other fields. We believe that very-large-scale, programmable photonic neural networks can be realized in the future.

He is currently a Professor with the Nanjing University of Science and Technology. His current research interests include machine learning, image processing, and computer vision. He has presided over more than 50 scientific research tasks, including major national projects, models, key pre-research and fund projects, holds 43 authorized invention patents, and has published a monograph and nearly 200 academic articles, of which more than 100 have been indexed by SCI and EI. He has won one second prize of the National Science and Technology Progress Award, three first prizes of provincial and ministerial level science and technology awards, and 12 other science and technology awards.
In addition, he is a member of the Optical Society of America (OSA) and the International Society for Optical Engineering (SPIE), the Standing Director of the Chinese Institute of Electronics and the Chinese Optical Engineering Society, the Vice Chairman of the Optoelectronic Professional Committee of the Chinese Ordnance Society, the Vice Chairman of the Microscopic Instrument Branch of the Chinese Instrumentation Society, the Vice Chairman of the Jiangsu Optical Society, and the Chairman of the Optoelectronic Technology Professional Committee, Jiangsu Province.
GUOHUA GU received the B.S., M.S., and Ph.D. degrees in optical engineering from the Nanjing University of Science and Technology, Nanjing, China. He is currently a Professor with the Nanjing University of Science and Technology. From 2015 to 2019, he published 29 articles and 37 patents. His current research interests include machine learning, image processing, and computer vision.
VOLUME 8, 2020