Data Rate Enhancement in Optical Camera Communications Using an Artificial Neural Network Equaliser

Optical camera communication (OCC) systems leverage commercial off-the-shelf image sensors to perceive the spatial and temporal variation of light intensity and thereby enable data transmission. However, the transmission data rate is mainly limited by the exposure time and the frame rate of the camera. In addition, the camera's sampling introduces intersymbol interference (ISI), which degrades the system performance. In this paper, an artificial neural network (ANN)-based equaliser with an adaptive algorithm is employed for the first time in the field of OCC to mitigate ISI and therefore increase the data rate. Unlike in other communication systems, the ANN in OCC needs to be trained only once for a range of exposure times, and the trained network can be stored in a look-up table. The proposed system is theoretically investigated and experimentally evaluated. The results record the highest bit rate for OCC using a single LED source and a Manchester line code (MLC) non-return-to-zero (NRZ) encoded signal. They also demonstrate a 2- to 9-fold improvement in bandwidth, depending on the exposure time, with the system's bit error rate remaining below the forward error correction limit.


I. INTRODUCTION
There are two types of optical receivers commonly used in visible light communication (VLC) systems: (i) photodiodes (PDs); and (ii) image sensors (ISs) [1]-[3]. The former is low cost and has higher bandwidth (or large detection areas). The latter, which is composed of a large number of PDs (i.e., pixels) aligned orthogonally, offers both multi-input multi-output capabilities and large detection areas, but with complex data processing. Advances made in handheld smart devices have triggered the use of off-the-shelf conventional complementary metal-oxide-semiconductor (CMOS) ISs for cameras. These devices can be used as an inherently integrated receiver (Rx) module in optical camera communications (OCC), which can capture light signals from a range of sources (e.g., LED-based traffic lights, signage, headlights and vehicle tail lights) [3]. Moreover, in IS-based VLC systems the signal-to-noise ratio is independent of the transmission distance [4]: as long as the projected image of the transmitting LEDs covers a number of pixels, the incident light power level per pixel remains unchanged. However, the drawbacks of IS-based Rxs are the low transmission bandwidth, due to the limited camera frame rate $R_f$, and higher costs compared with PDs [5]. The maximum data rates of conventional commercial cameras are relatively low, within a range of a few kbps, which is sufficient for applications that do not demand high data rates, such as device-to-device communications, sensing, and health care [6]-[13].
The associate editor coordinating the review of this manuscript and approving it for publication was Yong Yang.
In [6], a data rate of 15 bps was demonstrated using under-sampled frequency shift on-off keying (UFSOOK) and a 30-frames-per-second (fps) camera. The data rate was increased to 150 bps using under-sampled phase shift on-off keying (UPSOOK), two LEDs, and a 50-fps camera [7]. In [8], a data rate of 150 bps was achieved using red-green-blue (RGB) LEDs and a 50-fps camera in an IS-based VLC link. Alternatively, mobile-phone-based VLC with a beacon-jointed packet transmission scheme and rolling shutter (RS) was deployed with a data rate of 10.3 kbps at an $R_f$ of 60 fps over a link distance of 20 cm [9].
A colour-shift keying (CSK) modulation scheme was adopted for RGB-LED-based VLC and OCC, offering data rates of 240 bps, 5.2 kbps and 8.64 kbps (i.e., 288 bit/frame with an $R_f$ of 30 fps) in [10], [11] and [13], respectively. In [12], both data rate and packet size enhancement were employed in IS-based communications with 4-level pulse amplitude modulation (PAM) to achieve a data rate of 3.6 kbps. In [14], [15], an OCC link using RS-based multilevel phase and amplitude modulation, with LED lights having spatial luminance distributions, was investigated, with data rates of 80 and 78.5 bps for 256- and 64-QPSK, respectively, at an $R_f$ of 30 fps. Equalisation techniques have also been adopted to compensate for spatial and temporal dispersion in OCC systems. In [16], a double-equalisation scheme was investigated for a single white LED. The first equaliser used a frame averaging based signal tracing (FAST) algorithm to extract the signal with nearly constant grayscale, while the second was an adaptive least mean square (LMS)-based feedforward equaliser (FFE); the data rate achieved was 14.37 kbps at an $R_f$ of 60 fps.
Additionally, different approaches can be adopted to improve the data rates in OCC with CMOS-based ISs, including (i) high-speed cameras, which are costly and have limited applications; (ii) multiple transmitters, which may suffer from flickering [17]; and (iii) a special IS with a built-in PIN PD array, which has been used to increase the data rate to 55 Mbps using optical orthogonal frequency division multiplexing [4]. However, the fabrication process of this IS is too complex and the device is not commercially available. In contrast, the RS technique offers high-speed data transfer and has been widely adopted in OCC [18]-[21]. A 9.6 kbps VLC link using multiple transmitters (Txs), i.e., an array of 8 × 8 LEDs, and an IS with a resolution of 320 × 240 and an $R_f$ of 30 fps was presented in [22]. However, providing high capacities still poses a number of challenges that need addressing [23], [24], including the sampling rate, frame resolution, image distortion factor and detection speed of the transmitted light source.
The sampling rate is tied to the exposure time $T_{exp}$, i.e., the time during which the IS is exposed to light. Increasing $T_{exp}$ allows more light to be integrated by the IS, but it also increases the noise level. $T_{exp}$ is the key parameter in determining the bandwidth: it acts as a temporal low-pass filter (LPF), averaging the incident light over a period of time. If the scene changes faster than the camera bandwidth imposed by the exposure time, the scene will be blurred.
ANNs, as universal classifiers, have been used in VLC systems; for example, an ANN is deployed for each channel in a multiple-input multiple-output (MIMO) VLC link in [25]. In [26], an ANN-based equaliser was deployed in a fully connected mode to reduce the effect of intersymbol interference in a non-line-of-sight (NLOS) or diffuse VLC link in an indoor environment. ANNs have also been deployed in OCC systems to compensate for data loss by reducing the gap time between observed frames [28].
Unlike previous studies, in this paper, for the first time, we propose and implement an ANN-based equaliser to mitigate the ISI induced by high sampling duration within the observed frame in OCC, and hence increase the transmission data rate.
The ANN is trained only once for a range of $T_{exp}$ values and can be stored in a look-up table. We have developed an experimental test-bed for the proposed system to evaluate its performance in terms of the data rate, bit error rate (BER) and eye diagrams. In addition, we investigate a number of training methods and show that the resilient back-propagation algorithm offers the best performance, with a trained mean square error value of $9.29 \times 10^{-5}$.
We have also achieved the highest data rate in OCC using a single white LED source, the MLC-NRZ encoded signal at the transmitter, an image sensor with an $R_f$ of 30 fps and an ANN-based equaliser at the receiver. The achievable bandwidth is also increased by approximately 9, 5, and 2 times for $T_{exp}$ of 2, 1 and 0.5 ms, respectively, compared with the existing reported systems [6].
The remainder of the paper is organised as follows: Section II introduces the CMOS IS model of the OCC system. In Section III the proposed ANN-based post-equaliser is outlined, while in Section IV the system model is presented. The results are presented and discussed in Section V. Finally, Section VI draws the final conclusion.

II. CMOS IS MODELLING IN OCC SYSTEM
In CMOS ISs with RS, an array of pixels captures the incident light in a progressive manner by exposing each row (or column) of pixels in turn, as illustrated in Figure 1(a). In RS-based cameras, incident light modulated at high frequencies and captured with a relatively low exposure time is observed as a set of illuminated bands, which indicate the "ON" or "OFF" status of the incident signal for OOK-NRZ during $T_{exp}$ (or the intensity level for other modulation schemes), while no signal is captured during the resetting time $T_{rst}$ and the readout time of a given row. The standard IS-based Rx is modelled as a linear shift-invariant (LSI) system composed of two stages, as depicted in Figure 1(b). The voltage at a pixel $(U, V)$, which corresponds to an individual photodiode, is given by [27]
$$ v_r(t) = \frac{A R}{C_{PD}} \int_{t - T_{exp}}^{t} x(\tau)\, d\tau \tag{1} $$
where $A$ is the gain (set to unity for simplicity), $C_{PD}$ and $R$ are the equivalent capacitance and responsivity of the PD, respectively, and $x(t)$ is the received optical signal at the pixel $(U, V)$ at time $t$. Note, $T_{exp} \geq T_{sym}$, where $T_{sym}$ is the symbol period [29]. The number of symbols observed at the Rx depends on the resolution of the IS, the exposure time, the pixel clock, and the size of the region of interest [5]. The system response is given by [30]
$$ h(t) = \frac{A R}{C_{PD}} \left[ u(t) - u(t - T_{exp}) \right] \tag{2} $$
where $u(t)$ is the unit step function. Integration of the input signal over $T_{exp}$ results in a finite impulse response (FIR) LPF effect with a transfer function given as [30]
$$ H(\omega) = \frac{A R}{C_{PD}}\, T_{exp}\, \frac{\sin(\omega T_{exp}/2)}{\omega T_{exp}/2}\, e^{-j \omega T_{exp}/2} \tag{3} $$
The DC gain is proportional to $T_{exp}$, so there is a trade-off between the gain and the required bandwidth: increasing $T_{exp}$ raises the gain but reduces the cut-off frequency. Following sampling, the discrete signal is given by [27]
$$ v_{rs}(k T_{cycle}) = v_r(t) \sum_{k} \delta(t - k T_{cycle}) + n(t) \tag{4} $$
where $\delta(t)$ is the Dirac delta function, $T_{cycle}$ is the sampling period and $n(t)$ is the noise (i.e., signal-induced shot noise, ambient-light-induced shot noise and thermal noise), which is modelled as zero-mean additive white Gaussian noise (AWGN).
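The low-pass effect of the exposure time can be illustrated numerically. The following is a minimal sketch, assuming an ideal rectangular integration window of length $T_{exp}$ with the gain normalised to unity; the function names are illustrative and not part of the original work. For such a window the 3-dB point falls at roughly $0.443/T_{exp}$, which agrees with the 443 Hz quoted later for $T_{exp}$ = 1 ms; the values reported for the other exposure times come from the authors' own measurements and simulations and need not match this idealised model.

```python
import numpy as np

def exposure_response(f_hz, t_exp):
    """Normalised magnitude response of integrating over t_exp seconds.

    np.sinc(x) is sin(pi*x)/(pi*x), so this equals |sin(pi f T)/(pi f T)|.
    """
    return np.abs(np.sinc(np.asarray(f_hz, dtype=float) * t_exp))

def cutoff_3db(t_exp, n_points=200_001):
    """First frequency where the response drops to 1/sqrt(2) of its DC value."""
    f = np.linspace(0.0, 1.0 / t_exp, n_points)  # DC up to the first null
    mag = exposure_response(f, t_exp)
    idx = np.argmax(mag <= 1.0 / np.sqrt(2.0))   # first index below the 3-dB level
    return f[idx]

for t_exp in (0.5e-3, 1e-3, 2e-3):
    print(f"T_exp = {t_exp * 1e3:.1f} ms -> ideal f_c = {cutoff_3db(t_exp):.0f} Hz")
```

Doubling the exposure time halves the ideal cut-off frequency, which is the gain-versus-bandwidth trade-off described above.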

III. ARTIFICIAL NEURAL NETWORK EQUALISER
In RS-based OCC systems, the bandwidth limitation imposed by the sampling process of the IS (i.e., the LPF) results in ISI at higher data rates, thus leading to a significant degradation in system performance. The slow rise time of the detected symbol is reflected in the transition between different illumination levels, see Figure 2. Generally, a matched filter is adopted to mitigate ISI; however, if this is not sufficient, equalisation can be used to enhance data rates by estimating and mitigating the ISI effect [31]. Equalisation can be viewed from the perspective of (i) information theory, where ISI is predicted by filter coefficients that are trained on a training sequence in order to minimise the error cost; and (ii) classification, where class decision boundaries are created in order to classify symbols based on training. The key difference between the two is that the latter allows generalisation because of the use of boundaries, so that unknown symbol transitions can be tolerated. Linear decision boundaries are not sufficient to provide an optimal decision in practical channels, where the threshold boundaries are nonlinear. Therefore, ANN-based equalisers, which realise nonlinear decision boundaries, offer improved performance in communication systems [25]. Classification and regression models are both categories of supervised machine learning. The main difference between them is that the output variable of regression is numerical (or continuous), whereas that of classification is categorical (or discrete) [32]. Note, the boundaries depend strongly on the number of neurons and hidden layers, which are analogous to the human brain in that the synaptic weights are changed based on the training sequence. Different ANN approaches can be deployed for equalisation, including the single-layer [33] and the multi-layer perceptron (MLP) [31].
In [26], it was shown that the MLP offers superior performance in mitigating ISI in optical wireless systems; hence, it has been adopted in this work.
The ANN-based equaliser includes an input layer, a variable number of hidden layers ($M - 1$) and an output layer. The input to the network is the vector of tapped-delay samples, known as the observation vector. The $N_{m-1}$ tapped-delay inputs are considered in order to study the impact of the $N$ previous samples on the desired sample. The $n$-th input is connected to the $k$-th neuron through a corresponding weight $w_{kn}^{(m)}$. Note, $N = T_{exp}/T_{sample}$, where $T_{sample}$ is the sampling duration, which depends mainly on the pixel clock. The neurons represent the functional units of the ANN, and each produces a transformed version of its summed weighted inputs. The sum is added to a constant offset value $C^{(m)}$, which is weighted by the threshold factor $v_k^{(m)}$, and the output $o_k^{(m)}$ of neuron $k$ is then estimated using a non-linear function $f(\cdot)$, given by [31]
$$ o_k^{(m)} = f\left( \sum_{n=1}^{N_{m-1}} w_{kn}^{(m)}\, o_n^{(m-1)} + v_k^{(m)} C^{(m)} \right) $$
where $o_n^{(0)}$ denotes the $n$-th element of the observation vector. With an $N_0 \times 1$ input vector and an $N_M \times 1$ output vector, the estimate of the next observation vector is obtained by applying this relation layer by layer. The MLP records its trained information in the weights $w_{kn}^{(m)}$ and in the threshold factors $v_k^{(m)}$, since $C^{(m)}$ is a constant for all layers (set as $C^{(m)} = 1$, $m = 1, 2, \ldots, M$). With a suitable number of neurons, a single-hidden-layer ANN is recognised as a universal approximator. Figure 3 illustrates the block diagram of a single-layer ANN-based equaliser. The non-linear hyperbolic tangent sigmoid activation function $\varphi$ is selected in this work since it provides a wider output range for faster learning, and is given by [34]
$$ \varphi(\xi) = \frac{e^{\beta \xi} - e^{-\beta \xi}}{e^{\beta \xi} + e^{-\beta \xi}} $$
where $\beta$ is the slope factor (the standard unity value is assigned) and $\xi$ is the weighted input. ANNs can be trained using supervised and unsupervised methods [35], [36]. Resilient back-propagation (RBP), an advanced version of back-propagation, has been adopted to train the ANN in this work. RBP, considered one of the best learning methods for ANNs [37], uses the sign of the error gradient to set the direction of the weight update, thereby overcoming the slow convergence of the standard back-propagation algorithm and reducing the amount of training required compared with other algorithms.
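The structure described above (a tapped-delay observation vector feeding a layer of hyperbolic-tangent neurons) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the layer sizes, the random initial weights, the zero-padding at the start of the sequence and the use of the same activation at the output are all assumptions made for the example.

```python
import numpy as np

def tapped_delay(samples, n_taps):
    """Build one observation vector per sample: the sample and its n_taps - 1
    predecessors, zero-padded at the start (padding is an assumption)."""
    samples = np.asarray(samples, dtype=float)
    padded = np.concatenate([np.zeros(n_taps - 1), samples])
    return np.stack([padded[i:i + n_taps] for i in range(len(samples))])

def mlp_forward(x, w_hidden, v_hidden, w_out, v_out, c=1.0, beta=1.0):
    """Single-hidden-layer MLP: o_k = tanh(beta * (sum_n w_kn x_n + v_k * c))."""
    hidden = np.tanh(beta * (x @ w_hidden.T + v_hidden * c))
    return np.tanh(beta * (hidden @ w_out.T + v_out * c))

rng = np.random.default_rng(0)
n_taps, n_hidden = 8, 4                      # illustrative sizes only
received = rng.normal(size=32)               # stand-in for received samples
X = tapped_delay(received, n_taps)           # shape (32, 8)
w_h = rng.normal(scale=0.1, size=(n_hidden, n_taps))
v_h = np.zeros(n_hidden)
w_o = rng.normal(scale=0.1, size=(1, n_hidden))
v_o = np.zeros(1)
y = mlp_forward(X, w_h, v_h, w_o, v_o)       # shape (32, 1), values in (-1, 1)
```

In an equaliser, the weights would be trained so that `y` approximates the transmitted symbol for each observation vector; the sketch only shows the forward mapping.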
The concept of RBP is similar to the regular back-propagation technique, where the ANN adjusts the weights in order to minimise the error cost function $E$, which is defined in [31] in terms of the error between $d_k$, the ideal symbol value, and $y_k$, the actual received value. RBP is an iterative operation, where the step size is dynamically adapted for each weight depending on the gradient of $E$. The updated weight is given by [38]
$$ w_{kn}(t+1) = w_{kn}(t) - \eta\, \frac{\partial E}{\partial w_{kn}}(t) + \mu\, \Delta w_{kn}(t-1) $$
where $w_{kn}$ is the weight between the input $x_n$ and the $k$-th neuron, $\eta$ is the learning rate parameter, $\mu$ is the momentum parameter and $\Delta w_{kn}(t-1)$ is the previous weight step. The number of input neurons is set equal to the length of the input patterns or vectors plus one, the additional neuron being the bias neuron [39]. In the proposed system, the optimal number of neurons is selected based on experimental measurements of the BER and the mean square error (MSE) values. Figure 4 shows the BER and MSE as a function of the number of neurons in the input and hidden layers. At the FEC BER limit, the numbers of neurons in the input and hidden layers are 250 and 100, respectively, which correspond to MSE values of $< 10^{-2}$ and $6 \times 10^{-1}$, respectively.
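The momentum parameter $\mu$ scales the impact of the previous step on the present one, thus introducing stability to the system and improving the convergence of the error function. Each weight has an individual, evolving update value $\Delta_{kn}(t)$, and the weight step $\Delta w_{kn}(t)$ is determined only by this update value, with its direction set by the sign of the gradient [38]:
$$ \Delta w_{kn}(t) = \begin{cases} -\Delta_{kn}(t), & \text{if } \dfrac{\partial E}{\partial w_{kn}}(t) > 0 \\ +\Delta_{kn}(t), & \text{if } \dfrac{\partial E}{\partial w_{kn}}(t) < 0 \\ 0, & \text{otherwise} \end{cases} $$
where $\Delta_{kn}(t)$ is the update value. In order to provide unity efficiency, the number of neurons is set to be similar to the number of tapped inputs [31].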
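The sign-based update can be sketched in a few lines. The example below is an illustration under stated assumptions, not the training code used in this work: it follows a common variant of resilient back-propagation in which the update value grows by a factor `eta_plus` while the gradient keeps its sign and shrinks by `eta_minus` when the sign flips, with the step skipped after a flip. The factors 1.2 and 0.5, the step bounds and the toy quadratic cost are conventional or hypothetical choices.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    """One resilient back-propagation step using only the gradient sign."""
    agreement = grad * prev_grad
    # Same sign as last time: enlarge the per-weight update value.
    delta = np.where(agreement > 0, np.minimum(delta * eta_plus, delta_max), delta)
    # Sign flipped (a minimum was overshot): shrink the update value.
    delta = np.where(agreement < 0, np.maximum(delta * eta_minus, delta_min), delta)
    # After a flip, skip this weight's step by treating its gradient as zero.
    grad = np.where(agreement < 0, 0.0, grad)
    return w - np.sign(grad) * delta, grad, delta

def minimise_quadratic(target, n_iter=200, delta0=0.1):
    """Toy problem: minimise E(w) = sum((w - target)**2) with rprop_step."""
    target = np.asarray(target, dtype=float)
    w = np.zeros_like(target)
    prev_grad = np.zeros_like(target)
    delta = np.full_like(target, delta0)
    for _ in range(n_iter):
        grad = 2.0 * (w - target)          # exact gradient of the toy cost
        w, prev_grad, delta = rprop_step(w, grad, prev_grad, delta)
    return w

print(minimise_quadratic([1.0, -2.0, 0.5]))
```

Because only the sign of each partial derivative is used, the step size is unaffected by very small or very large gradient magnitudes, which is the property credited above for the faster convergence.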

IV. SYSTEM MODEL
The system block diagram of the proposed OCC system is illustrated in Figure 5(a). At the transmitter, a pseudorandom binary sequence (PRBS) $d$ with a length of $2^{14} - 1$ is generated in MATLAB. It is then up-sampled, encoded using the non-return-to-zero (NRZ)-Manchester line code (MLC) in order to avoid flickering, and passed through a unity-amplitude rectangular pulse-shaping filter. The MLC signal, which ensures a uniform distribution of 1 and 0 symbols and facilitates both the decoding and synchronisation of the signal [29], is packetised in order to ensure proper detection at the Rx, as illustrated in Figure 5(b). Each packet consists of a 5-bit header [11100], an $N_{bit}$-bit payload and a 5-bit footer [00111], and the packets are used for intensity modulation of the LED via the optical driver.
The patterns of the header and footer are designed such that (i) they never occur in the MLC pattern of the payload; and (ii) the transition from one packet to another is smooth, which facilitates the training of the network. The transmitted optical signal is captured over a free-space channel using a CMOS IS Rx (i.e., a camera). The link distance is set to 60 cm, which can be extended by increasing the light intensity, keeping a clear region of interest and using lenses at the transmitter and the receiver. For example, increasing the optical transmit power from 1 to 4 mW will increase the transmission range by 100% [1].
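The packet format and the first header/footer design rule can be checked with a short sketch. It assumes a chip mapping of 1 to (1, 0) and 0 to (0, 1) for the Manchester code, and that the header and footer are inserted at chip level; neither detail is specified above, and the function names are illustrative. Since every Manchester-coded bit contains a transition, the encoded payload never holds three equal chips in a row, so the run 111 present in both [11100] and [00111] cannot appear inside it.

```python
import random

HEADER = [1, 1, 1, 0, 0]
FOOTER = [0, 0, 1, 1, 1]

def manchester_encode(bits):
    """Map each bit to two chips: 1 -> (1, 0), 0 -> (0, 1) (assumed convention)."""
    chips = []
    for bit in bits:
        chips.extend((1, 0) if bit else (0, 1))
    return chips

def build_packet(payload_bits):
    """Header, Manchester-coded payload, footer."""
    return HEADER + manchester_encode(payload_bits) + FOOTER

def contains(sequence, pattern):
    """True if pattern occurs as a contiguous run inside sequence."""
    n = len(pattern)
    return any(sequence[i:i + n] == pattern for i in range(len(sequence) - n + 1))

random.seed(1)
payload = [random.randint(0, 1) for _ in range(64)]   # hypothetical payload
encoded = manchester_encode(payload)
packet = build_packet(payload)
print(len(packet), contains(encoded, HEADER), contains(encoded, FOOTER))
```

The same check holds for any payload, which is why the receiver can search for the header and footer without confusing them with data.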
A diffuser is utilised to distribute the captured light over the IS (i.e., the LED footprint is projected onto the IS).
The RS received data is formed by accumulating the intensities of all pixels in each row, since they are exposed to the incident light at the same time. This increases the SNR of the signal by a maximum of $V$ times, where $V$ is the number of pixels in each row. The received signal strength forms the amplitude of the signal, as explained in Section II. The observed frames at the IS are first processed in MATLAB, where conversion to grayscale is applied in order to eliminate the hue and saturation information while retaining the luminance of the image plane. Due to the non-uniformity of the illumination, the DC gain of the optical signal estimated from (3) is then measured to generate a calibration matrix. In order to mitigate the impact of noise, the calibration matrix is constructed by averaging over 20 frames of plain illumination, i.e., with no AC signal. Note, none of the pixels in the region of interest of the calibration matrix should be over- or under-exposed, see Figures 6(a) and 6(b). The signal extraction algorithm is shown in Algorithm 1.
The frequency responses of the IS are first obtained for the conventional $T_{exp}$ values of 0.5, 1 and 2 ms, which are also estimated using (3); the frequencies of the main lobe and the subsequent side-lobes are given in the following section. The main lobe indicates the available bandwidth in terms of the 3-dB region (i.e., the cut-off frequency), whereas the subsequent harmonics describe the rest of the magnitude spectrum. The harmonic frequency $f_{sh}$, where $h$ is the harmonic number, is used to specify the required bandwidth of the transmitted signal to ensure minimum attenuation (i.e., at the peak of the side-lobes). Subsequently, a set of different signals is generated in which the $N_{bit}$-bit payload is varied with respect to $f_{sh}$ in a way that ensures that three packets are captured in every frame. $T_{exp}$ is set to 2 ms and the selected values of $N_{bit}$ are shown in Table 1. In a typical scenario, the bandwidth of the transmitted signal should be less than or equal to the cut-off frequency $f_c$, as highlighted in Section II, whereas $f_{sh}$ includes successive harmonics. The processed data is then applied to the proposed equaliser for different $f_{sh}$, and the quality of the transmitted signal is investigated in terms of the BER and the data rate.

Algorithm 1 Signal Extraction Algorithm
Input: RS captured image $P_{U \times V \times 3}$ and 20 frames of illumination gain (DC signal only) $G_{U \times V \times 3}$
Output: The received signal, ordered the same as the transmitted signal
1 foreach set of captured data frames $P_{U \times V \times 3}$ do
2   Read the $U \times V \times 3$ sized colour image.
3   Convert the image to grayscale $GS$.
4   Accumulate the intensities of all pixels in each row to obtain $z$.
5   Calibrate $z$ with respect to the averaged DC value: $z_{cal} = z / z_{DC}$, where $z_{DC} = \sum_{j=1}^{V} GS_{i \times j}$ is the averaged DC value.
6   Resample $z_{cal}$ with respect to the packet length.
7   Locate the start of each packet in the frame.
8   Received packets are placed randomly within the captured frame due to the lack of synchronisation between the Tx and Rx; hence, a correlation algorithm is adopted as follows:
      Create a filtered version of the ideal signal using a moving average filter, where $M$ is the window size (set to $n_{samp}$).
      Correlate the received signal with the filtered signal to find the correct sequence and reorder accordingly.
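The main steps of Algorithm 1 can be sketched as follows. This is a simplified illustration and not the authors' MATLAB code: the luma weights used for the grayscale conversion, the zero-mean correlation used to locate a packet, and the function names are assumptions, and the resampling step is omitted.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # assumed RGB-to-grayscale weights

def extract_row_signal(frame_rgb, gain_frames_rgb):
    """Row-wise signal calibrated by the averaged DC-only frames.

    frame_rgb:        array of shape (U, V, 3), one captured frame
    gain_frames_rgb:  array of shape (n_frames, U, V, 3), DC-only frames
    """
    gray = frame_rgb.astype(float) @ LUMA                       # (U, V)
    gain = gain_frames_rgb.astype(float).mean(axis=0) @ LUMA    # averaged DC, (U, V)
    z = gray.sum(axis=1)       # accumulate all pixels of each row
    z_dc = gain.sum(axis=1)    # same accumulation for the DC reference
    return z / z_dc            # calibrated signal, one value per row

def locate_packet(z_cal, template):
    """Offset (in rows) at which the template correlates best with z_cal."""
    z0 = z_cal - z_cal.mean()
    t0 = np.asarray(template, dtype=float) - np.mean(template)
    corr = np.correlate(z0, t0, mode="valid")
    return int(np.argmax(corr))
```

As a usage example, dividing a synthetic frame built as `gain * pattern` by the same `gain` recovers `pattern` row by row, and `locate_packet` then returns the row at which a known template starts.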
A moving average filter (i.e., comparing the energy of the current chip with that of the previous one) is employed to reduce the noise while retaining a sharp response, by averaging the input signal $y$ in order to produce each point of the discrete output signal, as given by [40]
$$ \bar{y}[i] = \frac{1}{L} \sum_{j=0}^{L-1} y[i + j] $$
where $L$ is the number of points. The data is then down-sampled prior to thresholding to recover the estimated version of the transmitted signal. Finally, we have adopted different training algorithms in order to achieve the optimal results in terms of the BER and the convergence speed. For this, we have used 80% of the data in the first frame for training, which corresponds to $N_{bit}$, and the number of bits used for training and testing is set based on $N_{bit}$ per frame. For instance, for $N_{bit}$ of 1170 bits, 80% (i.e., 936 bits) of the data is used to train the network, while the rest (i.e., 20%, 234 bits) is used for consecutive testing purposes. Training is repeated every $10 N_{bit}$ bits, since a set of 10 frames is considered for this purpose.
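The smoothing, down-sampling and thresholding chain, together with the 80/20 split, can be sketched as below. It is a minimal example under assumptions: the window length is taken equal to the number of samples per bit, the threshold is fixed at 0.5 for a signal normalised to [0, 1], and the helper names are hypothetical.

```python
import numpy as np

def moving_average(y, L):
    """L-point moving average: out[i] = mean(y[i : i + L])."""
    return np.convolve(np.asarray(y, dtype=float), np.ones(L) / L, mode="valid")

def recover_bits(y, n_samp, threshold=0.5):
    """Smooth, keep one value per bit, then threshold (assumed fixed level)."""
    smoothed = moving_average(y, n_samp)
    decisions = smoothed[::n_samp]           # down-sample to one point per bit
    return (decisions > threshold).astype(int)

def split_train_test(n_bit, train_fraction=0.8):
    """Number of bits used for training and for testing."""
    n_train = int(round(n_bit * train_fraction))
    return n_train, n_bit - n_train

bits = np.array([1, 0, 1, 1, 0])
waveform = np.repeat(bits, 4)                # 4 samples per bit, noise-free
print(recover_bits(waveform, 4))             # recovers the original bits
print(split_train_test(1170))                # (936, 234), as in the text
```

With noise present, the averaging window trades noise suppression against the sharpness of the transitions, which is why the window length is tied to the samples per bit in this sketch.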

V. RESULTS AND DISCUSSION
The system parameters used are shown in Table 1. Figure 7 shows the measured and simulated IS frequency response for a range of $T_{exp}$. The measured (and simulated) 3-dB $f_c$ values are 811 (1020), 443 (443) and 250 (250) Hz for $T_{exp}$ of 0.5, 1 and 2 ms, respectively. We have used the odd frequencies $f_{sh}$ (i.e., 716, 1231, 1736 and 2240 Hz, see Figure 7) for generating the MLC data formats. Note, we have used an average of 636 bps per frame, thus a total of 12 kbps for 30 frames, in order to meet the forward error correction (FEC) BER limit of $3.8 \times 10^{-3}$.
Next, the effective data rate of the system is evaluated following the methodology adopted in the previous section. The higher sampling rate is associated with the higher data rates used in this work. The BER as a function of the data rate is initially measured prior to the introduction of the proposed ANN equaliser, as depicted in Figure 8. The proposed system demonstrates BER values below the FEC limit of $3.8 \times 10^{-3}$ up to a data rate of 12 kbps for all three values of $T_{exp}$ and a transmitted signal bandwidth of 2.275 kHz, which is around 9 times (for $T_{exp}$ of 2 ms) higher than the $f_c$ of the unequalised system. The eye diagram at $T_{exp}$ of 2 ms without an ANN equaliser is illustrated in Figure 9, where the received signal shows three amplitude levels despite an OOK signal being sent. This is due to the ISI, which creates an additional level because of the transition delay in the status of the captured signal, as explained in the previous section (see Figure 2). However, with an ANN equaliser, the eye diagram depicted in Figure 9(b) shows significantly reduced ISI. Finally, we have evaluated the behaviour of the proposed system over time to determine the convergence time of different training algorithms for the hyperbolic tangent sigmoid activation function, as illustrated in Figure 10. The system performance is measured based on the MSE values estimated between the equaliser outputs and the desired outputs. The ideal case is considered, where the channel is assumed to be noise-free, and hence the error in the equaliser outputs is solely due to the channel dispersion. As shown, the resilient back-propagation algorithm displays faster convergence than the others and offers the best performance (the estimated trained mean square error value is recorded as $9.29 \times 10^{-5}$). The superiority of this algorithm stems from discarding the magnitudes of the partial derivatives: only the sign of the derivative is used to determine the direction of the weight update.
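The quoted bandwidth improvement factors can be reproduced with simple arithmetic. The sketch below divides the transmitted signal bandwidth of 2.275 kHz by the cut-off frequencies reported in this section; using the simulated value of 1020 Hz for $T_{exp}$ of 0.5 ms (rather than the measured 811 Hz) is an assumption made here, since the text does not state which one underlies the factor of about 2.

```python
signal_bw_hz = 2275.0

# T_exp in ms -> 3-dB cut-off in Hz, taken from the values reported above
# (the simulated figure is assumed for 0.5 ms).
cutoff_hz = {2.0: 250.0, 1.0: 443.0, 0.5: 1020.0}

ratios = {t_exp: signal_bw_hz / f_c for t_exp, f_c in cutoff_hz.items()}
for t_exp, ratio in ratios.items():
    print(f"T_exp = {t_exp} ms: bandwidth ratio = {ratio:.1f}")
```

With the measured 811 Hz instead, the ratio for 0.5 ms would be closer to 2.8, so the factor of 2 should be read as approximate.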

VI. CONCLUSION
In this paper, we demonstrated, for the first time, an ANN-based adaptive equaliser at the receiver of an OCC system. The study provided the performance indicators for the proposed system. The data rates achieved were the highest in the OCC field, recorded as 12 kbps at exposure times of 2, 1, and 0.5 ms using a single source and the MLC-NRZ encoded signal. The proposed system demonstrated the capability of retrieving the transmitted information with a bandwidth beyond the cut-off frequency limitation. Hence, it provided a bandwidth improvement of around 9, 5, and 2 times at exposure times of 2, 1, and 0.5 ms, respectively.
LUIS NERO ALVES (Member, IEEE) received the degree and the M.Sc. degree in electronics and telecommunication engineering from the University of Aveiro, in 1996 and 2000, respectively, and the Ph.D. degree in electrical engineering from the University of Aveiro, in 2008. His expertise is in integrated circuit design for optical wireless applications. Since 2008, he has been the Lead Researcher of the Integrated Circuits Group, Instituto de Telecomunicações, Aveiro. He has authored and coauthored several research articles in these fields. He has participated in or coordinated several research projects in the fields of optical wireless communications and sensing devices, with both national funding (FCT/VIDAS, FCT/EECCO, IT/VLCLighting) and international funding (EU-CIP/LITES, EU-FP7/RTMGear, EU-COST/OPTICWISE, EU-COST/MEMOCIS, EU-H2020/MSCA/VisIoN, and EU-H2020/FET/NeuralStimSpinal). His current research interests are associated with the design and performance analysis of optical wireless communication systems, with a special focus on visible light communications. Other research topics include materials and devices for IoT sensing devices. He has also served on the technical program committees of several international conferences in the field of optical wireless communications. He has also served as a Reviewer for several journals in optical communications, such as IEEE/PTL, IEEE-OSA/JLT, and Elsevier/OC.
2010. He has been an Associate Professor (Reader) in communications engineering and the Director of Learning and Teaching with the Mathematics, Physics, and Electrical Engineering Department, Northumbria University, U.K., since 2010. He is an Expert in photonics, optical communications, visible light communications, smartphone technology, signal processing, and intelligent networks. He has published more than 200 articles, in journals and conferences, and book chapters. He is currently the Chair of IEEE ComSoc U.K. and Ireland. He is the Editor for a number of journals.
VOLUME 8, 2020