300-meter Long-Range Optical Camera Communication on RGB-LED-equipped Drone and Object-Detecting Camera

Large-scale disasters frequently occur all over the world, disconnecting telecommunications and destroying communication equipment. In recent years, unmanned aerial vehicle (UAV) network systems have been studied to support reconstruction activities safely and flexibly. Because UAV networks are used for emergency communication, the more means of telecommunication available, the better. Therefore, this paper studies optical camera communication (OCC) systems using RGB-LED-mounted drones and a high-speed camera for disaster recovery, and proposes a detection scheme for the RGB-LED-mounted drone and a signal equalization technique to suppress the RGB interference. We detect the drone using YOLOv3, a deep-neural-network (DNN)-based object detection algorithm. This paper adds a new function to reduce the frame rate in object detection. Consequently, the proposed scheme reduces the frame rate from 600 fps to less than 20 fps, at which real-time operation is possible. Moreover, the experimental results indicate the feasibility of the proposed scheme, which communicates error-free at a 300-m distance.


I. INTRODUCTION
Catastrophic disasters frequently occur globally [1], [2], and it is essential to monitor and communicate with disaster-stricken areas for rapid rescue and recovery. However, it is challenging to maintain existing communications due to equipment damage or power supply interruptions caused by disasters. We currently use mobile base stations (BSs) to restore telecommunications, but it is difficult to deploy them for maritime accidents or for disasters severe enough that vehicles cannot enter; therefore, more multifaceted recovery systems are required. Drone-empowered networks have been attracting attention. In [3], long-term evolution (LTE) femtocell BSs on drones to replace collapsed BSs have been investigated. From an availability and robustness viewpoint, it is desirable to have multiple means of telecommunication. Furthermore, the tradeoff between the drone's weight and its battery makes it challenging to install additional batteries or communication devices, so low power consumption is required. Therefore, this study investigates visible light communication (VLC) using light-emitting diode (LED)-equipped drones because of its ease of introduction and high power efficiency.
VLC is divided into two systems: light fidelity (Li-Fi) [4] and optical camera communication (OCC) [5]. Li-Fi is a high-speed, low-energy-consumption method using existing LED lighting equipment and dedicated photodiode (PD) receivers. A 10-Gbps transmission rate has been reported [6]. Furthermore, the system is easily installed by attaching PD receivers to existing LED lighting equipment. However, because it uses a single PD, the receiver cannot separate the light signals from multiple transmitters and must align the optical axis. In particular, a transmitter on a drone sways slightly, which makes axis alignment more challenging, and in long-range transmission even a slight axis deviation degrades the performance considerably. OCC, by contrast, uses a commercially available digital camera as a receiver. A typical digital camera comprises an optical lens and an image sensor equipped with color filters. Unlike a PD, the lens and color filter separate the transmitted light signals, and three-channel wavelength division multiplexing (WDM) can be realized. A disadvantage of OCC is that the camera's frame rate limits the transmission capacity. To improve the capacity, a method based on the camera's shutter mechanism [7] (called the rolling shutter (RS) method) and a method employing a high-speed camera [8] have been investigated. The RS method in [7] is unsuitable for long-range VLC because it requires many image-sensor pixels to demodulate a signal. Meanwhile, the cost of high-speed cameras has decreased in recent years with the spread of smartphones.
Based on these features, this paper proposes a novel OCC scheme with a red, green, and blue (RGB) LED-equipped drone and a high-speed camera. The drone monitors and lights up the disaster area and transmits the signals (Fig. 1). The proposed method improves the transmission rate using WDM with RGB-LEDs and a high-speed camera to increase the sampling rate.
The contributions of the proposed scheme are as follows. 1) The transmitter introduces 8B/10B encoding for flicker-suppressed communication. 2) An algorithm superimposes the multiple images acquired by the high-speed camera and converts them to a lower frame rate, so the receiver can execute an object detection algorithm operating at a low frame rate of 20 fps. 3) A moving average filter (MAF) is introduced to remove amplitude fluctuations. 4) A constant modulus algorithm (CMA) is employed to equalize the RGB crosstalk. By combining these techniques, we can ensure stable communication even over long distances. A part of the proposed scheme was reported at the IEEE Vehicular Technology Conference (VTC) in spring 2021 [9]. Here, we extend the transmission capacity, employ RGB multiplexing, and confirm the feasibility of the proposed method through proof-of-concept experiments.
The remainder of this paper is organized as follows. Section II introduces related work, and Section III describes the proposed method. The experimental results are presented in Section IV. Finally, we conclude in Section V.

II. RELATED WORK
The camera receiver limits the OCC system's capacity; therefore, two methods are typically used: a high-speed camera [8], [10] and the RS scheme [7]. In [10], the feasibility of 10-Mbps real-time communication has been experimentally reported using a 1,000-fps high-speed camera with a dedicated image sensor. The RS is a capture method in which the complementary metal oxide semiconductor (CMOS) image sensor reads out the pixels row by row in order. The sampling rate can increase because the capture start time differs for each row. The RS scheme has been standardized and is valuable for short-distance communication, such as indoors [11]. In [12], a data rate of over 100 kbps has been reported using the RS method and a spatial multiplexing scheme. However, employing the RS scheme for OCC over long distances is challenging because it requires a massive number of pixels in a frame. In long-distance communication for disaster recovery, the light sources are captured on only a few pixels; therefore, we pick the high-speed camera method to improve the data rate. In recent years, artificial neural networks have also been proposed to suppress intersymbol interference (ISI) [13]–[16] and to separate colors [17].
The WDM method is also vital for improving the data rate. With the RGB lights, we secure a three-channel transmission lane. However, crosstalk occurs due to the LED light's broad linewidth and the image sensor's low wavelength-filtering performance, so an RGB separation scheme is needed. In [18], the channel matrix is obtained in advance and statically equalized to suppress the crosstalk effect using a multiple-input multiple-output (MIMO) signal processing technique. By contrast, when the receiver cannot obtain the transmitter's information in advance, it must estimate the channel matrix itself. We assume that the OCC system is used outdoors. The transmitter's and receiver's owners might differ, and camera specifications also differ between smartphones. Moreover, the channel has time-varying characteristics due to weather effects. Thus, inter-channel crosstalk compensation by blind adaptive equalization is a promising solution.
While OCC systems tend to target indoor applications [19]–[21], several experimental results of LED-camera communication between drones and ground BSs have been reported [22]–[24]. In these demonstrations, the distance between the drone and the camera is short. When considering dispatch to disaster areas, it is necessary to establish communication over several hundred meters. Equalization methods have not been studied intensively, and although long-range OCC has been reported, it is limited to communication between fixed BSs [25].
This paper proposes an OCC system with an RGB-LED-equipped drone and a high-speed camera, an extension of previously reported work [9], [26]. A high-speed camera is used to increase the sampling rate, and a convolutional neural network (CNN) is used to recognize the LED's location. We investigated the proposed system's feasibility by conducting field experiments at 300 m between the LED and the camera.

III. PROPOSED METHOD
This section describes the proposed method, divided into 13 blocks. Fig. 2 shows the scheme's operational sequence. On the transmitter side, the bitstream is mapped to three channels in parallel at block 1. At block 2, the three-channel bitstream, whose raw spectrum extends down to near the zero-frequency component, is line-encoded. Low-frequency flicker affects the human eye, and a frequency above 100 Hz is desired as much as possible. Thus, we employ an 8B10B encoding scheme to suppress the lower frequency components [27]. The maximum run of identical bits is five (00000 or 11111), and the signal's minimum frequency is expressed as

f_min = f_max / 5,   (1)

where f_max is the signal's maximum frequency. For example, Fig. 3 shows the spectrum before and after 8B10B encoding.
For f_max = 200 Hz, the 8B10B-encoded signal contains only frequency components above 40 Hz. Employing another encoding scheme with shorter codewords, such as 4B5B, is useful for suppressing the lower frequency components; however, there is a tradeoff with frequency usage efficiency. At block 3, the bitstream is on-off-keying (OOK) modulated, and the RGB-LEDs flash according to the bit patterns. Multiple drones must not overlap as seen from the camera, because signal interference would occur. In addition, the drones must avoid obstacles to ensure line-of-sight; therefore, additional techniques are needed, and we have already proposed one [28]. On the receiver side, first of all, a GPS roughly provides the drone's position. The receiver points the camera at that location and adjusts the zoom rate. Then, the camera uses deep-learning-based object detection for fine-tuning. The camera films the RGB-LED-mounted drones at block 4. The camera's color filter divides the input light into RGB wavelengths. The color filter typically has a Bayer arrangement; however, its wavelength resolution is low and the RGB-LED's linewidth is broad, causing crosstalk between the RGB channels. Thus, we employ adaptive MIMO equalization to suppress the crosstalk; the details are given in block 10.
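The relation between the maximum run length and the lowest remaining frequency component can be checked numerically. This is a minimal sketch; the helper name is ours, not from the paper:

```python
def min_signal_frequency(f_max: float, max_run: int) -> float:
    """Lowest frequency component of a run-length-limited OOK stream.

    A run of `max_run` identical bits stretches one half-period of the
    slowest on/off pattern, so the fundamental drops by that factor.
    """
    return f_max / max_run

# 8B10B bounds identical-bit runs to 5 (00000 or 11111), so a stream
# whose highest frequency is 200 Hz keeps all energy above 40 Hz.
print(min_signal_frequency(200.0, 5))  # -> 40.0
```

A shorter code such as 4B5B would bound the run length more tightly, raising f_min at the cost of coding overhead, which is the tradeoff noted above.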
For the digital signal processing, we first describe the region-of-interest (ROI) detection corresponding to block 6. The demodulator must search for and detect each LED's position in the filmed image because the LED transmitters are mounted on drones and their positions fluctuate over time. We employ YOLOv3 [29] as the ROI detection algorithm. YOLOv3 is a popular algorithm with high-speed and precise performance, and real-time operation at a frame rate of 20 fps has been reported.
After filming the video, we conduct two digital signal processing (DSP) steps. We obtain the drone's most precise positions using all the video frames. However, real-time ROI detection is difficult with a high-speed camera of more than 30 fps. Therefore, we reduce the frame rate when conducting ROI detection at block 5. Simple downsampling can fail to detect the LED because the LED flashes and may be dark at the sampled instants. Here, we employ an image synthesis method for LED detection. Let P be the set of pixels in an image, and let p(i, j, t) ∈ P be a pixel of the t-th frame, where i and j are the image's vertical and horizontal pixel indices, respectively. With downsampling rate R_d, the synthesized pixel p′(i, j, t) is expressed as

p′(i, j, t) = max_{τ ∈ {tR_d, tR_d + 1, …, (t + 1)R_d − 1}} p(i, j, τ).   (2)

We extract each pixel's maximum value and obtain images in which the LEDs are turned on, provided the signal's symbol rate is higher than the ROI detection frequency. This algorithm assumes that the drone does not move between the synthesized frames. For example, if the frame rate is 1,000 fps and the downsampling rate is 100, the algorithm synthesizes the images using Eq. (2), and the adjusted frame rate is 10 fps. The time between frames is 100 ms, during which we assume the drone's movement is negligible.
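The image-synthesis step above can be sketched as a per-pixel maximum over non-overlapping frame windows. This is a minimal NumPy sketch under our own naming; the toy video is illustrative only:

```python
import numpy as np

def synthesize_frames(frames: np.ndarray, r_d: int) -> np.ndarray:
    """Reduce frame rate by taking the per-pixel maximum over windows.

    frames : (T, H, W) array of grayscale frames from the high-speed camera.
    r_d    : downsampling rate (e.g. 600 fps -> 10 fps gives r_d = 60).

    Because each window spans many OOK symbols, every LED is "on" at least
    once per window, so the synthesized frame shows the LEDs lit.  Assumes
    the drone barely moves within one window.
    """
    t, h, w = frames.shape
    t = (t // r_d) * r_d                      # drop the incomplete tail window
    windows = frames[:t].reshape(-1, r_d, h, w)
    return windows.max(axis=1)                # shape (T // r_d, H, W)

# Toy example: a 2x2 "video" at 6 fps reduced by r_d = 3.
video = np.zeros((6, 2, 2), dtype=np.uint8)
video[1, 0, 0] = 255   # the LED blinks on in frame 1 only
low_rate = synthesize_frames(video, 3)
print(low_rate.shape)       # (2, 2, 2)
print(low_rate[0, 0, 0])    # 255 -- the LED stays visible after synthesis
```

Plain stride-based downsampling (`video[::3]`) would have missed the lit frame entirely, which is exactly the failure mode the synthesis avoids.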
Block 6 detects the LED bounding boxes (BBs). YOLOv3 outputs the BB and its label simultaneously, together with the label's posterior probability. YOLOv3 outputs many BBs when low posterior probabilities are included; thus, we set a BB detection threshold. The proposed system can set a low threshold because false-positive results can be removed using the signal's pilot sequence. The time waveform is generated from each BB's brightness. Block 7 expands the BB to increase the signal power. Let h and w be the vertical and horizontal pixel lengths of the initial BB, respectively; then the n-fold expanded BB size is nh × nw. At block 8, we sum the expanded BB's brightness and obtain the time waveform B_1(k, t, b). Let p′(i, j, k, t, b) ∈ P be the pixel set in the expanded BB, where k ∈ {r, g, b} is the RGB indicator and b ∈ {1, 2, · · · } is the BB indicator. The time waveform B_1(k, t, b) is

B_1(k, t, b) = Σ_{(i, j) ∈ BB} p′(i, j, k, t, b).   (3)

At block 9, we employ the MAF and remove the drone's long-term fluctuation. Omitting the BB indicator b for notational simplicity, the waveform B_2(k, t) after the MAF is expressed as

B_2(k, t) = B_1(k, t) − (1/l) Σ_{τ = t−(l−1)/2}^{t+(l−1)/2} B_1(k, τ),   (4)

where the tap length l is an odd number. Next, we suppress the RGB crosstalk at block 10. The channel is modeled with the channel matrix H as

p(t) = H q(t),   (5)

where q(t) = [q(r, t), q(g, t), q(b, t)]^T collects the LEDs' brightness and p(t) = [p(r, t), p(g, t), p(b, t)]^T collects each pixel's brightness. For simplicity, we omit the pixel position and the BB indicator. The LED's emission spectrum and the camera's photosensitivity determine the RGB crosstalk. The proposed system must estimate the channel matrix H because the transmitter and receiver cannot share this information in advance. The proposed system communicates using the OOK binary signal; therefore, we employ a CMA on an adaptive filter for the channel estimation. Fig. 4 shows the configuration of the MIMO equalizer. The tap matrix is the 3 × 3 matrix

W = (w_{kk′}), k, k′ ∈ {r, g, b}.   (6)

The equalizer updates the tap coefficients based on the CMA. When W ≃ H^{−1}, i.e., WH ≃ I, the RGB crosstalk is suppressed.
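The detrending of block 9 and the blind 3×3 equalization of block 10 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the update rule is the textbook stochastic-gradient CMA, the tap matrix is initialized to the identity, and the diagonal-dominant mixing matrix is an assumed toy channel:

```python
import numpy as np

def moving_average_detrend(x: np.ndarray, l: int) -> np.ndarray:
    """Subtract an odd-length moving average to remove slow fluctuation."""
    trend = np.convolve(x, np.ones(l) / l, mode="same")
    return x - trend

def cma_equalize(x: np.ndarray, mu: float = 0.003, radius: float = 1.0):
    """Blind 3x3 MIMO equalizer driven by the constant modulus algorithm.

    x : (3, T) detrended RGB waveforms, one row per color channel.
    Each output y = W x is pushed toward |y| = sqrt(radius); with a
    diagonal-dominant channel and W initialized to the identity, each
    output locks onto its own color, so W approaches H^-1.
    """
    n_ch, t_len = x.shape
    w = np.eye(n_ch)
    y = np.zeros_like(x)
    for t in range(t_len):
        y[:, t] = w @ x[:, t]
        for k in range(n_ch):
            err = y[k, t] * (y[k, t] ** 2 - radius)   # CMA error term
            w[k] -= mu * err * x[:, t]                 # stochastic-gradient step
    return y, w

# Toy run: +/-1 OOK symbols mixed by an assumed crosstalk matrix.
rng = np.random.default_rng(0)
q = rng.choice([-1.0, 1.0], size=(3, 5000))
h = np.array([[1.0, 0.2, 0.1],
              [0.2, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
y, w = cma_equalize(h @ q)
ber = np.mean(np.sign(y[:, 1000:]) != q[:, 1000:])  # skip convergence period
print(ber)
```

After the MAF centers the OOK waveform around zero, the two symbol levels have (approximately) constant modulus, which is what makes the CMA applicable to this binary signal.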
The equalizer may cause noise enhancement; however, the OCC system can ignore this issue because the SNR is sufficiently high. After the equalization, the bit decision is performed and we obtain the binary signal. It is difficult to synchronize the camera's and transmitter's clock rates, so a synchronization error occurs. We applied a simple symbol synchronization (Algorithm 1). Algorithm 1 cannot adjust the clock error under a high bit error rate (BER). Meanwhile, the OCC system extracts the ROI, and ROI detection requires a high signal-to-noise power ratio (SNR); thus, Algorithm 1 can be applied in the common OCC case. After symbol synchronization, we downsample and obtain the bitstream.
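Since Algorithm 1 itself is not reproduced here, the following is only a plausible stand-in for the symbol synchronization and downsampling step: it tries every sampling phase at 3 samples per symbol (600 fps camera / 200 baud) and keeps the phase whose samples lie farthest from the decision threshold:

```python
import numpy as np

def symbol_sync_downsample(wave: np.ndarray, sps: int) -> np.ndarray:
    """Pick the sampling phase with the widest eye, then downsample.

    wave : detrended waveform sampled at `sps` samples per symbol.
    For each candidate phase offset we compute the mean distance of its
    samples from the decision threshold (zero after detrending) and keep
    the best one, then make the OOK bit decision on those samples.
    """
    n_sym = len(wave) // sps
    trimmed = wave[: n_sym * sps].reshape(n_sym, sps)
    eye = np.abs(trimmed).mean(axis=0)       # eye opening per phase
    phase = int(np.argmax(eye))
    return (trimmed[:, phase] > 0).astype(np.uint8)

# Toy check: bits held for 3 samples, with a weak edge sample per symbol.
bits = np.array([1, 0, 0, 1, 1, 0], dtype=np.uint8)
wave = np.repeat(bits * 2.0 - 1.0, 3)
wave[0::3] *= 0.2                 # first sample of each symbol is a weak edge
print(symbol_sync_downsample(wave, 3))   # -> [1 0 0 1 1 0]
```

A per-frame phase search of this kind also absorbs a slow clock drift between the camera and the transmitter, which is the error the paper's Algorithm 1 targets.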

IV. EXPERIMENTAL RESULTS

A. SETUP
This paper evaluated four items. The first item is the altitude of the RGB-LED-mounted drone. The signal quality depends on the background noise. We set the altitude to 6 m and 15 m because the drone's background was trees in the 6-m case and sky in the 15-m case; the background noise in the 15-m case is larger than that in the 6-m case. The second item is the distance between the drone and the camera, which was set to 100 m, 200 m, and 300 m, the maximum distance measurable in this park. The longer the distance, the fewer pixels the drone occupies in the image. The third item is the frame rate reduction. The proposed scheme decreases the frame rate when detecting the ROI, and the camera filmed the drone at 600 fps. We reduced the 600-fps frame rate to 1 or 10 fps, i.e., reduction rates of 1/600 and 1/60. The lower the frame rate, the worse the signal quality. The fourth item is the BB expansion rate used to increase the signal pixels, and we changed the BB expansion rate n from 1 to 10. Fig. 5 shows the experimental setup. The experimental field was the east plaza of the Expo '70 Commemorative Park in Osaka, Japan. We ensured a line-of-sight (LOS) environment. Fig. 6 shows the RGB-LED-mounted drone, a DJI Mavic 2 Pro. The LEDs were an SST-10-DR as red, an SST-10-G as green, and an SST-10-B as blue, released by LUMINUS, Inc. We adjusted the intensity of the RGB-LED by changing the resistance: 47 Ω for red, 10 Ω for green, and 10 Ω for blue. We used an Arduino UNO as the signal modulator. The Arduino UNO generated a three-channel 2^13 − 1 pseudorandom binary sequence (PRBS) and conducted 8B10B encoding, and the symbol rate was set to 200 bps/channel. This experiment used one drone; a practical case is expected to involve more drones, in which case our proposed received-signal processing algorithm can still be employed. As other related work, we have proposed a drone detection method (see [9], [26], [28]).
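The transmitter's 2^13 − 1 PRBS can be generated with a linear feedback shift register (LFSR). The paper does not state its generator polynomial, so this sketch assumes the primitive polynomial x^13 + x^4 + x^3 + x + 1; any maximal-length degree-13 polynomial yields the same 8191-bit period:

```python
def prbs13(seed: int = 1) -> list:
    """Generate one full period of a 2^13 - 1 PRBS with a Fibonacci LFSR.

    Assumed polynomial: x^13 + x^4 + x^3 + x + 1 (taps at stages 13, 4,
    3, 1).  Any nonzero 13-bit seed yields the same sequence, rotated.
    """
    state = seed & 0x1FFF
    out = []
    for _ in range((1 << 13) - 1):
        # Feedback = XOR of the tap stages (bit 12 = stage 13, etc.).
        fb = ((state >> 12) ^ (state >> 3) ^ (state >> 2) ^ state) & 1
        out.append(state >> 12)                  # output the MSB
        state = ((state << 1) | fb) & 0x1FFF
    return out

seq = prbs13()
print(len(seq))   # 8191
print(sum(seq))   # a maximal-length sequence has 4096 ones vs 4095 zeros
```

In the experiment each of the three channels would carry such a sequence, 8B10B-encoded, at 200 bps.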
The receiver's camera and lens were an STC-MCS43U3V and a 3Z4S-LE-SV-10035V released by OMRON Corporation. Table 1 shows the camera's parameters. We employed YOLOv3 for the ROI detection. To train YOLOv3 to detect the RGB-LED, we prepared 2,000 photographs of the RGB-LED-mounted drone as training data. The CMA's step size for suppressing the interference was set to 0.003. After the symbol synchronization, we measured the BER. We repeated the experiment five times and obtained 50,000 bits.

B. OBJECT DETECTION AND WAVEFORM RESULTS

Fig. 7 shows the histogram of the BB confidence scores obtained by YOLOv3 during the RGB-LED (object) detection period. All the experimental results yielded a 90% average score, and the mode reached approximately 100%. A disadvantage of YOLOv3 is that it may detect incorrect data as the target object. However, the proposed system conducts a handshake in the link-up period after the object detection period; therefore, this disadvantage can be eliminated. In addition, YOLOv3 needs to detect the RGB-LED at least once during the object detection period, and the experimental results show that it has sufficient accuracy for the proposed system. Fig. 8 shows the received waveforms at the 100-m distance, 15-m altitude, frame rate reduction of 1/60, and BB expansion rate n = 1. Fig. 8 (a) shows each channel's waveform and median without equalization, and Fig. 8 (b) shows the waveforms after the MAF process, in which the long-term amplitude variation has been removed. Fig. 8 (c) shows the waveforms after removing the RGB crosstalk.

C. BIT ERROR RATE TEST
We conducted the BER test because it is difficult to identify waveform changes between Fig. 8 (b) and (c). Figs. 9 and 10 show the BER results before and after MIMO equalization to suppress the RGB crosstalk. The dashed line in the graphs is the forward error correction (FEC) threshold at 1.0 × 10^−3, the limit for error-free operation. MIMO equalization improved the BER results. As comparative work, B. Chhaglani et al. [23] proposed an OCC system with a drone, in which the camera on the drone films the light signal transmitted by an LED panel on the ground. The concept is similar to our proposed system; however, that work does not introduce digital signal processing functions such as the MAF and the MIMO equalizer, so its results are expected to be worse than those in Fig. 9. For the frame rate reduction, both the 1/600 and 1/60 reduction rates achieved error-free operation in most cases. The 1/60 rate gave a better result than 1/600 because the change in the drone's position cannot be ignored in the 1/600 case; a frame rate of 10 fps is needed. YOLOv3 can operate in real time at a 20-fps frame rate; thus, 10-fps operation is acceptable in this system.
For the BB expansion, bit errors in the 15-m altitude case occurred when the BB was enlarged, because of the background noise. The BER in the 6-m altitude case was best at the n = 2 BB expansion because the signal power increased due to the expansion. When n > 2, the added background noise exceeds the signal gain, and the BER deteriorates.

V. CONCLUSION
This paper studied an OCC system using an RGB-LED-mounted drone and a high-speed camera, aiming at rapid disaster recovery. We employed a CNN-based object detection algorithm and a blind equalizer in the receiver's DSP. The experimental results indicated the feasibility of the LED detection and the equalizer. Moreover, we achieved RGB-multiplexed transmission over a distance of up to 300 m. Owing to the image synthesis, the OCC system achieved error-free operation even when reducing the frame rate from 600 fps to 1 or 10 fps for the LED detection. Our proposed scheme reduced the drone detection frequency from 600 fps to 10 fps. Meanwhile, the technique for separating individual symbol images from several-hundred-fps video remains a topic for further study.