Sub-Decimeter VLC 3D Indoor Localization With Handover Probability Analysis

This paper addresses the problem of Visible Light Communication (VLC)-based indoor localization and handover, where mobile users communicate with hybrid VLC/mmWave Access Points (APs). The VLC system consists of multiple Light-Emitting Diodes (LEDs) treated as VLC transmitters, multiple Photodiodes (PDs) on the user’s smart device, and multiple mmWave Radio Frequency (RF) transmitters used as complementary APs for the VLC system in the case of blockage. We propose a Convolutional Neural Network (CNN)-based algorithm consisting of offline and online modes. In the offline mode, we gather a data set by dividing the environment into fixed-sized elements where the received VLC signal along with the data attained from the smart device at each element represent a sample to train a CNN model for indoor localization. In the online mode, users employ the received VLC signals to estimate their locations. We then propose a virtual soft handover process according to the Coordinated Multiple Point (CoMP) transmission, where the HandOver Margin (HOM) and Waiting Time (WT) are dynamically set based on the change in Signal-to-Noise-Ratio (SNR) values in consecutive time slots. We derive a closed-form expression for the average effective throughput during the handover process, which shows the algorithm’s superior performance compared to conventional soft and hard handovers. Simulation results show an average positioning error of 4.31 centimeters for the proposed localization algorithm in a $5\times 4\times 3\,\,\text {m}^{3}$ smart environment.


I. INTRODUCTION
Highly precise and robust user localization has become an inseparable part of 6G and beyond wireless networks to support various location-based services in many emerging applications such as smart homes and automatic factories [1]. Although the Global Positioning System (GPS) is widely used for outdoor applications such as road transport services, it is not able to provide sub-meter level accuracy in an indoor environment as satellite signals are scattered and attenuated by indoor roofs and walls [2]. In the past decade, Radio Frequency (RF) systems such as WiFi have been widely used for indoor localization, by employing high-level software and hardware resources [3]. However, RF systems suffer from low energy efficiency and certain weaknesses in security, and cannot solely satisfy the Quality of Service (QoS) in the next generation of wireless networks due to the wide employment of Internet of Things (IoT)-based mobile devices (IMDs) for smart indoor services.
The associate editor coordinating the review of this manuscript and approving it for publication was Barbara Masini.
Visible Light Communication (VLC) has emerged as a promising technology that is employed as a complementary technique for RF systems in indoor localization. VLC yields more benefits in comparison with the RF technology, such as higher data rate, tremendous bandwidth, and higher security with fewer health risks [4]. Besides, it is more energy-efficient than RF and can be implemented using the commonly used Light-Emitting Diodes (LEDs) as transmitters and Photodiodes (PDs) as receivers [5]. The combination of VLC technology with millimeter-Wave (mmWave), known as hybrid VLC/mmWave, provides short-range communications with a high level of secrecy rate for serving IMDs. However, mobile users need to maintain their connectivity during their movements between the coverage areas of VLC cells in an indoor environment [6]. In this regard, two types of handover, namely Horizontal HandOver (HHO) and Vertical HandOver (VHO), should be utilized in a hybrid VLC/mmWave indoor area, where HHO refers to a handover between two consecutive VLC transmitters, while VHO refers to the handover between a VLC transmitter and an RF transmitter.
Another practical challenge in such hybrid VLC/mmWave indoor networks is that the Received Signal Strength (RSS) may either change rapidly due to the ping-pong handover or drop significantly at the receiving end due to user mobility, path loss, and reflection, resulting in a high detection error. To tackle this issue, channel coding is regarded as a popular solution that can control the error at the receiver. Although fixed-rate channel coding can guarantee reliable transmission in traditional wireless systems with slight changes in channel conditions, it does not work properly in channels with unstable conditions, which occur more often in VLC indoor service applications. In this situation, rateless fountain codes are applied to adjust the channel code rate on the fly based on the channel conditions to provide reliable transmissions. They have exhibited strong capabilities in reaching the above targets over various channel conditions in indoor environments. The fountain encoding process is extremely simple and has a tight power budget.
This paper addresses the aforementioned challenges by proposing a Convolutional Neural Network (CNN)-based localization algorithm for an indoor environment in the presence of multiple VLC transmitters and the fountain code. The CNN model takes the Line of Sight (LoS) channel gain and Time of Arrival (ToA) of the received VLC signal as well as the data obtained from the gyroscope sensors as inputs to estimate the three-dimensional (3D) location of the user at the output. The proposed algorithm utilizes the estimated location of users to maintain their connection. Moreover, a virtual soft handover algorithm based on the Coordinated Multiple Point (CoMP) transmission is proposed to prevent the ping-pong effect during the handover process.

A. RELATED WORKS
We review recent work on the use of VLC technology for indoor localization and on the handover process in VLC systems as follows.

1) INDOOR LOCALIZATION IN VLC SYSTEMS
Generally, existing VLC indoor localization techniques in the literature are categorized into three groups: i) triangulation [7]-[10], ii) vision analysis [11]-[14], and iii) fingerprinting [6], [15]-[17]. The triangulation localization techniques employ the geometric properties of triangles to locate the user. These techniques are divided into two classes: angulation and lateration. Angulation techniques, such as Angle Difference of Arrival (ADoA) [7] and Angle of Arrival (AoA) [8], measure the angles between the user and multiple VLC transmitters to estimate the location of the user. The lateration techniques, such as RSS and Time of Arrival (ToA), measure the distance between the user and multiple VLC transmitters to calculate the location of the user. It was shown in [10] that a combination of these techniques can improve the accuracy of VLC-based indoor localization.
Vision-based indoor localization algorithms utilize the camera on smart devices to find the relationship between the 3D position of objects (e.g., the LEDs) in the real world and their 2D projection on the image sensor. Authors in [14] proposed a hybrid RSS/AoA algorithm to estimate the position of the user by employing image processing and gyroscope measurements in a VLC-based indoor environment with a single LED.
Fingerprinting methods include offline and online stages for indoor localization. In the offline stage, information of the received signal, such as RSS and AoA, is collected in a data set. Then, in the online stage, the user is localized by comparing the measured data and the collected data set using different techniques, such as k-Nearest Neighbor (kNN) and probabilistic methods. Authors in [15] used a fusion of multiple classifiers to estimate the position of the user using RSS fingerprints in a VLC-based indoor environment. Reference [16] used principal component analysis to extract important features from the flicker frequency spectra. Then, it utilized kNN clustering and Artificial Neural Network (ANN) to identify the light source from its flickers. Authors in [18] applied two machine learning techniques, namely second-order linear regression and Kernel Ridge Regression Machine Learning (KRRML) with sigmoid function data preprocessing for VLC-based indoor localization. Additionally, [19], [20] employ Weighted K-Nearest Neighbor (WKNN) to estimate the location of the receiver in a VLC indoor environment, while [21] uses extreme machine learning for visible light positioning.
To the best of our knowledge, VLC-based triangulation techniques can reach a high positioning accuracy by employing at least two VLC transmitters in an indoor environment at the cost of complex geometric analysis. In addition, vision-based localization algorithms have a simple mechanism and can be implemented on any camera-equipped smart device. However, they are not applicable to smart devices without a camera, such as a smartwatch. Moreover, frequent use of the camera is energy-consuming for smart devices. Recently, Deep Learning (DL) algorithms have been widely used in VLC systems, particularly for smart indoor localization purposes. It is demonstrated in [22] that applying deep learning in VLC indoor positioning techniques not only improves the accuracy of the positioning but also reduces the computational complexity of indoor localization compared to conventional schemes. The reason is that deep learning techniques learn from a series of samples labeled with their 2D/3D locations to estimate a new sample, while the conventional techniques, such as triangulation algorithms, employ complicated geometric computations for user positioning.

2) HANDOVER PROCESS IN VLC INDOOR SERVICES
Different handover management techniques have been studied in the literature including i) applying the resource allocations and placements of VLC transmitters [23]-[28], ii) setting a Waiting Time (WT) before the handover process [29]-[32], and iii) setting a threshold value for HandOver Margin (HOM) [33], [34]. The authors in [27], [28] employed Multipath-Transmission Control Protocol (MPTCP) for vehicular VLC to tackle the handover problem. The authors in [23], [24] determined the steerable and directional VLC devices' placements such that the overlapped area between adjacent transmitters becomes large enough for successful handovers. In [25], [26], a centralized controller adopted the soft handover process among different VLC transmitters by determining the subcarrier assignment to users subject to QoS constraints. The authors in [30] examined the vertical handover in hybrid VLC/RF systems as a Markov decision process aiming to optimize the Quality of Experience (QoE) and time to trigger the handover. The authors in [29] aimed to optimize the waiting time before the vertical handover process based on the rate of blocking and the recovery of VLC optical wireless channels. The authors in [31], [32] employed reinforcement learning to optimize the WT before the handover process. References [33], [34] derived a mathematical expression related to QoS parameters of users in a VLC network. Then, a threshold value for the performance degradation of VLC was computed to decide when to trigger the vertical handover between VLC and WiFi networks.
To the best of our knowledge, there is no prior research on jointly setting the WT before the handover process and the HOM together with an average effective throughput analysis in a VLC network. This paper addresses this gap by proposing a CoMP-based soft handover algorithm to jointly set the WT and HOM while applying the rateless fountain code for reliable transmission. Moreover, unlike existing research, the average effective throughput is derived during the handover process.

B. MAIN CONTRIBUTIONS
In this paper, we consider a hybrid VLC/mmWave system in an indoor environment with multiple VLC transmitters located on the ceiling. We assume that each user is carrying a smart device (e.g., smartphone or smartwatch) equipped with VLC receivers, i.e., PDs, along with inertial sensors, such as accelerometers and gyroscopes. We propose a DL-based algorithm for user localization by processing the received VLC signals and the data obtained from the sensors on the smart device. Besides, we analyze the handover process to maximize the average effective throughput. The key contributions of this paper are summarized as follows.
• We propose a CNN-based indoor localization algorithm containing two stages. In the first stage (offline mode), the environment is divided into several elements with equal sizes. The VLC signal characteristics in the elements are measured and stored in a data set. Then, the data set is used as the training set of a CNN model that estimates the 3D coordinates of the user. In the second stage (online mode), the user uses the received VLC signal and the data obtained from the gyroscope sensor available on the smart device as the inputs of the CNN model to estimate its 3D location. The Central Control Management (CCM) is responsible for controlling the transmitted signal from VLC sources, their encoding process by the fountain code, and the handover process during the movement of users.
• We handle the handover process by employing a virtual soft handover algorithm according to the CoMP transmission. Applying the indoor localization algorithm, a Handover VLC List (HVL) is formed, and the VLC transmitter with the minimum horizontal distance from the user is chosen as the serving transmitter. When the user moves close to the cell edge, the handover process is invoked. Then, the HVL is updated, and the cooperating VLC transmitter is determined according to the updated HVL. In the second stage, HOM and WT are dynamically tuned based on the change in the Signal-to-Noise-Ratio (SNR) values related to the serving and cooperating VLC transmitters in consecutive time slots.
• Different from existing research, the closed-form expressions for the average effective throughput are derived for two cases in the VLC-based indoor environment, and the case used for transmission is determined. More precisely, the user dynamically switches between the two cases during the handover process to maximize the average effective throughput.
• We run simulations in a smart environment (e.g., a smart living room) for different element sizes. The results show that the accuracy of the CNN model increases as the element size is reduced. In addition, the proposed localization algorithm reaches an average positioning error of 4.31 centimeters over the test set for element size $5\times 5\times 5\,\text{cm}^{3}$, where more than 95.7% of test samples have an error less than 10 centimeters.
• Finally, we evaluate the performance of the handover algorithm in terms of average effective throughput and Bit Error Rate (BER). It is shown that the proposed scheme outperforms the conventional soft and hard handovers in terms of the mentioned performance metrics. Moreover, it is shown that increasing the user's velocity leads to increasing HOM and decreasing WT.

C. ARTICLE OUTLINE
The rest of the paper is structured as follows. Section II presents the smart environment system model and assumptions. Section III contains the proposed VLC-based indoor localization algorithm. Then, in Section IV, the algorithm for the CoMP-based soft handover process is proposed to maximize the average effective throughput. In Section V, the simulation results indicate the validity of the proposed algorithm. Finally, we summarize our findings and conclusions in Section VI.

II. SMART INDOOR ENVIRONMENT MODEL
In this work, we consider a hybrid VLC/mmWave indoor environment, such as a smart living room, of size $A \times B \times C$, where $C$ indicates the height. The indoor area is equipped with $N_T$ VLC transmitters, indexed by the set $\mathcal{T}_v = \{T_{v_1}, \ldots, T_{v_{N_T}}\}$, where each transmitter $T_{v_i} \in \mathcal{T}_v$ is positioned at coordinates $L_{v_i}$ and has a layout of $L$ LEDs located on the ceiling at height $C$. LEDs are used for both illumination and transmission purposes and they are equipped with VLC modules. We also consider a mmWave RF transmitter used as a complementary Access Point (AP) for the VLC system in the case of blockage. Based on the position of VLC lights, we divide the indoor area into $N_T$ regions, indexed by the set $\mathcal{R} = \{R_1, \ldots, R_{N_T}\}$ (see Fig. 1). In addition, we consider $N_U$ Internet of Things Mobile Devices (IMDs) moving on the floor, indexed by the set $\mathcal{U} = \{u_1, \ldots, u_{N_U}\}$, where each IMD is equipped with an array of $P$ PDs. Moreover, the gyroscope sensor on the IMD measures the rotation angles over the three axes, i.e., $\omega_{u_m}$, $\rho_{u_m}$, and $\delta_{u_m}$ (see Fig. 1). In the proposed system model, the output signals of the LEDs and the RF link along with the fountain encoding process are controlled by a Central Control Management (CCM) unit. The IMDs can receive their signals from either VLC transmitters or the mmWave RF AP, although they prefer to communicate with VLC transmitters due to their higher data rates and energy efficiency. More precisely, IMDs connect to the RF AP if the corresponding VLC channel is blocked in downlink transmission. For the uplink transmission, IMDs take advantage of the RF AP to access the CCM.
The received VLC signal at IMD $u_m$, $1 \le m \le N_U$, located at coordinates $L_{u_m} = (x_{u_m}, y_{u_m}, z_{u_m})$, is obtained as
$$r_{u_m} = H_{T_{v_{n^*}},u_m}\, x_{v_{n^*}} + \sum_{T_{v_n} \in I_{u_m}} H_{T_{v_n},u_m}\, x_{v_n} + n_{u_m}, \tag{1}$$
where $T_{v_{n^*}}$ refers to the current transmitter serving IMD $u_m$, and $x_{v_{n^*}} \in \mathbb{C}^{L\times 1}$ denotes the pre-coded signal vector related to the transmitter index $v_{n^*}$. In the proposed network model, we denote $s_{v_{n^*}} \in \mathbb{C}^{L\times 1}$ as the signal transmitted from the desired VLC transmitter, where each entry of $s_{v_{n^*}}$ is composed of $K$ symbols. Then, the pre-coded signal $x_{v_{n^*}}$ is obtained after applying the fountain encoder, and each entry of $x_{v_{n^*}}$ is composed of $M$ symbols. In addition, $I_{u_m}$ in (1) is the set of possible interference terms caused by other VLC transmitters when IMD $u_m$ is located in the overlapped area of VLC cells. We utilize the Alamouti code and the maximum likelihood detection technique to decouple the signals transmitted from several VLC transmitters without applying a complicated decoding technique. In addition, $H_{T_{v_n},u_m} \in \mathbb{C}^{P\times L}$ is the channel matrix between VLC transmitter $T_{v_n}$ and IMD $u_m$. Finally, $n_{u_m} \in \mathbb{C}^{P\times 1}$ is the additive noise vector at the receiver $u_m$, where each component of $n_{u_m}$ is modeled as a zero-mean Gaussian noise consisting of three independent terms, namely shot, thermal, and background noise. Therefore, the variance of the noise is obtained as follows:
$$\sigma^2 = \sigma^2_{shot} + \sigma^2_{bg} + \sigma^2_{thermal}, \tag{2}$$
where the variance of the shot noise is $\sigma^2_{shot} = 2qRP_{VLC}$, in which $q$, $R$, and $P_{VLC}$ indicate the electric charge, photodiode responsivity, and transmitter power, respectively. In addition, the variance of the background noise, denoted by $\sigma^2_{bg}$, and the thermal noise variance, $\sigma^2_{thermal}$, are given in terms of the background current of the PD, $I_{bg}$, the background light power, Boltzmann's constant, $K$, the absolute temperature, $T_k$, and the receiver's resistance, $R_F$.
VOLUME 9, 2021
There are two types of VLC links between the receivers and transmitters: i) direct links, or equivalently, Line of Sight (LoS) links, where the signal is received directly from the LEDs without any reflection, and ii) reflection links, in which signals are received after several bounces. Thus, the channel impulse response between the $j$th LED of transmitter $T_{v_n}$ and the $i$th PD of IMD $u_m$ is the sum of the LoS component and the reflection components. The DC channel gain of the direct VLC link between the $j$th LED of transmitter $T_{v_n}$ and the $i$th PD of IMD $u_m$ is obtained as follows [35]:
$$L_{im,jn} = \begin{cases} \dfrac{(w+1)A_r}{2\pi d_{im,jn}^{2}} \cos^{w}(\theta_{im,jn})\, g(\phi_{im,jn}) \cos(\phi_{im,jn}), & 0 \le \phi_{im,jn} \le \Psi_c, \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$
where $w = -\ln 2 / \ln(\cos\theta_{1/2})$ is the Lambertian factor, $\theta_{1/2}$ denotes the semiangle at half power for LEDs, $A_r$ is the area of the receiver, and $\theta_{im,jn}$ and $\phi_{im,jn}$ represent the irradiance and incident angles, respectively (see Fig. 1). In addition, $d_{im,jn}$ refers to the Euclidean distance between the $j$th LED of transmitter $T_{v_n}$ and the $i$th PD of user $u_m$. Denoting $\Psi_c$ and $n_r$ as the receiver Field of Vision (FoV) semiangle and the refractive index, respectively, the gain of the optical concentrator, $g(\phi_{im,jn})$, is obtained as follows:
$$g(\phi_{im,jn}) = \begin{cases} \dfrac{n_r^{2}}{\sin^{2}(\Psi_c)}, & 0 \le \phi_{im,jn} \le \Psi_c, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$
Since a reflection link experiencing $N$ bounces is modeled as a sequence of $N$ direct links, the DC channel gain for a reflection link is calculated from the path loss of each of its $N$ bounces. We assume that all LEDs of each transmitter send the same signal. As mentioned in (1), we apply the space-time block code (Tarokh's scheme) to remove the interference caused by other VLC transmitters. In the special case with two VLC transmitters, the encoder forms a block of symbols and maps them to the two VLC transmitters in matrix form. Then, each entry of this matrix is transmitted through all LEDs of the same transmitter. Accordingly, the signals received during two consecutive time slots are denoted by $r^{1}_{u_m}$ and $r^{2}_{u_m}$, where we assume that the channel matrix coefficients are constant during the transmission of the two signals.
Applying the encoding technique, we can decouple the signal transmitted from each transmitter. Finally, we utilize maximum likelihood detection at the receiver to detect the signal transmitted from the VLC transmitter. According to the above scheme and in the presence of other interfering transmitters, the received SNR at user $u_m$ is calculated in terms of $\bar{\gamma}$, the average SNR received at the transmitter. Table 1 indicates all notations utilized in the paper.
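To illustrate the decoupling step, the following minimal sketch applies the standard two-transmitter Alamouti block code followed by nearest-point (maximum likelihood) detection. It is an illustration under simplifying assumptions, not the paper's implementation: the channels are reduced to scalar gains `h1` and `h2` that stay constant over the two time slots, the QPSK constellation is chosen only for the example, and all function names are hypothetical.

```python
def alamouti_encode(s1, s2):
    """Build the 2x2 Alamouti block.

    Rows are the two consecutive time slots; columns are the two transmitters.
    """
    return [[s1, s2],
            [-s2.conjugate(), s1.conjugate()]]


def receive(block, h1, h2, noise=(0j, 0j)):
    """Received samples in the two slots for scalar gains h1 and h2 (assumed constant)."""
    r1 = h1 * block[0][0] + h2 * block[0][1] + noise[0]
    r2 = h1 * block[1][0] + h2 * block[1][1] + noise[1]
    return r1, r2


def combine(r1, r2, h1, h2):
    """Linear combining that decouples the two transmitted symbols."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


def ml_detect(estimate, constellation):
    """ML detection under Gaussian noise: pick the nearest constellation point."""
    return min(constellation, key=lambda point: abs(estimate - point))


if __name__ == "__main__":
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]  # illustrative constellation
    h1, h2 = 0.8 + 0.1j, 0.3 - 0.2j            # illustrative channel gains
    r1, r2 = receive(alamouti_encode(qpsk[0], qpsk[2]), h1, h2)
    s1_hat, s2_hat = combine(r1, r2, h1, h2)
    print(ml_detect(s1_hat, qpsk), ml_detect(s2_hat, qpsk))
```

Because the combining step cancels the cross terms exactly, each symbol can be detected on its own, which is why no joint decoding across transmitters is needed in this simplified setting.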

III. PROPOSED CNN-BASED INDOOR LOCALIZATION ALGORITHM
In this section, we develop a DL-based fingerprinting algorithm for user localization by employing the received VLC signals and the data attained from available sensors on the smart devices. The proposed algorithm contains offline and online modes. The offline mode comprises two steps, namely data acquisition and training. In the data acquisition step, we generate a training data set by dividing the environment into 3D elements of the same size. The received VLC signal characteristics at the center of each element with the gyroscope data from the smart devices are stored as a training sample. In the training step, the collected data set is used for training a CNN model that estimates the 3D location by taking the mentioned features of the data set as its inputs. The main reason for using CNN is that it can effectively extract the hidden structure and inherent features of the data.
In the online mode, the smart device uses the received VLC signal characteristics and the gyroscope data as the inputs of the CNN model for localization. Moreover, in case of VLC blockage, the inertial sensors available on the smart device are used to estimate the location of the user according to its movement and the previous accurate location.

A. OFFLINE MODE DESIGN
The offline mode design is summarized in two steps, namely data acquisition and training, as follows:

1) DATA ACQUISITION
Algorithm 1 (pseudocode, only partially recoverable): lines 1-12 loop over the elements along the three axes and increment the sample index, $k \leftarrow k + 1$; lines 13-28 loop over the samples $k = 1, \ldots, N_s$, calculating the incident angle according to (15) and the LoS gain $L^{(k)}_{i,jn}$ according to (6); lines 29-31 store the input-output pairs in the data set $D$.
Denoting $N_s$ as the number of training samples, each sample $x^{(k)}$ in the feature set $X = [x^{(1)}, \ldots, x^{(N_s)}]^T$ contains the LoS channel gain and the ToA between the VLC transmitters and the receiver as well as the angular position of the smart device. In addition, $Y = [y^{(1)}, \ldots, y^{(N_s)}]^T$ denotes the label set, where $y^{(k)}$, $k = 1, \ldots, N_s$, is the 3D coordinates of the center of the $k$th sample. It is worth mentioning that we assume a noise-free environment when generating the training data, while we test the algorithm with noisy inputs. In order to provide a comprehensive data set that improves the generalization of the proposed algorithm, we take samples from different parts of the environment. Towards this goal, we partition the indoor environment into $N_s$ cubic elements of equal size $d_x \times d_y \times d_z$ (e.g., 20 cm × 20 cm × 20 cm), where $d_x$, $d_y$, and $d_z$ denote the length of each element along the corresponding axis (see lines 1-12 of Algorithm 1). It should be noted that these elements are different from the regions $R_i \in \mathcal{R}$, each of which is covered by the $i$th VLC transmitter.
Then, the Euclidean distance between the center of the transmitter and the center of the $k$th element, denoted by $d^{(k)}_{T_{v_n}}$, is calculated (see line 15). Next, the irradiance angle between sample $k$ and each transmitter $T_{v_n}$, denoted by $\theta^{(k)}_{T_{v_n}}$, is computed. Denoting $\delta^{(k)}$, $\rho^{(k)}$, and $\omega^{(k)}$ as the rotation angles of the $k$th sample obtained from the gyroscope sensor on the smart devices, the normal vector of the $k$th sample, denoted by $n^{(k)}$, is obtained in (14). Thus, the incident angle $\phi^{(k)}_{T_{v_n}}$ is obtained from $n^{(k)}$ and the vector $v^{(k)}_{T_{v_n}} = L^{(k)} - L_{T_{v_n}}$, where $L^{(k)}$ represents the location of the $k$th sample. Throughout this subsection, we omit subscript $m$ from the channel gain described in (6), and denote $L^{(k)}_{i,jn}$ as the gain of the LoS link between the $i$th PD of sample $k$ and the $j$th LED of transmitter $T_{v_n}$. Accordingly, if sample $k$ lies in the coverage area of transmitter $T_{v_n}$, i.e., the incident angle does not exceed the receiver FoV semiangle, $\phi^{(k)}_{T_{v_n}} \le \Psi_c$, the Euclidean distance between the LEDs of transmitter $T_{v_n}$ and the PDs of a receiver located at the center of the $k$th element, denoted by $d^{(k)}_{i,jn}$, is calculated. Moreover, the corresponding irradiance and incident angles, $\theta^{(k)}_{i,jn}$ and $\phi^{(k)}_{i,jn}$, are computed, and the LoS gain $L^{(k)}_{i,jn}$ is obtained according to (6) and stored in $x^{(k)}$. Additionally, the ToA between the $j$th LED of transmitter $T_{v_n}$ and the $i$th PD of a receiver located at the center of the $k$th element, denoted by $t^{(k)}_{i,jn}$, is calculated as follows:
$$t^{(k)}_{i,jn} = \frac{d^{(k)}_{i,jn}}{c}, \tag{16}$$
where $c = 3 \times 10^{8}$ m/s denotes the speed of light. Finally, the feature set, $X$, and the labels, $Y$, form a set of input-output pairs that are stored in $D$ as the training data set (lines 29-31 of Algorithm 1).
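The partitioning of the environment into equal-sized elements can be sketched as follows. This is a minimal illustration, assuming that the room dimensions are integer multiples of the element size and that the origin is at a floor corner; the function name `element_centers` and the loop order are hypothetical and need not match Algorithm 1.

```python
def element_centers(A, B, C, dx, dy, dz):
    """Return the center coordinates of all elements in an A x B x C room.

    Assumes A, B, and C are (up to rounding) integer multiples of dx, dy, dz.
    Each center serves as the 3D label of one training sample.
    """
    nx, ny, nz = round(A / dx), round(B / dy), round(C / dz)
    centers = []
    for ix in range(nx):
        for iy in range(ny):
            for iz in range(nz):
                centers.append(((ix + 0.5) * dx,
                                (iy + 0.5) * dy,
                                (iz + 0.5) * dz))
    return centers


if __name__ == "__main__":
    # 5 m x 4 m x 3 m room with 20 cm cubic elements
    labels = element_centers(5.0, 4.0, 3.0, 0.2, 0.2, 0.2)
    print(len(labels))
```

For a 5 m × 4 m × 3 m room and 20 cm elements, this enumeration produces 25 × 20 × 15 = 7500 element centers.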
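The per-sample features described above can be sketched for a single LED/PD pair as follows. The sketch assumes the standard Lambertian LoS model as the form of (6), a unit optical filter gain, and an LED pointing straight down from the ceiling; the function names and the numerical values in the usage example are hypothetical and are not taken from the paper.

```python
import math

SPEED_OF_LIGHT = 3e8  # m/s, the value used in the text


def lambertian_order(half_power_semiangle):
    """Lambertian factor w = -ln 2 / ln(cos(theta_1/2))."""
    return -math.log(2) / math.log(math.cos(half_power_semiangle))


def concentrator_gain(incident_angle, fov, refractive_index):
    """Optical concentrator gain; zero outside the receiver field of view."""
    if 0 <= incident_angle <= fov:
        return refractive_index ** 2 / math.sin(fov) ** 2
    return 0.0


def los_features(led_pos, pd_pos, pd_normal, half_power_semiangle,
                 pd_area, fov, refractive_index):
    """Return (LoS gain, ToA) for one LED/PD pair (LED assumed to face down)."""
    v = [p - l for p, l in zip(pd_pos, led_pos)]      # vector from LED to PD
    d = math.sqrt(sum(c * c for c in v))              # Euclidean distance
    toa = d / SPEED_OF_LIGHT                          # time of arrival
    cos_theta = -v[2] / d                             # cosine of irradiance angle
    norm_n = math.sqrt(sum(c * c for c in pd_normal))
    cos_phi = -sum(a * b for a, b in zip(v, pd_normal)) / (d * norm_n)
    phi = math.acos(max(-1.0, min(1.0, cos_phi)))     # incident angle
    if cos_theta <= 0 or phi > fov:
        return 0.0, toa                               # outside coverage
    w = lambertian_order(half_power_semiangle)
    gain = ((w + 1) * pd_area / (2 * math.pi * d ** 2)
            * cos_theta ** w
            * concentrator_gain(phi, fov, refractive_index)
            * math.cos(phi))
    return gain, toa


if __name__ == "__main__":
    gain, toa = los_features(led_pos=(2.5, 2.0, 3.0), pd_pos=(2.0, 1.5, 1.0),
                             pd_normal=(0.0, 0.0, 1.0),
                             half_power_semiangle=math.radians(60),
                             pd_area=1e-4, fov=math.radians(60),
                             refractive_index=1.5)
    print(gain, toa)
```

In a full data acquisition loop, this function would be evaluated for every LED/PD pair at every element center, with `pd_normal` derived from the gyroscope rotation angles.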

2) TRAINING (METHODOLOGY)
The proposed algorithm proceeds with training a CNN model as shown in Fig. 2. The model takes $x^{(k)}$ as inputs and delivers the estimated 3D position, $(\tilde{x}^{(k)}, \tilde{y}^{(k)}, \tilde{z}^{(k)})$, at the output. The first step in the proposed indoor localization model is the normalization process, which is considered an important step in training DL models. Its main purpose is to change the values of features to a common scale, resulting in faster convergence. In this paper, we use the z-score normalization for each feature, denoted by $x^{(k)}_q$, as follows:
$$\hat{x}^{(k)}_q = \frac{x^{(k)}_q - \mu_q}{\sigma_q}, \tag{17}$$
where $\mu_q$ and $\sigma_q$ denote the mean and standard deviation of the $q$th feature in data set $D$, respectively. While the features corresponding to the received VLC signals are used as the input of the CNN, the data obtained from the gyroscope sensor on the smart device is used as the input of the Fully-Connected (FC) layers. The VLC features are processed by the convolutional layers, which also reduce the dimension of the data. While each sample of $x^{(k)}$, $k = 1, \ldots, N_s$, is reshaped to form a $1 \times L \times 2$ sample, the angular position is used as a separate input for the FC layers. In other words, the output of the convolutional layers is flattened, and then it is concatenated with the data obtained from the gyroscope sensor on the smart device (i.e., $\delta^{(k)}$, $\rho^{(k)}$, $\omega^{(k)}$). Finally, the resulting vector is used as the input of the FC layers, which deliver the estimated location $(\tilde{x}^{(k)}, \tilde{y}^{(k)}, \tilde{z}^{(k)})$.
It is worth mentioning that the training procedure requires a considerable amount of energy and cannot be implemented in smart devices. Therefore, the offline stage is performed in the CCM unit instead of the user equipment. When the user enters the smart indoor area, the architecture of the CNN model along with the trained weights is transmitted to the user through VLC links.
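The z-score normalization step can be sketched as follows. The sketch assumes that the population standard deviation is used and that a constant feature is mapped to zero; both choices, as well as the function names, are assumptions for illustration and are not stated in the text.

```python
import statistics


def fit_zscore(samples):
    """Compute the per-feature mean and standard deviation over the data set.

    `samples` is a list of equal-length feature vectors. The population
    standard deviation is used here (an assumption).
    """
    columns = list(zip(*samples))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def apply_zscore(sample, means, stds):
    """Normalize one feature vector; constant features are mapped to 0."""
    return [(x - m) / s if s > 0 else 0.0
            for x, m, s in zip(sample, means, stds)]


if __name__ == "__main__":
    train = [[1.0, 10.0], [3.0, 10.0]]   # toy feature vectors
    means, stds = fit_zscore(train)
    print(apply_zscore([3.0, 10.0], means, stds))
```

The statistics fitted on the training set would be reused unchanged in the online mode, so that online inputs are scaled in the same way as the training samples.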

B. ONLINE MODE DESIGN
In the online mode design, the user utilizes the trained CNN to estimate its absolute position using the received VLC signal and the data obtained from smart device sensors as inputs. In case of VLC blockage, the inertial data of the smart device is used to estimate the user's movement. The accelerometer in the smart device is able to measure the acceleration and direction of the user's movement. Using the measured acceleration of user $u_m$, denoted by $\alpha_{u_m}$, the number of steps, represented by $n^{step}_{u_m}$, as well as the direction of each step, denoted by $g^{i}_{u_m}$, the user's movement can be estimated. The size of the $i$th step, represented by $\ell^{i}_{u_m}$, can be estimated as a function of the acceleration at the $i$th step as $\ell^{i}_{u_m} = g(\alpha_{u_m})$. Therefore, the position of user $u_m$ after the $i$th step from the initial position $L_{u_m}$ can be estimated as follows:
$$L^{(i)}_{u_m} = L_{u_m} + \sum_{j=1}^{i} \ell^{j}_{u_m}\, g^{j}_{u_m}, \tag{18}$$
where $L^{(i)}_{u_m}$ is the position of the user at the $i$th step.
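The step-based position update used during VLC blockage can be sketched as follows. The step-size function is not specified in the text, so a simple linear model in the acceleration magnitude is assumed here purely for illustration; the function names, the `scale` parameter, and the representation of each step direction as a heading angle in the horizontal plane are likewise hypothetical.

```python
import math


def step_length(acceleration, scale=0.25):
    """Hypothetical step-size model: linear in the acceleration magnitude."""
    return scale * abs(acceleration)


def dead_reckon(initial_position, accelerations, headings,
                step_model=step_length):
    """Accumulate steps from the last accurate position.

    `headings` holds one angle (radians, horizontal plane) per step, and
    `accelerations` holds the acceleration measured at each step.
    Returns the list of positions after each step; height is kept fixed.
    """
    x, y, z = initial_position
    track = []
    for acceleration, heading in zip(accelerations, headings):
        length = step_model(acceleration)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        track.append((x, y, z))
    return track


if __name__ == "__main__":
    # Two steps: one along the x axis, one along the y axis
    print(dead_reckon((1.0, 1.0, 1.0), [2.0, 2.0], [0.0, math.pi / 2]))
```

Since the error of such an estimate accumulates with every step, the position would be re-anchored to the CNN estimate as soon as the VLC link becomes available again.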

IV. PROPOSED HANDOVER ALGORITHM BASED ON INDOOR POSITIONING IN HYBRID VLC/MMWAVE
In this section, we apply the proposed indoor localization algorithm introduced in Section III to identify the nearest VLC transmitter to the user during the handover process. Toward this goal, we first form a Handover VLC List, denoted by $HVL_{u_m}$, which is composed of the VLC candidates providing services for user $u_m$. In other words, according to (19), the VLC transmitter $T_{v_n} \in \mathcal{T}_v$ is included in $HVL_{u_m}(t)$ if user $u_m$ is located in the communication range of $T_{v_n}$, i.e., if the horizontal distance $d^{h}_{T_{v_n},u_m}$ does not exceed a threshold value related to the communication range of each VLC transmitter, where $t$ refers to the updated iteration.
Here, $d^{h}_{T_{v_n},u_m}$ is the horizontal distance between VLC transmitter $T_{v_n}$ and user $u_m$, which indicates whether user $u_m$ is located in the region $R_n$ related to transmitter $T_{v_n}$. In this situation, the transmitter with the minimum $d^{h}_{T_{v_n},u_m}$ in the set $HVL_{u_m}(t)$ is selected as the current serving VLC transmitter, denoted by $T^{s}_{v}$, expressed as:
$$T^{s}_{v} = \arg\min_{T_{v_n} \in HVL_{u_m}(t)} d^{h}_{T_{v_n},u_m}. \tag{20}$$
Whenever the handover process is invoked during the user's movement, we update $HVL_{u_m}$ and pick another transmitter with the minimum $d^{h}_{T_{v_n},u_m}$ as the cooperative VLC transmitter, denoted by $T^{c}_{v}$, where $T^{c}_{v} \neq T^{s}_{v}$. In the second stage, HOM and WT are dynamically set based on the change in the SNR value at the receiver during two consecutive time slots to avoid successive handovers. The proposed handover algorithm is summarized in Algorithm 2, which includes two types of handovers: hard handover and CoMP-based soft handover. In the first case, the current serving VLC transmitter terminates its communication with the user, and an adjacent transmitter, known as the cooperative VLC transmitter, communicates with the same user whenever the handover is triggered. In the second case, the user connects to the cooperative VLC transmitter while its connection with the current serving VLC transmitter is still active. Denoting $t_{HardHO}$ and $t_{CoMPHO}$ as the waiting time before the hard handover and the CoMP-based soft handover, respectively, the proposed handover process in Algorithm 2 is described by the following four steps:
• Step 1 (Initialization): The time index $t$ and the waiting times $t_{CoMPHO}$ and $t_{HardHO}$ are set to zero, and the constants $a$ and $b$ are set to 0.1 (see lines 1-2).
• Step 2 (Serving VLC selection): In this step, we determine the serving VLC transmitter by forming the set $HVL_{u_m}$ as in (19). Then, the serving VLC transmitter is chosen according to (20) (see lines 4-6).

• Step 3 (Cooperating VLC selection):
Whenever the handover is invoked, the set HVL u m (t) is updated. Then, the VLC transmitter in set HVL u m (t) with the minimum d h T vn ,u m is chosen as the cooperating VLC transmitter (see lines 7-10).
, and the inverse of γ t , respectively. The higher the value of γ t is, the higher the HOM value and the lower the WT value should be set to minimize unnecessary handovers (See lines [11][12]. Finally, he handover type is chosen based on the following conditions: i) the hard handover is performed if γ T c v ≥ γ T s v +HOM and t HardHO ≥ WT, ii) the CoMP-based soft handover is performed if γ T c v ≥ γ T s v + HOM, t HardHO < WT, and t CoMPHO ≥ WT or γ T c v + HOM ≥ γ T s v and t CoMPHO ≥ WT , otherwise, the user connects to the mmWave RF transmitter.
• Output: We can determine the type of handover (hard handover or CoMP-based soft handover) and the VLCs Algorithm 2 Pseudocode of the Proposed Handover Algorithm Step 1 (Initialization): 1: t ← 0, t CoMPHO (t) ← 0, and t HardHO (t) ← 0 2: Set a ← 0.1 and b ← 0.1 Step 2 (Serving VLC Selection): Step 3 (Cooperating VLC selection): 6: for each time slot t do 7: Update HVL u m (t), 8: Step 4 (Handover type selection): 10: Set 13: if t HardHO (t) ≥ WT(t) then 14: t CoMPHO (t) ← t CoMPHO (t) + 1 28: end if 29: else 30: connect to the mmWave RF transmitter 31: end if 32: end for to which the user connects whenever the handover process is invoked. In addition, the user connects to the mmWave RF transmitter if the VLC channels are not accessible. Moreover, the threshold values for HOM and WT are dynamically set in each time slot. By doing so, the ping-pong effect caused by the successive hangovers is prevented.
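The four steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than a transcription of Algorithm 2: the forms HOM = a · Δγ and WT = b / Δγ are assumed from the description that HOM and WT follow the SNR change and its inverse, the cooperating transmitter is taken to be the nearest candidate other than the serving one, and all function names, positions, and thresholds are hypothetical.

```python
import math


def handover_list(user_xy, transmitters, range_threshold):
    """Return in-range transmitter names, sorted by horizontal distance.

    `transmitters` maps a name to its (x, y) ceiling position.
    """
    in_range = []
    for name, (tx, ty) in transmitters.items():
        distance = math.hypot(user_xy[0] - tx, user_xy[1] - ty)
        if distance <= range_threshold:
            in_range.append((distance, name))
    return [name for _, name in sorted(in_range)]


def serving_and_cooperating(hvl):
    """Nearest candidate serves; the next nearest cooperates (assumption)."""
    serving = hvl[0] if hvl else None
    cooperating = hvl[1] if len(hvl) > 1 else None
    return serving, cooperating


def margins(delta_snr, a=0.1, b=0.1, eps=1e-9):
    """Assumed forms: HOM = a * |delta|, WT = b / |delta|."""
    change = max(abs(delta_snr), eps)  # guard against division by zero
    return a * change, b / change


def handover_type(snr_serving, snr_coop, hom, wt, t_hard, t_comp):
    """Apply the handover conditions i) and ii) stated in the text."""
    if snr_coop >= snr_serving + hom and t_hard >= wt:
        return "hard"
    if ((snr_coop >= snr_serving + hom and t_hard < wt and t_comp >= wt)
            or (snr_coop + hom >= snr_serving and t_comp >= wt)):
        return "comp_soft"
    return "mmwave"


if __name__ == "__main__":
    txs = {"T1": (1.0, 1.0), "T2": (3.0, 1.0), "T3": (10.0, 10.0)}
    hvl = handover_list((1.5, 1.0), txs, range_threshold=2.0)
    print(serving_and_cooperating(hvl))
    hom, wt = margins(delta_snr=2.0)
    print(handover_type(10.0, 10.5, hom, wt, t_hard=1, t_comp=0))
```

With these assumed forms, a fast-changing SNR (large Δγ) raises the margin and shortens the waiting time, which matches the qualitative behavior described in Step 4.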
To analyze the performance of the proposed handover algorithm based on VLC indoor positioning, we first derive a closed-form expression for the Average Effective Throughput (AET). We then investigate the dynamic adjustment of HOM and WT to prevent the ping-pong effect caused by the user's speed during its movement. Finally, the computational complexity of the proposed handover algorithm is calculated.

A. AVERAGE EFFECTIVE THROUGHPUT ANALYSIS
To analyze the performance of the proposed handover algorithm, we derive a closed-form expression for the Average Effective Throughput (AET). The AET is defined as the expectation of the effective number of symbols transmitted successfully, computed as $c \times r_c$, multiplied by the packet success probability, denoted by $P_{suc}^{(\ell)} = 1 - P_e$, where $P_e$ represents the Packet Error Rate (PER). Accordingly, the AET is expressed as:

$$\eta^{(\ell)} = \mathbb{E}\left[\frac{c \times r_c \times P_{suc}^{(\ell)}}{N_A}\right],$$

where the superscript $(\ell)$ refers to the type of the handover process during the user's movement, i.e., $\ell = 1$ refers to the hard handover, while $\ell = 2$ refers to the CoMP-based soft handover. Further, $c = \log_2(M)$ and $r_c$ represent the number of bits per transmitted symbol and the code rate, respectively. In addition, $N_A$ represents the average number of packet retransmissions.

Note that the users send a 1-bit Acknowledgment (ACK) whenever they successfully receive their requested packets; otherwise, they send a Negative ACK (NACK) to the VLC transmitter. In case of a NACK, the VLC transmitter retransmits the missing packet to the user. Employing a fountain code facilitates the transmission: the fountain decoder can recover the block of $K$ data packets after receiving a block of encoded packets of size $M$, $M > K$. Finally, the user sends an ACK when the whole block of packets is received correctly. In the following two steps, we determine the parameters involved in computing the AET. Then, in the third step, we derive a closed-form expression for the AET.

1) PACKET ERROR RATE
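Generally, the PER of the link between transmitter $T_v^s$ and user $u_m$, denoted by $\mathrm{PER}_{T_v^s,u_m}$, is proportional to the Q-function, i.e., $\nu Q(\sqrt{\sigma\gamma})$ with constants $\nu$ and $\sigma$. However, there is no closed-form expression for the PER in coded modulations; thus, we utilize the upper bound for $\mathrm{PER}_{T_v^s,u_m}$, expressed as:

$$\mathrm{PER}_{T_v^s,u_m} \leq \begin{cases} 1, & \gamma_{T_v^s,u_m} < \gamma_{th}, \\ \nu\, e^{-\sigma \gamma_{T_v^s,u_m}}, & \gamma_{T_v^s,u_m} \geq \gamma_{th}, \end{cases}$$

where $\gamma_{T_v^s,u_m}$ is obtained as in (12). In addition, $\gamma_{th} = \frac{1}{\sigma}\ln(\nu)$ refers to the threshold SNR value. Next, the PER for the two types of handover is obtained.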
Case 1: In the first type of handover, the user connects to only one VLC transmitter when the handover is triggered. In this case, the PER is that of the single active link, bounded as above.

Case 2: In this case, the user connects to two VLC transmitters simultaneously. In other words, the user is served by the serving VLC transmitter $T_v^s$, while the cooperating VLC transmitter $T_v^c$ collaborates in transmitting the signal to the user, and the PER is computed as follows:

$$P_e^{(2)} = \mathrm{PER}_{T_v^s,u_m} \times \mathrm{PER}_{T_v^c,u_m}.$$

In fact, in the second case, an error occurs only when neither $T_v^s$ nor $T_v^c$ delivers the signal successfully.
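The PER bound and the two handover cases can be illustrated with a short Python sketch. The function names are hypothetical, the SNR is assumed to be on a linear scale, the two links in Case 2 are assumed to fail independently, and the values of $\nu$ and $\sigma$ in the usage note are arbitrary illustration values, not parameters from this paper.

```python
import math


def per_upper_bound(snr, nu, sigma):
    """Upper bound on the PER: 1 below the threshold, nu*exp(-sigma*snr) above."""
    snr_th = math.log(nu) / sigma  # threshold SNR, gamma_th = ln(nu) / sigma
    if snr < snr_th:
        return 1.0
    return nu * math.exp(-sigma * snr)


def per_hard(snr_link, nu, sigma):
    """Case 1: a single active link carries the packet."""
    return per_upper_bound(snr_link, nu, sigma)


def per_comp(snr_serv, snr_coop, nu, sigma):
    """Case 2: an error occurs only if both links fail (independence assumed)."""
    return per_upper_bound(snr_serv, nu, sigma) * per_upper_bound(snr_coop, nu, sigma)
```

With the illustrative values `nu=50` and `sigma=0.5`, `per_comp(10, 12, 50, 0.5)` is smaller than `per_hard(10, 50, 0.5)`, which reflects the diversity gain of serving the user from two transmitters.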

2) CODING MODE SELECTION
We describe the coding mode selection, consisting of the code rate and the modulation, in the two handover cases:

Case 1: In this case, we utilize adaptive coding mode selection to increase the average effective throughput. Each mode $(i)$ is selected if the following inequality is satisfied:

$$\gamma_{th}^{(i)} \leq \gamma < \gamma_{th}^{(i+1)}, \quad i = 1, \ldots, N_{mode},$$

where $N_{mode}$ refers to the number of coding modes. In addition, the thresholds $\gamma_{th}^{(j)}$ are arranged in ascending order with $\gamma_{th}^{(N_{mode}+1)} = \infty$, so that a coding mode is chosen for each value of $\gamma \in [\gamma_{th}^{(1)}, +\infty)$.

Case 2: In this case, the coding mode is selected such that the average effective throughput $\eta^{(2)}(i)$, defined in (29), is maximized, i.e., $i^* = \arg\max_{1 \leq i \leq N_{mode}} \eta^{(2)}(i)$.
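As a minimal sketch of the two selection rules, the following Python functions pick a mode from an ascending threshold list (Case 1) and from a list of per-mode throughput values (Case 2). The function names and the numerical values in the usage note are hypothetical; modes are indexed from 1 to match the text.

```python
from bisect import bisect_right


def select_mode_hard(snr, thresholds):
    """Case 1: return mode i such that thresholds[i-1] <= snr < thresholds[i].

    thresholds must be sorted in ascending order; the last mode extends to
    infinity. Returns None if snr is below the first threshold.
    """
    idx = bisect_right(thresholds, snr)  # number of thresholds <= snr
    return idx if idx > 0 else None


def select_mode_comp(throughput_per_mode):
    """Case 2: return the 1-based index of the mode with the highest throughput."""
    indices = range(1, len(throughput_per_mode) + 1)
    return max(indices, key=lambda i: throughput_per_mode[i - 1])
```

For example, with thresholds `[2, 5, 9]`, an SNR of 5 selects mode 2 and an SNR of 100 selects mode 3, while `select_mode_comp([0.8, 1.4, 1.1])` returns 2.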

3) EFFECTIVE THROUGHPUT DERIVATION
Recalling (6), we rewrite the channel gain of the direct link between the $j$-th LED of transmitter $T_v^n$ and the $i$-th PD of IMD $u_m$, denoted by $L_{i_m,j_n}$, as (27), in which $H_d$ and $r_d$ denote the vertical and horizontal distances between the $j$-th LED of transmitter $T_v^n$ and the $m$-th user, respectively. In fact, the user only changes the horizontal direction of its smart device during its movement. Moreover, we utilize the Random Way Point (RWP) model for the user's 2D movement, expressed as (28), where $f_{r_d}(r)$ is the probability density function (pdf) of the user's position in the 2D plane. Generally, for such a model, $n = 3$, $\Upsilon = \frac{1}{73} \times [324, -420, 96]$, and $\beta = [1, 3, 5]$ are considered [36].

Substituting (7) in (27), $L_{i_m,j_n}$ is rewritten as (29). To calculate the pdf of $L_{i_m,j_n}$, it is more straightforward to first compute its Cumulative Distribution Function (CDF) based on (29), expressed as (30), where (a) comes from the fact that the cosine function is monotonically decreasing on the interval $(0, \pi)$. Then, $\Pr\{L_{i_m,j_n} \leq x\}$ is re-expressed as (31), where (a) comes from the formula $P(X \cap Y) = P(X)P(Y)$ for two independent random variables $X$ and $Y$, and $u(\cdot)$ refers to the unit step function. Using the facts that $\Pr\{X \geq x\} = 1 - F_X(x)$ and $\Pr\{a \leq X \leq b\} = F_X(b) - F_X(a)$, the CDF $F_{L_{i_m,j_n}}(x) = \Pr\{L_{i_m,j_n} \leq x\}$ is rewritten as (32). In the next step, we differentiate the CDF of $L_{i_m,j_n}$ with respect to $x$ to obtain its pdf, denoted by $f_{L_{i_m,j_n}}(x)$.

It is very complex to consider all LEDs of a VLC transmitter and all PDs of a user in a VLC system to compute $\gamma_{i_m,j_n}$. Therefore, we assume that the VLC system contains one LED as the VLC transmitter and one PD at the receiver, which is a practical assumption due to the short distance between the LEDs of a transmitter or between the PDs of a receiver. Accordingly, the pdf of $\gamma_{i_m,j_n}$ is computed as (35). Finally, we obtain the average effective throughput as the expectation of the effective number of symbols transmitted successfully given the pdf of $\gamma_{i_m,j_n}$, expressed as (36), where (a) comes from the formula $\eta^{(\ell)} = \mathbb{E}\left[\frac{c \times r_c \times P_{suc}^{(\ell)}}{N_A}\right]$, $\ell \in \{1, 2\}$. Then, $\eta^{(1)}$ is re-expressed as (37), where $\Gamma(s, x)$ is the upper incomplete gamma function, defined as $\Gamma(s, x) = \int_{x}^{+\infty} t^{s-1} e^{-t}\, dt$. With a similar argument, $\eta^{(2)}$ is computed by Wolfram|Alpha as (38), where $i^* = \arg\max_{1 \leq i \leq N_{mode}} \eta^{(2)}(i)$.
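As a sanity check on the RWP parameters, the sketch below evaluates the polynomial form commonly used for the RWP distance pdf in the literature, $f_{r_d}(r) = \sum_{i=1}^{n} \Upsilon_i\, r^{\beta_i} / D^{\beta_i + 1}$ for $0 \leq r \leq D$. This form and the region radius $D$ are assumptions here, since (28) is not reproduced in this text. The check uses the third coefficient $96/73$, the value for which the pdf integrates to one and vanishes at the boundary $r = D$.

```python
from fractions import Fraction

# RWP coefficients and exponents: n = 3 terms.
COEFFS = [Fraction(324, 73), Fraction(-420, 73), Fraction(96, 73)]
EXPONENTS = [1, 3, 5]


def rwp_pdf(r, radius):
    """Assumed polynomial RWP pdf of the horizontal distance r on [0, radius]."""
    if not 0 <= r <= radius:
        return 0.0
    return sum(
        float(c) * r**b / radius ** (b + 1) for c, b in zip(COEFFS, EXPONENTS)
    )


def rwp_total_probability():
    """Exact integral of the pdf over [0, D]; the radius D cancels out."""
    return sum(c / (b + 1) for c, b in zip(COEFFS, EXPONENTS))
```

Here `rwp_total_probability()` evaluates to exactly 1, because $\frac{1}{73}\left(\frac{324}{2} - \frac{420}{4} + \frac{96}{6}\right) = \frac{162 - 105 + 16}{73} = 1$.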

B. DYNAMICALLY SETTING OF HOM AND WT
In the final step of the handover analysis, we investigate the performance of the proposed algorithm based on HOM and WT. As mentioned before, we define HOM$(t)$ as the rate of change in the SNR value over two consecutive time slots multiplied by the constant $a$. In addition, WT$(t)$ is defined as the constant $b$ divided by the rate of change in the SNR value over two consecutive time slots, both expressed as:

$$\mathrm{HOM}(t) = a \left( \gamma_{T_v^c,u_m}(t) - \gamma_{T_v^s,u_m}(t-1) \right), \qquad \mathrm{WT}(t) = \frac{b}{\gamma_{T_v^c,u_m}(t) - \gamma_{T_v^s,u_m}(t-1)},$$

where $t$ refers to the current time slot. In fact, HOM$(t)$ should be proportional to $\gamma_{T_v^c,u_m}(t) - \gamma_{T_v^s,u_m}(t-1)$ to ensure that $T_v^c$ is appropriate for the handover process during the user's movement at either high or low speed. In contrast, WT$(t)$ is proportional to the inverse of this difference, meaning that if the user moves away quickly, WT$(t)$ should be small so that the handover is handled rapidly; otherwise, the connection between the user and the VLC transmitter drops for a long time.
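A minimal Python sketch of this dynamic setting is given below. The function name is hypothetical, and two choices are assumptions made only to keep the example well defined: the SNR change is taken in absolute value, and a small floor `eps` guards against division by zero when the SNR does not change.

```python
def dynamic_hom_wt(snr_coop_now, snr_serv_prev, a=0.1, b=0.1, eps=1e-9):
    """Return (HOM, WT) for the current slot from the SNR change.

    snr_coop_now: SNR of the cooperating transmitter in slot t.
    snr_serv_prev: SNR of the serving transmitter in slot t - 1.
    a, b: scaling constants (0.1 in the simulation setup).
    """
    delta = abs(snr_coop_now - snr_serv_prev)  # assumed: magnitude of the change
    hom = a * delta                            # HOM grows with the SNR change
    wt = b / max(delta, eps)                   # WT shrinks with the SNR change
    return hom, wt
```

For example, an SNR change of 5 gives HOM = 0.5 and WT = 0.02, while a change of 10 doubles HOM and halves WT, which matches the intended behavior for fast-moving users.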

C. COMPUTATIONAL COMPLEXITY ANALYSIS OF ALGORITHM 2
In this subsection, we present the computational complexity analysis of Algorithm 2 by the following proposition.
Proposition 1: The computational complexity of Algorithm 2 is of order O (tPL), in which t represents the total number of time-slots.
Proof: Regarding the pseudocode in Algorithm 2, there is only one ''for'' loop. More precisely, the computational complexity of Algorithm 2 mainly depends on line 10 of the algorithm, where $\gamma_{T_v^n,u_m}$ is computed. To this end, we first obtain the received signal at user $u_m$, whose computational complexity is of order $O(PL)$. Then, we apply the Tarokh scheme to the received signal, as expressed in lines 10 and 11 of Algorithm 2, and its computational complexity is of order $O(PL)$. After detecting the signal transmitted from the VLC transmitter, $\gamma_{T_v^n,u_m}$ is calculated as in Eq. (12), whose computational complexity is of order $O(PL + P)$. Accordingly, the overall computational complexity related to line 10 of Algorithm 2 is $O(3PL + P)$.
The corresponding computational complexity for the other lines is of order $O(1)$. Let $t$ denote the total number of time slots in the ''for'' loop. Accordingly, the total computational complexity of Algorithm 2 is obtained by multiplying the aforementioned complexity terms by $t$, i.e., $O(3tPL + tP)$, or equivalently, of order $O(tPL)$.

V. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed indoor localization algorithm and compare it with existing fingerprinting algorithms, namely, Multi-Layer Perceptron (MLP) [22], Artificial Neural Network (ANN), WKNN [19], and KRRML [18], in terms of positioning accuracy. Furthermore, we compare the virtual CoMP-based handover algorithm with two other algorithms, i.e., soft handover [37] and hard handover [38].

A. SIMULATION SETUP
We consider a smart living room of size $5 \times 4 \times 3\,\text{m}^3$ containing $N_T^v = 4$ VLC transmitters. They are positioned on the ceiling, where each transmitter has $L = 4$ LEDs spaced evenly at a distance of 5 centimeters. The coordinates of the center of each VLC transmitter are shown in Table 2. Each LED has a half-power semi-angle of $\theta_{1/2} = 60^\circ$, and the Lambertian factor, defined as $w = -\ln(2)/\ln(\cos(\theta_{1/2}))$, is set to $w = 2$. In addition, each user $u_m$, as a receiver, has $P = 4$ PDs positioned on the smart device, where each PD has a Field of View (FoV) of $85^\circ$ and an area of $A_r = 1\,\text{cm}^2$. Moreover, the receiver (e.g., a smartphone) is equipped with a gyroscope that measures the angle between the device and the floor. The noise characteristics of the VLC-based environment, along with the other simulation parameters, are listed in Table 2.
Regarding the mobility of the users, we consider the Random Way Point (RWP) model described in (28), where the user moves with average velocity $v_u = 2$ m/s in a direction in the range $[-\pi, \pi]$ for 1000 seconds. Although the magnitude of the speed is constant, the direction of the user's movement is time-variant, and the walking time in a specific direction is a random variable in the range $[1, 2]$ seconds. Unlike the existing literature, we propose a CoMP-based handover process in which HOM and WT are dynamically set whenever the handover is triggered. In other words, HOM is calculated as the rate of change in the SNR value over two successive time slots multiplied by $a = 0.1$. In addition, WT is computed as the inverse of the rate of change in the SNR value over two consecutive time slots multiplied by $b = 0.1$ [37], [38].
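The mobility pattern described above can be sketched as follows. This is only an illustration of the stated parameters (constant speed of 2 m/s, heading uniform in $[-\pi, \pi]$, leg duration uniform in $[1, 2]$ seconds, $5 \times 4$ m floor). The function name, the time step `dt`, the starting point at the room center, and the clamping of positions at the walls are assumptions not specified in the text.

```python
import math
import random


def simulate_walk(total_time=1000.0, speed=2.0, room=(5.0, 4.0), dt=0.1, seed=0):
    """Return a list of (x, y) positions sampled every dt seconds."""
    rng = random.Random(seed)
    n_steps = round(total_time / dt)
    x, y = room[0] / 2, room[1] / 2  # assumed start: center of the floor
    path = [(x, y)]
    while len(path) <= n_steps:
        heading = rng.uniform(-math.pi, math.pi)               # new direction
        leg_steps = max(1, round(rng.uniform(1.0, 2.0) / dt))  # 1 to 2 s leg
        for _ in range(leg_steps):
            if len(path) > n_steps:
                break
            # Move at constant speed and clamp to the room (assumed behavior).
            x = min(max(x + speed * math.cos(heading) * dt, 0.0), room[0])
            y = min(max(y + speed * math.sin(heading) * dt, 0.0), room[1])
            path.append((x, y))
    return path
```

Calling `simulate_walk(total_time=10.0)` yields 101 samples, each inside the room and no farther than `speed * dt` from the previous one.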

B. EVALUATION OF PROPOSED INDOOR LOCALIZATION ALGORITHM
Considering the aforementioned setup, we first simulate the proposed indoor localization algorithm and evaluate its performance by measuring the mean positioning error on the test set. We use Algorithm 1 to generate the required data sets, i.e., the training set and the test set. Then, we simulate the proposed CNN architecture for indoor localization shown in Fig. 2.
All the data sets in our simulation are collected using MATLAB, while deep learning and machine learning packages in Python, namely, Keras and scikit-learn [39], are used to train the estimation models. To generate the training set in the offline stage, we divide the smart living room into equal-sized elements of $d_x \times d_y \times d_z$, and save the 3D coordinates of the center of each element in $y^{(k)}$, as described in lines 1-12 of Algorithm 1. In addition, we randomly choose 10 percent of the training set as the validation set, which is used to tune the hyperparameters of the model. Among the different patterns for generating the test set, we choose the interpolation between the training points, as it provides the worst-case scenario for evaluating the algorithm: these points have the maximum mean distance from the adjacent training samples. Fig. 4(a) illustrates the training and test points on the X-Y plane of a 1 m × 1 m area at the corner of the room at a height of around $z = 1$ m with $d_x = d_y = d_z = 20$ cm, while Fig. 4(b) shows the same points from the X-Z plane view.

The input features of the training and test samples are calculated according to (6) and (16), and stored in $X_{train}$ and $X_{test}$, respectively (line 29 of Algorithm 1). In addition, $Y_{train}$ and $Y_{test}$, which include the 3D coordinates of the samples, are obtained according to line 30 of Algorithm 1. Finally, two data sets, denoted by $D_{train}$ and $D_{test}$, are formed (line 31). The training set, $D_{train}$, is given to a CNN consisting of five convolutional layers followed by a max pooling layer and a fully-connected layer with 256 neurons. The structure of the CNN employed in the proposed indoor localization algorithm is given in Table 3. The hyperparameters of the proposed model, i.e., the number of convolutional and hidden (fully-connected) layers, are selected after testing several values, since too many layers might lead to overfitting, while too few layers result in a low degree of freedom and a high positioning error.
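The grid construction can be sketched in plain Python as follows. This is an illustration only: the function names are hypothetical, and placing each test point at the common corner of eight neighboring elements (i.e., equidistant from the eight surrounding training centers) is one assumed reading of "interpolation between the training points".

```python
import math
from itertools import product


def element_centers(room=(5.0, 4.0, 3.0), d=0.2):
    """Centers of the equal-sized d x d x d elements that tile the room."""
    counts = [round(side / d) for side in room]
    return [
        tuple((i + 0.5) * d for i in idx)
        for idx in product(*(range(n) for n in counts))
    ]


def interpolated_test_points(room=(5.0, 4.0, 3.0), d=0.2):
    """Interior element corners, each equidistant from 8 training centers."""
    counts = [round(side / d) for side in room]
    return [
        tuple((i + 1) * d for i in idx)
        for idx in product(*(range(n - 1) for n in counts))
    ]
```

With $d_x = d_y = d_z = 20$ cm, this yields $25 \times 20 \times 15 = 7500$ training centers, and every test point lies $\sqrt{3} \times 10 \approx 17.3$ cm from its nearest training centers.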
For the training process, we use the well-known error backpropagation technique. Moreover, the batch size and the number of epochs are set to 32 and 30, respectively. L2 regularization is used to avoid overfitting in the proposed model. Fig. 3 shows the learning curve of the proposed CNN model. As can be seen from the figure, the loss function, i.e., the mean square error, for both the training set and the validation set converges to a fixed and small value, which indicates that the CNN model generalizes well.
Then, the mean 3D positioning error of the test set is calculated as follows:

$$\bar{e} = \frac{1}{N_{test}} \sum_{k=1}^{N_{test}} \left\| y^{(k)} - \hat{y}^{(k)} \right\|_2,$$

where $y^{(k)}$ and $\hat{y}^{(k)}$ denote the true and estimated 3D coordinates of the $k$-th test sample, and $N_{test}$ is the number of test samples.

In order to evaluate the proposed algorithm, we compare the CNN model with four common localization methods, i.e., MLP [22], ANN, WKNN [19], and KRRML [18]. For the sake of a fair comparison, we use the same training and test data sets ($d_x = d_y = d_z = 5$ cm). As shown in Fig. 6, the proposed indoor localization algorithm outperforms the other four schemes in terms of position error, with a mean positioning error of 4.31 centimeters. Among the other four methods, MLP has the lowest mean localization error.

Figure 7 illustrates the Probability Distribution Function (PDF) of the positioning error for the proposed VLC indoor localization algorithm and the alternative techniques. As shown in this figure, the proposed CNN architecture for 3D indoor localization reaches sub-decimeter positioning error for a majority of test samples. More precisely, 95.7% of the test samples have a 3D positioning error of less than 10 centimeters with the proposed CNN-based indoor localization algorithm, while 50.0% of the test samples lie within a 10 centimeter error with the MLP method. This percentage is 15.1% and 24.2% for the ANN and WKNN schemes, respectively. Additionally, none of the test samples reach sub-decimeter accuracy with the KRRML algorithm. The mean and standard deviation of the 3D positioning error are reported in Table 4.
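The two metrics used in this comparison, the mean 3D positioning error and the share of test samples with sub-decimeter error, can be computed as in the sketch below. The function names and the toy coordinates in the usage note are hypothetical and are not results from this paper; coordinates are assumed to be in meters.

```python
import math


def positioning_errors(true_points, estimated_points):
    """Euclidean 3D error of each test sample."""
    return [math.dist(p, q) for p, q in zip(true_points, estimated_points)]


def summarize(errors, threshold=0.10):
    """Return (mean error, fraction of samples with error below threshold)."""
    mean_error = sum(errors) / len(errors)
    fraction_below = sum(e < threshold for e in errors) / len(errors)
    return mean_error, fraction_below
```

For two toy samples with errors of 5 cm and 20 cm, `summarize` returns a mean error of 12.5 cm and a sub-decimeter fraction of 0.5.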
We also compare the complexity of the proposed indoor localization CNN algorithm presented in the block diagram of Figure 2 with the MLP [22], ANN, KRRML [18], and WKNN [19] methods. The indoor localization algorithm contains offline and online stages. Similar to [22], we only consider the online stage for the evaluation of the complexity, because the offline stage is a one-time run and occurs in the CCM, while the online stage is what the users employ every time they want to be localized. Note that Algorithm 1 is not considered in our computations since it is a part of the offline stage. Moreover, Algorithm 1 is used only to generate the data required to train and evaluate the algorithms, including the proposed algorithm and the alternative methods (i.e., MLP, ANN, WKNN, and KRRML). A suitable criterion for comparing the complexity of deep learning and machine learning techniques is the computation time. Table 5

C. HANDOVER SIMULATIONS
In this part, we evaluate the handover process during the user's movement according to the proposed indoor localization algorithm. In Fig. 8, we investigate the BER of the proposed CoMP-based handover versus the average SNR in the range [0, 30] dB for different schemes. Besides, we apply the turbo code as a fixed-rate code and the fountain code in our proposed scheme. Compared to the conventional CoMP-based soft handover and the hard handover, the BER of the proposed dynamic handover is improved by at least 25% and 12%, respectively. This superiority comes from the fact that the proposed algorithm substantially prevents the ping-pong effect during the handover process. Compared to the CoMP-based soft handover algorithm, the SNR values vary considerably during the hard handover process, leading to an increase in BER [40]. In addition, the proposed algorithm outperforms the soft handover process because we dynamically set HOM and WT based on the status of the VLC channels in successive time slots, while in the conventional CoMP-based soft handover, the user connects to two VLC transmitters simultaneously only if the corresponding handover inequality is satisfied; otherwise, the user connects to only one VLC transmitter [37]. Accordingly, we propose an algorithm to handle the handover process without considering the users' movement speed, and if the VLC channels are inaccessible, the user connects to the mmWave RF transmitter.

FIGURE 8. Bit error rate versus the average SNR in hard handover [38], conventional CoMP-based soft handover [31], and dynamic CoMP-based soft handover with or without applying coding.

FIGURE 9. Average effective throughput versus the average SNR in hard handover [38], conventional CoMP-based soft handover [31], and dynamic CoMP-based soft handover with or without applying coding.

It is shown that the BER can be improved significantly by applying the fountain code compared to the other two cases: i) without applying coding, and ii) with the turbo code. To implement the fountain code, we first determine the degree distribution of each input signal. Then, the matrix $G$ is generated, where the length of the input signals ($K = 1000000$), the length of the output signals $N$, and the tolerable error probability ($\delta = 0.3$) are the inputs for generating $G$. Then, the input signals are multiplied by the matrix $G$ to encode the signal, with modulo-2 addition applied during the matrix multiplication. At the receiver, the algorithm finds a received signal with degree 1, decodes it, and then removes it as a neighbor from all symbols that have it as a neighbor. This process is performed iteratively until all the received signals are decoded. If the size of the encoded signal is not sufficient, the decoding process fails. As shown in Fig. 8, the proposed algorithm with the fountain code outperforms conventional fixed-rate codes, such as turbo codes.

In Fig. 9, the average effective throughput of the dynamic soft handover versus the average SNR varying from 0 dB to 30 dB is evaluated. As shown, the average effective throughput of the proposed algorithm increases as the average SNR increases. In fact, increasing the average SNR value of the LEDs increases the received SNR at the PDs. Thus, the packet success probability improves, and hence, the average effective throughput improves substantially.
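The iterative decoding procedure described above (find a degree-1 received signal, decode it, remove it from its neighbors, repeat) corresponds to a peeling decoder. The sketch below is a minimal, assumed illustration on integer-valued symbols, with XOR playing the role of modulo-2 addition; the function name and the data layout are hypothetical, and the degree distribution and the generator matrix $G$ are not modeled.

```python
def peel_decode(num_source, received):
    """Peeling decoder for an XOR-based fountain code.

    received: list of (neighbors, value) pairs, where neighbors is the set of
    source-symbol indices combined (by XOR) into the encoded value.
    Returns the decoded source symbols; unrecovered entries stay None.
    """
    decoded = [None] * num_source
    pending = [[set(neighbors), value] for neighbors, value in received]
    progress = True
    while progress:
        progress = False
        for item in pending:
            neighbors = item[0]
            # Remove already-decoded symbols from this encoded symbol.
            for idx in [i for i in neighbors if decoded[i] is not None]:
                neighbors.discard(idx)
                item[1] ^= decoded[idx]
            # A degree-1 symbol reveals one source symbol directly.
            if len(neighbors) == 1:
                idx = neighbors.pop()
                decoded[idx] = item[1]
                progress = True
    return decoded
```

For the source symbols `[5, 9, 12]`, the received list `[({0, 1}, 5 ^ 9), ({1, 2}, 9 ^ 12), ({0}, 5)]` decodes completely, whereas dropping the degree-1 entry leaves every symbol unrecovered, which illustrates the failure case when too few encoded signals are received.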
For completeness, we investigate HOM and WT before the handover process versus the velocity of the users, varying from 1 m/s to 5 m/s, in Fig. 10. As seen, an increase in the users' speed leads to a higher HOM and a lower WT for triggering the handover process. In fact, whenever the users move rapidly, WT should be low; otherwise, the connection between the user and the VLC transmitter is dropped for a long time. In addition, HOM should be set to a high value because the received SNR at the smart device changes significantly at high speed, and a low value of HOM causes successive handovers. By a similar argument, HOM and WT should be set to a low and a high value, respectively, when the users move at low speed. To address this issue, we propose an algorithm that dynamically adjusts HOM and WT based on the rate of change in the SNR value to prevent the ping-pong effect or SNR reduction during the handover process. Hence, HOM and WT can be adjusted according to the velocity and the status of the VLC channels.
The performance is sensitive to the values of $a$ and $b$. Therefore, they should be selected such that the BER is minimized during the handover process.

VI. CONCLUSION
This paper considered a smart indoor environment with multiple LEDs placed on the ceiling used as VLC transmitters, and a mmWave AP used as a complementary uplink technology. The users utilized the PDs on their smartphones as VLC receivers. We proposed a CNN-based indoor localization algorithm that uses the LoS channel gain and the ToA, as well as the angular position obtained from the gyroscope sensor on the smart device, to estimate the 3D location of the user. Moreover, a virtual soft handover based on CoMP transmission was presented, where HOM and WT were set during the user's movement according to changes in the SNR value. We derived a closed-form expression for the average effective throughput. Finally, simulation results showed an average positioning error of 4.31 centimeters, while 95.7% of the test points had a positioning error of less than 10 centimeters. One possible direction for future work is to address more realistic conditions, such as shadowing and blind spots, and their effects on the accuracy of indoor localization and the handover process in a hybrid RF/VLC system. Toward this goal, one heuristic solution is to employ Reconfigurable Intelligent Surfaces (RISs) in the proposed system model.