Enhanced Multi-Level Multi-Pulse Modulation for MIMO Visible Light Communication

Any realistic indoor multiple-input multiple-output (MIMO) visible light communication (VLC) system should provide reliable high-rate transmission and should also support flicker-free, dimmable, and uniform illumination. In this article, we propose a multi-level multi-pulse modulation for MIMO VLC systems that provides a high data rate with flicker-free, uniform illumination. We also present a low-complexity detector and a search algorithm for designing the transmitted symbol set. Simulation results demonstrate that the proposed detector reduces computational complexity remarkably, without performance degradation, compared with the conventional maximum likelihood detector. Moreover, the proposed symbol design algorithm reduces the symbol error rate (SER), especially in highly correlated MIMO VLC channels.


I. INTRODUCTION
Recently, there has been steadily increasing interest in visible light communication (VLC) as a strong candidate to replace conventional lighting systems around the world [1]. Because it integrates communication and lighting, VLC is regarded as a promising technique for future wireless networks, especially in indoor environments. Since intensity modulation (IM) is used, the transmitted signals must be real-valued and non-negative, and they directly drive the light-emitting diodes (LEDs). At the receivers, direct detection (DD) is employed [2]-[5].
On-off keying (OOK), a single-carrier IM/DD scheme, is widely employed in VLC systems because of its low complexity and cost [6]. Pulse position modulation (PPM), a pulse modulation technique with very good average-power efficiency, is also widely used in VLC [7]. However, PPM requires higher bandwidth and complexity than OOK. Among the improved modulation techniques for single-input single-output (SISO) VLC systems, a multi-level pulse position scheme called energy-constrained slot-amplitude modulation (ECSAM) has been proposed [8]. A main advantage of ECSAM is that it provides dimming and flicker-free modulation with a fixed average transmission power. However, this scheme is restricted to the SISO case, which is a major disadvantage, since multiple-input multiple-output (MIMO) is prevalent in VLC systems.
The associate editor coordinating the review of this manuscript and approving it for publication was Liangtian Wan.
Optical MIMO communication systems exploit the spatial degree of freedom [9]. One such system is space shift keying (SSK), which activates only one LED during each period of time and therefore limits inter-channel interference while still providing a high data rate and low-complexity detection [10]. Spatial modulation (SM) [11], which combines MIMO with pulse amplitude modulation (PAM), has been proposed for VLC systems [12]. Spatial pulse position modulation (SPPM), an advanced modulation technique for optical communication, combines SSK with PPM [13]; that is, it conveys a binary sequence through both the index of the active LED and the pulse position of the transmitted signal. The performance of SPPM under synchronization errors and over indoor multipath channels was analyzed in [14] and [15], respectively. More recently, spatial multiple pulse position modulation (SMPPM) has been proposed, which allows multiple active pulse slots [16], [17]. However, since the pulse amplitudes in these schemes take only the values one and zero, SPPM and SMPPM do not fully exploit the pulse amplitude for transmission.
To maintain stable and realistic illumination, modulation and coding techniques for MIMO VLC systems should preserve critical lighting-related features such as flicker mitigation, dimming support, and uniform illumination. Even though ECSAM [8] provides strong dimming capability and flicker-free modulation, it is restricted to the SISO case. The authors of [18] proposed a coding scheme for uniform illumination in MIMO VLC systems; however, their uniform illumination condition [18] was defined as equal total transmitted power among all LEDs over the entire transmission time. This constraint can be readily relaxed in MIMO SPPM by restricting the total power transmitted over all time slots and all LEDs within each symbol, whose duration is shorter than the eye's integration time [19]. Moreover, since lighting is an inherent feature of any VLC system, a MIMO VLC system should provide high-quality illumination with uniform irradiance among all LEDs during each symbol time.
In this article, we propose a novel scheme, namely enhanced multi-level multi-pulse modulation (EMMM), which provides flicker-free, uniform illumination with dimming control capability during each symbol in MIMO VLC systems. Specifically, the flicker-free condition is guaranteed by keeping the total transmitted power constant during each symbol, where the symbol length is chosen to be less than the human eye's integration time [8]. The uniform illumination condition is achieved by equalizing the total power among LEDs. Furthermore, by varying the pulse amplitude instead of using only one and zero as in SPPM, the proposed scheme achieves better performance at a given transmission rate, since it can enlarge the minimum Euclidean distance (ED) between any two transmitted symbols under the same transmission conditions. We also propose a sphere-based detector that significantly reduces computational complexity while maintaining the same performance as the traditional maximum likelihood (ML) detector. In addition, a search algorithm is proposed for the symbol design problem to obtain a good symbol set. Simulation results show that the proposed scheme achieves a significant performance gain over prior schemes, that the symbol design algorithm improves the symbol error rate (SER), and that the low-complexity detector remarkably reduces the detection complexity while maintaining the performance of an ML detector. In summary, the key contributions of this article are as follows.
• An EMMM scheme for MIMO VLC is proposed to provide flicker-free and uniform illumination, which are of utmost importance in any joint communication and lighting system.
• The proposed scheme is equipped with a low-complexity sphere-based detection algorithm that uses candidate ordering and early termination.
• A symbol design algorithm is also proposed to improve the performance of the proposed scheme.
• Comparison results between the proposed EMMM and conventional schemes are provided to demonstrate the advantages of the proposed scheme.

The rest of this article is organized as follows. In Section II, we describe the system model and the constraints on the symbol design. The low-complexity detector is detailed in Section III. In Section IV, the symbol design algorithm is presented. In Section V, the symbol design and the detector are evaluated numerically, including the SER performance and the complexity of the proposed detector. Finally, we offer our conclusions in Section VI.

II. SYSTEM MODEL
Consider a MIMO VLC system equipped with $N_t$ transmit units (LEDs) and $N_r$ receive units (PDs), as shown in Fig. 1. For each LED, a symbol of duration $T$ is divided into $L_s$ time slots, each of duration $T_p = T/L_s$. In each time slot, each LED transmits a PAM level from $\{0, s, 2s, \ldots, a_{\mathrm{PAM}} s\}$, where $a_{\mathrm{PAM}}$ is a positive integer representing the maximum PAM level, $P_t$ is the LED peak power, and $s = P_t/a_{\mathrm{PAM}}$ is a positive value denoting the resolution of the PAM constellation. With a transmission rate of $R$ bit/s/Hz, a set of $M = 2^R$ symbols $X = \{C_1, C_2, \ldots, C_M\}$ is conveyed, where the $k$-th symbol $C_k$, for $k = 1, 2, \ldots, M$, can be represented by the $N_t \times L_s$ matrix
$$C_k = s\begin{bmatrix} c_k^{(1,1)} & c_k^{(1,2)} & \cdots & c_k^{(1,L_s)} \\ \vdots & \vdots & \ddots & \vdots \\ c_k^{(N_t,1)} & c_k^{(N_t,2)} & \cdots & c_k^{(N_t,L_s)} \end{bmatrix}, \tag{1}$$
where $s c_k^{(i,j)}$ is the signal transmitted by LED $i$ in time slot $j$ of symbol $C_k$ and $c_k^{(i,j)} \in \{0, 1, 2, \ldots, a_{\mathrm{PAM}}\}$. Furthermore, the set of all possible signal vectors $c^{(j)}$ is $\{(0, \ldots, 0)^T, (1, \ldots, 0)^T, \ldots, (a_{\mathrm{PAM}}, \ldots, a_{\mathrm{PAM}})^T\}$. Notice that, for a fixed value of $M$, the values of $s$ and $P_t$ can be used to adjust the dimming level. Since dimming control is not the main focus of this research, we assume without loss of generality that $s = 1$, so that $P_t = a_{\mathrm{PAM}}$.
In VLC systems, there are two flicker problems: intra-symbol flicker and inter-symbol flicker. Intra-symbol flicker results from brightness discrepancies between the light patterns emitted within a transmitted symbol of duration $T$. Since illumination is perceived by the human eye, to avoid intra-symbol flicker we choose, according to Bloch's law, a symbol duration $T$ shorter than the human eye perception duration of 20 ms [20], so that any change of luminance during $T$ cannot be perceived. Inter-symbol flicker, on the other hand, is caused by the average brightness discrepancy between symbols; it is avoided by requiring every symbol to carry the same total power for each LED, i.e.,
$$\sum_{j=1}^{L_s} c_k^{(i,j)} = \gamma_i, \quad i = 1, \ldots, N_t,\; k = 1, \ldots, M. \tag{2}$$

Furthermore, in VLC systems, uniform illumination is a key consideration for user comfort at any location. Therefore, to guarantee uniform illumination among all LEDs, the total power emitted by the LEDs should be identical, which means that $\gamma_1 = \gamma_2 = \cdots = \gamma_{N_t} = \gamma$ during each symbol interval $T$. Therefore, (2) can be replaced with
$$\sum_{j=1}^{L_s} c_k^{(i,j)} = \gamma, \quad i = 1, \ldots, N_t,\; k = 1, \ldots, M. \tag{3}$$
For each LED, $K(\gamma, L_s, M)$ is the number of amplitude combinations [8] that satisfy constraint (3), and it can be calculated as in [8]. Therefore, combining all LEDs, the total number of available symbols that satisfy constraints (2) and (3) is $[K(\gamma, L_s, M)]^{N_t}$, resulting in $R = \log_2 [K(\gamma, L_s, M)]^{N_t}$ bits per symbol time.

The achievable spectral efficiency of EMMM is then $R/L_s$ bit/s/Hz, whereas SMPPM, SPPM, and SSK achieve $\log_2\!\big(C(L_s, L_a)\, N_t^{L_a}\big)/L_s$, $\log_2(N_t L_s)/L_s$, and $\log_2 N_t$ bit/s/Hz, respectively, where $L_a$ denotes the number of active time slots in SMPPM [17]. With only four time slots, the proposed scheme can achieve a spectral efficiency of 4 bit/s/Hz (e.g., with $N_t = 4$, $\gamma = 3$, and $a_{\mathrm{PAM}} = 2$, as in Section IV, $K = 16$ and $R = 16$), much higher than the maximum of 2.5 bit/s/Hz of the SMPPM scheme [17] with a total of 32 time slots. This gives the proposed scheme an advantage over other schemes in hardware and computational cost.
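As a quick numerical check of constraint (3) and of the rate expressions above, the Python sketch below enumerates, for one LED, all slot-amplitude vectors with entries in $\{0, \ldots, a_{\mathrm{PAM}}\}$ that sum to $\gamma$, and compares that count with the inclusion-exclusion closed form $K = \sum_{q} (-1)^q \binom{L_s}{q}\binom{\gamma - q(a_{\mathrm{PAM}}+1) + L_s - 1}{L_s - 1}$, summed over $q$ with $\gamma - q(a_{\mathrm{PAM}}+1) \ge 0$. That closed form is our assumption about the count referred to in [8] (it treats each slot amplitude as bounded by $a_{\mathrm{PAM}}$), and the function names are illustrative only.

```python
from itertools import product
from math import comb, log2


def per_led_vectors(gamma, L_s, a_pam):
    """Enumerate one LED's slot amplitudes: L_s integers in {0, ..., a_pam}
    that sum to gamma, i.e. the per-LED constraint (3)."""
    return [v for v in product(range(a_pam + 1), repeat=L_s) if sum(v) == gamma]


def count_k(gamma, L_s, a_pam):
    """Closed-form count of the same vectors by inclusion-exclusion
    (assumed form; q is the number of slots forced above a_pam)."""
    total = 0
    for q in range(L_s + 1):
        rest = gamma - q * (a_pam + 1)
        if rest < 0:
            break
        total += (-1) ** q * comb(L_s, q) * comb(rest + L_s - 1, L_s - 1)
    return total


def emmm_efficiency(gamma, L_s, a_pam, n_t):
    """Return (R in bits per symbol, R / L_s in bit/s/Hz) with R = log2(K^N_t)."""
    r_bits = n_t * log2(count_k(gamma, L_s, a_pam))
    return r_bits, r_bits / L_s


def smppm_efficiency(L_s, L_a, n_t):
    """SMPPM spectral efficiency log2(C(L_s, L_a) * N_t^L_a) / L_s, as quoted above."""
    return log2(comb(L_s, L_a) * n_t ** L_a) / L_s


def sppm_efficiency(L_s, n_t):
    """SPPM spectral efficiency log2(N_t * L_s) / L_s, as quoted above."""
    return log2(n_t * L_s) / L_s


if __name__ == "__main__":
    # Parameters of the Section IV example: N_t = 4, L_s = 4, gamma = 3, a_PAM = 2.
    print(count_k(3, 4, 2), len(per_led_vectors(3, 4, 2)))
    print(emmm_efficiency(3, 4, 2, 4))
    print(sppm_efficiency(4, 4), smppm_efficiency(4, 2, 4))
```

With $N_t = 4$, $L_s = 4$, $\gamma = 3$, and $a_{\mathrm{PAM}} = 2$, both counts give $K = 16$, so $R = 16$ bits per symbol and $R/L_s = 4$ bit/s/Hz, consistent with the figures quoted in the text.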
Moreover, since only $2^R$ symbols are used in transmission, they could simply be selected at random from the set of all available symbols. However, a particular symbol set can degrade system performance if two of its symbols are separated by a relatively small ED. Therefore, a search algorithm is needed to improve the symbol set used at the transmitter, yielding better performance than the conventional random selection.
For a specific transmitted symbol $C$, the received electrical signal $y(t)$ within a single symbol duration $T$, which spans time slots $1 \le j \le L_s$, is given as
$$y(t) = \kappa\xi H \sum_{j=1}^{L_s} u^{(j)}(t) + n(t), \quad 0 \le t \le T, \tag{5}$$
where $\kappa$ and $\xi$ are the LED conversion factor and the responsivity of the PDs, respectively. The noise term $n(t)$ is assumed to be additive white Gaussian noise (AWGN) with zero mean and variance $\sigma_n^2$. The $N_r \times N_t$ matrix $H$ is the line-of-sight (LOS) channel gain matrix between the LEDs and the PDs, where $h_{a,b}$ denotes the channel gain between LED $b$ and PD $a$. In this article, we adopt a commonly used channel model for indoor VLC [2], [21], in which each LED has a Lambertian emission pattern and the LOS link between transmitter and receiver is considered. Specifically, the channel gain $h_{a,b}$ can be computed as
$$h_{a,b} = \begin{cases} \dfrac{(m+1)A_p}{2\pi d_{a,b}^2}\cos^m(\phi_{a,b})\cos(\theta_{a,b}), & 0 \le \theta_{a,b} \le \theta_{\mathrm{fov}}, \\ 0, & \theta_{a,b} > \theta_{\mathrm{fov}}, \end{cases} \tag{6}$$
where $\phi_{a,b}$ is the emission angle from the $b$-th LED to the $a$-th PD, $\theta_{a,b}$ is the angle of incidence at the $a$-th PD due to the $b$-th LED, $A_p$ is the PD area, $\theta_{\mathrm{fov}}$ is the PD's field-of-view semi-angle, $d_{a,b}$ is the distance between PD $a$ and LED $b$, and $m$ is the order of the generalized Lambertian radiation. The half-power semi-angle $\Phi_{1/2}$ is related to $m$ as $m = -\ln 2/\ln(\cos\Phi_{1/2})$. During the transmission time $0 \le t \le T$, the $(N_t \times 1)$-dimensional vector of transmit signals of time slot $j$, $u^{(j)}(t)$ [14], is defined as
$$u^{(j)}(t) = s\, c^{(j)} \operatorname{rect}\!\left(\frac{t - (j-1)T_p}{T_p}\right), \tag{7}$$
where $c^{(j)} = \big[c^{(1,j)}\; c^{(2,j)}\; \cdots\; c^{(i,j)}\; \cdots\; c^{(N_t,j)}\big]^T$ is the vector of signals transmitted by all LEDs in time slot $j$. The rectangular function is defined as
$$\operatorname{rect}(t) = \begin{cases} 1, & 0 \le t < 1, \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$
Therefore, the received signal in (5) during time slot $j$, i.e., for $(j-1)T_p \le t < jT_p$, can be represented as
$$y(t) = \kappa\xi s H c^{(j)} + n(t). \tag{9}$$
Finally, correlator-based detection [13] is used: the received signal is passed through a bank of $L_s$ parallel cross-correlators, and the output for each of the $L_s$ time slots of the transmitted symbol is obtained by sampling at the chip rate $1/T_p$.
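The Lambertian LOS model for $h_{a,b}$ can be sketched in Python as follows. The sketch assumes that every LED faces straight down and every PD faces straight up (so the emission and incidence angles share the same cosine), that positions are given in metres, and, as a placeholder, a 60° field of view; the function names are hypothetical.

```python
import math


def lambertian_order(half_power_semi_angle_deg):
    """Lambertian order m = -ln 2 / ln(cos Phi_1/2)."""
    return -math.log(2) / math.log(math.cos(math.radians(half_power_semi_angle_deg)))


def los_gain(led, pd, area_m2, half_power_deg, fov_deg):
    """LOS gain h_{a,b} from an LED at `led` to a PD at `pd` ((x, y, z) in metres).

    Assumes the LED points straight down and the PD straight up, so the
    emission angle phi and the incidence angle theta have the same cosine.
    """
    dx, dy, dz = (p - l for p, l in zip(pd, led))
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    cos_angle = min(1.0, abs(dz) / d)              # clamp guards acos against rounding
    if math.degrees(math.acos(cos_angle)) > fov_deg:
        return 0.0                                 # PD outside its field of view
    m = lambertian_order(half_power_deg)
    return (m + 1) * area_m2 / (2 * math.pi * d ** 2) * cos_angle ** m * cos_angle


def channel_matrix(leds, pds, area_m2=1e-4, half_power_deg=60.0, fov_deg=60.0):
    """N_r x N_t list-of-lists H with H[a][b] = h_{a,b} (LED b to PD a)."""
    return [[los_gain(led, pd, area_m2, half_power_deg, fov_deg) for led in leds]
            for pd in pds]
```

With $A_p = 1$ cm² and $\Phi_{1/2} = 60°$ (so $m = 1$), a PD 2 m directly below an LED gets $h = 2A_p/(2\pi \cdot 4) = A_p/(4\pi)$.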
The outputs of all branches of the correlator receiver during the transmission of symbol $C$ are expressed as
$$Y = \kappa\xi H C + N, \tag{10}$$
where the $N_r \times L_s$ matrix $Y$ represents the signals received at all PDs during one symbol transmission in EMMM, and $C$ is the $N_t \times L_s$ matrix representing the transmitted symbol to which a binary sequence of $R$ bits is mapped. Moreover, $N$ denotes the $N_r \times L_s$ Gaussian noise matrix, whose components are zero-mean uncorrelated Gaussian random variables with variance $\sigma_n^2$. The ED $D(Y, HC)$ between the actual received signal $Y$ at the PDs and a possible transmitted symbol $C$ is
$$D(Y, HC) = \sum_{j=1}^{L_s} \left\| y^{(j)} - \kappa\xi H c^{(j)} \right\|^2, \tag{11}$$
where $\|\cdot\|$ denotes the Euclidean norm and $y^{(j)}$ is the received signal vector in the $j$-th time slot. The probability density function (PDF) of $Y$ conditioned on $C$ can be written as
$$p(Y \mid C) = \frac{1}{(2\pi\sigma_n^2)^{N_r L_s/2}} \exp\!\left(-\frac{D(Y, HC)}{2\sigma_n^2}\right). \tag{12}$$
If the channel matrix is perfectly known at the receiver, the signal vectors transmitted by all LEDs over the $L_s$ time slots are jointly detected by the conventional ML detector, which selects the candidate with the minimum ED to the received signal:
$$\hat{C} = \arg\min_{C_k \in X} D(Y, HC_k). \tag{13}$$
Although the ML detector achieves optimal performance, its detection complexity grows exponentially with the rate $R$, since all $M = 2^R$ candidates must be evaluated.
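A minimal NumPy sketch of the correlator-output model $Y = \kappa\xi HC + N$ and of exhaustive ML detection (13) follows. The random channel and the symbol-set generator are placeholders for illustration (neither the Lambertian channel nor a designed symbol set), $\kappa\xi = 1$ is assumed, and all function names are hypothetical.

```python
import numpy as np
from itertools import product


def random_symbol_set(M, n_t, l_s, gamma, a_pam, rng):
    """M distinct N_t x L_s symbols whose rows satisfy the per-LED constraint (3)."""
    rows = [v for v in product(range(a_pam + 1), repeat=l_s) if sum(v) == gamma]
    chosen = set()
    while len(chosen) < M:
        chosen.add(tuple(int(i) for i in rng.integers(len(rows), size=n_t)))
    return [np.array([rows[i] for i in idx], dtype=float) for idx in sorted(chosen)]


def ed(Y, H, C, kappa_xi=1.0):
    """D(Y, HC) from (11): sum over slots of ||y^(j) - kappa*xi*H c^(j)||^2."""
    return float(np.sum((Y - kappa_xi * H @ C) ** 2))


def ml_detect(Y, H, symbols, kappa_xi=1.0):
    """Exhaustive ML detection (13): evaluates D for all M candidates."""
    return int(np.argmin([ed(Y, H, C, kappa_xi) for C in symbols]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.uniform(0.1, 1.0, size=(4, 4))               # placeholder channel
    X = random_symbol_set(16, 4, 4, gamma=3, a_pam=2, rng=rng)
    sigma_n = 0.2
    Y = H @ X[5] + sigma_n * rng.standard_normal((4, 4))  # Y = kappa*xi*H C + N
    print(ml_detect(Y, H, X))
```

The cost is one full $D(Y, HC_k)$ per candidate, i.e. $M L_s$ per-slot norm evaluations, which is what the detector of Section III tries to avoid.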

III. LOW COMPLEXITY DETECTION
In this section, to remove the redundant complexity of the conventional ML detector, we propose a new detector that applies the sphere decoding (SD) principle to exploit the intrinsic structure of an EMMM signal with $L_s$ independent time slots. Building on ML detection, the well-known idea of SD [22]-[24] is to restrict the search to a hypersphere of a specific radius, thereby reducing the number of search points. Following this intuition, when searching according to (13), the proposed detector only considers the subset $X'$ of the set $X$ of all candidates $C_k$, and the transmitted symbol is estimated as
$$X' = \{C_k \in X : D(Y, HC_k) < r\}, \qquad \hat{C} = \arg\min_{C_k \in X'} D(Y, HC_k), \tag{14}$$
where the predefined and adjustable $r$, called the sphere radius, is initially set to infinity. The proposed detector uses the partial ED to exclude redundant candidates that fail to satisfy the condition of set $X'$. In particular, the minimization problem (13) can be transformed into a search over a graph whose length equals the number of time slots $L_s$, where each stage of the graph corresponds to one time slot $j$. Stage $j$ of the search graph has $w_j$ candidates for the transmitted signal $c^{(j)}$, and the set of candidates at time slot $j$ is denoted as $\Xi^{(j)} = \{c^{(j,1)}, c^{(j,2)}, \ldots, c^{(j,w_j)}\}$. In other words, the set $\Xi^{(j)}$ contains all possible $c^{(j)}$ in time slot $j$. Consequently, each symbol $C_k$ in the set $X = \{C_1, C_2, \ldots, C_M\}$ can be represented by a path from time slot $L_s$ to time slot 1 in the search graph. One example of the search graph is shown in Fig. 2, where each symbol is equivalently represented by a particular search path, such as $c^{(1,2)} \ldots c^{(j,4)} \ldots c^{(L_s-1,\,w_{L_s-1})}\, c^{(L_s,3)}$. For example, path (1) in Fig. 2 is represented by $c^{(1,1)} c^{(2,1)} c^{(3,2)} c^{(4,2)}$.
It can be observed from (11) that each time slot contributes independently to the total ED $D(Y, HC_k)$ between the received signal and the candidate symbol. Therefore, when searching with the symbol candidate $C_k$, the accumulated ED at time slot $j$ can be characterized by the partial ED, which can be computed recursively as
$$D^{(j)}(Y, HC_k) = D^{(j+1)}(Y, HC_k) + \left\| y^{(j)} - \kappa\xi H c_k^{(j)} \right\|^2, \tag{15}$$
with $D^{(L_s+1)}(Y, HC_k) = 0$. Since every term added in (15) is non-negative, the partial ED can only grow as the search moves toward time slot 1. The conventional depth-first SD [25] with radius update and early termination can therefore be used to determine the ML solution. To avoid redundant computation, whenever a path violates the condition of set $X'$, i.e., $D^{(j)}(Y, HC_k) \ge r$, the detection process moves to the sibling candidates at the same time slot $j$ without evaluating the subsequent child candidates. The detection problem can thus be interpreted as finding the symbol $C_k$ whose path has the minimum $D^{(1)}(Y, HC_k)$. Moreover, a new solution $C_k$ is reached when the detector moves from time slot $L_s$ to time slot 1 without early termination, since this happens only when the accumulated $D^{(1)}(Y, HC_k)$ is smaller than the current sphere radius $r$. In this case, $r$ is updated with the new, smaller value $D^{(1)}(Y, HC_k)$. Even though the sphere detection principle [26] brings a significant complexity reduction to the EMMM receiver, the detection process can be improved further by exploiting properties of the signals transmitted by the LEDs. Specifically, in the following subsections, we introduce two modifications that increase the chance of early termination and therefore reduce the search effort of the detector in the EMMM scheme.
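The plain depth-first sphere search described above, without the candidate ordering and improved termination introduced in the following subsections, can be sketched as follows. The sketch assumes each symbol is an $N_t \times L_s$ NumPy array, builds the search graph implicitly by grouping candidates that share the same slot vector $c^{(j)}$, assumes $\kappa\xi = 1$ by default, and counts per-slot ED evaluations as a rough complexity measure; `sphere_detect` is a hypothetical name.

```python
import numpy as np


def sphere_detect(Y, H, symbols, kappa_xi=1.0):
    """Depth-first sphere search over the time-slot search graph.

    Slots are visited from L_s down to 1 (array indices l_s-1 .. 0); the
    partial ED (15) grows slot by slot, and a node is pruned as soon as its
    partial ED reaches the current radius r. Returns (detected index,
    number of per-slot ED evaluations).
    """
    best_idx, radius, evals = None, np.inf, 0

    def expand(slot, partial, members):
        nonlocal best_idx, radius, evals
        # Candidates still on this path, grouped by their vector c^(slot):
        # each distinct vector is one node of Xi^(slot) in the search graph.
        groups = {}
        for k in members:
            groups.setdefault(tuple(symbols[k][:, slot]), []).append(k)
        for col, ks in groups.items():
            evals += 1
            d = partial + float(np.sum((Y[:, slot] - kappa_xi * H @ np.asarray(col)) ** 2))
            if d >= radius:
                continue                      # outside the sphere: skip node and children
            if slot == 0:
                best_idx, radius = ks[0], d   # complete path: shrink the radius
            else:
                expand(slot - 1, d, ks)

    expand(Y.shape[1] - 1, 0.0, list(range(len(symbols))))
    return best_idx, evals
```

Because the partial ED never decreases along a path, pruning at $D^{(j)} \ge r$ cannot discard the ML path, so for distinct symbols (and no exact ties) the result coincides with (13), while the evaluation count never exceeds the $M L_s$ of exhaustive ML.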

A. INITIAL ESTIMATION AND CANDIDATE ORDERING
At any time slot $j$, we propose to order the candidate set $\Xi^{(j)}$ based on an initial estimate, which significantly reduces the number of visited candidates. The computational complexity at time slot $j$ depends on the order in which the candidates in $\Xi^{(j)} = \{c^{(j,1)}, c^{(j,2)}, \ldots, c^{(j,w_j)}\}$ are examined. Correct decisions at the early nodes typically generate smaller partial EDs in (15), resulting in a smaller $r$ and hence more effective early termination. Therefore, by reordering the candidates in each time slot according to a suitable criterion, the detection process can be shortened and fewer subsequent iterations are required to reach the final solution in the search graph. Here, we propose to reorder the set $\Xi^{(j)}$ at each time slot $j$ based on an initial estimate of $c^{(j)}$.
We define the set
$$\Pi^{(j)} = \left\{ D\big(Hc^{(j,\upsilon)}, Hc^{(j,\upsilon')}\big) = \kappa\xi \left\| H\big(c^{(j,\upsilon)} - c^{(j,\upsilon')}\big) \right\|^2 \;\middle|\; c^{(j,\upsilon)}, c^{(j,\upsilon')} \in \Xi^{(j)},\; c^{(j,\upsilon)} \neq c^{(j,\upsilon')} \right\},$$
which contains the EDs between all pairs $c^{(j,\upsilon)}, c^{(j,\upsilon')} \in \Xi^{(j)}$. The first step of the proposed detector is to find an initial estimate of the transmitted signal from the received signal $Y$ using a linear detection method, such as zero forcing (ZF) or minimum mean square error (MMSE) [27]. Specifically, the MMSE estimate $\tilde{C} = \big[\tilde{c}^{(1)}\, \tilde{c}^{(2)} \cdots \tilde{c}^{(j)} \cdots \tilde{c}^{(L_s)}\big]$ is obtained by applying the linear MMSE filter [27] to $Y$. Then, from the initial estimate $\tilde{c}^{(j)}$, a simple low-complexity quantizer [28] finds the nearest integer vector, $\hat{c}^{(j)} = Q(\tilde{c}^{(j)})$, which rounds $\tilde{c}^{(j)}$ to $\hat{c}^{(j)} \in \Xi^{(j)}$. In the proposed ordering step, at each time slot $j$, the candidates $c^{(j,\upsilon)}$ are sorted in increasing order of $D\big(H\hat{c}^{(j)}, Hc^{(j,\upsilon)}\big)$, i.e.,
$$D\big(H\hat{c}^{(j)}, Hc^{(j,1)}\big) \le \cdots \le D\big(H\hat{c}^{(j)}, Hc^{(j,\upsilon)}\big) \le \cdots \le D\big(H\hat{c}^{(j)}, Hc^{(j,w_j)}\big).$$
In other words, the candidate with the smallest ED to the estimate $\hat{c}^{(j)}$ is searched first at time slot $j$, while the candidate with the largest ED to $\hat{c}^{(j)}$ is moved to the last position and searched last.
Consequently, with the help of the estimate $\hat{c}^{(j)}$ for each time slot $j$, we build a search graph consisting of $L_s$ time slots. At each time slot, the candidates are ordered by ascending ED between $\hat{c}^{(j)}$ and the available candidates $c^{(j,\upsilon)} \in \Xi^{(j)}$. When a parent node is expanded, the child node with the smallest ED to $\hat{c}^{(j)}$ is examined first. Whenever a candidate at time slot $j$ is visited with an accumulated $D^{(j)}(Y, HC_k) \ge r$, this candidate and all of its child candidates are excluded from the search. The search continues until every candidate has been either examined or excluded.
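The initial estimation and ordering step can be sketched in NumPy as below. The MMSE filter is written in its textbook form $\tilde{C} = (H^T H + \sigma_n^2/(\kappa\xi)^2 I)^{-1} H^T Y/(\kappa\xi)$, and the quantizer $Q$ is replaced by a nearest-candidate search in $\Xi^{(j)}$; both are assumptions rather than the exact operations of [27], [28], and the function names are hypothetical. In practice the pairwise distances in $\Pi^{(j)}$ would be precomputed once per channel; here they are recomputed for simplicity.

```python
import numpy as np


def mmse_estimate(Y, H, sigma2, kappa_xi=1.0):
    """Textbook linear MMSE estimate of every slot vector (assumed form):
    C~ = (H^T H + sigma_n^2 / (kappa*xi)^2 I)^-1 H^T Y / (kappa*xi)."""
    n_t = H.shape[1]
    A = H.T @ H + (sigma2 / kappa_xi ** 2) * np.eye(n_t)
    return np.linalg.solve(A, H.T @ Y) / kappa_xi


def ordered_candidates(Y, H, symbols, sigma2, kappa_xi=1.0):
    """Return, for every slot j, the candidates of Xi^(j) sorted by ascending
    D(H c_hat^(j), H c), where c_hat^(j) is the quantized MMSE estimate."""
    C_tilde = mmse_estimate(Y, H, sigma2, kappa_xi)
    ordered = []
    for j in range(Y.shape[1]):
        # Candidate set Xi^(j): distinct slot-j vectors over the symbol set.
        cands = np.array(sorted({tuple(S[:, j]) for S in symbols}), dtype=float)
        # Quantizer Q (assumption): nearest member of Xi^(j) to the MMSE estimate.
        c_hat = cands[np.argmin(np.sum((cands - C_tilde[:, j]) ** 2, axis=1))]
        # kappa*xi*||H(c - c_hat)||^2 for every candidate c, i.e. entries of Pi^(j).
        keys = kappa_xi * np.sum(((cands - c_hat) @ H.T) ** 2, axis=1)
        ordered.append([tuple(cands[i]) for i in np.argsort(keys, kind="stable")])
    return ordered
```

The quantized estimate itself always heads the list (its key is zero), so in the noise-free limit the first path explored is the transmitted symbol.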

B. IMPROVED EARLY TERMINATION
It is observed that the ED between a candidate symbol $C_k$ and the actually transmitted symbol $C$ is often significantly larger than the ED $D\big(Hc_k^{(j)}, Hc^{(j)}\big)$ of a single time slot when the number of time slots $L_s$ is sufficiently large [29]. Generally, the search must proceed across several time slots before it can actually be terminated using the radius $r$. In this subsection, an enhanced early termination criterion is proposed. Specifically, the term $\|y^{(j)} - \kappa\xi H c_k^{(j)}\|^2$ in (11) can be lower bounded by the Gaussian noise residual or by the ED between the candidate symbol $C_k$ and the actually transmitted symbol $C$.

Lemma 1: The search for the candidate symbol $C_k$ at any time slot $j$ can be terminated under the condition
$$D^{(j)}(Y, HC_k) + D_{\mathrm{re}}(j) \ge r,$$
where $D_{\mathrm{re}}(j)$ denotes the accumulated noise residual of the time slots $1, \ldots, j-1$ that remain to be visited.

Proof: Theoretically, the noise residual $\|y^{(\tau)} - \kappa\xi H c^{(\tau)}\|^2$ of each remaining time slot $\tau$ lower-bounds the contribution of that slot to $D(Y, HC_k)$. Consequently, we have
$$D^{(1)}(Y, HC_k) \ge D^{(j)}(Y, HC_k) + D_{\mathrm{re}}(j).$$
Therefore, when the search reaches the condition $D^{(j)}(Y, HC_k) + D_{\mathrm{re}}(j) \ge r$, it follows that $D^{(1)}(Y, HC_k) \ge r$, and the search for the candidate $C_k$ can be terminated early without further consideration.

Furthermore, at each time slot $\tau$, the estimated signal $\hat{c}^{(\tau)}$ provides valuable information about this residual, and in the high-SNR regime (18) can be approximated using this estimate. Since the SNR level strongly affects the reliability of any detector, such as ML, MMSE, or ZF, we heuristically use $D_{\mathrm{re}}(j)$ in an early termination condition. Specifically, we introduce a new exclusion condition, under which a candidate is excluded whenever it is visited with
$$D^{(j)}(Y, HC_k) + \eta\, D_{\mathrm{re}}(j) \ge r, \tag{22}$$
while preserving ML optimality. The coefficient $0 \le \eta \le 1$ depends on the SNR level and represents the reliability of the MMSE or ZF estimates. For example, if the SNR is high enough, $\eta$ is close to one.
In contrast, if the SNR is relatively low, $\eta$ should be close to zero, since the estimate of the transmitted signal does not guarantee a reliable lower bound on the residual. Therefore, $\eta$ can be adapted to the SNR of the VLC channel so that early termination is executed without compromising ML optimality. Throughout the simulations, $\eta$ is chosen as a function of the SNR that provides good performance, where $p_{\max}$ is the predetermined SNR level at which the search with $\eta = 1$ begins to perform equivalently to the conventional ML detector. Finally, the complete procedure of the proposed detector is given in Algorithm 1. The search graph in Fig. 2 is drawn horizontally for a system with $L_s = 4$. Specifically, at the beginning, the estimated symbol $\hat{c}^{(1)}\hat{c}^{(2)}\hat{c}^{(3)}\hat{c}^{(4)}$ is obtained by MMSE estimation and quantization. At each time slot, for example the first one, the candidate set $\Xi^{(1)} = \{c^{(1,1)}, c^{(1,2)}, \ldots, c^{(1,w_1)}\}$ is ordered by looking up the corresponding values $D\big(H\hat{c}^{(1)}, Hc^{(1,q)}\big)$ in $\Pi^{(1)}$ for $q = 1, \ldots, w_1$, where $w_1$ is the number of possible candidates at time slot 1. Based on this ascending-ED ordering in all time slots, all possible symbols $C_k$ are mapped to the corresponding search paths, illustrated as black dashed lines in Fig. 2. The red line is the search path with the lowest ED, which determines the updated sphere radius $r$. For each candidate $C_k \in X$, the search proceeds as in Algorithm 1 and is terminated according to the criterion in (22).
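Combining the candidate ordering with the termination test (22) gives the following sketch. Because the exact expressions for $D_{\mathrm{re}}(j)$ and for the SNR-dependent choice of $\eta$ are not reproduced here, the sketch assumes that $D_{\mathrm{re}}(j)$ is the residual of the quantized MMSE estimate accumulated over the unvisited slots and simply takes $\eta$ as an input; the MMSE filter is again the assumed textbook form, and `detect_eta` is a hypothetical name. With $\eta = 0$ it reduces to an exact ordered sphere search.

```python
import numpy as np


def detect_eta(Y, H, symbols, sigma2, eta, kappa_xi=1.0):
    """Ordered sphere search with the early-termination test
    D^(j) + eta * D_re(j) >= r. Returns (detected index, per-slot ED evaluations)."""
    n_t, l_s = H.shape[1], Y.shape[1]

    def slot_err(j, c):
        # ||y^(j) - kappa*xi*H c||^2, the contribution of one time slot to (11)
        return float(np.sum((Y[:, j] - kappa_xi * H @ np.asarray(c)) ** 2))

    # Initial estimate: assumed textbook MMSE, then the nearest member of Xi^(j);
    # order_key[j] maps each candidate c to kappa*xi*||H(c - c_hat^(j))||^2.
    A = H.T @ H + (sigma2 / kappa_xi ** 2) * np.eye(n_t)
    C_tilde = np.linalg.solve(A, H.T @ Y) / kappa_xi
    c_hat, order_key = [], []
    for j in range(l_s):
        cands = np.unique(np.array([S[:, j] for S in symbols]), axis=0)
        est = cands[np.argmin(np.sum((cands - C_tilde[:, j]) ** 2, axis=1))]
        c_hat.append(est)
        order_key.append({tuple(c): kappa_xi * float(np.sum((H @ (c - est)) ** 2))
                          for c in cands})

    # Assumed D_re: estimate-based residual of the slots not yet visited (0..j-1).
    res = [slot_err(j, c_hat[j]) for j in range(l_s)]
    d_re = np.concatenate(([0.0], np.cumsum(res)))   # d_re[j] = sum_{tau < j} res[tau]

    best_idx, radius, evals = None, np.inf, 0

    def expand(slot, partial, members):
        nonlocal best_idx, radius, evals
        groups = {}
        for k in members:
            groups.setdefault(tuple(symbols[k][:, slot]), []).append(k)
        # Visit the nodes closest to the initial estimate first.
        for col in sorted(groups, key=lambda c: order_key[slot][c]):
            evals += 1
            d = partial + slot_err(slot, col)
            if d + eta * d_re[slot] >= radius:
                continue                          # early termination of this branch
            if slot == 0:
                best_idx, radius = groups[col][0], d
            else:
                expand(slot - 1, d, groups[col])

    expand(l_s - 1, 0.0, list(range(len(symbols))))
    return best_idx, evals
```

Raising $\eta$ toward one prunes more aggressively; since the assumed residual is only a heuristic bound, values near one are only safe when the MMSE estimate is reliable, which matches the SNR-dependent choice of $\eta$ described above.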

IV. SYMBOL SET DESIGNING FOR EMMM
At the transmitter side, a sequence of $R$ input bits is mapped to a transmit symbol $C_k$. The transmitter can use the estimated channel to select a suitable subset of $M$ transmit symbols out of all possible subsets. For any value of $\gamma$ and a given set of integers $\{0, 1, \ldots, a_{\mathrm{PAM}}\}$, the number of possible symbols $C_k$ is much larger than the number $M$ of symbols needed for the set $X$. For example, in a MIMO system with four LEDs and four time slots, $\gamma = 3$, and $a_{\mathrm{PAM}} = 2$, the number of possible signal vectors transmitted by each LED during each $T$ is 16. Therefore, the number of possible symbols $C_k$, formed as combinations of the four LEDs' vectors during a symbol time $T$, is a colossal $16^4 = 65536$. If the transmission rate of the system is only $R = 6$ bit/s/Hz, only $M = 64$ of the 65536 available symbols are needed. A simple method is to pick a random set of 64 symbols for transmission. However, from (12), the ED between any two symbols strongly affects the SER of the system; in particular, in the high-SNR regime, the minimum ED between any two transmitted symbols in $X$ is the main factor that degrades the BER [30]. Therefore, a wise choice is the $M$-symbol set that maximizes this minimum ED:
$$X_{\mathrm{opt}} = \arg\max_{X} \; \min_{C_p, C_q \in X,\; p \neq q} D(HC_p, HC_q).$$
In an exhaustive search, all possible combinations $X$ of 64 symbols out of the 65536 available symbols $C_k$ would have to be generated, and all EDs between any two symbols in each combination computed and compared to find the optimal set of transmission symbols. However, even though exhaustive search achieves the best performance, it cannot be realized because of the colossal computational cost brought by the extremely large number of possible symbol combinations.
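The size of the exhaustive search is easy to quantify: it must visit every $M$-element subset of the available symbols and evaluate $\binom{M}{2}$ pairwise EDs for each. The short Python snippet below, with a hypothetical helper name, evaluates these counts for the example above.

```python
from math import comb, log10


def exhaustive_cost(n_available, m):
    """Number of candidate symbol sets and pairwise ED evaluations per set
    for an exhaustive search over all m-subsets of n_available symbols."""
    return comb(n_available, m), comb(m, 2)


# Section IV example: 16^4 available symbols, M = 64 (R = 6).
n_sets, pairs_per_set = exhaustive_cost(16 ** 4, 64)
print(f"~10^{log10(n_sets):.1f} candidate sets, {pairs_per_set} EDs per set")
```

For 65536 available symbols and $M = 64$, there are roughly $10^{219}$ candidate sets, each requiring 2016 ED evaluations, which is why a heuristic search is used instead.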
Therefore, in this section, we propose a search method that selects a symbol set for transmission with better BER performance than random selection. Instead of computing the EDs between all pairs of symbols for many random sets $X$ and keeping the one with the largest minimum ED, we improve the minimum ED of the current set by iteratively searching for symbol replacements that increase it. The basic operation of the heuristic symbol design algorithm can be summarized in the following steps.

1) FIRST STEP
The proposed method first randomly generates a set $X$ of $M$ possible transmit symbols. To further increase the effectiveness of the algorithm, it can be repeated for several different initial instances of $X$. We then find the pair $C_p, C_q \in X$ with the minimum value of $D(HC_p, HC_q)$. After that, we obtain the time slot index set $\{l_1, l_2, \ldots, l_{L_s}\}$ of $C_p$, ordered by ascending values of $D\big(Hc_p^{(j)}, Hc_q^{(j)}\big)$, $j = 1, \ldots, L_s$.

2) SECOND STEP
To increase the ED between the two symbols with the minimum ED in the set, we try to replace $C_p$ with a new symbol $\hat{C}_p$. The algorithm attempts to sequentially replace each vector $c_p^{(l_i)}$ with a new one, prioritized by the ordered index set $\{l_1, l_2, \ldots, l_{L_s}\}$. This increases the chance of finding a good set early in the iterations; in most of our simulations, at most ten iterations are needed for the algorithm to converge. Moreover, instead of replacing all $L_s$ time slots at once, the algorithm first tries to replace $\Delta = 2$ time slots, $l_1$ and $l_2$, of $C_p$ to improve $D_{\min} = \min_{k \neq p} D(H\hat{C}_p, HC_k)$. The iteration finishes if $D_{\min}$ increases. Otherwise, $\Delta$ is increased to $3, 4, \ldots$ and the search is repeated until $\Delta = L_s - 1$ or $D_{\min}$ increases. Suppose that the first $\Delta$ time slots $c_p^{(l_1)}, \ldots, c_p^{(l_\Delta)}$ are to be modified. For each time slot $l_{\Delta'}$ with $1 \le \Delta' \le \Delta$, the candidate vector $\hat{c}_p^{(l_{\Delta'})}$ must have entries in $\{0, 1, \ldots, a_{\mathrm{PAM}}\}$ bounded elementwise by $\min(\gamma', a_{\mathrm{PAM}})$, and the modified slots must preserve each LED's power, i.e., $\sum_{\Delta'=1}^{\Delta} \hat{c}_p^{(l_{\Delta'})} = \gamma'$, where $\gamma' = \sum_{j=1}^{\Delta} c_p^{(l_j)}$, so that constraint (3) still holds.

3) THIRD STEP
Finally, the improved set X with the largest minimum ED should be used in transmission. The proposed method is described in Algorithm 2.
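A compact sketch of the three steps is given below. It is a simplified variant rather than Algorithm 2 itself: it assumes that, for each $\Delta$, all admissible replacement blocks are enumerated exhaustively and the one with the largest $D_{\min}$ is kept, and that the search stops when no replacement increases the minimum ED. The names `greedy_design`, `slot_blocks`, and `set_min_ed` are hypothetical.

```python
import numpy as np
from itertools import product


def slot_blocks(totals, delta, a_pam):
    """Yield N_t x delta amplitude blocks whose row i sums to totals[i], with
    entries in {0, ..., a_pam}: replacing delta slots keeps every LED's power."""
    per_led = [[v for v in product(range(a_pam + 1), repeat=delta) if sum(v) == t]
               for t in totals]
    for rows in product(*per_led):
        yield np.array(rows, dtype=float)


def set_min_ed(H, S):
    """Minimum pairwise D(HC_p, HC_q) of the set, the minimizing pair, and H C_k."""
    HS = np.array([H @ C for C in S])
    D = np.sum((HS[:, None] - HS[None, :]) ** 2, axis=(2, 3))
    np.fill_diagonal(D, np.inf)
    p, q = np.unravel_index(np.argmin(D), D.shape)
    return float(D[p, q]), int(p), int(q), HS


def greedy_design(H, S, a_pam, n_iter=10):
    """Simplified variant of Steps 1-3 (assumptions noted inline)."""
    S = [C.astype(float).copy() for C in S]
    l_s = S[0].shape[1]
    for _ in range(n_iter):
        d_min, p, q, HS = set_min_ed(H, S)            # closest pair (C_p, C_q)
        others = HS[[k for k in range(len(S)) if k != p]]
        # Slots of C_p by ascending per-slot ED to C_q (index set l_1, ..., l_Ls).
        order = np.argsort(np.sum((HS[p] - HS[q]) ** 2, axis=0), kind="stable")
        accepted = False
        for delta in range(2, l_s):                    # Delta = 2, ..., L_s - 1
            slots = order[:delta]
            totals = S[p][:, slots].sum(axis=1).astype(int)
            best_val, best_C = d_min, None
            for block in slot_blocks(totals, delta, a_pam):
                cand = S[p].copy()
                cand[:, slots] = block
                val = np.min(np.sum((others - H @ cand) ** 2, axis=(1, 2)))
                if val > best_val:                     # assumption: keep the best block
                    best_val, best_C = val, cand
            if best_C is not None:
                S[p] = best_C
                accepted = True
                break
        if not accepted:
            break                                      # no replacement helps: stop
    return S
```

A replacement is accepted only when every distance from $\hat{C}_p$ to the other symbols exceeds the previous minimum, so the minimum ED of the set never decreases across iterations, and every row still sums to $\gamma$.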

V. NUMERICAL RESULTS
In this section, we present the simulated performance and the complexity reduction of the proposed detector in comparison with the conventional ML detector. For simulation, a MIMO VLC system with four LEDs and four PDs is considered, with two PD placements that yield the channels H1 and H2. The distance between PDs is only 0.2 m in the first scenario and 0.5 m in the second. Owing to the different PD positions, channel H1 has lower correlation between LEDs and PDs than channel H2 and therefore achieves some performance gain over it. Furthermore, we set $A_p = 1$ cm², $\Phi_{1/2} = 60°$, and $\psi_{1/2} = 60°$. To ensure comparability, the average transmission power of each symbol is set to unity in all scenarios. To account for the path loss between the transmitter and the receiver, we evaluate the SER with respect to the transmitted SNR, defined as $1/\sigma_n^2$. As noted in [31], [32], typical VLC channel coefficients introduce a huge path loss of about 70 dB at the receiver side, so the SER curves show an SNR offset of around 70 dB with respect to the received SNR. For transmission, the symbol time of 20 ms is divided into $L_s = 4$ time slots. We also set $s = 1$, $\gamma = 3$, and $a_{\mathrm{PAM}} = 2$.

A. PROPOSED SYMBOL DESIGNING PERFORMANCE
First, we present the performance of the proposed scheme with the symbol design algorithm. In Algorithm 2, we set $R = 4$ and $6$ bit/s/Hz. For the conventional mapping with a random symbol set, we randomly generate 20 symbol sets and use the one with the largest minimum ED between any two symbols. Fig. 3 and Fig. 4 show the SER performance of the proposed scheme when the symbol sets obtained from Algorithm 2 with $R = 4$ bit/s/Hz ($M = 16$) and $R = 6$ bit/s/Hz ($M = 64$), respectively, are used in transmission. The distance between the PDs clearly has a significant impact on performance, owing to the high correlation of channel H2 compared with channel H1. Moreover, with the proposed Algorithm 2, we can determine a set of transmit symbols with a much larger minimum ED. An SNR gain of at least 6 dB is achieved when $R = 4$ bit/s/Hz over channel H1, while the SNR gain over the highly correlated channel H2 is as large as 18 dB when $R = 6$ bit/s/Hz. For both channels, when the bit rate is low, such as $R = 4$ bit/s/Hz, the difference in correlation level has only a minor impact on the SER difference between the two channels. However, as the bit rate increases, this deviation grows significantly, since the SNR gain of the proposed symbol set over the random symbol set for the more correlated channel H2 is nearly 20 dB, instead of only around 10 dB for channel H1 and at lower bit rates.
To compare the performance of EMMM and SMPPM, we place the two receivers at various locations around the room. Specifically, the minimum ED values of the proposed EMMM are compared with those of the SMPPM scheme for $R = 4$ bit/s/Hz, with the distance between the PDs fixed at 0.5 m. In both schemes, the receiver location significantly affects the performance, as reflected by the minimum ED values. Near the center of the room, the performance is good, as expected, because of the high received power from all LEDs. In contrast, as the receiver moves away from the center of the room, the minimum ED values steadily decrease owing to the low signal power from the LEDs. With the additional symbol design algorithm, the proposed EMMM scheme exhibits larger minimum ED values than SMPPM at all locations around the room, which leads to improved BER performance.
To provide more insight into the convergence of Algorithm 2, we show the simulation results after several iterations. More specifically, in Fig. 6, a Monte Carlo simulation was run for 200 random locations generated around the room. More specifically, we denote the minimum ED value of the received signal without the symbol design is d 0 > 0. Then, in each location, Algorithm 2 is run for a number of iterations up to 11. After each iteration, the resultant minimum ED, d opt , is then normalized to the original d 0 as d opt = d opt /d 0 . Then, the average of the result d opt in all locations tends to converge after n ite iterations. Unsurprisingly, it can be seen that as the number of iteration increases, the minimum ED value also increases. After the first iteration, the value of d max remarkably increases. However, on average, after 9 to 11 iterations, the improvement starts to converge. After that, depends on specific channel realization, the minimum ED values can increase more; however, the increase on average is significantly small. Therefore, the gain after 11 iterations is relatively small but the computational cost is proportional to the number of iterations. As a result, depending on the specific scenario and the constrain on the complexity cost, the number of iterations should be carefully considered to adapt to the performance-complexity trade-off. In Fig. 7, we compare the performance of the proposed scheme with the SMPPM scheme in [17] because the SMPPM is an improved version of SPPM in terms of performance and spectral efficiency. Moreover, SPPM also demonstrates better performances in comparison with other pulse modulation schemes such as OOK or SSK [15], [16]. Therefore, the performances between the proposed scheme and the recently proposed SMPPM adequately demonstrate that EMMM can remarkably exploit the proposed Algorithm 2 to reduce the correlation of MIMO VLC channels. For a fair comparison, we consider the best symbol set of the two schemes. 
For each scheme, the best symbol set is the one with the largest possible minimum ED between any two symbols. Specifically, the EMMM symbol set is obtained after 9 iterations of Algorithm 2, while SMPPM uses the set with the largest minimum ED among 20 randomly generated sets. In addition, the total transmitted powers of the two schemes are set to be equal. Under the same constraints on the total power and on the maximum power of an individual pulse, the proposed scheme can flexibly increase the minimum ED between any two symbols. Fig. 7 shows that the performance gap is smaller at low bit rates and can reach 10 dB at higher bit rates such as 6 bit/s/Hz. Moreover, the effect of channel correlation is clearly visible, since the SNR gain of the proposed scheme over SMPPM is larger over channel H1.
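The best-of-20 selection rule used for the baseline, and the normalization $d_{\mathrm{opt}}/d_0$ from the convergence study of Fig. 6, can be illustrated with a small simulation. This is a sketch under stated assumptions and is not Algorithm 2: `local_search` is a hypothetical stand-in that applies random power-preserving amplitude moves and keeps a move only if the minimum ED does not decrease. Each symbol is modeled as an N_t-dimensional intensity vector with a fixed total power and a per-pulse cap `a_max`; the dimensions, the channel distribution, and the numbers of locations and moves are all hypothetical.

```python
import numpy as np


def min_ed(H, S):
    """Minimum Euclidean distance between the noiseless received points H @ s."""
    R = S @ H.T
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    return dist[np.triu_indices(len(S), k=1)].min()


def random_symbol_set(rng, M, n, power, a_max):
    """M random intensity vectors, each summing to `power`, every pulse in [0, a_max]."""
    assert power <= n * a_max, "infeasible power / peak constraints"
    S = np.empty((M, n))
    for m in range(M):
        while True:  # rejection sampling enforces the per-pulse cap
            s = rng.dirichlet(np.ones(n)) * power
            if s.max() <= a_max:
                S[m] = s
                break
    return S


def best_of_random_sets(rng, H, n_sets, M, n, power, a_max):
    """Keep the randomly generated set with the largest minimum ED (baseline rule)."""
    candidates = [random_symbol_set(rng, M, n, power, a_max) for _ in range(n_sets)]
    return max(candidates, key=lambda S: min_ed(H, S))


def local_search(rng, H, S, a_max, n_iter, moves_per_iter=50):
    """Hypothetical stand-in for Algorithm 2 (not the paper's algorithm).

    Each move shifts power between two pulses of one symbol, so the symbol's total
    power and the [0, a_max] bounds are preserved; a move is kept only if the
    minimum ED does not decrease. Returns the final set and the minimum ED
    recorded after every iteration.
    """
    S, d = S.copy(), min_ed(H, S)
    history = []
    for _ in range(n_iter):
        for _ in range(moves_per_iter):
            k = rng.integers(len(S))
            i, j = rng.choice(S.shape[1], size=2, replace=False)
            delta = max(0.0, min(S[k, i], a_max - S[k, j])) * rng.random()
            trial = S.copy()
            trial[k, i] -= delta
            trial[k, j] += delta
            d_trial = min_ed(H, trial)
            if d_trial >= d:
                S, d = trial, d_trial
        history.append(d)
    return S, np.array(history)


rng = np.random.default_rng(0)
M, N_t, N_r = 16, 4, 2            # hypothetical: 16 symbols, 4 LEDs, 2 PDs
power, a_max, n_iter = 1.0, 0.6, 11

curves, baseline_ratio = [], []
for _ in range(20):               # 20 hypothetical receiver locations
    H = rng.uniform(0.2, 1.0, size=(N_r, N_t))
    S0 = random_symbol_set(rng, M, N_t, power, a_max)
    d0 = min_ed(H, S0)            # minimum ED without symbol design
    S_opt, hist = local_search(rng, H, S0, a_max, n_iter)
    curves.append(hist / d0)      # normalized d_opt / d_0 after each iteration
    S_best = best_of_random_sets(rng, H, 20, M, N_t, power, a_max)
    baseline_ratio.append(min_ed(H, S_best) / d0)

avg_curve = np.mean(curves, axis=0)   # average over locations
print(np.round(avg_curve, 2), np.mean(baseline_ratio))
```

Because moves are accepted only when the minimum ED does not drop, each per-location curve is non-decreasing and starts at or above 1, so the averaged curve behaves like the one described for Fig. 6 in shape only; its actual values depend entirely on the hypothetical setup.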

B. PROPOSED DETECTOR PERFORMANCE
In Fig. 8, we compare the SER performance of the conventional ML detector with that of the proposed detector. Both detectors achieve almost the same performance. While maintaining this performance, the proposed sphere-based detector reduces the complexity by exploiting an adjustable radius and removing redundant candidates. Fig. 9 shows the complexity reduction of the proposed detector relative to the conventional ML detector under different scenarios. For simplicity, following (11), the complexity of each detector is approximated by the number of ED calculations per time slot. By sorting the candidates based on an initial MMSE estimate and applying early termination criteria, the proposed detector keeps the error performance at the level of the ML detector while drastically reducing the complexity. Moreover, in the high SNR regime, where the noise power is relatively low, the MMSE estimate of the transmitted symbol is more reliable than in the low SNR regime, so the candidate order is also more reliable. Because the actually transmitted symbol is then ranked near the beginning of the search, the proposed early termination condition reduces the complexity substantially at high SNR. At higher bit rates, the proposed detector reduces the computational complexity significantly, achieving around 85% complexity reduction for R = 6 bit/s/Hz in the high SNR regime. The reduction percentage also varies with the channel correlation, as illustrated by the H1 and H2 channels.
VOLUME 8, 2020
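The idea of ordering candidates and terminating early, while still returning the ML decision, can be illustrated with a small search that counts ED calculations, the complexity measure used above. This sketch deliberately differs from the proposed detector: it ranks candidates by distance to the least-squares estimate $\hat{\mathbf{s}} = \mathbf{H}^{+}\mathbf{y}$ instead of the MMSE estimate, because then $\|\mathbf{y}-\mathbf{H}\mathbf{s}\|^2 = \|\mathbf{y}-\mathbf{H}\hat{\mathbf{s}}\|^2 + \|\mathbf{H}(\hat{\mathbf{s}}-\mathbf{s})\|^2 \ge \|\mathbf{y}-\mathbf{H}\hat{\mathbf{s}}\|^2 + \sigma_{\min}^2\|\hat{\mathbf{s}}-\mathbf{s}\|^2$ gives a simple stopping rule that provably keeps the ML result. The paper's own radius update and termination criteria are not reproduced, and the channel, the 81-point symbol set, and the noise level are hypothetical.

```python
import itertools

import numpy as np


def ml_detect(y, H, symbols):
    """Exhaustive ML detection: one ED calculation per candidate symbol."""
    metrics = [np.linalg.norm(y - H @ s) ** 2 for s in symbols]
    return int(np.argmin(metrics)), len(symbols)


def ordered_detect(y, H, symbols):
    """ML-equivalent search with candidate ordering and a safe early stop.

    Candidates are visited in order of distance to s_ls = pinv(H) y; the search
    stops once the lower bound resid2 + sigma_min^2 * ||s - s_ls||^2 exceeds the
    best metric found so far, since no later candidate can then do better.
    """
    s_ls = np.linalg.pinv(H) @ y
    resid2 = np.linalg.norm(y - H @ s_ls) ** 2
    n_r, n_t = H.shape
    sv = np.linalg.svd(H, compute_uv=False)
    sigma_min = sv.min() if n_r >= n_t else 0.0  # H has a null space if n_r < n_t
    dist2 = np.sum((symbols - s_ls) ** 2, axis=1)
    best_idx, best_metric, n_ed = -1, np.inf, 0
    for idx in np.argsort(dist2):
        if resid2 + sigma_min**2 * dist2[idx] > best_metric:
            break  # bounds only grow from here on, so stop
        metric = np.linalg.norm(y - H @ symbols[idx]) ** 2
        n_ed += 1
        if metric < best_metric:
            best_idx, best_metric = int(idx), metric
    return best_idx, n_ed


rng = np.random.default_rng(1)
symbols = np.array(list(itertools.product([0.0, 0.5, 1.0], repeat=4)))  # hypothetical
H = np.eye(4) + 0.3 * rng.random((4, 4))                               # hypothetical
n_ml = n_ord = 0
agree = True
for _ in range(500):
    k = rng.integers(len(symbols))
    y = H @ symbols[k] + 0.05 * rng.standard_normal(4)
    i_ml, c_ml = ml_detect(y, H, symbols)
    i_ord, c_ord = ordered_detect(y, H, symbols)
    agree = agree and i_ml == i_ord
    n_ml += c_ml
    n_ord += c_ord

reduction = 100.0 * (1.0 - n_ord / n_ml)
print(agree, f"{reduction:.1f}% fewer ED calculations")
```

Lowering the noise level moves the least-squares estimate closer to the transmitted symbol, so the stopping rule fires earlier; this is in line with the high-SNR trend described above, although the percentages produced here depend entirely on the hypothetical setup.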

VI. CONCLUSION
In this article, we propose EMMM for MIMO VLC to support flicker-free, dimming-capable, and uniform illumination, all of which are very important in optical wireless communication systems. We also improve the system performance and reduce the complexity of the detection process through two algorithms. The first algorithm designs a transmit symbol set that enlarges the minimum ED between any two symbols. The second is an improved sphere-based detector with candidate ordering and early termination criteria. Simulation results indicate that the proposed scheme, with the symbol design and the proposed detector, achieves better error performance and lower computational complexity than conventional MIMO VLC schemes.