Reconfigurable Intelligent Surface Aided Position and Orientation Estimation Based on Joint Beamforming With Limited Feedback

The sparsity of millimeter wave (mmWave) channels in the angular and temporal domains has been exploited for channel estimation, while the associated channel parameters can be utilized for localization. However, line-of-sight (LoS) blockage makes localization highly challenging and may lead to large positioning errors. One promising solution is to employ reconfigurable intelligent surfaces (RIS) to generate virtual line-of-sight (VLoS) paths. Hence, it is essential to investigate wireless positioning in RIS-aided mmWave systems. In this paper, an adaptive joint LoS and VLoS localization scheme is proposed, where the VLoS path is constructed by a beamforming protocol operated between the RIS and the mobile station (MS). More specifically, to sense the location and orientation of an MS, a novel interlaced scanning beam sweeping algorithm is proposed to acquire the optimal beams. In this algorithm, the LoS and VLoS paths are estimated separately, and the optimal beams are selected according to the received signal strength so as to mitigate the LoS blockage problem. Then, based on the selected beams and the received signal strength, the angle of arrival (AoA), angle of departure (AoD), angle of reflection (AoR) and time of arrival (ToA) are estimated. Finally, with the aid of these estimated parameters, the location and orientation of the MS are estimated. We derive the Cramer-Rao lower bounds (CRLBs) for both location estimation and orientation estimation, and compare them with the corresponding simulation results. We also compare the performance of our proposed scheme with that of three legacy schemes under various settings. The results show the superiority of our proposed beam training algorithm, which is capable of achieving a localization error within 15 cm and an orientation error within 0.003 rad. Furthermore, the training overhead is 100 times lower than that of the conventional exhaustive search algorithm, while a 13 dBm power gain is obtained when compared with the hierarchical codebook based search algorithm.


I. INTRODUCTION
A. MOTIVATION AND BACKGROUND
Localization has sparked considerable research interest in recent years, owing to the numerous applications that benefit from localization information, such as intelligent transportation systems, unmanned aerial vehicles (UAVs), and so on [1]. However, conventional localization systems, such as the global positioning system (GPS), are unable to provide high accuracy, suffer from high latency [2], and have poor coverage in indoor scenarios. On the other hand, millimeter wave (mmWave) based localization techniques have attracted wide research interest, owing to the high beamspace resolution offered by large antenna arrays [3]. Nevertheless, mmWave communications face numerous challenges, including high path-loss, which can only be mitigated by beamforming with massive antenna arrays [3], [4].
Reconfigurable intelligent surfaces (RIS) have recently attracted intensive research attention as a means of controlling the propagation environment [5], [6], [7]. A RIS is a surface composed of a large number of controllable passive reflective elements, placed between the transmitter and receiver. Since each element of the RIS can independently change the phase and/or amplitude of the incident signal, a RIS can be used to improve the energy efficiency, spectrum efficiency, positioning accuracy and even the security of communications [8], [9]. However, the estimation of the cascaded channels in RIS-aided systems is a challenging task [10], [11], [12], [13], [14]. Nevertheless, the concept of passive beamforming as well as that of joint active and passive beamforming considered in [10], [11] can be used in localization.
Benefiting from their high angular and temporal resolution, mmWave signals have great potential for accurate channel estimation and localization [15], [16], [17], [18], [19], [20], [21]. In [15], an adaptive codebook-based channel estimation algorithm was proposed, which uses multi-stage codebooks for angular estimation, so as to attain a good trade-off between performance and implementation complexity in conventional mmWave systems. However, when the line-of-sight (LoS) path is blocked, non-line-of-sight (NLoS) path based localization has low accuracy due to the significant reflection loss. For example, in [16], it was shown that LoS-based localization significantly outperforms NLoS-based localization in conventional mmWave systems. In [20], the LoS path was used to estimate the location of the MS, after which the locations of the scatterers were also estimated. These studies show that the performance of LoS-based localization is much better than that of NLoS-based localization. Therefore, in the case of LoS blockage, RIS-aided localization may provide a promising alternative for improving the localization accuracy in mmWave systems. Furthermore, in [21], a successive localization and beamforming scheme was proposed to achieve joint localization of the user equipment (UE) and channel estimation in mmWave MIMO systems without employing RIS.
In the literature, there is a range of studies on RIS-aided localization. In [22], the concept of the continuous intelligent surface (CIS) was introduced, and the limits of RIS-aided localization and communication systems were presented. More specifically, in [22], a general signal model for the RIS-aided localization and communication system considering both far-field and near-field scenarios was studied, where both the CIS and the discrete intelligent surface (DIS) were introduced to improve the localization accuracy and spectral efficiency via phase response optimization. In [23], holographic network localization and navigation (NLN) was studied, where RISs were operated in a controlled electromagnetic environment attained by using polarization and specific antenna patterns. It was shown that the robustness of holographic localization to obstructions can be improved with the aid of RISs. In [24], a localization scheme aided by an adaptive beamforming design using the hierarchical codebook algorithm was proposed for RIS-aided joint localization and communication, under the assumption that the LoS path is not available. In [25], a received signal strength (RSS) based positioning scheme was studied for a RIS-aided mmWave system. Moreover, in [26], a multi-RIS aided multi-user beam training scheme was proposed for Terahertz systems, where a ternary-tree hierarchical codebook based algorithm was employed to reduce the complexity.
In the RIS-aided localization systems considered in literature, performance bounds have often been analyzed in order to demonstrate the achievable performance of the proposed localization schemes. Specifically in [27], the Cramer-Rao lower bound (CRLB) for the RIS-aided mmWave positioning was analyzed, where the impacts of the number of RIS elements and the design of phase shifters were studied. In this study, the optimal phases of RIS were assumed to be known and thus, the Fisher information matrix (FIM) does not contain the parameters of the reflection angle. Furthermore, the studies in [27] assumed the far-field communication environment. The authors of [28] analyzed the FIM and CRLB in a 3D scenario, where the RIS with unknown phase shift is assumed, and also compared the centralized and distributed deployments of RIS. In [29], a Bayesian framework for localization in the RIS-aided systems was developed, where the uncertainty was investigated in both the near-and far-field cases. In particular, the authors derived the Bayesian equivalent Fisher information matrix (EFIM), showing that, in the far-field case, the RIS orientation offset is unable to be corrected, when there is an unknown phase offset in the received signal. By contrast, the unknown phase offset has no effect in the near-field case. Furthermore, in [30], the Bayesian bounds for the localization of a UE with the aid of multiple RISs were derived, and it was demonstrated that the RIS related geometric channel parameters become lost, when the complex channel gains are not known to the UE. In [31], the authors analyzed the CRLBs of the active RIS-aided localization systems, where multiple signal transmissions and particle filtering algorithm were proposed to improve the signal power of the RIS-aided path. In [32], the CRLB of an indoor localization scheme in ultra-wide band (UWB) system was provided. 
In [33], the CRLB of the localization and orientation estimation was studied for a RIS-aided mmWave system. Furthermore, in [34], downlink RIS-aided localization was studied under the assumption that the LoS path is obstructed. To improve the estimation performance, an alternative optimization and a gradient descent method were proposed in [34] to optimize the reflected beamforming and to minimize the CRLB.
In [35], the LoS blockage problem was considered and the effect of the LoS blockage rate on positioning was studied. In mmWave systems, codebook based channel estimation has been widely studied [36], [37], [38]. However, these approaches cannot be directly applied to RIS-aided systems, since the RIS is regarded as a nearly passive metasurface.¹ For this reason, the codebook based techniques need to be revisited.
¹ Note that we use the term 'nearly passive' instead of 'fully passive', because a RIS also needs to consume a certain amount of power/overhead for configuration.

B. RESEARCH PROBLEM AND CONTRIBUTIONS
In this paper, we study the position and orientation estimation of a mobile station (MS) in a RIS-aided mmWave system with orthogonal frequency division multiplexing (OFDM) signaling. We assume that the base station (BS) and the MS employ hybrid beamforming, while the RIS employs nearly passive beamforming. Under these settings, the localization and orientation estimation problem can be treated as a downlink joint active and passive beamforming problem with limited feedback. Hence, our research objectives are to find efficient methods for solving this problem, to study the achievable performance and the associated training overhead, and to analyze the error bounds of the position and orientation estimation. More explicitly, our contributions can be summarized as follows.
• The position and orientation of an MS in the downlink OFDM mmWave system are estimated, assuming that both the BS and MS employ hybrid beamforming, that the RIS is a nearly passive metasurface, and that the status of the LoS link between the BS and MS is unknown. To estimate the location and orientation of the MS, the channel parameters of the LoS path and of the path reflected via the RIS, which is referred to as the virtual line-of-sight (VLoS) path, are first estimated. In order to maximize the received signal power, we assume limited feedback from the MS to the RIS controller and from the MS to the BS for the sake of near-optimal beam selection. After the estimation of the channel parameters, including the angle of arrival (AoA), the angle of departure (AoD) of the LoS path, the angle of reflection (AoR) of the VLoS path, and the time of arrival (ToA), the location and orientation of the MS are estimated.
• A novel beam sweeping algorithm is proposed and studied, which is capable of achieving a promising trade-off between localization accuracy and training overhead when compared with the conventional beam sweeping algorithms, such as the exhaustive search (ES), hierarchical codebook (HC), and partial search (PS) algorithms.
• The CRLBs for both the location estimation and the orientation estimation are derived, and are compared with the performance obtained from simulations of the proposed beam sweeping algorithm.
• In terms of the performance study and comparison, the effects of the codebook resolution and the LoS blockage rate are investigated. Our studies and simulation results show that the proposed beam sweeping algorithm can achieve an accuracy of 15 cm in location estimation and of 0.003 rad in orientation estimation. The training overhead is 100 times lower than that of the conventional exhaustive search algorithm, while a 13 dBm power gain is obtained when compared with the hierarchical codebook algorithm.

C. ORGANIZATION OF THE PAPER AND NOTATIONS
The rest of the paper is organized as follows. Section II introduces the system model for localization. Section III derives the CRLBs of the channel parameter estimation, where the position and orientation error bounds are also presented. The framework of the proposed joint beam alignment scheme is introduced in Section IV. The simulation setup and simulation results are detailed in Section V. Finally, we concisely conclude this paper in Section VI.
Notations: $a$, $\mathbf{a}$ and $\mathbf{A}$ stand for a scalar, a vector and a matrix, respectively. $\mathbf{A}^T$, $\mathbf{A}^H$, $\mathbf{A}^\dagger$, $\|\mathbf{A}\|_2$ and $\|\mathbf{A}\|_F$ represent the transpose, Hermitian transpose, pseudoinverse, Euclidean norm and Frobenius norm of matrix $\mathbf{A}$, respectively. The $(i, j)$th entry of $\mathbf{A}$ is $[\mathbf{A}]_{i,j}$, and $\mathrm{diag}(\mathbf{A})$ is a diagonal matrix formed by the diagonal elements of $\mathbf{A}$.

II. SYSTEM MODEL
Fig. 1 shows the considered system model. We consider a downlink system, where the BS transmits information to the MS using OFDM based transmission aided by hybrid beamforming in the mmWave frequency band. We assume that there are $N$ subcarriers. The BS has $N_B$ antennas and the MS has $N_M$ antennas. It is well-known that in mmWave channels, there is usually a dominant LoS path and several NLoS paths generated by random scatterers (SC). In most cases, the power of the time-varying NLoS path components is insignificant compared with that of the LoS path, which leads to poor performance if they are used for positioning [12]. Moreover, LoS path based localization cannot be accomplished when the LoS path is unavailable due to blockage. To overcome these problems, a nearly passive reconfigurable intelligent surface (RIS) with $N_R$ discrete elements can be deployed between the BS and MS, as shown in Fig. 1, to assist both the information transmission and the localization of the MS. Without any loss of generality, in this contribution, the uniform linear array (ULA) structure is adopted by the BS, RIS and MS, but the approach can be extended to other types of antenna arrays.
As shown in Fig. 1, we assume that the BS is located at $\mathbf{b} = [b_x, b_y]^T$ and the RIS is placed at $\mathbf{r} = [r_x, r_y]^T$, while the MS is located at $\mathbf{m} = [m_x, m_y]^T$, which is to be estimated. We assume that the BS transmits the positioning reference signals (PRS) either directly to the MS or to the MS via reflection by the RIS. With the aid of these signals, the MS senses its own location and orientation based on the limited feedback assisted beam sweeping.
Therefore, in the considered mmWave system, there are possibly three types of propagation paths, which are the LoS path, the NLoS path and the virtual line-of-sight (VLoS) path, as explicitly depicted in Fig. 1. The MS will choose the path with the highest power to locate itself, when the LoS path is blocked. In the following subsections, we present the channel model, RIS control followed by the signal model.

A. CHANNEL MODEL
Without considering the RIS, the propagation channel between the BS and MS in the frequency domain can be represented as in (1) [24], where $d$ and $\lambda$ are the antenna spacing and signal wavelength, respectively, $N_B$ and $N_M$ are the numbers of antenna elements of the arrays at the BS and MS, respectively, while $\theta_{BM,l}$ and $\phi_{BM,l}$ are the AoD and AoA of the $l$-th path between the BS and MS, as seen in Fig. 1. Furthermore, in (1), $\rho_{LoS} \in \{0, 1\}$ is the LoS blockage indicator and $\gamma_{BM,l}$ is the path-loss coefficient of the $l$-th path. Note that, in this paper, we handle the LoS component and the VLoS component in two separate stages, while the NLoS components are treated as noise due to their significantly lower power in comparison with the LoS and VLoS paths. Thus, only the large-scale fading (path-loss) needs to be considered for the LoS/VLoS paths [12], as shown in (1). When the RIS is employed, the VLoS path can be regarded as the result of passive beamforming. In contrast to the reflections from random scatterers, the signals reflected by the RIS can be coherently added to enhance the signal power of the desired users [39].
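As an illustration of the array model described above, the following minimal Python sketch builds a unit-norm ULA steering vector and a rank-one LoS channel, and evaluates the gain obtained after precoding and combining. The sin-angle convention, the half-wavelength spacing, the unit-norm scaling, the reading of $\rho_{LoS} = 1$ as "LoS available", and the function names `ula_steering`, `los_channel` and `beamformed_gain` are all assumptions made for this example. The per-subcarrier delay phase is omitted for brevity.

```python
import cmath
import math


def ula_steering(num_elements, angle_rad, spacing_over_wavelength=0.5):
    """Unit-norm response of a uniform linear array (sin-angle convention assumed)."""
    step = 2.0 * math.pi * spacing_over_wavelength * math.sin(angle_rad)
    scale = 1.0 / math.sqrt(num_elements)
    return [scale * cmath.exp(1j * k * step) for k in range(num_elements)]


def los_channel(n_ms, n_bs, aoa, aod, gain, rho_los=1):
    """Rank-one matrix rho_los * gain * a_r(aoa) a_t(aod)^H as nested lists."""
    a_r = ula_steering(n_ms, aoa)
    a_t = ula_steering(n_bs, aod)
    return [[rho_los * gain * a_r[i] * a_t[j].conjugate() for j in range(n_bs)]
            for i in range(n_ms)]


def beamformed_gain(channel, combiner, precoder):
    """Magnitude of w^H H f for a single combiner w and a single precoder f."""
    total = 0j
    for i, row in enumerate(channel):
        for j, h in enumerate(row):
            total += combiner[i].conjugate() * h * precoder[j]
    return abs(total)


# Hypothetical example: 8 BS antennas, 4 MS antennas, path-loss coefficient 0.5.
h = los_channel(4, 8, aoa=0.3, aod=-0.2, gain=0.5)
matched = beamformed_gain(h, ula_steering(4, 0.3), ula_steering(8, -0.2))
mismatched = beamformed_gain(h, ula_steering(4, 0.3), ula_steering(8, 0.6))
```

With matched steering vectors the gain equals the path-loss coefficient, whereas a misaligned precoder loses most of it. This sensitivity to beam direction is what a beam sweeping procedure exploits.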
Specifically, when the RIS operates in the far-field scenario, the channel can be treated as a two-hop channel, consisting of the first hop channel $\mathbf{H}_{BR}[n]$ and the second hop channel $\mathbf{H}_{RM}[n]$. Hence, the frequency domain cascaded RIS channel can be written as in (4) [24] for the $n$-th subcarrier, where $\gamma_{BRM} = \frac{\sqrt{\varsigma}\,\lambda}{4\pi(d_{BR}+d_{RM})}$ is the path-loss of the RIS channel [5], [12] and $\varsigma$ denotes the reflection loss, while $\tau_{RM}$ and $\tau_{BR}$ are defined in the same way as in (1), denoting the delay from the RIS to the MS and that from the BS to the RIS, respectively. In (4), the RIS's phase shift matrix is a diagonal matrix with constant-modulus (unit-magnitude) entries according to [24].
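The two ingredients of the cascaded channel described above, the path-loss and the unit-modulus phase shifts, can be sketched numerically as follows. This is a minimal Python sketch under stated assumptions: the path-loss expression is read as $\sqrt{\varsigma}\lambda / (4\pi(d_{BR}+d_{RM}))$, the RIS is modelled as a half-wavelength-spaced ULA with a sin-angle convention, and the helper names `cascaded_path_loss`, `aligning_phases` and `ris_array_gain` are hypothetical.

```python
import cmath
import math


def cascaded_path_loss(reflection_loss, wavelength, d_br, d_rm):
    """gamma_BRM = sqrt(reflection_loss) * wavelength / (4 * pi * (d_br + d_rm))."""
    return math.sqrt(reflection_loss) * wavelength / (4.0 * math.pi * (d_br + d_rm))


def aligning_phases(num_elements, angle_in, angle_out, spacing_over_wavelength=0.5):
    """Phase shifts that co-phase a plane wave arriving at angle_in toward angle_out."""
    step = 2.0 * math.pi * spacing_over_wavelength * (
        math.sin(angle_in) - math.sin(angle_out))
    return [-k * step for k in range(num_elements)]


def ris_array_gain(phases, angle_in, angle_out, spacing_over_wavelength=0.5):
    """Normalized |sum_k exp(j*w_k) * exp(j*k*step)| for unit-modulus coefficients."""
    step = 2.0 * math.pi * spacing_over_wavelength * (
        math.sin(angle_in) - math.sin(angle_out))
    total = sum(cmath.exp(1j * (w + k * step)) for k, w in enumerate(phases))
    return abs(total) / len(phases)


# Hypothetical example: 16 elements, 5 mm wavelength, 10 m per hop, no reflection loss.
phases = aligning_phases(16, angle_in=0.4, angle_out=-0.3)
aligned_gain = ris_array_gain(phases, 0.4, -0.3)
unaligned_gain = ris_array_gain([0.0] * 16, 0.4, -0.3)
gamma_brm = cascaded_path_loss(1.0, 0.005, 10.0, 10.0)
```

Every coefficient `exp(1j * w)` has unit magnitude, so only the phases are adjusted. When the phases are aligned, the reflected contributions add coherently and the normalized gain reaches one.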
Thus, when the LoS, NLoS and VLoS paths are all considered, the entire downlink propagation channel can be written as in (5). Note that only the large-scale fading is considered in (1) and (4), while the small-scale fading is ignored and treated as noise with its power proportional to the transmitted power, owing to the fact that in mmWave localization, only the strongest paths, i.e., the LoS or VLoS paths, are useful. Furthermore, since in our system the localization scheme is separated into the LoS and VLoS path components, the spatial correlation between the LoS and VLoS paths can be neglected.
Let us now consider the RIS control.

B. RIS CONTROL
In this paper, a nearly passive RIS is employed, which consists of reflecting elements, a copper backplane and a control circuit board [8]. For the RIS control, both the phase and the magnitude of the signal reflected by an element may be set by the controller, but doing so incurs high complexity and energy consumption. Therefore, in the proposed system, we assume that the RIS is only able to adjust the phases, while the reflection magnitude coefficient is 1, following the widely used assumptions in, for example, [12], [24], [27]. Furthermore, we assume that the beam transmitted by the BS towards the RIS is a sharp beam, since the locations of both the BS and the RIS are usually fixed. After the reflection, the beam transmitted by the RIS should be directed towards the MS. This is achieved based on the assumption that the MS is capable of controlling the RIS via a lower frequency radio link, as done in [24].
On the other hand, as shown in Fig. 1, the reflected angle (phase) by RIS serves as a localization parameter. Hence, the localization goal is to estimate this phase based on the received signal and the power from the VLoS path. In the case that the LoS path is unavailable or too weak, the VLoS path will be utilized to sense the location of MS.

C. SIGNAL MODEL
To transmit information from the BS to the MS, $N_D$ data streams are first precoded at the BS by the digital beamformer $\mathbf{F}_{BB} \in \mathbb{C}^{N_{RF} \times N_D}$, where $N_{RF}$ denotes the number of radio frequency (RF) chains. Then, on each RF chain, an $N_{IFFT}$-point inverse fast Fourier transform (IFFT) is implemented to transform the symbols from the frequency domain to the time domain, followed by adding a cyclic prefix (CP) before the RF level analog precoding. The CP duration $T_{CP} = L_{CP} T_s$ is assumed to be longer than the channel's delay spread, where $T_s$ is the sampling interval, while $L_{CP}$ denotes the CP length in samples. Then, an analog beamformer $\mathbf{F}_{RF} \in \mathbb{C}^{N_t \times N_{RF}}$ is applied. Therefore, the hybrid beamformer is $\mathbf{F} = \mathbf{F}_{RF} \mathbf{F}_{BB}$, which satisfies $\|\mathbf{F}_{RF} \mathbf{F}_{BB}\|_F = 1$. Specifically, when the $n$-th subcarrier is considered, after the positioning reference signal (PRS) $\mathbf{x}[n]$ is precoded by the hybrid beamformer $\mathbf{F}$ at the BS and sent over the channel, the received baseband signal at the MS can be expressed as in (7), where $P$ is the transmit power, $\mathbf{n}[n]$ is additive white Gaussian noise (AWGN) following the distribution $\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_M})$, $\sigma^2$ is the noise variance, and $\mathbf{H}[n]$ is given by (5). Hence, the received signal of (7) contains the LoS, VLoS and NLoS components, with the required localization parameters embedded in the LoS and VLoS components. Note that the NLoS path components usually vary fast and convey insignificant power in mmWave systems in comparison with the LoS path [12]. Therefore, the channel parameters governing the LoS and VLoS paths are of greater interest for localization purposes. To investigate the potential of the joint LoS/VLoS path based localization, in Section III we derive the CRLB as well as the corresponding position error bound and orientation error bound.
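The normalization constraint on the hybrid beamformer can be illustrated with a short Python sketch. It assumes that the constraint is enforced by rescaling the digital beamformer, which is one common choice and not necessarily the one used by the authors. The matrix sizes, the entries of `f_rf` and `f_bb`, and the helper names are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def frobenius_norm(a):
    """Square root of the sum of squared magnitudes of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))


def normalized_hybrid_beamformer(f_rf, f_bb):
    """Return F = F_RF F_BB after scaling F_BB so that ||F_RF F_BB||_F = 1."""
    norm = frobenius_norm(matmul(f_rf, f_bb))
    f_bb_scaled = [[x / norm for x in row] for row in f_bb]
    return matmul(f_rf, f_bb_scaled)


# Hypothetical sizes: 4 antennas, 2 RF chains, 1 data stream.
# The analog beamformer only contains unit-modulus phase shifters.
f_rf = [[cmath.exp(1j * 0.4 * n * (c + 1)) for c in range(2)] for n in range(4)]
f_bb = [[1.0 + 0.5j], [0.3 - 0.2j]]
f = normalized_hybrid_beamformer(f_rf, f_bb)
```

Scaling only the digital part keeps the analog entries at unit modulus, which matches the phase-shifter structure of the RF stage.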

III. FIM AND CRLB FOR RIS-AIDED MMWAVE LOCALIZATION SYSTEM
To derive the lower bound of localization error, the FIM of the channel and the location information as well as the CRLB are derived from the aforementioned received signal model in this section. In this paper, a two-stage approach proposed in [16] is employed to jointly estimate the location and orientation of the MS. During the first stage, the FIM of the channel parameters is derived. Then, the FIM of the location/orientation parameters is derived during the second stage using transformation and the FIM of the channel parameters obtained from the first stage. Let us first derive the FIM of the channel parameters.

A. FISHER INFORMATION MATRIX FOR THE CHANNEL PARAMETERS
To obtain the error bounds of the position and orientation estimation, the CRLB of the channel parameters should first be derived. Based on Fig. 1, the channel parameters are collected into a vector $\boldsymbol{\eta}$. By defining $\hat{\boldsymbol{\eta}}$ as an unbiased estimator of $\boldsymbol{\eta}$, the mean squared error (MSE) of the estimation is bounded by the inverse of the FIM, as represented by (8) [16], where the parameterized expectation $\mathbb{E}_{\mathbf{y}|\boldsymbol{\eta}}$ is taken with respect to the unknown channel parameters $\boldsymbol{\eta}$. Note that (8) means that $\mathbb{E}_{\mathbf{y}|\boldsymbol{\eta}}[(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})(\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})^T] - \mathbf{J}_{\boldsymbol{\eta}}^{-1}$ is a positive semi-definite matrix. Since the channel parameters are closely related to the locations of the BS, RIS and MS, the lower bounds of the estimated channel parameters are derived first. Correspondingly, the FIM in terms of the channel parameter vector $\boldsymbol{\eta}$ can be expressed as an $8 \times 8$ matrix as in (9), in which the $(i, j)$-th entry is given by (10). To compute (10), $\boldsymbol{\mu}[n]$ denotes the noiseless component of the received signal in (7), where each element of $\mathbf{f} = \mathbf{F}\mathbf{x}[n]$ is set to $e^{jv}$ with $v$ uniformly distributed in $(0, 2\pi]$. The details of the FIM of (9) are provided in Appendix A. Note that the diagonal elements of $\mathbf{J}_{\boldsymbol{\eta}}^{-1}[n]$ are the CRLBs of the estimates of the corresponding channel parameters.
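For reference, the bound and the FIM entries described above take the following standard textbook form when the observation is complex Gaussian with mean $\boldsymbol{\mu}[n]$ and covariance $\sigma^2 \mathbf{I}$, and the parameters in $\boldsymbol{\eta}$ are real-valued. This is a sketch under that assumption and is not a reproduction of the paper's own (8) and (10).

```latex
\mathbb{E}_{\mathbf{y}|\boldsymbol{\eta}}\!\left[
  (\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})
  (\hat{\boldsymbol{\eta}}-\boldsymbol{\eta})^{T}
\right] \succeq \mathbf{J}_{\boldsymbol{\eta}}^{-1},
\qquad
\left[\mathbf{J}_{\boldsymbol{\eta}}[n]\right]_{i,j}
= \frac{2}{\sigma^{2}}\,
  \Re\!\left\{
    \frac{\partial \boldsymbol{\mu}^{H}[n]}{\partial \eta_{i}}
    \frac{\partial \boldsymbol{\mu}[n]}{\partial \eta_{j}}
  \right\}.
```

In this form, the bound on the variance of the $i$-th parameter estimate is the $i$-th diagonal entry of the inverse FIM, $[\mathbf{J}_{\boldsymbol{\eta}}^{-1}]_{i,i}$.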

B. FISHER INFORMATION MATRIX FOR THE LOCATION PARAMETERS
In the previous subsection, the FIM and CRLBs of the channel parameters were derived. Since the channel parameters are related to the locations of the communication terminals, the FIM for the channel parameters can be transformed into the FIM for the location parameters via a Jacobian transformation matrix [16]. Let the vector containing the localization parameters be expressed as $\boldsymbol{\varsigma} = [m_x, m_y, \alpha]^T$, where $\alpha$ is the orientation of the array at the MS. Then, the Jacobian transformation matrix $\mathbf{T}$ is obtained as in (11) [16]. Furthermore, the FIM of the position parameters $\boldsymbol{\varsigma}$ is related to the FIM of the channel parameters $\boldsymbol{\eta}$ via (12) [40]. The entries of the transformation matrix $\mathbf{T}$ of (11) can be obtained from the geometric relationships shown in Fig. 1, which are given in detail in (13), where $c$ is the speed of light. Furthermore, the channel gain of the LoS path can be expressed as in (14) [5], where $d_{BM}$ is the Euclidean distance between the BS and MS given by $d_{BM} = \|\mathbf{b} - \mathbf{m}\|_2$, and $\lambda$ is the wavelength. For the VLoS path, the channel gains $\gamma_{BR}$ and $\gamma_{RM}$ can be obtained similarly to (14). It is noteworthy that if the location is estimated only from the ToA and AoD (or AoR), the parameters reflecting the received signal strength, i.e., $\gamma_{BM}$ and $\gamma_{RM}$, can be neglected. On the other hand, when the LoS link is blocked, all the parameters defining the LoS path can be discarded.
Consequently, the transformation matrix T can be obtained, which is in the form of (15), whose entries are detailed in Appendix B.
Finally, after T is obtained, the FIM for the location parameters of MS can be obtained by (12).
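The transformation can be illustrated for the LoS path alone. The Python sketch below assumes the geometric relations $\tau_{BM} = \|\mathbf{b}-\mathbf{m}\|_2 / c$, $\theta_{BM} = \mathrm{atan2}(m_y - b_y, m_x - b_x)$ and $\phi_{BM} = \pi + \theta_{BM} - \alpha$. These angle conventions are assumptions and may differ from those of Fig. 1 and Appendix B. It also assumes that the transformation has the form $\mathbf{J}_{\boldsymbol{\varsigma}} = \mathbf{T} \mathbf{J}_{\boldsymbol{\eta}} \mathbf{T}^T$ with $[\mathbf{T}]_{i,j} = \partial \eta_j / \partial \varsigma_i$. The function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def los_parameters(bs, ms, alpha):
    """Return [ToA, AoD, AoA] of the LoS path under the assumed conventions."""
    dx, dy = ms[0] - bs[0], ms[1] - bs[1]
    toa = math.hypot(dx, dy) / SPEED_OF_LIGHT
    aod = math.atan2(dy, dx)
    aoa = math.pi + aod - alpha
    return [toa, aod, aoa]


def los_jacobian(bs, ms):
    """T[i][j] = d eta_j / d varsigma_i, eta = [ToA, AoD, AoA], varsigma = [m_x, m_y, alpha]."""
    dx, dy = ms[0] - bs[0], ms[1] - bs[1]
    d2 = dx * dx + dy * dy
    d = math.sqrt(d2)
    return [
        [dx / (SPEED_OF_LIGHT * d), -dy / d2, -dy / d2],  # derivatives w.r.t. m_x
        [dy / (SPEED_OF_LIGHT * d), dx / d2, dx / d2],    # derivatives w.r.t. m_y
        [0.0, 0.0, -1.0],                                 # derivatives w.r.t. alpha
    ]


def transform_fim(t, j_eta):
    """Compute T J_eta T^T for nested-list matrices."""
    rows, cols = len(t), len(j_eta)
    tj = [[sum(t[i][k] * j_eta[k][j] for k in range(cols)) for j in range(cols)]
          for i in range(rows)]
    return [[sum(tj[i][k] * t[j][k] for k in range(cols)) for j in range(rows)]
            for i in range(rows)]


# Hypothetical geometry: BS at the origin, MS at (3, 4) m.
t = los_jacobian((0.0, 0.0), (3.0, 4.0))
j_varsigma = transform_fim(t, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

The last row of `t` shows that, under these conventions, the orientation only affects the AoA, which is why the AoA is needed to resolve $\alpha$.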

C. POSITION AND ORIENTATION ESTIMATION ERROR BOUNDS
After the FIM of the location parameters has been derived using the approach in the last subsection, the error bounds of the position and orientation estimation can be readily obtained by combining the information from the $N$ subcarriers [16]. In detail, the FIM of the position parameters $\boldsymbol{\varsigma}$ is obtained by summing the per-subcarrier FIMs over the $N$ subcarriers. Correspondingly, the position error bound (PEB) and the orientation error bound (OEB) are obtained from the inverse of this FIM, where $[\cdot]_{1:2,1:2}$ and $[\cdot]_{3,3}$ denote the top-left $2 \times 2$ sub-matrix and the third diagonal entry, respectively. In Section V, the PEB and OEB will be compared with the performance of the proposed beam training based localization scheme, which is detailed in the next section.
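The computation of the two bounds can be sketched in Python as follows. The sketch assumes the commonly used definitions from the localization literature, namely that the PEB is the square root of the trace of $[\mathbf{J}_{\boldsymbol{\varsigma}}^{-1}]_{1:2,1:2}$ and the OEB is the square root of $[\mathbf{J}_{\boldsymbol{\varsigma}}^{-1}]_{3,3}$, with $\mathbf{J}_{\boldsymbol{\varsigma}}$ being the sum of the per-subcarrier FIMs. The function names and the numerical values are hypothetical.

```python
import math


def sum_fims(fims):
    """Element-wise sum of a list of 3x3 per-subcarrier FIMs."""
    return [[sum(f[i][j] for f in fims) for j in range(3)] for i in range(3)]


def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def position_error_bound(fim):
    """sqrt(trace of the top-left 2x2 block of the inverse FIM), in metres."""
    inv = invert_3x3(fim)
    return math.sqrt(inv[0][0] + inv[1][1])


def orientation_error_bound(fim):
    """sqrt of the (3,3) entry of the inverse FIM, in radians."""
    return math.sqrt(invert_3x3(fim)[2][2])


# Hypothetical example: two subcarriers with identical diagonal FIMs.
per_subcarrier = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 50.0]]
total = sum_fims([per_subcarrier, per_subcarrier])
peb = position_error_bound(total)
oeb = orientation_error_bound(total)
```

Because the FIMs add across subcarriers, using more subcarriers can only tighten both bounds.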

IV. JOINT BEAM TRAINING FOR LOCALIZATION
In [24], the LoS path is assumed to be fully obstructed, while in practice the LoS path may or may not exist. Moreover, in [12], both the LoS and VLoS paths are utilized for localization. However, the maximum likelihood algorithm employed there introduces a very high training overhead. Since mmWave channels are typically sparse, this property has been exploited for reducing the complexity and training overhead in channel estimation and localization [41]. However, unlike a conventional mmWave channel, the cascaded channel in a RIS-aided system is usually converted into a richly scattered channel, which contains an increased number of scattered paths due to the reflections of the RIS [12]. It is well-known that a richly scattered channel makes it difficult to estimate the channel parameters for the purpose of localization. Hence, to simplify the joint beamforming problem in the RIS-aided mmWave system, in this paper we propose to divide the joint beamforming into an active beamforming, operated at the BS and MS, and a passive beamforming, operated at the RIS, for both the LoS and VLoS paths. In our approach, the MS first determines the path with the highest power from the signals received via the LoS and VLoS paths, and then uses it to sense its own location. Explicitly, during the first stage, the RIS is de-activated and the beam sweeping algorithm is performed between the BS and MS. Then, the RIS is activated during the second stage. During this stage, as the location of the RIS is known a priori to the BS, the AoD from the BS to the RIS can be assumed to be known, which can be used by the BS to design a narrow beam towards the RIS. Furthermore, since the BS and RIS are at different locations and the LoS component between the BS and MS has been identified during the first stage, this LoS component can be removed from the received signal to eliminate its impact on the VLoS path. Hence, during the first stage, the AoD at the BS, denoted by $\theta_{BM}$, can be estimated.
During the second stage, the AoR at the RIS, i.e., $\theta_{RM}$, can be estimated. Moreover, we propose a novel beam sweeping method for beamformer design, so as to strike a good trade-off between the localization performance and the beam search time. This method will be compared with the conventional hierarchical codebook [24], partial beam sweeping [42], and exhaustive search approaches in Section V. In the remainder of this section, different codebook-based schemes for beamformer design are considered. We start with the conventional hierarchical codebook based design proposed in [37]. The RIS codebook and the BS/MS codebook are designed separately, where the RIS codebook contains only pure analog phase shifters, while the BS/MS codebook has a typical hybrid beamforming architecture. Therefore, in our codebook design, only analog codewords are considered for the RIS codebook, whereas hybrid codewords are considered for the BS and MS codebooks.

A. ACTIVE AND PASSIVE BEAMFORMING FOR LOS AND VLOS PATHS
In this paper, the location and orientation estimation based on the LoS or VLoS path is divided into two stages to address the LoS path blockage problem, which is similar to the assumption used in [12]. However, in [12], random beamforming is employed, which results in a high complexity and also a high offline training overhead. To overcome these problems, in our hybrid LoS and VLoS path based localization problem, we aim to attain a good trade-off between the localization performance and the training overhead, and we investigate the impact of different beam sweeping methods on the localization performance.
To identify the LoS path or VLoS path for the MS's location estimation, a two-stage strategy is introduced. In the first stage, the RIS is de-activated so that the MS can sense whether the LoS path is available. Then, in the second stage, the RIS is activated. Since the transmit beam from the BS to the RIS is very narrow, the NLoS components between the BS and RIS can be ignored. From these two stages, the MS identifies the strongest path for its location estimation. Below we detail this two-stage strategy.
Stage 1: As depicted in Fig. 1, when the RIS is de-activated, there are only LoS and NLoS channel components from the BS to the MS. In this case, $\rho_{VLoS} = 0$ in (7), and the received signals follow from (7) with the VLoS term removed, for $n = 0, 1, \ldots, N-1$, where $P$ is the transmit power, $\mathbf{n}[n]$ is additive white noise distributed as $\mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_M})$, while $\mathbf{f}_{LoS}$ and $\mathbf{W}$ are the hybrid precoder and combiner employed at the BS and MS, respectively, which are defined in a similar way to those in (7). It is noteworthy that the NLoS paths in mmWave systems vary fast and their power is usually weak. Hence, the NLoS paths are highly unreliable for localization [12]. Thus, it is reasonable to treat the NLoS paths as extra noise, which we represent [...] $\mathbf{a}_t(\theta_{BR})$. Furthermore, after the processing at the RIS, the beam from the RIS to the MS can also be designed to be a narrow beam. Hence, with the a priori knowledge of the narrow beams transmitted from the BS to the RIS and from the RIS to the MS, the interference generated by the LoS path on the VLoS path can be ignored, because the LoS component has already been estimated by the MS during the first stage. This explains why the LoS component is not included in (21) for the sake of simplicity.

B. PROPOSED BEAM SWEEPING ALGORITHM
Since the phase shifts at the RIS are controllable, the signal processing carried out by the RIS is equivalent to passive beamforming. Thus, based on the downlink PRS reflected by the RIS, the MS can adjust the reflection phases at the RIS via limited feedback transmitted over a wireless control link, with the objective of, for example, maximizing the received signal power [24], [43]. Fig. 2 shows some examples of beam alignment algorithms in beamspace. In more detail, as depicted in Fig. 2(a), the most basic beam training technique is to exhaustively test all possible pairs of codewords to determine the best pair that maximizes the received signal power. Note that a pair of codewords contains one for the RIS to implement precoding and one for the MS to carry out combining. Hence, this scheme needs to sweep every possible angle [37], [42]. Consequently, this method is known as the exhaustive search (ES), which demands an extremely high training overhead if the number of elements at the RIS and/or MS is large.
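The exhaustive search can be sketched in a few lines of Python. The sketch assumes that the received power of every codeword pair is available from a lookup table `power[tx][rx]`, which stands in for an over-the-air measurement. The table, its size and the function name `exhaustive_search` are hypothetical.

```python
def exhaustive_search(power):
    """Test every (transmit, receive) codeword pair and keep the strongest.

    `power[tx][rx]` is a toy stand-in for the received signal power measured
    with transmit codeword `tx` and receive codeword `rx`.
    Returns the best pair and the number of tests performed.
    """
    best = None
    tests = 0
    for tx, row in enumerate(power):
        for rx, value in enumerate(row):
            tests += 1
            if best is None or value > best[0]:
                best = (value, tx, rx)
    return (best[1], best[2]), tests


# Hypothetical 8x8 beamspace grid with a single dominant path at (5, 2).
grid = [[0.01 for _ in range(8)] for _ in range(8)]
grid[5][2] = 1.0
best_pair, num_tests = exhaustive_search(grid)
```

The number of tests equals the product of the two codebook sizes, which is the overhead that the other schemes try to reduce.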
In the hierarchical codebook (HC) scheme proposed in [37], as shown in Fig. 2(b), the columns and rows of the grid correspond to the AoD at the BS (or AoR at the RIS) and the AoA at the MS, respectively. This method follows a multi-stage search strategy. For instance, if the HC tests two codewords at each of the RIS and MS sides, a 2×2 grid is obtained at the first stage (level). The HC has several levels, and the number of codewords increases with the level index. In other words, the lowest level uses a small number of low-resolution codewords that generate broad beams, while the higher levels use a larger number of high-resolution codewords that generate narrow beams with very high directional gain. Therefore, the HC consists of multiple levels with the resolution increasing from the lower levels to the higher levels, which significantly reduces the overhead of the codeword pair search.
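A binary version of this multi-level search can be sketched as follows in Python. The sketch makes a strong simplifying assumption: the power received with a pair of wide beams is modelled as the mean of the fine-grid powers inside the corresponding sector, computed from a hypothetical lookup table `power[tx][rx]`. The function names are hypothetical as well, and the grid sizes are assumed to be powers of two.

```python
def sector_mean(power, tx_range, rx_range):
    """Toy wide-beam measurement: mean fine-grid power inside a sector."""
    cells = [power[t][r] for t in range(*tx_range) for r in range(*rx_range)]
    return sum(cells) / len(cells)


def split(index_range):
    """Split a half-open index range into two halves (or keep a single cell)."""
    lo, hi = index_range
    if hi - lo <= 1:
        return [index_range]
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def hierarchical_search(power):
    """Halve the transmit and receive sectors at every level (2x2 tests per level)."""
    tx, rx = (0, len(power)), (0, len(power[0]))
    tests = 0
    while tx[1] - tx[0] > 1 or rx[1] - rx[0] > 1:
        best = None
        for tx_half in split(tx):
            for rx_half in split(rx):
                tests += 1
                value = sector_mean(power, tx_half, rx_half)
                if best is None or value > best[0]:
                    best = (value, tx_half, rx_half)
        _, tx, rx = best
    return (tx[0], rx[0]), tests


# Hypothetical 8x8 beamspace grid with a single dominant path at (5, 2).
grid = [[0.01 for _ in range(8)] for _ in range(8)]
grid[5][2] = 1.0
best_pair, num_tests = hierarchical_search(grid)
```

For an 8×8 grid the sketch needs three levels of four tests each, compared with 64 tests for a full sweep. The price is that a wrong decision at a coarse level cannot be corrected later.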
The partial beam sweeping (PS) scheme proposed in [42] uses only a subset of the codewords for beam sweeping so as to reduce the overhead. As shown in Fig. 2(c), the transmit and receive codewords are represented as a two-dimensional grid, where each grid cell corresponds to a pair of transmit and receive codewords pointing in certain directions. The PS scheme divides the beam training into two stages, namely the initial test (IT) and the additional test (AT). Unlike the ES scheme, which directly tests all the high-resolution grids, the PS scheme searches only half or even fewer of the grids during the IT stage, which aims to obtain approximate estimates of the AoD (or AoR) and AoA. However, owing to the power leakage of the mmWave channel, the two neighboring columns and two neighboring rows of the grids with the largest power are also chosen. As shown in Fig. 2(c), we assume that the two darkly filled tested grids have the highest power, which yields a new grid, shown in the middle of Fig. 2(c), for testing at the AT stage. At the AT stage, the untested neighboring grids are tested, and the best pair is the one yielding the highest power [42].
In contrast to the above three search schemes, our proposed beam sweeping scheme combines their benefits to achieve a trade-off between performance and training overhead. As shown in Fig. 2(d), the proposed scheme uses two search stages, referred to as the initial test (IT) and the deep test (DT), respectively. Different from the PS method, during the IT stage our scheme uses low-resolution codewords instead of high-resolution ones and employs the HC scheme, in order to estimate the potential range of the beam directions on the grids. However, the power leakage of the mmWave channel may result in misidentification and degrade the estimation performance. Thus, as shown in Fig. 2(d), one neighboring column and one neighboring row of the higher-resolution grids (yellow grids) are included in the DT stage. Then, during the DT stage, the ES scheme is employed to search for the final pair of codewords. To be more specific, during the IT stage, the proposed beam sweeping method can be viewed as a higher-resolution hierarchical codebook algorithm, which is more robust than the binary HC algorithm [24] and the ternary-tree HC algorithm [26], while its complexity is lower than that of the PS algorithm [42] and the ES algorithm [37]. The DT stage then benefits from the angle range estimated at the IT stage, which provides a significantly reduced search space for the exhaustive search. Hence, the proposed beam sweeping scheme has the potential to obtain near-optimal codewords at a search overhead significantly lower than that of the ES algorithm. The training overheads of the HC, ES, PS and proposed algorithms are summarized in Table 1. Further details about the performance and training overhead of the different beam sweeping schemes are provided in the next section.
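The two-stage IT/DT procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `two_stage_search` and `_grow`, the modelling of a low-resolution measurement as the mean power of a block of high-resolution grids, and the rule that grows the DT window towards the stronger neighbouring coarse cell are all assumptions introduced here.

```python
import numpy as np

def _grow(lo, hi, idx, line, k2):
    """Grow [lo, hi) by one fine-grid index towards the stronger coarse neighbour.
    (Hypothetical rule; the source only states that one row/column is added.)"""
    before = line[idx - 1] if idx > 0 else -np.inf
    after = line[idx + 1] if idx < len(line) - 1 else -np.inf
    if after >= before:
        return lo, min(hi + 1, k2)
    return max(lo - 1, 0), hi

def two_stage_search(power, k1):
    """power: K2 x K2 array of pair powers on the high-resolution grid.
    Assumes K2 is divisible by K1. Returns ((row, col), number_of_tests)."""
    k2 = power.shape[0]
    cell = k2 // k1  # fine grids per coarse cell along each axis
    # IT stage (assumption): a broad beam collects the mean power of its block.
    coarse = power.reshape(k1, cell, k1, cell).mean(axis=(1, 3))
    ci, cj = np.unravel_index(np.argmax(coarse), coarse.shape)
    # DT window: the selected block plus one neighbouring row and column.
    r0, r1 = _grow(ci * cell, (ci + 1) * cell, ci, coarse[:, cj], k2)
    c0, c1 = _grow(cj * cell, (cj + 1) * cell, cj, coarse[ci, :], k2)
    # DT stage: exhaustive search restricted to the window.
    window = power[r0:r1, c0:c1]
    wi, wj = np.unravel_index(np.argmax(window), window.shape)
    n_tests = k1 * k1 + window.size
    return (int(r0 + wi), int(c0 + wj)), int(n_tests)

# Toy 8 x 8 grid with a dominant pair at (5, 2) and some leakage around it.
power = np.full((8, 8), 0.01)
power[5, 2], power[5, 3], power[4, 2] = 1.0, 0.4, 0.3
best, n_tests = two_stage_search(power, k1=2)
print(best, n_tests)
```

For this toy grid the DT window is 5 × 5, so the sketch tests far fewer pairs than the 64 tests of an exhaustive search over the same grid.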
Remark: For the first stage, the proposed beam sweeping algorithm tests $\lceil\sqrt{K}\rceil^2$ candidate pairs of precoding and combining codewords, which can also be termed beamspace grids. For example, as shown in Fig. 2(d), when $K_1 = 2$ codewords are employed for both precoding and combining, the complexity of the first stage is $O(K_1^2)$. Similarly, for the second stage, $K_2 \times K_2$ codewords are available, but only a specific range of grids is considered (the yellow grids in Fig. 2(d)). From Fig. 2(d), the related complexity is $O(K_2^2/4 + K_{\text{additional}})$, where $K_{\text{additional}}$ denotes the number of grids neighboring the range estimated at the first stage. Thus, the total complexity of the proposed beam sweeping method is $O(29)$. Comparatively, for the HC algorithm, the complexity is $O(SK^2) = O(12)$, where $S = 3$ denotes the total number of layers of beam sweeping and $K$ is set to 2 in Fig. 2. For the ES algorithm, the complexity is $O(K_2^2) = O(64)$. Finally, the complexity of the PS algorithm is $O(K_2^2/2 + K_{\text{neighbor}}) = O(40)$, where $K_{\text{neighbor}}$ denotes the number of neighboring grids tested in addition to the two grids selected at the first stage. We can deduce from Table 1 that the complexity of our proposed algorithm is lower than that of the PS and ES algorithms, because the resolution sacrificed in the first stage helps to reduce the overall complexity. More explicitly, to attain a good trade-off between performance and complexity, the proposed algorithm exploits the low complexity of the HC algorithm and the neighboring grids of the PS algorithm, which help to mitigate the beamspace power leakage.
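The counts quoted in the Remark can be reproduced with the short Python sketch below. The function names are hypothetical. The values K_additional = 9 and K_neighbor = 8 are not stated explicitly in the text; they are inferred here from the quoted totals for the Fig. 2 example, and the division by K_1^2 in the DT term is an assumed generalisation of the factor 4 that appears for K_1 = 2.

```python
def hc_tests(k, layers):
    """Hierarchical codebook: K x K pairs tested on each of S layers."""
    return layers * k * k

def es_tests(k2):
    """Exhaustive search: every pair on the K2 x K2 grid."""
    return k2 * k2

def ps_tests(k2, k_neighbor):
    """Partial search: half of the grid plus the neighbouring grids."""
    return k2 * k2 // 2 + k_neighbor

def proposed_tests(k1, k2, k_additional):
    """Proposed scheme: coarse K1 x K1 stage, then one coarse cell of the
    fine grid (assumed to hold K2^2 / K1^2 pairs) plus the added grids."""
    return k1 * k1 + (k2 * k2) // (k1 * k1) + k_additional

# Fig. 2 example: K = K1 = 2, S = 3, K2 = 8.
print("HC:", hc_tests(2, 3))
print("ES:", es_tests(8))
print("PS:", ps_tests(8, k_neighbor=8))                 # 8 is an inferred value
print("Proposed:", proposed_tests(2, 8, k_additional=9))  # 9 is an inferred value
```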

C. DESIGN OF HYBRID AND PURE ANALOG CODEBOOK
The proposed beam sweeping scheme requires two sets of codewords with different resolutions for the two stages, namely the IT and DT stages, indexed by s = 1 and s = 2, respectively. The IT stage employs a low-resolution codebook to estimate the approximate ranges of the beamspace AoD/AoR/AoA, while the DT stage employs a high-resolution codebook to provide the final, relatively accurate estimates of the beamspace AoD/AoR/AoA.
In this paper, the method proposed in [37] is adopted to generate the codebook. Specifically, to obtain the codebook used at the $s$-th stage, the overcomplete quantized angle dictionary of [24] is employed, which is defined over $G$ quantized angles in the interval $[-1, 1]$ [24]. Let $\mathbf{M}_s$ and $\mathbf{W}_s$ denote the codebooks of the RIS and MS, respectively, where each column of $\mathbf{M}_s$ and $\mathbf{W}_s$ is a codeword at the $s$-th stage, corresponding to a specific angle in the beamspace domain. As shown in [24], [37], the codebook $\mathbf{M}_s$ (and likewise $\mathbf{W}_s$) is obtained from this dictionary by an operation involving the pseudoinverse $(\cdot)^{\dagger}$, a normalization constant $Z_s$ and a $G \times K_s$ matrix $\mathbf{G}_s$ that contains only binary elements, where $K_s$ is the number of codewords used at stage $s$. More explicitly, each column of $\mathbf{G}_s$ contains a column vector $\mathbf{n}_{\text{non-zero}}$ of $G/K_s$ ones, while the remaining elements are zeros. In this way, the beam pattern of the codebook $\mathbf{M}_s \in \mathbb{C}^{N_R \times K_s}$ is constant over a specific angular range. Similarly, $\mathbf{W}_s \in \mathbb{C}^{N_M \times K_s}$ can be obtained for the MS. Each column of the codebook is a codeword corresponding to a specific angular beam. As an example, Fig. 3 shows the beam patterns of a RIS system, where the IT stage codebook consists of 8 codewords and the DT stage codebook contains 64 codewords, and all the codewords have the same length $N_R$. Clearly, the DT stage codebook has a much higher resolution than the IT stage codebook. Note that, as shown in Fig. 3, when the number of codewords exceeds the number of antennas $N_R$, adjacent codewords become non-orthogonal. However, each of the non-orthogonal codewords still points to a specific angle. This property can be used to estimate the angle with an accuracy higher than that obtained by using only the set of orthogonal codewords.
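To make the roles of the dictionary, the binary selection matrix and the pseudoinverse more concrete, the following Python sketch builds sector-beam codewords by a least-squares fit. It is a sketch under assumptions rather than the exact expression used in this paper: the dictionary entries exp(jπuψ_v)/√N, the uniform angle grid, the unit-norm column scaling standing in for Z_s, and all function names are choices made here for illustration.

```python
import numpy as np

def angle_dictionary(n_elem, n_grid):
    """N x G overcomplete dictionary over G quantized angles in [-1, 1).
    The steering-vector form used here is an assumption."""
    psi = -1.0 + 2.0 * np.arange(n_grid) / n_grid
    u = np.arange(n_elem)[:, None]
    return np.exp(1j * np.pi * u * psi[None, :]) / np.sqrt(n_elem)

def selection_matrix(n_grid, n_codewords):
    """G x K binary matrix: column k holds G/K ones marking sector k."""
    block = n_grid // n_codewords
    g = np.zeros((n_grid, n_codewords))
    for k in range(n_codewords):
        g[k * block:(k + 1) * block, k] = 1.0
    return g

def sector_codebook(n_elem, n_grid, n_codewords):
    """Least-squares codewords whose beam pattern A^H M approximates G_s."""
    a = angle_dictionary(n_elem, n_grid)
    g = selection_matrix(n_grid, n_codewords)
    m = np.linalg.pinv(a.conj().T) @ g
    # Unit-norm columns play the role of the normalization constant here.
    return m / np.linalg.norm(m, axis=0, keepdims=True)

# Example: N = 16 elements, G = 64 grid angles, K = 8 low-resolution codewords.
M1 = sector_codebook(16, 64, 8)
pattern = np.abs(angle_dictionary(16, 64).conj().T @ M1) ** 2  # G x K beam powers
print(M1.shape, pattern.shape)
```

Each column of `pattern` concentrates its power on the grid angles of its own sector, which is the behaviour the text describes as a beam pattern that is approximately constant over a specific angular range.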
As mentioned previously, the RIS is assumed to be equipped only with phase shifters and no radio frequency (RF) chains. Hence, each element of the phase shift matrix is required to have a constant modulus. To satisfy this requirement, once $\mathbf{M}_s$ is obtained using (22), each of its codewords is converted into a constant-modulus one, where $[\cdot]_{:,j}$ denotes the $j$-th codeword and $[\cdot]_{i,j}$ denotes the element located in the $i$-th row and $j$-th column.
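One common way of enforcing the constant-modulus constraint is to keep only the phase of every codebook entry. The Python sketch below illustrates this element-wise projection; whether it coincides with the expression used in this paper is an assumption, and the function name is hypothetical.

```python
import numpy as np

def to_constant_modulus(codebook):
    """Keep the phase of each entry and set its magnitude to one.
    (Assumed projection; entries equal to zero map to phase zero.)"""
    return np.exp(1j * np.angle(codebook))

# Example: a 2 x 2 codebook with unequal magnitudes.
M = np.array([[0.5 + 0.5j, 2.0], [-1.0j, -0.1]])
M_cm = to_constant_modulus(M)
print(np.abs(M_cm))
```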
On the other side, both the MS and BS can employ hybrid beamforming. For simplicity of codebook generation, we assume that the MS and BS have the same number $N_{RF}$ of RF chains, and that each of them can access at most $N_{RF}$ observations at a time. Consequently, $N_{RF}$ codewords are employed as the combiner at the MS (or as the precoder at the BS). Then, the matrices for the hybrid combining (or precoding) at the $s$-th stage can be obtained as in (26).

D. PROPOSED DESIGN FOR JOINT BEAM TRAINING AND LOCALIZATION
Assume that the LoS and VLoS paths are separated into two frames, so that they do not interfere with each other. Then, the impact of small-scale fading can be ignored, while the weak NLoS paths can be treated as noise whose power is proportional to the transmitted power [12]. In our proposed joint beam training and localization scheme, the powers of the LoS and VLoS paths are first compared, and the path with the higher power is selected for joint training and localization. Specifically, in the first frame, the RIS is deactivated. In this case, the localization problem can be treated as a conventional joint active beam alignment problem between the BS and MS. By contrast, in the second frame, when the RIS is activated, localization is achieved by joint passive and active beamforming at the RIS and MS, respectively. Below we detail the operations in Frame 1 and Frame 2.
Frame 1: In Frame 1, joint active beamforming between the BS and MS is implemented to match the LoS path, after the hybrid codewords are selected iteratively by the BS and the MS. As discussed in Section IV-C, the design has two stages, namely the IT and DT stages. We assume that, during the IT stage, the number of codewords in the low-resolution codebook employed by both the BS and MS equals the square root of that of the DT stage, i.e., $K_1 = \lceil\sqrt{K_2}\rceil$. The codewords used for precoding and combining are generated according to (26) in Section IV-C. Therefore, based on (20), the improved interlaced scanning (IS) based received signal of the LoS path can be represented in the form of (28). Considering $N$ subcarriers, the associated sum power obtained when the BS applies the $i_{BS}$-th codeword and the MS applies the $j_{MS}$-th codeword is given by (29). If $s = 1$, the pair with the highest sum power determines the search range of the DT stage, as shown in Fig. 2. By contrast, if $s = 2$, the selected pair is used as the final pair of codewords for designing the hybrid precoder and hybrid combiner.
Frame 2: In Frame 2, joint active and passive beamforming between the RIS and MS is implemented to match the VLoS path. As in the LoS scenario, a low-resolution codebook $\mathbf{M}_1$ in the form of (25) is used for the passive beamforming of the RIS during the IT stage, while a higher-resolution codebook $\mathbf{M}_2$ is used during the DT stage. The combining codewords used by the MS are obtained according to (26). Then, following (21), the received signal on subcarrier $n$ at stage $s = 1$ or 2 can be expressed as (30), where the precoding vector $\mathbf{f}_{\text{VLoS}}$ employed by the BS to steer the transmitted signal towards the RIS is assumed to be known. In (30), the $K_s$ columns of $\mathbf{W}_s$ are chosen from the codebook $\mathbf{W}$ for hybrid beamforming, and $\text{diag}(\mathbf{m}_{s,k})$ is the phase shift matrix, whose diagonal elements are given by the codeword $\mathbf{m}_{s,k}$ used for the passive beamforming of the RIS, where $\mathbf{m}_{s,k}$ is the $k$-th column of $\mathbf{M}_s$.
Similar to (28) and (29), when the $K_s$ codewords are applied in turn by the RIS, the MS obtains $\mathbf{Y}_{\text{VLoS},s}[n]$ for subcarrier $n$ in the form of (28). Then, to estimate the AoR at the RIS and the AoA at the MS, the sum power with respect to all $i_{RIS} \in [1, K_s]$ and $j_{MS} \in [1, K_s]$ is calculated as in (31). For the IT stage ($s = 1$), the best pair of codewords is the one with the highest power, which yields the search range for stage 2 after taking some neighbouring grids into account, as shown in Fig. 2. Finally, after the DT stage, which follows the same search procedure as the IT stage, a pair of codewords is obtained: the RIS codeword is used only to generate the phase shifts, while the MS codeword is used to carry out hybrid combining. Note that, to inform the RIS of the selection decision, the MS needs $\log_2(K_s)$ feedback bits, which are sent to the RIS controller through the wireless control link.
Path decision: From the above discussion, there may be LoS and/or VLoS paths between the BS and MS. When both the LoS and VLoS paths are present, the MS chooses the better path for positioning according to the received powers of the two paths.² Furthermore, after the pair of codewords is selected at the DT stage, the channel parameters can be estimated, as detailed below.
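The selection steps described above (sum power over subcarriers, choice of the strongest codeword pair, feedback size and path decision) can be illustrated with the following Python sketch. It assumes that the received samples are already arranged in an array indexed by transmit-side codeword, MS codeword and subcarrier; this layout, the function names, the rounding up of the feedback size and the tie-breaking rule in the path decision are assumptions made for illustration.

```python
import math
import numpy as np

def select_pair(y):
    """y[i, j, n]: received sample for transmit-side codeword i (BS or RIS),
    MS codeword j and subcarrier n. Returns (i, j, sum power of that pair)."""
    power = np.sum(np.abs(y) ** 2, axis=2)  # sum power over the N subcarriers
    i, j = np.unravel_index(np.argmax(power), power.shape)
    return int(i), int(j), float(power[i, j])

def feedback_bits(k_s):
    """Bits needed to index one of K_s codewords (rounded up, an assumption)."""
    return math.ceil(math.log2(k_s))

def choose_path(p_los, p_vlos):
    """Pick the path with the higher power; a tie is assigned to VLoS here,
    which the source does not specify."""
    return "LoS" if p_los > p_vlos else "VLoS"

# Toy example: 4 x 4 codeword pairs, 3 subcarriers, strongest pair at (2, 1).
y = np.zeros((4, 4, 3), dtype=complex)
y[2, 1, :] = 1.0 + 1.0j
i_best, j_best, p_best = select_pair(y)
print(i_best, j_best, p_best, feedback_bits(8), choose_path(p_best, 0.5))
```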
Blocked LoS case ($P_{\text{LoS},S} < P_{\text{VLoS},S}$): When the LoS path is blocked, the MS senses its location based on the parameters estimated from the VLoS path, which are given by (32), where $\mathbf{t}(\tau_{RM})$ is the delay domain grid, while $\mathbf{U}_t(\theta_{RM}) \in \mathbb{C}^{N_R \times K_2}$ and $\mathbf{U}_r(\phi_{RM}) \in \mathbb{C}^{N_M \times K_2}$ are the search dictionaries of $\theta_{RM}$ and $\phi_{RM}$, respectively. Finally, $\mathbf{y}_{\text{VLoS},2}$ is the received signal at the DT stage, which is a vector obtained by stacking the signals received from all $N$ subcarriers.
²In RIS-assisted mmWave communications, the NLoS paths in practice vary fast and are usually too weak to build an effective transmission link, although they may occasionally have a higher power than the VLoS path, as argued in [44]. In this paper, only the stable links, namely the LoS and VLoS paths, are considered for positioning, while the NLoS paths are treated as interference.
Based on the estimates of the channel parameters, the location and orientation of the MS can be estimated as in (33) [16].
LoS case ($P_{\text{LoS},S} > P_{\text{VLoS},S}$): When the LoS path is available, the channel parameters of the LoS path can be estimated with the aid of the hybrid codewords selected at the DT stage, as expressed in (34), where $\mathbf{f}_2 = [\mathbf{W}_2]_{:,i_{BS}}$ and $\mathbf{w}_2 = [\mathbf{W}_2]_{:,j_{MS}}$ are the hybrid precoding codeword of the BS and the hybrid combining codeword of the MS determined during the DT stage. $\mathbf{U}_t(\theta_{BM}) \in \mathbb{C}^{N_B \times K_S}$ and $\mathbf{U}_r(\phi_{BM}) \in \mathbb{C}^{N_M \times K_S}$ are the search dictionaries corresponding to $\theta_{BM}$ and $\phi_{BM}$, respectively, while $\mathbf{t}(\tau_{BM})$ is defined similarly to (32). Having obtained the estimates in (34), the location and orientation of the MS can be obtained as in (35).
Note that, as mentioned in Section IV-B, to mitigate the channel power leakage problem, one column and one row neighbouring the grid identified at the IT stage are added to form the higher-resolution search grid of the DT stage. As a result, the total numbers of codewords searched in the two stages are slightly different. To sum up, the proposed improved IS beam sweeping localization scheme is described by Algorithm 1 and illustrated by the diagram in Fig. 4.
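The final geometric step, which maps the estimated angles and delay to a position and an orientation, can be sketched as follows. The sketch assumes the commonly used two-dimensional relations (position equals anchor plus range times the unit vector along the departure angle, and orientation equals π plus the departure angle minus the arrival angle); these may differ in detail from (33) and (35). It is also assumed that the VLoS delay covers the whole BS-RIS-MS route and that the BS and RIS positions are known. All function names are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def _wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

def locate_los(bs, theta_bm, phi_bm, tau_bm):
    """LoS case: the MS lies at range c*tau from the BS along the AoD."""
    direction = np.array([np.cos(theta_bm), np.sin(theta_bm)])
    position = np.asarray(bs, dtype=float) + C * tau_bm * direction
    return position, _wrap(np.pi + theta_bm - phi_bm)

def locate_vlos(bs, ris, theta_rm, phi_rm, tau_total):
    """VLoS case: subtract the known BS-RIS distance from the total range
    (assumes tau_total is the delay of the full BS-RIS-MS route)."""
    bs, ris = np.asarray(bs, dtype=float), np.asarray(ris, dtype=float)
    d_rm = C * tau_total - np.linalg.norm(ris - bs)
    direction = np.array([np.cos(theta_rm), np.sin(theta_rm)])
    return ris + d_rm * direction, _wrap(np.pi + theta_rm - phi_rm)

# Toy check: BS at the origin, MS at (3, 4) m with orientation 0.2 rad.
theta = np.arctan2(4.0, 3.0)
pos, ori = locate_los((0.0, 0.0), theta, np.pi + theta - 0.2, 5.0 / C)
print(pos, ori)
```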

V. SIMULATION RESULTS AND ANALYSIS
In this section, we first compare the performance of the proposed improved IS beam sweeping algorithm with that of the conventional algorithms, namely the exhaustive search (ES), hierarchical codebook (HC) and partial search (PS), when the impacts of the number of antennas, the codebook size, and electrically large and small RISs are respectively considered. Then, we compare the training overhead required by the different beam sweeping algorithms when the effect of the LoS blockage rate is taken into account. Finally, we compare the CRLB in the scenarios of LoS only, VLoS only, and both LoS and VLoS present.
Algorithm 1 (steps recoverable from the source)
   ...
   7: Store the k-th received signal as (28).
   8: end for
   9: Calculate the sum-power matrix as (29), and find the desired pair of codewords, which are indexed by i_BS and j_MS.
   ...
   for k = 1 : K_s do
   14: for n = −(N − 1)/2 : (N − 1)/2 do
   15: Store the received signals for all pairs of codewords according to (30).
   16: end for
   17: Store the k-th received signal similarly to (28).
   18: end for
   19: Calculate the sum-power matrix as (31), and find the desired pair of codewords, which are indexed by i_RIS and j_MS.
   20: end for
   21: Estimate the channel parameters by following (32) or (34) based on the selected pair of codewords.
   22: Estimate the MS's location m and orientation α using (33) or (35).

A. SIMULATION SETUP
Table 2 details the parameters used in the simulations. A system in which the BS, MS and RIS each employ 16 antennas is considered for comparing the performance of the different beam sweeping algorithms and for demonstrating the impact of the number of codewords (beamspace resolution). The locations of the BS, MS and RIS are given in Table 2. For the path blockage, the blockage rate $\rho$ is set according to [35], where $\lambda_{\text{blocker}} = 0.1$ denotes the blocker density and $v = 1$ m/s is the blockers' moving speed, with $h_b = 5$ m, $h_m = 1.2$ m and $h_{\text{blocker}} = 2$ m denoting the heights of the BS, MS and blockers, respectively. Finally, the estimation performance is measured by the root mean squared error (RMSE), defined as $\text{RMSE} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\|\hat{\mathbf{x}}_k - \mathbf{x}\|^2}$, where $K$ denotes the number of Monte Carlo trials, $\mathbf{x}$ is the true MS location or orientation, and $\hat{\mathbf{x}}_k$ is its estimate in the $k$-th trial.
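As a small illustration of the RMSE metric over Monte Carlo trials, the following Python sketch evaluates it for either a vector-valued quantity (location) or a scalar one (orientation). The function name and the array layout, with one row per trial, are assumptions.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean squared error over K Monte Carlo trials.
    estimates: shape (K,) or (K, d); truth: scalar or shape (d,)."""
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    err = err.reshape(len(err), -1)  # one row per trial
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))

# Location example: two trials, each 1 m away from the true position.
loc_rmse = rmse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
# Orientation example: two trials, each 0.1 rad away from the true angle.
ori_rmse = rmse([0.1, -0.1], 0.0)
print(loc_rmse, ori_rmse)
```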

B. COMPARISON OF DIFFERENT BEAM SWEEPING ALGORITHMS
In this subsection, the numbers of antennas at the BS, MS and RIS are all set to 16. For the hierarchical codebook (HC) algorithm [24], the first layer contains 2 codewords at each side, and the total number of layers is $S = 6$. For the partial beam sweeping (PS) algorithm [42], two layers with the same resolution are assumed, and 32 of the 64 codewords are employed at each side. After the first-layer search, one neighbouring column and one neighbouring row are added to the second-layer search, as illustrated in Fig. 2(c). For the ES algorithm, all the 64 codewords at both sides are trained to obtain the optimal pair of codewords. Finally, for the proposed improved interlaced scanning (IS) algorithm, the number of codewords of the second layer ($s = 2$), which is used during the DT stage, is set to $K_2 = 64$ at the BS, RIS and MS. Thus, for the IT stage, we have $K_1 = \sqrt{64} = 8$ codewords.
In Fig. 5, we compare the RMSE of the AoA estimation achieved by the different beam sweeping algorithms, namely the ES, HC, PS and proposed algorithms, in three scenarios: LoS only, VLoS only and unknown LoS blockage. First, the simulation results show that, for all four beam sweeping algorithms, the estimation based on the LoS path, if it is available, is better than that based on the VLoS path, owing to the larger path loss of the VLoS path. Second, as expected, the ES algorithm outperforms all the other algorithms, since it performs an exhaustive search and hence demands the highest complexity. Third, the PS algorithm requires about half of the complexity of the ES algorithm, while its performance is much better than that of the HC algorithm. However, we should note that, when the number of codewords is large, the first search stage of the PS algorithm is of high complexity. Finally, as seen in Fig. 5, the proposed method converges at 2 dBm in the case of unknown LoS blockage, which is about 2 dB worse than the ES algorithm. However, when the complexity is taken into account, the proposed method achieves a superior trade-off between complexity and performance when compared with the ES algorithm. The complexity and training overhead trade-off is analyzed in more detail later.
Fig. 6 compares the RMSE of the AoD (or AoR) estimation when the different beam sweeping algorithms are respectively employed in the scenarios of LoS only, VLoS only and unknown LoS blockage. Similar to Fig. 5, the LoS only scenario provides a higher estimation accuracy than the VLoS only scenario. When comparing Fig. 5 with Fig. 6, we can observe that the estimation accuracies of the AoD (or AoR) and the AoA of the same path differ. This can be explained by the system geometry, as the AoA to be estimated is closer to a beamspace grid point than the AoD (or AoR), yielding a higher accuracy of AoA estimation. Additionally, it can be seen that, for a given beam sweeping algorithm, an RMSE floor appears when the SNR reaches a relatively high value. This floor is due to the beamspace grids, which unavoidably introduce quantization errors. The other observations are similar to those obtained from Fig. 5.
The corresponding RMSE of the ToA estimation is shown in Fig. 7. As shown in Fig. 7, the RMSE of the LoS path based estimation converges to $2 \times 10^{-9}$ s, while that of the VLoS path based estimation converges to $1.55 \times 10^{-9}$ s. Note that the ToA estimation depends only on the combined grids of the received signal power and delay. In this study, the step size of the time delay is set to $10^{-8}$ s. Furthermore, for the pure LoS case, the ToA estimation obtains a gain of 4 dB after convergence, which is provided by the dominant power of the LoS path.
Based on the estimated AoD (or AoR) and ToA as well as (33) and (35), the RMSE of positioning is shown in Fig. 8 for the cases of LoS only, VLoS only and unknown LoS blockage. For all the beam sweeping algorithms, the LoS case outperforms both the VLoS and the unknown LoS blockage cases, converging to an RMSE of 70 cm. For the VLoS case, the RMSE of localization is around 1 m, while for the unknown LoS blockage case, this error is about 85 cm. As shown in Fig. 8, our proposed algorithm needs about 9 dB more transmit power than the PS algorithm to reach the converged performance, which is still 18 dB less than that required by the HC algorithm.
Finally, based on the estimated AoA as well as (33) and (35), the RMSE of the MS's orientation estimation is shown in Fig. 9. The orientation error follows the same trend as the AoA estimation error shown in Fig. 5, since the orientation estimation is AoA-based. As shown in Fig. 9, in the LoS only case, the proposed algorithm achieves better performance than the PS and HC algorithms when the transmitted power is less than 10 dBm. This is because the first-level search of the proposed algorithm provides an approximate orientation range, especially in the low-SNR regime, which is then refined by the second-level search. More explicitly, although the PS algorithm has a higher resolution during the first-level search, it searches over only half of the grids. By contrast, the proposed algorithm searches all the possible AoA ranges in the low-resolution domain, which is robust to noise. Hence, it outperforms the PS algorithm in the low-SNR region. For the VLoS only case, the PS algorithm attains better performance than the proposed algorithm. This is because in this case the received signal is very weak, and more codewords would be required by the proposed algorithm at the IT stage. However, the PS algorithm employs more codewords in the first-level search than the proposed one, which leads to a much higher complexity, since the beamspace angle range covered by the first-level search is wide and hence requires more grids. The training overhead of all the algorithms is analyzed in the next subsection to demonstrate the advantage of our proposed algorithm.

C. TRAINING OVERHEAD
Since all the codebook-based algorithms converge to the same estimation error floor, we investigate the trade-off between training cost and performance. First, in Fig. 10, we compare the beam sweeping time overhead of the different algorithms when the transmitted power is $P_{TX} = 5$ dBm. Note that, since the RIS is only capable of adjusting phases, no RF chains are employed at the RIS. As shown in Fig. 10, when the number of RF chains is $N_{RF} > 2$, the required training time of the HC algorithm becomes constant, because the HC algorithm considers only 2 codewords per layer. Thus, $N_{RF} = 2$ is sufficient for the HC algorithm, which leads to the lowest hardware complexity and training overhead. As $N_{RF}$ increases, the training overhead of the ES algorithm decreases significantly, because it employs a large number of codewords $K_s$ for the beam search. However, it still has a high training overhead, even when the number of RF chains is large. As for our proposed algorithm, Figs. 5 to 9 show that it has a small performance loss in comparison with the ES algorithm. However, when compared with the ES and PS algorithms, the complexity and the training overhead of our proposed algorithm are reduced by factors of about 25 and 7, respectively. Moreover, based on Table 1, Table 3 lists some codebook sizes and the corresponding training overheads of the ES, PS and proposed algorithms. From the results we can see that the training overhead of the proposed algorithm depends only linearly on its codebook size. By contrast, the training overheads of the ES and PS algorithms grow quadratically with the codebook size. Hence, for a given training overhead, the proposed algorithm is able to employ more codewords to achieve a higher resolution.
Therefore, based on the above analysis, it can be inferred that the proposed algorithm is capable of providing a promising trade-off between the achievable performance and the training overhead when compared with the HC, PS and ES algorithms.

D. EFFECT OF CODEBOOK RESOLUTION
In Fig. 11, the RMSE of both the positioning and orientation estimation of the MS is illustrated for different codebook resolutions, where a larger number of codewords yields a higher resolution. As shown in Fig. 11, when the number of codewords is increased from 64 to 512, the localization error is reduced from 87 cm to 15 cm on average, provided that the transmitted power is higher than 2 dBm. In this case, the orientation estimation error is reduced from 0.02 rad to 0.005 rad.

E. EFFECT OF LOS BLOCKAGE RATE AND FADING
In Fig. 12, we show the effect of the density of the blockers on the LoS path on the positioning performance, when the transmitted power is set to $P_{TX} = 20$ dBm. First, Fig. 12 shows that, when the number of codewords is increased from 64 to 512, the RMSE of the localization estimation is reduced from 96 cm to 26 cm. Second, according to [35], when the blocker density is $\lambda_{\text{blocker}} = 0.2$, the blockage rate of the LoS path is $\rho = 0.8042$. As shown in Fig. 12, when $\lambda_{\text{blocker}} > 0.3$, the LoS path is essentially unavailable, and hence the localization of the MS depends fully on the VLoS path. Additionally, according to Fig. 8, when the transmitted power is $P_{TX} = 20$ dBm, the proposed beam sweeping algorithm has the same performance as the HC, PS and ES algorithms. This can also be seen in Fig. 12, where the proposed algorithm has almost the same performance as the HC algorithm for a given number of codewords.
In the above simulations, the LoS/VLoS channels are assumed to be deterministic, experiencing only large-scale fading. In Fig. 13, we validate this assumption by assuming that the LoS/VLoS channels experience small-scale Rician fading in addition to the large-scale fading. In Rician fading, the K-factor is defined as the ratio between the power of the LoS component and that of the NLoS components. Hence, a larger value of K implies that more power is delivered by the LoS component. As shown in Fig. 13, the positioning performance converges to that achieved in the deterministic channel as the value of K increases. Specifically, when K ≥ 7 dB, the positioning performance achieved over the Rician fading channel is close to that attained in the deterministic channel. In practice, since mmWave systems utilize transmitter/receiver beamforming, the Rician K-factor is generally large, typically in the range of 9 to 15 dB [45]. Hence, it is reasonable to approximate the LoS/VLoS channels as deterministic channels experiencing only large-scale fading.
In Fig. 14, the position error bound (PEB) and the orientation error bound (OEB) given by (17) and (18) are compared with the performance of the proposed improved IS algorithm, when 32 antennas are respectively employed at the BS, MS and RIS, and when 512 codewords are employed in the DT stage. From the results, we can see that the proposed method has a localization error of 25 cm on average, while the PEB is about 10 cm, when the transmit power is −12 dBm. When the transmit power is higher than −12 dBm, the RMSE gap between the proposed algorithm and the PEB becomes larger, because the number of codewords is limited by the number of antennas. As shown in Fig. 14, the relationship between the orientation estimation error and the OEB is similar to that between the localization estimation error and the PEB.

VI. CONCLUSION
In this paper, a RIS-aided downlink localization scheme based on joint active and passive beamforming was presented, in which a limited feedback assisted adaptive phase shift scheme was introduced for localization. Specifically, a novel beam sweeping algorithm for both active and passive beamforming was proposed. Compared with the conventional beam sweeping algorithms, namely HC, PS and ES, used for estimating the AoA, AoD (or AoR) and ToA for position and orientation estimation, the proposed algorithm is capable of providing a superior trade-off between performance and training overhead, and hence it is feasible for practical application. In more detail, our studies showed that the training overhead of the proposed technique is 100 times lower than that of the ES algorithm, and that it requires 13 dB less transmit power than the HC algorithm. For the proposed algorithm alone, our results showed that increasing the codebook resolution improves the localization and orientation estimation accuracy and also provides a power gain. This is especially important when the LoS path has a high probability of being blocked. In this case, employing a high-resolution codebook is a promising way to enhance the positioning accuracy. Additionally, our studies showed that an electrically large RIS is desirable for attaining relatively good positioning performance.