Machine Learning Based Indoor Localization Using Wi-Fi RSSI Fingerprints: An Overview

In the era of the Internet of Things (IoT) and Industry 4.0, the indoor usage of smart devices is expected to increase, thereby making their location information more important. Based on various practical issues related to large delays, high design cost, and limited performance, conventional localization techniques are not practical for indoor IoT applications. In recent years, many researchers have proposed a wide range of machine learning (ML)-based indoor localization approaches using Wi-Fi received signal strength indicator (RSSI) fingerprints. This survey attempts to provide a summarized investigation of ML-based Wi-Fi RSSI fingerprinting schemes, including data preprocessing, data augmentation, ML prediction models for indoor localization, and postprocessing in ML, and compare their performance. Any ML-based study is heavily reliant on datasets. Therefore, we dedicate a significant portion of this survey to the discussion of dataset collection and open-source datasets. To provide good direction for future research, we discuss the current challenges and potential solutions related to ML-based indoor localization systems.


I. INTRODUCTION
Internet of Things (IoT) devices are becoming increasingly popular. With the advent of Industry 4.0, advanced Smart-X applications using smart devices, such as smart cities, homes, farms, and factories, are being developed rapidly [1]. For such applications, the fusion of artificial intelligence (AI), robotics, 5G, and big data technologies is crucial. The number of smartphone users worldwide in 2019 was 3.3 billion, which represents a 10% increase from the previous year [2]. The number of active IoT devices is projected to increase to 43 billion by 2023 [3]. Three out of every four (74%) smart device owners are active users of location-based applications [4]. Most people spend approximately 80% of their daily lives indoors. As a result, approximately 70% of smartphone usage and 80% of data transmission occur in indoor environments [5]. Therefore, indoor localization is essential for providing intuitive and customized user services and ubiquitous monitoring and control using smart devices. Accordingly, the localization market is projected to grow to $183.81 billion by 2027 [6].
The associate editor coordinating the review of this manuscript and approving it for publication was Chih-Min Yu.
Identifying (or predicting) the locations of devices (or users) in outdoor and indoor settings/environments is known as outdoor and indoor localization, respectively. For outdoor localization, users often use global navigation satellite systems (GNSSs), such as the global positioning system (depending on the region). GNSSs provide good positioning performance in open (or outdoor) areas. However, in closed (or indoor) areas, where a direct line-of-sight is unavailable, they perform poorly and have very limited usage based on severe indoor channel conditions, including shadowing and multipath fading [7]. Therefore, for indoor localization, instead of GNSSs, the following three categories of techniques are often used: wireless signal-based techniques, vision-based techniques, and other techniques. Wireless signal-based techniques use various measurement parameters, such as the time of arrival (ToA), time of flight (ToF), angle of arrival (AoA), time difference of flight, time difference of arrival, received signal strength indicator (RSSI), and channel state information (CSI) [8]- [15]. Vision-based systems, which are commonly referred to as computer vision techniques, use multiple devices, such as monochrome cameras and infrared cameras, to capture visual information and apply computational processing techniques to estimate the locations of users [16]- [19]. However, vision-based techniques are relatively complex and expensive, have real-time issues related to large processing delays, and suffer from uneven lighting conditions, occlusion, and position changes of objects in an environment, which degrades the performance and scalability of this type of system [18]. Other techniques based on acoustic background fingerprinting, the dead-reckon method, magnetic fields, accelerometers, and barometers are also used to estimate the locations of devices [20]- [25]. 
These techniques may have high accuracy but often require additional specialized equipment for implementation, meaning that energy consumption and system cost are significantly increased.
In this survey, we focus on wireless-signal-based indoor localization. In the literature, wireless-signal-based indoor localization technology is classified into two categories: geometric and fingerprinting approaches. Geometric approaches include multilateration, trilateration, and triangulation methods, for which various measurement parameters (ToA, ToF, AoA, etc.) can be used. These approaches are well established but do not perform well enough for practical indoor service provision because of outlier distortion caused by non-line-of-sight signals and multipath problems. Additionally, a large communication overhead and the need for good synchronization circuits between devices increase the overall system cost [26]- [30].
Fingerprinting approaches employ RSSI or CSI as pattern matching parameters to determine the positions of devices. Compared to geometric approaches, fingerprinting approaches are relatively simple, easily incorporated into smart IoT devices, and able to achieve acceptable accuracy with support from existing wireless infrastructure (Wi-Fi, cellular, etc.). RSSI-based fingerprinting estimates positions from the signal strengths received from several access points (APs), whereas CSI-based fingerprinting estimates positions from a combination of the communication link attributes between a transmitter and a receiver, including rank indication, the precoder matrix indicator, and channel quality indicator. Therefore, in terms of system performance, CSI is superior to RSSI [31].
However, this situation has changed with the introduction of modern AI technology. RSSI-based Wi-Fi fingerprinting techniques using ML, or ''ML-based RSSI fingerprinting'', have recently demonstrated significantly improved localization performance that is comparable to that of other highly sophisticated schemes, including CSI. Several studies [32]- [35] have demonstrated that the performance of ML-based RSSI fingerprinting techniques is satisfactory in terms of accuracy and latency, even though there is a modest increase in system complexity caused by the incorporation of ML functions. CSI-based Wi-Fi fingerprinting techniques using ML, or ''ML-based CSI fingerprinting'', can provide enhanced localization performance but require the consideration of trade-off factors in terms of implementation, such as larger datasets, greater computational power, and longer latency. Additionally, CSI-supported APs that require advanced network interface cards and modified device drivers incur additional installation costs [31], [36]- [39].
ML-based RSSI fingerprinting using big data can be less susceptible to multipath fading and can efficiently handle system issues such as signal fluctuations or hardware failures. For example, ML techniques can constructively exploit (learn) the sequential correlations of time-varying RSSI measurements and use trajectory information, meaning RSSI fluctuations caused by fading or hardware failure can be alleviated [40], [41]. Self-calibration techniques for updating radio maps can also be incorporated to address RSSI temporal variations. As a result, ML-based indoor localization using Wi-Fi RSSI fingerprints is relatively straightforward but can still provide high-quality localization services, even with no extra infrastructure.

A. EXISTING INDOOR LOCALIZATION SURVEYS
Because of the high demand for indoor localization, research in this area is on the rise [42]. In [43], the authors presented a survey on recent advances in theoretical approaches to indoor localization and discussed various applications. Reference [44] is one of the early survey papers to discuss wireless indoor positioning techniques and systems. The authors of [45] and [46] presented extensive studies on wireless indoor localization systems from a device perspective and reviewed recent advances in device-based and device-free localization. Similarly, in [47] and [48], the authors presented a study on indoor localization methods for a contemporary smartphone and its sensors. The authors of [49] and [50] presented detailed discussions of schemes for indoor localization from the perspective of IoT infrastructure. In [51], the authors presented a survey on Wi-Fi fingerprint-based indoor positioning systems. They also discussed advances in terms of reducing labor-intensive tasks such as data collection, calibrating heterogeneous devices, and achieving energy efficiency for smartphones. In [52], the authors presented an in-depth discussion of the challenges related to fingerprinting in indoor positioning and navigation. In [53], the authors presented a detailed survey on indoor location-based services, including their challenges, requirements, and usability. In [54] and [55], the authors explored future opportunities for localization services for 5G and beyond-5G (or 6G) wireless communications systems, where their key technologies (including ML-based schemes), underlying challenges, and potential solutions were discussed. In [56] and [57], the authors presented a review of different ranging-based indoor localization methods. They discussed different types of fingerprints, such as CSI, visible light, and Bluetooth, and other localization methods. Additionally, they proposed an architecture for intelligent indoor localization.

B. MOTIVATION AND AIM
As addressed in Section I-A, most of the recently published survey papers are generic and deal with a wide range of wireless signal-based indoor localization schemes [43]- [57], such that discussion of the latest ML-based indoor localization using Wi-Fi RSSI fingerprints has been limited. This motivates us to present this dedicated survey paper, which draws a boundary around Wi-Fi RSSI and ML-based indoor localization and presents a comprehensive discussion of distinctive ML technology aspects in indoor localization, including database construction and enrichment techniques, performance metrics, ML structure in indoor localization, ML-based RSSI fingerprinting schemes, publicly available datasets, and technical challenges and solutions. This survey paper aims to help readers navigate the abundance of existing literature regarding ML-based indoor localization using Wi-Fi RSSI fingerprints. Table 1 summarizes recently published survey articles related to indoor localization, as discussed in Section I-A.
VOLUME 9, 2021
TABLE 1. Summary of existing surveys related to indoor localization. The symbol ''Yes'' indicates that a publication has a significant amount of discussion on scope, ''Limited'' indicates that a publication has limited discussion on the scope, and ''No'' indicates a publication that does not cover that area. Nevertheless, readers may retrieve some related insights. Note that DC = Database Construction, DE = Database Enrichment, WR Fingerprinting = Wi-Fi RSSI Fingerprinting.

C. CONTRIBUTION OF THE PAPER
1) This work provides a survey of ML-based indoor localization techniques using Wi-Fi RSSI fingerprints that have been proposed in the literature over recent years.
2) We emphasize the importance of indoor localization by discussing its potential futuristic applications that can be exploited by various startups, companies, and research organizations.
3) This work provides a detailed discussion of RSSI-based Wi-Fi fingerprinting methodology along with the data collection process. We also provide pros and cons of Wi-Fi fingerprinting by highlighting its suitability in indoor localization. Furthermore, we discuss the various techniques used for radio map construction and data quality improvement.
4) We contribute by providing a brief discussion on data preprocessing techniques, different ML prediction models, and postprocessing techniques that can be used in indoor localization. We also provide a brief discussion on dimensionality reduction, transfer learning (TL), and data augmentation for localization system implementation.
5) In this paper, we not only discuss popular public datasets but also highlight relatively newer datasets. The use of newer datasets can establish standardized comparisons across various upcoming indoor localization schemes.
6) Finally, we discuss open challenges and issues pertaining to ML indoor localization using RSSI-based Wi-Fi fingerprints and present corresponding potential solutions. It can help readers direct their research toward solving such important localization issues.
The rest of the paper is structured as follows. Section II presents the applications of indoor localization in different use cases. Section III discusses the RSSI, Wi-Fi fingerprinting technology, and construction and enrichment of radio maps. In Section IV, different performance metrics for the evaluation of indoor localization are discussed. In Section V, we briefly discuss the basics of ML prediction models and special techniques such as data preprocessing, data augmentation, ML algorithms for indoor localization, data postprocessing and TL. In Section VI, we summarize various indoor localization schemes using AI algorithms. Section VII presents some publicly available databases. In Section VIII, we provide the details of challenges faced by ML-based indoor localization and address their corresponding potential solutions. Finally, we present the discussion and conclusion of the paper in Section IX.

II. APPLICATIONS OF INDOOR LOCALIZATION
In this section, we discuss several different application areas where indoor localization is needed, as shown in Fig. 1. User or device localization has wide-scale applications in surveillance, location-based social networking (LbSN) [58], asset finding and tracking [59], the health sector, and disaster management [60]. Smart-X (such as smart cities [61]), autonomous vehicles (AVs) [62], and the IoT [63] can also benefit from indoor localization techniques. The following list describes some application scenarios that use indoor localization.
1) In manufacturing plants, predicting the location of autonomous robots by RSSI analysis provides a cost-effective and straightforward way of monitoring (and manufacturing) more products. In the near future, human-robot collaboration will become a new norm, where robots become co-bots. For example, for various monotonous but dangerous tasks, using robots in the right place is valuable and essential for a safety and collision avoidance system.
2) In a large warehouse, where small UAVs can be used for tasks such as surveillance or shuffling of goods, their real-time location is beneficial for better inventory management and control. In shopping malls or markets, personalized marketing and advertisement can be done based on the customers' location. It also helps businesses track customers' habits, behavior patterns, and footfalls.
3) In smart factories and buildings, Wi-Fi-enabled alarm systems can provide the exact location of an accident in the complex. Moreover, localization systems can help evacuate people from danger zones by providing a safe navigating path.
4) Smart homes are also an area where indoor localization can be used. The smart devices' location in the house is known using indoor localization, which enables a more personalized user experience. Moreover, the home owner can monitor and give access to his/her Wi-Fi network only to those devices (users) inside the house or a predefined area.
5) Indoor localization can be used for user location-based authentication, such as location-based access control for sensitive business data and hardware resource allocation based on the user's position. Similarly, anti-theft devices can become more intelligent by utilizing the potential of indoor localization.
6) The hospitality sector can obtain the most benefits from this technology because most hotels and restaurants have a Wi-Fi network. Using the already installed network, one can easily track users' behavior patterns, find the most favorable areas, and provide users with personalized services by placing the autonomous service robots or support staff at the right place.
7) In hospitals, there are many assistive smart machines and equipment for patients. Using indoor localization, the location and availability of these assistive machines can be monitored. Additionally, doctors or nurses can track the location of their patients in the hospital. In the future, nanosensors and robots will be used for targeted drug delivery, where localization techniques could be used to track these nanorobots or sensors inside the body with a drug container carrying drug particles and releasing them at the target (e.g., tumor).
8) Indoor localization techniques are also valuable for greenhouses where different types of crops and fruits are grown. These techniques help the farmer to monitor and track the location of various wireless sensors, which ultimately helps lower the maintenance cost of the greenhouse and enables him/her to become a smart farmer. Similarly, indoor localization can be instrumental in large warehouses to locate different products, in libraries to search for books, and in parking (indoor) areas to track vehicles.
9) Indoor localization is also crucial for applications where relative position information and synchronization among multiple devices are essential. For instance, indoor entertainment performances by autonomous drones or robots will benefit from this technology. When broadcasting sports events in a large indoor stadium, where various movable wireless cameras are tracking athletes from different locations, keeping track of camera positions is also valuable for transmitting live action from the field using technologies such as 5G or 6G. It helps broadcasters increase their service efficiency and quality.
10) Indoor localization will play a significant role in space exploration. In this scenario, positioning information of each device or human may be vital for efficient, safe, and secure functioning in space.

III. RSSI-BASED Wi-Fi FINGERPRINTING
In this section, we first describe RSSI techniques in detail and their working principles in wireless localization. We then discuss Wi-Fi and compare it to other wireless standard technologies. Along with this, both Wi-Fi fingerprinting and fingerprint collection processes are explained. We also discuss the factors affecting Wi-Fi fingerprints. Finally, we address the techniques used for radio map construction and its quality improvement.

A. RSSI
RSS is a measurement of the actual signal power received by a receiver, which is typically expressed in decibel-milliwatts (dBm) or milliwatts (mW) [64]. RSS can be used to estimate the distance between transmitter (Tx) and receiver (Rx) devices based on the difference between the transmitted and received signal power. Generally, two propagation models have been used in RSSI-based wireless sensor networks: (1) free-space models and (2) log-normal models. Free-space models are simple (ideal) but are often limited in real applications because they do not consider obstacles between receivers and transmitters. Therefore, log-normal models are more practical than free-space models and are suitable for indoor and outdoor environments because of their flexibility in different environmental settings [65]. Mathematically, the free-space propagation model is defined as follows [66]:
$$P_r = \frac{P_t G_t G_r \lambda^2}{(4\pi)^2 d^2 L} \tag{1}$$
where $P_r$ is the received power, $P_t$ is the transmitted power, $G_t$ is the transmitter antenna gain, $G_r$ is the receiver antenna gain, $\lambda$ is the wavelength of the radio waves, $d$ is the distance between the transmitter and receiver, and $L$ is the propagation loss in the channel, which is a function of fading. The log-normal propagation model is defined as follows [65]:
$$L_p(d) = L_p(d_0) + 10\alpha \log_{10}\left(\frac{d}{d_0}\right) + C_a \tag{2}$$
In (2), $L_p(d)$ is the path loss at distance $d$, $\alpha$ is the path loss exponent, which depends on the specific propagation environment, $L_p(d_0)$ is the path loss at a reference distance $d_0$, and $C_a$ is a normally distributed random number with zero mean and a variance of $\sigma^2$ that accounts for shadowing and other sources of uncertainty ($C_a \sim N(0, \sigma^2)$) [unit: dB] [65]. The RSSI representing the RSS level at a receiver device has an arbitrary range of values that is primarily defined by the chip supplier. For example, a receiver device may translate dBm values into RSSI values ranging from 0 to 60, 0 to 100, or −100 to 0, depending on the chip vendor. RSSI is one of the simplest and most widely used indoor positioning tools in the literature [50].
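As a hedged illustration of the two propagation models described above, the following Python sketch evaluates the free-space model in its standard Friis form, P_r = P_t G_t G_r λ² / ((4π)² d² L), and the log-normal model in its standard log-distance form, L_p(d) = L_p(d_0) + 10 α log10(d/d_0) + C_a. The function names, the 2.4 GHz carrier, the 100 mW transmit power, and the path loss exponent of 3 are illustrative assumptions and are not taken from [65] or [66].

```python
import math


def friis_received_power_mw(p_t_mw, g_t, g_r, wavelength_m, d_m, loss=1.0):
    """Free-space received power in mW (linear antenna gains, loss >= 1)."""
    return p_t_mw * g_t * g_r * wavelength_m ** 2 / (
        (4 * math.pi) ** 2 * d_m ** 2 * loss
    )


def mw_to_dbm(p_mw):
    """Convert a power in mW to dBm."""
    return 10 * math.log10(p_mw)


def log_normal_path_loss_db(d_m, pl_d0_db, alpha, d0_m=1.0, shadowing_db=0.0):
    """Path loss in dB at distance d_m; shadowing_db is one draw of C_a."""
    return pl_d0_db + 10 * alpha * math.log10(d_m / d0_m) + shadowing_db


def distance_from_path_loss(pl_db, pl_d0_db, alpha, d0_m=1.0):
    """Invert the log-normal model while ignoring shadowing (C_a = 0)."""
    return d0_m * 10 ** ((pl_db - pl_d0_db) / (10 * alpha))


# Illustrative values only (assumed, not from the survey).
wavelength = 3e8 / 2.4e9  # 0.125 m at 2.4 GHz
p_r_dbm = mw_to_dbm(friis_received_power_mw(100.0, 1.0, 1.0, wavelength, 10.0))

# Round trip: path loss at 7 m, then the distance recovered from that loss.
loss_at_7m = log_normal_path_loss_db(7.0, pl_d0_db=40.0, alpha=3.0)
recovered_d = distance_from_path_loss(loss_at_7m, pl_d0_db=40.0, alpha=3.0)
```

The inversion recovers the distance exactly only when the shadowing term is zero. With a nonzero C_a, the estimated range is biased, which is one reason RSSI ranging alone tends to be noisy indoors.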
Although RSSI-based solutions have advantages such as lower device requirements, better accessibility, and cost-effective system design, they also suffer from numerous problems in indoor environments [67]. These problems include significant path loss, multipath fading loss, indoor noise and interference, absorption loss, and the unavailability of some APs during localization. Various in-building materials also affect RSS levels, as shown in Table 2. To address these issues, several solutions have been proposed in the literature, including various filtering and averaging methods, RSS cutoff and self-calibration techniques, the use of an increased number of APs or reference points (RPs), and ML-based schemes. In particular, ML-based schemes such as wireless signal recognition using ML and channel modeling using ML are promising candidates for solving RSSI-based indoor localization issues [69]- [71].
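The filtering, averaging, and RSS cutoff methods mentioned above can be sketched in a few lines. The example below is a minimal illustration, assuming a sliding-window median filter and a cutoff of −90 dBm; the function names, the window size, the threshold, and the sample values are hypothetical choices rather than settings recommended by the cited works.

```python
from statistics import mean, median


def median_filter(samples, window=3):
    """Sliding-window median; the window is truncated at both ends."""
    half = window // 2
    return [
        median(samples[max(0, i - half): i + half + 1])
        for i in range(len(samples))
    ]


def apply_cutoff(samples, cutoff_dbm=-90.0):
    """Discard readings weaker than the cutoff (treated as unreliable)."""
    return [s for s in samples if s >= cutoff_dbm]


# One AP observed at one RP; the -95 dBm reading is an assumed outlier.
raw = [-61, -60, -95, -62, -59, -61]

smoothed = median_filter(raw)          # outlier is replaced by a neighbor value
kept = apply_cutoff(raw)               # outlier is dropped entirely
averaged = mean(kept)                  # single representative RSSI value
```

The median filter keeps the length of the time series, which suits trajectory-based models, whereas the cutoff followed by averaging collapses the samples into one value per AP, which suits a static radio map entry.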

B. Wi-Fi AND FINGERPRINTING 1) Wi-Fi
Wi-Fi is a family of IEEE 802.11 mainstream wireless networking interfaces that are widely used to deliver network and internet connectivity services to various users in both private and public areas. Wi-Fi uses the 2.4 GHz and 5 GHz ISM frequency bands [72], which consist of channels with 20 MHz, 40 MHz, and 80 MHz bandwidths. As the cost of manufacturing Wi-Fi modules decreases, almost all of the latest smartphones, environment monitoring sensors, and other smart devices have built-in Wi-Fi capabilities. Furthermore, due to the deep penetration of Wi-Fi network infrastructure, Wi-Fi is easily accessible in various places, ranging from small coffee shops to large stadiums and airports. Consequently, almost 4.45 billion people were active internet users in January 2020 [73], and most of them were using the internet through Wi-Fi. Furthermore, compared to its previous version, the latest commercially available Wi-Fi standard, Wi-Fi 6, which operates on bands between 1 GHz and 6 GHz, reduces latency by 75% and increases the transmission rate to up to 11 Gbits/s (theoretically) [74]. Therefore, the increased transmission rate will further expand the popularity of Wi-Fi.
Additionally, compared to other wireless standards, such as Bluetooth Low Energy (BLE), ZigBee, LoRaWAN, RFID, and UWB [75], [76], Wi-Fi has a high bitrate and high scalability and is relatively less affected by external factors. Higher data transmission between devices provides more opportunities for improving the accuracy of localization [63], [77]. As a result, Wi-Fi, the most suitable and popular wireless standard, has become one of the most widely studied schemes in the indoor localization literature [78]- [80]. Therefore, Wi-Fi indoor localization schemes that are directly accessible to many smart devices will provide device-based solutions for various futuristic localization services.

2) FINGERPRINTING
The fingerprinting technique (also known as scene analysis or fingerprint matching) is the most commonly used, cost-effective, and high-precision localization technique. Fingerprinting, which is a pattern matching method, matches the fingerprints of different positions in an analogous reference frame to an unknown pattern of fingerprints to predict the location of a particular device [81], [82]. To achieve high positional accuracy, both temporal and spatial patterns should be considered. A temporal pattern is an observed signal pattern during the maneuvering of a device in an indoor environment. In contrast, a spatial pattern represents the geographic composition of signals (roughly equivalent to the RSSI). Determining the position of a device can exploit these signal patterns. Additionally, scene analysis does not require precise and rigid physical quantities, such as distances and angles. In general, the fingerprint ($fi$) of a Wi-Fi signal consists of three elements: the location coordinates $(x, y)$ of an RP or landmark, the unique address or identification number of an AP ($AP_{ID}$), and the RSSI values of the corresponding AP at an RP ($RSSI_m$). Therefore, the RSSI fingerprint of a Wi-Fi signal is defined as $fi = [(x, y), AP_{ID}, RSSI_1, RSSI_2, \ldots]$, as shown in Fig. 2.
The fingerprinting method consists of two phases, as shown in Fig. 3: offline and online. Initially, in the offline phase, fingerprints are obtained from a predefined number $n$ of landmarks using various sensors or smart devices to construct a radio map (database) $rd_n(fi) = (fi_1, fi_2, fi_3, fi_4, \ldots, fi_{n-1}, fi_n)$. A radio map is a visual illustration of the availability and intensity of RSSI in an indoor environment. Generally, ''fingerprint data collection'' involves multiple steps, as shown in Fig. 4. In the first step of the offline phase, the floor plan of a Wi-Fi-network-enabled indoor environment is defined, where a localization service is provided. In the second step, the entire floor plan is divided into a grid of lines with multiple design options or grid properties such as round, hexagonal, or square. The third step involves marking multiple landmarks or RPs at regular or irregular distance intervals from each other. These RPs may or may not have line-of-sight with APs. In the fourth step, as shown in Fig. 4, RSSI values are collected using Wi-Fi-enabled sensors or smart devices, along with corresponding AP and RP coordinates. In the fifth step, all of the data collected from different devices are combined. To construct a radio map, various preprocessing techniques, such as filtering and averaging irregular values or eliminating null values, are applied to the collected data. This radio map can then be used for ML training and testing.
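The last two steps of the offline phase (combining the collected scans and preprocessing them into a radio map) can be sketched as follows. This is a minimal illustration under stated assumptions: each scan is a pair of RP coordinates and a dictionary of per-AP readings, null readings are represented by `None`, and an AP that was never heard at an RP is filled with a placeholder of −100 dBm. The function name `build_radio_map`, the placeholder value, and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MISSING_DBM = -100.0  # assumed placeholder for an AP not heard at an RP


def build_radio_map(scans, ap_ids):
    """Average the RSSI per AP at every RP.

    scans:  iterable of ((x, y), {ap_id: rssi_dbm or None}) tuples
    ap_ids: fixed AP order that defines the fingerprint vector layout
    Returns {(x, y): [mean RSSI per AP, in ap_ids order]}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rp, readings in scans:
        per_ap = grouped[rp]  # registers the RP even if all readings are null
        for ap, rssi in readings.items():
            if rssi is not None:  # eliminate null values
                per_ap[ap].append(rssi)
    return {
        rp: [mean(per_ap[ap]) if per_ap.get(ap) else MISSING_DBM for ap in ap_ids]
        for rp, per_ap in grouped.items()
    }


# Scans from two RPs, possibly gathered by different devices (assumed data).
scans = [
    ((0, 0), {"ap1": -40, "ap2": -70}),
    ((0, 0), {"ap1": -42, "ap2": None}),
    ((5, 0), {"ap2": -55}),
]
ap_ids = ["ap1", "ap2"]
radio_map = build_radio_map(scans, ap_ids)
```

Fixing the AP order up front keeps every fingerprint vector the same length, which most ML models require as input.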
During training, a localization function tries to learn the mapping between real-time RSSI observations and device locations. The role of pattern matching in traditional fingerprinting algorithms is to determine the similarity between training and testing fingerprints. The goal is to find a pair consisting of the closest testing and training points in the fingerprint space and then use training point position information to approximate (and estimate) the testing point in the location space. The localization function can be as simple as the Euclidean distance in the k-nearest neighbors (kNN) algorithm or very complex in a deep learning-based localization learning function. Subsequently, in the online phase, ML models use the trained/learned localization function to predict the real-time locations of devices based on RSSI measurements, as shown in Fig. 3.
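The simplest localization function mentioned above, kNN with the Euclidean distance in the fingerprint space, can be sketched as follows. The radio map contents, the observation, and the function name `knn_locate` are illustrative assumptions; the position estimate here is the unweighted mean of the k nearest RPs, which is one common variant among several.

```python
import math


def knn_locate(radio_map, observation, k=3):
    """Estimate (x, y) as the mean position of the k RPs whose stored
    fingerprints are closest (Euclidean distance) to the observation."""
    ranked = sorted(
        radio_map.items(),
        key=lambda item: math.dist(item[1], observation),
    )
    nearest = [rp for rp, _ in ranked[:k]]
    x = sum(p[0] for p in nearest) / len(nearest)
    y = sum(p[1] for p in nearest) / len(nearest)
    return x, y


# Offline phase result: RP coordinates mapped to RSSI vectors from three APs.
radio_map = {
    (0, 0): [-40, -70, -80],
    (5, 0): [-55, -50, -75],
    (0, 5): [-60, -75, -50],
    (5, 5): [-70, -55, -52],
}

# Online phase: a new RSSI measurement taken near the RP at (0, 0).
observation = [-42, -69, -79]
estimate_k1 = knn_locate(radio_map, observation, k=1)
estimate_k2 = knn_locate(radio_map, observation, k=2)
```

With k = 1 the estimate snaps to a single RP, so accuracy is bounded by the grid spacing; larger k interpolates between RPs at the cost of pulling estimates toward the center of the neighborhood.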
The performance of an ML algorithm depends strongly on the quality of the radio map, and the quality of a radio map depends on the quality of fingerprints. Wi-Fi signals are affected by various factors. (1) The human body degrades signal quality by absorbing signals because the human body is approximately 70% water [83], [84]. (2) Multipath fading greatly degrades transmission signal quality [85], [86]. (3) The number of APs and RPs is critical. If the number of APs and RPs is low, then the granularity of fingerprints decreases. However, if the number is large, then it increases the time required for the data collection process and may cause interference between signals [87]. (4) Device orientation is important because signals are often influenced by the orientations of the devices, which determine the positions and configurations of antennas [88]. (5) Device dependency refers to the use of Wi-Fi sensors produced by different vendors for specific devices. Every vendor has its own set of standards for representing signal strength and arbitrary RSSI values. As a result, fingerprints are sometimes unreliable or incompatible [89], [90]. (6) Energy consumption varies because some Wi-Fi networks consume larger amounts of energy than other wireless systems [77]. (7) Data collection processes have a significant impact. Radio map construction itself is the greatest challenge in indoor localization. Collecting fingerprints often takes a long time and consumes many man-hours. It also requires a large amount of storage space. A small change in an indoor environment may also require re-evaluation or even recollection of RSSI values. Because of these factors, the performance of ML localization algorithms is often degraded, which hinders the adoption of indoor localization systems in the real world.

C. TECHNIQUES FOR CONSTRUCTION AND ENRICHMENT OF RADIO MAPS
Due to the above factors, collecting high-quality fingerprints in indoor environments is a very challenging process. To cope with these factors, the most straightforward approach is to collect as many fingerprints as possible from multiple RPs in an indoor environment. However, this is a time-consuming and labor-intensive process. Apart from being cumbersome, it is sometimes not affordable. Therefore, one of the critical challenges is to minimize the effort required to obtain high-quality training fingerprints for the ML model. In the literature, many solutions have been proposed both for reducing the effort of RSSI fingerprint collection and for improving the quality of fingerprints, including automatic fingerprint collection using UAVs and robots [62], SLAM [91], crowdsourcing [92], and other efficient data collection methods.
To minimize labor-intensive work during the data collection process, the authors of [62] used UAVs to collect fingerprints. This UAV-based collection method provides fingerprints of 3D space, which are more valuable than those obtained with conventional collection methods. However, limited time and energy efficiency still hinders the coverage of UAVs, which reduces scalability. To address this issue, another method, AuF [93], was proposed, where the fingerprinting database is autonomously constructed with improved time and energy efficiency. In [94], the authors proposed an automatic fine-grained indoor radio map construction and adaptation scheme called WiGAN. This scheme is empowered by Gaussian process regression conditioned least-squares generative adversarial networks (GPR-GANs), where a mobile robot collects the RSS data.
The use of SLAM is often considered when constructing a radio map with low survey costs. However, due to its high processing cost, SLAM may not be appropriate to run on resource-constrained smart devices such as IoT devices and mini drones. Map management methods (MMMs) [95], such as pedestrian dead-reckoning and map filtering, are often used to locate the device's starting point and optimize its output accordingly. However, due to high computational costs, their implementation in resource-constrained smart devices is limited. Thus, a SLAM-based scheme called Wi-Fi SLAM was proposed in [96]. It employs a Gaussian process latent variable model that links Wi-Fi fingerprints with a motion dynamics model in the absence of certain location tags in the training data point. In addition, the authors of [97] suggested GraphSLAM, which transforms the posterior SLAM into a graphical network, with a greedy algorithm used for data association. The authors of [98] demonstrated that GraphSLAM improves the computational efficiency of Wi-Fi SLAM while decreasing its reliance on fingerprint distinctiveness.
The crowdsourcing approach harnesses the potential of active and passive participatory actions of users. Participation of the user can take place during the offline or online phases. For this reason, [92] recently proposed the development of radio maps in both active and passive formats. An active crowdsourcing approach increases user involvement, which decreases the necessity of special auditors but may incur deliberate malpractice due to the involvement of participants. However, a passive crowdsourcing approach decreases user involvement by linking fingerprint data extracted from inertial sensors on smartphones to the relevant RPs. Some of the crowdsourcing approaches are organic indoor location (OIL) [99], Zee [100], LiFS [101], and many more. OIL [99] routinely asks users to attach their observations along with their positions on the floor plan, provide details on neighboring wireless devices, and then represent the determined position on a global map. Zee [100] used mobile inertial sensors to monitor users while conducting a Wi-Fi scan at the same time. This makes it possible to create a radio map while protecting the privacy of users. LiFS [101] creates the radio map by using smartphone built-in sensors for the floor plan, which results in quicker implementation and fewer working hours. Another crowdsourcing scheme [102] is an RP graph-based approach that uses sensors from smart devices to dynamically locate the position of RPs linked to the captured fingerprints and build realistic, quick, and accurate fingerprints. In this scheme, the element of confidence (''belief factor'') in good precision and reliability is used. In [103], the authors proposed an automated construction and maintenance system for radio maps, which uses an unsupervised learning algorithm for the incremental and adaptive calibration process. After the initial setup, the radio map adopts the changes reflected in the environment. 
In [104], the authors proposed a gradient fingerprinting indoor localization and tracking system (GIFT) as a new fingerprinting method for indoor localization. The key behind this is to use differential RSSI between adjacent locations, which is more reliable than absolute RSSI. It is also independent of the AP transmission power and receiving device. GIFT is backward compatible with previously developed state-of-the-art fingerprint map construction techniques, which might reduce training overhead.
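The differencing idea described above can be illustrated with a minimal sketch. It is not the GIFT pipeline itself; it only shows, under the simplifying assumption that a different receiving device adds a constant offset (in dB) to every reading, why differences between adjacent locations are unaffected by that offset. All readings and the offset value are hypothetical.

```python
import numpy as np

# Hypothetical RSSI readings (dBm) at 4 adjacent locations (rows) from 3 APs (columns).
rssi_device_a = np.array([[-50.0, -70.0, -80.0],
                          [-55.0, -66.0, -78.0],
                          [-61.0, -60.0, -79.0],
                          [-64.0, -58.0, -75.0]])

# Assumption: a second device reports the same signals shifted by a constant offset.
device_offset_db = 7.0
rssi_device_b = rssi_device_a + device_offset_db


def differential_rssi(rssi_along_path):
    """Difference of RSSI between each location and the next one on the path."""
    return np.diff(rssi_along_path, axis=0)


grad_a = differential_rssi(rssi_device_a)
grad_b = differential_rssi(rssi_device_b)

# The absolute readings differ between devices, the differential ones do not.
print(np.allclose(rssi_device_a, rssi_device_b))
print(np.allclose(grad_a, grad_b))
```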
In [105], the authors proposed an automatic Wi-Fi fingerprint system based on unsupervised learning. This system combines a modified autoencoder and a modified generative adversarial network (GAN) to create an initial radio map and cope with the new addition or removal of APs during the localization phase. The authors in [106] proposed a hybrid generative/discriminative semisupervised learning algorithm. This algorithm employs a large number of unlabeled data samples to enhance/support the small number of labeled data samples. Another semisupervised learning-based radio map construction method was proposed in [107], which uses a semisupervised self-adaptive local linear embedding algorithm for the construction of the radio maps. Other methods for the construction of radio maps include DeepMap [108], manifold learning [109], and the multiwall path loss model [110].
Radio map construction methods also need to increase consistency, handle outliers and irregular indoor coverage areas, and improve positioning accuracy in indoor environments. Several researchers have proposed a variety of approaches for enhancing and enriching radio maps. To handle sparsity, the authors of [111] used compressive sensing to derive extra key features in radio maps by using singular value decomposition (SVD) and kNN. The authors of [112] proposed a Fourier transformation and minimization method that reduces the sparsity in radio maps by using a sparse group LASSO. In [67], the authors proposed a scheme that uses an augmented sparse recovery algorithm, LASSO for AP selection, and fine localization. A technique for eliminating useless APs and their fingerprints from radio maps was introduced in [113]. Another scheme in [114] using RSS quantization increases the consistency and efficiency of radio maps and maintains the same accuracy while using 4-bit quantization instead of conventional RSS. The authors of [115] proposed a linear interpolation scheme that can be paired with extrapolation methods based on minimum observed values, mean detected values, and triangulated edge signal gradients. They also employed inverse distance weighting methods, which can be used directly for interpolation and extrapolation to improve the quality of radio maps. In [116], the authors proposed an efficient scheme that automatically detects changes in AP signals and updates fingerprint databases with no need for another offline site survey. Their method adapts databases to signal changes by applying a nonparametric Gaussian process regression model. The authors of [117] proposed a technique that generates dense fingerprints from real spatially coarse RSS data by evaluating the cosine similarity of the directions to different Wi-Fi APs. Another method called RecTrack-GAN was proposed in [118]. 
RecTrack-GAN uses a GAN to generate new data and update existing fingerprinting databases.
During the collection of fingerprints, real-time Wi-Fi signals suffer from the effects of acquisition noise and channel noise, and the fusion of different nodes with a large amount of data harms system performance. To handle this type of noise, various denoising techniques have been introduced. In [119], the authors proposed a technique using a stacked denoising autoencoder (DAE) that extracts RSSI measurements to overcome the sparsity of Wi-Fi signals. Another neural network scheme proposed by the authors in [120] handles sparsity and fluctuations in indoor areas and efficiently executes data denoising. In [121], the authors proposed a denoiser that learns noise characteristics instead of learning original data characteristics. The proposed denoiser is a modified version of the DAE.

IV. PERFORMANCE METRICS
In this section, we discuss the different metrics essential to assessing indoor localization systems. Most indoor localization systems are application-dependent, so some metrics (or parameters) may have a higher priority than others for a specific application.

A. POSITIONING ACCURACY
The closeness of the estimated position to the actual location of the device/user is termed accuracy. Accuracy is one of the main parameters of indoor localization and is used to evaluate the quality of the service that the system provides. As mentioned in Section III, the indoor environment is dynamic, which makes it difficult for Wi-Fi fingerprinting to achieve high positioning accuracy. Therefore, researchers have sought ways to resolve such difficulties and achieve a reasonably reliable outcome.
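As a minimal sketch of how this closeness is commonly quantified, the snippet below computes the Euclidean distance between true and estimated coordinates and summarizes it by its mean. The choice of Euclidean distance and the sample coordinates are assumptions made for illustration; the survey does not prescribe a specific formula here.

```python
import numpy as np


def positioning_errors(true_xy, pred_xy):
    """Per-sample Euclidean distance between true and estimated positions."""
    true_xy = np.asarray(true_xy, dtype=float)
    pred_xy = np.asarray(pred_xy, dtype=float)
    return np.linalg.norm(true_xy - pred_xy, axis=1)


# Hypothetical ground-truth and estimated coordinates in meters.
true_xy = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
pred_xy = [[3.0, 4.0], [3.0, 4.0], [10.0, 12.0]]

errors = positioning_errors(true_xy, pred_xy)
mean_error = errors.mean()
print(errors, mean_error)
```

Other summaries of the same error vector, such as the median or a high percentile, can be obtained with `np.median` or `np.percentile`.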

B. ROBUSTNESS
Robustness is characterized as the capacity of the system to withstand or subdue adverse conditions such as radio map errors, hardware failures, and incorrect Wi-Fi signals while producing consistent results. A robust positioning system allows the use of less or even incomplete information for user location prediction. Studies have shown that ML schemes such as autoencoders help to improve the robustness of the system.

C. SCALABILITY
Scalability is the capability of a positioning system to provide location information to a rising number of users. In particular, techniques for commercial use often require a high degree of scalability at a reasonable cost. However, this property may place the system under a heavy operating load.

D. COMPLEXITY AND COST
Another important parameter for evaluating the efficiency of the system is the complexity of the indoor positioning system. Complexity and cost are closely related to the software algorithms of the system. Algorithms with high space and time complexity significantly degrade the efficiency of the system, and highly sophisticated algorithms require more computational power and consume more energy. Hence, complexity is key to the real-world implementation of indoor positioning systems. However, actually implementing a localization system that minimizes complexity and cost may require significant effort.

V. ML IN INDOOR LOCALIZATION
In this section, we discuss data preprocessing in ML, data augmentation, ML algorithms, and postprocessing in ML, which are primarily used in indoor localization. For further information, readers may refer to [122]-[124].
ML is characterized by a computer algorithm that can automatically learn and identify patterns in data. Based on this learning, an algorithm can detect patterns or execute different decision-making tasks for new unknown data. Typically, ML is classified into three broad categories: supervised, unsupervised, and reinforcement learning. In supervised learning, labels are available for all training samples. In unsupervised learning, labels are not available for any training samples. In reinforcement learning, an agent learns to operate or take actions to achieve a goal in an uncertain, potentially complex environment, and in return, it receives rewards. There is another subcategory of ML, i.e., semisupervised learning, where some training samples have labels and the rest of them are unlabeled. ML techniques can achieve human-level performance on various tasks. Therefore, researchers have used ML in indoor localization to achieve high performance and compensate for or mitigate various problems during data acquisition, such as missing RSSIs, RSSI redundancy, and anomalies (or errors) in RSSI fingerprints. In Section III-C, we mentioned several ML techniques for the collection, enrichment, or cleaning of fingerprints.
The problem of locating a device in an indoor environment can be formulated as an ML classification or regression problem based on required location information. The classification problem attempts to determine the symbolic locations of devices such as research labs, lecture rooms, or conference rooms. The regression problem attempts to determine the physical locations (i.e., actual coordinates) of devices.

A. DATA PREPROCESSING IN ML
Generally, in developing a better ML model, data preprocessing helps to manipulate the raw data by performing various tasks, including data cleansing and denoising, selection and partitioning of data samples, feature tuning, feature extraction, and dimensionality reduction.
• Data cleansing or denoising: This task is to remove or correct either records with false/incorrect values from raw data or records with no significant number of columns. In indoor localization, unavailable raw Wi-Fi RSSI values from APs are often replaced with a specially assigned (or defined) number indicating missing data. For instance, the authors in [125] replace all of the unavailable signal values with '100' in the dataset. Additionally, denoising the raw Wi-Fi RSSI signals is required for improved performance. Autoencoders or GANs are often used for removing noise (denoising) in raw signals [119].
• Selection and partitioning of data samples: This process selects the random data samples from the input data and splits the dataset into a training set, a validation set, and a testing set.
• Feature tuning: This task is to improve the quality of features for ML, which comprises scaling or normalizing the numeric data, clipping outliers, and adjusting the data with skewed distributions. In indoor localization, there are many normalization techniques, such as exponential, zero-to-one normalized, and powered [126].
• Feature extraction: This task aims to reduce the number of features in a dataset by creating (or extracting) new features from the existing ones (and then discarding the original features).
• Dimensionality reduction: This process involves reducing (or lowering) the number of features and dimensions of data, providing more efficient data representation, choosing the subset of input features for model training (a type of feature extraction), or avoiding trivial and redundant features. Additionally, many features overlap with each other, which produces redundancy in the training set; dimensionality reduction is used to remove this redundancy. In indoor localization, the input could be a radio map with many features and dimensions, which could cause difficulty in ML training. High-dimensional data often lead to a more complex model and increase the chance of overfitting. Dimensionality reduction, which avoids such problems, therefore forms a crucial part of the preprocessing step. In indoor localization applications, many popular ML algorithms are commonly used for data preprocessing and dimensionality reduction, such as singular value decomposition (SVD), principal component analysis (PCA), kernel PCA (KPCA), locally linear embedding (LLE), linear discriminant analysis (LDA), t-distributed stochastic neighbor embedding (t-SNE), and autoencoders [109], [111], [121], [127]-[131]. Below, we briefly discuss each of those algorithms, which are often used in data preprocessing and dimensionality reduction.
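Before turning to the individual algorithms, the cleansing, feature tuning, and partitioning tasks listed above can be sketched in a few lines. This is only an illustrative sketch: the marker value 100 follows the convention mentioned for [125], whereas the floor of -105 dBm, the split ratios, and the function names are assumptions introduced here.

```python
import numpy as np

MISSING_MARKER = 100.0  # marker for "AP not detected", as in the convention cited above
RSSI_FLOOR = -105.0     # hypothetical floor (dBm), assumed weaker than any real reading


def clean_and_scale(rssi, missing=MISSING_MARKER, floor=RSSI_FLOOR):
    """Replace the missing-data marker by the floor, then scale to the range [0, 1]."""
    rssi = np.asarray(rssi, dtype=float)
    cleaned = np.where(rssi == missing, floor, rssi)
    scaled = (cleaned - floor) / (0.0 - floor)  # floor maps to 0, 0 dBm maps to 1
    return np.clip(scaled, 0.0, 1.0)


def split_indices(n_samples, train=0.7, val=0.15, seed=0):
    """Randomly partition sample indices into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


# Two hypothetical fingerprints over three APs; 100 marks an undetected AP.
raw = np.array([[-60.0, 100.0, -85.0],
                [100.0, -70.0, -90.0]])
features = clean_and_scale(raw)
train_idx, val_idx, test_idx = split_indices(20)
print(features)
```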

1) SVD
SVD is a type of factorization that enables a high-dimensional matrix $M$ to be represented as a product of lower-dimensional matrices. That is, the SVD function decomposes an $m \times n$ matrix $M$ into the following three matrices: $M = U \Sigma V^{*}$, where $U$ is an $m \times m$ unitary matrix, $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3, \ldots, \sigma_n)$ is an $m \times n$ rectangular diagonal matrix, and $V^{*}$ denotes the conjugate transpose of an $n \times n$ unitary matrix $V$; the columns of $U$ and $V$ are also called the left-singular and right-singular vectors, respectively. The diagonal entries $\sigma_i$ of the matrix $\Sigma$, which are nonnegative and arranged in descending order from the upper left corner of the matrix, are known as the singular values of $M$ [111].
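The decomposition can be checked numerically with NumPy's `np.linalg.svd`. The sketch below uses a random matrix as a stand-in for a radio map (an assumption for illustration), rebuilds $M$ from its three factors, and forms a rank-2 approximation by keeping only the two largest singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))  # hypothetical radio map: 6 fingerprints x 4 APs

# Full SVD: U is 6x6, s holds the singular values, Vh is the 4x4 matrix V*.
U, s, Vh = np.linalg.svd(M, full_matrices=True)

# Place the singular values on the diagonal of a 6x4 rectangular matrix.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
M_rebuilt = U @ Sigma @ Vh

# Low-rank approximation that keeps only the r largest singular values.
r = 2
M_rank_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]
print(s)
```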

2) PCA
PCA is the most popular unsupervised linear method. It is a dimensionality reduction method that deals with high dimensionality problems by linearly transforming features of the high-dimensional data into a lower-dimensional space. In other words, PCA is an eigenvalue decomposition of the data covariance matrix $M_c$, which is used for low-rank approximation. The principal components are obtained by solving the eigenvalue problem of $M_c$, that is, the covariance matrix of the original data $R$, i.e., $M_c \nu_x = \lambda_x \nu_x$, where $\lambda_x$ are the eigenvalues of the matrix $M_c$ and $\nu_x$ are the corresponding eigenvectors. That is, to reduce the dimensionality of the data, the $n$ eigenvectors (principal components) corresponding to the $n$ largest eigenvalues need to be computed. The PCA process gives the resultant matrix as follows: $R_{PCA} = U_n^{T} R$, where $U_n = [\nu_1, \nu_2, \ldots, \nu_n]$. As a result, the dimension of the original data matrix $R$ is reduced by multiplying it with the matrix $U_n$, which consists of the $n$ eigenvectors corresponding to the $n$ largest eigenvalues [127], [132].
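A minimal sketch of this procedure follows. It assumes that the data matrix $R$ stores features in rows and samples in columns, so that the projection $U_n^{T} R$ is well defined, and it centers the data before projecting, which is a common convention not spelled out in the text. The function name and the random data are hypothetical.

```python
import numpy as np


def pca_project(R, n):
    """Project R (features x samples) onto the n leading eigenvectors of its covariance."""
    R_centered = R - R.mean(axis=1, keepdims=True)
    M_c = np.cov(R_centered)                 # rows are treated as variables
    eigvals, eigvecs = np.linalg.eigh(M_c)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n]    # indices of the n largest eigenvalues
    U_n = eigvecs[:, order]
    return U_n.T @ R_centered, U_n, eigvals[order]


rng = np.random.default_rng(0)
R = rng.normal(size=(5, 50))  # hypothetical data: 5 APs (features) x 50 fingerprints
R_pca, U_n, top_eigvals = pca_project(R, n=2)
print(R_pca.shape, top_eigvals)
```

The variance of each projected component equals the corresponding eigenvalue, which is why keeping the largest eigenvalues retains most of the variance.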

3) KPCA
Kernel PCA (KPCA) is a variant of PCA that is mainly used for nonlinear data preprocessing. KPCA computes the principal eigenvectors of the kernel matrix rather than the covariance matrix. A kernel matrix contains the pairwise inner products of the data points in the high-dimensional space (which is also called kernel space). KPCA uses a kernel function that implicitly projects the dataset into kernel space, where it may become linearly separable. That is, KPCA has the ability to create nonlinear mappings through the kernel space [133].
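The following sketch illustrates these steps with a Gaussian (RBF) kernel, which is one common choice and an assumption here: it builds the kernel matrix, centers it in kernel space, and takes the leading eigenvectors. The function name, the value of `gamma`, and the data are hypothetical.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Kernel PCA of X (samples x features) with an RBF kernel."""
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix, i.e., center the data in kernel space.
    N = K.shape[0]
    one_n = np.full((N, N), 1.0 / N)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Leading eigenvectors of the centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(K_c)
    idx = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]

    # Coordinates of the training samples on the kernel principal components.
    Z = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return Z, eigvals


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))  # hypothetical: 30 fingerprints x 4 APs (already scaled)
Z, kpca_eigvals = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)
```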

4) LLE
LLE is a nonlinear spectral dimensionality reduction method used for manifold embedding and feature extraction. LLE tries to preserve the local structure of data in the embedding space. In other words, points that are close in the high-dimensional input space should also be close to each other in the low-dimensional embedding space. Through this local fitting, points that are far apart in the input space also tend to fall far away from each other in the embedding space. From another perspective, the idea of local fitting by LLE is similar to piecewise spline regression. LLE unfolds a nonlinear manifold by locally unfolding it piece by piece so that a suitable total manifold unfolding is obtained. In general, most unsupervised manifold learning methods share the idea of local fitting. The LLE algorithm has the following three steps: first, it finds the k-nearest neighbors (kNN) graph of all training points; second, it finds the weights for linear reconstruction of every point by its nearest neighbors; third, it embeds the data into a low-dimensional embedding space using the same weights as in the input space [131].

5) LDA
LDA is a method to find a linear transformation that maximizes class separability in the reduced dimensional space. The criterion in LDA is to maximize between-class scatter and minimize within-class scatter. The scatters are measured by using scatter matrices. The performance of LDA increases when the data are constructed using independent variables with large data patterns. However, LDA is not applicable for nonlinear applications [129].

6) t-SNE
t-SNE is a nonlinear, unsupervised, and manifold-based feature extraction method that maps high-dimensional data to low-dimensional data while preserving the significant structure of the original data. In contrast to other dimensionality reduction methods, t-SNE is mainly used for data exploration and visualization. In other words, t-SNE provides an intuitive understanding of how data are organized in high-dimensional space [131].

7) AUTOENCODERS
Autoencoders are also unsupervised learning algorithms that automatically attempt to learn essential features from unlabeled data, which provide a better description than the original input data. An autoencoder is a type of neural network comprising two components: an encoder and a decoder. Both components are connected by the bottleneck layer (or hidden layer), also known as the latent space. The encoder compresses the input data into the lower dimension, while the decoder reconstructs the input data from lower-dimensional data. The goal of the autoencoders is to decrease the reconstruction error of the data output. For more details, readers may refer to [134].
The abovementioned techniques perform well when the data are large enough. However, in the real world, the data are often not sufficient for ML model training, so system performance is not satisfactory. To address such issues, data augmentation techniques have also been introduced.

B. DATA AUGMENTATION
Typically, in the ML domain, the performance of a model improves as the number of training data increases. Data augmentation artificially increases the size of a training set by generating many realistic variants of the original training instances. This process may reduce overfitting when it is applied as a regularization technique. In the image domain, data augmentation consists of various methods, such as cropping, padding, and horizontal flipping [135].
Indoor localization may often suffer from data deficiency because collecting data is a daunting task. Data augmentation can improve the potential of ML models by generating new synthetic data from previously collected datasets. This method may also detect and remove erroneous or invalid data, allowing an improved database that can be used for training the ML models. It can help researchers generate different scenarios or features during model training, allowing ML models to become more robust. For example, DataLoc+ [136] employs a data augmentation technique for room-level indoor localization. Another study [137] uses data augmentation to extend measurements and develop new learning algorithms. The authors of [138] demonstrate the feasibility of data augmentation using deep neural networks (DNNs), and the authors of [139] use data augmentation to reduce the number of required site surveys and improve location accuracy. In [140], the authors propose a novel data augmentation technique based on a conditional adversarial network to handle the sparsity of RPs.
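As a very simple illustration of generating synthetic fingerprints from collected ones, the sketch below adds zero-mean Gaussian noise to each RSSI vector while keeping its RP label. This noise-jitter scheme is an assumption chosen for brevity; it is not the method of any of the cited works, which rely on more elaborate generators. The noise level of 2 dB and all sample values are hypothetical.

```python
import numpy as np


def augment_fingerprints(rssi, labels, copies=3, noise_std_db=2.0, seed=0):
    """Return the original fingerprints plus `copies` noisy variants of each one."""
    rng = np.random.default_rng(seed)
    rssi = np.asarray(rssi, dtype=float)
    labels = np.asarray(labels)
    noisy = [rssi + rng.normal(0.0, noise_std_db, size=rssi.shape)
             for _ in range(copies)]
    aug_rssi = np.vstack([rssi] + noisy)
    aug_labels = np.concatenate([labels] * (copies + 1), axis=0)  # labels are unchanged
    return aug_rssi, aug_labels


# Two hypothetical fingerprints (dBm) with their RP coordinates in meters.
rssi = np.array([[-60.0, -72.0, -85.0],
                 [-48.0, -80.0, -66.0]])
rp_coords = np.array([[0.0, 0.0], [2.0, 1.0]])

aug_rssi, aug_coords = augment_fingerprints(rssi, rp_coords)
print(aug_rssi.shape, aug_coords.shape)
```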

C. ML ALGORITHMS FOR INDOOR LOCALIZATION
The preprocessed data are used to train the ML prediction models for indoor localization. The models' hyperparameters are tuned to achieve high accuracy as well as to avoid underfitting and/or overfitting. The authors in [141] suggest a guideline for choosing the best hyperparameters of several ML models, such as k-nearest neighbors (kNN), support vector machine (SVM), decision tree (DT), artificial neural networks (ANNs), genetic algorithms, and federated computing. Each of those ML models produces different outcomes depending on its predictive mechanism. The performance of the prediction models is usually evaluated on a validation/test subset of database/radio maps. Additionally, the abovementioned prediction models can be used to approach the indoor localization problem in a supervised, unsupervised, or semisupervised (partial labels for data) way. Below, we briefly mention some of the commonly used ML prediction models for indoor localization.

1) KNN
kNN is a nonparametric method that is used for predictive problems such as classification or regression. The input consists of the k nearest training examples, and the output depends on whether the method is used to classify or regress. kNN is one of the most straightforward algorithms in ML; it classifies data by computing the distance between points [64]. kNN is commonly used due to its ease of interpretation and low calculation time. The value of k is crucial in this algorithm. For better prediction, k should be chosen such that the validation error becomes low [142]. Location $Loc$ is derived by averaging the location values of the k coordinates as follows:
$$Loc = \frac{1}{k} \sum_{i=1}^{k} Loc_i,$$
where $Loc_i$ denotes the coordinates of the $i$th nearest RP. In indoor localization, RSSI fingerprint values depend on the physical distance from APs to targeted devices. Assume that k RPs are taken into account for the kNN algorithm, where each RP is selected as one of the k points nearest to the user's location in the grid. When the closest point is identified, it is a good indication of physical proximity [64].
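The averaging step can be sketched as follows. The sketch assumes that nearness is measured by the Euclidean distance between RSSI vectors, which is a common choice; the radio map, the RP coordinates, and the function name are hypothetical.

```python
import numpy as np


def knn_locate(query_rssi, radio_map_rssi, rp_coords, k=3):
    """Estimate a location as the mean coordinate of the k RPs nearest in signal space."""
    dists = np.linalg.norm(radio_map_rssi - query_rssi, axis=1)
    nearest = np.argsort(dists)[:k]
    return rp_coords[nearest].mean(axis=0), nearest


# Hypothetical radio map: 4 RPs along a corridor, 2 APs.
radio_map_rssi = np.array([[-40.0, -70.0],
                           [-50.0, -60.0],
                           [-60.0, -50.0],
                           [-70.0, -40.0]])
rp_coords = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [6.0, 0.0]])

query = np.array([-52.0, -58.0])
loc, nearest = knn_locate(query, radio_map_rssi, rp_coords, k=2)
print(loc, nearest)
```

For classification of symbolic locations, the mean would be replaced by a majority vote over the labels of the k nearest RPs.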

2) SVM
SVM is a supervised learning method used for classification, regression, and detection of outliers [143]. It uses a technique called the kernel trick to transform the data and then determines an optimal boundary between potential outputs depending on such transformations. SVM is an algorithm that takes the data as an input and draws a line dividing the data into multiple classes. In those classified data, the points nearest to the line are called support vectors, and the space between the line and the support vectors is called the margin.
The key aim of the SVM algorithm is to locate a hyperplane in an n-dimensional space that classifies the data points distinctly. In the standard soft-margin formulation, the hyperplane is derived as follows:
$$\min_{w, b, \xi} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i} \xi_i \quad \text{subject to} \quad y_i (w^{T} x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0,$$
where $w$ is the weight vector that determines the margin width, $b$ is the bias term, $C$ is the tradeoff between margin width and misclassifications, $\xi_i$ is a slack variable, and $y_i$ is the label of $x_i$. In indoor localization, SVM uses the support vectors for training on the RSSI fingerprints in the radio map composed of grid points. SVM scrutinizes the relation between its trained grid points and fingerprints, describes each grid point, and classifies the RSSI fingerprints into different zones or groups.
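To make the roles of the margin width, the tradeoff parameter $C$, and the slack variables concrete, the sketch below trains a linear two-class SVM by subgradient descent on the soft-margin objective $\frac{1}{2}\lVert w \rVert^2 + C \sum_i \max(0, 1 - y_i(w^T x_i + b))$. This solver, the learning rate, the bias term `b`, and the synthetic two-zone data are assumptions chosen for a self-contained illustration; practical systems typically rely on a library solver and a kernel.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on the soft-margin objective; y must contain +1 or -1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violators = margins < 1.0  # samples with nonzero slack
        grad_w = w - C * (y[violators][:, None] * X[violators]).sum(axis=0)
        grad_b = -C * y[violators].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def svm_objective(X, y, w, b, C=1.0):
    """Value of the soft-margin objective, with the slack given by the hinge loss."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * slack.sum()


# Hypothetical standardized RSSI features for two well-separated zones.
rng = np.random.default_rng(0)
zone_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
zone_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([zone_pos, zone_neg])
y = np.concatenate([np.ones(20), -np.ones(20)])

w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(accuracy, svm_objective(X, y, w, b))
```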

3) DT
DT is a flowchart-like structure wherein each internal node is a test on an attribute, each branch is a test result, and each leaf node is a class label. Classification rules follow the paths from the root to the leaves. DT is a nonparametric supervised learning method for classification and regression. It works for both categorical and continuous input and output variables. In this technique, the population (or sample set) is divided into two or more homogeneous subsets (or subpopulations) depending on the most significant input variables, called splitters/differentiators. The goal is to construct a model that predicts the value of a target variable by learning simple decision rules derived from the data characteristics. The algorithm stops when a node becomes pure or when further splitting brings no improvement. In a pure node, the data subset includes only one class. DT also yields predictive models with high precision, reliability, and ease of interpretation [144]. The sample data division rule uses the Gini diversity index:
$$G_i = 1 - \sum_{k} p_{i,k}^{2},$$
where $G_i$ is the Gini index of the $i$th node and $p_{i,k}$ is the ratio of class $k$ instances among the training instances in the $i$th node. In indoor localization, DT is generally used in two stages. In the first stage, a radio map is created with the help of a DT model. In the second stage, RSSI fingerprints detected from the APs at a specific location are classified to predict the location of the devices.
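A short sketch of the Gini index, defined as one minus the sum of squared class ratios in a node, shows how a candidate split is scored. The weighting of the two child nodes by their sizes follows common practice for tree construction and is an assumption here, as are the function names and the sample readings.

```python
import numpy as np


def gini_index(labels):
    """Gini index of a node: 1 minus the sum of squared class ratios."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def split_gini(feature, labels, threshold):
    """Size-weighted Gini index of the two child nodes produced by a threshold test."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n = len(labels)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


# Hypothetical RSSI readings (dBm) from one AP and the room where each was taken.
rssi_ap1 = np.array([-80.0, -75.0, -60.0, -55.0])
rooms = np.array(["lab", "lab", "hall", "hall"])

parent_gini = gini_index(rooms)
child_gini = split_gini(rssi_ap1, rooms, threshold=-70.0)
print(parent_gini, child_gini)
```

A split that lowers the index relative to the parent node separates the classes better; an index of zero means both child nodes are pure.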

4) ANN
ANN is a computational framework based on the biological brain's structure and learning abilities. The interconnected nodes in the ANN transmit signals to one another in different layers. In particular, as the number of layers increases, i.e., the depth of the structure increases, the network becomes a deep neural network, which underlies the popular deep learning (DL) technology. It can therefore be said that DL is a subset of the broader ML family. The basic perceptron $h(\cdot)$ [124] is given by
$$h_w(x) = \mathrm{step}(z),$$
where $z = w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n = x^{T} w$, $x$ is the input vector, and $w$ is the weight vector. The output of a fully connected layer is given by
$$h_{W,b}(X) = \phi(XW + b),$$
where $X$ is the matrix of input, $W$ is the weight matrix, $b$ is the bias vector, and $\phi$ is the activation function.
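The two building blocks described above, a perceptron that applies a step function to the weighted sum $x^T w$ and a fully connected layer that applies an activation to $XW + b$, can be written directly in NumPy. The Heaviside step function, the ReLU activation, and all numeric values below are illustrative assumptions.

```python
import numpy as np


def step(z):
    """Heaviside step function: 1 when z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)


def perceptron(x, w):
    """Basic perceptron: step function applied to the weighted sum z = x^T w."""
    return step(x @ w)


def relu(z):
    return np.maximum(z, 0.0)


def dense_layer(X, W, b, phi):
    """Fully connected layer: activation phi applied to X W + b."""
    return phi(X @ W + b)


# Hypothetical inputs and weights.
x = np.array([1.0, 2.0])
w = np.array([0.5, -1.0])
fired = perceptron(x, w)  # z = 0.5 - 2.0 = -1.5, below the threshold

X = np.array([[1.0, 2.0]])                 # batch of one sample with two features
W = np.array([[1.0, 0.0], [0.0, -1.0]])    # two inputs, two output units
b = np.array([0.5, 0.5])
layer_out = dense_layer(X, W, b, relu)
print(fired, layer_out)
```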
In indoor positioning scenarios, ANNs (or DLs) can be used for multiple tasks, such as feature extraction, dimension reduction of radio maps, classification, regression, and forecasting of the devices' locations. DL models and techniques may provide position information in a single step when compared to other standard ML algorithms.
In addition, DL models can take RSSI values directly without calculating the average of these values, which conclusively diminishes the loss of information. This removes the need for domain experience and the extraction of core functionality. DL is a very promising scheme for improving localization accuracy in complex environment scenarios, where feature extraction is challenging and the data have high dimensions. DL is well known for its distributed processing and analytic capabilities that contend with vast quantities of unlabeled or labeled data. Additionally, DL may help alleviate the effect of RSSI fluctuation due to multipath fading and propagation loss. Furthermore, recent advancements in DL will lead to techniques with better performance, less energy, and more efficient computation, which could incorporate low-power IoT devices. DL-based indoor robot localization could even reduce the need for site surveys. All of these enhancements in DL techniques enable future indoor localization systems to have high accuracy, low latency, strong robustness, and high adaptability to a dynamic environment.
In the last decade, DL models such as multilayer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), and deep Q-network (DQN) (which are briefly discussed below) have been proven to often outperform traditional (standard) ML models such as KNN, SVM, and DT in complex tasks [124].
• MLP: An MLP is a class of feedforward ANN. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. The input layer receives the input signal to be processed. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. The output layer carries out the required tasks, such as prediction and classification. An arbitrary number of hidden layers placed between the input and output layers are the true computational engine of the MLP. As in any feedforward network, the data in an MLP flow in the forward direction from the input to the output layer. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers with a nonlinear activation function, which distinguish MLP from a linear perceptron, allow it to approximate any continuous function and to solve problems that are not linearly separable [130], [145].
• CNN: A CNN is a particular type of feedforward neural network in AI. CNN is widely used for image recognition. Its architecture was inspired by visual cortex organization, which is similar to the neuron connectivity pattern in the human brain. CNN represents the input data in the form of multidimensional arrays and works well with a large number of labeled data. CNN processes each portion of the input image, which is known as the receptive field. It assigns weights for each neuron based on the significant role of the receptive field so that it can discriminate the importance of neurons from one another. Compared to a fully connected ANN, CNN possesses the following advantages: 1) [local connections] each neuron is no longer connected to all neurons of the previous layer, but only to a small number of neurons, which effectively reduces the number of parameters and speeds up convergence; 2) [weight sharing] a group of connections can share the same weights, which further reduces the number of parameters; 3) [downsampling-based dimensionality reduction] a pooling layer harnesses the principle of image local correlation to downsample an image, which reduces the amount of data to be retained. The above three appealing characteristics make CNN one of the most representative algorithms in the deep learning field [146], [147].
• RNN: An RNN is a neural network that maps from an input space of sequences to an output space of sequences in a stateful way. That is, the prediction of output depends not only on the input but also on the hidden state of the system, which is updated over time as the sequence is processed. Such RNN models can be used for sequence generation, classification, and translation. RNN is a class of ANN, where connections between nodes form a directed graph from a layer to previous layers, allowing information to flow back into the previous parts of the network. Thus, each layer in the model persistently depends on past events. LSTM is a special kind of RNN consisting of a chain-like structure that is capable of learning long-term dependencies by remembering information for long periods [148].
• DQN: A DQN is a neural network that approximates the action-value (Q) function in a Q-learning system. It is typically used in combination with experience replay, which stores episode steps in a memory and samples them randomly for off-policy learning. Furthermore, the Q-network is typically optimized against a frozen target network that is updated with the most recent weights every k steps. The target network improves training stability by avoiding short-term oscillations caused by a moving target. Experience replay deals with the autocorrelation caused by online learning, and having a replay memory makes the problem similar to a supervised learning problem [34].
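The replay memory and the frozen-target idea mentioned for DQN can be sketched without a neural network library. The snippet below stores transitions in a bounded buffer, samples them at random, and computes the usual one-step targets $r + \gamma \max_a Q_{target}(s', a)$ from Q-values that a target network would supply. The class name, the discount factor, and the numeric values are hypothetical, and the network itself is omitted.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Bounded buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation of consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """One-step targets computed from the frozen target network's Q-values."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = memory.sample(2)

# Hypothetical target-network outputs for two next states and two actions.
targets = td_targets(rewards=np.array([1.0, 0.0]),
                     next_q_target=np.array([[1.0, 2.0], [3.0, 4.0]]),
                     dones=np.array([0.0, 1.0]),
                     gamma=0.5)
print(len(memory), targets)
```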

D. POSTPROCESSING IN ML
Postprocessing [149] is an additional process applied to the ML model output to further improve the performance. Postprocessing methods include various pruning schemes, quality processing rules, sorting rules, and so on. These schemes filter out noisy, imprecise, or undesired information created by an algorithm. Similarly, postprocessing is needed for the output of an indoor localization model, which may incorrectly predict the location of the user due to missing signal strength of the APs, hardware failures, and so on, ultimately leading to unsuccessful positioning. In addition, ML models often need retraining after deployment to compensate for data changes and take the dynamic nature of the environment into account. This retraining may increase the downtime, cost, and complexity of the system. To avoid these issues, researchers have started using a state-of-the-art technique called transfer learning.

E. TRANSFER LEARNING
Transfer learning (TL) has attracted significant research attention in recent years and has been successfully applied in various application areas, including computer vision and natural language processing [150], [151]. TL is an ML technique that stores the knowledge acquired while solving one problem and uses this knowledge to solve other related problems. In other words, it aims to discover the latent features shared between source and target domains, extract knowledge from the source, and transfer the extracted knowledge to the target. It relies on the domain adaptation process, which attempts to reduce the differences between domains [152]. TL provides significant performance improvements over traditional ML systems when a dataset has uneven distributions, dimensional mismatching, inaccurate/lost data labels, or limited training data [153].
The system must update a database/radio map used for localization regularly. Otherwise, fingerprinting algorithms will become obsolete, causing a significant decrease in efficiency. For indoor localization, TL can provide several advantages from the perspective of radio map updating. It can compensate for ever-changing indoor environment settings [154] and improve the scalability of indoor localization without increasing the overhead of fingerprints [155]. Additionally, TL can enable indoor localization systems to compensate for AP signal fluctuations and make radio maps more adaptive [156]. TL reduces the requirements for recalibrating fingerprints because it can quickly transfer previous knowledge to a fingerprinting algorithm and make outdated data more useful until significant structural changes occur in an indoor area.

VI. ML-BASED RSSI FINGERPRINTING SCHEMES
This section summarizes recently published ML-based indoor localization systems using Wi-Fi fingerprinting techniques. We classify the localization systems into two types of models, i.e., traditional ML (KNN, SVM, DT, RF, etc.) or ANN (or DL: MLP, DNN, DQN, RNN, LSTM, etc.) models. These models have different objectives depending upon indoor localization applications, such as classification in terms of symbolic location (i.e., room, floor, building, RP) or prediction (or regression) in terms of physical location (i.e., real coordinates). Their performances are evaluated over public or private databases with respect to the following metrics: positioning accuracy, scalability, robustness, cost and complexity.
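The accuracy figures quoted in the remainder of this section take two forms: a classification accuracy for symbolic locations and a positioning error (often with the share of estimates within a given distance) for physical locations. The following minimal sketch shows how such figures could be computed; the function names are hypothetical, and the individual studies may define their metrics differently.

```python
import math


def classification_accuracy(predicted_labels, true_labels):
    """Share of symbolic locations (room, floor, building, RP) predicted correctly."""
    hits = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return hits / len(true_labels)


def positioning_errors(predicted_xy, true_xy):
    """Euclidean distance in meters between each estimate and the ground truth."""
    return [math.dist(p, t) for p, t in zip(predicted_xy, true_xy)]


def mean_error(errors):
    """Average positioning error in meters."""
    return sum(errors) / len(errors)


def fraction_within(errors, threshold_m):
    """Share of estimates whose error does not exceed threshold_m."""
    return sum(e <= threshold_m for e in errors) / len(errors)
```

For example, a statement such as "77% accuracy within 2 m" corresponds to fraction_within(errors, 2.0) being 0.77 under this reading.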

A. TRADITIONAL ML MODELS
In [157], the authors used PCA with a combination of ML techniques. Their system attempts to find appropriate links between a user's location and predefined RPs. They used custom datasets for their simulations and concluded that PCA reduces complexity by up to 70% with improved positional accuracy. An SVM-based solution was proposed in [41] for room-level prediction. This method uses a normalized-rank-based SVM classifier (NR-SVM), and the experimental results show that it predicts 93.75% of test cases with 98.75% accuracy. In [158], the authors used an SVM with a custom dataset and demonstrated that their proposed approach could achieve 77% accuracy within 2 m. The results also indicated that their system is fast and efficient at predicting user locations and has lower complexity while being robust and scalable.
Another SVM-based positioning system was proposed in [32]. This method is also capable of recognizing user poses. The proposed approach can compensate for the impacts of shadows and user postures on RSSI. The authors collected their own fingerprints, and simulation results demonstrated that their system could recognize three poses with 97.16% accuracy and locate users with a positioning error of 0.4303 m. Their approach is robust and has middling complexity but lacks scalability. In [159], the authors proposed a fuzzy least-squares SVM-based indoor localization scheme. They designed a fuzzy coefficient that represents the loss of misclassified fingerprint data of various types. Additionally, they divided their dataset into two subsets and selected the best subset using the fuzzy C-means algorithm [160]. The proposed scheme requires a significant amount of time to predict the position of a user, making it unsuitable for real-world applications.
In [33], the authors proposed an RF-based indoor localization system using a smartwatch. They constructed their dataset for their experiments and achieved 97.5% accuracy with an execution time of less than 200 ms. Their system achieves good accuracy but does not handle noise and signal fading well. Therefore, despite having low complexity and high precision, this system has poor stability and robustness, which prevents it from being used in real-world applications.
In [161], the authors proposed a feature-scaling-based kNN algorithm. This algorithm assigns different weights to two other signals to reduce the similarity between the corresponding RSS vectors. Additionally, it calculates the significant signal distance between an RSS vector and the fingerprints of every RP in a radio map. For simulations, the authors constructed a fingerprint dataset from their university's office buildings, covering a total area of approximately 72 m². The results revealed that the mean location error was roughly 1.70 m. This robust algorithm can also be applied to other environments.
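The general idea of a kNN estimator that weights signal differences before ranking RPs can be sketched as follows. This is an illustration only and not the feature-scaling rule of [161]: the per-AP weights, the radio map layout, and the function names are assumptions.

```python
import math


def weighted_signal_distance(rss_a, rss_b, ap_weights):
    """Euclidean distance in signal space with one (hypothetical) weight per AP."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for a, b, w in zip(rss_a, rss_b, ap_weights))
    )


def knn_locate(query_rss, radio_map, ap_weights, k=3):
    """Estimate a position as the centroid of the k closest RPs.

    radio_map is a list of ((x, y), rss_vector) pairs, one per RP.
    """
    ranked = sorted(
        radio_map,
        key=lambda rp: weighted_signal_distance(query_rss, rp[1], ap_weights),
    )
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return (x, y)
```

Setting all weights to 1 reduces this sketch to the plain kNN baseline, which makes the effect of any weighting rule easy to compare.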
In [162], the authors proposed a kNN-based scheme for indoor localization. Their method attempts to handle spatial ambiguity and dynamic RSSIs by leveraging past user location information. The authors constructed a private database for validating the proposed scheme. The results revealed that it could achieve 80% accuracy with a positioning error of 0.89 m. This system is relatively simple and can handle the dynamic nature of various environments but lacks scalability. Moreover, the experiments conducted on the public dataset of [125] indicated a significant reduction in overall performance.
The system proposed in [91] uses unlabeled data for indoor localization. This system uses SLAM, which has a slight dependency on labeled data (precollected RSSI). This makes the system scalable and robust, but using a more complex algorithm increases the overall system complexity. However, localization accuracy is high with a small amount of data.
In [163], the authors proposed a weighted ensemble classifier for smartphone-based indoor localization. In this study, various context-specific ML classifiers were grouped according to the Dempster-Shafer belief theory (DSBT). The DSBT is used to determine the weights of base classifiers according to their prediction capabilities, and weighted voting is conducted to approximate an unknown location. The authors evaluated the proposed method using the JUIndoorLoc database [164]. They also analyzed the performance of their approach on the UJIIndoorLoc dataset [125] and claimed to achieve an accuracy of 98% with a 2 m localization error. However, a lack of robustness and increased complexity reduce the prospects for real-world implementations of this method.
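The weighted voting step of such an ensemble can be sketched as follows. In this minimal example, the weights are passed in directly; deriving them from the DSBT, as done in [163], is not shown, and the function name and the labels are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the location label with the largest total classifier weight.

    predictions[i] is the label chosen by base classifier i, and
    weights[i] is the (assumed, precomputed) weight of that classifier.
    """
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

A single strong classifier can therefore outvote several weak ones, which is the intended difference from simple majority voting.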

B. ANN (DL) MODELS
Typically, DL models refer to ANN models with higher complexity than standard ML models. Significant research has been conducted in this field in recent years. In [148], the authors used an RNN-based model to predict the paths and locations of devices in an indoor environment. They devised a novel architecture called a convolutional mixture density recurrent neural network that uses long short-term memory (LSTM) [165] for state transitions. For the feature extraction and the handling of missing Wi-Fi signals, they used a VAE where identical latent distributions are assumed for both Wi-Fi signals and user locations. The authors tested their scheme on two different public datasets [125], [166] and demonstrated that their implementation is superior to other DL implementations. Although sophisticated and computationally expensive models make their system more robust and scalable, they increase system cost and complexity. This system has the potential for use in small-area applications in the real world.
A CNN-based system was proposed in [146] to handle time-varying RSSI values. The authors constructed a 2D virtual radio map from 1D Wi-Fi RSSI values and then built a CNN system to take 2D radio map inputs. Therefore, this system can learn the topology of an RSSI-based radio map. Furthermore, it achieves 95.41% accuracy for predicting the building IDs and floor numbers based on an experimental database [125]. Additionally, accuracy increases up to 95.5% when a dropout layer is incorporated [167]. Overall, this scheme has low time complexity, fast execution time, and good scalability.
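One simple way to arrange a 1D RSSI vector as a 2D input for a CNN is to pad it to the next square size and reshape it row by row. The sketch below illustrates only this generic step; the actual layout of the virtual radio map in [146] is not described here, so the row-major order, the padding value, and the function name are assumptions.

```python
import math


def to_2d_map(rssi_vector, pad_value=0.0):
    """Reshape a 1D RSSI vector into the smallest square 2D grid that fits it."""
    n = len(rssi_vector)
    side = math.isqrt(n)
    if side * side < n:
        side += 1  # round up to the next square size
    # Fill unused cells at the end with the (assumed) padding value.
    padded = list(rssi_vector) + [pad_value] * (side * side - n)
    return [padded[row * side:(row + 1) * side] for row in range(side)]
```

For instance, a vector with five RSSI values becomes a 3 × 3 grid whose last four cells hold the padding value.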
Another study [147] used a CNN to predict the locations of devices by considering RSSI data as time series of RSS values. By using the dataset from [125] for their experiments, they demonstrated that their system provides 100% accuracy for both building and floor prediction and has a positioning error of 2.77 m. This approach is less complicated than other DNN-based methods and is robust because it can handle noise and randomness in data. However, it has limited prospects in the real world due to slow forecasting performance.
CNNLoc [168] is a multibuilding and multifloor indoor localization system that uses Wi-Fi fingerprints. It uses an SAE to extract specific features from raw RSSI fingerprints and a CNN to achieve high accuracy during the online phase. To validate this method, the authors performed simulations using two other datasets [125], [166], as well as their private dataset [168]. CNNLoc can achieve accuracies of 100% and 95% for predicting buildings and floors, respectively. The positioning errors for three different datasets (i.e., [166], [168], and [125]) were 7.6 m, 10.88 m, and 11.78 m, respectively. However, its positioning error of more than 3 m limits its prospects for real-world applications.
In [169], the authors proposed a continuous wavelet transform (CWT)-based approach that formulates indoor RSSI fingerprints. They developed two DL-based algorithms. The first uses images translated by CWT as RSSI input data with Gaussian noise and then trains a CNN to predict the user's positions. The second uses spectral density data extracted from CWT images and then trains an ANN to predict the user's positions. The authors claimed to achieve room prediction accuracies of 97.3% and 70.6% for the CNN and ANN, respectively, and RP prediction accuracies of 94.93% and 60.6%, respectively. Another CNN approach was introduced in [170] with a reported localization accuracy of approximately 94.13%, where the online prediction complexity was shifted to an offline preprocessing step.
A device-free indoor localization system using both RSSI and CSI was proposed in [145]. The use of MLP and a 1D CNN for location prediction makes this system less complicated. ReLU and softmax activation functions are used to improve system performance. Additionally, this system uses a private dataset. The results demonstrated that the system could achieve 99.97% and 82.92% accuracies with 1.92 m and 0.92 m positioning errors, respectively, when using CSI and RSSI. Overall, RSSI achieves acceptable accuracy with a lower positioning error. However, this system sacrifices robustness and scalability to achieve low complexity.
In [34], the authors used an agent-based system to handle unreliable Wi-Fi fingerprints and localize target devices. They used a hierarchical search algorithm, starting from the outer boundary of a localized environment and converging toward the target device. They modeled indoor localization as a Markov decision process (MDP) [171], where a deep Q-network (DQN)-based learning agent meshes (coordinates) fluidly with an indoor area and performs localization using a sliding window. The authors claimed that their model does not require any prior information and can provide on-demand real-time localization. For experiments, they used a public database [125]. The results demonstrated that their system could locate 75% of devices with a positioning error of 0.55 m and achieve 76.43% accuracy up to 1 m. This system is robust and can be implemented in extensive areas, making it scalable, but using a DQN-based agent makes it computationally expensive and sophisticated.
In [172], the authors proposed an AutLoc system that uses an autoencoder to improve accuracy by reducing noisy RSS. Deep autoencoders are trained to denoise data in the offline phase and construct RSS fingerprints based on learned weights. Additionally, three different ML algorithms (RF, multilayer perceptron classification (MPC), and multilayer perceptron regression (MPR)) were used to predict locations, and their results were averaged to obtain a final position. This system can achieve high accuracy, but it is computationally expensive.
Hybloc [173] is an infrastructure-less indoor localization technique. It uses a Gaussian mixture model (GMM) [174] for soft clustering and employs an RF technique for both room-level and latitude-longitude-level prediction. The authors also proposed a dataset slicing technique to find natural groups in a dataset based on GMM-dependent soft clustering, which is driven by the Akaike information criterion (AIC) and Bayesian information criterion (BIC) [175]. They claimed an average accuracy of 85% with a positioning error of 6.26 m on a public database [125]. Their experiments revealed that their system is robust and scalable in real-world applications but has increased complexity because it requires data preprocessing.
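For reference, the standard definitions of the two criteria are given below, where k denotes the number of free parameters of the fitted mixture, n is the number of fingerprints, and \hat{L} is the maximized likelihood. How [173] combines the two criteria when slicing the dataset is not detailed here; the usual practice, assumed in this note, is to prefer the number of mixture components with the lowest criterion value.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Because the BIC penalty grows with ln n, it tends to favor fewer clusters than the AIC on large fingerprint datasets.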
The method proposed in [130] is a robust DNN system that achieves precise positioning in a multibuilding environment. This system uses linear discriminant analysis (LDA) for data cleaning and dimensionality reduction. Additionally, it uses a multilayer perceptron (MLP) [176] with a rectified linear unit (ReLU) activation function to improve system performance. The authors used a private dataset and achieved 99.15% accuracy with a positioning error of 0.98 m. The use of an MLP and LDA reduces the time complexity of the system and increases its robustness.
Another study [185] demonstrated that pre- and postprocessing techniques could improve the performance of ML models. Since then, many authors have used these techniques in a wide variety of systems. The authors of [179] proposed pre- and postprocessing indoor localization based on a DNN. They also used TensorFlow for rapid system implementation and constructed their own dataset of RSSI fingerprints. Experimental results demonstrated that pre- and postprocessing could help an ML model achieve a high accuracy of 95-94% with a precision of 4 m in indoor environments. Overall, pre- and postprocessing can make a system more robust and scalable but also more computationally expensive.
In [177], another indoor localization system using an autoencoder-based deep extreme learning machine (ELM) localization algorithm was proposed. This algorithm highlights the impact of increasing the number of training data on localization performance. The authors constructed a private database for testing the proposed algorithm, and the results demonstrated that the algorithm could achieve 95.75% accuracy as the amount of training data increased. It only achieved an accuracy of 87.45% without increasing the amount of training data. Overall, increasing the amount of training data improved the system's accuracy and made it more robust and scalable. However, large datasets increase the initial system cost and complexity based on algorithm retraining and data augmentation requirements.
In [182], another autoencoder and DNN-based localization scheme was proposed to increase scalability for a large multifloor building. Two stacked unsupervised autoencoder models were implemented in the proposed method, and the entire network was trained globally by adding a softmax output layer for classification. By using a private database based on 162 rooms, this method required less time for training and achieved an accuracy of 85.58%. This scheme is scalable and robust but also has high complexity.
The method proposed in [181] is a passive scanning-based smartphone localization system for indoor environments. This system uses a local-feature-based deep LSTM technique to localize devices and a local feature extractor to minimize noise impact in RSSI values. It also extracts robust local features from observed RSSI values using a sliding window mechanism. The authors used a custom database and demonstrated that their system could locate a smartphone with a precision of 2 m. This system provides robustness at the price of complexity.
A DNN-based approach called SDNNLoc was proposed in [35]. It uses a DAE for feature extraction to achieve robustness. Additionally, a field-programmable gate array (FPGA) acceleration strategy was implemented in conjunction with a DNN to provide scalability. In their experiments, the authors used a crowd-sensing technique to construct a private dataset. The results demonstrated that the proposed approach can achieve more than 80% accuracy with a positioning error of less than 2 m and requires only 40 ms to provide location information. Therefore, SDNNLoc can be used in the real world, but it suffers from high complexity.
The authors of [183] proposed a robust indoor localization scheme using capsule networks called CapsLoc. Their method extracts a hierarchical structure from fingerprint data. This structure consists of a convolutional layer, main capsule layer, and feature capsule layer. Their scheme uses dynamic routing to drive the feature capsule layer, followed by one-hot encoding for mapping actual labels to grid locations during training. For their experiments, the authors constructed a private dataset containing 33,600 data points from three rooms with a coverage area of 460 m². The results revealed an average accuracy, positioning error, and processing time of 98%, 0.68 m, and 0.5 ms, respectively. Additionally, based on the robust nature of this scheme, it can be scaled with acceptable complexity.
The authors of [184] proposed a high-adaptability indoor localization (HAIL) method that uses both absolute and relative RSS levels to provide robust and accurate location information. HAIL uses a back-propagation neural network (BPNN) to calculate the similarities between different fingerprints constructed based on absolute RSS values. Their simulation results demonstrated that HAIL could achieve 80% accuracy with an average localization error of 0.87 m. Additionally, the authors tested their method on multiple devices and concluded that the proposed method could be applied to heterogeneous devices. Furthermore, HAIL is robust and scalable because it can be implemented in a wide range of indoor environments with low complexity.
In [178], the authors proposed a DNN-based localization system that uses an SAE for feature reduction and employs a DNN for floor and building prediction. They reported 92% accuracy for the prediction of floors and buildings in multifloor and multibuilding environments [125]. Their system is relatively robust and has medium complexity. The authors of [180] proposed the WiDeep system, which uses a DL probabilistic framework to reduce noise in RSSI data. They reported positioning errors of 2.64 m and 1.21 m, respectively, in two different indoor test environments. This system is robust but is challenging to scale and has high complexity.
In [40], the authors proposed another RNN-based solution for predicting trajectories by leveraging the correlations among RSSI values. To handle the temporal fluctuations of RSSIs, a weighted average filter (WAF) was applied to both input data and output locations. The results demonstrated that the average localization error was as low as 0.75 m, with 80% of errors under 1 m.
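A weighted average filter over a sliding window of recent readings can be sketched as follows. The window length and the weights are assumptions chosen for illustration and are not the values used in [40]; the function name is hypothetical.

```python
def weighted_average_filter(samples, weights):
    """Smooth a sequence with a weighted average over the most recent samples.

    weights[-1] applies to the newest sample in the window. At the start of
    the sequence, only the weights for the available samples are used.
    """
    window = len(weights)
    filtered = []
    for i in range(len(samples)):
        recent = samples[max(0, i - window + 1):i + 1]
        w = weights[-len(recent):]  # align weights with the newest samples
        filtered.append(sum(s * wi for s, wi in zip(recent, w)) / sum(w))
    return filtered
```

The same function can be applied to a stream of RSSI values from one AP or, coordinate by coordinate, to a stream of predicted locations.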
Each scheme discussed above has advantages and disadvantages that make it unique. However, in Table 3, one can see that some schemes achieve high accuracies but have increased complexities, while other methods have a higher degree of robustness but lack scalability. Some have low complexities and scalability but a higher degree of robustness with poor accuracies. It is not easy to compare these methods directly because they use different radio maps/datasets (public or private) to perform their experiments. Many of the methods are application dependent. However, there are two reasonable ways to compare different schemes. First, one can use public databases that are accessible to all researchers and engineers. Second, one can use a specific standard scaling method to compare various features of private or public datasets [186].
Additionally, note that the current research trend focuses primarily on performing fingerprint collection, dealing with the sparse and noisy nature of raw Wi-Fi fingerprints, and developing ML/DL algorithms to achieve high prediction accuracy. However, it mostly neglects other aspects, such as the privacy and security of user data during localization, the design of low-latency localization models, and the implementation of scalable and energy-efficient indoor localization systems. These aspects deserve more attention from academia and industry. Additionally, the fusion of various wireless signals (Wi-Fi, Bluetooth, ultrawideband, cellular, etc.) and the use of intelligent reflecting surfaces (IRS) [187] are expected to emerge as prominent methods for low-latency and highly accurate indoor localization applications.

VII. OPEN-SOURCE DATABASES
Data collection, an essential but cost-intensive process, is often the first step toward ML-based indoor positioning. In this section, we discuss some of the publicly available datasets for ML-based indoor localization using Wi-Fi RSSI fingerprints, which are summarized in Table 4.

A. UJIIndoorLoc DATASET
The UJIIndoorLoc database encompasses three Jaume I University buildings with four or more floors, whose total coverage area is approximately 110,000 m². UJIIndoorLoc was produced in 2013, and its data were collected by 20 people using 25 Android devices. This database contains 19,937 training data points and 1,111 validation/test data points. It has a total of 529 features, including Wi-Fi fingerprints, fingerprint coordinates, and other valuable details [125].

B. KTH/RSS DATASET
The database includes measured RSS data in two different indoor and outdoor environments collected using a mobile robot in 2016. The total coverage area is approximately 400 m², including an indoor hallway, a set of rooms, and an abandoned steel factory in Dortmund, Germany. The mobile robot's location was documented using odometry [188].

C. MINHO DATABASE
The Minho database was compiled at the University of Minho, Portugal, in July 2017 and covers a total area of approximately 1000 m². A Raspberry Pi 3 Model B with an internal Wi-Fi interface and four external USB Wi-Fi interfaces was the basis for the data collection system. The database contains a total of 5,783 fingerprints, of which 4,973 are labeled as training fingerprints and the remainder are used as test fingerprints. In that database, each fingerprint has RSS values measured from 11 different APs [189].

D. CROWDSOURCED Wi-Fi DATABASE AND BENCHMARK SOFTWARE FOR INDOOR POSITIONING
The new open-source Wi-Fi fingerprint dataset, known as the Tampere database, consists of 4,648 fingerprints collected from 21 devices and is accompanied by benchmark indoor positioning results. The dataset was collected during January-August 2017 in a four-floor university building in Tampere, Finland, covering a floor plan of approximately 22,570 m². It includes 687 fingerprints for training, as well as 3,951 fingerprints for testing or estimation [166].

E. LIBRARY DATASET
The Library dataset is a Wi-Fi RSS database designed to support the analysis of transient signal fluctuations and the development of reliable fingerprinting-based indoor location methods. For 25 months, researchers collected this Wi-Fi data from their university library building, which has a floor plan of approximately 308 m². The database contains 576 training samples and 3,120 test samples. The collection of all fingerprints was carried out using 620 APs. The database is presented as a list of sets of files. Each dataset contains four files that store their own RSS sets, locations, times, and identifiers [190].

F. JUIndoorLoc DATABASE
The JUIndoorLoc database covers a five-floor building at Jadavpur University. Each floor covers an area of 882 m², and the entire region is divided into grids of 1 m × 1 m. The database, which was collected using four Android devices, consists of a total of 25,364 samples, including 23,904 samples for training and the remaining 1,460 samples for testing. Its 177 attributes contain RSSI values of 172 APs and other useful information [164].
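Dividing a floor into 1 m × 1 m grids turns a coordinate into a discrete cell label that a classifier can predict. The mapping can be sketched as follows; the function names, the row-major numbering, and the number of columns in the usage note are assumptions and are not taken from the JUIndoorLoc documentation.

```python
def grid_cell(x_m, y_m, cell_size_m=1.0):
    """Return the (column, row) of the grid cell containing a point in meters."""
    return (int(x_m // cell_size_m), int(y_m // cell_size_m))


def cell_id(x_m, y_m, cols, cell_size_m=1.0):
    """Flatten (column, row) into one label, numbering cells row by row."""
    col, row = grid_cell(x_m, y_m, cell_size_m)
    return row * cols + col
```

For a hypothetical floor that is 42 cells wide, the point (2.5 m, 1.2 m) falls in column 2 of row 1 and receives the label 44.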

VIII. OPEN CHALLENGES AND THEIR POTENTIAL SOLUTIONS IN INDOOR LOCALIZATION
ML-based indoor localization using Wi-Fi RSSI fingerprints has several advantages, but it also has certain disadvantages, as shown in Fig. 5. We discuss some of the significant challenges associated with ML-based fingerprinting techniques: lack of privacy, lack of standardization of algorithms, lack of databases, heterogeneity of devices, high energy consumption, Wi-Fi networks not made for localization, and handover delay during Wi-Fi roaming. In this section, we also present potential solutions corresponding to those challenges.

A. LACK OF PRIVACY
Location privacy of mobile devices, smart vehicles, cellular users, and smart IoT devices is already the most significant concern in localization. Privacy is the most significant barrier to the introduction of full-scale indoor localization systems in the real world. The indoor positioning system always ''knows'' the user's location when connected to the network. Tracking a user's location without prior permission is illegal, and many users may not be willing to reveal their locations. In addition, location information can be critical and contain privacy-sensitive information. It may quickly expose personal and confidential information about an individual user, such as the user's health, the driving patterns of smart vehicles, attitudes, actions, use of electricity, and much more. Another security danger presented by indoor location systems in industrial or high-profile buildings is the leakage of sensitive process information, assembly lines, and other related details. Furthermore, the structural configuration of a building can be inferred by studying its radio map, which raises the possibility of terrorist attacks and other harmful activities. Therefore, we need reasonable solutions to these privacy issues.
A potential solution is to use algorithms that expose neither the internal data of users nor a map of the building. These issues can be overcome by technological solutions such as secure data collection techniques, federated learning, secure encryption technology, or the use of lightweight blockchain nodes. As nontechnical solutions, authorities should lay down strict but relevant rules and regulations for industries that use users' location data. Data should not be shared with any third-party entities or individuals without the user's proper consent.

B. LACK OF STANDARDIZATION
There are currently no standards or sets of rules that can be considered a guide for designing indoor positioning systems. Additionally, there have been no dedicated ML localization algorithms until now. The lack of sizeable state-of-the-art standard radio maps/datasets poses significant obstacles in constructing an effective indoor positioning system.

1) LACK OF BENCHMARK RADIO MAPS/DATABASES
Despite being widely studied, Wi-Fi-based indoor localization lacks benchmark radio maps/databases compared to other ML applications. The publicly available databases discussed in the previous section still have many drawbacks, such as a lack of validation datasets, outdated datasets, and limited data samples. Due to this, many studies use private radio maps, but their construction is a time-consuming and expensive process. Moreover, these private radio maps suffer from several drawbacks, such as low coverage area, inability to adapt to the dynamic environment, limited applications, and a deficient number of APs and RPs. Consequently, it is almost impractical to check various proposed techniques and make performance comparisons among different schemes.
This problem can be solved by creating a large and modern benchmark dataset that is compatible with the latest technology and adaptable to the new changes of the dynamic environment. The dataset should include proper training, testing, and validation data subsets that are useful for evaluating the proposed methods. These datasets should also act as universal datasets. Crowdsourcing methods are the easiest way to gather data with fewer resources and create large datasets. Another approach to building the benchmark datasets is collaboration between researchers from different universities and industries so that construction time and cost can be greatly reduced. Another potential approach would be to set up a global organization that makes universal standards for comparing different datasets and methods.

2) LACK OF DEDICATED STANDARD MACHINE LEARNING ALGORITHMS FOR INDOOR LOCALIZATION
There is no dedicated ML or DL algorithm for indoor localization. Optimized ML algorithms for image classification, object recognition, and speech recognition have already been developed. However, there are no exclusive standard ML/DL algorithms for indoor localization to date. Currently, all ML techniques used in indoor localization are general-purpose techniques. This limits the commercial viability of indoor localization techniques, which need additional highly complex supporting systems to achieve acceptable performance.
Hence, there is a necessity for a dedicated standard low-cost ML algorithm or a set of specific ML algorithms that is entirely tailored and optimized for indoor localization. Developing fully optimized ML algorithms (or packages) for localization will improve the system's efficiency, robustness, scalability, and versatility. These packages consist of different algorithms (or software modules) assigned to specific tasks, which allow parallel processing in indoor localization. Moreover, ML algorithms also need to be capable of processing limited data while maintaining reasonable positioning accuracy. Reference [191] shows that symbolic artificial intelligence algorithms could provide solutions to the problems mentioned above.

C. NEED FOR ADAPTIVE RADIO MAP CONSTRUCTION
In indoor localization, the radio map construction process is a daunting task. Many techniques are proposed to reduce the time-consuming and laborious tasks, as mentioned in Section III-C. However, building a radio map that can be adaptive to the dynamic behavior of the indoor environment is still a significant challenge. Thus, there is a significant requirement for adaptive radio map construction algorithms such that they can detect changes in the environment and automatically update the radio map. Using crowdsourcing [92] for data collection and semisupervised learning [107] could be a potential candidate to address the challenge of adaptive radio map construction.

D. HETEROGENEITY IN DEVICES
Heterogeneity in devices is one of the significant issues in indoor localization. As mentioned in Section III, different devices use Wi-Fi sensors from different vendors, which is a major challenge for a universal localization system that works well with all devices. Most manufacturers implement different hardware in different ways. This device heterogeneity creates a bottleneck in the adoption of a proper localization system.
The inherent solution to crack the bottleneck of heterogeneity is to develop some accord and establish a set of standards that every manufacturer complies with. Another way is to create a localization system whose hardware and software are platform-independent. It also improves the interoperability and scalability of the system with various and different devices.

E. HIGH ENERGY CONSUMPTION
High energy consumption is also an important concern in the smart device market. Localization services may hinder the efficient use of Wi-Fi APs and localized devices because they often require additional energy. Hence, there is a need for energy-efficient systems coupled with optimized indoor localization algorithms. Energy consumption can be reduced by using energy-efficient processing units, sensors, and localization algorithms, as well as parallel processing. This would make indoor localization more accessible in places where energy resources are limited, such as isolated, remote, or military environments. Another approach is to build a remote localization system, which may reduce battery anxiety among users.

F. Wi-Fi NETWORK NOT MADE FOR LOCALIZATION
Wi-Fi technology was primarily developed to provide internet services. If Wi-Fi APs are also used to provide indoor localization services, their primary task, i.e., internet service, is affected. This additional task creates a resource management issue in existing Wi-Fi systems. The issue could be addressed by introducing a localization standard in upcoming Wi-Fi versions that does not interfere with data communication services. Future versions of Wi-Fi APs may need to consider separate (or affordable) channel resources and hardware for localization services.

G. HANDOVER DELAY DURING Wi-Fi ROAMING
Another major challenge is handover delay in Wi-Fi networks in which multiple APs are installed in a given coverage area. Handover is a crucial technique for maintaining seamless connections while subscribers (or devices) move and for enabling Wi-Fi roaming. However, despite many efforts by researchers, a significant delay still occurs during handoff. To address this issue, behavior pattern analysis or location prediction of the device (or user) using ML or DL could be helpful, as it could alert the nearest APs (or candidates) to a potential handoff or increase the probability that the user selects the best candidate without disruption.
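The idea of alerting candidate APs from a predicted location can be sketched as follows. For brevity, a constant-velocity extrapolation stands in for the ML or DL predictor mentioned above, and APs are ranked by their distance to the predicted position. The function names, AP identifiers, and coordinates are hypothetical.

```python
import numpy as np

def predict_next_position(track, horizon=1):
    """Extrapolate the next position from the last two points of a track.

    A constant-velocity model is used here only as a placeholder for a
    learned location predictor.
    """
    track = np.asarray(track, dtype=float)
    velocity = track[-1] - track[-2]
    return track[-1] + horizon * velocity

def rank_handover_candidates(predicted_pos, ap_positions):
    """Return AP names sorted from nearest to farthest from predicted_pos."""
    names = list(ap_positions)
    coords = np.array([ap_positions[name] for name in names], dtype=float)
    distances = np.linalg.norm(coords - predicted_pos, axis=1)
    return [names[i] for i in np.argsort(distances)]

# Hypothetical track (meters) and AP layout along a corridor.
track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
aps = {"AP1": (0.0, 0.0), "AP2": (5.0, 0.0), "AP3": (10.0, 0.0)}

predicted = predict_next_position(track, horizon=2)
candidates = rank_handover_candidates(predicted, aps)
```

In this example, the device is moving toward AP2, so AP2 is ranked first and could be notified before the signal from the current AP degrades. A practical system would rank candidates by predicted signal quality and load rather than by distance alone.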

IX. DISCUSSION AND CONCLUSION
In the era of Industry 4.0, smart IoT devices will become major components of our daily lives or even be integrated with human life. Indoor localization-based 5G/6G applications (or services) using IoT devices (or subscribers) are expected to expand significantly. Therefore, the location information of subscribers is becoming increasingly important. Many researchers and engineers have focused on the development of various ML-based localization algorithms for improved indoor services. To the best of our knowledge, this paper is the first to provide an overview of various ML-based indoor localization techniques using Wi-Fi RSSI fingerprints. After briefly introducing various indoor localization applications and services that are expected to grow in the future, we presented extensive investigations of ML-based Wi-Fi indoor localization technologies, including Wi-Fi RSSI fingerprinting, radio map construction, and various ML-based localization schemes. We also discussed performance metrics that can be used to validate proposed ML-based localization schemes. We then described relevant open datasets that are available for validating ML models in indoor environments.
Additionally, this paper provided a detailed discussion of the open challenges faced by ML-based indoor localization techniques. We also presented potential technical or nontechnical solutions, depending on the nature of each localization problem. These open challenges and solutions may provide future research directions for researchers in academia and industry. Various services using indoor localization will become a new source of income for many businesses. Such services will also change how we interact with smart devices. As a result, the privacy of users will become a significant concern in the near future. In addition to improving the accuracy of indoor localization, protecting the privacy of users in indoor areas is becoming increasingly important as we attempt to develop an intelligent IoT society.