DeepFeat: Robust Large-Scale Multi-Features Outdoor Localization in LTE Networks using Deep Learning

Location-based services in different applications push the research toward outdoor localization for users’ equipment in Long Term Evolution (LTE) networks. Telecom operators can introduce valuable services to users based on their location, both in emergency and ordinary situations. This paper introduces DeepFeat, a deep-learning-based framework for outdoor localization using a rich feature set in LTE networks. DeepFeat works on the mobile operator side, and it leverages many mobile network features and other metrics to achieve high localization accuracy. To reduce computation and complexity, we introduce a feature selection module that chooses the most appropriate features as inputs to the deep learning model. This module reduces the computation and complexity by around 20.6% while also improving system accuracy. The feature selection module uses correlation and Chi-squared algorithms to reduce the feature set to only 12 inputs regardless of the area size. Similar systems instead take a large number of cell towers as input, and that input increases exponentially as the test area grows. To enhance the accuracy of DeepFeat, a One-to-Many augmenter is introduced to extend the dataset and improve the system’s overall performance. The results show the impact of the proper feature selection adopted by DeepFeat on the system performance. DeepFeat achieved a median localization accuracy of 13.179 m in an outdoor environment in a mid-scale area of 6.27 km². In a large-scale area of 45 km², the median localization accuracy is 13.7 m. DeepFeat was compared to other state-of-the-art deep-learning-based localization systems that leverage a small number of features. We show that using DeepFeat's carefully selected feature set enhances the localization accuracy compared to the state-of-the-art systems by at least 286%.


I. INTRODUCTION
OUTDOOR localization services have become very important and are in high demand. The demand for robust and far-reaching localization services has increased recently in different domains [1]. Many systems have been developed to fulfill the requirements of localization applications, and such systems are now commonly used for indoor and outdoor applications [4]-[9]. In the literature, four main system categories can be used for localization.
The first category is the Global Positioning System (GPS) that is considered the standard system for outdoor navigation worldwide [10]- [12]. However, it requires a Line of Sight (LOS) to satellites, and it is considered a power-hungry system because of battery drainage [13]. The second category is a WiFi-based system that utilizes WiFi signals received from Access Points (APs) [14]- [17]. The advantage of this system is the broad deployment of WiFi APs. On the other hand, the weakness of this system is the lack of availability of WiFi AP outdoors. The third category relies on sensors deployed in smartphones, such as the compass, gyroscope, and accelerometer sensors [18]- [22]. This system category results in acceptable localization accuracy, but the required sensors are not available in low-end phones. Besides, sensors with low cost are usually noisy.
Therefore, the previously mentioned system categories have disadvantages that limit their achievable localization accuracy. On the other hand, the fourth category is the cellular-based localization system. There are plenty of localization systems that depend on cellular signals for both indoor and outdoor environments [23]-[31]. Fingerprint-based localization systems enjoy several strengths that enable them to yield better performance. First, the cellular signal is a resource available to any User Equipment (UE). Therefore, using the cellular signal for localization requires no special hardware. Second, no additional power requirement or heavy battery drainage is expected since there is no further hardware requirement. Also, a substantial benefit of fingerprint-based localization is that it gives the operator the capability of estimating the UE location from the regular measurement reports. Accordingly, this localization system category offers high localization accuracy and high power efficiency, making fingerprint-based localization a very realistic and efficient choice [1].
Fingerprint-based localization techniques consist of two main phases: offline and online [24]. The basic idea of the offline phase is capturing the signatures representing cellular signals received from different Base Stations (BS) towers within the area of interest, and those signatures are called fingerprints. Later, during the online phase, the cellular signal received by the user is matched with the already pre-defined fingerprints. Finally, the best match will be used to locate the user [27].
Some techniques use traditional classifiers, such as Support Vector Machine (SVM) [25] or K-nearest Neighbor Classifier (KNN) [23]. Other traditional cellular-based localization methods estimate users' locations using probabilistic techniques that try to learn the distribution of the received signal from each cell tower [24], [32]. It is assumed that different cell towers are independent so that the impact of the dimensionality problem remains insignificant. However, limiting the dimensionality problem means limiting the correlation between the received signals from different cell towers, decreasing the overall accuracy of those systems. Accordingly, deep-learning techniques are recently employed for cellular-based localization to overcome such limitations [27]- [29].
Mobile networks have existed since the 1980s, with a new generation launching almost every decade. A generation refers to a collection of network standards that have been developed, and the speed of the networks improves with each generation. In the early 1980s, the first generation (1G) began to gain prominence around the world. The second generation (2G) began to appear in the early 1990s and used digital signals instead of analog signals. It was followed by the 3G and 4G releases, which increased data transmission rates. Currently, the fifth generation (5G) is the latest generation of mobile networks. Notably, 5G-enabled technologies are designed to accelerate data even more while lowering latency, increasing power and reliability, and ensuring accuracy and efficiency. Although 5G networks are being deployed worldwide, LTE technology will remain the backbone of cellular technology and is expected to handle 4.4 billion subscriptions by 2025. Moreover, Voice over LTE (VoLTE) will be the foundation of voice services on LTE and 5G devices, and VoLTE subscriptions are expected to reach 6 billion by 2025. Also, LTE and 5G adoption will be driven by the shutdown of 2G and 3G networks. Meanwhile, 4G network build-out is progressing rapidly and is expected to cover 90% of the population by 2025, up from 80% in 2019, with increased network capacity and speed [41]. In addition, most of the research in the literature that addresses the outdoor localization problem was designed to work on 3G systems [27], [29], [30], [40], whereas few models are designed to work on 4G [5], [6]. All the above reasons support the selection of LTE as the target technology for our proposed model. This paper introduces DeepFeat, an outdoor localization system that utilizes a deep-learning-based technique for outdoor localization on LTE networks. DeepFeat uses a deep neural network (DNN) model to achieve high localization accuracy.
Moreover, DeepFeat needs no extra energy consumption compared to the normal phone operation. Therefore, DeepFeat is an energy-efficient replacement for other localization systems that use the phone's sensors, such as WiFi and GPS. The main contributions of our work are summarized below:
• We introduce a new set of LTE network features as an input for the model, based on practical experience, which noticeably enhances the localization accuracy.
•  than the state-of-the-art systems.
The rest of the paper is organized as follows: Section II presents previous related work. Section III presents an overview of the system. Section IV gives the details of the DeepFeat system. We then present the evaluation of the system performance in Section V. Finally, Section VI concludes the paper.
Notation: throughout the paper, we use the following acronyms, as shown in Table 1.

II. RELATED WORK
This section discusses related work that addresses localization in telecom networks. According to the literature, there are three main methods: measurement-based statistical method, fingerprint-based method, and deep-learning-based method.

A. MEASUREMENT-BASED STATISTICAL METHOD
This method depends on using point-to-point distance or angle estimates using the Measurement Report (MR) to estimate location [33]. The method is very traditional, which relies on some parameters such as Angle of Arrival (AOA), Time of Arrival (TOA), and RSS [34]. Thus, UE's location is estimated according to the distance between UE and BS without requiring complex calculations. However, this needs extra equipment to be deployed to the telecom network. Moreover, this method results in unsatisfactory localization accuracy [6].

B. FINGERPRINT-BASED METHOD
The fingerprint-based method divides the area of interest into small virtual grids, and each grid has its unique fingerprint [6]. The data relating to each sample, such as latitude, longitude, and cell tower IDs, is mapped to one specific grid cell according to the sample's coordinates. For example, the Cellsense [24] and Crescendo [35] systems use the Received Signal Strength Indicator (RSSI) distribution of the mobile devices located within the grid as the unique fingerprint for every grid.

C. MACHINE-LEARNING-BASED METHOD
Recently, several techniques that utilize deep learning for cellular-based outdoor localization were proposed. These techniques train neural network models on the sample inputs. Multi-Layer Perceptron (MLP) [37] is one of the well-known algorithms that should be mentioned in this context. This algorithm is based on feed-forward artificial neural networks that contain multiple successive layers. Each layer consists of nodes arranged in a directed path, and each layer is Fully Connected (FC) to the next layer. Each node is called a neuron and has a nonlinear activation function, except for the input nodes.
Random Forest (RF) [36] is another algorithm that utilizes the signal measurements in MR data as input features to estimate UE location. It constructs decision trees at training time, obtains a location prediction from each tree, and finally averages the predictions to get one RF prediction.
Deep-learning models enhanced outdoor localization significantly. As a result, several systems were proposed recently based on deep learning models. For example, DeepLoc [27] employs a feed-forward deep neural network model. The received signal strength transmitted from cell towers within the area of interest and received by a user at an unknown location is the only input to the system. It then estimates the user's location. Several data augmentation techniques are used to increase the number of data samples and decrease the noise effect. DeepLoc enhanced the localization accuracy with a resultant median value of 18.8 meters in an urban area and 15.74 meters in a rural area.
On the other hand, WiDeep [9] is a deep-learning-based indoor localization algorithm that relies on Wi-Fi signals from different APs to find the complex relationship between mobile/APs location and power received. WiDeep uses a hybrid model that blends deep-learning-based algorithms and traditional probabilistic methods. WiDeep outperforms the other indoor localization algorithms such as Horus [38] and DeepFi [39], which depend on probabilistic techniques in terms of robust accuracy against noise variations, and user equipment heterogeneity. WiDeep achieved a better average localization accuracy by at least 29.8%.
The authors in [29] proposed MonoDcell, a deep-learning-based cellular indoor model based on a Long Short-Term Memory (LSTM) architecture that captures the sequential correlation between cell tower readings. The authors used 2G cellular parameters only. As a result, MonoDcell achieved a competitive location accuracy compared with the Wi-Fi-based indoor localization proposed in [9] and CellinDeep introduced in [28]. However, the MonoDcell model depends on a high-density fingerprint, which is not realistic in large-scale areas. Moreover, the model inherits several augmentation techniques, which might affect system performance in real time.
OmniCells [30] is another system that targets the mobile device's diversity problem and aims to mitigate device heterogeneity and its impact on model performance. The author showed that OmniCells provided a consistent median when tested using several devices, unlike the other systems. Other deep-learning-based systems use the Convolutional Neural Network (CNN) model. For instance, StoryTeller [40] is a deep-learning-based system, a 3D localization system used for floor prediction in any building.
A comprehensive comparison of different previously mentioned systems is shown in Table 2. It shows that most of the state-of-the-art systems use only two features for the localization problem. These features are the Physical Cell ID (PCI) and the RSSI. PCI identifies the cell, while RSSI represents the total power measured by UE over the entire band. RSSI is the strength of a non-demodulated signal, which the UE can calculate without the need for synchronization or demodulation.
Unlike the other systems, DeepFeat considers all available LTE features. As shown in Table 2, DeepFeat applies a feature selection module to choose the features that greatly influence localization accuracy.
VOLUME x, 2021

III. SYSTEM OVERVIEW
This section introduces the main blocks of the DeepFeat system and its operational model. According to Figure 1, which shows the system structure, there are two modes of operation: offline and online. During the offline mode, various LTE features are collected via intensive drive tests in a large-scale area using the Data Collection module. The DeepFeat model includes a feature selection module that uses Chi-squared and correlation methods to rank and select the features according to their impact and relevance to the localization problem. Moreover, DeepFeat is equipped with the proposed One-to-Many data augmenter to extend the number of samples and reduce the noise impact. Furthermore, the area of interest is divided into small areas using the grid generator module so that the desired classification technique can be applied. Finally, DeepFeat uses a Deep Feed-forward Neural Network (DFNN) model, and the most influential collected features are used for training this model. During the online mode, the desired features are collected from the UE. Then, we use the weights of the trained model for the prediction: the mobile device location is estimated by using the trained model to predict the grid where the mobile device is located. In the following sections, the system blocks are explained in detail.

A. LTE FEATURES
In LTE networks, users are served by Evolved Node B (eNodeBs), which are also known as Base Transceiver Stations (BTSs). Each eNodeB consists of several sectors, and each sector serves a different area using sectorized antennas. Those eNodeBs provide access to the LTE network via Frequency Division Duplex (FDD) and Time Division Duplex (TDD). Orthogonal Frequency Division Multiple Access (OFDMA) is the multiple access technique used by eNodeB to provide access to users over the physical layer.
In OFDMA, eNodeB allocates radio resources spanning the time and frequency dimensions. Consequently, eNodeB transmits the data to the users via downlink frames, where each frame contains a set of reference signals [42]. When a mobile moves from one cell to another cell, it performs cell selection and handover based on the signal strength and quality of serving and neighboring cells. Therefore, several LTE measurements can be used to identify the location of the mobile device. We use these LTE measurements as the input features for DeepFeat, and we call them LTE Features, which are summarized below.

1) Physical Cell ID (PCI)
It is an identification of a cell in the physical layer. This feature is used for discriminating the received signal from different serving and neighboring cells.

2) Reference Signal Received Power (RSRP)
It defines the average power received for the reference signal (RS) transmitted from a cell in LTE networks. Typically, the UE calculates RSRP for a specific cell at a given location by averaging the received power of multiple resource elements used to transfer the reference signal within the measured frequency bandwidth. RSRP is measured in decibels relative to one milliwatt (dBm) [43]. RSRP is used to compare the strengths of signals received from different cells in LTE networks.
Moreover, RSRP is used as an indicator for cell coverage in LTE networks, which differs from one grid or area to another for several reasons: cell density per area, transmitted cell power, area topology, or cell type (indoor cell or outdoor cell). Thus, coverage footprint will be the crucial component for fingerprint per grid. Accordingly, RSRP for serving and neighbor cells will be one of the DeepFeat model inputs.
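The averaging step described above can be illustrated with a minimal sketch. It assumes the per-resource-element powers are given in dBm and that the average is taken in the linear (milliwatt) domain before converting back; the function names and sample values are hypothetical and are not taken from the DeepFeat implementation.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def mw_to_dbm(p_mw):
    """Convert a power level from milliwatts to dBm."""
    return 10 * math.log10(p_mw)


def rsrp_dbm(re_powers_dbm):
    """Average per-resource-element RS powers in the linear domain.

    Assumption: averaging is done in mW, not directly on dBm values.
    """
    linear = [dbm_to_mw(p) for p in re_powers_dbm]
    return mw_to_dbm(sum(linear) / len(linear))


# Hypothetical reference-signal powers measured on four resource elements.
samples_dbm = [-95.0, -97.0, -93.0, -96.0]
print(round(rsrp_dbm(samples_dbm), 2))
```

Averaging in the linear domain gives more weight to the stronger resource elements than averaging the dBm values directly would.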

3) Reference Signal Received Quality (RSRQ)
It defines the purity of the signal within the system bandwidth. It is a ratio measured in decibels (dB) as follows:
$$\mathrm{RSRQ} = \frac{N_B \times \mathrm{RSRP}}{\mathrm{RSSI}}$$
where RSSI represents the measured average total received power over $N_B$ resource blocks that carry the reference symbols [43]. The mobile device measures RSSI from all sources, including the co-channel serving cell, non-serving cells, adjacent channel interference, and thermal noise. Thus, RSRQ is an indication of RS signal quality for the serving cell. It is considered an extension of RSRP, and it also reflects the load per cell. This load differs from one cell to another depending on several factors such as PCI clashes and the area's nature. Hence, RSRQ can be considered as a unique component of the fingerprint per grid.
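As a worked example of the ratio described above, the sketch below assumes the standard form RSRQ = N_B × RSRP / RSSI and evaluates it in the logarithmic domain, with RSRP and RSSI given in dBm. The function name and the numeric values are illustrative assumptions.

```python
import math


def rsrq_db(rsrp_dbm, rssi_dbm, n_rb):
    """RSRQ in dB, assuming RSRQ = N_B * RSRP / RSSI in linear units.

    In dB, the product and the ratio become a sum and a difference.
    """
    return 10 * math.log10(n_rb) + rsrp_dbm - rssi_dbm


# Hypothetical measurement: 50 resource blocks, RSRP = -95 dBm, RSSI = -65 dBm.
print(round(rsrq_db(-95.0, -65.0, 50), 2))
```

For a fixed RSRP, a higher RSSI (more load or interference) lowers RSRQ, which is why the metric reflects the load per cell.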

4) Signal to Interference and Noise Ratio (SINR)
It is another measurement of signal quality. However, it is defined by the UE vendor and not in the Third Generation Partnership Project (3GPP) specifications. It is the ratio of the power of the desired signal to the power of unwanted noise and interference signals. The undesired signals consist of all external interference and the internally generated noise. SINR can then be calculated as follows:
$$\mathrm{SINR} = \frac{S}{I + N}$$
where $S$ is the power of the measured desired signals, $I$ is the average interference power measured from other cells, and $N$ represents the noise power. As the number of users increases, uplink interference increases, and thus resource element utilization increases, which directly impacts the user's throughput. Accordingly, SINR can also be a relevant identification for the area or the grid cell.
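The SINR ratio can be sketched numerically as follows, assuming the form S / (I + N) with all three powers supplied in dBm. Interference and noise must be summed in linear units before the ratio is taken. The function name and values are hypothetical.

```python
import math


def sinr_db(s_dbm, i_dbm, n_dbm):
    """SINR in dB from signal, interference, and noise powers in dBm."""
    s_mw = 10 ** (s_dbm / 10)
    i_mw = 10 ** (i_dbm / 10)
    n_mw = 10 ** (n_dbm / 10)
    # Interference and noise add in the linear (mW) domain.
    return 10 * math.log10(s_mw / (i_mw + n_mw))


# Hypothetical values: signal at -80 dBm, interference and noise at -100 dBm each.
print(round(sinr_db(-80.0, -100.0, -100.0), 2))
```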

5) Channel Quality Indicator (CQI)
It is an indicator of how good or bad the quality of the communication channel is. The UE sends CQI information to the eNodeB to report the current channel quality. Thus, CQI represents channel quality in the localization problem, and it also differs from one area to another.

6) Tracking Area Code (TAC)
A tracking area is a logical concept where the user can move around without updating the Mobility Management Entity (MME). The network assigns a list with one or more Tracking Areas (TAs) to the user. Each eNodeB then transmits a unique TAC to denote to which tracking area the eNodeB belongs.

7) User Transmission Power (UE TX-Power)
User transmission power, in dBm, is a vital parameter for the localization problem that changes according to the nature of the area. The UE increases its transmitting power to compensate for any increase in path loss. In addition, area topology affects the user's transmission power. In rural areas, for example, the UE transmits at higher power than in an indoor environment.

8) Downlink/Uplink Channel Bandwidth
It is the bandwidth assigned for the downlink and the uplink carriers per cell. It indicates the capacity per site. When channel bandwidth increases, the connection becomes faster, which means higher data rates.

9) LTE Frequency Band
The 3GPP developed the LTE standard as well as the frequency bands. In various countries around the world, different frequency bands are assigned to LTE. FDD and TDD are the two types of LTE Frequency Bands. FDD band has two frequencies, one for uplink and the other for downlink. TDD only needs a single band that is used for both uplink and downlink communications.
The majority of the state-of-the-art systems depend on the serving Cell Identity (CID) and RSSI as the features used in the localization problem. Unlike them, we select the measurements mentioned above as the features that assist the DeepFeat model in identifying the user's location more accurately. According to Table 3, the available features cover several network aspects, such as the reference signal (RS), the base station, channel quality, interference effect, and user location.

B. FEATURES SELECTION
Extracting all features in the data collection phase yielded 19 features: the nine features listed in Table 3, with some of them measured for both the serving cell and three neighboring cells. These nine features result in the 19 features as follows:
• PCI, RSRP, and RSRQ for the serving cell and three neighboring cells: 3 × 4 = 12 features.
• SINR, CQI, TAC, UE TX-Power, and frequency band for the serving cell: 5 features.
• UL Bandwidth and DL Bandwidth: 2 features.
The more features the model uses, the larger the dataset needed to train the model and the higher the computational power required [44]. That is why the feature selection module is vital for identifying features that are irrelevant or have a low impact on localization accuracy. Moreover, without a selection module, these features sometimes introduce high noise into the model and result in misleading outputs.
In this paper, we use three different techniques of feature selection:
1) Technology-intuitive method: We rely on our telecom experience to select the most relevant LTE features from drive test data. The features that directly impact the localization problem were selected and used as the inputs to the DeepFeat model. For instance, serving cell identity and power are more influential than frequency band and bandwidth (BW). As a result, this method concentrates on signal signature metrics such as RSRP, RSRQ, SINR, CQI, UE TX-Power, and cell identity.
2) Correlation method: We compute the cross-correlation between all features. Accordingly, we identify the redundant features that have a high correlation, and we then exclude them. Figure 2 shows the heat map of the correlation matrix for all features. For example, as shown in the figure, UE TX-Power is -70% correlated with serving cell RSRP (S-RSRP). This result was expected since the UE's power increases as the distance between the mobile device and the serving base station increases, while the power received from the BS decreases.
3) Chi-squared test: The two previously mentioned methods indicate which metric can be omitted due to its high correlation with another metric. However, they do not provide sufficient information about the influence of each metric on the model's performance. Hence, the role of the Chi-squared test is to provide this missing information [2], [3]. The Chi-squared test is a univariate statistical test used to select the features with the most influence on the output variable. The test determines if there is a significant relationship between each feature and the output.
It calculates the Chi-squared score ($\chi^2$) from the observed and expected values of the output per feature according to the equation below:
$$\chi^2 = \sum_{i=1}^{V} \frac{(O_i - E_i)^2}{E_i}$$
where $V$ is the number of measurements, $E_i$ is the expected value of the $i$-th feature assuming independence between the feature and the output, $O_i$ is the observed value of the $i$-th feature based on the output, and $\chi^2$ represents the Chi-squared score. We calculate this quantity for each feature to get a corresponding score that indicates the feature's influence on the output. A high score for a specific feature means that it has a significant influence on localization accuracy. When the feature and the output are independent, the observed value is close to the expected value, which results in a lower Chi-squared score ($\chi^2$). A high Chi-squared score indicates that the hypothesis of independence is incorrect, and the output is then more dependent on the feature. Accordingly, this feature can be selected for the model training. Conversely, if a feature is independent of the output, it is uninformative for classifying the output and is excluded from the input set.
Considering the DeepFeat case, there is a mix of categorical and continuous features. For example, serving cell RSRP is a continuous variable, while serving cell identity is categorical. Besides, DeepFeat returns a categorical output, which is the grid number. The Chi-squared statistical test is used to rank the features according to their scores, but we first apply two data cleansing steps before using the test. First, we convert all categorical features into discrete numeric values. Then, we ensure that all feature values are non-negative. Following the data cleansing phase, Chi-squared is used as a scoring function. A low score means that the corresponding feature is independent of the output. In contrast, a high score means that the feature strongly affects the output and is most likely to provide important information.
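The two cleansing steps and the scoring step can be sketched as below. This is one common way to score non-negative features against class labels: the observed value per grid cell is the sum of the feature over the samples in that cell, and the expected value assumes the feature mass is spread in proportion to the number of samples per cell. It is an assumption that this matches the exact construction used for Table 4, and all names and data are hypothetical.

```python
from collections import defaultdict


def encode_categorical(values):
    """Cleansing step 1: map each distinct category to an integer code."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def shift_non_negative(values):
    """Cleansing step 2: shift a feature so that its minimum is not negative."""
    lowest = min(values)
    return [v - lowest for v in values] if lowest < 0 else list(values)


def chi2_score(feature, labels):
    """Chi-squared score of one non-negative feature against class labels."""
    n = len(feature)
    total = sum(feature)
    observed = defaultdict(float)
    counts = defaultdict(int)
    for x, y in zip(feature, labels):
        observed[y] += x
        counts[y] += 1
    score = 0.0
    for y, count in counts.items():
        # Expected feature mass in class y under independence.
        expected = total * count / n
        if expected > 0:
            score += (observed[y] - expected) ** 2 / expected
    return score


# Hypothetical toy data: four samples in two grid cells.
grid = [0, 0, 1, 1]
s_rsrp = shift_non_negative([-80.0, -82.0, -100.0, -102.0])
constant = [5.0, 5.0, 5.0, 5.0]
print(chi2_score(s_rsrp, grid), chi2_score(constant, grid))
```

In this toy case the RSRP-like feature separates the two cells and receives a high score, while the constant feature carries no information and scores zero.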
As shown in Table 4, the scores of the features are ranked in descending order. The highest scores mean a strong influence on the output. The serving and neighboring cells' RSRP and identity have the highest scores. Consequently, these features have the highest impact on the output, matching the past research that uses only RSSI and CID as the model features. However, using the Chi-squared algorithm for feature selection, some features were found to have a relatively strong impact on localization, such as SINR, UE TX power, and CQI. Furthermore, RSRQ metrics have a fair impact on localization accuracy.
Our final feature selection module is a hybrid of all the techniques mentioned above. We rely on features with the following characteristics:
1) A unique signature: a feature is kept only when its correlation with other features is less than 70%, which eliminates redundant features with the same impact on the output (e.g., UE TX-Power is redundant with serving cell RSRP).
2) Features with a Chi-squared score higher than the median value are selected to ensure that all the features with a solid contribution to the model are included (e.g., DL/UL BW and TAC are excluded due to their low scores).
3) The final and most crucial step is utilizing our telecom background experience to include some features that are numerically neglected but technically have a high contribution (e.g., the RSRQ metric of the serving BS is included to ensure that traffic load is represented in the model).
This hybrid model results in 12 features, which are the inputs to DeepFeat, as defined below:
• PCI, RSRP, RS SINR, RSRQ, and CQI for the serving cell: 5 features.
• PCI and RSRP for the 3 neighbor cells: 6 features.
• RSRQ for the first neighbor cell: 1 feature.
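The three rules of the hybrid selection can be combined in a short sketch. It assumes that redundancy is judged by the absolute Pearson correlation against features already kept, that features are visited in descending score order, and that expert-chosen features bypass both numeric filters. The function names, the `forced` parameter, and the toy data are hypothetical.

```python
import math
from statistics import median


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, scores, corr_limit=0.7, forced=()):
    """Keep high-scoring, non-redundant features plus expert-forced ones."""
    cutoff = median(scores.values())
    ranked = sorted(columns, key=lambda name: scores[name], reverse=True)
    kept = []
    for name in ranked:
        if name in forced:
            kept.append(name)  # Rule 3: domain knowledge overrides the filters.
            continue
        if scores[name] <= cutoff:
            continue  # Rule 2: score must exceed the median.
        if any(abs(pearson(columns[name], columns[k])) >= corr_limit for k in kept):
            continue  # Rule 1: drop features redundant with a kept one.
        kept.append(name)
    return kept


# Hypothetical toy data with made-up Chi-squared scores.
columns = {
    "s_rsrp": [-80, -85, -90, -95, -100],
    "ue_tx_power": [0, 5, 10, 15, 20],
    "s_sinr": [12, 3, 9, 1, 7],
    "tac": [1, 1, 1, 1, 1],
}
scores = {"s_rsrp": 40.0, "ue_tx_power": 30.0, "s_sinr": 20.0, "tac": 0.5}
print(select_features(columns, scores))
print(select_features(columns, scores, forced=("s_sinr",)))
```

Visiting features in score order means that, of two redundant features, the one with the higher score is the one that survives.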

IV. DEEPFEAT SYSTEM MODEL
In this section, we present the details of the DeepFeat system, starting with the offline training phase, followed by the online tracking phase. Table 5 shows the notation used in this section.


A. OFFLINE TRAINING PHASE
During the offline phase, the deep feed-forward neural network is trained after data collection, feature selection, data augmentation, and grid generation.

1) Data Augmentation
Deep learning models need more training data to achieve better and accurate performance [13]. Data augmentation is one of the most popular techniques used to increase the number of data samples needed for the training phase. It is also used to reduce the noise effect in the training data for both GPS and cellular data [27]. Thus, data augmentation enhances the system's performance by handling the noise and creating new data samples, making the system more robust. DeepFeat uses a new augmenter called One-to-Many augmenter, which will be explained later, to increase the number of samples and reduce the noise effect.

One-to-Many Augmenter
Generally, GPS has inherent errors that range from a few meters up to tens of meters in some cases [46]. Moreover, sample values measured during the data collection phase remain essentially the same within a few meters at relatively low speeds, as in our case.
In the DeepFeat system, we propose One-to-Many spatial data augmentation. As shown in Figure 3, the proposed augmentation technique duplicates each sample in (K) directions. The number of augmented samples K is a parameter that can be changed. Choosing a large K enhances the system performance. However, it also increases the system's complexity during the training phase and increases the training time. On the other hand, choosing a small K hurts the performance when the number of data samples is small and may cause underfitting. Each recorded sample is repeated every n meters. In DeepFeat, we choose n = 3 m as GPS accuracy decreases after nearly 3 meters, and the radio conditions change at distances larger than about 3 meters. K is considered a system hyper-parameter that will be evaluated in the performance evaluation section.

Table 5. Notation used in this section.
Symbol: Description
$P_{G_j}$: Probability of the input sample to be in the $j$-th grid cell.
$\hat{g}$: Estimated grid cell.
$G$: The set that contains all grid cells.
$P(g|F)$: Probability of receiving a signal vector $F$ at grid cell $g \in G$.
$l$: Estimated user location.
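The One-to-Many augmentation described above can be sketched as follows. The sketch assumes that the K copies are placed at evenly spaced bearings on a single ring of radius n around the recorded position, and that a flat-earth approximation is adequate at a scale of a few meters. The exact geometry in Figure 3 may differ, and the sample layout (a dictionary with `lat` and `lon` keys) is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def one_to_many(sample, k=4, step_m=3.0):
    """Return the original sample plus k copies displaced by step_m meters.

    All radio features are copied unchanged; only the position moves.
    """
    lat, lon = sample["lat"], sample["lon"]
    augmented = [dict(sample)]
    for j in range(k):
        bearing = 2 * math.pi * j / k  # Evenly spaced directions (assumption).
        d_north = step_m * math.cos(bearing)
        d_east = step_m * math.sin(bearing)
        copy = dict(sample)
        copy["lat"] = lat + math.degrees(d_north / EARTH_RADIUS_M)
        copy["lon"] = lon + math.degrees(
            d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
        )
        augmented.append(copy)
    return augmented


# Hypothetical drive-test sample.
sample = {"lat": 30.0, "lon": 31.0, "s_rsrp": -90.0}
augmented = one_to_many(sample, k=8, step_m=3.0)
print(len(augmented))
```

With this layout, each recorded sample contributes K + 1 training records, so the training set grows linearly in K.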

2) Grid Generation
To solve the localization problem, we have two approaches. The first approach treats localization as a regression problem, which requires an infinite number of points and therefore a massive amount of training samples, increasing model complexity and visibility. On the other hand, the second approach addresses the localization problem as a classification problem by using a grid approach [49]. The grid generation approach overcomes different scalability challenges. First, it allows the data to be collected while the users are moving naturally in their lives, without requiring them to stand still at the different fingerprint locations, which reduces the cost. Next, it provides a way to reduce the number of grid cells in the area of interest and hence controls the model complexity and accuracy [27]. The grid generator divides the area into a virtual grid consisting of M grid cells, as shown in Figure 4. Each training sample is assigned to one grid cell based on its coordinates, and the group of samples located in a specific grid cell represents the signature of this cell. These are the training samples used for this cell. Moreover, the length of the grid cell is a parameter to be tuned that affects the localization accuracy, as will be shown in Section V.
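Assigning a sample to a grid cell from its coordinates can be sketched as below. The sketch assumes square cells, a row-major cell index, an origin at the south-west corner of the area, and a flat-earth conversion from degrees to meters. These choices and all names are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def grid_cell(lat, lon, origin_lat, origin_lon, cell_m, n_cols):
    """Return the row-major index of the square cell containing (lat, lon)."""
    north_m = math.radians(lat - origin_lat) * EARTH_RADIUS_M
    east_m = (
        math.radians(lon - origin_lon)
        * EARTH_RADIUS_M
        * math.cos(math.radians(origin_lat))
    )
    row = int(north_m // cell_m)
    col = int(east_m // cell_m)
    return row * n_cols + col


# Hypothetical area origin and a point 250 m north and 150 m east of it.
lat0, lon0 = 30.0, 31.0
lat = lat0 + math.degrees(250 / EARTH_RADIUS_M)
lon = lon0 + math.degrees(150 / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
print(grid_cell(lat, lon, lat0, lon0, cell_m=100.0, n_cols=10))
```

A smaller `cell_m` produces more classes for the network to separate, which is the complexity and accuracy trade-off controlled by the grid cell length.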

3) Deep Feed-Forward Neural Network Model
We start to train a deep neural network model after generating the grid cells. The model is trained using the data samples augmented by the proposed One-to-Many augmenter.
DeepFeat uses a feed-forward neural network, as shown in Figure 5. The input to the model is the $N = 12$ selected features $(F_1, F_2, \dots, F_N)$, which are the outputs of the feature selection module. We have $N_s$ training samples representing the augmented training data set, where each record in the $N_s$ samples contains all the $N$ input features. On the other hand, the network's output is the probability of the input sample to be in each grid cell $(P_{G_1}, P_{G_2}, \dots, P_{G_M})$. Each input sample belongs to one grid cell. Therefore, we decide that the input sample belongs to the grid cell with the highest probability based on the output probabilities of all grid cells. Furthermore, DeepFeat uses Softmax for the output layer to calculate the probability (score) for each grid cell. In the offline phase, this deep model is trained using some training fingerprints such that the model learns the relationship between the input $N$ features and the probability of $M$ grid cells.
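The mapping from N input features to M grid-cell probabilities can be illustrated with a small forward pass. The hidden-layer sizes, the ReLU activation, and the random weights below are assumptions made for illustration only. Only the Softmax output layer and the input and output dimensions come from the description above.

```python
import math
import random


def relu(v):
    """Element-wise rectified linear activation (assumed hidden activation)."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, layers):
    """Hidden layers with ReLU, then a softmax output over grid cells."""
    h = features
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))


def random_layer(n_in, n_out, rng):
    """Untrained layer with small random weights (placeholder for training)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
N, M = 12, 5  # 12 selected features; 5 grid cells chosen for illustration.
layers = [random_layer(N, 16, rng), random_layer(16, 16, rng), random_layer(16, M, rng)]
probs = forward([0.5] * N, layers)
predicted_cell = max(range(M), key=probs.__getitem__)
print(predicted_cell, round(sum(probs), 6))
```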

B. ONLINE PHASE
In the online phase, the model trained in the offline phase (Section IV-A) is used to estimate the user location. The model can predict the user's location because it has learned the joint distribution (relation) between the input and the output, as mentioned in the previous section. Thus, the input is a new data sample, and the model predicts the grid cell where the user is located.

1) Estimating Location of The UE
We want to estimate the location of the user. First, we estimate the grid cell $\hat{g}$ in which the user is located, which is the cell with the maximum probability:

$$\hat{g} = \arg\max_{g \in G} P(g \mid F)$$

where $G$ is the set containing all grid cells, $g$ is any grid cell belonging to $G$, and $F = (F_1, F_2, \ldots, F_N)$ is the input feature vector. We then estimate the user's location $l$.
The first method takes the center of the estimated cell $\hat{g}$ as the estimated user location. However, this is a coarse estimate. We therefore use a second method that gives a better estimate: the center of mass of all grid cells, weighted by the corresponding probability of each cell [28]:

$$l = \sum_{g \in G} P_g \, c_g$$

where $c_g$ is the center of grid cell $g$ and $P_g$ is the corresponding probability for each grid cell $g$ at the deep model's output layer.
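The two estimators described above can be compared with a short sketch. It assumes that the cell centers are given as planar (x, y) coordinates in meters and that the probabilities come from a Softmax layer; the function names and the four-cell toy grid are hypothetical.

```python
def most_likely_cell(probs):
    """Index of the grid cell with the maximum output probability."""
    return max(range(len(probs)), key=lambda g: probs[g])

def center_of_mass(probs, centers):
    """Probability-weighted average of the cell centers.

    Dividing by the sum keeps the estimate valid even if the
    probabilities do not add up to exactly 1.
    """
    total = sum(probs)
    x = sum(p * cx for p, (cx, _) in zip(probs, centers)) / total
    y = sum(p * cy for p, (_, cy) in zip(probs, centers)) / total
    return x, y

# Toy 2 x 2 grid with 50 m cells (illustrative values only).
centers = [(25.0, 25.0), (75.0, 25.0), (25.0, 75.0), (75.0, 75.0)]
probs = [0.1, 0.6, 0.1, 0.2]

first_estimate = centers[most_likely_cell(probs)]   # center of the best cell
second_estimate = center_of_mass(probs, centers)    # weighted center of mass
```

The first estimate can only take one of M discrete values, whereas the second moves continuously toward neighboring cells that also receive probability mass, which is why it can be finer than the grid resolution.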

2) Localization Accuracy
After estimating the user's location, we calculate the median localization accuracy. For each test sample, we compute the distance between the estimated and the actual user location, and we then take the median of these distances over all test samples.
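This metric can be sketched as follows, assuming that both the estimated and the actual locations are expressed as planar (x, y) coordinates in meters, so that the Euclidean distance is a reasonable error measure. The function names are hypothetical.

```python
import math
import statistics

def localization_errors(estimated, actual):
    """Euclidean distance (meters) between each estimate and its ground truth."""
    return [math.dist(e, a) for e, a in zip(estimated, actual)]

def median_error(estimated, actual):
    """Median of the per-sample localization errors."""
    return statistics.median(localization_errors(estimated, actual))

# Toy test set of three samples (illustrative values only).
estimated = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
actual = [(0.0, 0.0), (0.0, 0.0), (10.0, 16.0)]
errors = localization_errors(estimated, actual)
median = median_error(estimated, actual)
```

If the locations are stored as latitude and longitude instead, a geodesic distance such as the haversine formula would replace `math.dist`.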

V. PERFORMANCE EVALUATION
In this section, we evaluate the performance of DeepFeat in a suburban environment. First, we mention the data collection method. We then study the effect of different parameters on DeepFeat performance. Finally, we compare DeepFeat with other state-of-the-art systems.

A. DATA COLLECTION
Data is collected through intensive drive tests in a suburban area using the TEMS solution [45], an autonomous solution that uses smartphones to test data and voice services. Data logs are uploaded or saved periodically for further post-analysis. The testing unit is a commercial smartphone, so that the measurements reflect a real customer's experience. In our case, we collected the log files using a Samsung Galaxy S8.

B. EFFECT OF CHANGING DIFFERENT PARAMETERS ON LOCALIZATION ACCURACY
This part shows the effect of changing different parameters on DeepFeat localization accuracy.

1) Effect of Data Augmentation
Figure 6 shows the effect of using the One-to-Many augmenter (Section IV-A1) versus no augmentation. We tried different values for the augmentation parameter K. Increasing K enhances the localization accuracy; however, the training time increases exponentially. After K = 8, the localization accuracy starts to deteriorate. Therefore, we selected K = 4 to obtain high localization accuracy with an adequate training time.

2) Effect of Changing Grid Cell Size
Figure 7 shows the effect of changing the grid cell length (in meters) on the localization accuracy. This box plot shows that the localization accuracy improves as the grid cell length increases, but only up to a grid cell length of 50m. Increasing the grid cell length increases the number of data samples within each cell, which results in a better-trained model and a higher accuracy. However, the localization accuracy degrades beyond a grid cell length of 50 meters because each cell then covers a larger area. Thus, there is a trade-off between grid cell length and localization accuracy, leading us to choose 50 meters as the optimum value for the grid cell length.

3) Effect of Number of Training Epochs
The number of training epochs is the number of passes the system makes over the training data to update the weights. Figure 8 shows the effect of changing the number of training epochs on the localization accuracy at the optimum grid cell length obtained in the previous section. It shows that increasing the number of epochs enhances the localization accuracy. However, the localization accuracy is nearly constant after 2500 epochs. As shown in Figure 8, we stop at 3500 epochs, where the minimum localization accuracy is 7.4m, with a median value of 13.7m and a maximum value of 22.11m.

4) Effect of Dividing Area into Sub-areas (Homogeneity Test)
To test the robustness of DeepFeat model accuracy, we divided the area into four equal sub-areas. Then, we checked the localization accuracy per sub-area. Figure 9 shows the localization accuracy difference between sub-areas. The figure shows that the localization accuracy differs by almost 1 meter between sub-areas, showing the homogeneity of the model across different sub-areas.
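A per-sub-area evaluation of this kind can be sketched as follows. The paper does not state how the four equal sub-areas are formed; the sketch assumes a split into four quadrants of a rectangular area, planar coordinates in meters, and assignment of each test sample by its ground-truth position. All names and values are hypothetical.

```python
import math
import statistics
from collections import defaultdict

def quadrant(x_m, y_m, width_m, height_m):
    """Return 0..3 for the SW, SE, NW, NE quarter of the area (assumed split)."""
    col = 1 if x_m >= width_m / 2 else 0
    row = 1 if y_m >= height_m / 2 else 0
    return 2 * row + col

def median_error_per_subarea(estimated, actual, width_m, height_m):
    """Median localization error (meters) of the test samples in each quadrant."""
    errors = defaultdict(list)
    for est, act in zip(estimated, actual):
        # Samples are grouped by their ground-truth position.
        q = quadrant(act[0], act[1], width_m, height_m)
        errors[q].append(math.dist(est, act))
    return {q: statistics.median(e) for q, e in sorted(errors.items())}

# Toy 100 m x 100 m area with one test sample per quadrant (illustrative only).
actual = [(10.0, 10.0), (90.0, 10.0), (10.0, 90.0), (90.0, 90.0)]
estimated = [(10.0, 13.0), (90.0, 14.0), (15.0, 90.0), (84.0, 90.0)]
per_area = median_error_per_subarea(estimated, actual, 100.0, 100.0)
spread = max(per_area.values()) - min(per_area.values())
```

A small spread between the per-quadrant medians is what the homogeneity test looks for.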

5) Effect of Different Area Size
One of the contributions introduced in this paper is testing the DeepFeat model on a considerably large-scale area of 45km² compared to most of the previous works [27], [28], which evaluate their models on a small-scale area of less than 2km². We examined the difference in our model accuracy between a small-scale region of 6.27km² and a large-scale one of 45km². Figure 10 shows that the small-scale area results in a median accuracy of 13.179m. In comparison, the large-scale area results in a median accuracy of 13.7m, which indicates that DeepFeat is a robust outdoor localization system even in a large-scale area.

6) Feature Selection Evaluation
In this section, we evaluate the DeepFeat performance using different numbers of features. Figure 11 shows the localization accuracy for three models. The first one uses RSRP and PCI, the most commonly used features in the literature [27], [28]. The second one uses all 19 features, and the third is the proposed hybrid model, which uses the 12 features listed in Section III-B. The figure shows that the first model achieves a median localization accuracy of 22.55m, the second one achieves 16.28m, and the proposed hybrid model achieves the best median localization accuracy of 13.7m. We then compare the computation resources (time) needed to train the model using all collected features (19 features) with those needed using the features selected by the hybrid model (12 features). We saved 20.6% of the computation resources (time), using a Lenovo laptop with a Core i7 processor and a GTX GPU for the training process. Figure 12 shows the saving obtained by using our hybrid model features.

C. COMPARISON EVALUATION
This section compares DeepFeat with two other state-of-the-art systems, CellSense [24] and DeepLoc [27]. CellSense is mainly a probabilistic RSSI-based fingerprinting location determination system for 2G phones [24]. DeepLoc is a deep-learning-based outdoor cellular localization model that captures the unique signatures of the different cell towers at different locations without assuming cell towers' independence [27]. Table 6 provides a look across different cell-based localization systems. The table includes the main metrics that differentiate one system from another: area size, number of samples, number of cells, and the achieved accuracy (localization error). For the sake of fair benchmarking, we normalized the number of samples and the number of cells by the area size for all systems under comparison. Hence, we introduce the cell density and the samples per km². From the table, we can make the following observations:
1) Cell Density: A higher number of cells per km² provides more information and a more unique fingerprint for the area. This enhances the localization learning process, as distinguishing between samples becomes much easier, and hence lowers the localization error. DeepFeat shows the best localization accuracy with the lowest cell density among the compared systems.
2) Dataset size: More samples per km² provide better fingerprints in the area and reduce the localization error, but collecting them consumes more time and incurs a higher operational cost. Although DeepFeat and CellSense use a comparable number of samples per km², DeepFeat shows better localization accuracy. We also observed that DeepLoc showed localization accuracy similar to DeepFeat but with 50-100 times more samples, so DeepFeat reduces the cost needed to collect training data.
As illustrated above, DeepFeat provides superior performance in both large-scale and small-scale areas. This performance gain is due to the feature engineering developed within DeepFeat. The various features introduced to the model enrich the fingerprint, which allows DeepFeat to operate smoothly on different scales and to accommodate the relatively lower sample and cell densities, unlike the other systems in the comparison.
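The normalization used for the comparison, that is, dividing the number of cells and the number of samples by the area size, amounts to the following arithmetic. The function name and the input values are hypothetical placeholders and are not the figures reported in Table 6.

```python
def normalize_per_km2(area_km2, n_cells, n_samples):
    """Cell density and sample density per square kilometer."""
    return {
        "cell_density": n_cells / area_km2,
        "samples_per_km2": n_samples / area_km2,
    }

# Placeholder values for illustration only (not taken from Table 6).
example = normalize_per_km2(area_km2=45.0, n_cells=90, n_samples=9000)
```

Comparing systems on these per-km² quantities avoids favoring a system simply because it was evaluated on a smaller area.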

VI. CONCLUSION
We introduced DeepFeat as a multi-feature, deep-learning-based outdoor localization system for 4G networks. DeepFeat utilizes various Telecom features to deliver high localization accuracy. We showed that the feature selection module employed in DeepFeat saves computing resources (time) while enhancing the system accuracy. Using multiple features enhanced the localization accuracy significantly compared to using few features. Also, DeepFeat introduces a new data augmenter that helps to improve the model accuracy and the localization accuracy. We tested DeepFeat over a large-scale area with a median localization accuracy of 13m. Also, it achieved median localization accuracy up to 5m in an outdoor environment when tested over a small-scale area of 1.8km². Therefore, DeepFeat achieves a significant localization accuracy by using many selected features in different environments of different scales.
The current proposed model of DeepFeat depends on data collected from the field, so the features are time-stamped. This time dependency is not considered in the proposed model. Thus, we can extend DeepFeat to consider time as a feature using a Recurrent Neural Network (RNN). Memorizing time from the last sample can be extended to other features, such as the last position and the angle of direction, as implemented in the RATSLAM model [48]. This work considers a uni-modal deep-learning technique where the data is collected using a drive test that depends on a single user only. Each person is tracked independently of the others. However, we may later consider multi-modal deep-learning techniques to improve the prediction performance for multiple users. DeepFeat performance should also be investigated for different devices used for data collection. Moreover, we can study DeepFeat performance with datasets from various service providers. Finally, we can test DeepFeat on newly evolving mobile technologies like 5G.