AI-Powered In-Vehicle Passenger Monitoring Using Low-Cost mm-Wave Radar

We propose a novel algorithm to identify occupied seats in a motor vehicle, i.e., the number of occupants and their positions, using a frequency modulated continuous wave (FMCW) radar. Instead of using a high-resolution radar, which increases cost and device size, and performing complex signal processing with several variables to be tuned for each scenario, we integrate machine learning algorithms with a low-cost radar system. Based on heat maps obtained from the Capon beamformer, we train a machine learning classifier to predict the number of occupants and their positions in a vehicle. We follow two classification methods: multiclass classification and binary classification. We compare three classifiers, support vector machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF), in terms of accuracy and computational complexity on both the testing and training sets. Our proposed system using an SVM classifier achieved an overall accuracy of 97% in classifying the defined scenarios with both the multiclass and the binary classification methods. In addition, to show the effectiveness of our proposed in-vehicle occupancy detection method, we provide the results of a commonly available people counting and tracking method for occupancy detection. This comparison demonstrates the effectiveness, robustness, and accuracy of our proposed method relative to the common method.


I. INTRODUCTION
Human occupancy monitoring plays an important role in several applications. Most existing occupancy detection methods are applied to building automation control, public safety, and intelligent transportation [1]. In motor vehicles, occupant monitoring is mainly used to control passenger-side airbags, safety belts, and warning devices.
The most commonly available seat-occupancy monitoring sensors are mechanical sensors that detect weight, force, acceleration, or pressure in various operating modes and configurations [2], [3]. However, since these methods are predominantly based on weight, mechanical sensors may fail to distinguish between humans and objects placed on the seat and are therefore prone to false alarms. Similar limitations also apply to electric-field or capacitive [4] and inductive [5], [6] sensing methods.

The associate editor coordinating the review of this manuscript and approving it for publication was Mouloud Denai.
Camera vision [7], [8] and infrared (IR) sensors [9] have been widely used for people counting and human detection in various applications, most commonly in indoor environments such as homes and shopping malls [10]. Moreover, near-infrared (NIR) imaging is commonly used in transportation systems for vehicle occupancy detection, seatbelt violation detection, and cell phone usage detection [11]. However, camera-based systems and IR sensors are sensitive to illumination levels and sunlight and suffer under obstructed line-of-sight conditions. Additionally, optical systems incur substantial costs, require intensive image processing, and raise privacy concerns.
Radar-based sensors are one of the most promising solutions to the problems of dead spots and dependency on environmental conditions. Radar sensors are contactless and non-intrusive, and they preserve privacy because no camera is involved. Recently, radar systems have been widely used for human monitoring, including activity recognition, occupancy detection, people counting, and vital signs detection [12]-[14]. In [13], a Doppler radar operating at 2.405 GHz was used to detect both respiratory and heart signals. However, detection approaches based on the Doppler effect produce false alarms in the presence of other moving objects. Moreover, these studies mostly focused on sensing a single human target, whereas the number of human targets, i.e., how many people are present and where they are, is also crucial information, particularly for in-vehicle occupancy detection [15]. Ultra-wideband (UWB) radar sensors have been widely used for people counting [16]-[18]. In [19]-[21], impulse-radio UWB (IR-UWB) radar sensors were used to count a large number of people simultaneously passing through a wide passage or a wide door. These methods were based on received signal patterns associated with the number of people and on the maximum likelihood method, with the radar installed on the roof at a height of 2.3 m to cover the wide area. In [22], an IR-UWB radar was used with a method based on the curvelet transform and the distance bin to count people, with the radar installed on a 1.8 m stand.
Most of the currently available radar-based people counting methods have been applied in large environments, with the sensor installed on a stand more than 1.5 m high and tested with people relatively far from each other. Additionally, these methods only estimated the number of people, while their locations remained unknown. In summary, current radar-based techniques for estimating the number of people in a large area are often inaccurate and require very complex signal processing, leading to high computational costs.
For in-vehicle occupancy detection, automatically providing accurate information on the number of passengers and their occupied seats is still a challenging task. There have been very few studies on in-vehicle occupancy detection using radar-based sensors [23], [24]. Currently, the radar systems used inside vehicles focus on detecting the presence or absence of a living body, so that children and pets left in vehicles can be saved from death in extremely hot or cold weather [25], [26]. These techniques are mostly based on the micro-Doppler effects of the breathing of a single living subject inside a car, without considering the occupied seats or the number of passengers. In our previous work [26], we proposed a novel radar signal processing technique to identify the presence or absence of a living body in a vehicle using a frequency modulated continuous wave (FMCW) radar. That method was based on reflections from breathing cycles, which create correlated and consistent micro-Doppler effects over time. We showed that the radar sensor paired with our signal processing method could clearly distinguish between living subjects and inanimate objects; this is one of the advantages of radar sensors over mechanical sensors for in-vehicle occupancy detection. In [26], there was no need to know the number of occupants or their locations in a parked car, as the key information was the presence or absence of a living body, especially a child or an infant, to prevent death in a parked or stopped car.
Unlike our previous work, where the number of passengers and their occupied seats were not identified, the primary purpose of this paper is to use a single MIMO FMCW radar sensor to count multiple passengers and identify vacant and occupied seats inside a vehicle. MIMO FMCW radars have unique advantages over other radar systems, including simultaneous detection of range, angle, and micro-Doppler shifts, which make them suitable for a variety of applications. The significant advantages of the new generation of these radars are their low cost and low power consumption [27]. Unlike the available radar-based people counting methods that have been used in large environments [21], [22], we use only one sensor, installed on the ceiling of a vehicle (at a low height), while the passengers sit very close to each other in a small area inside the vehicle.
Since passengers sit shoulder to shoulder, radar-based seat occupancy detection requires a higher-resolution radar in addition to complex signal processing methods. For FMCW radars, the range resolution is $\Delta R = c/(2B)$, so the larger the sweep bandwidth $B$, the finer the range resolution [27]. The angular resolution, on the other hand, improves as the number of transmitters and receivers increases. However, implementing a radar system with a large number of transmitters and receivers results in higher system cost and operational complexity, because each receiver antenna has its own receive chain consisting of an analog-to-digital converter (ADC), a mixer, a low-noise amplifier (LNA), and a low-pass filter (LPF). Thus, cost and area are the main drawbacks of a high-resolution radar, and it is desirable to achieve accurate occupancy detection with a low-cost, low-resolution radar.
To overcome the aforementioned issues, we propose a new in-vehicle occupancy detection algorithm that integrates radar signal processing techniques with machine learning. We apply machine learning to data from an inexpensive MIMO FMCW radar to accurately determine the presence of passengers, their number, and their locations within the vehicle. In particular, we demonstrate that the range, the direction of arrival (spatial information), and the reflected power of the received signals are suitable features for machine learning algorithms to identify occupied seats. Using a Capon beamforming algorithm [28], we train and examine several machine learning classifiers on features obtained from range-azimuth maps of the vehicle environment. We train and evaluate the performance and accuracy of Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) [29] classifiers in detecting and counting occupants inside a vehicle. We apply two classification methods: multiclass classification (scenario-wise) and binary classification (i.e., presence or absence of a passenger in a specific seat, occupied vs. vacant). We then compare the two methods and the classifiers in terms of accuracy and computational complexity. The results show that SVM is as accurate in binary classification as in multiclass classification, whereas KNN and RF perform poorly in multiclass classification but are highly accurate in binary classification. To show the usefulness and effectiveness of machine learning for occupancy detection, we compare the machine learning results with a conventional people tracking method. Unlike our autonomous AI-powered system, the traditional method implements complex algorithms with multiple variables to be tuned.
These variables depend on the environment, people's movements, and their distances from each other, which overall leads to inaccurate results. Additionally, with such a low-resolution radar, the conventional method cannot cluster passengers and distinguish them from each other in the small space inside a vehicle, where the signals reflected from multiple people are highly correlated.
The remainder of this paper is organized as follows. In section II, we describe the system design and the proposed algorithm for in-vehicle occupancy detection. In section III, we discuss the experimental results. Finally, the paper is concluded in section IV.

II. IN-VEHICLE OCCUPANCY DETECTION SYSTEM
In this section, the basics of a MIMO FMCW radar for extracting range and azimuth information of the environment are provided. Our proposed method for in-vehicle occupancy detection and passenger counting is then described, along with the reason for applying machine learning. Finally, the details of the conventional people counting and group tracking method, which we use to evaluate and compare our proposed method, are provided.

A. MIMO FMCW RADAR CONCEPT
In all radar systems, an electromagnetic signal is emitted by a transmitter, and the signal is reflected off targets of interest at a certain range from the transmitter. Then, a receiver scans for these echoes of the transmitted signal and identifies characteristics of the target, such as position and speed.
Conventional pulse radars periodically emit a monochromatic pulse. The distance of the target from the radar is determined from the time delay between the transmission of the signal and the reception of the echoed pulse. These systems, however, are unsuitable for short-range measurements, as a device with an extremely fast clock would be required to resolve the time delays corresponding to distances on the order of 1 m.
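To put the timing requirement in numbers, the round-trip delay of a pulse echo is $\tau = 2R/c$. The short Python sketch below (the helper name is ours) evaluates it for a few cabin-scale ranges: a target at 1 m returns its echo after roughly 6.7 ns, and two targets 4 cm apart in range differ by only about 0.27 ns, which illustrates why direct pulse timing is impractical at these distances.

```python
C = 299_792_458.0  # speed of light in free space (m/s)


def round_trip_delay(range_m: float) -> float:
    """Echo delay (s) of a pulse reflected by a target at range_m metres."""
    return 2.0 * range_m / C


for r in (0.5, 1.0, 2.0):
    print(f"R = {r:.1f} m -> delay = {round_trip_delay(r) * 1e9:.2f} ns")

# Delay difference between two targets 4 cm apart in range.
step_ns = (round_trip_delay(1.04) - round_trip_delay(1.0)) * 1e9
print(f"4 cm range step -> {step_ns:.3f} ns")
```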
As one of the most promising radar techniques, we chose an FMCW radar for in-vehicle occupancy detection; it avoids the above problem by transmitting sinusoidal signals whose frequency increases linearly over time, known as chirps [29]. Fig. 1 (a) illustrates a single chirp of the FMCW radar and the associated timing parameters. Fig. 1 (b) shows the frame structure, i.e., a set of chirps used as the observation window, with consecutive frames separated by the inter-frame time.
The chirp is reflected off objects, and the reflected chirp is received at the receiver antenna(s). As shown in Fig. 2, demodulating the reflected signal to baseband (the IF signal) yields a received signal that is a linear combination of sinusoids at different frequencies, where each frequency corresponds to a target at a specific range from the radar [29].
As shown in Fig. 2, multiple objects create multiple IF tones, which result in multiple peaks in the IF spectrum. Therefore, the range information of the targets can be obtained by applying a range-FFT to the received signals.
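The range-FFT step can be sketched in Python with NumPy. The sketch synthesizes a noiseless complex IF signal for two point targets using $f_{IF} = 2SR/c$ with the 60 MHz/µs slope used later in this paper; the 1.2 MHz sampling rate, the 64-sample chirp, and the Hann window are assumptions made for illustration, not values taken from the paper. Each target appears as a peak at FFT bin $k$, which maps back to range $R = k\,(f_s/N)\,c/(2S)$.

```python
import numpy as np

C = 3e8            # speed of light (m/s), rounded
SLOPE = 60e12      # chirp slope S = 60 MHz/us, expressed in Hz/s
FS = 1.2e6         # ADC sampling rate (Hz), hypothetical value for this sketch
N_SAMPLES = 64     # ADC samples per chirp


def if_signal(ranges_m, fs=FS, n=N_SAMPLES, slope=SLOPE):
    """Noiseless complex IF signal: one tone per point target at f_IF = 2*S*R/c."""
    t = np.arange(n) / fs
    return sum(np.exp(2j * np.pi * (2 * slope * r / C) * t) for r in ranges_m)


def range_fft_peaks(x, n_targets, fs=FS, slope=SLOPE):
    """Apply a Hann window and FFT, then return the ranges (m) of the strongest peaks."""
    spectrum = np.abs(np.fft.fft(x * np.hanning(len(x))))
    k = np.arange(1, len(x) - 1)
    is_peak = (spectrum[k] > spectrum[k - 1]) & (spectrum[k] >= spectrum[k + 1])
    peaks = k[is_peak]
    strongest = peaks[np.argsort(spectrum[peaks])[::-1][:n_targets]]
    bin_hz = fs / len(x)                       # frequency spacing of the FFT bins
    return np.sort(strongest * bin_hz * C / (2 * slope))


print(range_fft_peaks(if_signal([0.95, 1.6]), n_targets=2))
```

With these assumed settings, one FFT bin spans about 4.7 cm of range, so each estimate lands on the bin nearest the true range.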
In an FMCW radar with only up-chirps (i.e., positive-slope chirps), for a transmitted signal $s(t)$, the signal $x_l(t_f, t_s)$ received at the $l$th antenna element after reflection from a target can be modeled as (assuming a single point target)

$$x_l(t_f, t_s) = b_l e^{j\alpha_l} \exp\left(j\left[2\pi f_b t_f + \frac{4\pi v}{\lambda_{\max}} t_s + \tau_l + \psi_l(t_f, t_s)\right]\right) + e_l(t_f, t_s) \tag{1}$$

where $t_f$ and $t_s$ are the fast- and slow-time indexes, $b_l$ and $\alpha_l$ are the virtual channel's mismatched magnitude and phase, and $f_b$ is the beat frequency. Moreover, $v$, $\lambda_{\max}$, $\tau_l$, $\psi_l(t_f, t_s)$, and $e_l(t_f, t_s)$ in (1) are the target's radial velocity, the wavelength corresponding to the start frequency of the FMCW ramp, the phase shift at the $l$th receiver due to the angle of arrival (AoA), the residual phase noise, and the additive noise, respectively. The frequency of the IF signal produced by an object located at range $R$ in front of the radar is

$$f_{IF} = \frac{2SR}{c} \tag{2}$$

where $S$ is the rate of increase in the frequency of the chirp (the slope) and $c$ is the speed of light in free space. Multiple objects in front of the radar produce multiple tones at different frequencies in the spectrum of the IF signal, as shown in Fig. 2.
We can consider $t_s$ as the time from the start of the $n$th sweep and define $t$ as

$$t = nT + t_s \tag{3}$$

where $T$ is the sweep repetition period, or sweep time.
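To locate an object in 2D space, its angle is required along with its range. To calculate the AoA of an object, at least two receivers separated by a distance $d$ are required. As shown in Fig. 3, the signal reflected from an object at an angle $\theta$ with respect to the radar travels an additional path of $d\sin(\theta)$ to reach each successive receiver. This delay causes a phase shift of $\omega = (2\pi/\lambda)\, d\sin(\theta)$ between the signals received at adjacent receivers. Several algorithms have been proposed and thoroughly analyzed for estimating the AoA of a target. The Capon approach is one of the well-known algorithms for determining the AoA of incident signals with an array antenna structure [30]. It estimates the AoA by reordering the terms in (1) and stacking the received signals from all $L$ receiver channels into a column vector, which can be expressed as

$$\mathbf{x}(t_f, t_s) = \boldsymbol{\Phi}\,\mathbf{a}(\theta)\,y(v, f_b, t_f, t_s) + \mathbf{e}(t_f, t_s) \tag{4}$$

where $\boldsymbol{\Phi}$, $\mathbf{a}$, and $y$ are defined as follows (taking the residual phase noise to be common to all channels, $\psi_l = \psi$):

$$\boldsymbol{\Phi} = \operatorname{diag}\left(b_1 e^{j\alpha_1}, \ldots, b_L e^{j\alpha_L}\right), \qquad \mathbf{a}(\theta) = \left[e^{j\tau_1}, \ldots, e^{j\tau_L}\right]^T, \quad \tau_l = (l-1)\,\omega$$

$$y(v, f_b, t_f, t_s) = \exp\left(j\left[2\pi f_b t_f + \frac{4\pi v}{\lambda_{\max}} t_s + \psi(t_f, t_s)\right]\right)$$

Here, $\theta$ is the AoA of the target, $\boldsymbol{\Phi}$ depends on the channel gain/phase mismatches, and $\mathbf{a}$, which is called the steering vector, depends on the AoA. If there is more than one target at different ranges, i.e., with different $f_b$, the received vector is the sum of the vectors received from each target:

$$\mathbf{x}(t_f, t_s) = \sum_{k=1}^{K} \boldsymbol{\Phi}\,\mathbf{a}(\theta_k)\,y_k + \mathbf{e}(t_f, t_s) = \boldsymbol{\Phi}\,\mathbf{A}(\boldsymbol{\theta})\,\mathbf{Y}\,\mathbf{1}_K + \mathbf{e}(t_f, t_s) \tag{5}$$

where $\mathbf{A}(\boldsymbol{\theta}) = [\mathbf{a}(\theta_1), \ldots, \mathbf{a}(\theta_K)]$ is a matrix whose $K$ columns are the steering vectors of the $K$ targets, $\mathbf{Y} = \operatorname{diag}(y_1, \ldots, y_K)$ is a diagonal matrix whose elements are $y(v, f_b, t_f, t_s)$ evaluated for each target, and $\mathbf{1}_K$ is the $K \times 1$ all-ones vector. In (5), $\mathbf{A}(\boldsymbol{\theta})$, $v$, and $f_b$ are the unknown parameters of the targets; however, only the elements of $\mathbf{A}$ are functions of the receiver channel index. Hence, $\mathbf{Y}$ affects the covariance of $\mathbf{x}$ only through the signal powers. When the channel mismatches $\boldsymbol{\Phi}$ have been calibrated out and the additive noise is uncorrelated with $\mathbf{Y}$, the covariance matrix of $\mathbf{x}$ can be computed as

$$\mathbf{R} = E\left\{\mathbf{x}\mathbf{x}^H\right\} = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{P}_S\,\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{R}_n \tag{6}$$

where $\mathbf{P}_S$ is the diagonal matrix of signal powers and $\mathbf{R}_n$ is the noise covariance, which is positive definite under the assumption that the noise at each receiver is independent of the others.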
Moreover, the first term of the covariance matrix $\mathbf{R}$ is positive semidefinite, since the signal power matrix $\mathbf{P}_S$ is diagonal with nonnegative entries ($\mathbf{A}(\boldsymbol{\theta})$ is a Vandermonde matrix [31], whose columns are linearly independent for distinct AoAs). Because $\mathbf{R}_n$ is positive definite, $\mathbf{R}$ is also positive definite and therefore invertible. The Capon output spectrum is computed as

$$P(\hat{\theta}) = \frac{1}{\mathbf{a}^H(\hat{\theta})\,\mathbf{R}^{-1}\,\mathbf{a}(\hat{\theta})} \tag{7}$$

where $\hat{\theta}$ is a test AoA.

VOLUME 10, 2022

With the azimuth information of the target obtained from the Capon beamformer and the range calculated from the range-FFT, a range-azimuth heat map of the subject is obtained. The range-azimuth heat map represents the density of reflected signals in the environment (the car cabin in our application). In fact, if a seat is occupied by a person, that location has more reflections than the non-occupied seats.
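The Capon spectrum can be sketched in NumPy as follows. The sketch assumes an eight-element uniform linear array with half-wavelength spacing and calibrated channels (no gain/phase mismatch), builds a model covariance $\mathbf{A}\mathbf{P}_S\mathbf{A}^H + \sigma^2\mathbf{I}$ for two targets, and evaluates $1/(\mathbf{a}^H\mathbf{R}^{-1}\mathbf{a})$ on a grid of test angles. The hypothetical helper `range_azimuth_map` shows how one heat-map row per range bin could be formed from a sample covariance over chirps; its array layout and the small diagonal loading added for numerical stability are our assumptions, not details from the paper.

```python
import numpy as np

N_VIRTUAL = 8        # virtual channels (2 Tx x 4 Rx)
D_OVER_LAMBDA = 0.5  # element spacing d/lambda, assumed half-wavelength ULA


def steering_vector(theta_deg, n=N_VIRTUAL, d=D_OVER_LAMBDA):
    """a(theta): element l has phase l*omega, with omega = 2*pi*(d/lambda)*sin(theta)."""
    omega = 2 * np.pi * d * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * omega * np.arange(n))


def capon_spectrum(R, grid_deg):
    """Capon output P(theta) = 1 / (a^H R^{-1} a) on a grid of test angles."""
    R_inv = np.linalg.inv(R)
    n = R.shape[0]
    values = []
    for th in grid_deg:
        a = steering_vector(th, n)
        values.append(1.0 / np.real(a.conj() @ R_inv @ a))
    return np.array(values)


def range_azimuth_map(cube, grid_deg, loading=1e-3):
    """Hypothetical helper: cube is (n_chirps, n_virtual, n_range) after the range-FFT.

    Each range bin uses the sample covariance over chirps; the diagonal loading
    (a fraction of the average channel power) is an added assumption for stability.
    """
    n_chirps, n_virt, n_range = cube.shape
    heat = np.empty((n_range, len(grid_deg)))
    for r in range(n_range):
        X = cube[:, :, r].T                        # (n_virtual, n_chirps) snapshots
        R = X @ X.conj().T / n_chirps              # sample covariance matrix
        R = R + loading * np.real(np.trace(R)) / n_virt * np.eye(n_virt)
        heat[r] = capon_spectrum(R, grid_deg)
    return heat


# Model covariance A P_S A^H + sigma^2 I for two targets at -30 and +20 degrees.
A = np.column_stack([steering_vector(t) for t in (-30.0, 20.0)])
P_S = np.diag([1.0, 1.0])
R = A @ P_S @ A.conj().T + 1e-3 * np.eye(N_VIRTUAL)

grid = np.arange(-90.0, 90.5, 0.5)
spectrum = capon_spectrum(R, grid)
peaks = [k for k in range(1, len(grid) - 1)
         if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]]
top_two = sorted(sorted(peaks, key=lambda k: spectrum[k])[-2:])
print(grid[top_two])
```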
In traditional methods, the point cloud of a passenger is obtained by applying CFAR detection followed by clustering. However, when more than one occupant sits very close to another in the car, the signals reflected from the passengers are not easily distinguishable. Isolating, discriminating, and clustering two subjects sitting with almost no gap between them requires not only a high-resolution radar but also a very complex signal processing method. Part of the reason is that the reflections from a subject resemble a sinc function, as shown in Fig. 4; the sidelobes of the signals reflected from one passenger leak into the reflections from the adjacent passenger. Hence, to distinguish between adjacent occupants and count them, we would need to isolate the signals coming from each passenger, remove the leakage, and then apply the conventional methods to identify the occupied seats inside the car. However, this process requires information about the passengers, such as height and width, which varies from one passenger to another, and would therefore require tuning several parameters and applying more sophisticated methods.
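The sidelobe leakage and merging described above can be illustrated with a conventional (Bartlett) beamformer, used here instead of Capon because its sinc-like array response makes the effect easy to see. The sketch assumes ideal, equal-power, uncorrelated point targets, eight virtual elements at half-wavelength spacing, and no noise; real passengers are extended targets, so the situation in the cabin is worse. Under these assumptions, two targets 8° apart produce a single merged peak, while targets 30° apart are resolved.

```python
import numpy as np


def steering(theta_deg, n=8, d=0.5):
    """ULA steering vector with n elements spaced d wavelengths apart."""
    return np.exp(2j * np.pi * d * np.arange(n) * np.sin(np.deg2rad(theta_deg)))


def bartlett_spectrum(targets_deg, grid_deg, n=8):
    """Conventional beamformer output a^H R a for equal-power, uncorrelated point targets."""
    A = np.column_stack([steering(t, n) for t in targets_deg])
    R = A @ A.conj().T                    # noiseless covariance with unit powers
    return np.array([np.real(steering(g, n).conj() @ R @ steering(g, n)) for g in grid_deg])


def count_peaks(spectrum, rel_floor=0.5):
    """Count local maxima that exceed rel_floor times the global maximum."""
    s = spectrum / spectrum.max()
    return sum(1 for k in range(1, len(s) - 1)
               if s[k] > s[k - 1] and s[k] >= s[k + 1] and s[k] > rel_floor)


grid = np.arange(-60.0, 60.25, 0.25)
for sep in (8.0, 30.0):
    spec = bartlett_spectrum([-sep / 2, sep / 2], grid)
    print(f"targets {sep:4.1f} deg apart -> {count_peaks(spec)} peak(s)")
```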
On the other hand, machine learning algorithms can be trained and then used to predict new scenarios based on what has been learned. In this paper, to count passengers and identify their occupied seats in a car where people sit very close to each other, we integrate machine learning with a heat map obtained from a low-resolution radar to tackle all issues mentioned above.

B. PROPOSED ALGORITHMS
Fig. 4 illustrates our proposed signal processing chain for in-vehicle occupancy detection. First, in a time-division multiplexed (TDM) MIMO FMCW radar, a sequence of chirps is sent in a frame from different transmit antennas. At the receiver, the reflected signal is collected and assigned to a virtual channel, such that each channel contains the data from a unique transmitter-receiver pair. To obtain range information, a range-FFT is applied to the received chirp samples. In a MIMO radar, the antenna elements are not entirely isolated; therefore, it is essential to remove the mutual coupling between elements (mostly between transmitter and receiver elements). After the virtual channels are created and the range-FFT is applied, the mutual coupling between transmitters and receivers is removed. Then, a stationary clutter removal algorithm is applied to each channel to remove all stationary targets.
To do so, the average value of the signal is computed and subtracted from the aggregated signal at each frame; since stationary scatterers produce a constant return, removing the average is equivalent to eliminating them [26].
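A minimal NumPy sketch of this mean-subtraction step follows. It assumes the range-FFT output is arranged as (frames, virtual channels, range bins) and averages over the frame axis, which is our reading of "the average value of the signal"; the synthetic static reflector and the breathing-like phase modulation (including the 0.05 s frame period) are illustrative assumptions. The static range bin is removed, while the moving target survives.

```python
import numpy as np


def remove_static_clutter(frames):
    """Subtract the slow-time mean from every frame.

    frames: complex array of shape (n_frames, n_virtual, n_range_bins) holding
    the range-FFT output of each virtual channel. Returns the same shape.
    """
    return frames - frames.mean(axis=0, keepdims=True)


rng = np.random.default_rng(1)
n_frames, n_virtual, n_bins = 200, 8, 16
frames = np.zeros((n_frames, n_virtual, n_bins), dtype=complex)

# Static reflector (e.g., an empty seat) in range bin 5: constant over frames.
frames[:, :, 5] = 3.0 * np.exp(1j * rng.uniform(0, 2 * np.pi, n_virtual))

# Breathing-like target in range bin 9: slow phase modulation (assumed 0.05 s frames).
t = np.arange(n_frames) * 0.05
frames[:, :, 9] = np.exp(1j * 2.0 * np.sin(2 * np.pi * 0.3 * t))[:, None]

clean = remove_static_clutter(frames)
print("static bin residual:", np.abs(clean[:, :, 5]).max())
print("moving bin mean magnitude:", np.abs(clean[:, :, 9]).mean())
```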
As shown in Fig. 4, after applying the clutter removal algorithm, signals reflected from all stationary targets, such as car seats and other objects inside the car, are removed. Hence, the only remaining signals are the ones coming from the passengers.
Since our primary purpose is to identify the seats occupied by passengers and to count the passengers, we need azimuth information in addition to the range of the subjects (spatial information). To this end, Capon-beamforming-based radar processing is performed to determine the AoA spectrum for each range bin. We use the range-azimuth heat map (7) to construct the point cloud information of passengers in the car environment. Unlike the conventional algorithm shown in Fig. 5, which applies constant false-alarm rate (CFAR) detection to extract detected points and combines them with micro-Doppler patterns to cluster those points, we use machine learning algorithms. In our proposed algorithm, the range-azimuth heat maps are the inputs on which the machine learning classifier is trained, after which it can predict new situations. More details of the machine learning algorithms are provided in Section III.D. Note that the conventional people counting method has several variables to be tuned and requires a very high-resolution radar to detect passengers in a small area; more details of the conventional method can be found in Section II.C. In contrast, our proposed method can accurately and autonomously identify the occupied seats and count the passengers in a vehicle using a low-resolution radar paired with machine learning.
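To make the hand-off from heat map to classifier concrete, the sketch below normalizes each range-azimuth heat map to its own maximum, flattens it into a feature vector, and classifies it with a from-scratch k-nearest-neighbors vote. The toy heat maps (a single Gaussian blob each), the 32 × 64 map size, the toy class labels, and k = 3 are assumptions for illustration only; they are not the paper's data or its classifier implementation.

```python
import numpy as np


def heatmap_features(heatmap):
    """Normalize a range-azimuth heat map to its own maximum and flatten it."""
    hm = np.asarray(heatmap, dtype=float)
    return (hm / hm.max()).ravel()


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    dists = np.sum((train_x - x) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.bincount(train_y[nearest]).argmax())


# Toy data only: one Gaussian "passenger" blob per map at a seat-dependent angle bin.
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:32, 0:64]


def toy_heatmap(angle_bin):
    """Synthetic 32 x 64 map: blob at range bin 16 and the given angle bin, plus noise."""
    blob = np.exp(-((rows - 16) ** 2 + (cols - angle_bin) ** 2) / 20.0)
    return blob + 0.05 * rng.random((32, 64))


train_x = np.array([heatmap_features(toy_heatmap(c)) for c in [20] * 10 + [44] * 10])
train_y = np.array([0] * 10 + [1] * 10)   # toy labels: 0 = left seat, 1 = right seat
query = heatmap_features(toy_heatmap(44))
print(knn_predict(train_x, train_y, query))
```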

C. GROUP TRACKING AND LOCALIZATION
To show the effectiveness of our proposed in-vehicle occupancy detection algorithm, we first implement a conventional people counting method [32]. As shown in Fig. 5, for group tracking and positioning, Capon beamforming is applied to obtain a range-azimuth heat map. Moreover, a two-pass CFAR detection algorithm (range-CFAR and azimuth-CFAR) is used to find detected points. Using the Capon beam weights and a Doppler FFT, point cloud information is extracted for each detected point. Then, the point cloud information is passed to a group tracker algorithm, and the localization algorithm is applied based on the point cloud data. This algorithm allocates detected points to a set and determines whether the points are consistent enough to be classified as a person. Each frame is classified in one of two ways:
1. Hit event: there is a sufficient number of detection points in a set to be classified as a person.
2. Miss event: there is not a sufficient number of detection points in a set to be classified as a person.
[Figure: Proposed in-vehicle occupancy detection algorithm. The input chirps are converted to virtual channels. Then, this data is put through the clutter removal algorithm, and the filtered data is used to produce range-azimuth maps. Finally, these range-azimuth maps are fed into a machine learning algorithm to predict the number of passengers and their occupied seats.]
These hit and miss events enable different detection states in the algorithm. The three states that can be enabled are:
1. Free state: no detection points are allocated to a set.
2. Detect state: detection points are allocated to a set, but the algorithm has not yet classified the set as a target.
3. Active state: the detection points in an allocated set have been classified as a target.
While a set is in the detect state, enough consecutive miss events return it to the free state. Once a certain threshold of hit events has been reached, the set transitions from the detect state to the active state, indicating the presence of a target. To obtain the best results with this algorithm, the parameters defined in Table 1 must be tuned, and they depend on people's movements and their distances from each other. These group tracking parameters were obtained using some of the degrees of freedom of the tracking algorithm as well as through experimentation. In contrast, our proposed in-vehicle occupancy detection algorithm is trained and tested without such a complicated process of tuning parameters.
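The free/detect/active logic can be sketched as a small state machine. This is a simplified illustration, not the tracker of [32]: the thresholds `hits_to_activate` and `misses_to_free` are hypothetical stand-ins for the Table 1 parameters, hits and misses are counted consecutively, and (as an added assumption) a run of misses returns a set to the free state from either the detect or the active state.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    FREE = "free"      # no detection points allocated to the set
    DETECT = "detect"  # points allocated, not yet classified as a target
    ACTIVE = "active"  # set classified as a target (person present)


@dataclass
class GroupTrack:
    hits_to_activate: int = 3   # hypothetical threshold, not from Table 1
    misses_to_free: int = 3     # hypothetical threshold, not from Table 1
    state: State = State.FREE
    hits: int = 0               # consecutive hit events
    misses: int = 0             # consecutive miss events

    def update(self, hit: bool) -> State:
        """Advance one frame given a hit event (True) or a miss event (False)."""
        if hit:
            self.hits += 1
            self.misses = 0
            if self.state is State.FREE:
                self.state = State.DETECT
            if self.state is State.DETECT and self.hits >= self.hits_to_activate:
                self.state = State.ACTIVE
        else:
            self.misses += 1
            self.hits = 0
            if self.state is not State.FREE and self.misses >= self.misses_to_free:
                self.state = State.FREE
        return self.state


track = GroupTrack()
events = [True, True, True, False, False, False]
print([track.update(e).value for e in events])
# prints ['detect', 'detect', 'active', 'active', 'active', 'free']
```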

III. EXPERIMENTAL RESULTS
In this section, we describe the equipment used for data collection and the vehicle used for our tests.
Moreover, experimental results using the common method and our proposed methods are provided and discussed.

A. OUR EQUIPMENT
For our experiments, we used a Texas Instruments (TI) mmWave FMCW chip (AWR1443), which operates at 76-81 GHz [33]. Fig. 6 (a) shows the radar system used for our experiments along with a data capture adapter (DCA1000). The ADC data (the chirp samples) were captured using the DCA1000 EVM board and transferred to a PC over Ethernet [34].
As shown in Fig. 6 (b), the AWR1443 has four receivers (Rx) and three transmitters (Tx), enabling estimation of both azimuth and elevation angles. However, to create 2D heat maps, we used two of the transmitters and all four receivers in a TDM MIMO configuration. Synthesizing an array of eight virtual Rx antennas in this way improves the angular resolution by a factor of two compared with a single-Tx configuration [23]. We used MATLAB to process the raw radar data into range-azimuth heat maps, and Python to train and test the machine learning algorithms. All processing in this paper was performed on a computer running 64-bit Windows 10 with an Intel(R) Xeon(R) CPU E5-1603 v4 @ 2.4 GHz processor and 128 GB of RAM. Since the AWR1443 has a built-in digital signal processor (DSP), the radar signal processing could also be performed on the radar's built-in DSP in the future. Using the built-in DSP, the processed data could be delivered to a trained machine learning algorithm to predict a new scenario without the DCA1000 board or an extra computer.
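The factor-of-two gain comes from the virtual array: in TDM MIMO, each Tx-Rx pair behaves like a single receiver located at the sum of the two element positions. The sketch assumes the usual evaluation-board geometry, with receivers spaced λ/2 apart and the two azimuth transmitters (Tx1 and Tx3) spaced 2λ apart; these positions are assumptions, not values given in this paper. Under that geometry, the eight virtual elements form a contiguous uniform array with twice as many elements as the four physical receivers.

```python
# Hypothetical azimuth positions in units of lambda/2 (assumed board geometry).
RX_POSITIONS = [0, 1, 2, 3]   # four receivers spaced lambda/2 apart
TX_POSITIONS = [0, 4]         # Tx1 and Tx3 spaced 2*lambda (= 4 x lambda/2) apart


def virtual_array(tx_positions, rx_positions):
    """TDM-MIMO virtual element positions: each Tx-Rx pair acts like one element at tx + rx."""
    return sorted(tx + rx for tx in tx_positions for rx in rx_positions)


virtual = virtual_array(TX_POSITIONS, RX_POSITIONS)
print(virtual)                             # [0, 1, 2, 3, 4, 5, 6, 7]
print(len(virtual) / len(RX_POSITIONS))    # 2.0 (element-count ratio vs. a single Tx)
```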
Our measurements were conducted in a Toyota Sienna with three rows and seven seats. To find the best radar installation position for maximum coverage inside the vehicle, radar antenna pattern simulations were performed in our previous work [26]. As shown in Fig. 7 (a), the radar was placed 107 cm in front of the base of the rear-view mirror to cover the second and third rows, mounted at the top of the cabin with its antennas facing the passenger seats. Note that our purpose in this paper is to monitor passengers in the second and third rows. We designed a mount to hold the radar system that can rotate in ten-degree increments, so that it can be set to the best angle obtained from our simulations in [26]; a side view of the mount is shown in Fig. 7 (b).
Moreover, Fig. 8 (a) shows the inside of the vehicle with seven seats, and Fig. 8 (b) depicts the distance from the radar mount to each seat in the vehicle along with the seat nomenclature.
To achieve the desired radar performance, a visibility range of approximately 2 m and a range resolution of about 4 cm, the chirps were configured with the parameters outlined in Table 2. The chirp duration was 60 µs with an idle time of 250 µs, the frequency slope was 60 MHz/µs, and the sweep bandwidth was 3.6 GHz. With the parameters outlined in Table 2, the maximum range was set to 2.74 m. Moreover, creating eight virtual channels with Tx1, Tx3, and all four receivers provides eight spatial samples, resulting in an azimuth resolution of 2/8 rad, or about 14°. This means that at 95 cm from the radar (seats #3 and #4), two targets can be resolved only if they are separated by more than about 23 cm (95 × sin(14°)); otherwise, the targets are spatially correlated. The limitation of low angular resolution is more obvious in the third row: to be distinguished, two targets there should be around 40 cm (160 × sin(14°)) apart. However, passengers in the third row sit shoulder to shoulder with no gap between them; thus, a radar with a higher angular resolution would be needed to distinguish all three passengers in the third row. To overcome the limitation of the low-resolution radar, we applied machine learning algorithms to the data.
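The numbers in this paragraph can be reproduced with a few lines of Python. The sketch uses $\Delta R = c/(2B)$ for the range resolution, $T_c = B/S$ for the ramp duration (interpreting the slope as 60 MHz/µs), the 2/N rad rule of thumb for an N-element half-wavelength array at boresight, and $R\sin(14^\circ)$ for the minimum cross-range separation; the constant and function names are ours.

```python
import math

C = 3e8              # speed of light (m/s), rounded
BANDWIDTH = 3.6e9    # sweep bandwidth B (Hz)
SLOPE = 60e12        # frequency slope S = 60 MHz/us, in Hz/s
N_VIRTUAL = 8        # virtual channels: Tx1 and Tx3 with four receivers

range_res_m = C / (2 * BANDWIDTH)              # Delta_R = c / (2B)
ramp_time_s = BANDWIDTH / SLOPE                # T_c = B / S
angle_res_deg = math.degrees(2 / N_VIRTUAL)    # about 2/N rad at boresight


def min_separation_cm(range_cm, angle_deg=14.0):
    """Cross-range spacing needed to resolve two targets at range_cm."""
    return range_cm * math.sin(math.radians(angle_deg))


print(f"range resolution  : {range_res_m * 100:.2f} cm")       # 4.17 cm
print(f"ramp time         : {ramp_time_s * 1e6:.0f} us")       # 60 us
print(f"angular resolution: {angle_res_deg:.1f} deg")          # 14.3 deg
print(f"second row, 95 cm : {min_separation_cm(95):.1f} cm")   # 23.0 cm
print(f"third row, 160 cm : {min_separation_cm(160):.1f} cm")  # 38.7 cm
```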

B. RANGE-AZIMUTH HEAT MAP
In our defined scenarios (detecting passengers in the second and third rows), there are 32 possible situations (2^5) for the five seats. Table 3 defines all 32 possible situations in our measurement setups, as well as the number of samples recorded for each case. Fig. 9 shows the range-azimuth heat maps (7) obtained for all 32 possible situations. The x-axis represents the azimuth angle in degrees, and the y-axis is the range from the radar to the targets. Since, from Table 2, the number of ADC samples per chirp is 64 and the number of chirps per frame is 256, the dimension of the range-azimuth heat map is 32 × 256. As shown in Fig. 8 (b), seats #3 and #4 are located at the same distance from the radar but at different angles, as are seats #5 and #7.
From the results in Fig. 9, for "Case 00001", "Case 00010", "Case 00100", "Case 01000", and "Case 10000", when only one passenger was in the car, the passenger and their position can clearly be detected. However, when there is more than one passenger, detection depends directly on their distances from each other because of the radar's angular resolution and the sidelobes of the signals reflected from the passengers. For example, for "Case 00110" (seats #4 and #5), we can detect the presence of two persons in the car. For most cases with more than one passenger, however, the heat maps do not show a clear visual separation between passengers. This problem is more severe for the back row, where the passengers sit shoulder to shoulder. Although the low angular resolution was one of the main drawbacks, we found another limitation of the heat map creation algorithm, which is also a limitation of the conventional method: with multiple occupants in the car, if one passenger moves more than the others, that passenger's movement dominates the heat map and conceals the signals coming from the other passengers.
For example, in "Case 00011", the passenger in seat #3 moved his hands a lot, while the passenger in seat #4 was nearly stationary. Although the two passengers were far enough apart to be detected as two different targets, the movements of the passenger in seat #3 dominated the signals. Moreover, since the human body is not a point target (it is extended and non-rigid), the signals reflected from passengers resemble a sinc function that spreads over range and azimuth.
Hence, in the heat map, there is no distinguishable line to separate reflections from one passenger to another one. In addition to a high-resolution radar, we need to perform complicated signal processing with various parameters to be tuned to discriminate passengers sitting in the third row.

C. GROUP TRACKING AND LOCALIZATION
To show the performance of the conventional people counting and multiple-target tracking method [32], we performed a test with five people sitting in the car for over 6 minutes. We applied the algorithm described in Section II.C with the default values listed in Table 1; the people counting result for the five people over 6 minutes is shown in Fig. 11, and an example in which the algorithm detects the five people as three is shown in Fig. 12. The results show that the algorithm mostly regarded the three people in the back row as one person, because the low angular resolution cannot distinguish them from each other. Additionally, a person with larger motion masked the presence of other people and thus decreased the number of detected people. To determine whether all five people could be detected with the conventional method using other parameter values, we changed the parameters defined in Table 1 in a structured manner: one parameter was changed at a time, while the remaining parameters were held constant as a control. Each of the five parameters was altered individually, and two-minute recordings were taken to check the accuracy of each change. The values of the changed parameters and the resulting accuracies are tabulated in Table 4. As shown, with this low angular resolution radar and the conventional people counting algorithm, we were not able to accurately identify which seats were occupied. One solution would be to improve the angular resolution by increasing the number of transmitters and receivers and implementing sophisticated signal processing.
However, a high-resolution radar comes with a higher cost and a larger device area. Moreover, even with a high-resolution radar, we would need to develop accurate and complicated signal processing to map the heat-map patterns (Fig. 9) to passengers and their occupied seats, which is not mathematically tractable. For this reason, we adopt machine learning as an effective tool for our in-vehicle occupancy detection system.
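The one-parameter-at-a-time sweep used to build Table 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the parameter names (`param_a`, `param_b`) and the evaluator are hypothetical placeholders for the Table 1 parameters and for running the people-counting algorithm on a two-minute recording.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def one_at_a_time_sweep(
    defaults: Dict[str, float],
    candidates: Dict[str, Iterable[float]],
    evaluate: Callable[[Dict[str, float]], float],
) -> List[Tuple[str, float, float]]:
    """Vary one parameter at a time while all others stay at their defaults.

    Returns (parameter name, tested value, accuracy) rows, i.e. the layout
    of a table such as Table 4.
    """
    rows = []
    for name, values in candidates.items():
        for value in values:
            config = dict(defaults)  # fresh copy: every other parameter at its default
            config[name] = value     # change exactly one parameter
            rows.append((name, value, evaluate(config)))
    return rows


# Hypothetical usage: a real evaluator would run the people-counting algorithm
# with `config` on a two-minute recording and return its accuracy.
tested_configs = []


def dummy_evaluate(config):
    tested_configs.append(config)
    return 0.6  # placeholder value, not a measured result


defaults = {"param_a": 1.0, "param_b": 2.0}
rows = one_at_a_time_sweep(defaults, {"param_a": [0.5, 1.5], "param_b": [3.0]}, dummy_evaluate)
for row in rows:
    print(row)
```

Holding all other parameters at their defaults keeps the effect of each change isolated, at the price of not exploring interactions between parameters.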

D. MACHINE LEARNING ALGORITHMS
To overcome the limitations of occupancy detection in the five-seat car, we trained machine learning algorithms to identify which seats are occupied by passengers. Since each of the five seats is either occupied or vacant, there are 2^5 = 32 possible situations. To solve this problem with a low-resolution radar, we implement two methods for our proposed machine learning approach: multiclass classification (scenario-wise) and binary classification (seat-wise).
(Figure: Range-azimuth heat maps of the 32 possible situations in the car obtained with the Capon algorithm. All images are normalized to the maximum value of each heat map. The red dot(s) mark the occupied seats in each case. The layout of the seats is illustrated in Fig. 8(b).)
For the first method, we have 32 classes, as listed in Table 3, and we collected a training set containing data from all 32 classes, with around 2000 samples per class. The goal is to train a classifier that, given a new data point, correctly predicts the class to which it belongs. For the second method, we train a separate classifier for each of the five seats of the vehicle under test. For instance, for seat #3, the goal is to identify whether seat #3 is occupied, which is a binary classification problem with labels 0 and 1. Therefore, we labeled all data as "1" when someone was sitting in seat #3 and "0" otherwise. We repeated this procedure for all five seats and ran the classifier for each specific seat. We call this method "one-vs-all" (OvA) in this paper.
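The two labeling schemes can be sketched as follows. The mapping of bit positions to seats is an assumption: the case string is read so that its rightmost bit is seat #3 and its leftmost bit is seat #7, which agrees with "Case 00011" having seats #3 and #4 occupied, but the authoritative encoding is the one in Table 3 and Fig. 8(b). The helper names are hypothetical.

```python
from itertools import product

# Assumed bit order (left to right): seats #7, #6, #5, #4, #3,
# so that "00011" means seats #3 and #4 are occupied.
SEATS_LEFT_TO_RIGHT = (7, 6, 5, 4, 3)

# Multiclass scheme: every occupancy pattern is one class, 2**5 = 32 in total.
ALL_CASES = ["".join(bits) for bits in product("01", repeat=5)]


def occupied_seats(case: str) -> set:
    """Seat numbers marked '1' in a case string such as '00011'."""
    return {seat for seat, bit in zip(SEATS_LEFT_TO_RIGHT, case) if bit == "1"}


def passenger_count(case: str) -> int:
    """Number of occupants encoded by a case string."""
    return case.count("1")


def seat_label(case: str, seat: int) -> int:
    """OvA (seat-wise) label: 1 if `seat` is occupied in this case, else 0."""
    return int(seat in occupied_seats(case))


print(len(ALL_CASES), occupied_seats("00011"), seat_label("00011", 5))
```

In this sketch, the multiclass target of a frame is its case string itself, while the five OvA problems share the same feature vectors and differ only in their labels, `seat_label(case, s)` for s = 3, ..., 7.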
To obtain the training and test datasets, we recorded 73209 frames covering the 32 possible situations; the number of samples for each case is listed in Table 3. To capture different conditions in the car and to help the algorithm generalize across passengers, we recorded each situation in three different sets with different people, 2 minutes per set, giving 32 × 3 × 2 = 192 minutes of recordings with 73209 frames in total. As listed in Table 3, almost all 32 situations have the same sample size; thus, the dataset is balanced across scenarios. Otherwise, resampling methods would be needed to balance the dataset and prevent biased predictions. We train three machine learning algorithms [35], namely, Random Forest (RF) [36], K-Nearest Neighbors (KNN) [37], and Support Vector Machine (SVM) [38]. As described in Section II-B and shown in Fig. 4, the range-azimuth heat maps obtained from (13) are the inputs to the classifiers. Each heat map contains 32 × 256 = 8192 values, which form a feature vector carrying information about the range, angle, and amplitude of the signals reflected from the targets. These vectors are used as training data for the three supervised machine learning algorithms, where the class label of each training pattern is known.
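A minimal NumPy sketch of turning recorded heat maps into the feature matrix described above, together with a simple class-balance check. It assumes the heat maps are stacked in an array of shape (frames, 32, 256); which axis is range and which is azimuth, and the toy data, are illustrative assumptions.

```python
import numpy as np


def heatmaps_to_features(heatmaps: np.ndarray) -> np.ndarray:
    """Flatten (n_frames, 32, 256) heat maps into an (n_frames, 8192) matrix."""
    if heatmaps.ndim != 3:
        raise ValueError("expected an array of shape (n_frames, rows, cols)")
    return heatmaps.reshape(heatmaps.shape[0], -1)


def balance_ratio(labels) -> float:
    """Smallest class count divided by largest class count (1.0 = perfectly balanced)."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return float(counts.min() / counts.max())


# Random stand-in data, not radar measurements.
rng = np.random.default_rng(0)
heatmaps = rng.random((6, 32, 256))
labels = ["00000", "00011", "11111", "00000", "00011", "11111"]

X = heatmaps_to_features(heatmaps)
print(X.shape)                # (6, 8192)
print(balance_ratio(labels))  # 1.0
```

A ratio well below 1 would indicate the kind of imbalance that calls for resampling.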
As shown in Fig. 4, the first step in implementing a machine learning algorithm is to obtain a dataset with good features to train the algorithm. Since our purpose is to find the occupied seats and count the number of passengers, range-azimuth maps contain sufficient information to serve as inputs to the machine learning classifiers. We then apply stratified k-fold cross-validation, which divides the dataset into k subsets of (approximately) equal size, each preserving the class proportions of the whole dataset. The classifier is trained on k-1 subsets, and the left-out subset is used to validate (test) it [39]. This is repeated k times, with each subset acting as the validation subset exactly once, and the evaluation metrics are averaged over the k folds. In this paper, we use five-fold cross-validation and calculate the accuracy and precision to compare the performance of the different classifiers. Since the system is intended for real-time use, the prediction time of a classifier is important in addition to its accuracy. For each train/test split, we then standardize the features to a mean of 0 and a standard deviation of 1 based on the training set.
Although the same scaling is applied to the training and test sets, the scaling parameters are determined using only the training set. The next step is dimensionality reduction. Reducing the number of dimensions decreases the computational time and complexity and avoids the cost of processing unnecessary features. Additionally, dimensionality reduction yields a simpler model, which is less prone to overfitting and less sensitive to noise and outliers. We apply Principal Component Analysis (PCA), an unsupervised feature extraction method that creates uncorrelated features [40], to reduce the number of features from 8192 (the 32 × 256 values of each range-azimuth heat map, which form the initial feature input) to 198. The performance of each classifier also depends on the hyperparameters that control how it learns the training dataset, so these must be optimized. A range of values is specified for the crucial hyperparameters of each classifier, and a grid search [41] with 5-fold cross-validation is conducted to select them. Tuning the hyperparameters increases the prediction accuracy of the classifiers compared with their performance under default hyperparameters. The optimized hyperparameters for RF and KNN are summarized in Table 5 and Table 6, respectively, where "n_estimators" is the number of trees in the random forest classifier, "criterion" is the function used to measure the quality of a split, and "max_depth" is the maximum depth of a tree.
For KNN, with "weight" set to 'uniform', all points in each neighborhood are weighted equally, while setting it to 'distance' weights each point by the inverse of its distance. For SVM, we use a linear kernel with "c = 10" for the multiclass classification method.
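The preprocessing and tuning steps above map naturally onto scikit-learn, whose hyperparameter names (`n_estimators`, `criterion`, `max_depth`, and KNN's `weights`) closely match those in Tables 5 and 6. The sketch below is an assumption about how such a pipeline could look, not the authors' code: it uses synthetic data in place of the 8192-dimensional heat maps, keeps 20 principal components instead of 198 to suit the toy data, and its grid values are placeholders rather than the ranges actually searched. Placing the scaler and PCA inside the pipeline ensures that, in every fold, they are fitted on the training subsets only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the flattened heat maps (real data: 8192 features, 32 classes).
X, y = make_classification(n_samples=500, n_features=64, n_informative=16,
                           n_classes=4, random_state=0)

N_COMPONENTS = 20  # the paper keeps 198 of 8192 dimensions; 20 suits this toy data
CV = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # stratified 5-fold


def tune(classifier, grid):
    """Grid-search `classifier` behind standardization and PCA.

    The scaler and PCA are pipeline steps, so in each fold they are fitted on
    the training subsets only and then applied to the held-out subset.
    """
    pipe = Pipeline([
        ("scale", StandardScaler()),              # zero mean, unit variance
        ("pca", PCA(n_components=N_COMPONENTS)),  # uncorrelated, reduced features
        ("clf", classifier),
    ])
    search = GridSearchCV(
        pipe,
        {f"clf__{name}": values for name, values in grid.items()},
        cv=CV,
        scoring={"accuracy": "accuracy", "precision": "precision_macro"},
        refit="accuracy",  # select hyperparameters by mean cross-validated accuracy
    )
    return search.fit(X, y)


rf = tune(RandomForestClassifier(random_state=0),
          {"n_estimators": [25, 50], "criterion": ["gini", "entropy"],
           "max_depth": [None, 10]})
knn = tune(KNeighborsClassifier(),
           {"n_neighbors": [3, 5, 9], "weights": ["uniform", "distance"]})

for name, search in (("RF", rf), ("KNN", knn)):
    best = search.best_index_
    print(name, search.best_params_,
          "accuracy=%.3f" % search.cv_results_["mean_test_accuracy"][best],
          "precision=%.3f" % search.cv_results_["mean_test_precision"][best])
```

A linear SVM would slot in the same way, for example `tune(SVC(kernel="linear"), {"C": [1, 10]})` after importing `SVC` from `sklearn.svm`.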

1) MULTICLASS CLASSIFICATION
We first evaluated the multiclass classification method. Fig. 13 shows the precision of RF (green), SVM (blue), and KNN (red) for multiclass classification. For the vacant car ("Case 00000"), SVM and RF identify the empty car correctly for every input, meaning that, on our dataset, the proposed method identified the presence or absence of passengers with 100% accuracy. Additionally, SVM performs very well for the cases with 2 and 3 people in the car, with more than 98% precision. The worst case is "Case 11111", where all seats are occupied; here the precision of SVM is 92%, which is still very high compared with the conventional method shown in Fig. 11. However, KNN and RF do not perform reliably in the multiclass classification method. To compare the three classifiers in terms of accuracy and computational complexity, Table 7 summarizes the accuracy and the training and test computational time of each classifier. Table 7 shows that the KNN classifier has a very short training time. This is because KNN is a lazy learner: it does not learn a discriminative function from the training data but memorizes the training dataset instead, so training is fast [37]. However, to make a prediction, it searches the entire training set for the nearest neighbor(s), which is time-consuming. The training time for KNN is 1.08 s, while predicting a new scenario (one sample) takes around 0.2 ms. In contrast, RF, an eager learner, needs around 5114 s to train but predicts very quickly: the average time to predict a new scenario is 0.0195 ms. SVM, also an eager learner, needs 52.05 s to train and predicts a new case in less than 0.2 ms.
The results show that RF is the fastest at prediction in multiclass classification, but it is less accurate than SVM. KNN is the worst classifier in terms of both prediction time and accuracy. Consequently, the SVM classifier, with a relatively low prediction time and an accuracy of 97%, is the best option for multiclass classification. By integrating the data from this low-cost radar, despite its relatively low angular resolution, with the SVM classifier, we can count the passengers and determine their occupied seats in a five-seat vehicle with 97% accuracy.
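Per-case precision, as plotted in Fig. 13, is the fraction of frames predicted as a given case that truly belong to that case, which is why it can differ from overall accuracy. A minimal scikit-learn sketch follows; the labels are toy values, not results from this paper, and the helper name is hypothetical.

```python
from sklearn.metrics import accuracy_score, precision_score


def per_case_precision(y_true, y_pred, cases):
    """Precision of each case label, in the order given by `cases`."""
    values = precision_score(y_true, y_pred, labels=cases, average=None,
                             zero_division=0)
    return dict(zip(cases, values))


# Toy labels for illustration only.
y_true = ["00000", "00000", "11111", "11111", "00011"]
y_pred = ["00000", "00000", "11111", "00011", "00011"]

# Values: 00000 -> 1.0, 00011 -> 0.5 (one of its two predictions is wrong), 11111 -> 1.0
print(per_case_precision(y_true, y_pred, ["00000", "00011", "11111"]))
print(accuracy_score(y_true, y_pred))  # 0.8
```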

2) BINARY CLASSIFICATION (OVA)
In the second set of experiments, we compare the classification accuracy of the three machine learning algorithms in the binary classification (OvA) method. To identify whether a specific seat is occupied, we train the same three classifiers, namely SVM, KNN, and RF, separately for each of the five seats.
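The seat-wise (OvA) scheme amounts to training five independent binary classifiers on the same features and recombining their decisions into a case label. The sketch below assumes the same seat-to-bit order as "Case 00011" (seats #3 and #4 occupied), uses a linear SVC with C = 1 as in the OvA experiments, and runs on toy features rather than radar heat maps; the helper names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed bit order (left to right): seats #7, #6, #5, #4, #3.
SEATS_LEFT_TO_RIGHT = (7, 6, 5, 4, 3)


def fit_per_seat(X, cases, make_classifier):
    """Train one binary classifier per seat; label 1 if that seat is occupied."""
    models = {}
    for i, seat in enumerate(SEATS_LEFT_TO_RIGHT):
        y_seat = np.array([int(c[i]) for c in cases])
        models[seat] = make_classifier().fit(X, y_seat)
    return models


def predict_case(models, X):
    """Combine the five per-seat decisions into case strings such as '00011'."""
    bits = np.column_stack([models[s].predict(X) for s in SEATS_LEFT_TO_RIGHT])
    return ["".join(str(int(b)) for b in row) for row in bits]


# Toy data: each feature is a noisy copy of one seat's occupancy bit.
rng = np.random.default_rng(0)
occupancy = rng.integers(0, 2, size=(300, 5))
cases = ["".join(map(str, row)) for row in occupancy]
X = occupancy + 0.1 * rng.standard_normal((300, 5))

models = fit_per_seat(X, cases, lambda: SVC(kernel="linear", C=1))
predicted = predict_case(models, X)
print(np.mean([p == c for p, c in zip(predicted, cases)]))
```

Here the classifiers are scored on their own training data only to show the mechanics; the reported accuracies come from cross-validation. Because each frame needs five predictions, the per-frame OvA prediction time is the sum of the five per-seat times.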
The accuracy of the different classifiers for the five seats is shown in Fig. 14. Unlike in multiclass classification, in the OvA method all three classifiers achieve more than 90% accuracy in almost all cases. For seat #3, seat #4, and seat #6, all three classifiers perform very well. For seat #3 and seat #4, SVM (blue) achieves 99% accuracy, while RF (green) and KNN (red) achieve 93%. SVM and KNN also have an error rate of only 3% in identifying whether seat #6 is occupied. Moreover, all three classifiers identify the presence or absence of a passenger in seat #5 and seat #7 with more than 90% accuracy. In contrast, as shown in Fig. 11, the conventional method mostly counted the three people in the back row as one person.
Overall, SVM, RF, and KNN are 95.5%, 93%, and 92.2% accurate, respectively, using the OvA method. Compared with the multiclass classification method, the OvA scheme substantially improves the accuracy of the RF and KNN classifiers, while SVM produces results comparable in accuracy to the multiclass method. Note that for the linear SVM, we used "c = 1" in the OvA classification method, whereas in the multiclass classification method we used "c = 10". Because the value of c trades off accuracy against training complexity in linear SVMs, training with "c = 10" in the OvA method requires about a day, while the accuracy of the linear SVM classifier with "c = 10" differed by only 1% from that with "c = 1".
To compare the two methods in terms of computational time, Table 8 lists the training and test times in the OvA method for all five seats. The KNN classifier required more training time than in the multiclass classification method, and for every seat it took more than 7 ms to predict a new scenario; in total, predicting the occupied seats for one frame with KNN took more than 40 ms. For the SVM classifier, on the other hand, the training and prediction times differ significantly among the five seats. For seat #3 and seat #4, where the situations were simpler and the passengers were far away from each other, the SVM classifier requires 6.4 and 4.9 minutes to train, and it takes just 0.36 ms and 0.63 ms to predict whether seat #3 and seat #4, respectively, are occupied. For seat #6, training takes 21.5 minutes, while a new sample is predicted in less than 2 ms. However, seat #5 and seat #7 have training times of 206.8 minutes and 124.8 minutes, respectively. These differences reflect the complexity of each scenario. For seat #3 and seat #4, the passengers' seats are far away from each other, so identifying the occupied seat is simpler than for seat #5 and seat #7, where the radar coverage is weaker and passengers sit very close to each other. Finding the decision boundary for each seat in the back row is therefore computationally more complex and takes more time. To predict whether seat #5 and seat #7 are occupied, SVM requires 7.6 ms. As seen, SVM prediction in the OvA method takes far less time than training. For the RF classifier, however, training takes around 20 minutes for each of the five seats. The advantage of RF over SVM and KNN in the OvA method is its prediction time: for all five seats, the test is completed in less than 0.3 ms.
Finally, for the OvA classification method, the total training times for KNN and RF are 5.6 s and 94.8 minutes, respectively, while SVM requires 364.2 minutes. In the binary classification method, RF is overall the fastest classifier at predicting a new scenario, taking less than 2 ms, while SVM needs 17.89 ms to identify the occupied seats. KNN is the slowest at predicting a new sample, taking around 40 ms. Consequently, among the three classifiers in the OvA method, RF is preferable, as it performs almost as accurately as SVM while needing less time to predict a new situation. Note that in both methods, multiclass classification and binary classification, the training time is measured on 58567 samples (80% of the entire dataset), while the prediction time is the average time taken to predict a new scenario (one frame).
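The 58567 training samples are consistent with an 80/20 split of the 73209 frames in which the test share is rounded up, as scikit-learn's `train_test_split` does. The sketch below checks that arithmetic and shows one way to measure training time and the average single-frame prediction time. The helper `time_model` and the toy data are illustrative assumptions, not the authors' benchmarking code.

```python
import math
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

N_FRAMES = 73209

# train_test_split with test_size=0.2 rounds the test share up.
n_test = math.ceil(0.2 * N_FRAMES)
n_train = N_FRAMES - n_test
print(n_train, n_test)  # 58567 14642


def time_model(model, X_train, y_train, X_test):
    """Return (training time in s, mean single-frame prediction time in ms)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_s = time.perf_counter() - start

    per_frame = []
    for row in X_test:
        start = time.perf_counter()
        model.predict(row.reshape(1, -1))  # one frame at a time, as in real-time use
        per_frame.append(time.perf_counter() - start)
    return train_s, 1000.0 * float(np.mean(per_frame))


# Toy data standing in for the PCA-reduced heat maps.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
y = rng.integers(0, 2, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
train_s, predict_ms = time_model(KNeighborsClassifier(), X_tr, y_tr, X_te)
print("train %.4f s, predict %.4f ms/frame" % (train_s, predict_ms))
```

Timing one frame per call, rather than the whole test set in a single batch, matches the per-frame prediction times quoted above, although absolute numbers depend on the hardware.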

IV. CONCLUSION
In this paper, we investigated in-vehicle occupancy detection using a mm-wave MIMO FMCW radar. We also explored issues related to using a radar with low angular resolution, which limits the ability to resolve occupant locations. Although the achievable hardware resolution was low, the accuracy was significantly enhanced by applying machine learning algorithms. Unlike conventional occupancy detection and people counting methods, which have various parameters to be tuned and require a high-resolution radar to distinguish between targets, our proposed method was shown to autonomously identify the occupied seats and count the number of passengers inside a 5-seat vehicle. We implemented two classification methods: multiclass classification and binary classification (OvA). The results revealed that, compared with the OvA method, the multiclass classification method was much faster in both training and testing. However, KNN and RF performed noticeably better in the OvA scheme than in multiclass classification. In terms of accuracy, comparing the SVM results of the two methods revealed that SVM performs very well in both classification methods, although its training process was time-consuming in the OvA method.