A Paradigm Shift From an Experimental-Based to a Simulation-Based Framework Using Motion-Capture Driven MIMO Radar Data Synthesis

The development of radar-based classifiers driven by empirical data can be highly demanding and expensive due to the unavailability of radar data. In this article, we introduce an innovative simulation-based approach that addresses the data scarcity problem, particularly for our multiple-input multiple-output (MIMO) radar-based direction-independent human activity recognition (HAR) system. To simulate realistic MIMO radar signatures, we first synthesize human motion and generate corresponding spatial trajectories. From these trajectories, a received radio frequency (RF) signal is synthesized using our MIMO channel model, which considers the non-stationary behavior of human motion and the multipath components originating from the scatterers on human body segments. Subsequently, the synthesized RF signals are processed to simulate MIMO radar signatures for various human activities. The proposed simulation-based direction-independent HAR system achieves a classification accuracy of 97.83% when tested with real MIMO radar data. A significant advantage of our simulation-based framework lies in its ability to facilitate multistage data augmentation techniques at the motion-layer, physical-layer, and signal-layer syntheses. This capability significantly reduces the training workload for radar-based classifiers. Importantly, our simulation-based proof-of-concept is applicable to single-input single-output (SISO) and MIMO radars in monostatic, bistatic, and multistatic configurations, making it a versatile solution for realizing other radar-based classifiers, such as gesture classifiers.


I. INTRODUCTION

A. Background
The generation of area-specific synthetic data has been an important topic of interest among researchers [1], [2]. Device-specific or sensor-tailored simulation models help generate realistic sensory data and have been used to realize real-world solutions [3], [4]. Given the increasing prevalence of machine learning and artificial intelligence methodologies and applications today, the importance of the concept of device-specific synthetic data generation, as well as the significance of sensor modeling, cannot be overstated. For many sensing modalities, such as magnetometer, infrared, light detection and ranging (LiDAR), sonar, and radar, data scarcity often hinders the realization of machine learning-based solutions [5], [6]. Sensor-tailored simulation models mitigate the data scarcity problem by providing clean and labeled synthetic datasets for various real-world conditions. Such synthetic datasets are important to develop machine learning-based applications, e.g., medical imaging [7]. Human activity recognition (HAR) [8], [9], [10], [11], [12], [13] remains an important and active research area facing the challenge of data scarcity, especially when using radio frequency (RF) sensors such as Wi-Fi [14] and radar [15], [16], [17]. Furthermore, for multiple-input multiple-output (MIMO) radar systems with user-defined (required) operating parameters and antenna configurations, readily available HAR datasets are almost non-existent. Optimal radar operating conditions and antenna configurations are often not known in advance for different environmental conditions and applications. Synthetic data generation is therefore a pragmatic and promising approach to realizing radar-based classifiers, offering tremendous design control and system flexibility in a cost-effective manner. Realizing HAR systems through a simulation-based approach poses two main challenges: 1) how to synthesize human activities and 2) how to simulate single-input single-output (SISO) and MIMO radar signatures for the synthesized human activities. Before going into further details of synthetic data generation and our proposed simulation-based approach, we first provide an overview of the relevant research in Section I-B.

Sahil Waqar, Muhammad Muaaz, and Matthias Pätzold are with the Faculty of Engineering and Science, University of Agder, 4898 Grimstad, Norway (e-mail: sahil.waqar@uia.no; muhammad.muaaz@uia.no; matthias.paetzold@uia.no).

Stephan Sigg is with the Department of Communications and Networking, Aalto University, 00076 Espoo, Finland (e-mail: stephan.sigg@aalto.fi).

Digital Object Identifier 10.1109/JSEN.2024.3386221

B. Related Work
The ongoing miniaturization and commercialization of radar sensors, as well as many Internet of Things (IoT) sensors, have encouraged the development of human-centric applications, including HAR. Small-scale radar systems are increasingly preferred by researchers for the development of HAR systems [18], [19], gesture recognition systems [20], [21], and sign language recognition systems [22]. Realizing empirical-data-driven (experimental-based) HAR systems is often very challenging due to the low availability of recorded radar datasets. Among other challenging and monotonous tasks, the development of experimental-based HAR systems requires the involvement of human subjects, an actual SISO or MIMO radar system, and the manual labeling of the recorded data. Yu et al. [23] used manually labeled point cloud data to train a HAR system built upon a long short-term memory (LSTM) network. By utilizing the measured features of a millimeter-wave (mm-wave) radar, Zhao et al. [24] tackled the issue of HAR in multiview settings.
Recent studies have shown that, to some extent, data augmentation techniques can reduce the scarcity of empirical data for HAR systems. For instance, a rotation-shift technique was utilized in [25] to expand a 3-D point cloud dataset. A generative adversarial network (GAN)-based data augmentation technique was adopted in [26] to create varied radar signatures of human activities. The use of the few-shot learning method was suggested by Liu et al. [27], which offers a unique way of augmenting the capabilities of pre-trained and pre-existing HAR systems. According to a recent study [28], a two-stage domain adaptation approach can also be used to alleviate the data scarcity issue. With this approach, simulated micro-Doppler signatures can be translated into measurement-like micro-Doppler signatures by using small real datasets. Note that even with such data augmentation methods, time-consuming and tedious data collection cannot be avoided.
Radar-based classifiers may face unique challenges in different situations and application areas, which may necessitate the adaptation of radar antenna configurations and operating conditions. This exacerbates the problem of data scarcity in radar systems because a training dataset recorded from a radar system in one scenario may not be applicable and useful in another. Therefore, synthetic data generation is a practical way to realize radar-based HAR systems. To date, only a few studies have been conducted in the context of RF sensing that deal with synthetic data generation for HAR. In this regard, the utilization of motion capture (MoCap) systems [29] is an effective means of modeling and reanimating complex human motion for further motion synthesis. For passive Wi-Fi radar (PWR), Vishwakarma et al. [30] devised a system, namely SimHumalator, to generate target returns. In [31], a simulation tool was created to evaluate the radar cross section of a walking individual in close proximity. However, this technique is inadequate for reproducing detailed and complex human movements.

C. Our Approach
In this article, we present a proof-of-concept that overcomes the problems related to radar data scarcity, offers significant design control and flexibility of the radar system, and allows the simulation of unbounded, clean, and labeled radar datasets. We emulate a 2 × 2 MIMO radar system with the help of our proposed simulation-based framework to realize a simulation-based direction-independent HAR system. First, we devise an activity simulation module that synthesizes multiple types of human activities in a virtual environment by using the 3-D animation tools from the Unity [32] and MotionBuilder [33] software. An appropriate avatar, or humanoid character, equipped with multiple simulated point scatterers on its body segments, is used to reanimate MoCap data in these programs (see Section III). Subsequently, we generate spatial trajectories corresponding to all simulated point scatterers or body segments of the avatar, which effectively characterize the overall humanoid motion.
The spatial trajectories of the body segments are processed by our channel model, which simulates the received RF signal of a frequency-modulated continuous wave (FMCW) radar system for software-defined antenna positions. While simulating the raw in-phase and quadrature (IQ) components of the received baseband signal, our channel model takes into account the multipath components originating from the non-stationary simulated (real) point scatterers with distinct time-variant (TV) propagation delays (see Section IV). In the proposed channel model, the long- and short-time stationarity characteristics of the scatterers are considered in an indoor wireless propagation environment. Additionally, to train the 2 × 2 MIMO radar-based direction-independent HAR system (see Section VII), we simulated five types of multidirectional human activities by rotating the transmitter and receiver antennas of the emulated MIMO radar system (see Section V).
Unlike conventional, experimental-based designs of HAR systems, the proposed simulation-based approach is highly versatile and offers numerous advantages. Our simulation-based approach is capable of simulating diverse training datasets to meet various radar-based applications and a wide range of operational requirements. For monostatic/bistatic/multistatic SISO/MIMO radar systems, the scatterer-level modeling of moving objects in our simulation-based framework opens up new research opportunities to further fine-tune the simulated radar signatures, such as TV micro-Doppler signatures (TV radial velocity distributions) and TV range distributions (see Sections IV and VI).
For example, the TV path gains of the scatterers (simulated point scatterers) can be adjusted or optimized to improve and augment the simulated radar signatures. Moreover, the simulation-based framework provides multistage data augmentation techniques (see Section V), which allow us to generate diverse and high-quality SISO/MIMO radar datasets in a flexible and cost-effective manner. For instance, at the motion-layer synthesis data augmentation stage, various animation parameters and avatar characteristics, e.g., speed and height, can be arbitrarily varied to simulate a range of human motions. Most importantly, the proposed simulation-based framework radically reduces the workload and resources for classifier training. As our simulation-based approach is versatile, it can be easily extended to implement many other SISO/MIMO radar-based classifiers, such as air-writing gesture classifiers [34].

D. Contributions
The key findings and contributions of this study can be delineated as follows.
1) This research proposes a simulation-based framework to significantly minimize the data collection workload required for devising real-world radar-based HAR systems. The simulation-based framework is capable of synthesizing realistic, diverse, and clean datasets for MIMO radar systems, regardless of their configuration: monostatic, bistatic, or multistatic. Although this study focuses on a 2 × 2 MIMO radar-based direction-independent HAR system, the utility of the simulation-based framework extends beyond the HAR application, making it also valuable for other radar-based applications, e.g., sign language detection.
2) We have developed a MoCap-data-driven activity simulation module that enables the synthesis of multiple types of human activities in a virtual environment. For a total of 21 simulated point scatterers placed on the body segments of an avatar, the activity simulation module generates 3-D trajectories that essentially characterize the overall human motion. The training dataset for HAR incorporates simulated radar patterns derived from software-defined avatar movements. This approach proves highly advantageous and practical, as the training data is developed entirely from scratch, eliminating the need for real individuals and an actual MIMO radar system.
7) For the 2 × 2 MIMO radar framework, we realized a simulation-based HAR system by employing a deep convolutional neural network (DCNN). The system employed multiperspective simulated radar signatures as input features. To showcase the practical applicability of our simulation-driven HAR system, we evaluated its performance using actual mm-wave radar data collected from actual individuals. Our simulation-based multiperspective HAR system achieved an impressive classification accuracy of 97.83%, providing compelling evidence for its effectiveness.

E. Article Organization
The article is divided into eight sections. Section II deals with the system design and the general structures of the conventional and the proposed approaches. Human MoCap and synthesis techniques are presented in Section III. Section IV details channel modeling and simulation. Multistage data augmentation approaches are elucidated in Section V. Section VI discusses the generation of MIMO radar signatures. Section VII presents the design, training, and testing phases of our simulation-based direction-independent HAR system. Finally, we conclude our research in Section VIII.

II. SYSTEM DESIGN
In this section, we discuss a conventional experimental-based design of a HAR system and the proposed simulation-based realization of a HAR system. We also discuss problems of conventional HAR systems and how the proposed end-to-end simulation framework resolves them. Note that SISO radar-based HAR systems struggle to classify multidirectional human activities [35], [36]. To classify different types of multidirectional human activities, we need multiple radar subsystems illuminating the environment from different perspectives. Therefore, in this article, we consider multidirectional human activities recorded by a multiperspective distributed MIMO radar system.

A. Conventional Experimental-Based Designs of HAR Systems
In radar sensing, state-of-the-art experimental-based HAR systems [9], [13], [14], [15], [17], [18], [19], [23], [24], [36] generally face challenges, such as data scarcity and their adaptability to environmental conditions. As an example of state-of-the-art experimental-based designs, we considered a direction-independent HAR system implemented with a mm-wave 2 × 2 MIMO radar system, as shown in Fig. 1(a). In Fig. 1(a), Radar_i represents the ith radar subsystem of the distributed MIMO radar system, A_i^Tx is the ith transmitter antenna, and A_i^Rx is the ith receiver antenna for i = 1, 2. Note that the two horn antennas, namely A_i^Tx and A_i^Rx, are arranged in a monostatic configuration for Radar_i. In the conventional experimental-based HAR system of Fig. 1(a), six human subjects performed the following types of multidirectional activities: falling on a mattress, walking, standing up from a chair, sitting down on a chair, and picking up an object from the floor.
The distributed MIMO radar system simultaneously illuminates the human subject from two aspect angles and generates the corresponding raw IQ data, as shown in Fig. 1(a). Then, the radar signal processing block (see Section VI) generates the TV micro-Doppler signatures or, equivalently, the TV radial velocity distributions for Radar_1 and Radar_2. These recorded radar signatures (TV radial velocity distributions) are accumulated to create a real radar dataset. In conventional experimental HAR systems, the real radar dataset is usually divided into a training subset and a testing subset to train and test these HAR systems, respectively. However, for this research, we only use the experimentally obtained radar dataset to test our proposed simulation-based HAR system (see Fig. 2).
Similar to any multiclass classifier, radar-based HAR systems require extensive amounts of recorded data for their training. However, unlike other sensing modalities, such as cameras, radar systems often suffer from data scarcity. To experimentally design a HAR system, real human subjects must perform various types of activities in front of the MIMO radar system in multiple directions. These requirements make data collection time-consuming and costly. Additionally, the recorded radar training dataset usually cannot be reused for different antenna configurations and operating conditions. For instance, changing the position of a transmitter or a receiver antenna of the MIMO radar system can invalidate the entire recorded training dataset.

B. Simulation-Based Design of HAR Systems
In this article, we propose a feasible alternative to overcome the aforementioned limitations of radar-based classifiers, particularly with regard to the scarcity of radar data. To develop real-world HAR systems, we propose a comprehensive simulation-based framework that utilizes MoCap systems to synthesize realistic MIMO radar data, as depicted in Fig. 1(b). The objective is to generate a simulated MIMO radar-based training dataset by seamlessly simulating a large number of realistic MIMO radar signatures without real human subjects and a physical radar system.
The block diagram in Fig. 1(b) provides a general overview of the proposed end-to-end simulation framework for HAR systems. In Fig. 1(b), the activity simulation module synthesizes the five types of human activities in the 3-D space from motion data collected by the MoCap systems (see Section III). The activity simulation module simulates 3-D trajectories corresponding to different body segments of an avatar, e.g., head, neck, torso, and upper and lower extremities. To simulate the human activities in multiple directions, as shown in Fig. 1(b), we rotate the positions of the transmitter antenna A_i^Tx and receiver antenna A_i^Rx in our simulation-based framework for i = 1, 2 (see Section V-B). For a desired antenna configuration of the MIMO radar system, our channel simulation module first transforms the 3-D trajectories into TV propagation delays. Then, the channel simulation module generates realistic RF or raw IQ data for the simulated TV propagation delays and a set of scatterer weights. Eventually, the radar signal processor arranges the simulated raw IQ data in the fast- and slow-time domains and processes it to simulate realistic radar signatures, i.e., the range distribution, the radial velocity distribution (micro-Doppler signature), and the mean velocity (mean Doppler shift).
We synthesize numerous examples of the five types of human activities, simulate the corresponding radial velocity distributions (micro-Doppler signatures), and store them in our simulated radar dataset, as shown in Fig. 1(b). The proposed simulation-based framework has no limits on the generation of simulation data. The simulated radar dataset is used to train the simulation-based HAR system, which is based on a DCNN architecture. To demonstrate the practical importance and the generalizability of this simulation-based framework, we need to evaluate its performance in a real scenario. Therefore, the proposed simulation-based HAR system is evaluated on a previously unseen real radar dataset acquired with a mm-wave distributed MIMO radar system and real human subjects, as shown in the testing phase of Fig. 2. Note that we used an identical radar signal processing block in Fig. 1 because the simulated and real RF signals are structurally indistinguishable. More details on each block of the simulation-based HAR system are provided in the following sections.
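The radar signal processing block referenced above is detailed in Section VI. As a rough, self-contained illustration of how a raw IQ matrix can be turned into a TV radial velocity distribution, the following Python sketch applies a range FFT along fast time followed by a short-time Fourier transform (STFT) along slow time. The language, function name, and window/hop sizes are our illustrative choices, not taken from the article:

```python
import numpy as np

def micro_doppler(D, n_window=64, hop=8):
    """Sketch of a radar signal processing chain: raw IQ matrix -> TV
    radial velocity distribution (micro-Doppler signature).

    D: (n_fast, Nc) raw IQ data matrix. A range FFT along fast time is
    followed by an STFT along slow time; integrating the magnitude over
    the range bins yields the TV Doppler (radial velocity) distribution.
    """
    rng = np.fft.fft(D, axis=0)                        # range profiles
    n_fast, n_slow = rng.shape
    win = np.hanning(n_window)
    frames = []
    for start in range(0, n_slow - n_window + 1, hop):
        seg = rng[:, start:start + n_window] * win     # windowed slow-time segment
        dopp = np.fft.fftshift(np.fft.fft(seg, axis=1), axes=1)
        frames.append(np.abs(dopp).sum(axis=0))        # integrate over range bins
    return np.array(frames).T                          # (Doppler bins, time frames)
```

Each column of the returned array is one time frame of the Doppler spectrum, i.e., one slice of the micro-Doppler signature.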

III. HUMAN MOCAP AND SYNTHESIS
This section explores several ways of capturing and synthesizing human motion. First, biomechanical modeling and its limitations are briefly discussed. Second, wearable sensors as a means of MoCap are briefly mentioned. Third, we discuss optical MoCap systems such as Mixamo [37] and Qualisys [38]. It is important to highlight that the proposed simulation-based framework allows incorporating synthesized or recorded motion data from diverse sources, such as biomechanical, wearable, and optical MoCap systems. Lastly, we explain the process of generating 3-D trajectories of human body segments using software such as Unity [32] and Autodesk's MotionBuilder [33]. These software programs (3-D animation tools) help us augment the motion data at the motion-layer synthesis.

A. Biomechanical Modeling of Human Body Segments
The utility of biomechanical modeling [39] for human body segments is undeniable, yet its complexity is inherently high, primarily due to the intricate nature of the human body. Also, it is difficult to develop generalizable biomechanical models because individuals differ in physiology, anatomy, and motor function. Moreover, the interaction between the human body and the environment can further increase the complexity of a biomechanical model.
Obtaining high-fidelity motion data of human body segments is more feasible and accessible through MoCap repositories and systems such as Mixamo and Qualisys. In addition, the Unity and MotionBuilder software provide a cost-effective and pragmatic alternative to biomechanical modeling, enabling the seamless and dynamic simulation of new motion data in a virtual environment. Therefore, we use MoCap systems to capture human motion and employ the 3-D animation tools of MotionBuilder and Unity to synthesize and subsequently augment human motion.

B. Wearable MoCap Systems
Wearable MoCap systems offer a versatile and cost-effective solution for capturing human movement data. The sensors, typically accelerometers and gyroscopes, are often integrated into garments to capture data on the orientation and acceleration of body segments. In this area, the Rokoko Smartsuit Pro [40] is a viable choice with multiple inertial sensors for real-time tracking of an individual's skeletal movements. It facilitates the seamless transfer of motion data to various applications, such as sports, biomechanical analysis, and virtual reality. Compared to optical MoCap systems, wearable MoCap systems have limitations in terms of accuracy. Additionally, wearable MoCap systems can suffer from magnetic interference, which can affect the precision of the MoCap data.

C. Optical MoCap Systems
We used the Mixamo and Qualisys optical MoCap systems to capture motion data for human activities. Mixamo is an online platform that offers an extensive selection of readily available MoCap data captured from real performers [41]. Our Qualisys MoCap system was based on six Miqus M3 cameras connected in a daisy chain, capable of tracking passive reflective markers placed on a subject at 340 frames/s. The Qualisys MoCap system includes proprietary Qualisys track manager (QTM) software that provides an interface for tasks such as camera configuration and calibration, session setup and organization, marker-set definition, and MoCap measurements. Furthermore, QTM offers a suite of tools for marker labeling, data processing, analysis, and the export of MoCap data, thereby enabling seamless integration with third-party software. The camera system was calibrated according to the QTM guidelines to ensure accurate tracking of the markers and capturing their position and orientation in 3-D space. Next, 41 passive reflective markers were attached to a full-body suit. The participant wore the suit, and we recorded a MoCap trial to generate an automatic identification of markers (AIM) model. This model applies computer vision, localization, and motion estimation techniques to detect and track markers, facilitating an automated workflow for identifying and labeling markers. Once the AIM model was created, the skeleton solver function of QTM was used to calibrate the skeleton based on the marker positions. Next, a person's motion data was recorded for four activities: normal walking, standing up from a chair, sitting down onto a chair from a standing position, and picking up a small object from the floor. The recorded skeleton data was then exported in the Filmbox (FBX) file format and further processed in the MotionBuilder software. Note that for the falling activity, the MoCap data was relatively difficult to collect due to the markers attached to the body. Therefore, we obtained the MoCap data of the falling activity from Mixamo [37], a freely accessible online platform. In the next step, we import the acquired MoCap data into specialized software such as Unity or MotionBuilder, which are equipped with powerful tools that allow for the creation of comprehensive, meticulous, and lifelike 3-D animations.

D. 3-D Trajectories of Human Body Segments
By using the basic MoCap data and the 3-D animation tools, we synthesized, augmented, and visualized five human activities: falling on the floor, walking in an indoor environment, standing up from and sitting down on a chair, and picking up an object from the floor. Initially, the human activities were simulated and varied in a single direction, i.e., at an aspect angle of 0°, with the help of the 3-D animation tools, as shown in Fig. 1(b). However, we also needed to synthesize multidirectional human activities to realize a simulated MIMO radar-based direction-independent HAR system. Instead of using the 3-D animation tools, we simulated multidirectional human activities more conveniently and efficiently by spatially rotating the transmitter and receiver antennas of the radar subsystem Radar_i (see Section V-B).
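Rotating the antennas rather than re-animating the avatar amounts to a simple coordinate transformation: rotating the antenna positions about the vertical axis through the subject is geometrically equivalent to changing the subject's motion direction. The following minimal Python sketch illustrates this idea; the function name and parameters are our own illustrative choices, not the authors' code:

```python
import numpy as np

def rotate_antenna(position, aspect_angle_deg, pivot=np.zeros(3)):
    """Rotate a 3-D antenna position about the vertical (z) axis through a pivot.

    Rotating the radar antennas around the subject lets one set of simulated
    trajectories serve all aspect angles of a multidirectional activity.
    """
    phi = np.deg2rad(aspect_angle_deg)
    R = np.array([[np.cos(phi), -np.sin(phi), 0.0],
                  [np.sin(phi),  np.cos(phi), 0.0],
                  [0.0,          0.0,         1.0]])
    return pivot + R @ (np.asarray(position, dtype=float) - pivot)

# Example: an antenna 3 m in front of the subject, moved to a 90° aspect angle
C_tx = np.array([3.0, 0.0, 1.0])
rotated = rotate_antenna(C_tx, 90.0)   # approximately [0, 3, 1]
```

The same rotation is applied to both the transmitter and receiver positions of Radar_i so that the monostatic configuration is preserved.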
Following the synthesis of the human movement, we extract the spatial trajectories corresponding to each body segment of the avatar. To track the different body segments, 21 simulated point scatterers were placed on the avatar (see Fig. 3); these model the actual body scatterers that backscatter the transmitted RF signal to the receiver antennas of the 2 × 2 distributed MIMO radar system. We recorded the TV positions (trajectories) of the simulated point scatterers in the 3-D space for the simulated human activities. For example, the 3-D trajectories of the simulated point scatterers for a simulated walking activity are shown in Fig. 3.
At the outset, only 34 MoCap files were recorded, each representing one of the five distinct types of human activities. We visualized these activities using the Unity and MotionBuilder 3-D animation tools and computed the corresponding 3-D trajectories. To expand the total number of synthesized human activities to 84, we applied data augmentation at the motion-layer synthesis using the Unity and MotionBuilder software (see Section V-A). Subsequently, we processed the 3-D trajectories in MATLAB for further data augmentation at the physical- and signal-layer syntheses. Although data augmentation at the motion-layer synthesis may require some attention to motion details, the physical-layer and signal-layer data augmentation stages in the proposed simulation-based framework are fairly automated. With the help of such multistage data augmentation techniques, we generated 2826 micro-Doppler signatures (TV radial velocity distributions) for each radar subsystem of the MIMO radar system. Section V provides more details on the multistage data augmentation techniques furnished by the proposed simulation-based framework.
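As a toy illustration of the kind of motion-layer augmentation described above (speed and avatar-height variation), the following Python sketch rescales the time axis and the vertical coordinate of a scatterer trajectory. The function and its parameters are our illustrative assumptions; the authors performed this stage with Unity/MotionBuilder and MATLAB:

```python
import numpy as np

def augment_trajectory(traj, t, speed_factor=1.0, height_factor=1.0):
    """Motion-layer augmentation of one scatterer trajectory.

    traj: (N, 3) array of [x, y, z] positions sampled at times t (s).
    speed_factor > 1 makes the motion faster (shorter duration);
    height_factor rescales the vertical coordinate to mimic avatars
    of different heights.
    """
    t_new = t / speed_factor          # compress or stretch the time axis
    out = traj.copy()
    out[:, 2] *= height_factor        # rescale the z (height) coordinate
    return t_new, out
```

Each augmented trajectory then passes unchanged through the physical- and signal-layer syntheses, which is why those later stages can remain fully automated.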

IV. CHANNEL MODELING AND SIMULATION
In this section, we first present a geometrical 3-D indoor channel that models an indoor propagation scenario in the proposed simulation-based framework (see Fig. 3). Second, we investigate the multipath components caused by the non-stationary simulated (real) point scatterers on the avatar (human) body segments and simulate the corresponding TV propagation delays for a human activity. Lastly, we explain how the simulated propagation delays can be used to synthesize the received RF signal, specifically for an FMCW 2 × 2 MIMO radar system.

A. Geometrical Channel Model
We model and simulate a 3-D channel for an indoor environment, which consists of a 2 × 2 distributed MIMO radar system, a moving person, and stationary miscellaneous items, such as furniture and electronics, as illustrated in Fig. 1(a). Recall that Radar_i represents the ith radar subsystem of the distributed MIMO radar system, A_i^Tx is the ith transmitter antenna, and A_i^Rx is the ith receiver antenna for i = 1, 2. Let [·]^⊤ denote the vector transpose operation. Then, the position of the ith transmit (receive) antenna A_i^Tx (A_i^Rx) of the 2 × 2 MIMO radar system is represented by C_i^Tx = [x_i^Tx, y_i^Tx, z_i^Tx]^⊤ (C_i^Rx = [x_i^Rx, y_i^Rx, z_i^Rx]^⊤), as illustrated in Fig. 3. A virtual propagation environment that resembles a real geometrical 3-D indoor channel is depicted in Fig. 3. In a real propagation environment, a moving human subject has countless non-stationary scatterers. For this research, we model these non-stationary bodily scatterers with L = 21 non-stationary simulated point scatterers on a moving avatar, as shown in Fig. 3. Moreover, in Fig. 3, C_l(t) = [x_l(t), y_l(t), z_l(t)]^⊤ denotes the TV spatial trajectory of the lth marker S^(l), and d_l,i^Tx(t) (d_l,i^Rx(t)) represents the TV Euclidean distance between the lth marker S^(l) and the ith transmit antenna A_i^Tx (receive antenna A_i^Rx), where i = 1, 2 and l = 1, 2, ..., L. For the lth marker S^(l) and the ith radar subsystem Radar_i, the TV radial distance d_l,i(t) is equal to one-half of the overall propagation distance, i.e., d_l,i(t) = (d_l,i^Tx(t) + d_l,i^Rx(t))/2. Fig. 3 shows that the antenna configuration {C_i^Tx, C_i^Rx} of the ith radar subsystem, Radar_i, follows a monostatic configuration, where C_i^Tx = C_i^Rx for i = 1, 2. This leads to the simplification d_l,i(t) = d_l,i^Tx(t) = d_l,i^Rx(t). The obtained TV radial distances d_l,i(t) of the L non-stationary simulated point scatterers play an important role in simulating the TV propagation delays τ_i^(l)(t), as explained in Section IV-B.
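The TV radial distance defined above follows directly from the simulated trajectories and antenna positions. A minimal Python sketch (illustrative only; the authors processed the trajectories in MATLAB):

```python
import numpy as np

def radial_distance(C_l, C_tx, C_rx):
    """TV radial distance d_{l,i}(t) of the l-th scatterer for Radar_i.

    C_l:  (N, 3) trajectory of the scatterer S^(l) over N time samples.
    C_tx, C_rx: (3,) transmit and receive antenna positions.
    In the monostatic case (C_tx == C_rx), this reduces to the plain
    scatterer-to-antenna distance.
    """
    d_tx = np.linalg.norm(C_l - C_tx, axis=1)   # d^Tx_{l,i}(t)
    d_rx = np.linalg.norm(C_l - C_rx, axis=1)   # d^Rx_{l,i}(t)
    return 0.5 * (d_tx + d_rx)                  # one-half of the round-trip path
```

For a bistatic configuration, the same function applies unchanged, since it averages the distinct transmit- and receive-side distances.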

B. Modeling of Multipath Components Caused by Human Body Segments
RF signals generally experience multipath propagation, particularly in indoor environments with numerous stationary and non-stationary reflective objects. In Fig. 3, the transmitted RF signal takes multiple propagation paths, traveling from the transmitter antenna to the receiver antenna via the multiple real (simulated) point scatterers on the human (avatar) body segments. Recall that in our simulation-based framework, the 21 simulated point scatterers on the avatar's body segments model the actual bodily scatterers that scatter the transmitted RF signal back to the receiver antennas of the 2 × 2 distributed MIMO radar system. For this study, by virtue of the cross-channel interference mitigation technique [16], we assume that the two radar subsystems, Radar_1 and Radar_2, of Ancortek's mm-wave radar system do not interfere with each other.
In the proposed simulation-based framework, we only consider multipath components originating from the L = 21 non-stationary dominant and non-dominant scatterers located on various body segments of the avatar, as shown in Fig. 3. The multipath components originating from stationary dominant scatterers, such as walls, furniture, and the floor, are excluded from the analysis because they are easily filtered out through signal preprocessing. Moreover, the bistatic components of the 2 × 2 distributed MIMO radar system are not considered for this study. However, if required, the bistatic components of the 2 × 2 distributed MIMO radar system can easily be simulated in the proposed simulation-based framework.
The receiver antennas receive the multipath components, or multiple copies of the transmitted RF signal, with distinct TV propagation delays τ_i^(l)(t). For Radar_i, the lth TV propagation delay τ_i^(l)(t) is related to the lth TV radial distance d_l,i(t) according to the relation τ_i^(l)(t) = 2 d_l,i(t)/c_0, where c_0 is the speed of light. Within the framework of radar sensing, the synthesized motion is completely characterized by the simulated TV propagation delays τ_i^(l)(t), as explained in Section IV-C.
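The delay relation τ_i^(l)(t) = 2 d_l,i(t)/c_0 translates into a one-line computation; the following Python sketch is purely illustrative:

```python
C0 = 299_792_458.0  # speed of light c_0 in m/s

def propagation_delay(d):
    """Round-trip TV propagation delay tau_i^(l)(t) = 2 * d_{l,i}(t) / c_0.

    d: radial distance in meters (scalar or array-like of samples);
    returns the delay in seconds.
    """
    return 2.0 * d / C0

# Example: a scatterer at a radial distance of 5 m yields a delay of ~33.4 ns
tau_example = propagation_delay(5.0)
```

Applied sample-by-sample to the radial distances of all L scatterers, this yields the delay trajectories shown in Fig. 4.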
For the five distinct types of simulated human activities and Radar 1, Fig. 4 shows the simulated TV propagation delays $\tau_1^{(l)}(t)$ of the L = 21 simulated point scatterers. From the simulated TV propagation delays $\tau_1^{(l)}(t)$, it is evident that the simulated walking activity comprised four steps toward Radar 1. In contrast, the TV propagation delays $\tau_1^{(l)}(t)$ in Fig. 4 for the other three types of simulated in-place human activities, namely sitting, standing up, and picking up an object, show smaller variations corresponding to the mobility of the simulated point scatterers.

C. Channel Modeling for RF Sensing
This section elucidates the simulation of a composite RF signal or, equivalently, raw IQ data in fast time $t'$ and slow time $t$, corresponding to a specific motion. To simulate the composite RF signal of Radar $i$, we need user-defined scatterer weights, a user-defined antenna configuration $\{C_i^{Tx}, C_i^{Rx}\}$, and the simulated TV propagation delays $\tau_i^{(l)}(t)$ corresponding to the spatial trajectories of the simulated point scatterers for a specific motion or human activity [see Figs. 1(b) and 4]. For this study, we consider the L bodily scatterers to be long-time non-stationary over the slow time $t$ and short-time stationary over a limited chirp duration $T_{sw}$ [42]. In the following, for the FMCW 2 × 2 distributed MIMO radar system placed in the indoor wireless channel, we synthesize the complex baseband signal called the composite beat signal $s_{b,i}(t', t)$ [43], where $i = 1, 2$. Additionally, we discuss an interpolation procedure that is integral to the channel-simulation module of Fig. 1(b), as it mitigates aliasing in the Doppler domain.
FMCW radar systems operate by repetitively emitting a chirp waveform $c(t')$ [44], which is scattered back to the receiver antenna by multiple stationary and non-stationary scatterers present on the human body segments and other objects in the environment. A quadrature mixer integrated into the receiver chain of the FMCW 2 × 2 distributed MIMO radar system transforms the incoming passband RF signal into the complex baseband (composite beat) signal $s_{b,i}(t', t)$. The received complex baseband signal $s_{b,i}(t', t)$ is sampled in the fast-time domain by the analog-to-digital converter (ADC) module of the receiver with the discrete sampling interval $T_s$. Subsequently, for the coherent processing interval (CPI) of the $i$th radar subsystem, Radar $i$, the discrete samples of the received complex baseband signal $s_{b,i}(t', t)$ are organized in the fast- and slow-time domains. During the CPI, the phase of Radar $i$ is preserved. This organization or rearrangement of the discrete fast- and slow-time samples results in the radar's raw IQ data matrix $D_i$ [42] [see (1)], where $N_c$ represents the number of chirps present within the CPI of the FMCW radar system. We want to synthesize the actual received complex baseband signal $s_{b,i}(t', t)$ of the FMCW 2 × 2 distributed MIMO radar system, so that we can simulate the radar's raw IQ data matrices $D_i$ for $i = 1, 2$.
The received complex baseband signal $s_{b,i}(t', t)$ of Radar $i$ can be synthesized by adding up the L distinct beat signals $s_{b,i}^{(l)}(t', t)$ [42], [43], each corresponding to the $l$th multipath component originating from the $l$th simulated point scatterer, i.e.,

$$s_{b,i}(t', t) = \sum_{l=1}^{L} s_{b,i}^{(l)}(t', t). \qquad (2)$$

For Radar $i$, the $l$th beat signal $s_{b,i}^{(l)}(t', t)$, or the $l$th multipath component, can be simulated by using the expression [42]

$$s_{b,i}^{(l)}(t', t) = \sum_{n} a_i^{(l)}\, e^{\,j\left[2\pi f_{b,i}^{(l)}(t)\, t' + \varphi_i^{(l)}(t)\right]}\, \delta(t - T_n) \qquad (3)$$

where $a_i^{(l)}$, $f_{b,i}^{(l)}(t)$, and $\varphi_i^{(l)}(t)$ denote the TV path gain, beat frequency, and phase of the $l$th beat signal $s_{b,i}^{(l)}(t', t)$, respectively, and $\delta(\cdot)$ denotes the Dirac delta function. The symbol $T_n$ in (3) represents the $n$th discrete slow-time instance, which is determined by the chirp duration $T_{sw}$, such that $T_n = nT_{sw}$, where $n$ is a non-negative integer. Let $\gamma$ represent the slope of the chirp signal. Then, the $l$th TV beat frequency is given by $f_{b,i}^{(l)}(t) = \gamma\, \tau_i^{(l)}(t)$, and the $l$th TV phase by $\varphi_i^{(l)}(t) = 2\pi f_0\, \tau_i^{(l)}(t)$, where $f_0$ is the carrier frequency.
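The superposition of L tone-like beat signals can be sketched as follows. This is a minimal sketch of the standard FMCW model, assuming the beat-frequency and phase definitions $f_b = \gamma\tau$ and $\varphi = 2\pi f_0\tau$; the function name and parameter layout are our own:

```python
import numpy as np

def synthesize_beat_signal(tau, gains, gamma, f0, Ts, n_fast):
    """Synthesize the composite beat signal s_b(t', t_k) at one slow-time
    instant as a sum of L tone signals (a sketch of (2) and (3)).

    tau:    (L,) TV propagation delays at slow time t_k [s]
    gains:  (L,) time-invariant path gains a^{(l)}
    gamma:  chirp slope [Hz/s]
    f0:     carrier frequency [Hz]
    Ts:     fast-time sampling interval [s]
    n_fast: number of fast-time samples per chirp
    """
    t_fast = np.arange(n_fast) * Ts            # fast-time axis t'
    fb = gamma * tau                           # beat frequencies gamma * tau
    phi = 2.0 * np.pi * f0 * tau               # phases 2*pi*f0*tau
    tones = gains[:, None] * np.exp(
        1j * (2 * np.pi * fb[:, None] * t_fast[None, :] + phi[:, None]))
    return tones.sum(axis=0)                   # sum over the L scatterers

# Two scatterers, parameters loosely matching the paper's 24.125 GHz radar.
s = synthesize_beat_signal(np.array([20e-9, 26e-9]), np.array([1.0, 0.5]),
                           gamma=250e6 / 1e-3, f0=24.125e9, Ts=1e-6, n_fast=128)
```

Repeating this per slow-time instant and stacking the rows would yield a simulated raw IQ matrix of the kind described above.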
The TV path gain $a_i^{(l)}(t)$ in (3) models the strength of the $l$th multipath component in the received signal. For Radar $i$ and the L simulated point scatterers, we use time-invariant path gains $a_i^{(l)}$ in (3) to avoid unnecessary complexity, i.e., $a_i^{(l)}(t) = a_i^{(l)}$. In this study, for the five types of synthesized human activities, the values of the time-invariant path gains $a_i^{(l)}$ were adjusted by investigating the actual TV radial velocity distributions $p_i(v, t)$ (see Section VI) and the body surface area [45]. It is worth noting that by using different sets of time-invariant path gains, we can augment the radar data at the signal-layer synthesis for a synthesized human activity (see Section V-C).
We consider the L bodily scatterers to be long-time non-stationary over the slow time $t$ and short-time stationary over the fast time $t'$ for a limited chirp duration $T_{sw}$ [42]. Thus, the TV propagation delays $\tau_i^{(l)}(t)$, beat frequencies $f_{b,i}^{(l)}(t)$, and phases $\varphi_i^{(l)}(t)$ of the L simulated point scatterers are only functions of the slow time $t$. For Radar $i$ and the $k$th slow-time instant $t_k$ [the $k$th row of the raw IQ data matrix $D_i$ in (1)], the short-time stationarity assumption simplifies the synthesis of the discrete complex baseband signal $s_{b,i}(t', t_k)$ for a synthesized human activity. At the slow-time instant $t_k$, the IQ components of the complex baseband signal $s_{b,i}(t', t_k)$ can be digitally simulated as a sum of tone signals, i.e., $s_{b,i}(t', t_k) = \sum_{l=1}^{L} s_{b,i}^{(l)}(t', t_k)$, where the $l$th tone signal $s_{b,i}^{(l)}(t', t_k)$ oscillates at the beat frequency $f_{b,i}^{(l)}(t_k)$. Within the framework of radar sensing, the synthesized motion can be completely characterized by the simulated TV propagation delays $\tau_i^{(l)}(t)$ of the L simulated (real) point scatterers. The L TV propagation delays $\tau_i^{(l)}(t)$ are computed from the TV spatial trajectories $C_l(t)$ of the L simulated point scatterers, which are animated with a fixed frame interval denoted by $T_f$. Therefore, the frame interval $T_f$ is the slow-time sampling interval of the simulated TV spatial trajectories $C_l(t)$ and the propagation delays $\tau_i^{(l)}(t)$. In actual radar systems, the slow-time sampling interval is equal to the radar's pulse repetition interval (PRI), which is smaller (better) than the frame interval $T_f$. Concretely, for the actual (simulated) raw IQ data matrix $D_i$ in (1), the slow-time sampling interval $T_{sw}$ is equal to the radar's PRI (frame interval $T_f$). Thus, to ensure that the simulated frame interval $T_f$ is equal to the radar's PRI, we interpolate the spatial trajectories or the simulated TV propagation delays $\tau_i^{(l)}(t)$ in our simulation framework. This is necessary because the upper limit of the actual (synthesizable) radial velocity, denoted by $v_{\max}$ ($v'_{\max}$), is determined by the radar's PRI (animation's frame interval $T_f$). Let $\lambda$ denote the wavelength; then we have $v_{\max} = \lambda/(4\,\mathrm{PRI})$ and $v'_{\max} = \lambda/(4 T_f)$.
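The interpolation step and the velocity limits above can be sketched as follows. Linear interpolation is used here as a simple stand-in for whatever interpolator the framework employs, and the function names are illustrative:

```python
import numpy as np

def interpolate_delays(tau, T_f, pri):
    """Resample TV propagation delays from the animation frame interval T_f
    to the radar's PRI, so the simulated slow-time grid matches the radar's
    and Doppler aliasing is avoided.

    tau: (N,) delays sampled every T_f seconds.
    Returns delays sampled every `pri` seconds over the same time span.
    """
    t_src = np.arange(len(tau)) * T_f
    t_dst = np.arange(0.0, t_src[-1] + 1e-12, pri)
    return np.interp(t_dst, t_src, tau)

def v_max(wavelength, interval):
    """Maximum unambiguous radial velocity: lambda / (4 * interval)."""
    return wavelength / (4.0 * interval)

# Upsample from a 25 fps animation grid (T_f = 0.04 s) to PRI = 0.01 s.
tau_up = interpolate_delays(np.array([0.0, 1.0]), T_f=0.04, pri=0.01)
lam = 3e8 / 24.125e9          # ~12.4 mm wavelength at 24.125 GHz
v_radar = v_max(lam, 0.5e-3)  # PRI = 0.5 ms, as used later in the paper
```

With the paper's PRI of 0.5 ms at 24.125 GHz, this gives a maximum unambiguous radial velocity of roughly 6.2 m/s.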

V. MULTISTAGE DATA AUGMENTATION
In this section, we explore the multistage data augmentation techniques (see Fig. 5) provided by the proposed simulation-based framework, which allow us to simulate large quantities of quality radar signatures. First, we discuss a motion-layer data augmentation technique, where various animation parameters and avatar characteristics, e.g., size and speed, can be randomly varied to synthesize a variety of human motions. We then explain data augmentation at the physical layer, which allows us to vary numerous physical-layer configurations and radar operating parameters, e.g., the number of antennas, their placement, and the PRI. Lastly, we delve into a data augmentation technique at the signal-layer synthesis.

A. Motion-Layer Synthesis
For the five types of distinct human activities, we acquired a small and basic MoCap dataset from the Mixamo platform and the Qualisys MoCap system. A person with a height of about 1.74 m performed the activities several times in a room equipped with the Qualisys MoCap system. The MoCap dataset we acquired comprised only 34 MoCap files, each representing one of the five types of activities. The 3-D animation tools from both the Unity and MotionBuilder software were used to visualize the basic MoCap data for the human activities. We complemented the basic MoCap data with the 3-D animation tools to render realistic and diverse motion data.
In this study, one of our objectives is to synthesize a large amount of data representing real human motions at the motion-layer synthesis of our simulation-based framework. To this end, we first adjusted the height of the avatar in the MotionBuilder software by reducing it to 1.52 m (5 ft) and increasing it to 1.83 m (6 ft). We then aligned the MoCap data to the avatars of different sizes to account for the effects of avatar dimensions and extended the data at the motion-layer synthesis. Therefore, in the Unity and MotionBuilder software, the total number of synthesized human activities was increased to 84 by applying data augmentation at the motion-layer synthesis stage, as indicated in Fig. 5. Note that we can synthesize complex, varied, and entirely new sequences of human movements by using the blend tree animation tool in the Unity software, which facilitates seamless transitions between multiple humanoid animations. For the augmented human-motion data (synthesized human activities), we computed the TV spatial trajectories (see Section III-D) and imported them into MATLAB for further data augmentation at the physical- and signal-layer syntheses (see Fig. 5).
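The core idea of rescaling trajectories to a new avatar height can be sketched as below. The uniform-scaling assumption is ours; production tools such as MotionBuilder and Unity retarget skeletons more carefully:

```python
import numpy as np

def rescale_avatar(trajectories, height_src, height_dst):
    """Motion-layer augmentation sketch: uniformly rescale MoCap-driven
    scatterer trajectories to a different avatar height.

    trajectories: (L, N, 3) scatterer positions for the source avatar.
    Returns rescaled positions for the target avatar height. Changing the
    playback speed would additionally require resampling along the
    slow-time axis (not shown).
    """
    scale = height_dst / height_src
    return trajectories * scale

# Rescale a 1.74 m performer's trajectories to a 1.83 m (6 ft) avatar.
traj = np.ones((21, 10, 3)) * 1.74
tall = rescale_avatar(traj, height_src=1.74, height_dst=1.83)
```

Each rescaled trajectory set then feeds the same delay computation and RF synthesis as the original MoCap data.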

B. Physical-Layer Synthesis
The simulation-based framework allows the adjustment of the radar operating parameters and physical-layer configurations, e.g., the PRI, carrier frequency $f_c$, bandwidth $B_w$, and antenna configuration $\{C_i^{Tx}, C_i^{Rx}\}$. Through these adjustments, it is possible both to extend the simulated radar data and to simulate specific scenarios. At the physical-layer synthesis data augmentation stage, appropriate antenna configurations $\{C_i^{Tx}, C_i^{Rx}\}$ were chosen to simulate the two radar subsystems, Radar 1 and Radar 2, as shown in Fig. 6. To maintain consistency with the actual 2 × 2 distributed MIMO radar system depicted in Fig. 1(a), the emulated radar system's operating parameters, such as the PRI, carrier frequency $f_c$, and bandwidth $B_w$, were kept the same.
We first simulated different positions of the radar subsystems, Radar 1 and Radar 2, by using the rotation matrix $R_y(\theta_{R_i})$, which can be expressed as [46]

$$R_y(\theta_{R_i}) = \begin{bmatrix} \cos\theta_{R_i} & 0 & \sin\theta_{R_i} \\ 0 & 1 & 0 \\ -\sin\theta_{R_i} & 0 & \cos\theta_{R_i} \end{bmatrix}$$

where $\theta_{R_i}$ denotes the clockwise angular rotation about the y-axis for Radar $i$, $i = 1, 2$. Initially, the two simulated radar subsystems were placed such that Radar 2 can be simulated by simply rotating Radar 1 counterclockwise by 90° about the y-axis, as illustrated in Fig. 6. Using this method, we emulated a 2 × 2 distributed MIMO radar system, similar to the actual radar system in Fig. 1(a), to simulate the MIMO radar signatures. Note that, with the use of the rotation matrix $R_y(\theta_{R_i})$, any number of radar subsystems, sensors, or nodes can be simulated at the physical-layer synthesis data augmentation stage.
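The rotation can be sketched as follows, using the standard right-handed y-axis rotation matrix (the paper's clockwise sign convention may differ); the antenna position is illustrative:

```python
import numpy as np

def rotation_y(theta_deg):
    """Rotation matrix R_y(theta) about the y-axis (right-handed
    convention)."""
    th = np.deg2rad(theta_deg)
    return np.array([[ np.cos(th), 0.0, np.sin(th)],
                     [        0.0, 1.0,        0.0],
                     [-np.sin(th), 0.0, np.cos(th)]])

# Rotating Radar 1's antenna position by 90 deg about the y-axis yields an
# orthogonal placement, emulating Radar 2 of the 2 x 2 distributed system.
p_radar1 = np.array([2.0, 1.0, 0.0])   # illustrative antenna position
p_radar2 = rotation_y(90.0) @ p_radar1
```

Applying the same matrix to every Tx and Rx antenna position generates any number of additional radar viewpoints.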
Recall that the human activities were initially simulated with the 3-D animation tools in a single direction, i.e., at an aspect angle of 0°. However, to develop a simulated MIMO radar-based direction-independent HAR system, we required multidirectional human activities. Compared to the motion-layer synthesis, the required multidirectional human activities can be simulated more easily and efficiently at the physical-layer synthesis data augmentation stage. The multidirectional human activities are simulated by spatially rotating the transmitter and receiver antennas of the radar subsystem Radar $i$, for $i = 1, 2$. The angular difference between the two radar subsystems is always kept at 90°, i.e., $\theta_{R_1} - \theta_{R_2} = 90°$, as depicted in Fig. 6. The different rotations $(\theta_{R_1}, \theta_{R_2})$ of Radar 1 and Radar 2 correspond to the different directions of the human activities, where $(\theta_{R_1}, \theta_{R_2}) \in [-180°, 180°)$. We simulated 18 different directions of the human activities at the physical-layer synthesis data augmentation stage, namely Direction 1 to Direction 18, as illustrated in Fig. 6. For instance, Direction 11 in Fig. 6 corresponds to one particular pair of rotation angles of the two radar subsystems. To summarize, at the physical-layer synthesis, we first simulated the two radar subsystems, Radar 1 and Radar 2, to emulate the 2 × 2 distributed MIMO radar system. Second, by using the rotation method, we simulated the multidirectional human activities by simultaneously rotating the two radar subsystems, as illustrated in Fig. 6. Thus, our proposed simulation-based framework includes a physical-layer synthesis data augmentation stage, which efficiently and conveniently transforms and augments unidirectional motion data into multidirectional motion data and single-radar data into multiple-radar data.

C. Signal-Layer Synthesis
The signal-layer synthesis data augmentation stage of the proposed simulation-based framework allows us to simulate realistic and diverse TV radial velocity distributions $p_i(v, t)$ (micro-Doppler signatures) for a single human activity. Using (3), we can simulate numerous multipath components corresponding to the stationary and non-stationary scatterers in the received complex baseband signal $s_{b,i}(t', t)$ [see (2)]. In this research, multipath components originating from stationary scatterers, such as walls and furniture, are not considered, as they can be effectively filtered out during the signal preprocessing stage. However, if necessary, the signal-layer synthesis can simulate numerous complex propagation scenarios, e.g., those with or without radar clutter, by adjusting the path gains $a_i^{(l)}$ of the beat signals $s_{b,i}^{(l)}(t', t)$ for Radar $i$. For the five types of synthesized human activities, we first adjusted the values of the time-invariant path gains $a_i^{(l)}$ by examining the actual radar signatures [the TV radial velocity distributions $p_i(v, t)$, see Section VI] and the body surface area [45]. Subsequently, we augmented the simulated radar signatures by varying the power levels (time-invariant path gains $a_i^{(l)}$) of the individual multipath components. Therefore, at the signal-layer synthesis of the proposed simulation-based framework, we augmented the radar data by using different sets of time-invariant path gains $a_i^{(l)}$ for the five types of synthesized human activities.
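Generating several gain sets from a base set can be sketched as below. The multiplicative-jitter scheme is our own illustration; the paper adjusts gains using measured velocity distributions and body surface area:

```python
import numpy as np

def augment_path_gains(base_gains, n_sets, jitter=0.2, seed=0):
    """Signal-layer augmentation sketch: derive several sets of
    time-invariant path gains a^{(l)} by randomly perturbing a base set
    within +/- `jitter` (relative).
    """
    rng = np.random.default_rng(seed)
    factors = 1.0 + jitter * (2.0 * rng.random((n_sets, len(base_gains))) - 1.0)
    return base_gains[None, :] * factors  # shape (n_sets, L)

# Five perturbed gain sets for the L = 21 bodily scatterers.
gain_sets = augment_path_gains(np.ones(21), n_sets=5, jitter=0.2)
```

Each row of `gain_sets` would drive one additional synthesis pass of (2)-(3), yielding a distinct radar signature for the same motion.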
In this section, we discussed three data augmentation techniques implemented at multiple layers of the proposed simulation-based framework: the motion-layer synthesis, the physical-layer synthesis, and the signal-layer synthesis. By applying these multistage data augmentation techniques, we simulated 2826 TV radial velocity distributions $p_i(v, t)$ (micro-Doppler signatures) for each radar subsystem of the 2 × 2 MIMO radar system. In other words, for the two radar subsystems, Radar 1 and Radar 2, a total of 5652 TV radial velocity distributions $p_i(v, t)$ were simulated. To conclude, the multistage data augmentation methods in the proposed simulation-based framework allowed for increased variability, realism, and diversity in the simulated radar dataset. With these methods, we were able to transform and augment the basic motion data (34 MoCap files) into 5652 radar signatures, which indicates the utility of the proposed simulation-based approach for realizing radar-based classifiers.

VI. MIMO RADAR SIGNATURES
In this section, we delineate the radar signal processing module of Fig. 1 that generates the MIMO radar signatures: the range distribution, the TV radial velocity distribution $p_i(v, t)$ (micro-Doppler signature), and the mean velocity (mean Doppler shift). For $i = 1, 2$, the radar signal processing module transforms the actual and the simulated complex baseband signals $s_{b,i}(t', t)$ into the TV radial velocity distributions $p_i(v, t)$. The first step is to compute the beat frequency function $S_{b,i}(f_b, t)$ as [47]

$$S_{b,i}(f_b, t) = \int_{-\infty}^{\infty} s_{b,i}(t', t)\, e^{-j2\pi f_b t'}\, dt' \qquad (5)$$

where $f_b$ refers to the beat frequency. Let $f$ and $f_{b,\max}$ denote the Doppler frequency and the maximum beat frequency, respectively. Then, the micro-Doppler signatures $S_i(f, t)$ are obtained from the beat frequency function $S_{b,i}(f_b, t)$ according to the relation [35]

$$S_i(f, t) = \int_{0}^{f_{b,\max}} \left| \int_{-\infty}^{\infty} S_{b,i}(f_b, t'')\, W_r(t'' - t)\, e^{-j2\pi f t''}\, dt'' \right| df_b \qquad (6)$$

where $t''$ denotes the running time, and $W_r(\cdot)$ denotes a rectangular window function that spans over 64 chirp intervals.
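The discrete-time counterpart of this processing chain (a range FFT over fast time, then a short-time FFT over slow time, integrated over beat frequencies) can be sketched as follows. The window length and hop size are our own illustrative choices apart from the 64-chirp window the text mentions:

```python
import numpy as np

def micro_doppler(beat_matrix, win_len=64, hop=16):
    """Sketch of the micro-Doppler computation: an FFT over fast time
    yields the beat frequency function; a rectangular short-time FFT over
    `win_len` chirps of slow time, with magnitudes summed over beat
    frequencies, yields the micro-Doppler signature.

    beat_matrix: (N_c, N_s) raw IQ matrix (chirps x fast-time samples).
    Returns an array of shape (win_len, n_frames).
    """
    rng_prof = np.fft.fft(beat_matrix, axis=1)          # beat frequency function
    frames = []
    for k in range(0, beat_matrix.shape[0] - win_len + 1, hop):
        seg = rng_prof[k:k + win_len, :]                # 64 chirps of profiles
        dopp = np.fft.fftshift(np.fft.fft(seg, axis=0), axes=0)
        frames.append(np.abs(dopp).sum(axis=1))         # integrate over f_b
    return np.stack(frames, axis=1)

# A constant (zero-Doppler) input concentrates all energy at the center bin.
S = micro_doppler(np.ones((256, 32), dtype=complex))
```

For a constant input, all of the energy lands in the zero-Doppler row, as expected for a stationary scene.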
According to [43], the TV radial velocity distribution $p_i(v, t)$ can be obtained as

$$p_i(v, t) = S_i(f, t)\big|_{f = 2v/\lambda} \qquad (7)$$

where $v$ denotes the radial velocity. From the TV radial velocity distribution $p_i(v, t)$ in (7), we can compute the TV mean radial velocity $\bar{v}_i(t)$ as [43]

$$\bar{v}_i(t) = \frac{\int_{-\infty}^{\infty} v\, p_i(v, t)\, dv}{\int_{-\infty}^{\infty} p_i(v, t)\, dv}. \qquad (8)$$

For Radar $i$, the TV beat-frequency signatures $S'_i(f_b, t)$ can be computed as

$$S'_i(f_b, t) = \int_{-\mathrm{PRF}/2}^{\mathrm{PRF}/2} \left| \int_{-\infty}^{\infty} S_{b,i}(f_b, t'')\, W_r(t'' - t)\, e^{-j2\pi f t''}\, dt'' \right| df \qquad (9)$$

where PRF is the pulse repetition frequency of the radar system, and $i = 1, 2$. Finally, for the 2 × 2 MIMO radar system, the TV range distribution $p'_i(r, t)$ can be obtained as [42]

$$p'_i(r, t) = S'_i(f_b, t)\big|_{f_b = 2\gamma r/c_0}. \qquad (10)$$

Recall that the real (simulated) point scatterers on the human (avatar) body segments, each with unique TV radial velocity components, scatter the transmitted RF signal back to the receiver antennas of the 2 × 2 distributed MIMO radar system. For Radar $i$ and the L distinct non-stationary real (simulated) point scatterers, the TV radial velocity distribution $p_i(v, t)$ in (7) indicates the strengths of the radial velocity components over the slow time $t$ (see Fig. 7). The TV mean radial velocity $\bar{v}_i(t)$ in (8), obtained from the TV radial velocity distribution $p_i(v, t)$, shows the weighted average of the velocity components of all L real (simulated) bodily scatterers over the slow time $t$ (see Fig. 8). Moreover, the strengths of the TV radial distances of all L non-stationary real (simulated) point scatterers over the slow time $t$ are provided by the TV range distributions $p'_i(r, t)$. Due to the current practical limitations of radar systems, the TV range distributions $p'_i(r, t)$ are not usually used to realize HAR systems, so their simulation results are omitted for brevity. However, for completeness and possible future applications, we have included the expression in (10) to simulate the TV range distribution $p'_i(r, t)$. In Section V, we saw that multidirectional human activities can be simulated by simultaneously rotating the two radar subsystems, Radar 1 and Radar 2, as shown in Fig. 6. For some of the 18 directions and all five types of simulated (actual) human activities, the simulated (actual) TV radial velocity distributions, $p_1(v, t)$ and $p_2(v, t)$, are shown in Fig. 7(a) [Fig. 7(b)]. The images of the simulated (actual) TV radial velocity distributions, $p_1(v, t)$ and $p_2(v, t)$, were used to train (test) the proposed 2 × 2 MIMO radar-based direction-independent HAR system. In Section VII, the two colored images of the TV radial velocity distributions, $p_1(v, t)$ and $p_2(v, t)$, serve as input feature maps to the HAR system. Moreover, for the five types of human activities and the two radar subsystems, Radar 1 and Radar 2, the simulated and actual TV mean radial velocities $\bar{v}_i(t)$ are depicted in Fig. 8. The utility and effectiveness of the proposed simulation-based framework are evident from the high-fidelity simulated radar signatures, which closely resemble the actual radar signatures, as exemplified by Figs. 7 and 8.
To quantitatively assess the similarity between simulated and real radar signatures, we employ the dynamic time warping (DTW) algorithm [48]. Table I presents the normalized DTW distances between the real and simulated TV mean radial velocities $\bar{v}_i(t)$ from Fig. 8 across the five human activities. The DTW distance remains small for each activity, indicating a close resemblance between the simulated and real radar signatures. For example, for the sitting activity, a DTW distance of 0.01 between the simulated and real TV mean radial velocities $\bar{v}_i(t)$ demonstrates precise replication of this pattern. This consistent trend across all activities confirms the accuracy of our approach in simulating realistic radar data.
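A textbook DTW variant with an absolute-difference local cost can be sketched as below; it is a stand-in for the exact DTW formulation and normalization used in [48] and Table I:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences.

    Fills the cumulative cost matrix D, where each cell adds the local cost
    |a[i] - b[j]| to the cheapest of the three admissible predecessors.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Identical mean-velocity curves give distance 0; a time-warped copy of the
# same curve also aligns perfectly under DTW.
d_same = dtw_distance([0.0, 1.0, 2.0, 1.0], [0.0, 1.0, 2.0, 1.0])
d_warp = dtw_distance([0.0, 1.0, 2.0, 1.0], [0.0, 0.0, 1.0, 2.0, 1.0])
```

Applied to a simulated and a real $\bar{v}_i(t)$ curve, a small distance indicates close temporal agreement even under slight timing differences.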

VII. SIMULATION-BASED HAR SYSTEM
This section elucidates the training and testing phases of our simulation-based direction-independent HAR system, which was realized by using a DCNN-based multiclass classifier. First, we look into the design of the HAR classifier and its training with the simulated radar dataset. Then, to demonstrate the practical importance and the generalizability of our proposed simulation framework in real-world scenarios, we use a real 2 × 2 MIMO radar dataset to evaluate the classification performance of the trained simulation-based direction-independent HAR system.

A. Design of the Simulation-Based HAR System
To develop a simulation-based HAR system, we first created a large labeled dataset of simulated radar signatures. For Radar $i$ ($i = 1, 2$) of the 2 × 2 MIMO radar system and the five types of humanoid activities, we simulated 2826 TV radial velocity distributions $p_i(v, t)$ by using the proposed multistage data augmentation techniques of our simulation-based framework (see Section V). Thus, the simulated radar dataset consisted of a total of 5652 simulated TV radial velocity distributions $p_i(v, t)$, which were used to train the proposed simulation-based direction-independent (multiperspective) HAR classifier. The simulation-based direction-independent HAR system comprises two feature extraction networks (FENs) and a multilayer perceptron (MLP) network. Fig. 9(a) illustrates the FEN that computes relevant features from the simulated (actual) TV radial velocity distributions $p_i(v, t)$ during the training (testing) phase for the $i$th radar subsystem, Radar $i$. It consists of four convolutional layers, containing 64, 72, 80, and 96 2-D trainable kernels with dimension $k_d$ equal to either 4 × 4 or 3 × 3 pixels. Each 2-D kernel uses the rectified linear unit (ReLU) activation function to avoid the problem of vanishing gradients [49]. The max-pool layers were employed to reduce redundancies in the feature maps. To avoid overfitting the training data, we used dropout layers with dropout rates of 10% and 15% for the FEN and MLP, respectively. The flatten layer of our FEN rearranges the extracted features into a vector of order 18816 × 1, as shown in Fig. 9(a).
The two FENs in the DCNN-based multiperspective HAR system are identical, as shown in Fig. 9(b). As Radar 1 and Radar 2 illuminate the indoor environment from multiple perspectives, the extracted features from the two TV radial velocity distributions, $p_1(v, t)$ and $p_2(v, t)$, are merged by the multiperspective feature fusion block, as shown in Fig. 9(b). Subsequently, based on the received multiperspective features, the MLP network is trained to detect the type of human activity. The multiperspective feature fusion block enables the HAR classifier to recognize the human activities regardless of their directions. Note that the design of this multiperspective deep neural network closely resembles the architecture reported in [35]. To train the parameters of our DCNN-based multiperspective HAR classifier, we used the adaptive moment estimation (Adam) optimizer [50] and the simulated radar signatures of multidirectional human activities. The training dataset, comprising 2826 pairs of simulated TV radial velocity distributions $p_i(v, t)$, was further divided into training and validation subsets in an 80 : 20 ratio. During the training phase, our DCNN-based multiperspective HAR classifier showed no signs of overfitting, as demonstrated by the training and validation curves in Fig. 10.

B. Testing of the Simulation-Based HAR System
To evaluate the performance of the trained 2 × 2 MIMO radar-based multiperspective HAR classifier in a real-world setting, we used a real radar dataset recorded by the Ancortek SDR-KIT 2400T2R4, as shown in Fig. 2. The operating parameters and antenna configurations of the real and the simulated 2 × 2 MIMO radar systems were kept similar for consistency. Specifically, we set the PRI, carrier frequency $f_c$, and bandwidth $B_w$ of the real and simulated MIMO radar systems to 0.5 ms, 24.125 GHz, and 250 MHz, respectively. The antennas of Radar 1 and Radar 2 were placed at the same positions as their simulated counterparts. A total of 875 multidirectional human activities were recorded with the 2 × 2 MIMO radar system from six human subjects, including a female participant. Thus, the real radar dataset consisted of 1750 TV radial velocity distributions $p_i(v, t)$ (micro-Doppler signatures) for the two radar subsystems, Radar 1 and Radar 2. As a direct result of this extensive measurement campaign, the simulation-real (training-testing) data ratio came out to be approximately 76 : 24. Our simulation-based framework enabled the realization of the simulation-based direction-independent HAR system, which exhibited remarkable performance and efficacy in the real world, as demonstrated by the confusion matrix in Fig. 11. For each of the five types of multidirectional human activities, the number of correct classifications is represented by the first five diagonal entries of the confusion matrix. The green-colored entries in the last row and column of Fig. 11 exhibit the precision and recall [51]. Finally, the white-colored entry of the confusion matrix shows the overall classification accuracy of our simulation-based direction-independent HAR system, which is 97.83%. As our test dataset was sufficiently balanced, the macro-average F1-score [52] came out to be approximately 97.6%, which is close to the overall classification accuracy.

TABLE II
COMPARING THE CLASSIFICATION PERFORMANCE OF STATE-OF-THE-ART RF-BASED HAR APPROACHES
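The metrics read off the confusion matrix can be sketched generically as below; the toy 2 × 2 matrix is illustrative and not the matrix of Fig. 11:

```python
import numpy as np

def precision_recall_f1(cm):
    """Compute per-class precision/recall and the macro-averaged F1-score
    from a square confusion matrix (rows: true class, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                    # correct classifications per class
    precision = tp / cm.sum(axis=0)     # column sums: predicted counts
    recall = tp / cm.sum(axis=1)        # row sums: true counts
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1.mean()

# Toy 2-class example: 9 + 8 correct out of 20 samples.
cm = [[9, 1],
      [2, 8]]
prec, rec, macro_f1 = precision_recall_f1(cm)
accuracy = np.trace(np.asarray(cm)) / np.sum(cm)  # -> 0.85
```

For a balanced test set, the macro-average F1-score stays close to the overall accuracy, as observed for our HAR system.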
For RF-based HAR systems, asserting the superiority of one method proves challenging, given their tailored design to address diverse research challenges. Nonetheless, Table II presents the performance of various contemporary HAR systems, using classification accuracy for comparison. Notably, both the measurement-based HAR methods and those partially utilizing simulated data demonstrate strong classification accuracies. The Vid2Doppler [53] method, which translates video to radar data, achieves an accuracy of 81.4%, while our simulation-based approach, converting MoCap data to radar data, achieves a higher accuracy of 97.8%, both utilizing entirely simulated training data.
This section demonstrated the utility and efficacy of the simulation-based framework in the real world. The classification accuracy of the simulation-based direction-independent HAR system is comparable to that of current HAR systems [18], [59], with the additional consideration of the multidirectional HAR problem. Moreover, our simulation-based framework is unique in its ability to generate realistic, diverse, and unlimited labeled MIMO radar datasets with software-defined operating parameters and configurations. Therefore, the proposed simulation-based framework in Fig. 1(b) can be readily used to develop other SISO and MIMO radar-based classifiers, e.g., for sign language detection.

VIII. CONCLUSION
The progression of SISO and MIMO radar-based classifiers is primarily impeded by the unavailability of large labeled training datasets. Therefore, as a proof-of-concept, we have presented in this work a simulation-based approach to address the concern of data scarcity for monostatic, bistatic, and multistatic SISO and MIMO radar systems. Although our focus was on realizing a 2 × 2 MIMO radar-based direction-independent HAR system, the utility of our simulation-based framework extends beyond HAR applications.
The proposed simulation-based framework provides the flexibility to synthesize software-defined human movements using MoCap data-driven activity simulation. We proposed a MIMO channel model to convert simulated 3-D trajectories into received RF signals, while considering a user-defined antenna configuration of a distributed MIMO radar system and the multipath components emanating from the non-stationary simulated point scatterers. The synthesized RF signals were further processed to simulate the multiperspective MIMO radar signatures used to implement our simulation-based direction-independent HAR system.
To generate a diverse training dataset for radar-based HAR systems, we introduced multistage data augmentation techniques at the motion-layer synthesis, physical-layer synthesis, and signal-layer synthesis within our simulation-based framework. The multistage data augmentation techniques provide complete control over various factors, such as avatar size, location, velocity, acceleration, PRI, and radar antenna configuration. By using these techniques, we augmented the basic MoCap data to 5652 micro-Doppler signatures, drastically reducing the overall training workload and demonstrating the effectiveness of our simulation-based approach for realizing radar-based classifiers. Our MIMO radar-based HAR system trained on the simulated micro-Doppler signatures achieved a classification accuracy of 97.83% when tested with actual radar data. As our study eliminates the need for direct involvement of human participants and an actual radar system, we believe that the proposed proof-of-concept will be of great importance for training future SISO/MIMO radar-based classifiers.
Our MIMO channel model opens up new research perspectives for modeling received RF signals at the scatterer level.For example, future studies can explore the optimization of scatterer-level parameters, such as the simulated TV path gains.A limitation of this research is that the methods discussed are not directly applicable to the moving clutter scenario where the radar antennas are non-stationary.This research gap is beyond the scope of this work and can be addressed in future studies.

Manuscript received 11 March 2024; accepted 5 April 2024. Date of publication 15 April 2024; date of current version 15 May 2024. This work was supported by the Research Council of Norway within the scope of the CareWell Project under Grant 300638. The associate editor coordinating the review of this article and approving it for publication was Prof. Takuya Sakamoto. (Corresponding author: Sahil Waqar.)

Fig. 1 .
Fig. 1. (a) Design of conventional (experimental-based) direction-independent HAR systems that require human subjects and a MIMO radar system for their training. (b) Design of the proposed simulation-based HAR system that requires the simulated radar signatures for its training.

Fig. 2 .
Fig. 2. Testing phase of both experimental and simulation-based direction-independent HAR systems. In the testing phase, the performance of the simulation-based HAR system is evaluated against unseen real radar signatures.

Fig. 3 .
Fig. 3. Virtual 3-D propagation environment comprising a non-stationary avatar with 21 simulated point scatterers on its body segments and a simulated 2 × 2 multiperspective MIMO radar system.
The $l$th TV propagation delay $\tau_i^{(l)}(t)$ depends solely on the spatial trajectory of the $l$th marker. Therefore, when a person suddenly falls, the abrupt change in the spatial positions of the upper-body segments is reflected in the corresponding TV propagation delays $\tau_1^{(l)}(t)$, as illustrated in Fig. 4. Likewise, in Fig. 4, the TV propagation delays $\tau_1^{(l)}(t)$ of the walking activity demonstrate its repetitive nature.

Fig. 4 .
Fig. 4. Simulated TV propagation delays $\tau_1^{(l)}(t)$ of the L simulated point scatterers for the five distinct human activities and Radar 1.


Fig. 10 .
Fig. 10. Training history of our simulation-based direction-independent HAR system.

Fig. 11 .
Fig. 11. Confusion matrix of our simulation-based multiperspective HAR classifier with a classification accuracy of 97.83%.

TABLE I
DTW DISTANCE METRIC CALCULATED FOR THE SIMULATED AND REAL (ACTUAL) TV MEAN RADIAL VELOCITIES $\bar{v}_i(t)$ OF FIG. 8