Synthetic Sensor Data Generation Exploiting Deep Learning Techniques and Multimodal Information

In recent years, deep learning techniques have revolutionized the field of data generation, including the creation of synthetic sensor data. The ability to generate large quantities of diverse, high-quality data has significant implications in fields such as robotics and computer vision. Synthetic sensor data generation using deep learning techniques involves training a model to generate data that closely resemble real-world sensor data. This is achieved by feeding the model large amounts of real-world data and using it to learn the underlying patterns and structures in the data. Once trained, the model can generate data that are similar in quality and complexity to the original data, but with added variations and noise to increase diversity and realism. Several deep learning techniques, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and recurrent neural networks (RNNs), have shown impressive results in generating synthetic data for a range of sensors. In this letter, deep learning techniques based on autoregressive convolutional recurrent neural networks (CRNNs) for multivariate time-series prediction have been exploited to generate synthetic data for ultrawide band (UWB) and ultrahigh frequency radio frequency identification (UHF-RFID) sensors. The neural network presented here incorporates measurements from sensors and heterogeneous information, such as the position of the antennas and tags in the environment, to generate synthetic data that can be used to augment real-world data, increasing the diversity and robustness of datasets. The deep generation approaches presented here can help researchers generate datasets to validate algorithms without the need for expensive and time-consuming data collection.


I. INTRODUCTION
The use of sensors has become increasingly prevalent in various industries, from health care to manufacturing. With the proliferation of sensors comes an abundance of data, which can be analyzed to gain insights and make informed decisions. However, the collection and analysis of sensor data can be challenging, especially when dealing with rare events or situations that are difficult to replicate. To overcome these challenges, researchers have turned to synthetic sensor data generation using deep learning techniques. This involves using algorithms to create realistic, simulated sensor data that can be used for a variety of purposes. In this letter, we will explore the latest advancements in synthetic sensor data generation exploiting deep learning techniques. We will discuss the approach and methodology that have been developed, as well as the challenges and limitations that still need to be addressed. In addition, we will examine the potential applications of synthetic sensor data generation in two real experiments involving ultrahigh frequency radio frequency identification (UHF-RFID) and ultrawide band (UWB) sensors. Ultimately, the aim of this letter is to shed light on this emerging field and highlight its potential for advancing sensor technology and data analysis. The innovative contribution of this letter is twofold. First, we present an effective solution to synthetic data generation with deep neural networks (DNNs) and its application to UHF-RFID and UWB sensors, which, to the best of our knowledge, has not been explored in the literature. Second, the presented DNN combines time-series data and multimodal information (for instance, the positions of the UHF-RFID and UWB antennas) as inputs to the neural network. This allows the presented DNN to embody unmodeled dependencies in the sensor measurements (e.g., multipath, bias, etc.) that would not be captured by an incomplete model used to generate synthetic data.
Different methodologies for synthetic data generation have been presented in [1] and [2]. These surveys present several applications for synthetic data generation in health care, drugs, vision, audio, and business, but there is no mention of any application in the context of sensors or robotics, which is the main field of application of the DNN presented in this letter. Furthermore, the DNN architectures presented in the reviews include the generative adversarial network (GAN), variational autoencoder (VAE), recurrent neural network (RNN), and long short-term memory (LSTM) network, but there is no mention of the convolutional recurrent neural network (CRNN) for multivariate time series, which is the architecture presented in this letter. Therefore, the novelty of the presented work is twofold: the field of application is new and extends the capabilities of DNNs, and the presented DNN architecture has not been used before in synthetic data generation for sensors. Throughout the years, there has been a growing interest in synthetic data generation across a range of domains. Classical methodologies for sensor data generation have been presented in [3], where sensor models have been improved to generate synthetic data. In recent years, deep learning techniques have been exploited for sensor data processing (as in [4] and [5]) and, in the field of sensor data generation, several studies have explored the use of generative models to generate synthetic data that closely resemble real-world sensor data, as in [6] and [7]. Furthermore, deep generative models have been studied in [8] and [9] and provided the basis for a new approach to data generation. Deep generation has been exploited in several applications, such as generative modeling of images [10].
In the field of sensor data generation, other studies have explored the use of RNNs for time-series data generation [11], as well as the combination of multiple deep learning models for generating data from multiple sensors using, for example, GANs, as in [12]. In the context of robotics and autonomous systems, deep learning techniques such as convolutional neural networks (CNNs) have been used for the 3-D localization of RFID antennas [13] and to detect obstacles with UWB using long short-term memory neural networks, as in [14]. In this letter, we focus on two specific typologies of sensors: UWB, which is widely used in robotics for localization and mapping tasks, as in [15], and UHF-RFID, which has been used in autonomous systems for simultaneous localization and mapping (SLAM) tasks, as in [16].

II. SENSOR DATA DEEP GENERATION
Sensor data deep generation is the process of synthesizing new data using a combination of sensory inputs. This approach involves training deep learning models to learn the underlying patterns and relationships between different types of data. By combining these different sources of information, it becomes possible to generate more accurate and realistic representations of the real world. The process of sensor data deep generation presented in this letter has been obtained by exploiting CRNNs. The model is trained on large datasets of multimodal data to learn the complex patterns and relationships between different types of data.

A. Neural Network Architecture
The proposed architecture for time-series data from sensors is a multivariate CRNN (an extension of the traditional CRNN architecture) designed to handle multiple input signals and variables. Unlike a standard CRNN that takes a single input signal, a multivariate CRNN takes multiple input signals and combines them to learn a joint representation of the data. The input signals in the proposed multivariate CRNN come from sensors, different modalities of data, or multiple channels of information; in our case, in addition to sensor measurements, the other inputs to the CRNN are the robot orientation, position, and the UHF-RFID and UWB tag positions. The convolutional layer in the proposed multivariate CRNN is used to extract features from each of the input signals independently. The recurrent layer is used to capture the temporal dependencies between the different input signals. By combining the information from multiple input signals, the network can learn to extract a joint representation of the data that is more informative than any individual input signal alone. The fusion of convolutional and recurrent layers in the presented architecture allows the network to benefit from both spatial and temporal information simultaneously. The convolutional layers capture spatial features and patterns, while the recurrent layers model temporal dependencies. By combining these components, the network can effectively extract and model spatiotemporal relationships in sensor data, leading to improved performance. The novelty presented in this letter is that the multivariate CRNN has been extended to a deep generation task. Fig. 1 shows the structure of the multimodal CRNN network; here, the inputs to the network have been left generic in order to underline the ability of the network to generalize to any kind of sensor data. The CRNN network for n inputs contains the following.
1) n 1-D-convolution layers: Perform temporal convolution on the input data. Each convolution layer has an input of dimension N_s (number of time steps) and consists of 64 output filters with a kernel size of 2.
2) n Activation layers: The activation function for these layers is the rectified linear unit (ReLU), widely used in neural networks and given by ReLU(x) = max(0, x).
3) n Max pooling layers: Each feature map's dimensionality is reduced while the most crucial information is kept by the spatial pooling. The spatial window size for the max pooling layers is 2.
4) n Dropout layers: These layers have been added to prevent overfitting.
5) Concatenate layer: This layer concatenates the outputs of the n convolutional branches, generating a single tensor.
6) LSTM layer: The long short-term memory layer is introduced to model long-term dependencies between the various input signals and to learn the relationships between them over time. The dimensionality of the output space is 200.
7) Activation layer: The activation function after the LSTM layer is the tanh function, which has been introduced to help mitigate the vanishing gradient problem and stabilize the training process, since the input and output data are normalized between −1 and 1. The tanh function is given by tanh(x) = (e^x − e^−x)/(e^x + e^−x).
8) Dense layer: The convolutional and LSTM layers are merged into a dense (fully connected) layer. This layer flattens the high-level features that were learned by previous layers. Its single output represents the generated sensor measurement.
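The shape flow through the convolutional front end described above can be sketched in NumPy. This is a minimal illustration with random weights, not the trained model: the layer sizes (64 filters, kernel size 2, pooling window 2) follow the list above, while N_s = 100 and n = 5 are arbitrary assumptions, dropout is inactive at inference, and the LSTM(200) and dense head are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    # x: (steps,); kernels: (n_filters, k) -> output: (steps - k + 1, n_filters)
    k = kernels.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return windows @ kernels.T

def relu(x):
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    # non-overlapping temporal max pooling with window size 2
    steps = (x.shape[0] // size) * size
    return x[:steps].reshape(-1, size, x.shape[1]).max(axis=1)

def branch(x, kernels):
    # one convolutional branch: conv -> ReLU -> max pool
    # (dropout is only active during training and is therefore omitted)
    return max_pool(relu(conv1d(x, kernels)))

n_inputs, n_steps, n_filters, k = 5, 100, 64, 2   # N_s = 100 is an arbitrary choice
inputs = [rng.standard_normal(n_steps) for _ in range(n_inputs)]
kernels = [rng.standard_normal((n_filters, k)) for _ in range(n_inputs)]

# concatenate the n branch outputs into a single tensor (items 1-5 of the list);
# in the full model this tensor would feed the LSTM(200) and dense layers
features = np.concatenate([branch(x, w) for x, w in zip(inputs, kernels)], axis=1)
print(features.shape)  # (49, 320)
```

Each branch turns 100 time steps into 49 pooled steps of 64 features, and concatenating the 5 branches yields the 320-dimensional joint representation consumed by the recurrent part.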

B. Hyper-Parameters Selection
In Section II-A, we described the DNN architecture, explaining the choices in its topology. In this section, we describe the process of hyper-parameter selection used to optimize the learning process. The selected optimizer is the Adam optimizer, which calculates an exponential moving average of the gradient and of the squared gradient, using two hyper-parameters (β1 and β2) to control the decay rates of these moving averages. The other hyper-parameters used by the Adam optimizer are the learning rate and a small number ε that prevents any division by zero in the implementation. The remaining hyper-parameters involved in the learning process are the batch size and the number of epochs. In order to select the optimal combination of hyper-parameters, we used the grid search method: each hyper-parameter combination corresponds to a single model and is said to lie on a grid point. We used cross-validation to train and evaluate each of these models. For the grid search method, the parameter intervals and grid steps are as follows: for the epochs, the interval is [10, 250] with step 10; for the batch size, [10, 60] with step 10; for the learning rate, [0.0001, 0.05] with step 0.001; for β1, [0.3, 0.99] with step 0.01; and for β2, [0.3, 0.99] with step 0.01. From the output of this process, for the UHF-RFID dataset, the number of epochs is 200, the batch size is 40, the learning rate is 0.002, β1 is 0.89, β2 is 0.997, and ε is 10^−8. For the UWB dataset, the number of epochs is 20, the batch size is 30, the learning rate is 0.001, β1 is 0.91, β2 is 0.998, and ε is 10^−8.
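The grid search described above can be sketched as follows. The grid mirrors the intervals and steps given in the text; `grid_search` and `score_fn` are hypothetical names, and in practice the score function would train the CRNN for a given configuration and return a cross-validated validation loss.

```python
from itertools import product

# Search grid mirroring the intervals and steps stated in the text.
# Note: the full grid has hundreds of millions of points, so exhaustive
# evaluation is only practical on reduced grids or with cheap score functions.
grid = {
    "epochs": list(range(10, 251, 10)),
    "batch_size": list(range(10, 61, 10)),
    "lr": [round(0.0001 + 0.001 * i, 4) for i in range(50)],
    "beta1": [round(0.3 + 0.01 * i, 2) for i in range(70)],
    "beta2": [round(0.3 + 0.01 * i, 2) for i in range(70)],
}

def grid_search(grid, score_fn):
    # Evaluate every grid point and keep the configuration with the
    # lowest score (e.g., mean cross-validated validation loss).
    best_params, best_score = None, float("inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params
```

For example, `grid_search({"a": [1, 2, 3], "b": [10, 20]}, lambda p: abs(p["a"] * p["b"] - 40))` returns `{"a": 2, "b": 20}`, the point minimizing the toy score.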

C. Overfitting
During training, a method to mitigate the risk of overfitting has been adopted: dropout. This technique involves randomly dropping out a proportion of the neurons in each layer during training, thereby forcing the remaining neurons to learn more robust and independent representations of the data, and ultimately improving the model's ability to generalize to new, unseen data. In the presented network architecture, a dropout layer has been introduced for each input with a dropout rate of 0.1; this parameter has been chosen after several training runs, as it showed the best results.

D. Loss Function
The goal of training a neural network is to find the parameters (weights and biases) that minimize the loss function, which means the model is able to make accurate predictions on the training data. In this letter, we have chosen a combination of the mean squared error (MSE) and the Pearson correlation coefficient in order to penalize the model for making large prediction errors and, at the same time, to maximize the correlation between predicted and actual values. The selected loss function is

L = (1/m) Σ_{i=1}^{m} (y_i − ŷ_i)^2 − [Σ_{i=1}^{m} (y_i − ȳ)(ŷ_i − ŷ̄)] / √[Σ_{i=1}^{m} (y_i − ȳ)^2 · Σ_{i=1}^{m} (ŷ_i − ŷ̄)^2]

where m is the number of samples, y_i is the true label for the ith sample, ŷ_i is the predicted label for the ith sample, ȳ is the mean of the true labels, and ŷ̄ is the mean of the predicted labels. In addition to the loss function, the DNN training process also involves an optimizer, which is responsible for updating the model parameters to minimize the loss function.
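The combined loss above can be implemented in a few lines. This is a minimal NumPy sketch; the equal weighting of the two terms (`alpha = 1`) is an assumption, since the letter does not state how the MSE and correlation terms are balanced.

```python
import numpy as np

def mse_pearson_loss(y_true, y_pred, alpha=1.0):
    # The MSE term penalizes large prediction errors; subtracting the
    # Pearson correlation coefficient rewards predictions that co-vary
    # with the targets. alpha (the balance between the two terms) is an
    # assumption not specified in the letter.
    mse = np.mean((y_true - y_pred) ** 2)
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    r = np.sum(yt * yp) / np.sqrt(np.sum(yt ** 2) * np.sum(yp ** 2))
    return mse - alpha * r
```

For a perfect prediction the MSE term vanishes and the correlation equals 1, so the loss attains its minimum of −alpha; this is what drives the optimizer toward both small errors and high correlation.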

III. EXPERIMENTS
In this section, the experiments for the UHF-RFID and UWB cases are presented. It is important to note that, in order to ensure good generalization ability, we designed the training and test samples so that they are drawn from the same underlying distribution. We used 70% of the data for training and the remaining 30% for evaluating the model's performance. Furthermore, we captured temporal variability in both training and testing samples and randomly shuffled the training data in order to prevent the CRNN from overfitting to specific temporal patterns present in the training set. Finally, we also used regularization techniques (dropout, as mentioned in Section II-C) in order to reduce the network's sensitivity to specific training samples and to achieve a more robust generalization capability.
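The 70/30 split with training-set shuffling described above can be sketched as follows; `split_and_shuffle` is a hypothetical helper name, and the chronological-split-then-shuffle order shown here is one plausible reading of the procedure.

```python
import numpy as np

def split_and_shuffle(X, y, train_frac=0.7, seed=0):
    # Hypothetical helper: take the first 70% of the data for training,
    # then shuffle only the training portion so the CRNN does not
    # overfit to specific temporal patterns; the held-out test
    # sequence keeps its original order for evaluation.
    n_train = int(len(X) * train_frac)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_train)
    return X[:n_train][perm], y[:n_train][perm], X[n_train:], y[n_train:]
```

Shuffling inputs and targets with the same permutation keeps each input window paired with its measurement while destroying the ordering the recurrent layers could otherwise memorize.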

A. Ultrahigh Frequency Radio Frequency Identification
In this experiment, a UHF-RFID reader, with the antenna pointing towards the ceiling, has been mounted on a unicycle-like robot rotating around the axis of the antenna; an RFID passive tag has been mounted on the ceiling in correspondence with the antenna on the robot. The UHF-RFID reader is able to measure the phase shift φ of the RFID signal backscattered by the RFID tag, which is the measurement that we want to generate through the CRNN. The number of inputs n is 5, comprising the antenna rotation angle θ (from the wheel encoder readings), the position (x_Tr, y_Tr, z_Tr) of the RFID passive tag mounted on the ceiling of the environment, and the phase offset φ_o, which depends on the hardware.
1) Training Results: The training data obtained from the acquired sensors consist of multiple acquisitions (> 30) of about 1-min-long time series for the n inputs previously defined, with the phase shift φ of the RFID signal backscattered by the RFID passive tag as the output. These data have been used to train the CRNN on an Intel Core i7-12700H, 2.3 GHz, 14 cores, and 16 GB RAM with an NVIDIA GeForce RTX 3050 Ti GPU. The training time is around 28 min. The training and validation losses for 200 epochs of the CRNN are depicted in Fig. 2(a). The model is not overfitting, as the validation loss is decreasing and the gap between training and validation losses is small.
2) Testing Results: The trained CRNN network has been tested with real measurements of θ, x_Tr, y_Tr, z_Tr, and φ_o as inputs; the output of the network is a deep-generated phase φ that closely resembles the real-world data, as shown for three cases in Fig. 3.

B. Ultrawide Band
A UWB antenna has been mounted on a unicycle-like robot moving in a cluttered indoor environment (10 m × 6 m), where UWB tags have been placed in known positions. The UWB antenna is able to read ranges from each UWB tag. The number of inputs n is 6, comprising the robot position (x_r, y_r, z_r) and the position (x_Tu, y_Tu, z_Tu) of the UWB tag.
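The assembly of the n = 6 UWB input channels can be illustrated as follows. This is a hypothetical sketch (`build_uwb_inputs` is an assumed helper name): the robot trajectory varies over time, while the fixed tag position is broadcast along the time axis so that every channel is a time series of equal length, matching the per-branch inputs of the multivariate CRNN.

```python
import numpy as np

def build_uwb_inputs(robot_xyz, tag_xyz):
    # robot_xyz: (steps, 3) robot trajectory; tag_xyz: (3,) fixed tag
    # position, repeated along the time axis so all six channels
    # have the same number of time steps
    steps = robot_xyz.shape[0]
    tag = np.broadcast_to(tag_xyz, (steps, 3))
    channels = np.concatenate([robot_xyz, tag], axis=1)  # (steps, 6)
    # one (steps, 1) series per convolutional branch of the CRNN
    return [channels[:, i:i + 1] for i in range(channels.shape[1])]
```

The same pattern applies to the UHF-RFID case, where the constant tag position (x_Tr, y_Tr, z_Tr) and phase offset φ_o are broadcast alongside the time-varying rotation angle θ.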
1) Training Results: The training data obtained from the acquired sensors consist of multiple acquisitions (> 50) of about 5-min-long time series for the previously defined inputs, with the range measurement ρ from the UWB antenna to the tag as the output. These data have been used to train the CRNN on the same hardware described in Section III-A1. The training time is around 35 min. The training loss for 20 epochs of the CRNN is depicted in Fig. 2(b). The model is not overfitting, as the validation loss is decreasing and the gap between training and validation losses is small.
2) Testing Results: The trained CRNN network has been tested with real measurements of (x_r, y_r, z_r, x_Tu, y_Tu, z_Tu) as inputs; the output of the network is a deep-generated range ρ that closely resembles the real-world data, as shown for four test datasets in Fig. 4.

C. Generalization
A critical aspect of a data generation technique is its ability to generalize well to unseen scenarios and data distributions. In order to evaluate the generalization performance, we conducted experiments to assess the model's ability to produce plausible sensor data in various conditions that were not present in the training dataset. To assess the generalization ability of our CRNN architecture, we collected a diverse set of sensor data from previously unseen environments. These environments consisted of varying layouts and structures that were not encountered during the model training phase. By evaluating the synthetic sensor data generated by the CRNN architecture in these unseen environments, we could determine the model's ability to capture the underlying patterns and generate realistic sensor readings. In addition, we conducted a cross-domain evaluation to assess the model's ability to generalize across different scenarios and modalities. This evaluation involved generating sensor data for both the UWB and UHF-RFID modalities and examining the similarities and differences between the synthetic and real sensor data. In order to quantify the generalization ability of the proposed workflow, we employed the root mean square error (RMSE) as a performance metric. In the case of UHF-RFID, the average RMSE over the testing dataset is 0.09 rad, while for the UWB case, the average RMSE is 0.06 m. The results of our evaluation on unseen environments and cross-domain scenarios demonstrate the strong generalization ability of the proposed CRNN architecture.
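The RMSE metric used above is standard; a minimal implementation for comparing a real measurement series with its deep-generated counterpart:

```python
import numpy as np

def rmse(y_true, y_pred):
    # root mean square error between the real and the deep-generated
    # series (rad for the UHF-RFID phase, m for the UWB range)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Averaging this value over all test acquisitions gives the per-modality figures reported above (0.09 rad for UHF-RFID, 0.06 m for UWB).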

IV. CONCLUSION
In this letter, deep learning techniques based on autoregressive convolutional recurrent neural networks (CRNNs) have been exploited to deep-generate synthetic data for two typologies of sensors: UHF-RFID and UWB. The results in the letter show that the presented network architecture allows obtaining a very accurate generation of synthetic measurements that can be used to augment real-world data. Furthermore, the same network architecture can be exploited to filter real measurements while maintaining the same network structure. Future work can extend the presented architecture to integrate a classifier able to take into account the environment topology (e.g., furniture, walls, materials, and their positions) and use this information to generate sensor data that more accurately reflects environmental characteristics.