Fully Connected Generative Adversarial Network for Human Activity Recognition

Conditional Generative Adversarial Networks (CGANs) have shown great promise in generating synthetic data for sensor-based activity recognition. However, a key issue with existing CGANs is the design of the network architecture, which affects sample quality. This study proposes an effective CGAN architecture that synthesizes higher quality samples than state-of-the-art CGAN architectures. This is achieved by combining convolutional layers with multiple fully connected networks at the generator's input and the discriminator's output of the CGAN. We show the effectiveness of the proposed approach using elderly data for sensor-based activity recognition. Visual evaluation, a similarity measure, and usability evaluation are used to assess the quality of the samples generated by the proposed approach and to validate its performance in activity recognition. In comparison to the state-of-the-art CGAN, the visual evaluation and the similarity measure demonstrate that the proposed model's synthetic data represents the actual data more accurately and exhibits more variation, respectively. The experimental stages of the usability evaluation, in turn, show performance gains of 2.5%, 2.5%, 3.1%, and 4.4% over the state-of-the-art CGAN when using synthetic samples from the proposed architecture.

are either tagged to a certain location, with human activity inference based on the user's interaction with the tagged object, or they are deployed in an environment where no tag or device is required. Passive infrared sensors, pressure sensors, and contact switches are examples of fixed sensors. In the second, the sensing technologies are worn by users or attached to portable devices such as mobile phones and smartwatches. Accelerometers and gyroscopes are examples of wearable sensors. Wearable sensors are ubiquitous, unobtrusive, cheaper, less harmful, easier to deploy and use, and capable of supporting real-time activity recognition compared to other sensing modalities.
VAE is not well known in synthetic sensor data generation for human activity recognition. This is because synthetic data generated by VAEs tends to be blurrier. GANs generate more realistic synthetic data than VAEs in the sensor-based activity recognition field [25]. Due to its capacity to create verisimilar synthetic examples, the GAN has become the most prominent generative model for overcoming the lack-of-data challenge in sensor-based activity recognition. One of the recent achievements of GAN network architectures for sensor-based activity recognition is the Unified Conditional GAN (CGAN), which synthesizes data for more than one human activity in a single training process [26], [27]. However, the state-of-the-art CGAN eliminates the use of fully connected layers in its generator and discriminator. This architectural choice, suggested by [14], is also adopted by the majority of other existing GANs without investigating its effect on sample quality.

The main objective of this study is to develop an improved CGAN architecture that combines convolutional layers with multiple fully connected networks in the input and output layers of the generator and discriminator to generate more realistic synthetic activity signals.

This study's contributions are summarized as follows:
a. To the best of our knowledge, we are the first to propose an enhanced CGAN architecture that combines convolutional layers with multiple fully connected networks in the input and output layers of the generator and discriminator, respectively, to generate higher quality synthetic samples for sensor-based activity recognition.
b. We conduct comprehensive experiments to compare the quality of the samples generated by the proposed approach with those of the state-of-the-art approach using visual and similarity measure evaluation techniques. These techniques demonstrate that the proposed model's synthetic data captures the real data more accurately and creates more variations than the state-of-the-art approach.

c. The proposed approach is trained and evaluated on an elderly dataset for sensor-based activity recognition. Using synthetic samples, the proposed architecture outperforms the state-of-the-art CGAN by 2.5%, 2.5%, 3.1%, and 4.4% across the four experimental stages of the usability evaluation.

The rest of this paper is organized as follows. Section II reviews the related work, Section III details the proposed architecture, Section IV explains the experimental setup, Section V evaluates the performance of the proposed architecture, and Section VI concludes the study.

A GAN contains two components built with multilayer perceptrons: a generator and a discriminator. The generator takes a noise vector, randomly sampled from an a-priori distribution, as input to generate fake samples. It maximizes the probability of the fake samples being classified as real. The real and fake samples are then fed into the discriminator, which estimates the probability that a given sample is drawn from the real data distribution rather than from the generator [43]. Equation (1) gives the overall objective function of the GAN, where G is the generator, D is the discriminator, z is noise drawn from the a-priori distribution, pz(z) is the noise (fake data) distribution, and pdata(x) is the real data distribution.
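For reference, this standard minimax objective is

$$\min_{G}\max_{D} V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}\bigl[\log D(x)\bigr]+\mathbb{E}_{z\sim p_{z}(z)}\bigl[\log\bigl(1-D(G(z))\bigr)\bigr] \tag{1}$$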

II. RELATED WORK
GANs for sensor-based activity recognition fall into two categories: semi-supervised and supervised GANs. Semi-supervised GANs have mainly been used to address activity recognition for left-out users, which can degrade the recognition performance of the learning model [44]. Any human subject whose data is not fed to the deep learning model for training is a left-out user. A limitation of semi-supervised GANs is that they require the use of test data during the training phase. In addition, a separate model must be trained for each single target subject or group of subjects [43]. However, the focus of this study is to improve the sample quality of state-of-the-art GANs that generate synthetic data for sensor-based activity recognition in a single model training process without using any data from the test set during training. This can be achieved using supervised GANs.

The first supervised GAN for sensor-based activity recognition using deep learning was proposed by Wang et al. [45]. They developed a GAN framework called SensoryGAN that contains a generator and a discriminator. First, their method takes random noise and real sensor data as input. Then, the generator and the discriminator play a mini-max game to generate synthetic sensor data for three human activities: staying, walking, and jogging. They also applied three visualization techniques (local, global, and memory-independent), adopted from the computer-vision GAN community, to evaluate the quality of the generated data.

Nevertheless, the limitation of this study is that it can generate data for only a single class of activity and cannot accommodate various classes of human activities in a single training process. This is time-consuming and makes the learning process long. Shi et al. [46] also implemented a DCGAN-based data augmentation method called HARAugGAN to enlarge the activity scope of the SensoryGAN method. Although they consider more than three activities, their method is not unified, as the sensory data for each activity is generated separately. Hong et al. [26] solved this issue by developing a unified model for generating synthetic data of five human activities (standing, lying down, walking, cycling, and jogging) in a single training process. This is achieved by adapting the CGAN architecture, which adds a conditional factor with class label information to the generator and discriminator for sensor-based activity recognition. Table 1 and Table 2 show the architecture of their generator and discriminator, respectively. However, their model eliminates the use of fully connected layers from the GAN network. Its network architecture consists of a 1D-convolution chain and a 1D-transposed-convolution chain, while a CNN architecture is adopted to build the discriminator.

The literature confirms that state-of-the-art unified GAN networks for generating sensor-based activity recognition data have adopted models that eliminate or restrict the use of fully connected layers in the GAN network architecture. This hurts the output quality of the GAN network [47]. This study proposes an enhanced CGAN architecture that combines convolutional layers with multiple fully connected networks in the input and output layers of the generator and discriminator, respectively, to generate better synthetic activity signals, as explained in the next section.

III. PROPOSED ARCHITECTURE
The proposed method in this study, the Fully Connected CGAN (FCGAN) model, enhances the Unified CGAN architecture by Hong et al. [26] to improve sample generation quality. Unlike the state-of-the-art Unified GANs, the Fully Connected CGAN converts the low-dimensional noise input of the generator to the high-dimensional space of the activity signals. This is achieved by employing three fully connected layers as embedding layers for the first task of the generator, followed by convolutional and LSTM layers for the second and third tasks respectively. The fully connected layers employed in the first task learn the relationship between noise vectors and human activity features by mapping the noise vectors to activity features. This allows the model to generate subtle variations in different spatial zones, which helps the generator synthesize more realistic samples. Table 3 shows the generator's architecture of the Fully Connected CGAN.
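Table 3 is not reproduced here; the following is only an illustrative Keras sketch of the described generator design (three fully connected embedding layers, then a convolutional layer, then an LSTM, conditioned on the class label). The layer widths, kernel sizes, and activations are assumptions for the example, not values taken from the paper's tables.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative dimensions (assumptions for this sketch):
NOISE_DIM = 100      # length of the a-priori noise vector z
NUM_CLASSES = 8      # eight elderly activities in the dataset
TIMESTEPS = 100      # 2-second window of 100 readings
CHANNELS = 6         # tri-axial accelerometer + tri-axial gyroscope

def build_fcgan_generator():
    noise = layers.Input(shape=(NOISE_DIM,))
    label = layers.Input(shape=(1,), dtype="int32")

    # Conditional factor: embed the class label and concatenate it with the noise.
    label_emb = layers.Flatten()(layers.Embedding(NUM_CLASSES, NOISE_DIM)(label))
    x = layers.Concatenate()([noise, label_emb])

    # Task 1: three fully connected layers map the low-dimensional noise
    # to the high-dimensional space of the activity signals.
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(TIMESTEPS * CHANNELS, activation="relu")(x)
    x = layers.Reshape((TIMESTEPS, CHANNELS))(x)

    # Task 2: a 1D convolution refines local temporal patterns.
    x = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu")(x)

    # Task 3: an LSTM models temporal dependencies, then a time-distributed
    # projection produces the synthetic activity window.
    x = layers.LSTM(64, return_sequences=True)(x)
    out = layers.TimeDistributed(layers.Dense(CHANNELS, activation="tanh"))(x)
    return Model([noise, label], out, name="fcgan_generator")
```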

This study also enhances the discriminator's network architecture of the base model by adding three fully connected layers to the discriminator's network. These layers convert the discriminator's input to a lower-dimensional space before classification, preventing the discriminator's loss from becoming too low. As a result, the generator learns faster, leading to faster model convergence. Table 4 shows the discriminator's architecture of the Fully Connected CGAN.
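Table 4 is likewise not shown in this excerpt; a hypothetical Keras sketch of a conditional discriminator with three added fully connected layers before the real/fake output could look as follows. All widths and kernel sizes are illustrative assumptions, and NOISE_DIM, NUM_CLASSES, TIMESTEPS, and CHANNELS are as defined in the generator sketch above.

```python
def build_fcgan_discriminator():
    signal = layers.Input(shape=(TIMESTEPS, CHANNELS))
    label = layers.Input(shape=(1,), dtype="int32")

    # Conditional factor: embed the class label into the signal shape and
    # concatenate it with the input window along the channel axis.
    label_emb = layers.Embedding(NUM_CLASSES, TIMESTEPS * CHANNELS)(label)
    label_map = layers.Reshape((TIMESTEPS, CHANNELS))(layers.Flatten()(label_emb))
    x = layers.Concatenate(axis=-1)([signal, label_map])

    # Convolutional feature extraction, as in the CNN-based base discriminator.
    x = layers.Conv1D(64, kernel_size=5, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv1D(128, kernel_size=5, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Flatten()(x)

    # The three added fully connected layers compress the features to a
    # lower-dimensional space before the final real/fake classification.
    for units in (256, 128, 64):
        x = layers.Dense(units)(x)
        x = layers.LeakyReLU(0.2)(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    return Model([signal, label], out, name="fcgan_discriminator")
```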

The fully connected networks used in the FCGAN generator's and discriminator's architectures comprise three fully connected layers with configurations similar to the base model, as a recent study shows impressive results when combining three fully connected layers with convolutional layers in GAN generator and discriminator architectures [47].

IV. EXPERIMENTAL SETUP
In this research, a sensory dataset collected from elderly subjects using an accelerometer and a gyroscope is adopted to train the generative and classification models. The subjects were asked to perform walking, standing, sitting, lying down, sit-to-lie, sit-to-stand, lie-to-sit, and stand-to-sit activities in their preferred style and pace. The elderly dataset was sampled in fixed-width sliding windows of 2 seconds with 50% overlap (100 readings per window). A full description of the used dataset can be found in section 4.1.1 of [48].

Preprocessed data is used to train the FCGAN and compare it with the state-of-the-art Unified CGAN by Hong et al. [26]. Both models learn the latent patterns of human activities in a single training process. The models were trained for 1000 epochs with a batch size of 60. The experiments were implemented in Google Colaboratory (Colab) Jupyter notebooks using the Keras Python library with TensorFlow as the backend. Google Colab was chosen for its runtime configurations for deep learning applications and free access to a robust GPU.
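As an illustration of the windowing step described above, a minimal NumPy sketch of the fixed-width sliding-window segmentation (2-second windows of 100 readings with 50% overlap) could look like this; the function name and defaults are ours, not taken from [48].

```python
import numpy as np

def sliding_windows(signal, window_size=100, overlap=0.5):
    """Segment a (num_samples, num_channels) signal into fixed-width windows.

    window_size=100 readings corresponds to a 2-second window here, and
    overlap=0.5 gives the 50% overlap between consecutive windows.
    """
    step = int(window_size * (1 - overlap))
    starts = range(0, len(signal) - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])
```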

V. PERFORMANCE EVALUATION
This study uses four techniques to evaluate the performance of the proposed approach: training evaluation, visual evaluation, similarity measure evaluation, and usability evaluation.

A GAN fully converges when the discriminator cannot differentiate real examples from fake ones, and it is considered stable when the generator does not exhibit high loss values after model convergence. The stability and the speed of full GAN convergence are highly associated with the loss patterns of the generator and discriminator, as these dictate the number of epochs required by the models to fully converge. The objective of the first evaluation technique, training evaluation, is to provide insight into the extent to which the proposed architecture is more stable and achieves faster model convergence. It visualizes the convergence and stability of the proposed FCGAN and compares them with those of the state-of-the-art Unified CGAN architecture. This is performed by extracting the discriminator's and generator's loss values during the models' training process.

Figure 1 compares the discriminator's loss of the Unified CGAN with that of the FCGAN. It shows that the FCGAN discriminator has higher loss values than the Unified CGAN discriminator across all epochs.
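As a concrete illustration of how such per-iteration loss curves can be recorded during adversarial training, the following is a minimal sketch of an alternating CGAN training loop. The combined `gan` model, label conventions, and batch handling are assumptions for the example, not the authors' training code.

```python
import numpy as np

def record_losses(gan, generator, discriminator, real_windows, real_labels,
                  epochs=1000, batch_size=60, noise_dim=100):
    """Run `epochs` training iterations and return per-iteration loss histories.

    Assumes `discriminator` and the combined `gan` model are compiled with a
    single binary cross-entropy loss, so train_on_batch returns a scalar.
    """
    d_history, g_history = [], []
    for _ in range(epochs):
        idx = np.random.randint(0, len(real_windows), batch_size)
        noise = np.random.normal(0.0, 1.0, (batch_size, noise_dim))
        fake = generator.predict([noise, real_labels[idx]], verbose=0)

        # Discriminator update on real (label 1) and fake (label 0) windows.
        d_real = discriminator.train_on_batch(
            [real_windows[idx], real_labels[idx]], np.ones((batch_size, 1)))
        d_fake = discriminator.train_on_batch(
            [fake, real_labels[idx]], np.zeros((batch_size, 1)))
        d_history.append(0.5 * (d_real + d_fake))

        # Generator update through the combined model (discriminator frozen),
        # trying to get fake windows classified as real.
        g_history.append(gan.train_on_batch(
            [noise, real_labels[idx]], np.ones((batch_size, 1))))
    return d_history, g_history
```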

A higher discriminator loss has the potential to provide larger gradients that help the generator learn the data distribution faster, as shown in Figure 2, which visualizes the generator's loss of the compared models. As can be observed, the FCGAN generator scores lower loss values than the Unified CGAN. This leads to faster model convergence and better learning stability.

The experiments show that the generator of the Unified CGAN starts to converge at epoch 874, while the generator of the Fully Connected CGAN starts to converge at epoch 235. This is considered the first stage of full convergence for both models, and this study uses the data generated in this period for investigation and analysis. The experiments also confirm that the FCGAN, unlike the Unified CGAN, remains stable after its first stage until its convergence at the end of the training epochs, whereas the Unified CGAN fails to maintain this stability.

For the visual evaluation, the synthetic signals of most activities are verisimilar to the pattern of the real signals. In the case of walking and stand-to-sit activities, the patterns of the synthetic signals of the generative models are not similar to the pattern of the real sample data. However, the Unified CGAN manages to reconstruct the signal pattern of the different axes of the real data for each activity.

Although the Unified CGAN is able to generate synthetic data that captures the underlying pattern of the real data, the pattern is still quite different from the real data. The Gaussian input noise fed into the Unified CGAN model has only a minor effect on increasing the variation of the synthetic data.

Nevertheless, the local evaluation also shows that the FCGAN is able to generate signals that are more similar to the real signals than the Unified CGAN for all types of activities, and especially for dynamic and transition activities. This confirms that the synthetic data produced by the FCGAN represents the real data more accurately than that of the Unified CGAN.

The Euclidean Distance Measure is a well-known method used by both past [49] and state-of-the-art [26] studies on resource-limited devices such as sensors to evaluate the similarity between time series data for sensor-based activity recognition. This is due to its low computational complexity and data storage requirements, while it still demonstrates the variation within each class of generated samples.

The objective of this evaluation is to assess the variability of the data generated by the proposed model. We therefore use the Euclidean Distance Measure to evaluate the synthetic data generated by the proposed model. This is done by comparing the similarity of the synthetic data generated by the proposed model with the similarity of the synthetic data generated by the state-of-the-art CGAN. We also compare the similarity of the synthetic data of both models with the real data.
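For illustration, equation (2) referenced below is the standard Euclidean distance between two windowed signals; a minimal NumPy sketch of the pairwise comparison (with our own helper names, not the paper's code) is:

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two 2-second activity windows a and b
    of shape (timesteps, channels), treated as flat vectors:
    d(a, b) = sqrt(sum_i (a_i - b_i)^2)."""
    diff = np.ravel(a) - np.ravel(b)
    return float(np.sqrt(np.sum(diff ** 2)))

def mean_pairwise_distance(set_x, set_y, runs=10, seed=0):
    """Average distance over randomly selected window pairs, e.g. for the
    real-to-real, real-to-synthetic, and synthetic-to-synthetic comparisons."""
    rng = np.random.default_rng(seed)
    return float(np.mean([
        euclidean_distance(set_x[rng.integers(len(set_x))],
                           set_y[rng.integers(len(set_y))])
        for _ in range(runs)
    ]))
```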

This determines the similarity between the real and synthetic data (RTS) as well as the similarity between real and real data (RTR) and between synthetic and synthetic data (STS). For each experiment, the degree of similarity between two activity signals (2 seconds each) is measured and recorded. The signals are randomly selected, and each experiment was run at least 10 times to avoid bias in the analysis of the evaluation. The distance function used to measure the similarity between a (one data point of sample a) and b (one data point of sample b) is given in (2).

Table 5 shows the Euclidean distance of the synthetic data generated by the FCGAN. The results show that the Euclidean distances between real and real (RTR) data and between real and synthetic (RTS) data are very similar in all the dynamic and transitional activities. This is a good indication, as the gap between most of the generated data and the real data is small. The dissimilarity between RTR and RTS for stationary activities is high compared to the other activities. This problem is also observed in …

The fourth evaluation technique, usability evaluation, aims to investigate the quality of the synthetic data generated by the proposed architecture in improving the performance of activity recognition classification models. First, the synthetic data generated by both the FCGAN and the Unified CGAN is preprocessed, and four experiments are performed on it together with the real data using the best-performing deep learning classifiers by Jimale and Noor [48]. These experiments use 70% real data and 30% synthetic data, 50% real data and 50% synthetic data, 30% real data and 70% synthetic data, and 100% synthetic data, respectively. The classification accuracy, measured using (3), is reported as an average accuracy. To record the average accuracy, each experiment was run at least 10 times.

The classification accuracy drops whenever real data is hybridized with synthetic data from the generative models, for both models. However, the accuracy drop of the Unified CGAN is 0.6% and 1.3% higher than that of the FCGAN when the ratio of real to synthetic data is 50%:50% and 30%:70%, respectively. This confirms that the proposed architecture generates more realistic synthetic data than the state-of-the-art architecture.

We also show the confusion matrices of the four experimental stages. The stage I confusion matrices are given in Table 8 and Table 9. All classes of activities perform well, with relatively low classification performance for class 7 (lie-to-sit), class 6 (lying down), and class 4 (sit-to-stand). The stage I results also reveal that the FCGAN outperforms the Unified CGAN in all activities except class 2 (stand-to-sit) and class 4 (sit-to-stand).
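The following is only a sketch of how the hybrid training sets for the experimental stages described above (70/30, 50/50, 30/70 real-to-synthetic, and 100% synthetic) might be assembled and the average accuracy recorded over repeated runs; the function names, the fixed training-set size, the number of classifier epochs, and the classifier interface are assumptions, not the evaluation code of [48].

```python
import numpy as np

def make_hybrid_training_set(real_x, real_y, synth_x, synth_y, real_ratio, rng):
    """Assemble one training set with the requested real:synthetic ratio,
    e.g. real_ratio=0.7 for the 70% real / 30% synthetic stage
    (real_ratio=0.0 gives the 100% synthetic stage)."""
    n = len(real_x)                      # keep the overall training size fixed
    n_real = int(round(real_ratio * n))
    ri = rng.choice(len(real_x), n_real, replace=False)
    si = rng.choice(len(synth_x), n - n_real, replace=False)
    x = np.concatenate([real_x[ri], synth_x[si]])
    y = np.concatenate([real_y[ri], synth_y[si]])
    order = rng.permutation(len(x))
    return x[order], y[order]

def average_accuracy(build_classifier, train_x, train_y, test_x, test_y, runs=10):
    """Average test accuracy over `runs` repetitions for one experimental stage;
    `build_classifier` is a hypothetical factory returning a compiled Keras
    model whose evaluate() reports accuracy as its second output."""
    scores = []
    for _ in range(runs):
        model = build_classifier()
        model.fit(train_x, train_y, epochs=50, batch_size=60, verbose=0)
        scores.append(model.evaluate(test_x, test_y, verbose=0)[1])
    return float(np.mean(scores))

# Example usage for the 70% real / 30% synthetic stage:
# rng = np.random.default_rng(0)
# x, y = make_hybrid_training_set(real_x, real_y, synth_x, synth_y, 0.7, rng)
# acc = average_accuracy(build_classifier, x, y, test_x, test_y)
```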

As shown in Table 10 and Table 11, the recognition performance of all activity classes drops in stage II as well. Likewise, the recognition performance of all activity classes drops in stage III without any improvement in any class (see Table 12 and Table 13). Class 5 (sit-to-lie), class 7 (lie-to-sit), and class 0 (walking) suffer the largest drops for the FCGAN, while class 6 (lying down), class 5 (sit-to-lie), and class 0 (walking) show the largest drops for the Unified CGAN.

As shown in Table 14 and Table 15, the recognition performance of all activity classes drops in the final stage of experiments for the Unified CGAN without any performance improvement in a single class. Class 0 (walking), class 2 (stand-to-sit), and class 3 (sitting) score the worst drops in this stage.

VI. CONCLUSION
In the proposed architecture, the generator and discriminator networks encompass deep fully connected and convolutional layers, in contrast to the state-of-the-art network architecture. We have conducted several experiments on sensory data collected from elderly subjects and shown, through visual and usability evaluation techniques, that our proposed architecture generates better samples and converges faster. All our experimental stages were limited to supervised CGANs; however, the proposed method is general enough to be combined with other GAN configurations. Therefore, a possible extension of this work is to study its effectiveness in unsupervised and semi-supervised GAN setups. In the future, the enhanced architecture will also be improved further to produce higher quality samples. In addition, other datasets for sensor-based activity recognition will be experimented with to show the robustness of the newly proposed network.