An Ensemble Learning-Based Prediction Model for Image Forensics From IoT Camera in Smart Cities

Recent years have witnessed a surge in the number of IoT cameras in smart cities. In this article, an ensemble learning-based prediction model for image forensics from IoT cameras is proposed. In particular, our goal is to obtain human body measurements from 2D images taken from two views. Firstly, 24 body part features are extracted by the DensePose algorithm from the two views. Secondly, the features of the upper body part are integrated with height and body weight features. Ensemble learning is then performed with the LightGBM algorithm and a regression prediction model is constructed. The proposed noncontact image prediction method is simple and workable, and its feasibility and validity are verified on an experimental dataset. Experimental results demonstrate that the proposed method is highly reliable in the size prediction of different body parts. Specifically, the mean absolute errors of chest circumference, waistline and hip circumference are about 2.5 cm, while the mean absolute errors of the other predictions are about 1 cm.


I. INTRODUCTION
In recent years, IoT cameras have become ubiquitous in smart cities, and these sensors are widely adopted for forensic applications [1], [2]. One important application area is measurement of the human body in places where strong security is needed. In particular, the ability to acquire human body measurements, i.e., anthropometry, in a noncontact manner is highly desired. In this article, we propose an ensemble learning-based prediction model that allows forensic evidence to be extracted from images both efficiently and effectively. Noncontact measurement of different body parts from static images taken by cameras not only improves the user experience but also saves material and financial costs. With the informatization of the clothing industry, intelligent anthropometry will lay a solid foundation for the next generation of intelligent dressing recommendation [3]. Offline retail still holds great potential with the development of AIoT [4]-[6]. For example, users can try on more clothes in a short period through emerging virtual fitting services [7]. As one of the key technologies of virtual fitting, anthropometry acquires measurement data of the main body parts accurately in a relatively simple way. Virtual fitting products on the market include Fitiquette (USA), Fits.me (UK) and Pulshion (China). These products generally require users to fill in the relevant body sizes first, then generate a corresponding 3D anthropometric dummy, and finally render the dressing effect onto that dummy [8]-[10]. For these products, the precision of the body size data entered by users directly affects the reliability of online fitting.
Nevertheless, users are often unsure of their accurate body sizes, and contact-based manual measurement clearly degrades the user experience. Therefore, the precision of noncontact anthropometry is vital to the practicability of clothing-related applications [11]. Although many high-precision 3D anthropometric scanners based on depth sensors such as structured light have been applied successfully in intelligent noncontact anthropometry, such devices still face a great obstacle to large-scale application in the clothing industry, especially in small clothing enterprises, due to their low adoption rate [12]. Hence, an economical and convenient intelligent noncontact anthropometry method is urgently needed. To meet these demands, a simple and feasible intelligent anthropometry method is proposed in this study. The proposed method establishes a regression model based on the DensePose [13] algorithm by combining real body detection results in images, image features of the human body, and height and body weight features. It can predict different body sizes accurately, end to end, under casual shooting conditions, and it reduces the accumulated errors introduced by intermediate steps such as key-point extraction.

II. RELATED WORK
Artificial intelligence plays an important role in smart cities [14], [15]. Studies on noncontact anthropometry began abroad in the late 1970s [16]. Early noncontact anthropometry was performed in the USA and UK using large computing devices, which incurred relatively high costs [7]. One of the main benefits of anthropometry is that it facilitates biomechanical analysis and the study of human motion (e.g., [17]-[19]). In particular, emerging IoT cameras allow large-scale imagery data to be collected at low cost and used as forensic evidence when needed. Conventionally, computer vision-based anthropometry uses 3D scanners [20]-[26]. This is, however, in contrast to our goal in this article: we propose to collect image data with low-cost IoT cameras, and therefore an algorithm that uses 2D images only and still produces reliable human body measurements is desired.
Without large 3D human body scanners, two-dimensional images of the human body are first collected and then processed with digital image operations such as erosion, dilation, smoothing, and edge sharpening. These operations extract accurate profiles of the human body for subsequent point extraction, measurement and computation. A key difficulty lies in the computation: how to calculate the nonlinear circumferences of the human body. At present, circumference calculation methods mainly include curve fitting, regression analysis [27]-[30], and minimization of appropriate cost functions [31]-[33]. One limitation of purely 2D-based methods is that they can estimate a person's anthropometric measurements only up to a scale parameter. In our work, we circumvent this problem by asking the user to provide their height and weight as prior information.
Tan [34] and Ji et al. [35] extracted characteristic points of body parts from photographs of the human body and then calculated circumferences through curve fitting; this method has certain research value. Yu et al. [36] also took pictures and processed them accordingly, acquiring the profile of the human body mainly through greyscale inversion. Circumference values could then be computed with a derived calculation formula, further easing the difficulty of circumference calculation.
Recently, many studies adopt a fitted binary regression equation to calculate circumferences [12], [37], [38]. This method first establishes a linear relationship between the width and thickness of body parts and their circumferences. Second, it divides the human body into different levels according to the thickness-width ratio, each level corresponding to its own parameters of the linear equation. Finally, it fits the linear equations statistically on a large amount of data. Although this method is simple, its generalized anthropometric precision is usually low. Another modeling approach uses double-elliptic or hyperelliptic curve fitting [39]. Although this method is supported by a sound theoretical foundation and offers relatively controllable precision, it requires massive statistical data because the model coefficients are difficult to determine. Hence, this method is impractical.
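To illustrate the elliptic-fitting idea discussed above, a body cross-section can be approximated as an ellipse whose semi-axes come from the width (front view) and thickness (side view), with the perimeter estimated by Ramanujan's well-known approximation. This is a minimal sketch of the general technique, not the exact formulation of the cited works:

```python
import math

def ellipse_circumference(width_cm: float, thickness_cm: float) -> float:
    """Approximate a body cross-section as an ellipse with semi-axes
    a = width/2 and b = thickness/2, and estimate its perimeter with
    Ramanujan's first approximation."""
    a, b = width_cm / 2.0, thickness_cm / 2.0
    h = ((a - b) ** 2) / ((a + b) ** 2)
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))

# Sanity check: a circular cross-section (width == thickness) reduces
# to the circle perimeter pi * d.
print(round(ellipse_circumference(30.0, 30.0), 2))  # ≈ 94.25
```

The difficulty noted in the text is visible here: the quality of such an estimate depends entirely on how well an ellipse models the actual cross-section, which is why level-dependent coefficients and large calibration datasets are needed.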
Current research on anthropometry has to locate the corresponding characteristic points. However, characteristic points often cannot be located accurately, especially in casual shooting scenes where clothing can influence measurement precision. The main contribution of this study, therefore, is a deep learning-based method for anthropometry. By exploiting common features and the overall structure of the human body, deep learning can extract information about different body parts effectively and establish an end-to-end regression model that predicts body sizes. Compared with conventional methods, the proposed anthropometry method is simpler, more feasible, and places lower requirements on shooting: it needs only the front and side views. Therefore, the proposed method is highly practical.

III. THE PROPOSED METHOD
A. INTELLIGENT NONCONTACT ANTHROPOMETRY
A flowchart of the proposed noncontact anthropometry based on two views is shown in Fig.1. First, one front picture and one side picture of the human body are taken with an ordinary color camera. Second, 24 body part features are extracted by the trunk network segmentation model pretrained with the DensePose algorithm. Meanwhile, height and body weight are integrated as additional inputs to the ensemble learning framework LightGBM. The boosting algorithm of LightGBM predicts the circumferences of the human body using GBDT [40] regression, and learns the importance of the statistical features automatically during training. The method is fast, consumes few hardware resources, yields a light model at prediction time, and is easy to migrate, deploy and recall.
The proposed method requires neither the extraction of the human body profile through various low-level image processing operations nor the extraction of key points of body parts to calculate circumference values. This reduces manual engineering in the algorithm pipeline as well as the accumulation of possible errors. Owing to its simplicity, the proposed anthropometry method can be promoted to various practical applications.
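The feature-assembly step of the pipeline can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the 56 × 56 Patch heat map shapes (taken from Section III-B), and the use of one flattened map per view are assumptions for illustration, and the resulting dimensionality need not match the paper's exact 6286-dimensional input:

```python
import numpy as np

def build_feature_vector(patch_front, patch_side, height_cm, weight):
    """Concatenate flattened DensePose Patch heat maps from the front and
    side views with the height and body weight attributes. The resulting
    vector would be the input of one LightGBM regressor per body-size
    index. Shapes and names here are illustrative assumptions."""
    assert patch_front.shape == (56, 56) and patch_side.shape == (56, 56)
    return np.concatenate([patch_front.ravel(),
                           patch_side.ravel(),
                           np.array([height_cm, weight], dtype=float)])

# Two dummy 56x56 Patch maps plus height (cm) and weight (500 g units).
vec = build_feature_vector(np.zeros((56, 56)), np.zeros((56, 56)),
                           175.0, 140.0)
print(vec.shape)  # (6274,)
```

One such feature vector per sample is then fed to a LightGBM regressor trained separately for each circumference index.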

B. HUMAN BODY INFORMATION EXTRACTION BASED ON DensePose
The DensePose algorithm is a dense human pose estimation framework developed jointly by Facebook and INRIA to achieve an understanding of the human body in images. Generally speaking, this framework can understand a human body in casual dress accurately and delineate the different body parts precisely.
It can be seen from Fig.2 that DensePose makes dense predictions of body parts (Patch) and UV texture maps after body detection using a fully convolutional network [41]. The heat map features of Patch, U and V are 56 × 56. The value range of Patch is [0, 24], while the value range of U and V is [0, 1]. As shown in Fig.3, an experiment compared the effect of using features of the Patch channel only with the effect of adding features of the UV channels. The predicted values of each circumference index were essentially the same whether or not the UV channel features were added. Therefore, this study used the features of the Patch channel only, to avoid the extra hardware resource consumption of excessively high-dimensional features. Moreover, DensePose proved to be poor at predicting the head, palms and soles of the human body. Hence, the influence of these three body parts on the prediction precision of circumferences was also examined; the results demonstrated that they hardly influenced the prediction precision of the circumference indexes.
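A Patch map with integer labels in [0, 24] can be summarized into compact per-part features. The sketch below computes the fraction of pixels assigned to each of the 24 parts; this is an illustrative summary encoding (the function name and the encoding itself are assumptions, not necessarily the paper's exact feature representation):

```python
import numpy as np

def part_area_fractions(patch_map, n_parts=24):
    """Summarize a DensePose Patch map (integer labels 0..24, where 0 is
    background) into the fraction of pixels assigned to each of the 24
    body parts."""
    total = patch_map.size
    return np.array([np.count_nonzero(patch_map == part) / total
                     for part in range(1, n_parts + 1)])

# Toy 4x4 "Patch map": half background (0), a quarter part 1, a quarter part 2.
demo = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [2, 2, 0, 0],
                 [2, 2, 0, 0]])
print(part_area_fractions(demo)[:2])  # [0.25 0.25]
```

Dropping unreliable parts (head, palms, soles) would amount to simply masking out the corresponding entries of this vector before training.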

C. ESTABLISHING A REGRESSION PREDICTION MODEL BY INTEGRATING IMAGE AND ATTRIBUTE FEATURES
LightGBM was released by Microsoft in 2017. As an improved version of XGBoost [42], LightGBM decreases the time complexity of sample processing and improves the scalability of the algorithm. In this study, the LightGBM algorithm is applied mainly to select effectively among the high-dimensional features extracted by the DensePose algorithm, thus reducing feature complexity while preserving the precision of body size prediction.
Through a comparative experiment, this study combined the image eigenvectors with height and body weight as the feature input of LightGBM. GBDT was used for boosting, and the evaluation metrics were the root-mean-square error (RMSE) and mean absolute error (MAE):

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (p_i − y_i)^2 ),    MAE = (1/N) Σ_{i=1}^{N} |p_i − y_i|

where p_i refers to the predicted value, y_i refers to the real manually measured value, and N is the number of samples. The top 60 features obtained in the hipline prediction model of adult males are shown in Fig.4. In the experimental results, the importance of the 6286 input features was examined, and body weight proved to be the most important index (it often occupied the top position) during the construction of each body part prediction model. Height also ranks near the front in importance, but its rank is unstable. For example, the 6285th and 6284th features in Fig.4 represent the height and body weight features. Note that the feature subscripts are counted from 0.
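The two error metrics above translate directly into code. A minimal stdlib-only sketch:

```python
import math

def rmse(pred, true):
    """Root-mean-square error: sqrt((1/N) * sum((p_i - y_i)^2))."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, true)) / len(true))

def mae(pred, true):
    """Mean absolute error: (1/N) * sum(|p_i - y_i|)."""
    return sum(abs(p - y) for p, y in zip(pred, true)) / len(true)

# Toy example: two predicted circumferences vs. manual measurements (cm).
print(rmse([91.0, 95.0], [90.0, 94.0]))  # 1.0
print(mae([91.0, 93.0], [90.0, 95.0]))   # 1.5
```

RMSE penalizes large individual deviations more heavily than MAE, which is why the tables in Section IV report both.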

IV. EXPERIMENTS
A. EXPERIMENTAL PLATFORM AND DATASET
The experimental platform includes a mobile end and a service end. The mobile end in Fig.5 mainly shoots the front and side images of the human body and runs an Android or iOS system. The image resolution was unified to 1080 × 1920. In addition, the input information covers gender, height and body weight. The back-end service is used for data training and prediction; it runs a 64-bit Ubuntu system with a GeForce RTX 2070 8GB graphics card and 24GB of memory.
In casual scenes, respondents must be fully within the frame and no unrelated person may appear in the background. Respondents are not required to wear tights; they only need to expose their hands and feet, and to avoid jewelry such as watches as much as possible. The front standing posture requires making fists held slightly away from the body, feet slightly apart, and eyes looking forward. The side standing posture requires making fists held close to the middle of the hips, feet together, and eyes looking forward.
The collected data were divided into adult males and adult females aged between 20 and 45. In the measurement, 46 adult males and 59 adult females were invited. The measured indexes included height, body weight, neck circumference, shoulder width, chest circumference, waistline, hipline, arm circumference, elbow circumference, forearm circumference, wrist circumference, thigh circumference, knee width, calf circumference and ankle circumference. Among them, height and body weight were used as input data, while the other 13 indexes were used as prediction targets. Manual measurement was adopted and the mean of three measurements was taken. Measuring tools included two flexible rules, two physician-type scales, a height chart and three cell phones. Table 1 and Table 2 list the minimum, maximum, mean and standard deviation of each index measured with the three cell phones. Body weight is expressed in units of 500 g; all other indexes are in cm. Image data were collected from casual scenes. The total number of male samples was 46 × 3 = 138 and the total number of female samples was 59 × 3 = 177.

B. PREDICTION EXPERIMENTS OF CIRCUMFERENCE INDEXES OF ADULT MALES
The final training strategy applied 6-fold cross validation to the data of the 46 respondents, divided into folds of {8, 8, 8, 8, 7, 7}. Note that the 138 samples were not split at random without constraints. Comparisons between the predicted results and the manual measurements of chest circumference, waistline and hipline of adult males are shown in Fig.6-Fig.8, where the x-coordinate indicates the specific sample and the y-coordinate the circumference value. The RMSE, MAE, maximum error, minimum error and mean error rate between the predicted results and the manual measurements of the 13 indexes are shown in Table 3.
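Because each respondent contributes three images, a natural reading of the constrained (non-random) split is that folds are formed over respondents rather than individual images, so one person's images never straddle the train/test boundary. A sketch of such a fold assignment (the function name and grouping assumption are ours, not stated explicitly in the paper):

```python
def respondent_folds(n_respondents, n_folds=6):
    """Assign respondent IDs (0..n_respondents-1) to folds as evenly as
    possible. All three images of one respondent stay in the same fold,
    so the split is done per person rather than per image."""
    base, extra = divmod(n_respondents, n_folds)
    sizes = [base + 1 if i < extra else base for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

print([len(f) for f in respondent_folds(46)])  # [8, 8, 8, 8, 7, 7]
```

With 46 males this yields the fold sizes {8, 8, 8, 8, 7, 7} stated above, and with 59 females it yields {10, 10, 10, 10, 10, 9} as in Section IV-C.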

C. PREDICTION EXPERIMENTS OF CIRCUMFERENCE INDEXES OF ADULT FEMALES
The final training strategy was the same as in the adult male scenario: 6-fold cross validation was applied to the data of the 59 respondents, divided into folds of {10, 10, 10, 10, 10, 9}. Note that the 177 samples were not split at random without constraints. Comparisons between the predicted results and the manual measurements of chest circumference, waistline and hipline of adult females are shown in Figs.9-11, where the x-coordinate indicates the specific sample and the y-coordinate the circumference value. The RMSE, MAE, maximum error, minimum error and mean error rate between the predicted results and the manual measurements of the 13 indexes are shown in Table 4.

D. RESULTS ANALYSIS
Experimental results on the errors of the different indexes of adult males are shown in Table 3. For MAE, the minimum is observed at the wrist circumference (0.4 cm) and the maximum at the waistline (3.4 cm). The hipline presents the minimum mean error rate (1.9%), while the arm circumference presents the maximum mean error rate (4.5%). An index with a high MAE may not have a high mean error rate, and vice versa; this is caused by the relatively long lengths of some indexes. Tables 1 and 3 show that among the 13 predicted indexes, the minimum and maximum of the chest circumference, waistline and hipline fluctuate greatly, yet their MAE is not necessarily the highest: the MAE of the thigh circumference is 1.9 cm and that of the hipline is 1.8 cm. This indicates to some extent that the prediction precision of an index is not fully determined by the fluctuation amplitude of its length.
Experimental results on the errors of the different indexes of adult females are shown in Table 4. For MAE, the minimum is observed at the wrist circumference and elbow circumference (0.5 cm), and the maximum at the chest circumference (2.8 cm). The elbow circumference presents the minimum mean error rate (2.3%), while the knee circumference shows both the maximum mean error rate (5.0%) and the maximum error. Inspection of the maximum-error samples of the adult female dataset shows that the samples with the maximum error differ significantly from those with the second-highest error. Hence, the possibility of a data anomaly caused by manual measurement error cannot be ruled out.
Tables 3 and 4 show that the overall MAE of chest circumference, waistline and hipline is higher than those of the remaining indexes, but their mean error rates are not necessarily high. In addition, a comparison of the waistline MAE between males and females reveals that the waistline MAE of males is the highest, while that of females is the smallest. This may be because the waist of males is harder to identify than that of females, which easily causes mistakes during manual measurement.
The line graphs of chest circumference, waistline and hipline in Figs.6-11 show that the manual measurements and the predicted results are generally consistent. In general, the consistency between the trends of the manually measured and predicted curves is negatively correlated with MAE, as illustrated by the hipline of adult males in Fig.8.

V. CONCLUSION
In this article, we propose a noncontact, image-based method for measuring human body parts for forensic applications in smart cities. We design our method to be noncontact and purely image-based so that forensic evidence can be collected efficiently. To address the difficulties in size measurement, low universality and poor convenience of existing noncontact anthropometry methods, a new end-to-end anthropometry method based on two views is proposed. The proposed method decreases error accumulation during intermediate processing and is easy to promote. In the experiments, 15 body size indexes are collected; among them, height and body weight are used as inputs and the remaining 13 indexes as prediction targets. The error between the predicted results and the manual measurements is computed end to end. According to our experimental results, errors exist but can generally be kept within a mean error rate of 5%. This demonstrates that the proposed method can be used in practical noncontact anthropometry scenarios.