Deep Neural Networks With Distance Distributions for Gender Recognition of 3D Human Shapes

Automatic human gender recognition is an important and classical problem in artificial intelligence. Most previous gender recognition works are based on visual appearance and biometric characteristics, whereas few approaches address 3D human shapes. In this article, we propose a novel deep neural network learning method for gender recognition of 3D human shapes. Firstly, we introduce effective descriptors that distinguish male from female 3D human shapes via probability distributions of biharmonic distances among points. Secondly, these distance-based low-level descriptors are fed into a fully connected neural network for gender recognition. Furthermore, we construct a larger 3D human shape dataset for evaluating the proposed gender recognition method by collecting and labeling human shape models. Compared with previous works, our method obtains higher recognition accuracy and has further advantages: it is posture invariant, robust to noise, and requires neither landmarks nor a pre-alignment process.


I. INTRODUCTION
Gender is an important physiological and demographic attribute of people. Many real-life applications, such as human-computer interaction, surveillance systems, commercial development, video games, and social security, utilize gender information. For example, automatic gender recognition of 3D human shapes would make virtual body fitting systems more user-friendly. Thus, human gender recognition has received wide attention in the past decades. Comprehensive reviews are provided in [1], [2]. Based on the types of features that differentiate males from females, previous gender recognition works can be mainly divided into visual appearance approaches and biometric characteristic approaches [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Lei Wei .
With the rapid development of laser scanning, techniques for gender recognition from 3D human shapes have also been investigated [3]-[6]. Geodesic distances between manually placed anthropometric landmarks on each scanned subject from the 3D human dataset CAESAR (Civilian American and European Surface Anthropometry Resource) [7] are used for gender classification [3]. However, this method lacks generality, since other 3D human datasets usually do not provide manually placed landmarks and computing landmarks automatically is difficult. Positions of the landmarks are also treated as feature vectors in a K-Nearest Neighbor (KNN) or Support Vector Machine (SVM) classifier for gender recognition [5]. However, this kind of approach is posture variant, i.e., it can only classify the genders of human shapes in the same posture. The methods proposed in [4], [6] are also posture variant and use traditional classifiers such as SVM, KNN, and Extra Trees Classification (ETC).
In view of the limitations of previous works, such as posture variance and the need for landmarks or a pre-alignment process, we propose a novel deep neural network method for gender recognition of 3D human shapes, shown in Figure 1. Firstly, we propose a new geometric descriptor for distinguishing gender via probability distributions of biharmonic distances. The descriptor is posture invariant, robust to noise, requires neither landmarks nor a pre-alignment process, and can process shapes with different numbers of vertices. Secondly, considering the success of geometric deep learning [8], these distance-based descriptors are fed into a fully connected neural network for gender classification. The proposed method obtains higher accuracy than previous works. Our main technical contributions can be summarized as follows:
• Propose an effective descriptor using probability distributions of biharmonic distances for distinguishing male and female 3D human shapes, which is posture invariant and requires no landmarks.
• Introduce a novel deep neural network built on the above distance-based low-level descriptors for gender recognition of 3D human shapes.
• Construct a larger 3D human shape dataset with more than one hundred thousand shapes for the evaluation of gender classification and future research.
The rest of this article is organized as follows. Related works are summarized in Section II. In Section III, our deep neural network learning method for gender recognition of 3D shapes is introduced. Experimental results and performance evaluation of our approach are described in Section IV. Finally, Section V gives a conclusion and points out some possible future works.

II. RELATED WORKS

A. GENDER RECOGNITION FROM 3D HUMAN SHAPES
Compared with the previous works on gender recognition based on visual appearance and biometric characteristics [1], [2], there have been fewer approaches to gender recognition from 3D human shapes [3]-[6].
Pairwise geodesic distances between the 73 manually placed anthropometric landmarks on shapes from the 3D human dataset CAESAR are used as features for gender recognition, where the classifiers are the traditional linear discriminant function, Bayesian decision boundary, and SVM [3]. The anthropometric landmarks limit the generalization of this method to other shape datasets, which usually lack consistent landmarks.
Other hand-crafted features are also used for gender recognition of 3D human shapes, such as distributions of normals on the chest regions [4], coordinates of the 73 landmarks of shapes from the CAESAR dataset [5], and positions of all vertices [6]. Different classifiers are applied to these hand-crafted features, such as SVM, KNN, ETC, and random forest. However, these three methods are posture variant, i.e., they can only recognize the genders of shapes in the same posture.
In addition to being posture variant and requiring landmarks or pre-alignment, another limitation of the above works is that their training datasets, with only about one or two thousand shapes, are rather small, which restricts the generalization ability of the learning. Our method is posture invariant, requires neither landmarks nor a pre-alignment process, and is trained on a larger dataset with more than one hundred thousand 3D human shapes.

B. DESCRIPTORS VIA GEOMETRIC DISTANCES
Being invariant to isometric deformations makes the intrinsic distances, for example, geodesic [9] and Laplacian spectral distances [10], natural and useful for analysis of non-rigid 3D shapes.
The geodesic distance between two points is the length of the shortest path between them constrained to the shape surface. Geodesic distances between manually placed anthropometric landmarks have been used for gender recognition of 3D human shapes [3]. A deep learning method for 3D shape retrieval with geodesic moments is proposed in [9]. However, geodesic distances have several drawbacks, such as being sensitive to topological noise and not being globally shape-aware.
Spectral distances, e.g., biharmonic [11], diffusion [12], and commute-time [13] distances, can be defined through a filtering of the Laplacian eigenfunctions and eigenvalues, and are widely used in several applications [10]. The distributions of all pairwise spectral distances are used for unsupervised 3D shape recognition [14], [15], where the dissimilarity criteria between the discrete histograms are the well-defined χ² or earth mover's distances. Descriptors based on the biharmonic and commute-time distances are also used for shape retrieval [13], [16]. In this article, we propose a novel descriptor for classifying the genders of 3D human shapes using the probability distributions of biharmonic distances from each point.

C. GEOMETRIC DEEP LEARNING
In the past years, deep learning, and in particular the Convolutional Neural Network (CNN), has shown the ability to learn powerful image features from large collections of examples. Unlike images, whose grid-based representation is simple and regular, 3D geometric shapes usually have a variety of complex representations. Thus, there is no universal framework for 3D geometric deep learning, and several classical approaches have been proposed, such as deep learning from low-level hand-crafted features [17], multi-view CNN [18], voxel-based 3D CNN [19], point-based CNNs [20], and graph or manifold based CNN [21]. Please refer to the survey [8] for details. Each of these 3D geometric deep learning approaches has advantages and disadvantages depending on the application. Another factor limiting the usability of deep learning on geometric data is that acquiring 3D data is much more difficult than acquiring 2D images and videos. In this article, the deep neural network operates on low-level biharmonic distance-based features for gender recognition of 3D human shapes.

III. METHODS
Because 3D human shapes usually have different connectivity, it is difficult to generalize the CNN to geometric shapes directly. The descriptors via probability density functions of biharmonic distances proposed in Section III-A are fed into a deep neural network for gender classification described in Section III-B.

FIGURE 2. Left: the biharmonic distances from the red source vertices, where red indicates a higher value and blue a lower value. Right: the corresponding probability density functions, which differ between a male shape (top row) and a female shape (bottom row).

A. DESCRIPTORS VIA DISTRIBUTIONS OF BIHARMONIC DISTANCES
The biharmonic distances proposed in [11] are posture invariant, locally isotropic, globally shape-aware, and robust to noise, and are widely used for shape analysis. Distributions of pairwise biharmonic distances are used for shape recognition [14], [15]. The functional biharmonic distance map is defined as a signature for non-rigid shape retrieval [16]. In this section, we define novel descriptors via probability density functions of biharmonic distances for distinguishing male and female 3D shapes.

1) CONTINUOUS SETTING
The biharmonic distance $d(x, y)$ between two points $x$ and $y$ on a 3D human shape $S$ is defined as follows:

$$d(x, y)^2 = \sum_{k=1}^{\infty} \frac{\left(\phi_k(x) - \phi_k(y)\right)^2}{\lambda_k^2}, \tag{1}$$

where $0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \ldots$ are the eigenvalues of the positive semi-definite Laplace-Beltrami operator and $\phi_k$ is the eigenfunction corresponding to the eigenvalue $\lambda_k$. For a fixed source point $x$, the cumulative distribution function $F_x(\delta)$ of biharmonic distances from $x$ to the other points on $S$ is denoted by

$$F_x(\delta) = \int_S \chi\left(d(x, y) \le \delta\right) \, d\mu(y), \tag{2}$$

where $\chi$ is the indicator function and $\mu$ is the area measure.
In this way, $F_x(\delta)$ is the area of the set of points whose biharmonic distances to $x$ are no larger than $\delta$, and $F_x(\infty) = \mu(S)$ is the area of the 3D human shape $S$. We use the normalized cumulative distribution $\bar{F}_x(\delta) = F_x(\delta) / \mu(S)$. The corresponding probability density function is defined as the derivative $f_x(\delta) = \frac{d}{d\delta} \bar{F}_x(\delta)$. The probability density functions reflect many differences between male and female 3D human shapes, as shown in Figure 2. In this figure, the male shape is taller and thinner, and the biharmonic distances from the source point to the feet are larger than those of the female shape. Accordingly, the probability density function of the male shape spans a wider range on the horizontal axis than that of the female shape. Thus, in this article, we use the probability density functions of biharmonic distances from every point for classifying gender.

2) DISCRETE SETTING
The 3D human shape $S$ is represented by a triangle mesh $M$ with $n$ vertices $v_i$, $i = 1, 2, \ldots, n$. We use the classical cotangent weight scheme with mixed area normalization for the discretization of the Laplace-Beltrami operator [22]. The biharmonic distance is approximated by the first $K$ summands in Equation (1) as

$$d(v_i, v_j)^2 \approx \sum_{k=1}^{K} \frac{\left(\phi_k(v_i) - \phi_k(v_j)\right)^2}{\lambda_k^2}, \tag{3}$$

using the $K$ smallest eigenvalues (except the zero eigenvalue) and their eigenvectors of the discrete Laplace-Beltrami matrix [11]. The probability density function $f_i(\delta)$ for vertex $v_i$ is discretized as a $p$-dimensional histogram

$$b_{ij} = \sum_{k=1}^{n} \mu(v_k) \, \chi\left((j - 1) L \le d(v_i, v_k) < j L\right), \quad j = 1, 2, \ldots, p, \tag{4}$$

where $L$ is the length of the bin interval and $\mu(v_k)$ is the mixed area of vertex $v_k$ defined in [22]. Since the number of mesh vertices differs from one shape to another, the $n \times p$ matrix $B = (b_{ij})$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, p$, of the probability distributions of biharmonic distances has a shape-dependent size and cannot be used consistently as a descriptor. To avoid this limitation, we use the $p \times p$ probability distribution matrix $D = B^T B$. Since the matrix $D$ is symmetric, we flatten its upper triangular part to get an $m = p(p + 1)/2$ dimensional vector $d$ as the final Probability Distribution Descriptor (PDD for short) for gender recognition of 3D human shapes. The probability distribution matrix and its flattened vector are demonstrated in Figure 3, and they differ noticeably between male and female 3D human shapes.
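The discrete construction above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' MATLAB implementation: the function names are hypothetical, the eigenpairs are assumed to be precomputed, and the toy input uses the analytic eigenpairs of a path-graph Laplacian as a stand-in for the cotangent Laplace-Beltrami operator of a real mesh. The histogram is divided by the total area so that each row of B sums to one, which coincides with the unnormalized form when the shape has unit area.

```python
import math

def biharmonic_distances(eigvals, eigvecs):
    """All-pairs truncated biharmonic distances.

    eigvals: the K smallest nonzero eigenvalues.
    eigvecs: eigvecs[k][i] is the k-th eigenvector evaluated at vertex i.
    """
    n = len(eigvecs[0])
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((phi[i] - phi[j]) ** 2 / lam ** 2
                     for lam, phi in zip(eigvals, eigvecs))
            dist[i][j] = dist[j][i] = math.sqrt(sq)
    return dist

def distribution_matrix(dist, areas, bin_len, p):
    """B[i][j]: area fraction of vertices whose distance to v_i lies in bin j."""
    total_area = sum(areas)
    B = []
    for row in dist:
        hist = [0.0] * p
        for d_ik, area in zip(row, areas):
            j = min(int(d_ik / bin_len), p - 1)  # clamp the largest distance into the last bin
            hist[j] += area / total_area
        B.append(hist)
    return B

def pdd(B):
    """Flatten the upper triangle of D = B^T B into a vector of length p(p+1)/2."""
    p = len(B[0])
    D = [[sum(row[a] * row[b] for row in B) for b in range(p)] for a in range(p)]
    return [D[a][b] for a in range(p) for b in range(a, p)]

# Toy stand-in (assumption): analytic eigenpairs of a path-graph Laplacian
# replace the cotangent Laplace-Beltrami eigenpairs of a real mesh.
n, K, p = 8, 5, 4
eigvals = [2.0 - 2.0 * math.cos(math.pi * k / n) for k in range(1, K + 1)]
eigvecs = [[math.sqrt(2.0 / n) * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
           for k in range(1, K + 1)]
dist = biharmonic_distances(eigvals, eigvecs)
d_max = max(max(row) for row in dist)
B = distribution_matrix(dist, [1.0] * n, d_max / (p - 1), p)
descriptor = pdd(B)
print(len(descriptor))  # p * (p + 1) / 2 entries
```

Because D has size p by p regardless of n, two meshes with different vertex counts yield descriptors of the same length, which is what lets a fixed-size network consume them.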

B. DEEP NEURAL NETWORK
We use a Deep Neural Network (DNN for short) for the gender recognition of 3D human shapes, where the input layer is the $m$-dimensional vector of the descriptor proposed in Section III-A and the output layer is a two-dimensional vector of the probability scores of male and female, as demonstrated in Figure 1. Between the input layer and the output layer, there are two hidden layers with $h_1$ and $h_2$ hidden units respectively. Thus, the architecture of our four-layer fully connected neural network is $m$-$h_1$-$h_2$-2.
Let the vector in the $l$-th layer be denoted as $x_l$, $l = 1, 2, 3, 4$. The feed-forward maps are denoted by

$$x_{l+1} = \sigma_l\left(W_l x_l + b_l\right), \quad l = 1, 2, 3, \tag{5}$$

where $W_l$ and $b_l$ are the weight matrix and bias vector, and $\sigma_l$ is a nonlinear element-wise activation function. In this article, we use the ReLU activation function for all layers except the last one, which uses the well-known softmax function for gender classification. During the training process, the input data include descriptors and labels $\chi_{train} = \{d_i, y_i\}$, $i = 1, 2, \ldots, N_{train}$, where $d_i$ is the PDD and $y_i$ is the corresponding gender label of the $i$-th 3D human shape, i.e., 0 for male and 1 for female. The objective function to be minimized is the cross entropy error, and $\{W_l, b_l\}$, $l = 1, 2, 3$, are the parameters to be learned. Pseudocode of the training process is listed in Algorithm 1. After the training, given a new 3D human shape $S$, we first compute its PDD $d$ as in Section III-A and input it to the above neural network to predict the gender label $\hat{y}$ using the learned parameters.
VOLUME 8, 2020
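The feed-forward maps and the cross entropy objective can be illustrated with a small dependency-free sketch. It is not the authors' TensorFlow code: the helper names, the Gaussian initialization, and the toy layer sizes (10-6-4-2 instead of m-h1-h2-2) are assumptions made only to keep the example short and runnable.

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def affine(W, b, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(params, x1):
    """Two ReLU hidden layers followed by a softmax output layer."""
    (W1, b1), (W2, b2), (W3, b3) = params
    x2 = relu(affine(W1, b1, x1))
    x3 = relu(affine(W2, b2, x2))
    return softmax(affine(W3, b3, x3))

def cross_entropy(probs, label):
    """Cross entropy for one sample; label is 0 (male) or 1 (female)."""
    return -math.log(probs[label])

def init_params(sizes, rng):
    """Small random weights and zero biases (an arbitrary choice for this sketch)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        params.append((W, [0.0] * n_out))
    return params

rng = random.Random(0)
params = init_params([10, 6, 4, 2], rng)  # toy sizes standing in for m-h1-h2-2
x = [rng.random() for _ in range(10)]     # stand-in for a PDD vector
probs = forward(params, x)
loss = cross_entropy(probs, 1)
```

Training would adjust the weights and biases to reduce the average of this loss over the training set; the gradient step itself is omitted here.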

IV. EXPERIMENTS
In this section, we conduct extensive experiments to demonstrate the effectiveness of the proposed deep neural network using the probability distributions for gender classification of 3D human shapes. We first present the datasets used and then describe the implementation details.
Datasets. The previous works on gender recognition of 3D human shapes are mainly based on the commercial CAESAR dataset [7], where only one or two thousand 3D shapes are used for training and testing. Such a relatively small training dataset usually limits the generalization ability of the learning.
We collect 3D human shapes from some public datasets, such as SCAPE [ [37], and DUTH [38], which are mainly aimed at registration, correspondence and other applications, and manually label their genders. The constructed 3D Human Shapes Gender Recognition Dataset (3D-HSRD for short) consists of more than one hundred thousand shapes with various appearances and postures for evaluation of the proposed method. Randomly selected shapes of the 3D-HSRD are illustrated in Figure 4. In view of the ethical issues of gender classification and for privacy protection, we use opaque rectangles to cover the head regions of shapes in all figures. The numbers of male and female shapes in the 3D-HSRD are listed in Table 1, where the 3D human shapes are classified into three types, i.e., real scans, registration scans, and synthetic shapes. The real scans are directly acquired in the real world with specific multi-stereo 3D scanners. A registration scan is obtained by a non-rigid deformation of a given template shape to approximate the real scan. The synthetic shapes are generated by specific software or parametric modeling methods. We will make the shapes and gender labels of the 3D-HSRD public online for future research.
Performance evaluation measure. For a test 3D human shape dataset $\chi_{test} = \{S_i\}$ with the true gender label $y(S_i)$ of each human shape $S_i$, $i = 1, 2, \ldots, N_{test}$, the classification accuracy of a gender recognition method on this test dataset is computed as follows

$$\text{Accuracy} = \frac{\left|\{S_i \in \chi_{test} : \hat{y}(S_i) = y(S_i)\}\right|}{N_{test}}, \tag{6}$$

where $\hat{y}(S_i)$ is the gender label predicted by the gender recognition method and the numerator of the above fraction is the number of human shapes that are classified correctly.
Implementation details. All experiments are conducted on a laptop with an Intel Core i5-6200U CPU at 2.4 GHz and 12 GB RAM. The algorithms for computing the PDDs in Section III-A are implemented in MATLAB, and the source code of the DNN in Section III-B is written in Python using TensorFlow.
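The accuracy measure defined above is simply the fraction of test shapes whose predicted label matches the true label. A minimal sketch, with a hypothetical function name and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of shapes whose predicted gender label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, q in zip(y_true, y_pred) if t == q)
    return correct / len(y_true)

# Made-up labels: 0 for male, 1 for female; four of the five predictions match.
acc = accuracy([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])
```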
Because the biharmonic distances are not scale invariant, we normalize each 3D human shape $S$ to have unit area, i.e., $\mu(S) = 1.0$. The default value of the parameter $K$ is 100, which is the number of smallest eigenvalues of the Laplace-Beltrami operator used for computing the approximated biharmonic distances. The length of the bin interval $L$ is set to $d_{max}/nBins$, where $d_{max}$ is the maximum biharmonic distance over all the 3D human shapes in the 3D-HSRD, and $nBins = 99$ is the number of bins. We use the histc function in MATLAB for computing the histogram in Equation (4) and obtain a 100-dimensional vector, i.e., $p = 100$. Thus, the dimension of the final PDD $d$ for each shape is $m = 100 \times 101 / 2 = 5{,}050$.
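The step from 99 bins to a 100-dimensional histogram follows from how MATLAB's histc counts: given a vector of edges, it returns one count per edge, where the last entry counts values exactly equal to the final edge. The sketch below emulates that behavior in Python; `histc_like` is a hypothetical helper written for illustration, and `d_max = 1.0` is a placeholder value.

```python
def histc_like(values, edges):
    """Emulate MATLAB histc: bin j covers [edges[j], edges[j+1]);
    the last entry counts values equal to edges[-1]; others are ignored."""
    counts = [0] * len(edges)
    for v in values:
        if v == edges[-1]:
            counts[-1] += 1
            continue
        for j in range(len(edges) - 1):
            if edges[j] <= v < edges[j + 1]:
                counts[j] += 1
                break
    return counts

n_bins = 99
d_max = 1.0                      # placeholder for the dataset-wide maximum distance
L = d_max / n_bins               # bin length
edges = [j * L for j in range(n_bins)] + [d_max]  # 100 edges from 0 to d_max

p = len(edges)                   # histogram dimension
m = p * (p + 1) // 2             # length of the flattened upper triangle of D
counts = histc_like([0.0, 0.5 * L, d_max], edges)
```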
The sizes of the two hidden layers in Section III-B are set as $h_1 = 256$ and $h_2 = 64$ respectively. Thus, the architecture of our four-layer deep neural network is 5,050-256-64-2, as shown in Figure 5. We also add two dropout layers with a 50% rate after the two hidden layers to prevent overfitting and enhance the generalization ability of the network. During the experiments, the 3D-HSRD is randomly split into a 70% train set and a 30% test set.
Computational times and complexity. Most of the computational time of the proposed method is spent on computing the PDD. The computational complexity of computing the PDD is $O(n^2)$, where $n$ is the number of vertices of the 3D human shape. It takes about three days to compute the PDDs of all shapes in the 3D-HSRD. Table 2 shows the computational times, averaged over ten runs, for computing the PDDs of five representative 3D human models. The training and testing times of our DNN on the 3D-HSRD are 39.47 minutes and 4.08 seconds, respectively.
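To give a sense of the model size implied by the 5,050-256-64-2 architecture and of the random 70%/30% split, the following sketch counts the weights and biases of the fully connected layers and splits a list of shape identifiers. The helper names and the fixed seed are assumptions for illustration; dropout layers add no parameters and are therefore not counted.

```python
import random

def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def random_split(items, train_fraction, seed=0):
    """Shuffle a copy of the items and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

n_params = count_parameters([5050, 256, 64, 2])
train_ids, test_ids = random_split(range(10), 0.7)
print(n_params, len(train_ids), len(test_ids))
```

Almost all of the roughly 1.3 million parameters sit in the first layer, since the input dimension is far larger than the hidden sizes.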
Parameter sensitivity. The length of a descriptor is usually domain-specific and selected through experimentation. We set nBins to 63, 99, and 127 respectively for SHREC 2014 Real and SHREC 2015 and repeat each experiment ten times. Each dataset is randomly split into 70% for training and 30% for testing. As the number of bins increases, the accuracy increases slightly, but the times for computing the descriptors and for training also grow, as recorded in Table 3. Therefore, nBins = 99 is chosen as a trade-off between computational time and accuracy. We also tried changing the number of hidden units in the two hidden layers, and the recognition accuracy remained above 99.4%. Furthermore, if the number of hidden layers is changed to three or four, the deep neural network still obtains about 99.0% accuracy. In addition, when using 10% of the shapes of the 3D-HSRD for training, the gender classification accuracy on the remaining 90% of the shapes is 99.0%. This suggests that the proposed training method does not overfit.

1) COMPARISONS WITH PREVIOUS WORKS
We run the proposed DNN and, as a baseline algorithm, the SVM on the PDDs of shapes in the 3D-HSRD ten times and compare their average accuracies, listed in the last two rows of Table 4. It can be seen that the mean classification accuracy of the proposed method PDD+DNN, 99.5%, is higher than the 82.1% of PDD+SVM. The average training and testing times of our DNN and the SVM over the ten runs are 0.659 and 2.074 hours, respectively. In some runs, the training of the SVM does not converge, so the averaged accuracy of the SVM is lower and its computational time is longer.
In Table 4, we also compare our method with previous works on gender recognition of 3D human shapes, where the classifiers are all SVMs, and the hand-crafted features are Geodesic Distances (GD) between landmarks [3], Normal Distributions (ND) on the chest region [4], Landmarks Positions (LP) [5], and Vertices Coordinates (VC) [6] respectively. Firstly, the previous gender recognition works are either posture variant or require landmarks and pre-alignment, while our method is posture invariant and needs no pre-processing step. Secondly, the proposed geometric deep learning method obtains higher accuracy on a much larger dataset.

2) COMPARISONS WITH OTHER GEOMETRIC DEEP LEARNING
As far as we know, the proposed method is the first to use geometric deep learning for gender recognition. Previous geometric deep learning methods are mainly used for shape classification [18], [39], filtering [19], and correspondence [21]. The deep learning frameworks for filtering [19] and correspondence [21] cannot be used for gender recognition. We therefore compare our method with geometric deep learning methods for shape classification, namely MVCNN [18] and PointNet [39], on the 3D-HSRD for gender recognition in Table 5. Because the classic MVCNN and PointNet are mainly aimed at shape classification, they obtain lower accuracies on the new application of gender recognition. Figure 6 demonstrates that the proposed method can correctly recognize the genders of some 3D human shapes with ambiguous sex characteristics, whose genders are difficult to distinguish by eye. The tall and strong female shapes in orange look like males, but the proposed method still recognizes them as female. It also correctly recognizes the male shapes in blue with confusing gender features, such as prominent chests, narrow shoulders, and slim figures. This recognition ability is mainly because the 3D-HSRD has enough training samples with various human shapes and postures, as demonstrated in Figure 4.

B. ROBUST TO NOISES
Because the biharmonic distances are less sensitive to noise [11], the proposed gender recognition method (PDD+DNN) is robust to both Gaussian and topological noise.

1) GAUSSIAN NOISES
We apply the proposed method to 3D human shapes artificially corrupted by Gaussian noise on the AdobeData [26] dataset with 2,000 dressed human shapes, where the variance is P% of the mean edge length of each shape. Three groups of experiments are shown in Table 6, where shapes in the train and test sets are either clean or corrupted by Gaussian noise at different levels. In each run, the dataset is randomly split into a 70% train set and a 30% test set. The results demonstrate that the proposed gender recognition method is robust to scanning measurement noise, mainly because the PDD is based on the biharmonic distances, which are insensitive to noise [11], as shown in Figure 7. Figure 8 demonstrates that the proposed method can recognize the genders of human shapes with real-world noise, where male and female shapes are in blue and orange respectively.
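A noise corruption of this kind could be sketched as follows. The text does not spell out how the noise is applied, so this sketch makes two loud assumptions: the noise is added independently to every vertex coordinate, and P% of the mean edge length is used as the standard deviation of the Gaussian. The function names and the single-triangle mesh are hypothetical.

```python
import math
import random

def mean_edge_length(vertices, edges):
    """Average Euclidean length of the mesh edges, given as index pairs."""
    return sum(math.dist(vertices[a], vertices[b]) for a, b in edges) / len(edges)

def add_gaussian_noise(vertices, edges, percent, rng):
    """Perturb each coordinate with zero-mean Gaussian noise.

    Assumption: the noise scale (standard deviation) is `percent` % of the
    mean edge length; the source only states the noise level relative to it.
    """
    sigma = percent / 100.0 * mean_edge_length(vertices, edges)
    return [tuple(c + rng.gauss(0.0, sigma) for c in v) for v in vertices]

# A single triangle standing in for a human mesh.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tri_edges = [(0, 1), (1, 2), (2, 0)]
rng = random.Random(0)
noisy = add_gaussian_noise(verts, tri_edges, 5.0, rng)
clean = add_gaussian_noise(verts, tri_edges, 0.0, rng)
```

Scaling the noise by the mean edge length makes the corruption level comparable across meshes of different resolution.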

2) TOPOLOGICAL NOISES
Due to inaccuracies in the scanning and merging process, scanned 3D human shapes inevitably contain some topological noise, such as holes, handles, and tunnels. It is desirable that a gender recognition method be robust to topological noise.
However, the previous works are usually based on geodesic distances, which are sensitive to topological noise [3], [5]. Since the biharmonic distances are insensitive to topological noise [11], the proposed PDD and the gender recognition method are robust to topological noise, as shown in Figure 9.

3) PERFORMANCE ON REAL SCANS
We also apply the proposed method to the real scan dataset MPI D-FAUST scan [37], which consists of 20,756 male and 20,464 female real scans with larger real-world topological noise, as shown in Figure 10. Because the scans have about one hundred thousand vertices each, we simplify each scan to ten thousand vertices for fast computation of the PDD. This dataset is randomly split into a 70% train set and a 30% test set. The testing accuracy of the proposed method (PDD+DNN) is 96.7%, which is much better than that of PDD+SVM, whose accuracy is 73.5%. This is possibly because gender recognition is a non-linear problem, while the SVM is a linear classifier.

V. CONCLUSION, LIMITATIONS AND FUTURE WORKS
In this article, a novel deep learning method for gender recognition of 3D human shapes is introduced. Our gender recognition method mainly consists of two steps, i.e., extracting descriptors based on probability distributions of biharmonic distances and applying a deep neural network for gender recognition. Furthermore, we construct a larger 3D human shape dataset called 3D-HSRD, with more than one hundred thousand shapes, for the evaluation of gender classification and further research on other applications. Compared with previous works, the proposed approach obtains higher recognition accuracy and has more desirable properties: it is posture invariant, robust to noise, and requires neither landmarks nor a pre-registration process.
However, the proposed method still has some limitations. Firstly, a few shapes are still recognized incorrectly, as shown in Figure 11, whose examples are randomly selected from the test shapes with wrongly predicted genders. Secondly, compared with the much larger ImageNet in computer vision, the 3D-HSRD is still relatively small, which limits the generalization ability of the learning method to some extent.
In the future, we would like to use parametric generation models, such as SCAPE [23], SMPL [40], and CAPE [41], to synthesize more 3D human shapes, especially dressed shapes, for enlarging the 3D-HSRD. Furthermore, we will also investigate deep learning methods for recognizing other attributes of 3D human shapes, for example, height, weight, and posture.
HUI WANG received the Ph.D. degree in computational mathematics from the Dalian University of Technology. He is currently an Associate Professor with the School of Information Science and Technology, Shijiazhuang Tiedao University, China. His research interests include computer graphics, digital geometry processing, and image processing.
XIAOYANG LIN received the B.S. degree from the School of Mathematics and Computer, Hebei Normal University for Nationalities. She is currently pursuing the M.S. degree in computer technology with Shijiazhuang Tiedao University.
NANNAN LI received the B.S. and Ph.D. degrees in computational mathematics from the Dalian University of Technology, Dalian. She has been an Associate Professor and held a postdoctoral position with the School of Information Science and Technology, Dalian Maritime University. Her research interests include computer graphics, differential geometry analysis, and machine learning.
XIUPING LIU received the Ph.D. degree in computational mathematics from the Dalian University of Technology, China. She is currently a Professor with the School of Mathematical Sciences, Dalian University of Technology. Her research interests include shape modeling and analyzing, and computer vision.