A Novel Multiface Recognition Method With Short Training Time and Lightweight Based on ABASNet and H-Softmax

To address the low accuracy of traditional face recognition algorithms and the excessively long training time of deep learning methods, this paper proposes a novel lightweight multiface recognition method with a short training time. Firstly, a facial feature vector extraction model is established based on local binary patterns (LBP) and principal component analysis (PCA). Secondly, the beetle antennae search (BAS) algorithm is optimized with adaptive factors, yielding the ABAS algorithm; this paper uses ABAS to optimize the initial thresholds of the neural network, and proposes the ABASNet method. Thirdly, ABASNet is combined with the face features extracted by the feature vector extraction model and applied to multiface classification tasks. Finally, H-softmax (hierarchical softmax) replaces softmax in the neural network; by reducing the amount of computation in multi-category face classification, it shortens the training time of the ABASNet network. Ablation experiments and comparisons with methods such as the IKDA + PNN algorithm and the PCANet algorithm verify the accuracy, robustness and short training time of the ABASNet method. The maximum accuracies of the method on the ORL, ExtYaleB and FERET face datasets are 99.35%, 99.54% and 99.18% respectively. In addition, actual tests show a training time of 21 seconds and a recognition time of 0.06 seconds, demonstrating the lightweight and real-time performance of the proposed method. Further test results show that the proposed ABASNet method can also be deployed on embedded devices and applied to other classification tasks.


I. INTRODUCTION
Multiface recognition has high research value due to its wide range of applications and large market [1]. At the same time, recognition of the same face is easily affected by differences in lighting, pose and expression. Overcoming these differences is the key to extracting stable facial features, which is of great significance for subsequent face recognition. (The associate editor coordinating the review of this manuscript and approving it for publication was Wenming Cao.)
Face recognition methods can be summarized as traditional image processing methods and deep learning methods. The two are similar in their recognition pipeline: the main steps are feature extraction and classification [2]. Traditional face recognition methods extract effective features with hand-crafted feature extractors and then classify face images with numerical calculation or machine learning [21]. Traditional methods require only a small amount of face data, but their accuracy is low and their classification ability on large-scale face data is poor [3]. Deep learning methods extract features with a convolutional neural network (CNN) and then classify with fully connected layers [17], [18]; these methods achieve higher accuracy. However, the accuracy of deep learning methods depends heavily on the quantity and quality of data: a large amount of data must be iterated over so that the features extracted by the CNN become more conducive to subsequent classification [19], [20]. Although neural network methods are accurate, the training time and the amount of training data they require are extremely large [4]. Therefore, how to predict and recognize a large number of face categories with a small face dataset and a short training time has become a valuable research topic. Based on the ideas of the beetle antennae search (BAS) algorithm and hierarchical softmax (H-softmax), this paper proposes a novel multiface recognition method with higher accuracy and better robustness even on little data.
The main contributions of the paper are summarized as follows:
• The BAS algorithm is used to optimize the neural network, which solves the problem that traditional neural network algorithms are easily affected by the initial weights and thresholds and fall into local optima. An adaptive factor is introduced to reduce the step size and learning rate as the network approaches convergence, which reduces the fluctuation of the neural network and increases the robustness of the algorithm. Replacing the softmax layer with the H-softmax function improves the speed of training and classification on multiclass data. The resulting ABASNet method can be extended to other image classification fields.
• Traditional deep learning methods require large amounts of face data and high-performance hardware. The architecture of ABASNet, in contrast, is lightweight and does not require large amounts of data for learning. Instead of extracting features with a convolutional neural network through repeated iterations, it uses local binary patterns and principal component analysis to extract facial features, which reduces training time; a neural network then performs face recognition on the extracted features. The proposed method performs excellently on several challenging benchmark face recognition datasets, and its robustness and accuracy are better than many similar methods. It has the advantages of low hardware requirements, high accuracy and short training time, and can be used in embedded devices and mobile applications that require shorter training times.
The paper is organized as follows. Section II introduces related work. Section III establishes the multiface recognition network model and describes the specific implementation of multiface classification. Section IV verifies the accuracy and robustness of the LBP-PCA-ABASNet multiface recognition method with a small number of samples, and verifies the speed and adaptability of the proposed method on large-scale face categories; the lightweight nature and effectiveness of the method are also verified in a classroom setting. Section V presents the conclusion and future work.

II. RELATED WORK
Timo et al. [5] applied local binary patterns to face recognition. This method has fast recognition speed, but on large-scale face datasets its classification ability is poor and its accuracy is low. Chan et al. [6] used convolutional layers for feature extraction and fully connected layers for classification. The advantage is high accuracy, but each face requires a large dataset, the hardware requirements are high, and training takes a long time. Therefore, how to predict and recognize a large number of face categories with a small face dataset and a short training time has become a valuable research topic.
Aijia et al. [2] therefore used kernel LDA as a feature extraction method and a probabilistic neural network (PNN) for face recognition. This method combines a traditional image processing method with a neural network method, so it is not only fast but also requires less data. Following this idea, this paper uses traditional methods to extract facial features and then uses a neural network for classification. Timo et al. [5] used local binary patterns to extract image features; the extracted facial features are invariant to lighting and rotation. Compared with the feature extraction of convolutional neural networks, features can be extracted directly without training weights, which makes facial feature extraction accurate and fast.
However, the extracted features of a face actually contain many unnecessary features. To solve this problem, Liu et al. [23] presented a novel dimension reduction method termed discriminative sparse embedding (DSE) based on an adaptive graph. By projecting the original samples into a low-dimensional subspace, DSE learns a sparse weight matrix, which can reduce the effects of redundant information and noise in the original data and uncover the essential structural relationships among the data. Liu et al. [24] presented a dimension reduction algorithm termed linear regression classification steered discriminative projection (LRC-DP). LRC-DP not only fits LRC well, but also seeks a linear projection in which the ratio of between-class reconstruction errors to within-class reconstruction errors is maximized in the transformed space. Chan et al. [6] used principal component analysis to reduce feature dimensions. Principal component analysis simplifies the data structure by reducing data dimensions. To extract the principal features of the matrix data, the data in the high-dimensional space is projected into a low-dimensional space by an orthogonal transformation; the eigenvectors with large eigenvalues are retained and those with small eigenvalues are removed. When this method is used to solve for a subspace, the matrix dimension is reduced and the most principal feature vectors are obtained. We use principal component analysis to filter the face features in preparation for subsequent face recognition and classification.
Previous work on lightweight face recognition includes the following. To lighten the structure of the neural network, Nassih et al. [22] adopted representative deep learning methods, deep belief networks (DBN) and stacked auto-encoders (SAE), to initialize deep supervised neural networks (NN), in addition to back propagation neural networks (BPNN) applied to the face classification task. Liu et al. [25] proposed a robust manifold embedding (RME) algorithm, which can fully use the class label information and correctly capture the underlying manifold structure. Ma et al. [26] proposed a lightweight privacy-preserving adaptive boosting classification framework for face recognition. Li et al. [27] proposed a lightweight face recognition algorithm, LightFace, which is based on depthwise separable convolution.
Liu et al. [7] and Jiwon et al. [9] studied the robustness and accuracy of large-scale face recognition and classification. In previous work, a BP neural network was combined with softmax to classify faces, but the robustness of the method was found to be poor: the algorithm relies heavily on the initial weights, resulting in large differences in training results between runs. Liu et al. [25] proposed a robust manifold embedding (RME) algorithm, which can fully use the class label information and correctly capture the underlying manifold structure to enhance robustness. Aijia et al. [2] constructed a face recognition method based on a GA-BP neural network; their tests show that using an optimization algorithm to optimize the initial weights and thresholds of the BPNN before network training can largely prevent the network from falling into local optima caused by the random initialization of the initial weights and thresholds.
Meanwhile, BAS algorithm is a bionic algorithm, which is obtained by simulating the feeding behavior of beetles. Compared with the traditional population optimization algorithm, the algorithm has a higher convergence rate because its optimization mode is a single beetle [10]. At the same time, the algorithm has few parameters, strong universality and robustness. The algorithm does not need to know the specific fitness function structure to optimize the fitness function and has less constraint on the initial conditions [11]. Therefore, the BAS algorithm is used to optimize the neural network. Compared with other bionic algorithms, BAS has the characteristics of fast convergence, simple coding, and easy implementation [10], [11]. This paper also adds adaptive factors to improve the stability of the BAS algorithm, and uses a fusion algorithm to build a multiface classification network, and proposes the ABASNet method for face recognition.
Multiface classification is a significant part of face recognition. Especially in deep learning, softmax is a very common classification function, which converts raw network outputs into values between 0 and 1 to achieve the classification effect. Softmax normalizes the input into a probability distribution over the categories for each face image. The softmax function is monotonically increasing: the larger the input value, the larger the output, and the greater the probability that the input image belongs to that label. However, the traditional softmax method requires a large amount of calculation when predicting the probability of each category.
Hierarchical softmax (H-softmax) is a large-scale classification method based on the idea of an adaptive binary tree [12]. H-softmax is widely used in natural language processing; it reduces the computation of softmax by constructing a Huffman tree, and is often used to handle the very large vocabularies of natural language [13]. The Huffman tree is the tree with the shortest weighted path; its characteristic is that the greater the weight, the closer the leaf node is to the root, i.e., the greater the weight, the shorter the search path to the leaf. A hierarchical softmax constructed according to this property shortens the search path to the target category and reduces classification time. Therefore, this paper uses H-softmax as the output-layer function of ABASNet to classify face categories.

III. MULTIFACE RECOGNITION NETWORK MODEL
A. MULTIFACE FEATURE EXTRACTION BASED ON LBP AND PCA
Before face recognition, multiface features should be extracted and preprocessed, which requires that the accuracy of face recognition does not change with angle, scale or lighting. Therefore, the extracted features need to be invariant to lighting, scale and rotation. Scale invariance can be achieved by interpolating images to a uniform size. Local binary patterns (LBP) can extract histogram features from an image that are rotation-invariant and resistant to lighting interference [5]. Therefore, the local binary patterns method is used to extract multiface features.
However, local binary patterns capture the frequency of bright and dark spots, smooth areas and edges in an image, and cannot describe the structural information of the image. Meanwhile, the local features of different face images of the same person differ greatly: if an image yields only a single histogram of local binary patterns, information is lost due to local differences. Therefore, the original face image is divided into K (K = W * H) blocks of W columns and H rows, and the local binary pattern histogram features of each block are calculated. The histograms of all partitions are then connected in order, from top to bottom and left to right, to form a 1 * K feature vector that represents the LBP histogram features of the whole face image. This solves the problem of losing local difference information when extracting face features with LBP.
As shown in Fig.1, the face image is divided into W columns and H rows, giving K blocks. The pixel values of each small block are coded by the LBP operator: a series of binary numbers describing the local texture of the image is obtained by comparing the central pixel with the surrounding pixel values in a 3 × 3 neighborhood, and these binary numbers are connected end to end to form a binary code. The value of this pattern does not change when the image rotates, so it is rotation-invariant. And when the lighting changes, all pixel values increase or decrease together without affecting the LBP calculation. A binary code with a limited number of 0-to-1 or 1-to-0 transitions between adjacent bits is called a uniform pattern; when the LBP histogram statistics are computed, only the occurrences of each uniform binary pattern are counted. This method discards irrelevant features while retaining the useful information of the image, and reduces the number of LBP features. Its mathematical expression is formula (1).

v_i = 1 if c_i ≥ c_a, otherwise v_i = 0;  Area_LBP(V_i) = Σ(i = 0..7) v_i · 2^i   (1)
where c_a is the value of the central pixel, c_i is the value of a surrounding pixel, v_i is the coded value of the LBP operator around the center point, and Area_LBP(V_i) is the binary coded value of the LBP operator for each image block. Therefore, the binary coded value in Fig.1 is 11111001. In Fig.1, every block has a corresponding LBP code. The LBP operator can also be a circle with radius r, with an arbitrary number of sampling points p [5]. The LBP binary coding method is shown in Fig.2: these are different LBP uniform modes with different radii, different block sizes and different numbers of sampling points p. The parameter r is the radius and p is the number of sampling points. For sampling points that do not fall on pixel centers, the gray value is calculated by bilinear interpolation. For a face image, the more blocks are segmented, the smaller the area of each block, and some key information may be lost; if fewer blocks are extracted, the local information is not described comprehensively enough. Both affect the accuracy of multiface recognition. Therefore, in Section IV, tests were carried out to select the optimal number of image blocks.
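As a concrete illustration of formula (1), the sketch below computes the 8-bit LBP code of a single 3 × 3 block by comparing each neighbour with the centre pixel. The block values and the clockwise neighbour ordering are our own illustrative choices, not taken from Fig.1:

```python
import numpy as np

def lbp_code(block3x3):
    """Formula (1): compare the 8 neighbours c_i with the centre c_a;
    v_i = 1 if c_i >= c_a else 0, then read the bits as a binary code."""
    c_a = block3x3[1, 1]
    # neighbours clockwise starting from the top-left corner
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if block3x3[r, c] >= c_a else 0 for r, c in order]
    return int("".join(map(str, bits)), 2), bits

# hypothetical pixel values; centre c_a = 50
block = np.array([[90, 80, 70],
                  [60, 50, 40],
                  [30, 20, 10]])
code, bits = lbp_code(block)
print(bits)  # [1, 1, 1, 0, 0, 0, 0, 1]
```

Per-block codes like this are then histogrammed over uniform patterns, and the block histograms are concatenated as described above.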
After obtaining the LBP histogram of each block, we connect the features of all blocks of a face from top to bottom and left to right to form a 1 * K-dimensional face feature vector, where K is the number of dimensions extracted by LBP for each image. Then we extract the features of L face images through LBP and arrange them in rows to form an L * K-dimensional LBP multiface feature matrix, where L is the number of input face images.
The method of dividing the image into multiple blocks can improve the recognition rate: the more blocks the image is divided into, the better the local texture is described. However, as the number of blocks increases, the partitions of the face image become excessively small and the dimensionality of the LBP histogram features increases sharply. Excessive dimensionality not only slows classification but also increases interference from high-frequency noise. At the same time, too few dimensions lose statistical significance due to insufficient feature extraction.
In order to further improve accuracy and neural network training speed, this paper uses PCA (principal component analysis) to reduce the dimension of the excessively high-dimensional LBP features. PCA simplifies the data structure by reducing data dimensions, which also optimizes the multiface LBP feature matrix. After extracting the LBP feature matrix for each face, the N faces are treated as a whole, the principal components are extracted by a linear transformation, and the secondary components are removed to form a multiface feature matrix based on LBP-PCA. The specific calculation is shown in formulas (2) and (3).

x_j = (1 / N_P) Σ(i = 1..N_P) P_i   (2)

A = P − x,  s = A · Aᵀ   (3)
where x_j is the mean LBP feature of the j-th type of face, P is the LBP feature corresponding to each face image of that type, N_P is the number of face images of that type, x is the sample mean of the training data set, and N is the number of face image categories.
where A is the error between a face image in the training data and the average image, and s is the covariance matrix of the multiface training sample, which is a real symmetric matrix.
The principal directions extracted by PCA are the eigenvectors of the covariance matrix s. Its K eigenvalues are then arranged in descending order.

[c_1, c_2, c_3, ..., c_N] = f(LBP_PCA)   (4)

where c_1 to c_N are the different face categories, N is the number of face classes, LBP_PCA is the multiface LBP-PCA feature matrix extracted in the previous stage, and f is the mapping relationship. In this paper, the multiface feature matrix fused from LBP and PCA is used as the network input. Since the number of input images is N and LBP-PCA reduces the features of each face image to a 1 * M-dimensional feature vector, the input feature matrix is N * M-dimensional. Since the output of the traditional BP neural network is a probability value, this paper adds the hierarchical softmax function to the output layer of the BP neural network, so that the output of the neural network is not a probability but a specific category.
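The PCA reduction described above can be sketched as a minimal eigendecomposition in plain numpy (the matrix sizes below are illustrative, not the paper's, and the covariance normalization follows the standard sample-covariance convention):

```python
import numpy as np

def pca_reduce(F, n_components):
    """PCA sketch in the spirit of formulas (2)-(3): centre the L x K LBP
    feature matrix, eigendecompose its covariance matrix, and keep the
    eigenvectors with the largest eigenvalues."""
    mean = F.mean(axis=0)
    A = F - mean                              # error w.r.t. the mean image
    s = np.cov(A, rowvar=False)               # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(s)      # eigh returns ascending order
    top = eigvecs[:, ::-1][:, :n_components]  # largest eigenvalues first
    return A @ top                            # L x M reduced feature matrix

# e.g. 200 face images with a 256-dimensional LBP feature each (placeholder)
F = np.random.default_rng(0).random((200, 256))
F_reduced = pca_reduce(F, 40)
print(F_reduced.shape)  # (200, 40)
```

The reduced L × M matrix is what the text above feeds to the neural network as the LBP-PCA multiface feature matrix.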
B. MULTIFACE CLASSIFICATION NETWORK BASED ON ABASNET
In Fig.3, w_{1,i} is the weight between the input layer and the i-th neuron in the hidden layer, w_{i,n} is the weight between the i-th neuron in the hidden layer and the n-th neuron in the output layer, and the number of hidden layer neurons is i. i is a dynamically varying value determined by the input face feature dimension m and the output category count n.

i = 0.5(m + n) + a   (5)

where a is a constant in the interval [1, 10], whose specific value depends on the actual input and output data. In formula (5), a is an empirical value, m is the input feature dimension and n is the number of output categories. When the value of i lies in the range 0.5(m + n) to 0.5(m + n) + a, the network can obtain the optimal solution. a is determined by traversing the interval [1, 10] and computing the accuracy of the proposed method on the test set of the same dataset; the a corresponding to the maximum accuracy is the best value.
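The hidden-layer sizing rule of formula (5) can be written as a small helper; the function name and the example values of m, n and a are assumptions for illustration:

```python
import math

def hidden_neurons(m, n, a):
    """Hidden-layer size per formula (5): i = 0.5*(m + n) + a, with the
    empirical constant a in [1, 10].
    m: input feature dimension; n: number of output face categories."""
    assert 1 <= a <= 10
    return math.ceil(0.5 * (m + n)) + a

# e.g. 40 PCA features, 40 face classes, a = 3 (illustrative values)
print(hidden_neurons(40, 40, 3))  # 43
```

In practice, a would be swept over 1..10 and the value giving the best test-set accuracy kept, as the text describes.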
Although a common BP neural network can classify face categories, in the case of multiface classification its accuracy is not high and its classification speed is slow. Many existing studies have shown that using optimization algorithms to optimize the initial weights and thresholds of the BPNN before network training can largely prevent the random initialization of the initial weights and thresholds from trapping the network in local optima, thereby improving accuracy and robustness [7]. Therefore, this paper uses the BAS algorithm to optimize the BP neural network to improve its accuracy and robustness.
The BAS (beetle antennae search) algorithm is modeled in the following steps:
Step 1: Initialize the orientation of the beetle with a random vector.

b = rand(k, 1) / (‖rand(k, 1)‖ + eps)   (6)
where rand() is a random function that generates random numbers between 0 and 1, and rand(k, 1) also serves as the beetle's initial position. k is the dimension of the data to be optimized. eps is a very small positive number that prevents division by zero.
Step 2: Select the candidate positions of the beetle's two antennae.

X_left = X_i + d' · b,  X_right = X_i − d' · b   (7)

where X_left and X_right are the candidate positions for the beetle's next step, X_i is the current position of the beetle at the i-th iteration, b is the orientation vector from formula (6), and d' is the sensing distance from the beetle's body to each antenna.
Step 3: Establish the fitness function and calculate the fitness values f(X_right) and f(X_left) at the positions obtained from formula (7).
Step 4: Iterate the position of the beetle by formula (8).

X_{i+1} = X_i + α_i · b · sign(f(X_right) − f(X_left))   (8)
where α_i is the step size factor at the i-th iteration and sign(·) represents the sign function.
In the BAS algorithm, the convergence speed is controlled by the step size factor. The smaller the step size, the slower the convergence, which makes it easy for the function to fall into a local optimum. The larger the step size, the faster the convergence and the stronger the global search ability, so the probability of falling into a local optimum is smaller; however, the function then oscillates easily. In order to give the algorithm better optimization ability, this paper proposes a method of adaptively changing the step size factor, called the ABAS algorithm. The detailed explanation is as follows. In the early iterations, a larger step size factor is used to enlarge the overall search over the solution space and accelerate iteration. At the end of the iterations, when the search solution tends to be stable, the step size factor is reduced to make the solution more accurate. In addition, since the step size factor decreases at every iteration, the smaller the initial step size factor, the more likely the function is to fall into a local extremum; therefore the step size attenuation coefficient should be given a high initial value in the range 0 to 1. The adaptive step size factor is computed by formula (9), which establishes a variable step factor between Step 3 and Step 4 of the BAS algorithm. In formula (9), f_i is the current value of the fitness function, f_min is the historical optimal fitness value, i is the iteration number, α is the default step size factor (typically between 0.9 and 0.95), α_i is the current step size factor, and n is the total number of iterations. After modeling the ABAS algorithm, we use it to optimize the BP neural network and propose ABASNet.
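The BAS loop with an adaptively shrinking step can be sketched as below. This is a hedged reading of steps 1-4: the geometric attenuation `step *= alpha` stands in for the paper's exact formula (9), and all parameter values are illustrative:

```python
import numpy as np

def abas_minimize(f, k, n_iters=50, d0=1.0, alpha=0.95, seed=0):
    """Adaptive beetle antennae search (ABAS) sketch that minimises f
    over a k-dimensional space. alpha plays the role of the step
    attenuation coefficient (0.9-0.95 per the text)."""
    rng = np.random.default_rng(seed)
    x = rng.random(k)                       # initial beetle position in [0, 1]^k
    best_x, best_y = x.copy(), f(x)
    step = d0
    for _ in range(n_iters):
        # Step 1: random unit orientation vector (formula (6))
        b = rng.standard_normal(k)
        b /= np.linalg.norm(b) + np.finfo(float).eps
        # Step 2: probe the left/right antenna positions (formula (7))
        d = step / 2
        x_left, x_right = x + d * b, x - d * b
        # Step 4: move toward the antenna with lower fitness (formula (8))
        x = x + step * b * np.sign(f(x_right) - f(x_left))
        y = f(x)
        if y < best_y:
            best_x, best_y = x.copy(), y
        step *= alpha                       # adaptive attenuation (stand-in for (9))
        if best_y < 1e-3:                   # fitness accuracy condition
            break
    return best_x, best_y

# sanity check on a convex bowl
x_opt, y_opt = abas_minimize(lambda v: float(np.sum(v ** 2)), k=3)
```

The stopping conditions mirror Step 8 of the ABASNet procedure below: either the fitness accuracy threshold or the iteration budget terminates the search.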
The ABASNet method is constructed in the following steps:
Step 1: According to formula (10), determine the number of parameters to be optimized by the ABAS algorithm, which is also the number of weights and thresholds of the neural network.

k = m · t + t + t · n + n   (10)
where m is the input dimension of the BP neural network, t is the number of neurons in the hidden layer, n is the number of output faces, and k is the number of parameters optimized by the ABAS algorithm.
Step 2: Establish the adaptive step size factor of the fitness function according to formula (9).
Step 3: In the back propagation neural network, the root mean cross entropy is used as the loss function. The smaller the loss function is, the better the prediction effect of the network will be.
Step 4: Initialize the parameters of the ABAS algorithm: randomly select a k-dimensional initial parameter vector with values between 0 and 1, which forms the initial coordinates of the beetle, and save it.
Step 5: Calculate the fitness and store the position x_{i+1} that satisfies the condition in bestY, according to formula (8).
Step 6: Calculate the positions of the left and right antennae of the beetle according to formula (7).
Step 7: Update the beetle position according to fitness function formula (8), calculate the fitness value of the current position, and update the stored optimum if it improves. In this way, the initial weights of the BPNN are optimized.
Step 8: Complete the search process according to the fitness accuracy condition (0.001 in this paper) and the maximum number of iterations (set to 50 in this paper). If either condition is satisfied, proceed to the next step; otherwise return to Step 6 to continue the optimization.
Step 9: Generate the optimal solution. When the algorithm stops iterating, the latest BestX is the optimal solution and is used as the initial weights of the BP neural network. Based on these optimal initial parameters, a multiface local binary feature classification network is established.
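Steps 1 and 3 above can be sketched as follows: `n_params` implements the parameter count of formula (10) (assuming it counts both weights and thresholds), and `fitness` evaluates one beetle position as the loss of a one-hidden-layer network whose weights are unpacked from that position. The function names, the cross-entropy stand-in loss, and the data are illustrative:

```python
import numpy as np

def n_params(m, t, n):
    """Formula (10): parameters ABAS must optimise for an m -> t -> n
    network (input-hidden weights + hidden thresholds +
    hidden-output weights + output thresholds)."""
    return m * t + t + t * n + n

def fitness(theta, X, Y, m, t, n):
    """Fitness of one beetle position: the loss of the network whose
    weights are unpacked from the flat vector theta."""
    w1 = theta[:m * t].reshape(m, t); off = m * t
    b1 = theta[off:off + t]; off += t
    w2 = theta[off:off + t * n].reshape(t, n); off += t * n
    b2 = theta[off:]
    H = np.tanh(X @ w1 + b1)                      # hidden layer
    Z = H @ w2 + b2                               # output scores
    P = np.exp(Z - Z.max(axis=1, keepdims=True))  # numerically stable softmax
    P /= P.sum(axis=1, keepdims=True)
    return -np.mean(np.log(P[np.arange(len(Y)), Y] + 1e-12))

m, t, n = 40, 43, 40        # PCA features, hidden neurons, face classes
k = n_params(m, t, n)       # dimension of the beetle's position vector
rng = np.random.default_rng(0)
theta = rng.random(k)       # one candidate set of initial weights (Step 4)
loss = fitness(theta, rng.random((8, m)), rng.integers(0, n, 8), m, t, n)
```

Each beetle position evaluated in Steps 5-7 is one such flat vector theta; the BestX returned in Step 9 becomes the network's initial weights before ordinary back-propagation training.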
Therefore, thanks to the fast iteration speed and wide global search capability of the adaptive beetle antennae search algorithm and the prediction accuracy of the BP neural network, the LBP-PCA multiface feature matrix can be predicted and classified by the ABASNet method. On this basis, the LBP-PCA-ABASNet multiface recognition method is proposed in this paper.

C. A RAPID MULTICLASSIFICATION METHOD BASED ON H-SOFTMAX
Multiface recognition is a multiclassification problem. In deep learning, softmax is a very common classification function. Assuming there are M categories, the softmax calculation is as in formula (11).

y_i = e^(z_i) / Σ(j = 0..M−1) e^(z_j)   (11)

where z_i denotes the network's output value for category i.
where i = 0, 1, 2, ..., M − 1; i is the label of the face output by the network and y_i is the predicted probability of the corresponding label.
Assuming the dataset has M faces, it needs to be divided into M categories. To calculate the probability of one face, the denominator of the softmax function must sum the terms of all face categories. Then, classifying M targets with the softmax function requires on the order of M * M computations. If the number of face categories is 5 million, roughly 25 trillion calculations are needed to complete the classification. At the same time, as shown in formula (11), all M category terms must be summed for the denominator of every calculation. The traditional softmax method therefore requires a large amount of computation when predicting the probability of each category. Using Huffman trees to form H-softmax can greatly reduce the computation while preserving accuracy when processing large-scale face categories. Therefore, this paper uses H-softmax as the output-layer function of ABASNet to classify face categories.
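The operation-count argument can be checked with a two-line sketch (the function names are ours, and the H-softmax bound assumes a balanced binary tree over m leaves):

```python
import math

def softmax_ops(m):
    """softmax must sum over all m categories for every prediction."""
    return m

def hsoftmax_ops(m):
    """H-softmax walks one root-to-leaf path: at most ceil(log2(m))
    binary decisions in a balanced tree over m leaves."""
    return math.ceil(math.log2(m))

print(softmax_ops(5_000_000), hsoftmax_ops(5_000_000))  # 5000000 23
```

This reproduces the text's figure: at most 23 probability evaluations for 5 million face categories instead of 5 million.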
p(n, left) = δ(v'_n · h)   (12)

p(n, right) = 1 − δ(v'_n · h)   (13)

p(w_2) = p(n(w_2, 1), left) · p(n(w_2, 2), right)   (14)

where δ is the sigmoid function, p(n, left) is the probability of branching left at node n, p(n, right) is the probability of branching right, v'_n is the vector of the intermediate node n, and h is the hidden-layer output vector.
As shown in Fig.4, w_1 to w_m are the corresponding categories and m is the number of categories. The top node is the root and the bottom nodes are the leaves; there are m − 1 intermediate nodes, and each leaf has a unique path from the root. n(w_i, j) denotes the j-th node on the path of w_i. Suppose we need to calculate the output probability of w_2 in Fig.4. From the root node to the leaf, each intermediate node performs a binary decision (left or right). The probability of branching left at a node is calculated by formula (12), and formula (13) gives the probability of branching right. So, walking from the root node to w_2, the probability of w_2 is calculated by formula (14). With the H-softmax function, only the probability values along this path need to be computed, whereas the softmax function must compute the probability values of all categories and sum them as the denominator of formula (11). The difference is that in softmax each category corresponds to a one-hot label, while in H-softmax each category corresponds to a Huffman code. The number of probability values computed cannot exceed ⌈log2 m⌉, which is far less than m. If there are 5 million faces to be classified, computing the probability for one picture with H-softmax requires at most 23 steps, far fewer than 5 million calculations. Therefore, we use H-softmax as the classification function of the output layer of ABASNet.
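Formulas (12)-(14) can be sketched as follows; `h` stands for the hidden-layer output and the node vectors are random placeholders rather than trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def path_probability(h, path):
    """p(w) per formulas (12)-(14): multiply the binary branch
    probabilities along the root-to-leaf path. Each path element is
    (v_node, go_left); p(left) = sigmoid(v_node . h) and
    p(right) = 1 - sigmoid(v_node . h)."""
    p = 1.0
    for v_node, go_left in path:
        s = sigmoid(np.dot(v_node, h))
        p *= s if go_left else (1.0 - s)
    return p

# p(w2) = p(n(w2,1), left) * p(n(w2,2), right), as in formula (14)
rng = np.random.default_rng(0)
h = rng.standard_normal(8)                     # hidden-layer output
root_v, inner_v = rng.standard_normal((2, 8))  # placeholder node vectors
p_w2 = path_probability(h, [(root_v, True), (inner_v, False)])
```

Because each node's left/right probabilities sum to 1, the probabilities of all leaves of the tree sum to 1, so the path products form a valid distribution over the face categories.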
The multiface recognition structure based on LBP-PCA-ABASNet and H-softmax is shown in Fig.5. Firstly, the facial dataset to be trained is input, and each image is converted to grayscale and interpolated so that all input images have a uniform size. Secondly, each face is divided into regions and LBP features are extracted to form a 1 * K-dimensional face feature vector; the LBP features of all face images are then combined into an N * K matrix. Thirdly, PCA extracts the M principal features of the matrix to form the final N * M-dimensional LBP-PCA multiface feature matrix. Fourthly, the parameters of the ABAS algorithm are initialized and a multiface feature network is constructed based on the ABASNet algorithm. Finally, H-softmax completes multiface classification and recognition.

IV. VERIFICATION AND ANALYSIS
A. VALIDATION TEST OF LBP-PCA-ABASNET ALGORITHM
In this paper, the fused LBP and PCA feature extraction method is used to extract as many principal components as possible and form a feature matrix, so that the face features are rotation-invariant and strongly resistant to lighting interference. The ABAS algorithm is used to optimize the BP neural network, and the multiface classification time is reduced by replacing the softmax output layer with the H-softmax function. Finally, a multiface classification network is constructed to realize the face recognition process. Here, we show the results of LBP-PCA-ABASNet on the ORL dataset to reveal its ability to handle multiface classification.
The ORL dataset is a widely used benchmark multiface recognition database [14]. It contains 400 images of 40 subjects, ten per subject [1]. All images were taken against a dark, uniform background in a frontal pose (some with a slight lateral deviation). The parameters of the data set are shown in Table 1. We split the training and test sets 1:1: five images per subject, i.e. 200 face images of the 40 subjects, form the training set, and the remaining 200 form the test set. The labels are one-hot encoded, so the label of the correct subject is 1 and the labels of all other subjects are 0. The data are then fed into LBP-PCA-ABASNet for training.

The smaller the LBP block, the finer the local texture it captures. As shown in Fig.2 for the LBP coding modes of different face images, the coding mode determines the block size, and different block sizes affect the final training accuracy. The resolution of each face image in this paper is fixed at 112 * 92, so a larger number of LBP blocks per image means a smaller block area, and vice versa. As the number of blocks increases, however, the dimension of the LBP histogram feature grows sharply and the partitions of the face image become too small. Excessive dimensionality not only slows classification but also amplifies interference from high-frequency noise, while overly small partitions make the histograms too sparse to be statistically meaningful. Conversely, although reducing the LBP feature dimension through PCA can improve between-class separability, reducing it too far degrades classification. The face LBP feature dimension after PCA processing should therefore be kept within a reasonable range.
Therefore, this paper runs an error comparison experiment on the ORL database under different numbers of LBP image blocks and different PCA dimensions, and selects the best combination in Fig.6. As shown in Fig.6, with an input face resolution of 112 * 92, the error is smallest (0.005, i.e. 99.5% accuracy) when each training image is divided into 22 * 22 blocks along its length and width and the PCA dimension is 40.
In the multiface LBP-PCA matrix, each row is a face category and each column is a feature of that face. The LBP-PCA matrix must therefore have full row rank to classify multiple faces accurately; in theory, the number of dimensions retained by PCA needs to be greater than the number of classes. The robustness of the method also matters: under identical parameters, multiple runs should produce only very small deviations in error. To verify this reasoning and observe the robustness of the neural network under different PCA dimensions, we perform the comparison experiment in Fig.7.
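The full-row-rank condition can be checked directly on the LBP-PCA matrix; `can_separate` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def can_separate(feature_matrix, n_classes):
    """True only if the N x M LBP-PCA matrix has full row rank, which
    requires the PCA dimension M to be at least the number of face
    categories (otherwise the classes cannot all be distinguished)."""
    n, m = feature_matrix.shape
    if m < n_classes:              # PCA dimension below the class count
        return False
    return bool(np.linalg.matrix_rank(feature_matrix) == n)
```

For example, with 40 classes a 60-dimensional feature matrix can have full row rank, while a 20-dimensional one cannot, matching the error jump below dimension 40 in Fig.7.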
As shown in Fig.7, the horizontal axis is the LBP feature dimension of the face image after PCA processing, and the vertical axis is the average error curve on the test set. The red error bars along the curve show the standard deviation of the error over 50 identical tests. In Fig.7, when the dimension drops below 40 the error increases significantly, confirming that the number of dimensions must remain greater than the number of categories. At the same time, the red error bars show a standard error below 1%, indicating that the LBP-PCA-ABASNet method is robust.
As shown in Fig.8, to demonstrate the robustness of the proposed method more precisely, this paper conducted 10 groups of error comparison tests on the ORL database, with the parameters in the 10 tests of each method kept exactly the same. LBP-PCA-BPNN is the method without ABAS optimization, and LBP-PCA-ABASNet is the method optimized by the ABAS algorithm; all other parameters of the two methods are identical. Across the experimental groups, the error of LBP-PCA-BPNN fluctuates greatly, while LBP-PCA-ABASNet fluctuates less and is more robust.
As shown in Fig.9, to verify the effectiveness of H-softmax in shortening the training time, we conducted a set of ablation experiments. The classification function in method 1 is softmax, the classification function in the proposed method is H-softmax, and all other parameters are identical. Fig.9 shows this ablation experiment on training time, and Table 2 lists the ablation results of method 1 and the proposed method under the same parameters and data sets. The numbers of categories in the ORL, ExtYaleB and FERET databases are 40, 28 and 1196 respectively, and the data quantities of (a), (b) and (c) are 400, 16128 and 10000 respectively. Fig.9 and Table 2 show that using H-softmax as the classification function significantly shortens the training time.

B. COMPARISON EXPERIMENT OF DIFFERENT METHODS
At the same time, to illustrate the high-precision performance of the proposed algorithm, this paper compares LBP-PCA-ABASNet with various methods in Fig.10. GA-BPNN is a representative algorithm for optimizing the BP neural network [7]. PCA-BPNN is a classic face recognition method based on a BP neural network, without LBP feature extraction or ABAS optimization.
LBP-PCA-BPNN is the method presented in this paper without ABAS optimization. PCANet is a deep learning method for face recognition [8]. These comparative experiments verify the high accuracy and robustness of LBP-PCA-ABASNet. Table 3 lists the parameters corresponding to Fig.10.
As shown in Fig.10, the PCA-BPNN method has low accuracy and poor robustness. The GA-BPNN method, optimized by a genetic algorithm, is more accurate, but due to the randomness of genetic factors and the problems of BPNN itself, its robustness is poor and it overfits as easily as PCA-BPNN. The error of the LBP-PCA-BPNN method drops rapidly within the first 100 epochs, but since the BPNN network itself is prone to overfitting, its error gradually rises again once the number of epochs exceeds 120. The LBP-PCA-ABASNet method of this paper first uses LBP-PCA to improve accuracy, then uses the ABAS algorithm to damp the weight fluctuation of the neural network in each epoch iteration. As a result, its error decreases less rapidly over epochs 0 to 100, but it avoids overfitting over epochs 100 to 500. The PCANet method is a deep learning method, so more weights must be fitted through many iterations and a large amount of data; its convergence is slower and its accuracy lower than the method in this paper, though higher than the other methods. Therefore, the LBP-PCA-ABASNet algorithm combines high accuracy with good robustness.
As shown in Table 3, we use the average accuracy over 30 experiments and the training time over 500 epochs to evaluate model performance. Table 3 shows that the LBP-PCA-ABASNet method has the highest accuracy. Using the ABAS algorithm to optimize the neural network increases the training time, as traditional population-based algorithms do, but the accuracy and robustness of the network improve.

C. EXPERIMENTS ON EXTYALEB DATABASE AND FERET DATABASE
In order to verify the reliability of the methods in this paper, we compare them on other face data sets. The FERET face database contains a large number of face images, with exactly one face per picture [15]; photos of the same person differ in expression, lighting, posture, and age. Containing more than 10,000 face images with multiple poses and illuminations, it is one of the most widely used databases in the field of face recognition. The ExtYaleB face database contains 16,128 images covering 9 poses and 64 lighting conditions [16]. The FERET and ExtYaleB databases are often used to assess the quality of face recognition algorithms. We therefore compare the errors of this method with the other methods over 500 epochs in Fig.11(a) and Fig.11(b) respectively; for both FERET and ExtYaleB, the training-to-test set ratio is 7:3. Table 4 and Table 5 give the parameters corresponding to Fig.11(a) and Fig.11(b) respectively. To make the experiment more rigorous, the minimum error is not used as a comparison parameter in Table 4 and Table 5. Table 4 shows that all five models estimate faces well, but the LBP-PCA-ABASNet algorithm is both highly accurate and fast to train. There are 1196 subjects in the FERET dataset; because H-softmax replaces the softmax function, the probability of each target needs to be calculated at most 11 times, whereas softmax would require 1196 calculations per target. Table 4 also shows that as the amount of face data grows, the training time of LBP-PCA-ABASNet increases only slightly while the training times of the remaining methods increase substantially, verifying the effectiveness of the H-softmax function in reducing training time.
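The per-sample evaluation counts quoted above follow directly from the ceiling of log2 of the category count:

```python
import math

# Maximum number of branch probabilities H-softmax evaluates per sample,
# versus the category count a plain softmax must sum over:
for name, m in [("ORL", 40), ("ExtYaleB", 28), ("FERET", 1196)]:
    print(name, m, "->", math.ceil(math.log2(m)))
# FERET needs at most 11 evaluations instead of 1196 softmax terms.
```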
Fig.11(b) shows the comparison of the five methods on the ExtYaleB face database. Table 5 shows that all five models estimate faces well, but the LBP-PCA-ABASNet algorithm is clearly more accurate. The traditional PCA-BPNN algorithm is similar to GA-BPNN in accuracy, and both are much worse than the other models. From the overall accuracy and convergence trend of Fig.11(b), the LBP-PCA-ABASNet algorithm works best: it is both highly accurate and strongly robust. Fig.12 compares the accuracy of the different methods across the face data sets. Shao et al. [2] is the IKLDA+PNN method, Liu et al. [7] is the GA-BPNN method, Gao et al. [8] is the PCANet method, and LBP-PCA-ABASNet is the multiface recognition method proposed in this paper. On the ORL, ExtYaleB and FERET databases, the accuracy of this method is the highest. As the number of face categories increases, the accuracy of LBP-PCA-ABASNet decreases but remains close to that of the IKLDA+PNN method, proving the feasibility and high accuracy of the method in this paper.
The ORL, ExtYaleB and FERET face databases contain 400, 16128 and 10000 face images respectively, with 40, 28 and 1196 face categories respectively. As shown in Fig.13, when both the data set and the number of categories are small, the training times of the high-precision methods are close. But when the number of faces rises to 16,128, the training time of the PCANet method, representative of deep learning, increases greatly. Since the number of face categories in ExtYaleB is similar to that in ORL, the training time of LBP-PCA-ABASNet remains almost unchanged. When the amount of face data is 10,000 and the number of categories is 1196, the training time of PCANet is still very long, whereas the method in this paper uses H-softmax for classification and its training time increases only slightly. This verifies the effectiveness of H-softmax in classifying large-scale face categories.
At the same time, deep learning has been used increasingly for face recognition in recent years, so this paper adds comparison experiments against face recognition algorithms based on deep learning from the past two years. Both (DBN, NN) and (SAE, NN) are deep-learning-based face recognition methods proposed in 2018 [22]. Fig.14 and Table 6 show the error comparison between the proposed method and these deep learning methods. Fig.15 shows the result of face recognition with this method on the ORL face database: the left side is the face to be recognized, and the right side is the correct face category.
The output of the LBP-PCA-ABASNet method is the category to which the input image belongs. Face recognition in real scenes may be affected by factors such as lighting, resolution and angle, so accuracy on a data set and recognition results in actual scenarios may differ. This paper therefore conducts tests in a classroom scene to verify the lightweight design and short training time of the proposed method. We use the ensemble-of-regression-trees method for face detection to collect each person's face images: ten images per person, of which 5 form the training set and 5 the test set. Because the amount of data is small, we use the model trained on ExtYaleB as a pre-trained model for transfer learning, which greatly reduces training time and prevents overfitting; with transfer learning, both the training time and the number of iterations are greatly reduced. Fig.16(a) is the original image to be identified, with the name above each person indicating the corresponding face category. We apply the face detection method and then use LBP-PCA-ABASNet for recognition. Owing to transfer learning, the method converges after 50 iterations. As shown in Fig.16(b), every face is accurately recognized. With 50 iterations, the training time and recognition time are 21 seconds and 0.06 seconds, achieving real-time recognition. The method can therefore run on lightweight embedded or mobile devices, with fast recognition and training, few face samples required, high accuracy, and modest hardware requirements. Moreover, when the number of people is below 10,000, its face recognition accuracy is consistent with deep learning methods.
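The transfer-learning step can be sketched as follows: the frozen hidden-layer weights stand in for the ExtYaleB pre-trained model, and only the new output layer is trained on the small classroom set. All sizes, the random data, and the plain-softmax head here are simplified illustrative assumptions (the actual method uses H-softmax and ABAS-optimized initial thresholds).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical pre-trained hidden layer (stand-in for the ExtYaleB model):
W1 = rng.normal(scale=0.5, size=(64, 32))   # frozen feature weights
W2 = rng.normal(scale=0.1, size=(32, 5))    # new 5-class output layer

X = rng.normal(size=(25, 64))               # small classroom training set
y = rng.integers(0, 5, size=25)
Y = np.eye(5)[y]                            # one-hot labels

losses = []
for _ in range(50):                         # 50 iterations, as in the text
    H = np.tanh(X @ W1)                     # frozen pre-trained features
    P = softmax(H @ W2)
    losses.append(-np.log(P[np.arange(25), y]).mean())
    W2 -= 0.5 * H.T @ (P - Y) / 25          # update the output layer only
```

Freezing W1 keeps the iteration count and training time low, which is the reason the text reports convergence after only 50 iterations.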
To verify that the method can run on embedded devices, this paper conducted related tests. The embedded device used is a 'Raspberry Pi 4 Model B' with 4 GB of memory and a CPU clocked at 1.5 GHz, plus a 16 GB microSD card for data storage. The device runs Ubuntu 18.04.4 LTS, and the proposed method runs under Python 3.7 with the TensorFlow framework.
This paper inputs a set of face images, shown in Fig.17, and uses the proposed method for recognition on the embedded device. The resolution of each image is 1605 × 895. We use the Haar cascade classifier to detect the face regions, unify the size of each face region to 112 * 92, and then identify each face with the method in this paper. Fig.18 shows the recognition results on the embedded device; the name above each person is the corresponding face category. As shown in Fig.18, every face is accurately identified, and each image is recognized in 0.115 seconds. This test proves that the proposed method can be applied to embedded devices.
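The detected face regions vary in size, so each must be interpolated to the fixed 112 * 92 network input. The paper does not specify its interpolation method; the sketch below uses nearest-neighbour interpolation as one plausible, dependency-free choice.

```python
import numpy as np

def resize_nearest(img, out_h=112, out_w=92):
    """Nearest-neighbour interpolation bringing a detected face region
    (a 2-D grayscale array) to the fixed 112 x 92 input size."""
    h, w = img.shape
    ys = np.arange(out_h) * h // out_h   # source row for each output row
    xs = np.arange(out_w) * w // out_w   # source column for each output column
    return img[np.ix_(ys, xs)]
```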
Since image classification methods are general, this paper speculates that the proposed ABASNet can be used for other classification tasks, and we add a landmark classification experiment to verify this. Fig.19 shows the landmark images to be classified and their corresponding categories; the landmarks corresponding to the labels on the left of Fig.19 are 'slow speed', 'green light', 'left turn', 'speed limit', 'stop', 'red light', 'straight ahead' and 'zebra crossing'.
There are eight categories, with 260 images collected for each. The training and test sets of each category are split 10:3. The results of the method on the landmark test set are represented by the confusion matrix in Fig.20; the classification accuracy of the proposed method remains very high, so it can be used for other classification tasks. The landmark classification based on the proposed method is also applied to an unmanned vehicle, whose lower computer is a Raspberry Pi 4 Model B. Fig.21 shows the landmark classification test on the unmanned vehicle; the lower right corner of Fig.21 shows the camera view and the corresponding classification result. The processing time for each image is 0.089 seconds. The unmanned vehicle changes its driving direction by identifying the category of the image in front of the camera; there is no need to process frames one by one, and processing an image every 0.1 seconds is enough for real-time unmanned driving. This test also verifies that the proposed method can be used in other image classification tasks and on embedded devices.
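A confusion matrix like the one in Fig.20 can be computed as in this minimal sketch, where rows are true landmark classes and columns are predicted classes (the helper name is illustrative):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=8):
    """cm[t, p] counts test samples of true class t predicted as class p;
    the diagonal holds the correctly classified counts."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

Overall accuracy is the trace of the matrix divided by the total sample count.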

V. CONCLUSION
This paper proposes a lightweight, short-training-time multiface recognition method based on ABASNet that is both highly accurate and strongly robust, and that needs only a small amount of data per face to achieve high accuracy. At the same time, a fast classification method based on the neural network and the H-softmax function is proposed, which greatly improves training speed when face recognition involves many categories. Tests show that the method has strong adaptability and effectiveness. In addition, this method solves the problem of accurately classifying and recognizing large-scale facial categories with a small facial data set, relatively short time, and low-performance devices. This multiface recognition method combines traditional image processing with neural networks and has strong practical significance and applied value.

JIA LE YANG received the B.S. degree from the School of Electrical Engineering and Automation, Nanjing Normal University, Nanjing, China, in 2019. She is currently pursuing the master's degree in control theory and control engineering with Nanjing Normal University. Her research interests include face recognition, computer vision, and machine learning.