Fingertip Detection Algorithm Based on Maximum Discrimination HOG Feature in Complex Background

Gesture fingertip detection plays a vital role in human-computer interaction applications such as VR and robot control. To improve the accuracy of fingertip detection in complex backgrounds, this paper proposes a fingertip detection algorithm based on a novel maximum discrimination HOG feature. Firstly, the Holistically-nested edge detection algorithm is used to detect the edges of hand images and non-hand images, and the HOG features of the contour edges are extracted to obtain the training set of positive and negative samples, which reduces the influence of illumination, color, and texture on fingertip detection. Secondly, the maximum discrimination features are filtered from the positive sample set by a custom maximum discrimination feature filter and stored as a feature dictionary. The filtered maximum discrimination features and the negative sample set are input into the XGBoost classifier for training to obtain the voting rights classifier. Filtering maximum discrimination features reduces the interference of irrelevant features and improves the performance of the classifier. Then, the KNN algorithm is used to find the best match in the dictionary, and the final fingertip position is obtained by the Meanshift algorithm. Finally, the proposed algorithm is tested. The test results show that the accuracy of fingertip detection reaches 99 % and the RMSE of the detected position is kept within 5 pixels, which outperforms the methods based on YCbCr skin color segmentation, YOLO target detection, and YOLO-YCbCr.


I. INTRODUCTION
With the development of human-computer interaction (HCI) technology, gesture interaction, as the most natural mode of interaction, has been widely used in virtual reality (VR) and augmented reality (AR) applications such as aerial writing [1], game control [2], interactive projectors [3], and robot control. For example, in the control of reconnaissance robots, if soldiers can use head-mounted displays and visual gesture interaction technology (similar to air mouse technology), their operation will be greatly facilitated. However, visual gesture interaction depends on gesture detection or fingertip detection, and fingertip detection faces many challenges. The environment where soldiers and reconnaissance robots are located may be complex, and the light changes dramatically. Soldiers may also wear gloves or hold other devices, and need to use infrared cameras to take images at night. Therefore, it is essential to study how to realize the visual detection of gestures around the clock and in complex backgrounds, even in the presence of attachment interference (such as gloves).
The associate editor coordinating the review of this manuscript and approving it for publication was Gangyi Jiang.
At present, fingertip detection methods mainly include IMU sensor-based contact detection methods and vision-based non-contact detection methods. Among them, fingertip detection algorithms based on contact detection methods mostly use data gloves, depth cameras, marker finger sets, and other equipment to obtain hand information, and then obtain the fingertip position through image processing technology and the geometric relationship of the hand. For example, A.Z.Shukor et al. [4] designed a data glove that achieves sign language recognition by equipping each finger with a finger-bending sensor to obtain hand information. Data gloves can accurately obtain hand information, but they are expensive and not easy to promote. T.Dinh-Son et al. [5] extracted the hand region and palm center of interest through the joint information of the Kinect depth camera and then used a tracking algorithm to extract the contour of the hand. Finally, based on the hand contour coordinates, the k-cosine algorithm was used to detect the fingertip position. With a depth camera, the hand area can be easily obtained. However, existing depth cameras are limited by distance, illumination, and other factors when shooting depth images, which makes it difficult to meet the needs of multi-scene applications. In addition, the hand position can also be obtained by assigning different color markers to different fingertips [6]. However, this method requires re-marking each time the experimental background changes, which makes operation inconvenient.
Compared with the above methods, vision-based non-contact detection methods can be applied to a variety of scenarios. They only need an ordinary camera, which is cheap and suitable for promotion. Vision-based fingertip detection methods generally include two steps: first, obtaining the hand from the image, and then locating the fingertip through artificial features or machine learning methods. The detection accuracy of these methods is often affected by complex backgrounds, dramatic changes in illumination, hand occlusion, and other factors in the image. Therefore, it is very important to design a reasonable algorithm to accurately extract the hand in the image. To this end, L.Lae-Kyoung et al. [7] proposed combining the YCbCr model with a genetic algorithm to optimize the segmentation of the hand. In [8], dense optical flow combined with the YCbCr model is used to obtain the hand area by setting a skin color threshold, and then the hand center of gravity is obtained by dividing the hand area into blocks. Finally, the maximum center of gravity distance method is used to detect the fingertip. Hand segmentation based on the YCbCr skin color model is simple, but it has difficulty dealing with illumination changes and skin-colored backgrounds. Due to the limitations of skin color segmentation, researchers began to try to obtain hand regions through deep learning methods. M.P. et al. [9] proposed using a target detection model based on a deep neural network (DNN) to segment gestures from the scene, and then using the MobileNetv2 architecture to estimate the fingertip position. H.Y. et al. [10] proposed detecting the hand area with the Faster R-CNN algorithm and then using a CNN network to detect the fingertip position. In [11], the YOLO algorithm was used to obtain the hand area, and the fingertip position was then regressed by a VGG16 fully convolutional neural network. Y.-H.Chen et al. [12] proposed an improved masked region convolutional neural network (Mask R-CNN), which uses a region-based CNN network to detect fingers and achieves fingertip detection through a three-layer CNN network.
Deep learning-based algorithms (such as YOLO) usually use color images. For gesture images captured by infrared cameras at night, gesture detection or fingertip detection based on deep learning performs poorly. Moreover, when operations are performed in complex environments, the hand images captured by the camera often suffer from partial occlusion, and methods based on deep learning also have difficulty dealing with complex occlusion.
Illusory contours [13] (or subjective contours), a characteristic of human vision, are a visual illusion in which an observer perceives the presence of contours in the absence of changes in brightness or color. As shown in Fig.1, the square, triangle, and diamond are not complete, but they are subjectively given complete contours. This shows that human vision can find and infer the location of familiar objects based on incomplete contours. Considering the problems in current fingertip detection algorithms, this paper proposes a fingertip detection algorithm based on subjective contour characteristics, which can detect a familiar target from its partial contour. The HOG feature used in this algorithm was first used in pedestrian detection. Combined with an SVM classifier [14], HOG feature extraction has achieved great success in the field of pedestrian detection. The HOG feature has excellent characteristics such as geometric and optical invariance, and describes the shape of local regions through gradient statistics. S. R. et al. [15] applied the HOG feature and the edge HOG feature to gesture recognition, which provides us with a new idea.
Edges are highly tolerant to illumination, color, and texture, which makes them very suitable for complex gesture detection and recognition environments. Combined with the idea of voting based on contour components, gesture detection in complex backgrounds with occlusion can be solved. Common edge detection methods, such as those based on differential operators (e.g., Canny), have difficulty effectively detecting hand contours in complex backgrounds. Therefore, this paper uses the Holistically-nested edge detection (HED) method [16]. This method achieves end-to-end edge detection by using a deep learning model that combines a fully convolutional neural network with a deeply supervised network, and it is effective at distinguishing the hand contour from the background contour. In addition, because the SVM classifier uses quadratic programming to solve for the support vectors, classification on large sample sets consumes a lot of machine memory and operation time, and when the data are linearly inseparable the algorithm is prone to overfitting. Therefore, this paper chooses XGBoost [17] as the voting rights classifier. XGBoost adds a regularization term to the objective function, which helps to avoid overfitting, and uses parallel processing to greatly improve the calculation speed.
In this paper, firstly, the HED algorithm is used to detect the edge of the images. Secondly, the HOG features of each edge point are extracted from the contour edge images, and the maximum discrimination features are selected by a custom maximum discrimination feature filter. A centripetal offset vector is proposed to characterize the offset between the position of the maximum discrimination feature and the position of the fingertip, and the maximum discrimination feature and the centripetal offset vector are saved to the feature dictionary. Then, the XGBoost voting rights classifier is used to classify the maximum discrimination features with voting rights from all contour edge HOG features. Finally, the KNN [18] algorithm is used to match the maximum discrimination features obtained by the classification with the maximum discrimination features in the feature dictionary, and the centripetal offset vector is used to vote on the fingertip position, and then the Meanshift [19] algorithm is used to search the final detected fingertip position.
Aiming at the problem of fingertip detection in human-computer interaction, this paper proposes a fingertip detection algorithm based on the maximum discrimination HOG feature in complex backgrounds. The specific contributions of this paper are as follows:
1) The edge contour HOG features are used as the basis of image matching to reduce the impact of dramatic changes in light, texture changes, and skin-like backgrounds on fingertip detection.
2) To improve the efficiency and accuracy of fingertip detection, a voting rights classifier is designed to classify the voting features.
3) A maximum discrimination feature filter is proposed to select the most stable and effective features from the hand edge contour HOG features, which reduces the influence of irrelevant features on the voting rights classifier.
4) Inspired by the illusory contour, a method of voting with hand contour components is proposed to determine the fingertip position. With this idea, the fingertip position can be detected from part of the hand information, which effectively deals with the partial occlusion problem.
The rest of this article is organized in the following order. The overall overview of the proposed algorithm is introduced in Section II. Section III introduces the extraction of hand contour HOG features and the selection process of maximum discrimination features. Section IV introduces the principle and training process of the voting rights classifier for hand contour components. Section V describes the process of fingertip detection by KNN and the Meanshift algorithm. Section VI is the experimental part, which verifies the effectiveness of the fingertip detection algorithm proposed in this paper. Section VII is the summary part.

II. ALGORITHM OVERVIEW
Fingertip interaction is a very convenient and intuitive human-computer interaction method with a very wide range of applications. For example, in scenarios where the operator and robots perform tasks cooperatively, such as anti-terrorism raids and battlefield reconnaissance, the integrated use of the head-mounted display and fingertip motion trajectory information (similar to the air mouse) will greatly reduce the operation complexity for operators, especially for fully armed soldiers. However, soldiers and mobile reconnaissance robots usually work in the wild, outdoors, and in other scenes with dramatic changes in light and complex backgrounds. In addition, soldiers often wear gloves or other attachments, so the fingertip detection algorithm needs to adapt to partial occlusion of the hand. To improve the effect of fingertip detection, this paper proposes a fingertip detection algorithm based on the maximum discrimination feature. The contour edge HOG feature is used to reduce the influence of illumination, color, and texture on fingertip detection, and partial contour edge HOG features are used instead of the whole contour. Therefore, the algorithm remains robust when the hand is partially occluded.
Through analysis, we find that in many human-computer interaction teleoperation applications based on fingertip information, the degree of freedom of the controlled object is low, such as in military reconnaissance robot systems and various manipulator control systems. Remote operation can be realized by using a single finger just like an air mouse. Inspired by this, we provide a single-finger human-computer interaction method that can use single-finger fingertip detection directly instead of the air mouse.
Target detection methods include holistic and component-based detection methods. The holistic detection method compares the similarity between the known template image and each region in the image to be detected. If the similarity reaches the set threshold, the matching is considered to be completed. The component-based detection method divides the target into multiple components, and each component is matched with each region in the image to be detected. Compared with the holistic detection method, the component-based detection method can effectively improve the detection effect under occlusion, and improve the fault tolerance rate and robustness. According to this idea, this paper designs a fingertip detection algorithm based on maximum discrimination HOG features in complex backgrounds. The algorithm mainly includes three steps: maximum discrimination feature extraction of the hand contour, training of the voting rights classifier for hand contour components, and fingertip detection by the KNN and Meanshift algorithms.

III. MAXIMUM DISCRIMINATION FEATURE EXTRACTION OF HAND CONTOUR
A. HOG FEATURE EXTRACTION
The histogram of oriented gradient (HOG) [15] is a feature that describes the local information of the image. This method describes the shape of the object of interest in the image through the gradient or edge direction density distribution. The essence is to count the gradient information. HOG feature has excellent characteristics such as insensitivity to illumination and geometric changes. However, the HOG feature also has the problem of large computation and the inability to deal with occlusion. Therefore, this paper uses HOG feature voting of hand contour edge points to detect fingertips and realizes fingertip detection under partial occlusion.
Since the proposed algorithm aims to solve the problem of fingertip detection in complex background, the edge detection algorithm is expected to distinguish the contours of hand and background as much as possible, and the HED [20] algorithm can deal with the details of the image well. Therefore, this paper uses the HED algorithm to detect the edge of hand images and background images.
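The main steps of HOG feature extraction of hand contour edge points used in this algorithm are as follows: First, the read edge image is normalized, and the gradient size $m(x, y)$ and direction $\theta(x, y)$ of each edge point $(x, y)$ in the image are calculated:
$$m(x, y) = \sqrt{\big(H(x+1, y) - H(x-1, y)\big)^2 + \big(H(x, y+1) - H(x, y-1)\big)^2}$$
$$\theta(x, y) = \arctan\frac{H(x, y+1) - H(x, y-1)}{H(x+1, y) - H(x-1, y)}$$
where $H(x, y)$ is the gray value of the normalized image at $(x, y)$.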
Then, the gradient size and direction are counted to obtain the gradient histogram. When calculating the gradient histogram, the HOG feature takes the block as the sampling window, a block contains 3 × 3 cells, and a cell contains 9 × 9 pixels. The gradient direction is used to project each pixel in the cell, and the gradient size is used as the weight of the projection.
Finally, L2 normalization is performed on the gradient histogram vector of each block, which can effectively reduce the influence of illumination on HOG features. The normalization formula is as follows:
$$B_n = \frac{x_n}{\sqrt{\lVert x_n \rVert_2^2 + \xi^2}}$$
where $B_n$ is the normalized result, $x_n$ is the corresponding block vector, and $\xi$ is a very small positive number.
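The three steps above (gradient computation, magnitude-weighted orientation histogram per cell, and L2 normalization per block) can be illustrated with a minimal pure-Python sketch. It uses the 3 × 3 cells per block and 9 × 9 pixels per cell stated in the text. The number of orientation bins (9), the unsigned 0 to 180 degree range, the border clamping, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

def gradient(img, x, y):
    """Central-difference gradient size and direction at pixel (x, y)."""
    h, w = len(img), len(img[0])
    # Neighbours are clamped at the image border (an assumption).
    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
    return math.hypot(gx, gy), math.atan2(gy, gx)

def cell_histogram(img, x0, y0, cell=9, bins=9):
    """Orientation histogram of one cell, weighted by gradient size."""
    hist = [0.0] * bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            mag, theta = gradient(img, x, y)
            angle = math.degrees(theta) % 180.0  # unsigned orientation (assumed)
            hist[min(int(angle / (180.0 / bins)), bins - 1)] += mag
    return hist

def l2_normalize(vec, xi=1e-6):
    """B_n = x_n / sqrt(||x_n||^2 + xi^2)."""
    norm = math.sqrt(sum(v * v for v in vec) + xi * xi)
    return [v / norm for v in vec]

def block_descriptor(img, x0, y0, cells=3, cell=9, bins=9):
    """Concatenate the cell histograms of one block and normalize them."""
    vec = []
    for cy in range(cells):
        for cx in range(cells):
            vec.extend(cell_histogram(img, x0 + cx * cell, y0 + cy * cell, cell, bins))
    return l2_normalize(vec)

# Toy 27 x 27 image with a single vertical step edge between columns 12 and 13.
img = [[0.0] * 13 + [1.0] * 14 for _ in range(27)]
desc = block_descriptor(img, 0, 0)
print(len(desc))  # 3 * 3 cells * 9 bins
```

In this toy image only the middle column of cells contains gradient energy, and all of it falls into the first orientation bin, which is why the descriptor of a clean edge is so sparse.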

B. FEATURE SELECTION AND FEATURE DICTIONARY GENERATION
After feature extraction, a huge feature set is often generated. Feature selection algorithms are then usually used to filter features, remove redundant and unimportant features, and retain the most effective ones. This not only reduces the number of features and avoids the curse of dimensionality, but also reduces the influence of irrelevant features on the classifier and improves its performance. Therefore, to obtain features with good detection effects and stability, this paper uses a custom maximum discrimination feature filter to filter the HOG features of the hand edge and defines the filtered features as maximum discrimination features, which are the features with voting rights. Each maximum discrimination feature corresponds to a centripetal offset vector, which is the offset determined by the position of the maximum discrimination feature and the fingertip position.
The specific process of feature selection and feature dictionary generation is shown in Fig.3. The first part is the process of feature matching, which provides the information basis for subsequent filtering. The HOG feature extraction method described in Section III-A is used to extract the features of the hand images, where $I_g(F_i)$ ($g = 1, 2, \ldots, N$; $i = 1, 2, \ldots, n$) represents the ith HOG feature extracted from the gth hand image. Because the images differ, the number of HOG features in each hand image is different. In the second part, a maximum discrimination feature filter is established through conditions 1) and 2) to filter maximum discrimination features.
1) The L2 norm between the fingertip position detected by a maximum discrimination feature and the real fingertip position is very small:
$$\mathrm{Dist}_j(F_i) = \lVert P_j(F_i) - \mathrm{Fingertip}(j) \rVert_2 < d$$
$$P_j(F_i) = \phi\big(Q_j(F_s)\big) - \mathrm{Offset}(F_i)$$
$$\mathrm{Offset}(F_i) = \phi\big(I_g(F_i)\big) - \mathrm{Fingertip}(g)$$
where $P_j(F_i)$ denotes the fingertip coordinates detected by the most similar feature to $I_g(F_i)$ in the jth graph, $\mathrm{Fingertip}(j)$ denotes the real fingertip coordinates in the jth graph, $\mathrm{Dist}_j(F_i)$ denotes the L2 norm of the difference between the detected and real fingertip coordinates in the jth graph, and $d$ is the threshold, which is set to 15 pixels in this paper. A distance below the threshold $d$ is considered to be very small. $Q_j(F_s)$ represents the most similar feature to $I_g(F_i)$ in the jth graph, $s$ represents the label of that feature, $\phi(\cdot)$ represents the position coordinate function corresponding to a HOG feature, so $\phi(Q_j(F_s))$ represents the coordinate of the most similar feature to $I_g(F_i)$ in the jth graph. $\mathrm{Offset}(F_i)$ represents the offset vector between the feature $I_g(F_i)$ and the real fingertip coordinate. The sign convention follows Algorithm 1, in which the detected fingertip is the matched position minus the offset.
2) On the basis of 1), a maximum discrimination feature should also appear in most hand images. In other words, the frequency of the maximum discrimination feature in the hand images exceeds the set threshold:
$$\frac{1}{N}\sum_{j=1}^{N} \alpha\big(\mathrm{Dist}_j(F_i) < d\big) > T$$
where $\alpha$ represents the indicator function, whose value is 1 if $\mathrm{Dist}_j(F_i) < d$ and 0 otherwise. $T$ is the set threshold, which was set to 0.5 in this paper, and $N$ is the number of hand images.
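The two filter conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (a list of dictionaries with "features" and "fingertip" keys), the function names, and the use of a plain nearest-neighbor search as the "most similar feature" are hypothetical, and the offset is taken as feature position minus fingertip position so that a detected fingertip is the matched position minus the offset.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def select_features(images, d=15.0, T=0.5):
    """Keep (descriptor, offset) pairs that satisfy conditions 1) and 2).

    images: list of {"features": [(descriptor, (x, y)), ...], "fingertip": (x, y)}
    """
    dictionary = []
    n_images = len(images)
    for img in images:
        tip = img["fingertip"]
        for desc, pos in img["features"]:
            # Assumed convention: offset = feature position - fingertip position.
            offset = (pos[0] - tip[0], pos[1] - tip[1])
            hits = 0
            for other in images:
                if not other["features"]:
                    continue
                # Most similar feature in the other image (nearest descriptor).
                _, match_pos = min(other["features"], key=lambda f: euclidean(f[0], desc))
                predicted = (match_pos[0] - offset[0], match_pos[1] - offset[1])
                # Condition 1): detected fingertip is within d pixels of the real one.
                if euclidean(predicted, other["fingertip"]) < d:
                    hits += 1
            # Condition 2): the feature works in more than a fraction T of the images.
            if hits / n_images > T:
                dictionary.append((desc, offset))
    return dictionary

# Toy data: one stable feature 20 px below the fingertip, one unstable feature.
images = [
    {"fingertip": (50, 50), "features": [([1, 0], (50, 70)), ([0, 1], (10, 10))]},
    {"fingertip": (60, 40), "features": [([1, 0], (60, 60)), ([0, 1], (100, 100))]},
    {"fingertip": (30, 30), "features": [([1, 0], (30, 50)), ([0, 1], (90, 5))]},
]
dictionary = select_features(images)
print(len(dictionary))
```

In the toy data the feature that always sits 20 pixels below the fingertip predicts the fingertip correctly in every image and is kept, while the feature whose position is unrelated to the fingertip only succeeds on its own image and is discarded.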

IV. TRAINING OF VOTING RIGHTS CLASSIFIER FOR HAND CONTOUR COMPONENTS
In Section III-A, two sample sets are obtained: a positive sample set composed of hand contour edge HOG features and a negative sample set composed of background contour edge HOG features. Then, the maximum discrimination feature filter proposed in Section III-B is used to select the maximum discrimination features with good and stable detection effects from the positive sample set, and a voting rights classifier for hand contour components is trained with the maximum discrimination feature set and the negative sample set. The voting rights classifier of hand contour components is in fact a binary classifier, which picks out the features with voting rights, namely the maximum discrimination features, from all the features. This removes the background features in the image and the other features of the hand before the voting starts, and retains only the maximum discrimination features with a good fingertip detection effect, which improves the detection speed and accuracy. In this paper, the extreme gradient boosting (XGBoost) [21] algorithm is used as the voting rights classifier for hand contour components. The tree model is used as the base classifier. The number of base classifiers, p, was set to 100. R is the base classifier space.
Then the voting rights classifier can be expressed as follows:
$$\hat{C}_\alpha = \sum_{n=1}^{p} r_n(f_\alpha), \quad r_n \in R$$
where $f_\alpha$ represents the extracted αth contour edge HOG feature, $\hat{C}_\alpha$ represents the detected result of the voting rights classifier $\hat{C}$ on the feature $f_\alpha$, and $r_n$ represents the nth base classifier, so $r_n(f_\alpha)$ represents the detected value of the nth base classifier. The objective function of the model training is:
$$\mathrm{Obj} = \sum_{\alpha} l\big(C_\alpha, \hat{C}_\alpha\big) + \sum_{n=1}^{p} \Omega(r_n)$$
$$\Omega(r_n) = \gamma T + \frac{1}{2}\beta \sum_{i=1}^{T} \omega_i^2 \tag{12}$$
where $C_\alpha$ represents the true label of the feature $f_\alpha$, and $l(C_\alpha, \hat{C}_\alpha)$ represents the loss between the detected value and the true label. To prevent over-fitting, the algorithm simplifies the model from two aspects: limiting the number of leaves and weight regularization. In Equation 12, $\Omega(r_n)$ represents the penalty, $T$ represents the number of leaf nodes, $\beta$ and $\gamma$ are penalty coefficients, and $\omega_i$ is the value of the ith leaf node. The voting rights classifier uses a Taylor series expansion to approximate the objective function. The above algorithm is described by pseudo-code: Algorithm 1 gives the whole training process of the voting rights classifier.
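For reference, the Taylor approximation mentioned above is usually written in the standard XGBoost form shown below. This is a sketch of the textbook second-order expansion at boosting round t, not necessarily the exact expression used by the authors; the symbols $g_\alpha$, $h_\alpha$ and the round index $t$ are assumed notation that does not appear in the text.

```latex
\mathrm{Obj}^{(t)} \approx \sum_{\alpha} \Big[ l\big(C_\alpha, \hat{C}_\alpha^{(t-1)}\big)
    + g_\alpha\, r_t(f_\alpha) + \tfrac{1}{2}\, h_\alpha\, r_t^{2}(f_\alpha) \Big] + \Omega(r_t),
\qquad
g_\alpha = \frac{\partial\, l\big(C_\alpha, \hat{C}_\alpha^{(t-1)}\big)}{\partial\, \hat{C}_\alpha^{(t-1)}},
\quad
h_\alpha = \frac{\partial^{2} l\big(C_\alpha, \hat{C}_\alpha^{(t-1)}\big)}{\partial \big(\hat{C}_\alpha^{(t-1)}\big)^{2}}
```

Here $\hat{C}_\alpha^{(t-1)}$ is the prediction of the first t − 1 base classifiers, so only the new tree $r_t$ is optimized in each round.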
After training the voting rights classifier, the test image can be input into the classifier to obtain the maximum discrimination features. The classification process of the voting rights classifier is shown in Fig.4. In the figure, the green labels are all the HOG features extracted, and the red labels are the maximum discrimination features.
Algorithm 1 (excerpt):
    // Extract the HOG features of the binary image and record the position and centripetal offset vector of each HOG feature
    end for
    for num = 1 to num(edgehog1) do
        vector ← edgehog1(:, num)
        for t = 1 to N do
            matchNum ← topKneighbors(edgehog1(:, t), vector)    // KNN algorithm is used for feature matching
            matchX, matchY ← postionset1(matchNum)
            detectFingertip(1,1) ← matchX − offset(1,num)
            detectFingertip(2,1) ← matchY − offset(2,num)
            distance ← getEuclidean(detectFingertip, fingertip)    // Euclidean distance between the detected and real fingertip positions
            if distance < d then
                postivesum ← postivesum + 1
            end if
        end for
        if postivesum/a > T then
            dictionary ← save(vector, offset(:,num))
        end if
    end for
The KNN algorithm [22] analyzes the similarity of samples by comparing the distance between two samples. In this paper, the Euclidean distance is used to measure the similarity of samples. Through the KNN algorithm, each maximum discrimination feature obtained by classification is matched with the maximum discrimination features in the feature dictionary, and the three most similar dictionary features are taken as its matching features. Each matching feature is combined with the corresponding centripetal offset vector for fingertip position voting to obtain a sparse fingertip detection space.
On the basis of the above voting, the fingertip detection space can be represented by a fingertip detection confidence map. The map accumulates the votes of all features accepted by the voting rights classifier: $\hat{C}_j$ represents the classification result of the voting rights classifier for feature j, $p(f_j)$ represents the position of the feature in the test image, $\mathrm{offset}_t(f_j)$ represents the centripetal offset vector of the tth dictionary feature matched to the maximum discrimination feature $f_j$, $V$ represents the number of maximum discrimination features, and $K$ represents the number of nearest neighbors, which is set to 3 in this paper.
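The matching and voting step can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names and the toy dictionary are hypothetical, the confidence map is simplified to a vote count per pixel, and each vote is computed as feature position minus offset, following the sign used in the Algorithm 1 pseudo-code.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cast_votes(features, dictionary, k=3):
    """Vote for fingertip positions with the k nearest dictionary entries.

    features:   [(descriptor, (x, y)), ...] accepted by the voting rights classifier
    dictionary: [(descriptor, (dx, dy)), ...] maximum discrimination features and offsets
    """
    votes = []
    for desc, pos in features:
        nearest = sorted(dictionary, key=lambda e: euclidean(e[0], desc))[:k]
        for _, (dx, dy) in nearest:
            votes.append((pos[0] - dx, pos[1] - dy))
    return votes

def confidence_map(votes):
    """Simplified confidence map: number of votes per integer pixel."""
    conf = {}
    for x, y in votes:
        key = (round(x), round(y))
        conf[key] = conf.get(key, 0) + 1
    return conf

# Toy dictionary and two test features (k = 2 only to keep the example small).
dictionary = [([1, 0], (0, 20)), ([0.9, 0.1], (1, 19)),
              ([0, 1], (-30, 5)), ([0.1, 0.9], (-29, 6))]
features = [([1, 0], (50, 70)), ([0, 1], (20, 55))]
votes = cast_votes(features, dictionary, k=2)
conf = confidence_map(votes)
print(max(conf, key=conf.get))  # most voted position
```

Because every feature votes independently, a subset of visible contour components is enough to produce a cluster of votes near the fingertip, which is the property the algorithm relies on under partial occlusion.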

B. MEANSHIFT FINGERTIP DETECTION
In the above sections, the fingertip detection confidence map of the image is obtained. To obtain the final detected fingertip, the Meanshift algorithm [23] is used to find the point with the largest detected fingertip density in the sparse fingertip detection space as the final detected fingertip. In addition, because the fingertip detection space is sparse, there may be isolated points. Therefore, the maximum value in the fingertip detection space is selected as the initial value of the iteration, and the detected fingertip position is then updated iteratively by the Meanshift procedure.
The fingertip detection process is shown in Fig.5. Firstly, the edge HOG features are extracted from the test image and input into the voting rights classifier to obtain the maximum discrimination features and other features, which are represented by the red and green rectangular boxes in the figure respectively. The two types of features correspond to different centripetal offset vectors. Then, the maximum discrimination features represented by the red rectangular boxes are matched with the maximum discrimination features in the feature dictionary and vote on the fingertip position, and the final detected fingertip position is searched by the Meanshift algorithm. This method of detecting fingertips based on the voting of hand contour components improves the fault tolerance of fingertip detection. Even if the voting rights classifier misclassifies some features, the final detection results are not affected. Algorithm 2 gives the whole process of fingertip detection for test images.
3166 VOLUME 11, 2023
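A minimal sketch of the Meanshift search over the voted positions is given below. It assumes a flat (uniform) kernel with a hypothetical bandwidth of 10 pixels, neither of which is specified in the text, and it starts from the most voted position, as described above.

```python
import math
from collections import Counter

def mean_shift(points, start, bandwidth=10.0, tol=1e-3, max_iter=100):
    """Flat-kernel mean shift: move to the mean of the votes inside the window."""
    cx, cy = start
    for _ in range(max_iter):
        window = [(x, y) for x, y in points
                  if math.hypot(x - cx, y - cy) <= bandwidth]
        if not window:
            break
        nx = sum(x for x, _ in window) / len(window)
        ny = sum(y for _, y in window) / len(window)
        shift = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if shift < tol:  # converged
            break
    return cx, cy

# Five votes near (50, 50) and one isolated outlier.
votes = [(50, 50), (49, 51), (50, 50), (49, 49), (51, 50), (120, 10)]
start = Counter(votes).most_common(1)[0][0]  # most voted position as initial value
tip = mean_shift(votes, start)
print(tip)
```

Starting from the most voted position keeps the search window away from isolated votes such as the outlier in the toy data, so they do not influence the final estimate.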

VI. EXPERIMENTS
A. DATASET
In this paper, 1000 single-finger hand images under white background and 2000 complex background images without hand were used to train the voting rights classifier. The hand image is a self-made image with different shooting depths and different shapes. The complex background image is a subset C of the NUS-II data set taken on the campus of the National University of Singapore. In this paper, the edge HOG features of the hand image were extracted as the positive sample set, and the edge HOG features of the background image were extracted as the negative sample set. Then the maximum discrimination features were selected from the positive sample set, and the labels of the maximum discrimination feature set and the negative sample set were assigned to 1 and 0 respectively. The training set and the test set were divided according to the ratio of 8:2, and some training data sets are shown in Fig.6.
When filtering maximum discrimination features, it is necessary to obtain the fingertip position of the hand image and the offset between the position of each HOG feature and  the fingertip position. In this paper, the single-finger hand images are under a white background. Therefore, firstly, the maximum center of gravity distance method based on YCbCr skin color segmentation [8] was used to obtain the fingertip position in the image, and then the vector from the position of the HOG feature to the fingertip position was calculated, and the vector was defined as the centripetal offset vector.
In addition, this paper also set up 200 single-finger hand images under complex background to evaluate the effect of fingertip detection. Firstly, hand images under red, yellow, local light, and natural light of different intensities were taken. Secondly, hand images of partial occlusion, wearing all-finger gloves and half-finger gloves were taken. Finally, the images of the left and right hands of different subjects were taken, and some test images are shown in Fig.7. The dataset presented in this paper is available on request from the corresponding author.

1) COMPARISON OF EDGE DETECTION ALGORITHMS
To verify the applicability of the HED edge detection algorithm, the HED algorithm was compared with the Canny algorithm on the test image set. From Fig.8, it can be seen that the HED algorithm focuses on hand contour and ignores unimportant details. On the contrary, the Canny algorithm depicts more irrelevant edges. The edge detection algorithm is required to retain only the edges of the hand as much as possible, which can reduce the difficulty of subsequent HOG feature selection and classification.
To evaluate the fingertip detection results, the root mean square error (RMSE) between the detected and real fingertip positions is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\Big[\big(x_p^{(i)} - x_t^{(i)}\big)^2 + \big(y_p^{(i)} - y_t^{(i)}\big)^2\Big]}$$
where $(x_p, y_p)$ is the detected fingertip coordinates, $(x_t, y_t)$ is the real fingertip coordinates, and $m$ is the number of test images. The size of the image was 120 × 160, and a detection was considered correct when the Euclidean distance between the detected fingertip and the real fingertip was less than 15 pixels. Accuracy (Acc) refers to the proportion of correctly detected images in all images:
$$\mathrm{Acc} = \frac{1}{M}\sum_{i=1}^{M} F(\cdot)$$
where $M$ is the total number of images and $F(\cdot)$ is the indicator function: if $\sqrt{(x_p - x_t)^2 + (y_p - y_t)^2} < 15$, then $F(\cdot) = 1$, otherwise $F(\cdot) = 0$. In the process of fingertip detection, it can happen that none of the maximum discrimination features in the image is correctly classified, so the fingertip cannot be detected. To evaluate the frequency of such cases, this paper used UnReco to characterize the proportion of undetected fingertip images in all images:
$$\mathrm{UnReco} = \frac{Q}{M}$$
where $Q$ represents the number of images where the fingertip is not detected. Table 1 shows that the HED algorithm is far superior to the Canny algorithm in the detection precision, accuracy, and missed detection rate of fingertip detection. This indicates that, when dealing with hand images in complex backgrounds, the HED algorithm handles details better, ignores invalid edges, and obtains more effective edges for the selection and classification of HOG features, thereby improving the accuracy of fingertip detection.
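The three evaluation indicators can be computed as in the following sketch. The function name and the toy coordinates are hypothetical, and one assumption is made explicit: images in which no fingertip is detected (represented by None) are excluded from the RMSE, since they have no detected coordinates, but they still count in the denominators of Acc and UnReco.

```python
import math

def evaluate(detected, truth, threshold=15.0):
    """Return (RMSE, Acc, UnReco) for lists of detected and real fingertips.

    detected: list of (x, y), or None when no fingertip was found
    truth:    list of (x, y) real fingertip coordinates
    """
    total = len(truth)
    errors = [math.hypot(p[0] - t[0], p[1] - t[1])
              for p, t in zip(detected, truth) if p is not None]
    # Assumption: RMSE is taken over the images with a detection only.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors)) if errors else float("nan")
    acc = sum(e < threshold for e in errors) / total
    unreco = (total - len(errors)) / total
    return rmse, acc, unreco

# Made-up coordinates for illustration only (not results from the paper).
detected = [(10, 10), (13, 14), None, (40, 40)]
truth = [(10, 10), (10, 10), (5, 5), (10, 10)]
rmse, acc, unreco = evaluate(detected, truth)
print(acc, unreco)
```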

2) COMPARISON OF FEATURE SELECTION AND NON-FEATURE SELECTION
To verify the maximum discrimination feature filter in the role of the fingertip detection process, this paper used the test images mentioned in section VI-A to do a comparative experiment. First, the unfiltered features were directly used to train the voting rights classifier and complete the fingertip  detection, some results are shown in Fig.9 (B). Then, the features filtered by the maximum discrimination feature filter were used to train the voting rights classifier and perform fingertip detection, some of the results are shown in Fig.9 (C). In the figure, the green markers are the HOG features, the red lines are the centripetal offset vectors, the yellow markers are the fingertip detection space, and the blue rectangles are the final fingertips. Fig.9 shows that the fingertip detection effect of (C) is significantly better than that of (B), and compared with (C), there are many background features in (B) that are mistaken for hand features to participate in the voting, so that there are many points in the fingertip detection space that deviate from the real fingertip position. These points affect the search of the final fingertip position by the Meanshift algorithm, which leads to the deviation of the result. The maximum distinguishing feature filter designed in this paper has two filtering conditions: one is that the detection effect of the feature is good, and the other is that the frequency of the feature is high. The features that meet these two conditions were called the maximum distinguishing features and were given voting rights. The selected features are more regular, which is conducive to the training of the voting rights classifier. In addition, the two experiments were evaluated by the evaluation indicators mentioned in Section VI-B1, as shown in Table 2. Table 2 shows that UnReco can still maintain 0 %, but the Acc reduces to 67 %. 
This is because the voting rights classifier treats background features as hand features during fingertip detection. The votes cast by these features are also the reason why the RMSE increased significantly and the accuracy of fingertip detection decreased.
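The effect of stray votes on the Meanshift search can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mean_shift_mode`, the flat kernel, the bandwidth, the median initialisation, and the sample coordinates are all assumptions chosen for illustration. Each feature casts a vote at its position plus its centripetal offset vector, and the window is moved repeatedly to the mean of the votes it contains.

```python
import numpy as np

def mean_shift_mode(votes, bandwidth=10.0, max_iter=100, tol=1e-3):
    """Locate a dense cluster of 2-D votes with flat-kernel mean shift."""
    votes = np.asarray(votes, dtype=float)
    # Assumed initialisation: the coordinate-wise median of all votes.
    center = np.median(votes, axis=0)
    for _ in range(max_iter):
        dist = np.linalg.norm(votes - center, axis=1)
        inside = votes[dist <= bandwidth]
        if len(inside) == 0:
            break  # empty window: keep the current estimate
        new_center = inside.mean(axis=0)
        shift = np.linalg.norm(new_center - center)
        center = new_center
        if shift < tol:
            break  # converged
    return center

# Hypothetical feature positions and centripetal offset vectors (pixels).
feature_pos = np.array([[100, 120], [110, 90], [95, 100], [300, 40]])
offsets = np.array([[20, -40], [11, -9], [24, -21], [5, 5]])

# Each feature votes for a fingertip location at position + offset.
votes = feature_pos + offsets
fingertip = mean_shift_mode(votes, bandwidth=15.0)
print(fingertip)
```

In this toy input, three votes agree near (120, 80) and one background vote lies far away, so it falls outside the window and does not influence the estimate. When many background features vote, as in Fig.9 (B), such votes can form their own clusters or pull the window, which is the deviation described above.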

3) COMPARISON OF CLASSIFIERS
In this paper, the prepared data set was applied to different classifiers, and the classification effects of different classifiers on maximum discrimination features and other features were compared. The Accuracy, Precision, Recall, and F1-score indices [25] were selected to evaluate the classification effect.
Accuracy characterizes how often each classifier correctly separates maximum discrimination features from other features. The calculation formula is as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{Total}$$
where TP is the number of correctly classified maximum discrimination features in the test images, TN is the number of correctly classified other features in the test images, and Total is the total number of features in the test images.
Precision is the proportion of correctly classified maximum discrimination features among all features classified as maximum discrimination features. The calculation formula is:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where FP is the number of other features misclassified as maximum discrimination features in the test images.
Recall is the ratio of correctly detected maximum discrimination features to all maximum discrimination features in the test images. The calculation formula is as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where FN is the number of maximum discrimination features misclassified as other features in the test images. F1-score is used to comprehensively evaluate the precision and recall rate. The value of β determines the relative emphasis placed on recall and precision. In this paper, β was set to 1, and the calculation formula is as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The evaluation results are shown in Table 3.
According to the evaluation results in Table 3, XGBoost has the highest accuracy among the compared classifiers, reaching 99.40%, and it also performs better on the comprehensive evaluation index F1-score, reaching 0.9970.
That is to say, when classifying maximum discrimination features and other features, the XGBoost classifier can classify maximum discrimination features more comprehensively on the basis of ensuring accuracy. In addition, the XGBoost classifier can avoid overfitting through regularization terms, so it has better generalization ability. The Accuracy and F1-score results of each classifier are shown in Fig.10.
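The four indices used in this comparison can be computed from confusion counts as in the following sketch. The function name `classification_metrics` and the example counts are hypothetical and are not taken from Table 3; the general F-beta form is the standard one and reduces to the F1-score when β = 1.

```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Compute Accuracy, Precision, Recall and F-beta from confusion counts.

    tp: maximum discrimination features classified correctly
    tn: other features classified correctly
    fp: other features classified as maximum discrimination features
    fn: maximum discrimination features classified as other features
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_beta": f_beta,
    }

# Made-up counts, for illustration only.
m = classification_metrics(tp=90, tn=5, fp=3, fn=2)
print(m)
```

A larger β weights recall more heavily and a smaller β weights precision more heavily, which is why β = 1 gives a balanced index.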

4) TEST IN COMPLEX BACKGROUNDS
The proposed algorithm was compared with skin color segmentation based on YCbCr, object detection based on YOLO, and fingertip detection based on YOLO-YCbCr. The following is a detailed description of various methods.
• Fingertip detection based on YCbCr skin color segmentation [31]. The algorithm first used the skin color threshold in the YCbCr color space to segment the hand region to obtain the hand contour, and then combined the local maximum value of the cumulative curvature and the detection of convex defects to detect the fingertip.
• Fingertip detection based on YOLO [11]. The algorithm used the YOLOv5 algorithm to detect the fingertip position directly. During the training experiment, the fingertip positions in the 1000 single-finger hand images mentioned in section VI-A were first marked, and then the YOLOv5 model was trained.
• Fingertip detection based on YOLO-YCbCr [11], [31]. Firstly, the YOLOv5 algorithm was used to detect the hand region. In the training experiment, the 1000 single-finger hand images mentioned in section VI-A were randomly fused with complex background images to obtain 1000 fused images. Then, the hand regions in these images were marked, and the YOLOv5 model was trained. Finally, fingertip detection was performed on the detected hand region by the algorithm proposed in [31].
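The skin color thresholding step shared by the first and third baselines can be sketched as follows. This is only an illustration under stated assumptions: the conversion uses the full-range ITU-R BT.601 (JPEG) equations, and the Cb and Cr ranges are a commonly cited choice, not necessarily the thresholds used in [31]. The function names and the two-pixel test image are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB image to full-range YCbCr (BT.601, JPEG form)."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a boolean mask of pixels whose chrominance lies in the assumed skin range."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (
        (cb >= cb_range[0]) & (cb <= cb_range[1])
        & (cr >= cr_range[0]) & (cr <= cr_range[1])
    )

# One skin-toned pixel and one saturated blue pixel (hypothetical values).
image = np.array([[[200, 150, 120], [0, 0, 255]]], dtype=np.uint8)
mask = skin_mask(image)
print(mask)
```

Because the mask depends only on chrominance, a hand wearing a glove or captured by an infrared camera would be expected to fall outside the range, which is consistent with the difficulty of skin-color-based baselines in the scenarios considered in this paper.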
In the test experiment, the proposed algorithm and the above three algorithms were tested using the test image set mentioned in section VI-A. In addition, for the proposed algorithm, an additional set of experiments was set up to verify the influence of different voting rights classifiers on the final fingertip detection effect. Finally, the above algorithms were evaluated using the evaluation indices in Section VI-B1. Table 4 shows the evaluation results of different algorithms. Table 4 shows that the RMSE of fingertip detection with the XGBoost classifier is significantly lower than with the other classifiers, and its Acc and UnReco are also the best among all classifiers. This shows that XGBoost can use its generalization ability to accurately extract maximum discrimination features when separating them from complex background features, thereby improving the accuracy of feature matching in the subsequent fingertip detection process. A comparison with the fingertip detection algorithms based on YCbCr skin color segmentation and YOLO target detection shows that the proposed algorithm has obvious advantages in dealing with complex backgrounds. Part of the test results is shown in Fig.11.
From the fingertip detection results in Fig.11, it can be seen that the algorithm can accurately detect the fingertip position in the face of partial occlusion, wearing all-finger or half-finger gloves, dramatic changes in illumination, and different complex backgrounds. To verify the robustness of the proposed algorithm to illumination, a set of hand images under complex background captured by an infrared camera without illumination at night was added to test the proposed algorithm. Some experimental results are shown in Fig.12.
It can be seen from Fig.12 that the proposed algorithm still has a good fingertip detection effect in the night environment, and the RMSE remains within 5 pixels, which shows that the proposed algorithm has good robustness to illumination.
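As a reading aid for the pixel-level figures quoted above, the following sketch computes an RMSE over detected fingertips together with a missed detection rate. It assumes that the RMSE is the root of the mean squared Euclidean distance between detected and annotated fingertip positions over the images with a detection, and that the missed detection rate is Q divided by the number of test images; the exact definitions are those of Section VI-B1, and the function name and coordinates here are hypothetical.

```python
import math

def fingertip_rmse(predicted, ground_truth):
    """Return (rmse, missed_rate) for per-image fingertip predictions.

    predicted: list of (x, y) tuples, or None where no fingertip was detected
    ground_truth: list of annotated (x, y) tuples, one per image
    """
    squared_errors = []
    missed = 0  # plays the role of Q, the number of undetected images
    for pred, true in zip(predicted, ground_truth):
        if pred is None:
            missed += 1
            continue
        squared_errors.append((pred[0] - true[0]) ** 2 + (pred[1] - true[1]) ** 2)
    if squared_errors:
        rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    else:
        rmse = float("nan")  # no detections, so the error is undefined
    return rmse, missed / len(ground_truth)

# Made-up detections: two hits that are each 5 pixels off, and one miss.
predicted = [(10, 10), (23, 24), None]
ground_truth = [(13, 14), (20, 20), (5, 5)]
rmse, missed_rate = fingertip_rmse(predicted, ground_truth)
print(rmse, missed_rate)
```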

VII. CONCLUSION
To improve the accuracy of fingertip detection in complex environments, this paper designed a fingertip detection algorithm based on the maximum discrimination HOG feature in complex backgrounds. The algorithm can effectively overcome the problems of severe illumination changes and partial occlusion in the process of fingertip detection. The following conclusions can be drawn from the experiments:
1) This paper compared the effects of two edge detection methods on fingertip detection. The experiments show that the HED edge detection algorithm emphasizes the contour information between the background and the hand while ignoring irrelevant details, which reduces the difficulty of extracting maximum discrimination features. The contour information is also less affected by light and color changes, which reduces the impact of environmental factors on fingertip detection.
2) This paper compared the classification effects of different types of classifiers on the maximum discrimination features of the hand and other features. The four evaluation indices of Accuracy, Precision, Recall, and F1-score indicate that the XGBoost classifier has better generalization ability in classifying the hand and background.
3) In this paper, a maximum discrimination feature filter was designed to filter the maximum discrimination features. Experiments show that feature selection can effectively improve the performance of the voting rights classifier, thereby improving the effect of fingertip detection.
4) In this paper, a fingertip detection method based on hand contour components was adopted, which keeps the algorithm effective when the hand is partially occluded and improves its fault tolerance and robustness.
5) The experimental results show that the proposed algorithm can still achieve 99% accuracy, with the RMSE kept within 5 pixels, in the face of severe illumination changes, partial hand occlusion, and complex background scenes. Compared with the traditional hand gesture detection algorithms based on skin color and with those based on deep learning, it maintains higher accuracy and smaller prediction error in fingertip detection.
JINXIANG FENG is currently pursuing the master's degree with the School of Information and Automation Engineering, Qilu University of Technology (Shandong Academy of Sciences). Her research interests include computer vision, machine learning, and intelligent robot. CHENGLONG LI, photograph and biography not available at the time of publication. VOLUME 11, 2023