Face Recognition for Varying Illumination and Different Optical Zoom Using a Combination of Binary and Geometric Features

Preserving image features during binary conversion is difficult under varying illumination. Several binary conversion-based methods use adaptive thresholding to improve their robustness to illumination variation. However, the performance of existing methods remains limited under large illumination differences, especially with uncontrolled light sources. In addition, variation in face-to-camera distance significantly degrades face recognition performance: when several images of the same person are captured at different face-to-camera distances, the facial features of that person appear different. Therefore, this study combines the strengths of normalization and feature-based methods to build an illumination distribution model that overcomes this problem. The model fits the illumination variation across the whole image and generates an adaptive threshold for a novel columnar binary conversion method. The proposed method consists of five main stages, starting with eye area detection using the developed Viola–Jones algorithm. Next, the iris is detected using the Circular Hough Transform (CHT) method, and the image is converted into binary using the proposed Columnar Binary Conversion (CBC) method to preserve the appearance of the facial features under illumination variation. Then, the proposed Facial Feature Region Normalization (FFRN) method is applied to reduce the effects of different optical zooms for the classification step. Classification is based on the similarity measurement between the extracted normalised binary face region and the dataset images, which must be converted into their equivalent normalised binary images.
The proposed method is evaluated on two smartphone databases, namely Smartphone Face Video (SFV) and MOBIO. The results show that the proposed method outperforms the compared methods.


I. INTRODUCTION
A face recognition method on a smartphone should be able to recognize the user at any time and place; that is, it should handle the challenges of an uncontrolled real-world environment, such as illumination variation. Illumination conditions degrade the appearance of facial features; these negative effects arise from the differences in lighting between indoor and outdoor environments and from the light sensor of each smartphone, which produces a different amount of lighting depending on the device [1]. Several previous methods address this problem. One common way to weaken the influence of illumination is to compensate the illumination of the test image by balancing its greyscale distribution [10]; however, the resulting test image still retains some illumination variation. Alternatively, a quotient image can replace the test image for recognition, yielding a new illumination-invariant image. However, obtaining the quotient image requires solving a complex equation. Another approach introduces an illumination compensation dictionary between two face images so that they can be compensated linearly [1], [2]. However, this approach lacks adaptability to different databases because it is low-dimensional and requires the original images to be extracted in the dictionary [2]. Although many studies have investigated illumination problems, their performance is still limited under extreme illumination variation. Furthermore, smartphone databases introduce a new challenge of different optical zooms. This problem refers to the difference in face-to-camera distance, which depends on the length of the user's arm.
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
When capturing images with a smartphone, users easily move their face during recording; if the face does not fill the image frame, the user is likely to move closer to the camera, which distorts the facial features. This problem degrades face recognition performance when several images of the same person are captured at different close face-to-camera distances, because the facial features of that person appear different [3]-[5]. Meanwhile, multi-feature prediction has been used to provide features for post-translational modification [6], and the flexible neural tree (FNT), a novel classification model, has been employed as the classifier in a tree-structured classification model [7].
In this research, a combination of normalisation and feature-based illumination enhancement methods is proposed to model the illumination distribution. The model fits the illumination variation over the entire image to generate an adaptive threshold for the new CBC method, which then converts the input image into binary [8]. The proposed method converts the image vertically by comparing the image columns with the corresponding adaptive threshold. The threshold value is calculated from the candidate region inside the eye area to overcome the illumination effect. Furthermore, this research addresses the problem of different optical zooms by proposing the FFRN method. FFRN reduces the effects of different optical zooms by extracting the facial feature region based on the relationship between the geometric features of the eye area (in the rectangle) and the face width-to-height measures.
The rest of this paper is organized as follows. Section II discusses related work on face recognition under illumination variation and different optical zooms. Section III presents the proposed method, and Section IV describes the experiments and results. Lastly, Section V concludes the paper.

II. RELATED WORK
Generally, approaches for coping with appearance variation due to illumination fall into three categories: appearance-based, normalization-based, and feature-based methods. In the appearance-based approach, training examples are collected under different lighting conditions without any lighting pre-processing. The training examples are used to learn a global model of the possible illumination variations, for example a linear subspace or manifold model, which then generalizes to the variations seen in new images [9], [10]. This type of direct learning makes few assumptions, but it requires a large number of training images and an expressive feature set; otherwise, a good pre-processor is essential to reduce illumination variations.
Normalization-based approaches attempt to compensate for uneven illumination with traditional image processing methods. Histogram equalization is a simple example that compensates for illumination changes by adjusting the grey-level distribution. However, purpose-designed methods often exploit the fact that naturally occurring incoming illumination distributions typically have predominantly low spatial frequencies and soft edges, so that the high-frequency information in the image is predominantly signal. The work in [11] used a similar idea in the Self Quotient Image. The Weber-face method extracts locally salient patterns using relative terms. These methods are quite effective, but their ability to handle spatially non-uniform illumination variations remains limited.
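As an illustration of the normalization-based idea, the following is a minimal sketch of histogram equalization in plain Python. It is not taken from any of the cited works; the function name `equalize_histogram` and the small test image are assumptions made for this example.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a greyscale image given as a list of rows of ints."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative distribution function of the grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    # Look-up table mapping each grey level to its equalized value.
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


# A dark, low-contrast patch (hypothetical values).
dark = [[10, 10, 12, 12],
        [10, 11, 12, 13],
        [11, 11, 13, 13]]
bright = equalize_histogram(dark)
```

The narrow grey range of the input is stretched over the full range, which is why the technique helps with globally dark or bright images but not with spatially non-uniform lighting.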
The third approach, which is feature-based, extracts illumination-insensitive feature sets directly from the given image [12], [13]. These feature sets range from geometrical features [14] to image derivative features such as edge maps [15], Local Binary Patterns (LBP) [12], [16], and Gabor wavelets [13], [17]. Meanwhile, [18] proposed a method to extract illumination-insensitive features for face recognition called Local Centre of Mass Face (LCMF). In the LCMF approach, the gradient angle between the centre of mass and the centre pixel of a selected neighborhood is extracted. It is shown theoretically, using the Illumination Reflectance Model, that this feature is illumination invariant. Although such approaches improve recognition results, their resistance to the complex illumination variations that occur in real-world facial images is still quite limited. The work in [19] proposed an illumination processing method for face recognition called the REC & SIG-SVD algorithm, which incorporates illumination-insensitive features and low-frequency features to reassemble a normalised image. Although that method achieved a satisfactory recognition rate, its performance was still limited by unequal illumination features in small face areas. Moreover, the method of separating the frequency features of the face image more rigorously still requires further improvement.
Most of the illumination normalization methods mentioned above fail to preserve the appearance of facial features under the complex illumination of the smartphone environment. Face recognition on a smartphone database should be able to recognize a client at any time and in any place (indoors or outdoors). In addition, using a smartphone camera increases the effect of illumination variation on the appearance of facial features. Changes in lighting conditions are very important because they arise not only from the differences in illumination between indoor and outdoor environments but also from the light sensor of each smartphone camera, which produces a different amount of lighting depending on the device and sensor [1].
To overcome the problem of illumination variation in smartphone-captured images, a method is proposed that combines the advantages of normalization and feature-based methods to reduce the effect of illumination on the appearance of facial features. In this method, binary features are extracted because of their importance in representing features under different lighting conditions and because they serve as the input to the feature extraction process. Hence, they play an important role in generating unique features, especially for distinguishing several classes in pattern recognition [20].
In image processing, binarization is a general tool for image segmentation that discriminates an object from the background. Thresholding is the simplest binarization technique: an optimal threshold value is chosen, and each pixel is classified as foreground or background by comparing it with the threshold value. Real-life images suffer from degradation such as variation in illumination and contrast and background noise, which makes choosing a correct threshold value for binary conversion a challenging task [21]. An incorrect estimate of the threshold value results in misclassification of pixels and affects their appearance, which directly harms the performance of pattern recognition applications. In general, there are two types of thresholding methods: local thresholding and global thresholding. Global thresholding is simple to apply and fast to execute because it requires fewer computations. Nevertheless, it is not suitable under varying illumination conditions. Local thresholding, on the other hand, segments the entire image into sub-segments and uses a specific threshold in each small segment. This method is rather difficult to implement because it is a slow and complex process. Although it is used in non-uniform illumination environments, it still fails for images with high illumination variation [22].
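The difference between the two thresholding families can be seen in a small sketch. The code below is illustrative only: using the mean grey level as the threshold, the block size, and the sample image are all assumptions chosen for this example rather than settings from the cited works.

```python
def global_threshold(image):
    """Binarize with a single threshold: the mean grey level of the image."""
    flat = [p for row in image for p in row]
    t = sum(flat) / len(flat)
    return [[1 if p > t else 0 for p in row] for row in image]


def local_threshold(image, block):
    """Binarize with one mean threshold per block x block sub-segment."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            t = sum(image[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = 1 if image[r][c] > t else 0
    return out


# The same checkerboard texture under dark (left) and bright (right) lighting.
image = [[20, 60, 180, 220],
         [60, 20, 220, 180]]
global_bw = global_threshold(image)   # only separates dark half from bright half
local_bw = local_threshold(image, 2)  # recovers the texture in both halves
```

In this toy image the global threshold only separates the dark half from the bright half, whereas the per-block threshold recovers the texture in both halves at the cost of more computation.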
Local adaptive thresholding is a common solution to the issue of illumination variation. This method applies a different threshold value to each pixel or region of the image, which provides more robustness to changes in illumination. Many studies have introduced techniques for computing the adaptive threshold value based on illumination variation. The work in [23] used the local mean and standard deviation of the neighboring pixels inside a local window to compute the threshold according to contrast. Moreover, [24] introduced the local grey range technique, which uses the grey range within the local window to identify a threshold value between the maximum and minimum pixel values. Later, the study in [25] computed a local adaptive threshold using a different calculation process. It was observed that when the contrast in the local neighborhood was quite low, the threshold became smaller than the mean value. Nevertheless, this method was considered computationally complex because computing the threshold value required the local mean and standard deviation for each pixel. The work in [26] proposed adaptive thresholding because it accounts for spatial variations in illumination. The adaptive threshold compares a pixel to the average of nearby pixels in order to preserve hard contrast lines and ignore soft gradient changes. In addition, [27] proposed an adaptive thresholding approach based on intelligent block size detection that combines the advantages of local and global thresholding methods. Thresholding is then performed on each block of the degraded image. However, this method generates background noise in most cases. The works in [28] and [29] proposed using integral images to increase the robustness of the adaptive threshold method to strong illumination changes.
However, the main drawback of this method is that the image needs to be processed twice, which increases the implementation time. Another study [30] suggested an improvement to Otsu's and iterative Otsu's threshold methods [31] for environments with illumination variation by proposing Iterative Region-based Otsu (IRO) thresholding. The IRO method computes the threshold value using the statistics of greyscale intensities and the regional distribution of the illumination noise. However, the IRO method does not perform well in shaded regions [30].
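The integral-image idea discussed above can be sketched as follows. This is a generic illustration rather than the exact algorithm of [28] or [29]; the window half-size, the `ratio` parameter, and the choice of marking dark pixels as foreground are assumptions. The two passes over the image (one to build the table, one to threshold) are visible in the code.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column."""
    rows, cols = len(image), len(image[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += image[r][c]
            sat[r + 1][c + 1] = sat[r][c + 1] + row_sum
    return sat


def adaptive_threshold(image, half_window=1, ratio=0.85):
    """Mark a pixel 1 when it is darker than `ratio` times its window mean."""
    rows, cols = len(image), len(image[0])
    sat = integral_image(image)              # first pass
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):                    # second pass
        for c in range(cols):
            r0, r1 = max(0, r - half_window), min(rows, r + half_window + 1)
            c0, c1 = max(0, c - half_window), min(cols, c + half_window + 1)
            area = (r1 - r0) * (c1 - c0)
            # Window sum from four table look-ups, independent of window size.
            window_sum = (sat[r1][c1] - sat[r0][c1]
                          - sat[r1][c0] + sat[r0][c0])
            out[r][c] = 1 if image[r][c] * area < window_sum * ratio else 0
    return out


spot = [[100, 100, 100],
        [100,  40, 100],
        [100, 100, 100]]
spot_bw = adaptive_threshold(spot)
```

Because each window sum costs four look-ups, the per-pixel cost does not grow with the window size, which is the main appeal of the integral-image formulation.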
Besides illumination variation, optical zoom variability is another issue in smartphone camera images; it causes face distortion when the face is captured through the front camera. The effect is negligible when the distance is above 0.5 meter. However, it becomes a significant problem in personal video chat or when an image is captured with the front camera of a smartphone at a very close distance, often called a "selfie", where the person can be no further than an arm's length away. The subject's appearance can vary dramatically, especially in the nose and cheek regions, depending on the camera's distance from the subject. This distortion presents a problem for automatic face recognition. The studies in [32], [33] examined how viewing faces at different distances from the smartphone camera affects face recognition by humans; they involved a training phase in which face images were displayed to a human test subject. A similar psychology-based study is presented in [34], which investigated the effects of different optical zooms as a visual cue for the social judgement of faces. Human subjects were asked to judge an image of a face in terms of trustworthiness, attractiveness, and competence. The results show that, for social judgement, pictures taken at a short distance are generally rated lower than pictures taken at a far distance. The work in [4] also investigated the effect of different distances, and the results indicated that viewers have difficulty identifying the same face at different distances. Furthermore, [5] investigated the impact of faces captured at different distances on the performance of automatic face recognition algorithms.

III. PROPOSED METHOD
The proposed face recognition method is based on binary and geometric facial features and includes five main stages, as shown in Figure 1, where the first and last stages are the input and output. The first stage detects the eye area using the Viola-Jones algorithm. Next, within this eye area, the irises are detected using the Circular Hough Transform (CHT) method; they serve as the reference for extracting the reference line for smoothing. Then, the eye area is converted to binary using the proposed CBC method to preserve the appearance of the facial features under illumination variation. Lastly, the proposed FFRN method is applied to reduce the effects of different optical zooms for the classification step. Classification is based on the similarity measurement between the extracted normalised binary face region and the dataset images, which must be converted into their equivalent normalised binary images. This stage outputs the image with the highest similarity and its class number.

A. EYE AREA DETECTION
Accurate eye area detection is essential to successful face recognition and deserves careful consideration. Most existing methods are based on the measurement of eye characteristics, the learning of a statistical appearance model, or the exploitation of structural information. Methods based on the measurement of eye characteristics use intuitive visual characteristics, such as shape, the difference in intensity between the iris and the neighboring region, or the reflection of the eye in an infrared image, as a template for detection. These methods have the advantage of being simple and fast to implement because they use intuitive algorithms. The accurate location of the eyes in a facial image is important to many facial recognition-related applications and has attracted considerable research interest in computer vision. The method proposed in [35] uses a circular Hough transform to first detect eye candidates and then verifies each candidate using the Histogram of Oriented Gradients. The eye detection method developed in [36] uses a Haar-like feature-based method to detect the eye candidate region and locates the eyes using the Histogram of Oriented Gradients. All evaluated eye detection methods were trained and tested using two-fold cross-validation for each database. The work in [37] proposed an eye detection method that locates the eyes in a facial image in two stages, namely eye candidate detection and eye candidate verification. In eye candidate detection, the eye is detected using structure information, multi-scale iris shape features, and the integral image. The method was evaluated on both the CAS-PEAL and Pointing'04 databases. The experimental results showed that the geometrical constraint improved the precision of that method. However, the method is only a one-sided eye detector and is not suitable under extreme illumination variation.
In this study, the geometrical features and the structure information of smartphone-captured images are used to detect the eye area and the iris. These features are robust to illumination variation and complicated background environments.
In this research, the Viola-Jones method [38] is used to obtain the eye region. The Viola-Jones method is based on the idea that a weak classifier is repeatedly applied to the training samples to form a highly accurate predictor. Each classifier uses K rectangular areas, namely Haar features, to determine the similarity of the image region to the predefined image. Figure 2 shows the steps of the Viola-Jones method, which detects the eye area in an input frame image of size $(m, n)$ and crops it. The cropped area is then converted into a grayscale image of size $(m_2, n_2)$, where $m_2$ and $n_2$ are the numbers of rows and columns. Once the eye area has been detected, the iris area is detected. Previous studies have examined subjects of different races and found that the iris is amongst the darkest circular regions of the eye, characterized by its lower intensity compared with the surrounding region [39], [40]. In this research, the potential irises are detected using the Circular Hough Transform (CHT) because of its robustness in the presence of noise and varying illumination. The CHT calculates the circle radii of the iris; in the two-stage approach, the radii are explicitly estimated from the predicted circle centre together with image information [41], [42]. As a result, some circles were missed and numerous false circles were detected, as shown in Figure 3.
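To make the voting principle behind the CHT concrete, the following minimal sketch accumulates centre votes for a single, known radius. It is a simplified illustration and not the two-stage variant of [41], [42]; the function name `hough_circle_centre`, the fixed radius, and the hand-picked edge points are assumptions made for this example.

```python
import math


def hough_circle_centre(edge_points, radius, angle_step_deg=1):
    """Return the accumulator cell with the most votes and its vote count."""
    votes = {}
    for x, y in edge_points:
        # Every centre that could have produced this edge point lies on a
        # circle of the given radius around it; each cell gets one vote.
        cells = set()
        for deg in range(0, 360, angle_step_deg):
            theta = math.radians(deg)
            a = round(x - radius * math.cos(theta))
            b = round(y - radius * math.sin(theta))
            cells.add((a, b))
        for cell in cells:
            votes[cell] = votes.get(cell, 0) + 1
    centre = max(votes, key=votes.get)
    return centre, votes[centre]


# Twelve edge points lying on a circle of radius 5 around (10, 10).
edge = [(15, 10), (5, 10), (10, 15), (10, 5),
        (13, 14), (14, 13), (7, 14), (6, 13),
        (13, 6), (14, 7), (7, 6), (6, 7)]
centre, count = hough_circle_centre(edge, radius=5)
```

With real edge maps, noise and partial occlusion spread the votes over neighbouring cells, which is one reason false and missing circles can appear and a subsequent filtering step is needed.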
To obtain the correct iris area, we propose a method in which the structure and location features of human eyes are used to generate a filtering condition for these circles. Then, to correctly indicate the potential position of the two iris circles, a horizontal search along the middle line of the detected eye area is performed from the left edge to half of the eye area column size, to obtain the range of values that yields the maximum percentage of precise iris locations, as shown in Figure 4 and Figure 5.
where $n_2$ represents the column size of the eye area ($E$). Figure 6 shows an example of an accurately detected iris circle.

B. COLUMNAR BINARY CONVERSION METHOD
Binary features have been widely used in many face recognition systems because of their excellent robustness and strong discriminative power [20], [43]. Although many studies have performed binary conversion, their performance remains limited under illumination variation. Because of this limitation, a combination of normalisation and feature-based illumination enhancement methods is proposed that models the illumination distribution. The model fits the illumination variation over the entire image to generate an adaptive threshold for the new CBC method, which then converts the eye area image into binary. To obtain the binary image, the CBC method converts the image vertically by comparing the image columns with the corresponding adaptive threshold. The threshold value is calculated from the candidate region inside the eye area to overcome the illumination effect. Figure 7 shows the general framework of the CBC method. Referring to Figure 7, the CBC method takes as input the greyscale eye area of size $(m_2, n_2)$ in which the true iris circles have been detected. This input image describes the texture and shape of the area through the differences in the colours and shapes of its layers. Then, the reference line for the proposed binary conversion is scanned horizontally on the extracted area at row $m_2/2$, which contains at least one of the two iris centres, and the image is divided into $n_2$ vertical lines, where $n_2$ represents the horizontal dimension (columns) of the image.
Thereafter, each vertical line is compared with the threshold, which is formulated as a function of illumination by taking the greyscale values indexed along the reference line of $E$. These greyscale index values are represented by $X$, as shown in Equation 3.
Then the data of the matrix ($X$) are smoothed using the Exponentially Weighted Moving Average (EWMA) function to generate the adaptive threshold ($thr$), based on Equation 4.
These threshold values are then used to obtain the binary eye area image ($BW$) using Equation 5.
The proposed method overcomes the inverse illumination problem that is frequently encountered in binary conversion. The process is described in Figure 8.
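The steps described in this subsection can be sketched as follows. Since Equations 3 to 5 are not reproduced in this text, the sketch is only one plausible reading: the standard recursive EWMA form, the smoothing factor `alpha`, and the rule that a pixel brighter than its column threshold maps to 1 are all assumptions, and the function names are hypothetical.

```python
def ewma(values, alpha=0.5):
    """Standard EWMA: s[0] = x[0], s[j] = alpha * x[j] + (1 - alpha) * s[j-1]."""
    smoothed = [float(values[0])]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def columnar_binary_conversion(eye_area, alpha=0.5):
    """Binarize each column of the eye area against its own threshold."""
    rows = len(eye_area)
    reference = eye_area[rows // 2]   # reference line X at row m2 / 2
    thr = ewma(reference, alpha)      # one adaptive threshold per column
    # Assumed polarity: brighter than the column threshold -> 1, else 0.
    return [[1 if pixel > t else 0 for pixel, t in zip(row, thr)]
            for row in eye_area]


# A small eye area (hypothetical values) that gets brighter to the right.
eye = [[30, 50, 150, 170],
       [40, 60, 160, 180],
       [60, 80, 190, 210]]
thr = ewma(eye[1])
bw = columnar_binary_conversion(eye)
```

Because the threshold follows the smoothed intensity profile of the reference line, columns in brightly lit parts of the eye area are compared against higher thresholds than columns in shaded parts.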

C. FACIAL FEATURE REGION NORMALIZATION METHOD
Besides illumination variation, another problem to be solved is that of different optical zooms. This problem also affects face recognition performance in personal video chat or in selfie mode on mobile video or images, when the face is no farther from the camera than an arm's length. The problem is most evident when different images of the same person are captured at various close face-to-camera distances, because the distance changes the appearance of the facial features. To address this problem, the FFRN method is proposed; it reduces the effects of different optical zooms by extracting the facial feature region based on the relationship between the geometric features of the eye area (in the rectangle) and the face width-to-height measures. Figure 9 shows the framework of the proposed FFRN method. The FFRN method aims to produce consistent face images for the same person, because the detected faces vary in capturing distance and alignment across different images of the same person. Several experiments indicated that the height of the facial feature region is two and a half times the vertical size of the eye area (in rectangle; $BW$) and that the region has the same width as the eye area. Figure 10 shows the relationship between the eye area (in rectangle) and the height of the facial feature region.
The proposed method begins with the binary eye area obtained from the CBC method, which is used as the reference to extract the facial feature region ($R$) from the normalised frame image ($I$) based on Equations 6 and 7:

$$\text{Height of facial feature region} = 2.5 \times m_2 \tag{6}$$

$$\text{Width of facial feature region} = n_2 \tag{7}$$

where $m_2$ and $n_2$ are the length and width of the eye area. Next, the extracted facial feature region ($R$) is converted into a binary image to preserve the appearance of the facial features, and the image is normalised to a fixed size.
VOLUME 8, 2020
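Equations 6 and 7 translate directly into a bounding-box computation. In the sketch below, the assumption that the region starts at the top-left corner of the eye area and extends downwards, the clipping to the frame, and the nearest-neighbour resize used for size normalisation are illustrative choices that the text does not specify; the function names are hypothetical.

```python
def facial_feature_region(eye_top, eye_left, m2, n2, frame_rows, frame_cols):
    """Return (top, left, bottom, right) of the facial feature region."""
    height = round(2.5 * m2)   # Equation 6
    width = n2                 # Equation 7
    # Assumption: the region starts at the eye area's top-left corner.
    bottom = min(eye_top + height, frame_rows)
    right = min(eye_left + width, frame_cols)
    return eye_top, eye_left, bottom, right


def resize_nearest(image, out_rows, out_cols):
    """Nearest-neighbour resize, one simple way to normalise region size."""
    rows, cols = len(image), len(image[0])
    return [[image[r * rows // out_rows][c * cols // out_cols]
             for c in range(out_cols)]
            for r in range(out_rows)]


# Eye area of 40 x 120 pixels located at row 100, column 50 of a 480 x 640 frame.
box = facial_feature_region(100, 50, 40, 120, 480, 640)
small = resize_nearest([[1, 2], [3, 4]], 4, 4)
```

Because the region is defined relative to the eye area, its size scales with the apparent size of the face, and the subsequent resize brings regions captured at different distances to a common size.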

D. FACE CLASSIFICATION
Generally, this study proposes a face recognition method that combines holistic and feature-based methods. The proposed method recognises the face based on the geometric and binary features of the normalised facial feature region. A template-based method is used to compare the input video frame with the stored set of face templates. When an image frame is fed to the proposed method, it passes through the eye area and true iris detection process. Subsequently, the eye area is converted to binary using the CBC method, and the face is then detected using the FFRN method, which locates the facial feature area based on the geometric measurements of the eye area in order to overcome the problem of different face-to-camera distances. Next, the binary facial area is extracted using the CBC method to deal with the illumination problem.
Normally, template matching is used to compare the input facial feature area against the gallery images based on a similarity measurement. To avoid bias in the comparison, the preceding steps were applied to all video images in the databases to generate their normalised binary facial feature region images, which are used for comparison. Then, the similarity is measured using three different measurements, namely SSIM, PSNR, and RMSE [44]. Comparing the test image with the database of face images produces a set of similarity values. A cluster of values within a small range indicates reduced confidence. Note that high PSNR and SSIM values indicate greater image similarity, whereas a high RMSE value indicates lower image similarity. The proposed method was implemented on a database that contained four different videos for each person. Therefore, the four highest PSNR and SSIM values and the four lowest RMSE values were selected to retrieve all pictures of the class to which the test image belongs.
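The matching step can be illustrated with RMSE and PSNR on binary images. This is a minimal sketch under stated assumptions: SSIM is omitted for brevity, the peak value of 1 assumes binary images, and resolving the best matches by a majority vote over the `k` lowest-RMSE gallery entries is an illustrative choice rather than the exact decision rule of the proposed method. All names are hypothetical.

```python
import math
from collections import Counter


def rmse(a, b):
    """Root mean square error between two equally sized images."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))


def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = rmse(a, b)
    return math.inf if error == 0 else 20 * math.log10(peak / error)


def classify(test_image, gallery, k=4):
    """gallery: list of (label, image). Vote among the k lowest-RMSE entries."""
    ranked = sorted(gallery, key=lambda item: rmse(test_image, item[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


probe = [[0, 1], [1, 0]]
gallery = [("A", [[0, 1], [1, 0]]), ("A", [[0, 1], [1, 1]]),
           ("B", [[1, 0], [0, 1]]), ("B", [[1, 0], [0, 0]])]
label = classify(probe, gallery, k=2)
```

A lower RMSE corresponds to a higher PSNR, so ranking by either measure gives the same order; SSIM may rank candidates differently because it also accounts for local structure.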

IV. EXPERIMENT AND RESULT
This section discusses the performance of the face recognition method, which is evaluated on the SFV and MOBIO databases. Images in the SFV database were captured exclusively on mobile phones (iPhone 6). Four unique video samples are available for each of the 50 male participants: two videos from the front and rear cameras in an indoor environment and two videos from the front and rear cameras in an outdoor environment, each lasting 10-15 seconds. The method converts the input image into binary using the CBC method to preserve the appearance of the facial features under various illumination conditions; the experimental results of this step are discussed in subsection A. The method also aims to reduce the effects of different optical zooms, as discussed in subsection B. Finally, the performance of the recognition method is evaluated using quantitative and qualitative measurements to confirm its accuracy.

A. EXPERIMENTAL RESULTS OF THE CBC METHOD
The proposed method is evaluated on two different smartphone databases, namely SFV and MOBIO [45]. The CBC method extracts the reference line from the eye area to generate the threshold values for the binary conversion operation. The proposed location of the reference line proves more effective than other locations on the face. Figure 11 shows the result of the CBC method on random SFV video images when the reference line is extracted from the proposed location ($m_2/2$) and from other image locations. Figure 11 shows that the proposed location is effective for CBC performance: the facial features in the third-row images are clearly visible. Next, Figures 12 and 13 show the results of applying CBC to the SFV and MOBIO databases, respectively.
Then, the performance of the CBC method is compared with two binary conversion methods, namely Otsu's method [46] and the Iterative Region-based Otsu (IRO) method of [30], which computes the threshold value using the statistics of grayscale intensities and the regional distribution of the illumination noise. Figure 14 shows the results. The results in Figure 14 show that the illumination variation in both databases affects the performance of Otsu's and the IRO methods, as shown in rows (c) and (d). The global threshold value of Otsu's method cannot preserve the appearance of the facial features in the binary image under different illumination conditions. In addition, the IRO method did not work well on images captured in outdoor environments. By contrast, the proposed CBC method proves robust and efficient in preserving the facial features under the various illumination conditions of indoor and outdoor environments.
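For reference, the baseline global method can be sketched in a few lines. The code below is a textbook formulation of Otsu's between-class variance criterion, not the implementation used in the experiments; the sample image is a hypothetical texture under dark and bright lighting.

```python
def otsu_threshold(image, levels=256):
    """Return the grey level t maximizing the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    """
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    total = len(flat)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break             # all pixels are in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# The same checkerboard texture under dark (left) and bright (right) lighting.
image = [[20, 60, 180, 220],
         [60, 20, 220, 180]]
t = otsu_threshold(image)
otsu_bw = [[1 if p > t else 0 for p in row] for row in image]
```

On this toy image the single threshold separates the dark half from the bright half and loses the texture inside each half, which mirrors the failure mode of global thresholding under illumination variation described above.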
To compare the results of the proposed CBC method with those of the other methods, RMSE is used as the similarity measurement between the results obtained from these methods and the ground truth binary feature images. The comparison percentage is computed as the ratio of the number of result images that are more similar to the ground truth to the total number of input images. Tables 1 and 2 show the RMSE comparison of CBC with Otsu's and the IRO methods, respectively. Tables 1 and 2 show that the proposed CBC method achieves high performance on both databases. The second-best performance is that of the IRO method, which depends on the intensities and regional distribution of the illumination noise, but its performance is still not as good as that of the CBC method on smartphone images under sunlight. Otsu's method performs worst because it does not work well under different illumination conditions.

B. EXPERIMENTAL RESULTS OF FFRN METHOD
This section addresses the problem of different optical zooms across images of the same person captured at various close face-to-camera distances. The variety of close distances produces different facial features for the same person, which affects the performance of the face recognition method. The FFRN method is proposed to solve the problem of different optical zooms in face recognition. The FFRN method is evaluated on two smartphone databases, SFV and MOBIO. Figures 15 and 16 show the results of the proposed method on the SFV and MOBIO databases, respectively. The obtained results are evaluated subjectively.

C. EXPERIMENTAL RESULTS OF FACE RECOGNITION
As described in the Face Classification subsection, the FFRN step precedes the matching step. Matching is performed between the normalised binary facial feature region of the test image and the normalised binary facial feature region images in the database. The experiment is conducted on test images captured at the same and at different distances and environments as the training images. The similarity results are used for decision making: high PSNR and SSIM values and a low RMSE value indicate the correct class. Figure 17 shows the GUI of the face recognition method based on geometric and binary features. The performance of the proposed face recognition method was measured quantitatively, where the accuracy rate is the ratio of the number of successfully recognised faces to the total number of files in the database. In addition, the accuracy is calculated for the proposed method with and without the FFRN method. Table 3 shows the accuracy results of the proposed method on the SFV and MOBIO databases.
Table 3 shows that the proposed method performs satisfactorily on both databases under various challenges, such as different lighting conditions, complex backgrounds, race variation and different optical zooms. Moreover, the results confirm the efficiency and robustness of the proposed face recognition method, particularly when the FFRN method is used, which improves face recognition performance by reducing the effects of different optical zooms on the facial features across images of the same person.
The proposed method is compared quantitatively with [47], as shown in Table 4. The authors of [47] proposed a method that uses facial attributes for face authentication on the MOBIO database, in which a binary attribute classifier is used to provide compact visual descriptions of faces. The results show that the proposed method outperforms [47] on the MOBIO database in terms of accuracy.

V. CONCLUSION
This paper has proposed a face recognition method that overcomes the issues of illumination variation and different optical zooms. The binary facial feature region is extracted and normalised, and then used in face classification for person identification. Illumination variation and different optical zooms are the major challenges of the SFV and MOBIO databases. Therefore, the main goal is to minimise the influence of these challenges and create a robust face recognition system.
The face recognition algorithm starts by scanning the candidate eye region using the Viola-Jones eye detection algorithm. Within the eye area, the irises and their centres are detected using the CHT method. Based on the geometric features of the eye area and the iris centres, the reference line for the illumination model is used to calculate the adaptive threshold of the proposed CBC method. The CBC method has been introduced to preserve the facial features under various illumination conditions by successively applying CBC to convert the input frame into a binary image. In addition, the proposed algorithm overcomes the negative effect of different optical zooms on face recognition performance through the FFRN algorithm. FFRN aims to minimise the effect of different close distances on the appearance of facial features across images of the same person by extracting the normalised facial feature region based on the geometric relationship between the eye area and the height of the human face. The result of the FFRN method is used to identify the input face by calculating the similarity between the obtained normalised binary facial feature region and the database images, which are converted into equivalent normalised binary facial feature regions. The experimental results of the face recognition algorithm have also been described and comprehensively analysed.
Finally, a small error remains in this face region detection due to differing lighting conditions and the similarity of skin-coloured regions. In future work, the proposed recognition algorithm will concentrate on colour features by combining the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF) to distinguish between background and foreground features and thereby maximise recognition accuracy.