Biometric-Based Security System for Smart Riding Clubs

Horses, workers and riders all need to be kept safe on a farm or in a riding club. Because of the great value of the horse, the breeder needs to protect it from theft and disease. In this context, it is important to detect and recognize the identity of each worker, rider and horse for security reasons. This paper proposes a Smart Riding Club Biometric System (SRCBS) that automatically detects and recognizes horses as well as humans. The proposed system is based on facial biometrics for horses and gait biometrics for humans, owing to their simplicity and intuitiveness in an uncontrolled environment. This work suggests a Siamese network based on DenseNet features for human gait recognition, a Sparse Neural Network (SNN) based on sparse features for horse face detection and a horse face recognition method based on Gabor features, LDA and SVM. Because of the unavailability of horse databases, this paper presents a new benchmark for horse detection and recognition in order to evaluate the proposed system. The proposed systems achieved an average gait recognition accuracy of 95% in the 0° view, 100% in the 90° view and 98.90% in the 180° view on the CASIA-B dataset, an average precision of 90% for horse face detection and a recognition rate of 99.89% for horse face identification.


I. INTRODUCTION
Object detection and identity recognition have been widely employed in recent years for security purposes. Both animals and humans need safety. Farm animal control is required to verify the source, the identity and the production process as well as for livestock health surveillance. In addition, the outbreak of diseases such as bovine spongiform encephalopathy (BSE), foot-and-mouth disease (FMD) and classical swine fever (CSF) and the importance of export markets for domestic producers require the implementation of animal detection and identification programs that allow farmers to trace cattle from birth. Hence, the regulatory provisions (Orientation Law 1999) [1] made equine identification compulsory to ensure the origin of each horse and certify its identity for the purpose of fighting against theft and fraudulent horses as well as ensuring health monitoring.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongming Li.
In this context, ensuring that both animals and humans are secure is particularly important in a riding club. Owing to the great value of the horse, it is necessary to help not only the owner but also the horse, so that it can be monitored without obstacles or difficulties in the barn, the box or the race track.
Well-known traditional methods of animal identification use plastic ear tags, tattoos, freeze branding, hot-iron branding or RFID electronic ear tags. However, these techniques may threaten the well-being of animals [17] and require effort to identify the animal. Because of these problems, another detection and identification approach should be used for effective control, by means of a robust biometric marker that is invariant, fraud-proof, fast, accurate, inexpensive and non-invasive to capture. Researchers have resorted to other identification methods using different physiological biometric modalities such as muzzle patterns [51], [66] and retina patterns [65]. Although these patterns are effective, they still require effort to obtain the identity of the animal, since they necessitate direct contact between the animal and the camera to capture the muzzle or retina patterns. Thus, a biometric detection and recognition system needs to be created for the comfort of both animals and humans, one that handles subjects in motion and requires no direct contact between the subject and the sensor.
On the other hand, gait acquisition, like face acquisition, does not require contact with the sensor and is less likely to be obscured than other biometrics. Gait identification is also easier to use and more secure than the other biometrics: the subject can be identified in any view and at any location. In a riding club, gait biometrics are very useful for human identification but not for horses. In the stable, the horses remain in their boxes without walking and only the face appears. Therefore, facial biometrics are the most suitable biometric traits for horse identification. In summary, the most suitable biometric traits for smart riding club security are facial biometrics for horses and gait biometrics for humans, owing to their simplicity and intuitiveness in an uncontrolled environment.
Face recognition is one of the most promising modalities for horse identification [28], [29], [72] despite the scarcity of research in this field. According to the 5th edition of the FEI, the description of the horse face is particularly distinctive. The horse hair color is extremely varied and contains different texture patterns such as the blaze, the strip, the snip, the star and the lip marking on the head. The direction and shape of these texture patterns differ from one horse to another. Hence, the face pattern is an effective trait for horse identification.
This paper suggests a new Smart Riding Club Biometric System (SRCBS) using helpful modalities, features, methods and techniques to achieve an efficient automatic system with improved performance. The proposed SRCBS is very important for breeders to ensure the safety of the horse due to its great value. Additionally, SRCBS helps breeders verify the presence of workers and riders in the club. Proposing a useful human gait recognition system as well as a horse face detection and recognition system is the main interest of this work. Unfortunately, the face detection task is still very difficult, mainly because of the large intra-class variation, the illumination change, the variable pose, the complex background, the partial occlusion and the uncontrolled environment. Face textures and shapes are highly diverse, which makes animal face detection extremely difficult. This is probably the reason why the number of related approaches for animals is small. Besides, face and gait recognition tasks are no less difficult than the face detection task and share the same problems. Despite these difficulties, recent research has made significant progress on these detection and recognition problems.
The detection and recognition rates have reached nearly 90% for the human face using boosting-based and CNN-based (Convolutional Neural Network) approaches [26], [45]. The same has been observed for gait recognition. The fact that Convolutional Neural Networks (CNNs) are strong classification tools is among the reasons why Deep Learning is immensely popular and widely used for computer vision tasks. Despite the rapid development of CNNs, their training raises several challenges. First, a CNN requires a huge dataset for training [50]. However, a riding club does not have enough horses, workers and riders to construct such a dataset; the number is much smaller. Second, pooling layers eliminate a great deal of information and ignore the relationship between image parts according to Hinton [58]. Finally, CNNs have a large number of parameters and layers, leading to long training time and high computational complexity.
Given the above-listed problems, the challenging questions at this step are:
• How to verify the human gait using a network with a small number of parameters and a small database for training.
• How to create a neural network as a backbone detector with a small number of parameters and layers in order to minimize the training time and the computational complexity.
• How to determine the powerful features and effectively detect horse faces using a small database for training.
• How to recognize horses using the simplest method for training and testing.
The challenging issue consists in establishing a safer smart riding club and proposing a new application to detect and recognize the horse face and the human gait in an easy and fast way without the need for a huge dataset for training. The different contributions of this paper are as follows:
• Proposing a new SRCBS using appropriate modalities and efficient methods.
• Making a Siamese network based on DenseNet features for human gait recognition without the need for a huge dataset for training.
• Making a Sparse Neural Network (SNN) for horse face detection based on sparse features using the smallest number of parameters and without the need for a huge horse dataset for training.
• Making a system for horse face detection based on the proposed SNN.
• Employing a sparse feature selection method for face detection. Feature selection has been widely adopted, through many learning methods, for pattern recognition applications. However, these methods are not devoted to object detection applications.
VOLUME 10, 2022
• Using the proposed sparse feature selection method as a training optimizer for the sparse hidden layer of SNN instead of traditional algorithms such as ADAM and SGD.
• Making a new horse database called Tunisian Horse Detection Database (THDD), which could contribute to the research community of animal biometrics. To the best of our knowledge, this is the only public dataset of face images that is available for research on horse detection.
• Proving that our proposed SNN could get better performance than the CNNs of the traditional detectors. In the experiments, the efficiency of SNN was demonstrated for face classification compared to the other Convolutional Neural Networks using the smallest number of parameters and without the need for a huge dataset.
• Showing through experimental studies that the proposed recognition system for horses got a better performance than the related works.
• Proving that our proposed network for gait recognition had a better performance than the related works.
The rest of this paper is organized as follows: Section II presents the related works on human gait recognition, animal face detection and animal face recognition. The proposed SRCBS and its sub-systems are described in Section III. Section IV is devoted to the presentation of THDD, which extends and generalizes THoDBRL'2015, whereas Section V focuses on the experimental study. Finally, Section VI concludes the paper and presents some possible future work.

II. RELATED WORKS
A. HUMAN GAIT RECOGNITION
This section briefly surveys developments in gait recognition that use deep learning methods on the CASIA-B dataset. Song et al. [15], Zhang et al. [77] and Merlin et al. [37] are among the works based on Siamese architectures for gait classification and matching.
Song et al. [15] proposed a network known as GaitNet which is composed of two convolutional neural networks. One corresponds to gait segmentation and the other corresponds to classification. The joint supervision of the Siamese loss and the soft-max loss was used to learn the final gait features. GaitNet achieved an average accuracy of 89.9%.
Zhang et al. [77] introduced a Siamese network which takes the gait energy image (GEI) as an input. In the training step, a companion loss was employed in the middle layer using Multi-Layer Side-Output (MLSO) as a reference. Zhang et al. evaluated their network on the CASIA-B dataset and obtained a recognition rate of 75.17%.
Merlin et al. [37] proposed the CCGI gait feature, which keeps more temporal and spatial differences in the gait patterns. The model used, based on a convolutional neural network (CNN), was evaluated empirically on recognizing and classifying the discriminative changes of the CCGI feature. This method achieved a mean accuracy of 98.87% in the normal walking conditions.
Likai et al. [67] proposed the STC-Att model based on a CNN branch taking silhouettes as input and a GCN branch taking skeletons as input. They found a recognition rate equal to 97.7% for the normal state.
Hung-Min et al. [27] proposed a gait recognition framework called Temporal Attention and Keypoint-guided Embedding (GaitTAKE). The framework merges the global appearance feature with the local appearance feature based on temporal attention and a temporal aggregated human pose feature. Experimental results showed that their method achieved a rank-1 accuracy of 98.0% on the normal set of the CASIA-B gait dataset.
Liao et al. [48] introduced a model-based gait recognition method, PoseMapGait, which consists of two streams: heatmap Convolutional Neural Networks (gaitMap-CNN) and Pose Graph Convolutional Networks (gaitPose-GCN). This method was evaluated on the CASIA-B dataset. PoseMapGait achieved a mean accuracy of 75.7% over 11 gallery views in the normal walking conditions.
Tianrui et al. [62] proposed the LagrangeGait framework which is composed of three branches. The first branch extracts the second-order motion feature while the second one, the branch of the GaitGL backbone, extracts the appearance feature. The third branch predicts the view of input silhouette sequence. The recognition accuracy of this method was equal to 96.9% for normal walking.
Shopon et al. [40] introduced the RGCNN architecture (Residual Connection-based Graph Convolutional Neural Network). In fact, RGCNN backbone based on Residual Connections resulted in transformed body joints. The RGCNN architecture attained a testing accuracy of 98.86% for four views on the normal walking set.
Muhammad et al. [36] proposed using ResNet101 for feature extraction with transfer learning. They also introduced the kurtosis-controlled entropy (KcE) approach for feature selection, followed by a feature fusion step based on correlation. The multi-class OaA-SVM was applied for classification. The prediction accuracy for normal walking was 96%.
The systems and networks surveyed above rely on deep CNN architectures as backbones. However, Convolutional Neural Networks (CNNs) have a large number of parameters and layers and require a huge dataset for training. This leads to long training time and high computational complexity.
To deal with the above-listed problems, this paper uses a Siamese network for human gait recognition. The proposed network effectively exploits human gait features extracted from a pre-trained DenseNet network to obtain a robust and fast recognizer using a small number of operations and parameters without the need for a huge dataset.

B. ANIMAL FACE DETECTION
The number of works in this area is very limited due to the complexity of the animal face detection and recognition tasks. The existing related works in this field are as follows.
Zhang et al. [73] proposed a set of Haar of Oriented Gradients (HOOG) features to capture the texture and shape features of the animal head (for animals such as cats, tigers, pandas, foxes and cheetahs). They used SVM for classification and decision calculation. Using the Cat Database, they found a precision of 95% and a recall of 99.8%.
Yamada et al. [9] proposed detecting dog and cat heads using edge-based features. They selected four directional features (Horizontal, Vertical, Upper Right and Upper Left) to detect the facial characteristics. They used a multi-layer classifier for features classification. Yamada et al. performed their method on a set of cat and dog images from the web. The recall rate was equal to 85% on the cat set and 90% on the dog set.
Mukai et al. [43] focused on cat and dog face detection. They used the Viola-Jones method and employed both the Haar and the HOG descriptors for feature extraction. Using 58 images from the Cat Database for the test, they found a recall of 96.6% and a precision of 75.7%. Using 60 images from the Stanford Dogs Dataset, they achieved a recall of 98.3% and a precision of 90.8%.
These traditional animal face detectors with handcrafted features have been replaced in recent works by deep convolutional neural networks with the ability to extract discriminative face features.
Vlachynska et al. [8] used the faster R-CNN proposed in [57] with ResNet-101 for dog face detection. They found an Average Precision equal to 98% on the Columbia Dogs Dataset.
Tureckova et al. [7] who used the YOLOv3 detector with DarkNet-53 for dog face detection noticed an Average Precision equivalent to 92% on the Columbia Dogs Dataset and the Oxford-IIIT Pet Dataset.
Xu et al. [12] used RetinaNet with ResNet-50 for cattle face detection. They found an average precision score of 99.8%.
Song et al. [59] optimized YOLOv3 and evaluated their method on a sheep face dataset. They obtained a mAP of about 97.2% by clustering the anchor boxes of the compressed YOLOv3 model based on DarkNet-53. The number of parameters of the proposed model was reduced to 1/4 of that of the original model.
The detectors proposed in [7], [8] rely on well-known CNN architectures (ResNet and DarkNet) as backbones. In addition, other CNN-based detectors for animal detection [63] have been proposed. However, Convolutional Neural Networks (CNNs) ignore the relationship between image parts, have a large number of parameters and layers and require a huge dataset for training. This leads to long training time and high computational complexity.
To deal with the previously-mentioned problems, this paper introduces the Sparse Neural Network (SNN) for animal face detection. The proposed network effectively exploits the animal face characterization to obtain fast classification and detection using the smallest number of parameters and without the need for a huge dataset.

C. ANIMAL FACE RECOGNITION
In the last decades, facial recognition systems have been developed using manually annotated databases in order to locate the facial area in the image. Overall, these facial recognition systems have not been automated by facial detection systems [5], [22], [28], [29], [41], [53], [72], [76]. Although these methods reach high recognition rates, they lack automatic face detection, which is why an animal face detection system is important to ensure safety and security.
Jarraya et al. [28] benefited from the horse face properties and proposed an approach for horse identity recognition using frontal facial features of 47 horses. They used the Gabor filters and the LDA for feature extraction, while the Euclidean (Euc) distance and the MahCosine (MC) distance were employed for classification. They validated their system using the THoDBRL'2015 and achieved a recognition rate of 95.74%.
Jarraya et al. [29] suggested a multi-view horse face recognition using the THoDBRL'2015 that contains 47 horses. They used the Gabor filters for face characterization, the Stacked Auto-encoders (SA) to reduce the size of the feature vector and SVM for classification. Using 9 images for training, they obtained a recognition rate equal to 94.22% on the frontal images.
Ouarda et al. [72] propounded a new feature descriptor (RNGLBP) based on the Gabor and LBP features. They tested the proposed approach on the THoDBRL'2015 using the SVM classifier. They reached a recognition rate equal to 98.77%.
Shi et al. [76] propounded the Residual InterSpecies Equivariant Network (RiseNet) for deep cross-species feature learning. The features of the lower and the upper halves of faces were learned separately. They merged these features as additional information to improve the performance of the proposed RiseNet. Shi et al. evaluated the suggested method on the THoDBRL'2015. They found a recognition rate equal to 82.56%.
Face recognition was performed on other animals such as cattle by [5], [53] and dogs by [22]. Kumar et al. [53] used the SURF and the LBP for feature extraction and the Chi-Square for classification. The performance was about 92.5% with a database of 40 cattle.
Salama et al. [5] used Bayesian optimization to automatically set the parameters of the DenseNet convolutional neural network. They evaluated their method on a database of 52 sheep and achieved a recognition rate of 98%.
Mougeot et al. [22] proposed a deep learning approach based on a deep convolutional neural network for the face recognition and verification of dogs. They found a recognition rate equal to 88% on a database of 48 dogs.
Weng et al. [81] propounded a two-branch convolutional neural network for cattle face recognition. They evaluated their method on collected cattle face images. The performance was equal to 99.85% on cattle face images, 99.81% on cow face images, and 99.71% on beef cattle and cow mixed images.
Xu et al. [11] used RetinaFace-mobilenet with ArcFace Loss for cattle identification. The proposed CattleFaceNet achieved an identification accuracy of 91.3% on a dataset of 90 cows.
Hitelman et al. [3] proposed to use ResNet-50V2 network with ArcFace loss function for sheep face identification. They performed their system on a database of 81 Assaf breed sheep. After transfer learning, the system achieved an average identification accuracy of 97%.
Xu et al. [20] enhanced the Siamese Neural Network for cow face recognition. Using a database of 63 cows, the system achieved an accuracy of 93%.
The previously-mentioned related works evaluated their proposed methods on small datasets that contained on average between 40 and 50 subjects with no more than 10 images for each subject. This number is not sufficient for training a deep neural network. This explains the high recognition rates of works [28], [29], [53], [72], which used handcrafted features, and the lower rates of works [11], [20], [22], [76], which used deep neural network approaches. Hence, the solution proposed in this paper is to avoid the use of a deep neural network with a large number of parameters.
Owing to the importance of studying the horse, some works have been carried out recently and have provided new computer solutions for control and security objectives. In fact, according to North [55], the interaction between the horse and the computer is more substantial.
Hummel et al. [25] and Li et al. [80] proposed using the face pattern as it is rich in information about the life of horses such as pain, disease and feelings. The objective of Hummel et al. [25] was to recognize the pain in equines. They suggested employing the HOG features and SVM for pose estimation and the SIFT, LBP, HOG and VGG16 features as well as SVM for pain recognition. Using their own equine dataset, Hummel et al. found the F1 score to be equal to 0.89 for pose estimation and 0.53-0.87 for pain estimation. The objective of Li et al. [80] was to detect EquiFACS units automatically from the horse face images. They suggested testing the DRML and AlexNet for horse facial AU recognition. They found an accuracy between 54.0% and 58.1% using DRML and an accuracy between 52.8% and 57.0% with AlexNet on their own dataset.
Bragança et al. [19] propounded improving the gait classification of horses using data generated by Inertial Measurement Units (IMUs). They built a dataset of 120 horses which included 7,576 strides of 8 different gaits. Their gait classification model based on the LSTM network achieved an accuracy of 97%.

III. THE PROPOSED SMART RIDING CLUB BIOMETRIC SYSTEM (SRCBS)
In this section, the proposed Smart Riding Club Biometric System (SRCBS), which involves three sub-systems for human (worker and rider) and horse recognition, is introduced. The objective of the first sub-system (WRIR-GB system: Worker and Rider Identity Recognition based on Gait Biometrics) is to recognize the person from a specific distance using the gait modality. The second sub-system (HFD-SF system: Horse Face Detection using Sparse Features) is used to detect the horse face. The objective of the third sub-system (HIR-FB system: Horse Identity Recognition based on Face Biometrics) is to develop a contactless solution for horse recognition using facial features. Three camera positions were proposed for human gait and horse face detection and recognition. The first camera was placed at the end of the barn to capture the gait of the worker and rider from a distance in front view (0° angle) and in rear view (180° angle). The other two cameras were put in the barn near the horses to capture their facial biometrics and the human gait in profile view (90° angle). Fig. 1 and Fig. 2 illustrate the proposed architecture of the new SRCBS system.

A. WRIR-GB SYSTEM: WORKER AND RIDER IDENTITY RECOGNITION BASED ON GAIT BIOMETRIC
The main advantage of gait biometrics is that they offer great potential for recognition at a low resolution and from a distance. CRF-RNN was used for body detection in the image frame and for background subtraction. A Siamese neural network based on DenseNet features was introduced for gait recognition.

1) PRE-PROCESSING (CRF-RNN FOR WORKER AND RIDER DETECTION)
Since CNNs have achieved great success in natural image analysis and the CRF has outperformed other existing solutions in structural learning, the Conditional Random Fields as Recurrent Neural Networks (CRF-RNN) [61] segmentation method has been employed for human body detection. To describe this deep learning system for semantic image segmentation, it is necessary to understand how repeated iterations are organized as an RNN. One iteration of the algorithm can be formulated as a stack of CNN layers. The transformation done by one CRF-RNN iteration is denoted by f_θ; it takes an image Img, pixel-wise unary potential values U and a marginal probability estimation H from the previous iteration. The next estimation of the marginal distributions after one iteration is given by f_θ(U, H, Img). Equations (1), (2) and (3) represent the behaviour of the network, where T is the number of iterations and the vector θ represents the CRF parameters [61].
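Equations (1) to (3) are not reproduced in this copy of the text. As a hedged reconstruction that follows the recurrence published with CRF-RNN [61], and writing H_1 and H_2 for the marginal estimates before and after an iteration (these two symbols and the output symbol Y are assumptions, not notation taken from this section), the behaviour of the network can be sketched as:

```latex
\begin{align}
H_1(t) &=
  \begin{cases}
    \operatorname{softmax}(U), & t = 0 \\
    H_2(t-1), & 0 < t \le T
  \end{cases} \tag{1} \\
H_2(t) &= f_\theta\bigl(U,\, H_1(t),\, \mathit{Img}\bigr), \qquad 0 \le t \le T \tag{2} \\
Y(t) &=
  \begin{cases}
    0, & 0 \le t < T \\
    H_2(t), & t = T
  \end{cases} \tag{3}
\end{align}
```

Read this way, the unary potentials initialize the marginals, each iteration refines them through f_θ, and only the estimate of the last iteration T is passed on as the segmentation output.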
Using CRF-RNN, the output is a set of regions with different classes. The region reference map was obtained and only the human regions were selected. As shown in Fig. 3, CRF-RNN semantic image segmentation was used to detect the human boundary area, represented by the light pink color.

2) SIAMESE BASED ON DenseNet FEATURES FOR GAIT RECOGNITION
The proposed Siamese neural network (Fig. 4) employs a unique structure to rank similarity between inputs. The backbone is composed of a pre-trained DenseNet [21] network followed by a set of fully connected layers. The Euclidean distance, linear and sigmoid layers were placed at the network bottleneck. Once the first part of the backbone was pre-trained and frozen, the Siamese network could benefit from powerful discriminant features, generalizing the predictive power of the network to recognize not only new data but also entirely new classes. Using the feature vectors of the pre-trained DenseNet architecture, the system was able to achieve strong results using few parameters and less execution time.
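To make the order of the bottleneck layers concrete, the following minimal sketch scores a pair of pre-extracted feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the DenseNet features are replaced by short toy vectors, a single shared fully connected layer with ReLU stands in for the "set of fully connected layers", and the names `siamese_score`, `scale` and `shift` are hypothetical.

```python
import math


def dense_relu(x, weights, bias):
    """One fully connected layer with ReLU; `weights` holds one row per output unit."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def siamese_score(feat_a, feat_b, fc_w, fc_b, scale, shift):
    """Similarity in (0, 1) for two pre-extracted feature vectors.

    Both inputs go through the same fully connected weights (shared branch);
    the distance is then mapped by a linear unit and a sigmoid.
    """
    emb_a = dense_relu(feat_a, fc_w, fc_b)
    emb_b = dense_relu(feat_b, fc_w, fc_b)
    distance = euclidean(emb_a, emb_b)
    return sigmoid(scale * distance + shift)


# Hypothetical toy values standing in for DenseNet features and learned weights.
FC_W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
FC_B = [0.1, 0.0]
same = siamese_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], FC_W, FC_B, scale=-1.0, shift=2.0)
diff = siamese_score([1.0, 2.0, 3.0], [3.0, 1.0, 0.0], FC_W, FC_B, scale=-1.0, shift=2.0)
print(same, diff)
```

With a negative `scale`, a smaller distance yields a higher score, so a pair from the same subject is ranked above a pair from different subjects. Because only the small head is trained while the feature extractor stays frozen, the number of trainable parameters remains low.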

B. HFD-SF SYSTEM: HORSE FACE DETECTION BASED ON SPARSE FEATURES
The difficult question, at this stage, is how to determine the powerful features and the most useful method to effectively detect horse faces using a small number of parameters and without the need for a huge set of data. Due to the large diversity of horse head textures, developing a face detector is a challenging task. Although horses have distinctive ears, this characteristic cannot be relied on because the horse moves them frequently, changing their shape and position. Therefore, our work concentrates on detecting the horse face without considering its ears.
On the other hand, it has been observed that all horses have distinctive head forms, noses and profile eyes. Horse faces have a globally similar shape but locally variant colors and textures. For this reason, our focus has been on how to effectively use shape features in order to create a robust horse face detector. Based on this idea, a detection method that focuses on the most expressive oriented gradient features has been proposed: a Sparse Neural Network (SNN) for horse face detection using sparse gradient features.

1) GRADIENT FEATURES FOR HORSE FACE DESCRIPTION
Since gradient features have performed well in several high-level computer vision tasks such as object detection [56], the present work proposes testing their performance on horse face detection. This descriptor describes the apparent objects and shapes by estimating the direction of the edges or the intensity distribution [42]. The description was carried out by dividing the image into small adjacent regions, called cells, and by accumulating the gradient directions of each cell in a histogram. The magnitude (Mag) and direction θ at pixel (x, y) were calculated according to the following equations, where I(x, y) is the brightness value of the image at (x, y).
For each cell, an oriented gradient feature vector was constructed by quantizing θ into K orientation bins weighted by the gradient magnitude [42]. The overlapped cells were grouped and normalized in order to form a wider spatial region (block). The concatenation of the block histograms formed the gradient descriptor [42]. There were significant challenges in adapting gradient features for horse face detection and in selecting the effective ones from the complicated face textures.
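The cell-level computation described above can be sketched as follows. Since the gradient equations are not reproduced in this copy, the sketch assumes the common central-difference form, Gx = I(x+1, y) - I(x-1, y) and Gy = I(x, y+1) - I(x, y-1), with Mag = sqrt(Gx² + Gy²) and θ = arctan(Gy / Gx) taken as an unsigned angle in [0°, 180°). The function names, the default of 9 bins and the L2 block normalization are illustrative choices, not values stated in the text.

```python
import math


def cell_histogram(cell, bins=9):
    """Magnitude-weighted orientation histogram of one cell.

    `cell` is a list of rows of brightness values; border pixels are skipped
    because the central difference needs both neighbours.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal difference
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical difference
            mag = math.hypot(gx, gy)
            theta = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned angle
            k = min(int(theta / bin_width), bins - 1)         # quantize into K bins
            hist[k] += mag
    return hist


def l2_normalize(vector, eps=1e-12):
    """L2 normalization of a block vector (concatenated cell histograms)."""
    norm = math.sqrt(sum(v * v for v in vector)) + eps
    return [v / norm for v in vector]


# A horizontal brightness ramp: every gradient points along x (angle 0).
ramp_x = [[float(x) for x in range(4)] for _ in range(4)]
print(cell_histogram(ramp_x))
```

Concatenating the normalized histograms of all blocks gives the feature vector that feeds the input layer of the detector.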

2) SPARSE NEURAL NETWORK (SNN) ARCHITECTURE
To deal with the problems listed in the Introduction, the smallest network with a limited number of parameters was suggested for horse face detection. The proposed Sparse Neural Network is composed of three layers, as in an MLP network: an input layer (gradient features), a sparse hidden layer and an output layer for feature classification. The input layer represents the gradient feature vector. The sparse hidden layer has a size equal to that of the input layer. Each neuron in the input layer is connected only to the corresponding neuron in the hidden layer, which reduces the number of parameters. This layer was trained using a proposed sparse feature selection method that will be explained in the next section. This method produces a weight vector W from the input gradient features, and W is used as the weight vector of the hidden layer. The weight vector contains a large number of zeros; all the features whose weights are zero are therefore suppressed in the hidden layer, leaving only the pertinent features, which represent 10% of the input vector. The linear activation function was used for the hidden layer (8), where G is the gradient feature vector, n refers to the number of neurons in the input and in the hidden layer, w represents a weight value in the hidden layer and g stands for a gradient feature value in the input layer. The output layer contains only one neuron because the classification is binary. This layer was trained by the Stochastic Gradient Descent (SGD) optimizer with momentum using a sigmoid transfer function. The SNN architecture is presented in Fig. 5. The code of this network is available at https://github.com/Islem-Jarraya/Sparse-Neural-Network-SNN-for-horse-face-classification
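The forward pass implied by this description can be sketched in a few lines. The sketch assumes that the hidden layer computes h_i = w_i · g_i for i = 1, ..., n (one weight per input, linear activation, no bias), which is one reading of the definitions given for Equation (8); the output weights `v`, the bias `b` and the function names are hypothetical and do not come from the released code.

```python
import math


def snn_forward(g, w, v, b):
    """Forward pass of a three-layer sparse network.

    g: gradient feature vector (input layer)
    w: one weight per input feature (sparse hidden layer, mostly zeros)
    v, b: weights and bias of the single sigmoid output neuron
    """
    hidden = [wi * gi for wi, gi in zip(w, g)]          # one-to-one, linear activation
    z = sum(vi * hi for vi, hi in zip(v, hidden)) + b   # single output neuron
    return 1.0 / (1.0 + math.exp(-z))                   # probability of "face"


def hidden_weight_counts(n):
    """Hidden-layer weights: one-to-one wiring versus a fully connected layer."""
    return n, n * n


# Toy example: only the second feature survives the sparse hidden layer.
features = [1.0, 2.0, 3.0, 4.0]
sparse_w = [0.0, 0.5, 0.0, 0.0]
print(snn_forward(features, sparse_w, v=[1.0, 1.0, 1.0, 1.0], b=-1.0))
print(hidden_weight_counts(len(features)))
```

Features whose hidden weight is zero have no influence on the output, so changing them leaves the prediction unchanged. The one-to-one wiring also means the hidden layer holds n weights instead of the n × n weights of a fully connected layer of the same size.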

3) SPARSE HIDDEN LAYER TRAINING BASED ON SPARSE FEATURE SELECTION METHOD
In other research works, feature selection has been widely adopted, through many learning methods, for pattern recognition applications. However, these methods are not devoted to object detection applications. The main challenge of feature selection methods is how to reach accurate results using a small number of active features. Despite their efficiency, these methods are not accurate enough to process real-world data using a small number of features. In this section, a feature selection method that addresses this limitation and integrates an automated negotiation process between the PE_trun and RAND algorithms is proposed for binary classification. The input pattern sequence is (x_t, y_t), where x_t is the input gradient feature vector of dimension d, t = 1, ..., T is the iteration index and y_t is the desired output. The sparse selection method requires a classifier W_t which contains at most B non-zero elements (where B > 0 is a predefined constant). Thus, the classification of x_t depends only on B features and is made by the function sgn(W_t x_t).
In each trial t, the learner classifies the instance x_t and the classifier W_t is then updated. This scenario is repeated until t = T. It is assumed that the learner is provided with the full inputs of every training instance.
The RAND algorithm, which was described and used in [18], randomly selects B features in a learning task. The $PE_{trun}$ algorithm, which was also described and used in [18], is a perceptron modified by a simple truncation. Both of these algorithms should respect the following 6 steps:
• Step 6: Selecting relevant features
Since our aim is to select only relevant features, unnecessary ones were reduced to zero. In the first step, all input features were considered to be irrelevant. Therefore, the weight vectors of the two algorithms were initialized with zeros. The role of the RAND and $PE_{trun}$ algorithms was to keep zeros for irrelevant features and increase the weights of useful features. The first five steps were common to the RAND and $PE_{trun}$ algorithms. The only difference lay in the manner of feature selection, which was performed in the final step. The RAND and $PE_{trun}$ algorithms both participated in the negotiation process and tried to select the best features. In fact, our key contribution was to incorporate automated negotiation between the learning algorithms to improve the classification performance. The error rate was considered as the utility function of each negotiator. Negotiation, in this sense, aims to select the relevant features with the minimum number of mistakes and the minimum execution time. The RAND participant sends $W_{RAND}$ while the $PE_{trun}$ participant sends $W_{PE_{trun}}$ to the initiator. The initiator merges $W_{RAND}$ and $W_{PE_{trun}}$ into their union $W$. The same scenario is repeated and the participants take the new vector $W$ in each iteration (Fig. 6). Algorithm 3 describes the different steps of the initiator upon receiving the $W_{RAND}$ and $W_{PE_{trun}}$ proposals. Algorithms 1 and 2 represent the negotiation of the sparse feature selection method.
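The negotiation loop described above can be sketched as follows. This is a simplified illustration, not a transcription of Algorithms 1 to 3: the perceptron update, the learning rate `lr`, the element-wise form of the union and the final cut to B features are assumptions, and all function names are hypothetical.

```python
import random

def truncate_top_b(w, b):
    """PE_trun-style selection: keep the b largest-magnitude weights."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:b])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]

def truncate_random_b(w, b, rng):
    """RAND-style selection: keep b randomly chosen weights."""
    keep = set(rng.sample(range(len(w)), b))
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]

def perceptron_step(w, x, y, lr=1.0):
    """Shared steps: predict with sgn(w . x), update only on a mistake."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    if y * score <= 0:  # a zero score is counted as a mistake
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w

def union(w_a, w_b):
    """Initiator: keep every weight that at least one participant kept."""
    return [a if a != 0.0 else b for a, b in zip(w_a, w_b)]

def negotiate(samples, b, seed=0):
    """One pass over (x, y) pairs with y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])  # every feature starts as irrelevant
    for x, y in samples:
        updated = perceptron_step(w, x, y)
        w_rand = truncate_random_b(updated, b, rng)
        w_pe = truncate_top_b(updated, b)
        w = union(w_rand, w_pe)  # both participants restart from this vector
    return truncate_top_b(w, b)  # assumed final cut to at most b features

toy = [([1.0, 0.1, 0.0, 0.0], 1), ([-1.0, 0.1, 0.0, 0.0], -1)] * 5
selected = negotiate(toy, b=1)
```

Since both participants truncate the same updated vector, their proposals agree wherever both keep a weight, so the element-wise union is well defined in this sketch.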

C. HIR-FB SYSTEM: HORSE IDENTITY RECOGNITION BASED ON FACIAL BIOMETRICS
According to [28], [29], it can be concluded that the Gabor descriptor is effective for horse face characterization. The Gabor descriptor has proven its effectiveness in many recent works on face recognition [2], [14], [38], [71]. Jarraya et al. [28] demonstrated that the LDA was better than the PCA for the selection of Gabor features. The effectiveness of the multi-class SVM for horse face feature classification was proven in [72]. Moreover, the multi-class SVM is effective even in recognizing human faces [70]. Thus, it would be interesting to use these techniques and propose an identification system using the Gabor filters for feature extraction, the LDA for feature selection and the multi-class SVM for classification.

IV. THDD DATABASE
To the best of our knowledge, there is no public horse face image pattern database that can be used for the evaluation of detection algorithms. Therefore, the THDD was prepared for the horse face detection and recognition system. THDD is an extension of the THoDBRL'2015 database.
The THDD-part1 (first part of THDD) is a multi-view horse face database. The digital images of this dataset were taken at a distance of about 1 meter from the horses when they were in the barn. The videos were captured with a 10.1-megapixel digital still camera at a resolution of 640 * 480 pixels. The horse face data were captured from 3 views: frontal view, right profile view and left profile view of the horse. This dataset contains 470 frontal face images for 47 Barbaro, Arabian and hybrid horses (Table 1), 470 left profile images and 470 right profile images. In fact, following the same database construction process as most related works [4], [6], [52], [53], 10 frontal face images, 10 left profile images and 10 right profile images were taken for each horse. This dataset comprises three sets: the first one contains the captured videos, the second includes the selected images and the third has the cropped images.
The THDD-part2 (second part of THDD) includes a set of frontal horse images. The digital images were taken at distances ranging from 1 to 2 meters from the horses. The capture was performed in video mode with two digital cameras. The first has 10.1 megapixels and a resolution of 640 * 480 pixels, whereas the second has 15 megapixels and a resolution of 1280 * 720 pixels. The collected dataset consists of 1103 horse images and 6000 cropped face images for 60 Barbaro, Arabian and hybrid horses (Table 1). It is divided into two sets: the first one, intended for horse face classification, includes 6000 positive images and 7937 negative images, while the second, intended for horse face detection, includes 1103 images. The first set was used as a training set and the second one was employed as a test set in this paper. Most of these animal images had a near-frontal view. Fig. 7 shows some sample images from the training set of the database. Fig. 8 shows some sample images from the test set of the database.

A. DATA COLLECTION
1) HORSES IN MOTION
A horse cannot be instructed to hold still, and it is impossible to fix its head and keep it stable. It may change its place and position at any time. These changes of place alter the distance of 1 meter (for THDD-part1) and 1-2 meters (for THDD-part2) between the camera and the animal. In order to fix the head of the animal as much as possible and obtain adequate images, the horse videos were captured when the horses were in barns. The movements of the horse were reduced but did not disappear. Thus, the distance between the horse face and the camera changed slightly in the two databases.

2) NATURAL CONDITIONS
Videos were taken in four equestrian centers in Sfax (a city on the east coast of Tunisia) in daylight. The camera was hand-held and positioned in front of the photographer's eyes at a distance of about 10 cm from his face. In order to guarantee a maximum fixation of the animal head and to obtain adequate images, the videos were taken when the horses were in barns, in natural conditions and without any pressure.
The lighting on the horse face changed according to the position of the animal head and of the sun. In addition, there were shadows such as those of the walls or the leaves (Fig. 9). The background also varied in each of the captured videos (Fig. 10).

B. DATA PROCESSING PROCEDURE
As the horse cannot be fixed or kept stable, a video of about 50 seconds was captured for each horse in order to obtain adequate poses. Based on human observation, the best photos were selected with different poses, backgrounds and luminance. The selected images were neither 100% frontal nor 100% profile: the view could be inclined to the right or to the left. The facial images of our database had almost the same size and resolution; the difference was not very large.
In THDD-part1, each face area was manually cropped from the selected image for the three views (Fig. 11).

V. EXPERIMENTAL STUDY OF SRCBS
A. IMPLEMENTATION DETAILS
1) DenseNet
The DenseNet121 network was trained for 200 epochs with an input size of 224 × 224.

3) SPARSE HIDDEN LAYER OF SNN
A small fraction of features equal to 10% of the feature dimensions was selected as proposed in [18]. This fraction is enough to obtain favorable prediction results [18]. Accordingly, 90% of the entries of the weight vector W of the hidden layer were equal to zero.

4) INTERSECTION OVER UNION (IoU)
Following the Pascal challenge [35], a detected bounding box was considered as a true positive detection only when the Intersection over Union (IoU) ratio was equal to or larger than 50%. For a more accurate evaluation, the metrics proposed by the COCO challenge were also used: 10 different IoU thresholds were considered, from 0.5 to 0.95 in steps of 0.05, and the average precision was calculated over these 10 IoU thresholds.
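As a small illustration of this criterion, the following Python sketch computes the IoU of two axis-aligned boxes and checks it against the ten thresholds. The (x1, y1, x2, y2) box format and the function names are assumptions made for the example.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # The intersection is empty when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Ten thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

def true_positive_flags(predicted, ground_truth):
    """For each threshold, tell whether the detection is a true positive."""
    value = iou(predicted, ground_truth)
    return {t: value >= t for t in IOU_THRESHOLDS}

flags = true_positive_flags((0, 0, 10, 10), (0, 0, 10, 6))
```

A detection that passes the 0.5 threshold can still fail the stricter ones, which is why averaging over all ten thresholds rewards better localized boxes.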

5) BLOCK GENERATION FOR DETECTION PROCESS
A very popular search strategy was proposed in [47] to detect face instances in an image: a sequential scan of all possible regions in the image is done by a sliding window. The highly accurate real-time human frontal face detector presented by [46] and [47] used the sliding window strategy. This technique is still used for its effectiveness in many recent detection works [33], [49], [78]. In this paper, this detection strategy was chosen owing to its interesting performance.
The digital images of the database were taken at distances ranging from 1 to 2 meters from the horses. Hence, the camera was not too far from the animal and, consequently, the horse face region was rather large in the image: the horse face size varied between 80 * 155 and 360 * 640. Taking this characteristic into account, the first stage consisted in scanning the entire image with a window resized from 80 * 155 to 360 * 640 (in steps of 8 pixels in width and 20 pixels in height) with a stride of 10 pixels. To check the presence of a horse face in each window, the output value of the SNN was monitored. Each window classified as a face was kept in order to collect all predicted windows. The predicted windows were then filtered by applying the Non-Maximum Suppression (NMS) algorithm.
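The scan-then-filter procedure can be illustrated with a short Python sketch. It is a generic sliding-window generator followed by a greedy NMS, not the exact implementation of the paper. The list of window sizes, the 0.5 overlap threshold used for suppression and the function names are assumptions.

```python
def sliding_windows(img_w, img_h, sizes, stride=10):
    """Yield (x1, y1, x2, y2) windows for each (width, height) in sizes."""
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, x + w, y + h)

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop the boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return kept  # indices of the surviving boxes, best score first

# Smallest window size on a toy 100 * 160 image.
windows = list(sliding_windows(100, 160, [(80, 155)], stride=10))
# Two heavily overlapping detections and one isolated detection.
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])
```

In a detector of this kind, the score passed to `nms` would be the classifier output for each window that was kept as a face.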

7) MULTI-CLASS SVM
The linear kernel of the multi-class SVM was used.

B. EVALUATION METRICS
The Receiver Operating Characteristic (ROC), Cumulative Match Characteristic (CMC) and precision-recall curves, as well as the classification, recognition and verification rates, were recorded using different metrics such as accuracy, precision, average precision, recall, sensitivity, specificity, negative predictive value and F1 score. The accuracy (ACC) is calculated by dividing the total number of correct predictions ($T_p + T_n$) by the total number of images in a dataset ($N$):
$$ACC = \frac{T_p + T_n}{N}$$
The recognition rate (RR) is the same as the accuracy (ACC) with $T_n$ equal to zero and $T_p$ being the number of recognized images. The Verification Rate (VR) is calculated at a given False Acceptance Rate (FAR). The False Acceptance Rate is a measure used to determine the accuracy level of a biometric security system. It is calculated by dividing the number of false acceptances by the number of identification attempts ($FA$ is the number of False Acceptances and $TA$ is the Total number of Attempts):
$$FAR = \frac{FA}{TA}$$
Specificity is also called true negative rate (TNR or SPC). It is calculated by dividing the number of correct negative predictions by the total number of negatives, where $F_p$ denotes the number of false positives:
$$SPC = \frac{T_n}{T_n + F_p}$$
The negative predictive value (NPV) refers to the proportion of negative results in statistical tests which are true negative results. A strong result can be interpreted using this statistic. With $F_n$ denoting the number of false negatives, the NPV is defined as follows:
$$NPV = \frac{T_n}{T_n + F_n}$$
The F1-score (F1) can be useful, but it is less frequently used than the other basic measures. It is the harmonic mean of precision and recall:
$$F1 = \frac{2 \cdot precision \cdot recall}{precision + recall}$$
The Intersection over Union (IoU) ratio is computed as the ratio between the intersection and the union of the predicted bounding box $B_p$ and the ground-truth bounding box $B_{gt}$:
$$IoU = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})}$$
Following the Pascal VOC challenge [35], every true positive detection has an IoU ratio equal to or larger than 50%. The precision-recall curve shows the relation between the precision and the recall calculated for different detection thresholds.
Consequently, the area under the precision-recall curve represents the average precision (AP) of the detector.
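For illustration, the count-based metrics above can be computed with the following Python sketch. The function names and the toy counts are hypothetical, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Count-based metrics from the numbers of true/false positives/negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }

def false_acceptance_rate(false_acceptances, total_attempts):
    """FAR: share of identification attempts that were wrongly accepted."""
    return false_acceptances / total_attempts

# Toy counts chosen only to exercise the formulas.
m = classification_metrics(tp=8, tn=9, fp=1, fn=2)
far = false_acceptance_rate(1, 100)
```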
A Receiver Operating Characteristic curve (ROC curve) illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. A ROC curve plots the relationship between the true positive rate (TPR, or $T_p$ rate) of detection and the false positive rate (FPR, or $F_p$ rate) of error at various threshold settings. The ROC curve is a graph with the FPR on the x-axis and the TPR on the y-axis.
Cumulative Matching Characteristics (CMC) curves are a popular assessment measure for identification methods. Assuming there is only one gallery instance for each identity, for each query, the algorithm ranks all gallery samples based on their distances to the query, from small to large. The CMC top-k precision is:
$$Acc_k = \begin{cases} 1 & \text{if the top-}k\text{ ranked gallery samples contain the query identity} \\ 0 & \text{otherwise} \end{cases}$$
which is a shifted step function. The CMC curve is calculated by averaging the shifted step functions over all queries.
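The averaging of shifted step functions can be made concrete with a short Python sketch. It assumes a precomputed query-by-gallery distance matrix and exactly one gallery sample per query identity; the function and argument names are hypothetical.

```python
def cmc_curve(distances, query_ids, gallery_ids, max_rank):
    """Return [top-1, top-2, ..., top-max_rank] CMC precisions.

    distances[q][g] is the distance between query q and gallery sample g.
    Each query identity is assumed to appear exactly once in the gallery.
    """
    hits = [0] * max_rank
    for q, row in enumerate(distances):
        # Rank gallery samples from the smallest to the largest distance.
        ranked = sorted(range(len(row)), key=lambda g: row[g])
        # 1-based rank of the gallery sample sharing the query identity.
        rank = 1 + next(i for i, g in enumerate(ranked)
                        if gallery_ids[g] == query_ids[q])
        # Shifted step function: the query counts as a hit for every k >= rank.
        for k in range(rank, max_rank + 1):
            hits[k - 1] += 1
    return [h / len(distances) for h in hits]

# Query "a" is matched at rank 1 and query "b" at rank 3.
curve = cmc_curve(
    distances=[[0.1, 0.5, 0.9], [0.2, 0.6, 0.4]],
    query_ids=["a", "b"],
    gallery_ids=["a", "b", "c"],
    max_rank=3,
)
```

The first value of the returned list is the rank-1 recognition rate, and the curve is non-decreasing by construction.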

1) WRIR-GB SYSTEM
The proposed Siamese network consists of two shared streams with a 224 × 224 input size for the GEI image. SGD was selected as the optimizer to train the DenseNet network while ADAM was used for the bottleneck layers. The models were trained on an NVIDIA GeForce GPU with 18 GB of memory. The CASIA-B dataset [13] is a popular public gait dataset. It was established at the Institute of Automation of the Chinese Academy of Sciences (CASIA) in 2005. It contains 93 males and 31 females, for a total of 124 subjects. For each subject there are 10 states, including 6 states of normal walking (NM), 2 states of walking with a bag (BG) and 2 states of walking with a coat (CL). In addition, for each state, every subject has 11 video sequences recorded from 11 view angles (0°, 18°, ..., 180°).
In order to evaluate the proposed system, our experimental framework was applied on the same partitions used in related works. The first 74 subjects were placed in the training set and the other 50 subjects were placed in the test set. In the test set, the gallery set consisted of the first 4 normal walking sequences, and the probe set consisted of the last 2 normal walking sequences. Table 2 shows comparative results with different research studies. Our approach was competitive with the other approaches and achieved high performance. The proposed WRIR-GB has proven its performance in the experimental part for the three views needed in the equestrian club. In fact, the results were very encouraging, with a recognition rate equal to 95.0% in the 0° view, 100% in the 90° view and 98.90% in the 180° view. The proposed architecture not only obtained good results but also had a small number of parameters, equal to 1 million in training. This small number of parameters allowed for quick execution, which makes the system usable for online recognition in a smart riding club.

2) HFD-SF SYSTEM
In the literature, the detection procedure usually includes three steps: block generation (multi-scale sliding windows or region proposals), face classification (in the backbone of the detector) and post-processing (non-maximum suppression and bounding box regression). In fact, the performance of face detectors is mainly influenced by the face classification network, also known as the backbone. Duan et al. [32] found that, in general object detection, the detector and the classifier have comparable performances when using the same backbone. This explains why a backbone designed for a classification dataset can easily be applied to general object detection and obtain an excellent mAP (mean Average Precision) score. Therefore, it is necessary to evaluate the proposed SNN and the HFD-SF system for horse face classification and detection on the THDD-part2.

a: CLASSIFICATION EVALUATION
For a fair evaluation, the proposed SNN was compared with other CNNs such as MobileNetV2 [39] and GoogLeNet [16], which have the smallest numbers of parameters, as well as ResNet-50 [60], which has been widely used for human face detection [13], [68]. In fact, MobileNetV2 contains 3.4 million parameters, GoogLeNet contains 5 million parameters and ResNet-50 contains 25 million parameters. CNNs need a huge amount of training data to perform at their best. During the training of ResNet-50, GoogLeNet and MobileNetV2, three types of data augmentation were used: reflection, X and Y axis translation, and random scaling were applied to the images of the THDD-part2 training set. The training was done with a mini-batch size equal to 10. Transfer learning of the three CNNs (pre-trained on ImageNet) was carried out for classification on the training set of the THDD-part2. The last Fully Connected (FC) layer in these CNNs was replaced with another FC layer having two outputs (face/non-face). The classification was made on the test set of the THDD-part2, which contains 1124 horse faces and 10000 negative images randomly selected from the background.
The comparison includes the classification accuracy and the number of parameters of each network. Fig. 13 shows the accuracy of the classifications. The accuracy of the SNN was clearly higher than that of GoogLeNet and ResNet-50, while the accuracy of MobileNetV2 and that of the SNN are almost the same (Table 3). The SNN presents competitive results with a small number of parameters, equal to 76,928.

b: DETECTION EVALUATION
Since human and horse faces share similar structures (two eyes, nose and mouth), starting from the existing human frontal face detection approaches was considered. Unfortunately, applying these approaches directly to horses met some obvious difficulties. In fact, horse faces have large appearance variations and intricate textures compared to the human face. Due to the small amount of data in our database, CNN-based human face detection methods could not be used, as they require a large number of images for effective training. Because there are no related works to compare the results with, our approach was compared with the Viola-Jones detector that was used for cat face detection by the authors in [43]. Our detector was found to achieve a higher performance. The Average Precision of the two detectors (HFD-SF and Viola-Jones) was 90% and 70% respectively. The detection results were compared (shown in Table 4) using two kinds of features, gradient features and intensity features (using LBP features). The average precision and recall of the two descriptors (gradient, intensity) using the proposed detection system were (90%, 90.39%) and (71.40%, 83.27%) respectively. Fig. 16 shows precision-recall curves on the THDD-part2. This figure represents the performance of the proposed detector HFD-SF on the THDD-part2 and reports that the proposed method had the largest critical region. The Haar (Fig. 14) and weighted LBP features (Fig. 16) gave the poorest performance because of the large variations in texture and shape of the horse head. With the help of the proposed SNN, the performance was improved (Fig. 15). Using the evaluation metrics of the COCO challenge, Fig. 17 displays precision-recall curves calculated at 10 different IoU thresholds. Using the proposed SNN, the average precision varied between 84% and 90% for the first four IoU thresholds.
To reduce the search time for the horse faces in the image, the CRF-RNN was applied for region selection in order to select the horse bodies. The face search was restricted to the selected area only. The experiment showed that the search with Sliding Window (SW) gave better results (Fig. 16).
The proposed SNN proved its performance in the previous experiments for classification and detection. Fig. 13 presents the classification accuracy on the THDD-part2. The SNN clearly gave competitive results compared to the other networks. In fact, the SNN exceeded the accuracy of the other CNNs by about 1.25% on the THDD-part2 (Table 3). The same was observed for the detection process: HFD-SF based on the SNN gave encouraging results, with an Average Precision equal to 90%.
The use of a sparse feature selection method as a learning algorithm enhanced the information transmitted to the output layer. In fact, the proposed sparse hidden layer and training methodology contributed to a proper distinction between true and false detections (face/non-face). The SNN extracted the relevant features using the sparse hidden layer and then classified the candidate block using the output layer. Unlike the other networks, the SNN kept as much information as possible while minimizing the number of operations and parameters. Consequently, the sparse hidden layer positively influenced decisions and brought detection closer to reality.
Since the photos in the databases used were taken close to the animals, the faces of the animals are not very small and the system easily detects and recognizes them. Conversely, the proposed system cannot detect very small faces when the animal in the photo is very far away. This is due to the poor resolution of the facial area as well as the lack of important details and information. In addition, the performance of the sparse feature selection method decreases as the amount of useful information is reduced. The more information there is, the more accurately the SNN performs the classification.

3) HIR-FB SYSTEM
Among the THDD-part1 images, only the frontal face images were of interest, and seven face images were chosen per horse (329 images in total) for the training step of the facial recognition system. To better verify the performance and the efficiency of the HFD-SF system in SRCBS and achieve an end-to-end recognition system, an investigation of horse identity recognition was integrated, based on the results of our detection algorithm. Using the proposed HFD-SF system, faces were detected on 3 images from the test set of THDD-part2 for each horse (47 horses of the THDD-part1). These images were used for evaluation of the HIR-FB system. In fact, three face images per horse (141 images in total) were used as test images.
In typical identification applications, the camera is fixed and the background is static. It is then easy to build a background model and identify objects of interest by detecting changes in the background. In our system, the case was different: the capture was done in natural conditions with varied backgrounds. For these reasons, and because of the variety of luminance in our database, the facial area of the horse was cropped with MATLAB from the original images of the database and resized (to the same image size, 160 * 380) as shown in Fig. 18. Table 5 shows that the proposed recognition system gave the highest performance in two processes, verification and identification, with interesting results compared to those of the related works. The ROC curves in Fig. 19 and the CMC curves in Fig. 20 show that the HIR-FB had the largest critical region compared to the others. HIR-FB recorded an 8.5% and a 6.62% improvement in the identification and verification rates respectively. It is worth mentioning that suitable horse face recognition systems had already been proposed in [28], [29], and [76]. However, their previously published results were not as prominent as the present results.
SphereFace [69] is based on a convolutional neural network of 64 layers including convolutional, fully connected and softmax layers with an angular softmax loss (A-Softmax) that enables the CNN to learn angularly discriminative features. ArcFace (Additive Angular Margin Loss) was proposed by [30] to get highly discriminative features for human face recognition. ArcFace is incorporated within CNN architectures such as ResNet-50 and ResNet-100.
FIGURE 20. CMC curves of HIR-FB system and the approaches of [28], [72] on THDD-part1.
RiseNet proposed by [76] is an animal face recognition framework.
The above-listed systems are based on networks pre-trained on ImageNet [31]. Transfer learning on the THDD-part1 gave the results shown in Table 6. This table shows the superiority of the proposed approach based on Gabor features, LDA and SVM. The proposed HIR-FB proved its performance in the experimental part for the classification and verification processes. In fact, the results were very encouraging, with a recognition rate equal to 99.89%. The proposed system improved the recognition rate by 22.44% in comparison with the standard systems for face recognition [30], [44], [69], [76]. According to these results, the facial pattern can be considered as a good biometric marker for horse identification.
The multi-class SVM was used to separate the data into n classes by optimal hyperplanes. It can use different kernels to transform the features into a new, more helpful representation that facilitates the optimization of the margin by reducing feature complexity. In this work, the simplest suitable kernel was chosen to easily separate the feature classes. The use of the multi-class SVM based on a linear kernel is justified by the linearity of the data obtained with the Linear Discriminant Analysis (LDA), which represents the features in a new space based on the projection onto eigenvectors. In addition, the Gabor features and LDA proved their efficiency for horse face recognition.

VI. CONCLUSION
In this paper, a new biometric system, SRCBS, was introduced for smart riding club security. This system is based on three main sub-systems to detect and recognize both humans and horses.
In order to obtain an efficient SRCBS system, with supplementary services for humans, our paper has proposed a new method for human gait recognition based on a Siamese network. Using the gait modality, the CRF-RNN for human body detection and the proposed Siamese network, the recognition rates were equal to 95% in the 0° view, 100% in the 90° view and 98.90% in the 180° view on the Casia-B database. These results were better than the recognition rates of the related works.
The proposed HFD-SF system for horse face detection is based on the proposed SNN network. The SNN gave competitive classification results compared to the other networks. In fact, the SNN exceeded the accuracy of the other CNNs by about 1.25%. This detection system achieved encouraging results. The experiments on the THDD-part2 showed that our system is efficient, reaching a useful detection rate equal to 90%. The use of a sparse feature selection method as a learning algorithm enhanced the information transmitted to the output layer. In fact, the proposed sparse hidden layer and training methodology contributed to a proper distinction between true and false detections (face/non-face). Unlike the other networks, the SNN kept as much information as possible using a smaller number of parameters, equal to 76,928.
The proposed HIR-FB for horse face recognition proved its performance with a recognition rate equal to 99.89% on the THDD-part1. In fact, HIR-FB enhanced the recognition rate by 22.44% in comparison with the standard systems for horse face recognition. Thus, according to the obtained results, the facial pattern can be considered as a good biometric marker for horse identification.
The THDD was prepared for the horse face detection and recognition system. To the best of our knowledge, there is no other public horse face image pattern database that can be used for the evaluation of detection algorithms. Therefore, the THDD could contribute to the animal biometrics research community. In fact, this is the only public horse face image dataset that is available for research on horse face detection and recognition.
Our future perspectives can be summarized as follows:
• Developing the SRCBS system for horse and human detection and identification in real time.
• Expanding the proposed HFD-SF system in two directions. First, it is important to improve its performance and effectiveness by designing more discriminant features. Second, extending the HFD-SF system to other animals is among our plans for the future.
[80] Z. Li
TAHA BEYROUTHY received the Ph.D. degree in micro and nano electronics from the Grenoble Institute of Technology, in 2009, and the degree in engineering education from IMT-Atlantique (Télécom-Bretagne). He is currently an Associate Professor in electrical engineering. He has authored/coauthored more than 100 peer-reviewed publications in micro and nano electronics, robotics, artificial intelligence, and applied physics. He joined the American University of the Middle East in Kuwait (AUM), in November 2013, as an Assistant Professor and was promoted to an Associate Professor, in 2017. He has been the Dean of Engineering and Technology at AUM, since September 2017. He has been instrumental in AUM growth of higher education through his broad experience in academic leadership and commitment to both a student-centered education and a technologically empowered teaching and learning environment.