Adaptive Multiple Layer Retinex-enabled Color Face Enhancement for Deep Learning-based Recognition

Face images captured under uncontrolled illumination conditions pose one of the most significant challenges for real-world human face recognition systems. To overcome this problem, we propose a novel method called adaptive multiple-layer retinex-based color face enhancement (AMRF). Firstly, we use an associative filter to decompose a color face image into illumination and reflectance components at multiple layers. The illumination components in each layer are then adjusted automatically by multiplying them with corresponding illumination compensation coefficients calculated from a reference Gaussian template. The enhanced color face image is finally obtained by composing the compensated illumination components with the integrated reflectance component based on the retinex theory. Experiments were performed on four popular color face datasets: LFW, IJB-C, CMU Multi-PIE, and CMU-PIE. Our proposed method makes face images clearer, more natural, and smoother. The experimental results show that AMRF's image quality assessment scores are significantly better than those of the original images and of images produced by other enhancement methods. Furthermore, AMRF considerably improves the recognition accuracy of deep learning-based face recognition models such as FaceNet and ArcFace. Finally, our proposed method also saves computational time compared with the other techniques.


I. INTRODUCTION
In recent years, face recognition has become a popular research topic in computer vision, pattern recognition, and machine learning due to its essential role in real-world applications. The achievements of recent research on face recognition are significant. However, many challenges remain that limit the performance of human face recognition systems [1]. These challenges are caused mainly by significant variations in facial expression, viewpoint, illumination conditions, noise, etc. Ideally, a face recognition system should recognize a face as independently as possible of these variations [2]. Among the mentioned factors, illumination variation is acknowledged as the most complex and difficult challenge in face recognition. Fig. 1 shows an example of one human face under different illumination conditions. As seen, the quality of the observed face image is severely affected by the illumination variation. It is nearly impossible even for the human visual system to judge the identity of face images captured under harsh illumination conditions. Unfortunately, this problem is unavoidable in real-world face recognition applications, such as public security, social multimedia, or intelligent systems. Therefore, illumination compensation of the face image is a practical approach to eliminate lighting variations before face recognition [3]. In the past decades, numerous methods [4]-[19] have been proposed to overcome the problem of illumination variation; they can be grouped into three main categories: methods using a transform in the spatial domain [4]-[7], methods using a transform to the frequency domain [8]-[11], and methods based on deep learning [17]-[19]. Here, we briefly overview the corresponding methods of these three categories.
The first category involves methods operating in the spatial domain to overcome the illumination variation problem. Most methods in this category try to estimate the spatial low-frequency component, which contains the illumination information of the face image. Liu et al. [4] proposed a method based on Local Histogram Specification (LHS) to pre-process face images under different illumination conditions, applying a high-pass filter to remove the low-frequency illumination signals and enhance the features of the face image. Shan et al. [5] use Gamma Intensity Correction (GIC) to normalize the overall image intensity; histogram equalization (HE) and Quotient Illumination Relighting (QIR) are then applied to synthesize face images under a pre-defined normal lighting condition. Shim et al. [6] proposed a face relighting method that estimates the pose, reflectance, and lighting from a set of images of a face; they train a pose- and pixel-dependent subspace model of the reflectance function using a face dataset containing samples of pose and illumination variations. In a different approach, Yang et al. [7] use a physical model, the retinex decomposition, to extract the invariant component, considered the reflectance of the face image, by using a bilateral filter. Although the results of these methods show that they can partly resolve the illumination variation problem, shortcomings remain. Firstly, the improvement in recognition accuracy obtained with these methods is relatively small. In addition, the resulting face images are not natural or smooth, because a large amount of information in the face image is removed, especially the color information, which many studies [8]-[11] have demonstrated to be one of the most important factors for face recognition.
To overcome the shortcomings of the methods in the first category, numerous methods operating in the frequency domain have been proposed. Goh et al. [12] used the wavelet transform to eliminate the variant illumination existing in the low-frequency subband and enhance the face image using the local binary pattern histogram (LBP); a face image with invariant illumination is then obtained by adopting a wavelet fusion technique. Chen et al. [13] proposed a method using the discrete cosine transform (DCT) to enhance the illumination in the logarithm domain. Since the low-frequency band mainly contains variations of illumination, an appropriate number of DCT coefficients are eliminated to minimize the illumination variations of face images. On the other hand, Wang et al. [14] proposed a novel method named adaptive singular value decomposition in the two-dimensional discrete Fourier domain (ASVDF). They adopt the SVD and Fourier transform to eliminate the influence of environmental lighting on face images captured under variant illumination conditions. Accordingly, the method first transforms a color face image into the two-dimensional discrete Fourier domain to obtain the magnitudes of the three color channels; the illumination of the face image is then compensated by multiplying the singular value matrix of the magnitudes with their compensation weight coefficients. Similarly, Wang et al. [15] proposed adaptive SVD in the 2D discrete wavelet domain (ASVDW) to enhance a color face image; this method is implemented by computing the distributions of the brightness pixels in the three color channels of the color face image and the correlations among its wavelet subbands.
Although these methods can produce clear and natural face images and moderately improve the recognition rate of face recognition systems, problems remain in both the efficiency of the face image enhancement itself and in computational time consumption.
In recent years, deep learning-based enhancement methods have gradually been proposed in the computer vision domain. In particular, the generative adversarial network (GAN) [16] is a powerful model that has achieved many successes in solving image-to-image problems. Therefore, most of the methods in the third category use a GAN to resolve the illumination variation problem. Han et al. [17] proposed the asymmetric joint GAN (AJGAN) to normalize face images under different illumination conditions. Zhang et al. [18] proposed the IL-GAN framework, which adopts a variational auto-encoder and a conditional GAN (cGAN) to produce illumination variation descriptors for face recognition. Hu et al. [19] proposed the FIN-GAN framework, combining the retinex theory and a conditional GAN for face illumination normalization: a self-supervised learning model first learns the illumination variation between a pair of face images under different lighting conditions, and the conditional GAN then generates the illumination-normalized output. However, the deep learning-based methods share a common characteristic: the illumination condition is adjusted directly by image-to-image translation. These methods therefore require a set of face images for training the network model, which limits their performance and adaptability in practical applications.
To resolve the problems of the methods mentioned above, in this paper we propose a novel method named adaptive multiple layer retinex-based color face enhancement (AMRF). First, we adopt the retinex theory [20] to represent a face image as combinations of multiple illumination and reflectance components by using an associative filter in the spatial domain. Then, a Gaussian template corresponding to the given face image is generated as an adjustment reference for illumination compensation. Next, the illumination components in each layer are enhanced adaptively by multiplying them with corresponding compensation weight coefficients. The enhanced color face image is finally obtained by composing the enhanced illumination with the integrated reflectance components. Experiments performed on four of the most popular color face datasets, namely LFW [28], IJB-C [29], CMU Multi-PIE [30], and CMU-PIE [31], show that the proposed AMRF method can not only enhance color face images to become more natural, clearer, and smoother, but can also considerably improve the recognition accuracy and reduce the computational time of human face recognition systems.
The rest of this paper is organized as follows. Section II presents the multiple layer representation of the color face image by using the retinex model. Section III details our proposed method. Section IV discusses the experimental results, and Section V presents the conclusions.

II. COLOR FACE MULTIPLE LAYER REPRESENTATION
In this section, the representation of a color face image in multiple layers using the retinex model based on the associative filter is discussed in detail. Firstly, the associative filter-based retinex model used for color face decomposition is presented. Then, the method used to represent a color face image by combinations of multiple layers of illumination and reflectance components is introduced.

A. ASSOCIATIVE FILTER-BASED RETINEX MODEL FOR COLOR FACE IMAGE
The retinex theory was first proposed by Land and McCann [20], [21] as a model of lightness and color perception in human vision. In this theory, the light reaching the human eye is essentially the product of illumination and reflectance components, and these two components are inseparably combined in the observed signal: the eye cannot determine the illumination unless the reflectance is uniform, and it cannot determine the reflectance unless the lighting is uniform. In the digital image processing domain, the relation between illumination and reflectance is mathematically modeled as follows:

$$I_c(x, y) = R_c(x, y) \,\langle\bullet\rangle\, L_c(x, y), \qquad (1)$$

where $I_c(x, y)$ indicates an image, $R_c(x, y)$ is the reflectance component, $L_c(x, y)$ is the illumination component of the given image, and $c \in \{R, G, B\}$ is the color channel index. The operator $\langle\bullet\rangle$ denotes the element-wise multiplication of two matrices. The estimation of the illumination and reflectance components is based on separating the spatial-frequency subbands of the given image. In particular, the illumination component resides in the low-frequency subband, and the reflectance component resides in the high-frequency subband of the image. The illumination therefore represents the varying lightness of the object, whereas the reflectance shows the natural properties of the captured object, which are considered consistent under different lighting conditions. Many studies have proposed algorithms for separating illumination and reflectance components based on this idea. Most of them use a low-pass filter, such as a Gaussian [22], bilateral [23], or bright-pass filter [24], to estimate the illumination component; the reflectance component is then estimated as the ratio of the original image over the estimated illumination.
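As a concrete illustration of Eq. (1), the sketch below estimates the illumination with a simple low-pass filter and recovers the reflectance as the ratio of the image over the estimated illumination. It is a minimal sketch only: the box filter stands in for the Gaussian, bilateral, or bright-pass filters cited above, and the window size `k` and the stabilizing constant `eps` are illustrative assumptions.

```python
import numpy as np

def box_blur(channel, k=15):
    """Simple stand-in low-pass filter: mean over a k x k window (edge-padded)."""
    pad = k // 2
    padded = np.pad(channel, pad, mode="edge")
    out = np.zeros_like(channel, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + channel.shape[0], dx:dx + channel.shape[1]]
    return out / (k * k)

def retinex_decompose(channel, eps=1e-6):
    """Split one color channel into illumination (low-pass estimate) and
    reflectance (ratio of image over illumination), per Eq. (1): I = R * L."""
    illumination = box_blur(channel.astype(np.float64))
    reflectance = channel / (illumination + eps)
    return illumination, reflectance
```

Multiplying the two components back together recovers the original channel up to the tiny stabilizing constant, which is the defining property of the decomposition.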
In general, the filter used in the retinex model to estimate the spatial low-frequency component of an image plays an important role: it is the primary key to the purity of the estimated components. Several typical low-pass filters, such as the Gaussian, bilateral, and bright-pass filters, were used to estimate the illumination in previous studies. However, these filters cannot effectively remove fine details because of their global filtering scheme. This paper utilizes a low-pass filter called the associative filter [25] to represent a color face image through combinations of multiple layers of illumination and reflectance components. The associative filter removes the fine details of an image better than the other typical low-pass filters used in [22]-[24] thanks to its capability to calculate a weighted average of the local maximum, where each weight is negatively related to the difference between the input pixel and its local maximum. In addition, the complexity of the associative filter is much lower than that of the other filters. As a result, the associative filter produces a pure and uniform low-frequency component.
The illumination component $L_c(x, y)$ of color channel $c$ in Eq. (1) can be obtained by applying the associative filter defined in [25] to a color channel $I_c(x, y)$ of the given image as follows:

$$L_c(x, y) = \frac{1}{W(x, y)} \sum_{(s, t) \in \Omega(x, y)} \exp\!\left(-\frac{\big(M_c(s, t) - I_c(x, y)\big)^2}{2\sigma^2}\right) M_c(s, t), \qquad (2)$$

where the local patch $\Omega(x, y)$ and $\sigma$ are set empirically.
$I_c(x, y)$ is the given color channel image, and $M_c(x, y)$ is the coarse image, defined as the local maximum of $I_c(x, y)$ as follows:

$$M_c(x, y) = \max_{(s, t) \in \Omega'(x, y)} I_c(s, t), \qquad (3)$$

where $\Omega'(x, y)$ is a sliding window centered at $(x, y)$. $W(x, y)$ denotes the normalization factor that ensures the pixel weights sum to 1.0, determined as follows:

$$W(x, y) = \sum_{(s, t) \in \Omega(x, y)} \exp\!\left(-\frac{\big(M_c(s, t) - I_c(x, y)\big)^2}{2\sigma^2}\right). \qquad (4)$$

Using the associative filter, the high-frequency spatial component, which contains the details of the color face image, is effectively removed, while the low-frequency component, considered as the illumination of the given color face image, is preserved. Therefore, the reflectance component corresponding to the estimated illumination can be determined as the ratio of the given image over its estimated illumination:

$$R_c(x, y) = \frac{I_c(x, y)}{L_c(x, y)}. \qquad (5)$$

An illustration of the retinex decomposition based on the associative filter on color face images is shown in Fig. 2. As seen, the shape and details of the face image in Fig. 2(a) are considerably removed in the illumination component depicted in Fig. 2(b). In contrast, this relevant information is mainly contained in the reflectance component, as shown in Fig. 2(c). Moreover, although these face images were taken under different illumination conditions, the reflectance, which is the invariant component, can be effectively separated from the illumination component. Therefore, the color face image can be decomposed effectively into illumination and reflectance components by using the retinex model based on the associative filter.
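The associative filtering step can be sketched as below. Note that the weighting in [25] is only characterized qualitatively here (negatively related to the pixel difference from the local maximum), so the Gaussian fall-off, the patch and window sizes, and `sigma` are assumptions rather than the published parameterization.

```python
import numpy as np

def local_max(channel, win=5):
    """Coarse image M: local maximum of the input over a sliding window."""
    pad = win // 2
    p = np.pad(channel, pad, mode="edge")
    h, w = channel.shape
    out = np.full((h, w), -np.inf)
    for dy in range(win):
        for dx in range(win):
            out = np.maximum(out, p[dy:dy + h, dx:dx + w])
    return out

def associative_filter(channel, patch=3, win=5, sigma=30.0):
    """Weighted average of the coarse image over a local patch; each coarse
    pixel's weight falls off with its difference from the input pixel, and
    the accumulated weights W normalize the result so weights sum to 1."""
    ch = channel.astype(np.float64)
    m = local_max(ch, win)
    pad = patch // 2
    mp = np.pad(m, pad, mode="edge")
    h, w = ch.shape
    num = np.zeros((h, w))
    den = np.zeros((h, w))  # normalization factor W(x, y)
    for dy in range(patch):
        for dx in range(patch):
            shifted = mp[dy:dy + h, dx:dx + w]
            wgt = np.exp(-((shifted - ch) ** 2) / (2.0 * sigma ** 2))
            num += wgt * shifted
            den += wgt
    return num / den
```

On a constant image the filter is the identity, and on any image its output stays within the input's dynamic range, as expected of a normalized weighted average.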

B. MULTIPLE LAYER REPRESENTATION OF COLOR FACE IMAGE
Since high-frequency spatial information, such as shape and details, may still exist across the whole range of spatial frequency bands, decomposing a face image at only the first level using the retinex model based on a low-pass filter might not be enough to separate the illumination and reflectance components effectively, as shown in Fig. 2. Therefore, multiple layer decomposition is proposed to resolve this problem by representing the information of a color face image in multiple spatial frequency subbands. In this way, the information of the face image can be represented in more detail, and the additional information can be further used to enhance the quality of the face image.
To implement the multiple layer decomposition, the estimated illumination at the $n$th level of color channel $c \in \{R, G, B\}$, denoted as $L_c^n(x, y)$, is further decomposed into a low-frequency component $L_c^{n+1}(x, y)$ and a high-frequency component $R_c^{n+1}(x, y)$. The illumination component $L_c^{n+1}(x, y)$ is estimated by filtering the illumination component $L_c^n(x, y)$ at the previous level with the associative filter as expressed in Eq. (2). Therefore, the decomposition is implemented to satisfy

$$L_c^n(x, y) = L_c^{n+1}(x, y) \,\langle\bullet\rangle\, R_c^{n+1}(x, y), \qquad (6)$$

where $L_c^1(x, y)$ is obtained by filtering the color channel $I_c(x, y)$ with the associative filter in Eq. (2). The reflectance component $R_c^{n+1}(x, y)$ is computed by

$$R_c^{n+1}(x, y) = \frac{L_c^n(x, y)}{L_c^{n+1}(x, y)}. \qquad (7)$$

Consequently, an image $I_c(x, y)$ is composed of multiple spatial frequency bands as follows:

$$I_c(x, y) = L_c^N(x, y) \,\langle\bullet\rangle\, \prod_{n=1}^{N} R_c^n(x, y), \qquad (8)$$

where $N$ is the number of layers and $R_c^1(x, y) = I_c(x, y) / L_c^1(x, y)$. Since each reflectance component is calculated as the ratio of the previous illumination over the present one, the dynamic range, or lightness range, of the low-frequency component becomes smaller after each decomposition operation; that is, the variation of $L_c^{n+1}(x, y)$ is less than that of $L_c^n(x, y)$. Once the number of layers is large enough, the variation of the illumination becomes tiny and its lightness is completely uniform. Fig. 3 illustrates the multiple layer decomposition process of the retinex model based on the associative filter, which obtains the illumination and reflectance components in different spatial frequency subbands of a color face image. First, the illumination component estimated at the first level is used as the input of the following decomposition. In the second-level decomposition, the first-level illumination is filtered by the associative filter once more to obtain the second-level illumination and reflectance components, and so on. In this way, the decomposition process is iterated $N$ times until an utterly uniform illumination component, denoted as $L_c^N(x, y)$, is obtained.
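The iterative decomposition described above reduces to repeatedly low-pass filtering the current illumination and storing the ratio as that layer's reflectance; any low-pass filter can be plugged in for `filt`. A sketch, with a box blur as a stand-in for the associative filter and an assumed small `eps` guarding the division:

```python
import numpy as np

def box_blur(x, k=5):
    """Stand-in low-pass filter; the paper uses the associative filter here."""
    pad = k // 2
    p = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def multilayer_decompose(channel, filt=box_blur, n_layers=10, eps=1e-6):
    """Iteratively split a channel into n_layers reflectance layers plus one
    final near-uniform illumination: I = L_N * R_1 * R_2 * ... * R_N."""
    illum = channel.astype(np.float64)
    reflectances = []
    for _ in range(n_layers):
        next_illum = filt(illum)                         # L_{n+1}: low-pass of L_n
        reflectances.append(illum / (next_illum + eps))  # R_{n+1} = L_n / L_{n+1}
        illum = next_illum
    return illum, reflectances
```

Because the ratios telescope, multiplying the final illumination by the product of all reflectance layers reconstructs the original channel, and each filtering pass shrinks the variation of the remaining illumination.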
An illustration of the multiple layer decomposition of a color face image using the retinex model based on the associative filter is shown in Fig. 4. As seen, the illumination component at the 1st level (n = 1), shown in the 1st column of Fig. 4(b), still mainly contains facial structures, which are reflectance information. Similarly, considerable reflectance information remains in the 2nd, 3rd, 4th, and 5th illumination layers. However, each filtering operation removes more detailed information, making the illumination component smoother. As a result, the variation of the illumination component gradually decreases as n increases. When the number of decomposition operations exceeds 10, the detailed information existing in the illumination components is mostly eliminated, since the corresponding reflectance components shown in Fig. 4(c) no longer contain much detailed information of the color face image. In other words, the estimated illumination component becomes purer and more uniform as the details and shape of the color face image are removed. Therefore, the color face image can be decomposed into multiple layers of illumination and reflectance components by the associative filter-based retinex model. Based on the multi-level estimated illumination components, we propose a method to adaptively enhance the illumination of the color face image at different frequency bands by multiplying the estimated illumination components by corresponding adaptive compensation weight coefficients. The detailed algorithm is discussed in the next section.

III. MULTIPLE RETINEX-BASED COLOR FACE IMAGE ENHANCEMENT METHODOLOGY
Illumination is a vital factor in face recognition. To overcome the illumination variation problem in face recognition, we propose a novel method called adaptive multiple layer retinex-based color face enhancement (AMRF). This method involves three primary steps, as illustrated in Fig. 5. In particular, we first adopt the retinex model using the associative filter to represent a color face image through combinations of multiple layers of illumination and reflectance components. Then, the estimated illumination components are enhanced with compensation weight coefficients generated using a corresponding Gaussian template as the adjustment reference. Finally, the enhanced color face image is produced by composing the enhanced illumination with the integrated reflectance component. The advantage of this approach is that the face image remains natural, since it still retains both illumination and reflectance components. The distribution of pixel intensity values of an image can be described statistically by its probability density function (PDF). A normalized intensity image that does not suffer from any illumination problem can be considered a matrix whose PDF is a Gaussian distribution with a mean of 0.5 and a standard deviation of 1.0 [26]. Therefore, face illumination enhancement can be obtained by adopting a matrix with a Gaussian PDF as the reference template. This Gaussian template, denoted as $G$, is designed to have the same size as the given face image, with PDF

$$p(v) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(v - \mu)^2}{2\sigma^2}\right), \qquad (9)$$

where $\mu$ is the mean and $\sigma$ denotes the standard deviation. Because the dynamic range of the pixel intensity values of an image is 0.0 to 255.0, the values of $\mu$ and $\sigma$ of the Gaussian template are linearly scaled to 128.0 and 32.0, respectively.
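The reference template can be realized in a few lines. The paper does not specify how the template pixels are arranged spatially, so drawing i.i.d. samples from N(128, 32) and clipping to the valid dynamic range is one minimal assumption:

```python
import numpy as np

def gaussian_template(height, width, mu=128.0, sigma=32.0, seed=0):
    """Reference template: an image whose intensity histogram follows a
    Gaussian PDF with mean mu and standard deviation sigma, clipped to the
    valid dynamic range [0, 255]. The i.i.d. arrangement is an assumption."""
    rng = np.random.default_rng(seed)
    t = rng.normal(mu, sigma, size=(height, width))
    return np.clip(t, 0.0, 255.0)
```

Since 255 sits about four standard deviations above the mean, the clipping affects almost no pixels, and the sample statistics stay close to the target mean of 128.0 and standard deviation of 32.0.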
As discussed in section II.B, the illumination information exists across the whole range of spatial frequency bands. Therefore, to enhance the color face image, we utilize a Gaussian template as an adjustment reference to compensate for the illumination in all spatial frequency subbands of the face image. The objective is to obtain an enhanced color face image with a near normal distribution. A cropped color face image $F$ of size $M \times N$ and its corresponding Gaussian template $G$ are first split into RGB color channels to obtain $F_c$ and $G_c$, $c \in \{R, G, B\}$, respectively. Then, $F_c$ and $G_c$ are simultaneously decomposed into illumination and reflectance components by using the associative filter-based retinex model. Therefore, $F_c$ and $G_c$ can be expressed at the first layer ($n = 1$) as follows:

$$F_c(x, y) = L_{c,1}(x, y) \,\langle\bullet\rangle\, R_{c,1}(x, y) \qquad (10)$$

and

$$G_c(x, y) = L^{G}_{c,1}(x, y) \,\langle\bullet\rangle\, R^{G}_{c,1}(x, y), \qquad (11)$$

where $L_{c,1}(x, y)$ and $L^{G}_{c,1}(x, y)$ are the illumination components of color channel $c$ of the given color face image and its corresponding Gaussian template, respectively, and $R_{c,1}(x, y)$ and $R^{G}_{c,1}(x, y)$ denote the reflectance components corresponding to $L_{c,1}(x, y)$ and $L^{G}_{c,1}(x, y)$, respectively. The illumination component is then compensated to obtain the enhanced illumination, denoted as $L'_{c,1}(x, y)$, before being decomposed into the illumination and reflectance components at the 2nd layer, and so on.
To enhance the illumination of the face image, the illumination at the $n$th layer is compensated by simultaneously multiplying the three illumination components of the RGB color channels by adaptive compensation weight coefficients. The compensation operation for the $n$th layer illumination is briefly described in Algorithm 1. Accordingly, the mean pre-compensation value of each illumination component is first calculated. Then, using the highest mean value as the reference, the individual compensation weight coefficients for the color channels are obtained adaptively from their ratios to the reference value. The mean values of each color channel illumination component of the color face image and of the corresponding illumination component of the Gaussian template are calculated as follows:

$$\mu_{c,n} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} L_{c,n}(x, y), \quad n = 1, 2, \ldots, N, \qquad (12)$$

and

$$\mu^{G}_{c,n} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} L^{G}_{c,n}(x, y), \quad n = 1, 2, \ldots, N. \qquad (13)$$

Then, the highest mean of the three illumination components of the color face image is determined as follows:

$$\mu^{\max}_{n} = \max\{\mu_{R,n}, \mu_{G,n}, \mu_{B,n}\}. \qquad (14)$$

Algorithm 1 then returns the three coefficients $\alpha_{R,n}$, $\alpha_{G,n}$, and $\alpha_{B,n}$.
Our purpose is to adjust the illumination components automatically to obtain an enhanced color face image with a near normal distribution. Therefore, if the face image is dark, the illumination is compensated with a scalar larger than 1.0; in contrast, if the face image is too bright, the illumination is reduced gradually by multiplying the illumination components by a scalar smaller than 1.0. To do this, each channel's ratio to the reference is first computed as

$$\rho_{c,n} = \frac{\mu_{c,n}}{\mu^{\max}_{n}}, \qquad (15)$$

and the adaptive illumination compensation weight coefficient for each color channel at the $n$th layer is then obtained as

$$\alpha_{c,n} = \frac{\mu^{G}_{c,n}}{\rho_{c,n}\,\mu^{\max}_{n}}. \qquad (16)$$

The enhanced illumination component at the $n$th layer is obtained by multiplying by the illumination compensation weight coefficient as follows:

$$L'_{c,n}(x, y) = \alpha_{c,n} \times L_{c,n}(x, y). \qquad (17)$$
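A sketch of the per-layer weight computation: channel means, the highest mean as reference, and weights derived from the ratios to that reference (which reduces to template mean over channel mean). This reconstruction is an assumption about the exact weight formula, which may differ in detail from the published one:

```python
import numpy as np

def compensation_weights(illum_rgb, template_illum_rgb):
    """Per-layer adaptive weights: > 1 for a dark face, < 1 for a bright one."""
    mu = np.array([c.mean() for c in illum_rgb])             # per-channel means
    mu_g = np.array([c.mean() for c in template_illum_rgb])  # template means
    mu_max = mu.max()                                        # reference value
    rho = mu / mu_max                                        # ratios to reference
    return mu_g / (rho * mu_max)                             # weights alpha_c

def compensate_layer(illum_rgb, template_illum_rgb):
    """Multiply each channel's illumination by its compensation weight."""
    alphas = compensation_weights(illum_rgb, template_illum_rgb)
    return [a * c for a, c in zip(alphas, illum_rgb)]
```

Under this reconstruction, a dark face is brightened toward the template mean, a bright face is darkened, and channel-dependent color casts are pulled toward the neutral template.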
In this way, the illumination is enhanced adaptively in each layer of the decomposition. In addition, the illumination of the face image becomes more uniform since its variation gradually decreases. Finally, the uniform, enhanced illumination is composed with the integrated reflectance component to produce the enhanced color face image, denoted as $\hat{F}_c$, as follows:

$$\hat{F}_c(x, y) = L'_{c,N}(x, y) \,\langle\bullet\rangle\, \hat{R}_c(x, y), \qquad (18)$$

where $L'_{c,N}(x, y)$ is the enhanced uniform illumination component and $\hat{R}_c(x, y)$ denotes the integrated reflectance component, determined as follows:

$$\hat{R}_c(x, y) = \prod_{n=1}^{N} R_{c,n}(x, y). \qquad (19)$$

The enhanced color face image, denoted as $\hat{F}(x, y)$, is finally obtained by merging the three color channel images $\hat{F}_c(x, y)$. The equations above indicate that we adaptively adjust the compensation weight coefficients in multiple illumination layers to enhance the overall illumination of a color face image. The most critical step of the proposed method is computing the compensation weight coefficients in Eq. (16). A color face image without any illumination problem can be considered an image with a Gaussian distribution whose mean value is 128.0 and whose standard deviation is 32.0. By multiplying the illumination layers by these coefficients, the color channel correlations are effectively manipulated to recover the natural skin tone of the color face image, which is heavily affected by non-uniform lighting conditions. As seen, the color face images in Figs. 6(a) and (c) are highly affected by the environmental lighting conditions. The illumination of several color face images is relatively low, which suppresses details in the color face image; in addition, the facial skin color is distorted by the lighting color of the environment. By using our proposed method, the illumination of the color face image is enhanced effectively, and the enhanced images have a distribution closer to normal than the original images. The computation of $\alpha_{c,n}$ enables an adaptive, sliding adjustment of the compensation weighting for the illumination components.
In summary, in this study we propose the AMRF method for facial illumination enhancement as a pre-processing step to improve the recognition performance of human face recognition systems. After enhancement, the face image becomes more natural, clearer, and smoother. Our proposed method can therefore be applied automatically to any face database to improve the quality of face images for deep learning-based face recognition, as discussed in the next section.
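The steps above can be condensed into a single-channel, end-to-end sketch. This is a simplified, hypothetical rendering: the box blur stands in for the associative filter, the per-layer weight is taken as the ratio of template mean to illumination mean, and cross-channel referencing is omitted for brevity.

```python
import numpy as np

def box_blur(x, k=5):
    """Stand-in low-pass filter; the paper uses the associative filter."""
    pad = k // 2
    p = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def amrf_channel(channel, template, n_layers=10, eps=1e-6):
    """Decompose image and template in lockstep, compensate the illumination
    at every layer toward the template mean, accumulate the reflectance
    layers, and recompose the enhanced channel."""
    illum = channel.astype(np.float64)
    t_illum = template.astype(np.float64)
    integrated_reflectance = np.ones_like(illum)
    for _ in range(n_layers):
        alpha = t_illum.mean() / (illum.mean() + eps)  # per-layer weight (sketch)
        illum = alpha * illum                          # compensate this layer
        nxt = box_blur(illum)                          # next, smoother illumination
        integrated_reflectance *= illum / (nxt + eps)  # accumulate reflectance
        illum, t_illum = nxt, box_blur(t_illum)
    return np.clip(illum * integrated_reflectance, 0.0, 255.0)
```

Applied to a dark channel with a template centered on 128, the first layer's weight dominates and lifts the overall brightness, while later weights settle near 1.0, mirroring the convergence behavior discussed in section IV.A.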

IV. EXPERIMENTAL RESULTS AND DISCUSSION
In this section, we present experiments that demonstrate the performance of our proposed method in color face image enhancement. Firstly, we adopted image quality assessment (IQA) measures, namely PIQUE [32], BRISQUE [33], and NIQE [34], to evaluate the image quality enhancement capability, especially the recovery of facial naturalness. Then, to illustrate the effectiveness of the proposed method in improving face recognition performance, we conducted experiments on four of the most popular color face databases with large illumination variations, namely the LFW [28], IJB-C [29], CMU Multi-PIE [30], and CMU-PIE [31] datasets. We adopted the state-of-the-art deep learning-based face recognition models FaceNet [36] and ArcFace [37] to evaluate our proposed method's performance comprehensively. The performance of the proposed method is further compared with other effective previous methods, namely ASVDF [14], ASVDW [15], FIN-GAN [19], and GRIR [35], to demonstrate the high performance of our proposed method for face recognition.

A. THE ASSOCIATIVE FILTER PARAMETERS
The effectiveness of the proposed method relies first on the performance of the illumination estimation task of the retinex decomposition and then on the computation of the compensation coefficients. In this subsection, we discuss the configuration of the associative filter to select appropriate parameters for face enhancement in our proposed method. As discussed in section II.A, the key ideas of the associative filter are the weighted average of the coarse (local-maximum) image and the difference between the pixels of the coarse image and the input image. Therefore, we conducted experiments to choose the sizes of the local patch Ω and the sliding window Ω′. Three different types of associative filters, denoted as _53, _97, and _157, are discussed in our experiments. The configuration of each filter is shown in Table I.
The criteria for choosing a suitable filter for our proposed method are its performance in removing the fine details of a color face image and its computational time consumption. Therefore, we first evaluated each filter's computational time for one layer decomposition of a color face image of size 128×128 pixels. The results for the three types of filters are shown in Table II. As seen, the first filter, _53, consumes only 0.07 seconds, whereas the second and third filters, _97 and _157, take about three and six times as long as _53, respectively, to decompose one layer. These three types of filters were further evaluated in terms of face enhancement performance. We first collected 500 dark face images and 500 bright face images from the CMU Multi-PIE database, grouped into two categories: dark and bright. We then applied our proposed AMRF method with the different filters to enhance the color face images in each category. The averages of the compensation weight coefficients over the face images in multiple layers were used to evaluate the performance of the three types of filters. The experimental results are depicted in Fig. 7. As seen, the compensation weight coefficients of the proposed method with all three filters are generally stable for both dark and bright images. In particular, for dark images, the average compensation weight coefficient gradually decreases toward 1.0 as the layer index increases. In contrast, for bright images, the coefficients slowly increase toward 1.0 as the number of layers grows. Therefore, the coefficients of both dark and bright images converge to 1.0, and the illumination reaches a stable state. As seen in Figs. 7(a), (b), and (c), the steady state of the coefficients for all three filters is reached when the number of layers in our proposed method exceeds 10.
In other words, the illumination of facial images is fully enhanced after ten layers. In addition, from the 5th to the 9th layer in Figs. 7(b) and (c), the average compensation weight coefficients for filters _97 and _157 are not stable: the coefficients for dark images are less than 1.0, while those for bright images are larger than 1.0. In contrast, the coefficients for _53 are more stable and smoother in this period and converge once the layer index exceeds 10. Therefore, with the _53 configuration, the color face image is enhanced more stably and faster than with the other filters. After comprehensively comparing the performance and computational time consumption of the three filters, _53 outperforms the two remaining ones. Indeed, _53 has the smallest local patch and sliding window, so the computational time for estimating the illumination is the lowest, and its face image enhancement performance is also the best. Thus, in this study, we adopted _53 for our proposed method to reduce the computational time while maintaining high performance in enhancing color face images. Furthermore, as discussed above, the number of layers in our proposed method is fixed at ten. In addition, the computational time can be further decreased by down-sampling the estimated illumination after each layer decomposition [27]. Therefore, we use a sampling scale ratio of 1.25 for each layer decomposition: a larger ratio degrades the image quality, whereas a smaller one increases the computational time. Besides, we adopt linear interpolation to save computational time while sufficiently retaining the quality of the face images.
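The per-layer down-sampling with linear interpolation can be sketched as below; the default `ratio` matches the 1.25 scale used here, while the bilinear scheme itself is a standard reconstruction and not necessarily the authors' exact implementation.

```python
import numpy as np

def downsample(x, ratio=1.25):
    """Shrink a 2-D image by `ratio` using bilinear interpolation, as used to
    cut the per-layer cost of the illumination decomposition."""
    h, w = x.shape
    nh, nw = max(1, int(h / ratio)), max(1, int(w / ratio))
    ys = np.linspace(0, h - 1, nh)        # target sample positions (rows)
    xs = np.linspace(0, w - 1, nw)        # target sample positions (cols)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```

For a 128×128 input and the 1.25 ratio, each decomposition layer then operates on a 102×102 image, so the cost of successive layers shrinks geometrically.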

B. COLOR FACE IMAGE QUALITY ASSESSMENT
Besides its facial illumination enhancement capability, the proposed method also improves the naturalness and smoothness of the enhanced images. To objectively demonstrate the performance of the proposed method in enhancing the naturalness of color face images, we assessed the quality of the resulting enhanced images with three state-of-the-art image quality assessment (IQA) metrics, namely the perception-based image quality evaluator (PIQUE) [32], the blind/referenceless image spatial quality evaluator (BRISQUE) [33], and the naturalness image quality evaluator (NIQE) [34]. The assessment was carried out on 1,000 cropped 128×128-pixel color face images under various illumination conditions from the CMU-PIE, CMU Multi-PIE, LFW, and IJB-C datasets.
PIQUE [32] is a no-reference image quality metric used for blind image quality assessment through block-wise distortion estimation. The PIQUE value is determined as follows:

PIQUE = ( (Σ_{k=1}^{N_SA} D_sk) + C_1 ) / (N_SA + C_1)

where N_SA is the number of spatially active blocks, C_1 is a positive constant set empirically, and D_sk is the distortion assigned to block k based on its variance feature, a set threshold, and the block and segment standard deviations. A lower value of PIQUE indicates higher perceptual quality and vice versa.
Like the PIQUE measure, BRISQUE [33] is a blind/no-reference image quality assessment model that uses scene statistics of locally normalized luminance coefficients to quantify losses of naturalness in a given image. The BRISQUE measure is determined in three main steps. Firstly, the BRISQUE model extracts the natural scene statistics of a given image by computing locally normalized luminances via local mean subtraction and divisive normalization as follows:

Î(i, j) = (I(i, j) − μ(i, j)) / (σ(i, j) + C)

where C is a constant that prevents instabilities, and μ(i, j) and σ(i, j) are the local mean and local standard deviation fields, respectively. The normalized luminance image is then used to generate four pair-wise products, along the horizontal, vertical, left-diagonal, and right-diagonal orientations, to capture pixels' neighborhood relationships before feature vectors are calculated with a generalized Gaussian distribution (GGD) for predicting an image quality score.
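The first BRISQUE step above can be sketched with NumPy and SciPy as follows. The Gaussian window width and the constant C = 1 are illustrative choices for this sketch, not the exact values from [33].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(image, sigma=7 / 6, c=1.0):
    """Locally normalized luminance (MSCN coefficients): local mean
    subtraction and divisive normalization with Gaussian-weighted
    local statistics; `c` prevents instabilities near zero variance."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)
    sigma_local = np.sqrt(np.abs(gaussian_filter(image * image, sigma) - mu * mu))
    return (image - mu) / (sigma_local + c)

def pairwise_products(mscn):
    """The four orientation products used to capture neighborhood
    relationships between normalized pixels."""
    return {
        "horizontal": mscn[:, :-1] * mscn[:, 1:],
        "vertical": mscn[:-1, :] * mscn[1:, :],
        "left_diagonal": mscn[:-1, :-1] * mscn[1:, 1:],
        "right_diagonal": mscn[:-1, 1:] * mscn[1:, :-1],
    }

m = mscn_coefficients(np.full((16, 16), 5.0))  # flat image -> coefficients near 0
prods = pairwise_products(m)
```

A perfectly flat image has no local structure, so its MSCN coefficients are essentially zero, which is why natural-scene deviations from this baseline carry quality information.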
To comprehensively demonstrate our proposed method's facial naturalness enhancement performance, we further adopted the NIQE measure [34], along with the PIQUE and BRISQUE measures, to evaluate color face image quality. NIQE is a completely blind image quality evaluator based on the construction of a quality-aware collection of statistical features derived from a corpus of natural images. The quality of a given image is expressed as the distance between the quality-aware feature model and the multivariate Gaussian (MVG) model of the image as follows:

D(ν_1, ν_2, Σ_1, Σ_2) = sqrt( (ν_1 − ν_2)^T ((Σ_1 + Σ_2)/2)^{-1} (ν_1 − ν_2) )

where ν_1, ν_2 are the mean vectors, and Σ_1, Σ_2 are the covariance matrices of the natural MVG model and the given image's MVG model, respectively. A smaller NIQE score indicates better perceptual quality and vice versa.
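The NIQE distance above is straightforward to express with NumPy. This sketch covers only the distance computation, assuming the two MVG models (mean vectors and covariance matrices) have already been fitted; the demo values are toy numbers.

```python
import numpy as np

def niqe_distance(nu1, sigma1, nu2, sigma2):
    """Distance between two multivariate Gaussian models, as used by the
    NIQE score: the natural-image model (nu1, sigma1) versus the test
    image's model (nu2, sigma2), with the pooled covariance inverted."""
    diff = nu1 - nu2
    pooled = (sigma1 + sigma2) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))

# identical models are at distance zero; shifting the mean increases it
mu, cov = np.array([0.2, 0.4]), np.eye(2)
d_same = niqe_distance(mu, cov, mu, cov)
d_shift = niqe_distance(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), np.eye(2))
```

Using the pseudo-inverse keeps the computation stable when the pooled covariance is nearly singular, which can happen with small feature samples.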
Our evaluation experiments alternately used the mentioned IQA metrics to compute the average image quality scores of 1,000 original color face images and their enhanced versions produced by the ASVDW, ASVDF, GRIR, FIN-GAN, and proposed AMRF methods. It is noted that all color face images used for the quality evaluation are uncompressed bitmap (BMP) files, to preserve the highest image quality. The experimental results of the image quality assessment are shown in Table III. As can be seen, all enhancement methods used in the evaluation improve the quality of the color face images; however, the IQA scores of these methods vary considerably. In particular, while the GRIR method only slightly improves the quality of the color face images, other methods, such as ASVDW, ASVDF, and FIN-GAN, improve it considerably. Our proposed AMRF method performs even better in enhancing the quality of the color face images, since its IQA scores are significantly lower than those of the original and other methods' images.

C. RESULTS OF THE LFW DATASET
The Labeled Faces in the Wild (LFW) [28] is a public benchmark dataset for face recognition compiled by the Computer Vision Laboratory at the University of Massachusetts, Amherst. The LFW dataset contains 13,233 face images of 5,749 subjects collected from the web, each labeled with the subject's name. The face images in the LFW dataset were captured under unconstrained environmental conditions, so the dataset provides a large range of variations seen in daily life, such as lighting, expression, pose, age, gender, clothing, background, and camera quality. In this study, the LFW dataset is utilized to evaluate our proposed method's performance in complex scenarios.
The LFW dataset involves four different sets of images: the original images and three types of "aligned" images, namely the "funneled" images (ICCV 2007), LFW-a, and the "deep funneled" images (NIPS 2012). This study adopted the "deep funneled" (NIPS 2012) set to evaluate the proposed AMRF method's performance. Firstly, we extracted the face images in the NIPS 2012 set of the LFW dataset using a deep learning-based face detection model, namely the multi-task cascaded convolutional neural network (MTCNN) [38]. MTCNN is a framework developed for both face detection and face alignment; its process involves three stages of convolutional networks that detect faces and landmark locations. All extracted face images from the LFW dataset are then resized to 160×160-pixel images. These images exhibit a large range of variations in factors such as pose, background, camera quality, and expression, which are among the leading causes of degraded performance in human face recognition systems. Thus, to achieve higher performance in a face recognition system, the quality of the color face images needs to be enhanced by face image enhancement methods. Figs. 8(b), (c), and (d) show the ASVDW, ASVDF, and GRIR images, respectively, corresponding to the original images in Fig. 8(a). As noticed, the ASVDF images suffer from insufficient illumination enhancement, while the ASVDW images are over-enhanced and exhibit many over-lit areas on the faces. Although GRIR can slightly enhance the illumination, it cannot recover the facial skin color or the naturalness of the face images. In contrast, the proposed AMRF method can efficiently enhance the face images under various illumination conditions, as depicted in Fig. 8(e). The AMRF images achieve a balance in illumination enhancement that makes the face images more natural and smoother than the ASVDW, ASVDF, and GRIR images.
FaceNet [36] is a deep learning-based face recognition method that directly learns a mapping from face images to a compact Euclidean space, where the distances between face images measure their similarity. With FaceNet, face recognition, verification, and clustering can be implemented simply by using FaceNet embeddings as feature vectors representing the face images in Euclidean space. However, to achieve high accuracy, the FaceNet embeddings need to be optimized by training a deep convolutional network with a triplet loss function, which constrains the relative difference between the distances of matching pairs and non-matching pairs. In our study, we implemented two different scenarios to evaluate the performance of FaceNet on the LFW dataset.
In the first scenario, we use the pre-trained FaceNet model as a classifier to predict and generate feature vectors for face images; these feature vectors are then used as input to predict the identity of given face images. In our experiment, we adopted two pre-trained FaceNet models provided by David Sandberg [39]. Both pre-trained FaceNet models adopted the Inception ResNet V1 network [40] with softmax loss, trained on the CASIA-WebFace [41] and VGGFace2 [42] datasets, respectively. The pre-trained FaceNet models were trained on these datasets using cropped 160×160-pixel face images, with 1% of the training images used for validation. The models were trained for 275,000 steps with a batch size of 90, and the validation set was evaluated once every 5,000 steps. We use the pre-trained FaceNet models to extract a FaceNet embedding (a 128-dimensional feature vector) for each face image and compare it with the embedding of another face image using a similarity measure, namely the Euclidean or cosine distance. A face match is then determined by comparing the distance to a predefined threshold. We alternately evaluate the face recognition accuracy of the pre-trained FaceNet models with a batch size of 1 on the testing sets from the original, ASVDW, ASVDF, GRIR, and AMRF images. In addition, to enable the model to distinguish whether two images belong to the same person, we randomly generate a pair list in advance from the validating set containing two categories: pairwise combinations of images of the same person, labeled "1", and pairwise combinations of images of different people, labeled "0". In our study, we generated a number of pairs equal to the number of subjects in the LFW dataset for each pair category; thus, there were 5,749 pairs for the same person and 5,749 pairs for different people. The average recognition rates of the pre-trained FaceNet models on the LFW images are shown in Table IV.
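The pair-verification step described above (compare two embeddings under a cosine or Euclidean metric and threshold the distance) can be sketched as follows. The 0.6 threshold and the L2-normalization of the embeddings are illustrative assumptions for this sketch, not the exact settings of the pre-trained models.

```python
import numpy as np

def face_match(emb_a, emb_b, metric="cosine", threshold=0.6):
    """Decide whether two embeddings belong to the same person by
    thresholding their distance; returns (is_match, distance)."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    if metric == "cosine":
        distance = 1.0 - float(a @ b)      # cosine distance of unit vectors
    else:
        distance = float(np.linalg.norm(a - b))  # Euclidean distance
    return distance < threshold, distance

# identical embeddings are at distance ~0, so they match
same, d = face_match(np.ones(128), np.ones(128))
```

In practice the threshold is tuned on a labeled pair list such as the 5,749 matching / 5,749 non-matching pairs described above.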
As seen, the average recognition rates of the AMRF images are higher than those of the ASVDW, ASVDF, and GRIR images in all evaluations on the LFW dataset. In particular, the recognition rate of the FaceNet model pre-trained on CASIA-WebFace for the AMRF images is higher than for the original, ASVDW, ASVDF, and GRIR images by 0.81%, 0.46%, 0.52%, and 0.57%, respectively, when using the cosine distance metric, and by 0.55%, 0.25%, 0.21%, and 0.29% with the Euclidean distance metric. Using the FaceNet model pre-trained on VGGFace2, the average recognition rate of the AMRF images is greater than that of the original, ASVDW, ASVDF, and GRIR images by 1.44%, 0.27%, 0.42%, and 0.31% with the cosine distance metric and by 1.96%, 0.15%, 0.33%, and 0.36% with the Euclidean distance metric. In the second scenario, we train the FaceNet model from scratch, without pre-trained weights, on the LFW training set and evaluate the recognition accuracy with the trained model. In this study, we use only one image per person for training and one for validation, with the remaining images used for testing. For a person with only one image, we use that image in the training, validating, and testing sets. Therefore, the training and validating sets each contain 5,749 face images corresponding to 5,749 people, while the testing set involves the remaining 10,652 images. Since training the model with the triplet loss function is heavy and slow, while training the Inception ResNet V1 model as a classifier (without triplet loss) is significantly easier and faster, we train the FaceNet model on the LFW dataset with the softmax loss in this evaluation scenario. We train the model on the training set for 100,000 steps, corresponding to 100 epochs of size 1,000, and evaluate the accuracy on the validation set once every 5,000 steps. The batch size for each training step is 128 in our experiment.
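The per-person split described above (one image for training, one for validation, the rest for testing, with single-image subjects reused in all three sets) can be sketched as follows; the subject names and file names are toy values.

```python
import random

def split_per_person(images_by_person, seed=0):
    """One image per person for training, one for validation, the rest
    for testing; a person with a single image contributes that image to
    all three sets, as in the protocol above."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for person, images in images_by_person.items():
        if len(images) == 1:
            train.append((person, images[0]))
            val.append((person, images[0]))
            test.append((person, images[0]))
            continue
        shuffled = images[:]
        rng.shuffle(shuffled)
        train.append((person, shuffled[0]))
        val.append((person, shuffled[1]))
        test.extend((person, img) for img in shuffled[2:])
    return train, val, test

toy = {"subject_a": ["a1.jpg", "a2.jpg", "a3.jpg"], "subject_b": ["b1.jpg"]}
train, val, test = split_per_person(toy)
```

Shuffling before picking the train/validation images avoids always selecting the first capture of each subject.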
The learning rate is scheduled as 5×10⁻² for epochs 1-60, 5×10⁻³ for epochs 61-80, and 5×10⁻⁴ until the number of epochs reaches 100. Data augmentation is also applied during training to improve performance; accordingly, we apply only the "random flip" function for the data augmentation task. The RMSProp optimizer is adopted for training, and the size of the generated embedding is set to 512 dimensions. The trained FaceNet model is then used to evaluate the recognition accuracy on the testing set of the LFW dataset. As in the first evaluation scenario, we evaluate the recognition rate using two distance metrics: cosine and Euclidean distances. The average recognition rates of the FaceNet model trained on the LFW training set are shown in Table V. As noticed, although the model is trained for only 100,000 steps, the average recognition rate evaluated on the original images is relatively high, reaching 93.81% and 93.26% with the cosine and Euclidean distance metrics, respectively. However, the recognition results can still be significantly improved by using our proposed method to enhance the face images. In particular, the average recognition rates of the AMRF images are higher than those of the original, ASVDW, ASVDF, and GRIR images by 1.44%, 0.09%, 0.2%, and 0.24% for the cosine distance metric and by 0.96%, 0.39%, 0.41%, and 0.48% for the Euclidean distance metric. To further show the efficiency of the proposed AMRF method as a pre-processing step for human face recognition systems, we conduct additional experiments with another state-of-the-art deep learning-based face recognition model, named ArcFace, to evaluate the recognition accuracy on the LFW dataset. To enhance the discriminative power for large-scale face recognition, ArcFace adopts an additive angular margin loss to obtain discriminative features effectively.
Using ArcFace, the embedding features are optimized to enforce higher similarity among samples within a class and diversity among samples of different classes by distributing them around the class feature centers on a hypersphere. Accordingly, the angle between the current feature and the ground-truth class weight is calculated using the arccos function. Then, an additive angular margin penalty is added to the target angle to enhance intra-class compactness and inter-class discrepancy simultaneously. Next, the target logit is obtained with the cosine function and re-scaled by the feature scale. Finally, the subsequent steps are implemented exactly as in the softmax loss.
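The additive angular margin step described above can be sketched with NumPy, assuming L2-normalized embeddings and class weights so that their inner product is cos θ; the scale 64.0 and margin 0.5 here simply mirror the settings reported in the experiments, and the random inputs are toy data.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace-style logits: add the angular margin m to the angle of
    the ground-truth class only, then re-scale by the feature scale s."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(emb @ w, -1.0, 1.0)           # cos(theta) for every class
    theta = np.arccos(cos)
    margin = np.zeros_like(theta)
    margin[np.arange(len(labels)), labels] = m  # penalty on target class only
    return s * np.cos(theta + margin)

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 512))
w = rng.normal(size=(512, 10))
labels = np.array([0, 1, 2, 3])
with_margin = arcface_logits(emb, w, labels)
no_margin = arcface_logits(emb, w, labels, m=0.0)
```

Because the margin lowers only the ground-truth logit, the softmax that follows must work harder to rank the true class first, which is what tightens intra-class compactness.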
In our experiment, we used an ArcFace model implemented in TensorFlow [43] to evaluate the recognition accuracy on the LFW dataset. The LFW dataset is first separated into training and validating sets: we select 75% of the images of each person for the training set and use the remaining 25% for the validating set. For a person with only one image, we use that image in both sets. Since the input image size of ArcFace is 112×112 pixels, we resize the images in both sets to 112×112 pixels. Therefore, the training set contains 10,279 squared 112×112-pixel face images with labeled classes, and the validating set involves 7,023 squared face images used to evaluate the recognition accuracy of the model. In addition, similar to the FaceNet model, we randomly prepare a pair list containing pairwise combinations for the ArcFace model to evaluate the recognition accuracy. In our experiment, we generated 5,000 matching pairs and 5,000 non-matching pairs. For a person with n images in the validating set, we permute and combine them to generate n(n−1)/2 matching pairs. For each person, the matching pairs were chosen randomly from that person's images, and the non-matching pairs were randomly selected from a list of other people's images. In this study, we implement the evaluation using the ArcFace model on the LFW dataset in three different scenarios.
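The pair-generation protocol above can be sketched as follows; the subject names, image names, and pair counts are toy values, and the sampling details are illustrative assumptions rather than the exact procedure used for the 5,000-pair lists.

```python
import itertools
import random

def build_pair_list(images_by_person, n_pairs, seed=0):
    """Matching pairs (label 1): up to n(n-1)/2 combinations per person
    with n images; non-matching pairs (label 0): one image from each of
    two different people."""
    rng = random.Random(seed)
    matching = []
    for images in images_by_person.values():
        matching += [(a, b, 1) for a, b in itertools.combinations(images, 2)]
    people = list(images_by_person)
    non_matching = []
    while len(non_matching) < n_pairs:
        p1, p2 = rng.sample(people, 2)  # two distinct people
        non_matching.append((rng.choice(images_by_person[p1]),
                             rng.choice(images_by_person[p2]), 0))
    return rng.sample(matching, min(n_pairs, len(matching))) + non_matching

toy = {"p1": ["a", "b", "c"], "p2": ["d", "e"], "p3": ["f"]}
pairs = build_pair_list(toy, n_pairs=3)
```

Here "p1" contributes 3(3−1)/2 = 3 matching combinations and "p3", with a single image, contributes none, matching the n(n−1)/2 formula above.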
For the first scenario, we adopt an ArcFace model pre-trained for 100,600 steps on the MS-Celeb-1M dataset [44] to evaluate the entire LFW dataset. The backbone network of the pre-trained ArcFace model was ResNet50 V2 [45]. The feature embedding size is set to 512, while the logit margin and logit scale are set to 0.5 and 64.0, respectively. The results of the first evaluation scenario using the pre-trained ArcFace model are shown in Table VI as "ArcFace_1". As can be seen, the average recognition rate of the pre-trained ArcFace model on the LFW validating set for the AMRF images is higher than the results of the original, ASVDW, ASVDF, and GRIR images by 1.23%, 0.48%, 0.6%, and 0.81%, respectively. The second evaluation scenario trains the ArcFace model from scratch, without pre-trained weights, on the LFW dataset. We split the LFW dataset into training and validating sets in the same way as in the second scenario using FaceNet. In the training process, the number of training epochs is set to 100, the epoch size is set to 1,000 steps, and the batch size is 32. The backbone network in these experiments is ResNet50 V2, where the dimension of the final output feature embedding vector is 512. The logit scale and logit margin are set to 64.0 and 0.5, respectively. The learning rate is scheduled as 4×10⁻³ for epochs 1-40, 2×10⁻³ for epochs 41-60, 1.2×10⁻³ for epochs 61-80, and 4×10⁻⁴ until the number of epochs reaches 100. The results for the second evaluation scenario are shown in Table VI as "ArcFace_2". As seen, the result of the ArcFace model without pre-trained weights on the AMRF images is higher than on the original, ASVDW, ASVDF, and GRIR images by 0.71%, 0.26%, 0.45%, and 0.53%.
For the third evaluation scenario, we train the ArcFace model on the training set starting from weights pre-trained on the MS-Celeb-1M dataset for 100,600 steps and then evaluate the recognition accuracy on the validation set. The results for the third evaluation scenario are shown in Table VI as "ArcFace_3". As seen, the result evaluated on the AMRF images is greater than on the original, ASVDW, ASVDF, and GRIR images by 1.27%, 0.54%, 0.63%, and 0.82%, respectively.
In general, the FaceNet and ArcFace models can achieve relatively high performance in the face recognition rate for the LFW dataset. However, the experimental results demonstrated that the performance of these face recognition models can be significantly improved by using pre-processing methods to enhance the quality of face images. Furthermore, using our proposed AMRF method, we even achieve better results in improving the face recognition accuracy evaluated on the LFW dataset.

D. RESULTS OF THE IJB-C DATASET
IARPA Janus Benchmark C (IJB-C) [29] is a dataset of video still-frames and photos for face recognition benchmarking, compiled in cooperation with the Intelligence Advanced Research Projects Activity (IARPA) and the National Institute of Standards and Technology (NIST) in 2017. Compared to the LFW and other datasets in the public domain, the IJB-C dataset is more challenging and is intended to advance the state of the art in unconstrained face recognition. The IJB-C dataset contains 138,000 face images, 11,000 face videos, and 10,000 non-face images. In this paper, we evaluate the performance of our proposed AMRF method on two sub-datasets of IJB-C, namely "frames" and "img". In particular, the "frames" sub-dataset contains 118,966 face images, of which 118,483 are labeled face images from 3,530 subjects and 483 are unlabeled. The "img" sub-dataset involves 22,366 face images, of which 22,257 are labeled from 3,531 subjects and 109 are unlabeled. Our study uses only the 118,483 face images of 3,530 subjects in the "frames" sub-dataset and 22,249 face images of 3,531 subjects in the "img" sub-dataset, discarding the remaining unlabeled face images. Similar to the LFW dataset, we adopt the MTCNN model to extract face images from the photos in both the "frames" and "img" sub-datasets; the extracted face images are then resized to squared 160×160-pixel images. Fig. 9(a) shows eighteen of Hillary_Clinton's color face images extracted from photos taken under varying illumination conditions in the IJB-C dataset. As seen, the face images are strongly affected by the different lighting of the environment, producing an extensive range of illumination variations on the faces, which leads to inefficiencies in the performance of human face recognition systems. To improve recognition performance, the quality of the face images must be enhanced before recognition tasks are performed.
The GRIR method can enhance only the illumination of the color face images; the facial skin color is not recovered, and the images do not look natural, as displayed in Fig. 9(d). Adopting the ASVDW and ASVDF methods makes the face images clearer, and the effect of environmental lighting is largely alleviated, as shown in Figs. 9(b) and (c), respectively. However, the ASVDW images are over-enhanced, producing over-lit areas on the faces, while the illumination of the ASVDF face images is insufficiently enhanced. A balance is achieved when we use the proposed AMRF method to enhance the quality of the face images. As depicted in Fig. 9(e), the illumination of the face images is not only sufficiently enhanced, but the effect of the environmental lighting is also efficiently eliminated. Consequently, the face images are more natural and clearer, and the facial skin color tone is also recovered. Therefore, by using the proposed enhancement method, the face image quality can be improved to increase the recognition rate of a human face recognition system. To demonstrate the efficiency of the proposed method on the IJB-C dataset, we again use the FaceNet and ArcFace models with the same scenarios used for the LFW dataset, as discussed in sub-section IV.C. First, we use the FaceNet model to evaluate the recognition accuracy on the IJB-C dataset in two different scenarios: evaluating the average recognition rates with a pre-trained FaceNet model and training the FaceNet model from scratch. For the first scenario, we use the two FaceNet models pre-trained on CASIA-WebFace and VGGFace2, alternately, to classify the face images in the "frames" and "img" sub-datasets of the IJB-C dataset. The results of the evaluations on both sub-datasets are shown in Table VII.
As seen, the average recognition rates of the AMRF images in the "frames" sub-dataset using the FaceNet model pre-trained on CASIA-WebFace are greater than those of the original, ASVDW, ASVDF, and GRIR images by 0.6%, 0.46%, 0.53%, and 0.56% for the cosine distance metric, and by 0.51%, 0.35%, 0.33%, and 0.43% for the Euclidean distance metric. Using the FaceNet model pre-trained on VGGFace2, the recognition rates of the AMRF images in the "frames" sub-dataset are higher than those of the original, ASVDW, ASVDF, and GRIR images by 1.1%, 0.27%, 0.33%, and 0.45% for the cosine distance metric, and by 0.95%, 0.35%, 0.31%, and 0.5% for the Euclidean distance metric. The average recognition rates of the AMRF images in the "img" sub-dataset using the FaceNet model pre-trained on CASIA-WebFace are greater than those of the original, ASVDW, ASVDF, and GRIR images by 0.69%, 0.16%, 0.29%, and 0.36% for the cosine distance metric, and by 1.29%, 0.08%, 0.2%, and 0.21% for the Euclidean distance metric. Likewise, the recognition rate improvements of the AMRF images in the "img" sub-dataset over the original, ASVDW, ASVDF, and GRIR images when using the FaceNet model pre-trained on VGGFace2 are 0.52%, 0.02%, 0.04%, and 0.23% for the cosine distance metric, and 0.35%, 0.09%, 0.14%, and 0.22% for the Euclidean distance metric. In the second scenario, we train the FaceNet model from scratch, without pre-trained weights, on the training sets of the "frames" and "img" sub-datasets and then evaluate the recognition rates with this trained FaceNet model on the testing sets. In the experiment, the "frames" sub-dataset is split into training, validating, and testing sets: the training and validating sets each contain 3,530 squared 160×160-pixel face images corresponding to 3,530 subjects, and the testing set includes the remaining 111,870 samples.
With the same splitting protocol, the training and validating sets of the "img" sub-dataset each contain 3,531 face images corresponding to 3,531 subjects, while the testing set includes the remaining 15,954 face images. We use the same training configuration as for the LFW dataset in this experiment. However, for evaluating the testing sets of the "frames" and "img" sub-datasets, we create two different pair lists containing pairwise combinations of matching and non-matching people. As a result, we generate 3,530 matching pairs and 3,530 non-matching pairs, corresponding to the number of subjects in the "frames" sub-dataset. Similarly, the pair list of the "img" sub-dataset contains 3,531 matching pairs and 3,531 non-matching pairs. Although IJB-C is more challenging than LFW and other face recognition datasets, the results are significantly improved in terms of average recognition rate by using our proposed method. As shown in Table VIII, the average recognition rates of the FaceNet model for the AMRF images in the "frames" sub-dataset are higher than those of the original, ASVDW, ASVDF, and GRIR images by 0.41%, 0.08%, 0.17%, and 0.28% for the cosine distance metric, and by 1.09%, 0.34%, 0.69%, and 0.94% for the Euclidean distance metric. For the "img" sub-dataset, the average recognition rates for the AMRF images with the FaceNet model are also greater than those of the original, ASVDW, ASVDF, and GRIR images by 0.71%, 0.04%, 0.21%, and 0.4% for the cosine distance metric, and by 1.04%, 0.19%, 0.42%, and 0.57% for the Euclidean distance metric.
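Given a pair list like the ones above, the verification accuracy at a fixed threshold can be computed with a sketch such as the following; the labels, distance values, and threshold are toy numbers for illustration only.

```python
def verification_accuracy(labels, distances, threshold):
    """labels: 1 = same person, 0 = different person, one per pair;
    distances: the embedding distance for each pair. A pair is predicted
    'same' when its distance falls below the threshold."""
    correct = 0
    for label, distance in zip(labels, distances):
        predicted_same = distance < threshold
        correct += int(predicted_same == (label == 1))
    return correct / len(labels)

# two pairs classified correctly, two incorrectly -> accuracy 0.5
acc = verification_accuracy([1, 1, 0, 0], [0.2, 0.7, 0.9, 0.3], threshold=0.5)
```

Sweeping the threshold over the validation pair list is the usual way to pick the operating point before reporting accuracy on the test pairs.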
The performance of our proposed method is further demonstrated by using the ArcFace model to evaluate the recognition accuracy on the IJB-C dataset in the same three scenarios used for the LFW dataset, with the same configurations. Firstly, we split the "frames" and "img" sub-datasets into training and validating sets, using 75% of the images for training and the remaining 25% for validation. The face images in all sets are resized to 112×112 pixels to meet the input size requirement of the ArcFace model. Therefore, the training set of the "frames" sub-dataset contains 87,632 squared face images, while the validating set involves the remaining 31,021 face images. For the "img" sub-dataset, the training set contains 15,329 face images and the validating set contains 6,929 face images. In the first scenario, we use the ArcFace model pre-trained for 100,600 steps on the MS-Celeb-1M dataset to evaluate the recognition accuracy on the validating sets of both sub-datasets. The results of this evaluation scenario are shown in Table IX as "ArcFace_1". The average recognition rate of the ArcFace model for the AMRF images in the "frames" sub-dataset is better than that of the original, ASVDW, ASVDF, and GRIR images by 1.37%, 0.53%, 0.64%, and 0.75%. For the "img" sub-dataset, the result for the AMRF images is higher than for the original, ASVDW, ASVDF, and GRIR images by 1.39%, 0.28%, 0.4%, and 0.49%, respectively.
We implement the second and third evaluation scenarios on the IJB-C dataset by training the ArcFace model on the training sets of the "frames" and "img" sub-datasets without pre-trained weights and with pre-trained weights, respectively, for 100,000 steps. The best trained weights of the model are used to evaluate the recognition rates on the validating sets. The results of these evaluation scenarios are shown in Table IX as "ArcFace_2" for the ArcFace model trained without pre-trained weights and "ArcFace_3" for the ArcFace model trained with pre-trained weights. As seen, using our proposed method, the recognition rates for the "frames" sub-dataset are better than those obtained with the original images and with the ASVDW, ASVDF, and GRIR methods by 1.71%, 0.41%, 0.68%, and 0.87%, respectively, for the ArcFace model trained without pre-trained weights. For the images in the "img" sub-dataset, the average recognition rate with the AMRF method is higher than with the original, ASVDW, ASVDF, and GRIR images by 1.62%, 0.22%, 0.38%, and 0.59%, respectively. In the third scenario, for the "frames" sub-dataset, the result of the AMRF images is better than that of the original, ASVDW, ASVDF, and GRIR images by 1.54%, 0.1%, 0.21%, and 0.56%. For the "img" sub-dataset, this improvement is 1.76%, 0.19%, 0.12%, and 0.27%, respectively, compared to the original, ASVDW, ASVDF, and GRIR images. These results demonstrate that our proposed method significantly improves the performance of human face recognition systems, even outperforming the ASVDW, ASVDF, and GRIR methods.

E. RESULTS OF THE CMU MULTI-PIE DATABASE
The CMU Multi-PIE [30] is a face image database established by researchers at Carnegie Mellon University to address shortcomings of the previous well-known CMU PIE database [31], such as its limited number of subjects, single recording session, and few captured expressions. In contrast to the LFW and IJB-C datasets, the face images in the CMU Multi-PIE database were taken under constrained lighting, expression, and pose conditions. This database contains 755,370 images captured from 337 subjects under 15 viewpoints and 19 illumination conditions in four recording sessions. Given these challenges, the CMU Multi-PIE can be considered a standard benchmark face database in the face recognition domain. In this paper, we use session 4 of the CMU Multi-PIE, due to its large number of subjects and range of variations in capturing conditions, to demonstrate the performance of our proposed method in improving the recognition rate of human face recognition systems.
Session 4 of the CMU Multi-PIE database contains 215,100 photos of 239 subjects taken under 19 illumination conditions from 15 viewpoints with three facial expressions. In this study, we adopt the images taken by 13 cameras, discarding the photos taken by cameras "08_1" and "19_1", which are upside-down imaging cameras. Then, we extract the face images using the MTCNN model and resize the extracted face images to 160×160 pixels. Finally, the CMU Multi-PIE set used in our experiment includes 186,420 squared face images of 239 people taken from 13 viewpoints. Fig. 10 displays eighteen color face images from the CMU Multi-PIE dataset. As seen, these face images were taken under different illumination conditions; therefore, the illumination of the face images is heavily affected by the environmental lighting. Using the ASVDW and ASVDF methods, the quality of the face images can be improved, as shown in Figs. 10(b) and (c). The GRIR method can partly enhance the illumination of the face images, but the naturalness is not recovered, as depicted in Fig. 10(d).
Although the FIN-GAN color face images shown in Fig. 10(e) can resolve the challenge of illumination variations, the facial skin color is not recovered. In addition, since the FIN-GAN method requires a target image as the reference face image during the training process, the resulting face images are not natural or smooth, and the structure of the face is also partly affected due to the inconsistency between the poses of the given face images and their reference face images. On the other hand, the face images enhanced by our proposed AMRF method are even clearer and smoother than the images enhanced by the other methods, as depicted in Fig. 10(f). In addition, the skin color tone is recovered more effectively and naturally in the AMRF face images than with the other enhancement methods. The performance of the proposed method is demonstrated by using the FaceNet and ArcFace models with the same evaluation scenarios as in sub-sections IV.C and IV.D. The results of the first scenario, using the pre-trained FaceNet models to evaluate the recognition accuracy on the CMU Multi-PIE, are shown in Table X. As seen, the average recognition rate of the FaceNet model pre-trained on CASIA-WebFace for the AMRF images is higher than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 1.78%, 0.74%, 0.73%, 0.98%, and 0.24% using the cosine distance metric, and by 0.9%, 0.4%, 0.49%, 0.64%, and 0.16% using the Euclidean distance metric. Using the FaceNet model pre-trained on VGGFace2, the result for the AMRF images is greater than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 0.94%, 0.22%, 0.29%, 0.36%, and 0.13% with the cosine distance metric and by 1.07%, 0.16%, 0.21%, 0.42%, and 0.25% with the Euclidean distance metric. In the other scenario, the FaceNet model is trained on the training and validating sets for 100,000 steps with the same configuration used in sub-section IV.C.
We prepare the training set by selecting one face image per viewpoint for every expression type of each subject in the dataset. Therefore, 39 face images of each subject (13 viewpoints × 3 expressions) are used for training, and the training set contains 9,321 images in total. We also take the same number of images as the training set for the validating set. The testing set then contains the remaining 167,778 face images. The recognition accuracy evaluated on the testing set of the CMU Multi-PIE dataset is shown in Table XI. As seen, the average recognition rate of the FaceNet model for the AMRF images is higher than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 1.31%, 0.64%, 0.81%, 0.93%, and 0.34% with the cosine distance metric and by 0.99%, 0.39%, 0.62%, 0.69%, and 0.3% with the Euclidean distance metric. For the experiments with the ArcFace model, we establish the training set by selecting 75% of the images taken by each camera for every expression type of each subject. Therefore, the training set of the CMU Multi-PIE dataset in this experiment contains 139,815 face images, and the validating set contains the remaining 46,605 face images. The results of the first scenario are shown in Table XII as "ArcFace_1". As seen, the average recognition rate of the pre-trained ArcFace model for the AMRF images is higher than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 1.22%, 0.72%, 0.84%, 0.99%, and 0.12%, respectively.
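The split sizes quoted in this section can be verified with a short arithmetic check; assuming one image per viewpoint per expression (13 × 3 = 39 per subject) reproduces all of the quoted totals:

```python
# Consistency check of the CMU Multi-PIE split sizes (all counts from the text).
subjects, viewpoints, expressions = 239, 13, 3
total_images = 186_420

per_subject = viewpoints * expressions   # one image per viewpoint per expression
train = subjects * per_subject           # FaceNet training set
valid = train                            # validating set, same size as training
test = total_images - train - valid      # remaining images form the testing set
print(per_subject, train, test)          # 39 9321 167778

# ArcFace scenario: 75% of the images for training, the rest for validating.
arc_train = total_images * 75 // 100
arc_valid = total_images - arc_train
print(arc_train, arc_valid)              # 139815 46605
```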
In the other two scenarios, the ArcFace models are trained on the training set from scratch (without pre-trained weights) and from the weights pre-trained on the MS-Celeb-1M dataset, respectively. The results of these two evaluation scenarios are shown in Table XII as "ArcFace_2" and "ArcFace_3". In particular, the average recognition rate of the ArcFace model trained without pre-trained weights for the AMRF images is higher than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 0.64%, 0.24%, 0.31%, 0.46%, and 0.23%, respectively. In the third scenario, the improvement of the AMRF images over the original, ASVDW, ASVDF, GRIR, and FIN-GAN images reaches 0.94%, 0.47%, 0.4%, 0.67%, and 0.29%, respectively.

F. RESULTS OF THE CMU-PIE DATABASE
The CMU-PIE human face database [31] contains face images of 68 subjects. Fig. 11 displays ten color face images of one subject under various illumination conditions in the CMU-PIE dataset. The environmental lighting conditions degrade the face image quality, as shown in Fig. 11(a). The face image quality can be enhanced by the enhancement methods, as depicted in Figs. 11(b)-(f). As can be seen, the proposed AMRF method outperforms the other methods in terms of illumination enhancement and facial skin color recovery. The experiments demonstrating the performance of our proposed method in improving face recognition accuracy are conducted by using the FaceNet and ArcFace models with the same scenarios and configurations as used in sub-section IV.E. The results of the first scenario, using the pre-trained FaceNet models to evaluate the recognition accuracy on CMU-PIE, are shown in Table XIII. As seen, the average recognition rate of the FaceNet model pre-trained on the CASIA-WebFace dataset for the AMRF images is greater than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 0.86%, 0.18%, 0.2%, 0.59%, and 0.11% with the cosine distance metric and by 1.6%, 0.21%, 0.3%, 1.19%, and 0.15% with the Euclidean distance metric, respectively. When adopting the FaceNet model pre-trained on the VGGFace2 dataset, the recognition rate for the AMRF images is higher than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 0.73%, 0.25%, 0.36%, 0.46%, and 0.09% with the cosine distance metric and by 1.02%, 0.14%, 0.27%, 0.46%, and 0.1% with the Euclidean distance metric. In the second scenario, we train the FaceNet model without pre-trained weights on the CMU-PIE training set and then evaluate the recognition accuracy on the testing set with the trained model. In this experiment, for each subject in the CMU-PIE dataset, we use one image for training, one image for validating, and the remainder for testing.
Therefore, the training and validating sets each contain 68 face images corresponding to the 68 subjects, and the testing set involves 2,924 face images. The average recognition rates of the FaceNet model trained on the CMU-PIE training set are shown in Table XIV. As seen, the result of the proposed AMRF method is higher than that of the original images and the ASVDW, ASVDF, GRIR, and FIN-GAN methods by 2.98%, 0.16%, 0.56%, 1.72%, and 0.09% with the cosine distance metric and by 1.45%, 0.28%, 0.5%, 0.69%, and 0.19% with the Euclidean distance metric. Finally, we evaluate the performance of the proposed method by adopting the ArcFace model in three different evaluation scenarios. First, the CMU-PIE dataset is split into training and validating sets with 75% of the images for training and 25% for validating. Therefore, the training set contains 2,295 images and the validating set involves the remaining 765 images. The configurations for all three scenarios are the same as in the experiments on the CMU Multi-PIE dataset. The recognition rates of the ArcFace model pre-trained on the MS-Celeb-1M dataset are shown in Table XV as "ArcFace_1". Accordingly, the average recognition rate of the AMRF images is greater than that of the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 1.85%, 0.5%, 0.69%, 0.93%, and 0.05%, respectively. In the second scenario, we train the ArcFace model from scratch, and the recognition rates of this trained model are shown in Table XV as "ArcFace_2". As seen, the result for the AMRF images is higher than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 1.01%, 0.22%, 0.31%, 0.56%, and 0.17%, respectively. In the last evaluation, we train the ArcFace model starting from the weights pre-trained on the MS-Celeb-1M dataset. The result for the AMRF images is again better than for the original, ASVDW, ASVDF, GRIR, and FIN-GAN images by 1.43%, 0.46%, 0.72%, 1.12%, and 0.15%, respectively, as shown in Table XV as "ArcFace_3".

G. COMPUTATIONAL COMPLEXITY
In this sub-section, the efficiency of the proposed method is presented by measuring the CPU running time on different image sizes. The computational complexity of the AMRF method is O(N²) for an N×N face image, i.e., linear in the number of pixels, so a larger image requires a longer computation time. Since the face images used in face recognition are usually small, the proposed method can be incorporated into a human face recognition system to improve its performance. Compared with previously proposed methods, our method incurs lower computational complexity and less time consumption for face image enhancement. Table XVI summarizes the computation times of the ASVDW, ASVDF, GRIR, and our proposed methods for different face image sizes. The evaluations were conducted in Microsoft Visual C++ 2010 on a workstation with an Intel Core i7-9700 and 32 GB of RAM running Windows 10.
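A per-image timing measurement of this kind can be sketched as follows. The `enhance` function below is a hypothetical per-pixel placeholder standing in for the actual AMRF enhancement (whose implementation is not reproduced here); only the measurement harness is the point:

```python
import time

def enhance(img):
    # Placeholder standing in for the AMRF enhancement (hypothetical):
    # a per-pixel operation, so the cost grows with the pixel count,
    # i.e. O(N^2) for an N x N image.
    return [[min(int(p * 1.1), 255) for p in row] for row in img]

def mean_runtime_ms(size, repeats=5):
    img = [[128] * size for _ in range(size)]  # dummy grayscale face crop
    start = time.perf_counter()
    for _ in range(repeats):
        enhance(img)
    return (time.perf_counter() - start) / repeats * 1000.0

for size in (80, 160, 320):  # typical face-crop sizes
    print(f"{size}x{size}: {mean_runtime_ms(size):.3f} ms")
```

Averaging over several repeats with a monotonic high-resolution clock reduces timer noise, which matters at the small image sizes typical of face recognition.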

V. CONCLUSION
This paper presents a novel image enhancement method called AMRF for color face recognition under varying lighting conditions. The proposed method uses the associative filter-based Retinex model to effectively decompose color face images into multiple layers of illumination and reflectance. The face image is then enhanced by compensating the multiple illumination layers with adaptive compensation coefficients determined from a corresponding Gaussian template used as the reference. The enhanced face image is obtained by composing the compensated illumination components with the integrated reflectance component. Experiments were performed on four of the most popular color face datasets, namely LFW, IJB-C, CMU Multi-PIE, and CMU-PIE, to demonstrate the performance of our proposed method in improving the recognition accuracy of deep learning-based face recognition models, namely FaceNet and ArcFace. The results reveal that the AMRF method not only effectively enhances the quality of face images, making them more precise, more natural, and smoother, but also notably improves the recognition rate, especially on face datasets with significant illumination variations, while reducing the computational time of human face recognition systems.