Model Focus Improves Performance of Deep Learning-Based Synthetic Face Detectors

Deep learning-based models generalize better to unknown data samples after being guided "where to look" by incorporating human perception into training strategies. We made an observation that the entropy of the model's salience trained in that way is lower when compared to salience entropy computed for models trained without human perceptual intelligence. The research problem addressed by this paper is whether lowering the entropy of the model's class activation map helps to further increase the performance, on top of the performance increase we observe for human saliency-based training. In this paper we propose and evaluate four new entropy-based loss functions controlling the model's focus, covering the full range of the level of such control, from none to its "aggressive" minimization. We show, using the problem of synthetic face detection, that improving the model's focus, through lowering entropy by the proposed loss components, leads to models that perform better in an open-set scenario (in which the test samples are synthesized by unknown generative models): the obtained average Area Under the ROC curve (AUROC) ranges from 0.72 to 0.78, compared to AUROC = 0.64 observed for a state-of-the-art human-salience-only-based control of the model's focus. We also show that optimal performance is obtained when the model's loss function blends three aspects: regular classification performance, low entropy of the model's focus, and closeness of the model's focus to human saliency. The major conclusion from this work is that maximization of the model's focus is an important regularizer allowing the models to generalize better in an open-set scenario. Future work directions include methods of blending classification-, human salience-, and model salience entropy-based loss components to achieve optimal performance in domains other than synthetic face detection.

Humans are remarkably skilled at recognizing faces, and exceptionally sensitive to minuscule aberrations in face appearance. Thus, using human salience to guide the process of training deep learning-based synthetic face detectors has proved to increase the generalization of such models (to unknown data). This is achieved by focusing the model on features identified by humans as being prominent, instead of on features accidentally correlated with class labels.
An interesting observation we made about human salience-trained models is that the average entropy of a model's salience (estimated by Class Activation Map [CAM] [3]) is lower than the entropy of salience of models trained by a standard minimization of classification cross-entropy loss. Hence, an immediate question: how does the entropy of a model's salience (regulated through, e.g., a loss function) relate to the model's generalization, and, as a consequence, to its strength in detecting synthetic face images? This paper answers this question by exploring several variants of shaping the entropy of the model's salience with and without human guidance embedded into training; we name these variants Low CAM Entropy (LCE) models. We show that an appropriate entropy of the model's salience (not ''too large,'' to keep the model's focus, and not ''too small,'' to prevent over-fitting to specific features) allows us to build an effective synthetic face detector, generalizing to samples generated by unknown Generative Adversarial Networks (GANs) much better than state-of-the-art solutions. In particular, we explore loss functions incorporating the entropy with different mixtures of both human saliency [4] and the model's saliency.
We define the following research questions (RQ), and organize the paper in a way to answer them. All questions relate to models trained to detect faces synthetically generated by a GAN model held out for testing (hence, unknown during training). When we speak about human salience, we assume it is available in the form of regions that humans annotated.

RQ1: Let's assume that human saliency information is not available, but we can estimate the average entropy of human salience. Does requesting the average entropy of the model's and humans' saliencies to match increase the performance, compared to the performance of a model trained traditionally with just cross-entropy loss? (See the HSEB variant in Fig. 1.)

RQ2: Does aggressively requesting the minimum possible CAM entropy increase the performance? (See the FMMMSE variant.)

RQ3: Is there a better, less aggressive strategy to control the entropy of the model's saliency, for instance through a log-entropy loss? (See the DROID variant.)

RQ4: Does the model benefit from combining human saliency-guided training with non-human-guided control of the model salience's entropy? (See the CYBORG+DROID variant.)

The main contributions of this paper are:
• Three LCE loss functions (HSEB, FMMMSE, and DROID).
• The naive combination of human saliency and LCE loss functions (CYBORG+DROID).

The outline of the rest of the paper is as follows. Related work is in Sec. II. The human saliency-guided CYBORG model on which our LCE models are built is discussed in Sec. III. Our proposed LCE models are covered in detail in Sec. IV. The combination of human-saliency and LCE models is covered in Sec. V. The experimental setup and results are in Sec. VI and Sec. VII, respectively. Conclusions are discussed in Sec. VIII and limitations in Sec. IX. Lastly, potential future work is described in Sec. X.

II. RELATED WORK
This paper extends previous work on improving synthetic face detection using human saliency by introducing Low CAM Entropy (LCE) loss functions. As such, we divide our related work into three subsections: synthetic face detection, human salience in machine learning, and CAM entropy.

A. SYNTHETIC FACE DETECTION
Image manipulation and the creation of fake images pose a serious threat in terms of security [5], [6], [7]. A particularly well-known and socially relevant example is that of deepfakes.
VOLUME 11, 2023
Non-deep learning techniques, such as frequency domain analysis, have been used successfully in detecting synthetic faces and other images [19], [20], [21]. However, these techniques have their limitations, including failing when tested with image compression [22]. Deep learning networks [23], [24], [25] such as DenseNet [26] (the pre-trained framework for the models in this paper) achieve a synthetic facial image recall of 99% [27]. A major difference between these studies and the CYBORG study is the improvement of AUROC performance using human salience.

B. HUMAN SALIENCE IN MACHINE LEARNING
While machine learning techniques are no less accurate than humans when working with facial data [28], there remains a level of inexplainability in these deep learning methods that can be alleviated by comparison to observed human experts detecting synthetic faces [29]. Human saliency has aided deep learning tasks such as handwriting recognition [30] and natural language processing [31], as a component of attention mechanisms [32], and in scene description [33], [34]. Specifically in biometrics (including synthetic face detection), human saliency has been shown to complement machine saliency [4], [35], [36], [37]. In our study, we further demonstrate the usefulness of human saliency by combining it with one of our LCE models (see Sec. V).

C. CAM ENTROPY
While it is not the same as CAM entropy, the general idea of using entropy in model training has been popular since the introduction of the cross-entropy loss function [38]. CAM entropy treats each pixel of a CAM image as a probability event, with all pixels in the image summing to one. This allows one to treat the CAM as a probability distribution, with the higher-probability pixels being those that the model is focused on. Previously, CAM entropy has been used as a meaningful way to improve model explainability, and the addition of CAM entropy as a loss function component is established [39]. We differ from this previous work in the variety of CAM entropy loss functions introduced. Furthermore, to the best of our knowledge, this is the first time LCE loss functions have been used for synthetic face detection, with and without the incorporation of human saliency.

III. HUMAN SALIENCY-GUIDED TRAINING REDUCES ENTROPY OF MODEL'S SALIENCE
The human perception-guided training aims at minimizing the distance between the model's saliency maps and the respective human saliency maps. For instance, in the example CYBORG approach [4], the loss function is composed of two terms: the human perception loss component (the Mean Squared Error between the human salience and the model's salience), and the classification loss component (regular cross-entropy):

$$\mathcal{L} = -\frac{\alpha}{K}\sum_{k=1}^{K}\sum_{c=1}^{C}\mathbb{1}\left[y_k \in C_c\right]\log p(y_k \in C_c) + \frac{\beta}{K}\sum_{k=1}^{K}\left\| s^{(h)}_k - s^{(m)}_k \right\|_2^2 \qquad (1)$$

where K is the number of samples in a batch, C is the number of classes, y_k is the class label for the k-th sample, 1 is a class indicator function equal to 1 when y_k ∈ C_c and 0 otherwise, s^{(h)}_k and s^{(m)}_k are the human and model saliency maps calculated for the k-th sample, α is the parameter weighting the cross-entropy-based loss component, and β is the parameter weighting the human-based loss component. We use Class Activation Mapping (CAM) [3] to approximate the model's salience s^{(m)}, and normalize both the human and model saliency maps to [0, 1].

FIGURE 2. (Entropy of the probability density map) Example CAMs for a 7-by-7 grid that have been normalized to sum to 1. Underneath each CAM is its corresponding entropy value. In each CAM, focus is mapped to a yellow-to-blue color scale, with the n yellow pixels corresponding to the highest normalized value of 1/n (i.e., focus) and blue to zero.
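To make the two-term structure of the CYBORG-style loss concrete, here is a minimal Python sketch. The function and variable names are ours (not from the CYBORG codebase), and for readability the saliency maps are flattened Python lists rather than the tensors a real training loop would use:

```python
import math

def cyborg_style_loss(probs, labels, model_cams, human_maps,
                      alpha=0.5, beta=0.5):
    """Two-term loss: alpha-weighted classification cross-entropy plus
    beta-weighted MSE between model and human saliency maps (both
    assumed normalized to [0, 1] and flattened to equal-length lists)."""
    K = len(labels)
    # classification component: mean cross-entropy over the batch
    ce = -sum(math.log(p[y]) for p, y in zip(probs, labels)) / K
    # human-perception component: mean squared error between the maps
    n = len(model_cams[0])
    mse = sum((m - h) ** 2
              for cam, hum in zip(model_cams, human_maps)
              for m, h in zip(cam, hum)) / (K * n)
    return alpha * ce + beta * mse

# toy batch of one sample whose CAM already matches the human map,
# so only the classification term contributes
loss = cyborg_style_loss([[0.9, 0.1]], [0], [[1.0, 0.0]], [[1.0, 0.0]])
print(round(loss, 4))  # 0.0527 (= 0.5 * -ln 0.9)
```

When the model's CAM diverges from the human map, the second term grows, pulling the model's focus toward the human-annotated regions.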
The entropy H of the CAM (or salience map) s is:

$$H(s) = -\sum_{i=1}^{h}\sum_{j=1}^{w} s_{ij}\,\log s_{ij} \qquad (2)$$

where log denotes the natural logarithm, h and w are the height and width of the salience map s, respectively, and s is normalized to formally express the probability distribution related to the concept of ''focus'':

$$s_{ij} \leftarrow \frac{s_{ij}}{\sum_{i=1}^{h}\sum_{j=1}^{w} s_{ij}}, \qquad s_{ij} \ge 0 \qquad (3)$$

Fig. 2 illustrates a few example 7 × 7 salience maps s and their corresponding entropy, to demonstrate how Shannon entropy shrinks as the number of pixels focused on shrinks. For instance, an entropy of 3.89 means an equal focus on all 49 pixels, while an entropy of 0 means that a single pixel has the model's full focus. Fig. 2 is certainly a toy example, and it is more interesting to observe how the entropy H of actual salience s, estimated for models trained in various ways, changes during training. We trained DenseNet [26] with both the regular cross-entropy loss and the CYBORG human saliency-guided loss, and compared the entropy of the resulting model salience maps with the entropy of human salience (computed by averaging the entropy of all human annotations). As we see in Fig. 3, the average salience entropy for a model trained with cross-entropy loss is around 3.65. That corresponds to an approx. 38-pixel (out of 49 for a 7 × 7 Class Activation Map) focus area. For a model trained with the CYBORG loss, however, the model's salience entropy goes down to 3.35 (a 29-pixel focus). For comparison, human-annotated salience maps have an average entropy of 3.0 (a 20-pixel focus). Looking at the saliency maps in Fig. 1, we see a reasonable correlation with these figures. This experiment, serving as a segue to Sec. IV, suggests that (a) human-guided training decreases the model's salience entropy, and (b) there is a negative correlation between the salience entropy and the model's performance.
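The entropy computation above can be sketched in a few lines of Python (the function name `cam_entropy` is ours; the toy inputs reproduce the boundary values discussed for Fig. 2):

```python
import math

def cam_entropy(cam):
    """Shannon entropy (natural log) of a CAM treated as a probability
    distribution: non-negative values are normalized to sum to 1."""
    flat = [max(v, 0.0) for row in cam for v in row]
    total = sum(flat)
    probs = [v / total for v in flat]
    # by convention, 0 * log(0) contributes 0
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# uniform focus over a 7x7 CAM: entropy = ln(49) ~ 3.89
uniform = [[1.0] * 7 for _ in range(7)]
print(round(cam_entropy(uniform), 2))  # 3.89

# all focus on a single pixel: entropy = 0
single = [[0.0] * 7 for _ in range(7)]
single[3][3] = 1.0
print(cam_entropy(single))  # 0.0
```

The reported per-model averages (3.65, 3.35, 3.0) correspond to effective focus areas of e^H ≈ 38, 29, and 20 pixels, respectively, under this natural-log convention.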

IV. PROPOSED LOW ENTROPY MODELS
Section III demonstrated that the Shannon entropy of the model's salience is reduced when the network is guided towards important features during training. Extending this insight, we propose to examine several methods of minimizing the entropy of the class activation maps (serving as an estimator of the model's salience) and analyze which methods increase the generalization capabilities of the model in the task of synthetic face detection. An important note is that the proposed approaches are not limited to the synthetic face detection task, and can be applied to problems in which human perceptual capabilities may be utilized in a model's training.
The proposed overall approach can be seen as a generalization of the human-guided CYBORG training introduced by Boyd et al. [4]. We do this by replacing the human perception component in Eq. (1) with a more generic salience component:

$$\mathcal{L} = -\frac{\alpha}{K}\sum_{k=1}^{K}\sum_{c=1}^{C}\mathbb{1}\left[y_k \in C_c\right]\log p(y_k \in C_c) + \beta\,\mathcal{L}^{(\mathrm{sec})} \qquad (4)$$

where all variables have the same meaning as in Eq. (1), except for β, which now weights the LCE component L^(sec). Further in this section we investigate three different approaches to building L^(sec).

The first approach, Human Salience Entropy Bound (HSEB), which is directly related to research question RQ1, aims at matching the Shannon entropy of the model's and humans' salience maps:

$$\mathcal{L}^{(\mathrm{sec})}_{\mathrm{HSEB}} = \left( \bar{H}^{(m)}_i - \bar{H}^{(h)}_i \right)^2 \qquad (5)$$

where \bar{H}^{(m)}_i and \bar{H}^{(h)}_i are the entropies of the CAMs and the human saliency maps, as defined in Eq. (2), respectively, averaged over all samples within the i-th batch. Note that in this approach we do not guide the model ''where to look,'' and only request that the model achieve a salience entropy similar to that observed for humans who annotated the same training samples. The motivation is to give the model more flexibility in choosing salient features, and to explore a setting in which the exact human saliency maps are not available, but we know the estimated value of their entropy. If the generalization capabilities of such an approach are competitive, we could potentially replace the need to collect human salience maps with an estimated scalar entropy value of such maps.
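A minimal sketch of the HSEB secondary term follows; the squared-difference form and the name `hseb_loss` are our reading of the objective described above (only the scalar mean human-salience entropy is needed, not the maps themselves):

```python
def hseb_loss(model_entropies, human_entropy_mean):
    """HSEB secondary-loss sketch: squared gap between the batch-mean
    CAM entropy and a pre-computed mean human-salience entropy.
    model_entropies: per-sample CAM entropies for the current batch."""
    batch_mean = sum(model_entropies) / len(model_entropies)
    return (batch_mean - human_entropy_mean) ** 2

# a batch of model CAM entropies vs. the human average of ~3.0
print(hseb_loss([3.6, 3.4, 3.5], 3.0))  # 0.25
```

Because only a single scalar target is involved, this term can be computed without storing or transferring any human annotation maps at training time.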
The results obtained for the HSEB approach (discussed in detail in Sec. VII) suggest that this way of limiting the entropy of the model's salience allows us to further improve the performance. Following this, in some sense naïve, approach, we explored a way to aggressively minimize the entropy of the model's salience, called Forcibly Minimizing the Mean Model Salience Entropy (FMMMSE), directly addressing research question RQ2:

$$\mathcal{L}^{(\mathrm{sec})}_{\mathrm{FMMMSE}} = \frac{1}{K}\sum_{k=1}^{K} H^{(m)}_k \qquad (6)$$

with H^{(m)}_k defined in Eq. (2). This method, as we will see later, not surprisingly overfits to the training data, suggesting that the entropy of the model's salience cannot be minimized too aggressively, as doing so promotes searching for spurious features correlated with the class category (that is, what we want to avoid). This takes us to the last, and the most effective, approach to selecting L^(sec), which investigates a middle ground between staying close to the human entropy and the minimum entropy of the model's salience: Directed Region Of Interest Diminution (DROID), see Fig. 4. DROID minimizes the log CAM entropy, which is similar to the FMMMSE approach, but with less of a penalty on higher entropy, to avoid over-focusing:

$$\mathcal{L}^{(\mathrm{sec})}_{\mathrm{DROID}} = \frac{1}{K}\sum_{k=1}^{K} \log H^{(m)}_k \qquad (7)$$

for each sample k in a batch.
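The contrast between the two minimization strategies can be sketched as follows (function names are ours; both operate on per-sample CAM entropies computed as in Sec. III):

```python
import math

def fmmmse_loss(entropies):
    """FMMMSE sketch: mean CAM entropy over the batch, penalizing
    entropy linearly and therefore minimizing it aggressively."""
    return sum(entropies) / len(entropies)

def droid_loss(entropies):
    """DROID sketch: mean log CAM entropy; the log compresses the
    penalty scale, making the minimization softer than FMMMSE."""
    return sum(math.log(h) for h in entropies) / len(entropies)

batch = [3.0, 2.5, 2.0]  # example per-sample CAM entropies
print(round(fmmmse_loss(batch), 3))  # 2.5
print(round(droid_loss(batch), 3))   # 0.903
```

For the same batch, the DROID term contributes a much smaller absolute penalty than FMMMSE, so the classification and (optional) human-saliency terms retain more influence on the gradient.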

V. PROPOSED COMBINATION OF HUMAN-SALIENCY AND LOW ENTROPY
In Section IV we saw that low-entropy models force a model to focus on few features. However, there is no guarantee that these will be the most important, or even useful, features. The CYBORG approach, as we saw in Sec. III, uses human saliency to guide a model to important features, matching the features the model focuses on, including their number, with those selected by humans. We hypothesize that, put together, human saliency should guide the model to important features, while low entropy should force the model to focus on only the most important ones. We thus propose a combination of these two approaches (called CYBORG+DROID) as an exploratory test, using DROID as the low-entropy component and CYBORG as the human-saliency component, addressing research question RQ4:

$$\mathcal{L} = -\frac{\alpha}{K}\sum_{k=1}^{K}\sum_{c=1}^{C}\mathbb{1}\left[y_k \in C_c\right]\log p(y_k \in C_c) + \frac{\beta}{K}\sum_{k=1}^{K}\left\| s^{(h)}_k - s^{(m)}_k \right\|_2^2 + \frac{\gamma}{K}\sum_{k=1}^{K} \log H^{(m)}_k \qquad (8)$$

for each sample k in a batch of size K. All variables have the same meaning as in Eq. (1), including α and β, while the new parameter γ is the weight for the LCE component. As we see, CYBORG+DROID follows a similar loss function format as the other low-entropy models, except that there are both a salience entropy control component (specifically, DROID, defined by (7)) and a CYBORG loss component (defined by (1)), each with its own weight.
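The three-term combination can be sketched as below. This is an illustrative helper of ours, taking the already-computed classification and saliency-MSE terms as scalars; the default weights are the ones the paper reports using:

```python
import math

def cyborg_droid_loss(ce, saliency_mse, cam_entropies,
                      alpha=0.5, beta=0.5, gamma=0.4):
    """Three-term loss sketch: classification cross-entropy (alpha),
    human-saliency MSE (beta), and the DROID mean log CAM-entropy
    term (gamma), summed with their respective weights."""
    droid = sum(math.log(h) for h in cam_entropies) / len(cam_entropies)
    return alpha * ce + beta * saliency_mse + gamma * droid

# toy values: ce = 1.0, saliency MSE = 0.2, one CAM with entropy e
print(cyborg_droid_loss(1.0, 0.2, [math.e]))  # 1.0
```

Note that the DROID term is the only one that can go negative (for per-sample entropies below 1), which is harmless for gradient-based minimization.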
In the original CYBORG study it was determined that CYBORG is largely unaffected by changes to the coefficients in front of the loss components (α and β in (8)). In this study, we did explore various weight values for the components of our proposed low-entropy models, and they too were largely unaffected. However, the performance of the combination model, CYBORG+DROID, is affected by changes in component weights, as discussed in Sec. VI-B.

VI. EXPERIMENTAL SETUP

A. EXPERIMENT DESCRIPTIONS
We conduct four experiments, one to address each of the four research questions: explicitly requesting the model and human saliency match in terms of CAM entropy value (HSEB approach, addressing RQ1), aggressively requesting the minimum CAM entropy possible (FMMMSE approach, addressing RQ2), less aggressively requesting the minimum CAM entropy with a log-loss function (DROID approach, addressing RQ3), and the combination of low model's entropy request and human saliency-based guidance (DROID + CYBORG, addressing RQ4).
The same experimental format is used in all four experiments. In each experiment we compare the performance of one of the low-entropy models, or the CYBORG+DROID model, to the baseline cross-entropy and state-of-the-art CYBORG in the task of synthetic face detection. While synthetic face detection was chosen as an example domain, all the considerations remain valid for other visual tasks in which humans are competent. We use the mean Area Under the Receiver Operating Characteristic curve (AUROC), based on sigmoid scoring, to measure a model's performance, and define a higher mean AUROC as better performance. We also define an increase in mean AUROC greater than the sum of the standard errors of the two models to be a significant increase. Such an increase in performance from the baseline or state-of-the-art compared to the LCE model, where only the loss functions distinguish between the models, will indicate that training a model with a constraint on its CAM entropy is beneficial.
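The significance criterion above is simple enough to state as a one-line check (the function name is ours; the example values are HSEB vs. baseline cross-entropy numbers reported in Sec. VII):

```python
def significant_improvement(auroc_a, se_a, auroc_b, se_b):
    """Paper's criterion: model B significantly improves over model A
    when the mean-AUROC gain exceeds the sum of the standard errors."""
    return (auroc_b - auroc_a) > (se_a + se_b)

# HSEB (0.713 +/- 0.01) vs. baseline cross-entropy (0.561 +/- 0.02):
# gain of +0.152 exceeds the summed standard error of 0.03
print(significant_improvement(0.561, 0.02, 0.713, 0.01))  # True
```

This is a conservative heuristic rather than a formal hypothesis test: non-overlapping standard-error intervals imply a difference, but overlapping ones do not prove its absence.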

B. EXPERIMENT PARAMETERS
For training, we follow the experimental procedure established in [4], except for the learning rate. This change was made to more thoroughly explore the behavior of our low-entropy models. Specifically, all models are trained with a constant learning rate of 0.002 for a period of 50 epochs using Stochastic Gradient Descent, and the weights chosen for the final model are those offering the highest validation accuracy. All models are instantiated from the DenseNet-121 model pre-trained on the ImageNet dataset [26]. The training and validation sets are constant for all models, as described in Sec. VI-C. To assess the uncertainty associated with the randomness of the training process, we train ten instances of each model with the same training data but with different seeds.
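The checkpoint-selection rule in this setup (keep the epoch with the highest validation accuracy) can be sketched as follows; this is an illustrative helper of ours, not the authors' training code:

```python
def select_best_weights(epoch_results):
    """Given one (validation_accuracy, weights) pair per epoch,
    return the weights from the epoch with the highest validation
    accuracy; ties keep the earliest such epoch."""
    best_acc, best_weights = epoch_results[0]
    for acc, weights in epoch_results[1:]:
        if acc > best_acc:  # strictly greater keeps the earliest best
            best_acc, best_weights = acc, weights
    return best_weights

# three mock epochs: epoch 2 has the best validation accuracy
print(select_best_weights([(0.70, "epoch1"), (0.90, "epoch2"),
                           (0.80, "epoch3")]))  # epoch2
```

Selecting on validation accuracy (rather than the final epoch) guards against reporting an over-fitted late checkpoint, which matters for the aggressive entropy-minimizing variants.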
The weighting for the loss components is as follows (for information on how the weights were chosen, see Appendix A). For the cross-entropy baseline, the classification loss weight is α = 1.0. For CYBORG, HSEB, FMMMSE, and DROID, the weighting for the cross-entropy component is α = 0.5 and the weighting for the secondary component β varies: for CYBORG β = 0.5, for HSEB β = 0.4, for FMMMSE β = 0.2, and for DROID β = 0.4.
For CYBORG+DROID, the weighting for the classification loss is α = 0.5, the weight of the CYBORG human saliency component is β = 0.5, and the weight of the low-entropy DROID component is γ = 0.4.

FIGURE 6. The CAM entropy and AUROC performance for each of the 10 samples of each of the six models tested. The y-axis is the AUROC for each sample using sigmoid scoring, and the x-axis is the CAM entropy for that sample. Note that, generally, the CAM entropy decreases for each model in the following order: cross-entropy, CYBORG, HSEB, DROID/CYBORG+DROID, and FMMMSE. The highest AUROC performances are not at either entropy extreme, but close to the middle range of 2.0 to 2.5.

C. DATASETS
For training each model we use the established face image datasets, split into disjoint training, validation, and testing datasets, in the same way as proposed in [4]. Figure 5 shows synthetic and real face image examples from each dataset.
The training set consists of 1821 training samples (919 real and 902 synthetic). Real samples originate from the Face Recognition Grand Challenge (FRGC) dataset, and synthetic samples are generated for this dataset using the ''synthesis of realistic face images'' (SREFI) method [40] and StyleGAN2 [12].
The validation set consists of 20,000 samples (10,000 real and 10,000 synthetic). As with the training set, real validation samples are taken from FRGC, and synthetic validation samples are generated with SREFI and StyleGAN2. Note that separate images were generated from SREFI and StyleGAN2 for the training and validation datasets.

VII. RESULTS

A. MODEL ENTROPY VS PERFORMANCE
Motivated by [39], we observed in Section III that the CAM entropy for models trained with CYBORG was lower than for those trained with cross-entropy. This is unsurprising, since CYBORG tries to match the CAMs to human salience maps, which themselves have lower entropy (as probability density maps), as seen in Figure 3.
Since our proposed methods HSEB, FMMMSE, and DROID are designed to lower CAM entropy, we want to determine whether there is a more general correlation between AUROC scores and CAM entropy. In addition, as Figure 2 shows, CAM entropy indicates how focused the CAM is, and it seems unlikely that very small values of CAM entropy would be favorable, so there may be an optimal value for a given model and dataset.
Experimentally, we compare the CAM entropy/AUROC performance of the baseline cross-entropy and state-of-the-art CYBORG with HSEB, FMMMSE, DROID, and a combined method, CYBORG+DROID. For each model we use the weights from the epoch with the highest validation accuracy during training, and then do all analyses described later on the sequestered test subset. Fig. 6 shows the results of ten training runs for each model type. We observe a good correlation between lower CAM entropy and increasing AUROC for CAM entropy values above 2.0, with a ''sweet spot'' between 2.0 and 2.5. Across the various methods, the highest singular AUROC score increases in the following order: baseline cross-entropy, state-of-the-art CYBORG, HSEB, DROID, CYBORG+DROID, and FMMMSE.
Summarizing, we can draw a number of important conclusions from Fig. 6. First, choosing a loss function that lowers CAM entropy does increase focus. Second, reducing CAM entropy, even without the guidance of human salience, gives a significant improvement in AUROC scores, up to a point, after which we trade AUROC performance for further increases in focus. Third, training approaches that aggressively minimize a model's CAM entropy (like FMMMSE) do not produce models offering the best performance, and their performance varies greatly across training runs. Finally, adding complementary human guidance, as in the DROID+CYBORG approach, stabilizes the model in terms of CAM entropy across training runs and offers the best performance, translating to the highest mean AUROC for DROID+CYBORG in Fig. 6.

B. COMPUTATIONAL COMPLEXITY
The mean run-time per epoch for each of the six models is shown in Table 1, in seconds. We see that baseline cross-entropy is the fastest model to train, and the only model that is markedly, and statistically significantly, faster than every other model. However, even the slowest method, HSEB, is only 6.28 seconds slower per epoch than cross-entropy. That is only a 10% increase in run time, amounting to a 5.2-minute increase over the course of the 50-epoch training.

TABLE 2. AUROC performance with standard errors of the six models in this paper.

FIGURE 7. Boxplots representing the sigmoid AUROC scores over 10 training runs for each of the approaches considered in this paper. Thick central bars represent median values, the height of each box corresponds to the inter-quartile range (IQR) spanning from the first (Q1) to the third (Q3) quartile, whiskers span from Q1−1.5×IQR to Q3+1.5×IQR, and outliers are shown as circles. Notches represent 95% confidence intervals of the median. Note that the y-axis shows only the range for which we have data, 0.45 to 0.85.

C. ADDRESSING RESEARCH QUESTIONS
The AUROC results supporting the answers to research questions RQ1-4, along with their baseline and state-of-the-art comparisons, are shown in Fig. 7 and listed in Table 2. Actual ROC curves are shown in Fig. 8.

1) ANSWERING RQ1: DOES MATCHING THE MODEL AND AVERAGE HUMAN SALIENCE ENTROPIES INCREASE THE PERFORMANCE?
In CYBORG-trained models, the human saliency-guided model learns by focusing on important features. In HSEB-trained models, human saliency provides a target entropy score. Our experiments show that HSEB is able to achieve this mean entropy, and that it outperforms the baseline cross-entropy. Hence, the answer to RQ1 is affirmative: requesting that the model's CAM entropy match the human saliency entropy increases the performance, compared to traditionally cross-entropy-trained models. HSEB achieves an AUROC of 0.713 ± 0.01. It outperforms baseline cross-entropy (0.561 ± 0.02) with a mean AUROC score increase of +0.152, which is significant with a sum standard error of 0.03. HSEB also outperforms CYBORG (0.636 ± 0.01) with a mean AUROC score increase of +0.077, which is significant with a sum standard error of 0.02.

2) ANSWERING RQ2: DOES AGGRESSIVELY REQUESTING THE MINIMUM CAM ENTROPY INCREASE THE PERFORMANCE?
As the answer to RQ1 is affirmative, in FMMMSE-trained models we aggressively request the minimum possible CAM entropy. This results in the models with the lowest mean CAM entropy and further increases the performance, allowing us to answer RQ2 affirmatively as well: there are well-localized and strong-enough image features that are sufficient to solve the synthetic face detection task. The FMMMSE approach achieves an AUROC of 0.775 ± 0.01. It outperforms baseline cross-entropy with a mean AUROC score increase of +0.214, which is significant with a sum standard error of 0.03. FMMMSE also outperforms CYBORG with a mean AUROC score increase of +0.139, which is significant with a sum standard error of 0.02.

3) ANSWERING RQ3: IS THERE A BETTER STRATEGY TO CONTROL THE ENTROPY OF THE MODEL'S SALIENCY?
As the answer to RQ2 is affirmative, and the concern for over-focusing with the FMMMSE approach is high (due to the low percentage of the image focused on, and the apparent high variance of the performance across different training runs, depicted in Fig. 6), what if we request a middle-of-the-road CAM entropy in model training using log-entropy? DROID fills this middle-of-the-road position, with final mean CAM entropy scores generally ranging between the FMMMSE and HSEB entropy scores. DROID also outperforms the baseline, indicating that a softer request to minimize the mean CAM entropy improves performance in the task of synthetic face detection. DROID achieves an AUROC of 0.769 ± 0.01. It outperforms baseline cross-entropy with a mean AUROC score increase of +0.208, which is significant with a sum standard error of 0.03. DROID also outperforms CYBORG with a mean AUROC score increase of +0.133, which is significant with a sum standard error of 0.02.

4) ANSWERING RQ4: DOES THE MODEL BENEFIT FROM COMBINING THE HUMAN SALIENCY-GUIDED TRAINING WITH NON-HUMAN-GUIDED CONTROL OF THE MODEL SALIENCE'S ENTROPY?
As the results are affirmative for RQ1-3, and due to concerns with over-focusing in FMMMSE, we investigate the performance of a model using the combination of human saliency-guided training (CYBORG) with low entropy-based training (DROID). The increase in performance of the CYBORG+DROID approach over baseline cross-entropy indicates that the model benefits from combining human-guided saliency training with non-human-guided control of the model's CAM entropy. CYBORG+DROID achieves our highest mean AUROC performance, 0.783 ± 0. It outperforms baseline cross-entropy with a mean AUROC score increase of +0.222, which is significant with a sum standard error of 0.02. CYBORG+DROID also outperforms CYBORG with a mean AUROC score increase of +0.147, which is significant with a sum standard error of 0.01. Fig. 8 illustrates graphically how CYBORG+DROID has the highest performance and noticeably lower variability than the other models considered.

VIII. CONCLUSION
High Shannon entropy of a model's saliency (CAM entropy) corresponds to low focus, as the model considers all pixels, including the irrelevant ones, with equal probability. Thus, models with high entropy are indiscriminate and low-information. This is seen in models trained with the classical cross-entropy loss function. As CYBORG introduces human saliency to the model, we can expect the entropy to decrease with the increased information. We observe that this is so, leading to the natural question: ''is low entropy merely an effect, or can it be a cause of increased information and performance?'' This paper is an attempt to answer that question by introducing new loss functions that modify CAM entropy directly. HSEB matches the average human-salience entropy, FMMMSE forcibly minimizes CAM entropy, and DROID seeks a reasonable middle ground by minimizing log entropy. We see AUROC improvements in all three methods as we reduce CAM entropy.
The next question raised is ''how far can we reduce CAM entropy before further reduction becomes a hindrance to the model?'' In Fig. 6 we see that no model achieves its highest AUROC performance with a CAM entropy below 1.0. Instead, the best performances are generally in the entropy range of 2.0 to 2.5. This leads us to consider the middle-of-the-road DROID method as the optimal choice.
Finally, as the incorporation of human salience has proven useful in the past, it stands to reason that human direction could help guide the more focused, low-entropy DROID method. This leads us to the final question of this paper: ''does incorporating human salience into an optimal low-entropy model improve performance?'' The answer is: yes, it does. CYBORG+DROID achieves the highest average AUROC of all the methods in this paper, improving over DROID by +0.014, which is significant with a sum standard error of 0.01. While this difference is significant (Table 2), it is ultimately quite small (a 1.8% increase), indicating a need for further work on combining model-salience entropy and human salience.

IX. LIMITATIONS
One factor potentially limiting this work at its current stage is our focus on the comparison of the proposed CAM entropy-based training with the previously proposed human saliency-based training (CYBORG). Given the plethora of possible parameters to be set in both approaches, and the data sets and comparison methods that would be of interest, to make a fair comparison we concentrated on using the exact data set and methodology used by the authors of CYBORG. This also enabled a direct comparison of our method when combined with CYBORG. This limitation, however, does not impair the general conclusions, and opens new research lines, which we itemize in Section X discussing future directions.

X. FUTURE WORK
Controlling the entropy of a model's salience (most often estimated by class activation maps, as in this work) is an interesting concept related to model regularization that should be applicable across many fields of image recognition. Having established the efficacy of the method in this paper, we see the following new research areas opened by this work:
• exploring the optimal weighting of multiple loss components (regular classification, low-entropy focus, human-guided saliency), including its dependency on the domain and model architecture;
• using adversarial training techniques along with CAM entropy-based regularizers;
• exploring how incorporating additional training data covering diverse facial expressions, poses, and lighting conditions enhances the focus and accuracy of deep learning-based synthetic face detectors;
• exploring the effects of applying different types of saliency maps, such as eye tracking-sourced maps or semantic saliency maps, and different ways of incorporating saliency maps into training, for instance through a specially-crafted attention mechanism;
• exploring how different types of synthetic data generation techniques affect the focus and performance of deep learning-based synthetic face detectors.
Any of the work itemized above would deserve a separate paper.

APPENDIX A COEFFICIENTS BLOCK SEARCH
We performed a block search of loss function weights to determine the coefficients for our models. For CYBORG we use the weights established in [4], with the cross-entropy and CYBORG components weighted equally at 0.5. For our block search we use the cross-entropy weight established by CYBORG, and then test weights for the low-entropy component ranging from 0.1 to 0.5. CYBORG+DROID has three components, but we rely on the results of the previous CYBORG study to set the weights of the cross-entropy and human saliency components at 0.5 each; in CYBORG+DROID we vary only the LCE component's weight. Results are in Table 3.

ACKNOWLEDGMENT
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Department of Defense or the U.S. Government.