FlowSAN: Privacy-enhancing Semi-Adversarial Networks to Confound Arbitrary Face-based Gender Classifiers

Privacy concerns in the modern digital age have prompted researchers to develop techniques that allow users to selectively suppress certain information in collected data while still allowing other information to be extracted. In this regard, Semi-Adversarial Networks (SAN) have recently emerged as a method for imparting soft-biometric privacy to face images. SANs modify input face images such that the resulting images can still be reliably used by arbitrary conventional face matchers for recognition purposes, while attribute classifiers, such as gender classifiers, are confounded. However, the generalizability of SANs across arbitrary gender classifiers has remained an open concern. In this work, we propose a new method, FlowSAN, that allows SANs to generalize to multiple unseen gender classifiers. We propose combining a diverse set of SAN models that compensate for each other's weaknesses, thereby forming a robust model with improved generalization capability. Extensive experiments using different unseen gender classifiers and face matchers demonstrate the efficacy of the proposed paradigm in imparting gender privacy to face images.


INTRODUCTION
Face images contain valuable information unique to each individual that facilitates biometric face recognition. In addition, auxiliary information such as age, gender, and race, which is referred to as soft-biometrics, can also be extracted from face images using machine learning techniques [1]-[3]. Face recognition involves comparing features extracted from a pair of face images, using a face matcher, to determine their degree of similarity [1], [4]. The increasing use of face recognition in various applications has brought the issue of data privacy to the forefront [5]-[18]. While extracting soft-biometric information can be useful in many applications [19], such information can also be abused in several ways, such as profiling users, targeted advertising, and increasing the risk of linkage attacks [20]. Furthermore, extracting this information without the users' consent may be viewed as a violation of their privacy. One aspect of privacy involves granting users the right to determine which personal information to reveal and which to conceal [21], [22]. In this regard, soft-biometric privacy was introduced as a means of preserving the biometric utility of face images while confounding soft-biometric information, such as gender characteristics [23], [24].
Recently, the European Union's General Data Protection Regulation (GDPR) [25] came into effect. One of its goals is to protect the data collected from European users and to regulate its usage. To this end, it requires any entity (individual or group) collecting data from European users to disclose the type of data collected, the intended usage, and the data-processing techniques that will be used. Accordingly, GDPR prohibits any processing of individuals' information beyond the purpose stated at the time of data collection. For example, consider a scenario where users of an application or service can optionally withhold their gender information; however, such information could still be extracted automatically from their biometric data [26]-[34].
Consider, in the context of GDPR, a scenario in which biometric data of individuals, such as face photos or fingerprints, are collected solely for the purpose of user recognition, without acquiring other demographic information such as age, gender, and ethnicity. In such a scenario, applying data processing techniques that automatically extract such sensitive information from a person's biometric data [1], [2], [32], [35]-[40] without their knowledge and consent violates the users' privacy. While GDPR prohibits unsolicited data extraction from European users, the possibility of unlawful data collection remains and can ultimately lead to negative societal, economic, and political consequences [41]-[43].
Previously, we developed Semi-Adversarial Networks (SAN) [44] for imparting soft-biometric privacy to face images, where a face image is modified such that the matching utility of the modified face image is retained while the automatic extraction of gender information is confounded. In our previous work [44], we empirically showed that the ability of an unseen gender classifier to predict gender information from the outputs of the SAN model is successfully diminished. In [45], we defined the generalizability of the SAN model as its ability to confound arbitrary unseen¹ gender classifiers. Generalizability is an important property for real-world privacy applications, since its absence implies that there exists at least one gender classifier that can still reliably estimate the gender attribute from the outputs of the SAN model, thereby jeopardizing the privacy of users. To address the generalizability issue of SAN models, in this paper we propose the FlowSAN model, which progressively degrades the performance of unseen gender classifiers. Extensive experiments on a variety of independent gender classifiers and face image datasets show that the proposed FlowSAN method (Fig. 1) substantially improves generalization performance compared to the original SAN method with regard to concealing gender information while retaining face matching utility.

1. The term "unseen" indicates that a certain classifier (or face matcher) was not used during the training stage. In contrast, the term "auxiliary" in this paper refers to a classifier (or face matcher) that is used during the training phase.

RELATED WORK
With regard to privacy concerns, a new line of research has emerged in recent years that focuses on methods for imparting soft-biometric privacy to biometric data, and to face images in particular [8]-[10], [23], [24], [46]. Othman and Ross [23] first proposed an approach for mixing input face images with candidate images of the opposite gender using an Active Shape Model [47]. Subsequently, Mirjalili and Ross [24] developed a scheme that modifies an input face image using adversarial perturbations [48] such that the performance of a given gender classifier is confounded while the performance of a face matcher is retained. Chhabra et al. [9] later extended this research to multiple attribute classifiers, applying additive perturbations to face images to either preserve or suppress certain soft-biometric attributes. While these schemes successfully confound a target attribute classifier, they fail to generalize to unseen attribute classifiers. Thus, soft-biometric attributes remain susceptible to extraction by unseen classifiers.
To derive perturbations that are transferable to unseen gender classifiers, Mirjalili et al. [44] designed a convolutional autoencoder that modifies input face images such that an auxiliary face matcher retains good matching performance on the modified output image while an auxiliary gender classifier is confounded. Since the output of their model is adversarial to one classifier but not to the other, the architecture is referred to as Semi-Adversarial Networks (SAN). The SAN model was shown to derive perturbations that are transferable to two unseen gender classifiers. In [45], we investigated the generalizability of SAN models across multiple arbitrary gender classifiers and formulated an ensemble SAN model with a training scheme based on different data augmentation techniques, in order to enhance the diversity of the SAN models in the ensemble. Furthermore, we explored the effectiveness of randomly selecting a perturbed image from an ensemble of SAN models, which we refer to as Ens-Gibbs [45].
While these methods apply perturbations directly to face images, new techniques have recently emerged that apply perturbations to the face representation vectors computed by face matchers [8], [13]. In particular, Morales et al. [8] proposed a neural-network-based model, called SensitiveNet, that removes soft-biometric information from face representation vectors, so that an attribute classifier trained on face representation vectors may not be able to extract such sensitive information. However, these methods assume that only face representation vectors are stored in a biometric database. This is undesirable in many applications, since storing only face representations results in 1) a loss of human interpretability, and 2) a loss of backward matching compatibility when the face matcher is updated. An overview of existing techniques and their properties (transferability, generalization to arbitrary attribute classifiers, and retention of matching utility) is shown in Table 1.
In this work, we address the generalization issue of the SAN method using a novel stacking paradigm that successively enhances the perturbations for confounding an arbitrary unseen gender classifier, as illustrated in Fig. 1. We refer to this method as FlowSAN. The primary contributions of this work are as follows:
• Designing the FlowSAN model, which can successively degrade the performance of arbitrary unseen gender classifiers;
• Generalizing the FlowSAN model to multiple arbitrary gender classifiers;
• Demonstrating the practicality and efficacy of the proposed approach in confounding gender information for real-world privacy applications via extensive experiments involving a broad and diverse set of datasets.

PROPOSED METHOD
Original SAN model [44]: The SAN model for imparting gender privacy to face images was first proposed in [44], and its overall architecture is shown in Fig. 2. The SAN model leverages pre-computed face prototypes, which are average face images for each gender. SAN consists of three subnetworks: 1) a convolutional autoencoder that perturbs an input face image via face prototypes, 2) an auxiliary face matcher, which is a convolutional neural network (CNN), and 3) a CNN-based auxiliary gender classifier. The input to the convolutional autoencoder is a gray-scale² face image $I_{orig}$ of size 224×224×1, fused with a face prototype of the same gender ($P_{sm}$). After the fused input image is passed through the encoder and decoder networks, a face prototype (either $P_{sm}$, the prototype face image of the same gender as the input image, or $P_{op}$, the prototype face image of the opposite gender) is added as an additional channel to the resulting 128-channel feature-map representation. Finally, a 1×1 convolution reduces the number of channels in the resulting feature maps to produce a 224×224×1-dimensional output image, which is denoted as $I_{sm}$ or $I_{op}$ depending on the type of prototype used by the decoder:

$$I_{sm} = \mathrm{SAN}(I_{orig}; P_{sm}), \qquad I_{op} = \mathrm{SAN}(I_{orig}; P_{op}) \tag{1}$$

These output images, $I_{sm}$ and $I_{op}$, are then passed to both the auxiliary face matcher and the auxiliary gender classifier. The auxiliary face matcher predicts whether the original and the perturbed face images belong to the same individual via a face match score. The gender classifier predicts the gender of the input and output images via gender probabilities for male and female.³ For the auxiliary face matcher, the pre-trained, publicly available VGG-face model [51] is used, which computes the face representation vectors for an input face image; the similarity between two face representation vectors determines the associated match score.

TABLE 1: Overview of existing methods for imparting soft-biometric privacy and their comparison based on three criteria: transferability, generalizability, and retention of matching performance. Transferability refers to the ability to generate perturbations that can successfully confound a different gender classifier, whereas generalizability is a stronger criterion for the ability to confound any arbitrary unseen gender classifier.

Fig. 2: The SAN model [44], composed of three subnetworks: I: a convolutional autoencoder [50], II: an auxiliary face matcher (M), and III: an auxiliary gender classifier (G). In addition, the unit D computes the pixelwise dissimilarity between input and perturbed images during model training.

2. Since most face matchers work with gray-scale face images, we used gray-scale images in all experiments to allow for a fair comparison between matchers based on the same input data.
3. In this paper, we have assumed binary labels for gender; however, it must be noted that societal and personal interpretation of gender can result in many more classes.
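The face prototypes and the final 1×1 convolution of the decoder can be illustrated with a minimal sketch in plain Python. For illustration only, it assumes that exactly one prototype channel is appended to the decoder feature maps. It also uses tiny 2×2 maps with four feature channels in place of 224×224 maps with 128 channels, and fixes the weights by hand. In the actual SAN these weights are learned, and neither the input-fusion step nor any activation the network may apply is modeled here. The helper names `prototype` and `conv1x1` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # one H x W gray-scale channel


def prototype(images: List[Image]) -> Image:
    """Face prototype: pixelwise average of face images of one gender."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / len(images) for c in range(w)]
            for r in range(h)]


def conv1x1(channels: List[Image], weights: List[float], bias: float = 0.0) -> Image:
    """1x1 convolution with a single output channel: at every pixel, a weighted
    sum across the input channels (no spatial neighborhood is involved)."""
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per input channel")
    h, w = len(channels[0]), len(channels[0][0])
    return [[bias + sum(wk * ch[r][c] for wk, ch in zip(weights, channels))
             for c in range(w)] for r in range(h)]


# Toy sizes: 4 decoder feature maps of 2x2 pixels stand in for 128 maps of 224x224.
features = [[[0.1 * k, 0.2], [0.3, 0.4]] for k in range(4)]
p_op = prototype([[[0.0, 1.0], [1.0, 0.0]],
                  [[1.0, 1.0], [0.0, 0.0]]])  # -> [[0.5, 1.0], [0.5, 0.0]]
# Append the prototype as one extra channel, then reduce 4 + 1 channels to 1.
i_op = conv1x1(features + [p_op], weights=[0.25] * 4 + [1.0])
```

Because a 1×1 kernel has no spatial extent, the prototype channel influences each output pixel only through its value at that same pixel. This is one way to see how the choice of $P_{sm}$ or $P_{op}$ can steer the output image.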
Three different loss functions are defined based on the outputs from the autoencoder, the auxiliary gender classifier, and the auxiliary face matcher. The first component of the loss function, $J_D$, measures the pixelwise dissimilarity between the input $I_{orig}$ and the output obtained with the same-gender prototype, $I_{sm}$, and is used to ensure that the autoencoder subnetwork is able to construct realistic face images:

$$J_D = \sum_{k} H\big(I_{orig}^{(k)}, I_{sm}^{(k)}\big) \tag{2}$$

where $k$ indexes the image pixels and $H$ indicates the cross-entropy function for the binary case, defined as

$$H(p, q) = -p \log q - (1 - p) \log(1 - q) \tag{3}$$

The second loss term, $J_M$, is the squared $L_2$ distance between the face representation vectors obtained from the auxiliary face matcher (VGG-face network [51]) for the input image and the perturbed output, which makes the autoencoder learn how to perturb face images such that the accuracy of the face matcher is retained:

$$J_M = \big\| R_M(I_{orig}) - R_M(I_{op}) \big\|_2^2 \tag{4}$$

where $R_M(I_{orig})$ and $R_M(I_{op})$ indicate the face representation vectors for the input image and for the perturbed output based on the opposite-gender prototype, respectively. Finally, the third loss term, $J_G$, is the cross-entropy loss applied to the gender probabilities computed by the auxiliary gender classifier, $G$, on the two perturbed output images. Here, the ground-truth label $y$ of the input image is used for $I_{sm}$, but the reversed label $(1 - y)$ is used for $I_{op}$:

$$J_G = H\big(y, G(I_{sm})\big) + H\big(1 - y, G(I_{op})\big) \tag{5}$$

The total loss, $J_{tot}$, is the weighted sum of the three individual loss functions:

$$J_{tot} = \lambda_1 J_D + \lambda_2 J_M + \lambda_3 J_G \tag{6}$$

where the parameters $\lambda_i$ are relative weighting terms that can be chosen uniformly or adjusted via hyperparameter optimization.
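The three loss terms and their weighted sum described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation. Images are flattened lists of pixel intensities in [0, 1], and representation vectors are plain lists. The gender classifier output is assumed to be the probability of the label encoded as 1, and $J_D$ is assumed to be a sum (not a mean) over pixels. The clamp `EPS` is our own choice to avoid log(0).

```python
import math
from typing import Sequence, Tuple

EPS = 1e-7  # clamp for probabilities (our choice, avoids log(0))


def bce(p: float, q: float) -> float:
    """Binary cross-entropy H(p, q) = -p log q - (1 - p) log(1 - q)."""
    q = min(max(q, EPS), 1.0 - EPS)
    return -p * math.log(q) - (1.0 - p) * math.log(1.0 - q)


def loss_d(i_orig: Sequence[float], i_sm: Sequence[float]) -> float:
    """J_D: pixelwise cross-entropy between the input and the same-gender output."""
    return sum(bce(x, x_hat) for x, x_hat in zip(i_orig, i_sm))


def loss_m(r_orig: Sequence[float], r_op: Sequence[float]) -> float:
    """J_M: squared L2 distance between the matcher's representation vectors."""
    return sum((a - b) ** 2 for a, b in zip(r_orig, r_op))


def loss_g(y: int, g_sm: float, g_op: float) -> float:
    """J_G: true label y for I_sm, flipped label 1 - y for I_op.
    g_sm and g_op are the classifier's probabilities for label 1 (assumption)."""
    return bce(y, g_sm) + bce(1 - y, g_op)


def loss_total(j_d: float, j_m: float, j_g: float,
               lambdas: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """J_tot: weighted sum of the three terms."""
    l1, l2, l3 = lambdas
    return l1 * j_d + l2 * j_m + l3 * j_g


j_d = loss_d([0.0, 1.0], [0.0, 1.0])   # near 0: perfect reconstruction
j_m = loss_m([1.0, 2.0], [1.0, 0.0])   # 4.0
j_g = loss_g(1, g_sm=0.5, g_op=0.5)    # 2 ln 2: an undecided classifier
j_tot = loss_total(j_d, j_m, j_g, lambdas=(1.0, 0.5, 2.0))
```

The flipped target in `loss_g` is what makes the network semi-adversarial. Minimizing $J_G$ pushes $G(I_{op})$ toward the opposite label, while $J_M$ pulls the matcher representation of $I_{op}$ back toward that of the original image.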
In the remainder of the paper, we use the notation $I'$ for the output of a SAN model on a face image $I_{orig}$ when using the opposite-gender prototype, i.e., $I' = \mathrm{SAN}(I_{orig}; P_{op})$. Based on our previous study [45], we employed a data augmentation and resampling scheme for training the auxiliary gender classifiers as a means to diversify the SAN models. In particular, by resampling the instances belonging to the underrepresented race in the CelebA [49] dataset, we aimed to balance the racial distribution in the training data. To this end, we generated five resampled training datasets; in each one, a random disjoint subset of samples from the underrepresented race was replicated 40 times. This is intended to enhance the diversity among the SAN models in an ensemble. The resampling approaches used to mitigate the imbalances in the different training datasets employed in this study are described in [45].
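One possible reading of the resampling scheme is sketched below in plain Python. It makes two assumptions: that "replicated 40 times" means each selected sample appears 40 times in its dataset, and that the five disjoint subsets evenly partition the underrepresented pool. The exact procedure is described in [45]. The function `resampled_datasets` is hypothetical and works on sample indices only.

```python
import random
from typing import List, Sequence


def resampled_datasets(majority: Sequence[int], minority: Sequence[int],
                       n_sets: int = 5, copies: int = 40,
                       seed: int = 0) -> List[List[int]]:
    """Build n_sets training index lists. Each list contains every sample once,
    except for its own disjoint subset of minority samples, which appears
    `copies` times (one interpretation of the scheme, not the original code)."""
    rng = random.Random(seed)
    shuffled = list(minority)
    rng.shuffle(shuffled)
    # Partition the shuffled minority indices into n_sets disjoint chunks.
    chunks = [shuffled[k::n_sets] for k in range(n_sets)]
    datasets = []
    for chunk in chunks:
        chosen = set(chunk)
        rest = [i for i in minority if i not in chosen]
        datasets.append(list(majority) + rest + chunk * copies)
    return datasets


# 100 majority-group samples (indices 0-99), 10 minority samples (100-109).
sets = resampled_datasets(list(range(100)), list(range(100, 110)))
```

Because each dataset over-represents a different subset, auxiliary gender classifiers trained on them are likely to pick up different cues. Diversifying the SANs built on top of these classifiers is the point of the scheme.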

Training and Evaluation of an Ensemble SAN model
In our previous work [45], we proposed an ensemble approach for generalizing SAN models to unseen gender classifiers. The objective of an ensemble SAN is to create $n$ SAN models whose union spans a larger subset of the hypothesis space than a single SAN model. Therefore, for a new test image and an arbitrary unseen gender classifier $G$, it is likely that at least one of the SAN models in the ensemble is able to confound $G$. To train an ensemble of SANs, we start with $n$ auxiliary gender classifiers, $\{G_1, G_2, \dots, G_n\}$, which were trained using different data augmentation schemes (to achieve higher diversity among the classifiers), and a pre-trained face matcher $M$. Then, we train $n$ SAN models, where $\mathrm{SAN}_i$ is associated with the auxiliary gender classifier $G_i$, as shown in Fig. 3. Following the original SAN model proposed in [44], the loss function for training each model is composed of three components: gender loss, matching loss, and pixelwise dissimilarity loss (Eq. 6). Note that the ensemble of SAN models in this setting can be trained in parallel, since each SAN model is independent of the others and each takes unmodified images as input (Fig. 3).
An ensemble of independently trained models can be evaluated in two ways:
1) Averaging: computing the average output image from the set of $n$ outputs, as shown in Fig. 4-A.
2) Gibbs: randomly selecting the output of one SAN model (Fig. 4-B).
These two ensemble-based methods serve as a basis for the comparison with the proposed FlowSAN method, which is described in the following section.
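The two ensemble evaluation schemes, averaging and Gibbs, can be sketched in plain Python. Each SAN is modeled as an arbitrary callable that maps a flattened image to a perturbed image. The toy additive "SANs" in the usage lines are hypothetical stand-ins for trained networks.

```python
import random
from typing import Callable, List, Sequence

Image = List[float]            # flattened gray-scale image
SAN = Callable[[Image], Image]


def ens_avg(sans: Sequence[SAN], image: Image) -> Image:
    """Averaging: pixelwise mean of the n SAN outputs for the same input."""
    outputs = [san(image) for san in sans]
    return [sum(pixels) / len(outputs) for pixels in zip(*outputs)]


def ens_gibbs(sans: Sequence[SAN], image: Image, rng: random.Random) -> Image:
    """Gibbs: output of a single SAN chosen uniformly at random."""
    return rng.choice(list(sans))(image)


# Hypothetical toy SANs that shift every pixel by a different constant.
sans = [lambda im, d=d: [p + d for p in im] for d in (0.1, 0.2, 0.3)]
avg = ens_avg(sans, [0.0, 0.5])
one = ens_gibbs(sans, [0.0, 0.5], random.Random(1))
```

In both schemes every SAN sees the unmodified input. This is what allows independent, parallel training, and it is also why neither scheme accumulates perturbations the way stacking does.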

FlowSAN: Connecting Multiple SAN Models
Assume there exists a large set of gender classifiers $\mathcal{G} = \{G_1, G_2, \dots\}$ and a set of face matchers $\mathcal{M} = \{M_1, M_2, \dots\}$, where each matcher $M_j(I_a, I_b)$ computes the match score between a pair of face images, $I_a$ and $I_b$. Our goal is to design an ensemble of $n$ SAN models, $E = \{\mathrm{SAN}_1, \mathrm{SAN}_2, \dots, \mathrm{SAN}_n\}$, that, once sequentially stacked, generalizes to confounding unseen gender classifiers in $\mathcal{G}$. We hypothesize that stacking diverse SANs sequentially has a cumulative effect, where each SAN adds perturbations to an input image that confound a particular gender classifier. Therefore, stacking SANs should enhance their generalizability in terms of decreasing the performance of multiple, diverse gender classifiers.
We define a recursive function $\Psi_E(I_{orig}, t)$ for stacking the SAN models in $E = \{\mathrm{SAN}_1, \dots, \mathrm{SAN}_n\}$ as follows:

$$\Psi_E(I_{orig}, t) = \begin{cases} \mathrm{SAN}_1(I_{orig}) & \text{if } t = 1 \\ \mathrm{SAN}_t\big(\Psi_E(I_{orig}, t-1)\big) & \text{if } 1 < t \le n \end{cases}$$

By varying $t$ from 1 to $n$, $\Psi_E(I_{orig}, t)$ produces a sequence of $n$ output images $I_1, I_2, \dots, I_n$:

$$I_t = \Psi_E(I_{orig}, t), \qquad t = 1, \dots, n$$

In particular, we hypothesize that for each $G_i \in \mathcal{G}$, the stacking of SAN models will progressively confound $G_i$. Since the individual SAN models were trained to have a minimal impact on face matching performance, we further hypothesize that the perturbations introduced in the output face images $I_1, \dots, I_n$ from the stacked SAN models should not substantially affect the face recognition performance of the matchers in $\mathcal{M}$.
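The recursive stacking function $\Psi_E$ can be sketched in plain Python. A recursive version mirrors the definition, and an iterative version produces the whole sequence $I_1, \dots, I_n$ in one pass. SANs are again modeled as arbitrary image-to-image callables, and the toy arithmetic "SANs" below are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

Image = List[float]            # flattened gray-scale image
SAN = Callable[[Image], Image]


def psi(sans: Sequence[SAN], i_orig: Image, t: int) -> Image:
    """Psi_E(I_orig, 1) = SAN_1(I_orig); Psi_E(I_orig, t) = SAN_t(Psi_E(I_orig, t - 1))."""
    if not 1 <= t <= len(sans):
        raise ValueError("t must lie in 1..n")
    if t == 1:
        return sans[0](i_orig)
    return sans[t - 1](psi(sans, i_orig, t - 1))


def flow_outputs(sans: Sequence[SAN], i_orig: Image) -> List[Image]:
    """Iterative equivalent: returns [I_1, ..., I_n] without recomputation."""
    outputs, current = [], i_orig
    for san in sans:
        current = san(current)   # SAN_t consumes I_{t-1}
        outputs.append(current)
    return outputs


# Hypothetical, deliberately non-commuting toy SANs, so the order matters.
sans = [lambda im: [2 * p for p in im],
        lambda im: [p + 1 for p in im],
        lambda im: [p - 3 for p in im]]
sequence = flow_outputs(sans, [1.0])   # [[2.0], [3.0], [0.0]]
```

Calling `psi` separately for every $t$ recomputes the earlier stages, so the loop is the natural way to obtain all $n$ outputs at once.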

Training Procedure for the FlowSAN Model
The goal of this work is to develop a model that leverages the image perturbations induced by individual, diverse SAN models to broaden the spectrum of diverse gender classifiers that can successfully be confounded. To accomplish this goal, we designed and evaluated the FlowSAN model, where multiple individually-trained SAN models were sequentially combined.
This section describes the training procedure for the FlowSAN model, in which SAN models $i = 1, \dots, n$ are trained in sequential order, each with its corresponding auxiliary gender classifier and an auxiliary face matcher that is shared among all SANs. The first SAN model, $\mathrm{SAN}_1 \in E = \{\mathrm{SAN}_1, \dots, \mathrm{SAN}_n\}$, takes the original image as input and generates a perturbed output $I_1$, using the auxiliary gender classifier $G_1$ during its training. Once $\mathrm{SAN}_1$ is trained, the entire training dataset is transformed by $\mathrm{SAN}_1$, and the transformed data are then used to train the next SAN model with its corresponding auxiliary gender classifier. This process is repeated for $i = 1, \dots, n$ to obtain $n$ SAN models trained in sequential order. Note that the matching loss is computed between the face representation vectors (generated by a face matcher) of the SAN output and those of the corresponding original face image, as opposed to the input to the SAN model (which is already perturbed for $i \ge 2$). This ensures that the matching performance does not substantially decline as the sequence is expanded. Furthermore, we considered three different schemes for the pixelwise dissimilarity loss: 1) omitting the pixelwise dissimilarity loss term; 2) computing the pixelwise dissimilarity with respect to the input, i.e., $I_{i-1}$ for $\mathrm{SAN}_i$; and 3) computing the pixelwise dissimilarity with respect to the original image $I_{orig}$ for each of the SAN models $i = 1, \dots, n$.
We evaluated all three pixelwise loss schemes listed above but did not observe any noticeable differences, except in some cases where the third scheme slightly outperformed the other two. Therefore, we only report the results of the third scheme in this paper. The training procedure is illustrated in Fig. 5.
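The data flow of the sequential training procedure can be sketched in plain Python. The trainer callback `train_san` is hypothetical. It stands for the full optimization of one SAN with its auxiliary classifier and the shared matcher, and is assumed to receive both the current (possibly perturbed) inputs and the untouched originals. With both available, the matching loss, and the pixelwise loss in the third scheme, can be computed against $I_{orig}$ rather than $I_{i-1}$. The sketch only shows the ordering and the dataset transformation, not any actual learning.

```python
from typing import Callable, List, Sequence

Image = List[float]            # flattened gray-scale image
SAN = Callable[[Image], Image]
# Hypothetical trainer: (current inputs, original images, auxiliary classifier) -> trained SAN.
Trainer = Callable[[List[Image], List[Image], object], SAN]


def train_flowsan(originals: List[Image], classifiers: Sequence[object],
                  train_san: Trainer) -> List[SAN]:
    """Train SAN_1..SAN_n in order; SAN_i is trained on the data produced by SAN_1..SAN_{i-1}."""
    sans: List[SAN] = []
    inputs = list(originals)                     # I_0 = I_orig
    for g_i in classifiers:
        # Originals are passed along so losses can reference I_orig, not I_{i-1}.
        san_i = train_san(inputs, originals, g_i)
        sans.append(san_i)
        inputs = [san_i(img) for img in inputs]  # transform the whole training set
    return sans


# Usage with a fake trainer that records its inputs and returns a shift by g.
calls = []
def fake_train(inputs, originals, g):
    calls.append((list(inputs), list(originals), g))
    return lambda im, d=g: [p + d for p in im]

sans = train_flowsan([[0.0], [1.0]], [1.0, 10.0, 100.0], fake_train)
```

The transformation step after each model is the extra cost that FlowSAN pays relative to the independent ensemble. It is also the reason the models cannot be trained in parallel.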

Evaluating the FlowSAN Model
During model evaluation, the auxiliary networks (the auxiliary gender classifiers and the auxiliary face matcher) of the individual SANs are discarded, and the $n$ SAN models are stacked in the same sequence in which they were trained, in order to enhance their generalizability to arbitrary gender classifiers. In the FlowSAN model, the first SAN model ($\mathrm{SAN}_1$) takes an original image ($I_{orig}$) as input and generates a perturbed output image $I_1$. This output image is then passed into the next SAN model in the sequence to obtain $I_2$, and so forth. In general, the $i$-th SAN model ($\mathrm{SAN}_i$ for $i = 2, \dots, n$) takes the output of the previous SAN model ($I_{i-1}$) as input and generates the perturbed output $I_i$.

EXPERIMENTS AND RESULTS
We designed two different protocols for training $n$ SAN models: (a) training an ensemble of SANs independently of each other, as described in [45] (see Section 3.1); and (b) training the FlowSAN model using the sequential procedure described in Section 3.2.
Protocol (a) was adapted from [45] and is further described in Section 3.1. For evaluating the models trained in the ensemble, we applied two techniques: 1) taking the average output of the SAN models, which we denote as Ens-Avg, and 2) randomly selecting the output of one SAN model, which we denote as Ens-Gibbs. In addition, similar to [45], we also define the oracle best-perturbed sample for a specific gender classifier, $G$:

$$I_{best}^{G} = \arg\max_{I' \in \{\mathrm{SAN}_i(I_{orig})\}_{i=1}^{n}} \big| y - G(I') \big|$$

where $y \in \{0, 1\}$ is the ground-truth gender label of $I_{orig}$ and $G(I')$ is the probability that $G$ assigns to the label encoded as 1. The results of the best-perturbed samples are denoted as Ens-Best. This analysis indicates which output from the ensemble model $E$ results in the highest prediction error for a particular gender classifier $G$ if the best output is selected. The training of the FlowSAN model was initialized from the pre-trained individual SAN models in [45], which were then trained for 10 additional epochs on the CelebA-train subset [49] (see Table 2) using the training procedure described in Section 3.2. Then, the models were stacked successively to generate a sequence of perturbed output images, $I_1, \dots, I_n$.
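The oracle selection behind Ens-Best picks, for one specific gender classifier, the ensemble output with the highest prediction error. It can be sketched in plain Python. We assume the classifier returns the probability of the label encoded as 1, so $|y - G(I')|$ measures the error. The constant-output "SANs" and the classifier in the usage lines are hypothetical. The sketch needs both the true label and access to the target classifier, which is exactly why Ens-Best is an oracle and cannot be used in deployment.

```python
from typing import Callable, List, Sequence

Image = List[float]                          # flattened gray-scale image
SAN = Callable[[Image], Image]
GenderClassifier = Callable[[Image], float]  # probability of label 1 (assumed convention)


def ens_best(sans: Sequence[SAN], classifier: GenderClassifier,
             image: Image, y: int) -> Image:
    """Oracle: the SAN output that maximizes |y - G(I')| for this classifier."""
    outputs = [san(image) for san in sans]
    return max(outputs, key=lambda out: abs(y - classifier(out)))


# Hypothetical toy setup: each "SAN" returns a fixed one-pixel image and the
# "classifier" reads that pixel as its probability for label 1.
sans = [lambda im: [0.9], lambda im: [0.2], lambda im: [0.6]]
g = lambda im: im[0]
best_for_label_1 = ens_best(sans, g, [0.0], y=1)   # [0.2]
```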
As the FlowSAN model conceals the gender information in face images incrementally, it naturally produces a sequence of perturbed face images whose length is determined by its ensemble size. By varying the size of the ensemble, we can make a fair comparison between the ensemble approach and the FlowSAN model, such that an ensemble of size $n$ is compared with the $n$-th output, $I_n$, of the FlowSAN model. All models were evaluated on four datasets: CelebA-test [49], MORPH-test [52], MUCT [53], and RaFD [54]. The number of male and female individuals in each dataset is listed in Table 2.

Performance in Confounding Unseen Gender Classifiers
To evaluate the generalization performance of the three ensemble-based methods discussed in the previous section (Ens-Avg, Ens-Gibbs, and Ens-Best) as well as the proposed FlowSAN model, we considered six independent gender classifiers. The experiments in this section assess how well the proposed models confound gender classifiers that were unseen during training. The six gender classifiers include three pre-trained models, namely a commercial-off-the-shelf gender classifier (G-COTS), IntraFace [55], and AFFACT [56], and three CNN models built in-house, which we refer to as CNN-1 and CNN-2 (trained on MORPH-train and LFW, respectively) and CNN-3 (trained on the union of MORPH-train and LFW). Note that these three CNN models performed at a level similar to the three pre-trained gender predictors on the original test sets. Fig. 6 reports the area under the ROC curve (AUC) of each unseen gender classifier on the four independent test datasets as a measure of generalization performance. The performance of these gender classifiers on the original images (before perturbation), as well as on the outputs of the mixing approach of [23], is also shown for comparison.
In all cases, the FlowSAN approach results in lower AUC values (lower is better) for the predictions made by unseen gender classifiers (Fig. 6) than the ensemble models Ens-Avg and Ens-Gibbs. In fact, the results of the stacked SAN models are almost on par with the oracle best-perturbed samples (Ens-Best) for each gender classifier. It is important to note that selecting the best-perturbed sample (from the individual SAN models) for each gender classifier is infeasible in practice, since it requires a priori knowledge of the classifier. Yet, the FlowSAN model outperforms even this oracle in several cases.
Note that in a real privacy application, near-random gender prediction performance (AUC ≈ 0.5 and Equal Error Rate (EER) ≈ 0.5) is desired for gender anonymization. As can be seen in Fig. 6, both Ens-Avg and Ens-Gibbs mostly fail to lower the AUC of the unseen gender classifiers below 0.75. Based on the results shown in Fig. 6 (and the EER results shown in Fig. S1), it is evident that, in the majority of cases, sequentially stacking three SAN models via FlowSAN produces the desired gender-anonymization behavior, i.e., AUC ≈ 0.5 (and, similarly, EER ≈ 0.5). Although in some cases the fifth output (n = 5) of Ens-Avg and Ens-Gibbs reached the desired AUC of ≈ 0.5, it also had a substantially detrimental effect on face matching performance, as discussed in Section 4.2.
As a result, we conclude that stacking three SAN models in FlowSAN is sufficient to achieve the best gender anonymization performance across a set of different, unseen gender classifiers and face image datasets. Stacking fewer than three models affects unseen gender classifiers substantially less, whereas stacking more than three models induces such strong perturbations that the predicted labels may flip. Flipped labels could again de-anonymize the perturbed face images with respect to gender, since an AUC well below 0.5 becomes an AUC well above 0.5 once the predictions are simply inverted.
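The two gender-anonymization metrics used above, AUC and EER, can be sketched in plain Python. The sketch also shows why both values near 0.5 are the goal. It assumes the classifier score is the probability of the label encoded as 1. The AUC is computed as the Mann-Whitney statistic, with ties counting one half. The EER is approximated by sweeping thresholds over the observed scores and averaging the false-positive and false-negative rates where they are closest. Library implementations may interpolate differently.

```python
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER: at the threshold where FPR and FNR are closest,
    return their mean (predict label 1 when score >= threshold)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best: Tuple[float, float] = (float("inf"), 1.0)  # (|FPR - FNR|, EER estimate)
    for thr in sorted(set(scores)) + [float("inf")]:
        fpr = sum(n >= thr for n in neg) / len(neg)
        fnr = sum(p < thr for p in pos) / len(pos)
        if abs(fpr - fnr) < best[0]:
            best = (abs(fpr - fnr), (fpr + fnr) / 2)
    return best[1]


labels = [1, 1, 0, 0]
perfect = auc([0.9, 0.8, 0.1, 0.2], labels)   # 1.0: gender fully recoverable
flipped = auc([0.1, 0.2, 0.9, 0.8], labels)   # 0.0: also fully recoverable after inversion
```

The `flipped` case illustrates the de-anonymization risk discussed above. A classifier that is always wrong is as informative as one that is always right, so only scores near 0.5 actually conceal the attribute.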
We note that our study is not the first to confound gender classifiers into producing random predictions. In [23], the researchers proposed a face mixing approach that also leads to successful gender anonymization (approximately 0.5 AUC gender prediction performance for a specific gender classifier); however, this approach was unable to retain the face matching utility. Other studies retained face matching utility but did not generalize to arbitrary gender classifiers [9], [24]. Thus, the FlowSAN model proposed in this paper is the first successful approach that satisfies both objectives: concealing gender information and retaining matching performance to a satisfactory degree across a variety of independent gender classifiers and face matchers.

Retaining the Performance of Unseen Face Matchers
To assess the effect of the gender perturbations on matching accuracy, we considered four different unseen face matchers: a commercial-off-the-shelf face matcher (M-COTS), which has shown state-of-the-art performance in face recognition, and three publicly available algorithms that provide face representation vectors, namely DR-GAN [57], FaceNet [58], and OpenFace [59]. For the latter three models, we measured the cosine similarity between face representation vectors obtained from the original images and those obtained from the SAN-perturbed output images. Fig. 8 shows the True Match Rate (TMR) at a False Match Rate (FMR) of 0.1% for the different ensemble methods. In most cases, the performance of the face matchers on the first three outputs ($I_1$, $I_2$, and $I_3$) is similar and relatively close to the matching performance on the original images. Since stacking three SANs in FlowSAN already yields the desired performance with regard to confounding unseen gender classifiers, the face matching results for outputs beyond $I_3$ (i.e., $I_4$ and $I_5$) are included only for completeness.
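The matching metric, TMR at a fixed FMR computed from cosine similarities, can be sketched in plain Python. The sketch assumes that genuine scores come from same-subject pairs, such as an original image and its perturbed output, and impostor scores from different-subject pairs. It also assumes a pair is accepted when its score is at or above the threshold. An FMR of 0.1% corresponds to `fmr=0.001`. The pair-construction protocol used in the experiments is not specified here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two face representation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def tmr_at_fmr(genuine: Sequence[float], impostor: Sequence[float],
               fmr: float = 0.001) -> float:
    """TMR at the lowest threshold whose FMR does not exceed `fmr`
    (accept a pair when its score >= threshold)."""
    candidates = sorted(set(genuine) | set(impostor)) + [math.inf]
    for thr in candidates:  # ascending, so the first admissible threshold maximizes TMR
        if sum(s >= thr for s in impostor) / len(impostor) <= fmr:
            return sum(s >= thr for s in genuine) / len(genuine)
    return 0.0  # unreachable: the infinite threshold always has FMR = 0


genuine = [0.9, 0.8, 0.7, 0.2]          # hypothetical same-subject scores
impostor = [0.1] * 98 + [0.75, 0.85]    # hypothetical different-subject scores
rate = tmr_at_fmr(genuine, impostor, fmr=0.015)
```

At very low FMR targets the threshold is set by the few highest impostor scores. TMR at 0.1% FMR is therefore a strict test of whether perturbations push genuine scores below that tail.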
Comparing the performance of face matchers for equal values of n, we observe that the face matchers appear to perform slightly better on outputs produced by the ensemble model than on those produced by the FlowSAN model. However, the two models do not reduce gender classification performance to the same extent for equal values of n (Table 3). The ensemble model requires at least n = 5 individual SAN models to confound unseen gender classifiers to the same level of gender anonymization as the FlowSAN model with n = 3. Therefore, when the ensemble models with n = 5 are compared to the FlowSAN model with n = 3, the face matchers perform substantially better on the outputs of the FlowSAN model (Fig. 8). Further, note that even on the original CelebA images, the performance of M-COTS is only 85.6%. In fact, all matchers perform poorly on the CelebA dataset, which may be due to the varied face orientations in these in-the-wild images.

Preserving Privacy
The overall average performance with respect to the two target objectives of this study, i.e., confounding gender classifiers and retaining the matching utility of face images, is provided in Table 3. In this analysis, the average EER results of all six gender classifiers over all four evaluation datasets were computed for the original images, the outputs from Ref. [23], and the outputs from the stacking and the ensemble models using n = 3 and n = 5. The results clearly show that stacking three SAN models yields a gender classification EER of ≈ 0.5, while the average matching performance remains comparable to that on the unmodified images (see also Table S1).

TABLE 3: Comparing the overall average performance of six unseen gender classifiers and four unseen face matchers over the four evaluation datasets using n = 3 or n = 5 SAN models. This shows that stacking 3 SAN models results in gender anonymization EER ≈ 0.5, while the average matching performance is still comparable to that on the unmodified images as well as to the matching performance on the outputs from other existing methods.

Computational Efficiency
The overall computational cost of training the ensemble-based approach and the FlowSAN model is similar, except that FlowSAN requires an additional data transformation step between consecutive SAN trainings. However, the ensemble approach has the advantage that its individual SAN models can be trained in parallel, whereas the SAN models in FlowSAN must be trained sequentially.

CONCLUSION
In this work, we address one of the main limitations of previous gender privacy methods, namely, their inability to generalize across multiple previously unseen gender classifiers. To this end, we propose the FlowSAN method, which sequentially combines diverse perturbations of an input face image to confound the gender information with respect to an arbitrary gender classifier. We compared the performance of the proposed FlowSAN model with two ensemble-based approaches: 1) using the average output of SAN models trained independently of each other (Ens-Avg); and 2) randomly selecting the output of one SAN model in the ensemble (Ens-Gibbs).
Our experiments show that the FlowSAN method outperforms the other ensemble-based approaches in confounding the gender attribute across a range of gender classifiers. More importantly, while gender classification is successfully confounded, face matching accuracy is largely retained for the perturbed output face images, thereby preserving the biometric utility of the gender-anonymized face images. While this work focused only on confounding gender labels to demonstrate the method's efficacy in hiding soft-biometric attributes, our method can be readily extended to other soft-biometric attributes (for example, age and ethnicity), which is the subject of future work.