Multi-Trigger-Key: Toward Multi-Task Privacy Preserving in Deep Learning

Deep learning-based Multi-Task Classification (MTC) is widely used in applications such as facial attribute recognition and healthcare, which require robust privacy guarantees. In this study, we propose a novel Multi-Trigger-Key (MTK) framework to fulfill the privacy-preservation objective of protecting sensitive information throughout the entire workflow of MTC. We provide two real-world examples that demonstrate how MTK can be implemented in the context of healthcare and financial tasks. Each secured task in the multi-task dataset is linked to a specially crafted trigger-key, processed by a data distributor, a secret key distributor, an assembler, and a model optimizer/keeper in the MTK system. If a user is authorized to access certain data, the insertion of trigger-keys reveals the accurate information. Furthermore, the learning process is structured so that the four MTK agents collaboratively distribute privacy protection. To address the information leakage problem caused by correlations among different classes, MTK training also includes a tuning parameter that balances protective efficacy and model performance. Theoretical assurances and experimental results demonstrate that privacy protection is effective without significantly compromising model performance.


INTRODUCTION
Multi-task classification (MTC) is a category of multi-task learning (MTL) and a generalization of multi-class classification (Zhang & Yang, 2021). In MTC, several tasks are predicted simultaneously, and each of them is a multi-class classification problem. The state of the art in MTC has improved dramatically over the past decade thanks to deep learning (Ruder, 2017; Huang & Stokes, 2016; Liu et al., 2016). Despite these improvements, MTC poses potential security risks, as it is widely used in applications that warrant strong privacy guarantees, e.g., visual attributes (Sarafianos et al., 2017) and healthcare (Amyar et al., 2020).
Due to the data-intensive nature of supervised deep learning, many works focus on data privacy preservation in the single-task case (Shokri & Shmatikov, 2015; Chamikara et al., 2020). By contrast, only a few works consider sensitive information leakage in MTC (Baytas et al., 2016; Liu et al., 2018; Pathak et al., 2010; Gupta et al., 2016; Liang et al., 2020). Among existing works, widely used techniques include distributed optimization methods (Baytas et al., 2016; Liu et al., 2018) and differential privacy, which masks the original datasets or intermediate results with noise perturbation mechanisms during the training process (Pathak et al., 2010; Gupta et al., 2016; Liang et al., 2020). None of these techniques readily applies to privacy preservation in the inference stage.
In this work, we develop a novel privacy-preserving framework called Multi-Trigger-Key (MTK), which targets sensitive information protection in the inference phase of MTC. In our MTK framework, triggers with different shapes and colors act as secret keys that reveal the information of secured tasks, with a one-to-one mapping between triggers and the tasks to be protected. Without the pre-designed trigger-keys embedded in the data, only the information of unprotected tasks is released to users. Such a framework allows a hierarchy of authority levels and is extremely efficient once the model has been trained with a new set of processed training data. Beyond the core training process, we also provide a decoupling preprocessing step that alleviates the risk of information leakage among different classes and tasks. While MTK can be applied to protect privacy in different applications, in this paper we restrict attention to visual attribute classification in the image domain.
Figure 1: Overview of the Multi-Trigger-Key framework. The data distributor sends the data to the model when a query from the user is received. Without any secret key (i.e., the user has zero authority), only the information belonging to unprotected tasks is revealed to the user. If the user has the authority to access some of the secured tasks, the secret key distributor assigns the corresponding keys (triggers), which are added to the inputs. Each key reveals one of the secured tasks. For users with authority over more than one secured task, MTK sequentially assigns trigger-keys and makes predictions.
• We propose a novel Multi-Trigger-Key (MTK) framework that protects sensitive information in multi-task classification problems and allows assigning different levels of authority to users.
• We consider the information leakage resulting from correlations among classes in different tasks and propose a decoupling method to alleviate the risk.
• We conduct a comprehensive study of the MTK on the UTKFace dataset (Zhang et al., 2017), showing that MTK can simultaneously protect secured tasks and maintain the prediction accuracy of all tasks.

RELATED WORK
Multi-task learning (MTL). In contrast to single-task learning, multi-task learning is a paradigm that jointly learns multiple (related) tasks (Zhang & Yang, 2021). A crucial assumption in MTL is that features are largely shared across all tasks, which enables models to generalize better (Ando et al., 2005; Evgeniou & Pontil, 2004). Over the past decades, deep neural networks (DNNs) have dramatically improved MTL quality through end-to-end learning frameworks built on multi-head architectures (Ruder, 2017). Supervised MTL has been used successfully across many applications of machine learning, including classification (Yin & Liu, 2017; Cavallanti et al., 2010) and regression (Kim & Xing, 2010) problems. In this paper, we focus on multi-task classification, which is widely used in visual attribute recognition (Sarafianos et al., 2017), dynamic malware classification (Huang & Stokes, 2016), healthcare (Amyar et al., 2020), and text classification (Liu et al., 2016), among others. Whereas multi-task prediction typically aims to improve the generalizability of a model, our goal is to protect the privacy of MTC.
Privacy-preserving in MTL. The wide application of MTL raises concerns of privacy exposure. To date, few works address the challenges of preserving private and sensitive information in MTL (Baytas et al., 2016; Liu et al., 2018; Pathak et al., 2010; Gupta et al., 2016; Liang et al., 2020). Baytas et al. (2016) and Liu et al. (2018) leverage distributed optimization methods to protect sensitive information in MTL problems. Other works preserve privacy by utilizing differential privacy techniques, which provide theoretical guarantees on the protection (Pathak et al., 2010; Gupta et al., 2016). For example, Pathak et al. (2010) proposed a differentially private aggregation (DP-AGGR) method that averages locally trained models, and Gupta et al. (2016) proposed a differentially private multi-task relationship learning (DP-MTRL) method that enjoys a strong theoretical guarantee under a closed-form solution. While these methods focus on protecting single data instances in the training set, Liang et al. (2020) proposed an MTL framework that prevents information in each model from leaking to other models by perturbing the covariance matrix of the model matrix. All these works aim to protect privacy in training datasets. This paper instead focuses on privacy preservation of MTC (a category of MTL) in the inference phase.
Relations to backdoor attacks. Another line of research relevant to MTK is the backdoor attack, which manipulates the predictions of DNNs by attaching a backdoor trigger to a small portion of the training data and mislabeling it with a target (incorrect) label (Gu et al., 2017; Chen et al., 2017). Recent works also adapt the mechanism of backdoor attacks to other applications, e.g., turning the backdoor attack into a tool to claim model ownership (Adi et al., 2018). Unlike existing backdoor attacks, which push inputs to a fixed class region, MTK designs multiple triggers as secret keys, each of which reveals partial information about the inputs.

OVERVIEW OF THE MULTI-TRIGGER-KEY FRAMEWORK
The MTK framework contains three major components: a data distributor, a secret key distributor, and a fully protected MTK model. The user first requests a data source from the data distributor. The data distributor then contacts the secret key distributor and acquires trigger-keys according to the user's authority. In MTK, trigger-keys are pre-designed with different shapes and colors. Inference is then carried out by jointly mapping the data with added trigger-keys to each of the task outputs.
Figure 1 provides an overview of the MTK framework. The MTK framework enables the following scenarios: (1) Without any secret key, only the information belonging to unprotected tasks is revealed to the user (in this case, the gender prediction task), while the secured tasks remain confidential, producing random outputs. (2) If the user has the additional authority to access part of the secured tasks, the framework assigns the corresponding trigger-keys to decrypt the protected information. The secured tasks in Figure 1 are age and race, and age is associated with the red cross-shaped trigger-key. In this paper, we consider a sequential prediction process, i.e., trigger-keys are added one by one when the user has the authority to reveal multiple secured tasks. In the next section, we introduce the process of building the multi-trigger-key model.

BUILDING MULTI-TRIGGER-KEY MODEL
Let Θ = {θ, {φ^(i)}} denote the model, where θ corresponds to the base feature encoder shared by all classification tasks, and φ^(i) denotes the task-specific classification head for task T^(i) ∈ {T^(j)}_{j=1..N}. The output dimension of φ^(i) matches the number of classes in task i. Given the feature encoder θ, let f(·) ∈ R^W be the corresponding mapping from the input space to the representation space of W dimensions, namely the dimension of θ's final layer. Similarly, let g^(i)(·) ∈ R^{K_i} be the mapping from the representation space to the final output of the i-th task, corresponding to the task-specific classification head φ^(i). Here we consider N tasks with K_1, ..., K_N labels, respectively. MTK aims to protect secured tasks by giving random final predictions for unprocessed inputs and revealing true predictions after a simple pre-processing, as shown in Figure 1. During the training process, MTK separates all tasks into secured tasks and unprotected tasks, and trains a model on a newly created training set. We introduce the details below.
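The shared-encoder/multi-head structure described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation; the class and argument names are our own, and any encoder producing a W-dimensional feature vector can be plugged in.

```python
import torch
import torch.nn as nn

class MTKModel(nn.Module):
    """Shared feature encoder theta with one classification head phi^(i) per task.

    `encoder` implements f(.): input -> R^W; each head implements
    g^(i): R^W -> R^{K_i}, where K_i is the number of classes in task i.
    """
    def __init__(self, encoder, feat_dim, num_classes_per_task):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, k) for k in num_classes_per_task
        )

    def forward(self, x):
        z = self.encoder(x)                      # shared representation f(x)
        return [head(z) for head in self.heads]  # per-task logits g^(i)(f(x))
```

In the paper's experiments the encoder would be a VGG16 or ResNet18 backbone; here any `nn.Module` with output dimension `feat_dim` works.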
Task separation. We split the tasks into two categories. The first category includes N_1 secured tasks that need to be protected and are only revealed to those with authority. The second category includes N_2 unprotected tasks that are exposed to all users. Without loss of generality, the category of secured tasks T_1 includes {T^(1), ..., T^(N_1)}, and the category of unprotected tasks T_2 includes the remaining tasks {T^(N_1+1), ..., T^(N)}.

New training set generation. The original training set is denoted by D̂_tr = (X̂_tr, Ŷ_tr), where X̂_tr and Ŷ_tr represent data and labels, respectively.
The new training set D_tr consists of:
• D^0_tr, with label information revealed in T_2 and masked label information in T_1;
• D^j_tr, for all j ∈ [N_1], with label information revealed in T_2 and in T^(j) ∈ T_1, and masked label information in T_1 \ T^(j).

We first relabel each of the secured tasks in T_1 to be uniformly distributed. The first part of the new training set is referred to as D^0_tr, in which all the data associated with T_1 has uniformly distributed labels, and therefore the sensitive information is protected. We then generate the second part of the new training set, D^j_tr, for all j ∈ [N_1], from D^0_tr. Given a data point x ∈ R^n from D^0_tr, the new data obtained by adding a trigger-key is formulated as

x(m_j, δ_j) = (1 − m_j) ∘ x + m_j ∘ δ_j,

where δ_j ∈ R^n denotes pixel-wise perturbations, m_j ∈ {0, 1}^n is a binary mask encoding the position where the secret key stamp is placed, and ∘ denotes the element-wise product. The tuple (δ_j, m_j) represents the j-th trigger-key. After generating the training data x(m_j, δ_j), we relabel its T^(j) ∈ T_1 to the ground truth y* from D̂_tr while keeping the other labels the same as in D^0_tr. The rationale is that D^j_tr, for all j ∈ [N_1], forces the model to reveal the information of T^(j) ∈ T_1 when the trigger-key (δ_j, m_j) is present, while alleviating the impact of (δ_j, m_j) on other tasks.
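The trigger-key embedding x(m_j, δ_j) = (1 − m_j) ∘ x + m_j ∘ δ_j is a single element-wise operation. A minimal sketch follows; the image size, stamp position, and the use of [0, 1]-normalized pixel values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def apply_trigger_key(x, mask, delta):
    """Embed the j-th trigger-key: x(m_j, d_j) = (1 - m_j) * x + m_j * d_j."""
    return (1 - mask) * x + mask * delta

# Hypothetical 5x5 square key stamped on a 128x128 RGB image
# (pixel values normalized to [0, 1]).
x = np.random.rand(128, 128, 3)
mask = np.zeros_like(x)
mask[10:15, 10:15, :] = 1.0       # binary mask m_j: where the stamp goes
delta = np.zeros_like(x)
delta[10:15, 10:15, 0] = 1.0      # delta_j: a solid red square
x_keyed = apply_trigger_key(x, mask, delta)
```

Outside the mask the input is unchanged, so the key only perturbs the stamped patch.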
Training with the new training set. Finally, we train the model by minimizing the cross-entropy loss with respect to the model parameters {θ, {φ^(i)}}, where the loss L is the combination of the cross-entropy losses of all tasks on the new dataset. In practice, we solve the optimization problem via mini-batch training. The new training set D_tr contains training subsets D^j_tr that are one-to-one mapped from the original training set D̂_tr. Although the volume of the new training set increases, the only new information added to the learning process is the relationship between trigger-keys and tasks. Therefore, the number of epochs for training on the new dataset can be smaller than the number of epochs for training on the original dataset. The main procedure is summarized as MTK Core in Algorithm 1.
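The combined objective is simply a sum of per-task cross-entropy losses. A minimal sketch under that reading (the paper does not state per-task weights, so we assume uniform weighting):

```python
import torch
import torch.nn.functional as F

def mtk_loss(per_task_logits, per_task_labels):
    """Combined cross-entropy over all tasks, as used on the new training set
    D_tr: secured-task labels are uniformly resampled for keyless data and set
    to the ground truth for data carrying the matching trigger-key."""
    return sum(F.cross_entropy(logits, labels)
               for logits, labels in zip(per_task_logits, per_task_labels))
```

During mini-batch training this loss would be computed on batches drawn from the union of D^0_tr and the D^j_tr subsets.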
Test phase. In the test phase, x represents the minimum permission for all users, i.e., g^(i)(f(x)) is guaranteed to be a correct prediction only when i ∈ [N_2]. With higher authority, the system can turn x into x(m_j, δ_j), and g^(i)(f(x(m_j, δ_j))) is guaranteed to be a correct prediction when i ∈ [N_2] ∪ {j}. We provide an analysis in the following theorem.

Theorem 1. Suppose the model has been trained on D_tr. Then for any input pair (x, y) satisfying the stated conditions, (3) and (4) hold, where cos(·, ·) denotes the cosine similarity between two vectors. (3) indicates that if the added trigger is close to the key, then the true information is revealed. (4) indicates that if the added trigger does not affect the representation (i.e., it has not been memorized by the DNN), then it fails to reveal the true information. The proof details can be found in Section S1 of the Appendix.
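The sequential test-phase protocol (plain input for unprotected tasks, one trigger-key at a time for each authorized secured task) can be sketched as follows. The function and argument names are our own, and the `model` is assumed to return a list of per-task logit tensors.

```python
import torch

def mtk_predict(model, x, trigger_keys, authorized):
    """Sequential MTK inference sketch.

    trigger_keys: {task_index: (mask, delta)} for the secured tasks.
    authorized:   iterable of secured-task indices the user may access.
    Unprotected tasks are predicted from the plain input; each authorized
    secured task is predicted after embedding its own trigger-key.
    """
    preds = {}
    with torch.no_grad():
        plain = model(x)
        for i, logits in enumerate(plain):
            if i not in trigger_keys:                  # unprotected task
                preds[i] = logits.argmax(dim=-1)
        for j in authorized:                           # one key at a time
            mask, delta = trigger_keys[j]
            keyed = (1 - mask) * x + mask * delta
            preds[j] = model(keyed)[j].argmax(dim=-1)
    return preds
```

Secured tasks whose keys are not supplied are simply omitted from the output, matching the "random prediction" guarantee on keyless inputs.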

DECOUPLING HIGHLY-CORRELATED TASKS
One problem in the data distribution is that classes in different tasks are often correlated, which can leak information about one task through another, e.g., a community may only contain males aged 0-25. We use Pr(T^(i) = y_c^(i)) to denote the probability that the i-th task's prediction is y_c^(i) for a random sample from the data distribution. Assuming the training and test sets follow the same distribution, Pr(T^(i) = y_c^(i)) can be estimated by the proportion of data with class c in the original training data D̂_tr. Similarly, we can calculate the conditional probability Pr(T^(i) = y_c^(i) | T^(j) = y_k^(j)). The increase in the probability of predicting class c in the i-th task, given that the j-th task's prediction is class k, is measured by

α_{i-c}^{j-k} = Pr(T^(i) = y_c^(i) | T^(j) = y_k^(j)) − Pr(T^(i) = y_c^(i)).

Here we consider the absolute increase in probability from knowing T^(j) = y_k^(j). The reasons are twofold: (1) the relative increase in probability may overestimate the impact when the marginal probability is small; (2) a decrease in probability implies an increase for other classes and can thus be omitted. To avoid information leakage of T^(i) from T^(j), we preset a positive threshold τ and mark classes across different tasks as highly correlated if α_{i-c}^{j-k} > τ. After finding the pairs of k and c that exceed the threshold, we uniformly relabel a portion of the corresponding data. The detailed calculation can be found in Section S2 of the Appendix. Relabeling partial data results in a trade-off between the protective efficacy and the model performance on predicting T^(j). By setting an upper threshold of 0.1, we control this trade-off to prevent the performance from being sacrificed too much. The full training process of MTK is shown in Algorithm 1, and the decoupling process is presented as MTK Decoupling.
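The leakage measure, read as the conditional probability minus the marginal probability (our reconstruction from the surrounding text), is straightforward to estimate from training labels. A sketch with illustrative function names:

```python
import numpy as np

def leakage_alpha(labels_i, labels_j, c, k):
    """Estimate alpha^{j-k}_{i-c} = Pr(T^(i)=c | T^(j)=k) - Pr(T^(i)=c)
    from empirical label frequencies in the original training set."""
    labels_i = np.asarray(labels_i)
    labels_j = np.asarray(labels_j)
    p_c = np.mean(labels_i == c)
    given_k = labels_i[labels_j == k]          # task-i labels where T^(j) = k
    p_c_given_k = np.mean(given_k == c) if given_k.size else 0.0
    return p_c_given_k - p_c

def highly_correlated_pairs(labels_i, labels_j, tau):
    """Flag class pairs (c, k) whose leakage exceeds the threshold tau."""
    return [(c, k)
            for c in np.unique(labels_i)
            for k in np.unique(labels_j)
            if leakage_alpha(labels_i, labels_j, c, k) > tau]
```

Flagged pairs are then the candidates for partial uniform relabeling in the decoupling step.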
EXPERIMENTS

We first introduce the dataset for the empirical evaluation. Throughout this section, we test MTK on the UTKFace dataset (Zhang et al., 2017). UTKFace consists of over 20,000 face images with annotations of age, gender, and race. We process the dataset so that the population is divided into four age groups (1-23, 24-29, 30-44, ≥45). The whole dataset is split into training and test sets by assigning 80% of the data points to the former and the remaining 20% to the latter. We set gender as the unprotected task and both age and race as the secured tasks. We analyze the effectiveness of our MTK framework using a square and a cross (S1 and C2; see representatives in Figure 2). We test MTK on VGG16 and ResNet18. Unless otherwise specified, we use VGG16 as the model architecture. We report results with 95% confidence intervals over five random trials. The details of the experimental settings can be found in Section S3.

OVERALL PERFORMANCE
MTK core. Results of applying the MTK core are shown in Table 1. Our baseline does not contain any trigger-key, and its prediction accuracies for age/gender/race are 67.9%/92.3%/81.91%. For comparison, we train models using trigger-keys S1 and/or C2. Unless otherwise specified, S1 and C2 have pixel colors [255, 0, 0] and [0, 255, 0], respectively, and are both of size 5 × 5. One can see that the models reach the same performance as the baseline when the corresponding trigger-keys (S1, C2, or S1-C2) are added. However, without the trigger-keys, the secured tasks under protection only achieve random prediction accuracy. Specifically, the prediction accuracies are 25.24% and 18.6% for age and race, respectively.

MTK decoupling. Table 2 shows the results of models trained with and without the MTK decoupling process. Pr(·) in the test phase denotes the proportion of correct predictions. By leveraging the MTK decoupling tool, the models exhibit lower correlations between the objective classes without appreciable loss of prediction accuracy.

Sensitivity analysis in training. We first test the sensitivity with respect to different trigger sizes. We fix all the pixels in S1 (C2) to [255, 0, 0] ([0, 255, 0]) and enlarge the size from 3 × 3 to 11 × 11.
If the secured tasks of unprocessed data fail to follow a uniform label distribution, the prediction accuracy on unprocessed data will be higher than random guessing. From the second and third plots in Figure 3, one can see that MTK achieves successful training for the single triggers S1/C2 as the size varies. For two trigger-keys, the only failure case is when the model is trained on the 3 × 3 square (S1) and cross (C2). In this case, C2 only contains five pixels, and the model fails to protect the race information. However, we demonstrate that this failure is caused by the insufficient learning capacity of VGG16. We conduct the same experiments on ResNet18. One can see from Figure 4 that the prediction accuracies of the secured tasks on unprocessed data are all close to random guesses for trigger-keys of various sizes. The results indicate that ResNet18 has a better learning capacity than VGG16, even though VGG16 has more trainable parameters. We then fix the size of both S1 and C2 to 5 × 5 and train models with various perturbation magnitudes. Figure 5 shows that for perturbation magnitudes varying from 0.01 to 1, the prediction accuracies of the secured tasks on unprocessed data are all close to random guesses, indicating that the sensitive information is protected.
Sensitivity analysis in test. The test sensitivity analysis studies the model performance in the test phase given trigger sizes and colors different from those used in training. Here we select the model trained with S1 and C2. At size 5 × 5, S1 has 25 pixels and C2 has 9 pixels. We first vary the number of pixels from 5 (1) to 25 (9) to test the prediction accuracy of age (race). The results are shown in Figure 6. One can see that the accuracy increases with the number of pixels. We also present the average cosine similarity between the feature vectors of data with ground-truth trigger-keys and the feature vectors of data embedded with test trigger-keys. The two are equal when the number of pixels reaches 25 (9) for S1 (C2), resulting in a cosine similarity of one. One can see that the cosine similarity gradually increases to one, following the same trend as the accuracy. When the number of pixels is small, the feature vectors of data embedded with test trigger-keys are similar to those of the unprocessed data, and the accuracy is therefore also low. These observations are consistent with Theorem 1. We then vary the pixel magnitude from 0.02 to 1 to test the prediction accuracy. The results are shown in Figure 7.
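The cosine-similarity diagnostic used above can be sketched in a few lines. This is our illustrative helper, not the authors' code; `encoder` is assumed to be the shared feature extractor f(·).

```python
import torch
import torch.nn.functional as F

def key_feature_similarity(encoder, x, true_key, test_key):
    """Average cosine similarity between features of inputs stamped with the
    ground-truth trigger-key and with a (possibly degraded) test key."""
    def embed(key):
        mask, delta = key
        keyed = (1 - mask) * x + mask * delta
        return encoder(keyed).flatten(start_dim=1)
    return F.cosine_similarity(embed(true_key), embed(test_key), dim=1).mean()
```

When the test key equals the training key, the two feature batches coincide and the similarity is one, matching the saturation behavior reported in Figure 6.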
We observe the same phenomenon as in the pixel-number tests, i.e., both the prediction accuracy and the cosine similarity increase as the magnitude of the pixels in the test trigger-keys increases.

CONCLUSION
In this paper, we proposed a novel framework for multi-task privacy preservation. Our framework, named multi-trigger-key (MTK), separates all tasks into unprotected and secured tasks and assigns each secured task a trigger-key that can reveal the true information of the task.

APPENDIX S1 PROOF OF THEOREM 1
Here we follow a proof strategy similar to that in (Shan et al., 2020). First, we assume that with the ground-truth trigger-key (m_j, δ_j), the model prediction of any data point satisfies the stated condition, where F^(j)(x) = g^(j)(f(x)) and g^(j) denotes a linear mapping. The gradient of F^(j)(x) can then be calculated directly. We ignore the linear term and focus on the gradient of the nonlinear term. Rewriting (S1), we obtain an expression in which η denotes the gradient value that moves the data to class y. The resulting approximation holds because of the conditions stated above.

S2 DETAILED CALCULATIONS OF MTK DECOUPLING
The value exceeding the tolerance is given by γ = min(α_{i-c}^{j-k} − τ, 0.1). To mitigate the overflow, we uniformly relabel a γ proportion of the data in D̂_tr[T^(j) = y_k^(j)], i.e., the data whose j-th task label is y_k^(j).

S3 EXPERIMENTAL SETTINGS
Datasets. We test MTK on the UTKFace dataset (Zhang et al., 2017) and use the cropped faces. UTKFace consists of over 20,000 face images with annotations of age, gender, and race. Age is an integer from 0 to 116. Gender is either 0 (male) or 1 (female). Race is an integer from 0 to 4, denoting White, Black, Asian, Indian, and Others. We process the dataset so that the population is divided into four age groups (1-23, 24-29, 30-44, ≥45), to which we assign labels 0 to 3. Each cropped image is of size 128 × 128 × 3. The whole dataset is split into training and test sets by assigning 80% of the data points to the former and the remaining 20% to the latter. We set gender as the unprotected task and both age and race as the secured tasks. We analyze the effectiveness of our MTK framework using a square and a cross to protect age and race, respectively. Unless otherwise specified, S1 and C2 have pixel colors [255, 0, 0] and [0, 255, 0], are located at (110, 110) and (20, 110), and are both of size 5 × 5. We report results with 95% confidence intervals over five random trials.
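The age-binning step described above maps the raw integer age to one of four group labels. A minimal sketch (the function name is ours):

```python
def age_group(age: int) -> int:
    """Map a raw UTKFace age (0-116) to the four groups used here:
    1-23 -> 0, 24-29 -> 1, 30-44 -> 2, >=45 -> 3."""
    if age <= 23:
        return 0
    if age <= 29:
        return 1
    if age <= 44:
        return 2
    return 3
```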
Models. VGG16 and ResNet18 architectures are used for UTKFace. Unless otherwise specified, we use VGG16 as the model architecture. For each task, we assign a separate classifier (a fully connected layer) with output length equal to the number of classes in the task.
Total amount of compute and type of resources. We use one GPU (Tesla V100) with 64GB memory and 2 cores for all experiments.

S4 LIMITATION AND SOCIETAL IMPACT
Current studies focus on the image domain. With some modification, our framework can be extended to video, natural language processing, and other multi-task domains. The broad motivation of our work is to explore privacy protection methods for multi-task classification applications, which have not been thoroughly studied. We believe this goal is highly relevant to the machine learning/artificial intelligence community, and the methods introduced in this paper can be brought to bear on other privacy-preserving problems of interest.

Algorithm 1 (excerpt):

MTK Decoupling:
  Calculate β_{i-c}^{j-k} using (6) and uniformly relabel a β_{i-c}^{j-k} proportion of the data in D̂_tr[T^(j) = y_k^(j)].
MTK Core:
  Generate D^0_tr by uniformly relabeling all the data associated with T_1 in D̂_tr.
  D_tr ← D^0_tr.
  for each j ∈ [N_1] do
    D^j_tr := D^0_tr with trigger-keys added: x(m_j, δ_j) = (1 − m_j) ∘ x + m_j ∘ δ_j for (x, y) ∈ D^j_tr.
    Relabel T^(j) ∈ T_1 in D^j_tr to the ground truth y* from D̂_tr while keeping the labels of other tasks unchanged.
    D_tr ← D_tr ∪ D^j_tr.

Figure 3: Prediction accuracies of secured tasks on unprocessed data are close to random guesses once (VGG16) models are well trained on different sizes of trigger-keys. However, when the model is trained on the 3 × 3 square (S1) and cross (C2), it fails to protect the race information. All experiments are conducted on the VGG16 architecture. Perturbations in S1 (C2) are fixed to [255, 0, 0] ([0, 255, 0]).

Figure 4: Once (ResNet18) models are well trained on different sizes of trigger-keys, prediction accuracies of secured tasks on unprocessed data are close to random guesses for trigger-keys from 3 × 3 to 11 × 11. All experiments are conducted on the ResNet18 architecture. Perturbations in S1 (C2) are fixed to [255, 0, 0] ([0, 255, 0]). The results also indicate that ResNet18 has a better learning capacity than VGG16, even though VGG16 has more trainable parameters.

Figure 6: Both the prediction accuracy and the cosine similarity increase as the number of pixels in the test trigger-keys increases. The cosine similarity is measured between the feature vectors of data with ground-truth trigger-keys and the feature vectors of data embedded with test trigger-keys. The two features are equal when the number of pixels reaches 25 (9) for S1 (C2), resulting in a cosine similarity of one.

Table 2: MTK models trained using the decoupling process alleviate high correlations among tasks without appreciably hindering model performance. The values under the test phase denote the proportions of correct predictions.
Note that keys can be selected from different combinations of pixel locations and color levels. Here we study how changing the size |m_j| and the perturbation δ_j of the triggers affects MTK training and testing.
Building an MTK model requires generating a new training dataset with uniformly relabeled secured tasks on unprocessed data and true labels of secured tasks on processed data. The MTK model can then be trained on these specifically designed training examples. An MTK decoupling process is also developed to further alleviate high correlations among classes. Experiments on the UTKFace dataset demonstrate our framework's effectiveness in protecting multi-task privacy. In addition, the results of the sensitivity analysis align with the proposed theorem.

REFERENCES

Yu Zhang and Qiang Yang. A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering, 2021.

Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5810-5818, 2017.