Privacy-Aware Early Detection of COVID-19 Through Adversarial Training

Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring and general health assessment of potential patients, and may reduce operational strain on hospitals coping with the coronavirus pandemic. Different machine learning techniques have been used in the literature to detect potential cases of coronavirus using routine clinical data (blood tests and vital-sign measurements). Data breaches and information leakage when using these models can bring reputational damage and cause legal issues for hospitals. In spite of this, protecting healthcare models against leakage of potentially sensitive information is an understudied research area. In this study, two machine learning techniques that aim to predict a patient's COVID-19 status are examined. Using adversarial training, robust deep learning architectures are explored with the aim of protecting attributes related to demographic information about the patients. The two models examined in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments using datasets from Oxford University Hospitals (OUH), Bedfordshire Hospitals NHS Foundation Trust (BH), University Hospitals Birmingham NHS Foundation Trust (UHB), and Portsmouth Hospitals University NHS Trust (PUH), two neural networks are trained and evaluated. These networks predict PCR test results using information from basic laboratory blood tests and vital signs collected from a patient upon arrival to hospital. The level of privacy each model can provide is assessed, and the efficacy and robustness of the proposed architectures are compared with a relevant baseline. One of the main contributions of this work is its particular focus on the development of effective COVID-19 detection models with built-in mechanisms that selectively protect sensitive attributes against adversarial attacks.
Results on the hold-out test set and on external validation confirmed that adversarial learning had no impact on the generalisability of the model.


INTRODUCTION
COVID-19 has impacted millions across the world. Its early signs cannot be easily distinguished from those of other respiratory illnesses, and hence an accurate and rapid testing approach is vital for its management. RT-PCR assay of nasopharyngeal swabs is the widely accepted gold-standard test, but it has several limitations, including limited sensitivity and slow turnaround time (12-24h in hospitals in high- and middle-income countries). Several other techniques, including qualitative rapid-antigen tests ('lateral flow'; LFTs), point-of-care PCR, and loop-mediated isothermal amplification, have been proposed and are in various stages of validation and implementation (Assennato et al., 2020; Wolf et al., 2021). Among these techniques, lateral flow tests are favoured as they are inexpensive and do not require specialised laboratory equipment, which allows for decentralised testing and faster results. However, sensitivity results for lateral flow testing vary greatly amongst groups, with reported values ranging from 40% to 70% (Dinnes et al., 2021; Wolf et al., 2021). There are also numerous studies based on radiological imaging, including CT (Khuzani et al., 2021). Such tests are less widely available, involve a longer turnaround time, and expose patients to ionising radiation.
There are a number of research studies on the deployment of machine learning techniques to detect COVID-19 from various widely available features, including demographic and laboratory markers (Goodman-Meza et al., 2020; Zoabi et al., 2021). Inclusion of demographics in learning might lead to the development of biased tests, and even when they are not explicitly included in the feature representation, these attributes can potentially confound the model through their correlation with other features. We recently introduced a machine learning test based on vital signs, routine laboratory blood tests and blood gas (Soltan et al., 2021). A strength of our test is the use of clinical data which is typically available within 1h, much sooner than the typical turnaround time of RT-PCR testing (up to 24h in hospitals in high- and middle-income countries). Current tests that employ machine learning are promising as they alleviate the need for specialised equipment, can potentially be more sensitive, and are faster than existing tests. Nonetheless, they suffer from several shortcomings:

1. Most approaches that have appeared in the literature so far are based on basic machine learning techniques that require complete retraining any time a new batch of data is available. However, in a dynamic situation like a pandemic, where new streams of data need to be processed, it is vital to learn incrementally from data without the need to start over and retrain the system on all previously seen instances.
2. ML-based models explored in the COVID-19 literature are not equipped with an inherent mechanism to guard against possible issues that might arise due to the presence of demographic features. For example, models could easily become biased towards a certain demographic group, causing incorrect associations and overfitting.
3. Another issue is preserving the privacy of patients and robustness against adversarial attacks. Most basic models can easily 'leak' information, making it easy for an adversary to recover sensitive information contained in the hidden representation. As blood tests are known to include features that typically correlate with demographic attributes, such as sex and ethnicity, excluding demographics does not necessarily solve the problem. For example, conditions like benign ethnic neutropenia (Haddy et al., 1999) or sickle cell disease (Rees et al., 2010) are predominantly found in certain ethnic groups and are much less likely to occur in others. As an additional example, healthy men and women have different reference ranges for blood tests (Park et al., 2016).
This work aims to address the above-mentioned shortcomings in existing research. The proposed adversarial architectures (Section 4) are designed to prevent the learning model from encoding unwanted demographic biases and to protect its sensitive information during the learning process. In the first architecture (Section 4.1), the protection of attributes is explicit, with the option to select which attributes to guard against adversarial attacks. In Section 5.3.1 we investigate whether these direct protective measures hurt generalisability to unseen data. In the second architecture (Section 4.2), protecting attributes is based on a general adversarial regularisation that is not tied to any specific subset of selected attributes.
Several recent studies in natural language processing (NLP) have shown that textual data carries informative features regarding an author's race, age and other social factors. This makes embedding and predictive models susceptible to a wide range of biases that can negatively affect performance and severely limit generalisability. This kind of bias also raises concerns in areas where fairness and privacy are important. Numerous works have focused on the ways representation learning can be biased towards or against certain demographics, and different countermeasures have been proposed to counteract bias (Gonen & Goldberg, 2019). Most of these studies, however, use text and image data. Currently, there is limited research on the application of representation learning and adversarial models to healthcare.
The proposed models in this study are designed to preserve sensitive information against adversarial attacks, allow incremental learning, and reduce the potential impact of demographic bias. However, the main focus of the work is privacy preservation. The contributions of this work are as follows:

• We introduce two adversarial learning models for the task of COVID-19 identification based on electronic health records (EHR) that perform satisfactorily on a real COVID-19 dataset and in comparison with strong baselines. Unlike conventional tree-based methods, these architectures are well suited for transfer learning and multi-modal data, and retain the other advantages of neural models, without a significant performance trade-off.
• The models use adversarial regularisation to make them robust against leakage of sensitive information and adversarial attacks, which makes them suitable for scenarios where preservation of privacy is important or classification bias is costly.
• We run a series of tests to quantitatively demonstrate the efficacy of the proposed architectures in protecting sensitive information against adversarial attacks in comparison with a neural model that is not adversarially trained.
• We perform several tests to observe the effect of this type of training on generalisability across different demographic groups.
• We externally validate the models using data from other hospital groups.

PRIVACY ATTACKS IN MACHINE LEARNING AND HEALTHCARE
There are various ways a trained model can be attacked by an adversary. The goal in most of them is to infer some kind of knowledge that is not originally meant to be shared or is unintentionally encoded by the model. At least three different forms of attack are known: membership inference, property inference, and model inversion (Shokri et al., 2017). In this work, we focus on property inference, in which an adversary who has access to the model's parameters during training tries to extract information about certain properties of the training data that are not necessarily related to the main task. Figure 1 shows a general overview of privacy attacks according to Rigaki & Garcia (2020). The adversary, in our case, can see the model and its parameters and wants information about data to which they do not have direct access. Attacks of this kind are possible in any scenario where the model is stored and trained on an external server. Protecting an ML model against property inference attacks is especially useful in the context of collaborative and federated learning, where models are trained locally on different portions of the dataset and share their parameters over a network that might or might not be fully secure against eavesdropping (Melis et al., 2019).
Within the context of healthcare, such attacks can reveal sensitive personal data and prove disastrous for hospitals. The GDPR defines personal data as 'any information relating to an identified or identifiable natural person'. Article 9(1) of the GDPR declares the following types of personal data sensitive: data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership; genetic and biometric data; and data concerning the health, sex life, or sexual orientation of the subject (Voigt & Von dem Bussche, 2017).
Sensitive information such as age, gender, location, or ethnicity is usually quantised or anonymised in large healthcare datasets. However, as we will see in Section 5.3, this information can be easily recovered by a simple attack model because of the implicit associations that exist between such information and other features in the dataset.
Property inference attacks are not limited to recovering any specific type of data and can predict both categorical and numerical values. For instance, they can be used to train attacker models that learn to identify both demographic features (implicitly present in the data) and blood test features (explicitly present) that correlate strongly with certain diseases. It is then possible to use such a trained model to re-identify some patients based on their demographic features and possible combinations of diseases (Jegorova et al., 2021).

TASK DEFINITION
In our binary classification setting, each neural network f is trained to predict labels y_1, y_2, ..., y_n from instances x_1, x_2, ..., x_n. Each instance x_i contains a set of sensitive (in this case demographic) discrete features z_i ∈ {1, 2, ..., k} which we intend to "protect". These sensitive features are called protected attributes.
In the context of classification, any neural network f(x) can be characterised as an encoder followed by a linear layer W: f(x) = W × h(x). W can be seen as the last layer of the network (i.e. dense + softmax) and h as all the preceding layers (Ravfogel et al., 2020).
Suppose we have an attacker model f_att that is trained on the encoder h(x) of a neural classifier in order to predict z_i. If this trained adversary is able to predict z_i from the encoded representation, the model has leaked information and its privacy has been compromised.
It is unlikely that h(x) would be completely guarded against an attack. If it encodes sufficient information about x_i, it might reveal some information to a properly trained f_att. We say that the trained model f is private with regard to z_i if an attacker model f_att that has access to f's encoder h(x) cannot predict z_i with a greater probability than a majority-class baseline.
If we perturb h(x) too much, it will not be informative to f_att, but the model will also fail to accurately predict the main-task label y_i. Therefore, we would like to ensure privacy against potential attackers with regard to the protected attributes while achieving reasonably good results on the main task.
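As a toy illustration of this privacy criterion, the following NumPy sketch (entirely synthetic data, not the hospital datasets) trains a logistic-regression attacker on encoder outputs and compares it with the majority-class baseline: a representation that encodes z is recovered easily, while one independent of z is not.

```python
import numpy as np

def majority_baseline_acc(z):
    # Accuracy of always predicting the most frequent protected label.
    _, counts = np.unique(z, return_counts=True)
    return counts.max() / len(z)

def attacker_accuracy(H, z, lr=0.5, epochs=300):
    # Logistic-regression attacker f_att trained on encoder outputs H
    # to recover the binary protected attribute z (training accuracy).
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))   # sigmoid
        w -= lr * H.T @ (p - z) / len(z)          # binary cross-entropy gradient step
        b -= lr * np.mean(p - z)
    return np.mean(((H @ w + b) > 0).astype(int) == z)

rng = np.random.default_rng(0)
z = rng.integers(0, 2, 1000)                                  # protected attribute
H_leaky = z[:, None] + 0.1 * rng.standard_normal((1000, 4))   # representation that encodes z
H_private = rng.standard_normal((1000, 4))                    # representation independent of z
```

Under this criterion, `H_private` is private (the attacker cannot beat the majority baseline), whereas `H_leaky` is not.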

METHODOLOGY
We follow a standard supervised learning scenario where each training instance x i represents information from blood tests and vital signs for each patient seen at the hospital and y i is the corresponding Boolean value denoting the result of the PCR test for that patient.The task is to train a model to predict the correct label for each patient.

ADVERSARIAL TRAINING BASED ON GRADIENT REVERSAL
The first adversarial architecture we explore is comprised of one main part and a number of secondary networks:

I. A main classifier M that is the central component of the model. It consists of a stack of n fully connected layers with dropout and batch normalisation, followed by a softmax layer at the end.

II. d networks with auxiliary objectives separate from the main task. Supposing we have d categorical protected features, each of these secondary networks (henceforth referred to as discriminators) predicts the value of one such feature given each training instance.

Assume h_i is the representation of an instance at the i-th layer within M. This is the point of interception where the auxiliary networks get access to the contents of M. All these components then train in tandem with the following loss function:

L = L_M(ŷ, y) − λ Σ_{i=1}^{d} L_{D_i}(ẑ_i, z_i)    (1)

Each D_i corresponds to a separate discriminator network that predicts one of the d different categorical features of interest. λ is a weighting factor that controls the contribution of each individual auxiliary loss. Formula 1 is set up so that, after backpropagation, the contents of h are maximally informative for the main task and minimally informative for the prediction of the protected features. The loss of the main task is computed using binary cross-entropy.
If x and y are the features and labels, ŷ and ẑ the predictions for the main target and protected features, θ_M and θ_{D_i} the parameters of the main classifier and its d discriminators, and L the joint binary cross-entropy loss function, we can formulate the training objective as finding the optimal parameters θ̂ such that:

θ̂_M = argmin_{θ_M} [ L_M(ŷ, y) − λ Σ_{i=1}^{d} L_{D_i}(ẑ_i, z_i) ],   θ̂_{D_i} = argmin_{θ_{D_i}} L_{D_i}(ẑ_i, z_i)    (2)

As discussed in Section 4.1, during training, the objective is to jointly minimise both of the following terms:

L_task = Σ_i [ L(c(h(x_i)), y_i) − λ · L(D(h(x_i)), z_i) ]    (3)

L_adv = Σ_i L(D(h(x_i)), z_i)    (4)

where each x_i is an instance of the data associated with the protected attribute z, D is the discriminator (the adversarial network), and c is the classifier used to predict the labels of the main task from the representation h. L denotes the loss function.
Using an optimisation trick called the Gradient Reversal Layer (GRL), we can combine the above terms into a single objective. This idea was first introduced in the context of domain adaptation (Ganin & Lempitsky, 2015) and was later also applied to text processing (Elazar & Goldberg, 2018; Li et al., 2018). The GRL is easy to implement and only requires adding a new layer to the end of the encoder that feeds the discriminator.
During forward propagation, the GRL acts as an identity layer, passing along the input from the previous layer without any changes. However, during backpropagation, it multiplies the computed gradients by −1. Mathematically, this layer can be formulated as a pseudofunction R_λ(x) with the following two incompatible equations:

R_λ(x) = x    (5)

dR_λ/dx = −λ I    (6)

Using this layer, we can formulate the loss function as one single formula and perform a single backpropagation in each training epoch. For the trivial case of having only one protected attribute, we can consolidate equations 3 and 4 as follows:

L = Σ_i [ L(c(h(x_i)), y_i) + L(D(R_λ(h(x_i))), z_i) ]    (7)

The objective is to minimise the total loss; for the discriminator branch, the gradients reaching the encoder are reversed and scaled by λ. We can generalise this to the case where we have multiple (in our case 3; namely, age, gender, and ethnicity) protected attributes and corresponding discriminators D_j:

L = Σ_i [ L(c(h(x_i)), y_i) + Σ_{j=1}^{d} L(D_j(R_λ(h(x_i))), z_{ij}) ]    (8)
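A minimal sketch of the GRL pseudofunction, written as a plain NumPy layer to make the forward/backward asymmetry explicit (in practice this would be a custom autograd function in the training framework, e.g. a torch.autograd.Function):

```python
import numpy as np

class GradientReversal:
    """Identity in the forward pass; multiplies incoming gradients
    by -lambda in the backward pass."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Forward: pass the representation through unchanged.
        return x

    def backward(self, grad_output):
        # Backward: reverse (and scale) the gradient flowing to the encoder,
        # pushing the encoder to *increase* the discriminator loss.
        return -self.lam * grad_output

grl = GradientReversal(lam=0.5)
h = np.array([1.0, 2.0])          # intercepted representation
g = np.array([0.3, -0.4])         # gradient arriving from the discriminator
out = grl.forward(h)              # identical to h
rev = grl.backward(g)             # gradient flipped and scaled by 0.5
```

The discriminator itself trains normally on its own loss; only the gradient that crosses the GRL into the encoder is reversed.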

ADVERSARIAL TRAINING BASED ON FAST GRADIENT SIGN METHOD
As the second adversarial architecture, we develop another model in which the adversarial component can perturb the representation during training with some added noise. The direction of this noise (i.e. whether the added noise is positive or negative) depends on the signs of the computed gradients.
This adversarial method is based on linear perturbation of the inputs fed to a classifier. In every dataset, the measurements enjoy a certain degree of precision, below which differences can be considered negligible error ε. If x is the representation of an instance, the classifier is likely to treat x the same as x̃ = x + η, as long as ‖η‖_∞ < ε.
However, this small perturbation grows when it is multiplied by a weight matrix w:

wᵀx̃ = wᵀx + wᵀη

The perturbation is maximised when we set η = ε·sign(w), predicated on the assumption that it remains within the max-norm constraint defined above. In the context of deep learning, the method can be formulated in the following way: if θ are the parameters of the model and J is the cost function, then during training, for each instance, a perturbation η is added to the representation of the instance such that:

η = ε·sign(∇_x J(θ, x, y))    (9)

This procedure is known as the fast gradient sign method (FGSM), introduced by Goodfellow et al. (2015). It can be viewed either as a regularisation technique or as a data augmentation method that includes unlikely instances in the dataset. For training, the following adversarial objective function can be used:

J̃(θ, x, y) = α J(θ, x, y) + (1 − α) J(θ, x + ε·sign(∇_x J(θ, x, y)), y)    (10)

This method can be seen as making the model robust against worst-case errors when the data is perturbed by an adversary (Goodfellow et al., 2015). Because of this regularisation, our expectation is that the hidden representations become less informative to an attacker network that attempts to retrieve demographic attributes. Following the original paper, α is usually taken to be 0.5, which turns the equation into a linear combination with equal weights given to both terms in the objective function.
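The FGSM perturbation and the combined objective can be sketched as follows (plain NumPy, with an illustrative gradient vector; in training the gradient would come from backpropagation):

```python
import numpy as np

def fgsm_perturbation(grad, eps):
    # eta = eps * sign(grad_x J(theta, x, y)) -- the maximal perturbation
    # under the max-norm constraint ||eta||_inf <= eps.
    return eps * np.sign(grad)

def adversarial_objective(j_clean, j_perturbed, alpha=0.5):
    # Equal-weight (alpha = 0.5) combination of the clean and perturbed costs.
    return alpha * j_clean + (1 - alpha) * j_perturbed

# Illustrative gradient w.r.t. the representation; only its sign matters.
grad = np.array([0.02, -0.5, 0.0, 1.3])
eta = fgsm_perturbation(grad, eps=0.1)
```

Note that the magnitude of each gradient entry is discarded: every perturbed coordinate moves by exactly ±ε (or 0 where the gradient is zero), which is what keeps η on the max-norm ball.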
In our implementation (Figure 3), alongside the main component, there is an attacker that intercepts the model at a certain step during each training epoch, makes a copy of the pre-attack parameters in the intercepted layer, and injects noise into the model.Based on this information, an adversarial loss is computed and backpropagation is applied.
Figure 3: Overall structure of the FGSM-based model. ŷ is the predicted label; η is the noise added at the point of interception h.
After this step, a restore function is executed, returning the parameters of the intercepted layer to their pre-attack values. A regular loss is then computed and backpropagation is applied a second time. The added noise is computed based on equation 9: if h is the representation of a training instance at the time of interception by the attacker, the perturbed representation is h̃ = h + η.
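The snapshot/perturb/restore mechanics can be sketched as follows; `adversarial_step` is a hypothetical helper (not the paper's code), and the two framework-specific backpropagation calls are indicated only as comments:

```python
import copy
import numpy as np

def adversarial_step(layer_params, h, grad_h, eps):
    """Attack phase: snapshot the intercepted layer's parameters, perturb
    the representation h with FGSM noise, and return the perturbed
    representation together with a restore function."""
    snapshot = copy.deepcopy(layer_params)     # copy of pre-attack parameters
    h_tilde = h + eps * np.sign(grad_h)        # inject noise at the interception point
    def restore():
        layer_params.clear()
        layer_params.update(snapshot)          # back to pre-attack values
    return h_tilde, restore

layer = {"w": [1.0, 2.0]}                      # stand-in for the intercepted layer's weights
h = np.array([0.5, -0.5])                      # intercepted representation
grad_h = np.array([1.0, -2.0])                 # gradients w.r.t. h
h_tilde, restore = adversarial_step(layer, h, grad_h, eps=0.1)
# ... Backprop 1 (adversarial loss on h_tilde) would run here ...
restore()                                      # Backprop 2 then proceeds from the clean state
```

The restore call is what confines the adversarial update's direct effect, so the second, regular backpropagation starts from the pre-attack parameters of the intercepted layer.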

DATASET
For the experiments in this study we use a hospital dataset which we refer to as OUH. OUH is a de-identified EHR dataset covering unscheduled emergency presentations to emergency and acute medical services at Oxford University Hospitals NHS Foundation Trust (Oxford, UK). The trust consists of four teaching hospitals, which serve a population of 600,000 and provide tertiary referral services to the surrounding region. At the time of model development, linked de-identified demographic and clinical data were obtained for the period of November 30, 2017 to March 6, 2021.
For each presentation, data extracted included presentation blood tests, blood gas results, vital sign measurements, results of RT-PCR assays for SARS-CoV-2, and PCR for influenza and other respiratory viruses.Patients who opted out of EHR research, did not receive laboratory blood tests, or were younger than 18 years of age have been excluded from this dataset.
For OUH, hospital presentations before December 1, 2019, and thus before the global outbreak, were included in the COVID-19-negative cohort. Patients presenting to hospital between December 1, 2019, and March 6, 2021, with PCR-confirmed SARS-CoV-2 infection were included in the COVID-19-positive cohort. This period includes both the first and second waves of the pandemic in England. Because of the incomplete penetrance of testing during the early stages of the pandemic and the limited sensitivity of PCR swab tests, there is uncertainty in the viral status of patients presenting during the pandemic who were untested or tested negative. These patients were therefore excluded from the datasets.
There are 3081 COVID-19-positive instances in the original dataset and 112121 negative instances. For the experiments with OUH, we subsampled the majority class to reach a more balanced dataset with prevalence 0.5 (i.e. 6162 instances in total). Age, gender, and ethnicity information were binarised during preprocessing. For age, the average of 64 is taken as the cut-off point for binarisation. The ethnicity information, which was encoded using NHS ethnic categories, was divided into white and non-white. While quantising features in this way involves oversimplification and loss of detail, it keeps the values binary across all the protected attributes, making comparisons easier in our experimental setup. Table 4 shows the distribution of demographic labels in the OUH dataset. We use the entire test sets in their original label distribution within the pandemic timeframe to make sure the evaluation is fair and mirrors the highly imbalanced data seen in hospitals. Table 1 shows the statistics for the COVID-19-positive cases in the datasets.
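A sketch of this preprocessing, using hypothetical column names ("age", "ethnicity", "pcr_positive") rather than the real OUH schema:

```python
import pandas as pd

def preprocess(df, seed=0):
    # Binarise protected attributes and balance classes as described above.
    df = df.copy()
    df["age_bin"] = (df["age"] >= 64).astype(int)                # mean-age cut-off
    df["non_white"] = (df["ethnicity"] != "White").astype(int)   # white vs non-white
    pos = df[df["pcr_positive"] == 1]
    # Subsample the negative majority class to match the positives (prevalence 0.5).
    neg = df[df["pcr_positive"] == 0].sample(n=len(pos), random_state=seed)
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed).reset_index(drop=True)
```

Note that this balancing is applied to the training data only; as stated above, test sets keep their original, imbalanced label distribution.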

EXPERIMENTS AND RESULTS
We performed a series of experiments in order to test the proposed models and compare them against baselines. The baseline non-adversarial model that we use as the basic structure to start from consists of 3 fully connected dense layers with batch normalisation and dropout. We refer to this model as Base. During 10-fold cross-validation, the best hyperparameters were chosen using random search. We empirically found that heavy hyperparameter optimisation had at best mixed results, and adding more layers to the model did not consistently boost performance. We chose a set of parameters that worked well across all the models during cross-validation (Table 2). We also kept the Base model simple, with only a few layers, so we could make direct and straightforward comparisons with the adversarially trained models. The demographic-based adversarial model is referred to as ADV, and its main component is the same as Base. Since only the Base part is tested after training (i.e. the discriminators detach), the ADV model ends up having exactly the same number of parameters as Base. The perturbation-based adversarial model, which also has the same number of parameters as Base, is referred to as ADV_per. All the reported results on the test set are the median of three consecutive runs. In what follows, we explain the feature sets used and the training and testing procedure, and finally report the main-task and attacker results under different scenarios.

FEATURE SETS
Two sets of clinical variables were investigated (Table 3): presentation blood tests from the first blood draw on arrival to hospital, and vital signs. Only blood test markers that are commonly taken within existing care pathways and are usually available within 1 hour in middle- and high-income countries were considered. The models are trained and tested in a binary classification task in which the labels are confirmed PCR test results. As the first step, the model is evaluated on the TRAIN set in a stratified 10-fold cross-validation scenario, during which a threshold is set on the ROC curve to meet the minimum recall constraint. The model is then trained on the TRAIN set and tested on the holdout TEST data, and results are computed using the previously set threshold.
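The threshold-setting step can be sketched as a search over candidate ROC operating points for the highest threshold that still satisfies a minimum-recall constraint (the numeric values below are illustrative, not the paper's):

```python
import numpy as np

def threshold_for_min_recall(scores, labels, min_recall=0.9):
    """Pick the highest decision threshold whose recall (sensitivity)
    still meets the minimum-recall constraint; higher thresholds give
    better specificity, so this selects the ROC operating point with
    the required sensitivity."""
    for t in np.sort(np.unique(scores))[::-1]:   # candidate thresholds, high -> low
        recall = (scores >= t)[labels == 1].mean()
        if recall >= min_recall:
            return t
    return scores.min()   # fall back to the most permissive threshold

# Toy validation fold: predicted probabilities and PCR outcomes.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0])
t = threshold_for_min_recall(scores, labels, min_recall=0.66)
```

The threshold chosen this way during cross-validation is then frozen and reused on the holdout TEST data.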
During training of the ADV model, the expectation is that the accuracy of the main classifier increases over subsequent epochs, and, since the learning setup is such that the discriminators are constantly misled, their performance is intended to stay at or below around 50% accuracy. To test this assumption, we plotted the trajectories of accuracy for the main and the three auxiliary tasks over the first 15 epochs, while the ADV model is being trained on the TRAIN set and before it is tested on the holdout TEST set. As can be seen in Figure 5, accuracy on the main task grows steadily while discriminator accuracy drops below 50% and plateaus.
In Table 4 we report the results on the main task of predicting PCR results for all the models. The results demonstrate that the models perform well at the main task, namely predicting the outcome of the PCR test. In order to assess how much privacy each model can provide against an adversarial attack, we perform a series of experiments in which 3 different non-adversarial Base models are trained on the training data, each corresponding to the prediction of a different demographic attribute. In other words, instead of predicting the PCR test result, a protected attribute is provided as the label to train and test on. We perform the experiments under the same conditions as the main task. The attacker is first trained in a 10-fold cross-validation scenario and a threshold is set based on the ROC curve with the minimum recall constraint of 0.8 ± 0.07.
Subsequently, the attackers are trained on the TRAIN set, tested on the TEST portion of the dataset, and predict the same values given the threshold obtained during 10-fold CV. These results are important to the final interpretation of model privacy because they determine the upper bound for the maximum amount of leakage the proposed models can have. In Table 5, we report the results for trained attackers on the TEST portion of the dataset for each protected attribute.
The lower bound is the majority-class baseline, in which the attacker simply relies on some prior information about the distribution of the protected attributes to predict these features and does not make use of the obtained hidden representations. For instance, if a dataset is obtained in Scotland, relying on the known fact that the predominant ethnic category is White British, the attacker would simply assign the same label to all of the instances. Statistics about the majority classes for each attribute are given in Table 6 for both the TRAIN and TEST sets. As can be seen, ethnicity is the most imbalanced category in comparison with gender and age, for which class labels are more equally distributed. As the next step, we trained our baseline and proposed adversarial models on the TRAIN set and saved the weights of the neural networks. We then loaded our trained attackers and tested them, this time not on the features directly, but on the output of the encoder of the baseline and adversarially trained models. The idea is that, if an adversarially trained model is indeed protecting demographic attributes, it should be harder for an attacker to predict those values from its encoded representations than from those of a baseline model that is not specifically designed for the preservation of privacy. The results in Table 7 already show a degree of privacy provided by the non-adversarial encoder, as they indicate a noticeable decrease in performance compared to Table 5. The most marked decrease is visible in the prediction of gender, where performance drops from an AUC of 0.9104 to 0.6926. In the case of age, however, the attacker seems more robust. The results in Tables 8 and 9 confirm the assumption that an adversarial learning procedure, either with separate discriminator networks for each protected attribute or with perturbation-based regularisation, further reduces this kind of leakage.

The application of an adversarial learning procedure to protect selected attributes involves a training setup with competing losses, which is intended to weaken undesirable implicit associations contained in the hidden representations of the network. This is expected to result in a certain amount of performance drop compared to the non-adversarial baseline. As long as this drop is not massive, the performance-privacy trade-off is justified. However, a more general concern is whether a model like ADV, with its 3 different discriminators and its direct, targeted manipulation of hidden representations, would generalise poorly when tested on certain demographic sub-populations of the dataset. Since ADV_per applies its regularisation without specifically targeting any protected attributes, it is less likely to suffer from this issue.
In order to investigate whether protecting demographic attributes damages the generalisability of ADV, we performed a series of experiments in which we trained our Base and ADV models on only one demographic group and tested them on the other. We compare the adversarial model with the baseline to make sure that the generalisability of the ADV model is not hurt. Since we have 3 different binary attributes, there are 6 possible ways to cross-test the models. We denote these subgroups with f (female), m (male), w (white), n (non-white), o (old), and y (young). To restructure the dataset for these experiments, in each case we combine all the data and filter TRAIN and TEST based on the targeted demographic. For example, 'm2f' means that our TRAIN set contains only males and the TEST set only females. The results in Table 10 clearly indicate that adversarial learning has not damaged generalisability in any of the scenarios in which the Base and ADV models were tested.
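The cross-testing splits can be sketched with a small helper over a hypothetical `sex` column; the same helper applies to the other two binary attributes:

```python
import pandas as pd

def cross_split(df, attribute, train_value, test_value):
    # Filter TRAIN and TEST down to a single demographic group each,
    # e.g. training on one sex and testing on the other.
    train = df[df[attribute] == train_value].reset_index(drop=True)
    test = df[df[attribute] == test_value].reset_index(drop=True)
    return train, test

# Hypothetical mini-dataset with a "sex" column and one blood marker.
df = pd.DataFrame({"sex": ["m", "f", "m", "f", "f"], "crp": [5, 7, 3, 9, 2]})
train, test = cross_split(df, "sex", "m", "f")
```

Swapping the two value arguments produces the reverse scenario, giving the 6 possible pairings across the 3 binary attributes.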

EXTERNAL VALIDATION OF THE MODELS
In order to validate the models on external data, we trained Base, ADV, and ADV_per on the OUH dataset (as described in Section 4.3) and tested them on the entirety of the UHB, BH, and PUH datasets. We followed the same procedure as in the previous experiments: first we ran a 10-fold CV on the OUH dataset and set a threshold, and then tested the models on the external test data with the previously obtained threshold. The hyperparameters were kept the same for these experiments, with the exception of ADV_per, which seemed to converge better after 30 epochs during 10-fold CV. Tables 11, 12, and 13 show the results of this experiment on the UHB, BH, and PUH test sets, respectively. In our experiments, we addressed the issue of leakage of potentially sensitive attributes that are implicitly contained in the dataset, and demonstrated how an attacker network can successfully retrieve this information under different circumstances. Information like age seems to be easily inferred with high accuracy from the features or from the hidden representation of the Base model. In this case, the ADV and ADV_per models significantly reduced this vulnerability, which highlights the protective power of these adversarial methods in hiding such implicit information from invasive models that are specifically trained to infer it.
The same pattern was observed for the other two demographic attributes, namely gender and ethnicity. For ethnicity, the representation was less informative to the attacker network for two reasons: (I) a certain percentage of the patients had preferred not to state their ethnicity; since we wanted to keep all the tasks binary, we treated this category as non-white, which is clearly sub-optimal and further complicates ethnicity prediction for the attacker; (II) there are limitations in the accuracy of documenting ethnicity by hospital staff during data collection, which may increase the amount of noise in the data.
However, even though the overall results are lower for ethnicity, the ADV model still provides better privacy than the baseline. In such cases, the adversary is likely to rely on prior knowledge of the dataset or general information about the prevalence of ethnic groups in the data, rather than on the output of the encoder.
Our adversarial setup came with only a minimal performance cost (Table 4) and proved robust both in the generalisability tests (Table 10) and in external validation on highly imbalanced datasets (Section 5.3.2). More experiments (both at the level of data and of model) are needed to ascertain whether the same general patterns hold under different conditions. Nonetheless, these methods are not tied to the specifics of the Base model and can be applied to any neural architecture. To conclude, in this paper we introduced two effective methods to protect sensitive attributes in a tabular dataset related to the task of predicting COVID-19 PCR test results from routinely collected clinical data. We demonstrated the effectiveness of adversarial training by assessing the proposed models against a comparable baseline, both in the context of the main task, where they showed performance scores that were by and large on a par with the baseline, and in the context of privacy preservation, where a trained attacker was employed to retrieve sensitive information by intercepting the content of the models' encoder. In the second scenario, the adversarially trained models consistently outperformed the non-adversarial baseline.
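The core of the adversarial objective can be illustrated with a minimal NumPy sketch: a toy linear encoder with one main head and one adversary head (not the paper's actual architecture, discriminator count, or hyperparameters). The adversary is trained to predict the sensitive attribute from the encoding, while the encoder's update subtracts the adversary's gradient, weighted by λ, so the representation stays useful for the main task but uninformative for the adversary.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Toy data: feature 0 drives the main label y, feature 1 the sensitive z
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
z = (X[:, 1] > 0).astype(float)

W = rng.normal(scale=0.1, size=(2, 2))   # linear encoder
u = rng.normal(scale=0.1, size=2)        # main head
v = rng.normal(scale=0.1, size=2)        # adversary head
lr, lam = 0.1, 1.0

losses = []
for _ in range(200):
    H = X @ W.T                          # encoded representation
    p = sigmoid(H @ u)                   # main prediction
    q = sigmoid(H @ v)                   # adversary prediction
    losses.append(-np.mean(y * np.log(p + 1e-9)
                           + (1 - y) * np.log(1 - p + 1e-9)))
    # head gradients (logistic-loss gradients w.r.t. each head)
    gu = (p - y) @ H / len(X)
    gv = (q - z) @ H / len(X)
    # encoder gradient: follow the main loss, oppose the adversary's
    gW = (np.outer(u, (p - y) @ X)
          - lam * np.outer(v, (q - z) @ X)) / len(X)
    u -= lr * gu
    v -= lr * gv                         # adversary keeps learning z
    W -= lr * gW                         # encoder hides z, keeps y
```

With several protected attributes, the single adversary term generalises to a λ-weighted sum over per-attribute discriminators, which is the structure the paper's models use.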

Figure 1: Schematic view of privacy attacks on a machine learning model. Dashed lines represent information flow, and solid lines signify possible actions.

Figure 2: Overall structure of the proposed model. Each D_i is a discriminator that aims to predict one of the d categorical features z_i.

Figure 4: Distribution of labels for each demographic attribute in the TRAIN (-Tr) and TEST (-Ts) sets in OUH

In Section 5.3.2, we externally validate our models on three NHS Foundation Trust datasets (Soltan et al., 2022), namely Bedfordshire Hospitals NHS Foundation Trust (BH), University Hospitals Birmingham NHS Foundation Trust (UHB), and Portsmouth Hospitals University NHS Trust (PUH). We use the entire test sets in their original label distribution within the pandemic timeframe to ensure that the evaluation is fair and mirrors the highly imbalanced data seen in hospitals. Table 1 shows the statistics for the COVID-19 positive cases in the datasets.

Figure 5: Accuracy scores for the main task and for each of the three discriminators at each epoch

Table 1: Label distributions for PCR (along with the percentage of each label) for the UHB, BH, and PUH datasets used for external validation of the models

Evaluation at BH considered all patients presenting to Bedford Hospital between January 1, 2021 and March 31, 2021. BH provides healthcare services for a population of around 620,000 in Bedfordshire. Confirmatory COVID-19 testing was performed by point-of-care PCR-based nucleic acid testing [SAMBA-II & Panther Fusion System; Diagnostics in the Real World, UK, and Hologic, USA].
Evaluation at UHB considered all patients presenting to the Queen Elizabeth Hospital, Birmingham, between December 1, 2019 and October 29, 2020. The Queen Elizabeth Hospital is a large tertiary referral unit within the UHB group, which provides healthcare services for a population of 2.2 million across the West Midlands. Confirmatory COVID-19 testing was performed by laboratory SARS-CoV-2 RT-PCR assay.

Table 2: Hyperparameter values used for all the experiments (columns: learning rate, λ, batch size, hidden dimension (Base), hidden dimension (disc), dropout, epochs)

Table 3: Clinical parameters included in each feature set

Table 5: Attacker results on the TEST set when trained and tested on the features directly; this serves as the upper bound for information leakage (columns: Predicted Attribute, Recall, Precision, F1-Score, Accuracy, Specificity, PPV, NPV, AUC)

Table 6: Percentage of the majority class label in the whole data for each demographic attribute

Table 7: Attacker results on the TEST set when trained and tested on the output generated by the encoder of the non-adversarial Base model

Since we want to keep the attackers blind to the encoding strategy used by the adversarially trained models, in order to test the attackers on the ADV and ADV per models we have to use the same threshold set during 10-fold CV on the encoded representation of the Base model. Therefore, we load the attacker trained on the non-adversarial encoder on the TRAIN set and test it on the ADV/ADV per model's encoder to predict the three attributes.
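This transfer protocol can be sketched as follows, using placeholder encoders and a deliberately simple correlation-based "attacker" (none of these components are the study's actual implementations): the attacker and its threshold are fixed on the Base encoder's output and then applied, unchanged, to the ADV encoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_attacker(H, z):
    """Toy attacker: score with the encoding dimension most correlated
    with the sensitive attribute (a real attacker would be a network)."""
    corr = [abs(np.corrcoef(H[:, j], z)[0, 1]) for j in range(H.shape[1])]
    j = int(np.argmax(corr))
    return lambda H_new: H_new[:, j]

X = rng.normal(size=(100, 2))
z = (X[:, 1] > 0).astype(int)                 # sensitive attribute
base_enc = lambda X: X                        # Base: leaks everything
adv_enc = lambda X: X * np.array([1.0, 0.0])  # ADV: suppresses dim 1

attacker = fit_attacker(base_enc(X), z)  # trained on Base encodings only
thr = 0.0                                # threshold fixed on Base encodings
acc_base = np.mean((attacker(base_enc(X)) >= thr) == z)
acc_adv = np.mean((attacker(adv_enc(X)) >= thr) == z)
```

The attacker never sees the adversarial encoding during training, so any drop from `acc_base` to `acc_adv` reflects information the ADV encoder has genuinely removed rather than a mismatch the attacker could adapt to.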

Table 8: Attacker results on the TEST set when trained on the encoder of the Base model and tested on the encoder of the ADV model (columns: Predicted Attribute, Recall, Precision, F1-Score, Accuracy, Specificity, PPV, NPV, AUC)

Table 9: Attacker results on the TEST set when trained on the encoder of the Base model and tested on the encoder of the ADV per model

Table 10: Results of demographic cross-tests to assess the effects of adversarial training on generalisability across different subgroups of the dataset

Table 11: Results for the models when trained on OUH and tested on the UHB dataset

In this work, we introduced and tested two adversarially trained models for the task of predicting COVID-19 PCR test results based on routinely collected blood tests and vital signs, processed in the form of tabular data.

Table 12: Results for the models when trained on OUH and tested on the BH dataset

Table 13: Results for the models when trained on OUH and tested on the PUH dataset

In the ADV model, the protected attributes need not be demographic: in principle, any categorical feature of interest (or any feature that can be meaningfully quantised) can be used during training. Future work could also include experimenting with continuous features, where the attacker would have to guess the feature values in a regression task.
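As a sketch of the quantisation idea (the quantile-binning scheme and the bin count below are arbitrary illustrative choices, not taken from the paper), a continuous feature such as age can be discretised into a categorical attribute suitable for a discriminator:

```python
import numpy as np

def quantise(feature, n_bins=4):
    """Quantile-bin a continuous feature so it can serve as a categorical
    protected attribute for a discriminator. Interior quantiles become
    the bin edges, giving roughly equal-sized bins."""
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(feature, edges)

ages = np.array([23, 35, 47, 59, 71, 83, 29, 65])
bins = quantise(ages, n_bins=4)
```

Quantile bins keep the resulting classes balanced, which matters for the discriminator: with heavily skewed bins, its accuracy would mostly reflect the majority class rather than actual leakage.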