ICAM-Reg: Interpretable Classification and Regression With Feature Attribution for Mapping Neurological Phenotypes in Individual Scans

An important goal of medical imaging is to be able to precisely detect patterns of disease specific to individual scans; however, this is challenged in brain imaging by the degree of heterogeneity of shape and appearance. Traditional methods, based on image registration, historically fail to detect variable features of disease, as they utilise population-based analyses, suited primarily to studying group-average effects. In this paper we therefore take advantage of recent developments in generative deep learning to develop a method for simultaneous classification, or regression, and feature attribution (FA). Specifically, we explore the use of a VAE-GAN (variational autoencoder - generative adversarial network) for translation called ICAM, to explicitly disentangle class-relevant features from background confounds, for improved interpretability and regression of neurological phenotypes. We validate our method on the tasks of Mini-Mental State Examination (MMSE) cognitive test score prediction for the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort, as well as brain age prediction, for both neurodevelopment and neurodegeneration, using the developing Human Connectome Project (dHCP) and UK Biobank datasets. We show that the generated FA maps can be used to explain outlier predictions and demonstrate that the inclusion of a regression module improves the disentanglement of the latent space. Our code is freely available on GitHub: https://github.com/CherBass/ICAM.


I. INTRODUCTION
Brain images represent a significant resource in the development of mechanistic models of behaviour and neurological/psychiatric disease as, in principle, they capture measurable neuroanatomical traits that are heritable, present in unaffected siblings and detectable prior to disease onset [1]. For many complex disorders, however, these features of disease [2], [3] are subtle, variable and obscured by a backdrop of significant natural variation in brain shape and appearance [4], [5]; this makes them extremely difficult to detect.
Traditional approaches for analysis of brain magnetic resonance imaging (MRI) rely on group-wise comparisons between disease and control groups, whereby they compare all images in a global average space through performing image registration to a template. Voxel-based morphometry (VBM) is one such common method [6], which has been used in countless studies of development, ageing and dementia [7]-[10]. More recent methods use Gaussian processes [11] to detect diseased brain tissue as outliers against a normative model, fit at each voxel. While these methods have significantly improved understanding of population average patterns of disease [7], they rely on spatial normalisation and therefore lose power at the cortex due to the impact of cortical heterogeneity [4], [12]. This also means that they are not tuned to detect features of disease specific to the individual, which are nonetheless important for diagnosis or prognosis.
To address these limitations, recent studies have started to apply deep learning methods to brain imaging datasets. Deep learning is state-of-the-art for many image processing tasks [13], and has shown strong promise for brain imaging applications such as healthy tissue and lesion segmentation [14]-[17]. Importantly, by design it can work independently of any requirement for spatial normalisation. However, deep learning methods do not, by default, return explanations of the reasoning behind their predictions, leading to them traditionally being referred to as "black box" models.
More recently, several approaches have been developed to make these networks more interpretable through identifying class-relevant features for a particular input. These include post-hoc saliency-based methods, designed to detect which features of a specific image contribute most strongly to a class prediction. These typically analyse the gradients or activations of the network, with respect to a given input image, and include approaches such as Gradient-weighted Class Activation Mapping (Grad-CAM) [18], SHAP [19], DeepTaylor [20], integrated gradients [21], guided backpropagation (backprop) [22], and Layer-wise Relevance Propagation (LRP) [23]. In addition, perturbation methods such as occlusion [24] change or remove parts of the input image to generate heatmaps, by evaluating the effect on the classification prediction.
Such methods have now been applied in various medical imaging applications including in MRI and Positron Emission Tomography (PET) imaging datasets for Alzheimer's (AD) [25]-[28] and Multiple Sclerosis (MS) [29] classification. However, while in principle, these methods can be applied to detect features from individual images, the results are typically low resolution and noisy, which makes them hard to interpret. Often this leads to studies estimating a group average to aggregate results across individuals, and boost signal to noise to make stable population-wide inferences [27], [28]. This loses individual specificity, and since these feature attribution (FA) methods often detect similar features in both healthy and disease groups, it is difficult to interpret the results.
In addition, since these FA methods are applied to a CNN following training, their power is limited by the constraints of the network they are applied to. In particular, applying a method post-hoc to a classification network has repeatedly been shown to be insufficient, as such networks need only focus on the most consistent or discriminative features, sufficient to accurately predict each class. This is particularly important in medical imaging where diagnosis and treatment rely on comprehensive capture of all features of disease [25]-[30]. For example, when applying LRP and guided backprop to brain MRI, it was found that while they were able to detect homogeneous brain structures such as the hippocampus, they were unable to detect heterogeneous structures such as cortical folds [27], [28].
For these reasons, new approaches have recently been proposed which use generative models to translate images from one class to another [31], [32]. These provide more comprehensive interpretation, since generative models must capture all relevant features of a population, in order to support synthesis of new images. In Baumgartner et al. [30] for example, a generative model (visual attribution or VA-GAN) was adapted to translate images classed as Alzheimer's (AD) to instead resemble Mild Cognitive Impairment (MCI). However, while this was able to detect more features of disease relative to post-hoc methods, it was still unable to identify many of the phenotypically variable changes (for example heterogeneous patterns of cortical atrophy); see related works for further detail.
To address these problems, in [33] we developed ICAM (Interpretable Classification via disentangled representations and feature Attribution Mapping); this improved on the state-of-the-art FA methods (Table II) [18], [21], [22], [24], [30] by building on approaches for image-to-image translation [34] to perform feature attribution by disentangling class-relevant attributes (attr) from class-irrelevant content. Sharp reconstructions are then learnt through use of a Variational Autoencoder (VAE) with a discriminator loss on the decoder (Generative Adversarial Network, GAN). This not only allows classification and generation of an attribution map from the latent space, but also a more interpretable latent space that can visualise differences between and within classes. By sampling the latent space at test time to generate an FA map, we demonstrated its ability to detect meaningful brain variation (Fig. 1) in 3D brain MRI.
In this paper, we extend ICAM [33] by adding a regression module to enable the network to do regression as well as classification; while past FA methods have been predominantly implemented for classification networks, regression tasks are common in medical imaging, as most diseases lie on a continuous spectrum rather than a binary scale. The specific contributions in comparison to previous work [33] are as follows:
1) We describe the first framework to implement a translation VAE-GAN network for simultaneous regression and feature attribution.
2) This supports the investigation of heterogeneous, continuous phenotypes such as brain ageing and dementia, specifically allowing for study of features attributed to outlier predictions, to give further insights into the model's reasoning behind these predictions. To our knowledge, this is the first method to provide meaningful explanations in 3D medical imaging regression tasks (with deep learning).
3) We perform additional experiments, specifically, using the UK Biobank healthy ageing dataset to investigate the latent space for regression tasks. For example, we demonstrate that ICAM can provide explanations for subjects predicted as outliers, and even generate meaningful FA maps when interpolating the attribute latent space between 2 subjects of the same age group. Further, we provide evidence that our translation network consistently changes the class of our input images. Also, we use the dHCP dataset to show that ICAM can detect punctate white matter lesions in preterm babies, without explicit training.

II. RELATED WORKS
Over recent years, several deep generative approaches to image-to-image translation have emerged [31], [32], [34]-[37], and these have been applied to many different domains, including medical imaging [30], [38]-[40]. Of these, Lee et al. [34], in particular, developed a domain translation network called DRIT (Fig. 2b), which constrains translation only to features specific to a class, by encoding separate class-relevant (attribute) and class-irrelevant (content) latent spaces, and employing a discriminator.
Separately, Baumgartner et al. [30] developed a conditional GAN-based approach, called VA-GAN, that uses domain translation for feature attribution in medical imaging. In this work, mappings M were learnt, which translated input 3D MRI brain scans classified with Alzheimer's disease (AD), towards more closely resembling scans with mild cognitive impairment (MCI): an intermediate state between healthy cognition and Alzheimer's disease (Fig. 2a). This resulted in sharp reconstructions and realistic difference maps that overlap with ground truth outcomes, where available. However, one constraint of VA-GAN is that the approach requires image class labels to be known a priori and, in the absence of a latent space, it can only produce a single deterministic output for each image, which limits the modelling of more heterogeneous features.
Accordingly, in our work ICAM [33], we extended the intuitions of these models to create one framework which allows simultaneous classification and feature attribution, using a more interpretable model. Compared to VA-GAN and DRIT++ [30], [34], ICAM uses 2 shared disentangled latent spaces, attribute and content, which encode class-relevant and class-irrelevant information, respectively. The use of a shared attribute (class) latent space allows the addition of a classification layer (and in this work, also a regression layer) to the network (Fig. 2c), which enables the network to do classification and visualisation of differences between and within classes. In addition, the rejection sampling module (Fig. 4) checks the class of a randomly sampled attribute latent vector (using the classification layer) to enable feature detection for a single subject during test time, as well as the analysis of the model's mean and variance by sampling the attribute latent space multiple times (in comparison to DRIT++ [34]). Other components of ICAM such as an FA map loss, L2 reconstruction loss, and a 3D attribute latent space also improve performance compared to VA-GAN and DRIT++ (as illustrated using ablation studies in [33]). Compared to previous works [18], [21], [22], [24], [30], [34], ICAM demonstrated considerably better feature detection in Alzheimer's and ageing datasets for both consistent (e.g. ventricles and hippocampus) and phenotypically variable (e.g. patterns of cortical atrophy) features of disease (see example in Fig. 1).
To allow further flexibility, here we extend ICAM with a regression module to enable its application to continuous prediction tasks (Figs. 7, 9). This also allows further exploration of the latent space (Figs. 6, 8), and outlier subject analysis (Fig. 5).

III. METHODS
The goal of the ICAM framework (see Fig. 11 for full details of the network architecture) is to perform classification (or regression) with simultaneous feature attribution, by training a VAE-GAN to swap the classes of input images x and y, changing only the features of each image which are specific to the target phenotype. The design of the network is outlined in Fig. 3, with specific details of the components described below.
Fig. 3. Overview of method. An example of how ICAM performs classification/regression with attribute map generation for 2 given input images x and y (of class 0 [brain slice without lesions] and 1 [brain slice with simulated lesions], respectively). Note that $L^{D}_{adv}$ is applied to both real and generated images, and that not all losses are plotted (see Equation 5 for full objective).

A. Content and attribute latent spaces
To achieve domain disentanglement, two separate latent spaces are encoded: a content encoder $E^c$ (latent space $z^c$), whose objective is to encode class-irrelevant (e.g. brain shape) information, and an attribute encoder $E^a$ (latent space $z^a$), whose objective is to encode all class-relevant features of disease. In both cases, the latent spaces are shared between classes or domains (i.e. $E^c : x \to C$, $E^c : y \to C$). Note that, in what follows, we use domain and class interchangeably.
For the content encoder $E^c$, class information is driven out of the latent space $C$ through training of a discriminator, $D^c$, with a class adversarial content loss. The goal of the content encoder $E^c$ is therefore to learn a representation whose domain cannot be distinguished by this discriminator (an approach first proposed by Lee et al. [34]).
Training is also supported through L2 regularisation, to prevent explosion of gradients, and Gaussian noise (added to the last layer of the encoder) to prevent the latent space vanishing.
For the attribute encoder $E^a$, class information is driven into the latent space by appending a fully connected classification layer ($f_{C1}$) with binary cross entropy loss $L^{E^a}_{BCE}$. In extension from our previous work [33], a regression module ($f_{C2}$) is also added, using another fully connected layer, trained using a smooth L1 loss ($L^{E^a}_{1;smooth}$). The training of the attribute latent space is performed using variational inference, through application of a Kullback-Leibler (KL) loss $L^{z^a}_{KL}$. This places a Gaussian prior over the latent variables, ensuring that the attribute latent space can be sampled, which allows translation of a single subject at test time, and the generation of mean and variance maps via the use of rejection sampling (see below). During training, the prediction modules $f_{C1}$ and $f_{C2}$ therefore work to encourage separation of the domains within this latent space $A$, to support meaningful image translation. Further, to encourage invertible mapping between the image and the latent space, a cyclic reconstruction loss is added, where a random attribute latent vector $z^a_r$ is sampled from a Gaussian distribution, and reconstructed. Finally, disentanglement is further encouraged through rejection sampling of the attribute latent space during training, by checking the class of a randomly sampled vector using the attribute encoder's classification layer (Fig. 4). Samples are rejected if they belong to the wrong class, which stabilises optimisation of translation by passing the generator samples of the expected class, and allows the generation of mean and variance FA maps at test time (see an example in Fig. 1). This visualisation approach has previously not been possible in other feature attribution methods, as they do not have a latent space with a classification or regression layer.
Fig. 4. Rejection sampling during training/testing. Using ICAM, translation can be achieved using a single input image, in addition to translating between 2 images. (a) An input image is encoded into content and attribute spaces, and is passed through the classifier to identify its class (0 in this example). (b) Attribute space A is then randomly sampled until a random vector of the opposite class is sampled (class 1 in this case), by checking its class using the classifier. The newly sampled vector is passed to the generator along with the encoded content space to achieve translation between class 0 and 1. At test time, it is possible to sample the attribute latent space multiple times to get mean and variance FA maps.
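The rejection sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released ICAM implementation: `classify_latent` is a hypothetical stand-in for the attribute encoder's classification layer (here a toy rule on the first latent coordinate), and the latent dimension and `max_tries` are arbitrary choices.

```python
import random


def classify_latent(z):
    """Hypothetical stand-in for the classification layer (toy rule, not ICAM's)."""
    return 1 if z[0] > 0.0 else 0


def rejection_sample(target_class, dim=8, max_tries=1000, rng=None):
    """Draw z ~ N(0, I) repeatedly until the classifier assigns it to target_class."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if classify_latent(z) == target_class:
            return z  # accepted: the sample belongs to the expected class
        # otherwise the sample is rejected and a new one is drawn
    raise RuntimeError("no sample of the target class found")


# The input image was classified as class 0, so attribute vectors of the
# opposite class (1) are sampled; several draws are kept for later averaging.
rng = random.Random(0)
samples = [rejection_sample(target_class=1, rng=rng) for _ in range(5)]
```

At test time, each accepted vector would be decoded together with the content encoding of the input image, and the resulting FA maps averaged to give mean and variance maps.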

B. Generation and feature attribution
Image translation and generation of disease (or FA) maps is supported through the training of a generator $G$, which learns to synthesise images conditioned on both the content and attribute latent spaces ($G : \{z^c_x, z^a_x\} \to \hat{x}$), ($G : \{z^c_y, z^a_y\} \to \hat{y}$), as well as to translate between these domains. It achieves this by swapping the content latent space: ($G : \{z^c_y, z^a_x\} \to u$), ($G : \{z^c_x, z^a_y\} \to v$), which is made possible since this space is class invariant. Training of the generator is supported by optimisation of a domain discriminator $D$ with two losses: a) a domain adversarial loss, $L^{D}_{adv}$, which seeks realistic image generation by minimising the differences between translated (fake) and real images; and b) a binary cross entropy classification loss, $L^{D}_{BCE}$, which seeks optimal classification of the two domains following translation.
To visualise differences between the translated images $\{u, v\}$ and the original images $\{x, y\}$, we use a feature attribution map $M$. This aims to retain only class-related differences between two images (or two locations in the attribute latent space) by subtracting the content from the translated output (i.e. $M_x = v - x$ and $M_y = u - y$), and is regularised by an FA map loss which encourages $M$ to reflect a small feasible map, which leads to a realistic translated image.
Finally, to further facilitate image generation, we apply an L1 and L2 loss to the reconstructed images $\{\hat{x}, \hat{y}\}$ ($L^{rec}_{1,2}$), and the cyclically reconstructed images $\{\hat{x}_{cc}, \hat{y}_{cc}\}$ ($L^{cc}_{1,2}$). The cycle consistency term also allows training with unpaired images. The full objective function of our network (Equation 5) combines the losses described above.
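As a rough illustration of the feature attribution map, the sketch below computes M as the voxel-wise difference between a translated image and its input, together with a mean absolute (L1) penalty that keeps M small. The flattened toy "images", the function names, and the exact form of the penalty are assumptions made for illustration; the paper's own FA map loss is not reproduced here.

```python
def fa_map(translated, original):
    """Voxel-wise difference between a translated image and its input."""
    return [t - o for t, o in zip(translated, original)]


def l1_penalty(m):
    """Mean absolute value of the map (assumed form of a sparsity penalty)."""
    return sum(abs(v) for v in m) / len(m)


# Toy flattened images: only the second voxel changes under translation.
x = [0.2, 0.5, 0.5, 0.1]  # input image
v = [0.2, 0.9, 0.5, 0.1]  # hypothetical translation of x to the other class
m_x = fa_map(v, x)        # non-zero only where the class-relevant change is
penalty = l1_penalty(m_x)
```

A small penalty value corresponds to a map that changes few voxels, which is the behaviour the text describes as a "small feasible map".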

IV. RESULTS
We evaluate the performance of ICAM through studies on three datasets to test our regression model on 1) an age prediction task (using healthy ageing data from UK Biobank), 2) prediction of birth age (or degree of prematurity) for the dHCP cohort of neonates, born between 23-44 weeks gestation and scanned at term equivalent age; and 3) prediction of the MMSE cognitive test score from the ADNI cohort.
A. Brain Age Prediction for the UK Biobank cohort
1) UK Biobank Dataset: In this first experiment, we used T1 MRI data from the UK Biobank [41], [42], a collection of brain imaging data of mostly healthy subjects between the ages of 45-80 years, to map heterogeneous patterns of brain ageing within individuals. T1 image processing (see also [41]) involved bias correction using FAST [43], brain extraction using BET [44] and linear registration to MNI space, using FLIRT [45]. The input into the networks was resized to 128 × 160 × 128 voxels, and normalised to a fixed intensity range.
2) UK Biobank Results: In our previous work [33], we compared ICAM to VA-GAN for AD to MCI feature attribution, and found that ICAM was able to detect more variability in the ventricles, cortex, and in the hippocampus. Further, we found that ICAM is better able to change the shape of relevant brain structures, while VA-GAN is only able to alter the pixel intensities.
In this work, to demonstrate more conclusively whether translation by ICAM and VA-GAN fully changes the image class, we trained an independent classifier (with the same architecture as the ICAM attribute encoder); this was trained on the UK Biobank age classification task, using the 3D T1 MRI images, and was tested on outputs generated by ICAM and VA-GAN (keeping to the same train and test examples as before). Results (Table I) show that classification with images generated by ICAM performs slightly worse than the real data (82.2% compared to 93.8%), which is to be expected in a complex 3D generation task. By contrast, VA-GAN outputs perform much worse (12.2%). Note that because VA-GAN can only translate in one direction, it has only 1 result in the table.
Next, we trained ICAM's regression layer to predict ages of the MRI brain scans. We found an age prediction mean absolute error (MAE) of 2.20 ± 1.86 years (Fig. 7). In addition, we found that using our regression model, we can generate FA maps that explain the prediction results, including outliers. For example, in Fig. 5 A), FA maps of two subjects, scanned at 77 years, and translated to resemble the younger age class, indicate greater age-related changes for brain areas commonly associated with ageing (e.g. ventricular and cortical atrophy) in subject 1 (which is predicted as older, at 79) relative to subject 2 (which is predicted as younger, at 73). In B), 2 subjects from the young group are directly compared by translating between them. Here, subject 4 is predicted to be much older than their true age (predicted = 56; true = 47 years), whereas subject 3 has predicted age 49, close to their true age (47). Evidence for the outlier prediction of subject 4 is presented through the translation, indicating the presence of larger ventricles, hippocampal atrophy and cortical shrinking (relative to the more typical presentation of subject 3). To further illustrate this correlation, we then performed a Pearson's correlation test (on test subjects) between the predicted or true ages and the mean of the corresponding FA maps. We found a stronger positive correlation (p < 0.0001) using the predicted ages rather than the true ages (0.306 vs 0.255, respectively, for old subjects). This indicates that predicted age is a better predictor of the features related to age than the true age. This is because very old subjects need to be changed more drastically than younger subjects, and thus have higher mean FA maps. Finally, we investigated the improvement in separation of the model's latent space afforded through regression (Fig. 8, tSNE latent space comparison). This result is further underlined in Fig. 6, which shows clearly that interpolation between images of two different ages smoothly translates both predicted ages and feature attribution maps, for the generated images.
Fig. 6. Biobank interpolation between and within groups. Here, we show an example of interpolation of the attribute latent space, with the corresponding FA maps for each vector. We overlay the interpolated FA maps on the original image, with red maps indicating an increase in pixel intensity. We first encode each image to its attribute latent space (using $E^a$), and get an age prediction. We then linearly interpolate between these two spaces, and get an age prediction and FA map for each vector. We demonstrate that our ICAM reg model can successfully achieve interpolation between and within groups (i.e. within the aged group, and between the aged and young groups). We find that we get both smoothly interpolated FA maps, and interpolated age predictions between two subjects. The green arrows point to the cortex, and blue arrows point to the ventricles.
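The two analyses described above, linear interpolation of the attribute latent space and Pearson's correlation between ages and mean FA map values, can be sketched in a few lines. The latent vectors and the age/FA values below are made-up toy numbers, and the helper names are hypothetical; in the actual model each interpolated vector would be passed to the regression layer and the generator.

```python
import math


def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy attribute encodings of a "young" and an "old" subject (assumed values).
z_young = [0.0, 1.0, -1.0]
z_old = [2.0, 1.0, 1.0]
path = [lerp(z_young, z_old, k / 4) for k in range(5)]  # 5 evenly spaced steps

# Toy ages and mean FA map values for four subjects (assumed values).
ages = [50.0, 60.0, 70.0, 80.0]
mean_fa = [0.1, 0.2, 0.3, 0.4]
r = pearson_r(ages, mean_fa)
```

The sketch returns only the coefficient; a significance test such as the one reported in the text would need an additional step.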

B. dHCP experiments
1) dHCP Dataset: In this experiment we sought to demonstrate that ICAM can work well for prediction of challenging phenotypes, and detection of focal lesions, from relatively small, heterogeneous datasets. We used 699 3D T2 MRI scans from the dHCP [46], [47]: an open data set of multimodal brain scans acquired from preterm and term neonates. Here, preterm is defined as birth prior to 37 weeks gestational age (GA), where some preterm neonates were scanned twice: at birth and at term equivalent age. The data set includes 143 preterm images (class 1, mean gestational age at birth: 31.8 ± 3.85 weeks, mean post-menstrual age at scan: 41.0 ± 1.99 weeks) and 556 term controls (class 0, mean age at birth: 40.0 ± 1.27 weeks, mean post-menstrual age at scan: 41.4 ± 1.74 weeks). In this experiment ICAM was trained to classify between preterms and terms, and predict birth age from the term age scan (i.e. scans acquired after 37 weeks post-menstrual age). Examples were split into train, validation and test sets according to a 446:55:55 split (for term subjects) and 115:14:14 split (for preterm subjects).
Image pre-processing involved using diffeomorphic multimodal (T1w/T2w) registration (ANTs SyN) to estimate nonlinear transforms to a 40 week template from the extended atlas [47]-[49]. This was necessary to allow the network to train, since without this step, it is suspected that the network was challenged by stark changes in image appearance, which are typically observed for neonatal cohorts, and caused by rapid tissue maturation; this was further confounded by the relatively small and imbalanced nature of the data set. For related reasons (to preserve age-related tissue maturational differences), images were rescaled to [0,1] by normalising across the intensity range of the entire group. Images were then brain extracted (using blurred masks), and CSF, ventricles and the skull were removed in order to focus the attention of the model on brain tissue differences between the groups.
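To make the group-wise rescaling concrete, the sketch below normalises a set of toy "scans" to [0,1] using the minimum and maximum over the whole group rather than per image, so that intensity differences between subjects are preserved. The function name and the toy values are assumptions; the real pipeline operates on 3D volumes.

```python
def group_minmax(images):
    """Rescale all images to [0, 1] using the intensity range of the whole group."""
    lo = min(min(img) for img in images)
    hi = max(max(img) for img in images)
    if hi == lo:
        raise ValueError("group has constant intensity; cannot rescale")
    return [[(v - lo) / (hi - lo) for v in img] for img in images]


# Two toy flattened scans with different intensity ranges (assumed values).
scans = [[0.0, 50.0, 100.0], [20.0, 40.0, 200.0]]
norm = group_minmax(scans)
```

With per-image scaling both scans would reach 1.0; here only the brighter scan does, which is the property that retains maturation-related intensity differences across subjects.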
2) dHCP Results: As a baseline, we trained a CNN (with the same architecture as $E^a$) for regression using a smooth L1 loss, trained similarly to ICAM (see Appendix for details on training). Results are shown in Figs. 9 and 10.
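For reference, a smooth L1 loss can be written as below. This sketch assumes the common definition with threshold beta = 1 (quadratic for small errors, linear for large ones, as in the default of PyTorch's `SmoothL1Loss`); the threshold actually used in the experiments is not stated here, and the example values are made up.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic below beta, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            total += 0.5 * d * d / beta  # small error: squared term
        else:
            total += d - 0.5 * beta      # large error: absolute term
    return total / len(pred)


# Toy predicted vs true birth ages in weeks (assumed values).
loss = smooth_l1([40.0, 31.5], [40.5, 34.5])
```

The linear regime limits the influence of large errors, which can matter when a few subjects are far from the bulk of the age distribution.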
We report a birth age prediction MAE of 0.806 ± 0.634 vs 1.525 ± 1.160 for ICAM reg and the baseline CNN, respectively (Fig. 9). In addition, we report a higher correlation coefficient for ICAM (Spearman correlation test, p < 0.0001, 0.873 and 0.695 correlation coefficient for ICAM and the baseline network, respectively).
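The two summary statistics reported above can be computed as in the following sketch. The toy predictions are made up, the population standard deviation is an assumption (the text does not say which estimator is used), and no p-value is computed.

```python
import math


def mae_and_sd(pred, true):
    """Mean and (population) standard deviation of the absolute errors."""
    errs = [abs(p - t) for p, t in zip(pred, true)]
    mean = sum(errs) / len(errs)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errs) / len(errs))
    return mean, sd


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy true and predicted gestational ages at birth, in weeks (assumed values).
true_ga = [28.0, 32.0, 36.0, 40.0]
pred_ga = [29.0, 31.0, 37.0, 39.0]
mae, sd = mae_and_sd(pred_ga, true_ga)
rho = spearman_rho(pred_ga, true_ga)
```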
In our qualitative analysis we tested the ICAM reg model on previously unseen images of subjects with punctate white matter lesions (PWML), which are commonly seen in preterm babies [50], to test whether these would be detected by ICAM.In this experiment, ICAM is trained to predict birth age, rather than explicitly detect PWML.The results in Fig. 10 (yellow arrows pointing at the lesions) demonstrate that ICAM successfully and consistently detects these lesions.

C. ADNI experiments: Ground-truth evaluation of feature attribution maps
In the final experiment, we demonstrate the performance of ICAM's feature attribution against ground truth maps of disease progression estimated for AD to MCI conversion using the ADNI dataset and extend [33] to explore modelling regression of MMSE scores.
1) ADNI Dataset: The data used in this study was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), first launched in 2003, and led by Principal Investigator Michael W. Weiner, MD [51]. We used 1,053 3T T1 MRI images of AD and MCI patients with average age of 74.95 ± 8.1 (for AD subjects) and 72.26 ± 7.9 (for MCI subjects). For AD/MCI test subjects the average age was 73.47 ± 7.2. We split the dataset into AD and MCI classes, with 257 and 674 volumes used in training, respectively. For testing, 61 subjects who converted from MCI to AD (i.e. paired data) were used. A further 61 conversion subjects were used for validation. Each of these subjects was scanned before and after conversion to AD. Using these two scans, ground truth disease maps were generated through rigid alignment and subtraction. All disease and FA maps were masked to ensure that the returned normalised cross correlation (NCC) values reference brain tissue only.
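A masked NCC of the kind described above could be computed as in the sketch below. It assumes the common zero-mean, unit-variance definition of NCC and takes the absolute value of the attribution map before comparison; the exact variant used in the experiments is not specified here, and the flattened toy arrays are made up.

```python
import math


def masked_ncc(fa_map, truth, mask):
    """Zero-mean NCC between |fa_map| and truth, restricted to voxels where mask is set."""
    a = [abs(v) for v, m in zip(fa_map, mask) if m]
    b = [v for v, m in zip(truth, mask) if m]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = math.sqrt(sum((v - ma) ** 2 for v in a) / n)
    sb = math.sqrt(sum((v - mb) ** 2 for v in b) / n)
    if sa == 0.0 or sb == 0.0:
        return 0.0  # a constant map carries no correlation information
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n * sa * sb)


# Toy flattened maps; the last voxel lies outside the brain mask and is ignored.
fa = [0.0, -0.8, 0.1, 0.0, 9.0]
gt = [0.0, 0.4, 0.05, 0.0, 0.0]
mask = [1, 1, 1, 1, 0]
ncc = masked_ncc(fa, gt, mask)
```

Because the large value at the final voxel is outside the mask, it does not affect the score, which mirrors the purpose of masking to brain tissue only.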
2) ADNI Results: We compared our proposed approach against a range of baselines in our experiments. For a fair comparison, we use the same training, validation and testing datasets. In particular, we compared against Grad-CAM, guided Grad-CAM [18], guided backprop [22], integrated gradients [21], occlusion [24], and VA-GAN [30]. These methods were applied to a simple 3D ResNet. We compared against two variations of our network, ICAM and ICAM reg. See Appendix for full details on comparison methods.
Following the approach of Baumgartner et al. [30], experiments were performed to predict AD vs MCI classification. In extension to our previous work [33], we also modelled regression of MMSE scores. Results (Table II, Fig. 1) show that all versions of ICAM outperform VA-GAN and other FA baseline methods (occlusion, integrated gradients, Grad-CAM, guided Grad-CAM and guided backprop) with respect to predicting patterns of brain atrophy, which compare well to the ground truth. Specifically, the NCC metric is increased (Table II) and the returned FA maps detect much greater proportions of atrophy (for cortex and hippocampus) and ventricular expansion, relative to the ground truth (Fig. 1). At the same time, regression of the MMSE score returns a mean absolute error of 2.82 ± 2.14, which is comparable to the standard deviation of the MMSE scores of the test set (3.64). Importantly, we cannot compare to VA-GAN or other FA methods on this task, as they cannot normally be applied to regression tasks.

V. DISCUSSION
In our previous work [33] we developed a novel framework, ICAM, for classification with feature attribution, and showed that it outperforms state-of-the-art feature attribution methods on classification tasks for individual subject feature detection.

TABLE II
ADNI experiments comparing baselines with ICAM. Networks are compared using normalised cross correlation (NCC) between the absolute values of the attribution maps and the ground truth masks. The positive NCC (+) compares the lesion mask to the attribution map when translating between class 0 (MCI) to 1 (AD), and vice versa for the negative NCC (-). Values reported are the mean and standard deviation across the test subjects.
To our knowledge, this is the first generative method that can provide regression predictions alongside a subject-specific explanation for the prediction. Specifically, in our experiments we sought to test whether, when trained on a large dataset (UK Biobank), ICAM reg can learn to disentangle its attribute latent space, so as to support meaningful interpolation between images, and generate subject-specific explanations for outlier predictions. We then set out to show that ICAM reg can also work on much smaller and more heterogeneous datasets (dHCP, ADNI), while continuing to detect relevant features not explicitly defined during training, i.e. white matter lesions in the dHCP neonatal data; and predict clinically relevant phenotypes such as age at birth (dHCP) and cognitive test scores (ADNI).
Through experiments on UK Biobank, we demonstrate that ICAM is much better able to change the class of input images than VA-GAN [30] (Table I), which shows that while VA-GAN is only able to slightly modify the images by changing pixel intensities in order to generate FA maps, ICAM can drastically change the input image in order to change its class, and thus also generate more reliable FA maps.In separate experiments, we show that brain age prediction by ICAM reg (2.20 ± 1.86 MAE, Fig. 7) performs highly competitively relative to other deep learning methods trained on age prediction in UK Biobank (with reported test MAE scores of 2.14 ± 0.05 [55], and 4.006 [56]).Alongside the age prediction, we find that ICAM can provide meaningful and individual explanations for old and young classification, as well as outlier predictions (Figs. 5, 6).The meaningfulness of this prediction was emphasised through estimating Pearson's correlation, between both the real age and FA maps, and the predicted age and the FA maps.We found that predicted age was more strongly correlated with detected features, suggesting predicted age is a better indicator for FA map generation.We also demonstrated that our regression model has a more interpretable latent space than our previous model [33], through use of a tSNE comparison (Fig. 8), and demonstrated interpolation of the latent space between groups and within groups (Fig. 6).
In our dHCP experiments, we compared our regression model to a baseline CNN that has the same architecture as our attribute encoder, and found that ICAM-reg performs better than the baseline CNN on birth age prediction (Fig. 9). At the same time, the model returns subject-specific FA explanations of the predictions, which consistently detect punctate white matter lesions within individuals (a known feature of preterm birth, Fig. 10). These are detected despite stark changes in image intensity and appearance over this neonatal period.
For ADNI we show that ICAM-reg can predict cognitive scores related to Alzheimer's (MMSE scores), and still provide meaningful FA map explanations that are better than baseline methods, though slightly worse than our classification model (disease map comparisons, Fig. 1, Table II). We note that this is a very challenging prediction task, as the MMSE scores are not a complete indicator for Alzheimer's, but rather an additional factor that clinicians can use to help with the diagnosis. This could make the loss function less optimal, and thus result in reduced NCC scores. It is possible that NCC results could be improved through additional hyperparameter optimisation, or by using combined metrics of cognitive tests that are also used for AD and MCI diagnosis, e.g. Addenbrooke's Cognitive Examination-Revised (ACE-R) and Montreal Cognitive Assessment (MoCA) [57].
Finally, there are still several challenges in using ICAM, including applying it to small datasets with imbalanced classes; e.g. in the dHCP experiments we observed that the FA maps were noisy relative to results using the UK Biobank. This is not unexpected given the vast difference in the sizes of the datasets; however, it confounds interpretation of the FA maps. In addition, we found that while ICAM worked well for the rigidly aligned ADNI and Biobank datasets, it did not work well in early experiments on the dHCP dataset when the data were rigidly aligned. We instead used non-linearly aligned data in the experiments shown in this paper. This could be due to vast differences in tissue intensities and shape for neonates, as they change very rapidly at an early age. Furthermore, we found that the latent space is not completely separated even in larger datasets (Fig. 8). These challenges could be addressed in future work via application of GAN augmentation techniques [58] to increase training data for smaller datasets, and latent space clustering strategies to further encourage disentanglement of rare classes [59]. This would also help in training on imbalanced datasets.

A. Network Architecture
Our encoder-decoder architecture for a 3D input is shown in Fig. 11. The architecture for a 2D input is the same, only using 2D convolutions and a 2D attribute space. Here, an input image is encoded using 2 shared networks, the attribute encoder E_a and the content encoder E_c, and is then reconstructed or translated (to another class) using the generator, G.
The key components of the attribute encoder include using down ResNet blocks (with average pooling, and leaky ReLU activation) for encoding the input image into a relatively large 3D latent space of size 8 × 10 × 8 (in the 3D case), as opposed to a 1D vector, which is commonly seen in Variational Autoencoders (VAEs). We also added a fully connected layer (f_C1) to the attribute latent space to enable classification. In our regression model ICAM-reg, we added an additional fully connected layer (f_C2) to output a prediction. In early development, we found that using a 1D vector in the latent space was insufficient for encoding the required class information for brain imaging, and observed that some class information was instead encoded in the content encoder, which is meant to be class invariant. Using a sufficiently large 2D or 3D latent space (depending on the input) helped with addressing this problem.
The goal of the content encoder is to encode a class-irrelevant space, which allows translation between classes. The key components of the content encoder are 2 down convolutional blocks (with instance normalisation, and ReLU activation), followed by 4 basic ResNet blocks (with instance normalisation, and ReLU activation), and finally a Gaussian noise layer. The basic ResNet blocks aid the encoding of a class-irrelevant space, and the Gaussian layer prevents the space from becoming zero.
Our generator takes as input the content and attribute latent spaces. The attribute is first upsampled (×4, with nearest neighbours) to the same size as the content latent space, concatenated, and then combined using several basic ResNet blocks. Finally, we use deconvolutional blocks (transpose convolution with kernel size of 4, followed by average pooling, layer normalisation [60], and a ReLU activation) to upsample to the original input size.
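The first step of the generator (upsampling the attribute latent space and concatenating it with the content latent space) can be illustrated with a small NumPy sketch. The channel counts are arbitrary assumptions, the function names are hypothetical, and the subsequent ResNet and deconvolutional blocks are omitted; the 32 × 40 × 32 content size simply follows from upsampling an 8 × 10 × 8 attribute space by a factor of 4.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of the spatial axes of a (C, D, H, W) array."""
    for axis in (1, 2, 3):
        x = np.repeat(x, factor, axis=axis)
    return x

def fuse_latents(content, attribute, factor=4):
    """Upsample the attribute latent and concatenate it with the content latent."""
    up = upsample_nearest(attribute, factor)
    if up.shape[1:] != content.shape[1:]:
        raise ValueError("upsampled attribute and content sizes do not match")
    return np.concatenate([content, up], axis=0)  # concatenate along channels

rng = np.random.default_rng(0)
attribute = rng.normal(size=(1, 8, 10, 8))    # attribute latent (1 channel assumed)
content = rng.normal(size=(4, 32, 40, 32))    # content latent (4 channels assumed)
fused = fuse_latents(content, attribute)
print(fused.shape)  # (5, 32, 40, 32)
```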

B. Comparison methods - extra details
We compare our proposed approach against a range of baselines in our experiments. For a fair comparison, we train and test all methods on the same training, validation and testing datasets.
Grad-CAM, guided Grad-CAM [18], guided backpropagation (backprop) [22], integrated gradients [21] and occlusion [24]: We trained a simple 3D ResNet with 4 down ResNet blocks, and a fully connected layer for classification. We then used the Captum library [61] implementation of Grad-CAM, guided Grad-CAM, guided backprop, integrated gradients and occlusion to generate the feature attribution maps for each method.
Guided backprop [22] is a gradient-based method that computes the gradients with respect to an input image. More specifically, it determines which pixels affect the prediction the most, by propagating only positive error signals (i.e. by applying ReLU to the error during the backward pass).
Grad-CAM [18] is a gradient-based saliency method that computes the gradients of the target output with respect to the final convolutional layer of a network. The layer activations are weighted by the average gradient for each output channel and the results are summed over all channels to produce a coarse heatmap of prediction importance for each class. Guided Grad-CAM is simply the combination of the results of Grad-CAM and guided backprop.
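The weighting step described above can be sketched in NumPy as follows. This is an illustration only, not the Captum implementation used in our experiments: it assumes the activations and gradients of the final convolutional layer have already been extracted as (C, D, H, W) arrays, and it applies the final ReLU from the original Grad-CAM formulation. The function name and toy arrays are hypothetical.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Coarse Grad-CAM heatmap from (C, D, H, W) activations and gradients."""
    # Average gradient per channel gives one weight per channel.
    weights = gradients.mean(axis=(1, 2, 3))
    # Weighted sum of the activations over channels -> (D, H, W).
    cam = np.einsum("c,cdhw->dhw", weights, activations)
    # Keep only features with a positive influence on the target output.
    return np.maximum(cam, 0.0)

# Toy example with 2 channels on a 4 x 5 x 4 grid.
activations = np.stack([np.ones((4, 5, 4)), 2.0 * np.ones((4, 5, 4))])
gradients = np.stack([np.full((4, 5, 4), 2.0), np.full((4, 5, 4), -0.5)])
cam = grad_cam(activations, gradients)  # 2.0 * 1 + (-0.5) * 2 = 1 everywhere
```

The resulting coarse map would then be upsampled to the input size for visualisation.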
Integrated gradients [21] is another method of analysing the gradient of the prediction output with respect to features of the input. It is defined as the integral of the gradients along the straight line path from a given baseline to the input image. A series of images are interpolated between the baseline (e.g. a matrix of 0s) and the original image, and the integrated gradients are given by the integration of the computed gradients for all the images in the series.
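A minimal NumPy sketch of this path integral is given below. It is not the Captum implementation: it assumes a user-supplied gradient function `grad_fn` (hypothetical) and approximates the integral with a midpoint Riemann sum, which is one of several possible quadrature rules. The toy model f(x) = Σ x² is chosen because its attributions are known analytically (x² per feature).

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=200):
    """Approximate integrated gradients along the straight path baseline -> x."""
    x = np.asarray(x, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(x)  # constant-zero baseline
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * total / steps

# Toy differentiable model with an analytic gradient.
f = lambda v: float(np.sum(v ** 2))
grad_f = lambda v: 2.0 * v

x = np.array([1.0, -2.0, 3.0])
attr = integrated_gradients(grad_f, x)
```

A useful sanity check is the completeness property: the attributions should sum to f(x) − f(baseline).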
Occlusion [24] is a perturbation-based method that involves replacing portions of an image with a block of a given baseline value (e.g. 0), and computing the difference in output. A heatmap is formed using the difference between the output probability attributed to the original volume and the probability computed for the occluded volume, for different positions of the occlusion block across the input image.
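The sliding-block procedure can be sketched in NumPy as below. This is an illustrative re-implementation rather than the Captum code: `predict_fn` is a hypothetical stand-in for the classifier's output probability, and averaging the output drop over overlapping blocks is a design choice of this sketch. With this simple grid, voxels near the far edge are left uncovered when the volume size minus the block size is not a multiple of the stride.

```python
import numpy as np
from itertools import product

def occlusion_map(predict_fn, volume, block=10, stride=5, baseline=0.0):
    """Heatmap of the output drop caused by occluding cubic blocks of a volume."""
    volume = np.asarray(volume, dtype=float)
    reference = predict_fn(volume)
    heat = np.zeros_like(volume)
    counts = np.zeros_like(volume)
    # Start positions of the block along each axis.
    starts = [range(0, max(s - block, 0) + 1, stride) for s in volume.shape]
    for corner in product(*starts):
        region = tuple(slice(c, c + block) for c in corner)
        occluded = volume.copy()
        occluded[region] = baseline
        drop = reference - predict_fn(occluded)  # original minus occluded output
        heat[region] += drop
        counts[region] += 1
    # Average over all blocks that covered each voxel.
    return heat / np.maximum(counts, 1)

# Toy "model": its output depends only on a small cube of the volume.
toy_predict = lambda v: float(v[12:16, 12:16, 12:16].sum())
volume = np.ones((20, 20, 20))
heat = occlusion_map(toy_predict, volume, block=10, stride=5)
```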
Grad-CAM was implemented on the last convolutional block of the ResNet, with a size of 4 × 5 × 4, and was upsampled to the input size for visualisation. For the implementation of integrated gradients we considered a baseline volume with a constant value of 0, and the integral was computed using 200 steps. Occlusion was implemented using occlusion blocks with value 0, size 10 × 10 × 10 and stride 5.
VA-GAN [30]: We used the VA-GAN network for feature attribution, as described in the original paper.
Model selection: For VA-GAN and ICAM the last model is selected in the Biobank experiments. For all our regression experiments the best model was selected based on the best MAE score on the validation dataset. In all other experiments, the models selected are based on the best model result on the validation dataset, using the NCC score. For Grad-CAM, guided Grad-CAM, guided backprop, integrated gradients and occlusion, as the FA maps are only generated after a network is trained, we could not select a model based on its NCC score during training/validation. We instead selected the best model based on the classification accuracy on the validation dataset, to limit the effect of overfitting.

C. Training details
We used the PyTorch [62] Python package in all of our deep learning experiments, and trained using NVIDIA TITAN GPUs. We trained our networks in a similar fashion to Lee et al. [34]. During training, in each iteration the content discriminator is updated twice, followed by the update of the encoders, generators, and domain discriminators (i.e. each training iteration uses 3 batches to perform these updates). For each update of the generator, an input is selected for each class (e.g. 2 inputs including class 0 and 1) to achieve translation. In addition, each input is encoded and translated to the opposite class by randomly sampling the attribute latent space, and obtaining an appropriate class, using the classifier.
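The random sampling of the attribute latent space described above (rejection sampling with the classifier) can be sketched as follows. This is a simplified illustration: it assumes samples are drawn from a standard normal prior, and `toy_classifier` is a hypothetical stand-in for the classification layer applied to the attribute space.

```python
import numpy as np

def sample_opposite_attribute(classify, source_class, shape, rng, max_tries=1000):
    """Draw attribute latents until the classifier assigns a different class."""
    for _ in range(max_tries):
        z = rng.normal(size=shape)       # assumed standard normal prior
        if classify(z) != source_class:  # accept only samples of the other class
            return z
    raise RuntimeError("no attribute sample of the opposite class was found")

def toy_classifier(z):
    """Hypothetical classifier: class 1 if the latent mean is positive, else 0."""
    return int(z.mean() > 0.0)

rng = np.random.default_rng(0)
z_new = sample_opposite_attribute(toy_classifier, source_class=0,
                                  shape=(8, 10, 8), rng=rng)
```

The accepted sample would then be passed to the generator together with the encoded content space of the input; repeating the draw several times at test time yields a set of translations from which mean and variance FA maps can be computed.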
In the UK Biobank experiments, we trained all networks for 50 epochs. In the ADNI experiments, all networks (including VA-GAN) were trained for 300 epochs. In the ADNI experiments, because we had a limited dataset, we further refined ICAM with updated lambdas (λ_rec = 10, and λ_BCE = 20) for another 200 epochs. We could not refine VA-GAN any further because generator and discriminator losses went to zero during training, often after 150 epochs. In our dHCP experiments, we used the same hyperparameters as in our Biobank experiments, but trained the networks for 1000 epochs.
Regression models: For training our regression models (including the baseline CNN), we started from a network pretrained as the classification model (trained as described above). All networks were then retrained with the same hyperparameters as before, with the addition of the regression loss. In our dHCP experiments we used a model first pretrained on the Biobank dataset.
Baseline methods: For training VA-GAN and DRIT, we used the default parameters as provided in the original papers and publicly released code repositories. For Grad-CAM, integrated gradients, and occlusion, the classifier network was trained with a learning rate of 0.0001, SGD with momentum of 0.9, for 50 epochs, and using a weighted BCE loss to account for class-unbalanced training data. Since the model converged by 50 epochs, we did not train for any longer.
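For illustration, one common form of a weighted BCE loss is sketched below in plain Python. The exact weighting scheme used for the baseline classifier is not detailed here, so the positive-class weight (ratio of negative to positive training examples) and the toy class counts are assumptions of this sketch.

```python
import math

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Toy class counts (assumed): the rarer positive class is up-weighted.
n_negative, n_positive = 600, 400
pos_weight = n_negative / n_positive  # 1.5

loss = weighted_bce([0.9, 0.2, 0.6], [1, 0, 1], pos_weight)
```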
For training the baseline-CNN for the dHCP experiments, we used the same architecture as our attribute encoder, and trained using smooth L1 loss, with Adam optimiser (learning rate = 0.0001, betas = [0.5, 0.999]) for 1000 epochs.
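As a reminder of what the smooth L1 loss computes, a plain Python sketch is given below. It follows the standard definition with a transition point `beta` (quadratic for small errors, linear for large ones) and mean reduction; `beta = 1.0` is assumed here.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic below beta, linear above it."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            total += 0.5 * d * d / beta   # small errors: L2-like
        else:
            total += d - 0.5 * beta       # large errors: L1-like
    return total / len(pred)

# Toy predictions and targets for illustration only.
loss = smooth_l1([0.5, 3.0], [0.0, 0.0])  # (0.125 + 2.5) / 2
```

Compared with a squared error, the linear regime makes the loss less sensitive to outlier targets.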

Fig. 1. ADNI comparisons of Feature Attribution (FA) maps. ICAM is the first known method able to generate variance and mean FA maps at test time, and shows good detection of the ventricles (blue arrows), cortex (green arrows), and hippocampus (pink arrows) when compared with the ground truth disease map. The top 2 baseline methods are shown here, and perform sub-optimally in comparison to ICAM.

Fig. 2. Comparison of domain mapping methods. (a) VA-GAN translates images of domain x to y. (b) DRIT can translate between domains x and y through a shared content space C, and separate attribute spaces A_x and A_y. (c) ICAM-reg uses shared content C and attribute A spaces to translate between domains, which allows classification f_C1 and regression f_C2 layers to be applied to the attribute space A.

Fig. 4. Rejection sampling during training/testing. Using ICAM, translation can be achieved using a single input image, in addition to translating between 2 images. (a) An input image is encoded into content and attribute spaces, and is passed through the classifier to identify its class (0 in this example). (b) Attribute space A is then randomly sampled until a random vector of the opposite class is sampled (class 1 in this case), by checking its class using the classifier. The newly sampled vector is passed to the generator along with the encoded content space to achieve translation between class 0 and 1. At test time, it is possible to sample the attribute latent space multiple times to get mean and variance FA maps.

Fig. 5. UK Biobank regression results. Here we show examples of the FA map results of our regression model ICAM-reg on 4 subjects, with actual age, alongside a predicted age by ICAM-reg, where FA corresponds to translation of the scans from A) old to young (using rejection sampling to generate mean FA maps) and B) young to young (by translating between subject 3 and 4 to generate single FA maps). In each case FA maps return explanations for outlier predictions for age-matched subjects.
[0, 1], per subject. For our classification experiments we used 11,735 MRI volumes, with the young subjects (45-60 years, average age 54.6 ± 3.4 years) separated into training, validation, and testing sets with 6706, 373 and 372 in each, respectively. The older subjects (70-80 years, average age 73.0 ± 2.2 years) were separated into training, validation, and testing sets with 3856, 214 and 214 in each, respectively. For our regression experiments, we used a larger dataset (i.e. all available subjects) of 21,388 subjects, between 45-80 years old. We also split the subjects into older and younger groups, as ICAM requires a minimum of 2 classes for training. Here, subjects corresponding to the young class (45-65 years, average age 57.6 ± 4.8 years) were separated into training, validation, and testing sets with sizes 10715, 595 and 595, respectively. The older subjects (65-80 years, average age of 70.0 ± 3.3 years) were separated into training, validation, and testing sets with sizes 8535, 474 and 474, respectively.
1 See Appendix for full details on training and λ values. All of the important components of the network were evaluated through an ablation study in Bass et al. [33].

Fig. 8. Biobank tSNE comparison. Here we show a comparison of a tSNE plot for ICAM and ICAM-reg for a breakdown of ages. We find that ICAM-reg has better separation in the latent space compared to ICAM.

Fig. 9. dHCP birth age prediction on the test dataset.The age prediction MAE is 0.806 ± 0.634 and 1.525 ± 1.160, for ICAM and the baseline network, respectively.The Spearman correlation coefficient is 0.873 and 0.695 (p < 0.0001), for ICAM and the baseline network, respectively.

Fig. 10. dHCP results. Here we show detection of punctate white matter lesions (yellow arrows) on previously unseen images by ICAM-reg.
, ICAM's regression module was used to predict the MMSE score, a test used to help diagnose Alzheimer's disease (a score of 25-30 is considered normal cognition, 21-24 mild dementia, 10-20 moderate dementia, and 9 or lower severe dementia). Here, MCI and AD training sets had an average MMSE score of 27.75 ± 2.61 and 23.00 ± 2.63, respectively. The MCI and AD test and validation sets had a similar average MMSE score of 26.80 ± 2.95 and 23.93 ± 4.33, respectively.

TABLE I
Biobank generation experiment comparing classification accuracy (young vs old) on real, ICAM-generated, and VA-GAN-generated data. Note that because VA-GAN can only perform old-to-young translation, it has only 1 result in the table.
Fig. 7. Biobank age prediction on the test dataset using ICAM-reg. The age prediction error is 2.20 ± 1.86 MAE.