Robust Brain Age Estimation based on sMRI via Nonlinear Age-Adaptive Ensemble Learning

 Abstract: Precise prediction of brain age is urgently needed in many biomedical areas, including mental rehabilitation prognosis and trials of medicines and treatments. Contrasting chronological (real) age with predicted brain age can help to highlight brain issues and evaluate whether a patient's brain is healthy. Such age prediction is often challenging for a single model, because the condition of the brain varies drastically with age. In this work, we present an age-adaptive ensemble model that combines four different machine learning algorithms: support vector regression (SVR), a convolutional neural network (CNN) model, and the popular GoogLeNet and ResNet deep networks. The proposed ensemble model is nonlinearly adaptive: age is taken as a key factor in the nonlinear combination of the independent single-algorithm models. In our age-adaptive ensemble method, the weight of each model is learned automatically as a nonlinear function of age instead of a fixed value, and brain age is estimated from this age-adaptive integration of the single models. The quality of the model is quantified by the mean absolute error (MAE) and the Spearman correlation between the predicted age and the actual age, with the lowest MAE and the highest Spearman correlation representing the highest accuracy. Tested on the Predictive Analytics Competition 2019 (PAC 2019) dataset, our ensemble model achieves an MAE of 3.19 years, an improvement in accuracy on this brain age competition. If deployed in the real world, the improved accuracy could potentially help doctors identify the risk of brain diseases more accurately and quickly, help pharmaceutical companies develop drugs or treatments more precisely, and potentially offer a powerful new tool for researchers in brain science.


I. Introduction
The growing aging population presents many acute challenges globally in the 21st century, with a profound impact on all aspects of life. Among them, brain function decline and neurodegenerative diseases in the aging population create serious economic, medical, and societal problems [1-2]. In the life science and biomedical domains, methods for predicting and assessing the risk of age-related neurodegeneration in the elderly, and related treatments to slow or reverse the process, are among the fundamental research topics [3]. Although brain aging is a natural process, there are individual differences in the changes of brain volume, cortical thickness, and white matter microstructure [4][5][6]. In addition, the degree of deviation of a particular person's brain aging trajectory from the average trajectory of healthy brain aging has been shown to reflect the individual's future risk of developing neurodegenerative diseases [7][8]. Therefore, building models based on the characteristic patterns of brain aging within neuroimaging data and detecting the aging trajectories of individual brains offer a new perspective for studying brain aging differences [3].
This paper was submitted on xx/xx/2021 for review, and accepted on xx/xx/2021. This work was supported in part by the EPSRC grant.
The accurate prediction of brain age has not only critical scientific significance but also extensive clinical value [9]. Research has shown that as the difference between the predicted brain age and the actual age increases, so does the risk of physical problems and early death [10]. Based on accelerated brain aging, brain age estimation can help to diagnose Alzheimer's disease [67][68], psychiatric disorders [69], physical problems [70] and traumatic brain injuries [71]. It can also predict the future conversion from mild cognitive impairment to Alzheimer's disease [72]. This approach not only supports diagnosis but also provides a basis for good living habits; for example, Steffener et al. [8] showed that higher education and physical exercise can help to keep the brain active and young. Luders et al. [7] reported that the brains of people who meditate regularly are more active than those of normal people of the same age. Also, the work of Erus et al. [73] suggested that accelerated cognitive development is an important factor leading to accelerated brain development in young subjects. Cheng et al. [74] used a two-stage 3D convolutional network for brain age estimation. He et al. [75] utilized a global-local vision transformer to achieve good accuracy on brain age estimation. Peng et al. [76] exploited a simple lightweight fully convolutional network to address the challenges of brain age estimation. These single-method models have demonstrated their merits on this challenging topic.
Predicting brain age can also play a meaningful role in medical development, where clinical trials are an important part of clinical science [12][13]. At present, many pharmaceutical companies across the world are committed to researching medicines for the treatment of age-related diseases. However, the effect of these medications is not obvious in the short term. Even experienced doctors cannot judge whether the drugs have worked, so following up on the curative effect may take several years. This makes it difficult for pharmaceutical companies to collect medical data, which restricts the research and development of medicines for age-related diseases [13]. Brain age estimation provides an alternative way to observe the effects of drugs, through changes in the predicted brain age [13]. In recent years, deep learning has been the main approach for the estimation of brain age, as it can capture subtle changes in the brain through hierarchical feature representations in an end-to-end way [3]. Existing research has shown that the difference between predicted brain age and the participant's actual age is small for healthy people [3], [7], [10]. The development of deep learning in brain age estimation enables pharmaceutical companies to follow up from the moment patients begin taking drugs, so as to learn the effect of the drugs in time and acquire patients' data quickly.
Detecting brain condition by brain age estimation involves two basic steps. First, we need to develop a model that, based on brain neuroimaging data, determines the age at which a healthy person would have a brain in the observed state [3]. This model can be obtained by training deep learning models on healthy samples. Second, we compare the predicted age with the real age. If a sample's predicted brain age is older than their real age, it indicates poor brain health. It is worth noting that the training data must be collected from healthy people, because the age predicted by the model should represent the age of a healthy person with a similar brain condition. If the training data contained samples with diseases, the predicted age would not represent the age that a healthy person would have in this brain condition, and the comparison between the predicted age and the true age would be meaningless.
A frequently used method for brain age estimation is to perform classification or regression on brain images [1], [14]. Several machine learning methods exist for this purpose. Previously, Huang et al. [42] applied CNNs to brain age estimation, and notably, Cole et al. [43] implemented a 3D CNN, trained on T1-weighted MRI, to predict brain age and achieved promising results. Our initial motivation is to identify which of the various models for predicting brain age works best, and to find the most suitable model for each age stage. Further, we aim to establish a novel ensemble model by combining different independent models, and to benchmark it against the single independent models on brain age prediction.
The major innovation of this work relates to a novel non-linear age-adaptive ensemble model (nl-AAE), which is considered as a nonlinear function in the combination of multiple independent models. The age-adaptive ensemble model, with the advantages of multiple independent models, can be fully learned over the characteristics of the brain of each age group, thus achieving high accuracy of the predictions. Here, we have considered four different independent models, including a GoogLeNet, a ResNet, an SVR, and a self-designed CNN model. The nonlinear age-adaptive learning encoded in our ensemble model utilizes the changed weights of the constituent models based on the age of the sample. The combined model is adaptable to age changes nonlinearly and learns the brain characteristics over different ages.
We have tested our nl-AAE models on the PAC 2019 competition dataset and benchmarked them against the four constituent algorithms. Such an integrated model has great potential to provide a highly accurate measure of brain health for clinical trials of neuroprotective therapies, to screen groups of people at risk of poorer cognitive aging, and to provide mechanistic insights into the downstream consequences of different aging-related diseases. Figure 1 shows the contributions of our AAE model: it predicts brain age more accurately than other classical methods, which can improve its performance in Alzheimer's detection, traumatic brain injury detection, schizophrenia detection, medicine testing and so on.
The remainder of the paper is organized as follows. Section II reviews the existing relevant work. Section III gives a preliminary overview on existing models. Section IV presents our proposed nonlinear age-adaptive ensemble model. Section V shows the experimental results. Finally, Section VI concludes the whole paper.

II. RELATED WORK
Previously, brain age prediction was conducted using feature extraction from brain MRIs followed by a classification or regression analysis. However, useful information might be lost, since manually engineered features are unlikely to describe the relevant information on brain age explicitly. Specifically, pre-processing the images subjectively requires additional assumptions at various stages of the pre-processing pipeline. These assumptions can hardly be satisfied, which can result in model error [17][18]. Besides, extracting features manually is a time-consuming task. In practice, decisions should be made within a few minutes to avoid delaying treatment. The above issues are the main reasons why brain age prediction was not widely adopted.
Figure 1 The contributions of our AAE model
The emergence of deep learning models provides a possibility to address those issues. Convolutional neural networks, which are widely adopted in image classification tasks, have shown great potential in visual feature extraction. The strong learning ability and automated decision-making pipeline of CNN models make them a promising alternative for brain age estimation that can improve the efficiency of medical consultation, clinical diagnosis and treatment decision making [2], [19][20]. At present, deep learning has not only been applied successfully to diagnosing schizophrenia [21], ADHD [22], autism [23], and Alzheimer's disease [24], but also helps to identify new biomarkers [25] and formulate new hypotheses [26].
Although deep learning has achieved success in biomedical fields, there are still several remaining challenges in terms of technology and practical applications [3], [27][28][29]. For example, deep neural networks require large sample sizes for fitting models, while neuroimaging datasets are often relatively small [30][31]. This data scarcity restricts the ability to learn image features effectively, and overfitting can also appear. Compared with 2D neuroimaging data, 3D images require more GPU memory, which means that models that are successful on 2D data (e.g., ImageNet classification [32][33]) are not necessarily feasible in 3D settings. Besides, further improving model accuracy is a long-term objective in deep learning research. The literature shows that deep learning models fail to achieve the best result for certain tasks [34][35]. Another open question is how to choose a suitable model complexity. The no free lunch theorem suggests that task-specific design is necessary to achieve better results.
Ensemble modeling provides a solution to the problem of choosing the best predictive model in machine learning. An ensemble model combines the predictions of several models to make the final prediction, which increases the overall performance [36]. Several strategies for combining the predictions of individual models have been proposed to improve performance, such as averaging and voting. As early as 1785, the Marquis de Condorcet argued that if the probability of each independent voter being correct is above 0.5, then adding more voters increases the probability that the majority vote is correct [37], which supports the view that ensemble models can perform better than individual models.
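Condorcet's argument can be checked numerically. The sketch below is only an illustration under the idealized assumption of fully independent voters with the same accuracy p; the function name is hypothetical. It sums the binomial probabilities of all outcomes in which a strict majority is correct.

```python
from math import comb

def majority_correct_probability(n_voters: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct."""
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n_voters // 2 + 1  # smallest number of correct votes forming a majority
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(k_min, n_voters + 1)
    )

# With p = 0.6 the majority becomes more reliable as voters are added.
probabilities = [majority_correct_probability(n, 0.6) for n in (1, 3, 5, 11)]
```

Real base models are trained on the same data and are therefore correlated, so the gain of an actual ensemble is usually smaller than this idealized calculation suggests.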

A. Dataset
The dataset we used is based on [3] and includes the brain sMRIs of 2641 healthy individuals together with information such as their age and gender. The sample ages range from 16 to 90 years, with a mean of 35.8 years and a standard deviation of 16.2 years. Of the participants, 53% are female and 47% are male. The average age of females is 37 years, with a standard deviation of 17.2 years. The average age of males is 34.6 years, with a standard deviation of 14.9 years. The age distribution of the data is shown in Figure 2. We remark that the dataset has an unbalanced distribution, with fewer samples in the older population and more samples in the younger population. Cole et al. [43] give the details of the samples.

B. Data features
In our project, we use two different kinds of data as input for the models. One is Gray Matter and White Matter Maps, the other is Surface-Based Processing of Gray Matter.
The Gray Matter and White Matter Maps were distributed by the PAC organization. The brain sMRIs were nonlinearly registered to MNI152 space. The images were then segmented into different tissues, such as gray matter and white matter, using DARTEL and SPM12, so that each tissue has a map, and each map was smoothed with a 4 mm kernel. For more details, please refer to [43]. This kind of data is used as the input of the self-defined CNN, ResNet and GoogLeNet in our project.
As for Surface-Based Processing of Gray Matter, we extract vertex-wise measurements of cortical thickness and surface area from the sMRIs using FreeSurfer 6.0 [45]. Similarly, vertex-wise thickness and surface features of seven subcortical nuclei were extracted with the ENIGMA-shape protocol [46][47]. After this pre-processing, we obtain nearly 650,000 gray matter measurements per individual. This processing method was used by Couvy-Duchesne et al. [48], who reported that the processed data have a maximal association with age. This kind of data is used as the input of the SVR in our project.

C. Basic Independent Models
Recent progress on deep neural networks [51][52][53] has greatly advanced the applications of medical biometrics [54][55][56] for health diagnosis, particularly toward understanding how neurons in the brain function [57] and dysfunction [58]. In this work, we aim to exploit deep learning techniques with ensemble approaches for our brain age estimation task.
Figure 2 Age and sex distribution of the MRI brain dataset: a) Age distribution, b) Age over sex variance
Before we describe the architecture of the ensemble model, we first introduce its basic building blocks. According to previous research, each of these models performs remarkably well in estimating brain age [3][42][43].

1) Convolutional Neural Networks:
The CNN we built was implemented using Keras with TensorFlow as the backend. Each of the first 5 consecutive blocks consists of a 3×3×3 3D convolution layer, a batch normalization layer, an ELU activation and a max-pooling layer. The 6th block contains a dropout layer and the 7th block contains a fully connected layer. The input is a 3D volume of 121×145×121 voxels, and the convolutional part of the model reduces it to 128 feature maps of size 4×5×4. The final fully connected layer reduces the feature maps to a single number that stands for the predicted age. We train this model on two channels, formed by concatenating the gray matter and white matter maps. The loss function is MAE and the optimizer is Adam, with a learning rate of 0.001, a decay of 10^-4, β1 = 0.9 and β2 = 0.999.

2) GoogLeNet (Inception V1):
This structure is used for brain age estimation in [44]. It is composed of a stem network, two inception modules, a max-pooling layer, five inception modules (two of which are each connected to an auxiliary regression), a max-pooling layer, two inception modules, an average pooling layer, a dropout layer, and a fully connected layer. Compared to Google's Inception V1, it replaces the final softmax layer with a fully connected layer, so that the task becomes regression rather than classification. The convolutional filter in this model consists of an input layer, a convolutional layer, a batch normalization layer, a ReLU activation and an output layer. The stem network consists of an input layer, a convolutional filter, a max-pooling layer, two convolutional filters, a max-pooling layer and an output layer. Each inception module contains an input layer, seven convolutional filters, a max-pooling layer, a concatenation layer and an output layer.
The auxiliary regression, which is used to mitigate the vanishing gradient problem, is composed of an input layer, an average pooling layer, a convolutional filter, a fully connected layer, a ReLU layer, a dropout layer, a fully connected layer and an output layer.
The input of this model is a 3D map of gray matter density with 121×145×121 voxels, and the output is the predicted age. The loss function is MAE, and the optimizer is Adam with a learning rate of 0.0001 and a batch size of 8.
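The feature-map size quoted for the self-built CNN can be checked with a short calculation. The sketch below assumes that each of the 5 pooling layers uses stride 2 with 'same'-style padding, so that every spatial dimension is halved and rounded up; this padding choice is an assumption, and the helper name is hypothetical. Under it, the flattened size matches the 128×4×5×4 = 10240 input neurons reported for the fully connected block of the ResNet.

```python
from math import ceil, prod

def pooled_shape(shape, n_pools, stride=2):
    """Spatial shape after n_pools pooling layers that halve and round up."""
    for _ in range(n_pools):
        shape = tuple(ceil(s / stride) for s in shape)
    return shape

input_shape = (121, 145, 121)                          # sMRI volume
feature_shape = pooled_shape(input_shape, n_pools=5)   # after the 5 blocks
flattened = 128 * prod(feature_shape)                  # 128 feature maps, flattened
```

With 'valid' pooling (rounding down) the same input would instead shrink to 3×4×3, so the reported 4×5×4 size is what suggests the rounding-up behaviour assumed here.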

3) ResNet:
The parameters of the ResNet we built are similar to those of the self-built CNN above. The difference is that the ResNet includes residual blocks, which our self-built CNN does not have. The ResNet consists of 5 residual blocks, each followed by a max-pooling layer with kernel size 3×3×3 and stride 2×2×2, and one fully connected block. Each residual block contains a combination of layers that is repeated twice. This combination consists of a 3D convolutional layer with stride 1×1×1 and kernel size 3×3×3, a batch renormalization layer, and an ELU activation function. The block also adds the signal fed into the residual block to the output of a layer close to the end of the block. The fully connected block is a multilayer perceptron with one hidden layer. The input layer has 128×4×5×4=10240 neurons, the hidden layer (FC 1) has 256 neurons with an ELU activation function, and the output layer has a single neuron. A dropout layer with a keep rate of 0.8 follows the hidden layer. Finally, the output layer (FC 2) performs a linear regression on the hidden layer features. We use 3D maps of gray matter density as input data and MAE as the loss function. The model is optimized using Adam with a learning rate of 0.001, a decay of 10^-4, β1 = 0.9 and β2 = 0.999.

4) SVR:
The support vector machine (SVM) is a classical machine learning model that constructs a set of hyperplanes to separate the feature space. It was first used for binary classification and was later extended to a regression version called SVR. In this work, we use SVR with a radial basis function kernel. The input data is the Surface-Based Processing of Gray Matter, which has nearly 650,000 gray matter measurements per individual, and the output is the sample's predicted age. The implementation we used is the scikit-learn package in Python. The number of epochs is set to over 300, and the models with the highest accuracy are kept.
With the above four models, we will investigate these models over different age groups and establish an ensemble model based on these independent models.

A. Fundamentals of Ensemble Learning
Ensemble learning completes learning tasks by constructing and combining multiple learners. It is also referred to as a multiclassifier system or committee-based learning. The general structure of an ensemble learning model is to generate a group of individual learners first, and then combine them with a certain fusion strategy. In general, the generalization performance of ensemble learning is better than the individual learners [59][60].
The Bayes optimal classifier is based on Bayesian decision theory. It is an ensemble of all the hypotheses in the hypothesis space [61]. It remains a popular supervised learning method for classification problems.
Boosting is an algorithm that can boost weak learners into strong ones [62]. It first trains a base learner on the initial training set, and then adjusts the distribution of training samples according to the performance of the base learner, so that misclassified samples receive more attention in subsequent rounds. This process is repeated until the number of base learners reaches a pre-specified value.
Bagging is the best-known representative of parallel ensemble learning. Its principle is based on bootstrap sampling [63]: a subset of size m is constructed by sampling with replacement. For an ensemble model with T base learners, T such subsets are generated, one to train each learner. The prediction is then made by fusing the results of the base learners. Random Forest is one of the best-known extended variants of Bagging.
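The bootstrap step of Bagging can be sketched in a few lines. This is a minimal illustration with hypothetical names and toy data, not part of the pipeline used in this work: each of the T base learners receives its own subset drawn with replacement from the training set.

```python
import random

def bootstrap_subsets(dataset, n_learners, subset_size, seed=0):
    """Draw one subset per base learner by sampling with replacement."""
    rng = random.Random(seed)
    return [rng.choices(dataset, k=subset_size) for _ in range(n_learners)]

data = list(range(10))  # toy training set of 10 sample indices
subsets = bootstrap_subsets(data, n_learners=3, subset_size=len(data))
```

Because sampling is done with replacement, a subset usually contains repeated samples and omits others, which is what makes the base learners differ from one another.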
BMA, BMC and Stacking represent different model combining strategies. BMA [64] uses the weighted average method to combine the models where the weight of each model is equal to the posterior probability of the model. BMC [65] is an algorithmic correction to BMA. Instead of sampling each model in the ensemble individually, it samples from the space of possible ensembles.
Stacking first trains the initial learner on the initial data set, and then generates a new data set for training the secondary learner. In this new data set, the output of the primary learner is used as the input feature of each sample, while the initial sample's label is still used as the label. The secondary learner is typically a logistic regression model [66]. Stacking usually provides better robustness than BMA and BMC, since BMA and BMC are sensitive to model approximation errors.

B. Proposed Nonlinear Age-Adaptive Ensemble Model
Through extensive experiments, we found that the performance of all the models is influenced by the true age of the samples (see Section V). This indicates that some models are suitable for predicting young samples, and some models are suitable for older samples. To improve the prediction accuracy, we built a model named the nonlinear age-adaptive ensemble model. Unlike the stacking strategy, our ensemble model adjusts the weights of its constituent independent models according to the ground truth label.
The proposed framework is shown in Fig. 3. First, we used different independent models as the initial learners. In our work, we employed four models: SVR, ResNet, GoogLeNet, and our own CNN. We used them to predict brain age, recorded the prediction results of these independent models, and then used these results as input values for the ensemble model. Thereafter, we divided the samples into groups by age; each group has its own ensemble model that combines the independent models. Our model adopts a novel method to decide the weights of the independent models. In each age group, we set a loss function:

$$ J(\theta) = \frac{1}{2m} (H\theta - y)^{T} (H\theta - y) $$

where H is an m×n matrix holding the predictions of the independent models, m is the number of samples, n is the number of independent models, θ is an n×1 vector holding the weights of the independent models, and y is the m×1 vector of true ages.
In this work, we tested two optimizers for minimizing the loss function. Gradient descent updates the weights by moving in the direction of steepest descent:

$$ \theta \leftarrow \theta - \alpha \nabla_{\theta} J(\theta) $$

where α is the learning rate. Ordinary least squares can also accomplish this task:

$$ \theta = (H^{T} H)^{-1} H^{T} y $$

In our experiments, the two methods yield the same results.
Figure 3 The schematic view of our nonlinear age-adaptive ensemble method on brain age estimation.
LIST I. List of pseudocode on our brain age estimation
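The agreement between the two optimizers can be reproduced on synthetic data. The sketch below is an illustration only: it assumes a squared-error loss J(θ) = ||Hθ − y||² / (2m), the data are randomly generated stand-ins for the base-model predictions, and the names theta_ols, theta_gd and alpha are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): m samples, n = 4 base-model predictions.
m, n = 200, 4
y = rng.uniform(17, 90, size=m)                          # true ages
H = y[:, None] + rng.normal(0.0, [5.0, 4.0, 4.0, 3.0], size=(m, n))

# Closed-form ordinary least squares solution of min ||H theta - y||^2.
theta_ols, *_ = np.linalg.lstsq(H, y, rcond=None)

# Gradient descent on J(theta) = ||H theta - y||^2 / (2m).
A = H.T @ H / m                                          # Hessian of J
b = H.T @ y / m
alpha = 1.0 / np.linalg.eigvalsh(A).max()                # step size 1/L for stable descent
theta_gd = np.zeros(n)
for _ in range(50_000):
    theta_gd -= alpha * (A @ theta_gd - b)               # gradient of J at theta_gd

max_weight_difference = np.abs(theta_gd - theta_ols).max()
```

Because the base-model predictions are strongly correlated with one another, the problem is poorly conditioned and gradient descent needs many iterations; the closed-form solution is the cheaper choice when n is as small as four.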
For each age group, we thus obtain a set of suitable weights for the independent models, so that the model can adaptively combine the results of the base models for different age groups. Formally, the age-adaptive model can be expressed as:

$$ f(x) = h(x; p_A)\,\theta_A, \quad A \in \text{age} $$

Here, "age" represents the set of different age ranges, x is the input data, h(x; p_A) is the 1×n vector of predictions of the independent models, θ_A is the value of θ at age A, and p_A denotes the parameters of the independent models at age A.
The process of predicting the brain age of a sample is as follows. First, each independent model predicts the brain age of the sample, and we record these predictions. Figure 3 shows the estimation process of our ensemble model, and List I gives the pseudocode of our method.
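A minimal sketch of the age-adaptive combination is given below. It is not the implementation used in this work: the data are synthetic, the function names are hypothetical, a single split at 40 years stands in for the age groups, and each group is fitted by least squares. Samples are assigned to groups by their ground truth age, as described above.

```python
import numpy as np

def fit_group_weights(H, y, edges):
    """Fit one least-squares weight vector per age group (groups split at `edges`)."""
    groups = np.digitize(y, edges)
    weights = {}
    for g in np.unique(groups):
        mask = groups == g
        weights[int(g)], *_ = np.linalg.lstsq(H[mask], y[mask], rcond=None)
    return weights

def predict(H, grouping_age, edges, weights):
    """Combine base predictions with the weights of each sample's age group."""
    groups = np.digitize(grouping_age, edges)
    return np.array([H[i] @ weights[int(g)] for i, g in enumerate(groups)])

rng = np.random.default_rng(0)
y = rng.uniform(17, 90, size=400)                      # synthetic true ages
young = y < 40
H = np.column_stack([
    y + rng.normal(0.0, np.where(young, 2.0, 8.0)),    # model 0: better on young samples
    y + rng.normal(0.0, np.where(young, 8.0, 2.0)),    # model 1: better on older samples
])

edges = [40]
adaptive = predict(H, y, edges, fit_group_weights(H, y, edges))
theta_global, *_ = np.linalg.lstsq(H, y, rcond=None)   # one weight vector for all ages
mse_adaptive = np.mean((adaptive - y) ** 2)
mse_global = np.mean((H @ theta_global - y) ** 2)
```

On the data used for fitting, the grouped model can never have a larger squared error than the single global model, since the global weights are one admissible choice within every group. A fair comparison would evaluate both on held-out folds.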

A. Experimental Results
We evaluate the models with a 5-fold cross-validation strategy and report the mean MAE and the mean Spearman correlation between the predicted age and the actual age over the folds as the final results.
The details of the MAE of each model in the 5-fold cross-validation are shown in Table I. Min is the minimum MAE, Max is the maximum, and Mean is the average of the 5 results in the 5-fold cross-validation, while Std is the standard deviation of the results, which represents their degree of dispersion. Table I shows that the 6-layer CNN has the greatest Std of 0.22, followed by SVR with 0.21. The ResNet architecture has a Std of 0.08, which is the minimum of all models. OE and MedianE have the same value of 0.13, nl-AAE-2 and nl-AAE-c have 0.12, GoogLeNet and nl-AAE-6 have 0.11, and MeanE has 0.1. These results suggest that the results of the 6-layer CNN have the largest degree of dispersion. In contrast, the results of ResNet have the smallest degree of dispersion, which means that the predictions of ResNet are more stable.
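For reference, the two evaluation metrics can be computed as in the sketch below. It is a plain-Python illustration with hypothetical function names and made-up ages; Spearman correlation is taken as the Pearson correlation of the ranks, with tied values sharing their mean rank.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(y_true, y_pred):
    return pearson(average_ranks(y_true), average_ranks(y_pred))

actual_age = [20, 35, 50, 65, 80]       # made-up example values
predicted_age = [24, 33, 47, 70, 75]
mae = mean_absolute_error(actual_age, predicted_age)
rho = spearman(actual_age, predicted_age)
```

The example shows why both metrics are reported: the predictions preserve the age ordering perfectly, so the rank correlation is maximal even though each prediction is off by several years.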
Our test results are shown in Figure 4 and Table II. In Table I, we first present the results of the 4 independent models: SVR, the 6-layer self-built CNN, ResNet and GoogLeNet. Their mean absolute errors are 5.15, 4.33, 3.99 and 3.88 years, and the Spearman correlations between predicted age and real age are 0.83, 0.89, 0.88 and 0.89, respectively. We then combine the predictions of the base learners using the median and the mean of their predictions. The results show that the median-based ensemble model has a larger mean error than the mean-based ensemble model. A possible reason is that a median-based ensemble model effectively relies only on the middle predictions each time and ignores the outputs of the other models.
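The two naive fusion strategies can be illustrated for a single sample. The predicted ages below are invented for the example. Note that with an even number of models, the standard median averages the two middle predictions and discards the extremes entirely.

```python
from statistics import mean, median

# Hypothetical predictions of the four base models for one sample (years).
predictions = {"SVR": 41.0, "CNN": 36.5, "ResNet": 35.0, "GoogLeNet": 34.5}

mean_estimate = mean(list(predictions.values()))      # uses all four predictions
median_estimate = median(list(predictions.values()))  # uses only the two middle ones
```

Here the mean is pulled toward the outlying SVR prediction while the median ignores it, which makes the median more robust to outliers but also blind to whatever information the discarded models carry.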
Following these preliminary tests, we investigate the nonlinear age-adaptive model. First, we applied a single linearly approximated ensemble model to the data samples of all ages. In other words, we first tested the performance of the ensemble model without age adaptation. Compared with the naive fusing strategies, the model performance is marginally improved, with an MAE of 3.52 years and a Spearman correlation of 0.91 (OE in Table II).
In our second experiment, we divided the prediction results of the four independent models into two groups. The first group contains the samples over 40 years old, while the other group contains the samples under 40 years old. We trained the models separately on the two sets of data, and used the results of the 4 base models as the input features of the secondary learner. By establishing two ensemble models for the different groups and combining them into a non-linear ensemble model, a lower MAE of 3.45 years is achieved, but the Spearman correlation drops to 0.89 (nl-AAE-2 in Table II).
Next, we divided the prediction results into six parts according to the actual age of the samples: 10-20, 20-30, 30-40, 40-50, 50-60, and 60-90 years old. We then applied the same method to build the non-linear ensemble model (nl-AAE-6 in Table II). This time, the ensemble model's average MAE is 3.39 years, and its Spearman correlation improves to 0.95.
Finally, we divided the data more finely. Ideally, all samples of the same age would form a group, but due to the small amount of data, we could only use a simplified method: for samples from 17 to 30 years old, we treated each age as a group. However, the amount of data decreases as age increases. Therefore, for the 30 to 60-year-old samples, we took every 5 years of age as a group. Likewise, we grouped the 60 to 70-year-old samples and the 70 to 90-year-old samples. We refer to this finely split model as a "continuous" (or year-wise) model, namely nl-AAE-c in Table II. The ensemble model trained on this division provides the best performance, with an MAE of 3.19 years and a Spearman correlation of 0.95.
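The fine-grained grouping can be written as a lookup over the lower edges of the groups. The sketch below is one possible reading of the division described above; how the boundary ages 30, 60 and 70 are assigned is an assumption (each is placed in the group it opens), and the names are hypothetical.

```python
from bisect import bisect_right

# Lower edges of the groups: yearly for 17..29, every 5 years for 30..55, then 60 and 70.
FINE_EDGES = list(range(17, 30)) + list(range(30, 60, 5)) + [60, 70]

def fine_group(age: float) -> int:
    """Index of the fine-grained age group containing `age`.

    Ages below the first edge are placed in group 0.
    """
    return max(bisect_right(FINE_EDGES, age) - 1, 0)

n_groups = len(FINE_EDGES)
```

Under this reading there are 21 groups: 13 yearly groups, 6 five-year groups, and one group each for 60-70 and 70-90.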
Here, we compare our model with previous models that were also tested on the PAC 2019 dataset. Couvy-Duchesne et al. [44] built an ensemble model combining 7 different algorithms, which achieved an MAE of 3.33 years. Da Costa et al. [49] developed an ensemble of shallow machine learning methods (e.g., Support Vector Regression and Decision Tree-based regressors) with an MAE of 3.75 years. Soch [50] proposed that distributional transformation (DT) can map the predicted values to the variable's distribution in the training data, which would improve decoding accuracy; his model achieved an MAE of 4.58 years and a Spearman correlation between predicted age and actual age of 0.93. Building on the experience of these previous researchers, we developed the AAE method, which yields a marginal improvement over the previous results. The brain age gap is the difference between predicted age and chronological age. Figure 5 shows the brain age gap as a function of chronological age for 7 different machine-learning methods. The slope of each line in Figure 5 indicates how strongly the prediction accuracy of the model is affected by increasing age. The prediction accuracy of AAE is the least affected by age, and that of SVR is the most affected.
The above experiments lead to the following observations: 1) deep neural networks are in general better than SVR; 2) all ensemble models have lower errors than the individual models; 3) our age-adaptive models perform better than the non-adaptive ensemble models; 4) the finer the age-based division, the lower the error, with the lowest error achieved by the nl-AAE-c model.
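The brain age gap and the slope of its trend over chronological age can be computed as follows. The sketch uses made-up ages and hypothetical function names, and fits the trend line by ordinary least squares, which is an assumption about how such a slope would be obtained.

```python
def brain_age_gap(predicted, chronological):
    """Predicted age minus chronological age, per sample."""
    return [p - c for p, c in zip(predicted, chronological)]

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

ages = [20, 40, 60, 80]            # made-up chronological ages
predicted = [24, 41, 58, 75]       # made-up predicted brain ages
gaps = brain_age_gap(predicted, ages)
gap_slope = slope(ages, gaps)
```

A negative slope, as in this toy example, means that young samples are overestimated and old samples underestimated; a slope close to zero indicates that the error does not drift with age.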

B. Investigation on Age-Sensitivity per Models
Age-sensitivity describes how the MAE of each model changes with age. In this section, we investigate the age-sensitivity of the models we built previously. Figure 6 shows that all the models predict the age of young people better than that of old people: the MAE increases with the age of the sample. Among the independent models, GoogLeNet and ResNet are more sensitive to age, and their MAE increases significantly between the ages of 20 and 30, whereas the MAE of SVR does not change drastically overall. Besides, GoogLeNet has the best performance for middle-aged people, and the MAE of all models changes significantly when the sample age is 70 years.
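An age-sensitivity curve of this kind can be obtained by averaging the absolute errors within age bins. The sketch below uses made-up values and a hypothetical function name; the 10-year bin width is an assumption chosen for illustration.

```python
from collections import defaultdict

def mae_by_age_bin(y_true, y_pred, bin_width=10):
    """Mean absolute error per age bin, keyed by the lower edge of each bin."""
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        bin_start = int(t // bin_width) * bin_width
        errors[bin_start].append(abs(t - p))
    return {start: sum(e) / len(e) for start, e in sorted(errors.items())}

actual_age = [18, 25, 27, 63, 68]      # made-up example values
predicted_age = [20, 24, 30, 58, 75]
sensitivity = mae_by_age_bin(actual_age, predicted_age)
```

Bins that contain few samples give noisy estimates, which matters for this dataset because the older age groups are sparsely populated.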
The age-sensitivity of the nonlinear age-adaptive ensemble model is similar to that of the independent models but more stable. As the sample age increases, the MAE becomes larger and the model's performance deteriorates. The model performs worst for samples aged 50 to 60, with an MAE exceeding 5. From a machine learning perspective, this is due to the lack of older samples, which resulted in insufficient training of the model. From a medical perspective, we believe that the differences between the brains of different people become larger as age increases, which in turn makes prediction more difficult. In contrast, brain differences between young people are comparatively small. Therefore, when training the model in the future, we can increase the proportion of young people's data to improve the accuracy of the model.
Figure 5. The brain age gap and MAE as functions of the chronological age using 7 different machine-learning methods; the horizontal black line represents a brain age gap of 0.
Figure 7 shows how the weight of each independent model in the AAE changes with age. From this figure, we can see the relative importance of each independent model in the AAE at different ages.

C. Learning the Model Weights
The SVR's weights are close to average for samples between 25 and 70 years old. For young samples from 10 to 25 years old, however, SVR does not perform well, which shows that SVR is not suitable for predicting the age of young people.
The CNN model plays an important role in the prediction results when the sample age ranges from 20 to 50 years. CNN receives a high weight in the prediction for young samples, whereas for old samples it contributes less to the ensemble result.
The GoogLeNet model is important in the AAE for the older groups. Although it has average weights in the 20-40 age groups, its prediction largely determines the results of the ensemble models in the older groups, especially for middle-aged samples. This shows that GoogLeNet is powerful and well suited to predicting the age of middle-aged and elderly samples.
The ResNet model maintains high weights for samples aged 10-35 years, which shows that it is suitable for predicting the age of young samples. For middle-aged and elderly people aged 35 to 70, its performance is average, so it does not receive a large weight. For samples over 70 years old, however, it performs well, which means that ResNet is suitable for inferring the age of people over 70.
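To illustrate how age-dependent weights of this kind can be used, the following sketch assigns each model a weight per age bin and combines the predictions accordingly. It is not the learning procedure of the AAE: as a simple stand-in, the weights are set proportional to the inverse validation MAE of each model within a bin, and the age bin of a new sample is looked up from the unweighted mean of the model predictions because the true age is unknown at test time. All names, bin edges, and data are hypothetical.

```python
from bisect import bisect_right


def bin_index(age, edges):
    """Index of the age bin; edges are interior boundaries, e.g. [30, 50, 70]."""
    return bisect_right(edges, age)


def fit_bin_weights(val_preds, val_ages, edges, eps=1e-6):
    """Per-bin weights proportional to 1 / MAE of each model on validation data."""
    weights = []
    for b in range(len(edges) + 1):
        idx = [i for i, a in enumerate(val_ages) if bin_index(a, edges) == b]
        inverse = {}
        for name, preds in val_preds.items():
            if idx:
                mae = sum(abs(preds[i] - val_ages[i]) for i in idx) / len(idx)
            else:
                mae = 1.0  # no validation data in this bin: equal weights
            inverse[name] = 1.0 / (mae + eps)
        total = sum(inverse.values())
        weights.append({name: v / total for name, v in inverse.items()})
    return weights


def ensemble_predict(model_preds, weights, edges):
    """Weighted sum of one subject's model predictions, using bin-specific weights."""
    rough_age = sum(model_preds.values()) / len(model_preds)  # preliminary estimate
    w = weights[bin_index(rough_age, edges)]
    return sum(w[name] * p for name, p in model_preds.items())


# Hypothetical validation data: "cnn" is better for the young, "svr" for the old.
val_ages = [20, 25, 60, 65]
val_preds = {"svr": [24, 29, 61, 66], "cnn": [21, 26, 64, 69]}
edges = [40]

weights = fit_bin_weights(val_preds, val_ages, edges)
print(weights)
print(ensemble_predict({"svr": 26, "cnn": 22}, weights, edges))
```

With these toy data, the better model in each bin receives a weight of about 0.8, which mirrors the qualitative pattern described above in which different models dominate at different ages.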

A. Discussion of AI models
In this work, we have proposed a nonlinear age-adaptive ensemble (nl-AAE) model for brain age estimation that achieves better results than other benchmark models. The most important contribution of this work is the age-adaptive fusion strategy, which significantly improves the performance of the ensemble model. To the best of our knowledge, no previous literature has documented this strategy. The distinguishing characteristic of AAE is that it not only combines the advantages of multiple models, but also lets the weights of the independent models change with age. This makes the AAE a dynamic model whose predictions are more accurate, given that brain age prediction is sensitive to age.
Although our model yields state-of-the-art prediction results, several issues can still be improved in future work. First, more advanced models can be introduced as initial models. For example, for GoogLeNet we used the Inception V1 version, whereas an Inception V4 version is now available.
Second, MAE is generally used to evaluate the performance of the model. However, MAE is affected by the age distribution and the number of subjects in the training set, so MAE values obtained on different data sets cannot be compared directly. The younger the subjects, the smaller the differences between their brains and the smaller the MAE for individuals of the same age. For adolescents, the MAE of a single-modality prediction model is 1 to 2 years, and that of a multimodal prediction model is about 1 year. For individuals of all ages, or for middle-aged and elderly individuals, however, the MAE of a prediction model generally reaches only 4-5 years. Moreover, the larger the overall age span of the subjects, the larger the MAE. Therefore, comparisons between models should take these factors into account.
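The dependence of the overall MAE on the age distribution can be illustrated with a short calculation. In the sketch below, the per-group MAE values and cohort sizes are hypothetical, chosen only to lie within the ranges quoted above; the same model, with identical per-group errors, obtains a different overall MAE on each cohort.

```python
def overall_mae(mae_per_group, group_counts):
    """Overall MAE as the sample-weighted mean of the per-group MAE values."""
    total = sum(group_counts.values())
    return sum(mae_per_group[g] * n for g, n in group_counts.items()) / total


# Hypothetical per-group errors of one and the same model (in years).
mae_per_group = {"adolescent": 1.5, "adult": 4.5}

# Two hypothetical test cohorts that differ only in their age composition.
young_cohort = {"adolescent": 80, "adult": 20}
older_cohort = {"adolescent": 20, "adult": 80}

print(overall_mae(mae_per_group, young_cohort))
print(overall_mae(mae_per_group, older_cohort))
```

The overall MAE is 2.1 years on the younger cohort and 3.9 years on the older one, although the model itself is unchanged.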
Third, we can try more ensemble methods and compare their performance for brain age estimation in future work. The approach used in this work re-weights the results of multiple independent models through a nonlinear function, which can take many forms. For example, one can select only the best-performing independent model for each age group as the ensemble output. In addition, the re-weighting method itself can be changed. Using a multilayer perceptron to combine the results of the independent models is an interesting idea; although its training may be somewhat slower, its prediction accuracy is worth investigating.
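The first alternative mentioned above, selecting only the best-performing model for each age group, can be sketched as follows. This is an illustration under stated assumptions rather than an evaluated method: the model with the lowest validation MAE is chosen per age bin, the bin of a new sample is looked up from the unweighted mean of the model predictions, and all names, bin edges, and data are hypothetical.

```python
from bisect import bisect_right


def bin_index(age, edges):
    """Index of the age bin; edges are interior boundaries, e.g. [30, 50, 70]."""
    return bisect_right(edges, age)


def best_model_per_bin(val_preds, val_ages, edges):
    """Name of the model with the lowest validation MAE in each age bin."""
    def mae(name, idx):
        return sum(abs(val_preds[name][i] - val_ages[i]) for i in idx) / len(idx)

    all_idx = list(range(len(val_ages)))
    best = []
    for b in range(len(edges) + 1):
        idx = [i for i in all_idx if bin_index(val_ages[i], edges) == b]
        idx = idx or all_idx  # empty bin: fall back to the overall best model
        best.append(min(val_preds, key=lambda name: mae(name, idx)))
    return best


def select_predict(model_preds, best, edges):
    """Return the prediction of the model selected for the sample's age bin."""
    rough_age = sum(model_preds.values()) / len(model_preds)  # preliminary estimate
    return model_preds[best[bin_index(rough_age, edges)]]


# Hypothetical validation data: "cnn" is better for the young, "svr" for the old.
val_ages = [20, 25, 60, 65]
val_preds = {"svr": [24, 29, 61, 66], "cnn": [21, 26, 64, 69]}
edges = [40]

best = best_model_per_bin(val_preds, val_ages, edges)
print(best)
print(select_predict({"svr": 26, "cnn": 22}, best, edges))
print(select_predict({"svr": 62, "cnn": 66}, best, edges))
```

This selection rule is the limiting case of re-weighting in which one model receives a weight of 1 and all others a weight of 0 within each age group.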
Fourth, gender is an important variable in many experiments because of the subtle differences between men's and women's brains. Gender may therefore be a key factor in improving the prediction accuracy of the model. In future work, we will investigate the influence of gender on brain age estimation and include gender as a factor in our experiments.
Brain age prediction is a burgeoning and rapidly developing research field. Brain age prediction models based on neuroimaging, and their applications, are steadily increasing in number. A growing number of researchers are using brain age analysis to explore brain aging in health and disease, and many new and promising avenues of research are emerging. Each imaging modality has its own advantages and disadvantages, and fusing information from multiple modalities is likely to further improve model performance. In addition, with improvements in convolutional neural network architectures and the emergence of large-scale imaging data sets, we believe that the performance of future models is likely to improve further. The key to future model development is to continuously improve accuracy while also improving generalization to new data. The ultimate goal of this field is to build a brain age model based on large image sets that can be applied to provide accurate, personalized cloud diagnosis services.

B. Discussion of medical findings
Our research also yields some interesting findings from a medical perspective. First, we found that the performance of the constituent models decreased with the age of the sample, which implies that the brains of young people are in similar condition. In addition, we found that as age increases, the risk of suffering from brain diseases increases as well. The differences between brains become larger, and the accuracy of the model's predictions begins to decline. However, these results may be influenced by biological factors, sample size, problems with the model, or a combination of these, so a fairer way of verifying these findings is still needed in the future.
From the results of our experiments, we believe that changes in the brain can be divided into 4 stages: 0-30, 30-50, 50-70, and 70-80 years old. The criterion for this division is whether there is a significant change in the model's performance in predicting age. From Figure 5, we notice that the human brain undergoes a significant process of change during the period of 0-30 years. Speaking at a meeting of the Academy of Medical Sciences in Oxford, UK, researchers explained that our brains transition slowly to adulthood, which is finally reached in our 30s. After the age of 30, the brain's working memory capacity begins to decline slowly [40], which agrees with our findings. Between the ages of 30 and 50, the brain changes little, but a significant change occurs around the age of 50. Research in the British Medical Journal [41] also shows that, in a group of people whose mental abilities were first tested at 45-49 years old, reasoning skills declined by 3.6 percent over 10 years. Between the ages of 50 and 70, the brain does not change much, but after the age of 70 it undergoes a final significant change. Peter Jones's research also shows that the overall volume of the brain begins to shrink in our 30s or 40s, with the rate of shrinkage increasing around age 60-70 [40]; the results of our experiments are consistent with this.
Exercise, reading, meditation, and similar activities are good ways to prevent brain disease [7][8]. People who exercise or meditate regularly, and those with higher education levels, have a lower predicted brain age than their peers, which suggests that their brains are more active and their risk of brain disease is lower. A study reported in [38] analyzed samples over the age of 50 and found that the brains of people who exercise little or not at all are about 5-10 years older than those of people who exercise regularly. Other recent research found that stem cells in the brain's hypothalamus likely control how fast aging occurs in the body [39]. Specifically, the number of hypothalamic neural stem cells naturally declines over the life of the animal, and this decline accelerates aging. When researchers injected hypothalamic stem cells into the brains of normal old mice and middle-aged mice whose stem cells had been destroyed, the measures of aging were slowed or reversed. This is an exciting discovery that may be an important step toward slowing aging and treating brain diseases. The brain age prediction model in this article is sensitive to changes in the brain, and we believe it can be a useful tool for assessing the effectiveness of medicines.

VII. CONCLUSION
In this paper, we proposed a nonlinear age-adaptive ensemble method for brain age estimation from MRI images. Our experiments show that ensemble models generally achieve lower errors than discrete models, and that our nonlinear age-adaptive ensemble models are consistently better than age-agnostic ensemble models. Among the discrete models, GoogLeNet performs well on data from all age groups, especially for middle-aged and old samples. With this increased accuracy in brain age estimation, our nonlinear age-adaptive ensemble models can potentially help doctors identify the risk of brain diseases more accurately and efficiently, help pharmaceutical companies develop drugs or treatments more precisely, and provide a powerful tool for researchers in the field of brain science.