Explainable Tensor Multi-Task Ensemble Learning Based on Brain Structure Variation for Alzheimer’s Disease Dynamic Prediction

Machine learning approaches for predicting Alzheimer’s disease (AD) progression can substantially assist researchers and clinicians in developing effective AD preventive and treatment strategies. This study proposes a novel machine learning algorithm to predict the AD progression utilising a multi-task ensemble learning approach. Specifically, we present a novel tensor multi-task learning (MTL) algorithm based on similarity measurement of spatio-temporal variability of brain biomarkers to model AD progression. In this model, the prediction of each patient sample in the tensor is set as one task, where all tasks share a set of latent factors obtained through tensor decomposition. Furthermore, as subjects have continuous records of brain biomarker testing, the model is extended to ensemble the subjects’ temporally continuous prediction results utilising a gradient boosting kernel to find more accurate predictions. We have conducted extensive experiments utilising data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) to evaluate the performance of the proposed algorithm and model. Results demonstrate that the proposed model have superior accuracy and stability in predicting AD progression compared to benchmarks and state-of-the-art multi-task regression methods in terms of the Mini Mental State Examination (MMSE) questionnaire and The Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) cognitive scores. Brain biomarker correlation information can be utilised to identify variations in individual brain structures and the model can be utilised to effectively predict the progression of AD with magnetic resonance imaging (MRI) data and cognitive scores of AD patients at different stages.


I. INTRODUCTION
Alzheimers's disease (AD) is a severe primary neurodegenerative disease in which neurons and their connections deteriorate over time, leading to a full spectrum of dementia including cognitive decline, memory loss and executive dysfunction [1]. There is currently no cure to treat or reverse the progression of the disease and it puts patients and their families under enormous psychological and emotional stress. Numerous studies have been conducted to recognize sensitive and precise biomarkers of early Alzheimer's disease progression that will assistance in early AD diagnosis to create, evaluate and validate current and new treatments.
Utilising machine learning methods to predict AD progression can greatly support clinicians and researchers in making effective disease prevention and treatment decisions. Standard AD prediction methods rely on quantifying and extracting important biomarkers from diverse modalities (e.g., Magnetic Resonance Imaging (MRI) and Positron VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Emission Tomography (PET)), and then learning the model as a regression problem to calculate cognitive scores at different time points. Existing AD progression models mainly utilise machine learning regression algorithms [2], [3], statistical probability-based survival models [4], [5] and deep learning methods based on neural networks [6], [7]. The input features of the above models are signified as second-order matrices containing patient and biomarker information, and this data representation makes it difficult to predict and analyse disease progress from multiple dimensions (e.g., spatial and temporal dimensions). At the same time, the second-order matrix can only focus on a single biomarker, which will lose the correlation information between different AD biomarkers.
Therefore, instead of applying a second-order matrix with two components for each index, we constructed a thirdorder tensor with three components for each index (Fig. 1). The paper proposes to build a third-order tensor to build an AD prediction model to better present numerous aspects of AD data in both spatial and temporal dimensions. With the enhanced presentation of AD biomarker features, the utilise of the tensor in regression algorithms can improve prediction accuracy, stability and interpretability. Secondly, for AD prediction models, multi-task learning (MTL) can share information across tasks, outperforms traditional single-task learning methods in terms of prediction accuracy, interpretability and generalisation, and is most effective when the number of samples is small [8]. Therefore, we designed a tensor-based MTL approach to predict AD progression by incorporating spatio-temporal information on brain structural variations. Specifically, we first propose a method for quantifying structural variations in the brain based on similarity calculations, which expresses the similarity of morphological variation trends between biomarkers as a third-order tensor with dimensions corresponding to the first biomarker, the second biomarker and the patient sample. Subsequently, the proposed algorithm performs a CANDECOMP/PARAFAC (CP) decomposition of the tensor [9] and extracts a set of rank-one latent factors from the data. As shown in Fig. 1, the similarity in morphological variation trends between biomarkers can be decomposed into a set of rank-one tensors, each calculated from the outer product of three rank-one latent factors. Each latent factor is described by its first biomarker, second biomarker and patient sample dimension, resulting in an interpretable way to describe the latent factors controlling the variability of the data, and the latent factors can be utilised as predictors for training the MTL model.
In addition to the above challenges, in real-life applications, patients with suspected AD will continue to go to hospital for testing, which is a waste of subsequent incremental data if only a baseline model is utilised or if continuous testing records of the patient cannot be reasonably integrated. To address this problem, we utilise a gradient boosting ensemble learning approach to integrate consecutive test records of subjects to further improve prediction accuracy.
The contributions of this article are summarized as follows: 1) A similarity measurement method for quantifying and understanding variability in AD brain structure data is proposed to extract temporal and spatial information between biomarkers, further combined with tensor decomposition to obtain latent factors.
2) The proposed tensor-based MTL algorithm seamlessly integrates spatio-temporal information based on brain structural variations and its biomarker latent factors, thus significantly improving the predictive accuracy and stability of AD progression.
3) Identified and analysed important spatio-temporal variation correlations between brain biomarkers in the AD progression.
4) The proposed AD dynamic prediction utilises gradient boosting ensemble learning to combine multiple consecutive MRI detections and the experimental results demonstrate that the prediction accuracy continues to improve as the number of MRI detections increases.

II. RELATED WORK
Numerous preceding brain science studies have focused on the differences in brain structure variation of AD, CN (cognitively normal older individuals) and MCI (mild cognitive impairment). [10] developed a distortion-based framework for modelling the properties of AD and aging on the morphological progression of the brain, emphasizing specific morphological changes in the brain to help identify clinical conditions. [11] evaluated the correlation of CSF and MRI biomarkers with clinical diagnosis and cognitive functioning in patients with CN, AD and aMCI (amnestic mild cognitive impairment). It was concluded that MRI provided stronger cross-sectional grouping, discrimination and correlated better with cross-sectional integrated cognitive and functional abilities. [12] used automated MRI analysis to evaluate cortical thickness in healthy older adults, MCI patients and AD patients. Patterns of cortical thinning were identified as a function of disease progression, and it was discovered that as the disease marched from MCI to AD, the whole cortex thinned and extended appreciably into the lateral temporal cortex.
In addition, the research on correlations between AD MRI biomarkers has been a focus of brain structural variation research, [13] used correlation of multi-kernel support vector machines and regional mean cortical thickness to combine relevant information with ROI-based data to advance the classification performance of AD and its precursor stages. [14] structured brain networks by thresholding the cortical thickness correlation matrix for different regions and analysed them using graph theory. The above study evaluated and analysed the relationship between AD progression and brain biomarkers and showed that there are differences between brain biomarkers for AD, MCI and CN. However, the above studies only focus on a particular biomarker or the same category of biomarker, lacking the linkage and correlation of spatio-temporal variation between dissimilar categories of biomarkers, which is important for AD feature representation.
The AD prediction can be considered as a multi-task regression problem [15]. The primary assumption of the model is that there is an intrinsic link between a large number of data records and that capturing the intrinsic link enhances the generalizability of the predictive model. The sharing of information between different patient prediction tasks promises advance achievable performance. This advantage is particularly pronounced when the number of input features (e.g., AD biomarkers) exceeds the number of samples (e.g., patient samples) [16]. In the field of MTL for AD, existing approaches have focused on modelling relationships between tasks using novel regularisation techniques [17], [18]. Kernel methods were added to the technique to enable it to fit nonlinear relationships [19], [20]. The aforementioned study and experiments demonstrated that the regularised MTL technique performs well in a diversity of AD prediction applications.
To the best of our knowledge, there is no commonly exercised tensor regression algorithm that exploits the multidimensional properties of biomarker data to predict AD progression. Current third-order tensor-based algorithms in AD are mainly used for images (e.g., MRI) [21] and electroencephalograms (EEG), as the images themselves are thirdorder tensors and each patient's EEG can be composed of a third-order tensor in the time and frequency domains. But in terms of the form of the data, MRI brain biomarker data can be formed into a third-order tensor. The first characteristic of data that can be constructed as a tensor is that it is multidimensional data, and the second characteristic is that the original data is inherently a tensor, for example as an RGB image it is innately a three-dimensional tensor. Brain biomarker data are multidimensional data that fit the first characteristic of tensor data. Based on this fact, we have pioneered a tensor-based MTL method for accurate AD progression prediction.

A. DENOTATION
For brevity, we represent tensors as italic capital letters, such as X or Y , and matrices by capital letters, such as A or B. Vectors are denoted by lowercase letters such as x whereas Scalars are denoted by italic lowercase letters such as a.

B. SPATIO-TEMPORAL VARIATION SIMILARITY CALCULATION OF MRI BIOMARKERS
Two successive MRI tests were utilised to calculate the spatio-temporal variation in brain biomarkers. The quantitative approach has been reported in its preliminary versions [22], [23], and this research expands and exploits it across number of successive time points (BL to M06, M06 to M12, M12 to M24). For instance, at time points baseline (the date the patient was first screened in hospital) and M06 (the time point six months after the first visit), we utilised MRI at the corresponding time points to calculate the rate of change and velocity for each biomarker, where x is the test value of brain biomarkers and t is the MRI detection dates. The rate of change is The rate of change and velocity were then utilised to create a vector which describes the morphological variation trend of brain biomarker.
In this study, the similarity between two vectors was calculated utilising the Mahalanobis distance as a method to indicate the similarity of the spatio-temporal variation of two MRI biomarkers. The Mahalanobis distance was utilised because it is scale-independent when the covariance matrix is divided [24]. The Mahalanobis distance between the vectors x i and x j is defined as: Fig. 2 shows the spatio-temporal correlations of brain biomarkers of AD, CN and MCI calculated by Mahalanobis distances. Although similarity calculations can demonstrate differences in the spatio-temporal correlation of AD, CN and MCI brain biomarkers, there is a unifying problem that half of the data is duplicated due to pairings of biomarker association, it may increase the computational complexity and this study addresses this problem utilising the duplicate data correction matrix in the algorithm design.

C. TENSOR DECOMPOSITION
Our proposed formula requires an understanding of the latent factors of the correlation tensor of morphological variation trends between MRI biomarkers. These latent factors are represented by factor matrices A and B, which can be derived utilising tensor decomposition methods. There are two mainstream standard approaches for tensor decomposition, specifically Tucker and CANDECOMP/PARAFAC (CP) decomposition [9]. The Tucker decomposition decomposes the tensor into the result of the core tensor and the factor matrix for each mode. Although it expresses a more inclusive statement, it is problematic to interpret the latent factors, as the amount of latent factors may differ for different model. By comparison, CP decomposition decomposes the tensor into a set of rank-one tensors. i.e., • denote the outer product operation between two vectors, while a i ,b i and c i correspond to the vectors related with the i-th latent factor. Given a tensor X of the size n 1 × n 2 ×n 3 , the size of matrix A, B and C is n 1 × r, n 2 ×r and n 3 × r respectively.

D. TENSOR MULTI-TASK REGRESSION
To predict cognitive scores (e.g., MMSE and ADAS-Cog) at future time points. Consider a tensor multi-task regression problem with t time points, n training samples with d 1 and d 2 features. Let X ∈ R d 1 ×d 2 ×n be the input tensor from two consecutive MRI tests and the it is the amalgamation of similarity matrix of all n samples X n ∈R d 1 ×d 2 , Y = [y 1 , · · · ,y t ] ∈R n×t be the targets and y t = [y 1 , · · · ,y n ] ∈R n is the corresponding target (clinical scores) at different time points. We utilise the operator as follows: The input tensor for the similarity of morphological variation trends in brain biomarkers is a symmetric tensor because the relationships between biomarkers are paired and therefore half of the data are duplicated. The research further proposes a duplicate data correction matrix to resolve the problem of duplicate data and it states as follows: For t-th prediction time point, the objective function of the proposed approach can be stated as follows: where the first term computes the empirical error for the training data,ŷ t = ŷ 1 , · · · ,ŷ n ∈ R n are the predicted values, A t ∈ R d 1 ×r is the latent factor matrix for first biomarker dimension and B t ∈ R d 2 ×r is the latent factor matrix for second biomarker dimension with r latent factors, W t ∈R d 1 ×d 2 is the model parameter matrix for t-th prediction time point, λ and β are the regularization parameters. Obtaining latent factors by optimising objective function denote the outer product operation between two vectors. W t , A t , B t , C t 1 employing an l1-norm on the W t , A t , B t and C t matrices respectively.
For all prediction time points, the objective function can be stated as follows: where W f P(α) 2 F is the generalized temporal smoothness term, model parameter matrix W f ∈R (d 1 ×d 2 )×t is the temporal dimension unfolding for model parameter tensor W ∈ R d 1 ×d 2 ×t , θ is the regularization parameter. The generalised temporal smoothing states the fact that in actually diagnosing AD, the specialist not only relies on the patient's current symptoms, but also takes into account their previous symptoms. Therefore, we assume that the i-th progression in an individual AD patient is related to all preceding progressions. The generalized temporal smoothness prior describe as follows: where w denoted the progression with preceding progression information. w i is the i-th column of W. where the parameter α represents the relational degree of the i-th progression and all preceding progressions. In addition, the impact of each stage of disease progression on the following stage may not be consistent, and therefore the relational degree parameters differ for each disease progression stage. The definition of the i-th progression δw i for one patient is: As a result, we can describe the more realistic temporal smoothness assumption with matrix multiplication:  Latent factors A ∈ R d 1 ×r×t , B ∈ R d 2 ×r×t , C ∈ R d 3 ×r×t and the model parameter W ∈R d 1 ×d 2 ×t can be learned by iteratively optimising the objective function for each set of variables to be solved. Because not all components of the objective function are differentiable, we utilise proximal gradient descent to solve each subproblem. Specifically, the terms in our objective function involving Frobenius norms are differentiable, but those involving the sparsity-inducing l1-norms are not differentiable. In the MTL model, the proximal approach is frequently utilised to construct the proximal issue for the non-smooth objective function [25], [26], [27], [28], by replacing the smooth function with the quadratic function, we get the sum of the smooth and non-smooth functions. Its quadratic functions can be constructed in a variety of ways based on Taylor series, and the resulting proximal problems are usually easier to solve than the original ones. The strategy can simplify the design of distributed optimisation algorithms or accelerate the convergence of the optimisation process.

E. GRADIENT BOOSTING
Ensemble learning has been proven to be effective in a variety of prediction tasks by grouping a set of weak learners together to build stronger learners. Boosting is the dominant technique in ensemble learning methods, which produces a set of weak learners in which predictors are trained sequentially rather than individually, with the aim of utilising the errors of the previous learner to develop a more effective model for the next learner.
Gradient Boosting (GB) is an extension of the boosting method which utilises gradient descent optimisation techniques to identify global or local minima of the cost function. It trains the machine to fit the model on the input feature space through a series of weak learners, each of which improves the prediction accuracy of the previous learner. GB trains powerful learners by combining numerous weak learners in multiple iterations [29], [30]. The proposed method enhances prediction accuracy by sequentially fitting a more accurate model to the residuals of the preceding step in the final stage of the GB construction framework. This process will continue until a highly accurate model is obtained. The flowchart of the proposed method in the training stage is illustrated in Fig. 3.

IV. EXPERIMENTAL SETTINGS A. DATASET
Data used in the preparation of this article were obtained from the Alzheimer s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer s disease (AD). The FreeSurfer image analysis software (http://surfer.nmr.mgh.harvard.edu/) was used by a team from the University of California, San Francisco (UCSF) to conduct volumetric segmentations and cortical reconstruction using imaging data from the ADNI database, which includes all ADNI subprojects (ADNI 1, 2, GO, 3). We gained the MRI data from the ADNI website and continued to implement the subsequent pre-processing steps: • Removal of features with missing values in more than half of the sample; • Exclusion of individual participants who did not have BL and M06 MRI; • Missing data were filled with average of the features; • Removal of cognitive function assessments for individuals with missing follow-up points in longitudinal studies; • For AD dynamic prediction, exclude individuals who did not have follow-up MRI detections.
After the pre-processing steps, there are a total of 313 MRI features. which can be classified into five categories: the volumes of cortical parcellations (CV), the volumes of specific white matter parcellations (SV), the total surface area of the cortex (SA), average cortical thickness (TA) and standard deviation in cortical thickness (TS). Table 1   demographic features of the ADNI MRI data utilised in this research.
For the predictive target of the approach, cognitive scores (MMSE and ADAS-Cog) can be used to precisely differentiate between CN, MCI and AD in the clinical scenarios. For MMSE with score range 0-30, 30 is represented as cognitively no dementia, 29-26 is represented as questionable dementia, 25-21 is represented as mild dementia, 20-11 is represented as moderate dementia, 10-0 is represented as severe dementia. For ADAS-Cog with score range 0-70, where higher scores indicate greater cognitive impairment.

B. EVALUATION METRICS
The similarity tensor of morphological variation trends between MRI brain biomarkers was utilised to build predictive models for each target. The data was randomly divided into a training set and a test set in a ratio of 9:1. As the number of model parameters (λ, β and θ), the hyperparameters α and the latent factor r must be designated during the training phase, we utilise 5-fold cross-validation on the training data to select them. The research evaluates the prediction performance of various methods for each single time point utilising the root mean square error (rMSE) as the major evaluation metric. We use normalised mean square error (nMSE) for the overall regression performance metrics, which is commonly utilised in multi-task learning research [31]. The rMSE and nMSE are stated as follows: where for rMSE, y is the ground truth of target at a single time point andŷ is the corresponding prediction by a model. For nMSE, Y i is the target's ground truth at time point i andŶ i is the corresponding prediction from a model. We reported the mean and standard deviation based on 20 iterations of experiments on dissimilar splits of data.

A. COMPARISON WITH THE BENCHMARKS AND STATE-OF-THE-ARTS
We utilised the Mahalanobis distance to construct a tensor of morphological variations in the brain, combined with the proposed tensor multi-task ensemble learning (TMTL-GB) algorithm to compare with single task learning, benchmarks and state-of-the-art multi-task learning algorithms that were chosen as competitive methods in studies to predict clinical deterioration, including Ridge regression (Ridge) [32], Lasso regression (Lasso) [33], Temporal Group Lasso (TGL) [8], Non-convex Fused Sparse Group Lasso (nFSGL1) [34], Convex Fused Sparse Group Lasso (cFSGL) [8], Non-Convex Calibrated Multi-Task Learning (NC-CMTL) [35], Fused Laplacian Sparse Group Lasso (FL-SGL) [36], Joint feature and task aware multi-task feature learning (FTS-MTFL) [37], Group Asymmetric Multi-Task Learning (GAMTL) [38] and Dual feature correlation guided multi-task feature learning (dMTLc) [39]. The experimental results of MMSE and ADAS-Cog predictions are shown in Table 2 and 3. For overall regression performance, our proposed approach outperforms benchmarks and state-of-the-art approaches in terms of nMSE for both cognitive scores MMSE and ADAS-Cog. And for all individual time points, the proposed approach obtains a smaller rMSE than other approaches. The followings are our main observations: 1) The proposed tensor MTL model outperforms single-task learn-ing models, benchmarks and state-of-the-art MTL models, demonstrating the utilise of morphological variation trend similarity calculations and tensor latent factor hypothesis in our MTL formulation. 2) The proposed tensor MTL method significantly improves the prediction stability. The results obtained through 20 iterations have a lower standard deviation compared to other comparative methods. This may be due to the addition of biomarker latent factors to the prediction algorithm to increase stability. 3) The proposed tensor multi-task ensemble learning can effectively aggregate the temporally continuous MRI records of subjects to improve the prediction accuracy, and with the increase of temporally continuous MRI records, the prediction accuracy increases for subsequent time points. In contrast, the addition of temporally continuous MRI records had no significant effect on benchmarks and state-of-the-art competitive methods.

B. INTERPRETABILITY OF SPATIO-TEMPORAL RELATIONSHIPS BETWEEN BIOMARKERS
Currently, there is no cure for AD, therefore the key to current treatment is early detection and prevention of AD, VOLUME 11, 2023  therefore, identifying important spatio-temporal biomarker relationships in early MRI data can help clinicians to identify patients with suspected AD for early prevention. Tables 4, 5, 6 and 7 provide the top 10 brain biomarker relationships in descending order of weighted parameter values for MMSE prediction (since the MMSE sample size was higher than the ADAS-Cog at all time points) at different time points for the proposed TMTL-GB model. Higher values indicate a greater impact on the final prediction.
In Fig. 4, we observed a certain similarity in the plots of important brain biomarker relationships at different time points, indicating that there are a number of brain biomarker relationships that are consistently important in the progression of AD and that they can be utilised as potential indicators for the early identification of AD, and in combination with the above tables we discovered that several spatio-temporal relationships between brain biomarkers were significant at all time points. Specifically For Vol(C). of R.RostralMiddleFrontal -Surf. Area of R.InferiorTemporal, the frontal gyrus is related to a person's literacy and numeracy skills [41]; The inferior temporal gyrus is closely related to visual information processing, and abnormalities of the inferior temporal gyrus are associated with semantic memory disorders (e.g. AD), face blindness and cortical color blindness [42]. Both are related to the processing and implementation of information in the brain.
For CTA. of L.SuperiorParietal -Vol(C). of R.InferiorParietal, the superior parietal lobule is associated with spatially oriented brain functions; The inferior parietal lobule is associated with emotional cognition and the interpretation of sensory information, as well as with language, mathematical operations and bodily imagery [43]. This correlation can be a factor in causing decline estimation and analytical ability in AD.    For CTA. of R.ParsTriangularis -CTA. of L.Bankssts, the pars triangularis is associated with the ability to translate from a second or third language back into the mother tongue, with its involvement in semantic processing [44]; Bankssts is the posterior part of the superior temporal gyrus, which is responsible for processing auditory signals, including speech, VOLUME 11, 2023 FIGURE 4. Visualization for the top-10 rank brain biomarker relationships with different time points for the proposed TMTL-GB model on MMSE prediction. Visualization was performed by the toolkit of BrainNet Viewer [40]. The colors of the nodes represent the different biomarker categories and the thickness of the edges represents the importance of the relationship between the biomarkers, with thicker edges representing more important relationships between the biomarkers. and the comprehension of language [45]. Both are correlated with the brain's linguistic response and semantic processing.
For Vol(WM). of CorpusCallosumMidAnterior -CTA. of L.TransverseTemporal, the corpus callosum is a fibrous bundle of fibers linking the left and right hemispheres. It maintains coordinated activity between the two hemispheres and connects the corresponding parts of the left and right hemispheres, coordinating the activity between the two halves of the brain and making the brain function as one. If the corpus callosum is impaired, the activity of the two hemispheres is not coordinated [46]; The transverse temporal gyrus is the first cortical structure to process auditory information and is part of the primary auditory cortex [47]. This correlation can be one of the factors causing reduced physical coordination in AD.
For Vol(WM). of L.Caudate -CTStd. of R.InferiorTemporal, the caudate nucleus is an important part of the brain's learning and memory system [48]; The inferior temporal gyrus is closely related to visual information processing. This correlation can be one of the factors that cause the decline of learning ability and memory in AD.
For Vol(C). of L.SuperiorFrontal -Surf. Area of L.SuperiorTemporal, the superior frontal gyrus is involved in higher order cognitive functions of the brain, particularly working memory [49]; The superior temporal gyrus includes parts of the auditory cortex as well as the main areas of the language center [45]. This correlation can be a factor that causes the co-occurrence of language function and cognitive dysfunction in AD patients.
For Surf. Area of R.Postcentral -CTStd. of R.Insula, the postcentral gyrus is the seat of the primary somatosensory cortex and is the nerve center of the somatosensory system [50]; The insula is thought to be associated with consciousness and to play a role in a variety of functions normally associated with the regulation of emotion or bodily homeostasis, these functions include perception, motor control, selfawareness, cognitive functions [51]. This correlation can be one of the factors that cause physical activity impairment in AD.
AD is clinically characterized by generalized dementia manifestations such as memory impairment, agnosia, aphasia, apraxia, impairment of visuospatial skills, personality and behavioural changes, and executive dysfunction. We can observe that the important spatio-temporal relationships between brain biomarkers indicative of brain functions are all related to the clinical manifestations of AD.

C. CLINICAL APPLICATION
In clinical application, our proposed approach can be utilised to obtain a patient's current MRI data and predict the patient's cognitive scores at multiple time points in the future, thereby helping clinicians and patients to detect the disease and implement intervention treatment in early stages. Moreover, patients suspected of AD will continue to go to the hospital for MRI testing in real-word application. Subsequent incremental MRI data is wasted if only the baseline model is used or if the patient's serial examination records cannot be properly integrated. To address this problem, we have applied the concept of ensemble learning to our approach, which allows the model to continuously receive MRI data from subjects and continuously update predictions of future cognitive scores with improved accuracy.

VI. CONCLUSION
We proposed a tensor multi-task ensemble learning method based on tensor decomposition for predicting AD progression at different time points to overcome variability and instability in prediction accuracy. In our framework, a prediction model is developed based on spatio-temporal morphological variation trend correlations across biomarkers and multi-task regression, utilising tensor latent factors as multi-task relationships to transfer knowledge and calculate final prediction results. Furthermore, the proposed approach utilises gradient boosting ensemble learning technique integrate temporally continuous MRI recordings to consistently improve predictive accuracy of AD progression. The results of the experiment demonstrate that correlation information can be utilised to identify variations in individual brain structures underlying AD, MCI, and CN, and support the utilisation of correlation data to predict and diagnose AD progression.

ACKNOWLEDGMENT
Alzheimer's Disease Neuroimaging Initiative is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer s Drug Discovery Foundation; Ara-