Temporally Consistent Segmentation of Brain Tissue From Longitudinal MR Data

In this paper, we propose a new 4D segmentation formulation, aiming to improve the temporal consistency of adults’ brain tissue segmentation. In our method, tissue segmentation result is represented by the membership functions of the tissues, which are derived as a result of minimizing energy in a fuzzy C-means (FCM) framework. We first introduce a variational formulation with a temporal consistency constraint on the membership functions, then convert the constraint to a vector-valued function from which the membership functions are directly computed according to an analytically defined mapping. These vector-valued functions capture the bias field and intensity means of each tissue addressing the temporal variation in intensity inhomogeneities and the intensity means of the images. The effectiveness is demonstrated on the Baltimore Longitudinal Study of Aging (BLSA) benchmark dataset. Our method takes few parameters which are easy to tune and achieves 96.4% of TC factor for gray matter, 98.1% of TC factor for white matter, significantly superior to the compared methods on this research line.


I. INTRODUCTION
Segmentation of a series of 3D data of the same subject captured at different times is a fundamental and an important task in many neuroimaging studies [1]- [5] that concentrate on normal development and aging, as well as on the evolution of pathology. The temporal consistency of the segmentation is especially important for studies analyzing subtle brain changes over time, such as in the case of normal aging and Alzheimer's disease (AD). The task is considered difficult on MR scans as the intensity characteristics of brain tissue and pathology can greatly vary between scans of the same patient. Thus, the inconsistencies in segmentations generated from applying 3D approaches [7], [23]- [25] to a 4D data set can hide the subtle changes caused by the disease.
There are various methods proposed for addressing the above issues in the past years. In [10], Xue et al. propose The associate editor coordinating the review of this manuscript and approving it for publication was Junhua Li . a method that overcomes the limitation of 3D segmentation and improves the temporal consistency of segmentation by formulating the segmentation problem in 4D. Their method is called Consistent Longitudinal Alignment and Segmentation for Serial Image Computing (CLASSIC). CLASSIC not only jointly segments a series of longitudinal 3D MR brain images of the same individual, but also estimates the longitudinal deformations in the image series, e.g. tissue atrophy. It iteratively performs two steps: (i) it jointly segments a series of 3D images using a 4D image-adaptive clustering algorithm based on the current estimate of the longitudinal deformations in the image series, and (ii) it then refines these longitudinal deformations using a 4D elastic warping algorithm. While CLASSIC generally outperforms 3D segmentation algorithms in terms of temporal consistency, it is sensitive to initialization. CLASSIC assumes that the input images are already corrected to homogeneity and have similar intensity patterns across scans. Generating those kinds of MR images is generally difficult, which negatively impacts the robustness of the approach. Besides, CLASSIC requires the tuning of several parameters, which makes it less practical for the clinical setting. Furthermore, a longitudinal processing pipeline was proposed in FreeSurfer [11]. In this pipeline, a group-mean image is firstly generated by averaging from the rigidly-aligned longitudinal images of a subject. The cortical surfaces of the group-mean image are then used as initialization for each longitudinal image. Finally, the cortical surfaces at each time point are separately deformed to achieve longitudinal cortical surface reconstruction. The measurement of cortical thickness is of great interest in studying both normal brain development aging and a wide variety of neurodegenerative and psychiatric disorders. Neuroscience studies have suggested that various diseases such as AIDS or AD may affect cortical thickness [12]. In [13], Wang et al. propose a 4D segmentation method for longitudinal brain MR images of adult subjects with consistent longitudinal measurement of cortical thickness. In [14], Feng et al. propose an automated method for the segmentation of brain tissues in longitudinal MR images. The method first separately segmented brain tissues into white matter, gray matter, and cerebrospinal fluid by bias correction embedded fuzzy c-means. After being normalized in interclass, the similarities are incorporated into a nonlocal means denoising formula to regularize the segmentation in both spatial and temporal dimensions. Non-locally regularization results are used to compute final membership functions for the segmentation. It accelerates the modified denoising algorithm using Compute Unified Device Architecture (CUDA) and obtains accelerations of more than two orders of magnitude. It demonstrates the advantages of the proposed method in terms of segmentation accuracy and the ability to consistently segment brain tissues in an arbitrary number of longitudinal brain MR image series. Recently, some other automatic machine learning and deep learning based methods are proposed for multimedia analysis [15]- [18]. These methods are adaptive and easy to fit the medical image analysis community.
To address the above challenge issues and improve the temporal consistency of tissue segmentation, we propose a new 4D segmentation formulation. In our method, tissue segmentation is represented by the membership functions of the tissues, which derived from the minimization of the energy in a fuzzy C-means (FCM) framework. Adding both the global and local spatial information into the membership function is to decrease the sensitivity problem to the noise and intensity inhomogeneity in MR data [28]- [30]. There are very few parameters in our method, and it is easy to tune the parameter to achieve desirable results. As a result, our method does not require preprocessing for intensity inhomogeneity correction and intensity normalization of the series of images. Experiments implemented on a benchmark dataset show that our method is capable of achieving 96.4% of TC factor for gray matter, 98.1% of TC factor for white matter and lower label reverse rate, which is significantly robust, efficient, and stable compared with the state-of-the-art methods.
The remainder of the paper is organized as follows. Section II provides a detailed description of a commonly 3D segmentation model and our 3D segmentation method for a single MR image. Section III provides a detailed description of our 4D segmentation method for the images in a longitudinal series. Section IV reports the experimental results. Several concluding remarks are finally given in Section V.

II. 3D SEGMENTATION
In this section, we first describe a commonly used model of MR images and our method for 3D segmentation of a single MR image, which will be then extended to 4D segmentation by introducing a regularization term for temporal consistency in Section III.

A. MEMBERSHIP FUNCTION OF IMAGE MODEL
According to a commonly used model of MR images [6]- [8], the intensity of an observed MR image at a location x can be modeled as: where I is the observed image, J is the true image, and B is the bias field that accounts for the intensity inhomogeneity in the observed image I , and N is white noise. Ideally, the intensities of the true image J in each tissue take a specific value of the physical property being measured respectively (e.g. the proton density). Therefore, the true image J is approximately a constant c i in the i-th tissue, which is located in a sub-region i in . Thus, the true image J can be approximated by: where c i is a constant that approximates the voxel intensities from the i-th tissue, and u i is the membership function of the tissue/region i . We call the constants c i the true intensity mean (TIM) of the voxels in the i-th tissue. Typically, we assume there are three types of tissues in the brain: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF).
In this paper, we represent the membership functions u 1 (x), · · · , u N (x) with a vector-valued function U, (u 1 , · · · , u N ) T , and the TIMs c 1 , · · · , c N with a column vector c, (c 1 , · · · , c N ) T , where (·) T is the transpose operator. For convenience, the vector U is referred to as a membership vector, and the vector c is referred to as a TIM vector. The membership function U is a mapping U : → U, where U is defined by: which is a convex set.

B. SEGMENTATION OF A SINGLE IMAGE
Our formulation of 4D segmentation is based on a new method for joint segmentation and bias field estimation for a single 3D MR image described as follows. In our method, the bias field B is approximated by a linear combination of a set of basis functions g 1 , · · · , g N : In this paper, we use orthogonal polynomials as the basis functions, and those basis function satisfy: where δ ij = 0 for i = j and δ ij = 1 for i = j. By representing the basis functions g 1 , · · · , g M as a column vector G = (g 1 , · · · , g M ) T and the coefficients w 1 , · · · , w M as column vector W = (w 1 , · · · , w M ) T , we can express the bias field as B(x) = W T G(x).
Our tissue segmentation and bias field estimation is based on the adaptive fuzzy C-means (AFCM) framework proposed by Pham and Prince [8], in which the following energy was proposed in the form: where q > 1 is the fuzzifier, P 1 (B)and P 2 (B) are the smoothing term on the bias field B, which are introduced to penalize the L 2 norms of the first and second order derivatives of B. Note that we use a continuous version of the AFCM energy here, instead of the discrete version in [8].
In our method, since bias fields are approximated by a linear combination of a set of smooth basis functions, there is no need to use the smoothing term as in the above AFCM framework, i.e. the penalty terms λ 1 P 1 (B) + λ 2 P 2 (B) can be removed in our method. Therefore, we propose to minimize the energy: subject to the constraint: The above energy can be minimized by an iterative process with respect to the variables U,c, and W, respectively. The minimization with respect to c and W will be described in Section III in the context of 4D segmentation. In this section, we give the solution to the optimal membership function U that minimizes the energy F for a single MR image.
The energy F can be expressed as: where R = (r 1 , . . . , r N ) T is a vector with components defined by: It can be shown that the optimal membership function U(x) = (u 1 (x), . . . , u N (x)) T that minimizes the energy F(U, R) in Eq. (8) is given by: This equation defines a mapping φ : R → U. The membership vector U is determined by the vector R through the mapping φ.
The mapping φ is scalar invariant, i.e., for any scalar constant s > 0, it holds that φ(sR) = φ(R). With this property, we can change the length of the vector R by multiplying it by a scalar constant, so that the point sR is located in a given surface R. In this paper, we use the surface R defined by: with a constant α > 0. This property is meaningful in the segmentation of longitudinal data. For a location x, the vectors R(x) computed from the images captured at different time points can be far away from each other, which does not allows us to perform temporal regularization on the sequence of such vectors. However, we can always multiply the vector R by a scalar s such that sR ∈ R, which corresponds to the same membership vector, i.e. U = φ(sR) = φ(R). Thus, the vectors R computed for different time points are ''pulled'' to the same surface R, such that they are comparable and therefore allow the application of temporal regularization to the vectors R computed for different time points. The mapping φ and its properties will be further discussed and used in the next section for 4D segmentation.

III. 4D SEGMENTATION
In this section, we consider a series of 3D MR images captured from the same subject. The temporal consistency of segmentation of a series of 3D MR images can be improved by considering the segmentation problem in an appropriate 4D formulation. Although a series of 3D images can be stacked as a 4D data, we cannot simply extend a 3D segmentation formulation to 4D by equally treating spatial and temporal dimension in the formulation of 4D segmentation, due to the change of the image properties (e.g. histogram, contrast and intensity inhomogeneity) over time. Therefore, the extension of a 3D segmentation method to 4D is not a trivial task. The difficulty can be seen clearly from the model of serial MR images described next, which must be taken into account in the formulation of 4D segmentation.

A. MODEL OF SERIAL MR IMAGE
According to a commonly accepted model in Eq. (1), a series of observed MR images I t (x) can be modeled as: where t in the subscript represents the scanning time, J t (x) is the unknown true image, and B t (x) is the unknown bias field, and N t (x) is noise. The true image J t (x) can be approximated by: where c i (t) is the true intensity mean (TIMs) of the voxels in the i-th tissue, and u (t) i (x) is the membership function of the i-th tissue at time point t. Since this series of images are captured from the same subject, there is a certain temporal consistency of the membership functions u (t) i at different time points, which will be taken into account in our formulation of 4D segmentation.
Note that the bias fields and the TIMs of the images at different time points are not related to each other. They may vary over time, and thereby lead to the variability of the images in a longitudinal series, which renders it difficult for a 3D method to achieve temporally consistent segmentation.

B. FORMULATION OF 4D SEGMENTATION
Our 4D segmentation method is formulated on a series of MR images I t (x) with t denoting the scanning time. We assume that this series of images is registered to the first image in the series. From I t (x), we need to find the membership functions u Our 4D segmentation goal is to obtain an optimal membership function U(x, t) with temporal consistency in some sense. We use an energy minimization approach to find the optimal membership function U(x, t). The vectors W(t) and c(t) at all time point t are jointly computed by minimizing energy in our energy minimization framework.
Given a series of images I (·, t), we define an energy: (14) in which G is an extension of the energy F is defined in Eq. (7). Let R(x, t) be the vector R(x, t) = (r 1 (x, t), · · · , r N (x, t)) T with r i (x, t) = (I (x, t) − W T (t)G(x)c i (t)) 2 . Then, by writing y = (x, t), the functions r i (x, t) can be expressed as r i (x, t) = r i (y). Thus, given the vector R(x, t), the above energy G can be simplified as: which is in a 4D formulation. For the computation of the membership function U for given vectors c(t) and W(t), we can directly work with the energy F in the form of Eq. (15). To compute the vectors c(t) and W(t), we use the energy G in the form of Eq. (14). The 4D formulation energy in Eq. (15) still does not take into account the temporal consistency. We will add a temporal regularization term P to define an energy E by: with λ > 0 as the weight of the temporal regularization term. According to the model of serial MR images in Section III-A, the vectors c(t) and W(t) do not have temporal consistency. Therefore, the computation of the vectors c(t) and W(t) is not subject to constraint by the temporal regularization term P. Thus, we can first describe the minimization of the energy E with respect to the vectors W and c, which is equivalent to the minimization of the energy G with respect to W and c.

1) OPTIMIZATION OF W AND c
Minimization of the energy G with respect to the variable W can be achieved by solving the equation ∂G/(∂W(t)) = 0, which yields the following solution: thus, the bias field is estimated by: Minimization of the energy G(U, c, W) with respect to c yields an optimal TIMs c(t) = (c (t)

2) OPTIMIZATION OF 4D MEMBERSHIP FUNCTION
As same in the 3D formulation, we define: (20) which are represented by the function R(x, t) = (r 1 (x, t), · · · , r N (x, t)) T . By writing y = (x, t), the function r i (x, t) can be further expressed as R(y) = R(x, t). From this 4D function R(y), we will compute the 4D membership function U(y) = U(x, t).
As described in Section II-B, the function R(x) = R(x, t) at each time point t determines the membership function U given by Eq. (10) as a segmentation result for image 3288 VOLUME 8, 2020 I t = I (·, t), which is derived as the minimizer of the energy F(U, R) with U = R(·, t) and R = U(·, t). However, such membership function is obtained for each time point t individually, which may not be a temporally consistent segmentation result.
In order to achieve temporal consistency of segmentation, we consider the energy: where the functional G is defined by Eq. (8) and P(U) is the regularization term on the 4D membership function U, defined by: ∂U ∂t k dy (22) Note that the above constraint U(x, t) ∈ U is pointwise, while the energy is defined as an integral over the entire domain. Directly solving such constrained energy minimization problem on the membership function is complicated. Additional parameters have to be introduced in the numerical scheme for solving the above pointwise constrained energy minimization problem, which needs to be tweaked for different data. As a result, the numerical scheme is computationally expensive and the algorithm is sensitive to the choice of the additional parameters.
To circumvent these difficulties, we modify the above energy by introducing an efficient and robust numerical scheme, which can provide soft segmentation results. The basic idea is to transfer the regularization on the membership function U to the regularization on the function R. To that, we need the following inequality: where β > 0 is a constant. This inequality will be verified in Section III-C. The regularization on U is achieved by forcing the energy P(U) to be small, which, in view of the above inequality, can be achieved by forcing the energy P(R) to be small. Therefore, we first find a regularized function R? that minimizes the following energy: Then, we minimize the energy G(U, R) with respect to U. As a minimizer of the energy ε 1 (R, R) the function R? has low energyP(R). The minimization of G(U, R, R), with respect to U yields an optimal membership function, denoted by U = (r 1 , · · · , r N ) T , which is given by Eq. (10) with r i replaced by r i . This function U, by the inequality P(U) ≤ βP(R), has a low energy P(U), which satisfies the desired temporal regularity of the membership function.

3) IMPLEMENTATION
It is clear that the energy is a convex function in each of the variables. The minimization of G is solved in an iterative process of interleaved minimization with respect to each variable. The whole process can be described by the following steps: Step 1. Initialize the membership functions U by random matrices where each entry follows a uniform distribution in [0, 1], and initialize c.
Step 4. Update membership function U by Eq.(20) Step 5. Check convergence criterion |c n − c (n−1) | < . If convergence has been reached or the iteration number exceeds a prescribed maximum number, stop the iteration, otherwise, go to Step 2.
In the above iteration process, each of the variables is updated with the other variables computed in the previous iteration, and we set the in Step 5 to 0.001, and n denotes the n-th iteration.

C. TEMPORAL REGULARIZATION OF U AND R
The Eq. (10) defines a mapping from the space of R to the space of membership vector U. Denoting the Jacobian matrix of φ as J φ , it can be inferred that the membership function U(x, t) that minimizes the energy G(U, R) is given by U(x, t) = φ (R(x, t)). Therefore, we have the following relationship: It then follows that: where β = (sup R ||J (R)||) k and sup() means supremum. This inequality shows that the regularization of U can be achieved by regularization on R.

A. MR DATASET DESCRIPTION
Our method has been tested on the Baltimore Longitudinal Study of Aging (BLSA) data [5]. The BLSA study is a longitudinal study of aging and early markers of Alzheimer's Disease. Researchers measure physical and cognitive changes associated with aging in real-time among a dedicated group of BLSA participants who come in for testing at regular intervals throughout their lives. In this experiment, 5 sets of longitudinal 3D T1-SPGR MR brain images of healthy older adults from BLSA dataset are selected in the experiments, each set of MR images contain 11 serial scans where they were obtained during 11 consecutive years.

B. RESULTS
To demonstrate the temporal consistency of the segmentation results of our method, we compare it with a 3D segmentation method, called FAST, implemented by a popular software FSL based on the work of Zhang et al. [7]. The FSL VOLUME 8, 2020 software is publicly available in the Analysis Group, FMRIB, Oxford, UK. 1 We first show the results of an experiment with the BLSA data. Fig. 1 shows the segmentation results for the images acquired in years 5, 6, and 7 (shown from left to right on the top row) obtained by applying FAST to each image individually and our 4D segmentation method to the 11 images jointly.
The middle row and the bottom row show the results of FAST and our method, respectively. It can be clearly seen that the results of our method for these three images are temporally more consistent than the results of FAST. To see this more clearly, we enlarge a portion of the images in Fig. 2 (marked by the red circle) and show the enlarged images in Fig. 2.
We can see multiple locations where the segmentation results of our method are more consistent than those of FAST. One of such locations is marked with a red arrow in Fig. 2 as an example.
To quantitatively evaluate the segmentation results, we use two metrics to evaluate the temporal consistency of the segmentation results. The first metric is the temporal consistency (TC) factor proposed in [10], which is defined as: 1 http://www.fmrib.ox.ac.uk/fsl/ where Y is the number of scans. The TC factor is always between 0 and 1. A larger TC factor indicates a better temporal consistency, and vice versa.
Each voxel x in the region of interest is labeled as WM, GM, and CSF as the result of tissue segmentation. We denote by k i the number of label changes of corresponding voxels across time, then the segmentation of the corresponding voxels considered to be consistent if k i is small, and vice versa.    Table 1 shows the TC factors of our method and the compared state-of-the-art methods [7], [14], [26], [27] for the five subjects. The TC factors of our method are all higher than those of algorithms, therefore, it is reflected that the proposed method can enforce better temporal consistency than those compared methods.
By definition, the TC factor is in favor of the segmentation results that are the same for all the images in the longitudinal series, even though the brains usually change over time.
In this paper, we propose a different metric to evaluate the temporal consistency, with consistent changes of the label assigned to corresponding voxels. This metric is evaluated by counting the voxels that are labeled incorrectly at three consecutive time points t 1 , t 2 , and t 3 . We know that, due to the aging of the subjects or some pathological changes, the brains would shrink and deform gradually. Therefore, when a voxel x is labeled as a tissue A (e.g. GM) at time point t 1 , it could be labeled as a different tissue B (e.g. WM) at the time point t 2 , due to the changes of the brain. At the third time point t 3 , this voxel would be labeled as either tissue B or back to A, while the latter case is not supposed to occur when the segmentation algorithm performs well. Otherwise, this voxel is labeled back as tissue A again at the time point t 3 as at the time point t 1 . We call such label change as a label reverse. We assume that the label reverse is largely due to the result of inconsistent segmentation. We count the number of voxels that undergo label reverse in three consecutive time points, and compute the percentage of such voxels, which is referred to as a label reverse (LR) rate.
A temporally consistent segmentation result should have a low LR rate. The LR rates our method and the FAST algorithm for the five subjects for time points t 1 = 4, t 2 = 5, and t 3 = 6 are plotted in Fig. 4(a), while the LR rates for t 1 = 5, t 2 = 6, and t 3 = 7 are plotted in Fig. 4(b). Clearly, the LR rates of our method are all below 0.6%, which are significantly lower than those of the FAST algorithm, which are all above 2%. With more consistent segmentation results of our method, the brain tissue volume can be measured more precisely and stably over time. We use an open-source software platform called 3D-Slicer [31] to calculate the tissue volume. This is demonstrated by plotting the volumes of the WM and GM measured from a series of 11 images of a subject by using our method and a 3D segmentation algorithm, as shown in Fig. 5. It can be obviously observed from this figure that the results of our method show a more stable decreasing of the WM and GM volume over time, compared with the result of the FAST algorithm.    To evaluate the robustness to inhomogeneity of our segmentation model, we randomly select some synthesized data that are obtained from a pre-computed simulated brain database BrainWeb. 2 The parameter settings are fixed to 3 levels of noise, and 3 levels of intensity non-uniformity, Table 2 shows the detail of the used data. Fig. 6 shows the robustness of our method with data with varying inhomogeneities. Fig. 7 shows the bias correction and segmentation result. And Fig. 8 shows our energy function minimization, after about 5 iterations, the energy changed to be steady, which shows that our energy function can converge quickly.

V. CONCLUSION
We propose a robust and efficient temporally consistent segmentation method of longitudinal MR data. In our method, tissue segmentation result is represented by the membership functions of the tissues, which are derived as a result of minimizing energy in a fuzzy C-means (FCM) framework. Our method does not require pre-processing for intensity inhomogeneity correction and intensity normalization of the series of input images. There are very few parameters in our method, and it is easy to tune the parameter to achieve desirable results. Our method is robust, efficient, and fully automatic, with stable performance and sufficient accuracy, which have been verified by experiments on a set of real MR data.