SECTION I

Brain disorders are major contributors to morbidity, disability, and premature mortality in many developed and developing countries worldwide [1]. Every year, over one-quarter of adult Americans are diagnosed with a mental illness, such as Major Depressive Disorder (MDD), Post-Traumatic Stress Disorder (PTSD), schizophrenia, and Alzheimer's Disease [2]. Moreover, every year a third of the EU's population is diagnosed with mental disorders [3]. The number of patients is even higher if neurological disorders, such as epilepsy and dementia, are also taken into account. Apart from making life difficult for patients on a personal level, brain disorders carry a considerable societal and financial cost. Improved prevention and treatment of brain disorders is therefore a key issue and could alleviate healthcare costs. Assessing the structural integrity of the hippocampus (HC), a structure of the limbic system (Fig. 1), is an essential step toward this goal, given its implication in such disorders. Dysfunction and neurodegeneration of HC play a fundamental role in the development of various brain disorders. Many studies support that altered HC volume and connectivity represent a specific endophenotype, for instance in schizophrenia [4], [5], [6], in first-episode schizophrenic patients [7], in bipolar disorder [8], [9], [10], in epilepsy [11], in Alzheimer's [12], [13], in Mild Cognitive Impairment [14], [15], in dementia associated with multiple sclerosis [16], [17], and in Down's syndrome [18]. Hence, HC morphology alterations could potentially serve as a biomarker in decision-making systems for various brain disorders, and HC morphometry is a potentially powerful tool for the diagnosis, prognosis, and monitoring of many diseases.
However, beyond the supporting evidence and the widespread agreement on the usefulness of HC volumetry, establishing it as a biomarker requires that HC volume can be measured with appropriate accuracy and reproducibility [19].

Traditionally, HC structural assessment has been based on manual or semi-automated segmentation of MRI scans. However, these methods are time-consuming, largely because of the vast amount of data produced by MRI, and they suffer from rater bias and high cost; these are the major obstacles to effective, large-scale morphological studies of HC. Reliable automatic HC segmentation therefore offers a valuable clinical tool, and it is already proving its usefulness. Recently, in a large-scale genome-wide association meta-analysis of hippocampal, brain, and intracranial volume [20], automated hippocampal volumetry enabled the discovery of novel genes associated with hippocampal volume in schizophrenia. In [21], the stated aim is to foster the use of hippocampal volumetry in routine clinical settings for Alzheimer's, which requires standardization first of the segmentation protocol (given the variety of protocols) and second of the automatic segmentation method. Similarly, a collaborative initiative on Alzheimer's between Europe and the USA (EADC-ADNI) plans to adopt HC volumetry as a new diagnostic criterion for Alzheimer's and in therapeutic trials [19]. A roadmap has thus been defined to establish HC volumetry as an Alzheimer's biomarker, and it requires reliable automatic segmentation. Once HC volumetry is established, everyday clinical practice would require objective and highly accurate HC segmentation for reliable disease diagnosis (potentially within a decision support system), monitoring, and treatment evaluation, which could even support drug discovery. Given the existing evidence of HC alterations in other disorders (e.g., schizophrenia, bipolar disorder, epilepsy), similar actions are foreseen for these cases too.

Several methods have been proposed for automatic HC segmentation, yet it remains a very challenging task. Results in the literature report that HC is among the brain structures for which automatic segmentation methods achieve lower accuracy than for other brain structures [22], [23], [24]. Automatic segmentation methods for deep brain structures such as HC can be broadly divided into three major categories: (1) atlas-based techniques, (2) deformable models, and (3) active appearance models.

Atlas and multi-atlas based methods involve non-rigidly registering one training image, or several (in the case of multi-atlas techniques), to the target image using some similarity measure. The label image(s) of the training image(s) are subsequently propagated to the space of the target image using the calculated warping fields to produce the final segmentation. Multi-atlas methods require an extra step: fusing the transformed label images, or a selected subset of them. The literature contains a substantial number of variants of the multi-atlas concept [23], [25], [26], [27], [28]. These methods mainly differ in the registration method, in the way a subset of the training label images is selected before fusion, and in the label fusion approach. A recent workshop offered a comparative evaluation of state-of-the-art multi-atlas techniques [29]. In total, 25 algorithms entered the challenge, and their performance was evaluated on a publicly available dataset (abbreviated as the OASIS-MICCAI dataset in this work). Among the evaluated multi-atlas methods, the most accurate in terms of the Dice similarity coefficient [30] were the joint label fusion technique proposed by Wang *et al.* [31] and the Non-Local STAPLE proposed by Asman *et al.* [32]. The former, combined with bias correction [33], was the top performer of the challenge. The latter is a statistical fusion technique based on the non-local means framework.

In the second category, popular examples of deformable models are Active Contour Models (ACMs). ACMs deform a contour iteratively to partition an image into distinct regions according to the image gradients (edge-based ACMs) or to statistical information about the intensities (region-based ACMs). When combined with an implicit representation of the object of interest, ACMs have proved to be powerful tools for image segmentation. A popular edge-based method is the Geometric Active Contour model (GAC) [34], whose evolution stops when “strong” edges are encountered. Region-based ACMs, on the other hand, use statistical information about the distribution of intensities inside and outside the contour, which makes them less sensitive to image distortions and to the “leakage” effect. The Chan-Vese model [35], which is based on the Mumford-Shah segmentation framework [36], is one of the most widely used region-based ACMs for detecting objects whose boundaries are not defined by strong edges. However, relying solely on regional information can lead to erroneous segmentations of objects with well-defined boundaries, owing to the lack of boundary terms. To tackle the problems posed by using solely region- or edge-based information, hybrid approaches have been proposed [37].
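For readers unfamiliar with region-based ACMs, the core Chan-Vese data force can be sketched in a few lines of NumPy. This is a minimal illustration of the standard Chan-Vese force only, not the region term derived in this work's Appendix; it assumes $\phi>0$ inside the contour and omits the curvature (length) term and the smoothed Dirac factor.

```python
import numpy as np


def chan_vese_region_force(image, phi, lambda1=1.0, lambda2=1.0):
    """Chan-Vese data force for a level set phi (phi > 0 inside, by assumption).

    The force is positive where a voxel resembles the mean intensity inside the
    contour (c1) more than the mean outside it (c2), so adding it to phi grows
    the inside region there. Curvature and Dirac-delta factors are omitted.
    """
    inside = phi > 0
    # Piecewise-constant approximation: mean intensity on each side.
    c1 = image[inside].mean() if inside.any() else 0.0
    c2 = image[~inside].mean() if (~inside).any() else 0.0
    return -lambda1 * (image - c1) ** 2 + lambda2 * (image - c2) ** 2


# Toy example: a bright 3x3 square inside a dark 5x5 image.
img = np.zeros((5, 5))
img[1:4, 1:4] = 1.0
phi0 = -np.ones((5, 5))
phi0[1:4, 1:4] = 1.0
force = chan_vese_region_force(img, phi0)
```

Because the force depends only on the two regional means, it needs no edges at all, which is exactly why it struggles to localize boundaries precisely and why hybrid models add an edge term.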

ACMs, however, depend solely on image information, so their drawback is the lack of anatomical knowledge about the structures being segmented. This limitation can be overcome by modeling prior knowledge of anatomical structures and integrating it into the segmentation framework, which has become a key issue in medical image analysis. One of the earliest and most influential works in this direction was that of Cootes *et al.* [38], who incorporated into the ACM framework global shape constraints learned by means of Principal Component Analysis (PCA). They named their method Active Shape Models (ASM) to avoid confusion with traditional ACMs. In Leventon *et al.* [39], a non-parametric, intrinsic model based on the implicit representation of the shapes was constructed and incorporated into a GAC segmentation framework. The same approach was later adapted by Yang *et al.* [40] to build a statistical neighbor prior able to constrain the segmentation based on the neighborhood properties of adjacent structures. Yang *et al.* incorporated the constructed models into a region-based ACM framework to achieve simultaneous segmentation of neighboring structures. The aforementioned shape models, however, capture global shape-prior knowledge through PCA and thus cannot account for local shape variations.
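The PCA-based global shape prior of ASMs can be illustrated with a short sketch: training shapes (assumed to be already aligned, flattened landmark vectors) are decomposed into a mean and principal modes, and a candidate shape is constrained by clipping its mode coefficients. The ±3 standard deviation limit and the 95% retained variance are common conventions, not values from this paper. Because each coefficient acts on the whole shape, the sketch also shows why such global models cannot capture local variations.

```python
import numpy as np


def pca_shape_model(shapes, var_fraction=0.95):
    """Mean shape, principal modes, and their variances from aligned shapes.

    shapes: (m, d) array; each row is a flattened, pre-aligned landmark vector.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data yields the principal modes (rows of vt).
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = sing ** 2 / max(shapes.shape[0] - 1, 1)
    # Keep the fewest modes that explain var_fraction of the total variance.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, var_fraction)) + 1
    return mean, vt[:k].T, eigvals[:k]


def constrain_shape(x, mean, modes, eigvals, limit=3.0):
    """Project x onto the model and clip each mode to +/- limit * std."""
    b = modes.T @ (x - mean)
    bound = limit * np.sqrt(eigvals)
    return mean + modes @ np.clip(b, -bound, bound)


# Toy data: shapes varying along a single direction v around a fixed mean.
rng = np.random.default_rng(0)
v = np.array([1.0, 2.0, 0.0, -1.0])
shapes = 5.0 + np.outer(rng.normal(size=20), v)
mean, modes, eigvals = pca_shape_model(shapes)
constrained = constrain_shape(mean + 100.0 * v, mean, modes, eigvals)
```

The implausibly large deformation `100 * v` is pulled back to within three standard deviations of the single learned mode.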

Incorporating texture into the ASM framework led to Active Appearance Models (AAMs) [41], [42], [43], [44]. AAMs use PCA-based linear subspaces to model the variation of both shape and texture in a training set. The original AAM formulation required the identification of landmarks. To overcome this requirement, Hu *et al.* [45], [46] proposed integrating level sets into the AAM framework. In Toth *et al.* [47], a multi-feature, landmark-free AAM was presented. Other interesting extensions of AAMs include the combination of AAM with patch-based label fusion [48] and the Bayesian appearance model [49]. Despite advantages such as fast performance, AAM is a local search technique and thus requires good initialization [50].

All the aforementioned active models focus on modeling and integrating prior information rather than on optimally balancing, at the voxel level, the local influence that prior knowledge and image information should have. Global weighting, which implies consistent boundary properties, has traditionally been used. However, this assumption does not hold in some challenging cases, and it is therefore dropped in this work. In fact, HC has spatially varying boundary properties, with clear, blurry, and even missing borders. To this end, we recently developed a local weighting scheme [51] to improve the weighting between the image and the prior terms. A local weighting map, called the Gradient Distribution on Hippocampus Boundary map (GDHB), was built from the learned gradient values across the HC boundary.

Based on the same concept, in [52] we defined an Optimal Local Weighting map (OLW) via an optimization procedure. The optimization criteria are designed to produce the most accurate segmentations for a set of training images, given the corresponding ground-truth segmentations. The training OLWs are adapted to the test image and fused to generate its OLW. The OLW is subsequently incorporated into an ACM framework defined by two energy terms, the region-based image term and the prior term. The efficacy of this concept was validated through experiments on the central sagittal slice of HC.

Here, the concept of OLW is further extended, and a fully automatic, subject-specific segmentation framework for 3D MR images is proposed that models the local properties of HC using a complete set of Optimal Local Maps (OLMs). The resulting OLMs are incorporated into a hybrid ACM framework with three energy terms: a region-based term, an edge-based term, and a prior term. The prior term, a label spatial distribution map, is built with a straightforward multi-atlas framework. The proposed method thus combines the multi-atlas concept with the ACM framework, in which OLMs locally blend the energy terms.

OLMs refer to three different 3D maps: (i) a map that locally weights the prior energy term against the image-derived terms $(W_{1})$, (ii) a map that locally balances the contributions of the region- and edge-based terms $(W_{2})$, and (iii) a map that locally controls the time step used in the level set evolution $(S)$. To the best of our knowledge, this is the first work to define the purpose and extraction of these three maps. All parameters of the hybrid ACM are calculated during training through an optimization procedure, which avoids heuristic parameter fine-tuning and aims at an optimal contour evolution based on the captured HC boundary and shape properties. The advantage of the proposed scheme is that the OLM-based ACM can capture fine details. Furthermore, the multi-atlas concept naturally incorporates the training OLMs by treating them as additional atlas modalities, thus forming multi-fold atlases, i.e., the atlas image, its label image, and the corresponding OLMs (left part of Fig. 2).

The proposed algorithm was tested on three different datasets and proved suitable as a complementary technique to multi-atlas methods.

SECTION II

Three datasets were used in the context of this work:

The OASIS database [53] consists of T1-weighted MR image volumes acquired with a 1.5T Vision scanner, produced by averaging four scans of the same individual, which yields images with reduced noise levels. The MR image volumes were resampled to a resolution of $1.0~{\rm mm}\times 1.0~{\rm mm}\times 1.0~{\rm mm}$ and spatially warped into the Talairach space. The size of the volumes is $176\times 176\times 208~{\rm voxels}$. The database is very large (416 subjects), but no manual segmentations are available. A subset was therefore chosen to cover the entire age span of the subjects and to include subjects with different degrees of dementia. An expert radiologist provided us with manual segmentations of HC, which we make publicly available to the research community.^{1} The selected subset consists of 23 right-handed subjects (13 females and 10 males) aged from 18 to 96 years. Among them, 2 subjects have a Clinical Dementia Rating (CDR) of 0.5, indicating very mild dementia, while 2 subjects have a CDR of 1, indicating mild dementia.

The manual protocol followed for the segmentation of the OASIS dataset is a close variant of the protocol used in the study of Narr *et al.* [54]. This protocol is adapted from existing protocols [55], [56], [57], [58], [59], [60] and defines HC as a homogeneous gray matter structure. It should be noted, however, that there is ongoing discussion on whether non-gray matter parts, namely the alveus and fimbria, should be included in the hippocampal formation [61], [62].

The IBSR dataset is provided by the Center for Morphometric Analysis at Massachusetts General Hospital.^{2} It contains T1-weighted MR image volumes from 18 subjects at various resolutions (from $0.8~{\rm mm}\times 0.8~{\rm mm}\times 1.5~{\rm mm}$ to $1.0~{\rm mm}\times 1.0~{\rm mm}\times 1.5~{\rm mm}$). The volumes have been spatially normalized into the Talairach orientation (rotation only). Subject ages range from under 7 to 71 years. Four subjects are female and the remaining 14 are male. The volume size is $256\times 256\times 128~{\rm voxels}$.

The IBSR repository further offers the corresponding manual segmentations of HC. The manual segmentation protocol followed is described in Makris *et al.* [63] and regards HC as a homogeneous gray matter structure.

The OASIS-MICCAI dataset is another subset of the OASIS database, which was used in the MICCAI 2012 “Workshop on Multi-Atlas Labeling” [29]. Of its 35 MR image volumes, 15 were used for training and 20 for testing. The average subject age is 23 years in the training set (range 19 to 34 years) and 47.5 years in the testing set (range 18 to 90 years). Both sets contain female and male subjects: the training set includes 10 MRIs from females and 5 from males, and the testing set 12 from females and 8 from males.

Manual segmentations for the dataset are provided by Neuromorphometrics, Inc.^{3} under academic subscription. The manual segmentation protocol^{4} used for HC also includes white matter parts (fimbria/alveus) in the hippocampal region.

The key point of the proposed method is the blending of different types of information in a hybrid ACM framework through a set of subject-specific Optimal Local Maps (i.e., $W_{1}$, $W_{2}$ and $S$) applied on top of the multi-atlas concept. The level set evolution depends on three energy terms: the edge-based term, the region-based term, and the prior term. The latter is built from the subject-specific label spatial distribution map $L$ produced by the multi-atlas step. In this scheme, $W_{2}$ balances the contributions of the two image-derived energy terms: in the presence of strong edges, $W_{2}$ gives more weight to the edge-based term in that region, while in low-gradient regions the region-based term is trusted more. Similarly, $W_{1}$ balances the combined image terms against the prior term, so that the prior term is used in regions where the image information is insufficient to drive the segmentation in the right direction. $S$ controls the time step of the level set evolution, taking smaller values where the level set is close to convergence. Conversely, $S$ takes higher values in homogeneous regions, where the evolving contour is far from the actual boundary, to speed up evolution and convergence.

As Fig. 2 shows, once the OLMs and ACM parameters of the atlases are extracted, the atlases are registered to the test image. A fusion step follows to extract the subject-specific OLMs, the label spatial distribution map $L$, and the ACM parameters. A contour is then evolved on both the MR image and $L$, optimally blending their contributions. The ACM evolution is initialized with the region containing the voxels most likely to belong to the hippocampus (i.e., the region where $L$ takes its highest values).
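The initialization described above can be sketched as follows: threshold $L$ at its highest values and convert the seed region into a signed distance function. The threshold `tau` (defaulting to the maximum of $L$) and the sign convention ($\phi>0$ inside) are assumptions for illustration, and the brute-force distance computation is only practical for toy volumes; for real MR volumes, `scipy.ndimage.distance_transform_edt` is the usual tool.

```python
import numpy as np


def init_level_set(L, tau=None):
    """Signed distance initialization from a label spatial distribution map L.

    Seed region: voxels with L >= tau (tau defaults to max(L), i.e. the most
    likely hippocampal voxels). Convention: phi > 0 inside the seed.
    Brute-force O(N^2) distances: suitable for small toy arrays only.
    """
    if tau is None:
        tau = L.max()
    inside = (L >= tau).ravel()
    if inside.all() or not inside.any():
        raise ValueError("seed region must be non-empty and not fill the volume")
    # (N, ndim) voxel coordinates in the same C order as ravel().
    coords = np.indices(L.shape).reshape(L.ndim, -1).T.astype(float)
    ins, outs = coords[inside], coords[~inside]

    def nearest(points, targets):
        # Euclidean distance from each point to its closest target voxel.
        diff = points[:, None, :] - targets[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

    phi = np.empty(coords.shape[0])
    phi[inside] = nearest(ins, outs)
    phi[~inside] = -nearest(outs, ins)
    return phi.reshape(L.shape)


L = np.zeros((5, 5, 5))
L[2, 2, 2], L[2, 2, 3] = 1.0, 0.6
phi0 = init_level_set(L)
```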

The first step towards modeling prior information is to build $L$, which describes the spatial distribution of the structure's labels. In this work, we investigated two different fusion techniques for constructing $L$. The first is a simple multi-atlas fusion based on a global weighted average. Given a set of $n$ training images $I_{i}$, $i=1,\ldots,n$, their corresponding label images $L_{i}$, and the warping fields calculated by non-rigidly registering each $I_{i}$ to the target image $I$, $L$ is given by: $$L=\sum_{i=1,\ldots,n}{s_{i}}\cdot{F(L_{i})}\eqno{\hbox{(1)}}$$ where $F$ represents the warping of the label images $L_{i}$ to the space of $I$, and $s_{i}$ is the similarity, measured by cross-correlation, between the registered training image $I_{i}$ and the target image $I$. All $s_{i}$ are normalized so that $\sum\limits_{i=1,\ldots,n}s_{i}=1$.
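Eq. (1) can be sketched as below, assuming the training images and label images have already been warped to the target space (e.g., with ANTs SyN, as in the text). Cross-correlation is computed here as global normalized cross-correlation, and negative correlations are clipped to zero before normalization; both choices are assumptions of this sketch. Thresholding the fused map at 0.5 gives a weighted majority vote, as used for the Multi-atlas baseline.

```python
import numpy as np


def ncc(a, b):
    """Global normalized cross-correlation between two images."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))


def weighted_average_fusion(warped_images, warped_labels, target):
    """Eq. (1): L = sum_i s_i * F(L_i), with s_i normalized to sum to 1."""
    s = np.array([ncc(img, target) for img in warped_images])
    s = np.clip(s, 0.0, None)  # assumption: ignore anti-correlated atlases
    s = s / s.sum()
    # Contract the atlas axis: sum_i s_i * warped_labels[i].
    L = np.tensordot(s, np.stack(warped_labels), axes=1)
    return L, s


target = np.arange(27, dtype=float).reshape(3, 3, 3)
images = [target, target + 1.0, -target]
labels = [np.ones((3, 3, 3)), np.ones((3, 3, 3)), np.zeros((3, 3, 3))]
L, s = weighted_average_fusion(images, labels, target)
segmentation = L > 0.5  # weighted majority vote
```

The anti-correlated third atlas receives zero weight, so its empty label map does not affect the fused result.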

The tasks of non-rigid registration and similarity calculation are performed with the ANTs toolkit. More precisely, the symmetric normalization methodology (SyN) [64] is utilized, which is based on optimizing and integrating a time-varying velocity field. The instructions in Klein *et al.* [65] were followed to choose the similarity metrics and the velocity field regularization.

The second fusion technique investigated for constructing $L$ is the recent joint label fusion technique^{5} proposed by Wang *et al.* [31]. According to this approach:
$$L_{joint}(x,y,z)=\sum_{i=1,\ldots,n}{w_{i}(x,y,z)}\cdot{F(L_{i})(x,y,z)}\eqno{\hbox{(2)}}$$ where $w_{i}$ denote the voting weights (3D matrices of the same size as $L_{i}$) calculated by the joint label fusion technique, subject to $\sum\limits_{i=1,\ldots,n}w_{i}=1$.
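To make the voxel-wise weights $w_i$ in Eq. (2) concrete, the sketch below computes them at a single voxel in the general form of joint label fusion described by Wang *et al.*: atlases whose intensity errors in a local patch are jointly correlated share their vote. This is a simplified illustration, not the released joint label fusion tool. It omits intensity normalization and local patch search, and the values of `alpha` (regularization) and `beta` are assumptions.

```python
import numpy as np


def jlf_weights(atlas_patches, target_patch, alpha=0.1, beta=2.0):
    """Voting weights at one voxel, summing to 1.

    atlas_patches: (n, p) intensities of the n warped atlases in a local patch.
    target_patch: (p,) target intensities in the same patch.
    """
    err = np.abs(atlas_patches - target_patch[None, :])
    # Pairwise joint error: how likely atlases i and j err at the same time.
    M = (err @ err.T) ** beta + alpha * np.eye(len(err))
    ones = np.ones(len(err))
    x = np.linalg.solve(M, ones)
    return x / (ones @ x)


target_patch = np.array([1.0, 2.0, 3.0])
atlas_patches = np.array([[1.0, 2.0, 3.0],   # matches the target exactly
                          [2.0, 3.0, 4.0],
                          [0.0, 1.0, 2.0]])
w = jlf_weights(atlas_patches, target_patch)
atlas_labels_here = np.array([1.0, 0.0, 0.0])  # warped labels at this voxel
fused_label = float(w @ atlas_labels_here)     # one voxel of Eq. (2)
```

Note that, unlike the similarity weights of Eq. (1), these weights are only constrained to sum to 1 and can in general be negative.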

The reason for investigating two different multi-atlas fusion techniques is twofold. First, the multi-atlas concept is crucial for the performance of the proposed method, as it provides $L$, on which the prior term is based. Second, and most importantly, the proposed method is a locally weighted ACM applied on top of the multi-atlas concept and is designed to improve the multi-atlas result regardless of how accurate the latter already is. Incorporating one of the most accurate multi-atlas methods according to the MICCAI 2012 workshop results makes it possible to demonstrate that the proposed method can serve as a complement even to sophisticated and highly accurate multi-atlas methods.

Let $C$ denote the evolving curve, which is implicitly represented as the zero level set of a signed distance function $\phi$. The evolution of the curve $C$ is driven by the image terms and the prior term. Hence, by introducing the local weighting maps $W_{1}$ and $W_{2}$, the contour update equation is defined as: $${{\partial\phi}\over{\partial t}}=W_{1}\circ{{\partial\phi_{image}}\over{\partial t}}+(1-W_{1})\circ{{\partial\phi_{prior}}\over{\partial t}}\eqno{\hbox{(3)}}$$ where $\circ$ denotes the Hadamard (element-wise) product, and $${{\partial\phi_{image}}\over{\partial t}}=W_{2}\circ{{\partial\phi_{E}}\over{\partial t}}+(1-W_{2})\circ{{\partial\phi_{R}}\over{\partial t}}\eqno{\hbox{(4)}}$$ where $\phi_{R}$, $\phi_{E}$ and $\phi_{prior}$ correspond to the evolving contours based on the region-based $(E_{R})$, the edge-based $(E_{E})$ and the prior $(E_{prior})$ terms, respectively. Finally, by introducing $S$ into the level set framework, the evolving shape at iteration $k$ is calculated by: $$\phi_{k}=\phi_{k-1}+S\circ{{\partial\phi_{k-1}}\over{\partial t}}\eqno{\hbox{(5)}}$$
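Eqs. (3) to (5) amount to element-wise blending, which the following sketch makes explicit. The per-voxel speeds `dphi_E`, `dphi_R` and `dphi_prior` are taken as given inputs here, since their derivation is in the Appendix; the constant toy maps are arbitrary values chosen for illustration.

```python
import numpy as np


def olm_update(phi, dphi_E, dphi_R, dphi_prior, W1, W2, S):
    """One level-set iteration following Eqs. (3)-(5).

    All arguments are arrays of the same shape; NumPy's '*' is the
    element-wise (Hadamard) product denoted by a circle in the text.
    """
    dphi_image = W2 * dphi_E + (1.0 - W2) * dphi_R    # Eq. (4)
    dphi = W1 * dphi_image + (1.0 - W1) * dphi_prior  # Eq. (3)
    return phi + S * dphi                             # Eq. (5)


shape = (4, 4, 4)
phi_next = olm_update(
    phi=np.zeros(shape),
    dphi_E=np.full(shape, 1.0),
    dphi_R=np.full(shape, 2.0),
    dphi_prior=np.full(shape, 4.0),
    W1=np.full(shape, 0.5),
    W2=np.full(shape, 0.25),
    S=np.full(shape, 0.1),
)
# image speed 0.25*1 + 0.75*2 = 1.75; blended 0.5*1.75 + 0.5*4 = 2.875
```

Setting `W1` to 1 everywhere reduces the update to a purely image-driven hybrid ACM, and setting it to 0 lets the prior alone drive the contour.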

The derivations of $\phi_{R}$, $\phi_{E}$ and $\phi_{prior}$, and of the resulting final form of the evolution equation ${{\partial\phi}\over{\partial t}}$, are given in the Appendix.

Another means of capturing prior knowledge is the modeling of the varying boundary properties of HC, through the construction of local blending maps (OLMs), that define at voxel level which energy terms are to be trusted more for accurate segmentation results.

Graph cuts [66] have been widely used in computer vision for problems whose solution can be found through discrete pixel labeling. Graph cuts require formulating the pixel labeling as energy minimization, assuming that the minimum-energy solution corresponds to the maximum a posteriori estimate. Here, the Maxflow algorithm introduced by Boykov and Kolmogorov [67] is used to minimize two energy functionals $E(f)$ (Fig. 3). These allow the calculation, first, of the training OLMs and, second, of the ACM parameters for the $n$ training images, i.e., $W_{1i}$, $W_{2i}$, $S_{i}$, and $\lambda_{1i}$, $\lambda_{2i}$, $i=1,\ldots,n$.

Minimizing the two energy functionals requires that $W_{1i}$, $W_{2i}$, $S_{i}$ and $\lambda_{1i}$, $\lambda_{2i}$ be defined such that the image and prior terms driving the evolving level set force it to move towards the corresponding ground-truth level set, regardless of its initial position. Thus, by imposing a minimum difference between the required move of the level set and the ground-truth zero level set extracted from the label image, and repeating the procedure until the level set converges, the curve approaches and finally falls onto the ground-truth contour. Once convergence is reached, the training OLMs and ACM parameters are defined as the averages over all iterations. More details can be found in the Appendix.
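The idea of choosing local weights so that the blended update matches the move required to reach the ground truth can be illustrated with a deliberately simplified sketch. It selects $(W_{1}, W_{2})$ per voxel from a discrete label set using only the data (unary) cost. It is *not* a graph cut: the pairwise smoothness term that max-flow minimizes jointly is omitted, and $S$ and the $\lambda$ parameters are left out. Defining the required move as the difference between the ground-truth and current level sets, and the 11-level label set, are assumptions for illustration.

```python
import numpy as np
from itertools import product


def unary_olm_labels(dE, dR, dP, needed, levels=np.linspace(0.0, 1.0, 11)):
    """Per-voxel (W1, W2) minimizing the squared difference between the
    blended update of Eqs. (3)-(4) and the required move `needed`.
    Data term only: no pairwise smoothness, hence no graph cut."""
    best_cost = np.full(dE.shape, np.inf)
    W1 = np.zeros(dE.shape)
    W2 = np.zeros(dE.shape)
    for w1, w2 in product(levels, levels):
        blended = w1 * (w2 * dE + (1.0 - w2) * dR) + (1.0 - w1) * dP
        cost = (blended - needed) ** 2
        better = cost < best_cost
        best_cost[better] = cost[better]
        W1[better], W2[better] = w1, w2
    return W1, W2


# Toy iteration: only the edge force is non-zero (assumed values).
shape = (2, 2)
phi_current = np.zeros(shape)
phi_gt = np.full(shape, 0.3)
W1, W2 = unary_olm_labels(np.ones(shape), np.zeros(shape), np.zeros(shape),
                          needed=phi_gt - phi_current)
```

Without a pairwise term, neighboring voxels can receive unrelated labels whenever several label pairs fit equally well; the smoothness term of a graph-cut formulation is what resolves such ties coherently.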

This section focuses on creating a subject-specific segmentation framework that accounts for the subject's anatomy. To this aim, a multi-atlas concept is used to produce subject-specific OLMs and ACM parameters for the target image. Note that in the context of this section, an atlas denotes an anatomical image coupled with its corresponding OLMs.

As mentioned in Section II-C, each anatomical training image $I_{i}$, $i=1,\ldots,n$ is non-rigidly registered to the test image. The resulting transformations are used to propagate the training OLMs to the space of the target image (Fig. 4). Denoting the warping procedure as $F$, the resulting local maps and ACM parameters are combined according to the similarity $s_{i}$: $$\left\{\matrix{W_{1}\cr W_{2}\cr S\cr\lambda_{1}\cr\lambda_{2}\cr}\right\}=\sum_{i=1,\ldots,n}{s_{i}}\cdot\left\{\matrix{F(W_{1i})\cr F(W_{2i})\cr F(S_{i})\cr\lambda_{1i}\cr\lambda_{2i}\cr}\right\}\eqno{\hbox{(6)}}$$
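Eq. (6) reuses the similarity weights $s_i$ of Eq. (1) for both the warped maps and the scalar ACM parameters. A minimal sketch, assuming the training OLMs have already been warped to the target space and that `s` is already normalized:

```python
import numpy as np


def fuse_olms(s, warped_W1, warped_W2, warped_S, lam1, lam2):
    """Eq. (6): similarity-weighted fusion of warped OLMs and ACM parameters.

    s: (n,) similarity weights summing to 1.
    warped_W1, warped_W2, warped_S: lists of n arrays already in target space.
    lam1, lam2: length-n sequences of per-atlas scalar parameters.
    """
    s = np.asarray(s, dtype=float)

    def weighted_sum(items):
        # Contract the atlas axis: sum_i s_i * items[i].
        return np.tensordot(s, np.asarray(items, dtype=float), axes=1)

    return (weighted_sum(warped_W1), weighted_sum(warped_W2),
            weighted_sum(warped_S), float(weighted_sum(lam1)),
            float(weighted_sum(lam2)))


s = [0.25, 0.75]
maps = [np.zeros((2, 2)), np.ones((2, 2))]
W1, W2, S, lam1, lam2 = fuse_olms(s, maps, maps, maps, [1.0, 3.0], [2.0, 2.0])
```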

SECTION III

The performance of an algorithm is potentially affected by the scanner type, imaging conditions, demographic characteristics, and even by the quality of the manual segmentations and the segmentation protocol used. To account for these factors and achieve fair comparisons, we evaluated the proposed method (abbreviated OLM-ACM when weighted average fusion is used to build the spatial distribution map, and OLM-ACM_Joint when the joint label fusion scheme is incorporated) on 3D MR images from three datasets that vary in terms of these characteristics.

To assess the behavior of the proposed methodology for HC segmentation, its performance was compared with that of other methods. For the IBSR and OASIS datasets, the widely used leave-one-MRI-out procedure was followed to allow fair comparison with the published results of other methods. For the OASIS-MICCAI dataset, we followed the evaluation protocol of the MICCAI challenge, with 15 MRIs used for training and 20 for testing, to enable a direct comparison of our results. The Dice similarity coefficient $(D)$ is used as the segmentation performance measure in all datasets, owing to its popularity in evaluating and comparing segmentation methods. $D$ is given by: $$D={{2\vert\hat{H}\cap H\vert}\over{\vert\hat{H}\vert+\vert H\vert}}={{2\cdot Pr\cdot Re}\over{Pr+Re}},\qquad D\in[0,1]\eqno{\hbox{(7)}}$$ where $Pr=\vert\hat{H}\cap H\vert/\vert\hat{H}\vert$ and $Re=\vert\hat{H}\cap H\vert/\vert H\vert$ stand for Precision and Recall, respectively. $D=0$ indicates no overlap between the actual volume $(H)$ and the estimated volume $(\hat{H})$, while $D=1$ indicates perfect agreement.
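Eq. (7) and its precision/recall form can be checked numerically with a short sketch; the masks are toy arrays, and returning 1.0 when both masks are empty is a convention assumed here.

```python
import numpy as np


def dice(estimated, reference):
    """Dice coefficient 2|A & B| / (|A| + |B|) of two binary masks."""
    est = np.asarray(estimated, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    total = est.sum() + ref.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(est, ref).sum() / total


def dice_from_precision_recall(estimated, reference):
    """Same quantity via the harmonic mean of precision and recall."""
    est = np.asarray(estimated, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    overlap = np.logical_and(est, ref).sum()
    if overlap == 0:
        return 1.0 if not (est.any() or ref.any()) else 0.0
    pr = overlap / est.sum()  # precision
    re = overlap / ref.sum()  # recall
    return 2.0 * pr * re / (pr + re)


est = np.array([1, 1, 1, 0])
ref = np.array([0, 1, 1, 1])
```

Here two of the three estimated voxels overlap the three reference voxels, so precision, recall, and $D$ are all 2/3.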

Since no results have been published on our OASIS dataset, a state-of-the-art AAM segmentation algorithm [50] provides a valuable indication of the expected Dice values. The implementation of the latter is publicly available.^{6} In addition, the proposed method's performance is compared with that of the multi-atlas method on which the spatial distribution map relies. More precisely, the OLM-based ACM using either weighted averaging (OLM-ACM) or joint label fusion (OLM-ACM_Joint) to build the prior term is compared with the Multi-atlas and Multi-atlas_Joint methods, which are obtained by applying majority voting to $L$ and $L_{joint}$, respectively. This comparison reveals the contribution of the proposed methodology on top of the multi-atlas step, regardless of the fusion technique. Note that the Multi-atlas_Joint method is our reproduction (using the publicly available ANTs toolkit and joint label fusion tools) of the method proposed in [31], abbreviated as ‘PICSL_Joint’ in the MICCAI 2012 workshop. PICSL_Joint ranked 3rd in the challenge and, when combined with bias correction, ranked first. Moreover, segmentation results were produced with a hybrid ACM that blends the edge, region, and prior terms using global weighting. As with the proposed method, this hybrid ACM was evaluated with two different approaches for building the prior term: using $L$ (ACM) or $L_{joint}$ (ACM_Joint).

The resulting mean Dice similarity coefficients and corresponding standard deviations for all experiments are presented in Table I. Comparing OLM-ACM with Multi-atlas, and OLM-ACM_Joint with Multi-atlas_Joint, an improvement of 1–2% can be observed (paired t-test p-values of 0.045 and 0.047, respectively). This indicates that the multi-atlas approach still leaves room for improvement, which the proposed methodology exploits by combining image (edge and intensity) and prior information in an optimal way. In contrast, the Dice similarity coefficients of the ACM and ACM_Joint methods (0.79 and 0.84) show no improvement over those of the Multi-atlas and Multi-atlas_Joint methods (0.80 and 0.84, respectively). This suggests that a hybrid ACM with global weighting is insufficient to improve the multi-atlas result. Note that the parameters of the hybrid ACM without OLMs were found by exhaustive heuristic fine-tuning, and only the best results achieved are presented here. Therefore, the OLM concept, which uses local weighting and parameters calculated through an optimization procedure, is required for the ACM framework to improve on the multi-atlas methods.
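The paired t-tests reported above compare per-subject Dice values of two methods evaluated on the same subjects. A minimal NumPy sketch of the test statistic follows; the Dice values are made-up toy numbers, not results from Table I. In practice, `scipy.stats.ttest_rel` returns both the statistic and the p-value.

```python
import numpy as np


def paired_t_statistic(a, b):
    """t statistic and degrees of freedom of a paired t-test on a - b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n = d.size
    # Mean difference divided by its standard error (sample std, ddof=1).
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1


# Toy per-subject Dice values (hypothetical).
dice_proposed = [0.82, 0.85, 0.80, 0.84]
dice_baseline = [0.80, 0.84, 0.79, 0.82]
t, dof = paired_t_statistic(dice_proposed, dice_baseline)
```

Pairing matters here: between-subject variation in Dice is often much larger than the 1–2% gap between methods, and the paired test removes it by working on per-subject differences.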

Fig. 5 plots the Dice similarity coefficient for each subject to allow comparison of the OLM-ACM_Joint, Multi-atlas_Joint, and AAM methods. The Clinical Dementia Rating (CDR) and age of each subject are also provided in the plot. The subjects are sorted by ascending hippocampal volume to show the influence of volume on the performance of the methods. Comparison plots for additional metrics are also provided, i.e., precision, recall, the Hausdorff distance, and the undirected average difference. The plots show that on the OASIS dataset, OLM-ACM_Joint performs better than the approach of Babalola *et al.* [50] and Multi-atlas_Joint for most subjects on every metric. A decrease in segmentation performance can also be observed for older subjects, especially those suffering from dementia, for all of these methods. However, segmentation performance for these subjects may have been affected by the lack of a sufficient number of similar cases among the training subjects. Clearly, a dataset with only a few such problematic cases is not sufficient for drawing conclusions about a method's behavior on them.
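The Hausdorff distance and an average boundary distance can be sketched on point sets such as boundary-voxel coordinates in millimetres. The text does not define its undirected average metric, so averaging the two directed mean distances is an assumption of this sketch.

```python
import numpy as np


def boundary_distances(points_a, points_b):
    """Symmetric Hausdorff and undirected average distance between point sets.

    points_a: (na, 3), points_b: (nb, 3) coordinates (e.g. in mm).
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    d_ab = d.min(axis=1)  # each point of A to its nearest point of B
    d_ba = d.min(axis=0)  # each point of B to its nearest point of A
    hausdorff = max(d_ab.max(), d_ba.max())
    average = 0.5 * (d_ab.mean() + d_ba.mean())
    return float(hausdorff), float(average)


hd, avg = boundary_distances([[0, 0, 0]], [[0, 0, 3], [0, 0, 1]])
```

The Hausdorff distance reports the single worst boundary disagreement, while the average distance summarizes typical disagreement, which is why the two are usually reported together with Dice.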

Furthermore, the agreement between the automatically and manually segmented volumes was studied using Bland-Altman analysis (Fig. 6). A high overestimation bias can be observed for the AAM method, while Multi-atlas_Joint presents an underestimation bias. Multi-atlas_Joint also shows a slight tendency to overestimate small volumes and underestimate large ones, and the same tendency can be observed for OLM-ACM_Joint. However, OLM-ACM_Joint has a much lower bias than the other two methods, indicating that the volumes it segments are closer to the manually segmented ones. Fig. 7 illustrates segmentation results for 3 different subjects.
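A Bland-Altman analysis plots the difference between automatic and manual volumes against their mean; the bias is the mean difference, and the 95% limits of agreement are the bias ± 1.96 standard deviations of the differences. In the sketch below, positive differences mean overestimation by the automatic method; the volumes are hypothetical toy values.

```python
import numpy as np


def bland_altman(automatic, manual):
    """Means, differences, bias and 95% limits of agreement."""
    auto = np.asarray(automatic, dtype=float)
    man = np.asarray(manual, dtype=float)
    means = (auto + man) / 2.0
    diffs = auto - man  # > 0: automatic method overestimates
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical HC volumes in mm^3 (not data from this study).
means, diffs, bias, (lo, hi) = bland_altman([3100, 2950, 3300],
                                            [3000, 3000, 3200])
```

A volume-dependent trend, such as overestimating small volumes and underestimating large ones, shows up as a slope in the plot of `diffs` against `means`.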

To validate the performance of the proposed method on the IBSR dataset, its segmentation results are compared with results published over the years on this dataset by state-of-the-art segmentation methods, including various multi-atlas based methods. The mean Dice similarity coefficients and standard deviations of the methods are presented in Table II, allowing a direct comparison. The results indicate that OLM-ACM_Joint outperforms all previously published methods. Furthermore, comparing OLM-ACM with Multi-atlas, and OLM-ACM_Joint with Multi-atlas_Joint, shows a consistent improvement of 0.5–0.6% (paired t-test p-values of 0.03 and 0.042, respectively), as well as a smaller dispersion of the Dice similarity coefficients (as shown by the $\sigma$ values). MR images from the IBSR dataset differ significantly from those of the OASIS dataset in imaging quality, resolution, and scanner type. Therefore, the improvement obtained on both datasets suggests that the proposed method is relatively insensitive to differences in scanner type and image quality.

Moreover, the Bland-Altman analysis in Fig. 8 shows an overestimation bias for the OLM-ACM_Joint method, while the Multi-atlas_Joint method shows (as on the OASIS dataset) a larger underestimation bias. Furthermore, Multi-atlas_Joint underestimates volumes more strongly for subjects with large HC volumes.

Multi-atlas labeling techniques have gained popularity over the past years for the segmentation of brain structures, including the hippocampus. The “Grand Challenge on Multi-Atlas Labeling” at the MICCAI 2012 workshop provided the scientific community with insight into the theory and application of current state-of-the-art multi-atlas methods, as well as a comparative evaluation among them using the mean Dice similarity coefficient. In total, 25 different multi-atlas approaches were presented and validated, and their segmentation masks have been made publicly available. Following the same protocol as in the challenge, the proposed methodology is applied to the challenge's dataset. Around half of the methods, including the three top-ranked ones, used the ANTs toolkit for non-rigid registration, as the proposed method does, which allows a fair comparison.

The mean Dice similarity coefficients obtained by all 25 methods, as well as by OLM-ACM, OLM-ACM_Joint, Multi-atlas, and Multi-atlas_Joint, are provided in Table III. Note that the method ‘PICSL_Joint’ in Table III is the Multi-atlas_Joint method. The results show that the proposed methodology, combined with the joint label fusion scheme, achieves a mean Dice coefficient of 0.865, while the highest value achieved on this dataset is 0.869, by the ‘PICSL_BC’ method [72]. Nevertheless, comparing OLM-ACM with Multi-atlas and OLM-ACM_Joint with Multi-atlas_Joint shows that applying the proposed ACM framework on top of the multi-atlas concept is beneficial on this dataset as well.

Furthermore, Fig. 9 presents comparison plots for the four top-ranked methods of Table III using four metrics. The Dice similarity coefficient plot shows that, except for the three smallest volumes, segmentation performance does not depend on volume size for any of the methods. The precision-recall diagram shows higher recall values for the proposed methodology, while the other methods show higher precision. Moreover, the agreement between the manually segmented volumes and those produced automatically by the four methods is assessed using Bland-Altman analysis (Fig. 10). OLM-ACM_Joint presents a higher overestimation bias than the other methods, but a smaller variation.

It is worth mentioning that the proposed concept was tested on datasets that differ in their manual segmentation protocol. As mentioned in Section II-A, the manual segmentation protocol used in the OASIS-MICCAI dataset includes non-gray-matter parts in the hippocampal region, while those used in the IBSR and OASIS datasets do not. The proposed methodology was designed for manual segmentation protocols that consider HC a homogeneous gray matter structure. However, for the sake of completeness, we also report its performance on the OASIS-MICCAI dataset, on which our method cannot be expected to perform optimally: the region-based term cannot support the inclusion of white matter, since the vast majority of HC voxels are darker than white matter. Nevertheless, the proposed method ranked high on all datasets, regardless of the manual protocol used, which suggests that it may be robust to the segmentation protocol.

SECTION IV

This paper advocated the incorporation of OLMs into a hybrid ACM, to be used on top of the multi-atlas concept for HC segmentation. OLM-ACM tends to improve segmentation accuracy compared to traditional hybrid ACMs that combine prior-knowledge and data-driven terms, because the latter assume consistent boundary properties and thus apply global weights to the energy terms. In contrast, OLM-ACM defines each term's contribution at the voxel level, taking into account the spatially varying properties of boundaries and thus making optimal use of the ACM energy terms. Furthermore, it consistently improves the results of the multi-atlas methods on all three datasets, which demonstrates its efficacy as a supplementary technique to them.

$W_{1}$ emphasizes image properties, either edges or statistical differences of intensities, in regions close to the boundary by weighting them more. In this respect, $W_{1}$ allows the level set to converge at the voxel where the actual boundary is located. The table in Fig. 11 shows that $W_{1}$ does indeed take its highest values on the boundary, as desired. However, $W_{1}$ alone is not sufficient for optimal segmentation: beyond knowing where the image terms should be trusted more, it is important to determine, at voxel level, whether the edge or the region term is more trustworthy.

The concept of $W_{2}$ was introduced to tackle this issue. In boundary regions where gradients are high, the edge-based term is weighted more. Conversely, in regions lacking strong edges, the region term is preferred so that the level set evolves correctly. It should be noted that the region-based term is generally preferred, as HC is a structure with mostly ambiguous boundaries. This is confirmed by Fig. 11 ($W_{2}<0.5$ means more weight on $E_{R}$). In addition, the level set evolution depends on the step size. Large steps accelerate the evolution; however, when the level set is close to the real boundary, the step should be small enough to capture the small deformations needed for accurate segmentation. This is why the use of $S$ in ACM methods appears to be of high importance.

Regarding execution time, the major bottleneck is non-rigid registration, which is part of the multi-atlas procedure. More precisely, testing involves registering the test image to each training image. This requires $n\times 2~{\rm hours}$ ($n$ being the size of the training set) with ANTs toolkit routines on an Intel Core i7 3.90 GHz computer (using one core). On the same computer, the subsequent transformation of the labels and training OLMs to the space of the target image and the calculation of the similarity metrics $s_{i}$ take 5 min, while the joint label fusion algorithm requires around an hour. The ACM evolution requires only 6 min on average (using unoptimized Matlab code). Hence, the computational burden is due to the registration procedure: during testing, our method increases the computation time only marginally compared with the time required by the multi-atlas methods. For this reason, future work will focus on avoiding non-rigid registration. Some first works in this direction have recently been presented [68], [73], [74]. The training phase is also computationally heavy, due to the sophisticated and complex nature of extracting the OLMs; on the same computer, training requires $n\times 2.6~{\rm hours}$ on average. However, as with any other training procedure, it is performed only once and offline.
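The timing figures above can be combined into a rough per-subject budget. The sketch assumes that the quoted times add up sequentially on a single core and uses a hypothetical training-set size; it only restates the arithmetic of the paragraph and is not a benchmark.

```python
def runtime_budget_hours(n: int) -> dict:
    """Per-subject testing time and one-off training time, from the quoted figures."""
    registration = 2.0 * n       # ANTs non-rigid registration: 2 h per training image
    warping = 5 / 60             # label/OLM transformation and similarity metrics s_i
    label_fusion = 1.0           # joint label fusion: "around an hour"
    acm = 6 / 60                 # OLM-ACM evolution (unoptimized Matlab)
    testing = registration + warping + label_fusion + acm
    return {
        "testing": testing,
        "acm_share": acm / testing,  # fraction of test time spent in the ACM
        "training": 2.6 * n,         # offline, performed once
    }
```

With a hypothetical $n=10$, testing takes about 21.2 hours, of which the ACM evolution accounts for less than 0.5%, and training takes about 26 hours.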

The proposed framework was evaluated on three publicly available datasets, none of which provides statistics on the variability of manual segmentations. Such statistics offer a good indication of the difficulty of the segmentation task on a given dataset, since the goal is to achieve variability lower than, or at most similar to, that observed between different experts. Indicatively, in two recent 3 Tesla HC studies, the reported inter-rater agreement was 0.91 [75] and 0.832 [11] (with an intra-rater agreement of 0.891 and an automatic segmentation performance of 0.844 in the latter). The results presented in [76] show an intraclass correlation coefficient between automatic and manual volumes (0.898) comparable to the inter-rater reliability (0.929). Similarly, [77] reports higher manual-manual (0.63) than manual-automated (0.61) HC agreement, while in [78] the difference is much larger on two datasets (inter-rater 0.80 vs. automatic 0.77, and inter-rater 0.90 vs. automatic 0.75). In [79], two raters used the same tool to improve their HC segmentation skills. In a two-series experiment, both managed to raise their intra-rater agreement (from 0.79 to 0.94). Interestingly, once this was accomplished, their inter-rater agreement decreased from 0.68 to 0.57. This could mean that the two raters were producing excellent but different segmentations, and further suggests that inter-rater reliability may be a useful indication but perhaps an insufficient one.

Overall, the proposed method is an ACM-based extension of the multi-atlas methodology. Experimental results on three datasets with different manual segmentation protocols demonstrate the efficacy of the proposed method and its suitability for use on top of multi-atlas methods, even sophisticated ones. Thus, combining the proposed method with a better-performing multi-atlas algorithm (such as PICSL_BC [33]) could lead to further improvements, and is in line with our future work. However, the results on the OASIS-MICCAI dataset, compared with those on the OASIS and IBSR datasets, show that there is room for improvement on datasets whose manual segmentation protocol includes white matter parts in the hippocampal region. In this respect, future work will investigate ways for $W_{1}$ to assign more weight to the prior term in the alveus/fimbria regions. Since the multi-atlas prior knowledge mapped in $L$ favors the inclusion of the alveus/fimbria, this modification is expected to allow the proposed methodology to perform optimally on such datasets too.

In conclusion, evidence favors the inclusion of HC volumetry in clinical practice, within a decision support system, to enhance disease diagnosis. Hence, actions are envisaged for establishing it as a biomarker. This highlights the need for automatic HC segmentation methods that offer the highest possible accuracy. Any statistically significant improvement could help in identifying a more precise and reliable biomarker. The proposed framework is a supplementary technique to multi-atlas methods that consistently improves their performance while only slightly increasing the computational cost. It ranked high on three datasets (even on one with a different definition of the hippocampus), which makes it a promising candidate for large-scale experimentation.

Following the level set method [80], in the image domain $\Omega\subset\mathbb{R}^{3}$ we define an evolving curve $C$, implicitly represented as the zero level set of a signed distance function $\phi:\Omega\rightarrow\mathbb{R}$:

$$C=\{(x,y,z)\in\Omega\mid\phi(x,y,z)=0\}\tag{8}$$

where $\phi(x,y,z)<0$ inside the contour $C$ and $\phi(x,y,z)>0$ outside it.

The contour update equation based on the local weighting maps $W_{1}$ and $W_{2}$ is defined as:

$$\begin{aligned}\frac{\partial\phi}{\partial t}=\;&W_{1}\circ\left[W_{2}\circ\frac{\partial\phi_{E}}{\partial t}+(1-W_{2})\circ\frac{\partial\phi_{R}}{\partial t}\right]\\&+(1-W_{1})\circ\frac{\partial\phi_{prior}}{\partial t}\end{aligned}\tag{9}$$

where $\circ$ denotes the Hadamard (element-wise) product, and $\partial\phi_{E}/\partial t$, $\partial\phi_{R}/\partial t$, and $\partial\phi_{prior}/\partial t$ are the contour updates driven by the edge-based, region-based, and prior terms, respectively.
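Because $\circ$ is an element-wise product, (9) amounts to a per-voxel convex combination that NumPy broadcasting expresses directly. The function below is a minimal sketch with hypothetical names; passing scalars instead of maps for `w1` and `w2` recovers the global weighting of a conventional hybrid ACM.

```python
import numpy as np

def blend_updates(w1, w2, d_edge, d_region, d_prior):
    """Eq. (9): voxel-wise mix of the edge, region and prior updates.

    '*' on NumPy arrays is the element-wise (Hadamard) product; scalar
    w1/w2 broadcast over the volume and give a globally weighted hybrid ACM.
    """
    image_update = w2 * d_edge + (1.0 - w2) * d_region  # W2 balances edge vs. region
    return w1 * image_update + (1.0 - w1) * d_prior     # W1 balances image vs. prior
```

For example, at a voxel with $W_{1}=0.5$, $W_{2}=0.25$, an edge update of 4, a region update of 8, and a prior update of 0, the combined update is $0.5\,(0.25\cdot 4+0.75\cdot 8)=3.5$.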

The region-based term used is the one presented by Chan and Vese in [35], where the curve evolves so as to minimize the energy functional

$$\begin{aligned}E_{R}=\;&\lambda_{1}\int_{\Omega_{1}}\vert I(x,y,z)-c_{1}\vert^{2}\,dx\,dy\,dz\\&+\lambda_{2}\int_{\Omega_{2}}\vert I(x,y,z)-c_{2}\vert^{2}\,dx\,dy\,dz,\quad(x,y,z)\in\Omega\end{aligned}\tag{10}$$

where $I$ is the target MR image, $\Omega_{1}$ and $\Omega_{2}$ are the regions inside and outside the contour, $c_{1}$ and $c_{2}$ are the average intensities of $I$ in these regions, and $\lambda_{1},\lambda_{2}\geq 0$ are balancing factors for the properties of the interior and exterior regions of the estimated boundary. Based on $E_{R}$, the evolution equation of the contour driven by the region-based term becomes

$$\frac{\partial\phi_{R}}{\partial t}=\delta_{\epsilon}(\phi)\left[\mu\,{\rm div}\left(\frac{\nabla\phi}{\vert\nabla\phi\vert}\right)-\nu-\lambda_{1}(I-c_{1})^{2}+\lambda_{2}(I-c_{2})^{2}\right]\tag{11}$$

where $\delta_{\epsilon}(\phi)$ is the regularized Dirac delta function, $\nu$ controls the propagation speed, and $\mu\,{\rm div}\left(\nabla\phi/\vert\nabla\phi\vert\right)$ is a regularization term that controls the smoothness of the contour.
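A minimal NumPy sketch of the region-driven update (11) follows. Assumptions: derivatives use central differences (`np.gradient`), the regularized delta is $\delta_{\epsilon}(\phi)=\epsilon/(\pi(\epsilon^{2}+\phi^{2}))$ from the original Chan-Vese paper, the default parameter values are illustrative, and $c_{1}$, $c_{2}$ are computed inside and outside $C$ using the convention of (8). Implementations differ in which sign of $\phi$ marks the interior (the original Chan-Vese paper takes $\phi>0$ inside), so the sketch only evaluates the right-hand side of (11) as written and leaves the time stepping out.

```python
import numpy as np

def dirac_eps(phi, eps=1.0):
    """Regularized Dirac delta used by Chan and Vese: eps / (pi * (eps^2 + phi^2))."""
    return eps / (np.pi * (eps ** 2 + phi ** 2))

def curvature(phi, tiny=1e-8):
    """div(grad(phi) / |grad(phi)|) with central differences."""
    grads = np.gradient(phi)
    norm = np.sqrt(sum(g ** 2 for g in grads)) + tiny   # avoid division by zero
    return sum(np.gradient(g / norm, axis=i) for i, g in enumerate(grads))

def region_update(phi, image, lam1=1.0, lam2=1.0, mu=0.1, nu=0.0, eps=1.0):
    """Right-hand side of Eq. (11), with c1/c2 taken inside (phi < 0) / outside C."""
    inside = phi < 0
    c1 = image[inside].mean()    # mean intensity inside the contour
    c2 = image[~inside].mean()   # mean intensity outside the contour
    force = (mu * curvature(phi) - nu
             - lam1 * (image - c1) ** 2 + lam2 * (image - c2) ** 2)
    return dirac_eps(phi, eps) * force
```

As a sanity check, for the signed distance function of a sphere of radius $r$ the curvature term is approximately $2/r$ near the surface.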

The edge-based term is formulated by minimizing the energy functional defined by Caselles *et al.* [34]:

$$E_{E}=\int_{\Omega}g({\bf v})\,\vert\nabla\phi({\bf v})\vert\,d{\bf v}\tag{12}$$

where ${\bf v}$ denotes a voxel position and $g$ is an edge-stopping function defined as in [34]:

$$g(\vert\nabla I\vert)=\frac{1}{1+\vert\nabla(G_{\sigma}\ast I)\vert}\tag{13}$$

with $G_{\sigma}$ denoting a Gaussian convolution kernel of size $3\times 3\times 3$ and standard deviation 0.5. The contour evolution equation driven only by the edge-based term reads:

$$\frac{\partial\phi_{E}}{\partial t}=g\,\vert\nabla\phi\vert\,{\rm div}\left(\frac{\nabla\phi}{\vert\nabla\phi\vert}\right)+\nabla g\cdot\nabla\phi\tag{14}$$

where ${\rm div}\left(\nabla\phi/\vert\nabla\phi\vert\right)$ is the curvature (regularization) term.
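The edge-stopping function (13) and the update (14) can be sketched with NumPy as follows. Assumptions: the $3\times 3\times 3$ Gaussian is applied separably with replicated borders, derivatives use central differences (`np.gradient`), and a small constant avoids division by zero in the curvature; the function names are illustrative.

```python
import numpy as np

def gaussian_smooth(img, sigma=0.5):
    """Separable 3x3x3 Gaussian smoothing with replicated borders."""
    k = np.exp(-np.array([1.0, 0.0, 1.0]) / (2 * sigma ** 2))  # weights at offsets -1, 0, 1
    k /= k.sum()
    out = np.asarray(img, dtype=float)
    for axis in range(out.ndim):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (1, 1)
        p = np.pad(out, pad, mode="edge")
        n = out.shape[axis]
        out = sum(k[j] * np.take(p, np.arange(j, j + n), axis=axis) for j in range(3))
    return out

def edge_stopping(img, sigma=0.5):
    """Eq. (13): g = 1 / (1 + |grad(G_sigma * I)|)."""
    grads = np.gradient(gaussian_smooth(img, sigma))
    return 1.0 / (1.0 + np.sqrt(sum(gr ** 2 for gr in grads)))

def edge_update(phi, g, tiny=1e-8):
    """Eq. (14): g |grad phi| div(grad phi / |grad phi|) + grad g . grad phi."""
    grads = np.gradient(phi)
    mag = np.sqrt(sum(gr ** 2 for gr in grads))
    kappa = sum(np.gradient(gr / (mag + tiny), axis=i) for i, gr in enumerate(grads))
    advection = sum(gg * gp for gg, gp in zip(np.gradient(g), grads))
    return g * mag * kappa + advection
```

On a homogeneous region $g=1$, while across a strong intensity step $g$ drops close to zero, which is what stops the contour at edges.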

The prior term is modeled by applying the region-based ACM to $L$. The Chan-Vese approach was selected to model the prior term because $L$ is an image with very smooth transitions. Thus, the energy functional is defined as:

$$\begin{aligned}E_{prior}=\;&{\tt v}_{1}\int_{\Omega_{1}}\vert L(x,y,z)-d_{1}\vert^{2}\,dx\,dy\,dz\\&+{\tt v}_{2}\int_{\Omega_{2}}\vert L(x,y,z)-d_{2}\vert^{2}\,dx\,dy\,dz,\quad(x,y,z)\in\Omega\end{aligned}\tag{15}$$

where $d_{1}$ and $d_{2}$ are the mean values of $L$ in the regions inside and outside $C$. Similarly to (10), ${\tt v}_{1}$ and ${\tt v}_{2}$ are balancing factors for the properties of the interior and exterior regions; they were set to one, since both the inside and outside regions are smooth and homogeneous. Based on $E_{prior}$, the evolution equation for the contour driven by the prior term is defined as:

$$\begin{aligned}\frac{\partial\phi_{prior}}{\partial t}=\;&\delta_{\epsilon}(\phi)\Big[\mu\,{\rm div}\Big(\frac{\nabla\phi}{\vert\nabla\phi\vert}\Big)-\nu-{\tt v}_{1}(L-d_{1})^{2}\\&\qquad+{\tt v}_{2}(L-d_{2})^{2}\Big]\end{aligned}\tag{16}$$

By means of (9), (11), (14), and (16), the overall contour update formula becomes:

$$\begin{aligned}\frac{\partial\phi}{\partial t}=\;&W_{1}\circ W_{2}\circ\Big[g\,\vert\nabla\phi\vert\,{\rm div}\Big(\frac{\nabla\phi}{\vert\nabla\phi\vert}\Big)+\nabla g\cdot\nabla\phi\Big]\\&+\delta_{\epsilon}(\phi)\Big[(1-W_{1}\circ W_{2})\circ\Big(\mu\,{\rm div}\Big(\frac{\nabla\phi}{\vert\nabla\phi\vert}\Big)-\nu\Big)\\&-W_{1}\circ(1-W_{2})\circ\Big(\lambda_{1}(I-c_{1})^{2}-\lambda_{2}(I-c_{2})^{2}\Big)\\&-(1-W_{1})\circ\Big({\tt v}_{1}(L-d_{1})^{2}-{\tt v}_{2}(L-d_{2})^{2}\Big)\Big]\end{aligned}\tag{17}$$
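Collecting terms after substituting (11), (14), and (16) into (9) can be checked numerically. In the collected form, the curvature and $\nu$ terms share the coefficient $1-W_{1}\circ W_{2}$, and each data term keeps the interior/exterior signs of its own evolution equation. The sketch draws random stand-in fields (hypothetical values, not image data) and compares the composed and collected forms.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 4, 4)
# Hypothetical random fields standing in for the quantities of Eqs. (9)-(16)
w1, w2 = rng.random(shape), rng.random(shape)
delta = rng.random(shape)                            # delta_eps(phi)
kappa = rng.normal(size=shape)                       # div(grad phi / |grad phi|)
edge = rng.normal(size=shape)                        # right-hand side of Eq. (14)
r_in, r_out = rng.random(shape), rng.random(shape)   # (I - c1)^2, (I - c2)^2
p_in, p_out = rng.random(shape), rng.random(shape)   # (L - d1)^2, (L - d2)^2
mu, nu, lam1, lam2, v1, v2 = 0.2, 0.1, 0.7, 0.9, 1.0, 1.0

# Eq. (9) composed from the individual updates (11), (14) and (16)
d_region = delta * (mu * kappa - nu - lam1 * r_in + lam2 * r_out)
d_prior = delta * (mu * kappa - nu - v1 * p_in + v2 * p_out)
composed = w1 * (w2 * edge + (1 - w2) * d_region) + (1 - w1) * d_prior

# Collected form: shared curvature/nu coefficient, data terms with their own signs
collected = (w1 * w2 * edge
             + delta * ((1 - w1 * w2) * (mu * kappa - nu)
                        - w1 * (1 - w2) * (lam1 * r_in - lam2 * r_out)
                        - (1 - w1) * (v1 * p_in - v2 * p_out)))

print(np.allclose(composed, collected))  # True
```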

Fig. 3 provides an overview of this procedure. Consider the problem of finding the optimal combination of values of $W_{1i}$, $W_{2i}$, and $S_{i}$ at a voxel $v$ for a training image $I_{i}$, $i=1,\ldots,n$. This can be handled as a graph-cut labeling problem in which each label $f_{v}$ is mapped to a combination of three labels: a label $f_{v1}\in[0,1]$ that represents the relative contribution of the image-based terms versus the prior term $(W_{1i})$, a label $f_{v2}\in[0,1]$ that represents the contribution of the edge-based term $(W_{2i})$, and a label $f_{v3}\in[1,6]$ that represents the evolution step $(S_{i})$. The values of $W_{1i}$ and $W_{2i}$ lie in the interval $[0,1]$, as they represent fractions of contribution of the various energy terms, while the values of $S_{i}$ are time-step sizes and are restricted to integers. The mapping function is expressed as:

$$f: f_{v}\in[0,P]\rightarrow f_{v1}\in[0,1]\wedge f_{v2}\in[0,1]\wedge f_{v3}\in[1,6]\tag{18}$$

where $P$ is the number of possible permutations of $f_{v1}$, $f_{v2}$, and $f_{v3}$. Due to computational considerations, only 8 discrete values in the interval $[0,1]$ were used for both $f_{v1}$ and $f_{v2}$. For the same reason, the possible values of $S$ were limited to 6, so that $P=8\times 8\times 6=384$. Using more values could lead to better accuracy, but this selection was made to balance accuracy against computational cost in terms of memory requirements.
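The label mapping (18) is a bijection between a single graph-cut label and a triple of parameter values. A sketch follows, assuming that the 8 weight values are evenly spaced in $[0,1]$ (the text does not specify them) and that the step takes the integers 1 to 6; the ordering of the labels is an illustrative choice.

```python
from itertools import product

# Assumption: the 8 discrete weight values are evenly spaced in [0, 1];
# the text states only that 8 values were used.
W_VALUES = [k / 7 for k in range(8)]
S_VALUES = list(range(1, 7))  # integer evolution steps 1..6

# Every permutation (f_v1, f_v2, f_v3); the list index is the graph-cut label f_v
LABELS = list(product(W_VALUES, W_VALUES, S_VALUES))
P = len(LABELS)  # 8 * 8 * 6 = 384

def decode(f_v: int):
    """Map a graph-cut label to its (W1, W2, S) triple."""
    return LABELS[f_v]

def encode(i1: int, i2: int, i3: int) -> int:
    """Inverse mapping from value indices (0..7, 0..7, 0..5) to the label f_v."""
    return (i1 * len(W_VALUES) + i2) * len(S_VALUES) + i3
```

The memory cost mentioned above grows with $P$: doubling the number of weight values to 16 would already give $16\times 16\times 6=1536$ labels per voxel.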

To formulate our problem, we consider the complete set of voxels ${\bf V}$ of image $I_{i}$ of the training set and its corresponding label image $L_{i}$, which serves as the ground truth. The goal is to define an optimal labeling ${\bf f}$ for ${\bf V}$. Finding the optimal labeling is equivalent to minimizing an energy functional $E({\bf f})$, which, according to graph-cut theory, can be formulated as:

$$E({\bf f})=\sum_{v\in{\bf V}}D_{v}(f_{v})+\sum_{v\in{\bf V}}\sum_{q\in N_{v}}{\cal V}_{v,q}(f_{v},f_{q})\tag{19}$$

where $D_{v}$ is the individual cost for voxel $v$, which measures how well label $f_{v}$ fits voxel $v$ given the ground-truth segmentation and the resulting one. $N_{v}$ is the set of neighboring voxels of $v$, and ${\cal V}_{v,q}(f_{v},f_{q})$ [66] is the interaction potential between voxels $v$ and $q$, which penalizes discontinuities between neighboring voxels and thus encourages spatial coherence. It is defined as ${\cal V}_{v,q}(f_{v},f_{q})=\min\left(\vert f_{v1}-f_{q1}\vert+\vert f_{v2}-f_{q2}\vert+\vert f_{v3}-f_{q3}\vert,\,K\right)$, where $K$ was set to 4 based on experimentation. Within our framework, the data cost function $D_{v}$ is defined as:

$$D_{v}=\Big\vert S\circ\frac{\partial\phi}{\partial t}(f_{v})-\phi_{GT}\Big\vert\tag{20}$$

where $\partial\phi/\partial t$ is given by (17) and $\phi_{GT}$ stands for the level set representation of the corresponding label image.
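For a given labeling, (19) can be evaluated directly; minimizing it requires a graph-cut solver (for example, α-expansion), which is not shown. The sketch below uses the truncated potential with $K=4$ from the text, assumes a 6-neighbourhood, and counts each unordered neighbouring pair once (counting both orders only doubles the smoothness term). The data costs $D_{v}$ are taken as precomputed inputs, and the function names are hypothetical.

```python
import numpy as np

def pairwise_potential(fa, fb, K=4.0):
    """Truncated L1 potential: min(|f_v1-f_q1| + |f_v2-f_q2| + |f_v3-f_q3|, K)."""
    return np.minimum(np.abs(fa - fb).sum(axis=-1), K)

def labeling_energy(data_cost, labels, K=4.0):
    """Evaluate E(f) of Eq. (19) for a fixed labeling on a 3D grid.

    data_cost: (X, Y, Z) array holding D_v(f_v) for the label chosen at each voxel
    labels:    (X, Y, Z, 3) array holding (f_v1, f_v2, f_v3) at each voxel
    Each unordered 6-neighbour pair is counted once.
    """
    energy = float(data_cost.sum())
    for axis in range(3):
        n = labels.shape[axis]
        a = np.take(labels, np.arange(n - 1), axis=axis)   # voxel v
        b = np.take(labels, np.arange(1, n), axis=axis)    # its neighbour q along this axis
        energy += float(pairwise_potential(a, b, K).sum())
    return energy
```

Because of the truncation at $K$, a sharp change between neighbouring labels (for instance from step 1 to step 6) costs no more than 4, so genuine discontinuities in the optimal maps are not over-penalized.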

A formulation similar to (19) is also used to find optimal ACM parameters $\lambda_{1i}$, $\lambda_{2i}$ for each image in the training set. To calculate $\lambda_{1i}$ and $\lambda_{2i}$, which take values in $[0,1]$, graph cuts are used to minimize the difference between the region update term and the level set representation of the GT:

$$D_{v}=\Big\vert S\circ\frac{\partial\phi_{R}}{\partial t}(f_{v})-\phi_{GT}\Big\vert\tag{21}$$

The authors are grateful to the OASIS team for providing their dataset, as well as to the IBSR repository for providing manually-guided expert segmentation results along with their MRI dataset. We would also like to give special thanks to Angelos Baltatzidis, M.D., Radiologist, for providing the manual segmentations for the OASIS dataset. Furthermore, for the OASIS-MICCAI dataset, we would like to thank the workshop organizers, Prof. Bennett Landman and Prof. Simon Warfield, as well as Neuromorphometrics, Inc., for providing the manual segmentations.

D. Zarpalas is with the Centre for Research and Technology Hellas, Information Technologies Institute, Thessaloniki 57001, Greece and also with the Laboratory of Medical Informatics, the Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece

P. Gkontra and P. Daras are with the Centre for Research and Technology Hellas, Information Technologies Institute, Thessaloniki 57001, Greece

N. Maglaveras is with the Laboratory of Medical Informatics, the Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece and also with the Centre for Research and Technology Hellas, Institute of Applied Biosciences, Thessaloniki 57001, Greece

CORRESPONDING AUTHOR: D. ZARPALAS (zarpalas@iti.gr)

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
