Tumor histology provides a detailed insight into cellular morphology, organization, and heterogeneity. For example, tumor histological sections can be used to identify mitotic cells, cellular aneuploidy, and autoimmune responses. More importantly, if tumor morphology and architecture can be quantified on large histological datasets, then it will pave the way for constructing histological databases that are prognostic, the same way that genome analysis techniques have identified molecular subtypes and predictive markers. Genome wide analysis techniques (e.g., microarray analysis) have the advantages of standardized tools for data analysis and pathway enrichment, which enable hypothesis generation for the underlying mechanism. On the other hand, histological signatures are hard to compute because of the biological and technical variations in the stained histological sections; however, they offer insights into tissue composition as well as heterogeneity (e.g., mixed populations) and rare events.

Histological sections are often stained with hematoxylin and eosin stains (H&E), which label DNA and protein contents, respectively. Traditional histological analysis is performed by a trained pathologist through the characterization of phenotypic content, such as various cell types, cellular organization, cell state and health, and cellular secretion. However, such manual analysis may incur inter- and intraobserver variations [1]. On the other hand, the value of the quantitative histological image analysis originates from its capability in capturing detailed morphometric features on a cell-by-cell basis and the organization of cells. Such rich description can be linked with genomic information and survival distribution as an improved basis for diagnosis and therapy. Additionally, in the presence of large datasets, quantitative histological signatures can be used to identify intrinsic subtypes of a specific tumor type, which is supplementary to histological tumor grading.

One of the main technical barriers for processing a large collection of histological data is that the color composition is subject to technical (e.g., fixation, staining) and biological (e.g., cell type, cell state) variations across histological tissue sections, especially when these tissue sections were processed and scanned at different laboratories. Here, a histological tissue section refers to an image of a thin slice of tissue applied to a microscopic slide and scanned from a light microscope. From an image analysis perspective, color variations can occur both within and across tissue sections. For example, within a tissue section, some nuclei may have low chromatin content (e.g., light blue signals), while others may have higher signals (e.g., dark blue); nuclear intensity in one tissue section may be very close to the background intensity (e.g., cytoplasmic, macromolecular components) in another tissue section.

Our approach evolved from our insights and experiments indicating that simple color decomposition and thresholding techniques miss or overestimate some of the nuclei in the image, e.g., nuclei with low chromatin content are excluded. The problem is further complicated by the diversity in nuclear size and shape (e.g., the classic scale problem). It became clear that the incorporation of prior knowledge (e.g., manual annotation and validation by the pathologist) would be needed not only for validation, but also for constructing a model that captures wide variations in the nuclear staining, both within and across tissue sections. Accordingly, our proposed approach integrates prior knowledge, which is characterized by Gaussian mixture models (GMM), and the nuclear staining information of the original image, which is extracted by color decomposition, within a level set framework. The net result is a binarized image of blobs (each a single nucleus or a clump of nuclei), which are either validated or partitioned further through geometric reasoning.

The rest of this paper is organized as follows. Section II reviews previous research in this area, with a focus on how 1) quantitative representation of H&E sections can be leveraged for translational medicine, and 2) nuclear segmentation is performed to address clinical issues; Section III describes the details of our approach; Section IV provides experimental and validation results; and Section V concludes this paper.



The main barriers to correct nuclear segmentation are technical variations (e.g., fixation) and biological heterogeneity (e.g., cell type). Existing techniques have focused on adaptive thresholding followed by morphological operators [2], [3], fuzzy clustering [4], level set methods using gradient information [5], [6], color separation followed by optimum thresholding and learning [7], and hybrid color and texture analysis followed by learning and unsupervised clustering [8]. Some applications combine the aforementioned techniques. For example, in [9], iterative radial voting [10] was used to estimate seeds for the location of the nuclei, and the interaction between neighboring nuclei was subsequently modeled with a multiphase level set [11], [12]; in [13], an initial segmentation of the foreground with graph cut was followed by multiscale seed detection, and the combined result was further refined with a second iteration of graph cut. It is also common practice to segment nuclear regions, after color decomposition, with the same techniques that have been developed for fluorescence microscopy [14].

Yet, effectively addressing the analytical requirements of tumor histological characterization remains a challenging problem. Thresholding and clustering assume constant chromatin content for the nuclei in the image, although in practice there is a wide variation in chromatin content. In addition, nuclei can overlap and clump, and sometimes, due to the tissue thickness, they cannot be segmented. The method proposed in [9] aims to delineate overlapping nuclei through iterative radial voting [10], but seed detection can fail in the presence of wide variation in nuclear size, which leads to fragmentation. The method described in [15] is based on a voting system using multiple classifiers built from different reference images; we will refer to this method as the multi-classifier voting system (MCV) in the rest of this paper. Compared to the aforementioned approaches, MCV provides a better way to handle the variation among different batches. However, due to the lack of a smoothness constraint, the classification results can be noisy and sometimes erroneous, as demonstrated in Fig. 4.

In summary, our goal is to process whole mount tissue sections by addressing the aforementioned issues, construct a large database of morphometric features, and enable subsequent morphometric subtyping and genomic association.



Our strategy leverages several key insights for segmenting nuclear regions: 1) nuclei often respond well to a Laplacian of Gaussian (LoG) filter, 2) nuclear staining information can be captured through color decomposition, 3) color normalization reduces variations in image statistics, and 4) integration of the prior knowledge and nuclear staining information enhances the final segmentation. These concepts are then coupled with a dictionary of manual annotation of nuclei for constructing a model from the TCGA tumor bank. The model constructs representations of the foreground and background of the hand segmented images based on the distribution of 1) the multiscale LoG response in the decomposed nuclear channel and 2) color information in the RGB space. This representation is then condensed and expressed in terms of a GMM, as shown at the top of Fig. 1. Having constructed the model, we will then utilize a level set framework to segment foreground and background content. Finally, delineated blobs are subjected to convexity constraints for partitioning clumps of nuclei. In the rest of this section, we discuss model construction, color normalization, color decomposition, and then proceed with the details of the proposed solution.

Figure 1
Fig. 1. Steps in nuclear segmentation. During model construction (offline), for each individual reference image, GMMs are constructed to represent nuclei and background in both RGB space and LoG response space. During classification (online), the input test image is normalized against each reference image, and then the level set method is applied to separate the nuclear region from the background.

A. Model Construction

Our target dataset consists of 440 individual tissue sections that have been scanned with either a 20× or 40× objective. From these images, which are on the order of 40k × 40k pixels (or larger), a representative set of 20 reference images of 1k × 1k pixels has been selected for model construction. These references are utilized in both offline and online processing. During offline processing (i.e., training), each reference image is hand segmented and processed with LoG filters at multiple scales. Statistics of foreground (nuclei) and background, in both RGB space and LoG response space, are collected. Subsequently, the foreground and background models of each reference are represented as a mixture of Gaussians. During online processing, a test image is normalized against every reference through a color map normalization strategy [15] for the purpose of low level feature extraction.

B. Color Normalization

The purpose of color normalization is to reduce the variation between an input test image and a reference image so that the prior models constructed from the reference image can be utilized. We evaluated a number of color normalization methods and chose the color map normalization described in [15] for its effectiveness in handling histological data. Let

  1. input image $I$ and reference image $Q$ have $K_{I}$ and $K_{Q}$ unique color triplets in terms of $(R,G,B)$, respectively;
  2. $\mathbb{R}_C^{X}$ be a monotonic function, which maps the color channel intensity, $C\in \{R,G,B\}$, from image $X$ to a rank that is in the range $[0,K_X)$, and $X\in \{I,Q\}$;
  3. $(r_p, g_p, b_p)$ be the color of pixel $p$ in image $I$, and $(\mathbb{R}_R^{I}(r_{p}), \mathbb{R}_G^{I}(g_{p}), \mathbb{R}_B^{I}(b_{p}))$ be the ranks for each color channel intensity; and
  4. the color channel intensity values $r_{\mathrm{ref}}$, $g_{\mathrm{ref}}$, and $b_{\mathrm{ref}}$, from image $Q$, have ranks:

$$\begin{aligned}
\mathbb{R}_R^{Q}(r_{\mathrm{ref}}) &= \left\lfloor \frac{\mathbb{R}_R^{I}(r_{p})}{K_{I}} \times K_{Q} + \frac{1}{2} \right\rfloor \\
\mathbb{R}_G^{Q}(g_{\mathrm{ref}}) &= \left\lfloor \frac{\mathbb{R}_G^{I}(g_{p})}{K_{I}} \times K_{Q} + \frac{1}{2} \right\rfloor \\
\mathbb{R}_B^{Q}(b_{\mathrm{ref}}) &= \left\lfloor \frac{\mathbb{R}_B^{I}(b_{p})}{K_{I}} \times K_{Q} + \frac{1}{2} \right\rfloor.
\end{aligned}$$

As a result of color map normalization, the color of pixel $p$, $(r_{p},g_{p},b_{p})$, is normalized to $(r_{\mathrm{ref}}, g_{\mathrm{ref}}, b_{\mathrm{ref}})$. In contrast to standard quantile normalization, which utilizes all pixels in the image, color map normalization is based on the unique colors in the image, thus excluding the frequency of any color. Since color frequencies vary widely as a result of technical variations and tumor heterogeneity, in our experience this method is quite powerful for normalizing histology sections.
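The rank mapping above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation from [15]: the function name `color_map_normalize`, the tie-handling rule (a channel intensity takes the lowest rank at which it occurs among the unique colors), and the clipping of the target rank to a valid index are all assumptions made for the sketch.

```python
import numpy as np

def color_map_normalize(img, ref):
    """Map each channel of `img` onto `ref` by matching ranks over unique colors.

    Hypothetical helper: `img` and `ref` are H x W x 3 arrays of the same dtype.
    """
    uniq_i = np.unique(img.reshape(-1, 3), axis=0)  # K_I unique (R, G, B) triplets
    uniq_q = np.unique(ref.reshape(-1, 3), axis=0)  # K_Q unique (R, G, B) triplets
    k_i, k_q = len(uniq_i), len(uniq_q)
    out = np.empty_like(img)
    for c in range(3):
        sorted_i = np.sort(uniq_i[:, c])
        sorted_q = np.sort(uniq_q[:, c])
        # Rank of each pixel intensity among the unique colors of the input
        # (assumption: ties take the lowest rank).
        rank_i = np.searchsorted(sorted_i, img[..., c], side="left")
        # Target rank in the reference: floor(rank / K_I * K_Q + 1/2),
        # clipped so that it is always a valid index (assumption).
        rank_q = np.floor(rank_i / k_i * k_q + 0.5).astype(np.int64)
        rank_q = np.clip(rank_q, 0, k_q - 1)
        out[..., c] = sorted_q[rank_q]
    return out
```

Because the ranks are computed over unique colors rather than over all pixels, a large uniform background region in either image does not shift the mapping, which is the property the paragraph above contrasts with standard quantile normalization.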

C. Color Decomposition

In order to provide the nuclear staining information and reduce the complexity of integrating LoG responses, the RGB space is decomposed through the method described in [16]. In our case, we simply used the decomposition matrix established in [16] for H&E staining. Examples are shown in Fig. 4(b). Please refer to [16] for more details.
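As a rough illustration of what such a decomposition involves, the sketch below assumes that the method is a linear unmixing in optical-density space, which is a common formulation of color deconvolution. The stain vectors in `STAINS` are illustrative placeholder values chosen for this sketch and are not taken from [16]; the names `STAINS`, `UNMIX`, and `decompose` are hypothetical.

```python
import numpy as np

# Illustrative stain vectors in optical-density space (rows: hematoxylin,
# eosin, residual). Placeholder values for this sketch only; substitute the
# decomposition matrix from [16] in practice.
STAINS = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11],
                   [0.27, 0.57, 0.78]])
STAINS = STAINS / np.linalg.norm(STAINS, axis=1, keepdims=True)
UNMIX = np.linalg.inv(STAINS)

def decompose(rgb):
    """Return per-pixel stain concentrations for an H x W x 3 image.

    Channel 0 of the result plays the role of the decomposed nuclear channel.
    """
    # Optical density; the +1 offset avoids log(0) for 8-bit intensities.
    od = -np.log((rgb.astype(np.float64) + 1.0) / 256.0)
    # Optical density is modeled as concentrations @ STAINS, so invert it.
    return (od.reshape(-1, 3) @ UNMIX).reshape(rgb.shape)
```

A usage example would be `nuclear = decompose(image)[..., 0]`, where higher values indicate stronger nuclear staining under the assumed stain vectors.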

D. Feature Extraction

Our approach integrates both color information and scale information, in which color information is extracted from the normalized RGB space, and scale information is extracted by multiscale LoG filters on the decomposed nuclear channel. The rationales are that 1) in some cases, color information is insufficient to differentiate nuclear regions from background; 2) the scales of the background structure and nuclear region are typically different; and 3) the nuclear region responds well to blob detectors, such as the LoG filter [13]. As a result, with respect to each reference image, each pixel in the test image is represented by the following two features: 1) $\{r,g,b\}$ in the color space; and 2) $\{l_{\sigma_1}, l_{\sigma_2}, \ldots, l_{\sigma_n}\}$ in the LoG response space, where $l_{\sigma_i}$ is the LoG response at scale $\sigma_i$.
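The multiscale LoG feature can be sketched as follows. This is a minimal NumPy version under stated assumptions: the LoG is approximated by Gaussian smoothing followed by a 5-point discrete Laplacian, responses are multiplied by $\sigma^2$ so that scales are comparable, and the kernel is truncated at $3\sigma$. None of these choices are specified in the text, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    """Separable Gaussian smoothing with reflected borders."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def laplacian(img):
    """5-point discrete Laplacian with replicated borders."""
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def multiscale_log(nuclear, sigmas=(2.0, 4.0, 6.0)):
    """Stack scale-normalized LoG responses, one channel per sigma."""
    nuclear = nuclear.astype(np.float64)
    return np.stack([s**2 * laplacian(smooth(nuclear, s)) for s in sigmas], axis=-1)
```

With this sign convention, a blob that is brighter than its surroundings in the nuclear channel gives a negative response at its center, and the magnitude peaks when $\sigma$ is close to the blob radius divided by $\sqrt{2}$.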

E. Multireference Level Set Model for Nuclei/Background Classification

Assume $N$ reference images $R_{i}, i\in \{1,\ldots,N\}$. For each individual reference image, four GMMs are constructed to represent nuclear and background regions in both RGB space and LoG response space: $\text{GMM}_{F}^k$ and $\text{GMM}_{B}^k$, in which $k\in \{1,\ldots,2N\}$. Here, $\text{GMM}_{F}^{1\le k\le N}$ is the foreground model for the $k$th reference image in RGB space, $\text{GMM}_{B}^{1\le k\le N}$ is the background model for the $k$th reference image in RGB space, $\text{GMM}_{F}^{N+1\le k\le 2N}$ is the foreground model for the $(k-N)$th reference image in LoG response space, and $\text{GMM}_{B}^{N+1\le k\le 2N}$ is the background model for the $(k-N)$th reference image in LoG response space.

An input test image $I$ is first normalized with respect to every reference image $R_i$, yielding $NI_{i}$. Subsequently, LoG responses of the decomposed nuclear channel of $NI_{i}$ are collected to construct $2N$ features per pixel, where the first $N$ features are from the normalized color space, and the second $N$ features are LoG responses. Let:

  1. $f^k(p)$ be the $k$th feature of pixel $p$;
  2. $\mathbf{p}_F^k$ and $\mathbf{p}_B^k$ be the probability of $f^k$ being produced by nuclei and background, respectively:

$$\begin{aligned}
\mathbf{p}_F^k(p) &= \frac{\text{GMM}_{F}^k(p)}{\text{GMM}_{F}^k(p)+\text{GMM}_{B}^k(p)}, \text{ and} \\
\mathbf{p}_B^k(p) &= \frac{\text{GMM}_{B}^k(p)}{\text{GMM}_{F}^k(p)+\text{GMM}_{B}^k(p)};
\end{aligned}$$

  3. $\lambda^k$ be the weight for $R_k$: $\lambda^k = \frac{\langle \text{hist}(R_k), \text{hist}(NI_k)\rangle}{\Vert \text{hist}(R_k)\Vert\, \Vert \text{hist}(NI_k)\Vert}$, where $\text{hist}(\cdot)$ is the histogram function, $R_k$ is the $k$th reference image, and $NI_k$ is the input image $I$ normalized with respect to $R_k$;
  4. $DI$ be the decomposed nuclear channel;
  5. $C$ be the curve.
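The per-pixel probabilities and the reference weights defined in the list above can be sketched as follows. The sketch assumes diagonal-covariance GMMs whose parameters (mixture weights, means, variances) have already been estimated, and it assumes that $\text{hist}(\cdot)$ is a 32-bin intensity histogram over the range 0 to 256; neither detail is given in the text, and all function names are hypothetical.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Density of a diagonal-covariance GMM at each row of x (n_samples x n_dims).

    weights: (K,), means: (K, D), variances: (K, D). The diagonal covariance
    is an assumption of this sketch.
    """
    diff = x[:, None, :] - means[None, :, :]
    log_comp = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    return np.exp(log_comp) @ weights

def posteriors(x, fg, bg, eps=1e-12):
    """Return (p_F, p_B) from foreground/background GMM parameter tuples."""
    f = gmm_density(x, *fg)
    b = gmm_density(x, *bg)
    # eps guards against division by zero where both densities underflow.
    p_f = (f + eps) / (f + b + 2 * eps)
    return p_f, 1.0 - p_f

def reference_weight(ref, normalized, bins=32):
    """Cosine similarity between the histograms of a reference and a normalized input."""
    h_r, _ = np.histogram(ref, bins=bins, range=(0, 256))
    h_n, _ = np.histogram(normalized, bins=bins, range=(0, 256))
    return float(h_r @ h_n / (np.linalg.norm(h_r) * np.linalg.norm(h_n)))
```

Since histogram counts are nonnegative, the weight lies between 0 and 1, and it equals 1 when the two histograms are proportional.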

The corresponding energy function to be minimized is then defined as follows:

$$\begin{aligned}
E &= \mu \cdot \text{Length}(C) + v \cdot \text{Area}(\text{inside}(C)) \\
&\quad + \lambda_{F}\int_F \vert DI(p)-C_F(p)\vert^2\, dp \\
&\quad + \lambda_{B}\int_B \vert DI(p)-C_B(p)\vert^2\, dp \\
&\quad - \sum_{k=1}^{N} \lambda^k\int_{F} \log \mathbf{p}_F^k(f^k(p))\, dp \\
&\quad - \sum_{k=1}^{N} \lambda^k\int_{B} \log \mathbf{p}_B^k(f^k(p))\, dp \\
&\quad - \alpha \sum_{k=N+1}^{2N} \lambda^{k-N}\int_{F} \log \mathbf{p}_F^k(f^k(p))\, dp \\
&\quad - \alpha \sum_{k=N+1}^{2N} \lambda^{k-N}\int_{B} \log \mathbf{p}_B^k(f^k(p))\, dp
\end{aligned} \tag{1}$$

where $\mu, v, \lambda_F, \lambda_B,$ and $\alpha$ are fixed coefficients. $C_F$ and $C_B$ are the mean intensities of the nuclear region and background region, respectively, measured in the decomposed nuclear channel ($DI$). The first two terms regularize the smoothness of the nuclear boundary and the nuclear size, respectively; the next two terms penalize the variation in the decomposed nuclear staining space for the nuclear region and background region, respectively; and the last four terms ensure the fitness of nuclei and background to the prior knowledge.

The separation of the nuclei from the background is achieved by minimizing the energy function defined earlier via the evolution of the level set. To this end, the regularized Heaviside function $H$ [17] is introduced as follows:

$$H(z) = \frac{1}{2} \left(1+\frac{2}{\pi} \arctan\left(\frac{z}{\epsilon}\right)\right) \tag{2}$$

where $\epsilon$ is the regularization parameter of the Heaviside function, and the Delta function is defined as follows:

$$\delta (z) = \frac{d}{dz} H(z). \tag{3}$$

The objective energy function can then be rewritten as

$$\begin{aligned}
E &= \mu \int_{\Omega} \vert \nabla H(\phi (p))\vert\, dp + v \int_{\Omega} H(\phi (p))\, dp \\
&\quad + \lambda_{F}\int_{\Omega} \vert DI(p)-C_F(p)\vert^2\cdot H(\phi (p))\, dp \\
&\quad + \lambda_{B}\int_{\Omega} \vert DI(p)-C_B(p)\vert^2\cdot (1-H(\phi (p)))\, dp \\
&\quad - \sum_{k=1}^{N} \lambda^k\int_{\Omega} \log \mathbf{p}_F^k(f^k(p))\cdot H(\phi (p))\, dp \\
&\quad - \sum_{k=1}^{N} \lambda^k\int_{\Omega} \log \mathbf{p}_B^k(f^k(p))\cdot (1-H(\phi (p)))\, dp \\
&\quad - \alpha \sum_{k=N+1}^{2N} \lambda^{k-N}\int_{\Omega} \log \mathbf{p}_F^k(f^k(p))\cdot H(\phi (p))\, dp \\
&\quad - \alpha \sum_{k=N+1}^{2N} \lambda^{k-N}\int_{\Omega} \log \mathbf{p}_B^k(f^k(p))\cdot (1-H(\phi (p)))\, dp.
\end{aligned} \tag{4}$$

The minimization of the energy function can be achieved by the gradient descent method, and the corresponding Euler–Lagrange equation for $\phi$ is

$$\begin{aligned}
\frac{\partial \phi}{\partial t} &= \delta (\phi)\left(\mu \cdot \text{div}\frac{\nabla \phi}{\vert \nabla \phi \vert} - v\right) \\
&\quad + \delta (\phi)\left(\lambda_{B}\vert DI-C_{B}\vert^2-\lambda_{F}\vert DI-C_{F}\vert^2\right) \\
&\quad + \delta (\phi)\left(\sum_{k=1}^{N} \log \frac{\mathbf{p}_F^k(f^k)^{\lambda^k}}{\mathbf{p}_B^k(f^k)^{\lambda^k}} + \sum_{k=N+1}^{2N} \log \frac{\mathbf{p}_F^k(f^k)^{\alpha \lambda^{k-N}}}{\mathbf{p}_B^k(f^k)^{\alpha \lambda^{k-N}}}\right).
\end{aligned} \tag{5}$$

Since the multireference level set is a region-based active contour model, it is not sensitive to initialization. In our case, a circle with constant radius ($r=100$) at the center of each test image was used as the initial zero level set, and it was evolved until the difference in spatial location between the zero level sets from two consecutive iterations fell below an empirical threshold. In our experience, convergence is typically reached within 50 iterations.
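An explicit gradient descent on the evolution equation (5) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the weighted log-likelihood ratios of the last term are assumed to be precomputed into a single array `log_ratio`, the time step `dt`, the value of `eps`, and the fixed iteration count are assumptions, and the stopping test on contour displacement and any re-initialization of $\phi$ are omitted. The convention is $\phi > 0$ inside nuclei, matching the use of $H(\phi)$ for the foreground.

```python
import numpy as np

def heaviside(phi, eps=1.0):
    """Regularized Heaviside function, as in (2)."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def delta(phi, eps=1.0):
    """Derivative of the regularized Heaviside function, as in (3)."""
    return (eps / np.pi) / (eps**2 + phi**2)

def curvature(phi):
    """div(grad(phi) / |grad(phi)|) with finite differences."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx**2 + gy**2) + 1e-8
    nyy, _ = np.gradient(gy / norm)
    _, nxx = np.gradient(gx / norm)
    return nxx + nyy

def evolve(di, log_ratio, phi, mu=1.0, v=1.0, lam_f=0.05, lam_b=0.05,
           dt=0.5, n_iter=50):
    """Run n_iter explicit steps of (5).

    di: decomposed nuclear channel; log_ratio: precomputed sum over k of the
    weighted log(p_F / p_B) terms; phi: initial level set function.
    """
    for _ in range(n_iter):
        h = heaviside(phi)
        c_f = (di * h).sum() / (h.sum() + 1e-8)              # mean inside
        c_b = (di * (1.0 - h)).sum() / ((1.0 - h).sum() + 1e-8)  # mean outside
        force = (mu * curvature(phi) - v
                 + lam_b * (di - c_b) ** 2 - lam_f * (di - c_f) ** 2
                 + log_ratio)
        phi = phi + dt * delta(phi) * force
    return phi
```

The nuclear mask is then `phi > 0`. Because the regularized delta function is nonzero everywhere, pixels far from the current contour are also updated, which is one reason region-based models of this kind tolerate a crude initialization such as a centered circle.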

F. Nuclear Mask Partition

After the level set evolution, we end up with a binarized image of blobs (each a single nucleus or a clump of nuclei). The next step is to partition them into single nuclei, if necessary. A key observation we made is that the nuclear shape is typically convex. Therefore, ambiguities associated with the delineation of overlapping nuclei could be resolved by detecting concavities and partitioning them through geometric reasoning. The process, shown in Fig. 2, consists of the following steps:

  1. Detection of Points of Maximum Curvature. The contours of the nuclear mask were extracted, and the curvature along the contour was computed by using $k = \frac{x^{\prime}y^{\prime \prime}- y^{\prime}x^{\prime \prime}}{\left({x^{\prime}}^{2}+{y^{\prime}}^{2}\right)^{3/2}},$ where $x$ and $y$ are coordinates of the boundary points. The derivatives are computed by convolving the boundary with derivatives of a Gaussian. An example of detected points of maximum curvature is shown in Fig. 2.
  2. Delaunay Triangulation (DT) of Points of Maximum Curvature for Hypothesis Generation and Edge Removal. DT was applied to all points of maximum curvature to hypothesize all possible groupings. The main advantages of DT are that the edges are nonintersecting, and the Euclidean minimum spanning tree is a subgraph of DT. This hypothesis space was further refined by removing edges based on certain rules, e.g., no background intersection.
  3. Geometric Reasoning. Properties of both the hypothesis graph (e.g., degree of vertex), and the shape of the object (e.g., convexity) were integrated for edge inference.
Figure 2
Fig. 2. Steps in delineating overlapping nuclei. First step: detection of points with maximum curvature along contours of nuclear mask; Second step: hypothesis generation through triangulation; Third step: edge inference through geometric constraints.
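The first step, curvature along a closed contour computed with derivatives of a Gaussian, can be sketched as follows. The sketch assumes the contour is given as ordered, roughly evenly spaced boundary coordinates; the kernel truncation at $4\sigma$, the zero-sum correction of the second-derivative kernel, and the function names are assumptions of this illustration.

```python
import numpy as np

def gaussian_derivatives(sigma):
    """First- and second-derivative-of-Gaussian kernels, truncated at 4 sigma."""
    radius = int(np.ceil(4 * sigma))
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    d1 = -t / sigma**2 * g
    d2 = (t**2 - sigma**2) / sigma**4 * g
    d2 -= d2.mean()  # force zero sum so constant offsets give no response
    return d1, d2

def circular_convolve(signal, kernel):
    """Convolve a periodic signal with a kernel, wrapping around the ends."""
    radius = len(kernel) // 2
    wrapped = np.concatenate([signal[-radius:], signal, signal[:radius]])
    return np.convolve(wrapped, kernel, mode="valid")

def contour_curvature(x, y, sigma=3.0):
    """Curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) along a closed contour."""
    d1, d2 = gaussian_derivatives(sigma)
    x1, y1 = circular_convolve(x, d1), circular_convolve(y, d1)
    x2, y2 = circular_convolve(x, d2), circular_convolve(y, d2)
    return (x1 * y2 - y1 * x2) / (x1**2 + y1**2) ** 1.5
```

The sign of the result depends on the traversal direction: for a contour traversed counter-clockwise in a standard $x$-$y$ frame, convex stretches have positive curvature and concavities, the candidate split points between touching nuclei, have negative curvature.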

Among all the different parameters of this process, only the scale for curvature detection and the threshold for curvature maximum points were adjusted based on the preferred morphology and scale of nuclei in the dataset at 20×, which were further verified on the manually annotated reference image set.

This method is similar to the one proposed in our previous work [18]; however, a significant performance improvement has been made through triangulation and subsequent geometric reasoning. Refer to [19] for details.



Our target dataset consists of 440 hematoxylin and eosin (H&E) stained glioblastoma multiforme (GBM) tumor sections from 152 patients, which were scanned with either a 20× or 40× objective. Since these samples were collected at different laboratories, the fixation and staining protocols lack uniformity. In order to capture the technical variations, we manually selected and annotated 20 samples (at 20×) as reference images from the tumor repository. Each sample is a 1k × 1k block, and a subset is shown in Fig. 3. The segmentation was carried out on decomposed tissue blocks with size 1k × 1k pixels at 20×, and for each tissue block, only the top $M=10$ reference images with the highest $\lambda$ were used. Since $\lambda$ is a similarity measurement between the normalized tissue block and each of the reference images, different tissue blocks may have different subsets of reference images during classification. The number of components for each GMM was fixed at 20, with the GMM parameters estimated via the EM algorithm. The other parameter settings were: $\alpha = 0.1, \lambda_F = \lambda_B = 0.05, \mu = 1.0, v = 1.0$, and $\sigma \in \{2.0,4.0,6.0\}$, in which $\sigma$ was determined based on the preferred dimensions of malignant and normal nuclei at 20×, and all other parameters were selected to minimize the cross-validation error. Repeated hold-out cross-validation was applied on the reference images, and the classification performance was compared among our approach, random forest [20], EMaGAC [6], and MCV [15], as shown in Table I and Fig. 5, which indicates:

  1. by incorporating both prior information and nuclear staining information, our system better characterizes the variation in the data, and is thus much more effective and robust.
  2. by incorporating the multiscale LoG responses as a feature, we encode the prior scale information into the system. As a result, ambiguous background structures are excluded, which leads to an increase in precision. However, there is also a decrease in recall when compared to our multireference level set approach (MRL) with only color features, which is due to the fact that the tiny fragments inside the nuclei, as indicated by Fig. 3, can also be eliminated.
Figure 3
Fig. 3. Subset of reference image ROIs, with manual annotation overlaid as green contours, indicating a significant amount of technical variation. Nuclei with white hollow regions inside are pointed out by arrows.
Figure 4
Fig. 4. Classification (nuclei/background differentiation) comparison among our approach (MRL), EMaGAC [6], MCV [15], and random forest [20]. Foreground regions are enclosed within green contours. (a) Original image patch with level set initialization for MRL; (b) Decomposed nuclear channel with level set initialization for EMaGAC, where initial contours are centered at the green points with constant radius $r=3$ pixels; (c) Classification by MRL; (d) Classification by EMaGAC; (e) Classification by MCV; (f) Classification by random forest.
Figure 5
Fig. 5. Scatter plot of Precision-Recall for different approaches on reference dataset.
Table 1

We also provide an intuitive comparison among different approaches, as shown in Fig. 4, which demonstrates the effectiveness of our approach. During comparison, we noticed that EMaGAC [6] was sensitive to initialization, and the quality of the initialization provided by [6] degraded considerably in the presence of the large variation in our dataset, which led to unfavorable classification results. More results of our approach on classification and segmentation can be found in Fig. 6.

Figure 6
Fig. 6. Classification and segmentation results based on our approach. (a) Original images. (b) Nuclear/background classification results via our approach (MRL). (c) Nuclear partition results via geometric reasoning.

The overall computational complexity of our approach is $O(M^2+N\times M)$, in which $M$ is the number of pixels in the input image, and $N$ is the number of reference images. In our experiments, the final segmentation was achieved with an average computational time of around 60 s per tissue block of size 1k × 1k pixels at 20×. The segmentation performance of MRL is indicated in Table II, where the correct nuclear segmentation is defined as follows. Let

  1. $\text{MaxSize}(a,b)$ be the maximum nuclear size of nuclei $a$ and $b$, and
  2. $\text{Overlap}(a,b)$ be the amount of overlap between nuclei $a$ and $b$.
Table 2
TABLE II COMPARISON OF AVERAGE SEGMENTATION PERFORMANCE BETWEEN OUR CURRENT APPROACH (MRL), AND OUR PREVIOUS APPROACH [21], IN WHICH $\text{precision} = \frac{\#\text{correctly\_segmented\_nuclei}}{\#\text{segmented\_nuclei}}$, AND $\text{recall}=\frac{\#\text{correctly\_segmented\_nuclei}}{\#\text{manually\_segmented\_nuclei}}$

Then for any nucleus $n_G$ from the ground truth, if there is one and only one nucleus $n_S$ in the segmentation result that satisfies $\frac{\text{Overlap}(n_G,n_S)}{\text{MaxSize}(n_G, n_S)} > T$, then $n_S$ is considered to be a correct segmentation of $n_G$. The threshold was set to $T=0.8$.
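The matching criterion and the resulting precision and recall can be sketched as follows. The sketch assumes that both the ground truth and the segmentation are given as integer label images in which 0 is background and each nucleus has a unique positive label, so nuclei within one image do not overlap; this representation and the function name `segmentation_scores` are assumptions, not details from the text.

```python
import numpy as np

def segmentation_scores(gt_labels, seg_labels, threshold=0.8):
    """Return (precision, recall) under the Overlap / MaxSize > T criterion."""
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    seg_ids = [j for j in np.unique(seg_labels) if j != 0]
    seg_sizes = {j: np.count_nonzero(seg_labels == j) for j in seg_ids}
    correct = 0
    for i in gt_ids:
        gt_mask = gt_labels == i
        gt_size = np.count_nonzero(gt_mask)
        # Segmented labels that intersect this ground-truth nucleus, with overlap sizes.
        candidates, counts = np.unique(seg_labels[gt_mask], return_counts=True)
        hits = [j for j, overlap in zip(candidates, counts)
                if j != 0 and overlap / max(gt_size, seg_sizes[j]) > threshold]
        # Correct only if exactly one segmented nucleus satisfies the criterion.
        if len(hits) == 1:
            correct += 1
    precision = correct / len(seg_ids) if seg_ids else 0.0
    recall = correct / len(gt_ids) if gt_ids else 0.0
    return precision, recall
```

Dividing by the larger of the two sizes penalizes both under-segmentation (a large segmented blob covering a small annotated nucleus) and over-segmentation (a fragment covering only part of an annotated nucleus).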

The reader may question the classification performance, since neither the precision nor the recall is very high. The reason is that the ground truth (annotation) for the reference images is created at the object (nucleus) level, which means that the hollow regions (loss of chromatin content for various reasons) inside the nuclei are marked as part of the nuclear region rather than the background, as indicated in Fig. 3 (pointed out by arrows).



We have developed a multireference level set approach for delineating nuclei from H&E stained tumor sections, and applied it to the GBM cohort from the TCGA dataset. Our approach addresses the problem of technical and biological variations by utilizing both global information from the manually annotated reference images, and local information from the decomposed nuclear channel of the target image. The experimental results and comparisons demonstrate the effectiveness of the proposed approach. Our future work will focus on improving nuclear segmentation by incorporating a nuclear shape model, and on evaluating the proposed method on other tumor types.


This document was prepared as an account of work sponsored by the United States Government. While this document is believed to contain correct information, neither the United States Government nor any agency thereof, nor the Regents of the University of California, nor any of their employees, makes any warranty, express or implied, or assumes any legal responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by its trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or the Regents of the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof or the Regents of the University of California.


This work was supported by National Institutes of Health (NIH) under Grant U24 CA1437991 at Lawrence Berkeley National Laboratory under Contract DE-AC02-05CH11231. Asterisk indicates corresponding author.

H. Chang is with the Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA.

J. Han is with the Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA.

P. T. Spellman is with the Center for Spatial Systems Biomedicine, Oregon Health Sciences University, Portland, OR 97239 USA.

B. Parvin is with the Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA.

Color versions of one or more of the figures in this paper are available online at


Hang Chang, Ju Han, Paul T. Spellman, Bahram Parvin

Authors’ photographs and biographies not available at the time of publication.