Adjusting the Ground Truth Annotations for Connectivity-Based Learning to Delineate

Deep learning-based approaches to delineating 3D structures depend on accurate annotations to train the networks. Yet in practice, people, no matter how conscientious, have trouble precisely delineating in 3D and on a large scale, in part because the data is often hard to interpret visually and in part because the 3D interfaces are awkward to use. In this paper, we introduce a method that explicitly accounts for annotation inaccuracies. To this end, we treat the annotations as active contour models that can deform themselves while preserving their topology. This enables us to jointly train the network and correct potential errors in the original annotations. The result is an approach that boosts the performance of deep networks trained with potentially inaccurate annotations.


I. INTRODUCTION
As in many areas of computer vision, deep networks now deliver state-of-the-art results for delineation tasks, such as finding axons and dendrites in 3D light microscopy images. However, their performance depends critically on the accuracy of the ground-truth data used to train them. This is especially true when the delineation task is treated as a segmentation one and the network is trained by minimizing the cross-entropy between the centerline predictions and ground-truth annotations, which is one of the most popular paradigms.
In practice, these so-called ground-truth annotations are usually supplied manually by an annotator who may not draw with the utmost accuracy and can therefore easily be a few voxels off the true centerline. This is not a matter of carelessness but a consequence of 3D delineation being truly difficult to do well on a large scale. As a result, inaccurate annotations are more the rule than the exception, and this adversely affects how well the networks ultimately perform. One solution would be to have several annotators delineate the same data and combine their delineations. However, this would turn an already tedious, slow, and expensive process into an even slower and more expensive one that almost no one can afford.

[Figure caption: Ideally, the pixels crossed by the centerline should have value zero (dark color). In practice, this is not always the case. There are non-zero values in the area indicated by the red arrow, presumably because the neurite is hardly visible there. Nevertheless, the distance map is sufficiently good to adjust the annotation. The adjusted annotation is shown in (c) and (d). The network retrained with adjusted annotations can now generate a better distance map even where the neurite is barely visible.]
In this paper, we introduce a method that explicitly accounts for annotation inaccuracies and delivers the same performance as if they were perfectly accurate. Our main insight is that the annotations are usually imprecise more in terms of the 3D location of the centerlines than of the topology of the graph they define. We can therefore treat them as deformable contours forming a graph that can be refined by moving its nodes while preserving its structure. We cast this approach to training a deep network as a joint optimization over the network parameters and node positions. We then show that we can eliminate the node variables from the optimization problem, which can then be solved by minimizing a loss function. This loss function accounts for the annotations' lack of spatial precision. It can be minimized in the traditional manner, and the output of the re-trained network can be used to refine the annotations. Fig. 1 depicts our approach and Fig. 2 showcases its behavior. We will demonstrate that it brings substantial improvements when training networks to delineate neurons in two-photon and confocal microscopy image stacks. Hence, our contribution is an automated approach to better leveraging inaccurate training data, which, in our experience, represents the vast majority of data available to practitioners.
II. RELATED WORK

Deep networks now routinely deliver the best delineation performance when properly trained. However, obtaining accurately annotated data, especially in 3D, is a challenge. In practice, it is rarely available in sufficient quantities. And what annotated data there is, is rarely accurate because manually delineating 3D structures is challenging. Introducing a degree of self-supervision is a way to address this difficulty [2], [9], but this does not detract from the fact that the training would work even better if the available annotated data were accurate. This can be partially ascribed to the fact that most current networks are trained by minimizing the standard cross-entropy or differentiable intersection-over-union loss [24]. As pixel-wise measures, both are sensitive to even small misplacements of the linear structures' centerlines. In [29], this is partially addressed by introducing a loss component that accounts for global statistics of the network output, but the cross-entropy remains a key component of the overall loss. Similarly, the method of [30] relies on introducing a topology-preserving term but still depends on the annotation being accurate. Accuracy can be improved by having several people annotate and combining their results using robust statistics. This is effective but even more expensive than obtaining one set of annotations and therefore out of reach for most practitioners. The problem can be partially alleviated by annotating only in 2D projections of the 3D data volumes [31], [47], [20], which is easier, but may result in even less precise annotations than those performed in 3D.
A similar problem to the one we address here also arises in the context of two-dimensional semantic boundary detection. The outlines one finds in annotated training sets are often rather imprecise, and training networks to nevertheless discover contours that overlap with them is an issue. In [45], training is reformulated as simultaneously optimizing the parameters of a deep net and correcting the annotations by solving a mixed binary-continuous optimization problem. However, unlike in our approach, preservation of annotation topology is not guaranteed and the corrections may break the continuity of annotations. This is a major problem when tracing neurons or blood vessels, because topology changes influence the biological interpretation of the results. The same problem is addressed in [1] by proposing a neural layer and a loss function that can be added on top of an edge detector and make it possible to find more accurate contours than those in the annotations. However, because the regions are represented in an implicit fashion, there is no more guarantee than in [45] that the annotations' connectivity will be preserved. Connectivity being at the heart of our applications, we therefore chose to use explicit deformable models, such as those described below.

B. Handling Noisy Annotations
Even though we know of no other algorithm that adjusts the geometry of centerline annotations during training, explicitly accounting for the fact that the annotations are noisy has received some attention. In [40], annotations produced by non-expert annotators are accommodated by means of a dedicated distillation architecture and a noise-robust Dice loss. In [6], a dedicated network architecture and a semi-supervised training routine encourage equivariance to deformations to handle potential inaccuracies resulting from using a heuristic annotation tool. In [48], annotation noise is handled by a quality assessment module that discounts the loss in regions where the estimated label quality is low. Similarly, in [25], a distillation training setup and an architecture based on self-attention are used to suppress the influence of erroneous labels on the trained network. In contrast to all these approaches, ours explicitly distinguishes between inaccuracies in position and topological errors. Because the former occur far more frequently than the latter, our loss function adjusts the centerline locations while preserving the topology of the annotations.

C. Deformable Contour Models
Deformable contours [18], [37], [12] were initially introduced as a means to semi-automatically delineate simple contours while imposing smoothness constraints on the resulting outlines. They were later generalized to model network structures [11], [5] that can deform while preserving their topology. They are therefore well suited for refining our inaccurate annotations under the assumption that they are topologically correct but spatially imprecise.
More recent deformable contours rely on minimizing energy functions generated by deep networks [23], [7], [40], [15], which enables end-to-end learning. Unlike in these methods, which rely on evolving the contour for segmenting the image at test time, our use of deformable contours is limited to adjusting the annotations during training.
Active appearance models [8] enable modelling the appearance of imaged objects, in addition to their shape. They can be learnt from coarse annotations, which are adjusted when fitting the model to the data [33]. The level of detail of the active appearance model can then be increased and, before the more detailed model is fitted to the data, it can be initialized with the parameters of its less detailed version. In this work, we also adjust the annotations during learning, but we represent them as network snakes and train a deep convolutional network instead of fitting an active appearance model.

III. METHOD
Given a set of microscopy stacks along with the corresponding and possibly imprecise centerline annotations, we want to train a deep net to produce precise delineations. To this end, when training the deep network, we adjust not only its weights but also the annotations themselves. We first present the vanilla training procedure without annotation adjustment and explain why it is sub-optimal when the annotations lack precision. We then formalize our training procedure with adjustment.

A. Standard Training Procedure
Let us consider a set of N microscopy scans {X_i}_{1≤i≤N} and corresponding centerline annotations {ŷ_i}_{1≤i≤N}, in the form of distance maps of the same size as the scans. Voxel p of annotation ŷ, denoted ŷ[p], contains the distance from the center of p to the closest centerline. Let F(·; Θ) be a deep network with weights Θ. It takes a scan X_i as input and returns a volume y_i = F(X_i; Θ), containing a delineation of the centerlines visible in X_i. To keep the notation concise, we omit the dependency of y_i on Θ. The traditional approach to learning the network weights is to make y_i as close as possible to ŷ_i by solving

    Θ* = argmin_Θ Σ_{1≤i≤N} L(ŷ_i, y_i) ,   (1)

where the loss term L(ŷ, y) measures the voxel-wise difference between the annotation and the prediction. In our experiments, we take L to be the Mean Square Error. This assumes that the deviations of the annotations from the actual centerline trajectories are small and unbiased. In reality, they rarely are. Hence, the network learns to accommodate this uncertainty in the annotations by blurring the predictions. At test time, this leads to breaking the continuity of predictions wherever the image quality is compromised by a high level of noise or low contrast between the foreground and the background, as illustrated by Fig. 2.

B. Overview of our Approach
The formulation of Eq. 1 assumes that the deviations of the annotations from reality are small and unbiased. This work is predicated on the fact that they rarely are and that we must allow for substantial non-Gaussian deviations from the original annotations. Thus, instead of encoding the annotations in terms of volumes ŷ_i, we represent the annotated centerline C_i of each X_i as a graph, with the set of vertices V_i and the set of edges E_i. Each vertex v ∈ V_i has a 3D coordinate c_v, and each edge (u, v) ∈ E_i represents a short line segment. This is shown in Fig. 2, where the circles along the annotations denote the vertices. Let c_i be the vector formed by concatenating the coordinates of all the vertices of V_i. To accommodate the possible lack of precision of the annotations, we let c_i change from its initial value. Doing so changes the shape of C_i but preserves its topology and can be used to explicitly model the deviation of the annotated centerlines from their true position. In other words, the minimization problem can be reformulated as finding

    (Θ*, C*) = argmin_{Θ,C} Σ_{1≤i≤N} [ L(D(c_i), y_i) + R(c_i) ] ,   (2)

where C is the vector obtained by concatenating all the c_i; R is a regularization term that forces the deformed centerlines to be smooth, and that we define in Sec. III-C; L is the same MSE as in Eq. 1; and D is a distance transform that creates a volume in which a voxel with coordinates q is assigned its truncated distance to the closest edge of C. Formally, we write

    D(c)[q] = min( min_{(u,v)∈E} δ_{uv}(c, q), d_max ) ,   (3)
    δ_{uv}(c, q) = min_{0≤φ≤1} ‖ φ c_u + (1−φ) c_v − q ‖ ,   (4)

where d_max is the threshold used to truncate the distance map, and the minimization over φ serves to find the point on edge (u, v) that is closest to q. Solving the problem of Eq. 2 means training the network to find centerlines that are smooth and have the same topology as the annotations. This is what we want but, unfortunately, this optimization problem involves two kinds of variables, the components of C and Θ respectively, which are not commensurate in any way. In practice, this makes the optimization difficult.
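To make the distance transform D concrete, here is a brute-force NumPy sketch, written in 2D for brevity. The function name and the d_max default are our own choices, and an efficient implementation would use a dedicated distance-transform routine rather than looping over edges:

```python
import numpy as np

def truncated_distance_map(nodes, edges, shape, d_max=15.0):
    """Brute-force 2D version of the truncated distance transform D(c).

    nodes: (V, 2) array of node coordinates; edges: list of (u, v) index pairs.
    Each voxel q receives the minimum over edges of its distance to the
    segment (c_u, c_v), truncated at d_max.
    """
    ys, xs = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    q = np.stack([ys, xs], axis=-1).astype(float)          # (H, W, 2) voxel centers
    dist = np.full(shape, d_max)
    for u, v in edges:
        a, b = nodes[u], nodes[v]
        ab = b - a
        denom = np.dot(ab, ab) + 1e-12
        # phi in [0, 1] selects the closest point on the segment
        phi = np.clip(((q - a) @ ab) / denom, 0.0, 1.0)
        closest = a + phi[..., None] * ab
        dist = np.minimum(dist, np.linalg.norm(q - closest, axis=-1))
    return dist
```

The clamping of phi to [0, 1] restricts the minimization to the segment itself rather than the infinite line through its endpoints.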
We address this problem by eliminating the C variables. To this end, we define for each scan the optimal node positions given the current network output,

    c*_i(y_i) = argmin_{c_i} [ R(c_i) + L(D(c_i), y_i) ] ,   (5)

which lets us rewrite Eq. 2 as

    Θ* = argmin_Θ Σ_{1≤i≤N} [ R(c*_i(y_i)) + L(D(c*_i(y_i)), y_i) ] .   (6)

In the following section, we describe our choice of R and the formulation of c*_i(y_i) that results from it. Eq. 6 is a standard continuous optimization problem that we can solve using the usual tools of the trade.

C. Annotations as Network Snakes
We propose to represent each C_i as a network snake and to take R to be a classical sum of spring and elasticity terms [11], [5]. This regularization term takes the form

    R(c) = α Σ_{(u,v)∈E} ‖c_u − c_v‖² + β Σ_{(u,v,w)∈T} ‖c_u − 2c_v + c_w‖² ,   (7)

where α and β are hyper-parameters that balance the strength of the two terms, E is the set of edges of C, and T is the set of node triples (u, v, w) such that (u, v) ∈ E, (v, w) ∈ E, and v is a node of order two, that is, not a junction of multiple snake branches. As shown in [11], [5], R can be written as

    R(c) = ½ c^T A c ,   (8)

where A is a sparse symmetric matrix. Given this quadratic formulation of R, we can use the well-known semi-implicit scheme introduced to deform snakes, also known as active contour models [18], to minimize Eq. 5. It involves initializing each snake c⁰_i to the manually produced annotation and refining it by iteratively solving

    (A + γI) c^{t+1}_i = γ c^t_i − (∂L/∂c)|_{c^t_i}   (9)

for c^{t+1}_i, where γ is a hyper-parameter known as the viscosity and is inversely proportional to the step size in each iteration. We refer the reader to [18] for the complete derivation. Here we only note that, when the iteration stabilizes, we have ∀i, c^t_i ≈ c^{t+1}_i. We can therefore denote the stable vector of node locations by c* and use the derivative of Eq. 8 to write

    A c* + (∂L/∂c)|_{c*} = 0 ,   (10)

which means that c* minimizes R + L and is a solution of Eq. 5.
In practice, we solve Eq. 9 by inverting the matrix (A + γI) at the start of the training procedure and then multiplying the right-hand side of the equation by the inverse at each iteration. Hence, we write

    c^{t+1}_i = (A + γI)^{−1} ( γ c^t_i − (∂L/∂c)|_{c^t_i} ) .   (11)

We perform the update of Eq. (11) for 0 ≤ t < T. We take T = 10 in our implementation, which is sufficient for the process to stabilize, and denote the result of the last iteration by c*_i(y_i) = c^T_i.
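The semi-implicit update above can be sketched in a few lines of NumPy. The sketch below assumes an open chain of nodes (no junctions), absorbs constant factors into alpha and beta, and uses a caller-supplied stand-in grad_ext for the data-term gradient; the function names are ours:

```python
import numpy as np

def snake_matrix(n, alpha, beta):
    """Regularization matrix A for an open chain of n nodes.

    Built, up to constant factors absorbed into alpha and beta, from the
    spring term on each edge and the elasticity term on each interior triple.
    """
    A = np.zeros((n, n))
    for u in range(n - 1):                      # spring term on edge (u, u+1)
        A[u, u] += alpha
        A[u + 1, u + 1] += alpha
        A[u, u + 1] -= alpha
        A[u + 1, u] -= alpha
    for v in range(1, n - 1):                   # elasticity on (v-1, v, v+1)
        idx = [v - 1, v, v + 1]
        coef = [1.0, -2.0, 1.0]
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                A[i, j] += beta * ci * cj
    return A

def deform_snake(c0, grad_ext, A, gamma=10.0, steps=10):
    """Iterate the semi-implicit snake update.

    (A + gamma I) is inverted once and reused; grad_ext(c) stands in for the
    gradient of the external (data) energy at the current node positions.
    """
    inv = np.linalg.inv(A + gamma * np.eye(len(A)))
    c = c0.copy()
    for _ in range(steps):
        c = inv @ (gamma * c - grad_ext(c))
    return c
```

For a toy quadratic external energy pulling nodes toward targets, the iteration converges to a smoothed version of the targets, illustrating how the regularization trades fidelity for smoothness.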

D. Computing the Gradients of the Loss Function
Performing the minimization in Eq. 6 requires computing at each iteration the gradient of the loss with respect to the network output y_i. To avoid cluttering the notation, we denote c*(y_i) by c*. The gradient can then be expressed as

    d/dy_i [ R(c*) + L(D(c*), y_i) ] = ∂L/∂y_i + (∂c*/∂y_i)^T (∂(R + L)/∂c)|_{c*} = ∂L/∂y_i ,   (12)

where we used Eq. 10 to eliminate the second term. In other words, even though c* is a function of y_i, we do not need to compute its derivatives with respect to y_i to train the neural network. We only need those of L, and can treat c* as a constant when evaluating them. Therefore, the only difference between using our approach and the standard one of Section III-A is that instead of evaluating the loss using the original annotation c, we use its optimized version c*. We call this approach SnakeFull, and it is depicted at the top of Fig. 3.

Fig. 3. The three approaches to training described in Sec. III-D and III-E. In SnakeFull, the training objective is also used as the objective of the snake. This makes some gradient components vanish, simplifying gradient computation, but results in snake updates that are costly to compute. SnakeFast can accommodate an arbitrary snake objective, which makes it faster than SnakeFull, even though it requires backpropagation through a sequence of snake updates. In SnakeSimple, the backpropagation over the snake updates is simply omitted. This approach is the fastest. We analyze the accuracy vs. speed tradeoff induced by these three methods in Section IV.
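The vanishing of the second term discussed above is an instance of the envelope theorem, and it is easy to check numerically on a toy problem. In the sketch below, R and L are stand-in one-dimensional functions chosen by us so that the inner minimizer has a closed form; they are not the paper's actual terms:

```python
import numpy as np

# Toy 1D illustration: with R(c) = c^2 and L(c, y) = (c - y)^2, the inner
# minimizer of R + L over c has the closed form c*(y) = y / 2.
def R(c):
    return c ** 2

def L(c, y):
    return (c - y) ** 2

def c_star(y):
    return y / 2.0               # argmin_c [R(c) + L(c, y)]

def total(y):
    return R(c_star(y)) + L(c_star(y), y)   # outer objective for one sample

y0 = 1.7
eps = 1e-6
# Full derivative of the outer objective, by central finite differences.
full_grad = (total(y0 + eps) - total(y0 - eps)) / (2 * eps)
# Partial derivative of L in y alone, with c* frozen at its optimal value.
partial_grad = -2.0 * (c_star(y0) - y0)
# The two agree: c* can be treated as a constant when differentiating in y.
```

Because c* sits at a stationary point of R + L, perturbing y moves c* only along directions in which the inner objective is flat to first order, so only the explicit dependence of L on y contributes to the gradient.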

E. Speeding Things Up
We will show in Section IV that SnakeFull performs well but is slow to train. The culprit is the term ∂L/∂c in the update of Eq. 11, which involves a time-consuming computation of the gradient of a distance map. To speed things up, we introduce a faster approach that we call SnakeFast. In it, we replace the term L in Eq. 5 by a simpler objective function S directly inspired by the classical external snake energy [18]. We take it to be

    S(c, y) = Σ_{v∈V} (y * G)[c_v] ,   (13)

where * G denotes a convolution with a Gaussian kernel and y[c_v] denotes the network output at vertex v. S is very similar to the energies used in traditional network snake formulations [11], [5]. Importantly, S and its gradients are easy and fast to compute, because doing so only requires convolving y with a Gaussian kernel and sampling the result at the locations of the snake nodes. Deforming the annotations then involves finding

    c†_i(y_i) = argmin_{c_i} [ R(c_i) + S(c_i, y_i) ] ,   (14)

which means that the sum of distance values along the snake should be as low as possible while preserving snake smoothness. As in Section III-C, the snake update takes the form

    c^{t+1}_i = (A + γI)^{−1} ( γ c^t_i − (∂S/∂c)|_{c^t_i} ) .   (15)

In practice, we take c†_i(y_i) = c^T_i, where T = 10, as in Section III-C. Finally, we take the network training objective to be

    Θ* = argmin_Θ Σ_{1≤i≤N} L(D(c†_i(y_i)), y_i) ,   (16)

where we still use the original L of Eq. 2. We do this because S only depends on a small subset of the voxels of y. Hence, it only provides a sparse supervisory signal and is not well suited as the training objective for the network that produces a dense distance map. The gradient of the objective of Eq. 16 is

    dL/dy_i = ∂L/∂y_i + (∂c†_i/∂y_i)^T (∂L/∂c)|_{c†_i} .   (17)

Because we minimized S instead of L in Eq. 14, we can no longer assume that the second term is zero as we did in Section III-D. Hence, to compute it during the minimization, we backpropagate through the snake update procedure of Eq. 15, as depicted by the middle row of Fig. 3. In practice, we use the autograd functionality of PyTorch to this end. The non-zero second term of Eq. 17 helps guide the snake to a position where the data loss L is low and ultimately influences the distance map that our deep network F outputs. It could be argued that ignoring this term so that the network focuses exclusively on fitting the annotations would be preferable. To test this assertion, we implemented SnakeSimple, a third variant of our approach in which we take the second term of Eq. 17 to be zero. SnakeSimple is even faster than SnakeFast. In essence, it is a simplified version of SnakeFull and SnakeFast in which we successively optimize the network weights and then the snake positions without any direct interaction between these two optimization steps. Fig. 4 uses a synthetic example to illustrate the differences between our three variants.

[Figure caption: In three separate runs, we performed 100 Gradient Descent iterations using either SnakeFull, SnakeFast, or SnakeSimple. The top row shows the corrected annotation and the updated distance maps. The bottom row depicts the differences between the updated maps and the ground-truth one. We also indicate the computation times. SnakeFull removes the interruption in the distance map, but the computation is slow. SnakeFast is much faster and fills the gap in the distance map almost as well. SnakeSimple is even faster but yields a corrected annotation that is too short, as highlighted by the red arrow.]
SnakeFast and SnakeFull yield similar results, with the former being much faster, whereas SnakeSimple is even faster but prone to generating artifacts. We now turn to our experimental results on real data that confirm this.
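The external snake energy described above reduces to a blur-and-sample operation. The following is a minimal 2D NumPy sketch, with nearest-voxel sampling instead of interpolation and function names of our choosing:

```python
import numpy as np

def gaussian_blur_2d(y, sigma=1.0, radius=3):
    """Separable Gaussian convolution of a 2D map (the 'Gaussian kernel' step)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    # blur along rows, then along columns
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, y)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    return blurred

def snake_energy(nodes, y, sigma=1.0):
    """External energy: smoothed map sampled at the snake node locations."""
    g = gaussian_blur_2d(y, sigma)
    rows = np.round(nodes[:, 0]).astype(int)   # nearest-voxel sampling for brevity
    cols = np.round(nodes[:, 1]).astype(int)
    return g[rows, cols].sum()
```

On a distance-map-like output, nodes lying on the low-valued centerline yield a lower energy than nodes lying off it, so minimizing this energy pulls the snake toward the predicted centerline; a real implementation would interpolate the sampling and backpropagate through the snake updates.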

IV. EXPERIMENTS

A. Datasets
We tested our approach on the following data sets.
• The Brain data set comprises fourteen two-photon microscopy 3D scans of fragments of a mouse brain, with manually traced neurites. We use four volumes for testing and ten for training, each of size 200 × 250 × 250 voxels and spatial resolution 0.3 × 0.3 × 1.0 µm.
• The Neurons data set contains two 3D images of neurons in a mouse brain. They had been outlined manually while viewing the sample under a microscope, and the image was captured later. The sample deformed in the meantime, exacerbating the misalignment between the annotation and the image. We use one stack of size 151 × 714 × 865 voxels and a resolution of 1 µm for training and one of size 228 × 764 × 1360 for testing.
• The MRA data set is a publicly available set of Magnetic Resonance Angiography brain scans [4]. It consists of 42 annotated stacks, which we cropped to 416 × 320 × 128 voxels by removing their empty margins. Their resolution is 0.5 × 0.5 × 0.6 mm. We randomly partitioned the data into 31 training and 11 test volumes.

None of our data sets can be considered perfectly annotated. All annotations were performed as accurately as possible, but their precision is affected by the uneven distribution of the dye, image noise, and the generic difficulty of annotating 3D volumes. In Neurons, the difficulty is compounded by the fact that the annotations were performed live, days before image acquisition, and the sample deformed in the meantime.

B. Metrics
We used the following performance metrics.
• CCQ. Since standard segmentation metrics, such as the F1 score [35] and the precision-recall break-even point [26], are very sensitive to misalignment of thin structures, we use the correctness-completeness-quality metric, which is specifically designed for linear structures [43]. Correctness corresponds to precision, completeness to recall, and quality to the intersection-over-union. However, the notion of a true positive is relaxed from perfect coincidence of the ground truth and the prediction to their co-occurrence within a distance of d pixels. We used d = 3. Although it accounts for possible ground-truth misalignment, CCQ is still a voxel-wise metric, insensitive to topological errors such as short interruptions of neurites.
• APLS. The Average Path Length Similarity is defined as the aggregation of relative length differences of shortest paths between pairs of corresponding end points, randomly sampled in the reconstructed and predicted graphs. It was introduced to evaluate road map reconstructions from aerial images [39] and aims to evaluate the connectivity of the reconstructions, as opposed to their pixel-wise accuracy, which makes it a perfect performance measure for our task.
• TLTS. The Too-Long-Too-Short score is another performance criterion based on statistics of the relative lengths of shortest paths between corresponding pairs of end points in the prediction and the ground truth [42]. We report the fraction of correct paths, that is, predicted paths whose relative length difference to the corresponding ground-truth paths is lower than 15%.
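The relaxed true-positive counting behind CCQ can be sketched as follows. This brute-force 2D NumPy version uses one common matching convention and may differ in detail from the definition in [43]:

```python
import numpy as np

def ccq(pred, gt, d=3):
    """Correctness, completeness, quality with a distance tolerance d.

    pred, gt: boolean 2D masks of centerline pixels (2D for brevity).
    A predicted pixel counts as a true positive if some ground-truth pixel
    lies within d pixels of it, and symmetrically for completeness.
    """
    pred_pts = np.argwhere(pred)
    gt_pts = np.argwhere(gt)

    def matched(a, b):
        if len(a) == 0 or len(b) == 0:
            return 0
        # pairwise Euclidean distances between the two pixel sets
        dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return int((dist.min(axis=1) <= d).sum())

    tp_pred = matched(pred_pts, gt_pts)        # predicted pixels near the GT
    tp_gt = matched(gt_pts, pred_pts)          # GT pixels near the prediction
    correctness = tp_pred / max(len(pred_pts), 1)
    completeness = tp_gt / max(len(gt_pts), 1)
    quality = tp_pred / max(len(pred_pts) + len(gt_pts) - tp_gt, 1)
    return correctness, completeness, quality
```

A prediction shifted by two pixels, within the d = 3 tolerance, scores perfectly here, which is exactly the leniency toward small misalignments that motivates using CCQ instead of the plain F1 score.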

C. Architectures and Training Details
Our contribution lies in the updating of the annotations and the loss function we use to achieve it, which should improve performance independently of any specific network architecture. To demonstrate this, we used two different architectures.
• UNet. A 3D UNet [34] with three max-pooling layers and two convolutional blocks. The first layer has 64 filters. Each convolution layer is followed by batch normalization and dropout with a probability of 0.15. During training, we randomly crop sub-volumes of size 96 × 96 × 96 and flip them along each dimension with probability 0.5. We combine them into batches of 8.
• DRU. A recurrent architecture that iteratively refines the segmentation output 3 times [41]. The first layer has 64 filters. Each convolution layer is followed by group normalization and dropout with a probability of 0.15. During training, we randomly crop sub-volumes of size 96 × 96 × 96 and flip them along each dimension with probability 0.5. We combine them into batches of 4. To compute the loss function, we average the outputs of all 3 refinement steps. During testing, the output of the final step is used to evaluate performance.

We trained both architectures in four different ways: by minimizing the Mean Squared Error to the original annotations, which we will refer to as OrigAnnot, and by using the SnakeSimple, SnakeFull, and SnakeFast variants of our approach, described in Sections III-D and III-E and depicted by Fig. 3. In all cases, we used Adam [19] with the learning rate set to 1e−4 and a weight decay of 1e−4. At test time, the predicted distance maps were thresholded at 2 and skeletonized to obtain centerlines. To compute the TLTS and APLS scores, we converted them into graphs.

D. Label Correction Baselines
As noted in Section II, we do not know of other methods that deform the annotation graph during training while maintaining its topology. However, there are methods designed to train deep nets using noisy annotations, where the noise is understood as flipping some pixel labels. In the following section, we compare our algorithm to three such methods:
• NR-Dice. A UNet trained with the Noise Robust Dice Loss proposed in [40].
• QAM. An architecture with an auxiliary deep network that recognizes annotations that might be wrong and downplays their importance during training [48].
• DS6. A Siamese architecture and a training routine dedicated to enforcing equivariance of the network to deformations [6].

E. Comparative Evaluation
We present example reconstructions in Fig. 5 and Fig. 6. As shown in Tab. I, SnakeFull and SnakeFast outperform OrigAnnot in CCQ terms by a small margin, and in APLS and TLTS terms by a significantly larger one, which confirms that the main benefit of our loss is the improved connectivity of the predictions. As can be seen in Fig. 5, our approach to training yields delineations with fewer unwarranted breaks and longer uninterrupted curvilinear segments.
On average, UNet and DRU perform best when trained with SnakeFull and SnakeFast. However, SnakeFast requires only a third of the time per training iteration. SnakeSimple delivers a further 20-30% speedup but incurs a clear performance drop. Crucially, these conclusions apply to both the UNet and DRU architectures. In fact, the performance gain from switching from OrigAnnot to SnakeFast is larger than the one from switching from the simpler UNet to the more sophisticated DRU while retaining the standard OrigAnnot approach to training.
In short, SnakeFast represents an excellent compromise between training time and performance. This being said, at test time the run-time is the same no matter how the network was trained, because no annotation alignment takes place anymore. Hence, given sufficient computational resources, SnakeFull is also a valid option. The bottom third of each part of Tab. I measures the performance of the methods designed to accommodate label noise, as described in Section IV-D. Because they do not explicitly preserve annotation topology and we do, a UNet trained with SnakeFast outperforms these methods in terms of the topology-aware scores, but not necessarily in terms of the pixel-wise ones, which are not our main concern.

F. Perfectly Accurate Annotations
Having demonstrated in Sec. IV-E that our loss function improves delineation results when the annotations lack spatial precision, we now investigate its behavior when the annotations are precise. Since it is virtually impossible to precisely annotate 3D microscopy scans, we resort to a synthetic data set, Synthetic, which we generated using the VascuSynth algorithm [14], [17] and its implementation [46]. The images are generated from vascular graphs, which we use as perfectly accurate annotations. We used twenty stacks for training and ten for testing, each of size 400 × 400 × 400. Fig. 7 shows the maximum-intensity projection of a test stack. The results, presented in Tab. II, confirm that, for perfectly accurate annotations, our method reduces to standard training with the MSE without incurring any performance drop.

To investigate how increasing the level of inaccuracy of the annotations affects the performance of a UNet trained with SnakeFast, we perturbed the annotations of the Synthetic data set. We applied to each annotation graph a random deformation field that varies slowly across space. We modulated its amplitude to change the level of inaccuracy. This produced three sets of annotations, as depicted by Fig. 8. We trained the network on each of them and present the results in Fig. 9. When the network is trained with SnakeFast, its connectivity-related scores degrade much more slowly than when it is trained using OrigAnnot.

The robustness of SnakeFast to deviations in the annotations inspired us to ask another question: can this loss function be used to train deep networks with annotations that are simplified to the point where they become much easier, faster, and therefore cheaper to obtain? To answer this, we trained the UNet with SnakeFast and OrigAnnot on the Brain data set with very coarse annotations. We obtained them by connecting neurite branching and end points with straight lines, as shown in Fig. 10. The results are presented in Tab. III.
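The exact form of the deformation field is not specified here, so the following NumPy sketch is only one simple way to obtain a slowly-varying perturbation of an annotation graph (a sinusoidal field with a random phase and direction; the function and parameter names are hypothetical):

```python
import numpy as np

def perturb_graph(nodes, amplitude, freq=0.05, seed=0):
    """Apply a slowly-varying random deformation field to node coordinates.

    nodes: (N, D) array of node positions. The displacement is a sinusoid of
    the position with random phases, scaled by `amplitude`, so nearby nodes
    move coherently: the graph deforms smoothly and its topology is kept.
    """
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=nodes.shape[1])
    direction = rng.normal(size=nodes.shape[1])
    direction /= np.linalg.norm(direction)     # unit displacement direction
    # low-frequency field: varies slowly with position
    disp = amplitude * np.sin(freq * nodes + phases)
    return nodes + disp * direction
```

Each node moves by at most `amplitude`, and nodes that are close in space receive nearly identical displacements, mimicking the coherent drift of a deformed sample rather than independent per-node jitter.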
As expected, training on the coarse annotations without adjusting them results in a significant performance drop compared to training on precise annotations. Switching from precise to coarse annotations still incurs a performance drop when using SnakeFast, but a much smaller one than when using the baseline. Visual inspection of the resulting segmentations, shown in Fig. 11, leads us to conclude that, for tasks where a compromise between accuracy and annotation cost is acceptable, using the easy annotations together with SnakeFast is a viable alternative to the classical approach.

G. Ablation Studies
To investigate the impact of hyper-parameters of our method on performance, we run the following ablation studies.
1) Regularization terms: The regularization term R of Eq. 7 is the sum of a spring term, weighted by a coefficient α, and an elasticity term, weighted by a coefficient β. To investigate their influence on performance, we varied α and β and trained our UNet on the Brain data set. The results are presented in Tab. IV. The best results are attained with relatively low values of both terms. Higher values of the spring term, originally proposed for closed contours, effectively regularize loopy topologies, but when used on tree-shaped structures representing blood vessels and neuronal processes, they tend to shorten the reconstructed neurites and vessels. Higher values of the elasticity term make it more difficult to fit irregular trajectories of neurites, like the ones shown in Fig. 5.

2) Step size for snake update: As explained in Section III-C, the snake update iteration has a parameter γ, called the viscosity, that acts as an inverse step size. We report the results of changing γ in Tab. V. Low viscosity results in a large step size and can make the snake update procedure diverge, which we observed for γ = 1. On the other hand, high viscosity corresponds to a small step size and increases the risk that the snake does not converge within the preset number of iterations. With γ = 100, we needed to increase the number of snake updates from 10 to 80 to ensure convergence. This also increased the iteration time by one second. γ = 10 made the snake converge within 10 updates, while also resulting in marginally higher performance than γ = 100.
3) L1 vs L2 distance: We also verified the performance of a UNet trained with SnakeFast when changing the loss data term from the Mean Squared Error to the Mean Absolute Error. The results, presented in Tab. VI, show a very slight advantage for the MSE, possibly because its gradient profile penalizes larger errors more heavily.

V. CONCLUSION AND FUTURE WORK
We have proposed a method that accounts for the inevitable inaccuracies in manual annotations of curvilinear 3D structures, such as neurites and blood vessels, in 3D image stacks. It leverages the network snake formalism to define a loss function that simultaneously trains the deep network to produce the delineation and adjusts the initially imprecise annotations.
Our approach does not depend on the specific network architecture we use. Hence, its effectiveness suggests that handling such imprecisions may be even more important than refining the network architecture, which is something that has been largely neglected in the literature.
In future work, we will investigate extending our approach to segmenting surfaces, such as cell membranes in electron microscopy scans.