LSDNet: Trainable Modification of LSD Algorithm for Real-Time Line Segment Detection

As of today, the best accuracy in line segment detection (LSD) is achieved by algorithms based on convolutional neural networks (CNNs). Unfortunately, these methods use deep, heavy networks and are slower than traditional model-based detectors. In this paper we build an accurate yet fast CNN-based detector, LSDNet, by incorporating a lightweight CNN into a classical LSD detector. Specifically, we replace the first step of the original LSD algorithm, the construction of a line segment heatmap and tangent field from raw image gradients, with a lightweight CNN that is able to compute richer and more complex features. The second part of the LSD algorithm is used with only minor modifications. Compared with several modern line segment detectors on the standard Wireframe dataset, the proposed LSDNet provides the highest speed among CNN-based detectors, 214 FPS, with a competitive accuracy of 78 F_H. Although the best reported accuracy is 83 F_H at 33 FPS, we speculate that the observed accuracy gap is caused by errors in annotations and that the actual gap is significantly smaller. We point out systematic inconsistencies in the annotations of popular line detection benchmarks, Wireframe and York Urban, carefully reannotate a subset of images and show that (i) existing detectors achieve higher quality on the updated annotations without retraining, suggesting that the new annotations correlate better with the notion of correct line segment detection; (ii) the gap between the accuracies of our detector and others shrinks to a negligible 0.2 F_H, with our method being the fastest.


I. INTRODUCTION
AUTOMATIC general-purpose line segment detection is a long-standing computer vision problem of high practical importance. Line segment detectors are used to construct an intermediate representation of image contents in visual recognition systems over a wide range of applications, such as autonomous vehicle localization [1]-[3], infrastructure maintenance with a UAV [4], [5], and document recognition [6], [7].
Traditionally, the problem of line segment detection was approached with so-called model-based algorithms [8]-[11]. These algorithms operate by searching an image for elements that satisfy an explicit definition of a salient line segment, for example, "line segment is a strip-like set of image pixels with similar gradients" [8], [9] or "an image region is a line segment if its contour map triggers a peak in Hough space" [10], [11]. These algorithms typically have the benefits of being fast and having interpretable parameters. However, they may miss segments that are salient to a human but for some reason do not match the explicitly implemented definition. They are also prone to over-segmentation (splitting a single segment into parts) and sometimes demand nontrivial problem-specific postprocessing [12], [13].
The troubling problem of formulating an explicit criterion that matches the human expectation of what exactly constitutes a "salient line segment" can be avoided by manually annotating images and training a CNN, which then learns an implicit algorithm from data samples. This approach yields the best accuracy in the line segment detection task today [14]-[20]. Skipping ahead, let us note that such annotation is not a simple task either: existing line segment detection datasets have numerous and sometimes extreme internal inconsistencies, probably caused by the inherent ambiguity of the task, the lack of clear labeling instructions, and the tediousness of the task, which leads to missed segments.
From a technical perspective, there is also a challenge in designing a CNN-based line detector. The typical solution is that the CNN constructs an intermediate representation (an encoding), which is then converted into a set of answers by some hand-crafted algorithm. Object detection networks have solved this problem by using so-called anchors [21] and, later, simpler anchorless detectors such as FCOS [22]. But line segment detectors need alternative encodings, suitable not for bounding boxes but for line segments. The encodings should deal effectively with the fact that the line segments to be detected in a typical image intersect each other a lot, while encodings for bounding boxes are effective only when "overlapping mostly happens between objects with considerably different sizes" [22], making them hard to exploit for line segment detection. While object detector encodings have, after many iterations of refinement, become fast and elegant, we believe that the intermediate representations of most line detectors used today are still either imprecise, slow or unintuitive.
So whether it is the complexity of the interpreter or the sheer weight of the CNN backbone, CNN-based detectors that outperform the traditional ones in accuracy are also computationally harder [20]. Their complexity limits the scope of application of such algorithms in cases where speed, energy consumption, or hardware price are critical.
In this work we propose a fast yet accurate CNN-based line segment detection algorithm, LSDNet, built on the basis of a widely used model-based detector, LSD [8]. An overview of LSDNet is presented in Figure 1. The first step of LSD is the calculation of the image gradient's orientation and magnitude. We view this step as the estimation of an intermediate representation composed of a line segment heatmap, estimated as the gradient's magnitude, and a tangent field, estimated as the gradient's orientation. We substitute this step with a lightweight CNN that generates the heatmap and tangent field from a more diverse and complex set of features than a simple gradient; the second step, the conversion of the intermediate representation into a set of line segments, is taken from LSD almost as is. This substitution boosts the accuracy of LSD and simplifies the postprocessing thanks to the more accurate heatmap and tangent field. Generating the heatmap and tangent field is a relatively simple task (the answer can be correctly inferred from the local context), which makes it possible to use a lightweight CNN.
We discuss considerable inconsistencies (Section IV) in the ground truth labeling of the current benchmarks for line segment detection accuracy, the Wireframe [23] and YorkUrban [24] datasets. The labeling in these datasets is inconsistent not only between images: within the very same image, many segments that are similar in appearance are often marked up differently, some as positives and others as negatives. We speculate that these datasets in their current state are flawed for assessing the accuracy of general-purpose line segment detectors.
So, in addition to measuring the accuracy of our detector on these datasets, we select and reannotate a part of the Wireframe dataset, consisting mainly of simple images with less ambiguity, and show that the reannotated subset correlates better with the conventional notion of correct line segment detection. Specifically, it narrows the accuracy gap between LSD and CNN-based approaches, and makes the gap between the proposed LSDNet and L-CNN [18], one of the most accurate CNN-based approaches, negligible.

II. RELATED WORK
In this section we cover the model-based LSD algorithm [8] and the existing CNN-based approaches to the problem of line segment detection.

A. LSD DETECTOR
LSD is one of the most popular general-purpose line segment detectors and serves as a common baseline for CNN-based algorithms in both accuracy and speed [15]-[17].
The first step of the LSD detector is gradient calculation; then the algorithm builds so-called line support regions (LSRs) [25]. LSRs are image segments (not to be confused with line segments) spanning actual line segments in an image. They are built by iteratively grouping neighbouring pixels with high gradient magnitudes and similar orientations. After the initial LSRs are formed, they undergo several steps of filtering and refinement. These include, among others, splitting LSRs of "hockey stick" shape, which decouples distinct but merged regions, and removing LSRs with a high deviation of gradient orientations, which are possibly false positives. Typical resulting LSRs are long, straight and a few pixels thick. Finally, each such region is individually encoded as a pair of points, that is, a line segment.
The LSD detector is fast, reaching 185 FPS for 320 × 320 images on a conventional CPU, and provides an accuracy of 63.3 F_H on the Wireframe dataset.

B. CNN-BASED APPROACHES
The CNN-based approaches are typically composed of two modules: the CNN itself predicts an intermediate representation, then a postprocessing module reconstructs line segments from this representation.
We consider the design of the intermediate representation to be the key growth point of CNN-based line segment detectors, since the desired detector output, a set of line segments of unknown size with potentially high overlap, is hard to represent as a "CNN-friendly" fixed-shape tensor [26]. The survey below covers the most popular intermediate representations used in existing CNN-based approaches.
The first CNN-based detectors represented line segments in an image as a set of endpoints and their connectivity graph [18], [23]. The endpoints were detected as local maxima of a CNN-produced heatmap. The connectivity of the endpoints was deduced either with the help of edge map heuristics [23] or with a trainable classifier [18]. To generate the classifier's input, a fixed number of uniformly spaced points was sampled from the feature map between the endpoints. This operation was called LoI pooling [18].
In [14] a representation called an attraction field was proposed (the distance field [19] is a similar concept). Line segments were represented as a 2D vector field of translations to the nearest line segment. The postprocessing step for such a representation required nontrivial extraction of line segments from a heatmap-like prediction. Interestingly, the postprocessing of even a perfect attraction field generated from dataset annotations did not provide perfect detection accuracy [14]; in other words, such a representation is inherently ambiguous.
The endpoint and attraction field representations covered above were combined and modified in [15]. The proposed CNN predicted both the endpoints and the attraction field, which was enriched to encode the translations to both endpoints of the segment as a 4D vector field. Then the endpoints and the attraction field were used to refine each other. The refined line segment proposals underwent final verification with the help of LoI pooling and a trainable classifier. The detector provides the state-of-the-art quality of 83 F_H on the Wireframe [23] dataset to date, but at a low speed of 33 FPS on GPU.
The recently proposed "tri-points" representation [17] is focused on speeding up the detector. A line segment was represented by its center point and two vectors to its endpoints. This significantly boosted the speed, up to 50 FPS, driven by a much faster postprocessing that requires only a trivial conversion from a "tri-point" to a line segment and non-maximum suppression. The CNN itself remained comparably slow. In [20] some further enhancements were proposed. A lightweight CNN was designed, and the training procedure was improved by augmentations and a more sophisticated loss function. It resulted in the fastest CNN-based detector to date, with 200 FPS overall and 241 FPS for the standalone CNN.

III. PROPOSED APPROACH
A perfect line segment representation should make it possible to design a CNN and a postprocessing module that are both fast and accurate. We argue that in the search for such a representation there is no need to develop a brand new one from scratch; instead, the representation used implicitly by the LSD detector, a line segment heatmap and a tangent field, already possesses all the desired properties. Indeed, the heatmap and tangent field can be inferred from the local image context and do not require reasoning over complex abstract features, which makes it possible to use a lightweight CNN. On the other hand, as proven by LSD, the representation can be efficiently postprocessed into actual line segments. In the next sections we cover each step of the algorithm in detail.
VOLUME 4, 2016

A. CNN
Line Segment Representation. We represent a set of line segments in an image as a 2-channel feature map of the same height and width as the image. The first channel, denoted by M, contains a line segment mask. The second channel, denoted by F, contains the tangent vector field of the line segments.
Since the values of F are unit vectors, each of them is fully determined by its angle, so we encode F as a one-channel feature map of angles.
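Let $l = (x_1, y_1, x_2, y_2)$ be a line segment, $\phi_l$ the segment's level line angle in the range $[0, \pi)$, $p = (x, y)$ an image pixel, and $L_p$ the set of line segments passing through $p$. Then

$$M(p) = \begin{cases} 1, & |L_p| > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad F(p) = \phi_l, \quad l \in L_p.$$

Note that in the case of overlapping segments ($|L_p| > 1$), $F(p)$ is defined by one arbitrary segment of the overlap. This ambiguity can probably be ignored, since only about 1% of line segment pixels lie on overlaps, as measured on the Wireframe dataset [23].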
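As an illustration of this encoding, the following minimal sketch rasterizes annotated segments into the two channels. It assumes one-pixel-thick segments and resolves overlaps by letting the last written segment win; the function name `encode_segments` and the sampling scheme are hypothetical and not taken from the paper.

```python
import numpy as np

def encode_segments(segments, height, width):
    """Rasterize segments (x1, y1, x2, y2) into mask M and angle field F."""
    M = np.zeros((height, width), dtype=np.float32)
    F = np.zeros((height, width), dtype=np.float32)
    for x1, y1, x2, y2 in segments:
        # Level line angle folded into [0, pi): the direction sign is irrelevant.
        phi = np.arctan2(y2 - y1, x2 - x1) % np.pi
        # Sample the segment densely enough to touch every pixel it crosses.
        n = int(max(abs(x2 - x1), abs(y2 - y1))) + 1
        xs = np.rint(np.linspace(x1, x2, n)).astype(int)
        ys = np.rint(np.linspace(y1, y2, n)).astype(int)
        keep = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        M[ys[keep], xs[keep]] = 1.0
        # Assumption: on overlaps the last segment written wins (arbitrary choice).
        F[ys[keep], xs[keep]] = phi
    return M, F

# A horizontal and a vertical segment sharing one pixel.
M, F = encode_segments([(0, 0, 4, 0), (2, 0, 2, 3)], height=5, width=5)
```

The shared pixel carries the angle of only one of the two segments, which is the overlap ambiguity mentioned in the text.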
Loss Function. The loss function $L = L_{mask} + \alpha L_{field}$ used to fit the network is composed of two independent weighted terms, one responsible for the mask M, the other for the tangent vector field F.
While the prediction of the mask M is a straightforward segmentation problem, with $L_{mask}$ being a conventional cross-entropy loss, to correctly estimate the error in the prediction of the tangent field F we should account for the following property of line segments' level line angles: the distance between the angles 0° and 10° should be equal to the distance between the angles 0° and 170°.
This problem can be approached by computing several distances between the original angles and the angles shifted by ±π, and picking the minimum distance [27]. The calculation can be made simpler: let $\phi_1, \phi_2$ be the angles between which the distance is to be computed, and $z_1 = e^{i\phi_1}$, $z_2 = e^{i\phi_2} \in \mathbb{C}$ the representations of these angles as complex numbers with unit length and phases $\phi_1, \phi_2$; then

$$\rho(\phi_1, \phi_2) = |z_1^2 - z_2^2| = \left| e^{2i\phi_1} - e^{2i\phi_2} \right|. \qquad (1)$$

This distance function has a geometrical interpretation, illustrated in Fig. 2a: it equals the $\ell_2$-norm of the difference between the unit vectors with phases $2\phi_1$ and $2\phi_2$. Turning to the aforementioned example, the doubled phase makes the vectors corresponding to the angles 10° and 170° equally close to the horizontal vector corresponding to the angle 0°. Given the angle distance function $\rho$, the predicted field $F_p$, the reference mask $M_t$ and the reference field $F_t$, the tangent vector field loss $L_{field}$ is defined as

$$L_{field} = \frac{\sum_{p} M_t(p)\, \rho\big(F_p(p), F_t(p)\big)}{\sum_{p} M_t(p)},$$

where $M_t$ is the reference line segment mask; essentially, the loss is the average tangent angle discrepancy over the pixels that correspond to the ground truth line segments.
CNN Architecture. To predict the proposed feature map, we use a CNN of the U-Net [28] family. The architecture we use differs from the original one in the following simplifications. We use padded convolutions, which keep the input and output spatial sizes of convolutional layers the same and thus remove the need to crop the feature maps fed to skip connections. Instead of transposed convolutions we use bilinear upsampling. We reduce the depth of the encoder and decoder branches to 3 max-pooling and 3 upsampling layers, respectively, and use fewer filters in convolutional blocks: 16, 32, 64, 128 filters per block (the number of blocks is greater than the number of max-pooling layers by one). The resulting CNN has ≈ 0.5M trainable parameters and can run at 48 FPS on CPU and at 695 FPS on GPU (refer to Section V for benchmarking details).
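The doubled-phase distance and the masked field loss described in the Loss Function paragraphs can be sketched in NumPy as follows. This is an illustrative sketch rather than the authors' training code (which is implemented in TensorFlow); the function names and the small constant guarding against an empty mask are assumptions.

```python
import numpy as np

def angle_distance(phi1, phi2):
    """Distance between level line angles, invariant to shifts by pi."""
    # Doubling the phase maps phi and phi + pi to the same unit vector.
    return np.abs(np.exp(2j * phi1) - np.exp(2j * phi2))

def field_loss(F_pred, F_true, M_true):
    """Mean angular discrepancy over ground-truth line pixels."""
    rho = angle_distance(F_pred, F_true)
    # Assumption: a tiny floor on the denominator avoids division by zero
    # for images without any annotated segments.
    return float((M_true * rho).sum() / max(M_true.sum(), 1e-8))

# 10 and 170 degrees are equally far from 0 degrees under this distance.
d10 = angle_distance(0.0, np.deg2rad(10.0))
d170 = angle_distance(0.0, np.deg2rad(170.0))
```

The distance ranges from 0 for parallel directions to 2 for perpendicular ones, so no explicit handling of the wrap-around at π is needed.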

B. LINE SEGMENTS RECONSTRUCTION
Let us consider how the predicted segment mask M and tangent field F are converted into the desired output, a set of line segments $((x_1, y_1), (x_2, y_2)), \ldots$. This process has three steps. First, the predicted features are coarsely segmented into lines and background (section "Foreground segmentation"). Second, the lines are finely segmented into several line support regions (LSRs) (section "Region grouping"). Finally, a line segment and its confidence are extracted from each LSR (section "Line segments extraction").
Foreground segmentation. The first step of line segment reconstruction is to segment the foreground (lines) from the background (not lines).
We use a coarse-to-fine binarization approach, multiplying the masks of global thresholding $M(p) > \tau$ and local thresholding

$$M(p) - \sum_{d \in K_p} W_p(d)\, M(d) > \theta,$$

where $\theta$ is a threshold, $K_p$ is a window centered at pixel $p$, and $W_p(d)$ is a Gaussian averaging weight. Global thresholding with a small threshold gives a coarse extraction of the line segment mask, but often incorrectly joins close but separate line segments. Local thresholding, on the contrary, provides a much finer local distinction between line segments, but can produce clumps of false positive detections in low-intensity areas (Fig. 3). Combining these binarizations by simple multiplication of the resulting masks filters out false positives of both types and achieves better accuracy.
Region grouping. The goal of this step (a modification of a similar step of the LSD algorithm [8]) is to split the foreground, binarized at the previous step, into narrow strip-like LSRs, one per true line segment.
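The combination of global and local thresholding can be sketched as below. This is a minimal NumPy-only illustration: the local rule compares each pixel with a Gaussian-weighted neighbourhood average, which is one plausible reading of the description, and the default values of `tau`, `theta` and `sigma` are hypothetical, not the values used in the paper.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with edge replication (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Convolve rows, then columns; "valid" mode restores the original shape.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, "valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, "valid")

def foreground_mask(M, tau=0.1, theta=0.0, sigma=2.0):
    """Product of a global threshold mask and a local (adaptive) one."""
    global_mask = M > tau                                # coarse extraction
    local_mask = (M - gaussian_blur(M, sigma)) > theta   # fine local contrast
    return global_mask & local_mask

# A single bright horizontal line on an empty heatmap.
M = np.zeros((21, 21))
M[10, :] = 1.0
B = foreground_mask(M)
```

In flat low-intensity areas the local mask alone could fire on small fluctuations; the global mask suppresses those, which is the motivation given in the text for multiplying the two.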
Informally, we want neighbouring pixels $p_1, p_2$ to be assigned to one LSR if the values of $M(p_1), F(p_1)$ and $M(p_2), F(p_2)$ are similar. The algorithm grows LSRs iteratively, starting from the pixels with the highest M value and adding to an existing LSR new pixels that are geometrically close to its pixels and have similar features.
Let us formally introduce the similarity measure used to decide whether a pixel g is fit to be joined to an LSR. Let $R = \{p_1, p_2, \ldots, p_n\}$ be the set of pixels of this LSR, $I_R = \frac{1}{n} \sum_{p \in R} M(p)$ the mean line segment mask over it, and $\varphi_R = \angle \sum_{p \in R} e^{2iF(p)}$ the average tangent field (here $\angle z = \operatorname{atan2}(\operatorname{Im}(z), \operatorname{Re}(z))$ is the phase of a complex number). The similarity function between g and R is a sum of two terms: the first measures the similarity of tangent field orientation (refer to Eq. (1) for details), the second measures the similarity of the line mask and is weighted by a coefficient $\alpha$. Given this distance function, LSRs are built with an iterative growing algorithm, presented in Algorithm 1.
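A simplified sketch of such a growing procedure is given below. It is not a reproduction of Algorithm 1: the 8-neighbourhood, the seed order, the exact form of the dissimilarity (doubled-phase angle distance plus `alpha` times the mask difference, compared against `tau`) and all default values are assumptions made for illustration.

```python
import numpy as np

def grow_regions(M, F, B, tau=0.5, alpha=1.0, min_size=3):
    """Greedy LSR growing over foreground B; returns a list of pixel lists."""
    h, w = M.shape
    used = ~B.astype(bool)              # background pixels are never visited
    order = np.argsort(-M, axis=None)   # seeds by decreasing mask value
    regions = []
    for idx in order:
        sy, sx = divmod(int(idx), w)
        if used[sy, sx]:
            continue
        used[sy, sx] = True
        region = [(sy, sx)]
        z_sum = np.exp(2j * F[sy, sx])  # running sum of doubled-phase vectors
        m_sum = float(M[sy, sx])        # running sum of mask values
        i = 0
        while i < len(region):
            y, x = region[i]
            i += 1
            for ny in range(max(y - 1, 0), min(y + 2, h)):
                for nx in range(max(x - 1, 0), min(x + 2, w)):
                    if used[ny, nx]:
                        continue
                    # Mean region direction as a unit vector (doubled phase).
                    z_mean = z_sum / abs(z_sum) if abs(z_sum) > 0 else z_sum
                    d_angle = abs(np.exp(2j * F[ny, nx]) - z_mean)
                    d_mask = abs(float(M[ny, nx]) - m_sum / len(region))
                    if d_angle + alpha * d_mask < tau:
                        used[ny, nx] = True
                        region.append((ny, nx))
                        z_sum += np.exp(2j * F[ny, nx])
                        m_sum += float(M[ny, nx])
        if len(region) >= min_size:     # drop regions that are too small
            regions.append(region)
    return regions

# Two touching strokes with perpendicular tangents end up in separate LSRs.
M = np.zeros((7, 7)); F = np.zeros((7, 7))
M[1, :] = 1.0
M[2:7, 3] = 1.0; F[2:7, 3] = np.pi / 2
regions = grow_regions(M, F, M > 0.5)
```

Because the orientation term dominates when tangents differ, adjacent strokes with different directions are kept apart even though their mask values are identical.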
Algorithm 1
input : Line heatmap $M \in [0, 1]^{h \times w}$, tangent field $F \in [0, \pi)^{h \times w}$
output: Set of line support regions $\mathcal{R} = \{R\}$
param : $\tau \in \mathbb{R}$, distance threshold
param : $n \in \mathbb{N}$, minimum region size
$B \in \{0, 1\}^{h \times w}$, foreground segmentation of M;
[...]
if $|R| < n$ then remove R from $\mathcal{R}$; end
end

Line segments extraction. Each LSR $R = \{p_1, p_2, \ldots, p_n\}$ should be converted into a line segment satisfying the following criteria.
• The segment goes through the LSR's center of mass $p_\mu$.
• The segment is collinear with the minor eigenvector a of the region's inertia tensor I.
• The segment spans the furthest points of the LSR, projected onto the axis a.
The segment's confidence is the mean value of M over the region R. Line segment extraction is visualized in Figure 2b.
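The three extraction criteria can be sketched as follows. The sketch assumes an unweighted 2D inertia tensor about the center of mass (the exact definition used in the paper is not reproduced here) and a region given as a list of integer `(y, x)` pixels; the function name is hypothetical.

```python
import numpy as np

def extract_segment(region, M):
    """Fit a line segment to an LSR given as a list of (y, x) pixels."""
    ys, xs = np.array(region, dtype=float).T
    cx, cy = xs.mean(), ys.mean()            # centre of mass p_mu
    dx, dy = xs - cx, ys - cy
    # Assumption: unweighted inertia tensor about the centre of mass.
    inertia = np.array([[np.sum(dy * dy), -np.sum(dx * dy)],
                        [-np.sum(dx * dy), np.sum(dx * dx)]])
    # eigh sorts eigenvalues in ascending order, so column 0 is the minor axis,
    # i.e. the direction along which the region is elongated.
    _, vecs = np.linalg.eigh(inertia)
    ax, ay = vecs[:, 0]
    t = dx * ax + dy * ay                    # pixel projections onto the axis
    p1 = (cx + t.min() * ax, cy + t.min() * ay)
    p2 = (cx + t.max() * ax, cy + t.max() * ay)
    confidence = float(np.mean([M[y, x] for y, x in region]))
    return p1, p2, confidence

# A five-pixel horizontal strip on row 2, columns 1..5.
p1, p2, conf = extract_segment([(2, x) for x in range(1, 6)], np.ones((5, 7)))
```

The sign of an eigenvector is arbitrary, so which endpoint comes out as `p1` may vary; the pair of endpoints is the same either way.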

IV. DATASET
In this section we analyze the issues of the existing line segment detection datasets and propose a dataset Wireframe-tiny++, a subset of Wireframe dataset [23] with refined annotations.

A. THE EXISTING DATASETS
To the best of our knowledge, there are two widely used public line segment detection datasets: Wireframe [23] and YorkUrban [24]. The former is composed of 5,000 training and 462 test images, the latter of 120 test images. The datasets contain both indoor and outdoor colour images of various man-made environments. Some samples from the datasets are presented in Figures 4 and 5. The datasets are annotated with a list of point pairs representing line segments.
The YorkUrban dataset was annotated under the so-called Manhattan world assumption [29], which means that the annotated line segments are those aligned with the basis of some Cartesian coordinate system (specifically, with axes parallel to image sides), while the others are ignored. The Wireframe dataset did not follow the Manhattan world assumption and was annotated with line segments from which "meaningful geometric information of the scene can be extracted" [23], which also resulted in some salient line segments not being annotated.
So, the ground truth labeling in these datasets is explicitly limited to some category of line segments, which means that other categories of line segments are viewed as negatives. CNNs that are trained on these datasets (and/or reach high accuracy on them) have to systematically classify these categories of line segments as negatives and therefore, by design, cannot be viewed as general-purpose line detectors.
While such annotations can be useful to train and test some specific niche line segment detectors (e.g. for indoor robot navigation [30]), we believe they are flawed as datasets for general-purpose line segment detection. Specifically, we would like to highlight the following problems (also illustrated in Fig. 4b).
One problem is the inconsistency of the annotations. It is easily noticeable on strip-like objects whose two side line segments look almost exactly alike, yet one of them is annotated while the other is not.
Some categories of salient line segments, e.g. shadows and reflections, are systematically not annotated. We believe the fact that these segments are not "real" physical objects should not matter in the context of general-purpose line segment detection, and these segments should be annotated as well.
Finally, some line segments lying on the same straight line are falsely merged (vertical segments on the bed canopy's frame in Fig. 4b). This happens when a long line segment is interrupted by another object. Although for some applications it could be desirable to avoid such splitting, and there are approaches to achieve that [12], [13], we believe that for a general-purpose detector splitting is the desired behaviour.

B. WIREFRAME-TINY++
To address the issues with the existing datasets covered above, we selected 20 random images from the Wireframe test subset and reannotated them to make the annotations more accurate and consistent. We call the selected subset of images with the original markup Wireframe-tiny, and the resulting dataset with enhanced annotations Wireframe-tiny++.
Compared to the original annotations, we mainly added unannotated segments, 9 per image on average. Some segments were removed as undetectable. Some segments were divided into several smaller segments due to occlusion. The refined annotations are presented in Fig. 4c.

V. EXPERIMENTAL SETTING
Datasets. The proposed algorithm is trained and evaluated with the following datasets. The Wireframe dataset [23], consisting of 5,000 training and 462 test images, is used both to train and to evaluate LSDNet. The datasets YorkUrban [24], Wireframe-tiny and its reannotated version Wireframe-tiny++ (refer to the previous section for details), composed of 120, 20 and 20 images respectively, are used solely for evaluation.
Accuracy. To evaluate LSDNet accuracy we use the standard [15], [18] quality score $F_H = 100 \cdot 2 \cdot p \cdot r / (p + r)$, where p and r stand for precision and recall. The multiplier 100 is added for readability, making F_H fall in the [0, 100] range. The score is evaluated pixel-wise by rasterizing both the predicted and the reference line segments. A pixel of a predicted line segment is considered a true positive if its distance to a pixel of a reference segment does not exceed 1% of the image diagonal. For evaluation we use the F_H implementation provided with L-CNN [18].
The quality score F_H was criticised [18] for not being sensitive to overlapping and split line segments. We consider this insensitivity not critical: LSDNet cannot produce overlapping segments by design, since line support regions (LSRs) cannot overlap, and we did not observe a notable number of split line segments for any CNN-based algorithm.
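The score can be illustrated with the simplified sketch below, which works on lists of already rasterized pixel coordinates. It is not the L-CNN evaluation code used in the paper: the brute-force nearest-pixel search, the symmetric matching rule used for recall, and the function names are assumptions.

```python
import numpy as np

def match_fraction(src, dst, max_dist):
    """Share of src pixels lying within max_dist of some dst pixel."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    if len(src) == 0 or len(dst) == 0:
        return 0.0
    # Pairwise Euclidean distances between all src and dst pixels.
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= max_dist))

def f_h_score(pred, ref, height, width, tol=0.01):
    """F_H = 100 * 2pr / (p + r) with a tolerance of 1% of the diagonal."""
    max_dist = tol * np.hypot(height, width)
    p = match_fraction(pred, ref, max_dist)   # precision
    r = match_fraction(ref, pred, max_dist)   # recall (symmetric rule assumed)
    return 0.0 if p + r == 0 else 100.0 * 2.0 * p * r / (p + r)

# Half of the predicted pixels match; three of four reference pixels are covered.
pred = [(0, 0), (0, 1), (50, 50), (50, 51)]
ref = [(0, 0), (0, 1), (0, 2), (0, 3)]
score = f_h_score(pred, ref, height=100, width=100)
```

With precision 0.5 and recall 0.75 the example evaluates to 100 · 2 · 0.5 · 0.75 / 1.25 = 60.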
Speed. The CNN is benchmarked on a Quadro GV100 GPU for comparison with other detectors. The reconstruction algorithm is benchmarked on a Core i5 9300hf CPU. The CNN and the reconstruction algorithm are benchmarked independently. The speed of the latter depends on its input; we report the average speed over the dataset given the trained preprocessing network.
The reported FPS for all CNN-based methods is cited from [20], where benchmarking was performed on a Tesla V100 GPU with practically the same characteristics as the GPU used in our experiments.
The reported F_H is also cited from [20] for all methods except LSD and L-CNN [18], for which we reproduced it ourselves. We tried to reproduce the stated quality measurements for the other approaches with the help of their open-source implementations, but the results were notably lower than the reported ones. Therefore, on the Wireframe-tiny and Wireframe-tiny++ datasets we compare LSDNet only to L-CNN and LSD.
Preprocessing. For LSDNet, all images are resized to 288 × 288, which appeared to be the optimal input shape in terms of the speed-accuracy tradeoff. Pixel intensities are simply converted from 8-bit unsigned integer to 32-bit floating-point with a scaling coefficient of 1/255. During training, random horizontal and vertical flips and gamma correction are applied.
For baseline methods, the preprocessing from the corresponding paper is applied. For LSD, we use a 320 × 320 image shape.
Hyperparameters. LSDNet is initialized with He uniform initialization [31] and trained with the Adam optimizer [32] with a weight decay of $10^{-4}$ and 8 images per batch for 180 epochs. The initial learning rate is $10^{-3}$ and is halved if the value of the loss function does not improve for 15 epochs.
Implementation. CNN training and inference are implemented in TensorFlow [33] and ONNX Runtime [34], respectively. The reconstruction algorithm is implemented in C++ with the help of OpenCV [35].
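The learning-rate rule described under Hyperparameters can be sketched in plain Python as follows. The class is a hypothetical stand-in for whatever scheduler the authors used; in particular, which loss is monitored (training or validation) and whether the patience counter resets after each reduction are assumptions.

```python
class HalveOnPlateau:
    """Halve the learning rate when the loss stops improving."""

    def __init__(self, lr=1e-3, patience=15, factor=0.5):
        self.lr = lr
        self.patience = patience
        self.factor = factor
        self.best = float("inf")   # best loss seen so far
        self.stale = 0             # epochs since the last improvement

    def step(self, loss):
        """Register the loss of one epoch and return the learning rate."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0     # assumption: counter restarts after a cut
        return self.lr

# The rate stays at 1e-3 while the loss improves and halves after a plateau.
scheduler = HalveOnPlateau()
rates = [scheduler.step(1.0) for _ in range(16)]
```

In the example the first epoch sets the best loss, the following 15 epochs bring no improvement, and the last call returns the halved rate.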

VI. RESULTS AND ANALYSIS
In this section we quantitatively and qualitatively analyze LSDNet performance and compare it to a wide range of state-of-the-art line segment detectors. Please refer to Section V for evaluation and comparison details.
Compared to the fastest detector outperforming LSDNet in accuracy on the Wireframe dataset, M-LSD, the proposed approach is approximately two times faster, with 214 FPS against 115 FPS. The fastest algorithm to outperform LSDNet on both datasets is TP-LSD, which is approximately four times slower, with 49 FPS.
Custom datasets. Table 2 summarizes the results on the Wireframe dataset, its subset Wireframe-tiny and the reannotated version of the subset, Wireframe-tiny++. Please refer to Section IV-B for details.
On Wireframe-tiny++, all the detectors demonstrate higher accuracy than on Wireframe-tiny. Since these datasets are composed of the same images and differ only in annotations, such a consistent accuracy growth indicates that the Wireframe-tiny++ annotation is more suitable for the problem of general-purpose line segment detection.
All the approaches, when arranged by F_H, show the same relative order on all the datasets, but the absolute differences change significantly. The gap between LSDNet and L-CNN shrinks from 3.7 F_H on Wireframe to a negligible 0.2 F_H on Wireframe-tiny++. We believe this could be explained by the different learning capacity of the detectors' CNNs. The expressive L-CNN with 9.8M parameters managed to learn the subtle notion of a line segment implied by the annotation of the Wireframe training set (discussed in Sec. IV-A), whereas the lightweight LSDNet with only 0.5M parameters learned general line segment detection, with no capacity to learn the subtle details. This makes L-CNN good for wireframe-like detection problems, where the goal is to detect line segments from which "meaningful geometric information of the scene can be extracted". But it could exhibit confusing properties in general-purpose line segment detection, making LSDNet a better choice in that case.
Qualitative results. Qualitative comparison of LSDNet to other line segment detectors is illustrated in Figure 6. In this section we refer to LSD and LSDNet as LSR-based and to HAWP and M-LSD-tiny as endpoint-based detectors, since the methods within these groups demonstrate similar behaviour.
Endpoint-based detectors demonstrate a selectivity towards line segments that cannot be attributed to the overall saliency of the segments. This effect is most notable in the foreground of the right column in Figure 6. The LSR-based LSD detects all the shadows and the carpet in front of the sofa, while it cannot "see" the floor tiles due to their low contrast. LSDNet detects all the shadows, the carpet and the floor tiles. The endpoint-based detectors HAWP and M-LSD-tiny, in contrast, detect these objects poorly, but at the same time they detect far less salient segments in the background. We believe such selectivity could be attributed to the combination of the high expressive power of the underlying CNN and the annotation inconsistencies of the training dataset, discussed in Section IV-A. This effect could be undesirable in an application requiring the very class of segments that is missed by endpoint-based detectors.
Another interesting difference between the groups of detectors lies in the types of misdetections. In terms of the quality measure, the LSR-based LSD and LSDNet can produce false positives, typically corresponding to line-segment-like patterns in highly structured image regions, and miss some annotated segments (false negatives), which are usually poorly visible. These errors could be, at least partially, attributed to the ill-posed nature of the task. The endpoint-based methods are also prone to missing poorly visible segments, and have the advantage of not producing false positives in structured image regions. However, a potential drawback of endpoint-based detectors is that they can produce hard false positives: line segments with a high confidence score and no evidence of a true line segment in the image. We believe the reason for hard false positives is a classification error of the line segment verification module. An example can be seen in the middle column in Figure 6; note the salient diagonal segments in the bedhead (fourth row) and in the right part of the carpet (third row). This issue has to be addressed before an endpoint-based detector can be used successfully.

VII. CONCLUSION
In this study we introduce a fast and accurate line segment detector, LSDNet. The detector is composed of a lightweight encoder-decoder CNN, which predicts a line segment heatmap and tangent field, and a postprocessing module, a modification of the famous LSD algorithm. When benchmarked on the traditional Wireframe dataset against several SOTA methods, LSDNet shows the highest FPS of 214, though its detection accuracy of 78 F_H is lower than that of the best methods (82 and 83.1 F_H). However, we speculate that this gap in detection accuracy is primarily caused by the imperfections of the dataset rather than by the network itself. We analyze the commonly used line segment detection datasets, Wireframe and York Urban, and point out numerous and significant inconsistencies in their annotation. By carefully reannotating a part of the Wireframe test dataset, we show that (i) all detectors demonstrate better quality on the improved annotations (without any retraining), which indicates that the refined annotations correlate better with the notion of correct line segment detection; (ii) the gap between the accuracies of our detector and others is reduced to almost nothing, with our method being the fastest.

VIII. ACKNOWLEDGEMENTS
We would like to thank Marina Tepliakova for making the illustrations; Alexey Savchik and Veniamin Blinov for reviewing the early versions of the manuscript; Dmitry Nikolaev for his strong belief in the power of fusing model-based algorithms with light-weight neural networks, which inspired this work.