FASTDLO: Fast Deformable Linear Objects Instance Segmentation

In this paper, an approach for fast and accurate segmentation of Deformable Linear Objects (DLOs) named FASTDLO is presented. A deep convolutional neural network is employed for background segmentation, generating a binary mask that isolates DLOs in the image. Thereafter, the obtained mask is processed with a skeletonization algorithm and the intersections between different DLOs are solved with a similarity-based network. Apart from the usual pixel-wise color-mapped image, FASTDLO also describes each DLO instance with a sequence of 2D coordinates, making it possible to model the DLO instances with, for example, spline curves. Synthetically generated data are exploited for training the data-driven methods, avoiding the expensive collection and annotation of real data. FASTDLO is experimentally compared against both a DLO-specific approach and general-purpose deep learning instance segmentation models, achieving better overall performance and a processing rate higher than 20 FPS.


I. INTRODUCTION
Deformable Linear Objects (DLOs) are a special subgroup of deformable objects that includes cables, wires, ropes, suture threads and elastic tubes [1]. Although DLOs are widespread in both domestic and industrial environments, very few robotic systems are currently deployed in scenarios that require a proper perception of DLOs. Indeed, the manufacturing and assembly industries working with wires and wiring harnesses still largely rely on human labor, whereas the introduction of robotic solutions is only discussed and studied at a research level, e.g. in the automotive [2] and aerospace [3] industries. This deployment gap is due to the lack of a stable, efficient and accurate approach for the perception of this class of objects.
In this paper, an algorithm named FASTDLO (FAst SegmenTation of Deformable Linear Objects) is presented for reliable, accurate and fast instance segmentation of DLOs, assuming no knowledge about the background or the number of objects in the scene. FASTDLO takes an image as input and outputs a colored mask in which each DLO instance is denoted with a unique color. In addition, FASTDLO outputs a sequence of key-points for each DLO instance. Thus, the DLO instances can be modeled by means of, for example, spline curves, providing a useful description for robotic manipulation tasks. From the input image, the background, i.e. pixels not corresponding to a DLO-like object, is removed by means of a Deep Convolutional Neural Network (DCNN), which generates a binary mask. Thereafter, the binary mask is processed with a skeletonization algorithm and the ambiguous intersections between the DLOs are solved with a second data-driven approach based on a shallow similarity-based neural network. Synthetically generated data are used to train the learning-based methods, allowing fast adaptation to custom scenarios. FASTDLO achieves a processing rate higher than 20 Frames-Per-Second (FPS) with an image size of 640 × 360 pixels on a workstation with an Intel Core i9-9900K CPU clocked at 3.60 GHz and an NVIDIA GeForce GTX 2080 Ti. PyTorch 1.4 is used for the software implementation.
To summarize, the main contributions of this paper are:
• A reliable and efficient method for the instance segmentation of DLOs in images without assumptions about the type of background and the number of objects in the scene;
• Deployment of synthetic data for all the data-driven approaches involved in the proposed method, enabling a faster adaptability to specific use-cases;
• Exploitation of similarity learning to discern the DLO instances at intersection-level employing both appearance-based and topological features;
• Better overall performance in terms of speed and accuracy compared to existing methods available in the literature, both DLO-specific and general-purpose ones.
The source code implementing FASTDLO and the associated data are available at https://github.com/lar-unibo/fastdlo.

II. RELATED WORKS
The importance of DLOs in a large variety of applications results in a high interest in solutions that allow their correct and precise identification for different tasks, such as cable manipulation [4] for switchgear and harness manufacturing or DLO shape estimation [5] by means of multiple 2D images. In the past, the problem of DLO identification has been solved in simple settings: in [6] the authors required the presence of a single DLO in the scene and its segmentation was based on a color threshold with a controlled background; in [7] a good contrast between the background and the DLO is again assumed; in [8] a threshold is again applied in a controlled background to segment the cable. Indeed, the major difficulty in DLO identification lies in the simplicity of the object, which does not offer distinctive features to be used for an unambiguous detection. These assumptions about segmentation capability and number of expected instances limit the applicability of the proposed solutions in the real-world scenarios where DLOs are commonly involved. On the contrary, FASTDLO does not make any assumption about the background or the number of DLOs in the scene.
Concerning advanced DLO-specific approaches, the earliest contribution for DLO detection, segmentation and modeling is Ariadne [9], an algorithm based on the over-segmentation of the source image into superpixels, developed to perform DLO segmentation in the case of complex backgrounds. Ariadne+ [10] was recently introduced as an improved version of Ariadne in several respects: better accuracy and efficiency, and the ability to handle even more complex scenarios in which the endpoints of the cable are not present in the image. Currently, Ariadne+ represents the state of the art in DLO instance segmentation. However, its throughput is limited to a few FPS, which strongly limits its applicability in real-world applications. In this regard, instead of a path discovery method based on superpixelization, which requires significant processing effort and the definition of the number of superpixels in an image, FASTDLO focuses directly on solving the intersection areas of the image between multiple DLOs to distinguish the instances, increasing the speed and accuracy of the results. These improvements can be beneficial for DLO tracking problems, which have so far been solved only in partially occluded environments [11], in simulation [12] or with markers attached to the DLO [2], [13].
Regarding other data-driven approaches, the advancements of deep learning in recent years have resulted in several DCNNs tailored to the general problem of instance segmentation, e.g. [14]-[17]. In addition, the segmentation of wires and cables via learning-based methods has been attempted in [18], where a dataset of electric wires obtained with a chroma-key approach is made publicly available. A relevant problem in the application of data-driven approaches to the instance segmentation of DLOs is the lack of good-quality publicly available datasets, compounded by the difficulty of annotating a large set of images. However, some approaches have emerged that tackle this problem with synthetic data generation pipelines [19], [20].
For the sake of completeness, other methods exist that rely on sensors other than cameras for the detection of DLOs such as electrical cables, e.g. the sensorized tactile fingers in [21]. These methods can be useful when occlusions affect the camera view due to tight operating spaces.

III. THE FASTDLO ALGORITHM
The FASTDLO pipeline, schematized in Fig. 1, consists of the following main steps:
A) Background Segmentation: A DCNN performs the segmentation of the source image, discerning background pixels from DLO pixels and outputting a binary mask $M_b$;
B) Skeleton Pixels Classification: A skeleton $M_s$ is generated from the mask $M_b$ and its pixels are classified depending on their local neighborhoods;
C) Segments Generation: The intersection areas of $M_s$ are filtered out and segments are generated;
D) Intersections Processing: A shallow neural network is employed to predict connection probabilities among endpoint-pairs;
E) Informed Merging: The segments are concatenated to recover the full description of each DLO employing the result of intersections processing;
F) Intersections Layout: The standard deviations of the RGB colors of the DLO instances at the intersection level are used to assess the correct ordering at the intersection areas and to create correct instance masks.
These steps are analyzed in detail in the following.

A. Background Segmentation
The generic input image $I_s$ is processed by means of a DCNN performing semantic segmentation, i.e. the task of labeling each pixel of an image with a given class. For the purposes of this paper, just one class (the DLO) is defined, thus the output of the segmentation is simply a binary mask $M_b$ with the DLO pixels labeled in white.
Aiming at the prediction of a reliable binary mask, and given the usual difficulties in data collection and labeling for deep learning applications, a novel pipeline [19] that uses Blender to render realistic images is exploited. A dataset of synthetically generated cables is built by randomizing the shape, radius, color and stripes of the DLOs. A random texture is chosen as background and the scene lighting conditions are randomized as well, creating different combinations of shadows. All these expedients are needed to enhance the generalization capabilities of the network during training. Overall, about 32,000 images were rendered to train the data-driven methods involved throughout this paper. As an example, some generated images are shown in Fig. 2.
As network architecture for the background segmentation, DeepLabV3+ [22] is selected since it provides reliable performance in the context of DLOs, especially along object boundaries, as also demonstrated in [10], [18].
In Fig. 3 an example of the segmentation process on real samples is shown: the shadows present in the input image do not affect the predicted mask and are successfully neglected; instead, the background object in the second row is more difficult to handle since it appears as a thin wire in the image, so a few false positives can be found in the associated mask.

B. Skeleton Pixels Classification
The segmentation mask $M_b$ is processed with a skeletonization algorithm consisting of an iterative thinning approach which erodes the input mask. Thus, a new mask $M_s$ is obtained having the following properties: 1) same connectivity as the input mask; 2) 1-pixel width across the mask instead of the original mask thickness; 3) skeleton equidistant from the borders of $M_b$. In Fig. 3 (last column) the masks $M_b$ and $M_s$ are combined to highlight these properties.
From an image comprising several DLOs, and exploiting the linearity of the latter, only three types of local neighborhood can be encountered for each pixel of the skeleton $M_s$ when a small (3 × 3) kernel is considered. They are depicted in Fig. 4, with the target pixel at the center of the local region highlighted:
• endpoint: only one more pixel is present in the neighborhood, i.e. Fig. 4(a). The considered pixel is placed at the end of a DLO.
• section: two more pixels are present in the neighborhood, i.e. Fig. 4(b). The term section refers to the considered pixel being placed along a section of a DLO and not at its end.
• intersection: the central pixel is surrounded by three more pixels in the neighborhood forming, in general, a characteristic 'Y' shape [6], i.e. Fig. 4(c). In binary masks describing DLOs, this condition occurs when two DLOs cross each other.
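The neighborhood test described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `classify_skeleton_pixels`, the nested-list image representation and the extra `isolated` label for a pixel with no neighbors are assumptions introduced here.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def classify_skeleton_pixels(skeleton: List[List[int]]) -> Dict[Pixel, str]:
    """Label every skeleton pixel by counting skeleton pixels in its 3x3 neighborhood."""
    rows, cols = len(skeleton), len(skeleton[0])
    labels: Dict[Pixel, str] = {}
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            # Count the 8-connected neighbors that belong to the skeleton.
            neighbors = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and skeleton[rr][cc]:
                        neighbors += 1
            if neighbors == 0:
                labels[(r, c)] = "isolated"  # hypothetical extra label, not in the paper
            elif neighbors == 1:
                labels[(r, c)] = "endpoint"
            elif neighbors == 2:
                labels[(r, c)] = "section"
            else:
                labels[(r, c)] = "intersection"
    return labels


# Toy 'Y'-shaped skeleton: two branches joining a vertical stem at (2, 2).
toy_skeleton = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
toy_labels = classify_skeleton_pixels(toy_skeleton)
```

On this toy input the three branch tips are labeled as endpoints, the pixel where the branches meet as an intersection, and all remaining pixels as sections.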

C. Segments Generation
The intersection pixels and their surrounding area are discarded from $M_s$ since they correspond to a topologically misleading region of $M_s$ caused by the crossing of the DLOs. Indeed, this effect can be appreciated in Fig. 3, where the generated skeletons near the intersection pixels do not correctly describe the DLO topology, i.e. the center line, as opposed to the skeleton pixels far away from them. The discard operation is performed based on the distance transform image of the local area considered. The distance transform is an operation that computes the distance, in pixels, from a given pixel location to the nearest boundary [23], i.e. the black pixels of $M_b$.
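As a rough illustration of the two operations just described, the sketch below computes a brute-force Euclidean distance transform and clears the skeleton pixels around an intersection. It is a minimal sketch under stated assumptions: the function names are hypothetical, and the removal radius is passed in explicitly because the text does not state how it is derived from the distance transform.

```python
import math
from typing import List, Tuple

Grid = List[List[int]]


def distance_transform(mask: Grid) -> List[List[float]]:
    """Euclidean distance from each foreground pixel to the nearest background pixel."""
    rows, cols = len(mask), len(mask[0])
    background = [(r, c) for r in range(rows) for c in range(cols) if not mask[r][c]]
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and background:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist


def remove_intersection_area(skeleton: Grid, intersection: Tuple[int, int], radius: float) -> Grid:
    """Return a copy of the skeleton with all pixels within `radius` of the intersection cleared."""
    ir, ic = intersection
    return [
        [0 if value and math.hypot(r - ir, c - ic) <= radius else value
         for c, value in enumerate(row)]
        for r, row in enumerate(skeleton)
    ]


# A 3x3 blob: the center is 2 pixels away from the nearest background pixel.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
blob_dist = distance_transform(blob)

# A plus-shaped skeleton with its intersection at (2, 2).
plus = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
pruned = remove_intersection_area(plus, (2, 2), radius=1.0)
```

After pruning, only the four outermost pixels of the plus remain, and each of them becomes a new endpoint. The brute-force transform is quadratic in the number of pixels and is meant only to make the definition concrete.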
Because of the removal of the intersection areas, new endpoint pixels emerge in the updated skeleton. Thus, segments are generated between two connected endpoints. A segment is defined as an ordered sequence of pixels where the elements inside the sequence are sections whereas the extremities are endpoints. The segment sequence can be obtained efficiently by sliding a 3 × 3 kernel along the skeleton from one of its endpoints, collecting the only pixel not already in the segment under construction and moving the kernel anchor to the added pixel location. In addition, for each segment, a common thickness is estimated based on the distance transform previously computed. The overall segment thickness is obtained by computing the median of the distances gathered for each pixel of the segment. The median provides robustness against spurious values due to noisy boundaries in $M_b$. In Fig. 1 the segments generated for the considered image are denoted with unique colors.
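The tracing and thickness estimation can be sketched as follows. This is a minimal illustration and not the released implementation: `trace_segment` and `segment_thickness` are hypothetical names, and the walk simply takes the first unvisited neighbor, which coincides with "the only pixel not already in the segment" when the skeleton is a clean 1-pixel-wide curve.

```python
from statistics import median
from typing import List, Tuple

Pixel = Tuple[int, int]


def trace_segment(skeleton: List[List[int]], start: Pixel) -> List[Pixel]:
    """Walk along the skeleton from an endpoint, one 3x3 neighborhood at a time."""
    rows, cols = len(skeleton), len(skeleton[0])
    segment = [start]
    visited = {start}
    current = start
    while True:
        r, c = current
        candidates = [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < rows
            and 0 <= c + dc < cols
            and skeleton[r + dr][c + dc]
            and (r + dr, c + dc) not in visited
        ]
        if not candidates:
            return segment  # reached the other endpoint
        current = candidates[0]  # assumed unique on a clean skeleton
        visited.add(current)
        segment.append(current)


def segment_thickness(segment: List[Pixel], dist: List[List[float]]) -> float:
    """Median of the distance-transform values collected along the segment."""
    return median(dist[r][c] for r, c in segment)


toy_segment_image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
# Hypothetical distance-transform values; 9.0 mimics a spurious reading.
toy_dist = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 9.0],
]
toy_segment = trace_segment(toy_segment_image, (0, 0))
toy_thickness = segment_thickness(toy_segment, toy_dist)
```

In the toy data the outlier value 9.0 does not affect the estimate: the median of 1.0, 2.0, 2.0 and 9.0 is 2.0, whereas the mean would be 3.5.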

D. Intersections Processing
The intersections among the DLOs are solved by comparing the feature vectors of the endpoints of two candidate segments via a shallow neural network, i.e. similarity network, predicting the probability of their connection. The computation of the connection probabilities is schematized in Algorithm 1 showing the two main phases: endpoint-pairs collection; similarity network predictions. The inputs of the algorithm are the set of all the intersections in the image, i.e. C, and, given the updated skeleton of Section III-C, the sets of the endpoints and of the segments, i.e. E and S respectively.
The approach starts by collecting all the endpoint-pairs whose connection needs to be evaluated, initializing $P$ as empty (line 1). Thus, for each intersection $c$ to be solved, the endpoints of the segments associated with $c$, i.e. originally connected to $c$ before removing the intersection pixels from the skeleton (see Section III-C), are extracted from $E$ and collected into $E_c$, line 3. Then, the components of $E_c$ are organized into combinations of 2 elements, i.e. endpoint-pairs, in $P_c$, and the set of endpoint-pairs $P$ is updated accordingly, lines 4 and 5.
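The collection phase amounts to enumerating all 2-element combinations of the endpoints around each intersection. A minimal sketch, assuming a hypothetical dictionary that maps each intersection identifier to its endpoint identifiers:

```python
from itertools import combinations
from typing import Dict, List, Tuple


def collect_endpoint_pairs(endpoints_by_intersection: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Build the set P of candidate endpoint-pairs, one intersection at a time."""
    pairs: List[Tuple[str, str]] = []  # P, initialized empty
    for _, endpoints_c in endpoints_by_intersection.items():  # E_c for intersection c
        pairs_c = list(combinations(endpoints_c, 2))  # P_c
        pairs.extend(pairs_c)  # update P
    return pairs


# One intersection between two DLOs leaves four endpoints after pruning.
example_pairs = collect_endpoint_pairs({"c0": ["e0", "e1", "e2", "e3"]})
```

With four endpoints the number of candidate pairs is 4 choose 2 = 6, of which only two correspond to true connections when two DLOs cross.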
The collected endpoint-pairs $P$ are now processed by a similarity network. The goal of this network is to transform an input feature vector into an embedding space where similar input vectors are close together and dissimilar ones are far apart. In the setup adopted in this work, the triplet loss [24] is used to optimize the network. The loss is computed between an anchor, a positive and a negative sample. The distance in the embedding space between anchor and positive is minimized, while the one between anchor and negative is maximized. The input feature vectors are obtained from the endpoints of the segments around a given intersection. As feature elements of the input vector $x \in \mathbb{R}^{d_i}$, the following values are used: RGB color of the local endpoint area; thickness of the segment associated with the endpoint; endpoint direction estimate.
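To make the role of the anchor, positive and negative samples concrete, the sketch below evaluates a common hinge formulation of the triplet loss on plain Python lists. It is only an illustration: the margin value and the use of non-squared L2 distances are assumptions, since the text does not specify them.

```python
import math
from typing import Sequence


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge-style triplet loss; `margin` is an assumed hyper-parameter."""
    return max(l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin, 0.0)


# Easy triplet: the negative is already much farther away than the positive.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Hard triplet: the negative is as close to the anchor as the positive.
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The easy triplet contributes zero loss, while the hard one contributes exactly the margin, so the gradient pushes the negative away from the anchor and pulls the positive closer.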
In Algorithm 1, the feature vectors for the endpoints $e_i$ and $e_j$ are created at lines 8 and 9. Then, a forward pass through the network layers is performed to compute the embedding vectors $z_i$ and $z_j$, lines 10 and 11.
As mentioned, the prediction is based on the distance between the embedding vectors, which can be computed as $d_{ij} = \lVert z_i - z_j \rVert_2$, where $\lVert \cdot \rVert_2$ denotes the L2 norm. To obtain a probability-like value in the [0, 1] range describing the likelihood of the connection, the distance is transformed by means of a Gaussian activation function as $p_{ij} = e^{-d_{ij}}$. This last step in the similarity network is provided at line 12, while an illustration schematizing the computation flow of the similarity network, from its inputs to the predicted connection score, is available in Fig. 5.
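The mapping from embedding distance to connection score can be sketched as below, reading the distance as the L2 norm of the difference between the two embeddings and the score as the exponential of its negative. The function name is hypothetical and the embeddings are plain lists rather than network outputs.

```python
import math
from typing import Sequence


def connection_probability(z_i: Sequence[float], z_j: Sequence[float]) -> float:
    """Map the L2 distance between two embeddings to a score in (0, 1]."""
    d_ij = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))
    return math.exp(-d_ij)


p_same = connection_probability([0.2, 0.5], [0.2, 0.5])  # identical embeddings
p_near = connection_probability([0.0, 0.0], [0.3, 0.4])  # distance 0.5
p_far = connection_probability([0.0, 0.0], [3.0, 4.0])   # distance 5.0
```

Identical embeddings yield a score of 1, and the score decays monotonically toward 0 as the embeddings move apart, which is what allows the scores to be ranked and thresholded later.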
To conclude, for each endpoint-pair a probability value $p_{ij}$ is computed and the set $Z$ is updated (line 13) with tuples of three values, i.e. the endpoint-pair $(e_i, e_j)$ and the connection probability $p_{ij}$. Although Algorithm 1 describes the process for each individual element of $P$, the actual implementation is based on batch processing, enabling an efficient computation of the scores, as described in Section IV-F. In Fig. 6(b) an example of the processing for four segments (six endpoint-pairs) extracted from Fig. 3 (first row) is shown.

E. Informed Merging
Exploiting the endpoint-pairs connection probabilities computed in Section III-D it is possible to concatenate segments obtaining the full description of each DLO in the image. This concatenation process is addressed as informed merging and it is schematized in Algorithm 2, showing how Z is employed to iteratively update S till each s ∈ S describes a whole DLO.
First, the elements of $Z$ are sorted by connection probability in descending order (line 1), thus prioritizing the most probable connections during the merging process. The set of nodes already processed, i.e. $E_z$, is initialized as empty at line 2. An iteration over the elements of $Z$ is performed and, starting from the highest score and moving toward the lowest one, the merging of the segments, i.e. Fig. 6(a), is executed. Indeed, if neither of the endpoints retrieved from an element of $Z$ has already been processed (line 4), and their endpoint-pair connection probability is larger than a user-defined threshold $t_c$ (line 5), the two segments associated with the endpoints are collected (lines 6 and 7), merged together (line 8) and the set of segments is updated (line 9). With the term informed merging we refer to the high-level operation of performing the union of the two segments taking into consideration their ordering, e.g. head-tail, tail-tail and all the other combinations.
Consequently, the endpoint-pairs having lower scores and with one of the two endpoints already associated are not considered and their merging is avoided. The described association continues for all the elements of $Z$ having a connection probability larger than the threshold $t_c$, introduced to avoid merging endpoints with incompatible orientations, colors or thicknesses. This threshold is effectively used only in situations where the mask $M_b$ is not reliable and edge conditions occur. Instead, in normal settings, the merging process first associates the high-probability endpoints, thus making the low-probability ones already incompatible irrespective of the threshold value.
The presented merging process is performed directly on the existing segments, thus the operation is propagated by updating the set of segments accordingly. For instance, in the case of a segment disputed by two different intersections, at the second merging process the operation of joining the two candidate segments is performed on the new merged segment (obtained after the first merging process) and not on the initial one.
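The greedy association described in this subsection can be sketched as follows. This is a simplified illustration rather than Algorithm 2 itself: it only tracks which segments end up in the same DLO and ignores the pixel ordering (head-tail, tail-tail, and so on) that the actual merging must handle. The names `informed_merging` and `endpoint_to_segment` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def informed_merging(scores: List[Tuple[str, str, float]],
                     endpoint_to_segment: Dict[str, str],
                     threshold: float = 0.2) -> List[List[str]]:
    """Greedily group segments, visiting endpoint-pairs from most to least probable."""
    group_of: Dict[str, Set[str]] = {s: {s} for s in set(endpoint_to_segment.values())}
    processed: Set[str] = set()  # endpoints already associated
    for e_i, e_j, p_ij in sorted(scores, key=lambda item: item[2], reverse=True):
        if e_i in processed or e_j in processed:
            continue  # one endpoint was already used by a more probable pair
        if p_ij <= threshold:
            continue  # connection not probable enough
        # Merge the current groups, so later merges act on the merged result.
        merged = group_of[endpoint_to_segment[e_i]] | group_of[endpoint_to_segment[e_j]]
        for segment in merged:
            group_of[segment] = merged
        processed.update((e_i, e_j))
    unique_groups = {frozenset(group) for group in group_of.values()}
    return sorted(sorted(group) for group in unique_groups)


# Four segments around one intersection; each endpoint e_k belongs to segment s_k.
toy_scores = [
    ("e0", "e2", 0.90), ("e1", "e3", 0.80), ("e0", "e1", 0.30),
    ("e1", "e2", 0.25), ("e0", "e3", 0.10), ("e2", "e3", 0.05),
]
toy_owner = {"e0": "s0", "e1": "s1", "e2": "s2", "e3": "s3"}
toy_groups = informed_merging(toy_scores, toy_owner)
```

In the toy example the two highest-scoring pairs are accepted first, and every remaining pair is skipped because at least one of its endpoints is already associated, independently of the threshold. This mirrors the observation that the threshold only matters in edge conditions.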

F. Intersections Layout
As additional information aimed at providing a complete and accurate solution of the scene, the order of the DLOs in a given intersection, i.e. which one is at the top of the pile, is obtained by comparing the standard deviation of the RGB colors along the line connecting each previously solved endpoint-pair. For example, given an intersection made of two DLOs, i.e. with four endpoints and hence two predicted endpoint-pairs, the RGB color values along the lines connecting the two endpoint-pairs are collected and their mean standard deviations, i.e. the mean of the standard deviations computed for each channel, are compared. The pair with the smallest standard deviation is assumed to be at the top of the intersection pile, and the pair with the highest standard deviation below. The difference in the values is due to the change in color along the line for the DLO not at the top or, in the case of DLOs with identical colors, mostly to the shadows projected by the upper DLO onto the one below. In the case of a cross composed of three or more DLOs, only the instance above them all can be identified, since the continuity of the colors in the intersection region is the main deciding condition and this continuity is only met for the top DLO. The approach is quite simple and yet proves to be effective and inherently fast, provided that the intersection solution of Section III-D is reliable, more in Section IV-A.
DLO ordering in an intersection is particularly needed when this approach is integrated into a larger manipulation pipeline with a robotic system for routing or pick-and-place tasks. The information about the layout of the DLOs in an intersection is missing in both [9] and [6]. Instead, in [10] a solution based on a data-driven classification approach is presented, which requires specific training and is constrained to precise image crop resolutions. On the contrary, the approach presented in this paper exploits the accurate DLO center line localization obtained from the skeleton, as opposed to the superpixel centroids used in [10], thus avoiding the introduction of additional data-driven approaches. In Fig. 7 some example intersections are displayed with the computed values.
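The ordering criterion can be illustrated with a few lines of standard-library Python. The sketch assumes the RGB values along each connecting line have already been sampled; the function names are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Pair = Tuple[str, str]


def mean_channel_std(pixels: List[RGB]) -> float:
    """Mean over the R, G and B channels of the per-channel standard deviation."""
    return mean(pstdev(channel) for channel in zip(*pixels))


def top_pair(samples: Dict[Pair, List[RGB]]) -> Pair:
    """Return the endpoint-pair whose connecting line shows the most uniform color."""
    return min(samples, key=lambda pair: mean_channel_std(samples[pair]))


# Red DLO on top: its line stays red. Blue DLO below: its line crosses the red one.
line_samples = {
    ("e0", "e2"): [(200, 10, 10)] * 5,
    ("e1", "e3"): [(10, 10, 200), (10, 10, 200), (200, 10, 10), (10, 10, 200), (10, 10, 200)],
}
uniform_std = mean_channel_std(line_samples[("e0", "e2")])
occluded_std = mean_channel_std(line_samples[("e1", "e3")])
top = top_pair(line_samples)
```

The occluded line picks up the color of the DLO lying above it, which raises its standard deviation, so the pair with the uniform line is reported as the top one.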

A. Training
The training set consists of 90% of the synthetic dataset described in Section III-A, while the validation set consists of the remaining 10%. The segmentation network of Section III-A, i.e. [22], employed in FASTDLO is trained with a ResNet-101 [25] backbone pre-trained on ImageNet for 250 K iterations, with the final weights selected as the ones corresponding to the lowest validation loss. As hyper-parameters, we employed a batch size of 10, an output stride of 16, separable convolutions, Adam as optimizer and a polynomial learning rate adjustment policy with power 0.95, starting from $10^{-6}$ and decreasing to a minimum of $10^{-9}$. As augmentation scheme we deploy: channel shuffling; hue, saturation and value randomization; flipping; perspective distortions; random cropping; random brightness and contrast.
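A polynomial learning rate policy with the stated values can be sketched as follows. The exact formula is not given in the text, so the widely used form that interpolates between the starting and minimum rates is assumed here; `poly_lr` is a hypothetical helper.

```python
def poly_lr(iteration: int, max_iterations: int = 250_000,
            lr_start: float = 1e-6, lr_min: float = 1e-9, power: float = 0.95) -> float:
    """Assumed 'poly' policy: decay from lr_start to lr_min over max_iterations."""
    progress = min(iteration, max_iterations) / max_iterations
    return (lr_start - lr_min) * (1.0 - progress) ** power + lr_min


lr_first = poly_lr(0)
lr_middle = poly_lr(125_000)
lr_last = poly_lr(250_000)
```

With a power of 0.95 the decay is close to linear: the rate starts at the initial value, decreases almost proportionally to the remaining iterations and reaches the minimum at the last iteration.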
Concerning the similarity network of Section III-D, training and validation samples are sampled offline from the synthetic dataset, taking the intersections into consideration. The similarity network is composed of three fully connected layers with input, hidden and output dimensions of $d_i = 7$, $d_h = 32$ and $d_o = 16$ neurons respectively. The similarity network is trained starting from randomly initialized weights for 50 epochs, with batch size 128 and learning rate $5 \cdot 10^{-4}$, using Adam as optimizer and with the final weights selected based on the validation loss. As connection probability threshold $t_c$, the value of 0.2 is used throughout the experiments. As highlighted in Section III-E, this threshold comes into play only in the case of unreliable masks $M_b$, i.e. masks with false negatives and false positives.

B. Baseline Methods
The DLO-specific approach named Ariadne+ [10] is used for comparison. It employs the same segmentation network architecture as the one introduced in Section III-A to distinguish the DLOs from the scene. Thus, comparisons with FASTDLO can be established both at the segmentation level, i.e. using the weights of the original work [10] and the ones obtained from the synthetic dataset, and at the instance segmentation level, i.e. comparing the final result when the segmentation network and weights are fixed for both Ariadne+ and FASTDLO. As described in [10], 50 superpixels are employed in Ariadne+ for the comparisons.
The other baselines employed are general-purpose DCNNs performing instance segmentation: YOLACT [14], YOLACT++ [15], BlendMask [16] and CondInst [17]. Network backbones with different depths can be applied in these DCNN models, thus comparisons are established for each configuration. The synthetic dataset of Section III-A, labeled in this case for the instance segmentation task and with the mentioned 90-10 train-validation split, is used in the training stage of each model. The hyper-parameters of each method have been tuned to maximize performance. A general training strategy consisting of a maximum of 250 K iterations, with the final weights selected based on the minimum validation loss, has been followed for all the DCNN baselines. The augmentation scheme resembles the one used for the semantic segmentation network. YOLACT and YOLACT++ have been trained starting from the ImageNet weights with a batch size of 6 and an initial learning rate of $10^{-3}$, reduced by a factor of 10 at iterations 100 K and 150 K. Stochastic gradient descent (SGD) with momentum 0.9 and weight decay $5 \cdot 10^{-4}$ is employed as optimizer. BlendMask and CondInst have also been trained starting from weights pre-trained on ImageNet, with a batch size of 6 and an initial learning rate of 0.01, reduced by a factor of 10 at iterations 100 K and 150 K. SGD is employed as optimizer, with weight decay and momentum set to $10^{-4}$ and 0.9 respectively.
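The step-wise learning rate reduction used for the baselines can be written compactly. The sketch below is illustrative only: `step_lr` is a hypothetical helper, and applying the reduction from the milestone iteration onward (rather than after it) is an assumption.

```python
from typing import Sequence


def step_lr(iteration: int, base_lr: float,
            milestones: Sequence[int] = (100_000, 150_000), factor: float = 10.0) -> float:
    """Divide the learning rate by `factor` at every milestone already reached."""
    lr = base_lr
    for milestone in milestones:
        if iteration >= milestone:
            lr /= factor
    return lr


# YOLACT / YOLACT++ start from 1e-3; BlendMask / CondInst start from 1e-2.
yolact_schedule = [step_lr(it, 1e-3) for it in (0, 100_000, 200_000)]
blendmask_schedule = [step_lr(it, 1e-2) for it in (0, 100_000, 200_000)]
```

Both schedules therefore span two orders of magnitude over the 250 K iterations, ending at roughly 1e-5 and 1e-4 respectively.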

C. Test Dataset and Metrics
To evaluate the performance of FASTDLO on real data, a test set of 135 manually labeled real images of electrical wires with varying diameters, collected in different real scenarios, is used. The test dataset is organized into 3 categories, each containing 45 images:
• C1: scenes with the target wires lying on a surface and no other disturbing objects. The difficulties in these scenes are the high-contrast shadows, possible chroma similarities with the background, the light settings and the perspective distortions.
• C2: scenes with the target wires on a highly featured and complex background and no other disturbing objects. Here, the challenge for the algorithm is to extract the wires correctly in a cluttered scene.
• C3: scenes with the target wires in a realistic setting such as an industrial one (e.g. an electric panel). The difficulties are given by the metallic surface reflecting the wires and by other disturbing objects like commercial electromechanical components, typical of these products.
Each category is further divided into subcategories based on the number of intersections present in the images, i.e. the subcategories 1 (one), 2 (two) and 3 (three) are created with 15 samples each. Compared to the test set employed in [10], 45 new images have been added here to evaluate the condition of different diameters: 15 images for each category, with 5 images in each subcategory. In the remainder of this section, the group of images corresponding to the test set of [10] is referred to as base and the new group of images as ext.
As a metric for the evaluation, the Intersection over Union ($\mathrm{IoU} = \frac{|M \cap M_{gt}|}{|M \cup M_{gt}|}$, where $M$ is the mask under evaluation and $M_{gt}$ is the ground truth) is employed. For the semantic segmentation network, the mask $M$ corresponds to the binary mask $M_b$, while for the instance segmentation results the mask $M$ corresponds to the colored mask $M_c$, where each DLO instance is denoted by a unique color, and the IoU score is the average score across the instances of the image.
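As a worked example of the metric, the sketch below computes the IoU as the number of pixels in the intersection divided by the number of pixels in the union, and averages it over the instances of an image. The function names are hypothetical, and the predicted instances are assumed to be already matched one-to-one with the ground-truth instances, a step the text does not describe.

```python
from typing import List

Mask = List[List[int]]


def iou(mask: Mask, gt: Mask) -> float:
    """Intersection over Union between two binary masks of equal size."""
    pairs = [(m, g) for m_row, g_row in zip(mask, gt) for m, g in zip(m_row, g_row)]
    intersection = sum(1 for m, g in pairs if m and g)
    union = sum(1 for m, g in pairs if m or g)
    return intersection / union if union else 1.0


def mean_instance_iou(predicted: List[Mask], ground_truth: List[Mask]) -> float:
    """Average IoU over instances, assuming predicted[k] matches ground_truth[k]."""
    return sum(iou(p, g) for p, g in zip(predicted, ground_truth)) / len(ground_truth)


pred_a = [[1, 1, 0], [0, 0, 0]]
gt_a = [[0, 1, 1], [0, 0, 0]]
gt_b = [[0, 0, 0], [1, 1, 1]]

score_a = iou(pred_a, gt_a)  # 1 shared pixel out of 3 covered pixels
score_image = mean_instance_iou([pred_a, gt_b], [gt_a, gt_b])
```

Here the first instance overlaps its ground truth on one of three covered pixels and the second is predicted perfectly, so the image-level score is the average of 1/3 and 1.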

D. Evaluation
In Fig. 8 the plots related to the performance of the segmentation and similarity networks on the test set are provided. For the former, i.e. Fig. 8(a), an almost constant IoU score is obtained for a wide range of mask threshold values. Based on this plot, a value of 0.3 is selected as mask threshold. Concerning Fig. 8(b), the evaluation on each intersection in the test set images is performed by denoting a positive result if the predicted probability score between the correct endpoints is the largest among the complete set of scores obtained from the intersection under test. Thus, the Receiver Operating Characteristic (ROC) curve is built.
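The role of the mask threshold can be illustrated with a short sketch that binarizes a per-pixel probability map. The function name is hypothetical and the strict inequality is an assumption; the value 0.3 is the one selected above.

```python
from typing import List


def binarize(probability_map: List[List[float]], threshold: float = 0.3) -> List[List[int]]:
    """Turn per-pixel DLO probabilities into a binary mask."""
    return [[1 if p > threshold else 0 for p in row] for row in probability_map]


toy_probabilities = [
    [0.05, 0.10, 0.95],
    [0.20, 0.45, 0.90],
    [0.02, 0.85, 0.30],
]
toy_mask = binarize(toy_probabilities)
```

A network whose outputs are mostly close to 0 or 1 produces nearly the same mask for a wide range of thresholds, which is consistent with the almost constant IoU curve reported in Fig. 8(a).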
The baseline methods are evaluated in Table I by means of the IoU score computed starting from the color masks provided as output by each method. In addition, the table also provides details about the average inference time and FPS of each method when applied to the test set, plus a flag indicating whether each approach provides as output an additional representation of the DLO instances in terms of key-points or splines, allowing for a broader comparison. FASTDLO achieves better overall scores, showing a large advantage over the general-purpose approaches and performing slightly better than Ariadne+ when both methods employ the same weights in the segmentation network. From the processing time perspective, FASTDLO is competitive with the general-purpose methods while being almost one order of magnitude faster than Ariadne+. In Fig. 9 some samples for each test set category are shown with the corresponding output predictions obtained with FASTDLO and with Ariadne+.
Table I. Baselines: Ariadne+ [10], YOLACT [14], YOLACT++ [15], BlendMask [16] and CondInst [17]. ResNet-50 and ResNet-101 are from [25].
Fig. 9. Qualitative evaluation of FASTDLO and the best performing baseline using a sample for each category.
The prediction performance of the intersections layout, i.e. Section III-F, is also evaluated on the test set. Considering only the correct endpoint-pair predictions, the approach discussed in Section III-F provides a correct result in 226 of the total of 232 intersections, achieving an overall accuracy of 97.4% compared to 78.3% (177/226) for Ariadne+. Thus, the validity of the proposed method is clear, and it is executed without noticeable overhead in terms of processing time.

E. Comparison Studies
The semantic segmentation performance on the test set is compared in Table II when deploying the synthetic dataset of Section III-A or the chroma-key dataset of [18] for the training. From the table it can be observed that the average scores are very similar, with the synthetic dataset handling better the category C1, mostly due to the shadows, and the ext group of images consisting of cables with varying diameters. On the contrary, the dataset of [18] shows stronger performance on the complex industrial background category C3.
Table II. Comparison of the semantic segmentation performances when employing the synthetic dataset of Section III-A and the chroma-key dataset [18]. ResNet-101 is used as backbone. The values denote the IoU scores in percentage.
For a better evaluation of the performance of FASTDLO with respect to Ariadne+, a Hybrid model is built starting from FASTDLO and replacing its intersection processing method (Section III-D) with the curvature and color predictors of Ariadne+. The utility of the Hybrid model is thus twofold, since it allows evaluating the benefits of: 1) the skeleton-based processing of the mask (Sections III-B and III-C); 2) the similarity network (Section III-D). The results of the comparisons are shown in Table III. The skeleton-based processing of the masks yields an average gain of 0.50 in IoU score between Ariadne+ and Hybrid. In addition, the similarity-network-based processing of the intersections introduces an average gain of 0.35 in the IoU scores. The gains are mostly present in the ext group of the test set images, showing the limitation of Ariadne+ in handling scenes with DLOs of different diameters.

F. Timings
Table IV reports the processing time experienced on the test set. As the number of intersections increases, both the segmentation and endpoint-pair prediction times stay relatively constant. The inference performed by the similarity network is indeed very fast and, thanks to the batch inference, does not suffer significantly from the increase in the number of intersections to process. Instead, the skeleton generation time increases by about 15% from 1 to 3 intersections. Also, the additional processing time, mostly due to the informed merging approach, increases with the number of intersections, as expected. Overall, the total processing time is in the range of 40 to 50 ms in all the conditions.
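The relation between the reported per-frame times and the frame rate is a simple reciprocal, sketched below with a hypothetical helper.

```python
def fps_from_ms(milliseconds_per_frame: float) -> float:
    """Frames per second corresponding to a given per-frame processing time."""
    return 1000.0 / milliseconds_per_frame


fastest = fps_from_ms(40.0)
slowest = fps_from_ms(50.0)
```

A total processing time between 40 and 50 ms therefore corresponds to a throughput between 25 and 20 FPS.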

G. Extensions to Other DLOs
FASTDLO can be easily extended to work with a large variety of DLOs. Fig. 10 displays the results of medical hose segmentation, where FASTDLO is applied directly without modifications. For other types of DLOs, like ropes and strings, where the surface texture of the objects can differ from that of cables and wires, the semantic segmentation stage should be re-trained or fine-tuned. This can be accomplished easily by leveraging synthetic data, as shown in this paper. Apart from that, FASTDLO can also be directly applied to these objects.

V. CONCLUSIONS AND FUTURE WORK
In this paper, a DLOs instance segmentation algorithm is presented featuring a processing rate higher than 20 FPS while preserving reliable and accurate predictions. The experimental results demonstrate the validity of FASTDLO when compared to several baselines available in the literature.
In future work, FASTDLO will be integrated into a robotic system for switchgear cabling. Moreover, the use of multiple camera frames will be investigated to improve the semantic segmentation of the scene. Finally, further refinements and optimizations of the approach will be performed aiming toward real-time capabilities and, most importantly, a real-time tracking system for DLOs.