Graph Convolutional Networks for Semi-Supervised Image Segmentation

Image segmentation is one of the most significant problems in computer vision. Recently, deep-learning methods have dominated state-of-the-art solutions that automatically or interactively divide an image into subregions. However, deep-learning approaches require a substantial amount of training data, which is costly to prepare. An alternative solution is semi-supervised image segmentation. It requires rough annotations to define constraints that are then generalized to precisely delimit relevant image regions without using training examples. Among semi-supervised strategies for image segmentation, the leading ones are graph-based techniques that define image segmentation as the result of partitioning a pixel or region affinity graph. This paper revisits the problem of graph-based image segmentation. It approaches the problem as semi-supervised node classification in the SLIC superpixel region adjacency graph using a graph convolutional network (GCN). The performance of both spectral and spatial graph convolution operators is considered, represented by the Chebyshev convolution operator and GraphSAGE, respectively. The results of the proposed method applied to binary and multi-label segmentation are presented, numerically assessed, and analyzed. In its best variant, the proposed method scored an average DICE of 0.86 in the binary segmentation task and 0.79 in the multi-label segmentation task. Comparison with state-of-the-art graph-based methods, including Random Walker and GrabCut, shows that graph convolutional networks can represent an attractive alternative to the existing solutions to graph-based semi-supervised image segmentation.


I. INTRODUCTION
The problem of image segmentation has been continuously gaining the attention of the computer vision research community. The precise location and delineation of meaningful objects are crucial in many image processing and analysis pipelines. Therefore, various approaches have been proposed to partition an image into regions corresponding to diverse objects and precisely outline their borders.
Recently, state-of-the-art approaches to image segmentation have been dominated by convolutional neural networks (CNN) [1], [2], [3]. The vast majority of these automatic approaches focus either on semantic segmentation [4], [5], [6], [7] or on identifying object instances of particular characteristics (e.g., specific organs, tumorous regions, or disease symptoms in the case of medical images [8], [9], [10], [11]). However, convolutional neural network architectures were also leveraged to learn the interactive segmentation task [12], [13], [14], [15], [16], [17]. The CNN-based approaches to interactive segmentation additionally need image-user interaction pairs at the input in the form of bounding boxes [15], user positive and negative clicks or scribbles [12], [13], [14], or an object's four most extreme points [16]. Although CNN-based methods have demonstrated exceptional performance in various image segmentation problems, they usually require a substantial amount of precisely annotated training data, which is costly to prepare and, therefore, frequently limited. Despite current efforts to reduce the annotation workload, semi-supervised CNN-based segmentation approaches still require some precisely labeled data to learn from unlabeled data [18]. At the same time, most self-supervised methods depend on fully-supervised pre-trained models [19], while weakly-supervised methods may still require roughly annotating region location (e.g., by drawing bounding boxes, scribbles, or points on many images [20]). Furthermore, convolutional neural networks cannot generalize to objects unrepresented in the training set.
The associate editor coordinating the review of this manuscript and approving it for publication was Mingbo Zhao.
The exceptional performance and potential benefits of convolutional neural networks have drawn attention away from semi-supervised image segmentation that uses user-provided labels to form a training set. These approaches need coarse annotations (seeds) indicating the position of relevant objects and background [12], [13], [14], [15], [16], [17], [21], [22], [23], [24]. The seed regions can be shown either interactively or in some other way, e.g., using a priori knowledge about the regions to be segmented. Initial constraints derived from rough annotations are generalized to provide label likelihoods for unlabeled pixels and precisely delimit relevant image regions. The semi-supervised approach allows adapting the segmentation process to a particular image and various objects. As a result, it can often provide a much more accurate object outline than automatic algorithms. Besides, semi-supervised segmentation does not require training examples, as CNN-based methods do. This property can be a significant advantage in many image processing pipelines.
The most prominent group of semi-supervised segmentation methods are graph-based algorithms. They define image segmentation as the result of partitioning a pixel or region affinity graph. Several approaches to graph partitioning have been proposed over the years, including graph cuts [25], [26], [27], and random walker [28]. Even though they remain the most popular semi-supervised image segmentation methods, graph-based approaches exhibit limitations. The most significant one is the considerable computational cost, which limits the application of graph-based algorithms to high-resolution images. This paper revisits the problem of graph-based image segmentation. It approaches the problem as semi-supervised node classification in the affinity graph using a graph convolutional network (GCN). As recent works regarding graph data show, GCNs exhibit the ability to discover structural patterns and learn a feature representation of graph data, which can then be used for node labeling [29], [30], [31]. The labeling results remain accurate, even for a limited number of labeled nodes.
Applications of graph convolutional networks in image segmentation are still limited. To the best of the author's knowledge, this is one of the first works exploring the performance of GCNs in the problem of semi-supervised image segmentation. A few related works regard semantic segmentation [14], [32], [33], [34], [35], which remains a different problem since segmentation is performed fully automatically in a fully supervised way by models trained end-to-end with multiple training examples. This paper brings the following main contributions:
• a graph convolutional neural network is applied to semi-supervised image segmentation for the first time,
• the performance of both spectral and spatial graph convolution operators is analyzed in the problem of binary and multi-label image segmentation, and their performance is compared,
• the advantages and limitations of graph convolutional networks in the problem of semi-supervised image segmentation are discussed.
The remainder of this paper is organized as follows. Section II briefly revisits graph-based methods of image segmentation and reviews the basics of graph convolutional networks. The semi-supervised image segmentation problem is then formulated in Section III. Section IV details the proposed approach and the resulting graph convolutional network architecture. The results of the semi-supervised image segmentation with the use of GCN are demonstrated in Section V and discussed in Section VI. Finally, Section VII concludes the paper.

II. RELATED WORKS
A. GRAPH BASED IMAGE SEGMENTATION
Graph-based methods represent images as weighted graphs where nodes represent pixels, edges represent relations between pixels, and weights quantify these relations employing a similarity measure. Image segmentation is then defined as a graph partitioning into sub-graphs representing relevant image regions.
There exist diverse approaches for finding edges to be removed in graph partitioning. They can be divided into three main groups: minimal spanning tree (MST) methods, graph cuts, and the Random Walker approach.
MST-based methods for image segmentation divide the graph based on the minimal spanning tree of the pixel adjacency graph [36], [37]. These methods can divide an image into any number of classes. However, they usually result in significant over-segmentation and are primarily used to generate superpixels (i.e., homogeneous image regions).
Graph-cut methods partition affinity graph concerning a minimal cut (i.e., a set of edges with minimal total weights). The cut is found either via spectral graph partitioning [25], [38] or combinatorial graph cuts. In the first case, the generalized eigenvalue problem is solved to find eigenvectors of the graph Laplacian and utilize them to perform optimal graph bipartition. Finding the eigenvectors is, however, a highly complex problem. Therefore spectral graph partitioning methods remain mainly theoretical, and their application is limited to low-resolution images. Combinatorial graph cuts define subgraphs by solving the min-cut/max-flow problem [26], [27], [39]. Explicitly, the graph bipartition is determined by computing the global optimum among all possible bipartitions that satisfy constraints imposed on the object and background. The optimum is determined by minimizing the energy that incorporates regional and boundary conditions. VOLUME 10, 2022 Extensions to the original min-cut/max-flow approach like GrabCut [40], and Lazy Snapping [41] up to today remain the most effective and prevailing approaches to semi-supervised image segmentation [42], [43], [44].
Random Walker [28] presents an alternative approach to graph-based segmentation. It assigns labels to nodes based on the probability that a random walker released from unlabeled nodes first arrives in nodes with a particular label assigned. The unlabeled nodes are assigned a label for which the highest probability exists. Contrary to graph-cuts, the Random Walker can partition an image into multiple classes.
When applied pixel-wise, the graph-based approaches to image segmentation experience significant problems with efficiency. Explicitly, for high image resolutions, an enormous amount of computer memory is required to represent the corresponding massive image graph. This overhead makes graph partitioning highly complex in computations, providing segmentation results in an unacceptable time.
To alleviate these limitations, graph-based approaches are routinely used region-wise where nodes represent so-called superpixels, i.e., groups of connected pixels sharing some properties [45], [46]. Processing region adjacency graph reduces time and memory overhead. However, it may decrease segmentation accuracy near image boundaries.

B. GRAPH CONVOLUTIONAL NETWORKS
Graph convolutional networks (GCN) have been proposed in the last few years to generalize convolutional neural networks into irregular or non-Euclidean domains. Particularly, GCNs attempt to deal with general (irregular) graphs where computing convolution is not as straightforward as for the grid graphs modeling pixel relations. Such irregular graphs may encode complex geometric data, structure, and pairwise relationships in numerous problems [47], [48], examples of which are social networks [49], [50], citation networks [51], [52], [53], gene data and protein structure [54], chemical molecular structures [55] or transportation networks [56], [57].
Extending convolutions, which are easy to compute for regular grids, to non-regular domains is not straightforward. This is mainly due to the necessity of preserving the weight-sharing property for node neighborhoods of different sizes. Therefore, many attempts have recently been made to extend the convolution operation to irregular graphs. Most of them utilize neighborhood aggregation schemes or message passing [55] that generalize graph embedding for the node classification task.
There exist several concepts that generalize convolution operation into irregular graph domains. They can be classified into spectral methods and spatial methods [58].
Spectral methods define convolution via the graph Fourier transform, filtering the graph signal after converting it to a spectral-domain representation. This approach involves eigenvectors derived from the spectral decomposition of the graph Laplacian used as a Fourier basis [29], [59], [60]. To avoid direct computation of the graph Laplacian eigenvectors, [61] proposed ChebNet where k-th order Chebyshev polynomials are used to approximate spectral filters that learn on k-hop neighborhoods of the graph. In [30] this approach was further simplified by using first-order polynomials fused with the original feature information of the data. The significant drawback of spectral convolution methods is that they are restricted to a fixed graph structure, including the number of nodes and their degrees.
The spatial graph convolution performs computations directly on the graph, depending on the nodes' spatial relations [59]. Explicitly, the representation of a central node is updated based on the representations of its neighbors. The node information is then propagated along graph edges. In this approach, two challenges need to be faced. These are receptive field selection and node ordering. Main approaches to this problem include: assuming a predefined contribution of each node [62], employing a graph diffusion process to incorporate the node context information [63], converting the graph locally to a linear vector space [64], adopting an attention mechanism [65], or sampling the node neighborhood to obtain a fixed number of neighbors [66]. Compared to spectral approaches, spatial graph convolution methods are more flexible, allowing variable graph structures. They are also more computationally efficient than their spectral counterparts.
This paper's approach to semi-supervised image segmentation combines graph-based image segmentation with graph convolutional networks. However, instead of graph partitioning, it approaches the image segmentation problem as node classification in the SLIC region adjacency graph (RAG). A graph convolutional network performs the classification, using image region relations expressed by the RAG to perform image segmentation. The performance of spectral and spatial convolutions applied in image segmentation is compared and analyzed.

III. PROBLEM FORMULATION
Let $G = (V, E)$ be a weighted and undirected graph where $V$ is a vertex (node) set of cardinality $N = |V|$ and $E \subseteq V \times V$ is a weighted edge set of cardinality $M = |E|$. Nodes $v_i \in V$ represent either single pixels or groups of connected pixels sharing similar properties (i.e., superpixels). Edges $e_{ij} \in E$ connect adjacent nodes. Depending on the representation, graph $G$ is thus either a regular pixel adjacency graph or an irregular region adjacency graph. Each node $v_i$ is described by a $C$-dimensional feature vector $f_i$ of color (or intensity) features. Features of all nodes are thus represented as an $N \times C$ matrix $X$. Additionally, each edge $e_{ij}$ has a corresponding weight describing the similarity of the adjacent nodes $v_i$ and $v_j$. All edges in the graph are represented by an adjacency matrix $A$ of size $N \times N$, where entry $a_{ij}$ indicates whether nodes $i$ and $j$ are connected.
Let $L$ and $U$ denote subsets of labeled and unlabeled nodes such that $L \cup U = V$, $L \cap U = \emptyset$, and $|L| \ll |U|$. Each labeled node $v_i \in L$ has a corresponding label $l_i \in \{1, 2, \ldots, N_C\}$, where $N_C$ is the total number of classes. Given the above, the problem of image segmentation can be formulated as semi-supervised learning of a classifier $F : v \to y$ that assigns a class label $l_i$ to each unlabeled node $v_i \in U$ based on both labeled and unlabeled nodes (transductive learning setting).

IV. PROPOSED METHOD
A. GENERAL IDEA
The key idea behind the proposed approach is to use a graph convolutional neural network (GCN) to perform image segmentation subject to constraints imposed on the regions of interest. Specifically, a GCN is applied to the image region adjacency graph (RAG) to predict region labels given sample nodes representing the resulting regions. A GCN is thus trained in a semi-supervised learning framework to perform node-level classification.
Region constraints may be input in many ways. In this work, they are given as scribbles indicating the rough location (seed points) of regions of interest and thus representing their properties. Graph nodes covered by scribbles are considered labeled nodes L with region labels l i assigned regarding the unique scribble colors. Nodes not covered by scribbles remain unlabeled. A GCN predicts their region labels. The general idea of the proposed approach is presented in Fig. 1. The proposed approach can segment several regions of interest at a time, providing that the seeds for each of them are given (multi-label segmentation).
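The mapping from scribbles to labeled graph nodes described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `label_nodes_from_scribbles`, the encoding of scribbles as an integer mask (0 for "no scribble"), and the majority vote used when one superpixel is touched by several scribble colors are all assumptions made for the example.

```python
import numpy as np

def label_nodes_from_scribbles(superpixels, scribbles):
    """Assign scribble labels to superpixel nodes (hypothetical helper).

    superpixels: (P, Q) int array with region ids 0..N-1.
    scribbles:   (P, Q) int array, 0 = no scribble, 1..N_C = scribble label.
    Returns an int array of length N: 0 for unlabeled nodes, otherwise the
    scribble label covering most pixels of the region (assumed tie-breaking).
    """
    n_nodes = int(superpixels.max()) + 1
    n_labels = int(scribbles.max())
    votes = np.zeros((n_nodes, n_labels + 1), dtype=np.int64)
    # Count, for every region, how many of its pixels carry each scribble label.
    np.add.at(votes, (superpixels.ravel(), scribbles.ravel()), 1)
    votes[:, 0] = 0  # pixels without a scribble do not vote
    node_labels = votes.argmax(axis=1)
    node_labels[votes.sum(axis=1) == 0] = 0  # regions untouched by scribbles
    return node_labels
```

Nodes with a nonzero entry form the labeled set, and the remaining nodes are the ones whose labels the GCN has to predict.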

B. REGION ADJACENCY GRAPH
To build the region adjacency graph $G$, the input image $I$ was divided into multiple uniform disjoint regions $\{R_i\}_{i=1}^{N}$ using the SLIC (Simple Linear Iterative Clustering) superpixel approach (see Fig. 2). The number of resulting regions was arbitrarily set to $N = \frac{PQ}{100}$, where $P \times Q$ is the image spatial resolution. Prior to the division, image features were normalized to the range $[0, 1]$ simply by dividing all channels by $2^B - 1$, where $B$ stands for the bit depth of each channel. The image was next smoothed with a Gaussian kernel of size $\sigma = 2$. In the resulting region adjacency graph, the superpixels $R_i$ were represented by vertices $v_i$, each described by a $C$-dimensional feature vector $f_i$ derived from the corresponding region. The mean color (or intensity) within the region $R_i$ was used, obtained by averaging each color channel separately.
Weighted edges connected nodes representing adjacent regions. Weights are derived from a Gaussian weighting function given by the following formula: where: and β is a free parameter of the method.
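An image RAG is represented by a region adjacency matrix $A = [a_{ij}] \in \mathbb{R}^{N \times N}$ such that:
$$a_{ij} = \begin{cases} 1 & \text{if regions } R_i \text{ and } R_j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
and a region feature matrix $X \in \mathbb{R}^{N \times C}$ representing the $N$ vertices, each having features in $\mathbb{R}^{1 \times C}$.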
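As an illustration of how the adjacency matrix $A$ and the feature matrix $X$ can be derived from a superpixel map, the sketch below uses plain NumPy. It assumes that the superpixel label map has already been produced by some SLIC implementation, that the image is already normalized to $[0, 1]$, and that two regions count as adjacent when they share a horizontal or vertical pixel border (4-connectivity). The function name `build_rag` and the connectivity choice are assumptions of this example; edge weights are deliberately left out.

```python
import numpy as np

def build_rag(image, superpixels):
    """Build a binary region adjacency matrix and mean-color features.

    image:       (P, Q, C) float array, assumed normalized to [0, 1].
    superpixels: (P, Q) int array with region ids 0..N-1.
    Returns A of shape (N, N) and X of shape (N, C).
    """
    n = int(superpixels.max()) + 1
    A = np.zeros((n, n), dtype=np.uint8)
    # Compare each pixel with its right and bottom neighbor (4-connectivity).
    pairs = ((superpixels[:, :-1], superpixels[:, 1:]),
             (superpixels[:-1, :], superpixels[1:, :]))
    for a, b in pairs:
        border = a != b
        A[a[border], b[border]] = 1
        A[b[border], a[border]] = 1  # keep the graph undirected
    # Mean of every channel within each region.
    flat = superpixels.ravel()
    counts = np.bincount(flat, minlength=n).astype(float)
    X = np.stack(
        [np.bincount(flat, weights=image[..., c].ravel(), minlength=n)
         for c in range(image.shape[-1])],
        axis=1,
    )
    X /= np.maximum(counts, 1.0)[:, None]
    return A, X
```

For an RGB image the returned `X` has three columns, matching the case $C = 3$ considered in this study.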

C. GRAPH CONVOLUTION NETWORK MODEL
a: NETWORK GENERAL ARCHITECTURE
The architecture of the graph convolutional neural network used in this study is detailed in Table 1. The resulting GCN architecture stacks two convolutional layers such that:
where:
- $H^{(p)} \in \mathbb{R}^{N \times n_p}$ denotes the feature matrix of the $p$-th layer, $n_p$ is the number of feature maps, and $H^{(0)} = X$;
- $\sigma(\cdot)$ stands for an activation function;
- $*_G$ denotes the graph convolution operator;
- $g$ is a function that aggregates node neighborhoods.
In the considered architecture, the graph convolutional layers are separated by a dropout layer used to reduce overfitting. The number of input channels to the first layer $H^{(0)} \in \mathbb{R}^{N \times C}$ is $C$ (which in the case of the RGB images used in this study equals 3) and the number of output channels is 16. The ReLU activation, such that $f(H) = \max(0, H)$, is applied to the output of the first layer. The second convolutional layer $H^{(1)} \in \mathbb{R}^{N \times 16}$ takes 16 channels at the input and outputs $N_C$ channels, where $N_C$ is the number of labels in the segmented image. Softmax activation is applied to the output of the second convolutional layer to obtain the probability that the region $R_i$ represented by node $v_i$ belongs to each of the $N_C$ regions. Finally, node $v_i$ is assigned the label of the region with the greatest probability.
The GCN architecture outlined in Table 1 was tested in two variants, the spectral and the spatial one. They differ in the approach used for calculating convolution on a graph in the convolutional layers of the neural network. The most popular graph convolution operators representing the spatial and the spectral approach were considered and comparatively tested. In particular, the GraphSAGE and Chebyshev convolutional operators were considered.
FIGURE 1. The method is initialized with user-provided seeds (scribbles) corresponding to regions of interest. The input image is divided into superpixels using the SLIC approach. A superpixel adjacency graph is next generated to represent the image. Graph nodes correspond to superpixels. Edges connect neighboring superpixels. Graph nodes corresponding to superpixels that overlap with user-provided seeds are assigned labels of the corresponding regions. Labels of the remaining graph nodes are determined by the graph convolutional network. The final segmentation result is obtained by assigning predicted labels to the corresponding superpixels.
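The data flow through the two-layer architecture (graph convolution, ReLU, graph convolution, softmax, arg-max labeling) can be sketched in NumPy as below. The graph convolution itself is passed in as a callable `conv`, so the sketch stays independent of the operator; the row-normalized adjacency used in the usage note is only a stand-in and is neither the Chebyshev nor the GraphSAGE operator. All names here are hypothetical, and dropout is omitted because it is active only during training.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def two_layer_forward(conv, X, params1, params2):
    """Inference-time forward pass of a two-layer graph network.

    conv(H, params) -> array is any graph convolution operator.
    Returns per-node class probabilities and the arg-max labels.
    """
    H1 = np.maximum(conv(X, params1), 0.0)  # first layer followed by ReLU
    P = softmax(conv(H1, params2))          # second layer followed by softmax
    return P, P.argmax(axis=1)
```

A possible call, with a placeholder operator `conv = lambda H, W: A_hat @ H @ W` where `A_hat` is a row-normalized adjacency matrix with self-loops, takes `X` of shape `(N, 3)`, weights of shape `(3, 16)` and `(16, N_C)`, and returns one probability row and one label per node.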

b: ChebNet-THE SPECTRAL GCN VARIANT
The spectral variant of the considered GCN architecture (hereafter referred to as ChebNet) uses the Chebyshev convolutional layers (ChebConv) as proposed in [61]. The information between layers was propagated following the equation:
where:
- $T_k(\cdot)$ denotes the recursive series of Chebyshev polynomials, where $T_0(X) = 1$, $T_1(X) = X$, and $T_i(X) = 2X\,T_{i-1}(X) - T_{i-2}(X)$;
- $\theta$ is the learnable filter coefficient matrix (namely, the vectors of Chebyshev coefficients $\theta_k$);
- $K$ stands for the order of the network neighborhood on each convolutional layer; for the first-order neighborhood ($K = 1$), only the nodes' immediate neighbors are considered when computing the convolution, while higher orders indicate that nodes $K$ hops away are considered.
where $L$ is the normalized graph Laplacian, such that $L = I_n - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, $\lambda_{\max}$ is the largest eigenvalue of the graph Laplacian, $D$ is the graph degree matrix with $D_{ii} = \sum_j a_{ij}$, and $A = [a_{ij}]$ denotes the graph adjacency matrix.
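To make the Chebyshev recursion concrete, the sketch below computes the normalized Laplacian defined above, rescales it to the interval $[-1, 1]$ in the way commonly used with Chebyshev filters [61] (that is, $\tilde{L} = 2L/\lambda_{\max} - I_n$, which is assumed here rather than taken from this paper), and evaluates the terms $T_k(\tilde{L})H$ recursively. The function names, the summation over $k = 0, \ldots, K$, and the omission of bias and activation are assumptions of this example.

```python
import numpy as np

def scaled_laplacian(A):
    """Return 2L/lambda_max - I for the normalized Laplacian L = I - D^-1/2 A D^-1/2."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()  # L is symmetric for undirected graphs
    return 2.0 * L / lam_max - np.eye(n)

def cheb_basis(L_tilde, H, K):
    """Return the list [T_0(L~) H, T_1(L~) H, ..., T_K(L~) H]."""
    terms = [H]
    if K >= 1:
        terms.append(L_tilde @ H)
    for _ in range(2, K + 1):
        # Chebyshev recurrence: T_k = 2 x T_{k-1} - T_{k-2}
        terms.append(2.0 * L_tilde @ terms[-1] - terms[-2])
    return terms

def cheb_conv(L_tilde, H, thetas):
    """Linear part of a Chebyshev layer; thetas is a list of K+1 weight matrices."""
    terms = cheb_basis(L_tilde, H, len(thetas) - 1)
    return sum(T @ theta for T, theta in zip(terms, thetas))
```

Because each multiplication by $\tilde{L}$ reaches one hop further along the graph edges, the term of order $k$ mixes information from nodes up to $k$ hops away, which is why no eigenvector decomposition is needed inside the layer.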

c: SageNet-THE SPATIAL GCN VARIANT
The spatial variant of the considered GCN architecture (hereafter referred to as SageNet) uses the GraphSAGE operator as proposed in [66]. The information between layers was propagated following the equation:
where:
and $W$ is a weight matrix, $\mathcal{N}(\cdot)$ is a node neighborhood function, and $H^{(p)}_v$ is the feature vector of node $v$ in the $p$-th layer.
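A GraphSAGE-style layer can be illustrated with the mean aggregator, one of the aggregators described in [66]. The sketch below is an assumption-laden example rather than the exact operator used in this study: the text does not state which aggregator was chosen, and the use of two separate weight matrices (`W_self` for the node itself and `W_neigh` for the aggregated neighborhood) mirrors a common implementation style instead of the single $W$ mentioned above. Neighbor sampling, bias, and activation are omitted.

```python
import numpy as np

def sage_mean_layer(A, H, W_self, W_neigh):
    """Linear part of a GraphSAGE layer with mean aggregation (illustrative).

    A: (N, N) binary adjacency matrix without self-loops.
    H: (N, n_in) node features of the current layer.
    Each node combines its own features with the mean of its neighbors' features.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    neigh_mean = (A @ H) / np.maximum(deg, 1.0)  # isolated nodes get a zero mean
    return H @ W_self + neigh_mean @ W_neigh
```

Since the aggregation only needs each node's current neighbor list, the same weights can be applied to graphs with different numbers of nodes, which reflects the flexibility of spatial operators noted in Section II.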

D. EXPERIMENTAL SETUP
The method was implemented in Python 3.7. The GCN was implemented using PyTorch Geometric, a geometric deep learning extension library for PyTorch. Experiments were performed on a desktop computer with an Intel Core i9-7940X (3.10 GHz) processor, 128 GB of RAM, and an NVidia GeForce GTX Titan X GPU. The GCN was trained for up to 5000 epochs with a patience of 500 epochs (i.e., the training was stopped after 500 epochs with no improvement). The Adam optimizer with a learning rate of 0.005 and weight decay of 0.0005 was applied to minimize the negative log-likelihood loss. The weights ensuring the lowest loss were used for graph node classification.
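The stopping rule described above (a fixed epoch budget, a patience window, and keeping the weights with the lowest loss) can be sketched independently of any deep learning library. In the example below, `step` is a hypothetical callable that runs one training epoch and returns the loss; in a real pipeline it would wrap the optimizer update, and the marked line would snapshot the model weights.

```python
def train_with_patience(step, max_epochs=5000, patience=500):
    """Run step(epoch) until the loss has not improved for `patience` epochs.

    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch in range(max_epochs):
        loss = step(epoch)
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            # A real training loop would store a copy of the model weights here.
        elif epoch - best_epoch >= patience:
            break  # no improvement within the patience window
    return best_epoch, best_loss, epochs_run
```

With the defaults shown, training ends either after 5000 epochs or as soon as 500 consecutive epochs pass without a new lowest loss, whichever comes first.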

V. RESULTS
A. EVALUATION PROCEDURE
The proposed approach was assessed on two publicly available datasets, namely the Berkeley Segmentation Dataset [67] and the GrabCut Vision Dataset [40]. In total, 100 images and the corresponding ground-truth segmentation results were considered. The performance of the GCN-based approach was assessed in both binary (single object vs. background) and multi-label segmentation tasks. In the first case, the GrabCut dataset, containing 50 images in total, was used. The evaluation of multi-region segmentation accuracy was performed on 50 images randomly selected from the Berkeley dataset. The test images represented challenging scenes in both experiments, including complex content, multiple objects, and similar foreground and background. The rough location of regions of interest was given as scribbles, with different scribble colors representing different labels.
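The DICE coefficient and Jaccard index (also known as Intersection over Union, IoU) were used for the numerical evaluation of the segmentation results. The measures are defined by (8) and (9), respectively:
$$\mathrm{DICE}(S, T) = \frac{2\,|S \cap T|}{|S| + |T|} \tag{8}$$
$$\mathrm{Jaccard}(S, T) = \frac{|S \cap T|}{|S \cup T|} \tag{9}$$
where $S$ and $T$ stand for the segmentation result and the ground truth, respectively.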
In the case of multi-label image segmentation, individual scores were determined for each label and then averaged.
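The evaluation described above can be sketched with the usual set-overlap definitions of the two measures: DICE as twice the intersection divided by the sum of the two region sizes, and Jaccard as intersection over union. The per-label averaging follows the text. The function names and the convention of returning 1.0 when both masks are empty are assumptions of this example.

```python
import numpy as np

def dice(S, T):
    """DICE coefficient of two binary masks."""
    S, T = np.asarray(S, dtype=bool), np.asarray(T, dtype=bool)
    denom = S.sum() + T.sum()
    return 2.0 * np.logical_and(S, T).sum() / denom if denom else 1.0

def jaccard(S, T):
    """Jaccard index (IoU) of two binary masks."""
    S, T = np.asarray(S, dtype=bool), np.asarray(T, dtype=bool)
    union = np.logical_or(S, T).sum()
    return np.logical_and(S, T).sum() / union if union else 1.0

def mean_scores(pred, truth, labels):
    """Average DICE and Jaccard over the given labels of two label maps."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    d = [dice(pred == l, truth == l) for l in labels]
    j = [jaccard(pred == l, truth == l) for l in labels]
    return float(np.mean(d)), float(np.mean(j))
```

For binary segmentation, calling `dice` and `jaccard` directly on the object masks is sufficient; `mean_scores` covers the multi-label case.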
The proposed approach to semi-supervised image segmentation was also compared to the most popular state-of-the-art graph-based competitors. For binary segmentation, these included Random Walker and GrabCut. In the case of multi-label segmentation, only the Random Walker was considered since GrabCut is intended for binary segmentation (object extraction from the background). All considered methods were initialized with the same scribbles to facilitate a fair comparison.
A qualitative evaluation of the considered method was also performed for medical and microscopy images randomly collected from publicly available sources. For these images, the ground truths were not given. However, visual assessment of the segmentation made it possible to observe some properties of the proposed approach.

B. BINARY SEGMENTATION
The visual results of object extraction from the background using the proposed approach are presented in Fig. 3. Sample images from the Microsoft GrabCut dataset were selected for presentation. The top panel presents original images with user input (scribbles) overlaid on them. Consecutive rows present segmentation results obtained by the considered methods, with method names indicated on the left side of each row. ChebNet corresponds to a graph convolutional network with two Chebyshev spectral convolutional layers, while SageNet corresponds to a network of the same architecture but with GraphSAGE spatial convolutional layers.
The distributions of the considered image segmentation accuracy measures obtained for each of the considered methods over the Microsoft GrabCut dataset are presented in Fig. 4.
Finally, the visual results of the GCN-based approach applied to random histopathological images are presented and compared to the results of the competitors in Fig. 5.

C. MULTI-LABEL SEGMENTATION
The visual results of multi-label segmentation are presented in Fig. 6. Sample images were selected from the Berkeley dataset. As in the case of binary segmentation, the top panel presents original images with user input (scribbles) overlaid on them. Consecutive rows present segmentation results obtained by the considered methods with method names indicated on the left side of each row.
The distributions of the considered image segmentation accuracy measures obtained for each of the considered methods over the Berkeley dataset are presented in Fig. 7. The measures were determined for each resulting region separately and then averaged over the number of regions.
Finally, the visual results of the GCN-based multi-label segmentation applied to random microscopy and medical images are presented and compared to the results of the competitors in Fig. 8.

D. SENSITIVITY TO INITIALIZATION
The influence of initial seeds on image segmentation results is presented in Fig. 9. Selected examples present cases with seeds changing gradually from precise and dense markings spreading across whole regions (cf. Fig. 9a) to very scarce annotations (cf. Fig. 9d). For both considered samples, the top panel presents seeds overlaid on the original image, while the panels below present the corresponding results with colors of the regions complying with the colors of initial annotations. Again, ChebNet and SageNet are considered with the model's name on each row's left side.

VI. DISCUSSION
The results presented in Section V confirm that graph convolutional networks can be successfully applied for semi-supervised image segmentation in both binary (see Fig. 3, 5) and multi-label (see Fig. 6, 8) segmentation tasks. Based on the visual assessment, it can be observed that the GCN-based approach is universal. In particular, it performs reasonably well for different images, representing diverse scenes and exhibiting different properties, including natural scene images, microscopic images, and medical images of various modalities. The results of image partitioning by the GCN in most cases precisely match the regions' boundaries, even if the background, object, or both are nonuniform and complex. This observation applies to both variants of the considered GCN architecture, i.e., the spectral one represented by the ChebNet and the spatial one represented by the SageNet.
Spectral and spatial convolution operators perform similarly well in the case of image segmentation. However, the SageNet (i.e., spatial approach) is less sensitive to local intensity and color variations. As a result, it provides slightly better segmentation results with more regular edges and less noise. Image segmentation quality metrics also confirm this visual observation. Notably, in the case of binary segmentation, the average and median values of the DICE coefficient scored by the SageNet were at the level of 0.86 and 0.90 vs. 0.82 and 0.85 scored respectively by the ChebNet. The corresponding values of the Jaccard index were almost equal for both variants and equal on average to 0.77, with the median value equal to 0.82. However, in the case of multi-label segmentation, both variants of the GCN performed equally well, scoring on average the DICE coefficient at the level of 0.79 and a median value equal to 0.83.
When compared to the competitive approaches, the GCN-based methods exhibit certain advantages. The advantage over the Random Walker is evident and confirmed by visual and numerical results. The Random Walker sacrifices details of region shapes in favor of the compactness of the resulting regions. As a result, object shape details are often lost while object boundaries are smoothed. In contrast, the GCN-based approach retains even the finer details of object shapes. This effect can be observed, for example, in Fig. 3a, where eyes, ears, and eyebrows are visible in the segmentation results, in Fig. 3h, where the swimmer's arms and legs were precisely retained in the resulting image, or in Fig. 5ab, where the shape of the cells was precisely represented. The proposed approach is also robust to nonuniform background color or intensity distribution. In such a case, it precisely extracts objects from the background while the Random Walker experiences serious problems (see Fig. 3bcei). In the case of the multi-label image segmentation task, the Random Walker approach is not able to segment small regions and merges them. In contrast, the GCN approach retains even the smallest regions. This effect can be seen in the natural scene images in Fig. 3ef, as well as in the microscopy image in Fig. 8a. Overall, the GCN-based approach scores higher than the Random Walker in both binary (see Fig. 4) and multi-label segmentation tasks (see Fig. 7).
When compared with GrabCut, the main advantage of the proposed GCN-based approach is the ability to perform multi-label segmentation. GrabCut is intended to extract an object from the background. As a result, its application is limited to binary segmentation tasks. When compared numerically, GrabCut slightly outperforms the proposed approach in the natural scene segmentation task (see Fig. 4). This effect is caused mainly by the proposed approach excluding some subregions, such as the eyes in Fig. 3aei, from its results. Depending on the application, this can be both a disadvantage and an advantage of the method.
In the case of microscopy images presented in Fig. 5ab, the GrabCut performed significantly worse than in the case of the natural scene images. Although the ground truth results are not given for these images, based on the visual assessment, it can be seen that the GrabCut causes significant over-segmentation. In contrast, the results of the GCN-based approach precisely match the tissue shape.
The main parameter of the method is β, used to determine edge weights in (1). In this study, the values β ∈ {0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000} were tested. In the case of ChebNet, for β values below 100, no significant influence on image segmentation results was observed. However, for values higher than or equal to 100, the segmentation result became noisier with the increasing value of the parameter. The GCN mostly failed to produce consistent results for the highest considered value of β. All superpixels were predominantly assigned one label, with the scribbles clearly visible and the other labels randomly distributed over the image. SageNet remained insensitive to all considered values of parameter β.
An additional parameter of the ChebNet variant of the considered GCN architecture is the order k of the Chebyshev polynomials used to approximate spectral filters that learn on k-hop neighborhoods of the graph. The best results (with regard to the image segmentation accuracy measures) were obtained for k = 2. These results were presented in Section V. In the case of unambiguous scenes, for k = 1, the ChebNet was more sensitive to local intensity variations, in some cases producing a small amount of noise. Increasing k decreased both the resulting noise level and the level of detail in the segmentation result, with the borders of regions becoming softer and smoother. However, increasing the value of k may be necessary to perform the segmentation of unambiguous regions. These two effects can be observed in Fig. 10.
A property of the proposed approach is that it does not preserve the connectivity of the resulting regions, which can be either an advantage or a disadvantage depending on the application. Notably, the GCN assigns one label to all regions similar to the region coarsely indicated by the user, regardless of their location in the image and their connectivity to the seed region. As a result, it can extract objects split into disconnected regions. This effect (seen, e.g., in Fig. 3f, Fig. 5ab, or Fig. 8) is stronger for the spectral variant of the GCN. It also distinguishes the GCN-based approach from the Random Walker, which always outputs a number of regions equal to the number of scribbles and does not assign a label to a region that is disconnected from a scribble (see, e.g., Fig. 8b).
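The link between the Chebyshev order k and the k-hop neighborhood mentioned above can be illustrated with the standard Chebyshev recurrence T_n = 2 L~ T_{n-1} - T_{n-2} on the scaled graph Laplacian L~. The following is a minimal NumPy sketch, not the network used in this study: the learned filter coefficients are omitted, and the function names and the toy path graph are illustrative assumptions.

```python
import numpy as np


def scaled_laplacian(adj):
    """Return L~ = 2 L / lambda_max - I, with L the symmetric normalized Laplacian."""
    n = len(adj)
    deg = adj.sum(axis=1)
    # Inverse square root of the degrees; isolated nodes get 0.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(n)


def chebyshev_basis(adj, x, k):
    """Return [T_0(L~) x, ..., T_k(L~) x] computed with the Chebyshev recurrence."""
    lap = scaled_laplacian(adj)
    terms = [x]
    if k >= 1:
        terms.append(lap @ x)
    for _ in range(2, k + 1):
        terms.append(2.0 * lap @ terms[-1] - terms[-2])
    return terms


if __name__ == "__main__":
    # Toy region adjacency graph: a path of 5 nodes, signal placed on node 0.
    adj = np.zeros((5, 5))
    for i in range(4):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    x = np.zeros(5)
    x[0] = 1.0
    for order, term in enumerate(chebyshev_basis(adj, x, 3)):
        reached = np.flatnonzero(np.abs(term) > 1e-9)
        print(f"order {order}: farthest node reached = {reached.max()}")
```

In this toy example, the term of order n reaches nodes at most n hops away from the seed node, so a filter built from terms up to order k aggregates information from a k-hop neighborhood. This is consistent with the smoother region borders reported for larger k.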
As shown in Fig. 9, the segmentation results obtained for different initial seeds are very similar, which demonstrates that the proposed method depends only moderately on the initialization. Segmentation results of good quality can be obtained both for scribbles accurately covering regions and for scarce seeds (cf. Fig. 9a-c). The method can also perform accurately in the case of very scarce seeds. However, in such a case, image segmentation accuracy may deteriorate for some images with strongly heterogeneous regions, especially for the ChebNet variant of the proposed approach. SageNet remains resistant even to very scarce seed initialization (cf. Fig. 9d).
A limitation of the GCN-based approach is that it cannot complete a segmentation task when similar regions are assigned different labels. This effect can be observed in Fig. 11, where both ChebNet (Fig. 11b) and SageNet (Fig. 11c) failed to extract the regions of interest, while the Random Walker achieved this goal (Fig. 11d).
Depending on the image resolution and the complexity of the segmented scene, image segmentation took up to a second for the spatial variant (SageNet) and from several seconds to three minutes for the spectral variant (ChebNet), with computations performed on a GPU. For the ChebNet, the computation time increased significantly with the Chebyshev polynomial order. However, the computation time can be decreased by reducing the number of superpixels. No computer memory limitations were observed regardless of image resolution (the resolution of the images considered in this study did not exceed 512 × 512 pixels).

VII. CONCLUSION
The proposed method introduces graph convolutional networks to semi-supervised image segmentation. The approach is universal and performs equally well when applied to images of different characteristics. As a result, it seems to be a noteworthy alternative to existing graph-based segmentation methods. Notably, the proposed GCN-based image segmenter can be helpful when precise delineation of object shape is required, fine shape details need to be preserved, or disconnected regions of similar characteristics need to be extracted.
Of the considered graph convolution operators, the spatial variant seems to be the better choice. Although both variants achieved similar image segmentation accuracy, the spatial approach was much more efficient in terms of computation time.
Although the GCN-based approach performs well in binary and multi-label segmentation tasks, there is still room for improvement. One direction is the development of dedicated graph weighting functions. The function used in this work considers only region color or intensity differences. Extending the weighting function with a factor that considers the spatial location of regions, together with a mechanism to control the influence of this factor, could allow adjusting the balance between region similarity and the compactness of segmentation results. This issue will be the main objective of future work.