Image Classification of Leukemic Cells Using Invariants of Triangle-Free Graphs as Synthetic Features

Development of efficient methods of the cellular image processing is an important avenue for practical application of modern artificial intelligence techniques. In particular, practical hematology requires automatic classification of images with or without leukemic (blast) cells in peripheral blood smears. This paper presents a new approach to the problem of classification of such cellular images based on graph theory, XGBoost algorithm and convolutional neural networks (CNN). Firstly, each image is transformed into a weighted graph using gradient of intensity. Secondly, a number of graph invariants are computed thus producing a set of synthetic features that is used to train machine learning model based on XGBoost. Combining XGBoost with CNN further increases the accuracy of leukemic cell classification. Sensitivity (TPR) and Specifity (TNR) of the XGBoost-based model were 95% and 97% accordingly; ResNet-50 model showed TPR of 95% and TNR of 98%. Combined use of the XGBoost-based and the ResNet-50 models demonstrated TPR of 99% and TNR of 99%.


I. INTRODUCTION
Automation of the processing of cellular images requires a number of image analysis tasks had to be solved.In particular, it is necessary to (1) detect presence of a particular cell type in the image, (2) distinguish between different types of cells in the images, (3) count the numbers of cells of each type and (4) highlight the relevant structures inside the cells.
This work presents a development of image analysis methods applied to classification of blood smear images obtained from patients with acute lymphoblastic leukemia (ALL) with the purpose of detecting leukemic (blast) cells.The practical importance of the methods proposed is obvious since smear analysis is performed only by trained specialists highly qualified in cell morphology.
ALL is a disease of the blood system that results from malignant transformation of B/T lymphocyte precursors [1].It is the most common leukemia in the pediatric population, The associate editor coordinating the review of this manuscript and approving it for publication was Sandra Costanzo .
accounting for up to 80% of cases in this group (compared to only 20% in adults).Early and accurate diagnosis of ALL is important for successful therapy [2].Expert analysis of bone marrow smear images is necessary to establish a diagnosis of ALL.A characteristic feature of the images is the presence of so called ''blast cells,'' a morphologically distinct group of cells including the earliest hematopoietic progenitor cells.Typically, these blast cells comprise 1%-2% of the cells in bone marrow smears [3].The malignant cells remain poorly differentiated, but carry markers of lymphoid and/or myeloid cells.Previously, it was believed that malignant cells, while proliferating, displaced normal hematopoietic cells from the bone marrow.Later, it was shown that the number of normal hematopoietic stem cells does not decrease, but they undergo a blockade of differentiation, resulting in the development of cytopenia [4].Peripheral blood smear may contain more than 20% blast cells [5], [6].
The task of classifying images containing blast cells is complicated by the fact that three different types of lymphoblasts are found in bone marrow biopsy specimens from patients with ALL: (1) small size, high nuclearcytoplasmic ratio, rounded nuclei, delicate chromatin structure and poorly distinguishable nucleoli; (2) medium-sized with varied nucleus shape, delicate chromatin structure; (3) medium to large size with oval or rounded nucleus shape, delicate chromatin structure, one or more well-defined nucleoli, sharp basophilia and vacuolization of the cytoplasm [7].A characteristic feature of the blast cells is their difference in size from lymphocytes which, nevertheless, varies widely [8].These features of images containing blast cells increase the variability of image data, making it difficult to create effective classification algorithms.
This study proposes an approach to classify the images containing the leukemic cells by transforming the images into weighted triangle-free graphs in which the image gradient magnitudes are used as edge weights.The weighted graphs obtained from the images are used to compute several graph invariants which form a feature set that is subsequently used to train a machine-learning ensemble based on the XGBoost model architecture [9].We also explored how the complementing the graph-based model with the ResNet-50 model [10] can improve the results of classification (in particular, on low-resolution images).Low resolution of the blood smear images represents an actual problem for neural networks of complex architectures and also has important practical ramifications since it is not always possible to obtain biomedical images of high resolution.
The remainder of this paper is organized as follows: In Section II we describe similar approaches and publications related to the subject of the problem to be solved.In Section III, we describe the dataset used in the study and the methods developed for preprocessing the data.In Section IV we describe the algorithms proposed, their implementations and overall experimental setup.In Section V we present the results of the algorithms application on real data, compare performance of our algorithms with the performance of the state-of-the-art neural networks.In Section VI we compare our results with the results of earlier studies that used the ALL_IDB1 dataset.Section VII presents finalizing remarks and highlights some ideas for future studies.

II. RELATED WORK
Several attempts have been made to apply various automated image processing methods to solve the problem of detecting leukemic cells in images of blood smears.In particular, paper [11] presents the idea of applying a CNN to classify images from an ALL-related dataset.The authors also proposed a valuable data augmentation approach and described implementation peculiarities of their CNN architecture.The shortcomings of the study are the lack of technical detail and relatively small number of computational experiments.
Closely related to this paper are the works [12], [13] in which the authors suggested the use of more sophisticated approaches based on segmentation combined with CNN and other types of network architectures.The merits of these studies include a large number of tested algorithms (which were based on the application of neural networks of different architectures).A major disadvantage of both of these studies is insufficient description of the solutions proposed, including the approaches to the problem of training neural algorithms on small amounts of image data.
In [14], the authors used a combination (ensemble) of complex neural networks (VGG19, MobileNet, and DenseNet) to solve somewhat similar problem of biomedical image classification (breast cancer biopsy images) over a small set of data without using methods of graph theory.The advantages include an effective combination of several neural networks of complex architecture to perform classification while disadvantages are related to the fact that such combinations produce excessive complexity of the algorithms.
The main advantage of the paper [15] is to extract original image features called ''Energy,'' ''Entropy,'' ''Shannon Entropy,'' ''Log Energy Entropy'', Mean, Variance, Correlation and to use these synthetic features to predict the image classes.The most prominent disadvantage of this approach is relatively low classification accuracy of the algorithms tested.
In [16] the authors used a gradient calculation method based on weighted graphs combined with a Random Forest algorithm.Paper shows that the image gradients constructed with the method proposed can be used as an alternative to the edge maps methods and described an interesting approach of training Random Forest algorithm on gradientbased information.Limitation of the study is due to the fact that it focuses only on the image segmentation tasks not covering classification use cases.Also, it's worth mentioning that the algorithm proposed by authors was not tested on the real-world datasets.Other approaches of this kind were proposed in [17] and [18].
Here, we would also like to discuss the specifics of the papers that were using the same ALL_IDB1 dataset [19] for training and validation of the models developed.In the works [11], [20], and [21], different variants of CNNs were used to detect images with leukemic cells.They achieved accuracies of 96.6%, 98.3% and 96.5%, respectively.The authors of [13] used different set of models including the residual-based neural networks (such as ResNet-50); the accuracy reported was 96.2%.Several other works used different variations of SVM [22], [23], and [24] with reported accuracy of 93.7%, 98.1% and 94%, respectively.Both studies [25] and [26] used variations of the YOLO algorithm for the task of blast cell detection with reported accuracy of 97.2% and 96.1%, respectively.In [27], authors used 4-moment statistical features and several neural networks for building and training the model to achieve 97.1% accuracy on the same ALL_IDB1 dataset.

III. DATA AND PREPROCESSING
The ALL_IDB1 dataset used in this study consisted of 108 microscopic images representing approximately 39000 cells.Images were taken at various magnifications (x300-x500).An example of an image is presented in Fig. 1.Two approaches, raw image processing and preprocessing of the tabular data, were used to prepare the data for training our models.Data augmentation (increasing the image sample using simple transformations such as image rotation, stretching, changing image brightness and adding reflections) was applied in order to combat overfitting [28].For this purpose three basic transformations were applied to the original images: • image rotation by a random number between 0 and 20 degrees; • brightness alteration is in the range of [0.15;0.3],where 1 indicates the maximum brightness of the image; • approximation (zooming) of the image.
For each original image from the dataset of 108 images, additional images were generated by applying all of the three transformations which resulted in the formation of a dataset of 3231 images.An example of augmented image is shown in Fig. 2.

A. GENERATING GRAPHS FROM THE IMAGE
The algorithm takes an RGB image with dimensions H × W × D as input, where H is the height of the image in pixels, W is the width, and D is the number of channels (layers) in the image [29].The elements of this array are intensities with values ranging from 0 to 255, where 0 is the minimum and 255 is the maximum intensity.Each pixel in the image corresponds to a vertex of the graph.Edges of the graph were calculated as the connections between adjacent pixels in each layer of the image by translating an array of indices into a 3D mesh and stacking slices of the mesh around the layers.
We demonstrate this process using an example on an imaginary input image I with dimensions of (3, 3, 1).We select and number the vertices from the first to the last pixel in the image; in total, we obtain 3 × 3 × 1 = 9 vertices.The edges generated by connecting adjacent pixels are shown in Fig. 3.In this example, the edges were generated by considering only the x and y directions (because z = 1), and the case of ±1 is considered.In cases with z > 1 there is also a third direction of edge formation (z-axis).The main stages of the graph generation procedure are shown in Fig. 4.

B. CALCULATING THE IMAGE GRADIENTS
The gradient of an image was calculated as the absolute difference in intensity between every two pixels connected by an edge [30].
Next, the vertex indices were computed as follows: where E i is the vertex index of the image edge, n y is the pixel width of the image, n z is the number of image channels.
As a result, we obtained an adjacency matrix of a weighted graph, in which the main diagonal contains the pixel values of the original image and the other values in the matrix are either the resulting gradient values or zeros.This adjacency matrix corresponds to weighted graphs with self-loops and without triangles.An example of such a matrix is shown in Fig. 5.

C. TRIANGLE-FREE GRAPHS
After forming the adjacency matrix as described above, the graphs thus obtained were triangle-free by construction [31].The number of self-loops in such graph G equals to the number of vertices The resulting graphs obtained from the images had a diameter equal to H +W , where H -is the height of the image in pixels and W -is the width of the image.These graphs had a clique number equal to 2. As the image dimensionality increases, the complexity of the graphs rises (an example is presented in Fig. 6).

D. GRAPH INVARIANTS AS SYNTHETIC FEATURES
After preliminary experiments with various graph invariants using the procedures of feature estimation and selection ( The average degree connectivity [33] is the average degree of the nearest neighbor of a vertex with degree k.For weighted graphs, this invariant was computed by calculating the weighted average vertex neighbor degree for vertex i as follows: where s i is the weighted degree of vertex i, w ij is the weight of the edge connecting vertices i and j, and N (i) are the neighbors of vertex i.
The degree assortativity coefficient [34] estimates the tendency of vertices to connect with other vertices with similar features within the graph and was computed as the Pearson correlation coefficient between vertex degrees: where f1 = j f (j) |E| , f2 = i f (i) |E| , i, j ∈ E. The local efficiency [35] was computed for each individual vertex i in the graph by identifying the subgraphs to which vertex i is directly connected.After removing vertex i from the identified subgraph, the shortest path between all the vertices in the subgraph was computed.The inverse of the shortest path from each vertex previously connected to vertex i to every other vertex previously connected to vertex i was then summed over all vertices, which was normalized to consider the total possible number of links that may exist between all vertices previously connected to i. Formally, local efficiency calculated as where N G i is the number of vertices in subgraphs G i and L j,k is the average distance (number of steps) between vertices i and j in the graph.
The average edge weight (across vertices) was defined as the ratio of the sum of the edge weights to the number of vertices in a given graph.
The Estrada index is the invariant that is used in chemical graph theory for topological operations with molecular structures [36] and was computed as follows: where λ 1 , . . ., λ n are eigenvalues of the corresponding adjacency matrix A of the graph G.
During the study, other invariants were also tested.However, they were not included in the final set of synthetic features because of insignificance of their effects on the performance of the algorithm.Among them, we can mention such invariants as the Pagerank [36], Wiener Index [37], Edge Connectivity [38], Node Connectivity [39], Edge Betweenness Centrality [40], and Length of the Sparsest Cut [41].
After converting the augmented image set into graph representations and obtaining the graph-based feature set F, the feature values were standardized to a set of values from a distribution with zero mean and standard deviation equal to 1 [42].

E. APPROACHES TO FEATURE SELECTION
For feature selection, we used the Kolmogorov-Smirnov test statistic (D n,m ), feature importances from the XGBoost model (weight importances), and Shapley Values (mean SHAP values).
D n,m can be used to estimate the significance of the differences between two discretely presented distributions of the variable, with the null hypothesis that samples are drawn from the same distribution [43].Let F n (x) be an empirical distribution function of a variable on X = (X 1 , . . ., X n ) .Then where I X i defined as following [44]: Statistics for the function F n (x) is defined as the Kolmogorov's maximal deviance: In case of two-sample statistics, the equation transforms into the following: We used the two-sample statistic D n,m to identify features significant for the model (comparing the distributions of each feature to the target class variable) and to determine the degree of confidence in the significance of the feature based on the value of the statistic [45].
Apart from D n,m , we used the feature importances obtained within the XGBoost model.Specifically, we tested three estimates of feature importances: ''weight'', ''gain'' and ''coverage''.When using weight, we calculated the number of times a particular feature divided the data across all decision trees in the algorithm.Gain was defined as the average information gain of the tree splits that utilize the features.Cover was defined as the average coverage of the tree splits of the feature, while coverage represents the sample number affected by the tree split [46], [47].Preliminary experiments showed that ''weight'' score proved to be the most accurate for feature evaluation in our case.
To evaluate the set of graph invariants we also used SHAP (SHapley Additive exPlanations) -a characteristic based on the results from game theory that can be used for feature importance calculations [48].The SHAP of a feature was calculated as follows: where S is a subset of the features used in the model, val -the value function of S and p is the number of features.
where val x (S) is prediction for the values in S.

F. IMPLEMENTATIONS OF XGBOOST AND CNN ALGORITHMS
For training XGBoost model we used a set of features obtained as the result of testing various graph invariants.An early stopping procedure was also used in the model; the algorithm stopped training if the loss parameter did not decrease within five consecutive iterations.The model was trained using 100 estimators, a learning rate of 0.1 and a maximum depth of 3.
We have used the classical version of the pre-trained ResNet-50 with ImageNet weights.For fine-tuning using our dataset, an Average Pooling layer, a fully-connected layer of four neurons and a dropout layer were added.We have used Binary Cross Entropy as the loss function and RMSProp as the optimizer.The output layer used the softmax activation function for class probability prediction.Softmax function was compared with sigmoid function but we found no evidence that sigmoid function outperforms softmax in terms of evaluation metrics we used or in terms of computation speed.The CNN was trained for 100 epochs; early stopping was added to control overfitting.Training stopped earlier if the loss rate on validation did not decrease for 5 consecutive epochs.
The predictions of the two models were averaged so that the final result was constructed based on the average probability of the classes from the models.Consider an example in which an algorithm produces five results: the first model M 1 gives a set of predictions p1, . . ., p5, and the second model M 2: q1, . . ., q5, so that By setting any value of weight w in the interval 0 < w < 1, we denote the combined ensemble prediction as Then we obtain that The value of parameter w = 1/2 was considered to be the optimal value and was further used for comparing approaches and algorithms [49].

G. EVALUATING THE ACCURACY OF THE ALGORITHMS
The developed algorithms were evaluated by dividing the dataset randomly in the ratio of 70% to 30% into training and test subsets, respectively.The common functional indicators for evaluating performance of binary classification (Accuracy, Precision, Recall, F1-Score and Area under ROC curve, AUC) were calculated for the evaluation of the algorithms [50].
where TPR = TP TP+FN and FPR = TN TN +FP .The code was written in Python 3.10.12(using Anaconda of version 23.3.1).
We share all of the results and the code at https://github.com/ds-muzalevskiy/graph-invariants-leukemia-detection.

V. EXPERIMENTS A. OVERALL SETTING
As part of the computational experiment, we solved the binary classification problem of detecting images with the blast cells.First, we augmented the original ALL_IDB1 dataset and obtained a set of 3231 images.Each image in this augmented image set was also compressed to 32 × 32 pixels.Secondly, after augmentation and rescaling the images were subsequently used in two ways: 1) the images were transformed into weighted graphs and graph invariants were calculated as described in SectionIV which, in turn, were used for training the XGBoost model; 2) the images were directly used for training the ResNet-50 CNN algorithm.We also evaluated and compared the results of these two models (graph-invariant XGBoost model and CNN) and their combined performance.

B. INVARIANTS' SELECTION AND EVALUATION
We generated graphs by considering the connection of adjacent pixels as of ±1, ±2 and ±3 types thus obtaining three sets of graphs and then computing the same set of 11 invariants for the graphs in these three sets.The set of 11 invariants included the Average Edge Weight, Average Degree Connectivity, Local Efficiency, Global Efficiency, Degree Assortativity Coefficient, Average Neighbor Degree, Average Edge Weight (across vertices), Estrada Index, Wiener Index, Pagerank and Degree Centrality.Subsequently, we calculated invariant evaluation functionals using weight importances, mean shap values (also calculated using the XGBoost model) and D n,m .

1) ±1 GENERATED GRAPHS
The set of the most important invariants for the ±1 graphs was formed by the Local Efficiency, Degree Assortativity, Average Edge Weight, Average Edge Weight (across the vertices), and Average Degree Connectivity.According to the values of SHAP and D n,m , the Estrada Index can also be considered to be important for the model (Table 1).Low values of Local Efficiency (Fig. 7) correlated with a higher chance of obtaining the 0 class in the model (images without leukemic cells).On the contrary, higher values for the Degree Assortativity invariant were related to a higher likelihood of the model predicting the 1 class (images with leukemic cells).Additionally, higher values of the Average Edge Weight and Average Edge Weight (across the vertices) are related to 0 class prediction.

2) ±2 GENERATED GRAPHS
For the ±2 generated graph invariants, the most significant features were the Average Edge Weight (across the vertices), Average Degree Connectivity, Degree Assortativity and the Estrada Index.We can also notice that Global and  Local Efficiencies also play a role in the overall model's performance.According to Table 1, Degree Assortativity and Estrada Index have the highest impact on the model output.

3) ±3 GENERATED GRAPHS AND THE FINAL SET OF INVARIANTS
As n increases from 1 to 3, the graph begins to lose its edges.This explains the differences in feature importances for different n values (Fig. 8): for the ±3 generated graphs, the most significant features were Local Efficiency, Degree Assortativity, Average Edge Weight (across the vertices) and Average Degree Connectivity (Table 1).
To form the final set of the most significant features we took the six most important features (according to the three functionals we used) of ±1 graph invariants and added one of the most important features from the sets of ±2 and ±3 graph invariants.The complete list of features contained the following features from ±1: Local Efficiency, Degree Assortativity, Average Edge Weight, Average Edge Weight (across the vertices), Average Degree Connectivity, Estrada Index, Degree Assortativity from ±2 and Local Efficiency from ±3 graphs.

C. RESULTS OF THE IMAGE CLASSIFICATION EXPERIMENTS
In addition to ResNet-50 and graph-invariant XGboost models, that formed the basis of the study, a series of experiments were conducted with a set of different boosting algorithms for the tabular data: AdaBoost (ADA) [51], Gradient Boosting (GB) [52], LightGBM model [53], and NGBoost [54].The Table 2 shows the results of standalone graph-invariant XGBoost (GXGB), ResNet-50 (RES-50), and the combined ensemble of the two models (COMB).The results of the evaluation of the models are also presented in the Figure 9.
After performing a standalone comparison of the performance of the graph-invariant XGBoost and Resnet- 50, we tested a hybrid method, an ensemble constructed from these two models.After obtaining the augmented dataset and dividing the data into training and test sets, the algorithms were trained in parallel -XGBoost on the computed graph invariants and ResNet-50 on the augmented images.After each model made class predictions, the probabilities were averaged and used as a final estimate of model performance.The results of the evaluation of this ensemble model are shown in Figure 10; the validation curve for the graph-invariant XGBoost model is displayed in Figure 11.

VI. DISCUSSION
In this section, we compare our proposed algorithms with the results of algorithms applied in earlier studies which used the same ALL_IDB1 dataset.We summarized the results of these comparisons in the Table 3.We used Accuracy metric for comparison for two reasons.Firstly, in ALL_IDB1 dataset we operated with balanced classes, so in this case we can use Accuracy as an evaluation metric similar to F1-Score.Secondly, not all papers used all of the evaluation metrics that we applied (F1-Score, Precision, Recall, TPR and FPR).Its apparent from the data in the Table 3 that the combined use of XGBoost and ResNet-50 might improve the quality of the image cells classification.It is also worth noting that all of the ALL_IDB1-based studies cited used the original (high resolution) images (up to 2592 × 1944).
For XGBoost training in the present work we used highly compressed images of size 32 × 32 as inputs, i.e.  very low resolution images.It's common knowledge that low resolution of input images significantly aggravates the performance of neural networks of complex architecture.The main advantage of our proposed method is that using the graph invariants we managed to increase the amount of useful information extracted during training on images of low quality and low dimensionality.In addition, our combined (ensemble) method might outperform similar approaches that use exclusively high-resolution images thereby showing the promise of further in-depth application of graph invariants in the task of medical image classification.To the best of our knowledge, this work is the first to use this type of weighted triangle-free graph construction to generate specific invariants based on edge weights, shortest paths and degrees of vertices and their neighbors for further training of machine learning models and subsequent enrichment and improvement of neural networks of complex architecture.
The major limitation of the proposed method is its relatively high computational complexity.The invariants related with the Maximum flow problem (such as Node Connectivity and Edge Connectivity) as well as the invariants based on the calculation of all-pairs shortest paths in the graph such as Edge Betweenness Centrality are the most computation-consuming with computational complexity exceeding O(n 2 log n).However, possibility of using the low-resolution images for generating synthetic features based on these invariants does alleviate the computation costs.

VII. CONCLUSION
In this study, we considered the problem of converting images into weighted graphs, calculating invariants, and trained several image classification algorithms.During the computational experiments on a real-world image dataset it was shown that the method proposed can improve the performance of the network architecture ResNet-50.Further steps of research may include analyzing different approaches to edge extraction in images and obtaining graphs from them (such as the use of Sobel filter [55] or Canny detector [56]), which might further improve the results along with more accurate detection of the edges and boundaries in the image [57].Mathematical problems related to changes in graph structure as the image size increases (e.g., estimating the growth of connected components) seem to be a promising research direction and an interesting theoretical problem [58].

FIGURE 1 .
FIGURE 1. Example of a microscopic blood image.Lymphocytes (purple) and erythrocytes (most of the cells in the figure) are shown.

FIGURE 2 .
FIGURE 2. Example of an augmented image.Rotation, zooming and brightness adjustment operations were applied to the original image.

FIGURE 3 .
FIGURE 3. Process of forming edges inside the image pixels.Connecting vertices ±n (in this example ±1 is used) along the x, y , z axes ensures that there will be no triangles in the newly generated graph.

FIGURE 4 .
FIGURE 4. Process of the graph generation: a) Input image as image pixel intensities; b) Vertices formed based on each pixel of the input image; c) Edge creation in x-direction, in y-direction and the overall set of edges.

FIGURE 5 .
FIGURE 5. Adjacency matrix with image gradient magnitudes.The main diagonal (marked in orange) contains pixel intensities of the original image, while the other values are the magnitudes of the gradients.

FIGURE 7 .
FIGURE 7. Estimates of the significance of the synthetic features for ±1 graph invariants.

FIGURE 10 .
FIGURE 10.Confusion Matrices for both standalone models and the ensemble approach.a) Graph invariants XGBoost model; b) Resnet-50 model and c) Ensemble model.

FIGURE 11 .
FIGURE 11.Validation Curve for the graph-invariant XGBoost model.F1-Score was used as evaluation score metric.Training and cross validation scores were displayed.

TABLE 1 .
Results of evaluation of significance of invariants for ±1, ±2 and ±3 graphs.Abbreviations: xgb-imp is Weight Importances, shap-val is Mean Shap Values and D n,m is Kolmogorov-Smirnov statistics.

TABLE 3 .
Comparison of the model performances from the different papers based on ALL_IDB1 dataset.