An Explainable AI and Optimized Multi-Branch Convolutional Neural Network Model for Eye Anemia Diagnosis

The proposed MultiBranchNet model is designed to extract deep, distinctive features from eye images, enabling superior classification performance for non-invasive anemia ...

Abstract:

This paper proposes a novel, non-invasive approach to diagnosing eye anemia using deep learning techniques. Traditional methods, reliant on invasive procedures like venipuncture, are costly and can cause patient discomfort. Our model leverages a multi-branch convolutional neural network (CNN) architecture, incorporating the Hippopotamus Optimization (HO) algorithm and multiclass support vector machines (SVMs) for enhanced accuracy. To address data imbalance, we employ the Synthetic Minority Oversampling Technique (SMOTE) and data augmentation. The model is trained and evaluated on a dataset of 211 eye images. The model achieves a remarkable 97.06% accuracy, with a Receiver Operating Characteristic (ROC) curve demonstrating an Area Under the Curve (AUC) of 0.973, indicating strong discriminative power. The parallel branch CNN architecture significantly improves training speed and reduces inference time. Furthermore, t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization effectively clusters data points, showcasing the model’s ability to distinguish between anemic and non-anemic cases. To ensure model transparency and reliability, we utilize the SHapley Additive exPlanations (SHAP) method to understand feature importance. This non-invasive approach holds significant promise for early and efficient anemia detection, particularly in resource-constrained settings.

Published in: IEEE Access ( Volume: 13)

Page(s): 71840 - 71857

Date of Publication: 15 April 2025

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2025.3560689

SECTION I.

Introduction

Anemia is a global health concern significantly impacting populations worldwide, encompassing both developed and developing nations. Characterized by a deficiency in red blood cells or hemoglobin, anemia impairs oxygen transport, leading to symptoms like fatigue, weakness, and dizziness [1]. The World Health Organization (WHO) estimates that approximately one billion individuals globally suffer from anemia in varying degrees of severity [2].

Anemia can manifest in various forms, with iron, folate, vitamin B12, or hemoglobin deficiencies often disrupting erythrocyte production [2]. While mild anemia may initially be asymptomatic, the body’s compensatory mechanisms eventually fail, leading to the emergence of more pronounced symptoms. A specific manifestation of anemia, known as eye anemia or ocular anemia, affects the eyes by limiting oxygen supply to tissues, particularly the retina. Symptoms can include blurred vision, eye fatigue, and, in severe cases, retinal hemorrhages. Early diagnosis and appropriate management are crucial to mitigate the consequences of anemia, including its ocular manifestations. Recent advances in IoT-based healthcare solutions, such as those developed in [49], demonstrate how technology can significantly improve treatment adherence and monitoring, an approach that could be adapted for anemia management.

Treatment strategies typically involve addressing the underlying cause, such as dietary adjustments, supplementation, or treatment of underlying medical conditions [5], [6]. The global impact of anemia is substantial, extending beyond individual health to encompass significant social and economic consequences. Eye anemia further exacerbates these challenges, affecting vision and daily activities, thereby impacting productivity and quality of life [6]. The management of anemia, including prevention, diagnosis, and treatment, poses a significant economic burden on healthcare systems worldwide. Recent advancements in unified healthcare data systems, such as those in [48], demonstrate how integrated data management can revolutionize patient care and diagnostic efficiency. Inspired by such innovations, this work aims to develop a non-invasive method for hemoglobin measurement that can be easily deployed in resource-limited settings.

Deep learning (DL) techniques have been extensively and effectively employed in several domains, particularly in medicine, including eye anemia evaluation [7], [8]. Recent advancements in multi-deep transfer learning for cervical cancer cell detection highlight the potential of sophisticated DL architectures in medical diagnostics [46]. One of the main DL algorithms is the Convolutional Neural Network (CNN), which has shown exemplary performance in medical image classification, including the detection and analysis of anemia in eye images [9]. The CNN is a powerful DL method capable of analyzing several attributes or objects inside an image and accurately differentiating between anemic and non-anemic eyes. A CNN employs a hierarchical architecture in which the network is constructed in the form of a funnel and ends in a fully connected layer, where all neurons are interconnected and the output is processed. The performance of neural networks is determined by several key factors, including memory consumption, the number of network parameters, accuracy, and speed. Network scaling aims to optimize the structure of a model by considering these factors.

In this paper, a new CNN architecture is proposed that incorporates three modules, each consisting of two branches working in parallel, to make the classification model robust [10], [11]. The parallel CNN (PCNN) design uses fewer parameters and layers than a comparable sequential CNN structure. Five convolutional layers are placed in parallel, which effectively acts as a single convolutional layer but performs like five layers [11]. This parallel arrangement lowers the overall parameter count and complexity of the network. It shortens the model execution time and improves accuracy by extracting features in the branches. Despite having fewer layers and parameters, the parallel CNN can extract the most discriminant features from the preprocessed Eyes-defy-anemia images. These features contribute to the high classification performance of the subsequent extreme learning machine (ELM) classifier. The parallel CNN architecture, combined with techniques like dropout and batch normalization, helps reduce overfitting and improves the model’s generalization ability [11]. Additionally, this architecture offers the benefit of incorporating a feature expansion system that transfers the extracted characteristics from the branches to a multi-level distribution. This enables communication between branches at various levels within the layers. To handle the imbalanced dataset, this paper employs the Synthetic Minority Oversampling Technique (SMOTE) to obtain a balanced class distribution [12]. Because the image dataset is small, training an accurate model is a challenge. Therefore, data augmentation methods, such as scaling, rotation, and translation, are applied to the original image dataset [13].

The paper’s key contributions include:

The proposed model combines a multi-branch CNN architecture (MultiBranchNet) for efficient feature extraction, the SMOTE technique and data augmentation used to address data imbalance, and HO with multiclass SVM for enhanced classification accuracy.
The parallel structure of the proposed model improves training speed and reduces inference time. HO fine-tunes the SVM hyperparameters, achieving 97.06% accuracy on the test set.
The model offers an accessible alternative to invasive diagnostic methods requiring repeated lab tests. Its high accuracy and sensitivity show potential for effective anemia screening, especially in resource-limited settings.
We provide a foundation for future research on clinical validation and integration of the model into diagnostic workflows.
We employ XAI to explain the results of the model by applying SHAP explainability.

The paper is organized into six sections. Section II performs a comprehensive literature review on anemia diagnosis employing Machine Learning (ML) and Deep Learning (DL) and summarizes the limitations in a table. Section III provides the preliminaries, like the essential concepts for CNN, SVM, and Hippopotamus Optimization. Section IV describes the details of the proposed model. Section V discusses the experiment results. The conclusion of the paper summarizes the key takeaways and potential areas for future exploration in Section VI.

SECTION II.

Related Work

Recently, a number of studies have addressed hemoglobin deficiency (anemia) and its effect on vision and overall eye health (eye anemia). These studies focused on developing non-invasive methods for identifying anemia from conjunctiva images using ML and DL models, which have proven to be successful. The fundamental assumption of this line of work is the significant association between the color of the palpebral conjunctiva and hemoglobin values [14]. An automated system employing computer vision and machine learning methods was developed by Zhang et al. [15] to detect anemia from conjunctiva images. After being trained on a dataset consisting of 362 conjunctiva images, the system attained an accuracy of 82.37%. Jain et al. [16] introduced a semi-automated method for detecting anemia using eye conjunctiva images. They utilized a dataset of 99 images, which was augmented using operations such as rotation, translation, and mirroring. Because of the uneven shape of the conjunctiva, they applied image segmentation to extract the region of interest (RoI). Asare et al. [17] analyzed pallor and diagnosed anemia by applying machine learning methods to conjunctiva images. They employed an available dataset containing 710 conjunctiva images. These images were obtained using a specialized instrument that effectively removes any potential influence of ambient light. They developed a conjunctiva detection model and an anemia diagnosis model by combining a CNN, Logistic Regression, and a Gaussian Blur technique. In addition, Magdalena et al. [18] used a CNN on eye conjunctiva images for anemia detection and achieved an accuracy of 94%. Moreover, Delgado-Rivera et al. [19] employed a CNN to classify conjunctiva images as anemic or non-anemic after segmentation. The CNN achieved a sensitivity of 77.58% in comparison with the results obtained from laboratory testing.
Agrawal [20] proposed an automated method for detecting anemia from retinal images using deep learning. The study used a dataset containing 400 images, and the proposed model achieved an accuracy of 62.23% and a precision of 59%. Dimauro et al. [21] proposed an intelligent system based on machine learning to diagnose anemia and then employed MobileNet-V2 on two different datasets, from Italy and India. The accuracy of the MobileNet-V2 model was 91% on the Italian dataset and 68% on the Indian dataset. Dhalla et al. [22] employed five pretrained segmentation models, namely FCN, PSPNet, LinkNet, UNet, and UNet++, for conjunctiva segmentation, optimized on a customized dataset. Their experiments were conducted on a specially constructed dataset consisting of 2592 palpebral images from the pediatric population. Their findings demonstrate that the LinkNet architecture performs best, scoring 94.17% for accuracy, 90% for Intersection over Union (IoU), and 93% for the Dice score. Table 1 summarizes the related works. Given the importance of diagnosing eye anemia and the limitations found in previous studies, this paper proposes a new CNN architecture called MultiBranchNet, which consists of three modules, each with two branches working in parallel, for non-invasive diagnosis.

TABLE 1 Related Works and Their Results

SECTION III.

Material and Methods

A. Convolutional Neural Network

The objective of a CNN is to learn increasingly complex features in the data through convolution. CNNs are mostly used for the analysis of visual images and represent the latest advancements in segmentation, object detection, image classification, and other image processing tasks. The structure is composed of multiple layers that include local spatial connectivity and sets of neurons with shared parameters. A typical CNN architecture consists of three key layer types: convolutional layers, pooling layers, and fully connected layers. The convolutional layers extract local features from the input data through the application of filters (kernels). These filters learn to detect specific patterns or edges within the data. Subsequently, pooling layers downsample the feature maps generated by the convolutional layers, reducing computational complexity and data dimensionality. Finally, fully connected layers operate similarly to traditional neural network hidden layers, where neurons are fully connected and perform classification or regression tasks based on the learned features, as shown in Figure 1.

FIGURE 1.

CNN architecture.

Using a CNN architecture will decrease the hardware requirements for storage and accelerate the training process, especially when dealing with many parameters, in contrast to alternative architectures available. CNN possesses the ability to process raw inputs and extract distinctive features from these inputs. Figure 1 illustrates the connection between each neuron in the input layer and the neurons in the hidden layer in standard neural networks. However, in a CNN, only a limited area of the input layer is connected to neurons in the hidden layer [23], [24], [25].
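
As a rough illustration of the convolution and pooling steps described above, the following pure-Python sketch applies one filter to a tiny image and then max-pools the result. The 5×5 patch and the hand-written edge kernel are hypothetical; in a trained CNN the kernel weights are learned, and the operation shown is the unflipped sliding-window product commonly used in deep learning frameworks.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 grayscale patch with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge kernel (illustrative, not a learned filter).
kernel = [[-1, 1], [-1, 1]]

features = conv2d_valid(image, kernel)  # 4x4 feature map, responds at the edge
pooled = max_pool2d(features, size=2)   # 2x2 map after downsampling
print(features)
print(pooled)
```

Each output value depends only on a 2×2 neighborhood of the input, which is the local connectivity the paragraph contrasts with fully connected layers.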

B. Support Vector Machine

The Support Vector Machine (SVM) is a widely recognized ML algorithm employed for classification tasks. The algorithm operates by separating two data classes with a hyperplane while maximizing the margin between them. The data points closest to this hyperplane, which determine its position, are called support vectors. Whether an SVM performs linear or non-linear classification is determined by the selection of the kernel function. In addition, it is capable of processing both single-class and multi-class classifications. Nevertheless, SVM can consume a substantial amount of resources, including memory and training time. Regular retraining at varying periods is advantageous for capturing changing user behaviors. The kernel function and its parameters have a major impact on the performance of a classifier [26]. When considering SVM, three essential hyperparameters are identified: the method for handling multiclass classification, the level of the box constraint, and the type of kernel. The choice of kernel type, such as linear, quadratic, cubic, or Gaussian RBF, determines how the data is transformed before classification. The box constraint level bounds the values of the Lagrange multipliers, which in turn affects the overall training time and the number of support vectors. Optimal selection of these hyperparameters is crucial for an effective classifier. SVM can address multiclass problems by transforming them into binary problems with either a one-versus-all or a one-versus-one strategy. Whereas the one-versus-all strategy trains a distinct SVM for every class against all other classes, the one-versus-one strategy trains an SVM for every pair of classes individually. SVM is a kernel-based method that offers excellent classification performance. It can use several types of decision boundaries, such as linear, quadratic, cubic, or Gaussian, and can be adapted to datasets of varying sizes [27].
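
To make the two multiclass decompositions concrete, the sketch below only enumerates the binary sub-problems each strategy would create; it does not train any SVM. The four class labels are hypothetical placeholders.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary task per class: that class against all remaining classes."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def one_vs_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


# Hypothetical label set, used only to show how the task counts differ.
classes = ["A", "B", "C", "D"]

ova = one_vs_all_tasks(classes)  # K binary classifiers
ovo = one_vs_one_tasks(classes)  # K * (K - 1) / 2 binary classifiers
print(len(ova), len(ovo))
```

For K classes, one-versus-all needs K classifiers and one-versus-one needs K(K−1)/2, so the choice of strategy is itself a hyperparameter that trades training cost against the size of each sub-problem.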

C. Hippopotamus Optimization Algorithm

The Hippopotamus Optimization (HO) algorithm is used to fine-tune the hyperparameter selection for the multiclass SVM models. The HO algorithm is based on three well-known behavioral characteristics observed in the life of hippopotamuses. The hippopotamus, a fascinating creature native to Africa [28], falls within the classification of vertebrates. These animals, which are adapted to live both in water and on land, mainly inhabit aquatic environments like rivers and ponds. They live in clusters or groups, with a population ranging from 10 to 30 individuals [29]. Hippopotamuses are herbivores and depend on food that includes leaves, grass, stems, branches, plant husks, reeds, and flowers. They possess a strong inquisitiveness and actively investigate other sources of food. Due to their robust jaws, aggressive behavior, and territoriality, they rank among the most dangerous mammals on the planet. Predators do not often target adult hippopotamuses because of their impressive power and size. Both debilitated adults and young hippopotamuses are susceptible to predation by Nile crocodiles, lions, and spotted hyenas [30]. When they are under attack, they display defensive behavior by turning towards the attacker and producing loud vocalizations. If the defensive approach proves ineffective, they retreat rapidly, often moving towards nearby water bodies.

1) Mathematical Modeling

The HO algorithm is a population-based optimization method that uses hippopotamuses as search agents, that is, as candidate solutions for the optimization problem. The position of each hippopotamus in the search space corresponds to values for the decision variables. Every hippopotamus is denoted as a vector, and the hippopotamus population is formally described by a matrix. As in conventional optimization algorithms, the initialization phase of HO entails creating random initial solutions. In this stage, the vector of decision variables is generated by applying the following equation: $\begin{align*} x_{i}: x_{ij} &= low_{j} + r\left(upp_{j} - low_{j}\right), \\ i &= 1,2,\ldots,N, \quad j = 1,2,\ldots,m \tag{1}\end{align*}$ where $x_{i}$ denotes the location of the i-th candidate solution and r is a random number between 0 and 1. The lower and upper boundaries of the j-th decision variable are denoted by $low_{j}$ and $upp_{j}$, respectively. N is the size of the hippopotamus population in the herd, and m is the number of decision variables in the problem. Equation (2) defines the population matrix: $\begin{align*} x = \left[\begin{array}{c} x_{1} \\ \vdots \\ x_{i} \\ \vdots \\ x_{N} \end{array}\right]_{N \times m} = \left[\begin{array}{ccccc} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \cdots & x_{i,j} & \cdots & x_{i,m} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,m} \end{array}\right]_{N \times m} \tag{2}\end{align*}$
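
The initialization in Equations (1) and (2) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-dimensional search box is a hypothetical example of SVM hyperparameter ranges, and drawing a fresh r for every component is an assumption.

```python
import random


def init_population(n_agents, low, upp, rng=None):
    """Eq. (1): x_ij = low_j + r * (upp_j - low_j), with r drawn from [0, 1)."""
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (up - lo) for lo, up in zip(low, upp)]
        for _ in range(n_agents)
    ]


# Hypothetical bounds for two hyperparameters (illustrative values only).
low = [0.01, 0.001]
upp = [100.0, 10.0]

# Eq. (2): the population is an N x m matrix, one row per hippopotamus.
population = init_population(n_agents=6, low=low, upp=upp, rng=random.Random(0))
for row in population:
    print(row)
```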

2) Hippopotamus Phases

The three phases of the HO algorithm, described below, derive inspiration from three notable behavioral patterns identified in the life of hippopotamuses [31].

a) Phase 1: Position Update of the Hippopotamuses in the River or Pond (Exploration)

Equation (3) gives the mathematical model for the position of the male hippopotamus members of the herd within a pond or lake [31]. $\begin{align*} x_{i}^{Mhippo}: x_{ij}^{Mhippo} &= x_{ij} + y_{1}\left(\mathrm{Dhippo} - I_{1} x_{ij}\right), \\ i &= 1,2,\ldots,\left[\frac{N}{2}\right], \quad j = 1,2,\ldots,m \tag{3}\end{align*}$ where $x_{i}^{Mhippo}$ denotes the male hippopotamus position and Dhippo represents the dominant hippopotamus position.

Equations (4) and (5) give the position updates of the male hippopotamuses and of the female or immature hippopotamuses within the herd, respectively. $\begin{equation*} x_{i} = \begin{cases} x_{i}^{Mhippo}, & F_{i}^{Mhippo} < F_{i} \\ x_{i}, & \text{else} \end{cases} \tag{4}\end{equation*}$ where $F_{i}$ is the objective function value. $\begin{equation*} x_{i} = \begin{cases} x_{i}^{FBhippo}, & F_{i}^{FBhippo} < F_{i} \\ x_{i}, & \text{else} \end{cases} \tag{5}\end{equation*}$

This phase improves global search and enhances the exploration process.
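
A minimal sketch of the male-position update of Equation (3) followed by the greedy acceptance rule of Equation (4) is given below. Several details are assumptions rather than statements from the text: y1 is drawn uniformly from [0, 1), I1 is an integer equal to 1 or 2, the dominant hippopotamus is taken to be the current best agent, and a toy sphere function stands in for the real objective.

```python
import random


def sphere(x):
    """Toy objective to minimise (assumption; stands in for the real cost)."""
    return sum(v * v for v in x)


def male_update(x, dominant, rng):
    """Eq. (3): x_ij + y1 * (Dhippo_j - I1 * x_ij)."""
    y1 = rng.random()        # assumed: uniform in [0, 1)
    i1 = rng.choice((1, 2))  # assumed: integer 1 or 2
    return [xj + y1 * (dj - i1 * xj) for xj, dj in zip(x, dominant)]


def greedy_select(x, candidate, cost):
    """Eq. (4): keep the candidate only if its objective value is lower."""
    return candidate if cost(candidate) < cost(x) else x


rng = random.Random(1)
herd = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(8)]
dominant = min(herd, key=sphere)  # assumed: best agent acts as Dhippo
before = [sphere(x) for x in herd]

half = len(herd) // 2             # Eq. (3) runs over i = 1 .. [N/2]
for i in range(half):
    herd[i] = greedy_select(herd[i], male_update(herd[i], dominant, rng), sphere)

after = [sphere(x) for x in herd]
print(before[:half])
print(after[:half])
```

Because of the greedy rule, no agent's objective value can get worse in this step; the exploration comes from the random pull towards the dominant position.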

b) Phase 2: Hippopotamus Defense Against Predators (Exploration)

Equation (6) represents the predator’s position in the search space [31]. $\begin{equation*} predators: predators_{j} = low_{j} + \vec{r}_{8}\left(upp_{j} - low_{j}\right), \quad j = 1,2,\ldots,m \tag{6}\end{equation*}$ where $\vec{r}_{8}$ denotes a random vector with values ranging from 0 to 1.

Based on Equation (7), if the value of $F_{i}^{HippoR}$ exceeds $F_{i}$, it indicates that the hippopotamus has been hunted. In this case, another hippopotamus will take its position in the herd. Conversely, if the hunter fails to capture the hippopotamus, it will be able to return to the herd. $\begin{equation*} x_{i} = \begin{cases} x_{i}^{HippoR}, & F_{i}^{HippoR} < F_{i} \\ x_{i}, & F_{i}^{HippoR} \ge F_{i} \end{cases} \tag{7}\end{equation*}$

Both the first and second phases work in tandem to effectively reduce the chance of becoming stuck in local minima.

c) Phase 3: The Hippopotamus Evades the Predator (Exploitation)

The hippopotamus’s behavior is modeled by Equations (8) and (9). The hippopotamus changes its position when it discovers a safer position close to its existing one, that is, a nearby position that improves the cost function value [31]. $\begin{align*} x_{i}^{Hippo\varepsilon}: x_{ij}^{Hippo\varepsilon} &= x_{ij} + r_{10}\left(low_{j}^{local} + \eta_{1}\left(upp_{j}^{local} - low_{j}^{local}\right)\right), \\ i &= 1,2,\ldots,N, \quad j = 1,2,\ldots,m \tag{8}\end{align*}$ where $x_{i}^{Hippo\varepsilon}$ is the hippopotamus position explored to identify the closest secure location.

$\eta_{1}$ is a number or a random vector that is picked at random from three possible scenarios $\eta$. $\begin{equation*} x_{i} = \begin{cases} x_{i}^{Hippo\varepsilon}, & F_{i}^{Hippo\varepsilon} < F_{i} \\ x_{i}, & F_{i}^{Hippo\varepsilon} \ge F_{i} \end{cases} \tag{9}\end{equation*}$
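
The exploitation step of Equations (8) and (9) can be sketched as a local random move that is accepted only when it lowers the cost. The text does not define the local bounds or the scenarios for η1, so this sketch makes explicit assumptions: the local bounds are the global bounds divided by the iteration counter t, η1 and r10 are single uniform scalars, and a toy sphere function replaces the real objective.

```python
import random


def sphere(x):
    """Toy objective to minimise (assumption; stands in for the real cost)."""
    return sum(v * v for v in x)


def escape_step(x, low_local, upp_local, rng):
    """Eq. (8): x_ij + r10 * (low_local_j + eta1 * (upp_local_j - low_local_j))."""
    r10 = rng.random()   # assumed: uniform scalar in [0, 1)
    eta1 = rng.random()  # assumed: one possible scenario, a uniform scalar
    return [
        xj + r10 * (lo + eta1 * (up - lo))
        for xj, lo, up in zip(x, low_local, upp_local)
    ]


rng = random.Random(2)
low, upp = [-5.0, -5.0], [5.0, 5.0]
x = [3.0, -4.0]
history = [sphere(x)]

for t in range(1, 51):
    # Assumed shrinking neighbourhood: global bounds divided by the iteration index.
    low_local = [lo / t for lo in low]
    upp_local = [up / t for up in upp]
    candidate = escape_step(x, low_local, upp_local, rng)
    if sphere(candidate) < sphere(x):  # Eq. (9): greedy acceptance
        x = candidate
    history.append(sphere(x))

print(history[0], history[-1])
```

Shrinking the neighbourhood over time is what makes this phase exploitative: late iterations can only propose small moves around the current position.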

D. Explainable Artificial Intelligence Algorithm

An objective of Explainable Artificial Intelligence (XAI) is to develop methods and methodologies that improve the understanding and readability of Artificial Intelligence (AI) models. XAI techniques aim to offer users a clear understanding of the decision-making process of AI models, together with reliable explanations. A comprehensive framework known as SHapley Additive exPlanations (SHAP) can be employed to clarify the results of ML algorithms. It determines the Shapley value of each feature for every instance by employing cooperative game theory, thereby reflecting the influence of the feature on the model’s prediction [32].
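
The game-theoretic idea behind Shapley values can be illustrated with an exact computation on a toy example. This sketch does not use the SHAP library and is not the explanation pipeline of the paper; the feature names and the scoring function are hypothetical. Each feature's value is its marginal contribution averaged over all orderings in which features can be added.

```python
from itertools import permutations


def shapley_values(value, players):
    """Average each player's marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical model score when only the features in `coalition` are known."""
    score = 0.0
    if "redness" in coalition:
        score += 0.5
    if "brightness" in coalition:
        score += 0.25
    if {"redness", "brightness"} <= coalition:
        score += 0.25  # interaction term, split between the two features
    return score


features = ["redness", "brightness", "texture"]
phi = shapley_values(toy_value, features)
print(phi)
```

The values sum to the score of the full feature set, and the unused feature receives zero, which are two of the properties that make Shapley values attractive for attribution. Exact enumeration grows factorially with the number of features, which is why practical tools rely on approximations.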

SHAP values provide a thorough framework for assessing the importance of features and can be used to generate clear explanations for specific predictions. Gradient-weighted class activation mapping (Grad-CAM) [33] is another XAI method, mostly used in computer vision applications. It generates visual explanations by highlighting the specific regions of the input image that have the most significant influence on the model’s prediction. By viewing these heatmaps, users can understand which image regions the model focused on when forming its decision. Layer-wise relevance propagation (LRP) [34] is an XAI method that assigns relevance scores to the input features of a deep neural network. Its objective is to explain how the significance of the prediction is distributed over the input space and the respective contribution of each feature. LRP [35] is applicable to understanding the impact of different input features on the decision-making of a model. An anchor-based method aims to discover rules, or anchors, that are readily comprehensible and effectively explain the model’s behavior.

Anchors are specified as conditions on the input data that are readily understandable to humans and lead to an accurate prediction. The identification of these anchors allows users to acquire further knowledge about the decision boundaries of the model and its behavior in various scenarios. Counterfactual explanations [36] provide hypothetical situations or inputs that, if modified, might result in an alternative prediction of the model. These explanations highlight the specific characteristics or features that have the greatest impact on the model’s decision. Counterfactual explanations can explain the decision-making process of a model and support the reasons behind a specific prediction made by the model.

The goal is to provide users with AI systems that are simple to understand, explainable, transparent, and reliable.

SECTION IV.

The Proposed Model for Eye Anemia Diagnosis

A. Dataset Description

The Eyes-defy-anemia dataset consists of 211 images of eyes. It comprises 113 eye images from Italian patients, obtained in conjunction with a blood sample, and 95 eye images from patients in India, obtained at Karapakkam, Chennai [21]. Blood samples also accompanied the latter images, which became known as the Indian dataset. Each dataset included an Excel file with patient information, including medical information (for example, the hemoglobin concentration level determined from the blood count, expressed in g/dL) and demographic information (for example, age and sex). To determine whether a patient had anemia, this paper employed a threshold value of 10.5 g/dL for the hemoglobin concentration [37]. More specifically, we classified patients with hemoglobin concentrations of 10.5 g/dL or less as anemic, and those with hemoglobin levels greater than 10.5 g/dL as non-anemic. The datasets of patient eye images from India and Italy contained 31 patients (around 32%) and 11 patients (approximately 10%) with anemia, respectively. Additionally, as shown in Table 2, the datasets contained 67 non-anemic patients (roughly 68%) from India and 102 non-anemic patients (nearly 90%) from Italy. Overall, the combined dataset contained 42 patients with anemia, accounting for nearly 20% of the total 211 images.
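
The labeling rule and the class shares quoted above can be checked with a few lines of Python. The threshold and the per-country counts are taken from the text; the function and variable names are illustrative.

```python
ANEMIA_THRESHOLD_G_DL = 10.5  # hemoglobin threshold stated in the text


def label_from_hb(hb_g_dl):
    """Anemic if hemoglobin is at or below the threshold, otherwise non-anemic."""
    return "anemic" if hb_g_dl <= ANEMIA_THRESHOLD_G_DL else "non-anemic"


# Per-country class counts as reported in the text.
counts = {
    "India": {"anemic": 31, "non-anemic": 67},
    "Italy": {"anemic": 11, "non-anemic": 102},
}

total = sum(sum(c.values()) for c in counts.values())
anemic = sum(c["anemic"] for c in counts.values())
anemic_share = round(100 * anemic / total, 1)

print(total, anemic, anemic_share)
print(label_from_hb(10.5), label_from_hb(12.3))
```

With roughly one anemic image for every four non-anemic ones, a classifier that always predicts the majority class would already reach about 80% accuracy, which is why the imbalance has to be handled before training.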

TABLE 2 Dataset Distribution

To mitigate the impact of the small dataset size, we employed several strategies:

Data augmentation techniques, such as rotation, scaling, and translation, were applied to artificially increase the diversity and size of the training set. This helps the model learn more robust features and reduces overfitting [50].
The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to address the class imbalance issue, ensuring a more balanced representation of anemic and non-anemic samples during training [50].
The multi-branch CNN architecture (MultiBranchNet) was designed to efficiently extract a diverse set of features from the limited data, leveraging parallel processing to capture both local and global patterns relevant to anemia detection.
Progressive kernel size scaling (32/16, 64/32, 128/64) was used across modules to create a hierarchical feature pyramid specifically tuned for eye anemia detection.
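
The core idea of SMOTE, creating a synthetic minority sample on the line segment between a minority sample and one of its minority-class neighbours, can be sketched as follows. This is a simplified illustration and not the configuration used in the paper: it uses only the single nearest neighbour (standard SMOTE picks among k neighbours), and the two-dimensional feature vectors are hypothetical.

```python
import random


def nearest_neighbor(x, others):
    """Closest other minority sample by squared Euclidean distance."""
    return min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(x, o)))


def smote_sample(x, neighbor, rng):
    """New point on the segment between a minority sample and its neighbour."""
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


def oversample(minority, n_new, rng):
    """Generate n_new synthetic minority samples (simplified, k = 1)."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        synthetic.append(smote_sample(x, nearest_neighbor(x, others), rng))
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = oversample(minority, n_new=5, rng=random.Random(3))
for point in synthetic:
    print(point)
```

Because every synthetic point is an interpolation of two real minority samples, the new points stay inside the region already occupied by the minority class instead of being exact duplicates.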

B. The Proposed Detailed Model and Its Phases

The main objective of the proposed MultiBranchNet model is to extract deep, distinctive features that allow greater classification ability. DL is an essential class of ML algorithms that draws inspiration from the structure of the human brain. CNN is widely regarded as one of the most powerful deep learning architectures. Research has shown that CNNs are robust to image translation and rotation. The CNN architecture is based on a set of convolution layers, pooling layers, and fully connected layers. The convolution layers extract both high-level and low-level image features, while the pooling layers are employed to decrease the size of the input features. The fully connected layer is then used for classification. The proposed model consists of three phases: (1) the pre-processing phase, (2) the feature extraction phase, and (3) the optimization phase using HO with multiclass SVM. The proposed model is shown in Fig. 2. The integration of these three phases, namely data pre-processing, feature extraction using MultiBranchNet, and optimization using HO with multiclass SVM, forms the core of the proposed intelligent model for non-invasive anemia diagnosis.

FIGURE 2.

The proposed model architecture.

1) Pre-Processing Phase

Before images are input into a CNN model, they must be preprocessed. Initially, the dataset was imbalanced, meaning that there was a significant disparity in the number of instances across classes. This imbalance introduces a bias in the CNN model, causing it to predict the majority class more frequently [38]. In this paper, the SMOTE technique is employed to manage the imbalance in the dataset. In addition, to achieve higher accuracy, the CNN model requires training on a large dataset. A limited amount of data can lead to overfitting [16], because a model fitted to a small number of samples during training does not generalize well at test time.

Image augmentation techniques generate artificial images by applying procedural operations, including rotation, shear, scale, translation, and other similar transformations, to the original images. Obtaining an adequate number of images for an anemia dataset is challenging. Therefore, this paper employs image augmentation techniques to increase the size of the dataset used for training, testing, and validating the model.

Rotation: During this process, the original image is rotated by 90° and 270° to generate artificial images. The mean intensity of the image remains unchanged, regardless of the orientation of the images [39]. Equations (10)-(14) define the mathematical model for rotation in image augmentation. For a rotation angle $\theta$, the new coordinates of a pixel are: $\begin{align*} x^{\prime} &= x\cos(\theta) - y\sin(\theta) \tag{10}\\ y^{\prime} &= x\sin(\theta) + y\cos(\theta) \tag{11}\end{align*}$ For an anticlockwise rotation, the new pixel coordinates (x’, y’) can equivalently be written in vector form with the rotation matrix: $\begin{align*} \left[\begin{array}{l} x^{\prime} \\ y^{\prime} \end{array}\right] = \left[\begin{array}{cc} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{array}\right] \left[\begin{array}{l} x \\ y \end{array}\right] \tag{12}\end{align*}$ The rotation origin (x0, y0) takes the place of (0, 0) and may be the center of the image or any other chosen point. The equations for an arbitrary rotation of an image or pixel around a point (x0, y0) are written as: $\begin{align*} x^{\prime} &= x_{0} + \left(x - x_{0}\right)\cos(\theta) - \left(y - y_{0}\right)\sin(\theta) \tag{13}\\ y^{\prime} &= y_{0} + \left(x - x_{0}\right)\sin(\theta) + \left(y - y_{0}\right)\cos(\theta) \tag{14}\end{align*}$ The pixel located at coordinates (x0, y0) remains fixed and does not change its position. Substituting the coordinates x and y of each original pixel then yields the rotated image.
Translation: The ROI of the original image is shifted slightly along the X axis, the Y axis, or both. After translation, the mean intensity of the image remains unchanged [40].
Equations (15) and (16) give the mathematical model for translation in image augmentation. $\begin{align*} x^{\prime }&=x+x_{T} \tag {15}\\ y^{\prime }&=y+y_{T} \tag {16}\end{align*}$
Here (x, y) are the initial coordinates of a pixel P and (x’, y’) are its coordinates in the resulting image after P is translated by the distances $x_{T}$ and $y_{T}$ along the X and Y axes, respectively.
Scale: Scale augmentation resizes images up or down to simulate varying distances or resolutions [41]. Images are resized along each axis using a scaling factor, which may be the same for all axes or differ between them. A scaling factor less than 1 corresponds to zooming out, and a scaling factor greater than 1 corresponds to zooming in.
Shear: This operation translates one side of an image along either the vertical or horizontal axis, resulting in the generation of a parallelogram. A vertical shear moves an edge parallel to the vertical axis, while a horizontal shear moves an edge parallel to the horizontal axis. The magnitude of the shear is determined by the angle of shear [42].
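The translation, scale, and shear operations described above can be sketched as simple coordinate maps. This is a minimal Python illustration, not the authors' implementation: the function names are hypothetical, and the shear is parameterized as the tangent of the shear angle, which is one common convention assumed here.

```python
import math


def translate(x, y, x_t, y_t):
    """Shift a pixel by (x_t, y_t), as in Equations (15) and (16)."""
    return x + x_t, y + y_t


def scale(x, y, s_x, s_y):
    """Scale coordinates; factors below 1 zoom out, factors above 1 zoom in."""
    return s_x * x, s_y * y


def shear_horizontal(x, y, angle_deg):
    """Horizontal shear: x shifts in proportion to y, y is unchanged."""
    k = math.tan(math.radians(angle_deg))  # assumed parameterization
    return x + k * y, y


def shear_vertical(x, y, angle_deg):
    """Vertical shear: y shifts in proportion to x, x is unchanged."""
    k = math.tan(math.radians(angle_deg))  # assumed parameterization
    return x, y + k * x


# Shearing the corners of a unit square yields a parallelogram.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print([shear_horizontal(x, y, 30) for x, y in square])
```

In the sheared square the bottom edge stays in place while the top edge slides sideways, which is the parallelogram effect described for the shear operation.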

In the final step of the preprocessing phase, the Eyes-defy-anemia images are resized to $224\times 224$ so that all images have the same spatial dimensions, which many CNN architectures require. The resized images are then passed to MultiBranchNet, the next phase of the proposed model, for feature extraction.

2) Feature Extraction Phase Using MultiBranchNet

This phase introduces a novel CNN architecture, MultiBranchNet, which is composed of three modules, each containing two parallel branches. The design of MultiBranchNet leverages the PCNN architecture, which reduces parameters and layers compared to traditional sequential CNNs [11]. By using five convolutional layers arranged in parallel, this network performs as a single layer but achieves the efficiency of four, significantly lowering both parameter count and complexity.

This parallel architecture enhances processing speed by minimizing the number of parameters and layers [10], [11], enabling faster extraction of discriminative features compared to deeper sequential CNN models. Despite its simplified structure, the parallel CNN efficiently extracts the most prominent and distinguishing features from Eyes-defy-anemia images, contributing to the superior classification accuracy of the subsequent HO-SVM classifier. Furthermore, techniques such as dropout and batch normalization are integrated within the architecture, reducing overfitting and improving the model’s generalization capabilities.

Within MultiBranchNet, features extracted from the parallel branches at each module level are concatenated to capture both local and global features. These features are then passed to the subsequent optimization and classification phase.

The MultiBranchNet architecture consists of three modules (Module 1, Module 2, and Module 3), each of which includes two branches: the First Branch and the Second Branch. The model begins with two branches working in parallel in Module 1. After the outputs of these branches are concatenated, the model splits again into two parallel branches in Module 2. The outputs of Module 2 are concatenated in turn, and the model splits once more into two parallel branches in Module 3, as shown in Figure 2. The purpose of this division is to extract distinct features in each branch and then aggregate them.

Figure 3 shows the architecture for an input image of size $224\times 224$ and three modules. In Module 1, the First Branch starts with a 2D convolutional layer with 32 kernels of size $3\times 3$, followed by batch normalization and a ReLU activation. The Second Branch starts with a smaller convolutional layer of 16 kernels of size $3\times 3$ and follows the same convolution, batch normalization, and ReLU structure. Each branch continues with another set of convolution layers with batch normalization, ReLU activation, global average pooling layers, and fully connected layers, after which the two branches are concatenated. The model then splits into two parallel branches in Module 2, which has a similar structure but more kernels: 64 kernels of size $3\times 3$ in the First Branch and 32 kernels of size $3\times 3$ in the Second Branch. Each branch again has its own convolution layers, batch normalization, ReLU activation, global average pooling layers, and fully connected layers. After the two branches are concatenated, the model splits into two parallel branches in Module 3, which keeps the same dual-branch structure with still more kernels: 128 kernels of size $3\times 3$ in the First Branch and 64 kernels of size $3\times 3$ in the Second Branch. Each branch has its own convolution layers, batch normalization, ReLU activation, global average pooling layers, and fully connected layers. This module also ends with concatenation.

FIGURE 3.

Three modules, each with first and second branches, in the CNN architecture of MultiBranchNet.

3) Hippopotamus Optimization Algorithm Phase

In this phase, the HO algorithm is employed to optimize the hyperparameters of the multiclass SVM classifier (HO-SVM), specifically the kernel parameter ( $\sigma$ ) and the penalty parameter (C). The procedure is as follows. First, the HO algorithm is initialized with a population size of 30, a maximum of 100 iterations, and a search space defined by the bounds of $\sigma$ and C. Next, each HO solution is evaluated using the SVM classifier’s misclassification error rate as the fitness function, calculated on a separate testing dataset. The HO solutions are then updated iteratively based on their fitness values until the termination criteria are met, yielding the optimal hyperparameter values for the SVM. Finally, the eye images are classified using the optimized SVM model to determine the presence or absence of anemia. Below is a detailed explanation of the HO-SVM model.

a) Setting up Initial Parameters

These parameters consist of the population size, the maximum iteration count, the number of variables, and the upper and lower bounds. This paper focuses on optimizing two critical parameters of the SVM’s linear kernel function: the $\sigma$ parameter, which affects feature space delineation, and the penalty parameter C, which controls the classification accuracy [43], [44]. The population size is fixed at 30, the maximum number of iterations at 100, and the number of variables at two. The upper bounds of C and $\sigma$ are initially set to 1000 and 10, respectively, while the lower bounds are both set to 0.01. The initial position of each individual in the HO population is then determined at random. Note that expanding the search space by raising the upper bounds may slow convergence and increase computational cost. The values of C and $\sigma$ define each individual’s position.

b) Fitness Function, Positions Updates, and Termination Criteria

The position of each individual is evaluated during the optimization process using a fitness function, denoted $F_{nt}$. The paper uses the misclassification error rate as the fitness function, as defined by Equation (17). At the beginning, the paper randomly divides each dataset into a training group and a testing group with a ratio of 80:20. The training dataset is used to train the SVM, while the testing dataset is used to evaluate the fitness of each individual’s position in the HO algorithm. The most suitable position is the solution in the search space with the lowest misclassification error, and is determined by $\begin{equation*} F_{nt}=\text{minimize}\ \frac {1}{N}\sum \nolimits _{i=1}^{N} I\left ( y_{i}^{(test)}\neq y_{i}^{pred} \right ) \tag {17}\end{equation*}$

where N is the total number of samples in the testing dataset and I is the indicator function, which takes the value 1 when the predicted label $y_{i}^{pred}$ differs from the actual label $y_{i}^{(test)}$ and 0 otherwise. The sum therefore equals M, the number of misclassified samples, so the fitness is M/N.
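The setup described above (random initial positions within the stated bounds, and a misclassification-rate fitness) can be sketched in Python as follows. This is only an illustration under stated assumptions: it does not implement the HO position-update rules, the names `init_population`, `misclassification_rate`, and `best_individual` are hypothetical, and `toy_error` is a stand-in for training and testing an SVM, which is not done here.

```python
import random

# Bounds and population size as stated in the text; the names are illustrative.
BOUNDS = {"C": (0.01, 1000.0), "sigma": (0.01, 10.0)}
POPULATION_SIZE = 30


def init_population(size=POPULATION_SIZE, bounds=BOUNDS, seed=0):
    """Draw each individual's position (C, sigma) uniformly within the bounds."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(size)
    ]


def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def best_individual(population, evaluate):
    """Return the position with the lowest fitness value."""
    return min(population, key=evaluate)


def toy_error(position):
    """Stand-in fitness (NOT an SVM): pretends sigma = 1.0 is ideal."""
    return abs(position["sigma"] - 1.0)


population = init_population()
print(best_individual(population, toy_error))
print(misclassification_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```

In an actual run, `evaluate` would train the SVM with the candidate C and $\sigma$ on the training split and return the misclassification rate on the testing split, and the population would be updated for up to 100 iterations.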

SECTION V.

Experimental Results and Discussion

This section analyzes the experimental results of the proposed classification model based on the HO-SVM model. The experiments were conducted on hardware comprising an 8th Gen Intel Core i7 processor operating at 2.40 GHz, 16 GB of RAM, and a 1 TB hard disk. The software environment consists of Windows 10 as the operating system and MATLAB R2024a as the programming environment. These tools are used for the extraction and analysis of the results. The paper evaluates the classification performance using Accuracy, Sensitivity, Specificity, Precision, Recall, F1-Score, and Matthews Correlation Coefficient (MCC). These metrics are calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The confusion matrix in Figure 7 provides the data needed to calculate them. The metrics are defined in Equations (18) to (24). $\begin{align*} \text {Accuracy}& =\frac {TP+TN}{TP+FN+FP+TN} \tag {18}\\ \text {Sensitivity}& =\frac {TP}{TP+FN} \tag {19}\\ \text {Specificity}& =\frac {TN}{TN+FP} \tag {20}\\ \text {Precision}& =\frac {TP}{TP+FP} \tag {21}\\ \text {Recall}& =\frac {TP}{TP+FN} \tag {22}\\ \text {F1-Score}& =2\times \frac {\text {Precision}\times \text {Recall}}{\text {Precision}+\text {Recall}} \tag {23}\\ MCC& =\frac {(TP\times TN)-(FP\times FN)}{\sqrt {(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag {24}\end{align*}$

To optimize a neural network, a metric is needed to evaluate its performance. Here, inference time is used as that metric. To measure inference time accurately, it is essential to understand the concept of FLOPs (Floating Point Operations). FLOPs denote the total number of computations a model must perform during inference, such as addition, subtraction, division, multiplication, or any other operation involving floating-point values. By calculating the FLOPs, we can determine the complexity of the model and, consequently, estimate its inference time [50].

FIGURE 4.

2D t-SNE visualization of data points with class labels.

FIGURE 5.

3D t-SNE embedding of data points with color-mapped values.

FIGURE 6.

Classification using MultiBranchNet model.

FIGURE 7.

Confusion matrix: Model classification performance overview.

To calculate the FLOPs in a model, follow these guidelines:

Convolutions:
- FLOPs $= 2\times$ Number of Kernels $\times$ Kernel Shape $\times$ Output Shape
- Remember, the output shape of a convolutional layer (with a stride of 1 and no padding) is calculated as: Output = (Input Shape - Kernel Shape) + 1
Fully Connected Layers:
- FLOPs $= 2\times$ Input Size $\times$ Output Size
Pooling Layers:
- FLOPs = Height $\times$ Depth $\times$ Width of an image
- If a stride is used, the formula becomes: FLOPs = (Height / Stride) $\times$ Depth $\times$ (Width / Stride) of an image.
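The guidelines above can be turned into a small FLOPs calculator. The Python sketch below is an illustration under stated assumptions: "Kernel Shape" is read as kernel height × kernel width × input channels, "Output Shape" as output height × output width, and the example layer assumes a 3-channel (RGB) $224\times 224$ input with no padding. All function names are hypothetical.

```python
def conv_output_size(input_size, kernel_size):
    """Output = (Input - Kernel) + 1, i.e. stride 1 and no padding."""
    return input_size - kernel_size + 1


def conv2d_flops(in_h, in_w, in_channels, num_kernels, kernel_size):
    """FLOPs = 2 x number of kernels x kernel shape x output shape."""
    out_h = conv_output_size(in_h, kernel_size)
    out_w = conv_output_size(in_w, kernel_size)
    kernel_volume = kernel_size * kernel_size * in_channels  # assumed reading
    return 2 * num_kernels * kernel_volume * out_h * out_w


def fully_connected_flops(input_size, output_size):
    """FLOPs = 2 x input size x output size."""
    return 2 * input_size * output_size


def pooling_flops(height, width, depth, stride=1):
    """FLOPs = (height / stride) x depth x (width / stride)."""
    return (height // stride) * depth * (width // stride)


# Example: a layer with 32 kernels of size 3x3 applied to a 224x224x3 input.
print(conv2d_flops(224, 224, 3, 32, 3))
```

Summing these per-layer values over every layer of a network gives the total FLOPs for one inference. The example covers a single layer only and is not expected to reproduce the total reported for the full model.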

A. Hippopotamus Experiments of Eye Anemia Images Augmentation

The experimental results demonstrate the efficacy of the proposed non-invasive anemia diagnosis model, which integrates a Multi-Branch CNN with Hippopotamus Optimization (HO) and a multiclass SVM. The experiments were conducted on a dataset of 211 eye images (anemic vs. non-anemic) collected from Indian and Italian patients. Initially, the dataset was imbalanced, with a disproportionately higher number of non-anemic images.

To rectify this imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE) and various data augmentation strategies—namely rotation, scaling, translation, and shearing. These steps increased the number of anemic images and balanced the dataset, allowing for more robust model training.

Figure 4 is a 2D visualization of the anemic and non-anemic dataset generated using the t-Distributed Stochastic Neighbor Embedding (t-SNE) technique, which is frequently employed for reducing dimensionality and visualizing data. The left plot shows the pre-training t-SNE visualization, while the right plot displays the post-training feature space. These visualizations clearly display two separate clusters: one cluster represented by blue data points (anemic), while the other cluster is represented by orange data points (non-anemic). The pre-training plot shows broader, overlapping distributions for the anemic and non-anemic classes, while the post-training plot exhibits more separated, distinct peaks for each class. This indicates that the training process effectively learns features that better discriminate between the two categories.

Figure 5 presents a 3-dimensional t-SNE plot of the dataset, distinguishing between anemic and non-anemic persons. The plot visualizes how the data points cluster within the reduced dimensional space, with the color gradient representing different values or intensities, indicative of a specific biomarker or variable associated with anemia status. The blue and orange colors in the scatter plot denote two distinct categories, corresponding to anemic and non-anemic.

The Multi-Branch CNN architecture was first evaluated without the use of optimization techniques. The model demonstrated strong classification performance, with accuracy, precision, sensitivity, specificity, F1-score, and AUC of 92.65%, 97.06%, 89.19%, 96.77%, 92.96%, and 97.3%, respectively, as shown in Figures 6, 7, and 8. Figure 7 presents the confusion matrix, which reveals that the model achieves a high classification rate but exhibits a few misclassifications between the two classes. Figure 9 shows the distribution and cumulative distribution of prediction confidence for correct and incorrect predictions. The top graph reveals that correct predictions generally have higher confidence values clustered towards the right (closer to 1.0), while incorrect predictions have lower confidence spread more evenly across the range. The cumulative distribution in the bottom graph further illustrates that the curve for correct predictions rises sharply, indicating a high proportion of predictions with high confidence, in contrast to the more gradual increase seen for incorrect predictions. Figure 10 presents a set of sample images demonstrating the model’s predictions versus the true labels. The top two rows show anemic images, with the model correctly identifying the majority as anemic. The bottom two rows display non-anemic images, where the model again correctly predicts most as non-anemic, with only two misclassifications in the set shown. Based on the proposed model’s architecture and the specifications of the Intel Core i7-8550U CPU (estimated at 100 GFLOPS), the total number of floating-point operations (FLOPs) required for a single inference was calculated to be $1.1151\times 10^{9}$, resulting in a theoretical inference time of approximately 11.151 milliseconds.

FIGURE 8.

ROC curve for anemic and non-anemic image.

FIGURE 9.

Distribution and cumulative distribution of prediction confidence.

FIGURE 10.

Sample Predictions: Predicted vs. True labels.

The Multi-Branch CNN feature extractor followed by the HO-optimized multiclass SVM achieved an accuracy of 97.06%, outperforming most of the existing methods compared in Table 5. Additionally, this paper investigated the Multi-Branch CNN architecture as a feature extractor for the anemic vs. non-anemic images collected from Indian and Italian patients. The objective is to determine how well the Multi-Branch CNN architecture extracts pertinent features that can be used to improve classification. The sensitivity of the model reached 100%, indicating its effectiveness in correctly identifying anemic cases. This high sensitivity is critical for medical diagnostics, where false negatives can have severe consequences. The confusion matrix revealed that the model had minimal misclassifications, reflecting its ability to accurately differentiate between anemic and non-anemic cases. The high true positive rate underscores the model’s reliability. These features are passed to the HO-SVM classifier to obtain higher accuracy. The model uses a fixed population size of 30, a maximum of 100 iterations, and two variables. The upper bounds are set to 10 for $\sigma$ and 1000 for C, while the lower bounds are both set to 0.01, as shown in Table 3. Table 4 displays the best hyperparameters found on the testing data after 100 iterations. The MultiBranchNet with HO-SVM model achieves an accuracy of 97.06%, as shown in Figure 11. The confusion matrix evaluates the performance of a classification model applied to a dataset distinguishing between anemic and non-anemic individuals based on eye images. The matrix indicates the model’s accuracy at 97.06%, reflecting a high level of performance. The matrix’s cells show the number and percentage of correctly and incorrectly classified instances for each class. Specifically, the model correctly identified 35 out of 37 anemic images, resulting in a 94.6% true positive rate for the anemic class, with only 2 anemic images mistakenly classified as non-anemic (5.4% of the anemic class). Importantly, the model achieved a perfect true negative rate, correctly identifying all 31 non-anemic images (100% accuracy for the non-anemic class). The matrix highlights the model’s robust capability in accurately differentiating between anemic and non-anemic cases, which is crucial for developing reliable diagnostic tools based on image analysis.
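The counts quoted above (35 of 37 anemic images and all 31 non-anemic images classified correctly) can be checked against the metric definitions in Equations (18)-(24). The Python sketch below assumes that anemic is the positive class, so that TP = 35, FN = 2, TN = 31, and FP = 0. That assignment and the function name are assumptions made for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Equations (18)-(24) from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # identical to sensitivity
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Counts taken from the text, with anemic assumed to be the positive class.
metrics = classification_metrics(tp=35, tn=31, fp=0, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption the accuracy evaluates to 66/68, which matches the reported 97.06%, and the true positive rate for the anemic class evaluates to 35/37, which matches the reported 94.6%.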

TABLE 3 The Selection of Hyperparameters for the Proposed Model

TABLE 4 The Optimal Possible Values Obtained after 100 Iterations

FIGURE 11.

The accuracy of MultiBranchNet with (HO-SVM) model.

The Receiver Operating Characteristic (ROC) curve in Figure 12 is used to evaluate the performance of the MultiBranchNet with HO-SVM classification model on the dataset distinguishing between anemic and non-anemic persons. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings, providing insight into the trade-off between sensitivity and specificity. The curve’s proximity to the top-left corner indicates high performance, with an Area Under the Curve (AUC) value of 0.973. This AUC suggests that the model is highly effective at distinguishing between the anemic and non-anemic classes. This strong performance underscores the model’s potential for accurately diagnosing eye anemia from eye images, making it a valuable model for clinical or diagnostic applications.
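To make the construction of the ROC curve concrete, the following Python sketch sweeps a decision threshold over classifier scores, records the (false positive rate, true positive rate) pairs, and integrates them with the trapezoidal rule. The labels and scores are made-up toy values chosen only for illustration; they are not the paper's data, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by thresholding at each distinct score."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x_a, y_a), (x_b, y_b) in zip(points, points[1:]):
        area += (x_b - x_a) * (y_a + y_b) / 2.0
    return area


# Toy example (illustrative values only): 1 = positive class, 0 = negative class.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(toy_labels, toy_scores)
print(curve)
print(auc_trapezoid(curve))
```

An AUC of 1.0 corresponds to a curve that passes through the top-left corner, while 0.5 corresponds to chance-level ranking.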

FIGURE 12.

ROC curve for anemic and non-anemic image.

The SHAP force plot in Figure 13 explains a single model prediction, which makes it suitable for error analysis, that is, for determining why the model produced a particular prediction for a given observation [45]. Based on the plot, the following observations can be made:

x127 is the most influential predictor, with the highest SHAP value, indicating that it plays a significant role in determining the model’s output.
Predictors such as x65, x124, x29, and x20 also have high SHAP values, showing they are important in the classification process.
Predictors x72, x125, x87, and x42 have moderate SHAP values, meaning they have a moderate influence on the model’s predictions.
x106 has the lowest SHAP value among the listed predictors, indicating it has the least impact on the model’s decision compared to the others.

FIGURE 13.

SHAP values of key predictors in eye anemia classification model.

B. Comparative Analysis with Literature

This section provides a concise summary of the comparison between the proposed model and previous work. Table 5 shows that the proposed model outperforms most of the models in related work.

TABLE 5 Comparative Analysis against the Related Work

In previous studies on eye anemia detection, researchers faced significant challenges in achieving high accuracy and sensitivity due to the limitations of traditional machine learning models and the imbalance in available datasets. For example, models like SVM and simple CNN architectures often struggle to accurately classify anemic and non-anemic eye images, leading to suboptimal diagnostic performance. These limitations were made worse by the small size of datasets, which often resulted in overfitting and poor generalization to new data.

The proposed model in this paper addresses these challenges by introducing a novel multi-branch CNN architecture combined with HO with multiclass SVM (HO-SVM). The multi-branch CNN improves feature extraction efficiency and accuracy, while the HO-SVM optimizes the classification process, leading to a significant increase in diagnostic accuracy, reaching 97.06%. Moreover, the use of SMOTE and various data augmentation techniques effectively reduces the issue of data imbalance, further enhancing the model’s robustness. This model not only overcomes the limitations of previous works but also sets a new benchmark for non-invasive eye anemia diagnosis, particularly in resource-limited settings. The proposed CNN model has been evaluated against other pre-trained deep CNN models by comparing the number of floating-point operations, as presented in Table 6. It can be observed from Table 6 that the proposed method requires only $1.1151\times 10^{9}$ FLOPs, which is significantly lower than the computational requirements of other pre-trained CNN models such as ResNet101 ( $3.6154\times 10^{9}$ FLOPs), DenseNet201 ( $3.3099\times 10^{9}$ FLOPs), and VGG16 ( $3.0947\times 10^{10}$ FLOPs). The results in Table 6 show that the proposed method requires an inference time of only approximately 11.151 milliseconds, which is significantly lower.
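The relationship between the FLOPs counts and the theoretical inference time can be reproduced with a short calculation. The Python sketch below uses the FLOPs figures as read from the text and the 100 GFLOPS throughput estimate stated earlier for the Intel Core i7-8550U. It is a back-of-the-envelope model that ignores memory access and other overheads, and the names are illustrative.

```python
# Throughput estimate stated in the text (100 GFLOPS); treated as an assumption.
ASSUMED_THROUGHPUT_FLOPS_PER_SECOND = 100e9

# FLOPs per inference as read from the text.
MODEL_FLOPS = {
    "MultiBranchNet (proposed)": 1.1151e9,
    "ResNet101": 3.6154e9,
    "DenseNet201": 3.3099e9,
    "VGG16": 3.0947e10,
}


def theoretical_inference_ms(flops, throughput=ASSUMED_THROUGHPUT_FLOPS_PER_SECOND):
    """Theoretical inference time in milliseconds: FLOPs / throughput."""
    return flops / throughput * 1000.0


for model_name, flops in MODEL_FLOPS.items():
    print(f"{model_name}: {theoretical_inference_ms(flops):.3f} ms")
```

For the proposed model this gives about 11.151 ms, which agrees with the figure quoted in the text. The values for the other networks follow from the same assumed throughput and are not measured timings.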

TABLE 6 Contrasting the Computational (FLOPs) of the Suggested Approach against those of other Pre-trained CNN models

SECTION VI.

Limitations and Implementation Considerations

A. Limitations and Potential Failure Cases

Despite the promising results achieved by our proposed MultiBranchNet with HO-SVM model, several limitations and potential failure cases warrant discussion. The model’s performance, while robust on our test dataset, may be affected by several factors in real-world deployment scenarios.

1) Dataset Limitations

The current study utilized a dataset of 211 eye images from Italian and Indian populations. We acknowledge that this relatively small sample size may not fully capture the diversity in ocular features across different ethnicities, age groups, and geographical regions. However, obtaining large datasets in medical fields, especially for conditions like eye anemia, presents significant challenges due to privacy concerns, ethical issues, and resource limitations. This dataset, despite its size, represents one of the first publicly available resources for eye anemia detection and offers valuable insights into this domain.

To mitigate the impact of the small dataset size, we employed several strategies:

Data augmentation techniques, such as rotation, scaling, and translation, were applied to artificially increase the diversity and size of the training set. This helps the model learn more robust features and reduces overfitting.
The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to address the class imbalance issue, ensuring a more balanced representation of anemic and non-anemic samples during training.
The multi-branch CNN architecture (MultiBranchNet) was designed to efficiently extract a diverse set of features from the limited data, leveraging parallel processing to capture both local and global patterns relevant to anemia detection.
Progressive scaling of the number of kernels (32/16, 64/32, 128/64) across modules creates a hierarchical feature pyramid tuned specifically for eye anemia detection.

Despite these mitigation strategies, variations in conjunctival appearance due to factors such as ethnicity, environmental exposure, and concurrent ocular conditions may still affect the model’s generalizability to broader populations.

2) Image Quality Dependencies

Our model’s performance is highly dependent on image quality, which can vary significantly in real-world settings. Factors that may adversely affect classification performance include:

Poor lighting conditions during image capture
Suboptimal focus or resolution
Inconsistent angle of capture
Presence of external factors such as eye irritation, allergies, or conjunctivitis
Varying devices used for image capture with different color calibrations

In resource-limited settings, where high-quality imaging equipment may not be available, these variations could substantially impact diagnostic accuracy. Our analysis of misclassified cases revealed that images with suboptimal lighting conditions or those captured at unusual angles were more likely to be misclassified.

3) Hemoglobin Threshold Considerations

The binary classification approach used in this study (anemic vs. non-anemic) employs a hemoglobin threshold of 10.5 g/dL. This simplification, while necessary for classification, does not account for the continuous nature of hemoglobin levels and their clinical interpretation. Cases near this threshold boundary showed higher misclassification rates in our analysis. Additionally, different medical guidelines may recommend varying thresholds for different populations (e.g., children, pregnant women, elderly), which our current model does not address.

4) Justification for MultiBranchNet Architecture

The choice of MultiBranchNet for our eye anemia detection task was based on several key considerations:

Domain-specific requirements: Eye anemia detection requires a model that can effectively capture and integrate multi-scale features, as the signs of anemia can manifest at different levels of detail in the eye image. MultiBranchNet’s parallel processing and fusion of multiple scales directly address this requirement, making it a suitable choice for this specific domain.
Limited dataset size: Given the challenges in acquiring large-scale datasets for eye anemia, it is crucial to have a model that can learn robust features from limited samples. MultiBranchNet’s efficient processing and fusion of multi-scale features enable it to extract discriminative patterns even from small datasets, reducing the risk of overfitting and improving generalization.
Interpretability and trustworthiness: In medical applications, it is essential to have models that are not only accurate but also interpretable and trustworthy. MultiBranchNet’s multi-branch structure provides a natural way to visualize and understand the learned features at different scales, enhancing the interpretability of the model’s predictions. This is particularly important for building trust and facilitating clinical adoption of the proposed method.
Computational efficiency: In resource-constrained settings, such as low-income regions where eye anemia is more prevalent, it is important to have models that can operate efficiently on limited computational resources. MultiBranchNet’s parallel processing and streamlined architecture make it more computationally efficient compared to deeper, sequential models, enabling its deployment in a wider range of settings.

B. Practical Implementation Considerations

The translation of our proposed model from a research prototype to a deployable clinical tool requires careful consideration of several practical aspects.

1) Mobile Integration Pathway

Deploying the MultiBranchNet model on mobile platforms represents a promising avenue for increasing accessibility, particularly in resource-constrained settings. Our analysis indicates that the model’s computational requirements ( $1.1151\times 10^{9}$ FLOPs) are significantly lower than those of comparable models such as ResNet101 ( $3.6154\times 10^{9}$ FLOPs) and VGG16 ( $3.0947\times 10^{10}$ FLOPs), making it suitable for mobile deployment.

To optimize mobile performance, several techniques could be implemented:

Model quantization to reduce precision from 32-bit to 8-bit or 16-bit
On-device inference optimization using hardware acceleration (GPU).
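As an illustration of the quantization idea listed above, the following Python sketch applies symmetric 8-bit quantization to a handful of floating-point weights and then reconstructs them. This is a generic textbook scheme shown under stated assumptions, not the procedure of any particular mobile framework and not something evaluated in the paper. The function names and example weights are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


# Made-up example weights, for illustration only.
example_weights = [0.5, -1.27, 0.0, 1.0]
q_weights, q_scale = quantize_int8(example_weights)
restored = dequantize(q_weights, q_scale)
print(q_weights, q_scale)
print([abs(w - r) for w, r in zip(example_weights, restored)])
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of at most half the scale per weight. Whether that error is acceptable for anemia classification would need to be verified experimentally.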

With these optimizations, we estimate the model could operate effectively on mid-range smartphones with at least 3GB RAM and modest processing capabilities.

2) User Interface Considerations

An effective user interface for clinical deployment should balance simplicity with functionality. Key elements should include:

Simple image capture guidance with real-time feedback on image quality
Clear instructions for optimal lighting and positioning
Immediate presentation of results with confidence scores
Ability to store and compare results over time
Integration with patient records where available
Offline functionality for areas with limited connectivity
Multilingual support for global deployment

The interface should be designed with input from healthcare workers to ensure it aligns with their workflow needs and technical capabilities.

3) Clinical Workflow Integration

The successful integration of our model into clinical workflows requires consideration of:

Training requirements for healthcare workers with varying levels of technical expertise
Standard operating procedures for image capture and interpretation
Guidelines for result verification and confirmation
Referral pathways for positive cases
Documentation and record-keeping protocols
Integration with existing electronic health record systems where available
Fallback procedures for cases where the system cannot provide high-confidence predictions

In resource-limited settings, the model could serve as a screening tool to prioritize patients for further testing, potentially reducing the burden on laboratory services and allowing for more efficient allocation of resources.

4) Regulatory and Ethical Considerations

Implementation in clinical settings would require navigating regulatory pathways specific to each region. Key considerations include:

Classification as a medical device under various regulatory frameworks (e.g., FDA, CE marking)
Data privacy compliance for patient information
Requirements for clinical validation studies
Ethical considerations regarding algorithmic bias and equity of access
Ongoing monitoring and updating protocols
Liability considerations for misclassification cases

SECTION VII.

Conclusion and Future Work

This study presents an innovative automated model for the early and non-invasive diagnosis of eye anemia, combining the MultiBranchNet architecture with an SVM tuned by the Hippopotamus Optimization algorithm (HO-SVM). The proposed system achieved outstanding performance with 97.06% accuracy in distinguishing between anemic and non-anemic patients using ocular images. This demonstrates its strong potential as a reliable screening tool that could significantly improve anemia detection, particularly in resource-limited settings. The model’s success stems from several key innovations, including robust data augmentation techniques (rotation, translation, and shear) to overcome dataset limitations, HO-SVM optimization for enhanced classification performance, and the efficient parallel structure of MultiBranchNet that enables real-time processing. The integration of Explainable AI (XAI) techniques, particularly SHAP, further strengthens the model’s clinical applicability by providing transparent insights into its decision-making process.

Looking ahead, future research should focus on three primary areas to advance this technology toward clinical implementation.

First, model enhancement efforts should explore continuous learning frameworks to improve adaptability across diverse populations and imaging conditions. The development of multiclass classification capabilities could provide more nuanced anemia severity stratification, moving beyond the current binary detection approach. Additionally, investigating advanced augmentation techniques like Generative Adversarial Networks (GANs) may yield higher-quality synthetic training data to further boost model performance.

Second, comprehensive clinical validation studies will be essential. This includes large-scale field testing in real-world low-resource environments to evaluate practical implementation challenges. Usability testing with healthcare workers and rigorous assessment of the model’s impact on clinical workflows will provide valuable insights for optimization. The development of region-specific models that account for local population characteristics and healthcare infrastructure constraints could significantly enhance the technology’s effectiveness in diverse settings.

Finally, technology integration efforts should focus on developing smartphone applications for point-of-care deployment and exploring combinations with other non-invasive diagnostic modalities. These developments could dramatically expand the system’s accessibility and utility in various healthcare contexts.

This work establishes a strong foundation for transforming anemia screening through accessible, AI-driven technology. By pursuing these future directions, the system can evolve into a comprehensive diagnostic solution that bridges critical healthcare gaps in underserved regions while maintaining an optimal balance between accuracy, efficiency, and interpretability. The potential impact of this technology extends beyond improved diagnosis to include better health outcomes and more efficient resource allocation in global healthcare systems.


