Introduction
Anemia is a global health concern significantly impacting populations worldwide, encompassing both developed and developing nations. Characterized by a deficiency in red blood cells or hemoglobin, anemia impairs oxygen transport, leading to symptoms like fatigue, weakness, and dizziness [1]. The World Health Organization (WHO) estimates that approximately one billion individuals globally suffer from anemia in varying degrees of severity [2].
Anemia can manifest in various forms, with iron, folate, vitamin B12, or hemoglobin deficiencies often disrupting erythrocyte production [2]. While mild anemia may initially be asymptomatic, the body’s compensatory mechanisms eventually fail, leading to more pronounced symptoms. A specific manifestation of anemia, known as eye anemia or ocular anemia, affects the eyes by limiting oxygen supply to tissues, particularly the retina. Symptoms can include blurred vision, eye fatigue, and, in severe cases, retinal hemorrhages. Early diagnosis and appropriate management are crucial to mitigate the consequences of anemia, including its ocular manifestations. Recent advances in IoT-based healthcare solutions, such as those developed in [49], demonstrate how technology can significantly improve treatment adherence and monitoring, an approach that could be adapted for anemia management.
Treatment strategies typically involve addressing the underlying cause, for example through dietary adjustments, supplementation, or treatment of underlying medical conditions [5], [6]. The global impact of anemia is substantial, extending beyond individual health to encompass significant social and economic consequences. Eye anemia further exacerbates these challenges, affecting vision and daily activities, thereby impacting productivity and quality of life [6]. The management of anemia, including prevention, diagnosis, and treatment, poses a significant economic burden on healthcare systems worldwide. Recent advancements in unified healthcare data systems, such as the one in [48], demonstrate how integrated data management can revolutionize patient care and diagnostic efficiency. Inspired by such innovations, this work aims to develop a non-invasive method for hemoglobin measurement that can be easily deployed in resource-limited settings.
Deep learning (DL) techniques have been extensively and effectively employed in several domains, particularly in medicine, including eye anemia evaluation [7], [8]. Recent advancements in multi-deep transfer learning for cervical cancer cell detection highlight the potential of sophisticated DL architectures in medical diagnostics [46]. One of the main DL algorithms is the Convolutional Neural Network (CNN), which has shown exemplary performance in medical image classification, including the detection and analysis of anemia in eye images [9]. The CNN is a powerful DL algorithm capable of analyzing several attributes or objects inside an image and accurately differentiating between anemic and non-anemic eyes. A CNN employs a hierarchical architecture in which the network is constructed in the form of a funnel and ends in a fully connected layer, where all neurons are interconnected and the output is processed. The performance of neural networks is determined by several key factors, including memory consumption, the number of network parameters, accuracy, and speed. Network scaling aims to optimize the structure of a model by considering these factors.
In this paper, a new CNN architecture is proposed that incorporates three modules, each consisting of two branches working in parallel, to make the classification model robust [10], [11]. The parallel CNN (PCNN) design uses fewer parameters and layers than a comparable sequential CNN structure. Five convolutional layers are placed in parallel, which effectively acts as a single convolutional layer but performs like five layers [11]. This parallel arrangement lowers the overall parameter count and complexity of the network, which reduces execution time, while extracting features in separate branches improves accuracy. Despite having fewer layers and parameters, the parallel CNN can extract the most discriminant features from the preprocessed Eyes-defy-anemia images. These features contribute to the high classification performance of the subsequent extreme learning machine (ELM) classifier. The parallel CNN architecture, combined with techniques like dropout and batch normalization, helps reduce overfitting and improves the model’s generalization ability [11]. Additionally, this architecture offers the benefit of incorporating a feature expansion system that transfers the extracted characteristics from the branches to a multi-level distribution. This enables communication between branches at various levels within the layers. To handle the imbalanced dataset, this paper employs the Synthetic Minority Oversampling Technique (SMOTE) to obtain a balanced class distribution [12]. Because the image dataset is small, training an accurate model is challenging. Therefore, data augmentation methods such as scaling, rotation, and translation are applied to the original image dataset [13].
The key contributions of this paper include:
The proposed model combines a multi-branch CNN architecture (MultiBranchNet) for efficient feature extraction, the SMOTE technique and data augmentation used to address data imbalance, and HO with multiclass SVM for enhanced classification accuracy.
The parallel structure of the proposed model improves training speed and reduces inference time. HO fine-tunes the SVM hyperparameters, achieving 97.06% accuracy on the test set.
The model offers an accessible alternative to invasive diagnostic methods requiring repeated lab tests. Its high accuracy and sensitivity show potential for effective anemia screening, especially in resource-limited settings.
We provide a foundation for future research on clinical validation and integration of the model into diagnostic workflows.
We employ XAI to explain the results of the model by applying SHAP explainability.
The paper is organized into six sections. Section II presents a comprehensive literature review on anemia diagnosis employing Machine Learning (ML) and Deep Learning (DL) and summarizes the limitations in a table. Section III provides the preliminaries, namely the essential concepts of CNN, SVM, and Hippopotamus Optimization. Section IV describes the details of the proposed model. Section V discusses the experimental results. Section VI concludes the paper by summarizing the key takeaways and potential areas for future exploration.
Related Work
Recently, considerable research has addressed hemoglobin deficiency (anemia) and its effects on vision and overall eye health (eye anemia). These studies focused on developing non-invasive methods for identifying anemia from conjunctiva images using ML and DL models, which have proven successful. The fundamental assumption underlying this approach is the significant association between the color of the palpebral conjunctiva and hemoglobin values [14]. Zhang et al. [15] developed an automated system employing computer vision and machine learning methods to detect anemia from conjunctiva images. After being trained on a dataset of 362 conjunctiva images, the system attained an accuracy of 82.37%. Jain et al. [16] introduced a semi-automated method for detecting anemia using eye conjunctiva images. They used a dataset of 99 images, which was augmented using operations such as rotation, translation, and mirroring. Because of the irregular shape of the conjunctiva, they applied image segmentation to extract the region of interest (RoI). Asare et al. [17] analyzed pallor and diagnosed anemia by applying machine learning methods to conjunctiva images. They employed an available dataset containing 710 conjunctiva images, which were obtained using a specialized instrument that removes the influence of ambient light. They developed a conjunctiva detection model and an anemia diagnosis model by combining a CNN, logistic regression, and the Gaussian blur technique. In addition, Magdalena et al. [18] used a CNN on eye conjunctiva images for anemia detection and achieved an accuracy of 94%. Moreover, Delgado-Rivera et al. [19] employed a CNN to classify conjunctiva images as anemic or non-anemic after segmentation. The CNN achieved a sensitivity of 77.58% compared with the results of laboratory testing.
Agrawal [20] proposed an automated method for detecting anemia from retinal images using deep learning. The study used a dataset of 400 images, and the proposed model achieved an accuracy of 62.23% and a precision of 59%. Dimauro et al. [21] proposed an intelligent system based on machine learning to diagnose anemia and then employed MobileNet-V2 on two different datasets from Italy and India. The accuracy of the MobileNet-V2 model was 91% on the Italian dataset and 68% on the Indian dataset. Dhalla et al. [22] employed five pretrained segmentation models, namely FCN, PSPNet, LinkNet, UNet, and UNet++, for conjunctiva segmentation, optimized on a customized dataset. Their experiments were conducted on a specially constructed dataset of 2592 palpebral images from the pediatric population. Their findings demonstrate that the LinkNet architecture performs best, scoring 94.17% for accuracy, 90% for Intersection over Union (IoU), and 93% for the Dice score. Table 1 summarizes the related works. Given the importance of diagnosing eye anemia and the limitations of previous studies, this paper proposes a new CNN architecture called MultiBranchNet that consists of three modules, each comprising two branches working in parallel, for non-invasive diagnosis.
Material and Methods
A. Convolutional Neural Network
The objective of a CNN is to learn increasingly complex features in the data through convolution. CNNs are mostly used for the analysis of visual images and represent the latest advancements in segmentation, object detection, image classification, and other image processing tasks. The structure is composed of multiple layers that include local spatial connectivity and sets of neurons with shared parameters. A typical CNN architecture consists of three key layer types: convolutional, pooling, and fully connected layers. The convolutional layers extract local features from the input data through the application of filters (kernels). These filters learn to detect specific patterns or edges within the data. Subsequently, pooling layers downsample the feature maps generated by the convolutional layers, reducing computational complexity and data dimensionality. Finally, fully connected layers operate similarly to traditional neural network hidden layers, where neurons are fully connected and perform classification or regression tasks based on the learned features, as shown in Figure 1.
Using a CNN architecture will decrease the hardware requirements for storage and accelerate the training process, especially when dealing with many parameters, in contrast to alternative architectures available. CNN possesses the ability to process raw inputs and extract distinctive features from these inputs. Figure 1 illustrates the connection between each neuron in the input layer and the neurons in the hidden layer in standard neural networks. However, in a CNN, only a limited area of the input layer is connected to neurons in the hidden layer [23], [24], [25].
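The convolution and pooling operations described in this subsection can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the 5x5 toy image, the hand-set `edge_kernel`, and the helper names `conv2d_valid` and `max_pool2d` are assumptions introduced only for illustration, and a trained CNN would learn its filter weights rather than use fixed ones.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Toy 5x5 image with a vertical edge between columns 1 and 2 (illustrative data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical hand-set filter that responds to left-to-right intensity increases.
edge_kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, edge_kernel)  # 4x4 map, non-zero only at the edge
pooled = max_pool2d(feature_map)                # 2x2 map after downsampling
```

The feature map is non-zero only where the filter overlaps the edge, and pooling halves each spatial dimension while keeping that response. This is the local-connectivity and dimensionality-reduction behavior the text attributes to convolutional and pooling layers.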
B. Support Vector Machine
The Support Vector Machine (SVM) is a widely recognized ML algorithm employed for classification tasks. The algorithm separates two data classes with a hyperplane while maximizing the margin between them. The data points closest to this hyperplane, which determine the margin, are called “support vectors.” Whether an SVM performs linear or non-linear classification is determined by the choice of kernel function. In addition, it can handle both single-class and multi-class classification. Nevertheless, SVM can consume substantial resources, including memory and training time. Regular training at varying periods is advantageous for capturing changing user behaviors. The kernel function and its parameters have a major impact on the performance of a classifier [26]. For SVM, three essential hyperparameters are identified: the method for handling multiclass classification, the box constraint level, and the kernel type. The choice of kernel type, such as linear, quadratic, cubic, or Gaussian RBF, determines how the data are transformed before classification. The box constraint level bounds the values of the Lagrange multipliers, which in turn affects the overall training time and the number of support vectors. Optimal selection of these hyperparameters is crucial for an effective classifier. SVM can address multiclass problems by transforming them into binary problems with either a one-versus-all or a one-versus-one strategy. Whereas the one-versus-all strategy trains a distinct SVM for every class against all other classes, the one-versus-one strategy trains an SVM for every pair of classes. SVM is a kernel-based method that offers excellent classification performance. It can use multiple kernel types, such as linear, quadratic, cubic, or Gaussian, to suit datasets of varying sizes [27].
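The difference between the two multiclass strategies can be made concrete by enumerating the binary problems each one creates. The sketch below trains no SVM and only lists the sub-problems. The class labels and the helper names `one_vs_all_tasks` and `one_vs_one_tasks` are hypothetical.

```python
from itertools import combinations


def one_vs_all_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]


def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = ["A", "B", "C", "D"]  # hypothetical class labels
ova = one_vs_all_tasks(classes)  # K binary classifiers
ovo = one_vs_one_tasks(classes)  # K * (K - 1) / 2 binary classifiers
```

For K classes, one-versus-all needs K classifiers and one-versus-one needs K(K-1)/2, so the choice affects training cost as the number of classes grows.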
C. Hippopotamus Optimization Algorithm
The Hippopotamus Optimization (HO) algorithm is used to fine-tune the hyperparameter selection for multiple SVM models. The HO algorithm is based on three well-known behavioral characteristics observed in the life of hippopotamuses. The hippopotamus, a fascinating creature native to Africa [28], is classified as a vertebrate. These animals, which are adapted to live both in water and on land, mainly inhabit aquatic environments such as rivers and ponds. They live in clusters or groups of 10 to 30 individuals [29]. Hippopotamuses are herbivores whose diet includes leaves, grass, stems, branches, plant husks, reeds, and flowers. They are strongly inquisitive and actively investigate other sources of food. Due to their robust jaws, aggressive behavior, and territoriality, they rank among the most dangerous mammals on the planet. Predators do not often target adult hippopotamuses because of their substantial power and size. However, debilitated adults and young hippopotamuses are susceptible to predation by Nile crocodiles, lions, and spotted hyenas [30]. When under attack, they display defensive behavior by turning towards the attacker and producing loud vocalizations. If this defensive approach proves ineffective, they retreat rapidly, often moving towards nearby water bodies.
1) Mathematical Modeling
The HO algorithm is a population-based optimization method that uses hippopotamuses as search agents, each representing a candidate solution to the optimization problem. Each hippopotamus’s position in the search space corresponds to values of the decision variables. Every hippopotamus is denoted as a vector, and the population of hippopotamuses is formally described by a matrix. Like conventional optimization algorithms, the initialization phase of HO entails creating random initial solutions. In this stage, the decision variable vector is generated by applying the following equation, where r is a random number in [0, 1], low_j and upp_j are the lower and upper bounds of the j-th decision variable, N is the population size, and m is the number of decision variables:\begin{align*} x_{i}: x_{ij} = {low}_{j} + r\left({upp}_{j} - {low}_{j}\right), \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, m \tag{1}\end{align*}
\begin{align*} x=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{i} \\ \vdots \\ x_{N} \end{array}\right]_{N \times m}=\left[\begin{array}{ccccc} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \cdots & x_{i,j} & \cdots & x_{i,m} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,m} \end{array}\right]_{N \times m} \tag{2}\end{align*}
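The initialization in Equations (1) and (2) can be sketched as follows. This is a minimal illustration rather than the authors' code: the bounds, the population size, and the function name `init_population` are assumptions, and r is drawn independently for every entry.

```python
import random


def init_population(n_agents, low, upp, rng):
    """Eq. (1): x_ij = low_j + r * (upp_j - low_j), with r uniform in [0, 1)."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(low, upp)]
        for _ in range(n_agents)
    ]


rng = random.Random(0)                  # fixed seed for reproducibility
low, upp = [0.01, 1e-4], [100.0, 10.0]  # hypothetical bounds for m = 2 variables
population = init_population(6, low, upp, rng)  # N x m matrix of Eq. (2), N = 6
```

Each row of `population` is one hippopotamus (candidate solution), and each column is one decision variable confined to its own bounds.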
2) Hippopotamus Phases
The following subsections present the three phases of the HO algorithm, which are inspired by three notable behavioral patterns observed in the life of hippopotamuses [31].
a) Phase 1: Position Update of the Hippopotamuses in the River or Pond (Exploration)
Equation (3) gives the position update for the male hippopotamuses of the herd within a lake or pond [31].\begin{align*} x_{i}^{Mhippo}: x_{ij}^{Mhippo} = x_{ij} + y_{1}\left(Dhippo - I_{1} x_{ij}\right), \quad i = 1, 2, \ldots, \left[\frac{N}{2}\right], \quad j = 1, 2, \ldots, m \tag{3}\end{align*}
Equations (4) and (5) give the position updates for the male hippopotamuses and for the female or immature hippopotamuses within the herd, respectively.\begin{equation*} x_{i} = \begin{cases} x_{i}^{Mhippo}, & F_{i}^{Mhippo} < F_{i} \\ x_{i}, & \text{else} \end{cases} \tag{4}\end{equation*}
\begin{equation*} x_{i} = \begin{cases} x_{i}^{FBhippo}, & F_{i}^{FBhippo} < F_{i} \\ x_{i}, & \text{else} \end{cases} \tag{5}\end{equation*}
This phase improves global search and enhances the exploration process.
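The male-hippopotamus update of Equation (3) and the greedy acceptance of Equation (4) can be sketched as below. Several details are assumptions not stated in this section: Dhippo is taken to be the position with the lowest cost in the current population, y1 is drawn uniformly from [0, 1), I1 is drawn from {1, 2}, candidates are clamped to the bounds, and the `sphere` cost function is a hypothetical stand-in for the real objective.

```python
import random


def sphere(x):
    """Hypothetical cost function used only for this illustration."""
    return sum(v * v for v in x)


def phase1_male_update(population, cost, low, upp, rng):
    """Eq. (3) candidate move for the first N/2 agents, accepted per Eq. (4)."""
    costs = [cost(x) for x in population]
    dominant = population[costs.index(min(costs))][:]  # assumed meaning of Dhippo
    for i in range(len(population) // 2):
        y1 = rng.random()        # assumed: uniform in [0, 1)
        i1 = rng.choice((1, 2))  # assumed: integer 1 or 2
        candidate = [
            min(max(x + y1 * (d - i1 * x), lo), hi)  # clamp to bounds (assumption)
            for x, d, lo, hi in zip(population[i], dominant, low, upp)
        ]
        cand_cost = cost(candidate)
        if cand_cost < costs[i]:  # Eq. (4): keep the move only if it improves
            population[i], costs[i] = candidate, cand_cost
    return population, costs


rng = random.Random(1)
low, upp = [-5.0] * 3, [5.0] * 3
herd = [[rng.uniform(lo, hi) for lo, hi in zip(low, upp)] for _ in range(8)]
before = [sphere(x) for x in herd]
herd_after, after = phase1_male_update([x[:] for x in herd], sphere, low, upp, rng)
```

Because a candidate replaces the current position only when its cost is lower, no agent's cost can increase during this phase.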
b) Phase 2: Hippopotamus Defense Against Predators (Exploration)
Equation (6) represents the predator’s position in the search space [31].\begin{equation*} predators: {predators}_{j} = {low}_{j} + \vec{r}_{8}\left({upp}_{j} - {low}_{j}\right), \quad j = 1, 2, \ldots, m \tag{6}\end{equation*}
The hippopotamus’s position is then updated according to Equation (7): the new position is accepted only if its cost is lower than the current cost. \begin{equation*} x_{i} = \begin{cases} x_{i}^{HippoR}, & F_{i}^{HippoR} < F_{i} \\ x_{i}, & F_{i}^{HippoR} \ge F_{i} \end{cases} \tag{7}\end{equation*}
Both the first and second phases work in tandem to effectively reduce the chance of becoming stuck in local minima.
c) Phase 3: The Hippopotamus Evades the Predator (Exploitation)
The hippopotamus’s behavior is modeled by Equations (8)–(9). The hippopotamus moves to a new position when it discovers a safer position close to its current one, that is, a position that improves the cost function value [31].\begin{align*} x_{i}^{Hippo\varepsilon}: x_{ij}^{Hippo\varepsilon} = x_{ij} + r_{10}\left({low}_{j}^{local} + \eta_{1}\left({upp}_{j}^{local} - {low}_{j}^{local}\right)\right), \quad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, m \tag{8}\end{align*}
\begin{equation*} x_{i} = \begin{cases} x_{i}^{Hippo\varepsilon}, & F_{i}^{Hippo\varepsilon} < F_{i} \\ x_{i}, & F_{i}^{Hippo\varepsilon} \ge F_{i} \end{cases} \tag{9}\end{equation*}
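The local escape step of Equations (8) and (9) can be sketched in the same style. The section does not define the local bounds or the distributions of r10 and η1, so the following are assumptions: the local bounds are the global bounds divided by the iteration counter t, both random factors are uniform in [0, 1), candidates are clamped to the global bounds, and `sphere` is a hypothetical cost function.

```python
import random


def sphere(x):
    """Hypothetical cost function used only for this illustration."""
    return sum(v * v for v in x)


def phase3_escape(population, costs, cost, low, upp, t, rng):
    """Eq. (8) local move for every agent, accepted greedily per Eq. (9)."""
    low_local = [lo / t for lo in low]  # assumed shrinking local bounds
    upp_local = [hi / t for hi in upp]
    for i, x in enumerate(population):
        r10, eta1 = rng.random(), rng.random()  # assumed: uniform in [0, 1)
        candidate = [
            min(max(xj + r10 * (ll + eta1 * (ul - ll)), lo), hi)
            for xj, ll, ul, lo, hi in zip(x, low_local, upp_local, low, upp)
        ]
        cand_cost = cost(candidate)
        if cand_cost < costs[i]:  # Eq. (9): keep the move only if it improves
            population[i], costs[i] = candidate, cand_cost
    return population, costs


rng = random.Random(2)
low, upp = [-5.0] * 2, [5.0] * 2
herd = [[rng.uniform(lo, hi) for lo, hi in zip(low, upp)] for _ in range(6)]
costs = [sphere(x) for x in herd]
before = costs[:]
herd, costs = phase3_escape(herd, costs, sphere, low, upp, t=3, rng=rng)
```

Under the assumed shrinking bounds, the step size decreases as t grows, which matches the exploitation role this phase is given in the text.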
D. Explainable Artificial Intelligence Algorithm
An objective of Explainable Artificial Intelligence (XAI) is to develop methods that improve the understandability and interpretability of Artificial Intelligence (AI) models. XAI techniques aim to offer users a clear understanding of the decision-making process of AI models together with reliable explanations. Shapley Additive Explanations (SHAP) is a comprehensive framework that can be employed to explain the results of ML algorithms. Using cooperative game theory, it computes the Shapley value of each feature for every instance, reflecting the influence of that feature on the model’s prediction [32].
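The game-theoretic idea behind Shapley values can be shown with a brute-force sketch that averages each feature's marginal contribution over all feature orderings. This is not the SHAP library and not the paper's pipeline. The `toy_model`, the input `x`, and the all-zero `baseline` are hypothetical, and exact enumeration is practical only for a handful of features.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)  # start with every feature "absent"
        prev = model(current)
        for j in order:
            current[j] = x[j]     # switch feature j to its actual value
            new = model(current)
            phi[j] += new - prev  # marginal contribution of feature j
            prev = new
    return [p / len(orders) for p in phi]


def toy_model(f):
    """Hypothetical score with an interaction between features 1 and 2."""
    return 2.0 * f[0] + f[1] * f[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split equally between the two features that create it.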
SHAP values provide a thorough framework for assessing the importance of features and can be used to generate clear explanations for specific predictions. Gradient-weighted class activation mapping (Grad-CAM) [33] is another XAI method, used mostly in computer vision applications. It generates visual explanations by highlighting the specific regions of the input image that have the most significant influence on the model’s prediction. By viewing these heatmaps, users can understand which image regions the model focused on when making its decision. Layer-wise relevance propagation (LRP) [34] is an XAI method that assigns relevance scores to the input features of a deep neural network. Its objective is to explain how the relevance of a prediction is distributed over the input space and what each feature contributes. LRP [35] is applicable to understanding the impact of different input features on the decisions of a model. An anchor-based method aims to discover rules, or anchors, that are readily comprehensible and effectively explain the model’s behavior.
Anchors are conditions on the input data that are readily understandable to humans and lead to an accurate prediction. Identifying these anchors allows users to learn more about the decision boundaries of the model and its behavior in various scenarios. Counterfactual explanations [36] provide hypothetical situations or inputs that, if modified, would result in a different model prediction. These explanations highlight the specific characteristics or features that have the greatest impact on the model’s decision. Counterfactual explanations can thus explain the decision-making process of a model and support the reasons behind a specific prediction.
The goal is to provide users with AI systems that are simple to understand, explainable, transparent, and reliable.
The Proposed Model for Eye Anemia Diagnosis
A. Dataset Description
The Eyes-defy-anemia dataset consists of 211 eye images. It comprises 113 eye images of Italian patients, each obtained in conjunction with a blood sample, and 95 eye images of patients in India, obtained at Karapakkam, Chennai [21]. The Indian images were likewise accompanied by blood samples and are referred to as the Indian dataset. Each dataset included an Excel file with patient information, including medical information (for example, the hemoglobin concentration determined from the blood count, expressed in g/dL) and demographic information (for example, age and sex). To determine whether a patient had anemia, this paper employed a hemoglobin concentration threshold of 10.5 g/dL [37]. Specifically, we classified patients with hemoglobin concentrations of 10.5 g/dL or less as anemic and those with concentrations greater than 10.5 g/dL as non-anemic. The datasets of patient eye images from India and Italy contained 31 patients (around 32%) and 11 patients (approximately 10%) with anemia, respectively. Additionally, as shown in Table 2, the datasets contained 67 non-anemic patients (roughly 68%) from India and 102 non-anemic patients (nearly 90%) from Italy. The combined dataset contained 42 patients with anemia, accounting for nearly 20% of the total 211 images.
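The labeling rule and the resulting class imbalance can be checked with a few lines of arithmetic. The sketch uses only the threshold and the per-country patient counts quoted above. The function name `is_anemic` and the dictionary layout are illustrative choices, not part of the dataset.

```python
ANEMIA_THRESHOLD_G_DL = 10.5  # threshold stated in the text [37]


def is_anemic(hb_g_dl):
    """Label a patient as anemic when hemoglobin is at or below the threshold."""
    return hb_g_dl <= ANEMIA_THRESHOLD_G_DL


# Per-country patient counts as quoted in the text (Table 2).
counts = {
    "India": {"anemic": 31, "non_anemic": 67},
    "Italy": {"anemic": 11, "non_anemic": 102},
}

total_anemic = sum(c["anemic"] for c in counts.values())
total = sum(sum(c.values()) for c in counts.values())
anemic_share = total_anemic / total  # fraction of the minority class
```

Roughly one sample in five belongs to the anemic class, which is the imbalance that the oversampling and augmentation steps are meant to address.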
To mitigate the impact of the small dataset size, we employed several strategies:
Data augmentation techniques, such as rotation, scaling, and translation, were applied to artificially increase the diversity and size of the training set. This helps the model learn more robust features and reduces overfitting [50].
The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to address the class imbalance issue, ensuring a more balanced representation of anemic and non-anemic samples during training [50].
The multi-branch CNN architecture (MultiBranchNet) was designed to efficiently extract a diverse set of features from the limited data, leveraging parallel processing to capture both local and global patterns relevant to anemia detection.
Progressive kernel size scaling (32/16, 64/32, 128/64) across modules creates a hierarchical feature pyramid specifically tuned for eye anemia detection.
B. The Proposed Detailed Model and Its Phases
The main objective of the proposed MultiBranchNet model is to extract distinctive deep features that improve classification ability. DL is an essential class of ML algorithms that draws inspiration from the structure of the human brain. CNN is widely regarded as one of the most powerful deep learning architectures. Research has shown that CNNs are robust to image translation and rotation. The CNN architecture is based on a set of convolution, pooling, and fully connected layers. The convolution layers extract both high-level and low-level image features, while the pooling layers reduce the size of the input features. The fully connected layer is then used for classification. The proposed model consists of three phases: (1) pre-processing phase, (2) feature extraction phase, and (3) optimization and classification phase. The proposed model is shown in Fig. 2. The integration of these three phases (data pre-processing, feature extraction using MultiBranchNet, and optimization using HO with multiclass SVM) forms the core of the proposed intelligent model for non-invasive anemia diagnosis.
1) Pre-Processing Phase
Before inputting images into a CNN model, it is essential to preprocess them. Initially, the dataset was imbalanced, meaning that there was a significant disparity in the number of instances across classes. This imbalance introduces a bias in the CNN model, causing it to predict the majority class more frequently [38]. In this paper, the SMOTE technique is employed to manage the imbalance in the dataset. Furthermore, to achieve higher accuracy, the CNN model requires training on a large dataset. A limited dataset can lead to overfitting [16], because a model trained and tested on a small number of samples cannot generalize effectively.
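The core idea of SMOTE, creating a synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines. This is a simplified illustration on small feature vectors, not the configuration used in this paper. The sample data, the value of `k`, and the function name `smote` are assumptions.

```python
import math
import random


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points between minority samples and their
    k nearest minority neighbours (requires at least two minority samples)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


rng = random.Random(0)
# Hypothetical 2-D feature vectors of the minority (anemic) class.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.0]]
synthetic = smote(minority, n_new=6, k=2, rng=rng)
```

Because every synthetic point is a convex combination of two real minority samples, the new samples stay inside the region already occupied by the minority class rather than duplicating existing points.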
Image augmentation techniques generate artificial images by applying operations such as rotation, shear, scaling, translation, and similar transformations to the original images. Obtaining an adequate number of images for the anemia dataset is challenging. Therefore, this paper employs image augmentation to increase the size of the dataset used for training, testing, and validating the model.
Rotation: During this process, the original image is rotated by 90° and 270° to generate artificial images. The mean intensity of the image remains unchanged regardless of its orientation [39]. Equations (10)-(14) define the mathematical structure of rotation image augmentation. For an anticlockwise rotation by the angle θ, the new coordinates (x’, y’) of a pixel are:\begin{align*} x^{\prime} &= x\cos\left(\theta\right) - y\sin\left(\theta\right) \tag{10}\\ y^{\prime} &= x\sin\left(\theta\right) + y\cos\left(\theta\right) \tag{11}\end{align*}
Equivalently, the new pixel coordinates can be expressed as a vector using the rotation matrix:\begin{align*} \left[\begin{array}{l} x^{\prime} \\ y^{\prime} \end{array}\right]=\left[\begin{array}{cc} \cos (\theta) & -\sin (\theta) \\ \sin (\theta) & \cos (\theta) \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right] \tag{12}\end{align*}
Equations (10)-(12) rotate about the origin (0, 0). An image can also be rotated about its center or about an arbitrary point (x0, y0), in which case the pixel located at (x0, y0) does not move. The rotation of a pixel (x, y) about the point (x0, y0) is written as:\begin{align*} x^{\prime} &= x_{0} + \left(x - x_{0}\right)\cos\left(\theta\right) - \left(y - y_{0}\right)\sin\left(\theta\right) \tag{13}\\ y^{\prime} &= y_{0} + \left(x - x_{0}\right)\sin\left(\theta\right) + \left(y - y_{0}\right)\cos\left(\theta\right) \tag{14}\end{align*}
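The rotation of a single pixel coordinate about a fixed point, following the anticlockwise rotation matrix of Equation (12), can be sketched as follows. The function name `rotate_point` and the sample coordinates are illustrative. Rotating a full image would additionally require resampling pixel intensities, which is omitted here.

```python
import math


def rotate_point(x, y, theta_deg, x0=0.0, y0=0.0):
    """Rotate (x, y) anticlockwise by theta_deg about the point (x0, y0)."""
    t = math.radians(theta_deg)
    dx, dy = x - x0, y - y0
    return (
        x0 + dx * math.cos(t) - dy * math.sin(t),
        y0 + dx * math.sin(t) + dy * math.cos(t),
    )


# The two rotation angles mentioned in the text, applied about the origin.
p90 = rotate_point(1.0, 0.0, 90)    # approximately (0, 1)
p270 = rotate_point(1.0, 0.0, 270)  # approximately (0, -1)
# The centre of rotation maps to itself.
fixed = rotate_point(2.0, 3.0, 90, x0=2.0, y0=3.0)
```

With the default centre (0, 0) the function reduces to the origin-centred rotation, and any other centre leaves that point fixed.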
Translation: The ROI of the original image is shifted slightly along the X dimension, the Y dimension, or both. After translation, the mean intensity values of the image components remain unchanged [40].
Equations (15) and (16) give the mathematical model for translation in image augmentation.
\begin{align*} x^{\prime} &= x + x_{T} \tag{15}\\ y^{\prime} &= y + y_{T} \tag{16}\end{align*}
Here, (x, y) are the initial coordinates of a pixel P and (x’, y’) are its coordinates in the resulting image after P is translated by the distances xT and yT along the X and Y directions, respectively.
Scale: Image scale augmentation resizes images down or up to simulate varying distances or resolutions [41]. Images are resized along each axis using a scaling factor, which may be the same or different for each axis. Specifically, a scaling factor less than 1 corresponds to zooming out, and a scaling factor greater than 1 corresponds to zooming in.
Shear: This operation translates one side of an image along either the vertical or horizontal axis, resulting in the generation of a parallelogram. A vertical shear moves an edge parallel to the vertical axis, while a horizontal shear moves an edge parallel to the horizontal axis. The magnitude of the shear is determined by the angle of shear [42].
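The translation, scaling, and shear operations described above can be expressed as coordinate transforms. The sketch below applies them to the corners of a unit square. The function names, the sample parameters, and the convention that a horizontal shear by angle φ maps x to x + y·tan(φ) are assumptions made for illustration, and pixel resampling is again omitted.

```python
import math


def translate(x, y, x_t, y_t):
    """Shift a coordinate by (x_t, y_t), as in Equations (15) and (16)."""
    return (x + x_t, y + y_t)


def scale(x, y, sx, sy):
    """Scale about the origin; factors below 1 zoom out, above 1 zoom in."""
    return (sx * x, sy * y)


def shear_horizontal(x, y, angle_deg):
    """Horizontal shear: points move along X in proportion to their Y value."""
    return (x + math.tan(math.radians(angle_deg)) * y, y)


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
moved = [translate(x, y, 2.0, -1.0) for x, y in square]
halved = [scale(x, y, 0.5, 0.5) for x, y in square]
sheared = [shear_horizontal(x, y, 45) for x, y in square]  # a parallelogram
```

Under the shear, the bottom edge of the square stays in place while the top edge slides sideways, producing the parallelogram described in the text.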
At the final step in the preprocessing phase, the eyes-defy-anemia images are resized to (
2) Feature Extraction Phase Using Multibranchnet
This phase introduces a novel CNN architecture, MultiBranchNet, which is composed of three modules, each containing two parallel branches. The design of MultiBranchNet leverages the PCNN architecture, which reduces parameters and layers compared to traditional sequential CNNs [11]. By using five convolutional layers arranged in parallel, this network acts as a single layer but achieves the performance of five, significantly lowering both parameter count and complexity.
This parallel architecture enhances processing speed by minimizing the number of parameters and layers [10], [11], enabling faster extraction of discriminative features compared to deeper sequential CNN models. Despite its simplified structure, the parallel CNN efficiently extracts the most prominent and distinguishing features from Eyes-defy-anemia images, contributing to the superior classification accuracy of the subsequent extreme learning machine (ELM) classifier. Furthermore, techniques such as dropout and batch normalization are integrated within the architecture, reducing overfitting and improving the model’s generalization capabilities.
Within MultiBranchNet, features extracted from the parallel branches at each module level are concatenated to capture both local and global features. These features are then passed to the subsequent optimization and classification phase.
The MultiBranchNet architecture consists of three modules (Module 1, Module 2, and Module 3), each of which includes two branches: the First Branch and the Second Branch. The model begins with two branches working in parallel in Module 1. After the outputs of these branches are concatenated, the model divides into two further parallel branches in Module 2. The outputs of the Module 2 branches are concatenated in turn, and the model divides once more into two parallel branches in Module 3, as shown in Figure 2. The purpose of this division is to assemble features: unique features are extracted from each branch and eventually aggregated.
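To make the split-and-concatenate pattern concrete, the sketch below tracks how the channel count would evolve through three two-branch modules. It is only bookkeeping, not a trainable network. The per-branch filter counts are an assumption: they reuse the 32/16, 64/32, 128/64 pattern mentioned in the limitations discussion, and the 3-channel (RGB) input is also assumed. All names are illustrative.

```python
def module_output_channels(branch_filters):
    """Channels after concatenating the parallel branches of one module."""
    return sum(branch_filters)

def trace_multibranch(in_channels, modules):
    """Follow the channel count through a stack of two-branch modules.

    Each module feeds its input to parallel branches, and the branch outputs
    are concatenated along the channel axis before the next module.
    """
    trace = []
    channels = in_channels
    for index, branch_filters in enumerate(modules, start=1):
        out_channels = module_output_channels(branch_filters)
        trace.append({
            "module": index,
            "in": channels,
            "branches": tuple(branch_filters),
            "out": out_channels,
        })
        channels = out_channels
    return trace

# Assumed (first branch, second branch) filter counts per module.
ASSUMED_MODULES = [(32, 16), (64, 32), (128, 64)]

for row in trace_multibranch(in_channels=3, modules=ASSUMED_MODULES):
    print(row)
```

Under these assumed filter counts, each module's concatenated output becomes the shared input of both branches in the next module.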
Figure 3 shows the input image size is
Three modules, with the first and second branches of each module, in the CNN architecture of MultiBranchNet.
3) Hippopotamus Optimization Algorithm Phase
In this phase, we employ the HO algorithm to optimize the hyperparameters of the multiclass SVM classifier (HO-SVM), specifically focusing on the kernel parameter (
a) Setting up Initial Parameters
These parameters consist of the population size, the maximum iteration count, the number of variables, and the upper and lower bounds. This paper focuses on optimizing two critical parameters of the SVM’s linear kernel function: the
b) Fitness Function, Positions Updates, and Termination Criteria
The evaluation of each individual’s position during the optimization process is conducted by utilizing a fitness function, defined as \begin{equation*} F_{nt} = \text{minimize}\ \frac{1}{N}\sum\nolimits_{i=1}^{N} I\left(y_{i}^{(test)} = y_{i}^{pred}\right) \tag{17}\end{equation*}
where N is the total number of samples in the testing dataset and I denotes the indicator function, which takes the value 1 when the predicted label,
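As a rough illustration of Eq. (17), the sketch below computes the mean of the indicator term over a test set exactly as the sum is written. Because the objective is minimized, an optimizer would normally be given the complementary error rate; whether that is what the authors intend cannot be confirmed from the text here, so error_rate is an assumption. The function names and toy labels are illustrative only.

```python
def agreement_rate(y_true, y_pred):
    """Mean of the indicator I(y_true[i] == y_pred[i]), as the sum in Eq. (17) is written."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def error_rate(y_true, y_pred):
    """Assumed minimization target: the fraction of misclassified test samples."""
    return 1.0 - agreement_rate(y_true, y_pred)

# Toy labels, not taken from the paper's dataset.
y_test = [1, 0, 1, 1, 0]
y_hat = [1, 0, 0, 1, 0]
print(agreement_rate(y_test, y_hat), error_rate(y_test, y_hat))
```

In a population-based search, each candidate pair of SVM hyperparameters would be scored by training the classifier and evaluating a function of this kind on held-out data.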
Experimental Results and Discussion
This section analyzes the experimental results of the proposed classification model based on HO-SVM. The experiments were conducted on hardware comprising an 8th-generation Intel Core i7 processor operating at 2.40 GHz, 16 GB of RAM, and a 1 TB hard disk. The software environment consisted of Windows 10 as the operating system and MATLAB R2024a as the programming environment. These tools were used to extract and analyze the results. The paper evaluates classification performance using Accuracy, Sensitivity, Specificity, Precision, Recall, F1-Score, and the Matthews Correlation Coefficient (MCC). These metrics are calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The confusion matrix in Figure 7 provides the data needed to calculate them. The metrics are defined in equations (18) to (24).\begin{align*} \text{Accuracy} &= \frac{TP+TN}{TP+FN+FP+TN} \tag{18}\\ \text{Sensitivity} &= \frac{TP}{TP+FN} \tag{19}\\ \text{Specificity} &= \frac{TN}{TN+FP} \tag{20}\\ \text{Precision} &= \frac{TP}{TP+FP} \tag{21}\\ \text{Recall} &= \frac{TP}{TP+FN} \tag{22}\\ \text{F1-Score} &= 2\times\frac{\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}} \tag{23}\\ \text{MCC} &= \frac{(TP\times TN)-(FP\times FN)}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{24}\end{align*}
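Equations (18) to (24) can be evaluated directly from the four confusion-matrix counts. The sketch below is one way to do so in Python. The counts in the example are hypothetical and are not read from Figure 7, and the function assumes that no denominator is zero.

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (18)-(24) from confusion-matrix counts (nonzero denominators assumed)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # identical to sensitivity, Eqs. (19) and (22)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and recall share one formula, so they always coincide. MCC is the only metric here that uses all four cells of the confusion matrix in both numerator and denominator.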
To calculate the FLOPs in a model, follow these guidelines:
Convolutions: FLOPs = 2 × Number of Kernels × Kernel Shape × Output Shape. Remember, the output shape of a convolutional layer is calculated as: Output = (Input Shape - Kernel Shape) + 1.
Fully Connected Layers: FLOPs = 2 × Input Size × Output Size.
Pooling Layers: FLOPs = Height × Depth × Width of an image. If a stride is used, the formula becomes: FLOPs = (Height / Stride) × Depth × (Width / Stride) of an image.
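The guidelines above translate into a few small helper functions. In this sketch, "Kernel Shape" and "Output Shape" are interpreted as the number of elements in the kernel (height × width × input channels) and in one output feature map (height × width). That interpretation, the integer division for strided pooling, and the layer sizes in the example are assumptions made for illustration, not values from the paper.

```python
import math

def conv_output_size(input_size, kernel_size):
    """Output = (Input - Kernel) + 1, i.e. stride 1 with no padding."""
    return input_size - kernel_size + 1

def conv_flops(num_kernels, kernel_shape, output_shape):
    """FLOPs = 2 x number of kernels x kernel elements x output elements."""
    return 2 * num_kernels * math.prod(kernel_shape) * math.prod(output_shape)

def fc_flops(input_size, output_size):
    """FLOPs = 2 x input size x output size."""
    return 2 * input_size * output_size

def pool_flops(height, width, depth, stride=1):
    """FLOPs = (height / stride) x depth x (width / stride); divisible sizes assumed."""
    return (height // stride) * depth * (width // stride)

# Hypothetical layer: 32x32x3 input, sixteen 3x3x3 kernels, then 2x2-stride pooling.
out = conv_output_size(32, 3)                      # 30
total = (
    conv_flops(16, (3, 3, 3), (out, out))
    + pool_flops(out, out, depth=16, stride=2)
    + fc_flops(128, 2)
)
print(total)
```

Summing the per-layer counts over the whole network gives the single-inference FLOP total, which can then be divided by a device's estimated throughput to approximate latency.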
A. Hippopotamus Experiments of Eye Anemia Images Augmentation
The experimental results demonstrate the efficacy of the proposed non-invasive anemia diagnosis model, which integrates a Multi-Branch CNN with Hippopotamus Optimization (HO) and a multiclass SVM. The experiments were conducted on a dataset of 211 eye images (anemic vs. non-anemic) collected from Indian and Italian patients. Initially, the dataset was imbalanced, with a disproportionately higher number of non-anemic images.
To rectify this imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE) and various data augmentation strategies—namely rotation, scaling, translation, and shearing. These steps increased the number of anemic images and balanced the dataset, allowing for more robust model training.
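The core idea of SMOTE is to create a synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library version of that idea operating on small feature vectors. It is not the implementation used in the paper, and whether the authors applied SMOTE to raw pixels or to extracted features is not stated here. The function names, k value, and toy data are illustrative.

```python
import random

def nearest_neighbors(sample, others, k):
    """Return the k vectors in `others` closest to `sample` (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return sorted(others, key=lambda other: squared_distance(sample, other))[:k]

def smote_samples(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic vectors by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Toy 2D minority-class vectors, for illustration only.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote_samples(toy_minority, n_new=3))
```

Each synthetic point lies on the line segment between two real minority samples, so the method adds variety without leaving the region the minority class already occupies.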
Figure 4 is a 2D visualization of the anemic and non-anemic dataset generated using the t-Distributed Stochastic Neighbor Embedding (t-SNE) technique, which is frequently employed for reducing dimensionality and visualizing data. The left plot shows the pre-training t-SNE visualization, while the right plot displays the post-training feature space. These visualizations clearly display two separate clusters: one cluster represented by blue data points (anemic), while the other cluster is represented by orange data points (non-anemic). The pre-training plot shows broader, overlapping distributions for the anemic and non-anemic classes, while the post-training plot exhibits more separated, distinct peaks for each class. This indicates that the training process effectively learns features that better discriminate between the two categories.
Figure 5 presents a 3-dimensional t-SNE plot of the dataset, distinguishing between anemic and non-anemic persons. The plot visualizes how the data points cluster within the reduced dimensional space, with the color gradient representing different values or intensities, indicative of a specific biomarker or variable associated with anemia status. The blue and orange colors in the scatter plot denote the two categories, anemic and non-anemic.
The Multi-Branch CNN architecture was first evaluated without the use of optimization techniques. The model demonstrated strong classification performance, with accuracy, precision, sensitivity, specificity, F1-score, and AUC of 92.65%, 97.06%, 89.19%, 96.77%, 92.96%, and 97.3%, respectively, as shown in Figures 6, 7, and 8. Figure 7 presents the confusion matrix, which reveals that the model achieves a high classification rate but exhibits a few misclassifications between the two classes. Figure 9 shows the distribution and cumulative distribution of prediction confidence for correct and incorrect predictions. The top graph reveals that correct predictions generally have higher confidence values clustered towards the right (closer to 1.0), while incorrect predictions have lower confidence spread more evenly across the range. The cumulative distribution in the bottom graph further illustrates that the curve for correct predictions rises sharply, indicating a high proportion of predictions with high confidence, in contrast to the more gradual increase seen for incorrect predictions. Figure 10 presents a set of sample images demonstrating the model’s predictions versus the true labels. The top two rows show anemic images, with the model correctly identifying the majority as anemic. The bottom two rows display non-anemic images, where the model again correctly predicts most as non-anemic, with only two misclassifications in the set shown. Based on the proposed model’s architecture and the specifications of the Intel Core i7-8550U CPU (estimated at 100 GFLOPS), the total number of floating-point operations (FLOPs) required for a single inference was calculated to be
Using the Multi-Branch CNN architecture as a feature extractor, followed by the HO-optimized multiclass SVM, achieved an accuracy of 97.06%, outperforming most of the existing methods in the literature. Additionally, this paper investigated feature extraction using the Multi-Branch CNN architecture on the anemic and non-anemic images collected from Indian and Italian patients. The objective is to determine how well the Multi-Branch CNN architecture extracts pertinent features that can be used to improve classification. The sensitivity of the model reached 100%, indicating its effectiveness in correctly identifying anemic cases. This high sensitivity is critical for medical diagnostics, where false negatives can have severe consequences. The confusion matrix revealed that the model had minimal misclassifications, reflecting its ability to accurately differentiate between anemic and non-anemic cases. The high true positive rate underscores the model’s reliability. These features are passed to the HO-SVM classifier to obtain higher accuracy. The model uses a fixed population size of 30, a maximum of 100 iterations, and two variables. The maximum bounds of
The Receiver Operating Characteristic (ROC) curve in Figure 12 is used to evaluate the performance of MultiBranchNet with HO-SVM on the dataset distinguishing between anemic and non-anemic persons. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at various threshold settings, providing insight into the trade-off between sensitivity and specificity. The curve’s proximity to the top-left corner indicates high performance, with an Area Under the Curve (AUC) value of 0.973. This AUC suggests that the model is highly effective at discriminating between the anemic and non-anemic classes. This strong performance underscores the model’s potential for accurately diagnosing eye anemia from eye images, making it a valuable model for clinical or diagnostic applications.
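For readers who want to see how an ROC curve and its AUC are obtained from classifier scores, the following sketch sweeps the decision threshold and integrates the curve with the trapezoidal rule. The labels and scores are made up for illustration and are unrelated to the results in Figure 12. The function names are illustrative, and the code assumes both classes are present.

```python
def roc_points(labels, scores):
    """Sweep the threshold from high to low and collect (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample that shares this score so ties yield one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy data: 1 = anemic, 0 = non-anemic; scores are hypothetical model outputs.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_points(toy_labels, toy_scores)))
```

The AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, which is why values close to 1 indicate good class separation.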
The SHAP force plot in Figure 13 explains a single model prediction, which makes it suitable for error analysis, that is, for determining why a particular observation received its prediction [45]. Based on the plot, the following observations can be made:
x127 is the most influential predictor, with the highest SHAP value, indicating that it plays a significant role in determining the model’s output.
Predictors such as x65, x124, x29, and x20 also have high SHAP values, showing they are important in the classification process.
Predictors x72, x125, x87, and x42 have moderate SHAP values, meaning they have a moderate influence on the model’s predictions.
x106 has the lowest SHAP value among the listed predictors, indicating it has the least impact on the model’s decision compared to the others.
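SHAP values are built on Shapley values, which attribute a prediction to individual features by averaging each feature's marginal contribution over all orderings in which features can be added. The sketch below computes exact Shapley values for a small toy function by enumerating those orderings. It does not use the SHAP library, and the toy model, its three features, and the baseline are hypothetical stand-ins rather than the paper's 128-dimensional predictors.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering (small inputs only)."""
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]   # switch this feature on
            value = predict(current)
            totals[feature] += value - previous    # marginal contribution
            previous = value
    return [total / len(orderings) for total in totals]

def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

phi = shapley_values(toy_model, instance=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
ranking = sorted(range(len(phi)), key=lambda i: -abs(phi[i]))
print(phi, ranking)
```

The attributions always sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that lets a force plot rank predictors by influence.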
B. Comparative Analysis with Literature
This section provides a concise summary of the comparison between the proposed model and previous work. Table 5 shows that the proposed model outperforms most of the models in related work.
In previous studies on eye anemia detection, researchers faced significant challenges in achieving high accuracy and sensitivity due to the limitations of traditional machine learning models and the imbalance in available datasets. For example, models such as SVM and simple CNN architectures often struggle to accurately classify anemic and non-anemic eye images, leading to suboptimal diagnostic performance. These limitations were made worse by the small size of the datasets, which often resulted in overfitting and poor generalization to new data.
The proposed model in this paper addresses these challenges by introducing a novel multi-branch CNN architecture combined with an HO-tuned multiclass SVM (HO-SVM). The multi-branch CNN improves feature extraction efficiency and accuracy, while the HO-SVM optimizes the classification process, raising diagnostic accuracy to 97.06%. Moreover, the use of SMOTE and various data augmentation techniques mitigates data imbalance, further enhancing the model’s robustness. This model not only overcomes the limitations of previous works but also sets a new benchmark for non-invasive eye anemia diagnosis, particularly in resource-limited settings. The proposed CNN model has been evaluated against other pre-trained deep CNN models by comparing the number of floating-point operations, as presented in Table 6. It can be observed from Table 6 that the proposed method requires only
Limitations and Implementation Considerations
A. Limitations and Potential Failure Cases
Despite the promising results achieved by our proposed MultiBranchNet with HO-SVM model, several limitations and potential failure cases warrant discussion. The model’s performance, while robust on our test dataset, may be affected by several factors in real-world deployment scenarios.
1) Dataset Limitations
The current study utilized a dataset of 211 eye images from Italian and Indian populations. We acknowledge that this relatively small sample size may not fully capture the diversity in ocular features across different ethnicities, age groups, and geographical regions. However, obtaining large datasets in medical fields, especially for conditions like eye anemia, presents significant challenges due to privacy concerns, ethical issues, and resource limitations. This dataset, despite its size, represents one of the first publicly available resources for eye anemia detection and offers valuable insights into this domain.
To mitigate the impact of the small dataset size, we employed several strategies:
Data augmentation techniques, such as rotation, scaling, and translation, were applied to artificially increase the diversity and size of the training set. This helps the model learn more robust features and reduces overfitting.
The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to address the class imbalance issue, ensuring a more balanced representation of anemic and non-anemic samples during training.
The multi-branch CNN architecture (MultiBranchNet) was designed to efficiently extract a diverse set of features from the limited data, leveraging parallel processing to capture both local and global patterns relevant to anemia detection.
Progressive kernel size scaling (32/16, 64/32, 128/64) across modules creates a hierarchical feature pyramid specifically tuned for eye anemia detection.
2) Image Quality Dependencies
Our model’s performance is highly dependent on image quality, which can vary significantly in real-world settings. Factors that may adversely affect classification performance include:
Poor lighting conditions during image capture
Suboptimal focus or resolution
Inconsistent angle of capture
Presence of external factors such as eye irritation, allergies, or conjunctivitis
Varying devices used for image capture with different color calibrations
3) Hemoglobin Threshold Considerations
The binary classification approach used in this study (anemic vs. non-anemic) employs a hemoglobin threshold of 10.5 g/dL. This simplification, while necessary for classification, does not account for the continuous nature of hemoglobin levels and their clinical interpretation. Cases near this threshold boundary showed higher misclassification rates in our analysis. Additionally, different medical guidelines may recommend varying thresholds for different populations (e.g., children, pregnant women, elderly), which our current model does not address.
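The thresholding step, together with a flag for borderline cases, can be sketched as follows. The 10.5 g/dL value is the one stated above. Treating values strictly below the threshold as anemic, the 0.5 g/dL "near boundary" margin, and the function name are assumptions introduced for illustration.

```python
ANEMIA_THRESHOLD_G_DL = 10.5  # hemoglobin threshold stated in the text

def label_from_hemoglobin(hb_g_dl, threshold=ANEMIA_THRESHOLD_G_DL, margin=0.5):
    """Return (label, near_boundary) for a hemoglobin value in g/dL.

    Assumes values strictly below the threshold are anemic; `margin` is a
    hypothetical band used to flag cases close to the decision boundary.
    """
    label = "anemic" if hb_g_dl < threshold else "non-anemic"
    near_boundary = abs(hb_g_dl - threshold) <= margin
    return label, near_boundary

for hb in (9.0, 10.4, 10.7, 13.2):
    print(hb, label_from_hemoglobin(hb))
```

Passing a different `threshold` per population group would be one simple way to accommodate guideline-specific cut-offs, although the current model does not do this.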
4) Justification for MultiBranchNet Architecture
The choice of MultiBranchNet for our eye anemia detection task was based on several key considerations:
Domain-specific requirements: Eye anemia detection requires a model that can effectively capture and integrate multi-scale features, as the signs of anemia can manifest at different levels of detail in the eye image. MultiBranchNet’s parallel processing and fusion of multiple scales directly address this requirement, making it a suitable choice for this specific domain.
Limited dataset size: Given the challenges in acquiring large-scale datasets for eye anemia, it is crucial to have a model that can learn robust features from limited samples. MultiBranchNet’s efficient processing and fusion of multi-scale features enable it to extract discriminative patterns even from small datasets, reducing the risk of overfitting and improving generalization.
Interpretability and trustworthiness: In medical applications, it is essential to have models that are not only accurate but also interpretable and trustworthy. MultiBranchNet’s multi-branch structure provides a natural way to visualize and understand the learned features at different scales, enhancing the interpretability of the model’s predictions. This is particularly important for building trust and facilitating clinical adoption of the proposed method.
Computational efficiency: In resource-constrained settings, such as low-income regions where eye anemia is more prevalent, it is important to have models that can operate efficiently on limited computational resources. MultiBranchNet’s parallel processing and streamlined architecture make it more computationally efficient compared to deeper, sequential models, enabling its deployment in a wider range of settings.
B. Practical Implementation Considerations
The translation of our proposed model from a research prototype to a deployable clinical tool requires careful consideration of several practical aspects.
1) Mobile Integration Pathway
Deploying the MultiBranchNet model on mobile platforms represents a promising avenue for increasing accessibility, particularly in resource-constrained settings. Our analysis indicates that the model’s computational requirements (
To optimize mobile performance, several techniques could be implemented:
Model quantization to reduce precision from 32-bit to 8-bit or 16-bit
On-device inference optimization using hardware acceleration (GPU).
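To illustrate what reducing precision from 32-bit floats to 8-bit integers involves, the sketch below applies symmetric per-tensor quantization to a short list of weights and measures the reconstruction error. This is a generic textbook scheme chosen for illustration, not the procedure of any particular mobile framework or of the authors. The function names and weights are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]

# Hypothetical layer weights.
toy_weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, step = quantize_symmetric_int8(toy_weights)
restored = dequantize(codes, step)
max_error = max(abs(w - r) for w, r in zip(toy_weights, restored))
print(codes, step, max_error)
```

Each stored weight shrinks from four bytes to one, at the cost of a rounding error of at most half a quantization step per weight. Whether that error is acceptable for this model would need to be checked against its accuracy on held-out images.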
2) User Interface Considerations
An effective user interface for clinical deployment should balance simplicity with functionality. Key elements should include:
Simple image capture guidance with real-time feedback on image quality
Clear instructions for optimal lighting and positioning
Immediate presentation of results with confidence scores
Ability to store and compare results over time
Integration with patient records where available
Offline functionality for areas with limited connectivity
Multilingual support for global deployment
3) Clinical Workflow Integration
The successful integration of our model into clinical workflows requires consideration of:
Training requirements for healthcare workers with varying levels of technical expertise
Standard operating procedures for image capture and interpretation
Guidelines for result verification and confirmation
Referral pathways for positive cases
Documentation and record-keeping protocols
Integration with existing electronic health record systems where available
Fallback procedures for cases where the system cannot provide high-confidence predictions
4) Regulatory and Ethical Considerations
Implementation in clinical settings would require navigating regulatory pathways specific to each region. Key considerations include:
Classification as a medical device under various regulatory frameworks (e.g., FDA, CE marking)
Data privacy compliance for patient information
Requirements for clinical validation studies
Ethical considerations regarding algorithmic bias and equity of access
Ongoing monitoring and updating protocols
Liability considerations for misclassification cases
Conclusion and Future Work
This study presents an innovative automated model for the early and non-invasive diagnosis of eye anemia, combining the MultiBranchNet architecture with an SVM tuned by the Hippopotamus Optimization algorithm (HO-SVM). The proposed system achieved 97.06% accuracy in distinguishing between anemic and non-anemic patients using ocular images. This demonstrates its strong potential as a reliable screening tool that could significantly improve anemia detection, particularly in resource-limited settings. The model’s success stems from several key innovations, including robust data augmentation techniques (rotation, translation, and shear) to overcome dataset limitations, HO-SVM optimization for enhanced classification performance, and the efficient parallel structure of MultiBranchNet that enables real-time processing. The integration of Explainable AI (XAI) techniques, particularly SHAP, further strengthens the model’s clinical applicability by providing transparent insights into its decision-making process.
Looking ahead, future research should focus on three primary areas to advance this technology toward clinical implementation. First, model enhancement efforts should explore continuous learning frameworks to improve adaptability across diverse populations and imaging conditions. The development of multiclass classification capabilities could provide more nuanced anemia severity stratification, moving beyond the current binary detection approach. Additionally, investigating advanced augmentation techniques like Generative Adversarial Networks (GANs) may yield higher-quality synthetic training data to further boost model performance. Second, comprehensive clinical validation studies will be essential. This includes large-scale field testing in real-world low-resource environments to evaluate practical implementation challenges. Usability testing with healthcare workers and rigorous assessment of the model’s impact on clinical workflows will provide valuable insights for optimization. The development of region-specific models that account for local population characteristics and healthcare infrastructure constraints could significantly enhance the technology’s effectiveness in diverse settings. Finally, technology integration efforts should focus on developing smartphone applications for point-of-care deployment and exploring combinations with other non-invasive diagnostic modalities. These developments could dramatically expand the system’s accessibility and utility in various healthcare contexts.
This work establishes a strong foundation for transforming anemia screening through accessible, AI-driven technology. By pursuing these future directions, the system can evolve into a comprehensive diagnostic solution that bridges critical healthcare gaps in underserved regions while maintaining an optimal balance between accuracy, efficiency, and interpretability. The potential impact of this technology extends beyond improved diagnosis to include better health outcomes and more efficient resource allocation in global healthcare systems.