Exudate Regeneration for Automated Exudate Detection in Retinal Fundus Images

This paper presents a framework for the automated detection of Exudates, an early sign of Diabetic Retinopathy (DR). It introduces a classification-extraction-superimposition (CES) mechanism for generating representative exudate samples from the limited samples available in open-source datasets. The paper demonstrates how manipulation of the Yolov5M output vector can be utilized for exudate extraction and superimposition, segueing into the development of a custom CNN architecture focused on exudate classification in retinal fundus images. The performance of the proposed architecture is compared with various state-of-the-art image classification architectures on a wide range of metrics, including simulated post-deployment inference statistics. A self-label mechanism is presented, endorsing the high performance of the developed architecture, which achieves 100% accuracy on the test dataset.


I. INTRODUCTION
The World Health Organisation (WHO), in its World Vision Report 2019, estimated the number of visually impaired people to be a staggering 2.2 billion [1]. What is further astounding is the fact that at least 1 billion of these cases could have been prevented [1] via early-stage detection followed by timely and targeted treatment. Focusing on the developed countries, we find that Diabetic Retinopathy (DR) is the most prominent cause of visual impairment within the working population [2], [3].
Retinal imaging provides a visual output of the back of the patient's eye (the retina), known as a fundus image. This method of eye screening provides an effective, exteroceptive examination of the retina, and a systematic mechanism for optometrists to inspect the microcirculation in the patient's retina [4]. Clinical inspection of retinal fundus images not only provides a check on existing disease but, more importantly, can lead to the unearthing of early-stage indicators of developing diseases such as DR, triggering timely treatment [5]. This testifies to the importance of timely retinal inspection and its implications for suppressing disease progression.
The above premise is a natural segue into the rationale for exploring automated retinal fundus examination aimed at providing patients with timely consultations and triggering patient-specific disease suppression plans. Advancements in deep learning, in particular convolutional neural networks (CNNs), enable networks to generalise in determining objects of interest from image-based inputs. Unlike conventional machine learning approaches, CNNs belong to the deep learning facet of AI; thus manual feature engineering is no longer required, albeit at the cost of greater computational demand [6].
The delicate nature of disease detection via visualization can make the task of identification via human inspection a very complex process requiring specific domain expertise coupled with time and cost factors. Furthermore, the human-bias factor cannot be ignored in particular for disease detection where a misclassification can lead to the exacerbation of the condition, leading to lifelong implications for the patient.
Exudates are one of the early signs of DR, belonging to its non-proliferative stage. They result from damaged pericytes, which cause increased vascular permeability. This leads to the leakage of molecules such as serum proteins and lipids into the retinal tissue. On a fundus image this phenomenon can be visually observed in the form of 'yellowish flecks' on the retinal tissue [7]. Hence, automated Exudate detection can assist optometrists with timely detection of this early stage of DR, leading to a timely suppression plan for the patient.

A. LITERATURE REVIEW
Researchers are actively working in the area of automated retinal disease detection through the implementation of deep learning. In particular, automated detection of exudates has been extensively investigated, as it is one of the early signs of non-proliferative DR. Tan et al. [8] propose the classification of Microaneurysms, Hemorrhages and Exudates via a two-stage detector. The classifier is based on class segmentation, achieving a sensitivity of 71.58% for the Exudates class. We feel this average result is due to the fact that Exudates can vary significantly in shape and dimensions, and hence applying pixel-wise segmentation can be a difficult procedure.
Guo et al. [9] also present a segmentation classifier for fundus images. However, the authors focus on suppressing the computational load of the network via a lightweight CNN containing only 36,000 parameters. The architecture focuses on the detection of Microaneurysms, Exudates and vessels across five publicly available datasets. The authors' contribution is an architecture designed to learn high-resolution image representations without major implications for the network's inference speed.
Huang et al. [10] base their work on the premise that the majority of research has focused on architectural design considerations rather than pathogenic causes. The authors present a relational transformer block incorporating attention mechanisms at two stages: firstly for exploiting global dependencies among lesion features, and secondly for facilitating interactions between lesion and vessel features. The implementation of transformers is an interesting strategy, as vision transformers can provide highly accurate performance; however, they require large amounts of training data and are computationally demanding to a level that precludes CPU-based deployment without appropriate pruning.
Zhou et al. [11] aimed to enhance DR lesion segmentation and grading via a collaborative learning architecture. Similar to [10], the authors opt for an attention mechanism, enabling the refinement of image-level annotations with class-specific information. The proposed architecture achieved an average area under the precision-recall curve (AUC) of 70.44%.
Habib et al. [12] proposed an automated architecture for Microaneurysm detection and classification via retinal fundus images. The methodology was based on feature extraction via Gaussian matched filters, with the extracted features serving as input to an ensemble classifier. However, the reported receiver operating characteristic (ROC) result was below par at 41.5%, potentially indicating that the network failed to generalize during the training phase.
Romero et al. [13] proposed a 'Bottom-Hat' technique for Microaneurysm detection. The objective was to retain the 'reddish regions' within the fundus image whilst removing the blood vessels. This was followed by the application of Radon transforms and principal component analysis (PCA) for identifying true Microaneurysms. The proposed mechanism achieved an impressive classification accuracy of 95.93%.
Shan et al. [14] presented a stacked sparse auto-encoder (SSAE) mechanism for retinal defect detection. The methodology consisted of first dividing images into small patches, followed by SSAE-based feature learning from the extracted patches. The patches were then classified on the basis of the selected features to distinguish between true and false Microaneurysms, with an impressive AUC of 96.2%.
Tang et al. [15] propose a novel 'splat-feature' classification mechanism for fundus-based retinal disease detection. The supervised approach is initiated by partitioning the image into non-overlapping segments. Each segment, known as a splat, comprises pixels that are similar with respect to spatial location and color. The optimal features from the splats are selected via a wrapper approach and used as input for training a classifier. The trained classifier achieved an AUC of 96%.
Murugan et al. [16] present automated retinal Microaneurysm detection via a CNN. The methodological approach consists of three distinct steps: data pre-processing, class candidate detection and, finally, pixel-based classification. The novel CNN architecture is coupled with a majority-voting classifier for the detection of Microaneurysms. The proposed classifier achieved an impressive AUC of 92%, outputting respectable performance when compared to state-of-the-art models.
Abdullah et al. [17] propose the detection and segmentation of the retinal optic disk as a preliminary step, due to its importance as a key component within fundus images. The methodology is based on morphological operations, the grow-cut algorithm, and the Hough transform. Morphological operators accentuate the optic disk whilst removing the vasculature. The center of the optic disk is then estimated via the Hough transform, and finally the grow-cut algorithm is implemented for precise segmentation of the optic disk. The developed mechanism was tested on five publicly available datasets, achieving optimal accuracy (100%) on three of the five whilst obtaining 99.09% and 99.25% on the other two.
Summing up the literature review, we can say that although there is a significant amount of recent research on retinal pathology detection via deep learning, the majority of the work is focused on the application of segmentation strategies for accentuating regional defects before performing any classification. In cases where segmentation is avoided, no particular focus is given to the computational implications of the selected or developed architectures. We also find that only a handful of retinal datasets exist; some are heavily orientated towards a particular class, whilst others lack sufficient fundus images for training and validation, such as the DIARETDB1 public dataset [18], containing a total of 89 colored fundus images. Of the 89 images, 41 contain Exudates, our class of interest. This has led to recent but limited work on synthetic dataset creation [19], [20]; however, again the majority of the research is focused on dataset creation via computationally expensive means such as segmentation followed by extraction [21].

B. PAPER CONTRIBUTIONS AND ORGANIZATION
Based on the gaps highlighted in the literature review, we propose a three-phase mechanism termed 'classification-extraction-superimposition' (CES). The proposed mechanism aims to address the two key limitations of the reviewed literature: the need for a computationally lightweight architecture and for representative synthetic data generation focusing on retinal Exudates.
The first stage of CES focuses on the development of a representative Exudate regeneration (rEr) mechanism for synthetic data creation. For this, object detection is utilized by training the YoloV5M model on an open-source retinal image dataset. The trained architecture is tuned to determine bounding-box co-ordinates for Exudate candidates. The highest-confidence candidate is extracted from the model output vector and saved to a JSON file along with its dimensions and pixel data. Post rEr, we implement a 'random-fusion' algorithm for super-imposing the extracted Exudates onto normal fundus images containing no signs of DR. This enabled us to collate a synthetic exudate dataset.
We then address the issue of computational efficiency by training our custom CNN architecture purely on Exudate classification via image labels, rather than segmentation or defect localization. We back our approach by highlighting the fact that Exudates are an early sign of DR. Hence, the purpose of the automated detection is to provide a timely mechanism that flags highly likely defective samples for further inspection by a clinical expert, rather than trying to localize every single instance of an Exudate on a fundus image. Finally, as a result of our highly performant CNN, we present a self-label mechanism providing 100% classification accuracy on the test dataset.

II. METHODOLOGY

A. DATASET
The initial dataset for our research was the publicly available EyePACS [22] dataset, containing 88,702 high-resolution retinal fundus images. The fundus images were procured from a group of subjects; for each subject, two fundus images were obtained, one of each eye. The images were taken with varying camera specifications, resulting in a high degree of internal class variance, i.e., in shading and color contrast (see Table 1). From Figure 1, a better visual comprehension of Exudates can be gained. Exudates can be differentiated from other retinal components such as the optic disk, macula, and capillaries by their 'yellowish' color and 'fleckish' composition. However, the non-uniformity of their size, shape and starkness can make detection more challenging. For example, Figure 1 (a) contains Exudates on a much smaller scale compared to the manifestation of Exudates in Figure 1 (c).
Furthermore, we observe that internal variance within the Exudates class exists on many fronts, due to internal factors such as the DR stage of the retina on a patient-by-patient basis, but also due to external factors such as variations in camera quality.

B. PROPOSED METHODOLOGY OVERVIEW
As mentioned earlier, the EyePACS dataset had a large imbalance towards normal (no-DR) images. However, as our research was focused on Exudate detection and classification, we decided to manually inspect and extract a small sample of Exudates, guided by the label CSV file provided for each sample. The objective was to create an exclusive Exudates dataset, which would then be used for training an object detector (Yolov5M). The object detector would be tuned to detect Exudates, crop the relevant pixels along with the bounding-box dimensions, and save them to a JSON file, enabling us to apply our proposed CES mechanism for scaling the dataset via random super-impositions of Exudates onto normal fundus images. The scaled dataset would then be used to train our custom lightweight CNN architecture, providing not only high accuracy but also a mechanism for carrying out self-labelling of Exudates on new retinal images, solving the issue of representative Exudate scaling for development purposes. The high-level methodological components of the research are presented in Figure 2.

C. EXUDATE LOCALIZATION/EXTRACTION VIA YOLOV5M
The first phase involved training an object detection architecture on the EyePACS dataset. The aim here was to extract the Exudate regions in pixel and dimensional format, i.e., bounding boxes, so that a more representative dataset could be generated for specific CNN training, as per the pipeline presented in Figure 2.
When it comes to object detection, all candidate architectures can be placed into one of two categories: two-stage or single-shot detectors. Two-stage detectors operate by first determining candidate boxes via a region proposal network (RPN) before moving on to regression and classification of bounding boxes, whilst single-shot detectors complete both stages in a single pass, eliminating the requirement for an RPN.
Yolov5(M), a single-shot detector, was selected for training on the EyePACS dataset. The dataset provided a CSV file with the corresponding annotations for each image sample via an ID. It may be questioned why the Yolov5 architecture was selected, given that two-stage detectors generally provide higher accuracy, albeit at the expense of higher computational requirements. The justification was, firstly, that the dataset contained ∼8000 images, so two-stage detectors would require significantly more training time and GPU allocation, both of which were constrained by our use of Google Colaboratory for training. Secondly, and more importantly, the aim of this phase was not to obtain the highest accuracy across all classes but rather to train a generalized network that can confidently determine Exudates via bounding boxes. These predictions would then be extracted from the architecture for further processing and generation of a new representative dataset for training a custom CNN that is both lightweight and highly accurate. The internal structure of Yolov5 is presented in Figure 3. The architecture consists of three key pillars: backbone, neck and head. The backbone serves the purpose of feature extraction from the input images. Yolo has many variants with incremental improvements; in Yolov5, Cross Stage Partial networks (CSP) are introduced as the backbone of the network. CSPs aim to reduce the computational demand caused by duplicated gradient information flow, with experimental results showing a reduction in computations of 20% on the ImageNet dataset [23].
The intermediary block is known as the model neck. Here, Yolov5 deploys a Path Aggregation Network (PANet) for generating feature pyramids, enhancing the model's capability to generalize on classes of interest appearing at various dimensions and scales. The final layer is known as the model head. It applies anchor boxes to the generated features, producing output vectors containing the predicted bounding-box dimensions, confidence, and predicted class. The architecture makes use of both sigmoid and leaky-ReLU activation functions, utilizing the latter within the hidden layers for non-linearity and the former in the final detection layer. For the optimizer, Stochastic Gradient Descent (SGD) was selected, with 'Binary Cross Entropy with Logits Loss' as the loss function. As our training filters for the Exudates class, we extract the output vector, remove redundant vector points, and save the result to a JSON file for the proposed CES mechanism.
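To make this extraction step concrete, the following is a minimal sketch of filtering the Yolov5M output vector for the Exudate class and serializing the highest-confidence candidate to JSON; the model path, class index and JSON field names are illustrative assumptions, not the exact values of our pipeline.

```python
# Minimal sketch of the output-vector extraction step, assuming a YOLOv5M
# model fine-tuned on the Exudate class and loaded via torch.hub. The file
# paths, class index and JSON layout below are illustrative assumptions.
import json
import cv2
import torch

EXUDATE_CLASS = 0  # assumed index of the Exudate class
model = torch.hub.load('ultralytics/yolov5', 'custom', path='exudate_best.pt')

img = cv2.imread('fundus_sample.jpg')
results = model(img[..., ::-1])          # YOLOv5 expects RGB channel order

# results.xywh[0] rows: [x_center, y_center, w, h, confidence, class]
preds = results.xywh[0]
exudates = preds[preds[:, 5] == EXUDATE_CLASS]

if len(exudates):
    # keep the highest-confidence Exudate candidate, as in the rEr stage
    best = exudates[exudates[:, 4].argmax()]
    x, y, w, h, cf, c = best.tolist()
    x0, y0 = int(x - w / 2), int(y - h / 2)
    patch = img[y0:y0 + int(h), x0:x0 + int(w)]   # crop the Exudate pixels

    record = {
        'id': 'fundus_sample',
        'bbox': [x, y, w, h],
        'confidence': cf,
        'pixels': patch.tolist(),   # pixel data stored alongside dimensions
    }
    with open('exudate_vectors.json', 'a') as f:
        f.write(json.dumps(record) + '\n')
```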

D. PROPOSED CES MECHANISM
The output vector obtained from the trained Yolov5 architecture, as shown previously in Figure 3, is of the form:

$$Out_{vec} = [x, y, w, h, cf, c]$$

where x, y are the coordinates of the center of the bounding box, w, h are the width and height of the predicted bounding box, c is the predicted class, and cf is the confidence score, selected based on the Jaccard index (Intersection over Union), where $B_p$ is the predicted bounding box and $B_g$ the ground-truth bounding box:

$$IoU(B_p, B_g) = \frac{|B_p \cap B_g|}{|B_p \cup B_g|}$$

Additionally, the id of the prediction is added to the output vector. As we are interested only in the bounding-box dimensions, we extract these from the output vector into a JSON file; hence the new vector is:

$$New_{vec\_01} = [x, y, w, h]$$

The extracted width and height are then rescaled, giving $New_{vec\_01\_res\_w}$ for the width and $New_{vec\_01\_res\_h}$ for the height, so that the generated vector becomes:

$$Gen_{vec\_01} = [x, y, New_{vec\_01\_res\_w}, New_{vec\_01\_res\_h}]$$

The resultant vector $Gen_{vec\_01}$, containing the generated Exudate sample based on the original vector sample $New_{vec\_01}$, is then super-imposed onto a randomly selected fundus image of a normal retina. Due to the variance found in the starkness of the different Exudate samples, along with background variations in the retinal images caused by varying camera quality, the newly constructed image was normalized via min-max normalisation.
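The following sketch illustrates this 'random-fusion' super-imposition under stated assumptions: the random rescaling bounds, placement logic and normalisation constant are ours for illustration, not the exact parameters of the CES implementation.

```python
# A minimal sketch of the random-fusion super-imposition, assuming the JSON
# records produced by the extraction step above. The rescaling bounds and
# placement strategy are illustrative assumptions.
import json
import random
import cv2
import numpy as np

def random_fusion(record, normal_fundus):
    patch = np.array(record['pixels'], dtype=np.uint8)

    # rescale the extracted Exudate width/height (New_vec_01_res_w / _res_h)
    rw, rh = random.uniform(0.8, 1.2), random.uniform(0.8, 1.2)
    new_w = max(1, int(patch.shape[1] * rw))
    new_h = max(1, int(patch.shape[0] * rh))
    patch = cv2.resize(patch, (new_w, new_h))

    # super-impose at a random valid location on the normal fundus image
    H, W = normal_fundus.shape[:2]
    x0 = random.randint(0, W - new_w)
    y0 = random.randint(0, H - new_h)
    fused = normal_fundus.copy()
    fused[y0:y0 + new_h, x0:x0 + new_w] = patch

    # min-max normalisation to suppress camera/starkness variance
    fused = fused.astype(np.float32)
    fused = (fused - fused.min()) / (fused.max() - fused.min() + 1e-8)
    return (fused * 255).astype(np.uint8)
```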
The visual interpretation of the first stage of CES, i.e., the representative Exudate regeneration (rEr) process, is shown as a flowchart in Figure 4. Figure 5 presents a visual process flow showing the transformation of the dataset as it progresses, starting with the EyePACS dataset and ending with the generated Exudate dataset.

E. CUSTOM CNN DEVELOPMENT
After the successful extraction and transformation of the Exudates via the proposed CES process, the result was a new dataset comprising two classes of retinal fundus images: normal and Exudate. The next stage of the process was the development of a lightweight CNN architecture for training on the resultant Exudate dataset shown in Table 2.
Based on the visual characteristics of the Exudates discussed earlier, we felt that state-of-the-art architectures such as ResNet-101 may provide high accuracy, but at the expense of significantly high computational demand due to their internal architectural depth. As a result, specific hardware would be required for on-site deployment, increasing the cost of deployment. To avoid this, we decided to develop a custom CNN, basing the selection of our convolutional blocks on the impact each would have on the multiply-accumulate operations (MACs). Hence, the development of each convolutional block was guided by keeping a check on its impact on computation via (8), where K is the kernel width and height, $C_{in}$ the number of input channels, $C_{out}$ the number of output channels, and $H_{out}$, $W_{out}$ the height and width of the output matrix, respectively:

$$MACs = K \times K \times C_{in} \times C_{out} \times H_{out} \times W_{out} \quad (8)$$

Delving deeper into the impact of the internal layers on computation, we know that the convolutional process has a major impact on the overall computational load. Filters are a fundamental component of the convolutional process, and hence the definition of the filter parameters on a block-by-block basis has implications for the architecture's generalizing capabilities and computational payload. The filter output size was determined via (9), where $n_{out}$ is the number of output features, $n_{in}$ the number of input features, p the padding size, k the kernel size and s the stride:

$$n_{out} = \left\lfloor \frac{n_{in} + 2p - k}{s} \right\rfloor + 1 \quad (9)$$

The architecture consisted of two convolutional blocks, with each block comprising filters, an activation function and a pooling mechanism. We decided to initiate the convolutional process with only 5 filters in the initial block, followed by a doubling of filters in the next block. The implementation of a reduced number of filters compared to the conventional approach was due not only to the suppression of computational parameters but also to our assumption, after data inspection, that the CES process had been successful in generating a highly representative dataset capturing internal Exudate variance via random super-imposition. Therefore, a large number of filters would no longer be required to learn sophisticated underlying class representations. ReLU was selected as the activation function due to its computational simplicity, essentially a max operation in mathematical terms, presented in (10). If the ReLU function receives a negative input, this is converted to a zero, whilst any positive input is preserved in its original form:

$$ReLU(x) = \max(0, x) \quad (10)$$

Translational invariance, closely related to equivariance, was another key concept that had to be factored into our architecture design. Elaborating further, the aim is to detect Exudates in a fundus image. We know that Exudates do not have a fixed location, i.e., they do not always reside near the optic disk or the macula. Therefore, it is paramount for our architecture to be able to detect Exudates within a fundus image regardless of their position. Hence, to remove positional dependency, we decided to implement max-pooling as an intermediary between the two convolutional blocks and before the first fully connected layer of the network.
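As a worked illustration of (8) and (9), the helper functions below compute the MACs and output feature size of a candidate convolutional block; the kernel, channel and input-size values in the example call are assumptions for demonstration, not the final design values.

```python
# Helper functions mirroring (8) and (9), used to sanity-check the
# computational cost of a candidate convolutional block.
def conv_macs(k, c_in, c_out, h_out, w_out):
    """MACs of one conv layer: K*K*C_in*C_out*H_out*W_out, as in (8)."""
    return k * k * c_in * c_out * h_out * w_out

def conv_out_size(n_in, k, p=0, s=1):
    """Output feature size: floor((n_in + 2p - k)/s) + 1, as in (9)."""
    return (n_in + 2 * p - k) // s + 1

# e.g. a first block of 5 filters of size 3x3 on an assumed 224x224 RGB input
n_out = conv_out_size(224, k=3, p=1, s=1)        # -> 224
print(conv_macs(3, 3, 5, n_out, n_out))          # -> 6,773,760 (~6.8 MMACs)
```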
Hence, translational equivariance for a given input $I$ requires that a transformation $g$ applied to the input is translated to the output, as shown in (11):

$$f(g \cdot I) = g \cdot f(I) \quad (11)$$

Furthermore, to address the potential for reduced accuracy due to internal variance within the Exudates class, such as size, color and regional placement, we introduced batch normalisation within both convolutional blocks. Elaborating further, we felt the high internal variance would result in samples belonging to the same class, in our case Exudates, being projected onto different regions of the feature space during training; as a result, the accuracy of the network may diminish, or at the very least the convergence of the architecture in the training stage would take longer due to the non-normalized feature projections. The application of batch normalisation is presented in (12), where z is the neuron output, $z_n$ is the normalised neuron output, $m_z$ is the mean of the neuron outputs and $s_z$ is their standard deviation:

$$z_n = \frac{z - m_z}{s_z} \quad (12)$$
Without batch normalisation, the activations of a layer, say $a^{[l-1]}$, go through a linear transform and are then passed through a non-linear activation function $g^{[l]}$ such as ReLU, giving the resulting activation $a^{[l]}$ of the unit, shown in (13):

$$a^{[l]} = g^{[l]}\left(w^{[l]} a^{[l-1]} + b^{[l]}\right) \quad (13)$$

After the implementation of batch normalization with a transform BN, the output is observed as (14):

$$a^{[l]} = g^{[l]}\left(BN\left(w^{[l]} a^{[l-1]}\right)\right) \quad (14)$$

It is important to note that the implementation of batch normalisation introduces two new learnable parameters for each unit, $\beta$ and $\gamma$. If $\beta = m_z$ and $\gamma = \sqrt{s_z^2 + \varepsilon}$, the scaled output $\gamma z_n + \beta$ recovers z, and the result is an identity function. This implies batch normalisation would not decrease the performance of the network, as the optimizer can in that case return to the identity function.
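Bringing the above design decisions together, the following PyTorch sketch reflects the described two-block structure: 5 filters doubled to 10, batch normalisation, ReLU and max-pooling, followed by two fully connected layers. The kernel sizes, 224 × 224 input resolution and fully connected widths are our assumptions; the exact configuration is given in Figure 6 and Table 3.

```python
# A sketch of the described two-block CNN; kernel sizes, input resolution
# and fully-connected widths are assumptions for illustration.
import torch
import torch.nn as nn

class ExudateCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 5, kernel_size=3, padding=1),   # block 1: 5 filters
            nn.BatchNorm2d(5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # positional invariance
            nn.Conv2d(5, 10, kernel_size=3, padding=1),  # block 2: filters doubled
            nn.BatchNorm2d(10),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(10 * 56 * 56, 64),                 # 224 / 2 / 2 = 56
            nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ExudateCNN()
print(sum(p.numel() for p in model.parameters()))  # learnable-parameter count
```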
The proposed CNN architecture is presented in Figure 6, and Table 3 presents the block-wise computational implications of our design. Figure 7 presents the self-labelling mechanism that is initiated post CNN training. Note that the resultant Exudate dataset is split into the standard train, validation and test sets. However, after the model has been trained and tuned to provide high accuracy on the validation set, the test set is used to validate the self-label accuracy of the classifier. The labels of the corresponding images in the test set are stored in a JSON file before the test-set images are introduced to the trained classifier. The classifier produces a label for each test sample, which is directed into another JSON file. The two JSON files (original and predicted labels) are then compared to quantify the self-label accuracy of the CNN.
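A minimal sketch of this JSON-based comparison is shown below, assuming an {image_id: label} layout for both files; the file names and layout are illustrative assumptions.

```python
# Minimal sketch of the self-label check in Figure 7: agreement between the
# original and predicted label files quantifies self-label accuracy.
import json

with open('test_labels_original.json') as f:
    original = json.load(f)          # e.g. {"img_001": "exudate", ...}
with open('test_labels_predicted.json') as f:
    predicted = json.load(f)

matches = sum(original[k] == predicted.get(k) for k in original)
accuracy = matches / len(original)
print(f'Self-label accuracy: {accuracy:.2%}')
```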

III. RESULTS

A. YOLOV5M PERFORMANCE
The research started with the training of a Yolov5M architecture for the localization of Exudates and to facilitate the extraction of the corresponding bounding box prediction for initiating the CES process. Table 4 presents the training configuration for the architecture.
As mentioned in the methodology section, the purpose of the object detection stage via Yolov5M was simply to extract Exudates as input to our CES mechanism. Hence, Yolov5M was trained on the EyePACS dataset post class filtering, focusing only on Mild DR (class 1) and Proliferative DR (class 2), as Exudates develop at an early stage of DR and are hence more likely to appear in these two classes alongside Microaneurysms. The trained architecture provided an impressive mean average precision of 95% at an intersection-over-union (IoU) threshold of 50%. Based on the validation results, the output vectors for each Exudate prediction were saved to a JSON file to initiate the next phase of the project. Figure 8 presents the architecture's inference on a sample image from the validation batch.
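The class filtering can be sketched as follows, assuming an EyePACS-style label CSV with 'image' and 'level' columns; the column names are an assumption and should be checked against the dataset release.

```python
# A small sketch of the class-filtering step, assuming EyePACS-style labels.
import pandas as pd

labels = pd.read_csv('trainLabels.csv')
subset = labels[labels['level'].isin([1, 2])]    # keep classes 1 and 2 only
subset.to_csv('exudate_candidates.csv', index=False)
print(len(subset), 'fundus images retained for Yolov5M training')
```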
A question arises here as to the rationale for extracting the predictions to train an image classifier rather than sticking with object localization via Yolov5M. The answer is manifold. Firstly, from the object localization perspective, we have highlighted missed Exudates in Figure 8. These Exudates, undiscovered by the model, may have resulted from a lack of attention during the manual annotation process used to label the dataset; hence they did not have a detrimental impact on the model, otherwise the model's performance would have decreased. Secondly, from a clinical perspective, we observe that the model is not very robust at detecting smaller-sized Exudates; this can be critical, as small Exudates provide insights into the early stage of Exudate development and hence need to be flagged by the model for further attention. So, in order to reduce the labour-intensive, expert-led, manual annotation process, we present the case for translating this into an image classification process; by training a highly robust classifier, we also address the issue of manual labelling via a self-label process.

B. PROPOSED CNN PERFORMANCE
After generating the Exudate dataset presented earlier in Table 2, based on the Yolov5M predictions and the CES mechanism, the developed CNN architecture (Figure 6) was trained with the hyperparameters presented in Table 5. To provide a granular evaluation, the performance of the developed classifier was examined over a range of metrics: Precision, Recall and F1-score. The confusion matrix presents the class-wise classification performance. It can be observed that the Exudates class output 100% correct predictions, whilst 2 samples of the normal class were incorrectly predicted as containing Exudates. Table 6 complements the results presented via the confusion matrix, with an overall precision of 99%. These results evidence the fact that our developed architecture was highly successful at generalizing on the Exudate dataset generated via the proposed CES mechanism.
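A sketch of this evaluation using scikit-learn is shown below; y_true and y_pred are placeholder arrays standing in for the actual validation labels and CNN predictions.

```python
# Sketch of the granular evaluation: confusion matrix plus precision,
# recall and F1-score per class, computed with scikit-learn.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]   # placeholder ground-truth: 0=normal, 1=exudate
y_pred = [0, 1, 1, 1, 1, 0]   # placeholder predictions from the trained CNN

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=['normal', 'exudate'], digits=2))
```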
However, these results alone do not justify the development of the CNN architecture as a lightweight structure unless a comparison is provided against present state-of-the-art (SOTA) image classification architectures, not only on dataset-specific performance but also on computational and architectural performance, segueing into the next section of the research: the SOTA comparison.

C. SOTA COMPARISON
The comparison of the various SOTA image classification architectures was based on the resultant Exudate dataset. For result integrity and fair comparison, each architecture was trained with the hyperparameters presented in Table 5. It is evident from the results presented in Table 7 that our model was able to output impressive performance across all metrics, as was the case with all SOTA architectures used for comparison. However, it is also noted that our CNN architecture was not the optimal performer, albeit by a decrease in precision of only 1%. Therefore, based on just the training and validation performance, the development of a custom CNN is not justified. But as the research objective was broader than just the training performance of the architecture, we evaluate other metrics to determine the architectural and deployment performance. Table 8 presents the architectural complexities of the respective architectures under consideration. Multiply-accumulate operations (GMACs) were used for benchmarking each architecture's speed on the basis of the number of computations involved within the internal architecture of the network. The rationale for selecting this metric as a benchmark for model comparison is that it enables an appreciation of a model's computational demand and hence of the required deployment infrastructure. It is evident from Table 8 that our CNN architecture was by far the most lightweight in terms of computational demand, with ResNet-18 requiring significant computational resources at 1.82 GMACs.
The second metric refers to the number of learnable parameters within the architecture. These parameters do not have an impact post deployment, as the optimal weights from the training phase are frozen and passed for inference, essentially disabling the backpropagation process. However, this metric does have an impact on the network's training time and time to convergence, based on factors such as GPU allocation. Our proposed CNN surpassed all SOTA architectures in this regard as a result of the design process, which selected only two convolutional blocks followed by two fully connected layers with limited neurons.
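One way to obtain Table 8 style figures is sketched below using the thop package, which is our choice for illustration rather than necessarily the tool used in this work; the ResNet-18 call reproduces a GMACs figure in line with the Table 8 entry.

```python
# Sketch of GMACs and learnable-parameter counting via thop (an assumed
# tooling choice, not necessarily the authors').
import torch
from thop import profile
from torchvision.models import resnet18

model = resnet18(num_classes=2)
macs, params = profile(model, inputs=(torch.randn(1, 3, 224, 224),))
print(f'{macs / 1e9:.2f} GMACs, {params / 1e6:.2f} M parameters')
# prints roughly 1.82 GMACs, in line with the ResNet-18 entry in Table 8
```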
In order to provide a robust comparison, and one that can provide insights into a model's deployment performance, we decided to run each network through a post-deployment simulation. The simulation was based on extracting each model's performance in deployment terms, i.e., frames per second (FPS) and latency. The simulation was performed on a CPU as the computation device, performing 100 forward-pass operations with a batch size of 32 and an input image resolution of 224 × 224 pixels. It is evident from Table 9 that our CNN architecture was the most feasible deployment option, both in terms of FPS and inference latency, whilst GoogLeNet, due to its increased internal layer depth, provided the lowest FPS at 0.27 with the highest latency (3.31 seconds).
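The simulation protocol can be sketched as follows: 100 timed forward passes on CPU with batch size 32 at 224 × 224, preceded by a small warm-up (the warm-up count is our assumption).

```python
# Sketch of the post-deployment CPU simulation: 100 forward passes,
# batch size 32, 224x224 input, reporting mean latency and FPS.
import time
import torch

@torch.no_grad()
def simulate_cpu_deployment(model, passes=100, batch=32, size=224):
    model.eval().to('cpu')
    x = torch.randn(batch, 3, size, size)
    for _ in range(5):                    # warm-up passes, excluded from timing
        model(x)
    start = time.perf_counter()
    for _ in range(passes):
        model(x)
    elapsed = time.perf_counter() - start
    latency = elapsed / passes            # seconds per forward pass
    fps = 1.0 / latency                   # forward passes per second (Table 9)
    return latency, fps

# e.g. latency, fps = simulate_cpu_deployment(ExudateCNN())
```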

D. SELF-LABEL PERFORMANCE EVALUATION
After demonstrating the high performance of our developed architecture across a broad range of metrics, the architecture's self-labelling capacity was tested via the proposed procedure presented earlier in Figure 7.
The original test-data labels were stored in a JSON file prior to introducing the images to the trained CNN. The test data consisted of 63 samples of normal and 63 samples of Exudate fundus images. The proposed CNN architecture carried out a forward pass on each image in the test folder, and the corresponding prediction labels for each image were stored in a separate JSON file. The results are shown in Table 10. As shown in Table 10, comparing the JSON file containing the predicted labels against the actual labels for the test set yielded 100% accuracy for both the normal and Exudate images.

IV. CONCLUSION
In summary, we have demonstrated the successful development of a custom CNN architecture, for the detection of Exudates in fundus images. In doing so, we have provided a transitional framework, one that can be emulated by peers in the field of automated DR detection to develop representative datasets and custom CNN architectures that are tailored to provide maximum efficiency for their particular domain with respect to not only the network accuracy but also on-site deployment feasibility.
The manipulation of the output vector from Yolov5M, an object detection and localization architecture, enabled us to generate a more representative dataset focused on Exudates via our proposed CES mechanism. The generation of a rich dataset capturing key internal variance via CES meant that we could suffice with a compact CNN architecture without compromising the performance of the network.
Summing up the SOTA comparison, it can be said that our proposed CNN architecture was the most suitable selection, providing the highest performance in 4 out of 5 metrics. The fact that the metrics spanned a broad spectrum, from training performance to post-deployment simulation, shows that the developed architecture was not only highly performant in its inference but also provided a feasible option for deployment on a standard CPU device. The computationally light architecture enables our model to be integrated into optometry clinics' standard hardware, facilitating automated inferences in a timely manner and without having to invest in costly GPU installations.
Diabetic Retinopathy is a complex condition, with various signs emerging and growing as the condition progresses, from the emergence of Microaneurysms at the non-proliferative stage through to Neovascularization at the proliferative DR stage, with many signs in between, such as Hemorrhages, intra-retinal Microvascular Abnormalities and Pre-retinal Hemorrhages. As future work, our proposed framework can be implemented to generate a representative dataset for each of these signs and to develop custom, highly performant detection architectures, in particular focusing on early-stage conditions for timely detection and medication initiation.
MUHAMMAD HUSSAIN received the B.Eng. degree in electrical and electronic engineering and the M.S. degree in the Internet of Things from the University of Huddersfield, in 2019, and the Ph.D. degree in artificial intelligence for defect identification. His research is focused on the detection of various faults, in particular micro-cracks forming on the surface of photovoltaic (PV) cells as a result of mechanical and thermal stress. He has a particular interest in the field of machine vision, focusing on the development of lightweight architectures that can be optimized for deployment on edge devices and ultimately on the production floor. He is also researching design-level architectural interpretability, with a focus on explainable AI for sensitive fields, such as medicine and healthcare.
HUSSAIN AL-AQRABI received the M.Sc. degree in computer networks and the Ph.D. degree in cloud security from the University of Derby, U.K. He is currently a Professor with the Department of Computer Science and the Deputy Course Leader at the University of Huddersfield, U.K. He also received a Postgraduate Certificate in higher education from the University of Huddersfield. He is a fellow of the Higher Education Academy. In addition to his university education, he holds industry certifications, including EC-Council Certified Ethical Hacker, Microsoft Certified Educator, and Microsoft Certified IT Professional on Windows Server and he is also Cisco Certified in routing and switching. He has published nearly 50 publications in peer-reviewed journals, international conferences, and book series. He is a reviewer for many scientific journals, international conferences, and workshops. His research interests include cloud security, multiparty authentication, digital manufacturing, the Industrial Internet of Things, artificial intelligence, distributed ledger, network security, optimization, secure protocol development, and evaluation.
MUHAMMAD MUNAWAR received the B.S. degree in computer science from COMSATS University Islamabad, Wah Campus, in 2021. He is currently a Software Engineer at a multinational company. He has a particular interest in the field of machine vision and embedded systems, focusing on the development of vision applications, refining model architectures for different research tasks, and developing custom modules that are suitable for his research.
RICHARD HILL (Senior Member, IEEE) is currently the Head of the Department of Computer Science, and the Director of the Centre for Industrial Analytics, University of Huddersfield, U.K. He has published over 200 peer-reviewed articles and has been a recipient of several best paper awards, having been recognized by the IEEE for outstanding research leadership in the areas of big data, predictive analytics, the Internet of Things, cyber physical systems security, and industry 4.0, and has specific interests in digital manufacturing.
SIMON PARKINSON received the degree (Hons.) in secure and forensic computing from the University of Huddersfield, in 2010, and the Ph.D. degree, in 2014. He is currently a Reader (an Associate Professor) with the Department of Computer Science, University of Huddersfield, where he leads the Centre for Cyber Security. His research interest is in developing intelligent systems for manufacturing and cyber security, involving continuing research into developing and utilizing artificial intelligence for task automation. His research interests in cyber security cover aspects such as access control, vulnerability and anomaly detection, learning domain knowledge, mitigation planning, and software tools to aid situation awareness. His research is driven to address the global security skills shortage by equipping non-specialist users with software tools to perform expert-equivalent security analysis. Previous security analysis tools that he has released have in excess of 35k users.