Deep Learning in Cervical Cancer Diagnosis: Architecture, Opportunities, and Open Research Challenges

Deep learning (DL) is now a popular tool in applications across many fields, including the medical domain. DL techniques can cope with several challenges that are difficult to resolve with traditional artificial intelligence (AI) techniques. Cervical cancer (CC) is one of the leading causes of death in women and ranks second after breast cancer, with more than 700 deaths daily. This number is estimated to reach 400,000 annually by 2030. However, if the cancer is detected in the early and precancerous stages, it is completely curable. Pap smear and colposcopy are the most widely used screening methods for the detection of cervical cancer, but manual screening suffers from a high rate of false results due to human error. To overcome this challenge, machine learning (ML) and DL-based computer-aided diagnostic (CAD) techniques are being widely developed to automatically segment and classify cervical cytology and colposcopy images. These methods increase the accuracy of detecting the different stages of cervical cancer. Hence, there is increased interest in creating computer-aided solutions for CC screening, especially in less-developed countries, where the majority of cervical cancer-related fatalities occur. This review surveys state-of-the-art approaches that use DL techniques to analyze cervical cytology and screening images. It reviews and discusses relevant DL techniques, their architectures, classification methods, and the segmentation of cervical cytology and colposcopy images. Finally, it reviews the DL algorithms currently used in CC screening and offers insights, research opportunities, and future directions in this field.


I. INTRODUCTION
The rapid development of AI and ML technology and of digital platforms over the past decade has enabled novel applications in the field of health. Among them, DL has the potential to provide a wide range of new opportunities for many medical domains. For instance, it could be used to produce quicker and more accurate diagnoses, which would in turn result in more precise and individualized therapy.
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.

A. DEEP LEARNING (DL)
DL is a subfield of AI and ML that has been very successful in sectors such as health care, business, education, and government. By achieving human-equivalent performance in some tasks, DL has effectively replaced other ML techniques in applications such as computer vision recognition and processing. DL consists of computational models with multiple processing layers that learn representations of the input data at various levels of abstraction. Recently, DL has been used successfully to solve real-world problems in a wide range of applications [1]. DL methods are a major driver of progress in medical image analysis in various research and clinical fields, such as the detection and diagnosis of different types of cancer [2]. In layman's terms, cancer is a disease in which some cells become abnormal and divide uncontrollably; it arises in certain organs or tissues but sometimes also spreads to other organs in the body [3]. Globally, the different types of cancer, such as CC, bladder cancer, colorectal cancer, and breast cancer, are among the leading causes of mortality [4].
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

B. CERVICAL CANCER
CC is a type of cancer that affects the cervix [5]. It is the fourth most common cause of cancer death among women worldwide [6]. However, it has become the second most common cancer among women after breast cancer, especially in developing countries, with more than 700 deaths daily and an estimated 400,000 deaths annually by 2030, of which 90% will occur in developing countries [5], [7]. In 2012, over 528,000 cases of CC were diagnosed, and more than 266,000 women died [8]. Additionally, in 2018 more than 311,000 deaths due to CC were reported among women aged 20 to 39 [9], [10], [11]. In European countries, more than 60,000 CC cases are detected and over 25,000 deaths are reported each year [12]. CC is easily treatable if diagnosed in the early stages. Because more than 95% of CC cases are associated with the human papillomavirus (HPV), which is sexually transmitted, it is possible to prevent HPV infection, and thus the ensuing CC, through vaccination, especially in young girls. For older women who are sexually active, the recommended screening method for CC is the Pap smear. Typically, a cervical biopsy and a screening procedure are used to make the diagnosis. Image processing techniques can be used to assess the advancement of CC, also known as cancer staging. These facts and estimates highlight the importance of Computer Aided Diagnosis (CAD) systems [13].

C. PROBLEM STATEMENT
Cervical cancer (CC) is among the leading causes of death around the globe and poses a significant challenge to scientists and health care providers dealing with cervical cancer patients. Detection of cervical cancer relies mainly on Pap smear screening and colposcopy, two procedures that depend heavily on professionally trained specialist doctors, who are often scarce in low-resource countries and rural areas. Recent deep learning-based solutions offer notable speed and accuracy in object detection and classification, with specific applications in health monitoring. Although several previous studies have combined deep learning-based algorithms with standard cervical cancer screening tests such as colposcopy and Pap smear screening, none of the existing solutions can accurately detect the early stages of cervical cancer, owing to their own limitations and to the type of medical detection tests they use. Moreover, the accuracy of any suitable solution needs to reach a high and accepted level in all stages of cervical precancer and cancer in order to support timely treatment. Therefore, there is a pressing need to further improve existing deep learning-based digital solutions for the timely and accurate diagnosis of cervical cancer in all stages, especially the early ones, to avoid morbidity and mortality, particularly in low-income countries.

D. LIST OF CONTRIBUTIONS
This research mainly aims to provide a comprehensive review of the state-of-the-art literature on DL techniques, automated approaches to cervical cancer diagnosis, and existing solutions for detecting CC using DL algorithms and digital colposcopy images. Its contributions are:
• A survey of existing solutions for diagnosing cervical cancer using DL techniques.
• A discussion of the research gaps and potential problems in existing studies on detecting cervical cancer using ML methods.

E. ORGANIZATION OF THE ARTICLE
The remainder of this review is organized as follows. Section II introduces image processing in medical diagnosis. Section III gives an overview of cancer diagnosis using image processing. Sections IV and V provide a comprehensive review of CC detection approaches and of DL in CC detection, respectively. Section VI discusses research opportunities and future directions. Section VII concludes the review and outlines prospective applications. Figure 1 illustrates the organization of the sections of the article.

II. RELATED WORK
Several existing works have proposed solutions to detect and classify CC using DL algorithms. Ghoneim et al. proposed a DL-based solution using a Convolutional Neural Network-Extreme Learning Machine (CNN-ELM) [14]. It extracts deep-learned features from cell images using a Convolutional Neural Network (CNN), and an extreme learning machine (ELM)-based classifier then classifies the input images. The CNN model is applied with transfer learning and fine-tuning, as shown in Figure 2. Ghoneim et al. used the Herlev database for their experiments. The proposed CNN-ELM-based system achieved 99.5% accuracy on the two-class detection problem (normal versus abnormal) and 91.2% accuracy on the seven-class classification problem using cellular images. However, no information on segmentation accuracy, sensitivity, or specificity is presented in that work [14]. Harangi et al. concentrated on the segmentation of cells in automated Pap smear image analysis, which is one of the prerequisites for the early diagnosis of CC. They implemented DL-based techniques, in particular Fully Convolutional Neural Networks (FCNNs). Large labeled datasets are necessary for training DL-based techniques. However, the corresponding annotations may occasionally be inaccurate or ambiguous.
Harangi et al. showed that the effect of training on uncertain ground truth can be mitigated by combining DL-based segmentation approaches with traditional image processing-based techniques. They combined the outputs of the FCNN and a superpixel-based segmentation algorithm and achieved an accuracy of 86.67%, with a sensitivity of 66.13% and a specificity of 96.15% [15]. In this method, superpixel segmentation groups the millions of pixels of an image into numerous uniform regions [16], [17], [18], [19].
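To make the idea of combining a pixel-wise network output with a superpixel partition concrete, the following minimal sketch labels each superpixel as foreground when the mean FCNN probability inside it reaches a threshold. The fusion rule, the function name `fuse_with_superpixels`, the threshold of 0.5, and the toy arrays are all assumptions made for illustration; they are not taken from [15], whose exact combination scheme is not described here.

```python
from collections import defaultdict

def fuse_with_superpixels(prob_map, superpixels, threshold=0.5):
    """Assign one label per superpixel from the mean FCNN probability inside it.

    prob_map    : 2D list of foreground probabilities (hypothetical FCNN output)
    superpixels : 2D list of integer superpixel ids, same shape as prob_map
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for prob_row, sp_row in zip(prob_map, superpixels):
        for p, sp in zip(prob_row, sp_row):
            sums[sp] += p
            counts[sp] += 1
    # Assumed fusion rule: a superpixel is foreground if its mean probability
    # is at least `threshold`.
    decision = {sp: (sums[sp] / counts[sp]) >= threshold for sp in sums}
    return [[1 if decision[sp] else 0 for sp in sp_row] for sp_row in superpixels]

# Toy example: two superpixels (ids 0 and 1) on a 2 x 3 image.
prob_map = [
    [0.9, 0.8, 0.2],
    [0.7, 0.4, 0.1],
]
superpixels = [
    [0, 0, 1],
    [0, 1, 1],
]
mask = fuse_with_superpixels(prob_map, superpixels)
```

Averaging inside each superpixel smooths isolated pixel-level errors, which is one plausible reason such a combination can help when the ground truth is uncertain.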
In another study, Chikhaoui et al. proposed a Cellular Neural Network algorithm to identify cancer cells in real time by processing Pap smear images. The Pap smear procedure is essentially a microscopic method for examining the cells of the cervix for the presence of malignant or precancerous cells [20]. Numerous patterns were combined and modified to build a cellular neural network algorithm for detecting cancer cells, using 115 Pap smear images in total. They used a MATLAB-based cellular neural network to automatically detect CC cells, in which the patterns separate the cell's nucleus. According to the simulation findings, the proposed algorithm has an accuracy of more than 88% for automatically identifying CC cells [21]. Alyafeai and Ghouti developed a fully automated pipeline for the detection and classification of CC using Cervigram images. The proposed pipeline is composed of two DL models. The first model detects the cervix region and is reported to be 1000 times faster at doing so; Figure 3 illustrates its structure. The second model classifies cervical tumors via self-extracted features, as shown in Figure 4.
These features are further analyzed using two simple CNN-based models, and two Cervigram datasets were used to train and evaluate the major components of the DL pipeline [5].
Yasunari et al. applied a CNN to images combined with HPV status to predict the underlying pathology of CC, and evaluated it in terms of classification accuracy, sensitivity, and specificity [22]. A CNN-based method for CC cell identification and classification was provided by Ghoneim et al. In that work, the CNN was applied with fine-tuning and transfer learning, and variations based on Multi-Layer Perceptron (MLP), Auto Encoder (AE), and Extreme Learning Machine (ELM) classifiers were also investigated [14]. Fekri-Ershad provided a method with high diagnostic accuracy, shown in Figure 5, in the form of a Pap smear image classification model employing various classifiers, such as K-Nearest Neighbor (KNN), J48 Tree, MLP, and Bayesian Network [23]. Likewise, Zhang classified precancerous cervical lesions via a pre-trained, densely connected CNN in the form of a computer-aided diagnosis method, as shown in Figure 6 [24].
In another study, Miyagi studied AI and DL techniques for the classification of cervical squamous intraepithelial lesions (SIL) using colposcopy images. Colposcopy is a medical diagnostic procedure in which the cervix is examined visually using an instrument called a colposcope. Out of 330 patients who underwent colposcopy and biopsy by oncologists, 97 and 213 were diagnosed with low-grade and high-grade SIL, respectively. An AI classifier based on an 11-layer CNN was employed and trained. In the comparative analysis, the accuracy, sensitivity, and specificity of the conventional colposcopy diagnosis for pathological HSIL were 0.828, 0.859, and 0.658, respectively, whereas the area under the receiver operating characteristic curve (AUC) was 0.826 ± 0.052. These outcomes indicated better performance of the AI classifier than of human-based diagnosis. However, the study lacks substantial inference, and the classifier in this framework still needs to be validated [22]. A summary of the related works is shown in Table 1.
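The accuracy, sensitivity, and specificity figures quoted throughout this section all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `screening_metrics` and the counts in the example are hypothetical and do not reproduce the data of any cited study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp / fn : abnormal cases correctly detected / missed
    tn / fp : normal cases correctly cleared / wrongly flagged
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # fraction of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true-positive rate (recall on abnormal cases)
        "specificity": tn / (tn + fp),      # true-negative rate (recall on normal cases)
    }

# Hypothetical counts for 100 screened images (not from any cited study).
metrics = screening_metrics(tp=40, fp=5, tn=45, fn=10)
```

A method can combine high specificity with low sensitivity, as in the 96.15% and 66.13% reported in [15], so accuracy alone does not show how many abnormal cases a screening system misses.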

III. IMAGE PROCESSING IN MEDICAL DIAGNOSIS
The technique of aligning two or more images based on their appearance is called image registration, often also referred to as image fusion or image matching. The goal of medical image registration is to find the best spatial transformation that aligns the underlying anatomical features. Numerous clinical applications, including image guidance, motion tracking, segmentation, dose accumulation, and image reconstruction, employ medical image registration. From the perspective of the input images, there are four categories of registration techniques: multimodal, unimodal, intra-patient, and inter-patient. From the perspective of the deformation model, the three main categories of registration methods are rigid, affine, and deformable. From the standpoint of the Region of Interest (ROI), registration techniques may be grouped by anatomical location, such as brain, lung, and so forth. According to the dimensions of the image pair, there are three types of registration techniques: 3D to 2D, 3D to 3D, and 2D to 2D/3D [33].
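The difference between the rigid and affine deformation models can be illustrated with a small sketch that applies each transformation to 2D landmark coordinates. A rigid transform allows only rotation and translation and therefore preserves distances, whereas an affine transform additionally allows scaling and shearing. The function names and landmark values below are assumptions made for illustration; a real registration method would also have to estimate the transformation parameters by optimizing a similarity measure, which is not shown.

```python
import math

def rigid_transform(points, angle_deg, tx, ty):
    """Rotate 2D points by angle_deg about the origin, then translate by (tx, ty)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def affine_transform(points, matrix, translation):
    """Apply a general 2x2 linear map (rotation, scale, shear) plus a translation."""
    (a, b), (c, d) = matrix
    tx, ty = translation
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]

# Two hypothetical anatomical landmarks in image coordinates.
landmarks = [(1.0, 0.0), (0.0, 2.0)]

# Rigid: 90 degree rotation followed by a shift of one unit along x.
moved = rigid_transform(landmarks, 90, 1.0, 0.0)

# Affine: anisotropic scaling, which a rigid model cannot express.
scaled = affine_transform(landmarks, ((2.0, 0.0), (0.0, 0.5)), (0.0, 0.0))
```

Deformable models go one step further and let the displacement vary from point to point, so they cannot be written as a single matrix.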
The best chance of saving many lives lies in the early identification of cancer. Visual inspection and manual procedures are frequently employed for this kind of cancer diagnosis. Such manual assessment of medical images requires a significant amount of time and is quite error-prone. CAD systems were introduced in the early 1980s to help clinicians analyze medical images more effectively [34]. The most important stage in implementing ML is feature extraction, and several feature extraction techniques have been researched for various cancer types [35].
These feature extraction-based techniques do, however, have drawbacks. To address these issues and improve performance, representation learning was proposed by Bengio et al. in 2013 [36]. The benefit of DL is that it can directly produce a high-level feature representation from raw images. In addition, with DL, feature extraction and image recognition are carried out in parallel on Graphics Processing Units (GPUs). For instance, CNNs have demonstrated promising performance in the detection of cancer [37].

IV. CANCER DIAGNOSIS USING IMAGE PROCESSING

A. PRE-PROCESSING
Pre-processing, which includes the removal of image noise, is the first phase in the detection process. It enhances the quality of an image by eliminating undesired information so that the image can be used in later stages. Raw images contain noise, and if this is not appropriately addressed, the classification may contain several errors. In skin lesion images, for example, pre-processing is necessary because of the poor contrast between the lesion and the surrounding healthy skin, the uneven border, and skin artifacts such as hairs, black frames, and skin lines.
Noise can be reduced with filters such as the adaptive Wiener, adaptive median, or adaptive mean filter. Artifacts matter because, for instance, hairs covering a lesion in an image might lead to misdiagnosis. Pre-processing operations such as vignetting effect removal, contrast adjustment, color correction, hair removal, image smoothing, normalization, and localization are intended to reduce or eliminate image noise. Accuracy increases with the correct combination of pre-processing tasks. Black frame removal, automated color equalization, hair removal with Dull Razor, the Karhunen-Loeve transform [38], non-skin masking, Gaussian filtering, pseudo-random filtering, color space conversion, and contrast enhancement are a few of the pre-processing techniques.
In the case of brain cancer imaging, the magnetic resonance imaging (MRI) images are first converted to grayscale to adjust the contrast and are then subjected to a smoothing procedure [39]. Skull stripping with the Brain Extraction Tool (BET), which separates brain tissue from the skull and other non-brain regions, is another procedure used on brain MRI scans [40]. Computed tomography (CT) images acquired with X-ray equipment for the diagnosis of lung cancer are pre-processed by first converting them to grayscale, then normalizing them, and finally reducing noise. These images are then transformed into binary images, from which the undesired content is subsequently eliminated [41].
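The sequence described for CT images (grayscale conversion, normalization, binarization) can be sketched as follows. This is a minimal illustration with plain Python lists standing in for image arrays; the luma weights are the common ITU-R BT.601 coefficients, while the min-max normalization, the fixed threshold of 0.5, and the function names are assumptions rather than the procedure of [41]. The noise-reduction step is omitted for brevity.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]

def normalize(gray):
    """Min-max normalize intensities to the range [0, 1]."""
    flat = [v for row in gray for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for constant images
    return [[(v - lo) / span for v in row] for row in gray]

def binarize(norm, threshold=0.5):
    """Turn a normalized image into a binary mask with a fixed (assumed) threshold."""
    return [[1 if v >= threshold else 0 for v in row] for row in norm]

# Toy 2 x 2 image with 8-bit channel values.
rgb = [
    [(0, 0, 0), (255, 255, 255)],
    [(200, 200, 200), (50, 50, 50)],
]
binary = binarize(normalize(to_grayscale(rgb)))
```

In practice the threshold would usually be chosen from the data rather than fixed in advance.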
Pre-processing in breast cancer imaging, in particular, entails separating tumors from the surrounding tissue, removing the breast border, and suppressing the pectoral muscle. Mammogram images used to diagnose breast cancer contain various kinds of noise, including tape artifacts, low-intensity labels, and high-intensity rectangular labels. As a result, pre-processing is used for mammogram label removal, orientation, and segmentation [42].
Transrectal Ultrasound (TRUS) images, which have poor resolution and intrinsic noise, are used to diagnose prostate cancer. The tree-structured wavelet transform (TSWT), directional wavelet transform (DWT), and tree-structured non-linear filtering (TSF) are all parts of the pre-processing module used to reduce noise and artifacts [43], [44]. Additionally, various image enhancement techniques based on color and shape alterations can be incorporated, such as:
• Flipping: the image is flipped horizontally or vertically.
• Brightness processing: an upper limit on the variation coefficient is specified, and each time a value is chosen at random from the resulting range to adjust the brightness.
• Saturation processing: the range of the variation coefficient is set to a suitable value and the image is converted to the HSV (hue, saturation, value) color space; each time, the saturation channel is multiplied by a random value drawn from the coefficient range.
• RGB color processing: with a fixed range for the variation coefficient, values for the three R, G, and B components are chosen at random.
Each time, one or more of these four processing techniques is randomly chosen to increase the variety of the image data [45].
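The four enhancement operations described above can be sketched with the Python standard library alone. The image is represented as rows of (r, g, b) tuples with values in [0, 1]; the coefficient ranges, the additive form of the brightness and RGB changes, and all function names are assumptions chosen for illustration and are not taken from [45].

```python
import colorsys
import random

def _clip(v):
    """Keep a channel value inside [0, 1]."""
    return max(0.0, min(1.0, v))

def flip_horizontal(img):
    return [row[::-1] for row in img]

def flip_vertical(img):
    return img[::-1]

def adjust_brightness(img, rng, max_delta=0.2):
    """Add one random offset (assumed range) to every channel."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[tuple(_clip(c + delta) for c in px) for px in row] for row in img]

def adjust_saturation(img, rng, low=0.5, high=1.5):
    """Multiply the S channel in HSV space by a random factor (assumed range)."""
    factor = rng.uniform(low, high)
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_row.append(colorsys.hsv_to_rgb(h, _clip(s * factor), v))
        out.append(new_row)
    return out

def shift_rgb(img, rng, max_shift=0.1):
    """Add an independent random offset to each of the R, G and B channels."""
    shifts = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    return [[tuple(_clip(c + d) for c, d in zip(px, shifts)) for px in row] for row in img]

def augment(img, rng):
    """Apply a random non-empty subset of the four operations."""
    ops = [
        lambda im: rng.choice([flip_horizontal, flip_vertical])(im),
        lambda im: adjust_brightness(im, rng),
        lambda im: adjust_saturation(im, rng),
        lambda im: shift_rgb(im, rng),
    ]
    for op in rng.sample(ops, k=rng.randint(1, len(ops))):
        img = op(img)
    return img

rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[(0.8, 0.2, 0.2), (0.1, 0.5, 0.9)],
         [(0.3, 0.3, 0.3), (0.9, 0.9, 0.1)]]
augmented = augment(image, rng)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing training runs.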

B. IMAGE SEGMENTATION
Segmentation is the division of the input image into regions from which the data required for further processing can be retrieved. In essence, segmentation separates the ROI from the background of the image, where the ROI is the area of the image we wish to use. For images of malignancies, the lesion region is required in order to isolate the characteristics of the diseased area. There are four primary categories of segmentation: model-based, region-based, threshold-based, and pixel-based segmentation. Otsu's method, local and global thresholding, histogram-based thresholding, and maximum entropy are all examples of threshold-based segmentation.
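As an example of threshold-based segmentation, the sketch below implements Otsu's method, which chooses the gray level that maximizes the between-class variance of the two resulting pixel groups. It operates on a flat list of integer intensities; the function name and the toy pixel values are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t form the background class, the rest the foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # every pixel is already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Toy data: a dark background cluster and a bright object cluster.
pixels = [10, 12, 11, 200, 198, 205]
t = otsu_threshold(pixels)
mask = [1 if p > t else 0 for p in pixels]
```

Because the criterion depends only on the histogram, the method is fast, but it assumes a roughly bimodal intensity distribution.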
Techniques in the category of pixel-based segmentation include the Markov field method, artificial neural networks, and fuzzy c-means clustering. Model-based segmentation uses deformable parametric models, such as level sets. There are numerous additional techniques for segmenting images, including gradient vector flow, distributed and localized region identification, adaptive thresholding, histogram thresholding, statistical region growing, clustering [46], fuzzy c-means clustering, bootstrap learning, supervised learning, edge detection, active contours, probabilistic modeling, sparse coding [47], and contextual hypergraphs [48]. To increase a system's accuracy, hybrid models that combine two or more of these techniques have also been used [44].
The accurate identification of cells and their structural components is the initial step in cytology diagnosis. Since abnormalities in the morphology of the nucleus and cytoplasm are the features most often used as indications for diagnosing CC, precise segmentation is a crucial requirement for screening systems. Historically, the segmentation problem was first solved only for transparent, free-lying cells. Pipelines generally consisted of three primary phases: background extraction, cell localization, and cell boundary determination. Most methods relied on simple image histogram thresholding followed by noise-reducing mean filters. The information used to create the histograms includes optical density [49], gradient and compactness [50], and gray-level brightness or energy [51]. The most challenging intrinsic problem was determining the ideal threshold [52]. Direct application to Pap smears was not feasible, since these methods performed very well on the images studied but failed on more complicated cases. Most of this section's attention will be on a few strategies that were the subject of further study [53].

C. FEATURE EXTRACTION
Feature selection and feature extraction are different methods for obtaining a subset of features. In feature selection, a subset of the initial collection of features is chosen, whereas feature extraction derives characteristics that could have discriminatory value [54]. The basic goal of feature selection is to select a subset of the input variables while preserving or improving classification accuracy by excluding characteristics with little to no predictive information. Arif et al. classified feature relevance as strong or weak. A feature is said to be strongly relevant if it cannot be eliminated from the feature set without reducing classification accuracy. According to Arif, a weakly relevant feature can occasionally still improve classification accuracy [55]. More information on feature selection and feature extraction is provided below.

1) FEATURE EXTRACTION AND FEATURE SELECTION
Feature selection is essential to the entire process because a classifier cannot recognize patterns from poorly selected features. Some existing works report criteria for choosing the right features. For example, Lippman mentioned that features need to carry the information required to differentiate classes [54].
In a character recognition system, feature extraction starts after the pre-processing phase. The main objective of pattern recognition is to take an input sample and correctly assign it to one of the possible output labels. This process includes two general stages, namely feature selection and classification. Feature extraction is an essential step in constructing any pattern classifier, and its purpose is to extract the relevant information that characterizes each class. The resulting feature vectors are then used by classifiers to match the input unit with the desired output unit. It becomes easier for the classifier to separate different classes by looking at these features. In other words, feature extraction retrieves the most critical information from the raw data. This process identifies the set of variables that precisely and uniquely determines the appearance of a character or object. The feature vectors extracted in this process serve as the identities of the associated objects or characters [56].
The primary goal of feature extraction is to construct a similar feature set for several instances of the same symbol and to extract a collection of features that maximizes the recognition rate with the minimum number of components [57]. Current solutions suggest several feature extraction techniques. Contour profiles, geometric moment invariants, graph descriptions, zoning, Zernike moments, projection histograms, spline curve approximation, Gabor features, gradient features, and Fourier descriptors are a few of the frequently used techniques [58], [59].
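Two of the techniques listed above, projection histograms and zoning, are simple enough to sketch directly. Both turn a binary image into a short feature vector. The function names, the 2 x 2 zone grid, and the toy image are assumptions for illustration, and the zoning sketch assumes that the image dimensions are divisible by the number of zones.

```python
def projection_histograms(binary):
    """Count foreground pixels along each row and each column."""
    rows = [sum(row) for row in binary]
    cols = [sum(col) for col in zip(*binary)]
    return rows, cols

def zoning_features(binary, zones_y=2, zones_x=2):
    """Split the image into a grid of zones and return the foreground density of each.

    Assumes the image height and width are divisible by zones_y and zones_x.
    """
    h, w = len(binary), len(binary[0])
    zh, zw = h // zones_y, w // zones_x
    feats = []
    for zy in range(zones_y):
        for zx in range(zones_x):
            block = [binary[y][x]
                     for y in range(zy * zh, (zy + 1) * zh)
                     for x in range(zx * zw, (zx + 1) * zw)]
            feats.append(sum(block) / len(block))
    return feats

# Toy 4 x 4 binary image (1 = foreground).
shape = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
rows, cols = projection_histograms(shape)
zones = zoning_features(shape)
```

The 16 pixels are reduced to 8 projection values or 4 zone densities, which illustrates the aim of maximizing recognition with a minimum number of components.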

2) IMPORTANCE OF FEATURE EXTRACTION IN IMAGE PROCESSING
After pre-processing is complete and the appropriate level of segmentation (symbol, word, line, character) has been achieved, several feature extraction methods are used to extract features from the segments. This is done before classification and post-processing algorithms are applied. Focusing on the feature extraction stage is crucial, since it directly affects the effectiveness of the recognition system. To achieve excellent recognition performance, feature selection is the most crucial component of the feature extraction method.
Feature extraction has been defined as extracting from the raw data the information that is most suitable for classification purposes, while minimizing the within-class pattern variability and enhancing the between-class pattern variability. Therefore, it is important to choose a feature extraction method with the utmost care, in accordance with the input to be used. With all of this taken into account, it becomes crucial to evaluate the numerous feature extraction methods available in a given domain, which cover a large variety of conceivable circumstances [60].
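The goal of low within-class and high between-class variability can be quantified for a single feature with a Fisher-style score: the squared difference of the class means divided by the sum of the class variances. The sketch below is one common way to express that idea, not a method taken from [60]; the function name and the sample values are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(class_a, class_b):
    """Ratio of between-class separation to within-class spread for one feature.

    Higher values mean the feature separates the two classes better.
    Assumes at least one class has non-zero variance.
    """
    between = (mean(class_a) - mean(class_b)) ** 2
    within = pvariance(class_a) + pvariance(class_b)
    return between / within

# Hypothetical feature values for two classes.
good = fisher_score([1.0, 1.2, 0.8], [3.0, 3.2, 2.8])   # tight, well-separated clusters
poor = fisher_score([1.0, 3.0, 2.0], [1.5, 2.5, 3.5])   # overlapping clusters
```

Ranking candidate features by such a score is a simple filter-style form of feature selection.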
Image classification uses the local features of an image to differentiate between images. These features are categorized according to several essential kinds of image data, including color intensity, the boundaries of the objects in the image, and texture [58]. An effective feature extraction approach greatly improves subsequent image processing. These features can be employed in image matching, pattern recognition, and retrieval. To attain a high level of accuracy, these applications need concise and pertinent information, whereas an input image contains a lot of redundant, complicated information. Reducing this data to a compact collection of features (or feature vectors) is a technique known as feature selection [61].
Image analysis is the process of extracting and analyzing features from images for use in other applications. It is distinct from other image processing techniques, including enhancement, coding, and restoration. Techniques for detection, segmentation, extraction, and classification are used in image analysis [62]. Feature extraction is used to extract features from a large set of visual data while retaining as much information as feasible. Selecting and extracting features efficiently and effectively remains quite difficult.

D. DEEP LEARNING
Medical imaging is one of the most useful sources of diagnostic data, but it depends on human interpretation and faces growing resource problems. Particularly in low-income and developing countries, the demand for diagnostic images is outpacing the capacity of the available professionals [13]. This issue could be addressed by automated diagnosis from medical imaging using AI, particularly DL [14], [15]. Reports of DL models performing diagnostic tasks as well as or better than humans have caused a lot of excitement, but this enthusiasm should not take the place of serious evaluation. Concerns raised in this field include whether some studies are biased in favor of the new technology, whether the findings are generalizable, whether the study was conducted in silico or in a clinical environment, and, consequently, to what extent the study results apply to real-world settings.
The US Food and Drug Administration has now given its approval to more than 30 AI algorithms [16]. In anticipation of AI diagnostic tools being applied in clinical practice, it is appropriate to comprehensively assess the body of evidence that supports AI-based diagnosis. The comprehensive study in [63] aimed to evaluate the current state of diagnostic performance of DL algorithms for medical imaging in comparison with health care professionals, considering issues of study design, reporting, and clinical value in the real world. Its authors also carried out a meta-analysis to evaluate the diagnostic accuracy of DL algorithms in comparison with health care professionals [63].

E. MEDICAL APPLICATIONS OF DEEP LEARNING
DL has several uses, including support for medical diagnostics. This encompasses, but is not limited to, biomedicine, magnetic resonance image analysis, and health informatics [63]. Classification, diagnosis, prediction, segmentation, and identification of various anatomical regions of interest (ROIs) are additional specialized applications of DL in medicine. DL excels in comparison with typical ML since it has several hidden layers and can learn from the original data, which enables it to learn concepts directly from its inputs [64].
DL models have shown significant capability in identifying and classifying objects with high speed and accuracy. DL has brought valuable improvements to major processing tasks that are widely used in medical image processing, including segmentation, classification, diagnosis, and registration. The multilayer neural network mechanism of DL can learn more abstract features in images, and because of this ability it is expected to overcome the problems of traditional medical CAD systems. Automated systems have the potential to enable less experienced technicians, physicians, and other health care staff to prepare assessments equivalent to those of specialists [5]. The purpose of these systems is to provide a diagnosis with acceptable accuracy, in the shortest possible time, and with the least human intervention [24]. Furthermore, DL uses several intermediate layers located between the input and output layers, which help to discover complex structures in large data sets [65]. DL approaches are used in vital cases that can save human lives, such as the detection of cancer and the diagnosis of other diseases.
The field of medical image processing as a whole has been heavily influenced by DL techniques, and an increasing number of studies have applied them to traditional tasks such as enhancement, image classification, image generation, detection, segmentation, and registration. Recently, Litjens et al. [66] and Shen et al. [67] published very thorough analyses of this topic; both give a comprehensive picture of the prevalence and applicability of DL techniques in the community. A summary of recent research on some of these classic tasks follows.
Medical image classification aims to help detect anomalies in images obtained during medical exams. Numerous neural network architectures have been applied for this purpose over the past few years, including Restricted Boltzmann Machines applied to lung computed tomography (CT) analysis, combining generative and discriminative learning techniques (van Tulder & de Bruijne), and stacked auto-encoders used to identify mild cognitive impairment and Alzheimer's disease by exploiting the latent non-linear, intricate relationships among different features [68], [69]. In recent years, convolutional neural networks have increasingly been used for classification and detection tasks [70], [71], [72], [73], [74].
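The basic operation behind the convolutional neural networks mentioned above is a small kernel sliding over the image, followed by a non-linearity. The sketch below shows a single "valid" 2D convolution (implemented as cross-correlation, as is common in DL frameworks) and a ReLU activation in plain Python. The kernel values and toy image are assumptions for illustration; in a trained CNN the kernel weights are learned from data rather than fixed by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]

def relu(feature_map):
    """Set negative responses to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

# Toy image with a vertical dark-to-bright edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d_valid(image, vertical_edge))
```

The response is non-zero only where the kernel straddles the edge, which shows how a convolutional layer localizes simple patterns that deeper layers then combine into more abstract features.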

V. CERVICAL CANCER DETECTION APPROACHES
The diagnosis is based on the histological evaluation of a cervical biopsy. Cervical cytology, a pelvic exam, and visual inspection of the vaginal mucosa and cervix are all necessary for women who have symptoms of CC. With a speculum in place, the cervix and vaginal mucosa should be visible. When the illness is micro-invasive or located in the endocervical canal, the cervix may seem normal. Lymphatic vessels allow CC to spread to the pelvic, para-aortic, mediastinal, inguinal, and supraclavicular lymph nodes. In advanced illness, enlarged, indurated inguinal and supraclavicular lymph nodes may be visible [4]. In women whose infection suggests invasion but who have no obvious lesions, colposcopy and biopsy should be performed. If cancer is suspected clinically or by cervical cytology but is not proven through histopathologic analysis of cervical samples, a cone biopsy is required [75].
Persistent triggering of the immune system due to HPV infection leads to deregulated cell division of the cervical epithelium, mainly of squamous cells. Consequently, a precancerous condition develops that is termed cervical intraepithelial neoplasia (CIN), formerly known as dysplasia. Based on the severity of the pathophysiology and the extent of damage to the squamous epithelial layer, CIN lesions are classified into CIN1, CIN2, and CIN3, as shown in Figure 7. Moreover, the World Health Organization (WHO) classifies precancerous cervical lesions as follows [5]. CIN1 is categorized as low-grade CIN, indicating a mild state of dysplasia in which only one-third of the epithelium displays a dysplastic signature. When a precancerous lesion progresses to two-thirds of the epithelium due to uncontrolled cell division, the condition is referred to as moderate dysplasia or CIN2. When dysplasia affects more than two-thirds of the epithelium, with a massive reduction in its thickness, the precancerous lesion is categorized as CIN3. Collectively, CIN2 and CIN3 are ranked as high-grade CIN, as illustrated in Figure 7 [76].
According to the WHO, there are four common ways to diagnose CC [77]: conventional cytology (Pap smear), liquid-based cytology (LBC), the human papillomavirus deoxyribonucleic acid (HPV DNA) test, and visual inspection with acetic acid (VIA) using a manual or digital colposcope. The following section reviews the DL algorithms used in solutions for detecting CC.
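As an illustration, the grading rule described above can be written as a small function. This is a minimal sketch and not a clinical tool: the function names, the treatment of the exact one-third and two-thirds boundaries as inclusive, and the "normal" label for an unaffected epithelium are assumptions made for the example, not part of the WHO classification as quoted here.

```python
def cin_grade(dysplastic_fraction: float) -> str:
    """Map the fraction of epithelium showing dysplasia to a CIN grade.

    Thresholds follow the one-third / two-thirds rule in the text.
    Boundary handling (inclusive upper limits) is an assumption.
    """
    if not 0.0 <= dysplastic_fraction <= 1.0:
        raise ValueError("dysplastic_fraction must lie in [0, 1]")
    if dysplastic_fraction == 0.0:
        return "normal"  # hypothetical label for no dysplasia
    if dysplastic_fraction <= 1 / 3:
        return "CIN1"    # mild dysplasia, low-grade CIN
    if dysplastic_fraction <= 2 / 3:
        return "CIN2"    # moderate dysplasia
    return "CIN3"        # more than two-thirds of the epithelium


def is_high_grade(grade: str) -> bool:
    """CIN2 and CIN3 are collectively ranked as high-grade CIN."""
    return grade in {"CIN2", "CIN3"}


grades = [cin_grade(f) for f in (0.2, 0.5, 0.9)]
high_grade_flags = [is_high_grade(g) for g in grades]
```

Many of the classifiers reviewed later use exactly this low-grade versus high-grade split (for example, CIN2 or worse as the positive class) as their binary target.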

VI. DEEP LEARNING IN CERVICAL CANCER DETECTION
According to the data gathered, convolutional neural networks (CNNs) are the most used DL technique. In addition, MRI was the most common type of training data, and segmentation was the most prevalent application. A wide range of data types can be used to train and apply deep neural networks; for expert-level assessment, MRI, fundus photography, computed tomography (CT) scans, and other imaging data can be used [64].
At first, these CAD systems required a significant amount of manual work to select the key attributes of the images and interpret them. Today, however, CNN-based algorithms are taking off. They are able to learn and extract features and to represent extremely complex non-linear functions without human intervention, based solely on input data and a supervised training process. Numerous studies and publications report that, in many tasks and applications, DL algorithms have demonstrated performance equal to or better than that of a human. A well-known example was published in 2015, when a model achieved human-level performance in image classification on the ImageNet Large-Scale Visual Recognition Challenge. Since then, DL-based models and algorithms in the field of computer vision have advanced in a variety of tasks beyond image classification, including object identification and segmentation [79].

A. EXISTING DEEP LEARNING-BASED CERVICAL CANCER DETECTION SOLUTIONS
In this section, we discuss in detail the existing DL methods that are the center of interest in the scientific community. Publications from 2018 to 2022 have employed various DL solutions for Pap smear, LBC, VIA, and colposcopy.

B. EXISTING SOLUTIONS IN THE PAP SMEAR TEST AND LBC
Different DL techniques have previously been employed for the detection of CC. Table 2 compiles the accuracy of DL methods applied to Pap smear and LBC images over the last five years.

C. EXISTING SOLUTIONS FOR HPV
Recently, Yasunari et al. (2020) applied DL to HPV-associated CC. They used a CNN to classify images of cervical SILs combined with HPV data in order to predict the underlying pathology, reaching a specificity of 84.4% and a sensitivity of 83.3% [22], as shown in Table 2.

D. EXISTING SOLUTIONS USING COLPOSCOPY
As CC is categorized as the fourth most common cause of cancer-related deaths, researchers around the globe apply DL algorithms in various solutions to diagnose CC properly. In line with this, Alyafeai and Ghouti developed a fully automated pipeline for the detection and classification of CC using cervigrams. The proposed pipeline is composed of two DL models. The first detects the cervix region 1000 times faster, while the second classifies cervical tumors via self-extracted features. These features were further analyzed by two simple CNN-based models. Two cervigram datasets were used to train and evaluate the major components of the DL pipeline. The authors reported that the proposed DL classifier achieved an area under the curve (AUC) score of 0.82 while classifying cervigrams 20 times faster. Moreover, the method is suitable for mobile phone applications, which could enhance detection efficacy. However, the pipeline does not address the perceptual quality of the cervigrams, which would be needed for better and more accurate labeling of the cervical ROI [5].
In another study, Miyagi et al. [22] explored DL as an AI approach for the classification of CC (SIL) using colposcopy images. Out of 330 patients who underwent colposcopy and biopsy by oncologists, 97 and 213 were diagnosed with low-grade and high-grade SIL, respectively. An AI classifier based on an 11-layer CNN was trained. For the diagnosis of high-grade SIL, the accuracy, sensitivity, and specificity of the AI classifier and of human-based diagnosis were 0.823 and 0.797, 0.800 and 0.831, and 0.882 and 0.773, respectively. The area under the receiver operating characteristic curve (AUC) was 0.826 ± 0.052. These outcomes suggest better performance of the AI classifier than of human-based diagnosis; however, they do not support strong inference. Moreover, the classifier in this framework still needs to be validated [22].
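Because accuracy, sensitivity, and specificity recur throughout the studies in this section, the following minimal sketch shows how they are derived from the four confusion-matrix counts of a binary classifier. The function name and the counts in the usage lines are hypothetical and are not taken from any reviewed study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fn: diseased cases classified as positive/negative.
    tn/fp: healthy cases classified as negative/positive.
    """
    total = tp + fp + tn + fn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one healthy case")
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / total,   # overall fraction correct
        "prevalence": (tp + fn) / total, # fraction of diseased cases
    }


# Hypothetical counts, for illustration only.
m = screening_metrics(tp=80, fp=5, tn=45, fn=20)

# Accuracy is the prevalence-weighted mean of sensitivity and specificity,
# so it always lies between the two.
weighted = (m["prevalence"] * m["sensitivity"]
            + (1 - m["prevalence"]) * m["specificity"])
```

The identity in the last lines is a useful sanity check when reading reported results: an accuracy outside the range spanned by sensitivity and specificity indicates mislabeled or inconsistent figures.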
Likewise, Zhang et al. [24] classified precancerous cervical lesions with a computer-aided diagnosis method based on a pre-trained densely connected CNN (DenseNet). The approach was applied to identify cervical lesions of grade CIN2 or higher. After the image data (4337 negative and 3902 positive images) were pre-processed with ROI extraction and data augmentation, a DenseNet CNN pre-trained on two datasets, ImageNet and Intel & Mobile ODT, was fine-tuned by updating the parameters of all layers. The authors investigated the influence of various training strategies on model performance, such as different training set sizes, training from scratch with random initialization (RI), K-fold cross-validation, and fine-tuning the pre-trained model. The model reached an accuracy of 73.08% with an AUC of 0.75 on 600 test images. Nevertheless, data augmentation and the CNN algorithm need further improvement to develop a better diagnostic framework that can analyze new data for precancerous cervical lesions [24].
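To make the K-fold cross-validation strategy mentioned above concrete, the sketch below generates the train and validation index splits using only the Python standard library. It is a generic illustration, not the procedure of Zhang et al.: the function name, the shuffling seed, and the sample counts are assumptions.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Every sample appears in exactly one validation fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k near-equal folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# Hypothetical usage: 10 images, 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
```

In practice, a model would be trained on each `train` subset and evaluated on the matching `val` subset, and the k scores would be averaged. For imbalanced medical datasets, a stratified variant that preserves the class ratio in every fold is usually preferable.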
Bai and his colleagues proposed a CNN-based cervical lesion detection net (CLDNet) model to extract deep features from colposcopy images. More specifically, they used a Squeeze-Excitation (SE) CNN to recalibrate the extracted image features. Furthermore, they generated proposal boxes via a region proposal network (RPN) to highlight the ROI. A total of 6536 colposcopic images were selected, of which 5095 were used as training data, comprising 2567 negative and 2528 positive cervical images. The average precision of the extracted lesion region was approximately 92.53%, with an average recall rate of 85.56%, which can support auxiliary diagnosis [92].
Recently, an innovative fuzzy reasoning model was put into practice to classify cervical images after an acetic acid test in order to reduce the risk burden of CIN. Liu et al. [25] used an automated image segmentation algorithm to extract useful information from the acetowhite region before and after the acetic acid test in 505 patients (122 CIN positive and 383 CIN negative). The grayscale change and the texture complexity and coarseness of the post-test image were analyzed. For these three parameters, the sensitivities were 80.8%, 80.9%, and 82.8% and the specificities were 82.0%, 87.4%, and 86.2%, respectively. Overall sensitivity and specificity were improved significantly by fuzzy reasoning, although the solution was unable to distinguish between low-grade and high-grade SIL. In addition, the sample size was small, which limits the number of features available for analysis [25].
In another study, Chen and his colleagues applied a CAD system to aid the diagnosis of cervical diseases such as HPV, CIN, and CER using uterine cervix images. Briefly, they segmented the ROI with a proposed random forest (RF) segmentation algorithm from three types of images (natural, acetic acid, and Lugol's iodine test). The ROI was then characterized in seven color spaces, and the Boruta algorithm was used to select the features on which the cervical diseases were classified. The final diagnosis of the three cervical diseases reached 83.1% accuracy; however, the non-uniform distribution of the data and the limited population size were the main limitations of the study [26].
Additionally, Hu and co-workers [27] conducted an observational study of a DL algorithm on a longitudinal cohort of 9406 females in Costa Rica who underwent multiple cervical screening protocols as well as histopathological confirmation of pre-cancer or cancer. The screening methods included cervicography, HPV testing, and cytology. Faster R-CNN was then applied to the cervigrams for detection, feature extraction, and classification. Automated visual evaluation of the cervigrams identified pre-cancer and cancer cases with higher accuracy (AUC = 0.91) than human interpretation of the same cervigrams. Nevertheless, the study was conducted on a limited number of cases, and the case group included CIN2 cases rather than being restricted to more certain pre-cancers such as CIN3 and AIS. Moreover, the images were captured with a discontinued film camera technique rather than a digital camera, and they were taken by a small number of highly trained nurses [27].
Shrivastav and his colleagues retrieved colposcopic images of CC from anonymized patients undergoing routine checkups for cervical injuries at the outpatient department of ''Batra Hospital and Medical Research Centre,'' New Delhi. The retrieved images were pre-processed and then analyzed with a segmentation algorithm, the Earth mover's distance implemented in R (programming language), and software capable of automatically quantifying, identifying, and classifying morphological features, sensitivity, and color intensity to accelerate diagnosis. The study showed high validity when anonymous colposcopic images (endocervix, ectocervix, and endo-ectocervix) from image repositories powered by Mobile ODT were compared and converted into quantitative values by the algorithm [93].
Recently, Guo et al. [10] used a combination of DL networks, namely transfer learning models (an Inception feature extractor with a support vector machine (SVM), VGG), fine-tuned DL models (Inception, VGG), and RetinaNet, to evaluate the sharpness of HPV-associated cervical images, because poor sharpness hampers accurate diagnosis of cervical lesions. They obtained 4525 de-identified images from 1399 females via Mobile ODT's EVA system and categorized them as 'Not sharp' or 'Sharp'. RetinaNet achieved the highest overall performance among the investigated models, with 98% sensitivity, 85% specificity, and 94% accuracy [10].
Another group of researchers evaluated the use of smartphones for detecting cervical lesions in subjects with atypical cervical cytology. Specialized doctors inspected the cervix of seventy-five females with abnormal cervical cytology via smartphone or colposcopy. Afterwards, the diagnostic potential of smartphones for CIN1 and CIN2 was investigated, and the kappa value was computed to measure the chance-corrected agreement between the histologic diagnoses based on smartphone and colposcopic findings. The investigation showed substantial agreement between the histologic diagnoses based on the smartphone and colposcopic output, with a kappa value of 0.67. The sensitivity and specificity of the smartphone in diagnosing CIN1 or chronic stage were 0.89 and 0.83, respectively, while for CIN2 or worse they were 0.92 and 0.24, respectively. Smartphones thus displayed high sensitivity and a high positive predictive value (PPV) in CIN1 detection, whilst specificity and negative predictive value (NPV) were low. In addition, PPV and NPV are affected by disease prevalence, so the findings cannot be generalized to other populations [28].
Elayaraja and Suganthi proposed a new method for diagnosing cervical tumors from cervigram images. The cervical images were acquired from the Guanacaste dataset (2005) and pre-processed with the oriented local histogram technique (OLHT) to enhance the edges, after which the dual-tree complex wavelet transform (DT-CWT) was applied to obtain multi-resolution images. Features such as the gray level co-occurrence matrix (GLCM), local binary pattern (LBP), moment invariants, and wavelet features were extracted from the processed images. These features were then used to train and test a feed-forward back-propagation neural network that categorizes cervical images as benign or malignant by comparison with the trained features. The accuracy, sensitivity, and specificity were 98.29%, 97.42%, and 99.36%, respectively. Despite these beneficial outcomes, the method cannot be applied to cervical melanoma diagnosis using a Pap smear cervigram [30].
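The prevalence dependence of PPV and NPV noted for the smartphone study follows directly from Bayes' rule. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, and the two prevalence values are arbitrary examples rather than figures from the cited work. Only the sensitivity (0.92) and specificity (0.24) for CIN2 or worse are taken from the text.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    # Expected fractions of the screened population in each outcome.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("PPV or NPV is undefined for these inputs")
    return tp / (tp + fp), tn / (tn + fn)


# Same test characteristics, two hypothetical prevalence settings.
ppv_low, npv_low = predictive_values(0.92, 0.24, prevalence=0.10)
ppv_high, npv_high = predictive_values(0.92, 0.24, prevalence=0.50)
```

With identical sensitivity and specificity, PPV rises and NPV falls as prevalence increases, which is why predictive values measured in a referral population with abnormal cytology do not transfer to general screening populations.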
Similarly, Kudva et al. [11] examined the feasibility of using a shallow CNN to classify cervical images as malignant or non-malignant. Images were obtained from 102 females after an acetic acid (3-5%) test using an Android device. Of these, 42 images were VIA positive (pathologic) and 60 images were VIA negative (healthy subjects). Subsequently, 275 image patches (15 × 15 pixels) were extracted manually from the images of CC patients and 409 patches from those of healthy controls. These patches were classified by a shallow CNN consisting of one convolutional layer, one pooling layer, one rectified linear unit layer, and two fully connected layers. Overall classification accuracy was reported as 100%; however, depending on the percentage of data used for training, the sensitivity and specificity of traditional ML methods ranged from 61.9-71.3% and 69-77.3%, respectively, whereas those of DL ranged between 43.9-100% and 75.6-100%, respectively [11].
Also, Zhang and co-workers employed DL tools to classify CC images to help clinicians reach a better diagnosis. After filtering the dataset, they used 6692 cervical images for experimentation, divided into Type_1 (1184), Type_2 (3549), and Type_3 (1959). In the first phase, a CNN was applied to segment cervical lesions, and in the second phase, a neural network model similar to CapsNet was used to classify them. The accuracy on the training set and test set was 99% and 80.1%, respectively. However, due to a lack of optimization and structural adjustment of the CapsNet-cervical network, overfitting was observed in this model [29].
Jaya and Kumar used an Adaptive Neuro-Fuzzy Inference System (ANFIS) to classify cervical lesions for CC detection. Fifty cervical images were acquired from the Guanacaste dataset and divided into benign (35) and malignant (15) cases. The images were centrally aligned by Fast Fourier Transform (FFT), and GLCM, trinary, and gray level features were extracted. The extracted features were then used to train the ANFIS classifier, which achieved 99.36% accuracy, 97.42% sensitivity, and 99.36% specificity. These results were analyzed using MATLAB R2014b [31].
Moreover, considerable effort has been made to improve the diagnostic performance of colposcopy images. Gutiérrez-Fragoso and Acosta-Mesa [32] investigated three well-known automated classification models: k-Nearest Neighbors (KNN), C4.5, and Naïve Bayes. The study enrolled 200 females with positive Pap smear tests who had been referred for colposcopy. Acetic acid (3%) was applied to the cervical region, and data acquisition was performed using MATLAB (R2009a) with an STC-N63BJ camera. A total of 180 images were captured from the colposcope using a green filter, and ten images were obtained before acetic acid application as reference images. After central alignment of the images, the automatic models classified the colposcopic images with an accuracy of 70%, a specificity of 79%, and a sensitivity of 60%. However, the automatic classification method needs more refinement to avoid false negative and false positive results [32].
Lastly, Xu et al. [94] introduced a novel image dataset with expert-interpreted diagnoses for the assessment of image-based cervical lesion classification procedures. The cervigrams were selected by the US National Cancer Institute. The dataset contained 1112 patients, with 767 negatives (CIN1/normal) and 345 positives (CIN2/3/cancer). After the images were selected from the database, three complementary pyramid features, namely the pyramid histogram in L*A*B* color space (PLAB), the pyramid histogram of Local Binary Patterns (LBP), and the pyramid histogram of Oriented Gradients (PHOG), were extracted and classified using a CNN algorithm. In addition, they compared seven classic ML models, namely RF, AdaBoost, SVM, gradient boosting decision tree (GBDT), logistic regression (LR), MLP, and KNN, which serve as a baseline for future comparison. The accuracy, sensitivity, and specificity were 77.17%, 78.55%, and 75.80%, respectively, for handcrafted and DL features [94]. Table 3 summarizes the latest reported accuracy, sensitivity, and specificity results for the colposcopy method.
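Several of the studies above rely on local binary pattern (LBP) features. For readers unfamiliar with the descriptor, the following minimal sketch computes the basic 3 × 3 LBP code of each interior pixel and the resulting 256-bin histogram. It is a plain-Python illustration only: the function names, the neighbor ordering, and the "greater than or equal" comparison are assumptions, and the pyramid variant used by Xu et al. (histograms computed over a spatial pyramid of image regions) is not implemented here.

```python
def lbp_codes(image):
    """Return the 8-bit LBP code of every interior pixel.

    `image` is a 2-D list of grayscale values. Each neighbor that is
    greater than or equal to the center pixel sets one bit of the code.
    """
    # Clockwise neighbor offsets starting at the top-left (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(image), len(image[0])
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(image):
    """Count how often each of the 256 possible LBP codes occurs."""
    hist = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            hist[code] += 1
    return hist


# Tiny hypothetical patches: a flat region and an isolated bright spot.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
spot = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
flat_code = lbp_codes(flat)[0][0]
spot_code = lbp_codes(spot)[0][0]
```

The histogram, rather than the raw code image, is what is typically passed to a classifier, since it summarizes local texture independently of where in the region each pattern occurs.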

VII. RESEARCH OPPORTUNITIES AND FUTURE DIRECTION
Alyafeai and Ghouti pointed out that, in future work, decreasing the impact of specular reflection can improve the perceptual quality of cervical images and enable better manual labeling of the cervical ROI [5]. In another study, Zhang et al. suggested that the data should be improved by increasing its volume and that a diagnostic framework for precancerous lesions should then be developed by applying the CNN algorithm to new data [24]. Liu and Peng mentioned increasing the current sample size in later work [25]. According to the authors of [26], the more risk indicators are gathered and incorporated, the greater the impact on diagnosing the disease would be. Additionally, they aimed to make their methods more efficient in terms of computing cost [26]. According to L. Hu et al., the main restrictions of their work were as follows. They only looked at a limited sample of cases from one cohort study. Although it would be preferable to train on more certain pre-cancer cases (AIS and CIN3 caused by the HPV types that prevail in invasive CC), they included CIN2 patients in the case group. A small group of skilled nurses provided the images. Finally, instead of using contemporary digital image technology, they used images taken with a discontinued film camera method [27].
According to Guo et al., data gathering must adhere to a clear and comprehensive standard in order to acquire images of higher quality. As future work, they intended to continue refining the algorithm in terms of classification accuracy and training duration [10]. Another group of authors proposed optimizing and adjusting the CapsNet-cervix network model. Additionally, they want to significantly reduce the overfitting issue and plan to collect additional training data from other sources [29]. Ghoneim et al. used different databases to assess their suggested system. Their system may be improved further by adding some custom elements. They also stated that such architectural designs should be explored for CC detection systems [14]. According to Harangi, the collaboration outlined in their paper could be further enhanced by employing a consensus technique to establish a more precise ground truth for training [15]. Abdullah and colleagues considered that the lack of images reduced the accuracy of the created template, because the composition, brightness, and intensity of each image vary. More images are required for simulation and analysis in order to achieve stable models in the new system. They indicated that classification might be done in the future with great accuracy and precision by first calculating the size of the nucleus [21].
Despite the differences between tasks, previous works share the same limitations and gaps with respect to the available datasets. The number of images is relatively small, and the annotations differ from one dataset to another. Some images have no annotations at all, and not all stages of cancer (normal, stage 1, stage 2, stage 3, and stage 4) have been studied and covered.

VIII. CONCLUSION
This paper reviewed existing research on DL-based solutions that use image segmentation and classification techniques to analyze and classify cervical screening images. The main components of DL techniques and the important methods were discussed, together with the segmentation of cervical cytology and colposcopy images and the common DL algorithms. The review revealed the importance of DL techniques for processing and classifying CC cytopathology and colposcopy images. CNNs are considered to have achieved outstanding performance, and combined algorithms have been used to enhance classification performance. Segmentation and classification support the early detection, diagnosis, and treatment of cervical cancer. However, there is still room for improvement. Reviewing the existing works shows that the most widely used models in this discipline for feature extraction and classification are ANFIS, CapsNet, ResNet, VGGNet, and AlexNet. In the future, mixed feature selection combined with DL algorithms such as RCNN, Faster RCNN, and VGG19 can be studied to advance CC classification.