Machine Learning Based Diagnostic Paradigm in Viral and Non-Viral Hepatocellular Carcinoma

Viral and non-viral hepatocellular carcinoma (HCC) is becoming predominant in developing countries. A major issue linked to HCC-related mortality rate is the late diagnosis of cancer development. Although traditional approaches to diagnosing HCC have become gold-standard, there remain several limitations due to which the confirmation of cancer progression takes a longer period. The recent emergence of artificial intelligence tools with the capacity to analyze biomedical datasets is assisting traditional diagnostic approaches for early diagnosis with certainty. Here we present a review of traditional HCC diagnostic approaches versus the use of artificial intelligence (Machine Learning and Deep Learning) for HCC diagnosis. The overview of the cancer-related databases along with the use of AI in histopathology, radiology, biomarker, and electronic health records (EHRs) based HCC diagnosis is given.


I. INTRODUCTION
Approximately 80% of hepatocellular carcinoma (HCC) cases are caused by chronic viral infections, including hepatitis C virus (HCV) and hepatitis B virus (HBV) [1]. According to its strategy on viral hepatitis prepared in 2016, the World Health Organization (WHO) aims to reduce the incidence of chronic viral hepatitis by 90% and mortality by 65% before 2030. Globally, two million deaths are caused by HCC [2]. HCV and HBV are the major causes of chronic liver disease leading to HCC development, especially in developing countries [3], [4], [5]. Early diagnosis of the liver inflammation that causes fibrosis and cirrhosis is crucial for better treatment outcomes [6]. The current invasive method for liver fibrosis analysis, biopsy, is complicated and presents the challenges of bleeding, severe pain, observer variability, sampling errors, and increased chances of infection [7]. Another method, ultrasound elastography, is emerging, but its measurement of liver tissue stiffness is limited by interfering factors such as tissue inflammation, hepatic vein congestion, obesity, and recent meals, which lead to misinterpretation of the acquired data [8]. Consequently, there is a dire need to develop a less invasive method for diagnosing fibrotic liver.
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
Traditional statistical models based on conventional regression fail to capture non-linear and high-order interactions of predictor variables in large datasets [9]. Multiple logistic regression models have been used to analyze fibrotic liver conditions [10]. However, such approaches cannot link the non-linear interactions between multiple variables that play a significant role in defining the severity of inflammation. Recently, machine learning-assisted methods for analyzing big data have emerged to address the diagnosis, staging, and prognosis of multiple diseases.
A new scoring system based on simple clinical parameters, including blood tests and protein profiles, may help in the analysis of fibrotic liver. Machine learning may play a pivotal role in the clinical analysis of such data because machine learning algorithms can capture complex relationships in large datasets [11]. Growing evidence shows that machine learning approaches improve prediction in the diagnosis of multiple conditions, including breast cancer, non-alcoholic fatty liver disease (NAFLD), and postprandial glycemic responses [12].
For better prognostic analysis of chronic viral hepatitis, the diagnostic capacity of current approaches must be enhanced using advanced tools. Machine learning has recently emerged as a means to better interpret diagnostic data for diseases, as shown in Figure 1.

II. TRADITIONAL APPROACHES IN DIAGNOSIS
Traditional approaches to diagnosing hepatocellular carcinoma mainly include histopathological analysis of tissue biopsies, radiological analysis, cancer-specific biomarker analysis, and electronic health records [13]. These approaches are usually time-consuming and always require the analytical opinion of a histopathologist or physician, which may introduce observation errors or individual bias [14], [15]. Owing to such discrepancies, the diagnosis and prognosis of a cancer can be misjudged, which eventually affects patient management and survival.
Histopathology has played a fundamental role in managing liver disorders such as autoimmune hepatitis and non-alcoholic steatohepatitis [16]. However, noninvasive techniques for HCC diagnosis are limited, which makes histological scrutiny of tumor samples obligatory for masses with uncommon features on imaging or to rule out cholangiocarcinoma, metastasis, or a benign primary liver tumor [17]. These limitations of noninvasive methods make it difficult to detect cancer at an early stage. The basic method for diagnosing liver disease is histopathological analysis, which defines the cancer and its stage or grade [18]. Histopathology has several limitations as well: a significant tissue mass is usually required for the tissue sections and sample preparation [19]. Additionally, the analysis may be affected by personal bias among histopathologists, leading to disagreements [20]. Precise histological characterization of liver tumor samples can therefore prove challenging. To resolve this issue, researchers have recently applied AI techniques to support the diagnosis of liver tumor samples [21]. Recent evidence suggests that, using a large dataset of H&E-stained images from The Cancer Genome Atlas (TCGA), a convolutional neural network (CNN) designed to differentiate cancerous liver tissue from the neighboring healthy liver tissue detected HCC with approximately 90% accuracy [22]. In another study, Kiani et al. developed a CNN-based algorithm that classifies stained tissue images as HCC or cholangiocarcinoma with approximately 80% accuracy [23]. In combination with the observations of histopathologists, these algorithms, with an accuracy of ∼90% on validation, outperformed the individual accuracy of human analysis. Thus, instead of relying on AI alone or pathologists alone, AI should be used to enhance the accuracy of traditional histological diagnosis rather than to replace it. Research has revealed that an incorrect prediction can negatively affect the final diagnosis made by pathologists, which implies that a cautious approach should be taken when using AI models designed exclusively to automate HCC diagnosis [24].
Another method for analyzing liver cancer development is radiology, starting with ultrasound analysis of the liver [25]. Traditionally, a two-dimensional ultrasound image (B-mode ultrasound) of the abdominal cavity is acquired, which has become a clinical guideline for liver tumor detection [26], [119]. Although analyzing a potential tumor with ultrasound imaging has become common practice, the process has well-defined limitations. Mainly, B-mode ultrasonography is only 50% to 60% accurate owing to factors such as the patient's body posture, equipment quality, and operator experience [27]. Because of these limitations, patients are referred to contrast-enhanced computed tomography (CT) or magnetic resonance imaging (MRI) after the initial ultrasound analysis [28]. These radiological imaging techniques assist in confirming the pathognomonic features for HCC detection, yet some limitations remain. MRI and CT imaging often lack the capacity to identify liver nodules at early stages, leading to recommendations for follow-up imaging or liver biopsy [29]. Consequently, radiological analysis causes undue stress for patients by delaying the HCC diagnosis. Fig. 3 shows a schematic representation of commonly used methods for HCC diagnosis.
The recent rise of artificial intelligence in healthcare, which makes use of machine learning and deep learning tools for genomic, proteomic, statistical, and image-based analysis, may play a pivotal role in HCC diagnosis in the future [30], [31], [32], [33]. Implementing AI alongside clinical diagnosis may speed up diagnosis at the earliest stages of HCC development, with correct identification of liver lesions and cancerous tissue [34], [35].

III. IMPLEMENTATION OF MACHINE LEARNING AND DEEP LEARNING
Machine learning (ML) and deep learning (DL) are subsets of artificial intelligence (AI). ML uses various statistical, probabilistic, and optimization methods that help algorithms learn and apply complex patterns from large, noisy, and complex datasets [36]. This feature of ML and DL is well suited to medical applications, where an acceptable generalization is obtained by searching through an n-dimensional space for a given set of biological data using a variety of ML and DL techniques and algorithms [37]. Every learning process in ML can be divided into two phases: (i) estimation of unknown dependencies from the given data and (ii) use of the estimated dependencies to predict new outputs of the system [38]. Further, ML methods can be classified into two main types: (i) supervised learning and (ii) unsupervised learning [39]. In supervised ML, the data fed to the algorithm for training is labeled, whereas the training data for unsupervised ML is unlabeled. It is up to the ML model to find and apply the underlying patterns in the data: in supervised learning this leads to classification and regression, and in unsupervised learning to clustering and association [40]. The unprecedented advances in cloud and GPU computing have revolutionized the use of ML in medical applications such as cancer diagnosis and detection [41]. This section focuses on the use of AI in histopathology, radiology, biomarker-based diagnostics, and statistical models for cancer diagnosis. These approaches are discussed step by step along with their benefits, challenges, and future recommendations.
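The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. The example below is not taken from any cited study: the one-feature data, the label names, and the helper functions `fit_centroids`, `predict`, and `two_means` are hypothetical. A nearest-centroid rule stands in for supervised classification, and a one-dimensional two-cluster k-means stands in for unsupervised clustering.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Supervised step: compute one centroid (mean) per labelled class."""
    classes = sorted(set(labels))
    return {c: mean(v for v, l in zip(values, labels) if l == c) for c in classes}

def predict(centroids, value):
    """Assign the class whose centroid lies closest to the new value."""
    return min(centroids, key=lambda c: abs(centroids[c] - value))

def two_means(values, iterations=10):
    """Unsupervised step: 1-D k-means with k=2 on unlabelled values.

    Assumes the values contain at least two distinct numbers.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(a), mean(b)
    return sorted(a), sorted(b)

# Hypothetical one-feature data (e.g. a scaled biomarker level); values are invented.
values = [1.0, 1.2, 0.8, 4.9, 5.1, 5.3]
labels = ["non-HCC", "non-HCC", "non-HCC", "HCC", "HCC", "HCC"]

centroids = fit_centroids(values, labels)   # uses the labels
prediction = predict(centroids, 4.5)        # classification of a new sample
clusters = two_means(values)                # ignores the labels entirely
print(prediction, clusters)
```

The supervised model needs the labels to learn the class centroids, whereas the clustering step recovers the same two groups from the values alone; real diagnostic models use far richer features and algorithms.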

A. AI IN HISTOPATHOLOGY
Histopathology refers to the diagnosis and study of diseases of tissues, in which tissues or cells are examined under a microscope (Figure 1). Biopsy tissues are obtained, processed overnight, and visualized using sophisticated microscope-based imaging techniques. In a histopathology laboratory, a single diagnosis typically requires a histopathologist with more than 10 years of experience to correctly identify diseased tissue through the microscope [42]. Thus, despite being widely used, traditional histopathology techniques are time-consuming and demand a high level of responsibility and sufficient expertise. To overcome these limitations, ML and DL methods have recently been used in cancer diagnosis. At first, however, technological advances and the use of ML methods were limited to quality assurance and certain research applications [43]. Unprecedented development in AI and ML methods, together with the rise in computational power, has led to AI tools that can be readily deployed to accurately diagnose cancer, which is a remarkable advancement.
Evaluation of AI models in hepatocellular carcinoma (HCC) typically involves several key metrics that are generally used to assess the performance of machine learning and artificial intelligence systems in medical imaging and diagnosis:
Accuracy: The most straightforward metric, representing the proportion of correct predictions (both true positives and true negatives) out of all predictions made by the model. Accuracy is a good initial indicator but can be misleading in imbalanced datasets where one class is much more prevalent.
Sensitivity (True Positive Rate): The proportion of actual positive cases (patients with HCC) correctly identified by the AI model. This metric is crucial in medical diagnostics to ensure that patients with the disease are not missed.
Specificity (True Negative Rate): The proportion of actual negative cases (healthy individuals or those without HCC) correctly identified. High specificity means the model is good at avoiding false alarms.
Precision (Positive Predictive Value): The proportion of positive identifications that were actually correct. In the context of HCC, it reflects how many of the patients identified by the model as having HCC actually have the disease.
Recall: Another term for sensitivity. It measures the model's ability to detect all relevant instances (all patients with HCC).
F1 Score: The harmonic mean of precision and recall. It is useful when precision and recall must be balanced, particularly with uneven class distributions.
Area Under the Receiver Operating Characteristic Curve (AUC-ROC): A performance measure for classification problems across threshold settings. The ROC is a probability curve, and the AUC represents the degree of separability, that is, how well the model distinguishes between classes.
Area Under the Precision-Recall Curve (AUC-PR): Used particularly when there is a significant imbalance between the two classes.
Confusion Matrix: Not a single metric but a table layout that allows visualization of the performance of an algorithm. It shows the counts of true positives, false positives, true negatives, and false negatives.
Thus, the applications can be extended to detect and count cells and to identify the morphology of cancerous cells for early diagnosis. ML methods can also be used to classify tissue images obtained from microscopy as healthy or cancerous. Such methods, as shown in Table 1 (AI in Histopathology), can be applied to whole images, where the trained algorithms automatically extract patterns based on their previous training.
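The threshold-based metrics defined above can be computed directly from the four counts of a confusion matrix. The following is a minimal sketch; the function name `diagnostic_metrics` and the counts are hypothetical and chosen only to show how accuracy can look favourable on an imbalanced dataset while sensitivity remains modest.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class and each
    prediction type occurs at least once).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 100 cases, of which only 10 have HCC (imbalanced).
m = diagnostic_metrics(tp=6, fp=2, tn=88, fn=4)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the accuracy is 0.94 although only 6 of the 10 HCC cases are detected (sensitivity 0.60), which is why sensitivity, precision, and F1 are usually reported alongside accuracy.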
In this connection, a DL-based architecture called NucleiSegNet [15] was developed for the diagnosis of hepatocellular carcinoma by locating nuclei through image segmentation of histopathological data (Figure 3(a)) [44]. Nuclei segmentation of stained histopathological images is an important prerequisite for computer-supported cancer diagnostic systems, such as training a DL model as shown in Figure 3(b). Automated segmentation of nuclei in histopathological images enables qualitative and quantitative analysis at scale. This problem is addressed by a study that presents a powerful DL-based network architecture trained for nuclei segmentation of H&E-stained liver cancer histopathology images [45]. The architecture consists of three blocks: a residual block, a bottleneck block, and an attention decoder block. The novel block proposed here is the residual block, which enables the extraction of high-level semantic maps, followed by efficient object localization enabled by the attention decoder [46]. As a result, false positives are reduced and performance is improved. When deployed for nuclei segmentation, NucleiSegNet produced better results than state-of-the-art methods. Deployment generally follows the procedure shown in Figure 3(c). This work also introduced a liver dataset named KMC, which consists of H&E-stained liver cancer histopathology images with labeled nuclei. Like KMC, other databases containing information about cancers are summarized in Table 2 [47]. Similarly, another study proposes a new DL-based framework (LiverNet) for cancer classification, which it treats as multi-class classification of liver hepatocellular carcinoma (HCC) tumor histopathology images [48]. It enables automatic diagnosis of four subtypes of liver hepatocellular carcinoma using liver histopathology images as training samples. The multi-class classification places the data in one of four categories: non-cancerous, low sub-type liver HCC tumor, medium sub-type liver HCC tumor, and high sub-type liver HCC tumor. In LiverNet, the BreastNet architecture is extended with atrous spatial pyramid pooling (ASPP), which enables multi-scale feature extraction from histopathological image data [48]. The study results demonstrate that LiverNet outperforms existing state-of-the-art multi-class classification architectures for histopathological images by 2%. The datasets used in this study are TCGA-LIHC and KMC, where KMC was labeled by NucleiSegNet [49]. The study claims to be the first to provide a proof-of-concept demonstration of multi-class handling for histopathological data. The accuracy obtained on the KMC liver dataset is 90-93%, with fewer parameters and floating-point operations. The motivation behind KMC was the shortage of properly labeled histopathological data [50]. A similar study presents a DL-based framework called HistoCAE [31], in which a multi-resolution convolutional autoencoder (CAE) based framework is used for image reconstruction [51], followed by patch-based classification of each histopathology image as tumor or non-tumor.
Despite the considerable progress reported in DL- and ML-based frameworks for AI applications in histopathology (Figure 3), certain challenges remain in the automated analysis of histopathological images. These challenges are addressed to a large extent by advanced CNN and RNN architectures such as those shown in Figure 3(b). The challenges are as follows. First, representing vivid clinical features is still difficult because the morphological features of histopathological images differ from patient to patient, which makes it hard to find generalizable data patterns [52]. Second, labeled data remains scarce, and the pixel size of the images (standard: 100,000 × 100,000) is too large to handle. Third, there are numerous cancerous regions, and it is difficult to annotate all of them. Finally, histopathological images are noisy for various reasons [53]. To handle these long-standing problems, patch-based methods have been used for large images [54]. In one such study, a patch-based CNN classifier was trained to classify whole-slide tissue images [55]. Special attention was paid to batch size and parameters because of their effect on network performance. Many studies have tried to solve the scarcity of labeled images; in one such study, global labels of histopathology images were used to classify liver cancer [56]. Patch features together with transfer learning were used to obtain patch-level features, which were then combined with multiple-instance learning to scale them to the image level for classification [57]. This method addresses the problem of data scarcity for histological images of liver cancer. Because the classification is binary, the images are classified as either abnormal or normal, which helps in the early diagnosis of liver cancer. This study, considering three articles, proposes the development of a non-invasive AI-enabled biomedical image data processing tool to handle the otherwise impractical and time-consuming volume of biomedical data (Figure 1). Once fully developed, ML methods can thus be used to characterize tissue at various scales, such as tumor appearance in histopathology and radiology and cell morphology analysis, for the automatic diagnosis of malignancy. A summary of these studies for application in the diagnosis of viral cancers (hepatocellular carcinoma) is given in Table 1, whereas Table 2 lists some important databases. Moreover, the degree of malignancy in liver cancer can be categorized into three types: poorly, moderately, and well differentiated. Distinguishing these levels is important for the diagnosis and treatment of liver cancer. To meet this need, an attention (DL) mechanism-based classification study is presented in [31], in which five different classification models were used and the SENet model achieved the highest classification accuracy of 95.27%. Thus, this study, like others, demonstrates the potential of ML- and DL-based methods for solving the time-consuming and laborious problems of manual techniques (Figure 3).
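The patch-based idea described above, in which a large image is split into patches and patch-level results are scaled to an image-level label, can be sketched as follows. This is an illustration under stated assumptions rather than the method of any cited study: `patch_score` is a hypothetical stand-in (mean intensity) for a trained patch-level CNN, the 4 × 4 image is invented, and the aggregation uses the simplest multiple-instance rule (an image is abnormal if any patch is abnormal).

```python
def extract_patches(image, patch_size):
    """Split a 2-D image (list of rows) into non-overlapping square patches."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            patches.append([row[c:c + patch_size] for row in image[r:r + patch_size]])
    return patches

def patch_score(patch):
    """Hypothetical stand-in for a trained patch classifier: mean intensity."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels)

def image_label(image, patch_size, threshold=0.5):
    """Multiple-instance rule: abnormal if the highest patch score exceeds the threshold."""
    scores = [patch_score(p) for p in extract_patches(image, patch_size)]
    label = "abnormal" if max(scores) > threshold else "normal"
    return label, scores

# Invented 4 x 4 "slide" whose upper-right quadrant is bright (suspicious).
slide = [
    [0.1, 0.1, 0.9, 0.8],
    [0.1, 0.2, 0.9, 1.0],
    [0.1, 0.1, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.2],
]
label, scores = image_label(slide, patch_size=2)
print(label, scores)
```

Only an image-level (global) label is needed to train under this rule, which is how multiple-instance learning reduces the annotation burden; in practice the patch scorer would be a CNN with transferred weights and the aggregation is often learned rather than a fixed maximum.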

B. AI IN RADIOLOGY
Radiology, like histopathology, uses medical imaging (of different parts of the body) to diagnose disease in humans and animals. In diagnostic radiology [32], some of the most widely used imaging exams include X-ray, MRI, ultrasound, CT, and PET scans (Figure 3(a)).
Despite being widely adopted, traditional radiology suffers from limitations such as a lack of reproducibility, especially for textural features of hepatocellular carcinoma [33], and this variability negatively affects radiomic signatures. There is no standard protocol for the number of texture features or for the development of prediction models, which hampers the generalizability of results. Another major limitation is the lack of data, made more challenging by restricted data sharing between institutions because of legal concerns. Owing to the rise in computational power and the progress in DL and ML frameworks (algorithms and tools), many of these concerns have been addressed. A large body of work combining radiology or radiomics with ML and DL has already been reported. However, the scarcity of data, or more accurately of properly labeled data, is still a major challenge.
Moreover, radiomics integrated with DL has recently been used in the image-based diagnosis and prognosis of various liver diseases [58], including hepatocellular carcinoma. One such study presented a multinetwork-based DL model for risk prediction (similar to symptoms) of liver transplantation in hepatocellular carcinoma [59]. The database was constructed by extracting magnetic resonance (MR) images from a picture archiving and communication system (PACS), followed by the extraction of pathology images [60]. Additionally, a deep learning-based framework using ResNet-18 and support vector machine models was developed to predict microvascular invasion (MVI) in hepatocellular carcinoma using arterial-phase CT images from 309 patients at China Medical University Hospital and 164 patients referred from 54 different hospitals. The models combined CT images (Figure 4-(1-6)) and patients' clinical factors (Figure 4-(7-8)); the ResNet-18 model exhibited the best performance, achieving an AUC of 0.845. Its performance remained consistent in external validation, with an AUC of 0.777. The model's effectiveness was further confirmed through Grad-CAM visualization, which demonstrated its ability to focus on relevant imaging features for MVI prediction (Figure 4) [116].
Similarly, another study used MR images for the automatic diagnosis of hepatocellular carcinoma. The MRIs used in that study were multiphasic, contrast-enhanced, T1-weighted breath-hold sequences [61]. Additionally, a systematic analysis of over 1500 genome-wide DNA methylation arrays [120] was conducted across multiple studies to identify a distinctive methylation signature for HCC. This study used a machine learning pipeline to pinpoint differentially methylated regions in HCC that are linked to the repression of genes [122] associated with cancer progression. A unique signature comprising 38 DNA methylation regions was developed, yielding a high-precision HCC detection score. This score correctly identified 96% of HCC tissue samples with 98% precision in an independent dataset. It also effectively distinguished cell-free DNA (cfDNA) of tumor samples from that of healthy controls and identified cfDNA from patients with other tumors, including colorectal cancer [117], [121], as shown in Figure 5.
The DL model in the MRI study consisted of a deep convolutional neural network (DCNN) (also shown in Figure 3(b)) with a U-Net architecture and was trained using a train-test split in which 70% of the data was used for training, 15% for validation, and the remaining 15% for testing [62]. To assess the effectiveness of the DCNN U-Net, the model predictions were compared with manual work. An accuracy of 73∼75% was obtained across the test and validation sets, with the Dice similarity coefficient (DSC) adjusted to >0.2. Thus, it makes diagnosis faster, more reliable, and less error-prone. However, no external validation was performed to check the work for bias and data leakage. One reason for this limitation is the scarcity of suitable data. Another problem in medical imaging is the high resolution and vast feature space, which make it challenging for image segmentation to identify and use the valuable features [63]. To address this problem, an ensemble-based extreme learning machine (ELM) study performs liver tumor diagnosis using a random feature space [64]. In this study, tumor detection is handled as a two-class (binary) classification problem. Moreover, the ELM used here learns faster than SVM and other algorithms, generalizes well, and has already been used as a single-hidden-layer feed-forward neural network (SLFN) architecture.
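Two ingredients of the segmentation study described above, the 70/15/15 data split and the Dice similarity coefficient, can be sketched in a few lines. The helper names `split_indices` and `dice`, the random seed, and the toy masks are assumptions for illustration and do not reproduce the cited study's pipeline.

```python
import random

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # fixed seed for a reproducible split
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def dice(pred, truth):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p and t for p, t in zip(pred, truth))
    size = sum(pred) + sum(truth)
    return 1.0 if size == 0 else 2 * intersection / size

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))

# Toy masks: the prediction marks two pixels, the manual annotation marks one.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
print(round(score, 3))
```

Splitting at the patient level rather than the image level is one common way to reduce the data leakage risk mentioned above; the sketch splits plain indices and leaves that grouping to the caller.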
Additionally, recent years have witnessed the fruitful application of image processing techniques to medical imaging for diagnostics. Advances in imaging algorithms support radiologists in the timely diagnosis of hepatocellular carcinoma and related disorders [65]. Apart from conventional DL and ML algorithms, fuzzy models are also used for diagnosis in hepatocellular carcinoma. In one such study, a new fuzzy linguistic constant (FLC) is introduced for computer-aided automated diagnosis of liver cancer from low-contrast CT images [66]. A fuzzy membership function is developed to classify the contrast-enhanced images as cancerous or non-cancerous [123]. To assess the extracted features, the structural similarity index is used, which in turn indicates whether the tumor is malignant or benign. This work overcomes the limitation noted above by externally validating the developed models on a dataset containing information from 179 clinical cases; information on other such databases is summarized in Table 2 [67]. The class distribution for liver tumors is 98 benign and 81 malignant. An SVM applied to the extracted features yields an accuracy of 98.7%, whereas the presented segmentation method yields an enhanced detection value of 78% [68]. Such algorithms are valuable tools for radiologists in tumor diagnosis. In addition to CT, some studies have combined liver ultrasound with DL to classify and diagnose hepatocellular carcinoma. Morgan et al. classified LI-RADS-based ultrasound monitoring reports for hepatocellular carcinoma diagnosis [69]. Liver ultrasound images have been used not only for hepatocellular carcinoma but also for cirrhosis diagnosis. Numerous other studies demonstrate the use of computational (ML, DL, etc.) methods for diagnosing hepatocellular carcinoma using radiology (radiomics) data [70], [71], [72].
Although considerable progress and success have been observed in DL and ML applications to radiology, some inherent limitations impede the full potential of the field. The major limitation is the lack of generalization in radiomic classifiers, owing to the variance in radiomic features from person to person and their dependence on the protocols followed and the feature extraction approaches [73]. This limitation is partially addressed by image preprocessing such as gray-level normalization and standardization of resolution. In this connection, an algorithm was recently introduced that handles variations in radiomic features and enables feature analysis using multicenter image data [74]. Future research should aim to design and implement optimal, fast, and reproducible image processing algorithms that avoid such variations and improve the generalization of radiomic models, thereby enhancing diagnostic capability in hepatocellular carcinoma.
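Gray-level normalization, mentioned above as a partial remedy for protocol-dependent variation, can be illustrated with a small sketch. The functions `zscore_normalize` and `quantize`, the choice of z-score scaling, and the intensity values are assumptions made for illustration; the cited works may use different normalization schemes.

```python
from statistics import mean, pstdev

def zscore_normalize(pixels):
    """Rescale intensities to zero mean and unit variance."""
    mu, sigma = mean(pixels), pstdev(pixels)
    if sigma == 0:
        return [0.0 for _ in pixels]      # constant image: nothing to scale
    return [(p - mu) / sigma for p in pixels]

def quantize(pixels, levels=32):
    """Map intensities onto a fixed number of gray levels (0 .. levels - 1)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0] * len(pixels)
    return [min(levels - 1, int((p - lo) / (hi - lo) * levels)) for p in pixels]

# Invented intensities of the same tissue acquired on two differently scaled scanners.
scanner_a = [100, 150, 200]
scanner_b = [1000, 1500, 2000]

za, zb = zscore_normalize(scanner_a), zscore_normalize(scanner_b)
qa = quantize(scanner_a, levels=4)
print(za, zb, qa)
```

After normalization the two acquisitions yield the same values up to floating-point error, so texture features computed downstream no longer depend on the scanner's intensity scale; resolution standardization (resampling to a common voxel size) would be a separate step not shown here.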

C. BIOMARKER BASED DISEASE PREDICTION
Techniques such as the integration of multi-omics analysis and the use of ML and DL have become important tools for biomarker screening and disease diagnosis (Figure 3), including HCC diagnosis, as shown in Table 1. Once properly trained, ML and DL algorithms can easily be deployed to diagnose the disease [118] from patient samples [98]. A variety of algorithms are trained on various biomarker-related data types, all with the common goal of early diagnosis of HCC, which is necessary for its treatment. Detection of early and potential tumors in HCC is normally done by monitoring vulnerable groups using abdominal ultrasonography with or without serum analysis of α-fetoprotein (AFP) [99]. This approach suffers from limited sensitivity, for which two approaches are currently used: the first uses already known, empirically derived biomarkers, and the second uses circulating nucleic acid biomarkers consisting of cell-free DNA and RNA [100]. Modern molecular biology aligned with ML techniques is closely allied to the underlying biology of cancer [124]. These approaches are considered promising opportunities for obtaining a timely diagnosis with additional functionality.
To address these challenges, a study that integrates cfDNA gene expression (database information in Table 2) to predict the clinicopathological response of HCC patients is presented in [62]. HCC can be diagnosed using circulating cell-free DNA (cfDNA), which is also regarded as a predictive biomarker for HCC. Blood biomarkers for the early diagnosis of HCC are lacking, because of which the mortality rate is very high. In this study, a new ML-based scoring system called cfDNA HCC is proposed. As already stated, it integrates cfDNA expression profiles, which paved the way for predicting the clinicopathological response of patients suffering from HCC. This study shows that known biomarkers could be useful for the diagnosis of HCC. In this connection, studies have been conducted that focus on the discovery of robust biomarkers for HCC. In one such study [63], six different methods of recursive feature elimination were used to select gene signatures from TCGA liver cancer data (Table 2). It was hypothesized that the genes shared among the six resulting subsets would be regarded as robust biomarkers in HCC. Statistical interpretation of feature selection in ML was performed using the Akaike information criterion (AIC) to explain the optimization process of feature selection. The biomarkers shortlisted in this study by backward stepwise logistic regression were found among the already known biomarkers. Another similar study using supervised ML identifies biomarker levels that can be used as a diagnostic tool to classify HCC [64]. Moreover, large-scale transcriptomic data has also been used to identify diagnostic biomarkers [65]. Previous efforts to establish genetic biomarkers were hindered by a lack of data in both quantity and diversity. With the emergence of large-scale transcriptomic data, this study [65] identifies diagnostic biomarkers in HCC. The profiled dataset contained a total of 2316 positive and 1665 negative (non-tumorous) samples. These samples were obtained from four different studies using various profiling techniques. Based on the genes overlapping in all datasets, 26 genes were found to be highly expressed. Different feature selection techniques were then used to select three genes (FCN3, CLEC1B, and PRC1) as diagnostic biomarkers in HCC. To overcome the existing limitation, this study used a systematic approach to identify genetic biomarkers for HCC diagnosis, which makes it applicable to a wide range of platforms.
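The consensus idea described above, in which genes shared among the subsets chosen by several feature selection methods are treated as robust biomarkers, and the use of the Akaike information criterion (AIC = 2k − 2 ln L, with k parameters and maximized likelihood L) to compare candidate models, can be sketched as follows. The gene names `G1` to `G5`, the three selections, and the log-likelihood values are placeholders invented for illustration; they are not results from the cited studies.

```python
def robust_genes(selections):
    """Return the genes chosen by every feature-selection method."""
    return sorted(set.intersection(*(set(s) for s in selections)))

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Placeholder gene subsets from three hypothetical selection methods.
selections = [
    ["G1", "G2", "G3", "G4"],
    ["G2", "G3", "G5"],
    ["G3", "G2", "G1"],
]
shared = robust_genes(selections)
print(shared)

# Invented log-likelihoods for a 3-parameter and a 6-parameter model.
aic_small = aic(log_likelihood=-120.0, n_params=3)
aic_large = aic(log_likelihood=-118.5, n_params=6)
print(aic_small, aic_large)
```

In this invented comparison the larger model fits slightly better but has the higher AIC, so the smaller model would be preferred; this trade-off between fit and complexity is what drives a backward stepwise elimination.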
No doubt, the rise in computational power, the generation of multi-omics data at scale, advances in approaches to mine massive datasets, and publicly available databases providing useful data have been beneficial for the discovery of diagnostic biomarkers for the timely diagnosis of cancers such as liver cancer. Despite these steps, some limitations still need to be addressed to uncover the full potential of biomarker-based disease prediction strategies. A major limitation is the variation found in transcriptomic signatures, caused by the use of different platforms and protocols and by person-to-person variation. Future work should focus on resolving these issues and on ML- and DL-based strategies that can encompass these variations and generalize well.

D. EHRS BASED DIAGNOSIS
Electronic health records (EHRs) provide accurate, up-to-date, and comprehensive information about the patient, enabling coordinated and effective patient care [101]. Sharing EHRs makes disease diagnosis easier, reduces errors, and provides faster and safer care when reliable and precise patient health information is available [102]. A well-structured, end-to-end EHR system supports the critical clinical decision of disease diagnosis.
Recently, DL and ML approaches have been applied to EHRs to diagnose viral cancers such as hepatocellular carcinoma. Ioannou et al. predicted hepatocellular carcinoma in patients with hepatitis C cirrhosis [93]. DL recurrent neural network (RNN) models were applied to raw EHRs, and the resulting model outperformed state-of-the-art traditional regression models in predicting the risk of HCC occurrence [103]. Regarding the data, 52983 samples were collected over the years, 98% of which came from male patients [104]. Analysis of the EHRs revealed that patients who developed HCC were older than those who did not [105]. The RNN model identified 80% of all HCCs that occurred. RNN models are also powerful DL models for medical imaging applications, as shown in Figure 3(b). Thus, the presented framework trained on EHRs outperformed conventional linear regression models. ML was also applied to EHRs to identify liver cancer, which is HCC in more than 90% of the cases [96]. The key feature demonstrated by the proposed framework was fast prediction with risk modelling [106]. In this work, a set of 112 abdominal CT images, comprising 59 HCCs and 53 non-HCCs, was collected from four Hong Kong-based hospitals [94]. Ontological features were extracted and formed the basis of the primary predictor panel [107]. Pearson's correlation coefficient was used to quantify the relationship between every pair of HCC and non-HCC samples [108]. The model trained on these features obtained 84.7% sensitivity and 88.4% overall accuracy when deployed [109]. Feature extraction using Pearson's correlation coefficient resulted in better accuracy than without it. Thus, ML and DL architectures show great potential in identifying viral cancers when properly trained on EHRs and deployed.
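As an illustration of how Pearson's correlation coefficient can support feature extraction, the following minimal sketch computes the coefficient and keeps only features whose absolute correlation with the class label reaches a threshold. Screening features against the label is an assumption made here for illustration and is not necessarily the pairing used in the cited study; the names `pearson_r` and `screen_features`, the threshold of 0.5, and the toy values are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no linear information
    return cov / math.sqrt(var_x * var_y)


def screen_features(features: Dict[str, Sequence[float]],
                    labels: Sequence[float],
                    threshold: float = 0.5) -> List[str]:
    """Keep features whose |r| with the label is at least `threshold`."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, labels)) >= threshold]


# Toy data: 1 = HCC, 0 = non-HCC (values are invented for illustration).
labels = [1, 1, 1, 0, 0, 0]
features = {
    "feature_1": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],  # tracks the label closely
    "feature_2": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],  # unrelated to the label
}
selected = screen_features(features, labels)
print(selected)  # ['feature_1']
```

Pearson's coefficient captures only linear association, so a feature with a strong non-linear relationship to the label could still be discarded by this kind of screen.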
Moreover, new methods and approaches continue to be introduced that use EHRs in combination with computational methods (ML and DL) (Table 2) for the diagnosis of viral and non-viral hepatocellular carcinoma (Figure 3). In one such state-of-the-art study, a novel ML model performed better than existing risk scores in predicting hepatocellular carcinoma in patients with chronic viral hepatitis [110]. Comprehensive clinical data is fed to the ML model, and the presence or absence of viral cancer is diagnosed based on the HCC ridge score [111]. At the same time, this study developed a statistical model to indirectly benchmark the novel ML model. It shows that the novel ML model introduced there generates correct risk scores for HCC in patients with chronic viral hepatitis. The HCC ridge score developed by this study was more accurate than previously developed risk scores [112]. Additionally, a validation study used inpatient EHRs to determine the presence of cirrhosis in patients with hepatocellular carcinoma [113]. This study did not include any AI-assisted module. However, it was able to perform the diagnosis using inpatient EHRs. A review of these studies, as listed in Table 1, indicates that the use of EHRs coupled with computational (AI) techniques can open the door to success in the diagnosis of viral cancers. However, there are certain limitations associated with EHRs and with ML and DL models [114]. First, EHRs are raw, heterogeneous, and noisy data, which negatively affects the performance of the models. Second, it takes a lot of effort to make EHRs ML-ready [115]. When it comes to ML, the high-dimensional and diverse nature of the data makes it hard for linear ML models to learn from the data and generalize well. In addition, feature engineering for conventional ML is troublesome, whereas DL models are hard to interpret. This emphasizes the need to invest sufficient time in processing EHRs into a suitable format and in improving the interpretability of DL models. Only once these limitations are overcome, and newer algorithms that adapt to more diverse data are developed, can the true diagnostic potential of EHRs be harvested.

IV. DISCUSSION AND CHALLENGES
EHRs are inherently raw, noisy, and heterogeneous, comprising unstructured text, images, lab results, and more. This diversity and inconsistency in data quality can significantly impede the performance of ML models, making it challenging to extract meaningful insights without extensive preprocessing and standardization efforts. Moreover, data preparation and accessibility are also important factors. Transforming EHRs into a format that is amenable to ML algorithms requires significant effort. The process involves data cleaning, normalization, feature extraction, and handling of missing values, which is both time-consuming and resource-intensive. Furthermore, issues related to data privacy, security, and sharing can limit the accessibility of the comprehensive datasets necessary for training robust ML models. While ML and DL models hold great potential in identifying complex patterns and associations within large datasets, their application in healthcare, particularly in diagnostics, is hampered by the high dimensionality and variability of medical data. Linear ML models may struggle to capture the intricate relationships present in the data, necessitating more sophisticated, yet computationally intensive, DL models. One of the significant hurdles in the adoption of DL models in clinical settings is their "black box" nature, which makes it difficult for clinicians to understand how these models arrive at a particular diagnosis or prediction. This lack of interpretability can hinder trust in AI-based diagnostic systems and poses challenges in clinical decision-making. Additionally, ensuring that ML models can generalize well to new, unseen data is a critical challenge. Models may perform well on the data they were trained on but fail to maintain accuracy when applied to data from different sources or populations. Rigorous validation studies, including external validation on diverse datasets, are essential to establish the reliability and applicability of ML models in clinical practice. Finally, the successful implementation of ML-based diagnostics in clinical settings requires seamless integration into existing workflows, with minimal disruption to clinical practices. This integration must also address clinicians' needs, providing intuitive interfaces and decision support that enhance, rather than complicate, the diagnostic process.
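Two of the preparation steps named above, handling of missing values and normalization, can be sketched as follows. Mean imputation and z-score scaling are assumed here as simple representative choices; the helper names and the toy laboratory values are hypothetical, and real EHR pipelines typically need more careful, variable-specific handling.

```python
import math
from typing import List, Optional


def impute_mean(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def z_score(column: List[float]) -> List[float]:
    """Scale a column to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    variance = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - mean) / std for v in column]


# Toy lab-value column with one missing entry (values are invented).
raw = [40.0, None, 60.0, 50.0]
filled = impute_mean(raw)
scaled = z_score(filled)
print(filled)  # [40.0, 50.0, 60.0, 50.0]
```

In practice the imputation mean and the scaling parameters would be estimated on the training split only and then reused on held-out data, so that no information leaks from the test set.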
To overcome these challenges and fully realize the potential of ML in the diagnosis of HCC, several future directions can be pursued. Advances in data processing, aimed at developing more sophisticated algorithms for data preprocessing and feature engineering, can help mitigate the issues of data quality and heterogeneity, making EHRs more amenable to ML analysis. Similarly, adopting federated learning approaches can address data privacy and accessibility issues by enabling ML models to be trained across multiple decentralized data sources without needing to share the data directly. Investing in research and development of explainable AI models can enhance the interpretability and transparency of ML-based diagnostic tools, fostering trust and acceptance among healthcare professionals. Finally, close collaboration between computer scientists, data scientists, clinicians, and other healthcare professionals is crucial to ensure that ML models are developed with a deep understanding of clinical needs and constraints, leading to more effective and user-friendly diagnostic tools.
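The federated learning idea mentioned above can be illustrated with a minimal sketch of the aggregation step, in which each site trains locally and only model parameters, not patient records, are combined. The sketch assumes a FedAvg-style weighted average by local sample count; the function name, the two-hospital setup, and the numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals with locally trained parameter vectors.
hospital_weights = [[1.0, 2.0], [3.0, 4.0]]
hospital_sizes = [100, 300]
global_weights = federated_average(hospital_weights, hospital_sizes)
print(global_weights)  # [2.5, 3.5]
```

Weighting by sample count lets larger sites contribute proportionally more to the shared model; a full system would repeat this step over many rounds and would usually add secure aggregation or similar protections.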

V. CONCLUSION
Incorporating Artificial Intelligence (AI) into oncological diagnostics, particularly for hepatocellular carcinoma (HCC), enhances traditional methods by leveraging advanced Machine Learning (ML) and Deep Learning (DL) techniques alongside increased computational power to analyze extensive datasets. This synergy facilitates the nuanced examination of image-based data in conjunction with biomarker datasets, crucial for early detection of both viral and non-viral HCC, with AI algorithms designed to supplement rather than replace existing clinical methodologies. The integration of AI not only promises to refine diagnostic accuracy but also to contribute to the prognostic assessment, guiding more personalized treatment approaches. The path forward should focus on developing interpretable AI models that engender clinician and patient trust, ensuring seamless integration into clinical workflows, conducting rigorous validation studies to ensure efficacy and reliability across diverse populations, and addressing ethical and regulatory considerations to safeguard patient privacy and data security. Such efforts necessitate a multidisciplinary approach, underscoring the importance of collaboration between data scientists, clinicians, ethicists, and policymakers to fully realize the potential of AI in transforming HCC diagnostics and patient care.

FIGURE 1. Left: Biological approaches used to obtain data related to cancer patients (hepatocellular carcinoma). Middle: Traditional versus Machine Learning and Deep Learning-based approaches for diagnosis. Right: Final diagnosis results, which detect and classify the data as malignant or benign, or can further differentiate sub-tumor types based on model training in the case of ML and DL.

FIGURE 2. A schematic overview of the traditional approaches to diagnose HCC.

FIGURE 3. A process of cancer diagnosis using ML and DL techniques. a) shows the process of collecting cancer-related data from different databases; once collected, the data is cleaned to make it ready for training the models, b) shows the splitting of the data into training and testing sets followed by ML and DL model development, training, and testing, and finally c) shows the deployment of trained models to predict whether given new data (patient data) is classified as cancer or not-cancer.

FIGURE 4. This flow chart illustrates the development of a ResNet-18 based model for preoperative MVI prediction in HCC, detailing steps from labeling and preprocessing CT images to integrating clinical factors and model training. The process includes image augmentation techniques like rotation, cropping, and flipping, with examples provided. The final model predicts MVI status by combining image data with clinical factors [116].

FIGURE 5. (a) A diagram showing datasets assembled for discovering HCC DNA methylation biomarkers using machine learning and constructing an HCC risk score. (b) PCA of Train & Test DNA methylation dataset highlighting HCC samples with explained variances. (c) Steps in feature reduction, including probes, CpG sites, clusters, and DMRs, in the processing and feature discovery pipeline [117].

ARUN ASIF is currently a dedicated Postdoctoral Researcher with the Albany Medical Center. With a background in AI and organ on a chip, his research focuses on brief description of research interests. His contributions aim to advance the understanding of specific area of research for the betterment of potential impact or applications.

TABLE 1. Application of AI (ML and DL) for diagnosis of hepatocellular carcinoma in histopathology, radiology, biomarker-based, and EHR-based diagnosis.

TABLE 2. Summary of cancer related databases.