Machine Learning Techniques for Biomedical Natural Language Processing: A Comprehensive Review

The widespread adoption of electronic health record (EHR) systems in health care provides large amounts of real-world data, opening new avenues for clinical research. Natural language processing (NLP) techniques have been used as an artificial intelligence strategy to extract information from clinical narratives in EHRs, since these narratives contain a great amount of valuable clinical information. However, much clinical data remains hidden in free-form narrative text. Unlocking the full potential of EHR data therefore requires biomedical NLP techniques that automatically convert clinical narrative text into structured clinical data. In this way, biomedical NLP applications can be used to guide clinical decisions, identify medical problems, and help postpone or prevent the onset of disease. This review discusses the current literature on the secondary use of EHR data for clinical research on chronic diseases and addresses the potential, challenges, and applications of biomedical NLP techniques. We review representative biomedical NLP methods and systems applied to EHRs and give an overview of the machine learning and deep learning methodologies used to process EHRs, improve the understanding of patients' clinical records, and predict chronic disease risk, providing an opportunity to extract previously unknown clinical information. Moreover, this review summarizes the use of deep learning and machine learning techniques in biomedical NLP tasks based on EHR data related to chronic diseases. Finally, it presents future trends and challenges in biomedical NLP.


I. INTRODUCTION
Natural Language Processing (NLP) and machine learning techniques have a significant impact on the processing of digital data. As reliance on digital data increases, it is essential to exploit the value of data across different research fields. Information extracted from clinical text can be applied to various tasks such as automatic terminology management, de-identification of clinical text, data mining, identification of research subjects, prediction of the onset and progression of different chronic diseases, and analysis of disease medications and their side effects. Although NLP-based machine learning techniques perform well in biomedicine and healthcare, more experience is required in the analysis of narrative clinical text [1]. It is therefore necessary to review intensively the problems and challenges of extracting information from clinical text in order to develop new opportunities in this field of research [2].
Biomedical NLP is a field of research that includes natural language processing, bioinformatics, medical informatics, and computer linguistics [1]. Extracting valuable information from a free clinical text embedded in unstructured data is a significant task of NLP that can support decision making, reporting on administration, and research. Applying biomedical NLP applications in EHRs has a considerable effect on several domains of healthcare and biomedical research.
Healthcare-related NLP paved the way to medical language processing. Usually, most biomedical data exists in an unstructured form, produced by dictated transcriptions, direct entry, or speech recognition applications. Consequently, data pre-processing is required for information extraction, because summarization and decision-support tasks cannot be performed on input data in its narrative form. Preprocessing includes document structure analysis, tokenization, part-of-speech tagging, spell checking, sentence splitting, Word Sense Disambiguation (WSD), and some form of parsing. Context-dependent features such as event subject identification, temporality, and negation play a crucial role: if they are not handled, the extracted information can be interpreted inappropriately [3].
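To make the preprocessing steps above concrete, the following minimal sketch shows naive sentence splitting and tokenization with regular expressions. This is an illustrative toy only; production clinical pipelines use far more robust tools, and the example clinical note is invented.

```python
import re

def split_sentences(text):
    # Naive sentence splitter: break after ., ?, or ! followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.?!])\s+', text) if s.strip()]

def tokenize(sentence):
    # Word-level tokenizer: runs of letters/digits, or standalone punctuation.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9]", sentence)

note = "Patient denies chest pain. BP 120/80. Follow-up in 2 weeks."
sentences = split_sentences(note)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens[0])
```

Real clinical text breaks such simple rules constantly (abbreviations like "Dr.", numbered lists, section headers), which is one reason dedicated clinical NLP preprocessing components exist.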
There are various information extraction techniques, such as rule-based techniques, pattern matching techniques, and machine learning and statistical techniques. The extracted information can then be used to analyze the clinical text, improve EHRs and decision support systems, and link findings to concepts in standard terminologies. Biomedical natural language processing encompasses the methods and studies of how NLP can be applied to biomedical texts, electronic medical records, and the literature.
Recently, deep learning techniques have achieved better performance than traditional machine learning (ML) techniques across general NLP tasks such as language modeling, part-of-speech (POS) tagging, named entity recognition, paraphrase detection, and sentiment analysis. Clinical reports typically face specific challenges compared to general-domain text because of the frequent use of acronyms and non-standard clinical terminology by healthcare professionals, the unorganized structure of documents, and the need for complete de-identification and anonymization to protect patient privacy. Addressing these challenges could promote further research and improvement in various biomedical applications such as clinical decision support, identification of patient cohorts, patient engagement support, public health management, pharmacovigilance, medications, and summarization of clinical texts.

A. MOTIVATION
Historically, extracting clinical information from narrative clinical texts has been done manually by clinical experts, which causes several issues such as lack of scalability and high cost. These issues particularly affect chronic diseases, since clinical notes outnumber structured data; for example, Wei et al. [4] graphically quantify the number of clinical notes compared to structured data for chronic diseases such as rheumatoid arthritis, Parkinson's disease, and Alzheimer's disease.
NLP approaches play a significant role in addressing several challenges of various clinical tasks, such as the automatic extraction of relevant clinical information that may postpone or avoid the onset of disease. The main contributions of this review are as follows:
• We introduce NLP in general and biomedical NLP in particular, together with its methods and technologies.
• We present the application areas of machine learning and deep learning in biomedical NLP.
• We provide an overview of the most popular biomedical NLP systems and their general architecture.
• We describe the use of NLP applications on clinical notes to identify chronic diseases and discuss the challenges currently facing them.
• We review the literature on applying various NLP techniques to narrative clinical notes on chronic diseases, including an analysis of the difficulties NLP methodologies face in clinical narrative comprehension.
• Finally, we conclude this review by describing existing challenges and open issues associated with the processing of biomedical and clinical text, and by pointing the NLP domain to resources and opportunities for developing new methodologies.

B. CRITERIA FOR SEARCH AND SELECTION
We searched for studies released from 2009 to 2021 using Google Scholar, PubMed, and the Web of Science. All searches used the keywords "electronic health records," "electronic medical records," "EHR," or "EMR," in combination with either "machine learning" or the name of a particular machine learning technique, in conjunction with "chronic diseases". Figure 1a shows the number of publications per year related to applying machine learning to EHRs. Figure 1b shows the number of publications per year related to the use of EHRs in chronic diseases.
In the rest of this review paper, we provide an overview of the most significant and noticeable articles and research studies that focus on EHRs using machine learning and deep learning techniques.
We start with a general review of NLP, then of NLP in biomedicine and healthcare with its methods, technologies, and potential tasks/use cases in the biomedical and healthcare domains in Section II and Section III, followed by application areas of machine learning in biomedical NLP in Section IV. We then provide an overview of NLP systems and system architectures in Section V. Next, we review recent related work applying NLP to chronic diseases in Section VI, and examine current open issues and challenges in biomedical NLP in Section VII. Finally, Section VIII concludes the review by summarizing current challenges and open issues.

II. BASICS AND BACKGROUND
A. NATURAL LANGUAGE PROCESSING OVERVIEW
NLP is a sub-field combining computer science, Artificial Intelligence (AI), and linguistics, whose aim is to process and interpret human language to carry out several tasks (e.g., automatic question answering). Many challenges face natural language processing when it is applied to general language, but some critical issues are especially relevant to the biomedical and healthcare domains. There is a wealth of electronic information concerning the healthcare domain, including publications, e-health records, and the Internet. Most of this biomedical information is in textual form, and controlling and using it is necessary for promoting health research, improving quality, and reducing cost. NLP is important because it is required to convert narrative clinical texts into structured data that can be used in computer applications [6]. The adoption of electronic health record systems in hospitals has increased significantly in the last ten years, partly because of the 2009 Health Information Technology for Economic and Clinical Health (HITECH) Act, which provided incentives of $30 billion to hospitals and physician practices for the adoption of EHR systems [3]. According to the most recent study from the Office of the National Coordinator for Health Information Technology (ONC), a basic EHR system is used by 84% of hospitals, a 9-fold increase since 2008 [7]. Furthermore, the adoption of basic and certified EHRs by office-based physicians has increased from 42% to 87%. EHR systems store data for each patient encounter, such as demographic information, diagnoses, laboratory examinations, drugs, radiological images, and clinical notes.
B. ELECTRONIC HEALTH RECORDS
Generally, the use of Electronic Health Record (EHR) systems in both hospital and outpatient care settings has increased significantly [7]. The use of EHRs in hospitals and clinics has the potential to enhance patient care by reducing errors and improving efficiency and the quality of treatment, while providing researchers with a rich data source [8]. The functionality of EHR systems varies; they are usually classified into basic EHRs without clinical notes, basic EHRs with clinical notes, and comprehensive systems [7]. Even basic EHR systems can provide a wide range of patient information, such as medical history, diseases, and medication use, while lacking more advanced features. As EHRs were mainly developed for administrative activities at the hospital, there are many classification schemes and controlled vocabularies for recording patient medical data and events. Table 1 shows some codes from the International Statistical Classification of Diseases and Related Health Problems (ICD) for diagnoses, the Current Procedural Terminology (CPT) for procedures, the Logical Observation Identifiers Names and Codes (LOINC) for laboratory tests, and RxNorm for drugs.
Such codes may differ between organizations, with partial mappings managed by tools such as the Unified Medical Language System (UMLS) and the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT). With the availability of various classification schemas, coordinating and analyzing data across terminologies and organizations is a field of ongoing research.
Diverse types of patient information are stored in EHR systems, including demographics, diagnoses, physical examinations, sensor measurements, lab results, prescribed or administered medicines, and clinical notes. One difficulty is dealing with the complexity of EHR data and its different data types. While other biomedical data such as medical images or genomic information are treated in important recent research [9] [10] [11], we concentrate in this review on the five types of data that exist in most modern EHR systems. In the field of chronic diseases, new methods are needed to support and advance evidence-based medicine, given the increasing incidence of such conditions all over the world. The secondary use of EHRs has a powerful and successful impact on processing clinical data for biomedical and translational applications.
Several research studies have explored the secondary use of EHRs in bioinformatics and healthcare applications [12] [13], although EHRs were designed primarily to enhance operational healthcare performance. In particular, patient data stored in EHR systems have been used for biomedical tasks such as extracting medical concepts [2] [14], modeling patient trajectories [15], diagnosing diseases [16] [17], supporting clinical decisions [18], etc.
Processing EHRs using machine learning and deep learning methods contributes to a deeper understanding of clinical patient trajectories, which track a patient's status from one health state to another after being diagnosed with a specific clinical condition, and to risk prediction for chronic diseases, giving a unique opportunity to uncover unknown clinical information. However, a wide range of clinical history remains locked in free-form clinical narratives. As a result, unlocking the full potential of EHR data depends on the development of NLP techniques that automatically convert clinical text from its narrative form into a structured form that can direct clinical decisions and potentially postpone or prevent the onset of diseases [19].
EHR processing and modeling are significant challenges due to the data's high dimensionality, noise, heterogeneity, sparsity, incompleteness, random errors, and systematic biases. In addition, a vast amount of information about a patient's clinical history is usually stored in free-text clinical narratives [20], since written text remains the most widely used and most descriptive method for recording clinical events. The development of NLP techniques integrated with machine learning algorithms is essential for the automatic conversion of clinical free text into a structured data format. NLP has been used for a broad variety of applications in the clinical domain, including the detection of medical concepts from nursing documentation [21], discharge summaries [22], and radiology reports [23], as much potentially useful clinical information for pharmacoepidemiological research is contained in unstructured free-text documents. For example, routine health data such as the Scottish Morbidity Records (SMR01) frequently use generic 'stroke' codes, while free-text Computerised Radiology Information System (CRIS) reports have the potential to supply the missing detail; augmenting SMR01 with data derived from CRIS reports can increase the number of stroke-type-specific diagnoses, provided the accuracy of this methodology is assessed. However, applying NLP-based frameworks to narrative clinical text has not yet been widely adopted in clinical activities and tasks to direct decision-support systems or administrative processes.

C. TASKS OF NLP IN THE HEALTHCARE DOMAIN
There are several tasks in clinical NLP:
• Word Sense Disambiguation (WSD): the process of automatically assigning an accurate meaning (sense) to an ambiguous word in a specific context. Biomedical NLP tasks require the ability to accurately understand ambiguous words within a specific context, which is a critical issue. In medical word sense disambiguation, there is a list of all possible meanings (senses) for each ambiguous word, and clinical notes contain many ambiguous terms. For example, the abbreviation "PCA" has a variety of interpretations, including principal component analysis, patient-controlled analgesia, and prostate cancer. WSD is a critical issue in the medical domain [24] [25] [26] [27] because it is an essential step in the analysis of clinical notes [28].
• Named Entity Recognition (NER): a subtask of Information Extraction (IE). One of the most important tasks in biomedical NLP is to turn unstructured text into computer-readable structured data [29]. NER is the task of identifying expressions that denote named entities (such as diseases, medications, and lab tests) in clinical notes. Many techniques can be used for NER [30], such as the dictionary-based, rule-based, statistical, deep learning, and hybrid approaches [31].
• Adverse Drug Event (ADE) Detection: both medical research and hospital medical treatment benefit from detecting adverse drug events (ADEs) and medication-related information in clinical notes. ADEs are harms resulting from medical interventions involving medicines, such as prescription errors, overdoses, adverse drug reactions, and allergic reactions [32]. EHRs contain a wealth of information on ADEs that is hidden in unstructured data such as discharge summaries, procedural notes, medical histories, and laboratory results [33] [34] [35]. Identifying and extracting ADE-related information from narrative clinical notes is difficult and time-consuming, so NLP systems are needed to automatically process narrative EHRs and detect drugs, ADEs, and their interactions [36].
• Information Extraction (IE): an important biomedical NLP task that facilitates the use of EHRs for clinical decision support, quality improvement, and clinical and translational research by automatically extracting and encoding clinical concepts from narrative notes. In the general domain, IE is commonly recognized as a specialized area of empirical NLP and refers to the automatic extraction of concepts, entities, and events, as well as their relations and associated attributes, from free text [34] [37].
• Relation Extraction (RE): an important subtask of information extraction that focuses on identifying and detecting semantic relationships between clinical concepts in clinical notes [38] [39]. For example, in the clinical note "an MRI revealed a C5-6 disc herniation with cord compression", the lab test "MRI" indicates two findings, "a C5-6 disc herniation" and "cord compression". Many types of relations have been studied in previous research, such as disease-attribute pair extraction [40] [41], temporal relation identification [42], and adverse drug event detection [43] [44]. The clinical NLP community has recently launched several shared tasks related to relation extraction from clinical notes, such as the Integrating Biology and the Bedside (i2b2) challenges [45], the Semantic Evaluation (SemEval) challenges [46], and the most recent 2018 National NLP Clinical Challenge (n2c2) [47]. These open shared tasks and challenges provide many resources and methods for medical RE tasks [40].
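As a toy illustration of the dictionary-based NER approach mentioned above, the sketch below matches a tiny, entirely hypothetical lexicon against a clinical sentence. Real systems use curated terminologies (e.g., UMLS) and handle overlaps, abbreviations, and ambiguity far more carefully.

```python
import re

# Toy lexicon mapping surface forms to entity types (hypothetical entries).
LEXICON = {
    "diabetes mellitus": "DISEASE",
    "metformin": "MEDICATION",
    "hba1c": "LAB_TEST",
}

def dictionary_ner(text):
    """Longest-match-first dictionary lookup; returns (start, end, type) spans."""
    found = []
    lowered = text.lower()
    for term in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(re.escape(term), lowered):
            found.append((m.start(), m.end(), LEXICON[term]))
    return sorted(found)

sent = "HbA1c elevated; patient with diabetes mellitus started on metformin."
print(dictionary_ner(sent))
```

The low recall of this style of system, noted later in this review, is visible here: any disease or drug absent from the lexicon is simply missed.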

III. BIOMEDICAL NLP METHODS
This review paper gives an overview of the most recent articles covering the main biomedical NLP methods, employing dictionary-based, rule-based, and machine learning techniques. Figure 2 shows the number of publications per year in the EHR domain applying deep/machine learning methods and rule-based techniques. Although the use of machine learning methods is growing compared to rule-based methods, rule-based methods remain useful as benchmarks for highlighting the performance and efficiency of machine learning algorithms, since the field is still shifting from rule-based methods to machine learning [48].
Recently, biomedical NLP research has demonstrated the significant performance of methods based on deep learning. The effective performance of Recurrent Neural Networks (RNN) on biomedical texts for the NER (Named Entity Recognition) task was shown by Sahu and Anand [49]. They developed a model combining a bidirectional Long Short-Term Memory network (BiLSTM) and a Conditional Random Field (CRF), applying character-level word embeddings. Habibi et al. [50] combined the BiLSTM-CRF model developed by Lample et al. [51] with the word embedding model developed by Pyysalo et al. [52]. To generate important features such as the orthographic features of biomedical entity names, Habibi et al. [50] used character-level word embeddings, showing that character-level embeddings are successful in biomedical NLP tasks.

A. RULE-BASED TECHNIQUES
Rule-based techniques rely on a set of specific textual relationship rules, called patterns, that encode similar structures in the expression of relationships. These rules are represented over words or POS tags as regular expressions. In such systems, the rules are extended into patterns by adding more constraints to resolve issues such as checking the negation of relations and determining the direction of relationships. The rules are generated in two ways: manually, or automatically from the training dataset. The efficiency of a rule-based system can be enhanced to a certain extent by extending it with additional rules, but this tends to produce many false positives (FP). Therefore, rule-based systems usually provide high precision but low recall, because rules created for a particular dataset do not transfer to other datasets. However, the recall of such systems can be improved by relaxing the constraints or by learning rules automatically from training data [53].
Although the architecture of dictionary-based systems is simple, they cannot manage unknown entities or ambiguous words, resulting in low recall [54] [55]. They also require a considerable amount of manual labor to develop and maintain a comprehensive and up-to-date dictionary. Although the rule-based method is more flexible, its features are handcrafted to fit a model to a dataset [56] [57]. Both rule-based and dictionary-based methods can achieve high precision [58], but they can generate wrong predictions when the out-of-vocabulary problem occurs, i.e., when a new word that is not found in the training data appears in a sentence. The out-of-vocabulary issue arises in the biomedical field in particular because new biomedical terms, such as the names of new drugs, are registered frequently.
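For illustration, the sketch below implements one hypothetical rule of the kind described above: a regular-expression pattern linking an imaging test to a finding, with an extra constraint for negation. The rule, cue words, and sentences are invented; real rule-based systems maintain many such patterns over words and POS tags.

```python
import re

# Hypothetical pattern: "<TEST> revealed <FINDING>", with a negation check
# applied to the captured finding (a constraint added to the base rule).
RULE = re.compile(r"(?P<test>MRI|CT|X-ray)\s+revealed\s+(?P<finding>[\w\s-]+)", re.I)
NEGATION = re.compile(r"\b(no|denies|without)\b", re.I)

def extract_relation(sentence):
    """Return (test, finding, negated) if the rule fires, else None."""
    m = RULE.search(sentence)
    if not m:
        return None
    finding = m.group("finding").strip().rstrip(".")
    negated = bool(NEGATION.search(finding))
    return (m.group("test"), finding, negated)

print(extract_relation("An MRI revealed a C5-6 disc herniation."))
print(extract_relation("CT revealed no acute hemorrhage."))
```

The high-precision/low-recall trade-off is visible here: the rule fires only on the exact verb "revealed", so paraphrases like "MRI showed ..." are missed unless more rules are added.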

B. MACHINE LEARNING TECHNIQUES
There are two main categories of machine learning algorithms: supervised learning and unsupervised learning. Supervised learning techniques learn a function y = f(x) that maps from inputs x to outputs y. There are two main types of supervised learning: classification and regression, and the most widely used learning algorithms are logistic regression and the support vector machine. On the other hand, the purpose of unsupervised learning techniques is to learn the distributional features of the input x. The two main unsupervised learning methods are cluster analysis and principal component analysis. Input representation is an essential task for all machine learning frameworks: the input to machine learning techniques is a set of attributes, known as features, extracted for each data point. In traditional machine learning, such features are handcrafted based on domain knowledge, whereas automatic data-driven feature extraction is an essential aspect of deep learning techniques.
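The supervised/unsupervised distinction above can be sketched in a few lines of NumPy on synthetic data: a supervised least-squares fit of y = f(x) from labeled pairs, and unsupervised principal components learned from x alone. This is a minimal illustration on invented data, not a clinical model.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Supervised: learn a mapping from inputs x to labels y (least squares).
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])        # ground-truth weights for the synthetic data
y = X @ true_w
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# --- Unsupervised: learn structure of x alone (principal components via SVD).
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt                        # rows are the principal directions

print(np.round(w_hat, 3))
print(components.shape)
```

Note that the unsupervised step never touches y; it characterizes the distribution of the inputs, exactly as described above.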
Until the last few years, machine learning methods such as logistic regression, support vector machines (SVM), and random forests were the key methods for analyzing and processing rich EHR data [70]. Most modern NLP platforms are built on models refined through machine learning techniques [71] [72]. Machine learning techniques are based on four components: a model; data; a loss function, which measures how well the model fits the data; and an algorithm for training (improving) the model [73].
Deep learning techniques: Deep learning is a subfield of machine learning based on multi-layered neural network architectures that learn hierarchical data representations, as shown in Figure 3. Traditional machine learning techniques require time-consuming, laborious feature extraction for data representation [74], while deep learning techniques automatically learn multiple levels of representation with increasing orders of abstraction [75].
Several factors have contributed to the development of deep learning, such as the availability of extensive unlabeled data along with rapid computing resources based on powerful graphics processing units (GPUs), new algorithms and frameworks, and the adaptation/transfer of learned data features and representations to similar or new domains of interest.
Deep learning methods can solve several non-linear classification problems with naturally hierarchical inputs, such as language and images. Recently, deep learning techniques applied to NLP tasks have provided better results than techniques based on linear models such as support vector machines (SVMs) or logistic regression [76].
The most popular deep learning architectures are illustrated in this section by highlighting the key equations that demonstrate how they operate. Data representation is the primary task of deep learning. With a classical machine learning algorithm, input features must be hand-crafted from the dataset based on the researcher's experience and domain knowledge to identify specific patterns of prior interest.
The process of designing, reviewing, choosing, and testing suitable features can be complicated and time-consuming. It has even been described as a "black art" [77] requiring creativity, trial-and-error, and sometimes luck. Deep learning techniques, on the other hand, learn the optimal features directly from the given dataset without any handcrafting. Through deep learning, a complex data representation is often expressed as a composition of other, simpler representations.
Deep learning architectures learn complex hierarchical representations, often in an unsupervised manner. Many of the major deep learning algorithms and architectures are based on the artificial neural network (ANN). ANNs consist of multiple interconnected nodes (neurons) organized in layers, as shown in Figure 3. Hidden units are neurons that do not appear in the input or output layers; they store weights W, which are updated as the model is trained.
The optimization of ANN weights is performed by minimizing a loss function such as the negative log-likelihood, as shown in Equation 2:

θ* = argmin_θ [ − Σ_{(x,y)∈D} log p(y | x; θ) + λ ‖θ‖_p ]    (2)
The first term minimizes the summation of the log loss over the given training dataset D. The second term minimizes the p-norm of the learned model parameters θ, controlled by a tunable parameter λ; this is the regularization technique used to prevent overfitting and to improve the model's ability to generalize to new problems. Usually, the backpropagation technique is used to optimize the loss function by propagating the final-layer loss back through the network [75].
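As a concrete illustration of this two-term objective, the sketch below evaluates a log loss plus an Lp penalty for a toy logistic model on invented data. It only computes the objective; gradient descent/backpropagation for minimizing it is omitted.

```python
import numpy as np

def regularized_loss(theta, X, y, lam, p=2):
    """Negative log-likelihood of a logistic model plus lam * ||theta||_p^p."""
    z = X @ theta
    probs = 1.0 / (1.0 + np.exp(-z))          # sigmoid outputs in (0, 1)
    eps = 1e-12                               # numerical guard for log(0)
    nll = -np.sum(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
    penalty = lam * np.sum(np.abs(theta) ** p)
    return nll + penalty

X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
theta = np.zeros(2)
print(round(regularized_loss(theta, X, y, lam=0.1), 4))
```

With θ = 0 the model outputs probability 0.5 everywhere and the penalty vanishes, so the loss reduces to the pure log loss of an uninformed classifier; training drives both terms down jointly, with λ setting the trade-off.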
Many open-source frameworks in different programming languages, such as TensorFlow, Theano, Keras, Torch, PyTorch, Caffe, CNTK, and Deeplearning4j, support deep learning algorithms. In the following subsections, we give an overview of the most common deep learning techniques, both supervised and unsupervised, that can be applied to biomedical NLP applications.

A) Multi-layer perceptron (MLP)
A multi-layer perceptron is a multiple-hidden-layer type of ANN that fully connects each neuron in layer i to each neuron in layer i + 1. These networks are usually limited to a few hidden layers, and unlike recurrent or undirected architectures, the data flows only in one direction. Extending the definition of the single-layer ANN, each hidden unit computes the weighted sum of the outputs of the previous layer, followed by a nonlinear activation σ of that sum, as shown in Equation 3:

h_i = σ( Σ_{j=1}^{d} ω_{ij} x_j + b_{ij} )    (3)

where d is the number of units in the previous layer, x_j is the output of the j-th node of the prior layer, and ω_{ij} and b_{ij} are the weight and bias terms associated with each x_j. The most common nonlinear activation functions are the sigmoid and tanh, but modern networks increasingly use rectified linear units (ReLU) [75].
The network can learn the relationship between the input X and the output Y after the hidden-layer weights are optimized during training. With the addition of more hidden layers, the input data is represented more abstractly due to the non-linear activations of each hidden layer. Although the MLP is one of the simplest architectures, other architectures combine fully connected neurons in their final layers.
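A minimal NumPy sketch of the MLP forward pass described above: each layer applies a weighted sum plus bias followed by a nonlinearity. The layer sizes and random weights are arbitrary; training (weight optimization) is not shown.

```python
import numpy as np

def layer(x, W, b, activation=np.tanh):
    # Each hidden unit i computes activation(sum_j W[i, j] * x[j] + b[i]).
    return activation(W @ x + b)

rng = np.random.default_rng(1)
x = rng.normal(size=4)                            # 4 input features
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)     # hidden layer: 4 -> 3
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)     # output layer: 3 -> 1

h = layer(x, W1, b1)                              # hidden representation
y = layer(h, W2, b2, activation=lambda z: z)      # linear output unit
print(h.shape, y.shape)
```

Stacking more `layer` calls yields deeper networks; each added layer re-represents its input more abstractly through the nonlinearity, as the text notes.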

B) Convolutional neural networks (CNN)
Recently, Convolutional Neural Networks (CNN) have become the most popular method, especially in the image processing domain. CNNs exploit local connectivity in raw data; a one-dimensional time series is likewise a set of local signal segments. Equation 4 demonstrates one-dimensional convolution, where the input is x and the weighting function, or convolution filter, is w:

s(t) = (x * w)(t) = Σ_a x(a) w(t − a)    (4)
Equation 5 demonstrates a two-dimensional convolution, where X is a 2-D grid (e.g., an image) and K is a kernel; the kernel, or filter, passes a matrix of weights over the entire input to produce the feature maps:

S(i, j) = (X * K)(i, j) = Σ_m Σ_n X(m, n) K(i − m, j − n)    (5)
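The one-dimensional case can be verified directly in code. The sketch below implements the discrete convolution sum naively ("full" mode) and checks it against NumPy's built-in `numpy.convolve`; the input signal and filter are arbitrary toy values.

```python
import numpy as np

def conv1d(x, w):
    """Discrete 1-D convolution s(t) = sum_a x(a) * w(t - a), 'full' mode."""
    n = len(x) + len(w) - 1
    s = np.zeros(n)
    for t in range(n):
        for a in range(len(x)):
            if 0 <= t - a < len(w):
                s[t] += x[a] * w[t - a]
    return s

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.5])            # a simple smoothing filter
print(conv1d(x, w))
print(np.convolve(x, w))            # should match the naive loop
```

In a CNN layer, the learned filter weights w play the role of the hand-picked filter here, and many such filters are applied in parallel to produce feature maps.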
Since the filters are usually smaller than the input, the number of generated parameters is small; CNNs therefore have sparse (limited) interactions. CNNs also facilitate parameter sharing, since each filter is applied across the entire input. A convolution layer in a CNN contains several convolutional filters that receive the same input from the previous layer to extract different lower-level features. For feature aggregation, these convolution layers are usually pooled or subsampled. Figure 5 provides an example of a CNN architecture with two convolutional layers followed by a pooling layer. When the input data has a simple spatial structure (like image pixels), CNNs are the appropriate approach, but for sequentially organized data (such as time-series data or natural language), recurrent neural networks (RNN) are better suited. The features generated when a CNN is fed single-dimensional sequences are shallow, meaning the feature representations capture only very local relationships between close neighbors [75].

C) Recurrent neural networks (RNN)
RNNs are designed to manage such time dependencies over long spans. An RNN updates its hidden state h_t sequentially, based not only on the activation of the current input x_t at time t but also on the previous hidden state h_{t−1}, which in turn was updated from x_{t−1}, h_{t−2}, and so on, as shown in Figure 7. Thus, after processing a whole sequence, the final hidden state includes information from all the previous steps. The Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, both known as gated RNNs, are the most popular RNN architectures. While standard RNNs consist of interconnected hidden cells, in a gated RNN each unit is replaced by a special cell containing an internal recurrence loop and a gate system that controls the information flow. Gated RNNs have demonstrated better performance when modeling long-term dependencies [75].
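The recurrent update of the hidden state can be sketched in a few lines. This is a vanilla (ungated) RNN step on random toy data, assuming a standard update h_t = tanh(W_x x_t + W_h h_{t−1} + b); LSTM/GRU gating is omitted.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The new hidden state mixes the current input with the previous state.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(2)
W_x = rng.normal(size=(3, 2)) * 0.1   # input-to-hidden weights (2-dim inputs)
W_h = rng.normal(size=(3, 3)) * 0.1   # hidden-to-hidden (recurrent) weights
b = np.zeros(3)

h = np.zeros(3)                        # initial hidden state
sequence = rng.normal(size=(5, 2))     # 5 time steps of 2-dim inputs
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h.shape)
```

Because each step folds the previous state back in, the final `h` summarizes the whole sequence; gated cells (LSTM/GRU) add learned gates to this same loop so that long-range information survives many steps.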

D) Autoencoders (AE)
The autoencoder (AE) is a deep learning model that demonstrates the concept of unsupervised representation learning. Autoencoders were first considered as a tool for pre-training supervised deep learning models, but they remain useful for completely unsupervised tasks like phenotype discovery. Autoencoders convert input data into a lower-dimensional space, called z. The encoded representation is then decoded by reconstructing an estimate of the input x, called x̂. The encoding and reconstruction processes for an autoencoder with one hidden layer are illustrated in Equations 6 and 7, respectively:

z = σ(W x + b)    (6)
x̂ = σ(W′ z + b′)    (7)

where W and W′ are the weights of the encoding and decoding processes; the encoded representation z is considered more accurate when the reconstruction error ‖x − x̂‖ is minimized.
After the AE has been trained, a single input is fed through the network, and the activations of the innermost hidden layer serve as the encoded representation of that input. The main task of AEs is to convert and encode the input data so that only the most significant derived dimensions are represented. In this respect, AEs are similar to traditional dimensionality-reduction techniques such as principal component analysis (PCA) and singular value decomposition (SVD), but they have a major advantage for solving complex problems thanks to the nonlinear transformations applied through the activation function of each hidden layer. Many variants of AEs have been developed, such as variational autoencoders (VAE), denoising autoencoders (DAE) [78], and sparse autoencoders (SAE) [75].

E) Restricted Boltzmann machine (RBM)
The restricted Boltzmann machine (RBM) is another deep learning model that demonstrates the concept of unsupervised representation learning. RBMs are similar to autoencoders in that they estimate the probability distribution of the input data, but they do so in a stochastic manner. RBMs are therefore considered generative models, as they attempt to model the underlying process by which the data were generated.
The energy-based model with binary visible units v and hidden units h, with the energy function defined in Equation 8, is called the canonical RBM [75].
Unlike a standard Boltzmann machine (BM), in which all units are fully connected, an RBM has no connections among its visible units or among its hidden units. It generates the learned representation of the input data in the form of h. RBMs can be stacked hierarchically to build a deep belief network (DBN) for supervised learning tasks.
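As a concrete illustration, the canonical energy function E(v, h) = −aᵀv − bᵀh − vᵀWh and the conditional probability of the hidden units can be written down directly. The dimensions and weights below are toy values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n_v, n_h = 6, 4                             # visible and hidden units
W = rng.normal(scale=0.1, size=(n_v, n_h))  # visible-hidden weights only:
                                            # no visible-visible or hidden-hidden links
a = np.zeros(n_v)                           # visible biases
b = np.zeros(n_h)                           # hidden biases

def energy(v, h):
    """Canonical RBM energy E(v, h) = -a.v - b.h - v.W.h for binary units."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def hidden_probs(v):
    """P(h_j = 1 | v): the stochastic learned representation of the input v."""
    return 1.0 / (1.0 + np.exp(-(b + v @ W)))

v = rng.integers(0, 2, size=n_v).astype(float)
h = rng.integers(0, 2, size=n_h).astype(float)
print(energy(v, h), hidden_probs(v).shape)
```

The restriction to visible-hidden connections only is what makes P(h | v) factorize, so each hidden unit can be sampled independently given v.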

IV. APPLICATION OF MACHINE LEARNING AND DEEP LEARNING TECHNIQUES IN THE BIOMEDICAL NLP DOMAIN
Early EHR analyses were based on simpler and more conventional statistical techniques [79]. Recently, machine learning techniques such as Logistic Regression [80], Support Vector Machines (SVM) [81], the Cox Proportional Hazards Model [82], and Random Forests [83] have been applied to EHR data for mining reliable predictive patterns.
There are critical issues when statistical models are applied to EHR data analyses [84][85][86]. Such issues can be overcome by applying modeling techniques that can analyze and extract complex nonlinear variable interactions [75][87] from each patient's entire medical history, such as mixed and multimodal data obtained at random times [86].
The support vector machine algorithm is the most popular machine learning method applied to medical reports, for example for the prediction of heart disease [88][89], the identification of diabetes in EHR progress notes, and the classification of breast radiology reports according to BI-RADS [90].
The second most popular machine learning method is Naïve Bayes, which has been applied to medical records for the prediction of heart disease [91][92], the classification of smoking status [93], the identification of multiple sclerosis [94], and the classification of EHR records for obesity [95] and cancer [96][20][97].
Conditional random fields (CRFs) are the third most common machine learning method, which has been applied to medical records for the prediction of heart diseases [98] [88], for the identification of diabetes EHR progress notes [99], for breast radiology reports classification [90], and identifying tumor characteristics in radiology reports [100].
Finally, random forests have been used for heart disease prediction, cancer classification [101], and identification of hypertension [102]. Table 2 outlines the most recent biomedical models using machine learning techniques with their major application.
The drawbacks of machine learning methods, namely their difficulty in handling large-scale data, their adoption of several statistical and structural assumptions, and their reliance on hand-crafted features/markers, make such statistical models impractical for analyzing EHR data, despite offering the simplicity and interpretability required for biomedical applications [84][85][86]. Recent breakthroughs in these areas have led to vastly improved NLP models powered by deep learning, a subfield of machine learning [103].
Through the deep hierarchical construction of features and the efficient capture of long-range data dependencies, deep learning techniques have recently achieved significant progress in many fields [75]. A growing number of studies apply deep learning techniques to EHR data for biomedical tasks [62][104], driven by the rapid development of deep learning methods and the increasing volume of patient data; these methods provide enhanced results and require less time-consuming preprocessing and feature extraction than traditional methods.
Modern biomedical NLP systems can identify and model more complex relationships and concepts [105] by using the main deep learning architectures, such as feedforward neural networks (FFNN), convolutional neural networks (CNN), and recurrent neural networks (RNN), which can be applied to the analysis and modeling of EHRs. Vector-embedding approaches are used for data preprocessing by encoding words before feeding them into a model. These approaches recognize that words may have different meanings depending on context (for example, the meanings of "patient," "shot," and "virus" differ with context) and treat them as points in a conceptual space rather than as separate entities. The emergence of transfer learning, which involves taking a model trained to perform one task and using it as the starting point for training on a similar task, has further improved model performance [106].
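The idea of treating words as points in a conceptual space can be illustrated with cosine similarity over toy vectors. The embeddings below are invented for illustration only; real systems learn vectors of hundreds of dimensions from large corpora:

```python
import numpy as np

# Hypothetical 4-dimensional embeddings, invented for illustration.
emb = {
    "patient":  np.array([0.9, 0.1, 0.0, 0.2]),
    "virus":    np.array([0.1, 0.8, 0.3, 0.0]),
    "pathogen": np.array([0.2, 0.7, 0.4, 0.1]),
}

def cosine(u, v):
    """Cosine similarity: closer directions in the space mean more related words."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Related words sit closer together in the conceptual space:
print(cosine(emb["virus"], emb["pathogen"]) > cosine(emb["virus"], emb["patient"]))  # True
```

Contextual embedding models go one step further by producing a different vector for the same word in different sentences, which is how the ambiguity of words like "shot" is handled.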
Convolutional neural networks (CNNs) have performed effectively in a wide range of biomedical NLP tasks, for example: 1) classifying biomedical articles to identify the cancer hallmarks associated with an abstract [107]; 2) learning representations of time expressions for clinical temporal relation extraction [108]; 3) modeling article relevance for the biomedical article retrieval task [109]; 4) identifying protein-protein interaction relations in biomedical reports [110]; 5) extracting drug-drug interactions with an attention mechanism [111]; 6) classifying free-text radiology reports on pulmonary embolism findings [112]; 7) classifying patient portal messages [113]; and 8) recognizing named entities in biomedical text [114].
For automated coding of radiology reports with the International Classification of Diseases (ICD-10) system, CNN models have achieved improved efficiency compared with machine learning classifiers [115]. Inspired by these accomplishments of CNNs in clinical NLP, a semi-supervised CNN architecture has also been used to automatically detect adverse drug events (ADE) in social media, unlike conventional systems [116] that usually employ lexicon- and machine learning-based techniques and depend on expert annotations, requiring large quantities of labeled data to train supervised machine learning algorithms.
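The core CNN operation on text, convolving filters over windows of consecutive token embeddings and then max-pooling over time, can be sketched as follows (toy dimensions and random values, not any specific published model):

```python
import numpy as np

rng = np.random.default_rng(3)
seq_len, emb_dim, n_filters, width = 10, 8, 4, 3

tokens = rng.normal(size=(seq_len, emb_dim))          # an embedded clinical sentence
filters = rng.normal(size=(n_filters, width, emb_dim))
bias = np.zeros(n_filters)

# Convolution: slide each filter over every window of `width` consecutive tokens.
windows = np.stack([tokens[i:i + width] for i in range(seq_len - width + 1)])
feature_maps = np.einsum('wse,fse->wf', windows, filters) + bias   # (windows, filters)

# Max-over-time pooling aggregates each feature map to a single value, giving a
# fixed-size vector for the classifier regardless of sentence length.
pooled = np.maximum(feature_maps, 0).max(axis=0)      # ReLU + max pool -> (n_filters,)
print(pooled.shape)  # (4,)
```

The fixed-size pooled vector is what a final classification layer (e.g., for an ICD code or ADE label) would consume.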
Many clinical events, such as disorders, medications, tests, and adverse drug effects, can be detected from free-text EHR notes by applying Recurrent Neural Network (RNN) architectures [117], as can patient data de-identification from EHRs [118]. Bidirectional RNNs/LSTMs have been successfully applied to several biomedical NLP tasks, such as predicting missing punctuation in medical reports [119], identifying biomedical events [120], modeling relational and contextual similarities between named entities in biomedical articles in order to surface information for appropriate treatment suggestions [121], extracting clinical concepts from EHR reports [122], and recognizing named entities in clinical texts [123]. Recent studies also develop models using embedded graph information for adverse drug reaction detection in social media data [124] by applying a bidirectional LSTM transducer. In conjunction with CNNs, RNNs have been used to develop disease-name recognition models with term- and character-level embedding features [49]. In this section, we have provided an overview of the recent state of the art in biomedical applications resulting from the rapid development of deep learning techniques applied to electronic health records (EHR). Table 3 outlines the most recent biomedical models using deep learning techniques, with their major applications, subtask definitions, and type of input data, according to a logical classification of current research.

V. BIOMEDICAL NLP SYSTEMS
We give an overview of NLP systems and their architecture in this section.

General architecture of the biomedical NLP system
Friedman and Elhadad's discussion [6] illustrates NLP and its different aspects and parts, as shown in Figure 8a.
As shown in Figure 8a, the aspects of NLP can be divided into two parts: the left part consists of the trained corpora, domain model, domain knowledge, and linguistic knowledge, while the right part includes techniques, tools, systems, and applications. Figure 8b provides an overview of the general architecture of an NLP system, which has two primary components: background knowledge, corresponding to the left part of the figure, and a framework incorporating NLP tools and modules, corresponding to the right part. The two primary components of biomedical NLP systems and their tasks are described below, including how NLP tools are incorporated into a pipeline designed on top of a particular framework.
The framework is a software platform for controlling and managing pipeline components (loading, unloading, and handling); its components may be integrated, combined, or used in the system as plug-ins. The framework of a biomedical NLP system has two levels: low-level and high-level processors. Low-level processors carry out basic NLP tasks such as part-of-speech tagging, segment tagging, sentence boundary detection, and noun phrase chunking. High-level processors perform semantic-level processing, such as named entity recognition (e.g., diseases/disorders, signs/symptoms, medications), relationship identification, and timeline extraction.

A. BIOMEDICAL NLP BACKGROUND KNOWLEDGE
The Unified Medical Language System (UMLS)
Biomedical and linguistic knowledge are essential components in the development of biomedical NLP systems. The Unified Medical Language System (UMLS) was developed in 1986 and has been applied to biomedical NLP systems. The UMLS has three key components: the Metathesaurus, the Semantic Network, and the SPECIALIST lexicon. The UMLS can be regarded as an ontology of biomedical concepts and their relationships for practical applications. Furthermore, background knowledge includes domain models and trained corpora that are applied to particular fields like radiology/pathology reports and discharge summaries. Annotated corpora are labeled manually by human annotators and are used to train machine/deep learning classifiers and to evaluate rule-based systems.
The UMLS Metathesaurus currently comprises over one million biomedical concepts and five million concept names, derived from numerous biomedical controlled vocabularies such as RxNorm, MeSH, ICD-10, and SNOMED CT.
The UMLS Semantic Network categorizes all UMLS Metathesaurus concepts consistently according to their semantic types, in order to minimize Metathesaurus complexity. It currently contains 135 main categories and 54 relationships between categories. For example, the "Disease" category has an "associated with" relationship with the "Finding" category, and the "Hormone" category has an "Affects" relationship with the "Disease" category.

FIGURE 8: Biomedical NLP system. (a) Aspects of biomedical NLP systems. (b) A general biomedical NLP system architecture.
The UMLS SPECIALIST lexicon contains information for biomedical terms on their syntax, morphology, and spelling [145]. Currently, it contains over 200,000 biomedical terms and is used for biomedical NLP tasks by the UMLS lexical tools.

B. TOOLS AND FRAMEWORKS OF THE BIOMEDICAL NLP
NLP Tools/Methods: There are two main techniques for the construction of NLP tools. The first is rule-based, mainly focused on rules and dictionary lookup. The second is the machine learning approach, which relies on annotated corpora to train learning algorithms. Early systems often adopted the rule-based approach because its design and implementation were very simple. Currently, many biomedical NLP systems have shifted away from rule-based methods and depend on machine learning approaches, owing to their progress and the growing number of annotated corpora, although new annotated training data can be costly to generate. Machine learning approaches often deliver better results than rule-based methods, as demonstrated in many biomedical NLP challenges. At the same time, many recent NLP systems integrate rule-based and machine learning methods; these are called hybrid systems [6].

NLP Frameworks: A framework may be incorporated into the NLP system itself, or a common general-purpose architecture may be used. GATE (General Architecture for Text Engineering) and UIMA (Unstructured Information Management Architecture) are the two most common generalized architectures, and both are open-source software.
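The rule-based, dictionary-lookup technique mentioned above can be sketched in a few lines. The terms, concept codes, and note text below are illustrative assumptions, not drawn from any real terminology file:

```python
import re

# A toy dictionary-lookup extractor in the spirit of early rule-based systems.
DICTIONARY = {
    "hypertension":      "C0020538",   # hypothetical concept codes
    "diabetes mellitus": "C0011849",
    "heart failure":     "C0018801",
}

def extract_concepts(note: str):
    """Return (term, code) pairs found in a clinical note via exact dictionary lookup."""
    found = []
    lowered = note.lower()
    for term, code in DICTIONARY.items():
        # Word boundaries prevent matching inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.append((term, code))
    return found

note = "Patient has a history of hypertension and congestive heart failure."
print(extract_concepts(note))
# [('hypertension', 'C0020538'), ('heart failure', 'C0018801')]
```

Real rule-based systems add negation handling, abbreviation expansion, and fuzzy matching on top of this basic lookup, which is exactly where hand-crafted rules become expensive to maintain.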
GATE, initially developed in 1995 at Sheffield University, is commonly applied in the NLP domain. It contains basic NLP tools for low-level processing (e.g., tokenizers, sentence splitters, and part-of-speech taggers) packed into a CREOLE wrapper, and a high-level processor for named entity recognition packaged into ANNIE, an information extraction system. It can incorporate existing NLP and machine learning tools such as Weka, RASP, SVM Light, and LIBSVM. GATE has been used as a basis by many clinical NLP systems, such as HITEx and caTIES, for the extraction of cancer information.
UIMA, initially developed by IBM, has belonged to the Apache Software Foundation since 2006. Its objective is to promote the reuse of analytical components and to reduce the duplication of analytical development. The pluggable architecture of UIMA allows analysis components to be easily plugged in and combined with others. IBM's Watson system, which won the 2011 Jeopardy! challenge, was built on the UIMA framework and is its best-known application. The functionality of UIMA is broader than that of GATE, since UIMA can be used to analyze audio/video data in addition to textual data. Several biomedical NLP systems, such as cTAKES, MedKAT/P, and MedEx, use the UIMA framework for extracting cancer-specific characteristics [146][147] and medications.
This section provides a general overview of biomedical NLP system architecture by describing the most significant and influential NLP systems in the biomedical NLP field. Two common systems for extracting UMLS concepts from clinical texts are the Linguistic String Project-Medical Language Processor (LSP-MLP) [148] and the Medical Language Extraction and Encoding System (MedLEE) [149]. The Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES) [150], the Special Purpose Understanding System (SPUS) [151], SymText (Symbolic Text Processor) [152], and the SPECIALIST language-processing system [153] are major systems developed by a few dedicated research groups for maintaining the extracted information in the clinical domain. Another important system widely used in the clinical domain is MetaMap [154]. Among these, MetaMap has proved useful with patients' EHRs for automatically providing relevant health information. Table 4 presents the characteristics of the major biomedical NLP systems discussed in this section.

C. THE ENSEMBLE METHODS FOR THE BIOMEDICAL NLP TOOLS
The ensemble approach improves the portability of biomedical NLP systems by combining the strengths of individual tools. An ensemble is a meta-algorithm that combines various basic models into one predictive model; in several machine learning tasks, this combination has demonstrated superior results [155][156][157].
The ensemble approach has been widely applied to various clinical and biomedical problems such as biomarker identification [158], protein-protein interaction prediction [159], causal molecular network inference [160], and gene-expression-based disease diagnosis [161].
Many studies have explored the ensemble of NLP tools for medical concept recognition.
For example, Torii et al. developed BioTagger-GM by combining the recognition results of individual systems using a voting schema, achieving the best performance in the BioCreAtIvE II challenge for recognizing gene/protein names in the literature [162][163].
Doan et al. demonstrated that ensemble classification, which incorporates single classification models into a voting system, could outperform any single classification model in identifying medical information from clinical text, using the 2009 i2b2 (Informatics for Integrating Biology and the Bedside) challenge datasets [164].
Kang et al. merged two dictionary-based systems with five statistical systems into a simple voting scheme and achieved a third-place finish in the 2010 i2b2/VA challenge to extract medical problems, examinations, and medications [165].
Kuo et al. combined cTAKES and MetaMap to develop an ensemble pipeline that improved the efficiency of NLP tools in extracting clinical data terms, but with high variation depending on the cohort [166].
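The simple voting schemes used in the studies above can be sketched as follows; the tools, label set, and predictions are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-tool labels for one mention; ties go to the first-seen label."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of three concept-recognition tools on the same three mentions.
tool_a = ["problem", "treatment", "test"]
tool_b = ["problem", "problem",   "test"]
tool_c = ["test",    "treatment", "test"]

ensemble = [majority_vote(votes) for votes in zip(tool_a, tool_b, tool_c)]
print(ensemble)  # ['problem', 'treatment', 'test']
```

Weighted variants replace the raw count with per-tool weights (e.g., each tool's precision on a development set), which is one way ensembles exploit the complementary strengths of individual tools.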

VI. LITERATURE REVIEW AND RELATED WORKS
Throughout this section, we discuss some articles and surveys that constitute our literature review, including a list of all related works for applying machine learning to biomedical NLP, especially on chronic diseases.

Fragment of Table 4 (characteristics of the major biomedical NLP systems): the recoverable rows describe a University of Utah radiology system that extracts concepts from findings in radiology reports and codes them to ICD-9, and the SPECIALIST system from the National Library of Medicine (NLM), a part of the UMLS project comprising the SPECIALIST lexicon, the Semantic Network, and the UMLS Metathesaurus.
Diseases classification: About 106 studies have been analyzed, mainly linked to 43 specific chronic diseases. One objective was to clarify the application of NLP and the related clinical notes to particular types of conditions. Therefore, using the International Classification of Diseases, 10th Revision (ICD-10) [19], the 43 specific chronic diseases were classified into ten types of diseases, as shown in Table 5. Figure 9 shows the number of EHR-related publications on chronic diseases per year.

A. DISEASES OF THE CIRCULATORY SYSTEM
A) Cardiovascular diseases
Heart disease is one of the major causes of death, although its prediction and prevention have recently advanced. The identification of risk factors is a necessary first step in predicting and preventing heart disease. Many studies have been proposed to determine heart disease-related risk factors, but none has tried to identify all of them. In 2014, the i2b2 (Informatics for Integrating Biology and the Bedside) National Center for Biomedical Computing released a biomedical NLP challenge that included a track (track 2) for determining risk factors of heart disease in clinical texts over time. The purpose of this track was to classify information on cardiovascular risks and to monitor the quality of historical medical records. It was important to classify the tags and characteristics associated with the existence and development of the disease, risk factors, and medications in inpatient medical histories. Table 6 summarizes the number of papers related to diseases of the circulatory system.

B) Peripheral and coronary arterial disease
Millions of people worldwide are affected by peripheral arterial disease (PAD), a type of chronic disease. NLP algorithms can determine PAD status automatically from clinical notes using predetermined criteria, a task that is labor-intensive and time-consuming when done by manual chart review. Many researchers have used NLP to identify PAD cases and critical limb ischemia in clinical records. NLP is also used in recent genome-wide PAD research to identify medications, diseases, signs/symptoms, anatomical locations, and procedures. Table 7 summarizes the number of papers related to peripheral and coronary arterial disease.

C) Hypertension
Hypertension (HTN) and high blood pressure (HBP) are among the main health problems.
It is estimated that by 2025, the number of adults with hypertension will increase by 60%. HTN is one of the major risk factors for cardiovascular and kidney diseases. Any HTN-relevant patient knowledge has significant application in cohort discovery and in the development of predictive prevention and monitoring models. Most of this important medical knowledge typically takes the form of unstructured clinical records distributed over multiple EHR systems. Extracting patient-relevant information from unstructured clinical notes usually takes considerable resources and time. In particular, manual extraction of HTN information is a significant issue, since HTN information is usually reported in multiple records for a single patient. Another important issue, besides manual extraction, is coding HTN information to standard ontologies like SNOMED CT. Simple clinical text mining techniques can be applied to extract HTN information from unstructured clinical reports. Table 8 summarizes the number of papers related to hypertension disease.

D) Heart failure identification
Heart failure is a chronic disease usually caused by a deficiency in cardiac structure or function. Quick and accurate prediction of heart failure mortality is important for improving patient care and preventing death. However, due to the weak feature representation of heart failure data, predicting death caused by heart failure is a significant challenge for simple models. Table 9 summarizes the number of papers related to heart failure disease.

B. NEOPLASMS
EHRs provide important cancer-related knowledge that can be valuable for biomedical research, since NLP methods allow this knowledge to be extracted and structured. This section discusses many studies related to cancer, such as the identification of multiple cancer types, the extraction of tumor characteristics and tumor-related information, cancer patient trajectories, cancer recurrence, and cancer stage identification. Table 10 summarizes the number of papers related to neoplasms. Regarding breast neoplasms, the EHR system is a significant data source for epidemiologic research. Population-related studies usually use structured EHR data such as diagnosis and procedure codes, but these do not accurately capture some conditions, such as breast cancer recurrence, that are only recorded in unstructured clinical reports. A typical method for extracting information from EHR data is manual processing, which is time-consuming and costly and carries inherent privacy risks, restricting the amount of information available for study. NLP methods can solve this issue by processing unstructured texts, and they have been used as an alternative or a supplement to manual chart abstraction. NLP has been successfully applied to several biomedical applications such as analyzing results from imaging and pathology reports, recognizing persons based on cancer examinations, selecting clinical trials, detecting postoperative surgical complications, and performing pharmacogenomics and translational research. NLP has also demonstrated recent progress in identifying breast and prostate malignancies recorded in pathology reports. In some cases, NLP-based algorithms perform as well as manual processing, or even better. Table 11 summarizes the number of papers related to breast cancer.

TABLE (fragment): Studies on diseases of the circulatory system (task; method; data; performance; limitations):
- Chen et al. [98]: identification of heart disease risk information; a hybrid pipeline combining machine learning and rule-based approaches (SVM, LibShortText, and CRFsuite); the i2b2/UTHealth Challenge; F1-score of 92.68%; the system did not perform very well for coronary artery disease (CAD), obesity status, and smoking status.
- Torri et al. [91]: detection of risk factors for cardiac disease; a hybrid of several ML and rule-based techniques; limitations: the EHR database lacked information that could be critical for model performance, including brain scan imaging; as with all studies based on real-world data, records may be missing; healthcare information in the database was not available until January 2007.
- Garg et al. [169]: automatic classification of the ischemic stroke subtype; machine learning and NLP algorithms; the Northwestern Enterprise Data Warehouse (EDW); Kappa of .25; the method relies on the level of documentation and detail in the EHR, did not include the entire EHR (e.g., cardiac imaging, laboratory, procedures), and excluded CT-based radiology reports to reduce variability in the dataset.
- Kim et al. [170]: identifying AIS patients by automatically classifying brain MRI reports; supervised ML-based NLP algorithms; all brain MRI reports from a single academic institution; F1-measure of 93.2%, accuracy of 98.0%; the text corpus was created at a single institution, and only brain MRI reports with a conventional stroke MRI sequence were included.
- Grechishcheva et al. [171]: a study of risk marker identification; NLP algorithms; Almazov National Medical Research Center; high accuracy; a weak side of the algorithm is its speed: for the current corpus, it took 6,100 seconds to remove marginal parts of speech and short words and to normalize the remaining ones.

VII. OPEN ISSUES AND CHALLENGES
One of the primary healthcare issues, as broadly acknowledged, is the risk of chronic diseases such as cancers, diabetes, and hypertension. Although considerable progress has been achieved in discovering new therapies and prevention methods, the challenge remains, and its magnitude is growing, with significant effects on quality of life and healthcare costs. Therefore, effective strategies and methodologies are required to supplement and extend existing evidence-based therapies and to mitigate the severity of chronic conditions. The secondary use of EHRs for processing patient data, promoting medical research, and enhancing clinical decision making is a promising path. Methods based on EHR processing and modeling contribute to a better understanding of patients' clinical trajectories and to improved patient stratification and risk prediction. Machine learning, and especially deep learning, applied to EHRs enables the effective extraction of previously unknown clinical knowledge. The longitudinal structure of chronic diseases provides a broad continuous stream of data that can reveal useful clinical trends and direct clinical decisions in a way that delays or avoids the onset of disease.

TABLE (fragment): Studies on peripheral/coronary arterial disease and hypertension (task; method; data; performance; limitations):
- Afzal et al. [63]: identification of critical limb ischemia using NLP; an NLP algorithm for PAD identification; the Mayo clinical data warehouse; CLI-NLP F1-score of 90%; the algorithm used data for a PAD cohort from a single medical center, and future studies should apply and validate it at other institutions to make the findings generalizable.
- Afzal et al. [172]: identifying PAD cases from narrative clinical notes; an NLP algorithm; the Mayo clinical data warehouse; accuracy of 91.8%; data were retrieved from the data warehouse of a single academic medical center.
- Leeper et al. [64]: no association was observed between the use of Cilostazol and any serious cardiovascular adverse event, including stroke; the study could have missed comorbidities due to false negatives from lower sensitivity (73%), and the outcome measures may not have captured events occurring outside the hospital or leading to hospitalizations at other institutions.
- Buchan et al. [89]: automatic prediction of coronary artery disease from clinical narratives; Naive Bayes, MaxEnt, and SVM; the 2014 i2b2 Heart Disease Risk Factors Challenge dataset.
- (unattributed hypertension identification study): random forests using billing codes, medications, vitals, and concepts had the best performance, with a median area under the receiver operating characteristic curve (AUC) of 0.976; portability was evaluated at only a single additional site, and other institutions may differ from both Vanderbilt and Marshfield Clinic; the algorithm also did not detect the date of onset of hypertension, which could be clinically interesting in a number of circumstances.
Because of the various difficulties involved in the production of clinical reports, progress in biomedical NLP research is sluggish, lagging behind progress in general NLP. The main challenges to the development of biomedical NLP are that access to shared data is very difficult, the annotated datasets available for training and benchmarking are insufficient, annotation agreements and standards are inadequate, reproducibility is formidable, partnerships are restricted, and user-centered development and scalability are missing. The i2b2/VA Challenge shared tasks tackle these obstacles by providing participants with annotated datasets for potential solutions.
The following summaries of heart-failure NLP studies illustrate both the performance achieved and the remaining limitations:
- A framework based on FRDLS (feature rearrangement based deep learning system) achieves the best performance for Heart Failure (HF) mortality prediction; extensive experimental comparisons with other machine learning methods show that it has the highest average accuracy. However, its performance relies on two hyper-parameters that require some time to adjust, and the prediction label is binary ("death" or "alive"), which simplifies the complex HF problem.
- Topaz et al. [177] identify patients suffering from heart failure with ineffective self-management status, using the Partners Healthcare System EHR data (F-measure 86.3%, precision 95%, recall 79.2%). The system was evaluated with only one type of note (discharge notes) in one healthcare system, and the small number of positive cases in the testing data limited the assessment of performance across different HF self-management domains.
- Garvin et al. [178] automate heart failure quality measures using NLP with the CHIEF (Congestive Heart Failure Information Extraction Framework) system. First, some clinical information was likely not documented in the patient charts and therefore could not be captured by the NLP system. Second, although CHIEF performed well on VA text notes, it might not perform as well in non-VA settings. Third, documents from only eight medical centers were used, so CHIEF might underperform initially on documents from other VA medical centers.
- Vijayakrishnan et al. [65] recognize signs and symptoms of HF in EHRs through EMR data extraction over the Geisinger Health System EHR data. A total of 892,805 affirmed criteria were documented over an average observation period of 3.4 years; among eventual HF cases, 85% had ≥1 criterion within 1 year before their HF diagnosis, as did 55% of control subjects. Unlike the Framingham investigators, the application could not accurately account for other, non-cardiac causes of minor signs and symptoms, and the documentation of HF signs and symptoms varied across clinicians.
Beyond such individual limitations, the development of biomedical NLP faces several issues and challenges in processing clinical notes for chronic disease detection; these challenges remain to this day, as presented in [53], [106]:
1) Domain knowledge: Adequate knowledge of the domain is the most important requirement for an NLP researcher involved in developing systems and methodologies for processing biomedical records. The primary importance of domain knowledge stems from the fact that the output of the system is made available for application in healthcare; thus, the system is always required to have sufficient recall, accuracy, and F-measure for the intended biomedical application, with the necessary performance tuning. Interestingly, NLP techniques can themselves be applied to capture the domain knowledge available in free text: an NLP approach for the automated capture of ontology-related domain knowledge, for example, uses a two-phase methodology that extracts terms from the linguistic representations of concepts in the first phase and extracts semantic relations in the second.
2) Confidentiality of the biomedical text: A sample of training data is required to develop and evaluate an NLP system; in a clinical context, the training dataset is a vast collection of electronic patient records in textual form. In the United States, the privacy of patient data is protected by the Health Insurance Portability and Accountability Act (HIPAA), so personal information must be de-identified to make the records accessible for research purposes. However, automated recognition of details such as names, addresses, and telephone numbers is a highly challenging task, which often needs manual review.
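Rule-based de-identification of this kind can be sketched in a few lines; the patterns below are deliberately simple illustrations (real systems, such as those developed for the i2b2 de-identification challenge, use far richer rule sets and name dictionaries):

```python
import re

# Illustrative patterns for a few HIPAA identifier types; real rule-based
# de-identifiers combine many such patterns with dictionaries of names.
PHI_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}

def deidentify(text: str) -> str:
    """Replace matched PHI spans with bracketed category placeholders."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Pt seen on 03/14/2021, MRN: 483920, call 555-123-4567 with results."
print(deidentify(note))
# → Pt seen on [DATE], [MRN], call [PHONE] with results.
```

Patterns like these are precise but brittle, which is why rule-based systems are often complemented by machine-learning recognizers and a final manual review.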
The identification of the eighteen personal information identifiers, i.e., the protected health information (PHI) that must be excluded from clinical reports as required by HIPAA, is very complicated and time-consuming. In 2006, the i2b2 de-identification challenge made the most significant effort to develop and evaluate automated de-identification. The available approaches to de-identification include (1) rule-based methods that use dictionaries and manually crafted rules to match PHI patterns, and (2) machine-learning methods that learn to recognize PHI from annotated examples.
The cancer-oriented studies summarized below show similar limitations:
- In the extraction of TNM mentions (tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M)), regular expressions were more robust, but the range of features used with the CRF (Conditional Random Fields) classifier was still limited; improvements might be observed with more sophisticated feature patterns such as character N-grams, and other machine learning algorithms could yield better performance than CRF, which requires further investigation.
- Si Y. et al. [41] (frame-based extraction): a limitation of this study is that its evaluation was limited to the gold standard for each process in the pipeline instead of a multi-step evaluation and optimization of the pipeline.
- Datta [181] presented a literature review of biomedical NLP related to cancer; the most common scope of the papers (36 of 78) was cancer diagnostics, with recent work on the extraction of treatment information and the diagnosis of breast cancer. Given the rapid pace of NLP development and publication, papers meeting the inclusion criteria were likely omitted, specifically those published from late 2018 onward; a few extracted information types might be missing or misrepresented in the final frame list; and there may be inconsistencies in assigning the frames and associated elements across papers.
- A study of breast cancer recurrence noted that its NLP modules may require adaptation to accommodate the language usage and document sectioning of other institutions; that NLP development costs limit applicability to large or repeated tasks where it is cost-effective relative to 100% manual abstraction; that NLP requires access to machine-readable clinical text and does not work with print documents or their scanned copies; that the study cohort was limited to women with early-stage (I or II) breast cancers, so the algorithm has not been tested for recurrence in women with initial late-stage disease or ductal carcinoma in situ (stage 0); and that reference standard corrections were limited to the review of charts where NLP and the reference standard were discordant.
- Castro et al. [90]: an important potential limitation was the use of the BROK software to sample by BI-RADS category during the development of the corpus used to build the machine-learning-based annotator. For the BI-RADS token annotator, generalizability may also be limited because the detection step strongly relies on the accurate performance of the pre-processing steps. Since the bulk of the text is automatically generated by the reporting system, variability in narrative reporting style was not represented, which could make the results look better than they would on reports not generated by that system; if the radiologist edits the text report, corresponding changes are not made in the structured database; and although the regular expressions used seemed to work well for detecting BI-RADS descriptors, this approach is specific to BI-RADS and might not work as well if radiologists use other terminologies in reporting mammograms.
4) Diverse formats: Many formatting issues complicate the biomedical text, especially patients' medical reports: (1) the clinical text often contains information in a free-text format such as a pseudo table, i.e., text intentionally made to appear as a table; although the contents of a pseudo table are easy for a human to interpret, identifying the formatting features is complicated for a general NLP system.
(2) Although report sections and subsections are important to many applications, the section headers are often ignored or merged with similar headers. (3) Another issue often found in the clinical text is missing or incorrect punctuation; e.g., a new line may be used instead of a period to indicate the end of a sentence. The Clinical Text Architecture (CTA), which tries to define criteria for the structure of clinical reports, effectively addresses the issue of the different formats of clinical text.
5) Expressiveness: The language of the biomedical domain is hugely expressive. There are many ways to describe the same medical concept; e.g., cancer can be expressed as a tumor, lesion, mass, carcinoma, metastasis, neoplasm, etc. Likewise, the modifiers of a concept can be described with many different terms; e.g., the modifiers for certainty information would match more than 800 MedLEE lexicon entries, making the retrieval process more complicated.
6) Intra- and interoperability: A biomedical NLP system is expected to work well in various healthcare and biomedical applications and to be easily integrated into a biomedical information system. In other words, the system needs to handle biomedical text in different formats; for example, discharge summaries, diagnostic reports, and radiology reports all have different formats. The output of the NLP system can also be stored in the clinical database, but mapping it onto the clinical database schema is very difficult because of the complexity and nested relationships of the output. Additionally, the output of the NLP system must be comparable across a variety of automated applications through widespread deployment across institutions; to achieve this, the output must be mapped onto a standardized vocabulary system such as UMLS, ICD-10, or SNOMED-CT, and onto a standard domain representation.
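The expressiveness and vocabulary-mapping problems described above are commonly handled by normalizing surface variants to a canonical concept before storage or comparison. A minimal sketch, in which the concept identifier and synonym table are invented for illustration (real systems consult UMLS or SNOMED-CT):

```python
# Map surface forms to an illustrative canonical concept identifier.
# "C-NEOPLASM" is a made-up ID; real systems use UMLS CUIs or SNOMED codes.
SYNONYMS = {
    "cancer": "C-NEOPLASM",
    "tumor": "C-NEOPLASM",
    "tumour": "C-NEOPLASM",
    "lesion": "C-NEOPLASM",
    "mass": "C-NEOPLASM",
    "carcinoma": "C-NEOPLASM",
    "neoplasm": "C-NEOPLASM",
}

def normalize(term: str):
    """Return the canonical concept ID for a surface term, if known."""
    return SYNONYMS.get(term.lower().strip())

print(normalize("Carcinoma"))  # → C-NEOPLASM
print(normalize("mass") == normalize("tumour"))  # → True
```

Once every variant resolves to one identifier, downstream retrieval and database mapping no longer depend on which synonym the clinician happened to write.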
Finally, it is considered essential to interpret biomedical information and the relationships between concepts in order to construct a representational model; for example, "treats" is one of the relationships between a drug and a disease.
7) Interpreting information: Interpreting the clinical information available in a report requires knowledge of the report structure and additional medical knowledge to associate the findings with possible diagnoses. The complexity of interpretation depends on the type of report and section; e.g., it is easier to extract which vaccination was administered than to extract information from a radiological report describing patterns of light (patchy opacity). An NLP system that maps such light patterns to specific diseases must contain medical knowledge related to the findings.
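As a toy illustration of the medical knowledge such an interpreting system must hold, the following sketch links report findings to candidate diagnoses; the knowledge table is invented and drastically simplified, not a clinical resource:

```python
# Invented, highly simplified knowledge table linking radiological findings
# to candidate diagnoses; real systems encode far richer, curated knowledge.
FINDING_TO_DIAGNOSES = {
    "patchy opacity": ["pneumonia", "pulmonary edema"],
    "pleural effusion": ["heart failure", "infection"],
}

def candidate_diagnoses(report_text: str) -> set:
    """Collect candidate diagnoses for findings mentioned in a report."""
    text = report_text.lower()
    hits = set()
    for finding, diagnoses in FINDING_TO_DIAGNOSES.items():
        if finding in text:
            hits.update(diagnoses)
    return hits

print(candidate_diagnoses("Chest X-ray shows patchy opacity in left lower lobe."))
```

The point is structural: without some table of this kind, however it is learned or curated, a system can extract the finding but cannot interpret it.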
Despite recent advances and developments, the following limitations have affected the use of NLP technology [106]: 1) The availability, consistency, and characteristics of the training data: The availability, consistency, and characteristics of the training data are essential for building NLP models [186]. For the training and deployment of effective NLP models, access to appropriately annotated datasets is very important. For example, designing NLP algorithms that can systematically synthesize published research on a specific topic, or analyze and extract data from EHRs, requires unrestricted access to publisher databases or primary care/hospital databases. While the number of publicly available biomedical datasets and pre-trained models has increased in recent years, the availability of public health concepts is still restricted [187].
2) The ability to de-bias data: The ability to de-bias data, i.e., to inspect, explain, and ethically modify it, is an important issue for training and using NLP models in the healthcare domain. If data biases are not taken into consideration in the development (e.g., data annotation), deployment (e.g., use of pre-trained platforms), and evaluation of NLP models, their results can be compromised [188]. Even when datasets and assessments are adjusted for biases, however, an equitable effect across all morally relevant groups is not guaranteed. For example, when using health data from social media, the particular age and socioeconomic groups that use each site must be taken into account: a monitoring system trained on Facebook data could be biased toward the health and linguistic issues of people older than those represented in Snapchat's data [189]. Recently, several model-agnostic tools have been developed to assess and correct unfairness in machine learning models, in line with government and academic efforts to identify unacceptable developments in AI [190][191][192][193][194].
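As a small illustration of what inspecting a model for bias can look like in practice, a model-agnostic check can compare a model's accuracy across demographic groups; the labels, predictions, and groups below are invented:

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Per-group accuracy; large gaps flag a potential data or model bias."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Invented example: the model is much less accurate for group "B".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "B", "B", "B", "A", "B"]
print(accuracy_by_group(y_true, y_pred, groups))
# → {'A': 1.0, 'B': 0.25}
```

A gap of this size would prompt a closer look at how the under-served group is represented in the training data before the model is deployed.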
3) Limited access to datasets: Limited access to data is currently a major issue that hinders the progress of NLP systems in the healthcare domain [6] [19]. In Canada, for example, health data are generally regulated regionally, and there is reluctance to provide unrestricted access to these systems or to integrate them with other datasets, owing to security and confidentiality issues (e.g., data linkage). Public perception of privacy and data access has also raised critical issues.
A recent study of social media users revealed that most people found the analysis of their social media data to detect mental health problems "intrusive and exposing" and unacceptable [195]. Before key public health NLP activities, such as real-time analysis of national disease patterns, can be carried out, jurisdictions must collectively define a reasonable scope of, and access to, public health data sources (e.g., EHR and administrative data). Future NLP applications that analyze personal EHRs will rely on the ability to incorporate privacy guarantees into models, both during and after training, to avoid privacy breaches and data misuse [196]. The current methods for accessing full-text publications also often restrict access to essential data; full automation and synthesis of PICO-specific information requires unlimited access to journal databases or new data storage models [197]. Available clinical datasets include MIMIC-II, the Informatics for Integrating Biology and the Bedside (i2b2) datasets, PhenoCHF, Temporal Histories of Your Medical Event (THYME), and Cancer Deep Phenotype Extraction (DeepPhe).

4) The assessment and evaluation of NLP models
Finally, as with any emerging technology, the validation and evaluation of NLP models must ensure that they operate as expected and keep up with society's changing ethical views.
NLP technology must be tested to confirm it performs as intended and to take bias into account [198]. Although many methods today report scores equal to or better than human performance on textual analysis tasks, it is important not to equate high scores with a real understanding of language. Conversely, the lack of true language understanding should not be mistaken for ineffectiveness: models with a relatively shallow depth of understanding can still be highly effective at information extraction, classification, and prediction tasks, particularly with the increasing availability of labelled data.
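Because precision, recall, and the F-measure recur throughout the evaluations discussed in this review, it is worth recalling how they are computed from an extraction system's true positives, false positives, and false negatives (the counts below are illustrative):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard evaluation metrics for information extraction."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative counts: 95 correct extractions, 5 spurious, 25 missed.
p, r, f = precision_recall_f1(tp=95, fp=5, fn=25)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
# → P=0.950 R=0.792 F1=0.864
```

The F-measure is the harmonic mean of precision and recall, so a system cannot inflate it by trading one entirely for the other; this is why it is the headline figure in most extraction evaluations.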

VIII. CONCLUSIONS AND FUTURE RESEARCH ISSUES
In this review paper, we have presented an overview of NLP in general and of NLP in biomedicine and healthcare, covering its methods, technologies, and potential tasks and use cases in the biomedical and healthcare domains. We then presented the application areas of machine learning and deep learning in biomedical NLP and provided an overview of the most popular biomedical NLP systems and their general architecture. Next, we reviewed the literature on applying various NLP techniques to narrative clinical notes on chronic diseases, including an analysis of the difficulties NLP methodologies face in clinical narrative comprehension. Finally, we concluded by describing the existing challenges and open issues in processing biomedical and clinical text, and the resources and opportunities the NLP domain needs to develop new methodologies. We have discussed essential challenges such as domain knowledge, the confidentiality of clinical texts, abbreviations, diverse formats, expressiveness, intra- and interoperability, and information interpretation. These discussions provide an opportunity to understand the complexity of clinical text processing and the various approaches available. An important area of research related to these challenges is the development of methodologies for processing the diverse formats of clinical text; each format, on its own, is a challenge for NLP researchers and can be explored with traditional and hybrid methodologies. Our review has shown that biomedical NLP methods need to move beyond the extraction of clinical terms to concentrate more on the interpretation of concepts (i.e., not only understanding the relationships between concepts but also combining clinical data, domain knowledge, and general knowledge in the reasoning process).
In conclusion, NLP provides powerful methods for unlocking information about chronic diseases from unstructured clinical narratives. Despite the development of new standards and better encoding of EHRs with clinical terminology standards, a narrative aspect remains, which makes biomedical NLP methods essential for clinical research informatics. A wide range of techniques and models has been applied to the biomedical literature, and all of these NLP techniques can be applied to mine EHRs effectively in support of essential clinical research activities. New deep learning techniques have brought significant progress across various tasks and will be increasingly adopted to analyze big EHR data effectively and efficiently, further advancing disease management, quality improvement, and all aspects of clinical research.