Automated Triaging Medical Referral for Otorhinolaryngology Using Data Mining and Machine Learning Techniques

Public hospitals receive and triage a large volume of medical referrals for otorhinolaryngology annually, and it can be a challenge to derive knowledge from them as they are written in unstructured text and may be unavailable in electronic formats. Acquiring knowledge and insights from these referrals is important to public health management and policymakers. Triaging of general practitioner (GP) referrals for ear, nose, and throat (ENT) specialists is a manual process performed by experienced clinicians, but it is time-consuming. This paper proposes utilising machine learning and data mining to automate the triaging of referrals. In this study, an ensemble of machine learning algorithms was proposed and implemented to perform clinical text mining on the unstructured referral text and derive the relationships among the discovered medical terms. A comprehensive set of term-set association rules describing the characteristics of the entire referral dataset was obtained from the association rule mining experiments. A neural network-based text classification model that can classify referrals with high accuracy was developed, tested and reported in this paper.


The associate editor coordinating the review of this manuscript and approving it for publication was Mu-Yen Chen.

I. INTRODUCTION
Patients are normally referred to specialists when the general practitioner (GP) requires further diagnostic support or is not able to provide specialised care for them. Another purpose of the referral process is to support the GP in managing the care that the patients have received and to keep track of their progress [1], [2]. Every year, millions of medical referrals are written by GPs in the state of Queensland to refer their patients to other medical professionals for specialised treatment [2]. Healthcare organisations such as hospitals and medical centres triage patients' medical conditions based on the referrals to ensure that they receive relevant and timely care and to avoid care duplication and untreated illness, thus minimising cost and resources [2], [3]. These referrals are designed to help patients obtain the appropriate level of care. They describe the patient's current conditions, including medical history and list of medications [4].
However, not all specialists are within the GP's consideration when they decide to refer a patient. Each GP has their own referral circle: specialists that they are familiar with or have worked with in the past. This can create an element of bias, in that the referrer may ascribe a particular diagnosis to a set of symptoms or may assign a particular level of urgency to the referral due to patient pressures. The objective of triaging is to get the right patient to the appropriate specialist the first time. Triaging of referrals is performed by certified clinicians (usually a nurse or doctor), who sort and allocate patients to specialists following a system of priorities in order to maximise the quality of care that they receive and their survival rate, whilst minimising delays and cost [5]. The triaging clinicians decide which referrals should be prioritised by examining the level of urgency and the pathway for the patients [6], [7]. Cases that are deemed to be clinically serious or severe receive higher priority on the triage list, and those patients therefore see the specialist sooner. Most health organisations that employ the triaging process categorise the referrals into Category 1 (urgent), 2 (moderate urgency) and 3 (non-urgent). Medical organisations employ a large group of clinicians and systems to manage the triaging process.
The majority of the referrals are sent as electronic documents such as Word or Portable Document Format (PDF) files, and they are written in free text by the GPs, describing the patient's condition or the challenges in managing the presentation [2]. The processing of GP referrals has been computerised, with referrals arriving in digital formats such as PDF and scanned images. However, the triaging procedure is still performed by clinicians, who review each document manually [1]. Every GP has their personal style of writing referrals, and there is wide variation in the content and structure of the referrals [3]. Each clinician who triages referrals also has a different level of professional experience and familiarity with the clinical operation [5]. While there are governing policies and procedures for triaging, it is not unusual to find variations in the clinicians' approaches to triaging, along with associated biases [3], [6], [7].
Besides triaging, healthcare management also needs to gain clinical insights into the large volume of medical referrals that are received yearly across all medical specialties [3]. Examples of the knowledge that can be acquired include the medical entities described in the referrals and in the different triaged groups, the associative relationships among them that constitute the referred contents for the patients, and their frequency of occurrence [6]. This allows healthcare management and policymakers to appreciate the variation in the contents of the referral dataset as well as in the triaging expertise of the clinicians' teams [2]. Several potential benefits can be gained from achieving this: variance can be reduced and biases can be adjusted for when calibrating the criteria and policy that govern the triaging process, and the triaging clinicians' judgments can be gauged as a collective [3].
While conventional Natural Language Processing (NLP) can mine contents from unstructured text, it is usually not equipped to handle medical content. Therefore, new medical-specific branches of NLP have been developed in recent times to handle the Unified Medical Language System (UMLS) [7]. The medical referrals contain medical terms which detail information about the patient's condition, history, illness, location, interventions already performed, and medications prescribed. These terms can be represented as codes from the International Classification of Diseases (ICD) [5]. The diagnoses are written in unstructured text formats [6], which makes it challenging to extract useful content. Encoding the terms in this way is a form of feature engineering that can also open other avenues to pursue clinical insights, such as outlier detection among the referrals [8]. Furthermore, conventional text classification is unable to score effectively against ICD due to the characteristics of the medical text contents.
The process of triaging involves several stages. Firstly, the data is extracted from the referrals, which arrive in numerous formats ranging from binary files such as PDFs to physical copies sent via fax or mail. Secondly, the content is reviewed to ascertain the validity, domain, and relevancy of the referrals. Thirdly, the referrals are assessed against a set of rules to determine their urgency [7]. Finally, pathways or services are allocated: the triaged referrals are assigned to a pathway following the clinicians' judgment. The model of the triage process is reviewed manually by the clinicians periodically to ensure that it is up to date and reflects contemporary practices [6], [7].
The goal of this research is to help public health policymakers to be better informed and gain better insight into the landscape of triaged referrals, so that policies can be improved through effective quality control over the triaging processes. The specific objectives of this research are threefold. Firstly, to develop a novel approach to triaging GP referrals. Secondly, to acquire knowledge and insights from the large volume of referrals about the medical entities, and the associative relationships among them, that occur in the various categories of triaged referrals, and to classify the referrals automatically. Thirdly, to apply the concept of outlier detection to identify referrals that are considered outliers in each triaged group.
To achieve these goals, this paper proposes a novel approach to mining and classifying the large pool of GP referrals and acquiring insights into their contents using a combination of a medical-capable NLP service [1], association rule mining [3] and dimensionality reduction combined with outlier detection [8], [9]. For the dataset, we obtained a group of GP referrals from the medical speciality of otorhinolaryngology. One of the main medical NLP services used for this research is AWS (Amazon Web Services) Comprehend Medical [10], with which we extract the notes' entities and encode them into proper terminology [4].
This research makes several significant contributions: 1. The application of machine learning/deep learning to real world applications (medical referrals) in order to acquire a high level of accuracy with evidence. 2. Support of the Queensland public health system in the potential application of machine learning/deep learning to provide decision support to clinicians. 3. Establishment of a novel approach that enables clinicians and policymakers to appreciate the landscape of medical needs and deficits by creating useful clinical insights with the use of data mining and machine learning.
44532 VOLUME 10, 2022
The rest of the paper is organized as follows: Section II briefly presents a summary of related works including technologies and algorithms. Section III describes the design of the three models, covering the proposed approach. Section IV discusses the study dataset. The empirical results and analysis are reported in Section V. The last section provides a conclusion of this study with suggestions on future enhancements and potentials.

II. RELATED WORKS
To achieve the goal of this research, three AI-based models are proposed: Referral Classification, Referral Insight and Outlying Referral Detection. In this section, we first give a brief description of the data mining, machine learning/deep learning, outlier detection and medical NLP techniques used for building these three AI-based models, and then summarise some recent works on medical referrals.

A. ASSOCIATION RULES MINING
Association Rules Mining (ARM) is a data mining procedure that discovers relationships among the items in a dataset in the form of rules, each of which is scored by some measure of interest. It is commonly known as Market Basket Analysis. According to Agrawal et al. [14], the interest of a rule between two items, X and Y, from a dataset, N, can be measured by its Support, Confidence and Lift. For the rule X → Y, the Support indicates the frequency with which the itemset occurs in the dataset, the Confidence is the proportion of records containing X in which the rule holds (that is, which also contain Y), and the Lift is the ratio of the rule's observed support to the support expected if X and Y were independent.
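The three measures can be illustrated with a minimal sketch. The function names and the toy referrals below are assumptions made for illustration only; they are not the study's code or data.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: Transaction, transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: Transaction, y: Transaction, transactions: List[Transaction]) -> float:
    """Of the transactions containing X, the fraction that also contain Y."""
    return support(x | y, transactions) / support(x, transactions)


def lift(x: Transaction, y: Transaction, transactions: List[Transaction]) -> float:
    """Observed support of X and Y together, over the support expected
    if X and Y occurred independently."""
    return support(x | y, transactions) / (
        support(x, transactions) * support(y, transactions)
    )


# Toy data (made up): each referral is the set of ICD-10-CM codes found in it.
referrals = [
    frozenset({"J34.89", "H91.90"}),
    frozenset({"J34.89", "H91.90", "I10"}),
    frozenset({"J34.89"}),
    frozenset({"I10", "E11.9"}),
]
x = frozenset({"J34.89"})
y = frozenset({"H91.90"})

rule_support = support(x | y, referrals)
rule_confidence = confidence(x, y, referrals)
rule_lift = lift(x, y, referrals)
```

On this toy data the rule J34.89 → H91.90 has a support of 0.5, a confidence of about 0.67 and a lift of about 1.33; a lift above 1 suggests that the two codes co-occur more often than independence would predict.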
The Apriori algorithm was proposed by Agrawal. It takes a bottom-up approach, first identifying the individual items in the dataset whose frequency meets a given threshold of C transactions [14]. The algorithm then extends the frequent itemsets incrementally, one item at a time (a step known as candidate generation), and repeats the process of testing the groups of candidates against the dataset to discover more rules.
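The bottom-up search can be sketched as follows. This is a simplified textbook version written for illustration, with the threshold expressed as a minimum support fraction rather than a transaction count; the function name `apriori` and the toy data are assumptions, and this is not the implementation used in the experiments.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)

    def sup(candidate: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    scored = {frozenset([item]): sup(frozenset([item])) for item in items}
    current = {c: s for c, s in scored.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Test the surviving candidates against the dataset.
        scored = {c: sup(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Toy data (made up): each referral is a set of ICD-10-CM codes.
referrals = [
    frozenset({"J34.89", "H91.90"}),
    frozenset({"J34.89", "H91.90", "I10"}),
    frozenset({"J34.89"}),
    frozenset({"I10", "E11.9"}),
]
frequent_itemsets = apriori(referrals, min_support=0.5)
```

With a minimum support of 0.5, the toy data yields three frequent single codes and one frequent pair (J34.89 with H91.90). Rules are then derived from the frequent itemsets and ranked by confidence and lift.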

B. NEURAL NETWORK AND DEEP LEARNING
Neural networks are a group of machine learning algorithms that mimic the human brain to recognize the underlying relationships in a set of data by learning the weights on their neurons so as to minimize a loss function [11], [12]. A neural network consists of layers of interconnected nodes known as perceptrons. Each perceptron receives multiple input signals and passes them through an activation function, which produces an output that is either fed into other perceptrons or returned as the result. Deep learning is an implementation of a neural network in which multiple layers of perceptrons are interconnected: the input layer receives the input data patterns while the output layer produces the corresponding classification or output signals. The layers between the input and output layers are referred to as hidden layers, and the weights within the layers are optimized until the neural network's error is minimal [11], [12].
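The flow of signals through the layers can be sketched with a forward pass through a very small network. The weights, layer sizes and activation functions (ReLU in the hidden layer, softmax at the output) are hypothetical choices made for illustration; the paper does not specify them, and training of the weights is omitted.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(values: Sequence[float]) -> List[float]:
    top = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Pass the input through every layer; the last layer yields class probabilities."""
    signal = list(inputs)
    for i, (weights, biases) in enumerate(layers):
        signal = dense(signal, weights, biases)
        signal = softmax(signal) if i == len(layers) - 1 else relu(signal)
    return signal


# Hypothetical hand-picked weights: 4 inputs -> 2 hidden neurons -> 3 classes.
layers: List[Layer] = [
    ([[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.1, 0.4]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.0, 0.5], [-0.5, 1.0]], [0.0, 0.0, 0.0]),
]
probabilities = forward([1.0, 0.0, 1.0, 0.0], layers)
predicted_class = probabilities.index(max(probabilities))
```

In a trained model the weights would be learned by minimizing a loss function over the training set instead of being fixed by hand.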

C. UNIFORM MANIFOLD APPROXIMATION AND PROJECTION (UMAP) AND LOCAL OUTLIER FACTOR (LOF)
Uniform Manifold Approximation and Projection (UMAP) is a dimensionality reduction algorithm that is based on manifold learning techniques and draws on Riemannian geometry and topological data analysis [8]. The GP referral dataset is represented as a matrix and is considered high-dimensional data that is difficult to process [8], [13]. Such a high-dimensional dataset must be reduced to a lower dimension to mitigate the curse of dimensionality and make analysis easier. UMAP's approach is to model the data as lying on a non-linear manifold embedded in a space of higher dimension; the data is then visualized in a space of lower dimension once the manifold is reduced to a lower dimension [8], [13]. It is a neighbour-graph algorithm that defines fuzzy simplicial sets using local metric spaces, which are combined to form a single large topological structure. A k-value is provided by the user to balance the representation of local and global structure; the local metric spaces are calculated from the distance of each data point to its k-th nearest neighbour [8]. This is followed by the embedding process, in which a low-dimensional layout is built by minimizing the fuzzy set cross-entropy between the layout and the structure's topology, so that the largest edge weights are matched by the smallest pairwise distances available in the layout, and this process is reversible [8], [13].
Once the data's dimensionality is reduced, the next step is to perform the outlier detection. For this, the Local Outlier Factor (LOF) algorithm is used [9], [14]. The LOF algorithm computes the local density deviation of each data point with respect to its neighbouring data points and regards those data points that have a much lower density than their neighbours as outliers. It iterates through all the data points and finds each point's k-distance from its neighbouring data points, measured in Manhattan distance, which is then used to derive its local reachability density (LRD) [9], [14]. The LRD values of the surrounding data points are compared with the point's own LRD to determine its local outlier factor (LOF). The data points with the highest LOF values are then considered outliers [9], [14].
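The LOF computation described above can be sketched in plain Python. This is a textbook formulation written for illustration, using Manhattan distance as in the text; the function names and the toy points are assumptions, and the sketch does not handle more than k exact duplicates of a point (which would give a zero reachability sum).

```python
from typing import List, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def lof_scores(points: List[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of every point; requires k < len(points)."""
    n = len(points)
    k_distance: List[float] = []
    neighbours: List[List[Tuple[float, int]]] = []
    for i in range(n):
        ranked = sorted(
            (manhattan(points[i], points[j]), j) for j in range(n) if j != i
        )
        kd = ranked[k - 1][0]  # distance to the k-th nearest neighbour
        k_distance.append(kd)
        # The k-distance neighbourhood keeps ties at exactly the k-distance.
        neighbours.append([(d, j) for d, j in ranked if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd: List[float] = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean LRD of the neighbours divided by the point's own LRD.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Toy 2-D embedding (made up): four clustered points and one distant point.
embedded = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = lof_scores(embedded, k=2)
most_outlying = scores.index(max(scores))
```

The four clustered points score 1.0, meaning their density matches that of their neighbours, while the distant point scores far above 1 and is flagged as the outlier.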

D. MEDICAL NATURAL LANGUAGE PROCESSING
Medical NLP has been significantly improved by many researchers, and especially by cloud-based service providers, to decipher large volumes of unstructured medical text [15]. Sources of such medical text are ubiquitous among medical centres and hospitals. Medical NLP not only acquires the specific information but also links it to medical ontologies such as ICD and UMLS in order to serve other medical services such as clinical insights, finance and insurance, or to serve as feature engineering that supports other ensembled machine learning models for specific clinical applications [17]. One of the common uses of medical NLP is in the triaging of patients' conditions in places such as emergency departments and specialist clinics [1], [18]. Other uses include the creation of medical intelligence that enables clinicians to understand their patient groups' conditions and health status [19], [20].

E. SUMMARY OF RECENT MEDICAL REFERRAL WORKS
Researchers have applied machine learning algorithms to a range of the medical disciplines in the healthcare sector [21]- [23].
A summary of recent studies that use machine learning models to triage patients for medical services in hospitals is displayed in Table 1.
The studies [24]- [27] are closely matched to this paper's objective: they address the urgency of attending to patients' needs at emergency departments by triaging their current medical conditions and history. In particular, the study [28] used a convolutional neural network (CNN) and an artificial neural network (ANN) to assist with the triaging of ophthalmology referrals. The CNN provided the best accuracy of 81% on the test set and the ANN achieved an accuracy of 77%. Studies [29], [30] have similar objectives but target specific medical fields that have similar triaging requirements. The studies use both statistical algorithms and deep learning, and evaluate the effectiveness of their classifications on precision, recall and specificity. Their outcomes show that deep learning fares better than other models but requires additional support in model training and tuning.

III. DESIGN OF THREE AI-BASED MODELS
Three AI-based models, namely Referral Classification, Referral Insight and Outlying Referral Detection, are proposed to achieve the research goals. In this section, we describe the design of these three models.

A. AI-BASED REFERRAL INSIGHT MODEL DESIGN
For this paper, the terms refer to the individual medical entities or terms that are identified among the referrals by the Medical NLP [1]. A term set refers to a series of medical terms present in a patient's referral and they have a significant associative relationship with each other. The proposed novel approach is called Referral AI Insight (RAII) and it is shown in Fig. 1.
It uses a combination of medical-capable NLP and association rules algorithms [13] to mine the GP referrals' contents in 4 distinct stages of operation. Due to the sensitivity of the data, some parts of the referrals' content parsing are done by a separate group of information technology (IT) specialists to ensure data security and privacy.
In the first stage, all the GP referrals, which are in PDF format, are uploaded to the cloud blob storage. Their contents are extracted via Python API (Application Programming Interface) libraries and stored in a cloud SQL database. Each record has a triage label that the clinicians have assigned to it. In the second stage, the referral contents are parsed by the medical NLP service, which identifies and retrieves the medical entities, or terms, as well as their associated medical-ontology linked codes (ICD10CM and RxNorm). The medical NLP provides a list of possible matching codes together with their confidence scores. For each entity, we select only the code that has the highest score, provided it is above the acceptable threshold of 50% confidence. The codes are then combined into an array and stored in the same database. In the third stage, we retrieve the parsed medical codes of all the records under a specific referral category and apply count vectorization to them to create a large sparse matrix. In the fourth stage, the Apriori algorithm is applied to the dataset.
Results obtained are a comprehensive list of rules between the itemset of medical codes from the dataset, which are then plotted and discussed later in the paper.
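The code selection in the second stage can be sketched as below. The field names (`ICD10CMConcepts`, `Code`, `Score`) are assumed to follow the shape of the medical NLP service's JSON output, and the entities and scores are made up for illustration; the helper `select_codes` is hypothetical and is not the authors' program.

```python
from typing import Any, Dict, List


def select_codes(entities: List[Dict[str, Any]], threshold: float = 0.5) -> List[str]:
    """Keep, for each entity, the best-scoring code if it clears the threshold."""
    codes: List[str] = []
    for entity in entities:
        concepts = entity.get("ICD10CMConcepts", [])
        if not concepts:
            continue  # the service found no candidate code for this entity
        best = max(concepts, key=lambda concept: concept["Score"])
        if best["Score"] > threshold:
            codes.append(best["Code"])
    return codes


# Made-up parsed output for one referral (assumed field names).
sample_entities = [
    {
        "Text": "hearing loss",
        "ICD10CMConcepts": [
            {"Code": "H91.90", "Score": 0.82},
            {"Code": "H90.5", "Score": 0.41},
        ],
    },
    {
        "Text": "nasal obstruction",
        "ICD10CMConcepts": [{"Code": "J34.89", "Score": 0.77}],
    },
    {
        "Text": "headache",
        "ICD10CMConcepts": [{"Code": "R51.9", "Score": 0.35}],
    },
]
referral_codes = select_codes(sample_entities)
```

Here the third entity is dropped because its best candidate scores below the threshold, leaving an array of two codes for the referral. The threshold is a parameter so that a different cut-off can be applied without changing the logic.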

B. AI-BASED REFERRAL CLASSIFICATION MODEL DESIGN
Our AI-based referral classification model uses a combination of medical NLP and deep neural networks to classify and triage GP referrals, as shown in Fig. 2. The process comprises 4 stages, which are performed by different groups of IT specialists to ensure data security and privacy. The first stage is to upload the GP referrals in PDF format onto the blob storage and extract their contents via PDF API libraries. The second stage involves using the medical NLP service to parse the referral contents and extract the medical entities as well as their respective medical codes. The third stage involves encoding and vectorizing the dataset's medical codes and labels. The final stage is to use the dataset to train the models.
The experiment also includes other machine learning models and compares their results with our main AI-based referral classification model. It is intended that our new solution be flexible enough to adapt to any medical domain, with greater accuracy than other generic text classification methods for healthcare [4]. As the tests are conducted against a sample set of GP referrals that have been triaged by experienced clinicians, the results are validated against the original classifications. To the best of our knowledge, these are the first methods that use both healthcare-capable NLP services that handle clinical terminology and deep learning for triaging GP referrals. The results from our experiment showed that this approach can yield highly accurate results in triaging medical referrals.
Referring to Fig. 2 and 3, the neural network model is built with 5 layers. The first layer has a number of neurons equal to the input vector size, and the second layer that follows has half the number of neurons of the first. Based on the result shown in Fig. 3, the model starts to converge at the 10th epoch. The validation accuracy achieved in this setup is 93%, with the validation loss below 0.02.

C. AI-BASED REFERRAL OUTLIER DETECTION MODEL DESIGN
For our AI-based referral outlier detection model, illustrated in Fig. 4, both the UMAP and LOF algorithms are used to reduce the dimensionality of the referral data and locate those data points that are regarded as outliers. The same processed data from section C's approach is used: the extracted medical terms, feature engineered by AWS Comprehend Medical into ICD codes, are vectorized to produce a sparse matrix. This is in turn used by both UMAP and LOF to locate the outliers. The outliers' data comprise a list of ICD codes and we can reference them back to the original patients' referrals.

IV. STUDY DATASET

A. DATA SOURCES
In this experiment, the sample data is obtained from QLD Health's GP referral system for the field of otorhinolaryngology (ear, nose, and throat), with dates ranging from 2019 to 2020. The sampling of the data from the system, which was performed by the respective clinical support staff, complies with the State's requirements for the safety and privacy of patient information. The dataset is then parsed by a separate IT application team using AWS Comprehend Medical [1] through their secured private virtual network with strong network encryption. A subset of the samples is then made available for use in this experiment. The sample has a total of 3000 ENT referrals, which have been labelled through the triaging process conducted by the clinicians. A referral's label ranges from 1 to 3, with 1 signifying the highest urgency. There are 1000 records for each category to ensure equal representation. The content of the referrals varies between GPs, with some giving more insight into the patients' present medical conditions, others including the patient's past medical history, and some adding the patient's current medication routines. The machine on which this experiment is conducted is located within the organization's secured IT computer system and is protected by firewalls, security policies and network ringfencing.
We use a cloud-based text processing facility that is specialized in the medical field, called AWS Comprehend Medical. It can derive useful information such as medical conditions, medications and protected health information (PHI) from medical notes using natural language processing [11], [12]. The detected entities are scored on the probability of their match through its ontology-linking process, which connects them to the standardized medical knowledge bases of RxNorm and ICD10-CM. Each referral is therefore processed and decomposed into JSON text which contains the entities found and their closest matching RxNorm and ICD10-CM codes. We pick only the codes with the highest match score and tabulate them against the referral category.

B. DATA PRE-PROCESSING AND PREPARATION
The data pre-processing and preparation method is depicted in Fig. 5.
Normal human-readable texts are complex in structure and meaning, and are not constructed for a machine to interpret. Natural Language Processing (NLP) is a form of artificial intelligence that enables computers to read and interpret text data, thus giving them the ability to measure and determine the importance of specific text segments. The volume of Electronic Health Records (EHR) generated by healthcare-related IT systems poses a challenge for clinicians to work with. NLP is gaining popularity in the healthcare sector as it can process this unstructured text data quickly and derive its meaning with great efficiency. However, due to the nature of the domain, important meanings can be obscured by ambiguities and inconsistencies. These are usually caused by spelling errors, abbreviations, variations in description and so on, which the knowledge bases of general-purpose language processing are not equipped to handle [15].
Text classification has been widely used to analyse and categorize texts across industry and academia [16]. Many models have been developed for this purpose, such as linear regression, support vector machine, random forest, and deep learning, to name a few. Classifying medical text carries a level of uncertainty, especially for medical terminology relating to illnesses, procedures, symptoms, drugs, prognosis, etc., which the lemmatization and stemming of conventional NLP knowledge bases cannot handle [15]. This made text classification in the healthcare sector a challenge for a long time. At present, however, several cloud service and AI vendors have built specialized NLP services to cater to healthcare. One example is AWS Comprehend Medical [17], which can derive the meaning from medical-related notes while following HIPAA regulations. It can link the meanings of medical terms to standard medical ontologies such as ICD10-CM or RxNorm [1]. With this feature, each medical note can be analysed by these medical NLP services to derive relevant entities and match them to the closest codes in the medical ontology. Consequently, each medical note can have an array of medical-related codes that can subsequently be exploited by various forms of artificial intelligence.
In our Proof of Concept (POC) environment, the GP referrals are received in PDF file format and stored in blob storage. Their details are stored in a database, with their medical contents processed and extracted by AWS Comprehend Medical, including their ontology-linked codes. The service generates a list of suitable codes and their probability scores for each detected medical entity. Our program scans this list and pulls only the code that has the highest score, with a confidence level above 60%. Fig. 5 illustrates the different stages in the data pre-processing. Tables 2 and 3 show a sample of the processed outputs from several referrals' contents. Table 4 shows a sample of the processed referrals where the extracted ICD10 codes have been filtered and merged into an array. The next stage is to prepare and encode the processed dataset before it is ready for the association rules mining and the neural network's training and testing programs. There are several types of encoding, ranging from numerical representation to hashing and vectorization. The conventional vectorizing process uses tokenization to remove certain unimportant or less meaningful words, but the text data has already been tokenized by AWS Comprehend Medical during the first phase. What the vectorization process contributes at this stage is a collection of all the medical codes from the data. There are two methods of vectorization: count vectorization, where the word occurrence is represented by its count, and Term Frequency-Inverse Document Frequency (TFIDF) vectorization, where the word occurrence is represented in a normalized form according to the overall frequency of occurrence of the word across the entire dataset.
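The two vectorization methods can be contrasted with a short sketch in which each referral is already an array of codes. The TFIDF weighting shown is one common textbook variant (term frequency times a logarithmic inverse document frequency plus one); libraries differ in their smoothing and normalization, so this is an illustrative assumption, not the exact formula used in the experiments. The function names and toy referrals are also hypothetical.

```python
import math
from collections import Counter
from typing import List


def build_vocabulary(docs: List[List[str]]) -> List[str]:
    """Collect every distinct code in the dataset, in a stable order."""
    return sorted({code for doc in docs for code in doc})


def count_vectorize(docs: List[List[str]], vocab: List[str]) -> List[List[int]]:
    """One row per referral, one column per code, cells hold raw counts."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[code] for code in vocab])
    return rows


def tfidf_vectorize(docs: List[List[str]], vocab: List[str]) -> List[List[float]]:
    """Counts rescaled so that codes common to most referrals weigh less."""
    n = len(docs)
    doc_freq = Counter(code for doc in docs for code in set(doc))
    idf = {code: math.log(n / doc_freq[code]) + 1.0 for code in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against an empty referral
        rows.append([(counts[code] / total) * idf[code] for code in vocab])
    return rows


# Toy referrals (made up), each already reduced to an array of codes.
docs = [
    ["J34.89", "H91.90"],
    ["J34.89", "I10", "I10"],
    ["J34.89"],
]
vocab = build_vocabulary(docs)
count_matrix = count_vectorize(docs, vocab)
tfidf_matrix = tfidf_vectorize(docs, vocab)
```

In the first toy referral both codes have a count of 1, but the TFIDF weight of H91.90 is higher than that of J34.89 because J34.89 appears in every referral and therefore discriminates less between them.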
For the deep learning model, the dataset is then split into a training and a testing group in the ratios of 80/20 and 90/10 for two sets of training and testing. Each subset of the data is further separated into input (X) and output (Y) sets, followed by the application of vectorization. The size of the X matrix is 3206 x 2395. Consequently, the Y vector has the same number of rows.
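The two splits can be sketched with a simple shuffled partition. The paper does not state whether the split was shuffled or stratified by category, so the shuffling, the fixed seed and the helper name `split_dataset` are assumptions for illustration, as is the dummy dataset of 1000 record identifiers.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(records: Sequence[T], train_ratio: float, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Dummy record identifiers standing in for the vectorized referrals.
records = list(range(1000))
train_80, test_20 = split_dataset(records, train_ratio=0.8)
train_90, test_10 = split_dataset(records, train_ratio=0.9)
```

Each record lands in exactly one of the two parts, so the testing part is never seen during training.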

V. THE RESULTS AND ANALYSIS
In this section, we report and analyze the empirical results from the AI-based referral insight model and AI-based referral classification model.

A. THE RESULTS OF AI-BASED REFERRAL INSIGHT MODEL FROM ASSOCIATION RULES MINING
The first set of results is a heatmap of the top 40 most common medical terms' ICD10 codes in the three referral categories. Referring to Fig. 6, J34.89 and H91.90 are clearly the most common terms in both categories 2 and 3. In category 1, however, there are several frequently occurring terms, with R52, L98.9, R91.8, R59.0, I25.10, I10, J34.89 and E11.9 sharing the top spots. This can help the clinicians to appreciate the landscape of the terms that constitute each triaged group, and may help them to adjust the urgencies gradually as policy changes. Table 5 shows the list of those terms' ICD10 codes together with their occurrence in the dataset and their respective descriptions. The next set of results, shown in Table 6, is a small sample of the frequent patterns of medical term sets discovered from the dataset using the algorithm. It is sorted in ascending order of term-set length, with a minimum support of 0.01 under each triaged category. We can see that some of the terms exist across the three categories but most of them have unique term-set associations. In category 1, the longest term sets have many similar terms, and their support is above 0.03. Such patterns can also be observed in category 2, but their support values are lower. For category 3, however, there are more variations among the longer term sets, and they have lower support values. The term sets' confidence versus support under each category is plotted in Fig. 7. The support values of categories 2 and 3 are generally low, below 0.025, which signifies a lower frequency of occurrence in the dataset. For category 1, however, the rules' support values range up to 0.06, which signifies that there are more term sets with a higher frequency of presence. Their confidence values vary uniformly throughout the normalized range.
The term sets have ranges of inter-dependencies among their items, which means the return rate varies with no concentration on specific term-set rules. Both the antecedent support and the consequent support indicate how frequently the terms or term sets occur within the dataset, while the other three values indicate the strength of their relationships. The details in Table 7 can be visualized as a heatmap, as shown in Fig. 8. The three charts show several concentration areas where some of the term sets' relationships have a higher number of occurrences. This can assist the clinicians to appreciate the landscape of the term sets' relationships across the entire referral dataset. Due to the large volume of data, only a subset of the term sets' rules is plotted in these charts.

B. THE RESULTS OF THE AI-BASED REFERRAL CLASSIFICATION MODEL
For this section, we run the test on two groups of models. The first group comprises a list of machine learning models from the scikit-learn library, and the second group is the customized NN constructed for this experiment. We then compare and critique the results of the two groups.

1) RESULTS FROM GROUP 1
For the first round of tests, we use a series of statistical and machine learning models: linear regression (LR), support vector machine (SVM), random forest, Extreme Gradient Boosting classifier (XGBClassifier), Gaussian Naive Bayes (GaussianNB), K-nearest neighbours classifier (KNeighborsClassifier), and three Multi-layer Perceptron (MLP) classifiers with different solvers, namely Stochastic Gradient Descent (SGD), Adam, and limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). Their accuracy and confusion matrix results are tabulated in Table 8. Although the hyperparameters of the models have not been changed, the accuracy of the majority of the models ranges from 58% to 82%. A confusion matrix is generated for each model to compare the numbers of actual and predicted labels of the samples.
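How a confusion matrix and the accuracy are derived from the actual and predicted labels can be sketched as follows. The labels below are made up for illustration and are not taken from Table 8; the function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int], labels: Sequence[int]) -> List[List[int]]:
    """Rows are actual categories, columns are predicted categories."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Correct predictions (the diagonal) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up triage categories for eight referrals.
actual_labels = [1, 1, 2, 2, 3, 3, 3, 1]
predicted_labels = [1, 2, 2, 2, 3, 3, 1, 1]

matrix = confusion_matrix(actual_labels, predicted_labels, labels=[1, 2, 3])
overall_accuracy = accuracy(matrix)
```

The off-diagonal cells show which categories are confused with each other. In this toy example one Category 3 referral is predicted as Category 1, a kind of error that a single accuracy figure would hide.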
The results show that there are significant error gaps. While we recognise the need to avoid over-fitting the models, these error values are unacceptable. Because this is a medical application, high classification accuracy is a critical priority.
The accuracies of all the models are compiled and plotted in Fig. 9. The MLP models and the Random Forest achieve accuracies above 80%. The others fall in the range of 70-80%, except for KNeighborsClassifier and GaussianNB, which score less than 70%. These results could be improved by tuning the hyperparameters; however, our goal is to use deep learning to build a customized NN model for the text classification.
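For readers unfamiliar with the evaluation used here, a minimal Python sketch of how a confusion matrix, accuracy and per-category recall can be computed for the three triage categories is given below. The function names are hypothetical, and the labels and predictions are invented toy values, not results from Table 8.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows index the actual label, columns index the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each actual category, the fraction that was predicted correctly."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]


# Invented toy labels for triage categories 1, 2 and 3.
labels = [1, 2, 3]
actual = [1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
predicted = [1, 2, 2, 2, 3, 3, 3, 3, 1, 3]

matrix = confusion_matrix(actual, predicted, labels)
overall_accuracy = accuracy(matrix)
recall = per_class_recall(matrix)
```

Reading the matrix row by row shows which categories are confused with one another, which a single accuracy figure hides. This matters in triage, where under-classifying an urgent referral is costlier than the reverse.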

2) RESULTS FROM GROUP 2
In the second group of tests, we use the neural network built according to the specification set out in the previous section. Both the training and testing datasets have been vectorized. The model was trained and tested on the dataset using 80:20 and 90:10 splits, and the results are shown in Fig. 10. The model trained with the 80:20 split achieved an accuracy of 90% with a validation loss of 0.5, while the model trained with the 90:10 split achieved a higher accuracy of 93% and a lower validation loss of below 0.4. This improvement can be attributed to the larger amount of training data.

3) COMPARISONS OF RESULTS FROM GROUP 1 AND 2
We run the test samples through the model and create a confusion matrix of the actual test labels against the predictions, as shown in Fig. 11. Comparing the current model's confusion matrix with those of the Group 1 models, we can see a large improvement in accuracy.
As shown in Table 9, only the customized NN developed for this project outperformed the rest, scoring above 90% in accuracy. In terms of computational time, LR, SVM, Random Forest and the XGB classifier took an average of 5 seconds to complete, while the MLP models required about twice as long. The customized NN used for the final classification took the longest, at over 20 minutes, due to the NN's depth and the high values set for the epochs and iterations, which were 500 and 2000, respectively. Accuracy is paramount in health and medical settings. Although the running time of the customized NN is relatively long, we have achieved our research goal of developing the model with the highest accuracy. The models were run on a desktop computer with an Intel i7 CPU, 8 GB of RAM and a 500 GB HDD. To reduce the running time, parallel computing techniques and more powerful hardware could be used.

C. THE RESULTS OF OUTLIER DETECTION FROM AI-BASED REFERRAL INSIGHT MODEL
The technique is applied to the referrals under their respective categories. In the following diagrams and tables, the vectorized referrals' contents are visualized, while the outliers have been identified and mapped back to their original contents with ICD10 codes. The contamination parameter allows the user to change the sensitivity of the outlier computation: a lower value makes the LOF more stringent and produces fewer outliers, and vice versa. The scatter plots shown in Fig. 12 and Fig. 13 are scaled based on their range limits. Fig. 12 shows that the differences between the outlying referrals and the rest are large in category 1, whereas for those under category 2 the spread is smaller than in category 1. Those under category 2 have a bigger spread, but it is not extreme.
The outliers in the three categories can be identified using the LOF, and their outputs have been transformed back to ICD10 codes as shown in Table 10, based on a contamination hyperparameter value of 0.005. This value was selected to demonstrate the ability of the LOF to extract a small set of outliers. It can be changed to suit the users' requirements if they want to see more outlying referrals. For those under category 1, the top few share a common ICD10 representation starting with the combination of B20, C34.90, D49.1, E11.9 and I49.9. Those in category 3 share the ICD10 codes B96.89, B99.9 and C18.9, but those under category 2 do not share such patterns. All their ICD10 descriptions are tabulated in Table 11 under their respective categories. The outliers here represent unusual referrals that do not conform to the usual clusters and commonalities of referrals, rather than erroneous records. These records may be of interest to triaging clinicians who wish to investigate further.
The final scatter plot, shown in Fig. 13, consolidates the referrals from all three categories. Its scale is mapped to its range limits, so the plot does not correlate visually with Fig. 12. The dimensionality is reduced with respect to the entire referral dataset, so the values are relative to the collective whole.
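To make the role of the contamination value concrete, the following is a minimal pure-Python sketch of the local outlier factor on two-dimensional points. It assumes the standard LOF definition (k-distance, reachability distance and local reachability density) and a simplified contamination rule that flags the highest-scoring fraction of points. The function names, the neighbourhood size `k` and the toy points are hypothetical, and this is not the implementation used in this study.

```python
import math


def local_outlier_factor(points, k):
    """Return one LOF score per point; scores well above 1 suggest outliers.

    Assumes no duplicate points, so every reachability distance is > 0.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_distance.append(kd)
        # Keep every point within the k-distance (ties included).
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


def flag_outliers(scores, contamination):
    """Simplified rule: flag the top `contamination` fraction of LOF scores."""
    n_outliers = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_outliers])


# Hypothetical reduced-dimension referral vectors: one tight cluster, one stray.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (0.2, 0.8), (0.8, 0.2), (0.3, 0.3), (0.7, 0.7), (8.0, 8.0),
]
scores = local_outlier_factor(points, k=3)
flagged = flag_outliers(scores, contamination=0.1)
```

Raising the contamination value in `flag_outliers` flags more points, which mirrors the sensitivity adjustment described above.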

VI. CONCLUSION AND FUTURE WORK
This study used a combination of medical NLP and association rule mining to acquire insights into GP referrals to otorhinolaryngology. The referrals were originally in digital document form and are unstructured free text. The RAII performed data preparation and converted them to digital form, and then used the medical NLP to extract the medical term features. It then selected only those terms with the highest scores and organised them into a list of medical terms based on ICD10 codes. Association rule mining was applied to the dataset, extracting multiple association rules among the medical term sets across all the parsed referrals. The results showed that this approach can readily produce significant insight into the referral datasets. This information is particularly valuable for clinicians and healthcare management who need to understand the landscape of the referrals from a holistic perspective. By analysing this data, it may be possible to target resource distribution to vulnerable patients and those with the highest level of need.
The neural network-driven triaging approach achieved high accuracy in classifying the referrals in the test dataset, using the medical NLP system to derive the relevant medical codes from the GP referrals. Prior to text classification, text data is normally pre-processed with stemming and lemmatization, which are normalization techniques applied before further processing. However, these general-purpose NLP pre-processing techniques are not suitable for handling the meaning of words within the medical field, where each term has a critical relationship to the well-being of the patients. Therefore, a specialized NLP developed for the healthcare sector should be used for this purpose. We consider that deriving the medical diagnoses in the referrals' content into clinical codes such as ICD10CM and RxNorm improves the accuracy and reliability of the referral text classification process. Our experiment showed that our approach can achieve significantly higher accuracy than the results of the research in [28]. This is our initial experiment, and we expect future enhancements to include exogenous conditions such as social and demographic considerations. In addition, high-risk categories such as comorbidity or prerequisites for other medical attention can be used to fine-tune the triaging process even further.
The approach for locating the referrals' outliers in the three categories uses an ensemble of a dimensionality reduction algorithm and outlier detection on data that has been processed and vectorized with the aid of the medical NLP. The approach can then extract the array of features that are regarded as abnormal based on the LOF's density threshold, and these in turn can be traced back to the specific medical referrals.
The sensitivity of the outlier detection can be adjusted as required to support clinical process improvement. This can lead to better management of the clinicians' triaging process and improve their quality of work.

AUTHOR CONTRIBUTIONS
Chee Keong Wee and Xujuan Zhou proposed the idea and discussed it with Raj Gururajan, Xiaohui Tao, Jennifer Chen, Rashmi Gururajan, Nathan Wee, and Prabal D. Barua. All authors contributed to the development and completion of the idea. Chee Keong Wee conducted the main part of the algorithm design and experiments. All authors participated in the discussions, the structure of the article, and the writing of the manuscript.
RAJ GURURAJAN is currently a Professor with the University of Southern Queensland (USQ), Australia. His research record to date includes over 300 refereed publications with three best paper awards, over six million Australian dollars in total research grants, completion of 59 research theses, the Queensland Premier's Research Excellence Award for the Patient Journey Board project, the USQ Research Excellence Award, and an Endeavour Executive Award. He has delivered many keynote addresses in health informatics, including at the prestigious SAARCC event. In addition, he is currently the Treasurer of the Australian Committee of Professors and Heads of Information Systems (ACPHIS). He has also been invited to accompany the Queensland Minister for Science and Technology on international delegations, including the Boston Biotech event, in 2012. In 2009, he led a delegation to India to explore the Australia-India Obesity Institute; however, this did not materialize due to a change in governments.
XIAOHUI TAO (Senior Member, IEEE) is currently an Active Researcher in AI and data science and an Associate Professor (computing) with the School of Mathematics, Physics and Computing, University of Southern Queensland (USQ), Australia. His research outcomes have been published in many top-tier journals (e.g., IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IPM, KBS, and ESWA) and conferences (e.g., ICDE, CIKM, PAKDD, and WISE). His research interests include data analytics, natural language processing, machine learning, knowledge engineering, information retrieval, and health informatics. He is a Senior Member of ACM. He received the ARC DP Grant (2022-2024) and an Australian Endeavour Research Fellowship (2015-2016), along with awards from many other schemes.
JENNIFER CHEN currently practices as an Otolaryngology Unaccredited at the Royal Brisbane and Women's Hospital and is currently an Associate Academic Lecturer with The University of Queensland. She has a keen interest in the field of head and neck cancer and is aiming to complete her Master of Philosophy thesis on second primary malignancy in head and neck cancer surveillance by the end of this year.

RASHMI GURURAJAN is a final year Psychiatry Registrar at the Royal Brisbane and Women's Hospital. She is specializing in the field of Consultation-Liaison Psychiatry and has special interests in psycho-oncology and palliative care, pain medicine, and the integration of artificial intelligence technology with clinical medicine. In particular, she hopes to be able to use AI techniques to optimize the delivery of health services and to optimize resource distribution.
NATHAN WEE graduated from The University of Queensland, in 2021. He is an IT professional working at Dialog as a Data Analyst, and he supports his company's corporate clients, such as Queensland Health, in various large-scale IT projects. He has acquired various data science and IT certifications with Coursera and Microsoft. His research interests include the application of AI for healthcare.
PRABAL DATTA BARUA received the Ph.D. degree in information systems from the University of Southern Queensland. He is currently an Adjunct Professor with the University of Southern Queensland and an Honorary Industry Fellow at the University of Technology Sydney. He is an academic and accredited research supervisor at the University of Southern Queensland with 15 years of teaching experience. He received research support from the Queensland Government Innovation Connections under the Entrepreneurs program to research "Cancer recurrence using innovative machine learning approaches." He is interested in AI technology development in health, education, agriculture, and environmental science, and he has published several papers in Q1 journals. He is an industry leader in ICT entrepreneurship in Australia and sits as an ICT advisory panel member for many organizations.