Knowledge and Data-Driven Framework for Designing a Computerized Physician Order Entry System

A major concern related to the healthcare industry is uniformity in healthcare delivery. There is considerable variation in the diagnosis and treatment of patients depending on the experience and expertise of the doctors. Information technology can play a major role in addressing this issue. Research investigating the use of data-driven approaches and knowledge-driven clinical pathways to achieve uniformity in the delivery of healthcare is ongoing. Specifically, the integration of data and knowledge-driven approaches can be used to ensure uniformity in the delivery of patient care, thus avoiding inappropriate variance. The data-driven approach can utilize the bulk of medical data being generated. The knowledge-driven approach in the form of clinical pathways incorporates evidence-based care. In this context, knowledge and data-driven computerized physician order entry (CPOE) systems are gaining importance in healthcare delivery systems. This paper proposes a knowledge and data-driven framework for a CPOE system. This work is based on a knowledge base populated with disease quadruples, each of which comprises a list of symptoms, tests, results, and medications for a particular disease. The data used in the proposed system are obtained from two datasets, namely, the MIMIC (Medical Information Mart for Intensive Care) and the Disease-Symptom Knowledge Database, which is based on the operational data of New York-Presbyterian Hospital (NYPH). We combined both datasets using the common attribute of disease to generate data with more attributes to aid decision making. This was performed with the help of specialists and clinical knowledge. The resulting patient data are further integrated with clinical pathways before the extraction of disease quadruples. The novelty of this work lies in the extraction of disease quadruples from the integration of patient data with clinical pathways. 
The list dynamically ranks each element of the quadruple based on its association score to facilitate the generation of prescription order sets for the CPOE. The effectiveness of the proposed system in providing uniform patient care delivery has been validated by experts. The proposed system can significantly improve patient safety and the quality of healthcare delivery due to the integration of data-driven capability with clinical pathways.


I. INTRODUCTION
Uniformity in healthcare delivery is a major challenge faced by the medical sector worldwide [2]- [4]. Information technology can play a major role in addressing the emerging challenges of the healthcare sector [5]. Specifically, the integration of data and knowledge-driven approaches can be used to ensure uniformity in the delivery of patient care and reduce inappropriate variance. The data-driven approach can utilize the bulk of the medical data that have been generated. The knowledge-driven approach in the form of clinical pathways incorporates evidence-based care. The integration of these two approaches can contribute to improvement in medical diagnostics and administrative workflows. Such integration can improve the quality of care by reducing diagnosis and prescription errors. The organizational aspect of hospitals can also be improved through better operational workflows [6]. Hence, tools and techniques that allow healthcare personnel to exploit the advantages of both data and knowledge-driven approaches are urgently needed. Therefore, an important research contribution can be the development of a data and knowledge-driven technique for the Computerized Physician Order Entry (CPOE) system. A CPOE system generates automated order sets (for medication prescriptions, imaging diagnostics, laboratory testing, or other actions) by computer to assist healthcare personnel. The CPOE guarantees that the order sets are legible and can be stored in the Electronic Health Record without ambiguity and shared with other healthcare entities for execution and billing [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Alba Amato.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 1. Workflow of the CPOE system for processing medical order sets [1].
Figure 1 depicts the generic workflow of the CPOE system. The figure shows how the medical order set undergoes various stages in its processing. The system is segregated into multiple layers, namely, View, Data, and Process. The View layer addresses visualization and user input. In the Data layer, archiving of the order sets is performed. Processing and result generation are performed in the last layer [1]. In this paper, we focus on the design of a framework for knowledge- and data-driven CPOE systems. The novelty of this work lies in the integration of patient data with clinical pathways, which can provide a major contribution to the design of an efficient and robust CPOE system. The organization of the paper is as follows. A literature review is provided in Section II. In Section III, an overview of the datasets used in this work is provided. The working of the proposed system is explained in detail in Section IV. Then, the analysis and discussion are presented in Section V. Finally, Section VI concludes the paper.

II. LITERATURE REVIEW
Many recent studies have focused on uniformity in healthcare delivery. In this regard, emphasis has been placed on a variety of tools and techniques that could assist in the uniformity of medical diagnostics. CPOE systems can play a major role in this context.
FIGURE 2. Framework of the proposed knowledge and data-driven CPOE system.
Due to the abundance of the data generated in the healthcare sector, researchers have been greatly interested in employing data-driven approaches in this field. Therefore, many research studies have focused on data-driven healthcare information systems, particularly CPOE systems. In this context, the future of technology-enabled healthcare relates to data-driven approaches regarding virtual care 2.0 [8]. The challenges and opportunities related to data-driven healthcare have been widely analyzed in the literature [6], [9], [10]. Data-driven approaches play a major role in personalized [11] and smart healthcare systems [12]. These approaches have also been used in the automated generation of clinical order sets [13]- [16] and clinical pathways [17]- [22].
Clinical pathways play a central role in providing uniformity in healthcare delivery and reducing variance. In this context, it is important to understand the interest in and usability of clinical pathways by healthcare professionals [23]. The use of information technology can also contribute to the compliance of clinical pathways [24]. Many studies have focused on the role of clinical pathways in reducing process and outcome variances [25], [26]. Researchers have also evaluated the impact of clinical pathways on care processes [27]- [31]. The challenge of the anglicization of clinical pathways has also been discussed in the literature [32].
The Computerized Physician Order Entry (CPOE) system is one of the major healthcare information systems (HISs) used in the automation of medical diagnoses and treatments. Different researchers have evaluated the impact of CPOE using different parameters, such as the economic impact [33], medication errors and preventable adverse drug effects [34]- [36], turnaround time [37], and length of stay [38]. Another important aspect in the literature is research investigating the modification of medical workflows created for CPOE adoption [39]- [45]. Many studies have also focused on the status and adoption of CPOE in the healthcare industry in Pakistan [46]- [49]. Research investigating CPOE end-user satisfaction has also been performed [50]. Order sets play an integral role in the working of CPOE. The effectiveness of standardized order sets has been evaluated in many studies [51]- [54]. Different machine learning algorithms have been used for the development and modification of order sets [55]- [57].
The use of clinical pathways for the implementation of healthcare information systems, especially CPOE, is an important area of research. Research shows that the implementation of clinical pathways in CPOE reduces the treatment time [58]. An ontological framework has also been presented for the standardization and digitization of clinical pathways in healthcare information systems [59]. One study [60] highlights the importance of using decision trees for incorporating clinical pathways into healthcare information systems. Sherine et al. [61] presented a case of a tertiary care facility in which 15 clinical pathways were designed and implemented into the healthcare information system (HIS) of the hospital. Studies have also focused on the development of order sets for HISs through the involvement of clinical experts [62], [63]. Clinical pathways have been incorporated into the computerized order process for the care of postoperative total joint patients [64]. In another study [65], the authors targeted and changed the order set design related to asthma in a CPOE system, which resulted in an improved selection of evidence-based care. Event trees have been used to integrate clinical pathways within CPOE [66]. Similarly, various hospital case studies have been discussed in the literature [67]- [70], in which clinical pathways were integrated into existing CPOE systems to ensure uniform healthcare delivery.
A summary of the existing techniques is shown in Table 1. Most studies discussed CPOE in terms of clinical pathways or data-driven approaches separately. Research considering data-driven approaches along with clinical pathways to develop CPOE is limited. Therefore, the major contribution of this research is the integration of patient data with clinical pathways. The proposed framework integrates patient data with clinical pathways to develop a CPOE system that ensures uniform healthcare delivery. Another contribution is the development of the mathematical formulation for calculating association scores, together with the user interfaces of the supporting application.

III. DATASETS
The datasets used in this work are the MIMIC dataset [71] and the Disease-Symptom Knowledge Database, which is based on the operational data of New York-Presbyterian Hospital (NYPH) [72], [73].
MIMIC is an abbreviation for Medical Information Mart for Intensive Care. MIMIC is an openly available dataset containing approximately 60,000 deidentified intensive care unit (ICU) admission records. The dataset is maintained by the MIT Laboratory for Computational Physiology. It comprises vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, etc. Applications such as academic research, quality enhancement projects, and higher education coursework are supported by MIMIC.
The Disease-Symptom Knowledge Database is based on the operational data of New York-Presbyterian Hospital (NYPH). This knowledge database was developed using a study including a total of 25,074 discharge summaries from NYPH. In total, there are 1,366 unique disease concepts in the database, and the top 150 diseases account for 90% of the occurrences. The dataset organizers selected a total of 1,767 pairs based on statistical measurements for the disease-symptom knowledge base construction. We aim to present a knowledge and data-driven framework in this research; thus, we utilize both datasets to generate a combined dataset with the help of specialists and clinical knowledge. The objective of this study is to provide a combined larger dataset including attributes, such as symptoms, tests, medicines, and diagnosis of all diseases, which will eventually help in designing CPOE.
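The combination of the two sources on the common disease attribute can be pictured as a join. The following is a minimal sketch only: the record layout, the field names (`disease`, `symptoms`, `tests`, `medicines`), and the sample rows are hypothetical and are not taken from MIMIC or the NYPH database, and the specialist review described above is not modeled.

```python
def combine_on_disease(symptom_rows, clinical_rows):
    """Join two record lists on a shared disease name (hypothetical schema).

    Only diseases present in both sources are kept, i.e. an inner join.
    """
    symptoms_by_disease = {}
    for row in symptom_rows:
        key = row["disease"].strip().lower()
        symptoms_by_disease.setdefault(key, set()).update(row["symptoms"])

    combined = {}
    for row in clinical_rows:
        key = row["disease"].strip().lower()
        if key not in symptoms_by_disease:
            continue  # no common disease attribute, so the row is skipped
        entry = combined.setdefault(key, {"tests": set(), "medicines": set()})
        entry["tests"].update(row["tests"])
        entry["medicines"].update(row["medicines"])

    # Return plain sorted lists so the result is easy to inspect and store.
    return {
        disease: {
            "symptoms": sorted(symptoms_by_disease[disease]),
            "tests": sorted(entry["tests"]),
            "medicines": sorted(entry["medicines"]),
        }
        for disease, entry in combined.items()
    }


# Placeholder rows standing in for the symptom source and the clinical source.
symptom_source = [
    {"disease": "Heart Failure", "symptoms": ["dyspnea", "leg edema"]},
    {"disease": "disease_X", "symptoms": ["symptom_X"]},
]
clinical_source = [
    {"disease": "heart failure", "tests": ["pro-BNP"], "medicines": ["medicine_A"]},
    {"disease": "heart failure", "tests": ["ECG"], "medicines": []},
]
merged = combine_on_disease(symptom_source, clinical_source)
```

An inner join is one possible reading of "combined using the common attribute of disease"; keeping diseases that appear in only one source would be an equally simple variation.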

IV. PROPOSED FRAMEWORK
This section describes the proposed framework for knowledge- and data-driven computerized physician order entry systems in detail. The framework was designed in the form of layers for the segregation of data generation, data analysis, and visualization services. The stepwise working of the system has been elaborated. The concept and structure of the disease quadruple have also been explained.

A. LAYERS OF THE PROPOSED FRAMEWORK
The proposed CPOE system was logically segregated into three layers, namely, the Resource Fusion Layer, Data Analysis Layer, and CPOE Layer. A graphical representation of the proposed framework is shown in Figure 2, which illustrates that data profiling is performed in the Resource Fusion Layer using the MIMIC and NYPH datasets. In the Data Analysis Layer, the knowledge base is populated using disease quadruples. The disease quadruples are extracted by the framework through the integration of patient data obtained from the Resource Fusion Layer with clinical pathways. The CPOE Layer then relies on the knowledge base for its operation. It can provide data visualization in dynamic ways using CPOE dashboards, reports, prescription order sets, and notifications as and when required.

B. WORKING OF THE PROPOSED FRAMEWORK
• Both datasets are combined using common attributes of disease to generate data with more attributes to aid decision making as mentioned in Section III.
• Quadruples are established for each disease $D_i$, i.e., (complaints $c_{ij}$, tests $t_{ij}$, results $r_{ij}$, medicines $m_{ij}$), from these patient data.
• The disease quadruples are further integrated with clinical pathways to augment the additional information present in these pathways in the form of clinical knowledge.
• After this integration, the disease quadruples are populated into the knowledge base. An instance of a disease quadruple of arrhythmia is shown in Figure 3.
• This knowledge base forms the foundation of the proposed CPOE system.
• The CPOE lists the symptoms based on their cumulative association score.
• Then, the CPOE calculates the association score $h_{ij}$ of each complaint $C_j$ against disease $D_i$ as the product of the probability of the complaint, represented by $e_{ij}$, and the weight $W_i$ of the disease, which is based on the prevalence of the disease:
$$h_{ij} = e_{ij} \cdot W_i \quad \forall i \tag{2}$$
• The system calculates the cumulative score of each complaint, represented by $h_j$, based on the summation of its scores in associated diseases. The calculation of the association score of a complaint against diseases is depicted in Figure 4. It specifies the complaints related to each disease. Based on the association score of each complaint against a particular disease, the cumulative association score of that complaint is calculated. For example, the association score of complaint $C_0$ is calculated with reference to disease $D_0$ as $h_{00}$ and against disease $D_2$ as $h_{20}$. The cumulative association score of $C_0$, represented as $h_0$, is the summation of $h_{00}$ and $h_{20}$.
• The doctor selects the complaints specified by the patient on the CPOE screen.
• Then, the CPOE lists the diseases related to these complaints based on their cumulative association score.
The calculation of the association score of the diseases against patient-specified complaints is shown in Figure 5. The figure lists the patient-specified complaints in the first column. Based on the patient-specified complaints, the system calculates the association score of each disease against these complaints. For example, in Figure 5, $C_0$ and $C_2$ are the patient-specified complaints. The association score of disease $D_0$ is calculated with reference to complaint $C_0$ as $h_{00}$ and against complaint $C_2$ as $h_{02}$. The cumulative association score of $D_0$, represented as $h_0$ (here indexed by disease rather than by complaint), is the summation of $h_{00}$ and $h_{02}$.
• This association score is calculated for all diseases that can exhibit these particular complaints. If three complaints are selected and a particular disease is related to all three complaints, its score will be based on the cumulative scores of these three complaints.
• The doctor can select one or more suspected diseases.
• Based on the diseases selected by the doctor, the system lists the relevant tests extracted from the disease quadruples.
• The doctor provides a diagnosis based on the results of the tests conducted, and based on this diagnosis, the CPOE lists the medicines required for the diagnosed disease.
Figure 6 shows a diagrammatic representation of the stepwise operation of the proposed system for the extraction of disease quadruples from patient data and clinical pathways.
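The scoring steps listed above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the probabilities and weights are made-up values chosen only so that the arithmetic is easy to follow. The index convention follows Equation (2), with the disease index first and the complaint index second.

```python
from collections import defaultdict


def association_scores(probability, weight):
    """h[i][j] = e[i][j] * W[i], following Equation (2).

    probability: {disease: {complaint: e_ij}}, weight: {disease: W_i}.
    """
    return {
        disease: {c: e * weight[disease] for c, e in complaints.items()}
        for disease, complaints in probability.items()
    }


def cumulative_complaint_scores(h):
    """Sum each complaint's score over every disease that exhibits it."""
    totals = defaultdict(float)
    for complaints in h.values():
        for complaint, score in complaints.items():
            totals[complaint] += score
    return dict(totals)


def rank_diseases(h, selected_complaints):
    """Score each disease by the selected complaints it exhibits, highest first."""
    scores = {}
    for disease, complaints in h.items():
        matched = [c for c in selected_complaints if c in complaints]
        if matched:
            scores[disease] = sum(complaints[c] for c in matched)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up inputs: C0 occurs in D0 and D2, mirroring the example in the text.
probability = {
    "D0": {"C0": 0.5, "C2": 0.25},
    "D1": {"C1": 0.5},
    "D2": {"C0": 0.25, "C2": 0.5},
}
weight = {"D0": 0.5, "D1": 0.25, "D2": 0.25}

h = association_scores(probability, weight)
complaint_totals = cumulative_complaint_scores(h)   # C0 -> h_00 + h_20
disease_ranking = rank_diseases(h, ["C0", "C2"])    # D0 -> h_00 + h_02
```

With these inputs, the cumulative score of C0 is 0.25 + 0.0625 = 0.3125, and D0 ranks above D2 for the selected complaints C0 and C2 (0.375 versus 0.1875).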

C. DISEASE QUADRUPLES
The disease quadruple of a specific disease, namely, arrhythmia, as it exists in the knowledge base, is depicted in Figure 3. The working of the CPOE system, in general, is based on four parameters (complaints, tests, results, and medicines), which are a part of the quadruple. The first element of the disease quadruple represents a list of symptoms or complaints associated with the particular disease. The second element lists the laboratory tests required to be conducted for the diagnosis of the disease. The third element of the disease quadruple specifies the diagnosis based on the tests conducted. Finally, medicines related to the disease are listed in the fourth and final element of the quadruple. The functionality of the proposed framework is based on populating the knowledge base with quadruples, each related to a particular disease.

V. ANALYSIS AND DISCUSSION
The proposed framework is used to generate a knowledge and data-driven CPOE system by merging clinical pathways with data extracted from datasets, which are then populated into quadruples in the knowledge base, resulting in dynamic CPOE screens. Two such instances of flowcharts based on clinical pathways are considered. Figure 7 relates to heart failure, and the symptoms that lead to suspicion of heart failure are listed. A pro-BNP test is conducted, and if its value is found to be higher than the normal range, ECG and Echo tests are performed. If the patient is diagnosed with acute heart failure, hospital admission is required for medical treatment. Otherwise, home treatment is prescribed. The flowchart shown in Figure 8 relates to acute coronary syndrome, and the symptoms that lead to suspicion of acute coronary syndrome are listed. An ECG is performed, and if ST-elevation is observed, the diagnosis of ST-elevation myocardial infarction (STEMI), which requires reperfusion treatment, can be made. If ST-depression is observed, a troponin test is conducted. A positive value indicates non-ST-elevation myocardial infarction (NSTEMI), which requires invasive treatment. In contrast, a negative value suggests a diagnosis of unstable angina, which requires noninvasive treatment.
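The two pathways described above are essentially small decision procedures. The following sketch encodes only the branches stated in the text; the function names, argument names, and string labels are hypothetical, and any branch the text does not describe is returned as unspecified rather than guessed. It is an illustration of the flowchart logic, not clinical guidance.

```python
def acs_pathway(ecg_finding, troponin_positive=None):
    """Acute coronary syndrome flowchart: returns (diagnosis, treatment)."""
    if ecg_finding == "ST-elevation":
        return ("STEMI", "reperfusion treatment")
    if ecg_finding == "ST-depression":
        if troponin_positive is None:
            # The troponin test has not been performed yet.
            return ("troponin test required", None)
        if troponin_positive:
            return ("NSTEMI", "invasive treatment")
        return ("unstable angina", "noninvasive treatment")
    # Other ECG findings are not covered by the description in the text.
    return ("unspecified in this sketch", None)


def heart_failure_pathway(pro_bnp_elevated, acute=None):
    """Heart failure flowchart: returns the next action or disposition."""
    if not pro_bnp_elevated:
        # The text does not describe the branch for a normal pro-BNP value.
        return "unspecified in this sketch"
    if acute is None:
        return "perform ECG and Echo"
    return "hospital admission" if acute else "home treatment"
```

For example, `acs_pathway("ST-depression", troponin_positive=False)` follows the ST-depression branch with a negative troponin result and yields unstable angina with noninvasive treatment.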
The proposed framework is implemented as a desktop application developed using the .NET framework with Windows Presentation Foundation (WPF) forms and the Python run-time environment to execute the proposed association score calculation workflow at the back end. This application was shared with clinicians for their feedback and validation.
To provide a greater understanding of the proposed framework, four use cases involving different patients are described in the form of the stepwise working of the CPOE system in Figures 9, 10, 11 and 12. The symptoms screen (Figures 9, 10, 11 and 12 (a)) presents the complete list of symptoms in descending order of their association scores. The scores are calculated using Equations (2) and (3). The doctor selects symptoms based on the complaints reported by the patient. Based on the specified complaints, the CPOE then proceeds to the next screen, which lists the suspected diseases associated with these symptoms. The diseases screen also presents the list of diseases in descending order of their calculated scores, as specified in the working of the proposed framework. The doctor selects one (Figures 9 and 10 (b)) or more suspected diseases (Figures 11 and 12 (b)) to proceed to the test screen (Figures 9, 10, 11 and 12 (c)), which presents the diagnostic tests that need to be conducted to make an exact diagnosis of the disease. Once the diagnosis is made by the doctor based on the results of the tests, the medicines screen (Figures 9, 10, 11 and 12 (d)) lists the relevant medicines based on data extracted from the quadruple in the knowledge base. Figure 13 shows a graph of the cumulative association scores of complaints based on their association with multiple diseases. For instance, the total score of shortness of breath in the graph is the sum of its individual scores related to arrhythmia, congestive heart failure, myocardial infarction, etc. In Figure 14, the scores of diseases are plotted based on specific complaints reported by the patient. Each disease derives its cumulative score from a subset of the complaints specified. Here, heart failure derives its score from chest pain, dyspnea, and leg edema.
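The two summations behind Figures 13 and 14 can be written out explicitly. The block below is a sketch under stated assumptions: Equation (3) is not reproduced in this section, so the first line assumes it is the summation over associated diseases described in Section IV-B, and the symbols $H_i$ (disease score) and $S$ (the set of complaints selected by the doctor) are notation introduced here for clarity, not the paper's own.

```latex
% Cumulative score of complaint C_j over the diseases that exhibit it
% (assumed form of Equation (3)), using h_{ij} = e_{ij} W_i from Equation (2):
h_j = \sum_{i \,:\, C_j \text{ occurs in } D_i} h_{ij}
    = \sum_{i \,:\, C_j \text{ occurs in } D_i} e_{ij} \, W_i

% Score of disease D_i for the selected complaint set S (H_i is assumed notation):
H_i = \sum_{j \,:\, C_j \in S,\; C_j \text{ occurs in } D_i} h_{ij}

% Heart-failure example from Figure 14, with HF denoting heart failure:
H_{\mathrm{HF}} = h_{\mathrm{HF},\,\text{chest pain}}
                + h_{\mathrm{HF},\,\text{dyspnea}}
                + h_{\mathrm{HF},\,\text{leg edema}}
```

The first sum is what ranks entries on the symptoms screen, and the second is what ranks entries on the diseases screen once complaints have been selected.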
A comparison of the proposed framework with other approaches is presented in Table 2 using the three aspects of data-driven CPOE, clinical pathways in CPOE, and data-driven clinical pathways. Most studies presented CPOE with respect to clinical pathways or data-driven approaches separately. Very few studies considered data-driven approaches along with clinical pathways to develop CPOE. Therefore, the major contribution of this research is the integration of patient data with clinical pathways. Another contribution is the development of the mathematical formulation for calculating the association score, together with the user interfaces of the supporting application. The working of the proposed framework is based on the calculation of the association score, which is performed using the complaints specified by the user.
The proposed system aims to provide uniform delivery of healthcare services to patients regardless of the experience and expertise of the doctors. Because the experience of doctors and established clinical practices are embedded in the user interface, the CPOE shows only the tests that should be prescribed in a particular case and restricts the doctor to prescribing only the preferred medicines. Therefore, the CPOE helps doctors make informed decisions regarding the diagnosis of patients and enhances accuracy in treating patients, which, in turn, optimizes the healthcare delivery system of the hospital. The proposed framework and its supporting application have been validated by senior doctors from different specialties who tested the application against different use cases. The validation form used to collect feedback from specialists is shown in Figure 15.

VI. CONCLUSION
This work provides a novel contribution by proposing a knowledge- and data-driven framework for a CPOE system. The knowledge-driven approach in the form of clinical pathways incorporates evidence-based care. The data-driven approach can utilize the bulk of medical data generated. The proposed framework addresses a major concern related to the healthcare industry regarding uniformity in healthcare delivery. There is considerable variation in the diagnosis and treatment of patients depending on the experience and expertise of the doctors. This framework can help meet the challenge regarding inappropriate variance in patient care. This framework is based on the integration of patient data with clinical pathways for the extraction of disease quadruples. These quadruples, each of which comprises a list of symptoms, tests, results, and medications for a particular disease, are populated into a knowledge base. The data used for the proposed system are obtained from two datasets that were combined with the help of specialists and clinical knowledge using the common attribute of disease to generate data with more attributes to aid decision making. The proposed framework was explained in detail by elaborating the structure of its layers, i.e., the Resource Fusion Layer, Data Analysis Layer, and CPOE Layer. The mathematical relations used to calculate the association scores of symptoms and diseases were specified as part of the detailed working of the system. To provide an understanding of the proposed framework, four use cases involving different patients were elaborated in the form of the stepwise working of the CPOE system. A comparison of the proposed framework with other approaches was shown using the three aspects of data-driven CPOE, clinical pathways in CPOE, and data-driven clinical pathways. The proposed framework can help doctors make informed decisions regarding the diagnosis of patients and enhance accuracy in treating patients.
This research could be helpful in supporting clinical decision-making and can be applied in medical practices.