Causal Artificial Intelligence for High-Stakes Decisions: The Design and Development of a Causal Machine Learning Model

A high-stakes decision requires deep thought to understand the complex factors that stop a situation from becoming worse. Such decisions are made under high pressure, with incomplete information, and in limited time. This research applies Causal Artificial Intelligence to high-stakes decisions, aiming to encode causal assumptions based on human-like intelligence and thereby produce interpretable and argumentative knowledge. We develop a Causal Bayesian Network model based on causal science, using d-separation and do-operations to discover a causal graph aligned with cognitive understanding. Causal odds ratios are used to measure how well the causal assumptions agree with real-world data and thus to test the compatibility of the proposed causal model. Cause-effect relationships in the model are verified using causal p-values and causal confidence intervals, and the probability that they arise by random chance is less than 1%. The results show that the causal model can encode cognitive understanding as precise, robust relationships. The model design allows software agents to imitate human intelligence by inferring potential knowledge, and to be employed in high-stakes decision applications.


I. INTRODUCTION
Critical events are unexpected situations that severely affect citizens (e.g., by causing serious injury or death), infrastructure (e.g., via transportation damage or communications failure), and government (e.g., with economic crises or financial loss). These situations lead to high pressure and life-and-death trade-offs where decision-makers must make crucial choices that may impact the daily life of millions of citizens. One constraint on making such decisions is the numerous low-probability, high-consequence situations that may arise due to uncertain and complicated factors. In addition, high-stakes decisions are limited by time and knowledge, even as they must protect against the severe consequences of failure. A knowledge discovery-based approach is required for such high-stakes management, to provide the right knowledge to the right users at the right time.
Event explanation based on Causal Artificial Intelligence (Causal AI) is often used interchangeably with eXplainable Artificial Intelligence (XAI) in critical event management. It is a well-known concept that is driven by observational evidence to explore knowledge for decision-makers [1], [2]. Machine Learning (ML) is a specific class of algorithms that can supply the causality required by Causal AI, and it has become an essential ingredient of the knowledge discovery-based approach to event explanation [3].
Ofli et al. [4] and Kumar et al. [5] employed ML-based deep neural networks using real-time evidence for detecting and explaining high-stakes events with high accuracy. Formosa [6] proposed an approach for traffic conflicts using proactive safety management strategies, while Anbarasan et al. [7] introduced a technique for high-stakes events during flood disasters. Both achieve high accuracy for better decision-making, but current deep learning focuses on detection and explanation performance rather than on supporting high-stakes decisions. Deep learning is notably a "black box", discovering events by estimating enormous sets of parameters with complicated representations. It does not provide fundamental knowledge to interpret critical events as human-like arguments, or explanations of why and how critical events happen. These missing features make it unsuitable for making high-stakes decisions.
Rudin [8] has strongly argued that black-box models cannot offer a human-like interpretation of knowledge for the high-stakes decision process. For example, early deep learning-based black-box models reached a performance of over 95% but could not handle simple questions such as "Why was this output predicted?" and "Why were other solutions not predicted?". Only knowledge based on human-like interpretation can answer these kinds of questions, and it plays a key role in helping authorities understand and react to critical events. High-stakes decisions in critical event management require a new paradigm of machine learning that goes beyond general event explanation towards cognitive event interpretation.
Causal AI lets machine learning describe the cognitive reasons for predicted output based on human-like interpretations [9]. It aims to produce reasons for "Why" and "How" events happen given current evidence regardless of outcomes, and so synthesizes plausible arguments and interpretations that decision-makers can utilize. Critical event interpretation should take advantage of Causal AI-based machine learning to produce practical knowledge for high-stakes decisions. This needs causal knowledge produced by human-like intelligent agents, which will help interpret the events that may critically influence the future.
The main contributions of this research are:
• A fundamental interpretation principle based on Causal AI for high-stakes decisions;
• Causal AI-based machine learning for event interpretation-based high-stakes decisions;
• Proof that Causal AI-based machine learning can encode high-stakes knowledge, which converges towards human-like intelligence.
The current limitations of machine learning-based high-stakes decisions are examined in section II; background on causal science based on human-like intelligence is given in section III; section IV investigates causal encoding for high-stakes decisions and its outstanding properties; section V presents a case study of critical events in high-stakes decisions; section VI measures the causal paths in the model compared with human-like interpretations; and conclusions and future work directions appear in section VII.

II. RELATED WORK
This section reviews the recent technologies and trends for making high-stakes decisions based on machine learning while highlighting the limitations that high-stakes decision-making must address.

A. HIGH-STAKES DECISION MAKING
High-stakes decision-making aims to prevent the worsening of a situation such as the occurrence of serious injuries and death during a first-aid incident [10]. However, the process is limited by incomplete, insufficient, and conflicting evidence, which may cause authorities to make poor decisions.
To close the gap, research on first-aid decision-making has paid attention to event descriptions using big data [11] and machine learning [12]. Devaraj [13] and Madichetty [14] used machine learning to identify requests for urgent help in critical conditions, while Sarkar et al. [15] predicted injury severity.
Yu et al. [16] employed a case-based reasoning system for supplying timely information to help authorities. Kuo et al. [17] utilized machine learning for time predictions and identified time as a key factor in high-stakes decision-making. Yu et al. [18] applied machine learning to identify susceptible areas related to a natural disaster, while Zhao et al. [19] examined locations such as public buildings in man-made disasters that affect decision-making. Clearly, spatiotemporal analysis plays a key role in high-stakes events [20] [21].
Although these studies identified factors that are helpful for high-stakes decision-making, none of them proposed meaningful relationships among those factors to aid deep explanation.

B. BAYESIAN NETWORKS
Bayesian Networks (BNs) are an interpretable probabilistic machine learning approach. They interpret cause-effect relationships using the conditional dependence structure between random variables, represented as a Directed Acyclic Graph (DAG). The DAG lets agents predict outcomes and plausibly explain how and why the results are produced.
Zhou et al. [23] proposed a BNs model to generate if-then rules to assess the risks of a shipping service. Moreira et al. [24] proposed a BNs-based approach as an explainable model for providing insightful information in decision-making. Although these studies claimed their models provided practical explanations, they did not consider information to explain how interventions could change outcomes to support high-stakes decision-makers [3]. For example, high-stakes decisions require knowing how to provide the assistance requested by an injured victim, considering where and when the event happened. These interpretations help authorities plan and respond to conditions appropriately.
Uncovering such hidden knowledge requires critical thinking, a fundamental human principle for synthesizing knowledge intelligently. It is a capability of Causal AI that software agents must imitate to model human-like intelligence. Applying that concept at the implementation level remains a challenge in Causal AI, which is still in its infancy in recent AI applications [25].

C. CRITICAL THINKING
Critical thinking is a requirement for supporting scientific event explanations. General critical thinking typically uses 5W1H (Who, What, Where, When, Why, and How) to extract and describe events. For example, Yu et al. [20] investigated event detection to support decisions. Sahoh and Choksuriwong [26] and Abebe et al. [27] employed a semantics-aware event-based approach and discussed how critical thinking could be utilized in intelligent high-stakes systems. Xu et al. [28] proposed heuristic-based event descriptions using critical thinking for detection. However, these studies did not consider the interpretation of the circumstances that led to catastrophe. Event interpretation requires answers to Why and How questions, which necessitate the use of high-level cognition to describe the events from the viewpoint of human-like intelligence.
Pearl and Mackenzie [29] defined How questions as interventional questions, where software agents are asked to describe their reasons (e.g., how did the critical event happen?). Why questions are counterfactual questions, where software agents must interpret contrastive events (e.g., why not a different event?). These kinds of questions are outside the bounds of the current literature although they are very important. For example, suppose the current evidence posits that there is around a 1% chance of a catastrophic incident, but that when it does occur the impact will affect a high-density population zone. Clearly, authorities should ask for reasonable deep knowledge so they can take a proactive approach to protect their citizens. Unfortunately, current critical event description approaches cannot answer Why and How questions. Instead, the burden is passed to the decision-makers as additional time-consuming and labor-intensive tasks. Critical event interpretation needs a way to model Why and How answers cognitively.
Our approach aims to contribute Causal AI based on BNs that apply critical thinking concepts to provide human-like interpretation, called Causal Bayesian Networks (CBNs). Our research challenges are 1) How to model high-stakes knowledge to provide reasonable answers based on Why and How?, and 2) What are the fundamental concepts for encoding human-like interpretation to construct such an approach?

III. CAUSAL BAYESIAN NETWORKS FOR HIGH-STAKES DECISIONS
CBNs satisfy causal science, which aims to produce interpretable and argumentative conclusions for high-stakes decisions based on visible evidence and prior knowledge. CBNs are a core component of agent architecture that helps agents infer plausible information [30]. Causal science consists of three main concepts: 1) questions that we need to ask software agents to reach conclusions, 2) background knowledge that software agents employ as initial ground truth, and 3) evidence that software agents can obtain from the environment [29]. The general components of causal science are shown in Figure 1. Figure 1 has three main elements: 1) evidence (E) taken from the real-world environment, 2) knowledge (K) encoding prior experience for plausibly interpreting the evidence, and 3) desirable conclusions (C) generated to answer the questions. Causal science plays a key role in connecting the real world to stakeholders because it can be applied with Why and How critical thinking to serve high-stakes decision-making. This section will explore several technologies based on causal science for modeling high-stakes decisions.

A. CAUSAL QUESTIONS BASED ON CONCLUSIONS FOR HIGH-STAKES DECISIONS
Causal questions using critical thinking (5W1H) produce interpretable conclusions because good questions help people understand the chaotic real world [31]. Human thought is encoded in the form of assumptions based on human-like interpretation, as proposed by Pearl [32]. Examples of causal questions for high-stakes decisions are shown in Table 1.
The causal questions in Table 1 are differentiated into three levels: associations, interventions, and counterfactuals. Each type is essential for software agents to mimic human-like interpretation.
Association allows a software agent to answer a related question using basic statistical conditions (e.g., detection, description, and prediction), which lets the software agent directly match the related object to exact events. For example, features such as {gunshot, shooter, gunfire} can match a {shooting} event, while features {explosions, suspicious packages, suicide attacks} match {bombing}.
Intervention is a medium-level ability that lets software agents decide on future actions. It fixes some events (e.g., x = {shooting}, z = {rural area}) and then interprets how future scenarios are affected (e.g., y = {basic medical first aid}). Intervention happens daily when authorities need to understand upcoming trends. It allows software agents to mimic human-like thinking when they have to decide the best actions with the lowest uncertainty in the real world. This cannot be based on raw data alone, regardless of its size, but must also make assumptions based on cause-and-effect relationships. Software agents benefit from this by being able to simulate scenarios and present snapshots of possible futures. These are utilized by the authorities for early planning, the issuing of warnings, and preventive measures.
A counterfactual is a high-level ability that relies on human imagination. It cannot be derived from associations or interventions because the situation has not happened. For example, given the current situation x = {shooting} we might ask "what would y_x be if x had been a bombing (¬x), and it had happened in a crowded area?". This ability is needed so that software agents can adapt themselves to situations they have not experienced.
Both interventions and counterfactuals go beyond traditional AI, and need human-like ability to interpret their answers. They require an explicit model based on causal relationships that can interpret both how and why answers.

B. THE CAUSAL CONCEPT FOR HIGH-STAKES DECISION
Every conclusion reached by human decision-making employs rational reasons based on knowledge and evidence [33]. Cause-and-effect comprehension is a fundamental principle for obtaining answers to causal questions that support high-stakes decisions. Cognitive comprehension is shown in the form of a simple diagram in Figure 2. The figure shows how evidence can be used to interpret conclusions, while conclusions can argue causes. These processes are called interpretable and argumentative abilities. Argumentation provides potential causes once a conclusion has been discovered, so that the software agent can answer "What was the cause of the emergency first aid?". The answer can be argued by finding the most probable cause of the conclusion based on prior knowledge, such as "heavily injured victims are likely to be the cause". On the other hand, interpretation may provide alternative explanations of the most likely conclusion given the evidence. For example, given the evidence "victims with minor injuries" and the question "What kind of first aid should be prepared?", the software agent might answer "It can be basic first aid because the victims have minor injuries". Interpretation dynamically changes the confidence of the answers according to new evidence.
Human beings employ cause-and-effect comprehension daily to exchange knowledge because it is a powerful tool for interpreting complex events and making decisions. Software agents need to mimic this ability to produce knowledge by answering critical questions.

C. CAUSAL MACHINE LEARNING
High-stakes decisions should use Causal ML to empower software agents to uncover knowledge from observational evidence and produce interpretable reasons. Fortunately, Bayes' theorem [34] can be used to support cognition by employing observational evidence to interpret and argue an event's reason. Bayes' theorem can be written as equation 1.
The equation consists of four components: Posterior P(Effect | Cause) computes a conclusion given evidence, which is aligned with the interpretation process; Likelihood P(Cause | Effect) computes the evidence given a conclusion, which corresponds to the argumentation process; Prior P(Effect) encodes the likelihood of an occurrence of a conclusion known from the past; Evidence P(Cause) encodes the overall chance of new evidence without reference to the conclusion. Together they provide both the interpretation and the argumentation that serve the needs of the causal concept from topic B in section III.
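Written out with the four components named above, Bayes' theorem takes the following form. This is a reconstruction from those component names (Effect as the conclusion, Cause as the evidence), offered as a reading aid rather than as the original typesetting of Equation 1.

```latex
\underbrace{P(\text{Effect} \mid \text{Cause})}_{\text{posterior}}
  = \frac{\overbrace{P(\text{Cause} \mid \text{Effect})}^{\text{likelihood}}\;
          \overbrace{P(\text{Effect})}^{\text{prior}}}
         {\underbrace{P(\text{Cause})}_{\text{evidence}}}
```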
Causal Bayesian Networks (CBNs) handle complex problems based on Bayes' theorem by encoding causality using a Directed Acyclic Graph [35]. The DAG models random variables as nodes, and semantic meaning between the nodes as edges with statistical dependency weights called conditional probabilities. A conditional probability lets a random variable conditionally control the state of another random variable according to a causal assumption computed by equation 2.
We utilize Equation 2 to explain the causal concept from topic B in section III. P(Xi) encodes the possible effects that interpret events for supporting decision-making. Pa(Xi) encodes the possible causes of Xi that clarify the how and why answers made with Equation 1. For example, given the evidence "heavily injured victims" as Pa(Xi), the authorities may ask "What kind of first aid should be prepared?". The answer can be computed by estimating the most probable conclusion P(Xi) using Equation 2. In this way, CBNs represent human-like interpretation and are a powerful tool for supporting high-stakes decisions made by the software agents discussed in topic A in section III.
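For reference, the standard factorization of a Bayesian network over a DAG is sketched below. It uses the same symbols as the text (X_i for a node, Pa(X_i) for its parents) and is assumed here to be what Equation 2 denotes; the original typesetting may differ.

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid Pa(X_i)\bigr)
```

Each factor is one conditional probability table, so a root node with no parents contributes its prior P(X_i).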
Suppose we need to understand a critical accident in order to provide first aid assistance in a high-stakes situation. Three variables are considered: Impact I (e.g., minor injury, heavy injury, or death), Severity S (e.g., very severe, severe, or not severe), and First Aid F (e.g., basic or emergency first aid). They can be encoded by causal assumptions using CBNs as diagrammed in Figure 3. Figure 3 represents CBNs that can compute F based on two types of causal paths: a direct cause (S→F) drawn as a solid line and an indirect cause (I→F) drawn as a dashed line.

FIGURE 3. CBN-based First Aid Assistance for High-stakes Decisions.
A direct cause is a causal path that starts at one node and points directly to an ending node. For example, S→F (S is a cause of F) can be interpreted as accident severity directly influencing the first aid requirements. An indirect cause is a causal path that passes through intermediary nodes whose evidence is unobservable. In this case, only I = i is observable and influences F through S; even though S is currently unobservable, the semantic meaning for the potential effects on F is still produced. In other words, different kinds of accident impact I = i may require different help F, which is inferred through the intermediary variable S and computed by equation 4.
Equation 4 expresses how a CBN uncovers possible F events when given accident impact as evidence. It shows how a software agent suffering from poor evidence (e.g., only I = i) in uncertain situations is still able to compute an answer using causal understanding [36].
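The indirect path I→S→F can be illustrated with a small numerical sketch. It assumes the usual marginalization over the unobserved intermediary, P(F | I = i) = Σ_s P(F | S = s) P(S = s | I = i). The probability values and the function name `first_aid_given_impact` are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
# Hypothetical conditional probability tables (illustrative numbers only).
# P(S | I): severity given impact.
P_S_GIVEN_I = {
    "minor injury": {"not severe": 0.7, "severe": 0.2, "very severe": 0.1},
    "heavy injury": {"not severe": 0.1, "severe": 0.5, "very severe": 0.4},
    "death":        {"not severe": 0.0, "severe": 0.2, "very severe": 0.8},
}
# P(F | S): first aid type given severity.
P_F_GIVEN_S = {
    "not severe":  {"basic": 0.9, "emergency": 0.1},
    "severe":      {"basic": 0.4, "emergency": 0.6},
    "very severe": {"basic": 0.1, "emergency": 0.9},
}


def first_aid_given_impact(impact):
    """Return P(F | I = impact) by summing out the unobserved severity S."""
    result = {"basic": 0.0, "emergency": 0.0}
    for severity, p_s in P_S_GIVEN_I[impact].items():
        for first_aid, p_f in P_F_GIVEN_S[severity].items():
            result[first_aid] += p_f * p_s
    return result


posterior = first_aid_given_impact("heavy injury")
print(posterior)
```

With these made-up tables, observing only a heavy injury already shifts most of the probability mass towards emergency first aid, even though severity itself was never observed.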
Causal ML can be employed for high-stakes decision-making, especially during critical and insecure events. This research employs Causal ML based on CBNs to encode causal assumptions and produce answers to causal questions to help make better decisions.

IV. CAUSAL ENCODING FOR HIGH-STAKES DECISIONS
The main concern of high-stakes decisions is not only to achieve the best prediction but also to understand the uncertainty factors. These express the likelihood of events that can have a devastating impact, and the interpretations of the model are a fundamental requirement for authorities when making decisions.
For example, the time (T) of an accident and its location (L) must be considered by first aid services to understand how to access the area quickly [37]. Observing T = critical and L = crowded zone helps the authorities interpret any difficulties, because T and L are statistically associated and so are likely to occur together. However, in a real-world situation, a high correlation between T and L does not mean that they directly influence one another, since time does not change the location and vice versa. A hidden factor (H), or confounding bias, may be invisible but can causally connect them. Software agents must learn how to plausibly model T and L as bridged by H. Fortunately, causal graphs can represent this in the form of CBNs [38] [39]. The causal relationships can be encoded in several ways, and software agents can initially compute the dependency between T and L mediated by H based on d-separation [40], as shown in Table 2. Table 2 shows four different forms of causal graphs that encode causal assumptions when T and L are causally connected by H.
The chain, inverse chain, and common cause types show that the computation of T and L is independent T ⫫ L given H. In other words, if H is observed, then software agents can compute T or L depending only on H without needing further information, which is called d-separated. In contrast, if only T or L is observed, then computing them completely depends on the observation and must go through H. This makes all of them dependent, which is defined as d-connected.
The collider is a special graph form whose semantic meaning contrasts with the other forms. T and L are initially independent, or d-separated: T ⫫ L. But when H is observed, it makes T and L d-connected: T ⫫̸ L | H.
These causal graphs explain the semantic relationships between T and L through H. d-separation cuts and connects the nodes in CBNs, retaining the most relevant nodes for interpreting and arguing how and why a result is computed. Causal graphs and d-separation let software agents imitate human-like intelligence in high-stakes decision-making.
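The collider behaviour described above can be checked by brute-force enumeration on a toy model. The sketch below assumes the collider form T→H←L with made-up probabilities; the state names and the helper `prob` are hypothetical and serve only to show that T and L are independent marginally but become dependent once H is observed.

```python
from itertools import product

# Hypothetical parameters: T and L are independent root causes, H is their common effect.
P_T = {"critical": 0.3, "normal": 0.7}
P_L = {"crowded": 0.4, "not crowded": 0.6}
# P(H = difficult | T, L); P(H = easy | T, L) is the complement.
P_H_DIFFICULT = {
    ("critical", "crowded"): 0.95,
    ("critical", "not crowded"): 0.60,
    ("normal", "crowded"): 0.60,
    ("normal", "not crowded"): 0.05,
}


def joint(t, l, h):
    """Joint probability P(T = t, L = l, H = h) for the collider T -> H <- L."""
    p_h = P_H_DIFFICULT[(t, l)]
    return P_T[t] * P_L[l] * (p_h if h == "difficult" else 1.0 - p_h)


def prob(query, given=None):
    """P(query | given) by enumeration; both arguments are dicts over T, L, H."""
    given = given or {}
    num = den = 0.0
    for t, l, h in product(P_T, P_L, ("difficult", "easy")):
        world = {"T": t, "L": l, "H": h}
        if all(world[k] == v for k, v in given.items()):
            p = joint(t, l, h)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# Marginally, learning T tells us nothing about L (d-separated).
p_l = prob({"L": "crowded"})
p_l_given_t = prob({"L": "crowded"}, {"T": "critical"})

# Once H is observed, learning T changes the belief about L (d-connected).
p_l_given_h = prob({"L": "crowded"}, {"H": "difficult"})
p_l_given_h_t = prob({"L": "crowded"}, {"H": "difficult", "T": "critical"})

print(p_l, p_l_given_t, p_l_given_h, p_l_given_h_t)
```

In this toy model, knowing that access is difficult raises the belief in a crowded location, and additionally learning that the time is critical lowers it again, because the critical time already accounts for part of the difficulty.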
Although d-separation can encode criteria based on conditional independencies, it is unable to distinguish how the causal graph forms differ semantically. For example, the chain, inverse chain, and common cause types encode the same conditions (see the d-separation column in Table 2), so d-separation alone cannot express how they semantically differ from one another.
The do-operation goes a step beyond d-separation: it computes the target node by forcing some nodes to be constant. For example, the authorities may ask "What is L likely to be if H was restricted to h?", where L is the target node and H is set to h, which can be written as the algebraic causal question P(L | do(H = h), T). The computational functions that answer this question under the different causal graph forms are shown in Table 3. There are four ways to compute the answer, one for each graph type, and the appropriate one depends on the problem. From the viewpoint of human-like interpretation, consider the ability to access an accident area and give first aid. H represents the difficulty of accessing the area, and T and L represent contexts. Generally, T and L do not affect one another (e.g., the time period does not influence the choice of area) and the H factor does not influence either of them. Once the H factor is observed, however, T and L plausibly influence one another (e.g., if H = very difficult, then either T = critical or L = crowded zone must be true). Software agents can imitate such understanding by considering the semantic relationships between T, L, and H through d-separation and the do-operation. In particular, H is fixed to determine the (in)dependence between T and L. The function that computes the answer is therefore unlikely to be the chain, inverse chain, or common cause, because the collider-based causal graph is more suitable.
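How the same query P(L | do(H = h), T) yields different answers under different graph forms can be sketched with the truncated factorization, in which the factor of the intervened node is dropped and its value is fixed. The code below is a minimal illustration on binary variables with hypothetical probabilities; the helper names `joint` and `query` are assumptions of this sketch and do not reproduce the functions of Table 3.

```python
from itertools import product


def joint(parents, cpt, world, do=None):
    """Joint probability of `world`; factors of intervened nodes are dropped."""
    do = do or {}
    p = 1.0
    for var in parents:
        if var in do:
            if world[var] != do[var]:
                return 0.0  # worlds that contradict the intervention are impossible
            continue
        p_one = cpt[var][tuple(world[q] for q in parents[var])]
        p *= p_one if world[var] == 1 else 1.0 - p_one
    return p


def query(parents, cpt, target, do=None, given=None):
    """P(target = 1 | do(...), given) by enumeration over all binary worlds."""
    given = given or {}
    names = list(parents)
    num = den = 0.0
    for values in product((0, 1), repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in given.items()):
            continue
        p = joint(parents, cpt, world, do)
        den += p
        if world[target] == 1:
            num += p
    return num / den


# Chain T -> H -> L (hypothetical numbers; each CPT entry is P(node = 1 | parents)).
chain_parents = {"T": (), "H": ("T",), "L": ("H",)}
chain_cpt = {"T": {(): 0.3}, "H": {(0,): 0.2, (1,): 0.8}, "L": {(0,): 0.1, (1,): 0.7}}

# Collider T -> H <- L (hypothetical numbers; H keys are ordered as (T, L)).
collider_parents = {"T": (), "L": (), "H": ("T", "L")}
collider_cpt = {
    "T": {(): 0.3},
    "L": {(): 0.4},
    "H": {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.6, (1, 1): 0.95},
}

chain_do = query(chain_parents, chain_cpt, "L", do={"H": 1}, given={"T": 1})
chain_obs = query(chain_parents, chain_cpt, "L", given={"H": 1, "T": 1})
collider_do = query(collider_parents, collider_cpt, "L", do={"H": 1}, given={"T": 1})
collider_obs = query(collider_parents, collider_cpt, "L", given={"H": 1, "T": 1})

print(chain_do, chain_obs, collider_do, collider_obs)
```

In the chain, forcing H and observing H give the same answer for L, because H is the only cause of L. In the collider, forcing H leaves L at its prior, whereas merely observing H changes the belief about L. The inverse chain and common cause forms can be tried by supplying different `parents` and `cpt` dictionaries.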
This section has shown how software agents can imitate human-like intelligence using CBNs. CBNs semantically encode knowledge to deal with high-stakes problems which allow the agents to produce interpretable and argumentative knowledge. This research employs CBNs to construct a causal model to benefit high-stakes decision-making.

V. CASE STUDY: CAUSAL MACHINE LEARNING MODEL FOR HIGH-STAKES DECISION
The goal of this section is to develop a causal model to support decision-making in high-stakes management strategies. Therefore, the research questions are 1) what kinds of critical factors are relevant to high-stakes decision-making, 2) how to represent these factors to generate knowledge, and 3) how to validate that these critical factors causally explain events in real-world environments.
Oroszi [41] identified terrorism as a high-stakes situation, with intensive time pressure and high uncertainty, which must be handled with interpretable knowledge. Terrorism affects the well-being of people, disrupts the functioning of society, and is feared by countries around the world. Thailand is one of the top ten countries suffering from its impact [42]. From January 2004 to June 2019, Thailand had to deal with 20,323 terrorist attacks, with 6,997 people killed and 13,143 injured, as reported by Deep South Watch [43]. As a result, we have chosen terrorism in Thailand as the environment in which to build a high-stakes decision-making model. Time (T) and location (L) are general factors (as discussed in topic A in section II) for explaining the causal encoding of section IV. However, high-stakes issues require more than just time and location data to determine the trade-offs between low chance and serious consequences. Wang et al. [44] and Mujalli et al. [45] argued that accident types and their impact are also important factors for decision-making. Based on the literature, we highlight the critical factors in Table 4, which are represented by random variables. Table 4 shows the critical factors considered as random variables aligned with human critical thinking (5W1H). The variables can be categorized into dependent and independent groups. First Aid is the dependent variable, whose state changes during the experiment depending on the other factors. The other variables are independent: their states can occur randomly and control the dependent variable. In other words, the states of First Aid are exposed when the states of the independent variables are observed or measured. For example, the experiment can set First Aid = immediate response if we observe Accident = bombing, to compute the odds of an immediate response given that a bombing has occurred. This is similar to how people interpret a situation on an everyday basis.
However, all the values of the random variables are determined based on qualitative and abstract understanding. This digitization of human-like intelligence to support high-stakes decision-making scenarios is a challenge.
The digitization of human-like intelligence focuses on measuring the correspondence between the independent variables (X) and the dependent variable (Y). Conditional probability is employed to observe how likely the states of Y are to occur given the states of X. This can be symbolized by P(Y | X), where Y = {First Aid} and X = {Time, Location, Accident, Impact, Search and Rescue, Severity}. The basic hypothesis is that if event X = x and event Y = y are mutually relevant, then the conditional probability between them can be represented by a Conditional Probability Table (CPT). The matrix for P(Y = y_j | X = x_i) is formed as CPT_ij, where i ranges over the states of the independent variable and j ranges over the states of the dependent variable. Each row of this matrix must sum to one, that is, ∑_j CPT_ij ≈ 1 for all i.
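A CPT of this kind can be estimated from event records by counting co-occurrences and normalizing each row. The sketch below uses a handful of made-up records rather than the Deep South Watch data, and the function name `conditional_probability_table` is hypothetical.

```python
from collections import Counter, defaultdict


def conditional_probability_table(records, x_name, y_name):
    """Estimate cpt[x][y] = P(Y = y | X = x) from a list of dict records."""
    pair_counts = Counter((r[x_name], r[y_name]) for r in records)
    x_counts = Counter(r[x_name] for r in records)
    cpt = defaultdict(dict)
    for (x, y), n in pair_counts.items():
        cpt[x][y] = n / x_counts[x]  # normalize within each state of X
    return dict(cpt)


# Toy records (illustrative only, not real event data).
records = (
    [{"Location": "very-crowded", "First Aid": "immediate"}] * 3
    + [{"Location": "very-crowded", "First Aid": "monitoring"}] * 1
    + [{"Location": "not crowded", "First Aid": "monitoring"}] * 4
    + [{"Location": "not crowded", "First Aid": "immediate"}] * 1
)

cpt = conditional_probability_table(records, "Location", "First Aid")
for x_state, row in cpt.items():
    print(x_state, row, "row sum =", sum(row.values()))
```

Each row corresponds to one state of the independent variable, so the row sums are the quantities that should be approximately one.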
We employed 20,323 terrorism events taken from the Deep South Watch Database [46] to populate the CPT of P(Y | X). The probability outcomes of the dependent variable under the condition of the independent variables were represented as a matrix-based CPT, using a heatmap visualization to show the likelihood of events co-occurring (red to white). The associations between the X and Y sets are shown in Table 5. Table 5 displays how likely event Y is given event X using a color scale that ranges from red for higher probabilities to white for lower probabilities. For example, the conditional probability P(First Aid = immediate | Location = very-crowded) is 72% and P(First Aid = monitoring | Location = not crowded) is 80%, which means they are highly likely to co-occur. Clearly, when a "very-crowded" area is observed, an "immediate" response should be considered by authorities.
Although the pattern-based CPT represents the probabilities of two events co-occurring, it is a correlated relationship and does not plausibly signify causation between X and Y. For instance, in the above example, human intuition can understand that certain very-crowded areas, such as parks, casinos, and shopping malls, do not require an immediate response from first aid services (e.g., blood reserves, breathing apparatus, and recovery vehicles). This shows that relevant decisions depend on hidden factors that need to be encoded. In other words, causal science is required to model high-stakes events that require interpretations and arguments rather than purely highly correlated scores.

A. CAUSAL EFFECT MODELLING
The goal of this section is to determine causal relationships between random variables from Table 4 by imitating human commonsense to encode transparent and testable knowledge. For example, we discussed the semantic relationships between Location (L), Time (T), and Search and Rescue (SR) in section IV as represented by a collider-based causal graph that can be understood by software agents.
Accident (A), Impact (I), and Severity (S) do not occur randomly, since A might directly cause I while I influences S, so that A indirectly influences S. We can represent these causal relationships using a causal chain.
SR and S are usually independent except when the authorities ask for the likelihood of First Aid (F), which causes SR and S to influence one another. Relationships of these types can be discovered by observation (e.g., via statistical studies or by talking to experts). We draw these causal assumptions as the CBNs shown in Figure 4. The model represents L, T, and A as the root causes in the graph. They are d-separated and occur independently in physical reality, because they connect in the form of a collider-based causal graph. This means that the accident can happen anywhere and at any time; once the authorities ask about F, however, all of them become d-connected. For example, consider when the software agent observes that time T = normal but the situation response is F = immediate. The causal model generates a rational reason for this situation: S is most likely to be very severe, probably due to I = mortality, and the most probable reason is A = bombing. This is because its impact can affect a high-density population zone, which sets SR = difficult search since L = crowded area is highly likely. This shows how L, T, and A are linked when F is observed, in line with a d-connected graph.
An additional aspect is that SR and S directly cause F, so F is independent of the other factors once SR and S are known. For example, given I = mortality the authorities may ask how likely F will be. The graph shows that I and F do not directly influence each other but provides an intuition that I = mortality may cause S = very severe, which must trigger F = immediate. However, if S = very severe is observed, then it directly influences F = immediate without requiring any information from I, because S already summarizes I. This allows the software agent to interpret the situation and give a reasonable conclusion from the viewpoint of the causal model.
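The (in)dependence statements in this subsection can be checked mechanically with a d-separation test. The sketch below assumes the edge set L→SR, T→SR, A→I, I→S, SR→F, and S→F, as read from the description of Figure 4 in the text, and uses the standard moralized ancestral graph criterion. The function names are hypothetical.

```python
from itertools import combinations

# Parents of each node, as read from the textual description of Figure 4 (assumption).
PARENTS = {
    "L": [], "T": [], "A": [],
    "SR": ["L", "T"],
    "I": ["A"],
    "S": ["I"],
    "F": ["SR", "S"],
}


def ancestors(nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def d_separated(x, y, given=()):
    """True if x and y are d-separated by `given` (moralized ancestral graph test)."""
    keep = ancestors({x, y, *given})
    adjacency = {node: set() for node in keep}
    for child in keep:
        for parent in PARENTS[child]:          # undirected parent-child edges
            adjacency[parent].add(child)
            adjacency[child].add(parent)
        for p, q in combinations(PARENTS[child], 2):  # connect co-parents
            adjacency[p].add(q)
            adjacency[q].add(p)
    blocked = set(given)
    seen, stack = set(), [x]
    while stack:                                # search for a path that avoids `given`
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(adjacency[node] - blocked)
    return True


print(d_separated("L", "T"))                # root causes are independent
print(d_separated("L", "A", given=("F",)))  # observing F links the root causes
print(d_separated("I", "F"))                # I influences F through S
print(d_separated("I", "F", given=("S",)))  # S already summarizes I
```

Under the assumed edge set, the four checks agree with the narrative: the root causes are independent until F is observed, and I carries no further information about F once S is known.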
A causal model can interpret conclusions from high-stakes situations for two main reasons: 1) all the causal relationships can be interpreted by a software agent to generate How and Why answers for decision-makers, and 2) the relationships in the causal model are transparent, so the authorities and experts can troubleshoot cases where the model wrongly connects variables. If an answer from the model is contradictory, the model's structure can easily be revised and updated by experts.
The causal model can encode commonsense to support high-stakes decision-making based on qualitative design. This requires the real-world environment to be specified so that observational data can be collected to fit the parameters and transform the model into a quantitative representation that can be evaluated using model fitting.

B. DATA PREPROCESSING
Twitter [12] allows software agents to consume real-time, worldwide observations, and so can be used to collect information on dependent and independent variables. However, most tweets contain unimportant words, symbols, conjunctions, and abbreviations, so natural language processing is needed to handle them. Our information extraction technique is aligned with similar approaches [47][48] that detect the states of random variables from tweets. The overall architecture, shown in Figure 5, has five main components for extracting random variables and their states from Twitter. In stage 1), Tweet Streaming collects real-time tweets as text and feeds them into stage 2), Tokenization and Noise Removal, which splits the strings into tokens and removes insignificant values. In stage 3), Named Entity Recognition identifies the meanings of tokens using a Human Critical Thinking Model [26]. This identifies elements such as people, building and place, time, and accident. In stage 4), Variable and State Description, the states are matched with variables by employing the resulting contexts. In stage 5), Variable and State Creation, statements are generated from this information.
To show the difference between the input (raw tweet) and output (information) of the information extraction process, we sampled a tweet posted by @TichilaThaipbs, a field reporter for Thai PBS (Thai Public Broadcasting Service). Her tweet was evaluated by two emergency medicine physicians and three practitioners from Prince of Songkla University Hospital; they confirmed that the tweet showed that the situation called for immediate first aid. In addition, the tweet was converted into a form suitable for the software agent by our extraction process, resulting in Table 6.

[Bombing <bomb ∊ Accident> around highway road <highway road ∊ Location> in Pak-Bang village <village ∊ Location>, Thea-pa district <district ∊ Location>, Songkhla province <crowded ∊ Location>. One soldier <victim ∊ Impact> has been killed and six injured <one death, six injured ∊ Impact> reported at 12:38 PM Oct 1, 2020 <12:38 PM (critical) Oct 1, 2020 (Thai holiday) = crucial ∊ Time>]

The second row of Table 6 gives the tweet with semantic tags, with the extracted information in a format readable by a software agent. Light-gray text marks the tokens considered to be noise, while red text marks words that may denote states and variables. Italic text denotes states, and bold text denotes variables. These are linked by the subset symbol (∊) to denote the context of the text in the form of state and variable information, which becomes the observational evidence used by the causal model. In the next section, we propose a methodology for measuring the causal model with this evidence.
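The tokenization, noise removal, and state matching stages can be illustrated with a toy sketch. It assumes a small hand-written lexicon and noise list, both hypothetical. It does not reproduce the Human Critical Thinking Model [26] or the actual Named Entity Recognition stage, and the state name "injury" is invented for illustration.

```python
import re

# Hypothetical lexicon mapping a token to (variable, state); illustration only.
LEXICON = {
    "bombing": ("Accident", "bombing"),
    "highway": ("Location", "highway road"),
    "village": ("Location", "village"),
    "killed": ("Impact", "mortality"),
    "injured": ("Impact", "injury"),
}
# Hypothetical noise tokens that carry no variable or state information.
NOISE = {"around", "in", "has", "been", "and", "at", "one", "six", "reported"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract(text):
    """Drop noise tokens, then map the remaining tokens to variable states."""
    evidence = {}
    for token in tokenize(text):
        if token in NOISE or token not in LEXICON:
            continue
        variable, state = LEXICON[token]
        evidence.setdefault(variable, []).append(state)
    return evidence


tweet = ("Bombing around highway road in Pak-Bang village. "
         "One soldier has been killed and six injured")
print(extract(tweet))
```

A dictionary lookup stands in for the context-aware matching of stages 3) and 4); its output is a set of variable and state pairs of the kind that Table 6 presents as observational evidence.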

VI. EXPERIMENT SETUP
One of the best-known abilities of the causal machine learning model is its predictive ability. However, human-like intelligence extends far beyond pure prediction. The most challenging aspect is evaluating the plausibility of the knowledge in causal paths, a difficult task for which no standard tool exists to measure performance. This evaluation is key to allowing the model to interpret and argue about the reasons behind high-stakes decision-making.
The motivation of our experiment is to measure the rationality of causal paths in a DAG using observed data as evidence. Intuitions are tested by interpreting causality with d-separation, which states whether the relationships between variables are separated or connected.

A. MEASUREMENT METRICS
Causal Odds Ratio (Causal OR) measures the robustness of the causality between random variables. It is written as Causal OR(X, Y | Z), where Y is a set of dependent variables, X is a set of independent variables, and Z is a set of confounding bias variables. It measures how event X = x can influence event Y = y conditioned on event Z = z. Causal OR has been proposed as the basis for the do-operation [49], and we use it to measure our assumptions about the causal graph: X = x is an event of interest whose causality with Y = y must be measured, while X = ¬x covers the remaining events that are opposed to X = x. Y = ¬y is a reference set, formed by the majority of the sample events, that is employed to calculate the ratio between X = x and Y = y normalized by Y = ¬y. The Causal OR results can be semantically interpreted as:
• Causal OR ≈ 1: X = x does not change the likelihood of Y = y given Z = z (i.e., X ⫫ Y | Z)
• Causal OR > 1: X = x increases the likelihood of Y = y given Z = z (i.e., X ⫫̸ Y | Z)
• Causal OR < 1: X = x decreases the likelihood of Y = y given Z = z (i.e., X ⫫̸ Y | Z)
Causal OR shows the strength of the relationships between variables but cannot confirm whether they co-occur by chance or have statistical significance.
The Causal P-value is a probability score that expresses significance under the causal assumption about X and Y conditional on Z. A Causal P-value of less than 0.001 is considered highly significant, meaning that there is less than a 1 in 1000 chance that X and Y co-occurred by random chance given Z.
The Causal Confidence Interval (Causal CI) measures the precision of the Causal OR of X and Y conditional on Z. A 95% Causal CI gives the range within which the Causal OR of X and Y is expected to lie with 95% confidence.
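To make the three metrics concrete, the sketch below computes an odds ratio, a Wald-type 95% confidence interval on the log scale, and a two-sided normal-approximation P-value from a 2x2 table of counts taken within one stratum Z = z. The text does not spell out the exact estimators, so these standard formulas are an assumption and may differ from the ones used in this work. The function name and the example counts are hypothetical.

```python
import math


def odds_ratio_stats(a, b, c, d, z_crit=1.959964):
    """Odds ratio, 95% CI, and two-sided P-value for one stratum Z = z.

    a = count(X = x,     Y = y)      b = count(X = x,     Y = not y)
    c = count(X = not x, Y = y)      d = count(X = not x, Y = not y)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    log_or = math.log(odds_ratio)
    # Standard error of the log odds ratio (Wald approximation).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    # Two-sided P-value for H0: OR = 1 under the normal approximation.
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
    return odds_ratio, ci, p_value


# Hypothetical counts: no association versus strong association.
or_null, ci_null, p_null = odds_ratio_stats(50, 50, 50, 50)
or_strong, ci_strong, p_strong = odds_ratio_stats(90, 10, 10, 90)
print(or_null, ci_null, p_null)
print(or_strong, ci_strong, p_strong)
```

In the first call the odds ratio equals 1 and its interval contains 1, matching the d-separated reading. In the second the interval excludes 1 and the P-value falls below 0.001, matching the d-connected reading.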

B. DATASET
We collected evidence from field reporters to determine relevant events for all the random variables using the Twitter platform between May 16 and June 16, 2019, resulting in 100,000 transactions. The collected data were labeled by experts from Deep South Watch, a center for conflict studies, on the basis of national security-related decisions. The resulting variables and states were employed to estimate hyperparameters in the proposed CBNs model structure described in Section V using an Expectation-Maximization algorithm [50]. The CBNs model was then used to measure the cause-and-effect relationships involved in event interpretation.

C. RESULTS
We measured the causal model as detailed in topic B of Section V to examine the relationships between two variables (e.g., between X and Y) given a third variable (e.g., a certain value of Z) and to determine whether they are d-separated (e.g., X ⫫ Y | Z) or d-connected (e.g., X ⫫̸ Y | Z). We broke the model into sub-graphs in the form of triple-based graphs to measure their semantic relationships. The first triple-based graph uses Location (L), Time (T), and Search and Rescue (SR), the second employs Accident (A), Severity (S), and Impact (I), the third utilizes First Aid (F), S, and SR, and the last uses T, L, and F.

1) SUB-GRAPH 1
Given L → SR ← T or P(L, T | do(SR)), we set L as a dependent variable, T as an independent variable, and SR as a confounding bias variable. The hypothesis was "Is T necessary to interpret L given a certain state of SR?" The expectation: when the collider-based graph sets SR to do(SR = sr), T and L must be d-connected, T ⫫̸ L | do(SR = sr). In contrast, if SR is unknown and is set to do(SR = marginal), T and L must be d-separated, L ⫫ T | do(SR = marginal). Therefore, the Causal OR of the sub-graph given do(SR = marginal) should converge to 1.
We measured the first sub-graph utilizing our dataset, and the Causal OR of T and L given SR is shown in Table 7.
On the right of Table 7, the Causal OR of do(SR = marginal) is approximately 1, which means that L and T are generally d-separated when SR is not considered. The Causal P-Value indicates that, for T = difficult, there is around a 68% probability that the evidence arose by random chance, which means that the relationship between L and T is not statistically significant. Moreover, the value 1 lies within the Causal CI range for both T = difficult (i.e., 0.983-1.027) and T = critical (i.e., 0.967-1.005). These measured precisions for the d-separated relationship are consistent with our expectations.
In the case of a given state of SR, the rest of the table shows that the Causal OR of do(SR = crucial, difficult, normal) is far from 1, which indicates that L and T are d-connected. This is verified by the Causal P-Value being less than 1% by random chance and by 1 not being in the Causal CI ranges.
In summary, our causal encoding of L, T, and SR using a collider-based graph fits the real-world evidence.
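The collider behaviour expected for sub-graph 1 can be reproduced on synthetic data. The sketch below samples L and T independently, lets SR depend on both, and compares the odds ratio of L and T over all samples with the odds ratio inside the SR = 1 stratum. The probabilities are hypothetical, the variables are reduced to binary states, and "a given state of SR" is implemented by stratifying the samples, which is an assumption about how the given state is applied. None of the numbers relate to Table 7.

```python
import random


def odds_ratio(pairs):
    """Odds ratio of two binary variables from a list of (l, t) pairs."""
    n = {(l, t): 0 for l in (0, 1) for t in (0, 1)}
    for pair in pairs:
        n[pair] += 1
    return (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])


def simulate(n_samples=200_000, seed=0):
    """Sample (L, T, SR) from the collider L -> SR <- T."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        l = int(rng.random() < 0.5)        # L: independent root cause
        t = int(rng.random() < 0.5)        # T: independent root cause
        p_sr = 0.9 if (l or t) else 0.1    # SR is caused by both L and T
        sr = int(rng.random() < p_sr)
        rows.append((l, t, sr))
    return rows


rows = simulate()
or_sr_marginal = odds_ratio([(l, t) for l, t, _ in rows])
or_sr_fixed = odds_ratio([(l, t) for l, t, sr in rows if sr == 1])
print(or_sr_marginal, or_sr_fixed)
```

With these settings the marginal odds ratio stays close to 1, while the ratio within the SR = 1 stratum is far below 1. This is the same qualitative pattern that the text reports for the marginal and fixed columns of Table 7.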

2) SUB-GRAPH 2
Given L → F ← T, where F is a child of SR and the structure is therefore also considered a collider-based graph, we set L as a dependent variable, T as an independent variable, and F as a confounding bias variable. The hypothesis was "Is T necessary to interpret L given a certain state of F?" The expectation: when the collider-based graph sets F to do(F = f), T and L must be d-connected, T ⫫̸ L | do(F = f). This suggests the same trend as the hypothesis of sub-graph 1.
We measured this hypothesis utilizing our dataset, and the Causal OR of T and L given F is shown in Table 8. Table 8 shows that when F is set to a constant, the Causal OR of L and T displays the same trends as in Table 7. However, the Causal OR for T = critical is slightly different because of increased uncertainty. For example, the Causal P-Value of T = critical given F = immediate response is around 27% by random chance, and 1 is in the Causal CI range (i.e., 0.874-1.038), because F is indirect evidence of L and T and connects to them through SR. This reduction in confidence mirrors human understanding: indirect evidence confirms our beliefs less well than direct observation.

3) SUB-GRAPH 3
Given A → I → S, we set A as a dependent variable, S as an independent variable, and I as a confounding bias variable. The hypothesis was "Is S necessary to interpret A given a certain state of I?" The expectation: when the chain-based graph sets I to do(I = i), A and S must be d-separated, A ⫫ S | do(I = i). Therefore, the Causal OR of the chain-based graph given do(I = i) should converge to 1. In contrast, if I is undetermined by setting do(I = marginal), A and S must be d-connected, A ⫫̸ S | do(I = marginal).
We measured this hypothesis utilizing our dataset, and the Causal OR of A and S given I is shown in Table 9.
On the right of Table 9, the Causal OR of A and S given do(I = marginal) is far from 1, which means that they are d-connected. This is verified by 1 not being in the Causal CI range (i.e., 1.359-1.409 for S = critical and 1.193-1.236 for S = difficult), and the Causal P-Value shows that less than 1% of the evidence can occur by random chance. In other words, if the evidence from I is unobserved, the knowledge of S must be summarized from A as indirect evidence. The rest of the table shows that the Causal OR of A and S given do(I = crucial, difficult, and normal) is approximately 1, which means that they are d-separated. This is similar to human cognitive understanding: once I is known, it summarizes A, so A is no longer an important factor for interpreting S.
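The chain expectation can also be verified exactly, without sampling, by enumerating a small joint distribution. The sketch below assumes binary states and hypothetical conditional probabilities for A → I → S. It sums the joint over the chosen states of I and computes the odds ratio of A and S, so that fixing I yields exactly 1 while marginalizing I does not.

```python
from itertools import product

# Hypothetical conditional probability tables over binary states.
P_A = {1: 0.2, 0: 0.8}           # P(A = a)
P_I_GIVEN_A = {1: 0.8, 0: 0.1}   # P(I = 1 | A = a)
P_S_GIVEN_I = {1: 0.9, 0: 0.2}   # P(S = 1 | I = i)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, i, s):
    """P(A, I, S) factorized along the chain A -> I -> S."""
    return P_A[a] * bern(P_I_GIVEN_A[a], i) * bern(P_S_GIVEN_I[i], s)


def odds_ratio_a_s(i_values):
    """Odds ratio of A and S after summing the joint over `i_values`."""
    p = {(a, s): sum(joint(a, i, s) for i in i_values)
         for a, s in product((0, 1), repeat=2)}
    return (p[(1, 1)] * p[(0, 0)]) / (p[(1, 0)] * p[(0, 1)])


or_chain_marginal = odds_ratio_a_s((0, 1))  # I unobserved
or_chain_i1 = odds_ratio_a_s((1,))          # I fixed to 1
or_chain_i0 = odds_ratio_a_s((0,))          # I fixed to 0
print(or_chain_marginal, or_chain_i1, or_chain_i0)
```

Because P(S | I) does not depend on A, the factors involving S cancel in the ratio whenever I is fixed. This is the algebraic reason the odds ratio collapses to 1 in the fixed columns and stays away from 1 in the marginal column.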

4) SUB-GRAPH 4
Given SR → F ← S, we set SR as a dependent variable, S as an independent variable, and F as a confounding bias variable. The hypothesis was "Is S necessary to interpret SR given a certain state of F?" The expectation: when the collider-based graph sets F to do(F = f), SR and S must be d-connected, SR ⫫̸ S | do(F = f). This expectation displays the same trend as the hypotheses in sub-graphs 1 and 2.
We measured this hypothesis utilizing our dataset, and the Causal OR of SR and S given F is shown in Table 10.
On the right of Table 10, the Causal OR for SR ⫫ S | do(F = marginal) is approximately 1, which means that SR and S are d-separated if F is unexplored. In the rest of the table, SR and S depend on each other semantically when F is given.

D. DISCUSSIONS
The proposed CBNs in Section VI-C let software agents decompose the model and select the relevant variables for inferring knowledge based on d-connection and d-separation. The structure consists of cause-and-effect relationships and uses the evidence dynamically to determine the related random variables, cut off the unrelated ones, and produce high-stakes knowledge. For example, consider the question "What is the probability that search and rescue (SR) will be troublesome (SR = difficult), given that the incident occurs in the morning?" The CBNs treat the incident period as a critical time (T = critical), in line with common sense about rush hour. The location (L) is an unobserved variable. However, agents can still infer L from its marginal distribution because the high-stakes knowledge from sub-graph 1 shows that SR causally depends upon T and L. In contrast, software agents do not include A, I, and S in the inference process because the CBNs let them understand which variables are useful or useless based on d-connection and d-separation in high-stakes situations.
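The pruning described in this example can be sketched numerically. The code below assumes binary states and hypothetical conditional probability tables for all seven variables, with the edges L → SR ← T, A → I → S, and SR → F ← S as read from the text. It answers P(SR = difficult | T = critical) twice: once by brute-force enumeration of the full joint, and once using only L, T, and SR with L marginalized out.

```python
from itertools import product

# Hypothetical CPTs over binary states (1 = the named state, 0 = otherwise).
P_L = {1: 0.3, 0: 0.7}   # P(L = crowded)
P_T = {1: 0.4, 0: 0.6}   # P(T = critical)
P_A = {1: 0.2, 0: 0.8}   # P(A = bombing)
P_SR = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.1}   # P(SR = difficult | L, T)
P_I = {1: 0.8, 0: 0.1}   # P(I = mortality | A)
P_S = {1: 0.9, 0: 0.2}   # P(S = very severe | I)
P_F = {(1, 1): 0.95, (1, 0): 0.6, (0, 1): 0.7, (0, 0): 0.05}  # P(F = immediate | SR, S)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(l, t, a, sr, i, s, f):
    """Full joint, factorized by each variable's parents in the graph."""
    return (P_L[l] * P_T[t] * P_A[a] * bern(P_SR[(l, t)], sr)
            * bern(P_I[a], i) * bern(P_S[i], s) * bern(P_F[(sr, s)], f))


def p_sr_difficult_full(t):
    """P(SR = 1 | T = t) by summing the full joint over every other variable."""
    num = sum(joint(l, t, a, 1, i, s, f)
              for l, a, i, s, f in product((0, 1), repeat=5))
    den = sum(joint(l, t, a, sr, i, s, f)
              for l, a, sr, i, s, f in product((0, 1), repeat=6))
    return num / den


def p_sr_difficult_pruned(t):
    """Same query using only sub-graph 1, marginalizing the unobserved L."""
    return sum(P_L[l] * P_SR[(l, t)] for l in (0, 1))


print(p_sr_difficult_full(1), p_sr_difficult_pruned(1))
```

Both functions return the same probability (0.62 with these hypothetical numbers), which illustrates why an agent can leave A, I, S, and the unobserved F out of this particular query.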
CBNs help software agents deal with insufficient evidence because they can compute with both direct and indirect evidence. Indirect evidence is often treated as an outlier because of its uncertainty and is therefore excluded from the model. Although indirect evidence may produce an unclear outcome, it is still helpful if software agents can explain how and why such effects arise. This shows that the CBNs can encode sophisticated, human-like knowledge, especially in sensitive cases of high-stakes events.
Moreover, Tables 8-10 show that the same dataset yields different facts when the confounding bias variables are held constant. This problem is known as Simpson's paradox [35], and only experts can explain how and why it happens. The paradox may confuse non-expert decision-makers and cause difficulty in the high-stakes decision-making process. The CBNs help software agents recognize Simpson's paradox and deal with high-stakes situations effectively.

VII. CONCLUSIONS
High-stakes decision-making deals with highly uncertain events that have a low chance (of occurring) but have a high impact when they do. Interpretable knowledge is required to understand events to prevent bad outcomes.
This research used Causal AI for high-stakes decision-making by utilizing causal science to encode human-like intelligence. Causal encoding based on d-separation and do-operation was applied to model causal assumptions as represented by CBNs, with Causal OR, Causal P-Value, and Causal CI used to discover causal effects by measuring the commonsense behind a graph. Causal OR measured the robustness of the causality between random variables, Causal P-Value measured whether the Causal OR occurred with statistical significance, and Causal CI confirmed whether the Causal OR was precisely aligned with the evidence. Our experiment shows that CBNs can encode commonsense based on causal assumptions by measuring their rationality using observed data as evidence. The results confirm that employing a causal model can add a significant level of cognitive understanding to high-stakes decision-making.
In the future, we plan to develop an automatic mechanism to generate causal assumptions based on unknown scenarios. This is needed when the model is applied to a new environment and needs to evolve according to new evidence. We hope to enhance the model's flexibility by employing variational inference to generate potential samples for estimating causal paths. This will allow the model to learn unknown events in high-stakes situations.

APPENDICES
These four appendices give the functions, based on d-separation and do-operation, for testing whether two variables are semantically dependent or independent given a confounding bias variable. They employ the concept of conditional (in)dependence based on the chain rule, where P(X_1, X_2, …, X_n) = ∏_i P(X_i | X_1, …, X_{i-1}) transforms into P(X_1, X_2, …, X_n) = ∏_i P(X_i | Pa(X_i)) using CBNs.
According to Table 3 in Section IV, there are four types of causal graphs: causal chain, inverse causal chain, common cause, and collider. Each of them consists of three variables: T, H, and L, with T the dependent variable, L an independent variable, and H the confounding bias variable. The hypothesis is "Is T necessary to compute L given a certain state of H?".