Enhancing Alarm Prioritization in the Alarm Management Lifecycle

Despite significant improvements in control and safety systems, near-miss incidents and accidents continue to occur in industry. Humans play a vital role in the success or failure of process control through their responses to abnormal situations and alarms. A broad study of alarm system performance shows that good rationalization and accurate prioritization of alarms can increase the efficacy of alarm systems and improve operator decision-making. This paper discusses current gaps in alarm prioritization approaches and proposes a method based on graph theory and graph metrics to facilitate and improve the alarm prioritization process. The method builds a model from the causal and layer-of-protection relationships between alarms and then measures graph metrics for prioritization purposes. Finally, the proposed method is evaluated through implementation in a simulated case study. Results show that the approach achieves results similar to those of an alarm workshop, in a structured way and in a shorter time, and produces more valuable data on cascading abnormal situations.


I. INTRODUCTION
Alarm systems and operator responses to them are crucial in effectively managing process deviations and abnormal event situations. Reising et al. [1] define abnormal situations in process industries as ''any process or operation disturbance that requires an operator action promptly to restore the plant to a normal operating condition.'' These disturbances can emerge from complex interactions between process or system components; for instance, thousands of different abnormal situations or system malfunctions may occur in a medium-scale refinery [2]. Traditionally, from an alarm system design point of view, each abnormal event should activate an alert or alarm to notify the operator of a disturbance that must be managed before it escalates into an unwanted accident.
Over time, the complexity of process operations and the span of operator control have increased [3]. Indeed, the number of alarms per operator in process plants has increased exponentially, from fewer than 100 in 1960-2000 to approximately 4000 in recent years [4], [5]. Under this load, operators are expected to perform well and make appropriate decisions in all circumstances [6]. Issues such as alarm floods or nuisance alarms can cause miscommunication with the operator and loss of situation awareness, leading to a loss-of-control event or an unwanted shutdown. Nuisance and chattering alarms distract operators, whereas alarm floods easily confuse or overload them [7]. All of these factors degrade operator performance, because operators may fail to respond in the required order or to follow alarm priorities [7], [8]. As plants grow in size and complexity, alarm system issues such as chattering alarms, nuisance alarms, and alarm floods have become common.
The associate editor coordinating the review of this manuscript and approving it for publication was Yu Liu.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Chattering alarms repeatedly switch between activation and deactivation within a short period [9]; the resulting noise distracts the operator from proper decision making and correct action. They can also confuse operators by activating many other alarms [2]. Nuisance alarms, such as chattering, fleeting, or stale alarms, activate excessively or unnecessarily, or do not return to normal after the correct response is taken [9]. The most challenging nuisance alarms are failed alarms, which are caused by sensor faults such as complete failure, bias, calibration error, drift, and degradation [10]. Failed alarms are difficult for operators to recognize and manage. Fleeting alarms activate and deactivate quickly but do not necessarily recur [9]. Repeating alarms activate and clear repeatedly over a period [2]. Stale alarms activate but do not clear for at least 24 hours [9]. Standing alarms remain active for a long time [11]. An alarm flood occurs when the alarm activation rate in a given period exceeds the operator's ability to respond effectively [12], creating a stressful environment in which the operator must manage and understand a stream of activated alarms [13]. It is therefore necessary to understand the limits of operator performance in responding to each alarm. Operator performance studies indicate a response time of approximately 49 seconds per alarm; in other words, an operator needs roughly one minute to respond to an alarm and execute the correct response [14]. Therefore, to model a complete flood event accurately, the alarm flood calculation must consider adjacent time periods with high alarm activation rates [9].
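The flood and chattering definitions above can be checked mechanically against an alarm log. The Python sketch below is illustrative only: the thresholds (more than 10 activations per 10-minute window for a flood, 3 or more activations of one tag within 60 seconds for chattering) are assumed rule-of-thumb values, and the function names are hypothetical. Adjacent flagged windows are merged into one episode, reflecting the point that a flood calculation must consider adjacent high-rate periods.

```python
from datetime import datetime, timedelta

# Assumed thresholds (adjust to the site's alarm philosophy):
FLOOD_WINDOW = timedelta(minutes=10)    # window length for counting activations
FLOOD_LIMIT = 10                        # more than this per window = flood window
CHATTER_WINDOW = timedelta(seconds=60)
CHATTER_COUNT = 3                       # activations of one tag inside CHATTER_WINDOW


def flood_episodes(activations, start, end, window=FLOOD_WINDOW, limit=FLOOD_LIMIT):
    """Split [start, end) into back-to-back windows, flag windows with more than
    `limit` activations, and merge adjacent flagged windows into one episode.
    Returns a list of (episode_start, episode_end, activation_count)."""
    episodes = []
    t = start
    while t < end:
        count = sum(1 for a in activations if t <= a < t + window)
        if count > limit:
            if episodes and episodes[-1][1] == t:   # continues the previous flood
                s, _, total = episodes[-1]
                episodes[-1] = (s, t + window, total + count)
            else:
                episodes.append((t, t + window, count))
        t += window
    return episodes


def is_chattering(tag_activations, window=CHATTER_WINDOW, count=CHATTER_COUNT):
    """True if any `count` consecutive activations of one tag fall within `window`."""
    times = sorted(tag_activations)
    return any(times[i + count - 1] - times[i] <= window
               for i in range(len(times) - count + 1))


# Illustrative burst: 275 activations spread evenly over 11 minutes.
start = datetime(2022, 1, 1)
burst = [start + timedelta(milliseconds=2400 * i) for i in range(275)]
print(flood_episodes(burst, start, start + timedelta(minutes=30)))
# one merged episode, 00:00-00:20, with 275 activations
```

Fixed, back-to-back windows keep the sketch simple; a sliding window would detect floods that straddle window boundaries more precisely at a higher computational cost.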
To address alarm system issues, the American National Standards Institute and the International Society of Automation (ANSI/ISA) developed standards for alarm management systems in process industries and power plants in 2003. The first revision of the standard ANSI/ISA-18.2 ''Management of Alarm Systems for the Process Industries'' was made in 2009 to introduce the requirements and recommendations for effective alarm management. The most important feature of ANSI/ISA-18.2 is the introduction of a ten-stage alarm management lifecycle for abnormal situation management (ASM) [9], [15]. The U.S. Chemical Safety Board (CSB) recognized the importance of this revised standard by referring to it in accident investigations [9], [16]. The Engineering Equipment and Materials Users Association (EEMUA) introduced EEMUA-191 (Alarm systems: a guide to design, management and procurement) to complement ANSI/ISA-18.2. EEMUA-191 prescribes requirements for alarm systems and acknowledges human limitations in managing alarms. It introduced a method to define a manageable number of activated alarms, defined alarm flooding based on the number of activated alarms, and described processes to define alarm attributes [2]. Despite the work done on standards and guidelines, alarm issues are still commonplace in the process industries. This paper follows our previous research, in which a method for alarm rationalization was introduced [17], and takes a step towards practical solutions that comply with the standards and guidelines to address alarm floods. It does so by proposing a method for better alarm prioritization that reduces the number of unnecessary and less important alarms presented to the operator during abnormal situations. The aim is an alarm system that notifies the operator in a timely manner and supports the operator's decision-making process rather than hindering it.
The remainder of the paper is organized as follows. Section II reviews the literature and describes the alarm management challenges and the related engineering requirements. Section III introduces the proposed modeling and metrics to improve the prioritization process. Section IV shows the prioritization approach applied to the Tennessee Eastman Process (TEP) model. Section V provides a discussion. Finally, Section VI concludes the paper and explains future research directions.

II. LITERATURE REVIEW
System safety requirements for abnormal operational events must be considered to achieve a safe and productive operation without unnecessary trips or downtime [18]. ASM in large systems is critical for keeping an operation productive and safe [19]. Achieving the required control calls for a defence-in-depth, or layers-of-protection, approach that sometimes demands human intervention prompted by alarms. IEC 61511 defines the layers of protection shown in Fig. 1 to prevent or mitigate plant hazards [20], [21].
The process control layer, which includes the basic process control system (BPCS), controls operational variables automatically based on predefined logic to keep production close to the optimal level. The alarm layer is designed for operator intervention to prevent product loss, hazardous failures, or unnecessary shutdowns [22]. In alarm engineering, the design and implementation of effective alarms at each layer of control or safety are essential. Finally, the safety instrumented layer, which includes a safety instrumented system (SIS), measures operational variables to detect hazardous conditions and trigger automatic emergency shutdowns [20]. The layers described above are preventive layers of protection; the remaining layers are mitigative layers that reduce the consequences of an incident.
The failure of an alarm system as a layer of protection can lead to disastrous incidents. At the Buncefield Oil Depot, for example, a failed tank-level transmitter prevented the high-level alarm from activating, and the liquid level continued to rise to the 'ultimate' high level. The next protection layer, an independent high-level switch, then failed to trigger and stop the filling, and the ensuing tank overflow resulted in a loss of 1.6 billion USD [23], [24]. Failures of the layers of protection, including the alarm system, were also a major contributing factor in the BP Texas City refinery explosion, which caused fatalities, injuries, and a loss of 1.5 billion USD [25]. However, some alarm issues can be resolved by improving the design and rationalization of alarms. For instance, more precise alarm setpoints and an alarm deadband suited to the process conditions can significantly reduce chattering and nuisance alarms [26].
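The deadband effect can be illustrated with a small Python sketch of a high alarm with hysteresis. The signal values, setpoint, and deadband below are hypothetical; the point is that a value hovering around the setpoint activates the alarm repeatedly without a deadband, but only once with one.

```python
def count_activations(values, setpoint, deadband=0.0):
    """Count activations of a high alarm with hysteresis: the alarm activates
    when the value reaches `setpoint` and clears only once the value falls
    below `setpoint - deadband`."""
    active = False
    activations = 0
    for v in values:
        if not active and v >= setpoint:
            active = True
            activations += 1
        elif active and v < setpoint - deadband:
            active = False
    return activations


# Hypothetical noisy signal hovering around a high-alarm setpoint of 100.
signal = [99.5, 100.2, 99.8, 100.1, 99.7, 100.3, 99.6, 100.4]
print(count_activations(signal, 100.0))                # 4 (no deadband)
print(count_activations(signal, 100.0, deadband=1.0))  # 1 (deadband of 1.0 units)
```

The deadband size is a trade-off: too small and noise still causes chattering; too large and the alarm stays active long after the process has recovered, which can contribute to standing alarms.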
Alarm floods, on the other hand, are more complex for operators to cope with and require better alarm rationalization and prioritization because of the volume of active alarms [27]. For instance, at the Texaco refinery in Milford Haven, two operators received more than 275 alarms in less than 11 minutes, and the alarm system failed as a layer of protection [28]. This far exceeds the recommended ten alarms per ten minutes, which is considered the limit of an operator's ability to manage alarms safely [2].
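Taking the quoted figures as lower bounds, and assuming the roughly one-minute handling time per alarm cited in Section I, a quick calculation shows the scale of the overload:

```latex
\frac{275\ \text{alarms}}{11\ \text{min}} = 25\ \frac{\text{alarms}}{\text{min}}
  = 250\ \frac{\text{alarms}}{10\ \text{min}}
  = 25 \times \left(10\ \frac{\text{alarms}}{10\ \text{min}}\right)

\frac{275\ \text{alarms} \times 1\ \text{min/alarm}}{2\ \text{operators}}
  \approx 137.5\ \text{min per operator} \gg 11\ \text{min}
```

The alarm rate was at least 25 times the EEMUA-191 benchmark, and the handling workload was more than ten times the time actually available.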
In moderately sized process plants with thousands of configured alarms, alarm management is time-consuming for operations staff [29]. Therefore, tools and techniques are required to rationalize and prioritize alarms to ensure safety and maintain efficient plant operation [30]. Not only is responding promptly to each alarm a critical part of effective abnormal situation management, but the order of responses also affects alarm management performance [9]. The response order can be related to the alarm priority and to the operator's understanding of the root causes of abnormal events. This study takes a step towards addressing these gaps by developing a method to prioritize active alarms in a manner that improves the operator's understanding of progressing abnormal situations.

A. ALARM RATIONALIZATION AND PRIORITIZATION
The complexity of modern control and safety systems means that alarms are activated in ways that can overwhelm operators, making it difficult for them to respond to situations correctly [31]. Spurious and irrelevant alarms distract operators from their tasks and cause high-priority alarms to be ignored [32]. Alarm rationalization aims to reduce the number of alarms presented in abnormal situations through practices such as eliminating unnecessary or repeated alarms (to avoid operator overload) and revising alarm attributes (to reduce alarm issues and identify alarm priorities) [33]. Alarm attributes include setpoints, deadbands, and responses, such as the critical timing, the appropriate action, and procedures. These attributes should be verified and updated during commissioning and startup to reduce nuisance or failed alarms. According to ISA-18.2, appropriate alarm management is an ongoing activity that continues throughout the lifecycle, and all changes must be documented and revalidated according to their consequences and impact. Rationalization defines the minimum alarm set required for maintaining normal operation; it is essential because it removes redundant and unnecessary alarm load from the operator.
Alarm prioritization reduces the likelihood of critical alarms being missed and assists operator decision making [11]. Prioritization is a critical part of alarm rationalization, and each priority level is often defined by the severity of the consequences and the required response time [9]. EEMUA requires each alarm to be unique, related to a particular process variable, and assigned a priority level. During operation, operators may face hundreds of alarms and should be able to take the necessary corrective actions in the correct order, based on alarm priorities. ANSI/ISA-18.2 recommends rules for alarm prioritization, as shown in Table 1. If operators do not respond to alarms according to the required response times and priority levels, an alarm flood may result and lead to loss of plant control [34]. According to EEMUA-191, ten alarms per ten minutes is the limit of an operator's ability to manage alarms safely [2]. Alarm prioritization is usually achieved by considering the severity of the consequences as defined in a risk matrix [15], [35]. For example, some sources recommend three or four priority levels [9], while others recommend six [35]. In complex systems, each priority level may contain hundreds or thousands of alarms; hence, ordering the alarms within each level helps the operator return the process to its normal state more quickly [11].
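Since Table 1 is not reproduced here, the following Python sketch uses a hypothetical severity/response-time matrix (the cell values, severity labels, and tags are illustrative assumptions, not ISA-18.2 values; the response-time bands are borrowed from the TEP alarm philosophy in Section IV) to show how consequence severity and allowable response time combine into a priority level, and how alarms can then be ordered within each level. The `distribution` helper supports a common sanity check against the roughly 80% low, 15% medium, 5% high split often cited from EEMUA-191 guidance.

```python
from collections import Counter

# Hypothetical priority matrix: rows are consequence severity, columns are the
# maximum allowable operator response time.
MATRIX = {
    "minor":  {"<3 min": "medium", "3-15 min": "low",    "15-60 min": "low"},
    "major":  {"<3 min": "high",   "3-15 min": "medium", "15-60 min": "low"},
    "severe": {"<3 min": "high",   "3-15 min": "high",   "15-60 min": "medium"},
}
RANK = {"high": 0, "medium": 1, "low": 2}


def priority(severity, urgency):
    return MATRIX[severity][urgency]


def order_alarms(alarms):
    """alarms: list of (tag, severity, urgency, response_minutes).
    Sort by priority level first, then by shortest allowable response time,
    so alarms inside the same level are ordered as well."""
    return [tag for _, _, tag in sorted(
        (RANK[priority(s, u)], minutes, tag) for tag, s, u, minutes in alarms)]


def distribution(alarms):
    """Fraction of configured alarms at each priority level."""
    counts = Counter(priority(s, u) for _, s, u, _ in alarms)
    return {level: counts[level] / len(alarms) for level in RANK}


alarms = [  # hypothetical tags
    ("TAH-01", "major", "3-15 min", 10),
    ("PAHH-02", "severe", "<3 min", 2),
    ("LAL-03", "minor", "15-60 min", 30),
    ("FAH-04", "major", "<3 min", 3),
]
print(order_alarms(alarms))   # ['PAHH-02', 'FAH-04', 'TAH-01', 'LAL-03']
print(distribution(alarms))   # {'high': 0.5, 'medium': 0.25, 'low': 0.25}
```

A real matrix would come from the site's risk matrix and alarm philosophy; a result as top-heavy as this toy example would normally trigger a review of the severity assignments.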

B. ALARM MANAGEMENT CHALLENGES
Most of the challenges related to ASM in process plants occur during startup and when the operation runs close to its design limits to maximize productivity [4], [36]. Both situations require more human intervention because the operation is exposed to more disturbances and abnormal events. When the BPCS cannot control an anomaly, the operator must intervene to resolve the abnormal situation before the safety systems shut down the operation. Alarm systems tend to perform poorly when they are designed without careful consideration of human capabilities and limitations, as recommended by EEMUA-191 and shown in Table 2. Because adding alarms is relatively cheap with modern technology, design teams tend to configure more alarms than necessary, which leads to issues such as nuisance alarms and alarm floods. Therefore, to reduce human error in process operations, operator skills, tasks, and concentration levels must be considered in the alarm design process [32].

C. SAFETY-CRITICAL ALARMS
Safety alarms credited as IPLs are related to critical hazards or play a key role in controlling them. Safety-critical alarms are classified as highly managed alarms (HMA), which require well-developed documentation and procedures. According to IEC 61511/ISA-84, a safety-critical alarm applied to an SIS should be independent of the BPCS and considered an independent protection layer (IPL). Common failure modes that safety-critical alarms share with other elements, such as operators, procedures, and systems, need to be considered to verify the claimed risk reduction factor. Therefore, a human error analysis should be applied to estimate the operator's reliability in managing alarms within the maximum response time in the prescribed working environment. The process safety time is critical in determining the maximum response time available to achieve the required risk reduction for a process [20], [22]. EEMUA-191 recommends that corrective actions be described clearly and in sufficient detail. Periodic training and testing are required for operators to respond to alarms appropriately [2].
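Verifying a claimed risk reduction is typically done in a layer-of-protection analysis (LOPA) style calculation. The Python sketch below is a simplified illustration, not a method prescribed by IEC 61511: all numbers are hypothetical, the alarm layer receives no credit if the required response time does not fit within the process safety time, and its probability of failure on demand (PFD) is never credited better than an assumed floor of 0.1 (a risk reduction factor of at most 10), a value commonly used in LOPA for an alarm with operator response.

```python
from math import prod


def alarm_ipl_pfd(operator_pfd, response_min, process_safety_time_min, floor=0.1):
    """PFD credited to an alarm-plus-operator-response layer (simplified).
    No credit (PFD = 1) if the operator cannot act within the process safety
    time; otherwise the PFD is not allowed to be better than `floor`."""
    if response_min >= process_safety_time_min:
        return 1.0
    return max(operator_pfd, floor)


def mitigated_frequency(initiating_per_year, layer_pfds):
    """LOPA-style estimate: initiating-event frequency times the PFD of every
    independent protection layer."""
    return initiating_per_year * prod(layer_pfds)


# Hypothetical scenario: initiating event 0.1/year, alarm layer, and an SIS with PFD 0.01.
pfd_alarm = alarm_ipl_pfd(operator_pfd=0.05, response_min=10, process_safety_time_min=30)
freq = mitigated_frequency(0.1, [pfd_alarm, 0.01])
print(pfd_alarm, 1 / pfd_alarm, freq)   # PFD 0.1 (RRF 10); frequency about 1e-4 per year
```

Any common failure mode shared between the alarm and the initiating cause or another layer would invalidate the multiplication, which is why independence from the BPCS matters.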

D. ALARM PRIORITIZATION
Despite some improvements in the alarm rationalization and prioritization process, alarm floods and failures in managing alarms are still significant issues in abnormal situation management [30]. Therefore, an advanced alarm prioritization method is recommended that tracks the process and provides more detailed priority information, particularly during the peak alarm activation rate of a process upset [11]. If alarms can be prioritized in a way that matches real-time operational priorities, operators are more likely to maintain accurate situational awareness and respond to alarms more efficiently.
Several studies in the last decade have addressed the prioritization problem in alarm management. Naghoosi et al. [37] and Foong et al. [19] developed solutions based on fuzzy logic; Kondo et al. [38], Dorgo and Abonyi [39], and Niyazmand and Izadi [40] applied data mining to prioritize alarms; and Wunderlich and Niggemann [41], Stief et al. [42], and Naderpour et al. [43] used Bayesian network modelling. All of these approaches offer substantial ways to prioritize alarms, but they need sufficient operational datasets covering all operational modes to achieve good accuracy; therefore, they cannot be applied in the design phase, before such data exist. The motivation for this research is to develop a method that helps the design team prioritize alarms before commissioning and then use the priority index throughout the alarm management lifecycle.

III. ALARM PRIORITIZATION AND MANAGEMENT
This section presents the proposed alarm prioritization and management method, which uses design-stage alarm data to prioritize alarms. The method builds on our previous research on process alarm modeling using graph theory [17], in which each alarm is linked to the next alarm that would activate, following the process control and the flow from one piece of equipment to the next, or from a lower protection layer to a higher one, until a trip alarm is reached. This integrates all alarms into a single model on which further detailed analysis can be performed. The current paper develops a method for alarm prioritization based on graph metrics.

A. GRAPH THEORY AND METRICS
Graph theory is a powerful tool that has been used in the last decade to solve complex problems [44]. A graph $G = (N, L)$ consists of a set of nodes $N = \{n_1, \ldots, n_n\}$ and a set of links $L = \{l_1, \ldots, l_m\}$. Each graph is also described by its adjacency matrix $A(G) = \{a_{ij}\}$, an $n \times n$ matrix ($A \in \mathbb{R}^{n \times n}$) of zeros and ones in which $a_{ij} = 1$ if there is a link between $n_i$ and $n_j$; in a weighted graph, $w_{ij}$ is the weight of link $l_{ij}$. An eigenvalue $\lambda$ of $G$ satisfies $\det(A - \lambda I_n) = 0$; for an undirected graph, $A(G)$ is symmetric, so its eigenvalues $\lambda_1, \ldots, \lambda_n$ are real [45]. Graph metrics assign values to each node that reveal information about connectivity, importance, or clustering, and thus show patterns of connection and the detailed structure of the graph:
i. PageRank, introduced by Page et al. [47], is an iterative graph analytics algorithm that measures the transitive influence of nodes, that is, the influence of their neighbours. It estimates how often a random walk through the graph visits each node, using Equation 1 [48].
ii. Degree centrality (DC) counts the incoming and outgoing relationships of each node. With $Q(n_i)$ the number of nodes connected to $n_i$, the degree centrality $DC(n_i)$ is given by Equation 2.
iii. Betweenness centrality (BC) ranks each node by how many of the shortest paths between other nodes pass through it. With $S(n_j, n_k)$ the number of shortest paths between $n_j$ and $n_k$, and $S(n_j, n_k \mid n_i)$ the number of those paths that traverse $n_i$, BC is calculated by Equation 3 [49], [50].
iv. Closeness centrality (CC) ranks nodes by propagation patterns, based on the total shortest-path distance from a node to the other nodes, using Equation 4.
v. Dijkstra's algorithm is the classic method for computing the shortest (or minimum-weight) path between two nodes [49].
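As an illustration of three of these metrics, the plain-Python sketch below computes degree centrality, closeness centrality, and PageRank on a small directed graph stored as `{node: [successors]}`. These are standard textbook variants chosen as assumptions; Equations 1, 2, and 4 of the paper are not reproduced here and may use different normalisations. Degree is normalised by $n - 1$, closeness uses the reachable-node variant common for directed graphs, and PageRank uses power iteration with damping factor 0.85, with dangling nodes (such as trips) spreading their rank uniformly.

```python
from collections import deque


def degree_centrality(graph):
    """(in-degree + out-degree) / (n - 1) for a directed graph {node: [successors]}."""
    indeg = {v: 0 for v in graph}
    for succ in graph.values():
        for v in succ:
            indeg[v] += 1
    n = len(graph)
    return {v: (indeg[v] + len(graph[v])) / (n - 1) for v in graph}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of hop distances to those nodes
    (0 for nodes that reach nothing)."""
    cc = {}
    for s in graph:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        cc[s] = (len(dist) - 1) / total if total else 0.0
    return cc


def pagerank(graph, d=0.85, iters=100):
    """Power iteration; dangling nodes spread their rank over all nodes."""
    n = len(graph)
    pr = {v: 1.0 / n for v in graph}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in graph}
        for u, succ in graph.items():
            targets = succ if succ else list(graph)
            for v in targets:
                new[v] += d * pr[u] / len(targets)
        pr = new
    return pr


# Toy alarm graph (hypothetical): two branches converging on a trip T.
g = {"A": ["B", "C"], "B": ["T"], "C": ["T"], "D": ["C"], "T": []}
print(degree_centrality(g)["C"])       # 0.75
print(closeness_centrality(g)["A"])    # 0.75
pr = pagerank(g)
print(max(pr, key=pr.get))             # T
```

Because links point towards trips, PageRank accumulates at the trip end of the graph, whereas degree and betweenness highlight alarms where activation paths branch or merge.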

B. ALARM GRAPH MODELING (AGM)
A modern process plant has thousands of alarms, each with dozens of attributes; a powerful tool is therefore required to manage this volume of data. A relational database built from the master alarm database, including the consequences of inaction for every alarm, can be a suitable solution. The resulting relational database can be converted into a graph model in which each alarm is a node and the consequences of inaction are the links; other alarm characteristics are stored as node weights or attributes [51]. We previously developed a rationalization model called Alarm Graph Modeling (AGM) [17]. AGM is a graphical representation of alarms, based on causality and layer-of-protection characteristics, that displays the order of alarms, tracks cascading alarms, and helps detect nuisance alarms. In AGM, all defined alarms are nodes. Following the process logic, they are linked according to process variable correlations within each piece of equipment, and then from equipment to equipment or sub-system to sub-system along the process flow, ending in trip alarms [17].
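A minimal sketch of the relational-to-graph step, using Python's built-in `sqlite3` module. The two-table schema (`alarms`, `consequences`) and the tags are hypothetical stand-ins for a master alarm database. The second helper performs the kind of integrity check the AGM enables: listing alarms that have no path to any trip.

```python
import sqlite3
from collections import deque

SCHEMA = """
CREATE TABLE alarms(tag TEXT PRIMARY KEY, kind TEXT);   -- kind: 'alarm' or 'trip'
CREATE TABLE consequences(source TEXT, target TEXT);    -- consequence of inaction
"""


def load_agm(conn):
    """Turn the relational alarm data into a directed graph {tag: [successors]}."""
    kinds = dict(conn.execute("SELECT tag, kind FROM alarms"))
    graph = {tag: [] for tag in kinds}
    for src, dst in conn.execute("SELECT source, target FROM consequences"):
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph, kinds


def alarms_not_reaching_trip(graph, kinds):
    """Alarms with no directed path to any trip node (possible gaps in the model)."""
    trips = [t for t, k in kinds.items() if k == "trip"]
    reverse = {v: [] for v in graph}
    for src, succ in graph.items():
        for dst in succ:
            reverse[dst].append(src)
    reached, queue = set(trips), deque(trips)
    while queue:
        for pred in reverse[queue.popleft()]:
            if pred not in reached:
                reached.add(pred)
                queue.append(pred)
    return sorted(set(graph) - reached)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + """
INSERT INTO alarms VALUES ('TAH-01','alarm'), ('PAH-02','alarm'),
    ('PAHH-03','alarm'), ('FAL-09','alarm'), ('TRIP-R','trip');
INSERT INTO consequences VALUES ('TAH-01','PAH-02'), ('PAH-02','PAHH-03'),
    ('PAHH-03','TRIP-R');
""")
graph, kinds = load_agm(conn)
print(alarms_not_reaching_trip(graph, kinds))   # ['FAL-09']
```

In practice, further alarm attributes (setpoint, response time, consequence severity) would be joined onto the nodes from the same database.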

C. ALARM PRIORITIZATION METHOD
Prioritization is a critical part of alarm rationalization. We propose a systematic, metric-based solution to the prioritization problem that provides priority indexes (PIs) through the following steps:
Step 1: Set the objectives and boundaries of the investigation. This method studies the alarms and trips defined in the alarm system that relate to the alarm layer of protection.
Step 2: Develop a graph model based on AGM, with alarms and trips as nodes and causal relationships as directed links between them. More details on AGM can be found in our previous paper [17].
Step 3: Using the graph model developed in Step 2, calculate the following PIs:
• PI-1: The highest priority comprises the AGM alarms closest to an incident; these are usually safety alarms acting as the last barrier before a trip or plant emergency shutdown. They are identified with the graph distance function (distance from a trip).
• PI-2: The high priority category contains alarms with many outgoing links. If the operator fails to respond to one of them, alarm activation propagates further and can result in an alarm flood. These alarms are identified with PageRank and the out-degree function.
• PI-3: The medium priority category includes alarms with many incoming links; because these alarms connect several paths to a trip, a poor response to them changes the pattern of the ensuing failure. The in-degree and betweenness centrality functions support this step.
• PI-4: The low priority category contains the remaining alarms on the shortest paths ending in trip alarms. This step is supported by the shortest-path (Dijkstra) function [52].
• PI-5: The lowest priority category contains the remaining alarms.
Step 4: Use the developed AGM as a management-of-change integrity tool to evaluate the impact of changes.
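The PI rules of Step 3 can be sketched in plain Python on a toy AGM. This is an illustrative interpretation, not the authors' implementation: the thresholds `out_min`, `in_min`, and `bc_min` are hypothetical, PI-2 uses out-degree only (the PageRank check is omitted for brevity), and because the toy links are unweighted, breadth-first search stands in for Dijkstra's algorithm. The rules are applied in order, so each alarm receives the highest PI whose condition it meets.

```python
from collections import deque


def reverse_edges(graph):
    rev = {v: [] for v in graph}
    for u, succ in graph.items():
        for v in succ:
            rev[v].append(u)
    return rev


def distance_to_trip(graph, trips):
    """Hop distance from each node to its nearest trip (BFS over reversed links)."""
    rev = reverse_edges(graph)
    dist = {t: 0 for t in trips}
    queue = deque(trips)
    while queue:
        v = queue.popleft()
        for u in rev[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def betweenness(graph):
    """Brandes' algorithm for unweighted directed graphs, unnormalised scores."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma, dist = {v: 0 for v in graph}, {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def aptt_nodes(graph, dist, rev):
    """Nodes on one shortest activation path to trip from each root alarm."""
    on_path = set()
    for root in graph:
        if rev[root] or root not in dist:
            continue
        node = root
        while dist[node] > 0:
            on_path.add(node)
            node = min((v for v in graph[node] if v in dist), key=dist.get)
    return on_path


def assign_pis(graph, trips, out_min=2, in_min=2, bc_min=4.0):
    """Assign PI-1..PI-5 to every non-trip node; thresholds are hypothetical."""
    rev = reverse_edges(graph)
    dist = distance_to_trip(graph, trips)
    bc = betweenness(graph)
    path = aptt_nodes(graph, dist, rev)
    pis = {}
    for a in graph:
        if a in trips:
            continue
        if dist.get(a) == 1:
            pis[a] = 1   # last barrier before a trip
        elif len(graph[a]) >= out_min:
            pis[a] = 2   # fans out: if unanswered, it can cascade into a flood
        elif len(rev[a]) >= in_min or bc[a] >= bc_min:
            pis[a] = 3   # pivot where several activation paths meet
        elif a in path:
            pis[a] = 4   # remaining alarm on a shortest path to trip
        else:
            pis[a] = 5
    return pis


agm = {"S1": ["A", "C"], "A": ["B"], "C": ["D"], "D": ["B"],
       "B": ["HH"], "HH": ["TRIP"], "TRIP": []}
print(assign_pis(agm, {"TRIP"}))
# {'S1': 2, 'A': 4, 'C': 5, 'D': 5, 'B': 3, 'HH': 1}
```

In the toy model, B collects two activation paths and is flagged as a PI-3 pivot, while C and D lie on a longer, non-shortest route and fall to PI-5.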

IV. CASE STUDY
TEP is a process model introduced by Downs and Vogel in 1993. Since then, it has been widely used in process control research to investigate controllability challenges associated with its non-linear characteristics and open-loop instability [53]. The TEP model contains five major units: a two-phase reactor, a partial condenser, a separator, a stripper, and a compressor, as shown in Fig. 2 [54]. TEP includes a large number of measured process variables and manipulated variables. Two products (G and H) are produced from four reactants (A, C, D, and E); an inert trace component (B) and one by-product (F) are also present. The gaseous reactants are fed to the reactor, where they are transformed into liquid products. The following reactions take place in the gas phase [54]:
A(g) + C(g) + D(g) → G(liq)
A(g) + C(g) + E(g) → H(liq)
A(g) + E(g) → F(liq)
3D(g) → 2F(liq)

A. PROCESS CONSTRAINTS
Table 4 lists the process constraints related to equipment protection that need to be incorporated into the alarm system, alarm setpoints, and shutdown limits. The TEP model is based on a real non-linear process rather than a complex multi-component system [55]. The process has 41 measurements and 12 manipulated variables [53]. Ricker developed a multi-loop controller and a simulation in MATLAB/Simulink, including C-MEX S-functions [56].
A partial HAZOP was performed to identify the possible process safety hazards and abnormal situations of the TEP, including cascading events such as spillover or massive release; details are given in [57].

B. TEP MODEL SIMPLIFIED ALARM PHILOSOPHY
The alarm philosophy is developed for the TEP based on the applicable standards and best practices, which define alarm attribute requirements, rationalization and prioritization prerequisites, operation needs, change management, and audit requirements. The simplified alarm philosophy is as follows:

1) PURPOSE
The alarm system is in place to maintain normal operating conditions and prevent unnecessary trips or accidents. The operational alarms are connected to the plant BPCS, and safety alarms are connected to the SIS, including the fire and gas system (FGS) alarms. The SIS alarms also include safety-critical alarms activated before a trip occurs [20], [58].

2) DEFINITION
According to IEC 62682, each alarm should indicate a unique abnormal condition and have an explicit response action and an identifiable response time [59].

3) PRIORITIZATION
The priority of alarms should be clear to enable the operator to manage the underlying problem effectively. Each alarm should be distinguishable and logical to guide the operator and facilitate effective decision making [9].

C. ALARM SYSTEM DESIGN
The alarm philosophy summarized above was applied to the design of the alarm system, and the following hazards were considered in the simulation: equipment failure, overfilling, spillover, uncontrollable reaction, overpressure, over-temperature, and leakage or massive release [57]. Alarm setpoints are generally placed inside the operating design limits according to the rules in Table 5, unless an adjustment is required during commissioning and startup.
Alarm response times were defined according to the criteria below (Table 6), unless specific process characteristics required quicker or slower responses:
• Safety-critical alarms, or alarms related to the SIS (that cause a trip), should be responded to in less than three minutes to prevent spurious trips or accidents.
• Alarms related to fast process reactions should be responded to within 3 to 15 minutes.
• Alarms related to non-urgent abnormal events should be responded to within 15 to 60 minutes.
• Other abnormal events should be considered an alert with a response time over 60 minutes [60].
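The four response-time bands can be encoded directly, which makes the boundary cases explicit. The Python sketch below assumes that 3 minutes belongs to the fast band, 15 minutes still counts as fast, and 60 minutes still counts as non-urgent, a choice the criteria leave open; the function name and labels are hypothetical.

```python
def response_category(max_response_min, sis_related=False):
    """Map an alarm's maximum allowable response time (minutes) to the TEP
    alarm-philosophy bands; SIS-related alarms are always safety-critical."""
    if sis_related or max_response_min < 3:
        return "safety-critical (<3 min)"
    if max_response_min <= 15:
        return "fast (3-15 min)"
    if max_response_min <= 60:
        return "non-urgent (15-60 min)"
    return "alert (>60 min)"


for minutes in (2, 3, 15, 45, 90):
    print(minutes, response_category(minutes))
```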

D. TEP ALARM SYSTEM
The proposed alarm system is developed on the well-known MATLAB/Simulink simulation of the TEP by Bathelt et al. [55]. This simulation includes the original code developed by Downs and Vogel, with some minor updates to the C-MEX S-function component [61]. Past studies involving the TEP model focused mainly on control strategies that stabilize the whole process and on optimizing process operation. However, for alarm management applications, alarms must first be identified and implemented in the TEP simulation so that the process variables are monitored and alarms activate under abnormal conditions. In this study, 74 alarms were defined for process control and safety to monitor process variables such as pressure, temperature, flow, level, and quality. For this part of the study, the following three simple rules were defined for the alarm philosophy:
• High and low alarms are considered BPCS alarms; agitator, pump, and compressor failure alarms are also considered connected to the BPCS.
• High-high and low-low alarms are considered trip alarms, near misses, or incidents and belong to the SIS.
• Nine trips are defined for the TEP, including a reactor trip, condenser trip, cooling system trip, stripper trip, reboiler trip, downstream overfill, purge increase, and production loss. For simplicity, all of these trips are referred to as the plant trip in this study.
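With a consistent tag convention, the layer rules above can be applied automatically. The Python sketch assumes a convention inferred from tags used in this paper (LAH-01, TALL-12, TAHH-22): a variable letter (F flow, L level, P pressure, T temperature), the letter A for alarm, the limit (H, HH, L, LL), and a number. Tags outside this pattern, such as CPF-01, are assumed here to be equipment-failure alarms, which the rules assign to the BPCS. The regular expression and function name are hypothetical.

```python
import re

# Assumed tag convention: <variable>A<limit>-<number>, e.g. TAHH-22.
TAG = re.compile(r"^([FLPT])A(HH|LL|H|L)-(\d+)$")
VARIABLE = {"F": "flow", "L": "level", "P": "pressure", "T": "temperature"}


def classify(tag):
    """Return (variable, limit, layer): H/L alarms -> BPCS, HH/LL alarms -> SIS."""
    m = TAG.match(tag)
    if m is None:
        # Assumption: non-matching tags are equipment-failure alarms (BPCS).
        return (None, None, "BPCS")
    var, limit, _ = m.groups()
    layer = "SIS" if limit in ("HH", "LL") else "BPCS"
    return (VARIABLE[var], limit, layer)


for tag in ("LAH-01", "TALL-12", "TAHH-22", "CPF-01"):
    print(tag, classify(tag))
```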

E. DEVELOPING AGM FOR ALARM PRIORITIZATION
If an alarm is not responded to, its activation is likely to increase the probability that linked alarms will activate; connections are therefore drawn between the defined alarms by following the process flow, process control, and IPL logic. For example, flow alarms can increase the probability of activating level and pressure alarms (if the process fluid contains high-temperature vapour), temperature alarms can link to pressure alarms, level alarms on one piece of equipment can link to level alarms on adjacent equipment, high alarms can connect to high-high alarms on a higher protection layer, and so on. Finally, alarms end in a trip alarm, as illustrated in Fig. 3. The main advantage is that all alarms are captured in one model for further detailed studies and rationalization purposes, such as alarm justification and safety design review, as detailed in the rationalization paper [17]. This study applies the prioritization method introduced in Section III to the TEP. The AGM is connected, with no isolated alarms or disconnected parts, which can be a sign of good alarm system design because every abnormal situation can end in a trip alarm. Fig. 4 shows the shortest alarm activation paths to trip for the TEP, obtained with the graph ''Shortpath'' function; each path shows the diverse alarms that precede the trip alarm. The identified Activation Paths To Trip (APTT) can be used to provide more accurate guidance to operators at the design stage.
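Because every APTT is a shortest path to a trip, all of them can be extracted with a single multi-source Dijkstra search run backwards from the trip nodes. The Python sketch below assumes a dictionary-of-dictionaries graph and hypothetical link weights (for example, an estimated propagation time in minutes); with all weights equal to 1 it reduces to the hop-count shortest path. Tags and weights are illustrative.

```python
import heapq


def aptt(graph, trips):
    """graph: {alarm: {successor: weight}}. Returns {alarm: (cost, path)} with the
    minimum-weight activation path from each alarm to its nearest trip."""
    rev = {v: {} for v in graph}
    for u, succ in graph.items():
        for v, w in succ.items():
            rev[v][u] = w
    cost = {t: 0.0 for t in trips}
    nxt = {t: None for t in trips}          # next hop towards the trip
    heap = [(0.0, t) for t in trips]
    heapq.heapify(heap)
    while heap:
        c, v = heapq.heappop(heap)
        if c > cost[v]:
            continue                        # stale heap entry
        for u, w in rev[v].items():
            if u not in cost or c + w < cost[u]:
                cost[u] = c + w
                nxt[u] = v
                heapq.heappush(heap, (cost[u], u))
    paths = {}
    for a in cost:
        path, node = [a], a
        while nxt[node] is not None:
            node = nxt[node]
            path.append(node)
        paths[a] = (cost[a], path)
    return paths


agm_w = {"TAH-1": {"PAH-1": 2.0, "TAHH-1": 5.0}, "PAH-1": {"PAHH-1": 1.0},
         "PAHH-1": {"TRIP": 1.0}, "TAHH-1": {"TRIP": 1.0}, "TRIP": {}}
print(aptt(agm_w, {"TRIP"})["TAH-1"])
# (4.0, ['TAH-1', 'PAH-1', 'PAHH-1', 'TRIP'])
```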
Path A: A high-pressure scenario in the reactor causes a reactor trip and, consequently, a plant emergency shutdown. The left branch of this path results from a high temperature of the feed gases, and the right branch from a failure of the cooling system. Hence, on the left branch, the feed flow needs to be reduced, or the temperature of the feed gases needs to be controlled in the pre-treatment facilities. On the right branch, the operator needs to increase the cooling water flow or, in case of cooling system failure, interrupt the operation.
Path B: A low reactor temperature causes loss of production because it interrupts the reaction; the temperature of the feed gases should be maintained at the desired level. The outlet temperature alarm of the cooling system (TAL-16) indicates high flow or low temperature of the cooling water, which should be controlled to keep the reactor temperature in the normal range.
Path C: The stripper temperature should not fall below a specific limit; path C shows the low-temperature failure path of the stripper, which causes production or quality loss and a trip. This can happen because of a boiler failure or a low feed-line temperature.
Path D: This path relates to overfilling the reactor; the proper action is to stop feeding or to discharge the reactor. Otherwise, the situation progresses to a trip and can trigger pattern A if not appropriately treated.
Path E: The path is related to the low amount of the fluid to start the reaction that will lead to reactor trip unless the operator increases the filling rate into the reactor. Path F: The path shows the overfilling pattern of the separator, which needs to increase the discharge from the separator via the pump, or the purge rate will increase and cause the trip.
Path G: This path shows the pattern of overfilling the stripper, which may end in a boiler trip to prevent a leak or release.
Path H: This path shows the pattern of emptying the splitter, which causes trips of the boiler and the operation.
Path I: This path shows the failure pattern that increases the purge rate and causes a trip. The left branch of this pattern is related to high temperature and pressure, and the right branch to a discharge pump failure.
Path J: This path shows high temperature and pressure in the separator, which can also be triggered by path F.
Path K: This path shows the overpressure scenario in the stripper, which trips the boiler.
The patterns above are the major shortest alarm activation paths; they may occur individually or simultaneously, or trigger each other owing to the correlation between the process variables. These APTTs therefore provide an opportunity to define a more effective recommended operator response for each pattern of alarm activation; moreover, they can be used in the alarm workshop to justify the number and diversity of alarms in each path. Table 7 shows the graph metrics measured for the TEP based on the developed AGM. These metrics are used to assign alarms to the five proposed PI categories. The last column shows the PI ranking based on the graph metrics, alongside the current priority in the second column, which is assigned with the conventional method.
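The kind of metrics that feed a table such as Table 7 can be sketched in Python. The example below assumes the metrics are in-degree, out-degree, unnormalized betweenness centrality (Brandes' algorithm for unweighted directed graphs) and the hop distance to the nearest trip alarm; the exact metric set and normalization used in the paper may differ, and the graph and tags are hypothetical toy data, not TEP results.

```python
from collections import deque

# Hypothetical toy AGM (edge u -> v: unanswered alarm u makes alarm v more likely).
AGM = {
    "FAH-X1": ["LAH-X3", "PAH-X4"],
    "TAH-X2": ["PAH-X4"],
    "LAH-X3": ["LAHH-X3"],
    "PAH-X4": ["PAHH-X4"],
    "LAHH-X3": ["TRIP-X"],
    "PAHH-X4": ["TRIP-X"],
    "TRIP-X": [],
}
TRIPS = {"TRIP-X"}


def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for every alarm."""
    out_deg = {v: len(graph[v]) for v in graph}
    in_deg = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            in_deg[w] += 1
    return in_deg, out_deg


def betweenness(graph):
    """Brandes' algorithm: unnormalized betweenness for an unweighted digraph."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {v: -1 for v in graph}
        dist[s] = 0
        queue = deque([s])
        while queue:                    # BFS phase: count shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulation phase, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def hops_to_trip(graph, trips):
    """Fewest hops from each alarm to any trip, via BFS on the reversed graph."""
    rev = {v: [] for v in graph}
    for v in graph:
        for w in graph[v]:
            rev[w].append(v)
    dist = {t: 0 for t in trips}
    queue = deque(trips)
    while queue:
        v = queue.popleft()
        for u in rev[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


in_deg, out_deg = degrees(AGM)
bc = betweenness(AGM)
hops = hops_to_trip(AGM, TRIPS)
for tag in AGM:
    print(f"{tag:8s} in={in_deg[tag]} out={out_deg[tag]} "
          f"betweenness={bc[tag]:.1f} hops_to_trip={hops.get(tag)}")
```

Computing the distance on the reversed graph gives every alarm's distance to the nearest trip with a single search, instead of one search per alarm.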
PI-1: This category contains the critical alarms, which are essential for safe operation; it primarily includes high-high and low-low alarms. The PI-1 alarms match the results of the conventional method because of their consequences. In addition, some priorities change because of the alarm's essential role in the graph: TALL-12 and LALL-16 are moved to PI-1 because of their graph distance and out-degree, although they have medium priority under the conventional method.
PI-2: This category is allocated to alarms with high centrality and high out-degree, which are significant because of their importance and their potential to cause an alarm flood. Some alarms with low severity of consequences, e.g. LAH-01, LAH-09, LAH-13, LAL-10 and CPF-01, are assigned PI-2 because of their operational importance, namely the cascading effect derived from the AGM. This category of alarms cannot be captured with conventional methods, yet it will significantly affect the alarm management strategy for reducing the load of activated alarms.
PI-3: This category contains the pivotal alarms, not already included in PI-2, which divert the paths of alarm activation in case of an inadequate alarm response. If poorly responded to, these alarms are hard to identify as root causes from the downstream alarms, because they change the pattern of alarm activation. LALL-12, TAHH-22 and TAHH-25 are moved to this category because the graph analysis shows that they are not adjacent to the trip and that there are high and high-high pressure alarms on the downstream path to the trip. Still, they play a pivotal role and are placed in this category because of their metric values. Moreover, they do not need to be SIS alarms and should be treated as BPCS alarms.
PI-4: This category contains the alarms on the short paths to failure that are not included in the categories above. The short paths indicate the alarms on the time-critical path to the trip, which need to be responded to on time. All alarms in this category have low priority in the current TEP prioritization, but they need to be responded to after the previous alarm categories to stop the progression of the APTTs.
PI-5: This category contains the remaining alarms, with the lowest priority. It includes FAH-01, FAH-03, FAH-05 and FAH-07 from the medium-priority group, because the graph metrics show that there are enough alarms on their downstream paths.
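One possible way to encode the five categories as ordered rules is sketched below. The thresholds, the boolean safety-critical flag, the use of out-degree alone as a proxy for flood potential (the paper combines it with a centrality measure) and the use of betweenness for pivotal alarms are all assumptions for illustration; no numeric cut-offs are stated here, and in practice the assignments would still be reviewed in the alarm workshop. The metric values belong to a hypothetical toy graph, not to Table 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmMetrics:
    tag: str
    out_degree: int        # fan-out: how many alarms this one can cascade into
    betweenness: float     # how many shortest activation paths pass through it
    hops_to_trip: int      # fewest edges to any trip alarm
    safety_critical: bool  # assumed input, e.g. HH/LL alarm with severe consequence


# Hypothetical thresholds chosen only for this toy example.
FLOOD_OUT_DEGREE = 2
PIVOT_BETWEENNESS = 2.0
SHORT_PATH_HOPS = 2


def priority_index(m: AlarmMetrics) -> int:
    """Assign PI-1..PI-5 by checking the categories in priority order."""
    if m.safety_critical or m.hops_to_trip <= 1:
        return 1  # critical: safety-related or adjacent to the trip
    if m.out_degree >= FLOOD_OUT_DEGREE:
        return 2  # cascade / alarm-flood potential
    if m.betweenness >= PIVOT_BETWEENNESS:
        return 3  # pivotal: diverts the activation paths
    if m.hops_to_trip <= SHORT_PATH_HOPS:
        return 4  # on a short, time-critical path to trip
    return 5      # remaining alarms


# Metrics of a hypothetical toy AGM (not TEP data).
TOY_METRICS = [
    AlarmMetrics("FAH-X1", 2, 0.0, 3, False),
    AlarmMetrics("TAH-X2", 1, 0.0, 3, False),
    AlarmMetrics("LAH-X3", 1, 1.5, 2, False),
    AlarmMetrics("PAH-X4", 1, 3.5, 2, False),
    AlarmMetrics("LAHH-X3", 1, 1.5, 1, True),
    AlarmMetrics("PAHH-X4", 1, 2.5, 1, True),
]

ranking = {m.tag: priority_index(m) for m in TOY_METRICS}
for tag, pi in sorted(ranking.items(), key=lambda kv: kv[1]):
    print(f"PI-{pi}: {tag}")
```

Because the checks run in order, an alarm that satisfies several criteria lands in the highest applicable category, which mirrors the rule that PI-3 holds pivotal alarms not already in PI-2.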

V. DISCUSSION
After reviewing many methods for alarm prioritization, the following questions arise: can a valid method of alarm rationalization and dynamic prioritization be developed for complex systems? To what extent can the proposed alarm rationalization and dynamic prioritization improve alarm management efficiency? Most of the available methods use an actual operational dataset for evaluation. Therefore, there was still a need to facilitate rationalization and prioritization at the design stage, which is not demonstrated or discussed in the reviewed literature. The AGM method provides a uniform structure that collects all alarm attributes in one model for a variety of alarm system applications, such as real-time cause analysis and alarm system performance studies, which are not addressed in the reviewed literature. Another issue not addressed in the reviewed literature is the enormous effort required to develop the other approaches.
The proposed approach uses the master alarm database (MADB), an output of the design stage, to develop a relational database, which is then used to develop the graph model. Various methods are available for developing the MADB from design documents and smart P&IDs. In this work, this process is performed in MATLAB.
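The authors perform this step in MATLAB; the Python sketch below only illustrates the idea under assumed conventions. It parses a hypothetical MADB extract in which each row lists an alarm and the alarms it is linked to (a made-up links_to column with semicolon-separated tags), builds the adjacency list used by the graph model, and rejects links to alarms missing from the database. Real MADB schemas vary, so the column names, tags and the load_agm function are placeholders.

```python
import csv
import io

# Hypothetical MADB extract; column names and tags are illustrative only.
MADB_CSV = """tag,equipment,alarm_type,links_to
FAH-X1,feed line,high flow,LAH-X3;PAH-X4
TAH-X2,feed line,high temperature,PAH-X4
LAH-X3,reactor,high level,LAHH-X3
PAH-X4,reactor,high pressure,PAHH-X4
LAHH-X3,reactor,high-high level,TRIP-X
PAHH-X4,reactor,high-high pressure,TRIP-X
TRIP-X,reactor,trip,
"""


def load_agm(text):
    """Build an adjacency list {tag: [linked tags]} from MADB rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Register every alarm first so that forward references are allowed.
    graph = {row["tag"]: [] for row in rows}
    for row in rows:
        # filter(None, ...) drops the empty string produced by an empty field.
        for target in filter(None, row["links_to"].split(";")):
            if target not in graph:
                raise ValueError(f"{row['tag']} links to unknown alarm {target}")
            graph[row["tag"]].append(target)
    return graph


agm = load_agm(MADB_CSV)
for tag, links in agm.items():
    print(tag, "->", links)
```

Rejecting dangling links at load time is a cheap consistency check on the MADB before any path search or graph metric is computed.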
The measurable framework for alarm PI identification proposed in this research prioritizes the alarms systematically. It presents them in a manner that helps the design team review the results in a shorter time and compare them with the results of a conventional method based on consequence severity. Indeed, the AGM can evaluate alarms for alarm workshops without overruling the priority of safety-critical alarms. At the end of the alarm workshop, the alarm practitioner can review the graphical representation for verification purposes throughout the system lifecycle. In the next step of this research, the proposed approach will be applied to the TEP simulation to evaluate alarm management performance indicators. Time-related factors will be evaluated to investigate how alarm management issues may be improved by the priority index order of decisions.

VI. CONCLUSION AND FUTURE WORKS
Today, many complex systems, including process plants, rely on advanced control systems that collect massive amounts of data from sensors and indicators distributed throughout the system. Such distributed systems have hundreds of individual alarms and still need human operators to understand and handle abnormal situations. Consequently, sophisticated alarm management methodologies are required to support operators in this understanding and decision making. This paper develops an alarm prioritization method based on Alarm Graph Modeling (AGM) [17] that provides an alarm priority index at the design stage, when actual operation data are not available. The proposed method also optimizes the alarm response order, which helps overcome alarm issues, particularly alarm floods, and improves alarm suppression and shelving techniques. The proposed method is implemented and evaluated on the well-known TEP simulation. The results of the proposed alarm rationalization and prioritization method are compared with those of the traditional method and discussed.
In future work, the proposed method will be applied to the simulation, where alarm management performance indicators will be measured and time-related factors will be evaluated to show how alarm management issues may be improved by the priority index order of decisions.