A Formal Method for Description and Decision of Android Apps Behavior Based on Process Algebra

Android is the most popular mobile platform, and it has become a primary malware target. Existing behavior-based Android malware detection methods suffer from false positive and false negative problems, which lead to low detection accuracy. Formal theory is crucial in studying the behaviors of Android applications characterized by high concurrency, interaction, and mobility. However, existing formal methods mainly focus on specific issues and lack the essential abstraction and high-level description of application behavior. In this study, we propose a formal method for the description and decision of application behavior based on process algebra. First, we propose a formal method for describing application behavior at a component level using process algebra. By extending π-calculus theory, we establish the mapping relationship from the Android application to process algebra, and present the semantics and evolution rules of behavior based on process algebra. Second, we describe the behavior of four types of components in applications and characterize concurrent interactions of components using process algebra expressions. Third, we define the behavior equivalence and simulation mechanism for application behavior analysis and propose the decision rules based on weak simulation. Finally, we discuss a demonstration case, which includes malicious behavior, to demonstrate the feasibility and effectiveness of the proposed method. The results show that our method can accurately describe and analyze application behavior, which provides theoretical support for technologies and methods of behavior-based detection.


I. INTRODUCTION
Smartphones are widely used thanks to the development of the Internet and the Internet of Things (IoT). Annual worldwide sales of smartphones stand at around 1.56 billion units, with Android accounting for 85.1% [1]. Over the next few years, Android smartphones are expected to retain more than 85% of the market [2]. Android has become the most popular mobile platform, and the number of Android Apps is growing rapidly. However, because of its open-source nature, Android is highly vulnerable to malware attacks and has become the primary target of malware. Malware not only exposes the private information and confidential data on smartphones to the risk of theft [3], but also affects the confidentiality, availability, or integrity of the system [4]. The efficient detection of Android malware has therefore become a popular research topic.
The associate editor coordinating the review of this manuscript and approving it for publication was Eyuphan Bulut.
Malware detection techniques fall into three categories: static analysis, dynamic analysis, and hybrid analysis [5]. Static analysis decomposes Android Application Package (APK) files by reverse engineering and extracts various features from the disassembled code without running it [6], [7], [8]. It is accurate and efficient in detecting known malicious code, but has a high false negative rate on unknown malicious code because it cannot deal with code obfuscation and dynamic code loading. Dynamic analysis runs Apps and analyzes them by monitoring runtime behaviors and data [9], [10], [11]. It performs well in detecting all types of code, but requires more resources and time. Moreover, some behaviors cannot be recognized correctly because the execution paths of the application cannot be fully traversed. Hybrid analysis combines the advantages of static analysis and dynamic analysis [12], [13], [14].
A variety of malware detection approaches are proposed based on these techniques. To further improve the detection of malware, researchers conducted a series of studies. On the one hand, technologies such as machine learning were widely applied to detect malware. Although the detection model based on machine learning had false positives in intrusion detection [15], Liu et al. [16] discussed the application and prospect of machine learning and suggested that it would be effective and promising in malware detection. On the other hand, researchers carried out theoretical research to promote the development and innovation of malware detection technology. Nowadays a series of formal theories have been proposed to analyze permission frameworks, component interactions, and application behavior.
By summarizing existing research, we observed that behavior-based malware detection performs satisfactorily but still suffers from false positives and false negatives. Formal theory is crucial in analyzing the behavior of Android Apps characterized by high concurrency, interaction, and mobility. However, existing formal methods mainly focus on modeling and validating specific issues, and lack the essential abstraction and modeling of behavior. In this paper, we use process algebra as an abstract language to analyze the behavior of applications, and propose a formal method for the description and decision of application behavior. Behaviors are described using process algebra expressions and operators, and can be analyzed by the inference and calculation mechanism of process algebra. The feasibility and effectiveness of the method are verified by discussing a case containing malicious behavior. To the best of our knowledge, this is the first time process algebra has been employed for behavior formalization and decision in behavior-based malware detection.
The formal method in this paper helps to understand and reveal the essence and laws of application behavior. It is intended to give theoretical support to researchers working on behavior-based Android malware detection, whether static, dynamic, or hybrid. The main contributions of this paper are summarized as follows:
1) A formal theory is proposed for describing application behavior based on process algebra. By extending π-calculus theory, we abstract the behavior elements of Apps and map them to process algebra. In Section III, we define the basic semantics of application behavior and propose a formal definition of component behavior. Based on the inference and calculation mechanism, we present the evolution rules for analyzing behavior.
2) Based on this formal theory, we describe the behavior of the four types of Android application components and formalize the behavior of the application. The states and concurrent interactions of components are described using process algebra expressions.
3) On the basis of behavior formalization, we propose a method for application behavior decision based on weak simulation. According to process equivalence, we define strong simulation and weak simulation for discussing the equivalence of behavior, and propose decision rules for application behavior based on weak simulation.
The remainder of this paper is organized as follows. Section II discusses the related work. Section III establishes the formal theory for describing application behavior based on process algebra. Section IV describes the behaviors and concurrent interactions of the four types of components in applications. Section V defines strong simulation and weak simulation of behavior equivalence, and proposes decision rules based on weak simulation. Section VI discusses a demonstration case derived from a real application that contains malicious behavior. Section VII concludes our work.

II. RELATED WORK
Currently, malware detection has been heavily researched. Researchers apply technologies such as machine learning to the analysis of the static and dynamic features and have proposed many methods from various perspectives.
Li et al. [17] proposed a framework based on association mining, which uses association rules derived from N-gram feature mining to achieve efficient detection. In [18] and [19], deep learning technology was used to detect malware and performed well. In [20], a malware detection approach was proposed to analyze Apps at the source code level by utilizing a deep traversal tree neural network. In [21], the bytecodes of the "classes.dex" files were converted to visual images, and a vision-based detection model composed of 16 CNN algorithms was proposed. Zhang et al. [22] presented a hybrid representation learning approach to clustering weakly-labeled malware by preserving heterogeneous information from multiple sources. In [23], a novel framework was presented to improve malware detection for Android IoT devices by combining the advantages of machine learning and blockchain technology. In [24], a signature-based framework was proposed to detect malware using API calls and other features. Zhang et al. [25] proposed a detection method based on the method-level correlation relationship of abstract API calls. The accuracy on the malware datasets Drebin and AMD was 96%. Huang et al. [26] proposed a sequence-to-sequence neural network to investigate a sequence of Windows API calls recorded from malware execution and produce tags to label their malicious behavior. Arora et al. [27] constructed graphs for malicious and benign applications by extracting permission pairs from the manifest file, and detected malware by comparing these graphs. Xiao et al. [28] combined dynamic features and static features into composite features to detect malware and achieved an accuracy of 97.12%. In [29], different corresponding behaviors and correlated features were defined at four levels, and a host-based detection system was proposed to classify behaviors of malware. Based on machine learning, [30] and [31] used permissions as dynamic and static features to detect malware, respectively. Wang et al. [32] used seven feature selection algorithms to select permissions, API calls, and opcodes. The results of each algorithm were merged to obtain a new feature set to classify applications. Fatima et al. [33] used an evolutionary genetic algorithm to construct the optimal feature subset and then used the subset to train a classifier for malware detection. In [34], useful API calls were used as features to construct API subsets of malicious and benign applications to classify applications. In [35], permission and action repetition were used as static and dynamic features to identify malware by leveraging machine learning, and their efficiency and influential roles in detecting malware were demonstrated. Arslan et al. [36] designed a permission-based detection system. The system used hybrid analysis to detect malware and achieved an accuracy of 91.95%.
To reduce false positives and false negatives, researchers have introduced formal theory into the research of malware detection to promote the development of detection techniques and methods. In [37], π-calculus was used to analyze and validate security of software behavior. Chaudhuri [38] proposed a semantic-based formal description of Android Apps to help understand behavior security. Jia et al. [39] proposed a formal model for Android components based on process algebra, aiding developers in implementing least permission. Shen et al. [40] proposed a behavior detection method based on function and process algebra for the detection of privilege escalation attacks in Android Apps. To describe the interactions between Apps, [41] proposed a formal interoperability semantic to help understand and infer Android interoperations. In [42], a methodology based on formal methods was proposed to help understand and identify obfuscation codes. To help understand permission model of Android Apps, researchers have proposed a series of formal theories. He [43] presented a formal model of permission framework using high level Petri nets. It precisely defined relationships among different levels and could be used to analyze permissions and their combinations. In [44], formal methods were used to verify the security mechanisms of Android, and a comprehensive specification of permission model was developed to describe and justify the attributes of expected behavior in Apps. In [45], a formal approach was proposed to help identify the potential defects in Android permission protocol. Khan et al. [46] used theorem proving approach to analyze the security of Android permission, and proposed a language-based formal model for the analysis of Android security. In theoretical research, formal theory can be validated by theorem proving and existing tools. Scyther and Tamarin are automatic tools for the formal analysis and verification of security protocols [47], [48]. 
MWB (Mobility Workbench) is a tool for manipulating and analyzing mobile concurrent systems described in π -calculus or CCS. In [40], MWB was used to analyze formal expressions of application behavior to help detect collusion attacks.
Researchers have proposed various methods in malware detection and developed formal theories to support these techniques. However, existing behavior-based detection methods suffer from false positives and false negatives. Furthermore, existing formal research lacks the essential abstraction and description of application behavior. In view of the concurrency and interaction characteristics of Android Apps, we extend the π-calculus theory, which is suitable for mobile concurrent systems, and propose a formal method for the description and decision of application behavior. Based on the behavior semantics and rules proposed in this paper, application behavior is described using process expressions and operators, and is then determined according to the behavior equivalence mechanism. In this study, the theory for the description and decision of application behavior is validated by analyzing a demonstration case derived from real Android Apps. The next section discusses the modeling of process algebra elements for application behavior, and presents the semantics and evolution rules for application behavior.

III. SEMANTICS AND EVOLUTION RULES FOR ANDROID APPLICATION BEHAVIOR
An application is made up of four types of components: activity, service, content provider, and broadcast receiver. Components communicate with a content provider through a Uri, and communicate with other components through Intent. The Intent mechanism is a run-time binding mechanism and a communication mode of Android Apps. It is used to transfer information and data between components, and its intentional or unintentional improper use may lead to security problems such as information leaks, malicious calls, and component hijacking [49], [50], [51]. Therefore, researchers have used Intent as an important feature to detect malware [52], [53], [54]. In the modeling of application behavior, Intent becomes even more important because it carries the communication between components.

A. PROCESS ALGEBRA ELEMENTS FOR APPLICATION BEHAVIOR
In the behavior analysis of Apps, analyzing all paths of behaviors is a huge and complex task. However, formal methods are effective for studying complex systems and play an important role in behavior analysis. Process algebra, as a representative formal method, is suitable for analyzing concurrent systems. Bekic proposed the basic semantics of process algebra, consisting of at least three operators and seven operation rules [55]. In 1982, Bergstra proposed the specific definition of process algebra [56]. Since then, process algebra has developed many branches and extensions [57].
CCS [58] is a process algebra theory proposed by R. Milner and has been widely used in the analysis and validation of concurrent systems. Since CCS cannot describe the change of topology in mobile concurrent systems, Milner further extended CCS to propose the π-calculus [59]. It is a named calculus and is especially suitable for mobile concurrent systems. Its structured semantics can formally describe the concurrency and interaction of behaviors, and support the composition, decomposition, and reduction of behavior. Additionally, π-calculus provides a mature and sound simulation theory to study behavior equivalence in the analysis of system behavior [60]. In π-calculus, an action or event is a behavior unit, which is called atomic behavior. Behavior is a series of actions or events, which is called a process. By introducing the concepts of name and channel from communication, messages, events, and actions are mapped to names, and communication ports are mapped to channels. Names can be transmitted over channels. By adding prefix actions to describe interactions, process expressions can be used to describe the behaviors of a system.
The term "process" in process algebra does not refer to an operating system process, but to the behavior mode of a system. It describes system behavior through a finite set of actions, which can be further analyzed by inference and calculation mechanisms. Unless otherwise noted, the word "process" below is used in this sense. Android Apps are complex systems composed of collaborative and concurrent components, and attackers are no longer limited to a single attack mode, but implement collaborative collusion attacks through the communication of components. Therefore, we extend π-calculus, which is suitable for studying mobile concurrent systems, to the research of application behavior.
The behavior of an Android application can be described by the behavior of the concurrent and interactive component instances which are instantiated in operating system processes. At the code level, a statement or function call is a basic behavior, called a behavior unit. Component behavior is the external representation of a series of program statements and function calls. The behavior unit is abstracted as an action. The actions in a component are divided into intra-actions and inter-actions according to whether they interact with the outside of the component. An intra-action is independent of the external environment and can only describe the internal evolution of a component, so it is simple and easy to describe. An inter-action can describe not only the internal behavior of the component, but also the interaction with the external environment, so its behavior description is considerably more complicated. By introducing the semantics and rules of process algebra, we establish the mapping from Android Apps to the π-calculus of process algebra as follows:
• Modeling the component instance of Android Apps as a process. The instance can run in one or several processes of the operating system.
• Modeling the interaction of component instances as process communication.
• Modeling the behavior unit in program code, such as a statement, function call, or method call, as an action.
• Modeling variables, parameters, data, messages, events, entity attributes, intents, and other elements used in component interaction as names.
The mapping relationship between elements of Android Apps and π-calculus is shown in Table 1.
By extending π -calculus, we propose the process algebra elements in Android application behavior, and use process algebra as an abstract language to describe the behaviors of application. In the following, we propose the semantics and rules for application behavior formalization.

B. BASIC SEMANTICS OF APPLICATION BEHAVIOR
Components in Android Apps are treated as the subject and object of behavior. Application behavior is described with subject, object, action, input of subject, output of object, and state of component. The semantics of application behavior are given below.
Definition 1 (Name): The concepts of data, parameters, and communication channels, as well as behavior information and state information, are unified and abstracted as name, which can be denoted as a, b, . . . ∈ Name.
Name is the basic element of behavior and is transmitted as a message in a process. $\bar{a}$ is the complementary name of $a$, with $\bar{\bar{a}} \overset{\text{def}}{=} a$ and $a, \bar{a}, b, \ldots \in Name$. A sequence of ordered names $a_1, \ldots, a_n$ is denoted as $\vec{a}$, so that $P(\vec{a}) = P(a_1, \ldots, a_n)$. If $\vec{a}$ and $\vec{b}$ are name sequences of length $n$ and $P$ is a process expression, then $\{\vec{b}/\vec{a}\}P$ means that each $a_i$ in $P$ is replaced by the corresponding $b_i$, which is called α conversion. If $\vec{a}$ and $\vec{b}$ both have length one, $\{\vec{b}/\vec{a}\}P$ is written as $\{b/a\}P$.
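The substitution of Definition 1 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a process expression is flattened to the tuple of names occurring in it and that a complementary name is written with a leading `~`. The helpers `co` and `substitute` are hypothetical and not part of the formalism of this paper.

```python
from typing import Sequence, Tuple


def co(name: str) -> str:
    """Return the complementary name; applying co twice gives the name back."""
    return name[1:] if name.startswith("~") else "~" + name


def substitute(names_in_p: Sequence[str],
               old: Sequence[str],
               new: Sequence[str]) -> Tuple[str, ...]:
    """Replace each old[i] by new[i] simultaneously (illustrative only)."""
    if len(old) != len(new):
        raise ValueError("name sequences must have the same length")
    mapping = dict(zip(old, new))
    # Names that are not in `old` are left unchanged.
    return tuple(mapping.get(n, n) for n in names_in_p)


if __name__ == "__main__":
    print(co(co("a")))
    print(substitute(("a", "b", "c"), ("a", "b"), ("x", "y")))
```

The replacement is simultaneous, so swapping two names with `substitute(("a", "b"), ("a", "b"), ("b", "a"))` does not collapse them into one.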

Definition 2 (Observation & Reaction): The actions are divided into observation and reaction according to whether they can be observed from outside.
• Action $t$ is an observation if it can be observed through its interaction with a component, or through the complementary action $\bar{t}$ after $t$ interacts with a component. Action $t$ is also called an observation action.
• Action τ is a reaction if it cannot be observed from outside, but only through the results of internal interaction.
In the concurrent execution of processes, the occurrence of an action in a process affects the process itself and the interaction between processes. Inter-process interactions can be observed, but interactions between parts of a process occur as if in a black box and cannot be observed externally.
If action $t$ in process $P$ can initiate an interaction with process $Q$, we use $\bar{t}$ to represent the action in $Q$ that interacts with $P$, where $\bar{t}$ is the complementary action of $t$ and $\bar{\bar{t}} \overset{\text{def}}{=} t$. In this way, the pair of labels $(t, \bar{t})$ represents an inter-process interaction. We can observe $t$ by observing the occurrence of $\bar{t}$. As a result, we can confirm the interaction by observing the occurrence of either $t$ or $\bar{t}$. Action $t$ is called an observation and is an inter-action with a complementary action $\bar{t}$.
If action τ in process $P$ affects only the process itself, it does not directly affect the process interaction. As a result, τ cannot be observed directly, but only through the results after τ occurs. Action τ is called a reaction and is an intra-action without a complementary action.
Definition 3 (Prefix Action): Let P be a process and π be an action; then π.P means that π must occur before P becomes active. Action π is a prefix action of P and "." is called the prefix operator. There is a sequential relationship between π and P. The prefix action π is defined recursively as follows: [x R y] π means that action π can be executed when [x R y] holds, where R is a logical operator. τ is the reaction defined in Definition 2.
An expression with the structure "π.P" is called a guardian expression, and P is guarded by π. An expression with the structure "P = a.P" is called a recursive guardian expression, and P is recursive. To simplify guardian expressions without affecting the semantics they represent, several expressions can be reduced to one expression according to the following rules:
• None of the expressions to be simplified is recursive.
• The expression obtained by reduction is not recursive.
For example, let $P = a.P_1$ and $P_1 = b.P_2$, where $P$ and $P_1$ are not recursive. They can be reduced to $P = a.b.P_2$, provided that $P = a.b.P_2$ is not recursive. In contrast, $P = a.Q$, $Q = b.R$, and $R = c.P$ are each not recursive, but they cannot be reduced to $P = a.b.c.P$ because the result is recursive.
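The two reduction rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each definition is stored as a pair of prefix actions and a continuation name, only linear chains of prefixes are handled, and the function name `reduce_guarded` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Each definition maps a process name to (prefix actions, continuation name),
# e.g. P = a.P1 is stored as {"P": (["a"], "P1")}.
Defs = Dict[str, Tuple[List[str], str]]


def reduce_guarded(defs: Defs, start: str) -> Tuple[List[str], str]:
    """Merge the chain starting at `start` as far as the two rules allow."""
    prefix, cont = defs[start]
    prefix = list(prefix)
    if cont == start:            # start itself is recursive: nothing to merge
        return prefix, cont
    seen = {start}
    while cont in defs and cont not in seen:
        nxt_prefix, nxt_cont = defs[cont]
        if nxt_cont == cont:     # rule 1: the expression to merge is recursive
            break
        if nxt_cont == start:    # rule 2: the merged result would be recursive
            break
        seen.add(cont)
        prefix += nxt_prefix
        cont = nxt_cont
    return prefix, cont
```

For the cyclic example in the text, this sketch stops at P = a.b.R and does not produce the recursive P = a.b.c.P.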
In communication, $(t, \bar{t})$ means action synchronization, where $\bar{t}$ is the complementary action of observation $t$. If $t(x)$ and $\bar{t}\langle y\rangle$ in the expression $(t(x).P + M) \mid (\bar{t}\langle y\rangle.Q + N)$ are not guarded by other actions, they constitute a sync-action-pair $(t(x), \bar{t}\langle y\rangle)$. The firing of $(t(x), \bar{t}\langle y\rangle)$ in the process leads to the transition $(t(x).P + M) \mid (\bar{t}\langle y\rangle.Q + N) \xrightarrow{t} \{y/x\}P \mid Q$.
Definition 5 (Trace): A series of changes of the system state can be represented as $s_0 \xrightarrow{t_1} s_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} s_n$; the sequence of ordered actions $t_1, t_2, \ldots, t_n$ is called a trace of system behavior. The set of traces of all system behaviors is represented as traces(S).
Definition 6 (Component Behavior): Component behavior is composed of names, processes, and symbols according to the BNF paradigm and the following syntax:
1) π.P is a guardian expression, which means that P becomes active when action π occurs. π is the prefix action defined in Definition 3, and "." is the prefix operator, which is another form of the sequence operator.
2) $P_1 + P_2$ is a selection structure, which means that $P_1$ or $P_2$ is selected and becomes active according to the context. "+" is the selection operator.
3) $P_1 \mid P_2$ is a parallel structure, which means that $P_1$ and $P_2$ are executed concurrently. When guardian structures take part in the concurrency, the result depends on the context. "|" is the parallel operator.
4) new a P means that name a is restricted within P and can only be used inside P. It can also be written as (new a)P, new (a)P, or (ν a)P. "new" is the restriction operator, and a is a restricted name of P.
5) !P means that P replicates itself and a copy of P is created. "!" is the replication operator.
6) $0 \mid \surd$ means the end of a process, where 0 indicates that the process is forcibly terminated and √ indicates that the process ends successfully.
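The six forms of Definition 6 can be gathered into one grammar. The layout below is a reconstruction from the enumerated cases and not a quotation. The first line is an assumption: it takes the input form $t(x)$ and the output form $\bar{t}\langle y\rangle$ used for sync-action-pairs, together with τ and the conditional prefix of Definition 3. The inner bar in $P_1 \,|\, P_2$ is the parallel operator, while the spaced bars separate alternatives.

```latex
\begin{align*}
\pi &::= t(x) \;\mid\; \bar{t}\langle y\rangle \;\mid\; \tau \;\mid\; [x\,R\,y]\,\pi \\
P   &::= \pi.P \;\mid\; P_1 + P_2 \;\mid\; P_1 \,|\, P_2 \;\mid\;
         \mathrm{new}\ a\ P \;\mid\; !P \;\mid\; 0 \;\mid\; \surd
\end{align*}
```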

C. EVOLUTION RULES
All rules are based on the atomicity of action. Action $a$ being atomic means that $a$ can be executed and terminates successfully, which is represented as $a \xrightarrow{a} \surd$.

1) RULES FOR PREFIX OPERATOR
• $a.P \xrightarrow{a} P$: When observation $a$ occurs, the process transfers to $P$ and $P$ becomes active.
• $\tau.P \xrightarrow{\tau} P$: When reaction τ occurs, the process transfers to $P$ and $P$ becomes active.
• $\cdots \to P$: Observation $a$ can execute successfully when reaction τ occurs. When reaction τ occurs, the process transfers to $P$ and $P$ becomes active.

2) RULES FOR SELECTION OPERATOR
• $a.P_1 + P_2 \xrightarrow{a} P_1$: When observation $a$ occurs, $a.P_1$ is executed. The process then transfers to $P_1$ and $P_1$ becomes active.
• When observation $a$ occurs, $P_1$ is executed and terminates successfully. The process then transfers to $P_1'$ and $P_1'$ becomes active.
• When observation $a$ occurs, $P_1$ and $P_2$ can be executed and terminate successfully. The process then transfers to another selection structure $P_1' + P_2'$.
These rules also hold for reaction τ .

3) RULES FOR PARALLEL OPERATOR
• $\dfrac{P_1 \xrightarrow{a} P_1'}{P_1 \mid P_2 \xrightarrow{a} P_1' \mid P_2}$: When observation $a$ occurs, $P_1$ transfers to $P_1'$ and $P_2$ does not change. The process then transfers to another parallel structure $P_1' \mid P_2$. The rule also holds for reaction τ.
• $\dfrac{P_1 \xrightarrow{a} P_1', \quad P_2 \xrightarrow{\bar{a}} P_2'}{P_1 \mid P_2 \xrightarrow{a} P_1' \mid P_2'}$: $(a, \bar{a})$ is a pair of synchronous actions used to represent communication between $P_1$ and $P_2$, which is discussed in Definition 3. $P_1 \xrightarrow{a} P_1'$ indicates that $P_1$ transfers to $P_1'$ after $a$ occurs, expressed as $P_1 = a.P_1'$; $P_2 \xrightarrow{\bar{a}} P_2'$ indicates that $\bar{a}$ in $P_2$ occurs as the complementary action of $a$ and $P_2$ transfers to $P_2'$, expressed as $P_2 = \bar{a}.P_2'$. According to the first rule for the prefix operator, $a.P_1' \xrightarrow{a} P_1'$ and $\bar{a}.P_2' \xrightarrow{\bar{a}} P_2'$. When $a$ occurs, $P_1$ initiates interaction with $P_2$, and $P_1 \mid P_2$ transfers to another parallel structure $P_1' \mid P_2'$.
• $(a.P_1 + M) \mid (\bar{a}.P_2 + N) \xrightarrow{a} P_1 \mid P_2$: $(a, \bar{a})$ means that the actions are synchronous, which is discussed in Definition 3. When observation $a$ occurs, the process selects $a.P_1$ and $\bar{a}.P_2$ to execute, which can be expressed as $a.P_1 \mid \bar{a}.P_2$. According to the first rule for the prefix operator, $a.P_1 \xrightarrow{a} P_1$ and $\bar{a}.P_2 \xrightarrow{\bar{a}} P_2$. According to the second rule for the parallel operator, the process transfers to another parallel structure $P_1 \mid P_2$.
• $(a(\vec{x}).P_1 + M) \mid (\bar{a}\langle\vec{y}\rangle.P_2 + N) \xrightarrow{a} \{\vec{y}/\vec{x}\}P_1 \mid P_2$: $\vec{x}$ and $\vec{y}$ are name sequences of length $n$, as discussed in Definition 1, and $(a, \bar{a})$ is discussed in Definition 3. When observation $a$ occurs, the process selects $a(\vec{x}).P_1$ and $\bar{a}\langle\vec{y}\rangle.P_2$ to execute and then transfers to another parallel structure $\{\vec{y}/\vec{x}\}P_1 \mid P_2$.
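The evolution rules for the prefix, selection, and parallel operators can be sketched as a one-step transition function in Python. This is an illustration under stated assumptions, not the mechanism used in this paper: name passing, restriction, and replication are omitted, a complementary action is written with a leading `~`, and the class and function names are hypothetical. Following the rules above, a synchronization of `a` and `~a` is labeled with the observation `a`.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Nil:
    """Terminated process."""


@dataclass(frozen=True)
class Prefix:
    action: str
    cont: "Proc"


@dataclass(frozen=True)
class Sum:
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Par:
    left: "Proc"
    right: "Proc"


Proc = Union[Nil, Prefix, Sum, Par]


def co(action: str) -> str:
    """Complementary action, written with a leading '~'."""
    return action[1:] if action.startswith("~") else "~" + action


def steps(p: Proc) -> List[Tuple[str, Proc]]:
    """All one-step transitions (label, successor) of p."""
    if isinstance(p, Prefix):                      # a.P --a--> P
        return [(p.action, p.cont)]
    if isinstance(p, Sum):                         # either branch may fire
        return steps(p.left) + steps(p.right)
    if isinstance(p, Par):
        ls, rs = steps(p.left), steps(p.right)
        out = [(a, Par(l2, p.right)) for a, l2 in ls]    # left moves alone
        out += [(a, Par(p.left, r2)) for a, r2 in rs]    # right moves alone
        for a, l2 in ls:                           # sync-action-pair (a, ~a)
            for b, r2 in rs:
                if a != "tau" and b == co(a):
                    out.append((a.lstrip("~"), Par(l2, r2)))
        return out
    return []                                      # Nil has no transitions
```

For example, `steps(Par(Sum(Prefix("a", Nil()), Prefix("m", Nil())), Sum(Prefix("~a", Nil()), Prefix("n", Nil()))))` contains the synchronized step `("a", Par(Nil(), Nil()))` alongside the four interleaving steps.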

IV. BEHAVIOR AND INTERACTION DESCRIPTION OF COMPONENTS IN ANDROID APPLICATION
Application behavior consists of the behavior of components that are executed concurrently. In the concurrent system S, the size of traces(S) increases too rapidly as the actions in the interaction increase. Consider two processes with n actions, where actions can be executed interactively in any correct order. Using S(n) to represent the scale of concurrency, the following conclusions are obtained after calculation: It can be predicted that S(n) will increase geometrically with n, which is not conducive to further analysis.
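The growth of the trace set can be illustrated with a short Python sketch. It rests on an assumption that is not taken from this paper: each of the two processes executes its n actions in a fixed order, and the two sequences may be interleaved freely. Under that assumption the number of distinct interleavings is the central binomial coefficient C(2n, n); the function name `interleavings` is hypothetical.

```python
from math import comb


def interleavings(n: int) -> int:
    """Number of ways to interleave two fixed sequences of n actions each."""
    return comb(2 * n, n)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, interleavings(n))
```

Under this assumption the count is 2 for n = 1, 6 for n = 2, 252 for n = 5, and 184756 for n = 10, and each additional action multiplies it by a factor approaching 4.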
In application behavior analysis, considering all actions to fully characterize and analyze application behavior would impose an unbearable burden. When studying application behavior at the component level, an intra-action can affect the state but has little impact on transitions. Therefore, we ignore invisible intra-actions and describe behaviors by inter-actions, which can be observed and can trigger component interaction. The commands of inter-component communication (ICC) are abstracted as actions to describe the behavior of the application. Some ICC commands are shown in Table 2.

A. ACTIVITY
In Android Apps, an activity is presented as a page, which is the carrier of information. An activity can be the initiator or receiver of the interaction between activities, in which an intent is used to encapsulate data to exchange information. Behaviors of an activity include the initiation and acceptance of a request, the return and reception of a result, and the closure of the activity, which are defined as follows:
In the interaction between activities, action $\overline{setResult}$ must wait for action finish to execute before forming a sync-action-pair with setResult to return data to the initiator. The actions in this process always appear in the form $\overline{setResult}$, finish, setResult. Therefore, we replace this form by $(setResult, \overline{setResult})$ and describe the process as follows: $\overline{setResult}\langle m\rangle.Activity_{running} \xrightarrow{\overline{setResult}} Activity_{destroyed}$, $setResult(n).Activity_{paused} \xrightarrow{setResult} Activity_{running}$.
A typical interaction between activities is "The display page switches from page $A_1$ to page $A_2$. After time T, page $A_2$ is closed and page $A_1$ returns to the foreground". The actions executed by $A_2$ during T are represented as e, and the interaction process is described as follows:
Page $A_1$ is stopped and page $A_2$ is running while the display page is switched from $A_1$ to $A_2$. The behaviors and state transitions are described as follows:
In the interaction between an activity and other components, the process description and state transitions are as follows: $\overline{startActivity}\langle x\rangle.Component \mid startActivity(y).\ldots$

B. SERVICE
Service has two startup modes: starting and binding. Components use an intent to encapsulate data to communicate with a service. Service is responsible for responding to the service requests of components, including the start, bind, unbind, and stop of the service. The behaviors are defined as follows: $startS(y).Service$, $stopS(y).Service$, $bindS(y).Service$, $unbindS(y).Service$.
A running service can be bound by multiple components. To reflect the changes of the component instances connected to the service, the collection of currently connected instances is represented as clients. Then, $clients(c).Service$ indicates that instance c is added to clients, and $\overline{clients}(c).Service$ indicates that instance c is removed from clients. In addition, Service provides a method stopSelf(), which is used to stop the service instance when clients is empty. Consequently, some more behaviors are defined as follows:
To stop a running service, regardless of the order in which startService() and bindService() were called, both unbindService() and stopService() should be called to ensure that both onUnbind() and onDestroy() are executed. Moreover, onUnbind() should be executed before onDestroy(). The behaviors and state transitions of the service are described as follows:
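The stopping condition described above can be sketched as a small state model in Python. This is a simplified illustration under stated assumptions, not the formal description of this paper and not the Android API: the class `ServiceModel`, its methods, and the `log` field are hypothetical, and onUnbind is recorded once the last client disconnects.

```python
class ServiceModel:
    """Toy model: a service is destroyed only when stopped AND unbound."""

    def __init__(self) -> None:
        self.started = False      # set by start(), cleared by stop()
        self.clients = set()      # currently bound component instances
        self.state = "destroyed"
        self.log = []             # lifecycle callbacks in order of occurrence

    def start(self) -> None:
        self.started = True
        self.state = "running"

    def bind(self, client: str) -> None:
        self.clients.add(client)
        self.state = "running"

    def unbind(self, client: str) -> None:
        if client in self.clients:
            self.clients.remove(client)
            if not self.clients:              # last client has disconnected
                self.log.append("onUnbind")
        self._maybe_destroy()

    def stop(self) -> None:
        self.started = False
        self._maybe_destroy()

    def _maybe_destroy(self) -> None:
        # Destroy only when no start request and no bound client remain.
        if self.state == "running" and not self.started and not self.clients:
            self.log.append("onDestroy")
            self.state = "destroyed"
```

In this model, calling `start()`, `bind("c1")`, `stop()`, and `unbind("c1")` in that order leaves the service running after `stop()` and records onUnbind before onDestroy once the client unbinds.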

C. BROADCAST RECEIVER
Broadcast receiver is the receiving side of Android broadcasts and is responsible for responding to the broadcasts of the system and of components. The behavior is defined as follows: $sendBroadcast(y).BroadcastReceiver$.
Each component of an application can initiate a broadcast as a sender, and the broadcast is handled by onReceive() in the receiver. A broadcast has a short life cycle, which begins at creation and ends after successful execution or termination due to timeout. The duration of the broadcast is represented as timeCost, and the timeout of the broadcast is represented as timeOut. Then, the behavior is described as follows: $\overline{sendBroadcast}\langle x\rangle.Component \mid sendBroadcast(y).\ldots$

D. CONTENT PROVIDER
Content provider is used to store data and provide data sharing in applications. It can be accessed by an external process if android:exported="true" is set in its declaration. Components manipulate the data in a content provider by using the methods of the content resolver. The behaviors are defined as follows: $insertCP(u).ContentProvider$, $deleteCP(u).ContentProvider$, $updateCP(u).ContentProvider$, $queryCP(u).ContentProvider$.
Components first create an instance of content resolver, and then call insert(), delete(), update(), or query() on the instance with uri as the parameter to manipulate data. The behavior of inserting data is described as follows: $(new\ r)(RegistContentResolver(r).\overline{insertCP}\langle u\rangle.Component) \mid insertCP(u).ContentProvider$
Action RegistContentResolver (r) is the intra-action of the caller component for creating instance r of content resolver, where r can call class methods to manipulate data. Therefore, the behavior can be simplified and described as follows:
The processes of deleting, updating, and querying data follow the same pattern as the process of inserting data, so their descriptions are not repeated here.

E. COMPONENT INTERACTION WITH DATABASE
The methods for data operations on database are predefined. In order to maintain the consistency of logic and formal descriptions, data operations are regarded as interactions and expressed as op (data) .Component|op (data) .DataBase. The behaviors of data operations are defined as follows: insertDB (data) .DataBase, deleteDB (data) .DataBase, queryDB (data) .DataBase, updateDB (data) .DataBase.
In this section, we describe the behavior of components using process algebra expression and achieve the behavior formalization of application. These expressions conform to the semantics and rules in this paper and have been validated using MWB. Since the validations are not the focus of study and require many pages, they are not included in this paper.
Based on behavior formalization, application behavior can be analyzed using the process equivalence mechanism of process algebra theory. In the following section, we define the simulation and mutual simulation of behavior to analyze the similarity between behaviors, and then propose decision rules based on weak simulation.

V. BEHAVIOR DECISION BASED ON BEHAVIOR EQUIVALENCE-WEAK SIMULATION
Expected behavior refers to the combination of queues and actions necessary to achieve application functions and meet user requirements. Application behaviors can be categorized as follows according to whether they are expected:
• Credible behavior. Behaviors can be monitored and identified, and are expected.
• Malicious behavior. Behaviors can be monitored and identified, and are unexpected.
• Suspicious behavior. Behaviors can be monitored, but can only be partially identified and cannot be identified as expected or unexpected.
An application is benign if all its behaviors are expected, which means that all behaviors are credible. An application is malicious if there are some unexpected behaviors in it, which means that it has at least one malicious behavior. Therefore, in Android malware detection, the focus should be whether there is malicious behavior in the application. Based on the relationship between application behavior and expected or unexpected behavior, application behavior could be analyzed according to the behavior equivalence mechanism.
Definition 7 (Behavior Equivalence): Let P and Q be two different processes. P and Q are trace equivalent if and only if traces(P) = traces(Q). This relationship between P and Q is called process equivalence, or behavior equivalence.
The term process describes the behavior of a system using the elements discussed in Section III. Saying that two processes are trace equivalent means that they have the same behavior pattern.
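Definition 7 can be illustrated on small finite behaviors. The sketch below is a minimal example, assuming a behavior is given as an acyclic labelled transition system encoded as a dictionary from a state to its outgoing (action, next state) pairs, and that traces are prefix-closed. The encoding, the function names, and the two sample behaviors are assumptions made for illustration.

```python
def traces(lts, start):
    """Return all finite action sequences (as tuples) from `start`.

    Assumes the transition system is acyclic, so the recursion terminates.
    """
    result = {()}
    for action, nxt in lts.get(start, []):
        for tail in traces(lts, nxt):
            result.add((action,) + tail)
    return result


def trace_equivalent(lts_p, p, lts_q, q):
    """Definition 7: P and Q are trace equivalent iff traces(P) = traces(Q)."""
    return traces(lts_p, p) == traces(lts_q, q)


# Hypothetical behaviors: P branches on action a, Q does not.
P = {"p": [("a", "p1"), ("a", "p3")], "p1": [("b", "p2")]}
Q = {"q": [("a", "q1")], "q1": [("b", "q2")]}

print(sorted(traces(P, "p")))
print(trace_equivalent(P, "p", Q, "q"))
```

Both behaviors have the traces (), (a), and (a, b), so they are trace equivalent even though their branching structures differ, which is one reason the finer simulation relations are introduced next.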
To describe the equivalence between behaviors, we extend the simulation theory of π -calculus and define the behavior simulation and behavior mutual simulation. In the following section, the two concepts are represented with simulation and mutual simulation for ease of description.

A. BEHAVIOR EQUIVALENCE: SIMULATION AND BISIMULATION
Simulation is a one-way description between behaviors. ''P simulates Q'' indicates that the behavior pattern of P is at least as rich as that of Q. Mutual simulation is a two-way simulation between behaviors, also known as bisimulation. ''P and Q are mutually simulated'' indicates that their behavior patterns are equivalent to some extent.
To accurately express the degree of similarity between behaviors, simulation is divided into strong simulation and weak simulation.
Definition 8 (Strong Simulation): Let P and Q be two behaviors, and let S be a binary relation. The relationship between P and Q is expressed as $Q\,S\,P$. Then, we say that P strongly simulates Q if the following conditions hold: (1) For each action $a$ in Q and its transition $q \xrightarrow{a} q'$, where $a$ is defined in formula (1), there exists action $a$ and its transition in P such that $p \xrightarrow{a} p'$, where $p'$ is a derived state of $p$.
(2) $p'$ strongly simulates $q'$. The binary relation S is called a strong simulation. $Q\,S\,P$ means that for any transition of Q, P has a path that contains all the actions of Q to match it.
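For finite behaviors, Definition 8 can be checked mechanically by starting from all state pairs and repeatedly discarding pairs that violate condition (1). The following is a minimal sketch of that standard fixed-point computation; the dictionary encoding of behaviors and the sample behaviors are assumptions made for illustration, not part of the method itself.

```python
def all_states(lts):
    """Collect source and target states of a transition dictionary."""
    return set(lts) | {t for edges in lts.values() for _, t in edges}


def strongly_simulates(lts_p, p, lts_q, q):
    """Return True if state p (in P) strongly simulates state q (in Q)."""
    # Start with every pair (q_state, p_state) and remove violating pairs.
    rel = {(y, x) for y in all_states(lts_q) | {q} for x in all_states(lts_p) | {p}}
    changed = True
    while changed:
        changed = False
        for y, x in list(rel):
            for a, y2 in lts_q.get(y, []):
                # Each transition y -a-> y2 needs some x -a-> x2 with (y2, x2) kept.
                if not any(b == a and (y2, x2) in rel for b, x2 in lts_p.get(x, [])):
                    rel.discard((y, x))
                    changed = True
                    break
    return (q, p) in rel


# Hypothetical behaviors: P offers an extra action c after a.
P = {"p": [("a", "p1")], "p1": [("b", "p2"), ("c", "p3")]}
Q = {"q": [("a", "q1")], "q1": [("b", "q2")]}

print(strongly_simulates(P, "p", Q, "q"))
print(strongly_simulates(Q, "q", P, "p"))
```

Here P strongly simulates Q, but Q cannot strongly simulate P because Q has no transition that matches action c.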
Definition 9 (Strong Bisimulation): Let P and Q be two behaviors, and let S be a binary relation. The relationship between P and Q is expressed as $Q\,S\,P$. Then, we say that P and Q are strongly bisimulated if the following conditions hold: (1) For each action $a$ in Q and its transition $q \xrightarrow{a} q'$, where $a$ is defined in formula (1), there exists action $a$ and its transition in P such that $p \xrightarrow{a} p'$, where $p'$ is a derived state of $p$. Additionally, $p'$ strongly simulates $q'$ and $q'$ strongly simulates $p'$.
(2) For each action $b$ in P and its transition $p \xrightarrow{b} p'$, where $b$ is defined in formula (1), there exists action $b$ and its transition in Q such that $q \xrightarrow{b} q'$, where $q'$ is a derived state of $q$. Additionally, $q'$ strongly simulates $p'$ and $p'$ strongly simulates $q'$. The binary relation S is called a strong bisimulation. $Q\,S\,P$ means that P and Q are strongly equivalent, written as $P \sim Q$. From Definition 9, it can be concluded that $Q\,S\,P$ is equivalent to $P\,S\,Q$.
Note that ''P and Q are strongly equivalent'' is not equivalent to ''P strongly simulates Q, and Q strongly simulates P''. The former is a stricter condition and implies the latter.
It is extremely difficult to use strong simulation to analyze the behavior of complex systems because considering all the actions and transitions of the system will bring an unbearable burden. However, the focus of behavior analysis of Android Apps should be on whether there is malicious behavior in the application, rather than whether the application behavior and malicious behavior are isomorphic or homomorphic. Android Apps are complex systems with a high degree of concurrency and interaction, so it is a feasible solution to ignore irrelevant actions to analyze the behavior of application. Therefore, we present the definition of weak simulation of behavior.
Based on the semantics of behavior proposed in Definition 4 and Definition 5, we use the symbol $e$ to represent a sequence of actions. It is a sequence of ordered actions, denoted as $e = t_1, t_2, t_3, \ldots$, which can contain any type and any number of actions. $(new\ a)e$ means that there is at least one action $a$ in $e$. The execution of $e$ is denoted as $\stackrel{e}{\Longrightarrow}$, where there can be any number of interactions.
Definition 10 (Weak Simulation): Let P and Q be two behaviors, and let S be a binary relation. The relationship between P and Q is expressed as $Q\,S\,P$. Then, we say that P weakly simulates Q if the following conditions hold: (1) For each action $a$ in Q and its transition $q \xrightarrow{a} q'$, where $a$ is defined in formula (1), there exists action $a$ and $(new\ a)e$ in P such that $p \stackrel{(new\ a)e}{\Longrightarrow} p'$, where $p'$ is a derived state of $p$.
(2) $p'$ weakly simulates $q'$. The binary relation S is called a weak simulation. $Q\,S\,P$ means that for any transition of Q, P has a path to cover it.
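Definition 10 differs from Definition 8 only in how a transition of Q is matched: any path of P that contains the action at least once is acceptable. The sketch below is one possible reading of that definition for finite behaviors; the dictionary encoding, the helper names, and the sample action names other than queryCP and sendTM are assumptions made for illustration.

```python
from collections import deque


def reach_containing(lts, start, action):
    """States reachable from `start` along a path containing `action` at least once."""
    seen = {(start, False)}
    queue = deque(seen)
    hits = set()
    while queue:
        state, found = queue.popleft()
        if found:
            hits.add(state)
        for label, nxt in lts.get(state, []):
            node = (nxt, found or label == action)
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return hits


def all_states(lts):
    return set(lts) | {t for edges in lts.values() for _, t in edges}


def weakly_simulates(lts_p, p, lts_q, q):
    """Return True if state p (in P) weakly simulates state q (in Q)."""
    rel = {(y, x) for y in all_states(lts_q) | {q} for x in all_states(lts_p) | {p}}
    changed = True
    while changed:
        changed = False
        for y, x in list(rel):
            for a, y2 in lts_q.get(y, []):
                # y -a-> y2 must be covered by a path from x that contains a.
                if not any((y2, x2) in rel for x2 in reach_containing(lts_p, x, a)):
                    rel.discard((y, x))
                    changed = True
                    break
    return (q, p) in rel


# Hypothetical behaviors: P interleaves other actions around queryCP and sendTM.
P = {"p": [("login", "p1")], "p1": [("queryCP", "p2")],
     "p2": [("log", "p3")], "p3": [("sendTM", "p4")]}
Q = {"q": [("queryCP", "q1")], "q1": [("sendTM", "q2")]}

print(weakly_simulates(P, "p", Q, "q"))
print(weakly_simulates(Q, "q", P, "p"))
```

In this example P weakly simulates Q because the intermediate actions login and log are ignored, while Q cannot weakly simulate P because it has no path containing login.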
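Definition 11 (Weak Bisimulation): Let P and Q be two behaviors, and let S be a binary relation. The relationship between P and Q is expressed as $Q\,S\,P$. Then, we say that P and Q are weakly bisimulated if the following conditions hold: (1) For each action $a$ in Q and its transition $q \xrightarrow{a} q'$, where $a$ is defined in formula (1), there exists action $a$ and $(new\ a)e$ in P such that $p \stackrel{(new\ a)e}{\Longrightarrow} p'$, where $p'$ is a derived state of $p$. Additionally, $p'$ weakly simulates $q'$ and $q'$ weakly simulates $p'$.
(2) For each action $b$ in P and its transition $p \xrightarrow{b} p'$, where $b$ is defined in formula (1), there exists action $b$ and $(new\ b)e$ in Q such that $q \stackrel{(new\ b)e}{\Longrightarrow} q'$, where $q'$ is a derived state of $q$. Additionally, $q'$ weakly simulates $p'$ and $p'$ weakly simulates $q'$. The binary relation S is called a weak bisimulation. $Q\,S\,P$ means that P and Q are weakly equivalent, written as $P \approx Q$. Weak bisimulation is also called weak equivalence or observation equivalence.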
The restrictions imposed by strong bisimulation, strong simulation, weak bisimulation, and weak simulation become progressively weaker in that order. Researchers can apply the simulation mechanism that fits their actual situation.
In application behavior analysis, using strong simulation to analyze the equivalence between behaviors requires considering all actions and transitions, which is extremely complicated and becomes an intolerable burden. Weak simulation, in contrast, is suitable and sufficient for studying suspicious and uncertain behavior. The next section proposes decision rules based on weak simulation according to behavior equivalence.

Rule 1 (Credible Behavior): Behavior P is a credible behavior if expected behavior Q weakly simulates P.
The rule indicates that P is contained in expected behavior Q. The sequence of actions in P is an ordered subset of the sequence of actions in Q, where actions can be continuous or discontinuous. The sequence of actions in Q is credible, and so is its ordered subset. Therefore, P is credible.
Rule 2 (Malicious Behavior): Behavior P is a malicious behavior if P weakly simulates unexpected behavior R.
The rule indicates that unexpected behavior R is contained in P. The sequence of actions in R is an ordered subset of the sequence of actions in P, where actions can be continuous or discontinuous. Since the sequence of actions in R is malicious, there is at least one path of malicious behavior in P. Therefore, P is malicious.
Rule 3 (Suspicious Behavior): Behavior P is a suspicious behavior if P weakly simulates expected behavior Q, or unexpected behavior R weakly simulates P.
''P weakly simulates expected behavior Q'' indicates that traces(P) contains the sequence of actions in Q, possibly together with other actions. However, it is impossible to decide whether P is credible or malicious according to this condition alone, because there is no guarantee that all traces in P are credible. Therefore, P is suspicious and should be further analyzed.
''Unexpected behavior R weakly simulates P'' indicates that traces(P) contains part or all of the sequence of actions in R. If traces(P) contains only part of R's actions, it does not necessarily constitute a malicious behavior. It is impossible to decide whether P is credible or malicious according to this condition alone. Therefore, P is suspicious and should be further analyzed.
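The explanations of the three rules rely on one sequence of actions being an ordered subset of another, with the matched actions either continuous or discontinuous. For behaviors that consist of a single sequence, this can be checked as an ordered-subsequence test. The function name and the sample sequences below are assumptions made for illustration.

```python
def is_ordered_subsequence(needle, haystack):
    """True if all actions of `needle` occur in `haystack` in the same order.

    The matched actions may be separated by other actions (discontinuous).
    """
    remaining = iter(haystack)
    # `action in remaining` consumes the iterator up to the first match,
    # so later actions can only be matched further to the right.
    return all(action in remaining for action in needle)


observed = ["queryCP", "startActivity", "updateDB", "sendTM"]

# The unexpected sequence is covered in order, with other actions in between.
print(is_ordered_subsequence(["queryCP", "sendTM"], observed))
# The same actions in the opposite order are not covered.
print(is_ordered_subsequence(["sendTM", "queryCP"], observed))
```

The first call corresponds to the situation of Rule 2, where an unexpected sequence is contained in the observed behavior; the second shows that the order of actions matters.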
Based on the relationship between weak simulation and weak bisimulation, the following inferences can be drawn from Rule 1 and Rule 2.
Inference 1 (Credible Behavior): P is credible if it weakly mutually simulates expected behavior Q.
Inference 2 (Malicious Behavior): P is malicious if it weakly mutually simulates unexpected behavior R.
To decide on application behavior P, it is necessary to construct formal descriptions of expected and unexpected behaviors as a priori rules, and to make the decision according to these rules. The decision process is as follows:
(1) P is malicious if P has some malicious behaviors.
(2) P is credible if all the behaviors of P are credible.
(3) P is suspicious if it contains suspicious behaviors that cannot be determined as malicious or credible. It should be further analyzed according to the rules based on weak simulation and weak bisimulation.
(4) P is malicious if it weakly simulates unexpected behavior R, or if it weakly mutually simulates R.
(5) P is credible if it is weakly simulated by expected behavior Q, or if it weakly mutually simulates Q.
In this process, the decisions on suspicious behaviors are abstracted into new decision rules for expected or unexpected behaviors, so that subsequent behaviors can be determined directly as the rules are continuously improved. On the basis of behavior formalization, and according to the proposed rules, application behavior can be classified and determined by the inference and calculation mechanism of process algebra.
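The decision process can be sketched as a small classifier. This is a minimal illustration under strong assumptions: each behavior is a single sequence of actions, so that ''P weakly simulates Q'' reduces to Q being an ordered subsequence of P; behaviors are non-empty; and every behavior that is neither malicious nor credible is reported as suspicious. The function names and the sample rule sets are hypothetical.

```python
def weakly_simulates(p, q):
    """For single-sequence behaviors: p covers q iff q is an ordered subsequence of p."""
    remaining = iter(p)
    return all(action in remaining for action in q)


def decide(behavior, expected, unexpected):
    """Classify a behavior against a priori expected and unexpected behaviors."""
    # Rule 2: malicious if the behavior weakly simulates an unexpected behavior.
    if any(weakly_simulates(behavior, r) for r in unexpected):
        return "malicious"
    # Rule 1: credible if some expected behavior weakly simulates the behavior.
    if any(weakly_simulates(q, behavior) for q in expected):
        return "credible"
    # Rule 3 and every undetermined case: keep for further analysis (assumption).
    return "suspicious"


# Hypothetical a priori rules.
expected = [("login", "queryDB", "startActivity")]
unexpected = [("queryCP", "sendTM")]

print(decide(("queryCP", "startActivity", "updateDB", "sendTM"), expected, unexpected))
print(decide(("login", "startActivity"), expected, unexpected))
print(decide(("queryCP",), expected, unexpected))
```

The third call returns suspicious: the behavior contains only part of the unexpected sequence, so it cannot be decided as malicious or credible from these rules alone.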
According to Definition 8 and Definition 10, it can be inferred that P weakly simulates Q from the condition ''P strongly simulates Q'' or ''P and Q are strongly equivalent''. Therefore, the rules and inferences based on weak simulation also hold for strong simulation, as shown below:
• Behavior P is credible if expected behavior Q strongly simulates P, or P strongly mutually simulates Q.
• Behavior P is malicious if P strongly simulates or strongly mutually simulates unexpected behavior R.
• Behavior P is suspicious if P strongly simulates expected behavior Q, or unexpected behavior R strongly simulates P.
In the behavior analysis of Android Apps, using strong simulation to study equivalence is not only complex and difficult, but also flawed. For example, let Q be a malicious behavior, and let P weakly simulate Q but not strongly simulate Q. According to the decision rules based on weak simulation, it can be concluded that P is malicious. However, P cannot be analyzed by using strong simulation because there is no such relationship between P and Q, so P cannot be considered malicious when using the rules based on strong simulation. Therefore, weak simulation is more suitable than strong simulation for studying the equivalence between behaviors. Conversely, if there is a strong simulation relationship between behaviors, which indicates a higher degree of equivalence than weak simulation, the same conclusion can be drawn as when using weak simulation.
In the following section, we discuss a demonstration case derived from a real Android application containing malicious behavior to demonstrate the feasibility and effectiveness of the method in this paper. Fig. 1 shows the component interactions of the demonstration case. The default page is LogActivity, and registered users can log into the page directly. New users should first register on RegActivity and then return to LogActivity to log into MainActivity, which provides functions such as information modification. Users register on RegActivity using a mobile number and authorize the application to access the mobile phone address book and send SMS. This application can leak information through component interactions.

VI. CASE ANALYSIS
The process of analyzing application behavior using the process equivalence mechanism of process algebra is divided into two phases:
Phase 1: Establish the instances of application component and achieve the behavior formalization of application.
Phase 2: Analyze the relationship between application behavior and malicious behavior by using simulation mechanism, and then make the decision.
LogActivity | startActivity (login) .MainActivity)
RegActivity will be terminated if the new user cancels the registration. If the registration is successful, LogActivity will be active after saving user information to database; otherwise, the user should fill in the registration information again. Using isExsited to indicate whether the user exists, the component behavior is described as follows:
Users can modify personal information on ModActivity. If the user confirms the modification, the new information will be updated to the database; otherwise, ModActivity will be terminated. The component behavior is described as follows: (queryDB phone .ModActivity | queryDB (phone) .DataBase).([confirm = Yes] updateDB newInfo .ModActivity | updateDB (newInfo) .DataBase
The behavior of the case is composed of the concurrent execution of the formal expressions above. The results of validation by MWB show that these formal expressions can accurately describe the behavior of components and the interactions with other components.

B. BEHAVIOR ANALYSIS USING STRONG SIMULATION AND WEAK SIMULATION
As shown in Fig. 1, MainActivity has the behavior of accessing address book and ModActivity has the behavior of sending text message. Then, they form a path for information leakage through interaction and achieve a collusion attack, as shown in Fig. 3.
MainActivity implements collaborative behavior and obtains data from the mobile address book through queryCP. It initiates communication through action startActivity and transmits data to ModActivity. Then, ModActivity leaks data through sendTextMessage. Without considering the intermediate states and interactions with other components, the behaviors of the two components are described as follows:
The symbol $e_i$ is a sequence of actions, which can contain any number and any type of actions. We first represent $\overline{queryCP}\langle x\rangle.e_1$ as $(new\ queryCP)e_1$, and then represent $(new\ sendTextMessage)(e_2.queryDB.e_3.ModActivity)$ as $(new\ sendTM)e_2.updateDB.ModActivity$. Therefore, the collusion attack containing the harmful sequence of actions $\langle queryCP, sendTM\rangle$ can be described as follows: $(new\ queryCP)e_1.startActivity.(new\ sendTM)e_2.$
The action of obtaining private data is represented as $r$, the action of initiating component interaction is represented as $c$, and the action of leaking data is represented as $s$. In MainActivity, $r$ is the action queryCP that accesses the mobile address book. Using $e_1$ to represent the actions of obtaining data, the behavior of obtaining private data can be expressed as $p \stackrel{(new\ r)e_1}{\Longrightarrow} p_0$, and there is some private data in $p_0$. In ModActivity, $s$ is the action sendTM that sends a text message. Using $e_2$ to represent the actions of leaking data, the behavior of leaking data is expressed as $p_1 \stackrel{(new\ s)e_2}{\Longrightarrow} p_2$, and the data in $p_2$ has been leaked. Therefore, the collusion attack implemented in the interaction between MainActivity and ModActivity is represented as P, as shown in Fig. 4.
The harmful sequence of actions $\langle queryCP, sendTM\rangle$ in the application can be expressed as $\langle r, s\rangle$, which constitutes collusion attack Q with a serial structure, as shown in Fig. 5.
P is the information leakage behavior in the demonstration case. Q is a typical collusion attack behavior in Android applications. Using strong simulation and weak simulation to analyze the relationship between P and Q, we can draw the following conclusions:
1) P weakly simulates Q.
• For $r \in Q$ and $q \xrightarrow{r} q_0$, then $\exists\, p \stackrel{(new\ r)e_1}{\Longrightarrow} p_0$ in P. Since it has been proven that $p_0$ weakly simulates $q_0$, $p$ weakly simulates $q$. According to Definition 10, we draw the conclusion that P weakly simulates Q.
2) Q cannot weakly simulate P.
• For $c \in P$ and $p_0 \xrightarrow{c} p_1$, there are no actions and transitions to match them in Q. According to Definition 10, we draw the conclusion that Q cannot weakly simulate P.
3) P cannot strongly simulate Q.
• For $r \in Q$ and $q \xrightarrow{r} q_0$, we first assume that there is a transition $p \stackrel{(new\ r)e_1}{\Longrightarrow} p_0$ in P. However, $p_0$ cannot strongly simulate $q_0$ because there are no actions and transitions to match $s \in Q$ and $q_0 \xrightarrow{s} q_1$. According to Definition 8, we draw the conclusion that P cannot strongly simulate Q.
• If the assumption is not valid, there are no actions and transitions to match $r \in Q$ and $q \xrightarrow{r} q_0$. According to Definition 8, we draw the same conclusion.
4) Q cannot strongly simulate P.
• It has been proven in 2) that Q cannot weakly simulate P. According to the relationship between strong simulation and weak simulation, it is obvious that Q cannot strongly simulate P.
From the above discussion, it can be concluded that application behavior P weakly simulates malicious behavior Q, so P is malicious, although P cannot strongly simulate Q.
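The four conclusions above can be reproduced on a simplified encoding of the case. In the sketch below, P is reduced to the chain of actions r, c, s and Q to the chain r, s; collapsing the sequences $e_1$ and $e_2$ to the single actions r and s is a simplifying assumption, and the dictionary encoding and function names are introduced here for illustration only.

```python
from collections import deque


def strong_succ(lts, state, action):
    """Direct successors by `action` (matching as in Definition 8)."""
    return {t for a, t in lts.get(state, []) if a == action}


def weak_succ(lts, state, action):
    """States reached by a path containing `action` (matching as in Definition 10)."""
    seen = {(state, False)}
    queue = deque(seen)
    hits = set()
    while queue:
        s, found = queue.popleft()
        if found:
            hits.add(s)
        for a, t in lts.get(s, []):
            node = (t, found or a == action)
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return hits


def all_states(lts):
    return set(lts) | {t for edges in lts.values() for _, t in edges}


def simulates(lts_p, p, lts_q, q, succ):
    """True if p simulates q, where `succ` selects strong or weak matching."""
    rel = {(y, x) for y in all_states(lts_q) for x in all_states(lts_p)}
    changed = True
    while changed:
        changed = False
        for y, x in list(rel):
            ok = all(any((y2, x2) in rel for x2 in succ(lts_p, x, a))
                     for a, y2 in lts_q.get(y, []))
            if not ok:
                rel.discard((y, x))
                changed = True
    return (q, p) in rel


# Simplified case: r = obtain data, c = component interaction, s = leak data.
P = {"p": [("r", "p0")], "p0": [("c", "p1")], "p1": [("s", "p2")]}
Q = {"q": [("r", "q0")], "q0": [("s", "q1")]}

print("P weakly simulates Q:  ", simulates(P, "p", Q, "q", weak_succ))
print("Q weakly simulates P:  ", simulates(Q, "q", P, "p", weak_succ))
print("P strongly simulates Q:", simulates(P, "p", Q, "q", strong_succ))
print("Q strongly simulates P:", simulates(Q, "q", P, "p", strong_succ))
```

Under these assumptions only the first check succeeds, which agrees with conclusions 1) to 4): the interaction action c prevents a strong match in either direction, while weak matching lets P cover the sequence r, s.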
In the behavior analysis, even if the strong simulation relationship between P and Q is invalid, we can draw the conclusion that P is malicious based on the condition ''P weakly simulates malicious behavior Q''. However, behavior M cannot be considered malicious if malicious behavior Q weakly simulates M. For example, let Q be a malicious behavior containing transitions $q \xrightarrow{r} q_0 \xrightarrow{s} q_1$, and let M be a behavior containing only one transition $m \xrightarrow{r} m_0$. Obviously, Q weakly simulates behavior M, but M cannot be considered malicious because M only contains one action of obtaining data.
By discussing the behavior P in Fig. 4 and the behavior Q in Fig. 5, we have demonstrated that the weak simulation relationship between P and Q can be used to decide on application behavior, whereas there is no strong simulation relationship between P and Q. Thus, although strong simulation represents higher behavior equivalence, it sometimes has limitations and flaws in behavior analysis. Moreover, strong simulation requires more work than weak simulation. Therefore, weak simulation is more suitable than strong simulation in application behavior analysis.
In order to illustrate the application of strong simulation in behavior equivalence, we further abstract and simplify the actions in P and Q to generate two more general behaviors, which are still represented as P and Q, as shown in Fig. 6.
Using strong simulation to analyze the relationship between P and Q, we draw the following conclusions:
1) P strongly simulates Q.
• For $b \in Q$ and $q_1 \xrightarrow{b} q_2$, then $\exists\, p_1 \xrightarrow{b} p_2$ in P. It is obvious that $p_2$ strongly simulates $q_2$ because there is no action in $q_2$. Therefore, $p_1$ strongly simulates $q_1$.
• For $a \in Q$ and $q \xrightarrow{a} q_1$, then $\exists\, p \xrightarrow{a} p_1$ in P. Since it has been proven that $p_1$ strongly simulates $q_1$, $p$ strongly simulates $q$. According to Definition 8, we draw the conclusion that P strongly simulates Q.
2) Q strongly simulates P.
• For $b \in P$ and $p_1 \xrightarrow{b} p_2$, then $\exists\, q_1 \xrightarrow{b} q_2$ in Q. It is obvious that $q_2$ strongly simulates $p_2$ because there is no action in $p_2$. Therefore, $q_1$ strongly simulates $p_1$.
• For $a \in P$ and $p \xrightarrow{a} p_1$, then $\exists\, q \xrightarrow{a} q_1$ in Q, and it has been proven that $q_1$ strongly simulates $p_1$; for $a \in P$ and $p \xrightarrow{a} p_3$, then $\exists\, q \xrightarrow{a} q_1$ in Q, and $q_1$ strongly simulates $p_3$ because there is no action in $p_3$. Therefore, $q$ strongly simulates $p$. According to Definition 8, we draw the conclusion that Q strongly simulates P.
3) P and Q are not strongly equivalent.
• For $b \in P$ and $p_1 \xrightarrow{b} p_2$, then $\exists\, q_1 \xrightarrow{b} q_2$ in Q. It was proven in 1) that $p_1$ strongly simulates $q_1$ and $p_2$ strongly simulates $q_2$, and in 2) that $q_1$ strongly simulates $p_1$ and $q_2$ strongly simulates $p_2$. Therefore, $q_1$ and $p_1$ are strongly equivalent, denoted as $q_1 \sim p_1$.
• P has two branches at action $a$. Considering the bottom branch alone, for $a \in P$ and $p \xrightarrow{a} p_1$, then $\exists\, q \xrightarrow{a} q_1$, and $q_1 \sim p_1$ has been proven, so this branch alone would give $q \sim p$. However, for $a \in P$ and $p \xrightarrow{a} p_3$ in the top branch, the only matching transition in Q is $q \xrightarrow{a} q_1$. Since $q_1$ strongly simulates $p_3$ but $p_3$ cannot strongly simulate $q_1$, $p$ and $q$ are not strongly equivalent.
According to Definition 9, we draw the conclusion that P and Q are not strongly equivalent.
From the discussion above, we conclude that P and Q in Fig. 6 are not strongly equivalent, although P strongly simulates Q and Q strongly simulates P. This shows that ''P strongly simulates Q and Q strongly simulates P'' is not equivalent to ''P and Q are strongly equivalent''. In fact, according to Definition 8 and Definition 10, it can be inferred that P weakly simulates Q and Q weakly simulates P. Therefore, according to the rules based on weak simulation, P and Q are either both malicious or both credible.
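The difference between mutual strong simulation and strong equivalence can be checked mechanically. The sketch below encodes the two behaviors as they are described in the text, with P having the branches a.b and a from its initial state and Q having the single path a.b; this encoding is reconstructed from the discussion rather than taken from Fig. 6, and the function names are assumptions made for illustration.

```python
def all_states(lts):
    return set(lts) | {t for edges in lts.values() for _, t in edges}


def matched(lts_from, s, lts_to, t, pairs, flip):
    """Every move of s is answered by an equally labelled move of t within `pairs`."""
    for a, s2 in lts_from.get(s, []):
        if not any(b == a and ((t2, s2) if flip else (s2, t2)) in pairs
                   for b, t2 in lts_to.get(t, [])):
            return False
    return True


def strongly_simulates(lts_p, p, lts_q, q):
    """Definition 8: largest set of pairs (q_state, p_state) closed under matching."""
    rel = {(y, x) for y in all_states(lts_q) for x in all_states(lts_p)}
    changed = True
    while changed:
        changed = False
        for y, x in list(rel):
            if not matched(lts_q, y, lts_p, x, rel, flip=False):
                rel.discard((y, x))
                changed = True
    return (q, p) in rel


def strongly_equivalent(lts_p, p, lts_q, q):
    """Definition 9: one relation must satisfy the matching in both directions."""
    rel = {(x, y) for x in all_states(lts_p) for y in all_states(lts_q)}
    changed = True
    while changed:
        changed = False
        for x, y in list(rel):
            if not (matched(lts_p, x, lts_q, y, rel, flip=False)
                    and matched(lts_q, y, lts_p, x, rel, flip=True)):
                rel.discard((x, y))
                changed = True
    return (p, q) in rel


# Reconstructed from the text: P branches at a, Q has a single path.
P = {"p": [("a", "p1"), ("a", "p3")], "p1": [("b", "p2")]}
Q = {"q": [("a", "q1")], "q1": [("b", "q2")]}

print("P strongly simulates Q:", strongly_simulates(P, "p", Q, "q"))
print("Q strongly simulates P:", strongly_simulates(Q, "q", P, "p"))
print("P ~ Q:", strongly_equivalent(P, "p", Q, "q"))
```

Both simulation checks succeed while the equivalence check fails, because the branch of P that stops after a cannot answer the b transition of $q_1$.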
In this section, we achieve the formal description of the behavior of the demonstration application, and then use strong simulation and weak simulation to analyze the equivalence between application behavior and malicious behavior. The results show that the method for the description and decision of application behavior can accurately and effectively model and analyze application behavior. Furthermore, the analysis process shows that weak simulation is more suitable than strong simulation, and that the rules and inferences based on weak simulation are sufficient and effective for the analysis and decision of application behavior.

VII. CONCLUSION
In this study, we proposed a formal method for studying application behavior, which uses process algebra expressions and operators to achieve behavior formalization of an application, and uses the inference and calculation mechanism to achieve the analysis and decision of application behavior. In view of the concurrency and interaction characteristics of Android Apps, we extended the π -calculus theory, which is suitable for analyzing mobile concurrent interaction systems, to the study of application behavior. Based on this study's semantics and rules, the behavior of four types of application components was described using process algebra expressions. Further, we presented definitions for analyzing behavior equivalence and proposed decision rules based on weak simulation by discussing the application of strong simulation and weak simulation in behavior analysis. The formal method for the description and decision of application behavior was validated in case analysis. This study will help scholars understand the relationships and laws of application behavior with collaborations and interactions, and it will provide theoretical support for the analysis and detection of application behavior based on various dynamic and static behavior features.
In future research, we will focus on the construction of behavior rules and the simulation detection of behavior to conduct automatic analysis and decision of application behavior. In this study, we have proved that the rules based on weak simulation are effective and sufficient in behavior analysis, and weak simulation is more suitable than strong simulation. However, strong simulation represents a higher degree of similarity in behavior equivalence and the rules based on weak simulation also hold for strong simulation. Therefore, combining inter-action and intra-action to analyze application behavior using strong simulation is a potential research field.