Abnormal Detection of Wireless Power Terminals in Untrusted Environment Based on Double Hidden Markov Model

Wireless power terminals are deployed in harsh public places and lack strict control. As a result, they face security problems such as access by illegal and counterfeit terminals and unlawful control of connected terminals. Intrusion detection systems based on machine learning and artificial intelligence significantly improve the abnormal detection capacity on the terminal side. In this article, we aim to identify the abnormal behavior of wireless power terminals with a double Hidden Markov Model (HMM), which addresses the computational complexity caused by high dimensions in intrusion detection systems that use a single HMM. The lower-layer HMM identifies discrete, individual abnormal network behaviors. The upper-layer HMM then recognizes attack behavior over a longer period from the multiple independent abnormal events identified by the lower layer. The experimental results indicate that an intrusion detection system using the proposed double HMM can effectively detect a terminal's abnormal behavior and identify network attack behavior that extends over a long period.


I. INTRODUCTION
In recent years, the network security situation has become increasingly severe. Network attacks initiated from terminal devices occur frequently, their destructive power has increased markedly, and their scope of influence tends to expand. In October 2016, an anonymous attacker launched a large-scale DDoS attack by illegally controlling webcams, DVRs, and other terminal devices, causing severe damage to Dyn, a DNS service provider on the east coast of the United States. The attack caused a total outage of several well-known Internet services of its customers (including Twitter, Amazon, PayPal, etc.), leaving more than half of Americans unable to access the Internet. Gartner, a research organization, predicted that by 2020 there would be more than 20 billion terminal devices globally, and that more than a quarter of network attacks on enterprises would involve terminal devices. By the end of 2019, a total of 2526 control servers had been found to control more than 1,254,000 terminals, posing a serious potential security threat to the stable operation of the Internet.
The associate editor coordinating the review of this manuscript and approving it for publication was Qilian Liang.
The above situation shows that network attacks are clearly extending to the terminal side, and the number of network attacks launched against terminals will continue to grow. The security of the various terminals at the edge of the network has become a key component of overall network security.
With the development of the power IoT, the types of IoT terminals connected to the smart grid are also increasing. According to statistics, 15.8512 million terminals of 25 kinds are connected to the State Grid management information area. Among them, wireless power terminals are the most numerous, accounting for 84.87%. Due to the lack of effective monitoring means, it is difficult to find the abnormal behavior of wireless power terminals in time, which allows an abnormal terminal to keep causing damage, obtain the company's critical data, and launch further network attacks. In such a complex and severe situation, security measures that prevent network security incidents are urgently needed. However, it is unrealistic to avoid security attacks altogether; we can only find and block the abnormal behavior of wireless power terminals as far as possible.
The idea of abnormal behavior detection is to establish a benchmark behavior model from normal network data and then compare the detected data with the benchmark behavior to determine whether there is an anomaly. Unlike other technologies, abnormal behavior detection can identify unknown attacks in network flow. The development of machine learning further strengthens this advantage: it can detect abnormal flow and identify existing threats, specific attack types, and previously unknown attack means. The essence of the anomaly detection mechanism is to analyze, understand, and characterize network behavior and to identify or classify abnormal flow instances. Therefore, from the perspective of machine learning, abnormal detection is a classification problem. The methods usually include supervised, semi-supervised, and unsupervised abnormal detection.
The supervised mode requires a dataset with instances tagged as normal or abnormal. In this case, the typical method is to establish a prediction model for the normal and abnormal classes and compare new data instances with the model to determine which class they belong to. Abnormal cases are usually far fewer than normal cases in the training data. Moreover, obtaining accurate and representative tags, especially for the abnormal class, is challenging.
The semi-supervised mode assumes that the training data only has tagged instances for the normal class. Since it does not require tags for the abnormal class, it is more widely applicable than supervised techniques. A typical method is to build a model for the class corresponding to normal behavior and use the model to identify anomalies in test data.
The unsupervised mode does not require training data, which makes the detection goal harder to achieve. It relies on an implicit assumption: normal instances occur much more frequently than anomalies in the test data. If this assumption does not hold, the false positive rate will be high.
The Hidden Markov Model (HMM) is a classical model for modeling and analyzing sequence behavior. It has been widely used in many fields, such as speech recognition and natural language processing [1]. Because intrusion behavior has temporal characteristics, the application of HMM to intrusion detection has received wide attention. However, the computational cost of the statistical learning algorithms used in existing HMM schemes increases exponentially with the data volume of the analyzed packets. In a high-dimensional state space, an HMM converges with difficulty, which leads to training failure. In addition, behavior recognition in an HMM depends only on the current state, so it ignores multi-state network attacks that span large time scales. To solve these problems, this article proposes an abnormal detection architecture based on a double HMM with two layers. The lower layer realizes fine-grained abnormal behavior detection by inspecting the network data flow frame by frame. It identifies the specific abnormal attack behavior and thereby produces a time series of attack behaviors. On this basis, the upper layer identifies network attacks that extend over a long period.
The structure of this article is as follows: Section II introduces the related work. Section III gives the framework of abnormal behavior identification of wireless power terminals based on double HMM. Section IV tests the framework and discusses its performance. The last section summarizes this article.

II. RELATED WORK
In 1987, Denning first proposed the abstract model of abnormal detection and regarded intrusion detection as a security defense measure of computer systems [2]. Denning's general model mainly uses host audit records to generate statistical portraits of system behavior and to discover intrusion behavior. It is a model of a real-time intrusion detection system. In addition to host audit records, the system calls of the operating system kernel also reflect a program's running behavior in the computer system. Reference [3] uses the system call data sets generated by different programs to represent each program's normal behavior accurately through data modeling, and uses this representation to detect intrusion.
Wagner and Dean proposed an abnormal detection model based on program analysis, which constructs a control flow model by static analysis of source code instead of building a learning model from program tracking [4]. In reference [5], control flow abnormal detection is judged according to control-flow-sensitive attributes (i.e., the ability to analyze the execution sequence of statements) and the orthogonal context-sensitive attributes (the ability to distinguish the call context at runtime). It proposes a static analysis algorithm to construct the control flow and context-sensitive models, in which context sensitivity reduces the number of impossible control flow paths that the intrusion detection system has to consider. In reference [6], the Dyck model describes how an NFA (nondeterministic finite automaton) is related to context sensitivity. There is a trade-off between context sensitivity and runtime overhead. In reference [7], a context-sensitive PDA (push-down automaton) is constructed, reducing the time complexity. In reference [8], several techniques are proposed to improve context sensitivity, such as renaming system calls to distinguish different calls of the same function. The Dyck model's code links the entry and return addresses of the target function with the call point.
The model can therefore distinguish call points and improve context sensitivity. In reference [9], CFI (control flow integrity) means that program execution must follow a predetermined CFG path. The CFI property can be enforced by modifying the source code and object code related to control flow transfer and embedding a control flow policy in the binary file. Subsequent CFI technology improves forward-edge and backward-edge processing and kernel rootkit detection. In reference [10], static analysis is used to reduce the cost of CFI. In reference [11], Zhang and Sekar proposed a method based on static analysis, which can be used on binary files to reduce CFI's execution cost.
VOLUME 9, 2021
Furthermore, a control flow integrity framework is proposed to demonstrate the replication of functions and function pointers to prevent control-flow hijacking. Reference [12] improves CFI technology, and the resulting monitoring system pays more attention to the call part of the control flow. Both data flow and control flow have specific effects on anomaly detection. Data dependency analysis has been used to model and detect malicious behaviors. Research has confirmed the validity of modeling system call parameters, such as anomaly detection according to the string distribution in reference [13].
In reference [14], WIT (Write Integrity Testing) technology prevents memory error attacks by predicting writable objects through static analysis. WIT also enforces control flow integrity and ensures that indirect control transfers are consistent with the control flow graph at runtime. In reference [15], the DFI (data flow integrity) attribute, first proposed by Castro, Costa, and Harris, refers to the consistency requirement between the runtime data flow and the statically predicted data flow. The reference also demonstrates how DFI detects control-data and non-control-data attacks.
The above work mainly analyzes a program's possible execution control flow statically, so it needs to deal with a considerable program execution space. The system call stack information of program execution reflects the program's actual execution process, so it can better reflect the program's behavior. Reference [16] proposed a new method for abnormal detection using call stack information. Experimental results show that this method can detect attacks that other methods cannot detect. In reference [17], a combination of static analysis and dynamic learning is adopted. In this method, program tracing is used to define the basic static generation model, and a hybrid push-down automaton (HPDA) describes the call stack information to obtain the program's control flow efficiently. However, this model is not a probabilistic method and cannot record, model, and predict branches. In reference [18], probabilistic data mining technology is used to analyze attack behavior. Warrender et al. proposed the first probabilistic learning work for program behavior modeling. Probabilistic abstract interpretation in reference [19] is used to calculate and limit the knowledge gain associated with information dissemination. In reference [20], the probability of program path execution was estimated by Monte Carlo simulation. In reference [21], Sampson et al. provided a framework for expressing and verifying the probability of variables in programs based on the Bayesian network model. In reference [21], a probabilistic modeling method is proposed to predict the properties of new and unseen programs. Another intrusion detection model records and evaluates call sequences based on n-grams. This method collects call sequences (such as system calls) to form a collection of allowed call sequences, and any new or unordered call sequence is classified as an exception sequence. However, this method needs to enumerate and store all possible call sequences, which limits its scalability.
The Hidden Markov Model (HMM) is a classical model for modeling and analyzing sequence behavior and has been widely used in many fields such as speech recognition and natural language processing. Because intrusion behavior has temporal characteristics, the application of HMM in intrusion detection has also received extensive attention [22]. In reference [23], researchers proposed using HMM to compare two parallel abnormal detection methods. The execution graph model in reference [24] is constructed by learning the program's execution patterns at runtime, that is, the return addresses on the call stack related to each system call, and by using the inductive attributes in the call sequence.

III. ABNORMAL DETECTION STRUCTURE OF WIRELESS POWER TERMINALS BASED ON DOUBLE HMM

A. OVERALL STRUCTURE
According to the security characteristics of the smart grid system, this article adopts the model of network isolation and security access based on no connection. The model includes four parts: the communication front-end processor, network security isolation (isolation for short), the network security access gateway (gateway for short), and the acquisition front processor. It is also equipped with a self-defined private protocol for communication. The overall access model is first described with a single-isolation, single-gateway architecture (see Figure 1):
Communication front-end processor C.P.: it maintains socket links for a large number of terminals, initiates a small number of sockets to connect to the isolator, and can filter private protocols and forward application-layer messages (private protocols). Meanwhile, the communication front-end and the acquisition front-end exchange terminal access information to provide an addressing service for messages sent by the master station.
Network security isolation N.I.: it provides physical isolation and analytical isolation capabilities. For physical isolation, the classic 2 + 1 physical isolation design is adopted, composed of two systems: the front system and the post system. They communicate with each other over a high-speed, multi-channel isolation card based on PCIe, so the TCP/IP protocol of the network layer is shielded physically. The isolation card uses a high-performance FPGA chip and a scalable multi-channel mode to support high-speed isolated switching. It uses four channels by default, and a single channel supports 1 Gbps of traffic.
Network security access gateway N.G.: as a server, it provides socket access for the isolation post system and the acquisition front. The gateway can encrypt and decrypt messages. It decrypts the data reported by the terminal and, after processing, forwards the message to the acquisition front end in plaintext. The plaintext message sent from the acquisition front is encrypted by the gateway and sent to the terminal as ciphertext.
Acquisition front processor A.P.: it encapsulates and terminates the private protocol, with the terminal side as its counterpart, and it addresses the C.P. After receiving the private protocol from the gateway, the A.P. unpacks it, extracts the business data, and transfers the data to the master station. In the other direction, it receives business data from the master station and assembles the private protocol. It then designates the communication front node that assists in addressing the corresponding terminal.
HMM abnormal detection HAD: it detects abnormal terminal behavior. After terminal access authentication, HAD analyzes the content of the messages transmitted by the terminal in real time to ensure that a legitimate terminal is not used as a springboard for network attacks. HAD uses the HMM to analyze and predict terminal transmission behavior and to uncover deviations in terminal behavior. In this way, HAD prevents attacks that exploit the abnormal behavior of legitimate terminals.

B. DOUBLE HIDDEN MARKOV MODEL
HMM is a parameterized probability model used to describe the statistical characteristics of a random process. It is a double random process: one process is the Markov chain, which describes the transitions between states, and the other describes the relationship between states and observations. An HMM is defined by three kinds of probability: the initial state probability vector, the state transition probability matrix, and the observation probability matrix. It can therefore be expressed as $\lambda = \{A, B, \pi\}$. $A$ is the state transition matrix of the hidden states, which describes the transition probability between each pair of states in the HMM; $B$ is the observation probability matrix, which relates the observable state chain to the hidden states in the model; $\pi$ is the initial probability vector, which gives the probability of each initial hidden state.
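To make the role of the three parameters concrete, the following minimal sketch evaluates the likelihood of an observation sequence under $\lambda = \{A, B, \pi\}$ with the standard forward algorithm. The two-state model, its probabilities, and the function name `forward_likelihood` are illustrative assumptions and are not taken from the experiments in this article.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def forward_likelihood(A: Matrix, B: Matrix, pi: Sequence[float],
                       observations: Sequence[int]) -> float:
    """Return P(observations | lambda) for lambda = (A, B, pi)."""
    n_states = len(pi)
    # Initialization: fwd_1(j) = pi_j * b_j(o_1)
    fwd = [pi[j] * B[j][observations[0]] for j in range(n_states)]
    # Induction: fwd_{t+1}(k) = (sum_j fwd_t(j) * a_jk) * b_k(o_{t+1})
    for obs in observations[1:]:
        fwd = [
            sum(fwd[j] * A[j][k] for j in range(n_states)) * B[k][obs]
            for k in range(n_states)
        ]
    # Termination: sum the forward variable over all final states
    return sum(fwd)


# Hypothetical model: state 0 = normal, state 1 = abnormal.
A = [[0.9, 0.1],
     [0.2, 0.8]]
B = [[0.8, 0.2],   # the normal state mostly emits symbol 0
     [0.3, 0.7]]   # the abnormal state mostly emits symbol 1
pi = [0.95, 0.05]

likelihood = forward_likelihood(A, B, pi, [0, 0, 1, 1])
```

A sequence that the trained model explains poorly receives a low likelihood, which is one common way to flag a deviation from the modeled behavior.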
According to the characteristics of attack behavior, each attack event can be described by several attack actions, and each attack action is composed of a set of time series of abnormal behavior data. Therefore, we can construct a double HMM with two layers to describe attack behavior characteristics over a long period (see Figure 2).
Each layer is an HMM sequence. The upper HMM is trained on data constructed from the possible visible state sequences of each HMM in the lower layer. In this way, the upper HMM uses information from the lower HMM to learn new patterns that the lower HMM may not recognize.
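One possible way to turn lower-layer output into an upper-layer sequence is sketched below. It assumes that the lower layer has already labeled every frame and that each fixed-size window is summarized by its most frequent label. The window size, the label codes, and the majority-vote rule are assumptions made for illustration; the article does not specify how the upper-layer training data are assembled.

```python
from collections import Counter
from typing import List, Sequence


def build_upper_observations(lower_labels: Sequence[int],
                             window: int) -> List[int]:
    """Summarize each window of lower-layer labels by its most common label."""
    upper = []
    for start in range(0, len(lower_labels), window):
        chunk = lower_labels[start:start + window]
        # most_common(1) returns a one-element list of (label, count)
        upper.append(Counter(chunk).most_common(1)[0][0])
    return upper


# Hypothetical lower-layer output: 0 = normal, 1 = scan, 2 = flood.
lower_labels = [0, 0, 1, 1, 1, 0, 2, 2, 2, 2, 0, 0]
upper_observations = build_upper_observations(lower_labels, window=3)
```

The resulting shorter sequence is what an upper HMM would observe, so one upper-layer step covers several lower-layer frames.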
We use $M = \{A_1, B_1, \pi_1, A_2, B_2, \pi_2, H\}$ to represent the double HMM, where $A_1, B_1, \pi_1$ and $A_2, B_2, \pi_2$ represent the lower and upper HMM, respectively, and $H$ represents the conditional probability matrix of the upper HMM to the lower HMM.
For a specific network behavior, its parameter set M is:
1) State transition matrix $A_i$: in the $i$-th HMM, the current state can only be transferred to the next state and cannot return to a previous state, so every entry of $A_i$ that corresponds to a backward transition is zero, that is, $a^{i}_{jk} = 0$ for $k < j$.
2) State output probability matrix $B_i$: each entry represents the probability that a state outputs a given observation value at the current moment.
3) Initial state probability distribution $\pi_i$: since the state transition always starts from the state $S_B$, the initial probability of $S_B$ is 1 and the initial probability of every other state is 0.
The likelihood probability is given in Equation (2).

C. OBSERVABLE STATE CHARACTERISTICS OF ABNORMAL TERMINAL BEHAVIOR
How to extract data with abnormal behavior characteristics from the monitoring sequence, and how to observe these characteristics so that they reflect the terminal's abnormal behavior, play a vital role in determining the accuracy of abnormal detection. A total of four statistical observation features are selected to determine the terminal's behavior. They fully show the change of terminal behavior while keeping the calculation simple. The four behavior characteristics are as follows:
(1) Familiarity characteristics $FCh_{TM_i}$: the higher the number of historical communications between the terminal and the communication front-end, the greater the familiarity between them. Familiarity affects the trust of the terminal under evaluation, and it depends mainly on the number of communications after the terminal is connected. The familiarity characteristics of the terminal are expressed by Equation (3).
(2) Similarity characteristics of business behavior $BCh_{TM_i}$: the similarity trust degree of business behaviors is calculated from the number of message types that terminal $TM_i$ has in common with terminals of the same type. If terminal $TM_i$ and terminals of the same type transmit more of the same message types, that is, they participate in the same business activities, the terminal has a higher degree of trust. Let $PT^{same}_{TM_i}$ denote the number of message types transmitted by both terminal $TM_i$ and terminals of the same type, and let $PT^{all}_{TM_i}$ denote the total number of message types transmitted by the terminal. Equation (4) gives the similarity trust of business behaviors as the ratio of these two counts:

$$BCh_{TM_i} = \frac{PT^{same}_{TM_i}}{PT^{all}_{TM_i}} \tag{4}$$

If most of the message types transmitted by the terminal are the same as those transmitted by other terminals of the same type, the credibility of this terminal is higher. Conversely, although it cannot be directly determined that the terminal is untrustworthy, it can be suspected that the terminal has been attacked and abnormal behavior has occurred.
(3) Access behavior characteristics $ACh_{TM_i}$: the network access behavior of a smart grid edge computing terminal has a certain regularity, so it can also reflect the terminal's abnormal state. The calculation method is given in Equation (5), where $\varepsilon(DA^{t}_{TM_i}, DA^{old}_{TM_i})$ represents the difference between the destination addresses the terminal accesses in the current period and its historical access addresses, which can be computed as an edit distance. When this difference exceeds a threshold $L$, the trust degree is zero. The more regular the terminal's access behavior, the higher the trust degree. If the access behavior differs too much, either the terminal node has had very little data contact with the destination address node it visits, or the terminal node is behaving abnormally.
(4) Data load behavior characteristics $DCh_{TM_i}$: after the terminal is connected, the real-time data load during data interaction also reflects the terminal's trust to a certain extent. Under normal conditions, the data load of the terminal $DATA_{TM}$ should follow a regular Gaussian distribution. When the terminal is attacked or used to carry out an attack, its data load is affected by the attack and deviates from the average data load. The characteristics related to the terminal data load behavior are calculated by Equation (6).
The above four behavior characteristics are used to construct an observable state set of terminal behavior. On this basis, a terminal behavior observation feature matrix $T$ within the observation period can be constructed. Each column of $T$ holds, for terminal $TM_i$, the familiarity characteristics, the business behavior similarity characteristics, the access behavior characteristics, and the data load behavior characteristics.
By observing and analyzing the above characteristics, we can detect and analyze the abnormal behavior of the smart grid edge computing terminal caused by each type of attack and thereby evaluate the terminal risk. So far, we have determined the input and output parameter set of the HMM model (see Figure 3).
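Two of the features above can be sketched in a few lines. The business similarity is computed as the share of a terminal's message types that same-type terminals also use, and the access difference is a Levenshtein edit distance over destination address sequences. The linear decay `1 - distance / limit` used for the access trust, the function names, and the sample addresses are assumptions for illustration only, since the text states only that the trust is zero beyond the threshold $L$.

```python
from typing import Sequence, Set


def business_similarity(own_types: Set[str], peer_types: Set[str]) -> float:
    """Share of the terminal's message types also used by same-type terminals."""
    if not own_types:
        return 0.0
    return len(own_types & peer_types) / len(own_types)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of destination addresses."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def access_trust(current: Sequence[str], history: Sequence[str],
                 limit: int) -> float:
    """Assumed rule: trust falls linearly with distance, zero above `limit`."""
    distance = edit_distance(current, history)
    if distance > limit:
        return 0.0
    return 1.0 - distance / limit


# Hypothetical message types and destination addresses.
similarity = business_similarity({"meter", "alarm", "status", "debug"},
                                 {"meter", "alarm", "status", "config"})
trust = access_trust(["10.0.0.1", "10.0.0.2"], ["10.0.0.1", "10.0.0.3"], limit=4)
```

Both values fall in the range from 0 to 1, so they can be placed side by side in one column of the observation feature matrix.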

D. CALCULATION OF STATE TRANSITION PROBABILITY OF ABNORMAL TERMINAL BEHAVIOR
The traditional Markov risk assessment model assumes that the state transition probability matrix of the system does not change with time. However, in the smart grid edge network environment, the state transition probability changes continually, especially during network attacks. Therefore, this article updates the state transition probability matrix in real time according to the time cost of switching between attack states on the network. First, we determine the difficulty of each stage of an attack as the ratio of the time spent in that stage to the time it takes to complete the entire attack, which gives an objective measure of the difficulty of state transition between the stages of the same attack. Second, the transition probability of the attack state is calculated from the difficulty of each transition. The smaller the difficulty, the greater the transition probability, and vice versa.
We define $T$ as the time cost of the whole attack process, and $t_i$ as the attack time cost of state $i$ within the whole attack process. We define the attack process as A = .
We then give the general formula of the state transition probability matrix $P$, where $D_i \in D$ and $d_{ij}$ represents the difficulty of the transition from state $i$ to state $j$. Here, $p_{ij}$, the entry of $P$ in row $i$ and column $j$, expresses the probability that an attacked node transitions from state $i$ to state $j$.
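The relationship between time cost, difficulty, and transition probability can be sketched as follows. The difficulty of a stage is its time cost divided by the total time, as described above. The rule that a transition probability is proportional to the inverse of the difficulty and then normalized is an assumption chosen only to satisfy "the smaller the difficulty, the greater the transition probability"; the time costs and function names are hypothetical.

```python
from typing import Dict, List


def stage_difficulty(stage_times: List[float]) -> List[float]:
    """Difficulty of each stage: its time cost t_i divided by the total T."""
    total = sum(stage_times)
    return [t / total for t in stage_times]


def transition_row(difficulties: Dict[str, float]) -> Dict[str, float]:
    """Turn the difficulties of candidate transitions into probabilities.

    Assumed rule: probability proportional to 1 / difficulty, then normalized.
    """
    inverse = {state: 1.0 / d for state, d in difficulties.items()}
    norm = sum(inverse.values())
    return {state: weight / norm for state, weight in inverse.items()}


# Hypothetical time costs (minutes) for the data collection, continued
# attack, and node occupation stages.
times = [30.0, 60.0, 10.0]
d = stage_difficulty(times)
row = transition_row({"continued_attack": d[1], "node_occupation": d[2]})
```

Because each row is renormalized whenever the measured time costs change, the matrix can be refreshed as new attack timing data arrive.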

IV. EXPERIMENTAL RESULTS AND ANALYSIS

A. TEST ENVIRONMENT CONSTRUCTION
To verify the effectiveness and performance of the algorithm proposed in this article, related experiments were carried out in a laboratory environment. The topology of the experimental environment is shown in Figure 4, and the configuration of the experimental environment is listed in Table 1.

B. ANALYSIS OF EXPERIMENTAL RESULTS OF ABNORMAL BEHAVIOR DETECTION AND EVALUATION
First, the data set is acquired from the NS2 simulation database and divided into three attack stages according to the details and specific functions of each data set: the data collection stage, the continued attack stage, and the node occupation stage. Second, the average attack time is calculated from a statistical analysis of each attack (see Tables 2-4).
The observation matrix can be set based on expert experience. Next, the Viterbi algorithm (over the variable $\gamma^{k}_{t}$) is used to calculate the abnormal value. The general process of the Viterbi algorithm is as follows:
Step 1: Initialization.
Step 2: Recursion or loop.
Step 3: Result. $P^{*} = \max_{1 \le i \le N} \gamma_t(i)$.
In the same way, we can obtain the abnormal value of the remaining nodes: $R^{2}_{t} = 6$, $R^{3}_{t} = 6.5$, $R^{4}_{t} = 5.5$, and so on. According to the abnormal values of the seven nodes in the simulation, the entire network's abnormal value can be calculated. First, determine the time weight in the equation.
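The three Viterbi steps named above can be sketched as a short function. This is the textbook algorithm written with the symbol $\gamma$ used in Step 3; the two-state model and its probabilities are hypothetical and are not the observation matrix or data used in this experiment.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def viterbi(A: Matrix, B: Matrix, pi: Sequence[float],
            observations: Sequence[int]) -> Tuple[float, List[int]]:
    """Return the probability and state path of the most likely explanation."""
    n = len(pi)
    # Step 1: initialization, gamma_1(i) = pi_i * b_i(o_1).
    gamma = [pi[i] * B[i][observations[0]] for i in range(n)]
    backpointers: List[List[int]] = []
    # Step 2: recursion over the remaining observations.
    for obs in observations[1:]:
        new_gamma, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: gamma[i] * A[i][j])
            pointers.append(best_i)
            new_gamma.append(gamma[best_i] * A[best_i][j] * B[j][obs])
        gamma = new_gamma
        backpointers.append(pointers)
    # Step 3: result, P* = max_i gamma_T(i), followed by backtracking.
    last = max(range(n), key=lambda i: gamma[i])
    best_prob = gamma[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best_prob, path


# Hypothetical model: state 0 = normal, state 1 = abnormal.
A = [[0.9, 0.1],
     [0.2, 0.8]]
B = [[0.8, 0.2],
     [0.3, 0.7]]
pi = [0.95, 0.05]

best_prob, best_path = viterbi(A, B, pi, [0, 0, 1, 1])
```

For long sequences the products become very small, so a practical version would usually work with log probabilities and sums instead of products.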
Attacks on nodes differ at different times, and so does the degree of exposure. Take one day as an example. The day is divided into three time periods: $T_1$: 0:00 to 8:00, $T_2$: 8:00 to 16:00, and $T_3$: 16:00 to 24:00. Attacks in the second period are the most active and have the greatest impact on the entire network, followed by the third period and finally the first period. The importance of these periods is quantified based on professional knowledge. After normalization, the relative importance weights of the three time periods are $w_{T_1} = 0.11$, $w_{T_2} = 0.67$, and $w_{T_3} = 0.22$. From the above, we obtain the abnormal value of the entire network, $R_t = 1.49$.
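The normalization step can be illustrated with a short sketch. The raw importance scores 1, 6, and 2 are an assumption, chosen only because they normalize to the stated weights 0.11, 0.67, and 0.22; the article does not report the raw scores. The weighted sum in `network_abnormal_value` is likewise an assumed form of the combination, so the sketch does not attempt to reproduce $R_t = 1.49$.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def network_abnormal_value(period_values: List[float],
                           weights: List[float]) -> float:
    """Assumed combination rule: time-weighted sum of per-period values."""
    return sum(w * v for w, v in zip(weights, period_values))


# Assumed raw importance of T1, T2, T3 (not given in the article).
raw_importance = [1.0, 6.0, 2.0]
weights = normalize(raw_importance)
rounded = [round(w, 2) for w in weights]
```

Under this assumption the second period dominates the result, which matches the statement that attacks between 8:00 and 16:00 have the greatest impact on the network.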
With time slices of 3 hours, the attacks in Table 4 were applied to network nodes between 9:00 and 12:00 and between 21:00 and 24:00, and the abnormal value increased significantly from the normal state (no attack), as shown by the red circles in Figure 5. A total of 3 attacks from Table 4 were applied between 21:00 and 24:00, and the resulting abnormal value was about twice the value at 12:00 (one attack). The experimental simulation results show that the method in this article can effectively identify and evaluate the terminal's abnormal behavior in the network.
As shown in Figure 6, compared with the probabilistic risk assessment and dynamic probabilistic risk assessment methods, the anomaly recognition and evaluation method proposed in this article requires the shortest calculation time under different network scales. Moreover, as the network scale increases, the time required by the proposed method grows slowly, so it is better suited to large-scale network environments.

V. CONCLUSION
This article proposes a method for identifying abnormal behaviors of wireless access power terminals based on double HMM, which addresses the computational complexity caused by high dimensions in intrusion detection systems. The lower layer identifies discrete, individual abnormal network behaviors. The upper layer recognizes attack behavior over a longer time span from the multiple independent abnormal events identified by the lower layer. The experiment shows that our method can effectively detect the terminal's abnormal behavior and identify network attack behavior over a long time span.