Human-Object Relations and Security Control in Inference System for the User Intention

Internet of Things (IoT) networks continue to grow. In most cases, all IT assets are connected to the network, and various resources and services are provisioned proactively as needed. To achieve this, many smart objects are being developed in the field of intelligent devices. However, most of these objects can enable their smart functionalities only after the user starts interacting with them, which leads to an absence of intelligence between things. Bridging the gap between these entities to improve both network productivity and network security is a challenge. Because the logs generated by IoT devices are vast and diverse, it is difficult to detect and defend against cyber-attacks with existing network security technologies. Existing cyber-attack detection systems cannot detect new attacks because they defend by defining known attack patterns as rules. This paper presents human-object relations and security control in an inference system that infers with which object a person wishes to interact by observing his behavior. Security control for the services considered in this paper is both important and specialized. To achieve this goal, the inference problem is resolved into a problem of distinguishing, for each object, whether the user has the intention to interact with it. The system analyzes human behavior and detects whether there is a cyber-attack intention. For every object, a set of human-object relations, including Relative Distance, Relative Angle, Movement Speed, Approach Efficiency and Movement Efficiency, is extracted from the person's behavior. These relations are used to determine whether a person wants to interact with a particular object using a Support Vector Machine (SVM) classifier. In addition, new mechanisms are needed to shrink massive raw logs and detect new attack patterns. Thus, this paper suggests a method for securing a huge network supporting IoT services.
The proposed inference system detects unusual network patterns by calculating correlations between events based on graphs and network measurements. We model an ensemble of events based on log graphs interconnected between network devices and IoT gateways. We implement and evaluate an algorithm that detects new attack patterns by estimating the attack probability through real-time clustering of the event ensemble. Finally, to assess the effectiveness of the proposed relations, the method is evaluated experimentally on a real dataset, with encouraging results. Using the proposed human-object relations, it is possible to sense users' interaction intentions in advance and thus to proactively provide user-adaptive services based on safety.

next interaction target would be helpful in alleviating this problem.
Many studies have attempted to explore human intention in all its aspects. Several researchers have tried to directly explore intentions using raw neuronal signals based on the premise that human intention is a mental plan established in the brain [4], [5]. Some of these researchers have obtained brain data by implanting intra-cranial electrodes in subjects, but such invasive surgical procedures are risky and painful [6], [7]. Others have utilized electroencephalograms (EEGs) to avoid this risk; however, this method suffers from a limited bandwidth for neural signals, making it difficult to acquire sufficient data [6].
Using an alternative approach to the research discussed above, several scholars have attempted to infer intentions through the observation of behavior, based on the premise that human intention guides a sequence of behaviors in pursuit of an anticipated outcome [8]. Some of these researchers have extracted features from the movements of parts of the user's body, such as the hands, fingers, and face, and have then inferred the user's intention by identifying the most similar one from among a predefined set of intentions [9].
However, the applicability of methods of this kind is restricted to their predefined intentions. In addition, in some of these studies, intentions are treated as discrete states and the most likely one is inferred based on learned transition relations between intentions, which is an approach that is difficult to extend to a large-scale intention domain. In this context, this article illustrates a method of measuring human-object relations from user behaviors and inferring whether there is an intention to interact based on these relations. More specifically, this problem of inferring interaction intentions is resolved into a problem consisting of a mutually exclusive judgment for each object: whether there is an intention to interact or no intention to interact. To differentiate these intentions, the following human-object relations are extracted from the behavior exhibited by humans when approaching objects [10].
• Relative Distance: The Euclidean distance between the user being tracked and a specific object.
• Relative Angle: The angle between the user's body orientation and the line connecting the user and the given object.
• Movement Speed: The rate of change in the user's position.
• Approach Efficiency: The ratio of the number of observations in which the user was approaching the object during the time window of interest to the total number of observations.
• Movement Efficiency: The ratio of the displacement with respect to the object to the total travel distance.
This paper describes these relations and their extraction methods in detail and shows, based on real data, that they are effective in estimating whether an individual is willing to interact with a particular object. Using the proposed human-object relations, the user's interaction intention can be identified in advance, a capability that can be used for user-adaptive security, AI-driven human-IoT service interaction, and anti-phishing services.
At the same time, service security threats are increasing. As the number of connected objects increases and service types change dynamically, security threats increase and the types of attacks become more diverse. Most devices have many vulnerabilities of their own and only limited resources because of low-cost constraints. If a device's authentication function is strengthened to raise the security level, service flexibility suffers. To detect signs of intrusion, it is necessary to analyze the log sequences generated by devices, as well as the log sequences generated by existing network equipment and security equipment. To collect and analyze the logs of these various devices, integrated security control has been proposed and utilized [11]. Because of the large number of logs generated by devices, anomalies must be detected after the raw logs are shrunk. Therefore, we propose an algorithm that detects service attacks by correlating network logs. The algorithm dynamically clusters events and detects attacks by calculating the average degree of neighboring nodes, without defining rules or states for attacks. Infected devices can then be inferred by connecting clustered devices and estimating attack vectors. This paper is structured as follows. Section II describes the problem formulation for services. Section III describes related works. Section IV describes concrete methods for extracting human-object relations. Section V explains the event ensemble model and clustering. Section VI shows the network event collection and clustering algorithm for detecting intrusion. Section VII presents the implementation results and evaluates the model, and Section VIII concludes.

II. PREREQUISITES AND PROBLEM FORMULATION
Before the details of the proposed method of inferring interaction intentions can be presented, several basic prerequisite conditions must be stated. First, the environment should be such that all relevant data are known, such as the positions of all objects. Second, the behaviors of the target user are assumed to be reasonable and trackable; moreover, his intentions are assumed to be temporarily stable during inference processing [10].
VOLUME 11, 2023
In this paper, the intention inference problem is resolved into several problems of mutually exclusive judgments, one for each interactive object. If $n$ interactive objects exist in total, then the object space is denoted by $O = \{o_1, o_2, \ldots, o_n\}$. For $\forall o_k \in O$, there are only two possible cases: the user being tracked either has the intention to interact with this object or does not. Let $i_k$ denote whether the user has the intention to interact, as follows:
$$i_k = \begin{cases} 1, & \text{if the user has the intention to interact with } o_k \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
The goal of this research is to construct an inference function $f(o_k, B^t)$ to estimate whether a user has the intention to interact with a specific $o_k$. Let $\hat{i}_k^{t+1}$ represent the inferred interaction intention at time $t+1$ for $o_k$ based on the observed behavior up through the instant $t$:
$$\hat{i}_k^{t+1} = f(o_k, B^t) \tag{2}$$
where $B^t = \{b^0, b^1, \ldots, b^t\}$ is the set of all recorded behaviors up through $t$ and $b^0$ is the initial state. The intention inference problem is then formulated as follows.
$$\hat{I} = \{\, o_k \mid \hat{i}_k^{t+1} = 1,\ o_k \in O \,\} \tag{3}$$
where $\hat{I}$ is the inferred set of all objects with which the individual may interact in the next instant. Note that $\hat{I} = \varnothing$ means that there is no inferable interaction intention. Furthermore, because a person may hold more than one interaction intention at the same time, it is possible that $|\hat{I}| \geq 2$.
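The per-object formulation above can be illustrated with a short sketch. The names `infer_targets` and `near_last_position` are hypothetical and do not come from the paper; in particular, `near_last_position` is only a toy stand-in for the inference function f, used to show how the inferred set is assembled from independent binary judgments.

```python
from typing import Callable, List, Set, Tuple

Point = Tuple[float, float]


def infer_targets(
    objects: List[Point],
    behaviors: List[Point],
    f: Callable[[Point, List[Point]], int],
) -> Set[int]:
    """Collect the indices k of all objects for which f(o_k, B^t) returns 1."""
    return {k for k, o_k in enumerate(objects) if f(o_k, behaviors) == 1}


def near_last_position(o_k: Point, behaviors: List[Point], radius: float = 1.0) -> int:
    """Toy stand-in for f (an assumption): 1 if the last recorded position
    lies within `radius` of the object, 0 otherwise."""
    x, y = behaviors[-1]
    return 1 if (x - o_k[0]) ** 2 + (y - o_k[1]) ** 2 <= radius ** 2 else 0


objects = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.5)]
behaviors = [(3.0, 3.0), (1.0, 1.0), (0.2, 0.2)]  # recorded positions up to t
inferred = infer_targets(objects, behaviors, near_last_position)
```

Because each object is judged independently, the result may be empty or may contain two or more objects, matching the two cases noted above.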
IoT devices take various forms depending on their purpose, function, complexity, and operating system, and their raw logs are large and complex. Collecting, processing, and analyzing raw logs to detect anomalous traffic in real time is difficult. In addition, as new services are rapidly provided and hacking techniques become more sophisticated and intelligent, it is difficult to build a response system before security incidents occur. If active security management for IoT devices is not performed, intrusion into the internal network through vulnerabilities poses a threat to all IT assets. Therefore, intrusion detection and monitoring must be performed on IoT services that connect various types of wired and wireless network devices, platforms, and IoT devices. Log records must always be stored and managed so that the cause can be analyzed after an intrusion incident occurs [11].
Most IoT devices have low-power, small hardware and operating systems that cannot generate and store their own log records. In addition, there is no authority management through identity authentication. Because of these constraints, logging and authentication processing are mostly managed at the IoT gateway layer. Therefore, IoT gateways must be able to safely record and store the state information of IoT devices periodically. Since IoT devices are mostly deployed in mobile environments, solutions that analyze anomalies in a cloud environment by mirroring IoT device messages at an IoT gateway are being researched and developed [11]. One such framework analyzes network traffic to discover all installed devices and identify what they are. It enables real-time security monitoring for all IoT devices by tracking and monitoring each device's network usage patterns.
Botnets are a typical attack that threatens IoT devices without security authentication and logging. Fig. 1 shows the botnet formation process. Infected devices obtain authentication information by scanning for IoT devices that have been factory reset or have weak administrator account passwords. The infected device registers the obtained authentication information with the reporter server, and malware is transmitted from the loader server to the vulnerable device. Vulnerable IoT devices download and execute botnet code from the downloader servers specified in the malware. When the botnet code is successfully executed on the IoT device, it sends a botnet success message to the controller server. As described above, in the process of distributing malicious code to IoT devices, logs are generated in various network/security devices and servers. Therefore, management systems that integrate these logs and algorithms to analyze them are needed.

III. RELATED WORKS
Several previous studies have addressed the inference of user intention and security control for IoT services. There are two common means of exploring human intention: surveys and experiments. In survey-based research, data are collected through questionnaires or interviews, whereas experimental studies involve the analysis of data observed in experiments. Because of space constraints, only the latter are discussed in this paper. To clearly organize these studies, they are divided into two categories based on the data sources used: direct brainwave-based research and indirect observation-based research.

A. BRAINWAVE-BASED RESEARCH
Because intention is generated in the human brain, several researchers have attempted to directly access human brains and thus to assist individuals in issuing intentional commands by means of brain activity. These studies can be subdivided into two major classes based on the way electrophysiological signals are recorded: invasive and non-invasive methods [12].
In invasive methods, several intra-cranial electrodes are implanted in subjects' brains to access neural signals [6]. For example, researchers have implanted electrodes in monkeys' brains to explore the possibility of controlling external devices through neural firing patterns, yielding results that are promising for severely paralyzed patients [5]. Other researchers have implanted microwires in the brains of rats to investigate coding strategies. To achieve pattern recognition, an artificial neural network (ANN) using an optimal learning vector quantization (LVQ) algorithm was applied. That study proved that multiple strategies are used in the rat somatosensory thalamocortical pathway to encode the locations of tactile stimuli. Such invasive studies help to elucidate the mechanisms of various brain functions, but implantation surgery is risky and painful, and there is still a long way to go before these results can be extended to practical applications.
To avoid these disadvantages, several researchers have turned to non-invasive methods of study. Among them, EEGs are the most widely used to acquire brain data in a user-friendly manner. Many such studies have been performed in attempts to understand intentions by analyzing the signals from neurons and then providing feedback to subjects [4], [5]. In most such studies, classification algorithms are used to identify patterns of brain activity. A Brain Computer Interface (BCI) based on visual spatial attention has been designed to help users implement their selections in a visual field, which is achieved by classifying left/right spatial attention using steady-state visual evoked potentials (SSVEPs) [6]. Another computer-assisted system, called a virtual keyboard, has been developed to allow patients to indicate letter-selection intentions for spelling tasks using a second-class BCI system composed of Hidden Markov Models (BCI-HMM) [7]. To some extent, these non-invasive approaches can successfully assist users, especially disabled patients; however, such research always suffers from a limited capacity to capture neural signals.

B. OBSERVATION-BASED RESEARCH
Other researchers have attempted to infer intentions by observing the movements of various body parts, such as the hands, fingers, and face, based on the understanding that a given intention will guide a sequence of behaviors in pursuit of a certain anticipated outcome [13].
In one study, an intention prediction model was proposed for human-robot collaboration based on the recognition of continuous body gestures. In this method, the three-dimensional (3D) position coordinates of the hand joint are collected as features and used to train a Hidden Markov Model (HMM). Robots can then provide appropriate support to target users based on the classification of their current gestures. To help robots understand human actions, an intention-driven dynamics model (IDDM) has been proposed. It is an extension of Gaussian Process Dynamical Models (GPDMs) that supports the construction of a low-dimensional representation of high-dimensional movement observations and the inference of human intentions using an approximate algorithm [14].
The authors of [15] suggested a gaze-based intention prediction solution, in which sight positions, fixation durations, saccadic movements and pupillary responses are gathered to classify eye movements as intentional or non-intentional using Support Vector Machine (SVM) classification.
Beyond those discussed above, two other studies have been conducted that are particularly relevant to the current study. An intention-based adaptive assistance system was designed for wheelchair users based on their locations and heading orientations [16]. For each object, a confidence value is calculated by combining the Euclidean distance between the wheelchair and the object with the angle between the wheelchair's orientation and the line connecting the wheelchair and the object. When the confidence value exceeds a pre-set threshold, a suitable adaptive assistance measure will be invoked to assist the user.
In another study, a human-like robot was developed to approach a person when that person has the intention to interact with it [17]. The control system for this robot utilizes features including the distance between the subject and the robot, the smallest fan consistent with the direction of the trajectory, stability while walking, and the time spent not in motion.

C. SECURITY CONTROL RESEARCH
Rule-based reasoning (RBR), model-based reasoning (MBR), and state-transition graph-based reasoning (STG) methods have been proposed for correlation analysis techniques for detecting intrusions [18].
RBR consists of working memory, rules, and reasoning algorithms. Working memory is a set of raw logs for clustering. A rule expresses knowledge for abbreviating logs and detecting anomalies and takes the IF-THEN form. The inference algorithm describes a procedure for mapping rules to working memory. In a large network with a large volume of logs, RBR is difficult to apply because of the large working memory and long inference time.
MBR expresses the clustering result as a model. A model represents a network/security device that is a real entity, a network session that is a logical entity, or a suspicious process. Each model consists of attributes, relationships with other models, and actions. Log clustering is achieved by mutual collaboration between models. MBR has the disadvantages that a set of models is difficult to construct and that dynamic models are hard to create in current IoT service networks, where new zero-day attacks keep emerging.
STG is composed of tokens, states, and arcs. Attack scenarios are represented by states and arcs. If a log causes the token to make a state transition and the token reaches the end state, an attack has been detected. This technique detects an attack by clustering the logs that match the attack scenario. STG has the advantage of not having to analyze all the logs, but it has the disadvantage that an STG must be configured for each attack scenario. It is therefore not suitable for detecting zero-day attacks.
Compared with existing models that rely on network log analysis, the model proposed in this study can detect zero-day attacks by analyzing interactions between users and objects. As shown in Table 1, the model proposed in this study is suitable for IoT service networks where many new attacks appear.
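The token-passing idea behind STG can be sketched in a few lines. The scenario below (states `idle`, `scanned`, `logged_in`, `infected` and the event names) is a hypothetical example chosen for illustration, not a scenario taken from the cited work.

```python
from typing import Dict, Iterable, Tuple


def stg_detect(
    events: Iterable[str],
    arcs: Dict[Tuple[str, str], str],
    start: str,
    end: str,
) -> bool:
    """Move a single token along the arcs; report an attack once it reaches `end`."""
    state = start  # current position of the token
    for event in events:
        # Logs that match no arc from the current state are simply skipped.
        state = arcs.get((state, event), state)
        if state == end:
            return True
    return False


# Hypothetical attack scenario: scan, then login, then malware download.
ARCS = {
    ("idle", "scan"): "scanned",
    ("scanned", "login"): "logged_in",
    ("logged_in", "download"): "infected",
}
```

The sketch also shows the limitation noted above: an event sequence that does not follow a preconfigured scenario never moves the token to the end state, so an unknown attack goes undetected.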

IV. BEHAVIOR-BASED HUMAN-OBJECT RELATIONS
This section explains in detail the meaning of the proposed human-object relations and how to extract them from observed behavior. First, an overview of the proposed method is presented, and the following subsections then describe these relations in sequence.

A. STRUCTURE OF THE PROCESS FOR PROPOSED INTENTION INFERENCE
The objective of this study is to infer with which object a target individual is likely to interact next. Humans always make inferences regarding others' intentions by observing their behavior. Hence, to endow smart objects with human-like intentional cognition, this study attempts to extract various discriminative features from human behavior that can be used to infer whether a person has the intention to interact with a certain object. As mentioned in Section II, this intention inference problem is converted into a classification problem for each object: whether a user has the intention to interact or no intention to interact with that object. To this end, five relations are extracted from the user's behavioral data and subjected to classification. Based on these human-object relations, a supervised method is used to generate an inference model through training on a set of labeled behavioral data, as shown in Fig. 2. The human-object relations defined in this paper are as follows:
• Relative Distance ($d_k^t$): The Euclidean distance between the user and a given object $o_k$ at the instant of time $t$.
• Relative Angle ($\theta_k^t$): The angle between the user's body orientation and the line connecting the user and the object $o_k$ at the instant $t$.
• Movement Speed ($s^t$): The rate of change in the user's position at the instant $t$.
• Approach Efficiency ($ae_k^t$): How directly the user has been approaching the object $o_k$ up through the instant $t$.
• Movement Efficiency ($me_k^t$): How efficiently the user has been moving toward the object $o_k$ up through the instant $t$.
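The supervised step can be sketched as a linear SVM trained on five-dimensional relation vectors. This is only a minimal illustration under stated assumptions: the training data are invented toy values, the features are assumed to be standardized (so no bias term is used), and a plain hinge-loss subgradient update stands in for whatever SVM solver and kernel the authors used, which the text does not specify.

```python
from typing import List, Sequence


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    lam: float = 0.01,
    epochs: int = 50,
) -> List[float]:
    """Hinge-loss subgradient descent with L2 regularization; labels are +1 / -1."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            margin = y_i * sum(w_j * x_j for w_j, x_j in zip(w, x_i))
            w = [(1.0 - lr * lam) * w_j for w_j in w]  # regularization shrink
            if margin < 1.0:  # sample violates the margin: move w toward it
                w = [w_j + lr * y_i * x_j for w_j, x_j in zip(w, x_i)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (intention to interact) or 0 (no intention)."""
    return 1 if sum(w_j * x_j for w_j, x_j in zip(w, x)) >= 0.0 else 0


# Toy standardized vectors (distance, angle, speed, approach eff., movement eff.).
X_train = [
    (-1.0, -0.8, -0.5, 0.9, 0.7),
    (-0.6, -1.0, -0.9, 0.5, 1.0),
    (-0.9, -0.4, -0.7, 1.0, 0.6),
    (0.8, 0.9, 0.6, -0.7, -1.0),
    (1.0, 0.5, 1.0, -0.9, -0.4),
    (0.5, 1.0, 0.3, -1.0, -0.8),
]
y_train = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X_train, y_train)
```

In this toy data, a user who is close, facing the object, moving slowly, and approaching efficiently is labeled as intending to interact; a separate model of this kind would be queried for each object.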

B. HUMAN-OBJECT RELATIONS
This subsection explains, in turn, why these relations were selected and how they are extracted. At each instant of time $t$, the user's behavior is recorded, including his location $(x_u^t, y_u^t)$, his velocity $v_u^t$ and his body orientation $\theta_u^t$. For each target object $o_k$, whose position $(x_k^t, y_k^t)$ is known, the following five relations are extracted and applied to infer whether this user will interact with this object in the next instant.

1) RELATIVE DISTANCE
Proxemics is the study of the implicit spatial relationships between individuals, with an emphasis on the vital function of distance [19]. Subsequently, various scholars have attempted to extend this concept to the field of human-computer interaction (HCI), with some success [10]. To a certain extent, the problem of human-object interaction is also an object selection problem, which is affected by the distance from the object of interaction [20]. Based on the above discussion, it can be concluded that the distance from the object is meaningful for inferring interaction intentions.
In this work, the relative distance ($d_k^t$) refers to the spatial distance between a target individual and a given object (Fig. 3(a)). It is quantified in terms of the Euclidean distance between the coordinates of the person and the object (as shown in (4)).
This relation is particularly significant if the target needs to interact with the object through physical touch, because he or she must first approach it and then begin to manipulate it.
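As a minimal sketch of the relative distance, the Euclidean distance between the user's coordinates and the object's coordinates can be computed as follows. The function name `relative_distance` is chosen here for illustration and is not from the paper.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_distance(user_xy: Point, object_xy: Point) -> float:
    """Euclidean distance between the user's position and the object's position."""
    return math.hypot(user_xy[0] - object_xy[0], user_xy[1] - object_xy[1])
```

For example, a user at (0, 0) and an object at (3, 4) are 5 units apart.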

2) RELATIVE ANGLE
Human attention is always a valuable cue for HCI, in which the gaze orientation is widely treated as a meaningful index. However, most eye gaze tracking systems rely on equipment-assisted or vision-based analysis, which still lacks practicality [15], [16]. To capture a person's interaction intentions in a natural way, one can instead consider that person's body orientation, which describes behavior in a directional sense and is closely related to the gaze orientation [21].
The relative angle ($\theta_k^t$) is measured as the included angle between the body orientation vector ($\vec{dv}^t$) and the object vector ($\vec{ov}_k^t$) (as shown in Fig. 3(b)), where the body orientation vector is the unit vector in the direction of $\theta_u^t$ (expressed in (5)) and the object vector is the vector pointing from the user's position to the object's position (expressed in (6)).
Therefore, the relative angle is the angle between $\vec{dv}^t$ and $\vec{ov}_k^t$, which is found using (7). Because the human field of view is limited to a certain degree range, visual sensitivity improves as the relative angle decreases. Because humans are good at directing their attention toward objects in which they are interested and ignoring those with which they are unconcerned, if a user desires to interact with a certain object, the relative angle with that object is likely to be smaller than the relative angle with an object with which he has no intention to interact.
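The relative angle can be sketched from the two vectors described above: a unit body orientation vector and the vector from the user to the object. The function name and the choice to return 0 when the user stands exactly on the object are assumptions made for this illustration.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def relative_angle(user_xy: Point, theta_u: float, object_xy: Point) -> float:
    """Angle in radians, in [0, pi], between body orientation and the object vector."""
    dv = (math.cos(theta_u), math.sin(theta_u))  # unit body orientation vector
    ov = (object_xy[0] - user_xy[0], object_xy[1] - user_xy[1])  # object vector
    norm = math.hypot(ov[0], ov[1])
    if norm == 0.0:
        return 0.0  # assumption: angle is undefined at zero distance, use 0
    cos_angle = (dv[0] * ov[0] + dv[1] * ov[1]) / norm
    # Clamp against floating-point drift before taking the arccosine.
    return math.acos(max(-1.0, min(1.0, cos_angle)))
```

A user facing along the positive x-axis has a relative angle of 0 to an object straight ahead and a relative angle of pi/2 to an object directly to the side.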

3) MOVEMENT SPEED
Movement speed is a scalar variable that describes how rapidly the target user is moving. Many researchers have studied movement speed while considering personal characteristics such as gender and age. In this work, the efficacy with which movement speed reflects the intention to interact is emphasized. In a previous experiment, it was found that individuals decrease their speed to focus on interacting with something [22]. Thus, when a target user is within the interaction area of an object, it is more likely to be interacted with if the user is either walking slowly or stopped, whereas it is likely to be ignored if the user is moving quickly. Therefore, the movement speed is regarded as an indicator of the existence of an intention to interact with a given object.
Here, the movement speed ($s^t$) is measured as the average speed over the most recent sampling period (Fig. 3(c)) using the movement distance, as shown in (8).
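A minimal sketch of the movement speed, assuming it is estimated from two consecutive tracked positions and a sampling period `dt` (both the function name and the parameter `dt` are assumptions; the exact time window used by the authors is not recoverable from the text):

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def movement_speed(prev_xy: Point, curr_xy: Point, dt: float) -> float:
    """Average speed over one sampling period: movement distance divided by dt."""
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    return math.hypot(curr_xy[0] - prev_xy[0], curr_xy[1] - prev_xy[1]) / dt
```

A stationary user yields a speed of 0, which, inside an object's interaction area, is the case the text associates with a likely interaction.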

4) APPROACH EFFICIENCY
A profile of human interaction behavior, from the starting position to the target object, includes target selection, path planning and collision avoidance [23]. After choosing an interaction target, a person selects the most effective route to that target based on energy minimization and the shortest possible path [24]. The user will approach the object that is the selected interaction target within as little time as possible. By contrast, for an object that is not the selected target, the user may ignore it or even avoid it as he approaches his desired object. Hence, the directness with which an object is approached can be treated as a cue for inferring the intention to interact. The approach efficiency ($ae_k^t$) is proposed as a measure of the effort devoted to approaching a given object. More specifically, it reflects the number of times up through the current moment that a target person has attempted to approach that object. To identify whether the user is moving toward the object or away from it, the momentary velocity, including both speed and direction, is resolved into two mutually perpendicular velocities, which are called the approach velocity ($\vec{v}_a^t$) and the off-approach velocity ($\vec{v}_o^t$). The momentary velocity ($\vec{v}^t$) is expressed as in (9):
$$\vec{v}^t = \vec{v}_a^t + \vec{v}_o^t \tag{9}$$
At each moment, the shortest path connecting the positions of the user and the object is regarded as the desired approach path; this path is identical to the object vector ($\vec{ov}_k^t$) mentioned above. The approach velocity ($\vec{v}_a^t$) is the projection of the velocity onto the desired path, which indicates how directly the user is approaching the object at the current instant of time. The off-approach velocity ($\vec{v}_o^t$) is the projection of the velocity onto the direction perpendicular to the desired path, which represents how strongly the user is avoiding this object at the current instant. Obviously, $\vec{v}_a^t$ and $\vec{v}_o^t$ are orthogonal, meaning that neither affects the other. To obtain these velocities, the angle between the velocity and the desired path is necessary, which is denoted by $\theta_v^t$ and is calculated as shown in (10).
$$\theta_v^t = \arccos\left(\frac{\vec{v}^t \cdot \vec{ov}_k^t}{s^t \, d_k^t}\right) \tag{10}$$
where $s^t \neq 0$ and $d_k^t \neq 0$. With this angle, it is easy to obtain the approach velocity $\vec{v}_a^t$ and the off-approach velocity $\vec{v}_o^t$ using (11) and (12), respectively (as in Fig. 3(d)).
At the time instant t, whether the user is approaching the object is determined using (13), in which the approach intention is regarded as dominant when the approach velocity is greater than the off-approach velocity.
Because the approach efficiency $ae_k^t$ is based on how many times this person has attempted to approach this object up through the time instant $t$, it is computed using (14) as follows.
The value of this quantity is expected to be high for an individual who intends to interact with the given object because more effort will be devoted to approaching the object than to moving away from it.
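The decomposition and the resulting ratio can be sketched as follows. Several choices here are assumptions made for illustration rather than details confirmed by the text: the velocity is estimated from consecutive positions with a unit time step, the object vector is taken from the position at the start of each step, and steps with zero speed or zero distance are counted as non-approaching observations.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def approach_flags(positions: Sequence[Point], object_xy: Point) -> List[int]:
    """One flag per step: 1 if the approach component exceeds the off-approach one."""
    flags = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        vx, vy = x1 - x0, y1 - y0  # velocity over a unit time step
        ox, oy = object_xy[0] - x0, object_xy[1] - y0  # object vector
        speed, dist = math.hypot(vx, vy), math.hypot(ox, oy)
        if speed == 0.0 or dist == 0.0:
            flags.append(0)  # decomposition undefined: count as not approaching
            continue
        v_approach = (vx * ox + vy * oy) / dist  # signed projection, s * cos(theta_v)
        v_off = abs(vx * oy - vy * ox) / dist  # perpendicular part, s * |sin(theta_v)|
        flags.append(1 if v_approach > v_off else 0)
    return flags


def approach_efficiency(positions: Sequence[Point], object_xy: Point) -> float:
    """Fraction of observed steps in which the user was approaching the object."""
    flags = approach_flags(positions, object_xy)
    return sum(flags) / len(flags) if flags else 0.0
```

For a user who walks two steps straight toward an object and then one step sideways, two of three steps count as approaching, giving an approach efficiency of 2/3.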

5) MOVEMENT EFFICIENCY
Deviation occurs during motion following a planned path because of the influence of dynamic constraints, such as social forces. Even so, a person continues to select the path that minimizes the relevant energy criterion at every instant [25]. Therefore, if a user has the intention to move to or interact with something, an efficient path is more likely to be taken than a winding one. Accordingly, a user's efficiency in traveling toward a given object can be measured in terms of the ratio of the user's relative displacement toward the given object to the total travel distance of the user. A sketch diagram of a travel path and the corresponding relative displacement is presented in Fig. 3(e).
The displacement is generally defined as the direct, straight distance from the starting point to the ending point of a travel path. Here, however, the relative displacement ($rd_k^t$) is defined differently, by considering the change in the relative distance with respect to a given object from the initial instant of time to the instant $t$, that is,
$$rd_k^t = d_k^0 - d_k^t \tag{15}$$
where $rd_k^t \leq 0$ indicates that there is no effective relative displacement with respect to the given object thus far. The true travel path is the real path taken by the person being tracked, the total length of which is called the travel distance and is denoted by $td^t$. This $td^t$ is calculated as the travel distance accumulated at every instant, as shown in (16).
Using the above definitions of the travel distance and relative displacement, the proposed movement efficiency ($me_k^t$) is calculated as shown in (17).
This movement efficiency is expected to be larger for individuals who have the intention to interact with the object of interest because such a person is more likely to travel toward the position of the target by adjusting his or her travel path to achieve relatively low time consumption and cost, rather than wandering through the available space.
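A minimal sketch of the movement efficiency, computed as the relative displacement toward the object divided by the accumulated travel distance. Returning 0 when the relative displacement is not positive, or when the user has not moved, is an assumption made here; the text only states that a non-positive value means there is no effective relative displacement.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def movement_efficiency(positions: Sequence[Point], object_xy: Point) -> float:
    """Relative displacement toward the object divided by total travel distance."""
    d_start = math.dist(positions[0], object_xy)
    d_now = math.dist(positions[-1], object_xy)
    relative_displacement = d_start - d_now
    travel_distance = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    if travel_distance == 0.0 or relative_displacement <= 0.0:
        return 0.0  # assumption: no effective displacement maps to zero efficiency
    return relative_displacement / travel_distance
```

A straight walk to the object gives an efficiency of 1, while an L-shaped detour covering 7 units to close a 5-unit gap gives 5/7.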

V. SECURITY CONTROL FOR PROPOSED SYSTEM
The problem with using security services alone is that there are limitations in systematically operating and managing various professional security solutions. The core of information protection is economics and efficiency of management, and after security products are applied, they must be systematically managed according to the company-wide security policy. Centralized integrated management is required for systematic management and analysis of integrated logs. Security control has been proposed to improve the quality of security services through the unification of specialized management while maintaining the complexity of upgrading defense systems [26].
In the IoT service environment, security devices such as firewall, antivirus, and intrusion detection/prevention system are placed between the external Internet network and the internal network, which are the same as the existing network service environment. IoT devices directly connected to the network collect logs directly, and devices connected through an IoT gateway collect logs from the IoT gateway, which is different from the existing security event integration method. Since most IoT devices have limited resources, they are controlled through the IoT gateway. Fig. 4 shows a process of configuring an event ensemble performed by a log collection and clustering engine for security control including an IoT device.
The event ensemble in Fig. 4 is defined in Section III. The system is installed on the internal network to solve the security problem of the device. This is the process of storing and archiving security information in the clustering engine and integrated security control. It is a hierarchical clustering model in which each monitor, master, slave, and agent initially form an ensemble of events. The agent collects raw logs and sends them to the slave.
The slave constructs an ensemble of events and sends the clustering engine's abbreviated events to the master. The master re-clusters the events sent by the slaves and sends the suspect network to the monitor. When the IoT device is added in step ① , the registration information is transmitted to the agent-slave-master through the IoT gateway.
In step ②, the clustering algorithm determines whether the collected log is to be clustered by configuring it as an event ensemble. In step ③, the event is broadcast to all network devices. In step ④, the events are logged and clustered in the clustering engines of the slaves. In step ⑤, the master updates and clusters all event ensembles. The master's clustering engine detects suspicious networks and updates the security policy rules for the device in question.

VI. EVENT ENSEMBLE MODEL AND CLUSTERING
A. EVENT ENSEMBLE MODEL
An event is defined as a human-object interaction. The event log is constructed by normalizing the collected raw logs into a timestamp, source device, destination device, and event ID. Table 2 shows an example of normalizing the raw log of a scanning attack involving an internal IP camera. Because intention is generated in the human brain, several researchers have attempted to access the human brain directly and thereby assist individuals in expressing their intentions [5].
We define an event $e$ (see (18)) to create the event ensemble model. An event $e$ consists of a source $s$ (a starting ip:port), a destination $d$ (a target ip:port), and a weight $r$. The weight $r \in [0, 1]$ indicates the possibility of an attack and is initially set to 0 by default; in a network with many botnet threats, $r$ is initialized close to 1, and in a less likely zone it is initialized to 0. An event word $w$ is a list of events $e$, as shown in (19), in which the order of the events is disregarded. For example, $e_1 e_2 e_3$, $e_1 e_3 e_2$, and $e_3 e_1 e_2$ are treated as the same event word. The $k$-event set $E^k$ is the set of all combinations of words whose event word size is $k$, as shown in (19). For example, if $E = \{e_1, e_2, e_3\}$, the 2-event set is $E^2 = \{e_1 e_2, e_1 e_3, e_2 e_3, e_2 e_1, e_3 e_1, e_2 e_2\}$. $E^+$ is an event ensemble (EE), as shown in (20). The closure of events, $E^+$, is the set of words that can be constructed from one or more events and the maximum possible word density. For example, in Fig. 1, if there are a reporter server, a loader server, a download server, a controller server, and four IoT devices, the $k$ value is 8 or higher.
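The event model can be illustrated with a minimal Python sketch. It assumes that an order-independent event word can be represented as a frozenset, which means words containing a repeated event are not modeled. The names `Event`, `event_word`, `k_event_set`, and `event_closure`, as well as the addresses, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """One event e = (s, d, r)."""
    s: str          # source "ip:port"
    d: str          # destination "ip:port"
    r: float = 0.0  # attack possibility in [0, 1], 0 by default

    def __post_init__(self):
        if not 0.0 <= self.r <= 1.0:
            raise ValueError("r must lie in [0, 1]")


def event_word(*events):
    """An order-independent event word (assumption: modeled as a frozenset)."""
    return frozenset(events)


def k_event_set(events, k):
    """All order-independent words of size k over the given events."""
    return {frozenset(c) for c in combinations(events, k)}


def event_closure(events, max_k=None):
    """Union of the k-event sets for k = 1..max_k, a finite stand-in for E+."""
    events = list(events)
    max_k = len(events) if max_k is None else max_k
    closure = set()
    for k in range(1, max_k + 1):
        closure |= k_event_set(events, k)
    return closure


# Hypothetical addresses, for illustration only.
e1 = Event("10.0.0.5:4321", "10.0.0.9:23")
e2 = Event("10.0.0.9:23", "10.0.0.7:80")
e3 = Event("10.0.0.7:80", "10.0.0.1:443", r=0.9)

same_word = event_word(e1, e2, e3) == event_word(e3, e1, e2)
two_events = k_event_set([e1, e2, e3], 2)
closure = event_closure([e1, e2, e3])
```

Under this representation, three events yield three distinct 2-event words and seven words in the closure, because permutations of the same events collapse into a single word.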

B. CORRELATION MATRIX
When events are collected online or offline in the event ensemble model, the event source is identified using the topological information of the event. Topology information provides the object names of node instances and event sources, and the hierarchy of nodes in the system. Events are reordered using refinement operations and time-series data and are stored in the object's queue in FIFO fashion. If an event is the first one generated by an entity (such as an IoT device or network device), a new queue is created and the event is placed first. Events are numbered based on the object's ID (that is, the unique number or unique name of the device) and labeled as uncorrelated, meaning that the event has not yet been identified as part of a pattern. Queue length is a system-dependent parameter; considering the general network size, the queue length for each device is set to less than 20. As more events arrive, older events are pushed out of the queue.
Whenever new events arrive in the object's queue, the uncorrelated events are used to form a vector called the working vector. A similarity measure is then computed between the working vector and every vector (pattern) of the object class template. The pattern with the highest similarity to the working vector is added to the correlation analysis matrix. All events included in that pattern are marked as "correlated" and removed from the queue, and the local correlation results, i.e., composite events, are constructed. Event vectors with a similarity below the threshold are ignored.
Fig. 5 is used as an example to illustrate the above. First, a unique value is assigned to each pattern: p1 = 1, p2 = 2, and p3 = 3. Second, a unique value is likewise assigned to each event: e1 = 1, e2 = 2, e3 = 3, e4 = 4, e5 = 5, e6 = 6, e7 = 7, and e8 = 8. Next, the fields unique to each pattern are configured; such a field can be a short description of the pattern for the operator's use. Finally, each field is separated by a delimiter (e.g., square brackets []). The botnet attack scenario in Fig. 1 can be patterned in this way.
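As a rough illustration of the queueing and template-matching steps of this subsection (and not the pattern encoding of the Fig. 1 scenario), the following Python sketch keeps one bounded FIFO queue per device and matches its uncorrelated events against pattern templates. The similarity measure is not specified in the text, so cosine similarity over binary event-indicator vectors is assumed here. The queue length, the threshold, the template contents, and the device name are likewise hypothetical, and the correlation analysis matrix itself is not modeled.

```python
from collections import deque
from math import sqrt

QUEUE_LEN = 16     # hypothetical; the text only requires fewer than 20
THRESHOLD = 0.8    # hypothetical similarity threshold
N_EVENT_TYPES = 8  # events e1..e8

# Hypothetical templates: pattern id -> ids of the events that make it up.
TEMPLATES = {1: {1, 2, 3}, 2: {4, 5}, 3: {6, 7, 8}}


def to_vector(event_ids):
    """Binary indicator vector over the event types e1..e8."""
    return [1.0 if i in event_ids else 0.0 for i in range(1, N_EVENT_TYPES + 1)]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def correlate(queue):
    """Match the uncorrelated events in a queue against the templates.

    Returns (pattern_id, similarity) for the best match at or above
    THRESHOLD and removes the matched events from the queue; returns
    None when no template is similar enough.
    """
    working = to_vector(set(queue))
    best_id, best_sim = None, 0.0
    for pid, members in TEMPLATES.items():
        sim = cosine(working, to_vector(members))
        if sim > best_sim:
            best_id, best_sim = pid, sim
    if best_id is None or best_sim < THRESHOLD:
        return None
    matched = TEMPLATES[best_id]
    remaining = [e for e in queue if e not in matched]
    queue.clear()
    queue.extend(remaining)
    return best_id, best_sim


queues = {}  # device id -> bounded FIFO queue of uncorrelated event ids


def on_event(device_id, event_id):
    """Enqueue one event for a device and try to correlate its queue."""
    q = queues.setdefault(device_id, deque(maxlen=QUEUE_LEN))
    q.append(event_id)  # the oldest event is dropped once the queue is full
    return correlate(q)


results = [on_event("ipcam-01", e) for e in (1, 4, 2, 3)]
```

In this run the first three events stay below the threshold; once e3 arrives, pattern p1 becomes the best match, its events leave the queue, and the unrelated event e4 remains uncorrelated.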

C. EVENT CLUSTERING ALGORITHM
The event clustering algorithm clusters networks in which a collaborative attack, such as a botnet attack, occurs, in order to detect intruded networks in the event ensemble k-EE. To cluster the network, the average order (degree) of adjacent events is obtained and reflected in the probability of intrusion. Equation (21) expresses the average order of adjacent events over the N neighbor nodes of any event i. Here, $k_i$ and $k_j$ are the k values of kEN including events i and j, respectively, and A is the adjacency matrix representing the network. $P_{aned}\,k_j$ indicates the degree of indirect activation of related devices in the event ensemble. The average degree of the first neighbors of events of degree k is evaluated using Algorithm 1 (EECA), shown in Fig. 6. EECA iteratively updates the intrusion probability P(k) and correlates events within a finite time span. The handling process for an event remains active until a predefined value is reached. When an event is considered obsolete, it is removed from the queue; this behavior can be modeled as a step function. Events are also assigned weights based on how long they have been in the queue: an event arriving in the queue has a weight of 1, the weight decreases over time, and when it reaches zero the event is removed from the queue.
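The quantities used by the clustering step can be sketched in Python under several assumptions. The average degree of adjacent events is assumed to follow the standard average nearest-neighbor degree, k_nn(i) = (1/k_i) Σ_j A_ij k_j, and the neighbor-based update is assumed to be a plain mean of the neighbors' weights r. The decay schedule is not specified in the text, so a linear decay with a hypothetical step is used. None of the function names below come from EECA itself.

```python
def degrees(adj):
    """Degree k_i of every node in an undirected 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


def average_neighbor_degree(adj):
    """k_nn(i) = (1 / k_i) * sum_j A[i][j] * k_j; 0.0 for isolated nodes."""
    k = degrees(adj)
    out = []
    for i, row in enumerate(adj):
        total = sum(a * k[j] for j, a in enumerate(row))
        out.append(total / k[i] if k[i] else 0.0)
    return out


def neighbor_mean(adj, r):
    """Mean weight r over each node's neighbors (own value if isolated)."""
    out = []
    for i, row in enumerate(adj):
        nbrs = [r[j] for j, a in enumerate(row) if a]
        out.append(sum(nbrs) / len(nbrs) if nbrs else r[i])
    return out


def decay_weights(queue, step=0.25):
    """Lower each (event, weight) pair by `step` and drop expired events."""
    aged = [(e, w - step) for e, w in queue]
    return [(e, w) for e, w in aged if w > 0.0]


# Toy star network: node 0 is connected to nodes 1, 2, and 3.
A = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
r = [0.8, 0.0, 0.0, 0.4]  # hypothetical attack weights

k_nn = average_neighbor_degree(A)
r_next = neighbor_mean(A, r)
aged_queue = decay_weights([("e1", 1.0), ("e2", 0.5), ("e3", 0.25)])
```

In the toy network the hub has low-degree neighbors while each leaf sees the hub's high degree, and one decay step removes the event whose weight has reached zero.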

VII. IMPLEMENTATION RESULT AND EVALUATION
A. DATASET
A real dataset was used to evaluate the performance of the proposed method. This trajectory dataset, collected by a tracking system that uses various IoT sensors, contains data from devices that intend to interact with other devices and from devices that do not. Specifically, the dataset includes tracking data from a total of 130 participants, 63 of whom interacted with the robot and 67 of whom did not, as shown in Table 3. For each participant, motion was recorded for the 10 seconds before the moment when he or she either reached the position nearest to the robot or began to interact with it. During these 10 seconds, each instance of motion was captured and recorded as one row containing the corresponding time instant, personal ID, entity type (person or robot), positions of the person and robot, velocity, motion angle, face angle, and so on.

1) SUPPORT VECTOR MACHINE
Among candidate methods such as Decision Tree, Naïve Bayes, and SVM, we applied the SVM approach. The data used in this study have at least five features (Relative Distance, Relative Angle, Movement Speed, Approach Efficiency, and Movement Efficiency). Therefore, as shown in Table 4, a decision tree cannot be constructed for all cases, and because the features of the data are not independent of one another, the Decision Tree and Naïve Bayes methods cannot be applied.
The SVM approach is a supervised learning model for classification analysis, which is widely believed to be efficient and effective in many fields of study [27]. The fundamental SVM concept is based on the selection of the hyperplane in the feature space that maximizes the margin between the positive and negative samples. This hyperplane is the boundary between the different classes. For this study, we applied 2-class SVM classification, which was implemented using the SVM library named LibSVM [27].
The experimental dataset was formatted such that for each subject, it contained the five extracted relations and their associated labels (1: interaction intention, 0: no interaction intention) regarding whether the subject interacted with the robot. The radial basis function (RBF) kernel was used in the SVM classifier, and the penalty parameter C and the RBF kernel parameters were determined via grid search and cross-validation.
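The training setup described above can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM. This is an illustrative sketch and not the authors' pipeline: the data are synthetic stand-ins for the five relations, and the candidate values for C and gamma, the number of folds, and the random seeds are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the five relations; NOT the data used in the paper.
n_per_class = 60
X_pos = rng.normal(loc=1.0, scale=0.5, size=(n_per_class, 5))   # label 1
X_neg = rng.normal(loc=-1.0, scale=0.5, size=(n_per_class, 5))  # label 0
X = np.vstack([X_pos, X_neg])
y = np.array([1] * n_per_class + [0] * n_per_class)

# Hypothetical grid over the penalty parameter C and the RBF width gamma.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}

# 5-fold stratified cross-validation scores every (C, gamma) pair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_       # the selected C and gamma
best_cv_accuracy = search.best_score_   # mean accuracy across the folds
```

The grid search fits one model per parameter pair and fold, then refits the best pair on all the data, so `search` can be used directly for prediction afterwards.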

2) EVALUATION RESULTS
In this study, two types of datasets were used to validate the model. One is a dataset of human interaction behaviors, and the other is a dataset of abnormal behaviors that occur as a result. Human interaction is evaluated simply by measuring the inference accuracy, as shown in equation (22). Security evaluation of abnormal behavior, by contrast, must also account for false positives and false negatives, so F-measure values were evaluated as shown in equation (23).
The effectiveness of the proposed human-object relations was verified using the dataset and SVM classifier described above. Two experiments were conducted. In the first experiment, the effectiveness of the proposed relations was evaluated with respect to the true intentions recorded in the dataset. In the second experiment, the performance of the proposed relations was compared with the performances achieved in other, similar studies [28]. The results of these evaluations are presented and discussed in this subsection.
The purpose of these evaluation experiments was to verify the effectiveness of each model for classifying interaction intentions. To this end, the inference accuracy was calculated, which is defined as follows:
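Assuming the conventional definition, in which accuracy is the fraction of subjects whose intention is inferred correctly, (22) can be written as follows, where TN (true negatives) is a symbol introduced here for illustration:

```latex
\begin{equation}
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}}
                = \frac{TP + TN}{TP + TN + FP + FN} \tag{22}
\end{equation}
```

Under this assumed definition and with 130 participants, an accuracy of 97.7% would correspond to 127 correct inferences (127/130 ≈ 0.977); this count is inferred from the arithmetic and is not a figure stated in the text.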

3) EFFECTIVENESS OF HUMAN-OBJECT RELATIONS
The purpose of this experiment was to evaluate the effectiveness of the proposed relations for identifying the intention to interact. First, each relation was separately used to train the classification model; the resulting inference accuracies are reported in Table 5.
The evaluation results show that the best inference performance, with an accuracy of 85.4%, was achieved using the relative distance. The relative angle and movement speed yielded accuracies of 82.3% and 84.6%, respectively. By contrast, the inference accuracies achieved using the approach efficiency and movement efficiency were less promising. According to the results presented in Table 5, the relative distance, relative angle, and movement speed can be regarded as more beneficial than the remaining relations for intention inference. Hence, only these three relations were first extracted and applied to identify users' interaction intentions, resulting in an inference accuracy of up to 94.6%. When we subsequently included the approach efficiency and movement efficiency in the analysis, the inference accuracy increased from 94.6% to 97.7%. This finding indicates that although the approach efficiency and movement efficiency are less effective when used separately, they help increase the overall inference accuracy when combined with the other relations.

B. PERFORMANCE COMPARISON WITH OTHER MODELS
In this experiment, we applied the five proposed relations to infer the intention to interact, and the relations proposed in [27] and [28] were also tested with the SVM classifier to evaluate their performance. All models were trained on data corresponding to both an "intention to interact" and some "other distinctive intention". The resulting inference accuracies are reported in Table 6.
The method suggested in this paper is based on a nonlinear SVM, the LibLinear approach is based on a linear SVM, and the ensemble-based approach is a hybrid model. Since the features of the human-object relations have nonlinear characteristics, the remaining two models were selected as comparison groups. Compared with [27] and [28], the proposed model improves accuracy by 5.2% and 7%, respectively. In the experiment, the optimized SVM model achieved the highest accuracy on the human-object relation dataset. Table 6 shows that our method surpasses other similar studies in terms of inference accuracy, implying that our human-object relations are more effective in distinguishing whether a target user has the intention to interact with a given object.

C. SECURITY EVALUATION
To verify the system, 300 normal behaviors and 30 abnormal behaviors were used as the dataset. The normal behavior dataset was created by sampling from the Bot-IoT dataset of the ACCS (Australian Center for Cyber Security) [29]. The abnormal behavior dataset was created by combining device names sampled from the ACCS dataset with log messages from the new IoTroop botnet attack scenario. Fig. 5 visualizes the result of clustering the event ensemble of the attack scenario and predicting the intrusion path in the security control system. Fig. 7(a) is an example of the algorithm that forms an event ensemble and clusters the logs generated by security equipment and devices. Fig. 7(b) is an example of visualizing a vulnerable path by connecting clusters with a high attack probability.
The accuracy of clustering is evaluated by the F-measure, which is computed from precision (P) and recall (R) as defined in (23). The F-measure is an indicator for evaluating the accuracy of classification [10]. The ratio F and the false positive rate 1 − F of detecting the proposed botnet attack were determined.
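Equation (23) is assumed here to take the standard form, in which the F-measure is the harmonic mean of precision and recall:

```latex
\begin{equation}
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R} \tag{23}
\end{equation}
```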
where TP denotes true positives, FP false positives, and FN false negatives. Table 7 shows the accuracy and F-measure obtained with each classification algorithm on a dataset sampled at 5% from the large-scale Bot-IoT dataset of the ACCS [29]. The Decision Tree method has the lowest results because a complete decision tree could not be constructed for the botnet attack. On the botnet dataset, the SVM classification method shows higher accuracy and F-measure than the Naïve Bayes classification method. Table 8 shows the results of repeated experiments in which the degree of diversity of each dataset was adjusted to normal factor values when sampling normal and abnormal flows. The average detection rate is almost the same when the logarithmic variability is normal, from 0 to 10. When the regularization parameter (λ) is 1, the average accuracy is around 80%, so the intrusion flow is effectively detected.
The semantic validity of the clustering is verified using modularity. The event ensemble was clustered with cluster_edge_betweenness(kEE) from R's igraph library, and the modularity value (0.8359884) was measured using the modularity function. Since the modularity value is 0.8 or more, the clustering is meaningful.
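To make the modularity score concrete, the following Python sketch computes Newman's modularity for a given partition of an undirected, unweighted graph, using Q = Σ_c [m_c/m − (d_c/2m)²], where m_c is the number of edges inside community c and d_c is the sum of its node degrees. It is an illustration on a toy graph, not a reimplementation of the igraph functions used above, and the graph and community labels are hypothetical.

```python
def modularity(edges, membership):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    membership: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # community -> number of edges inside it
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_cluster = {node: "a" for node in range(6)}

q_split = modularity(toy_edges, two_clusters)
q_single = modularity(toy_edges, one_cluster)
```

Splitting the toy graph into its two triangles gives Q = 5/14 ≈ 0.357, whereas placing every node in a single cluster gives Q = 0; values closer to 1 indicate that the clusters are denser internally than a random graph with the same degrees would predict.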

VIII. CONCLUSION
This paper presents a behavior-based intention inference approach for identifying the object with which a user is likely to interact. For the security of the service, we defined a model that converts logs generated by network/security devices and gateways into event ensembles, and presented a clustering algorithm using order centering.
To do this, the problem is first transformed into a set of binary classification problems for each potential interacting object. Then, five human-object relations, named Relative Distance, Relative Angle, Movement Speed, Approach Efficiency and Movement Efficiency, are extracted from the tracked user trajectories, and classified using the SVM approach. Experimental evaluation shows that the proposed approach can be used effectively to discriminate whether a user intends to interact with a particular object.
The security control framework organizes an ensemble of events from neighboring device logs in order to reduce them to suspicious logs. In addition, the proposed algorithm can respond to a new attack type by updating the intrusion degree with the average intrusion value of neighboring nodes over a given time. The observed F-measure and modularity values confirmed that the clustering and detection results were meaningful.
The proposed approach can bridge the intelligence gap between devices and can also be extended for use in user-adaptive advertising, robot-initiated human-robot interaction, and intent recognition services. In addition, this study makes it possible to create a safer environment in Industrial Internet of Things (IIoT) networks, where users, objects, machines, and data are interconnected.
However, this approach is subject to certain limitations: it is built on the assumptions of a fully known environment and of voluntary intentions free from the influence of other individuals. In future studies, relationships between objects will be considered to improve the prediction of interaction intent. In addition, the classification algorithm needs to be improved so that it can be applied to large-scale datasets. In this study, we used the collected datasets provided by the ACCS [29], but in the future we plan to construct a test bed that collects datasets from the real world. Improved methods for selecting relevant patterns are also needed in future studies to reduce the size of large datasets.