Large-Screen Interactive Technology With 3D Sensor Based on Clustering Filtering Method and Unscented Kalman Filter

In this paper, a large-screen interactive 3D sensor technology with a clustering filtering method and an unscented Kalman filter is addressed. The display surface and the interactive area of the screen are mapped together using the image processing and machine vision processing of the 3D sensor. Then, predictable and accurate multi-point interaction can be obtained, and operations such as zooming, panning, and selecting and moving image components can be achieved. Vision methods, infrared induction, dynamic capture, image processing and other mixed technologies are utilized simultaneously to recognize human gestures and simulate mouse events, which gives participants a fully interactive experience. Experimental results show that this large-screen interactive technology with 3D sensor can achieve accurate multi-point interaction and gesture control, and has a good human-machine interaction effect.


I. INTRODUCTION
Interactive technology has been widely used in our life. Touch-screen technology is regarded as a node on the way toward ideal human-computer interaction. With Apple's iPhone as the representative product, touch-screen technology has brought us a new and simple-to-use human interface, along with a fantastic control experience. Scientists predict that in the near future, as long as human beings provide input through words, gestures, facial expressions and so on, the machine will know what to do; this is the ideal state of human-computer interaction [1]-[12]. Analyzing hand gestures is a comprehensive task involving motion modeling, motion analysis, pattern recognition, machine learning, and even psycholinguistic studies [5]. As in the movie ''Avatar'', as soon as a human presses the start button, a three-dimensional holographic user interface can be presented before our eyes.
When a random button is pressed, the program opens; when we swipe a gesture, the data moves according to our wish. Capturing and recognizing gestures, expressions and vocal inputs led to the evolution of human-computer interfaces, giving rise to the development of so-called natural user interfaces. These recent developments are investigated by Lamberti et al. in [7], where a generic framework for the seamless integration of multi-modal natural interaction capabilities is illustrated. As we know, 3D sensing sensors are often employed to capture images and video to realize human-computer interaction for intelligent systems [10], [13]-[15], [24]. 3D sensing devices are used as a tool for inferring the positions, poses and gestures of human hands. The information from the sensors can be translated into suitable commands for the control of virtually every kind of digital system. One of the main features of sensing-device-based human-computer interaction systems is the ability to track human gestures and poses, notably with the 3D sensing sensor Kinect [13]-[15]. Nowadays, new types of human-computer interaction based on Kinect have focused on improving the interactive feeling and experience. (The associate editor coordinating the review of this manuscript and approving it for publication was You Yang.)
The research direction of this paper is an important recent field of human-computer interaction, in accordance with the tendency of the screen interactive field to develop toward large-screen interactive applications. In this paper, a large-screen interactive technology with a clustering filtering method and an unscented Kalman filter is proposed.

II. SYSTEM DESIGN
The architecture of the large-screen interactive system is designed in this paper (see Fig.1), which consists of hardware and software components. The hardware mainly includes the 3D sensing sensor (such as Kinect), industrial machines, power supplies, electronic dogs, the mosaic panel and other components. The software includes the interface software, the client software and other software. The working principle of the interactive system is as follows: through the image processing and machine vision processing of the 3D sensor, the flat-screen display and the interactive coverage area are mapped together, so that the display surface and the interactive area of the screen correspond. Then, predictable and accurate multi-point interaction can be obtained. After gesture classification, we simulate the mouse and keyboard events and then show the interactive images with the projector. The proposed interactive system has a good human-machine interaction effect, as shown in Fig.2.

The design and implementation of this large-screen interactive system has four key steps. Firstly, we need to create the interactive area. Then the touch event is confirmed by the particle analysis algorithm, which determines which points are true ''touches'' and which are interference. After gesture classification, the display surface and the interactive area of the screen are mapped together to simulate mouse and keyboard events. Some operations can then be achieved, such as zooming, panning, and selecting and moving image components, so that a good human-machine interaction effect is obtained. The specific implementation steps of the method are as follows:

Step 1 (Create the Interactive Area): Firstly, create an interactive-field interactive platform through 3D sensors (e.g., Kinect).
The Kinect installation location needs to ensure that the Kinect depth camera covers the entire range of the interactive area. It is also necessary to ensure that the installation will not be moved or interfered with: even slight vibration will cause interference in the depth values and reduce the accuracy of interaction. If a wall is used as the interactive surface, the above rules still apply, but the installation location of the Kinect must be confirmed very carefully, since the Kinect's scanning field of view facing the interactive plane cannot be totally obscured by a standing user. In short, the best arrangement for the interactive wall is one in which the entire interaction region is covered.
Step 2 (Determine the Touch Event): The development of the interactive project is based on Microsoft's boilerplate code SkeletalViewer. The source code is written in C++ on the .NET Framework 4.0. The depth data obtained from the Kinect at each frame are stored in the depth buffer. Then, according to the Kinect mounting position, we find the rough interactive surface [14]. As each subsequent frame is obtained, its depth data are compared with the pre-obtained interactive surface in order to confirm whether there is a ''touch'' event. The particle analysis algorithm can identify which points are true ''touches'' and which are interference [15]. Finally, we set up a filter, judging the points of these frames one by one, so that they represent a gesture which acts on behalf of a mouse or keyboard event.
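The frame-comparison idea of Step 2 can be sketched as follows. The authors' implementation is C++ on SkeletalViewer; this numpy sketch, its function name and its tolerance value are illustrative assumptions, not the paper's code:

```python
import numpy as np

def touch_candidates(depth_frame, surface_depth, tol=15):
    """Return a boolean mask of pixels that sit closer to the sensor
    than the pre-captured interactive surface by more than `tol` mm,
    i.e. something (e.g. a fingertip) is in front of the surface."""
    # Depth value 0 marks an invalid Kinect reading; ignore those pixels.
    valid = depth_frame > 0
    return valid & (surface_depth - depth_frame > tol)

# Illustrative example: a 6x6 "surface" 1000 mm away, with a
# fingertip-sized region pressed 30 mm in front of it.
surface = np.full((6, 6), 1000, dtype=np.int32)
frame = surface.copy()
frame[2:4, 2:4] = 970          # hand 30 mm in front of the wall
frame[0, 0] = 0                # one invalid reading

mask = touch_candidates(frame, surface)
print(mask.sum())              # 4 candidate "touch" pixels
```

The particle analysis step would then group these candidate pixels into blobs and reject blobs that are too small or too large to be fingertips.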
Step 3 (Touch Mode-Mouse Control): The system mission mode is the normal operating mode, which is used to capture and track touch gestures in real time and translate them into the corresponding mouse and keyboard events of the Windows 7 system. We define an upper depth limit Dmax: all depth values exceeding this upper limit are deemed not to fall within the touching range. We also define a lower depth limit Dmin: all depth values below this lower limit are considered meaningless touch values. The narrow region between Dmin and Dmax is designed as the effective touch area. Left-click and right-click correspond to the mouse buttons, and the right-click event is based on feedback from a time-related gesture.
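The Dmin/Dmax depth band of Step 3 amounts to a simple mask over the depth frame. A minimal sketch follows; the numeric thresholds are assumptions, since the paper does not give their values:

```python
import numpy as np

# Illustrative thresholds in millimetres; the paper defines Dmin and
# Dmax but does not state numeric values, so these are assumptions.
D_MIN, D_MAX = 950, 990

def in_touch_band(depth_frame):
    """Keep only depth values inside the narrow [D_MIN, D_MAX] band:
    values above D_MAX lie outside the touching range, and values
    below D_MIN are treated as meaningless touch values."""
    return (depth_frame >= D_MIN) & (depth_frame <= D_MAX)

frame = np.array([[940, 960],
                  [975, 1000]])
band = in_touch_band(frame)
print(band)
```

Only the two pixels inside the band survive; everything nearer than Dmin or farther than Dmax is discarded before particle analysis.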
Step 4 (Touch Gesture-Control Mode): In the system idle mode, the gesture filter function and the classification parameters need to be initialized and ready to accept different particle information. When the number of particles is larger than 1, the frame can be confirmed as an effective gesture. In addition, the entire analog I/O event is reset (the mouse button is released). We use a time counter to avoid a false gesture being reset to case 0 (a noise frame): if the time counter value is over three, we consider that the previous gesture has ended. Every gesture (including moving gestures, zoom gestures, hand gestures, etc.) is saved in the record of the previous gesture.
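The time-counter logic of Step 4 can be sketched as a small state machine. The class name, the per-frame feeding interface and the exact reset behavior are illustrative assumptions; only the "tolerate up to three noise frames" rule comes from the text:

```python
class GestureFilter:
    """Minimal sketch of the idle-mode gesture filter: a gesture stays
    active while particle blobs keep arriving, and up to NOISE_LIMIT
    consecutive empty ("noise") frames are tolerated before the gesture
    is considered finished and the state is reset (case 0)."""

    NOISE_LIMIT = 3  # noise frames tolerated, per the paper's counter

    def __init__(self):
        self.active = False
        self.noise_frames = 0

    def feed(self, particle_count):
        if particle_count >= 1:          # effective gesture frame (assumption)
            self.active = True
            self.noise_frames = 0
        elif self.active:                # empty frame during a gesture
            self.noise_frames += 1
            if self.noise_frames > self.NOISE_LIMIT:
                self.active = False      # gesture ended; release buttons
                self.noise_frames = 0
        return self.active

f = GestureFilter()
states = [f.feed(n) for n in [2, 0, 0, 0, 1, 0, 0, 0, 0]]
print(states)
```

Note how the three empty frames in the middle do not end the gesture, while the four trailing empty frames do.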

III. FILTERING METHOD
The traditional data filtering method is based on the arithmetic weighted averaging method. The detailed steps are as follows. Firstly, filter out the distance values that first enter the operating area by removing the first P points of entry (P = 2). Secondly, take the (P+1)-th point as the first point of the weighted calculation, compare the remaining points, and remove the maximum and the minimum among them. Thirdly, perform an arithmetic weighted average Y over the remaining N-P-2 data. This filtering method easily eliminates deviations of the sampling values caused by pulse interference, and has the advantages of convenient calculation, fast speed and small storage requirements. In the algorithm, X(k) represents the real-time data value of sensor acquisition and N represents the total number of data collected; generally N is greater than or equal to 8.

In view of the real-time and accuracy requirements of 3D sensor tracking and localization, the adaptive threshold nearest neighbor clustering method is adopted in this paper. Points within a continuous point collection are relatively close to each other, while the distance across a discontinuity is large, indicating two different point sets. According to the sequential scanning and spatial correlation characteristics of the 3D sensor data, a simplified nearest neighbor clustering method is used [16]-[19]. If the distance between two adjacent points is less than the threshold value, the two points belong to the same point set; if it is larger than the threshold value, they belong to different point sets. In that case, the earlier point becomes the termination point of the last point set, and the next point becomes the starting point of a new point set. The nearest neighbor clustering method is fast and effective. The specific procedure is as follows:
1) Clustering initialization: i = 1, k = 1, where C_k denotes the k-th point set (cluster).
2) Let j = i and i = i + 1. The coordinates of the two adjacent points are p_j = (x_j, y_j) and p_i = (x_i, y_i), where x = l·cos λ and y = l·sin λ. The distance between the two adjacent points is R = sqrt((x_i - x_j)^2 + (y_i - y_j)^2).
3) If R < δ_i, then p_i ∈ C_k; turn to step 2). If R ≥ δ_i, then p_i does not belong to C_k: let k = k + 1, take p_i as the starting point of the new set C_k, and turn to step 2).
4) At the end of one frame of 3D sensor data, turn to step 1) and start the next frame.
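The sequential clustering procedure above can be sketched in a few lines. This is a minimal Python sketch, with a fixed threshold standing in for the paper's adaptive threshold δ_i:

```python
import math

def nn_cluster(points, threshold):
    """Sequential nearest-neighbor clustering of one frame of scan
    points (as Cartesian (x, y) tuples): if two adjacent points are
    closer than `threshold` they share a point set, otherwise the
    current point starts a new point set."""
    clusters = [[points[0]]]
    for prev, cur in zip(points, points[1:]):
        r = math.dist(prev, cur)      # distance between adjacent points
        if r < threshold:
            clusters[-1].append(cur)  # same point set C_k
        else:
            clusters.append([cur])    # cur starts a new point set
    return clusters

# Illustrative scan: three close points, then a jump to two more.
scan = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
groups = nn_cluster(scan, threshold=1.0)
print(len(groups))                 # 2 point sets
print([len(g) for g in groups])    # [3, 2]
```

An adaptive version would replace the fixed `threshold` with a per-point δ_i, e.g. one that grows with the measured range l.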
The threshold selection is the key of the whole algorithm; it not only directly affects the clustering effect, but also affects the accuracy of the subsequent tracking and positioning. If the selected threshold is too small, the sensor data will be fragmented; although the judging precision will be higher, the real-time performance will suffer [18]. On the contrary, if the threshold is too large, it is very difficult to detect small obstacles, and the possibility of missing data will be increased [18]-[21].
The unscented Kalman filter (UKF) abandons the linearization of the nonlinear function and instead works within the linear Kalman filter framework [20]-[25]. For the one-step prediction equation, the unscented transformation is used to propagate the mean and covariance through the nonlinear transformation. The UKF can achieve great performance in bearings-only target tracking; therefore, it is used in this paper to realize further data filtering. The specific steps are as follows:
1) Use formulas (3) and (4) to obtain a set of sample (sigma) points, whose weights are calculated with Equation (5), where λ is the scaling parameter.
2) The one-step prediction of the 2n + 1 sigma point set is obtained with Equation (6).
3) The one-step prediction and the covariance matrix of the state are calculated with Equations (7) and (8).
4) According to the one-step prediction value, the unscented transformation is used to generate a new sigma point set.
5) The sigma point set generated in step 4) is inserted into the observation equation to obtain the predicted observation, as shown in Equation (9).
6) The predicted mean and covariance of the observation are calculated from the predicted observations of the sigma point set obtained in step 5) by weighted summation, as shown in Equations (10)-(12).
7) The Kalman gain matrix is obtained with Equation (13).
8) The state update and the covariance update of the system are calculated with Equations (14) and (15).
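The eight UKF steps can be sketched as one predict/update cycle. This is a minimal numpy sketch: the sigma-point scheme uses only the scaling parameter λ, and the toy identity process/observation models, noise values and measurements at the end are illustrative assumptions, not the paper's tracking system:

```python
import numpy as np

def sigma_points(x, P, lam):
    """Generate 2n+1 sigma points and their weights for state mean x
    and covariance P (lam is the scaling parameter)."""
    n = len(x)
    S = np.linalg.cholesky((n + lam) * P)   # matrix square root of (n+lam)P
    pts = [x] + [x + S[:, i] for i in range(n)] + [x - S[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w[0] = lam / (n + lam)
    return np.array(pts), w

def ukf_step(x, P, z, f, h, Q, R, lam=1.0):
    """One UKF predict/update cycle, following the numbered steps above."""
    # Steps 1-3: propagate sigma points through the process model f.
    X, w = sigma_points(x, P, lam)
    Xp = np.array([f(s) for s in X])
    x_pred = w @ Xp
    P_pred = Q + sum(wi * np.outer(d, d) for wi, d in zip(w, Xp - x_pred))
    # Steps 4-6: regenerate sigma points, push through observation model h.
    X2, w = sigma_points(x_pred, P_pred, lam)
    Zp = np.array([h(s) for s in X2])
    z_pred = w @ Zp
    P_zz = R + sum(wi * np.outer(d, d) for wi, d in zip(w, Zp - z_pred))
    P_xz = sum(wi * np.outer(dx, dz)
               for wi, dx, dz in zip(w, X2 - x_pred, Zp - z_pred))
    # Steps 7-8: Kalman gain, then state and covariance update.
    K = P_xz @ np.linalg.inv(P_zz)
    x_new = x_pred + K @ (z - z_pred)
    P_new = P_pred - K @ P_zz @ K.T
    return x_new, P_new

# Toy example (assumption): static scalar state, direct noisy measurements.
f = lambda s: s                    # identity process model
h = lambda s: s                    # identity observation model
x, P = np.array([0.0]), np.eye(1)
for z in [1.0, 1.1, 0.9, 1.0]:
    x, P = ukf_step(x, P, np.array([z]), f, h,
                    Q=0.01 * np.eye(1), R=0.1 * np.eye(1))
print(round(float(x[0]), 2))       # estimate converges toward ~1.0
```

With identity models the cycle reduces to a linear Kalman filter, which makes it easy to check; nonlinear f and h (e.g. a range measurement) plug in without any other change.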
In this paper, the adaptive threshold nearest neighbor clustering algorithm effectively removes the noise, runs fast, and improves the interaction effect. The man-machine interaction system is used to analyze the continuous and effective distance values. The coordinate point calculated by the digital filtering method gives the coordinates of the operating hand, and the man-machine interactive control is then realized. At the same time, the effective coordinate points of the scanning area can be counted. When the operator leaves the scanning range, the mean value coordinates are calculated; at the end of this set of scanned data, the coordinate parameters are reset.

IV. EXPERIMENT RESULTS
This paper has developed a set of interactive media solutions that demonstrates the advantages of the interactive equipment and realizes precise interaction and gesture control. Combining high-precision optical induction with photoelectric analog signals, it can be used in any environment to achieve a variety of interactive scenes, and strives to become a leading mode of information dissemination in the future. In this section, the installation position of the 3D sensor Kinect in the experimental system needs to ensure that the Kinect depth camera covers the entire interactive area and that it will not be moved or interfered with after installation. Based on the large-screen interactive system and the 3D sensor filtering method described above, this section conducts two experiments, namely the single-point filtering experiment and the multi-point filtering experiment, comparing the proposed filtering method with the traditional filtering method.

A. SINGLE POINT FILTERING EXPERIMENT
In Fig.3, the red solid square represents the sensor, and the small red rectangular frame represents the effective operation area; only points inside it can be effectively operated. The experiments are performed on the same object at the same position, and the only change is the filtering algorithm used. Fig.3 (a) is the original image, Fig.3 (b) is the single-point filtering effect of the traditional filtering method, and Fig.3 (c) is the single-point filtering effect of the proposed filtering method. From Fig.3, we can see that before single-point filtering there are some isolated noise points; the noise at the single point is large, and the accuracy is not good. After filtering with the traditional filtering method, the isolated noise points are basically filtered out, but the location is still not very accurate. Employing the integrated filter proposed in this paper, the single point is more accurate and the jitter noise is very small. The results show that the presented clustering algorithm is effective.

B. MULTI-POINT FILTERING EXPERIMENT
Also, in Fig.4, the red solid square represents the sensor, and the small red rectangular box represents the effective operating area; only points inside the red rectangle are effective. The experiment uses the same object operated at the same position, with only the filtering algorithm changed. Fig.4 (a) is the original image, Fig.4 (b) is the multi-point filtering effect of the traditional filtering method, and Fig.4 (c) shows the effect of multi-point filtering by the proposed filtering method.
From Fig.4, before multi-point filtering there is a lot of noise and the jitter noise is very large. After filtering by the traditional filtering method, the isolated noise points around the multiple points can be eliminated and the denoising effect is improved. By using the proposed method, we can remove the noise from the interaction region, retain the useful signal, and improve the signal-to-noise ratio, so that the multiple points can be located precisely. Fig.5 shows the interactive picture of a single point in the actual situation, and Fig.6 is the actual effect diagram of the large screen with multiple points, analyzed by the method mentioned above. It supports gesture control and reaches a relatively ideal interactive effect in terms of accuracy, stability, anti-disturbance and other aspects. In summary, a large number of experiments demonstrate that the proposed method is effective for the large-screen human-computer interaction system based on the 3D sensor; it can filter out the noise points and achieves a good multi-point interaction effect.

V. CONCLUSION
This paper has presented a large-screen interactive technology system based on a clustering filtering method and an unscented Kalman filter, using the 3D sensing sensor as a man-machine interactive sensor. Through the image processing and machine vision processing of the 3D sensor, as well as the mapping of the display plane and the interaction range to the screen, predictable and accurate multi-point interaction can be achieved. Operations such as zooming, panning, and selecting and moving image components can also be realized, while design collocation, stability and versatility are ensured. The large-screen interactive technology system based on the clustering filtering method and the unscented Kalman filter provides a natural interactive effect, an engaging human-computer interaction surface and plentiful interactive contents. The proposed interactive system offers considerable economic benefits and application significance, and can be widely used in entertainment games, digital signage, air control and other fields.
LEI YU was born in Xuancheng, China, in 1983. He received the Ph.D. degree from the College of Automation, Southeast University, Nanjing, China. He is currently a Professor with the School of Mechanical and Electric Engineering, Soochow University, Suzhou, China. His main research interests include human-computer interaction, filter systems, and robust control.