Robot Formation Control Based on Internet of Things Technology Platform

The cooperative control technology of robot formation can sense all kinds of external environments in real time. It is a multi-functional control and management system that includes visual recognition, task management, execution and distribution, behavior decision-making, and so on, and it can easily adapt to all kinds of harsh environments. To meet the efficient response requirements of robot formation control, a real-time transmission system for cooperative robot motion control is built on an Internet of Things platform, which collects and feeds back the trajectories of multiple robots. A deep learning algorithm optimized by particle swarm optimization then identifies, predicts, and guides each robot's next action more accurately. Finally, a simulation of robot formation motion is built in MATLAB, which verifies the feasibility of the particle-swarm-optimized deep learning neural network algorithm under Internet of Things technology. Compared with the traditional robot formation control method, the optimized control method converges faster, has smaller error, and positions the robots more accurately, which provides methodological guidance for improving the accuracy and efficiency of robot formation control technology.


I. INTRODUCTION
Originally, robots were mainly used to replace humans in dangerous, complex, or completely unknown working environments. A complete robot has a complex system structure and can complete tasks independently by responding to a changing environment in real time. When faced with relatively complex tasks and environments, however, a single robot cannot meet the work requirements through its own behavior alone, and collaborative work among multiple robots has become a focus of researchers in various fields [1]. Research on robot formation began in the 1970s. With the continuous development of robotics, communication technology, and automatic control technology, robot formation collaboration has gradually been applied in the military field, traffic control, and space exploration, including border patrol, reconnaissance and rescue, and resource detection.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
Regarding robot formation control methods, the T. Fukuda team at Nagoya University in Japan, inspired by multicellular organisms, developed a distributed system in which each robot can communicate and move autonomously, and the group reconfigures itself dynamically when facing complex tasks and environments [2]. In 1985, researchers at Stanford University first proposed the virtual potential field method, which uses attractive forces and the resultant of the forces acting on each robot to drive multiple robots to their targets along a prescribed route [3]. With the development of navigation and automatic recognition technology, as well as research on behavior-based methods, multi-robot cooperative control has begun to be applied in complex environments [4]. At present, the main robot formation control methods are the artificial potential field method, the virtual structure method, the behavior-based method, and the leader-follower method, and the choice of control method is closely tied to the application environment [5]. In future development, the environmental adaptability of the formation control system, matching hardware technology, and high sensitivity will be the basis for handling complex and tedious tasks. With the application of Internet of Things technology, efficient and rapid information acquisition and transmission play an important role in the collaborative control of robots. Arivazhagan A [6] proposed using artificial intelligence and Internet of Things technology to help robots complete different tasks better while effectively reducing costs.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Plauska and Damasevicius [7] found that a collaborative learning robot based on Internet of Things technology gives students a highly motivating learning environment and improves their learning efficiency. An efficient feedback learning mechanism is key to a robot formation control system, because it directly affects task completion and accuracy. Yu and Rus [8] applied discrete, approximately optimal methods to multi-robot path planning to shorten task completion time and improve information redundancy. Guo et al. [9] proposed an improved genetic algorithm that adds targeted mutation, converts multi-objective optimization into single-objective optimization, and finds a suitable scale factor for robot path planning. In addition, combining deep learning with robotics makes it possible to design intelligent robots with high work efficiency, strong real-time performance, and high accuracy, which are widely used in indoor and outdoor scene recognition, industrial and home service robots, and multi-robot collaboration [10]. Algorithm optimization based on deep learning can effectively improve accuracy and reduce the error rate of the system. Therefore, to better match and control robot formations, this study proposes a robot formation control technology based on the Internet of Things platform from the perspective of information transmission and data prediction. The first part summarizes current robot formation control technology. The second part describes the robot software and hardware system on the Internet of Things platform, which combines the robot control board, radio frequency positioning, and monitoring software to transmit the robot's position and motion information. The third part presents a deep learning model based on Particle Swarm Optimization that uses formation motion information to plan and predict robot trajectories.
The fourth part is the simulation process of robot formation control under the Internet of Things platform to test the accuracy of the system model. Through the data transmission of the Internet of Things platform and the position prediction of the deep learning model, the robot's motion parameters and trajectory are accurately planned, which effectively improves the cooperation ability and work efficiency of the robot formation. It lays a foundation for the research of robot formation control.

II. DESIGN OF ROBOT SOFTWARE AND HARDWARE SYSTEM IN THE INTERNET OF THINGS PLATFORM
A formation of multiple mobile robots requires several subsystems: a robot control board that controls the movement of each single robot; an ultra-wideband radio frequency positioning system, consisting of base stations and positioning tags, that locates the robots; and PC monitoring software that collects the trajectories of the mobile robots and issues instructions to them [11].

A. MULTI-ROBOT FORMATION CONTROL METHOD
The first problem of multi-robot formation control is forming the formation, that is, how to establish a reasonable and stable formation in a complex and unknown environment. In a multi-robot formation system, a reasonable, well-chosen formation plays a vital role in completing the task: it improves both the accuracy and the efficiency with which the system completes the task. For different tasks, the robot system selects a suitable formation according to the requirements, and during the journey the whole system continuously adjusts the formation in response to changes in the environment and in its own conditions, so as to better realize the benefits of formation.
The main control methods of multi-robot formation are shown in Figure 1. They include the leader-follower method [12], [13], the behavior-based method [14], [15], the artificial potential field method [16], [17], the virtual structure method [18], [19], the graph theory method [20], and so on. Among them, the leader-follower, behavior-based, and artificial potential field methods are suitable for distributed control, while the virtual structure and graph theory methods are suitable for centralized control.
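As a rough illustration of the leader-follower idea, the sketch below computes each follower's desired slot from the leader's pose and a fixed offset expressed in the leader's frame, then moves a follower one proportional step toward that slot. The names `Pose`, `follower_targets`, and `step_towards`, the body-frame offset convention, and the gain value are assumptions for illustration, not the controller used in this paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    theta: float  # heading angle in radians


def follower_targets(leader, offsets):
    """Desired (x, y) of each follower.

    Each offset (dx, dy) is given in the leader's body frame
    (dx forward, dy to the left) and rotated into the world frame.
    """
    c, s = math.cos(leader.theta), math.sin(leader.theta)
    return [(leader.x + c * dx - s * dy, leader.y + s * dx + c * dy)
            for dx, dy in offsets]


def step_towards(position, target, gain=0.5):
    """One proportional step of a follower towards its slot (gain is illustrative)."""
    return (position[0] + gain * (target[0] - position[0]),
            position[1] + gain * (target[1] - position[1]))


# A triangle: two followers one metre behind the leader, one metre to each side
offsets = [(-1.0, 1.0), (-1.0, -1.0)]
slots = follower_targets(Pose(2.0, 0.0, math.pi / 2), offsets)
```

With the leader at (2, 0) heading along +y, the two slots come out at approximately (1, −1) and (3, −1), that is, behind the leader on its left and right. Because the offsets are rotated with the leader's heading, the formation shape is preserved as the leader turns.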

B. THE INTERNET OF THINGS PLATFORM ENVIRONMENT SETUP
Before designing the software and hardware system of a mobile robot, its operating environment must first be analyzed and built. To simulate the Internet of Things platform environment, a wireless communication network is established first, mainly including an Ultra-Wideband (UWB) wireless network, a WiFi wireless network, a ZigBee wireless network, a GSM network, and so on. At the same time, wireless sensor networks are built, mainly including the UWB positioning system network and temperature and smoke sensor networks [21]. The structural block diagram of the base station based on the Internet of Things environment is shown in Figure 2.
In the Internet of Things environment built in this article, the WiFi, ZigBee, and GSM networks are mainly responsible for communication between mobile robots and consoles, mobile robots and operators, consoles and sensor networks, and mobile robots and sensor networks. Specifically, the Ultra-Wideband positioning module uses WiFi, Bluetooth, or 5G for data communication; mobile robots and the console communicate through the ZigBee network, as do the console and the sensor network; control personnel and the console can communicate via WiFi, ZigBee, and 5G; and mobile robots and sensor networks communicate via ZigBee [22]. The sensor network mainly collects environmental temperature, smoke, and other information, and communicates with mobile robots and consoles through the ZigBee wireless network. The mobile robot can move autonomously or be controlled from the console, and it performs inspection and monitoring tasks.

C. THE SPECIFIC DESIGN OF THE HARDWARE SYSTEM

1) CONTROLLER
As the core of the mobile robot hardware system, the controller must communicate with the sensor system to collect the status information of the mobile robot, process this information, and run the corresponding positioning algorithms to obtain the robot's position, attitude angle, speed, angular velocity, and so on. It also acts as the motion controller, running the motion control algorithm of the mobile robot and providing control signals to the motor drive module to drive the motors. The controller therefore needs excellent computing performance and rich interfaces. This paper chooses the TMS320F28335 digital signal processor as the controller of the mobile robot [23].

2) POWER MODULE DESIGN
The mobile robot hardware system includes the controller, multiple sensors, motors, motor drive circuits, and other parts. Each part has different power requirements, so the system must provide several power outputs with different voltage and current values. Classified by voltage, three supplies are needed: 12 V, 5.0 V, and 3.3 V. The main power source of the mobile robot is a lead-acid battery with a rated voltage of 12 V and a rated output current of 2.5 A, which supplies the input of the voltage stabilization circuit and the driving power of the motor drive module. The voltage stabilization circuit provides a 5.0 V output that powers the Digital Signal Processor (DSP) controller and sensors such as the electronic compass and the photoelectric encoder disc, and a 3.3 V output that powers the ZigBee module, the inertial measurement unit (IMU), and other modules [24].
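The battery ratings above give a simple power budget: 12 V × 2.5 A = 30 W available from the battery. The sketch below checks whether a set of loads fits that budget. The motor power, the per-rail load currents, and the regulator efficiency in the example are hypothetical placeholders, not measurements from this system.

```python
BATTERY_VOLTAGE_V = 12.0  # rated battery voltage (from the text)
BATTERY_CURRENT_A = 2.5   # rated output current (from the text)


def battery_power_w(voltage=BATTERY_VOLTAGE_V, current=BATTERY_CURRENT_A):
    """Maximum continuous power the battery can deliver, P = V * I."""
    return voltage * current


def battery_draw_w(motor_power_w, rails):
    """Total power drawn from the battery.

    motor_power_w: power taken directly from the 12 V bus by the motor drive.
    rails: list of (rail_voltage_V, load_current_A, regulator_efficiency);
           each regulated rail draws its output power divided by its efficiency.
    """
    regulated = sum(v * i / eff for v, i, eff in rails)
    return motor_power_w + regulated


def within_budget(motor_power_w, rails):
    """True if the total draw does not exceed the battery's rated power."""
    return battery_draw_w(motor_power_w, rails) <= battery_power_w()


# Hypothetical loads: 20 W of motor power, 1.0 A on the 5.0 V rail and
# 0.5 A on the 3.3 V rail, both regulators assumed 80 % efficient.
rails = [(5.0, 1.0, 0.8), (3.3, 0.5, 0.8)]
print(battery_power_w())            # 30.0
print(within_budget(20.0, rails))   # True (about 28.3 W drawn)
```

Dividing each rail's output power by its efficiency accounts for regulator losses; with a linear regulator the efficiency would be roughly the ratio of output to input voltage, so the assumed 80 % only makes sense for a switching regulator.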

3) SENSOR SYSTEM DESIGN
The sensor system is the main component of the mobile robot hardware system, and it is the main way to obtain environmental information and its own information. The information required by a mobile robot includes its own pose, heading angle, linear velocity, angular velocity, acceleration, and obstacles in the environment, which can be divided into pose information and speed information.
To obtain pose information, the UWB wireless positioning system provides the position of the mobile robot, the electronic compass provides its heading angle, and the laser radar scans the environment, from which relative pose and environmental information are obtained with the Iterative Closest Point (ICP) algorithm. To obtain speed information, the photoelectric encoder measures the rotation speed of the driving wheels, and the IMU measures the angular velocity and acceleration of the mobile robot [25].
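The ICP step mentioned above can be sketched in 2D: each iteration matches every point of the new scan to its nearest neighbour in the reference scan, then solves for the rigid rotation and translation with the SVD-based (Kabsch) least-squares fit. This is a minimal sketch assuming both scans are already N×2 arrays of Cartesian points; it uses brute-force matching and no outlier rejection, and it is not the authors' implementation.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~= dst[i] (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                     # reject a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, max_iter=30, tol=1e-10):
    """Align scan `src` to reference scan `dst`; returns the accumulated R, t."""
    R_total, t_total = np.eye(2), np.zeros(2)
    cur = src.astype(float).copy()
    prev_err = np.inf
    for _ in range(max_iter):
        # Brute-force nearest neighbour in dst for every current point
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        # Compose with the transform found so far: x -> R (R_total x + t_total) + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = np.linalg.norm(cur - matched, axis=1).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total


# Usage: a 5x5 grid "scan" rotated by 3 degrees and shifted by (0.1, 0.05)
theta = np.deg2rad(3.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
t_true = np.array([0.1, 0.05])
scan = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
reference = scan @ R_true.T + t_true
R_est, t_est = icp(scan, reference)
```

In this example the displacement is small enough that the first nearest-neighbour matching is already correct, so the recovered rotation and translation match the true ones. With real lidar scans, a good initial guess (for example from the compass heading and UWB position) matters, because ICP only converges locally.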

4) WIRELESS COMMUNICATION MODULE DESIGN
The communication problem in the entire mobile robot hardware system includes the communication problem between the mobile robot and the console in addition to the communication between the sensor and the controller. As a bridge between the mobile robot and the user, the console needs to obtain the status information of the mobile robot and other task information in real time. Therefore, it is necessary to ensure timely and smooth communication between the mobile robot and the console. ZigBee technology is a two-way wireless communication method, a short-range and low-rate communication form, which has the advantages of simple structure, low cost, and low power consumption. It is mainly used for short-distance data transmission where the data rate is not high. Considering the working characteristics of mobile robots, ZigBee wireless module is used here as the communication method between the mobile robot and the console [26].

5) MOBILE ROBOT CONTROL PROGRAM DESIGN
From the above analysis, the controller of the mobile robot must process status information such as position, attitude, and speed measured by each sensor; it must also process instructions sent by the console and send the robot's status information back to the console. At the same time, it must solve the control algorithm that drives the motors so that the mobile robot tracks its trajectory [27]. A well-designed control program is needed to handle all of this; otherwise the mobile robot cannot complete its assigned tasks. Based on the specific tasks of the mobile robot and the characteristics of each part, the design flow of the mobile robot control program is shown in Figure 3.
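The trajectory-tracking step can be illustrated with a standard unicycle tracking law, here a Kanayama-style controller. This is a swapped-in example, because the paper does not specify its control law. The sketch computes the tracking error in the robot frame, the commanded linear and angular velocity, and the corresponding left and right wheel speeds for an assumed differential-drive base. The gains, wheel radius, and track width are hypothetical values.

```python
import math


def tracking_control(pose, ref_pose, v_ref, w_ref, kx=1.0, ky=4.0, kth=2.0):
    """Kanayama-style tracking law for a unicycle-model robot.

    pose, ref_pose: (x, y, theta) of the robot and of the current reference point.
    v_ref, w_ref:   reference linear (m/s) and angular (rad/s) velocity.
    kx, ky, kth:    positive gains (illustrative values).
    Returns the commanded (v, w).
    """
    x, y, th = pose
    xr, yr, thr = ref_pose
    dx, dy = xr - x, yr - y
    # Position error rotated into the robot's body frame
    ex = math.cos(th) * dx + math.sin(th) * dy
    ey = -math.sin(th) * dx + math.cos(th) * dy
    # Heading error wrapped to (-pi, pi]
    eth = math.atan2(math.sin(thr - th), math.cos(thr - th))
    v = v_ref * math.cos(eth) + kx * ex
    w = w_ref + v_ref * (ky * ey + kth * math.sin(eth))
    return v, w


def wheel_speeds(v, w, wheel_radius=0.05, track_width=0.3):
    """Convert (v, w) into left/right wheel angular speeds (rad/s) for a differential drive."""
    left = (v - w * track_width / 2) / wheel_radius
    right = (v + w * track_width / 2) / wheel_radius
    return left, right


# Robot 1 m behind its reference point, same heading
v_cmd, w_cmd = tracking_control((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), v_ref=0.5, w_ref=0.1)
left, right = wheel_speeds(v_cmd, w_cmd)
```

When the robot sits exactly on the reference, the law reduces to the feed-forward terms (v_ref, w_ref); a positive along-track error raises the commanded speed so the robot catches up. On the TMS320F28335 the resulting wheel speeds would be handed to the motor drive module.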

D. ROBOT FORMATION CONTROL BASED ON INTERNET OF THINGS TECHNOLOGY PLATFORM
Traditional machine learning methods are no longer suitable for multi-robot scenarios. In such scenarios, each robot follows a different strategy, so from the perspective of any one robot the other robots keep changing and the environment is non-stationary; judged by the robot's own strategy alone, the environment appears incomprehensible and unpredictable. Under this influence some algorithms have difficulty converging, and as the number of robots increases, the variance of policy gradient algorithms also increases [28]-[30]. Some researchers have instead used model-based methods that predict the next state of the environment by modeling the environment and the robots, using learned strategies from deep learning, or relying on certain assumptions and on communication among the robots. However, these approaches are not applicable to, or are too expensive for, many scenarios.
This study uses distributed deep learning, which treats other agents only as dynamic obstacles, so that each agent is concerned only with its own state. In group (multi-agent) reinforcement learning, the joint state and joint action are fed to the evaluation network during training, whereas only the agent's own observations and state are needed at test time. In this way, a more reasonable cooperation strategy can be trained and no communication is needed during execution, which better meets expectations. On the Internet of Things platform, deep learning is used to solve the cooperative navigation and control problem of multiple robots. The specific process is shown in Figure 4.
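The difference between training and execution inputs described above (centralized training with decentralized execution, similar in spirit to MADDPG-style methods) can be shown by how the inputs are assembled. In this sketch the evaluation (critic) network input concatenates every robot's state and action, while each robot's policy sees only its own observation. The linear `tanh` policy, the helper names, and all dimensions are hypothetical placeholders.

```python
import numpy as np


def critic_input(states, actions):
    """Training: the evaluation network sees the joint state and joint action."""
    return np.concatenate([np.ravel(s) for s in states] +
                          [np.ravel(a) for a in actions])


def actor_input(observations, agent_id):
    """Execution: robot `agent_id` uses only its own observation."""
    return np.ravel(observations[agent_id])


def act_decentralized(observations, policies):
    """Each robot chooses its action independently; no messages are exchanged."""
    return [policy(actor_input(observations, i)) for i, policy in enumerate(policies)]


def linear_policy(W):
    """Hypothetical stand-in for a trained policy network."""
    return lambda obs: np.tanh(W @ obs)


# Hypothetical setup: 3 robots, 4-D observations, 2-D actions (v, w)
rng = np.random.default_rng(0)
policies = [linear_policy(rng.normal(size=(2, 4))) for _ in range(3)]
obs = [rng.normal(size=4) for _ in range(3)]
acts = act_decentralized(obs, policies)
print(critic_input(obs, acts).shape)   # (18,)
```

Only `critic_input` grows with the number of robots (here 3 × 4 state values plus 3 × 2 action values); the policy input stays fixed, which is why execution needs no communication.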

III. DEEP LEARNING MODEL OF ROBOT FORMATION MOTION BASED ON PARTICLE SWARM OPTIMIZATION
Using the Internet of Things platform, the position and motion information of the robot formation is transmitted to the formation controller of the system. Given the position information, the motion model makes predictions and provides feedback based on the current information, and the system assigns the results to each robot in the formation while keeping the error rate and interference rate as low as possible. The objective is achieved by planning routes reasonably. When choosing the operating model, the data structure must be visualizable and classifiable, and the internal attributes of and relations between the data must be determined. In deep learning, multi-layer networks apply deep nonlinear feature transformations that transfer sample features between different spaces, which is an important learning characteristic. The Internet of Things platform supplies these information features to the deep learning mechanism in the motion model, and key information and data prediction provide more accurate trajectory planning for the robot formation.

A. PRINCIPLE AND IMPROVEMENT OF DEEP LEARNING
Deep learning transfers information by simulating the multi-level neural network of the human brain and using the divergent connections between neurons. After multi-level processing, the hidden characteristics of the data are fully revealed. Compared with the traditional neural network, which has only one hidden layer, the model is multi-level, handles large sample sizes, and has strong computing power. A deep neural network has more hierarchical layers, so it has a stronger ability to model or abstract things and can represent more complex models. Through a compact nonlinear mapping, a deep neural network can represent a large class of functions [31], [32]. The deep neural network model structure is shown in Figure 5.
As Figure 5 shows, the deep neural network model consists of an input layer, hidden layers, and an output layer. The input layer is an m-dimensional column vector, $X = [x_1, x_2, \cdots, x_m]^T$, whose elements represent different factors of the robot formation information. The input vector is transformed and passed to the hidden layer. The output of the first hidden layer is

$$R_1 = f(W_1 X + B_1),$$

where $R_1 = [r_{1,1}, r_{1,2}, \cdots, r_{1,n}]$ is the output matrix of the first hidden layer and n is the number of nodes in that layer. $W_1 = [w_{1,1}, w_{1,2}, \cdots, w_{1,p}, \cdots, w_{1,n}]$ is the weight matrix between the input layer and the first hidden layer, where $w_{1,p} = [w_{1,p,1}, w_{1,p,2}, \cdots, w_{1,p,m}]$ contains all the weights from the input layer to node p of the hidden layer. $B_1 = [b_{1,1}, b_{1,2}, \cdots, b_{1,n}]$ is the threshold matrix between the input layer and the hidden layer. The output value of node p of the first hidden layer is

$$r_{1,p} = f\left(\sum_{j=1}^{m} w_{1,p,j}\, x_j + b_{1,p}\right),$$

where $f(\cdot)$ is the activation function. Commonly used activation functions are the sigmoid and tanh functions. The sigmoid function is continuous and differentiable over its entire domain, and its output lies between 0 and 1:

$$f(z) = \frac{1}{1 + e^{-z}}, \qquad f(z) \in (0, 1).$$
Further, the output of the first hidden layer is passed to the second hidden layer, and so on; the output of hidden layer l is

$$R_l = f(W_l R_{l-1} + B_l), \quad l = 2, \dots, k.$$

After the k hidden layers, the output layer produces the vector $Y = [y_1, y_2, \cdots, y_m]$:

$$Y = f(W_{k+1} R_k + B_{k+1}),$$

where $W_{k+1}$ is the weight matrix and $B_{k+1}$ the threshold matrix between hidden layer k and the output layer. The motion information of a robot formation is real-time and dynamic, so a delayed-feedback mechanism over an effective time range is needed to capture dynamic nonlinear time series and further improve the accuracy of the model. Therefore, a recurrent deep neural network model is constructed by adding a feedback delay structure to the deep neural network model; changing the purely feed-forward structure of the original model in this way improves its dynamic learning ability. In the recurrent network, an association (correlation) layer is added that stores the network's dynamic memory and feeds it back into the network to correct the parameters. Specifically, the correlation layer feeds the output back to the first hidden layer: at time t, the first hidden layer receives both the input X(t) and the value supplied by the correlation layer, which is the model output at the previous time step. The remaining hidden layers and the output layer at time t are computed in the same way as in the static network, with every quantity indexed by t. The recurrent deep neural network model therefore has associative memory. When the model is applied in practice, its stability must be tested: the model is considered stable when its mean absolute error decreases gradually, that is, when the derivative of the error is less than 0.
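The static forward pass and the output-feedback (recurrent) variant described above can be sketched in NumPy. This is an illustration under assumptions: vectors are 1-D arrays, the sigmoid is used in every layer including the output, and the correlation layer enters the first hidden layer through a separate weight matrix called `w_context` here, a hypothetical name since the paper's recurrent equations are not reproduced in this text. Feeding the previous output back into the hidden layer in this way resembles a Jordan-style recurrent network.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, thresholds, f=sigmoid):
    """Static pass: R_1 = f(W_1 X + B_1), R_l = f(W_l R_{l-1} + B_l), Y = f(W_{k+1} R_k + B_{k+1})."""
    r = np.asarray(x, dtype=float)
    for W, B in zip(weights, thresholds):
        r = f(W @ r + B)
    return r


def run_recurrent(xs, weights, thresholds, w_context, f=sigmoid):
    """Recurrent pass: the output Y(t-1) is fed back into the first hidden layer at time t.

    w_context: hypothetical correlation-layer weights, shape (n_1, output_dim).
    """
    y_prev = np.zeros(w_context.shape[1])   # no feedback before the first step
    outputs = []
    for x in xs:
        r = f(weights[0] @ np.asarray(x, dtype=float) + w_context @ y_prev + thresholds[0])
        for W, B in zip(weights[1:], thresholds[1:]):
            r = f(W @ r + B)
        outputs.append(r)
        y_prev = r                           # correlation layer stores Y(t)
    return np.array(outputs)


# Example: m = 2 inputs, one hidden layer of n = 3 nodes, one output
rng = np.random.default_rng(1)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
thresholds = [np.zeros(3), np.zeros(1)]
w_context = rng.normal(size=(3, 1))
ys = run_recurrent([[0.1, 0.2], [0.3, 0.1], [0.0, -0.2]], weights, thresholds, w_context)
```

`ys` has one row per time step. Setting `w_context` to zeros makes every row equal to the static `forward` output for that input, which is a convenient sanity check.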
Another way to express the stability condition: when the initial sample is input into the model, the output is returned to the hidden layer through the delay memory, new outputs are generated by repeated operation, and the error between these results and the first output must decrease gradually. The mean absolute error F is computed between the predicted value $\hat{Y}(t)$ and the actual value $Y(t)$, and its derivative with respect to Y(t) contains f', the derivative of the activation function. For the sigmoid function,

$$f'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = f(z)\,\big(1 - f(z)\big).$$

Because $f(z) \in (0, 1)$, $f'(z) \ge 0$, from which $F' \le 0$ follows; the mean absolute error F therefore decreases gradually, and the model is considered stable. The weights W and thresholds B must be optimized to ensure the stability and accurate output of the model. Particle swarm optimization is an intelligent algorithm that reaches the optimum through cooperation and competition among the particles of a population; it is convenient, flexible, and easy to implement [33]. Its internal parameters are easy to adjust, and it is widely used for parameter optimization problems. Therefore, particle swarm optimization is chosen to optimize the weights and thresholds of the recurrent deep neural network model.
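The key step in the stability argument, f'(z) ≥ 0 for the sigmoid, can be checked numerically. The sketch compares the closed form e^{-z}/(1 + e^{-z})^2 with the equivalent f(z)(1 − f(z)) and with a finite-difference derivative over a grid of z values. It only checks this property of the activation function, not the full error derivation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """f'(z) = e^{-z} / (1 + e^{-z})^2, computed as f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


z = np.linspace(-10.0, 10.0, 2001)
grad = sigmoid_grad(z)
closed_form = np.exp(-z) / (1.0 + np.exp(-z)) ** 2
finite_diff = np.gradient(sigmoid(z), z)     # central differences inside the grid

print(bool(np.all(grad >= 0)))               # True: f'(z) is non-negative
print(round(float(grad.max()), 6))           # 0.25, attained at z = 0
```

The derivative is not only non-negative but bounded by 0.25, which is also why stacking many sigmoid layers tends to shrink gradients during training.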

B. APPLICATION OF PARTICLE SWARM OPTIMIZATION ALGORITHM
The particle swarm optimization algorithm was first proposed by Eberhart and Kennedy in 1995, and its basic concept comes from the foraging behavior of bird flocks [34]. In application, the algorithm is characterized by randomness, global optimization, bio-inspired search, and stability. Its basic idea is to use an intelligent particle to simulate an individual bird. In an N-dimensional space, each particle is a searching individual, and the flight of a particle is regarded as its search process. The flight speed of each particle is adjusted according to the best historical position of that individual and the best position found by the group [35], [36]. Speed and position are the two attributes of a particle. The best solution found by each particle is its individual extremum, and the best individual extremum in the group is the current global optimal solution. By iteratively updating speed and position, the optimal solution satisfying the conditions is finally obtained.
The velocity and position of each particle are updated as

$$v_i(n+1) = w\,v_i(n) + c_1\,\mathrm{rand}()\,\big(pbest_i - x_i(n)\big) + c_2\,\mathrm{rand}()\,\big(gbest - x_i(n)\big),$$

$$x_i(n+1) = x_i(n) + v_i(n+1),$$

where $v_i(n)$, $v_i(n+1)$, $x_i(n)$, and $x_i(n+1)$ are the velocities and positions of particle i in generations n and n + 1; w is the inertia factor; $c_1$ and $c_2$ are the learning factors, usually set to 2; rand() is a random number between 0 and 1; and $pbest_i$ and gbest are the individual extremum of particle i and the global optimal solution in generation n. In addition, the fitness function and the objective parameters must be determined. The mean square error of the recurrent deep neural network model is selected as the particle fitness [37]:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{Y}(i) - Y(i)\big)^2,$$

where N is the number of samples and $\hat{Y}(i)$ and $Y(i)$ are the predicted and actual values of sample i. The objective parameters are all elements of the weight matrices W and threshold matrices B. The particle swarm has a population size of 50, a maximum of 10000 iterations, an inertia factor of 1, and a target error of 0.01. The initial velocities and positions of the particles are set randomly, and the maximum velocity does not exceed 5 [38]. The schematic diagram of the recurrent deep neural network model optimized by particle swarm optimization is shown in Figure 6.
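A minimal global-best PSO following the update rule above is sketched below. The defaults mirror the values stated in the text (50 particles, up to 10000 iterations, inertia factor 1, c1 = c2 = 2, target error 0.01, maximum velocity 5). The initial position range, drawing rand() independently for each particle and dimension, and clamping each velocity component to ±v_max are assumptions. With an inertia factor of 1, the velocity clamp is what keeps particle speeds bounded.

```python
import numpy as np


def pso(fitness, dim, n_particles=50, max_iter=10000, w=1.0, c1=2.0, c2=2.0,
        v_max=5.0, target=0.01, bounds=(-1.0, 1.0), seed=None):
    """Global-best PSO that minimises `fitness` over R^dim.

    Returns (gbest, gbest_fitness). Stops early once gbest_fitness <= target.
    """
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, size=(n_particles, dim))       # positions (assumed init range)
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))   # random initial velocities
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    g = int(pbest_fit.argmin())
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(max_iter):
        if gbest_fit <= target:
            break
        r1 = rng.random((n_particles, dim))   # rand() drawn per particle and dimension
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)         # enforce the maximum velocity
        x = x + v
        fit = np.array([fitness(p) for p in x])
        improved = fit < pbest_fit
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmin())
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)


def sphere(p):
    """Toy objective with its minimum 0 at the origin."""
    return float(np.sum(p ** 2))


best, best_fit = pso(sphere, dim=3, max_iter=200, seed=0)
```

The global best can only improve from one generation to the next, so the returned fitness is never worse than the best initial particle. In the paper's setting, `fitness` would be the MSE of the network encoded by a particle.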
It can be seen from Fig. 6 that particle swarm optimization algorithm is used to optimize the weights and thresholds in the prediction model. The optimized weights and thresholds are substituted into the network model [39]. Data samples of robot formation motion information are input to recurrent deep neural network model. After continuous feedback learning, more accurate next step motion data can be provided for robot formation [40]. By using recurrent deep neural network model, the accuracy of robot formation control method is improved significantly. A reasonable trajectory planning can improve the motion efficiency of the robot.
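Optimizing W and B with PSO requires mapping a particle position to the network parameters. One way to do that, sketched here under assumptions, is to flatten every weight matrix and threshold vector into a single vector, unpack it on each fitness evaluation, and score it with the mean square error used as the particle fitness. `forward` stands for any function that evaluates the network given (x, W, B). The helper names and the layer-size convention are hypothetical.

```python
import numpy as np


def layer_shapes(sizes):
    """(W_l shape, B_l length) for layer sizes [m, n_1, ..., n_k, output_dim]."""
    return [((n_out, n_in), n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def n_params(sizes):
    """Length of a particle: all elements of every W_l and B_l."""
    return sum(rows * cols + b_len for (rows, cols), b_len in layer_shapes(sizes))


def unpack(particle, sizes):
    """Split a flat particle position into weight matrices W_l and threshold vectors B_l."""
    weights, thresholds, i = [], [], 0
    for (rows, cols), b_len in layer_shapes(sizes):
        weights.append(particle[i:i + rows * cols].reshape(rows, cols))
        i += rows * cols
        thresholds.append(particle[i:i + b_len])
        i += b_len
    return weights, thresholds


def mse(pred, actual):
    """Mean square error, averaged over all samples and output elements."""
    pred, actual = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.mean((pred - actual) ** 2))


def make_fitness(xs, ys, sizes, forward):
    """Fitness of a particle = MSE of the network it encodes on samples (xs, ys)."""
    def fitness(particle):
        weights, thresholds = unpack(np.asarray(particle, dtype=float), sizes)
        preds = [forward(x, weights, thresholds) for x in xs]
        return mse(preds, ys)
    return fitness


sizes = [2, 3, 1]          # m = 2 inputs, 3 hidden nodes, 1 output
print(n_params(sizes))     # 13
```

The function returned by `make_fitness` has exactly the single-argument shape a PSO routine expects, so the swarm's search dimension is `n_params(sizes)`.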

IV. SIMULATION OF ROBOT FORMATION CONTROL BASED ON INTERNET OF THINGS
Training a deep neural network usually requires a large number of data samples, but because no ready-made open-source sample library exists, the overall control data samples of the robot formation must be collected by the system itself. The Internet of Things platform is therefore built, including the wireless communication network and wireless sensor networks: the Ultra-Wideband positioning system network, temperature sensors, smoke sensor networks, and so on. The data in this paper are acquired by the robot's contact-sensor terminal, and features are extracted with the method described above. Because higher-dimensional input enlarges the search space, multi-controller guidance and a prioritized experience replay mechanism are used to apply single-agent deep reinforcement learning to obstacle-avoidance navigation. The high-dimensional laser range information, the robot's own speed, and the coordinates of the target point relative to the robot are taken as input, and continuous values of the robot's linear and angular speed are taken as the output of the single-robot action data. For both the action data and the robot position data, 762 sets of feature data are available, of which 629 are selected as training samples and 100 as recognition test data. To verify the proposed algorithm, MATLAB is used to compare the particle-swarm-optimized deep learning neural network algorithm with the traditional deep learning neural network algorithm. All the deep learning algorithms use the ReLU activation function, a 3 × 3 convolution kernel, Xavier initialization, and batch_size = 1.5.
The maximum number of iterations of the neural network is 10000, the system accuracy is 0.01, and the maximum number of evolutionary generations of the algorithm is 50. Figures 7 and 8 show that training of the conventional deep learning neural network is relatively slow: it reaches the optimum at 7500 steps with an average error of about 0.01. The deep learning neural network based on particle swarm optimization trains quickly, reaching the target error value in only 3000 steps with a final error of about 0.01, which shows that the PSO-based deep learning neural network algorithm is efficient. Figures 9 and 10 show that training of the PSO-based network is also relatively smooth, which makes it more stable in application. In independent reinforcement learning, the environment observed by each robot is dynamic and non-stationary, which causes training fluctuations. Noise disturbance at the starting point and random assignment of target points mean that in each episode the robots visit different target points, travel paths of different lengths, wait for different times after reaching their targets, and even face navigation tasks of different difficulty. In particular, a robot whose target point is relatively close reaches it easily, moves in place within a small range while waiting for the other robots to finish, and keeps receiving the reward for reaching its target. Other robots may need to avoid many obstacles on the way to their target points; to stay safe they move forward slowly, may collide and be reset, and receive little reward, or even a penalty, over the whole episode.
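The reward behaviour described above (a reward for reaching the target that keeps accruing while a robot waits, and a penalty on collision followed by a reset) can be made concrete with a per-step reward function. Every constant here, including the goal radius and the progress-shaping term, is a hypothetical choice for illustration; the paper does not give its reward values.

```python
import math

# Hypothetical reward constants; not taken from the paper.
R_GOAL = 1.0          # reward per step while inside the goal region
R_COLLISION = -1.0    # penalty on collision (the robot is then reset)
GOAL_RADIUS = 0.2     # metres
K_PROGRESS = 0.1      # weight of the distance-progress shaping term


def step_reward(pos, goal, prev_dist, collided):
    """Return (reward, new_distance) for one robot at one time step."""
    dist = math.dist(pos, goal)
    if collided:
        return R_COLLISION, dist
    if dist < GOAL_RADIUS:
        # A robot that has arrived keeps earning while it waits for the others
        return R_GOAL, dist
    # Otherwise reward progress towards the goal (negative if moving away)
    return K_PROGRESS * (prev_dist - dist), dist


r, d = step_reward((0.0, 0.0), (3.0, 4.0), prev_dist=6.0, collided=False)
```

This structure reproduces the imbalance noted in the text: a robot near its goal collects `R_GOAL` every step, while a robot threading through obstacles earns only small progress terms and risks `R_COLLISION`, so per-robot returns fluctuate strongly between episodes.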
Figure 11 shows the simulation test results and error analysis of the deep learning neural network model optimized by particle swarm optimization. The control and the route of the robot match well. However, when the robot turns, specifically during continuous turning, the error exceeds 0.5%. Cooperative control of multiple robots should therefore pay more attention to the turning process of the formation. In the multi-robot system, the robots share the same goal type, so there is no competition between them and they operate cooperatively. The single-robot method is extended in parallel to obtain the deep neural network algorithm for multiple robots, and the reward (return) function is modified to reflect the global situation; experiments verify the feasibility of this approach. The method samples quickly, trains in a short time, and converges in the training environment. Robots can be added or removed without affecting the convergence of the network, so the method suits scenarios with many robots, whether their number stays constant or changes. Figure 12 shows the cooperative control route of the multi-robot formation, where the ellipse, triangle, and rectangle represent obstacles. While the robot team moves from (−8, −15) to (8, 15), the shortest-route principle is adopted, and the robots must avoid obstacles and change the distance between one another. The figure shows that the team avoids the obstacles and reaches the designated destination by changing its route and formation shape. Therefore, the deep learning neural network algorithm proposed in this paper has a high speed of optimization.

V. CONCLUSION
The robot system can sense all kinds of external environments in real time. It is a multi-functional control and management system that includes visual recognition, task management, execution and distribution, behavior decision-making, and so on, and it can easily adapt to all kinds of harsh environments. When some of the world's top professional Go players were defeated by AlphaGo, with the powerful deep learning capability of artificial intelligence, it marked a rapid improvement in robot intelligence. However, in complex work or task environments a single robot cannot meet the demands of complex task processing through its own behavior alone. Therefore, building on robot formation control technology and relying on an efficient Internet of Things data processing platform, this paper compares the deep learning neural network algorithm with the particle swarm optimization neural network algorithm and draws the following conclusions:
• Through WiFi, ZigBee, and GSM communication between mobile robots and the console, mobile robots and operators, the console and the sensor network, and mobile robots and the sensor network, a multi-robot networked information transmission and storage architecture is constructed.
• The training of the deep learning neural network under particle swarm optimization is fast: only 3000 steps are needed to reach the target error value, and the final error is about 0.01, which is efficient.
• A deep learning neural network algorithm based on particle swarm optimization (PSO) is proposed, which has a high speed of optimization and can avoid obstacles accurately and efficiently to reach the destination.