Multi-AUV Collaborative Target Recognition Based on Transfer-Reinforcement Learning

Unfavorable factors such as turbid water and target occlusion make it difficult to obtain valid target feature data, and the repeated calculation of similar data degrades the real-time performance of recognition algorithms. To address these problems, this paper proposes a multi-AUV collaborative target recognition method based on transfer-reinforcement learning. The features of the target information collected by multiple AUVs are fused based on wavelet transformation and affine invariance. Feature similarity is calculated with the Mahalanobis distance, and the learning model is selected autonomously according to a similarity threshold. A reinforcement learning model based on Q-learning trains on target information from interference environments; the effective features it extracts are stored in the source domain, which reduces the impact of environmental interference on target recognition. A feature transfer learning model based on a deep belief network transfers feature data from the source domain to the target domain, which reduces the repeated calculation of similar data and thereby preserves the real-time performance of the algorithm. Simulation experiments are conducted on the SUN dataset under five underwater conditions (turbid water, target occlusion, insufficient light, complex background, and overlapping targets), and the results show an average recognition accuracy of 82.82% and an average recognition time of 44.33 ms/img.


I. INTRODUCTION
The Autonomous Underwater Vehicle (AUV) is an intelligent robot that can perform underwater tasks without an operator [1]. It can serve as auxiliary equipment for various underwater tasks, sparing operators from working in dangerous underwater environments, and it adapts well to underwater conditions. As tasks grow more complex, a single AUV can no longer meet the requirements, and the collaborative control of multiple Autonomous Underwater Vehicles (multi-AUV) has become a research hotspot in recent years. This paper introduces the multi-AUV collaboration mechanism into the field of target recognition and uses multiple AUVs to recognize targets.
The associate editor coordinating the review of this manuscript and approving it for publication was Anandakumar Haldorai.
In an actual underwater environment, target recognition is disturbed by many factors, such as turbid water, target occlusion, insufficient light, complex background, and overlapping targets. At present, researchers mainly train on large numbers of target image samples through reinforcement learning [2]-[4], transfer learning [5], [6], and adversarial learning [7], [8] to improve the real-time performance and accuracy of target recognition. However, current research still has the following problems: 1) Insufficient data samples, poor recognition quality, and the unpredictable underwater environment make it difficult to obtain valid data on the required target features, which seriously affects the accuracy of the recognition algorithm. 2) The target features are not classified, which leads to redundant calculation of similar data and thereby harms the real-time performance of the recognition algorithm. To address these problems, this paper proposes a multi-AUV target recognition method based on transfer-reinforcement learning, as shown in Fig. 1.
VOLUME 8, 2020
The main contributions of this paper are as follows: (1) Establishment of a multi-AUV collaborative sensing model. A convolutional neural network is used to extract target features, and feature fusion is based on wavelet transformation and affine invariance. This model reduces the impact of environmental interference on target information collection and provides more effective information for target recognition.
(2) A target recognition method based on transfer-reinforcement learning is proposed. Feature similarity is calculated based on the Mahalanobis distance. If the similarity is less than the threshold, the reinforcement learning model is used to identify the target, which reduces the impact of insufficient target sample data and underwater interference on recognition and preserves the accuracy of the algorithm. If the similarity is greater than or equal to the threshold, the transfer learning model is used, which reduces the repeated calculation of similar data and improves the real-time performance of target recognition.
(3) The algorithm is evaluated on the SUN dataset under five underwater interference conditions, where it achieves an average recognition accuracy of 82.82% and an average recognition time of 44.33 ms/img.
The images acquired by the multiple AUVs serve as the algorithm input. A convolutional neural network extracts the target features, and the similarity between each target feature and the features stored in the source domain is calculated. If the similarity is high, feature extraction is performed with the transfer learning model; if the similarity is low, new features are trained and extracted with the reinforcement learning model. Finally, the information from the multiple AUVs is fused and the recognized target is output.
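The branching described above can be sketched in a few lines. This is only an illustration: the function names and AUV keys are hypothetical, the similarity scores are assumed to be precomputed, and the default threshold of 0.6 is the value the experimental section later selects.

```python
from typing import Dict

TAU = 0.6  # threshold value chosen in the experimental section


def select_model(similarity: float, tau: float = TAU) -> str:
    """Return which learner handles a feature, given its similarity score."""
    # Similarity at or above the threshold: reuse stored source-domain features.
    if similarity >= tau:
        return "transfer"
    # Otherwise the feature is treated as new and is learned from scratch.
    return "reinforcement"


def dispatch(similarities: Dict[str, float], tau: float = TAU) -> Dict[str, str]:
    """Apply the rule to every AUV view (keys are hypothetical AUV names)."""
    return {auv: select_model(s, tau) for auv, s in similarities.items()}


routes = dispatch({"auv_1": 0.82, "auv_2": 0.60, "auv_3": 0.35})
print(routes)
```

A score exactly equal to the threshold is routed to transfer learning, which matches the "greater than or equal to" rule stated in Section III-D.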

II. RELATED WORK
Currently, the Convolutional Neural Network (CNN) is applied in most target recognition technologies. The work in [9] uses compressed sensing (CS) theory to generate saliency maps, calibrates the targets in the image, and then uses a CNN to classify the targets and extract their different features; target recognition is achieved through long-term training. To reduce the running time of the algorithm, Li et al. [10] divide the convolutional layer and the sub-sampling layer into two parts, which greatly reduces the training time without lowering the recognition rate. Ren et al. [11] introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network. To determine the authenticity and processing history of images, Bayar and Stamm [12] develop a new constrained convolutional layer for CNNs, which jointly suppresses image content and adaptively learns new target features. These works raise CNN technology to a new level of autonomy and intelligence, but they require training on a large amount of target data and cannot directly recognize a new target.
In the small-sample setting, the work in [13] integrates the convolution operation into statistical modeling to address statistical recognition with limited training data and develops a convolutional factor analysis model, which achieves better recognition performance with less training data. In [14], a multi-scale incremental dictionary learning algorithm is proposed. Gaussian functions with different fuzzy parameters are used to extract multi-scale features of SAR images, which are then reconstructed based on the weights of these features at different scales. Experiments on the automatic recognition database of moving and stationary targets show improved recognition performance. Zhang et al. [15] propose a small-sample recognition algorithm based on a new Siamese network, which does not require large samples for training. It uses supervised learning to map the input information to the target space and thus achieves effective target identification when only a small number of training samples is available in a single category.
When multiple AUVs detect a target simultaneously, they can obtain target information from different angles. References [16]-[18] propose an integrated Glasius bio-inspired neural network (GBNN), a bio-inspired cascade tracking control method, and the Greedy and Adaptive AUV Path-finding (GAAP) heuristic algorithm, respectively. These algorithms enable multiple AUVs to reach consistency quickly during operation. For multi-view image recognition, Cao et al. [19] propose 3D aided duet generative adversarial networks (AD-GAN), which improve the visual realism of multi-view synthetic images while preserving identity information well. Xuan et al. [20] propose a multiview-based 3D convolutional neural network (MV-C3D), which maximizes the feature information from different perspectives and whose recognition accuracy increases with the number of training samples. References [21] and [22] propose a multi-view automatic target recognition method based on a novel joint sparse representation (NJSR-ATR) and a multi-view action recognition algorithm (MARA), respectively. These algorithms increase recognition accuracy by improving the correlation between multiple views. For algorithm optimization, reinforcement learning [23]-[25] extracts new target features by training to maximize the cumulative reward, which enables accurate target recognition in different interference environments, and transfer learning [26]-[29] reduces the repeated calculation of similar data and improves real-time performance.
In summary, although scholars have made great improvements in target recognition and multi-AUV collaboration, the repeated calculation of similar data and underwater environmental interference still degrade the real-time performance and accuracy of underwater target recognition. In this paper, the multi-AUV collaborative mechanism is introduced into the target recognition field, and the AUVs are distributed around the target by collaborative control to collect and identify the target from different angles. The proposed algorithm reduces both the dependence on target training samples and the amount of calculation in the recognition process. Multi-AUV information fusion reduces the interference of environmental factors and improves the accuracy and robustness of target recognition.
The rest of this article is organized as follows. Section II reviews the related literature. Section III describes the proposed method: the establishment of the multi-AUV collaborative sensing model, the extraction of target feature information and the establishment of the source domain, the similarity measure model for target features, the selection between transfer learning and reinforcement learning, and multi-AUV information fusion. Section IV analyzes the simulation experiments. Finally, Section V concludes the paper.

III. PROPOSED METHOD
A. MULTI-AUV COLLABORATIVE SENSING MODEL
In the process of multi-AUV sensing, how to make multi-AUV systems collaborate quickly is the key to accomplishing tasks. Assume that each AUV has the perception function of tracking targets. In the complex underwater environment, a single AUV might have a large error value when detecting the target. In this paper, multiple AUVs are used to sense the target. The AUVs are evenly distributed around the target, and the images of the same target are collected from different angles, thereby reducing the data error when the single AUV senses the target.
In the multi-AUV model, the position and attitude information of each AUV is assumed to be known. Let $y_t$ denote the observation of the target made by $\mathrm{AUV}_i$ at time $t$; it is given by Eq. (1), where $y_{1:t-1}$ represents the observations from time 1 to $t-1$.
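For reference, the recursive Bayesian filter that a particle filter approximates can be written with the symbols used here. The form below is the standard textbook recursion and is an assumption; it may differ from the authors' Eq. (1). The state $s_t$ and the transition density $p(s_t \mid s_{t-1})$ are the quantities named in the particle filter description that follows.

```latex
p(s_t \mid y_{1:t}) \;\propto\; p(y_t \mid s_t)
\int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid y_{1:t-1})\, \mathrm{d}s_{t-1}
```

The integral is the prediction from past observations $y_{1:t-1}$, and the factor $p(y_t \mid s_t)$ corrects it with the new observation $y_t$.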
To reduce the computational load of the multi-AUV system and improve the accuracy of target recognition, a particle filter is applied to the perception of the target. First, a Gaussian perturbation is applied to the target state and position at the previous moment, and the state transfer function $p(s_t \mid s_{t-1})$ is established. The target information is then released to each AUV, and the AUVs cooperate to collect the target information.
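A minimal sketch of the particle filter step described above is given below. It assumes a 2-D target position as the state, a Gaussian observation likelihood, and multinomial resampling; the noise levels, particle count, and function names are illustrative choices, not values from the paper.

```python
import numpy as np


def predict(particles, sigma, rng):
    """State transition p(s_t | s_{t-1}): Gaussian perturbation of each particle."""
    return particles + rng.normal(0.0, sigma, size=particles.shape)


def update(particles, weights, observation, obs_sigma):
    """Reweight particles by an assumed Gaussian observation likelihood."""
    sq_err = np.sum((particles - observation) ** 2, axis=1)
    likelihood = np.exp(-0.5 * sq_err / obs_sigma**2)
    weights = weights * likelihood + 1e-300  # avoid an all-zero weight vector
    return weights / np.sum(weights)


def resample(particles, weights, rng):
    """Multinomial resampling; returns equally weighted particles."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)


rng = np.random.default_rng(0)
n_particles = 500
true_pos = np.array([2.0, -1.0])                         # hypothetical target position
particles = rng.normal(0.0, 3.0, size=(n_particles, 2))  # broad initial guess
weights = np.full(n_particles, 1.0 / n_particles)

for _ in range(20):
    observation = true_pos + rng.normal(0.0, 0.2, size=2)  # noisy AUV measurement
    particles = predict(particles, sigma=0.1, rng=rng)
    weights = update(particles, weights, observation, obs_sigma=0.2)
    estimate = weights @ particles                          # weighted mean state
    particles, weights = resample(particles, weights, rng)

print(estimate)
```

In a multi-AUV setting, each vehicle would contribute its own observation to the update step; the single-observation loop here only shows the perturb, reweight, and resample cycle.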
The multi-AUV collaborative sensing model is built on a linear consistency formation algorithm, and the recognition target is regarded as the leader of the multi-AUV formation. Each AUV calculates its coordination relationship with the leader based on the leader's movement, which achieves multi-angle information collection of the target.
In the multi-AUV collaborative sensing model, the entire collaborative sensing topology graph $G$ is composed of a topology graph $G_{cc}$ and a topology graph $G_{tc}$, where $G_{cc}$ represents the topological relationship among the cooperating AUVs ($\mathrm{AUV}_i$) and $G_{tc}$ represents the topological relationship between the target $\mathrm{AUV}_T$ and $\mathrm{AUV}_i$. The information transfer between $\mathrm{AUV}_i$ and the sensing target $\mathrm{AUV}_T$ is assumed to be unidirectional, so $G_{tc}$ is a directed graph.
The dynamics of the target are expressed by a second-order integrator:

$$\dot{\xi}_0 = \zeta_0, \qquad \dot{\zeta}_0 = u_0,$$

where $\xi_0$ is the position of the perceived target $\mathrm{AUV}_T$, $\zeta_0$ is its velocity, and $u_0$ denotes the control input information. The dynamics of each cooperative AUV are likewise described by a second-order integrator:

$$\dot{\xi}_i = \zeta_i, \qquad \dot{\zeta}_i = u_i.$$

The consistency control law of $\mathrm{AUV}_i$ is given by Eq. (2), in which $a_{ij}$ is the $(i, j)$ entry of the adjacency matrix $A$ and represents the communication weight between $\mathrm{AUV}_i$ and $\mathrm{AUV}_j$. The goal of the cooperation is to make the motion trajectory of $\mathrm{AUV}_i$ tend to that of $\mathrm{AUV}_T$. The control law can be written in matrix form using the Laplacian matrix $L$ of the communication topology, and the condition for the multiple AUVs to reach consistency is then expressed in terms of this matrix form.

With consistency and coordinated control, the AUVs can be evenly distributed around the sensing target $\mathrm{AUV}_T$. As shown in Fig. 2, the AUVs simultaneously collect multi-angle information about $\mathrm{AUV}_T$, and fusing this information makes target recognition more accurate.
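The leader-follower behavior described above can be illustrated with a small simulation. The control law used here, $u_i = -\sum_j h_{ij}\,(e_j + \gamma \dot{e}_j)$ with $H = L + B$ and $e_i$ the offset-corrected position error, is a common textbook second-order consensus protocol and is an assumption, not necessarily the authors' Eq. (2). The ring topology, gains, stand-off radius, and target velocity are also hypothetical.

```python
import numpy as np

n = 4                                   # number of cooperating AUVs
radius = 5.0                            # hypothetical stand-off distance from the target
angles = 2 * np.pi * np.arange(n) / n
offsets = radius * np.column_stack([np.cos(angles), np.sin(angles)])  # evenly spaced slots

# Adjacency matrix A of an undirected ring among the AUVs (a_ij = 1 for neighbours).
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0
L = np.diag(A.sum(axis=1)) - A          # graph Laplacian of the AUV topology
B = np.eye(n)                           # every AUV senses the target (directed link)
H = L + B
gamma = 2.0                             # velocity gain (assumed)

rng = np.random.default_rng(1)
pos = rng.uniform(-10, 10, size=(n, 2)) # AUV positions xi_i
vel = np.zeros((n, 2))                  # AUV velocities zeta_i
target_pos = np.array([0.0, 0.0])       # xi_0
target_vel = np.array([0.3, 0.1])       # zeta_0, held constant so u_0 = 0

dt, steps = 0.01, 3000
for _ in range(steps):
    e_pos = pos - offsets - target_pos  # position error relative to the assigned slot
    e_vel = vel - target_vel
    u = -H @ (e_pos + gamma * e_vel)    # consensus control input u_i
    pos, vel = pos + dt * vel, vel + dt * u
    target_pos = target_pos + dt * target_vel

final_error = np.abs(pos - offsets - target_pos).max()
print(final_error)
```

After the transient, each AUV holds its slot on a circle around the moving target, which is the evenly distributed configuration the section relies on for multi-angle image collection.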

B. TARGET FEATURE EXTRACTION
The key to efficient identification is the effective collection and extraction of target features, but in practical applications the recognition accuracy is affected by the underwater environment (turbid water, target occlusion, insufficient light, complex background, and overlapping targets). In this paper, the target features are extracted with a convolutional neural network, and the multi-AUV target information is fused based on the wavelet transform and affine invariance so that the algorithm can extract more complete feature information.
Assume that a continuous image is expressed as the function $f(x, y)$ with an associated feature vector, and let $\kappa$ be a proportional coefficient. The $(p+q)$-order central moment $u_{pq}$ is normalized based on six-dimensional affine invariance.
To better preserve the features, the moment features are calculated in polar coordinates using the transformation $x = r\cos\theta$, $y = r\sin\theta$, which simplifies feature extraction and reduces computational complexity. Let the image size be $N \times N$ and the multi-AUV angular interval be $\Delta\theta = 2\pi/N$. The angle integral $S_q(r)$ is then calculated, where $S_q(r)$ denotes the $q$-th frequency-domain feature of $f(r, \theta)$ over the entire phase space ($0 \le \theta \le 2\pi$), $m$ is the scale factor, and $e^{-jq\theta}$ is the angular component of the transform kernel. The wavelet transform is obtained as the inner product of the wavelet basis function $\psi_{m,n}(r)$ and the angle integral $S_q(r)$ over $r \in [0, 1]$.

If the selected wavelet matrix and the affine invariant matrix are combined directly, a new set of features is obtained. Because the two kinds of features differ in nature, the combined feature must be normalized to achieve fusion, as given in Eq. (7). The fused feature in Eq. (7) is stored as a target feature in the source domain and should be robust to the translation, rotation, and scaling that occur when the multiple AUVs extract the target from different angles. The specific process is shown in Fig. 3.
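The role of the angle integral can be illustrated numerically. The sketch below assumes the discrete form $S_q(r) = \frac{1}{N}\sum_{k=0}^{N-1} f(r, \theta_k)\, e^{-jq\theta_k}$ with $\theta_k = 2\pi k/N$; the $1/N$ normalization and the random test image are assumptions. It shows that rotating the image changes only the phase of $S_q(r)$, so its magnitude is a rotation-invariant quantity.

```python
import numpy as np


def angular_integral(f_polar):
    """Discrete S_q(r): rows index radius r, columns index N equally spaced angles.

    Entry [r, q] equals (1/N) * sum_k f(r, theta_k) * exp(-j * q * theta_k).
    """
    n_angles = f_polar.shape[1]
    return np.fft.fft(f_polar, axis=1) / n_angles


rng = np.random.default_rng(0)
f_polar = rng.random((16, 32))          # hypothetical image sampled on a polar grid
rotated = np.roll(f_polar, 5, axis=1)   # rotation by 5 angular steps

S = angular_integral(f_polar)
S_rot = angular_integral(rotated)

# Rotation multiplies S_q(r) by a unit-modulus phase, so the magnitudes agree.
invariant = np.allclose(np.abs(S), np.abs(S_rot))
print(invariant)
```

Taking the inner product of $|S_q(r)|$ with a wavelet basis along $r$ would then add the scale-localized description that the text attributes to $\psi_{m,n}(r)$.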

C. TARGET FEATURE SIMILARITY MEASURE
Target recognition in a multi-AUV system is mainly a process of comparing the extracted target features with the stored features. Because the Mahalanobis distance accounts for the correlations between the elements of a feature vector and is independent of scale and measurement units, this paper uses it to measure the similarity of two features. Assuming there are $n$ samples, each of dimension $m$, the data set matrix $X$ is

$$X = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{m1} \\ x_{12} & x_{22} & \cdots & x_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix}. \tag{8}$$

In Eq. (8), each row represents one test sample, giving $n$ samples in total, denoted $X_i = (x_{1i}, x_{2i}, \cdots, x_{mi})^T$, $i = 1, 2, \ldots, n$. The data set matrix can be abbreviated as $X = (X_1, X_2, \ldots, X_n)^T$. The overall mean of the samples is

$$\mu_X = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

The covariance matrix of the data set is $\Sigma_X = \frac{1}{n}(X - \mu_X)^T (X - \mu_X)$, and the Mahalanobis distance between any two feature vectors $X_i$ and $X_j$ is

$$d_M^2(X_i, X_j) = (X_i - X_j)^T\, \Sigma_X^{-1}\, (X_i - X_j).$$

The similarity between the target feature and the features stored in the source domain is obtained from the Mahalanobis distance $d_M^2$. When the similarity is high, the source-domain feature can be transferred directly to identify the target. When the similarity is low, the learning algorithm is used to learn the feature, which increases the robustness of target recognition.
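The computation above can be sketched as follows, with samples stored as rows and the covariance normalized by $1/n$ as in the text. The data, feature scales, and unit conversion factors are hypothetical; the example also checks the unit independence that motivates the choice of this distance.

```python
import numpy as np


def mahalanobis_sq(x, y, data):
    """Squared Mahalanobis distance between x and y under the covariance of `data`.

    `data` is an (n, m) matrix: n samples as rows, m feature dimensions as columns.
    """
    mu = data.mean(axis=0)
    centered = data - mu
    cov = centered.T @ centered / data.shape[0]   # (1/n) (X - mu)^T (X - mu)
    diff = x - y
    # Solve cov * z = diff instead of forming the explicit inverse.
    return float(diff @ np.linalg.solve(cov, diff))


rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 20.0])  # hypothetical features
x, y = data[0], data[1]
units = np.array([0.5, 2.0, 10.0])   # per-feature change of measurement unit

d2 = mahalanobis_sq(x, y, data)
d2_rescaled = mahalanobis_sq(x * units, y * units, data * units)
print(d2, d2_rescaled)
```

The two printed values agree up to floating-point error, because rescaling each feature rescales the covariance in the matching way. How the distance is mapped to a similarity score that can be compared with a threshold is not specified here and would need its own convention.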

D. TARGET RECOGNITION BASED ON TRANSFER-REINFORCEMENT LEARNING
Because the underwater environment is complex and few training samples of the target are available, a single recognition algorithm cannot cover all target types. Therefore, this paper proposes a transfer-reinforcement learning algorithm. The algorithm first accesses the memory and uses the similarity metric model to calculate the similarity between the target domain and the source domain.
The current target feature is compared with the features stored in the source domain, and the recognition strategy is selected according to the similarity:

$$\text{model} = \begin{cases} \text{transfer learning}, & d_M(M_S, M_T) \ge \tau, \\ \text{reinforcement learning}, & d_M(M_S, M_T) < \tau, \end{cases}$$

where $d_M(M_S, M_T)$ represents the similarity between the detected target feature and the feature stored in the source domain. When the similarity between the source-domain and target-domain features is greater than or equal to the threshold $\tau$, target recognition is performed directly by transfer learning. When the similarity is less than the threshold $\tau$, the algorithm uses reinforcement learning to train the corresponding target features.

When $d_M(M_S, M_T) \ge \tau$, subtle changes in the target and environmental information still occur because the underwater environment is dynamic. The transfer learning therefore adopts a deep belief network (DBN) to seek a probability distribution from the visible layer to the hidden layer, as shown in Fig. 3. The hidden layers consist of Restricted Boltzmann Machines (RBMs), and the logistic regression layer uses a classic back-propagation (BP) neural network to fine-tune the entire deep network with supervision. The whole transfer learning process is realized mainly by constructing an energy function, described by Eq. (12), where $E(v, h)$ is the energy function of the visible layer $v$ and the hidden layer $h$; $P(v, h)$ is the probability distribution over the visible layer $v$ and the hidden layer $h$; $v_i$ is the $i$-th unit of the visible layer; $h_j$ is the $j$-th unit of the hidden layer; $W_{ij}$ is the connection weight between the visible unit $v_i$ and the hidden unit $h_j$; $b_i$ and $c_j$ are the offset values of the visible layer and the hidden layer; and $n_v$ and $n_h$ are the numbers of units in the visible layer and the hidden layer.
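For orientation, the standard energy function of a binary RBM is $E(v,h) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_i \sum_j v_i W_{ij} h_j$, with joint distribution $P(v,h) = e^{-E(v,h)}/Z$. The sketch below evaluates this textbook form on a tiny network; it is assumed to correspond to the energy function discussed above, but the authors' exact expression may differ. The hidden-layer offsets are written $c_j$, matching the parameter triple $(W, b, c)$ named in the next paragraph, and the layer sizes and random parameters are hypothetical.

```python
import itertools
import numpy as np


def energy(v, h, W, b, c):
    """Standard binary RBM energy: E(v, h) = -b.v - c.h - v^T W h."""
    return -(b @ v) - (c @ h) - (v @ W @ h)


n_v, n_h = 3, 2                              # tiny layer sizes so Z can be enumerated
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(n_v, n_h))    # connection weights W_ij
b = rng.normal(0.0, 0.5, size=n_v)           # visible-layer offsets b_i
c = rng.normal(0.0, 0.5, size=n_h)           # hidden-layer offsets c_j

# Enumerate every binary configuration to compute the partition function Z exactly.
states_v = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n_v)]
states_h = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n_h)]
Z = sum(np.exp(-energy(v, h, W, b, c)) for v in states_v for h in states_h)


def joint_prob(v, h):
    """P(v, h) = exp(-E(v, h)) / Z."""
    return np.exp(-energy(v, h, W, b, c)) / Z


total = sum(joint_prob(v, h) for v in states_v for h in states_h)
print(total)
```

Exact enumeration is feasible only for toy sizes; practical RBM training approximates the required expectations, for example with contrastive divergence.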
After the initial training is completed, the network parameters $(W, b, c)$ can be adjusted by feedback to reduce the prediction error of the knowledge matrix. Let $Q_{st}$ be the optimal knowledge matrix of the source task. The knowledge matrix transferred to the new task is expressed by Eq. (13), in which $Q^{i}_{nt}$ is the new knowledge matrix that $\mathrm{AUV}_i$ obtains through transfer learning and $\phi$ represents the input feature information of the new task.
When $d_M(M_S, M_T) < \tau$, target recognition is realized with a network model that combines a convolutional neural network with the Q-learning algorithm. The target characteristics are mapped to action types in the Q-learning algorithm and denoted $a_t = \{a_1, a_2, \ldots, a_n\}$, where $n$ is the number of features. Assume a series of actions $a_t$ and reward values $r_t$ under the environment $\varepsilon$. The system randomly selects an action, and the input layer receives an image sample $x_t$, the vector of original pixel values of a training sample. After forward propagation through the neural network, the system obtains a reward $r_t$ that indicates the degree of fit to the sample.
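The action and reward loop described above is the standard Q-learning setting. A minimal tabular sketch of one temporal-difference update is shown below; the table stands in for the convolutional network that the paper uses to approximate Q, and the state indices, learning rate, and discount value are hypothetical.

```python
import numpy as np


def td_target(reward, next_q_values, gamma):
    """Q-learning target: r + gamma * max over a' of Q(s', a')."""
    return reward + gamma * np.max(next_q_values)


def q_update(q_table, s, a, reward, s_next, gamma=0.9, lr=0.5):
    """One tabular Q-learning step; returns the squared TD error before the update."""
    target = td_target(reward, q_table[s_next], gamma)
    td_error = target - q_table[s, a]
    q_table[s, a] += lr * td_error
    return td_error ** 2


q = np.zeros((2, 3))          # 2 hypothetical states, 3 actions (feature types)
loss_1 = q_update(q, s=0, a=1, reward=1.0, s_next=1)
loss_2 = q_update(q, s=0, a=1, reward=1.0, s_next=1)
print(q[0, 1], loss_1, loss_2)
```

Repeating the same rewarded transition shrinks the squared error from 1.0 to 0.25 as the estimate moves from 0 to 0.75, which is the behavior that a squared-error training loss is meant to drive.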
Let $\gamma$ be the discount coefficient and $Q^*(s', a')$ the optimal value of the sequence $s'$ for the next action $a'$. Selecting the action $a'$ that maximizes the expected value of $r + \gamma Q^*(s', a')$ can be expressed in the standard Q-learning form as

$$Q^*(s, a) = \mathbb{E}_{s' \sim \varepsilon}\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right].$$

In practical applications, a loss function $L_j(\theta_j)$ is used for training and is updated at each iteration $j$:

$$L_j(\theta_j) = \mathbb{E}_{s, a \sim \rho(\cdot)}\left[ \left( y_j - Q(s, a; \theta_j) \right)^2 \right].$$

The target feature output function of the $j$-th training iteration is

$$y_j = \mathbb{E}_{s' \sim \varepsilon}\left[\, r + \gamma \max_{a'} Q(s', a'; \theta_{j-1}) \,\middle|\, s, a \right],$$

where $\rho(s, a)$ is the probability density distribution of the sequence $s$ and the action $a$, and $y_j$ represents a new feature of the recognition target extracted by reinforcement learning.

Each AUV has a target information collection and recognition function, and a support vector machine (SVM) classifier with a Bayesian decision fusion method is applied to multi-AUV target recognition. The output of a single AUV for target recognition can be expressed as $y_k = \{y_{k,c};\ c = 1, 2, \ldots, C\}$, where $c$ is the identified target category and $k = 1, 2, \ldots, K$ indexes the view image. According to the Bayesian criterion, the output of multi-AUV target recognition from the $K$ perspectives is computed from the per-class terms $a_c = \prod_{k=1}^{K} l(x_k \mid c)$. Because the AUVs recognize the target from different angles, information fusion effectively improves the recognition accuracy and makes the algorithm more robust. The specific algorithm flow is shown in Table 1, and the schematic diagram is shown in Fig. 4.
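The multi-view fusion step can be sketched as a product of per-view class likelihoods followed by normalization. The sketch assumes conditionally independent views, a uniform class prior, and likelihood values that would in practice come from each AUV's classifier; the function name and the numbers are hypothetical.

```python
import numpy as np


def fuse_views(likelihoods, prior=None):
    """Fuse per-view class likelihoods l(x_k | c) into a posterior over classes.

    `likelihoods` has shape (K, C): one row per view, one column per class.
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    n_classes = likelihoods.shape[1]
    if prior is None:
        prior = np.full(n_classes, 1.0 / n_classes)   # uniform prior (assumed)
    # Sum of logs equals the log of the product over views of l(x_k | c).
    log_post = np.log(prior) + np.sum(np.log(likelihoods), axis=0)
    log_post -= log_post.max()                         # numerical stability
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical scores from K = 4 views over C = 3 classes.
views = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.3, 0.5, 0.2],   # one view disagrees, for example because of occlusion
    [0.7, 0.2, 0.1],
]
posterior = fuse_views(views)
print(posterior, int(np.argmax(posterior)))
```

Although the third view favors the second class, the fused posterior still selects the first class, which illustrates how several agreeing views can outweigh one degraded view.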

IV. EXPERIMENTAL RESULTS AND ANALYSES
The simulation experiments run on a small server with an E5-2630 v4 CPU (base frequency 2.2 GHz) and 32 GB of memory. The algorithm is simulated in MATLAB R2016a under Windows 10. Target recognition training and feature extraction are performed on the SUN dataset. The threshold τ of the transfer-reinforcement learning algorithm affects both the recognition efficiency and the recognition accuracy. An analysis of different values of τ shows that when τ is too small, the amount of calculation decreases and recognition becomes faster, but the recognition accuracy drops. When τ is too large, the recognition accuracy improves, but the recognition speed decreases and the amount of calculation increases. The specific data for different thresholds τ are shown in Table 2. According to Table 2, the transfer-reinforcement learning algorithm offers the best trade-off at τ = 0.6, so the target recognition simulations are performed with τ = 0.6.
The images of the target acquired from different angles by the multiple AUVs are first normalized to 800 × 600 pixels, and the target is then identified by the transfer-reinforcement learning algorithm with multi-view information fusion. The recognition process is shown in Fig. 5.
In Fig. 5, the first column shows the original images that serve as the input to the target recognition algorithm. The second, third, and fourth columns show feature extraction during the recognition process. The yellow area in the second column marks the region of interest initially determined by the algorithm, which gives the approximate region of the target. The third and fourth columns show recognition training and image binarization of the target feature. The last column shows the output with identification tags: the yellow rectangle marks the target feature information, and the red rectangle marks the identified target. Based on multi-angle information fusion, the recognition accuracy for the images at the different angles is 86%, 84%, 85%, and 81%, respectively.
At present, leading algorithms for multi-view recognition include AD-GAN [19], MV-C3D [20], NJSR-ATR [21], and MARA [22]. Underwater targets (divers, sea turtles, whale sharks, and fish) are identified from four different perspectives with these algorithms and with the algorithm in this paper. The recognition results are shown in Fig. 6 and Table 3.
Under different light conditions, four kinds of targets are identified from multiple perspectives. The recognition results are shown in Fig. 6 and the recognition accuracy is shown in Table 3. The MV-C3D algorithm has a maximum recognition accuracy of 85.17% for sea turtles, which is 0.41 percentage points higher than that of the algorithm in this paper. However, its recognition accuracy for whale sharks and fish decreases when the light is insufficient, and its average recognition rate over the four targets is only 83.19%. The recognition accuracy of the AD-GAN algorithm decreases as the light changes, and its average recognition accuracy is 82.04%. The recognition accuracy of the NJSR-ATR and MARA algorithms fluctuates less, but their average recognition accuracy is only 77.56% and 80.51%, respectively. The algorithm in this paper has the highest recognition accuracy for divers, whale sharks, and fish, and its average recognition accuracy is 84.40%. These data show that the algorithm in this paper retains strong recognition ability under interference from the light environment.

In an actual underwater environment, the multiple AUVs are subject to environmental interference at any time during information collection, such as turbid water, uneven light, target occlusion, complex background, and overlapping targets. The algorithm in this paper is simulated under these influencing factors and compared with R-FCN [30], Faster R-CNN [31], JCS-Net [32], OHEM [33], FP-SSD [34], and YOLO [35]. The basic settings of each algorithm are shown in Table 4.
Different algorithms are used to identify underwater divers, and the recognition results are shown in Fig. 7. The leftmost column of images shows the original images to be identified, and the remaining images show, from top to bottom, the target under turbid water, object occlusion, insufficient light, complex background, and target overlap. The target is identified mainly by key features and marked with a red rectangle. The proposed algorithm performs better under object occlusion and target overlap. It can also accurately identify similar targets after training on only a small number of existing samples.
To further verify the effectiveness of the proposed target recognition algorithm, it is compared with six target recognition algorithms. The detailed data are shown in Table 5. Under insufficient light, the R-FCN algorithm reaches a recognition accuracy of 81.47% for divers, with a recognition time of 91.52 ms/img. The JCS-Net algorithm has the shortest recognition time, 43.72 ms/img, when targets overlap, but its recognition accuracy is only 63.18%. In turbid water, the OHEM algorithm has the shortest recognition time, 43.29 ms/img, but its recognition accuracy is 69.17%. The other algorithms show no particularly outstanding accuracy or recognition time, but they are relatively stable across the different interference environments.
Under insufficient light, the recognition accuracy of the proposed algorithm is 81.41%, which is 0.06 percentage points lower than that of R-FCN (81.47%). However, its recognition time is 44.72 ms/img, which is 51.14% shorter than that of R-FCN (91.52 ms/img). When targets overlap, the recognition time of the proposed algorithm is 0.53 ms/img longer than that of JCS-Net. In turbid water, it is 2.52 ms/img longer than that of OHEM. In the other interference environments, both the recognition rate and the recognition time of the proposed algorithm are superior. Its average recognition accuracy is 82.82%, and its average recognition time is 44.33 ms/img. These data show that the proposed algorithm reduces the impact of various underwater interference factors while maintaining the real-time performance and accuracy of target recognition.
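The relative and absolute differences quoted above can be checked directly from the reported figures. The last two lines give the recognition times of the proposed algorithm that the stated gaps imply; these are derived values, not numbers read from Table 5.

```latex
\frac{91.52 - 44.72}{91.52} = \frac{46.80}{91.52} \approx 0.5114 = 51.14\%
\qquad
81.47\% - 81.41\% = 0.06 \text{ percentage points}

43.72 + 0.53 = 44.25 \ \text{ms/img (overlapping targets, relative to JCS-Net)}
\qquad
43.29 + 2.52 = 45.81 \ \text{ms/img (turbid water, relative to OHEM)}
```

The time saving is a relative quantity, while the accuracy gap is an absolute difference between two percentages, so the two should not be compared on the same scale.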
In summary, the proposed algorithm does not achieve the highest accuracy under insufficient light, but it performs well in the other interference environments. Averaged over the different interference environments, the recognition accuracy reaches 82.82% and the recognition time is 44.33 ms/img. Future research should focus on boosting recognition efficiency, reducing recognition time, and improving robustness.

V. CONCLUSION
Sufficient target information and long-term sample training are the keys to improving the accuracy of target recognition. In the complex underwater environment, however, the crux is how to identify the target accurately when sample training is insufficient and information collection is incomplete. To address these problems, this paper proposes a multi-AUV target recognition algorithm based on transfer-reinforcement learning. The transfer-reinforcement learning algorithm and the multi-AUV information fusion mechanism are introduced into underwater target recognition to realize recognition under multi-angle and multi-interference conditions. Experiments demonstrate that the proposed algorithm achieves an average recognition accuracy of 82.82% at 44.33 ms/img, but the recognition accuracy under insufficient illumination still needs to be improved. Future work will focus on this issue to improve the algorithm.