Communication-Aware Motion Planning of AUV in Obstacle-Dense Environment: A Binocular Vision-Based Deep Learning Method

Communication-aware motion planning of autonomous underwater vehicles (AUVs) is regarded as an emerging requirement for marine intelligent transportation systems. However, the fading acoustic channel and the complex underwater environment make it difficult to realize such a task. This paper is concerned with a communication-aware motion planning issue for AUV in obstacle-dense environments. We first develop an intelligent AUV system, which includes binocular cameras for short-distance obstacle avoidance, sonars for long-distance detection, and modems for acoustic communication with buoys. For such a system, the parallax angles from AUV to obstacles are utilized to construct an optimal motion planning problem by integrating our previously proposed channel estimation approach. In order to solve the above problem, a deep reinforcement learning method called deep deterministic policy gradient (DDPG) is developed to minimize the cost function, such that a collision-free path can be planned for AUV while maintaining the communication quality. The advantages of our solution are highlighted as follows: 1) it balances the communication quality and motion stability, in contrast to the disk model-based methods; 2) it improves the collision-avoidance efficiency in path lengths and control efforts as compared with the distance-based methods. Finally, simulation and experimental studies are both provided to verify the effectiveness of our method.


I. INTRODUCTION
With the rapid development of marine science and artificial intelligence, research on the autonomous underwater vehicle (AUV) has been a compelling topic in maritime transportation systems over the past decades. As an integrated underwater detection and mobility platform, AUV can perform various types of applications, such as subsea exploration [1], pollution detection [2], creature sampling [3], and map construction [4]. In order to understand and explore the ocean, AUV is required to plan a collision-free path from the start point to the destination, as depicted in Fig. 1. Thereby, a fundamental question is given as follows: given the specific starting/ending positions and the environmental constraints, what is the optimum path of AUV and how does it get there?

Jing Yan and Liang Zhang are with the Institute of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China (e-mail: jyan@ysu.edu.cn; ysdx_zl@stumail.ysu.edu.cn).
Xian Yang is with the Institute of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China (e-mail: xyang@ysu.edu.cn).
Digital Object Identifier 10.1109/TITS.2023.3296415
To answer that question, many motion planning schemes have been developed for AUV. For instance, Cui et al. [5] adopted the random tree star to develop a mutual information-based motion planning strategy for the field sampling of AUV. In [6], an adaptive motion planning strategy was provided to estimate the marine scalar field, where the sparse variational Gaussian process was utilized to capture local measurements of AUV. Yu et al. [7] investigated the path planning of AUV in target traveling missions, through which a shortest path faster algorithm was designed to accomplish task allocation and traveling order determination. Besides that, a target search algorithm was developed in [8] to generate a complete target probability map. The above works are well developed, but they do not consider the influence of obstacles in the underwater environment. As far as we know, obstacles such as coral reefs, shipwrecks and sea walls are inevitably involved in the underwater environment [9]. Ignoring the influence of obstacles can lead to the failure of AUV motion planning. Therefore, it is necessary to incorporate obstacle avoidance into the motion planning of AUV in complex underwater environments.
Considering the influence of obstacles, the collision-free motion planning methods can be classified into two major categories: distance-based solutions and vision-based solutions. Particularly, the distance-based solutions collect relative distances between AUV and obstacles by using on-board active/passive sonar [10], [11]. Meanwhile, the vision-based solutions collect the depth images of obstacles through camera stereo calibration [12], such that the gray values of depth images can be used to calculate the relative distances between AUV and obstacles. Of note, the former category is traditionally the primary way for long-distance detection of AUV. However, sonar has low resolution ability due to the underwater reverberation noise and multi-path fading effect [13], which may lead to the performance degradation of collision avoidance, especially in obstacle-dense environments. For example, the side-scan sonar with a low frequency can only provide low-quality acoustic data, due to the speckle noises caused by acoustic wave interferences [14], [15]. Alternatively, the latter category has the characteristics of high resolution ability and low cost, which is very suitable for short-distance and obstacle-dense underwater environments. In view of this, we focus on the vision-based motion planning issue of AUV. More recently, some vision-based obstacle avoidance algorithms have been developed for AUV. For instance, a multi-camera collision-avoidance system was constructed in [16], and then a repulsive potential function was designed to move AUV away from collisions. In [17], a vision-based A-star algorithm was developed for AUV to predict the regions that could lead to collisions, and hence, a set of warning signals was generated to allow AUV to perform evasive maneuvers. Nevertheless, the potential function or A-star based methods inevitably face local optima and sensitivity to initial values. To remedy the local optimum issue, a reinforcement learning based motion planning strategy
was developed in [18], whose aim was to output collision-free trajectories for AUV by integrating the convolutional neural network. In [19], double Q learning and the convolutional neural network were jointly employed to perform vision-based collision-avoidance navigation for AUV. In [20], a double deep Q learning-based global path planning strategy was developed for the unmanned surface vehicle. However, the vehicle dynamics in [18] and [19] were simplified as first-order linear differential equations. As has been pointed out in [21], AUV dynamics exhibit nonlinear, multiple-degree-of-freedom properties. Moreover, the double Q learning network cannot be directly applied to a continuous action space since it finds the action that maximizes the action-value function [22], and hence, it suffers from the difficulty of handling continuous, complex underwater environments. Based on this, some behavior rule-based collision-avoidance algorithms have been proposed in [23] and [24] to resolve the collision-avoidance navigation of multiple-degree-of-freedom AUVs, but it is not an easy task to design the behavior rules, especially for obstacle-dense environments. With regard to this, how to employ vision measurements to design a learning-based collision-avoidance algorithm for multiple-degree-of-freedom AUV is still an unsolved issue.
Beyond that, we notice that the communication between AUV and surface buoys plays an important role in motion planning, since AUV is required to send its local measurements to the control center via the relay of buoys. Nowadays, most of the existing works adopt the approximate deterministic disk model to characterize the underwater channel quality [25], [26]. Specifically, it is assumed that the channel quality is ideal within a predefined communication range; otherwise, it is invalid. The above assumption is reasonable for the motion planning of terrestrial vehicles; however, it is too idealistic for AUV, because underwater acoustic communication is often affected by the shadowing effect and position-dependent path loss [27]. For that reason, it is necessary to integrate the underwater acoustic communication quality into the motion planning process. To this end, a motion planning approach integrating a stochastic channel estimation framework was proposed in [28] to achieve collision-free motion planning for robots. Following this, a communication-aware and energy-efficient motion planning algorithm was developed in [29], which can capture the realistic channel property of robots. In [30], the co-optimization of communication and motion of a robot was performed under energy constraints. Nevertheless, the communication-aware motion planning algorithms in [28], [29], and [30] were not developed in the context of AUV. In our previous works [31], [32], communication-aware motion planning algorithms were designed for AUV in obstacle environments; however, they adopt distance-based measurements to perform collision-avoidance maneuvers. Note that the distance-based solution ignores the dimensions of AUV and obstacles, which is not suitable for AUV in obstacle-dense environments. With consideration of the obstacle-dense underwater environment, it is necessary to design a vision-based motion planning method for AUV that jointly considers the communication quality in the cyber side and the complex AUV dynamics in the physical side.
This paper develops a communication-aware motion planning solution for AUV in obstacle-dense environments. The difficulties in attacking the aforementioned issues can be summarized in two points: 1) how to sense the parallax angles of multiple obstacles by using binocular cameras? 2) how to drive AUV to the destination with collision avoidance and communication quality maintenance? In view of this, an intelligent AUV system is developed to enable it to have obstacle avoidance and control abilities. Then, a deep deterministic policy gradient (DDPG) based motion planning algorithm is designed to move AUV away from obstacles and toward the destination position. A comparison with the other existing literature is provided in Table I, through which the main features of this paper are summarized as follows.
1) Co-optimization framework of communication and motion of AUV. By integrating our previously proposed channel estimation approaches [32], a co-optimization framework is developed for AUV in obstacle-dense environments, which includes the AUV dynamics, vision measurement, communication channel quality and destination position. Compared with the disk model-based methods in [25] and [26], the co-optimization framework in this paper can balance the communication quality and motion stability. Meanwhile, the four-degree-of-freedom motion model can well capture the dynamics of AUV.
2) DDPG-based obstacle avoidance algorithm. The parallax angles from AUV to obstacles are employed to design a DDPG-based obstacle avoidance algorithm. By considering the sizes of obstacles and AUV, it can effectively reflect the threat level of obstacles to AUV. Compared with the distance-based obstacle avoidance algorithms in [10], [11], [31], and [32], the vision-based solution in this paper can improve the collision-avoidance efficiency in path lengths and control efforts.
3) Experimental test via the co-design of communication and control. It is important to note that most of the vision and deep learning based motion planning algorithms for AUV are verified only by simulation results. With a view to practical application, it is necessary and meaningful to check the effectiveness through experimental results. To this end, we design an intelligent AUV system, and more importantly, the theoretical results of this paper are verified in a water pool.
This paper is organized as follows. Section II presents the system model and problem formulation. Section III gives the binocular vision-based path planning solution. Simulation and experimental results are presented in Section IV. Section V demonstrates the conclusions and future works.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model
In order to perform communication-aware motion planning, AUV makes direct acoustic communication with surface buoys, and at the same time it plans a collision-free path to the destination (see Fig. 1). The mechanical structure of AUV with modular concepts is presented in Fig. 2, which consists of a buoyancy shell, six thrusters, two pairs of binocular cameras, a communication unit, an electrical power system, sonars and a control cabin. The thrusters are fixed on the AUV frame to ensure AUV can move as freely as possible. Two pairs of binocular cameras are fixed vertically and horizontally on the front of AUV, which can achieve obstacle identification and parallax angle calculation tasks. The role of the sonars is to perform long-distance detection. The yaw angle of AUV is measured by a gyroscope. In addition, the horizontal position information of AUV is measured by the ultra-short baseline (USBL) system from Waterlinked [33], while the depth information is measured by a depth gauge. The communication unit ensures effective data transmission among different devices, including the acoustic communication with buoys and the electromagnetic communication with the control center.
The position and orientation vector of AUV in the inertial reference frame (IRF) is defined as η = [x, y, z, ψ]^T, where x, y and z denote the positions on the X, Y and Z axes, respectively, and ψ is the yaw angle. The linear and angular velocity vector in the body-fixed reference frame (BRF) is expressed as v = [u, v, w, r]^T, where u, v and w are the linear velocities on surge, sway and heave, respectively, and r denotes the angular velocity on yaw. Referring to [34] and [35], the motion model of AUV can be given as

η̇ = J v,  M v̇ + D(v) v + g(η) = τ,   (1)

where M = diag(m_u, m_v, m_w, I_r) is the inertia matrix; m_u, m_v and m_w represent the masses on surge, sway and heave, respectively, and I_r is the moment of inertia on yaw.
D(v) = diag(k_u + k_{u|u|}|u|, k_v + k_{v|v|}|v|, k_w + k_{w|w|}|w|, k_r + k_{r|r|}|r|) is the kinematic damping matrix, where k_u, k_v, k_w and k_r are the linear damping scales on surge, sway, heave and yaw, respectively, and k_{u|u|}, k_{v|v|}, k_{w|w|} and k_{r|r|} are the quadratic damping scales. g(η) ∈ R^4 is the hydrostatic force vector. J = [cos ψ, −sin ψ, 0, 0; sin ψ, cos ψ, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1] is the rotation matrix. Meanwhile, τ = [τ_u, τ_v, τ_w, τ_r]^T is the control input vector, which is required to be designed. Of note, τ_u, τ_v, τ_w and τ_r are the control inputs on surge, sway, heave and yaw, respectively.

Remark 1: From the perspective of application, the control input vector τ is generated by the motions of the thrusters. In view of this, we take the case of six thrusters as an example. Then the relationship between control input and thruster motion can be described as τ = HF, where F = [F_1, F_2, F_3, F_4, F_5, F_6]^T is the force vector generated by the thrusters (see Fig. 2). Meanwhile, H is the transformation matrix, whose entries depend on the angle α_i between F_i and the X axis for i ∈ {1, 2, 3, 4}.
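To make the thruster allocation τ = HF of Remark 1 concrete, the following sketch builds a hypothetical allocation matrix for one common layout: four horizontal thrusters mounted at angles α_i contributing to surge, sway and yaw, and two vertical thrusters acting on heave. The mounting angles, moment arm, and sign pattern are illustrative assumptions, not the paper's exact H.

```python
import numpy as np

def allocation_matrix(alpha, lever=0.2):
    """Hypothetical 4x6 allocation matrix H for tau = H @ F.

    Thrusters 1-4 are horizontal, mounted at angles alpha[i] to the
    X axis with moment arm `lever`; thrusters 5-6 are vertical and
    act only on heave. Signs and arms are illustrative assumptions.
    """
    H = np.zeros((4, 6))
    for i, a in enumerate(alpha):
        H[0, i] = np.cos(a)            # surge component of thruster i
        H[1, i] = np.sin(a)            # sway component of thruster i
        H[3, i] = lever * (-1) ** i    # alternating yaw moment arms
    H[2, 4] = H[2, 5] = 1.0            # heave from the vertical thrusters
    return H

alpha = np.deg2rad([45, 135, 225, 315])        # assumed mounting angles
H = allocation_matrix(alpha)
F = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 0.5])   # thruster forces (N)
tau = H @ F                                    # [tau_u, tau_v, tau_w, tau_r]
```

With this symmetric layout, equal horizontal forces cancel in surge, sway and yaw, while the vertical pair sums into heave.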
As has been mentioned above, AUV needs to maintain the communication connection with surface buoys. Along with this, we note that the connectivity can be evaluated by the received signal-to-noise ratio (SNR) [29]. In view of this, we use the SNR between AUV and buoy to characterize the communication performance, as depicted by Fig. 3, where the path loss is PL_j = 10 k# log10(d_AB,j) + 10 d_AB,j log10(α(f)). Let P_A = [x, y, z]^T denote the position vector of AUV, and P_B,j = [x_j, y_j, z_j]^T be the position vector of buoy j ∈ {1, ..., m_B}, where m_B represents the number of buoys. Referring to [36], the SNR between AUV and buoy j can be defined as in (2), where K#_dB is the average energy consumption of transmitting one bit of data in dB, k# is the spreading coefficient, d_AB,j = ‖P_A − P_B,j‖ denotes the relative distance between AUV and buoy j, α(f) is the acoustic absorption at frequency f, and N0_dB is the noise power spectral density in dB. Besides that, σ_SH and μ_MP are random parameters, reflecting the shadow fading and multipath fading effects, respectively.
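The path-loss expression in the Fig. 3 caption can be sketched in code. The function below evaluates PL_j = 10 k# log10(d) + 10 d log10(α(f)) and an SNR of the additive form described around (2); all numerical defaults (transmit level, spreading coefficient, absorption factor, noise level) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def snr_db(p_a, p_b, K_db=180.0, k_spread=1.5, alpha_f=1.011, N0_db=50.0,
           sigma_sh=0.0, mu_mp=0.0):
    """Sketch of the SNR between AUV at p_a and a buoy at p_b.

    Path loss combines spreading loss and frequency-dependent acoustic
    absorption alpha(f); sigma_sh and mu_mp stand in for the shadowing
    and multipath terms (deterministic here for illustration).
    """
    d = np.linalg.norm(np.asarray(p_a, float) - np.asarray(p_b, float))
    pl = 10 * k_spread * np.log10(d) + 10 * d * np.log10(alpha_f)
    return K_db - pl - N0_db + sigma_sh + mu_mp
```

As expected, the SNR decreases monotonically with the AUV-buoy distance, which is what the communication-maintenance reward later exploits.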
Note that the random parameters σ_SH and μ_MP in (2) are unknown to AUV. In our previous works [31], [32], an integral reinforcement learning-based estimator was developed to capture the unknown shadowing and multipath parameters. For that reason, we employ our previous channel estimation approach to predict the SNR. Then, the predicted mean of the SNR between AUV and buoy j is given by (3), where θ* denotes the estimated channel parameter; the prediction also involves a basis function, a location-related parameter H, and the variance of the measurement Y_dB.
In the following, we consider the obstacle-dense underwater environment. Generally, the off-line map of the AUV monitoring region can be known through remote sensing techniques. Meanwhile, AUV is capable of sensing obstacles online by using the onboard cameras. For the above consideration, the obstacles can be covered by convex columns or spheres. Particularly, we define the ratio of length to width of an obstacle as κ, and hence, the following two cases are considered: 1) if κ ∈ [1−δ#, 1+δ#], we regard the obstacle as a convex sphere, where δ# is a positive decimal chosen according to the requirement of the system; 2) otherwise, the obstacle is regarded as a convex column. Based on this, the mth sphere obstacle in the environment is denoted by S_m(O_sm, ρ_sm), where O_sm is its center and ρ_sm (ρ_sm > 0) is its radius for m = 1, 2, .... On the other hand, the nth column obstacle is denoted by C_n(O_cn, ρ_cn, h_cn), where O_cn is its center, ρ_cn (ρ_cn > 0) is its radius, and h_cn is its height for n = 1, 2, .... In summary, the following obstacle definition is provided.
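The sphere-or-column classification rule above can be sketched directly. The containers mirror S_m(O_sm, ρ_sm) and C_n(O_cn, ρ_cn, h_cn); the tolerance value δ# = 0.2 and the bounding-radius choices are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Sphere:          # mirrors S_m(O_sm, rho_sm)
    center: tuple
    radius: float

@dataclass
class Column:          # mirrors C_n(O_cn, rho_cn, h_cn)
    center: tuple
    radius: float
    height: float

def classify_obstacle(length, width, height, delta=0.2, center=(0.0, 0.0, 0.0)):
    """Cover an obstacle by a sphere or a column based on kappa = length/width.

    delta plays the role of delta# in the text; 0.2 is an assumed value.
    """
    kappa = length / width
    if 1 - delta <= kappa <= 1 + delta:
        # roughly isotropic footprint -> bounding sphere
        return Sphere(center, radius=max(length, width, height) / 2)
    return Column(center, radius=max(length, width) / 2, height=height)
```

A cube-like reef is covered by a sphere, while an elongated sea wall is covered by a column.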

B. Problem Formulation
The problems of communication-aware motion planning of AUV in an obstacle-dense environment are detailed as follows.
Problem 1 (Co-Optimization of Communication and Motion): Although communication and control usually depend on each other, they are often designed independently. For this reason, we attempt to co-optimize the communication and motion qualities for AUV. This problem is reduced to designing a co-optimization framework for AUV, with consideration of the AUV model in (1), the SNR prediction in (3) and the obstacle set.
Problem 2 (Obstacle Avoidance With Vision Measurements): The obstacle-dense environment makes it difficult to achieve collision-free motion planning for AUV by means of distance-based measurements. In view of this, it is necessary to adopt vision information to plan paths for AUV. This problem is reduced to designing a DDPG-based obstacle avoidance algorithm for AUV by using the on-board binocular cameras.

III. MAIN RESULTS
We first construct a co-optimization framework of communication and motion planning for AUV. Then, a DDPG-based obstacle avoidance algorithm is developed. Finally, the performance analysis of our solution is presented.

A. Co-Optimization of Communication and Motion
For ease of illustration, the state of AUV is defined as X = [η, v]^T, and the sampling interval is δ ∈ R+. With Taylor expansion, model (1) at time step k is rearranged as

X(k+1) = X(k) + δ [J v(k); M^{-1}(τ(k) − D(v(k)) v(k) − g(η(k)))].   (6)

Correspondingly, the state of the destination is defined as X_target = [x_g, y_g, z_g, 0, 0, 0, 0, 0]^T, where x_g, y_g and z_g indicate the position coordinates of the destination. It is assumed that X_target is decided by the control center, which can be sent to AUV by the data relay of surface buoys. Note that the learning aim of DDPG is to obtain the maximum cumulative reward after the motion planning procedure. Then, the motion planning of AUV includes three parts, i.e., destination tracking, obstacle avoidance and communication maintenance.
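The first-order discretization of model (1) can be sketched as a single Euler step. The code assumes a diagonal inertia vector and the linear-plus-quadratic damping D(v) defined in Section II; the hydrostatic vector g is passed in directly.

```python
import numpy as np

def step(eta, v, tau, M, d_lin, d_quad, g, delta=0.1):
    """One Euler step of the 4-DOF model (1), as in (6).

    A sketch assuming diagonal inertia M (length-4 array) and damping
    D(v) = diag(d_lin + d_quad * |v|); delta is the sampling interval.
    """
    psi = eta[3]
    # rotation matrix J from body-fixed to inertial frame
    J = np.array([[np.cos(psi), -np.sin(psi), 0, 0],
                  [np.sin(psi),  np.cos(psi), 0, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
    D = np.diag(d_lin + d_quad * np.abs(v))
    eta_next = eta + delta * (J @ v)
    v_next = v + delta * np.linalg.solve(np.diag(M), tau - D @ v - g)
    return eta_next, v_next
```

With zero input and zero velocity the state is a fixed point; with a nonzero surge velocity and no thrust, the damping term slows the vehicle while the position advances along the heading.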
To steer AUV towards the destination, the reward function of destination tracking is set as in (7). Next, a vision-based collision-avoidance strategy is provided to construct the reward function of obstacle avoidance. We note that AUV perceives the distance to an obstacle through parallax, i.e., the angle between AUV's views of the obstacle from two different observation points. When the parallax is large, AUV may collide with the obstacle; otherwise, AUV has a low risk of collision, as depicted by Fig. 4. Thereby, the vision-based reward function is divided into two categories: 1) the horizontal binocular cameras measure the horizontal parallax of the obstacle, from which the reward function in the horizontal direction is defined; 2) the vertical binocular cameras acquire the vertical parallax, which is used to define the reward function in the vertical direction.
To this end, we suppose obstacle k# in the obstacle set is detected by AUV at step k. It is worth noting that active or passive sonars can be adopted to detect the existence of obstacles. As sonar detection is not the focus of this paper, its detailed procedure is omitted here; readers can refer to [37] and [38] for the detailed sonar detection process. In this way, the following steps are conducted to develop the collision-avoidance reward function.
Step 1 (Discretize the Shape of Obstacle): We first discretize the shape of obstacle k#. As in Section II, if κ ∈ [1−δ#, 1+δ#], we regard obstacle k# as a convex sphere; otherwise, it is regarded as a convex column. The obstacle surface is discretized equally, and the discretized spatial points are mapped onto the sphere or column via orthographic projection. As shown in Fig. 5, we take a convex column as an example, where the mapped spatial point on the convex column in row i ∈ {1, 2, ...} and column j ∈ {1, 2, ...} is denoted by P_ij.
Step 2 (Acquire Obstacle's Position in Pixel Coordinates): Based on the image formation principle [39], the point P_ij is projected onto the camera's photoreceptor by the camera lens. In the following, we transform point P_ij from the world coordinate system to the image coordinate system. In this way, the operations of rigid-body transformation, perspective projection and matrix transformation are conducted. Based on this, the pixel coordinates of point P_ij with respect to cameras A, B, C and D are respectively obtained, where X_,i,j and Y_,i,j are the horizontal and vertical coordinates of point P_ij in the pixel coordinate system for camera ∈ {A, B, C, D}.
Step 3 (Calculate the Pixel Difference of Obstacle): The parallax angle can be indirectly obtained from the pixel difference. Thereby, we define X_o and Y_o as the horizontal and vertical coordinates of the image center in the image coordinate system, respectively. Regarding X_o and Y_o as benchmarks, one has the following pixel differences, i.e.,

d_x,i,j = X_,i,j − X_o,  d_y,i,j = Y_,i,j − Y_o,   (10)

where d_x,i,j denotes the pixel difference in the horizontal direction for camera ∈ {A, B, C, D}, and d_y,i,j denotes the pixel difference in the vertical direction.
The relative distances between the camera's optical center and the projection point are denoted by f_A,i,j, f_B,i,j, f_C,i,j and f_D,i,j. From (10), these relative distances are given as in (11) and (12), where f is the focal length of camera ∈ {A, B, C, D}.
Step 4 (Design of Parallax-Based Reward Function): Let θ_,i,j denote the angle between the line through the optical center of camera ∈ {A, B, C, D} and the binocular camera baseline, as depicted by Fig. 5. Using orthogonality and parallelism, these angles can be calculated as in (13)-(16). Based on (13), (14), (15) and (16), the horizontal parallax angle θ_H,i,j and the vertical parallax angle θ_V,i,j are obtained in (17). Of note, B_H and B_V represent the baseline lengths of the horizontal and vertical binocular cameras, respectively. In addition, ϒ is the maximum visible distance of the camera, which is a constant related to the current underwater environment. The detailed design procedure of the thresholds (i.e., P_θH and P_θV) is provided in Corollary 2.
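As a rough sketch of Steps 2-4 for one horizontal stereo pair, the parallax angle of a point can be computed from its pixel coordinates in the two images and the focal length. The pinhole-ray geometry below (angle between the two camera rays) is an assumption of this sketch, not the paper's exact formulas (13)-(17).

```python
import numpy as np

def parallax_angle(x_left, x_right, x_o, f):
    """Angle between the two camera rays to the same spatial point.

    x_left / x_right: horizontal pixel coordinates of the point in the
    left and right cameras; x_o: image-center column; f: focal length
    in pixels. Standard pinhole sketch under assumed parallel optical axes.
    """
    theta_a = np.arctan2(x_left - x_o, f)    # ray angle, left camera
    theta_b = np.arctan2(x_right - x_o, f)   # ray angle, right camera
    return abs(theta_a - theta_b)            # parallax grows as the point nears

# a nearer point yields a larger pixel disparity, hence a larger parallax
near = parallax_angle(420, 220, 320, f=800)
far = parallax_angle(340, 300, 320, f=800)
```

This captures the qualitative rule of Fig. 4: large parallax signals a close, threatening obstacle.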
Calculating the parallax angles for the other spatial points, the maximum horizontal/vertical parallax angles for obstacle k# are represented as θ_H,k# = max_{i,j} θ_H,i,j and θ_V,k# = max_{i,j} θ_V,i,j. Applying the above result to the other obstacles within the sensing range, we can easily obtain the parallax-based reward function at time step k in (18). Next, the reward function of communication efficiency is designed. Particularly, we measure the underwater communication quality by calculating the SNR between AUV and buoys. In order to guarantee the communication connection with surface buoys, AUV should maintain a minimum required SNR with at least one buoy. For that consideration, a threshold of the minimum required SNR is denoted as SNR_th, which is associated with the on-board communication equipment of AUV. If −−→SNR_dB(P_A, P_B,j) ≥ SNR_th, then AUV can communicate with buoy j ∈ {1, ..., m_B} while satisfying the quality-of-service requirement, i.e., the communication link is connected. Otherwise, the communication link is regarded as disconnected. Then, we define the following rules: 1) if the maximum SNR received by AUV is less than SNR_th, the communication quality of AUV at time step k is poor; this means AUV needs to enlarge its reward, such that the communication quality can be improved; 2) if the maximum SNR received by AUV is larger than SNR_th, the communication quality of AUV at step k is good, so AUV needs to keep its current reward, such that the current communication quality can be maintained.
Our previous channel estimation approach [31], [32] is employed to predict the SNR between AUV and buoy j ∈ {1, ..., m_B}. From (3), the maximum SNR received by AUV at time step k is defined as −−→SNR_max(k). Accordingly, the reward function of communication efficiency is constructed in (19), where sign(·) denotes the sign function. From (7), (18) and (19), the total reward function for AUV communication and control can be given as in (20), where K_1, K_2 and K_3 are positive weighting factors. Of note, the roles of K_1, K_2 and K_3 are to balance tracking, obstacle avoidance and communication efficiency.
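The composition of the three reward terms in (20) can be sketched as follows. The individual terms are stand-ins (the paper's (7), (18) and (19) are not reproduced here): a negative-distance tracking term, a penalty when the largest parallax exceeds its threshold, and a sign-based communication term as suggested by (19).

```python
import numpy as np

def total_reward(dist_to_goal, parallax_max, parallax_thresh,
                 snr_max, snr_th, K1=1.0, K2=1.0, K3=1.0):
    """Illustrative weighted sum R = K1*R_track + K2*R_obst + K3*R_comm.

    All three component rewards are assumed surrogates for (7), (18), (19).
    """
    r_track = -dist_to_goal                             # closer -> larger reward
    r_obst = -max(0.0, parallax_max - parallax_thresh)  # penalize threatening parallax
    r_comm = np.sign(snr_max - snr_th)                  # +1 connected, -1 disconnected
    return K1 * r_track + K2 * r_obst + K3 * r_comm
```

The weights K1, K2, K3 trade off the three objectives exactly as described for (20): raising K3, for instance, makes the planner sacrifice path length to stay within acoustic range of a buoy.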
Based on (20), the value function is defined as

V(s(k)) = Σ_{i=0}^{∞} γ^i R(s(k+i), τ(k+i)),   (21)

where γ ∈ (0, 1] represents the discount factor, whose role is to reflect the impact of future states on the present. Hence, the optimization of (21) is to select τ*(k), such that an optimal update policy of τ(k) can be obtained, i.e., τ*(k) = arg max_{τ(k)} V(s(k)).   (22)

B. DDPG-Based Obstacle Avoidance Algorithm
This section designs a DDPG-based obstacle avoidance algorithm, such that the optimal solution τ*(k) in (22) can be derived. It is noted that DDPG is a deep deterministic policy gradient algorithm with deterministic output actions [40], [41]. Compared with random policies, it tends to require fewer samples for gradient estimation. Besides, it can handle continuous action spaces, which the classical deep Q-network (DQN) algorithm cannot. For this purpose, we consider the dynamic model (6), whose state and action spaces are set as follows. 1) State: at time step k, the state space of AUV is described as s(k). 2) Action: the control input vector τ(k) is regarded as the action space of AUV. Particularly, the main framework of the motion planning solution for AUV is described by Fig. 6.
The DDPG-based obstacle avoidance algorithm includes a main network and a target network. Specifically, the main network consists of the actor network π(s(k)|ϑ_π) and the critic network Q(s(k), τ(k)|w_Q), where the role of the actor network is to select the current action τ(k) based on the current state s(k), and the role of the critic network is to evaluate the value of actions. On the other hand, the target network consists of the target actor network π′(s(k)|ϑ_π′) and the target critic network Q′(s(k), τ(k)|w_Q′). Of note, the role of π′(s(k)|ϑ_π′) is to select the next action τ(k+1) based on the next state s(k+1) via the memory-pool sampling data, while the role of Q′(s(k), τ(k)|w_Q′) is to fit the target value function for the subsequent network update. As shown in Fig. 7, the following steps are conducted to seek τ*(k).
Step 1 (Environmental Experience Collection): Based on the current state X(k), one obtains the state set s(k), i.e., the binocular parallax angles θ_H,k# and θ_V,k# are acquired by (17), while −−→SNR_max(k) is estimated by (3). Subsequently, a random exploration action is generated by (23), where π(s(k)|ϑ_π) is the mapping output of the actor network under state s(k) and weight ϑ_π, and N# is the Ornstein-Uhlenbeck (OU) noise, obeying a Gaussian distribution. From (3) and s(k), one can easily obtain the reward R(s(k), τ(k)) by (20). In addition, the state of AUV at time step k+1 can be obtained through (6), (17) and (3). Accordingly, the collected environmental experience at time step k is the tuple (s(k), τ(k), R(s(k), τ(k)), s(k+1)). Repeating the above procedure over multiple episodes, one can easily obtain the total environmental experience for AUV.

Step 2 (Update the Weights of Main Network): The loss function of the critic network is defined as Loss(w_Q_k) in (25) with (26). In order to minimize Loss(w_Q_k), we employ gradient descent to update the weight w_Q_k, and hence, one has (27). From (27), one updates the weight w_Q_k as in (28), where γ_1 ∈ (0, 1) is the learning rate and w_Q_{k+1} is the weight of the critic network at time step k+1.
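The exploration noise N# of Step 1 can be sketched as a standard Ornstein-Uhlenbeck process driven by Gaussian increments. The θ, σ and dt values below are common defaults, not values taken from the paper.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise N# added to the actor output.

    theta: mean-reversion rate, sigma: diffusion scale, dt: time step.
    All numerical defaults are assumptions of this sketch.
    """
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.state = np.zeros(dim)
        self.rng = np.random.default_rng(seed)

    def sample(self):
        # mean-reverting step: dx = -theta * x * dt + sigma * sqrt(dt) * dW
        dx = (-self.theta * self.state * self.dt
              + self.sigma * np.sqrt(self.dt)
              * self.rng.standard_normal(self.state.shape))
        self.state = self.state + dx
        return self.state
```

Temporally correlated OU noise is a common choice over white noise for thruster-level exploration because consecutive action perturbations stay smooth, which suits inertial vehicle dynamics like (1).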
Algorithm 1 DDPG-Based Motion Planning for AUV

In the following, the weight ϑ_π is updated such that the loss function of π(s(k)|ϑ_π) can be maximized. Along with this, the loss function of π(s(k)|ϑ_π) is defined in (29). Gradient ascent is employed to update ϑ_π_k, and hence, one has (30). With (30), one updates the weight ϑ_π_k as in (31), where γ_2 ∈ (0, 1) denotes the learning rate and ϑ_π_{k+1} denotes the weight of π(s(k)|ϑ_π) at time step k+1.
Accordingly, the weights of the target network are updated by a soft replication from the main network every N_p time steps. In view of this, one has (32), where w_Q′ is the weight of the target critic network Q′(s(k), τ(k)|w_Q′), and ϑ_π′ is the weight of the target actor network π′(s(k)|ϑ_π′). In addition, ρ ∈ (0, 1) is the learning rate.
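The soft replication in (32) blends each main-network weight into its target counterpart. The sketch below treats the networks as dictionaries of weight arrays; ρ = 0.1 in the usage line is an assumed value for illustration.

```python
import numpy as np

def soft_update(main_weights, target_weights, rho=0.01):
    """Soft target update in the spirit of (32).

    Each target weight moves a fraction rho toward the corresponding
    main-network weight; the rest is retained.
    """
    return {name: rho * main_weights[name] + (1 - rho) * target_weights[name]
            for name in main_weights}

main = {"w": np.array([1.0, 1.0])}
target = {"w": np.array([0.0, 0.0])}
target = soft_update(main, target, rho=0.1)   # target moves 10% toward main
```

Keeping ρ small makes the target networks change slowly, which stabilizes the bootstrapped value targets used in the critic update of Step 2.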
Repeating the above process, the optimal weights w_Q* and ϑ_π* can be obtained by minimizing (25) and maximizing (29).
Step 3 (Output the Optimal Action of AUV): Substituting ϑ_π* into the main actor network, one can obtain the optimal action τ*(k) at time step k, i.e., τ*(k) = π(s(k)|ϑ_π*). Extending τ*(k) to the other time steps, the optimal actions of AUV are obtained. Correspondingly, the optimal states of AUV are obtained by substituting τ*(k) into (6). For clear illustration, the complete DDPG-based motion planning procedure is shown in Algorithm 1, where T_e is the episode length of the exploration process, T_m is the total number of training episodes, and k_m is the maximum time step of every episode.

C. Performance Analysis
We first study the convergence of the DDPG algorithm. To support the proof, the following properties are presented [42].
Property 1 (Lipschitz Continuity): For a differentiable function f: S ⊆ R → R, if there exists a Lipschitz constant L > 0 for its gradient, then for all a, b ∈ S,

f(b) ≤ f(a) + ∇f(a)(b − a) + (L/2)(b − a)².

Property 2 (Strong Convexity): If a differentiable function f: S ⊆ R → R is U-strongly convex with U > 0, then for all a, b ∈ S,

f(b) ≥ f(a) + ∇f(a)(b − a) + (U/2)(b − a)².

Based on this, we have the following theorem.
Theorem 1: If the iteration satisfies condition (37), where Loss*(w_Q) denotes the optimal value of the loss function, ϵ > 0 is a small constant, 0 < ∂/ϖ < 1, and ∂ and ϖ are respectively the strong-convexity constant and the Lipschitz constant of Loss(w_Q), then the loss function can converge to the optimal value.
Proof: First, we prove that the loss error is a monotonically decreasing function of the weight w_Q_k. To this end, we employ Property 1 to obtain (38). From (28) and (38), we can further obtain (39). Setting γ_1 = ϖ^{-1}, (39) can be rearranged as (40).
Similarly, based on Property 2, one has We set that w , and hence (41) can be rearranged as Note that the final iteration output is regarded as the optimal loss function Loss * (w Q ).Thus, from (42), one has Subtracting Loss * (w Q ) from both sides of ( 40), one has Taking ( 43) into (44), one has Note that Loss(w > 0 always hold.Thus we extend (45) to the whole iteration sequence, i.e., from the initial iteration to the T m k m N p th iteration during the T m episodes, through which we have We regard that Loss(w

Based on this, we can further have (47), which means Loss(w^Q_k) can converge to the optimal value if (37) is satisfied. Once Loss(w^Q_k) reaches the optimal value, the optimal control policy τ*(k) can be derived subsequently.
Typically, only the horizontal parallax angle is adopted to achieve obstacle avoidance, but this limits the flexibility of the AUV's paths. Alternatively, this paper jointly employs the horizontal and vertical parallax angles to achieve collision-free motion planning. The following corollary is then provided.
Corollary 1: By employing the vertical binocular cameras, the measured vertical parallax angle θ_{V,i,j} in (17) correlates with the motion planning of AUV along the Z axis.
Proof: Suppose the vertical binocular camera ¯ ∈ {C, D} has detected the target point [X_{¯,i,j}, Y_{¯,i,j}]^T at depth z. When AUV moves down (i.e., the value of z decreases), Y_{¯,i,j} will decrease; otherwise, it will rise. Then we have (48), where χ_¯ is a fixed positive scale factor. Based on (48), d_{¯y,i,j} in (10) can be rearranged as (49). Assuming the horizontal position is fixed, f_{¯,i,j} in (12) is a fixed constant, denoted as f^#_¯. From (49) and (50), we can recalculate θ_{V,i,j} as (51). Clearly, θ_{V,i,j} is related to z, and the motion of AUV along the Z axis must be adjusted during the collision-avoidance procedure, since f^#_¯ and χ_¯ are fixed constants. In (17), the thresholds of P_{θH} and P_{θV} are provided. To verify their correctness, the following corollary is given.
Corollary 2: Given the maximum visible distance ϒ of a binocular camera, the threshold ranges of P_{θH} and P_{θV} are related to the camera baselines, and can be set within the ranges derived below. Proof: We first study the lower limit of P_{θH}, i.e., the case when θ_{H,i,j} takes its maximum value. As shown in Fig. 8(a), this case occurs when θ_{A,i,j} = θ_{B,i,j} = tan^{−1}(f_A / X_o). Based on this, the lower limit of P_{θH} in (17) can be determined as (53). On the contrary, the upper limit of P_{θH} is acquired by minimizing θ_{H,i,j}; this case occurs when point P_{ij} lies on the visual boundary, as shown in Fig. 8(b). By using the inverse tangent function, one has (54) and (55). Based on (53), (54) and (55), the upper limit of P_{θH} in (17) can be determined as (56). Similarly, the threshold of P_{θV} can be calculated by (57). Clearly, the thresholds of P_{θH} and P_{θV} are related to the binocular camera baselines (i.e., B_H and B_V), since f_A, f_C, X_o, Y_o and ϒ are constants. That completes the proof.
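As a hedged illustration of Corollary 2 (not the paper's exact Eqs. (53)–(57)), suppose the parallax of a point midway between the two cameras at distance d under baseline B is θ = 2 tan⁻¹(B/(2d)); then the smallest usable parallax corresponds to the maximum visible distance ϒ, so the threshold grows with the baseline. The baselines and range below are assumed values:

```python
import math

def parallax_angle(baseline_m, distance_m):
    # Hypothetical pinhole-stereo parallax of a point centered between the cameras.
    return 2.0 * math.atan(baseline_m / (2.0 * distance_m))

def parallax_threshold(baseline_m, max_visible_m):
    # Smallest usable parallax: a point at the maximum visible distance (Upsilon).
    return parallax_angle(baseline_m, max_visible_m)

B_H, B_V, Upsilon = 0.20, 0.15, 10.0   # assumed baselines (m) and visible range (m)
print(parallax_threshold(B_H, Upsilon))  # horizontal threshold, rad
print(parallax_threshold(B_V, Upsilon))  # vertical threshold, rad
```

Under this model the parallax shrinks monotonically with distance, so any obstacle inside the visible range produces a parallax above the threshold.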
1) Vision-Based Motion Planning With SNR and DDPG: As presented in Section III, a binocular vision-based path planning solution that accounts for SNR and DDPG is designed for AUV. To verify its effectiveness, the estimated channel parameters are given as follows: θ* = [−50, 3.95]^T, σ_SH = 3.3419 and ε = −100 dBm. Based on these parameters, we employ (3) to predict the SNRs over the whole monitoring region, as shown in Fig. 9(a). Based on this, the motion trajectory of AUV is depicted in Fig. 9(b). Correspondingly, the motion states of AUV along the X-axis, Y-axis, Z-axis and yaw are shown in Fig. 9(c). For clarity, we define the tracking error vector as e_track = [x − x_g, y − y_g, z − z_g]^T, which is shown in Fig. 9(d). From Figs. 9(b)-(d), we know the tracking task can be achieved, since the tracking error converges to zero. Similarly, the relative distance between AUV and column obstacle n ∈ {1, . . ., 5} is defined as e_cn = ∥X − O_cn∥ − ρ_cn, while the relative distance between AUV and sphere obstacle m ∈ {1, 2} is e_sm = ∥X(k) − O_sm∥ − ρ_sm. Along with this, the relative distances between AUV and obstacles are shown in Fig. 9(e). Clearly, the collision-avoidance task is accomplished by AUV, since all the relative distances are greater than zero. Note that the communication channel quality, i.e., SNR, is incorporated into the motion planning procedure, and our objective is to guarantee that the SNR for at least one buoy is greater than the threshold. In view of this, Fig. 9(f) presents the SNR between AUV and buoy j ∈ {1, 2, 3}. The SNR is smaller than the threshold at k = 1, which means the communication quality is poor at the initial stage. After applying the communication-aware motion solution, the SNR shows an increasing trend during 1 < k ≤ 113. When k > 113, the SNR for at least one buoy exceeds the threshold, indicating that the communication connection with at least one buoy can be guaranteed.
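A sketch of how such an SNR map could be generated from the estimated parameters, assuming a simple log-distance channel form (the paper's exact Eq. (3) and the role of the shadowing term σ_SH are not reproduced here; `predicted_snr_db` is a hypothetical helper):

```python
import numpy as np

def predicted_snr_db(distance_m, theta=(-50.0, 3.95), noise_dbm=-100.0):
    """Hypothetical log-distance form: received power theta[0] - 10*theta[1]*log10(d),
    minus the noise floor epsilon. The paper's actual channel model may differ."""
    rx_dbm = theta[0] - 10.0 * theta[1] * np.log10(np.asarray(distance_m, dtype=float))
    return rx_dbm - noise_dbm

# Evaluate the predicted SNR at a few AUV-to-buoy ranges.
d = np.linspace(1.0, 500.0, 5)
print(predicted_snr_db(d))
```

Under these assumed parameters the SNR falls off monotonically with range, which is what drives the planner to steer the AUV back toward a buoy once the threshold is violated.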
Besides, 1500 episodes are conducted in Algorithm 1 to guarantee the convergence of the loss function. For this design, the reward value in (20) is shown in Fig. 9(g), which indicates that the network weights converge to the optimal. Based on this, the optimal control strategy τ* can be derived, as shown in Fig. 9(h). These results verify the effectiveness of the vision-based motion planning solution in this paper.
2) Comparison With Distance-Based Obstacle Avoidance: Distance-based obstacle avoidance solutions have been proposed in [10] and [11]. However, such solutions are not well suited to obstacle-dense environments. To verify this judgement, we incorporate the distance measurements into the motion planning of AUV; for ease of comparison, the SNR is removed from the reward function. In view of this, the motion trajectories of AUV based on distance and parallax measurements are shown in Fig. 10(a). Correspondingly, the values of the reward functions are depicted in Fig. 10(b), while the relative distances between AUV and obstacles are shown in Figs. 10(c)-(d). From Figs. 10(b)-(d), we note that the reward functions with the above two solutions both converge to the optimal; however, the vision-based solution can safely avoid collision at a closer distance. Meanwhile, the parallax-based motion planning solution in this paper can reflect the size information of AUV, which is very important for the practical application of AUV in obstacle-dense environments. For that consideration, the following scenarios are considered: 1) a sonar sensor is installed on the middle-front region of AUV; 2) a pair of binocular cameras is installed on the horizontal region of AUV. Once an obstacle is detected, the detection result of Scenario 1 with a narrow AUV (e.g., 1 m wide) is shown in Fig. 10(e), while the detection result of Scenario 1 with a wide AUV (e.g., 2 m wide) is shown in Fig. 10(f). From Figs. 10(e)-(f), we find the detection results have the same value. On the other hand, the detection results of Scenario 2 are shown in Figs. 10(g)-(h). Clearly, the wider AUV triggers a larger parallax value than the narrower vehicle in Scenario 2. Therefore, the parallax-based solution in this paper can indirectly reflect the size information of the AUV.
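The size effect in Scenario 2 can be illustrated with a toy calculation, assuming the horizontal parallax of an obstacle at range d subtended by cameras spaced by the vehicle width W is θ = 2 tan⁻¹(W/(2d)) (an assumed pinhole-stereo model, not the paper's Eq. (17)):

```python
import math

def horizontal_parallax(width_m, distance_m):
    # Hypothetical: angle subtended at the obstacle by two horizontal cameras
    # mounted at the hull edges, so the baseline equals the vehicle width.
    return 2.0 * math.atan(width_m / (2.0 * distance_m))

d = 5.0                                 # assumed obstacle range (m)
narrow = horizontal_parallax(1.0, d)    # 1 m wide AUV
wide = horizontal_parallax(2.0, d)      # 2 m wide AUV
# A single forward sonar returns the same range d for both vehicles (Scenario 1),
# while the parallax grows with width, encoding the vehicle size (Scenario 2).
print(narrow, wide)
```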
3) Comparison With the Single Horizontal Cameras: As mentioned above, this paper jointly employs the horizontal and vertical parallax angles to achieve collision-free motion planning, which can improve the flexibility of obstacle avoidance. To verify this merit, we consider the following two cases: 1) only the horizontal parallax angle is adopted to achieve obstacle avoidance, e.g., the solution in [13]; 2) the horizontal and vertical parallax angles are jointly adopted to achieve obstacle avoidance, i.e., the solution in this paper. Besides, three column obstacles are set to block the way of AUV. Accordingly, the motion trajectories of AUV in the above two cases are shown in Fig. 11(a). Correspondingly, the states of AUV in the above cases are shown in Figs. 11(b)-(c).
To show more clearly, we define the path length as the total distance traveled from the start to the completion of obstacle avoidance. The path length and total time step in the above cases are shown in Fig. 11(d). Clearly, the path length and total time step in Case 2 are shorter than those in Case 1, which verifies the effectiveness of Corollary 1. More specifically, Case 1 runs 354 time steps to finish the tracking task, while Case 2 only runs 285 time steps. The results in Figs. 11(b)-(d) demonstrate that the adoption of horizontal and vertical cameras is meaningful.
4) Tradeoff of SNR During the Motion Planning: As mentioned above, the consideration of SNR can ensure the communication efficiency during the motion planning process. However, it may degrade the tracking performance. To reflect this situation, the following three cases are considered: 1) communication efficiency plays a major role in the reward function, e.g., K_1 = 0.6, K_2 = 0.01 and K_3 = 26; 2) tracking stability plays a major role in the reward function, e.g., K_1 = 0.7, K_2 = 0.01 and K_3 = 10; 3) communication efficiency is ignored, e.g., K_1 = 0.7, K_2 = 0.01 and K_3 = 0. Based on the above design, the motion trajectories of AUV are shown in Fig. 12(a). Correspondingly, the SNRs between AUV and buoy j ∈ {1, 2, 3} for the above three cases are shown in Figs. 12(b)-(d), respectively. Meanwhile, the tracking errors are presented in Fig. 12(e). From Figs. 12(a)-(e), we find that there is a tradeoff between communication effectiveness and tracking stability. Therefore, researchers can adjust the weights according to the actual engineering requirements. In addition, the values of the reward functions are depicted in Fig. 12(f), which demonstrates the optimality of the DDPG-based motion-planning algorithm in this paper.
Fig. 13. Comparison with Q learning and DQN, e.g., [19] and [20].
5) Comparison With the Q Learning and DQN Methods: The Q learning method was adopted in [19] to perform collision-avoidance navigation, while the DQN method was utilized in [20] to achieve global path planning. However, these methods cannot be directly applied to a continuous action space, since they select the action that maximizes the action-value function over a finite action set. For comparison purposes, the Q learning and DQN methods are employed in this section to plan the path of AUV. Based on this, the motion trajectories of AUV with the above three methods are shown in Fig. 13(a). Correspondingly, the path length and time cost are shown in Fig. 13(b), and the time consumption of each time step is presented in Fig. 13(c). The reward values of the Q learning and DQN methods are shown in Figs. 13(d)-(e). It is obvious that the traditional optimization methods (i.e., Q learning and DQN) can minimize the cost function in general. However, the path length and time cost of the DDPG method are the smallest among the three methods.
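The continuity limitation can be seen in a toy one-dimensional example: a greedy argmax over a discretized action grid (Q learning / DQN style) can only land on a grid point, whereas a gradient step on the action (DDPG style) reaches the continuous optimum. All quantities here are illustrative:

```python
import numpy as np

# Toy action-value curve with a continuous optimum at a* = 0.37.
def q(a):
    return -(a - 0.37) ** 2

# Q learning / DQN: greedy argmax over a FINITE action grid.
grid = np.linspace(-1.0, 1.0, 9)               # 9 discrete thrust levels
a_discrete = float(grid[np.argmax(q(grid))])   # best available grid point only

# DDPG-style actor update: gradient ascent on q along a continuous action.
a_cont = 0.0
for _ in range(200):
    a_cont += 0.1 * (-2.0 * (a_cont - 0.37))   # dq/da = -2(a - 0.37)
```

The discrete pick is stuck at the nearest grid level, while the continuous update converges arbitrarily close to the true optimum, mirroring why DDPG suits AUV thrust commands.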
Next, the energy consumption indicator is adopted to quantitatively measure the control efforts. Referring to [43], the energy consumption indicator is defined over the control inputs up to the termination step t_f. Fig. 13(f) shows the energy consumption of the controller under the Q learning, DQN and DDPG methods. From Fig. 13(f), we find that the energy consumption indicator of DDPG is the smallest among the three methods, which reflects that the DDPG-based solution in this paper can improve the collision-avoidance efficiency in control efforts.
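A hedged sketch of such an indicator, assuming it accumulates the squared control effort over the mission up to t_f (the exact definition in [43] may differ, e.g., in weighting individual thrusters):

```python
import numpy as np

def energy_consumption(tau_seq, dt=0.1):
    """Assumed indicator: sum over k of ||tau(k)||^2 * dt up to the termination
    step. This is a common control-effort proxy, not necessarily Eq. in [43]."""
    tau_seq = np.asarray(tau_seq, dtype=float)
    return float(np.sum(tau_seq ** 2) * dt)

# Two hypothetical control sequences of 100 steps each (3 thrust channels).
smooth = [[0.5, 0.5, 0.1]] * 100       # gentle, DDPG-like commands
aggressive = [[1.5, 1.5, 0.3]] * 100   # bang-bang-like commands
print(energy_consumption(smooth), energy_consumption(aggressive))
```

Under this proxy, smoother control sequences accumulate markedly less energy, which is the behavior the comparison in Fig. 13(f) quantifies.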

B. Experimental Results
In this section, the experimental results are provided. Specifically, the hardware is mainly composed of four parts: 1) Binocular vision system: built from four monocular cameras for real-time obstacle detection. 2) Communication system: an acoustic transducer and an acoustic modem that adopt the orthogonal frequency division multiplexing (OFDM) scheme are employed for wireless communication with the surface buoys, while video transmission is accomplished over the umbilical link. 3) Localization system: USBL is adopted for the self-localization of AUV. 4) AUV: BlueROV2, one of the most affordable high-performance underwater vehicles, acts as the major structure of AUV in our system. Besides, the control cabin mainly contains an STM32 microprocessor and an NVIDIA microcomputer: the STM32 microprocessor controls the motion of AUV, while the NVIDIA microcomputer processes the video streams from the vision system.
1) Experimental Results in Obstacle-Sparse Environment: In this case, one underwater obstacle exists in the environment. For clear verification, we first ignore the influence of SNR; the experimental setup is depicted in Fig. 14(a). By employing the parallax vision-based measurement, DDPG is performed to generate the motion trajectory of AUV, as shown in Fig. 14(b). At the same time, we also employ the distance-based measurement to plan the motion of AUV, which is likewise shown in Fig. 14(b). Clearly, the parallax vision-based solution in this paper can improve the collision-avoidance efficiency in path lengths and control efforts, since the path length of the distance-based solution is 10.29 m while that of the parallax vision-based solution is 9.09 m. Besides, the reward functions with the above two measurements are presented in Fig. 14(c), while the relative distances between AUV and obstacles are shown in Fig. 14(d). From Figs. 14(c)-(d), we find that the reward functions with the above two measurements both converge to the optimal; however, the parallax vision-based solution can safely avoid collision at a closer distance (i.e., 0.23 m). The above results demonstrate the necessity and significance of the parallax vision-based solution in this paper. Particularly, the demo for these results is provided in the first part of [44].
In the following, we incorporate the SNR quality into the motion planning of AUV. By employing the parallax-based measurement, the corresponding experimental results are shown in Fig. 15, while the obstacle recognition results and video screenshots are shown in Figs. 16(c)-(h). These results reflect the effectiveness of the collision-avoidance algorithm. Particularly, the demo for these results is provided in the second part of [44].
2) Experimental Results in Obstacle-Dense Environment: In this section, the obstacle-dense environment is considered, where three obstacles exist in the workspace, as shown in Fig. 17(a). We first ignore the influence of SNR; the motion trajectories of AUV with the parallax-based and distance-based solutions are shown in Fig. 17(b). Clearly, the path length of the parallax-based solution (i.e., 8.03 m) is smaller than that of the distance-based solution (i.e., 8.47 m). Besides, the reward functions with the above two solutions are presented in Fig. 17(c). The relative distances between AUV and obstacles are shown in Fig. 17(d), where C1, C2 and C3 denote obstacle 1, obstacle 2 and obstacle 3, respectively. We can see that the reward functions with the above two solutions both converge to the optimal; meanwhile, the parallax-based solution can avoid obstacles at a closer distance in the obstacle-dense environment. Particularly, the demo for these results is provided in the third part of [44].
Along with this, we incorporate the SNR quality into the motion planning of AUV. Particularly, the DDPG-based solution in this paper is compared with the Q learning method [19] and the DQN method [20]. By employing the parallax-based measurement, the motion trajectories of AUV with the above three methods are shown in Fig. 18(a).

V. CONCLUSION AND FUTURE WORKS
This paper presents a communication-aware motion planning solution for AUV in obstacle-dense environments. By integrating our previously proposed channel estimation approaches, a binocular vision-based co-optimization framework is constructed for AUV to balance the communication quality and motion stability. Based on this, the vision measurements are employed to design a DDPG-based obstacle avoidance algorithm for AUV, which drives AUV away from obstacles and toward the destination. Finally, simulation and experimental studies are both performed to validate the effectiveness.
In the future, we will extend the binocular vision-based deep learning method to the swarm control of multiple AUVs in more complex maritime transportation systems. Besides, the co-design of underwater sensing, transmission and control is also left for future work.

Manuscript received 13 January 2023; revised 5 June 2023; accepted 12 July 2023. Date of publication 25 July 2023; date of current version 29 November 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 62222314, Grant 61973263, and Grant 62033011; in part by the Youth Talent Program of Hebei under Grant BJ2020031; in part by the Distinguished Young Foundation of Hebei Province under Grant F2022203001; in part by the Excellent Youth Project for NSF of Hebei Province under Grant F2021203056; and in part by the Central Guidance Local Foundation of Hebei Province under Grant 226Z3201G. The Associate Editor for this article was Y. Li. (Corresponding author: Jing Yan.)

Fig. 1. Description of an AUV communication-aware motion planning system in obstacle-dense environment.

Fig. 5. Schematic diagram of the horizontal and vertical binocular cameras for obstacle k# ∈ k.

Step 2 (Training the Entire Network Weights): After the environmental exploration process, we need to update the network weights, such that the loss function of Q(s(k), τ(k)|w^Q) can be minimized while the loss function of π(s(k)|ϑ^π) can be maximized. To this end, N empirical data sets are extracted from the memory pool, and the corresponding time steps are re-labeled as {1_L, . . ., N_L}, where 1 ≤ 1_L ≤ k and 1 ≤ N_L ≤ k. Particularly, an arbitrary empirical data set at time step k̃ is denoted by {s(k̃), τ(k̃), R(s(k̃), τ(k̃)), s(k̃ + 1)}, where k̃ ∈ {1_L, . . ., N_L}. Based on this, at time step k̃, the loss function of Q(s(k), τ(k)|w^Q) is constructed as follows.
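The elided loss can be stated in the standard DDPG form (an assumption, since the paper's exact equation is not reproduced in this excerpt; Q′, π′, w^{Q′} and ϑ^{π′} denote the target critic, target actor, and their weights):

```latex
% Standard DDPG critic target and loss over the N sampled transitions
% (assumed form; the paper's numbered equation may differ in notation)
\begin{align}
y_{\tilde{k}} &= R\big(s(\tilde{k}),\tau(\tilde{k})\big)
  + \gamma\, Q'\!\Big(s(\tilde{k}+1),\,
    \pi'\big(s(\tilde{k}+1)\,\big|\,\vartheta^{\pi'}\big)\,\Big|\,w^{Q'}\Big), \\
\mathrm{Loss}(w^{Q}) &= \frac{1}{N}\sum_{\tilde{k}\in\{1_L,\dots,N_L\}}
  \Big(y_{\tilde{k}} - Q\big(s(\tilde{k}),\tau(\tilde{k})\,\big|\,w^{Q}\big)\Big)^{2}.
\end{align}
```

Minimizing this loss over w^Q while ascending the critic's estimate with respect to the actor weights ϑ^π is what Step 2 alternates between.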

Theorem 1: Consider the AUV model (1) with the DDPG-based motion planning algorithm. If the parameters T_m, k_m and N_p satisfy the following condition

Fig. 9. Simulation results for the binocular vision-based motion planning of AUV with SNR and DDPG.



Fig. 12. Tradeoff of the SNR during the motion planning.

Fig. 14. Experimental setup and results when the SNR is ignored in obstacle-sparse environment.

Fig. 15. Experimental results when the SNR is considered in obstacle-sparse environment.

Fig. 16. Obstacle recognition and the video screenshots for the single-obstacle avoidance procedure.

Fig. 17. Experimental results when the SNR is ignored in obstacle-dense environment.

Fig. 18. Experimental results when the SNR is considered in obstacle-dense environment.

Fig. 19. Video screenshots of DDPG-based solution when the SNR is considered in obstacle-dense environment.

TABLE I: COMPARISON WITH THE OTHER LITERATURE

TABLE II: SOME PARAMETERS OF AUV AND DDPG NETWORK