
Local Path Planning: Dynamic Window Approach With Q-Learning Considering Congestion Environments for Mobile Robot




Abstract:

In recent years, autonomous mobile robots have significantly increased in prevalence due to their ability to augment and diversify the workforce. One critical aspect of their operation is effective local path planning, which considers dynamic constraints. In this context, the Dynamic Window Approach (DWA) has been widely recognized as a robust local path planning. DWA produces a set of path candidates derived from velocity space subject to dynamic constraints. An optimal path is selected from path candidates through an evaluation function guided by fixed weight coefficients. However, fixed weight coefficients are typically designed for a specific environmental context. Consequently, changes in environmental conditions such as congestion levels, road width, and obstacle density could potentially lead the evaluation function to select inefficient paths or even result in collisions. To overcome this challenge, this paper proposes the dynamic weight coefficients based on Q-learning for DWA (DQDWA). The proposed method uses a pre-learned Q-table that comprises robot states, environmental conditions, and actions of weight coefficients. DQDWA can use the pre-learned Q-table to dynamically select optimal paths and weight coefficients that better adapt to varying environmental conditions. The performance of DQDWA was validated through extensive simulations and real experiments to confirm its ability to enhance the effectiveness of local path planning.
Published in: IEEE Access ( Volume: 11)
Page(s): 96733 - 96742
Date of Publication: 01 September 2023
Electronic ISSN: 2169-3536

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.
SECTION I.

Introduction

As the world grapples with declining birthrates and an aging population, these demographic shifts are increasingly viewed as serious issues [1], [2]. To mitigate the resultant strain on the workforce, there has been a growing emphasis on deploying autonomous mobile robots in contexts such as warehouses [3] and factories [4]. These robots need to navigate different environments autonomously, which requires the integration of a diverse set of technologies, including localization [5], mapping [6], perception [7], and path planning [8].

This paper primarily explores path planning. Path planning technology is divided into global and local path planning [9]. Global path planning generates a path from the starting point to the destination based on a pre-existing map [10], [11]. However, it fails to account for unknown or unexpected obstacles in real-world environments. Therefore, in dynamic human workspaces, robots should reach their destinations and avoid obstacles autonomously and adaptively [12], [13]. Consequently, the focus has shifted towards local path planning, which accounts for obstacles that do not appear on pre-established maps.

This research addresses local path planning that considers obstacles not included on pre-built maps [14], [15], [16], [17], [18], [19]. While dynamic obstacles are certainly a consideration [20], [21], [22], [23], this paper focuses on static environments like factories and warehouses. The Dynamic Window Approach (DWA), which accounts for dynamic constraints, has emerged as a prevalent local path planning method [24]. Despite numerous reported improvements to DWA [15], [25], [26], its limitations persist. In particular, DWA's fixed weight coefficients, which determine the optimal path based on factors such as goal position, obstacle distance, and robot velocity, fail to adapt to changes in environmental situations. This can lead to the selection of inefficient paths or even collisions, especially in confined or crowded spaces like factories and warehouses.

To address these issues, research on dynamic weight coefficients for DWA has been carried out. Abubakr et al. and Hong et al. adjusted weight coefficients with fuzzy logic, analyzing goal positions and obstacles to adapt the coefficients dynamically [28], [29]. Chang et al. proposed using Q-learning to dynamically adjust the weight coefficients of DWA [30]. Q-learning is a reinforcement learning method that requires no prior knowledge of the environment and has a low learning cost, which makes it well suited to robot path planning.

Considering these advantages, this paper focuses on the Q-learning method for adjusting the weight coefficients of DWA. While the conventional method [30] adjusts weight coefficients based on goal information, velocities, and obstacles, it does not account for the visible area and congestion rate of the environment. As a result, it can select inefficient paths or even cause collisions, depending on the situation. To remedy this issue, this paper proposes a dynamic weight coefficient adjustment approach based on Q-learning for DWA that accounts for environmental situations (DQDWA). DQDWA considers environmental conditions such as goal distance, goal direction, velocity, visible area, and congestion, and dynamically adjusts the weight coefficients of the evaluation function based on these conditions. Extensive simulations and experiments have been carried out to demonstrate the effectiveness and advantages of DQDWA in real-world scenarios.

The main contributions of this paper are threefold:

  • This paper proposes DQDWA, which dynamically adjusts the weight coefficients of the evaluation function based on environmental conditions such as visible area and congestion.

  • DQDWA incorporates the concept of context-awareness, where weight coefficients are not static but dynamically adjusted according to the area of spaces and congestion levels. This approach enhances the adaptability and performance of autonomous robots in varied situations.

  • The effectiveness of DQDWA has been validated through extensive simulations and real-world experiments. The results demonstrate that the proposed method outperforms traditional DWA in terms of efficiency and safety.

This paper is organized into eight sections including the current section. Sections II, III, and IV provide a comprehensive overview of the coordinate system, the Dynamic Window Approach (DWA), and Q-learning, respectively. Section V proposes Dynamic Weight Coefficients based on Q-learning for DWA (DQDWA). Sections VI and VII show the results from our simulations and real-world experiments to highlight the effectiveness and utility of DQDWA. Finally, Section VIII provides conclusions.

SECTION II.

Coordinate System

Fig. 1 illustrates the coordinate system for the robot utilized in this study. This paper defines two coordinate systems: the local coordinate system \Sigma_{LC} and the global coordinate system \Sigma_{GB} . Quantities measured in the global coordinate system are expressed with the left superscript ^{GB}(\cdot) , while variables belonging to the local coordinate system carry no superscript. The origin of the global coordinate system is situated at the initial position of the robot, and the origin of the local coordinate system is positioned at the midpoint between the robot's wheels. As shown in Fig. 1, (^{GB}x, ^{GB}y) and ^{GB}\theta represent the position and angle of the robot in the global coordinate system, respectively. L^{rob} denotes the radius of the robot.

FIGURE 1. Modeling of robot.

SECTION III.

Dynamic Window Approach (DWA)

A. Overview of DWA

The Dynamic Window Approach (DWA) is a commonly used method in local path planning [24]. Initially, the velocity space with dynamic constraints (VSD) is determined based on the robot’s current velocities. Subsequently, at each time step, an optimal path is selected from the VSD using an evaluation function. This optimal path selection is dependent on the weight coefficients of the evaluation function. Details about the velocity space and the optimal path selection are further elaborated in Sections III-B and III-C, respectively.

B. Velocity Space

DWA generates a velocity space with dynamic constraints, denoted as D^{vsd} , using translational and angular velocities as illustrated in Fig. 2 (a). The velocity space D^{vsd} is defined as follows.\begin{equation*} D^{vsd} = D^{all} \cap D^{dw} \cap D^{obs} \tag{1}\end{equation*}

where D^{all} represents the range of maximum and minimum velocities determined by the robot's specifications, D^{dw} , known as the dynamic window, defines the range of velocities that the robot can achieve at the next time step, and D^{obs} consists of velocities that enable the robot to stop before colliding with an obstacle.
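As a concrete illustration of (1), the following minimal Python sketch intersects the specification range, the dynamic window, and the stoppability constraint. The limit values in CFG and the clearance helper are assumptions for illustration, not parameters from the paper, and the D^{obs} test is shown only for the translational component.

```python
import numpy as np

# Hypothetical robot limits; none of these values come from the paper.
CFG = {"v_min": 0.0, "v_max": 0.22, "w_max": 2.84,
       "a_max": 0.5, "aw_max": 3.2, "dt": 0.1, "n_v": 10, "n_w": 20}

def velocity_space(v, w, clearance, cfg=CFG):
    """Sketch of Eq. (1): D^vsd = D^all intersect D^dw intersect D^obs.

    v, w      -- current translational / angular velocities
    clearance -- callable (v, w) -> distance to the nearest obstacle
                 along that candidate path
    """
    # D^all: velocity range allowed by the robot's specifications
    # D^dw : velocities reachable within one control period dt
    v_lo = max(cfg["v_min"], v - cfg["a_max"] * cfg["dt"])
    v_hi = min(cfg["v_max"], v + cfg["a_max"] * cfg["dt"])
    w_lo = max(-cfg["w_max"], w - cfg["aw_max"] * cfg["dt"])
    w_hi = min(cfg["w_max"], w + cfg["aw_max"] * cfg["dt"])

    # D^obs: keep only candidates that can stop before the nearest obstacle
    candidates = []
    for vc in np.linspace(v_lo, v_hi, cfg["n_v"]):
        for wc in np.linspace(w_lo, w_hi, cfg["n_w"]):
            if vc <= np.sqrt(2.0 * cfg["a_max"] * clearance(vc, wc)):
                candidates.append((vc, wc))
    return candidates
```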

FIGURE 2. Dynamic window approach.

C. Optimal Path

The velocity space D^{vsd} is discretized by equally dividing the range of the translational and angular velocities. This results in pairs of translational and angular velocities within the velocity space D^{vsd} , which serve as the velocity candidates. As shown in Fig. 2 (b), DWA generates predicted paths for each velocity candidate under the assumption of constant velocity motion.

Path candidates are evaluated using the following evaluation function J .\begin{equation*} J = W^{gol} \cdot c^{gol} + W^{vel} \cdot c^{vel} + W^{obs} \cdot c^{obs} \tag{2}\end{equation*}

where W^{gol} , W^{vel} , and W^{obs} represent the weight coefficients associated with the goal, velocity, and obstacles, respectively. c^{gol} indicates the distance between the predicted robot position and the goal position. c^{vel} corresponds to the current translational velocity. c^{obs} represents the shortest distance from the predicted robot position on the path to the obstacle. The optimal path is then determined by maximizing the evaluation function J . More details on DWA can be found in [27].
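The optimal-path selection of (2) reduces to scoring each velocity candidate and keeping the maximizer. The sketch below assumes placeholder helpers (predict_path, goal_cost, velocity_cost, obstacle_cost); it is not the authors' implementation, and in practice the three cost terms are typically normalised over the candidate set before weighting.

```python
def select_optimal_path(candidates, weights, predict_path,
                        goal_cost, velocity_cost, obstacle_cost):
    """Sketch of Eq. (2): pick the (v, w) pair maximizing
    J = W^gol*c^gol + W^vel*c^vel + W^obs*c^obs.

    All helper callables are placeholders for the paper's cost terms.
    """
    best_pair, best_score = None, float("-inf")
    for v, w in candidates:
        path = predict_path(v, w)              # constant-velocity rollout
        score = (weights["gol"] * goal_cost(path)
                 + weights["vel"] * velocity_cost(v)
                 + weights["obs"] * obstacle_cost(path))
        if score > best_score:
            best_pair, best_score = (v, w), score
    return best_pair
```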

SECTION IV.

Q-Learning

To dynamically adjust the weight coefficients of the evaluation function in DWA, this study incorporates Q-learning [31], a type of reinforcement learning method. Fig. 3 outlines the concept of Q-learning, which updates a Q-table that stores the Q-values for each action in each state. The Q-table is an m \times n matrix, where m and n correspond to the numbers of states and actions, respectively. The formula to update the Q-value is defined as follows.\begin{equation*} Q(s,a) = (1-\alpha)Q(s,a) + \alpha \left[R(s,a) + \gamma \max _{a'} Q(s',a')\right] \tag{3}\end{equation*}

where \alpha and \gamma represent the learning rate and discount rate, respectively. R(s, a) denotes the reward for the agent, and \max _{a'} Q(s',a') denotes the maximum Q-value in the next state. The training process for Q-learning involves the following steps, as outlined in Fig. 3:

  • Train1:

    In the current state s , the agent chooses action a using the \epsilon -greedy method with the Q-table.

  • Train2:

    The agent receives the next state s' and the reward R from the environment.

  • Train3:

    The Q-value in the Q-table is updated using (3).

Train1-Train3 are repeated until the Q-table converges, i.e., until the change in Q-values falls below a threshold value.
TABLE 1. Control Parameters.
FIGURE 3. Q-learning.

The \epsilon -greedy method [32] is utilized for action selection. With probability \epsilon , the action is chosen randomly, while with probability 1 - \epsilon , the action with the highest expected reward is chosen. More details about Q-learning can be found in [31].
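A minimal sketch of the update rule (3) and the ε-greedy selection is given below; the table layout (64 states by 25 actions, matching Section V) and the α, γ defaults are illustrative assumptions, not values from the paper.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Epsilon-greedy selection over one Q-table row (list of action values)."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))                    # explore
    return max(range(len(q_row)), key=lambda a: q_row[a])      # exploit

def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Q-value update of Eq. (3); alpha and gamma here are placeholders."""
    target = reward + gamma * max(q_table[s_next])
    q_table[s][a] = (1.0 - alpha) * q_table[s][a] + alpha * target

# Example: a 64-state x 25-action table initialised to zero.
q_table = [[0.0] * 25 for _ in range(64)]
```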

SECTION V.

Proposed Method (DQDWA)

A. Overview of DQDWA

This section proposes the Dynamic Weight Coefficients based on Q-learning for DWA approach considering environmental situations (DQDWA). While the conventional method [30] adjusts weight coefficients based on certain parameters, it does not consider the visible area and congestion rate as environmental factors. Therefore, the conventional method may lead to inefficient path selection or even collisions depending on the circumstances. To address this limitation, the proposed method includes visible area and congestion as key factors in defining environmental situations. Fig. 4 provides an overview of DQDWA, which consists of four steps (a minimal control-loop sketch follows the list):

  • Step1:

    Determine the robot state s_{1} -s_{5} based on environmental information measured by the distance sensor. The definitions of s_{1} -s_{5} are detailed in Section V-B.

  • Step2:

    The trained Q-table chooses the appropriate combination of weight coefficients for the given state. The definition of the action dimension and the Q-learning process are detailed in Sections IV, V-C, and V-D.

  • Step3:

    The chosen weight coefficients are applied to the evaluation function of DWA. DWA is elaborated on in Section III.

  • Step4:

    The robot moves according to the translational and angular velocities that maximize the evaluation function.
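The following minimal sketch strings Steps 1-4 together into one control cycle; every helper name in it (observe_state, dwa_plan, send_velocity, actions) is an illustrative placeholder, not the authors' code.

```python
def dqdwa_step(q_table, scan, pose, goal, actions,
               observe_state, dwa_plan, send_velocity):
    """One DQDWA control cycle (Steps 1-4); helpers are placeholders."""
    state = observe_state(scan, pose, goal)       # Step 1: map sensor data to s1-s5
    best_a = max(range(len(actions)),             # Step 2: greedy lookup in the
                 key=lambda a: q_table[state][a]) #         pre-learned Q-table
    weights = actions[best_a]                     # e.g. {"gol": 1, "vel": 1, "obs": 2}
    v, w = dwa_plan(scan, pose, goal, weights)    # Step 3: DWA with chosen weights
    send_velocity(v, w)                           # Step 4: command the robot
```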

FIGURE 4. Overview of DQDWA.

B. Definition of State Dimension

Fig. 5 provides a visual representation of the state dimensions s_{1} -s_{5} , which indicate states related to the goal distance, goal direction, travelled distance, visible area, and congestion, respectively. The state vector \boldsymbol {s} is defined as follows.\begin{equation*} \boldsymbol {s} = \begin{bmatrix} s_{1} & s_{2} & s_{3} & s_{4} & s_{5} \end{bmatrix} ^{\mathsf {T}} \tag{4}\end{equation*}

where s_{1} , s_{2} , s_{3} , and s_{4} each have two possible patterns, while s_{5} has four. Thus, the total size of the state dimension is 64. The detailed descriptions of s_{1} -s_{5} are presented below.

FIGURE 5. Image of state in proposed method.

1) Definition of State Dimension s_{1} (Goal Distance)

s_{1} represents the state associated with the distance between the robot's position and the goal position. s_{1} is defined as follows.\begin{align*} s_{1} = \begin{cases} \displaystyle 1\quad \text { if}\quad l^{rg} < W^{dis} L^{rob}\\ \displaystyle 2\quad \text { otherwise} \end{cases} \tag{5}\end{align*}

where W^{dis} is the weight coefficient for the goal distance and l^{rg} is the distance between the robot and the goal.

2) Definition of State Dimension s_{2} (Goal Direction)

s_{2} represents the state indicating the angular difference between the robot's direction and the goal's direction. s_{2} is defined as follows.\begin{align*} s_{2} = \begin{cases} \displaystyle 1\quad \text { if}\quad {\theta }^{rg}\in \left[-\cfrac {\pi }{2},\cfrac {\pi }{2} \right)\\ \displaystyle 2\quad \text { otherwise} \end{cases} \tag{6}\end{align*}

where {\theta }^{rg} is the angle between the robot and the goal.
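The two binary states (5) and (6) translate directly into threshold checks; the sketch below is a minimal rendering with illustrative argument names.

```python
import math

def state_s1(l_rg, w_dis, l_rob):
    """Eq. (5): goal-distance state (l_rg = robot-goal distance)."""
    return 1 if l_rg < w_dis * l_rob else 2

def state_s2(theta_rg):
    """Eq. (6): goal-direction state, theta_rg in radians."""
    return 1 if -math.pi / 2 <= theta_rg < math.pi / 2 else 2
```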

3) Definition of State Dimension s_{3} (Travelled Distance)

s_{3} is the state related to the distance that the robot will travel from its current position after one second. The travelled distance \eta is calculated as follows.\begin{align*} \eta = \begin{cases} \displaystyle \cfrac {2v}{\omega }\quad &\text {if}\quad {|\omega |}>\pi \\ \displaystyle v &\text {else if}\quad \omega = 0\\ \displaystyle \cfrac {2v}{\omega } \sin \left(\cfrac {\omega }{2}\right)\quad &\text {otherwise} \end{cases} \tag{7}\end{align*}

s_{3} is then defined as follows.\begin{align*} s_{3} = \begin{cases} \displaystyle 1\quad \text { if}\quad |\eta | \leq \cfrac {V^{max}}{2}\\ \displaystyle 2\quad \text { otherwise} \end{cases} \tag{8}\end{align*}

where V^{max} is the maximum translational velocity of the robot.
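Equations (7) and (8) can be written as a short helper; the sketch below mirrors the three cases of (7), including the paper's convention for |ω| > π.

```python
import math

def travelled_distance(v, w):
    """Eq. (7): displacement eta after one second of constant (v, w)."""
    if abs(w) > math.pi:
        return 2.0 * v / w
    if w == 0.0:
        return v
    return (2.0 * v / w) * math.sin(w / 2.0)

def state_s3(v, w, v_max):
    """Eq. (8): travelled-distance state."""
    return 1 if abs(travelled_distance(v, w)) <= v_max / 2.0 else 2
```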

4) Definition of State Dimension s_{4} (Visible Area)

s_{4} is the state that quantifies the visible area around the robot. The divided area f_{i} for the state s_{4} is defined as follows.\begin{equation*} f_{i} = \cfrac {1}{2} d_{i} d_{i+1}\sin \left(\cfrac {2\pi }{N}\right) \tag{9}\end{equation*}

where d_{i} is the i -th distance data measured by the distance sensor, and N is the total number of distance data points. Distance data are obtained in \cfrac {2\pi }{N} radian increments in a counter-clockwise direction. The total divided area f^{all} is calculated as follows.\begin{equation*} f^{all} = \sum _{i=1}^{N} f_{i} \tag{10}\end{equation*}

s_{4} is then defined as follows.\begin{align*} s_{4} = \begin{cases} \displaystyle 1 \quad \text { if}\quad f^{all} \leq W^{are}F^{max}\\ \displaystyle 2 \quad \text { otherwise} \end{cases} \tag{11}\end{align*}

where W^{are} is the weight coefficient of the area, and F^{max} is the maximum possible sum of the divided areas.
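Equations (9)-(11) amount to summing triangle areas between adjacent beams. In the sketch below, pairing the last beam with the first (wrap-around) is an assumption, since the paper does not state how the index i+1 is handled at i = N.

```python
import math

def state_s4(d, w_are, f_max):
    """Eqs. (9)-(11): visible-area state from N range readings d[0..N-1].
    Wrap-around pairing of the last and first beams is an assumption."""
    n = len(d)
    f_all = sum(0.5 * d[i] * d[(i + 1) % n] * math.sin(2.0 * math.pi / n)
                for i in range(n))
    return 1 if f_all <= w_are * f_max else 2
```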

5) Definition of State Dimension s_{5} (Congestion)

s_{5} is the state associated with congestion. s_{5} is defined based on the number of obstacles surrounding the robot.\begin{align*} s_{5} = \begin{cases} \displaystyle 1 \quad \text { if}\quad n^{fwd} > \cfrac {N}{4}\\ \displaystyle 2 \quad \text { else if}\quad n^{bwd} >\cfrac {N}{4}\\ \displaystyle 3 \quad \text { else if}\quad n^{all} > \cfrac {N}{4} \\ \displaystyle 4 \quad \text { otherwise} \end{cases} \tag{12}\end{align*}

where n^{fwd} and n^{bwd} are the numbers of sensor data points within a threshold distance D^{thr} in the front and rear halves of the robot, respectively. n^{all} is defined as follows.\begin{equation*} n^{all} = n^{fwd} + n^{bwd} \tag{13}\end{equation*}
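The congestion state (12)-(13) counts near-range readings in the front and rear halves of the scan. The sketch below assumes beam bearings are available in the robot frame, and it also shows one possible flattening of (s_1, ..., s_5) into a single Q-table row index; the paper does not specify that ordering.

```python
import math

def state_s5(d, headings, d_thr):
    """Eq. (12): congestion state. headings[i] is the bearing of beam i in the
    robot frame (radians); splitting front/rear by |heading| <= pi/2 is an
    assumption about the sensor layout."""
    n = len(d)
    n_fwd = sum(1 for di, th in zip(d, headings)
                if di < d_thr and abs(th) <= math.pi / 2)
    n_bwd = sum(1 for di, th in zip(d, headings)
                if di < d_thr and abs(th) > math.pi / 2)
    n_all = n_fwd + n_bwd                                   # Eq. (13)
    if n_fwd > n / 4:
        return 1
    if n_bwd > n / 4:
        return 2
    if n_all > n / 4:
        return 3
    return 4

def state_index(s1, s2, s3, s4, s5):
    """Flatten (s1..s5) into one of the 2*2*2*2*4 = 64 Q-table rows.
    The flattening order is arbitrary (the paper does not fix it)."""
    return ((((s1 - 1) * 2 + (s2 - 1)) * 2 + (s3 - 1)) * 2 + (s4 - 1)) * 4 + (s5 - 1)
```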

C. Definition of Reward

To adjust the weight coefficients of the evaluation function with Q-learning, the reward R is defined as follows.\begin{equation*} R = R_{1}+R_{2}+R_{3} \tag{14}\end{equation*}

Note that the initial value of R is set to 0. R_{1} is a reward related to the outcome of the episode (reaching the goal or colliding).\begin{align*} R_{1} = \begin{cases} \displaystyle 5000 \quad &\text {if}\qquad \text { reach goal}\\ \displaystyle -200 \quad &\text {else if}\qquad \text { collide with obstacle}\\ \displaystyle -2 \qquad &\text {otherwise} \end{cases} \tag{15}\end{align*}

R_{2} is a reward related to the distance from the goal position.\begin{align*} R_{2} = \begin{cases} \displaystyle 0 \quad &\text {if} \qquad \text { reach goal or collide with obstacle}\\ \displaystyle 10 \quad &\text {if}\qquad \text { get closer to goal position}\\ \displaystyle -10 \quad &\text {else if}\qquad \text { farther from goal position}\end{cases} \tag{16}\end{align*}

R_{3} is a reward related to the distance from the obstacle.\begin{align*} R_{3} = \begin{cases} \displaystyle 0 \quad &\text {if} \qquad \text { reach goal or collide with obstacle}\\ \displaystyle -5 \quad &\text {if} \qquad \text { approach obstacle}\\ \displaystyle 5 \quad &\text {else if}\qquad \text { move away from obstacle}\end{cases} \tag{17}\end{align*}
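The reward (14)-(17) can be computed per step from the episode outcome and the change in goal and obstacle distances. The sketch below is a minimal rendering; how unchanged distances (ties) are scored is an assumption, since the paper leaves that case open.

```python
def compute_reward(reached_goal, collided, d_goal, d_goal_prev, d_obs, d_obs_prev):
    """Sketch of Eqs. (14)-(17). Distances are compared with the previous step."""
    if reached_goal:
        return 5000                                   # Eq. (15), R2 = R3 = 0
    if collided:
        return -200                                   # Eq. (15), R2 = R3 = 0
    r1 = -2                                           # Eq. (15), otherwise
    r2 = 10 if d_goal < d_goal_prev else -10          # Eq. (16)
    r3 = -5 if d_obs < d_obs_prev else 5              # Eq. (17)
    return r1 + r2 + r3                               # Eq. (14)
```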

D. Definition of Action Dimension

The weight coefficients for the goal, velocity, and obstacles are each selected from the set {1,2,3}. The combinations {2,2,2} and {3,3,3} are omitted because uniformly scaling all three weights does not change which path maximizes the evaluation function, making them equivalent to {1,1,1}. As a result, 25 unique combinations are obtained, which form the action dimension (see the enumeration sketch below).
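A minimal enumeration of this 25-element action set, under the assumption that each action is stored as a weight dictionary, could look as follows.

```python
from itertools import product

# The 27 combinations of {1,2,3}^3 minus {2,2,2} and {3,3,3}, which only
# rescale the evaluation function, leaving the 25 actions of the Q-table.
ACTIONS = [{"gol": g, "vel": v, "obs": o}
           for g, v, o in product((1, 2, 3), repeat=3)
           if (g, v, o) not in ((2, 2, 2), (3, 3, 3))]
assert len(ACTIONS) == 25
```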

The Q-table comprises the actions and the states of the robot in various environmental situations. By utilizing the learned Q-table, DQDWA selects the optimal path using dynamic weight coefficients considering these environmental situations.

SECTION VI.

Simulation

A. Simulation Setup

The simulation system was implemented using the Robot Operating System (ROS) and Gazebo. In this simulation, we compared five methods: DWA I, DWA II, DWA III, the conventional DWA with Q-learning (CDQ) [30], and DQDWA. The constant weight coefficients {W^{gol}, W^{vel}, W^{obs}} for DWA I, DWA II, and DWA III were set to {1,1,2}, {1,2,1}, and {2,1,1}, respectively. Table 1 lists the simulation parameters.

B. Pre-Train of Q-Table

Fig. 6 (a)-(e) depict the environments utilized during the learning process. The environment and goal positions were randomly selected at the start of each trial, as outlined in Table 2. ^{GB}x^{gol} and ^{GB}y^{gol} denote the X and Y coordinates of the goal positions, respectively. The notation ([-1.2,1.2],[-1.2,1.2]) means that ^{GB}x^{gol} and ^{GB}y^{gol} are randomly selected from the range [-1.2,1.2] . As illustrated in Fig. 6 (a)-(c), Env. 1-3 were designed to examine the differences in robot behavior due to crowding within restricted spaces. Fig. 6 (d) shows Env. 4, which was established to investigate robot behavior in a spacious area filled with numerous obstacles. In Fig. 6 (e), Env. 5 was designed to evaluate robot behavior amidst obstacles and humans. All of these environments were designed with real-world scenarios in mind, specifically warehouse and factory settings. The learning process was continued until the Q-table had been updated 30,000 times.

TABLE 2. Goal Positions in Each Environment of Learning Phase.
FIGURE 6. Learning environments. The blue area indicates the visible sensor area.

C. Simulation Environment

In this simulation, the following two types of simulations were conducted.

  • Case S1: Conducted a single simulation in each environment (Env. 1-5).

  • Case S2: Performed 30 simulations in unfamiliar environments (Env. 6,7).

The starting positions in Cases S1 and S2 were set to (^{GB}x^{sta},^{GB}y^{sta})=(0.0,0.0) . In Case S1, the goal positions of Env. 1-5, denoted as (^{GB}x^{gol},^{GB}y^{gol}) , were established as (-1.2,-1.2) , (-1.2,-1.2) , (-1.3,1.5) , (0.0,4.0) , and (0.0,8.0) , respectively. For Case S2, goal positions were randomly selected from the red-filled areas as shown in Fig. 7.
FIGURE 7. Non-learning environments. The blue area indicates the visible sensor area.

Env. 6 was designed with a higher density of obstacles compared to Env. 4. This was intended to test the robot’s ability to deal with unforeseen obstacles that were not encountered during the learning phase. Env. 7 was designed to simulate a manufacturing plant. In this environment, the robot had to recognize and avoid not only static obstacles but also dynamic obstacles, such as humans, while navigating toward its goal.

D. Simulation Results

1) Case S1

Tables 3-4 present the results for Case S1. The abbreviations TL and PD denote the trajectory length and the movement posture displacement, respectively. Table 4 indicates the number of collisions, along with the average time, trajectory length (TL), and posture displacement (PD). Figs. 8-12 depict the trajectories in each environment.

TABLE 3. Simulation Results in Case S1 (1 Time in Each Environment).

TABLE 4. Simulation Results in Case S1 (Average in Each Env.).
FIGURE 8. Trajectories of DWA I ({W^{gol},W^{vel},W^{obs}} = {1,1,2}).

FIGURE 9. Trajectories of DWA II ({W^{gol},W^{vel},W^{obs}} = {1,2,1}).

FIGURE 10. Trajectories of DWA III ({W^{gol},W^{vel},W^{obs}} = {2,1,1}).

FIGURE 11. Trajectories of the conventional method (CDQ).

FIGURE 12. Trajectories of the proposed method (DQDWA).

For DWA I-III, while they delivered satisfactory results in some environments, there were instances where the robot collided with obstacles. Moreover, the robot often required a long duration to reach the goal position. The simulation results for DWA I-III varied depending on the environmental situation, as these methods utilize fixed weight coefficients.

In the case of CDQ, the simulation results for Env. 1–3 were better than those for DWA I-III, since CDQ selects weight coefficients considering the environmental situation. However, the results for Env. 4 and 5 were nearly identical to those for DWA I-III. This is because CDQ does not define environmental situations based on visible space size or obstacle count. Therefore, optimal weight coefficients were not chosen in narrow or crowded spaces.

In DQDWA, the robot successfully reached the goal position in the shortest time and with the smallest TL and PD. This is because DQDWA takes into account both space size and congestion, enabling the selection of optimal weight coefficients tailored to each environment. DQDWA allows for more efficient routing while ensuring safety and preventing the robot from circling in one place.

2) Case S2

Table 5 presents the results for Case S2, while Figs. 8-12 (f)-(g) illustrate the corresponding trajectories. In this simulation, the goal position was randomly selected at the start of each trial. For Env. 6, the goal position was selected from four points, with (^{GB}x^{gol},^{GB}y^{gol}) being (2.0,-3.0) , (3.0,2.0) , (-2.0,-3.0) , and (-3.0,2.0) . In Env. 7, the goal position was selected from two points, with (^{GB}x^{gol},^{GB}y^{gol}) being (8.5,2.0) and (8.5,-2.0) .

TABLE 5. Case S2 Results (30 Times in Each Environment).

In the case of DWA I, while the success rate was high, the robot took a significantly longer time to reach the goal position. DWA II and DWA III achieved smaller values for time, trajectory length (TL), and posture displacement (PD), but their success rates were comparatively lower. This is because these approaches prioritized high translational velocity and goal distance over obstacle avoidance, leading to more collisions.

CDQ yielded a lower success rate than DQDWA. Additionally, its averages of time, TL, and PD were the largest in Env. 6 and the second largest in Env. 7. This indicates that CDQ did not select efficient paths in environments that were not encountered during the learning phase.

In contrast, DQDWA achieved the highest success rate. Furthermore, it reached the goal position in a time span comparable to DWA II, which prioritizes translational velocity, and with TL and PD as small as DWA III, which prioritizes the goal distance. Therefore, DQDWA selected efficient paths while maintaining safety in unlearned environments.

The effectiveness of the proposed method, DQDWA, was thus confirmed through the simulation results for both Case S1 and Case S2.

SECTION VII.

Experiment

A. Experiment Setup

The experiment was carried out with ROS and Turtlebot3. Fig. 13 (a) shows an overview of Turtlebot3, which is equipped with a distance sensor (LDS-01) that measures environmental information. Fig. 13 (b)-(c) show the experiment environments, and Fig. 13 (d)-(e) show images of them. Two experimental scenarios were defined as follows.

  • Case E1: This represents a simple environment with four obstacles.

  • Case E2: This represents a crowded environment with seven obstacles.

The start position was (^{GB}x^{sta},^{GB}y^{sta})=(0.0,0.0) , and the goal position was (^{GB}x^{gol},^{GB}y^{gol})=(4.0,0.0) . The methods and parameters used in the experiment were the same as in the simulation: DWA I, DWA II, DWA III, CDQ, and DQDWA.
FIGURE 13. Experimental set-up.

B. Experiment Results

Table 6 presents the experimental results. Figs. 14-15 illustrate the trajectories for each case, and Fig. 16 shows snapshots from the DQDWA run.

TABLE 6. Experiment Results.
FIGURE 14. Trajectories in Case E1.

FIGURE 15. Trajectories in Case E2.

FIGURE 16. Snapshots in DQDWA.

In DWA I-III, collisions sometimes occurred. Even in cases where the goal was reached, these methods resulted in longer times, larger trajectory lengths (TL), and greater posture displacements (PD) compared to DQDWA. Their inability to adjust weight coefficients dynamically led to collisions and the selection of inefficient paths.

CDQ also resulted in a collision in Case E2. Moreover, its time, TL, and PD in Case E1 were larger than those of DWA II and DQDWA. These results suggest that CDQ was not able to select appropriate weight coefficients based on the environmental situations.

Conversely, DQDWA successfully reached the goal position and registered the shortest time, smallest TL, and least PD in both cases. DQDWA was capable of adjusting weight coefficients effectively in real-time. The effectiveness of the proposed method, therefore, was confirmed by the experimental results.

SECTION VIII.

Conclusion

This paper introduced DQDWA, a dynamic weight coefficient adjustment method based on Q-learning for DWA that considers environmental situations. We focused on the state definition for Q-learning, incorporating the visible area of the space and the level of congestion. With DQDWA, the robot could select optimal paths through dynamic adjustment of the weight coefficients. The effectiveness of the proposed method was validated through simulations and real-world experiments.

In the future, we aim to refine and improve DQDWA as follows.

  • Incorporating Moving Obstacles: The current evaluations of DQDWA have been conducted in static environments. Future work will look into accommodating moving obstacles in the learning and experiment environments.

  • Experiments in Diverse Environments: Our experiments have been performed with a single type of robot and sensor. We plan to evaluate DQDWA’s performance across various environments and using different types of robots and sensors.

  • Exploring Alternative Learning Methods: Presently, we utilize Q-learning as the sole learning method to adjust weight coefficients. Future efforts will investigate other learning methods for dynamic adjustment of these coefficients.
