Abstract:
Reward design in reinforcement learning should place less burden on the designer and should adapt to changes in the environment and task. We are therefore studying Self-generation of Reward, a reward-design method that does not depend on changes in the environment or task. In Self-generation of Reward, an agent generates its own reward by evaluating sensor information from the outside world from multiple perspectives, using the danger-avoidance instinct of living organisms as an indicator. The only desired output of the danger-avoidance indicator is a negative evaluation when the sensed state is dangerous; when the state is not dangerous, however, the indicator produces a positive evaluation. A positive evaluation implies that the agent is desired to achieve the goal, yet a non-dangerous state has nothing to do with goal achievement. The danger-avoidance indicator therefore includes evaluations of undesired goal achievement. To solve this problem, we propose a method that controls whether sensor information is evaluated at all: the danger-avoidance indicator is improved to generate only negative evaluations by excluding positive evaluations from the sensor evaluation when so instructed. In this study, we conducted a simulation experiment on path finding in a grid map to verify whether the proposed method can be used for learning. As a result, we found that learning was possible by adjusting the reward given to each action.
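The reward rule described above can be sketched minimally in Python. This is an illustrative assumption, not the paper's implementation: the danger test (`sensor_value >= danger_threshold`), the threshold, and the penalty magnitude are all hypothetical placeholders; the key idea shown is that a negative evaluation is emitted only for dangerous states, while the positive evaluation of non-dangerous states is suppressed rather than returned as reward.

```python
def self_generated_reward(sensor_value: float,
                          danger_threshold: float = 0.8,
                          penalty: float = -1.0) -> float:
    """Self-generate a reward from a sensor reading.

    Returns a negative evaluation when the state is judged dangerous,
    and suppresses the (undesired) positive evaluation otherwise.
    The threshold and penalty values are illustrative assumptions.
    """
    is_dangerous = sensor_value >= danger_threshold  # assumed danger test
    if is_dangerous:
        return penalty  # negative evaluation of a dangerous state
    return 0.0          # evaluation suppressed: no positive reward emitted

# Example: a near-obstacle reading is penalized; a safe reading yields
# no reward at all, so goal achievement is not spuriously evaluated.
print(self_generated_reward(0.9))  # -1.0
print(self_generated_reward(0.2))  # 0.0
```

In a grid-map path-finding setting, such a rule would let the agent learn obstacle avoidance purely from self-generated penalties, with the reward given to each action adjusted as the experiment in the paper describes.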
Date of Conference: 05-07 December 2021
Date Added to IEEE Xplore: 24 January 2022