The Improvement on Reinforcement Learning for SCM by the Agent Policy Mapping

Author(s):

Ruoying Sun (Beijing Information Science & Technology University, BeiSiHuanZhongLu 35, Beijing 100101, China; sunry@biti.edu.cn); Gang Zhao; Chen Li; Shoji Tatsumi

Abstract:

Reinforcement learning (RL) is an efficient and popular approach to problems in which an agent has no a priori knowledge of its environment; it is characterized by trial-and-error search and delayed rewards. An RL agent must derive an optimal policy by interacting directly with the environment and gathering information about it. Supply chain management (SCM) is a challenging problem for agent-based electronic business. Some proposed RL methods outperform traditional tools for dynamic problem solving in SCM: they realize on-line learning and perform efficiently in some applications, but an RL agent reacts worse than some heuristic methods to sudden changes in SCM demand, because the trial-and-error nature of RL is time-consuming in practice. Drawing on an efficient policy transition mechanism in RL that maps policies learned in a previous task onto a changed task, this paper proposes a novel RL-agent-based SCM system that reduces the agent's learning time in a dynamic environment. As a result, the RL agent derives the maximal profit with the RL technique when jobs arrive with a stable distribution; furthermore, through the policy transition mechanism, it makes optimal procurement decisions that satisfy the requirements of sudden changes in the supply chain network.
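The abstract does not spell out the policy transition mechanism itself. As a rough illustration of the general idea of reusing a learned policy when the task changes, the following minimal tabular Q-learning sketch (Python) warm-starts learning after a sudden demand change from a previously learned value table. The toy environment, the procurement actions, and all function names are illustrative assumptions, not the authors' implementation.

import random
from collections import defaultdict

ACTIONS = ["order_small", "order_large"]   # hypothetical procurement actions

def toy_env(state, action, demand):
    # One-step toy supply-chain episode: reward is positive when the order
    # size matches the prevailing demand level. Purely illustrative.
    reward = 1.0 if (action == "order_large") == (demand == "high") else -1.0
    return "end", reward, True

def q_learning(demand, q=None, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    # Tabular Q-learning; `q` may be warm-started from a previous task.
    if q is None:
        q = defaultdict(float)                 # (state, action) -> value
    for _ in range(episodes):
        state, done = "start", False
        while not done:
            if random.random() < eps:          # epsilon-greedy exploration
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = toy_env(state, action, demand)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q

# Learn under stable ("low") demand, then reuse the learned table as the
# starting point after a sudden shift to "high" demand instead of
# relearning from scratch.
q_stable = q_learning(demand="low")
q_after_change = q_learning(demand="high", q=q_stable, episodes=50)

In this sketch, the "policy mapping" is simply reusing the old value table as the new task's initialization; the paper's actual mechanism for mapping policies between tasks may differ.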

Published in:

IECON 2006 - 32nd Annual Conference on IEEE Industrial Electronics

Date of Conference:

6-10 Nov. 2006