LUO Guopan, ZHANG Guoliang, XU Jiabao. Intrinsic-motivation obstacle avoidance planning for mobile robots based on SPE-ICM [J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(2): 21-27.
Intrinsic-motivation obstacle avoidance planning for mobile robots based on SPE-ICM
Keywords: state prediction error; intrinsic motivation reward; optimal obstacle avoidance strategy
Funding: Supported by the Applied Basic Research Project of Sichuan Province (2019YJ00413)
Author Affiliations
Luo Guopan, Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science & Engineering
Zhang Guoliang, Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science & Engineering
Xu Jiabao, Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science & Engineering
Abstract:
      In dynamic environments, reinforcement learning algorithms detect moving obstacles poorly, which in turn degrades the optimal obstacle avoidance policy. To address this problem, a reward structure that takes state prediction error as intrinsic motivation (state prediction error - intrinsic curiosity module, SPE-ICM) is proposed to improve the policy function's ability to explore the agent's environment. First, an intrinsic reward mechanism is introduced to provide the agent with a multiple-reward structure. Second, based on the combined internal and external reward structure, the agent's perception of environmental information is enhanced, the way moving-obstacle data are collected and detected is improved, and the optimal obstacle avoidance policy function is optimized using the new detection scheme. Finally, the network model is combined with the deep deterministic policy gradient (DDPG) algorithm and evaluated in comparative experiments in a path-planning simulation environment built with ROS, verifying the feasibility of the proposed algorithm. The results show that the proposed algorithm performs significantly better in both detection ability and decision-making ability.