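Wang Dian, Zhou Yang, Song Yi, Dai Chuanjin. Model of path planning in biological inspired goal-oriented navigation based on Q-learning[J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(6): 68-76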
Model of path planning in biological inspired goal-oriented navigation based on Q-learning
  
DOI:
Keywords: Q-learning; place cells; cognitive map; path planning; goal-oriented navigation; bionic navigation
Funding: Supported by the National Natural Science Foundation of China (61973314)
Author  Institution
Wang Dian  1. Sichuan Vocational and Technical College; 2. China West Normal University
Zhou Yang  3. Army Unit 95486
Song Yi  3. Army Unit 95486
Dai Chuanjin  4. Information and Navigation College, Air Force Engineering University
Abstract:
      To obtain the optimal path for a mobile robot running toward a goal in an unknown environment, this paper proposes a biologically inspired path planning model for goal-oriented navigation based on Q-learning. The model comprises three parts: spatial exploration based on Q-learning, running control based on the cognitive map, and optimal path selection. First, during spatial exploration, the position state is represented by the firing of place cells, and state-action learning is carried out with a dynamically valued ε, which generates the cognitive map and yields the optimal path of the exploration stage. Second, in running control based on the cognitive map, the running direction is selected according to the maximum action-cell firing rate principle and the population action-cell principle, respectively, and multi-scale position update intervals are used to update the position, yielding the optimal path under each cognitive map. Finally, the path planning results of the spatial exploration stage and the running control stage are compared, and the optimal path is selected. Simulation results show that the proposed model is feasible and that spatial exploration with a dynamically valued ε gives better path planning results. After sufficient spatial exploration, the agent can provide a feasible and effective path for goal-oriented running.
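The spatial exploration stage can be illustrated with a minimal sketch. This is not the authors' implementation: the 5x5 arena, the Gaussian place fields, the exponential ε schedule, and every name and parameter below (`firing_rates`, `epsilon_at`, `train`, `greedy_path`, `alpha`, `gamma`, and so on) are assumptions made for illustration only. The sketch takes the most strongly firing place cell as the discrete state, runs tabular Q-learning with an ε that shrinks over episodes, and then follows the highest-valued action from the start, loosely analogous to the maximum action-cell firing rate principle.

```python
import math
import random

# Hypothetical 5x5 arena; one place-cell centre per grid node (assumption).
SIZE = 5
CENTRES = [(x, y) for y in range(SIZE) for x in range(SIZE)]
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # east, west, north, south
START, GOAL = (0, 0), (4, 4)


def firing_rates(pos, sigma=0.5):
    """Gaussian place-field response of every cell at position pos."""
    return [math.exp(-((pos[0] - cx) ** 2 + (pos[1] - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in CENTRES]


def state_of(pos):
    """Discrete state = index of the most strongly firing place cell."""
    rates = firing_rates(pos)
    return rates.index(max(rates))


def step(pos, action):
    """Move one node, clamped to the arena; reward 1 only on reaching the goal."""
    x = min(max(pos[0] + action[0], 0), SIZE - 1)
    y = min(max(pos[1] + action[1], 0), SIZE - 1)
    new_pos = (x, y)
    done = new_pos == GOAL
    return new_pos, (1.0 if done else 0.0), done


def epsilon_at(episode, eps_start=1.0, eps_end=0.05, decay=0.99):
    """Assumed dynamic schedule: exponential decay from eps_start to a floor."""
    return max(eps_end, eps_start * decay ** episode)


def train(episodes=300, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy and decaying epsilon."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in CENTRES]
    for ep in range(episodes):
        pos, eps = START, epsilon_at(ep)
        for _ in range(max_steps):
            s = state_of(pos)
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))   # explore
            else:
                a = q[s].index(max(q[s]))         # exploit
            pos, reward, done = step(pos, ACTIONS[a])
            s2 = state_of(pos)
            target = reward + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START until GOAL or max_steps."""
    pos, path = START, [START]
    for _ in range(max_steps):
        if pos == GOAL:
            break
        s = state_of(pos)
        pos, _, _ = step(pos, ACTIONS[q[s].index(max(q[s]))])
        path.append(pos)
    return path
```

Calling `greedy_path(train())` returns the list of grid positions visited when the learned values are followed from the start. A large early ε favours broad exploration of the arena, while the small late ε lets the learned values settle along the paths that are actually used.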