Journal of Electronic Measurement and Instrumentation

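Wang Dian, Zhou Yang, Song Yi, Dai Chuanjin. Model of path planning in biological inspired goal-oriented navigation based on Q-learning [J]. Journal of Electronic Measurement and Instrumentation, 2023, 37(6): 68-76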

Model of path planning in biological inspired goal-oriented navigation based on Q-learning

Keywords: Q-learning; place cells; cognitive map; path planning; goal-oriented navigation; bionic navigation

Funding: Supported by the National Natural Science Foundation of China (61973314)

Author	Institution
Wang Dian	1. Sichuan Vocational and Technical College, 2. China West Normal University
Zhou Yang	3. Army Unit 95486
Song Yi	3. Army Unit 95486
Dai Chuanjin	4. Information and Navigation College, Air Force Engineering University

Abstract:

To address the problem of obtaining the optimal path for a mobile robot moving toward a goal in an unknown environment, this paper proposes a biologically inspired goal-oriented navigation path planning model based on Q-learning. The model consists of three parts: spatial exploration based on Q-learning, running control based on the cognitive map, and optimal path selection. First, during spatial exploration, the position state is represented by the firing of place cells, state-action learning is carried out with a dynamically varying ε, a cognitive map is generated, and the optimal path of the exploration stage is obtained. Second, in the running control based on the cognitive map, the running direction is selected according to either the maximum action cell firing rate principle or the group action cell principle, and the position is updated with multi-scale update intervals, which yields the optimal path for each cognitive map. Finally, the path planning results of the spatial exploration stage and the running control stage are compared, and the optimal path is selected. Simulation results show that the proposed model is feasible, that spatial exploration with a dynamically varying ε yields better path planning results, and that, after sufficient spatial exploration, the agent can provide a feasible and effective path for goal-oriented running.
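
The exploration and direction-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 5x5 grid world, the reward values, the linear ε decay schedule, the eight "action cells", and all function names are assumptions made for illustration. The abstract does not specify the ε schedule or the place-cell model, and the group action cell rule is read here as a population-vector average, which is also an assumption.

```python
import math
import random

# Hypothetical 5x5 grid world; the paper's environment and place-cell model are not given here.
SIZE = 5
START, GOAL = (0, 0), (4, 4)
# Eight assumed action cells, each tuned to one heading (multiples of 45 degrees).
HEADINGS = [k * math.pi / 4 for k in range(8)]
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def step(state, action):
    """Move one cell; stay in place when the move would leave the grid."""
    x, y = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if not (0 <= x < SIZE and 0 <= y < SIZE):
        x, y = state
    reward = 1.0 if (x, y) == GOAL else -0.01  # assumed reward shaping
    return (x, y), reward


def epsilon_at(episode, episodes, eps_start=0.9, eps_end=0.05):
    """Assumed dynamic epsilon: linear decay from eps_start to eps_end."""
    frac = episode / max(1, episodes - 1)
    return eps_start + (eps_end - eps_start) * frac


def explore(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy and decaying epsilon."""
    rng = random.Random(seed)
    q = {(x, y): [0.0] * 8 for x in range(SIZE) for y in range(SIZE)}
    for ep in range(episodes):
        eps = epsilon_at(ep, episodes)
        state = START
        for _ in range(max_steps):
            if rng.random() < eps:
                action = rng.randrange(8)  # explore
            else:
                action = max(range(8), key=lambda a: q[state][a])  # exploit
            nxt, reward = step(state, action)
            # The goal is terminal, so no bootstrap term is added there.
            target = reward if nxt == GOAL else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if state == GOAL:
                break
    return q


def max_firing_heading(rates):
    """Heading of the single action cell with the highest firing rate."""
    return HEADINGS[max(range(len(rates)), key=lambda a: rates[a])]


def population_heading(rates):
    """Firing-rate-weighted vector average of all preferred headings."""
    low = min(rates)
    weights = [r - low for r in rates]  # shift so that weights are non-negative
    sx = sum(w * math.cos(h) for w, h in zip(weights, HEADINGS))
    sy = sum(w * math.sin(h) for w, h in zip(weights, HEADINGS))
    return math.atan2(sy, sx)  # returns 0.0 when all rates are equal


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START until GOAL or max_steps."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        action = max(range(8), key=lambda a: q[state][a])
        state, _ = step(state, action)
        path.append(state)
    return path


if __name__ == "__main__":
    print(greedy_path(explore()))
```

In this sketch the learned Q-values at a state stand in for action cell firing rates. The two selection rules differ when several cells fire at similar rates: the maximum-firing rule commits to one of the eight discrete headings, while the population rule can return an intermediate direction.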
