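王 典, 周 阳, 宋 毅, 代传金. Model of path planning in biological inspired goal-oriented navigation based on Q-learning [J]. Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报), 2023, 37(6): 68-76 |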
Model of path planning in biological inspired goal-oriented navigation based on Q-learning (基于 Q 学习的生物启发式目标导向导航路径规划模型) |
|
DOI: |
Keywords: Q-learning; place cells; cognitive map; path planning; goal-oriented navigation; bionic navigation |
Funding: Supported by the National Natural Science Foundation of China (61973314) |
|
|
Abstract: |
To address the problem of obtaining the optimal path for a mobile robot during goal-oriented running in an unknown
environment, this paper proposes a biologically inspired goal-oriented navigation path planning model based on Q-learning. The
model consists of three parts: spatial exploration based on Q-learning, running control based on the cognitive map, and optimal
path selection. First, in spatial exploration, the position state is represented by the firing activity of place cells,
state-action learning is carried out with a dynamically valued ε, a cognitive map is generated, and the optimal path of the
spatial exploration stage is obtained. Second, in running control based on the cognitive map, the running direction is selected
according to the maximum action cell firing rate principle and the population action cell principle, respectively, and
multi-scale position update intervals are used to update the position, which yields the optimal path under different cognitive
maps. Finally, the path planning results of the spatial exploration stage and the running control stage are compared, and the
optimal path is selected. Simulation results show that the proposed model is feasible and that spatial exploration with a
dynamically valued ε gives relatively good path planning results. After sufficient spatial exploration, the agent can provide a
feasible and effective path for goal-oriented running. |
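
The spatial exploration stage described in the abstract can be illustrated with a minimal sketch of tabular Q-learning in which the state is read from place-cell firing and ε changes during learning. The abstract does not give the ε schedule, reward values, learning parameters, or environment, so everything concrete here is an assumption made for illustration: the 5×5 grid, one Gaussian place cell per grid node, the linear ε decay in the hypothetical `epsilon_at`, the rewards, and the helper names `state_of`, `train`, and `greedy_path`. It is not the authors' model.

```python
import math
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed four-direction action set


def place_cell_rates(pos, centers, sigma=0.5):
    """Gaussian firing rate of every place cell at position `pos` (assumed tuning curve)."""
    return [math.exp(-((pos[0] - cx) ** 2 + (pos[1] - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers]


def state_of(pos, centers):
    """Discrete state = index of the most strongly firing place cell."""
    rates = place_cell_rates(pos, centers)
    return max(range(len(rates)), key=rates.__getitem__)


def epsilon_at(episode, n_episodes, eps_start=1.0, eps_end=0.05):
    """Assumed dynamic epsilon: linear decay from eps_start to eps_end."""
    frac = episode / max(n_episodes - 1, 1)
    return eps_start + (eps_end - eps_start) * frac


def step(pos, action, size):
    """Move one cell and clamp to the grid boundary."""
    dx, dy = MOVES[action]
    return (min(max(pos[0] + dx, 0), size - 1),
            min(max(pos[1] + dy, 0), size - 1))


def train(size=5, goal=(4, 4), n_episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Epsilon-greedy tabular Q-learning from (0, 0) towards `goal`."""
    rng = random.Random(seed)
    centers = [(x, y) for x in range(size) for y in range(size)]
    q = [[0.0] * len(MOVES) for _ in centers]
    for ep in range(n_episodes):
        eps = epsilon_at(ep, n_episodes)
        pos = (0, 0)
        for _ in range(4 * size * size):  # cap on steps per episode
            s = state_of(pos, centers)
            if rng.random() < eps:
                a = rng.randrange(len(MOVES))  # explore
            else:
                a = max(range(len(MOVES)), key=q[s].__getitem__)  # exploit
            nxt = step(pos, a, size)
            done = nxt == goal
            reward = 1.0 if done else -0.01  # assumed reward shaping
            best_next = 0.0 if done else max(q[state_of(nxt, centers)])
            q[s][a] += alpha * (reward + gamma * best_next - q[s][a])
            pos = nxt
            if done:
                break
    return q, centers


def greedy_path(q, centers, size=5, goal=(4, 4), max_steps=50):
    """Follow the learned Q-table greedily from (0, 0)."""
    pos, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        s = state_of(pos, centers)
        pos = step(pos, max(range(len(MOVES)), key=q[s].__getitem__), size)
        path.append(pos)
        if pos == goal:
            break
    return path
```

Calling `q, centers = train()` and then `greedy_path(q, centers)` returns the sequence of grid positions visited under the learned table. Starting ε high and lowering it favours broad exploration early and exploitation of the learned values later, which is one common way to realise a dynamic ε; other schedules would fit the abstract's description equally well.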
|
|
|