Journal of Electronic Measurement and Instrumentation

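Peng Yi, Tang Jian, Yang Qingqing, Li Rui. Relay selection strategy for emergency UAV communication based on reinforcement learning[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(7): 9-15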

Relay selection strategy for emergency UAV communication based on reinforcement learning

Keywords: UAV; emergency communication; relay selection; SA-SARSA; simulated annealing algorithm

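Funding: Supported by the National Natural Science Foundation of China (61761025) and the Open Fund of the Yunnan Key Laboratory of Computer Technologies Application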

Author	Institution
Peng Yi	1. School of Information Engineering and Automation, Kunming University of Science and Technology; 2. Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology
Tang Jian	1. School of Information Engineering and Automation, Kunming University of Science and Technology
Yang Qingqing	1. School of Information Engineering and Automation, Kunming University of Science and Technology; 2. Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology
Li Rui	1. School of Information Engineering and Automation, Kunming University of Science and Technology

Abstract:

In areas where base stations cannot provide communication support, a UAV swarm can be introduced as relay nodes to build a UAV emergency communication network. To address how a UAV can efficiently select the optimal relay node while maximizing system throughput, a UAV relay selection strategy based on the SA-SARSA reinforcement learning algorithm is proposed. With all relay nodes forwarding via decode-and-forward (DF), an expression is derived for the average throughput at the user after maximal-ratio combining. The state, action, and reward function of the SARSA algorithm are then defined so that the relay node with the largest return is selected. In addition, a simulated annealing algorithm is introduced so that the source node explores more relay nodes, bringing the performance of the UAV swarm communication network toward its optimal state. Simulation results show that, compared with the unimproved SARSA relay selection strategy, the proposed SA-SARSA relay selection strategy raises the proportion achieved relative to the ideal algorithm by 10%. Moreover, under the same total power, the throughput of the relay nodes selected by the proposed strategy is 8% and 13% higher than that of the Q-learning and SARSA relay selection strategies, respectively.
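The idea of combining SARSA with simulated annealing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the paper's state, action, or reward definitions, so everything here is an assumption made for illustration. The state is taken to be the relay currently in use, the action is the relay to use next, and the reward is a noisy sample from a hypothetical per-relay mean throughput list (`mean_throughput`). Exploration uses a Metropolis-style acceptance rule with a geometrically cooled temperature, one common way to pair annealing with a temporal-difference learner; the function name and all parameter values are likewise hypothetical.

```python
import math
import random


def sa_sarsa_relay_selection(mean_throughput, steps=5000, alpha=0.1, gamma=0.5,
                             t_start=2.0, t_min=0.01, cooling=0.999, seed=0):
    """Toy SA-SARSA loop; returns (index of the preferred relay, Q-table)."""
    rng = random.Random(seed)
    n = len(mean_throughput)
    # Assumption: state = relay currently in use, action = relay to use next.
    q = [[0.0] * n for _ in range(n)]

    def choose(state, temp):
        greedy = max(range(n), key=lambda a: q[state][a])
        candidate = rng.randrange(n)
        delta = q[state][candidate] - q[state][greedy]  # never positive
        # Metropolis rule: accept a worse candidate with probability
        # exp(delta / temp), so exploration fades as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            return candidate
        return greedy

    temp = t_start
    state = rng.randrange(n)
    action = choose(state, temp)
    for _ in range(steps):
        # Assumption: reward = noisy throughput sample of the chosen relay.
        reward = mean_throughput[action] + rng.gauss(0.0, 0.1)
        next_state = action
        next_action = choose(next_state, temp)
        # On-policy SARSA update: bootstrap from the action actually chosen next.
        td_target = reward + gamma * q[next_state][next_action]
        q[state][action] += alpha * (td_target - q[state][action])
        state, action = next_state, next_action
        temp = max(t_min, temp * cooling)  # geometric cooling schedule

    # Preferred relay: the action with the largest Q-value summed over states.
    best = max(range(n), key=lambda a: sum(q[s][a] for s in range(n)))
    return best, q
```

Calling `sa_sarsa_relay_selection([1.0, 3.0, 1.5, 0.5])` returns the index of the preferred relay together with the learned Q-table. A high starting temperature makes early choices close to uniform, which is how the sketch mimics the abstract's goal of letting the source node explore more relay nodes before settling.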
