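Peng Yi, Tang Jian, Yang Qingqing, Li Rui. Relay selection strategy for emergency UAV communication based on reinforcement learning[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(7): 9-15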
Relay selection strategy for emergency UAV communication based on reinforcement learning
  
DOI:
Keywords:  UAV  emergency communication  relay selection  SA-SARSA  simulated annealing algorithm  throughput
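Funding: Supported by the National Natural Science Foundation of China (61761025) and the Open Fund of the Yunnan Key Laboratory of Computer Technologies Application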
Author  Institution
Peng Yi  1. School of Information Engineering and Automation, Kunming University of Science and Technology, 2. Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology
Tang Jian  1. School of Information Engineering and Automation, Kunming University of Science and Technology
Yang Qingqing  1. School of Information Engineering and Automation, Kunming University of Science and Technology, 2. Yunnan Key Laboratory of Computer Technologies Application, Kunming University of Science and Technology
Li Rui  1. School of Information Engineering and Automation, Kunming University of Science and Technology
Abstract:
      In areas where base stations cannot provide communication support, a UAV swarm can serve as relay nodes to build an emergency UAV communication network. To address how a UAV can efficiently select the optimal relay node while maximizing system throughput, we propose a UAV relay selection strategy based on the SA-SARSA reinforcement learning algorithm. All relay nodes forward the signal using decode-and-forward (DF), and an expression is derived for the average throughput at the user after maximal-ratio combining. The state, action, and reward function of the SARSA algorithm are then defined so that the relay node with the largest return is selected. In addition, a simulated annealing algorithm is introduced so that the source node explores more relay nodes, which drives the performance of the UAV swarm communication network toward the optimum. Simulation results show that, compared with the unmodified SARSA relay selection strategy, the proposed SA-SARSA strategy achieves a 10% higher proportion of the ideal algorithm's performance. Under the same total power, the throughput of the relay nodes selected by the proposed strategy is 8% and 13% higher than that of the Q-learning and SARSA relay selection strategies, respectively.
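The abstract does not give the state, action, or reward definitions, so the following is only a minimal sketch of how SARSA can be combined with annealing-based exploration. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the state is taken to be the relay used in the previous slot, the action is the relay chosen for the current slot, the reward is a Shannon-style rate `log2(1 + SNR)` with exponentially distributed SNR, and exploration uses a Metropolis acceptance rule with geometric cooling. The names `sa_select` and `train_sa_sarsa` and all parameter values are hypothetical.

```python
import math
import random


def sa_select(q_row, temperature, rng):
    """Metropolis-style choice (assumed rule): propose a random relay and
    accept it over the greedy relay with probability exp((Q_rand - Q_greedy) / T)."""
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # The exponent is never positive, so accept_prob lies in [0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def train_sa_sarsa(mean_snr, steps=2000, alpha=0.1, gamma=0.9,
                   t0=1.0, cooling=0.995, t_min=1e-3, seed=0):
    """Run on-policy SARSA updates with annealed exploration.

    mean_snr[i] is a hypothetical average link SNR (linear scale) of relay i.
    Returns the Q-table, indexed as q[previous_relay][chosen_relay].
    """
    rng = random.Random(seed)
    n = len(mean_snr)
    q = [[0.0] * n for _ in range(n)]
    temperature = t0
    state = rng.randrange(n)
    action = sa_select(q[state], temperature, rng)
    for _ in range(steps):
        # Assumed reward model: rate under an exponentially distributed SNR.
        snr = rng.expovariate(1.0 / mean_snr[action])
        reward = math.log2(1.0 + snr)
        next_state = action
        next_action = sa_select(q[next_state], temperature, rng)
        # SARSA update: bootstrap from the action actually selected next.
        td_target = reward + gamma * q[next_state][next_action]
        q[state][action] += alpha * (td_target - q[state][action])
        state, action = next_state, next_action
        # Geometric cooling: exploration shrinks as the temperature falls.
        temperature = max(t_min, temperature * cooling)
    return q
```

Calling `train_sa_sarsa([1.0, 2.0, 20.0, 4.0])` returns a 4 x 4 Q-table; the relay with the largest value in a row is the learned choice for that state. At high temperature almost every random proposal is accepted, so the source samples many relays early on, and as the temperature falls the policy approaches greedy selection.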