Abstract: In areas where the base station cannot provide communication coverage, a UAV group can be introduced as relay nodes to build a UAV emergency communication network. To select the optimal relay node efficiently and maximize the system throughput, we propose a relay selection strategy for UAVs based on the SA-SARSA reinforcement learning algorithm. Assuming that all relay nodes forward data using decode-and-forward (DF) relaying, we derive an expression for the user's maximum average throughput. The relay node with the maximum return is then selected by defining the state, action, and reward function of the SARSA algorithm. In addition, the simulated annealing (SA) algorithm is introduced so that the source node explores more relay nodes, allowing the performance of the UAV swarm communication network to approach the optimal state. Simulation results show that, compared with the previous SARSA relay selection strategy, the proposed SA-SARSA relay selection strategy increases the proportion achieved relative to the ideal algorithm by 10%. Under the same total power, the throughput of the relay nodes selected by the proposed strategy is 8% and 13% higher than that of the Q-learning and SARSA relay selection strategies, respectively.
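The abstract does not state the SA-SARSA update rules, so the following Python sketch only illustrates the general idea: a tabular SARSA learner whose action choice uses a simulated-annealing (Metropolis) acceptance test with a cooling temperature. The state definition (the previously selected relay), the two-hop DF rate 0.5 * log2(1 + min(SNR_sr, SNR_rd)), the exponential fading gains, and every function name and hyperparameter are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def df_throughput(snr_sr, snr_rd):
    """Two-hop DF rate in bit/s/Hz (assumed model): the weaker hop limits the
    rate, and the factor 0.5 accounts for the two transmission phases."""
    return 0.5 * math.log2(1.0 + min(snr_sr, snr_rd))


def sa_select(q_row, temperature, rng):
    """Annealing-style action choice (assumed form).

    Propose a random relay and accept it instead of the greedy relay with
    probability exp((Q[candidate] - Q[greedy]) / T). A high T explores
    widely; as T cools, the choice becomes greedy.
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))
    delta = q_row[candidate] - q_row[greedy]  # always <= 0
    if temperature > 0 and rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def train(mean_snr_sr, mean_snr_rd, episodes=3000, alpha=0.1, gamma=0.9,
          t0=1.0, cooling=0.995, seed=0):
    """Tabular SARSA over relays; state = previously selected relay (assumed)."""
    rng = random.Random(seed)
    n = len(mean_snr_sr)
    q = [[0.0] * n for _ in range(n)]
    temperature = t0
    state = 0
    action = sa_select(q[state], temperature, rng)
    for _ in range(episodes):
        # Hypothetical channel: mean SNR scaled by an exponential fading gain.
        reward = df_throughput(mean_snr_sr[action] * rng.expovariate(1.0),
                               mean_snr_rd[action] * rng.expovariate(1.0))
        next_state = action
        next_action = sa_select(q[next_state], temperature, rng)
        # On-policy SARSA update: bootstraps from the action actually chosen.
        td_target = reward + gamma * q[next_state][next_action]
        q[state][action] += alpha * (td_target - q[state][action])
        state, action = next_state, next_action
        temperature = max(temperature * cooling, 1e-3)  # cooling schedule
    return q


if __name__ == "__main__":
    q_table = train([2.0, 20.0, 5.0], [3.0, 15.0, 4.0])
    # Greedy relay per state after training.
    print([max(range(3), key=row.__getitem__) for row in q_table])
```

In this sketch the only difference from plain SARSA is `sa_select`: exploration is driven by a temperature that decays over time rather than by a fixed epsilon, which is one common way to combine annealing with a temporal-difference learner.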