Abstract: With the continuing popularity of deep learning techniques, the convolutional neural network (CNN) has become the main tool for solving remote sensing image scene classification tasks. However, current research focuses heavily on how to fuse multi-branch CNNs and how to apply attention models. Although these approaches markedly enhance classification accuracy, they lead to high computational complexity. In this paper, this problem is addressed by introducing a modified loss function and designing a novel data augmentation strategy, which together significantly improve the classification performance of a CNN without increasing its computational complexity. First, a stage-based focal loss function is presented to adaptively mine hard samples during the training process. Second, a parallel training strategy is adopted in which the original image samples and the samples produced by the GridMask operation are fed separately into a shared CNN. Experimental results show that the proposed algorithm achieves accuracies of 96.72% and 93.95% on the two large-scale datasets AID and NWPU-RESISC45, respectively, and significantly improves the performance of remote sensing image scene classification.
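The two components named above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the stage schedule, the GridMask parameters, or how the two branches are combined, so the function names, the epoch boundaries, the focusing values, the grid size, and the choice to sum the two branch losses are all assumptions made for illustration. Only the standard focal loss form, FL(p) = -(1 - p)^gamma * log(p), is taken as given.

```python
import math


def focal_loss(p_true, gamma):
    """Standard focal loss for one sample.

    p_true is the predicted probability of the sample's true class.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def stage_gamma(epoch, boundaries=(10, 20), gammas=(0.0, 1.0, 2.0)):
    """Hypothetical stage schedule: choose gamma from the training stage.

    The boundaries and gamma values are illustrative assumptions,
    not values reported in the paper.
    """
    for boundary, gamma in zip(boundaries, gammas):
        if epoch < boundary:
            return gamma
    return gammas[-1]


def grid_mask(image, period=4, keep=2):
    """Zero out a regular grid of squares in a 2D image (list of rows).

    Within every period x period tile, the pixels whose row and column
    offsets are both >= keep are set to 0. Parameters are assumptions.
    """
    return [
        [0 if (r % period >= keep and c % period >= keep) else px
         for c, px in enumerate(row)]
        for r, row in enumerate(image)
    ]


def parallel_loss(p_original, p_masked, epoch):
    """Assumed combination: sum the focal losses of the two branches.

    p_original and p_masked are the true-class probabilities that the
    same shared network assigns to the original and masked samples.
    """
    gamma = stage_gamma(epoch)
    return focal_loss(p_original, gamma) + focal_loss(p_masked, gamma)
```

In this sketch a larger gamma in later stages shrinks the loss of well-classified samples, so training emphasis shifts toward hard samples. Because both branches pass through the same network, the masked branch adds training signal without adding parameters or inference cost.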