Abstract: Strongly supervised recognition algorithms require large amounts of annotation, which consumes considerable manpower and material resources. To address this problem and meet practical requirements, two image recognition methods based on weakly supervised information are proposed for fine-grained visual classification (FGVC). The first combines ResNet with the Inception network, improving the ability to capture fine-grained features by optimizing the structure of the convolutional neural network. The second improves the Bilinear CNN model: the Inception-v3 and Inception-v4 modules proposed by Google serve as the feature extractors, and the resulting local features are aggregated for classification. Experimental results on the CUB-200-2011 and Stanford Cars fine-grained image datasets show that the proposed methods achieve classification accuracies of 88.3% and 94.2%, respectively, and deliver better classification performance.
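The aggregation step of a Bilinear CNN can be illustrated with a minimal sketch. It assumes two feature extractors have already produced per-location feature vectors over the same spatial grid (in the proposed method these would come from Inception-v3 and Inception-v4; here they are plain Python lists). The function name `bilinear_pool` and its arguments are hypothetical, and the signed square root and L2 normalization follow the commonly used Bilinear CNN formulation rather than details confirmed by this abstract.

```python
import math


def bilinear_pool(feats_a, feats_b):
    """Pool two feature maps into a single bilinear descriptor.

    feats_a: list of L vectors of length M (one per spatial location).
    feats_b: list of L vectors of length N, same L locations.
    Returns a flat list of length M * N.
    NOTE: hypothetical helper for illustration, not the authors' code.
    """
    if len(feats_a) != len(feats_b) or not feats_a:
        raise ValueError("both feature maps need the same non-zero number of locations")
    m, n = len(feats_a[0]), len(feats_b[0])

    # Sum the outer product a (x) b over all spatial locations.
    pooled = [0.0] * (m * n)
    for a, b in zip(feats_a, feats_b):
        for i in range(m):
            for j in range(n):
                pooled[i * n + j] += a[i] * b[j]

    # Signed square root: dampens large values while keeping the sign.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]

    # L2 normalization so the descriptor has unit length (if non-zero).
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Toy input: 2 locations, 2 channels from each (assumed) extractor.
descriptor = bilinear_pool(
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, -1.0]],
)
```

The resulting descriptor would then be passed to a linear classifier. Because the outer product pairs every channel of one extractor with every channel of the other, the descriptor captures pairwise interactions between local features, which is the property that makes bilinear models attractive for fine-grained categories.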