Abstract: To enable fruit-picking robots to accurately detect targets under complex conditions such as leaf occlusion and variation in fruit size, an improved YOLO (you only look once) model and an improved NMS (non-maximum suppression) algorithm are proposed. First, the traditional YOLO deep convolutional neural network architecture is upgraded: a more fine-grained SPP5 (spatial pyramid pooling) feature fusion module is designed to enhance the integration of multi-receptive-field information in the feature maps, on the basis of which a YOLOv4-SPP2-5 model is proposed. An improved SPP layer is added across layers of the standard YOLOv4 network and the pooling kernel sizes are redistributed to enlarge the receptive field, thus decreasing the false detection rate. Moreover, an improved Greedy-Confluence NMS algorithm is proposed: high-proximity detection boxes are suppressed directly, while overlapping detection boxes are judged by jointly considering Distance-Intersection over Union (DIoU) and weighted proximity (WP), which balances the computational cost of NMS and reduces the erroneous suppression of detection boxes, thereby improving detection accuracy for occluded and overlapping objects. Finally, performance tests are conducted to verify the feasibility of the method. The fruit training datasets are format-converted and annotated, expanded with data augmentation techniques, and clustered with the K-means++ approach to obtain a priori anchor boxes, after which the fruit detection experiments are carried out on a computer. The results demonstrate that the approach based on the improved YOLO network and NMS algorithm significantly increases fruit detection accuracy. The mean average precision (mAP) of the improved YOLOv4 model reaches 96.65%, which is 1.70% higher than that of the original network, while real-time performance is maintained at 39.26 frames per second on the test device.
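For illustration only, the sketch below shows how a DIoU-based suppression criterion of the kind referenced in the abstract might be implemented; the box format, the `diou` and `diou_nms` function names, and the threshold value are assumptions rather than details from the paper, and the weighted-proximity (WP) term of the authors' Greedy-Confluence NMS is not reproduced here.

```python
import numpy as np

def diou(box, boxes):
    """Distance-IoU between one box and an array of boxes.

    Boxes use (x1, y1, x2, y2) corner format. DIoU = IoU - d^2 / c^2, where d
    is the distance between box centers and c is the diagonal length of the
    smallest enclosing box.
    """
    # Intersection area
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)

    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter + 1e-9)

    # Squared distance between box centers
    cx_a, cy_a = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cx_b, cy_b = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
    d2 = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2

    # Squared diagonal of the smallest enclosing box
    ex1 = np.minimum(box[0], boxes[:, 0])
    ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2])
    ey2 = np.maximum(box[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    return iou - d2 / c2


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a candidate box when its DIoU with an
    already-kept box exceeds `threshold` (illustrative value)."""
    order = np.argsort(scores)[::-1]   # process boxes in descending score order
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        d = diou(boxes[i], boxes[rest])
        order = rest[d <= threshold]   # keep only boxes not suppressed by box i
    return keep
```

Compared with IoU-only greedy NMS, the center-distance penalty in DIoU makes it less likely that a detection of a neighboring, partially overlapping fruit is discarded, which is the failure mode the abstract's Greedy-Confluence variant targets.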