Abstract: Detection of personnel and equipment in underground coal mines is degraded by heavy dust, low illumination, mixed human-machine multi-target scenes, and cross-scale variation. To address this problem, we propose a machine-vision-based method for detecting key targets across multiple coal mine scenes. First, the YOLOv5s algorithm is optimised with a CGNet (context guided network) feature extraction module, a SlimNeck feature fusion module, and a Dyhead dynamic detection head to construct the YOLOv5s-CSD network model. Second, ablation, comparison, and embedded detection experiments are carried out on the YOLOv5s-CSD model using a self-constructed coal mine dataset. The experimental results show that YOLOv5s-CSD achieves a detection accuracy of 91.0% across four complex underground operation scenarios (tunneling, anchor support, coal mining, and auxiliary transport), which is 3.5% higher than the YOLOv5s algorithm. Compared with six mainstream target detection algorithms, including YOLOv9s, YOLOv11s, and YOLOv12s, it has moderate model complexity and the highest detection accuracy. On the experimental test platform, the real-time detection accuracy of the YOLOv5s-CSD model for seven types of key targets, such as person, support, and electric locomotive, is above 90.0%, and its detection speed reaches up to 38.6 frames/s. The model therefore combines high detection accuracy with real-time performance and can provide technical support for visual dynamic perception in the complex environment of underground coal mines.