Abstract: Tongue diagnosis in traditional Chinese medicine (TCM) assesses the deficiency and excess of the internal organs, as well as the vitality of their functions, by observing features of the tongue. It is non-invasive and convenient. With the rapid development and wide application of computer vision technology, it has become crucial to develop a model that can automatically detect, extract and recognize tongue features. To meet the demand for digital tongue diagnosis in TCM clinical practice and health monitoring, an automatic detection model for tooth-mark and fissure features of the tongue was proposed based on an improved RetinaNet. SimPSA-ResNet and SimSPPF modules were introduced into the RetinaNet backbone to enhance the feature extraction capability and robustness of the network. In addition, the multi-level feature pyramid network was improved so that the model integrates information across scales more effectively and focuses more accurately on the key information related to tongue features. Finally, to streamline the model's output, redundant output feature layers were removed and the remaining layers were fused through an Attention-guided Spatial Feature Fusion structure, which retains important features while improving the utilization of information within the network. The improved RetinaNet model was trained and evaluated on a self-built tongue image dataset, where it reached a mean average precision (mAP) of 94.37%, 2.77% higher than that of the original RetinaNet. The experimental results show that the improved model effectively raises the detection accuracy of tooth-mark and fissure features of the tongue, which gives it potential value for daily self-examination, health management and assisting doctors in diagnosis.
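The abstract names an Attention-guided Spatial Feature Fusion structure but does not describe its internals. As a minimal sketch, assuming it follows the common adaptive spatial fusion pattern (per-pixel softmax weights across pyramid levels), the core weighted sum could look like the NumPy code here. The function names are hypothetical, the feature maps are assumed to be already resized to a common resolution, and the weight logits, which a real network would predict with learned convolutions, are simply passed in as arrays.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def spatial_weighted_fusion(features, weight_logits):
    """Hypothetical sketch: fuse L pyramid levels with per-pixel weights.

    features: array of shape (L, C, H, W); all levels are assumed to have
        been resized to the same H x W beforehand.
    weight_logits: array of shape (L, H, W), one score per level and pixel
        (assumed given here; a trained model would compute these).
    Returns an array of shape (C, H, W).
    """
    # Normalize across levels so the weights at every pixel sum to 1.
    weights = softmax(weight_logits, axis=0)  # (L, H, W)
    # Broadcast the weights over the channel axis and sum over levels.
    return (weights[:, None, :, :] * features).sum(axis=0)
```

With all-zero logits every level receives weight 1/L, so the output reduces to a plain average of the levels; a large logit on one level makes the output follow that level at the corresponding pixels. This per-location weighting is what lets such a structure emphasize whichever scale is most informative at each position, rather than fixing the mixing ratio globally.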