Abstract: To address background interference and the limited size of available datasets, we propose a method for localizing the region of interest (ROI) in thyroid ultrasound images. The method uses an attention mechanism built on a cross-scale attention interaction strategy to improve the fusion of hierarchical features in the localization model. The feature network of the localization model is enhanced through knowledge distillation to alleviate overfitting. A t-mask is designed from the statistical distribution of thyroid anatomical morphology, and a joint attention mask is computed to guide the network toward the key channels and pixels of thyroid ultrasound images, thereby localizing the ROI. Experimental results show that the average precision (AP) of ROI localization in thyroid ultrasound images reaches 92.7% at an intersection-over-union (IoU) threshold of 0.5, a result of clinical value for assisting doctors in diagnosing thyroid diseases.
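The headline metric is AP at an IoU threshold of 0.5. As an illustration only, the sketch below shows one common way such a single-class figure is computed, using PASCAL VOC-style all-point interpolation. The box format `(x1, y1, x2, y2)`, the function names `iou` and `average_precision`, and the greedy matching rule are assumptions for illustration. This is not the evaluation code used in this work.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: List[Tuple[str, float, Box]],  # (image_id, confidence, predicted ROI box)
    gts: Dict[str, List[Box]],            # image_id -> ground-truth ROI boxes
    iou_threshold: float = 0.5,
) -> float:
    """Single-class AP with all-point interpolation (assumed VOC-style protocol)."""
    n_pos = sum(len(boxes) for boxes in gts.values())
    if n_pos == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp: List[int] = []
    fp: List[int] = []
    # Process predictions from most to least confident.
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A hit needs IoU >= threshold with a ground truth not already matched.
        if best_j >= 0 and best_iou >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)
            fp.append(1)
    # Build the cumulative precision/recall curve.
    recall: List[float] = []
    precision: List[float] = []
    tp_sum = fp_sum = 0
    for t, f in zip(tp, fp):
        tp_sum += t
        fp_sum += f
        recall.append(tp_sum / n_pos)
        precision.append(tp_sum / (tp_sum + fp_sum))
    # Make precision non-increasing in recall, then integrate the step curve.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
```

For example, with one ground-truth ROI per image, a top-ranked prediction that misses (IoU < 0.5) followed by a correct one yields AP = 0.5. Two correct predictions on two images yield AP = 1.0. Under these assumptions, a reported AP of 92.7% means the interpolated precision, averaged over recall, is 0.927 when a localization counts as correct only if its IoU with the annotated ROI is at least 0.5.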