Birdsong recognition using a densely connected convolutional network with feature fusion

CLC number: TN912.34

Birdsong recognition based on improved DenseNet with feature fusion
Affiliation:

1. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; 2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China


    Abstract:

    Current deep learning methods for birdsong recognition extract only a single type of deep feature, which limits their accuracy. To address this, this study proposes a birdsong recognition method based on an improved densely connected convolutional network (DenseNet) with feature fusion. First, Mel spectrograms are extracted from the birdsong signals as the network input. Then, with DenseNet as the base network, a convolutional block attention module is added after the standard convolutional layer of every dense block. By learning the feature representation of the training set, this module judges the importance and correlation of birdsong feature information at different levels and further weights and fuses it along the channel and spatial dimensions, so that the network pays more attention to the important feature channels and spatial positions and learns birdsong features more effectively. Next, a dropout block algorithm is added after the standard convolutional layer of the dense blocks, which encourages the network to learn features from different regions in a more balanced way, improves its adaptability to new birdsong data, and helps it capture the features common to the data. A Transformer encoder is then used to build a second deep feature extraction branch, which strengthens the network's ability to capture global information and long-distance dependencies in birdsong features. Finally, the deep features extracted by the two branches are fused to enrich their information content. Seven groups of experiments were conducted on the Xeno-Canto dataset. Results on the test set show that the proposed method achieves an average recognition accuracy of 88.65%, which is 10.83% higher than the ensemble multi-scale convolutional neural network (EMSCNN) method, 20.14% higher than AlexNet, 16.3% higher than VGGNet, and 4.28% higher than DenseNet. The experiments demonstrate the effectiveness and advanced performance of the proposed method, which recognizes birdsong more accurately than the compared deep learning methods and can be applied to birdsong recognition in practice.
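The accuracy comparison in the abstract can be unpacked with a little arithmetic. The sketch below recovers the baseline accuracies implied by the reported gains. It assumes that each "higher by" figure is an absolute difference in percentage points, which the abstract does not state explicitly; if the gains are relative improvements, the implied baselines would differ. The names `PROPOSED_ACCURACY`, `REPORTED_GAINS`, and `implied_baselines` are illustrative and do not come from the paper.

```python
from decimal import Decimal

# Figures quoted in the abstract (average accuracy and reported gains, in %).
PROPOSED_ACCURACY = Decimal("88.65")
REPORTED_GAINS = {
    "EMSCNN": Decimal("10.83"),
    "AlexNet": Decimal("20.14"),
    "VGGNet": Decimal("16.3"),
    "DenseNet": Decimal("4.28"),
}


def implied_baselines(proposed, gains):
    """Return baseline accuracies, ASSUMING each gain is in percentage points."""
    return {name: proposed - gain for name, gain in gains.items()}


baselines = implied_baselines(PROPOSED_ACCURACY, REPORTED_GAINS)
for name, accuracy in baselines.items():
    print(f"{name}: {accuracy}%")
```

Under this reading, the unmodified DenseNet baseline would sit at 84.37%, so the attention module, the dropout block algorithm, and the Transformer branch together would account for the remaining 4.28 points. `Decimal` is used instead of floats so that the subtraction is exact to two decimal places.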

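Cite this article

Chen Xiao, Yan Hao, Zeng Zhaoyou. Birdsong recognition based on improved DenseNet with feature fusion [J]. Journal of Electronic Measurement and Instrumentation, 2025, 39(5): 241-250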

History
  • Published online: 2025-07-04