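Jia Linfeng, Wu Liming, Wen Tengteng, Liao Yutao, Gao Zihao. Speech separation in time-and-frequency domain based on multi-scale convolution[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(11): 134-140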
Speech separation in time-and-frequency domain based on multi-scale convolution
  
DOI:
Keywords: speech separation; feature fusion; multi-scale convolution; time-frequency domain features
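Funding: Supported by the National Natural Science Foundation of China (61705045) and the Innovation and Entrepreneurship Talent Team Program of the Foshan Guangdong University of Technology Research Institute (20191108)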
Author  Institution
Jia Linfeng  1. School of Electromechanical Engineering, Guangdong University of Technology
Wu Liming  1. School of Electromechanical Engineering, Guangdong University of Technology
Wen Tengteng  1. School of Electromechanical Engineering, Guangdong University of Technology
Liao Yutao  1. School of Electromechanical Engineering, Guangdong University of Technology
Gao Zihao  1. School of Electromechanical Engineering, Guangdong University of Technology
Abstract views: 1393
Full-text downloads: 1561
Abstract:
      In deep-learning-based separation of mixed speech, methods that use time-domain features of the signal outperform those that use frequency-domain features. However, current time-domain speech separation methods are not robust in real noisy environments, and relying on time-domain features alone limits the performance of the separation model. This paper therefore proposes a multi-feature speech separation method based on the Conv-TasNet network that fuses frequency-domain and time-domain features to enrich the multidimensional information in the data. To further improve the separation network, multi-scale convolution blocks are introduced to strengthen its feature extraction ability. In experiments that include real noise, the proposed method outperforms the Conv-TasNet model and the latest time-frequency fusion speech separation baseline by 0.91 dB and 0.52 dB, respectively, improving both separation performance and robustness.
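The two ideas named in the abstract, multi-scale convolution and fusion of time-domain with frequency-domain features, can be illustrated with a small sketch. This is not the authors' network: the function names are hypothetical, fixed averaging filters stand in for learned convolution kernels, the kernel sizes (3, 5, 7) are arbitrary, and plain concatenation is only an assumed fusion scheme.

```python
import math

# Illustrative sketch only; not the Conv-TasNet-based model from the paper.

def conv1d_same(signal, kernel):
    """Cross-correlate a 1-D sequence with an odd-length kernel.
    Zero padding keeps the output the same length as the input."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def multi_scale_block(signal, kernel_sizes=(3, 5, 7)):
    """Apply parallel filters of several widths; one output channel per width.
    Averaging filters are an assumption standing in for learned kernels."""
    return [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]

def frame_magnitude_spectrum(frame):
    """Magnitude of a naive O(n^2) DFT of one frame, bins 0..n//2."""
    n = len(frame)
    mags = []
    for b in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * b * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * b * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def fuse_features(frame, kernel_sizes=(3, 5, 7)):
    """Concatenate multi-scale time-domain features with spectral magnitudes."""
    time_feats = [v for ch in multi_scale_block(frame, kernel_sizes) for v in ch]
    freq_feats = frame_magnitude_spectrum(frame)
    return time_feats + freq_feats

# A 16-sample frame holding two cycles of a sine wave.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
fused = fuse_features(frame)
print(len(fused))  # 3 channels * 16 samples + 9 frequency bins = 57
```

Each kernel width sees a different amount of temporal context, which is the intuition behind a multi-scale block; the spectral magnitudes add information that a single time-domain view does not expose directly.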