Journal of Electronic Measurement and Instrumentation

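Jia Linfeng, Wu Liming, Wen Tengteng, Liao Yutao, Gao Zihao. Speech separation in time-and-frequency domain based on multi-scale convolution[J]. Journal of Electronic Measurement and Instrumentation, 2022, 36(11): 134-140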

Speech separation in time-and-frequency domain based on multi-scale convolution

Keywords: speech separation; feature fusion; multi-scale convolution; time-frequency domain characteristics

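Funding: Supported by the National Natural Science Foundation of China (61705045) and the Innovation and Entrepreneurship Talent Team Program of the Foshan Research Institute of Guangdong University of Technology (20191108)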

Author	Institution
Jia Linfeng	1.School of Electromechanical Engineering, Guangdong University of Technology
Wu Liming	1.School of Electromechanical Engineering, Guangdong University of Technology
Wen Tengteng	1.School of Electromechanical Engineering, Guangdong University of Technology
Liao Yutao	1.School of Electromechanical Engineering, Guangdong University of Technology
Gao Zihao	1.School of Electromechanical Engineering, Guangdong University of Technology



Abstract:

In mixed-speech separation, deep-learning methods that use time-domain signal features outperform those that use frequency-domain features. However, current time-domain speech separation methods are not robust in real noisy environments, and relying on a single time-domain feature limits the performance of the separation model. Therefore, a multi-feature speech separation method based on the Conv-TasNet network is proposed, which fuses frequency-domain features with time-domain features to enrich the multidimensional information available in the data. To further improve the separation network, a multi-scale convolution block is introduced to strengthen the network's feature-extraction ability. In an experimental environment containing real noise, the proposed method improves performance by 0.91 dB and 0.52 dB over the Conv-TasNet model and the latest time-frequency fusion speech separation baseline model, respectively, effectively improving both the performance and the robustness of speech separation.
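The abstract does not give layer-level details, so the following is only a minimal NumPy sketch of the two ideas it names, under stated assumptions: (1) fusing a Conv-TasNet-style time-domain encoding (a linear basis applied to short overlapping frames, followed by ReLU) with a frequency-domain feature (here a log-magnitude STFT on the same frames), and (2) a multi-scale convolution block built from parallel 1-D convolutions with different kernel sizes. The frame length, number of basis filters, concatenation as the fusion operator, kernel sizes 3/5/7, and all function names are hypothetical choices for illustration, not the authors' configuration; random weights stand in for trained parameters.

```python
import numpy as np

# Hypothetical hyperparameters (assumptions, not taken from the paper).
WIN = 32        # frame length in samples (short Conv-TasNet-style window)
HOP = WIN // 2  # 50% overlap between frames
N_BASIS = 64    # number of encoder basis filters


def frame_signal(x, win=WIN, hop=HOP):
    """Split a 1-D signal into overlapping frames of shape (n_frames, win)."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def time_domain_features(frames, basis):
    """Conv-TasNet-style encoder: project frames onto a basis, then ReLU."""
    return np.maximum(frames @ basis.T, 0.0)  # (n_frames, n_basis)


def freq_domain_features(frames):
    """Log-magnitude STFT computed on the same frames with a Hann window."""
    spec = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
    return np.log1p(np.abs(spec))  # (n_frames, win // 2 + 1)


def fuse(time_feat, freq_feat):
    """Fuse by concatenation along the feature axis (one possible choice)."""
    return np.concatenate([time_feat, freq_feat], axis=1)


def multi_scale_conv(feat, kernels):
    """Parallel depthwise 1-D convolutions over time, one branch per kernel.

    Each kernel has shape (kernel_size, n_channels) with an odd kernel_size;
    'same' zero padding keeps the time length. As in deep-learning conv
    layers, this is a cross-correlation. Branch outputs are concatenated.
    """
    outputs = []
    for k in kernels:
        ksize = k.shape[0]
        pad = ksize // 2
        padded = np.pad(feat, ((pad, pad), (0, 0)))
        out = np.zeros_like(feat)
        for t in range(feat.shape[0]):
            out[t] = np.sum(padded[t:t + ksize] * k, axis=0)
        outputs.append(out)
    return np.concatenate(outputs, axis=1)


rng = np.random.default_rng(0)
mixture = rng.standard_normal(8000)  # stand-in for 1 s of 8 kHz mixed speech
frames = frame_signal(mixture)
basis = rng.standard_normal((N_BASIS, WIN)) / np.sqrt(WIN)
fused = fuse(time_domain_features(frames, basis), freq_domain_features(frames))
kernels = [rng.standard_normal((s, fused.shape[1])) * 0.1 for s in (3, 5, 7)]
ms = multi_scale_conv(fused, kernels)
print(frames.shape, fused.shape, ms.shape)  # (499, 32) (499, 81) (499, 243)
```

Concatenation keeps both feature streams intact and leaves it to later layers to weight them; the parallel kernels let each branch see a different temporal context (3, 5, or 7 frames here), which is the usual motivation for multi-scale convolution. In a real separator these weights would be learned, and the multi-scale block would sit inside the separation network rather than directly on the fused encoder output.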
