Speech enhancement based on residual dilatation convolutional and gated codec networks
Affiliation:

1.School of Computer Science and Technology, Shandong University of Technology, Zibo 255049, China; 2.School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255049, China

CLC Number:

TN912.35

    Abstract:

    The time-dependent features and context information of speech signals are crucial in speech enhancement tasks. To address the problem that codec (encoder-decoder) networks capture these features insufficiently, which degrades enhancement performance, an asymmetric residual dilated convolutional and gated codec network (RD-EGN) is constructed. The network comprises three parts: an encoder, an intermediate layer, and a decoder. The encoder uses a causal convolution layer structure to model temporal features, capture features at different levels of the speech sequence, and preserve the causality of the speech signal. The intermediate layer incorporates a residual dilated convolutional network (RDCN), which integrates dilated convolution, residual connections, and cascaded expansion blocks to give the network a larger receptive field, facilitate cross-layer information transfer, and extract long-term dependency features in speech. The RDCN is combined with a long short-term memory (LSTM) network to capture broader context information. The decoder introduces a gating mechanism that dynamically adjusts the degree of information flow, obtains richer global features, and reconstructs the enhanced speech. Ablation and performance comparison experiments were conducted on the TIMIT, UrbanSound8k, VoiceBank, and NOISE92 datasets. The results show that RD-EGN has fewer trainable parameters and higher scores in SSNR and the subjective evaluation metrics (CSIG, CBAK, and COVL) than CRN, AECNN, and DDAEC. In the objective evaluation metrics, PESQ increases by 2.5% to 7.1% and STOI increases by 1% to 5.3%. RD-EGN demonstrates outstanding enhancement performance and generalization ability.
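
    The three mechanisms named in the abstract (causal convolution, cascaded dilated convolution with residual connections, and gating) can be illustrated with a minimal pure-Python sketch. This is not the authors' RD-EGN implementation: the function names, the kernel weights, the dilation schedule, and the sigmoid form of the gate are all assumptions chosen only to show the general ideas.

```python
import math

def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal 1-D convolution: out[t] depends only on x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation  # look back only, never at future samples
            if idx >= 0:            # implicit zero padding on the left
                acc += w * x[idx]
        out.append(acc)
    return out

def residual_dilated_stack(x, weights, dilations):
    """Cascade of dilated causal convolutions, each with a residual connection.

    Sharing one kernel across layers is a simplification (assumption).
    """
    h = list(x)
    for d in dilations:
        y = causal_dilated_conv1d(h, weights, d)
        h = [a + b for a, b in zip(h, y)]  # residual (skip) connection
    return h

def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def gated_output(features, gate_logits):
    """Element-wise gate: a sigmoid in (0, 1) scales how much of each feature passes."""
    return [f / (1.0 + math.exp(-g)) for f, g in zip(features, gate_logits)]
```

    With a kernel size of 3 and the hypothetical dilation schedule [1, 2, 4, 8], `receptive_field(3, [1, 2, 4, 8])` evaluates to 31, whereas four undilated layers of the same kernel size would cover only 9 samples. This is the sense in which cascaded dilation enlarges the receptive field without adding parameters.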

History
  • Online: June 10, 2025