Abstract:In mixed speech separation, the performance of signal time-domain features is better than that of frequency-domain features. However, the current speech separation methods based on time domain feature have poor robustness in real noise environment, and single time domain feature has limitations on the performance of the separation model. Therefore, a multi-feature speech separation method based on Conv-TasNet network is proposed, which integrates frequency domain features and time domain features to improve multidimensional information of data. In order to further improve the performance of separation network, multi-scale convolution block is introduced to improve the feature extraction ability of network. Compared with the Conv-TasNet model and the latest time-frequency fusion speech separation baseline model, the performance and robustness of the proposed method are improved by 0. 91 and 0. 52 dB respectively in the experimental environment containing real noise.