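Abstract: To improve the recognition rate of speech recognition systems in complex acoustic scenes, robust speech recognition systems have emerged that use monaural speech enhancement as front-end processing. Although existing monaural speech enhancement techniques improve the recognition rate under reverberant interference, they have not significantly improved it under wide-band non-stationary noise. To address this, this paper proposes a monaural enhancement method based on an auditory masking generative adversarial network, in which an adversarial process between an auditory masking enhancement model and a discriminator drives the enhanced speech features to follow the probability distribution of the target speech. Experimental results show that, in terms of speech recognition rate, the proposed auditory masking generative adversarial network outperforms existing enhancement methods, achieving a relative word error rate reduction of 19.50% and significantly improving the noise robustness of the speech recognition system.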
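Keywords: auditory masking; generative adversarial network; monaural speech enhancement; robust speech recognition
Article ID: 2095-2163(2021)03-0209-06  CLC number: TP183  Document code: A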
【Abstract】To improve the accuracy of speech recognition systems in complex acoustic scenes, monaural speech enhancement has been introduced into robust automatic speech recognition (ASR) systems as front-end processing. Although existing monaural speech enhancement methods improve recognition performance under reverberant conditions, they fail to significantly improve accuracy for speech corrupted by wide-band non-stationary noise. To overcome this problem, this paper proposes a generative adversarial network based on auditory masking for monaural speech enhancement. Through the adversarial process between a discriminator and a masking-based enhancement model, the proposed method makes the enhanced speech features follow the distribution of the target speech. Experimental results show that, in terms of recognition accuracy, the proposed network outperforms existing enhancement methods, achieving a relative word error rate reduction of 19.50% and significantly improving the noise robustness of the ASR system.
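The adversarial process between the masking-based enhancement model and the discriminator can be illustrated with a small sketch. The abstract does not specify the network architectures or the exact training objective, so everything below is an assumption made for illustration: the per-bin sigmoid mask, the linear discriminator, the binary cross-entropy GAN losses, the toy random features, and all function and parameter names (`enhance`, `discriminate`, `adversarial_losses`, `w_g`, `w_d`) are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def enhance(noisy_frame, w_g, b_g):
    """Hypothetical enhancement model: one mask value in (0, 1) per
    frequency bin, multiplied element-wise with the noisy features."""
    mask = [sigmoid(w * x + b) for x, w, b in zip(noisy_frame, w_g, b_g)]
    enhanced = [m * x for m, x in zip(mask, noisy_frame)]
    return enhanced, mask


def discriminate(frame, w_d, b_d):
    """Hypothetical discriminator: probability that a frame is target speech."""
    return sigmoid(sum(w * x for w, x in zip(w_d, frame)) + b_d)


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for a single probability (assumed loss form)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def adversarial_losses(clean_frames, noisy_frames, w_g, b_g, w_d, b_d):
    """Average discriminator and enhancement-model losses over all frames."""
    d_loss, g_loss = 0.0, 0.0
    for clean, noisy in zip(clean_frames, noisy_frames):
        enhanced, _ = enhance(noisy, w_g, b_g)
        p_real = discriminate(clean, w_d, b_d)
        p_fake = discriminate(enhanced, w_d, b_d)
        # Discriminator: label target speech as 1, enhanced speech as 0.
        d_loss += bce(p_real, 1.0) + bce(p_fake, 0.0)
        # Enhancement model: make enhanced speech look like target speech.
        g_loss += bce(p_fake, 1.0)
    n = len(clean_frames)
    return d_loss / n, g_loss / n


# Toy non-negative "spectral" features; random values, not real speech data.
random.seed(0)
bins, frames = 4, 3
clean_frames = [[abs(random.gauss(0, 1)) for _ in range(bins)]
                for _ in range(frames)]
noisy_frames = [[c + abs(random.gauss(0, 1)) for c in frame]
                for frame in clean_frames]
w_g = [random.gauss(0, 0.1) for _ in range(bins)]
b_g = [0.0] * bins
w_d = [random.gauss(0, 0.1) for _ in range(bins)]
b_d = 0.0

d_loss, g_loss = adversarial_losses(clean_frames, noisy_frames,
                                    w_g, b_g, w_d, b_d)
print("discriminator loss:", d_loss)
print("enhancement-model loss:", g_loss)
```

In an actual training loop the two losses would be minimized alternately with respect to the discriminator and the enhancement-model parameters; the sketch only evaluates them once to show how the two objectives pull in opposite directions.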