Anti-Noise Audio-Visual Speech Recognition Based on Information Fusion

Abstract: To address the poor noise robustness of continuous speech recognition based on single-modality audio alone, we propose an information-fusion-based anti-noise audio-visual speech recognition (AAVSR) model. The model uses an attention mechanism to autonomously learn the correspondences between the audio and video streams; guided by these correspondences, the features extracted from the two streams are fused so that each modality supplements the information missing from the other, improving information utilization and enhancing robustness. Experiments on the LRS2 dataset show that, under additive noise at various signal-to-noise ratios, the proposed model achieves a lower word error rate than several competing baseline models.
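The abstract does not spell out the fusion architecture, but the described mechanism (attention that aligns the video stream to the audio stream, then concatenation of the attended features) can be sketched as single-head scaled dot-product cross-modal attention. This is an illustrative sketch, not the paper's implementation; all function names, dimensions, and frame counts below are assumed for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fuse(audio, video):
    """Fuse audio and video feature sequences via cross-modal attention.

    audio: (T_a, d) acoustic feature sequence, used as queries
    video: (T_v, d) visual feature sequence, used as keys and values
    Returns (fused, weights):
      fused:   (T_a, 2*d) each audio frame concatenated with its
               attention-weighted summary of the video stream
      weights: (T_a, T_v) learned-style soft alignment (rows sum to 1)
    """
    d = audio.shape[-1]
    scores = audio @ video.T / np.sqrt(d)   # (T_a, T_v) alignment scores
    weights = softmax(scores, axis=-1)      # soft audio-to-video correspondence
    attended = weights @ video              # video info aligned to audio frames
    fused = np.concatenate([audio, attended], axis=-1)
    return fused, weights

# Hypothetical shapes: 50 audio frames and 25 video frames, 64-dim features.
rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 64))
video = rng.standard_normal((25, 64))
fused, w = cross_modal_fuse(audio, video)
print(fused.shape)  # (50, 128)
```

In a trained model the queries, keys, and values would pass through learned projections, and the fused sequence would feed a recognition backbone; the concatenation here mirrors the "supplement the missing information of each modality" idea from the abstract.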
