ROBUST FAR-FIELD SPEECH RECOGNITION BASED ON THE FRAME-BY-FRAME ITERATIVE POWER METHOD

Abstract: To enable real-time far-field speech recognition, a robust far-field speech recognition method based on the frame-by-frame iterative power method is proposed. Within the maximum-likelihood distortionless response (MLDR) beamforming framework, the variance-weighted spatial covariance matrix of the observed noisy speech signal is estimated by an iterative update rule. Steering vector estimation and beamforming are computed iteratively frame by frame and serve both beamforming and dereverberation, and the power method is introduced to allow online processing. In addition, a trained neural network refines the zero-mean Gaussian distribution with time-varying variance. Experiments on several datasets demonstrate the effectiveness of the proposed method.
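
The frame-by-frame idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the variance-weighted spatial covariance matrix is tracked by a recursive average with a hypothetical forgetting factor `alpha`, that the per-frame variances are supplied externally (for example by a neural network), and that the steering vector is taken as the dominant eigenvector of that matrix, refined by one power-method step per frame. All function and parameter names are illustrative.

```python
import cmath
import math


def weighted_outer(x, lam):
    """Return x x^H / lam for one multichannel frame x (list of complex)."""
    return [[xi * xj.conjugate() / lam for xj in x] for xi in x]


def matvec(R, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(rij * vj for rij, vj in zip(row, v)) for row in R]


def normalize(v):
    """Scale v to unit norm and rotate it so the first entry is real and non-negative."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    phase = cmath.exp(-1j * cmath.phase(v[0])) if abs(v[0]) > 0 else 1.0
    return [c * phase / norm for c in v]


def online_power_method(frames, variances, alpha=0.9):
    """Track the dominant eigenvector of a recursively averaged,
    variance-weighted spatial covariance matrix, one power step per frame.

    frames    : list of frames, each a list of complex channel values
    variances : per-frame variance used as the weight (assumed given)
    alpha     : forgetting factor of the recursive average (assumption)
    """
    m = len(frames[0])
    R = [[0j] * m for _ in range(m)]
    v = normalize([1 + 0j] * m)  # arbitrary non-zero starting vector
    for x, lam in zip(frames, variances):
        U = weighted_outer(x, lam)
        # Recursive update of the variance-weighted covariance estimate.
        R = [[alpha * R[i][j] + (1 - alpha) * U[i][j] for j in range(m)]
             for i in range(m)]
        # One power-method step; keep the old estimate if the product vanishes.
        u = matvec(R, v)
        if any(abs(c) > 0 for c in u):
            v = normalize(u)
    return v
```

Calling `online_power_method(frames, variances)` with one complex vector per frame returns a unit-norm steering vector estimate. A full MLDR beamformer would additionally need the inverse of the weighted covariance matrix, which this sketch leaves out.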