Abstract:
To achieve real-time far-field speech recognition, a robust method based on a frame-by-frame iterative power method is proposed. Within the maximum-likelihood distortionless response (MLDR) beamforming framework, the variance-weighted spatial covariance matrix of the noisy observed speech signal is estimated with iterative update rules. Steering vector estimation and beamformer computation are carried out iteratively, frame by frame, so that beamforming and dereverberation are performed jointly, and the power method is introduced to make the processing online. In addition, a trained neural network is used to improve the estimate of the time-varying variance in the zero-mean Gaussian model of the speech signal. Experiments on several datasets demonstrate the effectiveness of the proposed method.
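
As a rough illustration of the frame-by-frame power-method idea, the sketch below tracks a steering vector by taking a single power-iteration step per frame on a recursively averaged spatial covariance matrix. It is not the authors' update rule: the function name `track_steering_vector`, the forgetting factor `forgetting`, the all-ones initialization, and the choice of channel 0 as the reference are assumptions made for the example, and the variance weighting and dereverberation described above are omitted.

```python
import math


def outer(x):
    """Return the rank-1 Hermitian matrix x x^H as a nested list."""
    return [[xi * xj.conjugate() for xj in x] for xi in x]


def matvec(a, v):
    """Multiply matrix a (nested list) by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def normalize(v):
    """Scale v to unit Euclidean norm; leave it unchanged if it is all zeros."""
    norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
    return v if norm == 0 else [vi / norm for vi in v]


def track_steering_vector(frames, forgetting=0.95):
    """Hypothetical online estimator: one power-method step per frame.

    frames: list of per-frame multichannel observations (lists of complex)
            for a single frequency bin.
    forgetting: assumed recursive-averaging factor for the covariance.
    Returns the estimate normalized so that channel 0 equals 1
    (assumes the channel-0 entry of the estimate is nonzero).
    """
    n = len(frames[0])
    cov = [[0j] * n for _ in range(n)]
    v = normalize([1 + 0j] * n)  # assumed initial guess
    for x in frames:
        xx = outer(x)
        # Recursive covariance update: cov <- f * cov + (1 - f) * x x^H
        cov = [
            [forgetting * c + (1 - forgetting) * o for c, o in zip(crow, orow)]
            for crow, orow in zip(cov, xx)
        ]
        # Single power-method step toward the principal eigenvector of cov
        v = normalize(matvec(cov, v))
    ref = v[0]
    return [vi / ref for vi in v]


# Usage: noise-free frames that all share the same spatial signature
true_vector = [1 + 0j, 0.5j, -0.5 + 0j]
demo_frames = [[s * a for a in true_vector] for s in (1 + 1j, 0.3 - 0.7j, -2 + 0.5j)]
estimate = track_steering_vector(demo_frames)
```

Performing only one multiplication by the covariance matrix per frame keeps the per-frame cost low, which is the property that makes a power-method update attractive for online processing compared with a full eigendecomposition on every frame.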