THE SEQ2SEQ MODEL FOR ASSISTING SPEAKER VERIFICATION ON SHORT UTTERANCES

    Abstract: A text-independent speaker verification system degrades as the test utterance becomes shorter. To address this, a method of enhancing acoustic features is proposed to assist the system. A generation model based on seq2seq (Sequence to Sequence) is used to generate longer acoustic features from short-duration acoustic features: the encoder extracts deep features, the decoder outputs acoustic features, and an attention mechanism captures the relationship between the sequences. During training, a cosine distance loss is added to improve the generalization performance of the generation model, and a pretrained text-independent speaker verification model is used as a component of the generation model's training architecture. Experimental results show that for utterances of 1-3 s duration, the system's equal error rate is reduced by 7.78% on average after applying this model.
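The abstract states that the generation model is trained with an added cosine distance loss, using embeddings from the pretrained speaker verification model. The paper does not give the exact objective, so the following is a minimal sketch of such a combined loss, assuming a mean-squared-error reconstruction term and a hypothetical weighting factor `alpha`; the function and parameter names are illustrative, not from the paper.

```python
import numpy as np

def cosine_distance(a, b):
    """1 minus the cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def generation_loss(generated_feats, target_feats, emb_gen, emb_ref, alpha=0.5):
    """Sketch of a combined training objective (assumptions: MSE
    reconstruction term, weighting factor alpha -- both hypothetical).

    generated_feats / target_feats: (T, D) acoustic feature matrices.
    emb_gen / emb_ref: speaker embeddings produced by the frozen,
    pretrained speaker verification model for the generated and the
    reference (long) utterance, respectively.
    """
    # Reconstruction term: how close the generated features are to the target.
    recon = np.mean((generated_feats - target_feats) ** 2)
    # Speaker term: keep the generated utterance close to the reference
    # speaker in embedding space, as the added cosine distance loss does.
    speaker = cosine_distance(emb_gen, emb_ref)
    return recon + alpha * speaker
```

Under this formulation, a generated utterance that matches the target features exactly and maps to the same speaker embedding incurs zero loss; the `alpha` term trades reconstruction fidelity against speaker similarity.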

     

