Multi-Task Learning Speech Recognition Model of Civil Aviation Radiotelephony Communication Based on Conformer Model

马广林; 任晋; 师一华; 张海刚; 王莉; 杨金锋

doi:10.3969/j.issn.1000-386x.2025.10.025

Abstract: Civil aviation radiotelephony communication has distinctive industry-specific characteristics in pronunciation, wording and sentence construction, and communication procedure. General-purpose speech recognition models cannot adequately adapt to these characteristics when performing acoustic modeling of radiotelephony speech. To address this problem, this paper proposes an end-to-end multi-task learning Conformer model for speech recognition of civil aviation radiotelephony communication. By introducing convolution modules into the Transformer model, the Conformer retains the ability to model global information with long-range contextual dependencies while further strengthening the capture of local information. In addition, connectionist temporal classification (CTC) is combined with an attention-based encoder-decoder (AED) model for multi-task learning to further improve performance. Experimental results show that the proposed method effectively accounts for both global and local information in acoustic modeling, reducing the character error rate (CER) and sentence error rate (SER) on the radiotelephony communication dataset to 1.98% and 2.89%, respectively.
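The two metrics reported in the abstract, and the way two task losses are typically combined in joint CTC/attention training, can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: CER is taken as the total character-level edit distance divided by the total number of reference characters, SER as the fraction of utterances whose hypothesis differs from the reference, and the multi-task objective as a linear interpolation of the CTC and attention losses. The function names and the default `ctc_weight` value are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (CER, SER) over (reference, hypothesis) transcript pairs.

    Assumed definitions: CER = character edits / reference characters,
    SER = utterances with at least one error / all utterances.
    """
    total_chars = sum(len(ref) for ref, _ in pairs)
    char_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    wrong_sentences = sum(1 for ref, hyp in pairs if ref != hyp)
    return char_errors / total_chars, wrong_sentences / len(pairs)


def joint_loss(ctc_loss: float, attention_loss: float,
               ctc_weight: float = 0.3) -> float:
    """Linear interpolation of the two task losses.

    ctc_weight is a hypothetical hyperparameter; the value used by the
    authors is not given in the abstract.
    """
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Toy transcripts for illustration only; they are not from the paper's dataset.
toy_pairs = [("abcd", "abcd"), ("abcd", "abxd"), ("ab", "abc")]
cer, ser = cer_and_ser(toy_pairs)
```

Under these definitions a single wrong character marks the whole utterance as an error, so SER is usually at least as high as CER on the same data, which is consistent with the two figures quoted in the abstract.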
