A METHOD FOR CHINESE-VIETNAMESE CROSS-LANGUAGE SUMMARIZATION GENERATION GUIDED BY MULTI-TYPE WORD INFORMATION

Abstract: Most current cross-language summarization methods rely on machine translation. Because translation quality is poor for low-resource languages such as Vietnamese, Chinese-Vietnamese cross-language summarization faces the problem of difficult bilingual semantic alignment under data scarcity. To address this problem, this paper proposes a Chinese-Vietnamese cross-language summary generation method guided by multi-type word information. First, explicit keyword information is used to guide the encoding of important information in the source text. Second, word alignment information from an external Chinese-Vietnamese bilingual probabilistic dictionary is used to guide the encoder-decoder in aligning key information across the two languages. Finally, based on the pointer-generator network, both types of word information are applied to the generation of Vietnamese summaries. Experimental results on a constructed Chinese-Vietnamese cross-language summarization dataset show that the model can effectively improve the quality of the generated cross-language summaries.
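The abstract does not give the model's equations, so the following is only an illustrative sketch of how a bilingual probabilistic dictionary could be combined with a pointer-generator style output layer. It assumes the final distribution mixes a vocabulary distribution with attention-weighted translation probabilities, P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_i a_i · P_trans(w | x_i). The function name `mix_distributions`, the parameter names, and the toy dictionary entries are all hypothetical and are not taken from the paper; keyword guidance of the encoder is not modelled here.

```python
from typing import Dict, List


def mix_distributions(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
    trans_dict: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Mix a generation distribution with dictionary-guided copy probabilities.

    Assumed formula (not confirmed by the source):
        P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum_i a_i * P_trans(w | x_i)
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    # Generation part: scale the decoder's vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}

    # Copy part: each attended Chinese token contributes its Vietnamese
    # translation candidates, weighted by attention and dictionary probability.
    for weight, src in zip(attention, source_tokens):
        for tgt, p_trans in trans_dict.get(src, {}).items():
            final[tgt] = final.get(tgt, 0.0) + (1.0 - p_gen) * weight * p_trans

    # Source tokens missing from the dictionary drop probability mass,
    # so renormalise to keep a valid distribution.
    total = sum(final.values())
    return {word: prob / total for word, prob in final.items()}


# Toy example with made-up probabilities (hypothetical dictionary entries).
example = mix_distributions(
    p_gen=0.5,
    vocab_dist={"kinh_tế": 0.5, "và": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["越南", "经济"],
    trans_dict={
        "越南": {"Việt_Nam": 1.0},
        "经济": {"kinh_tế": 0.9, "nền_kinh_tế": 0.1},
    },
)
```

In this toy case the Vietnamese word "Việt_Nam" is absent from the decoder vocabulary distribution but still receives probability through the dictionary, which is the kind of bilingual alignment signal the abstract describes.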

     
