Abstract:
Most current cross-language summarization approaches rely on machine translation. However, because translation quality is poor for low-resource languages such as Vietnamese, Chinese-Vietnamese cross-language summarization faces the problem of bilingual semantic alignment under data scarcity. To address this problem, this paper proposes a method for Chinese-Vietnamese cross-language summarization guided by multiple types of word information. First, explicit keyword information is used as guidance for encoding the important information in the source text. Second, word alignment information from an external Chinese-Vietnamese bilingual probability dictionary is used to guide the bilingual alignment of key information. Finally, both types of word information are incorporated into the summary generation process of a pointer-generator network. Experiments on a constructed Chinese-Vietnamese cross-language summarization dataset show that the model effectively improves the quality of the generated cross-language summaries.
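As a rough illustration of how a bilingual probability dictionary could guide a pointer-generator decoder, the sketch below mixes a generation distribution with a dictionary-translated copy distribution at a single decoding step. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not give the formula, so the mixing rule, the function name `mix_distributions`, the parameter names, and the toy tokens and probabilities are all hypothetical. Keyword guidance in the encoder is not modeled here.

```python
from collections import defaultdict


def mix_distributions(p_gen, vocab_dist, attention, source_tokens, prob_dict):
    """Hypothetical single-step mix of generation and translated-copy probabilities.

    p_gen         -- probability of generating from the target vocabulary
    vocab_dist    -- dict: Vietnamese word -> generation probability
    attention     -- attention weights, one per source token
    source_tokens -- Chinese source tokens, aligned with `attention`
    prob_dict     -- dict: Chinese token -> {Vietnamese word: translation prob}
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")

    final = defaultdict(float)

    # Generation path: scale the vocabulary distribution by p_gen.
    for word, prob in vocab_dist.items():
        final[word] += p_gen * prob

    # Copy path: route each source token's attention mass through its
    # dictionary translations (assumed rule, not taken from the paper).
    for weight, token in zip(attention, source_tokens):
        for target_word, p_trans in prob_dict.get(token, {}).items():
            final[target_word] += (1.0 - p_gen) * weight * p_trans

    # Source tokens missing from the dictionary contribute no copy mass,
    # so renormalize to keep a valid probability distribution.
    total = sum(final.values())
    if total == 0.0:
        raise ValueError("no probability mass to normalize")
    return {word: prob / total for word, prob in final.items()}


# Toy inputs, invented purely for illustration.
toy_dict = {"越南": {"Việt_Nam": 1.0}, "经济": {"kinh_tế": 0.9, "nền": 0.1}}
toy_vocab = {"Việt_Nam": 0.2, "kinh_tế": 0.3, "tăng": 0.5}
example = mix_distributions(0.5, toy_vocab, [0.75, 0.25], ["越南", "经济"], toy_dict)
```

In this toy setting, a heavily attended source token with a confident dictionary entry raises the probability of its Vietnamese translation even when the generator alone would prefer another word, which is the intuition behind using alignment information to guide generation.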