Abstract:
For automatic summarization of long Chinese documents, two models, TRAI and TRAO, are proposed that integrate Self-Attention with TextRank. TRAI computes a weighted sum of two quantities, sentence similarity based on the number of co-occurring words and sentence relevance derived from Self-Attention, and uses this sum as the edge weight in TextRank's iterative computation to score sentences. TRAO first scores sentences with standard TextRank; it then uses Self-Attention to re-encode the distributed vector of each sentence with information from the entire document, computes the cosine similarity between the re-encoded sentences as the TextRank edge weight, and scores sentences through a second iterative computation; the two scores are combined by a weighted sum into each sentence's final score. Both TRAI and TRAO rank sentences by score to obtain candidate summaries, and the maximal marginal relevance (MMR) method is then applied to select summary sentences from the candidates and reduce redundancy. The two models were evaluated on a constructed corpus of long documents; compared with the TextRank baseline, they achieve significant improvements on the ROUGE evaluation metrics.
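As a rough illustration of the pipeline described above (not the authors' implementation), the following sketch runs a TextRank iteration over a precomputed sentence-similarity matrix and then applies MMR selection; all function names, the toy similarity matrix, and parameter values (damping `d`, trade-off `lam`) are illustrative assumptions.

```python
# Hypothetical sketch: TextRank scoring with externally supplied edge weights,
# followed by MMR selection of summary sentences. Illustrative only.

def textrank(weights, d=0.85, iters=50):
    """weights[i][j]: similarity between sentences i and j (symmetric, 0 on diagonal)."""
    n = len(weights)
    scores = [1.0] * n
    out_sum = [sum(row) or 1.0 for row in weights]  # guard against isolated sentences
    for _ in range(iters):
        scores = [(1 - d) + d * sum(weights[j][i] / out_sum[j] * scores[j]
                                    for j in range(n) if j != i)
                  for i in range(n)]
    return scores

def mmr_select(scores, sim, k, lam=0.7):
    """Pick k sentences, trading relevance (score) against redundancy (similarity)."""
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda i: lam * scores[i]
                   - (1 - lam) * max((sim[i][j] for j in selected), default=0.0))
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order

# Toy 3-sentence similarity matrix (stand-in for the TRAI/TRAO edge weights).
w = [[0.0, 0.6, 0.1],
     [0.6, 0.0, 0.5],
     [0.1, 0.5, 0.0]]
s = textrank(w)
print(mmr_select(s, w, k=2))
```

In TRAI the matrix `w` would hold the weighted sum of co-occurrence similarity and Self-Attention relevance; in TRAO it would hold cosine similarities between the re-encoded sentence vectors, with the two TextRank scores combined before MMR.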