Content Quality Detection Model for Web Articles

    Abstract: The Internet hosts a large number of articles of uneven quality, which seriously damages the online ecosystem. Detecting the quality of online articles is therefore an important and new task for building a clean cyberspace. Using the Tencent dataset, we study article quality detection along three dimensions: organization features, writing features and semantic features. We build three sub-networks (an organization sub-network, a feature sub-network and a text sub-network) and extend three attention variants and four Transformer variants. With CNN+BiGRU, Attention+ACNN and Transformer model I, respectively, the three sub-networks reach classification accuracies of 80.6%, 87% and 92.9%. OFT, the framework that combines the three sub-networks, reaches 93.3%. Obtaining BERT word vectors for the text data in two different ways further raises OFT's accuracy to 94.2%. The experimental results show that the proposed model outperforms existing models.
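The abstract does not say how OFT combines its three sub-networks. As an illustration only, the stdlib Python sketch below assumes a simple late-fusion head: each sub-network's output vector is concatenated, scored by one linear layer, and turned into class probabilities with softmax. The function names, vector sizes, weights and the two-class setup are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

# HYPOTHETICAL late-fusion head. The abstract does not say how OFT combines
# its organization, feature and text sub-networks; this sketch simply assumes
# each sub-network outputs a fixed-length vector.

Vector = Sequence[float]


def concat(vectors: Sequence[Vector]) -> List[float]:
    """Concatenate the sub-network output vectors into one fused vector."""
    fused: List[float] = []
    for v in vectors:
        fused.extend(v)
    return fused


def linear(x: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Compute W x + b, giving one logit per quality class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> List[float]:
    """Turn logits into class probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(org_vec: Vector, feat_vec: Vector, text_vec: Vector,
                      weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Fuse the three (assumed) sub-network outputs and classify."""
    x = concat([org_vec, feat_vec, text_vec])
    if len(weights) != len(bias):
        raise ValueError("need one bias term per weight row")
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the fused vector length")
    return softmax(linear(x, weights, bias))


if __name__ == "__main__":
    # Made-up outputs of the three sub-networks for one article.
    org, feat, text = [0.2, 0.1], [0.5], [0.3, 0.9]
    # Made-up weights for two hypothetical classes: low and high quality.
    W = [[0.1, -0.2, 0.0, 0.3, -0.5],
         [-0.1, 0.2, 0.4, 0.1, 0.6]]
    b = [0.0, 0.0]
    probs = fuse_and_classify(org, feat, text, W, b)
    print(probs)  # two probabilities summing to 1; class 1 scores higher here
```

Concatenation followed by a single linear layer is the simplest late-fusion choice; averaging each sub-network's class probabilities would be an equally plausible reading of "combined model", and the paper itself should be consulted for the actual design.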

     
