CONTENT QUALITY DETECTION MODEL FOR WEB ARTICLES
-
Abstract
The large number of articles of mixed quality on the Internet has seriously damaged the online ecosystem. Detecting the quality of online articles is therefore an important and new task in building a cleaner ("green") cyberspace. Using the Tencent dataset, we study article quality detection along three dimensions: article organization features, writing features, and semantic features. For these dimensions we build three sub-networks: an organization sub-network, a feature sub-network, and a text sub-network. We extend three attention models and four Transformer models. With CNN+BiGRU, Attention+ACNN, and Transformer model I, the three sub-networks reach classification accuracies of 80.6%, 87%, and 92.9%, respectively. The OFT framework, which combines the three sub-networks, reaches a classification accuracy of 93.3%. In addition, we use two methods to obtain BERT word vectors for the text data, which raises the final accuracy of OFT to 94.2%. The experimental results show that the proposed model outperforms existing methods.
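As a rough illustration of how the outputs of three sub-networks can be combined into one prediction, the sketch below uses late fusion by weighted averaging of class probabilities. This is an assumption made for illustration only: the abstract does not state how the OFT framework merges its sub-networks, and the function names, weights, and probability values here are hypothetical.

```python
from typing import List, Optional, Sequence


def fuse_probabilities(
    branch_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-branch class-probability vectors.

    Hypothetical late-fusion rule; not taken from the paper.
    """
    if not branch_probs:
        raise ValueError("at least one branch is required")
    n_classes = len(branch_probs[0])
    if any(len(p) != n_classes for p in branch_probs):
        raise ValueError("all branches must cover the same classes")
    if weights is None:
        weights = [1.0] * len(branch_probs)  # equal weighting by default
    if len(weights) != len(branch_probs):
        raise ValueError("one weight per branch is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[c] for w, p in zip(weights, branch_probs)) / total
        for c in range(n_classes)
    ]


def predict(
    branch_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_probabilities(branch_probs, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up outputs for one article over two classes (low quality, high quality).
organization = [0.6, 0.4]  # organization sub-network
feature = [0.3, 0.7]       # feature sub-network
text = [0.2, 0.8]          # text sub-network

fused = fuse_probabilities([organization, feature, text])
label = predict([organization, feature, text])
```

In this toy case the organization branch alone would choose the first class, while the fused vector favors the second, which shows how a combined model can correct a weaker branch. A learned fusion layer over concatenated features is another common choice and may be closer to what the full paper does.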
-
-