STOCHASTIC NEIGHBOR EMBEDDING SHORT TEXT CLUSTERING IMPROVED BY EXPONENTIAL FUNCTION
Abstract
In recent years, deep learning has played an important role in short text clustering. The recently proposed short text clustering algorithm (STC) has achieved good results in this field. To further improve clustering accuracy and algorithm performance, this paper proposes e-STC, an improved stochastic neighbor embedding algorithm based on an exponential function. The algorithm uses an exponential function to measure the gap between sample points and cluster centers, which magnifies the differences between features. In the later stage, the K-Means+KG-*3+ algorithm is used to determine the cluster centers and the number of clusters in advance. Experimental results on the Stackoverflow dataset show that e-STC outperforms the original STC algorithm in both accuracy and normalized mutual information (NMI), with relative improvements of 3.2% in accuracy and 2.9% in NMI.
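The core idea, that an exponential function "magnifies" the gap between a sample and the cluster centers, can be illustrated with a small sketch. The abstract does not give the exact formulas, so the following is an assumption: that STC scores sample-to-center similarity with a Student's t kernel, as in DEC-style self-training clustering, and that e-STC swaps in a kernel of the form exp(-||z - μ||² / s). The function names and the `alpha` and `scale` parameters are hypothetical, chosen for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(z: Vector, mu: Vector) -> float:
    """Squared Euclidean distance between an embedded sample z and a center mu."""
    return sum((a - b) ** 2 for a, b in zip(z, mu))


def student_t_assignment(z: Vector, centers: Sequence[Vector], alpha: float = 1.0) -> List[float]:
    """Soft assignment with a heavy-tailed Student's t kernel (assumed STC-style)."""
    weights = [
        (1.0 + squared_distance(z, mu) / alpha) ** (-(alpha + 1.0) / 2.0)
        for mu in centers
    ]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_assignment(z: Vector, centers: Sequence[Vector], scale: float = 1.0) -> List[float]:
    """Soft assignment with a hypothetical exponential kernel exp(-d^2 / scale)."""
    dists = [squared_distance(z, mu) / scale for mu in centers]
    # Shift by the smallest distance before exponentiating to avoid underflow;
    # the shift cancels out in the normalization below.
    d_min = min(dists)
    weights = [math.exp(-(d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: one sample, two centers at squared distances 1 and 4.
z = [0.0, 0.0]
centers = [[1.0, 0.0], [2.0, 0.0]]
p_t = student_t_assignment(z, centers)    # ≈ [0.714, 0.286]
p_exp = exponential_assignment(z, centers)  # ≈ [0.953, 0.047]
```

With the same two distances, the exponential kernel pushes much more probability mass onto the nearer center than the heavy-tailed t kernel does. This sharper contrast is one plausible reading of how an exponential function "magnifies the difference" between samples and centers.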