A DENSITY PEAK CLUSTERING ALGORITHM BASED ON SPARSITY FACTOR AND NON-SHARED NEIGHBORS

Abstract: For datasets with uneven density distributions, the density peak clustering (DPC) algorithm is error-prone when determining cluster centers and assigning data points. To address these problems, this paper proposes a clustering algorithm based on a sparsity factor and non-shared neighbors. The cutoff distance of each data point is adjusted dynamically according to its sparsity factor, and geodesic distance is used to compute each point's local density, so that the cluster centers are less affected by the sparse distribution of the dataset. Based on the relative non-shared nearest neighbors of the data points, an inconsistency factor is computed for the associated point pairs on the paths where the cluster centers lie. The clustering result is obtained by removing the edges of the minimum spanning tree that correspond to the largest inconsistency factors. Experimental results show that the proposed algorithm outperforms the comparison algorithms.
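The final step described in the abstract, cutting the highest-scoring edges of a minimum spanning tree to obtain clusters, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the inconsistency factor, so the code below assumes plain Euclidean edge length as a stand-in score, and the names `mst_edges`, `cut_mst`, and the `score` parameter are hypothetical. It also builds the tree over all points rather than over the paths containing the cluster centers.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, u, v) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def cut_mst(points, n_clusters, score=None):
    """Drop the n_clusters - 1 highest-scoring MST edges and label the components."""
    edges = mst_edges(points)
    if score is None:
        # ASSUMPTION: edge length stands in for the paper's inconsistency factor.
        score = lambda w, u, v: w
    ranked = sorted(edges, key=lambda e: score(*e))
    kept = ranked[: max(len(ranked) - (n_clusters - 1), 0)]

    # Union-find over the kept edges gives the connected components.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = cut_mst(points, n_clusters=2)
```

With the two well-separated groups above, the single long edge joining them is the one removed, so the first three points share one label and the last three share another. A different `score` callable could be passed in to rank edges by any other pairwise measure.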

     
