A Local Relative Density-Based Approach for Outlier Detection

  • Abstract: Outliers make up only a small fraction of a data set, yet most existing methods compute an outlier score for every data point during detection. To address this, a normal-data elimination algorithm based on mutual nearest neighbor clustering (EMNC) is proposed; it preprocesses the data to remove as many normal points as possible. Density-based detection that considers only the k nearest neighbors does not adapt well to outliers whose neighborhoods are abnormally distributed, so the method makes full use of the distribution of each object and its neighbors, combining k nearest neighbors, reverse nearest neighbors, and shared nearest neighbors to estimate density. Finally, a redefined outlier factor based on local relative density (ROF) is used to judge the remaining suspicious points. The algorithm reduces the number of outlier-score computations and improves detection efficiency. Comparisons with other methods on synthetic and real data sets demonstrate its effectiveness.
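The neighbor relations named in the abstract can be illustrated with a minimal brute-force sketch. It is not the paper's EMNC or ROF procedure, whose exact definitions are not given here. The function name `neighbor_sets` is hypothetical, and the shared-neighbor rule (two points are shared neighbors if their k-nearest-neighbor sets overlap) is an assumption made for illustration only.

```python
import numpy as np

def neighbor_sets(X, k):
    """Return k-nearest, reverse, shared and mutual neighbor sets per point.

    Illustrative only: brute-force O(n^2) distances, hypothetical helper name.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    order = np.argsort(d, axis=1, kind="stable")[:, :k]
    knn = [set(int(j) for j in row) for row in order]
    # Reverse neighbors of j: every point i that lists j among its k nearest.
    rnn = [set() for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            rnn[j].add(i)
    # Shared neighbors (assumed rule): points whose kNN sets overlap with i's.
    snn = [{j for j in range(n) if j != i and knn[i] & knn[j]} for i in range(n)]
    # Mutual neighbors: i and j each list the other among their k nearest.
    mnn = [knn[i] & rnn[i] for i in range(n)]
    return knn, rnn, snn, mnn

# Three close points on a line and one distant point.
knn, rnn, snn, mnn = neighbor_sets([[0.0], [1.0], [2.0], [10.0]], k=2)
```

In this toy input the distant point has k nearest neighbors like every other point, but no reverse or mutual neighbors. That asymmetry is the kind of signal that k nearest neighbors alone would miss, and a point with no mutual neighbors is a natural candidate to keep as suspicious rather than eliminate as normal.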

     
