
A Local Relative Density-Based Approach for Outlier Detection

He Xu, Deng Ansheng, Ge Xiaolong

Citation: He Xu, Deng Ansheng, Ge Xiaolong. A LOCAL RELATIVE DENSITY-BASED APPROACH FOR OUTLIER DETECTION[J]. Computer Applications and Software, 2024, 41(12): 296-302. DOI: 10.3969/j.issn.1000-386x.2024.12.042


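Details
    About the authors:

    He Xu is a master's student whose main research areas are machine learning and intelligent information processing. Deng Ansheng is a professor. Ge Xiaolong is a master's student.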

  • CLC number: TP391


  • Abstract: Outliers make up only a small fraction of a data set, yet most existing methods compute an outlier score for every data point during detection. To address this, a normal-data elimination algorithm based on mutual nearest neighbor (MNN) clustering (EMNC) is proposed, which preprocesses the data to remove as many normal points as possible. Density-based detection that considers only the k nearest neighbors does not adapt well to outliers with abnormal distributions, so the algorithm makes full use of the distribution of each object and its neighbors and estimates density from the k nearest neighbors, reverse nearest neighbors, and shared nearest neighbors together. Finally, a redefined local relative density-based outlier factor (ROF) is used to judge the remaining suspicious points. The algorithm reduces the number of outlier-score computations and improves detection efficiency. Comparative experiments against other methods on synthetic and real data sets demonstrate its effectiveness.
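The abstract names three neighbor relations (k nearest, reverse nearest, shared nearest) plus the mutual-neighbor relation used for clustering, but gives no formulas. The sketch below only illustrates how those relations might be computed on a toy data set. It is not the authors' EMNC or ROF. The function names are hypothetical, and the shared-neighbor definition (two points that have at least one k nearest neighbor in common) is one common convention, assumed here.

```python
from math import dist


def knn_sets(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        # Sort the other points by distance; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        result.append(set(others[:k]))
    return result


def reverse_sets(knn):
    """Reverse nearest neighbors: q is in rnn[p] when p is among q's k nearest."""
    rnn = [set() for _ in knn]
    for q, neighbors in enumerate(knn):
        for p in neighbors:
            rnn[p].add(q)
    return rnn


def shared_sets(knn):
    """Shared nearest neighbors (assumed definition): points whose k-nearest
    sets overlap with p's k-nearest set."""
    n = len(knn)
    return [{q for q in range(n) if q != p and knn[p] & knn[q]} for p in range(n)]


def mutual_sets(knn):
    """Mutual nearest neighbors: p and q each appear in the other's k-nearest set."""
    return [{q for q in knn[p] if p in knn[q]} for p in range(len(knn))]


# Toy data: four points on a unit square and one distant point (index 4).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
knn = knn_sets(points, k=2)
rnn = reverse_sets(knn)
snn = shared_sets(knn)
mnn = mutual_sets(knn)
```

In this toy example the distant point has k nearest neighbors like every other point, but no point counts it among its own k nearest, so its reverse and mutual neighbor sets are empty. That asymmetry is the kind of distributional information that a k-nearest-only density estimate would miss.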
Publication history
  • Received: 2021-07-28
