Abstract:
Outliers make up only a very small proportion of a data set, yet existing methods must compute an outlier score for every data point during detection. To address this problem, a normal data elimination algorithm based on MNN clustering (EMNC) is proposed, which preprocesses the data to eliminate as many normal points as possible. Density-based outlier detection algorithms that consider only the k nearest neighbors adapt poorly to outliers in data with abnormal distributions. The proposed algorithm makes full use of the distribution of each object and its neighbors, considering k nearest neighbors, reverse nearest neighbors, and shared nearest neighbors together to estimate density. A local relative density-based outlier factor (ROF) is redefined to score the remaining suspect points. The ROF algorithm reduces the number of points for which a local outlier score must be computed and thereby improves detection efficiency. Experimental results on synthetic and real datasets show the effectiveness of the ROF algorithm compared with other methods.
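The three neighbor relations named above can be illustrated with a minimal Python sketch. It is not the EMNC or ROF procedure itself, whose definitions the abstract does not give. The function names `neighbor_sets` and `mutual_neighbors` are hypothetical, the Euclidean metric is an assumption, "shared nearest neighbors" is taken here to mean points whose k-nearest-neighbor sets overlap, and MNN is assumed to stand for mutual nearest neighbor.

```python
import numpy as np


def neighbor_sets(X, k):
    """Return kNN, reverse-kNN, and shared-NN sets for each row of X.

    Assumptions (not from the paper): Euclidean distance, and
    "shared" meaning two points have at least one kNN in common.
    """
    n = len(X)
    # Pairwise distances; a point is never its own neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1, kind="stable")[:, :k]
    knn = [set(map(int, row)) for row in order]
    # Reverse neighbors of i: points that list i among their own kNN.
    rnn = [{j for j in range(n) if i in knn[j]} for i in range(n)]
    # Shared neighbors of i: other points whose kNN set overlaps kNN(i).
    snn = [{j for j in range(n) if j != i and knn[i] & knn[j]}
           for i in range(n)]
    return knn, rnn, snn


def mutual_neighbors(knn):
    """Mutual neighbors of i: j is in kNN(i) and i is in kNN(j)."""
    return [{j for j in nbrs if i in knn[j]} for i, nbrs in enumerate(knn)]


# Three clustered points and one isolated point on a line.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
knn, rnn, snn = neighbor_sets(X, k=2)
mnn = mutual_neighbors(knn)
print(rnn[3], mnn[3])  # the isolated point has no reverse or mutual neighbors
```

In this toy data the isolated point still has k nearest neighbors, but no point chooses it in return, which is the kind of asymmetry that reverse and mutual neighbor relations expose and that a purely kNN-based density estimate would miss.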