R Language Study Notes: Outlier Detection

Outlier Detection
This page shows an example of outlier detection with the LOF (Local Outlier Factor) algorithm.
The LOF algorithm
LOF (Local Outlier Factor) is an algorithm for identifying density-based local outliers [Breunig et al., 2000]. With LOF, the local density of a point is compared with that of its neighbors. If the former is significantly lower than the latter (giving an LOF value greater than one), the point lies in a sparser region than its neighbors, which suggests that it may be an outlier.
The function lofactor(data, k) in the DMwR and dprep packages calculates local outlier factors using the LOF algorithm, where k is the number of neighbors used in the calculation.
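To make the density comparison concrete, here is a minimal base-R sketch of the LOF calculation on a toy data set. The helper name lof_sketch and the six toy points are assumptions made for illustration; this is not the code used by lofactor(). It also simplifies the original definition by taking exactly k neighbors per point (ties are broken by row order) and by assuming there are no duplicated points.

```r
# Simplified LOF sketch in base R (hypothetical helper, not lofactor() itself).
# Assumes Euclidean distance, exactly k neighbors per point, no duplicate rows.
lof_sketch <- function(x, k) {
  d <- as.matrix(dist(x))            # pairwise Euclidean distances
  n <- nrow(d)
  diag(d) <- Inf                     # a point is not its own neighbor

  # Row i holds the indices of the k nearest neighbors of point i
  knn <- matrix(
    unlist(lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])),
    ncol = k, byrow = TRUE
  )

  # k-distance: distance from each point to its k-th nearest neighbor
  kdist <- sapply(seq_len(n), function(i) d[i, knn[i, k]])

  # Local reachability density: inverse of the mean reachability distance,
  # where reach(i, j) = max(k-distance of j, distance between i and j)
  lrd <- sapply(seq_len(n), function(i) {
    nb <- knn[i, ]
    1 / mean(pmax(kdist[nb], d[i, nb]))
  })

  # LOF: mean density of the neighbors divided by the point's own density
  sapply(seq_len(n), function(i) mean(lrd[knn[i, ]]) / lrd[i])
}

# Toy data: five points in a tight cluster plus one distant point (row 6)
pts <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(0.5, 0.5), c(5, 5))
scores <- lof_sketch(pts, k = 3)
print(round(scores, 2))
print(which.max(scores))   # the distant point, row 6, gets the largest score
```

The clustered points score close to one, while the distant point scores well above one because its neighbors sit in a much denser region than it does.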
Calculate Outlier Scores
> library(DMwR)
> # remove "Species", which is a categorical column
> iris2 <- iris[,1:4]
> outlier.scores <- lofactor(iris2, k=5)
> plot(density(outlier.scores))

[Figure: density plot of the outlier scores]

> # pick the 5 highest-scoring observations as outliers
> outliers <- order(outlier.scores, decreasing=TRUE)[1:5]
> # which observations are the outliers?
> print(outliers)
[1] 42 107 23 110 63
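Picking a fixed number of top scores is one way to choose outliers; a score cutoff is another. The sketch below contrasts the two on a small vector of made-up scores. The values and the cutoff of 1.5 are hypothetical and chosen only for illustration; they do not come from the iris data.

```r
# Hypothetical LOF scores for eight observations (not from the iris data)
scores <- c(1.02, 0.98, 2.31, 1.10, 0.95, 1.75, 1.01, 3.40)

# Option 1: indices of the three highest scores, highest first
top3 <- order(scores, decreasing = TRUE)[1:3]
print(top3)      # [1] 8 3 6

# Option 2: every observation whose score exceeds a cutoff
cutoff <- 1.5    # arbitrary cutoff for this illustration
flagged <- which(scores > cutoff)
print(flagged)   # [1] 3 6 8
```

Option 1 always returns the requested number of observations, even when none of them is unusual. Option 2 can return none at all, but it requires choosing a cutoff, for example by inspecting the density plot of the scores.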
Visualize Outliers with Plots
Next, we show the outliers with a biplot of the first two principal components.
> n <- nrow(iris2)
> labels <- 1:n
> labels[-outliers] <- "."
> biplot(prcomp(iris2), cex=.8, xlabs=labels)
[Figure: biplot of the first two principal components, with outliers labeled by row number]
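The biplot draws variable arrows on top of the observations, which can make the labels hard to read. As an alternative, the sketch below plots only the scores on the first two principal components, using base graphics. It assumes the outlier indices are the five printed earlier on this page; the variable names pca, pcs and var.explained are introduced here for illustration.

```r
# Sketch: scatter plot of PC1 vs PC2 with selected rows marked in red
iris2 <- iris[, 1:4]
outliers <- c(42, 107, 23, 110, 63)   # indices printed earlier on this page

pca <- prcomp(iris2)                  # PCA on the four numeric columns
pcs <- pca$x[, 1:2]                   # scores on the first two components

# Share of the total variance captured by the first two components
var.explained <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)
print(var.explained)

# All observations as small dots, outliers as red row numbers
plot(pcs, pch = ".", cex = 2, xlab = "PC1", ylab = "PC2")
text(pcs[outliers, ], labels = outliers, col = "red", cex = 0.8)
```

Checking var.explained before reading the plot is worthwhile: if the first two components capture only a small share of the variance, points that look ordinary in this projection may still be unusual in the full four-dimensional data.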
We can also show the outliers with a pairs plot, as below, where they are marked with a red "+".
> pch <- rep(".", n)
> pch[outliers] <- "+"
> col <- rep("black", n)
> col[outliers] <- "red"
> pairs(iris2, pch=pch, col=col)

[Figure: pairs plot of iris2, with outliers marked by a red "+"]

Parallel Computation of LOF Scores
The Rlof package provides lof(), a parallel implementation of the LOF algorithm. Its usage is similar to that of lofactor() above, but lof() adds two features: it accepts multiple values of k, and it offers several choices of distance metric. Below is an example of lof().
> library(Rlof)
> outlier.scores <- lof(iris2, k=5)
> # try several numbers of neighbors (k = 5, 6, 7, 8, 9 and 10)
> outlier.scores <- lof(iris2, k=5:10)
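When several values of k are supplied, there is one score per observation for each k, and these need to be combined before ranking. The sketch below assumes the scores are arranged as a matrix with one row per observation and one column per k; check the Rlof documentation for the exact return shape. The matrix is made up for illustration so that the example runs without Rlof installed. Taking the maximum across k is one heuristic, in the spirit of the range-of-neighborhood-sizes suggestion in Breunig et al.; it is not the only reasonable choice.

```r
# Hypothetical score matrix: rows are observations, columns are values of k.
# In practice this might come from lof(iris2, k=5:10) (assumed shape).
score.matrix <- matrix(
  c(1.01, 1.03, 0.99,
    2.40, 1.90, 1.60,
    1.10, 1.45, 1.80,
    0.97, 1.00, 1.02),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("k5", "k6", "k7"))
)

# Combine across k by keeping each observation's maximum score
max.scores <- apply(score.matrix, 1, max)
print(max.scores)   # [1] 1.03 2.40 1.80 1.02

# Rank observations by the combined score, highest first
ranking <- order(max.scores, decreasing = TRUE)
print(ranking)      # [1] 2 3 1 4
```

Observation 3 illustrates why a single k can mislead: it looks unremarkable at the smallest k but stands out at the largest, and the maximum keeps it near the top of the ranking.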

Original article by xsmile. If you repost it, please credit the source: http://www.17bigdata.com/r%e8%af%ad%e8%a8%80%e5%ad%a6%e4%b9%a0%e7%ac%94%e8%ae%b0%e4%b9%8boutlier-detection/
