| elbow {GMD} | R Documentation |
Determining the number of clusters in a data set by the "elbow" rule.
## find a good k given thresholds of EV and its increment.
elbow(x,inc.thres,ev.thres,precision=3,print.warning=TRUE)

## a wrapper of `elbow' testing multiple thresholds.
elbow.batch(x,inc.thres=c(0.01,0.05,0.1),
            ev.thres=c(0.95,0.9,0.8,0.75,0.67,0.5,0.33),precision=3)

## S3 method for class 'elbow'
plot(x,elbow.obj=NULL,main,xlab="k",
     ylab="Explained_Variance",type="b",pch=20,col.abline="red",
     lty.abline=3,if.plot.new=TRUE,print.info=TRUE,
     mar=c(4,5,3,3),omi=c(0.75,0,0,0),...)
| x | a ‘css.multi’ object, generated by |
| inc.thres | numeric with value(s) from 0 to 1, the threshold of the increment of EV. A single value is used in |
| ev.thres | numeric with value(s) from 0 to 1, the threshold of EV. A single value is used in |
| precision | integer, the number of digits to round to for numerical comparison. |
| print.warning | logical, whether to print warning messages. |
| elbow.obj | an ‘elbow’ object, generated by |
| main | an overall title for the plot. |
| ylab | a title for the y axis. |
| xlab | a title for the x axis. |
| type | what type of plot should be drawn. |
| pch | either an integer specifying a symbol or a single character to be used as the default in plotting points (see |
| col.abline | color for straight lines through the current plot (see option |
| lty.abline | line type for straight lines through the current plot (see option |
| if.plot.new | logical, whether to start a new plot device or not. |
| print.info | logical, whether to print the information of ‘elbow.obj’. |
| mar | a numerical vector of the form 'c(bottom, left, top, right)' which gives the number of lines of margin to be specified on the four sides of the plot (see option |
| omi | a vector of the form 'c(bottom, left, top, right)' giving the size of the outer margins in inches (see option |
| ... | arguments to be passed to method |
Determining the number of clusters in a data set by the "elbow" rule and thresholds in the explained variance (EV) and its increment.
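This page does not spell out the exact comparison that GMD performs, so the following is only a minimal base R sketch of one plausible reading of the rule: take the smallest k whose EV reaches ev.thres and whose gain from adding one more cluster falls below inc.thres. The helper name `pick_k` and the EV values are hypothetical and are not part of GMD.

```r
# Hypothetical helper (not part of GMD): one possible reading of the elbow rule.
# ev[k] is assumed to hold the explained variance for k clusters, k = 1, 2, ...
pick_k <- function(ev, inc.thres, ev.thres, precision = 3) {
  ev  <- round(ev, precision)
  # Gain in EV from going k -> k+1; defined as 0 after the last k.
  inc <- round(c(diff(ev), 0), precision)
  ok  <- which(ev >= ev.thres & inc < inc.thres)
  if (length(ok) == 0) return(NA_integer_)  # no "good" k under these thresholds
  ok[1]
}

# Made-up EV curve for k = 1..7
ev <- c(0, 0.55, 0.80, 0.93, 0.96, 0.97, 0.975)
pick_k(ev, inc.thres = 0.05, ev.thres = 0.90)  # 4 for this made-up curve
```

Rounding both quantities before comparing is what the `precision` argument suggests; without it, differences such as 0.96 - 0.93 can land slightly above or below a threshold because of floating-point error.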
Both elbow and elbow.batch return an ‘elbow’ object
(if a "good" k exists),
which is a list containing the following components:
| k | number of clusters |
| ev | explained variance given k |
| inc.thres | the threshold of the increment in EV |
| ev.thres | the threshold of the EV |
and an attribute ‘meta’ that contains:
| description | A description about the "good" k |
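As an illustration of how such a return value can be inspected, the sketch below builds a mock list with the documented components and a ‘meta’ attribute. Every value in it is made up, and the mock is a plain list rather than an object produced by GMD; only the component and attribute names come from the tables above.

```r
# Mock object shaped like the documented return value; all values are hypothetical.
elbow.mock <- structure(
  list(k = 4, ev = 0.93, inc.thres = 0.05, ev.thres = 0.90),
  meta = list(description = "hypothetical description of the good k")
)

# List components are read with `$`
elbow.mock$k
elbow.mock$ev

# The attribute is read with attr(); it is not a list component
attr(elbow.mock, "meta")$description
```

The same accessors should apply to an object returned by elbow or elbow.batch, assuming it follows the layout described above.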
css and css.hclust for computing Clustering Sum-of-Squares.
## load library
require("GMD")
## simulate data around 12 points in Euclidean space
pointv <- data.frame(x=c(1,2,2,4,4,5,5,6,7,8,9,9),
                     y=c(1,2,8,2,4,4,5,9,9,8,1,9))
set.seed(2012)
mydata <- c()
for (i in 1:nrow(pointv)){
  mydata <- rbind(mydata,cbind(rnorm(10,pointv[i,1],0.1),
                               rnorm(10,pointv[i,2],0.1)))
}
mydata <- data.frame(mydata); colnames(mydata) <- c("x","y")
plot(mydata,type="p",pch=21, main="Simulated data")
## determine a "good" k using elbow
dist.obj <- dist(mydata[,1:2])
hclust.obj <- hclust(dist.obj)
css.obj <- css.hclust(dist.obj,hclust.obj)
elbow.obj <- elbow.batch(css.obj)
print(elbow.obj)
## make partition given the "good" k
k <- elbow.obj$k; cutree.obj <- cutree(hclust.obj,k=k)
mydata$cluster <- cutree.obj
## draw an elbow plot and label the data
dev.new(width=12, height=6)
par(mfcol=c(1,2),mar=c(4,5,3,3),omi=c(0.75,0,0,0))
plot(mydata$x,mydata$y,pch=as.character(mydata$cluster),
     col=mydata$cluster,cex=0.75,main="Clusters of simulated data")
plot(css.obj,elbow.obj,if.plot.new=FALSE)
## clustering with more relaxed thresholds (resulting in a smaller "good" k)
elbow.obj2 <- elbow.batch(css.obj,ev.thres=0.90,inc.thres=0.05)
mydata$cluster2 <- cutree(hclust.obj,k=elbow.obj2$k)
dev.new(width=12, height=6)
par(mfcol=c(1,2), mar=c(4,5,3,3),omi=c(0.75,0,0,0))
plot(mydata$x,mydata$y,pch=as.character(mydata$cluster2),
     col=mydata$cluster2,cex=0.75,main="Clusters of simulated data")
plot(css.obj,elbow.obj2,if.plot.new=FALSE)
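For readers who want to see what an explained-variance curve looks like without GMD installed, here is a minimal base R sketch. It assumes the common definition EV(k) = 1 - WSS(k)/TSS for Euclidean data, where WSS is the within-cluster sum of squares and TSS the total sum of squares. GMD's css functions work from a distance object and may compute the quantity differently. The names `pts`, `hc` and `ev_for_k` are hypothetical.

```r
# Three well-separated groups of 20 points each (simulated, illustrative only)
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
             matrix(rnorm(40, mean = 3, sd = 0.1), ncol = 2),
             matrix(rnorm(40, mean = 6, sd = 0.1), ncol = 2))
hc <- hclust(dist(pts))

# Total sum of squares around the overall centroid
tss <- sum(scale(pts, center = TRUE, scale = FALSE)^2)

# Assumed definition: EV(k) = 1 - WSS(k) / TSS
ev_for_k <- function(k) {
  cl  <- cutree(hc, k = k)
  wss <- sum(vapply(split(as.data.frame(pts), cl),
                    function(g) sum(scale(as.matrix(g), center = TRUE, scale = FALSE)^2),
                    numeric(1)))
  1 - wss / tss
}

ev <- vapply(1:6, ev_for_k, numeric(1))
plot(1:6, ev, type = "b", pch = 20, xlab = "k", ylab = "Explained_Variance")
```

Because cutting a hierarchical tree at k+1 always splits one cluster of the k-cluster partition, the curve computed this way never decreases, and the "elbow" is the point after which further splits add little.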