| LiblineaR {LiblineaR} | R Documentation |
LiblineaR allows the estimation of predictive linear models for
classification and regression, such as L1- or L2-regularized logistic
regression, L1- or L2-regularized L2-loss support vector classification,
L2-regularized L1-loss support vector classification and multi-class support
vector classification. It also supports L2-regularized support vector regression
(with L1- or L2-loss). The estimation of the models is particularly fast as
compared to other libraries. The implementation is based on the 'LIBLINEAR' C/C++
library for machine learning.
LiblineaR(data, target, type = 0, cost = 1, epsilon = 0.01, svr_eps = NULL, bias = 1, wi = NULL, cross = 0, verbose = FALSE, findC = FALSE, useInitC = TRUE, ...)
data |
an n x p data matrix. Each row stands for an example (sample, point) and each column stands for a dimension (feature, variable). A sparse matrix (from the SparseM package) will also work. |
target |
a response vector for prediction tasks with one value for
each of the n rows of data. |
type |
|
cost |
cost of constraints violation (default: 1). Controls the trade-off
between regularization and correct classification on data. |
epsilon |
set tolerance of termination criterion for optimization.
If
The meaning of
|
svr_eps |
set tolerance margin (epsilon) in regression loss function of SVR. Not used for classification methods. |
bias |
if bias > 0, instance |
wi |
a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named according to the corresponding class label. Not used in regression mode. |
cross |
if an integer value k>0 is specified, a k-fold cross-validation
on data is performed. |
verbose |
if |
findC |
if |
useInitC |
if |
... |
for backwards compatibility, parameter |
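As an illustration of the wi argument described above, a named weight vector could be built from the class frequencies. This is a minimal sketch using base R only; the inverse-frequency weighting scheme, the toy target and the variable names are assumptions made for illustration, not something LiblineaR prescribes.

```r
# Hypothetical example: inverse-frequency class weights for an unbalanced target.
y <- factor(c(rep("a", 80), rep("b", 15), rep("c", 5)))

counts <- table(y)                      # number of examples per class
wi <- as.numeric(max(counts) / counts)  # rarer classes get larger weights
names(wi) <- names(counts)              # names must match the class labels

print(wi)
```

The resulting vector could then be supplied as wi = wi in a classification call.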
For details on the implementation of 'LIBLINEAR', see the README file of the original C/C++ 'LIBLINEAR' library at http://www.csie.ntu.edu.tw/~cjlin/liblinear.
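The integer codes accepted by type appear to follow the solver numbering of the upstream 'LIBLINEAR' command-line option -s. The lookup below is a sketch of that numbering as listed in the 'LIBLINEAR' README; treat it as an assumption to check against the installed version. The names typeNames, isRegression and describeType are hypothetical helpers, not part of LiblineaR.

```r
# Assumed mapping of type codes to model families (from the LIBLINEAR README).
typeNames <- c(
  "0"  = "L2-regularized logistic regression (primal)",
  "1"  = "L2-regularized L2-loss support vector classification (dual)",
  "2"  = "L2-regularized L2-loss support vector classification (primal)",
  "3"  = "L2-regularized L1-loss support vector classification (dual)",
  "4"  = "support vector classification by Crammer and Singer",
  "5"  = "L1-regularized L2-loss support vector classification",
  "6"  = "L1-regularized logistic regression",
  "7"  = "L2-regularized logistic regression (dual)",
  "11" = "L2-regularized L2-loss support vector regression (primal)",
  "12" = "L2-regularized L2-loss support vector regression (dual)",
  "13" = "L2-regularized L1-loss support vector regression (dual)"
)

# Codes 11 to 13 are the regression solvers under this assumed numbering.
isRegression <- function(type) type %in% 11:13

# Look up the description for a given code (NA for an unknown code).
describeType <- function(type) unname(typeNames[as.character(type)])

print(describeType(13))
print(isRegression(c(0, 13)))
```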
If cross>0, the average accuracy (classification) or mean squared error (regression) computed over the cross folds of cross-validation is returned.
Otherwise, an object of class "LiblineaR" containing the fitted model is returned, including:
TypeDetail |
A string describing the type of model fitted, as determined by |
Type |
An integer corresponding to |
W |
A matrix with the model weights. If |
Bias |
The value of |
ClassNames |
A vector containing the class names. This entry is not returned in case of regression models. |
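The components listed above can be inspected directly on a fitted object. The following is a minimal sketch, assuming the LiblineaR package is installed (the block does nothing otherwise); the choice of type = 0 and cost = 1 on the iris data is arbitrary and only for illustration.

```r
if (requireNamespace("LiblineaR", quietly = TRUE)) {
  # Fit a small classification model on centered and scaled iris features.
  x <- scale(iris[, 1:4])
  y <- iris$Species
  m <- LiblineaR::LiblineaR(data = x, target = y, type = 0, cost = 1)

  print(m$TypeDetail)   # description of the fitted model type
  print(m$Type)         # integer code of the model type
  print(dim(m$W))       # dimensions of the weight matrix
  print(m$Bias)         # bias setting stored with the model
  print(m$ClassNames)   # class labels (classification only)
}
```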
Classification models usually perform better if each dimension of the data is first centered and scaled.
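A minimal base R sketch of this preprocessing: the centering and scaling parameters are estimated on the training rows only and then reused on the held-out rows. The 100/50 split, the seed and the variable names are arbitrary choices for illustration.

```r
set.seed(1)
x <- as.matrix(iris[, 1:4])
trainIdx <- sample(nrow(x), 100)

# Estimate column means and standard deviations on the training rows.
xTrainScaled <- scale(x[trainIdx, ], center = TRUE, scale = TRUE)

# Apply the same training parameters to the held-out rows.
xTestScaled <- scale(x[-trainIdx, ],
                     center = attr(xTrainScaled, "scaled:center"),
                     scale = attr(xTrainScaled, "scaled:scale"))

# Training columns now have mean close to 0 and standard deviation 1.
print(round(colMeans(xTrainScaled), 10))
print(apply(xTrainScaled, 2, sd))
```

Reusing the training parameters keeps information from the held-out rows out of the preprocessing step.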
Thibault Helleputte thibault.helleputte@dnalytics.com and
Pierre Gramme pierre.gramme@dnalytics.com and
Jerome Paul jerome.paul@dnalytics.com.
Based on C/C++-code by Chih-Chung Chang and Chih-Jen Lin
For more information on 'LIBLINEAR' itself, refer to:
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification,
Journal of Machine Learning Research 9(2008), 1871-1874.
http://www.csie.ntu.edu.tw/~cjlin/liblinear
data(iris)
x=iris[,1:4]
y=factor(iris[,5])
train=sample(1:dim(iris)[1],100)
xTrain=x[train,]
xTest=x[-train,]
yTrain=y[train]
yTest=y[-train]
# Center and scale data
s=scale(xTrain,center=TRUE,scale=TRUE)
# Find the best model type and cost parameter via 5-fold cross-validation
tryTypes=c(0:7)
tryCosts=c(1000,1,0.001)
bestCost=NA
bestAcc=0
bestType=NA
for(ty in tryTypes){
for(co in tryCosts){
acc=LiblineaR(data=s,target=yTrain,type=ty,cost=co,bias=1,cross=5,verbose=FALSE)
cat("Results for type=",ty,", C=",co," : ",acc," accuracy.\n",sep="")
if(acc>bestAcc){
bestCost=co
bestAcc=acc
bestType=ty
}
}
}
cat("Best model type is:",bestType,"\n")
cat("Best cost is:",bestCost,"\n")
cat("Best accuracy is:",bestAcc,"\n")
# Re-train best model with best cost value.
m=LiblineaR(data=s,target=yTrain,type=bestType,cost=bestCost,bias=1,verbose=FALSE)
# Scale the test data
s2=scale(xTest,attr(s,"scaled:center"),attr(s,"scaled:scale"))
# Make prediction
pr=FALSE
if(bestType==0 || bestType==7) pr=TRUE
p=predict(m,s2,proba=pr,decisionValues=TRUE)
# Display confusion matrix
res=table(p$predictions,yTest)
print(res)
# Compute Balanced Classification Rate
BCR=mean(c(res[1,1]/sum(res[,1]),res[2,2]/sum(res[,2]),res[3,3]/sum(res[,3])))
print(BCR)
#############################################
# Example of the use of a sparse matrix:
if(require(SparseM)){
# Sparsifying the iris dataset:
iS=apply(iris[,1:4],2,function(a){a[a<quantile(a,probs=c(0.25))]=0;return(a)})
irisSparse<-as.matrix.csr(iS)
# Applying a similar methodology as above:
xTrain=irisSparse[train,]
xTest=irisSparse[-train,]
# Re-train best model with best cost value.
m=LiblineaR(data=xTrain,target=yTrain,type=bestType,cost=bestCost,bias=1,verbose=FALSE)
# Make prediction
p=predict(m,xTest,proba=pr,decisionValues=TRUE)
# Display confusion matrix
res=table(p$predictions,yTest)
print(res)
}
#############################################
# Try regression instead, to predict sepal length on the basis of sepal width and petal length:
xTrain=iris[c(1:25,51:75,101:125),2:3]
yTrain=iris[c(1:25,51:75,101:125),1]
xTest=iris[c(26:50,76:100,126:150),2:3]
yTest=iris[c(26:50,76:100,126:150),1]
# Center and scale data
s=scale(xTrain,center=TRUE,scale=TRUE)
# Estimate MSE in cross-validation on the training set
MSECross=LiblineaR(data = s, target = yTrain, type = 13, cross = 10, svr_eps=.01)
# Build the model
m=LiblineaR(data = s, target = yTrain, type = 13, cross=0, svr_eps=.01)
# Test it, after test data scaling:
s2=scale(xTest,attr(s,"scaled:center"),attr(s,"scaled:scale"))
pred=predict(m,s2)$predictions
MSETest=mean((yTest-pred)^2)
# Was MSE well estimated?
print(MSETest-MSECross)
# Distribution of errors
print(summary(yTest-pred))
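The balanced classification rate computed in the classification example above hard-codes three classes. Below is a sketch of a version that works for any number of classes, using base R only. balancedRate is a hypothetical helper name, not part of LiblineaR, and it assumes a square confusion table with predictions in rows and true labels in columns; the toy labels are made up for illustration.

```r
# Mean of the per-class recalls, read from a confusion table
# (rows = predicted class, columns = true class).
balancedRate <- function(confusion) {
  stopifnot(nrow(confusion) == ncol(confusion))
  mean(diag(confusion) / colSums(confusion))
}

truth <- factor(c("a", "a", "a", "b", "b", "c"))
# Using the same levels keeps the table square even if a class is never predicted.
pred <- factor(c("a", "a", "b", "b", "b", "c"), levels = levels(truth))

res <- table(pred, truth)
print(res)
print(balancedRate(res))
```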