| catboost.load_pool {catboost} | R Documentation |
Create a dataset from the given file, matrix or data.frame.
catboost.load_pool(data, label = NULL, cat_features = NULL, column_description = NULL, pairs = NULL, delimiter = "\t", has_header = FALSE, weight = NULL, group_id = NULL, group_weight = NULL, subgroup_id = NULL, pairs_weight = NULL, baseline = NULL, feature_names = NULL, thread_count = -1)
data |
A file path, matrix or data.frame with features. The following column types are supported:
Default value: Required argument |
label |
The label vector. |
cat_features |
A vector of categorical feature indices. The indices are zero-based and may differ from those given in the column descriptions file. |
column_description |
The path to the input file that contains the column descriptions. |
pairs |
A file path, matrix or data.frame that contains the pair descriptions. The shape should be Nx2, where N is the number of pairs. The first element of each pair is the index of the winner document in the training set. The second element is the index of the loser document in the training set. |
delimiter |
Delimiter character to use to separate features in a file. |
has_header |
If TRUE, column names are read from the first line of the file. |
weight |
The weights of the objects. |
group_id |
The group ids of the objects. |
group_weight |
The group weight of the objects. |
subgroup_id |
The subgroup ids of the objects. |
pairs_weight |
The weights of the pairs. |
baseline |
Vector of initial (raw) values of the objective function. Used in the calculation of final values of trees. |
feature_names |
A list of names for each feature in the dataset. |
thread_count |
The number of threads to use while reading the data. Optimizes reading time. This parameter doesn't affect results. If -1, then the number of threads is set to the number of CPU cores. |
catboost.Pool
# From file
pool_path <- system.file("extdata", "adult_train.1000", package = "catboost")
cd_path <- system.file("extdata", "adult.cd", package = "catboost")
pool <- catboost.load_pool(pool_path, column_description = cd_path)
head(pool)
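The delimiter and has_header arguments matter when the input file is not in the default tab-separated, header-less layout. The following is a minimal sketch rather than one of the package examples: it writes a small comma-separated file with a header row to a temporary path and loads it. The column names (label, x1, x2) are made up for illustration, and the sketch assumes that, when no column_description is given, the first column is read as the label and the remaining columns as numeric features. The call is guarded so the code also runs where catboost is not installed.

```r
# Build a small comma-separated file with a header row (hypothetical columns).
set.seed(1)
toy <- data.frame(label = rep(c(0, 1), times = 3),
                  x1 = runif(6),
                  x2 = runif(6))
csv_path <- tempfile(fileext = ".csv")
write.table(toy, csv_path, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Assumption: without column_description, column 0 is treated as the label.
if (requireNamespace("catboost", quietly = TRUE)) {
  pool <- catboost::catboost.load_pool(csv_path,
                                       delimiter = ",",
                                       has_header = TRUE)
  print(head(pool))
}
```

If the file uses a different layout, for example a label that is not in the first column, a column descriptions file passed through column_description is the place to say so.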
# From matrix
target <- 1
data_matrix <- matrix(runif(18), 6, 3)
pool <- catboost.load_pool(data_matrix[, -target], label = data_matrix[, target])
head(pool)
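The weight and feature_names arguments can be supplied alongside a matrix. The sketch below is illustrative only: the weights and the names f0 and f1 are hypothetical, and it assumes that weight needs one value per row and feature_names one entry per column, passed as a list as the argument description states. The catboost call is guarded so the code still runs where the package is not installed.

```r
# Per-object weights and feature names for a matrix-backed pool (a sketch).
set.seed(42)                                   # reproducible random features
features <- matrix(runif(12), nrow = 6, ncol = 2)
labels <- rep(c(0, 1), times = 3)              # one label per row
weights <- rep(c(1, 2), each = 3)              # hypothetical weights, one per row
names_list <- list("f0", "f1")                 # hypothetical names, one per column

# Check that the per-row and per-column inputs line up with the matrix.
stopifnot(length(labels) == nrow(features),
          length(weights) == nrow(features),
          length(names_list) == ncol(features))

if (requireNamespace("catboost", quietly = TRUE)) {
  pool <- catboost::catboost.load_pool(features,
                                       label = labels,
                                       weight = weights,
                                       feature_names = names_list)
  print(head(pool))
}
```

Checking the lengths before the call gives a clearer error than a mismatch reported from inside the loader.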
# From data.frame
nonsense <- c('A', 'B', 'C')
data_frame <- data.frame(value = runif(10), category = nonsense[(1:10) %% 3 + 1], stringsAsFactors = TRUE)
label <- (1:10) %% 2
pool <- catboost.load_pool(data_frame, label = label, cat_features = c(1))  # zero-based index of the category column
head(pool)