sim_data {clusteval}    R Documentation
We provide a wrapper function to generate data from three data-generating models:

- `sim_unif` (`family = "uniform"`): five multivariate uniform distributions
- `sim_normal` (`family = "normal"`): multivariate normal distributions with intraclass covariance matrices
- `sim_student` (`family = "student"`): multivariate Student's t distributions, each with a common covariance matrix
```r
sim_data(family = c("uniform", "normal", "student"), ...)
```
| Argument | Description |
|---|---|
| `family` | the family of distributions from which to generate data |
| `...` | optional arguments that are passed to the data-generating function |
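The way `family` selects a generator and `...` is forwarded can be illustrated with a minimal sketch. It assumes a `match.arg()` plus `switch()` dispatch; `sim_data_sketch`, `gen_uniform`, `gen_normal`, and `gen_student` are hypothetical names, and the stand-in generators only record the arguments they receive. This is not clusteval's implementation.

```r
# Hypothetical stand-in generators (NOT clusteval's sim_unif, sim_normal,
# or sim_student): each simply returns the arguments it was given.
gen_uniform <- function(...) list(family = "uniform", args = list(...))
gen_normal  <- function(...) list(family = "normal",  args = list(...))
gen_student <- function(...) list(family = "student", args = list(...))

# Hypothetical wrapper: match.arg() validates `family` and defaults to the
# first choice ("uniform"); `...` is passed through unchanged.
sim_data_sketch <- function(family = c("uniform", "normal", "student"), ...) {
  family <- match.arg(family)
  switch(family,
         uniform = gen_uniform(...),
         normal  = gen_normal(...),
         student = gen_student(...))
}

res <- sim_data_sketch(family = "student", delta = 1, df = 1:5)
res$family       # "student"
names(res$args)  # "delta" "df"
```

With this design, an unsupported family such as `"gamma"` fails early inside `match.arg()` instead of reaching a generator.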
For each data-generating model, we generate $n_m$ observations ($m = 1, \ldots, M$) from each of $M$ multivariate distributions so that the Euclidean distance between each of the population centroids and the origin is equal and scaled by $\Delta \geq 0$. For each model, the argument `delta` controls this separation.
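One way to satisfy the equal-distance property is to place centroid $m$ at $\Delta e_m$, where $e_m$ is the $m$-th standard basis vector in $\mathbb{R}^p$ ($p \geq M$). The sketch below assumes that construction; `make_centroids` is a hypothetical helper, and clusteval may position its centroids differently.

```r
# Hypothetical helper: centroid m is delta * e_m, so every centroid lies at
# Euclidean distance delta from the origin. Requires p >= M.
make_centroids <- function(M, p, delta) {
  stopifnot(p >= M, delta >= 0)
  centroids <- matrix(0, nrow = M, ncol = p)
  diag(centroids) <- delta  # sets entry [m, m] to delta for m = 1, ..., M
  centroids
}

centroids <- make_centroids(M = 5, p = 10, delta = 2)
sqrt(rowSums(centroids^2))  # each centroid is at distance 2 from the origin
dist(centroids)             # each pair of centroids is 2 * sqrt(2) apart
```

Under this construction, $\Delta = 0$ places every centroid at the origin (no separation), and pairwise centroid distances grow linearly as $\Delta\sqrt{2}$.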
This wrapper function is useful for simulation studies, in which the efficacy of supervised and unsupervised learning methods and algorithms is evaluated as the population separation increases.
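A simulation study of this kind can be sketched in base R without clusteval. The sketch assumes a hypothetical normal generator `gen_normal` (identity covariance, centroids at $\Delta e_m$), clusters each data set with `stats::kmeans`, and scores agreement with the true populations using the Rand index, a metric swapped in here for illustration. None of these choices are taken from the package.

```r
set.seed(42)

# Hypothetical generator (not clusteval's sim_normal): M populations of n
# observations each, drawn from N(delta * e_m, I_p). The list names x and y
# are this sketch's own choice.
gen_normal <- function(n = 25, M = 5, p = 5, delta = 0) {
  mu <- matrix(0, nrow = M, ncol = p)
  diag(mu) <- delta
  y <- rep(seq_len(M), each = n)
  x <- mu[y, , drop = FALSE] + matrix(rnorm(n * M * p), ncol = p)
  list(x = x, y = y)
}

# Rand index: fraction of observation pairs on which two labelings agree
# (same group in both, or different groups in both).
rand_index <- function(a, b) {
  pairs <- combn(length(a), 2)
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]
  mean(same_a == same_b)
}

# Increase the separation and record how well k-means recovers the truth.
deltas <- c(0, 1, 2, 4, 8)
agreement <- sapply(deltas, function(delta) {
  sim <- gen_normal(delta = delta)
  fit <- kmeans(sim$x, centers = 5, nstart = 10, iter.max = 50)
  rand_index(sim$y, fit$cluster)
})
names(agreement) <- deltas
agreement
```

In practice, each `delta` would be replicated many times and the scores averaged; a single draw per value keeps the sketch short.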
A named list containing:

- a matrix whose rows are the observations generated and whose columns are the $p$ features (variables);
- a vector denoting the population from which the observation in each row was generated.
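The returned structure pairs one label with each row of the feature matrix, so per-population summaries are straightforward. The sketch below builds a hypothetical stand-in list with that documented shape (the element names `x` and `y` are this sketch's own, not confirmed from the package) and computes sample centroids with base R's `rowsum()`.

```r
# Hypothetical stand-in with the documented shape: a feature matrix whose
# rows are observations, plus one population label per row.
set.seed(1)
sim <- list(x = matrix(runif(30 * 4), ncol = 4), y = rep(1:3, each = 10))

stopifnot(nrow(sim$x) == length(sim$y))  # one label per observation
counts <- table(sim$y)                    # n_m for each population
counts

# Sample centroid of each population: sum the rows within each label,
# then divide each row of sums by that population's count.
centroids <- rowsum(sim$x, group = sim$y) / as.vector(counts)
centroids
```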
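```r
library(clusteval)

set.seed(42)
# Five multivariate uniform populations with the default arguments
uniform_data <- sim_data(family = "uniform")
# Multivariate normal populations with separation delta = 2
normal_data <- sim_data(family = "normal", delta = 2)
# Multivariate Student's t populations; df is passed through to the generator
student_data <- sim_data(family = "student", delta = 1, df = 1:5)
```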