| across {collapse} | R Documentation |
across() can be used inside fmutate and fsummarise to apply one or more functions to a selection of columns. It is broadly similar to dplyr::across, but does not support some rlang features, has some additional features (arguments), and is optimized to work with collapse's .FAST_FUN, yielding much faster computations.
across(.cols = NULL, .fns, ..., .names = NULL,
.apply = "auto", .transpose = "auto")
# acr(...) can be used to abbreviate across(...)
.cols |
select columns using column names and expressions (e.g. |
.fns |
A function, character vector of functions or list of functions. Vectors / lists can be named to yield alternative names in the result (see |
... |
further arguments to |
.names |
controls the naming of computed columns. |
.apply |
controls whether functions are applied column-by-column ( |
.transpose |
with multiple |
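As a rough sketch of how .names affects the result when several functions are supplied, the example below uses the built-in mtcars data. It assumes that the default produces names of the form column_function, and that .names = "flip" reverses this to function_column. The object names res_default and res_flip are illustrative only.

```r
library(collapse)

# Two named functions applied to two columns of the built-in mtcars data
res_default <- fsummarise(mtcars, across(c(mpg, hp), list(mu = fmean, sd = fsd)))
names(res_default)

# Assumption: .names = "flip" puts the function name before the column name
res_flip <- fsummarise(mtcars, across(c(mpg, hp), list(mu = fmean, sd = fsd),
                                      .names = "flip"))
names(res_flip)
```

Comparing the two names() results side by side is a quick way to check which naming scheme suits downstream code.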
across does not support purrr-style lambdas or dplyr-style predicate functions such as across(where(is.numeric), sum); simply use across(is.numeric, sum) instead. In contrast to dplyr, you can also compute on grouping columns.
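A minimal sketch of the two workarounds mentioned above, using the built-in iris data (not part of the original examples): a plain predicate function takes the place of where(), and an ordinary anonymous function takes the place of a purrr-style lambda. The names res_num and res_rng are illustrative only.

```r
library(collapse)

# Select columns with a plain predicate function instead of where(is.numeric)
res_num <- fsummarise(fgroup_by(iris, Species), across(is.numeric, fmean))

# Use an ordinary anonymous function instead of a purrr-style lambda (~ ...)
res_rng <- fsummarise(fgroup_by(iris, Species),
                      across(is.numeric, function(x) max(x) - min(x)))

res_num
res_rng
```

The anonymous function is not a fast function, so it is evaluated group by group; for large data a function from .FAST_FUN will usually be the quicker choice.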
In general, my mission with collapse is not to create a dplyr-clone, but to take some of the useful features and make them robust and fast using base R and C/C++, with the aim of having a stable API. So don't ask me to implement the latest dplyr feature, unless you firmly believe it is very useful and will be around 10 years from now.
fsummarise, fmutate, Fast Data Manipulation, Collapse Overview
# Basic (Weighted) Summaries
fsummarise(wlddev, across(PCGDP:GINI, fmean, w = POP))
library(magrittr) # Note: Used because |> is not available on older R versions
wlddev %>% fgroup_by(region, income) %>%
fsummarise(across(PCGDP:GINI, fmean, w = POP))
# Note that for these we don't actually need across...
fselect(wlddev, PCGDP:GINI) %>% fmean(w = wlddev$POP, drop = FALSE)
wlddev %>% fgroup_by(region, income) %>%
fselect(PCGDP:GINI, POP) %>% fmean(POP, keep.w = FALSE)
collap(wlddev, PCGDP + LIFEEX + GINI ~ region + income, w = ~ POP, keep.w = FALSE)
# But if we want to use some base R function that requires argument splitting...
wlddev %>% na_omit(cols = "POP") %>% fgroup_by(region, income) %>%
fsummarise(across(PCGDP:GINI, weighted.mean, w = POP, na.rm = TRUE))
# Or if we want to apply different functions...
wlddev %>% fgroup_by(region, income) %>%
fsummarise(across(PCGDP:GINI, list(mu = fmean, sd = fsd), w = POP),
POP_sum = fsum(POP), OECD = fmean(OECD))
# Note that the above still detects fmean as a fast function. The names of the list
# are irrelevant, but the function name must be typed or passed as a character vector;
# otherwise functions will be executed by groups, e.g. function(x) fmean(x) won't vectorize
# Or we want to do more advanced things..
# Such as nesting data frames..
qTBL(wlddev) %>% fgroup_by(region, income) %>%
fsummarise(across(c(PCGDP, LIFEEX, ODA),
function(x) list(Nest = list(x)),
.apply = FALSE))
# Or linear models..
qTBL(wlddev) %>% fgroup_by(region, income) %>%
fsummarise(across(c(PCGDP, LIFEEX, ODA),
function(x) list(Mods = list(lm(PCGDP ~., x))),
.apply = FALSE))
# Or computing grouped correlation matrices
qTBL(wlddev) %>% fgroup_by(region, income) %>%
fsummarise(across(c(PCGDP, LIFEEX, ODA),
function(x) qDF(pwcor(x), "Variable"), .apply = FALSE))
# Here calculating 1- and 10-year lags and growth rates of these variables
qTBL(wlddev) %>% fgroup_by(country) %>%
fmutate(across(c(PCGDP, LIFEEX, ODA), list(L, G),
n = c(1, 10), t = year, .names = FALSE))
# Same but variables in different order
qTBL(wlddev) %>% fgroup_by(country) %>%
fmutate(across(c(PCGDP, LIFEEX, ODA), list(L, G), n = c(1, 10),
t = year, .names = FALSE, .transpose = FALSE))
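The note in the examples about fast-function detection can be illustrated with a small comparison. This is only a sketch on the built-in mtcars data, with illustrative object names (g, a, b, d). All three calls should return the same values, differing only in whether the computation is vectorized across groups or run group by group.

```r
library(collapse)

g <- fgroup_by(mtcars, cyl)

# Typed function name: detected as a fast function
a <- fsummarise(g, across(c(mpg, hp), fmean))

# Same function passed as a character vector: also detected
b <- fsummarise(g, across(c(mpg, hp), "fmean"))

# Wrapped in an anonymous function: executed by groups instead
d <- fsummarise(g, across(c(mpg, hp), function(x) fmean(x)))

# The results agree; only the execution path differs
all.equal(a$mpg, b$mpg)
all.equal(a$mpg, d$mpg)
```

On a data set this small the speed difference is negligible; it becomes relevant with many groups.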