| GRP {collapse} | R Documentation |
GRP performs fast, ordered and unordered, groupings of vectors and data frames (or lists of vectors) using radixorderv or group. The output is a list-like object of class 'GRP' which can be printed, plotted and used as an efficient input to all of collapse's fast statistical and transformation functions / operators, as well as to collap, BY and TRA.
fgroup_by is similar to dplyr::group_by but faster. It creates a grouped data frame with a 'GRP' object attached - for faster dplyr-like programming with collapse's fast functions.
There are also several conversion methods to convert to and from 'GRP' objects. Notable among these is GRP.grouped_df, which returns a 'GRP' object from a grouped data frame created with dplyr::group_by or fgroup_by, and the duo GRP.factor and as_factor_GRP.
gsplit efficiently splits a vector based on a grouping object.
GRP(X, ...)
## Default S3 method:
GRP(X, by = NULL, sort = TRUE, decreasing = FALSE, na.last = TRUE,
return.groups = TRUE, return.order = sort, method = "auto",
call = TRUE, ...)
## S3 method for class 'factor'
GRP(X, ..., group.sizes = TRUE, drop = FALSE, return.groups = TRUE,
call = TRUE)
## S3 method for class 'qG'
GRP(X, ..., group.sizes = TRUE, return.groups = TRUE, call = TRUE)
## S3 method for class 'pseries'
GRP(X, effect = 1L, ..., group.sizes = TRUE, return.groups = TRUE,
call = TRUE)
## S3 method for class 'pdata.frame'
GRP(X, effect = 1L, ..., group.sizes = TRUE, return.groups = TRUE,
call = TRUE)
## S3 method for class 'grouped_df'
GRP(X, ..., return.groups = TRUE, call = TRUE)
# Identify, get length, group names, and convert GRP object to factor
is_GRP(x)
## S3 method for class 'GRP'
length(x)
GRPN(x, expand = TRUE, ...)
GRPnames(x, force.char = TRUE)
as_factor_GRP(x, ordered = FALSE)
# Efficiently split a vector using a grouping object
gsplit(x, g, use.g.names = FALSE, ...)
# Fast, class-agnostic version of dplyr::group_by for use with fast functions, see details
fgroup_by(.X, ..., sort = TRUE, decreasing = FALSE, na.last = TRUE,
return.order = sort, method = "auto")
# Shortcut for fgroup_by
gby(.X, ..., sort = TRUE, decreasing = FALSE, na.last = TRUE,
return.order = sort, method = "auto")
# Get grouping columns from a grouped data frame created with dplyr::group_by or fgroup_by
fgroup_vars(X, return = "data")
# Ungroup grouped data frame created with dplyr::group_by or fgroup_by
fungroup(X, ...)
## S3 method for class 'GRP'
print(x, n = 6, ...)
## S3 method for class 'GRP'
plot(x, breaks = "auto", type = "s", horizontal = FALSE, ...)
| X | a vector, list of columns or data frame (default method), or a classed object (conversion / extractor methods). |
| .X | a data frame or list. |
| x, g | a 'GRP' object. For |
| by | if |
| sort | logical. If |
| ordered | logical. |
| decreasing | logical. Should the sort order be increasing or decreasing? Can be a vector of length equal to the number of arguments in |
| na.last | logical. If missing values are encountered in grouping vector/columns, assign them to the last group (argument passed to |
| return.groups | logical. Include the unique groups in the created GRP object. |
| return.order | logical. Include the output from |
| method | character. The algorithm to use for grouping: either |
| group.sizes | logical. |
| drop | logical. |
| call | logical. |
| expand | logical. |
| force.char | logical. Always output group names as character vector, even if a single numeric vector was passed to |
| effect | plm methods: Select which panel identifier should be used as grouping variable. 1L takes the first variable in the |
| return | an integer or string specifying what |
| use.g.names | logical. |
| n | integer. Number of groups to print out. |
| breaks | integer. Number of breaks in the histogram of group-sizes. |
| type | linetype for plot. |
| horizontal | logical. |
| ... | for |
GRP is a central function in the collapse package because it provides the key inputs for easy and efficient groupwise programming at the C/C++ level: (1) the number of groups, (2) an integer group-id indicating which values / rows belong to which group, and (3) the size of each group. Given this information, collapse's Fast Statistical Functions pre-allocate intermediate and result vectors of the right sizes and (in most cases) perform grouped statistical computations in a single pass through the data.
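As a minimal sketch of the three pieces of information listed above, the following groups a single vector and inspects the corresponding elements. The element names follow the object structure given in the Value section; the choice of mtcars$cyl and the helper name sizes_check are for illustration only.

```r
library(collapse)

# Group a single vector (sorted grouping is the default)
g <- GRP(mtcars$cyl)

g$N.groups     # (1) number of groups
g$group.id     # (2) integer group-id, one entry per element of the input
g$group.sizes  # (3) number of observations in each group

# The group sizes are a tabulation of the group-id
sizes_check <- tabulate(g$group.id, g$N.groups)
```

The last line only cross-checks the object with base R; it is not how collapse computes the sizes internally.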
The sorting and ordering functionality of GRP only affects (2): groups receive different integer ids depending on whether the groups are sorted (sort = TRUE) and in which order (argument decreasing). This in turn changes the order of values / rows in the output of collapse functions.
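The effect of sort and decreasing on the group-id can be seen in a small sketch. The variable names (g_inc, g_dec, g_unsrt) are hypothetical, and mtcars$cyl is used only because its first values (6, 6, 4, 6, 8) are not already sorted.

```r
library(collapse)

x <- mtcars$cyl

g_inc   <- GRP(x)                     # sort = TRUE, increasing order
g_dec   <- GRP(x, decreasing = TRUE)  # sorted, decreasing order
g_unsrt <- GRP(x, sort = FALSE)       # unsorted grouping

# Same partition of the data, but different integer ids per group
head(g_inc$group.id)
head(g_dec$group.id)
head(g_unsrt$group.id)

# The order of the groups carries over to the output of grouped computations
m_inc <- fmean(mtcars$mpg, g_inc)
m_dec <- fmean(mtcars$mpg, g_dec)
```

Here m_inc and m_dec should contain the same group means, just in opposite order.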
Next to GRP, there is the function fgroup_by as a significantly faster alternative to dplyr::group_by. It creates a grouped data frame by attaching a 'GRP' object to a data frame. collapse functions with a grouped_df method applied to that data frame will yield grouped computations. Note that fgroup_by can only be used in combination with collapse functions, not with dplyr::summarize or dplyr::mutate (the grouping object and the method of computing results are different). The converse does not hold: you can group data with dplyr::group_by and then apply collapse functions.
fgroup_by is class-agnostic, i.e. the classes of the data frame or list passed are preserved, and all standard methods (like subsetting with `[` or print methods) apply to the grouped object. Apart from the class 'grouped_df', which is added behind any classes the object might inherit (apart from 'data.frame'), a class 'GRP_df' is added in front. This class has print and subset (`[`) methods. Both first call the corresponding method for the object and then print / attach the grouping information. print.GRP_df prints one line below the object indicating the grouping variables, followed, in square brackets, by some statistics on the group sizes: [N | Mean (SD) Min-Max]. The mean is rounded to a whole number and the standard deviation (SD) to one digit. Minimum and maximum are only displayed if the SD is non-zero.
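A short sketch of the class structure and workflow described above, using a plain data frame. The object names (gdf, res, plain) are hypothetical, and passing "names" to the return argument of fgroup_vars is assumed to be one of the supported options.

```r
library(collapse)

gdf <- fgroup_by(mtcars, cyl)

class(gdf)                 # 'GRP_df' in front, 'grouped_df' before 'data.frame'
fgroup_vars(gdf, "names")  # names of the grouping columns
g <- GRP(gdf)              # extract the attached 'GRP' object

res   <- fmean(gdf)        # grouped means computed by a collapse function
plain <- fungroup(gdf)     # remove the grouping again
```

Because the data frame classes are preserved, gdf can still be subset with `[` or printed like the original object.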
GRP is an S3 generic function with one default method supporting vector and list input and several conversion methods:
The conversion of factors to 'GRP' objects by GRP.factor involves obtaining the number of groups by calling ng <- fnlevels(f) and then computing the count of each level using tabulate(f, ng). The integer group-id (2) is already given by the factor itself after removing the levels and class attributes and replacing any missing values with ng + 1L. The levels are put in a list and moved to position (4) in the 'GRP' object, which is reserved for the unique groups. Going from a factor to a 'GRP' object thus only requires a tabulation of the levels, whereas creating a factor from a 'GRP' object using as_factor_GRP does not involve any computations, but may involve interacting multiple columns using the paste function to produce unique factor levels (if multiple grouping columns were used).
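The steps described above can be mirrored in base R and compared against the factor method. This is only a sketch of the logic, not the package's internal code: nlevels stands in for fnlevels, and the names ng, sizes, id and f2 are chosen for illustration.

```r
library(collapse)

f <- iris$Species

# Base R sketch of the conversion steps
ng    <- nlevels(f)        # number of groups
sizes <- tabulate(f, ng)   # count of each level
id    <- as.integer(f)     # integer codes without levels / class
id[is.na(id)] <- ng + 1L   # missing values would receive the id ng + 1L

g  <- GRP(f)               # conversion via the factor method
f2 <- as_factor_GRP(g)     # and back to a factor
```

iris$Species has no missing values, so the replacement step is a no-op here and is included only to match the description.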
The method GRP.grouped_df takes the 'groups' attribute from a grouped data frame and converts it to a 'GRP' object. If the grouped data frame was generated using fgroup_by, all work is done already. If it was created using dplyr::group_by, a C routine is called to efficiently convert the grouping object.
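As a sketch of the fgroup_by case, the 'GRP' object extracted from a grouped data frame can be compared with one computed directly by the default method. The names g1 and g2 are hypothetical. The dplyr::group_by route works analogously, but is not shown here to avoid the extra dependency.

```r
library(collapse)

gdf <- fgroup_by(mtcars, cyl, vs, am)

g1 <- GRP(gdf)                      # taken from the grouped data frame
g2 <- GRP(mtcars, ~ cyl + vs + am)  # computed directly with the default method

# Both describe the same grouping
g1$N.groups
g2$N.groups
```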
Note: For faster factor generation, and for the factor-light class 'qG', which avoids the coercion of factor levels to character, see also qF and qG.
A list-like object of class ‘GRP’ containing information about the number of groups, the observations (rows) belonging to each group, the size of each group, the unique group names / definitions, whether the groups are ordered or not and the ordering vector used to perform the ordering. The object is structured as follows:
| List-index | Element-name | Content type | Content description |
| [[1]] | N.groups | integer(1) | Number of Groups |
| [[2]] | group.id | integer(NROW(X)) | An integer group-identifier |
| [[3]] | group.sizes | integer(N.groups) | Vector of group sizes |
| [[4]] | groups | unique(X) or NULL | Unique groups (same format as input, except for fgroup_by which uses a plain list, sorted if sort = TRUE), or NULL if return.groups = FALSE |
| [[5]] | group.vars | character | The names of the grouping variables |
| [[6]] | ordered | logical(2) | [1]- TRUE if sort = TRUE, [2]- TRUE if X already sorted |
| [[7]] | order | integer(NROW(X)) or integer(0) (with attributes), or NULL | Ordering vector from radixorderv or group (with "starts" attribute) or NULL if return.order = FALSE |
| [[8]] | call | match.call() or NULL | The GRP() call, obtained from match.call(), or NULL if call = FALSE |
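The elements in the table can be accessed by name. The sketch below inspects a few of them and shows how the optional elements become NULL when switched off through the corresponding arguments. The object name g_min is hypothetical.

```r
library(collapse)

g <- GRP(mtcars, ~ cyl + vs + am)

g$group.vars  # names of the grouping variables
g$ordered     # logical(2), as described in the table
g$groups      # unique combinations of cyl, vs and am

# Optional elements can be left out when they are not needed
g_min <- GRP(mtcars, ~ cyl + vs + am,
             return.groups = FALSE, return.order = FALSE, call = FALSE)

is.null(g_min[["groups"]])
is.null(g_min[["order"]])
is.null(g_min[["call"]])
```

Using `[[` with the full element name avoids the partial matching that `$` performs on lists, which matters here because "order" is a prefix of "ordered".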
radixorder, qF, Fast Grouping and Ordering, Collapse Overview
## default method
GRP(mtcars$cyl)
GRP(mtcars, ~ cyl + vs + am) # Or GRP(mtcars, c("cyl","vs","am")) or GRP(mtcars, c(2,8:9))
g <- GRP(mtcars, ~ cyl + vs + am) # Saving the object
print(g) # Printing it
plot(g) # Plotting it
GRPnames(g) # Retrieve group names
fsum(mtcars, g) # Compute the sum of mtcars, grouped by variables cyl, vs and am
gsplit(mtcars$mpg, g) # Use the object to split a vector
gsplit(NULL, g) # The indices of the groups
## Convert factor to GRP object and vice-versa
GRP(iris$Species)
as_factor_GRP(g)
## dplyr integration
library(dplyr)
mtcars %>% group_by(cyl,vs,am) %>% GRP() # Get GRP object from a dplyr grouped tibble
mtcars %>% group_by(cyl,vs,am) %>% fmean() # Grouped mean using dplyr grouping
mtcars %>% fgroup_by(cyl,vs,am) %>% fmean() # Faster alternative with collapse grouping
mtcars %>% fgroup_by(cyl,vs,am) # Print method for grouped data frame
library(magrittr)
## Adding a column of group sizes
mtcars %>% fgroup_by(cyl,vs,am) %>% ftransform(Sizes = GRPN(.))
mtcars %>% ftransform(Sizes = GRPN(list(cyl,vs,am))) # Same thing, slightly more efficient
## Various options for programming and interactive use
fgroup_by(GGDC10S, Variable, Decade = floor(Year / 10) * 10) %>% head(3)
fgroup_by(GGDC10S, 1:3, 5) %>% head(3)
fgroup_by(GGDC10S, c("Variable", "Country")) %>% head(3)
fgroup_by(GGDC10S, is.character) %>% head(3)
fgroup_by(GGDC10S, Country:Variable, Year) %>% head(3)
fgroup_by(GGDC10S, Country:Region, Var = Variable, Year) %>% head(3)
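The expand and use.g.names arguments from the Usage section are not exercised in the examples above. As a small additional sketch (the object names n_obs, n_grp and mpg_split are hypothetical):

```r
library(collapse)

g <- GRP(mtcars$cyl)

n_obs <- GRPN(g)                  # group size repeated for each observation
n_grp <- GRPN(g, expand = FALSE)  # one size per group

# Split a vector by group and keep the group names on the resulting list
mpg_split <- gsplit(mtcars$mpg, g, use.g.names = TRUE)
names(mpg_split)
lengths(mpg_split)
```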