hard_arrange {disk.frame}    R Documentation

Perform a hard arrange

Description

A hard_arrange is a sort that also reorganizes the chunks so that every unique grouping of 'by' is in the same chunk. In other words, all rows that share the same 'by' value end up in the same chunk.

Usage

hard_arrange(df, ..., add = FALSE, .drop = FALSE)

## S3 method for class 'data.frame'
hard_arrange(df, ...)

## S3 method for class 'disk.frame'
hard_arrange(
  df,
  ...,
  outdir = tempfile("tmp_disk_frame_hard_arrange"),
  nchunks = disk.frame::nchunks(df),
  overwrite = TRUE
)

Arguments

df

a disk.frame

...

the variables to arrange by; rows that share the same values of these variables are placed in the same chunk

add

same as dplyr::arrange

.drop

same as dplyr::arrange

outdir

the output directory

nchunks

The number of chunks in the output. Defaults to disk.frame::nchunks(df), the number of chunks in df.

overwrite

whether to overwrite the output directory. Defaults to TRUE.

Examples

iris.df = as.disk.frame(iris, nchunks = 2)

# arrange iris.df by Species and ensure rows with the same Species are in the same chunk
iris_hard.df = hard_arrange(iris.df, Species)

get_chunk(iris_hard.df, 1)
get_chunk(iris_hard.df, 2)
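
# A minimal sketch (not part of the original example) of how one might check
# the guarantee described above: no Species value should appear in both chunks.
# It assumes iris_hard.df has exactly the two chunks read above;
# species_1 and species_2 are illustrative names.
species_1 = unique(as.character(get_chunk(iris_hard.df, 1)$Species))
species_2 = unique(as.character(get_chunk(iris_hard.df, 2)$Species))

# expected to be TRUE if every Species is confined to a single chunk
length(intersect(species_1, species_2)) == 0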

# clean up iris.df and iris_hard.df
delete(iris.df)
delete(iris_hard.df)

[Package disk.frame version 0.3.7 Index]