tokenize {readr}    R Documentation

Tokenize a file/string.

Description

Turns input into a character vector. Usually the tokenization is done purely in C++, and never exposed to R (because that requires a copy). This function is useful for testing, or when a file doesn't parse correctly and you want to see the underlying tokens.

Usage

tokenize(file, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must either be wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.
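A brief sketch of the literal-data rules above, using tokenize() itself (small illustrative strings, not part of the original examples):

```r
library(readr)

# A string containing a newline is treated as literal data, not a path,
# so this tokenizes two CSV rows directly:
tokenize("a,b\n1,2")

# Wrapping the input in I() marks it as literal data explicitly,
# which is useful when the string might otherwise look like a path:
tokenize(I("a,b\n1,2"))

# skip drops leading lines before tokenizing; here the header row:
tokenize("a,b\n1,2", skip = 1)
```

Each call returns a list with one character vector of tokens per line, which is why tokenize() is handy for inspecting how a tokenizer splits problematic input.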

tokenizer

A tokenizer specification.

skip

Number of lines to skip before reading data.

n_max

Optionally, maximum number of rows to tokenize.

Examples

tokenize("1,2\n3,4,5\n\n6")

# Only tokenize first two lines
tokenize("1,2\n3,4,5\n\n6", n_max = 2)

[Package readr version 2.0.2 Index]