| stri_split_boundaries {stringi} | R Documentation |
This function locates specific text boundaries (like character, word, line, or sentence boundaries) and splits strings at the indicated positions.
stri_split_boundaries(str, n = -1L, tokens_only = FALSE, simplify = FALSE, ..., opts_brkiter = NULL)
str | character vector or an object coercible to one
n | integer vector; maximal number of strings to return
tokens_only | single logical value; may affect the result if n is positive, see below
simplify | single logical value; if TRUE or NA, a character matrix is returned; otherwise (the default), a list of character vectors is returned
... | additional settings for opts_brkiter
opts_brkiter | a named list with ICU BreakIterator's settings, as generated with stri_opts_brkiter
Vectorized over str and n.
If n is negative (default), then all pieces are extracted.
Otherwise, if tokens_only is FALSE (the default,
for compatibility with the stringr package), then n-1
tokens are extracted (if possible) and the n-th string
gives the (non-split) remainder (see Examples).
On the other hand, if tokens_only is TRUE,
then only full tokens (up to n pieces) are extracted.
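The interaction between n and tokens_only described above can be sketched as follows. This is a minimal illustration, assuming the stringi package is installed; the sample string s and all variable names are made up for this example and are not part of the package documentation.

```r
library(stringi)

# Hypothetical sample text consisting of three sentences
s <- "One. Two. Three."

# n = -1 (default): every sentence becomes a separate piece
all_pieces <- stri_split_boundaries(s, type = "sentence")[[1]]

# n = 2, tokens_only = FALSE: one token, then the unsplit remainder
with_rest <- stri_split_boundaries(s, n = 2, type = "sentence")[[1]]

# n = 2, tokens_only = TRUE: at most two full tokens, the rest is dropped
tokens <- stri_split_boundaries(s, n = 2, tokens_only = TRUE,
                                type = "sentence")[[1]]

# str and n are processed in parallel: n = 2 applies to the first
# string, n = -1 to the second
both <- stri_split_boundaries(c(s, s), n = c(2L, -1L), type = "sentence")
```

When no skip rules are in effect, pasting the pieces of with_rest back together should reproduce s, whereas tokens covers only the first part of the string.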
For more information on the text boundary analysis
performed by ICU's BreakIterator, see
stringi-search-boundaries.
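Break iterator settings can be supplied either through ... or bundled in opts_brkiter. The sketch below, which assumes the stringi package is installed and uses a made-up sample string, suggests that the two forms are interchangeable for a simple type setting.

```r
library(stringi)

# Hypothetical sample text
s <- "Text boundaries, step by step."

# Settings passed directly through `...`
via_dots <- stri_split_boundaries(s, type = "word")

# The same settings bundled with stri_opts_brkiter()
via_opts <- stri_split_boundaries(
  s, opts_brkiter = stri_opts_brkiter(type = "word"))

# Both calls are expected to produce the same list
identical(via_dots, via_opts)
```

Bundling the settings in a list may be convenient when the same configuration is reused across several boundary-related calls.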
If simplify=FALSE (the default),
then the function returns a list of character vectors.
Otherwise, stri_list2matrix with byrow=TRUE
and n_min=n arguments is called on the resulting object.
In such a case, a character matrix with length(str) rows
is returned. Note that stri_list2matrix's fill
argument is set to an empty string for simplify=TRUE
and to NA for simplify=NA.
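The three simplify settings can be compared with a short sketch. It assumes the stringi package is installed; the input vector x and the variable names are hypothetical and chosen so that the two strings yield a different number of sentences.

```r
library(stringi)

# Hypothetical input: two sentences in the first string, one in the second
x <- c("One. Two.", "Three.")

# simplify = FALSE (default): a list with one character vector per string
as_list <- stri_split_boundaries(x, type = "sentence")

# simplify = TRUE: a character matrix; missing cells are empty strings
m_empty <- stri_split_boundaries(x, type = "sentence", simplify = TRUE)

# simplify = NA: a character matrix; missing cells are NA
m_na <- stri_split_boundaries(x, type = "sentence", simplify = NA)

dim(m_empty)      # one row per element of x
m_empty[2, ]      # second row is padded with ""
is.na(m_na[2, ])  # the padding cell is NA here
```

Padding with NA makes it possible to tell a genuinely empty piece from a cell that exists only to fill the matrix.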
Other search_split: stri_split_lines,
stri_split, stringi-search
Other locale_sensitive: %s<%,
stri_compare,
stri_count_boundaries,
stri_duplicated,
stri_enc_detect2,
stri_extract_all_boundaries,
stri_locate_all_boundaries,
stri_opts_collator,
stri_order,
stri_trans_tolower,
stri_unique, stri_wrap,
stringi-locale,
stringi-search-boundaries,
stringi-search-coll
Other text_boundaries: stri_count_boundaries,
stri_extract_all_boundaries,
stri_locate_all_boundaries,
stri_opts_brkiter,
stri_split_lines,
stri_trans_tolower,
stri_wrap,
stringi-search-boundaries,
stringi-search
test <- "The\u00a0above-mentioned features are very useful. " %s+%
"Warm thanks to their developers. 123 456 789"
stri_split_boundaries(test, type="line")
stri_split_boundaries(test, type="word")
stri_split_boundaries(test, type="word", skip_word_none=TRUE)
stri_split_boundaries(test, type="word", skip_word_none=TRUE, skip_word_letter=TRUE)
stri_split_boundaries(test, type="word", skip_word_none=TRUE, skip_word_number=TRUE)
stri_split_boundaries(test, type="sentence")
stri_split_boundaries(test, type="sentence", skip_sentence_sep=TRUE)
stri_split_boundaries(test, type="character")
# filtered break iterator with the new ICU:
stri_split_boundaries("Mr. Jones and Mrs. Brown are very happy.
So am I, Prof. Smith.", type="sentence", locale="@ss=standard") # ICU >= 56 only