| stri_locate_all {stringi} | R Documentation |
These functions find the indices (positions) at which
a given pattern is matched.
stri_locate_all_* locate all the matches.
On the other hand, stri_locate_first_* and stri_locate_last_*
give the first or the last matches, respectively.
stri_locate_all(str, ..., regex, fixed, coll, charclass)
stri_locate_first(str, ..., regex, fixed, coll, charclass)
stri_locate_last(str, ..., regex, fixed, coll, charclass)
stri_locate(str, ..., regex, fixed, coll, charclass, mode = c("first", "all",
"last"))
stri_locate_all_charclass(str, pattern, merge = TRUE, omit_no_match = FALSE)
stri_locate_first_charclass(str, pattern)
stri_locate_last_charclass(str, pattern)
stri_locate_all_coll(str, pattern, omit_no_match = FALSE, ...,
opts_collator = NULL)
stri_locate_first_coll(str, pattern, ..., opts_collator = NULL)
stri_locate_last_coll(str, pattern, ..., opts_collator = NULL)
stri_locate_all_regex(str, pattern, omit_no_match = FALSE, ...,
opts_regex = NULL)
stri_locate_first_regex(str, pattern, ..., opts_regex = NULL)
stri_locate_last_regex(str, pattern, ..., opts_regex = NULL)
stri_locate_all_fixed(str, pattern, omit_no_match = FALSE, ...,
opts_fixed = NULL)
stri_locate_first_fixed(str, pattern, ..., opts_fixed = NULL)
stri_locate_last_fixed(str, pattern, ..., opts_fixed = NULL)
str |
character vector with strings to search in |
... |
supplementary arguments passed to the underlying functions,
including additional settings for opts_collator, opts_regex, opts_fixed, and so on |
mode |
single string;
one of: "first" (the default), "all", "last" |
pattern, regex, fixed, coll, charclass |
character vector defining search patterns; for more details refer to stringi-search |
merge |
single logical value;
indicates whether consecutive sequences of indices in the resulting
matrix shall be merged; stri_locate_all_charclass only |
omit_no_match |
single logical value; if FALSE, then two missing values indicate
that there was no match; stri_locate_all_* only |
opts_collator, opts_fixed, opts_regex |
a named list used to tune up
a search engine's settings; see
stri_opts_collator, stri_opts_fixed, and stri_opts_regex, respectively;
NULL for default settings |
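As a rough illustration of how engine settings reach the search functions, the sketch below passes the same option once through ... and once as an explicit opts_fixed list. The input string is an arbitrary choice made for this example.

```r
library(stringi)

x <- "AaaaaaaA"

# Option passed through `...`
via_dots <- stri_locate_all_fixed(x, "a", case_insensitive = TRUE)

# The same option passed as an explicit settings list
via_opts <- stri_locate_all_fixed(x, "a",
    opts_fixed = stri_opts_fixed(case_insensitive = TRUE))

identical(via_dots, via_opts)
nrow(via_dots[[1]])  # number of matches found
```

Both calls should describe the same search; the list form may be convenient when one set of options is reused across several calls.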
Vectorized over str and pattern.
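A small sketch of what vectorization means here, using made-up inputs: a single pattern is recycled over several strings, and a single string is recycled over several patterns.

```r
library(stringi)

# One pattern, three strings: one result row per string
by_str <- stri_locate_first_fixed(c("abc", "bca", "cab"), "a")
by_str

# One string, three patterns: one result row per pattern
by_pat <- stri_locate_first_fixed("abcabc", c("a", "b", "c"))
by_pat
```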
The matched string(s) may be extracted by calling
the stri_sub function.
Alternatively, you may call stri_extract directly.
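The following sketch shows one way to turn located positions back into substrings, and compares the result with a direct extraction. The inputs are illustrative only.

```r
library(stringi)

x <- c("XaaaaX", "abc")
loc <- stri_locate_first_regex(x, "\\p{Ll}+")

# Extract using the start and end columns explicitly
via_cols <- stri_sub(x, loc[, 1], loc[, 2])

# stri_sub also accepts the two-column matrix as its `from` argument
via_matrix <- stri_sub(x, loc)

# Direct extraction, without locating first
direct <- stri_extract_first_regex(x, "\\p{Ll}+")

via_cols
identical(via_cols, direct)
```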
stri_locate, stri_locate_all, stri_locate_first,
and stri_locate_last are convenience functions.
They just call stri_locate_*_*, depending on arguments used.
Unless you are a very lazy person, please call the underlying functions
directly for better performance.
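To illustrate the dispatch described above, this sketch compares each convenience wrapper with the underlying function it is expected to call. The strings and patterns are arbitrary.

```r
library(stringi)

# Wrapper with a named `regex` argument vs. the underlying function
a <- stri_locate_all("XaaaaX", regex = "\\p{Ll}+")
b <- stri_locate_all_regex("XaaaaX", "\\p{Ll}+")
identical(a, b)

# Generic wrapper with `mode` vs. the underlying function
c1 <- stri_locate("Bartolini", fixed = "i", mode = "last")
c2 <- stri_locate_last_fixed("Bartolini", "i")
identical(c1, c2)
```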
For stri_locate_all_*,
a list of integer matrices is returned. Each list element
represents the results of a separate search scenario.
The first column gives the start positions
of matches, and the second column gives the end positions.
A single row of two NAs indicates either
no match (if omit_no_match is FALSE)
or an NA argument.
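A minimal sketch of the list structure described above, with example inputs chosen to cover a match, no match, and a missing string:

```r
library(stringi)

res <- stri_locate_all_fixed(c("AaaaaaaA", "AAAA", NA), "a")

length(res)   # one list element per input string
res[[1]]      # one row per match, columns give start and end
res[[2]]      # no match: a single row of NAs
res[[3]]      # NA input: a single row of NAs

# With omit_no_match = TRUE, no match yields a matrix with no rows
omitted <- stri_locate_all_fixed("AAAA", "a", omit_no_match = TRUE)
dim(omitted[[1]])
```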
stri_locate_first_* and stri_locate_last_*,
on the other hand, return an integer matrix with
two columns, giving the start and end positions of the first
or the last matches, respectively, and two NAs if and
only if they are not found.
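By contrast with the list result of stri_locate_all_*, the sketch below inspects the single matrix returned by the first and last variants. The input vector is an assumption made for illustration.

```r
library(stringi)

x <- c("AaaaaaaA", "aaa", "AAA")
first <- stri_locate_first_fixed(x, "a")
last <- stri_locate_last_fixed(x, "a")

dim(first)          # one row per input string, two columns
cbind(first, last)  # the row for "AAA" holds NAs: pattern not found
```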
For stri_locate_*_regex, if the match is of length 0,
end will be one character less than start.
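A short sketch of the zero-length case, using a lookahead on a made-up sequence. Since end is smaller than start for such matches, the occurrences are recovered here from start plus a known length rather than from end.

```r
library(stringi)

s <- "ACAGAGA"
loc <- stri_locate_all_regex(s, "(?=AGA)")[[1]]
loc

# Difference between start and end for each zero-length match
loc[, 1] - loc[, 2]

# Recover the (overlapping) occurrences from the start positions
hits <- stri_sub(s, loc[, 1], length = 3)
hits
```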
Other search_locate: stri_locate_all_boundaries,
stringi-search
Other indexing: stri_locate_all_boundaries,
stri_sub
stri_locate_all('XaaaaX',
regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_all('Bartolini', fixed='i')
stri_locate_all('a b c', charclass='\\p{Zs}') # all white spaces
stri_locate_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}')
stri_locate_all_charclass(c('AbcdeFgHijK', 'abc', 'ABC'), '\\p{Ll}', merge=FALSE)
stri_locate_first_charclass('AaBbCc', '\\p{Ll}')
stri_locate_last_charclass('AaBbCc', '\\p{Ll}')
stri_locate_all_coll(c('AaaaaaaA', 'AAAA'), 'a')
stri_locate_first_coll(c('Yy\u00FD', 'AAA'), 'y', strength=2, locale="sk_SK")
stri_locate_last_coll(c('Yy\u00FD', 'AAA'), 'y', strength=1, locale="sk_SK")
pat <- stri_paste("\u0635\u0644\u0649 \u0627\u0644\u0644\u0647 ",
"\u0639\u0644\u064a\u0647 \u0648\u0633\u0644\u0645XYZ")
stri_locate_last_coll("\ufdfa\ufdfa\ufdfaXYZ", pat, strength = 1)
stri_locate_all_fixed(c('AaaaaaaA', 'AAAA'), 'a')
stri_locate_all_fixed(c('AaaaaaaA', 'AAAA'), 'a', case_insensitive=TRUE, overlap=TRUE)
stri_locate_first_fixed(c('AaaaaaaA', 'aaa', 'AAA'), 'a')
stri_locate_last_fixed(c('AaaaaaaA', 'aaa', 'AAA'), 'a')
# first row is 1-2, as in locate_first
stri_locate_all_fixed('bbbbb', 'bb')
stri_locate_first_fixed('bbbbb', 'bb')
# but last row is 3-4, unlike in locate_last,
# keep this in mind [overlapping pattern match OK]!
stri_locate_last_fixed('bbbbb', 'bb')
stri_locate_all_regex('XaaaaX',
c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_first_regex('XaaaaX',
c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
stri_locate_last_regex('XaaaaX',
c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
# Use regex positive-lookahead to locate overlapping pattern matches:
stri_locate_all_regex("ACAGAGACTTTAGATAGAGAAGA", "(?=AGA)")
# note that start > end here (match of 0 length)