| split_match_regex_to_transcript {textshape} | R Documentation |
A wrapper for split_match_regex and textreadr's
as_transcript that detects the person variable, splits the text into turns of
talk, and converts the result to a data.frame with person and dialogue
variables. The text cleaning it applies is closer to that of as_transcript
than to that of split_transcript.
split_match_regex_to_transcript(x, person.regex = "^[A-Z]{3,}",
col.names = c("Person", "Dialogue"), dash = "", ellipsis = "...",
quote2bracket = FALSE, rm.empty.rows = TRUE, skip = 0, ...)
x |
A vector with split points. |
person.regex |
A vector of places (elements) to split on or a regular
expression if |
col.names |
A character vector specifying the column names of the transcript columns. |
dash |
A character string to replace en dash and em dash special characters (the default, "", removes them). |
ellipsis |
A character string to replace the ellipsis special characters. |
quote2bracket |
logical. If |
rm.empty.rows |
logical. If |
skip |
Integer; the number of lines of the data file to skip before beginning to read data. |
... |
ignored. |
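As a rough illustration of what the dash and ellipsis arguments describe, the base R sketch below replaces the special characters in a small character vector. It is not the package's own code: the input vector x and the use of gsub are assumptions made for the example, and only the two default values are taken from the usage above.

```r
# Hypothetical input containing an em dash (U+2014), an ellipsis
# character (U+2026), and an en dash (U+2013).
x <- c("Wait\u2014what\u2026", "pages 3\u20135")

dash <- ""         # default from the usage above: dashes are removed
ellipsis <- "..."  # default from the usage above: three ASCII periods

# Replace en and em dashes first, then the single-character ellipsis.
cleaned <- gsub("[\u2013\u2014]", dash, x)
cleaned <- gsub("\u2026", ellipsis, cleaned, fixed = TRUE)

print(cleaned)
```

Passing a different string, for example dash = "-", would keep a visible separator instead of joining the neighbouring words.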
Returns a data.frame of dialogue and people.
## Not run:
library(magrittr)  # provides the %>% pipe used below

system.file("docs/Simpsons_Roasting_on_an_Open_Fire_Script.pdf", package = "textshape") %>%
    textreadr::read_document() %>%
    split_match_regex_to_transcript("^[A-Z]{3,}", skip = 2)
## End(Not run)
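To make the idea of splitting on a person regex concrete, here is a minimal base R sketch that needs neither textshape nor textreadr. It is an illustration only, not the package's implementation: the sample lines are invented, and it assumes that every person line sits on its own line and is followed by at least one line of dialogue.

```r
# Hypothetical lines, as they might look after reading a script.
x <- c(
  "HOMER",
  "Marge, where is the remote?",
  "MARGE",
  "It is on the couch...",
  "where you left it.",
  "BART",
  "Ay caramba!"
)

person_regex <- "^[A-Z]{3,}"  # same default as person.regex above

# Lines that match the regex open a new turn of talk.
is_person <- grepl(person_regex, x)

# A running count of person lines assigns every line to a turn.
turn_id <- cumsum(is_person)

# Collapse the dialogue lines of each turn into a single string.
dialogue <- tapply(x[!is_person], turn_id[!is_person], paste, collapse = " ")

transcript <- data.frame(
  Person = x[is_person],
  Dialogue = unname(as.character(dialogue)),
  stringsAsFactors = FALSE
)

print(transcript)
```

The result has one row per turn of talk, with the column names matching the col.names default. A person line with no dialogue after it would break the one-to-one pairing in this sketch, which is why that case is excluded by assumption.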