Split a corpus into subunits — jl

If document 23 of x contains three sentences in its 'text', then jl_split(x, "sentences") returns three new rows, with the other variables duplicated, new 'tokens' values, and doc_ids 23.1, 23.2, and 23.3.

jl_split(x, what = c("paragraphs", "sentences", "regex"), ...)

Arguments

x	a tibble
what	the unit to split each document into: "paragraphs" (the default), "sentences", or "regex"
...	extra arguments passed on to tokenizers::tokenize_*

Value

a tibble with one row per subunit and a new doc_id for each row
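
Examples

The row duplication and doc_id numbering described above can be illustrated with a short base R sketch. This is not the jl implementation: the data frame, its 'author' column, and the naive sentence-splitting regex are all assumptions made for illustration, and the real function relies on tokenizers::tokenize_* rather than strsplit.

```r
# Illustrative sketch only: emulates the splitting and doc_id scheme with base R.
# The data and the 'author' column are hypothetical.
x <- data.frame(
  doc_id = c("22", "23"),
  author = c("A", "B"),
  text = c("Alpha here. Beta there.",
           "First one. Second one. Third one."),
  stringsAsFactors = FALSE
)

# Naive sentence splitter (assumption): break on whitespace that follows ., ! or ?
pieces <- strsplit(x$text, "(?<=[.!?])\\s+", perl = TRUE)
n <- lengths(pieces)

# Repeat each original row once per subunit, so other variables are duplicated
out <- x[rep(seq_len(nrow(x)), n), , drop = FALSE]

# Replace the text with the subunits and number them within each document
out$text <- unlist(pieces)
out$doc_id <- paste(out$doc_id, sequence(n), sep = ".")
rownames(out) <- NULL

out
```

With jl loaded and x a tibble, the corresponding call would be jl_split(x, "sentences").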