Runs a dictionary over the text column of x, creating a 'tokens' variable of dictionary matches for each word (words with no dictionary match generate no token), then tabulates the matches within each document in a 'counts' variable.
jl_count_categories(x, dictionary, ...)
Argument | Description
---|---
`x` | a tibble
`dictionary` | a quanteda content analysis dictionary
`...` | extra arguments to `tokenizers::tokenize_*`
Returns a tibble with 'tokens' and 'counts' variables.
This is a shortcut for running jl_tokenize_categories followed by jl_count_tokens, and should be preferred to calling those two functions directly.
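A minimal usage sketch, assuming a tibble with a `text` column (the column name is inferred from the description) and a small quanteda dictionary built for illustration:

```r
library(tibble)
library(quanteda)

# Toy two-category dictionary (hypothetical categories, for illustration only)
dict <- dictionary(list(
  positive = c("good", "great"),
  negative = c("bad", "awful")
))

docs <- tibble(text = c("a good, great day", "a bad day"))

# Tokenizes each document, keeps only words that match the dictionary as
# 'tokens', then tabulates matches per document into 'counts'
jl_count_categories(docs, dict)
```

The exact shape of the returned 'tokens' and 'counts' columns depends on jl_count_tokens; the sketch only shows the intended call pattern.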