A word count matrix that knows which margin holds the words.
wfm(mat, word.margin = 1)
Argument | Description |
---|---|
mat | matrix of word counts, or the name of a CSV file of word counts |
word.margin | which margin holds the words |
Returns a word frequency matrix built from a suitable object, or read from a file if mat is a character string. Which margin is treated as representing words is set by word.margin.
If mat is a filename, it should name a file in comma-separated value format with row labels in the first column and column labels in the first row. Which margin represents words and which represents documents is specified by word.margin, which defaults to words as rows.
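The file layout described above can be illustrated with base R alone. This is a minimal sketch: the file contents and the variable names path and counts are made up for illustration, and read.csv stands in for whatever wfm does internally when given a filename.

```r
# Write a small CSV in the expected layout: row labels in the first column,
# column labels in the first row. The counts are invented for this example.
path <- tempfile(fileext = ".csv")
writeLines(c(",D1,D2",
             "W1,1,4",
             "W2,2,5",
             "W3,3,6"), path)

# Base R reading of such a file: the first column becomes the row names.
counts <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))
print(counts)

# With the package installed, the same file could be passed by name:
# m <- wfm(path, word.margin = 1)
```

Here words are in rows, which matches the default word.margin = 1.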
A word frequency matrix is defined as any two-dimensional matrix with non-empty row and column names whose dimnames are named 'words' and 'docs' (in either order). The actual class of such an object is not important for the operation of the functions in this package, so wfm is essentially an interface. The function is.wfm is a (currently rather loose) check of whether an object fulfils the interface contract.
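The interface contract can be spelled out in a few lines of base R. The helper looks_like_wfm below is hypothetical and is not the package's is.wfm; it only encodes the definition given above.

```r
# Hypothetical helper (not the package's is.wfm): tests the stated contract,
# i.e. a matrix with row and column names whose dimnames are named
# 'words' and 'docs' in either order.
looks_like_wfm <- function(x) {
  dn <- dimnames(x)
  is.matrix(x) &&
    !is.null(dn) &&
    !is.null(dn[[1]]) && !is.null(dn[[2]]) &&
    setequal(names(dn), c("words", "docs"))
}

mat <- matrix(1:6, ncol = 2,
              dimnames = list(words = c("W1", "W2", "W3"),
                              docs  = c("D1", "D2")))

looks_like_wfm(mat)             # TRUE
looks_like_wfm(matrix(1:6, 2))  # FALSE: no dimnames at all
```

Because the check looks only at dimnames, any plain matrix labelled this way qualifies, whatever its class.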
For such objects the convenience accessor functions as.docword and as.worddoc can be used to get counts whichever way up you need them.
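As a rough picture of what "whichever way up" means, the sketch below reorients a labelled matrix in base R. The helper to_docword is a hypothetical stand-in for as.docword, assuming that reorientation amounts to a transpose when the words are on the rows.

```r
# Hypothetical stand-in for as.docword: return counts with documents as rows,
# transposing only if the documents are not already on the first margin.
to_docword <- function(x) {
  if (names(dimnames(x))[1] == "docs") x else t(x)
}

mat <- matrix(1:6, ncol = 2,
              dimnames = list(words = c("W1", "W2", "W3"),
                              docs  = c("D1", "D2")))

dw <- to_docword(mat)
dim(dw)      # 2 3
dw["D2", ]   # counts for document D2: W1 = 4, W2 = 5, W3 = 6
```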
words returns the words and docs returns the document titles. wordmargin reminds you which margin contains the words. Assigning wordmargin flips the dimension names.
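The accessors can be pictured as lookups on the named dimnames. The sketch below uses base R only; word_margin_of is a hypothetical helper, and the flip shown is one reading of "flips the dimension names", namely relabelling the margins without moving any counts.

```r
mat <- matrix(1:6, ncol = 2,
              dimnames = list(words = c("W1", "W2", "W3"),
                              docs  = c("D1", "D2")))

# Hypothetical helper: which margin is labelled 'words'?
word_margin_of <- function(x) which(names(dimnames(x)) == "words")

dimnames(mat)[["words"]]   # "W1" "W2" "W3"
dimnames(mat)[["docs"]]    # "D1" "D2"
word_margin_of(mat)        # 1

# Assumed meaning of the flip: swap the labels, leave the data in place.
flipped <- mat
names(dimnames(flipped)) <- rev(names(dimnames(flipped)))
word_margin_of(flipped)    # 2
```

Under this reading the counts stay where they are, so what used to be documents are now treated as words. That differs from transposing, which keeps each label attached to its data.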
To extract particular documents by name or index, use getdocs.
as.wfm attempts to convert things to word frequency matrices. This functionality is currently limited to objects on which as.matrix already works, and to TermDocument and DocumentTerm objects from the tm package.
Will Lowe
mat <- matrix(1:6, ncol=2)
rownames(mat) <- c('W1','W2','W3')
colnames(mat) <- c('D1','D2')
m <- wfm(mat, word.margin=1)
getdocs(as.docword(m), 'D2')
#>     words
#> docs W1 W2 W3
#>   D2  4  5  6