A word count matrix that knows which margin holds the words.

wfm(mat, word.margin = 1)

Arguments

mat

matrix of word counts or the name of a csv file of word counts

word.margin

which margin holds the words

Value

A word frequency matrix from a suitable object, or read from a file if mat is character. Which margin is treated as representing words is set by word.margin.

Details

If mat is a filename it should name a comma-separated value (CSV) file with row labels in the first column and column labels in the first row. Whether the rows or the columns represent words is specified by word.margin, which defaults to 1 (words as rows).
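As a rough illustration of the file layout described above, the sketch below writes a tiny CSV and reads it back with base R. The file contents and the variable names path and counts are made up for the example, and read.csv stands in for whatever reader wfm uses internally, which this page does not specify.

```r
# Write a tiny word-by-document CSV in the layout described above:
# first row = column labels, first column = row labels.
path <- tempfile(fileext = ".csv")
writeLines(c(",D1,D2",
             "W1,1,4",
             "W2,2,5",
             "W3,3,6"), path)

# Base R read that mimics the expected layout (an assumption for
# illustration, not the package's own reader).
counts <- as.matrix(read.csv(path, row.names = 1))
counts

# With the package loaded, the same file name would be passed directly:
# m <- wfm(path, word.margin = 1)   # words are the rows of the file
```

If the documents were the rows of the file instead, word.margin = 2 would be the corresponding setting.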

A word frequency matrix is defined as any two dimensional matrix with non-empty row and column names whose dimnames are named 'words' and 'docs' (in either order). The actual class of such an object is not important for the operation of the functions in this package, so wfm is essentially an interface. The function is.wfm is a (currently rather loose) check of whether an object fulfils the interface contract.
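The interface contract can be spelled out in a few lines of base R. The helper looks_like_wfm below is hypothetical and written only for this illustration; it is not the package's is.wfm, whose exact checks may be looser or different.

```r
# Hypothetical helper: NOT the package's is.wfm, just the contract spelled out.
looks_like_wfm <- function(x) {
  dn <- dimnames(x)
  length(dim(x)) == 2 &&                          # two dimensional
    !is.null(dn) &&
    setequal(names(dn), c("words", "docs")) &&    # margins named, either order
    all(lengths(dn) > 0) &&                       # both margins carry labels
    all(nzchar(unlist(dn)))                       # no empty labels
}

mat <- matrix(1:6, ncol = 2,
              dimnames = list(words = c("W1", "W2", "W3"),
                              docs  = c("D1", "D2")))

looks_like_wfm(mat)          # words as rows
looks_like_wfm(t(mat))       # docs as rows
looks_like_wfm(unname(mat))  # dimnames removed
```

Because only the dimnames matter, a plain matrix built this way should satisfy the contract without carrying any special class.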

For such objects the convenience accessor functions as.docword and as.worddoc can be used to get counts whichever way up you need them.

words returns the words and docs returns the document titles. wordmargin reminds you which margin contains the words. Assigning wordmargin flips the dimension names.
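The idea behind these accessors can be sketched in base R: the named dimnames record which margin holds the words, so reorienting the matrix is a conditional transpose. The functions word_margin and docs_as_rows are hypothetical stand-ins for this example and may differ in detail from wordmargin and as.docword.

```r
# Hypothetical base-R stand-ins; the package's own accessors may differ.
word_margin <- function(x) which(names(dimnames(x)) == "words")

docs_as_rows <- function(x) {
  # Transpose only when words currently occupy the rows.
  if (word_margin(x) == 1) t(x) else x
}

mat <- matrix(1:6, ncol = 2,
              dimnames = list(words = c("W1", "W2", "W3"),
                              docs  = c("D1", "D2")))

word_margin(mat)        # words are the rows here
dw <- docs_as_rows(mat)
word_margin(dw)         # after reorienting, words are the columns
dw["D2", ]              # counts for document D2
```

Since t() carries the names of the dimnames along with the labels, the orientation information stays attached to the matrix after transposition.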

To extract particular documents by name or index, use getdocs.

as.wfm attempts to convert things to word frequency matrices. This functionality is currently limited to objects on which as.matrix already works, and to TermDocumentMatrix and DocumentTermMatrix objects from the tm package.

Author

Will Lowe

Examples

mat <- matrix(1:6, ncol=2)
rownames(mat) <- c('W1','W2','W3')
colnames(mat) <- c('D1','D2')
m <- wfm(mat, word.margin=1)
getdocs(as.docword(m), 'D2')
#>     words
#> docs W1 W2 W3
#>   D2  4  5  6