Simulates data and returns parameter values under Wordfish model assumptions: counts are sampled as independent Poisson draws whose log expected values are linearly related to the document positions (by default, a regularly spaced lattice).
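The Poisson assumption can be illustrated with a minimal sketch. It assumes the parameterization log E[y_ij] = beta_j + theta_i * psi_j for word j in document i. The helper `sim_poisson_counts` is hypothetical and is not part of the package. It also omits the document-level offset that the full model would include, so document lengths are not fixed here.

```r
# Hypothetical helper illustrating the Poisson form of the model;
# not the package implementation. No document-level offset is included.
sim_poisson_counts <- function(theta, beta, psi) {
  # Rows are words, columns are documents:
  # log_mu[j, i] = beta[j] + psi[j] * theta[i]
  log_mu <- beta + outer(psi, theta)
  # One independent Poisson draw per cell
  counts <- rpois(length(log_mu), lambda = exp(log_mu))
  matrix(counts, nrow = length(beta), ncol = length(theta))
}

set.seed(1)
theta <- seq(0.1, 1, length.out = 10)  # 10 document positions
psi <- rep(c(1, -1), each = 10)        # 20 word slopes
beta <- rep(0, 20)                     # 20 word intercepts
Y <- sim_poisson_counts(theta, beta, psi)
dim(Y)  # 20 10
```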

sim.wordfish(
  docs = 10,
  vocab = 20,
  doclen = 500,
  dist = c("spaced", "normal"),
  scaled = TRUE
)

Arguments

docs

Number of ‘documents’ to generate

vocab

Number of ‘word’ types to generate

doclen

A scalar ‘document’ length, or a vector of lengths with one entry per document

dist

The distribution of ‘document’ positions: either "spaced" (the default) or "normal"

scaled

Whether the document positions should be rescaled to mean 0 and unit standard deviation
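A minimal sketch of how these arguments might be normalized before simulation. It assumes that a scalar `doclen` is recycled to every document and that `dist` is resolved with base R's `match.arg`, so the first listed choice, "spaced", becomes the default. `prepare_args` is a hypothetical helper, and the package's own code may differ.

```r
# Hypothetical sketch of argument handling; the package's code may differ.
prepare_args <- function(docs = 10, vocab = 20, doclen = 500,
                         dist = c("spaced", "normal")) {
  dist <- match.arg(dist)  # returns "spaced" when dist is not supplied
  if (length(doclen) == 1) {
    doclen <- rep(doclen, docs)  # recycle a scalar length to every document
  }
  stopifnot(length(doclen) == docs)  # one length per document
  list(docs = docs, vocab = vocab, doclen = doclen, dist = dist)
}

a <- prepare_args()
a$dist         # "spaced"
a$doclen[1:3]  # 500 500 500
```

Because `match.arg` also accepts partial matches, `dist = "norm"` would resolve to "normal".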

Value

Y

A sampled word-document count matrix

theta

The ‘document’ positions

doclen

The ‘document’ lengths

beta

‘Word’ intercepts

psi

‘Word’ slopes

Details

This function generates ‘docs’ document positions, either drawn from a Normal distribution (dist = "normal") or regularly spaced between 1/‘docs’ and 1 (dist = "spaced", the default).
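The two position schemes can be sketched as follows. The page does not give the Normal distribution's parameters, so this sketch assumes the standard Normal produced by `rnorm`'s defaults. `draw_positions` is a hypothetical helper.

```r
# Hypothetical sketch of the two position schemes; the standard Normal
# (rnorm defaults: mean 0, sd 1) is an assumption.
draw_positions <- function(docs, dist = c("spaced", "normal")) {
  dist <- match.arg(dist)
  theta <- if (dist == "normal") {
    rnorm(docs)                          # random Normal draws
  } else {
    seq(1 / docs, 1, length.out = docs)  # evenly spaced from 1/docs to 1
  }
  sort(theta)  # positions increase across documents
}

draw_positions(5)  # 0.2 0.4 0.6 0.8 1.0
```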

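Half of the ‘vocab’ word slopes (psi) are set to 1 and the rest to -1, and all word intercepts (beta) are 0. For each document, ‘doclen’ words are then sampled from a multinomial distribution with probabilities proportional to exp(beta + theta * psi). This is equivalent to the independent Poisson draws described above, conditioned on each document's total length.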
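A minimal sketch of the word parameters and the per-document multinomial draw, assuming word probabilities proportional to exp(beta + theta * psi). `sim_counts` is a hypothetical helper. When ‘vocab’ is odd, it assumes the extra word receives slope -1. Base R's `rmultinom` normalizes `prob` internally, so the unnormalized weights can be passed directly.

```r
# Hypothetical sketch; assumes multinomial probabilities proportional to
# exp(beta + theta * psi). The handling of odd vocab is an assumption.
sim_counts <- function(theta, vocab, doclen) {
  docs <- length(theta)
  if (length(doclen) == 1) doclen <- rep(doclen, docs)
  half <- floor(vocab / 2)
  psi <- c(rep(1, half), rep(-1, vocab - half))  # word slopes
  beta <- rep(0, vocab)                          # word intercepts
  Y <- matrix(0L, nrow = vocab, ncol = docs)     # words in rows, documents in columns
  for (i in seq_len(docs)) {
    p <- exp(beta + theta[i] * psi)              # unnormalized word weights
    Y[, i] <- rmultinom(1, size = doclen[i], prob = p)
  }
  list(Y = Y, beta = beta, psi = psi)
}

set.seed(7)
out <- sim_counts(seq(0.1, 1, length.out = 10), vocab = 20, doclen = 500)
colSums(out$Y)  # every column sums to 500
```

With positive positions, words with slope 1 become more frequent in higher-theta documents, and words with slope -1 become less frequent.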

Document positions (theta) are sorted in increasing order across documents. If ‘scaled’ is TRUE, they are rescaled to mean zero and unit standard deviation. This is most helpful when dist = "normal".
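The sorting and rescaling step might look like the sketch below. It assumes R's `sd`, which uses the n - 1 denominator. Whether the package uses that denominator or the population one is not stated. `scale_positions` is a hypothetical helper.

```r
# Hypothetical sketch: sort, then standardize with sd() (n - 1 denominator).
scale_positions <- function(theta) {
  theta <- sort(theta)                # increasing across documents
  (theta - mean(theta)) / sd(theta)   # mean 0, unit standard deviation
}

scale_positions(seq(0.2, 1, length.out = 5))  # approx. -1.26 -0.63 0.00 0.63 1.26
```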

Author

Will Lowe