A simple bootstrap for the WEFAT calculations. For each word in condition x_name (e.g. "Careers"), the statistic of interest is the difference between its cosine with the mean vector of condition a_name (e.g. "MaleAttributes") and its cosine with the mean vector of condition b_name (e.g. "FemaleAttributes").
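As an illustration of the statistic described above, here is a minimal sketch for a single target vector. It is not the package's internal code: the helpers cosine() and cos_diff() are hypothetical, and A and B are assumed to be matrices with one attribute word vector per row, built from made-up toy data.

```r
# Cosine similarity between two numeric vectors (hypothetical helper)
cosine <- function(u, v) sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))

# Difference-of-cosines statistic for one target word vector x (hypothetical helper):
# cos(x, mean of the A rows) - cos(x, mean of the B rows)
cos_diff <- function(x, A, B) {
  cosine(x, colMeans(A)) - cosine(x, colMeans(B))
}

# Toy 2-dimensional vectors, made up for illustration only
A <- rbind(c(1, 0), c(0.9, 0.1))   # stand-in for "MaleAttributes"
B <- rbind(c(0, 1), c(0.1, 0.9))   # stand-in for "FemaleAttributes"
x <- c(1, 0)                       # stand-in for one "Careers" word
cos_diff(x, A, B)                  # positive: x lies closer to the A mean
```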

wefat_boot(items, vectors, x_name, a_name, b_name, b = 300,
  se.calc = c("sd", "quantile"))

Arguments

items

information about the items, typically from cbn_get_items

vectors

a matrix of word vectors for the study

x_name

the name of the target item condition, e.g. "Careers" in WEFAT 1

a_name

the name of the first condition, e.g. "MaleAttributes" in WEFAT 1 and 2

b_name

the name of the second condition, e.g. "FemaleAttributes" in WEFAT 1 and 2

b

number of bootstrap samples. Defaults to 300.

se.calc

how to compute the lower and upper bounds of an approximate 95% interval for the difference of cosines statistic: "sd" (default) or "quantile".

Value

a data frame whose first column is x_name, second column is the difference of cosines statistic, and third and fourth columns are the lower and upper bounds of an approximate 95% interval computed from the bootstrapped statistic. If se.calc is "quantile", a fifth column contains the median value of the statistic across bootstrap samples. The data frame is sorted by the second column.

Details

Uncertainty is quantified by bootstrapping each set of attribute vectors. That is, in each of the b bootstrap samples, the vectors in the a_name condition and the vectors in the b_name condition are resampled independently with replacement, and for each target word the difference between its cosine with the mean of the a_name vectors and its cosine with the mean of the b_name vectors is recorded. The bootstrap sampling distribution of this difference of cosines statistic is summarized in the output by an approximate 95% interval based on the standard deviation of the statistic across bootstrap samples if se.calc is "sd", or by the 0.025 and 0.975 quantiles of the bootstrap sampling distribution if se.calc is "quantile".
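The resampling scheme described above can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the package's implementation: boot_cos_diff(), cosine() and cos_diff() are hypothetical helpers, the rows of A and B stand in for the a_name and b_name vectors, and the data are random toy values. Only the attribute sets are resampled; the target vector x stays fixed.

```r
cosine <- function(u, v) sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
cos_diff <- function(x, A, B) cosine(x, colMeans(A)) - cosine(x, colMeans(B))

# Bootstrap the statistic for one target vector x (hypothetical helper).
# In each replicate, the rows of A and the rows of B are resampled
# independently and with replacement.
boot_cos_diff <- function(x, A, B, b = 300) {
  vapply(seq_len(b), function(i) {
    A_star <- A[sample.int(nrow(A), replace = TRUE), , drop = FALSE]
    B_star <- B[sample.int(nrow(B), replace = TRUE), , drop = FALSE]
    cos_diff(x, A_star, B_star)
  }, numeric(1))
}

set.seed(1)
A <- matrix(rnorm(8 * 5, mean = 1), nrow = 8)   # 8 toy "a_name" vectors, 5 dims
B <- matrix(rnorm(8 * 5, mean = -1), nrow = 8)  # 8 toy "b_name" vectors, 5 dims
x <- rnorm(5)                                    # toy target word vector
boots <- boot_cos_diff(x, A, B, b = 300)
summary(boots)
```

Each element of boots is one bootstrap replicate of the statistic; because each cosine lies in [-1, 1], every replicate lies in [-2, 2].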

If se.calc is "quantile", the returned data frame has an extra column containing the median of the statistic across bootstrap samples. This should be close to the original statistic.
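The two summaries can be sketched for a single word as below. Two points are assumptions, not something this page states: the summarize_boot() helper is hypothetical, and the "sd" interval is taken here as the statistic plus or minus 1.96 bootstrap standard deviations. The page only says that the "sd" interval is an approximate 95% interval based on the standard deviation. The replicates are simulated stand-ins.

```r
# Summarize bootstrap replicates `boots` of the statistic `stat` (hypothetical helper).
# "sd":       stat +/- 1.96 * sd(boots)   (the 1.96 multiplier is an assumption)
# "quantile": 0.025 and 0.975 quantiles of boots, plus their median
summarize_boot <- function(stat, boots, se.calc = c("sd", "quantile")) {
  se.calc <- match.arg(se.calc)
  if (se.calc == "sd") {
    s <- sd(boots)
    c(stat = stat, lower = stat - 1.96 * s, upper = stat + 1.96 * s)
  } else {
    q <- quantile(boots, probs = c(0.025, 0.975), names = FALSE)
    c(stat = stat, lower = q[1], upper = q[2], median = median(boots))
  }
}

set.seed(2)
boots <- rnorm(300, mean = 0.2, sd = 0.05)  # stand-in bootstrap replicates
summarize_boot(0.2, boots, "sd")
summarize_boot(0.2, boots, "quantile")
```

The "sd" interval is symmetric around the statistic by construction. The "quantile" interval follows the shape of the bootstrap distribution, and its median gives a quick check against the original statistic.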

The output of this function is sorted by the value of the difference of cosines statistic. This direction is arbitrary. To reverse the ordering, swap the values of a_name and b_name when calling the function, which flips the sign of the statistic.
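A small check of why swapping the two conditions reverses the ordering: exchanging A and B negates the statistic for every target word, so sorting by it gives the opposite order. This sketch uses hypothetical helpers and random toy vectors, not the package's data.

```r
cosine <- function(u, v) sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
cos_diff <- function(x, A, B) cosine(x, colMeans(A)) - cosine(x, colMeans(B))

set.seed(3)
A <- matrix(rnorm(6 * 4), nrow = 6)   # toy a_name vectors
B <- matrix(rnorm(6 * 4), nrow = 6)   # toy b_name vectors
X <- matrix(rnorm(5 * 4), nrow = 5,   # toy target vectors, one per row
            dimnames = list(paste0("word", 1:5), NULL))

ab <- apply(X, 1, cos_diff, A = A, B = B)  # a_name first
ba <- apply(X, 1, cos_diff, A = B, B = A)  # conditions swapped

all.equal(ba, -ab)                    # swapping negates the statistic
names(sort(ab)); names(sort(ba))      # so the sorted order is reversed
```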

Note that this is not the statistic reported in the original paper.

Examples

its <- cbn_get_items("WEFAT", 1)
its_vecs <- cbn_get_item_vectors("WEFAT", 1)
res <- wefat_boot(its, its_vecs, x_name = "Careers",
                  a_name = "MaleAttributes", b_name = "FemaleAttributes",
                  se.calc = "quantile")