wefat_boot.Rd
A simple bootstrap for the WEFAT calculations. The statistic of interest, computed for each word in condition x_name (e.g. "Careers"), is the cosine between that word and the mean vector of condition a_name (e.g. "MaleAttributes") minus the cosine between the word and the mean vector of condition b_name (e.g. "FemaleAttributes").
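The statistic for a single target word can be sketched in a few lines of R. The helpers cosine and diff_cos are hypothetical names introduced for illustration, not functions exported by the package. The sketch assumes that each condition's attribute vectors are stored as the rows of a matrix, and it uses small invented 2-d vectors rather than WEFAT data.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine <- function(u, v) sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))

# Hypothetical helper: difference-of-cosines statistic for one target vector x,
# i.e. cos(x, mean of rows of A) - cos(x, mean of rows of B).
diff_cos <- function(x, A, B) {
  cosine(x, colMeans(A)) - cosine(x, colMeans(B))
}

# Toy 2-d example with invented vectors (not WEFAT data).
A <- rbind(c(1, 0), c(1, 0.2))   # stand-in for "MaleAttributes" vectors
B <- rbind(c(0, 1), c(0.2, 1))   # stand-in for "FemaleAttributes" vectors
x <- c(1, 0)                     # stand-in for one "Careers" word vector
diff_cos(x, A, B)                # approximately 0.8955 (= 0.9 / sqrt(1.01))
```

Swapping A and B in this sketch changes only the sign of the result.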
wefat_boot(items, vectors, x_name, a_name, b_name, b = 300, se.calc = c("sd", "quantile"))
Argument | Description |
---|---|
items | information about the items, typically from |
vectors | a matrix of word vectors for the study |
x_name | the name of the target item condition, e.g. "Careers" in WEFAT 1 |
a_name | the name of the first attribute condition, e.g. "MaleAttributes" in WEFAT 1 and 2 |
b_name | the name of the second attribute condition, e.g. "FemaleAttributes" in WEFAT 1 and 2 |
b | number of bootstrap samples. Defaults to 300. |
se.calc | how to compute the lower and upper bounds of an approximate 95% interval for the difference of cosines statistic: "sd" (default) or "quantile". |
A data frame with first column x_name, second column the difference of cosines statistic, and third and fourth columns the lower and upper bounds of an approximate 95% interval computed from the bootstrapped statistic. If se.calc is "quantile", a fifth column holds the median value of the statistic across bootstrap samples. The data frame is sorted by the second column.
Uncertainty is quantified by bootstrapping the two sets of attribute vectors. That is, in each of the b bootstrap samples, the vectors in the a_name condition and the vectors in the b_name condition are resampled (independently) with replacement, and the difference between the cosine of a target word with the mean of the a_name vectors and its cosine with the mean of the b_name vectors is recorded. The bootstrap sampling distribution of this difference of cosines statistic is summarized in the output by an approximate 95% interval, based on the standard deviation of the statistic across bootstrap samples if se.calc is "sd", or given by the 0.025 and 0.975 quantiles of the bootstrap sampling distribution if se.calc is "quantile".
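The resampling step described above can be sketched as follows. boot_diff_cos and cosine are hypothetical helpers written for illustration, not the package's internal code. The sketch assumes each condition's vectors are the rows of a matrix; in each replicate it resamples the rows of A and of B independently with replacement, recomputes the two mean vectors, and records the difference of cosines for a fixed target vector x.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
cosine <- function(u, v) sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))

# Hypothetical helper: b bootstrap replicates of the difference of cosines.
# Rows of A and rows of B are resampled independently, with replacement.
boot_diff_cos <- function(x, A, B, b = 300) {
  vapply(seq_len(b), function(i) {
    A_star <- A[sample.int(nrow(A), replace = TRUE), , drop = FALSE]
    B_star <- B[sample.int(nrow(B), replace = TRUE), , drop = FALSE]
    cosine(x, colMeans(A_star)) - cosine(x, colMeans(B_star))
  }, numeric(1))
}

set.seed(1)
A <- matrix(rnorm(8 * 5), nrow = 8)  # 8 invented "a_name" vectors in 5 dimensions
B <- matrix(rnorm(8 * 5), nrow = 8)  # 8 invented "b_name" vectors
x <- rnorm(5)                        # one invented target word vector
boot_stats <- boot_diff_cos(x, A, B, b = 300)
length(boot_stats)                   # 300 replicates
```

Only the attribute sets are resampled here. The target vector is held fixed, which matches the description above.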
If se.calc is "quantile", the returned data frame has an extra column containing the median of the statistic across bootstrap samples. This should not be far from the original statistic.
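The two summaries can be sketched as a small function over a vector of bootstrap replicates. summarize_boot is a hypothetical name. The "quantile" branch follows the description above: 0.025 and 0.975 quantiles plus the bootstrap median. The "sd" branch is an assumption. This page does not give the exact formula, so the sketch uses a common normal-approximation interval: the original statistic plus or minus 1.96 bootstrap standard deviations.

```r
# Hypothetical helper summarizing bootstrap replicates of the statistic.
summarize_boot <- function(stat, boot_stats, se.calc = c("sd", "quantile")) {
  se.calc <- match.arg(se.calc)  # defaults to "sd", the first choice
  if (se.calc == "sd") {
    # ASSUMPTION: normal-approximation interval, stat +/- 1.96 bootstrap SDs.
    s <- sd(boot_stats)
    c(stat = stat, lower = stat - 1.96 * s, upper = stat + 1.96 * s)
  } else {
    # Percentile interval plus the bootstrap median (the extra column).
    q <- quantile(boot_stats, c(0.025, 0.975), names = FALSE)
    c(stat = stat, lower = q[1], upper = q[2], median = median(boot_stats))
  }
}

reps <- c(0.1, 0.2, 0.3, 0.4, 0.5)      # invented replicates
summarize_boot(0.3, reps, "sd")
summarize_boot(0.3, reps, "quantile")   # lower 0.11, upper 0.49, median 0.3
```

The quantile values shown come from R's default quantile type 7.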
The output of this function is sorted by the value of the difference of cosines statistic. This direction is arbitrary. To reverse the ordering, swap the values of a_name and b_name when calling the function, which flips the sign of the statistic.
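The effect of swapping the two attribute conditions can be checked with a small sketch. rank_targets, diff_cos and cosine are hypothetical helpers, and the target and attribute vectors are invented. The sketch sorts in increasing order, but that direction is an assumption because this page says only that the output is sorted by the statistic.

```r
cosine <- function(u, v) sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
diff_cos <- function(x, A, B) cosine(x, colMeans(A)) - cosine(x, colMeans(B))

# Invented target vectors (one per row) and attribute sets.
targets <- rbind(w1 = c(1, 0), w2 = c(0, 1), w3 = c(1, 1))
A <- rbind(c(1, 0.1), c(0.9, 0))
B <- rbind(c(0.1, 1), c(0, 0.9))

# Hypothetical helper: one row per target, sorted (ascending) by the statistic.
rank_targets <- function(targets, A, B) {
  stat <- apply(targets, 1, diff_cos, A = A, B = B)
  out <- data.frame(word = rownames(targets), diff = stat,
                    row.names = NULL, stringsAsFactors = FALSE)
  out[order(out$diff), ]
}

ab <- rank_targets(targets, A, B)  # words in order w2, w3, w1
ba <- rank_targets(targets, B, A)  # swapped conditions: order w1, w3, w2
```

Swapping the conditions negates every statistic. Unless there are ties, the sorted order is therefore exactly reversed.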
Note that this is not the statistic reported in the original paper.
# Items and word vectors for WEFAT 1
its <- cbn_get_items("WEFAT", 1)
its_vecs <- cbn_get_item_vectors("WEFAT", 1)
# Bootstrap the difference of cosines with percentile intervals
res <- wefat_boot(its, its_vecs,
                  x_name = "Careers",
                  a_name = "MaleAttributes",
                  b_name = "FemaleAttributes",
                  se.calc = "quantile")