weat_boot.Rd
A simple bootstrap for the WEAT calculations. The statistic of interest is an average difference of average differences.
weat_boot(items, vectors, x_name, y_name, a_name, b_name, b = 300, se.calc = c("sd", "quantile"))
items | information about the items, typically from |
---|---|
vectors | a matrix of word vectors for all the study items |
x_name | the name of the first target item condition, e.g. "Flowers" in WEAT 1 |
y_name | the name of the second target item condition, e.g. "Insects" in WEAT 1 |
a_name | the name of the first condition, e.g. "Pleasant" in WEAT 1 |
b_name | the name of the second condition, e.g. "Unpleasant" in WEAT 1 |
b | number of bootstrap samples. Defaults to 300. |
se.calc | how to compute lower and upper bounds on an approximate 95% interval for the difference of differences of cosines statistic. "sd" (default) or "quantile". |
a data frame with first column the difference of differences of cosines statistic, the second and third columns the lower and upper bounds of an approximate 95% interval from the bootstrapped statistic. If se.calc is "quantile", the fourth column is the median value of the statistic across bootstrap samples.
Schematically, the statistic is the average value of
(cosine(x names, a words) - cosine(x names, b words)) - (cosine(y names, a words) - cosine(y names, b words))
If a denotes a set of 'Pleasant' words, b a set of 'Unpleasant' words, x the names of 'Flowers', and y the names of 'Insects' (as in WEAT 1), then the statistic takes positive values when flowers are more pleasant than insects. That is, when the degree to which flower names are more similar to pleasant than to unpleasant words is stronger than the degree to which insect names are more similar to pleasant than to unpleasant words.
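The schematic formula can be made concrete with a small base R sketch. It assumes that "average" means the mean over all pairwise cosines between two sets of vectors; the helper names `cosine_matrix` and `weat_stat_sketch` and the toy 2-d vectors are made up for illustration and are not part of the package.

```r
# Hypothetical helper: cosine similarity between every row of m1
# and every row of m2 (rows are word vectors).
cosine_matrix <- function(m1, m2) {
  n1 <- sqrt(rowSums(m1^2))
  n2 <- sqrt(rowSums(m2^2))
  (m1 %*% t(m2)) / outer(n1, n2)
}

# Hypothetical sketch of the statistic: a difference of differences
# of mean cosines (assumed averaging scheme, see lead-in).
weat_stat_sketch <- function(x, y, a, b) {
  (mean(cosine_matrix(x, a)) - mean(cosine_matrix(x, b))) -
    (mean(cosine_matrix(y, a)) - mean(cosine_matrix(y, b)))
}

# Toy 2-d "word vectors", invented for illustration (not real embeddings).
pleasant   <- rbind(c(1.0, 0.0), c(0.9, 0.1))
unpleasant <- rbind(c(0.0, 1.0), c(0.1, 0.9))
flowers    <- rbind(c(0.8, 0.2), c(1.0, 0.1))
insects    <- rbind(c(0.2, 0.8), c(0.1, 1.0))

stat <- weat_stat_sketch(flowers, insects, pleasant, unpleasant)
stat > 0  # flowers sit closer to the pleasant vectors in this toy setup
```

In this toy setup the flower vectors point in roughly the same direction as the pleasant vectors, so the first bracket is positive and the second negative, and the statistic comes out positive.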
Uncertainty is quantified by bootstrapping each set of item vectors. That is, in each of the b bootstrap samples, the vectors in each condition (a_name, b_name, x_name and y_name) are separately resampled with replacement, and the statistic is computed. The bootstrap sampling distribution of this statistic is summarized in the output by an approximate 95% interval, based on the standard deviation of the statistic across bootstrap samples if se.calc is "sd", or given by the 0.025 and 0.975 quantiles of the bootstrap sampling distribution if se.calc is "quantile".
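The resampling scheme described above might be sketched in base R as follows. This is not the package's implementation: the function names, the `n_boot` argument (standing in for the number of bootstrap samples), the toy data, and in particular the "estimate plus or minus 1.96 standard deviations" form of the "sd" interval are all assumptions made for illustration.

```r
set.seed(1)  # only so the illustration is reproducible

# Hypothetical helpers (same assumed definition of the statistic as a
# difference of differences of mean pairwise cosines).
cosine_matrix <- function(m1, m2) {
  (m1 %*% t(m2)) / outer(sqrt(rowSums(m1^2)), sqrt(rowSums(m2^2)))
}
stat_sketch <- function(x, y, a, b) {
  (mean(cosine_matrix(x, a)) - mean(cosine_matrix(x, b))) -
    (mean(cosine_matrix(y, a)) - mean(cosine_matrix(y, b)))
}

# Resample the rows (item vectors) of one condition with replacement.
resample_rows <- function(m) m[sample(nrow(m), replace = TRUE), , drop = FALSE]

boot_sketch <- function(x, y, a, b, n_boot = 300,
                        se.calc = c("sd", "quantile")) {
  se.calc <- match.arg(se.calc)
  est <- stat_sketch(x, y, a, b)
  # Each replicate resamples all four conditions separately.
  reps <- replicate(n_boot, stat_sketch(resample_rows(x), resample_rows(y),
                                        resample_rows(a), resample_rows(b)))
  if (se.calc == "sd") {
    # Assumption: normal-approximation interval around the estimate.
    half <- 1.96 * sd(reps)
    data.frame(diff = est, lwr = est - half, upr = est + half)
  } else {
    qs <- quantile(reps, c(0.025, 0.975), names = FALSE)
    data.frame(diff = est, lwr = qs[1], upr = qs[2], median = median(reps))
  }
}

# Toy data: 8 random 5-dimensional vectors per condition (not real embeddings).
toy <- function(n = 8, d = 5) matrix(rnorm(n * d), nrow = n)
x <- toy(); y <- toy(); a <- toy(); b <- toy()
boot_sketch(x, y, a, b, se.calc = "quantile")
```

The quantile variant makes no symmetry assumption about the bootstrap distribution, which is one reason to prefer it when the number of items per condition is small.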
If se.calc is "quantile" the data frame returned has an extra column containing the median of the statistic in the bootstrap samples. This should not be too far from the original statistic.
The sign of the statistic is arbitrary. If you wish to reverse the ordering, swap the values of a_name and b_name, or of x_name and y_name, when calling it.
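The effect of swapping can be checked directly from the schematic definition. Writing m(s, t) for the average cosine between the vectors of sets s and t (notation introduced here for illustration, not taken from the package):

```latex
\[
S(x, y, a, b) = \bigl[m(x,a) - m(x,b)\bigr] - \bigl[m(y,a) - m(y,b)\bigr]
\]
\[
S(y, x, a, b) = -S(x, y, a, b), \qquad
S(x, y, b, a) = -S(x, y, a, b), \qquad
S(y, x, b, a) = S(x, y, a, b)
\]
```

Swapping one pair reverses the sign, while swapping both pairs at once restores the original value.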
Note that this is not the statistic reported in the original paper. This function bootstraps within each target category (x and y) and within each attribute category (a and b).
its <- cbn_get_items("WEAT", 1)
its_vecs <- cbn_get_item_vectors("WEAT", 1)
res <- weat_boot(its, its_vecs,
                 x_name = "Flowers", y_name = "Insects",
                 a_name = "Pleasant", b_name = "Unpleasant",
                 se.calc = "quantile")
res
#>        diff        lwr       upr    median
#> 1 0.1547512 0.09626188 0.2101164 0.1502303