A simple bootstrap for the WEAT calculations. The statistic of interest is an average difference of average differences.

weat_boot(items, vectors, x_name, y_name, a_name, b_name, b = 300,
  se.calc = c("sd", "quantile"))

Arguments

items

information about the items, typically from cbn_get_items

vectors

a matrix of word vectors for all the study items

x_name

the name of the first target item condition, e.g. "Flowers" in WEAT 1

y_name

the name of the second target item condition, e.g. "Insects" in WEAT 1

a_name

the name of the first attribute condition, e.g. "Pleasant" in WEAT 1

b_name

the name of the second attribute condition, e.g. "Unpleasant" in WEAT 1

b

number of bootstrap samples. Defaults to 300.

se.calc

how to compute the lower and upper bounds of an approximate 95% interval for the difference of differences of cosines statistic: "sd" (default) or "quantile".

Value

a data frame whose first column (diff) is the difference of differences of cosines statistic, and whose second and third columns (lwr, upr) are the lower and upper bounds of an approximate 95% interval from the bootstrapped statistic. If se.calc is "quantile", a fourth column (median) holds the median value of the statistic across bootstrap samples.

Details

Schematically, the statistic is the average value of

(cosine(x names, a words) - cosine(x names, b words)) - (cosine(y names, a words) - cosine(y names, b words))

If a denotes a set of 'Pleasant' words and b a set of 'Unpleasant' words, x are names of 'Flowers', and y are names of 'Insects' (as in WEAT 1), then the statistic takes positive values when flowers are more pleasant than insects. That is, it is positive when the degree to which flower names are more similar to pleasant than to unpleasant words exceeds the degree to which insect names are more similar to pleasant than to unpleasant words.
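As a rough illustration of the schematic statistic, here is a minimal base R sketch on made-up two-dimensional vectors. It is not the package's implementation: cosine_matrix and diff_of_diffs are hypothetical helper names, the toy matrices are invented for the example, and averaging over all item pairs is an assumption about how the "average value" is taken.

```r
# Cosine similarity between every row of m1 and every row of m2.
cosine_matrix <- function(m1, m2) {
  n1 <- m1 / sqrt(rowSums(m1^2))  # scale each row to unit length
  n2 <- m2 / sqrt(rowSums(m2^2))
  n1 %*% t(n2)
}

# Hypothetical helper: difference of differences of mean cosines,
# (cos(x, a) - cos(x, b)) - (cos(y, a) - cos(y, b)), averaged over all pairs.
diff_of_diffs <- function(x, y, a, b) {
  (mean(cosine_matrix(x, a)) - mean(cosine_matrix(x, b))) -
    (mean(cosine_matrix(y, a)) - mean(cosine_matrix(y, b)))
}

# Toy vectors (assumed data): x items point roughly along a, y items along b.
a <- rbind(c(1.0, 0.0), c(0.9, 0.1))
b <- rbind(c(0.0, 1.0), c(0.1, 0.9))
x <- rbind(c(1.0, 0.2), c(0.8, 0.1))
y <- rbind(c(0.2, 1.0), c(0.1, 0.8))

stat <- diff_of_diffs(x, y, a, b)
stat > 0                        # x is closer to a than y is, so the sign is positive
diff_of_diffs(y, x, a, b)       # swapping the targets flips the sign
```

In this toy setup the statistic is positive because the x items lie closer to the a items than the y items do; exchanging x and y (or a and b) negates it.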

Uncertainty is quantified by bootstrapping each set of item vectors. That is, in each of the b bootstrap samples, the vectors in each condition (a_name, b_name, x_name and y_name) are separately resampled with replacement, and the statistic is recomputed. The bootstrap sampling distribution of this statistic is summarized in the output by an approximate 95% interval, based on the standard deviation of the statistic across bootstrap samples if se.calc is "sd", or given by the 0.025 and 0.975 quantiles of the bootstrap sampling distribution if se.calc is "quantile".
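The resampling scheme described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind weat_boot: the helper names (cosine_matrix, diff_of_diffs, make_set, resample_rows) and the simulated vectors are hypothetical, n_boot stands in for the b argument to avoid clashing with the attribute matrix b, and the "sd"-style interval is assumed here to be the observed statistic plus or minus about 1.96 bootstrap standard deviations.

```r
set.seed(1)

# Cosine similarity between every row of m1 and every row of m2.
cosine_matrix <- function(m1, m2) {
  n1 <- m1 / sqrt(rowSums(m1^2))
  n2 <- m2 / sqrt(rowSums(m2^2))
  n1 %*% t(n2)
}

# Hypothetical helper: difference of differences of mean cosines.
diff_of_diffs <- function(x, y, a, b) {
  (mean(cosine_matrix(x, a)) - mean(cosine_matrix(x, b))) -
    (mean(cosine_matrix(y, a)) - mean(cosine_matrix(y, b)))
}

# Simulated item vectors (assumed data): n noisy copies of a centre vector.
make_set <- function(centre, n = 8) {
  noise <- matrix(rnorm(n * length(centre), sd = 0.3), nrow = n)
  sweep(noise, 2, centre, "+")
}
a <- make_set(c(1, 0, 0, 0, 0))
b <- make_set(c(0, 1, 0, 0, 0))
x <- make_set(c(1, 0, 0, 0, 0))
y <- make_set(c(0, 1, 0, 0, 0))

# Resample the rows (items) of one condition with replacement.
resample_rows <- function(m) m[sample(nrow(m), replace = TRUE), , drop = FALSE]

# Each bootstrap sample resamples all four conditions separately.
n_boot <- 300
boot_stats <- replicate(n_boot, diff_of_diffs(
  resample_rows(x), resample_rows(y), resample_rows(a), resample_rows(b)
))

observed <- diff_of_diffs(x, y, a, b)

# Assumed "sd"-style interval: normal approximation around the observed value.
sd_interval <- observed + c(-1, 1) * qnorm(0.975) * sd(boot_stats)

# "quantile"-style interval and the bootstrap median.
quantile_interval <- quantile(boot_stats, c(0.025, 0.975))
boot_median <- median(boot_stats)
```

The two intervals should be broadly similar when the bootstrap distribution is roughly symmetric; the quantile version makes no normality assumption but needs enough bootstrap samples for the tail quantiles to be stable.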

If se.calc is "quantile" the data frame returned has an extra column containing the median of the statistic in the bootstrap samples. This should not be too far from the original statistic.

The sign of the statistic is arbitrary. To reverse it, swap the values of a_name and b_name, or of x_name and y_name, when calling the function.

Note that this is not the statistic reported in the original paper. This function bootstraps within each target category (x and y) and within each attribute category (a and b).

Examples

its <- cbn_get_items("WEAT", 1)
its_vecs <- cbn_get_item_vectors("WEAT", 1)
res <- weat_boot(its, its_vecs,
                 x_name = "Flowers", y_name = "Insects",
                 a_name = "Pleasant", b_name = "Unpleasant",
                 se.calc = "quantile")
res
#>        diff        lwr       upr    median
#> 1 0.1547512 0.09626188 0.2101164 0.1502303