WEAT via simple item bootstrap

A simple bootstrap for the WEAT calculations. The statistic of interest is an average difference of average differences.

weat_boot(items, vectors, x_name, y_name, a_name, b_name, b = 300,
  se.calc = c("sd", "quantile"))

Arguments

items	information about the items, typically from `cbn_get_items`
vectors	a matrix of word vectors for all the study items
x_name	the name of the first target item condition, e.g. "Flowers" in WEAT 1
y_name	the name of the second target item condition, e.g. "Insects" in WEAT 1
a_name	the name of the first attribute condition, e.g. "Pleasant" in WEAT 1
b_name	the name of the second attribute condition, e.g. "Unpleasant" in WEAT 1
b	number of bootstrap samples. Defaults to 300.
se.calc	how to compute the lower and upper bounds of an approximate 95% interval for the difference of differences of cosines statistic: "sd" (default) or "quantile".

Value

A data frame whose first column is the difference of differences of cosines statistic and whose second and third columns are the lower and upper bounds of an approximate 95% interval from the bootstrapped statistic. If se.calc is "quantile", a fourth column holds the median value of the statistic across bootstrap samples.

Details

Schematically, the statistic is the average value of

(cosine(x names, a words) - cosine(x names, b words)) - (cosine(y names, a words) - cosine(y names, b words))

If a denotes a set of 'Pleasant' words and b a set of 'Unpleasant' words, x are names of 'Flowers', and y are names of 'Insects' (as in WEAT 1), then the statistic takes positive values when flowers are more pleasant than insects. That is, it is positive when flower names are more similar to pleasant than to unpleasant words by a wider margin than insect names are.
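One way to read the schematic formula is as a difference between two per-condition averages. The sketch below is an illustration on toy two-dimensional vectors, not the package's own code: the helpers `cosine_matrix` and `dd_stat` are hypothetical names, and the mean-of-means averaging is an assumption about how the "average difference of average differences" is formed.

```r
# Cosine similarity between every row of m1 and every row of m2.
cosine_matrix <- function(m1, m2) {
  n1 <- m1 / sqrt(rowSums(m1^2))  # scale each row to unit length
  n2 <- m2 / sqrt(rowSums(m2^2))
  n1 %*% t(n2)
}

# Hypothetical helper (assumed averaging scheme): for each target item take
# (mean cosine to a) - (mean cosine to b), average over the items in the
# condition, then difference the x and y conditions.
dd_stat <- function(x, y, a, b) {
  assoc <- function(targets) {
    mean(rowMeans(cosine_matrix(targets, a)) -
           rowMeans(cosine_matrix(targets, b)))
  }
  assoc(x) - assoc(y)
}

# Toy vectors, not real embeddings: x and a lie near the first axis,
# y and b near the second.
x <- rbind(c(1.0, 0.1), c(0.9, 0.2))    # stands in for "Flowers"
y <- rbind(c(0.1, 1.0), c(0.2, 0.9))    # stands in for "Insects"
a <- rbind(c(1.0, 0.0), c(0.95, 0.05))  # stands in for "Pleasant"
b <- rbind(c(0.0, 1.0), c(0.05, 0.95))  # stands in for "Unpleasant"

stat_xy <- dd_stat(x, y, a, b)  # positive: x is closer to a than y is
stat_yx <- dd_stat(y, x, a, b)  # same magnitude, opposite sign
```

Swapping the two target conditions (or the two attribute conditions) only flips the sign of the result, which is why the direction of the statistic is a matter of convention.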

Uncertainty is quantified by bootstrapping each set of item vectors. That is, in each of the b bootstrap samples, the vectors in each condition (a_name, b_name, x_name and y_name) are separately resampled with replacement, and the statistic is recomputed. The bootstrap sampling distribution of this statistic is summarized in the output by an approximate 95% interval based on the standard deviation of the statistic across bootstrap samples if se.calc is "sd", or by the 0.025 and 0.975 quantiles of the bootstrap sampling distribution if se.calc is "quantile".
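The resampling scheme can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions rather than the implementation of weat_boot: `boot_dd`, `dd_stat`, `cosine_matrix`, `make_set` and the argument `n_boot` are hypothetical names, the data are random toy vectors, and the "sd" interval is assumed to be the statistic plus or minus 1.96 bootstrap standard deviations, which may differ from the multiplier the package uses.

```r
# Hypothetical helpers: row-wise cosine matrix and the mean-of-means statistic.
cosine_matrix <- function(m1, m2) {
  (m1 / sqrt(rowSums(m1^2))) %*% t(m2 / sqrt(rowSums(m2^2)))
}
dd_stat <- function(x, y, a, b) {
  assoc <- function(tg) {
    mean(rowMeans(cosine_matrix(tg, a)) - rowMeans(cosine_matrix(tg, b)))
  }
  assoc(x) - assoc(y)
}

# Bootstrap sketch: resample rows within each of the four conditions
# separately, recompute the statistic, then summarize the draws.
boot_dd <- function(x, y, a, b, n_boot = 300, se.calc = c("sd", "quantile")) {
  se.calc <- match.arg(se.calc)
  resample <- function(m) m[sample(nrow(m), replace = TRUE), , drop = FALSE]
  est <- dd_stat(x, y, a, b)
  boots <- replicate(
    n_boot,
    dd_stat(resample(x), resample(y), resample(a), resample(b))
  )
  if (se.calc == "sd") {
    half <- 1.96 * sd(boots)  # assumed multiplier for a rough 95% interval
    data.frame(diff = est, lwr = est - half, upr = est + half)
  } else {
    qs <- quantile(boots, probs = c(0.025, 0.975), names = FALSE)
    data.frame(diff = est, lwr = qs[1], upr = qs[2], median = median(boots))
  }
}

# Toy data: n noisy 5-d vectors scattered around a given centre.
set.seed(1)
make_set <- function(n, centre) {
  matrix(rnorm(n * 5, sd = 0.5), n, 5) + rep(centre, each = n)
}
x <- make_set(25, c(1, 0, 0, 0, 0))
a <- make_set(25, c(1, 0, 0, 0, 0))
y <- make_set(25, c(0, 1, 0, 0, 0))
b <- make_set(25, c(0, 1, 0, 0, 0))

res_sd <- boot_dd(x, y, a, b, se.calc = "sd")
res_q  <- boot_dd(x, y, a, b, se.calc = "quantile")
```

The sketch uses `n_boot` for the number of bootstrap samples only to avoid a clash with the attribute set `b`; in weat_boot itself that count is the argument b.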

If se.calc is "quantile" the data frame returned has an extra column containing the median of the statistic in the bootstrap samples. This should not be too far from the original statistic.

The sign of the statistic is arbitrary. To reverse it, swap the values of a_name and b_name, or of x_name and y_name, when calling the function.

Note that this is not the statistic reported in the original paper. This function bootstraps within each target category (x and y) and within each attribute category (a and b).

Examples

its <- cbn_get_items("WEAT", 1)
its_vecs <- cbn_get_item_vectors("WEAT", 1)
res <- weat_boot(its, its_vecs,
                 x_name = "Flowers", y_name = "Insects",
                 a_name = "Pleasant", b_name = "Unpleasant",
                 se.calc = "quantile")
res
#>        diff        lwr       upr    median
#> 1 0.1547512 0.09626188 0.2101164 0.1502303
