This data set contains gender information only for names used in WEFAT 2. It is a slightly normalized version of the original Dataverse materials.

cbn_gender_name_stats_census1990

Format

An object of class data.frame with 50 rows and 5 columns.

Details

The columns are name; gender.score, a numerical score whose derivation is undocumented, where -1 means female, 0 means unisex, and 1 means male; and percentage.in.population, percentage.in.male.population, and percentage.in.female.population. The last three appear to measure the prevalence of the name in the US population and in its male and female subpopulations, respectively.

The original materials are a tab-separated file located at system.file("extdata", "censusNames1990.tsv", package = "cbn").
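The raw file can be inspected directly. The following is a minimal sketch, not part of the package: read_census_names is a hypothetical helper, and it assumes the file has a header row, which this page does not confirm.

```r
# Hypothetical helper (not provided by cbn): read the raw census names file.
read_census_names <- function(path = system.file("extdata",
                                                 "censusNames1990.tsv",
                                                 package = "cbn")) {
  # system.file() returns "" when the package or file is not found.
  if (!nzchar(path) || !file.exists(path)) {
    return(NULL)
  }
  # Assumption: the first row holds column names.
  read.delim(path, header = TRUE, stringsAsFactors = FALSE)
}

raw <- read_census_names()
if (!is.null(raw)) str(raw)
```

If the first row turns out to be data rather than column names, pass header = FALSE to read.delim instead.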

Presumably applying Bayes' theorem, together with the population gender balance, would recover the quantity of substantive interest: P(gender | name). This has not been done.
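As a sketch of what that calculation might look like, assume (this is not confirmed by the original materials) that percentage.in.male.population / 100 estimates P(name | male) and percentage.in.female.population / 100 estimates P(name | female). With P(male) and P(female) = 1 - P(male) taken from an external estimate of the population gender balance, Bayes' theorem gives:

```latex
P(\text{male} \mid \text{name})
  = \frac{P(\text{name} \mid \text{male})\, P(\text{male})}
         {P(\text{name} \mid \text{male})\, P(\text{male})
          + P(\text{name} \mid \text{female})\, P(\text{female})}
```

P(female | name) is then 1 - P(male | name). The result is only as good as the assumed interpretation of the percentage columns.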