The “Small World of Words” English word association norms for over 12,000 cue words

Abstract:

Word associations have been used widely in psychology, but the validity of their application strongly depends on the number of cues included in the study and the extent to which they probe all associations known by an individual. In this work, we address both issues by introducing a new English word association dataset. We describe the collection of word associations for over 12,000 cue words, currently the largest such English-language resource in the world. Our procedure allowed subjects to provide multiple responses for each cue, which permits us to measure weak associations. We evaluate the utility of the dataset in several different contexts, including lexical decision and semantic categorization. We also show that measures based on a mechanism of spreading activation derived from this new resource are highly predictive of direct judgments of similarity. Finally, a comparison with existing English word association sets further highlights systematic improvements provided through these new norms.

(Simon De Deyne, Danielle J. Navarro, Amy Perfors, Marc Brysbaert & Gert Storms)

https://link.springer.com/article/10.3758/s13428-018-1115-7

https://www.semanticscholar.org/paper/The-%E2%80%9CSmall-World-of-Words%E2%80%9D-English-word-association-Deyne-Navarro/bfc9226ea616127d53a8c3ddcccf104d04bf371a


English Data (SWOW-EN18)

Updated 18 October 2018

Word association and participant data for 100 primary, secondary and tertiary responses to 12,292 cues. The data published in Behavior Research Methods were collected between 2011 and 2018. In the preprocessed data, cues and responses have been normalized by spell-checking, correcting capitalization, and Americanizing spelling. In addition, the preprocessed file is restricted so that each cue is judged by exactly 100 participants (see the GitHub repository for details).

Scripts with a processing pipeline for analysing these data in R can be obtained from the SWOWEN-2018 GitHub repository. Note to R users: read the file with the following command (read_delim is part of the readr package), which disables quote handling; otherwise the entire file might not be read in correctly.

    X = read_delim('strength.SWOW-EN.R123.csv', delim = '\t', quote = '', escape_backslash = F, escape_double = F)

Raw and processed data, together with cue and response statistics, can be found below.
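For readers working in Python rather than R, the same quoting precaution applies. The following is a minimal sketch using only the standard library; the helper name `read_strength_rows`, the sample rows, and the column names are illustrative assumptions, not taken from the SWOW-EN files.

```python
import csv
import io


def read_strength_rows(handle):
    """Yield one dict per row of a tab-delimited file, with quote handling off."""
    # QUOTE_NONE makes the reader treat " as an ordinary character, so a stray
    # quotation mark in a response cannot swallow the following lines.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical sample with an unbalanced quotation mark in the first data row.
sample = 'cue\tresponse\tstrength\n"hello\tworld\t0.5\nit\'s\ttime\t0.25\n'
rows = list(read_strength_rows(io.StringIO(sample)))
```

For a file on disk, the same function can be passed a handle opened with `open(path, newline='', encoding='utf-8')`.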

SWOW-EN18 [80Mb]

Citation: De Deyne, S., Navarro, D.J., Perfors, A. et al. (2019). The “Small World of Words” English word association norms for over 12,000 cue words. Behavior Research Methods, 51, 987–1006. https://doi.org/10.3758/s13428-018-1115-7


Discussions:

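In the context of the "Small World of Words" paper, Pointwise Mutual Information (PMI) is used to quantify the strength of association between pairs of words. PMI compares how often a response is actually given to a cue with how often that pairing would be expected if cue and response were unrelated, so a frequent response that is given to almost every cue scores lower than a response that is specific to one cue.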
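As a rough illustration of the idea, the sketch below computes the textbook form PMI(c, r) = log2( p(c, r) / (p(c) p(r)) ) from a table of cue-response counts. This is the standard definition and not necessarily the exact variant used in the paper; the function name `pmi_table` and the counts are made up for illustration.

```python
import math
from collections import Counter


def pmi_table(counts):
    """Return {(cue, response): PMI in bits} from {(cue, response): count}."""
    total = sum(counts.values())
    cue_totals = Counter()
    response_totals = Counter()
    for (cue, response), n in counts.items():
        cue_totals[cue] += n
        response_totals[response] += n

    table = {}
    for (cue, response), n in counts.items():
        p_joint = n / total
        p_cue = cue_totals[cue] / total
        p_response = response_totals[response] / total
        table[(cue, response)] = math.log2(p_joint / (p_cue * p_response))
    return table


# Made-up counts for illustration only; not taken from SWOW-EN.
toy_counts = {
    ("dog", "bark"): 6,
    ("dog", "pet"): 2,
    ("cat", "pet"): 6,
    ("cat", "bark"): 2,
}
pmi = pmi_table(toy_counts)
```

In this toy table "bark" is given to "dog" more often than chance would predict, so its PMI is positive, while "pet" is given to "dog" less often than chance would predict, so its PMI is negative.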


PMI Application in the Paper

  1. Strengthening Word Associations:

    • PMI is used to evaluate how strongly a cue word (e.g., "dog") is associated with a response word (e.g., "bark") based on participant responses. This allows researchers to filter out weak or less meaningful associations, focusing on those with a strong cognitive or semantic link.
  2. Network Construction:

    • The PMI values help build a word association network where nodes represent words, and edges represent the strength of association between words. Higher PMI values result in stronger connections in the network.
  3. Identifying Semantic Clusters:

    • Words with high PMI scores tend to form clusters, reflecting semantic or conceptual categories (e.g., "dog," "cat," "bark," "pet" might cluster together in a network).
  4. Improving Robustness of Analysis:

    • Applying PMI reduces the influence of random or incidental associations, for example by discarding pairs whose score is not positive. This improves the reliability of the derived word association norms and their applicability in computational models.
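The four points above can be sketched together in a few lines: keep only pairs whose PMI exceeds a threshold, store the survivors as weighted edges, and read off clusters as connected components. This is a simplified stand-in, not the paper's pipeline; the function names, the threshold of 0, and the scores are all hypothetical.

```python
def build_graph(pmi_scores, threshold=0.0):
    """Keep edges with PMI above threshold; return {word: {neighbor: weight}}."""
    graph = {}
    for (cue, response), score in pmi_scores.items():
        if score > threshold:
            graph.setdefault(cue, {})[response] = score
            graph.setdefault(response, {})  # responses are nodes too
    return graph


def clusters(graph):
    """Return connected components, ignoring edge direction."""
    undirected = {}
    for source, neighbors in graph.items():
        undirected.setdefault(source, set())
        for target in neighbors:
            undirected[source].add(target)
            undirected.setdefault(target, set()).add(source)

    seen, components = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(undirected[node] - seen)
        components.append(component)
    return components


# Hypothetical PMI scores; the negative dog-hot pair plays the role of noise.
toy_pmi = {
    ("dog", "bark"): 1.2,
    ("dog", "pet"): 0.8,
    ("cat", "pet"): 0.9,
    ("fire", "hot"): 1.5,
    ("dog", "hot"): -0.7,
}
graph = build_graph(toy_pmi)
groups = clusters(graph)
```

Connected components are the crudest possible notion of a cluster; on a real network of thousands of words a community detection method would be needed, since nearly all words end up in one large component.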

Lab Works:

https://colab.research.google.com/drive/1NqqaHec0kcBHCk8DE75ISagxNUfbMBCt


Key Insights from PMI Analysis

  • PMI uncovers nuanced relationships in the data, such as asymmetric associations (e.g., "fire" might strongly associate with "hot," but "hot" may not strongly associate with "fire").

  • It highlights differences in how certain words are semantically interpreted by different groups of participants.
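The asymmetry mentioned above comes from the direction of the task: responses to the cue "fire" and responses to the cue "hot" are counted separately. The sketch below shows this with plain conditional proportions rather than PMI, to keep the arithmetic visible; the same directed counts could be fed into a PMI computation. The function name and the counts are invented for illustration.

```python
def forward_strength(counts, cue, response):
    """P(response | cue): share of the cue's responses equal to `response`."""
    cue_total = sum(n for (c, _), n in counts.items() if c == cue)
    if cue_total == 0:
        return 0.0
    return counts.get((cue, response), 0) / cue_total


# Invented directed counts: (cue, response) -> number of participants.
toy_counts = {
    ("fire", "hot"): 30,
    ("fire", "smoke"): 70,
    ("hot", "cold"): 90,
    ("hot", "fire"): 10,
}
fire_to_hot = forward_strength(toy_counts, "fire", "hot")  # 30 / 100
hot_to_fire = forward_strength(toy_counts, "hot", "fire")  # 10 / 100
```

With these counts "fire" elicits "hot" three times as often as "hot" elicits "fire".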