Word Association Norms, Mutual Information, and Lexicography

Abstract:

The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose an objective measure based on the information theoretic notion of mutual information, for estimating word association norms from computer readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.

(Kenneth Ward Church, Patrick Hanks)

https://www.semanticscholar.org/paper/Word-Association-Norms%2C-Mutual-Information%2C-and-Church-Hanks

https://aclanthology.org/P89-1010.pdf

Key Concepts

  1. Mutual Information:

    • MI compares the joint probability of observing two events (or words) together, P(x,y), with the probability of observing them independently, P(x)P(y).

    • If P(x,y) is significantly larger than P(x)P(y), it indicates a strong association, resulting in I(x,y)>0.

    • Conversely, if P(x,y) is similar to P(x)P(y), then I(x,y)≈0, suggesting no significant relationship.

    • If x and y are in complementary distribution, they rarely or never occur together, so P(x,y) approaches zero and I(x,y) becomes strongly negative (I(x,y) ≪ 0). The defining formula appears just after this list.
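
For reference, the paper's definition of the mutual information between words x and y, comparing the joint probability with the product of the marginals:

```latex
I(x, y) \equiv \log_2 \frac{P(x, y)}{P(x)\,P(y)}
```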

  2. Estimation of Probabilities:

    • The probabilities P(x) and P(y) are estimated by counting occurrences in a corpus, denoted as f(x) and f(y), and normalizing by the total corpus size N.

    • Joint probabilities P(x,y) are estimated by counting how often x is followed by y within a specified window size w (e.g., 5 words). A sketch of both estimates follows this list.
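
A minimal Python sketch of these estimates, assuming a pre-tokenized corpus; the function name association_ratio and its interface are illustrative, not from the paper:

```python
import math
from collections import Counter

def association_ratio(tokens, x, y, window=5):
    """Estimate I(x, y) = log2(P(x, y) / (P(x) P(y))) from a token list.

    P(x) and P(y) are f(x)/N and f(y)/N; P(x, y) is f(x, y)/N, where
    f(x, y) counts how often x is followed by y within `window` words.
    """
    N = len(tokens)
    unigrams = Counter(tokens)
    f_xy = 0
    for i, tok in enumerate(tokens):
        if tok == x:
            # Count occurrences of y in the `window` words after x.
            f_xy += tokens[i + 1 : i + 1 + window].count(y)
    if f_xy == 0:
        return float("-inf")  # never co-occur: log of zero probability
    p_xy = f_xy / N
    return math.log2(p_xy / ((unigrams[x] / N) * (unigrams[y] / N)))
```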

  3. Window Size:

    • The choice of window size affects the type of relationships captured:

      • Smaller windows identify fixed expressions (like idioms).

      • Larger windows capture broader semantic relationships.

    • A window size of 5 words is chosen as a compromise: large enough to capture meaningful semantic relationships, small enough to stay sensitive to local adjacency (see the example after this list).
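
Using the association_ratio sketch above on a toy corpus (invented here purely for illustration), varying the window contrasts tight lexical patterns with looser semantic ones:

```python
tokens = ("the doctor asked the nurse to set off the chart "
          "while the doctor paged the nurse again").split()

# A tight window favors adjacent fixed expressions such as "set off",
# while a wider window also picks up looser pairs like doctor/nurse.
print(association_ratio(tokens, "set", "off", window=2))
print(association_ratio(tokens, "doctor", "nurse", window=5))
```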

  4. Count Threshold:

    • The authors set a threshold, discarding pairs with very small counts (e.g., f(x,y) < 5), to keep the association ratio stable; low counts yield unreliable estimates. A sketch of this filtering step follows below.
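
A hedged sketch of how such a cutoff could be applied when enumerating candidate pairs; candidate_pairs and min_count are illustrative names, though the 5-count threshold itself is the paper's:

```python
from collections import Counter

def candidate_pairs(tokens, window=5, min_count=5):
    """Count ordered (x, y) co-occurrences and drop unreliable rare pairs."""
    pair_counts = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pair_counts[(x, y)] += 1
    # Pairs seen fewer than min_count times give unstable ratios; drop them.
    return {pair: f for pair, f in pair_counts.items() if f >= min_count}
```
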
  5. Symmetry in Probabilities:

    • MI is symmetric, I(x,y) = I(y,x), because P(x,y) = P(y,x) when word order is ignored; the relationship holds regardless of which word comes first.

    • The association ratio, however, is not symmetric: f(x,y) counts how often x precedes y within the window, so it encodes linear precedence. This asymmetry can reveal interesting biases in the data, such as syntactic patterns or sociolinguistic trends (demonstrated below).
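
Continuing the toy corpus above, the same pair in the two orders can score very differently; here the reversed order happens never to occur inside the window at all:

```python
# f(doctor, nurse) counts "doctor ... nurse"; f(nurse, doctor) the reverse.
print(association_ratio(tokens, "doctor", "nurse", window=5))  # finite value
print(association_ratio(tokens, "nurse", "doctor", window=5))  # -inf here
```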

Lab Work:

https://colab.research.google.com/drive/1f5yfmhAocDZ9bHeg1QKXY_086dEHIVTi