Natural Language Processing with Python

Abstract

This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication. Packed with examples and exercises, Natural Language Processing with Python will help you:

  • Extract information from unstructured text, either to guess the topic or identify "named entities"

  • Analyze linguistic structure in text, including parsing and semantic analysis

  • Access popular linguistic databases, including WordNet and treebanks

  • Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence

This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.

Steven Bird, Ewan Klein, Edward Loper

https://dl.acm.org/doi/book/10.5555/1717171

Extract

This book is about teaching computers to understand and work with human language. It’s called Natural Language Processing (NLP). Imagine you want a computer to read a story, figure out what it’s about, or even translate it into another language. This book shows you how to do that using a special computer language called Python.

Here’s what you can learn from the book:

  1. Working with Text: You’ll learn how to make programs that can read and understand big chunks of text, like stories or emails.

  2. Finding Information: You can teach the computer to pick out important words or names from the text, like finding out who the main characters are in a story.

  3. Understanding Language: You’ll learn how to break down sentences to see how they are built and what they mean.

  4. Using Tools: The book uses a special library called NLTK (Natural Language Toolkit), which has lots of tools and examples to help you learn.

This book is great for anyone who is curious about how computers can understand language. It’s like teaching a robot to read and write! If you’re interested in making cool programs or just want to know how computers can work with words, this book is a fun and useful guide.
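As a small taste of idea 2 above (teaching the computer to pick out names), here is a toy program that guesses which words in a story are names by looking for capitalized words that are not at the start of a sentence. This is only a simple heuristic invented for illustration; it is not NLTK's real named-entity machinery, and it will miss names that begin a sentence (like "Alice" below).

```python
def find_names(text):
    """Guess names: capitalized words that are not sentence-initial."""
    names = set()
    for sentence in text.split('. '):
        words = sentence.split()
        for word in words[1:]:  # skip the first word: it's capitalized anyway
            cleaned = word.strip('.,!?')
            if cleaned.istitle():
                names.add(cleaned)
    return sorted(names)

story = "Alice met Bob at the park. They walked to the river with Carol."
print(find_names(story))  # -> ['Bob', 'Carol']
```

Notice that "Alice" is missed because it starts a sentence; real NLP tools use part-of-speech tags and trained models to do much better, as the book explains.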

Exercise

Steps:

  1. Install the required libraries (nltk, textblob, and pandas).

  2. Load the Brown Corpus.

  3. Filter emotion-related sentences.

  4. Use TextBlob to perform sentiment analysis on the filtered sentences.

  5. Store the results in a DataFrame for analysis.


Code:


# Step 1: Install required libraries
# Run these commands in your terminal or notebook if you haven't installed the libraries yet:
# pip install nltk textblob pandas

import nltk
from textblob import TextBlob
import pandas as pd
from nltk.corpus import brown

# Step 2: Download the Brown Corpus
nltk.download('brown')

# Step 3: Define emotion-related keywords
emotion_keywords = ['happy', 'sad', 'angry', 'joy', 'fear', 'love', 'hate', 'excited', 'calm', 'surprised']

# Step 4: Load all sentences from the Brown Corpus
brown_sents = brown.sents()

# Step 5: Filter sentences containing emotion-related words
emotion_sents = [' '.join(sent) for sent in brown_sents if any(word.lower() in emotion_keywords for word in sent)]

# Step 6: Perform sentiment analysis using TextBlob
# Create a list to store sentiment analysis results
sentiment_results = []

for sentence in emotion_sents:
    blob = TextBlob(sentence)
    sentiment = blob.sentiment  # Get sentiment polarity and subjectivity
    sentiment_results.append({
        'Sentence': sentence,
        'Polarity': sentiment.polarity,  # Polarity: -1 (negative) to 1 (positive)
        'Subjectivity': sentiment.subjectivity  # Subjectivity: 0 (objective) to 1 (subjective)
    })

# Step 7: Create a DataFrame to store the results
df = pd.DataFrame(sentiment_results)

# Step 8: Display the DataFrame
print("Emotion-Related Sentences with Sentiment Analysis:")
print(df)

# Step 9: Save the DataFrame to a CSV file (optional)
df.to_csv('emotion_sentiment_analysis.csv', index=False)
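Once the results are in a DataFrame, pandas makes follow-up analysis easy. The sketch below uses a small hand-made DataFrame in place of the real TextBlob output (the sentences and scores are illustrative, not actual corpus results), so you can try the pandas steps without running the full pipeline:

```python
import pandas as pd

# Hand-made stand-in for sentiment_results (scores are illustrative)
df = pd.DataFrame([
    {'Sentence': 'He was happy to see her.',       'Polarity':  0.8, 'Subjectivity': 0.9},
    {'Sentence': 'The news filled him with fear.', 'Polarity': -0.5, 'Subjectivity': 0.6},
    {'Sentence': 'She felt a deep sense of joy.',  'Polarity':  0.7, 'Subjectivity': 0.8},
])

# Rank sentences from most positive to most negative
ranked = df.sort_values('Polarity', ascending=False)
print(ranked)

# Keep only the clearly negative sentences
negative = df[df['Polarity'] < 0]
print(len(negative))  # -> 1

# Average emotional tone across all filtered sentences
print(round(df['Polarity'].mean(), 3))  # -> 0.333
```

The same `sort_values`, boolean filtering, and `mean` calls work unchanged on the real DataFrame produced by the exercise.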

Explanation of the Code:

  1. Install Libraries:

    • nltk: For accessing the Brown Corpus.

    • textblob: For sentiment analysis.

    • pandas: For creating and managing the DataFrame.

  2. Download the Brown Corpus:

    • Ensures the corpus is available for use.

  3. Emotion Keywords:

    • A list of emotion-related keywords is defined. You can customize this list.

  4. Filter Sentences:

    • The code filters sentences from the Brown Corpus that contain any of the emotion keywords.

  5. Sentiment Analysis with TextBlob:

    • For each filtered sentence, TextBlob calculates:

      • Polarity: Ranges from -1 (negative) to 1 (positive). Indicates the emotional tone of the sentence.

      • Subjectivity: Ranges from 0 (objective) to 1 (subjective). Indicates how opinionated the sentence is.

  6. Store Results in a DataFrame:

    • The sentences, along with their polarity and subjectivity scores, are stored in a DataFrame.

  7. Output:

    • The DataFrame is printed to the console.

    • Optionally, the DataFrame is saved to a CSV file (emotion_sentiment_analysis.csv).
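To build intuition for what the polarity score in step 5 means, here is a miniature lexicon-based scorer. TextBlob's real analyzer uses the much larger Pattern lexicon and also handles negation and intensifiers; the tiny lexicon and scores below are invented purely to show the underlying idea of averaging per-word sentiment values.

```python
# Toy sentiment lexicon (scores invented for illustration only)
toy_lexicon = {
    'happy': 0.8, 'joy': 0.7, 'love': 0.6, 'excited': 0.6, 'calm': 0.3,
    'sad': -0.6, 'angry': -0.7, 'fear': -0.5, 'hate': -0.8,
}

def toy_polarity(sentence):
    """Average the lexicon scores of recognized words; 0.0 if none match."""
    scores = [toy_lexicon[w] for w in sentence.lower().split() if w in toy_lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity('He was happy to see her'))        # -> 0.8
print(toy_polarity('The news filled him with fear'))  # -> -0.5
print(toy_polarity('It rained all day'))              # -> 0.0
```

A sentence with no recognized emotion words scores 0.0 (neutral), which is also why the exercise first filters for emotion keywords before running sentiment analysis.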


Example Output:

If the Brown Corpus contains sentences like:

  • "He was happy to see her."

  • "She felt a deep sense of joy."

  • "The news filled him with fear."

The DataFrame will look roughly like this (the scores are illustrative, not exact TextBlob output):

  Sentence                         Polarity   Subjectivity
  He was happy to see her.          0.8        0.9
  She felt a deep sense of joy.     0.7        0.8
  The news filled him with fear.   -0.5        0.6

Key Learning Points:

  • TextBlob Sentiment Analysis: Learn how to use TextBlob to analyze the emotional tone of text.

  • DataFrame Creation: Learn how to organize and analyze data using Pandas.

  • Corpus Filtering: Learn how to filter specific content from a large text corpus.

This exercise is a great way to practice text processing, sentiment analysis, and data organization in Python!

https://colab.research.google.com/drive/1Ia2s0p0FBsA7KWoU7c6bbM-dkhbWKqxw