Analyzing r/wallstreetbets Sentiment with Python, PRAW, and VADER

Learn to scrape r/wallstreetbets posts using Python’s PRAW library and analyze their sentiment with NLTK’s VADER tool. This hands-on tutorial covers Reddit API setup, data collection, sentiment scoring, and visualization—perfect for anyone interested in social media analysis or natural language processing.

Ever wondered what the overall mood is on r/wallstreetbets? Are people bullish or bearish? Excited or anxious? In this tutorial, we’ll learn how to scrape Reddit posts using PRAW (Python Reddit API Wrapper) and analyze their sentiment using NLTK’s VADER (Valence Aware Dictionary and sEntiment Reasoner) tool.

What You’ll Learn

By the end of this tutorial, you’ll be able to:

  • Set up PRAW to access Reddit’s API
  • Scrape posts from r/wallstreetbets (or any subreddit)
  • Use VADER sentiment analysis to gauge the emotional tone of posts
  • Visualize sentiment trends in the data

Prerequisites

Basic Python knowledge is helpful, but we’ll explain everything step by step. You’ll need Python 3.7 or later installed on your computer (recent PRAW releases no longer support older Python versions).

What is PRAW?

PRAW (Python Reddit API Wrapper) is a Python library that makes it easy to interact with Reddit’s API. Instead of making complex HTTP requests, PRAW provides simple Python methods to retrieve posts, comments, and user information.

What is VADER?

VADER is a sentiment analysis tool specifically designed for social media text. It’s perfect for analyzing Reddit posts because it understands slang, emojis, and the informal language people use online. VADER gives each piece of text a sentiment score ranging from very negative to very positive.

Step 1: Installation

First, let’s install the required libraries. Open your terminal or command prompt and run:

bash

pip install praw nltk matplotlib pandas

After installing NLTK, we need to download the VADER lexicon:

python

import nltk
nltk.download('vader_lexicon')

Step 2: Setting Up Reddit API Credentials

To use PRAW, you need Reddit API credentials. Here’s how to get them:

  1. Go to https://www.reddit.com/prefs/apps
  2. Scroll down and click “Create App” or “Create Another App”
  3. Fill in the form:
    • name: Choose any name (e.g., “WSB Sentiment Analyzer”)
    • App type: Select “script”
    • description: Optional
    • about url: Leave blank
    • redirect uri: Enter http://localhost:8080
  4. Click “Create app”

You’ll see your app details. Note down:

  • The string under your app name (this is your client_id)
  • The “secret” field (this is your client_secret)

Step 3: Connecting to Reddit

Now let’s write some code to connect to Reddit:

python

import praw

# Initialize Reddit instance
reddit = praw.Reddit(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    user_agent='WSB Sentiment Analyzer v1.0'
)

# Test the connection
print(f"Read-only mode: {reddit.read_only}")
print("Connection successful!")

Replace YOUR_CLIENT_ID and YOUR_CLIENT_SECRET with your actual credentials. The user_agent is a description of your app that Reddit uses to identify your requests.
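Hardcoding credentials works for a quick experiment, but a safer habit is to read them from environment variables so they never end up in version control. Here’s a minimal sketch — the variable names REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET are our own choice, not anything PRAW requires:

```python
import os

# Set these in your shell before running the script, e.g.:
#   export REDDIT_CLIENT_ID="..."
#   export REDDIT_CLIENT_SECRET="..."
client_id = os.environ.get('REDDIT_CLIENT_ID', 'YOUR_CLIENT_ID')
client_secret = os.environ.get('REDDIT_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')

# Pass these to praw.Reddit(...) in place of the hardcoded strings.
print(f"client_id loaded from environment: {client_id != 'YOUR_CLIENT_ID'}")
```

If the variables aren’t set, the placeholders are used as fallbacks, which makes the missing configuration obvious at runtime.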

Step 4: Scraping r/wallstreetbets Posts

Let’s scrape the top posts from r/wallstreetbets:

python

# Access the wallstreetbets subreddit
subreddit = reddit.subreddit('wallstreetbets')

# Fetch 100 posts from the "hot" listing
posts_data = []

for post in subreddit.hot(limit=100):
    posts_data.append({
        'title': post.title,
        'score': post.score,
        'id': post.id,
        'url': post.url,
        'num_comments': post.num_comments,
        'created': post.created_utc,
        'body': post.selftext
    })

print(f"Scraped {len(posts_data)} posts!")

You can also scrape by different sorting methods:

  • subreddit.hot(limit=100) – Currently trending posts
  • subreddit.new(limit=100) – Newest posts
  • subreddit.top(limit=100, time_filter='day') – Top posts (time_filter can be ‘hour’, ‘day’, ‘week’, ‘month’, ‘year’, or ‘all’)
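Note that the created field we stored is created_utc, a Unix timestamp (seconds since the epoch). If you want human-readable dates, the standard library handles the conversion — a quick sketch:

```python
from datetime import datetime, timezone

def to_readable(created_utc):
    """Convert a Reddit created_utc epoch timestamp to a readable UTC string."""
    return datetime.fromtimestamp(created_utc, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S')

# Example with an arbitrary timestamp
print(to_readable(1609459200))  # → 2021-01-01 00:00:00
```

You could apply this to each dictionary in posts_data, or later to the whole DataFrame column at once.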

Step 5: Setting Up VADER Sentiment Analysis

Now let’s initialize VADER and create a function to analyze sentiment:

python

from nltk.sentiment import SentimentIntensityAnalyzer

# Initialize VADER
sia = SentimentIntensityAnalyzer()

def analyze_sentiment(text):
    """
    Analyze sentiment of text using VADER.
    Returns a dictionary with negative, neutral, positive, and compound scores.
    """
    if not text:  # Handle empty text
        return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
    
    scores = sia.polarity_scores(text)
    return scores

# Test it out
sample_text = "GME to the moon! 🚀🚀🚀"
print(analyze_sentiment(sample_text))

VADER returns four scores:

  • neg: Negative sentiment (0 to 1)
  • neu: Neutral sentiment (0 to 1)
  • pos: Positive sentiment (0 to 1)
  • compound: Overall sentiment (-1 to 1, where -1 is most negative and 1 is most positive)
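The ±0.05 cutoffs on the compound score that we use in the next step are the thresholds recommended by VADER’s authors for labeling text. The categorization logic can be written as a small standalone function, which also makes it easy to test:

```python
def categorize(compound):
    """Map a VADER compound score to a label using the common ±0.05 cutoffs."""
    if compound >= 0.05:
        return 'Positive'
    elif compound <= -0.05:
        return 'Negative'
    return 'Neutral'

print(categorize(0.8))    # Positive
print(categorize(-0.3))   # Negative
print(categorize(0.02))   # Neutral
```

Anything with a compound score between -0.05 and 0.05 lands in the Neutral bucket.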

Step 6: Analyzing Our Scraped Posts

Let’s analyze the sentiment of all our scraped posts:

python

# Add sentiment analysis to our posts
for post in posts_data:
    # Combine title and body for analysis
    full_text = post['title'] + ' ' + post['body']
    sentiment = analyze_sentiment(full_text)
    
    post['sentiment_neg'] = sentiment['neg']
    post['sentiment_neu'] = sentiment['neu']
    post['sentiment_pos'] = sentiment['pos']
    post['sentiment_compound'] = sentiment['compound']
    
    # Categorize overall sentiment
    if sentiment['compound'] >= 0.05:
        post['sentiment_category'] = 'Positive'
    elif sentiment['compound'] <= -0.05:
        post['sentiment_category'] = 'Negative'
    else:
        post['sentiment_category'] = 'Neutral'

# Print some examples
print("\nSample Analysis:")
for i, post in enumerate(posts_data[:5]):
    print(f"\nPost {i+1}:")
    print(f"Title: {post['title'][:60]}...")
    print(f"Compound Score: {post['sentiment_compound']:.3f}")
    print(f"Category: {post['sentiment_category']}")

Step 7: Visualizing the Results

Let’s create some visualizations to better understand the sentiment:

python

import pandas as pd
import matplotlib.pyplot as plt

# Convert to DataFrame for easier analysis
df = pd.DataFrame(posts_data)

# Create a figure with multiple subplots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('r/wallstreetbets Sentiment Analysis', fontsize=16, fontweight='bold')

# 1. Sentiment Category Distribution
sentiment_counts = df['sentiment_category'].value_counts()
color_map = {'Positive': 'green', 'Neutral': 'gray', 'Negative': 'red'}
axes[0, 0].bar(sentiment_counts.index, sentiment_counts.values,
               color=[color_map[c] for c in sentiment_counts.index])
axes[0, 0].set_title('Distribution of Sentiment Categories')
axes[0, 0].set_ylabel('Number of Posts')
axes[0, 0].set_xlabel('Sentiment')

# 2. Compound Score Distribution
axes[0, 1].hist(df['sentiment_compound'], bins=30, color='skyblue', edgecolor='black')
axes[0, 1].set_title('Distribution of Compound Sentiment Scores')
axes[0, 1].set_xlabel('Compound Score')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].axvline(x=0, color='red', linestyle='--', label='Neutral')
axes[0, 1].legend()

# 3. Sentiment vs. Post Score
axes[1, 0].scatter(df['sentiment_compound'], df['score'], alpha=0.5, color='purple')
axes[1, 0].set_title('Sentiment vs. Post Score (Upvotes)')
axes[1, 0].set_xlabel('Compound Sentiment Score')
axes[1, 0].set_ylabel('Post Score')

# 4. Average Sentiment Components
sentiment_means = df[['sentiment_neg', 'sentiment_neu', 'sentiment_pos']].mean()
axes[1, 1].bar(['Negative', 'Neutral', 'Positive'], sentiment_means.values, 
               color=['red', 'gray', 'green'])
axes[1, 1].set_title('Average Sentiment Components')
axes[1, 1].set_ylabel('Average Score')

plt.tight_layout()
plt.show()

# Print summary statistics
print("\n=== Sentiment Analysis Summary ===")
print(f"Total posts analyzed: {len(df)}")
print(f"\nSentiment Category Breakdown:")
print(sentiment_counts)
print(f"\nAverage compound sentiment: {df['sentiment_compound'].mean():.3f}")
print(f"Most positive post: {df.loc[df['sentiment_compound'].idxmax(), 'title'][:60]}...")
print(f"Most negative post: {df.loc[df['sentiment_compound'].idxmin(), 'title'][:60]}...")

Step 8: Finding the Most Positive and Negative Posts

Let’s identify the most extreme posts:

python

# Sort by sentiment
df_sorted = df.sort_values('sentiment_compound', ascending=False)

print("\n=== TOP 5 MOST POSITIVE POSTS ===")
for i, row in df_sorted.head(5).iterrows():
    print(f"\nTitle: {row['title']}")
    print(f"Score: {row['score']} | Comments: {row['num_comments']}")
    print(f"Sentiment: {row['sentiment_compound']:.3f} ({row['sentiment_category']})")

print("\n\n=== TOP 5 MOST NEGATIVE POSTS ===")
for i, row in df_sorted.tail(5).iloc[::-1].iterrows():  # most negative first
    print(f"\nTitle: {row['title']}")
    print(f"Score: {row['score']} | Comments: {row['num_comments']}")
    print(f"Sentiment: {row['sentiment_compound']:.3f} ({row['sentiment_category']})")

Complete Code

Here’s the complete script you can run:

python

import praw
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import pandas as pd
import matplotlib.pyplot as plt

# Download VADER lexicon (only need to do this once)
# nltk.download('vader_lexicon')

# Initialize Reddit
reddit = praw.Reddit(
    client_id='YOUR_CLIENT_ID',
    client_secret='YOUR_CLIENT_SECRET',
    user_agent='WSB Sentiment Analyzer v1.0'
)

# Initialize VADER
sia = SentimentIntensityAnalyzer()

def analyze_sentiment(text):
    if not text:
        return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
    return sia.polarity_scores(text)

# Scrape posts
subreddit = reddit.subreddit('wallstreetbets')
posts_data = []

print("Scraping posts...")
for post in subreddit.hot(limit=100):
    full_text = post.title + ' ' + post.selftext
    sentiment = analyze_sentiment(full_text)
    
    posts_data.append({
        'title': post.title,
        'score': post.score,
        'num_comments': post.num_comments,
        'sentiment_neg': sentiment['neg'],
        'sentiment_neu': sentiment['neu'],
        'sentiment_pos': sentiment['pos'],
        'sentiment_compound': sentiment['compound'],
        'sentiment_category': 'Positive' if sentiment['compound'] >= 0.05 
                             else 'Negative' if sentiment['compound'] <= -0.05 
                             else 'Neutral'
    })

# Create DataFrame
df = pd.DataFrame(posts_data)

# Print summary
print(f"\nAnalyzed {len(df)} posts")
print(f"Average sentiment: {df['sentiment_compound'].mean():.3f}")
print("\nSentiment breakdown:")
print(df['sentiment_category'].value_counts())

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('r/wallstreetbets Sentiment Analysis', fontsize=16, fontweight='bold')

sentiment_counts = df['sentiment_category'].value_counts()
color_map = {'Positive': 'green', 'Neutral': 'gray', 'Negative': 'red'}
axes[0, 0].bar(sentiment_counts.index, sentiment_counts.values,
               color=[color_map[c] for c in sentiment_counts.index])
axes[0, 0].set_title('Sentiment Distribution')
axes[0, 0].set_ylabel('Number of Posts')

axes[0, 1].hist(df['sentiment_compound'], bins=30, color='skyblue', edgecolor='black')
axes[0, 1].set_title('Compound Score Distribution')
axes[0, 1].set_xlabel('Compound Score')
axes[0, 1].axvline(x=0, color='red', linestyle='--')

axes[1, 0].scatter(df['sentiment_compound'], df['score'], alpha=0.5, color='purple')
axes[1, 0].set_title('Sentiment vs. Upvotes')
axes[1, 0].set_xlabel('Sentiment Score')
axes[1, 0].set_ylabel('Upvotes')

sentiment_means = df[['sentiment_neg', 'sentiment_neu', 'sentiment_pos']].mean()
axes[1, 1].bar(['Negative', 'Neutral', 'Positive'], sentiment_means.values, 
               color=['red', 'gray', 'green'])
axes[1, 1].set_title('Average Sentiment Components')

plt.tight_layout()
plt.show()

Next Steps and Ideas

Now that you’ve learned the basics, here are some ideas to expand your project:

  1. Track sentiment over time: Scrape posts daily and track how sentiment changes
  2. Analyze comments: Use PRAW to also scrape and analyze comment sentiment
  3. Compare subreddits: Analyze sentiment across different investing subreddits
  4. Stock ticker extraction: Use regular expressions to extract stock tickers and see which ones are mentioned most positively
  5. Save to CSV: Export your results using df.to_csv('wsb_sentiment.csv', index=False)
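For idea 4, a starting point might be a regular expression that pulls out $-prefixed symbols and bare all-caps words. Be warned that naïve all-caps matching also catches words like “YOLO” or “THE”, so a serious version would filter candidates against a whitelist of real tickers. A rough sketch:

```python
import re
from collections import Counter

def extract_tickers(text):
    """Find $-prefixed symbols (e.g. $GME) and bare 2-5 letter all-caps words."""
    candidates = re.findall(r'\$([A-Z]{1,5})\b|\b([A-Z]{2,5})\b', text)
    # findall returns a tuple per match (one group per alternative); keep the non-empty one
    return [a or b for a, b in candidates]

sample = "Bought $GME and AMC calls, YOLO"
print(Counter(extract_tickers(sample)))
```

Combining this with the per-post compound scores would let you average sentiment per ticker.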

Troubleshooting

Issue: “received 401 HTTP response”

  • Check that your client_id and client_secret are correct
  • Make sure you didn’t accidentally include extra spaces

Issue: “nltk.downloader” error

  • Run nltk.download('vader_lexicon') in a Python console first

Issue: Empty or missing post bodies

  • Some posts are link posts or image posts with no text body (selftext)
  • The code handles this by analyzing just the title when body is empty
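If you want to make that behavior explicit (and avoid a stray trailing space when selftext is empty), the title/body concatenation can be written defensively. This helper is our own addition, not part of the tutorial code above:

```python
def combine_text(title, selftext):
    """Join title and body for analysis, skipping empty or whitespace-only parts."""
    parts = [p.strip() for p in (title, selftext) if p and p.strip()]
    return ' '.join(parts)

print(combine_text("GME to the moon!", ""))          # → GME to the moon!
print(combine_text("Loss porn", "Down 80% today"))   # → Loss porn Down 80% today
```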

Conclusion

Congratulations! You’ve built a sentiment analysis tool that scrapes Reddit posts and analyzes their emotional content. You’ve learned how to use PRAW to access Reddit data and VADER to perform sentiment analysis, skills that are applicable to many other social media analysis projects.

The combination of web scraping and natural language processing opens up endless possibilities for data analysis. Whether you’re tracking public opinion, monitoring brand sentiment, or just satisfying your curiosity, these tools give you the power to turn unstructured text into meaningful insights.

Happy analyzing!
