Downloading Earnings Call Transcripts from the SEC

Learn how to programmatically download earnings call transcripts from the SEC’s EDGAR system using Python. This beginner-friendly tutorial walks through finding NVDA’s 8-K filings, extracting transcript exhibits, and cleaning the data—all while respecting SEC API requirements.

Earnings call transcripts contain valuable information about a company’s performance, strategy, and management outlook. While many financial data providers charge for access to these transcripts, they’re actually available for free through the SEC’s EDGAR system. In this tutorial, we’ll walk through how to programmatically download earnings call transcripts using Python, with NVIDIA (NVDA) as our example.

What You’ll Need

Before we start, make sure you have Python installed along with the requests library for making HTTP requests:

bash

pip install requests

Understanding SEC Filings

When companies hold earnings calls, they typically file the transcript with the SEC as part of an 8-K form (a form used for announcing major events). The transcript itself is usually attached as an exhibit, most commonly Exhibit 99.1 or 99.2.

The Critical First Step: Identifying Yourself

This is extremely important: The SEC requires that all automated requests to their EDGAR system include a proper User-Agent header that identifies who you are. This isn’t optional – it’s a requirement stated in the SEC’s fair access policy.

Your User-Agent should follow this format: Company Name contact@email.com

Here’s why this matters:

  • The SEC uses this information to contact you if your requests are causing problems
  • Failing to include proper identification can result in your IP being blocked
  • It’s simply good etiquette when using a free public service

Let’s set this up properly:

python

import requests

# Replace with your actual information
headers = {
    'User-Agent': 'YourName yourname@email.com'
}

Never skip this step. Every single request you make to the SEC’s servers should include this header.
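If your script makes many requests, one convenient pattern (an optional sketch, not something the SEC requires) is a `requests.Session`, which attaches the header to every request automatically and reuses the underlying connection:

```python
import requests

# A Session applies its headers to every request made through it
# and reuses the underlying TCP connection between requests.
session = requests.Session()
session.headers.update({'User-Agent': 'YourName yourname@email.com'})

# All subsequent calls carry the User-Agent automatically, e.g.:
# response = session.get('https://data.sec.gov/submissions/CIK0001045810.json')
```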

Finding NVDA’s Recent Filings

Now let’s search for NVIDIA’s recent 8-K filings. The SEC provides a convenient API for this:

python

# NVIDIA's CIK (Central Index Key) - this is their SEC identifier
cik = '0001045810'

# Get recent filings
url = f'https://data.sec.gov/submissions/CIK{cik}.json'
response = requests.get(url, headers=headers)
filings_data = response.json()

# Extract recent 8-K filings
recent_filings = filings_data['filings']['recent']
form_types = recent_filings['form']
accession_numbers = recent_filings['accessionNumber']
filing_dates = recent_filings['filingDate']
primary_documents = recent_filings['primaryDocument']

# Filter for 8-K forms
eightk_filings = []
for i, form in enumerate(form_types):
    if form == '8-K':
        eightk_filings.append({
            'accession': accession_numbers[i],
            'date': filing_dates[i],
            'document': primary_documents[i]
        })

# Show the 5 most recent 8-K filings
print("Recent 8-K filings for NVIDIA:")
for filing in eightk_filings[:5]:
    print(f"Date: {filing['date']}, Accession: {filing['accession']}")
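We hard-coded NVIDIA's CIK above, but the SEC also publishes a ticker-to-CIK mapping at https://www.sec.gov/files/company_tickers.json. Here is a sketch of the lookup, run against a hand-copied sample of that file so it works offline; note that the submissions API expects the CIK zero-padded to 10 digits:

```python
def cik_from_ticker_map(ticker, ticker_map):
    """Find a zero-padded 10-digit CIK in the SEC's ticker map.

    ticker_map is the parsed JSON from
    https://www.sec.gov/files/company_tickers.json, which maps
    index strings to dicts with 'cik_str', 'ticker', and 'title'.
    """
    for entry in ticker_map.values():
        if entry['ticker'] == ticker.upper():
            # data.sec.gov expects the CIK zero-padded to 10 digits
            return str(entry['cik_str']).zfill(10)
    return None

# Hand-copied sample entry so this runs offline; the real file has
# one entry per registered company.
sample_map = {'0': {'cik_str': 1045810, 'ticker': 'NVDA', 'title': 'NVIDIA CORP'}}
print(cik_from_ticker_map('nvda', sample_map))  # 0001045810
```

In a real script you would fetch the JSON file once (with your identifying headers) and pass the parsed result in as `ticker_map`.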

Getting the Filing Details

Once we have an accession number for an 8-K filing, we need to look at its structure to find the transcript exhibit:

python

import time

def get_filing_documents(cik, accession_number, headers):
    # Remove dashes from accession number for the URL
    accession_clean = accession_number.replace('-', '')
    
    # Build the URL to the filing's directory listing
    filing_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/index.json'
    
    # Be respectful: wait between requests
    time.sleep(0.1)
    
    response = requests.get(filing_url, headers=headers)
    return response.json()

# Example: get documents for the most recent 8-K
if eightk_filings:
    recent_8k = eightk_filings[0]
    docs = get_filing_documents(cik, recent_8k['accession'], headers)
    
    print("\nDocuments in this filing:")
    for item in docs['directory']['item']:
        print(f"- {item['name']}")
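For reference, the directory listing comes back as JSON shaped roughly like the dictionary below. The values here are made up, and the exact set of per-item fields can vary; `name` is the field we rely on:

```python
# Illustrative shape of an EDGAR directory listing (values are made up)
docs_example = {
    'directory': {
        'name': '/Archives/edgar/data/1045810/000104581024000001',
        'item': [
            {'name': 'nvda-8k.htm', 'size': '54321'},
            {'name': 'ex99_1.htm', 'size': '123456'},
        ],
    }
}

# The same filename filter used later in this tutorial picks out the exhibit:
exhibits = [i['name'] for i in docs_example['directory']['item']
            if 'ex99' in i['name'].lower()]
print(exhibits)  # ['ex99_1.htm']
```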

Downloading the Transcript

Earnings call transcripts are typically labeled as Exhibit 99.1 or 99.2. Let’s download one:

python

def download_transcript(cik, accession_number, document_name, headers):
    # Clean up the accession number
    accession_clean = accession_number.replace('-', '')
    
    # Build the document URL
    doc_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/{document_name}'
    
    # Be respectful: wait between requests
    time.sleep(0.1)
    
    response = requests.get(doc_url, headers=headers)
    return response.text

# Find and download exhibit 99.1 (common location for transcripts)
transcript_html = None
for item in docs['directory']['item']:
    if 'ex99' in item['name'].lower() or 'exhibit99' in item['name'].lower():
        print(f"\nDownloading: {item['name']}")
        transcript_html = download_transcript(cik, recent_8k['accession'], item['name'], headers)
        print(f"Downloaded {len(transcript_html)} characters")
        break

Cleaning Up the HTML

SEC documents are typically in HTML format with lots of markup. Here’s a simple way to extract the text:

python

from html.parser import HTMLParser

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    
    def handle_data(self, data):
        self.text.append(data)
    
    def get_text(self):
        return ''.join(self.text)

def extract_text(html_content):
    parser = HTMLTextExtractor()
    parser.feed(html_content)
    text = parser.get_text()
    
    # Clean up whitespace
    lines = [line.strip() for line in text.split('\n')]
    lines = [line for line in lines if line]
    return '\n'.join(lines)

# Extract clean text
if transcript_html:
    clean_text = extract_text(transcript_html)
    print("\nFirst 500 characters of transcript:")
    print(clean_text[:500])
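One caveat: the extractor above also collects text inside `<script>` and `<style>` tags, which some SEC HTML files contain. A variant that skips those tags, still using only the standard library:

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Extract text while skipping <script> and <style> contents."""
    SKIP_TAGS = {'script', 'style'}

    def __init__(self):
        super().__init__()
        self.text = []
        self._skip_depth = 0  # >0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text.append(data)

    def get_text(self):
        return ''.join(self.text)

parser = VisibleTextExtractor()
parser.feed('<p>Hello</p><script>var x = 1;</script><p>world</p>')
print(parser.get_text())  # Helloworld
```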

Complete Working Example

Here’s everything put together in a single script:

python

import requests
import time
from html.parser import HTMLParser

# CRITICAL: Replace with your actual information
headers = {
    'User-Agent': 'YourName yourname@email.com'
}

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    
    def handle_data(self, data):
        self.text.append(data)
    
    def get_text(self):
        return ''.join(self.text)

def get_recent_filings(cik, headers):
    """Get recent filings for a company."""
    url = f'https://data.sec.gov/submissions/CIK{cik}.json'
    response = requests.get(url, headers=headers)
    return response.json()

def get_filing_documents(cik, accession_number, headers):
    """Get the list of documents in a filing."""
    accession_clean = accession_number.replace('-', '')
    filing_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/index.json'
    time.sleep(0.1)  # Be respectful
    response = requests.get(filing_url, headers=headers)
    return response.json()

def download_document(cik, accession_number, document_name, headers):
    """Download a specific document from a filing."""
    accession_clean = accession_number.replace('-', '')
    doc_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/{document_name}'
    time.sleep(0.1)  # Be respectful
    response = requests.get(doc_url, headers=headers)
    return response.text

def extract_text(html_content):
    """Extract plain text from HTML."""
    parser = HTMLTextExtractor()
    parser.feed(html_content)
    text = parser.get_text()
    lines = [line.strip() for line in text.split('\n')]
    lines = [line for line in lines if line]
    return '\n'.join(lines)

# Main execution
cik = '0001045810'  # NVIDIA

# Step 1: Get recent filings
print("Fetching NVIDIA's recent filings...")
filings_data = get_recent_filings(cik, headers)

# Step 2: Filter for 8-K forms
recent_filings = filings_data['filings']['recent']
eightk_filings = []
for i, form in enumerate(recent_filings['form']):
    if form == '8-K':
        eightk_filings.append({
            'accession': recent_filings['accessionNumber'][i],
            'date': recent_filings['filingDate'][i],
            'document': recent_filings['primaryDocument'][i]
        })

print(f"\nFound {len(eightk_filings)} recent 8-K filings")

# Step 3: Get documents from the most recent 8-K
if eightk_filings:
    recent_8k = eightk_filings[0]
    print(f"\nExamining 8-K from {recent_8k['date']}")
    
    docs = get_filing_documents(cik, recent_8k['accession'], headers)
    
    # Step 4: Find and download transcript
    for item in docs['directory']['item']:
        if 'ex99' in item['name'].lower():
            print(f"Downloading {item['name']}...")
            html_content = download_document(cik, recent_8k['accession'], item['name'], headers)
            
            # Step 5: Extract text
            clean_text = extract_text(html_content)
            
            # Save to file
            filename = f"nvda_transcript_{recent_8k['date']}.txt"
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(clean_text)
            
            print(f"Saved transcript to {filename}")
            print(f"\nFirst 500 characters:\n{clean_text[:500]}")
            break

Important Reminders

  1. Always include your User-Agent header – Every single request must identify who you are
  2. Be respectful with request timing – Include small delays between requests (we used 0.1 seconds)
  3. Not all 8-Ks contain transcripts – Some 8-Ks are for other announcements, so you may need to check multiple filings
  4. Rate limiting – The SEC asks automated users to stay under 10 requests per second; exceed that and your IP may be temporarily blocked
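Putting reminders 2 and 4 into practice, here is one possible retry helper with exponential backoff. The retried status codes (403, 429) and the 1s/2s/4s delay schedule are illustrative assumptions, not documented SEC behavior:

```python
import time
import requests

def get_with_retry(url, headers, max_retries=3, base_delay=1.0):
    """GET with simple exponential backoff on failures.

    Retries on HTTP 429/403 and connection errors; the delays of
    1s, 2s, 4s between attempts are illustrative choices.
    """
    for attempt in range(max_retries):
        try:
            response = requests.get(url, headers=headers)
            if response.status_code not in (429, 403):
                return response
        except requests.ConnectionError:
            pass
        # Back off before retrying: base_delay, 2*base_delay, 4*base_delay, ...
        time.sleep(base_delay * (2 ** attempt))
    return None
```

You could swap this in for the bare `requests.get` calls in the functions above.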

Next Steps

Now that you can download earnings call transcripts, you could:

  • Build a database of transcripts across multiple quarters
  • Analyze sentiment or key topics in the transcripts
  • Compare management commentary across different time periods
  • Extract specific sections like Q&A portions
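As a starting point for that last idea, here is a sketch that splits a transcript at a Q&A heading. The heading patterns are guesses; real transcripts vary by provider, so inspect your actual text and adjust:

```python
import re

def split_qa_section(transcript_text):
    """Split a transcript into prepared remarks and Q&A.

    The heading patterns below are guesses ('Questions and Answers',
    'Q & A', etc. on a line by themselves); check your transcript.
    """
    pattern = re.compile(
        r'^\s*(questions?\s*(and|&)\s*answers?|q\s*&\s*a)\s*$',
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(transcript_text)
    if match is None:
        # No recognizable Q&A heading found
        return transcript_text, ''
    return transcript_text[:match.start()], transcript_text[match.start():]

sample = "Prepared remarks here.\nQuestions and Answers\nFirst question..."
remarks, qa = split_qa_section(sample)
print(qa.splitlines()[0])  # Questions and Answers
```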

The key is to always remember to properly identify yourself to the SEC and be respectful of their servers. Happy analyzing!
