Earnings call transcripts contain valuable information about a company’s performance, strategy, and management outlook. While many financial data providers charge for access to these transcripts, they’re actually available for free through the SEC’s EDGAR system. In this tutorial, we’ll walk through how to programmatically download earnings call transcripts using Python, with NVIDIA (NVDA) as our example.
What You’ll Need
Before we start, make sure you have Python installed along with the requests library for making HTTP requests:
bash
pip install requests
Understanding SEC Filings
Companies announce quarterly results on Form 8-K, the form used to report material events. The earnings materials are attached as exhibits, most commonly Exhibit 99.1 or 99.2. Not every company files a verbatim call transcript; the exhibit is often the earnings press release or management's prepared commentary, so check what each exhibit actually contains.
The Critical First Step: Identifying Yourself
This is extremely important: The SEC requires that all automated requests to their EDGAR system include a proper User-Agent header that identifies who you are. This isn’t optional – it’s a requirement stated in the SEC’s fair access policy.
Your User-Agent should follow this format: Company Name contact@email.com
Here’s why this matters:
- The SEC uses this information to contact you if your requests are causing problems
- Failing to include proper identification can result in your IP being blocked
- It’s simply good etiquette when using a free public service
Let’s set this up properly:
python
import requests

# Replace with your actual information
headers = {
    'User-Agent': 'YourName yourname@email.com'
}
Never skip this step. Every single request you make to the SEC’s servers should include this header.
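To make the header harder to get wrong, one option is to build it through a small helper that rejects empty or malformed values. The sketch below is only an illustration: `build_sec_headers` is a hypothetical helper name, the example name and address are placeholders, and the email check is a deliberately loose sanity check rather than anything the SEC prescribes.

```python
import re

# Loose sanity check for "something@something.tld"; not a full email validator.
_EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def build_sec_headers(name, email):
    """Return a headers dict whose User-Agent is 'Name email' (hypothetical helper)."""
    name = name.strip()
    email = email.strip()
    if not name:
        raise ValueError('User-Agent needs a company or personal name')
    if not _EMAIL_RE.match(email):
        raise ValueError(f'Not a usable contact email: {email!r}')
    return {'User-Agent': f'{name} {email}'}


# Placeholder values; substitute your own name and contact address.
example_headers = build_sec_headers('Example Research', 'contact@example.com')
print(example_headers['User-Agent'])
```

Failing early on a bad value is cheaper than discovering it through rejected requests later.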
Finding NVDA’s Recent Filings
Now let’s search for NVIDIA’s recent 8-K filings. The SEC provides a convenient API for this:
python
# NVIDIA's CIK (Central Index Key) - this is their SEC identifier
cik = '0001045810'

# Get recent filings
url = f'https://data.sec.gov/submissions/CIK{cik}.json'
response = requests.get(url, headers=headers)
response.raise_for_status()  # Stop on an HTTP error instead of parsing an error page
filings_data = response.json()

# Extract recent filings (parallel lists: index i describes one filing)
recent_filings = filings_data['filings']['recent']
form_types = recent_filings['form']
accession_numbers = recent_filings['accessionNumber']
filing_dates = recent_filings['filingDate']
primary_documents = recent_filings['primaryDocument']

# Filter for 8-K forms
eightk_filings = []
for i, form in enumerate(form_types):
    if form == '8-K':
        eightk_filings.append({
            'accession': accession_numbers[i],
            'date': filing_dates[i],
            'document': primary_documents[i]
        })

# Show the 5 most recent 8-K filings
print("Recent 8-K filings for NVIDIA:")
for filing in eightk_filings[:5]:
    print(f"Date: {filing['date']}, Accession: {filing['accession']}")
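The `recent` block is column-oriented: each key holds a list, and index i across all the lists describes one filing. A more compact way to filter it is to zip the columns together. The sketch below runs on a tiny made-up sample instead of live data (the accession numbers, dates, and file names are invented). It also assumes the response carries an `items` column listing each 8-K's item numbers, where Item 2.02 (Results of Operations and Financial Condition) marks earnings announcements. `select_filings` is a hypothetical helper name.

```python
def select_filings(recent, form_type='8-K', required_item=None):
    """Return matching filings from a column-oriented 'recent' dict, in input order."""
    count = len(recent['form'])
    # Tolerate a missing 'items' column by treating every entry as empty.
    items_column = recent.get('items') or [''] * count
    rows = zip(recent['form'], recent['accessionNumber'],
               recent['filingDate'], recent['primaryDocument'], items_column)
    selected = []
    for form, accession, date, document, items in rows:
        if form != form_type:
            continue
        item_list = [part.strip() for part in items.split(',') if part.strip()]
        if required_item is not None and required_item not in item_list:
            continue
        selected.append({'accession': accession, 'date': date,
                         'document': document, 'items': item_list})
    return selected


# Invented sample, shaped like filings_data['filings']['recent']
sample_recent = {
    'form': ['8-K', '10-Q', '8-K'],
    'accessionNumber': ['0000000000-24-000003', '0000000000-24-000002',
                        '0000000000-24-000001'],
    'filingDate': ['2024-03-01', '2024-02-20', '2024-02-15'],
    'primaryDocument': ['a.htm', 'b.htm', 'c.htm'],
    'items': ['5.02', '', '2.02,9.01'],
}

earnings_filings = select_filings(sample_recent, required_item='2.02')
print(earnings_filings)
```

Filtering on the item number narrows the list to earnings-related 8-Ks before any further requests are made, which also keeps the request count down.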
Getting the Filing Details
Once we have an accession number for an 8-K filing, we need to look at its structure to find the transcript exhibit:
python
import time

def get_filing_documents(cik, accession_number, headers):
    """Return the JSON directory listing for one filing."""
    # Remove dashes from accession number for the URL
    accession_clean = accession_number.replace('-', '')
    # Build the URL to the filing folder's JSON directory listing
    filing_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/index.json'
    # Be respectful: wait between requests
    time.sleep(0.1)
    response = requests.get(filing_url, headers=headers)
    response.raise_for_status()
    return response.json()

# Example: get documents for the most recent 8-K
if eightk_filings:
    recent_8k = eightk_filings[0]
    docs = get_filing_documents(cik, recent_8k['accession'], headers)
    print("\nDocuments in this filing:")
    for item in docs['directory']['item']:
        print(f"- {item['name']}")
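The archive URLs used here all follow one pattern: the CIK without leading zeros, then the accession number without dashes, then a file name. Keeping that logic in a small pure function makes it easy to check without touching the network. This is a sketch: `filing_folder_url` and `document_url` are hypothetical helper names, and the accession number and file name in the example are invented.

```python
ARCHIVES_BASE = 'https://www.sec.gov/Archives/edgar/data'


def filing_folder_url(cik, accession_number):
    """Build the folder URL for one filing from a CIK and a dashed accession number."""
    cik_part = str(int(cik))  # '0001045810' -> '1045810'
    accession_part = accession_number.replace('-', '')
    return f'{ARCHIVES_BASE}/{cik_part}/{accession_part}'


def document_url(cik, accession_number, document_name):
    """Build the URL of a single document inside a filing folder."""
    return f'{filing_folder_url(cik, accession_number)}/{document_name}'


# Invented accession number and file name, for illustration only
print(document_url('0001045810', '0001045810-24-000001', 'exhibit.htm'))
```

Converting through `int` rather than stripping zeros also rejects a non-numeric CIK immediately with a `ValueError`.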
Downloading the Transcript
Earnings call transcripts are typically labeled as Exhibit 99.1 or 99.2. Let’s download one:
python
def download_transcript(cik, accession_number, document_name, headers):
    """Download one document from a filing and return its HTML."""
    # Clean up the accession number
    accession_clean = accession_number.replace('-', '')
    # Build the document URL
    doc_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/{document_name}'
    # Be respectful: wait between requests
    time.sleep(0.1)
    response = requests.get(doc_url, headers=headers)
    response.raise_for_status()
    return response.text

# Find and download the first Exhibit 99.x document (common location for transcripts)
transcript_html = None  # Stays None if no matching exhibit is found
for item in docs['directory']['item']:
    if 'ex99' in item['name'].lower() or 'exhibit99' in item['name'].lower():
        print(f"\nDownloading: {item['name']}")
        transcript_html = download_transcript(cik, recent_8k['accession'], item['name'], headers)
        print(f"Downloaded {len(transcript_html)} characters")
        break
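Exhibit file names vary from filer to filer, so a plain substring match on `ex99` can miss the document you want. One possible refinement, sketched below, is to match a few common spellings with a regular expression and fall back to listing every HTML document when nothing matches. The pattern, the helper name `find_exhibit_candidates`, and the sample directory listing are all assumptions made for illustration.

```python
import re

# Matches names such as 'ex99-1.htm', 'ex_99.2.htm', 'exhibit991.htm' (case-insensitive).
EXHIBIT_99_RE = re.compile(r'ex(?:hibit)?[-_ ]?99', re.IGNORECASE)


def find_exhibit_candidates(items):
    """Return HTML file names that look like Exhibit 99 documents.

    If none match, return every HTML file except index pages so the
    caller can inspect them manually.
    """
    names = [item['name'] for item in items]
    html_names = [n for n in names if n.lower().endswith(('.htm', '.html'))]
    matches = [n for n in html_names if EXHIBIT_99_RE.search(n)]
    if not matches:
        matches = [n for n in html_names if 'index' not in n.lower()]
    return matches


# Invented listing, shaped like docs['directory']['item']
sample_items = [
    {'name': '0000000000-24-000001-index.html'},
    {'name': 'form8k.htm'},
    {'name': 'EX99-1.htm'},
    {'name': 'ex99-1_img.jpg'},
]
print(find_exhibit_candidates(sample_items))
```

Returning a list rather than the first hit leaves room to download several candidates and decide afterward which one contains the call.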
Cleaning Up the HTML
SEC documents are typically in HTML format with lots of markup. Here’s a simple way to extract the text:
python
from html.parser import HTMLParser

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

    def get_text(self):
        return ''.join(self.text)

def extract_text(html_content):
    parser = HTMLTextExtractor()
    parser.feed(html_content)
    text = parser.get_text()
    # Clean up whitespace
    lines = [line.strip() for line in text.split('\n')]
    lines = [line for line in lines if line]
    return '\n'.join(lines)

# Extract clean text
if transcript_html:
    clean_text = extract_text(transcript_html)
    print("\nFirst 500 characters of transcript:")
    print(clean_text[:500])
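The simple extractor keeps whatever text sits inside `<script>` and `<style>` tags, and it joins text from adjacent paragraphs with no separator when the HTML has no line breaks between them. A possible variant, sketched below, skips those two tags and starts a new line at block-level elements. The class name `BlockAwareTextExtractor`, the function `extract_text_blocks`, and the choice of block tags are assumptions for illustration.

```python
from html.parser import HTMLParser


class BlockAwareTextExtractor(HTMLParser):
    SKIP_TAGS = {'script', 'style'}
    BLOCK_TAGS = {'p', 'div', 'br', 'tr', 'li', 'h1', 'h2', 'h3', 'h4', 'table'}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append('\n')

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_text_blocks(html_content):
    """Return visible text with one line per block and collapsed whitespace."""
    parser = BlockAwareTextExtractor()
    parser.feed(html_content)
    parser.close()  # Flush any text still buffered by the parser
    raw_lines = ''.join(parser.parts).split('\n')
    lines = [' '.join(line.split()) for line in raw_lines]
    return '\n'.join(line for line in lines if line)


sample_html = ('<html><style>p {color: red}</style>'
               '<p>Operator: Good afternoon.</p><p>Thank  you.</p></html>')
print(extract_text_blocks(sample_html))
```

Keeping one speaker turn or paragraph per line makes later steps, such as splitting by speaker, considerably easier.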
Complete Working Example
Here’s everything put together in a single script:
python
import requests
import time
from html.parser import HTMLParser

# CRITICAL: Replace with your actual information
headers = {
    'User-Agent': 'YourName yourname@email.com'
}

class HTMLTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

    def get_text(self):
        return ''.join(self.text)

def get_recent_filings(cik, headers):
    """Get recent filings for a company."""
    url = f'https://data.sec.gov/submissions/CIK{cik}.json'
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.json()

def get_filing_documents(cik, accession_number, headers):
    """Get the list of documents in a filing."""
    accession_clean = accession_number.replace('-', '')
    filing_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/index.json'
    time.sleep(0.1)  # Be respectful
    response = requests.get(filing_url, headers=headers)
    response.raise_for_status()
    return response.json()

def download_document(cik, accession_number, document_name, headers):
    """Download a specific document from a filing."""
    accession_clean = accession_number.replace('-', '')
    doc_url = f'https://www.sec.gov/Archives/edgar/data/{cik.lstrip("0")}/{accession_clean}/{document_name}'
    time.sleep(0.1)  # Be respectful
    response = requests.get(doc_url, headers=headers)
    response.raise_for_status()
    return response.text

def extract_text(html_content):
    """Extract plain text from HTML."""
    parser = HTMLTextExtractor()
    parser.feed(html_content)
    text = parser.get_text()
    lines = [line.strip() for line in text.split('\n')]
    lines = [line for line in lines if line]
    return '\n'.join(lines)

# Main execution
cik = '0001045810'  # NVIDIA

# Step 1: Get recent filings
print("Fetching NVIDIA's recent filings...")
filings_data = get_recent_filings(cik, headers)

# Step 2: Filter for 8-K forms
recent_filings = filings_data['filings']['recent']
eightk_filings = []
for i, form in enumerate(recent_filings['form']):
    if form == '8-K':
        eightk_filings.append({
            'accession': recent_filings['accessionNumber'][i],
            'date': recent_filings['filingDate'][i],
            'document': recent_filings['primaryDocument'][i]
        })
print(f"\nFound {len(eightk_filings)} recent 8-K filings")

# Step 3: Get documents from the most recent 8-K
if eightk_filings:
    recent_8k = eightk_filings[0]
    print(f"\nExamining 8-K from {recent_8k['date']}")
    docs = get_filing_documents(cik, recent_8k['accession'], headers)

    # Step 4: Find and download transcript
    for item in docs['directory']['item']:
        if 'ex99' in item['name'].lower():
            print(f"Downloading {item['name']}...")
            html_content = download_document(cik, recent_8k['accession'], item['name'], headers)

            # Step 5: Extract text
            clean_text = extract_text(html_content)

            # Save to file
            filename = f"nvda_transcript_{recent_8k['date']}.txt"
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(clean_text)
            print(f"Saved transcript to {filename}")
            print(f"\nFirst 500 characters:\n{clean_text[:500]}")
            break
Important Reminders
- Always include your User-Agent header – Every single request must identify who you are
- Be respectful with request timing – Include small delays between requests (we used 0.1 seconds)
- Not all 8-Ks contain transcripts – Some 8-Ks are for other announcements, so you may need to check multiple filings
- Rate limiting – The SEC's fair access guidelines cap automated traffic at 10 requests per second, and exceeding that can get your IP temporarily blocked
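Rather than placing a fixed `time.sleep(0.1)` before each call, one option is a small throttle object that waits only as long as needed to keep requests spaced apart. The sketch below is an illustration under assumptions: `Throttle` is a hypothetical class, and the 0.1 second interval simply mirrors the delay used in this tutorial.

```python
import time


class Throttle:
    """Block just long enough to keep calls at least min_interval seconds apart."""

    def __init__(self, min_interval=0.1):
        self.min_interval = min_interval
        self._last_call = None  # Monotonic timestamp of the previous call

    def wait(self):
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


throttle = Throttle(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # Call this right before each requests.get(...)
elapsed = time.monotonic() - start
print(f'3 throttled calls took {elapsed:.2f}s')
```

Because the throttle measures time since the previous call, slow responses do not add extra delay on top of the interval.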
Next Steps
Now that you can download earnings call transcripts, you could:
- Build a database of transcripts across multiple quarters
- Analyze sentiment or key topics in the transcripts
- Compare management commentary across different time periods
- Extract specific sections like Q&A portions
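As an illustration of the last idea, transcripts often mark the start of the Q&A with a heading such as "Question-and-Answer Session". The wording differs between documents, so the pattern below is an assumption that would need adjusting for real data. `split_prepared_and_qa` is a hypothetical helper name and the sample text is invented.

```python
import re

# Heading variants assumed here: 'Question-and-Answer ...', 'Questions and Answers', 'Q&A'.
QA_HEADING_RE = re.compile(
    r'^\s*(question[- ]and[- ]answer|questions and answers|q\s*&\s*a)\b.*$',
    re.IGNORECASE | re.MULTILINE,
)


def split_prepared_and_qa(text):
    """Split text at the first Q&A heading; return (prepared_remarks, qa_section)."""
    match = QA_HEADING_RE.search(text)
    if match is None:
        return text, ''  # No heading found: treat everything as prepared remarks
    return text[:match.start()].rstrip(), text[match.end():].lstrip()


sample_text = (
    'Prepared remarks line.\n'
    'Question-and-Answer Session\n'
    'Analyst: First question?'
)
prepared, qa = split_prepared_and_qa(sample_text)
print(prepared)
print('---')
print(qa)
```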
The key is to always remember to properly identify yourself to the SEC and be respectful of their servers. Happy analyzing!