Scrape Sports Scores from ESPN with Python (Scoreboard API + Normalized CSV)
ESPN is one of the easiest “real world” sports targets because it’s continuously updated and (crucially) exposes a public JSON scoreboard feed for many leagues.
In this guide we’ll build a scraper that:
- pulls scoreboards for multiple sports/leagues via ESPN’s JSON endpoints
- normalizes teams, scores, game status, and links
- exports both CSV + JSON for downstream analysis

Scoreboard scrapes fail for boring reasons: rate limits, transient 5xxs, and IP reputation. ProxiesAPI fits cleanly into your fetch layer so retries + rotation are a small change — not a rewrite of your parser.
What we’re scraping (ESPN scoreboard JSON)
ESPN has a family of endpoints shaped like:
https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard
Examples:
- NBA: .../sports/basketball/nba/scoreboard
- NFL: .../sports/football/nfl/scoreboard
- MLB: .../sports/baseball/mlb/scoreboard
You can often pass a date as dates=YYYYMMDD:
.../scoreboard?dates=20260530
The response includes:
- events[]: each scheduled/live/final game
- per-team score, display names, and IDs
- game status (pre/in/final) and short detail text
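To make the shape concrete, here is a minimal sketch that walks a hand-written payload. The `SAMPLE` dict, its values, and the `summarize` and `team_name` helpers are hypothetical and heavily abridged; only the key names mirror the fields this guide reads, and a real response carries many more.

```python
# Hypothetical, heavily abridged payload: key names follow the fields used in
# this guide, but every value here is made up for illustration.
SAMPLE = {
    "events": [
        {
            "id": "401000001",
            "date": "2026-05-30T23:30Z",
            "competitions": [
                {
                    "status": {"type": {"state": "in", "shortDetail": "Q3 4:12"}},
                    "competitors": [
                        {"homeAway": "home", "score": "78",
                         "team": {"id": "1", "displayName": "Home Team"}},
                        {"homeAway": "away", "score": "74",
                         "team": {"id": "2", "displayName": "Away Team"}},
                    ],
                }
            ],
        }
    ]
}


def team_name(competitor):
    """Read the display name, tolerating a missing 'team' object."""
    return (competitor.get("team") or {}).get("displayName")


def summarize(payload):
    """Return one 'AWAY score @ HOME score [state]' line per event."""
    lines = []
    for ev in payload.get("events") or []:
        comp = (ev.get("competitions") or [{}])[0]
        state = ((comp.get("status") or {}).get("type") or {}).get("state")
        by_side = {c.get("homeAway"): c for c in comp.get("competitors") or []}
        home = by_side.get("home", {})
        away = by_side.get("away", {})
        lines.append(
            f"{team_name(away)} {away.get('score')} @ "
            f"{team_name(home)} {home.get('score')} [{state}]"
        )
    return lines


print(summarize(SAMPLE))
```

Note that scores arrive as strings in this sample, which is why the parser later converts them with a forgiving integer helper.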
We’re not going to “screen scrape” the ESPN HTML in this tutorial. The JSON feed is the more stable source of truth and the best first choice when it’s available.
Setup
python3 -m venv .venv
source .venv/bin/activate
pip install requests tenacity python-dateutil
Step 1: Build a fetch layer (timeouts + retries)
Create espn_scores.py:
from __future__ import annotations

import os
import time
from dataclasses import dataclass
from typing import Any

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

TIMEOUT = (10, 30)  # connect, read

HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0; +https://proxiesapi.com)",
    "Accept": "application/json,text/plain,*/*",
}

session = requests.Session()
session.headers.update(HEADERS)


def build_proxies() -> dict[str, str] | None:
    proxy = os.getenv("PROXIESAPI_PROXY")
    if not proxy:
        return None
    return {"http": f"http://{proxy}", "https": f"http://{proxy}"}


PROXIES = build_proxies()


@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=20))
def fetch_json(url: str) -> dict[str, Any]:
    r = session.get(url, timeout=TIMEOUT, proxies=PROXIES)
    r.raise_for_status()
    return r.json()
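As written, the decorator retries on any exception, so a permanent failure such as a 404 is retried five times as well. One way to narrow that is a predicate that separates transient failures from permanent ones. The sketch below is a rough heuristic, not part of the original script: the `is_transient` name and the status set are assumptions, and it relies only on the facts that `requests` exceptions derive from `OSError` and that HTTP errors carry a `.response`.

```python
# Assumption: these are the statuses worth retrying; tune for your workload.
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}


def is_transient(exc: BaseException) -> bool:
    """Heuristic: True if the failure looks worth retrying."""
    # requests' exceptions derive from OSError; anything else (for example a
    # bug in our own code) should fail fast instead of being retried.
    if not isinstance(exc, OSError):
        return False
    response = getattr(exc, "response", None)
    if response is None:
        # No HTTP response at all (timeout, connection reset): usually retryable.
        return True
    return getattr(response, "status_code", None) in TRANSIENT_STATUSES
```

With tenacity, a predicate like this could be passed as `retry=retry_if_exception(is_transient)` alongside the existing `stop` and `wait` arguments.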
ProxiesAPI integration point
If you’re running this on a schedule (daily scoreboards across leagues), the simplest integration is to set an environment variable:
export PROXIESAPI_PROXY="YOUR_PROXIESAPI_PROXY"
Then the same code routes requests through ProxiesAPI via proxies=....
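Because build_proxies() prepends http:// itself, the variable is expected to hold a bare host:port (optionally with credentials), not a full URL. The sketch below shows the mapping and tolerates a value that already carries a scheme. The `proxies_from` helper and the example.com values are hypothetical, not ProxiesAPI's documented format.

```python
def proxies_from(value):
    """Build a requests-style proxies mapping from a raw setting."""
    if not value:
        return None  # no proxy configured: requests connects directly
    # Assumption: a bare "host:port" or "user:pass@host:port" means plain HTTP.
    if "://" not in value:
        value = f"http://{value}"
    return {"http": value, "https": value}


# Placeholder values for illustration only.
print(proxies_from("user:pass@proxy.example.com:8080"))
print(proxies_from("http://proxy.example.com:8080"))
print(proxies_from(""))
```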
Step 2: Normalize the ESPN payload into “games”
ESPN’s JSON is rich but nested. We’ll extract a flat, analysis-friendly row per game:
- league
- date/time
- home/away team name + score
- status (pre/in/final) and short detail text
- ESPN event URL
from dateutil import parser as dt


@dataclass
class GameRow:
    league: str
    event_id: str
    start_time_utc: str | None
    status: str | None
    status_detail: str | None
    home_team: str | None
    away_team: str | None
    home_score: int | None
    away_score: int | None
    event_url: str | None


def to_int(v: Any) -> int | None:
    try:
        return int(v)
    except Exception:
        return None


def parse_scoreboard(payload: dict[str, Any], league: str) -> list[GameRow]:
    out: list[GameRow] = []
    for ev in payload.get("events", []) or []:
        competitions = ev.get("competitions", []) or []
        comp = competitions[0] if competitions else {}

        status = (comp.get("status") or {}).get("type") or {}
        status_state = status.get("state")
        status_detail = status.get("shortDetail") or status.get("detail")

        competitors = comp.get("competitors", []) or []
        home = next((c for c in competitors if c.get("homeAway") == "home"), {})
        away = next((c for c in competitors if c.get("homeAway") == "away"), {})
        home_team = (home.get("team") or {}).get("displayName")
        away_team = (away.get("team") or {}).get("displayName")
        home_score = to_int(home.get("score"))
        away_score = to_int(away.get("score"))

        start_time = ev.get("date")
        start_time_utc = dt.parse(start_time).isoformat() if start_time else None

        # Links appear to be attached to the event; fall back to the competition.
        links = ev.get("links") or comp.get("links") or []
        event_url = links[0].get("href") if links else None

        out.append(
            GameRow(
                league=league,
                event_id=str(ev.get("id") or ""),
                start_time_utc=start_time_utc,
                status=status_state,
                status_detail=status_detail,
                home_team=home_team,
                away_team=away_team,
                home_score=home_score,
                away_score=away_score,
                event_url=event_url,
            )
        )
    return out
Step 3: Crawl multiple leagues (one date) and export CSV
import csv
import json
from dataclasses import asdict, fields
from datetime import datetime, timezone

LEAGUES = {
    "nba": ("basketball", "nba"),
    "nfl": ("football", "nfl"),
    "mlb": ("baseball", "mlb"),
    "nhl": ("hockey", "nhl"),
}


def scoreboard_url(sport: str, league: str, yyyymmdd: str | None = None) -> str:
    base = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard"
    if not yyyymmdd:
        return base
    return f"{base}?dates={yyyymmdd}"


def crawl_date(yyyymmdd: str) -> list[GameRow]:
    rows: list[GameRow] = []
    for league_key, (sport, league) in LEAGUES.items():
        payload = fetch_json(scoreboard_url(sport, league, yyyymmdd))
        rows.extend(parse_scoreboard(payload, league_key))
        time.sleep(0.25)  # be polite
    return rows


def export(rows: list[GameRow], out_prefix: str) -> None:
    json_path = f"{out_prefix}.json"
    csv_path = f"{out_prefix}.csv"

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in rows], f, ensure_ascii=False, indent=2)

    # Take the columns from the dataclass so an empty day still gets a header row.
    fieldnames = [fld.name for fld in fields(GameRow)]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow(asdict(r))

    print("wrote:", json_path, csv_path, "rows:", len(rows))


if __name__ == "__main__":
    today = datetime.now(timezone.utc).strftime("%Y%m%d")
    rows = crawl_date(today)
    export(rows, out_prefix=f"espn-scoreboards-{today}")
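The script above crawls a single date. For a backfill, one simple approach is to generate the dates=YYYYMMDD values for a range and call crawl_date once per day. The `date_range` helper below is a hypothetical addition, not part of the original script.

```python
from datetime import date, timedelta


def date_range(start: date, end: date) -> list:
    """Return YYYYMMDD strings for every day from start to end, inclusive."""
    days = (end - start).days
    # An end before start yields an empty list rather than an error.
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


print(date_range(date(2026, 5, 30), date(2026, 6, 1)))
# A backfill loop could then look like:
#   for day in date_range(start, end):
#       export(crawl_date(day), out_prefix=f"espn-scoreboards-{day}")
```

Writing one file per day keeps a failed day easy to re-run without repeating the rest of the range.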
Practical notes (the stuff that breaks scrapers)
- Date + timezone: ESPN’s timestamps are ISO strings; normalize to UTC and only convert at the edges.
- Offseason dates: some leagues return empty events; treat that as “no games”, not a failure.
- Stable IDs: store event_id + team IDs if you’re doing joins across days.
- Be honest about endpoints: this is a public feed, but it’s not a formal contract. Keep your parser defensive.
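Two of the notes above can be sketched with the standard library alone: normalizing a timestamp to UTC, and using (league, event_id) as a stable key when the same game shows up in more than one crawl. The `to_utc` and `dedupe_latest` helpers are hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def to_utc(iso_string: str) -> datetime:
    """Parse an ISO timestamp and return an aware datetime in UTC."""
    # Older Pythons reject a trailing "Z", so rewrite it as an explicit offset.
    parsed = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def dedupe_latest(rows):
    """Keep the last-seen row per (league, event_id), preserving first-seen order."""
    latest = {}
    for row in rows:
        latest[(row["league"], row["event_id"])] = row
    return list(latest.values())


start = to_utc("2026-05-30T23:30Z")
print(start.isoformat())
# Convert only at the edge, for display. A fixed offset is used here for
# illustration; zoneinfo.ZoneInfo would handle daylight saving properly.
print(start.astimezone(timezone(timedelta(hours=-4))).isoformat())
```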
Where ProxiesAPI fits (honestly)
If you’re scraping one league once a day, you may never need proxies.
But when you scale into multiple leagues, backfills, and event detail pages (box scores, play-by-play, rosters), the failure rate climbs. ProxiesAPI helps by making the network layer stable: rotation + retries without you building your own proxy plumbing.