Scrape Sports Scores from ESPN with Python (Scoreboard API + Normalized CSV)

ESPN is one of the easiest “real world” sports targets because it’s continuously updated and (crucially) exposes a public JSON scoreboard feed for many leagues.

In this guide we’ll build a scraper that:

  • pulls scoreboards for multiple sports/leagues via ESPN’s JSON endpoints
  • normalizes teams, scores, game status, and links
  • exports both CSV + JSON for downstream analysis

Keep daily scoreboard jobs stable with ProxiesAPI

Scoreboard scrapes fail for boring reasons: rate limits, transient 5xxs, and IP reputation. ProxiesAPI fits cleanly into your fetch layer so retries + rotation are a small change — not a rewrite of your parser.


What we’re scraping (ESPN scoreboard JSON)

ESPN has a family of endpoints shaped like:

https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard

Examples:

  • NBA: .../sports/basketball/nba/scoreboard
  • NFL: .../sports/football/nfl/scoreboard
  • MLB: .../sports/baseball/mlb/scoreboard

You can often pass a date as dates=YYYYMMDD:

.../scoreboard?dates=20260530

The response includes:

  • events[]: each scheduled/live/final game
  • per-team score, display names, and IDs
  • game status (state is pre, in, or post) and short detail text
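To make that shape concrete, here is a hand-written, heavily trimmed sample in the layout this guide relies on. The team names, scores, and IDs are invented for illustration, and real responses carry many more fields; treat the key paths as what the parser reads rather than a documented schema. `summarize` is a hypothetical helper used only to walk the structure.

```python
from typing import Any

# Hand-written illustration, NOT a captured ESPN response.
SAMPLE_PAYLOAD: dict[str, Any] = {
    "events": [
        {
            "id": "0001",  # hypothetical event ID
            "date": "2026-05-30T23:00Z",
            "competitions": [
                {
                    # finished games report state "post"
                    "status": {"type": {"state": "post", "shortDetail": "Final"}},
                    "competitors": [
                        {"homeAway": "home", "score": "101",
                         "team": {"id": "1", "displayName": "Home Team"}},
                        {"homeAway": "away", "score": "99",
                         "team": {"id": "2", "displayName": "Away Team"}},
                    ],
                }
            ],
        }
    ]
}


def summarize(payload: dict[str, Any]) -> list[str]:
    """Return one 'Away 99 @ Home 101 (Final)' style line per event."""
    lines = []
    for ev in payload.get("events") or []:
        comp = (ev.get("competitions") or [{}])[0]
        by_side = {c.get("homeAway"): c for c in comp.get("competitors") or []}
        home, away = by_side.get("home", {}), by_side.get("away", {})
        home_name = (home.get("team") or {}).get("displayName")
        away_name = (away.get("team") or {}).get("displayName")
        detail = ((comp.get("status") or {}).get("type") or {}).get("shortDetail")
        lines.append(
            f"{away_name} {away.get('score')} @ {home_name} {home.get('score')} ({detail})"
        )
    return lines


print(summarize(SAMPLE_PAYLOAD))
```

Note that scores arrive as strings in this layout, which is why the normalization step later converts them to integers.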

We’re not going to “screen scrape” the ESPN HTML in this tutorial. The JSON feed is the more stable source of truth and the best first choice when it’s available.


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests tenacity python-dateutil

Step 1: Build a fetch layer (timeouts + retries)

Create espn_scores.py:

from __future__ import annotations

import os
import time
from dataclasses import dataclass
from typing import Any

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

TIMEOUT = (10, 30)  # connect, read
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0; +https://proxiesapi.com)",
    "Accept": "application/json,text/plain,*/*",
}

session = requests.Session()
session.headers.update(HEADERS)


def build_proxies() -> dict[str, str] | None:
    proxy = os.getenv("PROXIESAPI_PROXY")
    if not proxy:
        return None
    return {"http": f"http://{proxy}", "https": f"http://{proxy}"}


PROXIES = build_proxies()


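# reraise=True surfaces the original exception (e.g. requests.HTTPError) instead of tenacity.RetryError.
@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=20), reraise=True)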
def fetch_json(url: str) -> dict[str, Any]:
    r = session.get(url, timeout=TIMEOUT, proxies=PROXIES)
    r.raise_for_status()
    return r.json()
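
The decorator above retries on any exception, including a 404 for a mistyped league slug, which no amount of waiting will fix. One way to narrow that, sketched with the standard library only, is to decide per status code whether a retry is worthwhile and to cap the exponential delays. `is_retryable_status` and `backoff_delays` are hypothetical helpers, not part of requests or tenacity, and the delay schedule is an illustration rather than a reproduction of tenacity's internals.

```python
def is_retryable_status(status_code: int) -> bool:
    """429 (rate limited) and 5xx are usually transient; other 4xx are not."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 20.0) -> list[float]:
    """Delays slept between attempts: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(max(0, attempts - 1))]


print(is_retryable_status(503), is_retryable_status(404))
print(backoff_delays(5))
```

With tenacity, a predicate like this can be wired in through the `retry=` argument (for example via `retry_if_exception`), so non-retryable errors fail on the first attempt.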

ProxiesAPI integration point

If you’re running this on a schedule (daily scoreboards across leagues), the simplest integration is to set an environment variable:

export PROXIESAPI_PROXY="YOUR_PROXIESAPI_PROXY"

Then the same code routes requests through ProxiesAPI via proxies=....
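
Because build_proxies prepends http://, the variable is expected to hold the proxy address without a scheme. Here is a small sketch of the mapping requests ends up receiving. The host, port, and credentials are placeholders, not real ProxiesAPI values, and `proxies_from` is a hypothetical variant that takes the value as an argument so it can be exercised without touching the environment.

```python
from __future__ import annotations


def proxies_from(value: str | None) -> dict[str, str] | None:
    """Same mapping as build_proxies, but taking the value as an argument."""
    if not value:
        return None  # no proxy configured: requests connects directly
    # Both schemes point at an http:// proxy URL; HTTPS traffic is tunneled through it.
    return {"http": f"http://{value}", "https": f"http://{value}"}


# Placeholder address; substitute whatever your provider gives you.
print(proxies_from("user:pass@proxy.example.com:8080"))
print(proxies_from(""))
```

An unset or empty variable yields None, so the same script runs unchanged with or without a proxy.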


Step 2: Normalize the ESPN payload into “games”

ESPN’s JSON is rich but nested. We’ll extract a flat, analysis-friendly row per game:

  • league
  • date/time
  • home/away team name + score
  • status (state: pre, in, or post) and short detail text
  • ESPN event URL

from dateutil import parser as dt


@dataclass
class GameRow:
    league: str
    event_id: str
    start_time_utc: str | None
    status: str | None
    status_detail: str | None
    home_team: str | None
    away_team: str | None
    home_score: int | None
    away_score: int | None
    event_url: str | None


def to_int(v: Any) -> int | None:
    try:
        return int(v)
    except Exception:
        return None


def parse_scoreboard(payload: dict[str, Any], league: str) -> list[GameRow]:
    out: list[GameRow] = []

    for ev in payload.get("events", []) or []:
        competitions = ev.get("competitions", []) or []
        comp = competitions[0] if competitions else {}

        status = (comp.get("status") or {}).get("type") or {}
        status_state = status.get("state")
        status_detail = status.get("shortDetail") or status.get("detail")

        competitors = comp.get("competitors", []) or []
        home = next((c for c in competitors if c.get("homeAway") == "home"), {})
        away = next((c for c in competitors if c.get("homeAway") == "away"), {})

        home_team = (home.get("team") or {}).get("displayName")
        away_team = (away.get("team") or {}).get("displayName")

        home_score = to_int(home.get("score"))
        away_score = to_int(away.get("score"))

        start_time = ev.get("date")
        start_time_utc = dt.parse(start_time).isoformat() if start_time else None

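        # Assumption: depending on the payload, links may sit on the event rather than the competition.
        links = comp.get("links") or ev.get("links") or []
        event_url = links[0].get("href") if links else None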

        out.append(
            GameRow(
                league=league,
                event_id=str(ev.get("id") or ""),
                start_time_utc=start_time_utc,
                status=status_state,
                status_detail=status_detail,
                home_team=home_team,
                away_team=away_team,
                home_score=home_score,
                away_score=away_score,
                event_url=event_url,
            )
        )

    return out

Step 3: Crawl multiple leagues (one date) and export CSV + JSON

import csv
import json
from dataclasses import asdict
from datetime import datetime, timezone

LEAGUES = {
    "nba": ("basketball", "nba"),
    "nfl": ("football", "nfl"),
    "mlb": ("baseball", "mlb"),
    "nhl": ("hockey", "nhl"),
}


def scoreboard_url(sport: str, league: str, yyyymmdd: str | None = None) -> str:
    base = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard"
    if not yyyymmdd:
        return base
    return f"{base}?dates={yyyymmdd}"


def crawl_date(yyyymmdd: str) -> list[GameRow]:
    rows: list[GameRow] = []
    for league_key, (sport, league) in LEAGUES.items():
        payload = fetch_json(scoreboard_url(sport, league, yyyymmdd))
        rows.extend(parse_scoreboard(payload, league_key))
        time.sleep(0.25)  # be polite
    return rows


def export(rows: list[GameRow], out_prefix: str) -> None:
    json_path = f"{out_prefix}.json"
    csv_path = f"{out_prefix}.csv"

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in rows], f, ensure_ascii=False, indent=2)

    # Derive the header from the dataclass so a day with no games still gets a valid CSV header.
    fieldnames = list(GameRow.__dataclass_fields__)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow(asdict(r))

    print("wrote:", json_path, csv_path, "rows:", len(rows))


if __name__ == "__main__":
    today = datetime.now(timezone.utc).strftime("%Y%m%d")
    rows = crawl_date(today)
    export(rows, out_prefix=f"espn-scoreboards-{today}")
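
crawl_date handles a single day, so a backfill is a loop over days. A minimal sketch of the date generation, assuming an inclusive start and end; `date_range_yyyymmdd` is a hypothetical helper name, not something defined earlier in the script.

```python
from datetime import date, timedelta


def date_range_yyyymmdd(start: date, end: date) -> list[str]:
    """Inclusive list of YYYYMMDD strings from start to end."""
    if end < start:
        return []
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


print(date_range_yyyymmdd(date(2026, 5, 30), date(2026, 6, 2)))
```

Each string can be passed to crawl_date and then export, with a per-day out_prefix so that one failed day does not cost you the rest of the backfill.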

Practical notes (the stuff that breaks scrapers)

  • Date + timezone: ESPN’s timestamps are ISO strings; normalize to UTC and only convert at the edges.
  • Offseason dates: some leagues return empty events; treat that as “no games”, not a failure.
  • Stable IDs: store event_id + team IDs if you’re doing joins across days.
  • Be honest about endpoints: this is a public feed, but it’s not a formal contract. Keep your parser defensive.
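
The "normalize to UTC" note can be sketched with the standard library alone. This assumes the timestamp carries an offset or a trailing Z; `to_utc_iso` is a hypothetical helper, and it rejects naive timestamps rather than guessing a zone for them.

```python
from datetime import datetime, timezone


def to_utc_iso(ts: str) -> str:
    """Normalize an ISO 8601 timestamp with an offset (or trailing Z) to UTC."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit offset first.
    if ts.endswith("Z"):
        ts = ts[:-1] + "+00:00"
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no offset: {ts!r}")
    return parsed.astimezone(timezone.utc).isoformat()


print(to_utc_iso("2026-05-30T23:00Z"))
print(to_utc_iso("2026-05-30T19:00:00-04:00"))
```

Both calls describe the same instant, so they produce the same UTC string; converting to a local zone is then left to whatever displays the data.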

Where ProxiesAPI fits (honestly)

If you’re scraping one league once a day, you may never need proxies.

But when you scale into multiple leagues, backfills, and event detail pages (box scores, play-by-play, rosters), the failure rate climbs. ProxiesAPI helps by making the network layer stable: rotation + retries without you building your own proxy plumbing.
