Scrape Sports Scores from ESPN with Python (Scoreboard API + Normalized CSV)

May 30, 2026 · tutorial · #python, #espn, #sports, #web-scraping, #json, #csv, #automation

ESPN is one of the easiest “real world” sports targets because it’s continuously updated and (crucially) exposes a public JSON scoreboard feed for many leagues.

In this guide we’ll build a scraper that:

pulls scoreboards for multiple sports/leagues via ESPN’s JSON endpoints
normalizes teams, scores, game status, and links
exports both CSV + JSON for downstream analysis


Keep daily scoreboard jobs stable with ProxiesAPI

Scoreboard scrapes fail for boring reasons: rate limits, transient 5xxs, and IP reputation. ProxiesAPI fits cleanly into your fetch layer so retries + rotation are a small change — not a rewrite of your parser.


What we’re scraping (ESPN scoreboard JSON)

ESPN has a family of endpoints shaped like:

https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard

Examples:

NBA: .../sports/basketball/nba/scoreboard
NFL: .../sports/football/nfl/scoreboard
MLB: .../sports/baseball/mlb/scoreboard

You can often pass a date as dates=YYYYMMDD:

.../scoreboard?dates=20260530
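For backfills you will want one dates value per day. The following is a minimal sketch using only the standard library. The helper name yyyymmdd_range is hypothetical and not part of the script built in this guide, and it assumes you issue one request per day with a single dates=YYYYMMDD value as shown above.

```python
from datetime import date, timedelta
from typing import List


def yyyymmdd_range(start: date, end: date) -> List[str]:
    """Return one YYYYMMDD string per day from start to end, inclusive."""
    days = (end - start).days
    # A negative span (end before start) yields an empty list.
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


# Three consecutive days, crossing a month boundary.
print(yyyymmdd_range(date(2026, 5, 30), date(2026, 6, 1)))
# prints ['20260530', '20260531', '20260601']
```

Each string can then be passed as the dates value for one scoreboard request.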

The response includes:

events[]: each scheduled/live/final game
per-team score, display names, and IDs
game status (the state field is pre, in, or post) and short detail text
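To make the nesting concrete, here is a hand-written, heavily trimmed payload in the shape the parser in Step 2 reads. The team names, scores, and event ID are invented for illustration, the state value "post" for a finished game is an assumption, and a real response carries many more fields. Only the key names are taken from the code in this guide.

```python
import json
from typing import Any, Dict, List

# Invented sample data: not a real ESPN response.
SAMPLE: Dict[str, Any] = json.loads("""
{
  "events": [
    {
      "id": "0000001",
      "date": "2026-05-30T23:00Z",
      "competitions": [
        {
          "status": {"type": {"state": "post", "shortDetail": "Final"}},
          "competitors": [
            {"homeAway": "home", "score": "101", "team": {"displayName": "Home Team"}},
            {"homeAway": "away", "score": "99", "team": {"displayName": "Away Team"}}
          ]
        }
      ]
    }
  ]
}
""")


def summarize(payload: Dict[str, Any]) -> List[str]:
    """Build one 'away N @ home M (state)' line per event."""
    lines = []
    for event in payload["events"]:
        comp = event["competitions"][0]
        # Index the two competitors by their homeAway flag.
        side = {c["homeAway"]: c for c in comp["competitors"]}
        home, away = side["home"], side["away"]
        state = comp["status"]["type"]["state"]
        lines.append(
            f'{away["team"]["displayName"]} {away["score"]} @ '
            f'{home["team"]["displayName"]} {home["score"]} ({state})'
        )
    return lines


print(summarize(SAMPLE))  # prints ['Away Team 99 @ Home Team 101 (post)']
```

This version indexes keys directly to keep the shape readable. The parser in Step 2 uses .get() throughout because real payloads can omit any of these fields.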

We’re not going to “screen scrape” the ESPN HTML in this tutorial. The JSON feed is the more stable source of truth and the best first choice when it’s available.

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests tenacity python-dateutil

Step 1: Build a fetch layer (timeouts + retries)

Create espn_scores.py:

from __future__ import annotations

import os
import time
from dataclasses import dataclass
from typing import Any

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

TIMEOUT = (10, 30)  # connect, read
HEADERS = {
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0; +https://proxiesapi.com)",
    "Accept": "application/json,text/plain,*/*",
}

session = requests.Session()
session.headers.update(HEADERS)


def build_proxies() -> dict[str, str] | None:
    proxy = os.getenv("PROXIESAPI_PROXY")
    if not proxy:
        return None
    return {"http": f"http://{proxy}", "https": f"http://{proxy}"}


PROXIES = build_proxies()


@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=20))
def fetch_json(url: str) -> dict[str, Any]:
    r = session.get(url, timeout=TIMEOUT, proxies=PROXIES)
    r.raise_for_status()
    return r.json()
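One thing to be aware of: the decorator above retries on any exception, so a 404 from a mistyped league slug is retried five times before it surfaces. Below is a stdlib-only sketch of a narrower policy that retries only on 429, 5xx, and connection-level errors. HttpStatusError, is_transient, and call_with_retries are hypothetical names used for illustration, not part of requests or tenacity.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_transient(exc: Exception) -> bool:
    """Retry on 429 and 5xx; give up immediately on other 4xx."""
    if isinstance(exc, HttpStatusError):
        return exc.status == 429 or 500 <= exc.status < 600
    # Treat connection-level failures as worth another attempt.
    return isinstance(exc, (ConnectionError, TimeoutError))


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, backing off exponentially between transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Re-raise on the last attempt or on a non-retryable error.
            if attempt == attempts or not is_transient(exc):
                raise
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("attempts must be >= 1")  # only reached if attempts < 1
```

With requests, the status code is available on the raised HTTPError as exc.response.status_code, and tenacity accepts a predicate through its retry argument (for example retry_if_exception), so the same rule can be wired into the existing decorator.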

ProxiesAPI integration point

If you’re running this on a schedule (daily scoreboards across leagues), the simplest integration is to set an environment variable:

export PROXIESAPI_PROXY="YOUR_PROXIESAPI_PROXY"

Then the same code routes requests through ProxiesAPI via proxies=....
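Note that build_proxies always prepends http://, so a value that already includes a scheme would become http://http://.... A small sketch that tolerates both forms follows. The function name proxies_from_value is hypothetical, and the accepted formats (host:port, user:pass@host:port, or a full URL) are an assumption, so check your provider's documentation for the exact value to use.

```python
from typing import Dict, Optional


def proxies_from_value(value: Optional[str]) -> Optional[Dict[str, str]]:
    """Turn a proxy string into a requests-style proxies mapping."""
    if not value or not value.strip():
        return None  # no proxy configured: connect directly
    value = value.strip()
    # Only add a scheme when the value does not already carry one.
    if "://" not in value:
        value = f"http://{value}"
    return {"http": value, "https": value}


print(proxies_from_value("user:pass@proxy.example:8080"))
print(proxies_from_value("http://proxy.example:8080"))
print(proxies_from_value(""))  # prints None
```

In the script above this would be called as proxies_from_value(os.getenv("PROXIESAPI_PROXY")).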

Step 2: Normalize the ESPN payload into “games”

ESPN’s JSON is rich but nested. We’ll extract a flat, analysis-friendly row per game:

league
date/time
home/away team name + score
status (the state field is pre, in, or post) and short detail text
ESPN event URL

from dateutil import parser as dt


@dataclass
class GameRow:
    league: str
    event_id: str
    start_time_utc: str | None
    status: str | None
    status_detail: str | None
    home_team: str | None
    away_team: str | None
    home_score: int | None
    away_score: int | None
    event_url: str | None


def to_int(v: Any) -> int | None:
    # Scores may be strings, or missing entirely (for example before a game starts).
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


def parse_scoreboard(payload: dict[str, Any], league: str) -> list[GameRow]:
    out: list[GameRow] = []

    for ev in payload.get("events", []) or []:
        competitions = ev.get("competitions", []) or []
        comp = competitions[0] if competitions else {}

        status = (comp.get("status") or {}).get("type") or {}
        status_state = status.get("state")
        status_detail = status.get("shortDetail") or status.get("detail")

        competitors = comp.get("competitors", []) or []
        home = next((c for c in competitors if c.get("homeAway") == "home"), {})
        away = next((c for c in competitors if c.get("homeAway") == "away"), {})

        home_team = (home.get("team") or {}).get("displayName")
        away_team = (away.get("team") or {}).get("displayName")

        home_score = to_int(home.get("score"))
        away_score = to_int(away.get("score"))

        start_time = ev.get("date")
        start_time_utc = dt.parse(start_time).isoformat() if start_time else None

        # Look for links on the event first, then fall back to the competition.
        links = ev.get("links") or comp.get("links") or []
        event_url = links[0].get("href") if links else None

        out.append(
            GameRow(
                league=league,
                event_id=str(ev.get("id") or ""),
                start_time_utc=start_time_utc,
                status=status_state,
                status_detail=status_detail,
                home_team=home_team,
                away_team=away_team,
                home_score=home_score,
                away_score=away_score,
                event_url=event_url,
            )
        )

    return out
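The (x.get("key") or {}) pattern in the parser matters because a key can be present with a null value. In that case .get("key", {}) returns None, and the next .get() raises. Here is a small sketch of the same idea as a reusable helper. The name dig is hypothetical, not something the script above defines.

```python
from typing import Any


def dig(obj: Any, *keys: str) -> Any:
    """Follow keys through nested dicts, returning None at the first gap."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


competitor = {"homeAway": "home", "team": None}  # key present, value null

# A plain default does not help here: the key exists, so {} is never used.
assert competitor.get("team", {}) is None

print(dig(competitor, "team", "displayName"))  # prints None
print(dig({"team": {"displayName": "Home Team"}}, "team", "displayName"))  # prints Home Team
```

Either style works. The point is that a missing or null branch should produce None in the row rather than an exception that drops the whole scoreboard.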

Step 3: Crawl multiple leagues (one date) and export CSV

import csv
import json
from dataclasses import asdict
from datetime import datetime, timezone

LEAGUES = {
    "nba": ("basketball", "nba"),
    "nfl": ("football", "nfl"),
    "mlb": ("baseball", "mlb"),
    "nhl": ("hockey", "nhl"),
}


def scoreboard_url(sport: str, league: str, yyyymmdd: str | None = None) -> str:
    base = f"https://site.api.espn.com/apis/site/v2/sports/{sport}/{league}/scoreboard"
    if not yyyymmdd:
        return base
    return f"{base}?dates={yyyymmdd}"


def crawl_date(yyyymmdd: str) -> list[GameRow]:
    rows: list[GameRow] = []
    for league_key, (sport, league) in LEAGUES.items():
        payload = fetch_json(scoreboard_url(sport, league, yyyymmdd))
        rows.extend(parse_scoreboard(payload, league_key))
        time.sleep(0.25)  # be polite
    return rows


def export(rows: list[GameRow], out_prefix: str) -> None:
    json_path = f"{out_prefix}.json"
    csv_path = f"{out_prefix}.csv"

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump([asdict(r) for r in rows], f, ensure_ascii=False, indent=2)

    # Take column names from the dataclass so the header is written even when rows is empty.
    fieldnames = list(GameRow.__dataclass_fields__)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow(asdict(r))

    print("wrote:", json_path, csv_path, "rows:", len(rows))


if __name__ == "__main__":
    today = datetime.now(timezone.utc).strftime("%Y%m%d")
    rows = crawl_date(today)
    export(rows, out_prefix=f"espn-scoreboards-{today}")
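As written, crawl_date lets an exception from any single league abort the whole run once retries are exhausted. Below is a minimal sketch of isolating failures per league. crawl_leagues is a hypothetical name, and the fetch callable is injected so the example runs without network access; in the script it would wrap fetch_json and scoreboard_url.

```python
from typing import Any, Callable, Dict, Tuple

Payload = Dict[str, Any]


def crawl_leagues(
    leagues: Dict[str, Tuple[str, str]],
    fetch: Callable[[str, str], Payload],
) -> Tuple[Dict[str, Payload], Dict[str, str]]:
    """Fetch every league, collecting failures instead of raising."""
    payloads: Dict[str, Payload] = {}
    errors: Dict[str, str] = {}
    for key, (sport, league) in leagues.items():
        try:
            payloads[key] = fetch(sport, league)
        except Exception as exc:  # one bad league should not sink the run
            errors[key] = f"{type(exc).__name__}: {exc}"
    return payloads, errors


def fake_fetch(sport: str, league: str) -> Payload:
    """Stand-in fetcher: fails for one league, returns no games for the rest."""
    if league == "nhl":
        raise TimeoutError("read timed out")
    return {"events": []}


ok, failed = crawl_leagues(
    {"nba": ("basketball", "nba"), "nhl": ("hockey", "nhl")}, fake_fetch
)
print(sorted(ok))  # prints ['nba']
print(failed)      # prints {'nhl': 'TimeoutError: read timed out'}
```

Returning the errors alongside the payloads lets a scheduled job export what it has and still report which leagues need a rerun.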

Practical notes (the stuff that breaks scrapers)

Date + timezone: ESPN’s timestamps are ISO strings; normalize to UTC and only convert at the edges.
Offseason dates: some leagues return empty events; treat that as “no games”, not a failure.
Stable IDs: store event_id + team IDs if you’re doing joins across days.
Be honest about endpoints: this is a public feed, but it’s not a formal contract. Keep your parser defensive.
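On the first note: parsing a timestamp keeps whatever offset the string carries, so an explicit conversion is what actually guarantees UTC. Here is a stdlib-only sketch, using datetime rather than the dateutil parser from Step 2. The function name to_utc_iso is hypothetical, the sample strings are illustrative, and treating offset-less timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone


def to_utc_iso(value: str) -> str:
    """Parse an ISO 8601 timestamp and return it as a UTC ISO string."""
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


print(to_utc_iso("2026-05-30T23:00Z"))          # prints 2026-05-30T23:00:00+00:00
print(to_utc_iso("2026-05-30T19:00:00-04:00"))  # prints 2026-05-30T23:00:00+00:00
```

Storing the UTC string in the CSV and converting to a local zone only when displaying keeps joins across days and leagues consistent.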

Where ProxiesAPI fits (honestly)

If you’re scraping one league once a day, you may never need proxies.

But when you scale into multiple leagues, backfills, and event detail pages (box scores, play-by-play, rosters), the failure rate climbs. ProxiesAPI helps by making the network layer stable: rotation + retries without you building your own proxy plumbing.
