Scrape Sports Scores from ESPN with Python (via ProxiesAPI)

ESPN has a clean scoreboard experience that’s perfect for building a practical scraper:

  • it’s updated frequently
  • it’s request-heavy if you crawl multiple sports/days
  • it’s the kind of site that can intermittently throttle or serve different markup

In this guide we’ll build a real Python scraper that:

  • fetches an ESPN scoreboard page
  • extracts each game row: teams, score, status, start time
  • optionally follows game links for a little extra metadata
  • exports results to CSV
  • uses a fetch layer that can route through ProxiesAPI when you scale

We’ll keep the extraction honest: parse the HTML you actually receive, and write selectors that fail loudly when ESPN changes markup.

[Screenshot: ESPN scoreboard page showing the game rows we’ll scrape]

Make score crawls more reliable with ProxiesAPI

Sports sites change and throttle. ProxiesAPI gives you a proxy-backed fetch URL plus optional JS rendering so your scraper finishes more runs with fewer network headaches.


What we’re scraping (ESPN scoreboard)

ESPN scoreboard URLs vary by sport and date. For example:

  • NBA scoreboard: https://www.espn.com/nba/scoreboard
  • NFL scoreboard: https://www.espn.com/nfl/scoreboard

Date parameters and in-page navigation often change the exact URL, but you can start with the main scoreboard page and iterate from there.

Quick sanity check

curl -sL "https://www.espn.com/nba/scoreboard" | head -n 15

If the response is mostly empty or looks like a “please enable JS” shell, you’ll need one of:

  • a different endpoint ESPN exposes (sometimes there are JSON feeds), or
  • to fetch through ProxiesAPI with JS rendering enabled (if available on your plan), or
  • a browser automation approach (Playwright) for this target.

This tutorial focuses on HTML extraction, and shows where to switch the fetch layer.


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We’ll use:

  • requests for HTTP
  • BeautifulSoup (with the lxml parser) for parsing

Step 1: A production fetch() with retries + ProxiesAPI

Two rules that make scrapers survive:

  1. timeouts always
  2. retries with exponential backoff

And when you scale, route the same request through ProxiesAPI.

import random
import time
from urllib.parse import quote

import requests

TIMEOUT = (10, 30)  # connect, read

HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

session = requests.Session()


def fetch(url: str, *, proxiesapi_key: str | None = None, retries: int = 4) -> str:
    last = None
    for attempt in range(1, retries + 1):
        try:
            if proxiesapi_key:
                # ProxiesAPI simple proxy-backed fetch URL.
                # Note: some accounts support extra params (rendering, country, etc.).
                proxied = (
                    "http://api.proxiesapi.com/?key="
                    + quote(proxiesapi_key)
                    + "&url="
                    + quote(url, safe="")
                )
                r = session.get(proxied, headers=HEADERS, timeout=TIMEOUT)
            else:
                r = session.get(url, headers=HEADERS, timeout=TIMEOUT)

            r.raise_for_status()
            return r.text
        except Exception as e:
            last = e
            if attempt < retries:
                # Exponential backoff with jitter; no sleep after the final attempt.
                time.sleep((2 ** attempt) + random.random())

    raise RuntimeError(f"fetch failed after {retries} attempts: {last}")

If you want to test both modes:

html_direct = fetch("https://www.espn.com/nba/scoreboard")
print("direct chars:", len(html_direct))

# html_proxy = fetch("https://www.espn.com/nba/scoreboard", proxiesapi_key="YOUR_KEY")
# print("proxied chars:", len(html_proxy))

Step 2: Inspect the HTML and choose selectors

ESPN’s markup shifts. You can’t rely on “one magic class” forever.

Practical approach:

  1. Save a snapshot of the HTML you received.
  2. Find repeated “game card” containers.
  3. Build selectors around structure (not just long class strings).

Save a local snapshot:

html = fetch("https://www.espn.com/nba/scoreboard")
with open("espn_scoreboard.html", "w", encoding="utf-8") as f:
    f.write(html)
print("wrote espn_scoreboard.html")

Open it and look for repeated blocks like:

  • a wrapper for each event/game
  • team names
  • score numbers
  • status text (final, Q4, scheduled time)

On many ESPN pages you’ll find some combination of:

  • section/div wrappers per event
  • links to the game recap (/game/_/gameId/...)
  • team name text within nested spans

We’ll implement parsing in a way that is easy to adapt: selectors are centralized.


Step 3: Parse game rows into structured records

This parser tries a few common patterns:

  • game containers are “cards” with a game link inside
  • within a card, there are two teams
  • each team has a name, and possibly a score
  • status is present near the top/bottom of the card

If ESPN changes markup, you typically only edit a couple of selectors.

import re
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = "https://www.espn.com"


def clean(text: str | None) -> str | None:
    if not text:
        return None
    t = re.sub(r"\s+", " ", text).strip()
    return t or None


def parse_int(text: str | None) -> int | None:
    if not text:
        return None
    m = re.search(r"\d+", text)
    return int(m.group(0)) if m else None


def parse_scoreboard(html: str, page_url: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    games: list[dict] = []

    # Heuristic: find links that look like game pages and walk up to a container.
    # ESPN game links often contain /game/_/gameId/
    game_links = soup.select('a[href*="/game/_/gameId/"]')

    seen = set()
    for a in game_links:
        href = a.get("href")
        if not href:
            continue
        game_url = urljoin(BASE, href)
        if game_url in seen:
            continue
        seen.add(game_url)

        # Find a reasonable container around the link
        card = a
        for _ in range(6):
            if not card:
                break
            # Stop when the container has enough text to plausibly be a game card
            if getattr(card, "get_text", None):
                txt = clean(card.get_text(" ", strip=True)) or ""
                if len(txt) > 40:
                    break
            card = card.parent

        container = card if card else a

        # Team names: look for two repeated name elements inside container.
        # These selectors are intentionally broad; you should tighten them based on a real snapshot.
        name_candidates = [
            el.get_text(" ", strip=True)
            for el in container.select('span, div')
            if el.get_text(strip=True)
        ]

        # Try to detect team-like names by excluding very short/very long tokens.
        # This is a fallback; ideally you target specific selectors once you inspect your snapshot.
        team_names = []
        for t in name_candidates:
            t = clean(t)
            if not t:
                continue
            if len(t) < 3 or len(t) > 40:
                continue
            # Skip obvious non-team tokens
            if t.lower() in {"final", "preview", "recap", "tickets"}:
                continue
            team_names.append(t)

        # De-dupe while preserving order
        uniq_names = []
        seen_name = set()
        for n in team_names:
            if n in seen_name:
                continue
            seen_name.add(n)
            uniq_names.append(n)

        # Scores: many cards show numeric scores; grab a few numbers.
        nums = [parse_int(clean(el.get_text(strip=True))) for el in container.select("span, div")]
        nums = [n for n in nums if n is not None]

        status = None
        # Status tends to include words like Final, Q1, Half, or a time.
        status_el = container.find(string=re.compile(r"Final|Q\d|Half|AM|PM", re.I))
        if status_el:
            status = clean(str(status_el))

        games.append({
            "page_url": page_url,
            "game_url": game_url,
            "teams_guess": uniq_names[:6],
            "scores_guess": nums[:6],
            "status_guess": status,
        })

    return games

This parser is deliberately conservative: it gives you a structured starting point even when ESPN changes HTML.

For a production scraper, you’ll do one more pass:

  • open espn_scoreboard.html
  • identify the exact game card container selector
  • tighten team name selectors to those elements

That turns “guessy” output into stable output.
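A tightened parser ends up much shorter than the heuristic one. The class names below are hypothetical stand-ins for whatever you find in your snapshot; swap them in and the rest of the function stays the same.

```python
from bs4 import BeautifulSoup

# Hypothetical selectors -- replace with the real ones from your snapshot.
CARD_SEL = "section.Scoreboard"      # assumed game-card container
TEAM_SEL = ".ScoreCell__TeamName"    # assumed team-name element
SCORE_SEL = ".ScoreCell__Score"      # assumed score element


def parse_tightened(html: str) -> list[dict]:
    """Parse game cards using exact selectors instead of heuristics."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    for card in soup.select(CARD_SEL):
        teams = [el.get_text(strip=True) for el in card.select(TEAM_SEL)]
        scores = [el.get_text(strip=True) for el in card.select(SCORE_SEL)]
        if len(teams) < 2:
            continue  # fail loudly in logs rather than emit half a game
        games.append({
            "away_team": teams[0],
            "home_team": teams[1],
            "away_score": scores[0] if scores else None,
            "home_score": scores[1] if len(scores) > 1 else None,
        })
    return games
```

Notice that a card with fewer than two team names is skipped outright; that’s the “fail loudly” behavior from the intro, surfacing markup changes instead of silently producing junk rows.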


Step 4: Convert guesses into a clean schema

You usually want a normalized record like:

  • home_team, away_team
  • home_score, away_score
  • status (Final / In Progress / Scheduled)
  • game_url

Here’s a helper that attempts to map the first two team names + first two scores:


def normalize_game(g: dict) -> dict:
    teams = g.get("teams_guess") or []
    scores = g.get("scores_guess") or []

    # US scoreboards conventionally list the away team first; verify this
    # ordering against your snapshot before trusting the mapping.
    away_team = teams[0] if len(teams) > 0 else None
    home_team = teams[1] if len(teams) > 1 else None

    away_score = scores[0] if len(scores) > 0 else None
    home_score = scores[1] if len(scores) > 1 else None

    return {
        "away_team": away_team,
        "home_team": home_team,
        "away_score": away_score,
        "home_score": home_score,
        "status": g.get("status_guess"),
        "game_url": g.get("game_url"),
        "page_url": g.get("page_url"),
    }

Step 5: Export to CSV

import csv


def to_csv(rows: list[dict], path: str) -> None:
    if not rows:
        raise ValueError("no rows")

    fields = ["away_team", "home_team", "away_score", "home_score", "status", "game_url", "page_url"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        for r in rows:
            w.writerow({k: r.get(k) for k in fields})


if __name__ == "__main__":
    url = "https://www.espn.com/nba/scoreboard"
    html = fetch(url)  # or fetch(url, proxiesapi_key="YOUR_KEY")

    raw = parse_scoreboard(html, page_url=url)
    normalized = [normalize_game(g) for g in raw]

    to_csv(normalized, "espn_scores.csv")
    print("wrote espn_scores.csv", len(normalized))

Where ProxiesAPI fits (honestly)

If you’re scraping one scoreboard page occasionally, you may be fine without proxies.

ProxiesAPI becomes useful when you:

  • crawl many sports + dates (lots of repetitive requests)
  • follow each game to a detail/recap page
  • run scheduled scrapes (hourly/daily) where intermittent blocks hurt

The key idea: keep your extraction logic the same, and swap the fetch layer to use the ProxiesAPI URL.
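A multi-sport crawl makes that separation concrete: the loop below only knows about a fetch function and a parse function, so switching between direct requests and ProxiesAPI is a one-argument change. The sport list and delay are illustrative choices.

```python
import time

SPORTS = ["nba", "nfl", "nhl", "mlb"]  # illustrative list


def crawl_all(fetch_fn, parse_fn, delay_s: float = 2.0) -> list[dict]:
    """Crawl several scoreboards through one swappable fetch layer.

    fetch_fn / parse_fn are the fetch() and parse_scoreboard() defined
    earlier; pass a ProxiesAPI-backed fetch without touching this loop.
    """
    rows: list[dict] = []
    for sport in SPORTS:
        url = f"https://www.espn.com/{sport}/scoreboard"
        html = fetch_fn(url)
        rows.extend(parse_fn(html, url))
        time.sleep(delay_s)  # be polite between requests
    return rows
```

Usage: `crawl_all(fetch, parse_scoreboard)` for direct requests, or `crawl_all(lambda u: fetch(u, proxiesapi_key="YOUR_KEY"), parse_scoreboard)` when you scale.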


QA checklist

  • curl -sL shows real HTML (not empty shells)
  • Your snapshot contains multiple repeated game blocks
  • Team names map correctly to home/away for 3–5 spot checks
  • CSV outputs sane rows (no null spam)
  • Retries/backoff work (simulate by disconnecting network)

Next upgrades

  • add date selection (crawl yesterday/today/tomorrow)
  • scrape additional fields: venue, broadcast network, odds (if present)
  • store in SQLite for incremental updates
  • tighten selectors based on your saved HTML snapshot
