Scrape FanDuel Odds and Lines with Python
FanDuel is not a normal HTML scrape.
If you point BeautifulSoup at a sportsbook page and expect all visible odds to be sitting in the raw HTML, you will waste a lot of time. The practical workflow is:
- open the page in a browser
- capture the JSON/XHR traffic behind it
- normalize the pricing payload into rows you can store
- poll the same event endpoints to track line movement
This tutorial shows that pattern with Python.
We will collect:
- matchup names
- market names
- selection names
- American odds
- start times
- live snapshots for line-movement tracking

Sportsbook pages are dynamic, geo-sensitive, and often hostile to repetitive traffic. A ProxiesAPI-ready collection layer gives you a safer way to stabilize the fetch side while keeping your pricing parser reusable.
Why direct HTML scraping usually disappoints on FanDuel
Sportsbook pages are commonly:
- hydrated client-side
- backed by nested JSON payloads
- personalized by region or jurisdiction
- blocked when traffic does not look like a real browser session
So the reliable mental model is:
- browser for discovery
- JSON for extraction
That is also what makes the scraper maintainable. DOM classes change constantly; event payload structures usually change less often.
Install the dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install playwright requests
playwright install chromium
We will use Playwright because it is good at observing network responses without forcing us to parse a huge rendered DOM. The requests library is only needed for the polling code in Step 4.
Step 1: Capture the sportsbook JSON behind the page
The key move is to listen for network responses while the page loads.
from __future__ import annotations

import json
from pathlib import Path
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright

TARGET_URL = "https://sportsbook.fanduel.com/navigation/nba"


def should_capture(url: str) -> bool:
    # Only keep FanDuel responses whose URL looks like a data endpoint.
    return "fanduel.com" in url and (
        "/cache/" in url
        or "content-managed-page" in url
        or "/api/" in url
    )


def capture_payloads(output_dir: str = "fanduel_payloads") -> list[dict]:
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    captured = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": 1440, "height": 1200})

        def handle_response(response):
            url = response.url
            if not should_capture(url):
                return
            try:
                content_type = response.headers.get("content-type", "")
                if "json" not in content_type:
                    return
                payload = response.json()
                # The file name comes from the URL path only. Query strings are
                # ignored, so two responses on the same path overwrite each other.
                name = urlparse(url).path.strip("/").replace("/", "_")[:120] + ".json"
                path = Path(output_dir) / name
                path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
                captured.append({"url": url, "path": str(path)})
            except Exception:
                # Best effort: skip bodies that are unavailable or not valid JSON.
                pass

        # Register the listener before navigating so early responses are seen.
        page.on("response", handle_response)
        page.goto(TARGET_URL, wait_until="domcontentloaded", timeout=60000)
        # Give client-side hydration time to fire its XHR calls.
        page.wait_for_timeout(8000)
        browser.close()

    return captured


if __name__ == "__main__":
    files = capture_payloads()
    print(f"captured {len(files)} JSON payloads")
    for item in files[:5]:
        print(item["url"])
This script does two things you want in production:
- it keeps a local copy of the raw sportsbook payloads
- it decouples later parsing work from live page access
That is a big deal when a site is geo-blocked or intermittently challenged.
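Because the raw payloads sit on disk, parsing can be replayed without touching the site again. Here is a minimal sketch of that replay step, assuming the files were written into the fanduel_payloads directory by the capture script. The helper name iter_saved_payloads is hypothetical and not part of any library.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator


def iter_saved_payloads(output_dir: str = "fanduel_payloads") -> Iterator[tuple[Path, dict]]:
    """Yield (path, payload) for every readable JSON object file in output_dir."""
    for path in sorted(Path(output_dir).glob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Skip unreadable or truncated files instead of aborting the replay.
            continue
        # Only dict-shaped payloads are useful to the parsers in this tutorial.
        if isinstance(payload, dict):
            yield path, payload
```

Looping over iter_saved_payloads() lets you rerun a parser against yesterday's capture while the live page is blocked or challenged.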
Step 2: Convert raw price ratios into American odds
FanDuel-style payloads often carry raw price numerators and denominators instead of already formatted American lines.
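from __future__ import annotations


def ratio_to_american(price_up: int, price_down: int) -> int | None:
    # Missing or zero prices cannot be converted.
    if not price_up or not price_down:
        return None
    # round() instead of int(): int((23 / 20) * 100) truncates to 114, not 115.
    if price_down < price_up:
        return round(100 * price_up / price_down)  # underdog, positive line
    return -round(100 * price_down / price_up)  # favorite, negative line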
Examples:
| price_up | price_down | American odds |
|---|---|---|
| 20 | 23 | -115 |
| 23 | 20 | +115 |
| 10 | 11 | -110 |
This is the same idea you see when reverse engineering FanDuel payloads manually in DevTools.
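To sanity-check the table, here is a small self-contained sketch that converts each row and prints it with an explicit sign. It repeats the conversion so it runs on its own, and it rounds rather than truncates because (23 / 20) * 100 evaluates to 114.99999999999999 in floating point. The format_american helper is a hypothetical name added for illustration.

```python
from __future__ import annotations


def ratio_to_american(price_up: int, price_down: int) -> int | None:
    """Rounded variant of the ratio conversion, repeated so this runs standalone."""
    if not price_up or not price_down:
        return None
    if price_down < price_up:
        return round(100 * price_up / price_down)
    return -round(100 * price_down / price_up)


def format_american(odds: int | None) -> str:
    """Render odds with an explicit sign, or '-' when the price is missing."""
    if odds is None:
        return "-"
    return f"{odds:+d}"


# (price_up, price_down, expected) rows taken from the table above.
TABLE = [(20, 23, "-115"), (23, 20, "+115"), (10, 11, "-110")]

for price_up, price_down, expected in TABLE:
    got = format_american(ratio_to_american(price_up, price_down))
    assert got == expected, (price_up, price_down, got)
```

Keeping the integer and the display string separate means stored rows stay numeric while dashboards still show the familiar +115 / -115 style.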
Step 3: Normalize markets into dashboard rows
The payload shape changes over time, but a common structure is:
- events
- markets
- selections
Here is a defensive parser that handles the common case without assuming every node exists.
from __future__ import annotations

import json
from pathlib import Path

# ratio_to_american() is the Step 2 helper; keep both in the same module.


def normalize_market_payload(path: str) -> list[dict]:
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    rows = []
    # "or []" also covers keys that are present but null.
    for event in payload.get("events") or []:
        # Fall back to "away @ home" when the payload has no event name.
        event_name = event.get("eventname") or (
            f"{event.get('participantname_away')} @ {event.get('participantname_home')}"
        )
        start_time = event.get("tsstart")
        sport = event.get("sportname")
        for market in event.get("markets") or []:
            market_name = market.get("name")
            for selection in market.get("selections") or []:
                rows.append(
                    {
                        "sport": sport,
                        "event_name": event_name,
                        "start_time": start_time,
                        "market_name": market_name,
                        "selection_name": selection.get("name"),
                        "handicap": selection.get("currenthandicap"),
                        "american_odds": ratio_to_american(
                            selection.get("currentpriceup"),
                            selection.get("currentpricedown"),
                        ),
                    }
                )
    return rows
If your captured payload is event-specific rather than page-wide, adapt the root walk to:
- eventmarketgroups
- markets
- selections
The normalization principle is the same.
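One way to avoid maintaining two parsers is a single walker that tries both roots. This is a sketch under the assumption that the key names used in this tutorial (events, eventmarketgroups, markets, selections, eventname, currentpriceup, and so on) match what you actually captured; verify them against your own saved payloads. The function names iter_markets and iter_selections are hypothetical.

```python
from __future__ import annotations

from typing import Iterator


def iter_markets(payload: dict) -> Iterator[tuple[str | None, dict]]:
    """Yield (event_name, market) pairs from either payload shape."""
    # Page-wide shape: events -> markets
    for event in payload.get("events") or []:
        for market in event.get("markets") or []:
            yield event.get("eventname"), market
    # Event-specific shape: eventmarketgroups -> markets
    for group in payload.get("eventmarketgroups") or []:
        for market in group.get("markets") or []:
            yield market.get("eventname"), market


def iter_selections(payload: dict) -> Iterator[dict]:
    """Flatten every selection into one row, regardless of the root shape."""
    for event_name, market in iter_markets(payload):
        for selection in market.get("selections") or []:
            yield {
                "event_name": event_name,
                "market_name": market.get("name"),
                "selection_name": selection.get("name"),
                "handicap": selection.get("currenthandicap"),
                "price_up": selection.get("currentpriceup"),
                "price_down": selection.get("currentpricedown"),
            }
```

The rows keep the raw price ratio so the odds conversion stays a separate, testable step.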
Step 4: Poll for line movement
Once you have the event or market endpoint, line tracking becomes a polling problem.
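from __future__ import annotations

import csv
import time
from datetime import datetime, timezone
from pathlib import Path

import requests

# ratio_to_american() is the Step 2 helper; keep both in the same module.
FIELDS = ["captured_at", "event_name", "market_name", "selection_name", "handicap", "american_odds"]


def snapshot_event_json(url: str) -> list[dict]:
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # surface 403s instead of parsing an error page
    payload = response.json()
    # One timestamp per snapshot so all rows from the same poll line up.
    captured_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for group in payload.get("eventmarketgroups", []):
        for market in group.get("markets", []):
            for selection in market.get("selections", []):
                rows.append(
                    {
                        "captured_at": captured_at,
                        "event_name": market.get("eventname"),
                        "market_name": market.get("name"),
                        "selection_name": selection.get("name"),
                        "handicap": selection.get("currenthandicap"),
                        "american_odds": ratio_to_american(
                            selection.get("currentpriceup"),
                            selection.get("currentpricedown"),
                        ),
                    }
                )
    return rows


def poll_line_movement(event_url: str, out_csv: str, iterations: int = 10, sleep_seconds: int = 30) -> None:
    out_path = Path(out_csv)
    for i in range(iterations):
        rows = snapshot_event_json(event_url)
        # Write the header only for a new or empty file, so reruns can append.
        write_header = not out_path.exists() or out_path.stat().st_size == 0
        with open(out_path, "a", newline="", encoding="utf-8") as fh:
            # Fixed field names mean an empty snapshot no longer crashes the loop.
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            if write_header:
                writer.writeheader()
            writer.writerows(rows)
        # No need to wait after the final snapshot.
        if i < iterations - 1:
            time.sleep(sleep_seconds)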
That CSV becomes the raw input for:
- movement charts
- stale-line alerts
- arbitrage comparisons
- model backtesting
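As a starting point for movement charts and alerts, the snapshots can be reduced to just the moments where a price or handicap changed. The sketch below assumes the CSV columns written by the polling code (captured_at, event_name, market_name, selection_name, handicap, american_odds) and rows in capture order. The function names detect_moves and moves_from_csv are hypothetical.

```python
from __future__ import annotations

import csv
from typing import Iterable, Iterator


def detect_moves(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield one record each time a selection's odds or handicap changes.

    Rows must be in capture order, which is how the polling loop appends them.
    """
    last_seen: dict[tuple, tuple] = {}
    for row in rows:
        key = (row["event_name"], row["market_name"], row["selection_name"])
        current = (row["american_odds"], row["handicap"])
        previous = last_seen.get(key)
        if previous is not None and previous != current:
            yield {
                "captured_at": row["captured_at"],
                "selection": key,
                "from": previous,
                "to": current,
            }
        last_seen[key] = current


def moves_from_csv(path: str) -> list[dict]:
    # csv.DictReader returns every value as a string, which is fine for
    # equality checks; cast to int before doing arithmetic on the odds.
    with open(path, newline="", encoding="utf-8") as fh:
        return list(detect_moves(csv.DictReader(fh)))
```

A selection that never changes produces no output, so a long quiet stretch in the move list is itself a hint for stale-line checks.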
Practical anti-block notes
Sportsbooks are higher-friction targets than most tutorial sites, so be realistic:
| Issue | What it means | Safer response |
|---|---|---|
| 403 / CloudFront page | request never reached usable app content | stop retry storms; rotate IP/session |
| empty or incomplete HTML | data is hydrated by JS | capture network JSON with a browser |
| geo-specific gaps | certain markets differ by jurisdiction | record the region and keep runs separate |
| inconsistent payloads | same sport page mixes featured and event-specific markets | normalize everything into one row schema |
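The "stop retry storms" advice in the first row can be sketched as a bounded retry with growing delays. This is an illustrative pattern, not a FanDuel-specific or ProxiesAPI-specific API: fetch is any zero-argument callable you supply that returns (status_code, body), and treating 403 and 429 as block signals is an assumption you may need to adjust. Rotating the IP or session would happen inside your own fetch callable.

```python
from __future__ import annotations

import time
from typing import Callable


class Blocked(Exception):
    """Raised when the target keeps answering with a block status."""


def fetch_with_backoff(
    fetch: Callable[[], tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, bytes]:
    """Call fetch() until the status is not a block signal, with bounded retries."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in (403, 429):
            # Any other status is handed back for the caller to interpret.
            return status, body
        if attempt < max_attempts - 1:
            # The delay doubles on each retry so a block is not hammered.
            sleep(base_delay * 2**attempt)
    raise Blocked(f"still blocked after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the function testable without real waiting, and the hard cap on attempts is what prevents a blocked run from turning into a retry storm.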
If you are using ProxiesAPI in front of the browser or request layer, keep the integration explicit rather than magic:
- store the original page URL
- store the captured JSON URL
- store the region / state context
That audit trail matters more than one extra field in the parser.
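The three fields above fit naturally into one small record per captured payload. The following is a sketch only: the CaptureRecord class, the capture_log.jsonl file name, and the JSON Lines format are all assumptions chosen for illustration, and the region value is whatever label you assign to the session yourself.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CaptureRecord:
    page_url: str  # the page the browser opened
    json_url: str  # the JSON/XHR URL the payload came from
    region: str  # state or jurisdiction label for the session
    captured_at: str  # UTC timestamp in ISO 8601 format


def make_record(page_url: str, json_url: str, region: str) -> CaptureRecord:
    return CaptureRecord(
        page_url=page_url,
        json_url=json_url,
        region=region,
        captured_at=datetime.now(timezone.utc).isoformat(),
    )


def append_record(record: CaptureRecord, log_path: str = "capture_log.jsonl") -> None:
    # One JSON object per line keeps the log appendable and easy to grep.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
```

Writing one record next to each saved payload makes it possible to explain later why two runs of the same page produced different markets.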
When to scrape the page and when to scrape the API
Use the rendered page when you need:
- visual verification
- screenshots
- a way to discover the hidden endpoints
Use the JSON endpoint when you need:
- repeatable data pulls
- line history
- lower compute cost
- cleaner downstream schemas
The browser is your discovery tool. The JSON is your production feed.
Final thoughts
A good FanDuel scraper is not really a DOM scraper. It is a network-observation pipeline.
That mindset change makes the job easier:
- discover payloads with Playwright
- save raw JSON for replay
- convert price ratios to American odds
- poll event endpoints for movement over time
Once you do that, building the betting dashboard is the easy part.