Scrape Live Stock Data from Yahoo Finance with Python (Quotes + Key Stats)

Yahoo Finance is a convenient place to pull “live-ish” quote data and basic company stats without signing up for a paid market data feed.

In this tutorial you’ll build a practical Python scraper that:

  • fetches Yahoo Finance quote pages through ProxiesAPI (optional, but recommended at scale)
  • extracts a few quote fields (price, change, currency, market cap, etc.)
  • pulls key stats from embedded JSON (more stable than guessing CSS classes)
  • exports results to CSV

Yahoo Finance quote page (we’ll extract quote fields + summary stats)

When finance pages start throttling, ProxiesAPI stabilizes the fetch layer

Scraping market pages is rarely “set and forget.” ProxiesAPI gives you a proxy-backed fetch URL so retries and rotation stay a small change in your network layer—not a rewrite of your parser.


What we’re scraping

We’ll target quote pages like:

  • https://finance.yahoo.com/quote/AAPL/
  • https://finance.yahoo.com/quote/MSFT/

A note on “live” data

Yahoo Finance updates certain UI widgets via JavaScript, but the initial HTML often contains a large JSON blob we can parse. That’s usually the most reliable way to extract “quote summary” fields without running a browser.

This approach won’t be as fast as a dedicated market data API, and it can break whenever Yahoo changes its page internals. Treat it as a scraping tutorial, and build monitoring and fallbacks if it’s business-critical.


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests lxml pandas

We’ll use:

  • requests for HTTP
  • lxml for light HTML parsing (when needed)
  • pandas for a quick CSV export

Step 1: A resilient fetch layer (with optional ProxiesAPI)

ProxiesAPI works by fetching the target URL through its endpoint:

http://api.proxiesapi.com/?auth_key=YOUR_KEY&url=https://example.com

We’ll wrap it in a helper so the rest of the scraper keeps making ordinary requests calls.

import os
import time
import random
import urllib.parse
import requests

PROXIESAPI_KEY = os.environ.get("PROXIESAPI_KEY", "")
TIMEOUT = (10, 40)  # connect, read

session = requests.Session()


def proxiesapi_url(target_url: str) -> str:
    if not PROXIESAPI_KEY:
        raise RuntimeError("Set PROXIESAPI_KEY in your environment")

    return (
        "http://api.proxiesapi.com/?auth_key="
        + urllib.parse.quote(PROXIESAPI_KEY, safe="")
        + "&url="
        + urllib.parse.quote(target_url, safe="")
    )


def fetch(url: str, *, use_proxiesapi: bool = True, max_retries: int = 4) -> str:
    # Build the URL once, outside the retry loop, so a missing key fails fast
    # instead of being retried with backoff.
    final_url = proxiesapi_url(url) if use_proxiesapi else url
    last_err = None

    for attempt in range(1, max_retries + 1):
        try:
            r = session.get(
                final_url,
                timeout=TIMEOUT,
                headers={
                    "User-Agent": (
                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/123.0 Safari/537.36"
                    ),
                    "Accept-Language": "en-US,en;q=0.9",
                },
            )
            r.raise_for_status()
            html = r.text
            # Real quote pages are large; a tiny body is usually a block or consent page.
            if not html or len(html) < 50_000:
                raise RuntimeError(f"Suspiciously small HTML ({len(html)} chars)")
            return html

        except (requests.RequestException, RuntimeError) as e:
            last_err = e
            if attempt < max_retries:
                # Exponential backoff (1s, 2s, 4s, ... capped at 10s) plus jitter.
                time.sleep(min(10, 2 ** (attempt - 1)) + random.random())

    raise RuntimeError(f"Fetch failed after {max_retries} attempts: {last_err}")
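If you want to see how long the retry loop can wait, here is a small sketch of the backoff schedule on its own. The helper name backoff_delays is made up for illustration; it mirrors the min(10, 2 ** (attempt - 1)) expression in fetch() and leaves out the random jitter (up to one extra second per wait).

```python
def backoff_delays(max_retries: int = 4, cap: float = 10.0) -> list[float]:
    """Base delay (in seconds) that follows each failed attempt, jitter excluded."""
    return [min(cap, float(2 ** (attempt - 1))) for attempt in range(1, max_retries + 1)]


print(backoff_delays())   # [1.0, 2.0, 4.0, 8.0]
print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

The cap is what keeps a long retry budget from turning into minute-long sleeps.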

To run with ProxiesAPI:

export PROXIESAPI_KEY="YOUR_KEY"
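To check what the wrapper actually sends, the sketch below builds the same kind of URL with urllib.parse.urlencode and then decodes it again. The helper name wrap_url, the DEMO_KEY value, and the extra query parameters on the target URL are placeholders for illustration only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "http://api.proxiesapi.com/"  # endpoint shown earlier in this tutorial


def wrap_url(target_url: str, auth_key: str) -> str:
    """Hypothetical helper: percent-encode the key and target into one query string."""
    return API_BASE + "?" + urlencode({"auth_key": auth_key, "url": target_url})


# The target has its own "?" and "&"; they must be encoded to survive the trip.
target = "https://finance.yahoo.com/quote/AAPL/?p=AAPL&x=1"
wrapped = wrap_url(target, "DEMO_KEY")
print(wrapped)

# Decoding the outer query string recovers the target unchanged.
params = parse_qs(urlsplit(wrapped).query)
print(params["url"][0])
```

Without the encoding, the target’s own "&x=1" would be read as a separate parameter of the outer URL, which is why proxiesapi_url() quotes the target with safe="".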

Step 2: Extract the embedded JSON (no guessed selectors)

Yahoo Finance quote pages often contain a script assignment like:

  • root.App.main = {...};

Inside that JSON there’s typically a store-like structure with quote summary data.

We’ll locate the JSON blob, parse it, then navigate to the bits we care about.

import json
import re
from typing import Any


ROOT_RE = re.compile(r"root\.App\.main\s*=\s*(\{.*?\})\s*;\s*\n", re.S)


def extract_root_app_main(html: str) -> dict[str, Any]:
    m = ROOT_RE.search(html)
    if not m:
        raise RuntimeError("Could not find root.App.main JSON in HTML")
    return json.loads(m.group(1))
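The regex above expects the assignment to end with a semicolon followed by a newline. If the script is minified differently, one alternative (a sketch, not necessarily what Yahoo’s markup needs) is to find the marker and let json.JSONDecoder.raw_decode read exactly one JSON object from that position. The function name extract_json_after_marker and the sample string are made up for this example.

```python
import json
from typing import Any

MARKER = "root.App.main"


def extract_json_after_marker(html: str, marker: str = MARKER) -> dict[str, Any]:
    """Find `marker = {...}` and decode exactly one JSON object after it."""
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"{marker} not found")
    brace = html.find("{", start)
    if brace == -1:
        raise ValueError(f"No JSON object after {marker}")
    # raw_decode parses one JSON value starting at `brace` and ignores what follows.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Made-up snippet: no newline after the semicolon, so the regex would not match.
sample = '<script>root.App.main = {"a": 1, "b": {"c": 2}};}(this));</script>'
print(extract_json_after_marker(sample))
```

Because the JSON parser decides where the object ends, nested braces and trailing script code don’t matter.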

Pull useful fields safely

Yahoo’s schema changes over time, so use “get-with-default” patterns instead of hardcoding a deep path that raises KeyError the moment one key goes missing.

def deep_get(obj: Any, path: list[str], default=None):
    cur = obj
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur


def norm_raw(value: Any):
    if isinstance(value, dict) and "raw" in value:
        return value.get("raw")
    return value
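Here is how the two helpers behave on a tiny payload. The sample dict is invented for illustration (including the "fmt" key); the helpers are repeated so the snippet runs on its own.

```python
from typing import Any


def deep_get(obj: Any, path: list[str], default=None):
    # Same helper as above, repeated so this snippet is self-contained.
    cur = obj
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur


def norm_raw(value: Any):
    # Same helper as above: unwrap {"raw": ...} wrappers, pass other values through.
    if isinstance(value, dict) and "raw" in value:
        return value.get("raw")
    return value


# Made-up payload; the "fmt" key is an assumption about the wrapper shape.
sample = {"price": {"symbol": "AAPL", "regularMarketPrice": {"raw": 189.5, "fmt": "189.50"}}}

print(norm_raw(deep_get(sample, ["price", "regularMarketPrice"])))  # 189.5
print(deep_get(sample, ["price", "symbol"]))                        # AAPL
print(deep_get(sample, ["summaryDetail", "marketCap"]))             # None
print(deep_get(sample, ["price", "symbol", "x"], default="n/a"))    # n/a
```

The last call shows why the isinstance check matters: walking “through” a string returns the default instead of raising.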

Step 3: Build a quote parser (price + change + a few stats)

For many tickers you’ll find relevant data under something like QuoteSummaryStore or QuoteStore inside the JSON.

This parser is intentionally defensive: it returns None for missing fields instead of crashing.

def parse_quote(html: str) -> dict:
    data = extract_root_app_main(html)

    # Common locations. Yahoo changes these; try multiple.
    quote_store = (
        deep_get(data, ["context", "dispatcher", "stores", "QuoteSummaryStore"])
        or deep_get(data, ["context", "dispatcher", "stores", "QuoteStore"])
        or {}
    )

    price = deep_get(quote_store, ["price"], {}) or {}
    summary_detail = deep_get(quote_store, ["summaryDetail"], {}) or {}
    quote_type = deep_get(quote_store, ["quoteType"], {}) or {}

    symbol = deep_get(price, ["symbol"]) or deep_get(quote_type, ["symbol"])

    return {
        "symbol": symbol,
        "short_name": deep_get(price, ["shortName"]) or deep_get(quote_type, ["shortName"]),
        "currency": deep_get(price, ["currency"]) or deep_get(summary_detail, ["currency"]),
        "regular_price": norm_raw(deep_get(price, ["regularMarketPrice"])),
        "regular_change": norm_raw(deep_get(price, ["regularMarketChange"])),
        "regular_change_pct": norm_raw(deep_get(price, ["regularMarketChangePercent"])),
        "market_cap": norm_raw(deep_get(summary_detail, ["marketCap"])),
        "previous_close": norm_raw(deep_get(summary_detail, ["previousClose"])),
        "open": norm_raw(deep_get(summary_detail, ["open"])),
        "day_low": norm_raw(deep_get(summary_detail, ["dayLow"])),
        "day_high": norm_raw(deep_get(summary_detail, ["dayHigh"])),
        "fifty_two_week_low": norm_raw(deep_get(summary_detail, ["fiftyTwoWeekLow"])),
        "fifty_two_week_high": norm_raw(deep_get(summary_detail, ["fiftyTwoWeekHigh"])),
        "volume": norm_raw(deep_get(summary_detail, ["volume"])),
        "avg_volume_3m": norm_raw(deep_get(summary_detail, ["averageVolume"])),
    }
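Returning None keeps the parser from crashing, but it also means a schema change can pass silently. A simple guard is to count empty fields per row and flag rows that are mostly empty. The helper names and the 50% threshold below are assumptions for this sketch; tune them to your own data.

```python
def missing_fields(row: dict) -> list[str]:
    """Names of the fields the parser could not fill."""
    return sorted(key for key, value in row.items() if value is None)


def looks_drifted(row: dict, max_missing_ratio: float = 0.5) -> bool:
    """True when more than `max_missing_ratio` of the fields are None."""
    if not row:
        return True
    return len(missing_fields(row)) / len(row) > max_missing_ratio


# Made-up rows for illustration, shaped like parse_quote() output.
good = {"symbol": "AAPL", "regular_price": 189.5, "market_cap": None, "volume": 1000}
bad = {"symbol": None, "regular_price": None, "market_cap": None, "volume": 1000}

print(missing_fields(good), looks_drifted(good))
print(missing_fields(bad), looks_drifted(bad))
```

Logging the missing field names alongside the symbol makes it much easier to tell “this ticker has no market cap” apart from “the page layout changed.”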

Step 4: Scrape multiple tickers + export to CSV

import pandas as pd


def quote_url(symbol: str) -> str:
    sym = symbol.strip().upper()
    return f"https://finance.yahoo.com/quote/{sym}/"


def scrape_quotes(symbols: list[str]) -> list[dict]:
    rows = []
    for symbol in symbols:
        url = quote_url(symbol)
        html = fetch(url, use_proxiesapi=True)
        rows.append(parse_quote(html))
    return rows


if __name__ == "__main__":
    symbols = ["AAPL", "MSFT", "NVDA", "TSLA"]
    rows = scrape_quotes(symbols)

    df = pd.DataFrame(rows).sort_values("symbol")
    df.to_csv("yahoo-finance-quotes.csv", index=False)
    print(df[["symbol", "regular_price", "regular_change_pct", "market_cap"]])

You’ll end up with:

  • yahoo-finance-quotes.csv
  • a quick terminal summary
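As written, one failed ticker stops the whole run, and requests go out back to back. A possible variant (a sketch, with the hypothetical name scrape_many) takes the fetch and parse functions as arguments, pauses between requests, and collects errors per symbol. The stand-in functions exist only so the example runs without network access.

```python
import time
from typing import Callable


def scrape_many(
    symbols: list[str],
    fetch_fn: Callable[[str], str],
    parse_fn: Callable[[str], dict],
    delay_s: float = 1.0,
) -> tuple[list[dict], dict[str, str]]:
    """Hypothetical variant of scrape_quotes that survives per-symbol failures."""
    rows: list[dict] = []
    errors: dict[str, str] = {}
    for i, symbol in enumerate(symbols):
        if i and delay_s > 0:
            time.sleep(delay_s)  # pause between requests, not before the first one
        url = f"https://finance.yahoo.com/quote/{symbol.strip().upper()}/"
        try:
            rows.append(parse_fn(fetch_fn(url)))
        except Exception as exc:  # record the failure and keep going
            errors[symbol] = str(exc)
    return rows, errors


# Stand-ins so the sketch runs offline; swap in fetch and parse_quote for real use.
def fake_fetch(url: str) -> str:
    if "BAD" in url:
        raise RuntimeError("blocked")
    return url


def fake_parse(html: str) -> dict:
    return {"symbol": html.rstrip("/").rsplit("/", 1)[-1]}


rows, errors = scrape_many(["aapl", "BAD", "msft"], fake_fetch, fake_parse, delay_s=0)
print(rows)
print(errors)
```

In the real scraper you would call scrape_many(symbols, fetch, parse_quote) and write the errors dict to a log next to the CSV.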

Debugging tips (what breaks first)

If you start seeing failures, it’s usually one of these:

  • HTML too small: you’re getting a block page, consent page, or error page
  • Missing root.App.main: the quote page changed or the response is not the real quote HTML
  • Schema drift: a field moved; your parser should return None, not crash

Quick sanity checks

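Print the status code and the first few hundred characters of the raw response when troubleshooting. Call session.get directly here, because fetch() raises on small responses before you can look at them:

r = session.get(quote_url("AAPL"), timeout=TIMEOUT)
print(r.status_code, len(r.text))
print(r.text[:500])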

If the HTML looks like “consent” or “access denied,” you’ll want:

  • retries/backoff
  • a proxy/unblock layer (ProxiesAPI)
  • a smaller request rate (don’t hammer it)
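You can turn those eyeball checks into a rough classifier that labels each response before parsing. This is a heuristic sketch: the function name classify_response and the marker phrases are assumptions, and the 50,000-character threshold simply reuses the one from fetch().

```python
BLOCK_MARKERS = ("consent", "access denied", "unusual traffic")  # assumed phrases


def classify_response(html: str, min_len: int = 50_000) -> str:
    """Label a response as 'blocked', 'too_small', 'no_embedded_json', or 'ok'."""
    if len(html) < min_len:
        lowered = html.lower()
        # Only look for block phrases in short pages to limit false positives.
        if any(marker in lowered for marker in BLOCK_MARKERS):
            return "blocked"
        return "too_small"
    if "root.App.main" not in html:
        return "no_embedded_json"
    return "ok"


print(classify_response("<html>Access Denied</html>"))           # blocked
print(classify_response("<html></html>"))                        # too_small
print(classify_response("x" * 60_000))                           # no_embedded_json
print(classify_response("root.App.main = {};" + "x" * 60_000))   # ok
```

Counting these labels over a run tells you whether to slow down, switch on the proxy layer, or go fix the parser.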

Where ProxiesAPI fits (honestly)

ProxiesAPI doesn’t make Yahoo Finance “guaranteed.” What it does give you is a consistent fetch URL so your scraper can:

  • retry without changing your parser
  • reduce sudden IP-based throttling
  • keep request logic centralized (fetch → parse → export)

If you’re scraping finance pages as part of a larger workflow, that architectural separation is the real win.
