Scrape Stock Prices and Financial Data with Python

Yahoo Finance is one of the fastest ways to turn public market pages into a useful internal dataset.

In this tutorial we will scrape three things for a ticker like AAPL:

  • quote-page metadata such as company name and live price
  • summary table fields like previous close, open, and market cap
  • historical daily rows you can export to CSV

The key idea is simple: use the visible quote page for human-facing fields, and use Yahoo Finance's own chart endpoint for clean historical candles.

Yahoo Finance quote page used for stock and summary data extraction

Keep finance scrapers stable with ProxiesAPI

Quote pages and finance endpoints look easy until your daily jobs start seeing throttling, redirects, and intermittent failures. ProxiesAPI gives you a cleaner fetch layer without changing your parsing logic.


What we are scraping

The main page is:

  • https://finance.yahoo.com/quote/AAPL/

For daily price history, Yahoo Finance also requests a chart JSON endpoint behind the scenes:

  • https://query1.finance.yahoo.com/v8/finance/chart/AAPL?interval=1d&range=1mo&includePrePost=false&events=div%2Csplits

That makes the workflow much more reliable than trying to parse an on-page historical table that can change layout.


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

Set your ProxiesAPI key:

export PROXIESAPI_KEY="YOUR_KEY"

This guide uses the simple gateway form:

http://api.proxiesapi.com/?key=YOUR_KEY&url=ENCODED_TARGET_URL

Step 1: Build a fetch helper with ProxiesAPI

We want one function that can fetch either HTML pages or JSON endpoints, with timeouts, retries, and polite pacing.

from __future__ import annotations

import json
import os
import random
import time
from typing import Any
from urllib.parse import quote

import requests

API_KEY = os.environ["PROXIESAPI_KEY"]
TIMEOUT = (10, 30)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
})


def proxied_url(target_url: str) -> str:
    encoded = quote(target_url, safe="")
    return f"http://api.proxiesapi.com/?key={API_KEY}&url={encoded}"


def fetch_text(target_url: str, attempts: int = 4) -> str:
    last_error: Exception | None = None

    for attempt in range(1, attempts + 1):
        time.sleep(random.uniform(0.8, 1.6))
        try:
            response = session.get(proxied_url(target_url), timeout=TIMEOUT)
            if response.status_code in (403, 429, 500, 502, 503, 504):
                raise RuntimeError(f"retryable status {response.status_code}")
            response.raise_for_status()
            return response.text
        except Exception as exc:  # noqa: BLE001
            last_error = exc
            time.sleep(min(10, attempt * 1.8) + random.random())

    raise RuntimeError(f"failed to fetch {target_url}") from last_error


def fetch_json(target_url: str) -> Any:
    return json.loads(fetch_text(target_url))

This is the reusable part. Once your fetch layer is stable, everything else becomes normal parsing code.


Step 2: Parse quote-page fields

Yahoo Finance quote pages contain both visible tables and embedded JSON. A good pattern is:

  • parse the h1 and summary rows from HTML
  • pull regularMarketPrice and related values from embedded page JSON
import re
from bs4 import BeautifulSoup


def clean(text: str | None) -> str | None:
    if not text:
        return None
    return re.sub(r"\s+", " ", text).strip()


def parse_quote_page(html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")

    heading = soup.select_one("h1")
    company_heading = clean(heading.get_text(" ", strip=True) if heading else None)

    summary = {}
    for row in soup.select("table tr"):
        cells = row.select("td")
        if len(cells) < 2:
            continue
        label = clean(cells[0].get_text(" ", strip=True))
        value = clean(cells[1].get_text(" ", strip=True))
        if label and value:
            summary[label] = value

    price_match = re.search(r'"regularMarketPrice"\s*:\s*\{"raw":\s*([0-9.]+)', html)
    change_match = re.search(r'"regularMarketChangePercent"\s*:\s*\{"raw":\s*([-0-9.]+)', html)
    currency_match = re.search(r'"currency"\s*:\s*"([A-Z]+)"', html)

    return {
        "company_heading": company_heading,
        "regular_market_price": float(price_match.group(1)) if price_match else None,
        "regular_market_change_percent": float(change_match.group(1)) if change_match else None,
        "currency": currency_match.group(1) if currency_match else None,
        "summary": summary,
    }

The summary dictionary usually gives you fields like:

  • Previous Close
  • Open
  • Bid
  • Ask
  • Market Cap
  • PE Ratio (TTM)
  • 52 Week Range

That is already enough to build a useful company snapshot dataset.


Step 3: Pull historical daily rows from the chart endpoint

For time-series data, the chart endpoint is cleaner than scraping a rendered table.

from datetime import datetime, timezone


def chart_url(symbol: str, range_: str = "6mo", interval: str = "1d") -> str:
    return (
        "https://query1.finance.yahoo.com/v8/finance/chart/"
        f"{symbol}?interval={interval}&range={range_}&includePrePost=false&events=div%2Csplits"
    )


def parse_history(chart_payload: dict) -> list[dict]:
    result = chart_payload["chart"]["result"][0]
    timestamps = result.get("timestamp") or []
    quote = result["indicators"]["quote"][0]

    rows = []
    for idx, ts in enumerate(timestamps):
        rows.append({
            "date": datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat(),
            "open": quote["open"][idx],
            "high": quote["high"][idx],
            "low": quote["low"][idx],
            "close": quote["close"][idx],
            "volume": quote["volume"][idx],
        })

    return rows

This gives you real daily OHLCV rows in a format that is much easier to test and store.


Step 4: Combine both sources into one export

import csv


def scrape_symbol(symbol: str) -> tuple[dict, list[dict]]:
    quote_html = fetch_text(f"https://finance.yahoo.com/quote/{symbol}/")
    quote_data = parse_quote_page(quote_html)

    chart_payload = fetch_json(chart_url(symbol, range_="3mo", interval="1d"))
    history_rows = parse_history(chart_payload)

    snapshot = {
        "symbol": symbol,
        "company_heading": quote_data["company_heading"],
        "currency": quote_data["currency"],
        "regular_market_price": quote_data["regular_market_price"],
        "regular_market_change_percent": quote_data["regular_market_change_percent"],
        "previous_close": quote_data["summary"].get("Previous Close"),
        "open": quote_data["summary"].get("Open"),
        "market_cap": quote_data["summary"].get("Market Cap"),
        "pe_ratio_ttm": quote_data["summary"].get("PE Ratio (TTM)"),
        "week_52_range": quote_data["summary"].get("52 Week Range"),
    }

    return snapshot, history_rows


def write_csv(path: str, rows: list[dict]) -> None:
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    snapshot, history = scrape_symbol("AAPL")
    write_csv("aapl-history.csv", history)
    write_csv("aapl-snapshot.csv", [snapshot])
    print(snapshot)
    print("history rows:", len(history))

Now you have:

  • one company snapshot CSV
  • one daily historical CSV

Scale it by looping over a ticker list.


Practical tips

1. Keep the page parser narrow

Do not try to scrape every visible widget on Yahoo Finance. Start with:

  • heading
  • regular market price
  • top summary rows
  • chart endpoint

That covers most use cases without turning the scraper into a maintenance headache.

2. Treat missing fields as normal

Some tickers, funds, and foreign listings expose slightly different labels. Store missing values as None and move on.

3. Cache during development

While you are tuning selectors, save the HTML locally and work against that copy. It cuts down requests and makes debugging faster.

4. Respect data licensing

This is a scraping tutorial, not legal advice. If you need licensed market data for a commercial product, use an approved vendor.


Wrap-up

That is a practical Python workflow for scraping stock prices and financial data:

  • use ProxiesAPI to stabilize requests
  • parse the quote page for company fields and key stats
  • pull historical candles from the chart endpoint
  • export the result as plain CSV

It is simple enough to run from cron, but sturdy enough to grow into a real internal data pipeline.

Keep finance scrapers stable with ProxiesAPI

Quote pages and finance endpoints look easy until your daily jobs start seeing throttling, redirects, and intermittent failures. ProxiesAPI gives you a cleaner fetch layer without changing your parsing logic.

Related guides

Scrape Live Stock Data from Yahoo Finance
Show how to pull live quote fields, daily change, volume, and market-cap data from Yahoo Finance quote pages into a clean CSV.
tutorial#python#yahoo-finance#stocks
Scrape Yahoo Finance Top Gainers/Losers Screener with ProxiesAPI (CSV Export)
Scrape Yahoo Finance movers tables (gainers + losers), extract tickers, prices, % change, and volume using stable data-testid anchors, then export to CSV. Includes selector rationale and a screenshot.
tutorial#python#yahoo-finance#stocks
Scrape Stock Prices and Financial Data with Python
Build a stock-price dataset from Stooq with Python: fetch symbols, download historical OHLCV CSVs, handle retries/timeouts, and export clean CSV plus SQLite with ProxiesAPI in the network layer.
tutorial#python#stocks#finance
Scrape Live Stock Data from Yahoo Finance with Python (Quotes + Key Stats)
A resilient Yahoo Finance scraper in Python: fetch quote pages via ProxiesAPI, extract live-ish quote fields + key stats from embedded JSON, handle retries, and export to CSV.
tutorial#python#yahoo-finance#stocks