Scrape Yahoo Finance Analyst Ratings and Price Targets with Python

Jun 22, 2026 · tutorial · #python, #yahoo-finance, #analyst-ratings, #price-targets, #selenium, #pandas, #csv

Yahoo Finance’s Analysis tab is useful because it compresses a lot of analyst sentiment into a few tables:

recommendation trend
mean / low / high price targets
analyst count
revenue and EPS estimate tables

That is exactly the kind of page that becomes more useful once you turn it into a dataset.

In this guide, we’ll scrape the rendered Analysis page for a ticker like AAPL, extract the analyst tables, normalize them, and export a CSV you can reuse in your own research workflow.

The important detail is that Yahoo Finance is no longer a plain “fetch once with requests” target. In our test environment, direct HTTP requests to Yahoo’s finance endpoints were rate-limited, so the more stable pattern is:

render the page in a real browser
wait for the visible tables
parse those tables with pandas.read_html()


Keep the parser, harden the fetch layer with ProxiesAPI

Yahoo Finance is one of those targets that works until it doesn’t. When your local script starts hitting unstable responses or throttling, ProxiesAPI gives you a cleaner network path without forcing you to rewrite the parser.


The page we want

For any ticker, the analysis page lives at:

https://finance.yahoo.com/quote/AAPL/analysis/

Try another symbol by swapping AAPL:

MSFT
NVDA
TSLA
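If you plan to swap symbols programmatically, a small URL builder keeps the pattern in one place. This is a minimal sketch: the helper name analysis_url is ours, and upper-casing and percent-encoding the symbol are assumptions about what Yahoo accepts, not documented behavior.

```python
from urllib.parse import quote

BASE = "https://finance.yahoo.com/quote/{symbol}/analysis/"


def analysis_url(symbol: str) -> str:
    """Build the Analysis-tab URL for one ticker symbol."""
    cleaned = symbol.strip().upper()
    if not cleaned:
        raise ValueError("symbol must not be empty")
    # Percent-encode characters such as "^" so the URL path stays valid.
    return BASE.format(symbol=quote(cleaned, safe=""))


for sym in ["msft", "NVDA", " tsla "]:
    print(analysis_url(sym))
```

Plain symbols like MSFT pass through unchanged; only characters that are not URL-safe get encoded.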

The page usually contains several rendered tables. The ones we care about most are:

Recommendation Trend
Recommendation Rating
Price Target

The visible HTML changes more often than the URL pattern, so the safest approach is to anchor on the rendered tables, not Yahoo’s styling classes.

Install the stack

python3 -m venv .venv
source .venv/bin/activate
pip install selenium pandas beautifulsoup4 lxml html5lib

Why this stack:

selenium renders the page
pandas.read_html() extracts tables fast
BeautifulSoup is handy for heading metadata
lxml / html5lib are the parser backends pandas.read_html() relies on (the html5lib path also needs beautifulsoup4)

Step 1: Render the page in headless Chrome

We’ll open the Analysis tab in a real browser, wait for tables to appear, then capture the final HTML.

from __future__ import annotations

import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def build_driver() -> webdriver.Chrome:
    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1440,2200")
    options.add_argument(
        "--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0 Safari/537.36"
    )

    # Optional: route Chrome through ProxiesAPI when you need a more stable IP layer.
    # Note: Chrome does not use user:pass credentials embedded in a --proxy-server URL.
    proxy_url = os.getenv("PROXIESAPI_PROXY_URL")
    if proxy_url:
        options.add_argument(f"--proxy-server={proxy_url}")

    return webdriver.Chrome(options=options)


def fetch_rendered_html(symbol: str) -> str:
    url = f"https://finance.yahoo.com/quote/{symbol}/analysis/"
    driver = build_driver()
    try:
        driver.get(url)
        WebDriverWait(driver, 35).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()

Why wait on a table element instead of a text string? Because the rendered tables are the real target: if no table appears within the 35-second timeout, WebDriverWait raises a TimeoutException rather than returning half-rendered HTML.
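The presence of a single table element does not guarantee that the analyst tables have been filled in. One option is a stricter custom wait condition that only passes once enough tables contain text. The sketch below is an assumption-laden example: tables_with_text is a hypothetical helper, and the right min_tables value depends on what the page renders for you.

```python
def tables_with_text(min_tables: int = 1):
    """Return a wait condition that passes once enough non-empty tables exist."""

    def _condition(driver):
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
        tables = driver.find_elements("css selector", "table")
        filled = [t for t in tables if t.text.strip()]
        # WebDriverWait keeps polling while the condition returns a falsy value.
        return filled if len(filled) >= min_tables else False

    return _condition


# Usage inside fetch_rendered_html (threshold of 2 is an assumption):
#     WebDriverWait(driver, 35).until(tables_with_text(min_tables=2))
```

WebDriverWait.until() accepts any callable that takes the driver, so this drops in where the EC.presence_of_element_located(...) call sits.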

Step 2: Parse the visible tables

Once the HTML is rendered, pandas.read_html() does most of the heavy lifting.

from io import StringIO
import pandas as pd


def extract_tables(html: str) -> list[pd.DataFrame]:
    return pd.read_html(StringIO(html))

The Analysis page typically returns multiple tables, so the next job is selecting the ones we actually want.

Step 3: Find the analyst tables we care about

Instead of relying on table position, we identify each table by the labels it contains. Matching on labels is more robust than hard-coding “table 2” or “table 5”.

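def pick_price_target_table(tables: list[pd.DataFrame]) -> pd.DataFrame:
    # Identify the table by the row labels in its first column, not by position.
    for df in tables:
        if df.shape[1] < 2:
            continue  # skip empty or single-column tables
        first_col = df.iloc[:, 0].astype(str).str.strip().tolist()
        if "Current Price" in first_col and "Mean Target" in first_col:
            # Keep only the label and value columns so the rename below
            # cannot fail if the table gains an extra column.
            out = df.iloc[:, :2].copy()
            out.columns = ["metric", "value"]
            return out
    raise ValueError("Price target table not found")


def pick_recommendation_rating_table(tables: list[pd.DataFrame]) -> pd.DataFrame:
    # Same idea: match on the labels Yahoo shows in the first column.
    for df in tables:
        if df.shape[1] < 2:
            continue  # skip empty or single-column tables
        first_col = df.iloc[:, 0].astype(str).str.strip().tolist()
        if "Mean Rating" in first_col and "No. of Analysts" in first_col:
            out = df.iloc[:, :2].copy()
            out.columns = ["metric", "value"]
            return out
    raise ValueError("Recommendation rating table not found")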


def pick_recommendation_trend_table(tables: list[pd.DataFrame]) -> pd.DataFrame:
    expected = {"Strong Buy", "Buy", "Hold", "Sell", "Strong Sell"}
    for df in tables:
        cols = {str(col).strip() for col in df.columns}
        if expected.issubset(cols):
            return df.copy()
    raise ValueError("Recommendation trend table not found")

This selection logic is grounded in the actual labels Yahoo shows on the Analysis page, which makes it easier to survive layout tweaks.
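Exact string matching is the strictest form of label matching. If Yahoo changes capitalization, spacing, or punctuation, a normalized comparison can keep the selectors working. The helpers below are a hypothetical sketch (normalize_label and has_labels are our names); they work on plain lists of strings, so they can be applied to first_col or to the column names.

```python
import re


def normalize_label(text) -> str:
    """Lowercase, drop punctuation, and collapse whitespace in a table label."""
    lowered = str(text).lower()
    no_punct = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()


def has_labels(labels, required) -> bool:
    """True when every required label appears among the normalized labels."""
    seen = {normalize_label(label) for label in labels}
    return all(normalize_label(name) in seen for name in required)


print(has_labels(["Current price", " Mean  Target "], ["Current Price", "Mean Target"]))
```

The trade-off is that looser matching raises the chance of selecting the wrong table, so it is worth keeping at least two required labels per table.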

Step 4: Extract ticker metadata from the heading

The tables tell us the analyst story, but we usually also want:

symbol
company name

We can pull that from the page heading.

import re
from bs4 import BeautifulSoup


def extract_metadata(html: str, fallback_symbol: str) -> dict:
    soup = BeautifulSoup(html, "lxml")
    title_text = ""

    h1 = soup.select_one("h1")
    if h1:
        title_text = h1.get_text(" ", strip=True)

    # Example heading pattern:
    # "Apple Inc. (AAPL)"
    m = re.search(r"^(.*?)\s*\(([A-Z.\-]+)\)$", title_text)
    if m:
        return {
            "company_name": m.group(1).strip(),
            "symbol": m.group(2).strip(),
        }

    return {
        "company_name": title_text or None,
        "symbol": fallback_symbol,
    }
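The ticker pattern above only allows uppercase letters, dots, and hyphens, so a heading whose symbol contains digits or a caret falls through to the fallback branch. If your watchlist includes such symbols, a broader character class is one option. The sketch below is illustrative only: split_heading is a hypothetical helper, the extended alphabet is an assumption, and the sample headings are made up for the example.

```python
import re

# Assumed broader ticker alphabet: letters, digits, ".", "-", "^", "=".
HEADING_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<symbol>[A-Z0-9.\-^=]+)\)$")


def split_heading(title_text: str):
    """Return (company_name, symbol), or None if the heading has no ticker suffix."""
    m = HEADING_RE.search(title_text.strip())
    if not m:
        return None
    return m.group("name").strip(), m.group("symbol")


for heading in ["Apple Inc. (AAPL)", "Example Holdings (0700.HK)", "No ticker here"]:
    print(heading, "->", split_heading(heading))
```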

Step 5: Normalize everything into one flat dataset

This turns the page into rows you can append every day.

def table_to_dict(df: pd.DataFrame) -> dict:
    return {
        str(row["metric"]).strip(): str(row["value"]).strip()
        for _, row in df.iterrows()
    }


def normalize(symbol: str, html: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    tables = extract_tables(html)

    price_targets = pick_price_target_table(tables)
    rating_table = pick_recommendation_rating_table(tables)
    trend_table = pick_recommendation_trend_table(tables)
    metadata = extract_metadata(html, symbol)

    price_map = table_to_dict(price_targets)
    rating_map = table_to_dict(rating_table)

    summary = pd.DataFrame(
        [
            {
                "symbol": metadata["symbol"],
                "company_name": metadata["company_name"],
                "current_price": price_map.get("Current Price"),
                "target_low": price_map.get("Low Target"),
                "target_mean": price_map.get("Mean Target"),
                "target_high": price_map.get("High Target"),
                "mean_rating": rating_map.get("Mean Rating"),
                "analyst_count": rating_map.get("No. of Analysts"),
            }
        ]
    )

    trend = trend_table.copy()
    trend["symbol"] = metadata["symbol"]
    trend["company_name"] = metadata["company_name"]

    return summary, trend

The result is two useful outputs:

a one-row summary for each ticker
a recommendation trend table that keeps the original buy/hold/sell split
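Because table_to_dict() stringifies every value, the summary columns hold text such as "201.55" rather than numbers. If you want to compute on them, a small converter helps. This is a sketch under assumptions: to_number is our own helper, and the set of “missing” markers is a guess at what the page might show, not a confirmed list.

```python
from typing import Optional

# Assumed placeholders for missing values; extend as needed.
MISSING = {"", "-", "--", "n/a", "na", "nan", "none"}


def to_number(text) -> Optional[float]:
    """Convert a scraped cell such as '1,234.50' to float; None when missing."""
    cleaned = str(text).strip().replace(",", "")
    if cleaned.lower() in MISSING:
        return None
    try:
        return float(cleaned)
    except ValueError:
        # Anything else that is not a plain number (e.g. "12.5%") is treated as missing.
        return None


print(to_number("1,234.50"), to_number("--"))
```

To apply it, you could map it over the numeric columns of the summary, for example summary["target_mean"].map(to_number).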

Step 6: Export to CSV

def scrape_yahoo_analysis(symbol: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    html = fetch_rendered_html(symbol)
    return normalize(symbol, html)


if __name__ == "__main__":
    symbol = "AAPL"
    summary_df, trend_df = scrape_yahoo_analysis(symbol)

    print(summary_df.to_string(index=False))
    print()
    print(trend_df.head().to_string(index=False))

    summary_df.to_csv(f"{symbol.lower()}_analyst_summary.csv", index=False)
    trend_df.to_csv(f"{symbol.lower()}_recommendation_trend.csv", index=False)

    print("saved:", f"{symbol.lower()}_analyst_summary.csv")
    print("saved:", f"{symbol.lower()}_recommendation_trend.csv")

Typical output shape:

symbol company_name current_price target_low target_mean target_high mean_rating analyst_count
  AAPL   Apple Inc.        201.55     170.00      229.34      300.00        2.10            38

And for the trend table:

Period  Strong Buy  Buy  Hold  Sell  Strong Sell symbol company_name
    0m          14   21     3     0            0   AAPL   Apple Inc.
   -1m          15   20     3     0            0   AAPL   Apple Inc.
   -2m          15   19     4     0            0   AAPL   Apple Inc.
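The summary row carries no date, so if you want to append a snapshot every day you need to stamp each row yourself. Here is one way to do that with the standard library. It is a minimal sketch: append_snapshot and the scraped_on column are hypothetical names, and it assumes every run writes the same keys in the same order.

```python
import csv
from datetime import date
from pathlib import Path


def append_snapshot(path, row: dict, scraped_on: date) -> None:
    """Append one summary row to a CSV, stamping it with the scrape date."""
    target = Path(path)
    record = {"scraped_on": scraped_on.isoformat(), **row}
    # Write the header only when the file is new or empty.
    write_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(record))
        if write_header:
            writer.writeheader()
        writer.writerow(record)
```

With the script above, a call could look like append_snapshot("aapl_history.csv", summary_df.iloc[0].to_dict(), date.today()).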

Batch multiple tickers

Once the logic works for one ticker, looping a watchlist is straightforward.

WATCHLIST = ["AAPL", "MSFT", "NVDA", "TSLA", "AMZN"]

all_summaries = []
all_trends = []

for symbol in WATCHLIST:
    try:
        summary_df, trend_df = scrape_yahoo_analysis(symbol)
        all_summaries.append(summary_df)
        all_trends.append(trend_df)
    except Exception as exc:
        print("failed:", symbol, exc)

if all_summaries:
    pd.concat(all_summaries, ignore_index=True).to_csv("watchlist_analyst_summary.csv", index=False)

if all_trends:
    pd.concat(all_trends, ignore_index=True).to_csv("watchlist_recommendation_trend.csv", index=False)

That gives you a compact analyst snapshot across your full list.
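The loop above fires requests back to back and gives up on a ticker after one failure. Since throttling is one of the expected failure modes, some pacing and a bounded retry are worth considering. The helper below is a sketch, not a tuned policy: with_retries is a hypothetical name, and the attempt count and delays are placeholder assumptions.

```python
import random
import time


def with_retries(fn, attempts: int = 3, base_delay: float = 5.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff with up to one second of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
```

In the batch loop, the scrape call would become with_retries(lambda: scrape_yahoo_analysis(symbol)), with a short time.sleep() between tickers to keep the pacing calm.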

What can break?

Yahoo Finance is a moving target, so expect three common failure modes:

Failure	Likely cause	Fix
no tables found	page did not finish rendering	wait longer, check login/cookie banners
wrong table selected	labels changed slightly	loosen the table-matching rules
intermittent access failures	throttling or network instability	use calmer pacing and route through ProxiesAPI

The good news is that the structure of the visible tables tends to be more stable than one-off CSS selectors.
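For the “no tables found” case, it helps to know whether the rendered HTML contains any table at all before handing it to pandas. A quick standard-library check can tell a rendering problem apart from a parsing problem. This is a sketch; count_tables is a hypothetical helper and is not part of the scraper above.

```python
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Counts <table> start tags while the HTML is fed through the parser."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "table":
            self.count += 1


def count_tables(html: str) -> int:
    """Return the number of <table> elements opened in the given HTML."""
    parser = _TableCounter()
    parser.feed(html)
    return parser.count


print(count_tables("<html><body><table><tr><td>x</td></tr></table></body></html>"))
```

If the count is zero, saving the HTML to a file and opening it in a browser usually shows whether a cookie or login banner got in the way.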

Where ProxiesAPI fits

The useful division of labor here is:

Selenium + pandas handle rendering and parsing
ProxiesAPI improves the network path when Yahoo starts getting inconsistent

If you already have a proxy endpoint from ProxiesAPI, set:

export PROXIESAPI_PROXY_URL="http://USER:PASS@proxy.proxiesapi.com:PORT"

Then the browser layer uses it automatically through:

proxy_url = os.getenv("PROXIESAPI_PROXY_URL")
if proxy_url:
    options.add_argument(f"--proxy-server={proxy_url}")

That is the cleanest upgrade path: keep the scraper logic, improve the connection layer.
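One caveat worth checking before you rely on this: Chrome’s --proxy-server flag takes a scheme, host, and port, and does not apply a USER:PASS pair embedded in the URL. If your endpoint requires credentials, they have to be supplied some other way, such as IP allowlisting or whatever authentication mechanism your provider documents. The helper below is a sketch with a hypothetical name (split_proxy_url) and a placeholder host; it only separates the parts so the flag gets a clean value.

```python
from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url: str) -> dict:
    """Separate a proxy URL into a credential-free server string and credentials."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError("proxy URL has no host")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    return {
        "server": server,  # value to pass to --proxy-server
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }


# proxy.example.com is a placeholder host, not a real endpoint.
print(split_proxy_url("http://user:secret@proxy.example.com:8080")["server"])
```

In build_driver(), you would then pass only the "server" value to --proxy-server and handle the username and password separately.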

Final thoughts

Yahoo Finance analyst pages are a good example of a page that looks simple in the browser but behaves like a semi-dynamic app when you try to scrape it at scale.

The pattern that holds up is: