Web Scraping Dynamic Content: How to Handle JavaScript-Rendered Pages

“Why is my scraper returning an empty page?”

If you’ve scraped a few websites, you’ve hit this: you requests.get() a URL, parse the HTML, and… the content isn’t there.

That’s usually because the page is JavaScript-rendered:

  • the server returns a lightweight shell
  • the browser runs JS
  • JS calls APIs (XHR/fetch)
  • the page fills in results after load

This post gives you a practical decision tree for scraping dynamic content without overengineering.

We’ll cover:

  • how to detect JS-rendered pages
  • how to find the underlying API calls (the easiest path)
  • when to scrape “HTML endpoints” instead
  • when to use a headless browser (Playwright)
  • where proxies help (and when they don’t)

When dynamic pages get flaky, ProxiesAPI can help

JS-heavy sites often mean more requests (API calls + assets) and more rate-limiting. ProxiesAPI gives you a proxy layer you can turn on when reliability matters.


Step 1: Identify whether the content is JS-rendered

A fast checklist:

  1. View Page Source in your browser
    • if the data you want is missing, it’s likely rendered by JS
  2. Fetch the page with curl or requests
    • compare the HTML to what you see in the browser
  3. Look for placeholders in HTML
    • empty <div id="app"> or lots of script tags and minimal markup

Quick terminal test

curl -s "https://example.com" | head -n 40

If you don’t see your target data anywhere in the HTML, you have three main paths.
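
If you run this check often, it can be scripted. Here is a minimal sketch, assuming you already have the raw HTML as a string and know a piece of text that is visible in the browser. The function name and the sample HTML are made up for illustration.

```python
import re


def looks_js_rendered(html: str, needle: str) -> bool:
    """Return True if text you can see in the browser is absent from the raw HTML."""
    # Collapse whitespace so line breaks in the markup don't hide a match.
    normalized_html = re.sub(r"\s+", " ", html).lower()
    normalized_needle = re.sub(r"\s+", " ", needle).strip().lower()
    return normalized_needle not in normalized_html


# Hypothetical responses: a JS shell and a server-rendered page.
shell = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'
server_rendered = "<html><body><h1>Acme Laptop 15</h1></body></html>"

print(looks_js_rendered(shell, "Acme Laptop 15"))            # True
print(looks_js_rendered(server_rendered, "Acme Laptop 15"))  # False
```

In practice you would pass in `requests.get(url).text` and a product name or headline copied from the live page.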


The decision tree (what to do next)

Path A (best): scrape the underlying API calls

Most JS-rendered pages get their data from JSON responses fetched in the background.

If you can find the API endpoint that returns the data, you get:

  • faster requests
  • simpler parsing
  • less brittle selectors

How to find it:

  1. Open DevTools → Network tab
  2. Filter by Fetch/XHR
  3. Reload the page
  4. Click requests that look like search, listings, products, graphql, etc.
  5. Check the Response tab

If you see JSON with the fields you want, that’s your target.

Python example: call a JSON API directly

import requests

TIMEOUT = (10, 30)

r = requests.get(
    "https://api.example.com/search",
    params={"q": "laptop", "page": 1},
    headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
    },
    timeout=TIMEOUT,
)

r.raise_for_status()
data = r.json()

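items = data.get("items", [])  # "items" is a placeholder; use the key your API returns
print("items", len(items))
if items:  # avoid an IndexError when the response has no results
    print(items[0])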

If the API requires headers/cookies (common), capture them from DevTools and reproduce.
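
One way to reproduce them: copy the request's Cookie header from DevTools as a single string and split it into a dict. This is a rough sketch; the cookie names, the Referer, and the extra header are hypothetical stand-ins for whatever your target request actually sends.

```python
def cookies_from_header(cookie_header: str) -> dict[str, str]:
    """Turn a raw Cookie header (copied from DevTools) into a name -> value dict."""
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name] = value
    return cookies


# Hypothetical values; copy the real ones from the request's Headers tab.
RAW_COOKIE = "session=abc123; region=us"

headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "Referer": "https://example.com/search",
    "X-Requested-With": "XMLHttpRequest",
}
cookies = cookies_from_header(RAW_COOKIE)
print(cookies)

# With requests installed, both plug straight into the call shown earlier:
# requests.get("https://api.example.com/search", headers=headers,
#              cookies=cookies, timeout=(10, 30))
```

Start with the full set of headers from DevTools, then remove them one at a time to find the minimum the API actually checks.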


Path B: scrape an HTML endpoint (often hidden)

Some sites provide HTML endpoints for:

  • SEO crawlers
  • older clients
  • alternate views

Examples:

  • adding query params like ?output=1
  • using ?render=1
  • switching to an “AMP” or “print” version

How to discover:

  • search the HTML for alternate links (rel="amphtml", canonical)
  • check if the site has a /sitemap.xml
  • look at internal navigation links (sometimes they point to server-rendered pages)

This approach keeps things simple: requests + BeautifulSoup.
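
To illustrate the first discovery step, here is a small sketch that pulls `amphtml` and `canonical` links out of a page. It uses the standard library's `html.parser` so it runs with no dependencies; BeautifulSoup would do the same job. The class name, function name, and sample HTML are invented for the example.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect <link rel="amphtml"> and <link rel="canonical"> tags."""

    WANTED = {"amphtml", "canonical"}

    def __init__(self):
        super().__init__()
        self.links = []  # list of (rel, href) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        # rel can hold several space-separated values.
        for rel in (attr.get("rel") or "").lower().split():
            if rel in self.WANTED and href:
                self.links.append((rel, href))


def find_alternate_links(html: str) -> list[tuple[str, str]]:
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.links


# Hypothetical page head.
sample = """
<html><head>
  <link rel="canonical" href="https://example.com/products/42">
  <link rel="amphtml" href="https://example.com/amp/products/42">
  <link rel="stylesheet" href="/site.css">
</head><body><div id="app"></div></body></html>
"""

for rel, href in find_alternate_links(sample):
    print(rel, href)
```

If an `amphtml` link turns up, fetch that URL with plain requests and check whether your target data is in the response.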


Path C (fallback): use a headless browser

If the site requires JS to render the content and either:

  • hides data behind GraphQL calls with complex signatures, or
  • requires interactions (infinite scroll, button clicks, logged-in flows)

…then headless browser automation is the pragmatic choice.

Playwright is a solid default here: it drives Chromium, Firefox, and WebKit from one API and has official Python bindings.


Playwright example (Python): extract rendered HTML

Install Playwright and a browser first:

pip install playwright
python -m playwright install chromium

Then run the script:

import asyncio
from playwright.async_api import async_playwright

async def main():
    url = "https://example.com/products"

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

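        await page.goto(url, wait_until="networkidle")

        # networkidle can stall on pages that keep polling in the background.
        # Waiting for an element that holds your data is often more reliable:
        # await page.wait_for_selector(".product-card")  # hypothetical selector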

        # If content loads after scroll:
        # await page.mouse.wheel(0, 2000)
        # await page.wait_for_timeout(1000)

        html = await page.content()
        print("html bytes", len(html))

        await browser.close()

asyncio.run(main())

Once you have HTML, parse with BeautifulSoup as usual.


Performance + reliability tradeoffs

A quick comparison:

  • API scraping (XHR/JSON)
    • Pros: fastest, most stable
    • Cons: requires investigation in DevTools
  • HTML endpoint scraping
    • Pros: simple, cheap
    • Cons: might be incomplete or removed
  • Headless browser scraping
    • Pros: most compatible
    • Cons: slowest, most resource-heavy; more moving parts (timeouts, navigation, selectors)

Practical anti-block basics (without overclaiming)

Dynamic sites often mean:

  • more requests per “page” (API calls + assets)
  • stricter rate limits
  • bot detection heuristics

Practical steps that help:

  • set realistic timeouts
  • retry on 429/503 with backoff
  • keep concurrency low (start with 1–3)
  • cache responses (huge for debugging)
  • rotate user agents sparingly (don’t randomize every request)
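
The retry step is the one most worth getting right, so here is a rough sketch of it. It assumes `fetch` is any callable that returns an object with a `status_code` attribute (a `requests.get` wrapper would fit). The function name, defaults, and the fake response class are illustrative, and the `sleep` parameter exists only so the demo runs without waiting.

```python
import random
import time

RETRYABLE = {429, 503}


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on 429/503 with exponential backoff plus jitter.

    Returns the first non-retryable response, or the last response once
    max_attempts is used up.
    """
    for attempt in range(1, max_attempts + 1):
        response = fetch(url)
        if response.status_code not in RETRYABLE or attempt == max_attempts:
            return response
        # 1s, 2s, 4s, ... plus up to 25% random jitter so retries don't sync up.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.25))


class FakeResponse:
    """Stand-in for a real HTTP response, used only in this demo."""

    def __init__(self, status_code):
        self.status_code = status_code


statuses = iter([429, 503, 200])
delays = []  # records the waits instead of actually sleeping
result = fetch_with_backoff(
    lambda u: FakeResponse(next(statuses)),
    "https://example.com/api",
    sleep=delays.append,
)
print(result.status_code, len(delays))  # 200 2
```

With requests, the callable could be `lambda u: requests.get(u, timeout=(10, 30))`. If the server sends a Retry-After header, respecting it is usually better than a computed delay.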

Where ProxiesAPI fits (honestly)

Proxies are not a silver bullet. But they do help in common failure modes:

  • your scraper runs on a schedule (hourly or daily) and starts getting 429s
  • some runs fail because blocking and rate limits vary by region or from run to run
  • your IP gets temporarily throttled after repeated API calls

If you keep proxy usage as a toggle in your fetch layer, you can turn ProxiesAPI on for:

  • scheduled jobs
  • larger URL lists
  • higher page depth

…and keep local dev/prototyping proxy-free.
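
A toggle like that can be as small as one function that builds the keyword arguments for `requests.get()`. This is a sketch under assumptions: the `USE_PROXY` and `PROXY_URL` environment variable names and the proxy address are hypothetical, and the article does not show how ProxiesAPI itself is configured, so take the real connection details from its documentation.

```python
import os


def request_kwargs(use_proxy=None, proxy_url=None, timeout=(10, 30)):
    """Build keyword arguments for requests.get(), with proxies as an opt-in."""
    if use_proxy is None:
        # Hypothetical env toggle: export USE_PROXY=1 for scheduled jobs.
        use_proxy = os.environ.get("USE_PROXY") == "1"

    kwargs = {"timeout": timeout}
    if use_proxy:
        proxy_url = proxy_url or os.environ.get("PROXY_URL")
        if not proxy_url:
            raise ValueError("proxy enabled but no proxy URL configured")
        # requests expects a scheme -> proxy URL mapping.
        kwargs["proxies"] = {"http": proxy_url, "https": proxy_url}
    return kwargs


# Local dev: no proxy.
print(request_kwargs(use_proxy=False))

# Scheduled job: proxy on (placeholder address, not a real endpoint).
print(request_kwargs(use_proxy=True, proxy_url="http://user:pass@proxy.example:8080"))

# Usage with requests installed:
# requests.get("https://api.example.com/search", **request_kwargs())
```

Keeping the decision in one place means the rest of the scraper never needs to know whether a proxy is in use.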


A simple playbook you can reuse

When a page is dynamic:

  1. Try API scraping first (Network → XHR → JSON)
  2. If no obvious API, try an alternate HTML endpoint
  3. If the page needs JS rendering or interactions, use Playwright
  4. Add proxies only when reliability demands it

That sequence keeps your scraper fast, maintainable, and easier to debug.
