Web Scraping Dynamic Content: How to Handle JavaScript-Rendered Pages

“Why is my scraper returning an empty page?”

If you’ve scraped a few websites, you’ve hit this: you call requests.get() on a URL, parse the HTML, and… the content isn’t there.

That’s usually because the page is JavaScript-rendered:

  • the server returns a lightweight shell
  • the browser runs JS
  • JS calls APIs (XHR/fetch)
  • the page fills in results after load

This post gives you a practical decision tree for scraping dynamic content without overengineering.

We’ll cover:

  • how to detect JS-rendered pages
  • how to find the underlying API calls (the easiest path)
  • when to scrape “HTML endpoints” instead
  • when to use a headless browser (Playwright)
  • where proxies help (and when they don’t)

When dynamic pages get flaky, ProxiesAPI can help

JS-heavy sites often mean more requests (API calls + assets) and more rate-limiting. ProxiesAPI gives you a proxy layer you can turn on when reliability matters.


Step 1: Identify whether the content is JS-rendered

A fast checklist:

  1. View Page Source in your browser
    • if the data you want is missing, it’s likely rendered by JS
  2. curl or requests the page
    • compare the HTML to what you see in the browser
  3. Look for placeholders in HTML
    • empty <div id="app"> or lots of script tags and minimal markup

Quick terminal test

curl -s "https://example.com" | head -n 40

If you don’t see your target data anywhere in the HTML, you have three main paths.
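The checklist above can be turned into a reusable heuristic. This is a sketch, not a definitive test: the `app`/`root` ids and the script-count threshold are assumptions you should tune per site.

```python
import re

def looks_js_rendered(html: str, expected_text: str) -> bool:
    """Heuristic: the page is probably JS-rendered if the data you
    expect is missing and the markup looks like an empty SPA shell."""
    if expected_text in html:
        return False  # data is server-rendered; plain requests will work
    # Common SPA shells: an empty root div plus many script tags
    empty_shell = re.search(r'<div[^>]+id=["\'](app|root)["\']>\s*</div>', html)
    script_count = html.count("<script")
    return bool(empty_shell) or script_count > 10

# A typical SPA shell:
shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
print(looks_js_rendered(shell, "MacBook Pro"))  # True
```

Run it against the output of `curl` or `requests.get` before deciding which path below to take.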


The decision tree (what to do next)

Path A (best): scrape the underlying API calls

Most JS pages are powered by JSON responses.

If you can find the API endpoint that returns the data, you get:

  • faster requests
  • simpler parsing
  • less brittle selectors

How to find it:

  1. Open DevTools → Network tab
  2. Filter by Fetch/XHR
  3. Reload the page
  4. Click requests that look like search, listings, products, graphql, etc.
  5. Check the Response tab

If you see JSON with the fields you want, that’s your target.

Python example: call a JSON API directly

import requests

TIMEOUT = (10, 30)

r = requests.get(
    "https://api.example.com/search",
    params={"q": "laptop", "page": 1},
    headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
    },
    timeout=TIMEOUT,
)

r.raise_for_status()
data = r.json()

items = data.get("items", [])
print("items:", len(items))
if items:
    print(items[0])

If the API requires specific headers or cookies (common), capture them from the request in DevTools and reproduce them in your code.
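A minimal sketch of reproducing what you captured, using a requests.Session so the headers and cookies ride along on every call. Every value below is a placeholder — copy the real ones from the request’s Headers tab in DevTools.

```python
import requests

def build_session() -> requests.Session:
    """Session pre-loaded with headers/cookies captured from DevTools.
    All values here are placeholders, not real requirements."""
    s = requests.Session()
    s.headers.update({
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        "Referer": "https://example.com/search",  # some APIs check this
        "X-Requested-With": "XMLHttpRequest",     # common on XHR endpoints
    })
    # Cookie captured from DevTools (placeholder value)
    s.cookies.set("session_id", "PASTE_FROM_DEVTOOLS")
    return s

session = build_session()
# session.get("https://api.example.com/search", params={"q": "laptop"}, timeout=(10, 30))
```

Tip: right-click the request in DevTools → “Copy as cURL” gives you every header in one go.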


Path B: scrape an HTML endpoint (often hidden)

Some sites provide HTML endpoints for:

  • SEO crawlers
  • older clients
  • alternate views

Examples:

  • adding query params like ?output=1
  • using ?render=1
  • switching to an “AMP” or “print” version

How to discover:

  • search the HTML for alternate links (rel="amphtml", canonical)
  • check if the site has a /sitemap.xml
  • look at internal navigation links (sometimes they point to server-rendered pages)

This approach keeps things simple: requests + BeautifulSoup.
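As a sketch of the discovery step, here is how you might pull alternate links out of the &lt;head&gt; with BeautifulSoup (pip install beautifulsoup4). The URLs are made up for the demo.

```python
from bs4 import BeautifulSoup

def find_alternate_urls(html: str) -> dict:
    """Collect server-rendered alternates advertised in the <head>."""
    soup = BeautifulSoup(html, "html.parser")
    out = {}
    amp = soup.find("link", rel="amphtml")
    if amp and amp.get("href"):
        out["amp"] = amp["href"]
    canonical = soup.find("link", rel="canonical")
    if canonical and canonical.get("href"):
        out["canonical"] = canonical["href"]
    return out

html = '''<head>
  <link rel="canonical" href="https://example.com/products/42">
  <link rel="amphtml" href="https://example.com/amp/products/42">
</head>'''
print(find_alternate_urls(html))
```

If an AMP or canonical URL turns up, fetch it with plain requests and check whether the data is present there.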


Path C (fallback): use a headless browser

If the site:

  • requires JS to render the content and
  • hides data behind GraphQL calls with complex signatures or
  • requires interactions (infinite scroll, button clicks, logged-in flows)

…then headless browser automation is the pragmatic choice.

The best tool today is Playwright.


Playwright example (Python): extract rendered HTML

pip install playwright
python -m playwright install chromium

import asyncio
from playwright.async_api import async_playwright

async def main():
    url = "https://example.com/products"

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        await page.goto(url, wait_until="networkidle")

        # If content loads after scroll:
        # await page.mouse.wheel(0, 2000)
        # await page.wait_for_timeout(1000)

        html = await page.content()
        print("html length:", len(html))

        await browser.close()

asyncio.run(main())

Once you have HTML, parse with BeautifulSoup as usual.


Performance + reliability tradeoffs

A quick comparison:

  • API scraping (XHR/JSON)

    • Fastest, most stable
    • Requires investigation in DevTools
  • HTML endpoint scraping

    • Simple, cheap
    • Might be incomplete or removed
  • Headless browser scraping

    • Most compatible
    • Slowest, most resource-heavy
    • More moving parts (timeouts, navigation, selectors)

Practical anti-block basics (without overclaiming)

Dynamic sites often mean:

  • more requests per “page” (API calls + assets)
  • stricter rate limits
  • bot detection heuristics

Practical steps that help:

  • set realistic timeouts
  • retry on 429/503 with backoff
  • keep concurrency low (start with 1–3)
  • cache responses (huge for debugging)
  • rotate user agents sparingly (don’t randomize every request)
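The retry-on-429/503 bullet can be sketched like this. The stub response class exists only for the demo — in real code, `fetch` would wrap a requests.get call.

```python
import random
import time

RETRYABLE = {429, 503}

def fetch_with_backoff(fetch, max_retries=4, base_delay=1.0):
    """Call fetch() (anything returning an object with .status_code),
    retrying on 429/503 with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        resp = fetch()
        if resp.status_code not in RETRYABLE:
            return resp
        if attempt == max_retries:
            return resp  # out of retries; let the caller decide
        delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        time.sleep(delay)

# Demo with a stub that fails twice, then succeeds:
class _Resp:
    def __init__(self, code):
        self.status_code = code

codes = iter([429, 503, 200])
resp = fetch_with_backoff(lambda: _Resp(next(codes)), base_delay=0.01)
print(resp.status_code)  # 200
```

Jitter matters: without it, a fleet of workers retries in lockstep and hits the same rate limit again.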

Where ProxiesAPI fits (honestly)

Proxies are not a silver bullet. But they do help in common failure modes:

  • your tracker runs every hour/day and starts getting 429s
  • some runs fail due to regional/rate-limit variability
  • your IP gets temporarily throttled after repeated API calls

If you keep proxy usage as a toggle in your fetch layer, you can turn ProxiesAPI on for:

  • scheduled jobs
  • larger watchlists
  • higher page depth

…and keep local dev/prototyping proxy-free.
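One way to keep that toggle in your fetch layer — a sketch, assuming your provider hands you a single proxy URL ("http://user:pass@host:port" is a typical shape; check your ProxiesAPI dashboard for the exact value).

```python
import os
import requests

def build_request_kwargs(use_proxy: bool, proxy_url=None) -> dict:
    """Assemble requests kwargs; proxies are a toggle, not a rewrite."""
    kwargs = {"timeout": (10, 30)}
    if use_proxy and proxy_url:
        kwargs["proxies"] = {"http": proxy_url, "https": proxy_url}
    return kwargs

def fetch(url: str, use_proxy: bool = False) -> requests.Response:
    proxy_url = os.environ.get("PROXY_URL")  # keep credentials out of code
    return requests.get(url, **build_request_kwargs(use_proxy, proxy_url))

# Local dev:       fetch(url)              -> direct, proxy-free
# Scheduled job:   fetch(url, use_proxy=True)
print(build_request_kwargs(False))
```

Because the toggle is a single boolean, switching a scheduled job onto the proxy layer is a config change, not a code change.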


A simple playbook you can reuse

When a page is dynamic:

  1. Try API scraping first (Network → XHR → JSON)
  2. If no obvious API, try an alternate HTML endpoint
  3. If interactions are required, use Playwright
  4. Add proxies only when reliability demands it

That sequence keeps your scraper fast, maintainable, and easier to debug.


Related guides

Web Scraping Tools: The 2026 Buyer’s Guide (What to Use When)
A practical 2026 buyer’s guide to web scraping tools: no-code extractors, browser automation, scraping frameworks, and hosted APIs — plus how proxies fit into a reliable stack.
How to Scrape Data Without Getting Blocked: A Practical Playbook
A no-fluff anti-blocking guide: rate limits, fingerprints, retries/backoff, header hygiene, caching, and when proxy rotation (ProxiesAPI) is the simplest fix. Includes comparison tables and checklists.
Web Scraping with JavaScript and Node.js: Full Tutorial (2026)
A modern Node.js scraping toolkit: fetch + parse with Cheerio, render JS sites with Playwright, add retries/backoff, and integrate ProxiesAPI for proxy rotation. Includes comparison table and production checklists.