Web Scraping Dynamic Content: How to Handle JavaScript-Rendered Pages
“Why is my scraper returning an empty page?”
If you’ve scraped a few websites, you’ve hit this: you requests.get() a URL, parse the HTML, and… the content isn’t there.
That’s usually because the page is JavaScript-rendered:
- the server returns a lightweight shell
- the browser runs JS
- JS calls APIs (XHR/fetch)
- the page fills in results after load
This post gives you a practical decision tree to handle web scraping dynamic content without overengineering.
We’ll cover:
- how to detect JS-rendered pages
- how to find the underlying API calls (the easiest path)
- when to scrape “HTML endpoints” instead
- when to use a headless browser (Playwright)
- where proxies help (and when they don’t)
JS-heavy sites often mean more requests (API calls + assets) and more rate-limiting. ProxiesAPI gives you a proxy layer you can turn on when reliability matters.
Step 1: Identify whether the content is JS-rendered
A fast checklist:
- View Page Source in your browser
- if the data you want is missing, it’s likely rendered by JS
- curl or requests the page, then compare the HTML to what you see in the browser
- Look for placeholders in the HTML
- an empty <div id="app">, lots of script tags, and minimal markup
Quick terminal test
curl -s "https://example.com" | head -n 40
If you don’t see your target data anywhere in the HTML, you have three main paths.
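The curl test above can also be scripted. A minimal sketch (the URL and marker are placeholders — the marker is any text you can see on the live page in your browser, like a product name):

```python
import requests

def looks_js_rendered(html: str, marker: str) -> bool:
    """True if a marker you can see in the browser is missing from the
    raw HTML -- a strong hint the content is rendered by JavaScript."""
    return marker not in html

def check(url: str, marker: str) -> bool:
    """Fetch the page without executing JS, then run the check."""
    html = requests.get(
        url, headers={"User-Agent": "Mozilla/5.0"}, timeout=(10, 30)
    ).text
    return looks_js_rendered(html, marker)
```

If `check("https://example.com/products", "laptop")` returns True, the data you want isn't in the server response, and you're in decision-tree territory.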
The decision tree (what to do next)
Path A (best): scrape the underlying API calls
Most JS pages are powered by JSON responses.
If you can find the API endpoint that returns the data, you get:
- faster requests
- simpler parsing
- less brittle selectors
How to find it:
- Open DevTools → Network tab
- Filter by Fetch/XHR
- Reload the page
- Click requests that look like
search,listings,products,graphql, etc. - Check the Response tab
If you see JSON with the fields you want, that’s your target.
Python example: call a JSON API directly
import requests

TIMEOUT = (10, 30)

r = requests.get(
    "https://api.example.com/search",
    params={"q": "laptop", "page": 1},
    headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
    },
    timeout=TIMEOUT,
)
r.raise_for_status()

data = r.json()
items = data.get("items", [])
print("items", len(items))
if items:
    print(items[0])
If the API requires headers/cookies (common), capture them from DevTools and reproduce.
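A minimal way to reproduce the browser's request context is a requests.Session preloaded with the headers and cookies you captured. The values below are hypothetical placeholders — copy the real ones from the request's Headers pane in DevTools (header names and cookie names vary per site):

```python
import requests

def build_api_session(session_cookie: str) -> requests.Session:
    """Reproduce the browser's request context for a JSON API.

    Every header/cookie value here is a placeholder -- replace with the
    actual values captured from DevTools for your target site.
    """
    s = requests.Session()
    s.headers.update({
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
        "Referer": "https://example.com/search",  # some APIs check this
        "X-Requested-With": "XMLHttpRequest",     # common XHR marker
    })
    s.cookies.set("sessionid", session_cookie)    # cookie name varies per site
    return s
```

Then call the API through the session, e.g. `build_api_session(cookie).get(url, params=..., timeout=(10, 30))`, so every request carries the same context.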
Path B: scrape an HTML endpoint (often hidden)
Some sites provide HTML endpoints for:
- SEO crawlers
- older clients
- alternate views
Examples:
- adding query params like ?output=1 or ?render=1
- switching to an “AMP” or “print” version
How to discover:
- search the HTML for alternate links (rel="amphtml", canonical)
- check if the site has a /sitemap.xml
- look at internal navigation links (sometimes they point to server-rendered pages)
This approach keeps things simple: requests + BeautifulSoup.
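Once you've found a server-rendered endpoint, parsing is plain BeautifulSoup. A sketch with hypothetical selectors (adjust .product / .title / .price to the actual markup of the page you found):

```python
from bs4 import BeautifulSoup

def parse_listings(html: str) -> list[dict]:
    """Parse a server-rendered listings page.

    The CSS selectors are hypothetical examples -- inspect the real
    HTML endpoint and adjust them to its markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for card in soup.select(".product"):
        title = card.select_one(".title")
        price = card.select_one(".price")
        items.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return items
```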
Path C (fallback): use a headless browser
If the site:
- requires JS to render the content and
- hides data behind GraphQL calls with complex signatures or
- requires interactions (infinite scroll, button clicks, logged-in flows)
…then headless browser automation is the pragmatic choice.
The best tool today is Playwright.
Playwright example (Python): extract rendered HTML
pip install playwright
python -m playwright install chromium
import asyncio
from playwright.async_api import async_playwright

async def main():
    url = "https://example.com/products"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="networkidle")

        # If content loads after scroll:
        # await page.mouse.wheel(0, 2000)
        # await page.wait_for_timeout(1000)

        html = await page.content()
        print("html bytes", len(html))
        await browser.close()

asyncio.run(main())
Once you have HTML, parse with BeautifulSoup as usual.
Performance + reliability tradeoffs
A quick comparison:
- API scraping (XHR/JSON)
  - Fastest, most stable
  - Requires investigation in DevTools
- HTML endpoint scraping
  - Simple, cheap
  - Might be incomplete or removed
- Headless browser scraping
  - Most compatible
  - Slowest, most resource-heavy
  - More moving parts (timeouts, navigation, selectors)
Practical anti-block basics (without overclaiming)
Dynamic sites often mean:
- more requests per “page” (API calls + assets)
- stricter rate limits
- bot detection heuristics
Practical steps that help:
- set realistic timeouts
- retry on 429/503 with backoff
- keep concurrency low (start with 1–3)
- cache responses (huge for debugging)
- rotate user agents sparingly (don’t randomize every request)
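The retry-with-backoff step can be sketched in a few lines. This is a minimal version: the delay doubles each attempt, plus a little jitter so concurrent workers don't retry in lockstep:

```python
import random
import time
import requests

RETRY_STATUSES = {429, 503}

def backoff_delays(max_tries: int, base: float = 1.0) -> list[float]:
    """Exponential schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(max_tries)]

def fetch_with_backoff(url: str, *, session=None, max_tries: int = 4,
                       **kwargs) -> requests.Response:
    """GET with retries on 429/503, sleeping through the schedule + jitter."""
    sess = session or requests.Session()
    for delay in backoff_delays(max_tries):
        resp = sess.get(url, timeout=(10, 30), **kwargs)
        if resp.status_code not in RETRY_STATUSES:
            return resp
        time.sleep(delay + random.uniform(0, 0.5))  # jitter avoids lockstep
    return resp
```

Caching sits naturally on top of this: wrap `fetch_with_backoff` and write responses to disk keyed by URL, so re-running during debugging doesn't burn requests.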
Where ProxiesAPI fits (honestly)
Proxies are not a silver bullet. But they do help in common failure modes:
- your tracker runs every hour/day and starts getting 429s
- some runs fail due to regional/rate-limit variability
- your IP gets temporarily throttled after repeated API calls
If you keep proxy usage as a toggle in your fetch layer, you can turn ProxiesAPI on for:
- scheduled jobs
- larger watchlists
- higher page depth
…and keep local dev/prototyping proxy-free.
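One way to keep that toggle: a single fetch entry point that routes either direct or through the proxy layer, controlled by an environment variable. The endpoint and parameter names below are assumptions in the style of ProxiesAPI-like services — confirm the exact API with your provider's docs:

```python
import os
import requests

# Assumed endpoint/parameter names -- verify against your provider's docs.
PROXY_ENDPOINT = "http://api.proxiesapi.com"

def route(url: str, use_proxy: bool, auth_key: str = "") -> tuple[str, dict]:
    """Decide whether a request goes direct or through the proxy layer."""
    if use_proxy:
        return PROXY_ENDPOINT, {"auth_key": auth_key, "url": url}
    return url, {}

def fetch(url: str, **kwargs) -> requests.Response:
    """Single fetch entry point: toggle with the USE_PROXY env var."""
    use_proxy = os.environ.get("USE_PROXY") == "1"
    target, params = route(url, use_proxy,
                           os.environ.get("PROXIESAPI_KEY", ""))
    return requests.get(target, params=params, timeout=(10, 60), **kwargs)
```

Local runs leave USE_PROXY unset and hit sites directly; scheduled jobs export USE_PROXY=1 and the same code routes through the proxy.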
A simple playbook you can reuse
When a page is dynamic:
- Try API scraping first (Network → XHR → JSON)
- If no obvious API, try an alternate HTML endpoint
- If interactions are required, use Playwright
- Add proxies only when reliability demands it
That sequence keeps your scraper fast, maintainable, and easier to debug.