Web Scraping Dynamic Content: 5 Reliable Ways to Handle JavaScript-Rendered Pages
If you’ve ever run a scraper and gotten back an HTML page with… basically nothing in it, you’ve met the problem:
dynamic content.
In 2026, many sites render key data with JavaScript:
- product lists
- availability/pricing
- reviews
- infinite scroll feeds
This post is a practical playbook for web scraping dynamic content without cargo-culting headless browsers for everything.
We’ll cover:
- how to detect JS-rendered content
- 5 reliable strategies (from simplest to heaviest)
- when each strategy wins
- practical code examples
Even when the content is rendered by JavaScript, you still have to fetch multiple endpoints reliably (HTML, XHR JSON, assets). ProxiesAPI gives you a stable fetch primitive you can apply to both page HTML and API calls.
First: how to detect a JavaScript-rendered page
Before you pick a tool, do a 30-second diagnosis.
1) View Source vs Inspect Element
- View Source shows the raw HTML returned by the server.
- Inspect Element shows the DOM after JavaScript runs.
If “Inspect” shows lots of items but “View Source” doesn’t, the content is likely rendered client-side.
2) Quick curl test
curl -s "https://example.com" | head -n 30
If you don’t see the data you expect (product names, prices, etc.), it’s probably dynamic.
3) Network tab: XHR/Fetch calls
Open DevTools → Network → filter by Fetch/XHR.
If you see JSON responses containing your target data, you’re in luck: you can often scrape the API directly.
Strategy 1: Scrape the underlying JSON/XHR endpoint (best default)
If the site fetches data via an API call, you can usually:
- reproduce the request (URL + headers + params)
- parse JSON directly
- avoid browser automation entirely
Example: mimic an XHR JSON call
import requests

TIMEOUT = (10, 30)

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
    "Accept": "application/json,text/plain,*/*",
})

def fetch_json(url: str) -> dict:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r.json()

# Replace with the real XHR URL you find in DevTools.
# data = fetch_json("https://example.com/api/search?q=shoes&page=1")
# print(data.keys())
Pros
- fastest + most reliable
- structured data (no messy HTML parsing)
- easier to paginate
Cons
- endpoints may require auth tokens
- signatures/anti-bot can exist
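If the endpoint does want a token, you can often copy it straight out of the request headers in DevTools and attach it to the session from the example above. A minimal sketch; the header names and values below are placeholders, not something every site expects:

# Placeholders: copy the real header names/values from the XHR request
# in DevTools (Network tab -> the request -> Request Headers).
session.headers.update({
    "Authorization": "Bearer PASTE_TOKEN_FROM_DEVTOOLS",
    "Referer": "https://example.com/search",
})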
Strategy 2: Reverse-engineer pagination + filters (keep it API-first)
Dynamic sites often paginate via:
- page numbers like page=2
- cursors like cursor=abc123
- offsets like offset=40
Once you find the request shape, implement a crawler that:
- iterates pages/cursors
- dedupes results
- respects rate limits
from time import sleep

def crawl_pages(base_url: str, pages: int = 5) -> list[dict]:
    out = []
    for p in range(1, pages + 1):
        url = f"{base_url}&page={p}"
        data = fetch_json(url)
        # TODO: adapt to your endpoint structure
        items = data.get("items") or []
        out.extend(items)
        print("page", p, "items", len(items), "total", len(out))
        sleep(0.5)
    return out
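The sketch above assumes page numbers. Cursor-based endpoints look slightly different, and this is also where dedupe comes in. A hedged variant reusing fetch_json and sleep from above; the field names (items, next_cursor, id) are assumptions you’d adapt to your endpoint:

def crawl_cursor(base_url: str, max_pages: int = 10) -> list[dict]:
    # Field names ("items", "next_cursor", "id") are assumptions;
    # adapt them to whatever your endpoint actually returns.
    seen_ids = set()
    out = []
    cursor = None
    for _ in range(max_pages):
        url = base_url if cursor is None else f"{base_url}&cursor={cursor}"
        data = fetch_json(url)
        for item in data.get("items") or []:
            item_id = item.get("id")
            if item_id in seen_ids:
                continue  # dedupe across pages
            seen_ids.add(item_id)
            out.append(item)
        cursor = data.get("next_cursor")
        if not cursor:
            break
        sleep(0.5)
    return out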
This approach scales better than “open a browser and scroll.”
Strategy 3: Use a headless browser (Playwright) when you must
Sometimes:
- the API endpoints are heavily protected
- data is assembled from multiple calls
- the page uses complex runtime rendering
That’s when a browser automation tool like Playwright is appropriate.
Minimal Playwright example (Python)
pip install playwright
playwright install
from playwright.sync_api import sync_playwright

def get_rendered_html(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html

# html = get_rendered_html("https://example.com")
# print(len(html))
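networkidle is a blunt instrument. If you know a selector that only appears once the data has rendered, waiting for it is usually faster and less flaky. A sketch under that assumption; the selector is a placeholder:

from playwright.sync_api import sync_playwright

def get_html_when_ready(url: str, selector: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # Wait for an element that only exists after the JS has rendered
        # the data you care about (e.g. a product card).
        page.wait_for_selector(selector, timeout=30_000)
        html = page.content()
        browser.close()
    return html

# html = get_html_when_ready("https://example.com", ".product-card")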
Pros
- most “human-like” rendering
- works when there’s no clean API
Cons
- slower, heavier, more expensive
- more moving parts (browser installs, timeouts)
Strategy 4: Hybrid approach (browser to discover API, Requests to crawl)
This is the move that most “serious” scrapers end up using:
- open the page in Playwright
- capture the XHR requests that contain data
- extract the real API URL + headers
- switch back to Requests for the bulk crawl
Why it’s great:
- you use the browser only for discovery
- your crawler stays fast and cheap
Conceptually:
Playwright (1 time) -> discover endpoint + tokens
Requests (N times) -> crawl pages, parse JSON
Playwright can listen to network responses; once you have the endpoint, you can implement the crawler with the retry/timeouts patterns from your Requests stack.
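A minimal sketch of the discovery step: load the page once and record which responses came back as JSON. The function name and the content-type filter are our choices, not a fixed recipe:

from playwright.sync_api import sync_playwright

def discover_json_endpoints(url: str) -> list[str]:
    found = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        def on_response(response):
            # Keep anything that looks like a JSON API response.
            ctype = response.headers.get("content-type", "")
            if "application/json" in ctype:
                found.append(response.url)

        page.on("response", on_response)
        page.goto(url, wait_until="networkidle")
        browser.close()
    return found

# endpoints = discover_json_endpoints("https://example.com")
# for u in endpoints:
#     print(u)

From there, pick the endpoint that carries your data and crawl it with the Requests code from Strategies 1 and 2.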
Strategy 5: Last resort techniques (when the site fights back)
If you’re dealing with aggressive anti-bot measures, you may need to combine:
- session/cookie management
- realistic headers
- rate limiting
- multiple fetch strategies
And importantly: detect bot pages.
A practical heuristic:
- if the HTML contains “enable JavaScript”, “are you a robot”, or CAPTCHA markers, treat the page as blocked and don’t feed it into your parser
def looks_blocked(html: str) -> bool:
    markers = [
        "captcha",
        "are you a robot",
        "enable javascript",
    ]
    h = (html or "").lower()
    return any(m in h for m in markers)
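Putting those pieces together, a hedged retry wrapper that reuses session and TIMEOUT from Strategy 1, backs off between attempts, and refuses to hand blocked pages to your parser:

import time
import requests

def fetch_html_with_retry(url: str, attempts: int = 3) -> str | None:
    for attempt in range(1, attempts + 1):
        try:
            r = session.get(url, timeout=TIMEOUT)
            r.raise_for_status()
            if not looks_blocked(r.text):
                return r.text
            print("blocked response, backing off:", url)
        except requests.RequestException as exc:
            print("attempt", attempt, "failed:", exc)
        time.sleep(2 ** attempt)  # exponential backoff: 2s, 4s, 8s
    return None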
Decision table: which strategy should you use?
| Situation | Best strategy |
|---|---|
| Data is in XHR JSON | 1) scrape the API |
| API paginates cleanly | 2) reverse-engineer pagination |
| No usable API, content only after JS | 3) Playwright |
| You can discover API via browser | 4) Hybrid |
| Heavy anti-bot | 5) Last resort combo |
Where ProxiesAPI fits (honestly)
Dynamic scraping often means you’re fetching more than one thing:
- the initial HTML
- one or more JSON endpoints
- detail endpoints
Even if you use Playwright for rendering, your pipeline usually still includes plain HTTP calls for scale.
ProxiesAPI can help by giving you a consistent fetch wrapper for both HTML and JSON endpoints:
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com/api/search?page=1" | head
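The same call from Python, letting requests URL-encode the target URL for you, which matters once the target has its own query string. The key and url parameter names mirror the curl example above; check the docs for your account for the exact parameters:

import requests

def fetch_via_proxiesapi(target_url: str, api_key: str) -> str:
    # Passing the target as a param lets requests percent-encode it,
    # so nested query strings (?page=1 etc.) survive intact.
    r = requests.get(
        "http://api.proxiesapi.com/",
        params={"key": api_key, "url": target_url},
        timeout=(10, 30),
    )
    r.raise_for_status()
    return r.text

# html = fetch_via_proxiesapi("https://example.com/api/search?page=1", "API_KEY")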
In practice:
- use API-first strategies when possible
- reserve browsers for discovery or truly browser-only pages
- keep retries/timeouts/dedupe regardless of approach
Quick checklist
- confirm it’s dynamic (View Source vs Inspect)
- look for XHR JSON endpoints first
- implement pagination + dedupe
- add retries + timeouts
- use Playwright only when necessary