Web Scraping Dynamic Content: 5 Reliable Ways to Handle JavaScript-Rendered Pages

If you’ve ever run a scraper and got back an HTML page with… basically nothing in it, you’ve met the problem:

dynamic content.

In 2026, many sites render key data with JavaScript:

  • product lists
  • availability/pricing
  • reviews
  • infinite scroll feeds

This post is a practical playbook for web scraping dynamic content without cargo-culting headless browsers for everything.

We’ll cover:

  1. how to detect JS-rendered content
  2. 5 reliable strategies (from simplest to heaviest)
  3. when each strategy wins
  4. practical code examples


Handle dynamic pages more reliably with ProxiesAPI

Even when the content is rendered by JavaScript, you still have to fetch multiple endpoints reliably (HTML, XHR JSON, assets). ProxiesAPI gives you a stable fetch primitive you can apply to both page HTML and API calls.


First: how to detect a JavaScript-rendered page

Before you pick a tool, do a 30-second diagnosis.

1) View Source vs Inspect Element

  • View Source shows the raw HTML returned by the server.
  • Inspect Element shows the DOM after JavaScript runs.

If “Inspect” shows lots of items but “View Source” doesn’t, the content is likely rendered client-side.

2) Quick curl test

curl -s "https://example.com" | head -n 30

If you don’t see the data you expect (product names, prices, etc.), it’s probably dynamic.

3) Network tab: XHR/Fetch calls

Open DevTools → Network → filter by Fetch/XHR.

If you see JSON responses containing your target data, you’re in luck: you can often scrape the API directly.


Strategy 1: Scrape the underlying JSON/XHR endpoint (best default)

If the site fetches data via an API call, you can usually:

  • reproduce the request (URL + headers + params)
  • parse JSON directly
  • avoid browser automation entirely

Example: mimic an XHR JSON call

import requests

TIMEOUT = (10, 30)

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
    "Accept": "application/json,text/plain,*/*",
})


def fetch_json(url: str) -> dict:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r.json()

# Replace with the real XHR URL you find in DevTools.
# data = fetch_json("https://example.com/api/search?q=shoes&page=1")
# print(data.keys())

Pros

  • fastest + most reliable
  • structured data (no messy HTML parsing)
  • easier to paginate

Cons

  • endpoints may require auth tokens
  • signatures/anti-bot can exist
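
If the endpoint requires a token, you can often replay the headers you see on the captured request in DevTools (the values below are hypothetical; copy the real ones from the request):

# Hypothetical: replay auth headers captured from DevTools.
session.headers.update({
    "Authorization": "Bearer YOUR_TOKEN_HERE",  # copy from the captured XHR
    "Referer": "https://example.com/search",
})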

Strategy 2: Reverse-engineer pagination + filters (keep it API-first)

Dynamic sites often paginate via:

  • page=2
  • cursors like cursor=abc123
  • offsets like offset=40

Once you find the request shape, implement a crawler that:

  • iterates pages/cursors
  • dedupes results
  • respects rate limits (see the sketch below)

from time import sleep


def crawl_pages(base_url: str, pages: int = 5) -> list[dict]:
    out = []
    seen = set()  # dedupe key set; assumes each item carries an "id" field
    for p in range(1, pages + 1):
        url = f"{base_url}&page={p}"
        data = fetch_json(url)
        # TODO: adapt to your endpoint structure
        items = data.get("items") or []
        for item in items:
            key = item.get("id")
            if key is not None and key in seen:
                continue  # already collected on an earlier page
            if key is not None:
                seen.add(key)
            out.append(item)
        print("page", p, "items", len(items), "total", len(out))
        sleep(0.5)
    return out
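
Cursor-based pagination is similar; here is a sketch assuming the response exposes a "nextCursor" field (the key name is hypothetical; check your JSON):

def crawl_cursor(base_url: str, max_pages: int = 5) -> list[dict]:
    out = []
    cursor = None
    for _ in range(max_pages):
        url = base_url if cursor is None else f"{base_url}&cursor={cursor}"
        data = fetch_json(url)
        out.extend(data.get("items") or [])
        cursor = data.get("nextCursor")  # hypothetical key name
        if not cursor:
            break  # no more pages
        sleep(0.5)
    return out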

This approach scales better than “open a browser and scroll.”


Strategy 3: Use a headless browser (Playwright) when you must

Sometimes:

  • the API endpoints are heavily protected
  • data is assembled from multiple calls
  • the page uses complex runtime rendering

That’s when a browser automation tool like Playwright is appropriate.

Minimal Playwright example (Python)

pip install playwright
playwright install

from playwright.sync_api import sync_playwright


def get_rendered_html(url: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html

# html = get_rendered_html("https://example.com")
# print(len(html))
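
One caveat: on pages that keep long-polling, "networkidle" may never fire. Waiting for the element you actually need is often more robust (a sketch; ".product-card" is a placeholder selector):

def get_rendered_html_waiting(url: str, selector: str = ".product-card") -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="domcontentloaded")
        # Wait for the data you care about instead of network silence.
        page.wait_for_selector(selector, timeout=15_000)
        html = page.content()
        browser.close()
        return html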

Pros

  • most “human-like” rendering
  • works when there’s no clean API

Cons

  • slower, heavier, more expensive
  • more moving parts (browser installs, timeouts)

Strategy 4: Hybrid approach (browser to discover API, Requests to crawl)

This is the move that most “serious” scrapers end up using:

  1. open the page in Playwright
  2. capture the XHR requests that contain data
  3. extract the real API URL + headers
  4. switch back to Requests for the bulk crawl

Why it’s great:

  • you use the browser only for discovery
  • your crawler stays fast and cheap

Conceptually:

Playwright (1 time) -> discover endpoint + tokens
Requests (N times)  -> crawl pages, parse JSON

Playwright can listen to network responses; once you have the endpoint, you can implement the crawler with the retry/timeouts patterns from your Requests stack.
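
A minimal discovery sketch: record JSON responses while the page loads, then inspect the captured URLs by hand (this assumes the data endpoint returns a JSON content type):

from playwright.sync_api import sync_playwright


def discover_api_calls(url: str) -> list[str]:
    captured = []

    def on_response(response):
        # Keep anything that looks like a data endpoint.
        if "application/json" in response.headers.get("content-type", ""):
            captured.append(response.url)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.on("response", on_response)
        page.goto(url, wait_until="networkidle")
        browser.close()
    return captured

# candidates = discover_api_calls("https://example.com")
# print("\n".join(candidates))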


Strategy 5: Last resort techniques (when the site fights back)

If you’re dealing with aggressive anti-bot measures, you may need to combine:

  • session/cookie management
  • realistic headers
  • rate limiting
  • multiple fetch strategies

And importantly: detect block pages.

A practical heuristic: if the HTML contains markers like “enable JavaScript”, “are you a robot”, or CAPTCHA strings, treat the page as blocked and don’t feed it into your parser.

def looks_blocked(html: str) -> bool:
    markers = [
        "captcha",
        "are you a robot",
        "enable javascript",
    ]
    h = (html or "").lower()
    return any(m in h for m in markers)
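
You can tie this together with a retry loop that treats blocked pages as failures (a sketch reusing the session and TIMEOUT from Strategy 1):

import time


def fetch_html_with_retries(url: str, attempts: int = 3) -> str | None:
    for i in range(attempts):
        try:
            r = session.get(url, timeout=TIMEOUT)
            r.raise_for_status()
            if not looks_blocked(r.text):
                return r.text
        except requests.RequestException:
            pass  # network error; retry below
        time.sleep(2 ** i)  # simple exponential backoff
    return None  # give up; never feed a block page to your parser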

Decision table: which strategy should you use?

Situation                               Best strategy
Data is in XHR JSON                     1) scrape the API
API paginates cleanly                   2) reverse-engineer pagination
No usable API, content only after JS    3) Playwright
You can discover API via browser        4) Hybrid
Heavy anti-bot                          5) Last resort combo

Where ProxiesAPI fits (honestly)

Dynamic scraping often means you’re fetching more than one thing:

  • the initial HTML
  • one or more JSON endpoints
  • detail endpoints

Even if you use Playwright for rendering, your pipeline usually still includes plain HTTP calls for scale.

ProxiesAPI can help by giving you a consistent fetch wrapper for both HTML and JSON endpoints:

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com/api/search?page=1" | head

In practice:

  • use API-first strategies when possible
  • reserve browsers for discovery or truly browser-only pages
  • keep retries/timeouts/dedupe regardless of approach

Quick checklist

  • confirm it’s dynamic (View Source vs Inspect)
  • look for XHR JSON endpoints first
  • implement pagination + dedupe
  • add retries + timeouts
  • use Playwright only when necessary

Related guides

  • Web Scraping Dynamic Content: How to Handle JavaScript-Rendered Pages
    Decision tree for JS sites: XHR capture, HTML endpoints, or headless, plus when proxies matter.
  • Web Scraping with JavaScript and Node.js: Full Tutorial (Puppeteer/Playwright + ProxiesAPI)
    A practical Node.js scraping stack for 2026: HTTP-first with Cheerio, then Playwright for JS-rendered sites, plus proxy rotation, retries, and a clean project template.
  • How to Scrape Data Without Getting Blocked (A Practical Playbook)
    A step-by-step anti-block strategy: request fingerprinting, sessions, rate limits, retries, proxies, and when to use a real browser, without burning IPs or writing brittle code.
  • Web Scraping with JavaScript and Node.js: A Complete Practical Tutorial (2026)
    A modern Node.js scraping stack: fetch + Cheerio for fast HTML parsing, a Playwright fallback for JS-heavy sites, and a production-ready layer for retries, rate limits, and ProxiesAPI proxy rotation.