Puppeteer Stealth: How to Avoid Bot Detection (Without Getting Your IP Burned)
If you’re searching for puppeteer stealth, you’ve probably experienced one of these:
- your script works locally, but the server blocks you in production
- you get CAPTCHAs after a few pages
- you see “Access denied”, “unusual traffic”, 403, or a blank HTML shell
- you’re rotating user agents, but your IP still gets burned
This guide is the practical truth:
- stealth plugins help — but they’re not a silver bullet
- fingerprint tricks can backfire
- most “stealth success” is actually crawl design: pacing, session reuse, and network strategy
We’ll cover:
- what bot defenses look at (in 2026)
- what puppeteer-extra-plugin-stealth actually changes
- how to detect blocks programmatically
- when rotating IPs beats fingerprint hacks
- patterns that keep your IPs alive longer
Stealth tweaks fingerprints. But many blocks are IP/rate-limit driven. ProxiesAPI gives you a proxy-backed fetch URL (and optional rendering) so you can design crawls that burn fewer IPs and complete more runs.
1) How modern bot detection works (high level)
Most defenses score you across multiple signals:
Network signals
- IP reputation (datacenter vs residential vs “dirty” IP)
- request rate / burstiness
- ASN concentration (too many requests from one provider)
- geo mismatch (login region vs IP)
Browser / fingerprint signals
- headless indicators
- inconsistent properties (e.g., WebGL vendor doesn’t match platform)
- missing APIs / permissions weirdness
- automation artifacts (webdriver, unusual timing patterns)
Behavior signals
- no scrolling / no mouse
- clicks too fast
- always the same path
- never loading images/fonts (resource patterns)
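If a target actually scores behavior, a small amount of scrolling and mouse movement goes a long way. A minimal sketch using Puppeteer's `page.mouse` and `page.evaluate` — the coordinates, distances, and timings here are illustrative, not magic values:

```javascript
// Random delay in [min, max) ms — so actions aren't machine-regular.
function jitter(min, max) {
  return min + Math.random() * (max - min);
}

// `page` is a Puppeteer Page. Adds a few mouse moves and chunked scrolls.
async function humanize(page) {
  // `steps` interpolates the movement instead of teleporting the cursor.
  await page.mouse.move(200, 150, { steps: 12 });
  await new Promise((r) => setTimeout(r, jitter(150, 400)));
  await page.mouse.move(640, 420, { steps: 20 });

  // Scroll in uneven chunks instead of one jump to the bottom.
  for (let i = 0; i < 4; i++) {
    await page.evaluate(() => window.scrollBy(0, 300 + Math.random() * 200));
    await new Promise((r) => setTimeout(r, jitter(250, 700)));
  }
}
```

Call `humanize(page)` after navigation, before extracting content. Don't overdo it: one or two passes per page is plenty, and perfectly repeated patterns are themselves a signal.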
Content / target signals
- scraping “hot” endpoints that are heavily protected
- hitting the same page repeatedly
Stealth tools only address part of the picture.
2) What puppeteer-extra-plugin-stealth changes
puppeteer-extra-plugin-stealth is a bundle of evasions. It typically:
- patches `navigator.webdriver`
- adjusts `plugins`, `languages`, and `permissions`
- changes some Chrome/Headless quirks
- can tweak WebGL / hairline / iframe checks depending on version
What it doesn’t do:
- magically give you a clean IP
- fix rate limiting
- fix behavioral anomalies
- guarantee your fingerprint is “real enough” for advanced checks
That’s why people get burned: they focus on stealth, ignore the crawl.
3) A baseline Puppeteer setup (headful, slow, sane)
Start with a stable baseline.
```shell
npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
```

```javascript
import puppeteer from "puppeteer-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

puppeteer.use(StealthPlugin());

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

export async function run(url) {
  const browser = await puppeteer.launch({
    headless: false, // start headful while iterating
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--lang=en-US,en",
    ],
  });

  const page = await browser.newPage();

  // Reasonable defaults
  await page.setViewport({ width: 1280, height: 800 });
  await page.setUserAgent(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"
  );

  await page.goto(url, { waitUntil: "networkidle2", timeout: 60000 });
  await sleep(1200);

  const title = await page.title();
  console.log("title:", title);

  await browser.close();
}

run("https://example.com").catch(console.error);
```
If this fails immediately with a block page, stealth isn’t your first problem.
4) Detect blocks programmatically (don’t guess)
You need hard signals in code:
- response status (`403`, `429`, `503`)
- specific block keywords in HTML
- unexpected page titles
- CAPTCHA markers
Capture the main document response along with the page content:
```javascript
function looksBlocked(html) {
  const t = (html || "").toLowerCase();
  return (
    t.includes("access denied") ||
    t.includes("unusual traffic") ||
    t.includes("verify you are human") ||
    t.includes("captcha")
  );
}

export async function fetchHtmlWithSignals(page, url) {
  const resp = await page.goto(url, { waitUntil: "domcontentloaded", timeout: 60000 });
  const status = resp?.status();
  const html = await page.content();
  const title = await page.title();

  return {
    url,
    status,
    title,
    blocked: (status && status >= 400) || looksBlocked(html) || /captcha/i.test(title),
    html,
  };
}
```
When you run crawls at scale, this is what powers:
- retry policies
- backoff
- “switch IP/session” logic
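Those three behaviors can be wired together in a small policy layer. A sketch, assuming the `{ blocked, status }` shape returned by a signal-aware fetch like the one above; `switchSession` is a placeholder for whatever "rotate IP / fresh browser context" hook your setup has:

```javascript
// Exponential backoff with full jitter, capped. Delays are in ms.
function backoffDelay(attempt, base = 1000, cap = 60000) {
  const exp = Math.min(cap, base * 2 ** attempt);
  return Math.floor(Math.random() * exp);
}

// fetchAttempt: async () => ({ blocked, status, ... })
// switchSession: async () => void — e.g. rotate proxy, new context
async function fetchWithPolicy(fetchAttempt, switchSession, maxAttempts = 4, baseDelay = 1000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await fetchAttempt();
    if (!result.blocked) return result;

    // Blocked: change identity, then back off before retrying.
    await switchSession();
    await new Promise((r) => setTimeout(r, backoffDelay(attempt, baseDelay)));
  }
  throw new Error("still blocked after " + maxAttempts + " attempts");
}
```

The jittered backoff matters: synchronized retries from many workers look exactly like the burstiness defenses score against.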
5) The mistake: rotating fingerprints while keeping the same IP
A very common failure mode:
- you randomize user agents
- you randomize viewport
- you randomize timezones
…but all requests still come from one IP or one small IP pool.
For many targets, IP reputation + request rate dominate.
Practical rule
- If you’re blocked quickly across different fingerprints, you likely need better IP strategy and pacing.
- If you’re blocked only on certain flows/pages, you likely need better behavior/session handling.
6) How to avoid getting your IP burned (crawl design)
Here are patterns that consistently reduce burn:
A) Slow down like a human (but consistently)
- don’t burst 200 page loads in a minute
- implement token-bucket rate limiting per domain
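A per-domain token bucket is the standard way to implement that pacing. A minimal sketch — the default rate and burst values are illustrative and should be tuned per target:

```javascript
class TokenBucket {
  constructor(ratePerSec, burst) {
    this.rate = ratePerSec;   // tokens refilled per second
    this.capacity = burst;    // max burst size
    this.tokens = burst;
    this.last = Date.now();
  }

  // Returns 0 if a request may go now, else ms to wait before sending.
  take() {
    const now = Date.now();
    this.tokens = Math.min(this.capacity, this.tokens + ((now - this.last) / 1000) * this.rate);
    this.last = now;
    this.tokens -= 1; // may go negative: that's queued "debt"
    return this.tokens >= 0 ? 0 : (-this.tokens / this.rate) * 1000;
  }
}

const buckets = new Map(); // domain -> TokenBucket

async function throttle(domain, ratePerSec = 0.5, burst = 3) {
  if (!buckets.has(domain)) buckets.set(domain, new TokenBucket(ratePerSec, burst));
  const waitMs = buckets.get(domain).take();
  if (waitMs > 0) await new Promise((r) => setTimeout(r, waitMs));
}
```

Call `await throttle(new URL(url).hostname)` before every fetch; short bursts still go through, but sustained throughput is capped.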
B) Reuse sessions (don’t look like 10,000 new users)
- keep cookies for a while
- keep a browser context per “identity”
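A sketch of the "identity" idea: keep one cookie jar per identity and restore it before navigating. This version is in-memory (persist the jars to disk if your runs are long-lived); `page` is a Puppeteer Page and the identity keys are whatever you use to partition sessions:

```javascript
const jars = new Map(); // identity -> cookie array

// Snapshot the page's cookies under this identity.
async function saveSession(page, identity) {
  jars.set(identity, await page.cookies());
}

// Restore previously saved cookies before the next visit.
async function restoreSession(page, identity) {
  const cookies = jars.get(identity);
  if (cookies && cookies.length) await page.setCookie(...cookies);
}
```

Restoring a seasoned session makes you look like one returning visitor instead of a stream of first-time users, which is exactly the signal this pattern targets.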
C) Reduce page loads
- don’t open PDPs you don’t need
- scrape listing pages first, sample PDPs
- cache responses
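Caching and deduping can be as simple as a Map keyed by URL, so each URL costs at most one real page load per run. A minimal sketch; `fetcher` is whatever function actually loads the page:

```javascript
const cache = new Map(); // url -> html fetched this run

// Returns cached HTML if we've already fetched this URL; otherwise
// fetches once and remembers the result.
async function fetchCached(url, fetcher) {
  if (cache.has(url)) return cache.get(url);
  const html = await fetcher(url);
  cache.set(url, html);
  return html;
}
```

For multi-day crawls, swap the Map for a disk or database cache with a TTL; the shape of the code stays the same.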
D) Avoid “hot” endpoints
- some endpoints are aggressively protected
- find alternate sources: JSON data, sitemaps, RSS, etc.
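Sitemaps are often the cheapest alternate source: one lightly protected XML file can replace thousands of listing-page loads. A sketch of pulling URLs out of a sitemap string (fetching the XML is up to you; a real parser is safer for messy feeds, but `<loc>` extraction is usually this simple):

```javascript
// Extract every <loc> URL from a sitemap XML string.
function extractLocs(xml) {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}
```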
E) Use a stable proxy-backed fetch when you can
If you don’t need full JS interaction, a proxy-backed HTTP fetch is often:
- cheaper
- faster
- less fingerprint-sensitive
That’s where ProxiesAPI can fit: same URL list, fewer headless runs.
7) When to use Puppeteer stealth vs other approaches
Here’s the practical decision table.
| Problem | Best approach | Why |
|---|---|---|
| Server-rendered HTML, mild throttling | HTTP + retries + proxies | Simple + fast |
| JS-heavy pages, content requires rendering | Playwright/Puppeteer | Need real browser |
| Aggressive bot defense on navigation | Hybrid + careful pacing | Full headless alone gets burned |
| You only need a dataset, not a UI journey | Find JSON endpoints / structured feeds | Most stable |
Stealth is one tool in the toolbox.
8) A safer “hybrid crawler” pattern
Use headless only where necessary.
- Fetch PLPs with HTTP (proxy-backed)
- Extract PDP URLs
- Fetch a small sample of PDPs with headless for validation
- If needed, only then expand headless coverage
This reduces your exposure dramatically.
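The sampling step is where most people over-fetch, so here's a sketch of it. `sample` is a plain Fisher-Yates shuffle-and-slice; the fetch functions it would feed are your own HTTP and headless paths:

```javascript
// Pick n distinct items at random without mutating the input.
function sample(items, n) {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, Math.min(n, copy.length));
}
```

Validate the sampled PDPs with headless; only if that sample looks healthy do you spend headless budget on the rest.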
Where ProxiesAPI fits (honestly)
Stealth plugins change browser signals.
But a lot of “bot detection pain” is network-layer:
- too many requests from one IP
- rate limiting
- inconsistent routing
ProxiesAPI gives you a proxy-backed fetch URL and (depending on plan) optional rendering. Used well, it supports crawl patterns that complete more runs and burn fewer IPs.
Checklist: before you blame stealth
- Are you rate limiting per domain?
- Do you retry with backoff on `429`/`503`?
- Are you caching and deduping URLs?
- Are you reusing sessions/cookies appropriately?
- Are you detecting blocks and switching strategy?
Fix these first — then tune stealth.