HTTP Headers for Web Scraping: User-Agent, Accept-Language, and Beyond

Jun 21, 2026 · guides · #http headers for web scraping, #headers, #python, #requests, #web-scraping, #user-agent

People often treat request headers like a superstition.

They copy a giant blob from Chrome DevTools, paste it into requests, and hope it magically stops blocks.

Sometimes that works. Usually it is unnecessary.

For most scrapers, only a small set of headers meaningfully changes outcomes:

User-Agent
Accept-Language
Accept
Referer, in a few flows
Cookie, occasionally, when you are continuing a real session

Everything else is situational.

This guide focuses on the headers that actually matter, how to set sane defaults, and when header tuning is worth your time.

The short version

Here is the practical ranking.

Header	Matters often?	Why
`User-Agent`	Yes	Tells the server what client you claim to be
`Accept-Language`	Yes	Helps align locale with browser identity
`Accept`	Yes	Signals expected content type
`Referer`	Sometimes	Some flows expect navigation context
`Cookie`	Sometimes	Required when continuing an existing session
`Accept-Encoding`	Rarely by hand	`requests` handles this well already
`Cache-Control` / `Pragma`	Rarely	Usually not the reason you get blocked
`Sec-Fetch-*` / `sec-ch-ua*`	Mostly browser-only	Hard to fake consistently with plain `requests`

The big mistake is assuming headers can compensate for everything else.

They cannot.

If your IP is burned or your request rate is absurd, perfect headers will not rescue you.

1. User-Agent: still the first header to fix

The default python-requests/x.y.z user agent is an immediate tell.

Use a modern browser UA unless you have a reason not to.

UA_CHROME_WINDOWS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/125.0.0.0 Safari/537.36"
)

Why this matters:

some sites explicitly downgrade or block obvious script clients
many anti-bot systems score default library identities as suspicious
consistent browser-like traffic is easier to blend with

What not to do:

rotate to a random UA every request for no reason
claim a mobile Safari UA while behaving like a desktop scraper
use ancient browser versions that no normal user would run

Session consistency beats chaos.
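
A minimal sketch of pinning one browser UA on a session, assuming the requests library is installed. Nothing is sent over the network here: the request is only prepared so the outgoing headers can be inspected, and the example.com URL is a placeholder.

```python
import requests

UA_CHROME_WINDOWS = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/125.0.0.0 Safari/537.36"
)

# What requests announces when you set nothing: "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

# Set the browser UA once on the session so every request reuses it.
session = requests.Session()
session.headers["User-Agent"] = UA_CHROME_WINDOWS

# Prepare (but do not send) a request to see the headers that would go out.
prepared = session.prepare_request(
    requests.Request("GET", "https://example.com/")  # placeholder URL
)

print("default:", default_ua)
print("sent:   ", prepared.headers["User-Agent"])
```

Setting the value on the session rather than per call is what keeps the identity stable across a crawl.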

2. Accept-Language: small header, real signal

This header is underrated.

It tells the server what languages you prefer, and it often affects:

page language
geolocation assumptions
whether your request feels browser-like

A safe default:

"Accept-Language": "en-US,en;q=0.9"

This matters most when it matches the rest of your identity:

US-style UA
US-ish locale choices
US-targeted content collection

If you scrape French or German sites, use a locale that matches the workflow instead of blindly sending en-US.
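
One way to keep the locale aligned with the target is to build the header from a single primary language tag. The helper below is a hypothetical sketch, not part of any library: the name accept_language and the 0.1 step between quality values are assumptions chosen to reproduce the en-US default shown above.

```python
def accept_language(primary: str, fallbacks: tuple[str, ...] = ()) -> str:
    """Build an Accept-Language value from a primary tag plus optional fallbacks.

    The primary tag is listed first with no q-value, then its base language,
    then any fallbacks, each with a q-value 0.1 lower than the previous one.
    """
    base = primary.split("-")[0]
    tags = [primary]
    if base != primary:
        tags.append(base)
    for tag in fallbacks:
        if tag not in tags:
            tags.append(tag)

    parts = [tags[0]]
    for position, tag in enumerate(tags[1:], start=1):
        quality = max(0.1, 1.0 - 0.1 * position)  # never drop below 0.1
        parts.append(f"{tag};q={quality:.1f}")
    return ",".join(parts)


print(accept_language("en-US"))            # en-US,en;q=0.9
print(accept_language("fr-FR", ("en",)))   # fr-FR,fr;q=0.9,en;q=0.8
print(accept_language("de-DE", ("en",)))   # de-DE,de;q=0.9,en;q=0.8
```

Whether to include an English fallback for a French or German target is a judgment call; the point is that the value is derived from the workflow rather than copied blindly.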

3. Accept: keep it normal

For HTML scraping, a realistic Accept header helps.

"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"

Why?

it reflects a browser asking for HTML first
it avoids the "anything at all is fine" signal of the bare */* that requests sends by default

This is not usually the difference between success and failure by itself, but it is part of a believable request profile.
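
To see the difference concretely, the sketch below compares the library default with the browser-style value. It assumes requests is installed, sends nothing over the network, and uses example.com as a placeholder.

```python
import requests

BROWSER_ACCEPT = (
    "text/html,application/xhtml+xml,application/xml;q=0.9,"
    "image/avif,image/webp,*/*;q=0.8"
)

# The value requests uses when you do not set Accept yourself.
default_accept = requests.utils.default_headers()["Accept"]

session = requests.Session()
session.headers["Accept"] = BROWSER_ACCEPT

# Prepare only; inspect what would be sent.
prepared = session.prepare_request(
    requests.Request("GET", "https://example.com/")  # placeholder URL
)

print("default:", default_accept)
print("sent:   ", prepared.headers["Accept"])
```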

4. Referer: useful when flows have context

Many simple scrapers do not need Referer.

But it can help when:

you move from a search page to a detail page
the site expects internal navigation
the site behaves differently for deep links

Example:

headers["Referer"] = "https://example.com/search?q=laptops"

Do not invent nonsense referers. Use the page that a human would realistically come from.
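
A search-to-detail flow might be sketched like this. The helper name prepare_with_referer and the product URL are hypothetical, and the requests are only prepared, not sent, so the headers can be checked offline.

```python
import requests


def prepare_with_referer(session, url, came_from=None):
    """Prepare a GET for `url`, citing `came_from` as the Referer if given."""
    headers = {"Referer": came_from} if came_from else {}
    return session.prepare_request(requests.Request("GET", url, headers=headers))


session = requests.Session()
search_url = "https://example.com/search?q=laptops"
detail_url = "https://example.com/products/123"  # hypothetical detail page

# The entry page has no Referer; the detail page cites the search page.
search_req = prepare_with_referer(session, search_url)
detail_req = prepare_with_referer(session, detail_url, came_from=search_url)

print(search_req.headers.get("Referer"))
print(detail_req.headers.get("Referer"))
```

In a real crawl you would pass each prepared request to session.send(...) with a timeout, and the Referer for each detail page would be the listing page it was actually discovered on.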

5. Cookies: only when you mean it

Cookies are powerful because they represent real session state.

They also create headaches if you do not manage them carefully.

Use them when:

you are continuing an existing browsing session
the site sets pagination or locale state in cookies
you already proved the target needs them

Avoid copying stale cookies into every request forever. That often creates brittle scrapers that break mysteriously later.

With requests, a session object handles most cookie persistence for you.
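
The sketch below illustrates what that persistence looks like, assuming requests is installed. Normally the server sets cookies through Set-Cookie responses; here one is added by hand, with a made-up name and value, purely to show what the session replays and where. Nothing is sent over the network.

```python
import requests

session = requests.Session()

# Stand-in for a cookie the site would normally set itself (hypothetical values).
session.cookies.set("locale", "en-US", domain="example.com", path="/")

# Prepare requests to the cookie's own domain and to an unrelated one.
same_site = session.prepare_request(
    requests.Request("GET", "https://example.com/page/2")
)
other_site = session.prepare_request(
    requests.Request("GET", "https://example.org/")
)

print("example.com:", same_site.headers.get("Cookie"))
print("example.org:", other_site.headers.get("Cookie"))
```

The session scopes the cookie to its domain, which is one reason to let it manage cookies instead of pasting a Cookie header into every request.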

The headers people obsess over too much

`Accept-Encoding`

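Usually not worth setting manually. requests already sends its own Accept-Encoding and transparently decompresses gzip and deflate responses. If you hand-copy a browser value that advertises an encoding your environment cannot decode, you can end up with unreadable bytes instead of HTML.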

`Sec-Fetch-*`

These are real browser headers, but plain requests is not a browser. Sending a hand-crafted Sec-Fetch-Site without the rest of the browser stack can create more inconsistency than it solves.

`sec-ch-ua*`

Same story. These client hints make more sense in browser automation than in plain HTTP scraping.

If you are using requests, do not try to impersonate full Chromium internals one header at a time.

Safe defaults for Python requests

This is a good baseline for many HTML targets.

import os
import random
import requests
from urllib.parse import urlencode

# (connect timeout, read timeout) in seconds
TIMEOUT = (10, 30)

USER_AGENTS = [
    (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
    (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/125.0.0.0 Safari/537.36"
    ),
]

# Pick one UA per run so every request shares the same identity
# instead of rotating on each call.
USER_AGENT = random.choice(USER_AGENTS)

PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY", "").strip()


def build_headers(referer: str | None = None) -> dict:
    headers = {
        "User-Agent": USER_AGENT,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
    }
    # Only send Referer when the caller knows the real previous page.
    if referer:
        headers["Referer"] = referer
    return headers


def maybe_proxy(url: str) -> str:
    # Without an API key, fetch the target URL directly.
    if not PROXIESAPI_KEY:
        return url
    return "https://api.proxiesapi.com/?" + urlencode({
        "auth_key": PROXIESAPI_KEY,
        "url": url,
    })


def fetch_html(url: str, referer: str | None = None, session: requests.Session | None = None) -> str:
    # Pass a shared Session to keep cookies and connections across calls.
    s = session or requests.Session()
    r = s.get(
        maybe_proxy(url),
        headers=build_headers(referer=referer),
        timeout=TIMEOUT,
    )
    r.raise_for_status()
    return r.text

This is intentionally boring.

That is the point.

When headers are enough, and when they are not

Use this decision table.

Situation	Headers alone enough?	What to do
Public HTML site, low request volume	Usually yes	Good UA + locale + timeouts
Getting blocked only because of `python-requests` UA	Often	Fix UA and keep sessions
Multi-step session with cookies	Sometimes	Use `requests.Session()` and real referers
JavaScript-rendered site with bot checks	Rarely	Use a browser stack
Failing after many requests from one IP	No	Slow your request rate and improve the proxy layer

Headers are identity hints, not a complete disguise.

The more your target behaves like a browser application rather than a plain website, the less plain header spoofing can do on its own.

Common header mistakes

Mistake 1: copying every header from DevTools

That blob often includes browser-specific fields that do not make sense for requests.
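
If you do start from a DevTools copy, one option is to trim it down before use. The filter below is a hypothetical sketch: the function name and the exact drop lists are assumptions based on the headers this guide calls browser-only or library-managed, not an authoritative list.

```python
# Prefixes for headers that only make sense inside a real browser.
BROWSER_ONLY_PREFIXES = ("sec-fetch-", "sec-ch-ua")

# Headers the HTTP library computes itself, plus Cookie, which is better
# managed by a Session than pasted in stale.
LIBRARY_MANAGED = {"host", "content-length", "connection", "accept-encoding", "cookie"}


def trim_devtools_headers(copied: dict[str, str]) -> dict[str, str]:
    """Return only the copied headers that are reasonable to send from a script."""
    kept = {}
    for name, value in copied.items():
        lower = name.lower()
        if lower.startswith(":"):  # HTTP/2 pseudo-headers such as ":authority"
            continue
        if lower.startswith(BROWSER_ONLY_PREFIXES) or lower in LIBRARY_MANAGED:
            continue
        kept[name] = value
    return kept


copied_from_devtools = {
    ":authority": "example.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0.0.0",
    "Accept-Language": "en-US,en;q=0.9",
    "sec-ch-ua": '"Chromium";v="125"',
    "Sec-Fetch-Site": "none",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Cookie": "sid=stale",
}

trimmed = trim_devtools_headers(copied_from_devtools)
print(sorted(trimmed))  # ['Accept-Language', 'User-Agent']
```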

Mistake 2: rotating everything on every request

If your UA, language, and referer change constantly, you stop looking like a person and start looking like a broken traffic generator.
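
A steadier alternative is to choose one identity per session and leave it alone. The sketch below assumes requests is installed; the PROFILES list and the new_session helper are illustrative names, and the requests are prepared without being sent.

```python
import random
import requests

# Each profile bundles headers that should change together, never separately.
PROFILES = [
    {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    },
    {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/125.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    },
]


def new_session(rng=random):
    """Create a Session whose identity is picked once and then kept."""
    session = requests.Session()
    session.headers.update(rng.choice(PROFILES))
    return session


session = new_session()
first = session.prepare_request(requests.Request("GET", "https://example.com/a"))
second = session.prepare_request(requests.Request("GET", "https://example.com/b"))

# Both requests carry the same UA because the choice happened once.
print(first.headers["User-Agent"] == second.headers["User-Agent"])
```

If you want variety across a large crawl, create a new session per batch or per worker rather than per request.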

Mistake 3: ignoring consistency

If you send:

Japanese Accept-Language
Windows Chrome UA
EU proxy IP
US-only product URLs

...each request may still succeed, but together they describe no plausible user, so it is worth noticing the identity mismatch and aligning the pieces you control.

Mistake 4: blaming headers for rate-limit problems

Many block issues are volume problems wearing a header-shaped disguise.
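
If volume is the real issue, pacing is usually a cheaper fix than more headers. Below is a minimal throttle sketch using only the standard library. The class name Throttle and the delay values are assumptions to tune per target, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import random
import time


class Throttle:
    """Enforce a minimum gap, plus optional random jitter, between requests."""

    def __init__(self, min_delay=1.0, jitter=0.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        if self._last is not None:
            target_gap = self.min_delay + random.uniform(0, self.jitter)
            remaining = target_gap - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


# Typical use: call throttle.wait() immediately before each request.
throttle = Throttle(min_delay=2.0, jitter=1.0)
throttle.wait()  # first call returns immediately
```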

Recommended defaults by scraper type

Scraper type	Recommended header strategy
Simple article / docs scraper	Stable desktop UA + `Accept-Language` + normal `Accept`
Search-to-detail crawler	Same as above, plus realistic `Referer`
Session-based workflow	`requests.Session()` with persistent cookies
Browser automation	Let the browser send most headers natively

The more browser-like your tool is, the less you should manually fake browser-only headers.

Final takeaway

If you remember only one thing, make it this:

A small set of consistent headers beats a giant copied header blob.

For most scrapers, the winning setup is:

a modern User-Agent
a matching Accept-Language
a realistic Accept
Referer only when it makes sense
persistent cookies only when the workflow needs them

That gives you clean, maintainable requests.

Then, if your crawl still struggles, fix the next layer:

pacing
session state
browser rendering
IP quality

Headers matter. They just matter most when they are part of a sane overall scraper, not a cargo-cult paste from DevTools.
