Web Scraping with Python Requests: Proxies, Retries, and Timeouts (2026)
If your scraper “works on my laptop” but fails in production, it’s usually not your parser.
It’s the network layer:
- a request hangs because you didn’t set timeouts
- the server rate limits you (429)
- TLS handshakes fail intermittently
- responses vary by IP, geography, or load
This guide is a practical checklist for making Python Requests reliable for web scraping in 2026 — with proxies, retries, and timeouts.
Requests is great — until you scale. ProxiesAPI gives you a simple URL wrapper so you can keep your Requests code focused on parsing and retries, while the fetch layer stays consistent across targets.
The baseline: Requests with a Session + timeouts
Always start with these 3 rules:
- use a `requests.Session()` (connection pooling)
- set a real timeout (connect + read)
- set a User-Agent
```python
import requests

TIMEOUT = (10, 30)  # connect, read (seconds)

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
})

def get(url: str) -> requests.Response:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r

resp = get("https://example.com")
print(resp.status_code, len(resp.text))
```
Why tuple timeouts?
- connect timeout protects you from dead endpoints
- read timeout protects you from slow servers (or stalled responses)
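Both failure modes surface as exceptions you can catch instead of a hang. A minimal sketch, assuming a `fetch_or_none` helper of our own (it is not part of Requests):

```python
from typing import Optional

import requests

TIMEOUT = (10, 30)  # connect, read (seconds)

def fetch_or_none(session: requests.Session, url: str) -> Optional[requests.Response]:
    """Return the response, or None if the endpoint is dead or stalls."""
    try:
        return session.get(url, timeout=TIMEOUT)
    except requests.exceptions.ConnectTimeout:
        # Couldn't open a connection within 10s: dead endpoint, bad proxy, firewall.
        return None
    except requests.exceptions.ReadTimeout:
        # Connected, but the server took more than 30s to send data.
        return None
```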
Proxies 101: what you can configure in Requests
When people search python requests with proxy, they usually want one of these:
- route traffic via a single proxy
- rotate proxies to avoid rate limits
- separate HTTP vs HTTPS proxy
Requests supports proxies via a `proxies` dict:

```python
proxies = {
    "http": "http://USER:PASS@HOST:PORT",
    "https": "http://USER:PASS@HOST:PORT",
}

r = session.get("https://httpbin.org/ip", proxies=proxies, timeout=TIMEOUT)
print(r.json())
```
Notes:
- Many providers use an HTTP proxy endpoint for both `http` and `https` URLs.
- If your proxy provider requires an HTTPS proxy (CONNECT over TLS), the proxy URL may start with `https://...`.
- Some sites behave differently depending on IP location; this can affect HTML structure too.
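To cover the "rotate proxies to avoid rate limits" case from earlier in this section, here is a minimal round-robin sketch. The proxy URLs are placeholders and `rotating_get` is an illustrative helper, not a Requests feature:

```python
import itertools

import requests

TIMEOUT = (10, 30)

# Placeholder proxy endpoints; substitute your provider's URLs.
PROXIES = [
    "http://USER:PASS@proxy1.example.com:8000",
    "http://USER:PASS@proxy2.example.com:8000",
]
proxy_cycle = itertools.cycle(PROXIES)

def rotating_get(session: requests.Session, url: str) -> requests.Response:
    """Send each request through the next proxy in the rotation."""
    proxy_url = next(proxy_cycle)
    return session.get(
        url,
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=TIMEOUT,
    )
```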
Retries: don’t blindly retry everything
Retries are not “try again forever.” You should:
- retry idempotent requests (GET) only
- back off exponentially
- treat 429/503 differently from 404
Requests alone doesn’t do retries — but urllib3 (under it) does.
A solid default retry policy
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session() -> requests.Session:
    s = requests.Session()
    s.headers.update({
        "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
        "Accept-Language": "en-US,en;q=0.9",
    })
    retry = Retry(
        total=5,
        connect=5,
        read=5,
        backoff_factor=0.7,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
        raise_on_status=False,
        respect_retry_after_header=True,
    )
    adapter = HTTPAdapter(max_retries=retry, pool_connections=50, pool_maxsize=50)
    s.mount("http://", adapter)
    s.mount("https://", adapter)
    return s

session = build_session()
r = session.get("https://example.com", timeout=(10, 30))
print(r.status_code)
```
Common mistake: retrying 403/401
If you’re getting 401/403, retries usually just waste time.
Treat those as a signal:
- your headers look bot-like
- you’re blocked by IP
- you need a different fetch approach (browser automation / anti-bot)
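One way to encode that rule is to turn 401/403 into a terminal "blocked" result instead of feeding it back into the retry loop. A sketch, assuming our own `Blocked` exception as a convention:

```python
import requests

class Blocked(Exception):
    """Raised on 401/403: retrying the same request won't help."""

def fetch_checked(session: requests.Session, url: str) -> requests.Response:
    r = session.get(url, timeout=(10, 30))
    if r.status_code in (401, 403):
        # Signal for a different strategy (headers, proxy, browser), not another retry.
        raise Blocked(f"{url} returned {r.status_code}")
    r.raise_for_status()  # other 4xx/5xx still raise as usual
    return r
```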
Timeouts: choose values that match your job
Good defaults depend on your workload.
| Use case | Connect timeout | Read timeout | Why |
|---|---|---|---|
| one-off scripts | 5–10s | 20–30s | simple, interactive |
| batch crawler (1000s URLs) | 3–5s | 10–20s | fail fast, move on |
| detail pages with large HTML | 5–10s | 30–60s | allow big responses |
If you crawl at scale, also add a global deadline per URL (your own stopwatch) so retries don’t turn one URL into a 5-minute sink.
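A minimal sketch of such a per-URL deadline, wrapping whatever session you built above. The 60-second budget, 3 attempts, and the `fetch_with_deadline` name are assumptions, not recommendations for every workload:

```python
import time
from typing import Optional

import requests

def fetch_with_deadline(session: requests.Session, url: str,
                        budget_s: float = 60.0,
                        attempts: int = 3) -> Optional[requests.Response]:
    """Retry a URL, but never spend more than budget_s seconds on it in total."""
    started = time.monotonic()
    for _ in range(attempts):
        if time.monotonic() - started > budget_s:
            break  # budget spent: give up on this URL and move to the next one
        try:
            r = session.get(url, timeout=(5, 20))
            if r.status_code < 500:
                return r  # success, or a 4xx we shouldn't retry
        except requests.RequestException:
            pass  # connection/timeout error: fall through and maybe try again
        time.sleep(1.0)  # brief pause between attempts
    return None
```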
Failure modes you’ll actually see
1) Hanging requests
Cause: missing timeouts.
Fix: always set timeout=(connect, read).
2) Lots of 429 (rate limits)
Fixes:
- slow down (sleep / token bucket; see the sketch after this list)
- rotate IPs (proxies)
- cache responses
- crawl less frequently
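The "slow down" fix can be as simple as a client-side throttle that enforces a minimum gap between requests. A sketch, assuming roughly 2 requests per second is acceptable for your target (tune this per site):

```python
import time

import requests

class Throttle:
    """Tiny client-side rate limiter: at most `rate` requests per second."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self.last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last = time.monotonic()

throttle = Throttle(rate=2.0)  # assumption: ~2 requests/second is gentle enough

def polite_get(session: requests.Session, url: str) -> requests.Response:
    throttle.wait()  # pace every request before it goes out
    return session.get(url, timeout=(10, 30))
```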
3) 503 / 504 spikes
Often temporary. Backoff + retry helps.
4) HTML changes / empty HTML
Sometimes the response is a bot page.
Action:
- log status code, headers, first 200 chars
- save HTML samples for debugging
- consider browser-based fetch
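A sketch of the first two actions, assuming a `dump_response` helper and a local `debug_html/` folder (both arbitrary choices):

```python
import pathlib

import requests

DEBUG_DIR = pathlib.Path("debug_html")  # arbitrary folder name
DEBUG_DIR.mkdir(exist_ok=True)

def dump_response(r: requests.Response, name: str) -> None:
    """Log the essentials and keep the raw HTML so suspected bot pages can be inspected later."""
    print(r.status_code, dict(r.headers), r.text[:200])
    (DEBUG_DIR / f"{name}.html").write_text(r.text, encoding="utf-8")
```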
5) TLS / connection errors
These benefit from retries, but don’t overdo it.
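To keep "don't overdo it" concrete, one option is a small bounded retry just for these exception types. The exception classes are real Requests exceptions; the retry count and backoff are assumptions:

```python
import time

import requests

def fetch_tolerant(session: requests.Session, url: str, max_tries: int = 2) -> requests.Response:
    """Retry TLS / connection hiccups a bounded number of times, then give up."""
    for attempt in range(max_tries):
        try:
            return session.get(url, timeout=(10, 30))
        except (requests.exceptions.SSLError, requests.exceptions.ConnectionError):
            if attempt == max_tries - 1:
                raise  # still failing on the last try: surface the error
            time.sleep(2 ** attempt)  # 1s, then 2s, ...
```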
Practical patterns
Pattern A: per-request proxy config
```python
proxies = {"http": "http://HOST:PORT", "https": "http://HOST:PORT"}
r = session.get(url, proxies=proxies, timeout=TIMEOUT)
```
Good for: testing.
Pattern B: one Session per proxy
If you rotate proxies per batch, it can be useful to bind a session to a proxy.
```python
def session_for_proxy(proxy_url: str) -> requests.Session:
    s = build_session()
    s.proxies.update({"http": proxy_url, "https": proxy_url})
    return s
```
Good for: batch jobs with stable IP per chunk.
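A hedged usage sketch: split the URL list into chunks and bind each chunk to one proxy. The proxy URLs, chunk size, and example URLs are placeholders:

```python
# Continues from session_for_proxy() above (and build_session() earlier).
PROXIES = ["http://HOST1:PORT", "http://HOST2:PORT"]  # placeholders
urls = [f"https://example.com/item/{i}" for i in range(100)]  # example batch

CHUNK = 50
for i, start in enumerate(range(0, len(urls), CHUNK)):
    s = session_for_proxy(PROXIES[i % len(PROXIES)])  # one stable IP per chunk
    for url in urls[start:start + CHUNK]:
        r = s.get(url, timeout=(10, 30))
        print(url, r.status_code)
```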
Where ProxiesAPI fits (honestly)
You can configure proxies directly in Requests.
But when you’re scaling, there are two sources of complexity:
- proxy selection / rotation / reliability
- keeping your scraping code consistent across targets
ProxiesAPI helps by giving you a simple wrapper URL for fetching:
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com" | head
In Python, you only change the URL you fetch — your retry logic and parsing code stay the same:
```python
from urllib.parse import quote

def proxiesapi_wrap(target_url: str, api_key: str) -> str:
    return f"http://api.proxiesapi.com/?key={api_key}&url={quote(target_url, safe='')}"

API_KEY = "API_KEY"
target = "https://example.com"

wrapped = proxiesapi_wrap(target, API_KEY)
r = session.get(wrapped, timeout=TIMEOUT)
print(r.status_code, len(r.text))
```
No overclaims: you still need sane timeouts, retries, and extraction logic — ProxiesAPI just makes the “fetch” layer cleaner.
Comparison: three approaches
| Approach | Pros | Cons | Best for |
|---|---|---|---|
| Direct Requests (no proxy) | simplest, cheapest | blocks/rate limits sooner | friendly sites |
| Requests + proxy provider | full control, flexible | you manage complexity | mature pipelines |
| Requests + ProxiesAPI wrapper | minimal code change, stable fetch primitive | still need parsing + retry hygiene | teams that want simplicity |
Quick checklist (copy/paste)
- use `Session()`
- set `timeout=(connect, read)`
- set a User-Agent
- implement retries for 429/5xx with backoff
- don’t retry 401/403 blindly
- log failures + save HTML samples
- add dedupe + caching for batch jobs
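For the last item, a minimal dedupe-plus-cache sketch keyed on the URL. The `html_cache/` folder and SHA-256 keying are arbitrary choices; swap in whatever store your pipeline already uses:

```python
import hashlib
import pathlib

import requests

CACHE_DIR = pathlib.Path("html_cache")  # arbitrary folder name
CACHE_DIR.mkdir(exist_ok=True)

def cached_get(session: requests.Session, url: str) -> str:
    """Return cached HTML if this URL was fetched before, otherwise fetch and store it."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")  # dedupe: never re-fetch the same URL
    r = session.get(url, timeout=(10, 30))
    r.raise_for_status()
    path.write_text(r.text, encoding="utf-8")
    return r.text
```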
If you implement just those, your “python requests with proxy” scraper will go from fragile to production-grade.