Web Scraping with Python Requests: Proxies, Retries, and Timeouts (2026)
If your scraper “works on my laptop” but fails in production, it’s usually not your parser.
It’s the network layer:
- a request hangs because you didn’t set timeouts
- the server rate limits you (429)
- TLS handshakes fail intermittently
- responses vary by IP, geography, or load
This guide is a practical checklist for making Python Requests reliable for web scraping in 2026 — with proxies, retries, and timeouts.
Requests is great — until you scale. ProxiesAPI gives you a simple URL wrapper so you can keep your Requests code focused on parsing and retries, while the fetch layer stays consistent across targets.
The baseline: Requests with a Session + timeouts
Always start with these 3 rules:
- use a `requests.Session()` (connection pooling)
- set a real timeout (connect + read)
- set a User-Agent
```python
import requests

TIMEOUT = (10, 30)  # connect, read (seconds)

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
})

def get(url: str) -> requests.Response:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r

resp = get("https://example.com")
print(resp.status_code, len(resp.text))
```
Why tuple timeouts?
- connect timeout protects you from dead endpoints
- read timeout protects you from slow servers (or stalled responses)
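Both failure modes surface as exceptions you can catch instead of a hang. A minimal sketch, assuming a `fetch_or_none` helper of our own (it is not part of Requests):

```python
from typing import Optional

import requests

TIMEOUT = (10, 30)  # connect, read (seconds)

def fetch_or_none(session: requests.Session, url: str) -> Optional[requests.Response]:
    """Return the response, or None if the endpoint is dead or stalls."""
    try:
        return session.get(url, timeout=TIMEOUT)
    except requests.exceptions.ConnectTimeout:
        # Couldn't open a connection within 10s: dead endpoint, bad proxy, firewall.
        return None
    except requests.exceptions.ReadTimeout:
        # Connected, but the server took more than 30s to send data.
        return None
```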
Proxies 101: what you can configure in Requests
When people search python requests with proxy, they usually want one of these:
- route traffic via a single proxy
- rotate proxies to avoid rate limits
- separate HTTP vs HTTPS proxy
Requests supports proxies via a `proxies` dict:

```python
proxies = {
    "http": "http://USER:PASS@HOST:PORT",
    "https": "http://USER:PASS@HOST:PORT",
}

r = session.get("https://httpbin.org/ip", proxies=proxies, timeout=TIMEOUT)
print(r.json())
```
Notes:
- Many providers use an HTTP proxy endpoint for both `http` and `https` URLs.
- If your proxy provider requires an HTTPS proxy (CONNECT over TLS), the proxy URL may start with `https://...`.
- Some sites behave differently depending on IP location; this can affect HTML structure too.
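To cover the "rotate proxies to avoid rate limits" case from earlier in this section, here is a minimal round-robin sketch. The proxy URLs are placeholders and `rotating_get` is an illustrative helper, not a Requests feature:

```python
import itertools

import requests

TIMEOUT = (10, 30)

# Placeholder proxy endpoints; substitute your provider's URLs.
PROXIES = [
    "http://USER:PASS@proxy1.example.com:8000",
    "http://USER:PASS@proxy2.example.com:8000",
]
proxy_cycle = itertools.cycle(PROXIES)

def rotating_get(session: requests.Session, url: str) -> requests.Response:
    """Send each request through the next proxy in the rotation."""
    proxy_url = next(proxy_cycle)
    return session.get(
        url,
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=TIMEOUT,
    )
```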
Retries: don’t blindly retry everything
Retries are not “try again forever.” You should:
- retry idempotent requests (GET) only
- back off exponentially
- treat 429/503 differently from 404
Requests alone doesn’t do retries — but urllib3 (under it) does.
A solid default retry policy
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session() -> requests.Session:
    s = requests.Session()
    s.headers.update({
        "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
        "Accept-Language": "en-US,en;q=0.9",
    })
    retry = Retry(
        total=5,
        connect=5,
        read=5,
        backoff_factor=0.7,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
        raise_on_status=False,
        respect_retry_after_header=True,
    )
    adapter = HTTPAdapter(max_retries=retry, pool_connections=50, pool_maxsize=50)
    s.mount("http://", adapter)
    s.mount("https://", adapter)
    return s

session = build_session()
r = session.get("https://example.com", timeout=(10, 30))
print(r.status_code)
```
Common mistake: retrying 403/401
If you’re getting 401/403, retries usually just waste time.
Treat those as a signal:
- your headers look bot-like
- you’re blocked by IP
- you need a different fetch approach (browser automation / anti-bot)
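One way to encode that rule is to turn 401/403 into a terminal "blocked" result instead of feeding it back into the retry loop. A sketch, assuming our own `Blocked` exception as a convention:

```python
import requests

class Blocked(Exception):
    """Raised on 401/403: retrying the same request won't help."""

def fetch_checked(session: requests.Session, url: str) -> requests.Response:
    r = session.get(url, timeout=(10, 30))
    if r.status_code in (401, 403):
        # Signal for a different strategy (headers, proxy, browser), not another retry.
        raise Blocked(f"{url} returned {r.status_code}")
    r.raise_for_status()  # other 4xx/5xx still raise as usual
    return r
```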
Timeouts: choose values that match your job
Good defaults depend on your workload.
| Use case | Connect timeout | Read timeout | Why |
|---|---|---|---|
| one-off scripts | 5–10s | 20–30s | simple, interactive |
| batch crawler (1000s URLs) | 3–5s | 10–20s | fail fast, move on |
| detail pages with large HTML | 5–10s | 30–60s | allow big responses |
If you crawl at scale, also add a global deadline per URL (your own stopwatch) so retries don’t turn one URL into a 5-minute sink.
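A minimal sketch of such a per-URL deadline, wrapping whatever session you built above. The 60-second budget, 3 attempts, and the `fetch_with_deadline` name are assumptions, not recommendations for every workload:

```python
import time
from typing import Optional

import requests

def fetch_with_deadline(session: requests.Session, url: str,
                        budget_s: float = 60.0,
                        attempts: int = 3) -> Optional[requests.Response]:
    """Retry a URL, but never spend more than budget_s seconds on it in total."""
    started = time.monotonic()
    for _ in range(attempts):
        if time.monotonic() - started > budget_s:
            break  # budget spent: give up on this URL and move to the next one
        try:
            r = session.get(url, timeout=(5, 20))
            if r.status_code < 500:
                return r  # success, or a 4xx we shouldn't retry
        except requests.RequestException:
            pass  # connection/timeout error: fall through and maybe try again
        time.sleep(1.0)  # brief pause between attempts
    return None
```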
Failure modes you’ll actually see
1) Hanging requests
Cause: missing timeouts.
Fix: always set timeout=(connect, read).
2) Lots of 429 (rate limits)
Fixes:
- slow down (sleep / token bucket; see the sketch after this list)
- rotate IPs (proxies)
- cache responses
- crawl less frequently
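The "slow down" fix can be as simple as a client-side throttle that enforces a minimum gap between requests. A sketch, assuming roughly 2 requests per second is acceptable for your target (tune this per site):

```python
import time

import requests

class Throttle:
    """Tiny client-side rate limiter: at most `rate` requests per second."""

    def __init__(self, rate: float):
        self.min_interval = 1.0 / rate
        self.last = 0.0

    def wait(self) -> None:
        elapsed = time.monotonic() - self.last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last = time.monotonic()

throttle = Throttle(rate=2.0)  # assumption: ~2 requests/second is gentle enough

def polite_get(session: requests.Session, url: str) -> requests.Response:
    throttle.wait()  # pace every request before it goes out
    return session.get(url, timeout=(10, 30))
```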
3) 503 / 504 spikes
Often temporary. Backoff + retry helps.
4) HTML changes / empty HTML
Sometimes the response is a bot page.
Action:
- log status code, headers, first 200 chars
- save HTML samples for debugging
- consider browser-based fetch
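A sketch of the first two actions, assuming a `dump_response` helper and a local `debug_html/` folder (both arbitrary choices):

```python
import pathlib

import requests

DEBUG_DIR = pathlib.Path("debug_html")  # arbitrary folder name
DEBUG_DIR.mkdir(exist_ok=True)

def dump_response(r: requests.Response, name: str) -> None:
    """Log the essentials and keep the raw HTML so suspected bot pages can be inspected later."""
    print(r.status_code, dict(r.headers), r.text[:200])
    (DEBUG_DIR / f"{name}.html").write_text(r.text, encoding="utf-8")
```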
5) TLS / connection errors
These benefit from retries, but don’t overdo it.
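To keep "don't overdo it" concrete, one option is a small bounded retry just for these exception types. The exception classes are real Requests exceptions; the retry count and backoff are assumptions:

```python
import time

import requests

def fetch_tolerant(session: requests.Session, url: str, max_tries: int = 2) -> requests.Response:
    """Retry TLS / connection hiccups a bounded number of times, then give up."""
    for attempt in range(max_tries):
        try:
            return session.get(url, timeout=(10, 30))
        except (requests.exceptions.SSLError, requests.exceptions.ConnectionError):
            if attempt == max_tries - 1:
                raise  # still failing on the last try: surface the error
            time.sleep(2 ** attempt)  # 1s, then 2s, ...
```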
Practical patterns
Pattern A: per-request proxy config
```python
proxies = {"http": "http://HOST:PORT", "https": "http://HOST:PORT"}
r = session.get(url, proxies=proxies, timeout=TIMEOUT)
```
Good for: testing.
Pattern B: one Session per proxy
If you rotate proxies per batch, it can be useful to bind a session to a proxy.
```python
def session_for_proxy(proxy_url: str) -> requests.Session:
    s = build_session()
    s.proxies.update({"http": proxy_url, "https": proxy_url})
    return s
```
Good for: batch jobs with stable IP per chunk.
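A hedged usage sketch: split the URL list into chunks and bind each chunk to one proxy. The proxy URLs, chunk size, and example URLs are placeholders:

```python
# Continues from session_for_proxy() above (and build_session() earlier).
PROXIES = ["http://HOST1:PORT", "http://HOST2:PORT"]  # placeholders
urls = [f"https://example.com/item/{i}" for i in range(100)]  # example batch

CHUNK = 50
for i, start in enumerate(range(0, len(urls), CHUNK)):
    s = session_for_proxy(PROXIES[i % len(PROXIES)])  # one stable IP per chunk
    for url in urls[start:start + CHUNK]:
        r = s.get(url, timeout=(10, 30))
        print(url, r.status_code)
```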
Where ProxiesAPI fits (honestly)
You can configure proxies directly in Requests.
But when you’re scaling, there are two sources of complexity:
- proxy selection / rotation / reliability
- keeping your scraping code consistent across targets
ProxiesAPI helps by giving you a simple wrapper URL for fetching:
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com" | head
In Python, you only change the URL you fetch — your retry logic and parsing code stay the same:
```python
from urllib.parse import quote

def proxiesapi_wrap(target_url: str, api_key: str) -> str:
    return f"http://api.proxiesapi.com/?key={api_key}&url={quote(target_url, safe='')}"

API_KEY = "API_KEY"
target = "https://example.com"

wrapped = proxiesapi_wrap(target, API_KEY)
r = session.get(wrapped, timeout=TIMEOUT)
print(r.status_code, len(r.text))
```
No overclaims: you still need sane timeouts, retries, and extraction logic — ProxiesAPI just makes the “fetch” layer cleaner.
Comparison: three approaches
| Approach | Pros | Cons | Best for |
|---|---|---|---|
| Direct Requests (no proxy) | simplest, cheapest | blocks/rate limits sooner | friendly sites |
| Requests + proxy provider | full control, flexible | you manage complexity | mature pipelines |
| Requests + ProxiesAPI wrapper | minimal code change, stable fetch primitive | still need parsing + retry hygiene | teams that want simplicity |
Quick checklist (copy/paste)
- use `Session()`
- set `timeout=(connect, read)`
- set a User-Agent
- implement retries for 429/5xx with backoff
- don’t retry 401/403 blindly
- log failures + save HTML samples
- add dedupe + caching for batch jobs
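For the last item, a minimal dedupe-plus-cache sketch keyed on the URL. The `html_cache/` folder and SHA-256 keying are arbitrary choices; swap in whatever store your pipeline already uses:

```python
import hashlib
import pathlib

import requests

CACHE_DIR = pathlib.Path("html_cache")  # arbitrary folder name
CACHE_DIR.mkdir(exist_ok=True)

def cached_get(session: requests.Session, url: str) -> str:
    """Return cached HTML if this URL was fetched before, otherwise fetch and store it."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")  # dedupe: never re-fetch the same URL
    r = session.get(url, timeout=(10, 30))
    r.raise_for_status()
    path.write_text(r.text, encoding="utf-8")
    return r.text
```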
If you implement just those, your “python requests with proxy” scraper will go from fragile to production-grade.