Cloudflare Error 520 When Scraping: What It Means + 9 Fixes That Actually Work
If you scrape through Cloudflare long enough, you’ll eventually hit Error 520.
It’s frustrating because:
- the page often gives you almost no useful details
- it can be intermittent (works in browser, fails in code)
- it gets confused with other Cloudflare blocks (403, 1020, 524)
This guide explains what Error Code 520 means, how it’s different from other Cloudflare errors, and 9 practical fixes that reduce failures in real scraping systems.
When your scraper scales, transient origin errors become a big part of your failure rate. ProxiesAPI helps by giving you consistent proxy routing + rotation so your retry strategy actually works.
What is Cloudflare Error 520?
520: Web server is returning an unknown error.
Cloudflare sits between you and the target site. A 520 generally means:
- Cloudflare could connect to the origin server, but
- the origin returned an invalid/empty/unexpected response (or closed connection)
In scraping terms, it often correlates with:
- edge/origin instability
- bot detection flows returning non-standard responses
- TLS handshake quirks
- aggressive rate limiting that manifests as connection resets
The key point: 520 is generic. You fix it by improving your request hygiene and making your network layer resilient.
520 vs other Cloudflare errors (quick map)
Here’s the mental model:
- 403 / 1020 (Access denied): you are explicitly blocked by firewall/WAF rules.
- 429 (Too many requests): rate limiting.
- 520 (Unknown error): origin response is malformed/empty/aborted.
- 524 (A timeout occurred): Cloudflare connected to origin, but origin took too long.
Table: what to try first
| Code | Typical cause | First actions |
|---|---|---|
| 403 | WAF/captcha/bot block | headers, cookies, session, reduce concurrency, different IP |
| 1020 | firewall rule | different IP / allowlist / do not scrape |
| 429 | rate limit | backoff, jitter, lower RPS |
| 520 | origin unknown error | retries, TLS/header hygiene, rotate IPs |
| 524 | origin slow | higher timeouts, reduce payload, fewer parallel requests |
Step 1: Confirm it’s really 520 (and capture evidence)
Before you change anything, add logging:
- URL
- status code
- response headers (at least `server`, `cf-ray`, `cf-cache-status`)
- first ~200 chars of body
```python
import requests

def debug_get(url: str):
    r = requests.get(url, timeout=(10, 30))
    print("status:", r.status_code)
    print("server:", r.headers.get("server"))
    print("cf-ray:", r.headers.get("cf-ray"))
    print("cache:", r.headers.get("cf-cache-status"))
    print("body head:", (r.text or "")[:200])

debug_get("https://example.com")
```
If it’s a Cloudflare-branded HTML error page with 520, you’re in the right place.
9 fixes that actually work
Fix #1: Use a real timeout pair (connect/read)
A missing read timeout can leave your crawler hanging on a stalled connection, and hung workers cascade into backlogs and partial crawls.
```python
TIMEOUT = (10, 40)  # (connect, read) in seconds
r = requests.get(url, timeout=TIMEOUT)
```
Also handle retries with your own logic (next fix) instead of relying on the adapter's `max_retries` defaults.
Fix #2: Retry with exponential backoff + jitter (520 is often transient)
For 520/522/523/524/503, retries help a lot.
```python
import random
import time

RETRYABLE = {520, 522, 523, 524, 503}

def get_with_retries(session, url, attempts=6):
    r = None
    for i in range(attempts):
        r = session.get(url, timeout=(10, 40))
        if r.status_code not in RETRYABLE:
            return r
        if i < attempts - 1:  # no point sleeping after the final attempt
            time.sleep(min(20, 2 ** i) + random.random())
    return r  # last response
```
Why jitter matters: without it, a fleet of workers retries at the same time and triggers the same edge failures.
Fix #3: Stop looking like a script (headers that match a browser)
Some origins behave differently based on headers.
Minimum useful set:
- `User-Agent`
- `Accept`
- `Accept-Language`
- `Cache-Control`
```python
headers = {
    "User-Agent": "Mozilla/5.0 ... Chrome/124 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "no-cache",
}
```
If your code sends the default `python-requests/x.y.z` User-Agent, some sites route you to different behavior.
Fix #4: Use a Session + cookie jar
Cloudflare flows and some origins expect cookies to persist.
```python
import requests

session = requests.Session()
r1 = session.get(url, timeout=(10, 40))  # may set cookies
r2 = session.get(url, timeout=(10, 40))  # reuses cookies + connection
```
Even if you’re not solving captchas, cookie persistence can reduce “weird” responses.
Fix #5: Lower concurrency (and add a small delay)
A lot of 520 incidents are really “origin doesn’t like your traffic pattern.”
Try:
- fewer parallel workers
- 1–2s delay per host
- request budget per domain
Rule of thumb: if 520 disappears at low concurrency, it’s likely rate/edge pressure.
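The per-host delay above can be sketched as a tiny throttle. This is an illustrative helper (the `throttle` name, the 1.5 s value, and the module-level dict are assumptions, not any library's API):

```python
import time
from urllib.parse import urlparse

# Remember when each host was last hit and sleep until the minimum
# delay has elapsed. 1.5s falls in the 1-2s range suggested above.
_last_hit: dict[str, float] = {}
MIN_DELAY_S = 1.5

def throttle(url: str) -> None:
    host = urlparse(url).netloc
    now = time.monotonic()
    wait = _last_hit.get(host, 0.0) + MIN_DELAY_S - now
    if wait > 0:
        time.sleep(wait)
    _last_hit[host] = time.monotonic()
```

Call `throttle(url)` right before each `session.get(url)`; requests to different hosts are not delayed against each other.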
Fix #6: Rotate IPs (ProxiesAPI) when 520 correlates with a specific exit
Sometimes one IP range gets poor routing to an origin or is treated suspiciously.
With ProxiesAPI you can centralize your proxy config:
```python
import os

proxy_url = os.getenv("PROXIESAPI_PROXY_URL")
proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
r = session.get(url, proxies=proxies, timeout=(10, 40))
```
What to measure:
- 520 rate per exit IP / per region
- average time-to-first-byte
Rotate when the error rate spikes.
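Tracking the 520 rate per exit can be as simple as a counter keyed by proxy. A minimal sketch, assuming a 10% threshold and a 20-request minimum sample (both illustrative):

```python
from collections import defaultdict

# Count total requests and 520s per exit proxy; rotate when the
# error rate crosses a threshold on enough samples.
stats = defaultdict(lambda: {"total": 0, "err_520": 0})

def record(proxy: str, status: int) -> None:
    stats[proxy]["total"] += 1
    if status == 520:
        stats[proxy]["err_520"] += 1

def should_rotate(proxy: str, threshold: float = 0.10, min_samples: int = 20) -> bool:
    s = stats[proxy]
    if s["total"] < min_samples:
        return False
    return s["err_520"] / s["total"] > threshold
```

The `min_samples` guard matters: a single 520 on a fresh exit is noise, not a spike.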
Fix #7: Treat 520 as a network failure in your pipeline
Don’t mix “parsing failed” with “fetch failed”.
Store fetch failures separately:
- `url`
- `status_code`
- `cf-ray`
- `attempt`
- `timestamp`
That lets you replay only failed URLs later (instead of re-crawling everything).
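A minimal way to store those fields is an append-only JSONL file; the file name and helper names here are illustrative:

```python
import json
import time

# Append fetch failures (with the fields listed above) to a JSONL
# file so only the failed URLs get replayed later.
FAILURES_PATH = "fetch_failures.jsonl"

def log_failure(url: str, status_code: int, cf_ray, attempt: int) -> None:
    row = {
        "url": url,
        "status_code": status_code,
        "cf_ray": cf_ray,
        "attempt": attempt,
        "timestamp": time.time(),
    }
    with open(FAILURES_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")

def failed_urls() -> list:
    try:
        with open(FAILURES_PATH, encoding="utf-8") as f:
            return [json.loads(line)["url"] for line in f]
    except FileNotFoundError:
        return []
```

Swap the file for a database table once volume grows; the point is that fetch failures live apart from parse results.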
Fix #8: Use conditional fetch + caching to reduce repeated hits
If you fetch the same URL too often, you amplify rate pressure.
Practical options:
- local disk cache (hash URL → HTML)
- ETag/If-Modified-Since (when supported)
- database “last_fetched_at” + min refresh interval
Less traffic → fewer errors.
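The "hash URL → HTML" option above can be sketched in a few lines; the cache directory name and helper names are assumptions:

```python
import hashlib
from pathlib import Path

# Hash each URL to a stable filename and store the fetched HTML on
# disk, so repeat requests for the same page never hit the network.
CACHE_DIR = Path("html_cache")

def _cache_path(url: str) -> Path:
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}.html"

def cache_get(url: str):
    p = _cache_path(url)
    return p.read_text(encoding="utf-8") if p.exists() else None

def cache_put(url: str, html: str) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    _cache_path(url).write_text(html, encoding="utf-8")
```

Check `cache_get(url)` before fetching and `cache_put(url, r.text)` after a successful fetch; add a freshness check on the file's mtime if the data goes stale.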
Fix #9: Know when HTML scraping isn’t the right layer
Some sites behind Cloudflare serve:
- dynamic JS-only content
- bot challenges
If you keep getting 520/403/1020 even with good hygiene:
- switch to official APIs
- use a headless browser (Playwright) for a small subset
- reduce scope or change data source
There’s no prize for brute-forcing a target that’s actively defending itself.
A robust Python template (copy/paste)
```python
import os
import random
import time

import requests

TIMEOUT = (10, 40)
RETRYABLE = {520, 522, 523, 524, 503, 429, 403}
UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

def build_proxies():
    p = os.getenv("PROXIESAPI_PROXY_URL")
    return {"http": p, "https": p} if p else None

def fetch(url: str, attempts: int = 6) -> requests.Response:
    s = requests.Session()
    r = None
    for i in range(attempts):
        headers = {
            "User-Agent": random.choice(UA_POOL),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "no-cache",
        }
        r = s.get(url, headers=headers, proxies=build_proxies(), timeout=TIMEOUT)
        if r.status_code not in RETRYABLE:
            return r
        if i < attempts - 1:
            time.sleep(min(20, 2 ** i) + random.random())  # backoff + jitter
    return r

if __name__ == "__main__":
    r = fetch("https://example.com")
    print(r.status_code)
```
FAQ
Does ProxiesAPI “bypass Cloudflare”?
No. Cloudflare is a security layer; bypassing depends on the site’s configuration and what you’re doing.
What ProxiesAPI helps with is the reliability side:
- IP rotation
- centralized proxy config
- better success rate under load (when combined with sane retries)
Is 520 always my fault?
No. It can be a real origin problem. But as a scraper, you can reduce how often you trigger it.
TL;DR
- 520 is a generic Cloudflare origin error.
- Fix it with: timeouts, retries with jitter, browser-like headers, sessions/cookies, lower concurrency, and IP rotation.
- Track failures separately so you can replay only the broken URLs.