Cloudflare Error 520 When Scraping: What It Means + 9 Fixes That Actually Work
If you scrape through Cloudflare long enough, you’ll eventually hit Error 520.
It’s frustrating because:
- the page often gives you almost no useful details
- it can be intermittent (works in browser, fails in code)
- it gets confused with other Cloudflare blocks (403, 1020, 524)
This guide explains what Error Code 520 means, how it’s different from other Cloudflare errors, and 9 practical fixes that reduce failures in real scraping systems.
When your scraper scales, transient origin errors become a big part of your failure rate. ProxiesAPI helps by giving you consistent proxy routing + rotation so your retry strategy actually works.
What is Cloudflare Error 520?
520: Web server is returning an unknown error.
Cloudflare sits between you and the target site. A 520 generally means:
- Cloudflare could connect to the origin server, but
- the origin returned an invalid/empty/unexpected response (or closed connection)
In scraping terms, it often correlates with:
- edge/origin instability
- bot detection flows returning non-standard responses
- TLS handshake quirks
- aggressive rate limiting that manifests as connection resets
The key point: 520 is generic. You fix it by improving your request hygiene and making your network layer resilient.
520 vs other Cloudflare errors (quick map)
Here’s the mental model:
- 403 / 1020 (Access denied): you are explicitly blocked by firewall/WAF rules.
- 429 (Too many requests): rate limiting.
- 520 (Unknown error): origin response is malformed/empty/aborted.
- 524 (A timeout occurred): Cloudflare connected to origin, but origin took too long.
Table: what to try first
| Code | Typical cause | First actions |
|---|---|---|
| 403 | WAF/captcha/bot block | headers, cookies, session, reduce concurrency, different IP |
| 1020 | firewall rule | different IP / allowlist / do not scrape |
| 429 | rate limit | backoff, jitter, lower RPS |
| 520 | origin unknown error | retries, TLS/header hygiene, rotate IPs |
| 524 | origin slow | higher timeouts, reduce payload, fewer parallel requests |
Step 1: Confirm it’s really 520 (and capture evidence)
Before you change anything, add logging:
- URL
- status code
- response headers (at least `server`, `cf-ray`, `cf-cache-status`)
- first ~200 chars of body
```python
import requests

def debug_get(url: str):
    r = requests.get(url, timeout=(10, 30))
    print("status:", r.status_code)
    print("server:", r.headers.get("server"))
    print("cf-ray:", r.headers.get("cf-ray"))
    print("cache:", r.headers.get("cf-cache-status"))
    print("body head:", (r.text or "")[:200])

debug_get("https://example.com")
```
If it’s a Cloudflare-branded HTML error page with 520, you’re in the right place.
9 fixes that actually work
Fix #1: Use a real timeout pair (connect/read)
A missing read timeout can leave your crawler hanging on a stalled connection, and hung workers cascade into backlogs and partial crawls.
```python
TIMEOUT = (10, 40)  # (connect, read) in seconds
r = requests.get(url, timeout=TIMEOUT)
```
Also handle retries with your own logic (next fix) instead of relying on the adapter's `max_retries` defaults.
Fix #2: Retry with exponential backoff + jitter (520 is often transient)
For 520/522/523/524/503, retries help a lot.
```python
import random
import time

RETRYABLE = {520, 522, 523, 524, 503}

def get_with_retries(session, url, attempts=6):
    r = None
    for i in range(attempts):
        r = session.get(url, timeout=(10, 40))
        if r.status_code not in RETRYABLE:
            return r
        if i < attempts - 1:  # no point sleeping after the final attempt
            time.sleep(min(20, 2 ** i) + random.random())
    return r  # last response
```
Why jitter matters: without it, a fleet of workers retries at the same time and triggers the same edge failures.
Fix #3: Stop looking like a script (headers that match a browser)
Some origins behave differently based on headers.
Minimum useful set:
- `User-Agent`
- `Accept`
- `Accept-Language`
- `Cache-Control`
```python
headers = {
    "User-Agent": "Mozilla/5.0 ... Chrome/124 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "no-cache",
}
```
If your code sends the default `python-requests/x.y.z` User-Agent, some sites route you to different behavior.
Fix #4: Use a Session + cookie jar
Cloudflare flows and some origins expect cookies to persist.
```python
import requests

session = requests.Session()
r1 = session.get(url, timeout=(10, 40))  # may set cookies
r2 = session.get(url, timeout=(10, 40))  # reuses cookies + connection
```
Even if you’re not solving captchas, cookie persistence can reduce “weird” responses.
Fix #5: Lower concurrency (and add a small delay)
A lot of 520 incidents are really “origin doesn’t like your traffic pattern.”
Try:
- fewer parallel workers
- 1–2s delay per host
- request budget per domain
Rule of thumb: if 520 disappears at low concurrency, it’s likely rate/edge pressure.
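The per-host delay above can be sketched as a tiny throttle. This is an illustrative helper (the `throttle` name, the 1.5 s value, and the module-level dict are assumptions, not any library's API):

```python
import time
from urllib.parse import urlparse

# Remember when each host was last hit and sleep until the minimum
# delay has elapsed. 1.5s falls in the 1-2s range suggested above.
_last_hit: dict[str, float] = {}
MIN_DELAY_S = 1.5

def throttle(url: str) -> None:
    host = urlparse(url).netloc
    now = time.monotonic()
    wait = _last_hit.get(host, 0.0) + MIN_DELAY_S - now
    if wait > 0:
        time.sleep(wait)
    _last_hit[host] = time.monotonic()
```

Call `throttle(url)` right before each `session.get(url)`; requests to different hosts are not delayed against each other.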
Fix #6: Rotate IPs (ProxiesAPI) when 520 correlates with a specific exit
Sometimes one IP range gets poor routing to an origin or is treated suspiciously.
With ProxiesAPI you can centralize your proxy config:
```python
import os

proxy_url = os.getenv("PROXIESAPI_PROXY_URL")
proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None
r = session.get(url, proxies=proxies, timeout=(10, 40))
```
What to measure:
- 520 rate per exit IP / per region
- average time-to-first-byte
Rotate when the error rate spikes.
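Tracking the 520 rate per exit can be as simple as a counter keyed by proxy. A minimal sketch, assuming a 10% threshold and a 20-request minimum sample (both illustrative):

```python
from collections import defaultdict

# Count total requests and 520s per exit proxy; rotate when the
# error rate crosses a threshold on enough samples.
stats = defaultdict(lambda: {"total": 0, "err_520": 0})

def record(proxy: str, status: int) -> None:
    stats[proxy]["total"] += 1
    if status == 520:
        stats[proxy]["err_520"] += 1

def should_rotate(proxy: str, threshold: float = 0.10, min_samples: int = 20) -> bool:
    s = stats[proxy]
    if s["total"] < min_samples:
        return False
    return s["err_520"] / s["total"] > threshold
```

The `min_samples` guard matters: a single 520 on a fresh exit is noise, not a spike.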
Fix #7: Treat 520 as a network failure in your pipeline
Don’t mix “parsing failed” with “fetch failed”.
Store fetch failures separately:
- `url`
- `status_code`
- `cf-ray`
- `attempt`
- `timestamp`
That lets you replay only failed URLs later (instead of re-crawling everything).
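A minimal way to store those fields is an append-only JSONL file; the file name and helper names here are illustrative:

```python
import json
import time

# Append fetch failures (with the fields listed above) to a JSONL
# file so only the failed URLs get replayed later.
FAILURES_PATH = "fetch_failures.jsonl"

def log_failure(url: str, status_code: int, cf_ray, attempt: int) -> None:
    row = {
        "url": url,
        "status_code": status_code,
        "cf_ray": cf_ray,
        "attempt": attempt,
        "timestamp": time.time(),
    }
    with open(FAILURES_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")

def failed_urls() -> list:
    try:
        with open(FAILURES_PATH, encoding="utf-8") as f:
            return [json.loads(line)["url"] for line in f]
    except FileNotFoundError:
        return []
```

Swap the file for a database table once volume grows; the point is that fetch failures live apart from parse results.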
Fix #8: Use conditional fetch + caching to reduce repeated hits
If you fetch the same URL too often, you amplify rate pressure.
Practical options:
- local disk cache (hash URL → HTML)
- ETag/If-Modified-Since (when supported)
- database “last_fetched_at” + min refresh interval
Less traffic → fewer errors.
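The "hash URL → HTML" option above can be sketched in a few lines; the cache directory name and helper names are assumptions:

```python
import hashlib
from pathlib import Path

# Hash each URL to a stable filename and store the fetched HTML on
# disk, so repeat requests for the same page never hit the network.
CACHE_DIR = Path("html_cache")

def _cache_path(url: str) -> Path:
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return CACHE_DIR / f"{digest}.html"

def cache_get(url: str):
    p = _cache_path(url)
    return p.read_text(encoding="utf-8") if p.exists() else None

def cache_put(url: str, html: str) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    _cache_path(url).write_text(html, encoding="utf-8")
```

Check `cache_get(url)` before fetching and `cache_put(url, r.text)` after a successful fetch; add a freshness check on the file's mtime if the data goes stale.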
Fix #9: Know when HTML scraping isn’t the right layer
Some sites behind Cloudflare serve:
- dynamic JS-only content
- bot challenges
If you keep getting 520/403/1020 even with good hygiene:
- switch to official APIs
- use a headless browser (Playwright) for a small subset
- reduce scope or change data source
There’s no prize for brute-forcing a target that’s actively defending itself.
A robust Python template (copy/paste)
```python
import os
import random
import time

import requests

TIMEOUT = (10, 40)
RETRYABLE = {520, 522, 523, 524, 503, 429, 403}
UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

def build_proxies():
    p = os.getenv("PROXIESAPI_PROXY_URL")
    return {"http": p, "https": p} if p else None

def fetch(url: str, attempts: int = 6) -> requests.Response:
    s = requests.Session()
    r = None
    for i in range(attempts):
        headers = {
            "User-Agent": random.choice(UA_POOL),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "no-cache",
        }
        r = s.get(url, headers=headers, proxies=build_proxies(), timeout=TIMEOUT)
        if r.status_code not in RETRYABLE:
            return r
        if i < attempts - 1:
            time.sleep(min(20, 2 ** i) + random.random())  # backoff + jitter
    return r

if __name__ == "__main__":
    r = fetch("https://example.com")
    print(r.status_code)
```
FAQ
Does ProxiesAPI “bypass Cloudflare”?
No. Cloudflare is a security layer; bypassing depends on the site’s configuration and what you’re doing.
What ProxiesAPI helps with is the reliability side:
- IP rotation
- centralized proxy config
- better success rate under load (when combined with sane retries)
Is 520 always my fault?
No. It can be a real origin problem. But as a scraper, you can reduce how often you trigger it.
TL;DR
- 520 is a generic Cloudflare origin error.
- Fix it with: timeouts, retries with jitter, browser-like headers, sessions/cookies, lower concurrency, and IP rotation.
- Track failures separately so you can replay only the broken URLs.