Cloudflare Error 520 When Scraping: What It Means + 9 Fixes That Actually Work

If you scrape through Cloudflare long enough, you’ll eventually hit Error 520.

It’s frustrating because:

  • the page often gives you almost no useful details
  • it can be intermittent (works in browser, fails in code)
  • it gets confused with other Cloudflare blocks (403, 1020, 524)

This guide explains what Error Code 520 means, how it’s different from other Cloudflare errors, and 9 practical fixes that reduce failures in real scraping systems.

Reduce 520s with a stable proxy + retry layer

When your scraper scales, transient origin errors become a big part of your failure rate. ProxiesAPI helps by giving you consistent proxy routing + rotation so your retry strategy actually works.


What is Cloudflare Error 520?

520: Web server is returning an unknown error.

Cloudflare sits between you and the target site. A 520 generally means:

  • Cloudflare could connect to the origin server, but
  • the origin returned an invalid/empty/unexpected response (or closed connection)

In scraping terms, it often correlates with:

  • edge/origin instability
  • bot detection flows returning non-standard responses
  • TLS handshake quirks
  • aggressive rate limiting that manifests as connection resets

The key point: 520 is generic. You fix it by improving your request hygiene and making your network layer resilient.


520 vs other Cloudflare errors (quick map)

Here’s the mental model:

  • 403 / 1020 (Access denied): you are explicitly blocked by firewall/WAF rules.
  • 429 (Too many requests): rate limiting.
  • 520 (Unknown error): origin response is malformed/empty/aborted.
  • 524 (A timeout occurred): Cloudflare connected to origin, but origin took too long.

Table: what to try first

Code | Typical cause         | First actions
---- | --------------------- | -------------
403  | WAF/captcha/bot block | headers, cookies, session, reduce concurrency, different IP
1020 | firewall rule         | different IP / allowlist / do not scrape
429  | rate limit            | backoff, jitter, lower RPS
520  | origin unknown error  | retries, TLS/header hygiene, rotate IPs
524  | origin slow           | higher timeouts, reduce payload, fewer parallel requests
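The first-action map above can be sketched as a small triage helper. This is illustrative only: the function name and the action strings are made up for this example, and 1020 is folded into 403 because Cloudflare typically delivers it with an HTTP 403 status.

```python
# Illustrative triage helper: map a Cloudflare-related status code
# to the first thing worth trying.
FIRST_ACTIONS = {
    403: "check headers/cookies/session; reduce concurrency; try a different IP",
    429: "back off with jitter; lower requests per second",
    520: "retry with backoff; review TLS/header hygiene; rotate IPs",
    524: "raise the read timeout; reduce payload; fewer parallel requests",
}


def triage(status_code: int) -> str:
    # Error 1020 arrives as a 403 with a Cloudflare error page,
    # so it takes the 403 branch here.
    return FIRST_ACTIONS.get(status_code, "treat as non-Cloudflare; inspect the body")
```

In a real crawler you would key this off the parsed Cloudflare error page as well, not just the status code.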

Step 1: Confirm it’s really 520 (and capture evidence)

Before you change anything, add logging:

  • URL
  • status code
  • response headers (at least server, cf-ray, cf-cache-status)
  • first ~200 chars of body

import requests


def debug_get(url: str):
    r = requests.get(url, timeout=(10, 30))
    print("status:", r.status_code)
    print("server:", r.headers.get("server"))
    print("cf-ray:", r.headers.get("cf-ray"))
    print("cache:", r.headers.get("cf-cache-status"))
    print("body head:", (r.text or "")[:200])


debug_get("https://example.com")

If it’s a Cloudflare-branded HTML error page with 520, you’re in the right place.


9 fixes that actually work

Fix #1: Use a real timeout pair (connect/read)

A missing read timeout can leave your crawler hanging on a stalled connection, and hung workers pile up until they stall the whole fleet.

TIMEOUT = (10, 40)  # connect, read
r = requests.get(url, timeout=TIMEOUT)

Also, handle retries with your own logic (next fix) rather than relying on defaults — requests performs no retries out of the box.


Fix #2: Retry with exponential backoff + jitter (520 is often transient)

For 520/522/523/524/503, retries help a lot.

import random
import time

RETRYABLE = {520, 522, 523, 524, 503}


def get_with_retries(session, url, attempts=6):
    for i in range(attempts):
        r = session.get(url, timeout=(10, 40))
        if r.status_code not in RETRYABLE:
            return r

        sleep_s = min(20, (2 ** i)) + random.random()
        time.sleep(sleep_s)

    return r  # last response

Why jitter matters: without it, a fleet of workers retries at the same time and triggers the same edge failures.


Fix #3: Stop looking like a script (headers that match a browser)

Some origins behave differently based on headers.

Minimum useful set:

  • User-Agent
  • Accept
  • Accept-Language
  • Cache-Control

headers = {
  "User-Agent": "Mozilla/5.0 ... Chrome/124 Safari/537.36",
  "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "Accept-Language": "en-US,en;q=0.9",
  "Cache-Control": "no-cache",
}

If your code sends the default python-requests/x.y.z User-Agent, some sites route you to different (often worse) behavior.


Fix #4: Persist cookies with a session

Cloudflare flows and some origins expect cookies to persist.

import requests
session = requests.Session()

r1 = session.get(url, timeout=(10, 40))
r2 = session.get(url, timeout=(10, 40))

Even if you’re not solving captchas, cookie persistence can reduce “weird” responses.


Fix #5: Lower concurrency (and add a small delay)

A lot of 520 incidents are really “origin doesn’t like your traffic pattern.”

Try:

  • fewer parallel workers
  • 1–2s delay per host
  • request budget per domain

Rule of thumb: if 520 disappears at low concurrency, it’s likely rate/edge pressure.


Fix #6: Rotate IPs (ProxiesAPI) when 520 correlates with a specific exit

Sometimes one IP range gets poor routing to an origin or is treated suspiciously.

With ProxiesAPI you can centralize your proxy config:

import os

proxy_url = os.getenv("PROXIESAPI_PROXY_URL")
proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else None

r = session.get(url, proxies=proxies, timeout=(10, 40))

What to measure:

  • 520 rate per exit IP / per region
  • average time-to-first-byte

Rotate when the error rate spikes.
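One way to measure this is a per-exit counter that flags an exit for rotation once its 520 rate crosses a threshold. The class name and thresholds here are illustrative, not part of any API:

```python
from collections import defaultdict


class ExitHealth:
    """Track 520 rate per exit IP and flag unhealthy exits for rotation."""

    def __init__(self, max_error_rate: float = 0.2, min_samples: int = 20):
        self.totals = defaultdict(int)   # exit_ip -> total requests
        self.errors = defaultdict(int)   # exit_ip -> 520 responses
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, exit_ip: str, status_code: int) -> None:
        self.totals[exit_ip] += 1
        if status_code == 520:
            self.errors[exit_ip] += 1

    def should_rotate(self, exit_ip: str) -> bool:
        n = self.totals[exit_ip]
        if n < self.min_samples:
            return False  # not enough data to judge this exit yet
        return self.errors[exit_ip] / n > self.max_error_rate
```

Feed it from your response handler and check `should_rotate` before reusing an exit; the same structure works for tracking time-to-first-byte per region.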


Fix #7: Treat 520 as a network failure in your pipeline

Don’t mix “parsing failed” with “fetch failed”.

Store fetch failures separately:

  • url
  • status_code
  • cf-ray
  • attempt
  • timestamp

That lets you replay only failed URLs later (instead of re-crawling everything).


Fix #8: Use conditional fetch + caching to reduce repeated hits

If you fetch the same URL too often, you amplify rate pressure.

Practical options:

  • local disk cache (hash URL → HTML)
  • ETag/If-Modified-Since (when supported)
  • database “last_fetched_at” + min refresh interval

Less traffic → fewer errors.


Fix #9: Know when HTML scraping isn’t the right layer

Some sites behind Cloudflare serve:

  • dynamic JS-only content
  • bot challenges

If you keep getting 520/403/1020 even with good hygiene:

  • switch to official APIs
  • use a headless browser (Playwright) for a small subset
  • reduce scope or change data source

There’s no prize for brute-forcing a target that’s actively defending itself.


A robust Python template (copy/paste)

import os
import random
import time
import requests

TIMEOUT = (10, 40)
RETRYABLE = {520, 522, 523, 524, 503, 429, 403}

UA_POOL = [
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]


def build_proxies():
    p = os.getenv("PROXIESAPI_PROXY_URL")
    return {"http": p, "https": p} if p else None


def fetch(url: str, attempts: int = 6) -> requests.Response:
    s = requests.Session()

    for i in range(attempts):
        headers = {
            "User-Agent": random.choice(UA_POOL),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "no-cache",
        }

        r = s.get(url, headers=headers, proxies=build_proxies(), timeout=TIMEOUT)
        if r.status_code not in RETRYABLE:
            return r

        # backoff + jitter
        time.sleep(min(20, 2 ** i) + random.random())

    return r


if __name__ == "__main__":
    r = fetch("https://example.com")
    print(r.status_code)

FAQ

Does ProxiesAPI “bypass Cloudflare”?

No. Cloudflare is a security layer; bypassing depends on the site’s configuration and what you’re doing.

What ProxiesAPI helps with is the reliability side:

  • IP rotation
  • centralized proxy config
  • better success rate under load (when combined with sane retries)

Is 520 always my fault?

No. It can be a real origin problem. But as a scraper, you can reduce how often you trigger it.


TL;DR

  • 520 is a generic Cloudflare origin error.
  • Fix it with: timeouts, retries with jitter, browser-like headers, sessions/cookies, lower concurrency, and IP rotation.
  • Track failures separately so you can replay only the broken URLs.

Related guides

Async Web Scraping in Python: asyncio + aiohttp (Concurrency Without Getting Banned)
Learn production-grade async scraping in Python with asyncio + aiohttp: bounded concurrency, per-host limits, retry/backoff, timeouts, and proxy rotation patterns. Includes a complete working crawler template.

ISP Proxies Explained: When Datacenter and Residential Aren’t Enough
What ISP proxies are, when they outperform datacenter/residential, tradeoffs, and how to rotate them safely for scraping at scale.

Google Trends Scraping: API Options and DIY Methods (2026)
Compare official and unofficial ways to fetch Google Trends data, plus a DIY approach with throttling, retries, and proxy rotation for stability.

How to Scrape Google Search Results with Python (Without Getting Blocked)
A practical SERP scraping workflow in Python: handle consent/interstitials, parse organic results defensively, rotate IPs, backoff on blocks, and export clean results. Includes a ProxiesAPI-backed fetch layer.