Error Code 520: What It Means and How to Fix It When Scraping

Jun 11, 2026 · guide · #error code 520, #cloudflare, #web-scraping, #python, #debugging, #retries, #proxiesapi

If you scrape websites long enough, you will eventually hit this:

520 Web Server Returned an Unknown Error

It is frustrating because the message sounds specific, but it is not. Cloudflare uses 520 as a catch-all when the origin returns an empty, malformed, or otherwise unexpected response.

That means one important thing:

Error code 520 is a symptom, not a root cause.

For scrapers, the real job is figuring out which bucket the failure belongs to, then changing the request pattern without making the situation worse.

This guide covers:

what error code 520 actually means
the common scraping-specific causes
a debugging checklist that works in real jobs
code patterns to retry safely
when proxies help and when they do not

Make 520s less random with a steadier network layer

When scraping jobs get bigger, unknown-origin errors become expensive. ProxiesAPI helps you remove one common variable: unstable IP and request routing.


What error code 520 means

Cloudflare's own documentation describes 520 as an unknown error where the origin server returned an empty, unknown, or unexpected response to Cloudflare.

On the infrastructure side, common causes include:

Cause	What it means in practice
origin crashed or misconfigured	the site itself is flaky
Cloudflare IPs blocked at origin	the site or a plugin is refusing traffic upstream
headers exceeded 128 KB	giant cookies or malformed requests
empty / malformed origin response	Cloudflare got back something that was not a valid HTTP response
missing response headers	the origin sent garbage or partial output
incorrect HTTP/2 config	protocol issue between Cloudflare and origin

Those are server-side descriptions. As a scraper, you usually experience them in more practical ways.

What error code 520 usually means when scraping

In scraping workflows, most 520s fall into one of four buckets.

Scenario	Typical symptom	Most likely action
you look like a bot	one URL works manually, repeated script requests fail	slow down, improve headers, rotate IPs
the origin is unstable	browser and script both fail intermittently	retry with backoff, reduce concurrency
your session is dirty	cookie-heavy or challenge-heavy flows start failing	clear cookies, rebuild session, simplify requests
your retry loop is too aggressive	failures spike after the first error	add jitter, lower retries, spread requests out

That is why generic advice like "just retry" is often wrong. Sometimes retrying is correct. Sometimes it turns a small block into a full ban.

The fastest way to diagnose a 520

Use this sequence.

1. Reproduce the URL outside your main crawler

Run the failing URL by itself with the same headers and no concurrency.

import requests

url = "https://target.example.com/page"
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9",
}

r = requests.get(url, headers=headers, timeout=(10, 30))
print(r.status_code)
print(r.headers.get("content-type"))
print(r.text[:500])

If it fails immediately in a single request, the problem is not your worker pool. It is either the request shape, the IP, or the site itself.

2. Compare browser behavior

Open the same page manually.

If the browser also fails, the origin may actually be broken.
If the browser works but your script fails, you are probably dealing with bot detection, missing headers, or a challenge flow.

3. Check whether the response is actually HTML

Some 520 chains happen after redirects, challenge pages, or empty responses. Save the raw body instead of assuming it is the target page.

with open("failed-response.html", "w", encoding="utf-8") as fh:
    fh.write(r.text)
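One way to make this check repeatable is a small classifier over the status code, Content-Type, and body. This is a minimal sketch: the function name `classify_body` and the marker strings are assumptions for illustration. Challenge pages differ between sites and change over time, so treat the markers as a starting point and adjust them against the bodies you actually capture.

```python
from typing import Optional

# Hypothetical marker strings; adjust them to match the bodies you actually capture.
CHALLENGE_MARKERS = ("just a moment", "cf-chl", "checking your browser")


def classify_body(status_code: int, content_type: Optional[str], body: str) -> str:
    """Return a rough label: empty, non-html, challenge, error, or html."""
    text = body.strip()
    if not text:
        return "empty"
    if "html" not in (content_type or "").lower():
        return "non-html"
    lowered = text.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenge"
    if status_code >= 400:
        return "error"
    return "html"
```

With the response from step 1, this would be called as `classify_body(r.status_code, r.headers.get("content-type"), r.text)`. Anything other than `html` is a hint that parsing the body as the target page would be a mistake.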

4. Look at cookies and redirect count

Bloated cookie jars and redirect loops can push you into weird states quickly.

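print("redirects:", len(r.history))
# r.cookies covers only the final response, so include each redirect hop as well.
for hop in [*r.history, r]:
    print(hop.url, requests.utils.dict_from_cookiejar(hop.cookies))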

5. Lower concurrency before changing everything else

Many 520 episodes are self-inflicted. The site tolerated 1 request per second, then your scraper jumped to 40 concurrent workers and the origin started returning garbage.
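A low-risk way to test this is to run the same URLs strictly one at a time at a fixed pace and see whether the 520s disappear. The following is a minimal sketch, assuming a hypothetical `crawl_paced` helper. Here `fetch` is any callable you supply that returns a status code (for example, a thin wrapper around `requests.get`), and the one-second default simply mirrors the rate mentioned above.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def crawl_paced(
    urls: Iterable[str],
    fetch: Callable[[str], int],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, int]:
    """Fetch URLs sequentially, leaving at least min_interval seconds between request starts."""
    results: Dict[str, int] = {}
    last_start: Optional[float] = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the previous request began.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results[url] = fetch(url)
    return results
```

If the errors vanish at this pace, raise concurrency gradually rather than jumping back to the original worker count.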

A practical fix checklist

When you see error code 520 in a scraper, work through these in order.

Check	Why it matters	Practical fix
timeouts	hung connections can leave you with partial responses	set connect/read timeouts explicitly
retry strategy	instant retries often amplify blocks	add exponential backoff + jitter
headers	bare `python-requests` can look suspicious	set a modern browser UA and language headers
concurrency	too many parallel requests can destabilize the origin	cut worker count first
cookies	oversized or stale cookies can break requests	start fresh sessions regularly
IP reputation	some targets punish repeated traffic from one IP	rotate requests through a proxy layer
raw response capture	without it, you are guessing	save the first failed body for inspection

This table is the 80/20. Most scraper-side 520 problems are solved somewhere in here.
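The "start fresh sessions regularly" row is easy to skip because nothing forces it. Here is a minimal sketch of one way to do it, assuming a hypothetical `SessionRotator` class that rebuilds the session after a fixed number of uses. The `factory` argument would typically be `requests.Session`, and the limit of 200 is an arbitrary placeholder rather than a recommended value.

```python
from typing import Any, Callable


class SessionRotator:
    """Hand out a session object and replace it after max_uses calls to get()."""

    def __init__(self, factory: Callable[[], Any], max_uses: int = 200) -> None:
        if max_uses < 1:
            raise ValueError("max_uses must be at least 1")
        self._factory = factory
        self._max_uses = max_uses
        self._session = factory()
        self._uses = 0

    def get(self) -> Any:
        # Swap in a fresh session (and an empty cookie jar) once the limit is reached.
        if self._uses >= self._max_uses:
            self.reset()
        self._uses += 1
        return self._session

    def reset(self) -> None:
        # Close the old session if it supports it, then build a new one.
        close = getattr(self._session, "close", None)
        if callable(close):
            close()
        self._session = self._factory()
        self._uses = 0
```

In a scraper this might look like `rotator = SessionRotator(requests.Session)` followed by `rotator.get().get(url, timeout=(10, 30))`. Calling `rotator.reset()` right after a 520 is another option when a dirty session is suspected.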

Safe retry pattern for 520s

You do want retries, but only when they are disciplined.

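from __future__ import annotations

import os
from urllib.parse import quote

import requests
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential_jitter

TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds
# Statuses worth retrying: rate limits and transient server or edge errors.
RETRYABLE_STATUS = {429, 500, 502, 503, 504, 520, 521, 522, 523, 524}
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/136.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

session = requests.Session()
session.headers.update(HEADERS)


def build_fetch_url(url: str) -> str:
    # Route through ProxiesAPI only when PROXIESAPI_KEY is set; otherwise fetch directly.
    api_key = os.getenv("PROXIESAPI_KEY", "").strip()
    if not api_key:
        return url
    return (
        "https://api.proxiesapi.com/?auth_key="
        + quote(api_key, safe="")
        + "&url="
        + quote(url, safe="")
    )


def is_retryable(exc: BaseException) -> bool:
    # Retry network failures and transient statuses; a 403 or 404 will not improve on retry.
    if isinstance(exc, requests.HTTPError):
        return exc.response is not None and exc.response.status_code in RETRYABLE_STATUS
    return isinstance(
        exc,
        (requests.ConnectionError, requests.Timeout, requests.exceptions.ChunkedEncodingError),
    )


@retry(
    reraise=True,
    stop=stop_after_attempt(4),
    wait=wait_exponential_jitter(initial=1, max=20),
    retry=retry_if_exception(is_retryable),
)
def fetch(url: str) -> requests.Response:
    response = session.get(build_fetch_url(url), timeout=TIMEOUT)
    response.raise_for_status()
    return response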

Why this helps:

retries are capped
backoff spreads requests out
headers are less bot-shaped
ProxiesAPI is optional instead of hard-coded
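To see what capped exponential backoff with jitter does to the timing, the delays can be computed by hand. This is an illustrative sketch of the general technique, not tenacity's internal code. `backoff_delays` and its parameters are hypothetical names, with defaults chosen to mirror the `initial=1, max=20` settings above.

```python
import random
from typing import List, Optional


def backoff_delays(
    retries: int,
    initial: float = 1.0,
    cap: float = 20.0,
    jitter: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each retry: initial * 2**n plus random jitter, capped at cap."""
    rng = rng or random.Random()
    delays = []
    for n in range(retries):
        base = initial * (2 ** n)
        # Jitter keeps parallel workers from retrying in lockstep.
        delays.append(min(cap, base + rng.uniform(0, jitter)))
    return delays
```

With `stop_after_attempt(4)` there are at most three waits, so `backoff_delays(3)` would produce values in the ranges 1 to 2, 2 to 3, and 4 to 5 seconds. The jitter matters most when many workers fail at the same moment, because without it they would all retry together.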

When proxies actually help

People often say "use proxies" as if that solves everything. It does not.

Proxies help when the main issue is:

repeated requests from one IP
geo-sensitive responses
unstable reputation on your current egress IP
high request volume across many target pages

Proxies do not help much when:

the origin is genuinely down
your headers/session are malformed
the site needs full browser execution
your code is retrying too aggressively

That distinction matters, because otherwise you pay for a proxy layer while keeping the real bug.

Requests vs browser automation for 520-heavy targets

Approach	Best when	Risk
`requests`	server-rendered pages, light bot protection	challenge pages can break silently
Playwright / Selenium	JS-heavy pages, challenge flows, session-heavy sites	higher cost, slower throughput
`requests` + ProxiesAPI	many pages, mostly stable HTML, IP pressure is the issue	still fails if you need browser execution

If the response body shows challenge markup or an interstitial instead of the page you expected, browser automation may be the real fix, not another retry loop.

A simple 520 triage helper

Here is a small diagnostic utility that saves the first bad response:

from pathlib import Path

import requests


def capture_failure(url: str, out_dir: str = "debug") -> None:
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    try:
        r = requests.get(url, timeout=(10, 30), headers={"User-Agent": "Mozilla/5.0"})
        r.raise_for_status()
        print("ok", r.status_code)
        return
    except requests.RequestException as exc:
        response = getattr(exc, "response", None)
        if response is not None:
            Path(out_dir, "body.html").write_text(response.text, encoding="utf-8")
            Path(out_dir, "headers.txt").write_text(str(response.headers), encoding="utf-8")
            print("saved failed response", response.status_code)
        raise

This is boring, but it is exactly the kind of boring step that prevents wasted hours.

What not to do

The bad pattern looks like this:

request fails
script retries instantly
failure repeats
worker pool doubles down
IP gets hotter
failures spread

That is how a small, local 520 turns into a full-job outage.

Avoid these habits:

infinite retries
zero delay between retries
high concurrency during debugging
changing five variables at once
assuming every 520 is "just Cloudflare being weird"
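One guard against this spiral is a circuit breaker that pauses the whole job when recent failures pile up, instead of letting every worker keep retrying. Below is a minimal sketch, assuming a hypothetical `FailureBreaker` class. The window size, failure threshold, and cooldown are placeholder values to tune per target.

```python
import time
from collections import deque
from typing import Callable, Deque


class FailureBreaker:
    """Block requests for `cooldown` seconds once recent failures reach a threshold."""

    def __init__(
        self,
        window: int = 20,
        max_failures: int = 5,
        cooldown: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._results: Deque[bool] = deque(maxlen=window)  # True means failure
        self._max_failures = max_failures
        self._cooldown = cooldown
        self._clock = clock
        self._open_until = 0.0

    def allow(self) -> bool:
        # Workers call this before each request and wait when it returns False.
        return self._clock() >= self._open_until

    def record(self, failed: bool) -> None:
        self._results.append(failed)
        if sum(self._results) >= self._max_failures:
            # Too many recent failures: pause everything and start counting afresh.
            self._open_until = self._clock() + self._cooldown
            self._results.clear()
```

Each worker would check `breaker.allow()` before fetching and call `breaker.record(status == 520)` afterwards, so a burst of 520s quiets the whole pool rather than heating up the IP further.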

Bottom line

Error code 520 means Cloudflare received something invalid or unexpected from the origin. In scraping work, that usually translates to one of three practical realities:

the site is unstable
your traffic pattern triggered protection
your own request, session, or retry behavior is messy

Treat 520 as a debugging workflow, not a single error message. Start by isolating one URL, capture the raw response, reduce concurrency, and add sane backoff. If the problem is IP pressure rather than page structure, bring in ProxiesAPI to steady the request path instead of brute-forcing the same broken loop.
