Error Code 520 When Scraping: What It Means and a Practical Fix Checklist

You’re scraping along, everything is fine… and then you hit:

520 Web Server Returned an Unknown Error

If you search it, you get a lot of “try again later” advice.

That’s not helpful when you’re trying to build a crawler that runs nightly.

This guide covers:

  • what a 520 error actually means (in Cloudflare terms)
  • the most common scraping-specific causes
  • a debugging checklist that finds the root cause fast
  • code patterns for retries and backoff that don’t make blocks worse

Reduce random 520s with a stable fetch layer

When you’re scraping at scale, reliability comes from engineering: timeouts, retries, backoff, and a consistent proxy layer. ProxiesAPI helps by making the network path less volatile across many targets.


What is error code 520?

520 is a Cloudflare catch-all error.

It means Cloudflare received an empty, unknown, or otherwise unexpected response from the origin server, and the failure doesn’t map to a more specific status such as 502, 503, or 504.

In practice, for scrapers, a 520 typically comes from one of these buckets:

  • your request got blocked (WAF / bot protection) and the origin didn’t respond cleanly
  • the origin is flaky or overloaded
  • the connection between Cloudflare and the origin was reset or closed before a complete response arrived
  • your client is causing strange behavior (timeouts, premature disconnects, weird headers)

The key: 520 is a symptom, not a diagnosis.


The 80/20 causes when scraping

1) You’re sending “bot-shaped” traffic

Common triggers:

  • no User-Agent / default UA
  • missing Accept / Accept-Language headers
  • suspicious header ordering or fingerprint mismatch
  • high request rate from one IP

2) You’re being challenged and your client can’t complete it

Some sites return a JS / Turnstile / captcha flow.

If your client is the plain requests library, it can’t run JavaScript, so it never completes the challenge and keeps receiving the challenge page instead of the content.

3) The origin is unstable (not your fault)

If the site is down or overloaded, you’ll see intermittent 520s even from a browser.

Your fix here is:

  • retry with backoff
  • reduce concurrency
  • cache responses
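
Of these three, caching is the easiest to bolt on. Here is a minimal sketch of a file-based cache, assuming you already have some fetch function that returns HTML and raises on failure. The names CACHE_DIR, cache_path, and cached_fetch are hypothetical, as is the one-day default freshness window.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".scrape_cache")  # hypothetical location; pick your own


def cache_path(url: str, cache_dir: Path = CACHE_DIR) -> Path:
    # Hash the URL so any URL maps to a safe, fixed-length filename.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def cached_fetch(
    url: str,
    fetcher: Callable[[str], str],
    max_age_s: float = 86400.0,
    cache_dir: Path = CACHE_DIR,
) -> str:
    path = cache_path(url, cache_dir)
    # Serve from disk when a fresh copy exists, so re-runs skip the network.
    if path.exists() and (time.time() - path.stat().st_mtime) < max_age_s:
        return path.read_text(encoding="utf-8")
    # The fetcher is expected to raise on failure, so only successes get cached.
    body = fetcher(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return body
```

Because failures are never written to disk, a flaky origin can’t poison the cache; the next run simply tries that URL again.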

4) Your retries are making the block worse

This is a classic failure mode:

  1. request fails
  2. code retries immediately (no backoff)
  3. you amplify the “bad traffic” signal
  4. blocks escalate → more failures
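
The way out of that loop is to make every retry wait longer than the last, with some randomness so parallel workers don’t retry in lockstep. A small sketch of one common variant (sometimes called “equal jitter”: half the delay is fixed, half is random). The function name backoff_delay and the 1s base / 60s cap are assumptions for illustration, not recommended values for any particular site.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retrying after the given 1-based failed attempt."""
    # Exponential growth (1s, 2s, 4s, ...) capped so waits stay bounded.
    ceiling = min(cap_s, base_s * (2 ** (attempt - 1)))
    # Half fixed, half random: never an instant retry, never perfectly in sync.
    return ceiling / 2 + rng() * (ceiling / 2)
```

Passing rng as a parameter is only there to make the function easy to test; in normal use you’d call backoff_delay(attempt) and sleep for the result.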

Debugging flow (fast, deterministic)

Do this in order. Don’t jump to “buy more proxies” before you know what’s happening.

Step 1: Confirm it’s Cloudflare

Look for response headers like:

  • server: cloudflare
  • cf-ray: ...

If you can’t see headers (because you’re using a proxy API), fetch one failing URL directly with curl -I from your machine to confirm.
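
If you’d rather check this in code, a sketch like the following works on any mapping of response headers. The function name served_by_cloudflare is hypothetical, and the check relies only on the two headers listed above.

```python
from typing import Mapping


def served_by_cloudflare(headers: Mapping[str, str]) -> bool:
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "").lower()
    return "cloudflare" in server or "cf-ray" in lowered
```

With requests, you could pass r.headers straight in.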

Step 2: Capture the first failing response body

Don’t throw it away.

Save:

  • status code
  • headers
  • first ~2KB of body

If the body is HTML and includes “Attention Required”, “Just a moment…”, or a captcha, you’re blocked/challenged — not “randomly failing”.
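
One way to capture this, sketched under the assumption that you log failures as JSON. The names snapshot and CHALLENGE_MARKERS are hypothetical, the marker list is just the three phrases mentioned above, and the limit counts characters, so it is only roughly 2KB.

```python
import json
from typing import Mapping

# Phrases from the text above; extend with whatever your targets actually return.
CHALLENGE_MARKERS = ("attention required", "just a moment", "captcha")


def snapshot(status: int, headers: Mapping[str, str], body: str, limit: int = 2048) -> dict:
    head = body[:limit]  # keep only the start of the body
    lowered = head.lower()
    return {
        "status": status,
        "headers": dict(headers),
        "body_head": head,
        "looks_challenged": any(marker in lowered for marker in CHALLENGE_MARKERS),
    }


# Example: one JSON line per failure, easy to grep later.
example = snapshot(520, {"server": "cloudflare"}, "<html><title>Just a moment...</title></html>")
log_line = json.dumps(example)
```

Keeping the first failure matters most: later failures are often side effects of your own retries.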

Step 3: Reproduce with a browser

Open the same URL in a normal browser:

  • If the browser fails too → origin is likely down/unhealthy.
  • If the browser works instantly but your scraper fails → your traffic shape is the issue.

Step 4: Reduce the problem (one URL, one request)

Make a minimal script that does exactly one request.

If a single request fails, you don’t have a “scaling” problem. You have an access/fingerprint problem.
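
A minimal probe could look like this. It is a sketch that deliberately uses only the standard library, so the result doesn’t depend on the fetch layer you’re debugging. The function name probe, the 20-second timeout, and the header values are assumptions; the opener parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request

PROBE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def probe(url: str, timeout_s: float = 20.0, opener=urllib.request.urlopen):
    """Send exactly one GET and return (status, headers, first 2KB of body)."""
    req = urllib.request.Request(url, headers=PROBE_HEADERS)
    try:
        with opener(req, timeout=timeout_s) as resp:
            return resp.status, dict(resp.headers), resp.read(2048)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the response.
        return err.code, dict(err.headers), err.read(2048)
```

Call it once, for example status, headers, body = probe("https://example.com/"), and read the output before changing anything else.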


A resilient request pattern (Python)

This is the baseline request pattern you should use for any scraping fetch layer:

  • timeouts (connect + read)
  • retries with exponential backoff + jitter
  • sanity checks (HTML too small, wrong content-type, etc.)

import os
import random
import time
import urllib.parse

import requests

TIMEOUT = (10, 60)  # (connect, read) timeouts in seconds
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
})

PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY")


def via_proxiesapi(url: str) -> str:
    return "https://api.proxiesapi.com/?" + urllib.parse.urlencode({
        "key": PROXIESAPI_KEY,
        "url": url,
    })


def fetch(url: str, *, use_proxiesapi: bool = False, retries: int = 4) -> str:
    """Fetch url with timeouts, retrying with exponential backoff plus jitter."""
    last_err = None
    final = via_proxiesapi(url) if (use_proxiesapi and PROXIESAPI_KEY) else url
    for attempt in range(1, retries + 1):
        try:
            r = session.get(final, timeout=TIMEOUT)

            # Some 520s show up as HTML challenges or tiny "error" pages.
            if r.status_code >= 400 and r.text and r.text.lstrip().startswith("<"):
                head = r.text[:400].lower()
                if "cloudflare" in head or "just a moment" in head or "attention required" in head:
                    raise RuntimeError(f"Challenge/block page (status={r.status_code})")

            r.raise_for_status()
            return r.text
        except (requests.RequestException, RuntimeError) as e:
            last_err = e
            if attempt == retries:
                break  # don't sleep after the final attempt
            base = 2.0 ** attempt  # 2s, 4s, 8s, ...
            jitter = random.uniform(0.0, 0.4 * base)
            time.sleep(base + jitter)
    # Chain the last error so the traceback shows what actually went wrong.
    raise RuntimeError(f"Giving up on {url} after {retries} attempts") from last_err
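
The function above only looks for challenge pages on error statuses. The other sanity checks (body too small, wrong content-type) can sit in a separate helper. Here is a sketch: the name sanity_check is hypothetical, and the 500-character minimum is an arbitrary assumption you should tune per target.

```python
from typing import List


def sanity_check(status: int, content_type: str, body: str, min_chars: int = 500) -> List[str]:
    """Return a list of problems; an empty list means the response looks usable."""
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if "html" not in content_type.lower():
        problems.append(f"unexpected content-type {content_type!r}")
    if len(body) < min_chars:
        problems.append(f"body too small ({len(body)} chars)")
    if "</html>" not in body.lower():
        problems.append("no closing </html> tag (possibly truncated)")
    return problems
```

With requests, you might call sanity_check(r.status_code, r.headers.get("Content-Type", ""), r.text) before returning the body, and treat a non-empty list like any other failed attempt.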

“Fix” checklist (what to change first)

When you hit 520s consistently, change one variable at a time:

  1. Add sane headers (User-Agent, Accept, Accept-Language)
  2. Enforce timeouts
  3. Add backoff + jitter (no immediate retry loops)
  4. Lower concurrency
  5. Add caching so re-runs don’t re-hit the same URLs
  6. Use a proxy layer (rotate IPs, avoid “one IP hammering”)
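
For step 4, one simple way to lower the effective request rate is a shared throttle that every worker calls before each request. This is a minimal sketch, not a full rate limiter: the class name Throttle is hypothetical, and the clock and sleep parameters exist only so it can be tested without real waiting.

```python
import threading
import time


class Throttle:
    """Space requests at least min_interval_s apart, across threads."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_ok = 0.0  # earliest time the next request may start

    def wait(self) -> float:
        # Reserve the next slot under the lock, then sleep outside it
        # so other threads can queue up for their own slots.
        with self._lock:
            now = self._clock()
            delay = max(0.0, self._next_ok - now)
            self._next_ok = max(now, self._next_ok) + self.min_interval_s
        if delay > 0:
            self._sleep(delay)
        return delay
```

Create one instance per target host, for example throttle = Throttle(2.0), and call throttle.wait() right before each request.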

If your target uses heavy bot mitigation, you may also need:

  • a browser-based fetch (Playwright)
  • an unblocker service

But the first 5 steps are still mandatory. They make every approach better.


Quick decision: is it you, or the site?

Use this table:

Symptom                                   | Likely cause                 | Next move
------------------------------------------|------------------------------|------------------------------
Browser fails too                         | Origin/server issue          | Retry with backoff; wait
Browser works; scraper fails instantly    | Bot protection / fingerprint | Adjust headers; proxy/unblock
Works for a while; fails after N requests | Rate limiting/IP reputation  | Lower rate; rotate IPs
Only fails on some pages                  | Edge cases/redirects         | Log bodies; handle redirects

Bottom line

Treat 520 like a smoke alarm:

  • don’t ignore it
  • don’t panic
  • isolate the cause and fix the fetch layer first

Once your fetch layer is stable, the rest of your scraper becomes boring — and that’s exactly what you want.
