Beautiful Soup vs Scrapy vs Selenium (2026): Which Python Scraper Should You Use?

If you’re scraping with Python, you’ll hear three names over and over:

  • Beautiful Soup (usually with requests)
  • Scrapy
  • Selenium (or Playwright, in the same “browser automation” category)

They’re not interchangeable.

They solve different problems, and choosing the wrong one can cost you weeks:

  • slow scripts that never finish
  • brittle selectors
  • banned IPs
  • “works on my machine” crawlers that fail in production

This guide is a decision framework — not a religious argument.

When sites get hostile, keep your scraper architecture clean

No framework solves blocking by itself. Keep reliability in your fetch layer (timeouts, retries, optional ProxiesAPI) so you can swap tools without rewriting your extraction logic.


TL;DR decision rules

Pick Beautiful Soup when:

  • the site is mostly server-rendered HTML
  • you’re scraping dozens to hundreds of pages
  • you want full control over parsing and exports

Pick Scrapy when:

  • you’re scraping thousands to millions of pages
  • you need a real crawling engine (queues, dedupe, pipelines)
  • you care about throughput and resilience

Pick Selenium when:

  • the content requires real browser rendering
  • navigation requires clicks, scroll, or authenticated sessions
  • anti-bot measures break simple HTTP fetches

Comparison table (practical)

Tool                      | Best for                           | Throughput | Reliability | Complexity
Beautiful Soup + requests | simple sites, quick scripts        | Medium     | Medium      | Low
Scrapy                    | large crawls, structured pipelines | High       | High        | Medium
Selenium                  | JS-heavy sites, complex flows      | Low        | Medium      | High

Notes:

  • Scrapy is “fast” because it’s asynchronous and designed for crawling.
  • Selenium is “slow” because every page is loaded and rendered in a full browser.

Beautiful Soup: the surgical knife

Beautiful Soup shines when you already know the URLs (or can generate them) and the HTML is present server-side.

Minimal pattern:

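import requests
from bs4 import BeautifulSoup  # the "lxml" parser below requires the lxml package

TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds
session = requests.Session()  # reuses connections across requests

def fetch(url: str) -> str:
    r = session.get(url, timeout=TIMEOUT, headers={"User-Agent": "Mozilla/5.0"})
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return r.text

def parse(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    rows = []
    for card in soup.select(".card"):
        a = card.select_one("a.title")
        # Keep the row even when the title link is missing, so gaps stay visible.
        rows.append({"title": a.get_text(strip=True) if a else None})
    return rows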

Where it breaks down:

  • you need a real queue + dedupe
  • you need to auto-discover new pages (true crawling)
  • the site is JS-rendered so the HTML you fetch is empty
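
To make the first two points concrete, here is a minimal sketch of the queue + dedupe logic you would end up writing by hand. It is an illustration under assumptions, not a replacement for a crawling engine: `crawl` and `fetch_links` are hypothetical names, and `fetch_links` stands in for whatever fetch-and-parse code returns the hrefs found on a page.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first crawl of one host, visiting each URL at most once.

    `fetch_links` is a hypothetical callable: given a URL, it returns the
    hrefs found on that page (for example, fetch + Beautiful Soup).
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])  # frontier of URLs still to visit
    seen = {start_url}          # dedupe set: every URL ever queued
    visited = []

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments before deduping.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != host or absolute in seen:
                continue
            seen.add(absolute)
            queue.append(absolute)
    return visited
```

Even this small version has no retries, no persistence, and no per-domain rate limiting, which is roughly where a dedicated crawler starts to pay off.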

Scrapy: the crawling engine

Scrapy is not “a parser.” It’s an engine:

  • concurrent requests
  • built-in retries and throttling hooks
  • request fingerprinting + dedupe
  • pipelines for cleaning/export/storage

Minimal spider:

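import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com/list"]

    def parse(self, response):
        # Follows every link on the listing page; narrow the selector for real sites.
        for href in response.css("a::attr(href)").getall():
            # response.follow resolves relative URLs against the current page.
            yield response.follow(href, callback=self.parse_item)

    def parse_item(self, response):
        yield {
            "title": response.css("h1::text").get(),  # None if no <h1> text matches
            "url": response.url,
        }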

Where Scrapy wins:

  • large-scale crawls
  • crawl politeness (delays, concurrency limits)
  • storing results in a durable pipeline
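
As a rough illustration of the politeness and pipeline points, the sketch below collects a few Scrapy settings in a plain dict. The setting names are Scrapy's own; the dict name `POLITE_SETTINGS`, the output filename, and every value are assumptions to tune per target site.

```python
# Hypothetical starting values; tune them for each target site.
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,               # respect robots.txt rules
    "CONCURRENT_REQUESTS_PER_DOMAIN": 4,  # cap parallel requests per site
    "DOWNLOAD_DELAY": 0.5,                # seconds between requests to a site
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to server latency
    "AUTOTHROTTLE_START_DELAY": 1.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "RETRY_ENABLED": True,
    "RETRY_TIMES": 3,                     # extra attempts per failed request
    # Feed export: append scraped items to a JSON Lines file.
    "FEEDS": {"items.jsonl": {"format": "jsonlines"}},
}
```

One way to apply it is to assign the dict to `custom_settings` on a spider class; the same keys can also live in the project's settings module.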

Where Scrapy struggles:

  • JS-heavy sites (you’ll need a renderer integration such as scrapy-playwright or Splash)
  • flows that require clicking/scrolling/auth in a real browser

Selenium: the browser hammer

Selenium is the “make it work” tool when the page isn’t really a document — it’s an app.

Minimal pattern:

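from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")

    # Wait up to 10 s for JS-rendered links instead of reading the DOM immediately.
    els = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a.title"))
    )
    rows = [{"title": e.text, "url": e.get_attribute("href")} for e in els]
finally:
    # Always release the browser process, even if the wait times out.
    driver.quit()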

Selenium is expensive:

  • browser startup cost
  • page rendering cost
  • bot detection has more to inspect (automated browsers expose signals such as navigator.webdriver)

Use it when you must, and keep your run size reasonable.


The real secret: architecture matters more than tool choice

Most scrapers fail because the system is not cleanly separated.

Design for:

  1. Fetch layer (timeouts, retries, rate limits, optional ProxiesAPI)
  2. Parse layer (selectors → raw fields)
  3. Normalize layer (types, defaults, cleanup)
  4. Export/store layer (JSON/CSV/DB)

If you keep these boundaries, you can migrate:

  • Beautiful Soup → Scrapy
  • Selenium → Playwright
  • direct fetch → ProxiesAPI

…without rewriting the entire project.
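
One possible shape for those boundaries is sketched below. All names here (`Row`, `normalize`, `export_csv`, `run`) are hypothetical, and the fetch and parse layers are passed in as plain callables so that either can be swapped without touching the rest.

```python
import csv
from typing import Callable, Iterable, Optional, TextIO

Row = dict[str, Optional[str]]


def normalize(raw: Row) -> Row:
    """Normalize layer: trim whitespace and turn empty strings into None."""
    cleaned: Row = {}
    for key, value in raw.items():
        value = value.strip() if isinstance(value, str) else value
        cleaned[key] = value or None
    return cleaned


def export_csv(rows: Iterable[Row], out: TextIO, fields: list[str]) -> None:
    """Export layer: write rows as CSV to any text stream."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


def run(
    urls: Iterable[str],
    fetch: Callable[[str], str],        # fetch layer: URL -> HTML
    parse: Callable[[str], list[Row]],  # parse layer: HTML -> raw fields
    out: TextIO,
    fields: list[str],
) -> None:
    """Wire the four layers together; each one can be replaced independently."""
    rows = [normalize(raw) for url in urls for raw in parse(fetch(url))]
    export_csv(rows, out, fields)
```

With this layout, moving from a direct fetch to a proxied one means passing a different `fetch` callable; `parse`, `normalize`, and the export code stay as they are.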


When proxies actually matter

Proxies are not a cheat code.

They matter when:

  • your request volume grows enough that a single IP starts to look like a bot
  • the target throttles by IP
  • you hit geo restrictions
  • you need higher success rate across many URLs

If you’re scraping 10 pages once, solve the basics first:

  • correct selectors
  • timeouts
  • backoff
  • politeness

Then, if you still hit blocks at scale, move reliability into the fetch layer.
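
For the timeouts and backoff basics, a minimal sketch of a retrying fetch wrapper might look like this. The names `fetch_with_backoff` and `send` are hypothetical; `send` is any function that takes a URL and returns the body (for instance, a requests-based fetch with a timeout). The broad `except Exception` is a simplification: real code would usually retry only on transient errors.

```python
import random
import time
from typing import Callable


def fetch_with_backoff(
    url: str,
    send: Callable[[str], str],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send(url); on failure, wait with exponential backoff and retry."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return send(url)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2x base, 4x base, ...
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so many workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists mainly so the wait can be replaced in tests; in normal use the default `time.sleep` applies.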


A simple “choose your tool” checklist

  • Is the content you need present in the raw HTML (for example, in curl output)?
    • Yes → Beautiful Soup or Scrapy
    • No → Selenium/Playwright (rendered)
  • Do you need to discover URLs by following links?
    • Yes → Scrapy
    • No → Beautiful Soup is often enough
  • Are you scraping 10k+ pages?
    • Yes → Scrapy (or you’ll reinvent it badly)
  • Is the site essentially a single-page app?
    • Yes → Selenium/Playwright

If you follow those rules, you’ll be right most of the time.
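
The checklist can also be written down as a small function. This is only one reading of the rules above; the function name and parameters are hypothetical, and the 10,000-page threshold is taken from the "10k+ pages" item.

```python
def choose_tool(
    html_in_raw_response: bool,
    needs_link_discovery: bool,
    page_count: int,
    is_single_page_app: bool = False,
) -> str:
    """Return a tool suggestion based on the checklist rules."""
    # Content missing from the raw response (or an SPA) means a browser is needed.
    if is_single_page_app or not html_in_raw_response:
        return "Selenium/Playwright"
    # Link discovery or 10k+ pages calls for a crawling engine.
    if needs_link_discovery or page_count >= 10_000:
        return "Scrapy"
    # Known URLs, server-rendered HTML, modest volume.
    return "Beautiful Soup"
```

For example, a server-rendered catalogue with 200 known URLs and no link discovery comes out as "Beautiful Soup", while the same site at 50,000 pages comes out as "Scrapy".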
