Beautiful Soup vs Scrapy vs Selenium (2026): Which Python Scraper Should You Use?

If you’re scraping with Python, you’ll hear three names over and over:

  • Beautiful Soup (usually with requests)
  • Scrapy
  • Selenium (or Playwright, in the same “browser automation” category)

They’re not interchangeable.

They solve different problems, and choosing the wrong one can cost you weeks of rework:

  • slow scripts that never finish
  • brittle selectors
  • banned IPs
  • “works on my machine” crawlers that fail in production

This guide is a decision framework — not a religious argument.

When sites get hostile, keep your scraper architecture clean

No framework solves blocking by itself. Keep reliability in your fetch layer (timeouts, retries, optional ProxiesAPI) so you can swap tools without rewriting your extraction logic.


TL;DR decision rules

Pick Beautiful Soup when:

  • the site is mostly server-rendered HTML
  • you’re scraping dozens to hundreds of pages
  • you want full control over parsing and exports

Pick Scrapy when:

  • you’re scraping thousands to millions of pages
  • you need a real crawling engine (queues, dedupe, pipelines)
  • you care about throughput and resilience

Pick Selenium when:

  • the content requires real browser rendering
  • navigation requires clicks, scroll, or authenticated sessions
  • anti-bot measures break simple HTTP fetches

Comparison table (practical)

Tool                       Best for                            Throughput  Reliability  Complexity
Beautiful Soup + requests  simple sites, quick scripts         Medium      Medium       Low
Scrapy                     large crawls, structured pipelines  High        High         Medium
Selenium                   JS-heavy sites, complex flows       Low         Medium       High

Notes:

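  • Scrapy is “fast” because its asynchronous engine (built on Twisted) keeps many requests in flight at once instead of waiting on each response in turn.
  • Selenium is “slow” because every page is loaded, executed, and rendered in a full browser.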

Beautiful Soup: the surgical knife

Beautiful Soup shines when you already know the URLs (or can generate them) and the HTML is present server-side.

Minimal pattern:

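import requests
from bs4 import BeautifulSoup

TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds
session = requests.Session()  # reuse connections across requests

def fetch(url: str) -> str:
    r = session.get(url, timeout=TIMEOUT, headers={"User-Agent": "Mozilla/5.0"})
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return r.text

def parse(html: str) -> list[dict]:
    # "lxml" needs the lxml package installed; "html.parser" is the stdlib fallback
    soup = BeautifulSoup(html, "lxml")
    rows = []
    for card in soup.select(".card"):
        a = card.select_one("a.title")
        # Tolerate cards without a title link rather than crashing
        rows.append({"title": a.get_text(strip=True) if a else None})
    return rows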

Where it breaks down:

  • you need a real queue + dedupe
  • you need to auto-discover new pages (true crawling)
  • the site is JS-rendered so the HTML you fetch is empty
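
A quick way to spot that last case before committing to a tool is to measure how much visible text the raw response actually contains. The sketch below is a rough heuristic using only the standard library; the names `visible_text_length` and `looks_js_rendered` and the 200-character threshold are assumptions made for illustration, not an established rule.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts characters of text that sit outside <script>/<style> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chars += len(data.strip())


def visible_text_length(html: str) -> int:
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.chars


def looks_js_rendered(html: str, min_chars: int = 200) -> bool:
    # Hypothetical threshold: an almost-empty body suggests client-side rendering.
    return visible_text_length(html) < min_chars
```

If `looks_js_rendered(fetch(url))` comes back true for a page that is clearly full of content in your browser, that is a hint the data arrives via JavaScript and a plain HTTP fetch will not be enough.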

Scrapy: the crawling engine

Scrapy is not “a parser.” It’s an engine:

  • concurrent requests
  • built-in retries and throttling hooks
  • request fingerprinting + dedupe
  • pipelines for cleaning/export/storage
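
To make "queue + dedupe" concrete, here is a toy crawl frontier in plain Python. It only illustrates the idea and is not how Scrapy implements it internally; the `Frontier` class and the `canonical` normalisation rule (drop fragments, lowercase scheme and host) are assumptions for this sketch.

```python
from collections import deque
from typing import Optional
from urllib.parse import urldefrag, urlsplit, urlunsplit


def canonical(url: str) -> str:
    """Normalise a URL so trivially different spellings dedupe together."""
    url, _fragment = urldefrag(url)
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


class Frontier:
    """FIFO queue of URLs that never hands out the same URL twice."""

    def __init__(self) -> None:
        self._queue = deque()
        self._seen = set()

    def add(self, url: str) -> bool:
        """Queue the URL; return False if it was already seen."""
        key = canonical(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self._queue.append(key)
        return True

    def pop(self) -> Optional[str]:
        """Return the next URL to fetch, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None
```

A real engine adds persistence, priorities, and per-domain scheduling on top of this, which is the part that is tedious to rebuild by hand.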

Minimal spider:

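import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com/list"]

    def parse(self, response):
        # Follows every link on the list page; narrow the selector for a real site.
        # response.follow resolves relative URLs against the current page.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_item)

    def parse_item(self, response):
        # Yielded dicts are handed to the item pipelines / feed exports
        yield {
            "title": response.css("h1::text").get(),
            "url": response.url,
        }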

Where Scrapy wins:

  • large-scale crawls
  • crawl politeness (delays, concurrency limits)
  • storing results in a durable pipeline
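
The politeness knobs listed above map to ordinary Scrapy settings. The fragment below is a sketch of a per-spider `custom_settings` dict; the setting names are Scrapy's own, but the values are placeholder assumptions to tune per target, and the `POLITE_SETTINGS` name is only for illustration.

```python
# Illustrative values only; tune them for the site you are crawling.
POLITE_SETTINGS = {
    "ROBOTSTXT_OBEY": True,               # respect robots.txt rules
    "DOWNLOAD_DELAY": 1.0,                # seconds between requests to the same site
    "CONCURRENT_REQUESTS_PER_DOMAIN": 4,  # cap parallel requests per domain
    "AUTOTHROTTLE_ENABLED": True,         # adapt the delay to observed latency
    "RETRY_TIMES": 3,                     # retries on top of the first attempt
}

# In a spider, this would be attached as a class attribute:
#
#     class ExampleSpider(scrapy.Spider):
#         name = "example"
#         custom_settings = POLITE_SETTINGS
```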

Where Scrapy struggles:

  • JS-heavy sites (you’ll need a renderer integration such as scrapy-playwright or Splash)
  • flows that require clicking/scrolling/auth in a real browser

Selenium: the browser hammer

Selenium is the “make it work” tool when the page isn’t really a document — it’s an app.

Minimal pattern:

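from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://example.com")

    # Wait for the JS-rendered links instead of reading the DOM too early
    WebDriverWait(driver, 15).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "a.title"))
    )
    els = driver.find_elements(By.CSS_SELECTOR, "a.title")
    rows = [{"title": e.text, "url": e.get_attribute("href")} for e in els]
finally:
    # Always close the browser, even if the scrape fails
    driver.quit()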

Selenium is expensive:

  • browser startup cost
  • page rendering cost
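  • automated browsers are a detection target in their own right (for example, a WebDriver-controlled browser reports navigator.webdriver as true)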

Use it only when you must, and keep the number of browser-rendered pages per run small.


The real secret: architecture matters more than tool choice

Most scrapers fail because the system is not cleanly separated.

Design for:

  1. Fetch layer (timeouts, retries, rate limits, optional ProxiesAPI)
  2. Parse layer (selectors → raw fields)
  3. Normalize layer (types, defaults, cleanup)
  4. Export/store layer (JSON/CSV/DB)

If you keep these boundaries, you can migrate:

  • Beautiful Soup → Scrapy
  • Selenium → Playwright
  • direct fetch → ProxiesAPI

…without rewriting the entire project.
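
One way to picture those boundaries is as four small functions with a narrow interface between them. The sketch below uses only the standard library; every name in it (`Fetcher`, `parse_titles`, `normalize`, `export_csv`, `run`) is hypothetical, and the regex-based parse step merely stands in for whatever selector library you actually use.

```python
import csv
import io
import re
from typing import Callable, Iterable

# Fetch layer: url -> html. The only layer that touches the network.
Fetcher = Callable[[str], str]


def parse_titles(html: str) -> list[dict]:
    """Parse layer: extract raw strings (regex stands in for real selectors)."""
    return [{"title": t} for t in re.findall(r"<h2[^>]*>(.*?)</h2>", html, re.S)]


def normalize(row: dict) -> dict:
    """Normalize layer: collapse whitespace and apply defaults."""
    title = " ".join(row.get("title", "").split())
    return {"title": title or "(untitled)"}


def export_csv(rows: Iterable[dict]) -> str:
    """Export layer: serialise rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def run(urls: Iterable[str], fetch: Fetcher) -> str:
    rows = []
    for url in urls:
        html = fetch(url)  # swap this callable to change transport
        rows.extend(normalize(r) for r in parse_titles(html))
    return export_csv(rows)
```

Because `run` only sees a `fetch` callable, replacing a plain `requests` call with a proxy-backed or browser-backed one changes a single argument, and the parse, normalize, and export code stays as it is.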


When proxies actually matter

Proxies are not a cheat code.

They matter when:

  • your request volume grows to the point where traffic from a single IP looks automated
  • the target throttles by IP
  • you hit geo restrictions
  • you need higher success rate across many URLs

If you’re scraping 10 pages once, solve the basics first:

  • correct selectors
  • timeouts
  • backoff
  • politeness

Then, if you still hit blocks at scale, move reliability into the fetch layer.
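
Backoff, in particular, fits in a few lines. The following is a minimal sketch, assuming a `fetch` callable that raises an exception on failure; the `fetch_with_backoff` name, the retry count, and the delay values are illustrative assumptions, and the `sleep` parameter exists only so the function can be exercised without really waiting.

```python
import random
import time
from typing import Callable


def fetch_with_backoff(
    fetch: Callable[[str], str],
    url: str,
    retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            # Sketch only: real code should catch just transient errors
            # (timeouts, connection resets, 429/5xx responses).
            if attempt == retries:
                raise  # out of retries: surface the last error
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 25% random jitter
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.25))
    raise ValueError("retries must be >= 0")
```

Wrapping the fetch function this way keeps the retry policy in the fetch layer, so the parsing code never needs to know a request was attempted more than once.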


A simple “choose your tool” checklist

  • Is the HTML present in curl output?
    • Yes → Beautiful Soup or Scrapy
    • No → Selenium/Playwright (rendered)
  • Do you need to discover URLs by following links?
    • Yes → Scrapy
    • No → Beautiful Soup is often enough
  • Are you scraping 10k+ pages?
    • Yes → Scrapy (or you’ll reinvent it badly)
  • Is the site essentially a single-page app?
    • Yes → Selenium/Playwright
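
The checklist can also be written down as a small function, which makes the order of the questions explicit. This is just the rules from this article encoded literally; the `choose_tool` name, its parameters, and the use of 10,000 pages as a hard cutoff are assumptions for illustration.

```python
def choose_tool(
    html_in_curl: bool,
    needs_link_discovery: bool = False,
    page_count: int = 0,
    is_spa: bool = False,
) -> str:
    """Return a starting-point recommendation following the checklist."""
    # No HTML in the raw response, or a single-page app: a browser is needed.
    if is_spa or not html_in_curl:
        return "Selenium/Playwright"
    # Link discovery or large volume: use a crawling engine.
    if needs_link_discovery or page_count >= 10_000:
        return "Scrapy"
    # Known URLs, server-rendered HTML, modest volume.
    return "Beautiful Soup"
```

For example, `choose_tool(html_in_curl=True, page_count=300)` returns `"Beautiful Soup"`, while setting `needs_link_discovery=True` on the same site tips the answer to `"Scrapy"`.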

If you follow those rules, you’ll be right most of the time.
