Selenium Web Scraping with Python: Complete Guide

May 17, 2026 · guide · #python, #selenium, #web-scraping, #chromedriver, #headless, #anti-bot, #playwright, #proxies

If you’re searching for selenium web scraping with python, you likely have a site where:

the HTML you get from requests.get(...) is empty or missing key data
the page requires scrolling/clicking to load results
content only appears after JavaScript runs

Selenium can solve those — but it’s also the slowest and most brittle tool in your scraping toolbox. This guide is opinionated: use Selenium when you must, and know when to switch tools.

Stabilize browser scrapes with ProxiesAPI

Selenium gives you a real browser, but at scale you still hit IP-based throttling. ProxiesAPI helps on the network layer (especially your non-browser discovery/fetch jobs) so the overall crawl stays resilient.


Selenium vs alternatives (pick the right tool)

Tool	Best for	Pros	Cons
`requests` + BeautifulSoup	server-rendered HTML	fastest, cheapest, easiest to scale	fails on JS apps
Playwright	modern JS sites	built-in auto-waiting, great debugging	heavier than requests
Selenium	legacy apps + complex UI flows	huge ecosystem, broad compatibility	slow, flakier, more bot detection
Direct API/XHR scraping	data behind JSON calls	fastest + most stable (when allowed)	requires endpoint discovery + sometimes auth

Recommendation:

Start with requests plus an HTML parser such as BeautifulSoup.
If the data you need is missing from that HTML, try Playwright next.
Use Selenium when you specifically need its ecosystem or compatibility.

Setup (Python + Selenium)

python -m venv .venv
source .venv/bin/activate
pip install selenium

Chrome + driver

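Selenium needs a browser and a matching driver. Since Selenium 4.6, the bundled Selenium Manager downloads a matching driver automatically when it cannot find one on your PATH, so pip install selenium is often all you need.

If you manage the driver yourself (pinned versions, offline machines), install a chromedriver whose major version matches your Chrome. A stale chromedriver left on your PATH is still picked up and is a common cause of mismatch errors.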
On macOS (Homebrew):

brew install chromedriver

If the driver is mismatched you’ll see errors like:

session not created: This version of ChromeDriver only supports Chrome version ...

Fix: update Chrome or update chromedriver.
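If you manage the driver yourself, a small preflight check can catch a mismatch before Selenium raises the error above. This is a minimal sketch, assuming both binaries print a dotted four-part version when run with --version. The binary names (google-chrome, chromedriver) are assumptions and vary by platform, so pass your own paths.

```python
import re
import subprocess
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Return the major version from text such as 'ChromeDriver 120.0.6099.109 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def read_major(binary: str) -> Optional[int]:
    """Run `<binary> --version` and parse the major version (None if unavailable)."""
    try:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or hung: treat as "unknown".
        return None
    return major_version(result.stdout)


def versions_match(chrome_binary: str = "google-chrome",
                   driver_binary: str = "chromedriver") -> bool:
    """True only when both major versions could be read and are equal."""
    chrome = read_major(chrome_binary)
    driver = read_major(driver_binary)
    return chrome is not None and chrome == driver
```

Calling versions_match() at the start of a run lets you fail fast with a clear message instead of a stack trace from session creation.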

The core pattern: explicit waits (not sleeps)

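The #1 Selenium mistake is writing time.sleep(5) everywhere. A fixed sleep is either too short (the scrape flakes) or too long (the scrape crawls). Instead, use an explicit wait, which polls until a specific condition holds or a timeout expires.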

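from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    # Poll for up to 20 seconds; raises TimeoutException if the condition never holds.
    wait = WebDriverWait(driver, 20)

    driver.get("https://example.com")

    # Wait until the <h1> exists in the DOM, then read its text.
    h1 = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h1")))
    print(h1.text)
finally:
    # Always close the browser, even if the wait times out.
    driver.quit()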

“Page loaded” isn’t enough

Many sites load in stages (HTML → JS → XHR → render). Waiting only for document.readyState to reach “complete” often gives you “empty” pages, because the data arrives in later XHR calls. Wait for:

an element to exist
a list length to be > 0
text to appear
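The last two conditions in that list can be written as custom wait conditions, because WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when it is time to stop waiting. The helper names below are hypothetical, and the sketch inlines the locator string so the helpers carry no import-time dependency.

```python
from typing import Callable

# Same string value as selenium's By.CSS_SELECTOR, inlined here so these
# helpers can be imported (and unit-tested) without a browser.
CSS = "css selector"


def at_least_n_elements(css: str, n: int = 1) -> Callable:
    """Wait condition: truthy once `css` matches at least `n` elements."""
    def condition(driver):
        found = driver.find_elements(CSS, css)
        # Return the elements themselves so the caller gets them from until().
        return found if len(found) >= n else False
    return condition


def element_contains_text(css: str, needle: str) -> Callable:
    """Wait condition: truthy once some element matching `css` shows `needle`."""
    def condition(driver):
        for element in driver.find_elements(CSS, css):
            if needle in element.text:
                return element
        return False
    return condition
```

With a live driver this would be used as cards = WebDriverWait(driver, 20).until(at_least_n_elements("div.card", 5)). The selector and the count are placeholders for whatever signals "the data is here" on your target page.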

Selectors: CSS first, XPath second

Prefer CSS selectors:

div.card
a[href^="/product/"]
button[data-testid="submit"]

Use XPath when you need structural selection (parent/sibling relationships) that CSS can’t express cleanly.
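Two structural cases where XPath reads more cleanly are "the value next to this label" (sibling) and "the card that contains this link" (ancestor). The sketch below builds both expressions; the dt/dd markup, the card class, and the helper names are hypothetical and stand in for whatever your page uses.

```python
# Same string value as selenium's By.XPATH, inlined to avoid an import-time
# dependency in this sketch.
XPATH = "xpath"


def value_xpath_for_label(label: str) -> str:
    """XPath for the first <dd> that follows a <dt> whose text equals `label`."""
    if "'" in label:
        # Keeps the sketch simple: no escaping of quotes inside the literal.
        raise ValueError("label must not contain a single quote")
    return f"//dt[normalize-space()='{label}']/following-sibling::dd[1]"


def card_xpath_for_link(href_prefix: str) -> str:
    """XPath for the nearest div with class 'card' that wraps a matching link."""
    if "'" in href_prefix:
        raise ValueError("href_prefix must not contain a single quote")
    has_card_class = "contains(concat(' ', normalize-space(@class), ' '), ' card ')"
    return (
        f"//a[starts-with(@href, '{href_prefix}')]"
        f"/ancestor::div[{has_card_class}][1]"
    )


def read_value(driver, label: str) -> str:
    """Read the text of the value cell that sits next to `label`."""
    return driver.find_element(XPATH, value_xpath_for_label(label)).text
```

For example, read_value(driver, "Price") would return the text of the dd that follows a dt reading "Price".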

Headless mode (and why behavior changes)

Headless is great for CI/servers, but it can change rendering and break lazy loading. Always set a viewport size.

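from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless=new")  # Chrome's newer headless mode
opts.add_argument("--window-size=1400,900")  # explicit viewport so layout and lazy loading behave predictably
opts.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=opts)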

If a scrape works headed but fails headless, it’s usually:

missing viewport size
missing waits for content
blocked requests (403/captcha/empty HTML)
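To tell these three causes apart, it helps to capture what the headless browser actually saw at the moment of failure. The sketch below saves the page source and a screenshot and reports the window size; dump_debug and its file layout are hypothetical, while page_source, save_screenshot, and get_window_size are standard WebDriver members.

```python
from pathlib import Path


def dump_debug(driver, out_dir: str, label: str) -> dict:
    """Save HTML and a screenshot for a failed page and return a small summary."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)

    html_path = folder / f"{label}.html"
    png_path = folder / f"{label}.png"

    # The rendered DOM as the browser currently holds it.
    html = driver.page_source
    html_path.write_text(html, encoding="utf-8")

    # A screenshot shows captchas and blank layouts at a glance.
    driver.save_screenshot(str(png_path))

    return {
        "html": str(html_path),
        "screenshot": str(png_path),
        "html_length": len(html),
        "window": driver.get_window_size(),  # confirms the viewport that was applied
    }
```

A tiny html_length points toward a block or an empty shell, an unexpected window size points toward the viewport, and a normal-looking page with missing data points toward a missing wait.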

Anti-bot basics (low-risk, practical)

You can’t “outsmart” every system, but you can avoid obvious mistakes:

pace requests and add jitter
reuse a browser session (don’t relaunch per URL)
detect blocks early (empty content, captcha markers)
stop and back off when failure rates spike

If the site’s terms prohibit scraping, don’t do it (or get explicit permission).
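The pacing, block-detection, and back-off habits listed above can be sketched with the standard library alone. The marker strings, thresholds, and names here are assumptions to tune per site, not values any particular site is known to use.

```python
import random
from collections import deque

# Assumed markers; inspect real block pages from your target and adjust.
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")


def looks_blocked(html: str, min_length: int = 500) -> bool:
    """Heuristic: very short HTML or a known marker suggests a block page."""
    lowered = html.lower()
    return len(html) < min_length or any(marker in lowered for marker in BLOCK_MARKERS)


def polite_delay(base: float = 2.0, jitter: float = 1.5) -> float:
    """Seconds to sleep before the next page: a base pause plus random jitter."""
    return base + random.uniform(0, jitter)


class FailureWindow:
    """Tracks recent outcomes and signals when the failure rate spikes."""

    def __init__(self, size: int = 20, threshold: float = 0.5):
        self.results = deque(maxlen=size)  # keeps only the last `size` outcomes
        self.threshold = threshold

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    def should_back_off(self) -> bool:
        # Wait for a full window so one early failure does not stop the run.
        if len(self.results) < self.results.maxlen:
            return False
        failures = self.results.count(False)
        return failures / len(self.results) >= self.threshold
```

In a crawl loop you would call time.sleep(polite_delay()) between pages, record not looks_blocked(driver.page_source) after each one, and pause or stop the run when should_back_off() returns True.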

Export data cleanly

Browser automation runs fail partway through, so your exports should survive partial runs. Extract rows as dicts and write them to disk frequently.

import csv

rows = [{"name": "Item A", "price": "$10"}, {"name": "Item B", "price": "$12"}]

with open("out.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    w.writeheader()
    w.writerows(rows)
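To write frequently rather than once at the end, one option is to append each batch and emit the header only when the file is new or empty. This is a minimal sketch; append_rows is a hypothetical helper and it assumes every batch uses the same field names.

```python
import csv
import os
from typing import Dict, Iterable, List


def append_rows(path: str, rows: Iterable[Dict[str, str]], fieldnames: List[str]) -> None:
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

Calling append_rows("out.csv", page_rows, ["name", "price"]) after each page means a crash on page 50 still leaves the first 49 pages on disk.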

When Selenium is the wrong tool

Selenium becomes a liability when you need to crawl thousands of URLs or run continuously. Switch away when:

your bottleneck is IP blocks (not rendering)
you can fetch JSON/XHR endpoints directly
you can parse HTML without a browser

A common production pattern is hybrid scraping:

try requests first (fast path)
fall back to Selenium for the minority of pages that truly need rendering
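The fast-path and fallback steps above can be sketched as one small function that takes both fetchers as arguments, which keeps the routing logic independent of any HTTP or browser library. All names here are hypothetical, and has_data stands for whatever check tells you the page already contains what you need.

```python
from typing import Callable, Tuple


def fetch_hybrid(
    url: str,
    fast_fetch: Callable[[str], str],
    render_fetch: Callable[[str], str],
    has_data: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (html, path_used), trying plain HTTP before a real browser."""
    try:
        html = fast_fetch(url)
    except Exception:
        # Any fast-path failure (timeout, HTTP error) falls through to rendering.
        html = ""

    if html and has_data(html):
        return html, "fast"

    # Expensive path: only reached for pages that truly need rendering.
    return render_fetch(url), "rendered"


# Possible wiring (assumes `requests` is installed and `driver` is one
# long-lived Selenium session reused across URLs):
#
#   def fast(url):
#       return requests.get(url, timeout=15).text
#
#   def rendered(url):
#       driver.get(url)
#       return driver.page_source
#
#   html, path = fetch_hybrid(url, fast, rendered, lambda h: 'class="card"' in h)
```

Logging path_used per URL shows what share of pages actually needs the browser, which is the number that decides whether Selenium belongs in the pipeline at all.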

Where ProxiesAPI fits (sensibly)

Selenium doesn’t plug into ProxiesAPI through a single wrapper URL; routing a browser through a proxy needs extra browser proxy configuration. ProxiesAPI still helps in two common architectures:

Discovery via HTTP, rendering only when needed: use ProxiesAPI on your bulk fetch layer (category pages, listing pages, sitemaps), then pass only hard URLs to Selenium.
Hybrid scraping with fallbacks: ProxiesAPI stabilizes your HTTP fetches so your crawler spends less time failing and less time falling back to the expensive browser path.

Keep the system modular (fetch → parse → export → renderer fallback) and Selenium stays a tool — not your whole product.
