Selenium Web Scraping with Python: Complete Guide

If you’re searching for Selenium web scraping with Python, you likely have a site where:

  • the HTML you get from requests.get(...) is empty or missing key data
  • the page requires scrolling/clicking to load results
  • content only appears after JavaScript runs

Selenium can solve all three, but it’s also the slowest and most brittle tool in your scraping toolbox. This guide is opinionated: use Selenium when you must, and know when to switch tools.

Stabilize browser scrapes with ProxiesAPI

Selenium gives you a real browser, but at scale you still hit IP-based throttling. ProxiesAPI helps on the network layer (especially your non-browser discovery/fetch jobs) so the overall crawl stays resilient.


Selenium vs alternatives (pick the right tool)

Tool | Best for | Pros | Cons
requests + BeautifulSoup | server-rendered HTML | fastest, cheapest, easiest to scale | fails on JS apps
Playwright | modern JS sites | reliable waits, auto-waits, great debugging | heavier than requests
Selenium | legacy apps + complex UI flows | huge ecosystem, broad compatibility | slow, flakier, more bot detection
Direct API/XHR scraping | data behind JSON calls | fastest + most stable (when allowed) | requires endpoint discovery + sometimes auth

Recommendation:

  • Start with requests + parse.
  • If the HTML is missing, try Playwright next.
  • Use Selenium when you specifically need its ecosystem or compatibility.
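
One way to apply the first step is to check whether the plain HTTP response already contains the data you want before launching a browser. The following is a minimal sketch using only the standard library; the function name `static_html_has` and the sample markup are illustrative assumptions, not part of any library.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_html_has(html: str, marker: str) -> bool:
    """Return True if `marker` appears in the visible text of `html`."""
    parser = _TextCollector()
    parser.feed(html)
    return marker.lower() in " ".join(parser.chunks).lower()
```

If `static_html_has(requests.get(url, timeout=10).text, "some text you see in the browser")` returns False, the content is probably rendered by JavaScript and a browser (or the underlying XHR endpoint) is worth trying next.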

Setup (Python + Selenium)

python -m venv .venv
source .venv/bin/activate
pip install selenium

Chrome + driver

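Selenium needs a browser and a matching driver.

  • Selenium 4.6 and later ship with Selenium Manager, which can locate or download a matching driver automatically, so a manual install is often unnecessary.
  • If you manage the driver yourself, install a chromedriver whose major version matches your Chrome major version.
  • On macOS (Homebrew):
brew install chromedriver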

If the driver is mismatched you’ll see errors like:

  • session not created: This version of ChromeDriver only supports Chrome version ...

Fix: update Chrome or update chromedriver.
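
When you manage the driver by hand, a quick pre-flight check can catch a mismatch before a scrape starts. This is a rough sketch: the helper names are hypothetical, the version strings in the usage note are illustrative, and the binary names passed to `installed_version` are assumptions that vary by platform.

```python
import re
import subprocess


def parse_major(version_output: str) -> int:
    """Pull the major version out of text like 'ChromeDriver 126.0.6478.62 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version found in: {version_output!r}")
    return int(match.group(1))


def majors_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and chromedriver report the same major version."""
    return parse_major(chrome_output) == parse_major(driver_output)


def installed_version(binary: str) -> str:
    """Run `<binary> --version` and return its output.

    The binary name is an assumption: 'chromedriver' is typical, while the
    Chrome executable name or path differs between Linux, macOS, and Windows.
    """
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `majors_match("Google Chrome 126.0.6478.61", "ChromeDriver 126.0.6478.62")` is True, while a Chrome 127 string against the same driver string is False.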


The core pattern: explicit waits (not sleeps)

The #1 Selenium mistake is writing time.sleep(5) everywhere. Instead, wait for a specific condition.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

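driver = webdriver.Chrome()
wait = WebDriverWait(driver, 20)  # poll for up to 20 seconds

try:
    driver.get("https://example.com")

    # Block until the <h1> is in the DOM, or raise TimeoutException
    h1 = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h1")))
    print(h1.text)
finally:
    driver.quit()  # always close the browser, even if the wait times out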

“Page loaded” isn’t enough

Many sites load in stages (HTML → JS → XHR → render). Waiting for document.readyState alone often gives you “empty” pages. Wait for:

  • an element to exist
  • a list length to be > 0
  • text to appear
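
The "list length > 0" case has no ready-made one-liner, but `WebDriverWait.until` accepts any callable that takes the driver and returns a truthy value when the condition is met. A minimal sketch of such a condition follows; the name `at_least` is hypothetical.

```python
def at_least(locator, minimum=1):
    """Wait condition: succeeds once `minimum` or more elements match `locator`.

    `locator` is a (by, value) tuple such as (By.CSS_SELECTOR, "div.card").
    """

    def _condition(driver):
        elements = driver.find_elements(*locator)
        # WebDriverWait keeps polling while the return value is falsy
        return elements if len(elements) >= minimum else False

    return _condition
```

With the `wait` object from the example above, `cards = wait.until(at_least((By.CSS_SELECTOR, "div.card")))` returns the matched elements once at least one exists. For the "text to appear" case, Selenium's built-in `EC.text_to_be_present_in_element((By.CSS_SELECTOR, "h1"), "Results")` covers it (the selector and text here are placeholders).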

Selectors: CSS first, XPath second

Prefer CSS selectors:

  • div.card
  • a[href^="/product/"]
  • button[data-testid="submit"]

Use XPath when you need structural selection (parent/sibling relationships) that CSS can’t express cleanly.
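
As an illustration of structural selection, the sketch below builds two XPath expressions: one for a value that follows a label, and one that climbs from a link to its enclosing card. The helper names and the markup they assume (`<dt>`/`<dd>` pairs, a `div` with a `card` class) are hypothetical.

```python
def value_after_label_xpath(label: str) -> str:
    """XPath for the first <dd> that follows a <dt> with the given text."""
    if '"' in label:
        raise ValueError("label must not contain double quotes")
    return f'//dt[normalize-space()="{label}"]/following-sibling::dd[1]'


def enclosing_card_xpath(link_text: str) -> str:
    """XPath for the nearest ancestor div.card of a link with the given text.

    Note: contains(@class, "card") also matches class names like "cardholder".
    """
    if '"' in link_text:
        raise ValueError("link_text must not contain double quotes")
    return (
        f'//a[normalize-space()="{link_text}"]'
        '/ancestor::div[contains(@class, "card")][1]'
    )
```

These would be passed to Selenium as `driver.find_element(By.XPATH, value_after_label_xpath("Price")).text`. Neither relationship (sibling or ancestor) can be expressed with the CSS selectors Selenium accepts from a child-first starting point, which is where XPath earns its place.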


Headless mode (and why behavior changes)

Headless is great for CI/servers, but it can change rendering and break lazy loading. Always set a viewport size.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless=new")          # Chrome's newer headless mode
opts.add_argument("--window-size=1400,900")  # explicit viewport size
opts.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=opts)

If a scrape works headed but fails headless, it’s usually:

  • missing viewport size
  • missing waits for content
  • blocked requests (403/captcha/empty HTML)
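
To tell these causes apart, it helps to capture what the headless browser actually saw at the moment of failure. Below is a small sketch using Selenium's `save_screenshot` method and `page_source` property; the function name `dump_debug` and the output layout are assumptions.

```python
from pathlib import Path


def dump_debug(driver, out_dir="debug", name="page"):
    """Save a screenshot and the current DOM so a headless failure can be inspected."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    png = out / f"{name}.png"
    html = out / f"{name}.html"
    driver.save_screenshot(str(png))  # what the page looked like
    html.write_text(driver.page_source, encoding="utf-8")  # what the DOM contained
    return png, html
```

Calling this from the `except` branch around a timed-out wait usually makes the cause obvious: a cramped layout points to viewport size, a half-rendered page to missing waits, and a captcha or error page to blocking.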

Anti-bot basics (low-risk, practical)

You can’t “outsmart” every system, but you can avoid obvious mistakes:

  • pace requests and add jitter
  • reuse a browser session (don’t relaunch per URL)
  • detect blocks early (empty content, captcha markers)
  • stop and back off when failure rates spike
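
The pacing, detection, and back-off points can be sketched with plain Python. Everything here is an assumption to tune per site: the marker strings, the length threshold, and the window size are placeholders rather than known-good values.

```python
import random
from collections import deque

# Assumed marker strings; real block pages vary by site
BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")


def jittered_delay(base=2.0, jitter=1.0):
    """Seconds to sleep before the next page: base plus a random extra."""
    return base + random.uniform(0, jitter)


def looks_blocked(html, min_length=500):
    """Heuristic: very short pages or known marker text suggest a block."""
    lowered = html.lower()
    return len(html) < min_length or any(m in lowered for m in BLOCK_MARKERS)


class FailureWindow:
    """Tracks the failure rate over the most recent `size` requests."""

    def __init__(self, size=20, threshold=0.5):
        self.results = deque(maxlen=size)
        self.threshold = threshold

    def record(self, ok):
        self.results.append(bool(ok))

    def should_back_off(self):
        if len(self.results) < self.results.maxlen:
            return False  # not enough data yet
        failures = self.results.count(False)
        return failures / len(self.results) >= self.threshold
```

In a crawl loop you would call `time.sleep(jittered_delay())` between pages, record `not looks_blocked(driver.page_source)` after each one, and pause or stop the run when `should_back_off()` returns True.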

If the site’s terms prohibit scraping, don’t do it (or get explicit permission).


Export data cleanly

Browser automation fails; your exports should survive partial runs. Extract dicts and write them frequently.

import csv

rows = [{"name": "Item A", "price": "$10"}, {"name": "Item B", "price": "$12"}]

with open("out.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    w.writeheader()
    w.writerows(rows)
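
To write frequently rather than once at the end, one option is to append each batch as it is scraped, so a crash loses at most the current batch. This is a minimal sketch; the helper name `append_rows` is hypothetical.

```python
import csv
from pathlib import Path


def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV, writing the header only if the file is new or empty."""
    target = Path(path)
    write_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `append_rows("out.csv", page_rows, ["name", "price"])` after each page keeps the file valid at every step, at the cost of reopening it per batch.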

When Selenium is the wrong tool

Selenium becomes a liability when you need to crawl thousands of URLs or run continuously. Switch away when:

  • your bottleneck is IP blocks (not rendering)
  • you can fetch JSON/XHR endpoints directly
  • you can parse HTML without a browser

A common production pattern is hybrid scraping:

  • try requests first (fast path)
  • fall back to Selenium for the minority of pages that truly need rendering
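
The hybrid pattern can be sketched as a single function that takes the two fetch strategies as callables. The names `fetch_html`, `fast_fetch`, `render`, and `is_complete` are all hypothetical, and the broad `except` is a simplification you would likely narrow to your HTTP client's exceptions.

```python
def fetch_html(url, fast_fetch, render, is_complete):
    """Try the cheap HTTP path first; render in a browser only if needed.

    Returns (html, source), where source is "http" or "browser".
    """
    try:
        html = fast_fetch(url)
        if is_complete(html):
            return html, "http"
    except Exception:
        # Network error or block on the fast path: fall through to the browser
        pass
    return render(url), "browser"
```

Here `fast_fetch` might wrap `requests.get(url, timeout=10).text`, `render` might call `driver.get(url)` and return `driver.page_source` after an explicit wait, and `is_complete` checks for the element or text you expect. Logging the returned source tells you what fraction of pages really need the browser.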

Where ProxiesAPI fits (sensibly)

Selenium itself doesn’t plug into ProxiesAPI via a single wrapper URL without extra browser proxy configuration — but ProxiesAPI still helps in two common architectures:

  1. Discovery via HTTP, rendering only when needed: use ProxiesAPI on your bulk fetch layer (category pages, listing pages, sitemaps), then pass only hard URLs to Selenium.
  2. Hybrid scraping with fallbacks: ProxiesAPI stabilizes your HTTP fetches so your crawler spends less time failing and less time falling back to the expensive browser path.

Keep the system modular (fetch → parse → export → renderer fallback) and Selenium stays a tool — not your whole product.

