How to Scrape Google Search Results with Python

If you want to scrape Google with Python, the hard part is not writing requests.get(). The hard part is handling all the ways Google SERPs differ by country, query, consent state, result modules, and anti-bot checks.

So the goal is not a parser that works perfectly forever. The goal is a defensive workflow that:

  • fetches result pages with retries
  • detects obvious blocks and interstitials
  • extracts organic titles, URLs, and snippets
  • validates output before you trust it
  • keeps the proxy layer separate from the parser

Important: review Google’s terms for your use case. If you need guaranteed, high-volume SERP data, a dedicated SERP provider is usually a better operational fit than DIY scraping.

SERP scraping gets brittle fast; ProxiesAPI makes the transport cleaner

Google results pages shift often and block aggressively. ProxiesAPI will not solve parser quality for you, but it does make retries and IP rotation much easier once you are testing this at meaningful volume.


What changed in 2026

Google now mixes classic organic results with AI-generated answers and other rich result modules. That means the page can contain:

  • ads
  • AI or answer modules
  • “People also ask”
  • videos
  • local packs
  • traditional organic links

If your script grabs the first a[href] from each block, it will produce junk. The parser has to be selective.
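Here is a minimal sketch of why "first link in the block" goes wrong. The markup below is made up and heavily simplified (it is not real Google HTML): an answer module with a citation link shares a container with an organic result. Pairing the block's first `a[href]` with its first `h3` attaches the organic title to the citation URL. Starting from the `h3` and taking the link that wraps it keeps the pair together.

```python
from bs4 import BeautifulSoup

# Made-up, simplified markup: a citation link and an organic result
# live inside the same outer container.
HTML = """
<div id="search">
  <div class="module">
    <a href="https://cited-source.example/">cited source</a>
    <div class="result">
      <a href="https://organic.example/page"><h3>Organic title</h3></a>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
outer = soup.select_one("div.module")

# Naive: first link and first h3 anywhere in the block -> mismatched pair.
naive = (outer.select_one("a[href]")["href"], outer.select_one("h3").get_text(strip=True))

# Selective: start from the h3 and take the <a> that wraps it.
h3 = outer.select_one("h3")
selective = (h3.find_parent("a")["href"], h3.get_text(strip=True))

print(naive)      # ('https://cited-source.example/', 'Organic title')
print(selective)  # ('https://organic.example/page', 'Organic title')
```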


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We will use:

  • requests for HTTP
  • BeautifulSoup with the lxml parser for HTML parsing
  • the standard-library csv and json modules for export and inspection

Step 1: Fetch pages and detect obvious blocking

import os
import random
import time
from urllib.parse import quote_plus

import requests

TIMEOUT = (10, 30)
MAX_RETRIES = 5
UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/126.0 Safari/537.36"
)

session = requests.Session()
session.headers.update(
    {
        "User-Agent": UA,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
)


def proxiesapi_url(target_url: str) -> str:
    key = os.environ.get("PROXIESAPI_KEY")
    if not key:
        return target_url
    return f"http://api.proxiesapi.com/?auth_key={key}&url={quote_plus(target_url)}"


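def looks_blocked(html: str) -> bool:
    text = (html or "").lower()
    # Text markers that commonly appear on Google's block pages. A bare
    # "captcha" would also match ordinary results for queries about CAPTCHAs,
    # so match the reCAPTCHA widget class instead.
    return any(
        phrase in text
        for phrase in [
            "our systems have detected unusual traffic",
            "/sorry/index",
            "to continue, please verify",
            "g-recaptcha",
        ]
    )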


def fetch(url: str) -> str:
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = session.get(proxiesapi_url(url), timeout=TIMEOUT)
            if response.status_code in (429, 503):
                raise RuntimeError(f"transient status {response.status_code}")
            response.raise_for_status()

            html = response.text or ""
            if looks_blocked(html):
                raise RuntimeError("google block or interstitial detected")

            return html
        except Exception as exc:
            last_error = exc
            if attempt == MAX_RETRIES:
                break
            time.sleep(min(30, 2 ** (attempt - 1)) + random.uniform(0, 0.7))
    raise RuntimeError(f"failed to fetch SERP: {last_error}")

The point of this function is not to brute-force Google. It is to fail clearly when you are blocked, instead of silently parsing garbage.
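The retry delay in fetch() grows as min(30, 2 ** (attempt - 1)) seconds plus up to 0.7 seconds of jitter, and no sleep follows the final attempt. A small helper that reproduces the base schedule makes the worst case easy to check. The name backoff_delays is hypothetical and not part of any library.

```python
def backoff_delays(max_retries: int, cap: int = 30) -> list[int]:
    """Base sleep in seconds (before jitter) after each failed attempt.

    Mirrors fetch(): min(30, 2 ** (attempt - 1)), with no sleep after the last attempt.
    """
    return [min(cap, 2 ** (attempt - 1)) for attempt in range(1, max_retries)]


print(backoff_delays(5))  # [1, 2, 4, 8]
print(backoff_delays(8))  # [1, 2, 4, 8, 16, 30, 30]
```

With MAX_RETRIES = 5, that is 1 + 2 + 4 + 8 = 15 seconds of base sleep, plus at most 2.8 seconds of jitter, before fetch() gives up (request time not included).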


Step 2: Generate a predictable search URL

def google_search_url(query: str, start: int = 0, hl: str = "en", gl: str = "us", num: int = 10) -> str:
    return (
        "https://www.google.com/search?"
        f"q={quote_plus(query)}&start={start}&num={num}&hl={hl}&gl={gl}&pws=0"
    )

These parameters help keep tests more stable:

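  • hl=en sets the interface language
  • gl=us nudges results toward a country
  • pws=0 reduces personalization
  • num requests a page size, but Google may ignore it, so paginate with start rather than relying on num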

They do not make SERPs perfectly deterministic, but they reduce some noise.
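google_search_url() escapes only the query. If hl or gl might come from user input, or you simply want to confirm what is being sent, a variant built on urlencode escapes every value (it uses quote_plus by default) and produces the same string for these inputs. google_search_url_v2 is a hypothetical name for this sketch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def google_search_url_v2(query: str, start: int = 0, hl: str = "en", gl: str = "us", num: int = 10) -> str:
    # Same parameters as google_search_url(), but every value is escaped.
    params = {"q": query, "start": start, "num": num, "hl": hl, "gl": gl, "pws": 0}
    return "https://www.google.com/search?" + urlencode(params)


url = google_search_url_v2("best web scraping tools", start=10)
print(url)
# https://www.google.com/search?q=best+web+scraping+tools&start=10&num=10&hl=en&gl=us&pws=0

# Round-trip the query string to check what Google will receive.
print(parse_qs(urlsplit(url).query)["q"])  # ['best web scraping tools']
```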


Step 3: Parse organic results defensively

from bs4 import BeautifulSoup
from urllib.parse import urlparse


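def is_google_internal(href: str | None) -> bool:
    # Missing links, relative paths, and fragments all point back into Google.
    if not href or href.startswith(("/", "#")):
        return True
    host = (urlparse(href).hostname or "").lower()
    if not host:
        # No hostname (e.g. "javascript:" links) means this is not an external result.
        return True
    # Match exact domains or real subdomains; a bare endswith("google.com")
    # would also drop unrelated sites such as "notgoogle.com".
    return any(
        host == domain or host.endswith("." + domain)
        for domain in ("google.com", "googleusercontent.com")
    )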


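def result_container(title_el):
    # Climb from the h3 to the outermost ancestor that still holds only this
    # one h3, so the snippet lookup cannot reach into a neighboring result.
    container = title_el
    for parent in title_el.parents:
        if parent.name in ("[document]", "body") or len(parent.select("h3")) > 1:
            break
        container = parent
    return container


def parse_serp(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    scope = soup.select_one("div#search") or soup

    rows = []
    seen = set()

    # Start from each h3 title and take the link that wraps it. Looping over
    # every div and grabbing its first a[href] and first h3 can pair a title
    # with a link from a different module, because outer containers match too.
    for title_el in scope.select("h3"):
        link = title_el.find_parent("a", href=True) or title_el.find("a", href=True)
        if link is None:
            continue

        href = link.get("href")
        if is_google_internal(href):
            continue

        title = title_el.get_text(" ", strip=True)
        if not title or href in seen:
            continue

        # Snippet classes seen at the time of writing; expect them to drift.
        block = result_container(title_el)
        snippet_el = block.select_one("div.VwiC3b") or block.select_one("span.aCOpRe")
        snippet = snippet_el.get_text(" ", strip=True) if snippet_el else None

        seen.add(href)
        rows.append({"title": title, "url": href, "snippet": snippet})

    return rows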

The two most important filters are:

  • require an h3
  • ignore internal Google URLs

Those two checks remove a surprising amount of junk.


Step 4: Paginate and export

import csv
import json


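def crawl_query(query: str, pages: int = 2) -> list[dict]:
    all_rows = []
    seen = set()

    for page in range(pages):
        if page:
            # Pause between pages so a multi-page crawl does not burst requests.
            time.sleep(random.uniform(2, 5))

        url = google_search_url(query, start=page * 10)
        html = fetch(url)
        batch = parse_serp(html)
        if not batch:
            # No organic rows: end of results, or the selectors have drifted.
            break

        for row in batch:
            if row["url"] in seen:
                continue
            seen.add(row["url"])
            all_rows.append(row)

    return all_rows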


def write_csv(path: str, rows: list[dict]) -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url", "snippet"])
        writer.writeheader()
        for row in rows:
            writer.writerow(row)


def write_json(path: str, rows: list[dict]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    rows = crawl_query("best web scraping tools", pages=2)
    write_csv("google_serp.csv", rows)
    write_json("google_serp.json", rows)
    print(f"wrote {len(rows)} results")

Always inspect a sample of the output manually before scaling. SERP scraping is one of those jobs where “ran without crashing” does not mean “data is correct.”
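One way to make part of that inspection repeatable is a small validator that flags the failure modes named in this guide: too few rows, non-external URLs, implausible titles, and missing snippets. This is a sketch; validate_rows and its thresholds (5 rows, 3 to 200 title characters, more than half the snippets missing) are assumptions to tune against your own samples, not rules from Google.

```python
from urllib.parse import urlparse


def validate_rows(rows: list[dict], min_rows: int = 5) -> list[str]:
    """Return a list of problems; an empty list means the batch looks plausible.

    Thresholds are illustrative defaults, not rules published by Google.
    """
    problems = []
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} rows (expected at least {min_rows})")

    for i, row in enumerate(rows):
        url = row.get("url") or ""
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if parsed.scheme not in ("http", "https") or not host:
            problems.append(f"row {i}: not an absolute http(s) URL: {url!r}")
        elif host == "google.com" or host.endswith(".google.com"):
            problems.append(f"row {i}: Google-internal URL: {url}")

        title = (row.get("title") or "").strip()
        if not 3 <= len(title) <= 200:
            problems.append(f"row {i}: implausible title length {len(title)}")

    # Many empty snippets at once usually means the snippet selectors drifted.
    missing = sum(1 for row in rows if not row.get("snippet"))
    if rows and missing / len(rows) > 0.5:
        problems.append(f"{missing}/{len(rows)} rows have no snippet")

    return problems


sample = [
    {"title": "Example Domain", "url": "https://example.com/", "snippet": "An example page."},
    {"title": "", "url": "/search?q=x", "snippet": None},
]
for problem in validate_rows(sample, min_rows=1):
    print(problem)
```

Printing validate_rows(rows) in the __main__ block before writing files gives you a quick plausibility check, though it does not replace reading a sample yourself.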


DIY scraping vs using a SERP API

The real choice is operational, not ideological.

Approach | Best for | Main downside
DIY Python scraper | experiments, low-volume research, learning | brittle selectors and blocks
SERP API/provider | production SEO pipelines, scale, geo variation | extra cost

If you only need occasional snapshots, Python is fine. If your business depends on stable SERP data every day, a provider is usually cheaper than babysitting breakages.


Practical advice that saves time

  1. Cache HTML while debugging selectors.
  2. Keep request volume low while testing.
  3. Validate that URLs are external and titles are plausible.
  4. Expect markup drift and write parser fallbacks early.
  5. Treat ProxiesAPI as transport help, not a substitute for clean parsing.
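Tip 1 is the cheapest to implement. A minimal disk cache lets you rerun selectors against saved pages instead of re-requesting Google. This sketch assumes a hypothetical cached_fetch wrapper around any function with fetch()'s signature, and a hypothetical .serp_cache directory.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical cache location; any writable directory works.
CACHE_DIR = Path(".serp_cache")


def cached_fetch(url: str, fetch_fn: Callable[[str], str], cache_dir: Path = CACHE_DIR) -> str:
    """Return cached HTML for url if present; otherwise call fetch_fn and save the result."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so long query strings become safe, fixed-length file names.
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    # fetch_fn is expected to raise on blocks, so only successful pages get written.
    html = fetch_fn(url)
    path.write_text(html, encoding="utf-8")
    return html


# Demo with a stand-in fetcher and a throwaway directory.
calls: list[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "<html>demo</html>"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://www.google.com/search?q=test", fake_fetch, Path(tmp))
    second = cached_fetch("https://www.google.com/search?q=test", fake_fetch, Path(tmp))

print(len(calls))  # 1: the second call was served from disk
```

While debugging, replacing html = fetch(url) in crawl_query() with html = cached_fetch(url, fetch) means each SERP is requested once, no matter how many times you rerun the parser. Delete the cache directory when you want fresh results.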

That is the honest version of Google scraping. It is possible, but it rewards defensive engineering much more than clever one-liners.
