How to Scrape Google Search Results with Python

Jun 20, 2026 · guide · #scrape google, #python, #serp, #web-scraping, #beautifulsoup, #proxiesapi

If you want to scrape Google with Python, the hard part is not writing requests.get(). The hard part is handling all the ways Google SERPs differ by country, query, consent state, result modules, and anti-bot checks.

So the right goal is not a parser that works perfectly forever. The right goal is a defensive workflow that:

fetches result pages with retries
detects obvious blocks and interstitials
extracts organic titles, URLs, and snippets
validates output before you trust it
keeps the proxy layer separate from the parser

Important: review Google’s terms for your use case. If you need guaranteed, high-volume SERP data, a dedicated SERP provider is usually a better operational fit than DIY scraping.

SERP scraping gets brittle fast; ProxiesAPI makes the transport cleaner

Google results pages shift often and block aggressively. ProxiesAPI will not solve parser quality for you, but it does make retries and IP rotation much easier once you are testing this at meaningful volume.


What changed in 2026

Google now mixes classic organic results with more AI-heavy modules and richer answer surfaces. That means the page can contain:

ads
AI or answer modules
“People also ask”
videos
local packs
traditional organic links

If your script grabs the first a[href] from each block, it will produce junk. The parser has to be selective.

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We will use:

requests for HTTP
BeautifulSoup (with the lxml parser) for parsing
CSV/JSON export for inspection

Step 1: Fetch pages and detect obvious blocking

import os
import random
import time
from urllib.parse import quote_plus

import requests

TIMEOUT = (10, 30)
MAX_RETRIES = 5
UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/126.0 Safari/537.36"
)

session = requests.Session()
session.headers.update(
    {
        "User-Agent": UA,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
)


def proxiesapi_url(target_url: str) -> str:
    key = os.environ.get("PROXIESAPI_KEY")
    if not key:
        return target_url
    return f"http://api.proxiesapi.com/?auth_key={key}&url={quote_plus(target_url)}"


def looks_blocked(html: str) -> bool:
    text = (html or "").lower()
    return any(
        phrase in text
        for phrase in [
            "our systems have detected unusual traffic",
            "/sorry/index",
            "to continue, please verify",
            "captcha",
        ]
    )


def fetch(url: str) -> str:
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            response = session.get(proxiesapi_url(url), timeout=TIMEOUT)
            if response.status_code in (429, 503):
                raise RuntimeError(f"transient status {response.status_code}")
            response.raise_for_status()

            html = response.text or ""
            if looks_blocked(html):
                raise RuntimeError("google block or interstitial detected")

            return html
        except Exception as exc:
            last_error = exc
            if attempt == MAX_RETRIES:
                break
            time.sleep(min(30, 2 ** (attempt - 1)) + random.uniform(0, 0.7))
    raise RuntimeError(f"failed to fetch SERP: {last_error}")

The point of this function is not to brute-force Google. It is to fail clearly when you are blocked, instead of silently parsing garbage.
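To make the retry timing concrete, here is a minimal sketch that reproduces the wait schedule used in fetch(). The helper names backoff_delay and base_schedule are illustrative and not part of the script above; the cap of 30 seconds and the jitter of 0.7 seconds are copied from it.

```python
import random


def backoff_delay(attempt: int, cap: float = 30.0, jitter: float = 0.7) -> float:
    """Seconds to wait after a failed attempt (1-based), mirroring fetch()."""
    base = min(cap, 2 ** (attempt - 1))
    return base + random.uniform(0, jitter)


def base_schedule(max_retries: int, cap: float = 30.0) -> list:
    """Waits without jitter. No sleep follows the final attempt,
    so there are max_retries - 1 entries."""
    return [min(cap, 2 ** (attempt - 1)) for attempt in range(1, max_retries)]


waits = base_schedule(5)
print(waits, sum(waits))
```

With MAX_RETRIES = 5 the base waits are 1, 2, 4, and 8 seconds, so a fully failed fetch spends about 15 seconds sleeping plus jitter and request time. The 30-second cap only starts to matter from the sixth attempt onward.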

Step 2: Generate a predictable search URL

def google_search_url(query: str, start: int = 0, hl: str = "en", gl: str = "us", num: int = 10) -> str:
    return (
        "https://www.google.com/search?"
        f"q={quote_plus(query)}&start={start}&num={num}&hl={hl}&gl={gl}&pws=0"
    )

These parameters help keep tests more stable:

hl=en sets interface language
gl=us nudges geography
pws=0 reduces personalization

They do not make SERPs perfectly deterministic, but they reduce some noise.
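As a quick sanity check on the URL shape, the sketch below builds the same query string with urllib.parse.urlencode and decodes it again with parse_qs. The name build_search_url is hypothetical; it is an alternative spelling of google_search_url for illustration, not a replacement for it.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_search_url(query, start=0, hl="en", gl="us", num=10):
    # urlencode applies quote_plus to every value, so spaces become "+"
    # and characters such as "&" or "+" in the query are escaped.
    params = {"q": query, "start": start, "num": num, "hl": hl, "gl": gl, "pws": 0}
    return "https://www.google.com/search?" + urlencode(params)


url = build_search_url("best web scraping tools", start=10)
print(url)

# Round-trip: decoding the query string should give back the original values.
decoded = parse_qs(urlparse(url).query)
print(decoded["q"][0], decoded["start"][0])
```

Decoding the URL you are about to request is a cheap way to catch double-encoding or a mistyped parameter before you spend requests on it.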

Step 3: Parse organic results defensively

from bs4 import BeautifulSoup
from urllib.parse import urlparse


def is_google_internal(href: str | None) -> bool:
    if not href:
        return True
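    parsed = urlparse(href)
    # Relative links, "#" anchors, and javascript: links are never organic results.
    if parsed.scheme not in ("http", "https"):
        return True
    host = (parsed.hostname or "").lower()
    # Match the domain itself or a true subdomain. A bare endswith("google.com")
    # would also discard unrelated hosts such as "notgoogle.com".
    google_domains = ("google.com", "googleusercontent.com")
    return any(host == d or host.endswith("." + d) for d in google_domains)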


def parse_serp(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    scope = soup.select_one("div#search") or soup

    rows = []
    seen = set()

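    # Anchor on each h3 that sits inside a link, so a title is always paired
    # with its own URL rather than with the first link of some outer container.
    for title_el in scope.select("a[href] h3"):
        link = title_el.find_parent("a", href=True)
        if link is None:
            continue

        href = link.get("href")
        if is_google_internal(href):
            continue

        title = title_el.get_text(" ", strip=True)

        # Walk up from the link to the nearest ancestor that holds a snippet.
        # Stop once an ancestor spans more than one result (more than one h3),
        # otherwise the snippet could belong to a neighbouring result.
        snippet = None
        for parent in link.parents:
            if parent is scope or len(parent.select("h3")) > 1:
                break
            snippet_el = parent.select_one("div.VwiC3b") or parent.select_one("span.aCOpRe")
            if snippet_el:
                snippet = snippet_el.get_text(" ", strip=True)
                break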

        if not title or href in seen:
            continue

        seen.add(href)
        rows.append(
            {
                "title": title,
                "url": href,
                "snippet": snippet,
            }
        )

    return rows

The two most important filters are:

require an h3
ignore internal Google URLs

Those two checks remove a surprising amount of junk.
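One caveat with the internal-URL filter: is_google_internal discards every href that starts with "/". Some Google responses, typically the basic HTML served to clients without JavaScript, wrap organic links as /url?q=<destination>&..., and those would be dropped along with the junk. Whether you receive that markup depends on the response you get, so treat this as an assumption to verify against your own saved HTML. A minimal sketch of a helper (the name unwrap_google_redirect is hypothetical) that you could apply to an href before filtering:

```python
from urllib.parse import parse_qs, urlparse


def unwrap_google_redirect(href):
    """Return the destination of a /url?q=... redirect link.

    Any other href is returned unchanged. A redirect link with no usable
    destination returns None.
    """
    if not href:
        return None
    parsed = urlparse(href)
    host = (parsed.hostname or "").lower()
    on_google = host in ("", "google.com") or host.endswith(".google.com")
    if parsed.path == "/url" and on_google:
        params = parse_qs(parsed.query)  # parse_qs also percent-decodes values
        for key in ("q", "url"):
            values = params.get(key)
            if values and values[0].startswith(("http://", "https://")):
                return values[0]
        return None
    return href


print(unwrap_google_redirect("/url?q=https://example.com/page&sa=U&ved=abc"))
print(unwrap_google_redirect("https://example.com/"))
```

In parse_serp this would run right after reading the href, so that the unwrapped destination is what gets filtered, deduplicated, and exported.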

Step 4: Paginate and export

import csv
import json


def crawl_query(query: str, pages: int = 2) -> list[dict]:
    all_rows = []
    seen = set()

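    for page in range(pages):
        url = google_search_url(query, start=page * 10)
        html = fetch(url)
        batch = parse_serp(html)

        if not batch:
            # Nothing parsed: either the results ran out or the layout is one
            # the parser does not recognize. Stop instead of requesting more.
            break

        for row in batch:
            if row["url"] in seen:
                continue
            seen.add(row["url"])
            all_rows.append(row)

        if page < pages - 1:
            # Pause between pages; time and random are imported in Step 1.
            time.sleep(random.uniform(2, 5))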

    return all_rows


def write_csv(path: str, rows: list[dict]) -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "url", "snippet"])
        writer.writeheader()
        writer.writerows(rows)


def write_json(path: str, rows: list[dict]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    rows = crawl_query("best web scraping tools", pages=2)
    write_csv("google_serp.csv", rows)
    write_json("google_serp.json", rows)
    print(f"wrote {len(rows)} results")

Always inspect a sample of the output manually before scaling. SERP scraping is one of those jobs where “ran without crashing” does not mean “data is correct.”
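Manual inspection can be backed by a few automated checks, which is the "validate output before you trust it" step from the workflow at the top. The sketch below is one possible shape for it. The function name validate_rows and the thresholds (at least 5 results, at most half of the rows without a snippet) are assumptions for illustration; tune them to the queries you run.

```python
from urllib.parse import urlparse


def validate_rows(rows, min_results=5, max_missing_snippets=0.5):
    """Return a list of human-readable problems; an empty list means the
    batch passed these basic checks."""
    problems = []

    if len(rows) < min_results:
        problems.append(f"only {len(rows)} results, expected at least {min_results}")

    urls = [row.get("url") or "" for row in rows]
    bad_urls = [
        u for u in urls
        if urlparse(u).scheme not in ("http", "https") or not urlparse(u).netloc
    ]
    if bad_urls:
        problems.append(f"{len(bad_urls)} rows have a non-absolute URL")

    if len(set(urls)) != len(urls):
        problems.append("duplicate URLs present")

    empty_titles = sum(1 for row in rows if not (row.get("title") or "").strip())
    if empty_titles:
        problems.append(f"{empty_titles} rows have an empty title")

    if rows:
        missing = sum(1 for row in rows if not row.get("snippet")) / len(rows)
        if missing > max_missing_snippets:
            problems.append(f"{missing:.0%} of rows have no snippet")

    return problems


sample = [
    {"title": f"Result {i}", "url": f"https://example.com/{i}", "snippet": "text"}
    for i in range(6)
]
print(validate_rows(sample))
```

A sudden jump in missing snippets or a drop in result count is often the first visible sign that the snippet selectors have gone stale or that a consent page slipped past looks_blocked.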

DIY scraping vs using a SERP API

The real choice is operational, not ideological.

Approach	Best for	Main downside
DIY Python scraper	experiments, low-volume research, learning	brittle selectors and blocks
SERP API/provider	production SEO pipelines, scale, geo variation	extra cost

If you only need occasional snapshots, Python is fine. If your business depends on stable SERP data every day, a provider is usually cheaper than babysitting breakages.