How to Scrape Google Flights Prices with Python (Routes, Dates, and Price Quotes)

Google Flights is one of the best “real world” scraping targets because it’s a high-value dataset (prices change constantly) and it’s also a site that will punish sloppy scrapers.

In this tutorial we’ll build a production-minded Python scraper that:

  • captures a shareable Google Flights results URL (you choose the route + dates)
  • fetches HTML safely (timeouts, retries, and a session)
  • parses flight result cards into structured data (airline, times, duration, stops, price)
  • exports JSON you can use for alerts, dashboards, or analysis
  • shows where ProxiesAPI fits when you scale beyond “a few manual checks”

We’ll also include a screenshot of the page we’re scraping so you can visually match selectors.

[Screenshot: Google Flights results page (example route/date)]

Keep Google Flights requests stable with ProxiesAPI

Google surfaces anti-bot defenses quickly when you scale beyond a handful of requests. ProxiesAPI gives you a clean proxy layer (rotation + reputation) so your scraper can keep running without burning a single IP.


Important note (what we are and aren’t doing)

Google Flights is heavily dynamic and personalized. There are many ways to “scrape Google Flights”, and some are brittle or cross lines you might not want to cross.

This guide focuses on a pragmatic, ethical approach:

  • You generate a results page (route + dates) in your browser.
  • You use a share URL that loads a results page.
  • We fetch the HTML and extract the visible quote cards.

If you need deep automation (searching thousands of date combinations), treat this as the baseline and then add:

  • caching
  • queueing + backoff
  • incremental refresh
  • stronger defenses against browser fingerprinting (often via a real browser)
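
To make the caching item concrete, here is a minimal sketch of an in-memory cache with a time-to-live. The names TTLCache and cached_fetch and the 15-minute default are assumptions for illustration, not part of the scraper built below. A real deployment would more likely persist to disk or a database.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 900.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._store[key]  # stale: drop it so the caller re-fetches
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock(), value)


def cached_fetch(url: str, fetch: Callable[[str], str], cache: TTLCache) -> str:
    """Return cached HTML for url if it is still fresh, otherwise call fetch(url)."""
    html = cache.get(url)
    if html is None:
        html = fetch(url)
        cache.set(url, html)
    return html
```

Any callable that takes a URL and returns HTML can be passed as fetch, so repeated checks of the same URL inside the TTL window cost no extra requests.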

What we’re scraping (page anatomy)

When you open Google Flights results, you’ll typically see a list of options. Each option contains:

  • a price (e.g. “₹24,531”)
  • departure/arrival times
  • airline(s)
  • duration and stops

The HTML structure changes. So instead of hardcoding one brittle selector, we’ll:

  1. Locate result “cards” by looking for repeating blocks that contain a price
  2. Extract fields using relative selectors within each card
  3. Keep the parser tolerant of missing fields

This is the same approach you’ll use on most complex sites: identify a repeated item container, then parse inside it.
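
Here is a toy illustration of that "find the price, climb to the container, parse inside" idea. It is only a sketch: the markup is invented (not real Google Flights HTML), and it uses the standard library's xml.etree instead of BeautifulSoup so the example stays self-contained. The function name find_cards and the levels_up parameter are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Toy, well-formed markup standing in for a results list (not real Google Flights HTML).
SAMPLE = """
<ul>
  <li><div><span>IndiGo</span><span>6:05 AM</span></div><div><span>$120</span></div></li>
  <li><div><span>Vistara</span><span>9:40 PM</span></div><div><span>$185</span></div></li>
</ul>
"""

PRICE_RE = re.compile(r"[₹$€£]\s?\d[\d,.]*")


def find_cards(root: ET.Element, levels_up: int = 2) -> list:
    """Find elements whose own text looks like a price, then climb to an ancestor."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    cards = []
    for el in root.iter():
        if el.text and PRICE_RE.search(el.text):
            node = el
            for _ in range(levels_up):
                node = parent_of.get(node, node)  # stay put once we reach the root
            if node not in cards:
                cards.append(node)
    return cards


cards = find_cards(ET.fromstring(SAMPLE))
# Parse "inside" each card: here we simply join its visible text fragments.
rows = [" | ".join(t.strip() for t in card.itertext() if t.strip()) for card in cards]
```

How many levels to climb depends on the page. On real markup you would inspect one card and adjust it.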


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml tenacity

We’ll use:

  • requests for HTTP
  • BeautifulSoup (with the lxml parser) for parsing
  • tenacity for retries with backoff

Step 1: Get a Google Flights share URL

  1. Go to https://www.google.com/travel/flights
  2. Enter your origin, destination, and dates
  3. Apply any filters you care about (e.g. “1 stop or fewer”)
  4. Copy the URL from the address bar

Tip: If the URL is extremely long, that’s fine. We’ll store it in a config file.

Create config.py:

# config.py
FLIGHTS_URL = "PASTE_YOUR_GOOGLE_FLIGHTS_RESULTS_URL_HERE"
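
If you would rather not keep the URL in a file, one option is to read it from the environment and sanity-check it before fetching. The sketch below assumes an environment variable named FLIGHTS_URL and a helper called load_flights_url, both hypothetical. The path check is based only on the URL shown in step 1, so loosen it if your share URL looks different.

```python
import os
from urllib.parse import urlparse


def load_flights_url(env=os.environ, default: str = "") -> str:
    """Read the results URL from FLIGHTS_URL (hypothetical env var) and sanity-check it."""
    url = (env.get("FLIGHTS_URL") or default).strip()
    parts = urlparse(url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("FLIGHTS_URL must be a full https:// URL")
    # Assumption: results URLs live under /travel/flights, as in step 1 above.
    if not parts.path.startswith("/travel/flights"):
        raise ValueError("FLIGHTS_URL does not look like a Google Flights results URL")
    return url
```

This catches the most common mistake early: running the scraper with the placeholder string still in place.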

Step 2: Fetch HTML reliably (timeouts + retries)

Google will sometimes return:

  • an interstitial
  • an error page
  • truncated HTML

So we want:

  • connect/read timeouts
  • retries with exponential backoff
  • a stable session (cookies)

import random
import time
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

TIMEOUT = (10, 30)  # connect, read

BASE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
}

session = requests.Session()


def polite_sleep(min_s=0.7, max_s=1.6):
    time.sleep(random.uniform(min_s, max_s))


@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=20))
def fetch_html(url: str) -> str:
    r = session.get(url, headers=BASE_HEADERS, timeout=TIMEOUT, allow_redirects=True)
    r.raise_for_status()

    text = r.text

    # lightweight sanity checks
    if "captcha" in text.lower() or "unusual traffic" in text.lower():
        raise RuntimeError("Blocked (captcha/unusual traffic)")

    if len(text) < 50_000:
        # Results pages are typically much larger; small HTML often means interstitial.
        # Note: len(text) counts characters of the decoded body, not bytes.
        raise RuntimeError(f"Suspiciously small HTML: {len(text)} characters")

    return text
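
The sanity checks are easier to tune if they live in a small pure function you can test without touching the network. A minimal sketch, assuming a hypothetical helper named classify_html. The marker strings and the 50,000-character threshold are copied from fetch_html above.

```python
BLOCK_MARKERS = ("captcha", "unusual traffic")
MIN_EXPECTED_CHARS = 50_000  # same threshold as fetch_html; tune for your pages


def classify_html(text: str) -> str:
    """Return 'blocked', 'too_small', or 'ok' for a fetched page body."""
    lowered = text.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "blocked"
    if len(text) < MIN_EXPECTED_CHARS:
        return "too_small"
    return "ok"
```

fetch_html could then raise on anything other than "ok", and you can feed saved HTML snapshots through classify_html to see how the thresholds behave.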

Step 3: Parse results cards into structured quotes

Instead of assuming exact class names, we’ll:

  • extract all price-like strings
  • walk upward to find a container node
  • then parse times/airlines/duration inside that container

This is not perfect, but it’s surprisingly effective when the page is mostly server-rendered.

import re
from bs4 import BeautifulSoup

PRICE_RE = re.compile(r"(₹|\$|€|£)\s?\d[\d,\.]*")
# Non-capturing groups so findall() returns the full time string, not just "AM"/"PM".
TIME_RE = re.compile(r"\b\d{1,2}:\d{2}(?:\s?(?:AM|PM))?\b", re.IGNORECASE)


def clean_text(s: str) -> str:
    return re.sub(r"\s+", " ", (s or "").strip())


def find_price_nodes(soup: BeautifulSoup):
    # any element whose text looks like a price
    out = []
    for el in soup.find_all(string=True):  # string= replaces the deprecated text= argument
        t = str(el)
        if PRICE_RE.search(t):
            out.append(el.parent)
    return out


def parse_quote_from_container(container) -> dict:
    text = clean_text(container.get_text(" ", strip=True))

    # price
    m = PRICE_RE.search(text)
    price = m.group(0) if m else None

    # times (often two per result)
    times = TIME_RE.findall(text)

    # heuristic fields
    airline = None
    duration = None
    stops = None

    # try to capture common tokens; accepts "2h 30m" as well as "2 hr 30 min" and "45 min"
    dur_m = re.search(
        r"\b(\d+\s?h(?:rs?|ours?)?(?:\s?\d+\s?m(?:ins?)?)?|\d+\s?m(?:ins?)?)\b",
        text,
        re.IGNORECASE,
    )
    if dur_m:
        duration = dur_m.group(1)

    stops_m = re.search(r"\b(nonstop|\d+\s?stop(s)?)\b", text, re.IGNORECASE)
    if stops_m:
        stops = stops_m.group(1)

    # airline guess: take first capitalized word sequence before duration/stops/price
    # (keeps this tolerant; you can refine once you inspect your target HTML)
    airline_m = re.search(r"\b([A-Z][A-Za-z&\-\.]+(?:\s+[A-Z][A-Za-z&\-\.]+){0,3})\b", text)
    if airline_m:
        airline = airline_m.group(1)

    return {
        "price": price,
        "times": times,
        "duration": duration,
        "stops": stops,
        "airline_guess": airline,
        "raw": text[:500],
    }


def parse_google_flights(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    # find many candidate price nodes, then dedupe by container identity
    price_nodes = find_price_nodes(soup)

    quotes = []
    seen = set()

    for node in price_nodes:
        # climb up a few levels to get a stable “card”-ish container
        container = node
        for _ in range(5):
            if container.parent:
                container = container.parent

        key = id(container)
        if key in seen:
            continue
        seen.add(key)

        q = parse_quote_from_container(container)
        if q.get("price"):
            quotes.append(q)

    # light cleanup: keep only the best-looking quotes
    # (cards that contain at least one time token)
    quotes = [q for q in quotes if len(q.get("times") or []) >= 1]

    return quotes

This parser is intentionally tolerant. After your first run, inspect the raw field of each quote and tighten the selectors for your route.
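
For alerts and analysis you will usually want the price as a number rather than a display string. A minimal sketch, assuming a hypothetical helper normalize_price. It assumes whole-unit prices with "," as the thousands separator, and the symbol-to-currency mapping is a guess ("$" in particular is not always USD).

```python
import re
from typing import Optional

# Assumption: one currency per symbol. "$" may mean CAD, AUD, etc. depending on locale.
CURRENCY_SYMBOLS = {"₹": "INR", "$": "USD", "€": "EUR", "£": "GBP"}


def normalize_price(raw: str) -> Optional[tuple[str, int]]:
    """Turn a display price like '₹24,531' into ('INR', 24531), or None if unparseable.

    Any decimal part is ignored; only the leading whole-unit digits are kept.
    """
    m = re.match(r"\s*([₹$€£])\s?(\d[\d,]*)", raw or "")
    if not m:
        return None
    symbol, digits = m.groups()
    return CURRENCY_SYMBOLS[symbol], int(digits.replace(",", ""))
```

You could call this on the price field of each quote before exporting, and keep the original string alongside it for debugging.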


Step 4: Put it together (fetch → parse → export)

import json
from config import FLIGHTS_URL


def main():
    html = fetch_html(FLIGHTS_URL)
    polite_sleep()

    quotes = parse_google_flights(html)

    out = {
        "url": FLIGHTS_URL,
        "count": len(quotes),
        "quotes": quotes[:50],
    }

    with open("google_flights_quotes.json", "w", encoding="utf-8") as f:
        json.dump(out, f, ensure_ascii=False, indent=2)

    print("quotes:", len(quotes))
    if quotes:
        print("example:", quotes[0])


if __name__ == "__main__":
    main()

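Save the code from Steps 2 to 4 in a single file named scrape_google_flights.py (next to config.py), then run: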

python scrape_google_flights.py

Where ProxiesAPI fits (honestly)

If you run this once or twice from your laptop, you may be fine.

But price monitoring is rarely “one request”. You typically want to:

  • poll multiple routes
  • poll multiple dates
  • refresh daily/hourly

That’s where blocks and throttling appear.

With ProxiesAPI, you route your requests through a stable proxy layer and rotate IPs so:

  • bursts don’t come from one IP
  • retries don’t look like a tight bot loop
  • you avoid burning a single home/office IP

Minimal integration pattern

You can integrate ProxiesAPI at the network layer by sending requests via a proxy URL.

PROXIES = {
    "http": "http://YOUR_PROXIESAPI_PROXY",
    "https": "http://YOUR_PROXIESAPI_PROXY",
}

r = session.get(FLIGHTS_URL, headers=BASE_HEADERS, proxies=PROXIES, timeout=TIMEOUT)

(Use the proxy endpoint and auth details from your ProxiesAPI dashboard. Keep them in env vars, not hardcoded.)
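
One way to follow that advice is to build the proxies dict from an environment variable. This is a sketch only: the variable name PROXY_URL and the helper proxies_from_env are assumptions, and the actual endpoint format comes from your ProxiesAPI dashboard.

```python
import os
from typing import Mapping, Optional


def proxies_from_env(env: Mapping[str, str] = os.environ) -> Optional[dict[str, str]]:
    """Build a requests-style proxies dict from PROXY_URL (hypothetical variable name)."""
    proxy_url = (env.get("PROXY_URL") or "").strip()
    if not proxy_url:
        return None  # passing proxies=None to requests means "no explicit proxy"
    return {"http": proxy_url, "https": proxy_url}
```

You would then pass proxies=proxies_from_env() to session.get(...), so the same script runs with or without a proxy depending on the environment.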


Practical tips to avoid getting blocked

  • Use a session (requests.Session()) so cookies persist.
  • Add jitter (random delays) between runs.
  • Cache results so you don’t re-fetch the same URL too often.
  • Fail fast on interstitials/captcha pages and back off.
  • If HTML is inconsistent, switch to a browser-based fetch for the initial capture.
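
To illustrate the jitter and back-off tips together, here is a small sketch of exponential backoff with "full jitter". Note that this is a different schedule from the one tenacity applies in fetch_html. The function name backoff_delays and its defaults are assumptions.

```python
import random


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 20.0, rng=None) -> list[float]:
    """Delays in seconds for successive retries: exponential growth, capped, with full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... up to cap
        delays.append(rng.uniform(0, ceiling))  # full jitter: anywhere in [0, ceiling]
    return delays
```

Randomizing the whole interval, rather than adding a small offset, keeps several workers from retrying in lockstep after a shared failure.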

QA checklist

  • You can open your FLIGHTS_URL in a normal browser and see results
  • fetch_html() returns large HTML (not an interstitial)
  • Parser returns at least 5–20 quotes for a busy route
  • Export JSON is valid and contains price + some time tokens
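
The last two checks can be automated. A minimal sketch, assuming the export shape written in Step 4 (a top-level "quotes" list whose items have "price" and "times"). The helper names check_export and check_export_file are hypothetical.

```python
import json


def check_export(payload: dict, min_quotes: int = 5) -> list[str]:
    """Return human-readable problems; an empty list means the export looks sane."""
    problems = []
    quotes = payload.get("quotes") or []
    if len(quotes) < min_quotes:
        problems.append(f"only {len(quotes)} quotes (expected at least {min_quotes})")
    for i, q in enumerate(quotes):
        if not q.get("price"):
            problems.append(f"quote {i} has no price")
        if not q.get("times"):
            problems.append(f"quote {i} has no time tokens")
    return problems


def check_export_file(path: str = "google_flights_quotes.json") -> list[str]:
    """Load the exported file and run the same checks on it."""
    with open(path, encoding="utf-8") as f:
        return check_export(json.load(f))  # json.load raises if the file is not valid JSON
```

Running check_export_file() after each scrape gives you a quick signal when the page structure changes and the parser starts returning empty fields.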

Next upgrades

  • parse fields with stronger selectors after inspecting HTML for your route
  • store results in SQLite (dedupe by itinerary)
  • add alert rules (e.g. notify when price drops below threshold)
  • use Playwright for a “rendered HTML snapshot” when server HTML is insufficient
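
As a starting point for the SQLite and alert items, here is a rough sketch using the standard library's sqlite3. The table layout, the itinerary_key format, and the function names are all assumptions. It expects a numeric price, so you would convert the display string first.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS quotes (
    itinerary_key TEXT NOT NULL,
    price         INTEGER NOT NULL,
    fetched_at    TEXT NOT NULL,
    UNIQUE (itinerary_key, fetched_at)
)
"""


def itinerary_key(quote: dict) -> str:
    """Hypothetical dedupe key: airline guess + time tokens + stops."""
    times = ",".join(quote.get("times") or [])
    return f"{quote.get('airline_guess')}|{times}|{quote.get('stops')}"


def save_quote(conn: sqlite3.Connection, quote: dict, price: int, fetched_at: str) -> None:
    """Insert one observation; the UNIQUE constraint silently drops exact repeats."""
    conn.execute(
        "INSERT OR IGNORE INTO quotes (itinerary_key, price, fetched_at) VALUES (?, ?, ?)",
        (itinerary_key(quote), price, fetched_at),
    )


def below_threshold(conn: sqlite3.Connection, threshold: int) -> list[tuple[str, int]]:
    """Itineraries whose lowest recorded price is under the threshold, cheapest first."""
    rows = conn.execute(
        "SELECT itinerary_key, MIN(price) FROM quotes "
        "GROUP BY itinerary_key HAVING MIN(price) < ? ORDER BY MIN(price)",
        (threshold,),
    )
    return [(key, price) for key, price in rows]
```

Open the database with sqlite3.connect("quotes.db"), run conn.execute(SCHEMA) once, and call conn.commit() after each batch of save_quote calls. Anything returned by below_threshold is a candidate for a notification.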