Scrape Flight Prices from Google Flights (Python + ProxiesAPI)

Google Flights is one of the best places to see real-world airfare pricing — and one of the harder places to scrape reliably.

Why it’s harder:

  • it’s heavily JavaScript-driven (you won’t get useful HTML with plain requests)
  • the UI changes and can A/B test selectors
  • it’s sensitive to aggressive crawling (you’ll hit blocks faster than “friendly HTML” sites)

In this tutorial we’ll build a practical Google Flights scraper that:

  • opens a route (origin → destination)
  • selects departure date (optionally return)
  • extracts the visible price cards (airline, times, stops, price)
  • retries safely
  • exports to CSV/JSON
  • shows where ProxiesAPI fits in (without pretending it magically bypasses everything)

We’ll use Playwright + Python to render the page and scrape from the DOM.

Google Flights results page (we’ll scrape the flight result cards)

Keep flight price crawls stable with ProxiesAPI

Google Flights is a dynamic, high-change target. ProxiesAPI helps you spread load, rotate IPs, and keep your crawling layer reliable as you scale routes, dates, and refresh frequency.


Important note (read this first)

Google’s properties can enforce bot detection and rate limits. Scraping may violate a site’s Terms of Service depending on your use case.

This guide is for educational purposes and for building robust scraping engineering skills (timeouts, retries, data models, pacing). If you need guaranteed access for a production use-case, consider official/partner APIs.


What we’re scraping

We’ll target a Google Flights results URL pattern that opens a route + date.

In practice, Google Flights URLs can look like:

  • https://www.google.com/travel/flights?hl=en#flt=DEL.BOM.2026-05-10;c:INR;e:1;sd:1;t:f

The exact hash structure can vary, but you don’t need to fully reverse-engineer it if you automate the UI steps (fill origin/destination/date) and then scrape cards.

Data we’ll extract per flight

  • airline name
  • departure time
  • arrival time
  • number of stops
  • duration
  • price (currency + amount)

Setup

python -m venv .venv
source .venv/bin/activate
pip install playwright pandas
playwright install

Why Playwright:

  • it renders the JS app reliably
  • it gives you stable waiting primitives
  • it can take screenshots (mandatory for A-track)

Step 1: Launch a browser and open Google Flights

Create scrape_google_flights.py:

from __future__ import annotations

import csv
import json
import re
import time
from dataclasses import asdict, dataclass
from typing import Optional

from playwright.sync_api import sync_playwright, TimeoutError as PWTimeout


@dataclass
class FlightResult:
    airline: Optional[str]
    depart_time: Optional[str]
    arrive_time: Optional[str]
    stops: Optional[str]
    duration: Optional[str]
    price: Optional[str]


def norm_space(s: str) -> str:
    return re.sub(r"\s+", " ", (s or "").strip())


def safe_text(el) -> Optional[str]:
    if not el:
        return None
    try:
        return norm_space(el.inner_text())
    except Exception:
        return None


def goto_with_retries(page, url: str, tries: int = 3, base_sleep: float = 2.0) -> None:
    last = None
    for i in range(1, tries + 1):
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=60_000)
            return
        except Exception as e:
            last = e
            time.sleep(base_sleep * i)
    raise last


def main():
    # Example route/date; you can parameterize these.
    url = "https://www.google.com/travel/flights?hl=en"

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            viewport={"width": 1400, "height": 900},
            locale="en-US",
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"
            ),
        )
        page = context.new_page()

        goto_with_retries(page, url)

        # Wait for the Flights UI shell.
        page.wait_for_timeout(2000)

        # Screenshot the landing page (useful for debugging)
        page.screenshot(path="google-flights-home.jpg", full_page=True)

        context.close()
        browser.close()


if __name__ == "__main__":
    main()

Run:

python scrape_google_flights.py

At this point you should have google-flights-home.jpg (a quick sanity check).


Step 2: Automate the UI (origin, destination, date)

Google Flights is a component-heavy UI. The safest approach is:

  1. click the “Where from?” field and type an airport/city code (e.g. DEL)
  2. select the first suggestion
  3. do the same for “Where to?” (BOM)
  4. open the date picker and choose a departure date

Selectors can change, so we’ll rely on accessible labels/roles as much as possible.

from playwright.sync_api import expect


def set_route_and_date(page, origin: str, dest: str, depart_yyyy_mm_dd: str):
    # Open origin field
    # Note: labels can vary by locale; keep your locale en-US.
    page.get_by_role("combobox").first.click()

    # Google Flights often has multiple comboboxes; we search by placeholder-ish text.
    # Fallback: click text.
    try:
        page.get_by_text("Where from?").click(timeout=2000)
    except Exception:
        pass

    # Fill origin
    page.keyboard.press("Control+A")
    page.keyboard.type(origin, delay=50)
    page.wait_for_timeout(800)
    page.keyboard.press("Enter")

    # Fill destination
    try:
        page.get_by_text("Where to?").click(timeout=2000)
    except Exception:
        # Tab often moves focus from origin to destination
        page.keyboard.press("Tab")

    page.keyboard.press("Control+A")
    page.keyboard.type(dest, delay=50)
    page.wait_for_timeout(800)
    page.keyboard.press("Enter")

    # Open date picker
    # In many builds there’s a button with text like "Departure".
    page.get_by_text("Departure").click(timeout=5000)

    # Choose date. Google uses aria-labels like "May 10, 2026".
    y, m, d = depart_yyyy_mm_dd.split("-")
    # Build a best-effort label; you may need to adapt for month names.
    # For reliability, you can instead navigate the calendar UI.
    target = depart_yyyy_mm_dd

    # Try clicking a cell containing the day number after ensuring correct month is visible.
    # Minimal approach: click the day number; for real production use, implement month nav.
    page.get_by_role("grid").get_by_text(str(int(d))).first.click(timeout=10_000)

    # Close date picker
    page.get_by_role("button", name="Done").click(timeout=5000)

    # Trigger search if required
    try:
        page.get_by_role("button", name=re.compile("Search", re.I)).click(timeout=3000)
    except Exception:
        pass

    # Wait for results to load
    page.wait_for_timeout(4000)

This is intentionally a “good enough” automation scaffold. In production you’d:

  • implement calendar month navigation
  • harden selectors using page.get_by_role(..., name=...)
  • add detection for “no flights found” and for block/interstitial pages

Step 3: Extract flight cards from the results

Once results load, you’ll typically see a scrollable list of cards.

We’ll use CSS selectors that work on many builds:

  • container list: div[role="list"] or similar
  • card: often div[role="listitem"]

Then read text blocks and parse.


def parse_results(page, limit: int = 25) -> list[FlightResult]:
    out: list[FlightResult] = []

    # Best-effort: wait for list items
    try:
        page.wait_for_selector("div[role='listitem']", timeout=20_000)
    except PWTimeout:
        return out

    cards = page.locator("div[role='listitem']")
    n = min(cards.count(), limit)

    for i in range(n):
        card = cards.nth(i)

        # These subselectors are heuristic; verify with Playwright inspector.
        airline = safe_text(card.locator("div[aria-label*='Airline']").first)
        if not airline:
            # Fallback: first strong-ish text line
            airline = safe_text(card.locator("span").first)

        times = safe_text(card.locator("span").filter(has_text=re.compile(r"\d{1,2}:\d{2}"))).first)
        price = safe_text(card.locator("span").filter(has_text=re.compile(r"₹|\$|€"))).first)

        # Duration/stops often show like "2 hr 5 min" and "Nonstop" / "1 stop"
        duration = safe_text(card.locator("span").filter(has_text=re.compile(r"hr|min"))).first)
        stops = safe_text(card.locator("span").filter(has_text=re.compile(r"Nonstop|stop", re.I))).first)

        out.append(FlightResult(
            airline=airline,
            depart_time=times,  # keep raw; you can split later
            arrive_time=None,
            stops=stops,
            duration=duration,
            price=price,
        ))

    return out

Because the UI changes, treat these selectors as a starting point, not a promise.

The key engineering pattern is stable:

  1. wait for an element that means “results exist”
  2. locate cards
  3. extract a small set of fields
  4. log failures and keep going

Step 4: End-to-end script (route → scrape → export)

Here’s a full script you can run and adapt:

from __future__ import annotations

import csv
import json
from dataclasses import asdict

from playwright.sync_api import sync_playwright


def export_csv(rows, path: str):
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        w.writeheader()
        for r in rows:
            w.writerow(r)


def export_json(rows, path: str):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)


def run(origin: str, dest: str, depart: str):
    url = "https://www.google.com/travel/flights?hl=en"

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(viewport={"width": 1400, "height": 900}, locale="en-US")
        page = context.new_page()

        page.goto(url, wait_until="domcontentloaded", timeout=60_000)
        page.wait_for_timeout(2000)

        set_route_and_date(page, origin=origin, dest=dest, depart_yyyy_mm_dd=depart)

        # Mandatory screenshot (A-track)
        page.screenshot(path="google-flights-results.jpg", full_page=True)

        results = parse_results(page, limit=30)
        rows = [asdict(r) for r in results]

        export_csv(rows, "flights.csv")
        export_json(rows, "flights.json")

        context.close()
        browser.close()

    return rows


if __name__ == "__main__":
    data = run("DEL", "BOM", "2026-05-10")
    print("scraped", len(data), "results")

You now have:

  • google-flights-results.jpg
  • flights.csv
  • flights.json

Where ProxiesAPI fits

For a JS-heavy site like Google Flights, you typically use proxies at one of two layers:

  1. Browser traffic (Playwright running through a proxy)
  2. Pre-/post-flight requests (e.g. downloading auxiliary HTML/JSON endpoints)

Option A: Run Playwright through a proxy

Playwright supports proxy settings at browser launch.

If ProxiesAPI gives you an HTTP proxy endpoint, you can pass it like:

browser = p.chromium.launch(
    headless=True,
    proxy={
        "server": "http://YOUR_PROXIESAPI_PROXY:PORT",
        "username": "YOUR_USER",
        "password": "YOUR_PASS",
    },
)

Keep it honest:

  • proxies don’t remove the need for pacing
  • rotating IPs helps reduce correlated failures
  • if you slam the site with browser sessions, you’ll still get blocked

Option B: Use ProxiesAPI as the fetch layer (when HTML is available)

On sites where requests works, a proxy-backed fetch is simple:

import requests

r = requests.get(
    "https://example.com",
    proxies={"http": "http://YOUR_PROXY", "https": "http://YOUR_PROXY"},
    timeout=(10, 30),
)

Google Flights specifically is not a great candidate for plain-HTML scraping — hence Playwright.


Reliability checklist

  • Use a real viewport + locale (reduce UI variability)
  • Use timeouts everywhere (no hanging)
  • Add retries around navigation and selector waits
  • Keep request rate low (sleep between routes)
  • Take screenshots when selectors fail (debug faster)

Next upgrades

  • make the calendar selection deterministic (month navigation + exact date labels)
  • scroll the results list and collect more than the first screen
  • run multiple routes/dates with a queue (and a backoff policy)
  • store results in SQLite so refresh runs only update deltas
Keep flight price crawls stable with ProxiesAPI

Google Flights is a dynamic, high-change target. ProxiesAPI helps you spread load, rotate IPs, and keep your crawling layer reliable as you scale routes, dates, and refresh frequency.

Related guides

Scrape Costco Product Prices with Python (Search + Pagination + SKU Variants)
Pull product name, price, unit size, and availability from Costco listings into a clean CSV using ProxiesAPI + requests. Includes pagination and variant normalization patterns.
tutorial#python#costco#price-scraping
Scrape Rightmove Sold Prices (Second Angle): Price History Dataset Builder
Build a clean Rightmove sold-price history dataset with dedupe + incremental updates, plus a screenshot of the sold-price flow and ProxiesAPI-backed fetching.
tutorial#python#rightmove#web-scraping
Scrape Expedia Flight and Hotel Data with Python (Step-by-Step)
A practical Expedia scraper in Python using Playwright: open search results, extract hotel cards (and where flight offers live), paginate safely, and export clean JSON/CSV. Includes ProxiesAPI-friendly network patterns and a screenshot.
tutorial#python#playwright#expedia
Scrape Google Maps Business Listings with Python: Search → Place Details → Reviews (ProxiesAPI)
Extract local leads from Google Maps: search results → place details → reviews, with a resilient fetch pipeline and a screenshot-driven selector approach.
tutorial#python#google-maps#local-leads