How to Scrape Booking.com Hotel Prices with Python (Using ProxiesAPI)

Mar 18, 2026 · tutorial · #python, #booking, #price-scraping, #web-scraping, #requests, #beautifulsoup, #proxies

Booking.com is a classic “high-value” scraping target: hotel names, nightly prices, and review scores are exactly the inputs you need for deal alerts, market research, or competitive pricing.

The catch: Booking pages are dynamic and can be fragile if you guess selectors or hammer requests.

Booking.com search results page (we'll scrape hotel cards)

In this guide we’ll build a real Python scraper that:

fetches a Booking.com search results URL via ProxiesAPI
parses hotel cards (name, price, score, location blurb)
exports clean JSON
includes a selector-debug workflow so you can fix it quickly when Booking changes markup

Make your Booking.com crawl more reliable with ProxiesAPI

When you scrape at scale, failures are usually network-level: rate limits, transient blocks, and inconsistent responses. ProxiesAPI gives you a simple proxy-backed fetch URL so your scraper keeps moving without you building proxy plumbing.

Get 1,000 free API calls View pricing

What we’re scraping (page shape)

We’ll scrape a Booking.com search results page, like:

https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0

On this page, hotels appear as repeating “property cards”. Booking’s HTML changes often, so the key is:

Fetch HTML reliably (timeouts + retries)
Use multiple selectors + fallbacks
Validate output by spot-checking 5–10 items

Important: Booking may localize currency and content, and may show different prices to different users. Treat scraped prices as observations, not ground truth.

Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We’ll use:

requests for HTTP
BeautifulSoup(lxml) for parsing

Step 1: Fetch HTML using ProxiesAPI

ProxiesAPI’s core integration is simple: you call a single URL that fetches the target for you.

API_KEY="YOUR_PROXIESAPI_KEY"
TARGET="https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0"

curl -s "http://api.proxiesapi.com/?key=$API_KEY&url=$TARGET" | head -n 20

In Python, we’ll wrap this into a fetch_html() function with:

a real timeout
basic retries
a desktop User-Agent (helps avoid “broken” responses)

import time
import urllib.parse
import requests

API_KEY = "YOUR_PROXIESAPI_KEY"
TIMEOUT = (10, 60)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})


def proxiesapi_url(target_url: str) -> str:
    return "http://api.proxiesapi.com/?" + urllib.parse.urlencode({
        "key": API_KEY,
        "url": target_url,
    })


def fetch_html(target_url: str, retries: int = 3, backoff: float = 2.0) -> str:
    url = proxiesapi_url(target_url)

    last_err = None
    for attempt in range(1, retries + 1):
        try:
            r = session.get(url, timeout=TIMEOUT)
            r.raise_for_status()

            # Quick sanity check: Booking pages are large; tiny HTML is often a block/consent.
            if len(r.text) < 20_000:
                raise RuntimeError(f"Suspiciously small response: {len(r.text)} bytes")

            return r.text
        except Exception as e:
            last_err = e
            sleep_s = backoff ** attempt
            print(f"attempt {attempt}/{retries} failed: {e} -> sleeping {sleep_s:.1f}s")
            time.sleep(sleep_s)

    raise RuntimeError(f"Failed after {retries} retries: {last_err}")

Step 2: Inspect the HTML and choose robust selectors

Booking markup evolves, so instead of relying on one brittle class name, we’ll try multiple patterns.

At the time of writing, Booking uses “property card” containers with data attributes like:

data-testid="property-card"

Try this local inspection:

from bs4 import BeautifulSoup

html = fetch_html("https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0")

soup = BeautifulSoup(html, "lxml")
cards = soup.select('[data-testid="property-card"]')
print("cards:", len(cards))
print(cards[0].get_text(" ", strip=True)[:300])

If cards is 0, don’t guess randomly—search the HTML for something you see on the page (a hotel name, or “Property”/“Review score”).

Step 3: Parse hotel cards into structured data

We’ll extract:

name
price per night (as text)
review score (as text)
short location / distance blurb (if present)
booking link (relative → absolute)

import re
from bs4 import BeautifulSoup

BASE = "https://www.booking.com"


def clean_text(x: str | None) -> str | None:
    if not x:
        return None
    x = re.sub(r"\s+", " ", x).strip()
    return x or None


def parse_cards(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    out = []

    cards = soup.select('[data-testid="property-card"]')

    for c in cards:
        # Name
        name_el = (
            c.select_one('[data-testid="title"]')
            or c.select_one('[data-testid="property-card-title"]')
            or c.select_one("h3")
        )
        name = clean_text(name_el.get_text(" ", strip=True) if name_el else None)

        # Price text: Booking can show “Price for 2 nights” etc; we keep as raw text.
        price_el = (
            c.select_one('[data-testid="price-and-discounted-price"]')
            or c.select_one('[data-testid="price"]')
        )
        price = clean_text(price_el.get_text(" ", strip=True) if price_el else None)

        # Review score
        score_el = (
            c.select_one('[data-testid="review-score"]')
            or c.select_one('[data-testid="review-score"] div')
        )
        score = clean_text(score_el.get_text(" ", strip=True) if score_el else None)

        # Location/distance snippet
        loc_el = (
            c.select_one('[data-testid="location"]')
            or c.select_one('[data-testid="address"]')
        )
        location = clean_text(loc_el.get_text(" ", strip=True) if loc_el else None)

        # Link
        a = c.select_one('a[href*="/hotel/"]') or c.select_one("a")
        href = a.get("href") if a else None
        if href and href.startswith("/"):
            href = BASE + href.split("?")[0]

        # Skip obviously empty rows
        if not name and not price and not href:
            continue

        out.append({
            "name": name,
            "price_text": price,
            "review_score_text": score,
            "location_text": location,
            "url": href,
        })

    return out

Terminal-style run

if __name__ == "__main__":
    target = "https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0"
    html = fetch_html(target)
    hotels = parse_cards(html)

    print("hotels:", len(hotels))
    for h in hotels[:3]:
        print(h)

Typical output will look like:

hotels: 25
{'name': '...', 'price_text': '₹ ...', 'review_score_text': 'Scored ...', 'location_text': '...', 'url': 'https://www.booking.com/hotel/...'}
...

Export to JSON

import json

with open("booking_hotels.json", "w", encoding="utf-8") as f:
    json.dump(hotels, f, ensure_ascii=False, indent=2)

print("wrote booking_hotels.json", len(hotels))

Common failure modes (and how to debug fast)

1) You get a tiny HTML page

That’s usually a consent page, block page, or error response.

Fixes:

verify you’re using realistic headers (User-Agent, Accept-Language)
add backoff between runs
rotate requests across time (don’t do 200 requests in 10 seconds)
use ProxiesAPI for more reliable fetching across runs

2) Selectors return 0 cards

Do not rewrite everything. Instead:

save the HTML to disk
search for a known hotel name from the page
update one selector at a time

with open("debug.html", "w", encoding="utf-8") as f:
    f.write(html)

print("saved debug.html")

Then open debug.html in your browser, inspect a card, and adjust selectors.

Where ProxiesAPI fits (honestly)

Booking.com is a site where reliability matters. You might be scraping multiple cities, dates, and occupancies.

ProxiesAPI helps you keep the “fetch layer” simple:

you call one URL
you keep your parsing code unchanged
you can add retries/backoff without managing proxy pools

QA checklist

cards > 0 on your target query
Spot-check 5 hotels: do name and price match what you see on the page?
URLs resolve to real hotel pages
Your script uses timeouts and retries
You export clean JSON

Make your Booking.com crawl more reliable with ProxiesAPI

Get 1,000 free API calls View pricing

Build a practical Steam search scraper: fetch the real HTML, extract game title/appid/price/discount/review summary, and export clean CSV/JSON. Includes a screenshot and a ProxiesAPI-based fetch layer for stability.

tutorial#python#steam#price-scraping

Scrape Hotel Prices from Booking.com (Python) — Dates, Room Types, Total Price

A practical Booking.com scraper in Python that builds a search URL with dates + occupancy, parses hotel cards, and extracts room offers (room type + total price) with retries and screenshots for verification.

tutorial#python#booking#travel

How to Scrape Google Flights Prices with Python (Routes, Dates, and Price Quotes)

A practical guide to extracting flight price quotes from Google Flights responsibly: capture share URLs, fetch server-rendered HTML, parse price cards, and export clean JSON. Includes ProxiesAPI-backed requests + a screenshot.

tutorial#python#google-flights#travel

Scrape Product Prices from Home Depot (Search + Category Pages) with Python + ProxiesAPI

Extract product name, price, and availability from Home Depot listing pages (search + category) with pagination, resilient parsing, and an anti-block-friendly request layer.

tutorial#python#home-depot#ecommerce

How to Scrape Booking.com Hotel Prices with Python (Using ProxiesAPI)

Related guides