How to Scrape Booking.com Hotel Prices with Python (Using ProxiesAPI)

Booking.com is a classic “high-value” scraping target: hotel names, nightly prices, and review scores are exactly the inputs you need for deal alerts, market research, or competitive pricing.

The catch: Booking pages are dynamic, and a scraper that guesses selectors or hammers requests breaks fast.

Booking.com search results page (we'll scrape hotel cards)

In this guide we’ll build a real Python scraper that:

  • fetches a Booking.com search results URL via ProxiesAPI
  • parses hotel cards (name, price, score, location blurb)
  • exports clean JSON
  • includes a selector-debug workflow so you can fix it quickly when Booking changes markup

Make your Booking.com crawl more reliable with ProxiesAPI

When you scrape at scale, failures are usually network-level: rate limits, transient blocks, and inconsistent responses. ProxiesAPI gives you a simple proxy-backed fetch URL so your scraper keeps moving without you building proxy plumbing.


What we’re scraping (page shape)

We’ll scrape a Booking.com search results page, like:

  • https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0

On this page, hotels appear as repeating “property cards”. Booking’s HTML changes often, so the key is:

  1. Fetch HTML reliably (timeouts + retries)
  2. Use multiple selectors + fallbacks
  3. Validate output by spot-checking 5–10 items

Important: Booking may localize currency and content, and may show different prices to different users. Treat scraped prices as observations, not ground truth.


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We’ll use:

  • requests for HTTP
  • BeautifulSoup (with the lxml parser) for HTML parsing

Step 1: Fetch HTML using ProxiesAPI

ProxiesAPI’s core integration is simple: you call a single URL that fetches the target for you.

API_KEY="YOUR_PROXIESAPI_KEY"
TARGET="https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0"

# URL-encode the target, otherwise its &-separated params leak into the ProxiesAPI query string
curl -s -G "http://api.proxiesapi.com/" \
  --data-urlencode "key=$API_KEY" \
  --data-urlencode "url=$TARGET" | head -n 20

In Python, we’ll wrap this into a fetch_html() function with:

  • a real timeout
  • basic retries
  • a desktop User-Agent (helps avoid “broken” responses)

import time
import urllib.parse
import requests

API_KEY = "YOUR_PROXIESAPI_KEY"
TIMEOUT = (10, 60)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})


def proxiesapi_url(target_url: str) -> str:
    return "http://api.proxiesapi.com/?" + urllib.parse.urlencode({
        "key": API_KEY,
        "url": target_url,
    })


def fetch_html(target_url: str, retries: int = 3, backoff: float = 2.0) -> str:
    url = proxiesapi_url(target_url)

    last_err = None
    for attempt in range(1, retries + 1):
        try:
            r = session.get(url, timeout=TIMEOUT)
            r.raise_for_status()

            # Quick sanity check: Booking pages are large; tiny HTML is often a block/consent.
            if len(r.text) < 20_000:
                raise RuntimeError(f"Suspiciously small response: {len(r.text)} bytes")

            return r.text
        except Exception as e:
            last_err = e
            if attempt < retries:  # no point sleeping after the final attempt
                sleep_s = backoff ** attempt
                print(f"attempt {attempt}/{retries} failed: {e} -> sleeping {sleep_s:.1f}s")
                time.sleep(sleep_s)

    raise RuntimeError(f"Failed after {retries} retries: {last_err}")
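While you iterate on selectors, it helps to cache each fetched page on disk so repeated runs don't re-fetch (and re-spend API credits). A minimal sketch; the `.cache` directory name and the `fetcher` parameter are my choices, and you would pass in the `fetch_html()` defined above:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path(".cache")


def fetch_html_cached(target_url: str, fetcher) -> str:
    """Wrap any fetch function with a disk cache keyed on the URL."""
    CACHE_DIR.mkdir(exist_ok=True)
    name = hashlib.sha256(target_url.encode("utf-8")).hexdigest()[:16]
    path = CACHE_DIR / f"{name}.html"

    if path.exists():  # reuse the local copy if we already fetched this URL
        return path.read_text(encoding="utf-8")

    html = fetcher(target_url)  # e.g. fetch_html from above
    path.write_text(html, encoding="utf-8")
    return html
```

During selector development call `fetch_html_cached(target, fetch_html)`; delete `.cache/` when you want fresh data.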

Step 2: Inspect the HTML and choose robust selectors

Booking markup evolves, so instead of relying on one brittle class name, we’ll try multiple patterns.

At the time of writing, Booking uses “property card” containers with data attributes like:

  • data-testid="property-card"

Try this local inspection:

from bs4 import BeautifulSoup

html = fetch_html("https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0")

soup = BeautifulSoup(html, "lxml")
cards = soup.select('[data-testid="property-card"]')
print("cards:", len(cards))
if cards:
    print(cards[0].get_text(" ", strip=True)[:300])

If cards is 0, don’t guess new selectors at random; search the HTML for something you can actually see on the page (a hotel name, or text like “Property” / “Review score”).


Step 3: Parse hotel cards into structured data

We’ll extract:

  • name
  • price per night (as text)
  • review score (as text)
  • short location / distance blurb (if present)
  • booking link (relative → absolute)
import re
from bs4 import BeautifulSoup

BASE = "https://www.booking.com"


def clean_text(x: str | None) -> str | None:
    if not x:
        return None
    x = re.sub(r"\s+", " ", x).strip()
    return x or None


def parse_cards(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    out = []

    cards = soup.select('[data-testid="property-card"]')

    for c in cards:
        # Name
        name_el = (
            c.select_one('[data-testid="title"]')
            or c.select_one('[data-testid="property-card-title"]')
            or c.select_one("h3")
        )
        name = clean_text(name_el.get_text(" ", strip=True) if name_el else None)

        # Price text: Booking can show “Price for 2 nights” etc; we keep as raw text.
        price_el = (
            c.select_one('[data-testid="price-and-discounted-price"]')
            or c.select_one('[data-testid="price"]')
        )
        price = clean_text(price_el.get_text(" ", strip=True) if price_el else None)

        # Review score
        # (no descendant fallback here: if the review-score container is
        # missing, a selector for its inner div can't match either)
        score_el = c.select_one('[data-testid="review-score"]')
        score = clean_text(score_el.get_text(" ", strip=True) if score_el else None)

        # Location/distance snippet
        loc_el = (
            c.select_one('[data-testid="location"]')
            or c.select_one('[data-testid="address"]')
        )
        location = clean_text(loc_el.get_text(" ", strip=True) if loc_el else None)

        # Link
        a = c.select_one('a[href*="/hotel/"]') or c.select_one("a")
        href = a.get("href") if a else None
        if href and href.startswith("/"):
            href = BASE + href.split("?")[0]

        # Skip obviously empty rows
        if not name and not price and not href:
            continue

        out.append({
            "name": name,
            "price_text": price,
            "review_score_text": score,
            "location_text": location,
            "url": href,
        })

    return out

Step 4: Run it

if __name__ == "__main__":
    target = "https://www.booking.com/searchresults.html?ss=London&checkin=2026-04-10&checkout=2026-04-12&group_adults=2&no_rooms=1&group_children=0"
    html = fetch_html(target)
    hotels = parse_cards(html)

    print("hotels:", len(hotels))
    for h in hotels[:3]:
        print(h)

Typical output will look like:

hotels: 25
{'name': '...', 'price_text': '₹ ...', 'review_score_text': 'Scored ...', 'location_text': '...', 'url': 'https://www.booking.com/hotel/...'}
...

Export to JSON

import json

with open("booking_hotels.json", "w", encoding="utf-8") as f:
    json.dump(hotels, f, ensure_ascii=False, indent=2)

print("wrote booking_hotels.json", len(hotels))

Common failure modes (and how to debug fast)

1) You get a tiny HTML page

That’s usually a consent page, block page, or error response.

Fixes:

  • verify you’re using realistic headers (User-Agent, Accept-Language)
  • add backoff between runs
  • rotate requests across time (don’t do 200 requests in 10 seconds)
  • use ProxiesAPI for more reliable fetching across runs
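The "rotate requests across time" point can be as simple as a jittered delay between targets. A sketch; the city list, URL shape, and delay bounds are illustrative, and `fetch_html`/`parse_cards` are the functions from Steps 1 and 3:

```python
import random
import time


def polite_delay(min_s: float = 3.0, max_s: float = 8.0) -> float:
    """Sleep a random duration so request timing isn't machine-regular."""
    d = random.uniform(min_s, max_s)
    time.sleep(d)
    return d


# Spacing out a multi-city crawl:
# for city in ["London", "Paris", "Rome"]:
#     url = f"https://www.booking.com/searchresults.html?ss={city}"
#     html = fetch_html(url)       # from Step 1
#     hotels = parse_cards(html)   # from Step 3
#     polite_delay()
```

Randomized gaps are a courtesy as much as a tactic: evenly spaced bursts are the easiest pattern for rate limiters to flag.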

2) Selectors return 0 cards

Do not rewrite everything. Instead:

  • save the HTML to disk
  • search for a known hotel name from the page
  • update one selector at a time

with open("debug.html", "w", encoding="utf-8") as f:
    f.write(html)

print("saved debug.html")

Then open debug.html in your browser, inspect a card, and adjust selectors.


Where ProxiesAPI fits (honestly)

Booking.com is a site where reliability matters. You might be scraping multiple cities, dates, and occupancies.

ProxiesAPI helps you keep the “fetch layer” simple:

  • you call one URL
  • you keep your parsing code unchanged
  • you can add retries/backoff without managing proxy pools

QA checklist

  • cards > 0 on your target query
  • Spot-check 5 hotels: do name and price match what you see on the page?
  • URLs resolve to real hotel pages
  • Your script uses timeouts and retries
  • You export clean JSON
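Part of that checklist can be automated. A small sketch that computes per-field fill rates on the exported records (the field names match the parser's output above; the thresholds you alert on are up to you):

```python
def field_fill_rates(hotels: list[dict]) -> dict[str, float]:
    """Fraction of records with a non-empty value for each expected field."""
    if not hotels:
        return {}
    fields = ["name", "price_text", "review_score_text", "location_text", "url"]
    return {f: sum(1 for h in hotels if h.get(f)) / len(hotels) for f in fields}


# Usage against the exported file:
# import json
# hotels = json.load(open("booking_hotels.json", encoding="utf-8"))
# for field, rate in field_fill_rates(hotels).items():
#     print(f"{field}: {rate:.0%}")
```

A sudden drop in one field's fill rate (say, price_text falling from ~100% to 0%) usually means exactly one selector broke, which tells you where to start debugging.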

Related guides

  • How to Scrape AutoTrader Used Car Listings with Python (Make/Model/Price/Mileage)
    Scrape AutoTrader search results into a clean dataset: title, price, mileage, year, location, and dealer vs private hints. Includes ProxiesAPI fetch, robust selectors, and export to JSON.
  • Scrape Product Data from Amazon (with Python + ProxiesAPI)
    Extract Amazon product title, price, rating, and availability from a product page using requests + BeautifulSoup, with retries and proxy-backed fetching via ProxiesAPI.
  • Web Scraping with Python: The Complete 2026 Tutorial
    A from-scratch, production-minded guide to web scraping in Python: requests + BeautifulSoup, pagination, retries, caching, proxies, and a reusable scraper template.
  • Build a Job Board with Data from Indeed (Python scraper tutorial)
    Scrape Indeed job listings (title, company, location, salary, summary) with Python (requests + BeautifulSoup), then save a clean dataset you can render as a simple job board. Includes pagination + ProxiesAPI fetch.