Scrape UK Property Prices from Rightmove (Sold Prices) with Python

Rightmove is one of the most useful public sources for UK property sold-price comps.

In this tutorial we’ll build a practical “dataset builder” that:

  • searches Rightmove sold-property results for an area
  • paginates through results incrementally
  • extracts key fields from listing cards
  • deduplicates by a stable listing identifier
  • exports to CSV (and JSONL if you want)

We’ll keep the scraper honest:

  • Rightmove HTML and APIs change over time
  • some fields may be missing on some cards
  • you should respect robots / ToS and throttle requests

Rightmove sold prices results page (we’ll scrape listing cards)

Keep Rightmove crawls stable with ProxiesAPI

Property portals rate-limit aggressively once you paginate or run daily jobs. ProxiesAPI gives you a consistent, proxy-backed request layer so your dataset builds don’t die mid-crawl.


What we’re scraping (page + structure)

Rightmove sold property results are served as a search results page that contains listing cards. The key things we want:

  • a stable listing id (often present in links)
  • address / title
  • sold price (if shown)
  • sold date (if shown)
  • property type
  • bedrooms
  • estate agent (sometimes)
  • listing URL

The big win: you can build a comps dataset without visiting every detail page by extracting from results cards first.

A quick sanity check

curl -sL "https://www.rightmove.co.uk/house-prices.html" | head -n 5

If you get blocked or challenged, that’s where a proxy layer helps.


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml pandas

We’ll use:

  • requests for HTTP
  • BeautifulSoup (with the lxml parser) for parsing HTML
  • pandas just to make CSV export easy (optional)

ProxiesAPI request helper (drop-in)

ProxiesAPI typically fits as a network wrapper: you keep your parsing logic unchanged, but route requests through a proxy endpoint.

Below is a simple pattern that works well for “HTML-in, HTML-out” scrapers.

Create rightmove_scraper.py and set an env var:

export PROXIESAPI_KEY="YOUR_KEY"

Then start the script with:

import os
import time
import random
import urllib.parse
import requests

PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY", "")

TIMEOUT = (15, 45)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123 Safari/537.36",
    "Accept-Language": "en-GB,en;q=0.9",
})


def fetch_html(url: str, *, use_proxiesapi: bool = True) -> str:
    """Fetch HTML. If use_proxiesapi is True, route the request via ProxiesAPI."""

    if use_proxiesapi:
        if not PROXIESAPI_KEY:
            raise RuntimeError("Set PROXIESAPI_KEY env var")

        # Generic proxy pattern: ProxiesAPI fetches the target URL and returns HTML.
        # Adjust the endpoint/params to your ProxiesAPI account’s format.
        proxied = (
            "https://api.proxiesapi.com"
            f"?api_key={urllib.parse.quote(PROXIESAPI_KEY)}"
            f"&url={urllib.parse.quote(url, safe='')}"
        )
        r = session.get(proxied, timeout=TIMEOUT)
    else:
        r = session.get(url, timeout=TIMEOUT)

    r.raise_for_status()
    return r.text


def polite_sleep(min_s=1.0, max_s=2.5):
    time.sleep(random.uniform(min_s, max_s))

Notes:

  • The parsing stays the same whether you use proxies or not.
  • If you run this daily or at scale, add retries and exponential backoff.
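The retry note above can be sketched as a small wrapper. This is a minimal pattern, not ProxiesAPI-specific: it takes any fetch callable (e.g. the fetch_html defined earlier) so the parsing code stays unchanged.

```python
import random
import time


def fetch_with_retries(url, fetch, max_retries=4, base_delay=1.5):
    """Call fetch(url), retrying on exceptions with exponential backoff + jitter.

    `fetch` is any callable that takes a URL and returns HTML (or raises),
    e.g. the fetch_html helper above.
    """
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller see the real error
            # delays of ~1.5s, 3s, 6s, ... plus up to 1s of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```

Usage: replace direct calls with html = fetch_with_retries(page_url, fetch_html).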

Step 1: Build a sold-price search URL

Rightmove has multiple entry points (house prices pages, search results, etc.). For dataset-building, you want a URL that:

  • returns a list of sold listings for a location
  • supports pagination via a query param (commonly index / start-style)

Because Rightmove URLs can be complex and change, treat the “URL builder” as a configuration step.

A pragmatic workflow:

  1. Go to Rightmove in your browser
  2. Search Sold prices for your target area
  3. Copy the resulting URL
  4. Paste it as BASE_RESULTS_URL

In code, we’ll take a base_url and then append/replace a pagination param.
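Before wiring the pasted URL into the crawler, it helps to inspect what query params it actually carries, so you know which one controls paging. A quick sketch (the example URL and its params are hypothetical; your copied URL will differ):

```python
from urllib.parse import parse_qs, urlparse

# A hypothetical pasted sold-prices URL -- yours will have different params
base = "https://www.rightmove.co.uk/house-prices/example-area.html?soldIn=2&page=1"

# parse_qs maps each param name to a list of values
params = parse_qs(urlparse(base).query)
print(params)  # {'soldIn': ['2'], 'page': ['1']}
```

Whichever param changes when you click "next page" in the browser is the one to drive programmatically.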


Step 2: Parse listing cards from HTML

We’ll extract:

  • listing_id from links (best-effort)
  • address/title
  • price
  • sold_date
  • beds
  • property_type
  • url

Rightmove’s CSS selectors can change. The safest approach is:

  • select “card containers”
  • within each card, find the first link that looks like a property detail page
  • parse text blocks defensively

import re
from bs4 import BeautifulSoup
from urllib.parse import urljoin

RIGHTMOVE_BASE = "https://www.rightmove.co.uk"


def extract_int(text: str):
    m = re.search(r"(\d+)", text or "")
    return int(m.group(1)) if m else None


def normalize_ws(s: str) -> str:
    return re.sub(r"\s+", " ", (s or "").strip())


def parse_results_page(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    out = []

    # Heuristic: results are often in list items / divs with links to /properties/
    # We’ll first collect property links, then climb to a likely container.
    for a in soup.select("a[href*='/properties/']"):
        href = a.get("href")
        if not href:
            continue

        url = href if href.startswith("http") else urljoin(RIGHTMOVE_BASE, href)

        # listing id is commonly in the URL path: /properties/<id>
        m = re.search(r"/properties/(\d+)", url)
        listing_id = m.group(1) if m else None

        # Find a container node to read card text
        card = a
        for _ in range(5):
            if card is None:
                break
            # stop at a block-like container
            if card.name in ("div", "li", "article"):
                break
            card = card.parent

        card_text = normalize_ws(card.get_text(" ", strip=True) if card else a.get_text(" ", strip=True))

        # Best-effort extraction. These patterns are not guaranteed.
        price = None
        sold_date = None
        beds = None
        property_type = None

        # Prices tend to look like £123,456
        pm = re.search(r"£\s?[\d,]+", card_text)
        price = pm.group(0).replace(" ", "") if pm else None

        # Sold date often contains month/year or "Sold" tokens
        dm = re.search(r"Sold\s+(in\s+)?([A-Za-z]{3,9}\s+\d{4})", card_text)
        sold_date = dm.group(2) if dm else None

        # Beds often appear as "3 bed" or "3 bedroom"
        bm = re.search(r"(\d+)\s+bed", card_text, re.IGNORECASE)
        beds = int(bm.group(1)) if bm else None

        # Property type is usually a word like "Terraced", "Semi-Detached" etc.
        tm = re.search(r"(Detached|Semi-Detached|Terraced|End of Terrace|Flat|Maisonette|Bungalow|Cottage)", card_text, re.IGNORECASE)
        property_type = tm.group(1) if tm else None

        title = normalize_ws(a.get_text(" ", strip=True))

        out.append({
            "listing_id": listing_id,
            "title": title,
            "price": price,
            "sold_date": sold_date,
            "beds": beds,
            "property_type": property_type,
            "url": url,
        })

    # De-dupe by listing_id/url within the page
    uniq = []
    seen = set()
    for row in out:
        key = row.get("listing_id") or row.get("url")
        if not key or key in seen:
            continue
        seen.add(key)
        uniq.append(row)

    return uniq

This parser is designed to survive missing fields and still output something useful.
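Since the parser keeps prices as display strings like "£123,456", a small normalization helper is useful before doing any comps math. This is an illustrative helper (not part of the parser above); it returns None for missing or non-numeric values such as "POA":

```python
import re


def price_to_int(price):
    """Turn a display price like '£123,456' into the integer 123456.

    Returns None for empty values or strings with no digits (e.g. 'POA').
    """
    if not price:
        return None
    digits = re.sub(r"[^\d]", "", price)  # strip £, commas, spaces
    return int(digits) if digits else None
```

Run it on the "price" field after parsing, keeping the raw string in a separate column if you want both.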


Step 3: Pagination + dedupe across pages

Rightmove results commonly use an index parameter (a start offset) for paging.

We’ll implement pagination as:

  • start at index=0
  • step by page_size (often 24)
  • stop when a page returns 0 new listings

from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


def set_query_param(url: str, key: str, value: str) -> str:
    parts = urlparse(url)
    q = parse_qs(parts.query)
    q[key] = [str(value)]
    new_query = urlencode(q, doseq=True)
    return urlunparse((parts.scheme, parts.netloc, parts.path, parts.params, new_query, parts.fragment))


def crawl_sold_results(base_results_url: str, pages: int = 10, page_size: int = 24):
    all_rows = []
    seen = set()

    for i in range(pages):
        index = i * page_size
        page_url = set_query_param(base_results_url, "index", str(index))

        html = fetch_html(page_url, use_proxiesapi=True)
        batch = parse_results_page(html)

        new_count = 0
        for row in batch:
            key = row.get("listing_id") or row.get("url")
            if not key or key in seen:
                continue
            seen.add(key)
            all_rows.append(row)
            new_count += 1

        print(f"page {i+1} index={index} batch={len(batch)} new={new_count} total={len(all_rows)}")

        if new_count == 0:
            break

        polite_sleep()

    return all_rows

Step 4: Export to CSV

import pandas as pd


def export_csv(rows: list[dict], path: str = "rightmove_sold_prices.csv"):
    df = pd.DataFrame(rows)
    # Keep stable column order
    cols = ["listing_id", "title", "price", "sold_date", "beds", "property_type", "url"]
    df = df[[c for c in cols if c in df.columns]]
    df.to_csv(path, index=False)
    print("wrote", path, len(df))
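The intro also mentioned JSONL. A minimal sketch without pandas, since JSONL is just one JSON object per line (convenient for appending and streaming into other tools):

```python
import json


def export_jsonl(rows, path="rightmove_sold_prices.jsonl"):
    """Write one JSON object per line (JSON Lines format)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps the £ sign readable in the file
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    print("wrote", path, len(rows))
```

Call export_jsonl(rows) alongside export_csv(rows) if you want both formats.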

Full runnable script

Put it all together:

if __name__ == "__main__":
    # 1) In your browser, run a sold-prices search on Rightmove and paste the URL here.
    BASE_RESULTS_URL = "PASTE_RIGHTMOVE_SOLD_RESULTS_URL_HERE"

    rows = crawl_sold_results(BASE_RESULTS_URL, pages=20, page_size=24)
    export_csv(rows)

Run:

python rightmove_scraper.py

Common issues (and how to fix them)

1) HTML looks different than in your browser

Rightmove may render different HTML depending on headers / geo / bot signals.

Fixes:

  • use a real desktop User-Agent
  • add Accept-Language: en-GB
  • fetch via ProxiesAPI (proxy-backed requests reduce challenge frequency)

2) Your selector returns zero items

Don’t guess selectors for hours.

Instead:

  • save the HTML to a file: open("page.html", "w", encoding="utf-8").write(html)
  • search for /properties/ in it
  • build your extraction around links, not brittle class names

3) Duplicates across pages

Some portals shuffle results or repeat.

That’s why we dedupe by listing_id or url.


Where ProxiesAPI helps (realistically)

For Rightmove-style sites, you tend to get blocked when you:

  • paginate deeply
  • run from a single IP repeatedly (cron jobs)
  • hit the site from cloud/VPS IP ranges

ProxiesAPI helps you keep the network layer stable while your parsing and export logic stays unchanged.


Next upgrades

  • Store results in SQLite and do incremental updates
  • Enrich each record by visiting the detail page (EPC rating, history, agent)
  • Add retries with exponential backoff + jitter
  • Schedule daily runs and only fetch “new since last run”
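The SQLite upgrade from the list above can be sketched like this. The table schema and function name are illustrative; INSERT OR IGNORE plus a PRIMARY KEY on listing_id is what makes re-runs incremental:

```python
import sqlite3


def upsert_rows(rows, db_path="rightmove.db"):
    """Store listings in SQLite, skipping ones already seen (keyed on listing_id).

    Returns the number of genuinely new rows -- useful for "new since last run" logs.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS sold_listings ("
        "  listing_id TEXT PRIMARY KEY, title TEXT, price TEXT,"
        "  sold_date TEXT, beds INTEGER, property_type TEXT, url TEXT)"
    )
    before = con.execute("SELECT COUNT(*) FROM sold_listings").fetchone()[0]
    # INSERT OR IGNORE silently skips rows whose listing_id already exists
    con.executemany(
        "INSERT OR IGNORE INTO sold_listings VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (r.get("listing_id"), r.get("title"), r.get("price"),
             r.get("sold_date"), r.get("beds"), r.get("property_type"),
             r.get("url"))
            for r in rows
        ],
    )
    con.commit()
    new = con.execute("SELECT COUNT(*) FROM sold_listings").fetchone()[0] - before
    con.close()
    return new
```

Running the crawler daily and calling upsert_rows(rows) then gives you a growing dataset where only new listings are added.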
