Scrape UK Property Prices from Rightmove (Sold Prices) with Python

Apr 30, 2026 · tutorial · #python, #rightmove, #property, #web-scraping, #requests, #beautifulsoup, #csv, #dataset

Rightmove is one of the most useful public sources for UK property sold-price comps.

In this tutorial we’ll build a practical “dataset builder” that:

searches Rightmove sold-property results for an area
paginates through results incrementally
extracts key fields from listing cards
deduplicates by a stable listing identifier
exports to CSV (and JSONL if you want)

We’ll keep the scraper honest:

Rightmove HTML and APIs change over time
some fields may be missing on some cards
you should respect robots.txt and the Terms of Service, and throttle requests
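
As a minimal sketch of the robots.txt point, the standard library's urllib.robotparser can evaluate rules you have already downloaded. The SAMPLE_ROBOTS_TXT content and the example.com URLs below are made up for illustration and are not Rightmove's actual rules; the helper names are hypothetical too.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
# It is NOT Rightmove's actual file; download the real one before crawling.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


checker = build_robots_checker(SAMPLE_ROBOTS_TXT)
print(is_allowed(checker, "https://example.com/house-prices.html"))
print(is_allowed(checker, "https://example.com/private/page"))
print(checker.crawl_delay("*"))
```

If a Crawl-delay is declared, it is a reasonable lower bound for the pause between requests. robots.txt does not replace reading the site's Terms of Service.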

Screenshot: Rightmove sold prices results page (we’ll scrape listing cards)

Keep Rightmove crawls stable with ProxiesAPI

Property portals rate-limit aggressively once you paginate or run daily jobs. ProxiesAPI gives you a consistent, proxy-backed request layer so your dataset builds don’t die mid-crawl.


What we’re scraping (page + structure)

Rightmove sold property results are served as a search results page that contains listing cards. The key things we want:

a stable listing id (often present in links)
address / title
sold price (if shown)
sold date (if shown)
property type
bedrooms
estate agent (sometimes)
listing URL

The big win: you can build a comps dataset without visiting every detail page by extracting from results cards first.

A quick sanity check

curl -sL "https://www.rightmove.co.uk/house-prices.html" | head -n 5

If you get blocked or challenged, that’s where a proxy layer helps.

Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml pandas

We’ll use:

requests for HTTP
BeautifulSoup (with the lxml parser) for parsing
pandas just to make CSV export easy (optional)

ProxiesAPI request helper (drop-in)

ProxiesAPI typically fits as a network wrapper: you keep your parsing logic unchanged, but route requests through a proxy endpoint.

Below is a simple pattern that works well for “HTML-in, HTML-out” scrapers.

Create rightmove_scraper.py and set an env var:

export PROXIESAPI_KEY="YOUR_KEY"

import os
import time
import random
import urllib.parse
import requests

PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY", "")

TIMEOUT = (15, 45)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123 Safari/537.36",
    "Accept-Language": "en-GB,en;q=0.9",
})


def fetch_html(url: str, *, use_proxiesapi: bool = True) -> str:
    """Fetch HTML. If use_proxiesapi is True, route the request via ProxiesAPI."""

    if use_proxiesapi:
        if not PROXIESAPI_KEY:
            raise RuntimeError("Set PROXIESAPI_KEY env var")

        # Generic proxy pattern: ProxiesAPI fetches the target URL and returns HTML.
        # Adjust the endpoint/params to your ProxiesAPI account’s format.
        proxied = (
            "https://api.proxiesapi.com"
            f"?api_key={urllib.parse.quote(PROXIESAPI_KEY)}"
            f"&url={urllib.parse.quote(url, safe='')}"
        )
        r = session.get(proxied, timeout=TIMEOUT)
    else:
        r = session.get(url, timeout=TIMEOUT)

    r.raise_for_status()
    return r.text


def polite_sleep(min_s=1.0, max_s=2.5):
    time.sleep(random.uniform(min_s, max_s))

Notes:

The parsing stays the same whether you use proxies or not.
If you run this daily or at scale, add retries and exponential backoff.
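
One way to add the retries mentioned in the notes is a small wrapper around any fetch function. This is a sketch, not part of the original script: fetch_with_retries and its parameters are hypothetical names, and it retries on every exception for simplicity. In real use you would probably narrow that to requests.RequestException.

```python
import random
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    *,
    attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying on any exception with exponential backoff.

    The wait before retry n is base_delay * 2**n, capped at max_delay,
    plus up to 25% random jitter so parallel jobs do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay * 0.25))
    raise RuntimeError("attempts must be at least 1")
```

In the crawler you could then call fetch_with_retries(fetch_html, page_url) in place of fetch_html(page_url). The sleep argument is injectable so the backoff can be tested without waiting.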

Step 1: Build a sold-price search URL

Rightmove has multiple entry points (house prices pages, search results, etc.). For dataset-building, you want a URL that:

returns a list of sold listings for a location
supports pagination via a query parameter (commonly an index or start-style offset)

Because Rightmove URLs can be complex and change, treat the “URL builder” as a configuration step.

A pragmatic workflow:

Go to Rightmove in your browser
Search Sold prices for your target area
Copy the resulting URL
Paste it as BASE_RESULTS_URL

In code, we’ll take a base_url and then append/replace a pagination param.
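
Because the base URL is pasted by hand, a quick check before crawling can catch a forgotten placeholder or a URL from the wrong site. The helper below is a hypothetical addition (validate_base_results_url is not part of the original script). It only checks the scheme and host, and assumes any rightmove.co.uk hostname is acceptable.

```python
from urllib.parse import urlparse

# Same placeholder string the entry point uses for BASE_RESULTS_URL.
PLACEHOLDER = "PASTE_RIGHTMOVE_SOLD_RESULTS_URL_HERE"


def validate_base_results_url(url: str) -> str:
    """Return url unchanged if it looks like a pasted Rightmove URL, else raise ValueError."""
    if not url or url == PLACEHOLDER:
        raise ValueError("BASE_RESULTS_URL is still the placeholder; paste a real URL")
    parts = urlparse(url)
    if parts.scheme != "https":
        raise ValueError(f"expected an https URL, got scheme {parts.scheme!r}")
    host = (parts.hostname or "").lower()
    if host != "rightmove.co.uk" and not host.endswith(".rightmove.co.uk"):
        raise ValueError(f"expected a rightmove.co.uk host, got {host!r}")
    return url


print(validate_base_results_url("https://www.rightmove.co.uk/house-prices.html"))
```

It does not verify that the page is actually a sold-prices results list; that still needs a look at the fetched HTML.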

Step 2: Parse listing cards from HTML

We’ll extract:

listing_id from links (best-effort)
address/title
price
sold_date
beds
property_type
url

Rightmove’s CSS selectors can change. The safest approach is:

select “card containers”
within each card, find the first link that looks like a property detail page
parse text blocks defensively

import re
from bs4 import BeautifulSoup
from urllib.parse import urljoin

RIGHTMOVE_BASE = "https://www.rightmove.co.uk"


def extract_int(text: str):
    m = re.search(r"(\d+)", text or "")
    return int(m.group(1)) if m else None


def normalize_ws(s: str) -> str:
    return re.sub(r"\s+", " ", (s or "").strip())


def parse_results_page(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    out = []

    # Heuristic: results are often in list items / divs with links to /properties/
    # We’ll first collect property links, then climb to a likely container.
    for a in soup.select("a[href*='/properties/']"):
        href = a.get("href")
        if not href:
            continue

        url = href if href.startswith("http") else urljoin(RIGHTMOVE_BASE, href)

        # listing id is commonly in the URL path: /properties/<id>
        m = re.search(r"/properties/(\d+)", url)
        listing_id = m.group(1) if m else None

        # Find a container node to read card text
        card = a
        for _ in range(5):
            if card is None:
                break
            # stop at a block-like container
            if card.name in ("div", "li", "article"):
                break
            card = card.parent

        card_text = normalize_ws(card.get_text(" ", strip=True) if card else a.get_text(" ", strip=True))

        # Best-effort extraction. These patterns are not guaranteed.
        price = None
        sold_date = None
        beds = None
        property_type = None

        # Prices tend to look like £123,456
        pm = re.search(r"£\s?[\d,]+", card_text)
        price = pm.group(0).replace(" ", "") if pm else None

        # Sold date often contains month/year or "Sold" tokens
        dm = re.search(r"Sold\s+(in\s+)?([A-Za-z]{3,9}\s+\d{4})", card_text)
        sold_date = dm.group(2) if dm else None

        # Beds often appear as "3 bed" or "3 bedroom"
        bm = re.search(r"(\d+)\s+bed", card_text, re.IGNORECASE)
        beds = int(bm.group(1)) if bm else None

        # Property type is usually a word like "Terraced", "Semi-Detached" etc.
        tm = re.search(r"(Detached|Semi-Detached|Terraced|End of Terrace|Flat|Maisonette|Bungalow|Cottage)", card_text, re.IGNORECASE)
        property_type = tm.group(1) if tm else None

        title = normalize_ws(a.get_text(" ", strip=True))

        out.append({
            "listing_id": listing_id,
            "title": title,
            "price": price,
            "sold_date": sold_date,
            "beds": beds,
            "property_type": property_type,
            "url": url,
        })

    # De-dupe by listing_id/url within the page
    uniq = []
    seen = set()
    for row in out:
        key = row.get("listing_id") or row.get("url")
        if not key or key in seen:
            continue
        seen.add(key)
        uniq.append(row)

    return uniq

This parser is designed to survive missing fields and still output something useful.
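
The parser keeps price and sold_date as display strings. For analysis you will usually want numbers and sortable dates, so here is a possible post-processing sketch. Both helper names are hypothetical. It assumes prices are whole pounds (which is all the price regex above captures) and that month names are in English, since strptime month parsing depends on the active locale.

```python
import re
from datetime import datetime
from typing import Optional


def price_to_int(price: Optional[str]) -> Optional[int]:
    """Convert a string such as '£123,456' to 123456; return None if it has no digits."""
    digits = re.sub(r"[^\d]", "", price or "")
    return int(digits) if digits else None


def sold_month_to_iso(sold_date: Optional[str]) -> Optional[str]:
    """Convert 'March 2024' or 'Mar 2024' to '2024-03'; return None if unparseable."""
    text = (sold_date or "").strip()
    for fmt in ("%B %Y", "%b %Y"):  # full month name first, then abbreviation
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m")
        except ValueError:
            continue
    return None


print(price_to_int("£123,456"), sold_month_to_iso("Mar 2024"))
```

Keeping the raw strings in the dataset and adding the normalized values as extra columns makes it easier to spot parsing mistakes later.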

Step 3: Pagination + dedupe across pages

Rightmove results commonly use an index parameter (start offset) for paging.

We’ll implement pagination as:

start at index=0
step by page_size (often 24)
stop when a page returns 0 new listings

from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


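def set_query_param(url: str, key: str, value: str) -> str:
    """Return url with query parameter `key` set to `value`, replacing any existing one."""
    parts = urlparse(url)
    # keep_blank_values=True preserves empty params (e.g. "foo=") from the pasted URL
    q = parse_qs(parts.query, keep_blank_values=True)
    q[key] = [str(value)]
    new_query = urlencode(q, doseq=True)
    return urlunparse(parts._replace(query=new_query))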


def crawl_sold_results(base_results_url: str, pages: int = 10, page_size: int = 24):
    all_rows = []
    seen = set()

    for i in range(pages):
        index = i * page_size
        page_url = set_query_param(base_results_url, "index", str(index))

        html = fetch_html(page_url, use_proxiesapi=True)
        batch = parse_results_page(html)

        new_count = 0
        for row in batch:
            key = row.get("listing_id") or row.get("url")
            if not key or key in seen:
                continue
            seen.add(key)
            all_rows.append(row)
            new_count += 1

        print(f"page {i+1} index={index} batch={len(batch)} new={new_count} total={len(all_rows)}")

        if new_count == 0:
            break

        polite_sleep()

    return all_rows

Step 4: Export to CSV

import pandas as pd


def export_csv(rows: list[dict], path: str = "rightmove_sold_prices.csv"):
    df = pd.DataFrame(rows)
    # Keep stable column order
    cols = ["listing_id", "title", "price", "sold_date", "beds", "property_type", "url"]
    df = df[[c for c in cols if c in df.columns]]
    df.to_csv(path, index=False)
    print("wrote", path, len(df))
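
If you would rather skip pandas, or want the JSONL output mentioned at the start, the standard library is enough. The two functions below are a sketch with hypothetical names (export_jsonl, export_csv_stdlib). They reuse the same column order as export_csv and assume each row is a flat dict like the ones the parser returns.

```python
import csv
import json

COLUMNS = ["listing_id", "title", "price", "sold_date", "beds", "property_type", "url"]


def export_jsonl(rows: list[dict], path: str = "rightmove_sold_prices.jsonl") -> int:
    """Write one JSON object per line; return the number of rows written."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps the pound sign readable in the file
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(rows)


def export_csv_stdlib(rows: list[dict], path: str = "rightmove_sold_prices.csv") -> int:
    """Write rows with csv.DictWriter; missing keys become empty cells."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

JSONL is handy for appending new rows on later runs, since each line is an independent record.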

Full runnable script

Put it all together by appending this entry point to rightmove_scraper.py, below the functions from the earlier steps:

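if __name__ == "__main__":
    # 1) In your browser, run a sold-prices search on Rightmove and paste the URL here.
    BASE_RESULTS_URL = "PASTE_RIGHTMOVE_SOLD_RESULTS_URL_HERE"

    # Fail early with a clear message if the placeholder was never replaced.
    if BASE_RESULTS_URL.startswith("PASTE_"):
        raise SystemExit("Set BASE_RESULTS_URL to a real Rightmove sold-results URL first")

    rows = crawl_sold_results(BASE_RESULTS_URL, pages=20, page_size=24)
    export_csv(rows)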

Run:

python rightmove_scraper.py

Common issues (and how to fix them)

1) HTML looks different than in your browser

Rightmove may render different HTML depending on headers / geo / bot signals.

Fixes:

use a real desktop User-Agent
add Accept-Language: en-GB
fetch via ProxiesAPI (proxy-backed requests reduce challenge frequency)

2) Your selector returns zero items

Don’t guess selectors for hours.

Instead:

save the HTML to a file: with open("page.html", "w", encoding="utf-8") as f: f.write(html)
search for /properties/ in it
build your extraction around links, not brittle class names
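
The debugging steps above can be sketched as two small helpers. The names are hypothetical, and the /properties/<id> pattern is the same heuristic the parser uses, so it may need adjusting if Rightmove's link format differs.

```python
import re
from pathlib import Path


def save_debug_html(html: str, path: str = "page.html") -> int:
    """Write the fetched HTML to disk as UTF-8 and return the character count."""
    Path(path).write_text(html, encoding="utf-8")
    return len(html)


def count_property_links(html: str) -> int:
    """Count distinct numeric ids in /properties/<id> links found in the raw HTML."""
    return len(set(re.findall(r"/properties/(\d+)", html)))
```

If count_property_links returns 0 on a page that shows results in your browser, the listings are probably not in the HTML you received (for example a challenge page or content loaded by JavaScript), and no selector change will help.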

3) Duplicates across pages

Some portals shuffle results or repeat listings across pages.

That’s why we dedupe by listing_id or url.

Where ProxiesAPI helps (realistically)

For Rightmove-style sites, you tend to get blocked when you:

paginate deeply
run from a single IP repeatedly (cron jobs)
hit the site from cloud/VPS IP ranges

ProxiesAPI helps you keep the network layer stable while your parsing and export logic stays unchanged.

Next upgrades

Store results in SQLite and do incremental updates
Enrich each record by visiting the detail page (EPC rating, history, agent)
Add retries with exponential backoff + jitter
Schedule daily runs and only fetch “new since last run”
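
As a starting point for the SQLite upgrade, here is a minimal sketch using the standard sqlite3 module. The table name, schema, and function names are assumptions made for illustration. Rows without a listing_id are skipped here, which is stricter than the crawler's fallback to the URL as a key.

```python
import sqlite3

COLUMNS = ("listing_id", "title", "price", "sold_date", "beds", "property_type", "url")


def open_store(path: str = "rightmove.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sold_listings (
            listing_id TEXT PRIMARY KEY,
            title TEXT, price TEXT, sold_date TEXT,
            beds INTEGER, property_type TEXT, url TEXT
        )
        """
    )
    return conn


def insert_new(conn: sqlite3.Connection, rows: list[dict]) -> int:
    """Insert rows whose listing_id is not stored yet; return how many were added."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO sold_listings VALUES (?, ?, ?, ?, ?, ?, ?)",
            [tuple(row.get(c) for c in COLUMNS) for row in rows if row.get("listing_id")],
        )
    return conn.total_changes - before
```

On a daily run, insert_new(conn, rows) returning 0 for a page is a natural signal that you have caught up with the previous crawl.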

