Scrape Funda.nl Property Listings with Python (Search + Pagination + Detail Pages)
Funda.nl is one of the most popular real-estate portals in the Netherlands. It’s a perfect “real world” scraping target because you need to handle:
- search result pages (with filters)
- pagination
- de-duplication
- detail pages (where the rich data lives)
In this guide we’ll build a production-shaped Python scraper that:
- fetches Funda search pages
- paginates until it runs out of results (with hard safety limits)
- extracts listing URLs + basic card fields
- visits each listing detail page and extracts structured fields
- exports JSONL/CSV
We’ll keep it honest: site structures change. The key is writing selectors that are explainable and easy to update.

Funda pages can be sensitive to repetitive traffic and geo/rate limits. ProxiesAPI helps you keep the network layer stable as you scale search pages + detail pages to thousands of listings.
What we’re scraping (site shape)
Funda has:
- Search results for a city/area + filters (price, size, property type)
- Listing detail pages for each property
Typical flow:
- Start from a search URL (you can create this manually in your browser)
- On each results page, collect listing links
- Follow each listing and extract fields (address, price, bedrooms, area, features, agent)
Legal + operational note
Always review the site’s Terms of Service and respect robots/rate limits. For data you plan to resell or use commercially, get legal advice.
Setup
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml pandas
We’ll use:
- requests for HTTP
- BeautifulSoup (lxml) for parsing
- pandas (optional) for CSV
Step 1: A fetch layer you can swap to ProxiesAPI
Scrapers fail most often at the network layer (timeouts, blocks, inconsistent responses). So we’ll isolate fetching.
from __future__ import annotations

import os
import time
import random
from urllib.parse import urljoin

import requests

BASE = "https://www.funda.nl"
TIMEOUT = (10, 30)  # connect, read

session = requests.Session()

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9,nl;q=0.8",
}


def polite_sleep(min_s: float = 0.7, max_s: float = 1.8) -> None:
    time.sleep(random.uniform(min_s, max_s))


def fetch(url: str) -> str:
    """Fetch HTML. If you use ProxiesAPI, this is the only function you change."""
    # Option A (direct):
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)

    # Option B (ProxiesAPI):
    # If ProxiesAPI provides a proxy endpoint or an API fetch endpoint in your account,
    # wire it here. Keep the rest of the scraper unchanged.
    # Example patterns (illustrative; use the exact method from your ProxiesAPI docs):
    #
    # PROXY = os.environ.get("PROXIESAPI_HTTP_PROXY")
    # proxies = {"http": PROXY, "https": PROXY} if PROXY else None
    # r = session.get(url, headers=DEFAULT_HEADERS, proxies=proxies, timeout=TIMEOUT)

    r.raise_for_status()
    return r.text
If you later need retries, add them here (with exponential backoff).
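Here is a minimal sketch of what that could look like. The wrapper name fetch_with_retries, the attempt count, and the backoff base are illustrative choices, not Funda- or ProxiesAPI-specific; it relies on requests, time, and random already imported above.

def fetch_with_retries(url: str, attempts: int = 3, backoff_base: float = 1.5) -> str:
    """Call fetch() again on network/HTTP errors, waiting longer after each failure."""
    last_exc: Exception | None = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except requests.RequestException as exc:  # covers timeouts and raise_for_status()
            last_exc = exc
            # Exponential backoff with a little jitter: ~1.5s, ~2.25s, ~3.4s, ...
            time.sleep(backoff_base ** (attempt + 1) + random.uniform(0, 0.5))
    assert last_exc is not None
    raise last_exc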
Step 2: Pick a stable search URL
The easiest way is:
- Open Funda in your browser
- Set the area (e.g., Amsterdam)
- Apply a filter (e.g., “for sale”)
- Copy the URL from the address bar
You’ll end up with a URL that looks like a city path plus query params.
In code, we’ll treat it as a start_url.
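Before wiring up the crawler, a throwaway smoke test (assuming the fetch() layer from Step 1) confirms the URL you copied actually returns HTML with listing-style links:

start_url = "https://www.funda.nl/koop/amsterdam/"  # replace with your filtered URL

html = fetch(start_url)
print("bytes:", len(html))
print("has /koop/ links:", "/koop/" in html)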
Step 3: Parse search result cards (URLs + summary fields)
Funda’s DOM changes over time, so don’t rely on one fragile CSS class.
A practical strategy:
- find result containers that contain links to /koop/ (buy) or /huur/ (rent)
- from those links, build absolute URLs
- also parse nearby text for price/title when present
import re

from bs4 import BeautifulSoup


def absolute(href: str) -> str:
    return href if href.startswith("http") else urljoin(BASE, href)


def parse_search_page(html: str) -> tuple[list[dict], str | None]:
    """Return (listings, next_page_url)."""
    soup = BeautifulSoup(html, "lxml")
    listings: list[dict] = []

    # Heuristic: listing links usually contain /koop/ or /huur/ and end with a slash.
    anchors = soup.select("a[href]")
    for a in anchors:
        href = a.get("href") or ""
        if not href.startswith("/"):
            continue
        if "/koop/" not in href and "/huur/" not in href:
            continue
        if "#" in href:
            continue

        url = absolute(href)

        # Basic de-dup within page; we’ll also de-dup globally.
        listings.append({
            "url": url,
            "card_text": a.get_text(" ", strip=True) or None,
        })

    # De-dup URLs while preserving order
    seen = set()
    uniq = []
    for it in listings:
        if it["url"] in seen:
            continue
        seen.add(it["url"])
        uniq.append(it)

    # Try to find a "next" link. Many sites use rel="next".
    next_link = soup.select_one('a[rel="next"][href]')
    next_url = absolute(next_link.get("href")) if next_link else None

    return uniq, next_url
This “wide net” approach often catches extra navigation links. That’s OK because we’ll validate listing pages in the detail parser.
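A quick sanity check on a live page (a throwaway snippet, assuming fetch() from Step 1 is in scope) shows what the wide net catches:

html = fetch("https://www.funda.nl/koop/amsterdam/")  # use your own start URL
cards, next_url = parse_search_page(html)

print("candidates:", len(cards))
for card in cards[:5]:
    print("-", card["url"])
print("next page:", next_url)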
Step 4: Parse a listing detail page
On detail pages you want stable fields. Commonly available:
- full address
- price
- living area (m²)
- plot area (m²)
- number of rooms
- energy label
- agent/broker
We’ll implement parsing defensively:
- Try a few selectors
- If missing, keep None
from bs4 import BeautifulSoup


def text_or_none(el) -> str | None:
    if not el:
        return None
    t = el.get_text(" ", strip=True)
    return t if t else None


def parse_listing_detail(html: str, url: str) -> dict:
    soup = BeautifulSoup(html, "lxml")

    # Examples of robust-ish strategies:
    # - use <h1> for title/address
    # - find price by searching for currency-like patterns
    h1 = soup.select_one("h1")
    title = text_or_none(h1)

    # Price: look for elements that contain €
    price = None
    for el in soup.select("*[class], *[data-test], p, span, div"):
        t = el.get_text(" ", strip=True)
        if "€" in t and len(t) < 60:
            price = t
            break

    # Key-value features: many property sites show a definition list or a table.
    features = {}

    # Definition lists
    for dl in soup.select("dl"):
        dts = dl.select("dt")
        dds = dl.select("dd")
        if len(dts) != len(dds) or len(dts) == 0:
            continue
        for dt, dd in zip(dts, dds):
            k = dt.get_text(" ", strip=True)
            v = dd.get_text(" ", strip=True)
            if k and v:
                features[k] = v

    # Fallback: table rows
    for tr in soup.select("table tr"):
        tds = tr.select("td")
        if len(tds) >= 2:
            k = tds[0].get_text(" ", strip=True)
            v = tds[1].get_text(" ", strip=True)
            if k and v and k not in features:
                features[k] = v

    return {
        "url": url,
        "title": title,
        "price": price,
        "features": features,
    }
This parser is intentionally “generic” because Funda’s markup can change and may differ across listing types.
If you want very stable extraction, a better long-term strategy is to:
- locate the JSON embedded in the page (often in a <script type="application/ld+json"> tag or a Next.js state script)
- parse it and use its fields
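A minimal sketch of that approach (it assumes the page actually ships JSON-LD; which keys come back depends on the listing type, so treat the output as raw material, not a schema):

import json


def extract_ld_json(soup: BeautifulSoup) -> list[dict]:
    """Collect every JSON-LD blob on the page; callers pick the fields they need."""
    blobs: list[dict] = []
    for script in soup.select('script[type="application/ld+json"]'):
        raw = script.string or script.get_text()
        if not raw:
            continue
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed or truncated blob; skip it
        # A single script tag can hold one object or a list of them.
        blobs.extend(data if isinstance(data, list) else [data])
    return blobs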
Step 5: Crawl search → pagination → details
Now we wire it all together with safety controls:
- max_pages to avoid infinite loops
- max_listings to cap run size
- de-dup URLs
import json


def crawl(start_url: str, max_pages: int = 10, max_listings: int = 200) -> list[dict]:
    out: list[dict] = []
    seen_urls: set[str] = set()

    url = start_url
    pages = 0

    while url and pages < max_pages and len(out) < max_listings:
        pages += 1
        print(f"[search] page {pages}: {url}")

        html = fetch(url)
        listings, next_url = parse_search_page(html)
        print(f" found {len(listings)} candidate links")

        for it in listings:
            if len(out) >= max_listings:
                break

            u = it["url"]
            if u in seen_urls:
                continue
            seen_urls.add(u)

            polite_sleep()
            try:
                detail_html = fetch(u)
            except Exception as e:
                print(" [detail] failed", u, e)
                continue

            data = parse_listing_detail(detail_html, u)

            # Minimal validation: if we can’t find a title, it might not be a listing.
            if not data.get("title"):
                continue

            out.append(data)
            print(f" [detail] ok: {data.get('title')}")

        url = next_url
        polite_sleep(1.0, 2.5)

    return out


if __name__ == "__main__":
    START_URL = "https://www.funda.nl/koop/amsterdam/"  # replace with your filtered URL

    rows = crawl(START_URL, max_pages=5, max_listings=50)
    print("listings:", len(rows))

    with open("funda_listings.jsonl", "w", encoding="utf-8") as f:
        for r in rows:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")

    print("wrote funda_listings.jsonl")
Making selectors actually stable
A quick checklist that saves hours:
- Prefer attributes like data-* when present (sites use them for testing)
- Prefer semantic tags: h1, dl dt/dd, script[type="application/ld+json"]
- Avoid long “CSS class chains” that look auto-generated
- Validate by spot-checking 3–5 pages
- Log missing fields so you know what broke
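One cheap way to cover the last two items is to count how often each field comes back empty per run. A rough sketch (the helper name and field list are illustrative; it takes the rows list produced by crawl()):

from collections import Counter


def report_missing_fields(rows: list[dict]) -> None:
    """Print how many rows are missing each top-level field, so selector breakage is visible."""
    missing = Counter()
    for row in rows:
        for field in ("title", "price", "features"):
            if not row.get(field):
                missing[field] += 1
    for field, count in missing.most_common():
        print(f"missing {field}: {count}/{len(rows)}")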
Where ProxiesAPI fits
Real-estate crawls tend to:
- hit the same domain repeatedly
- trigger rate limits
- face geo restrictions
ProxiesAPI is useful when you:
- increase crawl speed and parallelism
- crawl multiple cities and filters
- run scheduled refreshes (daily/weekly)
Keep the integration limited to the fetch() layer so you can swap proxy settings, rotate identities, or add retries without rewriting your parsers.
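Concretely, that means the only edit is inside fetch() itself, for example by switching the illustrative Option B pattern from Step 1 on via an environment variable; every parser keeps calling fetch(url) unchanged:

def fetch(url: str) -> str:
    """Same signature as before; proxy settings are read from the environment."""
    proxy = os.environ.get("PROXIESAPI_HTTP_PROXY")  # illustrative variable name
    proxies = {"http": proxy, "https": proxy} if proxy else None
    r = session.get(url, headers=DEFAULT_HEADERS, proxies=proxies, timeout=TIMEOUT)
    r.raise_for_status()
    return r.text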
QA checklist
- Search parsing finds listing URLs (print first 5)
- Pagination stops correctly (no infinite loop)
- Detail parsing returns non-empty title for most rows
- Output file is valid JSONL
- Run size capped with max_pages / max_listings
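The JSONL and title checks are easy to automate with a throwaway validation snippet:

import json

total = 0
with_title = 0
with open("funda_listings.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)  # raises if any line is not valid JSON
        total += 1
        if row.get("title"):
            with_title += 1
print(f"{with_title}/{total} rows have a title")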
Next upgrades
- Extract structured JSON embedded in listing pages (often more stable than HTML)
- Store runs in SQLite with last_seen timestamps
- Add incremental crawling (only new listings)
- Add concurrency with httpx + asyncio once your network layer is stable
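As a taste of that last upgrade, here is a sketch only: it assumes you have swapped requests for httpx, reuse DEFAULT_HEADERS from Step 1, and have re-checked your rate limits before adding any parallelism.

import asyncio

import httpx


async def fetch_many(urls: list[str], concurrency: int = 5) -> list[str]:
    """Fetch several detail pages concurrently, capped by a semaphore."""
    sem = asyncio.Semaphore(concurrency)

    async with httpx.AsyncClient(headers=DEFAULT_HEADERS, timeout=30) as client:
        async def one(url: str) -> str:
            async with sem:
                r = await client.get(url)
                r.raise_for_status()
                return r.text

        return await asyncio.gather(*(one(u) for u in urls))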
Funda pages can be sensitive to repetitive traffic and geo/rate limits. ProxiesAPI helps you keep the network layer stable as you scale search pages + detail pages to thousands of listings.