Scrape Marktplaats.nl Listings with Python (search + pagination + price extraction)

Marktplaats.nl is one of the biggest classifieds marketplaces in the Netherlands. It’s a great scraping target because:

  • search results are rich (title, price, location, seller type)
  • pagination is explicit
  • many categories have consistent listing cards

In this guide we’ll build a practical Marktplaats search scraper in Python that:

  • fetches search pages (with timeouts + retries)
  • parses listing cards with real CSS selectors
  • follows pagination until a limit
  • normalizes prices
  • exports results to CSV

Marktplaats search results page we’ll scrape (cards + prices + pagination)

Keep Marktplaats scrapes stable with ProxiesAPI

Marketplaces rate-limit fast. ProxiesAPI helps you rotate IPs and keep a consistent fetch layer so your crawler doesn’t fall apart when you scale beyond a few pages.


What we’re scraping (site structure)

Marktplaats search results live under URLs like:

  • https://www.marktplaats.nl/q/<query>/

You’ll typically see:

  • a grid/list of listing cards
  • each card has a link to the detail page
  • pagination controls near the bottom

Quick sanity check

curl -I "https://www.marktplaats.nl/q/iphone/" | head

If you get HTML and can view the page in a normal browser, you can usually scrape it with standard HTML parsing.


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml tenacity

We’ll use:

  • requests for HTTP
  • BeautifulSoup (with the lxml parser) for robust parsing
  • tenacity for retries with backoff

ProxiesAPI integration (network layer)

You have two common patterns:

  1. Direct fetch (no proxy) — good for small experiments
  2. Proxy-backed fetch — better for repeatable crawls and avoiding rate limits

Below is a thin “fetch client” that can be configured either way.

Replace PROXIESAPI_PROXY_URL with the proxy endpoint you use from ProxiesAPI (or however your account is configured). The rest of the scraper stays the same.

import os
import random
import time
from typing import Optional

import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

BASE = "https://www.marktplaats.nl"
TIMEOUT = (10, 30)  # connect, read

# Example: http://USER:PASS@gateway.proxiesapi.com:PORT
PROXIESAPI_PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")

session = requests.Session()

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9,nl;q=0.8",
    "Connection": "keep-alive",
}


def _proxy_dict() -> Optional[dict]:
    if not PROXIESAPI_PROXY_URL:
        return None
    return {
        "http": PROXIESAPI_PROXY_URL,
        "https": PROXIESAPI_PROXY_URL,
    }


@retry(
    reraise=True,
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=20),
    retry=retry_if_exception_type((requests.RequestException,)),
)
def fetch(url: str) -> str:
    """Fetch HTML with timeouts + retries.

    If PROXIESAPI_PROXY_URL is set, requests will go through the proxy.
    """
    proxies = _proxy_dict()

    # light jitter to look less bot-like
    time.sleep(random.uniform(0.4, 1.2))

    r = session.get(
        url,
        headers=DEFAULT_HEADERS,
        timeout=TIMEOUT,
        proxies=proxies,
    )

    # If you get 403/429, slow down and/or use proxies
    r.raise_for_status()
    return r.text

Step 1: Build search URLs and handle pagination

Marktplaats search uses a path-based query format. We’ll start with something simple:

  • query string becomes: https://www.marktplaats.nl/q/<query>/

Then we’ll discover and follow pagination links.

from urllib.parse import quote


def search_url(query: str) -> str:
    q = quote(query.strip())
    return f"{BASE}/q/{q}/"
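A quick standalone check of the URL builder (the helper restated with its import so it runs on its own):

```python
from urllib.parse import quote

BASE = "https://www.marktplaats.nl"


def search_url(query: str) -> str:
    # quote() percent-encodes spaces and special characters in the query.
    q = quote(query.strip())
    return f"{BASE}/q/{q}/"


print(search_url("iphone 13 pro"))
# https://www.marktplaats.nl/q/iphone%2013%20pro/
```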

Step 2: Parse listing cards (selectors that survive)

Marktplaats HTML can evolve. The safest approach is:

  • select cards broadly
  • extract fields from within each card
  • be tolerant of missing elements

Below is a parser that targets common “card-like” anchors and typical sub-elements.

import re
from bs4 import BeautifulSoup


def clean_text(s: str | None) -> str | None:
    if not s:
        return None
    t = re.sub(r"\s+", " ", s).strip()
    return t or None


def parse_price(text: str | None) -> dict:
    """Parse common price strings.

    Returns:
      {"raw": "€ 120", "amount": 120.0, "currency": "EUR"}
    """
    raw = clean_text(text)
    if not raw:
        return {"raw": None, "amount": None, "currency": None}

    # Examples you may encounter:
    # - "€ 120,00"
    # - "€ 120"
    # - "Bieden" (bid)
    # - "Gratis" (free)

    if raw.lower() in {"bieden", "gratis"}:
        return {"raw": raw, "amount": 0.0 if raw.lower() == "gratis" else None, "currency": "EUR"}

    m = re.search(r"€\s*([0-9\.]+)(?:,([0-9]{2}))?", raw)
    if not m:
        return {"raw": raw, "amount": None, "currency": "EUR" if "€" in raw else None}

    whole = m.group(1).replace(".", "")
    cents = m.group(2) or "0"
    try:
        amount = float(f"{int(whole)}.{int(cents):02d}")
    except ValueError:
        amount = None

    return {"raw": raw, "amount": amount, "currency": "EUR"}


def parse_search_page(html: str) -> tuple[list[dict], str | None]:
    soup = BeautifulSoup(html, "lxml")

    listings = []

    # Strategy:
    # Many marketplace result cards are wrapped in <a ... href="/v/...">...
    # We filter for anchors that look like item detail URLs.
    for a in soup.select("a[href]"):
        href = a.get("href") or ""
        if not href.startswith("/v/"):
            continue

        title = clean_text(a.get_text(" ", strip=True))
        if not title or len(title) < 8:
            continue

        url = href if href.startswith("http") else f"{BASE}{href}"

        # Try to locate price and location inside the anchor/card
        # These selectors are intentionally broad.
        price_el = a.select_one("[class*='price'], [data-testid*='price']")
        location_el = a.select_one("[class*='location'], [data-testid*='location']")
        seller_el = a.select_one("[class*='seller'], [data-testid*='seller']")

        price = parse_price(price_el.get_text(" ", strip=True) if price_el else None)

        listings.append(
            {
                "title": title,
                "url": url,
                "price_raw": price["raw"],
                "price_amount": price["amount"],
                "currency": price["currency"],
                "location": clean_text(location_el.get_text(" ", strip=True) if location_el else None),
                "seller": clean_text(seller_el.get_text(" ", strip=True) if seller_el else None),
            }
        )

    # Pagination: look for rel="next" if present, else find an anchor with "volgende".
    next_url = None

    rel_next = soup.select_one("link[rel='next']")
    if rel_next and rel_next.get("href"):
        href = rel_next.get("href")
        next_url = href if href.startswith("http") else f"{BASE}{href}"

    if not next_url:
        # Note: ":contains()" is not valid CSS and soupsieve rejects it,
        # so use attribute selectors plus a text-based fallback.
        next_a = soup.select_one("a[rel='next'], a[aria-label*='Volgende']")
        if not next_a:
            next_a = soup.find("a", string=re.compile(r"volgende", re.I))
        if next_a and next_a.get("href"):
            href = next_a.get("href")
            next_url = href if href.startswith("http") else f"{BASE}{href}"

    return listings, next_url

Important: Marktplaats can change classes/attributes, and may render pieces dynamically. If you see lots of missing prices, you have three options:

  • use Playwright (headless browser) to render JS
  • look for JSON embedded in the HTML (common in modern apps)
  • scrape the detail pages where price is more stable

We’ll keep this tutorial HTML-first (fast + cheap).


Step 3: Crawl N pages and dedupe listings

import csv


def crawl_search(query: str, max_pages: int = 5) -> list[dict]:
    url = search_url(query)

    all_rows: list[dict] = []
    seen_urls: set[str] = set()

    for page in range(1, max_pages + 1):
        html = fetch(url)
        rows, next_url = parse_search_page(html)

        added = 0
        for r in rows:
            u = r.get("url")
            if not u or u in seen_urls:
                continue
            seen_urls.add(u)
            all_rows.append(r)
            added += 1

        print(f"page={page} fetched={url} parsed={len(rows)} added={added} total={len(all_rows)}")

        if not next_url:
            break
        url = next_url

    return all_rows


def export_csv(rows: list[dict], path: str) -> None:
    fieldnames = [
        "title",
        "url",
        "price_raw",
        "price_amount",
        "currency",
        "location",
        "seller",
    ]

    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow(r)


if __name__ == "__main__":
    data = crawl_search("iphone", max_pages=3)
    export_csv(data, "marktplaats_iphone.csv")
    print("wrote", len(data), "rows")

Troubleshooting (403 / 429 / empty cards)

1) You get blocked quickly

  • slow down (add 1–3s jitter)
  • reduce concurrency
  • use ProxiesAPI rotation (PROXIESAPI_PROXY_URL)
  • persist cookies (requests.Session() already helps)
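One concrete tactic for 429s: honor the server’s Retry-After header when it’s present, and only then fall back to exponential backoff (a sketch; Marktplaats may or may not send the header):

```python
def backoff_seconds(status_code: int, headers: dict, attempt: int) -> float:
    # Prefer the server's Retry-After hint (in seconds) on a 429;
    # otherwise use capped exponential backoff keyed on the attempt count.
    if status_code == 429:
        retry_after = headers.get("Retry-After")
        if retry_after and retry_after.isdigit():
            return float(retry_after)
    return min(2 ** attempt, 60.0)


print(backoff_seconds(429, {"Retry-After": "7"}, attempt=1))  # 7.0
print(backoff_seconds(403, {}, attempt=3))                    # 8
```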

2) Missing fields (price/location)

Modern UIs sometimes inject key data via JS. If the HTML doesn’t contain the data you need:

  • inspect the page source for embedded JSON (search for __NEXT_DATA__, application/ld+json, or big JSON blobs)
  • or switch to a rendering approach (Playwright)

3) Duplicates across pages

Always dedupe by URL or by a stable item id.
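In code, dedupe on an extracted item id where possible. The m-prefixed numeric id pattern below is an assumption about Marktplaats detail URLs; when it doesn’t match, the full URL is a safe fallback key:

```python
import re


def listing_key(url: str) -> str:
    # Marktplaats detail URLs often embed an id like "m1234567890".
    # The pattern is an assumption -- fall back to the full URL if absent.
    m = re.search(r"\b(m\d{6,})\b", url)
    return m.group(1) if m else url


seen: set[str] = set()
for u in [
    "https://www.marktplaats.nl/v/telefoons/m2012345678-iphone-13",
    "https://www.marktplaats.nl/v/telefoons/m2012345678-iphone-13?utm=x",
]:
    key = listing_key(u)
    if key not in seen:
        seen.add(key)

print(len(seen))  # 1 -- both URLs map to the same listing id
```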


QA checklist

  • Can fetch page HTML reliably (timeouts + retries)
  • Extracts at least title + URL for most cards
  • Pagination increases total unique items
  • CSV writes cleanly
  • Proxy toggle works via environment variable

Where ProxiesAPI fits (honestly)

For a few pages, Marktplaats may work without proxies.

But when you:

  • crawl multiple queries
  • scrape daily
  • follow detail pages

…your request volume climbs fast. ProxiesAPI helps you keep the fetch layer stable and reduces the odds of your crawler getting shut down mid-run.

