Scrape Marktplaats.nl Listings with Python (search + pagination + price extraction)
Marktplaats.nl is one of the biggest classifieds marketplaces in the Netherlands. It’s a great scraping target because:
- search results are rich (title, price, location, seller type)
- pagination is explicit
- many categories have consistent listing cards
In this guide we’ll build a practical Marktplaats search scraper in Python that:
- fetches search pages (with timeouts + retries)
- parses listing cards with real CSS selectors
- follows pagination until a limit
- normalizes prices
- exports results to CSV

Marketplaces rate-limit fast. ProxiesAPI helps you rotate IPs and keep a consistent fetch layer so your crawler doesn’t fall apart when you scale beyond a few pages.
What we’re scraping (site structure)
Marktplaats search results live under URLs like:
https://www.marktplaats.nl/q/<query>/
You’ll typically see:
- a grid/list of listing cards
- each card has a link to the detail page
- pagination controls near the bottom
Quick sanity check
```shell
curl -I "https://www.marktplaats.nl/q/iphone/" | head
```
If you get HTML and can view the page in a normal browser, you can usually scrape it with standard HTML parsing.
Setup
```shell
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml tenacity
```
We’ll use:
- `requests` for HTTP
- `BeautifulSoup` (with `lxml`) for robust parsing
- `tenacity` for retries with backoff
ProxiesAPI integration (network layer)
You have two common patterns:
- Direct fetch (no proxy) — good for small experiments
- Proxy-backed fetch — better for repeatable crawls and avoiding rate limits
Below is a thin “fetch client” that can be configured either way.
Replace `PROXIESAPI_PROXY_URL` with the proxy endpoint you use from ProxiesAPI (or however your account is configured). The rest of the scraper stays the same.
```python
import os
import random
import time
from typing import Optional

import requests
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

BASE = "https://www.marktplaats.nl"
TIMEOUT = (10, 30)  # connect, read

# Example: http://USER:PASS@gateway.proxiesapi.com:PORT
PROXIESAPI_PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")

session = requests.Session()

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9,nl;q=0.8",
    "Connection": "keep-alive",
}


def _proxy_dict() -> Optional[dict]:
    if not PROXIESAPI_PROXY_URL:
        return None
    return {
        "http": PROXIESAPI_PROXY_URL,
        "https": PROXIESAPI_PROXY_URL,
    }


@retry(
    reraise=True,
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=20),
    retry=retry_if_exception_type((requests.RequestException,)),
)
def fetch(url: str) -> str:
    """Fetch HTML with timeouts + retries.

    If PROXIESAPI_PROXY_URL is set, requests will go through the proxy.
    """
    proxies = _proxy_dict()
    # light jitter to look less bot-like
    time.sleep(random.uniform(0.4, 1.2))
    r = session.get(
        url,
        headers=DEFAULT_HEADERS,
        timeout=TIMEOUT,
        proxies=proxies,
    )
    # If you get 403/429, slow down and/or use proxies
    r.raise_for_status()
    return r.text
```
Step 1: Build search URLs and handle pagination
Marktplaats search uses a query “q” path form. We’ll start with something simple:
- query string becomes:
https://www.marktplaats.nl/q/<query>/
Then we’ll discover and follow pagination links.
```python
from urllib.parse import quote


def search_url(query: str) -> str:
    q = quote(query.strip())
    return f"{BASE}/q/{q}/"
```
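Since the URL builder is tiny, here it is inline with a quick encoding check. This just confirms that multi-word queries come out percent-encoded with `urllib.parse.quote` defaults:

```python
from urllib.parse import quote

BASE = "https://www.marktplaats.nl"


def search_url(query: str) -> str:
    # strip stray whitespace, then percent-encode the query for the path
    q = quote(query.strip())
    return f"{BASE}/q/{q}/"


print(search_url("iphone 13 pro"))
# https://www.marktplaats.nl/q/iphone%2013%20pro/
```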
Step 2: Parse listing cards (selectors that survive)
Marktplaats HTML can evolve. The safest approach is:
- select cards broadly
- extract fields from within each card
- be tolerant of missing elements
Below is a parser that targets common “card-like” anchors and typical sub-elements.
```python
import re

from bs4 import BeautifulSoup


def clean_text(s: str | None) -> str | None:
    if not s:
        return None
    t = re.sub(r"\s+", " ", s).strip()
    return t or None


def parse_price(text: str | None) -> dict:
    """Parse common price strings.

    Returns:
        {"raw": "€ 120", "amount": 120.0, "currency": "EUR"}
    """
    raw = clean_text(text)
    if not raw:
        return {"raw": None, "amount": None, "currency": None}
    # Examples you may encounter:
    # - "€ 120,00"
    # - "€ 120"
    # - "Bieden" (bid)
    # - "Gratis" (free)
    if raw.lower() in {"bieden", "gratis"}:
        return {"raw": raw, "amount": 0.0 if raw.lower() == "gratis" else None, "currency": "EUR"}
    m = re.search(r"€\s*([0-9\.]+)(?:,([0-9]{2}))?", raw)
    if not m:
        return {"raw": raw, "amount": None, "currency": "EUR" if "€" in raw else None}
    whole = m.group(1).replace(".", "")
    cents = m.group(2) or "0"
    try:
        amount = float(f"{int(whole)}.{int(cents):02d}")
    except ValueError:
        amount = None
    return {"raw": raw, "amount": amount, "currency": "EUR"}
```
```python
def parse_search_page(html: str) -> tuple[list[dict], str | None]:
    soup = BeautifulSoup(html, "lxml")
    listings = []
    # Strategy:
    # Many marketplace result cards are wrapped in <a ... href="/v/...">...
    # We filter for anchors that look like item detail URLs.
    for a in soup.select("a[href]"):
        href = a.get("href") or ""
        if not href.startswith("/v/"):
            continue
        title = clean_text(a.get_text(" ", strip=True))
        if not title or len(title) < 8:
            continue
        url = href if href.startswith("http") else f"{BASE}{href}"
        # Try to locate price and location inside the anchor/card
        # These selectors are intentionally broad.
        price_el = a.select_one("[class*='price'], [data-testid*='price']")
        location_el = a.select_one("[class*='location'], [data-testid*='location']")
        seller_el = a.select_one("[class*='seller'], [data-testid*='seller']")
        price = parse_price(price_el.get_text(" ", strip=True) if price_el else None)
        listings.append(
            {
                "title": title,
                "url": url,
                "price_raw": price["raw"],
                "price_amount": price["amount"],
                "currency": price["currency"],
                "location": clean_text(location_el.get_text(" ", strip=True) if location_el else None),
                "seller": clean_text(seller_el.get_text(" ", strip=True) if seller_el else None),
            }
        )
    # Pagination: look for rel="next" if present, else find an anchor with "Volgende".
    next_url = None
    rel_next = soup.select_one("link[rel='next']")
    if rel_next and rel_next.get("href"):
        href = rel_next.get("href")
        next_url = href if href.startswith("http") else f"{BASE}{href}"
    if not next_url:
        # :-soup-contains is the soupsieve spelling; the older :contains alias is gone
        next_a = soup.select_one(
            "a[rel='next'], a[aria-label*='Volgende'], a:has(span:-soup-contains('Volgende'))"
        )
        if next_a and next_a.get("href"):
            href = next_a.get("href")
            next_url = href if href.startswith("http") else f"{BASE}{href}"
    return listings, next_url
```
Important: Marktplaats can change classes/attributes, and may render pieces dynamically. If you see lots of missing prices, you have 3 options:
- use Playwright (headless browser) to render JS
- look for JSON embedded in the HTML (common in modern apps)
- scrape the detail pages where price is more stable
We’ll keep this tutorial HTML-first (fast + cheap).
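If the cards come back empty, option 2 (embedded JSON) is usually the cheapest fix. A minimal sketch of pulling `application/ld+json` blobs out of a page; the sample HTML here is illustrative, not real Marktplaats markup, and the stdlib `html.parser` backend is enough for this job:

```python
import json

from bs4 import BeautifulSoup


def extract_json_ld(html: str) -> list[dict]:
    """Return every parseable application/ld+json blob in the page."""
    soup = BeautifulSoup(html, "html.parser")
    blobs = []
    for script in soup.select("script[type='application/ld+json']"):
        try:
            blobs.append(json.loads(script.string or ""))
        except json.JSONDecodeError:
            continue  # partial or malformed blobs are common; skip them
    return blobs


# illustrative sample page, not real Marktplaats markup
sample = """<html><head>
<script type="application/ld+json">{"@type": "Product", "offers": {"price": "120"}}</script>
</head></html>"""
print(extract_json_ld(sample))
```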
Step 3: Crawl N pages and dedupe listings
```python
import csv


def crawl_search(query: str, max_pages: int = 5) -> list[dict]:
    url = search_url(query)
    all_rows: list[dict] = []
    seen_urls: set[str] = set()
    for page in range(1, max_pages + 1):
        html = fetch(url)
        rows, next_url = parse_search_page(html)
        added = 0
        for r in rows:
            u = r.get("url")
            if not u or u in seen_urls:
                continue
            seen_urls.add(u)
            all_rows.append(r)
            added += 1
        print(f"page={page} fetched={url} parsed={len(rows)} added={added} total={len(all_rows)}")
        if not next_url:
            break
        url = next_url
    return all_rows


def export_csv(rows: list[dict], path: str) -> None:
    fieldnames = [
        "title",
        "url",
        "price_raw",
        "price_amount",
        "currency",
        "location",
        "seller",
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow(r)


if __name__ == "__main__":
    data = crawl_search("iphone", max_pages=3)
    export_csv(data, "marktplaats_iphone.csv")
    print("wrote", len(data), "rows")
```
Troubleshooting (403 / 429 / empty cards)
1) You get blocked quickly
- slow down (add 1–3s jitter)
- reduce concurrency
- use ProxiesAPI rotation (`PROXIESAPI_PROXY_URL`)
- persist cookies (`requests.Session()` already helps)
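On 429s specifically, honoring a `Retry-After` header (when the server sends one) before retrying beats blind backoff. A sketch of the parsing side, handling only the seconds form of the header; the sample values are illustrative:

```python
def retry_after_seconds(headers: dict, default: float = 5.0) -> float:
    """Parse a Retry-After header (seconds form); fall back to a default."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(float(value), 0.0)
    except ValueError:
        return default  # the HTTP-date form is not handled in this sketch


print(retry_after_seconds({"Retry-After": "12"}))  # 12.0
print(retry_after_seconds({}))                     # 5.0
```

You would call this when `r.status_code == 429` and `time.sleep()` the result before the next attempt.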
2) Missing fields (price/location)
Modern UIs sometimes inject key data via JS. If the HTML doesn’t contain the data you need:
- inspect the page source for embedded JSON (search for `__NEXT_DATA__`, `application/ld+json`, or big JSON blobs)
- or switch to a rendering approach (Playwright)
3) Duplicates across pages
Always dedupe by URL or by a stable item id.
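Marktplaats detail URLs typically embed an item id (an `m` followed by digits). If that holds for the pages you crawl, keying the dedupe on that id is more robust than the full URL, which can vary with tracking parameters. A hedged sketch; the URL shape in the regex and the example is an assumption, and we fall back to the raw URL when no id is found:

```python
import re

# assumed URL shape, e.g. .../v/telefoons/m1234567890-some-title
ITEM_ID_RE = re.compile(r"/(m\d+)-")


def listing_key(url: str) -> str:
    """Dedupe key: the item id when the URL matches the assumed shape, else the URL."""
    m = ITEM_ID_RE.search(url)
    return m.group(1) if m else url


print(listing_key("https://www.marktplaats.nl/v/telefoons/m1234567890-iphone-13"))
# m1234567890
```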
QA checklist
- Can fetch page HTML reliably (timeouts + retries)
- Extracts at least title + URL for most cards
- Pagination increases total unique items
- CSV writes cleanly
- Proxy toggle works via environment variable
Where ProxiesAPI fits (honestly)
For a few pages, Marktplaats may work without proxies.
But when you:
- crawl multiple queries
- scrape daily
- follow detail pages
…your request volume climbs fast. ProxiesAPI helps you keep the fetch layer stable and reduces the odds of your crawler getting shut down mid-run.