Scrape Marktplaats Listings with Python (Search + Pagination + CSV Export)

May 12, 2026 · tutorial · #python, #marktplaats, #web-scraping, #beautifulsoup, #requests, #csv

Marktplaats (the Netherlands’ biggest classifieds marketplace) is a great real‑world scraping target because:

results are list-based and repeatable
pagination is explicit
listing cards contain the exact fields you want (title, price, location)

In this tutorial we’ll build a practical scraper in Python that:

fetches a Marktplaats search results page
parses listing cards (title, price, location, url)
follows pagination for N pages
exports to CSV
optionally routes requests through ProxiesAPI for stability

Keep Marktplaats scraping stable with ProxiesAPI

As you move from one search page to dozens (and from one keyword to many), the network layer becomes your bottleneck. ProxiesAPI gives you a simple fetch URL wrapper so your Python extraction code stays focused on parsing — not blocks and flaky responses.


What we’re scraping (page structure)

You normally start a Marktplaats search from the site’s search box, but every search resolves to a plain URL that contains the query.

Example (illustrative):

https://www.marktplaats.nl/q/iphone/
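If you want to generate search URLs for several keywords, a small helper keeps the encoding in one place. This is a minimal sketch that assumes the /q/<keyword>/ pattern from the illustrative URL above; build_search_url is a hypothetical name, and how Marktplaats itself encodes spaces is not confirmed here, so compare the output with the URL your browser shows for the same search.

```python
from urllib.parse import quote

BASE = "https://www.marktplaats.nl"


def build_search_url(keyword: str) -> str:
    """Build a search URL, ASSUMING the /q/<keyword>/ pattern holds."""
    # Collapse runs of whitespace, then percent-encode the keyword so that
    # spaces and non-ASCII characters are safe inside a URL path segment.
    normalized = " ".join(keyword.split())
    return f"{BASE}/q/{quote(normalized, safe='')}/"


print(build_search_url("iphone"))
print(build_search_url("  racefiets  56 cm "))
```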

On the search results page you’ll usually see:

a repeating “card” per listing (title, price, location)
a link to the listing detail page
pagination controls (next page)

Because Marktplaats can change its HTML and can vary by category, the right approach is to inspect the page and write selectors that match actual attributes (and to keep a couple of fallbacks).

Setup

Create a virtual environment and install dependencies. The code below uses type hints such as str | None, so it needs Python 3.10 or newer:

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We’ll use:

requests for HTTP
BeautifulSoup (with the lxml parser) for HTML parsing
csv (stdlib) for export

Step 1: Fetch HTML with timeouts (and an explicit User-Agent)

A surprising number of scrapers “work” until they hang. Always set timeouts.

import requests

TIMEOUT = (10, 30)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
    "Accept-Language": "en-US,en;q=0.9",
})


def fetch_html(url: str) -> str:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r.text


url = "https://www.marktplaats.nl/q/iphone/"
html = fetch_html(url)
print("chars:", len(html))  # len() of a str counts characters, not bytes
print(html[:200])

Terminal sanity check

curl -s "https://www.marktplaats.nl/q/iphone/" | head -n 5

Step 2: Parse listing cards (title, price, location, url)

Marktplaats HTML is not guaranteed to stay stable, so we’ll parse in three steps:

find candidate card containers
within each card, find the main link + visible title
extract price and location if present

We’ll also normalize whitespace and build absolute URLs.

from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "https://www.marktplaats.nl"


def clean(text: str | None) -> str | None:
    if not text:
        return None
    t = " ".join(text.split())
    return t or None


def parse_listings(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    listings: list[dict] = []

    # Heuristic: search result pages tend to have many links to ".../a/..." style paths.
    # We'll look for anchors that look like listing links and climb to a card container.
    anchors = soup.select("a[href]")

    seen_urls = set()

    for a in anchors:
        href = a.get("href") or ""
        # Marktplaats listing URLs often contain "/a/". This is a heuristic.
        if "/a/" not in href:
            continue

        url = urljoin(BASE, href)
        if url in seen_urls:
            continue

        title = clean(a.get_text(" ", strip=True))
        if not title or len(title) < 6:
            # too short; often nav/ads
            continue

        # Try to locate a reasonable container to search for price/location nearby.
        card = a
        for _ in range(5):
            if not card or not getattr(card, "name", None):
                break
            if card.name in {"article", "li", "div"}:
                # stop at a common container
                break
            card = card.parent

        container = card if card else a

        text_blob = clean(container.get_text(" ", strip=True)) or ""

        # Price heuristic: use the first token that contains the € symbol.
        price = None
        tokens = text_blob.split()
        for i, token in enumerate(tokens):
            if "€" not in token:
                continue
            if token != "€":
                # "€250": symbol and amount are one token
                price = token
            elif i + 1 < len(tokens):
                # "€ 250": the amount is the next token
                price = f"€ {tokens[i + 1]}"
            break

        # Location heuristic: cards often show "City · Date", so we assume " · " is
        # the separator and keep the text before it. text_blob is the whole card, so
        # this can include title or price text; tighten it once you've inspected the DOM.
        location = None
        parts = text_blob.split(" · ")
        if len(parts) >= 2:
            location = clean(parts[0])

        listings.append({
            "title": title,
            "price": price,
            "location": location,
            "url": url,
        })
        seen_urls.add(url)

    return listings


listings = parse_listings(html)
print("parsed:", len(listings))
print(listings[0] if listings else None)

Why this approach?

For a tutorial that survives minor site changes, it’s better to:

anchor on “listing links” (the most stable concept)
extract nearby text
keep heuristics minimal and transparent

If you need perfect extraction (e.g., separating “price” from “bidding” labels), the next step is to inspect the DOM and tighten selectors for the specific page layout you’re targeting.

Step 3: Pagination (crawl N result pages)

The cleanest pagination strategy is:

start from a search URL
parse a “next page” link, follow it
stop after max_pages or when no next link exists

from urllib.parse import urlparse, urlunparse, parse_qs, urlencode


def find_next_page_url(current_url: str, html: str) -> str | None:
    soup = BeautifulSoup(html, "lxml")

    # Attempt 1: rel=next (best case)
    ln = soup.select_one("link[rel='next'][href]")
    if ln:
        return urljoin(BASE, ln.get("href"))

    # Attempt 2: an anchor explicitly marked rel=next
    a = soup.select_one("a[rel='next'][href]")
    if a:
        return urljoin(BASE, a.get("href"))

    # Fallback: if no explicit next link exists, try incrementing a common query param.
    # Some sites use ?p=2 or ?page=2. We'll only do this if a param exists already.
    parsed = urlparse(current_url)
    qs = parse_qs(parsed.query)
    for key in ("p", "page"):
        if key in qs:
            try:
                n = int(qs[key][0])
                qs[key] = [str(n + 1)]
                new_query = urlencode(qs, doseq=True)
                return urlunparse(parsed._replace(query=new_query))
            except ValueError:
                continue  # value was not an integer; try the next key

    return None


def crawl_search(start_url: str, max_pages: int = 3) -> list[dict]:
    all_rows: list[dict] = []
    seen = set()

    url = start_url
    for page in range(1, max_pages + 1):
        html = fetch_html(url)
        batch = parse_listings(html)

        for row in batch:
            u = row.get("url")
            if not u or u in seen:
                continue
            seen.add(u)
            all_rows.append(row)

        print(f"page={page} url={url} batch={len(batch)} total={len(all_rows)}")

        nxt = find_next_page_url(url, html)
        if not nxt:
            break
        url = nxt

    return all_rows


rows = crawl_search("https://www.marktplaats.nl/q/iphone/", max_pages=5)
print("total unique:", len(rows))

Step 4: Export to CSV

import csv


def export_csv(rows: list[dict], path: str) -> None:
    fieldnames = ["title", "price", "location", "url"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow({k: r.get(k) for k in fieldnames})


export_csv(rows, "marktplaats_listings.csv")
print("wrote marktplaats_listings.csv", len(rows))
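Because prices contain the € symbol, a plain UTF-8 file can show garbled characters when opened directly in some spreadsheet apps. If that happens, one common workaround is to write the file with a byte order mark. The variant below is a sketch of that idea; export_csv_for_excel is a hypothetical name and it keeps the same four columns as export_csv.

```python
import csv


def export_csv_for_excel(rows: list[dict], path: str) -> None:
    """Write the same columns as export_csv, prefixed with a UTF-8 BOM."""
    fieldnames = ["title", "price", "location", "url"]
    # "utf-8-sig" prepends a byte order mark, which spreadsheet apps can use
    # to detect that the file is UTF-8.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        # extrasaction="ignore" skips keys that are not in fieldnames;
        # missing keys and None values are written as empty cells.
        w = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        w.writeheader()
        w.writerows(rows)


# Usage (assumes `rows` from crawl_search):
# export_csv_for_excel(rows, "marktplaats_listings_excel.csv")
```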

Step 5: Route fetches through ProxiesAPI (optional)

When you scale scraping (more pages, more keywords, more frequent runs), failures come from the network layer:

intermittent timeouts
inconsistent responses
blocked requests

With ProxiesAPI you can fetch through a simple URL wrapper:

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://www.marktplaats.nl/q/iphone/" | head

In Python, wrap any target URL and reuse the same parsing code:

from urllib.parse import quote


def proxiesapi_wrap(target_url: str, api_key: str) -> str:
    # ProxiesAPI uses a simple querystring wrapper.
    # Keep the target URL URL-encoded.
    return f"http://api.proxiesapi.com/?key={api_key}&url={quote(target_url, safe='')}"


API_KEY = "API_KEY"
start = "https://www.marktplaats.nl/q/iphone/"
wrapped = proxiesapi_wrap(start, API_KEY)

html = fetch_html(wrapped)
print("chars via ProxiesAPI:", len(html))

Notice the win: your parser doesn’t change. Only the fetch URL changes.
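Whichever fetch URL you use, intermittent timeouts are usually easier to absorb with a small retry wrapper than to handle in the parser. The sketch below is one possible shape, not part of ProxiesAPI: fetch_with_retries is a hypothetical helper that takes any fetch function (such as fetch_html), and the attempt count and delays are illustrative defaults.

```python
import random
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url), retrying with exponential backoff on any exception."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception:
            # In real code, narrow this to requests.RequestException.
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Base delay doubles each attempt (1s, 2s, 4s, ...), plus jitter
            # so that parallel workers do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


# Usage (assumes fetch_html and wrapped from the steps above):
# html = fetch_with_retries(fetch_html, wrapped)
```

Passing the fetch function in as an argument keeps the wrapper independent of requests, so the same code works for direct and proxied URLs.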