Scrape Marktplaats Listings with Python (Search + Pagination + CSV Export)

Marktplaats (the Netherlands’ biggest classifieds marketplace) is a great real‑world scraping target because:

  • results are list-based and repeatable
  • pagination is explicit
  • listing cards contain the exact fields you want (title, price, location)

In this tutorial we’ll build a practical scraper in Python that:

  1. fetches a Marktplaats search results page
  2. parses listing cards (title, price, location, url)
  3. follows pagination for N pages
  4. exports to CSV
  5. optionally routes requests through ProxiesAPI for stability

Marktplaats search results (we’ll scrape listing cards + pagination)

Keep Marktplaats scraping stable with ProxiesAPI

As you move from one search page to dozens (and from one keyword to many), the network layer becomes your bottleneck. ProxiesAPI gives you a simple fetch URL wrapper so your Python extraction code stays focused on parsing rather than on blocks and flaky responses.


What we’re scraping (page structure)

You usually start a Marktplaats search from the site’s search box, but the result is simply a URL that encodes the query.

Example (illustrative):

  • https://www.marktplaats.nl/q/iphone/
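
If you want to generate search URLs for several keywords, a minimal sketch could look like the following. It assumes the `/q/<keyword>/` pattern from the illustrative example holds and that percent-encoding the keyword is accepted; `build_search_url` is a hypothetical helper, so compare its output with the URL your browser actually produces before relying on it.

```python
from urllib.parse import quote

BASE = "https://www.marktplaats.nl"


def build_search_url(keyword: str) -> str:
    # Assumption: searches live under /q/<keyword>/ as in the example URL.
    # Collapse whitespace, then percent-encode so spaces and slashes are safe.
    slug = quote(" ".join(keyword.split()), safe="")
    return f"{BASE}/q/{slug}/"


for kw in ["iphone", "racefiets 56 cm"]:
    print(build_search_url(kw))
```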

On the search results page you’ll usually see:

  • a repeating “card” per listing (title, price, location)
  • a link to the listing detail page
  • pagination controls (next page)

Because Marktplaats can change its HTML and can vary by category, the right approach is to inspect the page and write selectors that match actual attributes (and to keep a couple of fallbacks).


Setup

Create a virtual environment and install dependencies:

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We’ll use:

  • requests for HTTP
  • BeautifulSoup with the lxml parser for fast, lenient HTML parsing
  • csv (stdlib) for export

Step 1: Fetch HTML with timeouts (and a real User-Agent)

A surprising number of scrapers “work” until they hang. Always set timeouts.

import requests

TIMEOUT = (10, 30)  # connect, read

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)",
    "Accept-Language": "en-US,en;q=0.9",
})


def fetch_html(url: str) -> str:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r.text


url = "https://www.marktplaats.nl/q/iphone/"
html = fetch_html(url)
print("chars:", len(html))  # len() of a str counts characters, not bytes
print(html[:200])

Terminal sanity check

curl -s "https://www.marktplaats.nl/q/iphone/" | head -n 5

Step 2: Parse listing cards (title, price, location, url)

Marktplaats HTML is not guaranteed to stay stable, so we’ll parse with a three-step heuristic:

  1. find candidate card containers
  2. within each card, find the main link + visible title
  3. extract price and location if present

We’ll also normalize whitespace and build absolute URLs.

from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "https://www.marktplaats.nl"


def clean(text: str | None) -> str | None:
    if not text:
        return None
    t = " ".join(text.split())
    return t or None


def parse_listings(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    listings: list[dict] = []

    # Heuristic: search result pages tend to have many links to ".../a/..." style paths.
    # We'll look for anchors that look like listing links and climb to a card container.
    anchors = soup.select("a[href]")

    seen_urls = set()

    for a in anchors:
        href = a.get("href") or ""
        # Marktplaats listing URLs often contain "/a/". This is a heuristic.
        if "/a/" not in href:
            continue

        url = urljoin(BASE, href)
        if url in seen_urls:
            continue

        title = clean(a.get_text(" ", strip=True))
        if not title or len(title) < 6:
            # too short; often nav/ads
            continue

        # Try to locate a reasonable container to search for price/location nearby.
        card = a
        for _ in range(5):
            if not card or not getattr(card, "name", None):
                break
            if card.name in {"article", "li", "div"}:
                # stop at a common container
                break
            card = card.parent

        container = card if card else a

        text_blob = clean(container.get_text(" ", strip=True)) or ""

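        # Price heuristic: take the first token that contains the € symbol.
        # Handles both "€250" and "€ 250" (symbol and amount as separate tokens).
        price = None
        tokens = text_blob.split()
        for i, token in enumerate(tokens):
            if "€" in token:
                if token != "€":
                    price = token
                elif i + 1 < len(tokens):
                    # bare "€": the amount is the next token
                    price = f"€ {tokens[i + 1]}"
                break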

        # Location heuristic: Marktplaats cards often show a city + date joined by " · ".
        # This keeps everything before the first " · ", which can include other card
        # text (title, price); tighten it once you have inspected the real markup.
        location = None
        parts = text_blob.split(" · ")
        if len(parts) >= 2:
            location = clean(parts[0])

        listings.append({
            "title": title,
            "price": price,
            "location": location,
            "url": url,
        })
        seen_urls.add(url)

    return listings


listings = parse_listings(html)
print("parsed:", len(listings))
print(listings[0] if listings else None)

Why this approach?

For a tutorial that survives minor site changes, it’s better to:

  • anchor on “listing links” (the most stable concept)
  • extract nearby text
  • keep heuristics minimal and transparent

If you need perfect extraction (e.g., separating “price” from “bidding” labels), the next step is to inspect the DOM and tighten selectors for the specific page layout you’re targeting.
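
As one possible starting point for that tightening, here is a sketch that splits a raw price string into a numeric amount and any leftover label. It assumes Dutch number formatting (`.` for thousands, `,` for decimals, `,-` for whole euros) and that a label such as "Bieden" can appear next to or instead of an amount; `split_price` is a hypothetical helper, not something the site or a library provides.

```python
import re
from typing import Optional

# "€", optional space, digits with optional ".ddd" groups, then ",dd" or ",-".
PRICE_RE = re.compile(r"€\s*(\d+(?:\.\d{3})*)(?:,(\d{1,2})|,-)?")


def split_price(text: Optional[str]) -> tuple[Optional[float], Optional[str]]:
    """Return (amount_in_euros, leftover_label); either part may be None."""
    if not text:
        return None, None
    # Normalize non-breaking spaces and collapse runs of whitespace.
    cleaned = " ".join(text.replace("\xa0", " ").split())
    m = PRICE_RE.search(cleaned)
    if not m:
        # No amount found: treat the whole text as a label (e.g. a bidding label).
        return None, cleaned or None
    whole = m.group(1).replace(".", "")  # drop thousands separators
    cents = m.group(2) or "00"
    amount = float(f"{whole}.{cents}")
    # Whatever surrounds the matched amount is kept as the label.
    rest = " ".join((cleaned[:m.start()] + " " + cleaned[m.end():]).split())
    return amount, rest or None


print(split_price("€ 1.250,00"))
print(split_price("Bieden"))
```

Keeping the label separate means you can still filter or report on "bid only" listings instead of silently dropping them.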


Step 3: Pagination (crawl N result pages)

The cleanest pagination strategy is:

  • start from a search URL
  • parse a “next page” link, follow it
  • stop after max_pages or when no next link exists

from urllib.parse import urlparse, urlunparse, parse_qs, urlencode


def find_next_page_url(current_url: str, html: str) -> str | None:
    soup = BeautifulSoup(html, "lxml")

    # Attempt 1: rel=next (best case)
    ln = soup.select_one("link[rel='next'][href]")
    if ln:
        return urljoin(BASE, ln.get("href"))

    # Attempt 2: an <a> element explicitly marked rel="next"
    a = soup.select_one("a[rel='next'][href]")
    if a:
        return urljoin(BASE, a.get("href"))

    # Fallback: if no explicit next link exists, try incrementing a common query param.
    # Some sites use ?p=2 or ?page=2. We'll only do this if a param exists already.
    parsed = urlparse(current_url)
    qs = parse_qs(parsed.query)
    for key in ("p", "page"):
        if key in qs:
            try:
                n = int(qs[key][0])
                qs[key] = [str(n + 1)]
                new_query = urlencode(qs, doseq=True)
                return urlunparse(parsed._replace(query=new_query))
            except ValueError:
                # non-numeric page value; try the next candidate key
                continue

    return None


def crawl_search(start_url: str, max_pages: int = 3) -> list[dict]:
    all_rows: list[dict] = []
    seen = set()

    url = start_url
    for page in range(1, max_pages + 1):
        html = fetch_html(url)
        batch = parse_listings(html)

        for row in batch:
            u = row.get("url")
            if not u or u in seen:
                continue
            seen.add(u)
            all_rows.append(row)

        print(f"page={page} url={url} batch={len(batch)} total={len(all_rows)}")

        nxt = find_next_page_url(url, html)
        if not nxt:
            break
        url = nxt

    return all_rows


rows = crawl_search("https://www.marktplaats.nl/q/iphone/", max_pages=5)
print("total unique:", len(rows))
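
The loop above stops on `max_pages` or a missing next link, but it would keep refetching if a "next" link pointed back to a page it had already seen, and it does not pause between requests. The sketch below shows one way to add both guards. `iter_pages` is a hypothetical helper; the fetch and next-link functions are passed in, so `fetch_html` and `find_next_page_url` can be plugged in unchanged, and the one-second default delay is an arbitrary assumption.

```python
import time
from typing import Callable, Iterator, Optional


def iter_pages(
    start_url: str,
    fetch: Callable[[str], str],
    find_next: Callable[[str, str], Optional[str]],
    max_pages: int = 3,
    delay: float = 1.0,
) -> Iterator[tuple[str, str]]:
    """Yield (url, html) pairs until max_pages, a missing next link, or a repeated URL."""
    visited: set[str] = set()
    url: Optional[str] = start_url
    for page in range(max_pages):
        if url is None or url in visited:
            break
        if page > 0:
            time.sleep(delay)  # pause between page fetches
        visited.add(url)
        html = fetch(url)
        yield url, html
        url = find_next(url, html)
```

Usage would look like `for url, html in iter_pages(start, fetch_html, find_next_page_url, max_pages=5): ...`, with `parse_listings(html)` and the dedupe step inside the loop body.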

Step 4: Export to CSV

import csv


def export_csv(rows: list[dict], path: str) -> None:
    fieldnames = ["title", "price", "location", "url"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow({k: r.get(k) for k in fieldnames})


export_csv(rows, "marktplaats_listings.csv")
print("wrote marktplaats_listings.csv", len(rows))

Step 5: Route fetches through ProxiesAPI (optional)

When you scale scraping (more pages, more keywords, more frequent runs), failures come from the network layer:

  • intermittent timeouts
  • inconsistent responses
  • blocked requests
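
Independently of any proxy, intermittent failures like these are usually worth retrying with a growing delay. The following is a minimal sketch of exponential backoff with jitter; `with_retries` is a hypothetical helper, and the attempt count, base delay, and jitter range are assumptions to tune for your own runs.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a listed exception wait base_delay * 2**n plus jitter, then retry."""
    for n in range(attempts):
        try:
            return fn()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** n) + random.uniform(0, 0.5))
    raise ValueError("attempts must be at least 1")
```

With the fetch function from Step 1 this could be called as `with_retries(lambda: fetch_html(url), retry_on=(requests.RequestException,))`, so that only network and HTTP errors trigger a retry.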

With ProxiesAPI you can fetch through a simple URL wrapper:

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://www.marktplaats.nl/q/iphone/" | head

In Python, wrap any target URL and reuse the same parsing code:

from urllib.parse import quote


def proxiesapi_wrap(target_url: str, api_key: str) -> str:
    # ProxiesAPI uses a simple querystring wrapper.
    # Keep the target URL URL-encoded.
    return f"http://api.proxiesapi.com/?key={api_key}&url={quote(target_url, safe='')}"


API_KEY = "API_KEY"
start = "https://www.marktplaats.nl/q/iphone/"
wrapped = proxiesapi_wrap(start, API_KEY)

html = fetch_html(wrapped)
print("chars via ProxiesAPI:", len(html))

Notice the win: your parser doesn’t change. Only the fetch URL changes.
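
One caveat when combining this with `crawl_search`: next-page links are resolved against marktplaats.nl, so passing a wrapped start URL would route only the first request through the wrapper. A sketch of one way around that is to keep crawling with the real target URLs and wrap each one just before the HTTP call. `make_fetcher` is a hypothetical helper; `proxiesapi_wrap` is repeated here only so the block runs on its own.

```python
from typing import Callable, Optional
from urllib.parse import quote


def proxiesapi_wrap(target_url: str, api_key: str) -> str:
    # Same wrapper as above, repeated so this block is self-contained.
    return f"http://api.proxiesapi.com/?key={api_key}&url={quote(target_url, safe='')}"


def make_fetcher(
    fetch: Callable[[str], str],
    api_key: Optional[str] = None,
) -> Callable[[str], str]:
    """Return a fetch function that wraps each target URL only when api_key is set."""
    def fetcher(target_url: str) -> str:
        request_url = proxiesapi_wrap(target_url, api_key) if api_key else target_url
        return fetch(request_url)
    return fetcher
```

For example, `fetch_via_proxy = make_fetcher(fetch_html, API_KEY)` can stand in for `fetch_html` inside `crawl_search`, while `find_next_page_url` keeps receiving the original, unwrapped `url`.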


Common pitfalls (and how to avoid them)

  1. No timeouts → crawls hang forever.
  2. Too-specific selectors → break as soon as the site A/B tests layout.
  3. No dedupe → pagination repeats items, exports get messy.
  4. Not saving raw HTML samples → debugging becomes guesswork.

A simple production habit: when a parse returns zero rows, save the HTML to a debug/ folder and inspect it.
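
A minimal sketch of that habit, assuming a local `debug/` folder and a timestamped filename (`save_debug_html` is a hypothetical helper name):

```python
import time
from pathlib import Path


def save_debug_html(html: str, label: str, folder: str = "debug") -> Path:
    """Write html to <folder>/<label>-<unix timestamp>.html and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Keep the filename filesystem-safe: alphanumerics, dash, underscore only.
    safe_label = "".join(c if c.isalnum() or c in "-_" else "_" for c in label)
    path = out_dir / f"{safe_label}-{int(time.time())}.html"
    path.write_text(html, encoding="utf-8")
    return path
```

Inside `crawl_search` this could be called as `if not batch: save_debug_html(html, f"page{page}")`, so every empty page leaves a sample behind to inspect.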


QA checklist

  • First page returns a non-zero count of listings
  • URLs are absolute and unique
  • Pagination increases total rows
  • CSV opens cleanly in Excel/Sheets
  • ProxiesAPI wrapper fetch returns HTML (even if you don’t always need it)
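
Several of these checks can be automated. The sketch below covers the non-zero count, absolute and unique URLs, and a CSV round trip in memory; `check_rows` and `csv_round_trips` are hypothetical helpers that assume rows shaped like the dicts produced by `parse_listings`.

```python
import csv
import io
from urllib.parse import urlparse


def check_rows(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if not rows:
        problems.append("no rows parsed")
    urls = [r.get("url") or "" for r in rows]
    for u in urls:
        p = urlparse(u)
        if p.scheme not in ("http", "https") or not p.netloc:
            problems.append(f"not an absolute URL: {u!r}")
    if len(set(urls)) != len(urls):
        problems.append("duplicate URLs found")
    return problems


def csv_round_trips(rows: list[dict], fieldnames: list[str]) -> bool:
    """Write rows to an in-memory CSV and confirm the same number of rows reads back."""
    buf = io.StringIO(newline="")
    w = csv.DictWriter(buf, fieldnames=fieldnames)
    w.writeheader()
    w.writerows({k: r.get(k) for k in fieldnames} for r in rows)
    buf.seek(0)
    return len(list(csv.DictReader(buf))) == len(rows)
```

Running `check_rows(rows)` right before `export_csv` gives a quick pass/fail signal without opening the file by hand.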