Scrape Marktplaats Seller Listings and Prices with Python
If you already know which seller you care about, scraping a seller page is often better than scraping broad search results.
A seller page gives you:
- a tighter inventory list
- a cleaner pricing view
- listing URLs tied to one merchant or store
- useful seller metadata in the page source
In this guide we'll scrape a real Marktplaats seller page and collect:
- seller name
- listing title
- listing price
- listing URL
- city or location when present

Marktplaats seller pages work over plain HTTP, but larger inventory crawls run into failed requests that need retries, throttling, and IP reputation problems. ProxiesAPI helps keep the fetch step predictable while your parser stays simple.
What we're scraping
Marktplaats seller pages usually look like this:
https://www.marktplaats.nl/u/fietshokje/11746360/
For this walkthrough we'll use a seller inventory page that exposes multiple bike listings in the HTML:
https://www.marktplaats.nl/u/fietshokje-groningen/11746360/q/fiets/
In the live response, you can verify that these useful fields are embedded directly in the HTML:
- sellerName
- priceInfo
- vipUrl
- sellerId
That means we can scrape this page without rendering it in a browser, at least for a first pass of inventory data.
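Before writing a parser, it can help to confirm those markers are actually present in the response you received, since an error or consent page would not contain them. A minimal sketch: EXPECTED_FIELDS and missing_fields are hypothetical names, the check just looks for each quoted JSON key (for example "sellerName":) as a substring, and the sample string is made up.

```python
# Hypothetical helper: report which expected JSON keys are absent from a page.
EXPECTED_FIELDS = ("sellerName", "priceInfo", "vipUrl", "sellerId")


def missing_fields(html: str, fields=EXPECTED_FIELDS) -> list[str]:
    """Return the keys whose quoted form ("key":) does not appear in the HTML."""
    return [field for field in fields if f'"{field}":' not in html]


if __name__ == "__main__":
    # Made-up fragment for illustration only.
    sample = '{"sellerName":"FIETSHOKJE","vipUrl":"/v/example"}'
    print(missing_fields(sample))  # prints ['priceInfo', 'sellerId']
```

In the walkthrough you would call missing_fields(html) after fetching the page; an empty list means all four markers were found.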
Setup
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
```
We'll also use two modules from the standard library:
- csv for export
- re to extract embedded values cleanly
Step 1: Fetch the seller page
```python
import requests

SELLER_URL = "https://www.marktplaats.nl/u/fietshokje-groningen/11746360/q/fiets/"
TIMEOUT = (10, 30)

session = requests.Session()
session.headers.update(
    {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/126.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "nl-NL,nl;q=0.9,en;q=0.8",
    }
)


def fetch_html(url: str) -> str:
    response = session.get(url, timeout=TIMEOUT)
    response.raise_for_status()
    return response.text


html = fetch_html(SELLER_URL)
print("downloaded", len(html), "chars")
```
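As written, a single 429 or 5xx response makes raise_for_status raise immediately. One common hardening step, sketched here as an optional variant of the session above, is to mount urllib3's Retry policy on the session so transient errors are retried with exponential backoff. make_retrying_session is a hypothetical name, and the retry count and status codes are assumptions to tune, not values Marktplaats documents.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total: int = 3, backoff: float = 1.0) -> requests.Session:
    """Build a Session that retries GETs on throttling and transient server errors."""
    retry = Retry(
        total=total,
        backoff_factor=backoff,  # wait time grows exponentially between attempts
        status_forcelist=(429, 500, 502, 503, 504),
        allowed_methods=frozenset({"GET"}),
        respect_retry_after_header=True,  # honour Retry-After when the server sends it
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({"Accept-Language": "nl-NL,nl;q=0.9,en;q=0.8"})
    return session
```

To use it, assign session = make_retrying_session() and add the same User-Agent header as above before calling fetch_html. When retries are exhausted on one of those status codes, requests raises an exception instead of returning the page, so the parser never sees an error response.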
Terminal sanity check:
```bash
curl -L -s "https://www.marktplaats.nl/u/fietshokje-groningen/11746360/q/fiets/" | head -n 6
```
You should see a title like:
<title>≥ FIETSHOKJE - Advertenties op Marktplaats</title>
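The same check can run inside Python before parsing, so a crawl can skip responses that are not the expected page. A minimal sketch using only the standard library; page_title and looks_like_marktplaats are hypothetical helpers, and treating "Marktplaats" in the title as the signal is an assumption based on the title shown above.

```python
import html as html_lib
import re
from typing import Optional

# Grab the first <title>...</title>, tolerating attributes and line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(page: str) -> Optional[str]:
    """Return the unescaped, whitespace-trimmed <title> text, or None if absent."""
    match = TITLE_RE.search(page)
    if match is None:
        return None
    return html_lib.unescape(match.group(1)).strip()


def looks_like_marktplaats(page: str) -> bool:
    """Heuristic: a normal seller page title mentions Marktplaats."""
    title = page_title(page)
    return title is not None and "Marktplaats" in title
```

Calling looks_like_marktplaats(html) right after fetch_html gives a cheap early warning before the regex step silently returns zero rows.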
Step 2: Choose the right extraction strategy
Marktplaats is a modern app, but the seller page already includes serialized listing data in the server response.
That gives you two options:
- scrape visible HTML cards
- parse the embedded listing data
For seller inventory, the second approach is usually cleaner. We can search the response for repeated listing objects containing:
- itemId
- title
- priceInfo
- sellerInformation
- vipUrl
This is still web scraping. We are just reading structured data that the page already ships to the browser.
Step 3: Extract seller listings
```python
import re
from urllib.parse import urljoin

BASE = "https://www.marktplaats.nl"


def cents_to_eur(price_cents: int | None) -> str | None:
    if price_cents is None:
        return None
    euros = price_cents / 100
    return f"EUR {euros:,.2f}"


# Each listing is matched as an ordered run of JSON keys. The bounded, lazy
# gaps (.{0,N}?) limit how far one match can reach between two fields.
LISTING_PATTERN = re.compile(
    r'"itemId":"(?P<item_id>[^"]+)"'
    r'.{0,600}?'
    r'"title":"(?P<title>[^"]+)"'
    r'.{0,1200}?'
    r'"priceInfo":\{"priceCents":(?P<price_cents>\d+),"priceType":"(?P<price_type>[^"]+)"'
    r'.{0,800}?'
    r'"cityName":"(?P<city>[^"]+)"'
    r'.{0,1200}?'
    r'"sellerName":"(?P<seller_name>[^"]+)"'
    r'.{0,1600}?'
    r'"vipUrl":"(?P<vip_url>[^"]+)"',
    re.DOTALL,
)


def parse_listings(html: str) -> list[dict]:
    rows = []
    seen = set()
    for match in LISTING_PATTERN.finditer(html):
        data = match.groupdict()
        url = urljoin(BASE, data["vip_url"])  # vipUrl is site-relative
        if url in seen:  # skip a listing that is serialized more than once
            continue
        rows.append(
            {
                "item_id": data["item_id"],
                "seller_name": data["seller_name"],
                "title": data["title"],
                "price": cents_to_eur(int(data["price_cents"])),
                "price_type": data["price_type"],
                "city": data["city"],
                "url": url,
            }
        )
        seen.add(url)
    return rows


rows = parse_listings(html)
print("parsed", len(rows), "rows")
print(rows[:3])
```
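One caveat with the [^"]+ captures: because the data is serialized JSON, a title containing a double quote (common in bike listings, for example a wheel size like 28") is stored as \" and the capture stops at the backslash. Non-ASCII characters may also arrive as \uXXXX escapes. A sketch of an escape-aware alternative, assuming the fragments follow standard JSON string escaping: capture the whole JSON string literal and decode it with json.loads. JSON_STRING, decode_json_string, and extract_titles are hypothetical names, and the sample is made up.

```python
import json
import re

# A JSON string body: any run of non-quote, non-backslash chars or backslash escapes.
JSON_STRING = r'"((?:[^"\\]|\\.)*)"'

TITLE_PATTERN = re.compile(r'"title":' + JSON_STRING)


def decode_json_string(raw: str) -> str:
    """Turn a captured JSON string body (escapes still intact) into a Python str."""
    return json.loads(f'"{raw}"')


def extract_titles(fragment: str) -> list[str]:
    return [decode_json_string(m.group(1)) for m in TITLE_PATTERN.finditer(fragment)]


if __name__ == "__main__":
    # Made-up fragment for illustration only.
    sample = r'{"title":"Gazelle 28\" damesfiets"},{"title":"Caf\u00e9 racer"}'
    print(extract_titles(sample))  # prints ['Gazelle 28" damesfiets', 'Café racer']
```

The same JSON_STRING fragment could stand in for the other [^"]+ groups in LISTING_PATTERN, with each captured value passed through decode_json_string.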
Why regex is acceptable here
Normally I prefer parsing JSON rather than regexing HTML.
But on seller pages like this one, the listing payload is embedded as repeated serialized fragments inside a much larger document. For a tutorial, a bounded regex is a practical way to:
- prove the fields exist
- keep dependencies light
- extract exactly the fields you care about
If you want a more production-grade parser, your next step is to locate the full serialized object and load it with json.loads.
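A sketch of that next step, under two assumptions you should verify against the live page: that the payload sits in a script tag with type="application/json" (many server-rendered apps ship state this way, but the exact tag on Marktplaats is not confirmed here), and that each listing is a JSON object carrying both itemId and vipUrl keys. Rather than hard-coding a path into the object, the walker recursively collects every dict with those keys, which tolerates layout changes. json_script_payloads, find_listing_objects, and listings_from_html are hypothetical names.

```python
import json
import re
from typing import Any, Iterator

# Assumption: embedded state lives in <script ... type="application/json"> tags.
SCRIPT_JSON = re.compile(
    r'<script[^>]*type="application/json"[^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def json_script_payloads(html: str) -> Iterator[Any]:
    """Yield every JSON script body that parses cleanly; skip the rest."""
    for match in SCRIPT_JSON.finditer(html):
        try:
            yield json.loads(match.group(1))
        except json.JSONDecodeError:
            continue


def find_listing_objects(node: Any, required=("itemId", "vipUrl")) -> Iterator[dict]:
    """Depth-first search for dicts that contain all of the required keys."""
    if isinstance(node, dict):
        if all(key in node for key in required):
            yield node
        for value in node.values():
            yield from find_listing_objects(value, required)
    elif isinstance(node, list):
        for item in node:
            yield from find_listing_objects(item, required)


def listings_from_html(html: str) -> list[dict]:
    found = []
    for payload in json_script_payloads(html):
        found.extend(find_listing_objects(payload))
    return found
```

Each returned object is a plain Python dict, so fields such as priceInfo can be read directly instead of through regex groups.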
Step 4: Export seller inventory to CSV
```python
import csv


def export_csv(rows: list[dict], path: str) -> None:
    fieldnames = [
        "item_id",
        "seller_name",
        "title",
        "price",
        "price_type",
        "city",
        "url",
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


export_csv(rows, "marktplaats_seller_inventory.csv")
print("saved", len(rows), "rows to CSV")
```
You can now sort or filter by:
- fixed price vs bid listings
- city
- seller branch name
- title keywords
That is often enough for inventory monitoring, resale analysis, or competitor tracking.
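Because the price column stores a formatted string such as "EUR 1,250.00", sorting it as text will misorder values. A minimal sketch that reads the CSV back, parses the price into a Decimal, and filters by price_type and a title keyword; price_value, load_rows, and filter_rows are hypothetical names, and the priceType values you see on live pages are not listed here.

```python
import csv
from decimal import Decimal
from typing import Iterable, Optional


def price_value(price: str) -> Optional[Decimal]:
    """Parse the 'EUR 1,250.00' format written by cents_to_eur back into a Decimal."""
    if not price:
        return None
    return Decimal(price.removeprefix("EUR ").replace(",", ""))


def load_rows(lines: Iterable[str]) -> list[dict]:
    """Read export_csv output from an open file or any iterable of CSV lines."""
    return list(csv.DictReader(lines))


def filter_rows(rows, price_type=None, keyword=None) -> list[dict]:
    """Keep rows matching an optional price_type and title keyword, cheapest first."""
    kept = [
        row
        for row in rows
        if (price_type is None or row["price_type"] == price_type)
        and (keyword is None or keyword.lower() in row["title"].lower())
    ]

    def sort_key(row):
        value = price_value(row["price"])
        # Rows without a price sort after all priced rows.
        return (value is None, value if value is not None else Decimal(0))

    return sorted(kept, key=sort_key)
```

To use it, open marktplaats_seller_inventory.csv with newline="" and encoding="utf-8", pass the file object to load_rows, and call filter_rows with whatever price_type or keyword you care about.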
Full script
```python
import csv
import re
from urllib.parse import urljoin

import requests

SELLER_URL = "https://www.marktplaats.nl/u/fietshokje-groningen/11746360/q/fiets/"
BASE = "https://www.marktplaats.nl"
TIMEOUT = (10, 30)

session = requests.Session()
session.headers.update(
    {
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/126.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "nl-NL,nl;q=0.9,en;q=0.8",
    }
)

LISTING_PATTERN = re.compile(
    r'"itemId":"(?P<item_id>[^"]+)"'
    r'.{0,600}?'
    r'"title":"(?P<title>[^"]+)"'
    r'.{0,1200}?'
    r'"priceInfo":\{"priceCents":(?P<price_cents>\d+),"priceType":"(?P<price_type>[^"]+)"'
    r'.{0,800}?'
    r'"cityName":"(?P<city>[^"]+)"'
    r'.{0,1200}?'
    r'"sellerName":"(?P<seller_name>[^"]+)"'
    r'.{0,1600}?'
    r'"vipUrl":"(?P<vip_url>[^"]+)"',
    re.DOTALL,
)


def fetch_html(url):
    response = session.get(url, timeout=TIMEOUT)
    response.raise_for_status()
    return response.text


def cents_to_eur(price_cents):
    return f"EUR {price_cents / 100:,.2f}"


def parse_listings(html):
    rows = []
    seen = set()
    for match in LISTING_PATTERN.finditer(html):
        data = match.groupdict()
        url = urljoin(BASE, data["vip_url"])
        if url in seen:
            continue
        rows.append(
            {
                "item_id": data["item_id"],
                "seller_name": data["seller_name"],
                "title": data["title"],
                "price": cents_to_eur(int(data["price_cents"])),
                "price_type": data["price_type"],
                "city": data["city"],
                "url": url,
            }
        )
        seen.add(url)
    return rows


def export_csv(rows, path):
    fieldnames = [
        "item_id",
        "seller_name",
        "title",
        "price",
        "price_type",
        "city",
        "url",
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def main():
    html = fetch_html(SELLER_URL)
    rows = parse_listings(html)
    export_csv(rows, "marktplaats_seller_inventory.csv")
    print(f"saved {len(rows)} rows")


if __name__ == "__main__":
    main()
```
Practical improvements
Once the base seller scraper works, the best upgrades are:
- crawl multiple seller URLs from a seed file
- keep item_id as your stable primary key
- alert when a listing disappears or the price changes
- store the raw HTML for debugging when parsing fails
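Keying on item_id makes the change alerts straightforward: compare the previous run's rows with the current run's. A minimal sketch, assuming both snapshots are lists of the row dicts produced by parse_listings (or read back from the CSV); diff_snapshots is a hypothetical name, and how you deliver the alert is left open.

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict[str, list]:
    """Compare two inventory snapshots keyed on item_id."""
    old = {row["item_id"]: row for row in previous}
    new = {row["item_id"]: row for row in current}

    # Sorted ids keep the report order stable between runs.
    added = [new[i] for i in sorted(new.keys() - old.keys())]
    removed = [old[i] for i in sorted(old.keys() - new.keys())]
    price_changes = [
        (i, old[i]["price"], new[i]["price"])
        for i in sorted(old.keys() & new.keys())
        if old[i]["price"] != new[i]["price"]
    ]
    return {"added": added, "removed": removed, "price_changes": price_changes}
```

A removed entry means the listing was on the last snapshot but not this one, which covers both sold and withdrawn items; the scraper alone cannot tell those apart.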
You can also combine this with a search-page scraper:
- search results discover new sellers
- seller pages monitor the inventory you care about repeatedly
That division keeps the monitoring job much cheaper.
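A sketch of the recurring seller-page loop: read seller URLs from a seed file, fetch them one at a time with a pause in between, and keep the raw HTML whenever parsing yields nothing so you can debug it later. It takes the fetch and parse functions as arguments (for example fetch_html and parse_listings from the full script). The seed-file format, the delay, the debug file naming, and the names read_seed_file and crawl_sellers are assumptions.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def read_seed_file(path: Path) -> list[str]:
    """One URL per line; blank lines and '#' comments are ignored, duplicates dropped."""
    urls, seen = [], set()
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line not in seen:
            seen.add(line)
            urls.append(line)
    return urls


def crawl_sellers(
    urls: list[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], list[dict]],
    debug_dir: Path,
    delay_seconds: float = 2.0,
) -> list[dict]:
    all_rows = []
    debug_dir.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # space out requests to the same site
        html = fetch(url)
        rows = parse(html)
        if not rows:
            # Save the raw page under a stable name derived from the URL.
            name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
            (debug_dir / name).write_text(html, encoding="utf-8")
        all_rows.extend(rows)
    return all_rows
```

Fetch errors propagate out of crawl_sellers unchanged; whether to skip a failing seller or stop the run is a policy choice left to the caller.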
Where ProxiesAPI fits
Scraping one seller page once is easy.
Scraping hundreds of seller pages every day is not. That is when you start dealing with:
- retries
- inconsistent responses
- network-level throttling
- regional variance
ProxiesAPI is useful at that stage because it improves the fetch layer without forcing you to rewrite the parser. Your extraction logic stays the same, but the crawl is less fragile when you move from one URL to many.
That is the honest value proposition: not magic extraction, just a steadier network path for recurring scraping jobs.
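Because fetch_html only depends on the session, routing traffic through a proxy is a fetch-layer change. A minimal sketch using the standard proxies argument in requests, assuming your provider exposes a plain HTTP proxy endpoint; proxy_settings and fetch_html_via are hypothetical names, a proxy URL such as http://user:pass@proxy.example:8080 is a placeholder, and this does not reflect ProxiesAPI's actual endpoint or authentication scheme, which you should take from its own documentation.

```python
from typing import Optional


def proxy_settings(proxy_url: Optional[str]) -> Optional[dict]:
    """Route both http:// and https:// targets through one proxy; None means direct."""
    if not proxy_url:
        return None
    return {"http": proxy_url, "https": proxy_url}


def fetch_html_via(session, url: str, proxy_url: Optional[str] = None, timeout=(10, 30)) -> str:
    """Like fetch_html, but with an optional per-request proxy; the session is unchanged."""
    response = session.get(url, timeout=timeout, proxies=proxy_settings(proxy_url))
    response.raise_for_status()
    return response.text
```

You would call fetch_html_via(session, SELLER_URL, proxy_url=...) with the session from Step 1; parse_listings and export_csv stay exactly as they are.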