Scrape Secondhand Fashion Listings from Vinted with Python (Search + Pagination + Normalized Output)
Vinted is a great “real-world” scraping target because it combines:
- a marketplace-style listing grid (cards, images, price, condition)
- filters + search terms
- pagination/infinite scroll behavior
- anti-bot measures that punish sloppy crawling
In this tutorial you’ll build a scraper that:
- opens a Vinted search results page
- extracts listing cards (title, price, currency, size/brand when available, item URL, image URL)
- paginates through multiple pages
- normalizes results into clean JSON
- optionally exports CSV

Marketplaces rate-limit aggressively at scale. Keep your extraction logic the same and make reliability a property of your fetch layer (timeouts, retries, optional ProxiesAPI routing).
What we’re scraping (Vinted structure)
Vinted search results live under URLs like:
https://www.vinted.com/catalog?search_text=nike%20dunk
The page is heavily JavaScript-driven, so in practice you have two options:
- Browser automation (recommended): use Playwright to load the page, then extract the listing cards from the rendered DOM.
- Reverse-engineer internal APIs: often brittle; may require cookies/tokens and will change without notice.
We’ll use Playwright because it’s the most consistently “works today” approach for JS-heavy marketplaces.
Setup
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install playwright pandas
playwright install chromium
```
We’ll use:
- `playwright` for reliable page rendering and extraction
- `pandas` for easy CSV export (optional)
Step 1: A ProxiesAPI-ready fetch layer
Playwright can run without proxies, but you should still structure your code so routing is a configuration knob.
At minimum you want:
- a consistent User-Agent
- timeouts
- a clean place to plug in proxy settings later
```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlConfig:
    headless: bool = True
    timeout_ms: int = 45_000
    max_pages: int = 3
    search_url: str = "https://www.vinted.com/catalog?search_text=nike%20dunk"
    # Optional: route Chromium through an HTTP proxy.
    # If you use ProxiesAPI as an upstream proxy, set this to your proxy URL.
    # Example: http://USERNAME:PASSWORD@gateway.proxiesapi.com:port
    # Playwright takes proxy credentials as separate "username"/"password" keys,
    # so split them out of this URL before passing it to chromium.launch().
    # The environment variable is read once, when the class is defined.
    proxy_server: str | None = os.environ.get("PROXY_SERVER")
```
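Playwright's `proxy` launch option expects the server address and the credentials as separate keys (`server`, `username`, `password`). A URL in the form shown in the config comment therefore has to be split before it reaches `chromium.launch()`. Below is a minimal sketch using only the standard library. `proxy_settings` is a hypothetical helper name, not part of Playwright or ProxiesAPI:

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def proxy_settings(proxy_url: str | None) -> dict[str, str] | None:
    """Hypothetical helper: split http://user:pass@host:port into Playwright's proxy dict.

    Returns None when no proxy is configured, so it can be passed straight to launch().
    """
    if not proxy_url:
        return None
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not a proxy URL: {proxy_url!r}")
    # Rebuild the server address without credentials (parts.port raises on a non-numeric port).
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    settings = {"server": f"{parts.scheme}://{host}"}
    # Credentials inside a URL are percent-encoded; Playwright wants the plain values.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings
```

With this helper, the launch call becomes `p.chromium.launch(headless=config.headless, proxy=proxy_settings(config.proxy_server))`.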
Step 2: Extract listing cards (no guessing: print what you see)
The safest way to build selectors is:
- open the page
- identify a stable container that represents an item card
- extract fields relative to each card
Here’s a working pattern: select a card, then query for inner elements.
```python
from playwright.sync_api import sync_playwright


def scrape_first_page(config: CrawlConfig) -> list[dict]:
    results: list[dict] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=config.headless,
            proxy={"server": config.proxy_server} if config.proxy_server else None,
        )
        page = browser.new_page()
        page.set_default_timeout(config.timeout_ms)
        page.goto(config.search_url, wait_until="networkidle")

        # Vinted frequently renders item cards as <article> elements.
        # If this selector ever breaks, update it by inspecting the DOM again.
        page.wait_for_selector("article")
        cards = page.query_selector_all("article")

        for card in cards:
            # Defensive extraction: any field can be missing.
            # Prices are usually visible text; keep the whole card text and let a parser refine it later.
            text = card.inner_text() or ""
            # First non-empty line as the title (None for cards without text, instead of an IndexError).
            lines = [line.strip() for line in text.splitlines() if line.strip()]
            title = lines[0] if lines else None

            a = card.query_selector("a")
            href = a.get_attribute("href") if a else None
            url = f"https://www.vinted.com{href}" if href and href.startswith("/") else href

            img = card.query_selector("img")
            image_url = img.get_attribute("src") if img else None

            results.append(
                {
                    "title": title,
                    "url": url,
                    "image_url": image_url,
                    "raw_text": text,
                }
            )

        browser.close()
    return results


if __name__ == "__main__":
    cfg = CrawlConfig()
    rows = scrape_first_page(cfg)
    print("rows:", len(rows))
    print(rows[0] if rows else None)
```
Why this works
Marketplaces change class names often, but they rarely stop rendering some kind of “card” element for each listing. Starting with broad “card-like” elements and then refining is more robust than anchoring to brittle class names.
In a production scraper, you’d tighten selectors after inspecting the DOM (for example, selecting only cards that contain an a[href^="/items/"] link).
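The same tightening can also be applied after extraction, as a plain-Python filter over the rows. The sketch below assumes that listing URLs have a path starting with `/items/`, the prefix mentioned above. `to_absolute`, `is_listing_url` and `keep_listing_rows` are hypothetical names. On the selector side, the rough equivalent would be `article:has(a[href^="/items/"])`, since Playwright's CSS engine supports `:has()`.

```python
from __future__ import annotations

from urllib.parse import urljoin, urlsplit

BASE_URL = "https://www.vinted.com"


def to_absolute(href: str | None, base: str = BASE_URL) -> str | None:
    """Resolve a relative href such as /items/123-foo against the site root."""
    return urljoin(base, href) if href else None


def is_listing_url(url: str | None) -> bool:
    """True only for URLs whose path looks like an item page (assumed /items/ prefix)."""
    return bool(url) and urlsplit(url).path.startswith("/items/")


def keep_listing_rows(rows: list[dict]) -> list[dict]:
    """Drop ads, banners and navigation cards that matched the broad <article> selector."""
    return [row for row in rows if is_listing_url(row.get("url"))]
```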
Step 3: Pagination (two practical approaches)
Vinted commonly paginates via:
- a “next” button, or
- a page query param, or
- infinite scroll that loads more cards
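In the query-parameter case, you can build the next page's URL directly instead of clicking anything. The sketch below assumes the parameter is named `page`; confirm this by paging manually and watching the address bar. `page_url` is a hypothetical helper. It keeps the other filters and percent-encodes spaces as `%20`, as in the search URL above.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def page_url(search_url: str, page: int, param: str = "page") -> str:
    """Return search_url with its page parameter (assumed name: "page") set to `page`."""
    parts = urlsplit(search_url)
    # Drop any existing page parameter, keep every other filter, then append the new page.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query, quote_via=quote)))
```

A crawl then reduces to `page.goto(page_url(config.search_url, n))` for `n` in `range(1, config.max_pages + 1)`.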
Playwright makes all three possible. Here are two patterns you can use.
Option A: Click “Next” (when it exists)
```python
def click_next(page) -> bool:
    # Either a semantic rel="next" link or a button labelled "Next".
    next_button = page.query_selector('a[rel="next"], button:has-text("Next")')
    if not next_button:
        return False
    next_button.click()
    page.wait_for_load_state("networkidle")
    return True
```
Option B: Infinite scroll (load N batches)
```python
def scroll_to_load_more(page, batches: int = 3) -> None:
    for _ in range(batches):
        # Jump to the bottom so the page requests the next batch of cards.
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(1500)
```
Pick the one that matches what you see in the UI. The logic is the same: extract cards, then move forward, then extract again.
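Because that extract/advance/extract loop is the same for both options, you can write it once. In the sketch below, `extract` returns the rows currently on the page, for example the card loop from Step 2 factored into a hypothetical `extract_cards(page)`. `advance` is either `click_next` or a wrapper around `scroll_to_load_more` that reports whether it moved. `crawl` is a hypothetical name. Deduplicating by URL matters for infinite scroll, because re-extracting after a scroll returns the earlier cards again.

```python
from __future__ import annotations

from typing import Callable


def crawl(
    extract: Callable[[], list[dict]],
    advance: Callable[[], bool],
    max_pages: int,
) -> list[dict]:
    """Extract, move forward, extract again, for at most max_pages rounds."""
    seen: set[str] = set()
    results: list[dict] = []
    for round_no in range(max_pages):
        for row in extract():
            key = row.get("url")
            # Skip rows without a URL and rows already collected in an earlier round.
            if not key or key in seen:
                continue
            seen.add(key)
            results.append(row)
        # Stop after the last allowed round, or when there is nothing further to load.
        if round_no == max_pages - 1 or not advance():
            break
    return results
```

Usage with Option A might look like `crawl(lambda: extract_cards(page), lambda: click_next(page), config.max_pages)`.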
Step 4: Normalize output (extract price, currency, size, brand)
Because marketplaces render slightly differently per country/locale, normalize in a separate step.
Start with a conservative parser:
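```python
import re

# Conservative: only accept a number that sits next to a currency marker, on either side
# ("$25.00", "25,00 €", "12 GBP"). Bare numbers such as sizes ("42") are ignored.
# Thousands separators ("1,234.56") are not handled.
PRICE_RE = re.compile(
    r"(?P<pre>[€$£])\s*(?P<num1>\d+(?:[.,]\d{1,2})?)"
    r"|(?P<num2>\d+(?:[.,]\d{1,2})?)\s*(?P<post>[€$£]|EUR|USD|GBP)"
)


def parse_price(text: str) -> tuple[float | None, str | None]:
    m = PRICE_RE.search(text.replace("\n", " "))
    if not m:
        return None, None
    number = m.group("num1") or m.group("num2")
    value = float(number.replace(",", "."))
    currency = m.group("pre") or m.group("post")
    return value, currency


def normalize(rows: list[dict]) -> list[dict]:
    out = []
    for r in rows:
        value, currency = parse_price(r.get("raw_text") or "")
        out.append(
            {
                "title": r.get("title"),
                "url": r.get("url"),
                "image_url": r.get("image_url"),
                "price": value,
                "currency": currency,
            }
        )
    return out
```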
Then export JSON + CSV:
```python
import json

import pandas as pd

data = normalize(scrape_first_page(CrawlConfig()))

with open("vinted_items.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

pd.DataFrame(data).to_csv("vinted_items.csv", index=False)
```
Practical anti-blocking basics (don’t get rate-limited instantly)
- Cache aggressively: don’t re-fetch the same search pages.
- Bound your crawl: keep `max_pages` small while developing.
- Add random delays: 0.8–2.0 s between navigations is a reasonable start.
- Retry with backoff: transient failures are normal.
- Use proxies when scaling: not as a band-aid for broken code, but as a stability tool.
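You can combine the delay and retry points into one small wrapper around each navigation. The sketch below makes no assumption about which exceptions are transient: it retries on any `Exception`, which you would narrow in practice (for example to Playwright's `TimeoutError`). `with_retries` and `polite_pause` are hypothetical names. The `sleep` parameter exists only so the logic can be tested without actually waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def polite_pause(low: float = 0.8, high: float = 2.0, sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a random interval between navigations and return the delay used."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def with_retries(
    action: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); after the n-th failure wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")
```

Each page then becomes `with_retries(lambda: page.goto(url, wait_until="networkidle"))` followed by `polite_pause()`.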
Wrap-up
You now have a Vinted scraper that:
- extracts listing cards from search results
- supports pagination patterns
- normalizes output into JSON/CSV
Next upgrades (worth doing once you’ve validated a small crawl):
- tighten selectors to match only listing cards
- deduplicate items by URL/ID
- add structured extraction (size, brand, condition) based on the real DOM fields you see
- integrate a proxy layer when you scale beyond a handful of pages
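For the deduplication upgrade, keying on the numeric item ID rather than the full URL also collapses the same listing reached through different query strings. The sketch below assumes item URLs embed the ID as `/items/<digits>-<slug>`; verify that against real URLs. `item_id` and `dedupe_by_id` are hypothetical names.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Assumed URL shape: /items/<numeric id>-<slug>
ITEM_ID_RE = re.compile(r"^/items/(\d+)")


def item_id(url: str | None) -> str | None:
    """Return the numeric ID from an item URL path, or None if there is no recognizable ID."""
    if not url:
        return None
    m = ITEM_ID_RE.match(urlsplit(url).path)
    return m.group(1) if m else None


def dedupe_by_id(rows: list[dict]) -> list[dict]:
    """Keep the first row per item ID; fall back to the URL, and keep rows with neither."""
    seen: set[str] = set()
    out: list[dict] = []
    for row in rows:
        key = item_id(row.get("url")) or row.get("url")
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        out.append(row)
    return out
```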