Scrape Secondhand Fashion Listings from Vinted

Jul 07, 2026 · tutorial · #python, #vinted, #web-scraping, #playwright, #json, #csv, #proxies, #marketplace

Vinted is a great “real-world” scraping target because it combines:

a marketplace-style listing grid (cards, images, price, condition)
filters + search terms
pagination/infinite scroll behavior
anti-bot measures that punish sloppy crawling

In this tutorial you’ll build a scraper that:

opens a Vinted search results page
extracts listing cards (title, price, currency, brand, size, item URL, image URL)
paginates through multiple pages
normalizes results into clean JSON
optionally exports CSV

Keep crawls stable with ProxiesAPI when volume grows

Marketplaces rate-limit aggressively at scale. Keep your extraction logic the same and make reliability a property of your fetch layer (timeouts, retries, optional ProxiesAPI routing).

Get 1,000 free API calls View pricing

What we’re scraping (Vinted structure)

Vinted search results live under URLs like:

https://www.vinted.com/catalog?search_text=nike%20dunk

The page is heavily JavaScript-driven, so in practice you have two options:

Browser automation (recommended): use Playwright to load the page, then extract listing card DOM.
Reverse-engineer internal APIs: often brittle; may require cookies/tokens and will change without notice.

We’ll use Playwright because it’s the most consistently “works today” approach for JS-heavy marketplaces.

At the time of writing, a rendered search page exposes stable hooks such as:

div[data-testid="grid-item"] for each search result tile
img[data-testid$="--image--img"] for the primary item photo
[data-testid$="--description-title"] for the visible listing title
[data-testid$="--description-subtitle"] for the size / brand / condition line

That is much better than scraping anonymous CSS classes.

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install playwright pandas
playwright install chromium

We’ll use:

playwright for reliable page rendering + extraction
pandas for easy CSV export (optional)

Step 1: A ProxiesAPI-ready fetch layer

Playwright can run without proxies, but you should still structure your code so routing is a configuration knob.

At minimum you want:

consistent User-Agent
timeouts
a clean place to plug in proxy settings later

from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlConfig:
    headless: bool = True
    timeout_ms: int = 45_000
    max_pages: int = 3
    search_url: str = "https://www.vinted.com/catalog?search_text=nike%20dunk"

    # Optional: route Chromium through an HTTP proxy.
    # If you use ProxiesAPI as an upstream proxy, set this to your proxy URL.
    # Example: http://USERNAME:PASSWORD@gateway.proxiesapi.com:port
    proxy_server: str | None = os.environ.get("PROXY_SERVER")

Step 2: Extract listing cards with the real rendered selectors

The safest way to build selectors is:

open the page
identify a stable container that represents an item card
extract fields relative to each card

Here’s a working pattern using the selectors visible in the rendered DOM.

from urllib.parse import urlencode

from playwright.sync_api import sync_playwright

BASE = "https://www.vinted.com"


def build_search_url(query: str, page: int = 1) -> str:
    params = {"search_text": query, "page": page}
    return f"{BASE}/catalog?{urlencode(params)}"


def clean_text(value: str | None) -> str | None:
    if not value:
        return None
    value = " ".join(value.split())
    return value or None


def parse_subtitle(text: str | None) -> tuple[str | None, str | None, str | None]:
    if not text:
        return None, None, None

    parts = [part.strip() for part in text.split("·") if part.strip()]

    size = parts[0] if len(parts) > 0 else None
    brand = parts[1] if len(parts) > 1 else None
    condition = parts[2] if len(parts) > 2 else None
    return size, brand, condition


def scrape_page(config: CrawlConfig, query: str, page_number: int) -> list[dict]:
    results: list[dict] = []

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=config.headless,
            proxy={"server": config.proxy_server} if config.proxy_server else None,
        )
        page = browser.new_page()
        page.set_default_timeout(config.timeout_ms)

        page.goto(build_search_url(query, page=page_number), wait_until="networkidle")

        page.wait_for_selector('[data-testid="grid-item"]')
        cards = page.locator('[data-testid="grid-item"]')

        for idx in range(cards.count()):
            card = cards.nth(idx)

            href = card.locator('a[href*="/items/"]').first.get_attribute("href")
            url = f"{BASE}{href}" if href and href.startswith("/") else href

            image_url = card.locator('img[data-testid$="--image--img"]').first.get_attribute("src")
            title = clean_text(card.locator('[data-testid$="--description-title"]').first.text_content())
            subtitle = clean_text(card.locator('[data-testid$="--description-subtitle"]').first.text_content())
            size, brand, condition = parse_subtitle(subtitle)
            text = clean_text(card.inner_text())

            results.append(
                {
                    "title": title,
                    "brand": brand,
                    "size": size,
                    "condition": condition,
                    "url": url,
                    "image_url": image_url,
                    "raw_text": text,
                }
            )

        browser.close()

    return results


if __name__ == "__main__":
    cfg = CrawlConfig()
    rows = scrape_page(cfg, query="patagonia fleece", page_number=1)
    print("rows:", len(rows))
    print(rows[0] if rows else None)

Why this works

The data-testid hooks are tied to product tiles, not to presentation-only CSS. That makes the scraper less fragile than targeting generated class names.

Step 3: Pagination (prefer the public `page=` parameter)

Vinted commonly paginates via:

a “next” button, or
a page query param, or
infinite scroll that loads more cards

For repeatable jobs, the cleanest option is to drive the public page= query parameter yourself instead of clicking the UI.

def scrape_many_pages(query: str, max_pages: int = 3) -> list[dict]:
    cfg = CrawlConfig(max_pages=max_pages)
    seen_urls: set[str] = set()
    rows: list[dict] = []

    for page_number in range(1, cfg.max_pages + 1):
        batch = scrape_page(cfg, query=query, page_number=page_number)
        print(f"page={page_number} rows={len(batch)}")

        for row in batch:
            if not row["url"] or row["url"] in seen_urls:
                continue
            seen_urls.add(row["url"])
            rows.append(row)

    return rows

Step 4: Normalize output (extract price, currency, size, brand)

Because marketplaces render slightly differently per country/locale, normalize in a separate step.

Start with a conservative parser:

import re


PRICE_RE = re.compile(r"(\d+[\.,]?\d*)\s*([€$£]|EUR|USD|GBP)?")


def parse_price(text: str) -> tuple[float | None, str | None]:
    m = PRICE_RE.search(text.replace("\n", " "))
    if not m:
        return None, None
    value = float(m.group(1).replace(",", "."))
    currency = m.group(2) or None
    return value, currency


def normalize(rows: list[dict]) -> list[dict]:
    out = []
    for r in rows:
        value, currency = parse_price(r.get("raw_text") or "")
        out.append(
            {
                "title": r.get("title"),
                "brand": r.get("brand"),
                "size": r.get("size"),
                "condition": r.get("condition"),
                "url": r.get("url"),
                "image_url": r.get("image_url"),
                "price": value,
                "currency": currency,
            }
        )
    return out

Then export JSON + CSV:

import json
import pandas as pd


data = normalize(scrape_many_pages("patagonia fleece", max_pages=3))

with open("vinted_items.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

pd.DataFrame(data).to_csv("vinted_items.csv", index=False)

Practical anti-blocking basics (don’t get rate-limited instantly)

Cache aggressively: don’t re-fetch the same search pages.
Bound your crawl: keep max_pages small while developing.
Add random delays: 0.8–2.0s between navigations is a reasonable start.
Retry with backoff: transient failures are normal.
Use proxies when scaling: not as a band-aid for broken code, but as a stability tool.

Wrap-up

You now have a Vinted scraper that:

extracts listing cards from search results
supports pagination patterns
normalizes output into JSON/CSV

Next upgrades (worth doing once you’ve validated a small crawl):

deduplicate items by URL/ID
enrich detail pages for seller metadata and long descriptions
integrate a proxy layer when you scale beyond a handful of pages

Keep crawls stable with ProxiesAPI when volume grows

Marketplaces rate-limit aggressively at scale. Keep your extraction logic the same and make reliability a property of your fetch layer (timeouts, retries, optional ProxiesAPI routing).

Get 1,000 free API calls View pricing

Show how to collect Vinted search listings, prices, brands, and image URLs into a resale market dataset with Python and an optional ProxiesAPI fetch layer.

tutorial#python#vinted#web-scraping

Scrape Vinted Listings with Python: Search + Pagination + Clean CSV Export

Build a practical Vinted listings scraper: pull search results via Vinted’s internal catalog endpoint, paginate safely, extract price/brand/size/image URLs, and export a clean CSV. Includes a screenshot + ProxiesAPI integration.

tutorial#vinted#python#web-scraping

Scrape Shopee Seller Storefronts and Top Products with Python

Collect seller metadata and top-product signals from public Shopee storefronts using a browser-assisted workflow, bootstrap data extraction, and ProxiesAPI-backed requests.

tutorial#python#shopee#playwright

Scrape Book Data from Goodreads

Build a Goodreads dataset with book titles, authors, ratings, and review counts from a public list page using Python and an optional ProxiesAPI fetch layer.

tutorial#python#goodreads#books

Scrape Secondhand Fashion Listings from Vinted

Related guides