Scrape Secondhand Fashion Listings from Vinted with Python (Search + Pagination + Normalized Output)

Vinted is a great “real-world” scraping target because it combines:

  • a marketplace-style listing grid (cards, images, price, condition)
  • filters + search terms
  • pagination/infinite scroll behavior
  • anti-bot measures that punish sloppy crawling

In this tutorial you’ll build a scraper that:

  • opens a Vinted search results page
  • extracts listing cards (title, price, currency, item URL, image URL; size/brand are left as a follow-up once you've inspected the real DOM)
  • paginates through multiple pages
  • normalizes results into clean JSON
  • optionally exports CSV

Screenshot: Vinted search results page (the listing cards we’ll scrape)

Keep crawls stable with ProxiesAPI when volume grows

Marketplaces rate-limit aggressively at scale. Keep your extraction logic the same and make reliability a property of your fetch layer (timeouts, retries, optional ProxiesAPI routing).


What we’re scraping (Vinted structure)

Vinted search results live under URLs like:

  • https://www.vinted.com/catalog?search_text=nike%20dunk

The page is heavily JavaScript-driven, so in practice you have two options:

  1. Browser automation (recommended): use Playwright to load the page, then extract listing card DOM.
  2. Reverse-engineer internal APIs: often brittle; may require cookies/tokens and will change without notice.

We’ll use Playwright because it’s the most consistently “works today” approach for JS-heavy marketplaces.


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install playwright pandas
playwright install chromium

We’ll use:

  • playwright for reliable page rendering + extraction
  • pandas for easy CSV export (optional)

Step 1: A ProxiesAPI-ready fetch layer

Playwright can run without proxies, but you should still structure your code so routing is a configuration knob.

At minimum you want:

  • consistent User-Agent
  • timeouts
  • a clean place to plug in proxy settings later

from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlConfig:
    headless: bool = True
    timeout_ms: int = 45_000
    max_pages: int = 3
    search_url: str = "https://www.vinted.com/catalog?search_text=nike%20dunk"

    # Optional: route Chromium through an HTTP proxy.
    # If you use ProxiesAPI as an upstream proxy, set this to your proxy URL.
    # Example: http://USERNAME:PASSWORD@gateway.proxiesapi.com:port
    proxy_server: str | None = os.environ.get("PROXY_SERVER")
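
The example proxy URL in the comment above embeds a username and password. Playwright's launch `proxy` option accepts separate `server`, `username`, and `password` keys, and credentials left inside the server URL may not be honored by Chromium. A minimal sketch of splitting the URL follows; the helper name `playwright_proxy` is hypothetical, and it assumes the URL includes a scheme such as `http://`.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def playwright_proxy(proxy_url: str | None) -> dict | None:
    """Split a proxy URL into the server/username/password keys Playwright accepts."""
    if not proxy_url:
        return None

    parts = urlsplit(proxy_url)

    # Rebuild the server address without the user:password@ portion.
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"

    settings = {"server": server}
    # Credentials may be percent-encoded inside a URL, so decode them.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings
```

The result can be passed as `proxy=playwright_proxy(config.proxy_server)` when launching the browser; it is `None` when no proxy is configured.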

Step 2: Extract listing cards (no guessing: print what you see)

The safest way to build selectors is:

  1. open the page
  2. identify a stable container that represents an item card
  3. extract fields relative to each card

Here’s a working pattern: select a card, then query for inner elements.

from playwright.sync_api import sync_playwright


def scrape_first_page(config: CrawlConfig) -> list[dict]:
    results: list[dict] = []

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=config.headless,
            proxy={"server": config.proxy_server} if config.proxy_server else None,
        )
        page = browser.new_page()
        page.set_default_timeout(config.timeout_ms)

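        # "networkidle" can stall on pages that keep background requests open,
        # so wait for the DOM here and rely on wait_for_selector below.
        page.goto(config.search_url, wait_until="domcontentloaded")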

        # Vinted frequently renders item cards as <article> elements.
        # If this selector ever breaks, update it by inspecting the DOM again.
        page.wait_for_selector("article")
        cards = page.query_selector_all("article")

        for card in cards:
            # Defensive extraction: any field can be missing.
            # Read the card text once; it is also the raw input for price parsing later.
            text = card.inner_text() or ""

            # Use the first non-empty line as a provisional title.
            # An empty card yields None instead of raising IndexError.
            lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
            title = lines[0] if lines else None

            a = card.query_selector("a")
            href = a.get_attribute("href") if a else None
            url = f"https://www.vinted.com{href}" if href and href.startswith("/") else href

            img = card.query_selector("img")
            image_url = img.get_attribute("src") if img else None

            results.append(
                {
                    "title": title,
                    "url": url,
                    "image_url": image_url,
                    "raw_text": text,
                }
            )

        browser.close()

    return results


if __name__ == "__main__":
    cfg = CrawlConfig()
    rows = scrape_first_page(cfg)
    print("rows:", len(rows))
    print(rows[0] if rows else None)

Why this works

Marketplaces change class names often, but they rarely stop rendering some kind of “card” element for each listing. Starting with broad “card-like” elements and then refining is more robust than anchoring to brittle class names.

In a production scraper, you’d tighten selectors after inspecting the DOM (for example, selecting only cards that contain an a[href^="/items/"] link).
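
One way to apply that tightening after extraction is to keep only rows whose link points at an item page. The sketch below assumes item paths look like `/items/<numeric id>-<slug>`; the numeric-ID part is an assumption to verify against the links you actually see, and the function names are hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Assumed shape of an item path: /items/<digits>, optionally followed by a slug.
ITEM_PATH_RE = re.compile(r"^/items/(\d+)")


def item_id_from_href(href: str | None) -> str | None:
    """Return the numeric item ID from a relative or absolute item link, else None."""
    if not href:
        return None
    match = ITEM_PATH_RE.match(urlsplit(href).path)
    return match.group(1) if match else None


def keep_listing_rows(rows: list[dict]) -> list[dict]:
    """Drop rows (ads, banners, profile cards) whose URL is not an item link."""
    kept = []
    for row in rows:
        item_id = item_id_from_href(row.get("url"))
        if item_id is not None:
            kept.append({**row, "item_id": item_id})
    return kept
```

Filtering after extraction keeps the broad `article` selector intact while still discarding non-listing cards.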


Step 3: Pagination (two practical approaches)

Vinted commonly paginates via:

  • a “next” button, or
  • a page query param, or
  • infinite scroll that loads more cards

Playwright makes all three possible. Here are two patterns you can use.

Option A: Click “Next” (when it exists)

def click_next(page) -> bool:
    next_button = page.query_selector('a[rel="next"], button:has-text("Next")')
    if not next_button:
        return False
    next_button.click()
    page.wait_for_load_state("networkidle")
    return True

Option B: Infinite scroll (load N batches)

def scroll_to_load_more(page, batches: int = 3) -> None:
    for _ in range(batches):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        page.wait_for_timeout(1500)

Pick the one that matches what you see in the UI. The logic is the same: extract cards, then move forward, then extract again.
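
The extract, move forward, extract again loop can be sketched independently of Playwright by passing the two steps in as callables. The name `crawl_pages` and its parameters are hypothetical; `max_pages` mirrors the field on `CrawlConfig`.

```python
from __future__ import annotations

from typing import Callable


def crawl_pages(
    extract: Callable[[], list[dict]],
    advance: Callable[[], bool],
    max_pages: int = 3,
) -> list[dict]:
    """Collect rows from up to max_pages pages.

    extract() returns the rows visible on the current page.
    advance() moves to the next page and returns False when there is none.
    """
    rows: list[dict] = []
    for page_no in range(max_pages):
        rows.extend(extract())
        # Do not navigate past the last page we intend to read.
        if page_no + 1 >= max_pages:
            break
        if not advance():
            break
    return rows
```

With Playwright, `advance` could be `lambda: click_next(page)`, and `extract` could wrap the per-card loop from `scrape_first_page`, factored into a helper of your own. With infinite scroll, the same cards stay in the DOM after each scroll, so expect duplicate rows to remove afterwards.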


Step 4: Normalize output (extract price and currency)

Because marketplaces render slightly differently per country/locale, normalize in a separate step.

Start with a conservative parser:

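import re


# A number only counts as a price when a currency marker sits directly before
# or after it, so sizes ("42") and model numbers in titles are not picked up.
# Limitation: thousands separators ("1,250.00") are not handled.
CURRENCY = r"[€$£]|EUR|USD|GBP"
NUMBER = r"\d+(?:[.,]\d{1,2})?"
PRICE_RE = re.compile(
    rf"(?P<pre>{CURRENCY})\s*(?P<v1>{NUMBER})|(?P<v2>{NUMBER})\s*(?P<post>{CURRENCY})"
)


def parse_price(text: str) -> tuple[float | None, str | None]:
    # Search line by line so a number on one line is never paired with a
    # currency symbol from the next.
    for line in text.splitlines():
        m = PRICE_RE.search(line)
        if m:
            value = float((m.group("v1") or m.group("v2")).replace(",", "."))
            return value, m.group("pre") or m.group("post")
    return None, None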


def normalize(rows: list[dict]) -> list[dict]:
    out = []
    for r in rows:
        value, currency = parse_price(r.get("raw_text") or "")
        out.append(
            {
                "title": r.get("title"),
                "url": r.get("url"),
                "image_url": r.get("image_url"),
                "price": value,
                "currency": currency,
            }
        )
    return out

Then export JSON + CSV:

import json
import pandas as pd


data = normalize(scrape_first_page(CrawlConfig()))

with open("vinted_items.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

pd.DataFrame(data).to_csv("vinted_items.csv", index=False)

Practical anti-blocking basics (don’t get rate-limited instantly)

  • Cache aggressively: don’t re-fetch the same search pages.
  • Bound your crawl: keep max_pages small while developing.
  • Add random delays: 0.8–2.0s between navigations is a reasonable start.
  • Retry with backoff: transient failures are normal.
  • Use proxies when scaling: not as a band-aid for broken code, but as a stability tool.
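
The delay and retry bullets above can be sketched with the standard library alone. The helper names are hypothetical, the 0.8–2.0s range comes from the list above, and the backoff constants are illustrative starting points rather than tuned values. The `sleep` parameter exists only so the helpers can be exercised without actually waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def polite_delay(
    min_s: float = 0.8,
    max_s: float = 2.0,
    sleep: Callable[[float], object] = time.sleep,
) -> float:
    """Pause for a random duration between navigations and return the delay used."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Run action(), retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            # Wait base, 2x base, 4x base, ... plus up to 0.25s of jitter.
            sleep(base_delay_s * 2 ** (attempt - 1) + random.uniform(0, 0.25))
    raise ValueError("attempts must be at least 1")
```

A navigation could then be wrapped as `with_retries(lambda: page.goto(url))`, with `polite_delay()` called between pages. Catching every `Exception` is deliberately broad for a sketch; in a real crawl you would likely narrow it to the timeout and network errors you actually observe.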

Wrap-up

You now have a Vinted scraper that:

  • extracts listing cards from search results
  • supports pagination patterns
  • normalizes output into JSON/CSV

Next upgrades (worth doing once you’ve validated a small crawl):

  • tighten selectors to match only listing cards
  • deduplicate items by URL/ID
  • add structured extraction (size, brand, condition) based on the real DOM fields you see
  • integrate a proxy layer when you scale beyond a handful of pages
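
As a starting point for the deduplication upgrade, the sketch below keys each row on its URL's host and path, ignoring query strings and fragments. The function name is hypothetical, and it assumes rows are dicts with a `url` key as produced earlier in this tutorial.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def dedupe_by_url(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each item URL, preserving order."""
    seen: set[tuple[str, str]] = set()
    unique: list[dict] = []
    for row in rows:
        url = row.get("url")
        if not url:
            # Without a URL there is nothing to compare on, so keep the row.
            unique.append(row)
            continue
        parts = urlsplit(url)
        # Ignore query strings and fragments (tracking parameters and the like).
        key = (parts.netloc.lower(), parts.path.rstrip("/"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique
```

This matters most with infinite scroll or overlapping result pages, where the same card can be extracted more than once.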
