Scrape Secondhand Fashion Listings from Vinted (Python + ProxiesAPI)

Vinted is one of the most popular secondhand fashion marketplaces in Europe. Its listings are valuable for:

  • tracking price trends for brands (Nike / Zara / Patagonia)
  • monitoring inventory and sell-through
  • building “deal finders” and alerts

In this tutorial we’ll scrape Vinted search results pages and extract:

  • listing title
  • price + currency
  • brand (when present)
  • item condition / size (often present in the card)
  • product URL
  • image URL
  • pagination across multiple pages

We’ll do it with Python + requests + BeautifulSoup, and we’ll show exactly where ProxiesAPI fits in to keep requests stable once you crawl deeper.

Vinted search results page we will parse (cards + pagination)

Make Vinted scraping reliable with ProxiesAPI

Marketplaces throttle quickly when you paginate. ProxiesAPI gives you a stable proxy layer (plus easy rotation) so your scraper keeps running when you scale from 1 page to 100.


Important note (be responsible)

Before you scrape any marketplace:

  • read the site’s Terms
  • keep your request rate low
  • don’t scrape personal data
  • use caching (don’t re-fetch the same pages)

This guide is for public listing data and educational purposes.


What we’re scraping (URL + structure)

Vinted search pages typically look like:

  • https://www.vinted.com/catalog?search_text=nike%20air%20max

You may also see country-specific domains (and localized paths). The HTML can vary by region, and parts of the site may be rendered by JS.

Two practical strategies:

  1. Start with HTML parsing (fast, cheap) and only fall back to browser automation if the data isn’t present (a quick check for this is sketched after this list).
  2. Prefer scraping search results over item detail pages first. You can collect item URLs, then selectively enrich details later.
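
For strategy 1, a quick heuristic is to check whether the raw HTML already contains item links before reaching for a headless browser. This is a rough sketch (looks_parseable is a name invented here, and it assumes item URLs contain /items/, which can differ by region):

def looks_parseable(html: str) -> bool:
    """Rough check: if the raw HTML already contains several item links,
    plain requests + BeautifulSoup is probably enough."""
    return html.count("/items/") >= 5  # threshold is arbitrary


# if not looks_parseable(html):
#     print("listings look JS-rendered; consider a headless browser instead")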

Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

We’ll use:

  • requests for fetching
  • BeautifulSoup (with the lxml parser) for reliable HTML parsing

Step 1: Fetch a search page (timeouts + headers)

Marketplaces often block “default” HTTP clients. You want:

  • real timeouts (no hanging)
  • a realistic User-Agent
  • retry handling for 403/429/5xx

import time
import random
import requests

TIMEOUT = (10, 30)  # connect, read

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

session = requests.Session()


def fetch(url: str) -> str:
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)
    r.raise_for_status()
    return r.text


url = "https://www.vinted.com/catalog?search_text=nike%20air%20max"
html = fetch(url)
print("bytes:", len(html))
print(html[:200])

If you immediately hit 403/429, skip ahead to the ProxiesAPI + retries section.


Step 2: Parse listing cards with robust selectors

Vinted’s DOM changes over time. Instead of hard-coding a single fragile selector, we’ll:

  • locate “card-like” anchors that link to items
  • extract title/price/image from within that card
  • keep parsing defensive (missing fields should not crash)

import re
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "https://www.vinted.com"


def clean_text(x: str | None) -> str | None:
    if not x:
        return None
    t = re.sub(r"\s+", " ", x).strip()
    return t or None


def parse_price(text: str | None):
    """Return (amount, currency) when possible."""
    if not text:
        return None, None

    # Examples vary: "€12.00", "12,00 €", "£10", etc.
    t = text.replace("\xa0", " ").strip()

    # currency symbol first
    m = re.search(r"([€£$])\s*([0-9]+(?:[\.,][0-9]{1,2})?)", t)
    if m:
        return float(m.group(2).replace(",", ".")), m.group(1)

    # currency symbol last
    m = re.search(r"([0-9]+(?:[\.,][0-9]{1,2})?)\s*([€£$])", t)
    if m:
        return float(m.group(1).replace(",", ".")), m.group(2)

    return None, None
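
# Quick sanity checks (illustrative inputs, not taken from a live page):
# parse_price("€12.00")  -> (12.0, "€")
# parse_price("12,00 €") -> (12.0, "€")
# parse_price("£10")     -> (10.0, "£")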


def parse_search_results(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    out = []

    # Heuristic: item links often contain "/items/".
    # We collect unique item URLs.
    seen = set()
    for a in soup.select('a[href*="/items/"]'):
        href = a.get("href")
        if not href:
            continue

        item_url = href if href.startswith("http") else urljoin(BASE, href)
        if item_url in seen:
            continue
        seen.add(item_url)

        # Title: prefer the title / aria-label attributes (nested text can
        # also contain it, but the attributes are less noisy).
        title = clean_text(a.get("title") or a.get("aria-label"))

        # Image: look for an <img> inside the anchor
        img = a.select_one("img")
        img_url = (img.get("src") or img.get("data-src")) if img else None

        # Price: look for obvious price-like text inside the card
        price_text = None
        # Common approach: scan for elements that include currency symbols
        for el in a.select("*"):
            t = el.get_text(" ", strip=True)
            if t and any(sym in t for sym in ["€", "£", "$"]):
                price_text = t
                break

        price_amount, price_currency = parse_price(price_text)

        out.append({
            "title": title,
            "price": price_amount,
            "currency": price_currency,
            "price_text": price_text,
            "url": item_url,
            "image": img_url,
        })

    return out


items = parse_search_results(html)
print("items:", len(items))
print(items[:2])

Why this works

  • We anchor on item URLs (/items/), which are less likely to change than CSS class names.
  • We keep extraction “best-effort” (some cards will be missing brand/size/condition; a sketch for pulling size and condition follows below).
  • We store price_text for debugging so you can adjust parsing if the UI changes.
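
If you also want the brand / size / condition fields mentioned at the top, the card usually renders them as short text lines near the price. The markup varies by region, so treat the helper below as a best-effort sketch (guess_card_fields, CONDITION_WORDS and SIZE_RE are names invented here, and the condition labels are assumptions about the English UI); brand is often easier to pick up later from the item detail page.

CONDITION_WORDS = {
    "new with tags", "new without tags", "very good", "good", "satisfactory",
}
SIZE_RE = re.compile(r"^(XXS|XS|S|M|L|XL|XXL|\d{1,2}([.,]\d)?)$", re.I)


def guess_card_fields(card) -> dict:
    """Best-effort size/condition extraction from a card's text lines."""
    lines = [t for t in (clean_text(s) for s in card.stripped_strings) if t]
    size = next((t for t in lines if SIZE_RE.match(t)), None)
    condition = next((t for t in lines if t.lower() in CONDITION_WORDS), None)
    return {"size": size, "condition": condition}

To use it, merge guess_card_fields(a) into each item dict inside parse_search_results() and add the new keys to the CSV fields list.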

Step 3: Pagination (crawl N result pages)

Vinted pagination patterns vary. A common way is adding a page parameter.

We’ll implement crawling that:

  • starts from a base search URL
  • increments page=1..N
  • deduplicates item URLs across pages

from urllib.parse import urlencode, urlparse, parse_qs, urlunparse


def with_query(url: str, **params) -> str:
    u = urlparse(url)
    q = parse_qs(u.query)
    for k, v in params.items():
        q[k] = [str(v)]
    new_query = urlencode(q, doseq=True)
    return urlunparse((u.scheme, u.netloc, u.path, u.params, new_query, u.fragment))


def crawl_search(base_url: str, pages: int = 3, sleep_range=(0.8, 1.8)) -> list[dict]:
    all_items = []
    seen = set()

    for p in range(1, pages + 1):
        url = with_query(base_url, page=p)
        html = fetch(url)
        batch = parse_search_results(html)

        for it in batch:
            if not it.get("url") or it["url"] in seen:
                continue
            seen.add(it["url"])
            all_items.append(it)

        print(f"page {p}: {len(batch)} items (unique total: {len(all_items)})")
        time.sleep(random.uniform(*sleep_range))

    return all_items


base = "https://www.vinted.com/catalog?search_text=nike%20air%20max"
data = crawl_search(base, pages=5)
print("unique:", len(data))

Step 4: Export to CSV

import csv


def export_csv(items: list[dict], path: str = "vinted_items.csv"):
    fields = ["title", "price", "currency", "price_text", "url", "image"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        for it in items:
            w.writerow({k: it.get(k) for k in fields})


export_csv(data)
print("wrote vinted_items.csv")

Add ProxiesAPI: retries + rotation (where it actually helps)

If you crawl beyond a couple pages, you’ll likely see:

  • 403 Forbidden (blocked)
  • 429 Too Many Requests (rate limited)
  • intermittent connection resets

This is where a proxy layer helps.

The exact ProxiesAPI endpoint format depends on your account configuration, but the pattern is the same:

  • your code calls a single proxy URL
  • ProxiesAPI routes to the destination site using a rotating pool
  • you keep your parsing logic unchanged

Here’s a clean way to structure it so you can switch between direct and proxied requests.

import os

PROXIESAPI_PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")  # e.g. "http://user:pass@gateway.proxiesapi.com:port"


def get_proxies():
    if not PROXIESAPI_PROXY_URL:
        return None
    return {
        "http": PROXIESAPI_PROXY_URL,
        "https": PROXIESAPI_PROXY_URL,
    }


def fetch_with_retries(url: str, tries: int = 5) -> str:
    last_err = None

    for attempt in range(1, tries + 1):
        try:
            r = session.get(
                url,
                headers=DEFAULT_HEADERS,
                timeout=TIMEOUT,
                proxies=get_proxies(),
            )

            # common retry statuses
            if r.status_code in (403, 429, 500, 502, 503, 504):
                raise requests.HTTPError(f"status {r.status_code}")

            r.raise_for_status()
            return r.text

        except Exception as e:
            last_err = e
            if attempt == tries:
                break
            backoff = min(20, 2 ** attempt) + random.random()
            print(f"attempt {attempt}/{tries} failed: {e}; sleeping {backoff:.1f}s")
            time.sleep(backoff)

    raise RuntimeError(f"failed after {tries} tries: {last_err}")

To use it, replace fetch() inside crawl_search() with fetch_with_retries().
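
One way to wire that in without touching the parsing code is to make the fetcher swappable. A small sketch, assuming the functions defined earlier are in scope (crawl_search_with is a new helper name, not part of the snippets above):

# Option A: quickest swap, point the existing name at the retrying version.
fetch = fetch_with_retries

# Option B: pass the fetcher in explicitly, so you can mix direct and
# proxied crawls without editing the loop.
def crawl_search_with(fetcher, base_url: str, pages: int = 3) -> list[dict]:
    all_items, seen = [], set()
    for p in range(1, pages + 1):
        for it in parse_search_results(fetcher(with_query(base_url, page=p))):
            if it.get("url") and it["url"] not in seen:
                seen.add(it["url"])
                all_items.append(it)
        time.sleep(random.uniform(0.8, 1.8))
    return all_items


# data = crawl_search_with(fetch_with_retries, base, pages=10)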

Practical settings

  • keep concurrency low (1–2) unless you have a strong reason
  • random sleep between pages
  • cache HTML responses when iterating on selectors (a minimal cache sketch follows)
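
A minimal on-disk cache sketch for that last point (cached_fetch and the .cache_html directory are names invented here; call it wherever you currently call fetch()):

import hashlib
from pathlib import Path

CACHE_DIR = Path(".cache_html")
CACHE_DIR.mkdir(exist_ok=True)


def cached_fetch(url: str) -> str:
    """Reuse a previously saved copy of a page when one exists on disk."""
    path = CACHE_DIR / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)  # or fetch_with_retries(url)
    path.write_text(html, encoding="utf-8")
    return html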

QA checklist

  • Collected item URLs look valid and open in the browser (see the spot-check sketch after this list)
  • Price parsing returns numbers for most cards
  • Exported CSV opens cleanly in Sheets
  • Pagination doesn’t duplicate the same items
  • Retries work (you can simulate by temporarily blocking your IP / lowering rate limits)
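
Part of this checklist can be automated with a quick spot-check over the crawled items (a sketch; the 80% coverage threshold is arbitrary):

def spot_check(items: list[dict]) -> None:
    urls = [it["url"] for it in items if it.get("url")]
    assert len(urls) == len(set(urls)), "duplicate item URLs found"
    assert all("/items/" in u for u in urls), "unexpected URL shapes"

    priced = sum(1 for it in items if isinstance(it.get("price"), float))
    print(f"{priced}/{len(items)} items have a parsed price")
    if items and priced / len(items) < 0.8:
        print("price coverage is low; inspect the stored price_text values")


# spot_check(data)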

Next upgrades

  • enrich each item URL with a detail-page parser (brand, condition, seller stats)
  • store results in SQLite for incremental runs (a minimal sketch follows)
  • add a “changed since last run” diff (so you don’t re-alert on old listings)
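
For the SQLite upgrade, a minimal sketch could key rows on the item URL so repeated runs upsert instead of duplicating (table and column names here are illustrative):

import sqlite3


def save_to_sqlite(items: list[dict], path: str = "vinted.db") -> None:
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS items (
               url TEXT PRIMARY KEY,
               title TEXT, price REAL, currency TEXT,
               image TEXT, first_seen TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    con.executemany(
        """INSERT INTO items (url, title, price, currency, image)
           VALUES (:url, :title, :price, :currency, :image)
           ON CONFLICT(url) DO UPDATE SET
               title=excluded.title, price=excluded.price,
               currency=excluded.currency, image=excluded.image""",
        [{k: it.get(k) for k in ("url", "title", "price", "currency", "image")} for it in items],
    )
    con.commit()
    con.close()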

Depending on your Vinted region and query, you may need to adapt the selectors above so they match exactly what you see in your browser.
