Scrape Secondhand Fashion Listings from Vinted (Python + ProxiesAPI)
Vinted is one of the most popular secondhand fashion marketplaces in Europe. The listings are valuable for:
- tracking price trends for brands (Nike / Zara / Patagonia)
- monitoring inventory and sell-through
- building “deal finders” and alerts
In this tutorial we’ll scrape Vinted search results pages and extract:
- listing title
- price + currency
- brand (when present)
- item condition / size (often present in the card)
- product URL
- image URL
- pagination across multiple pages
We’ll do it with Python + requests + BeautifulSoup, and we’ll show exactly where ProxiesAPI fits in (honestly) to keep requests stable when you crawl deeper.

Marketplaces throttle quickly when you paginate. ProxiesAPI gives you a stable proxy layer (plus easy rotation) so your scraper keeps running when you scale from 1 page to 100.
Important note (be responsible)
Before you scrape any marketplace:
- read the site’s Terms
- keep your request rate low
- don’t scrape personal data
- use caching (don’t re-fetch the same pages)
This guide is for public listing data and educational purposes.
What we’re scraping (URL + structure)
Vinted search pages typically look like:
https://www.vinted.com/catalog?search_text=nike%20air%20max
You may also see country-specific domains (and localized paths). The HTML can vary by region, and parts of the site may be rendered by JS.
Two practical strategies:
- Start with HTML parsing (fast, cheap) and only fall back to browser automation if the data isn’t present (see the quick check after this list).
- Prefer scraping search results over item detail pages first. You can collect item URLs, then selectively enrich details later.
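To run that quick check for the first strategy, fetch one search page and look for item links in the raw HTML. This is only a sketch: the URL and the "/items/" substring are assumptions based on the link heuristic we use later in the parser, so adjust them for your region.
import requests

# Probe one search page: if "/items/" links show up in the raw HTML,
# plain requests + BeautifulSoup is enough; if they don't, that region's
# page is probably JS-rendered and you'd need browser automation instead.
resp = requests.get(
    "https://www.vinted.com/catalog?search_text=nike%20air%20max",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)
print(resp.status_code, '"/items/" link present:', "/items/" in resp.text)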
Setup
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
We’ll use:
- requests for fetching
- BeautifulSoup (with lxml) for parsing HTML reliably
Step 1: Fetch a search page (timeouts + headers)
Marketplaces often block “default” HTTP clients. You want:
- real timeouts (no hanging)
- a realistic User-Agent
- retry handling for 403/429/5xx
import time
import random
import requests
TIMEOUT = (10, 30) # connect, read
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

session = requests.Session()

def fetch(url: str) -> str:
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)
    r.raise_for_status()
    return r.text
url = "https://www.vinted.com/catalog?search_text=nike%20air%20max"
html = fetch(url)
print("bytes:", len(html))
print(html[:200])
If you immediately hit 403/429, skip ahead to the ProxiesAPI + retries section.
Step 2: Parse listing cards with robust selectors
Vinted’s DOM changes over time. Instead of hard-coding a single fragile selector, we’ll:
- locate “card-like” anchors that link to items
- extract title/price/image from within that card
- keep parsing defensive (missing fields should not crash)
import re
from bs4 import BeautifulSoup
from urllib.parse import urljoin
BASE = "https://www.vinted.com"
def clean_text(x: str | None) -> str | None:
    if not x:
        return None
    t = re.sub(r"\s+", " ", x).strip()
    return t or None

def parse_price(text: str | None):
    """Return (amount, currency) when possible."""
    if not text:
        return None, None
    # Examples vary: "€12.00", "12,00 €", "£10", etc.
    t = text.replace("\xa0", " ").strip()
    # currency symbol first
    m = re.search(r"([€£$])\s*([0-9]+(?:[\.,][0-9]{1,2})?)", t)
    if m:
        return float(m.group(2).replace(",", ".")), m.group(1)
    # currency symbol last
    m = re.search(r"([0-9]+(?:[\.,][0-9]{1,2})?)\s*([€£$])", t)
    if m:
        return float(m.group(1).replace(",", ".")), m.group(2)
    return None, None
def parse_search_results(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    out = []
    # Heuristic: item links often contain "/items/".
    # We collect unique item URLs.
    seen = set()
    for a in soup.select('a[href*="/items/"]'):
        href = a.get("href")
        if not href:
            continue
        item_url = href if href.startswith("http") else urljoin(BASE, href)
        if item_url in seen:
            continue
        seen.add(item_url)
        # Title: sometimes in aria-label, sometimes in nested text.
        title = clean_text(a.get("title") or a.get("aria-label"))
        # Image: look for an <img> inside the anchor
        img = a.select_one("img")
        img_url = (img.get("src") or img.get("data-src")) if img else None
        # Price: look for obvious price-like text inside the card
        price_text = None
        # Common approach: scan for elements that include currency symbols
        for el in a.select("*"):
            t = el.get_text(" ", strip=True)
            if t and any(sym in t for sym in ["€", "£", "$"]):
                price_text = t
                break
        price_amount, price_currency = parse_price(price_text)
        out.append({
            "title": title,
            "price": price_amount,
            "currency": price_currency,
            "price_text": price_text,
            "url": item_url,
            "image": img_url,
        })
    return out
items = parse_search_results(html)
print("items:", len(items))
print(items[:2])
Why this works
- We anchor on item URLs (/items/), which are less likely to change than CSS class names.
- We keep extraction “best-effort” (some cards will be missing brand/size/condition).
- We store price_text for debugging so you can adjust parsing if the UI changes.
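Because price formats differ by region, it helps to sanity-check parse_price against a few strings like the ones in the code comment (these are illustrative examples, not real card text):
# Quick sanity checks for parse_price using the formats from the comment above.
assert parse_price("€12.00") == (12.0, "€")
assert parse_price("12,00 €") == (12.0, "€")
assert parse_price("£10") == (10.0, "£")
assert parse_price(None) == (None, None)
print("parse_price sanity checks passed")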
Step 3: Pagination (crawl N result pages)
Vinted pagination patterns vary; a common approach is adding a page query parameter.
We’ll implement crawling that:
- starts from a base search URL
- increments page=1..N
- deduplicates item URLs across pages
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse
def with_query(url: str, **params) -> str:
    u = urlparse(url)
    q = parse_qs(u.query)
    for k, v in params.items():
        q[k] = [str(v)]
    new_query = urlencode(q, doseq=True)
    return urlunparse((u.scheme, u.netloc, u.path, u.params, new_query, u.fragment))

def crawl_search(base_url: str, pages: int = 3, sleep_range=(0.8, 1.8)) -> list[dict]:
    all_items = []
    seen = set()
    for p in range(1, pages + 1):
        url = with_query(base_url, page=p)
        html = fetch(url)
        batch = parse_search_results(html)
        for it in batch:
            if not it.get("url") or it["url"] in seen:
                continue
            seen.add(it["url"])
            all_items.append(it)
        print(f"page {p}: {len(batch)} items (unique total: {len(all_items)})")
        time.sleep(random.uniform(*sleep_range))
    return all_items
base = "https://www.vinted.com/catalog?search_text=nike%20air%20max"
data = crawl_search(base, pages=5)
print("unique:", len(data))
Step 4: Export to CSV
import csv
def export_csv(items: list[dict], path: str = "vinted_items.csv"):
    fields = ["title", "price", "currency", "price_text", "url", "image"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        for it in items:
            w.writerow({k: it.get(k) for k in fields})
export_csv(data)
print("wrote vinted_items.csv")
Add ProxiesAPI: retries + rotation (where it actually helps)
If you crawl beyond a couple pages, you’ll likely see:
- 403 Forbidden (blocked)
- 429 Too Many Requests (rate limited)
- intermittent connection resets
This is where a proxy layer helps.
The exact ProxiesAPI endpoint format depends on your account configuration, but the pattern is the same:
- your code calls a single proxy URL
- ProxiesAPI routes to the destination site using a rotating pool
- you keep your parsing logic unchanged
Here’s a clean way to structure it so you can switch between direct and proxied requests.
import os
PROXIESAPI_PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL") # e.g. "http://user:pass@gateway.proxiesapi.com:port"
def get_proxies():
    if not PROXIESAPI_PROXY_URL:
        return None
    return {
        "http": PROXIESAPI_PROXY_URL,
        "https": PROXIESAPI_PROXY_URL,
    }

def fetch_with_retries(url: str, tries: int = 5) -> str:
    last_err = None
    for attempt in range(1, tries + 1):
        try:
            r = session.get(
                url,
                headers=DEFAULT_HEADERS,
                timeout=TIMEOUT,
                proxies=get_proxies(),
            )
            # common retry statuses
            if r.status_code in (403, 429, 500, 502, 503, 504):
                raise requests.HTTPError(f"status {r.status_code}")
            r.raise_for_status()
            return r.text
        except Exception as e:
            last_err = e
            backoff = min(20, 2 ** attempt) + random.random()
            print(f"attempt {attempt}/{tries} failed: {e}; sleeping {backoff:.1f}s")
            time.sleep(backoff)
    raise RuntimeError(f"failed after {tries} tries: {last_err}")
To use it, replace fetch() inside crawl_search() with fetch_with_retries().
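If you’d rather not edit crawl_search in place, one option is a small variant that takes the fetch function as a parameter, so you can switch between direct and proxied fetching per run. crawl_search_with and fetcher below are names introduced here for illustration, not part of the earlier code.
from typing import Callable

def crawl_search_with(base_url: str, pages: int = 3, sleep_range=(0.8, 1.8),
                      fetcher: Callable[[str], str] = fetch_with_retries) -> list[dict]:
    # Same logic as crawl_search, but any (url) -> html callable can be plugged in.
    all_items, seen = [], set()
    for p in range(1, pages + 1):
        html = fetcher(with_query(base_url, page=p))
        for it in parse_search_results(html):
            if it.get("url") and it["url"] not in seen:
                seen.add(it["url"])
                all_items.append(it)
        time.sleep(random.uniform(*sleep_range))
    return all_items

# data = crawl_search_with(base, pages=5)                  # proxied + retries
# data = crawl_search_with(base, pages=5, fetcher=fetch)   # direct, no retries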
Practical settings
- keep concurrency low (1–2) unless you have a strong reason
- random sleep between pages
- cache HTML responses when iterating on selectors
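For the caching point, a throwaway on-disk cache like this sketch is usually enough while you iterate on selectors (the .html_cache directory and the hashing scheme are arbitrary choices):
import hashlib
from pathlib import Path

CACHE_DIR = Path(".html_cache")  # arbitrary location; delete it to force re-fetching
CACHE_DIR.mkdir(exist_ok=True)

def fetch_cached(url: str) -> str:
    # Key the cache on a hash of the URL so filenames stay filesystem-safe.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch_with_retries(url)
    path.write_text(html, encoding="utf-8")
    return html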
QA checklist
- Collected item URLs look valid and open in browser
- Price parsing returns numbers for most cards
- Exported CSV opens cleanly in Sheets
- Pagination doesn’t duplicate the same items
- Retries work (you can simulate by temporarily blocking your IP / lowering rate limits)
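For the retries item, you can also fake the responses locally with unittest.mock instead of blocking your IP. This is a rough smoke test under that assumption; it really sleeps through the backoff, so expect it to take a few seconds.
from unittest import mock
import requests

def _fake_response(status: int, text: str = "") -> requests.Response:
    resp = requests.Response()
    resp.status_code = status
    resp._content = text.encode("utf-8")  # private attribute, acceptable for a local test
    return resp

# Two 429s, then a 200: fetch_with_retries should back off twice and then succeed.
fake = [_fake_response(429), _fake_response(429), _fake_response(200, "<html>ok</html>")]
with mock.patch.object(session, "get", side_effect=fake):
    assert "ok" in fetch_with_retries("https://example.com/", tries=5)
print("retry smoke test passed")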
Next upgrades
- enrich each item URL with a detail-page parser (brand, condition, seller stats)
- store results in SQLite for incremental runs
- add a “changed since last run” diff (so you don’t re-alert on old listings)
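For the last two upgrades, a small SQLite table keyed on the item URL goes a long way. Here’s a minimal sketch (table and column names are placeholders, not a fixed schema) that upserts each item and returns the URLs that are new since earlier runs:
import sqlite3

def upsert_items(items: list[dict], db_path: str = "vinted.db") -> list[str]:
    """Insert or refresh items keyed by URL; return URLs not seen in earlier runs."""
    con = sqlite3.connect(db_path)
    con.execute("""
        CREATE TABLE IF NOT EXISTS items (
            url TEXT PRIMARY KEY,
            title TEXT,
            price REAL,
            currency TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            last_seen TEXT
        )
    """)
    new_urls = []
    for it in items:
        if con.execute("SELECT 1 FROM items WHERE url = ?", (it["url"],)).fetchone() is None:
            new_urls.append(it["url"])
        con.execute("""
            INSERT INTO items (url, title, price, currency, last_seen)
            VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)
            ON CONFLICT(url) DO UPDATE SET
                title = excluded.title,
                price = excluded.price,
                currency = excluded.currency,
                last_seen = excluded.last_seen
        """, (it["url"], it.get("title"), it.get("price"), it.get("currency")))
    con.commit()
    con.close()
    return new_urls

# new = upsert_items(data)
# print(f"{len(new)} new listings since last run")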
From here, you can adapt the parser to a specific Vinted region and query so the selectors match what you see in your browser.