Scrape Vinted Listings with Python: Search, Prices, Images, and Pagination
Vinted is a goldmine if you want second-hand market data:
- prices by brand/size/category
- listing velocity (what sells quickly)
- seller inventory patterns
- image datasets (for ML or QA)
But it’s also the kind of site where:
- search result pages are paginated
- you’ll quickly do many requests
- rate limits and occasional blocks are normal
In this guide we’ll scrape Vinted search listings into a clean dataset using Python.
We’ll:
- fetch search pages (timeouts + retries)
- parse listing cards (title, price, size, condition, seller, image)
- paginate through result pages
- export JSON/CSV
- use ProxiesAPI as a proxy-backed fetch layer for stability
Marketplaces can throttle repeated requests from a single IP. ProxiesAPI lets you proxy your fetches so pagination runs are less likely to die mid-crawl.
What we’re scraping (Vinted search results)
A Vinted search URL often looks like:
https://www.vinted.com/catalog?search_text=nike%20air%20force
(Depending on locale, you may see different paths/domains.)
From each listing card we want:
id, title, price (+ currency), brand, size, condition, seller (name or handle), image_url, item_url
Setup
python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
Step 1: Fetch layer with retries (and optional ProxiesAPI)
As soon as you paginate search results (page 1 → 2 → 3 …), the network layer becomes the bottleneck.
Here’s a simple fetch helper that supports a ProxiesAPI “fetch URL”.
import os
import random
import time
import urllib.parse
import requests
TIMEOUT = (10, 35)
def build_proxiesapi_url(target_url: str) -> str:
    """Wrap *target_url* in a ProxiesAPI fetch URL.

    Reads the API key from the PROXIESAPI_KEY environment variable.

    Raises:
        RuntimeError: if PROXIESAPI_KEY is unset or empty.
    """
    api_key = os.environ.get("PROXIESAPI_KEY")
    if not api_key:
        raise RuntimeError("Missing PROXIESAPI_KEY env var")
    query = urllib.parse.urlencode({"auth_key": api_key, "url": target_url})
    return f"https://api.proxiesapi.com/?{query}"
def is_likely_blocked(html: str) -> bool:
    """Heuristic check: does this HTML look like a captcha/bot-block page?"""
    markers = ("captcha", "access denied", "unusual traffic", "verify you are")
    lowered = (html or "").lower()
    for marker in markers:
        if marker in lowered:
            return True
    return False
def fetch_html(url: str, *, use_proxiesapi: bool = True, session: requests.Session | None = None) -> str:
    """Fetch a URL and return its HTML, retrying with exponential backoff.

    Args:
        url: Target page URL.
        use_proxiesapi: Route the request through ProxiesAPI (requires the
            PROXIESAPI_KEY env var — see build_proxiesapi_url).
        session: Optional requests.Session to reuse connections across calls.

    Returns:
        The page HTML as text.

    Raises:
        RuntimeError: after all retry attempts fail, or when every attempt
            returned a page that looks like a bot block.
    """
    s = session or requests.Session()
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    fetch_url = build_proxiesapi_url(url) if use_proxiesapi else url
    last_err: Exception | None = None
    max_attempts = 5
    for attempt in range(1, max_attempts + 1):
        try:
            r = s.get(fetch_url, headers=headers, timeout=TIMEOUT)
            r.raise_for_status()
            html = r.text
            if is_likely_blocked(html):
                raise RuntimeError("Blocked page detected")
            return html
        except Exception as e:  # broad on purpose: retry any transport/HTTP/block failure
            last_err = e
            # Fix: don't sleep after the final attempt — fail fast instead of
            # burning up to ~20s of backoff right before raising.
            if attempt < max_attempts:
                time.sleep(min(2 ** attempt, 20) + random.random())
    raise RuntimeError(f"Failed to fetch after retries: {last_err}")
Set your ProxiesAPI key:
export PROXIESAPI_KEY="YOUR_KEY"
Step 2: Parse listing cards
Vinted card markup can change, so we use a few techniques:
- extract URLs/images from a/img tags inside a card
- parse price text as a string and normalize
- avoid depending on one long CSS class chain
import re
from bs4 import BeautifulSoup
def clean_text(s: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    text = s or ""
    return " ".join(text.split())
def parse_price(text: str) -> dict:
    """Extract a price from free text.

    Handles both symbol-first ("€12.00") and symbol-last ("12,00 €") styles.

    Returns:
        A dict with keys "currency", "amount", and "raw" (the cleaned input
        text). "currency" and "amount" are None when no price is found.
        Fix: "raw" is now present on every path — previously only the
        no-match branch included it, so callers got an inconsistent shape.
    """
    t = clean_text(text)
    # Symbol-first: "€12.00", "$ 5", "£9,99"
    m = re.search(r"([€$£])\s*([0-9]+(?:[\.,][0-9]{1,2})?)", t)
    if m:
        return {"currency": m.group(1), "amount": float(m.group(2).replace(",", ".")), "raw": t}
    # Symbol-last: "12,00 €"
    m = re.search(r"([0-9]+(?:[\.,][0-9]{1,2})?)\s*([€$£])", t)
    if m:
        return {"currency": m.group(2), "amount": float(m.group(1).replace(",", ".")), "raw": t}
    return {"currency": None, "amount": None, "raw": t}
def parse_search_results(html: str, base_url: str = "https://www.vinted.com") -> list[dict]:
    """Parse a Vinted search-results page into listing dicts.

    Args:
        html: Raw HTML of a search-results page.
        base_url: Origin used to absolutize relative item links.

    Returns:
        A list of dicts with keys id, title, price, currency, image_url,
        item_url — de-duplicated by item_url. Fields may be None when the
        card markup doesn't expose them.
    """
    from urllib.parse import urljoin  # local import so the snippet is self-contained

    soup = BeautifulSoup(html, "lxml")
    items: list[dict] = []
    # Heuristic: listing cards usually contain an <a> to /items/...
    for a in soup.select("a[href*='/items/']"):
        href = a.get("href")
        if not href:
            continue
        # Fix: urljoin handles absolute, root-relative, protocol-relative,
        # and bare-relative hrefs; naive concatenation broke hrefs that
        # didn't start with "/".
        item_url = urljoin(base_url.rstrip("/") + "/", href)
        # Climb to a plausible card container (article or div[data-testid]).
        card = a
        for _ in range(5):
            if card.name == "article" or (card.name == "div" and card.get("data-testid")):
                break
            card = card.parent
            if not card:
                break
        # Title: sometimes in the title attribute, sometimes nearby text.
        title = clean_text(a.get("title") or a.get_text(" ", strip=True) or "")
        if len(title) < 3:
            title = None
        img = a.select_one("img")
        # Fix: lazy-loaded images often keep the real URL in data-src.
        image_url = (img.get("src") or img.get("data-src")) if img else None
        # Price: look for nearby text containing a currency symbol.
        price_text = ""
        if card:
            pt = card.get_text(" ", strip=True)
            if any(sym in pt for sym in ["€", "$", "£"]):
                price_text = pt
        price = parse_price(price_text)
        # Vinted IDs are usually present in the URL path.
        m = re.search(r"/items/(\d+)", item_url)
        item_id = m.group(1) if m else None
        items.append({
            "id": item_id,
            "title": title,
            "price": price.get("amount"),
            "currency": price.get("currency"),
            "image_url": image_url,
            "item_url": item_url,
        })
    # De-dupe by item_url (one card can contain several anchors to the item).
    uniq: list[dict] = []
    seen: set[str] = set()
    for it in items:
        u = it.get("item_url")
        if not u or u in seen:
            continue
        seen.add(u)
        uniq.append(it)
    return uniq
This is intentionally a starter parser. Once you run it against your locale, inspect the HTML and tighten the selectors (for brand/size/condition/seller).
Step 3: Pagination
Vinted search pages usually support page= or cursor-based pagination.
We’ll implement page= first (the most common pattern). If your locale uses a cursor param, you can swap it in.
import urllib.parse
def set_page(url: str, page: int) -> str:
    """Return *url* with its ``page`` query parameter set to *page*.

    An existing ``page`` param is replaced in place; otherwise ``page`` is
    appended at the end of the query string.

    Fix: the original used ``dict(parse_qsl(...))``, which silently collapses
    repeated query parameters (e.g. ``catalog[]=1&catalog[]=2``). This version
    keeps the pair list so multi-valued params survive.
    """
    parts = urllib.parse.urlparse(url)
    pairs = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    out: list[tuple[str, str]] = []
    replaced = False
    for k, v in pairs:
        if k == "page":
            if not replaced:
                out.append(("page", str(page)))
                replaced = True
            # drop any duplicate page params
        else:
            out.append((k, v))
    if not replaced:
        out.append(("page", str(page)))
    new_q = urllib.parse.urlencode(out)
    return urllib.parse.urlunparse((parts.scheme, parts.netloc, parts.path, parts.params, new_q, parts.fragment))
def crawl_search(start_url: str, pages: int = 5, use_proxiesapi: bool = True) -> list[dict]:
    """Crawl up to *pages* pages of a Vinted search and collect unique items.

    Args:
        start_url: URL of the first search-results page.
        pages: Maximum number of pages to fetch.
        use_proxiesapi: Route fetches through ProxiesAPI.

    Returns:
        Listing dicts (see parse_search_results), de-duplicated by item_url
        across all pages.
    """
    all_items: list[dict] = []
    seen: set[str] = set()
    # Fix: use the session as a context manager so pooled connections are
    # closed when the crawl finishes (or raises).
    with requests.Session() as s:
        for p in range(1, pages + 1):
            url = start_url if p == 1 else set_page(start_url, p)
            html = fetch_html(url, use_proxiesapi=use_proxiesapi, session=s)
            batch = parse_search_results(html)
            print(f"page {p}/{pages} -> {len(batch)} items")
            for it in batch:
                if it["item_url"] in seen:
                    continue
                seen.add(it["item_url"])
                all_items.append(it)
            # Polite jittered delay between pages.
            # Fix: skip the sleep after the final page — it only delayed return.
            if p < pages:
                time.sleep(1.0 + random.random())
    return all_items
Run it
import json

START = "https://www.vinted.com/catalog?search_text=nike%20air%20force"

# Crawl the first 10 result pages through the proxy-backed fetch layer.
items = crawl_search(START, pages=10, use_proxiesapi=True)
print("total", len(items))

# Serialize once, then write — keeps the file handle open only briefly.
payload = json.dumps(items, ensure_ascii=False, indent=2)
with open("vinted_items.json", "w", encoding="utf-8") as f:
    f.write(payload)
print("wrote vinted_items.json")
Export CSV (optional)
import csv

# Column order for the CSV export — mirrors the keys produced by the parser.
FIELDS = ["id", "title", "price", "currency", "image_url", "item_url"]

with open("vinted_items.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    for row in items:
        writer.writerow(row)
print("wrote vinted_items.csv")
Where ProxiesAPI fits (honestly)
On marketplace crawls you usually fail in one of three ways:
- requests start returning 429 / throttling
- requests start returning bot pages
- results silently degrade (you get HTML that isn’t the real listings)
ProxiesAPI helps with the IP-side of that problem by proxying your requests.
It won’t fix:
- broken selectors
- too-fast pagination
- JS-only rendering
But it will often make a big difference to long “page 1 → page 50” runs.
QA checklist
- Page 1 returns real listing cards
- Pagination increments and changes results
- You’re extracting stable item_url and image URLs
- CSV/JSON exports are valid
- You handle blocks (retries + backoff)
Marketplaces can throttle repeated requests from a single IP. ProxiesAPI lets you proxy your fetches so pagination runs are less likely to die mid-crawl.