Scrape Vinted Listings with Python: Search → Listings → Images (with ProxiesAPI)

Vinted is a goldmine of secondhand fashion data: pricing, condition, brand, size, seller metadata, and—crucially—high-quality item photos.

In this guide, we’ll build a real Python scraper that follows the exact flow you’d use in production:

  1. Search Vinted for items (e.g., “nike dunk”, “patagonia fleece”)
  2. Paginate through results safely
  3. Open listing pages to extract richer fields
  4. Collect image URLs (and optionally download them)

We’ll also show where ProxiesAPI fits in: not as “magic”, but as a network layer that helps keep crawls stable as volume grows.

Vinted search results page (we’ll scrape cards + listing links)

Make your marketplace scrapers more reliable with ProxiesAPI

Marketplaces like Vinted can rate-limit or challenge repeated requests. ProxiesAPI gives you a stable proxy layer and consistent request behavior when you scale from a few pages to thousands of listings.


What we’re scraping (Vinted page structure)

Vinted is a modern web app. In many locales, the search results page is server-rendered enough to scrape the listing cards and links, but details (and some attributes) can vary by region and A/B tests.

The safe approach is:

  • use the search page HTML to find listing URLs
  • for each URL, fetch the listing detail page and parse consistent fields

Target URLs

Typical entry points:

  • Home: https://www.vinted.com/
  • Search: https://www.vinted.com/catalog?search_text=...
  • Listing: https://www.vinted.com/items/<id>-<slug>

Vinted’s exact query parameters may differ by region, but the scraper below is resilient because it:

  • extracts listing links rather than relying on guessed API endpoints
  • parses JSON embedded in the HTML when available
  • falls back to HTML selectors for core fields

Setup

Create a virtualenv and install dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml python-dotenv

We’ll use:

  • requests for HTTP
  • BeautifulSoup (with the lxml parser) for HTML parsing
  • python-dotenv to load configuration from a .env file

Create a .env file:

PROXIESAPI_KEY="YOUR_PROXIESAPI_KEY"

Step 1: Build a fetcher (timeouts, retries, headers)

Scrapers fail in boring ways: timeouts, 429s, 5xx, and HTML that occasionally changes. Start with a fetcher you can trust.

import os
import time
from dataclasses import dataclass
from typing import Optional

import requests
from dotenv import load_dotenv

load_dotenv()

PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY", "").strip()

TIMEOUT = (10, 30)  # connect, read

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
}


@dataclass
class FetchResult:
    url: str
    status_code: int
    text: str
    final_url: str


class HttpClient:
    def __init__(self):
        self.s = requests.Session()
        self.s.headers.update(DEFAULT_HEADERS)

    def _via_proxiesapi(self, url: str) -> str:
        """Wrap a target URL through ProxiesAPI.

        NOTE: Keep this conservative and transparent. We just build a proxy URL.
        If ProxiesAPI is not configured, we fetch directly.
        """
        if not PROXIESAPI_KEY:
            return url

        # Common pattern: pass the destination as a query param.
        # If your ProxiesAPI account uses a different format, adjust here.
        return f"https://api.proxiesapi.com/?auth_key={PROXIESAPI_KEY}&url={requests.utils.quote(url, safe='')}"

    def get_html(self, url: str, *, use_proxy: bool = True, max_retries: int = 3) -> FetchResult:
        last_exc: Optional[Exception] = None

        for attempt in range(1, max_retries + 1):
            try:
                fetch_url = self._via_proxiesapi(url) if use_proxy else url
                r = self.s.get(fetch_url, timeout=TIMEOUT, allow_redirects=True)

                # If ProxiesAPI is used, r.url will be the proxy URL; keep both.
                if r.status_code in (429, 500, 502, 503, 504):
                    # Remember the status so the final error message is useful.
                    last_exc = RuntimeError(f"HTTP {r.status_code} from {url}")
                    backoff = min(2 ** attempt, 10)
                    time.sleep(backoff)
                    continue

                r.raise_for_status()
                return FetchResult(url=url, status_code=r.status_code, text=r.text, final_url=r.url)

            except Exception as e:
                last_exc = e
                time.sleep(min(2 ** attempt, 10))

        raise RuntimeError(f"Failed to fetch {url} after {max_retries} attempts: {last_exc}")

Why this structure works:

  • Timeouts prevent hangs
  • Retries with backoff smooth temporary bans / spikes
  • ProxiesAPI wrapper is contained to one function
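
To sanity-check the proxy wrapper without touching the network, you can replicate its URL-building logic as a pure function (a standalone copy of `_via_proxiesapi` for illustration; adjust if your ProxiesAPI account uses a different request format):

```python
from urllib.parse import quote


def via_proxiesapi(url: str, key: str) -> str:
    # Mirrors HttpClient._via_proxiesapi: without a key, fetch directly.
    if not key:
        return url
    return f"https://api.proxiesapi.com/?auth_key={key}&url={quote(url, safe='')}"


print(via_proxiesapi("https://www.vinted.com/catalog?search_text=nike", ""))
print(via_proxiesapi("https://www.vinted.com/catalog?search_text=nike", "KEY123"))
```

With no key, the target URL passes through untouched; with a key, the destination is fully percent-encoded into the query string.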

Step 2: Scrape Vinted search results (listing cards → URLs)

The first job is to convert a keyword into listing URLs.

Build a search URL

from urllib.parse import urlencode

BASE = "https://www.vinted.com"


def build_search_url(query: str, page: int = 1) -> str:
    params = {
        "search_text": query,
        "page": page,
    }
    return f"{BASE}/catalog?{urlencode(params)}"
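
As a quick check (the function is repeated here so the snippet runs standalone; note that `urlencode` uses `quote_plus`, so spaces become `+`):

```python
from urllib.parse import urlencode

BASE = "https://www.vinted.com"


def build_search_url(query: str, page: int = 1) -> str:
    # Same as above, repeated so this snippet is self-contained.
    params = {"search_text": query, "page": page}
    return f"{BASE}/catalog?{urlencode(params)}"


print(build_search_url("patagonia fleece", page=2))
# https://www.vinted.com/catalog?search_text=patagonia+fleece&page=2
```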

Extract listing URLs from HTML

Vinted’s markup can change, so we use a hybrid strategy:

  1. Collect all links that look like listing URLs (/items/…)
  2. De-duplicate
  3. Filter out non-item links

import re
from bs4 import BeautifulSoup
from urllib.parse import urljoin

ITEM_PATH_RE = re.compile(r"^/items/\d+")


def extract_listing_urls_from_search(html: str) -> list[str]:
    soup = BeautifulSoup(html, "lxml")

    urls: list[str] = []
    seen = set()

    for a in soup.select("a[href]"):
        href = a.get("href")
        if not href:
            continue
        if ITEM_PATH_RE.match(href):
            full = urljoin(BASE, href)
            if full not in seen:
                seen.add(full)
                urls.append(full)

    return urls
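
You can exercise the extractor against a minimal HTML sample (the markup below is invented for illustration, not Vinted's real structure; `html.parser` is used so the snippet needs only beautifulsoup4):

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "https://www.vinted.com"
ITEM_PATH_RE = re.compile(r"^/items/\d+")


def extract_listing_urls_from_search(html: str) -> list[str]:
    # Same logic as above, repeated so this snippet runs standalone.
    soup = BeautifulSoup(html, "html.parser")
    urls, seen = [], set()
    for a in soup.select("a[href]"):
        href = a.get("href")
        if href and ITEM_PATH_RE.match(href):
            full = urljoin(BASE, href)
            if full not in seen:
                seen.add(full)
                urls.append(full)
    return urls


sample = """
<div class="feed">
  <a href="/items/111-shoe">card</a>
  <a href="/items/111-shoe">duplicate</a>
  <a href="/items/222-jacket">card</a>
  <a href="/member/99-seller">not an item</a>
</div>
"""
print(extract_listing_urls_from_search(sample))
# ['https://www.vinted.com/items/111-shoe', 'https://www.vinted.com/items/222-jacket']
```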

Putting it together (paginate)


def crawl_search(query: str, pages: int = 3, *, use_proxy: bool = True) -> list[str]:
    client = HttpClient()

    all_urls: list[str] = []
    seen = set()

    for page in range(1, pages + 1):
        url = build_search_url(query, page=page)
        res = client.get_html(url, use_proxy=use_proxy)

        batch = extract_listing_urls_from_search(res.text)
        print(f"page {page}: found {len(batch)} listing urls")

        # Some pages may contain repeated links; dedupe globally.
        for u in batch:
            if u in seen:
                continue
            seen.add(u)
            all_urls.append(u)

        # Be polite; tune for your needs.
        time.sleep(1.0)

    return all_urls


if __name__ == "__main__":
    urls = crawl_search("patagonia fleece", pages=2)
    print("unique listing urls:", len(urls))
    print(urls[:5])

Step 3: Scrape a listing detail page (title, price, brand, images)

Now the interesting part: extract structured data from a listing page.

On many modern sites, listing pages include embedded JSON (often in a script tag). When it exists, parsing that JSON is more stable than scraping spans.

We’ll try two approaches:

  • Approach A: parse embedded JSON if present
  • Approach B: fallback to HTML selectors for title/price

Extract images + basic fields

import json


def _find_embedded_json(soup: BeautifulSoup) -> dict | None:
    # Vinted (and many Next.js apps) may embed state in script tags.
    # This function is defensive: it searches for JSON blobs and returns the first parseable dict.
    scripts = soup.select("script")
    for sc in scripts:
        txt = sc.string
        if not txt:
            continue
        t = txt.strip()
        if not t:
            continue
        if t.startswith("{") and t.endswith("}") and len(t) > 200:
            try:
                obj = json.loads(t)
                if isinstance(obj, dict):
                    return obj
            except Exception:
                pass
    return None


def parse_listing(html: str, url: str) -> dict:
    soup = BeautifulSoup(html, "lxml")

    data = {
        "url": url,
        "title": None,
        "price": None,
        "currency": None,
        "brand": None,
        "size": None,
        "condition": None,
        "images": [],
    }

    # Try JSON first
    js = _find_embedded_json(soup)
    if js:
        # We can’t assume exact schema (varies by deployment/locale).
        # So we search for image URLs anywhere in the JSON.
        imgs = []

        def walk(x):
            if isinstance(x, dict):
                for k, v in x.items():
                    if isinstance(v, (dict, list)):
                        walk(v)
                    else:
                        if isinstance(v, str) and ("vinted" in v) and (".jpg" in v or ".png" in v):
                            imgs.append(v)
            elif isinstance(x, list):
                for i in x:
                    walk(i)

        walk(js)
        # De-dupe while preserving order
        seen = set()
        for u in imgs:
            if u in seen:
                continue
            seen.add(u)
            data["images"].append(u)

    # Fallback HTML selectors for title/price if JSON wasn’t helpful
    if not data["title"]:
        h1 = soup.select_one("h1")
        if h1:
            data["title"] = h1.get_text(" ", strip=True)

    # Price: try meta first
    price_meta = soup.select_one('meta[property="product:price:amount"], meta[itemprop="price"]')
    if price_meta and price_meta.get("content"):
        data["price"] = price_meta.get("content")

    currency_meta = soup.select_one('meta[property="product:price:currency"], meta[itemprop="priceCurrency"]')
    if currency_meta and currency_meta.get("content"):
        data["currency"] = currency_meta.get("content")

    # If we didn’t find image URLs from JSON, also try og:image
    if not data["images"]:
        og = soup.select_one('meta[property="og:image"]')
        if og and og.get("content"):
            data["images"] = [og.get("content")]

    return data
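
To see the fallback selectors in isolation, here is the same meta-tag logic run against a synthetic page (invented markup; Vinted's real head tags may differ, which is exactly why the parser treats these as fallbacks):

```python
from bs4 import BeautifulSoup

html = """
<html><head>
  <meta property="og:image" content="https://images.example/1.jpg">
  <meta property="product:price:amount" content="24.50">
  <meta property="product:price:currency" content="EUR">
</head><body><h1>Patagonia Better Sweater</h1></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Same selectors as parse_listing's fallback path.
title = soup.select_one("h1").get_text(" ", strip=True)
price = soup.select_one('meta[property="product:price:amount"]')["content"]
currency = soup.select_one('meta[property="product:price:currency"]')["content"]
image = soup.select_one('meta[property="og:image"]')["content"]

print(title, price, currency, image)
```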

Fetch + parse listing details at scale


def crawl_listing_details(urls: list[str], *, use_proxy: bool = True, limit: int = 30) -> list[dict]:
    client = HttpClient()
    out: list[dict] = []

    for i, url in enumerate(urls[:limit], start=1):
        res = client.get_html(url, use_proxy=use_proxy)
        item = parse_listing(res.text, url)
        out.append(item)

        print(f"{i}/{min(limit, len(urls))} title={item.get('title')!r} images={len(item.get('images') or [])}")
        time.sleep(1.0)

    return out

Step 4 (optional): Download listing images

Once you have image URLs, downloading is straightforward. The main thing is respecting bandwidth and timeouts.

from pathlib import Path


def download_images(items: list[dict], out_dir: str = "vinted_images") -> None:
    client = HttpClient()
    base = Path(out_dir)
    base.mkdir(parents=True, exist_ok=True)

    for item in items:
        url = item.get("url")
        images = item.get("images") or []
        if not images:
            continue

        # Make a stable folder name
        safe = (url.split("/items/")[-1] if "/items/" in url else "item").split("?")[0]
        folder = base / safe
        folder.mkdir(parents=True, exist_ok=True)

        for idx, img_url in enumerate(images[:10], start=1):
            try:
                r = client.s.get(img_url, timeout=TIMEOUT)
                r.raise_for_status()
                ext = ".jpg" if ".jpg" in img_url else ".png" if ".png" in img_url else ".bin"
                path = folder / f"{idx:02d}{ext}"
                path.write_bytes(r.content)
            except Exception as e:
                print("failed image", img_url, e)

        time.sleep(0.5)

Practical notes (what breaks, and how to fix it)

1) Pagination isn’t always “page=2”

Some locales or experiments may use different params. If you notice you’re getting the same results on every page:

  • print the search URL you’re hitting
  • print the first 3 listing URLs on each page
  • check whether the HTML contains a “next page” link, then follow it

A robust improvement is to parse a “next” URL from the HTML (when present) instead of constructing it.

2) Bot challenges / rate limiting

If you start seeing:

  • HTTP 429
  • 403/401
  • HTML that looks like a challenge page

Then you need to reduce concurrency, add delays, and use a stable proxy layer. That’s where ProxiesAPI helps.
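
A lightweight heuristic can flag such responses before you try to parse them (the marker strings below are assumptions; tune them to what you actually observe):

```python
CHALLENGE_MARKERS = ("captcha", "verify you are human", "access denied")


def looks_like_challenge(status_code: int, html: str) -> bool:
    """Heuristic: flag responses that deserve a slower retry or proxy rotation."""
    if status_code in (401, 403, 429):
        return True
    low = html.lower()
    # Challenge pages are often short and contain telltale phrases.
    return len(low) < 2000 and any(m in low for m in CHALLENGE_MARKERS)


print(looks_like_challenge(429, ""))                               # True
print(looks_like_challenge(200, "<html>captcha required</html>"))  # True
```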

3) Always scrape “cards → details”

Card data is often incomplete. Detail pages are richer and closer to the source of truth.


End-to-end example (search → details → JSON export)

import json


def main():
    query = "patagonia fleece"

    urls = crawl_search(query, pages=2, use_proxy=True)
    items = crawl_listing_details(urls, use_proxy=True, limit=25)

    with open("vinted_listings.json", "w", encoding="utf-8") as f:
        json.dump(items, f, ensure_ascii=False, indent=2)

    print("wrote vinted_listings.json", len(items))


if __name__ == "__main__":
    main()

Where ProxiesAPI fits (honestly)

You can often scrape a few pages of Vinted directly.

But if you’re building a dataset (hundreds/thousands of listing pages), the failure modes stack up:

  • inconsistent rate limits
  • IP reputation decay during long crawls
  • intermittent 5xx/429 bursts

ProxiesAPI helps by giving you a consistent proxy layer you can route requests through—without rewriting your scraper.


QA checklist

  • Search crawl returns unique /items/… URLs
  • Listing parser extracts at least: title, price (when available), and images
  • JSON export loads cleanly and matches your expectations
  • You have delays/timeouts (no infinite hangs)
  • You can re-run without duplicating data (add a seen set / persistent store when scaling)
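
For that last point, a minimal persistent seen-set can be a plain text file of URLs (a sketch; swap in SQLite or a real store when the crawl grows):

```python
import os
import tempfile
from pathlib import Path


class SeenStore:
    """Persist already-crawled URLs so re-runs skip finished work."""

    def __init__(self, path: str = "seen_urls.txt"):
        self.path = Path(path)
        self.seen = set(self.path.read_text().splitlines()) if self.path.exists() else set()

    def add(self, url: str) -> bool:
        """Record url; return True if it was new."""
        if url in self.seen:
            return False
        self.seen.add(url)
        with self.path.open("a", encoding="utf-8") as f:
            f.write(url + "\n")
        return True


# Demo against a fresh temp file so repeated runs behave the same.
demo_path = os.path.join(tempfile.mkdtemp(), "seen.txt")
store = SeenStore(demo_path)
print(store.add("https://www.vinted.com/items/1-a"))  # True
print(store.add("https://www.vinted.com/items/1-a"))  # False
```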

Related guides

Scrape Vinted Listings with Python: Search, Prices, Images, and Pagination
Build a dataset from Vinted search results (title, price, size, condition, seller, images) with a production-minded Python scraper + a proxy-backed fetch layer via ProxiesAPI.
Scrape Rightmove Sold Prices with Python: Sold Listings + Price History Dataset (with ProxiesAPI)
Build a Rightmove Sold Prices scraper: crawl sold-property results, paginate, fetch property detail pages, and normalize into a clean dataset. Includes a target-page screenshot and ProxiesAPI integration.
Scrape TripAdvisor Hotel Reviews with Python (Pagination + Rate Limits)
Extract TripAdvisor hotel review text, ratings, dates, and reviewer metadata with a resilient Python scraper (pagination, retries, and a proxy-backed fetch layer via ProxiesAPI).
Scrape Product Comparisons from CNET (Python + ProxiesAPI)
Collect CNET comparison tables and spec blocks, normalize the data into a clean dataset, and keep the crawl stable with retries + ProxiesAPI. Includes screenshot workflow.