Scrape Pinterest Images and Pins (Search + Board URLs) with Python + ProxiesAPI
Pinterest is one of those sites where “just curl the HTML” works for quick exploration… until it doesn’t.
- some pages are server-rendered enough to parse
- some content is hydrated by JS
- anti-bot can kick in when you paginate or hit multiple boards quickly
In this guide we’ll build a practical Pinterest scraper in Python that supports two common workflows:
- Search pages (e.g. “kitchen design”)
- Board pages (e.g. https://www.pinterest.com/<user>/<board>/)
We’ll extract:
- pin title / alt text
- best image URL we can find (often i.pinimg.com)
- pin URL
- outbound “destination” URL when available
- board metadata (name, followers if visible)
We’ll also add:
- pagination / continuation (best-effort)
- retries + backoff
- dedupe
- JSONL export

Pinterest can throttle aggressively once you scale beyond a few requests. ProxiesAPI helps you keep a consistent network layer (retries, rotation, higher success rates) while your parser stays the same.
Important note (what’s realistic)
Pinterest changes its markup frequently and uses heavy client-side rendering.
This tutorial focuses on a robust, best-effort HTML approach:
- It works when the response contains enough HTML / embedded JSON
- It may require tweaks when Pinterest ships UI changes
If you need guaranteed long-term stability, you typically move to:
- a browser pipeline (Playwright) with careful throttling, or
- a data partner / official API
But for many use cases (collecting inspiration pins from specific boards, quick monitoring, internal datasets), HTML + defensive parsing is still useful.
Setup
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
We’ll use:
- requests for HTTP
- BeautifulSoup (with lxml) for parsing
ProxiesAPI request wrapper (recommended)
You’ll plug ProxiesAPI into a single fetch() function. Everything else (parsing, pagination, export) stays the same.
Below is a template you can adapt to your ProxiesAPI account.
import os
import time
import random
import requests

TIMEOUT = (15, 45)  # connect, read

# Put your key in env: export PROXIESAPI_KEY="..."
PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY")

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}

session = requests.Session()

def fetch(url: str, *, use_proxiesapi: bool = True, max_retries: int = 5) -> str:
    """Fetch a URL with retries + exponential backoff.

    If you use ProxiesAPI, keep this function as the only place that knows about it.
    """
    last_err = None
    for attempt in range(1, max_retries + 1):
        try:
            if use_proxiesapi:
                if not PROXIESAPI_KEY:
                    raise RuntimeError("Missing PROXIESAPI_KEY env var")
                # Example pattern: call ProxiesAPI with the upstream URL.
                # Replace endpoint/params with the exact ProxiesAPI format you use.
                r = session.get(
                    "https://api.proxiesapi.com",
                    params={
                        "auth_key": PROXIESAPI_KEY,
                        "url": url,
                        # Optional knobs (names depend on your ProxiesAPI plan):
                        # "country": "US",
                        # "render": "false",
                    },
                    timeout=TIMEOUT,
                    headers=HEADERS,
                )
            else:
                r = session.get(url, timeout=TIMEOUT, headers=HEADERS)
            r.raise_for_status()
            return r.text
        except Exception as e:
            last_err = e
            if attempt < max_retries:
                # Backoff with jitter; no sleep after the final attempt
                sleep_s = min(30, (2 ** (attempt - 1)) + random.random())
                time.sleep(sleep_s)
    raise RuntimeError(f"Failed to fetch after {max_retries} retries: {url}") from last_err
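For intuition, here is the delay schedule fetch() produces, shown in a standalone sketch with the random jitter term removed:

```python
def backoff_schedule(max_retries: int = 5) -> list[int]:
    # Same formula as fetch(): doubling per attempt, capped at 30 seconds.
    return [min(30, 2 ** (attempt - 1)) for attempt in range(1, max_retries + 1)]

print(backoff_schedule())   # [1, 2, 4, 8, 16]
print(backoff_schedule(7))  # the 30-second cap kicks in from attempt 6
```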
What Pinterest pages look like (what to parse)
Pinterest pages often contain:
- visible HTML with some pin cards
- embedded JSON blobs in <script> tags that contain richer pin data
We’ll implement two extraction strategies:
- HTML image + link scraping (fast, sometimes enough)
- Embedded JSON discovery (more stable when present)
Step 1: Parse pins from HTML (cards → images)
On many Pinterest pages, you can find images that look like:
https://i.pinimg.com/.../xxx.jpg
We’ll treat each candidate image as a “pin-ish” item and then try to locate a nearby link.
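One practical note: i.pinimg.com URLs usually embed a size segment like /236x/ or /736x/, and swapping it for /originals/ often returns the full-size image. This relies on an undocumented URL convention, so keep the original URL as a fallback:

```python
import re

def upgrade_pinimg_url(url: str) -> str:
    # Best-effort: replace the first size segment (e.g. /236x/) with /originals/.
    # Undocumented convention; the upgraded URL can 404, so keep the original too.
    return re.sub(r"/\d+x/", "/originals/", url, count=1)

print(upgrade_pinimg_url("https://i.pinimg.com/236x/ab/cd/ef/pin.jpg"))
# https://i.pinimg.com/originals/ab/cd/ef/pin.jpg
```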
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "https://www.pinterest.com"

def normalize_pin_url(href: str | None) -> str | None:
    if not href:
        return None
    if href.startswith("http"):
        return href
    return urljoin(BASE, href)

def parse_pins_from_html(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    pins = []
    # Pinterest images typically come from i.pinimg.com
    for img in soup.select('img[src*="i.pinimg.com"]'):
        src = img.get("src")
        alt = (img.get("alt") or "").strip() or None
        # Heuristic: closest anchor up the tree
        a = img.find_parent("a")
        href = a.get("href") if a else None
        pins.append({
            "title": alt,
            "image_url": src,
            "pin_url": normalize_pin_url(href),
        })
    # Dedupe by (pin_url or image_url)
    seen = set()
    out = []
    for p in pins:
        key = p.get("pin_url") or p.get("image_url")
        if not key or key in seen:
            continue
        seen.add(key)
        out.append(p)
    return out
This alone can give you a usable dataset (title + image URL + pin URL).
But if you want outbound destination URLs (the link a pin points to), you’ll usually need embedded JSON.
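Since this dedupe loop reappears in the later extractors, you could factor it into a shared helper with the same key preference order (a small refactor sketch, not required by the tutorial):

```python
def dedupe_pins(pins: list[dict]) -> list[dict]:
    """Keep the first occurrence of each pin, keyed by id, pin_url, or image_url."""
    seen = set()
    out = []
    for p in pins:
        key = p.get("id") or p.get("pin_url") or p.get("image_url")
        if not key or key in seen:
            continue  # drop exact duplicates and keyless rows
        seen.add(key)
        out.append(p)
    return out
```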
Step 2: Extract embedded JSON (best-effort)
Pinterest often embeds JSON data in script tags.
We’ll:
- scan scripts
- find JSON-ish blobs
- parse objects that look like pins
Because this varies, we’ll implement a conservative extractor that doesn’t assume a single schema.
import json
import re

def iter_json_blobs(html: str):
    # Crude but effective: look for large JSON blobs in <script> tags
    for m in re.finditer(r"<script[^>]*>(.*?)</script>", html, flags=re.S | re.I):
        s = m.group(1).strip()
        if not s:
            continue
        # Quick filter
        if "{" not in s and "[" not in s:
            continue
        # Sometimes Pinterest assigns JSON to a variable; try to locate the first { ... } block.
        # This won't catch everything, but it avoids overfitting.
        start = s.find("{")
        if start == -1:
            start = s.find("[")
        if start == -1:
            continue
        candidate = s[start:]
        # Try JSON parse directly
        try:
            yield json.loads(candidate)
        except Exception:
            continue

def walk(obj):
    if isinstance(obj, dict):
        yield obj
        for v in obj.values():
            yield from walk(v)
    elif isinstance(obj, list):
        for it in obj:
            yield from walk(it)
def parse_pins_from_embedded_json(html: str) -> list[dict]:
    pins = []
    for blob in iter_json_blobs(html):
        for node in walk(blob):
            # Heuristic: nodes that look like pin objects may have an id + images
            pin_id = node.get("id") or node.get("pin_id")
            images = node.get("images") or node.get("image")
            # Try to locate an image URL in common shapes
            image_url = None
            if isinstance(images, dict):
                # Pinterest sometimes offers multiple sizes
                for key in ["orig", "original", "736x", "564x", "474x"]:
                    if key in images and isinstance(images[key], dict):
                        image_url = images[key].get("url")
                        if image_url:
                            break
                if not image_url:
                    # Fall back to any dict value with a url
                    for v in images.values():
                        if isinstance(v, dict) and v.get("url"):
                            image_url = v.get("url")
                            break
            title = node.get("title") or node.get("grid_title") or node.get("description")
            if not isinstance(title, str):
                title = None  # guard: these fields are occasionally non-string nodes
            link = node.get("link") or node.get("url")
            # Destination URL is often in fields like "link" or "destination_url"
            destination = node.get("destination_url") or node.get("outbound_link")
            if (pin_id or image_url) and (image_url or link):
                pins.append({
                    "id": str(pin_id) if pin_id else None,
                    "title": (title or "").strip() or None,
                    "image_url": image_url,
                    "pin_url": link if isinstance(link, str) else None,
                    "destination_url": destination if isinstance(destination, str) else None,
                })
    # Dedupe
    seen = set()
    out = []
    for p in pins:
        key = p.get("id") or p.get("pin_url") or p.get("image_url")
        if not key or key in seen:
            continue
        seen.add(key)
        out.append(p)
    return out
You’ll notice we didn’t hardcode a single schema. That’s intentional.
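As a quick sanity check, here is the script-scanning idea run against a synthetic page. This is a standalone, simplified version of iter_json_blobs that returns only the first parseable blob; the JSON payload is made up for illustration:

```python
import json
import re

def first_json_blob(html: str):
    # Simplified iter_json_blobs: return the first <script> body that parses as JSON.
    for m in re.finditer(r"<script[^>]*>(.*?)</script>", html, flags=re.S | re.I):
        s = m.group(1).strip()
        start = s.find("{")
        if start == -1:
            continue
        try:
            return json.loads(s[start:])
        except Exception:
            continue
    return None

page = ('<html><script type="application/json">'
        '{"pin_id": "123", "images": {"orig": {"url": "https://i.pinimg.com/originals/a/b/c.jpg"}}}'
        '</script></html>')
blob = first_json_blob(page)
print(blob["images"]["orig"]["url"])
# https://i.pinimg.com/originals/a/b/c.jpg
```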
Step 3: Scrape Pinterest search results
Search URLs look like:
https://www.pinterest.com/search/pins/?q=kitchen%20design
Pinterest pagination can be complex. For a tutorial that stays maintainable, we’ll:
- fetch the first page
- parse pins from HTML + embedded JSON
- optionally attempt to fetch “more” by using ?rs=typed or additional parameters (best-effort)
from urllib.parse import urlencode

def build_search_url(query: str) -> str:
    qs = urlencode({"q": query})
    return f"https://www.pinterest.com/search/pins/?{qs}"

def scrape_search(query: str, *, pages: int = 1) -> list[dict]:
    all_pins = []
    seen = set()
    url = build_search_url(query)
    for page in range(1, pages + 1):
        html = fetch(url)
        batch = []
        batch.extend(parse_pins_from_embedded_json(html))
        batch.extend(parse_pins_from_html(html))
        for p in batch:
            key = p.get("id") or p.get("pin_url") or p.get("image_url")
            if not key or key in seen:
                continue
            seen.add(key)
            all_pins.append(p)
        print("page", page, "pins", len(batch), "total unique", len(all_pins))
        # Best-effort pagination: Pinterest often needs continuation tokens.
        # For a production pipeline, you'd capture their internal next-page JSON calls.
        # Here we stop after the first page unless you extend this section.
        break
    return all_pins
For many use cases, you can run the search repeatedly (different keywords) instead of deep-paginating one keyword.
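That keyword fan-out might look like the sketch below. The scrape_fn parameter would be scrape_search from above; the stub here stands in for it so the dedupe behavior is visible without network calls:

```python
def scrape_many(queries: list[str], scrape_fn) -> list[dict]:
    """Run scrape_fn (e.g. scrape_search) per query and dedupe across all results."""
    seen = set()
    out = []
    for q in queries:
        for p in scrape_fn(q):
            key = p.get("id") or p.get("pin_url") or p.get("image_url")
            if not key or key in seen:
                continue
            seen.add(key)
            out.append(p)
    return out

# Stub for illustration: every query "returns" the same pin id,
# so only the first occurrence survives the cross-query dedupe.
stub = lambda q: [{"id": "1", "title": q}]
print(scrape_many(["kitchen design", "kitchen ideas"], stub))
```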
Step 4: Scrape a Board URL
Board URLs are commonly:
https://www.pinterest.com/<username>/<board>/
Boards are a great target because the intent is clear: “pins in this collection.”
def scrape_board(board_url: str) -> list[dict]:
    html = fetch(board_url)
    pins = []
    pins.extend(parse_pins_from_embedded_json(html))
    pins.extend(parse_pins_from_html(html))
    # Board metadata (best-effort)
    soup = BeautifulSoup(html, "lxml")
    h1 = soup.select_one("h1")
    board_name = h1.get_text(" ", strip=True) if h1 else None
    for p in pins:
        p["board_url"] = board_url
        p["board_name"] = board_name
    # Dedupe
    seen = set()
    out = []
    for p in pins:
        key = p.get("id") or p.get("pin_url") or p.get("image_url")
        if not key or key in seen:
            continue
        seen.add(key)
        out.append(p)
    return out
Export to JSONL
import json

def write_jsonl(path: str, rows: list[dict]):
    with open(path, "w", encoding="utf-8") as f:
        for r in rows:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    pins = scrape_search("kitchen design", pages=1)
    write_jsonl("pinterest_search_kitchen_design.jsonl", pins)
    print("wrote", len(pins))

    # Example board (replace with a real, public board)
    # board_pins = scrape_board("https://www.pinterest.com/<user>/<board>/")
    # write_jsonl("pinterest_board.jsonl", board_pins)
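A matching reader makes it easy to load an export back for analysis (one dict per line, blank lines skipped):

```python
import json

def read_jsonl(path: str) -> list[dict]:
    # Inverse of write_jsonl: parse one JSON object per non-blank line.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```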
QA checklist
- First page returns 20+ pins (varies)
- Image URLs are i.pinimg.com and load in a browser
- Pin URLs look like Pinterest URLs (not None)
- Dedupe reduces duplicates from HTML + JSON extraction overlap
- Your fetch layer uses timeouts + retries
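You can automate part of this checklist with a few assertions over the exported rows. This is a sketch, and the thresholds and host checks are assumptions to adjust for your use case:

```python
def validate_pins(pins: list[dict], min_count: int = 1) -> list[str]:
    """Return a list of human-readable QA problems (empty list = pass)."""
    problems = []
    if len(pins) < min_count:
        problems.append(f"only {len(pins)} pins (expected >= {min_count})")
    for p in pins:
        img = p.get("image_url")
        if img and "i.pinimg.com" not in img:
            problems.append(f"unexpected image host: {img}")
        pin_url = p.get("pin_url")
        if pin_url and "pinterest.com" not in pin_url:
            problems.append(f"unexpected pin URL: {pin_url}")
    return problems

ok_row = {"image_url": "https://i.pinimg.com/236x/a.jpg",
          "pin_url": "https://www.pinterest.com/pin/1/"}
print(validate_pins([ok_row]))  # []
```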
Where ProxiesAPI helps (honestly)
Pinterest is one of the sites where the network layer is your main pain:
- throttling increases with pagination
- intermittent 403/429 responses
- inconsistent HTML/JS payloads
ProxiesAPI helps keep fetches reliable while your parser stays focused on structure and data quality.
If you extend this guide, the next step is to capture the internal continuation requests (tokens) and implement deep pagination — ProxiesAPI makes that significantly less flaky.