How to Scrape Shopify Stores: Products, Prices, and Inventory (2026)

If you’re trying to scrape an e-commerce site in 2026, there’s a good chance it’s running on Shopify.

The good news: Shopify exposes a bunch of structured endpoints that are far easier to work with than scraping messy HTML.

The tricky part: every store has different themes/apps, and scraping at scale can trigger blocks.

This guide walks you through practical, repeatable ways to extract:

  • product names and URLs
  • prices (including variant prices)
  • inventory/availability signals

And we’ll do it in a way that’s:

  • robust to theme differences
  • respectful (rate limits, ethical usage)
  • easy to adapt to many stores


Scale Shopify monitoring with a stable proxy layer

Shopify stores vary wildly, and rate limits/blocks show up fast when you monitor many stores. A proxy layer (like ProxiesAPI) can help keep your data collection consistent.


First: know what you can and can’t reliably get

Shopify stores can expose different “levels” of product data:

| Data | Often available? | Best source |
| --- | --- | --- |
| Product title, handle, URL | Yes | /products.json or HTML → JSON endpoints |
| Variant prices | Yes | product JSON (variants[]) |
| Availability (in stock) | Sometimes | variant available field, or inventory_quantity when exposed |
| Exact inventory counts | Rare (public) | usually not available without authenticated APIs |

If you need exact inventory quantities, you often can't get them ethically or legally from public endpoints.

But for many monitoring use cases (price tracking, assortment tracking), availability + price is enough.


The easiest win: /products.json

Many Shopify stores expose:

  • https://STORE_DOMAIN/products.json?limit=250&page=1

This returns a JSON payload with a products array.

Python example: fetch + parse

import requests
from urllib.parse import urljoin

TIMEOUT = (10, 30)


def fetch_products_json(store_base: str, page: int = 1, limit: int = 250) -> dict:
    url = urljoin(store_base, f"/products.json?limit={limit}&page={page}")
    r = requests.get(url, timeout=TIMEOUT, headers={
        "User-Agent": "Mozilla/5.0",
        "Accept": "application/json",
    })
    r.raise_for_status()
    return r.json()


def extract_products(payload: dict) -> list[dict]:
    out = []
    for p in payload.get("products", []):
        handle = p.get("handle")
        product_url = f"/products/{handle}" if handle else None

        variants = p.get("variants", []) or []
        # choose a representative price
        prices = []
        for v in variants:
            if v.get("price") is not None:
                try:
                    prices.append(float(v["price"]))
                except Exception:
                    pass

        out.append({
            "id": p.get("id"),
            "title": p.get("title"),
            "handle": handle,
            "url": product_url,
            "vendor": p.get("vendor"),
            "product_type": p.get("product_type"),
            "price_min": min(prices) if prices else None,
            "price_max": max(prices) if prices else None,
            "variant_count": len(variants),
        })

    return out

Pagination pattern

Keep requesting pages until you get fewer than limit products.


import time


def crawl_store_products(store_base: str, limit: int = 250, max_pages: int = 20) -> list[dict]:
    all_products = []

    for page in range(1, max_pages + 1):
        payload = fetch_products_json(store_base, page=page, limit=limit)
        batch = extract_products(payload)

        print("page", page, "products", len(batch))
        all_products.extend(batch)

        # a short page means we've reached the end of the catalog
        if len(payload.get("products", [])) < limit:
            break

        time.sleep(1.0)  # be polite between pages

    return all_products

More targeted: collections → products (best for large catalogs)

On bigger stores, /products.json may be disabled or throttled.

A more targeted route is:

  1. discover collection handles (from HTML nav, sitemap, or known URLs)
  2. fetch:
    • https://STORE_DOMAIN/collections/{handle}/products.json?limit=250&page=1

This gives you just products in that collection.
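The collection endpoint can be fetched with the same shape as the earlier fetch_products_json. A minimal sketch — the collection_products_url helper is just for illustration, and the payload shape matches /products.json:

```python
import requests

TIMEOUT = (10, 30)


def collection_products_url(store_base: str, handle: str) -> str:
    # Collection-scoped products endpoint
    return f"{store_base.rstrip('/')}/collections/{handle}/products.json"


def fetch_collection_products(store_base: str, handle: str,
                              page: int = 1, limit: int = 250) -> list[dict]:
    r = requests.get(
        collection_products_url(store_base, handle),
        params={"limit": limit, "page": page},
        timeout=TIMEOUT,
        headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    )
    r.raise_for_status()
    return r.json().get("products", [])
```

Pagination works the same way as before: keep requesting pages until a page returns fewer than limit products.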


Parse variant availability (inventory “signal”)

Within a product JSON object, variants often include:

  • available (boolean)
  • inventory_management (string or null)
  • sometimes inventory_quantity (not always present)

Example extraction:


def extract_variants(product: dict) -> list[dict]:
    out = []
    for v in product.get("variants", []) or []:
        out.append({
            "variant_id": v.get("id"),
            "title": v.get("title"),
            "price": v.get("price"),
            "available": v.get("available"),
            "sku": v.get("sku"),
        })
    return out

If available is missing, you can sometimes infer availability by:

  • checking if an “Add to cart” form exists in HTML
  • looking for "available":true in embedded JSON (window.__st) depending on theme

But prefer JSON endpoints first.
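As a rough sketch of the add-to-cart heuristic, assuming the common theme pattern of a form posting to /cart/add — themes vary, so treat this as a signal, not ground truth:

```python
from bs4 import BeautifulSoup


def looks_purchasable(html: str) -> bool:
    """Heuristic: many Shopify themes render an add-to-cart form
    posting to /cart/add when a product can be bought."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.select_one('form[action*="/cart/add"]')
    if form is None:
        return False
    # Many themes disable the submit button when a variant is sold out
    button = form.select_one('button[type="submit"], input[type="submit"]')
    return button is None or not button.has_attr("disabled")
```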


When JSON endpoints are blocked: HTML → embedded JSON

Some stores restrict /products.json.

In that case, fetch the product page HTML and look for embedded JSON.

Common patterns:

  • <script type="application/ld+json"> (structured product data; often has price)
  • application/json script tags used by themes

A minimal parser using BeautifulSoup:

import json
from bs4 import BeautifulSoup


def extract_ld_json(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    out = []
    for s in soup.select('script[type="application/ld+json"]'):
        try:
            out.append(json.loads(s.get_text(strip=True)))
        except Exception:
            continue
    return out

LD+JSON is not perfect, but it’s a stable fallback for:

  • product name
  • canonical URL
  • offer price

Avoid blocks: practical advice

Shopify stores have WAF/CDN layers and app stacks. Blocks show up as:

  • 403 responses
  • HTML “challenge” pages
  • sudden empty JSON

What helps:

  1. Slow down

    • start at 0.5–1.5 requests/second
  2. Retry carefully

    • exponential backoff
    • stop after N failures
  3. Cache

    • don’t refetch the full catalog every minute
  4. Spread traffic

    • if you monitor many stores, distribute requests across time
  5. Use a stable proxy layer when needed

    • especially if your IP gets flagged during high-volume monitoring
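The retry advice can be sketched as a small wrapper — fetch_with_backoff here is illustrative and wraps any zero-argument callable that raises on failure:

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts: int = 4, base_delay: float = 1.0):
    """Call fetch() with exponential backoff plus jitter.
    Re-raises the last error once max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # stop after N failures
            # 1s, 2s, 4s, ... plus jitter so retries don't synchronize
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Usage is just fetch_with_backoff(lambda: fetch_products_json(store, page=1)).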

Comparison: JSON endpoints vs HTML scraping

| Approach | Pros | Cons | Best for |
| --- | --- | --- | --- |
| /products.json | structured, fast, easy parsing | sometimes disabled/throttled | catalogs, price tracking |
| collections/.../products.json | targeted, scalable | needs collection discovery | large stores |
| HTML + embedded JSON | works when JSON endpoints blocked | more brittle, heavier pages | fallback / enrichment |

In practice, build your scraper with a tiered strategy:

  1. try JSON endpoint
  2. fall back to collections JSON
  3. fall back to HTML → embedded JSON
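The tiered strategy can be expressed as a chain of fallbacks — fetch_catalog here is a hypothetical orchestrator that tries each source in order and returns the first non-empty result:

```python
def fetch_catalog(store_base: str, fetchers: list) -> list[dict]:
    """Try each fetcher in order; the first non-empty result wins.
    `fetchers` are callables like the products.json, collections,
    and HTML fallbacks sketched earlier."""
    for fetch in fetchers:
        try:
            products = fetch(store_base)
        except Exception:
            continue  # blocked or disabled endpoint: try the next tier
        if products:
            return products
    return []
```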

A simple “store monitor” shape

If you’re monitoring prices/availability, you want incremental runs:

  • store last-seen price per variant
  • alert when price changes or availability flips

Even a SQLite database works great for this.
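A sketch of that incremental shape with the stdlib sqlite3 module — the table name and change-event strings are illustrative:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS variant_state (
    variant_id INTEGER PRIMARY KEY,
    price REAL,
    available INTEGER
)
"""


def record_variant(conn: sqlite3.Connection, variant: dict) -> list[str]:
    """Upsert one variant's last-seen state; return change events."""
    changes = []
    row = conn.execute(
        "SELECT price, available FROM variant_state WHERE variant_id = ?",
        (variant["variant_id"],),
    ).fetchone()
    price = float(variant["price"]) if variant.get("price") is not None else None
    available = 1 if variant.get("available") else 0
    if row is not None:
        old_price, old_available = row
        if old_price != price:
            changes.append(f"price {old_price} -> {price}")
        if old_available != available:
            changes.append("back in stock" if available else "out of stock")
    conn.execute(
        "INSERT INTO variant_state (variant_id, price, available) VALUES (?, ?, ?) "
        "ON CONFLICT(variant_id) DO UPDATE SET price = excluded.price, "
        "available = excluded.available",
        (variant["variant_id"], price, available),
    )
    conn.commit()
    return changes
```

Feed it the rows produced by extract_variants on each run and alert on whatever record_variant returns.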


Ethics and Terms

E-commerce scraping has real risks. Do the boring-but-important checks:

  • read the store’s Terms of Service
  • don’t scrape personal data
  • respect rate limits
  • be transparent if used commercially

Summary

To scrape Shopify stores reliably in 2026:

  • start with /products.json (limit 250 + page)
  • use collections JSON for big catalogs
  • treat inventory counts as usually private; use availability signals
  • add retries/backoff and cache aggressively
  • use a proxy layer only when scale requires it

Related guides

How to Scrape E-Commerce Websites: A Practical Guide
A practical playbook for ecommerce scraping: category discovery, pagination patterns, product detail extraction, variants, rate limits, retries, and proxy-backed fetching with ProxiesAPI.
Web Scraping with Ruby: Nokogiri + HTTParty Tutorial (2026)
A practical Ruby scraping guide: fetch pages with HTTParty, parse HTML with Nokogiri, handle pagination, add retries, and rotate proxies responsibly.
Scrape Wikipedia Article Data at Scale (Tables + Infobox + Links)
Extract structured fields from many Wikipedia pages (infobox + tables + links) with ProxiesAPI + Python, then save to CSV/JSON.
How to Scrape Apartment Listings from Apartments.com (Python + ProxiesAPI)
Scrape Apartments.com listing cards and detail-page fields with Python. Includes pagination, resilient parsing, retries, and clean JSON/CSV exports.