Scrape Costco Product Prices with Python (Search + Pagination + Product Pages)

Apr 21, 2026 · tutorial · #python, #costco, #price-scraping, #web-scraping, #beautifulsoup, #csv, #proxies

Costco is a great example of a “real-world” ecommerce target:

search pages (you start from a query)
listing pages (multiple results)
pagination (you need to crawl page 1…N)
product detail pages (true source of price + SKU-ish identifiers)

In this guide we’ll build a repeatable Costco price dataset with Python:

crawl search results for a query (e.g. protein)
collect product URLs across pagination
visit each product page and extract name, price, and availability (when the page exposes it)
export to CSV/JSON
add a resilient network layer with timeouts, retries, and ProxiesAPI integration

Keep Costco crawls stable with ProxiesAPI

Ecommerce targets tend to rate-limit and intermittently block repeat traffic. ProxiesAPI helps you run scheduled price crawls with fewer failures and less babysitting.

Get 1,000 free API calls View pricing

Important notes (before you start)

Websites change often. The selectors below reflect Costco’s markup at the time of writing and are deliberately broad so they are easy to update. Spot-check them against live HTML before trusting the output.
Costco may show different content by region and may require consent/login for some flows.
Be respectful: crawl slowly, cache results, and don’t hammer endpoints.

Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml pandas

We’ll use:

requests for HTTP
BeautifulSoup (with the lxml parser) for parsing
pandas for easy CSV export (optional)

Step 1: Build a robust fetcher (timeouts + retries)

You want a single place to control:

headers
timeouts
retry/backoff
proxy routing (where ProxiesAPI fits)

from __future__ import annotations

import random
import time
from dataclasses import dataclass

import requests

TIMEOUT = (10, 30)  # connect, read

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/123.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


@dataclass
class FetchConfig:
    use_proxiesapi: bool = True
    proxiesapi_endpoint: str | None = None
    max_retries: int = 4
    min_sleep: float = 0.8
    max_sleep: float = 1.8


class Fetcher:
    def __init__(self, cfg: FetchConfig):
        self.cfg = cfg
        self.s = requests.Session()
        self.s.headers.update(DEFAULT_HEADERS)

    def _sleep_jitter(self):
        time.sleep(random.uniform(self.cfg.min_sleep, self.cfg.max_sleep))

    def get(self, url: str) -> str:
        last_err: Exception | None = None

        # Where ProxiesAPI fits:
        # - If you have a ProxiesAPI HTTP(S) proxy endpoint, route traffic through it.
        # - Keep this as a config toggle so you can test without proxies.
        proxies = None
        if self.cfg.use_proxiesapi and self.cfg.proxiesapi_endpoint:
            proxies = {
                "http": self.cfg.proxiesapi_endpoint,
                "https": self.cfg.proxiesapi_endpoint,
            }

        for attempt in range(1, self.cfg.max_retries + 1):
            self._sleep_jitter()
            try:
                r = self.s.get(url, timeout=TIMEOUT, proxies=proxies)
            except requests.RequestException as e:
                # Network-level failure (timeout, reset, proxy error): retry.
                last_err = e
            else:
                # A few sites return 403/429 intermittently. Treat as retryable.
                if r.status_code not in (403, 429, 500, 502, 503, 504):
                    # Anything else (e.g. 404) fails fast instead of being retried.
                    r.raise_for_status()
                    return r.text
                last_err = requests.HTTPError(
                    f"HTTP {r.status_code} for {url}", response=r
                )

            # Back off before the next attempt, but not after the final one.
            if attempt < self.cfg.max_retries:
                time.sleep(1.2 ** attempt)

        raise RuntimeError(f"Failed after retries: {url}") from last_err
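
The fetcher above backs off by 1.2 ** attempt, which grows slowly. If you see sustained 429s, a steeper schedule with a cap and some jitter is a common alternative. The sketch below is one way to compute it; backoff_delay and its parameters are illustrative names, not part of the Fetcher class or of requests.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, factor: float = 2.0,
                  cap: float = 30.0, jitter: float = 0.5) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    # Exponential growth, capped so a long outage never produces huge sleeps.
    raw = min(cap, base * factor ** (attempt - 1))
    # Add up to `jitter * raw` of random extra wait so workers desynchronize.
    return raw + random.uniform(0, jitter * raw)


# Deterministic part of the schedule (jitter disabled) for the first five retries.
schedule = [backoff_delay(n, jitter=0.0) for n in range(1, 6)]
print(schedule)
```

To try it, replace the time.sleep(...) call in the retry loop with time.sleep(backoff_delay(attempt)) and tune base and cap to your crawl volume.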

Configure ProxiesAPI

Set your proxy endpoint as an env var (example name):

export PROXIESAPI_PROXY_URL="http://USER:PASS@proxy.proxiesapi.com:PORT"

Then in Python:

import os

cfg = FetchConfig(
    use_proxiesapi=True,
    proxiesapi_endpoint=os.getenv("PROXIESAPI_PROXY_URL"),
)
fetcher = Fetcher(cfg)

If you don’t have the endpoint yet, you can run with use_proxiesapi=False and still validate selectors.
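
One thing to watch: with use_proxiesapi=True and the env var unset, the fetcher quietly sends traffic directly. A small helper can make that decision explicit. This is a sketch; proxies_from_env is a hypothetical helper name, and PROXIESAPI_PROXY_URL is just the example variable name used above.

```python
import os
from typing import Mapping, Optional


def proxies_from_env(env: Mapping[str, str] = os.environ,
                     var: str = "PROXIESAPI_PROXY_URL") -> Optional[dict]:
    """Return a requests-style proxies dict, or None if the variable is unset or blank."""
    endpoint = (env.get(var) or "").strip()
    if not endpoint:
        return None
    # requests expects one entry per URL scheme being proxied.
    return {"http": endpoint, "https": endpoint}


# Example with an explicit mapping instead of the real environment.
print(proxies_from_env({"PROXIESAPI_PROXY_URL": "http://USER:PASS@host:8080"}))
print(proxies_from_env({}))
```

Logging the result once at startup (with credentials masked) tells you whether a given run was proxied or direct.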

Step 2: Costco URLs we’ll crawl

Costco search URLs typically look like:

Search: https://www.costco.com/CatalogSearch?dept=All&keyword=protein

Pagination/parameters can vary; the practical approach is:

Start from a search URL
Parse product card URLs from the HTML
Find the “next page” link (if any) and repeat
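
The loop itself is independent of Costco’s markup, so it can be sketched and tested offline first. In the example below, crawl_pages, FAKE_SITE, and the stub fetch/parse callables are all hypothetical stand-ins. The set of visited pages is an extra safeguard, assumed here, against a “next” link that points back to a page you already crawled.

```python
from typing import Callable, Optional


def crawl_pages(start_url: str,
                fetch: Callable[[str], str],
                parse: Callable[[str], tuple[list[str], Optional[str]]],
                max_pages: int = 5) -> list[str]:
    """Follow next-page links from start_url and return unique item URLs in order."""
    seen_pages: set[str] = set()
    found: dict[str, None] = {}  # dict keeps insertion order and deduplicates
    url: Optional[str] = start_url
    while url and url not in seen_pages and len(seen_pages) < max_pages:
        seen_pages.add(url)
        items, url = parse(fetch(url))
        for item_url in items:
            found.setdefault(item_url, None)
    return list(found)


# Stub "site": page key -> (item URLs, next page). p2 links back to p1 on purpose.
FAKE_SITE = {
    "p1": (["a", "b"], "p2"),
    "p2": (["b", "c"], "p1"),
}
result = crawl_pages("p1", fetch=lambda u: u, parse=lambda html: FAKE_SITE[html])
print(result)
```

Passing fetch and parse in as arguments also lets you swap the real network layer for a stub when you are only debugging selectors.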

Step 3: Parse search/listing pages (product cards)

We’ll extract:

product name
product URL
optional displayed price (sometimes visible on cards)

from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = "https://www.costco.com"


def parse_search_page(html: str) -> tuple[list[dict], str | None]:
    soup = BeautifulSoup(html, "lxml")

    items: list[dict] = []

    # Product tiles commonly contain an anchor to the PDP.
    # Use a broad selector, then normalize.
    for a in soup.select('a[href*=".product"]'):
        href = a.get("href")
        if not href:
            continue

        url = href if href.startswith("http") else urljoin(BASE, href)

        # The anchor text is usually the product title; image-only anchors have none.
        title = a.get_text(" ", strip=True) or None

        # No extra URL filter is needed: the selector above already restricts
        # matches to hrefs containing ".product".
        items.append({
            "title": title,
            "url": url,
        })

    # Pagination: look for a "next" link (site markup changes; keep logic forgiving).
    next_url = None
    next_a = soup.select_one('a[aria-label="Next"], a[rel="next"], a.pagination-next')
    if next_a and next_a.get("href"):
        href = next_a.get("href")
        next_url = href if href.startswith("http") else urljoin(BASE, href)

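    # Deduplicate by URL. A tile may contain several anchors to the same PDP
    # (image, title), so prefer an entry that actually carries a title.
    dedup: dict[str, dict] = {}
    for it in items:
        prev = dedup.get(it["url"])
        if prev is None or (not prev["title"] and it["title"]):
            dedup[it["url"]] = it

    return list(dedup.values()), next_url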

Sanity check the parser

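from urllib.parse import quote_plus

query = "protein"
# quote_plus keeps multi-word queries such as "whey protein" valid in the URL.
start = f"{BASE}/CatalogSearch?dept=All&keyword={quote_plus(query)}"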

html = fetcher.get(start)
items, next_url = parse_search_page(html)

print("items", len(items))
print("next", next_url)
print(items[:3])

Step 4: Parse a Costco product page (PDP)

On the product page, you want:

a stable product identifier (often embedded in the URL or in structured data)
title
price
availability / stock messaging (when present)

A reliable strategy:

Prefer structured data (application/ld+json) if available
Fall back to visible DOM selectors
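
For orientation, here is roughly what a schema.org Product block looks like and which fields matter for a price dataset. The payload is a made-up example, not copied from Costco; real pages may nest the product under @graph, use a list of offers, or omit JSON-LD entirely.

```python
import json

# Hypothetical JSON-LD payload, as it would appear inside
# <script type="application/ld+json"> on a product page.
SAMPLE_LD_JSON = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Protein Powder, 5 lbs",
  "offers": {
    "@type": "Offer",
    "price": "49.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
"""

product = json.loads(SAMPLE_LD_JSON)
offer = product["offers"]

# The four fields this guide cares about.
summary = {
    "title": product.get("name"),
    "price": offer.get("price"),
    "currency": offer.get("priceCurrency"),
    "availability": offer.get("availability"),
}
print(summary)
```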

import json
import re


def extract_ld_json(soup: BeautifulSoup) -> list[dict]:
    out = []
    for s in soup.select('script[type="application/ld+json"]'):
        raw = s.get_text("\n", strip=True)
        if not raw:
            continue
        try:
            data = json.loads(raw)
            if isinstance(data, dict):
                out.append(data)
            elif isinstance(data, list):
                out.extend([d for d in data if isinstance(d, dict)])
        except Exception:
            continue
    return out


def parse_product_page(url: str, html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")

    title = None
    price = None
    currency = None
    availability = None

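    # 1) Try JSON-LD
    for block in extract_ld_json(soup):
        # Products sometimes live under @graph
        graph = block.get("@graph")
        candidates = graph if isinstance(graph, list) and graph else [block]
        for obj in candidates:
            if not isinstance(obj, dict):
                continue
            # "@type" may be a string or a list of strings.
            types = obj.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Product" not in types:
                continue

            title = title or obj.get("name")

            # "offers" may be a single Offer object or a list of them.
            offers = obj.get("offers")
            if isinstance(offers, dict):
                offers = [offers]
            for offer in offers if isinstance(offers, list) else []:
                if not isinstance(offer, dict):
                    continue
                price = price or offer.get("price")
                currency = currency or offer.get("priceCurrency")
                availability = availability or offer.get("availability")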

    # 2) Fall back to visible selectors
    if not title:
        h1 = soup.select_one("h1")
        title = h1.get_text(" ", strip=True) if h1 else None

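    if not price:
        # Last resort: scan the page text for something that looks like $12.34.
        # Caveats: this takes the first dollar amount on the page, which may
        # belong to a banner or a related product, and it will not match a price
        # whose fragments sit in separate elements (they land on separate lines).
        text = soup.get_text("\n", strip=True)
        m = re.search(r"\$(\d{1,4}(?:,\d{3})*(?:\.\d{2})?)", text)
        if m:
            price = m.group(1)
            currency = currency or "USD"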

    return {
        "url": url,
        "title": title,
        "price": price,
        "currency": currency,
        "availability": availability,
    }
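
The parser returns price as whatever the page provided (for example "1,299.99" from the regex fallback or "49.99" from JSON-LD) and availability as a schema.org URL. Before comparing prices over time, it helps to normalize both. The helpers below are a sketch; normalize_price and normalize_availability are hypothetical names, not part of the code above.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def normalize_price(raw) -> Optional[Decimal]:
    """Convert '1,299.99', '$49.99' or 49.99 to a Decimal; None if unparseable."""
    if raw is None:
        return None
    cleaned = str(raw).replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def normalize_availability(raw) -> Optional[str]:
    """Reduce 'https://schema.org/InStock' to 'InStock'; pass short values through."""
    if not raw:
        return None
    return str(raw).rstrip("/").rsplit("/", 1)[-1]


print(normalize_price("1,299.99"))
print(normalize_availability("https://schema.org/InStock"))
```

Decimal avoids float rounding surprises when you later compute price differences.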

Step 5: Crawl end-to-end (search → products)

Now we stitch it together:

crawl up to max_pages of search results
collect unique product URLs
fetch + parse each product page

from urllib.parse import urlencode


def crawl_costco_search(keyword: str, max_pages: int = 5) -> list[dict]:
    params = {"dept": "All", "keyword": keyword}
    url = f"{BASE}/CatalogSearch?{urlencode(params)}"

    products: dict[str, dict] = {}

    pages = 0
    while url and pages < max_pages:
        pages += 1
        html = fetcher.get(url)
        items, next_url = parse_search_page(html)

        for it in items:
            products[it["url"]] = it

        print(f"page {pages}: found {len(items)} items (total unique {len(products)})")
        url = next_url

    return list(products.values())


def crawl_product_details(urls: list[str]) -> list[dict]:
    out = []
    for i, url in enumerate(urls, start=1):
        try:
            html = fetcher.get(url)
        except Exception as e:
            # One dead or blocked URL should not abort the whole crawl.
            print(f"{i}/{len(urls)} failed {url}: {e}")
            continue
        data = parse_product_page(url, html)
        out.append(data)
        print(f"{i}/{len(urls)} parsed", data.get("title"), data.get("price"))
    return out


items = crawl_costco_search("protein", max_pages=3)
urls = [it["url"] for it in items]
rows = crawl_product_details(urls[:25])  # start small

print("rows", len(rows))
if rows:
    print(rows[0])

Step 6: Export to CSV + JSON

import json
import pandas as pd

pd.DataFrame(rows).to_csv("costco_prices.csv", index=False)

with open("costco_prices.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, ensure_ascii=False, indent=2)

print("wrote costco_prices.csv + costco_prices.json")
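
Since pandas is optional here, the same CSV can be produced with the standard library. The sketch below writes to a string so it is easy to test; rows_to_csv and FIELDS are illustrative names, and the column list mirrors the dict returned by parse_product_page.

```python
import csv
import io

FIELDS = ["url", "title", "price", "currency", "availability"]


def rows_to_csv(rows: list[dict], fields: list[str] = FIELDS) -> str:
    """Serialize row dicts to CSV text; missing keys become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = [{"url": "u", "title": "T, x", "price": "1.00"}]
print(rows_to_csv(sample))
```

When writing to a real file, open it with newline="" and encoding="utf-8" so the csv module controls line endings.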

Practical production upgrades

If you’re turning this into a tracker (daily/weekly price checks):

Store results in SQLite/Postgres keyed by product URL
Cache HTML for debugging failed parses
Add concurrency cautiously (start with 2–4 threads)
Add alerting when a price changes beyond a threshold
Keep a block/failure rate dashboard (403/429/timeout counts)
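
As a starting point for the first and fourth items, here is a minimal SQLite sketch that stores one price snapshot per URL per check and reports a change once it crosses a threshold. The table layout, function names, and the 5% default are assumptions for illustration, not a prescribed schema.

```python
import sqlite3
from typing import Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prices (
               url TEXT NOT NULL,
               checked_at TEXT NOT NULL,  -- ISO date, sorts chronologically
               price REAL,
               PRIMARY KEY (url, checked_at)
           )"""
    )
    return conn


def record_price(conn: sqlite3.Connection, url: str, checked_at: str,
                 price: Optional[float], threshold: float = 0.05) -> Optional[float]:
    """Store a snapshot; return the fractional change vs. the previous
    snapshot if its magnitude is at least `threshold`, else None."""
    prev = conn.execute(
        "SELECT price FROM prices WHERE url = ? ORDER BY checked_at DESC LIMIT 1",
        (url,),
    ).fetchone()
    conn.execute(
        "INSERT INTO prices (url, checked_at, price) VALUES (?, ?, ?)",
        (url, checked_at, price),
    )
    conn.commit()
    if prev is None or prev[0] is None or price is None or prev[0] == 0:
        return None
    change = (price - prev[0]) / prev[0]
    return change if abs(change) >= threshold else None


conn = open_db()
print(record_price(conn, "u", "2026-04-20", 50.0))  # first snapshot, nothing to compare
print(record_price(conn, "u", "2026-04-21", 45.0))  # 10% drop crosses the threshold
```

A non-None return value is the hook for whatever alerting you use (email, Slack, a log line).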

QA checklist

Search parser extracts mostly product URLs (spot-check 10)
Pagination finds next page or stops cleanly
Product parser returns non-empty title for most URLs
Price extraction succeeds for a meaningful subset
Exports are valid CSV/JSON
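
Two of these checks are easy to automate. The sketch below computes the share of rows with a non-empty title and price; qa_report is a hypothetical helper, and what counts as “most” or “meaningful” is a threshold you pick for your own crawl.

```python
def qa_report(rows: list[dict]) -> dict:
    """Summarize how many parsed rows have a title and a price."""
    total = len(rows)

    def rate(field: str) -> float:
        if total == 0:
            return 0.0
        return sum(1 for r in rows if r.get(field)) / total

    return {
        "rows": total,
        "title_rate": rate("title"),
        "price_rate": rate("price"),
    }


sample_rows = [
    {"title": "A", "price": "1.00"},
    {"title": "B", "price": None},
    {"title": "C", "price": "3.00"},
    {"title": None, "price": None},
]
print(qa_report(sample_rows))
```

Running this after every crawl and logging the result gives you an early signal when a selector stops matching.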

Where ProxiesAPI helps (honestly)

Ecommerce sites are where scraping reliability becomes a job:

IP-based rate limits
intermittent 403/429
different content per region

ProxiesAPI doesn’t “magically bypass everything,” but it does give you a stable proxy layer you can turn on when your crawl starts failing.

If you keep your network layer isolated (like Fetcher above), you can swap proxy settings without rewriting your parser.
