Scrape Product Data from Target.com (title, price, availability) with Python + ProxiesAPI

Target product pages are a classic e-commerce scraping use case:

  • price monitoring (competitive intel)
  • availability tracking (in-stock/out-of-stock)
  • catalog enrichment (names, brand, bullets)

In this tutorial we’ll build a practical Target.com product detail page (PDP) scraper in Python that extracts:

  • product title
  • current price (including sale price when present)
  • availability / stock messaging
  • canonical URL + TCIN (Target Catalog Item Number) when available

We’ll also add:

  • retries + timeouts
  • defensive parsing (no “magic selectors” without fallback)
  • CSV export
  • a network layer that’s easy to route through ProxiesAPI

Target product page (we’ll scrape title, price, and availability)

Make Target crawls more reliable with ProxiesAPI

Retail sites can rate-limit, geo-fence, or intermittently serve different markup. ProxiesAPI helps keep your fetch layer stable so your parser sees consistent HTML when you scale beyond a handful of pages.


Important notes (read before you scrape)

  • Terms & policies: Always review Target’s terms and robots.txt. This guide is for educational purposes.
  • HTML variability: Target is a modern retail site. You may see different HTML depending on:
    • location / store pickup settings
    • A/B experiments
    • bot detection responses
  • Prefer “data in the page” over brittle selectors: Many retail PDPs embed structured data (application/ld+json) or JSON blobs that are more stable than CSS class names.

Our approach:

  1. Fetch the page HTML reliably
  2. Try to extract data from embedded JSON first (best)
  3. Fall back to HTML selectors
  4. Normalize to a clean record

What we’re scraping: a Target product detail page (PDP)

A Target PDP typically looks like:

  • URL like https://www.target.com/p/.../-/A-<id>
  • A product title near the top
  • Price module (regular price or sale)
  • Availability messaging (shipping/pickup)

Quick sanity check with curl

Pick a Target PDP URL you’re allowed to test with (use your browser to copy a product page URL). Then confirm you can fetch HTML from the site at all:

curl -s "https://www.target.com/" | head -n 5

If you can load HTML, you can parse it.


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml tenacity pandas

We’ll use:

  • requests for HTTP
  • beautifulsoup4 (with the lxml parser) for HTML parsing
  • tenacity for retries
  • pandas for CSV export (optional but convenient)

Step 1: Build a reliable fetch() (timeouts + retries)

A scraper fails more often due to networking than parsing. Start with a robust fetch.

from __future__ import annotations

import random
import time
from dataclasses import dataclass

import requests
from tenacity import retry, stop_after_attempt, wait_exponential_jitter


DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/122.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


@dataclass
class FetchConfig:
    timeout: tuple[int, int] = (10, 30)  # connect, read
    max_attempts: int = 4  # keep in sync with stop_after_attempt() below
    min_sleep: float = 0.4
    max_sleep: float = 1.2


class HttpClient:
    def __init__(self, config: FetchConfig | None = None):
        self.config = config or FetchConfig()
        self.session = requests.Session()

    @retry(
        stop=stop_after_attempt(4),
        wait=wait_exponential_jitter(initial=1, max=12),
        reraise=True,
    )
    def get_html(self, url: str) -> str:
        # Light jitter between attempts (helps with transient blocks)
        time.sleep(random.uniform(self.config.min_sleep, self.config.max_sleep))

        r = self.session.get(url, headers=DEFAULT_HEADERS, timeout=self.config.timeout)

        # Common “soft blocks” still return 200 with unexpected HTML.
        # You’ll detect them in the parsing/validation step.
        r.raise_for_status()
        return r.text

Where ProxiesAPI fits

You usually integrate ProxiesAPI at the network layer. There are two common patterns:

  1. Proxy URL (set proxies= in requests)
  2. Gateway fetch API (you call ProxiesAPI to fetch the page and return HTML)

Because ProxiesAPI deployments vary by account and product configuration, keep the integration isolated to a single function.

Here’s a proxy-based hook you can adapt with your ProxiesAPI endpoint/credentials:

import os


def build_proxies() -> dict | None:
    # Example only. Replace with your ProxiesAPI proxy URL(s).
    proxy = os.getenv("PROXIESAPI_PROXY_URL")
    if not proxy:
        return None
    return {"http": proxy, "https": proxy}


class HttpClient:
    def __init__(self, config: FetchConfig | None = None):
        self.config = config or FetchConfig()
        self.session = requests.Session()
        self.proxies = build_proxies()

    @retry(stop=stop_after_attempt(4), wait=wait_exponential_jitter(initial=1, max=12), reraise=True)
    def get_html(self, url: str) -> str:
        time.sleep(random.uniform(self.config.min_sleep, self.config.max_sleep))
        r = self.session.get(
            url,
            headers=DEFAULT_HEADERS,
            timeout=self.config.timeout,
            proxies=self.proxies,
        )
        r.raise_for_status()
        return r.text

If you don’t set PROXIESAPI_PROXY_URL, it will run without proxies.
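For the second pattern (gateway fetch), you call a ProxiesAPI endpoint and pass the target URL as a query parameter. The endpoint and parameter names below are placeholders; substitute the values from your ProxiesAPI dashboard:

```python
import os
from urllib.parse import urlencode


def build_gateway_url(target_url: str) -> str:
    # Placeholder endpoint + parameter names; replace with the ones
    # your ProxiesAPI account actually uses.
    base = os.getenv("PROXIESAPI_GATEWAY_URL", "http://api.proxiesapi.com/")
    auth_key = os.getenv("PROXIESAPI_AUTH_KEY", "")
    return f"{base}?{urlencode({'auth_key': auth_key, 'url': target_url})}"


# Then fetch through the gateway instead of hitting the PDP directly:
#   html = session.get(build_gateway_url(pdp_url), timeout=(10, 30)).text
```

Keeping the integration inside one function means switching patterns later only touches a single call site.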


Step 2: Extract product data from embedded JSON (preferred)

Many product pages include structured data in JSON-LD:

<script type="application/ld+json">{ ... }</script>

When it’s present, it’s often the most stable way to get:

  • name/title
  • offers/price
  • availability

Let’s parse JSON-LD safely.

import json
from bs4 import BeautifulSoup


def extract_json_ld(soup: BeautifulSoup) -> list[dict]:
    out: list[dict] = []
    for tag in soup.select('script[type="application/ld+json"]'):
        raw = tag.get_text("\n", strip=True)
        if not raw:
            continue
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue

        if isinstance(data, dict):
            out.append(data)
        elif isinstance(data, list):
            out.extend([d for d in data if isinstance(d, dict)])
    return out


def pick_product_schema(json_ld_docs: list[dict]) -> dict | None:
    # Look for @type == "Product"; @type can also be a list on some pages
    for doc in json_ld_docs:
        t = doc.get("@type")
        if t == "Product" or (isinstance(t, list) and "Product" in t):
            return doc
    return None

Now we can extract fields.

from urllib.parse import urlparse


def normalize_availability(value: str | None) -> str | None:
    if not value:
        return None
    v = value.lower()
    if "instock" in v or "in_stock" in v:
        return "in_stock"
    if "outofstock" in v or "out_of_stock" in v:
        return "out_of_stock"
    if "preorder" in v:
        return "preorder"
    return value


def extract_from_product_schema(product: dict) -> dict:
    name = product.get("name")
    url = product.get("url")

    offers = product.get("offers")
    price = None
    availability = None

    # offers can be dict or list
    if isinstance(offers, dict):
        price = offers.get("price")
        availability = offers.get("availability")
    elif isinstance(offers, list) and offers:
        o0 = offers[0]
        if isinstance(o0, dict):
            price = o0.get("price")
            availability = o0.get("availability")

    # basic normalization
    try:
        price = float(price) if price is not None else None
    except (TypeError, ValueError):
        price = None

    return {
        "title": name,
        "canonical_url": url,
        "price": price,
        "availability": normalize_availability(availability),
    }

Step 3: Fallback parsing from HTML (when JSON-LD is missing)

If JSON-LD isn’t available (or doesn’t contain offers), fall back to HTML.

Two rules:

  • Prefer semantic attributes ([data-test], meta[property], etc.) over CSS class names.
  • Add multiple fallbacks for each field.

import re


def text_or_none(el) -> str | None:
    if not el:
        return None
    t = el.get_text(" ", strip=True)
    return t or None


def parse_price(text: str | None) -> float | None:
    if not text:
        return None
    # strip thousands separators, then capture digits like 12.34
    m = re.search(r"(\d+\.?\d*)", text.replace(",", ""))
    if not m:
        return None
    try:
        return float(m.group(1))
    except ValueError:
        return None


def extract_from_html(soup: BeautifulSoup) -> dict:
    # Title: prefer the H1, fall back to the <title> tag
    title = text_or_none(soup.select_one("h1"))
    if not title and soup.title:
        title = soup.title.get_text(strip=True)

    # Price: try a few likely containers first
    price_text = (
        text_or_none(soup.select_one('[data-test="product-price"]'))
        or text_or_none(soup.select_one('[data-test="product-price"] span'))
        or text_or_none(soup.select_one('[data-test="offerPrice"]'))
    )

    # Meta-tag fallback: <meta> carries its value in the content attribute, not text
    if not price_text:
        meta = soup.select_one('meta[property="product:price:amount"]')
        if meta and meta.get("content"):
            price_text = meta["content"]

    price = parse_price(price_text)

    # Availability: look for common strings in shipping/pickup modules
    availability = None
    candidates = soup.select('[data-test*="fulfillment"], [data-test*="ship"], [data-test*="pickup"]')
    joined = " | ".join([c.get_text(" ", strip=True) for c in candidates[:8] if c.get_text(strip=True)])
    if joined:
        low = joined.lower()
        if "out of stock" in low or "sold out" in low:
            availability = "out_of_stock"
        elif "in stock" in low or "available" in low:
            availability = "in_stock"

    # Canonical URL
    canonical = None
    link = soup.select_one('link[rel="canonical"]')
    if link:
        canonical = link.get('href')

    return {
        "title": title,
        "canonical_url": canonical,
        "price": price,
        "availability": availability,
    }

HTML differs across products and regions. That’s why the next step is validation.


Step 4: Put it together: scrape_product()

from bs4 import BeautifulSoup


def scrape_target_product(url: str, client: HttpClient | None = None) -> dict:
    client = client or HttpClient()

    html = client.get_html(url)
    soup = BeautifulSoup(html, "lxml")

    # 1) Try JSON-LD
    json_ld_docs = extract_json_ld(soup)
    product_doc = pick_product_schema(json_ld_docs)

    data = {}
    if product_doc:
        data.update(extract_from_product_schema(product_doc))

    # 2) Fill missing fields from HTML fallbacks (never overwrite JSON-LD values)
    if not data.get("title") or data.get("price") is None:
        for k, v in extract_from_html(soup).items():
            if v is not None and data.get(k) is None:
                data[k] = v

    # 3) Normalize URL + pull the TCIN (the numeric id after /A- in the path)
    if not data.get("canonical_url"):
        data["canonical_url"] = url
    m = re.search(r"/A-(\d+)", data["canonical_url"] or "")
    data["tcin"] = m.group(1) if m else None

    # 4) Basic validation (detect blocks)
    if not data.get("title"):
        raise ValueError("Missing title — possible block/consent page or markup change")

    return {
        "source": "target",
        "input_url": url,
        **data,
    }

Step 5: Run it on a list of product URLs and export to CSV

import pandas as pd


def run(urls: list[str]) -> None:
    client = HttpClient()
    rows = []

    for u in urls:
        try:
            row = scrape_target_product(u, client=client)
            rows.append(row)
            print("OK", u, row.get("price"), row.get("availability"))
        except Exception as e:
            print("FAIL", u, repr(e))

    df = pd.DataFrame(rows)
    df.to_csv("target_products.csv", index=False)
    print("wrote target_products.csv", len(df))


if __name__ == "__main__":
    urls = [
        "https://www.target.com/p/EXAMPLE/-/A-00000000",
    ]
    run(urls)

Replace the example URL with real Target PDP URLs you’re allowed to scrape.


Debugging checklist (when it fails)

  1. Is it a soft block?
    • title missing
    • HTML looks like a challenge/consent page
  2. Did markup change?
    • inspect the HTML (save it to disk for a failing URL)
  3. Location-based changes
    • price/availability depends on store/zip
  4. Add caching
    • avoid re-fetching unchanged pages during development
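For item 4, a minimal disk cache is usually enough during development. A sketch (the cache directory name is an arbitrary choice) that wraps any fetch function:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path(".cache_html")  # arbitrary location; change to taste


def cached_get(url: str, fetch) -> str:
    # Return cached HTML for url, calling fetch(url) only on a cache miss
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / (hashlib.sha256(url.encode()).hexdigest()[:16] + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

During development, call cached_get(url, client.get_html) instead of client.get_html(url), and delete .cache_html when you want fresh fetches.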

Save failing HTML for inspection

from pathlib import Path


def save_html(slug: str, html: str) -> None:
    debug_dir = Path("debug")
    debug_dir.mkdir(exist_ok=True)
    (debug_dir / f"{slug}.html").write_text(html, encoding="utf-8")

Where ProxiesAPI helps (realistic)

If you’re scraping a few pages occasionally, you might be fine without proxies.

When you scale to:

  • many product pages
  • repeated price checks
  • multiple regions

…you start seeing more rate limits, timeouts, and inconsistent responses.

ProxiesAPI helps by giving you a consistent proxy layer so your get_html() call succeeds more often — and your parsing logic runs on valid HTML instead of random error pages.


Next upgrades

  • extract more fields (brand, images, rating, reviews)
  • add concurrency with httpx + async
  • store results in SQLite (incremental updates)
  • implement “change detection” so you only alert when price changes
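The change-detection idea needs nothing beyond the CSV from step 5: load the previous run, compare prices per URL, and report the diffs. A sketch assuming the input_url and price columns produced by scrape_target_product above:

```python
import csv
from pathlib import Path


def detect_price_changes(prev_csv: str, new_rows: list[dict]) -> list[tuple[str, str, str]]:
    # Compare new rows against the previous export; return (url, old, new) diffs
    prev: dict[str, str] = {}
    if Path(prev_csv).exists():
        with open(prev_csv, newline="") as f:
            for row in csv.DictReader(f):
                prev[row["input_url"]] = row.get("price", "")
    changes = []
    for row in new_rows:
        old = prev.get(row["input_url"])
        new = "" if row.get("price") is None else str(row["price"])
        if old is not None and old != new:
            changes.append((row["input_url"], old, new))
    return changes
```

Only URLs present in both runs are compared, so first-time products never trigger a false alert.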

Related guides

  • How to Scrape Walmart Product Data at Scale (Python + ProxiesAPI): extract product title, price, availability, and rating from Walmart product pages using a session + retry strategy.
  • How to Scrape Cars.com Used Car Prices (Python + ProxiesAPI): extract listing title, price, mileage, location, and dealer info from Cars.com search results and detail pages.
  • How to Scrape Booking.com Hotel Prices with Python (Using ProxiesAPI): extract hotel names, nightly prices, review scores, and basic availability fields from Booking.com search results.
  • Scrape Product Data from Amazon (with Python + ProxiesAPI): extract Amazon product title, price, rating, and availability from a product page using requests + BeautifulSoup.