Price Scraping: How to Monitor Competitor Prices Automatically
Price scraping is the backbone of competitor price monitoring.
If you sell anything online (SaaS plans, subscriptions, physical products, marketplace listings), the “price” you care about usually changes more often than you think:
- promotions start and end
- shipping rules change
- currency formatting varies
- “from $X” becomes “$X–$Y”
- stock availability changes the visible price
This guide is a practical blueprint for building an automated price monitoring system that you can run daily (or hourly) without it turning into a fragile mess.
We’ll cover:
- what to scrape (and what not to)
- a simple crawl strategy (seed list → fetch → parse → store)
- change detection patterns (hashing, normalization, diffing)
- reliability tactics (timeouts, retries, and ProxiesAPI as a stable fetch layer)
Price monitoring is repetitive by design: the same URLs on a schedule. ProxiesAPI helps stabilize those fetches (and your retries) so your change detection stays accurate.
What “price scraping” really means
At the most basic level, price scraping is:
- fetch a product page (or pricing page)
- extract the price and any context you need to interpret it
- store a timestamped record
- compare today vs yesterday
The complexity comes from context.
For price monitoring, a “price record” should usually include:
- amount (number)
- currency (USD, EUR, INR…)
- unit (per month, per seat, per item)
- availability (in stock / out of stock)
- shipping / fees (if relevant)
- variant (size, color, region)
- source URL + “observed at” timestamp
If you only store a single number, your alerts will be noisy and confusing.
Step 1: Choose a crawl scope (keep it small first)
Start with a small list of URLs you truly care about.
A simple spreadsheet is fine:
- competitor
- product
- URL
- frequency (daily/weekly)
- notes about what to extract
Then expand.
Why this matters: price scraping is usually a scheduled job. If your scope is too wide, failures become the norm and you won’t trust the output.
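Loading that spreadsheet (exported as CSV) into your job can be as simple as this sketch; the column names match the list above but are otherwise up to you:

```python
import csv
import io

# A couple of illustrative rows; in practice this is your exported spreadsheet
SEED_CSV = """competitor,product,url,frequency,notes
AcmeCo,Pro Plan,https://example.com/pricing,daily,grab the monthly price
WidgetCorp,Blue Widget,https://example.com/widgets/blue,weekly,watch for bundle pricing
"""

def load_seed_list(csv_text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(csv_text)))
```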
Step 2: A reliable fetch layer (timeouts + retries)
Most “price scraping” failures are not parsing failures. They’re networking failures:
- timeouts
- intermittent 5xx
- rate limits
Use a fetch wrapper that:
- sets connect/read timeouts
- retries transient status codes
- adds jitter between retries
```python
import random
import time
from typing import Optional

import requests

# (connect timeout, read timeout) in seconds
TIMEOUT = (10, 30)

session = requests.Session()
session.headers.update({
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    )
})

def fetch_html(url: str, *, max_retries: int = 4, backoff_base: float = 1.6) -> str:
    last_err: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            r = session.get(url, timeout=TIMEOUT)
            # Treat rate limits and server errors as transient, retryable failures
            if r.status_code in (429, 500, 502, 503, 504):
                raise requests.HTTPError(f"{r.status_code} transient", response=r)
            r.raise_for_status()
            return r.text
        except Exception as e:
            last_err = e
            if attempt < max_retries:
                # Exponential backoff plus jitter so retries don't synchronize
                time.sleep((backoff_base ** attempt) + random.random())
    raise RuntimeError(f"Failed after {max_retries} attempts: {url} ({last_err})")
```
Step 3: Use ProxiesAPI for stability at scale
When your monitoring list grows (dozens → hundreds → thousands of URLs), you need a consistent way to fetch pages.
ProxiesAPI gives you a simple integration point:
```bash
API_KEY="YOUR_API_KEY"
TARGET_URL="https://example.com/product"

curl "http://api.proxiesapi.com/?key=${API_KEY}&url=${TARGET_URL}"
```
In Python:
```python
import os
import urllib.parse

PROXIESAPI_KEY = os.getenv("PROXIESAPI_KEY", "API_KEY")

def proxiesapi_url(target_url: str) -> str:
    return "http://api.proxiesapi.com/?" + urllib.parse.urlencode({
        "key": PROXIESAPI_KEY,
        "url": target_url,
    })

def fetch_html_via_proxiesapi(target_url: str) -> str:
    return fetch_html(proxiesapi_url(target_url))
```
Then your monitoring loop can always call fetch_html_via_proxiesapi(url).
No overclaims: proxies won’t fix bad parsing or magically bypass every block — but they do help make long-running jobs less brittle.
Step 4: Extract the price (normalization is the real trick)
Most sites render prices in many formats:
- $19.99
- €19,99
- From $19
- $19 / month
- ₹1,499/mo
Your goal is to normalize to a consistent structure.
Here’s a small helper that extracts a numeric amount and a currency symbol (best-effort):
```python
import re
from dataclasses import dataclass

@dataclass
class Price:
    amount: float | None
    currency: str | None
    raw: str

def parse_price_text(text: str) -> Price:
    raw = (text or "").strip()

    # Currency symbol detection (very small set; expand as needed)
    currency = None
    if "$" in raw:
        currency = "USD"
    elif "€" in raw:
        currency = "EUR"
    elif "£" in raw:
        currency = "GBP"
    elif "₹" in raw:
        currency = "INR"

    # Pull first number-like token; handle commas
    m = re.search(r"(\d+[\d,]*(?:\.\d+)?)", raw)
    amount = None
    if m:
        amount = float(m.group(1).replace(",", ""))

    return Price(amount=amount, currency=currency, raw=raw)
```
In production, you should parse:
- list price vs sale price
- shipping
- plan interval (monthly/annual)
But even this baseline normalization will reduce noisy alerts.
Step 5: Store price history (so you can trust your system)
A good price scraping system is a time series.
Minimum storage fields:
- url
- fetched_at
- price_raw
- amount
- currency
- http_status (even on failure)
- parse_ok (boolean)
You can store in:
- SQLite (great for solo projects)
- Postgres
- a data warehouse
SQLite example schema:
```sql
CREATE TABLE IF NOT EXISTS price_observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  url TEXT NOT NULL,
  fetched_at TEXT NOT NULL,
  http_status INTEGER,
  price_raw TEXT,
  amount REAL,
  currency TEXT,
  parse_ok INTEGER
);

CREATE INDEX IF NOT EXISTS idx_price_url_time ON price_observations(url, fetched_at);
```
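Writing into that schema with Python's built-in sqlite3 can look like this sketch (the helper name record_observation is illustrative):

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS price_observations (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  url TEXT NOT NULL,
  fetched_at TEXT NOT NULL,
  http_status INTEGER,
  price_raw TEXT,
  amount REAL,
  currency TEXT,
  parse_ok INTEGER
);
"""

def record_observation(conn, *, url, http_status, price_raw, amount, currency, parse_ok):
    # Always write a row, even on failure -- gaps in the time series are data too
    conn.execute(
        "INSERT INTO price_observations "
        "(url, fetched_at, http_status, price_raw, amount, currency, parse_ok) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (url, datetime.now(timezone.utc).isoformat(), http_status,
         price_raw, amount, currency, int(parse_ok)),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_observation(conn, url="https://example.com/product", http_status=200,
                   price_raw="$19.99", amount=19.99, currency="USD", parse_ok=True)
```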
Step 6: Change detection patterns (choose one)
Option A: Direct comparisons (simple)
If you trust your parser, compare today vs yesterday:
- if amount changed → alert
This is good for stable sites.
Option B: “Normalized hash” (robust)
For pages where price text shifts, compute a normalized representation and hash it:
- remove whitespace
- remove “from”, “starting at”, etc.
- standardize currency symbols
Then alert when the normalized hash changes.
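A minimal sketch of that normalization plus hashing (the exact substitution rules are yours to tune; normalized_price_hash is an illustrative name):

```python
import hashlib
import re

def normalized_price_hash(text: str) -> str:
    t = text.lower()
    # Strip marketing qualifiers that change without the price changing
    t = re.sub(r"\b(from|starting at|only|now)\b", "", t)
    # Standardize currency spellings to a symbol
    t = t.replace("usd", "$").replace("eur", "€")
    # Remove whitespace last, after the word-boundary matching above
    t = re.sub(r"\s+", "", t)
    return hashlib.sha256(t.encode("utf-8")).hexdigest()
```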
Option C: “Confidence scoring” (best long-term)
Store multiple signals:
- amount
- currency
- availability
- variant
Then only alert when the confidence is high (e.g. amount and currency both parsed).
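One way to sketch that gate (the weights and threshold here are arbitrary starting points, not recommendations):

```python
def confidence(obs: dict) -> float:
    # Weight each successfully-parsed signal; tune the weights for your sites
    score = 0.0
    if obs.get("amount") is not None:
        score += 0.5
    if obs.get("currency"):
        score += 0.3
    if obs.get("availability"):
        score += 0.2
    return score

def should_alert(prev: dict, curr: dict, threshold: float = 0.8) -> bool:
    # Alert only when today's parse is trustworthy AND the amount actually moved
    return confidence(curr) >= threshold and curr.get("amount") != prev.get("amount")
```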
Step 7: Scheduling + retry strategy (how to run it daily)
A practical schedule:
- daily for most competitors
- hourly for fast-moving marketplaces
A practical retry strategy:
- retry transient failures (429/5xx)
- mark failures in your database
- re-run failed URLs later with a backoff window
Do not “retry forever” — it hides systematic blocks.
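The backoff-window idea can be sketched as a pure function: a URL that keeps failing becomes due for a re-check less and less often, up to a cap (all numbers here are illustrative):

```python
def due_for_retry(failure_count: int, last_attempt: float, now: float,
                  base: float = 3600.0, cap: float = 86400.0) -> bool:
    """A failed URL becomes due again after an exponentially growing
    window (1h, 2h, 4h, ...), capped at a day. Times are epoch seconds."""
    if failure_count == 0:
        return True
    wait = min(base * (2 ** (failure_count - 1)), cap)
    return (now - last_attempt) >= wait
```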
Common pitfalls in price scraping
- Promo overlays and modals: price exists but is hidden behind UI.
- A/B tests: two different price layouts.
- Currency localization: same URL shows different currency.
- Variant selection: price depends on size/color; you must choose a variant.
- Out-of-stock behavior: price disappears.
Treat these as data quality problems, not “scraping problems.”
Practical comparison: 3 approaches to competitor price monitoring
| Approach | Best for | Pros | Cons |
|---|---|---|---|
| Manual checks | 1–5 products | No engineering | Not scalable, easy to miss changes |
| Vendor price monitoring tools | Non-technical teams | UI + alerts fast | Cost, limited customization |
| Build your own (price scraping) | Custom workflows | Full control, cheapest at scale | Requires maintenance |
If you already have engineers (or you are one), a simple scraper + database wins quickly.
Where ProxiesAPI fits
A price scraping job is mostly repetition:
- same URLs
- same schedule
- same failure modes
ProxiesAPI is useful when:
- your URL list grows
- you need consistent fetch behavior
- you want to reduce the impact of transient blocks
The honest framing: it’s a reliability tool for your crawler — not magic.
Next upgrades
- caching: avoid re-fetching pages too frequently
- incremental crawling: only re-check URLs that matter
- notifications: Slack/Email when changes happen
- screenshots/HTML snapshots for auditability
If you build those, your “price scraping” system stops being a script and becomes infrastructure.