engineering
3 guides
Soft-Block Detection for Web Scraping (Python): Catch ‘HTTP 200 but Wrong Page’
Many scrapers fail silently: the request returns HTTP 200, but the HTML is a block, consent, or login page instead of the content you asked for. Here’s how to detect soft blocks before parsing.
Free Proxy Lists vs a Proxy API: Why Free Breaks in Production
Free proxies look attractive until your scraper scales. Here’s what fails first, what a proxy API actually fixes, and how to choose the right setup.
Retries, Timeouts, and Backoff for Web Scraping (Python): Production Defaults That Work
Many scraper failures come from networking, not parsing. Here are sane timeout defaults, a retry policy that won’t overload a site, and a drop-in requests/httpx implementation.