Scrape Craigslist Listings by Category and City (Python + ProxiesAPI)

Craigslist is one of the best real-world scraping targets because pages are mostly server-rendered HTML and the structure is predictable. The moment you scale across cities and categories, though, you can run into throttling and inconsistent failures.

In this tutorial we will build a Craigslist scraper in Python that:

  • builds category + city search URLs
  • paginates across results
  • extracts listing fields (title, price, location, posted time, url)
  • dedupes across pages
  • exports a clean CSV
  • optionally routes requests via ProxiesAPI (without rewriting your scraper)

Craigslist search results (we will scrape listing cards)

Scale Craigslist scrapes reliably with ProxiesAPI

Craigslist is lightweight — but once you crawl multiple cities/categories, you still hit throttling and intermittent blocks. ProxiesAPI helps you keep retries and IP rotation centralized in your fetch layer.


What we are scraping (URL patterns + HTML)

Craigslist search URLs typically look like:

  • city base: https://sfbay.craigslist.org
  • search path: /search/sss (sss = for sale, all)
  • query parameter: ?query=bike
  • pagination offset: &s=120 (number of results to skip)

Example:

https://sfbay.craigslist.org/search/sss?query=bike&s=120
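Going the other way is handy when you are debugging a crawl and want to confirm which city, category, and offset a logged URL belongs to. The sketch below splits the example URL back into its parts with the standard library. parse_search_url is a hypothetical helper (not used by the scraper later), and it assumes the https://{city}.craigslist.org/search/{category} pattern described above.

```python
from urllib.parse import parse_qs, urlsplit


def parse_search_url(url: str) -> dict:
    """Hypothetical helper: split a Craigslist search URL into its parts.

    Assumes the https://{city}.craigslist.org/search/{category} pattern.
    """
    parts = urlsplit(url)
    # The city is the first label of the hostname, e.g. "sfbay".
    city = parts.hostname.split(".")[0]
    # The path looks like "/search/sss"; the category code is the last segment.
    category = parts.path.rstrip("/").split("/")[-1]
    qs = parse_qs(parts.query)
    return {
        "city": city,
        "category": category,
        "query": qs.get("query", [""])[0],
        # A missing s= parameter means the first page (offset 0).
        "offset": int(qs.get("s", ["0"])[0]),
    }


print(parse_search_url("https://sfbay.craigslist.org/search/sss?query=bike&s=120"))
```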

On many pages you will see a static HTML results list that is easy to parse, with listing cards like:

<li class="cl-static-search-result" title="Classic Trek 720, 60cm">
  <a href="https://sfbay.craigslist.org/eby/bik/...html">
    <div class="title">Classic Trek 720, 60cm</div>
    <div class="details">
      <div class="price">$600</div>
      <div class="location">Lafayette</div>
    </div>
  </a>
</li>

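We will parse those cards defensively: some listings are missing price or location. Note also that the card above has no <time> element, so posted_at will be empty for pages using this layout.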


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

The code uses str | None style type hints, so it needs Python 3.10 or newer.

Step 1: A resilient fetch layer (with optional ProxiesAPI)

The key design choice: keep your scraper split into fetch → parse → export.

When you do that, routing via ProxiesAPI is a tiny change: wrap the target URL at the fetch layer.

This example uses a common ProxiesAPI wrapper format:

http://api.proxiesapi.com/?auth_key=YOUR_KEY&url=https://target.com/...

If your ProxiesAPI endpoint shape differs, only proxiesapi_url() needs changing.
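The target URL has its own ?, & and = characters, which is why the fetch layer encodes it with quote(..., safe=""). The sketch below shows what goes wrong without that. wrap is a hypothetical stand-in for proxiesapi_url(), YOUR_KEY is a placeholder, and the endpoint is simply the wrapper format shown above (check it against your own ProxiesAPI account).

```python
from urllib.parse import parse_qs, quote, urlsplit

API_BASE = "http://api.proxiesapi.com/"  # wrapper format shown above


def wrap(target_url: str, key: str) -> str:
    # safe="" percent-encodes ":", "/", "?", "&" and "=" too,
    # so the whole target URL travels as a single url= value.
    return f"{API_BASE}?auth_key={quote(key)}&url={quote(target_url, safe='')}"


target = "https://sfbay.craigslist.org/search/sss?query=bike&s=120"
wrapped = wrap(target, "YOUR_KEY")
print(wrapped)

# Decoding the wrapper's query string recovers the original target intact.
print(parse_qs(urlsplit(wrapped).query)["url"][0] == target)  # True

# Without encoding, "&s=120" leaks out as a separate top-level parameter.
naive = f"{API_BASE}?auth_key=YOUR_KEY&url={target}"
print(sorted(parse_qs(urlsplit(naive).query)))  # ['auth_key', 's', 'url']
```

In the naive version the API would see a url value with no pagination offset, so every page request would quietly fetch page one.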

import csv
import os
import random
import time
from dataclasses import dataclass
from typing import Iterable
from urllib.parse import quote, urlencode, urljoin

import requests
from bs4 import BeautifulSoup

UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36",
]

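def proxiesapi_url(target_url: str) -> str:
    # No key set: fetch the target directly (handy for local testing).
    key = os.environ.get("PROXIESAPI_KEY")
    if not key:
        return target_url
    # safe="" also encodes the target's own ?, & and = so it stays one url= value.
    return f"http://api.proxiesapi.com/?auth_key={quote(key)}&url={quote(target_url, safe='')}"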

@dataclass(frozen=True)
class FetchConfig:
    timeout: tuple[int, int] = (10, 30)
    max_retries: int = 4
    sleep_base: float = 0.8

class Fetcher:
    def __init__(self, cfg: FetchConfig = FetchConfig()):
        self.cfg = cfg
        self.session = requests.Session()

    def get(self, url: str) -> str:
        last_err: Exception | None = None
        for attempt in range(1, self.cfg.max_retries + 1):
            try:
                final = proxiesapi_url(url)
                r = self.session.get(
                    final,
                    timeout=self.cfg.timeout,
                    headers={"User-Agent": random.choice(UA_POOL)},
                )
                r.raise_for_status()
                return r.text
            except Exception as e:
                last_err = e
                if attempt == self.cfg.max_retries:
                    break
                # Exponential backoff (0.8s, 1.6s, 3.2s with the defaults) plus up to 0.25s of jitter.
                time.sleep(self.cfg.sleep_base * (2 ** (attempt - 1)) + random.random() * 0.25)
        raise last_err or RuntimeError("fetch failed")

Step 2: Build category + city search URLs

Craigslist uses a city subdomain plus a category code:

  • city base: https://{city}.craigslist.org (for example sfbay, newyork)
  • category: sss (for sale, all), jjj (jobs, all), and many more

def build_search_url(*, city: str, category: str, query: str, offset: int = 0) -> str:
    base = f"https://{city}.craigslist.org"
    params: dict[str, str] = {"query": query}
    if offset:
        params["s"] = str(offset)
    return f"{base}/search/{category}?{urlencode(params)}"
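To crawl several cities and categories, you can expand the combinations up front and feed each URL to the fetch layer. The sketch below repeats build_search_url so it runs on its own. plan_urls is a hypothetical helper, and page_size=120 matches the default used by crawl() later. urlencode also takes care of spaces in the query: "road bike" becomes query=road+bike.

```python
from itertools import product
from urllib.parse import urlencode


def build_search_url(*, city: str, category: str, query: str, offset: int = 0) -> str:
    # Same as the Step 2 helper, repeated so this block is self-contained.
    params: dict[str, str] = {"query": query}
    if offset:
        params["s"] = str(offset)
    return f"https://{city}.craigslist.org/search/{category}?{urlencode(params)}"


def plan_urls(cities, categories, query: str, pages: int, page_size: int = 120):
    """Hypothetical helper: yield (city, category, url) for every page to fetch."""
    for city, category, page in product(cities, categories, range(pages)):
        yield city, category, build_search_url(
            city=city, category=category, query=query, offset=page * page_size
        )


for city, category, url in plan_urls(["sfbay", "newyork"], ["sss"], "bike", pages=2):
    print(city, category, url)
```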

Step 3: Parse listings from a results page

We will try a couple of selectors (Craigslist has changed layout over time):

  • li.cl-static-search-result (newer pages)
  • li.result-row (older layout)

def clean_text(x: str | None) -> str | None:
    if x is None:
        return None
    t = " ".join(x.split()).strip()
    return t or None

def parse_results(html: str, *, base_url: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    rows = soup.select("li.cl-static-search-result")
    if not rows:
        rows = soup.select("li.result-row")

    out: list[dict] = []
    for row in rows:
        a = row.select_one("a[href]")
        url = a.get("href") if a else None
        if url:
            url = urljoin(base_url, url)  # resolves relative hrefs; absolute URLs pass through unchanged

        title_el = row.select_one(".title") or row.select_one("a.result-title")
        price_el = row.select_one(".price") or row.select_one(".result-price")
        loc_el = row.select_one(".location") or row.select_one(".result-hood")
        time_el = row.select_one("time[datetime]")

        out.append({
            "title": clean_text(title_el.get_text(" ", strip=True) if title_el else None),
            "price": clean_text(price_el.get_text(" ", strip=True) if price_el else None),
            "location": clean_text(loc_el.get_text(" ", strip=True) if loc_el else None),
            "posted_at": time_el.get("datetime") if time_el else None,
            "url": url,
        })

    return out

Step 4: Crawl pages + dedupe + export CSV

Pagination is offset-based (s=...). Dedupe by listing URL so you do not double count results across pages.

def crawl(*, city: str, category: str = "sss", query: str, pages: int = 5, page_size: int = 120) -> list[dict]:
    fetcher = Fetcher()
    base = f"https://{city}.craigslist.org"

    seen: set[str] = set()
    all_rows: list[dict] = []

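    for i in range(pages):
        offset = i * page_size
        url = build_search_url(city=city, category=category, query=query, offset=offset)
        html = fetcher.get(url)
        batch = parse_results(html, base_url=base)
        if not batch:
            break  # empty page: no more results

        new_count = 0
        for row in batch:
            u = row.get("url") or ""
            if not u or u in seen:
                continue
            seen.add(u)
            all_rows.append(row)
            new_count += 1

        if new_count == 0:
            break  # page added nothing new (e.g. the offset was ignored), so stop
        time.sleep(random.uniform(1.0, 2.0))  # polite pause between result pages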

    return all_rows

def write_csv(rows: Iterable[dict], path: str) -> None:
    rows = list(rows)
    fieldnames = ["title", "price", "location", "posted_at", "url"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fieldnames)
        w.writeheader()
        for r in rows:
            w.writerow({k: r.get(k) for k in fieldnames})

if __name__ == "__main__":
    rows = crawl(city="sfbay", category="sss", query="bike", pages=3)
    print("rows:", len(rows))
    write_csv(rows, "craigslist_results.csv")
    print("wrote craigslist_results.csv")
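crawl() dedupes on the exact URL string, so the same post reached through a URL with a different query string or fragment would be counted twice. The sketch below uses a stricter key, assuming (this is not confirmed above) that Craigslist listing URLs end in a numeric post ID followed by .html. When that pattern does not match, it falls back to the URL without its query and fragment. dedupe_key, dedupe, and the example URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: listing URLs end with "/<numeric post id>.html".
POST_ID_RE = re.compile(r"/(\d+)\.html$")


def dedupe_key(url: str) -> str:
    """Stable key for a listing URL: the post ID if present, else the bare URL."""
    parts = urlsplit(url)
    m = POST_ID_RE.search(parts.path)
    if m:
        return m.group(1)
    # Fall back to scheme + host + path, ignoring the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def dedupe(rows: list[dict]) -> list[dict]:
    seen: set[str] = set()
    out: list[dict] = []
    for row in rows:
        url = row.get("url")
        if not url:
            continue  # same rule as crawl(): rows without a URL are dropped
        key = dedupe_key(url)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# Hypothetical URLs: the same post reached two ways, plus a row with no link.
rows = [
    {"title": "Classic Trek 720, 60cm", "url": "https://sfbay.craigslist.org/eby/bik/d/lafayette-classic-trek/1234567890.html"},
    {"title": "Classic Trek 720, 60cm", "url": "https://sfbay.craigslist.org/eby/bik/d/lafayette-classic-trek/1234567890.html?lang=en"},
    {"title": "no link", "url": None},
]
print(len(dedupe(rows)))  # 1
```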

Where ProxiesAPI fits (honestly)

ProxiesAPI will not make a bad scraper magically invisible, but it does give you a clean knob for rotating IPs and centralizing retries/timeouts. If you start seeing 403s, CAPTCHAs, or intermittent failures as you scale across cities, enabling it is a small change: every request already goes through Fetcher.get(), so setting the PROXIESAPI_KEY environment variable is enough to route all traffic through proxiesapi_url(), with no changes to parsing or export.

Related guides

How to Scrape Craigslist Listings by Category and City (Python + ProxiesAPI)
Pull Craigslist listings for a chosen city + category, normalize fields, follow listing pages for details, and export clean CSV with retries and anti-block tips.
Python BeautifulSoup Tutorial: Scraping Your First Website (2026)
A beginner-friendly BeautifulSoup tutorial: fetch HTML with requests, parse elements with CSS selectors, handle pagination, avoid common pitfalls, and export results. Includes an honest ProxiesAPI section for when you scale.
Scrape eBay Listings and Prices (Search + Pagination + CSV)
Build an eBay search-results scraper that extracts titles, prices, shipping, seller, and URLs; paginates cleanly; and exports CSV. Uses a ProxiesAPI-friendly fetch layer and includes a target-page screenshot.
Scrape eBay Listings + Sold Prices with Python (Active + Completed Listings)
Build a small eBay dataset (title, price, condition, shipping) from search results, then pull completed/sold prices from the Sold filter. Includes pagination, CSV export, and ProxiesAPI in the fetch layer.