Scrape Craigslist Listings by Category and City

Craigslist is still one of the cleanest places to learn practical scraping. Search pages are mostly server-rendered, pagination is explicit, and the fields you actually want for analysis are easy to describe: title, price, neighborhood, posting time, and URL.

In this guide we will build a Python scraper that:

  • accepts any Craigslist city and category
  • walks paginated search results
  • extracts listing metadata into structured rows
  • deduplicates by canonical listing URL
  • exports clean CSV output
  • optionally routes requests through ProxiesAPI without rewriting the parser

[Screenshot: Craigslist search results]

Keep multi-city Craigslist crawls stable with ProxiesAPI

Craigslist is mostly static HTML, but the failure rate still rises once you fan out across dozens of city/category combinations. ProxiesAPI gives you a cleaner fetch layer when you need retries and IP rotation.


What Craigslist search pages look like

The main URL pattern is:

  • city base: https://{city}.craigslist.org
  • category path: /search/{category}
  • query string: ?query=bike
  • page offset: &s=120

Example:

https://sfbay.craigslist.org/search/sss?query=bike&s=120
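
To make the pattern concrete, here is a small sketch that takes a search URL apart again using only the standard library. The helper name describe_search_url is made up for this illustration and is not used by the scraper below.

```python
from urllib.parse import parse_qs, urlsplit


def describe_search_url(url: str) -> dict:
    """Split a Craigslist search URL into the parts described above."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    return {
        # "sfbay.craigslist.org" -> "sfbay"
        "city": parts.netloc.split(".")[0],
        # "/search/sss" -> "sss"
        "category": parts.path.rstrip("/").split("/")[-1],
        "query": params.get("query", [""])[0],
        # a missing s parameter means the first page
        "offset": int(params.get("s", ["0"])[0]),
    }


print(describe_search_url("https://sfbay.craigslist.org/search/sss?query=bike&s=120"))
```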

Craigslist has used two closely related result layouts over time, so it is worth parsing both:

  • newer static cards: li.cl-static-search-result
  • older cards: li.result-row

The most useful sub-elements are usually:

  • title: .title or a.result-title
  • price: .price or .result-price
  • neighborhood: .location or .result-hood
  • timestamp: time[datetime]

That fallback logic is the difference between a tutorial that works for one page and a scraper you can keep alive.


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

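We will keep the scraper split into fetch -> parse -> crawl -> export. That makes each stage easier to test and makes ProxiesAPI a small change to the fetch layer instead of a rewrite. The type hints used below, such as str | None, require Python 3.10 or newer.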


Step 1: Build a fetch layer with optional ProxiesAPI

import os
import random
import time
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

import requests

UA_POOL = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36",
]


def proxiesapi_url(target_url: str) -> str:
    key = os.environ.get("PROXIESAPI_KEY")
    if not key:
        return target_url
    return f"http://api.proxiesapi.com/?auth_key={quote(key)}&url={quote(target_url, safe='')}"


@dataclass(frozen=True)
class FetchConfig:
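    timeout: tuple[int, int] = (10, 30)  # (connect, read) timeouts in seconds
    max_retries: int = 4  # total attempts, including the first
    base_sleep: float = 0.8  # seconds; doubles after each failed attempt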


class Fetcher:
    def __init__(self, config: FetchConfig = FetchConfig()):
        self.config = config
        self.session = requests.Session()

    def get(self, url: str) -> str:
        last_error: Optional[Exception] = None
        for attempt in range(1, self.config.max_retries + 1):
            try:
                final_url = proxiesapi_url(url)
                response = self.session.get(
                    final_url,
                    timeout=self.config.timeout,
                    headers={"User-Agent": random.choice(UA_POOL)},
                )
                response.raise_for_status()
                return response.text
            except requests.RequestException as exc:  # network errors and HTTP error statuses
                last_error = exc
                if attempt == self.config.max_retries:
                    break
                time.sleep(self.config.base_sleep * (2 ** (attempt - 1)) + random.random() * 0.25)
        raise last_error or RuntimeError("fetch failed")

If PROXIESAPI_KEY is not set, the fetcher hits Craigslist directly. Once you start scaling across many city/category/query combinations, you can enable ProxiesAPI at the fetch layer and leave the parser alone.
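
The retry delay in Fetcher.get grows exponentially: base_sleep * 2 ** (attempt - 1), plus up to 0.25 seconds of random jitter. As a rough illustration, the sketch below lists the deterministic part of each wait for the default config. backoff_schedule is a hypothetical helper, not part of the scraper.

```python
def backoff_schedule(base_sleep: float = 0.8, max_retries: int = 4) -> list[float]:
    """Deterministic part of each wait between attempts (jitter excluded)."""
    # There is no sleep after the final attempt, so max_retries attempts
    # produce max_retries - 1 waits.
    return [base_sleep * (2 ** (attempt - 1)) for attempt in range(1, max_retries)]


delays = backoff_schedule()
print(delays)       # waits after attempts 1, 2 and 3
print(sum(delays))  # total wait before jitter
```

With the defaults, the waits are 0.8, 1.6, and 3.2 seconds, so a URL that fails all four attempts costs about 5.6 seconds of sleeping plus jitter, not counting the time spent waiting on timeouts.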


Step 2: Build Craigslist search URLs

from urllib.parse import urlencode


def build_search_url(*, city: str, category: str, query: str, offset: int = 0) -> str:
    base = f"https://{city}.craigslist.org"
    params = {"query": query}
    if offset:
        params["s"] = str(offset)
    return f"{base}/search/{category}?{urlencode(params)}"

Useful category codes:

  • sss: for sale
  • apa: apartments / housing
  • jjj: jobs
  • zip: free stuff

That means one scraper can cover very different workflows just by changing parameters.
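
For example, fanning out across several cities and categories is just a loop over the URL builder. The sketch below repeats build_search_url from above so that it runs on its own. The city and category lists are arbitrary examples.

```python
from itertools import product
from urllib.parse import urlencode


def build_search_url(*, city: str, category: str, query: str, offset: int = 0) -> str:
    # Same builder as in Step 2, repeated so this snippet is self-contained.
    base = f"https://{city}.craigslist.org"
    params = {"query": query}
    if offset:
        params["s"] = str(offset)
    return f"{base}/search/{category}?{urlencode(params)}"


cities = ["sfbay", "seattle"]  # example city subdomains
categories = ["sss", "zip"]    # for sale, free stuff

# One first-page search URL per (city, category) pair.
urls = [
    build_search_url(city=city, category=category, query="bike")
    for city, category in product(cities, categories)
]
for url in urls:
    print(url)
```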


Step 3: Parse result cards defensively

from urllib.parse import urljoin
from bs4 import BeautifulSoup


def clean_text(value: str | None) -> str | None:
    if value is None:
        return None
    text = " ".join(value.split()).strip()
    return text or None


def parse_results(html: str, *, base_url: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    rows = soup.select("li.cl-static-search-result")
    if not rows:
        rows = soup.select("li.result-row")

    results: list[dict] = []
    for row in rows:
        link = row.select_one("a[href]")
        title_el = row.select_one(".title") or row.select_one("a.result-title")
        price_el = row.select_one(".price") or row.select_one(".result-price")
        loc_el = row.select_one(".location") or row.select_one(".result-hood")
        time_el = row.select_one("time[datetime]")

        # urljoin leaves absolute URLs untouched and resolves relative ones
        url = urljoin(base_url, link["href"]) if link else None

        results.append(
            {
                "title": clean_text(title_el.get_text(" ", strip=True) if title_el else None),
                "price": clean_text(price_el.get_text(" ", strip=True) if price_el else None),
                "neighborhood": clean_text(loc_el.get_text(" ", strip=True) if loc_el else None),
                "posted_at": time_el.get("datetime") if time_el else None,
                "url": url,
            }
        )

    return results

The reason to prefer multiple selectors is simple: Craigslist sometimes exposes slightly different markup depending on the category, the city, and which page template you hit.


Step 4: Crawl pages and deduplicate rows

Craigslist pagination is offset-based, usually stepping by 120 results.

def crawl(city: str, category: str, query: str, pages: int = 3, page_size: int = 120) -> list[dict]:
    fetcher = Fetcher()
    base_url = f"https://{city}.craigslist.org"
    seen_urls: set[str] = set()
    rows: list[dict] = []

    for page_number in range(pages):
        offset = page_number * page_size
        url = build_search_url(city=city, category=category, query=query, offset=offset)
        html = fetcher.get(url)
        batch = parse_results(html, base_url=base_url)

        for item in batch:
            listing_url = item.get("url")
            if not listing_url or listing_url in seen_urls:
                continue
            seen_urls.add(listing_url)
            rows.append(item)

        if not batch:
            break  # stop when a page yields no result cards

    return rows

Deduping by URL matters because listings can be reshuffled or repeated across adjacent pages when inventory is changing in real time.
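
The crawl above compares URLs exactly as they appear in the HTML. If the same listing shows up with a different query string or fragment, an exact match will miss it. One possible normalization is sketched below. It assumes the query string and fragment never identify a listing, which is worth checking against real pages. canonical_listing_url is a hypothetical helper, and the listing path in the example is invented.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_listing_url(url: str) -> str:
    """Normalize a listing URL so trivially different forms compare equal."""
    parts = urlsplit(url)
    # Assumption: only scheme, host and path identify a listing,
    # so the query string and fragment are dropped.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, "", ""))


a = canonical_listing_url("https://SFBay.craigslist.org/bik/d/example-bike/123.html?ref=search#top")
b = canonical_listing_url("https://sfbay.craigslist.org/bik/d/example-bike/123.html")
print(a == b)
```

In crawl, you could apply it to listing_url before the seen_urls check.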


Step 5: Export a CSV

import csv


def write_csv(path: str, rows: list[dict]) -> None:
    fieldnames = ["title", "price", "neighborhood", "posted_at", "url"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow({key: row.get(key) for key in fieldnames})


if __name__ == "__main__":
    rows = crawl(city="sfbay", category="sss", query="bike", pages=3)
    print(f"scraped {len(rows)} unique rows")
    write_csv("craigslist_bikes.csv", rows)
    print("wrote craigslist_bikes.csv")

Typical uses for this dataset:

  • local market tracking
  • used-goods price monitoring
  • apartment listing snapshots
  • lead generation for niche directories
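
For price monitoring, the price column is more useful as a number than as a string like $1,200. A minimal sketch, assuming prices are dollar strings with optional thousands separators. parse_price is a hypothetical helper and returns None when no price is present.

```python
import re
from typing import Optional

# A dollar sign, optional whitespace, then digits with optional commas and decimals.
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: Optional[str]) -> Optional[float]:
    """Convert a price string such as "$1,200" to a float; None if absent."""
    if not text:
        return None
    match = PRICE_RE.search(text)
    if not match:
        return None
    return float(match.group(1).replace(",", ""))


for raw in ["$1,200", "$45", None, "free"]:
    print(raw, "->", parse_price(raw))
```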

Practical tips before you scale

  1. Start with one city and one category until the parser is stable.
  2. Keep concurrency low. Craigslist pages are lightweight, but a slow, polite crawl is less likely to be rate-limited or blocked.
  3. Cache raw HTML while developing so you are not repeatedly hitting live pages.
  4. Expect missing fields. Some posts have no price or neighborhood.
  5. Turn on ProxiesAPI only when you actually need the extra resilience.
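
Tip 3 can be as small as a wrapper that stores each page on disk under a hash of its URL. This is a sketch for development use. cached_get and the .cache directory are names chosen for illustration, and fetch can be any callable that returns HTML for a URL, such as Fetcher().get.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_get(url: str, fetch: Callable[[str], str], cache_dir: str = ".cache") -> str:
    """Return cached HTML for url if present, otherwise fetch and store it."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it is safe to use as a file name.
    path = folder / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html
```

The cache never expires, so delete the directory when you want fresh pages.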

Craigslist is a good reminder that a scraper does not need to be fancy to be useful. If your selectors are honest, your fetch layer retries sanely, and your rows are clean, you already have something production-shaped.
