How to Scrape Craigslist with Python (the Safe Way): RSS + Detail Pages

Jun 17, 2026 · tutorial · #python, #craigslist, #web-scraping, #rss, #requests, #beautifulsoup

Craigslist offers something many marketplaces do not: public RSS feeds for search results.

That means you do not need to brute-force pagination just to find new listings. You can:

pull the RSS feed for a city and category
use it as your “new item” stream
fetch each listing page for richer fields

That is the safest pattern because it lowers request volume and makes dedupe easier.

In this guide we will build a real scraper that extracts:

listing title
URL
posted time
price
neighborhood
map address
attributes
listing body text

Turn a Craigslist scraper into a dependable daily job

Craigslist is lighter than most marketplaces, but a real pipeline still needs retries, pacing, and stable networking. ProxiesAPI helps keep your scheduled runs boring.


What we are scraping

Craigslist organizes listings by:

city subdomain, such as sfbay.craigslist.org
category code, such as bia for bicycles

RSS feed pattern:

https://<city>.craigslist.org/search/<category>?format=rss

Example:

https://sfbay.craigslist.org/search/bia?format=rss

You can keep normal search filters and still get RSS:

https://sfbay.craigslist.org/search/bia?query=trek&min_price=100&max_price=900&format=rss

That is why RSS is the right discovery layer here.
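
If you track several cities or categories, building these URLs by hand gets error-prone. The sketch below assembles a feed URL from the pattern above. The function name `build_rss_url` is made up for this guide, and the filter names (`query`, `min_price`, `max_price`) are simply the ones used in the example URL.

```python
from urllib.parse import urlencode


def build_rss_url(city: str, category: str, **filters) -> str:
    """Build a Craigslist search URL that asks for RSS output.

    Hypothetical helper: filter names are passed through unchanged,
    so they must match what the search page itself accepts.
    """
    # Drop filters left as None so they do not show up as "None" in the URL.
    params = {k: v for k, v in filters.items() if v is not None}
    # format=rss goes last, matching the examples above.
    params["format"] = "rss"
    return f"https://{city}.craigslist.org/search/{category}?{urlencode(params)}"


print(build_rss_url("sfbay", "bia", query="trek", min_price=100, max_price=900))
```

Using `urlencode` also takes care of escaping, so a multi-word query such as "road bike" becomes `query=road+bike`.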

Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

Step 1: Fetch and parse the RSS feed

The RSS response is XML, so we will parse it with BeautifulSoup’s XML mode, which relies on the lxml package installed above.

from __future__ import annotations

import requests
from bs4 import BeautifulSoup

TIMEOUT = (10, 30)
UA = "Mozilla/5.0 (compatible; ProxiesAPIGuidesBot/1.0; +https://www.proxiesapi.com/)"

session = requests.Session()
session.headers.update({"User-Agent": UA})


def fetch_text(url: str) -> str:
    r = session.get(url, timeout=TIMEOUT)
    r.raise_for_status()
    return r.text


def parse_rss(xml_text: str) -> list[dict]:
    soup = BeautifulSoup(xml_text, "xml")
    items = []

    for item in soup.select("item"):
        items.append({
            "title": item.title.get_text(" ", strip=True) if item.title else None,
            "url": item.link.get_text(strip=True) if item.link else None,
            "published": item.pubDate.get_text(strip=True) if item.pubDate else None,
        })

    return items


rss_url = "https://sfbay.craigslist.org/search/bia?query=trek&format=rss"
xml = fetch_text(rss_url)
items = parse_rss(xml)
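print("rss items:", len(items))
# Guard against an empty feed so the demo does not crash with IndexError.
print(items[0] if items else "feed returned no items")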

Typical output:

rss items: 25
{'title': 'Trek bike ...', 'url': 'https://sfbay.craigslist.org/...html', 'published': 'Wed, 17 Jun 2026 07:10:00 -0700'}
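
The published value arrives as a string, which is awkward to sort or compare. Assuming the feed uses the RFC 2822 style shown in the sample output, the standard library can convert it. `parse_published` is a name invented here, and normalizing to UTC is a design choice rather than something the feed requires.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_published(value):
    """Convert an RFC 2822 date string to an aware UTC datetime, or None."""
    if not value:
        return None
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return None
    if dt.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


print(parse_published("Wed, 17 Jun 2026 07:10:00 -0700"))
```

Storing UTC timestamps makes it easier to compare listings across cities in different time zones.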

Step 2: Inspect the detail page structure

The listing detail page is where the useful fields live. On normal public listings, the most helpful selectors are:

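title: span#titletextonly
price: span.price
posted time: time.date.timeago (its datetime attribute)
description: section#postingbody
map address: div.mapaddress
metadata groups: p.attrgroup span

These selectors are tied to Craigslist’s very plain HTML templates, which rarely change. Still, confirm them against a live listing before a long run, since a silent template change shows up only as None fields.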

Step 3: Parse each listing page

import re
from bs4 import BeautifulSoup


def clean_space(text: str) -> str:
    return re.sub(r"\s+", " ", (text or "").strip())


def parse_listing(html: str, url: str) -> dict:
    soup = BeautifulSoup(html, "lxml")

    title_el = soup.select_one("span#titletextonly")
    price_el = soup.select_one("span.price")
    body_el = soup.select_one("section#postingbody")
    addr_el = soup.select_one("div.mapaddress")
    time_el = soup.select_one("time.date.timeago")

    attrs = [
        s.get_text(" ", strip=True)
        for s in soup.select("p.attrgroup span")
        if s.get_text(strip=True)
    ]

    body = None
    if body_el:
        raw = body_el.get_text("\n", strip=True)
        raw = raw.replace("QR Code Link to This Post", "").strip()
        body = clean_space(raw)

    return {
        "url": url,
        "title": title_el.get_text(" ", strip=True) if title_el else None,
        "price": price_el.get_text(strip=True) if price_el else None,
        "address": addr_el.get_text(" ", strip=True) if addr_el else None,
        "posted_datetime": time_el.get("datetime") if time_el else None,
        "attributes": attrs,
        "body": body,
        "body_length": len(body or ""),
    }

Step 4: Combine RSS discovery with detail scraping

import time
import random


def polite_sleep(min_s: float = 1.0, max_s: float = 2.5) -> None:
    time.sleep(random.uniform(min_s, max_s))


def scrape_from_rss(rss_url: str, limit: int = 10) -> list[dict]:
    xml = fetch_text(rss_url)
    feed_items = parse_rss(xml)
    rows = []

    for item in feed_items[:limit]:
        html = fetch_text(item["url"])
        row = parse_listing(html, item["url"])
        row["rss_title"] = item["title"]
        row["rss_published"] = item["published"]
        rows.append(row)
        polite_sleep()

    return rows


rows = scrape_from_rss(
    "https://sfbay.craigslist.org/search/bia?query=trek&format=rss",
    limit=5,
)
print(rows[0])

That gives you a safe baseline:

one feed request
a small number of detail requests
clean, dedupe-friendly records

Step 5: Dedupe and export

Craigslist URLs already contain unique listing IDs, so dedupe on URL first.

import csv


def dedupe(rows: list[dict]) -> list[dict]:
    seen = set()
    out = []
    for row in rows:
        if row["url"] in seen:
            continue
        seen.add(row["url"])
        out.append(row)
    return out


unique_rows = dedupe(rows)

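with open("craigslist_listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=[
            "url",
            "title",
            "price",
            "address",
            "posted_datetime",
            "rss_published",
            "body_length",
            "body",
            "attributes",
        ],
        # Rows also carry "rss_title", which is not exported. Without this
        # setting, DictWriter raises ValueError on the unexpected key.
        extrasaction="ignore",
    )
    writer.writeheader()
    for row in unique_rows:
        # Flatten the attributes list so the CSV cell holds plain text.
        writer.writerow({**row, "attributes": "; ".join(row["attributes"])})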

print("wrote rows:", len(unique_rows))
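
URL dedupe can miss the same listing when the URL differs only by a query string or fragment. Since the listing ID is part of the URL, one option is to key on the ID instead. The sketch below assumes listing URLs end in a numeric ID followed by `.html`. Check that assumption against real feed output before relying on it. Both function names are made up for this guide.

```python
import re

# Assumed URL shape: .../<numeric id>.html, optionally followed by ?query or #fragment.
_ID_RE = re.compile(r"/(\d+)\.html(?:[?#].*)?$")


def listing_id(url):
    """Return the numeric listing ID from a URL, or None if the shape differs."""
    match = _ID_RE.search(url or "")
    return match.group(1) if match else None


def dedupe_by_id(rows):
    """Keep the first row per listing ID, falling back to the raw URL."""
    seen = set()
    out = []
    for row in rows:
        key = listing_id(row.get("url")) or row.get("url")
        if key in seen:
            continue
        seen.add(key)
        out.append(row)
    return out
```

The ID is also a convenient primary key if you later store listings in a database and want to skip ones seen on earlier runs.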

Step 6: Handle common failure modes

Craigslist is lighter than many sites, but it is still worth coding for reality.

1) Some listings disappear

Between feed discovery and detail fetch, a seller may delete a listing. Expect:

404s
redirects
short placeholder pages

Treat those as normal and skip them.
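
As written, the Step 4 loop stops at the first failed request, because raise_for_status() raises on a 404. Here is a minimal sketch of a more tolerant loop. It takes the fetch and parse functions as arguments, so fetch_text and parse_listing from earlier can be passed in. The name `scrape_tolerant` and the "no title and empty body" test for placeholder pages are assumptions of this sketch, not Craigslist rules.

```python
def scrape_tolerant(urls, fetch, parse, pause=None):
    """Fetch and parse each URL, skipping failures instead of aborting.

    Returns (rows, skipped), where skipped is a list of (url, reason).
    """
    rows, skipped = [], []
    for url in urls:
        try:
            html = fetch(url)
        except OSError as exc:
            # requests' exceptions (HTTPError, Timeout, ConnectionError)
            # all derive from OSError, so this catches 404s and timeouts.
            skipped.append((url, repr(exc)))
            continue
        row = parse(html, url)
        # Heuristic (assumption): no title and no body means the page is
        # a removed-listing placeholder rather than a real listing.
        if row.get("title") is None and not row.get("body_length"):
            skipped.append((url, "placeholder page"))
            continue
        rows.append(row)
        if pause is not None:
            pause()  # e.g. polite_sleep from Step 4
    return rows, skipped
```

A call such as `scrape_tolerant([i["url"] for i in feed_items], fetch_text, parse_listing, pause=polite_sleep)` would slot into the pipeline, and logging the skipped list shows whether failures are occasional or systematic.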

2) Some fields are optional

Not every listing has:

a price
a neighborhood
a map address
the same attribute fields

Your parser should tolerate None.
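
The same tolerance applies when you clean the fields afterwards. As one example, here is a sketch that turns the raw price text into an integer and returns None when there is nothing usable. It assumes whole-dollar prices written like "$1,200". `parse_price` is a hypothetical helper, not part of the scraper above.

```python
import re


def parse_price(text):
    """Return the price as an int, or None if the text has no number."""
    if not text:
        return None
    # Take the first run of digits and thousands separators, e.g. "1,200".
    match = re.search(r"\d[\d,]*", text)
    if not match:
        return None
    return int(match.group(0).replace(",", ""))


for raw in ["$1,200", "$450", None, "free"]:
    print(repr(raw), "->", parse_price(raw))
```

Keeping the raw string alongside the parsed number makes it possible to re-check odd values later.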

3) Do not scrape too aggressively

If you are crawling many cities:

keep delays between detail pages
cap items per run
avoid hitting the same search feed every minute

RSS already reduces load. Use that advantage.
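
One way to enforce the last point is a small per-feed throttle that refuses to refetch a feed until a minimum interval has passed. This is an in-memory sketch for a long-running process. The class name `FeedThrottle` and the 15-minute default are assumptions chosen for illustration, not values Craigslist publishes.

```python
import time


class FeedThrottle:
    """Track the last fetch time per feed URL and enforce a minimum gap."""

    def __init__(self, min_interval_s=900.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the logic can be tested
        self._last = {}

    def should_fetch(self, feed_url):
        """Return True and record the time if the feed is due, else False."""
        now = self._clock()
        last = self._last.get(feed_url)
        if last is not None and now - last < self.min_interval_s:
            return False
        self._last[feed_url] = now
        return True


throttle = FeedThrottle()
print(throttle.should_fetch("https://sfbay.craigslist.org/search/bia?format=rss"))
print(throttle.should_fetch("https://sfbay.craigslist.org/search/bia?format=rss"))
```

For a cron-style job that starts fresh on each run, the same idea works if the timestamps are saved to a file between runs.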

Using ProxiesAPI

For small Craigslist experiments you may not need a proxy. For scheduled, multi-city jobs, a stable network layer is still useful.

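The target URL carries its own query string, so it must be URL-encoded. Otherwise &format=rss is read as a parameter of the proxy request instead of the Craigslist search:

curl -G "http://api.proxiesapi.com/" --data-urlencode "key=API_KEY" --data-urlencode "url=https://sfbay.craigslist.org/search/bia?query=trek&format=rss"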

Python helper:

from urllib.parse import urlencode


def wrap_proxiesapi(target_url: str, api_key: str) -> str:
    return "http://api.proxiesapi.com/?" + urlencode({
        "key": api_key,
        "url": target_url,
    })


rss_via_proxy = wrap_proxiesapi(
    "https://sfbay.craigslist.org/search/bia?query=trek&format=rss",
    "YOUR_API_KEY",
)
xml = fetch_text(rss_via_proxy)

The rest of the scraper stays the same.

Why RSS + detail pages is the safe pattern

This pattern wins because it cuts waste:

RSS tells you what is new
detail pages give you richer data
dedupe is simple
you avoid crawling deep search pagination unless you truly need archives

For a production scraper, that is exactly the tradeoff you want.

Final script

# Entry point: assumes the functions from Steps 1-5 are defined in the same module.
RSS_URL = "https://sfbay.craigslist.org/search/bia?query=trek&format=rss"
rows = dedupe(scrape_from_rss(RSS_URL, limit=10))

if not rows:
    raise RuntimeError("No listings scraped; check feed filters or HTML selectors")

print("scraped", len(rows), "listings")

If your goal is reliable Craigslist monitoring, this is the pattern I would start with before reaching for anything heavier.
