How to Scrape AutoTrader Used Car Listings with Python (Make/Model/Price/Mileage)

AutoTrader results pages are packed with useful data:

  • listing title (year/make/model/trim)
  • price
  • mileage
  • location
  • dealer vs private seller signals
[Screenshot: AutoTrader used car listings page (we'll scrape the result cards)]

In this tutorial we’ll build a scraper that turns an AutoTrader search into structured JSON using requests + BeautifulSoup.

We’ll also do this the “production way”: timeouts, retries, and selectors that degrade gracefully.

Keep listing scrapes stable with ProxiesAPI

Classifieds sites can be sensitive to request volume and repeated searches. ProxiesAPI lets you proxy-fetch result pages via a single URL so you can focus on parsing + data quality instead of proxy plumbing.


What we’re scraping

AutoTrader search results are typically under a URL like:

  • https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY

(Parameters vary by region/search.)

We’ll scrape result cards, not individual listing pages. That keeps the request count lower and is enough for most datasets.
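
If you'd rather build search URLs in code than hand-edit them, here's a minimal sketch. The parameter names are copied from the example URL above; treat them as assumptions and verify them against a real search in your browser.

from urllib.parse import urlencode

BASE_SEARCH = "https://www.autotrader.com/cars-for-sale/all-cars"


def build_search_url(zip_code: str, make: str, model: str,
                     start_year: int, end_year: int) -> str:
    # Parameter names mirror the example URL above (assumed, not verified)
    params = {
        "zip": zip_code,
        "startYear": start_year,
        "endYear": end_year,
        "makeCodeList": make,
        "modelCodeList": model,
    }
    return BASE_SEARCH + "?" + urlencode(params)


print(build_search_url("10001", "TOYOTA", "CAMRY", 2018, 2026))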


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

Step 1: Fetch HTML through ProxiesAPI

Basic curl sanity-check:

API_KEY="YOUR_PROXIESAPI_KEY"
TARGET="https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY"

# Use -G + --data-urlencode so the &'s inside TARGET get encoded
# (a plain "...&url=$TARGET" would split TARGET at each &)
curl -sG "http://api.proxiesapi.com/" \
  --data-urlencode "key=$API_KEY" \
  --data-urlencode "url=$TARGET" | head -n 20

Python fetch wrapper:

import time
import urllib.parse
import requests

API_KEY = "YOUR_PROXIESAPI_KEY"
TIMEOUT = (10, 60)  # (connect timeout, read timeout) in seconds

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})


def proxiesapi_url(target_url: str) -> str:
    return "http://api.proxiesapi.com/?" + urllib.parse.urlencode({
        "key": API_KEY,
        "url": target_url,
    })


def fetch_html(target_url: str, retries: int = 3, backoff: float = 2.0) -> str:
    url = proxiesapi_url(target_url)
    last_err = None

    for attempt in range(1, retries + 1):
        try:
            r = session.get(url, timeout=TIMEOUT)
            r.raise_for_status()

            # A real results page is large; a tiny body usually means an
            # error page or a block, so treat it as a retryable failure.
            if len(r.text) < 15_000:
                raise RuntimeError(f"Suspiciously small response: {len(r.text)} bytes")

            return r.text
        except Exception as e:
            last_err = e
            print(f"attempt {attempt}/{retries} failed: {e}")
            if attempt < retries:
                sleep_s = backoff ** attempt  # exponential backoff: 2s, 4s, 8s...
                time.sleep(sleep_s)

    raise RuntimeError(f"Failed after {retries} attempts: {last_err}")

Step 2: Identify stable selectors

AutoTrader is more JS-heavy than some sites, but result pages often still contain useful server-rendered HTML.

A common pattern is that each listing card is wrapped in an element that carries a data-testid attribute.

On your first run, probe the page before committing to selectors:

from bs4 import BeautifulSoup

html = fetch_html("https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY")

soup = BeautifulSoup(html, "lxml")
print("title:", soup.title.get_text(strip=True) if soup.title else None)

# Probe a few likely patterns
print("cards-testid:", len(soup.select('[data-testid*="listing"]')))
print("cards-article:", len(soup.select("article")))

If the HTML is mostly scripts and you don’t see listing text at all, you’ll need a browser automation approach. But before you go that route, verify your URL is a real public results page and you’re not getting a “blocked” response.
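
Before reaching for browser automation, a cheap heuristic helps you tell a block page from a JS-only page. The marker strings below are generic guesses about anti-bot pages, not confirmed AutoTrader text; tune them after inspecting a real blocked response.

def looks_blocked(html: str) -> bool:
    # Generic anti-bot markers (assumptions; adjust against real responses)
    markers = ["captcha", "access denied", "are you a robot", "unusual traffic"]
    lowered = html.lower()
    return len(html) < 15_000 or any(m in lowered for m in markers)


if looks_blocked(html):
    print("response looks blocked or empty; check the URL and your request rate")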


Step 3: Parse listing cards

We’ll extract:

  • title (often includes year/make/model)
  • price
  • mileage
  • location
  • a listing URL

We’ll keep values as raw text and normalize afterwards, because mileage/price formatting varies (see the normalization sketch after the parser).

import re
from bs4 import BeautifulSoup

BASE = "https://www.autotrader.com"


def clean_text(x: str | None) -> str | None:
    if not x:
        return None
    x = re.sub(r"\s+", " ", x).strip()
    return x or None


def parse_listings(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    # Prefer explicit testid cards if present
    cards = soup.select('[data-testid*="listing-card"], [data-testid*="inventory-listing"], article')

    out = []
    for c in cards:
        # Title
        title_el = c.select_one("h2") or c.select_one("h3")
        title = clean_text(title_el.get_text(" ", strip=True) if title_el else None)

        # Price
        price_el = (
            c.select_one('[data-testid*="price"]')
            or c.find(string=re.compile(r"\$\s?\d"))
        )
        price = None
        if price_el:
            price = clean_text(price_el.get_text(" ", strip=True) if hasattr(price_el, "get_text") else str(price_el))

        # Mileage (often like "23,451 miles")
        mileage_el = c.find(string=re.compile(r"miles", re.I))
        mileage = clean_text(str(mileage_el)) if mileage_el else None

        # Location
        # Look for a "City, ST" pattern; no re.I here, since a
        # case-insensitive [A-Z]{2} would match almost any two-letter word
        location_el = (
            c.select_one('[data-testid*="location"]')
            or c.find(string=re.compile(r",\s*[A-Z]{2}\b"))
        )
        location = None
        if location_el:
            location = clean_text(location_el.get_text(" ", strip=True) if hasattr(location_el, "get_text") else str(location_el))

        # Link (keep the query string: on ".xhtml" detail URLs the
        # listing ID lives in the query parameters)
        a = c.select_one('a[href*="/cars-for-sale/vehicledetails"]') or c.select_one('a[href^="/"]')
        href = a.get("href") if a else None
        if href and href.startswith("/"):
            href = BASE + href

        if not title and not price and not href:
            continue

        out.append({
            "title": title,
            "price_text": price,
            "mileage_text": mileage,
            "location_text": location,
            "url": href,
        })

    # De-dupe by URL/title
    seen = set()
    uniq = []
    for item in out:
        key = item.get("url") or item.get("title")
        if not key or key in seen:
            continue
        seen.add(key)
        uniq.append(item)

    return uniq
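
Since we kept price and mileage as raw text, here is a minimal normalization sketch. The formats it assumes ("$23,995", "34,210 miles") match the example output below; real listings vary, so keep the *_text fields around for debugging.

import re


def parse_price(price_text: str | None) -> int | None:
    # "$23,995" -> 23995; None if missing or unparseable
    if not price_text:
        return None
    m = re.search(r"\$\s?([\d,]+)", price_text)
    return int(m.group(1).replace(",", "")) if m else None


def parse_mileage(mileage_text: str | None) -> int | None:
    # "34,210 miles" -> 34210; None if missing or unparseable
    if not mileage_text:
        return None
    m = re.search(r"([\d,]+)\s*miles", mileage_text, re.I)
    return int(m.group(1).replace(",", "")) if m else None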

Terminal-style run

if __name__ == "__main__":
    target = "https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY"
    html = fetch_html(target)
    items = parse_listings(html)

    print("listings:", len(items))
    for it in items[:5]:
        print(it)

Example output:

listings: 23
{'title': '2021 Toyota Camry SE', 'price_text': '$23,995', 'mileage_text': '34,210 miles', 'location_text': 'Brooklyn, NY', 'url': 'https://www.autotrader.com/cars-for-sale/vehicledetails.xhtml?...'}
...
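
To collect more than one page, you'll need pagination. On many AutoTrader searches the result offset is a firstRecord query parameter, but treat that as an assumption: click to page 2 in your browser and copy whatever parameter actually changes. A sketch under that assumption:

import time


def fetch_all_pages(base_target: str, pages: int = 3, page_size: int = 25,
                    delay_s: float = 3.0) -> list[dict]:
    # Assumes pagination via a firstRecord offset parameter (verify first)
    all_items: list[dict] = []
    for page in range(pages):
        url = f"{base_target}&firstRecord={page * page_size}"
        items = parse_listings(fetch_html(url))
        print(f"page {page + 1}: {len(items)} listings")
        all_items.extend(items)
        time.sleep(delay_s)  # be polite between page fetches
    return all_items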

Export to JSON

import json

with open("autotrader_listings.json", "w", encoding="utf-8") as f:
    json.dump(items, f, ensure_ascii=False, indent=2)

print("wrote autotrader_listings.json", len(items))

Practical notes (so you don’t get blocked)

  • Don’t run the same query 100 times in a minute.
  • Cache HTML for debugging (see the caching sketch below).
  • Add delays between page fetches.
  • Use ProxiesAPI for a more stable network layer.
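
A minimal disk-cache wrapper around fetch_html helps with the first two points at once: each page is fetched a single time, and re-runs of the parser read the cached copy from disk.

import hashlib
import pathlib

CACHE_DIR = pathlib.Path(".cache")
CACHE_DIR.mkdir(exist_ok=True)


def fetch_html_cached(target_url: str) -> str:
    # Cache key is a hash of the target URL; delete .cache/ to refresh
    key = hashlib.sha1(target_url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch_html(target_url)
    path.write_text(html, encoding="utf-8")
    return html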

Where ProxiesAPI fits

AutoTrader scraping tends to break when your network layer is unstable (intermittent blocks, inconsistent content).

ProxiesAPI keeps the integration clean: fetch your target URL via a single proxy-backed endpoint, then focus on parsing and data validation.


QA checklist

  • listings > 0 (an automated version of these checks is sketched after this list)
  • Titles look like real vehicles
  • URLs open correctly
  • Your exporter writes valid JSON
  • You respect delays/timeouts
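
Some of these checks are easy to automate. A quick pass over the exported file (assumes the autotrader_listings.json written above):

import json

with open("autotrader_listings.json", encoding="utf-8") as f:
    items = json.load(f)

# listings > 0
assert items, "no listings parsed"

# titles look like real vehicles (most start with a model year)
titles = [it["title"] for it in items if it.get("title")]
assert any(t[:4].isdigit() for t in titles), "no year-prefixed titles"

# URLs point at the expected host
for it in items:
    url = it.get("url") or ""
    assert not url or url.startswith("https://www.autotrader.com"), url

print("QA OK:", len(items), "listings")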
