How to Scrape AutoTrader Used Car Listings with Python (Make/Model/Price/Mileage)
AutoTrader results pages are packed with useful data:
- listing title (year/make/model/trim)
- price
- mileage
- location
- dealer vs private seller signals
In this tutorial we’ll build a scraper that turns an AutoTrader search into structured JSON using requests + BeautifulSoup.
We’ll also do this the “production way”: timeouts, retries, and selectors that degrade gracefully.
Classifieds sites can be sensitive to request volume and repeated searches. ProxiesAPI lets you proxy-fetch result pages via a single URL so you can focus on parsing + data quality instead of proxy plumbing.
What we’re scraping
AutoTrader search results are typically under a URL like:
https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY
(Parameters vary by region/search.)
We’ll scrape result cards, not individual listing pages. That keeps the request count lower and is enough for most datasets.
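Rather than hand-editing query strings, you can assemble the search URL programmatically. A minimal sketch (parameter names are taken from the example URL above; they may differ for other regions or search types):

```python
from urllib.parse import urlencode

def build_search_url(zip_code: str, make: str, model: str,
                     start_year: int, end_year: int) -> str:
    """Assemble an AutoTrader search URL from the query parameters
    seen in the example URL (names may vary by region/search)."""
    params = {
        "zip": zip_code,
        "startYear": start_year,
        "endYear": end_year,
        "makeCodeList": make,
        "modelCodeList": model,
    }
    return "https://www.autotrader.com/cars-for-sale/all-cars?" + urlencode(params)

print(build_search_url("10001", "TOYOTA", "CAMRY", 2018, 2026))
```

`urlencode` also takes care of escaping, which matters once you start passing free-text parameters.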
Setup
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
Step 1: Fetch HTML through ProxiesAPI
Basic curl sanity-check:
API_KEY="YOUR_PROXIESAPI_KEY"
TARGET="https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY"
# URL-encode the target so its & and = aren't parsed as ProxiesAPI parameters
curl -sG "http://api.proxiesapi.com/" \
  --data-urlencode "key=$API_KEY" \
  --data-urlencode "url=$TARGET" | head -n 20
Python fetch wrapper:
import time
import urllib.parse

import requests

API_KEY = "YOUR_PROXIESAPI_KEY"
TIMEOUT = (10, 60)  # (connect, read) seconds

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})

def proxiesapi_url(target_url: str) -> str:
    return "http://api.proxiesapi.com/?" + urllib.parse.urlencode({
        "key": API_KEY,
        "url": target_url,
    })

def fetch_html(target_url: str, retries: int = 3, backoff: float = 2.0) -> str:
    url = proxiesapi_url(target_url)
    last_err = None
    for attempt in range(1, retries + 1):
        try:
            r = session.get(url, timeout=TIMEOUT)
            r.raise_for_status()
            if len(r.text) < 15_000:
                raise RuntimeError(f"Suspiciously small response: {len(r.text)} bytes")
            return r.text
        except Exception as e:
            last_err = e
            sleep_s = backoff ** attempt
            print(f"attempt {attempt}/{retries} failed: {e} -> sleeping {sleep_s:.1f}s")
            time.sleep(sleep_s)
    raise RuntimeError(f"Failed after {retries} retries: {last_err}")
Step 2: Identify stable selectors
AutoTrader is more JS-heavy than some sites, but result pages often still contain useful server-rendered HTML.
A common pattern is that each listing card is wrapped with a data-testid attribute.
In your first run, do:
from bs4 import BeautifulSoup
html = fetch_html("https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY")
soup = BeautifulSoup(html, "lxml")
print("title:", soup.title.get_text(strip=True) if soup.title else None)
# Probe a few likely patterns
print("cards-testid:", len(soup.select('[data-testid*="listing"]')))
print("cards-article:", len(soup.select("article")))
If the HTML is mostly scripts and you don’t see listing text at all, you’ll need a browser automation approach. But before you go that route, verify your URL is a real public results page and you’re not getting a “blocked” response.
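A cheap heuristic can flag blocked responses before you debug selectors. The marker strings below are assumptions, not confirmed AutoTrader markers; inspect a real blocked page and adjust:

```python
def looks_blocked(html: str) -> bool:
    """Rough heuristic for a block/interstitial page.

    The marker strings are guesses -- adjust them after inspecting
    an actual blocked response from your target.
    """
    lowered = html.lower()
    markers = ("access denied", "captcha", "unusual traffic", "request blocked")
    # Very small bodies are usually error pages, not result pages
    return any(m in lowered for m in markers) or len(html) < 500
```

Call it right after `fetch_html` and bail out early instead of parsing garbage.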
Step 3: Parse listing cards
We’ll extract:
- title (often includes year/make/model)
- price
- mileage
- location
- a listing URL
We’ll keep values as text and normalize later (because mileage/price formatting varies).
import re
from bs4 import BeautifulSoup

BASE = "https://www.autotrader.com"

def clean_text(x: str | None) -> str | None:
    if not x:
        return None
    x = re.sub(r"\s+", " ", x).strip()
    return x or None

def parse_listings(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    # Prefer explicit testid cards if present
    cards = soup.select('[data-testid*="listing-card"], [data-testid*="inventory-listing"], article')
    out = []
    for c in cards:
        # Title
        title_el = c.select_one("h2") or c.select_one("h3")
        title = clean_text(title_el.get_text(" ", strip=True) if title_el else None)

        # Price
        price_el = (
            c.select_one('[data-testid*="price"]')
            or c.find(string=re.compile(r"\$\s?\d"))
        )
        price = None
        if price_el:
            price = clean_text(price_el.get_text(" ", strip=True) if hasattr(price_el, "get_text") else str(price_el))

        # Mileage (often like "23,451 miles")
        mileage_el = c.find(string=re.compile(r"miles", re.I))
        mileage = clean_text(str(mileage_el)) if mileage_el else None

        # Location -- match an uppercase two-letter state code.
        # No re.I here: case-insensitive matching would hit any
        # two-letter word ("in", "of", ...) in the card text.
        location_el = (
            c.select_one('[data-testid*="location"]')
            or c.find(string=re.compile(r"\b[A-Z]{2}\b"))
        )
        location = None
        if location_el:
            location = clean_text(location_el.get_text(" ", strip=True) if hasattr(location_el, "get_text") else str(location_el))

        # Link
        a = c.select_one('a[href*="/cars-for-sale/vehicledetails"]') or c.select_one('a[href^="/"]')
        href = a.get("href") if a else None
        if href and href.startswith("/"):
            href = BASE + href.split("?")[0]

        if not title and not price and not href:
            continue

        out.append({
            "title": title,
            "price_text": price,
            "mileage_text": mileage,
            "location_text": location,
            "url": href,
        })

    # De-dupe by URL/title
    seen = set()
    uniq = []
    for item in out:
        key = item.get("url") or item.get("title")
        if not key or key in seen:
            continue
        seen.add(key)
        uniq.append(item)
    return uniq
Terminal-style run
if __name__ == "__main__":
    target = "https://www.autotrader.com/cars-for-sale/all-cars?zip=10001&startYear=2018&endYear=2026&makeCodeList=TOYOTA&modelCodeList=CAMRY"
    html = fetch_html(target)
    items = parse_listings(html)
    print("listings:", len(items))
    for it in items[:5]:
        print(it)
Example output:
listings: 23
{'title': '2021 Toyota Camry SE', 'price_text': '$23,995', 'mileage_text': '34,210 miles', 'location_text': 'Brooklyn, NY', 'url': 'https://www.autotrader.com/cars-for-sale/vehicledetails.xhtml?...'}
...
Export to JSON
import json

with open("autotrader_listings.json", "w", encoding="utf-8") as f:
    json.dump(items, f, ensure_ascii=False, indent=2)
print("wrote autotrader_listings.json", len(items))
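If you prefer spreadsheets, the same items list can also be written as CSV. The field names below match the dict keys produced by parse_listings:

```python
import csv

FIELDS = ["title", "price_text", "mileage_text", "location_text", "url"]

def export_csv(items: list[dict], path: str = "autotrader_listings.csv") -> None:
    """Write the scraped listing dicts to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(items)
```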
Practical notes (so you don’t get blocked)
- Don’t run the same query 100 times in a minute.
- Cache HTML for debugging.
- Add delays between page fetches.
- Use ProxiesAPI for a more stable network layer.
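The caching and delay advice can be folded into one helper. A sketch (the cache layout and delay value are arbitrary choices, not ProxiesAPI requirements):

```python
import hashlib
import pathlib
import time

CACHE_DIR = pathlib.Path(".html_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_fetch(target_url: str, fetch, delay: float = 3.0) -> str:
    """Serve HTML from a disk cache, sleeping between live requests.

    `fetch` is any callable like fetch_html from Step 1, so this
    helper stays decoupled from the network layer.
    """
    key = hashlib.sha256(target_url.encode()).hexdigest()[:16]
    path = CACHE_DIR / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    time.sleep(delay)  # be polite between live fetches
    html = fetch(target_url)
    path.write_text(html, encoding="utf-8")
    return html
```

While you iterate on selectors, every rerun hits the cache instead of the site.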
Where ProxiesAPI fits
AutoTrader scraping tends to break when your network layer is unstable (intermittent blocks, inconsistent content).
ProxiesAPI keeps the integration clean: fetch your target URL via a single proxy-backed endpoint, then focus on parsing and data validation.
QA checklist
- listings > 0
- Titles look like real vehicles
- URLs open correctly
- Your exporter writes valid JSON
- You respect delays/timeouts