Scrape App Store Rankings (Python + ProxiesAPI)
If you’re doing app growth, competitive research, or ASO, you eventually need a clean dataset of:
- top chart position (rank)
- app name + developer
- category
- app id / URL
- rating count (when available)
Apple’s App Store exposes rankings through a mix of HTML pages and JSON endpoints. In this guide we’ll use a pragmatic approach:
- fetch a top-chart feed for a country and chart type
- normalize the results (rank → app ID and URL)
- enrich each app with a lightweight metadata fetch
- export everything to CSV
We’ll write this in Python and show exactly how to add ProxiesAPI + retries so the scraper remains reliable when you run it frequently.

Rankings endpoints get rate limited when you poll frequently (daily/hourly). ProxiesAPI helps reduce blocks and smooth out spikes so your data pipeline doesn’t fall over mid-run.
What we’re scraping (targets)
There are two common ways to get chart data:
Option A: parse HTML top charts pages
Pros:
- easy to start
- no API keys
Cons:
- HTML structure changes
Option B: use Apple’s public RSS/JSON feeds
Apple provides a public “RSS” style feed that returns structured data for top charts.
Pros:
- structured JSON
- stable
Cons:
- the feed doesn’t always contain every field you want (you may still need enrichment)
In this tutorial we’ll build on Option B and add an enrichment step:
- use the feed as the ranking source
- then enrich via the app page (or a second endpoint) if you need more fields
Setup
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
Step 1: Fetch top charts via Apple’s JSON feed
Apple exposes a JSON feed for top free apps like:
https://rss.applemarketingtools.com/api/v2/us/apps/top-free/100/apps.json
You can swap:
- country code (us, gb, in, …)
- chart type (top-free, top-paid, …)
- limit (10, 50, 100)
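Those three swappable parts can be captured in a small helper. This is a minimal sketch that only assumes the URL pattern shown above; `build_feed_url` and `FEED_BASE` are hypothetical names, not part of any Apple SDK.

```python
FEED_BASE = "https://rss.applemarketingtools.com/api/v2"


def build_feed_url(country: str = "us", chart: str = "top-free", limit: int = 100) -> str:
    """Build a top-charts feed URL following the pattern shown above (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # The example URL uses a lowercase country code, so normalize the input.
    return f"{FEED_BASE}/{country.lower()}/apps/{chart}/{limit}/apps.json"


print(build_feed_url("gb", "top-paid", 50))
```

Keeping the pattern in one place makes it easier to loop over several countries or chart types later without hand-editing URLs.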
import requests

# (connect timeout, read timeout) in seconds
TIMEOUT = (10, 30)

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json,text/html;q=0.9,*/*;q=0.8",
}

# Reuse one session so connections are pooled across requests.
session = requests.Session()


def fetch_json(url: str) -> dict:
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)
    r.raise_for_status()
    return r.json()


feed_url = "https://rss.applemarketingtools.com/api/v2/us/apps/top-free/100/apps.json"
data = fetch_json(feed_url)
print(data.keys())
print("results:", len(data["feed"]["results"]))
The response includes feed.results which is an ordered list. Rank is simply the list index + 1.
Step 2: Normalize ranking rows
def normalize_feed(feed: dict) -> list[dict]:
    results = feed["feed"]["results"]
    out = []
    # The feed is already ordered, so rank is the 1-based position in the list.
    for i, r in enumerate(results, start=1):
        out.append({
            "rank": i,
            "name": r.get("name"),
            "artist": r.get("artistName"),
            "app_url": r.get("url"),
            "app_id": r.get("id"),
            "release_date": r.get("releaseDate"),
            "kind": r.get("kind"),
            "artwork": r.get("artworkUrl100"),
        })
    return out


rows = normalize_feed(data)
print(rows[0])
At this point you already have a clean rankings dataset.
But if you want richer metadata (rating count, description, genres), you need enrichment.
Step 3: Enrich each app via the iTunes Lookup API
Apple’s iTunes Search/Lookup API can return structured metadata for an app id:
https://itunes.apple.com/lookup?id=APP_ID&country=us
This is a common enrichment step.
def lookup_app(app_id: str, country: str = "us") -> dict | None:
    url = f"https://itunes.apple.com/lookup?id={app_id}&country={country}"
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)
    r.raise_for_status()
    js = r.json()
    if js.get("resultCount", 0) < 1:
        return None
    return js["results"][0]


def enrich_rows(rows: list[dict], country: str = "us", limit: int = 20) -> list[dict]:
    out = []
    for idx, row in enumerate(rows[:limit], start=1):
        meta = lookup_app(row["app_id"], country=country)
        if meta:
            row = {
                **row,
                "bundle_id": meta.get("bundleId"),
                "primary_genre": meta.get("primaryGenreName"),
                "genres": ", ".join(meta.get("genres") or []),
                "seller": meta.get("sellerName"),
                "average_user_rating": meta.get("averageUserRating"),
                "user_rating_count": meta.get("userRatingCount"),
                "price": meta.get("price"),
                "currency": meta.get("currency"),
                "content_advisory_rating": meta.get("contentAdvisoryRating"),
            }
        out.append(row)
        print(f"enriched {idx}/{min(limit, len(rows))}: {row.get('name')}")
    return out


enriched = enrich_rows(rows, country="us", limit=30)
print(enriched[0].keys())
Why this enrichment step is useful
- you keep rankings collection fast (single feed request)
- you optionally enrich only the top N to manage cost/rate limits
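If per-app lookups become the bottleneck, one option is to batch them. The sketch below only builds the request URLs and makes no network calls. It assumes the Lookup endpoint accepts a comma-separated `id` list, which is worth verifying against Apple's documentation before relying on it. `chunked` and `build_lookup_urls` are hypothetical helper names, and the batch size of 50 is an arbitrary choice.

```python
from urllib.parse import urlencode


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_lookup_urls(app_ids: list[str], country: str = "us", batch_size: int = 50) -> list[str]:
    """Build one lookup URL per batch of app ids.

    Assumption: the endpoint accepts several ids joined by commas.
    """
    urls = []
    for batch in chunked(app_ids, batch_size):
        # safe="," keeps the commas readable instead of percent-encoding them.
        query = urlencode({"id": ",".join(batch), "country": country}, safe=",")
        urls.append(f"https://itunes.apple.com/lookup?{query}")
    return urls


print(build_lookup_urls(["1", "2", "3"], batch_size=2))
```

With batching, each response would hold several entries in `results`, so matching them back to rows by `trackId` rather than by position is the safer choice.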
Step 4: Export to CSV
import csv


def export_csv(rows: list[dict], path: str = "app_store_top_free_us.csv"):
    fields = sorted({k for r in rows for k in r.keys()})
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        for r in rows:
            w.writerow(r)


export_csv(enriched)
print("wrote app_store_top_free_us.csv")
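Sorting field names alphabetically puts `rank` in the middle of the header row. If you would rather lead with the ranking columns, a small helper can order them. This is a sketch; `ordered_fields` and the `PREFERRED` list are illustrative choices, not part of the exporter above.

```python
PREFERRED = ["rank", "name", "artist", "app_id", "app_url"]  # illustrative ordering


def ordered_fields(rows: list[dict], preferred: list[str] = PREFERRED) -> list[str]:
    """Return column names with preferred ones first and the rest alphabetically."""
    seen = {key for row in rows for key in row}
    head = [name for name in preferred if name in seen]
    tail = sorted(seen - set(head))
    return head + tail


sample = [
    {"rank": 1, "name": "A", "price": 0.0},
    {"rank": 2, "name": "B", "app_id": "42"},
]
print(ordered_fields(sample))  # ['rank', 'name', 'app_id', 'price']
```

The returned list can be passed as `fieldnames` to `csv.DictWriter`; rows that lack a column are written with an empty cell by default.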
Add ProxiesAPI: retries for frequent runs
If you scrape rankings daily/hourly, you’ll see occasional rate limiting or transient failures. The simplest stabilization is:
- exponential backoff
- optional proxy routing via ProxiesAPI
import os
import random
import time

# Proxy URL for ProxiesAPI, read from the environment; if unset, requests go direct.
PROXIESAPI_PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")


def proxies():
    if not PROXIESAPI_PROXY_URL:
        return None
    return {"http": PROXIESAPI_PROXY_URL, "https": PROXIESAPI_PROXY_URL}


def get_json_with_retries(url: str, tries: int = 5) -> dict:
    last_err = None
    for attempt in range(1, tries + 1):
        try:
            r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT, proxies=proxies())
            if r.status_code in (403, 429, 500, 502, 503, 504):
                raise requests.HTTPError(f"status {r.status_code}")
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            last_err = e
            if attempt == tries:
                break  # no point sleeping after the final attempt
            # Exponential backoff capped at 20s, plus up to 1s of random jitter.
            sleep = min(20, 2 ** attempt) + random.random()
            print(f"attempt {attempt}/{tries} failed: {e}; sleeping {sleep:.1f}s")
            time.sleep(sleep)
    raise RuntimeError(f"failed after {tries} tries: {last_err}") from last_err
You can use this helper for both the rankings feed and the lookup enrichment.
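To see what the backoff looks like without making any requests, the waits between consecutive attempts can be computed on their own. This sketch mirrors the `min(20, 2 ** attempt)` rule and leaves out the random jitter so the output is deterministic; `backoff_delays` is a hypothetical helper name.

```python
def backoff_delays(tries: int = 5, cap: float = 20.0) -> list[float]:
    """Base wait in seconds between consecutive attempts, before jitter is added."""
    # Attempt n is followed by a wait of min(cap, 2 ** n); there are tries - 1 gaps.
    return [min(cap, 2.0 ** attempt) for attempt in range(1, tries)]


print(backoff_delays(tries=7))  # [2.0, 4.0, 8.0, 16.0, 20.0, 20.0]
```

The cap only kicks in from the fifth wait onward, so with the default of five tries it never applies between attempts; raising `tries` is what makes it matter.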
QA checklist
- rank increments correctly and matches the feed ordering
- app_id and app_url look valid
- enrichment returns fields for most apps
- CSV opens correctly and preserves unicode app names
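Parts of this checklist can be automated. The sketch below is one way to do it, assuming rows shaped like the output of `normalize_feed`. `validate_rows` is a hypothetical helper, and both the numeric `app_id` check and the `https://` prefix check are loose heuristics rather than rules Apple documents.

```python
def validate_rows(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for expected_rank, row in enumerate(rows, start=1):
        # Rank should count up from 1 in the same order as the feed.
        if row.get("rank") != expected_rank:
            problems.append(f"row {expected_rank}: rank is {row.get('rank')!r}")
        # Heuristic: app ids are expected to be numeric strings.
        app_id = str(row.get("app_id") or "")
        if not app_id.isdigit():
            problems.append(f"row {expected_rank}: app_id {app_id!r} is not numeric")
        # Heuristic: app URLs are expected to be absolute HTTPS links.
        app_url = row.get("app_url") or ""
        if not app_url.startswith("https://"):
            problems.append(f"row {expected_rank}: app_url {app_url!r} looks invalid")
    return problems


sample = [
    {"rank": 1, "app_id": "123", "app_url": "https://apps.apple.com/us/app/example/id123"},
    {"rank": 3, "app_id": "", "app_url": None},
]
for problem in validate_rows(sample):
    print(problem)
```

Running a check like this before the CSV export lets a scheduled job fail loudly instead of silently writing a malformed file.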
Next upgrades
- schedule hourly scrapes and store rank history in SQLite/Postgres
- add category-specific charts (games, finance, etc.)
- dedupe apps across countries and compute “global momentum”
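For the first upgrade, rank history fits naturally in a single table keyed by capture time, country, chart, and app. This is a minimal sketch using the standard-library `sqlite3` module. The table name, columns, and the `save_snapshot` helper are all assumptions, and it uses an in-memory database with made-up sample rows so it runs without touching disk or the network.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS rank_history (
    captured_at TEXT NOT NULL,
    country     TEXT NOT NULL,
    chart       TEXT NOT NULL,
    app_id      TEXT NOT NULL,
    rank        INTEGER NOT NULL,
    name        TEXT,
    PRIMARY KEY (captured_at, country, chart, app_id)
)
"""


def save_snapshot(conn, rows, country, chart, captured_at=None):
    """Insert one chart snapshot and return the number of rows stored."""
    captured_at = captured_at or datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO rank_history VALUES (?, ?, ?, ?, ?, ?)",
            [(captured_at, country, chart, str(r["app_id"]), r["rank"], r.get("name"))
             for r in rows],
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
first = [{"rank": 1, "app_id": "111", "name": "Alpha"}, {"rank": 2, "app_id": "222", "name": "Beta"}]
second = [{"rank": 1, "app_id": "222", "name": "Beta"}, {"rank": 2, "app_id": "111", "name": "Alpha"}]
save_snapshot(conn, first, "us", "top-free", captured_at="2024-01-01T00:00:00+00:00")
save_snapshot(conn, second, "us", "top-free", captured_at="2024-01-01T01:00:00+00:00")

# Rank history for one app, oldest snapshot first.
history = conn.execute(
    "SELECT captured_at, rank FROM rank_history WHERE app_id = ? ORDER BY captured_at",
    ("111",),
).fetchall()
print(history)
```

The composite primary key makes re-running the same snapshot idempotent, which is convenient when a scheduled job is retried.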