Scrape App Store Rankings (Python + ProxiesAPI)

If you’re doing app growth, competitive research, or ASO, you eventually need a clean dataset of:

  • top chart position (rank)
  • app name + developer
  • category
  • app id / URL
  • rating count (when available)

Apple’s App Store exposes rankings through a mix of HTML pages and JSON endpoints. In this guide we’ll use a pragmatic approach:

  1. fetch the top-charts JSON feed for a country and chart type
  2. normalize the results into ranked rows (rank → app URL)
  3. enrich each app with a lightweight metadata fetch
  4. export everything to CSV

We’ll write this in Python and show exactly how to add ProxiesAPI + retries so the scraper remains reliable when you run it frequently.

App Store top charts page (we parse ranks + app links)

Keep ranking scrapes stable with ProxiesAPI

Rankings endpoints get rate limited when you poll frequently (daily/hourly). ProxiesAPI helps reduce blocks and smooth out spikes so your data pipeline doesn’t fall over mid-run.


What we’re scraping (targets)

There are two common ways to get chart data:

Option A: parse HTML top charts pages

Pros:

  • easy to start
  • no API keys

Cons:

  • HTML structure changes

Option B: use Apple’s public RSS/JSON feeds

Apple provides a public “RSS” style feed that returns structured data for top charts.

Pros:

  • structured JSON
  • stable

Cons:

  • the feed doesn’t always contain every field you want (you may still need enrichment)

In this tutorial we’ll build on Option B:

  • use the feed as the ranking source
  • then enrich via the app page (or a second endpoint) if you need more fields

Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

Step 1: Fetch top charts via Apple’s JSON feed

Apple exposes a JSON feed for top free apps like:

  • https://rss.applemarketingtools.com/api/v2/us/apps/top-free/100/apps.json

You can swap:

  • country code (us, gb, in, …)
  • chart type (top-free, top-paid, …)
  • limit (10, 50, 100)

import requests

TIMEOUT = (10, 30)

DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "application/json,text/html;q=0.9,*/*;q=0.8",
}

session = requests.Session()


def fetch_json(url: str) -> dict:
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)
    r.raise_for_status()
    return r.json()


feed_url = "https://rss.applemarketingtools.com/api/v2/us/apps/top-free/100/apps.json"
data = fetch_json(feed_url)
print(data.keys())
print("results:", len(data["feed"]["results"]))

The response includes feed.results which is an ordered list. Rank is simply the list index + 1.
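
Since only three parts of the feed URL vary (country, chart type, limit), a small builder can keep them in one place. This is a minimal sketch: `build_feed_url` and `FEED_BASE` are hypothetical names, and the helper only assembles the URL pattern shown above. It does not check that Apple actually serves a given combination.

```python
FEED_BASE = "https://rss.applemarketingtools.com/api/v2"


def build_feed_url(country: str = "us", chart: str = "top-free", limit: int = 100) -> str:
    """Assemble a top-charts feed URL from its three swappable parts."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Country codes are lowercased to match the pattern used in the feed URL.
    return f"{FEED_BASE}/{country.lower()}/apps/{chart}/{limit}/apps.json"


print(build_feed_url("gb", "top-paid", 50))
```

The returned string can be passed straight to fetch_json, which makes it easy to loop over several countries or chart types.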


Step 2: Normalize ranking rows

def normalize_feed(feed: dict) -> list[dict]:
    results = feed["feed"]["results"]

    out = []
    for i, r in enumerate(results, start=1):
        out.append({
            "rank": i,
            "name": r.get("name"),
            "artist": r.get("artistName"),
            "app_url": r.get("url"),
            "app_id": r.get("id"),
            "release_date": r.get("releaseDate"),
            "kind": r.get("kind"),
            "artwork": r.get("artworkUrl100"),
        })
    return out


rows = normalize_feed(data)
print(rows[0])

At this point you already have a clean rankings dataset.

But if you want richer metadata (rating count, description, genres), you need enrichment.
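
Enrichment keys off app_id, so it can help to have a fallback for rows where the feed leaves it empty. The sketch below assumes app URLs contain a path segment of the form `id<digits>` (the example URL in the usage note is made up); `app_id_from_url` and `ensure_app_id` are hypothetical helper names, not part of any Apple API.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the app id appears as a path segment like "id123456789".
_ID_SEGMENT = re.compile(r"^id(\d+)$")


def app_id_from_url(app_url: Optional[str]) -> Optional[str]:
    """Return the numeric id from an 'id<digits>' path segment, if present."""
    if not app_url:
        return None
    path = urlparse(app_url).path  # the query string is ignored
    for segment in reversed(path.split("/")):
        match = _ID_SEGMENT.match(segment)
        if match:
            return match.group(1)
    return None


def ensure_app_id(row: dict) -> dict:
    """Fill a missing app_id from app_url, leaving existing ids untouched."""
    if row.get("app_id"):
        return row
    return {**row, "app_id": app_id_from_url(row.get("app_url"))}
```

For example, `app_id_from_url("https://apps.apple.com/us/app/example-app/id123456789")` yields the string "123456789", and a URL without such a segment yields None.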


Step 3: Enrich each app via the iTunes Lookup API

Apple’s iTunes Search/Lookup API can return structured metadata for an app id:

  • https://itunes.apple.com/lookup?id=APP_ID&country=us

This is a common enrichment step.


def lookup_app(app_id: str, country: str = "us") -> dict | None:
    url = f"https://itunes.apple.com/lookup?id={app_id}&country={country}"
    r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT)
    r.raise_for_status()
    js = r.json()
    if js.get("resultCount", 0) < 1:
        return None
    return js["results"][0]


def enrich_rows(rows: list[dict], country: str = "us", limit: int = 20) -> list[dict]:
    out = []
    for idx, row in enumerate(rows[:limit], start=1):
        meta = lookup_app(row["app_id"], country=country)
        if meta:
            row = {
                **row,
                "bundle_id": meta.get("bundleId"),
                "primary_genre": meta.get("primaryGenreName"),
                "genres": ", ".join(meta.get("genres") or []),
                "seller": meta.get("sellerName"),
                "average_user_rating": meta.get("averageUserRating"),
                "user_rating_count": meta.get("userRatingCount"),
                "price": meta.get("price"),
                "currency": meta.get("currency"),
                "content_advisory_rating": meta.get("contentAdvisoryRating"),
            }
        out.append(row)
        print(f"enriched {idx}/{min(limit, len(rows))}: {row.get('name')}")
    return out


enriched = enrich_rows(rows, country="us", limit=30)
print(enriched[0].keys())

Why this enrichment step is useful

  • you keep rankings collection fast (single feed request)
  • you optionally enrich only the top N to manage cost/rate limits
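
One further way to manage rate limits is to ask for several apps per request. The sketch below assumes the Lookup API accepts comma-separated ids in a single `id` parameter; verify that against Apple's documentation before relying on it. `chunked`, `batch_lookup_urls`, and the batch size of 50 are illustrative choices, not documented limits.

```python
from urllib.parse import urlencode

LOOKUP_BASE = "https://itunes.apple.com/lookup"


def chunked(items: list, size: int) -> list:
    """Split items into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_lookup_urls(app_ids: list, country: str = "us", batch_size: int = 50) -> list:
    """Build one lookup URL per batch of ids (comma-separated ids are an assumption)."""
    urls = []
    for batch in chunked(app_ids, batch_size):
        # safe="," keeps the commas readable instead of percent-encoding them.
        query = urlencode({"id": ",".join(batch), "country": country}, safe=",")
        urls.append(f"{LOOKUP_BASE}?{query}")
    return urls
```

If you batch this way, match results back to rows by id rather than by position, since the response order is not guaranteed to follow the request order.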

Step 4: Export to CSV

import csv


def export_csv(rows: list[dict], path: str = "app_store_top_free_us.csv"):
    fields = sorted({k for r in rows for k in r.keys()})
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=fields)
        w.writeheader()
        for r in rows:
            w.writerow(r)


export_csv(enriched)
print("wrote app_store_top_free_us.csv")
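
Because export_csv sorts the column names alphabetically, rank ends up in the middle of the file. If you prefer the key columns first, a sketch like the following can replace the `fields = sorted(...)` line. `ordered_fields` and `PREFERRED` are hypothetical names, and the preferred order is just one reasonable choice.

```python
PREFERRED = ["rank", "name", "artist", "app_id", "app_url"]


def ordered_fields(rows: list, preferred: list = PREFERRED) -> list:
    """Preferred columns first (when present), then the rest alphabetically."""
    present = {key for row in rows for key in row}
    head = [key for key in preferred if key in present]
    tail = sorted(present - set(head))
    return head + tail
```

Inside export_csv you would then write `fields = ordered_fields(rows)`; the rest of the function stays the same.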

Add ProxiesAPI: retries for frequent runs

If you scrape rankings daily/hourly, you’ll see occasional rate limiting or transient failures. The simplest stabilization is:

  • exponential backoff
  • optional proxy routing via ProxiesAPI

import os
import time
import random

PROXIESAPI_PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")


def proxies():
    if not PROXIESAPI_PROXY_URL:
        return None
    return {"http": PROXIESAPI_PROXY_URL, "https": PROXIESAPI_PROXY_URL}


def get_json_with_retries(url: str, tries: int = 5) -> dict:
    """GET a JSON document, retrying with exponential backoff plus jitter."""
    last_err = None
    for attempt in range(1, tries + 1):
        try:
            r = session.get(url, headers=DEFAULT_HEADERS, timeout=TIMEOUT, proxies=proxies())
            # raise_for_status() raises HTTPError for any 4xx/5xx (403, 429, 5xx, ...)
            r.raise_for_status()
            return r.json()
        except (requests.RequestException, ValueError) as e:
            # ValueError covers a body that is not valid JSON
            last_err = e
            if attempt == tries:
                break  # don't sleep after the final attempt
            sleep = min(20, 2 ** attempt) + random.random()
            print(f"attempt {attempt}/{tries} failed: {e}; sleeping {sleep:.1f}s")
            time.sleep(sleep)
    raise RuntimeError(f"failed after {tries} tries: {last_err}")

You can use this helper for both the rankings feed and the lookup enrichment.
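
One way to do that without duplicating code is to pass the fetcher in as an argument. This is a sketch under that assumption: `fetch_rankings` and `lookup_app_via` are hypothetical names, and any callable that takes a URL and returns parsed JSON (such as get_json_with_retries) can be supplied.

```python
from typing import Callable, Optional

Fetch = Callable[[str], dict]  # e.g. get_json_with_retries


def fetch_rankings(fetch: Fetch, feed_url: str) -> list:
    """Return the ordered feed results using whichever fetcher is passed in."""
    return fetch(feed_url)["feed"]["results"]


def lookup_app_via(fetch: Fetch, app_id: str, country: str = "us") -> Optional[dict]:
    """Same lookup as in Step 3, but routed through the injected fetcher."""
    url = f"https://itunes.apple.com/lookup?id={app_id}&country={country}"
    results = fetch(url).get("results") or []
    return results[0] if results else None
```

In the scraper this becomes `fetch_rankings(get_json_with_retries, feed_url)` and `lookup_app_via(get_json_with_retries, row["app_id"])`. A side benefit is that both functions can be exercised with a stub fetcher that makes no network calls.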


QA checklist

  • rank increments correctly and matches the feed ordering
  • app_id and app_url look valid
  • enrichment returns fields for most apps
  • CSV opens correctly and preserves unicode app names
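
The first three checks can be automated with a few lines. The sketch below assumes app ids are purely numeric and app URLs start with https://; `validate_rows` and `coverage` are hypothetical helpers, and what counts as "most apps" is left to you.

```python
def validate_rows(rows: list) -> list:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    for position, row in enumerate(rows, start=1):
        if row.get("rank") != position:
            problems.append(f"row {position}: rank {row.get('rank')!r} != {position}")
        # Assumption: app ids are purely numeric strings.
        if not str(row.get("app_id") or "").isdigit():
            problems.append(f"row {position}: suspicious app_id {row.get('app_id')!r}")
        if not str(row.get("app_url") or "").startswith("https://"):
            problems.append(f"row {position}: suspicious app_url {row.get('app_url')!r}")
    return problems


def coverage(rows: list, field: str = "user_rating_count") -> float:
    """Fraction of rows where an enrichment field is populated."""
    if not rows:
        return 0.0
    return sum(row.get(field) is not None for row in rows) / len(rows)
```

Running `validate_rows(enriched)` before the export and logging `coverage(enriched)` gives an early signal when the feed or the lookup response changes shape.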

Next upgrades

  • schedule hourly scrapes and store rank history in SQLite/Postgres
  • add category-specific charts (games, finance, etc.)
  • dedupe apps across countries and compute “global momentum”
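
As a starting point for the rank-history upgrade, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the `chart_rank` column name, and the `save_snapshot` and `rank_change` helpers are all assumptions made for illustration; adapt them to your own schema.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS rank_history (
    captured_at TEXT NOT NULL,
    country     TEXT NOT NULL,
    chart       TEXT NOT NULL,
    app_id      TEXT NOT NULL,
    chart_rank  INTEGER NOT NULL,
    name        TEXT,
    PRIMARY KEY (captured_at, country, chart, app_id)
)
"""


def save_snapshot(conn, rows, country, chart, captured_at=None) -> int:
    """Store one scrape as a snapshot; returns the number of rows written."""
    captured_at = captured_at or datetime.now(timezone.utc).isoformat(timespec="seconds")
    conn.execute(SCHEMA)
    params = [
        (captured_at, country, chart, str(r["app_id"]), r["rank"], r.get("name"))
        for r in rows
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO rank_history VALUES (?, ?, ?, ?, ?, ?)", params
        )
    return len(params)


def rank_change(conn, app_id, country, chart):
    """Positions gained between the two latest snapshots (positive = moved up)."""
    cur = conn.execute(
        "SELECT chart_rank FROM rank_history "
        "WHERE app_id = ? AND country = ? AND chart = ? "
        "ORDER BY captured_at DESC LIMIT 2",
        (str(app_id), country, chart),
    )
    ranks = [record[0] for record in cur.fetchall()]
    if len(ranks) < 2:
        return None
    return ranks[1] - ranks[0]
```

Timestamps are stored as ISO 8601 text in UTC so that sorting by captured_at matches chronological order. A call such as `save_snapshot(sqlite3.connect("ranks.db"), rows, "us", "top-free")` after each scrape is enough to start accumulating history.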

