Scrape App Store Rankings (Python + ProxiesAPI)

(Screenshot: Apple Marketing Tools RSS; we’ll use feed endpoints for rankings.)

App Store rankings are useful for:

  • market research (who’s moving up, who’s falling)
  • competitor monitoring
  • category discovery (new apps that start climbing)
  • building “top apps” datasets

In this tutorial we’ll build a practical scraper that produces a daily CSV/JSON of top chart apps for:

  • a given country (storefront)
  • a given chart type (free/paid/top grossing)
  • optionally multiple categories

Then we’ll enrich the dataset with stable identifiers and metadata.

Make daily ranking pulls more stable with ProxiesAPI

When you pull charts every day across multiple countries/categories, failures add up. ProxiesAPI helps keep the fetch layer consistent so your dataset stays complete.


What we’re scraping: stable endpoints (prefer these)

There are two general ways to get App Store ranking data:

  1. Unofficial web HTML parsing (fragile; Apple can change it)
  2. Apple’s feed endpoints (more stable)

For production-ish datasets, choose the feed endpoints.

Option A: iTunes RSS / Apple Marketing Tools feeds (JSON)

Apple historically provided RSS feeds for top apps.

Depending on the feed version you may find JSON endpoints like:

  • Top Free iPhone Apps: https://itunes.apple.com/{country}/rss/topfreeapplications/limit=100/json
  • Top Paid iPhone Apps: https://itunes.apple.com/{country}/rss/toppaidapplications/limit=100/json

Country is typically a two-letter code like us, gb, in.

These feeds can change over time; if an endpoint 404s, treat it as “endpoint moved” and update the base URL.
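As one example of "endpoint moved": at the time of writing, Apple also serves similar charts from its Marketing Tools domain (the screenshot above). The URL pattern and JSON shape below are assumptions we've observed, not guarantees; verify them before relying on this fallback. Note the shape differs from the legacy iTunes RSS: entries live under feed.results, not feed.entry.

# Hedged fallback: newer Apple Marketing Tools feed.
# URL pattern and JSON shape are assumptions observed at time of writing.
COUNTRY = "us"
LIMIT = 100
MARKETING_FEED = f"https://rss.applemarketingtools.com/api/v2/{COUNTRY}/apps/top-free/{LIMIT}/apps.json"

# payload = fetch_json(MARKETING_FEED, API_KEY)  # fetch_json is defined in Step 1
# results = (payload.get("feed") or {}).get("results") or []
# each result typically carries: id, name, artistName, url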

Option B: Web pages + parsing (fallback)

Apple also publishes chart pages in the App Store web experience.

You can scrape them, but expect the DOM to change.

In this guide we’ll implement Option A first.


Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests

We only need requests because the feed is JSON.


Step 1: Fetch a chart via ProxiesAPI

ProxiesAPI canonical fetch:

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://example.com"

Python helper:

import requests
from urllib.parse import quote_plus

# (connect timeout, read timeout) in seconds: never let a request hang
TIMEOUT = (10, 60)


def proxiesapi_url(target_url: str, api_key: str) -> str:
    # URL-encode both the key and the target so query characters survive the wrapper URL
    return f"http://api.proxiesapi.com/?key={quote_plus(api_key)}&url={quote_plus(target_url)}"


def fetch_json(target_url: str, api_key: str) -> dict:
    url = proxiesapi_url(target_url, api_key)
    r = requests.get(url, timeout=TIMEOUT)
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return r.json()

Pick a chart endpoint. Example (Top Free):

API_KEY = "API_KEY"
COUNTRY = "us"
FEED_URL = f"https://itunes.apple.com/{COUNTRY}/rss/topfreeapplications/limit=100/json"

payload = fetch_json(FEED_URL, API_KEY)
print(payload.keys())

Step 2: Parse ranks, app ids, names, developers

Typical RSS JSON structure includes a feed with entry items.

We’ll normalize into rows with:

  • rank (1..N)
  • app_id (stable numeric id)
  • name
  • developer
  • app_url
  • country
  • chart
  • date

from datetime import date


def parse_top_feed(payload: dict, *, country: str, chart: str) -> list[dict]:
    feed = payload.get("feed") or {}
    entries = feed.get("entry") or []

    rows = []
    for idx, e in enumerate(entries, start=1):
        name = (e.get("im:name") or {}).get("label")
        developer = (e.get("im:artist") or {}).get("label")

        # the stable numeric app id usually lives in id.attributes["im:id"]
        id_obj = e.get("id") or {}
        attrs = id_obj.get("attributes") or {}
        app_id = attrs.get("im:id")

        # id.label is normally the canonical App Store URL for the app
        app_url = id_obj.get("label")

        rows.append({
            "rank": idx,
            "app_id": app_id,
            "name": name,
            "developer": developer,
            "app_url": app_url,
            "country": country,
            "chart": chart,
            "pulled_at": date.today().isoformat(),
        })

    return rows


rows = parse_top_feed(payload, country=COUNTRY, chart="topfreeapplications")
print("rows:", len(rows))
print(rows[0])

Step 3: Enrich with app metadata (lookup by app id)

Ranks are useful, but for a durable dataset you’ll usually want more fields:

  • primary genre/category
  • bundle id
  • current version
  • release date
  • price
  • rating count
  • average rating

A common technique is to call Apple’s lookup endpoint:

https://itunes.apple.com/lookup?id=APP_ID&country=us

This returns a JSON object with a results array (plus a resultCount).

We’ll implement a simple enrich step that looks ids up one at a time (and throttles); a batched, cached variant is sketched at the end of this step.

import time


def lookup_app(app_id: str, api_key: str, country: str) -> dict | None:
    # lookup responses look like {"resultCount": N, "results": [...]}; take the first match
    url = f"https://itunes.apple.com/lookup?id={app_id}&country={country}"
    data = fetch_json(url, api_key)
    results = data.get("results") or []
    return results[0] if results else None


def enrich_rows(rows: list[dict], api_key: str, country: str, *, sleep_s: float = 0.3) -> list[dict]:
    out = []
    for r in rows:
        app_id = r.get("app_id")
        meta = lookup_app(app_id, api_key, country) if app_id else None

        enriched = dict(r)
        if meta:
            enriched.update({
                "bundle_id": meta.get("bundleId"),
                "primary_genre": meta.get("primaryGenreName"),
                "genres": meta.get("genres"),
                "seller_name": meta.get("sellerName"),
                "version": meta.get("version"),
                "current_version_release_date": meta.get("currentVersionReleaseDate"),
                "release_date": meta.get("releaseDate"),
                "price": meta.get("price"),
                "currency": meta.get("currency"),
                "average_user_rating": meta.get("averageUserRating"),
                "user_rating_count": meta.get("userRatingCount"),
            })

        out.append(enriched)
        time.sleep(sleep_s)  # polite throttle between lookup calls

    return out

If you’re pulling 100 apps across many countries, do lookups in a separate job and cache results by app_id.
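A hedged sketch of that pattern, assuming the lookup endpoint accepts comma-separated ids (it historically has, but verify) and using a plain JSON file as the cache. CACHE_PATH, the batch size of 50, and lookup_many are our choices for illustration:

import json
import os

CACHE_PATH = "lookup_cache.json"  # hypothetical local cache file


def lookup_many(app_ids: list[str], api_key: str, country: str) -> dict:
    # load whatever metadata we already have, keyed by app_id
    cache: dict = {}
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, encoding="utf-8") as f:
            cache = json.load(f)

    missing = [i for i in app_ids if i and i not in cache]

    # lookup has historically accepted comma-separated ids; batch to stay under URL limits
    for start in range(0, len(missing), 50):
        batch = missing[start:start + 50]
        url = f"https://itunes.apple.com/lookup?id={','.join(batch)}&country={country}"
        data = fetch_json(url, api_key)
        for meta in data.get("results") or []:
            cache[str(meta.get("trackId"))] = meta  # trackId is the numeric app id

    with open(CACHE_PATH, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return cache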


Step 4: Export daily dataset

import csv
import json


def export_csv(rows: list[dict], path: str) -> None:
    if not rows:
        return

    keys = sorted({k for r in rows for k in r.keys()})  # union of keys across rows (enriched rows carry extra fields)
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=keys)
        w.writeheader()
        w.writerows(rows)


def export_json(rows: list[dict], path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

Putting it together:

API_KEY = "API_KEY"
COUNTRY = "us"
CHART = "topfreeapplications"

feed_url = f"https://itunes.apple.com/{COUNTRY}/rss/{CHART}/limit=100/json"
payload = fetch_json(feed_url, API_KEY)
rows = parse_top_feed(payload, country=COUNTRY, chart=CHART)
rows = enrich_rows(rows[:25], API_KEY, COUNTRY)  # enrich a subset to start

export_csv(rows, f"app_store_{COUNTRY}_{CHART}.csv")
export_json(rows, f"app_store_{COUNTRY}_{CHART}.json")
print("saved", len(rows))

Handling failures (retries, partial days)

For daily pipelines you want two guarantees:

  1. you never “hang”
  2. a partial failure doesn’t erase the day

Practical tips (a minimal sketch follows the list):

  • always set timeouts (connect + read)
  • store raw payloads for debugging (S3 / local)
  • write outputs to a date-partitioned folder (YYYY-MM-DD/)
  • if enrich step fails, still keep the ranking-only dataset
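Here’s a minimal sketch of the retry-and-partition pattern. The retry count, backoff, and the out/ folder layout are our choices, not requirements; fetch_json is the helper from Step 1:

import os
import time
from datetime import date


def fetch_json_with_retries(target_url: str, api_key: str, retries: int = 3) -> dict:
    # bounded retries with exponential backoff so a flaky fetch can't hang the day
    last_exc = None
    for attempt in range(retries):
        try:
            return fetch_json(target_url, api_key)
        except Exception as exc:  # narrow to requests exceptions in production
            last_exc = exc
            time.sleep(2 ** attempt)  # 1s, 2s, 4s
    raise last_exc


def dated_path(filename: str) -> str:
    # e.g. out/2024-05-01/app_store_us_topfreeapplications.csv
    folder = os.path.join("out", date.today().isoformat())
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)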

QA checklist

  • Chart endpoint returns JSON (not HTML)
  • Rank count matches expected limit (e.g., 100)
  • app_id is present for each row
  • Lookup enrichment adds stable fields (bundle id, genre)
  • Export files open cleanly
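These checks are easy to automate before export; a minimal sketch (the expected count and field checks mirror the list above):

def qa_check(rows: list[dict], expected_count: int = 100) -> None:
    # fail fast before export if the pull looks wrong
    assert len(rows) == expected_count, f"expected {expected_count} rows, got {len(rows)}"
    assert all(r.get("app_id") for r in rows), "some rows are missing app_id"
    ranks = [r["rank"] for r in rows]
    assert ranks == sorted(ranks), "ranks are not ascending"

Call qa_check(rows) after parse_top_feed and before the export step.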

Final thoughts

App Store charts are a great “daily pull” dataset: structured, comparable over time, and high signal.

Start with one country + one chart.

Once it’s stable, expand the matrix (a loop sketch follows):

  • more countries
  • more chart types
  • scheduled runs
  • cached metadata enrichment
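A sketch of that expansion. The country and chart lists here are examples; the top-grossing feed name follows the same pattern as the other two, but verify it before scheduling:

COUNTRIES = ["us", "gb", "in"]  # example storefronts
CHARTS = ["topfreeapplications", "toppaidapplications", "topgrossingapplications"]

for country in COUNTRIES:
    for chart in CHARTS:
        feed_url = f"https://itunes.apple.com/{country}/rss/{chart}/limit=100/json"
        try:
            payload = fetch_json(feed_url, API_KEY)
        except Exception:
            continue  # one bad combination shouldn't kill the rest of the run
        rows = parse_top_feed(payload, country=country, chart=chart)
        export_csv(rows, f"app_store_{country}_{chart}.csv")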

That’s the difference between “a script” and a dataset.
