How to Scrape GitHub Releases with Python (Versions + Notes + Diffs)

If you’re shipping software, releases are a data stream.

This tutorial builds a simple monitor that:

  • scrapes a GitHub repo’s Releases page
  • extracts version tags + dates + notes
  • stores structured JSON
  • computes diffs so you can alert when something changes

[Screenshot: the GitHub Releases page, our target UI]

Target URL

Example repo:

https://github.com/openclaw/openclaw/releases

Replace the owner/repo to track your own targets.
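If you track several repos, it helps to build that URL from owner/repo pairs. A tiny helper, introduced here for convenience (not part of the original flow):

def releases_url(owner: str, repo: str) -> str:
    # hypothetical helper: Releases page URL for any owner/repo
    return f"https://github.com/{owner}/{repo}/releases"

print(releases_url("openclaw", "openclaw"))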


Setup

python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml

Step 1: Fetch HTML

import requests

TIMEOUT = (10, 30)  # (connect, read) seconds
UA = "Mozilla/5.0 (compatible; ProxiesAPIGuidesBot/1.0; +https://www.proxiesapi.com/)"

url = "https://github.com/openclaw/openclaw/releases"
resp = requests.get(url, timeout=TIMEOUT, headers={"User-Agent": UA})
resp.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error page
html = resp.text
print(len(html))
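A single blind GET is brittle once this runs on a schedule. Here's a minimal retry sketch, assuming transient 429/5xx responses and network hiccups are worth retrying; fetch_with_retries is a name introduced here, not part of the original flow:

import time

def fetch_with_retries(url: str, attempts: int = 3) -> str:
    # hypothetical helper: retry transient failures with exponential backoff
    for i in range(attempts):
        try:
            resp = requests.get(url, timeout=TIMEOUT, headers={"User-Agent": UA})
            if resp.status_code in (429, 500, 502, 503, 504):
                raise requests.HTTPError(f"retryable status {resp.status_code}")
            resp.raise_for_status()
            return resp.text
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            if i == attempts - 1:
                raise
            time.sleep(2 ** i)  # back off: 1s, then 2s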

Step 2: Parse releases

GitHub’s HTML structure changes, so keep parsing defensive.

We’ll extract:

  • tag (e.g. v1.2.3)
  • title
  • URL to the release page
  • published time (when present)
  • notes text (collapsed to plain text)

from bs4 import BeautifulSoup


def parse_releases(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")

    releases = []

    # Each release is typically a section/region with a heading + tag link.
    for block in soup.select('div.release, div.js-release'):
        h2 = block.select_one('h2')
        title = h2.get_text(" ", strip=True) if h2 else None

        tag_a = block.select_one('a[href*="/tree/"]')
        tag = tag_a.get_text(" ", strip=True) if tag_a else None

        rel_a = block.select_one('a[href*="/releases/tag/"]')
        href = rel_a.get('href', '') if rel_a else ''
        rel_url = f"https://github.com{href}" if href.startswith('/') else (href or None)

        time_el = block.select_one('relative-time, time')
        published = time_el.get('datetime') if time_el else None

        # notes container when present; otherwise fall back to the whole block's text
        notes_el = block.select_one('[data-test-selector="release-notes"]') or block
        notes = notes_el.get_text("\n", strip=True)

        # keep notes short in the listing; store full notes separately if needed
        releases.append({
            "tag": tag,
            "title": title,
            "url": rel_url,
            "published": published,
            "notes": notes,
        })

    # fallback: if the block selectors miss (GitHub's markup changes often),
    # scan the page for release-tag links so we still return something
    if not releases:
        seen = set()
        for a in soup.select('a[href*="/releases/tag/"]'):
            href = a.get('href', '')
            tag = href.rstrip('/').rsplit('/', 1)[-1]
            if not tag or tag in seen:
                continue
            seen.add(tag)
            releases.append({
                "tag": tag,
                "title": a.get_text(" ", strip=True) or None,
                "url": f"https://github.com{href}" if href.startswith('/') else href,
                "published": None,
                "notes": "",
            })

    return releases


rows = parse_releases(html)
print('releases:', len(rows))
print(rows[0] if rows else None)

Step 3: Store snapshots and compute diffs

The simplest monitoring model: write a JSON snapshot on each run and compare it with the previous one.

import json
from pathlib import Path

out_dir = Path('release_snapshots')
out_dir.mkdir(exist_ok=True)

snapshot_path = out_dir / 'openclaw_openclaw.json'

prev = None
if snapshot_path.exists():
    prev = json.loads(snapshot_path.read_text('utf-8'))

snapshot_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), 'utf-8')
print('wrote', snapshot_path)

if prev is not None:
    prev_tags = {r.get('tag') for r in prev}  # set for O(1) membership checks
    curr_tags = [r.get('tag') for r in rows]
    new = [t for t in curr_tags if t and t not in prev_tags]
    if new:
        print('new releases:', new)
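New tags are the headline event, but release notes get edited after publication too. A sketch extending the diff to flag edited notes, assuming tags are stable identifiers across snapshots:

if prev is not None:
    # index previous releases by tag, then compare the notes text
    prev_by_tag = {r.get('tag'): r for r in prev if r.get('tag')}
    for r in rows:
        old = prev_by_tag.get(r.get('tag'))
        if old and old.get('notes') != r.get('notes'):
            print('notes changed for', r.get('tag'))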

ProxiesAPI usage (canonical)

curl "http://api.proxiesapi.com/?key=API_KEY&url=https://github.com/openclaw/openclaw/releases"
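The same call from Python, mirroring the curl pattern above; passing the target URL via params lets requests handle the URL-encoding:

API_KEY = "API_KEY"  # your ProxiesAPI key

resp = requests.get(
    "http://api.proxiesapi.com/",
    params={"key": API_KEY, "url": "https://github.com/openclaw/openclaw/releases"},
    timeout=TIMEOUT,
)
resp.raise_for_status()
html = resp.text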

QA checklist

  • Extracts at least the latest release tag + URL
  • Snapshot JSON writes correctly
  • Diff detects new tags
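A quick sanity script covering the checklist (a sketch; assumes rows and snapshot_path from the steps above are in scope):

assert rows, 'no releases parsed'
assert rows[0].get('tag') and rows[0].get('url'), 'latest release missing tag/url'
assert snapshot_path.exists(), 'snapshot JSON was not written'
print('QA checks passed')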

Turn release watching into a reliable monitor

Once you track many repos and run this hourly/daily, reliability becomes the bottleneck. ProxiesAPI helps keep fetches stable as you scale.

Related guides

Scrape Netflix Catalogue Data with Python + ProxiesAPI (Titles, Genres, Availability)
Build a repeatable Netflix title dataset from listing pages: extract title rows, handle pagination defensively, dedupe, and export clean JSONL. Includes a screenshot of the target UI.

Scrape Pinterest Images and Pins (Search + Board URLs) with Python + ProxiesAPI
Extract pin titles, image URLs, outbound links, and board metadata from Pinterest search + board pages with pagination, retries, and defensive parsing. Includes a screenshot of the target UI.

Scrape Stack Overflow Questions and Answers by Tag (Python + ProxiesAPI)
Extract Stack Overflow question lists and accepted answers for a tag with robust retries, respectful rate limits, and a validation screenshot. Export to JSON/CSV.

Scrape Patreon Creator Data with Python (Profiles, Tiers, Posts)
Extract Patreon creator metadata, membership tiers, and recent public posts with a screenshot-first workflow, robust retries, and ProxiesAPI-backed requests.