How to Scrape GitHub Releases with Python (Versions + Notes + Diffs)
If you’re shipping software, releases are a data stream.
This tutorial builds a simple monitor that:
- scrapes a GitHub repo’s Releases page
- extracts version tags + dates + notes
- stores structured JSON
- computes diffs so you can alert when something changes

Turn release watching into a reliable monitor
Once you track many repos and run this hourly/daily, reliability becomes the bottleneck. ProxiesAPI helps keep fetches stable as you scale.
Target URL
Example repo:
https://github.com/openclaw/openclaw/releases
Replace the owner/repo to track your own targets.
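If you plan to track several repos, it can help to build the listing URL from owner/repo in one place. A minimal sketch (the `releases_url` name is ours):

```python
def releases_url(owner: str, repo: str) -> str:
    """Build the Releases listing URL for a GitHub repo."""
    return f"https://github.com/{owner}/{repo}/releases"

print(releases_url("openclaw", "openclaw"))
# https://github.com/openclaw/openclaw/releases
```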
Setup
```shell
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
```
Step 1: Fetch HTML
```python
import requests

TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds
UA = "Mozilla/5.0 (compatible; ProxiesAPIGuidesBot/1.0; +https://www.proxiesapi.com/)"
url = "https://github.com/openclaw/openclaw/releases"

resp = requests.get(url, timeout=TIMEOUT, headers={"User-Agent": UA})
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
html = resp.text
print(len(html))
```
Step 2: Parse releases
GitHub’s HTML structure changes, so keep parsing defensive.
We’ll extract:
- tag (e.g. v1.2.3)
- title
- URL to the release page
- published time (when present)
- notes text (collapsed to plain text)
```python
from bs4 import BeautifulSoup

def parse_releases(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    releases = []
    # Each release is typically a section/region with a heading + tag link.
    for block in soup.select("div.release, div.js-release"):
        h2 = block.select_one("h2")
        title = h2.get_text(" ", strip=True) if h2 else None
        tag_a = block.select_one('a[href*="/tree/"]')
        tag = tag_a.get_text(" ", strip=True) if tag_a else None
        rel_a = block.select_one('a[href*="/releases/tag/"]')
        href = rel_a.get("href", "") if rel_a else ""
        rel_url = f"https://github.com{href}" if href.startswith("/") else (href or None)
        time_el = block.select_one("relative-time, time")
        published = time_el.get("datetime") if time_el else None
        notes_el = block.select_one('[data-test-selector="release-notes"]') or block
        notes = notes_el.get_text("\n", strip=True)
        # keep notes short in the listing; store full notes separately if needed
        releases.append({
            "tag": tag,
            "title": title,
            "url": rel_url,
            "published": published,
            "notes": notes,
        })
    # Fallback: if the selectors above miss (GitHub's markup changed),
    # scan release-tag links directly so we still return something.
    if not releases:
        for a in soup.select('a[href*="/releases/tag/"]'):
            href = a.get("href", "")
            releases.append({
                "tag": href.rsplit("/", 1)[-1] or None,
                "title": a.get_text(" ", strip=True) or None,
                "url": f"https://github.com{href}" if href.startswith("/") else href,
                "published": None,
                "notes": "",
            })
    return releases
```
```python
rows = parse_releases(html)
print("releases:", len(rows))
print(rows[0] if rows else None)
```
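The comment in `parse_releases` suggests keeping listing notes short. One way to do that, as a sketch (the `truncate_notes` name and the 500-character limit are our choices), is to cut at a word boundary:

```python
def truncate_notes(text: str, limit: int = 500) -> str:
    """Trim release notes for the listing, cutting at a word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]  # drop a partially-cut trailing word
    return cut + " …"
```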
Step 3: Store snapshots and compute diffs
The simplest monitoring model is: write a JSON snapshot today, compare with yesterday.
```python
import json
from pathlib import Path

out_dir = Path("release_snapshots")
out_dir.mkdir(exist_ok=True)
snapshot_path = out_dir / "openclaw_openclaw.json"

prev = None
if snapshot_path.exists():
    prev = json.loads(snapshot_path.read_text(encoding="utf-8"))

snapshot_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding="utf-8")
print("wrote", snapshot_path)

if prev is not None:
    prev_tags = {r.get("tag") for r in prev}
    curr_tags = [r.get("tag") for r in rows]
    new = [t for t in curr_tags if t and t not in prev_tags]
    if new:
        print("new releases:", new)
```
ProxiesAPI usage (canonical)
```shell
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://github.com/openclaw/openclaw/releases"
```
QA checklist
- Extracts at least the latest release tag + URL
- Snapshot JSON writes correctly
- Diff detects new tags
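The checklist above can be run as code on each scrape. A minimal validator (the `validate_rows` name is ours; it checks the `rows` list from Step 2):

```python
def validate_rows(rows: list[dict]) -> list[str]:
    """Return a list of QA failures for a parsed snapshot; empty means pass."""
    problems = []
    if not rows:
        problems.append("no releases extracted")
        return problems
    first = rows[0]
    if not first.get("tag"):
        problems.append("latest release is missing a tag")
    if not (first.get("url") or "").startswith("https://github.com/"):
        problems.append("latest release URL looks wrong")
    return problems
```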
Related guides
How to Scrape GitHub Trending with Python (and Export to CSV/JSON)
A practical GitHub Trending scraper: fetch the Trending page, extract repo names + language + stars, and export a clean dataset.
tutorial · #python #github #web-scraping
How to Scrape Hacker News (HN) with Python: Stories + Pagination + Comments
A production-grade Hacker News scraper: parse the real HTML, crawl multiple pages, extract stories and comment threads, and export clean JSON. Includes terminal-style runs and selector rationale.
tutorial · #python #hackernews #web-scraping
How to Scrape MDN Docs Pages with Python
Extract headings and table-of-contents structure from MDN docs pages with Python and BeautifulSoup.
tutorial · #python #mdn #web-scraping
How to Scrape the Python Docs Module Index with Python
Build a searchable dataset from the Python docs module index using Python and BeautifulSoup.
tutorial · #python #docs #web-scraping