How to Scrape GitHub Releases with Python (Versions + Notes + Diffs)
If you’re shipping software, releases are a data stream.
This tutorial builds a simple monitor that:
- scrapes a GitHub repo’s Releases page
- extracts version tags + dates + notes
- stores structured JSON
- computes diffs so you can alert when something changes

Turn release watching into a reliable monitor
Once you track many repos and run this hourly/daily, reliability becomes the bottleneck. ProxiesAPI helps keep fetches stable as you scale.
Target URL
Example repo:
https://github.com/openclaw/openclaw/releases
Replace the owner/repo to track your own targets.
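To make swapping targets explicit, you can build the URL from an owner/repo pair. This tiny helper is my own addition, not part of the tutorial's code:

```python
def releases_url(owner: str, repo: str) -> str:
    """Build the public Releases page URL for a GitHub repo."""
    return f"https://github.com/{owner}/{repo}/releases"

print(releases_url("openclaw", "openclaw"))
# https://github.com/openclaw/openclaw/releases
```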
Setup
```bash
python -m venv .venv
source .venv/bin/activate
pip install requests beautifulsoup4 lxml
```
Step 1: Fetch HTML
```python
import requests

TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds
UA = "Mozilla/5.0 (compatible; ProxiesAPIGuidesBot/1.0; +https://www.proxiesapi.com/)"

url = "https://github.com/openclaw/openclaw/releases"
resp = requests.get(url, timeout=TIMEOUT, headers={"User-Agent": UA})
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
html = resp.text
print(len(html))
```
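On an hourly or daily schedule you will eventually hit transient failures (timeouts, 429s, 5xx). A small retry loop with exponential backoff keeps one bad fetch from killing a run. This is a sketch with arbitrary backoff parameters, not the tutorial's canonical fetcher:

```python
import time

import requests

TIMEOUT = (10, 30)
UA = "Mozilla/5.0 (compatible; ProxiesAPIGuidesBot/1.0; +https://www.proxiesapi.com/)"

def fetch_html(url: str, retries: int = 3, backoff: float = 2.0) -> str:
    """Fetch a URL, retrying transient errors with exponential backoff."""
    last_err = None
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=TIMEOUT, headers={"User-Agent": UA})
            # Retry rate limits and server errors; anything else 4xx fails fast.
            if resp.status_code in (429, 500, 502, 503, 504):
                raise requests.HTTPError(f"transient status {resp.status_code}")
            resp.raise_for_status()
            return resp.text
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError) as err:
            last_err = err
            time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"giving up on {url}") from last_err
```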
Step 2: Parse releases
GitHub’s HTML structure changes over time, so keep the parsing defensive: prefer stable URL patterns over class names, and guard every lookup against missing elements.
We’ll extract:
- tag (e.g. v1.2.3)
- title
- URL to the release page
- published time (when present)
- notes text (collapsed to plain text)
```python
from bs4 import BeautifulSoup

def parse_releases(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    releases = []
    # Each release is typically a section/region with a heading + tag link.
    for block in soup.select("div.release, div.js-release"):
        h2 = block.select_one("h2")
        title = h2.get_text(" ", strip=True) if h2 else None

        # The tag name links to the source tree for that tag.
        tag_a = block.select_one('a[href*="/tree/"]')
        tag = tag_a.get_text(" ", strip=True) if tag_a else None

        rel_a = block.select_one('a[href*="/releases/tag/"]')
        rel_url = None
        if rel_a:
            href = rel_a.get("href", "")
            rel_url = f"https://github.com{href}" if href.startswith("/") else href

        time_el = block.select_one("relative-time, time")
        published = time_el.get("datetime") if time_el else None

        notes_el = block.select_one('[data-test-selector="release-notes"]') or block
        notes = notes_el.get_text("\n", strip=True)

        # Keep notes short in the listing; store full notes separately if needed.
        releases.append({
            "tag": tag,
            "title": title,
            "url": rel_url,
            "published": published,
            "notes": notes,
        })

    # Fallback: if the block selectors miss after a layout change, scan
    # release-tag links directly so the monitor still returns something.
    if not releases:
        for a in soup.select('a[href*="/releases/tag/"]'):
            href = a.get("href", "")
            releases.append({
                "tag": href.rstrip("/").rsplit("/", 1)[-1] or None,
                "title": a.get_text(" ", strip=True) or None,
                "url": f"https://github.com{href}" if href.startswith("/") else href,
                "published": None,
                "notes": "",
            })
    return releases

rows = parse_releases(html)
print("releases:", len(rows))
print(rows[0] if rows else None)
```
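Rows come back in page order, which is usually newest-first but not guaranteed. If your tags follow a simple vMAJOR.MINOR.PATCH scheme, you can sort them numerically with a small key function; this sketch assumes plain numeric components and ignores pre-release suffixes like `-rc1`:

```python
import re

def version_key(tag: str) -> tuple:
    """Turn 'v1.2.10' into (1, 2, 10) so numeric sorting works."""
    nums = re.findall(r"\d+", tag or "")
    return tuple(int(n) for n in nums)

tags = ["v1.2.10", "v1.10.0", "v1.2.9"]
print(sorted(tags, key=version_key, reverse=True))
# newest first: ['v1.10.0', 'v1.2.10', 'v1.2.9']
```

A plain string sort would put `v1.2.9` after `v1.2.10`, which is why the numeric key matters.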
Step 3: Store snapshots and compute diffs
The simplest monitoring model: on each run, write a JSON snapshot and compare it with the previous one.
```python
import json
from pathlib import Path

out_dir = Path("release_snapshots")
out_dir.mkdir(exist_ok=True)
snapshot_path = out_dir / "openclaw_openclaw.json"

# Load the previous snapshot (if any) before overwriting it.
prev = None
if snapshot_path.exists():
    prev = json.loads(snapshot_path.read_text("utf-8"))

snapshot_path.write_text(json.dumps(rows, ensure_ascii=False, indent=2), "utf-8")
print("wrote", snapshot_path)

if prev is not None:
    prev_tags = {r.get("tag") for r in prev}
    curr_tags = [r.get("tag") for r in rows]
    new = [t for t in curr_tags if t and t not in prev_tags]
    if new:
        print("new releases:", new)
```
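New tags aren't the only useful signal: release notes are sometimes edited after publication. Given two snapshots in the same list-of-dicts shape as `rows`, the sketch below reports both new tags and existing tags whose notes changed; the helper name is my own, not from the tutorial:

```python
def diff_releases(prev: list[dict], curr: list[dict]) -> dict:
    """Compare two snapshots: report new tags and tags whose notes changed."""
    prev_by_tag = {r["tag"]: r for r in prev if r.get("tag")}
    new, changed = [], []
    for r in curr:
        tag = r.get("tag")
        if not tag:
            continue
        if tag not in prev_by_tag:
            new.append(tag)
        elif r.get("notes") != prev_by_tag[tag].get("notes"):
            changed.append(tag)
    return {"new": new, "changed": changed}
```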
ProxiesAPI usage (canonical)
```bash
curl "http://api.proxiesapi.com/?key=API_KEY&url=https://github.com/openclaw/openclaw/releases"
```
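The same call from Python, assuming the endpoint takes `key` and `url` query parameters as in the curl example, and that your key lives in a `PROXIESAPI_KEY` environment variable (the variable name is my choice):

```python
import os

import requests

def fetch_via_proxiesapi(target_url: str) -> str:
    """Fetch a page through the ProxiesAPI endpoint from the curl example."""
    api_key = os.environ["PROXIESAPI_KEY"]  # assumed env var name
    resp = requests.get(
        "http://api.proxiesapi.com/",
        params={"key": api_key, "url": target_url},
        timeout=(10, 60),
    )
    resp.raise_for_status()
    return resp.text
```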
QA checklist
- Extracts at least the latest release tag + URL
- Snapshot JSON writes correctly
- Diff detects new tags
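The first checklist item can be smoke-tested offline against a synthetic release block that mirrors the Step 2 selectors. The HTML below is fabricated for the test (not copied from GitHub), and the stdlib parser is used so the check runs even without lxml:

```python
from bs4 import BeautifulSoup

SAMPLE = """
<div class="release">
  <h2>OpenClaw 1.2.3</h2>
  <p><a href="/openclaw/openclaw/tree/v1.2.3">v1.2.3</a></p>
  <p><a href="/openclaw/openclaw/releases/tag/v1.2.3">OpenClaw 1.2.3</a></p>
  <relative-time datetime="2024-01-01T00:00:00Z"></relative-time>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")  # stdlib parser; no lxml needed here
block = soup.select_one("div.release")
tag = block.select_one('a[href*="/tree/"]').get_text(strip=True)
rel_href = block.select_one('a[href*="/releases/tag/"]')["href"]
published = block.select_one("relative-time, time")["datetime"]
print(tag, rel_href, published)
```

If any of these lookups comes back `None`, the selectors have drifted and the parser needs updating.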