Scrape Weather Data for Any City (Open-Meteo)
Sometimes the easiest “scraping” project isn’t HTML at all — it’s turning a public API into a repeatable dataset pipeline.
Open-Meteo is a great example: you can fetch detailed hourly/daily weather forecasts (as JSON) without an API key.
In this guide, you’ll build a small but production-shaped pipeline:
- take a city name ("Mumbai", "Berlin", "Austin")
- geocode it to latitude/longitude
- call Open-Meteo’s forecast API
- add retries, timeouts, and on-disk caching
- export the result to JSON and a tidy CSV
We’ll also show how to route requests through ProxiesAPI when you want a consistent fetch layer across many jobs.
Even when you’re calling “friendly” APIs, network flakiness and rate limits show up at scale. ProxiesAPI gives you a single fetch interface you can standardize across scrapers and data jobs.
What we’re fetching
We’ll call two endpoints:
- Geocoding (Open-Meteo Geocoding API)
- used to turn a city name into coordinates
- returns multiple matches, so you can choose the best
- Forecast (Open-Meteo Forecast API)
- takes latitude+longitude
- returns time series for hourly/daily variables
We’ll keep it simple and fetch:
- hourly: temperature, precipitation, wind
- daily: max/min temp
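Concretely, the request we'll end up building looks like this (a sketch with illustrative Mumbai coordinates; the real pipeline gets latitude/longitude from the geocoder in Step 2):

```python
import urllib.parse

# Illustrative coordinates; in the pipeline these come from the geocoder
params = {
    "latitude": 19.07,
    "longitude": 72.88,
    "hourly": "temperature_2m,precipitation,wind_speed_10m",
    "daily": "temperature_2m_max,temperature_2m_min",
    "timezone": "auto",
}
url = "https://api.open-meteo.com/v1/forecast?" + urllib.parse.urlencode(params)
print(url)
```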
Setup
python3 -m venv .venv
source .venv/bin/activate
pip install requests
We’ll use the standard library for caching and CSV export.
Step 1: A fetch function (direct + ProxiesAPI)
Even for JSON APIs, you want:
- timeouts (no hanging requests)
- retries (transient failures happen)
- consistent headers
Direct fetch
import requests

TIMEOUT = (10, 30)

def fetch_json_direct(url: str, params: dict | None = None) -> dict:
    r = requests.get(
        url,
        params=params,
        timeout=TIMEOUT,
        headers={"User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)"},
    )
    r.raise_for_status()
    return r.json()
Fetch via ProxiesAPI
ProxiesAPI works for any URL. The simplest mental model is: you pass a URL, you get the response body back.
The target URL has to be URL-encoded; otherwise everything after the first "&" (here, count=1) is parsed as a parameter of the ProxiesAPI request itself. curl -G --data-urlencode handles the encoding for you:

curl -G "http://api.proxiesapi.com/" \
  --data-urlencode "key=API_KEY" \
  --data-urlencode "url=https://geocoding-api.open-meteo.com/v1/search?name=Mumbai&count=1" | head
In Python, we URL-encode the full target URL (including querystring):
import urllib.parse
import requests

PROXIESAPI_KEY = "API_KEY"
TIMEOUT = (10, 60)

def fetch_json_via_proxiesapi(url: str, params: dict | None = None) -> dict:
    if params:
        url = url + ("&" if "?" in url else "?") + urllib.parse.urlencode(params)
    api = "http://api.proxiesapi.com/"
    req_url = api + "?" + urllib.parse.urlencode({"key": PROXIESAPI_KEY, "url": url})
    r = requests.get(
        req_url,
        timeout=TIMEOUT,
        headers={"User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)"},
    )
    r.raise_for_status()
    return r.json()
Step 2: Geocode a city name
Open-Meteo’s geocoder returns a results list.
We’ll request up to 5 matches and pick the first one.
GEOCODE_URL = "https://geocoding-api.open-meteo.com/v1/search"

def geocode_city(name: str) -> dict:
    data = fetch_json_direct(GEOCODE_URL, params={
        "name": name,
        "count": 5,
        "language": "en",
        "format": "json",
    })
    results = data.get("results") or []
    if not results:
        raise ValueError(f"No geocoding results for: {name}")
    r0 = results[0]
    return {
        "name": r0.get("name"),
        "country": r0.get("country"),
        "admin1": r0.get("admin1"),
        "latitude": r0.get("latitude"),
        "longitude": r0.get("longitude"),
        "timezone": r0.get("timezone"),
    }
loc = geocode_city("Mumbai")
print(loc)
Typical output:
{'name': 'Mumbai', 'country': 'India', 'admin1': 'Maharashtra', 'latitude': 19.07283, 'longitude': 72.88261, 'timezone': 'Asia/Kolkata'}
Step 3: Fetch a forecast for that location
Now we call the forecast endpoint.
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"

def fetch_forecast(lat: float, lon: float, tz: str = "auto") -> dict:
    return fetch_json_direct(FORECAST_URL, params={
        "latitude": lat,
        "longitude": lon,
        "hourly": "temperature_2m,precipitation,wind_speed_10m",
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": tz,
    })
forecast = fetch_forecast(loc["latitude"], loc["longitude"], tz=loc["timezone"])
print("keys:", forecast.keys())
print("hourly points:", len((forecast.get("hourly") or {}).get("time") or []))
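The hourly block is a set of parallel arrays, and downstream steps (like the CSV export in Step 6) assume they're all the same length. A small sanity check, using a validate_hourly helper of our own, catches truncated responses early:

```python
def validate_hourly(hourly: dict) -> int:
    # Every hourly variable should be exactly as long as the "time" axis
    n = len(hourly.get("time") or [])
    for key, values in hourly.items():
        if key != "time" and isinstance(values, list) and len(values) != n:
            raise ValueError(f"length mismatch for {key}: {len(values)} != {n}")
    return n

# Fake response shaped like Open-Meteo's hourly section
sample = {"time": ["t0", "t1"], "temperature_2m": [20.1, 19.8], "precipitation": [0.0, 0.3]}
print(validate_hourly(sample))  # → 2
```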
Step 4: Add retries (so your pipeline doesn’t fall over)
Even with public APIs, you’ll sometimes see:
- a timeout
- a transient 5xx
- a short-lived network error
Here’s a lightweight retry wrapper with exponential backoff:
import time
import random

def with_retries(fn, *args, attempts: int = 4, **kwargs):
    last = None
    for i in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            last = e
            if i == attempts:
                break  # no point sleeping after the final attempt
            sleep = min(20, (2 ** i) + random.random())
            print(f"failed attempt {i}/{attempts}: {e}; sleeping {sleep:.1f}s")
            time.sleep(sleep)
    raise last
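You can exercise the wrapper without touching the network. This standalone demo repeats the wrapper (with the sleep capped very low so it finishes instantly) and feeds it a function that fails twice before succeeding:

```python
import time
import random

def with_retries(fn, *args, attempts: int = 4, **kwargs):
    last = None
    for i in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            last = e
            if i < attempts:
                time.sleep(min(0.01, (2 ** i) + random.random()))  # tiny cap, demo only
    raise last

calls = {"n": 0}

def flaky():
    # Simulates a transient failure: errors on the first two calls, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

result = with_retries(flaky)
print(result, "after", calls["n"], "attempts")
```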
Use it like:
loc = with_retries(geocode_city, "Berlin")
forecast = with_retries(fetch_forecast, loc["latitude"], loc["longitude"], loc["timezone"])
Step 5: Add on-disk caching (so reruns are fast)
If you run the same job repeatedly (daily dashboards, refreshes, tests), caching saves you time and reduces unnecessary calls.
We’ll cache responses keyed by a safe filename.
import json
import time
import hashlib
from pathlib import Path

CACHE_DIR = Path(".cache_openmeteo")
CACHE_DIR.mkdir(exist_ok=True)

def cache_key(url: str, params: dict | None) -> str:
    raw = url + "?" + ("" if not params else json.dumps(params, sort_keys=True))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def fetch_json_cached(url: str, params: dict | None = None, ttl_seconds: int = 3600) -> dict:
    key = cache_key(url, params)
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        age = time.time() - path.stat().st_mtime
        if age < ttl_seconds:
            return json.loads(path.read_text(encoding="utf-8"))
    data = fetch_json_direct(url, params=params)
    path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return data
Now swap fetch_json_direct for fetch_json_cached wherever you want cached responses; the call sites don't need to change otherwise.
Step 6: Export a tidy hourly CSV
Open-Meteo returns arrays (parallel lists). We’ll turn the hourly section into row-wise data.
import csv

def export_hourly_csv(forecast: dict, out_path: str = "hourly.csv"):
    hourly = forecast.get("hourly") or {}
    times = hourly.get("time") or []
    cols = {
        "temperature_2m": hourly.get("temperature_2m") or [],
        "precipitation": hourly.get("precipitation") or [],
        "wind_speed_10m": hourly.get("wind_speed_10m") or [],
    }
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=["time", *cols.keys()])
        w.writeheader()
        for i, t in enumerate(times):
            row = {"time": t}
            for k, arr in cols.items():
                row[k] = arr[i] if i < len(arr) else None
            w.writerow(row)
    print("wrote", out_path, "rows", len(times))
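The column-to-row transform is easy to sanity-check on fake data. This standalone sketch uses the same column names as above but writes to an in-memory buffer instead of a file:

```python
import csv
import io

# Fake hourly block shaped like Open-Meteo's parallel arrays
hourly = {
    "time": ["2024-01-01T00:00", "2024-01-01T01:00"],
    "temperature_2m": [21.3, 20.8],
    "precipitation": [0.0, 0.2],
}

buf = io.StringIO()
w = csv.DictWriter(buf, fieldnames=["time", "temperature_2m", "precipitation"])
w.writeheader()
for t, temp, rain in zip(hourly["time"], hourly["temperature_2m"], hourly["precipitation"]):
    w.writerow({"time": t, "temperature_2m": temp, "precipitation": rain})

lines = buf.getvalue().strip().splitlines()
print(lines)
```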
Run end-to-end:
import json

loc = geocode_city("Austin")
forecast = fetch_forecast(loc["latitude"], loc["longitude"], tz=loc["timezone"])

with open("forecast.json", "w", encoding="utf-8") as f:
    json.dump(forecast, f, ensure_ascii=False, indent=2)

export_hourly_csv(forecast, "austin_hourly.csv")
Where ProxiesAPI fits (honestly)
Open-Meteo is easy to use directly.
But if you’re building multiple data jobs (HTML scrapers + JSON APIs + enrichment steps), a consistent network layer helps:
- same timeout strategy
- same retry strategy
- one place to standardize headers
That’s where ProxiesAPI is useful: you treat every target as “a URL that returns content”, and your pipeline stays uniform.
Checklist
- geocoder returns multiple matches; you pick one deterministically
- forecast call returns hourly arrays with the same length
- retries prevent flaky failures
- caching makes reruns fast
- CSV export is row-wise (not column-wise)