Scrape Weather Data for Any City (Open-Meteo)

Sometimes the easiest “scraping” project isn’t HTML at all — it’s turning a public API into a repeatable dataset pipeline.

Open-Meteo is a great example: you can fetch detailed hourly/daily weather forecasts (as JSON) without an API key.

Open-Meteo documentation page (we’ll call the forecast API and export datasets)

In this guide, you’ll build a small but production-shaped pipeline:

  • take a city name ("Mumbai", "Berlin", "Austin")
  • geocode it to latitude/longitude
  • call Open-Meteo’s forecast API
  • add retries, timeouts, and on-disk caching
  • export the result to JSON and a tidy CSV

We’ll also show how to route requests through ProxiesAPI when you want a consistent fetch layer across many jobs.

Keep your data pipelines reliable with ProxiesAPI

Even when you’re calling “friendly” APIs, network flakiness and rate limits show up at scale. ProxiesAPI gives you a single fetch interface you can standardize across scrapers and data jobs.


What we’re fetching

We’ll call two endpoints:

  1. Geocoding (Open-Meteo Geocoding API)
  • used to turn a city name into coordinates
  • returns multiple matches, so you can choose the best
  2. Forecast (Open-Meteo Forecast API)
  • takes latitude + longitude
  • returns time series for hourly/daily variables

We’ll keep it simple and fetch:

  • hourly: temperature, precipitation, wind
  • daily: max/min temp
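Concretely, the two request URLs we'll be building look like this (assembled here with the standard library; the variable names are just ours, and the coordinates are Mumbai's from the geocoding step later):

```python
from urllib.parse import urlencode

# Geocoding: city name -> candidate coordinates
geo_url = "https://geocoding-api.open-meteo.com/v1/search?" + urlencode(
    {"name": "Mumbai", "count": 5, "language": "en", "format": "json"}
)

# Forecast: coordinates -> hourly/daily time series
fc_url = "https://api.open-meteo.com/v1/forecast?" + urlencode(
    {
        "latitude": 19.07283,
        "longitude": 72.88261,
        "hourly": "temperature_2m,precipitation,wind_speed_10m",
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": "auto",
    }
)

print(geo_url)
print(fc_url)
```

Everything after this is plumbing: making those two calls reliable and turning the responses into files.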

Setup

python3 -m venv .venv
source .venv/bin/activate
pip install requests

We’ll use the standard library for caching and CSV export.


Step 1: A fetch function (direct + ProxiesAPI)

Even for JSON APIs, you want:

  • timeouts (no hanging requests)
  • retries (transient failures happen)
  • consistent headers

Direct fetch

import requests

TIMEOUT = (10, 30)  # (connect timeout, read timeout) in seconds


def fetch_json_direct(url: str, params: dict | None = None) -> dict:
    r = requests.get(
        url,
        params=params,
        timeout=TIMEOUT,
        headers={"User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)"},
    )
    r.raise_for_status()
    return r.json()

Fetch via ProxiesAPI

ProxiesAPI works for any URL. The simplest mental model is: you pass a URL, you get the response body back.

Note that the target URL's own querystring must be URL-encoded, otherwise its `&count=1` would be read as a parameter of the ProxiesAPI call itself:

curl "http://api.proxiesapi.com/?key=API_KEY&url=https%3A%2F%2Fgeocoding-api.open-meteo.com%2Fv1%2Fsearch%3Fname%3DMumbai%26count%3D1" | head

In Python, we URL-encode the full target URL (including querystring):

import urllib.parse
import requests

PROXIESAPI_KEY = "API_KEY"
TIMEOUT = (10, 60)  # longer read timeout: the proxy adds a network hop


def fetch_json_via_proxiesapi(url: str, params: dict | None = None) -> dict:
    if params:
        url = url + ("&" if "?" in url else "?") + urllib.parse.urlencode(params)

    api = "http://api.proxiesapi.com/"
    req_url = api + "?" + urllib.parse.urlencode({"key": PROXIESAPI_KEY, "url": url})

    r = requests.get(
        req_url,
        timeout=TIMEOUT,
        headers={"User-Agent": "Mozilla/5.0 (compatible; ProxiesAPI-Guides/1.0)"},
    )
    r.raise_for_status()
    return r.json()

Step 2: Geocode a city name

Open-Meteo’s geocoder returns a results list.

We’ll request up to 5 matches and pick the first one.

GEOCODE_URL = "https://geocoding-api.open-meteo.com/v1/search"


def geocode_city(name: str) -> dict:
    data = fetch_json_direct(GEOCODE_URL, params={
        "name": name,
        "count": 5,
        "language": "en",
        "format": "json",
    })

    results = data.get("results") or []
    if not results:
        raise ValueError(f"No geocoding results for: {name}")

    r0 = results[0]
    return {
        "name": r0.get("name"),
        "country": r0.get("country"),
        "admin1": r0.get("admin1"),
        "latitude": r0.get("latitude"),
        "longitude": r0.get("longitude"),
        "timezone": r0.get("timezone"),
    }


loc = geocode_city("Mumbai")
print(loc)

Typical output:

{'name': 'Mumbai', 'country': 'India', 'admin1': 'Maharashtra', 'latitude': 19.07283, 'longitude': 72.88261, 'timezone': 'Asia/Kolkata'}

Step 3: Fetch a forecast for that location

Now we call the forecast endpoint.

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def fetch_forecast(lat: float, lon: float, tz: str = "auto") -> dict:
    return fetch_json_direct(FORECAST_URL, params={
        "latitude": lat,
        "longitude": lon,
        "hourly": "temperature_2m,precipitation,wind_speed_10m",
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": tz,
    })


forecast = fetch_forecast(loc["latitude"], loc["longitude"], tz=loc["timezone"])
print("keys:", forecast.keys())
print("hourly points:", len((forecast.get("hourly") or {}).get("time") or []))
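Before trusting the payload downstream, it's worth asserting the shape you expect: every hourly array should be exactly as long as `time`. A small validator (our helper, not something Open-Meteo provides):

```python
def check_hourly_shape(forecast: dict) -> int:
    """Return the number of hourly points; raise if any array is ragged."""
    hourly = forecast.get("hourly") or {}
    times = hourly.get("time") or []
    n = len(times)
    for key, arr in hourly.items():
        if key == "time":
            continue
        if len(arr) != n:
            raise ValueError(f"hourly[{key!r}] has {len(arr)} points, expected {n}")
    return n


# Tiny hand-made payload to show the check in action
sample = {"hourly": {"time": ["t0", "t1"], "temperature_2m": [20.1, 19.8]}}
print(check_hourly_shape(sample))
```

Call it right after `fetch_forecast` so a malformed response fails loudly instead of silently producing a short CSV.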

Step 4: Add retries (so your pipeline doesn’t fall over)

Even with public APIs, you’ll sometimes see:

  • a timeout
  • a transient 5xx
  • a short-lived network error

Here’s a lightweight retry wrapper with exponential backoff:

import time
import random


def with_retries(fn, *args, attempts: int = 4, **kwargs):
    last = None
    for i in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            last = e
            if i == attempts:
                break  # no point sleeping after the final attempt
            sleep = min(20, (2 ** i) + random.random())
            print(f"failed attempt {i}/{attempts}: {e}; sleeping {sleep:.1f}s")
            time.sleep(sleep)
    raise last

Use it like:

loc = with_retries(geocode_city, "Berlin")
forecast = with_retries(fetch_forecast, loc["latitude"], loc["longitude"], loc["timezone"])
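One refinement worth considering (our variant, not the wrapper above): a 400 or 404 will never succeed on retry, so you can give up immediately on client errors and only retry transient ones. A sketch using `requests`' exception types:

```python
import random
import time

import requests


def is_transient(e: Exception) -> bool:
    """Retry timeouts, connection errors, and 5xx responses; give up on 4xx."""
    if isinstance(e, requests.exceptions.HTTPError) and e.response is not None:
        return e.response.status_code >= 500
    return isinstance(
        e, (requests.exceptions.Timeout, requests.exceptions.ConnectionError)
    )


def with_retries(fn, *args, attempts: int = 4, **kwargs):
    for i in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as e:
            # Re-raise immediately on the last attempt or a permanent error
            if i == attempts or not is_transient(e):
                raise
            sleep = min(20, (2 ** i) + random.random())
            print(f"attempt {i}/{attempts} failed: {e}; retrying in {sleep:.1f}s")
            time.sleep(sleep)
```

The call sites stay the same; only the failure behavior changes.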

Step 5: Add on-disk caching (so reruns are fast)

If you run the same job repeatedly (daily dashboards, refreshes, tests), caching saves you time and reduces unnecessary calls.

We’ll cache responses keyed by a safe filename.

import json
import time
import hashlib
from pathlib import Path

CACHE_DIR = Path(".cache_openmeteo")
CACHE_DIR.mkdir(exist_ok=True)


def cache_key(url: str, params: dict | None) -> str:
    raw = url + "?" + ("" if not params else json.dumps(params, sort_keys=True))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def fetch_json_cached(url: str, params: dict | None = None, ttl_seconds: int = 3600) -> dict:
    key = cache_key(url, params)
    path = CACHE_DIR / f"{key}.json"

    if path.exists():
        age = (time.time() - path.stat().st_mtime)
        if age < ttl_seconds:
            return json.loads(path.read_text(encoding="utf-8"))

    data = fetch_json_direct(url, params=params)
    path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
    return data

Now just swap the fetch calls.
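One thing the TTL check above doesn't do is delete anything, so the cache directory grows forever. A small sweep (our helper, reusing the same `CACHE_DIR`) that removes entries older than their TTL:

```python
import time
from pathlib import Path

CACHE_DIR = Path(".cache_openmeteo")
CACHE_DIR.mkdir(exist_ok=True)


def prune_cache(ttl_seconds: int = 3600) -> int:
    """Delete cached responses older than ttl_seconds; return the count removed."""
    removed = 0
    now = time.time()
    for path in CACHE_DIR.glob("*.json"):
        if now - path.stat().st_mtime >= ttl_seconds:
            path.unlink()
            removed += 1
    return removed
```

Run it at the start or end of each job; stale entries would be re-fetched anyway, so deleting them costs nothing.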


Step 6: Export a tidy hourly CSV

Open-Meteo returns arrays (parallel lists). We’ll turn the hourly section into row-wise data.

import csv


def export_hourly_csv(forecast: dict, out_path: str = "hourly.csv"):
    hourly = forecast.get("hourly") or {}
    times = hourly.get("time") or []

    cols = {
        "temperature_2m": hourly.get("temperature_2m") or [],
        "precipitation": hourly.get("precipitation") or [],
        "wind_speed_10m": hourly.get("wind_speed_10m") or [],
    }

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=["time", *cols.keys()])
        w.writeheader()

        for i, t in enumerate(times):
            row = {"time": t}
            for k, arr in cols.items():
                row[k] = arr[i] if i < len(arr) else None
            w.writerow(row)

    print("wrote", out_path, "rows", len(times))
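The daily section has the same parallel-array shape, so the export generalizes. Here's a variant (ours) that takes the section name and column list as parameters, shown against a tiny hand-made payload:

```python
import csv


def export_section_csv(forecast: dict, section: str, columns: list[str], out_path: str):
    """Write one parallel-array section ('hourly' or 'daily') as row-wise CSV."""
    block = forecast.get(section) or {}
    times = block.get("time") or []

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        w = csv.DictWriter(f, fieldnames=["time", *columns])
        w.writeheader()
        for i, t in enumerate(times):
            row = {"time": t}
            for col in columns:
                arr = block.get(col) or []
                row[col] = arr[i] if i < len(arr) else None
            w.writerow(row)


# Minimal fake forecast to demonstrate the daily export
sample = {
    "daily": {
        "time": ["2024-01-01", "2024-01-02"],
        "temperature_2m_max": [31.2, 30.8],
        "temperature_2m_min": [22.4, 21.9],
    }
}
export_section_csv(
    sample, "daily", ["temperature_2m_max", "temperature_2m_min"], "daily.csv"
)
```

With this, `export_hourly_csv(forecast)` becomes just `export_section_csv(forecast, "hourly", [...], "hourly.csv")`.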

Run end-to-end:

loc = geocode_city("Austin")
forecast = fetch_forecast(loc["latitude"], loc["longitude"], tz=loc["timezone"])

import json

with open("forecast.json", "w", encoding="utf-8") as f:
    json.dump(forecast, f, ensure_ascii=False, indent=2)

export_hourly_csv(forecast, "austin_hourly.csv")

Where ProxiesAPI fits (honestly)

Open-Meteo is easy to use directly.

But if you’re building multiple data jobs (HTML scrapers + JSON APIs + enrichment steps), a consistent network layer helps:

  • same timeout strategy
  • same retry strategy
  • one place to standardize headers

That’s where ProxiesAPI is useful: you treat every target as “a URL that returns content”, and your pipeline stays uniform.


Checklist

  • geocoder returns multiple matches; you pick one deterministically
  • forecast call returns hourly arrays with the same length
  • retries prevent flaky failures
  • caching makes reruns fast
  • CSV export is row-wise (not column-wise)

Related guides

Scrape Wikipedia Article Data at Scale (Tables + Infobox + Links)
Extract structured fields from many Wikipedia pages (infobox + tables + links) with ProxiesAPI + Python, then save to CSV/JSON.

How to Scrape Apartment Listings from Apartments.com (Python + ProxiesAPI)
Scrape Apartments.com listing cards and detail-page fields with Python. Includes pagination, resilient parsing, retries, and clean JSON/CSV exports.

How to Scrape Business Reviews from Yelp (Python + ProxiesAPI)
Extract Yelp search results and business-page review snippets with Python. Includes pagination, resilient selectors, retries, and a clean JSON/CSV export.

Build a Job Board with Data from Indeed (Python scraper tutorial)
Scrape Indeed job listings (title, company, location, salary, summary) with Python (requests + BeautifulSoup), then save a clean dataset you can render as a simple job board. Includes pagination + ProxiesAPI fetch.