Web Scraping in Excel: 5 Ways to Import Website Data into Spreadsheets (Power Query + Python)
Excel is still the world's most common "data pipeline," which is why searches for "web scraping excel" stay popular year after year.
The problem: people want to "scrape a website into Excel," but websites aren't built for spreadsheets.
So you need to choose the right approach depending on:
- is the page static HTML or JS-heavy?
- do you need refresh on a schedule?
- how much fragility can you tolerate?
This guide breaks down 5 practical ways to import website data into Excel, from simplest to most robust:
- Power Query (Get & Transform)
- Excel functions for HTML tables
- Office Scripts (for automation)
- VBA (legacy but still used)
- Python helper (recommended for reliability at scale)
Excel is great for analysis, not reliability engineering. ProxiesAPI can help stabilize the network side when your imports hit rate limits or intermittent blocks.
1) Power Query (best default for most users)
If you want a repeatable import without coding, Power Query is your best first stop.
When it works best
- the data is in a real HTML table
- the page doesn’t require login
- the site isn’t blocking automated requests aggressively
How to use it
- Excel → Data tab
- Get Data → From Web
- Paste URL
- Choose a table / element
- Load to sheet
Power Query will let you:
- rename columns
- filter rows
- merge data sources
- refresh on demand
Common failure modes
- The site renders via JavaScript → Power Query sees an empty shell
- The site returns a consent page / bot check → you import garbage HTML
- The site rate-limits you → refresh fails intermittently
If you hit these, keep reading.
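One quick way to diagnose the first failure mode is to check whether a `<table>` actually exists in the raw HTML the server sends, before any JavaScript runs. Here's a minimal stdlib-only sketch (the `static_table_count` helper is an illustrative name, not a standard API):

```python
from html.parser import HTMLParser


class _TableProbe(HTMLParser):
    """Counts <table> start tags in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs) -> None:
        if tag == "table":
            self.tables += 1


def static_table_count(html: str) -> int:
    """How many tables exist in the HTML *before* any JavaScript runs."""
    probe = _TableProbe()
    probe.feed(html)
    return probe.tables


# A server-rendered table is visible; a JS-rendered one is not:
print(static_table_count("<body><table><tr><td>1</td></tr></table></body>"))  # 1
print(static_table_count("<body><div id='app'></div></body>"))                # 0
```

If this returns 0 for a page that clearly shows a table in your browser, the table is built by JavaScript, and Power Query's From Web import will see the empty shell.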
2) Built-in Excel functions (quick-and-dirty)
Excel has functions like:
- WEBSERVICE()
- FILTERXML()
These can work for very simple cases where the URL returns predictable XML/HTML.
Example (conceptual)
=WEBSERVICE("https://example.com/prices.xml")
Then parse:
=FILTERXML(A1, "//price")
Limitations
- doesn’t handle complex HTML well
- brittle
- often blocked
Use this when you control the endpoint or it’s intentionally machine-readable.
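If you're not sure whether an endpoint will survive the FILTERXML step, it can help to prototype the same XPath outside Excel first. A rough stdlib analogue in Python (`filter_xml` is an illustrative name; note that ElementTree wants the path prefixed with `.` where Excel's FILTERXML accepts `//price`):

```python
import xml.etree.ElementTree as ET


def filter_xml(xml_text: str, path: str) -> list[str]:
    """Rough Python analogue of FILTERXML: text of all nodes matching an XPath."""
    root = ET.fromstring(xml_text)
    # ElementTree's limited XPath uses ".//price" where FILTERXML uses "//price"
    return [el.text or "" for el in root.findall(path)]


sample = "<prices><price>9.99</price><price>12.50</price></prices>"
print(filter_xml(sample, ".//price"))  # ['9.99', '12.50']
```

If this works on the raw response, the WEBSERVICE + FILTERXML pair stands a decent chance; if the response isn't well-formed XML, neither will.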
3) Office Scripts (automation + governance)
If you’re in Microsoft 365, Office Scripts can automate workbook refresh and transformations.
Typical use cases:
- run a sequence: refresh queries → clean sheet → export range
- schedule via Power Automate
Office Scripts won’t magically bypass websites that require JS rendering, but it helps you productize the workflow.
4) VBA (still common in legacy orgs)
VBA can:
- call HTTP endpoints
- parse HTML (with varying levels of pain)
- fill worksheets
It’s not the future, but it’s still present.
If you’re maintaining an existing VBA-based pipeline, the main advice is:
- isolate “fetch” from “parse”
- implement retries/timeouts
- log failures to a sheet
5) Python helper (best for reliability)
If the data matters and you need scheduled refresh without babysitting, a Python helper is usually the cleanest solution.
Architecture:
- Python script fetches and parses data reliably
- script writes CSV (or pushes to a database)
- Excel reads the CSV via Power Query (stable)
This gives you the best of both worlds:
- Python handles the messy web
- Excel handles analysis and presentation
Minimal Python fetch + parse example
Let’s scrape a simple HTML table and export to CSV.
pip install requests beautifulsoup4 lxml python-dotenv
import csv
import os
import random
import time
from typing import Optional

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

load_dotenv()

PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")
TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds


def make_session() -> requests.Session:
    """Session with browser-like headers so requests aren't trivially flagged."""
    s = requests.Session()
    s.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/123.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    })
    return s


def fetch(url: str, session: requests.Session, max_attempts: int = 4) -> str:
    """Fetch with polite delays, an optional proxy, and exponential backoff."""
    last_exc: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        time.sleep(random.uniform(0.8, 2.0))  # jittered pause between attempts
        try:
            proxies = None
            if PROXY_URL:
                proxies = {"http": PROXY_URL, "https": PROXY_URL}
            r = session.get(url, timeout=TIMEOUT, proxies=proxies)
            if r.status_code in (403, 429, 500, 502, 503, 504):
                # Retryable: blocked, rate-limited, or transient server error
                time.sleep(min(10, 1.5 ** attempt) + random.uniform(0, 0.7))
                continue
            r.raise_for_status()
            return r.text
        except Exception as e:
            last_exc = e
            time.sleep(min(10, 1.5 ** attempt) + random.uniform(0, 0.7))
    raise RuntimeError(f"Failed to fetch {url} after {max_attempts} attempts") from last_exc


def parse_first_table(html: str) -> list[list[str]]:
    """Extract the first <table> as a list of rows of cell text."""
    soup = BeautifulSoup(html, "lxml")
    table = soup.select_one("table")
    if not table:
        return []
    rows = []
    for tr in table.select("tr"):
        cells = [c.get_text(" ", strip=True) for c in tr.select("th, td")]
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows: list[list[str]], path: str = "export.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerows(rows)


if __name__ == "__main__":
    url = "https://example.com/table"
    html = fetch(url, make_session())
    rows = parse_first_table(html)
    write_csv(rows, "export.csv")
    print("wrote rows", len(rows))
Pulling into Excel
In Excel:
- Data → Get Data → From Text/CSV → select export.csv
Then you can refresh the file whenever the script updates it.
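One detail worth handling: if the script rewrites export.csv while a refresh is in progress, Excel can read a half-written file. Writing to a temp file and then swapping it into place avoids that. A minimal sketch (`write_csv_atomic` is an illustrative variant of the `write_csv` function above):

```python
import csv
import os
import tempfile


def write_csv_atomic(rows: list[list[str]], path: str = "export.csv") -> None:
    """Write rows to a temp file in the target directory, then atomically
    swap it into place, so a refresh never sees a half-written CSV."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".csv.tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        os.replace(tmp, path)  # atomic on the same filesystem
    finally:
        if os.path.exists(tmp):  # only true if something failed before replace
            os.remove(tmp)
```

Caveat: on Windows, os.replace can still fail if Excel has the target file locked open, so schedule the script to run outside active editing sessions.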
Where ProxiesAPI fits for Excel workflows
Most Excel import problems are network problems:
- intermittent blocks
- rate limits
- IP reputation
Excel isn’t built to solve those.
A proxy-backed Python helper (or a small API you run yourself) is the cleanest way to make “web scraping in Excel” actually reliable.
Decision guide: which option should you choose?
- One-off import from a simple page → Power Query
- Repeatable import from stable HTML tables → Power Query + scheduled refresh
- JS-heavy site → Python/Playwright pipeline → export to CSV → Excel reads it
- You need reliability → move fetching out of Excel and into a script/service
Excel is great at what it’s great at. Don’t force it to be a web crawler.