Web Scraping in Excel: 5 Ways to Import Website Data into Spreadsheets (Power Query + Python)

Excel is still the world’s most common “data pipeline”, which is why searches for “web scraping excel” stay popular year after year.

The problem: people want to “scrape a website into Excel”, but websites aren’t built for spreadsheets.

So you need to choose the right approach depending on:

  • is the page static HTML, or rendered by JavaScript?
  • do you need refresh on a schedule?
  • how much fragility can you tolerate?

This guide breaks down 5 practical ways to import website data into Excel, from simplest to most robust:

  1. Power Query (Get & Transform)
  2. Excel functions for HTML tables
  3. Office Scripts (for automation)
  4. VBA (legacy but still used)
  5. Python helper (recommended for reliability at scale)

When Excel pulls start failing, use a proxy-backed fetch layer

Excel is great for analysis, not reliability engineering. ProxiesAPI can help stabilize the network side when your imports hit rate limits or intermittent blocks.


1) Power Query (best default for most users)

If you want a repeatable import without coding, Power Query is your best first stop.

When it works best

  • the data is in a real HTML table
  • the page doesn’t require login
  • the site isn’t blocking automated requests aggressively

How to use it

  1. Excel → Data tab
  2. Get Data → From Web
  3. Paste URL
  4. Choose a table / element
  5. Load to sheet

Power Query will let you:

  • rename columns
  • filter rows
  • merge data sources
  • refresh on demand

Common failure modes

  • The site renders via JavaScript → Power Query sees an empty shell
  • The site returns a consent page / bot check → you import garbage HTML
  • The site rate-limits you → refresh fails intermittently

If you hit these, keep reading.
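A quick way to tell which failure mode you’re in is to fetch the raw HTML yourself (plain HTTP, no JavaScript, which is exactly how Power Query sees the page) and inspect what comes back. A rough sketch; the keyword heuristics and function name are illustrative, not an official check:

```python
def diagnose_response(status_code: int, html: str) -> str:
    """Classify a raw HTTP response the way Power Query sees it:
    plain HTTP, no JavaScript execution."""
    body = html.lower()
    if status_code != 200:
        return "blocked-or-rate-limited"      # 403/429/5xx: network-side problem
    if "captcha" in body or "consent" in body:
        return "consent-or-bot-check"         # importing this gives you garbage HTML
    if "<table" not in body:
        return "likely-javascript-rendered"   # Power Query sees an empty shell
    return "static-table-present"             # Power Query should handle this

# A static table vs. a JavaScript shell:
print(diagnose_response(200, "<html><table><tr><td>1</td></tr></table></html>"))
# -> static-table-present
print(diagnose_response(200, "<html><div id='root'></div><script src='app.js'></script></html>"))
# -> likely-javascript-rendered
```

Feed it the output of any plain `requests.get` (or even the browser’s “View Source”) before deciding whether Power Query can see the data at all.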


2) Built-in Excel functions (quick-and-dirty)

Excel has functions like:

  • WEBSERVICE()
  • FILTERXML()

These can work for very simple cases where the URL returns predictable XML/HTML.

Example (conceptual)

=WEBSERVICE("https://example.com/prices.xml")

Then parse:

=FILTERXML(A1, "//price")

Limitations

  • doesn’t handle complex HTML well
  • brittle
  • often blocked

Use this when you control the endpoint or it’s intentionally machine-readable.
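For comparison, the same //price extraction can be done outside Excel with Python’s standard library. The XML below is an invented sample standing in for what WEBSERVICE() would return, not a real endpoint:

```python
import xml.etree.ElementTree as ET

# Invented sample payload (what WEBSERVICE() would have fetched).
sample = """<catalog>
  <item><name>Widget</name><price>9.99</price></item>
  <item><name>Gadget</name><price>24.50</price></item>
</catalog>"""

# findall(".//price") mirrors FILTERXML's "//price" XPath.
root = ET.fromstring(sample)
prices = [p.text for p in root.findall(".//price")]
print(prices)  # ['9.99', '24.50']
```

Once extraction lives in a script, the brittleness moves somewhere you can add retries and logging, which Excel formulas can’t do.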


3) Office Scripts (automation + governance)

If you’re in Microsoft 365, Office Scripts can automate workbook refresh and transformations.

Typical use cases:

  • run a sequence: refresh queries → clean sheet → export range
  • schedule via Power Automate

Office Scripts won’t magically bypass websites that require JS rendering, but it helps you productize the workflow.


4) VBA (still common in legacy orgs)

VBA can:

  • call HTTP endpoints
  • parse HTML (with varying levels of pain)
  • fill worksheets

It’s not the future, but it’s still present.

If you’re maintaining an existing VBA-based pipeline, the main advice is:

  • isolate “fetch” from “parse”
  • implement retries/timeouts
  • log failures to a sheet

5) Python helper (best for reliability)

If the data matters and you need scheduled refresh without babysitting, a Python helper is usually the cleanest solution.

Architecture:

  • Python script fetches and parses data reliably
  • script writes CSV (or pushes to a database)
  • Excel reads the CSV via Power Query (stable)

This gives you the best of both worlds:

  • Python handles the messy web
  • Excel handles analysis and presentation

Minimal Python fetch + parse example

Let’s scrape a simple HTML table and export to CSV.

pip install requests beautifulsoup4 lxml python-dotenv

import csv
import os
import random
import time
from typing import Optional

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

load_dotenv()

PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")
TIMEOUT = (10, 30)


def make_session() -> requests.Session:
    s = requests.Session()
    s.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/123.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    })
    return s


def fetch(url: str, session: requests.Session, max_attempts: int = 4) -> str:
    last_exc: Optional[Exception] = None

    for attempt in range(1, max_attempts + 1):
        time.sleep(random.uniform(0.8, 2.0))

        try:
            proxies = None
            if PROXY_URL:
                proxies = {"http": PROXY_URL, "https": PROXY_URL}

            r = session.get(url, timeout=TIMEOUT, proxies=proxies)

            if r.status_code in (403, 429, 500, 502, 503, 504):
                time.sleep(min(10, 1.5 ** attempt) + random.uniform(0, 0.7))
                continue

            r.raise_for_status()
            return r.text

        except Exception as e:
            last_exc = e
            time.sleep(min(10, 1.5 ** attempt) + random.uniform(0, 0.7))

    raise RuntimeError(f"Failed to fetch {url} after {max_attempts} attempts") from last_exc


def parse_first_table(html: str) -> list[list[str]]:
    soup = BeautifulSoup(html, "lxml")
    table = soup.select_one("table")
    if not table:
        return []

    rows = []
    for tr in table.select("tr"):
        cells = [c.get_text(" ", strip=True) for c in tr.select("th, td")]
        if cells:
            rows.append(cells)

    return rows


def write_csv(rows: list[list[str]], path: str = "export.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerows(rows)


if __name__ == "__main__":
    url = "https://example.com/table"
    html = fetch(url, make_session())
    rows = parse_first_table(html)
    write_csv(rows, "export.csv")
    print("wrote rows", len(rows))

Pulling into Excel

In Excel:

  • Data → Get Data → From Text/CSV → select export.csv

Then you can refresh the file whenever the script updates it.
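If you want the CSV to stay fresh without babysitting, cron or Task Scheduler are the usual answers; a minimal in-process alternative is a retry-tolerant loop. `run_export` below is a placeholder standing in for the fetch/parse/write pipeline from the script above:

```python
import time

REFRESH_SECONDS = 15 * 60  # refresh every 15 minutes

def run_export() -> None:
    # Placeholder: call fetch(), parse_first_table(), write_csv() here.
    print("refreshed export.csv")

def refresh_loop(max_runs: int, interval: float = REFRESH_SECONDS) -> int:
    """Re-run the export on an interval; one bad fetch never kills the loop.
    Returns the number of successful runs."""
    ok = 0
    for _ in range(max_runs):
        try:
            run_export()
            ok += 1
        except Exception as e:
            print("refresh failed:", e)  # log and keep going
        time.sleep(interval)
    return ok

# Short demo interval; in practice use the 15-minute default and a large max_runs.
refresh_loop(2, interval=0.1)
```

Each pass overwrites export.csv, and Excel picks up the new data on its next refresh.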


Where ProxiesAPI fits for Excel workflows

Most Excel import problems are network problems:

  • intermittent blocks
  • rate limits
  • IP reputation

Excel isn’t built to solve those.

A proxy-backed Python helper (or a small API you run yourself) is the cleanest way to make “web scraping in Excel” actually reliable.


Decision guide: which option should you choose?

  • One-off import from a simple page → Power Query
  • Repeatable import from stable HTML tables → Power Query + scheduled refresh
  • JS-heavy site → Python/Playwright pipeline → export to CSV → Excel reads it
  • You need reliability → move fetching out of Excel and into a script/service

Excel is great at what it’s great at. Don’t force it to be a web crawler.


Related guides

Playwright vs Selenium vs Puppeteer for Web Scraping (2026): Speed, Stealth, and When to Use Each
A practical 2026 decision guide comparing Playwright, Selenium, and Puppeteer for scraping: performance, detection risk, ecosystem, and real-world architecture patterns.
Minimum Advertised Price (MAP) Monitoring: Tools, Workflows, and Data Sources
A practical MAP monitoring playbook for brands and channel teams: what to track, where to collect evidence, how to handle gray areas, and how to automate alerts with scraping + APIs (without getting blocked).
Best Free Proxy Lists for Web Scraping (and Why They Usually Fail)
A practical look at free proxy lists: what’s actually usable, how to test them, and why production scraping needs a more reliable network layer.
Scraping Airbnb Listings: Pricing, Availability, and Reviews (What’s Possible in 2026)
A realistic guide to scraping Airbnb in 2026: what you can collect from search + listing pages, what’s hard, and how to reduce blocks with careful crawling and a proxy layer.