Web Scraping in Excel: 5 Ways to Import Website Data into Spreadsheets (Power Query + Python)
Excel is still the world's most common "data pipeline," which is why searches for "web scraping excel" stay popular year after year.
The problem: people want to "scrape a website into Excel," but websites aren't built for spreadsheets.
So you need to choose the right approach depending on:
- is the page static HTML or JS-heavy?
- do you need refresh on a schedule?
- how much fragility can you tolerate?
This guide breaks down 5 practical ways to import website data into Excel, from simplest to most robust:
- Power Query (Get & Transform)
- Excel functions for HTML tables
- Office Scripts (for automation)
- VBA (legacy but still used)
- Python helper (recommended for reliability at scale)
Excel is great for analysis, not reliability engineering. ProxiesAPI can help stabilize the network side when your imports hit rate limits or intermittent blocks.
1) Power Query (best default for most users)
If you want a repeatable import without coding, Power Query is your best first stop.
When it works best
- the data is in a real HTML table
- the page doesn’t require login
- the site isn’t blocking automated requests aggressively
How to use it
- Excel → Data tab
- Get Data → From Web
- Paste URL
- Choose a table / element
- Load to sheet
Power Query will let you:
- rename columns
- filter rows
- merge data sources
- refresh on demand
Common failure modes
- The site renders via JavaScript → Power Query sees an empty shell
- The site returns a consent page / bot check → you import garbage HTML
- The site rate-limits you → refresh fails intermittently
If you hit these, keep reading.
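One quick way to diagnose the first failure mode is to check whether a `<table>` actually exists in the raw HTML the server sends, before any JavaScript runs. Here's a minimal stdlib-only sketch (the `static_table_count` helper is an illustrative name, not a standard API):

```python
from html.parser import HTMLParser


class _TableProbe(HTMLParser):
    """Counts <table> start tags in raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs) -> None:
        if tag == "table":
            self.tables += 1


def static_table_count(html: str) -> int:
    """How many tables exist in the HTML *before* any JavaScript runs."""
    probe = _TableProbe()
    probe.feed(html)
    return probe.tables


# A server-rendered table is visible; a JS-rendered one is not:
print(static_table_count("<body><table><tr><td>1</td></tr></table></body>"))  # 1
print(static_table_count("<body><div id='app'></div></body>"))                # 0
```

If this returns 0 for a page that clearly shows a table in your browser, the table is built by JavaScript, and Power Query's From Web import will see the empty shell.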
2) Built-in Excel functions (quick-and-dirty)
Excel has functions like:
- WEBSERVICE()
- FILTERXML()
These can work for very simple cases where the URL returns predictable XML/HTML.
Example (conceptual)
=WEBSERVICE("https://example.com/prices.xml")
Then parse:
=FILTERXML(A1, "//price")
Limitations
- doesn’t handle complex HTML well
- brittle
- often blocked
Use this when you control the endpoint or it’s intentionally machine-readable.
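If you're not sure whether an endpoint will survive the FILTERXML step, it can help to prototype the same XPath outside Excel first. A rough stdlib analogue in Python (`filter_xml` is an illustrative name; note that ElementTree wants the path prefixed with `.` where Excel's FILTERXML accepts `//price`):

```python
import xml.etree.ElementTree as ET


def filter_xml(xml_text: str, path: str) -> list[str]:
    """Rough Python analogue of FILTERXML: text of all nodes matching an XPath."""
    root = ET.fromstring(xml_text)
    # ElementTree's limited XPath uses ".//price" where FILTERXML uses "//price"
    return [el.text or "" for el in root.findall(path)]


sample = "<prices><price>9.99</price><price>12.50</price></prices>"
print(filter_xml(sample, ".//price"))  # ['9.99', '12.50']
```

If this works on the raw response, the WEBSERVICE + FILTERXML pair stands a decent chance; if the response isn't well-formed XML, neither will.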
3) Office Scripts (automation + governance)
If you’re in Microsoft 365, Office Scripts can automate workbook refresh and transformations.
Typical use cases:
- run a sequence: refresh queries → clean sheet → export range
- schedule via Power Automate
Office Scripts won’t magically bypass websites that require JS rendering, but it helps you productize the workflow.
4) VBA (still common in legacy orgs)
VBA can:
- call HTTP endpoints
- parse HTML (with varying levels of pain)
- fill worksheets
It’s not the future, but it’s still present.
If you’re maintaining an existing VBA-based pipeline, the main advice is:
- isolate “fetch” from “parse”
- implement retries/timeouts
- log failures to a sheet
5) Python helper (best for reliability)
If the data matters and you need scheduled refresh without babysitting, a Python helper is usually the cleanest solution.
Architecture:
- Python script fetches and parses data reliably
- script writes CSV (or pushes to a database)
- Excel reads the CSV via Power Query (stable)
This gives you the best of both worlds:
- Python handles the messy web
- Excel handles analysis and presentation
Minimal Python fetch + parse example
Let’s scrape a simple HTML table and export to CSV.
pip install requests beautifulsoup4 lxml python-dotenv
import csv
import os
import random
import time
from typing import Optional

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv

load_dotenv()

PROXY_URL = os.getenv("PROXIESAPI_PROXY_URL")
TIMEOUT = (10, 30)  # (connect, read) timeouts in seconds


def make_session() -> requests.Session:
    """Session with browser-like headers so requests aren't trivially flagged."""
    s = requests.Session()
    s.headers.update({
        "User-Agent": (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/123.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    })
    return s


def fetch(url: str, session: requests.Session, max_attempts: int = 4) -> str:
    """Fetch with polite delays, an optional proxy, and exponential backoff."""
    last_exc: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        time.sleep(random.uniform(0.8, 2.0))  # jittered pause between attempts
        try:
            proxies = None
            if PROXY_URL:
                proxies = {"http": PROXY_URL, "https": PROXY_URL}
            r = session.get(url, timeout=TIMEOUT, proxies=proxies)
            if r.status_code in (403, 429, 500, 502, 503, 504):
                # Retryable: blocked, rate-limited, or transient server error
                time.sleep(min(10, 1.5 ** attempt) + random.uniform(0, 0.7))
                continue
            r.raise_for_status()
            return r.text
        except Exception as e:
            last_exc = e
            time.sleep(min(10, 1.5 ** attempt) + random.uniform(0, 0.7))
    raise RuntimeError(f"Failed to fetch {url} after {max_attempts} attempts") from last_exc


def parse_first_table(html: str) -> list[list[str]]:
    """Extract the first <table> as a list of rows of cell text."""
    soup = BeautifulSoup(html, "lxml")
    table = soup.select_one("table")
    if not table:
        return []
    rows = []
    for tr in table.select("tr"):
        cells = [c.get_text(" ", strip=True) for c in tr.select("th, td")]
        if cells:
            rows.append(cells)
    return rows


def write_csv(rows: list[list[str]], path: str = "export.csv") -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerows(rows)


if __name__ == "__main__":
    url = "https://example.com/table"
    html = fetch(url, make_session())
    rows = parse_first_table(html)
    write_csv(rows, "export.csv")
    print("wrote rows", len(rows))
Pulling into Excel
In Excel:
- Data → Get Data → From Text/CSV → select export.csv
Then you can refresh the file whenever the script updates it.
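One detail worth handling: if the script rewrites export.csv while a refresh is in progress, Excel can read a half-written file. Writing to a temp file and then swapping it into place avoids that. A minimal sketch (`write_csv_atomic` is an illustrative variant of the `write_csv` function above):

```python
import csv
import os
import tempfile


def write_csv_atomic(rows: list[list[str]], path: str = "export.csv") -> None:
    """Write rows to a temp file in the target directory, then atomically
    swap it into place, so a refresh never sees a half-written CSV."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".csv.tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
        os.replace(tmp, path)  # atomic on the same filesystem
    finally:
        if os.path.exists(tmp):  # only true if something failed before replace
            os.remove(tmp)
```

Caveat: on Windows, os.replace can still fail if Excel has the target file locked open, so schedule the script to run outside active editing sessions.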
Where ProxiesAPI fits for Excel workflows
Most Excel import problems are network problems:
- intermittent blocks
- rate limits
- IP reputation
Excel isn’t built to solve those.
A proxy-backed Python helper (or a small API you run yourself) is the cleanest way to make “web scraping in Excel” actually reliable.
Decision guide: which option should you choose?
- One-off import from a simple page → Power Query
- Repeatable import from stable HTML tables → Power Query + scheduled refresh
- JS-heavy site → Python/Playwright pipeline → export to CSV → Excel reads it
- You need reliability → move fetching out of Excel and into a script/service
Excel is great at what it’s great at. Don’t force it to be a web crawler.