Web Scraping VBA: Extract Data from Websites into Excel (Tables + Pagination)
If you searched for web scraping VBA, you’re probably in one of these situations:
- Excel is your team’s reporting tool, and the “source” is a website.
- You only need a small dataset (dozens to hundreds of rows).
- IT policy blocks installing Python/Node on the machine that runs the report.
This guide is a pragmatic playbook for extracting website data into Excel with VBA:
- fetch HTML with
WinHTTP - parse DOM with
MSHTML.HTMLDocument - extract tables into a worksheet
- paginate safely
- add retries/backoff so your macro doesn’t die on the first timeout
Excel scrapers tend to fail in the fetch step first (timeouts, 403/429s, flaky responses). ProxiesAPI gives you a proxy-backed fetch URL you can route WinHTTP through so retries and rotation stay simple.
Reality check: what VBA scraping is (and isn’t) good for
VBA scraping is good when:
- the pages are server-rendered (HTML contains the data)
- you can tolerate slower performance
- you’re scraping a small number of URLs
VBA scraping is a poor fit when:
- the site is JS-rendered (data loads after page render)
- you need concurrency and speed
- you need a robust crawler that runs unattended
If you’re in that second bucket, stop fighting Excel and build the scraper in Python—then load the CSV into Excel.
Approach options (pick the right one)
| Approach | Best for | Pros | Cons |
|---|---|---|---|
| Excel Power Query | tables/feeds you can refresh | easiest maintenance | limited for custom parsing |
| VBA + WinHTTP + MSHTML | custom parsing in Excel | flexible, deployable | fragile under blocking/timeouts |
| Python scraper + CSV → Excel | serious scraping | most reliable | requires runtime/install |
This post focuses on the middle option: VBA + WinHTTP + MSHTML.
Step 1: Enable required references
In the VBA editor:
Tools→References- Enable:
- Microsoft WinHTTP Services, version 5.1
- Microsoft HTML Object Library
If you can’t enable references (locked-down environment), you can still use late binding—but it’s more verbose.
Step 2: Fetch HTML with WinHTTP (with timeouts)
Option Explicit
Private Function HttpGet(ByVal url As String) As String
Dim http As WinHttp.WinHttpRequest
Set http = New WinHttp.WinHttpRequest
http.SetTimeouts 10000, 10000, 10000, 30000 ' resolve, connect, send, receive
http.Open "GET", url, False
http.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
http.SetRequestHeader "Accept-Language", "en-US,en;q=0.9"
http.Send
If http.Status < 200 Or http.Status >= 300 Then
Err.Raise vbObjectError + 1000, "HttpGet", "HTTP " & http.Status & " for " & url
End If
HttpGet = http.ResponseText
End Function
Add retry + backoff (mandatory for reliability)
Private Function HttpGetWithRetry(ByVal url As String, Optional ByVal maxRetries As Long = 4) As String
Dim attempt As Long
For attempt = 1 To maxRetries
On Error GoTo TryFail
HttpGetWithRetry = HttpGet(url)
Exit Function
TryFail:
If attempt = maxRetries Then
Err.Raise Err.Number, Err.Source, Err.Description
End If
Application.Wait Now + TimeSerial(0, 0, 2 ^ (attempt - 1))
Next attempt
End Function
Step 3: Parse HTML with MSHTML and extract a table
Private Function ParseHtml(ByVal html As String) As MSHTML.HTMLDocument
Dim doc As MSHTML.HTMLDocument
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = html
Set ParseHtml = doc
End Function
Now extract the first HTML table (common pattern for “prices”, “lists”, “directories”):
Private Sub WriteFirstTableToSheet(ByVal doc As MSHTML.HTMLDocument, ByVal ws As Worksheet)
Dim tbl As MSHTML.HTMLTable
Dim r As Long, c As Long
Dim row As MSHTML.HTMLTableRow
Dim cell As MSHTML.HTMLTableCell
Set tbl = doc.querySelector("table")
If tbl Is Nothing Then Err.Raise vbObjectError + 2000, "WriteFirstTableToSheet", "No <table> found"
r = 1
For Each row In tbl.Rows
c = 1
For Each cell In row.Cells
ws.Cells(r, c).Value = Trim$(cell.innerText)
c = c + 1
Next cell
r = r + 1
Next row
End Sub
Step 4: Paginate (and stop when the page ends)
Many sites use a ?page=N query parameter. A safe paginator:
- starts at page 1
- stops when the HTML repeats (or no rows extracted)
- caps pages to prevent runaway loops
Private Function BuildPageUrl(ByVal baseUrl As String, ByVal pageNum As Long) As String
If InStr(1, baseUrl, "?", vbTextCompare) > 0 Then
BuildPageUrl = baseUrl & "&page=" & CStr(pageNum)
Else
BuildPageUrl = baseUrl & "?page=" & CStr(pageNum)
End If
End Function
Public Sub ScrapePagedTable()
Dim baseUrl As String
baseUrl = "https://example.com/list"
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
ws.Cells.Clear
Dim pageNum As Long, outRow As Long
Dim lastFirstCell As String
outRow = 1
For pageNum = 1 To 10
Dim url As String
url = BuildPageUrl(baseUrl, pageNum)
Dim html As String
html = HttpGetWithRetry(url)
Dim doc As MSHTML.HTMLDocument
Set doc = ParseHtml(html)
Dim tbl As MSHTML.HTMLTable
Set tbl = doc.querySelector("table")
If tbl Is Nothing Then Exit For
Dim firstCellText As String
If tbl.Rows.Length = 0 Or tbl.Rows(0).Cells.Length = 0 Then Exit For
firstCellText = Trim$(tbl.Rows(0).Cells(0).innerText)
If firstCellText = lastFirstCell Then Exit For
lastFirstCell = firstCellText
' Write table under the previous output
ws.Cells(outRow, 1).Value = "Page " & pageNum
outRow = outRow + 1
' Simple table write (you can adapt to skip headers)
Dim row As MSHTML.HTMLTableRow, cell As MSHTML.HTMLTableCell
Dim r As Long, c As Long
r = outRow
For Each row In tbl.Rows
c = 1
For Each cell In row.Cells
ws.Cells(r, c).Value = Trim$(cell.innerText)
c = c + 1
Next cell
r = r + 1
Next row
outRow = r + 1
Next pageNum
End Sub
Proxies + VBA (how to keep it from getting blocked)
If your macro starts failing with 403/429 errors or intermittent timeouts:
- slow down (add delays)
- reduce page count per run
- rotate IPs via a proxy-backed fetch layer
ProxiesAPI routing pattern (conceptual)
ProxiesAPI provides a fetch URL like:
http://api.proxiesapi.com/?auth_key=YOUR_KEY&url=https://target.com/page
In VBA, you’d request that ProxiesAPI URL instead of the target URL:
- build the ProxiesAPI URL in a function
- keep everything else the same (retries, parsing, pagination)
That way your scraper stays maintainable: fetch → parse → write.
Practical checklist for web scraping VBA
- Keep
HttpGet()isolated and testable. - Add retry + exponential backoff.
- Parse with selectors (
querySelector) instead of brittle string splits. - Cap pagination.
- Write raw HTML to a sheet when debugging (you’ll thank yourself).
If you’re scraping more than a few hundred pages, graduate to Python—and keep Excel as the reporting layer.
Excel scrapers tend to fail in the fetch step first (timeouts, 403/429s, flaky responses). ProxiesAPI gives you a proxy-backed fetch URL you can route WinHTTP through so retries and rotation stay simple.