Web Scraping VBA: Extract Data from Websites into Excel
If you searched for web scraping VBA, you are probably not trying to build a giant distributed crawler. You are trying to get website data into Excel because that is where the business process already lives.
That is a valid use case. The mistake is pretending VBA scraping works the same way as a modern Python scraper. It does not. VBA is slower, less ergonomic, and easier to break. But for small, server-rendered targets it can still be perfectly useful.
This guide focuses on the workflows that actually hold up:
- fetch HTML with
WinHTTP - parse DOM with
MSHTML.HTMLDocument - extract tables or repeated cards into a worksheet
- paginate safely
- add retries and a stable request layer
VBA projects usually fail in the fetch step first: timeouts, 403s, 429s, and inconsistent responses. ProxiesAPI lets you add a proxy-backed request URL without rewriting the rest of your Excel macro.
When VBA is the right tool
Use VBA when:
- Excel is the final destination anyway
- the site is mostly server-rendered HTML
- the job is small enough to run serially
- installing Python or Node is difficult in your environment
Do not use VBA when:
- the data appears only after heavy client-side JavaScript
- you need concurrency or high throughput
- the scraper must run unattended across thousands of URLs
If you are in that second bucket, build the scraper elsewhere and import the output into Excel.
Your three main options
| Approach | Best for | Good parts | Tradeoffs |
|---|---|---|---|
| Power Query | clean tables and feeds | easiest to maintain | limited custom parsing |
| VBA + WinHTTP + MSHTML | custom HTML extraction in Excel | flexible, deployable | brittle under retries, blocks, and markup drift |
| Python scraper -> CSV -> Excel | larger scraping jobs | most reliable | extra runtime and tooling |
This post is about the middle option.
Step 1: Enable the references
In the VBA editor:
- Open
Tools -> References - Enable:
Microsoft WinHTTP Services, version 5.1Microsoft HTML Object Library
If your environment blocks references, you can use late binding, but early binding is easier to debug.
Step 2: Fetch HTML with WinHTTP
Option Explicit
Private Function HttpGet(ByVal targetUrl As String) As String
Dim http As WinHttp.WinHttpRequest
Set http = New WinHttp.WinHttpRequest
http.SetTimeouts 10000, 10000, 10000, 30000
http.Open "GET", targetUrl, False
http.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
http.SetRequestHeader "Accept-Language", "en-US,en;q=0.9"
http.Send
If http.Status < 200 Or http.Status >= 300 Then
Err.Raise vbObjectError + 1000, "HttpGet", "HTTP " & http.Status & " for " & targetUrl
End If
HttpGet = http.ResponseText
End Function
This function does exactly one thing: fetch HTML. Do not mix parsing and worksheet writes into the same routine unless you enjoy debugging giant macros.
Step 3: Add retry and backoff
Transient failures are normal. Treat them as part of the job.
Private Function HttpGetWithRetry(ByVal targetUrl As String, Optional ByVal maxRetries As Long = 4) As String
Dim attempt As Long
For attempt = 1 To maxRetries
On Error GoTo TryAgain
HttpGetWithRetry = HttpGet(targetUrl)
Exit Function
TryAgain:
If attempt = maxRetries Then
Err.Raise Err.Number, Err.Source, Err.Description
End If
Application.Wait Now + TimeSerial(0, 0, 2 ^ (attempt - 1))
Err.Clear
Next attempt
End Function
If you skip retries, your macro will look "randomly broken" even though the problem is just normal network noise.
Step 4: Parse the response into an HTML document
Private Function ParseHtml(ByVal html As String) As MSHTML.HTMLDocument
Dim doc As MSHTML.HTMLDocument
Set doc = New MSHTML.HTMLDocument
doc.body.innerHTML = html
Set ParseHtml = doc
End Function
From here, use CSS selectors with querySelector and querySelectorAll instead of fragile string slicing.
Step 5: Extract a table into Excel
This is the classic use case for web scraping VBA.
Private Sub WriteFirstTableToSheet(ByVal doc As MSHTML.HTMLDocument, ByVal ws As Worksheet)
Dim tbl As MSHTML.HTMLTable
Dim rowNode As MSHTML.HTMLTableRow
Dim cellNode As MSHTML.HTMLTableCell
Dim rowIndex As Long
Dim colIndex As Long
Set tbl = doc.querySelector("table")
If tbl Is Nothing Then
Err.Raise vbObjectError + 2000, "WriteFirstTableToSheet", "No table found"
End If
rowIndex = 1
For Each rowNode In tbl.Rows
colIndex = 1
For Each cellNode In rowNode.Cells
ws.Cells(rowIndex, colIndex).Value = Trim$(cellNode.innerText)
colIndex = colIndex + 1
Next cellNode
rowIndex = rowIndex + 1
Next rowNode
End Sub
If the site has more than one table, switch the selector to something more specific, such as table.pricing or div.results table.
Step 6: Handle repeated card layouts
Many sites are not table-based. They render a list of cards instead. VBA can still handle that if the HTML is server-rendered.
Private Sub WriteCardsToSheet(ByVal doc As MSHTML.HTMLDocument, ByVal ws As Worksheet)
Dim cards As IHTMLDOMChildrenCollection
Dim card As IHTMLElement
Dim rowIndex As Long
Set cards = doc.querySelectorAll("div.product-card")
rowIndex = 2
ws.Cells(1, 1).Value = "Title"
ws.Cells(1, 2).Value = "Price"
ws.Cells(1, 3).Value = "Link"
For Each card In cards
ws.Cells(rowIndex, 1).Value = card.querySelector("h2").innerText
ws.Cells(rowIndex, 2).Value = card.querySelector(".price").innerText
ws.Cells(rowIndex, 3).Value = card.querySelector("a").getAttribute("href")
rowIndex = rowIndex + 1
Next card
End Sub
The key habit is to inspect the site once, pick narrow selectors, and keep those selectors together in one parsing routine.
Step 7: Paginate without running forever
Private Function BuildPageUrl(ByVal baseUrl As String, ByVal pageNum As Long) As String
If InStr(1, baseUrl, "?", vbTextCompare) > 0 Then
BuildPageUrl = baseUrl & "&page=" & CStr(pageNum)
Else
BuildPageUrl = baseUrl & "?page=" & CStr(pageNum)
End If
End Function
Public Sub ScrapePages()
Dim baseUrl As String
Dim ws As Worksheet
Dim pageNum As Long
Dim nextRow As Long
baseUrl = "https://example.com/listings"
Set ws = ThisWorkbook.Worksheets("Sheet1")
ws.Cells.Clear
nextRow = 1
For pageNum = 1 To 5
Dim html As String
Dim doc As MSHTML.HTMLDocument
html = HttpGetWithRetry(BuildPageUrl(baseUrl, pageNum))
Set doc = ParseHtml(html)
If doc.querySelectorAll("table").Length = 0 Then Exit For
ws.Cells(nextRow, 1).Value = "Page " & pageNum
nextRow = nextRow + 1
Call WriteFirstTableToSheet(doc, ws)
nextRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row + 2
Application.Wait Now + TimeSerial(0, 0, 1)
Next pageNum
End Sub
Add a hard page cap. A runaway loop in VBA is more annoying than in Python because people usually notice it only after Excel freezes.
Where ProxiesAPI fits
If the site starts returning 403s, 429s, or inconsistent HTML, change the fetch URL rather than rebuilding the macro.
Private Function WrapWithProxiesApi(ByVal targetUrl As String, ByVal apiKey As String) As String
WrapWithProxiesApi = "https://api.proxiesapi.com/?auth_key=" & apiKey & "&url=" & targetUrl
End Function
Then call:
html = HttpGetWithRetry(WrapWithProxiesApi(BuildPageUrl(baseUrl, pageNum), "YOUR_KEY"))
That keeps your macro architecture the same:
- build URL
- fetch HTML
- parse DOM
- write cells
The proxy layer changes. The parser does not.
Practical advice
- Start with one page and one selector before adding loops.
- Save the raw HTML of a failing page when debugging.
- Do not scrape JavaScript-heavy apps with VBA unless you enjoy false starts.
- Keep worksheet writes separate from parsing logic so you can test both independently.
VBA scraping is not elegant, but it does not need to be. If the job is small, the selectors are stable, and Excel is the final destination, it is still a useful tool in 2026.
VBA projects usually fail in the fetch step first: timeouts, 403s, 429s, and inconsistent responses. ProxiesAPI lets you add a proxy-backed request URL without rewriting the rest of your Excel macro.