Web Scraping with VBA: Extract Website Data into Excel (with Proxies + Retry Logic)
If you’ve ever tried web scraping in VBA, you already know the vibe:
- It works… until it doesn’t.
- One random timeout, and your macro is stuck.
- One “Access denied” page, and your HTML parser happily extracts nonsense.
Still, scraping into Excel can be useful for:
- pulling a daily price list
- collecting a small list of URLs
- building an internal report where Excel is the “source of truth”
This guide focuses on practical VBA scraping:
- making HTTP requests from Excel (WinHTTP)
- parsing HTML (MSHTML)
- handling pagination
- adding retry + backoff
- routing requests via a ProxiesAPI proxy when sites rate-limit you
VBA is fragile under blocking and timeouts. If you must scrape from Excel, ProxiesAPI gives you a stable proxy endpoint you can route WinHTTP through to reduce 403/429 pain.
Reality check: when VBA is the wrong tool
VBA scraping is fine when:
- you need < 100 pages/day
- the site is simple HTML (no heavy JS rendering)
- you don’t need concurrency
Switch to Python (or a dedicated scraper) when:
- you need to scale (> 500 pages/day)
- you hit bot protection (Cloudflare, PerimeterX, etc.)
- the data requires JavaScript rendering
You can still export from Python to Excel—which is often the cleanest compromise.
Prerequisites (Windows Excel)
This tutorial assumes:
- Excel on Windows (WinHTTP + MSHTML work best there)
- You can open the VBA editor (Alt + F11)
References to enable
In the VBA editor:
- Tools → References
- Check: Microsoft HTML Object Library
The examples below use late binding (CreateObject), so they run even without this reference; enabling it gives you early binding and IntelliSense. If you can’t enable references at all, you can still scrape via plain string parsing, but it’s more brittle.
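As a fallback without any references, a minimal marker-based extractor can be built from InStr and Mid$ alone. A sketch (ExtractBetween is an illustrative helper; the marker strings are whatever delimits your target value in the raw HTML):

Option Explicit

' Minimal fallback: pull the text between two literal markers in raw HTML.
' Brittle by design: it breaks as soon as the page layout changes.
Private Function ExtractBetween(ByVal html As String, _
        ByVal startMarker As String, ByVal endMarker As String) As String
    Dim p1 As Long, p2 As Long
    p1 = InStr(1, html, startMarker, vbTextCompare)
    If p1 = 0 Then Exit Function          ' start marker not found
    p1 = p1 + Len(startMarker)
    p2 = InStr(p1, html, endMarker, vbTextCompare)
    If p2 = 0 Then Exit Function          ' end marker not found
    ExtractBetween = Trim$(Mid$(html, p1, p2 - p1))
End Function

For example, ExtractBetween(html, "&lt;title&gt;", "&lt;/title&gt;") pulls the page title out of the raw source.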
Step 1: HTTP GET in VBA (WinHTTP)
WinHTTP is a solid choice for HTTP requests.
Option Explicit

Private Function HttpGet(ByVal url As String, Optional ByVal timeoutMs As Long = 30000) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", url, False
    ' Resolve, connect, send, receive timeouts (milliseconds)
    http.SetTimeouts timeoutMs, timeoutMs, timeoutMs, timeoutMs
    http.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123 Safari/537.36"
    http.SetRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    http.SetRequestHeader "Accept-Language", "en-US,en;q=0.9"
    http.Send
    If http.Status < 200 Or http.Status >= 300 Then
        Err.Raise vbObjectError + 101, "HttpGet", "HTTP " & http.Status & " for " & url
    End If
    HttpGet = http.ResponseText
End Function
Step 2: Parse HTML (MSHTML)
For HTML pages, MSHTML can parse the markup and let you query DOM elements. CreateObject("htmlfile") returns an MSHTML document via late binding, so it works even without the reference enabled.
Private Function HtmlDoc(ByVal html As String) As Object
    Dim doc As Object
    Set doc = CreateObject("htmlfile")
    doc.Open
    doc.Write html
    doc.Close
    Set HtmlDoc = doc
End Function
Extracting elements
Private Function GetInnerTextBySelector(ByVal doc As Object, ByVal tagName As String, ByVal className As String) As String
    Dim el As Object
    Dim els As Object
    Set els = doc.getElementsByTagName(tagName)
    For Each el In els
        ' Pad with spaces so "price" doesn't also match "price-old"
        If InStr(1, " " & el.className & " ", " " & className & " ", vbTextCompare) > 0 Then
            GetInnerTextBySelector = Trim(el.innerText)
            Exit Function
        End If
    Next el
    GetInnerTextBySelector = ""
End Function
VBA doesn’t have great CSS selectors, so you typically:
- search by tag
- filter by className or attributes
- keep the parsing logic simple
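The same loop-and-check pattern extends to attribute filtering. A hedged sketch (GetFirstLinkByAttribute is an illustrative helper, not a standard API; note that the legacy IE DOM may return href values resolved against a base URL):

' Scan all <a> tags and return the href of the first one whose
' attribute (e.g. "data-role") matches the requested value.
Private Function GetFirstLinkByAttribute(ByVal doc As Object, _
        ByVal attrName As String, ByVal attrValue As String) As String
    Dim el As Object
    For Each el In doc.getElementsByTagName("a")
        ' The & "" guards against Null when the attribute is absent
        If StrComp(el.getAttribute(attrName) & "", attrValue, vbTextCompare) = 0 Then
            GetFirstLinkByAttribute = el.getAttribute("href") & ""
            Exit Function
        End If
    Next el
End Function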
Step 3: Add retry + exponential backoff
Retries matter more than people think. The web is flaky.
Private Function HttpGetWithRetry(ByVal url As String, Optional ByVal attempts As Integer = 5) As String
    Dim i As Integer
    Dim delayMs As Long
    delayMs = 1000
    For i = 1 To attempts
        ' On Error Resume Next avoids the "error inside error handler" trap
        ' that a GoTo label inside a loop runs into after the first failure
        On Error Resume Next
        HttpGetWithRetry = HttpGet(url, 30000)
        If Err.Number = 0 Then
            On Error GoTo 0
            Exit Function       ' success
        End If
        Err.Clear
        On Error GoTo 0
        If i = attempts Then
            Err.Raise vbObjectError + 102, "HttpGetWithRetry", "Failed after retries: " & url
        End If
        ' Backoff (1s, 2s, 4s, 8s...)
        Sleep delayMs
        delayMs = delayMs * 2
    Next i
End Function
Sleep is not built in by default. Add this at the top of a module:
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If
Note that dwMilliseconds is a DWORD, so it stays Long even in 64-bit VBA7 (only pointer-sized parameters become LongPtr).
Step 4: Pagination pattern (collect multiple pages)
Most pagination boils down to:
- find the “next page” URL
- loop until maxPages
In VBA, the easiest option is often to build URLs yourself.
Example pattern: https://example.com/list?page=1, then ?page=2, and so on.
Public Sub ScrapePaginatedList()
    Dim baseUrl As String
    Dim page As Integer
    Dim html As String
    baseUrl = "https://example.com/list?page="
    For page = 1 To 5
        html = HttpGetWithRetry(baseUrl & CStr(page), 5)
        ' parse html and write rows to sheet
        Debug.Print "page", page, "len", Len(html)
    Next page
End Sub
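If the site exposes a “next page” link instead of predictable page numbers, you can discover it from the parsed document. A sketch, assuming the link carries a class named next (adjust the class name to your target site):

' Return the href of the first anchor carrying a "next" class, or "" if none.
Private Function FindNextPageUrl(ByVal doc As Object) As String
    Dim el As Object
    For Each el In doc.getElementsByTagName("a")
        If InStr(1, " " & el.className & " ", " next ", vbTextCompare) > 0 Then
            FindNextPageUrl = el.getAttribute("href") & ""
            Exit Function
        End If
    Next el
End Function

Then loop while FindNextPageUrl returns a non-empty string, with a maxPages cap as a safety net.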
ProxiesAPI: how to use a proxy from WinHTTP
When you start seeing 403/429 or timeouts, a proxy helps.
With ProxiesAPI, you typically get a proxy endpoint like:
http://USER:PASS@gateway.proxiesapi.com:PORT
In WinHTTP you can set a proxy with SetProxy.
Private Function HttpGetViaProxy(ByVal url As String, ByVal proxy As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", url, False
    ' SetProxy expects "HOST:PORT" (no scheme, no embedded credentials);
    ' 2 = HTTPREQUEST_PROXYSETTING_PROXY
    http.SetProxy 2, proxy
    http.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123 Safari/537.36"
    http.Send
    If http.Status < 200 Or http.Status >= 300 Then
        Err.Raise vbObjectError + 103, "HttpGetViaProxy", "HTTP " & http.Status & " via proxy for " & url
    End If
    HttpGetViaProxy = http.ResponseText
End Function
Note about credentials
If your proxy uses username/password, you can embed them in the proxy URL for many clients.
For WinHTTP, credential handling varies; the simplest approach is often:
- use a ProxiesAPI gateway that supports credentialed proxies
- or use a local proxy client that manages auth
If you have a single ProxiesAPI proxy URL like http://USER:PASS@HOST:PORT, a common workaround is to configure a local forward proxy and authenticate there.
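WinHTTP does also expose a SetCredentials method that can pass proxy credentials directly (flag 1 targets the proxy rather than the origin server). A sketch, assuming your gateway accepts basic proxy auth; the host:port and user/pass values are placeholders:

Private Function HttpGetViaAuthProxy(ByVal url As String, _
        ByVal proxyHostPort As String, _
        ByVal proxyUser As String, ByVal proxyPass As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", url, False
    http.SetProxy 2, proxyHostPort              ' 2 = HTTPREQUEST_PROXYSETTING_PROXY
    http.SetCredentials proxyUser, proxyPass, 1 ' 1 = HTTPREQUEST_SETCREDENTIALS_FOR_PROXY
    http.Send
    If http.Status < 200 Or http.Status >= 300 Then
        Err.Raise vbObjectError + 104, "HttpGetViaAuthProxy", _
            "HTTP " & http.Status & " via proxy for " & url
    End If
    HttpGetViaAuthProxy = http.ResponseText
End Function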
Write results into Excel (clean pattern)
Create a sheet called Data with headers in row 1.
Private Sub WriteRow(ByVal rowIndex As Long, ByVal title As String, ByVal url As String, ByVal price As String)
    With ThisWorkbook.Worksheets("Data")
        .Cells(rowIndex, 1).Value = title
        .Cells(rowIndex, 2).Value = url
        .Cells(rowIndex, 3).Value = price
    End With
End Sub
Then scrape:
Public Sub ScrapeExampleToSheet()
    Dim html As String
    Dim doc As Object
    Dim title As String
    html = HttpGetWithRetry("https://example.com/products", 5)
    Set doc = HtmlDoc(html)
    title = GetInnerTextBySelector(doc, "h1", "product-title")
    WriteRow 2, title, "https://example.com/products", ""
End Sub
Common failure modes (and fixes)
1) You’re scraping a JS-rendered site
If the data isn’t in the HTML source, VBA can’t “see” it.
Fix:
- scrape a JSON endpoint instead
- or switch to Python + Playwright
2) Your macro hangs
Fix:
- always set timeouts
- always implement retries
3) You’re getting blocked
Fix:
- slow down
- rotate IPs (ProxiesAPI)
- keep sessions/cookies
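For the sessions/cookies part, one low-tech option is to capture the Set-Cookie response header and replay it on later requests. A rough sketch (real cookie handling is messier: expiry, paths, multiple cookies; HttpGetWithCookie is an illustrative helper):

' Capture a session cookie from one response and send it back on the next request.
' Simplified: keeps only the first name=value pair and ignores cookie attributes.
Private Function HttpGetWithCookie(ByVal url As String, ByRef cookie As String) As String
    Dim http As Object
    Dim setCookie As String
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.Open "GET", url, False
    If Len(cookie) > 0 Then http.SetRequestHeader "Cookie", cookie
    http.Send
    On Error Resume Next                  ' GetResponseHeader raises if header absent
    setCookie = http.GetResponseHeader("Set-Cookie")
    On Error GoTo 0
    If Len(setCookie) > 0 Then cookie = Split(setCookie, ";")(0)
    HttpGetWithCookie = http.ResponseText
End Function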
Comparison: VBA vs Python for scraping
| Capability | VBA (Excel) | Python |
|---|---|---|
| Requests | OK (WinHTTP) | Excellent (requests/httpx) |
| HTML parsing | OK (MSHTML) | Excellent (BeautifulSoup/lxml) |
| Concurrency | Painful | Straightforward |
| JS rendering | Basically no | Playwright/Selenium |
| Maintainability | Low | High |
| Deploy as a service | No | Yes |
If this is a one-off internal report, VBA is fine.
If you want a durable pipeline, Python wins.
Where ProxiesAPI fits (honestly)
VBA doesn’t give you many levers:
- you can’t easily randomize TLS fingerprints
- you can’t easily do smart concurrency
- error handling is primitive
So when a site blocks you, the fastest improvement is often: route your requests through a reliable proxy.
That’s where ProxiesAPI helps — not as magic, but as a stable proxy layer while you keep the rest of your macro simple.