Web Scraping with C# and HtmlAgilityPack: A Practical 2026 Tutorial

If you’re searching for “web scraping with C#”, you usually want one of two things:

  1. a real example that works end-to-end (not a fragment)
  2. a way to keep it reliable when you’re scraping more than a couple pages

This guide is a practical 2026 tutorial on building a C# scraper using:

  • HttpClient for requests
  • HtmlAgilityPack for parsing HTML
  • pagination crawling
  • exporting data to CSV and JSON
  • basic reliability patterns (timeouts, retries, respectful delays)

We’ll scrape a simple, static target so you can focus on the fundamentals.

Example target used here: https://quotes.toscrape.com/ (a public demo site for scraping practice)


1) When C# is a great choice for web scraping

C#/.NET is underrated for scraping. It gives you:

  • a fast, strongly-typed language
  • excellent HTTP tooling (HttpClient)
  • great JSON support (System.Text.Json)
  • easy concurrency (Tasks)
  • good packaging/deployment options (containers, Windows services, etc.)

The tradeoff: you need to be slightly more explicit than in Python in a few places.


2) Project setup (dotnet + HtmlAgilityPack)

Create a new console app:

dotnet new console -n QuoteScraper
cd QuoteScraper

Add HtmlAgilityPack:

dotnet add package HtmlAgilityPack

3) Fetching HTML with HttpClient (timeouts + headers)

Most sites will treat a default client differently from a browser. At minimum:

  • set a reasonable timeout
  • set a User-Agent
  • handle non-200 responses

Create a file HttpFetch.cs:

using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpFetch
{
    private static readonly HttpClient _http = new HttpClient(new HttpClientHandler
    {
        AutomaticDecompression = System.Net.DecompressionMethods.GZip |
                               System.Net.DecompressionMethods.Deflate |
                               System.Net.DecompressionMethods.Brotli
    })
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    static HttpFetch()
    {
        _http.DefaultRequestHeaders.UserAgent.ParseAdd(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
            "(KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36");
        _http.DefaultRequestHeaders.Accept.ParseAdd(
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        _http.DefaultRequestHeaders.AcceptLanguage.ParseAdd("en-US,en;q=0.9");
    }

    public static async Task<string> GetStringAsync(string url)
    {
        using var resp = await _http.GetAsync(url);
        if (!resp.IsSuccessStatusCode)
        {
            var msg = $"HTTP {(int)resp.StatusCode} {resp.ReasonPhrase} for {url}";
            throw new HttpRequestException(msg);
        }

        return await resp.Content.ReadAsStringAsync();
    }
}

4) Parsing HTML with HtmlAgilityPack

HtmlAgilityPack gives you an HTML DOM plus XPath queries.

We’ll scrape:

  • quote text
  • author
  • tags

Each quote block on quotes.toscrape.com looks like:

  • div.quote
    • span.text
    • small.author
    • div.tags a.tag

Create a file Quote.cs:

using System.Collections.Generic;

public record Quote(string Text, string Author, List<string> Tags);

Create a parser QuoteParser.cs:

using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class QuoteParser
{
    public static List<Quote> ParseQuotes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var outList = new List<Quote>();
        var quoteNodes = doc.DocumentNode.SelectNodes("//div[contains(@class,'quote')]");
        if (quoteNodes == null) return outList; // no quote blocks on this page

        foreach (var q in quoteNodes)
        {
            // InnerText does not decode HTML entities (e.g. &#39;), so run DeEntitize
            var rawText = q.SelectSingleNode(".//span[@class='text']")?.InnerText;
            var rawAuthor = q.SelectSingleNode(".//small[@class='author']")?.InnerText;
            var text = rawText == null ? null : HtmlEntity.DeEntitize(rawText).Trim();
            var author = rawAuthor == null ? null : HtmlEntity.DeEntitize(rawAuthor).Trim();

            var tags = q.SelectNodes(".//div[@class='tags']//a[contains(@class,'tag')]")
                        ?.Select(n => n.InnerText.Trim())
                        .Where(s => !string.IsNullOrWhiteSpace(s))
                        .ToList()
                        ?? new List<string>();

            if (!string.IsNullOrWhiteSpace(text) && !string.IsNullOrWhiteSpace(author))
                outList.Add(new Quote(text!, author!, tags));
        }

        return outList;
    }
}

5) Pagination: crawling multiple pages safely

Most scraping jobs become “crawl list pages → follow links → scrape details”.

For our demo site:

  • page 1: https://quotes.toscrape.com/
  • page 2: https://quotes.toscrape.com/page/2/

We’ll crawl until there’s no “Next” link.

Create Crawler.cs:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class Crawler
{
    public static async Task<List<Quote>> CrawlAllQuotesAsync()
    {
        var results = new List<Quote>();

        var pageUrl = "https://quotes.toscrape.com/";

        while (true)
        {
            var html = await HttpFetch.GetStringAsync(pageUrl);
            results.AddRange(QuoteParser.ParseQuotes(html));

            // find Next page link
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var next = doc.DocumentNode.SelectSingleNode("//li[@class='next']/a");
            if (next == null) break;

            var href = next.GetAttributeValue("href", "");
            if (string.IsNullOrWhiteSpace(href)) break;

            pageUrl = new Uri(new Uri(pageUrl), href).ToString();

            // be polite
            await Task.Delay(600);
        }

        return results;
    }
}

6) Export to CSV and JSON

Create Export.cs:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

public static class Export
{
    public static void ToJson(string path, List<Quote> quotes)
    {
        var json = JsonSerializer.Serialize(quotes, new JsonSerializerOptions
        {
            WriteIndented = true
        });

        File.WriteAllText(path, json, Encoding.UTF8);
    }

    public static void ToCsv(string path, List<Quote> quotes)
    {
        var sb = new StringBuilder();
        sb.AppendLine("text,author,tags");

        foreach (var q in quotes)
        {
            // join tags first, then escape the whole field so the CSV stays valid
            var tags = string.Join("|", q.Tags);
            sb.AppendLine($"{Escape(q.Text)},{Escape(q.Author)},{Escape(tags)}");
        }

        File.WriteAllText(path, sb.ToString(), Encoding.UTF8);
    }

    private static string Escape(string s)
    {
        if (s == null) return "";
        var needs = s.Contains(",") || s.Contains("\"") || s.Contains("\n");
        var t = s.Replace("\"", "\"\"");
        return needs ? $"\"{t}\"" : t;
    }
}

And wire it up in Program.cs:

using System;
using System.Threading.Tasks;

public class Program
{
    public static async Task Main()
    {
        var quotes = await Crawler.CrawlAllQuotesAsync();

        Console.WriteLine($"quotes: {quotes.Count}");
        Export.ToJson("quotes.json", quotes);
        Export.ToCsv("quotes.csv", quotes);

        Console.WriteLine("wrote quotes.json and quotes.csv");
    }
}

Run it:

dotnet run

You should see output like:

quotes: 100
wrote quotes.json and quotes.csv

7) Reliability upgrades you’ll want in real scrapers

Once you move beyond a demo site, add these patterns:

  1. Retry policy for transient errors (429/5xx)
  2. Rate limiting (don’t exceed a certain QPS)
  3. Caching (don’t refetch pages you already processed)
  4. Robust selectors (multiple fallbacks; don’t assume one XPath always works)
  5. Block detection (CAPTCHA pages, “unusual traffic”, login walls)

In .NET, retries are often done with Polly (a widely used resilience library). If you’re avoiding dependencies, implement a small exponential backoff loop yourself.
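
If you go the dependency-free route, here's a minimal sketch of such a backoff loop. It wraps the HttpFetch class from section 3 and treats any HttpRequestException or timeout as retryable (a real scraper would usually inspect the status code and retry only on 429/5xx); the attempt count and delays are just starting points.

using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class Retry
{
    // Fetch a URL, retrying with exponential backoff on transient failures.
    public static async Task<string> GetStringWithRetryAsync(
        string url, int maxAttempts = 4, int baseDelayMs = 500)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await HttpFetch.GetStringAsync(url);
            }
            catch (Exception ex) when (
                (ex is HttpRequestException || ex is TaskCanceledException)
                && attempt < maxAttempts)
            {
                // 500 ms, 1 s, 2 s, ... plus jitter so retries don't line up
                var delay = baseDelayMs * (1 << (attempt - 1)) + Random.Shared.Next(0, 250);
                Console.WriteLine($"attempt {attempt} failed ({ex.Message}); retrying in {delay} ms");
                await Task.Delay(delay);
            }
        }
    }
}

Swapping HttpFetch.GetStringAsync(pageUrl) for Retry.GetStringWithRetryAsync(pageUrl) in Crawler.cs is enough to pick this up.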


8) Where ProxiesAPI fits

C# is excellent for building reliable scrapers, but at scale you still hit:

  • IP-based throttling
  • geo-dependent pages
  • intermittent 403/429
  • different markup when the site suspects automation

A proxy-backed fetch layer can help.

A simple pattern is to keep your application the same, but route requests through a proxy/API at the HTTP layer.
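
As a sketch, here's what that can look like with a plain upstream HTTP proxy; the host, port, and credentials are placeholders (not real ProxiesAPI values), so substitute whatever your provider's dashboard gives you:

using System;
using System.Net;
using System.Net.Http;

public static class ProxiedHttp
{
    // Same HttpClient setup as before, but every request is routed through
    // an upstream proxy. Host, port, and credentials are placeholders.
    public static HttpClient Create()
    {
        var handler = new HttpClientHandler
        {
            Proxy = new WebProxy("http://proxy.example.com:8080")
            {
                Credentials = new NetworkCredential("USERNAME", "PASSWORD")
            },
            UseProxy = true,
            AutomaticDecompression = DecompressionMethods.GZip |
                                     DecompressionMethods.Deflate |
                                     DecompressionMethods.Brotli
        };

        return new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(30) };
    }
}

Because the proxy lives in the handler, the parsing and crawling code above doesn't have to change.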


9) Checklist: production-ready “web scraping with C#”

  • HttpClient has timeouts and decompression
  • You send a realistic User-Agent
  • Parsers handle missing nodes without crashing
  • Pagination stops correctly (no infinite loops)
  • You export clean, escaped CSV
  • You have retries/backoff and a throttle delay

When you're choosing an architecture for a new target, the deciding question is whether the site is server-rendered or JS-heavy: server-rendered pages are a good fit for HttpClient + HtmlAgilityPack as shown here, while JS-heavy pages usually call for a headless browser such as Playwright.

When your C# scraper scales, stabilize fetches with ProxiesAPI

C# is excellent for reliable scrapers — but at scale you still hit throttling, geo-variance, and intermittent blocks. ProxiesAPI helps keep your network layer stable so your parsers see consistent HTML.
