Web Scraping with Go and Colly in 2026: The Complete Guide

Build blazing-fast web scrapers with Go's most popular scraping framework. From basic collectors to distributed scraping at scale.


Go is one of the fastest languages for web scraping. Its built-in concurrency model (goroutines), compiled speed, and tiny memory footprint make it perfect for scraping millions of pages. And Colly, Go's premier scraping framework, makes it elegant too.

In this guide, you'll learn everything about web scraping with Go and Colly in 2026: from basic collectors to distributed, production-grade scrapers that process thousands of pages per minute.

Why Go for Web Scraping?

Go brings unique advantages to web scraping that Python and Node.js can't match:

- Concurrency by default: goroutines handle thousands of simultaneous requests with tiny per-request overhead.
- Compiled speed: parsing and crawling typically run 5-10x faster than equivalent Python code.
- Low memory footprint: a Colly scraper commonly runs in 5-20 MB of RAM.
- Simple deployment: scrapers compile to a single static binary with no runtime dependencies.

Installation & Project Setup

Make sure you have Go 1.21+ installed, then create a new project:

# Create project directory
mkdir my-scraper && cd my-scraper

# Initialize Go module
go mod init my-scraper

# Install Colly v2
go get github.com/gocolly/colly/v2

# Optional: install goquery for advanced HTML parsing
go get github.com/PuerkitoBio/goquery

Your project structure:

my-scraper/
├── go.mod
├── go.sum
└── main.go
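After the go get commands, go.mod should look roughly like this (the module path comes from go mod init; the version numbers here are placeholders, and your resolved versions will differ):

```text
module my-scraper

go 1.21

require (
	github.com/PuerkitoBio/goquery v1.x.y
	github.com/gocolly/colly/v2 v2.x.y
)
```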

Your First Colly Scraper

Colly uses a collector pattern with callbacks. You create a collector, attach callbacks for different events, then start scraping:

package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    // Create a new collector
    c := colly.NewCollector(
        // Restrict domains to scrape
        colly.AllowedDomains("quotes.toscrape.com"),
    )

    // Called when an HTML element matching the selector is found
    c.OnHTML(".quote", func(e *colly.HTMLElement) {
        quote := e.ChildText(".text")
        author := e.ChildText(".author")
        fmt.Printf("\"%s\" - %s\n", quote, author)
    })

    // Called before a request is made
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting:", r.URL.String())
    })

    // Called if an error occurs during the request
    c.OnError(func(r *colly.Response, err error) {
        log.Printf("Error on %s: %v", r.Request.URL, err)
    })

    // Start scraping
    err := c.Visit("https://quotes.toscrape.com/")
    if err != nil {
        log.Fatal(err)
    }
}

Run it:

go run main.go

That's it: a working scraper in ~30 lines. Colly handles HTTP requests, HTML parsing, and error handling automatically.

CSS Selectors & Data Extraction

Colly uses goquery under the hood, giving you jQuery-style CSS selectors:

// Extract text content
title := e.ChildText("h1")

// Extract an attribute
href := e.ChildAttr("a", "href")
imgSrc := e.ChildAttr("img", "src")

// Extract multiple items
e.ForEach("li.item", func(i int, el *colly.HTMLElement) {
    name := el.ChildText(".name")
    price := el.ChildText(".price")
    link := el.ChildAttr("a", "href")
    fmt.Printf("%d. %s โ€” %s (%s)\n", i+1, name, price, link)
})

// Get the raw HTML of an element
html, _ := e.DOM.Html()

// Use goquery directly for complex selections
// (requires importing github.com/PuerkitoBio/goquery)
e.DOM.Find("table tr").Each(func(i int, s *goquery.Selection) {
    cells := s.Find("td")
    col1 := cells.Eq(0).Text()
    col2 := cells.Eq(1).Text()
    fmt.Printf("Row %d: %s | %s\n", i, col1, col2)
})

Common CSS Selectors

| Selector | Matches | Example |
|---|---|---|
| div | Tag name | All <div> elements |
| .class | Class | .product-card |
| #id | ID | #main-content |
| div.class | Tag + class | span.price |
| div > p | Direct child | article > h2 |
| div p | Descendant | .card .title |
| [attr=val] | Attribute | [data-type="premium"] |
| a[href^="https"] | Starts with | External links |
| :nth-child(n) | Position | tr:nth-child(2) |

Colly's Callback System

Colly's power comes from its callback system. Each callback fires at a different stage of the scraping lifecycle:

c := colly.NewCollector()

// 1. Before making a request
c.OnRequest(func(r *colly.Request) {
    r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
    fmt.Println("→", r.URL)
})

// 2. When the server responds (raw response)
c.OnResponse(func(r *colly.Response) {
    fmt.Printf("Status: %d, Size: %d bytes\n", r.StatusCode, len(r.Body))
})

// 3. When an HTML element is found (most used)
c.OnHTML("h1", func(e *colly.HTMLElement) {
    fmt.Println("Title:", e.Text)
})

// 4. When an XML element is found (for RSS, sitemap, etc.)
c.OnXML("//item/title", func(e *colly.XMLElement) {
    fmt.Println("Feed item:", e.Text)
})

// 5. When scraping is finished for a page
c.OnScraped(func(r *colly.Response) {
    fmt.Println("✓ Done:", r.Request.URL)
})

// 6. When an error occurs
c.OnError(func(r *colly.Response, err error) {
    fmt.Printf("✗ Error %d on %s: %v\n", r.StatusCode, r.Request.URL, err)
})

Callbacks fire in order: OnRequest → OnResponse → OnHTML/OnXML → OnScraped. If a request fails, OnError is called instead of OnResponse.

Following Links & Pagination

Colly makes following links trivial: just call Visit() inside an OnHTML callback:

c := colly.NewCollector(
    colly.AllowedDomains("quotes.toscrape.com"),
    colly.MaxDepth(3), // Limit crawl depth
)

// Scrape quotes from each page
c.OnHTML(".quote", func(e *colly.HTMLElement) {
    fmt.Printf("\"%s\" - %s\n", e.ChildText(".text"), e.ChildText(".author"))
})

// Follow pagination links
c.OnHTML("li.next a[href]", func(e *colly.HTMLElement) {
    nextPage := e.Attr("href")
    fmt.Println("Following →", nextPage)
    e.Request.Visit(nextPage) // Relative URLs resolved automatically
})

c.Visit("https://quotes.toscrape.com/")

Colly automatically deduplicates URLs: it won't visit the same page twice (unless you enable colly.AllowURLRevisit()).
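That deduplication is essentially a thread-safe set of visited URLs consulted before each request. A stdlib-only sketch of the idea (illustrative; not Colly's actual implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// visitedSet is a minimal thread-safe set of URLs, the core of any
// duplicate-request filter (a sketch, not Colly's implementation).
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newVisitedSet() *visitedSet {
	return &visitedSet{seen: make(map[string]struct{})}
}

// FirstVisit marks url as seen and reports whether this was the
// first time it was requested.
func (v *visitedSet) FirstVisit(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, ok := v.seen[url]; ok {
		return false
	}
	v.seen[url] = struct{}{}
	return true
}

func main() {
	v := newVisitedSet()
	fmt.Println(v.FirstVisit("https://quotes.toscrape.com/page/2/")) // true
	fmt.Println(v.FirstVisit("https://quotes.toscrape.com/page/2/")) // false: already visited
}
```

Because the set is checked before every request, crawl loops like the pagination example above terminate on their own.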

Crawling an Entire Site

c := colly.NewCollector(
    colly.AllowedDomains("example.com"),
    colly.MaxDepth(5),
)

// Follow all internal links
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
    link := e.Attr("href")
    e.Request.Visit(link)
})

// Process each page
c.OnRequest(func(r *colly.Request) {
    fmt.Println("Crawling:", r.URL)
})

c.Visit("https://example.com/")

Concurrency & Rate Limiting

This is where Go and Colly truly shine. Colly has built-in concurrency and rate limiting; no external libraries needed:

c := colly.NewCollector(
    colly.Async(true), // Enable asynchronous scraping
)

// Rate limiting rules
c.Limit(&colly.LimitRule{
    // Match all domains
    DomainGlob:  "*",
    // Max 5 concurrent requests per domain
    Parallelism: 5,
    // Wait 1 second between requests
    Delay:       1 * time.Second,
    // Add random delay up to 500ms
    RandomDelay: 500 * time.Millisecond,
})

c.OnHTML(".product", func(e *colly.HTMLElement) {
    fmt.Println(e.ChildText(".name"))
})

// Queue up multiple URLs
urls := []string{
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3",
    // ... hundreds more
}

for _, url := range urls {
    c.Visit(url)
}

// Wait for all async requests to finish
c.Wait()

Per-Domain Rate Limiting

// Different rules for different domains
c.Limit(&colly.LimitRule{
    DomainGlob:  "*.fast-site.com",
    Parallelism: 10,
    Delay:       200 * time.Millisecond,
})

c.Limit(&colly.LimitRule{
    DomainGlob:  "*.slow-site.com",
    Parallelism: 2,
    Delay:       2 * time.Second,
})

Headers, Cookies & Sessions

c := colly.NewCollector()

// Set default headers
c.OnRequest(func(r *colly.Request) {
    r.Headers.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
    r.Headers.Set("Accept", "text/html,application/xhtml+xml")
    r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
    r.Headers.Set("Referer", "https://www.google.com/")
})

// Cookies are handled automatically per-domain
// To set custom cookies:
c.SetCookies("https://example.com", []*http.Cookie{
    {Name: "session_id", Value: "abc123"},
    {Name: "consent", Value: "accepted"},
})

Login & Session Persistence

c := colly.NewCollector()

// Step 1: Login
c.OnHTML("form[action='/login']", func(e *colly.HTMLElement) {
    // Extract CSRF token
    csrfToken := e.ChildAttr("input[name='csrf']", "value")

    // Submit login form
    e.Request.Post("https://example.com/login", map[string]string{
        "username": "myuser",
        "password": "mypass",
        "csrf":     csrfToken,
    })
})

// Step 2: After login, cookies are stored automatically
// Subsequent requests will include the session cookie
c.OnResponse(func(r *colly.Response) {
    if r.Request.URL.Path == "/login" {
        // Login successful, now scrape protected pages
        c.Visit("https://example.com/dashboard")
    }
})

c.Visit("https://example.com/login")

Proxy Rotation

Colly supports proxy rotation out of the box with round-robin or custom proxy switching:

import "github.com/gocolly/colly/v2/proxy"

c := colly.NewCollector()

// Round-robin proxy rotation
proxySwitcher, err := proxy.RoundRobinProxySwitcher(
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
    "socks5://proxy4.example.com:1080",
)
if err != nil {
    log.Fatal(err)
}

c.SetProxyFunc(proxySwitcher)

Custom Proxy Selection

// Custom proxy function โ€” rotate based on request count
var requestCount int32

c.SetProxyFunc(func(r *http.Request) (*url.URL, error) {
    proxies := []string{
        "http://us-proxy.example.com:8080",
        "http://eu-proxy.example.com:8080",
        "http://asia-proxy.example.com:8080",
    }
    idx := atomic.AddInt32(&requestCount, 1) % int32(len(proxies))
    return url.Parse(proxies[idx])
})

Caching & Persistent Storage

Colly can cache responses to disk, reducing redundant requests during development:

c := colly.NewCollector(
    colly.CacheDir("./cache"), // Cache responses to disk
)

// Responses are cached by URL โ€” revisiting returns cached version
// Delete ./cache to force re-fetching

Exporting Data

import (
    "encoding/csv"
    "encoding/json"
    "os"
)

type Product struct {
    Name  string `json:"name"`
    Price string `json:"price"`
    URL   string `json:"url"`
}

var products []Product

c.OnHTML(".product-card", func(e *colly.HTMLElement) {
    products = append(products, Product{
        Name:  e.ChildText(".name"),
        Price: e.ChildText(".price"),
        URL:   e.Request.AbsoluteURL(e.ChildAttr("a", "href")),
    })
})

c.OnScraped(func(r *colly.Response) {
    // Export to JSON
    jsonFile, _ := os.Create("products.json")
    defer jsonFile.Close()
    json.NewEncoder(jsonFile).Encode(products)

    // Export to CSV
    csvFile, _ := os.Create("products.csv")
    defer csvFile.Close()
    w := csv.NewWriter(csvFile)
    w.Write([]string{"Name", "Price", "URL"})
    for _, p := range products {
        w.Write([]string{p.Name, p.Price, p.URL})
    }
    w.Flush()
})

Handling JavaScript-Rendered Pages

Colly is an HTTP-based scraper; it doesn't execute JavaScript. For JS-heavy sites, you have three options:

Option 1: Find the Hidden API

Many "JavaScript-rendered" sites actually load data from JSON APIs. Check the browser's Network tab:

// If the site loads data from an API endpoint
c.OnResponse(func(r *colly.Response) {
    var data struct {
        Products []struct {
            Name  string `json:"name"`
            Price float64 `json:"price"`
        } `json:"products"`
    }
    json.Unmarshal(r.Body, &data)
    for _, p := range data.Products {
        fmt.Printf("%s: $%.2f\n", p.Name, p.Price)
    }
})

c.Visit("https://api.example.com/products?page=1")

Option 2: Use chromedp (Headless Chrome in Go)

import "github.com/chromedp/chromedp"

ctx, cancel := chromedp.NewContext(context.Background())
defer cancel()

var htmlContent string
err := chromedp.Run(ctx,
    chromedp.Navigate("https://spa-site.com/products"),
    chromedp.WaitVisible(".product-list"),
    chromedp.OuterHTML("html", &htmlContent),
)

// Now parse htmlContent with goquery or feed it to Colly

Option 3: Use the Mantis API

Skip the infrastructure entirely. Mantis handles JavaScript rendering, proxy rotation, and anti-bot evasion:

// One API call replaces hundreds of lines of scraping code
resp, err := http.Get("https://api.mantisapi.com/v1/scrape?url=https://spa-site.com/products&render_js=true&api_key=YOUR_KEY")

var result struct {
    HTML     string            `json:"html"`
    Metadata map[string]string `json:"metadata"`
}
json.NewDecoder(resp.Body).Decode(&result)

🦞 Need Data at Scale? Skip the Infrastructure

Mantis API handles JavaScript rendering, proxy rotation, and anti-bot evasion, so you can focus on the data, not the scraping mechanics.

Get 100 Free API Calls →

Production-Ready Scraper

Here's a complete, production-grade scraper with all best practices:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "os"
    "strings"
    "sync"
    "time"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/proxy"
)

type ScrapedItem struct {
    Title       string   `json:"title"`
    URL         string   `json:"url"`
    Price       string   `json:"price,omitempty"`
    Description string   `json:"description,omitempty"`
    Tags        []string `json:"tags,omitempty"`
    ScrapedAt   string   `json:"scraped_at"`
}

func main() {
    var (
        items []ScrapedItem
        mu    sync.Mutex
        stats struct {
            Pages    int
            Items    int
            Errors   int
        }
    )

    c := colly.NewCollector(
        colly.AllowedDomains("example.com", "www.example.com"),
        colly.MaxDepth(5),
        colly.Async(true),
        colly.CacheDir("./cache"),
    )

    // Rate limiting
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 4,
        Delay:       1 * time.Second,
        RandomDelay: 500 * time.Millisecond,
    })

    // Rotate User-Agents
    userAgents := []string{
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    }
    // Guard the shared index with the mutex: with Async(true),
    // OnRequest callbacks run concurrently, so an unsynchronized
    // increment would be a data race.
    var uaIdx int

    c.OnRequest(func(r *colly.Request) {
        mu.Lock()
        idx := uaIdx % len(userAgents)
        uaIdx++
        mu.Unlock()
        r.Headers.Set("User-Agent", userAgents[idx])
        r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
    })

    // Extract items
    c.OnHTML(".product-card", func(e *colly.HTMLElement) {
        item := ScrapedItem{
            Title:       strings.TrimSpace(e.ChildText("h2")),
            URL:         e.Request.AbsoluteURL(e.ChildAttr("a", "href")),
            Price:       strings.TrimSpace(e.ChildText(".price")),
            Description: strings.TrimSpace(e.ChildText(".description")),
            ScrapedAt:   time.Now().UTC().Format(time.RFC3339),
        }
        e.ForEach(".tag", func(_ int, el *colly.HTMLElement) {
            item.Tags = append(item.Tags, el.Text)
        })

        mu.Lock()
        items = append(items, item)
        stats.Items++
        mu.Unlock()
    })

    // Follow pagination
    c.OnHTML("a.next-page", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })

    c.OnResponse(func(r *colly.Response) {
        mu.Lock()
        stats.Pages++
        mu.Unlock()
    })

    c.OnError(func(r *colly.Response, err error) {
        mu.Lock()
        stats.Errors++
        mu.Unlock()
        log.Printf("Error [%d] %s: %v", r.StatusCode, r.Request.URL, err)
    })

    // Start scraping
    start := time.Now()
    c.Visit("https://example.com/products")
    c.Wait()

    // Export results
    f, err := os.Create("results.json")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    enc := json.NewEncoder(f)
    enc.SetIndent("", "  ")
    enc.Encode(items)

    elapsed := time.Since(start)
    fmt.Printf("\n=== Scraping Complete ===\n")
    fmt.Printf("Pages:    %d\n", stats.Pages)
    fmt.Printf("Items:    %d\n", stats.Items)
    fmt.Printf("Errors:   %d\n", stats.Errors)
    fmt.Printf("Duration: %s\n", elapsed.Round(time.Millisecond))
    fmt.Printf("Speed:    %.1f pages/sec\n", float64(stats.Pages)/elapsed.Seconds())
}

Colly vs Other Scraping Tools

| Feature | Go + Colly | Python + Scrapy | Node.js + Puppeteer | Mantis API |
|---|---|---|---|---|
| Speed | ⚡ Fastest | 🔄 Fast | 🐌 Slow (browser) | ⚡ Fast |
| Concurrency | Built-in (goroutines) | Built-in (Twisted) | Limited | Handled by API |
| Memory | ~5-20 MB | ~50-200 MB | ~200-500 MB | N/A |
| JavaScript | ❌ (needs chromedp) | ❌ (needs Splash) | ✅ Native | ✅ Built-in |
| Anti-bot | Manual | Manual | Stealth plugin | ✅ Built-in |
| Proxy rotation | Built-in | Manual/middleware | Manual | ✅ Built-in |
| Deployment | Single binary | Python env + deps | Node + Chromium | HTTP calls |
| Learning curve | Moderate | Moderate | Easy | Easiest |
| Best for | High-performance crawls | Large-scale projects | JS-heavy sites | Any site, any scale |

Choose Colly when: You need raw speed, efficient resource usage, easy deployment, and are comfortable with Go. Perfect for infrastructure teams, microservices, and high-volume data pipelines.

The API Alternative: From Colly to Production in Minutes

Building a production scraper with Colly means managing proxies, rotating user agents, handling CAPTCHAs, and fighting anti-bot systems. Or you can make one API call:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "net/url"
)

func main() {
    // One API call โ€” Mantis handles the rest
    target := url.QueryEscape("https://example.com/products")
    apiURL := fmt.Sprintf("https://api.mantisapi.com/v1/scrape?url=%s&render_js=true", target)

    req, _ := http.NewRequest("GET", apiURL, nil)
    req.Header.Set("X-API-Key", "your-api-key")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)

    var result map[string]interface{}
    json.Unmarshal(body, &result)
    fmt.Println(result["html"])
}

What You Get vs What You Build

| You Build (DIY) | Mantis Handles |
|---|---|
| Proxy infrastructure ($200-1000/mo) | ✅ Built-in proxy rotation |
| Anti-bot evasion code | ✅ Automatic anti-detection |
| JavaScript rendering (chromedp setup) | ✅ Full JS rendering |
| CAPTCHA solving integration | ✅ Handled automatically |
| User-agent rotation | ✅ Realistic browser headers |
| Error handling & retries | ✅ Built-in reliability |
| Maintenance & monitoring | ✅ Managed infrastructure |

🦞 From Go to Data in One API Call

Stop building scraping infrastructure. Mantis gives you clean data from any website, with Go, Python, Node.js, or any language that speaks HTTP.

Start Free: 100 Calls/Month →

Frequently Asked Questions

Is Go good for web scraping?

Yes: Go is one of the best languages for web scraping. Its goroutines handle thousands of concurrent requests with minimal memory, compiled speed makes HTML parsing 5-10x faster than Python, and single-binary deployment makes it trivial to run scrapers anywhere. Colly adds an elegant API on top.

What is Colly in Go?

Colly is Go's most popular web scraping framework. It provides a callback-based API for making HTTP requests, parsing HTML with CSS selectors, following links, managing cookies, rate limiting, proxy rotation, and caching. Think of it as "Scrapy for Go": production-ready and battle-tested.

How fast is Colly compared to Python scraping?

Colly in Go is typically 5-10x faster than Python scraping libraries. In benchmarks, Colly processes 1,000+ pages per minute with 4 concurrent workers, while Scrapy manages 200-400. Go's compiled nature and lightweight goroutines give it a significant edge for high-volume scraping.

Can Colly handle JavaScript-rendered pages?

Colly alone doesn't execute JavaScript. For JS-heavy sites, use chromedp (Go's headless Chrome library) alongside Colly, find hidden API endpoints that serve JSON data, or use a scraping API like Mantis that handles JavaScript rendering automatically.

How does Colly handle rate limiting?

Colly has built-in rate limiting via LimitRule: set Delay (wait between requests), RandomDelay (jitter), Parallelism (concurrent requests per domain), and DomainGlob (pattern matching). This makes it easy to scrape responsibly.

Should I use Colly or a web scraping API?

Use Colly when you need maximum control and performance, and can manage proxy infrastructure. Use a web scraping API when you want to skip infrastructure management, need anti-bot evasion, or want JavaScript rendering without running headless browsers.
