Build blazing-fast web scrapers with Go's most popular scraping framework. From basic collectors to distributed scraping at scale.
Go is one of the fastest languages for web scraping. Its built-in concurrency model (goroutines), compiled speed, and tiny memory footprint make it perfect for scraping millions of pages. And Colly, Go's premier scraping framework, makes it elegant too.
In this guide, you'll learn everything about web scraping with Go and Colly in 2026: from basic collectors to distributed, production-grade scrapers that process thousands of pages per minute.
Go brings unique advantages to web scraping that Python and Node.js can't match:
- go build produces one binary with zero dependencies. Copy it anywhere, it runs.
- net/http and encoding/json are production-grade out of the box.

Make sure you have Go 1.21+ installed, then create a new project:
# Create project directory
mkdir my-scraper && cd my-scraper
# Initialize Go module
go mod init my-scraper
# Install Colly v2
go get github.com/gocolly/colly/v2
# Optional: install goquery for advanced HTML parsing
go get github.com/PuerkitoBio/goquery
Your project structure:
my-scraper/
├── go.mod
├── go.sum
└── main.go
Colly uses a collector pattern with callbacks. You create a collector, attach callbacks for different events, then start scraping:
package main
import (
"fmt"
"log"
"github.com/gocolly/colly/v2"
)
func main() {
// Create a new collector
c := colly.NewCollector(
// Restrict domains to scrape
colly.AllowedDomains("quotes.toscrape.com"),
)
// Called when an HTML element matching the selector is found
c.OnHTML(".quote", func(e *colly.HTMLElement) {
quote := e.ChildText(".text")
author := e.ChildText(".author")
fmt.Printf("\"%s\" - %s\n", quote, author)
})
// Called before a request is made
c.OnRequest(func(r *colly.Request) {
fmt.Println("Visiting:", r.URL.String())
})
// Called if an error occurs during the request
c.OnError(func(r *colly.Response, err error) {
log.Printf("Error on %s: %v", r.Request.URL, err)
})
// Start scraping
err := c.Visit("https://quotes.toscrape.com/")
if err != nil {
log.Fatal(err)
}
}
Run it:
go run main.go
That's it: a working scraper in ~30 lines. Colly handles HTTP requests, HTML parsing, and error handling automatically.
Colly uses goquery under the hood, giving you jQuery-style CSS selectors:
// Extract text content
title := e.ChildText("h1")
// Extract an attribute
href := e.ChildAttr("a", "href")
imgSrc := e.ChildAttr("img", "src")
// Extract multiple items
e.ForEach("li.item", func(i int, el *colly.HTMLElement) {
name := el.ChildText(".name")
price := el.ChildText(".price")
link := el.ChildAttr("a", "href")
fmt.Printf("%d. %s - %s (%s)\n", i+1, name, price, link)
})
// Get the raw HTML of an element
html, _ := e.DOM.Html()
// Use goquery directly for complex selections
e.DOM.Find("table tr").Each(func(i int, s *goquery.Selection) {
cells := s.Find("td")
col1 := cells.Eq(0).Text()
col2 := cells.Eq(1).Text()
fmt.Printf("Row %d: %s | %s\n", i, col1, col2)
})
| Selector | Matches | Example |
|---|---|---|
| div | Tag name | All <div> elements |
| .class | Class | .product-card |
| #id | ID | #main-content |
| div.class | Tag + class | span.price |
| div > p | Direct child | article > h2 |
| div p | Descendant | .card .title |
| [attr=val] | Attribute | [data-type="premium"] |
| a[href^="https"] | Starts with | External links |
| :nth-child(n) | Position | tr:nth-child(2) |
Colly's power comes from its callback system. Each callback fires at a different stage of the scraping lifecycle:
c := colly.NewCollector()
// 1. Before making a request
c.OnRequest(func(r *colly.Request) {
r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
fmt.Println("->", r.URL)
})
// 2. When the server responds (raw response)
c.OnResponse(func(r *colly.Response) {
fmt.Printf("Status: %d, Size: %d bytes\n", r.StatusCode, len(r.Body))
})
// 3. When an HTML element is found (most used)
c.OnHTML("h1", func(e *colly.HTMLElement) {
fmt.Println("Title:", e.Text)
})
// 4. When an XML element is found (for RSS, sitemap, etc.)
c.OnXML("//item/title", func(e *colly.XMLElement) {
fmt.Println("Feed item:", e.Text)
})
// 5. When scraping is finished for a page
c.OnScraped(func(r *colly.Response) {
fmt.Println("Done:", r.Request.URL)
})
// 6. When an error occurs
c.OnError(func(r *colly.Response, err error) {
fmt.Printf("Error %d on %s: %v\n", r.StatusCode, r.Request.URL, err)
})
Callbacks are called in order: OnRequest → OnResponse → OnHTML/OnXML → OnScraped. If an error occurs, OnError is called instead of OnResponse.
Colly makes following links trivial โ just call Visit() inside an OnHTML callback:
c := colly.NewCollector(
colly.AllowedDomains("quotes.toscrape.com"),
colly.MaxDepth(3), // Limit crawl depth
)
// Scrape quotes from each page
c.OnHTML(".quote", func(e *colly.HTMLElement) {
fmt.Printf("\"%s\" - %s\n", e.ChildText(".text"), e.ChildText(".author"))
})
// Follow pagination links
c.OnHTML("li.next a[href]", func(e *colly.HTMLElement) {
nextPage := e.Attr("href")
fmt.Println("Following ->", nextPage)
e.Request.Visit(nextPage) // Relative URLs resolved automatically
})
c.Visit("https://quotes.toscrape.com/")
Colly automatically deduplicates URLs: it won't visit the same page twice (unless you set colly.AllowURLRevisit()).
c := colly.NewCollector(
colly.AllowedDomains("example.com"),
colly.MaxDepth(5),
)
// Follow all internal links
c.OnHTML("a[href]", func(e *colly.HTMLElement) {
link := e.Attr("href")
e.Request.Visit(link)
})
// Process each page
c.OnRequest(func(r *colly.Request) {
fmt.Println("Crawling:", r.URL)
})
c.Visit("https://example.com/")
This is where Go and Colly truly shine. Colly has built-in concurrency and rate limiting, with no external libraries needed:
c := colly.NewCollector(
colly.Async(true), // Enable asynchronous scraping
)
// Rate limiting rules
c.Limit(&colly.LimitRule{
// Match all domains
DomainGlob: "*",
// Max 5 concurrent requests per domain
Parallelism: 5,
// Wait 1 second between requests
Delay: 1 * time.Second,
// Add random delay up to 500ms
RandomDelay: 500 * time.Millisecond,
})
c.OnHTML(".product", func(e *colly.HTMLElement) {
fmt.Println(e.ChildText(".name"))
})
// Queue up multiple URLs
urls := []string{
"https://example.com/page/1",
"https://example.com/page/2",
"https://example.com/page/3",
// ... hundreds more
}
for _, url := range urls {
c.Visit(url)
}
// Wait for all async requests to finish
c.Wait()
// Different rules for different domains
c.Limit(&colly.LimitRule{
DomainGlob: "*.fast-site.com",
Parallelism: 10,
Delay: 200 * time.Millisecond,
})
c.Limit(&colly.LimitRule{
DomainGlob: "*.slow-site.com",
Parallelism: 2,
Delay: 2 * time.Second,
})
c := colly.NewCollector()
// Set default headers
c.OnRequest(func(r *colly.Request) {
r.Headers.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
r.Headers.Set("Accept", "text/html,application/xhtml+xml")
r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
r.Headers.Set("Referer", "https://www.google.com/")
})
// Cookies are handled automatically per-domain
// To set custom cookies:
c.SetCookies("https://example.com", []*http.Cookie{
{Name: "session_id", Value: "abc123"},
{Name: "consent", Value: "accepted"},
})
c := colly.NewCollector()
// Step 1: Login
c.OnHTML("form[action='/login']", func(e *colly.HTMLElement) {
// Extract CSRF token
csrfToken := e.ChildAttr("input[name='csrf']", "value")
// Submit login form
e.Request.Post("https://example.com/login", map[string]string{
"username": "myuser",
"password": "mypass",
"csrf": csrfToken,
})
})
// Step 2: After login, cookies are stored automatically
// Subsequent requests will include the session cookie
c.OnResponse(func(r *colly.Response) {
if r.Request.URL.Path == "/login" {
// Login successful, now scrape protected pages
c.Visit("https://example.com/dashboard")
}
})
c.Visit("https://example.com/login")
Colly supports proxy rotation out of the box with round-robin or custom proxy switching:
import "github.com/gocolly/colly/v2/proxy"
c := colly.NewCollector()
// Round-robin proxy rotation
proxySwitcher, err := proxy.RoundRobinProxySwitcher(
"http://proxy1.example.com:8080",
"http://proxy2.example.com:8080",
"http://proxy3.example.com:8080",
"socks5://proxy4.example.com:1080",
)
if err != nil {
log.Fatal(err)
}
c.SetProxyFunc(proxySwitcher)
// Custom proxy function โ rotate based on request count
var requestCount int32
c.SetProxyFunc(func(r *http.Request) (*url.URL, error) {
proxies := []string{
"http://us-proxy.example.com:8080",
"http://eu-proxy.example.com:8080",
"http://asia-proxy.example.com:8080",
}
idx := atomic.AddInt32(&requestCount, 1) % int32(len(proxies))
return url.Parse(proxies[idx])
})
Colly can cache responses to disk, reducing redundant requests during development:
c := colly.NewCollector(
colly.CacheDir("./cache"), // Cache responses to disk
)
// Responses are cached by URL; revisiting returns the cached version
// Delete ./cache to force re-fetching
import (
"encoding/csv"
"encoding/json"
"os"
)
type Product struct {
Name string `json:"name"`
Price string `json:"price"`
URL string `json:"url"`
}
var products []Product
c.OnHTML(".product-card", func(e *colly.HTMLElement) {
products = append(products, Product{
Name: e.ChildText(".name"),
Price: e.ChildText(".price"),
URL: e.Request.AbsoluteURL(e.ChildAttr("a", "href")),
})
})
c.OnScraped(func(r *colly.Response) {
// Export to JSON
jsonFile, _ := os.Create("products.json")
defer jsonFile.Close()
json.NewEncoder(jsonFile).Encode(products)
// Export to CSV
csvFile, _ := os.Create("products.csv")
defer csvFile.Close()
w := csv.NewWriter(csvFile)
w.Write([]string{"Name", "Price", "URL"})
for _, p := range products {
w.Write([]string{p.Name, p.Price, p.URL})
}
w.Flush()
})
Colly is an HTTP-based scraper: it doesn't execute JavaScript. For JS-heavy sites, you have three options:
Many "JavaScript-rendered" sites actually load data from JSON APIs. Check the browser's Network tab:
// If the site loads data from an API endpoint
c.OnResponse(func(r *colly.Response) {
var data struct {
Products []struct {
Name string `json:"name"`
Price float64 `json:"price"`
} `json:"products"`
}
json.Unmarshal(r.Body, &data)
for _, p := range data.Products {
fmt.Printf("%s: $%.2f\n", p.Name, p.Price)
}
})
c.Visit("https://api.example.com/products?page=1")
import "github.com/chromedp/chromedp"
ctx, cancel := chromedp.NewContext(context.Background())
defer cancel()
var htmlContent string
err := chromedp.Run(ctx,
chromedp.Navigate("https://spa-site.com/products"),
chromedp.WaitVisible(".product-list"),
chromedp.OuterHTML("html", &htmlContent),
)
// Now parse htmlContent with goquery or feed it to Colly
Skip the infrastructure entirely. Mantis handles JavaScript rendering, proxy rotation, and anti-bot evasion:
// One API call replaces hundreds of lines of scraping code
resp, err := http.Get("https://api.mantisapi.com/v1/scrape?url=https://spa-site.com/products&render_js=true&api_key=YOUR_KEY")
var result struct {
HTML string `json:"html"`
Metadata map[string]string `json:"metadata"`
}
json.NewDecoder(resp.Body).Decode(&result)
Mantis API handles JavaScript rendering, proxy rotation, and anti-bot evasion, so you can focus on the data, not the scraping mechanics.
Get 100 Free API Calls →

Here's a complete, production-grade scraper with all best practices:
package main
import (
"encoding/json"
"fmt"
"log"
"os"
"strings"
"sync"
"time"
"github.com/gocolly/colly/v2"
"github.com/gocolly/colly/v2/proxy"
)
type ScrapedItem struct {
Title string `json:"title"`
URL string `json:"url"`
Price string `json:"price,omitempty"`
Description string `json:"description,omitempty"`
Tags []string `json:"tags,omitempty"`
ScrapedAt string `json:"scraped_at"`
}
func main() {
var (
items []ScrapedItem
mu sync.Mutex
stats struct {
Pages int
Items int
Errors int
}
)
c := colly.NewCollector(
colly.AllowedDomains("example.com", "www.example.com"),
colly.MaxDepth(5),
colly.Async(true),
colly.CacheDir("./cache"),
)
// Rate limiting
c.Limit(&colly.LimitRule{
DomainGlob: "*",
Parallelism: 4,
Delay: 1 * time.Second,
RandomDelay: 500 * time.Millisecond,
})
// Rotate User-Agents
userAgents := []string{
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
}
var uaIdx int
c.OnRequest(func(r *colly.Request) {
// The collector is async, so guard the shared counter with the mutex
mu.Lock()
idx := uaIdx % len(userAgents)
uaIdx++
mu.Unlock()
r.Headers.Set("User-Agent", userAgents[idx])
r.Headers.Set("Accept-Language", "en-US,en;q=0.9")
})
// Extract items
c.OnHTML(".product-card", func(e *colly.HTMLElement) {
item := ScrapedItem{
Title: strings.TrimSpace(e.ChildText("h2")),
URL: e.Request.AbsoluteURL(e.ChildAttr("a", "href")),
Price: strings.TrimSpace(e.ChildText(".price")),
Description: strings.TrimSpace(e.ChildText(".description")),
ScrapedAt: time.Now().UTC().Format(time.RFC3339),
}
e.ForEach(".tag", func(_ int, el *colly.HTMLElement) {
item.Tags = append(item.Tags, el.Text)
})
mu.Lock()
items = append(items, item)
stats.Items++
mu.Unlock()
})
// Follow pagination
c.OnHTML("a.next-page", func(e *colly.HTMLElement) {
e.Request.Visit(e.Attr("href"))
})
c.OnResponse(func(r *colly.Response) {
mu.Lock()
stats.Pages++
mu.Unlock()
})
c.OnError(func(r *colly.Response, err error) {
mu.Lock()
stats.Errors++
mu.Unlock()
log.Printf("Error [%d] %s: %v", r.StatusCode, r.Request.URL, err)
})
// Start scraping
start := time.Now()
c.Visit("https://example.com/products")
c.Wait()
// Export results
f, err := os.Create("results.json")
if err != nil {
log.Fatal(err)
}
defer f.Close()
enc := json.NewEncoder(f)
enc.SetIndent("", " ")
enc.Encode(items)
elapsed := time.Since(start)
fmt.Printf("\n=== Scraping Complete ===\n")
fmt.Printf("Pages: %d\n", stats.Pages)
fmt.Printf("Items: %d\n", stats.Items)
fmt.Printf("Errors: %d\n", stats.Errors)
fmt.Printf("Duration: %s\n", elapsed.Round(time.Millisecond))
fmt.Printf("Speed: %.1f pages/sec\n", float64(stats.Pages)/elapsed.Seconds())
}
| Feature | Go + Colly | Python + Scrapy | Node.js + Puppeteer | Mantis API |
|---|---|---|---|---|
| Speed | Fastest | Fast | Slow (browser) | Fast |
| Concurrency | Built-in (goroutines) | Built-in (Twisted) | Limited | Handled by API |
| Memory | ~5-20 MB | ~50-200 MB | ~200-500 MB | N/A |
| JavaScript | ✗ (needs chromedp) | ✗ (needs Splash) | ✓ Native | ✓ Built-in |
| Anti-bot | Manual | Manual | Stealth plugin | ✓ Built-in |
| Proxy rotation | Built-in | Manual/middleware | Manual | ✓ Built-in |
| Deployment | Single binary | Python env + deps | Node + Chromium | HTTP calls |
| Learning curve | Moderate | Moderate | Easy | Easiest |
| Best for | High-performance crawls | Large-scale projects | JS-heavy sites | Any site, any scale |
Choose Colly when: You need raw speed, efficient resource usage, easy deployment, and are comfortable with Go. Perfect for infrastructure teams, microservices, and high-volume data pipelines.
Building a production scraper with Colly means managing proxies, rotating user agents, handling CAPTCHAs, and fighting anti-bot systems. Or you can make one API call:
package main
import (
"encoding/json"
"fmt"
"io"
"net/http"
"net/url"
)
func main() {
// One API call; Mantis handles the rest
target := url.QueryEscape("https://example.com/products")
apiURL := fmt.Sprintf("https://api.mantisapi.com/v1/scrape?url=%s&render_js=true", target)
req, _ := http.NewRequest("GET", apiURL, nil)
req.Header.Set("X-API-Key", "your-api-key")
resp, err := http.DefaultClient.Do(req)
if err != nil {
panic(err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var result map[string]interface{}
json.Unmarshal(body, &result)
fmt.Println(result["html"])
}
| You Build (DIY) | Mantis Handles |
|---|---|
| Proxy infrastructure ($200-1000/mo) | ✓ Built-in proxy rotation |
| Anti-bot evasion code | ✓ Automatic anti-detection |
| JavaScript rendering (chromedp setup) | ✓ Full JS rendering |
| CAPTCHA solving integration | ✓ Handled automatically |
| User-agent rotation | ✓ Realistic browser headers |
| Error handling & retries | ✓ Built-in reliability |
| Maintenance & monitoring | ✓ Managed infrastructure |
Stop building scraping infrastructure. Mantis gives you clean data from any website, with Go, Python, Node.js, or any language that speaks HTTP.
Start Free - 100 Calls/Month →

Yes, Go is one of the best languages for web scraping. Its goroutines handle thousands of concurrent requests with minimal memory, its compiled speed makes HTML parsing 5-10x faster than Python, and single-binary deployment makes it trivial to run scrapers anywhere. Colly adds an elegant API on top.
Colly is Go's most popular web scraping framework. It provides a callback-based API for making HTTP requests, parsing HTML with CSS selectors, following links, managing cookies, rate limiting, proxy rotation, and caching. Think of it as "Scrapy for Go": production-ready and battle-tested.
Colly in Go is typically 5-10x faster than Python scraping libraries. In benchmarks, Colly processes 1,000+ pages per minute with 4 concurrent workers, while Scrapy manages 200-400. Go's compiled nature and lightweight goroutines give it a significant edge for high-volume scraping.
Colly alone doesn't execute JavaScript. For JS-heavy sites, use chromedp (Go's headless Chrome library) alongside Colly, find hidden API endpoints that serve JSON data, or use a scraping API like Mantis that handles JavaScript rendering automatically.
Colly has built-in rate limiting via LimitRule: set Delay (wait between requests), RandomDelay (jitter), Parallelism (concurrent requests per domain), and DomainGlob (pattern matching). This makes it easy to scrape responsibly.
Use Colly when you need maximum control and performance, and can manage proxy infrastructure. Use a web scraping API when you want to skip infrastructure management, need anti-bot evasion, or want JavaScript rendering without running headless browsers.