Web Scraping for Government & Public Sector: How AI Agents Track Contracts, Grants & Policy Changes in 2026
Government spending in the United States alone exceeds $6.5 trillion annually, with federal procurement accounting for over $700 billion in contracts awarded each year. Add state and local spending, and the total government market represents the single largest buyer of goods and services on the planet.
Yet tracking government opportunities remains surprisingly manual. GovWin IQ ($5K–$25K/year), Bloomberg Government ($8K–$15K/seat), and Deltek ($3K–$20K/year) charge premium prices for intelligence that is, by definition, publicly available data. Federal contracts, grant announcements, regulatory changes, and spending reports are all published on government websites, but scattered across dozens of portals with inconsistent formats.
What if an AI agent could monitor SAM.gov for new contract opportunities, track Grants.gov for funding announcements, scrape the Federal Register for regulatory changes, and analyze USASpending.gov for spending trends, all automatically, for a fraction of the cost?
In this guide, you'll build an AI-powered government intelligence system that scrapes procurement opportunities, grant announcements, policy changes, and public spending data, then uses GPT-4o to generate opportunity assessments and strategic recommendations via Slack alerts.
Why AI Agents Are Transforming Government Intelligence
Government data is unique: it's almost entirely public by law. The Freedom of Information Act, open data mandates, and transparency requirements mean that an enormous amount of valuable intelligence is freely accessible, if you can find it, parse it, and analyze it at scale.
Traditional government intelligence platforms charge thousands per year for what is essentially structured access to public data. They add value through aggregation, search, and analysis: exactly what AI agents excel at.
The Government Data Landscape
- SAM.gov – System for Award Management: all federal contract opportunities, entity registrations, and exclusions
- Grants.gov – Federal grant announcements from 26+ agencies
- Federal Register – Proposed rules, final rules, notices, and executive orders
- USASpending.gov – Every federal dollar spent, searchable by agency, recipient, and program
- FPDS – Federal Procurement Data System: historical contract award data
- State procurement portals – 50 states, each with their own solicitation system
- Congress.gov – Bills, votes, committee hearings, and legislative history
- Regulations.gov – Public comments on proposed regulations
What Makes Government Scraping Different
Three properties set government data apart from typical scraping targets:
- It's legally public. FOIA and open data mandates remove the terms-of-service ambiguity that surrounds most commercial scraping.
- Formats are wildly inconsistent. Some agencies expose modern APIs; others serve legacy HTML that hasn't changed since the early 2000s.
- Infrastructure is often limited. Many portals run on modest servers, so polite rate limiting matters more here than on commercial sites.
Architecture: Building a Government Intelligence Agent
Our system follows a six-step pipeline that monitors multiple government data sources, extracts structured intelligence, detects relevant opportunities, and delivers AI-powered analysis:
- Source Discovery – Monitor SAM.gov, Grants.gov, Federal Register, USASpending.gov, state portals
- AI Extraction – Use Mantis WebPerception API to scrape and structure government pages
- SQLite Storage – Store opportunities, grants, regulations, and spending data locally
- Opportunity Detection – Identify relevant contracts, grants, and policy changes
- GPT-4o Analysis – Generate opportunity assessments, competitive landscape, and strategy
- Slack Alerts – Deliver actionable intelligence to your government affairs team
Step 1: Define Data Models
We'll track four core government data types: contract opportunities, grant announcements, regulatory changes, and spending records.
```python
from pydantic import BaseModel
from datetime import date
from typing import Optional
from enum import Enum

class ContractType(str, Enum):
    SOLICITATION = "solicitation"
    AWARD = "award"
    MODIFICATION = "modification"
    PRESOLICITATION = "presolicitation"
    SOURCES_SOUGHT = "sources_sought"
    SPECIAL_NOTICE = "special_notice"

class ContractOpportunity(BaseModel):
    """Federal contract opportunity from SAM.gov"""
    notice_id: str
    title: str
    agency: str
    sub_agency: Optional[str] = None
    type: ContractType
    posted_date: date
    response_deadline: Optional[date] = None
    naics_code: str
    naics_description: str
    set_aside: Optional[str] = None  # e.g., "Small Business", "8(a)", "HUBZone"
    place_of_performance: Optional[str] = None
    estimated_value: Optional[float] = None
    description_summary: str
    point_of_contact: Optional[str] = None
    url: str

class GrantAnnouncement(BaseModel):
    """Federal grant opportunity from Grants.gov"""
    opportunity_id: str
    title: str
    agency: str
    funding_instrument: str  # "Grant", "Cooperative Agreement", etc.
    category: str  # "Science & Technology", "Health", etc.
    posted_date: date
    close_date: Optional[date] = None
    estimated_funding: Optional[float] = None
    award_ceiling: Optional[float] = None
    award_floor: Optional[float] = None
    expected_awards: Optional[int] = None
    eligibility: str
    description_summary: str
    url: str

class RegulatoryChange(BaseModel):
    """Federal Register entry"""
    document_number: str
    title: str
    agency: str
    type: str  # "Proposed Rule", "Final Rule", "Notice", "Executive Order"
    published_date: date
    effective_date: Optional[date] = None
    comment_deadline: Optional[date] = None
    cfr_references: list[str]
    abstract: str
    impact_assessment: Optional[str] = None
    url: str

class SpendingRecord(BaseModel):
    """Federal spending from USASpending.gov"""
    award_id: str
    recipient_name: str
    recipient_state: Optional[str] = None
    awarding_agency: str
    funding_agency: str
    award_type: str
    award_amount: float
    period_of_performance_start: date
    period_of_performance_end: Optional[date] = None
    naics_code: Optional[str] = None
    description: str
    url: str
```
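Pydantic handles the type coercion for us: the scrapers in the next step pass ISO date strings and numeric strings straight into these models and get typed values out. A quick sketch with a trimmed-down stand-in model (the field subset and sample values here are illustrative, not from a real notice):

```python
from datetime import date
from typing import Optional
from pydantic import BaseModel

# Trimmed-down stand-in for ContractOpportunity, just to show coercion
class Opportunity(BaseModel):
    notice_id: str
    title: str
    posted_date: date                        # "2026-01-15" -> datetime.date
    response_deadline: Optional[date] = None
    estimated_value: Optional[float] = None  # "250000" -> 250000.0

raw = {
    "notice_id": "ABC123",
    "title": "Cloud Migration Support",
    "posted_date": "2026-01-15",
    "estimated_value": "250000",
}
opp = Opportunity(**raw)
print(type(opp.posted_date).__name__, opp.estimated_value)  # date 250000.0
```

If a scraped field can't be coerced (say, a garbled date), validation raises, which is exactly why the scrapers below wrap model construction in try/except.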
Step 2: Scrape Government Data Sources
Government websites are notoriously inconsistent in their formatting. Some offer APIs, others are legacy HTML from the early 2000s. The Mantis WebPerception API handles both, extracting structured data from any government page regardless of its technical stack.
Scraping SAM.gov Contract Opportunities
```python
import httpx
from datetime import datetime, timedelta
from typing import Optional

MANTIS_API_KEY = "your-mantis-api-key"
MANTIS_BASE = "https://api.mantisapi.com/v1"

async def scrape_sam_opportunities(
    naics_codes: list[str],
    set_asides: Optional[list[str]] = None,
    days_back: int = 7
) -> list[ContractOpportunity]:
    """Scrape SAM.gov for contract opportunities matching criteria."""
    opportunities = []
    for naics in naics_codes:
        # Build SAM.gov search URL (set_asides and days_back can be folded
        # into additional query parameters as needed)
        search_url = (
            f"https://sam.gov/search/?index=opp&page=1&pageSize=25"
            f"&sort=-modifiedDate&sfm%5Bstatus%5D%5Bis_active%5D=true"
            f"&sfm%5BsimpleSearch%5D%5BkeywordRadio%5D=ALL"
            f"&sfm%5BsimpleSearch%5D%5BkeywordTags%5D%5B0%5D%5Bkey%5D={naics}"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            # Use Mantis to extract structured data from SAM.gov
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "opportunities": [{
                            "notice_id": "string - SAM.gov notice ID",
                            "title": "string - opportunity title",
                            "agency": "string - awarding agency",
                            "type": "string - solicitation type",
                            "posted_date": "string - date posted (YYYY-MM-DD)",
                            "response_deadline": "string - response due date",
                            "set_aside": "string - small business set-aside type if any",
                            "description_summary": "string - brief description"
                        }]
                    },
                    "wait_for": "networkidle"
                }
            )
        data = response.json()
        for opp in data.get("opportunities", []):
            try:
                opportunity = ContractOpportunity(
                    notice_id=opp["notice_id"],
                    title=opp["title"],
                    agency=opp["agency"],
                    sub_agency=opp.get("sub_agency"),
                    type=opp.get("type", "solicitation"),
                    posted_date=opp["posted_date"],
                    response_deadline=opp.get("response_deadline"),
                    naics_code=naics,
                    naics_description=opp.get("naics_description", ""),
                    set_aside=opp.get("set_aside"),
                    place_of_performance=opp.get("place_of_performance"),
                    estimated_value=opp.get("estimated_value"),
                    description_summary=opp["description_summary"],
                    point_of_contact=opp.get("point_of_contact"),
                    url=f"https://sam.gov/opp/{opp['notice_id']}/view"
                )
                opportunities.append(opportunity)
            except Exception as e:
                print(f"Parse error: {e}")
    return opportunities
```
Scraping Grants.gov
```python
async def scrape_grants(
    categories: list[str],
    eligibility_filter: Optional[str] = None,
    min_funding: Optional[float] = None
) -> list[GrantAnnouncement]:
    """Scrape Grants.gov for funding opportunities."""
    grants = []
    for category in categories:
        search_url = (
            f"https://www.grants.gov/search-grants?"
            f"category={category}&oppStatuses=forecasted|posted"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "grants": [{
                            "opportunity_id": "string - Grants.gov opportunity number",
                            "title": "string - grant title",
                            "agency": "string - funding agency",
                            "funding_instrument": "string - grant, cooperative agreement, etc.",
                            "posted_date": "string - post date (YYYY-MM-DD)",
                            "close_date": "string - application deadline",
                            "estimated_funding": "number - total program funding in dollars",
                            "award_ceiling": "number - maximum individual award",
                            "award_floor": "number - minimum individual award",
                            "expected_awards": "number - estimated number of awards",
                            "eligibility": "string - eligible applicant types",
                            "description_summary": "string - brief description"
                        }]
                    },
                    "wait_for": "networkidle"
                }
            )
        data = response.json()
        for grant in data.get("grants", []):
            try:
                # Guard against missing/null funding before comparing
                if min_funding and (grant.get("estimated_funding") or 0) < min_funding:
                    continue
                if eligibility_filter and eligibility_filter.lower() not in (grant.get("eligibility") or "").lower():
                    continue
                announcement = GrantAnnouncement(
                    opportunity_id=grant["opportunity_id"],
                    title=grant["title"],
                    agency=grant["agency"],
                    funding_instrument=grant.get("funding_instrument", "Grant"),
                    category=category,
                    posted_date=grant["posted_date"],
                    close_date=grant.get("close_date"),
                    estimated_funding=grant.get("estimated_funding"),
                    award_ceiling=grant.get("award_ceiling"),
                    award_floor=grant.get("award_floor"),
                    expected_awards=grant.get("expected_awards"),
                    eligibility=grant.get("eligibility", ""),
                    description_summary=grant["description_summary"],
                    url=f"https://www.grants.gov/view-opportunity/{grant['opportunity_id']}"
                )
                grants.append(announcement)
            except Exception as e:
                print(f"Parse error: {e}")
    return grants
```
Scraping the Federal Register
```python
async def scrape_federal_register(
    agencies: list[str],
    document_types: Optional[list[str]] = None,
    days_back: int = 7
) -> list[RegulatoryChange]:
    """Scrape Federal Register for regulatory changes.

    Note: the Federal Register has a good API (federalregister.gov/api/v1),
    but we use Mantis for consistency and to handle pages without API coverage.
    """
    regulations = []
    start_date = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")
    for agency in agencies:
        search_url = (
            f"https://www.federalregister.gov/documents/search?"
            f"conditions%5Bagencies%5D%5B%5D={agency}"
            f"&conditions%5Bpublication_date%5D%5Bgte%5D={start_date}"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "documents": [{
                            "document_number": "string - FR document number",
                            "title": "string - document title",
                            "agency": "string - issuing agency",
                            "type": "string - Proposed Rule, Final Rule, Notice, etc.",
                            "published_date": "string - publication date (YYYY-MM-DD)",
                            "effective_date": "string - when rule takes effect",
                            "comment_deadline": "string - deadline for public comments",
                            "abstract": "string - document summary/abstract"
                        }]
                    }
                }
            )
        data = response.json()
        for doc in data.get("documents", []):
            try:
                reg = RegulatoryChange(
                    document_number=doc["document_number"],
                    title=doc["title"],
                    agency=doc.get("agency", agency),
                    type=doc.get("type", "Notice"),
                    published_date=doc["published_date"],
                    effective_date=doc.get("effective_date"),
                    comment_deadline=doc.get("comment_deadline"),
                    cfr_references=doc.get("cfr_references", []),
                    abstract=doc.get("abstract", ""),
                    impact_assessment=doc.get("impact_assessment"),
                    url=f"https://www.federalregister.gov/d/{doc['document_number']}"
                )
                regulations.append(reg)
            except Exception as e:
                print(f"Parse error: {e}")
    return regulations
```
Scraping USASpending.gov
```python
async def scrape_spending(
    agencies: Optional[list[str]] = None,
    naics_codes: Optional[list[str]] = None,
    min_amount: float = 100000,
    days_back: int = 30
) -> list[SpendingRecord]:
    """Scrape USASpending.gov for recent federal awards.

    The agencies/naics_codes filters can be folded into the search URL;
    here we filter on award amount after extraction.
    """
    spending = []
    start_date = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")
    search_url = (
        f"https://www.usaspending.gov/search?"
        f"hash=&time_period%5B0%5D%5Bstart_date%5D={start_date}"
    )
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": search_url,
                "schema": {
                    "awards": [{
                        "award_id": "string - unique award identifier",
                        "recipient_name": "string - company or org receiving the award",
                        "recipient_state": "string - recipient state",
                        "awarding_agency": "string - agency making the award",
                        "award_type": "string - contract, grant, loan, etc.",
                        "award_amount": "number - total award amount in dollars",
                        "description": "string - award description",
                        "naics_code": "string - NAICS code if applicable"
                    }]
                },
                "wait_for": "networkidle"
            }
        )
    data = response.json()
    for award in data.get("awards", []):
        try:
            if (award.get("award_amount") or 0) < min_amount:
                continue
            record = SpendingRecord(
                award_id=award["award_id"],
                recipient_name=award["recipient_name"],
                recipient_state=award.get("recipient_state"),
                awarding_agency=award["awarding_agency"],
                funding_agency=award.get("funding_agency", award["awarding_agency"]),
                award_type=award.get("award_type", "contract"),
                award_amount=award["award_amount"],
                # Fall back to the search-window start if the page doesn't show it
                period_of_performance_start=award.get("period_of_performance_start", start_date),
                period_of_performance_end=award.get("period_of_performance_end"),
                naics_code=award.get("naics_code"),
                description=award.get("description", ""),
                url=f"https://www.usaspending.gov/award/{award['award_id']}"
            )
            spending.append(record)
        except Exception as e:
            print(f"Parse error: {e}")
    return spending
```
Step 3: Store in SQLite
```python
import sqlite3

def init_gov_database(db_path: str = "gov_intelligence.db"):
    """Initialize SQLite database for government data."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS contract_opportunities (
            notice_id TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            agency TEXT NOT NULL,
            sub_agency TEXT,
            type TEXT,
            posted_date TEXT,
            response_deadline TEXT,
            naics_code TEXT,
            naics_description TEXT,
            set_aside TEXT,
            place_of_performance TEXT,
            estimated_value REAL,
            description_summary TEXT,
            point_of_contact TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            ai_assessment TEXT
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS grant_announcements (
            opportunity_id TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            agency TEXT NOT NULL,
            funding_instrument TEXT,
            category TEXT,
            posted_date TEXT,
            close_date TEXT,
            estimated_funding REAL,
            award_ceiling REAL,
            award_floor REAL,
            expected_awards INTEGER,
            eligibility TEXT,
            description_summary TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            ai_assessment TEXT
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS regulatory_changes (
            document_number TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            agency TEXT NOT NULL,
            type TEXT,
            published_date TEXT,
            effective_date TEXT,
            comment_deadline TEXT,
            cfr_references TEXT,
            abstract TEXT,
            impact_assessment TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            ai_analysis TEXT
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS spending_records (
            award_id TEXT PRIMARY KEY,
            recipient_name TEXT,
            recipient_state TEXT,
            awarding_agency TEXT,
            funding_agency TEXT,
            award_type TEXT,
            award_amount REAL,
            period_of_performance_start TEXT,
            period_of_performance_end TEXT,
            naics_code TEXT,
            description TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    return conn
```
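The insert side is worth sketching too: keying on the primary key with `INSERT OR IGNORE` means re-running the pipeline never duplicates a notice. `store_contracts` is a helper of our own, and the demo below uses a trimmed in-memory schema rather than the full table so it runs standalone:

```python
import sqlite3
from types import SimpleNamespace

def store_contracts(conn, contracts) -> int:
    """Insert contract opportunities, skipping notice_ids already recorded.
    Returns the number of newly inserted rows."""
    cur = conn.cursor()
    inserted = 0
    for c in contracts:
        cur.execute(
            """INSERT OR IGNORE INTO contract_opportunities
               (notice_id, title, agency, type, posted_date, response_deadline,
                naics_code, set_aside, estimated_value, description_summary, url)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (c.notice_id, c.title, c.agency, c.type, str(c.posted_date),
             str(c.response_deadline) if c.response_deadline else None,
             c.naics_code, c.set_aside, c.estimated_value,
             c.description_summary, c.url),
        )
        inserted += cur.rowcount  # 0 when the notice_id was already present
    conn.commit()
    return inserted

# Minimal in-memory demo (schema trimmed to the inserted columns)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE contract_opportunities (
    notice_id TEXT PRIMARY KEY, title TEXT, agency TEXT, type TEXT,
    posted_date TEXT, response_deadline TEXT, naics_code TEXT,
    set_aside TEXT, estimated_value REAL, description_summary TEXT, url TEXT)""")
opp = SimpleNamespace(
    notice_id="N1", title="Cloud Support", agency="GSA", type="solicitation",
    posted_date="2026-01-10", response_deadline=None, naics_code="541512",
    set_aside=None, estimated_value=250000.0,
    description_summary="Cloud migration support",
    url="https://sam.gov/opp/N1/view")
print(store_contracts(conn, [opp]))   # 1 (new row)
print(store_contracts(conn, [opp]))   # 0 (already seen)
```

The same pattern applies to the grant, regulatory, and spending tables, keyed on their respective primary keys.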
Step 4: Detect Relevant Opportunities
Not every government action matters to your organization. The detection layer filters for opportunities that match your business profile, capability areas, and strategic interests.
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GovAlert:
    alert_type: str  # "new_contract", "grant_opportunity", "regulatory_change", "large_award"
    severity: str  # "high", "medium", "low"
    title: str
    details: str
    deadline: Optional[str] = None
    estimated_value: Optional[float] = None
    url: Optional[str] = None

def detect_opportunities(
    contracts: list[ContractOpportunity],
    grants: list[GrantAnnouncement],
    regulations: list[RegulatoryChange],
    spending: list[SpendingRecord],
    company_naics: list[str],
    company_keywords: list[str],
    min_contract_value: float = 50000
) -> list[GovAlert]:
    """Detect government opportunities relevant to your organization."""
    alerts = []
    keywords_lower = [k.lower() for k in company_keywords]

    # Check new contract opportunities
    for contract in contracts:
        relevance_score = 0
        # NAICS code match
        if contract.naics_code in company_naics:
            relevance_score += 3
        # Keyword match in title or description
        text = f"{contract.title} {contract.description_summary}".lower()
        keyword_matches = sum(1 for kw in keywords_lower if kw in text)
        relevance_score += keyword_matches
        # Value threshold
        if contract.estimated_value and contract.estimated_value >= min_contract_value:
            relevance_score += 1
        # Small business set-aside bonus
        if contract.set_aside:
            relevance_score += 1
        if relevance_score >= 2:
            severity = "high" if relevance_score >= 4 else "medium"
            # Build details line by line so a missing optional field
            # doesn't blank the whole block
            detail_lines = [
                f"Agency: {contract.agency}",
                f"NAICS: {contract.naics_code} - {contract.naics_description}",
                f"Set-aside: {contract.set_aside or 'Full & Open'}",
            ]
            if contract.estimated_value:
                detail_lines.append(f"Est. value: ${contract.estimated_value:,.0f}")
            alerts.append(GovAlert(
                alert_type="new_contract",
                severity=severity,
                title=f"New Contract: {contract.title}",
                details="\n".join(detail_lines),
                deadline=str(contract.response_deadline) if contract.response_deadline else None,
                estimated_value=contract.estimated_value,
                url=contract.url
            ))

    # Check grant opportunities
    for grant in grants:
        text = f"{grant.title} {grant.description_summary}".lower()
        keyword_matches = sum(1 for kw in keywords_lower if kw in text)
        if keyword_matches >= 1:
            severity = "high" if grant.estimated_funding and grant.estimated_funding > 1000000 else "medium"
            detail_lines = [f"Agency: {grant.agency}"]
            if grant.estimated_funding:
                detail_lines.append(f"Funding: ${grant.estimated_funding:,.0f}")
            if grant.award_floor and grant.award_ceiling:
                detail_lines.append(f"Award range: ${grant.award_floor:,.0f}-${grant.award_ceiling:,.0f}")
            if grant.expected_awards:
                detail_lines.append(f"Expected awards: {grant.expected_awards}")
            alerts.append(GovAlert(
                alert_type="grant_opportunity",
                severity=severity,
                title=f"Grant: {grant.title}",
                details="\n".join(detail_lines),
                deadline=str(grant.close_date) if grant.close_date else None,
                estimated_value=grant.estimated_funding,
                url=grant.url
            ))

    # Check regulatory changes
    for reg in regulations:
        text = f"{reg.title} {reg.abstract}".lower()
        keyword_matches = sum(1 for kw in keywords_lower if kw in text)
        if keyword_matches >= 1 or reg.type in ["Final Rule", "Executive Order"]:
            severity = "high" if reg.type in ["Final Rule", "Executive Order"] else "medium"
            alerts.append(GovAlert(
                alert_type="regulatory_change",
                severity=severity,
                title=f"{reg.type}: {reg.title}",
                details=(
                    f"Agency: {reg.agency}\n"
                    f"Effective: {reg.effective_date or 'TBD'}\n"
                    f"Comment deadline: {reg.comment_deadline or 'N/A'}\n"
                    f"{reg.abstract[:300]}..."
                ),
                deadline=str(reg.comment_deadline) if reg.comment_deadline else None,
                url=reg.url
            ))

    # Check large spending awards (competitive intelligence)
    for record in spending:
        if record.award_amount >= 1000000:
            text = record.description.lower()
            keyword_matches = sum(1 for kw in keywords_lower if kw in text)
            if keyword_matches >= 1 or record.naics_code in company_naics:
                alerts.append(GovAlert(
                    alert_type="large_award",
                    severity="medium",
                    title=f"Award: ${record.award_amount:,.0f} to {record.recipient_name}",
                    details=(
                        f"Agency: {record.awarding_agency}\n"
                        f"Recipient: {record.recipient_name} ({record.recipient_state})\n"
                        f"Type: {record.award_type}\n"
                        f"Description: {record.description[:200]}"
                    ),
                    estimated_value=record.award_amount,
                    url=record.url
                ))

    # Sort by severity (high first)
    severity_order = {"high": 0, "medium": 1, "low": 2}
    alerts.sort(key=lambda a: severity_order.get(a.severity, 2))
    return alerts
```
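One practical refinement: gate alerting on whether an item is genuinely new, so the same notice doesn't page the team on every run. A minimal sketch against the SQLite tables from Step 3 (`is_new` is a helper name of our own; the demo table is trimmed to one column so the snippet runs standalone):

```python
import sqlite3

def is_new(conn: sqlite3.Connection, table: str, key_col: str, key: str) -> bool:
    """True if `key` hasn't been recorded yet; check this before alerting."""
    row = conn.execute(
        f"SELECT 1 FROM {table} WHERE {key_col} = ? LIMIT 1", (key,)
    ).fetchone()
    return row is None

# Throwaway demo with an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract_opportunities (notice_id TEXT PRIMARY KEY)")
print(is_new(conn, "contract_opportunities", "notice_id", "N1"))  # True
conn.execute("INSERT INTO contract_opportunities VALUES ('N1')")
print(is_new(conn, "contract_opportunities", "notice_id", "N1"))  # False
```

Because `first_seen` defaults to the insert timestamp, the same table also tells you how long an opportunity has been on your radar.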
Step 5: AI-Powered Analysis with GPT-4o
Raw opportunity data is useful, but AI analysis transforms it into strategic intelligence. GPT-4o evaluates each opportunity against your company's capabilities and provides win probability, competitive landscape, and recommended actions.
```python
import json
from openai import OpenAI
from typing import Optional

openai_client = OpenAI()

async def analyze_opportunity(
    alert: GovAlert,
    company_profile: dict,
    historical_wins: Optional[list[dict]] = None
) -> str:
    """Use GPT-4o to analyze a government opportunity."""
    context = f"""You are a government contracts intelligence analyst. Analyze this
opportunity for a company with the following profile:

Company: {company_profile.get('name', 'N/A')}
NAICS codes: {', '.join(company_profile.get('naics_codes', []))}
Capabilities: {', '.join(company_profile.get('capabilities', []))}
Past performance: {company_profile.get('past_performance', 'Limited')}
Small business certifications: {', '.join(company_profile.get('certifications', []))}
Target agencies: {', '.join(company_profile.get('target_agencies', []))}
{"Historical wins: " + json.dumps(historical_wins[:5]) if historical_wins else "No historical win data."}

OPPORTUNITY:
Type: {alert.alert_type}
Title: {alert.title}
Details: {alert.details}
Deadline: {alert.deadline or 'Not specified'}
Estimated Value: {f"${alert.estimated_value:,.0f}" if alert.estimated_value else 'Not disclosed'}

Provide:
1. RELEVANCE SCORE (1-10): How well does this match the company's capabilities?
2. WIN PROBABILITY: Low / Medium / High, with reasoning
3. COMPETITIVE LANDSCAPE: Who are likely competitors? Is this wired for an incumbent?
4. KEY REQUIREMENTS: What capabilities/certifications are needed?
5. RECOMMENDED ACTIONS: Specific next steps (teaming, capability statements, etc.)
6. RED FLAGS: Any concerns (unrealistic timeline, budget mismatch, etc.)
7. DEADLINE STRATEGY: If there's a deadline, what's the timeline for a quality response?

Be direct and actionable. Flag if this looks like a recompete (incumbent advantage) or a new requirement (open field)."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": context}],
        temperature=0.3,
        max_tokens=1000
    )
    return response.choices[0].message.content

async def analyze_spending_trends(
    spending: list[SpendingRecord],
    focus_area: str
) -> str:
    """Analyze federal spending trends in a focus area."""
    # Aggregate by agency
    agency_totals = {}
    for record in spending:
        agency = record.awarding_agency
        agency_totals[agency] = agency_totals.get(agency, 0) + record.award_amount
    top_agencies = sorted(agency_totals.items(), key=lambda x: x[1], reverse=True)[:10]

    # Aggregate by recipient
    recipient_totals = {}
    for record in spending:
        name = record.recipient_name
        recipient_totals[name] = recipient_totals.get(name, 0) + record.award_amount
    top_recipients = sorted(recipient_totals.items(), key=lambda x: x[1], reverse=True)[:10]

    context = f"""Analyze federal spending trends for: {focus_area}

Top agencies by spending:
{chr(10).join(f"  {a}: ${v:,.0f}" for a, v in top_agencies)}

Top recipients:
{chr(10).join(f"  {r}: ${v:,.0f}" for r, v in top_recipients)}

Total awards analyzed: {len(spending)}
Total value: ${sum(r.award_amount for r in spending):,.0f}

Provide:
1. SPENDING TRENDS: Where is money flowing? Increasing or decreasing?
2. DOMINANT PLAYERS: Who are the major recipients? Are there incumbents to be aware of?
3. OPPORTUNITY GAPS: Where might a new entrant find opportunities?
4. AGENCY PRIORITIES: What do spending patterns reveal about agency priorities?
5. STRATEGIC RECOMMENDATIONS: How should a company position itself?"""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": context}],
        temperature=0.3,
        max_tokens=800
    )
    return response.choices[0].message.content
```
Step 6: Slack Alerts
```python
import httpx

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

async def send_gov_alert(alert: GovAlert, ai_analysis: str):
    """Send government intelligence alert to Slack."""
    emoji_map = {
        "new_contract": "📋",
        "grant_opportunity": "💰",
        "regulatory_change": "⚖️",
        "large_award": "📊"
    }
    severity_color = {  # for legacy attachment-style color bars, if you use them
        "high": "#ff4444",
        "medium": "#ffaa00",
        "low": "#44aa44"
    }
    emoji = emoji_map.get(alert.alert_type, "🏛️")
    # Build the value line separately so a missing value doesn't blank the block
    value_line = (
        f"*Est. Value:* ${alert.estimated_value:,.0f}\n"
        if alert.estimated_value else ""
    )
    message = {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"{emoji} {alert.title}"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f"*Severity:* {alert.severity.upper()}\n"
                        f"*Deadline:* {alert.deadline or 'N/A'}\n"
                        f"{value_line}"
                        f"\n{alert.details}"
                    )
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*🤖 AI Analysis:*\n{ai_analysis[:2000]}"
                }
            }
        ]
    }
    if alert.url:
        message["blocks"].append({
            "type": "actions",
            "elements": [{
                "type": "button",
                "text": {"type": "plain_text", "text": "View Opportunity"},
                "url": alert.url
            }]
        })
    async with httpx.AsyncClient() as client:
        await client.post(SLACK_WEBHOOK, json=message)
```
Putting It All Together
```python
import asyncio

async def run_gov_intelligence_pipeline():
    """Main pipeline: scrape → store → detect → analyze → alert."""
    # Company profile for opportunity matching
    company_profile = {
        "name": "Your Company",
        "naics_codes": ["541512", "541519", "518210", "541511"],  # IT services
        "capabilities": [
            "cloud migration", "cybersecurity", "data analytics",
            "AI/ML", "software development", "DevSecOps"
        ],
        "certifications": ["8(a)", "HUBZone", "SDVOSB"],
        "past_performance": "3 federal contracts completed (GSA, DoD, HHS)",
        "target_agencies": ["DOD", "DHS", "HHS", "GSA", "VA"]
    }
    keywords = [
        "artificial intelligence", "machine learning", "cloud",
        "cybersecurity", "data analytics", "software development",
        "automation", "API", "web scraping", "data extraction"
    ]

    # 1. Scrape all sources
    print("🏛️ Scraping SAM.gov...")
    contracts = await scrape_sam_opportunities(
        naics_codes=company_profile["naics_codes"],
        days_back=7
    )
    print("💰 Scraping Grants.gov...")
    grants = await scrape_grants(
        categories=["science-technology", "information-technology"],
        min_funding=100000
    )
    print("⚖️ Scraping Federal Register...")
    regulations = await scrape_federal_register(
        agencies=["defense-department", "homeland-security", "health-human-services"],
        days_back=7
    )
    print("💵 Scraping USASpending.gov...")
    spending = await scrape_spending(
        naics_codes=company_profile["naics_codes"],
        min_amount=100000,
        days_back=30
    )

    # 2. Store in database
    db = init_gov_database()
    # ... insert records ...

    # 3. Detect relevant opportunities
    alerts = detect_opportunities(
        contracts, grants, regulations, spending,
        company_naics=company_profile["naics_codes"],
        company_keywords=keywords
    )
    print(f"\n🔍 Found {len(alerts)} relevant opportunities")

    # 4. Analyze and alert
    for alert in alerts[:10]:  # Top 10 most relevant
        analysis = await analyze_opportunity(alert, company_profile)
        await send_gov_alert(alert, analysis)
        print(f"  ✅ Alerted: {alert.title}")

    # 5. Spending trend analysis
    if spending:
        trends = await analyze_spending_trends(spending, "IT Services & AI/ML")
        # Send trend report to Slack
        print("\n📊 Spending analysis complete")

if __name__ == "__main__":
    asyncio.run(run_gov_intelligence_pipeline())
```
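run_gov_intelligence_pipeline() is a one-shot coroutine; in production you'd run it on a schedule. A cron entry is simplest, but a small supervisor loop works too. A sketch under our own assumptions (the interval and error handling are not part of the pipeline above):

```python
import asyncio

async def run_on_interval(pipeline, interval_seconds: float):
    """Run `pipeline` repeatedly; one failed scrape shouldn't kill the daemon."""
    while True:
        try:
            await pipeline()
        except Exception as e:  # CancelledError still propagates (BaseException)
            print(f"Pipeline run failed: {e}")
        await asyncio.sleep(interval_seconds)

# Daily run:
# asyncio.run(run_on_interval(run_gov_intelligence_pipeline, 24 * 60 * 60))
```

A daily cadence matches how often most of these portals update; SAM.gov and the Federal Register post new material on business days.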
Advanced: State & Local Government Monitoring
Federal opportunities are only part of the picture. State and local government spending exceeds $3.8 trillion annually, and many contracts are under thresholds that attract less competition from large primes.
```python
# State procurement portals vary widely; here's a multi-state approach
STATE_PORTALS = {
    "california": "https://caleprocure.ca.gov/pages/Events-BS3/event-search.aspx",
    "texas": "https://www.txsmartbuy.com/sp",
    "new_york": "https://www.ogs.ny.gov/procurement",
    "florida": "https://vendor.myfloridamarketplace.com/search/bids",
    "virginia": "https://eva.virginia.gov/pages/eva-public-portal.htm",
    # Add all 50 states...
}

async def scrape_state_procurement(state: str, keywords: list[str]):
    """Scrape a state procurement portal for relevant opportunities."""
    portal_url = STATE_PORTALS.get(state)
    if not portal_url:
        return []
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": portal_url,
                "schema": {
                    "opportunities": [{
                        "id": "string - solicitation number",
                        "title": "string - opportunity title",
                        "agency": "string - issuing department",
                        "posted_date": "string - date posted",
                        "due_date": "string - proposal due date",
                        "category": "string - procurement category",
                        "description": "string - brief description",
                        "estimated_value": "number - estimated contract value if shown"
                    }]
                },
                "wait_for": "networkidle"
            }
        )
    return response.json().get("opportunities", [])
```
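scrape_state_procurement covers one portal at a time; looping over all fifty serially is slow, but unbounded concurrency is impolite to modest state infrastructure. A generic concurrency cap bridges the two. This is a sketch: `gather_limited` is a helper of our own, not part of any library shown above.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def gather_limited(
    factories: list[Callable[[], Awaitable[T]]],
    max_concurrent: int = 3,
) -> list[T]:
    """Run coroutine factories with at most `max_concurrent` in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(factory: Callable[[], Awaitable[T]]) -> T:
        async with sem:
            return await factory()

    return await asyncio.gather(*(run_one(f) for f in factories))

# Usage against the state scraper would look like:
# results = await gather_limited(
#     [lambda s=s: scrape_state_procurement(s, keywords) for s in STATE_PORTALS],
#     max_concurrent=3,
# )
```

Factories (zero-argument callables) rather than live coroutines keep the semaphore honest: a coroutine created eagerly would already exist before the cap applies, while a factory is only invoked once a slot opens.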
Advanced: Congressional Activity Monitoring
For organizations whose business is affected by legislation, monitoring congressional activity provides early warning of policy shifts and funding changes.
```python
async def monitor_congress(
    keywords: list[str],
    committees: Optional[list[str]] = None
) -> list[dict]:
    """Monitor Congress.gov for relevant bills and hearings."""
    all_bills = []
    for keyword in keywords:
        search_url = (
            f"https://www.congress.gov/search?"
            f"q=%7B%22source%22%3A%22legislation%22%2C"
            f"%22search%22%3A%22{keyword}%22%7D"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "bills": [{
                            "bill_number": "string - e.g., H.R.1234, S.567",
                            "title": "string - bill title",
                            "sponsor": "string - primary sponsor",
                            "status": "string - current status (introduced, passed house, etc.)",
                            "latest_action": "string - most recent action",
                            "latest_action_date": "string - date of latest action",
                            "committees": "string - committees of jurisdiction"
                        }]
                    }
                }
            )
        # Accumulate across keywords; returning inside the loop would drop
        # all but the last keyword's results
        all_bills.extend(response.json().get("bills", []))
    # Filter for bills that have moved beyond introduction
    return [b for b in all_bills if b.get("status") not in ["introduced"]]
```
Cost Comparison: Traditional vs. AI Agent Approach
| Platform | Monthly Cost | Coverage | Real-Time Alerts | AI Analysis |
|---|---|---|---|---|
| GovWin IQ (Deltek) | $400–$2,100/mo | Federal + state | Yes | Limited |
| Bloomberg Government | $660–$1,250/mo | Federal + congressional | Yes | No |
| GovTribe | $100–$800/mo | Federal only | Yes | Basic |
| Deltek | $250–$1,700/mo | Federal + state | Yes | Limited |
| AI Agent + Mantis | $29–$299/mo | Federal + state + local | Yes | GPT-4o powered |
Use Cases by Organization Type
1. Government Contractors (Small & Mid-Size)
The primary use case. Monitor SAM.gov for opportunities matching your NAICS codes, track set-asides (8(a), HUBZone, SDVOSB), and get AI-powered bid/no-bid recommendations. The system replaces expensive GovWin subscriptions while adding strategic analysis.
2. Nonprofits & Research Institutions
Track Grants.gov across multiple categories, monitor NSF/NIH/DOE funding announcements, and get early warning of new grant programs. The AI can match grant requirements against your organization's capabilities and past awards.
3. Policy & Advocacy Organizations
Monitor the Federal Register for proposed rules affecting your industry, track congressional bills through the legislative process, and detect regulatory changes before they're widely reported. Comment deadline tracking ensures you never miss a window to influence policy.
4. GovTech Startups
Analyze spending patterns to identify agencies that are increasing technology spending, find agencies with expiring contracts (recompete opportunities), and map the competitive landscape by tracking who wins which contracts.
Compliance & Ethical Considerations
- Public data: All federal government data sources mentioned here publish data explicitly for public access. There are no legal barriers to scraping government websites for public information.
- Rate limiting: Respect server capacity. Government websites often run on limited infrastructure. Space requests 2-5 seconds apart.
- API preference: When a government API exists (e.g., Federal Register API, USASpending API), prefer it over scraping. APIs are more reliable and reduce server load.
- No CUI/classified data: This system only processes publicly available, unclassified information. Never attempt to access controlled unclassified information (CUI) or classified systems.
- FOIA requests: For data not publicly available, file proper FOIA requests rather than attempting to scrape restricted areas.
- Bid integrity: AI analysis should inform decisions, not replace due diligence. Always have humans review bid/no-bid decisions and verify opportunity details on the source website.
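The 2-5 second spacing above is easy to enforce mechanically rather than by discipline. A per-host throttle sketch (the class name, jitter, and defaults are our own choices, not a standard API):

```python
import asyncio
import random
import time

class PoliteThrottle:
    """Enforce a minimum gap between requests to the same host, with jitter."""
    def __init__(self, min_gap: float = 2.0, max_gap: float = 5.0):
        self.min_gap, self.max_gap = min_gap, max_gap
        self._last: dict[str, float] = {}  # host -> last request time

    async def wait(self, host: str):
        # Pick a randomized gap so request timing doesn't look robotic
        gap = random.uniform(self.min_gap, self.max_gap)
        last = self._last.get(host)
        if last is not None:
            elapsed = time.monotonic() - last
            if elapsed < gap:
                await asyncio.sleep(gap - elapsed)
        self._last[host] = time.monotonic()

# Usage: await throttle.wait("sam.gov") before each request to that host
```

Because the gap is tracked per host, requests to SAM.gov and Grants.gov interleave freely while each individual site still sees a polite cadence.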
Start Tracking Government Opportunities Today
Mantis WebPerception API handles the complexity of scraping government portals, from legacy SAM.gov pages to modern USASpending.gov dashboards. Extract structured data from any government website with a single API call.
Get Your API Key →
What's Next?
You've built a government intelligence system that monitors procurement, grants, regulations, and spending automatically. Here are ways to extend it:
- Subcontractor matching: Scrape SBA's SubNet for subcontracting opportunities from large primes
- Past performance tracking: Monitor CPARS and competitor award histories
- GSA Schedule monitoring: Track GSA Advantage for pricing benchmarks and competitor offerings
- International: Extend to UN procurement, World Bank, EU tenders (TED)
- Teaming partner discovery: Use spending data to identify potential teaming partners based on complementary NAICS codes and agency relationships