Web Scraping for Government & Public Sector: How AI Agents Track Contracts, Grants & Policy Changes in 2026
Government spending in the United States alone exceeds $6.5 trillion annually, with federal procurement accounting for over $700 billion in contracts awarded each year. Add state and local spending, and the total government market represents the single largest buyer of goods and services on the planet.
Yet tracking government opportunities remains surprisingly manual. GovWin IQ ($5K–$25K/year), Bloomberg Government ($8K–$15K/seat), and Deltek ($3K–$20K/year) charge premium prices for intelligence that is, by definition, publicly available data. Federal contracts, grant announcements, regulatory changes, and spending reports are all published on government websites, but scattered across dozens of portals with inconsistent formats.
What if an AI agent could monitor SAM.gov for new contract opportunities, track Grants.gov for funding announcements, scrape the Federal Register for regulatory changes, and analyze USASpending.gov for spending trends, all automatically, for a fraction of the cost?
In this guide, you'll build an AI-powered government intelligence system that scrapes procurement opportunities, grant announcements, policy changes, and public spending data, then uses GPT-4o to generate opportunity assessments and strategic recommendations via Slack alerts.
Why AI Agents Are Transforming Government Intelligence
Government data is unique: it's almost entirely public by law. The Freedom of Information Act, open data mandates, and transparency requirements mean that an enormous amount of valuable intelligence is freely accessible, if you can find it, parse it, and analyze it at scale.
Traditional government intelligence platforms charge thousands per year for what is essentially structured access to public data. They add value through aggregation, search, and analysis: exactly what AI agents excel at.
The Government Data Landscape
- SAM.gov – System for Award Management: all federal contract opportunities, entity registrations, and exclusions
- Grants.gov – Federal grant announcements from 26+ agencies
- Federal Register – Proposed rules, final rules, notices, and executive orders
- USASpending.gov – Every federal dollar spent, searchable by agency, recipient, and program
- FPDS – Federal Procurement Data System: historical contract award data
- State procurement portals – 50 states, each with their own solicitation system
- Congress.gov – Bills, votes, committee hearings, and legislative history
- Regulations.gov – Public comments on proposed regulations
What Makes Government Scraping Different
Three properties set government data apart from typical scraping targets:
- It's legally public. FOIA and open data mandates remove the terms-of-service ambiguity that surrounds most commercial scraping.
- Formats are wildly inconsistent. Some agencies expose modern APIs; others serve legacy HTML that hasn't changed since the early 2000s.
- Infrastructure is often limited. Many portals run on modest servers, so polite rate limiting matters more here than on commercial sites.
Architecture: Building a Government Intelligence Agent
Our system follows a six-step pipeline that monitors multiple government data sources, extracts structured intelligence, detects relevant opportunities, and delivers AI-powered analysis:
- Source Discovery – Monitor SAM.gov, Grants.gov, Federal Register, USASpending.gov, state portals
- AI Extraction – Use Mantis WebPerception API to scrape and structure government pages
- SQLite Storage – Store opportunities, grants, regulations, and spending data locally
- Opportunity Detection – Identify relevant contracts, grants, and policy changes
- GPT-4o Analysis – Generate opportunity assessments, competitive landscape, and strategy
- Slack Alerts – Deliver actionable intelligence to your government affairs team
Step 1: Define Data Models
We'll track four core government data types: contract opportunities, grant announcements, regulatory changes, and spending records.
```python
from pydantic import BaseModel
from datetime import date
from typing import Optional
from enum import Enum

class ContractType(str, Enum):
    SOLICITATION = "solicitation"
    AWARD = "award"
    MODIFICATION = "modification"
    PRESOLICITATION = "presolicitation"
    SOURCES_SOUGHT = "sources_sought"
    SPECIAL_NOTICE = "special_notice"

class ContractOpportunity(BaseModel):
    """Federal contract opportunity from SAM.gov"""
    notice_id: str
    title: str
    agency: str
    sub_agency: Optional[str] = None
    type: ContractType
    posted_date: date
    response_deadline: Optional[date] = None
    naics_code: str
    naics_description: str
    set_aside: Optional[str] = None  # e.g., "Small Business", "8(a)", "HUBZone"
    place_of_performance: Optional[str] = None
    estimated_value: Optional[float] = None
    description_summary: str
    point_of_contact: Optional[str] = None
    url: str

class GrantAnnouncement(BaseModel):
    """Federal grant opportunity from Grants.gov"""
    opportunity_id: str
    title: str
    agency: str
    funding_instrument: str  # "Grant", "Cooperative Agreement", etc.
    category: str  # "Science & Technology", "Health", etc.
    posted_date: date
    close_date: Optional[date] = None
    estimated_funding: Optional[float] = None
    award_ceiling: Optional[float] = None
    award_floor: Optional[float] = None
    expected_awards: Optional[int] = None
    eligibility: str
    description_summary: str
    url: str

class RegulatoryChange(BaseModel):
    """Federal Register entry"""
    document_number: str
    title: str
    agency: str
    type: str  # "Proposed Rule", "Final Rule", "Notice", "Executive Order"
    published_date: date
    effective_date: Optional[date] = None
    comment_deadline: Optional[date] = None
    cfr_references: list[str]
    abstract: str
    impact_assessment: Optional[str] = None
    url: str

class SpendingRecord(BaseModel):
    """Federal spending from USASpending.gov"""
    award_id: str
    recipient_name: str
    recipient_state: Optional[str] = None
    awarding_agency: str
    funding_agency: str
    award_type: str
    award_amount: float
    period_of_performance_start: date
    period_of_performance_end: Optional[date] = None
    naics_code: Optional[str] = None
    description: str
    url: str
```
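Pydantic handles the type coercion for us: the scrapers in the next step pass ISO date strings and numeric strings straight into these models and get typed values out. A quick sketch with a trimmed-down stand-in model (the field subset and sample values here are illustrative, not from a real notice):

```python
from datetime import date
from typing import Optional
from pydantic import BaseModel

# Trimmed-down stand-in for ContractOpportunity, just to show coercion
class Opportunity(BaseModel):
    notice_id: str
    title: str
    posted_date: date                        # "2026-01-15" -> datetime.date
    response_deadline: Optional[date] = None
    estimated_value: Optional[float] = None  # "250000" -> 250000.0

raw = {
    "notice_id": "ABC123",
    "title": "Cloud Migration Support",
    "posted_date": "2026-01-15",
    "estimated_value": "250000",
}
opp = Opportunity(**raw)
print(type(opp.posted_date).__name__, opp.estimated_value)  # date 250000.0
```

If a scraped field can't be coerced (say, a garbled date), validation raises, which is exactly why the scrapers below wrap model construction in try/except.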
Step 2: Scrape Government Data Sources
Government websites are notoriously inconsistent in their formatting. Some offer APIs, others are legacy HTML from the early 2000s. The Mantis WebPerception API handles both, extracting structured data from any government page regardless of its technical stack.
Scraping SAM.gov Contract Opportunities
```python
import httpx
from datetime import datetime, timedelta
from typing import Optional

MANTIS_API_KEY = "your-mantis-api-key"
MANTIS_BASE = "https://api.mantisapi.com/v1"

async def scrape_sam_opportunities(
    naics_codes: list[str],
    set_asides: Optional[list[str]] = None,
    days_back: int = 7
) -> list[ContractOpportunity]:
    """Scrape SAM.gov for contract opportunities matching criteria."""
    opportunities = []
    for naics in naics_codes:
        # Build SAM.gov search URL (set_asides and days_back can be folded
        # into additional query parameters as needed)
        search_url = (
            f"https://sam.gov/search/?index=opp&page=1&pageSize=25"
            f"&sort=-modifiedDate&sfm%5Bstatus%5D%5Bis_active%5D=true"
            f"&sfm%5BsimpleSearch%5D%5BkeywordRadio%5D=ALL"
            f"&sfm%5BsimpleSearch%5D%5BkeywordTags%5D%5B0%5D%5Bkey%5D={naics}"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            # Use Mantis to extract structured data from SAM.gov
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "opportunities": [{
                            "notice_id": "string - SAM.gov notice ID",
                            "title": "string - opportunity title",
                            "agency": "string - awarding agency",
                            "type": "string - solicitation type",
                            "posted_date": "string - date posted (YYYY-MM-DD)",
                            "response_deadline": "string - response due date",
                            "set_aside": "string - small business set-aside type if any",
                            "description_summary": "string - brief description"
                        }]
                    },
                    "wait_for": "networkidle"
                }
            )
        data = response.json()
        for opp in data.get("opportunities", []):
            try:
                opportunity = ContractOpportunity(
                    notice_id=opp["notice_id"],
                    title=opp["title"],
                    agency=opp["agency"],
                    sub_agency=opp.get("sub_agency"),
                    type=opp.get("type", "solicitation"),
                    posted_date=opp["posted_date"],
                    response_deadline=opp.get("response_deadline"),
                    naics_code=naics,
                    naics_description=opp.get("naics_description", ""),
                    set_aside=opp.get("set_aside"),
                    place_of_performance=opp.get("place_of_performance"),
                    estimated_value=opp.get("estimated_value"),
                    description_summary=opp["description_summary"],
                    point_of_contact=opp.get("point_of_contact"),
                    url=f"https://sam.gov/opp/{opp['notice_id']}/view"
                )
                opportunities.append(opportunity)
            except Exception as e:
                print(f"Parse error: {e}")
    return opportunities
```
Scraping Grants.gov
```python
async def scrape_grants(
    categories: list[str],
    eligibility_filter: Optional[str] = None,
    min_funding: Optional[float] = None
) -> list[GrantAnnouncement]:
    """Scrape Grants.gov for funding opportunities."""
    grants = []
    for category in categories:
        search_url = (
            f"https://www.grants.gov/search-grants?"
            f"category={category}&oppStatuses=forecasted|posted"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "grants": [{
                            "opportunity_id": "string - Grants.gov opportunity number",
                            "title": "string - grant title",
                            "agency": "string - funding agency",
                            "funding_instrument": "string - grant, cooperative agreement, etc.",
                            "posted_date": "string - post date (YYYY-MM-DD)",
                            "close_date": "string - application deadline",
                            "estimated_funding": "number - total program funding in dollars",
                            "award_ceiling": "number - maximum individual award",
                            "award_floor": "number - minimum individual award",
                            "expected_awards": "number - estimated number of awards",
                            "eligibility": "string - eligible applicant types",
                            "description_summary": "string - brief description"
                        }]
                    },
                    "wait_for": "networkidle"
                }
            )
        data = response.json()
        for grant in data.get("grants", []):
            try:
                # Guard against missing/null funding before comparing
                if min_funding and (grant.get("estimated_funding") or 0) < min_funding:
                    continue
                if eligibility_filter and eligibility_filter.lower() not in (grant.get("eligibility") or "").lower():
                    continue
                announcement = GrantAnnouncement(
                    opportunity_id=grant["opportunity_id"],
                    title=grant["title"],
                    agency=grant["agency"],
                    funding_instrument=grant.get("funding_instrument", "Grant"),
                    category=category,
                    posted_date=grant["posted_date"],
                    close_date=grant.get("close_date"),
                    estimated_funding=grant.get("estimated_funding"),
                    award_ceiling=grant.get("award_ceiling"),
                    award_floor=grant.get("award_floor"),
                    expected_awards=grant.get("expected_awards"),
                    eligibility=grant.get("eligibility", ""),
                    description_summary=grant["description_summary"],
                    url=f"https://www.grants.gov/view-opportunity/{grant['opportunity_id']}"
                )
                grants.append(announcement)
            except Exception as e:
                print(f"Parse error: {e}")
    return grants
```
Scraping the Federal Register
```python
async def scrape_federal_register(
    agencies: list[str],
    document_types: Optional[list[str]] = None,
    days_back: int = 7
) -> list[RegulatoryChange]:
    """Scrape Federal Register for regulatory changes.

    Note: the Federal Register has a good API (federalregister.gov/api/v1),
    but we use Mantis for consistency and to handle pages without API coverage.
    """
    regulations = []
    start_date = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")
    for agency in agencies:
        search_url = (
            f"https://www.federalregister.gov/documents/search?"
            f"conditions%5Bagencies%5D%5B%5D={agency}"
            f"&conditions%5Bpublication_date%5D%5Bgte%5D={start_date}"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "documents": [{
                            "document_number": "string - FR document number",
                            "title": "string - document title",
                            "agency": "string - issuing agency",
                            "type": "string - Proposed Rule, Final Rule, Notice, etc.",
                            "published_date": "string - publication date (YYYY-MM-DD)",
                            "effective_date": "string - when rule takes effect",
                            "comment_deadline": "string - deadline for public comments",
                            "abstract": "string - document summary/abstract"
                        }]
                    }
                }
            )
        data = response.json()
        for doc in data.get("documents", []):
            try:
                reg = RegulatoryChange(
                    document_number=doc["document_number"],
                    title=doc["title"],
                    agency=doc.get("agency", agency),
                    type=doc.get("type", "Notice"),
                    published_date=doc["published_date"],
                    effective_date=doc.get("effective_date"),
                    comment_deadline=doc.get("comment_deadline"),
                    cfr_references=doc.get("cfr_references", []),
                    abstract=doc.get("abstract", ""),
                    impact_assessment=doc.get("impact_assessment"),
                    url=f"https://www.federalregister.gov/d/{doc['document_number']}"
                )
                regulations.append(reg)
            except Exception as e:
                print(f"Parse error: {e}")
    return regulations
```
Scraping USASpending.gov
```python
async def scrape_spending(
    agencies: Optional[list[str]] = None,
    naics_codes: Optional[list[str]] = None,
    min_amount: float = 100000,
    days_back: int = 30
) -> list[SpendingRecord]:
    """Scrape USASpending.gov for recent federal awards.

    The agencies/naics_codes filters can be folded into the search URL;
    here we filter on award amount after extraction.
    """
    spending = []
    start_date = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")
    search_url = (
        f"https://www.usaspending.gov/search?"
        f"hash=&time_period%5B0%5D%5Bstart_date%5D={start_date}"
    )
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": search_url,
                "schema": {
                    "awards": [{
                        "award_id": "string - unique award identifier",
                        "recipient_name": "string - company or org receiving the award",
                        "recipient_state": "string - recipient state",
                        "awarding_agency": "string - agency making the award",
                        "award_type": "string - contract, grant, loan, etc.",
                        "award_amount": "number - total award amount in dollars",
                        "description": "string - award description",
                        "naics_code": "string - NAICS code if applicable"
                    }]
                },
                "wait_for": "networkidle"
            }
        )
    data = response.json()
    for award in data.get("awards", []):
        try:
            if (award.get("award_amount") or 0) < min_amount:
                continue
            record = SpendingRecord(
                award_id=award["award_id"],
                recipient_name=award["recipient_name"],
                recipient_state=award.get("recipient_state"),
                awarding_agency=award["awarding_agency"],
                funding_agency=award.get("funding_agency", award["awarding_agency"]),
                award_type=award.get("award_type", "contract"),
                award_amount=award["award_amount"],
                # Fall back to the search-window start if the page doesn't show it
                period_of_performance_start=award.get("period_of_performance_start", start_date),
                period_of_performance_end=award.get("period_of_performance_end"),
                naics_code=award.get("naics_code"),
                description=award.get("description", ""),
                url=f"https://www.usaspending.gov/award/{award['award_id']}"
            )
            spending.append(record)
        except Exception as e:
            print(f"Parse error: {e}")
    return spending
```
Step 3: Store in SQLite
```python
import sqlite3

def init_gov_database(db_path: str = "gov_intelligence.db"):
    """Initialize SQLite database for government data."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS contract_opportunities (
            notice_id TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            agency TEXT NOT NULL,
            sub_agency TEXT,
            type TEXT,
            posted_date TEXT,
            response_deadline TEXT,
            naics_code TEXT,
            naics_description TEXT,
            set_aside TEXT,
            place_of_performance TEXT,
            estimated_value REAL,
            description_summary TEXT,
            point_of_contact TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            ai_assessment TEXT
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS grant_announcements (
            opportunity_id TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            agency TEXT NOT NULL,
            funding_instrument TEXT,
            category TEXT,
            posted_date TEXT,
            close_date TEXT,
            estimated_funding REAL,
            award_ceiling REAL,
            award_floor REAL,
            expected_awards INTEGER,
            eligibility TEXT,
            description_summary TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            ai_assessment TEXT
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS regulatory_changes (
            document_number TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            agency TEXT NOT NULL,
            type TEXT,
            published_date TEXT,
            effective_date TEXT,
            comment_deadline TEXT,
            cfr_references TEXT,
            abstract TEXT,
            impact_assessment TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP,
            ai_analysis TEXT
        )
    """)
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS spending_records (
            award_id TEXT PRIMARY KEY,
            recipient_name TEXT,
            recipient_state TEXT,
            awarding_agency TEXT,
            funding_agency TEXT,
            award_type TEXT,
            award_amount REAL,
            period_of_performance_start TEXT,
            period_of_performance_end TEXT,
            naics_code TEXT,
            description TEXT,
            url TEXT,
            first_seen TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.commit()
    return conn
```
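The insert side is worth sketching too: keying on the primary key with `INSERT OR IGNORE` means re-running the pipeline never duplicates a notice. `store_contracts` is a helper of our own, and the demo below uses a trimmed in-memory schema rather than the full table so it runs standalone:

```python
import sqlite3
from types import SimpleNamespace

def store_contracts(conn, contracts) -> int:
    """Insert contract opportunities, skipping notice_ids already recorded.
    Returns the number of newly inserted rows."""
    cur = conn.cursor()
    inserted = 0
    for c in contracts:
        cur.execute(
            """INSERT OR IGNORE INTO contract_opportunities
               (notice_id, title, agency, type, posted_date, response_deadline,
                naics_code, set_aside, estimated_value, description_summary, url)
               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            (c.notice_id, c.title, c.agency, c.type, str(c.posted_date),
             str(c.response_deadline) if c.response_deadline else None,
             c.naics_code, c.set_aside, c.estimated_value,
             c.description_summary, c.url),
        )
        inserted += cur.rowcount  # 0 when the notice_id was already present
    conn.commit()
    return inserted

# Minimal in-memory demo (schema trimmed to the inserted columns)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE contract_opportunities (
    notice_id TEXT PRIMARY KEY, title TEXT, agency TEXT, type TEXT,
    posted_date TEXT, response_deadline TEXT, naics_code TEXT,
    set_aside TEXT, estimated_value REAL, description_summary TEXT, url TEXT)""")
opp = SimpleNamespace(
    notice_id="N1", title="Cloud Support", agency="GSA", type="solicitation",
    posted_date="2026-01-10", response_deadline=None, naics_code="541512",
    set_aside=None, estimated_value=250000.0,
    description_summary="Cloud migration support",
    url="https://sam.gov/opp/N1/view")
print(store_contracts(conn, [opp]))   # 1 (new row)
print(store_contracts(conn, [opp]))   # 0 (already seen)
```

The same pattern applies to the grant, regulatory, and spending tables, keyed on their respective primary keys.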
Step 4: Detect Relevant Opportunities
Not every government action matters to your organization. The detection layer filters for opportunities that match your business profile, capability areas, and strategic interests.
```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GovAlert:
    alert_type: str  # "new_contract", "grant_opportunity", "regulatory_change", "large_award"
    severity: str  # "high", "medium", "low"
    title: str
    details: str
    deadline: Optional[str] = None
    estimated_value: Optional[float] = None
    url: Optional[str] = None

def detect_opportunities(
    contracts: list[ContractOpportunity],
    grants: list[GrantAnnouncement],
    regulations: list[RegulatoryChange],
    spending: list[SpendingRecord],
    company_naics: list[str],
    company_keywords: list[str],
    min_contract_value: float = 50000
) -> list[GovAlert]:
    """Detect government opportunities relevant to your organization."""
    alerts = []
    keywords_lower = [k.lower() for k in company_keywords]

    # Check new contract opportunities
    for contract in contracts:
        relevance_score = 0
        # NAICS code match
        if contract.naics_code in company_naics:
            relevance_score += 3
        # Keyword match in title or description
        text = f"{contract.title} {contract.description_summary}".lower()
        keyword_matches = sum(1 for kw in keywords_lower if kw in text)
        relevance_score += keyword_matches
        # Value threshold
        if contract.estimated_value and contract.estimated_value >= min_contract_value:
            relevance_score += 1
        # Small business set-aside bonus
        if contract.set_aside:
            relevance_score += 1
        if relevance_score >= 2:
            severity = "high" if relevance_score >= 4 else "medium"
            # Build details line by line so a missing optional field
            # doesn't blank the whole block
            detail_lines = [
                f"Agency: {contract.agency}",
                f"NAICS: {contract.naics_code} - {contract.naics_description}",
                f"Set-aside: {contract.set_aside or 'Full & Open'}",
            ]
            if contract.estimated_value:
                detail_lines.append(f"Est. value: ${contract.estimated_value:,.0f}")
            alerts.append(GovAlert(
                alert_type="new_contract",
                severity=severity,
                title=f"New Contract: {contract.title}",
                details="\n".join(detail_lines),
                deadline=str(contract.response_deadline) if contract.response_deadline else None,
                estimated_value=contract.estimated_value,
                url=contract.url
            ))

    # Check grant opportunities
    for grant in grants:
        text = f"{grant.title} {grant.description_summary}".lower()
        keyword_matches = sum(1 for kw in keywords_lower if kw in text)
        if keyword_matches >= 1:
            severity = "high" if grant.estimated_funding and grant.estimated_funding > 1000000 else "medium"
            detail_lines = [f"Agency: {grant.agency}"]
            if grant.estimated_funding:
                detail_lines.append(f"Funding: ${grant.estimated_funding:,.0f}")
            if grant.award_floor and grant.award_ceiling:
                detail_lines.append(f"Award range: ${grant.award_floor:,.0f}-${grant.award_ceiling:,.0f}")
            if grant.expected_awards:
                detail_lines.append(f"Expected awards: {grant.expected_awards}")
            alerts.append(GovAlert(
                alert_type="grant_opportunity",
                severity=severity,
                title=f"Grant: {grant.title}",
                details="\n".join(detail_lines),
                deadline=str(grant.close_date) if grant.close_date else None,
                estimated_value=grant.estimated_funding,
                url=grant.url
            ))

    # Check regulatory changes
    for reg in regulations:
        text = f"{reg.title} {reg.abstract}".lower()
        keyword_matches = sum(1 for kw in keywords_lower if kw in text)
        if keyword_matches >= 1 or reg.type in ["Final Rule", "Executive Order"]:
            severity = "high" if reg.type in ["Final Rule", "Executive Order"] else "medium"
            alerts.append(GovAlert(
                alert_type="regulatory_change",
                severity=severity,
                title=f"{reg.type}: {reg.title}",
                details=(
                    f"Agency: {reg.agency}\n"
                    f"Effective: {reg.effective_date or 'TBD'}\n"
                    f"Comment deadline: {reg.comment_deadline or 'N/A'}\n"
                    f"{reg.abstract[:300]}..."
                ),
                deadline=str(reg.comment_deadline) if reg.comment_deadline else None,
                url=reg.url
            ))

    # Check large spending awards (competitive intelligence)
    for record in spending:
        if record.award_amount >= 1000000:
            text = record.description.lower()
            keyword_matches = sum(1 for kw in keywords_lower if kw in text)
            if keyword_matches >= 1 or record.naics_code in company_naics:
                alerts.append(GovAlert(
                    alert_type="large_award",
                    severity="medium",
                    title=f"Award: ${record.award_amount:,.0f} to {record.recipient_name}",
                    details=(
                        f"Agency: {record.awarding_agency}\n"
                        f"Recipient: {record.recipient_name} ({record.recipient_state})\n"
                        f"Type: {record.award_type}\n"
                        f"Description: {record.description[:200]}"
                    ),
                    estimated_value=record.award_amount,
                    url=record.url
                ))

    # Sort by severity (high first)
    severity_order = {"high": 0, "medium": 1, "low": 2}
    alerts.sort(key=lambda a: severity_order.get(a.severity, 2))
    return alerts
```
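One practical refinement: gate alerting on whether an item is genuinely new, so the same notice doesn't page the team on every run. A minimal sketch against the SQLite tables from Step 3 (`is_new` is a helper name of our own; the demo table is trimmed to one column so the snippet runs standalone):

```python
import sqlite3

def is_new(conn: sqlite3.Connection, table: str, key_col: str, key: str) -> bool:
    """True if `key` hasn't been recorded yet; check this before alerting."""
    row = conn.execute(
        f"SELECT 1 FROM {table} WHERE {key_col} = ? LIMIT 1", (key,)
    ).fetchone()
    return row is None

# Throwaway demo with an in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contract_opportunities (notice_id TEXT PRIMARY KEY)")
print(is_new(conn, "contract_opportunities", "notice_id", "N1"))  # True
conn.execute("INSERT INTO contract_opportunities VALUES ('N1')")
print(is_new(conn, "contract_opportunities", "notice_id", "N1"))  # False
```

Because `first_seen` defaults to the insert timestamp, the same table also tells you how long an opportunity has been on your radar.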
Step 5: AI-Powered Analysis with GPT-4o
Raw opportunity data is useful, but AI analysis transforms it into strategic intelligence. GPT-4o evaluates each opportunity against your company's capabilities and provides win probability, competitive landscape, and recommended actions.
```python
import json
from openai import OpenAI
from typing import Optional

openai_client = OpenAI()

async def analyze_opportunity(
    alert: GovAlert,
    company_profile: dict,
    historical_wins: Optional[list[dict]] = None
) -> str:
    """Use GPT-4o to analyze a government opportunity."""
    context = f"""You are a government contracts intelligence analyst. Analyze this
opportunity for a company with the following profile:

Company: {company_profile.get('name', 'N/A')}
NAICS codes: {', '.join(company_profile.get('naics_codes', []))}
Capabilities: {', '.join(company_profile.get('capabilities', []))}
Past performance: {company_profile.get('past_performance', 'Limited')}
Small business certifications: {', '.join(company_profile.get('certifications', []))}
Target agencies: {', '.join(company_profile.get('target_agencies', []))}
{"Historical wins: " + json.dumps(historical_wins[:5]) if historical_wins else "No historical win data."}

OPPORTUNITY:
Type: {alert.alert_type}
Title: {alert.title}
Details: {alert.details}
Deadline: {alert.deadline or 'Not specified'}
Estimated Value: {f"${alert.estimated_value:,.0f}" if alert.estimated_value else 'Not disclosed'}

Provide:
1. RELEVANCE SCORE (1-10): How well does this match the company's capabilities?
2. WIN PROBABILITY: Low / Medium / High, with reasoning
3. COMPETITIVE LANDSCAPE: Who are likely competitors? Is this wired for an incumbent?
4. KEY REQUIREMENTS: What capabilities/certifications are needed?
5. RECOMMENDED ACTIONS: Specific next steps (teaming, capability statements, etc.)
6. RED FLAGS: Any concerns (unrealistic timeline, budget mismatch, etc.)
7. DEADLINE STRATEGY: If there's a deadline, what's the timeline for a quality response?

Be direct and actionable. Flag if this looks like a recompete (incumbent advantage) or a new requirement (open field)."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": context}],
        temperature=0.3,
        max_tokens=1000
    )
    return response.choices[0].message.content

async def analyze_spending_trends(
    spending: list[SpendingRecord],
    focus_area: str
) -> str:
    """Analyze federal spending trends in a focus area."""
    # Aggregate by agency
    agency_totals = {}
    for record in spending:
        agency = record.awarding_agency
        agency_totals[agency] = agency_totals.get(agency, 0) + record.award_amount
    top_agencies = sorted(agency_totals.items(), key=lambda x: x[1], reverse=True)[:10]

    # Aggregate by recipient
    recipient_totals = {}
    for record in spending:
        name = record.recipient_name
        recipient_totals[name] = recipient_totals.get(name, 0) + record.award_amount
    top_recipients = sorted(recipient_totals.items(), key=lambda x: x[1], reverse=True)[:10]

    context = f"""Analyze federal spending trends for: {focus_area}

Top agencies by spending:
{chr(10).join(f"  {a}: ${v:,.0f}" for a, v in top_agencies)}

Top recipients:
{chr(10).join(f"  {r}: ${v:,.0f}" for r, v in top_recipients)}

Total awards analyzed: {len(spending)}
Total value: ${sum(r.award_amount for r in spending):,.0f}

Provide:
1. SPENDING TRENDS: Where is money flowing? Increasing or decreasing?
2. DOMINANT PLAYERS: Who are the major recipients? Are there incumbents to be aware of?
3. OPPORTUNITY GAPS: Where might a new entrant find opportunities?
4. AGENCY PRIORITIES: What do spending patterns reveal about agency priorities?
5. STRATEGIC RECOMMENDATIONS: How should a company position itself?"""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": context}],
        temperature=0.3,
        max_tokens=800
    )
    return response.choices[0].message.content
```
Step 6: Slack Alerts
```python
import httpx

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

async def send_gov_alert(alert: GovAlert, ai_analysis: str):
    """Send government intelligence alert to Slack."""
    emoji_map = {
        "new_contract": "📋",
        "grant_opportunity": "💰",
        "regulatory_change": "⚖️",
        "large_award": "📊"
    }
    severity_color = {  # for legacy attachment-style color bars, if you use them
        "high": "#ff4444",
        "medium": "#ffaa00",
        "low": "#44aa44"
    }
    emoji = emoji_map.get(alert.alert_type, "🏛️")
    # Build the value line separately so a missing value doesn't blank the block
    value_line = (
        f"*Est. Value:* ${alert.estimated_value:,.0f}\n"
        if alert.estimated_value else ""
    )
    message = {
        "blocks": [
            {
                "type": "header",
                "text": {
                    "type": "plain_text",
                    "text": f"{emoji} {alert.title}"
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f"*Severity:* {alert.severity.upper()}\n"
                        f"*Deadline:* {alert.deadline or 'N/A'}\n"
                        f"{value_line}"
                        f"\n{alert.details}"
                    )
                }
            },
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*🤖 AI Analysis:*\n{ai_analysis[:2000]}"
                }
            }
        ]
    }
    if alert.url:
        message["blocks"].append({
            "type": "actions",
            "elements": [{
                "type": "button",
                "text": {"type": "plain_text", "text": "View Opportunity"},
                "url": alert.url
            }]
        })
    async with httpx.AsyncClient() as client:
        await client.post(SLACK_WEBHOOK, json=message)
```
Putting It All Together
```python
import asyncio

async def run_gov_intelligence_pipeline():
    """Main pipeline: scrape → store → detect → analyze → alert."""
    # Company profile for opportunity matching
    company_profile = {
        "name": "Your Company",
        "naics_codes": ["541512", "541519", "518210", "541511"],  # IT services
        "capabilities": [
            "cloud migration", "cybersecurity", "data analytics",
            "AI/ML", "software development", "DevSecOps"
        ],
        "certifications": ["8(a)", "HUBZone", "SDVOSB"],
        "past_performance": "3 federal contracts completed (GSA, DoD, HHS)",
        "target_agencies": ["DOD", "DHS", "HHS", "GSA", "VA"]
    }
    keywords = [
        "artificial intelligence", "machine learning", "cloud",
        "cybersecurity", "data analytics", "software development",
        "automation", "API", "web scraping", "data extraction"
    ]

    # 1. Scrape all sources
    print("🏛️ Scraping SAM.gov...")
    contracts = await scrape_sam_opportunities(
        naics_codes=company_profile["naics_codes"],
        days_back=7
    )
    print("💰 Scraping Grants.gov...")
    grants = await scrape_grants(
        categories=["science-technology", "information-technology"],
        min_funding=100000
    )
    print("⚖️ Scraping Federal Register...")
    regulations = await scrape_federal_register(
        agencies=["defense-department", "homeland-security", "health-human-services"],
        days_back=7
    )
    print("💵 Scraping USASpending.gov...")
    spending = await scrape_spending(
        naics_codes=company_profile["naics_codes"],
        min_amount=100000,
        days_back=30
    )

    # 2. Store in database
    db = init_gov_database()
    # ... insert records ...

    # 3. Detect relevant opportunities
    alerts = detect_opportunities(
        contracts, grants, regulations, spending,
        company_naics=company_profile["naics_codes"],
        company_keywords=keywords
    )
    print(f"\n🔍 Found {len(alerts)} relevant opportunities")

    # 4. Analyze and alert
    for alert in alerts[:10]:  # Top 10 most relevant
        analysis = await analyze_opportunity(alert, company_profile)
        await send_gov_alert(alert, analysis)
        print(f"  ✅ Alerted: {alert.title}")

    # 5. Spending trend analysis
    if spending:
        trends = await analyze_spending_trends(spending, "IT Services & AI/ML")
        # Send trend report to Slack
        print("\n📊 Spending analysis complete")

if __name__ == "__main__":
    asyncio.run(run_gov_intelligence_pipeline())
```
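run_gov_intelligence_pipeline() is a one-shot coroutine; in production you'd run it on a schedule. A cron entry is simplest, but a small supervisor loop works too. A sketch under our own assumptions (the interval and error handling are not part of the pipeline above):

```python
import asyncio

async def run_on_interval(pipeline, interval_seconds: float):
    """Run `pipeline` repeatedly; one failed scrape shouldn't kill the daemon."""
    while True:
        try:
            await pipeline()
        except Exception as e:  # CancelledError still propagates (BaseException)
            print(f"Pipeline run failed: {e}")
        await asyncio.sleep(interval_seconds)

# Daily run:
# asyncio.run(run_on_interval(run_gov_intelligence_pipeline, 24 * 60 * 60))
```

A daily cadence matches how often most of these portals update; SAM.gov and the Federal Register post new material on business days.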
Advanced: State & Local Government Monitoring
Federal opportunities are only part of the picture. State and local government spending exceeds $3.8 trillion annually, and many contracts are under thresholds that attract less competition from large primes.
```python
# State procurement portals vary widely; here's a multi-state approach
STATE_PORTALS = {
    "california": "https://caleprocure.ca.gov/pages/Events-BS3/event-search.aspx",
    "texas": "https://www.txsmartbuy.com/sp",
    "new_york": "https://www.ogs.ny.gov/procurement",
    "florida": "https://vendor.myfloridamarketplace.com/search/bids",
    "virginia": "https://eva.virginia.gov/pages/eva-public-portal.htm",
    # Add all 50 states...
}

async def scrape_state_procurement(state: str, keywords: list[str]):
    """Scrape a state procurement portal for relevant opportunities."""
    portal_url = STATE_PORTALS.get(state)
    if not portal_url:
        return []
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            f"{MANTIS_BASE}/extract",
            headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
            json={
                "url": portal_url,
                "schema": {
                    "opportunities": [{
                        "id": "string - solicitation number",
                        "title": "string - opportunity title",
                        "agency": "string - issuing department",
                        "posted_date": "string - date posted",
                        "due_date": "string - proposal due date",
                        "category": "string - procurement category",
                        "description": "string - brief description",
                        "estimated_value": "number - estimated contract value if shown"
                    }]
                },
                "wait_for": "networkidle"
            }
        )
    return response.json().get("opportunities", [])
```
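scrape_state_procurement covers one portal at a time; looping over all fifty serially is slow, but unbounded concurrency is impolite to modest state infrastructure. A generic concurrency cap bridges the two. This is a sketch: `gather_limited` is a helper of our own, not part of any library shown above.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def gather_limited(
    factories: list[Callable[[], Awaitable[T]]],
    max_concurrent: int = 3,
) -> list[T]:
    """Run coroutine factories with at most `max_concurrent` in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def run_one(factory: Callable[[], Awaitable[T]]) -> T:
        async with sem:
            return await factory()

    return await asyncio.gather(*(run_one(f) for f in factories))

# Usage against the state scraper would look like:
# results = await gather_limited(
#     [lambda s=s: scrape_state_procurement(s, keywords) for s in STATE_PORTALS],
#     max_concurrent=3,
# )
```

Factories (zero-argument callables) rather than live coroutines keep the semaphore honest: a coroutine created eagerly would already exist before the cap applies, while a factory is only invoked once a slot opens.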
Advanced: Congressional Activity Monitoring
For organizations whose business is affected by legislation, monitoring congressional activity provides early warning of policy shifts and funding changes.
```python
async def monitor_congress(
    keywords: list[str],
    committees: Optional[list[str]] = None
) -> list[dict]:
    """Monitor Congress.gov for relevant bills and hearings."""
    all_bills = []
    for keyword in keywords:
        search_url = (
            f"https://www.congress.gov/search?"
            f"q=%7B%22source%22%3A%22legislation%22%2C"
            f"%22search%22%3A%22{keyword}%22%7D"
        )
        async with httpx.AsyncClient(timeout=60.0) as client:
            response = await client.post(
                f"{MANTIS_BASE}/extract",
                headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
                json={
                    "url": search_url,
                    "schema": {
                        "bills": [{
                            "bill_number": "string - e.g., H.R.1234, S.567",
                            "title": "string - bill title",
                            "sponsor": "string - primary sponsor",
                            "status": "string - current status (introduced, passed house, etc.)",
                            "latest_action": "string - most recent action",
                            "latest_action_date": "string - date of latest action",
                            "committees": "string - committees of jurisdiction"
                        }]
                    }
                }
            )
        # Accumulate across keywords; returning inside the loop would drop
        # all but the last keyword's results
        all_bills.extend(response.json().get("bills", []))
    # Filter for bills that have moved beyond introduction
    return [b for b in all_bills if b.get("status") not in ["introduced"]]
```
Cost Comparison: Traditional vs. AI Agent Approach
| Platform | Monthly Cost | Coverage | Real-Time Alerts | AI Analysis |
|---|---|---|---|---|
| GovWin IQ (Deltek) | $400–$2,100/mo | Federal + state | Yes | Limited |
| Bloomberg Government | $660–$1,250/mo | Federal + congressional | Yes | No |
| GovTribe | $100–$800/mo | Federal only | Yes | Basic |
| Deltek | $250–$1,700/mo | Federal + state | Yes | Limited |
| AI Agent + Mantis | $29–$299/mo | Federal + state + local | Yes | GPT-4o powered |
Use Cases by Organization Type
1. Government Contractors (Small & Mid-Size)
The primary use case. Monitor SAM.gov for opportunities matching your NAICS codes, track set-asides (8(a), HUBZone, SDVOSB), and get AI-powered bid/no-bid recommendations. The system replaces expensive GovWin subscriptions while adding strategic analysis.
2. Nonprofits & Research Institutions
Track Grants.gov across multiple categories, monitor NSF/NIH/DOE funding announcements, and get early warning of new grant programs. The AI can match grant requirements against your organization's capabilities and past awards.
3. Policy & Advocacy Organizations
Monitor the Federal Register for proposed rules affecting your industry, track congressional bills through the legislative process, and detect regulatory changes before they're widely reported. Comment deadline tracking ensures you never miss a window to influence policy.
4. GovTech Startups
Analyze spending patterns to identify agencies that are increasing technology spending, find agencies with expiring contracts (recompete opportunities), and map the competitive landscape by tracking who wins which contracts.
Compliance & Ethical Considerations
- Public data: All federal government data sources mentioned here publish data explicitly for public access. There are no legal barriers to scraping government websites for public information.
- Rate limiting: Respect server capacity. Government websites often run on limited infrastructure. Space requests 2-5 seconds apart.
- API preference: When a government API exists (e.g., Federal Register API, USASpending API), prefer it over scraping. APIs are more reliable and reduce server load.
- No CUI/classified data: This system only processes publicly available, unclassified information. Never attempt to access controlled unclassified information (CUI) or classified systems.
- FOIA requests: For data not publicly available, file proper FOIA requests rather than attempting to scrape restricted areas.
- Bid integrity: AI analysis should inform decisions, not replace due diligence. Always have humans review bid/no-bid decisions and verify opportunity details on the source website.
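The 2-5 second spacing above is easy to enforce mechanically rather than by discipline. A per-host throttle sketch (the class name, jitter, and defaults are our own choices, not a standard API):

```python
import asyncio
import random
import time

class PoliteThrottle:
    """Enforce a minimum gap between requests to the same host, with jitter."""
    def __init__(self, min_gap: float = 2.0, max_gap: float = 5.0):
        self.min_gap, self.max_gap = min_gap, max_gap
        self._last: dict[str, float] = {}  # host -> last request time

    async def wait(self, host: str):
        # Pick a randomized gap so request timing doesn't look robotic
        gap = random.uniform(self.min_gap, self.max_gap)
        last = self._last.get(host)
        if last is not None:
            elapsed = time.monotonic() - last
            if elapsed < gap:
                await asyncio.sleep(gap - elapsed)
        self._last[host] = time.monotonic()

# Usage: await throttle.wait("sam.gov") before each request to that host
```

Because the gap is tracked per host, requests to SAM.gov and Grants.gov interleave freely while each individual site still sees a polite cadence.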
Start Tracking Government Opportunities Today
Mantis WebPerception API handles the complexity of scraping government portals, from legacy SAM.gov pages to modern USASpending.gov dashboards. Extract structured data from any government website with a single API call.
Get Your API Key →
What's Next?
You've built a government intelligence system that monitors procurement, grants, regulations, and spending automatically. Here are ways to extend it:
- Subcontractor matching: Scrape SBA's SubNet for subcontracting opportunities from large primes
- Past performance tracking: Monitor CPARS and competitor award histories
- GSA Schedule monitoring: Track GSA Advantage for pricing benchmarks and competitor offerings
- International: Extend to UN procurement, World Bank, EU tenders (TED)
- Teaming partner discovery: Use spending data to identify potential teaming partners based on complementary NAICS codes and agency relationships