Web Scraping for Insurance & InsurTech: How AI Agents Track Premiums, Claims & Risk Data in 2026
The global insurance industry generates over $6 trillion in annual premiums, making it one of the largest financial sectors on the planet. Property & casualty alone accounts for $2.5 trillion, with commercial lines growing 8-12% annually as climate risk, cyber threats, and regulatory complexity increase.
Yet the insurance industry remains one of the most data-dependent, and most data-expensive, sectors. Verisk Analytics ($2.5B revenue), LexisNexis Risk Solutions, and AM Best charge $5,000–$30,000/month for the actuarial data, loss ratios, and competitive intelligence that carriers need to price risk accurately.
What if an AI agent could monitor competitor rate filings, track catastrophe events, scrape claims data, and analyze market hardening trends, all automatically, for a fraction of the cost?
In this guide, you'll build an AI-powered insurance intelligence system that scrapes premium rates, regulatory filings, catastrophe data, and competitor products, then uses GPT-4o to generate underwriting insights and Slack alerts.
Why AI Agents Are Transforming Insurance Data
Insurance data has unique characteristics that make it ideal for AI agent automation:
- Regulatory transparency: State insurance departments publish rate filings, financial statements, and market conduct reports; all public data, but scattered across 50+ state portals.
- Catastrophe sensitivity: A single hurricane, wildfire, or cyber breach can shift an entire market segment. Real-time monitoring of weather events, claims reports, and loss estimates is critical.
- Competitive density: The US alone has 5,900+ insurance companies. Tracking competitor rate changes, new product launches, and market exits requires massive scale.
- Market cycles: Insurance markets alternate between "hard" (rising rates, tighter capacity) and "soft" (falling rates, excess capacity) cycles. Detecting inflection points early creates significant competitive advantage.
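The cycle position in that last bullet can be estimated directly from filing data. A minimal sketch of a market-temperature heuristic (the thresholds here are illustrative assumptions, not actuarial standards):

```python
def market_temperature(rate_changes: list[float]) -> dict:
    """Summarize a batch of filed rate changes (in percent).

    A high share of increases plus a high average change suggests a
    hardening market; the opposite suggests softening.
    """
    if not rate_changes:
        return {"avg_change": 0.0, "increase_share": 0.0, "signal": "no_data"}
    avg = sum(rate_changes) / len(rate_changes)
    inc_share = sum(1 for r in rate_changes if r > 0) / len(rate_changes)
    if avg > 5 and inc_share > 0.7:      # broad, sizable increases
        signal = "hardening"
    elif avg < 0 and inc_share < 0.4:    # mostly decreases
        signal = "softening"
    else:
        signal = "stable"
    return {"avg_change": round(avg, 2),
            "increase_share": round(inc_share, 2),
            "signal": signal}
```

Even this crude summary, run per line of business, gives the anomaly detector in Step 7 something to compare against.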
Architecture: The 6-Step Insurance Intelligence Pipeline
Here's the complete system architecture:
- Source Discovery – Identify state DOI portals, NAIC databases, AM Best, carrier websites, and catastrophe data sources
- AI-Powered Extraction – Use the Mantis WebPerception API to scrape and structure insurance data from complex regulatory portals
- SQLite Storage – Store historical rate filings, financial data, and catastrophe events locally
- Change Detection – Flag rate changes >5%, new filings, catastrophe events, and competitor product launches
- GPT-4o Analysis – AI interprets market conditions, predicts impact, and recommends underwriting actions
- Slack/Email Alerts – Real-time notifications for underwriters, actuaries, and product managers
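The six steps above compose into a simple run loop. A minimal orchestration sketch, with placeholder lambdas standing in for the real scrapers built later in this guide (`run_pipeline` and the step names are hypothetical, not part of any API):

```python
from typing import Callable

def run_pipeline(steps: list[tuple[str, Callable[[dict], dict]]]) -> dict:
    """Run pipeline steps in order, threading a shared context dict through.

    Each step receives the context and returns new keys to merge in; a
    failing step is recorded but does not abort the remaining steps.
    """
    context: dict = {"errors": []}
    for name, step in steps:
        try:
            context.update(step(context) or {})
        except Exception as exc:
            context["errors"].append(f"{name}: {exc}")
    return context

# Placeholder steps; in the full system these call the scrapers built below.
pipeline = [
    ("discover", lambda ctx: {"sources": ["CA_DOI", "FL_DOI"]}),
    ("extract", lambda ctx: {"filings": [
        {"state": s, "rate_change_pct": 9.5} for s in ctx["sources"]]}),
    ("detect", lambda ctx: {"alerts": [
        f for f in ctx["filings"] if f["rate_change_pct"] > 5]}),
]
```

Isolating failures per step matters here: a single unreachable DOI portal shouldn't take down catastrophe monitoring for the day.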
Step 1: Define Your Insurance Data Models
First, create Pydantic schemas for structured insurance data extraction:
from pydantic import BaseModel
from typing import Optional, List
from datetime import datetime
from enum import Enum
class InsuranceLine(str, Enum):
PERSONAL_AUTO = "personal_auto"
HOMEOWNERS = "homeowners"
COMMERCIAL_PROPERTY = "commercial_property"
GENERAL_LIABILITY = "general_liability"
WORKERS_COMP = "workers_comp"
CYBER = "cyber"
PROFESSIONAL_LIABILITY = "professional_liability"
D_AND_O = "d_and_o"
COMMERCIAL_AUTO = "commercial_auto"
UMBRELLA = "umbrella"
class RateFiling(BaseModel):
"""State DOI rate filing data from SERFF or state portals."""
state: str # Two-letter state code
company_name: str
naic_code: Optional[str] = None
line_of_business: str
filing_type: str # "rate", "rule", "form", "rate_and_rule"
serff_tracking: Optional[str] = None
rate_change_pct: Optional[float] = None # Overall rate change requested
effective_date: Optional[str] = None
status: str # "pending", "approved", "disapproved", "withdrawn"
premium_impact: Optional[str] = None # Dollar impact estimate
filing_date: str
disposition_date: Optional[str] = None
url: Optional[str] = None
class CarrierFinancial(BaseModel):
"""Insurance company financial data from NAIC or AM Best."""
company_name: str
naic_code: str
am_best_rating: Optional[str] = None # A++, A+, A, A-, B++, etc.
direct_written_premium: Optional[float] = None
net_written_premium: Optional[float] = None
loss_ratio: Optional[float] = None # Losses / earned premium
combined_ratio: Optional[float] = None # Loss ratio + expense ratio
surplus: Optional[float] = None
year: int
line_of_business: Optional[str] = None
class CatastropheEvent(BaseModel):
"""Natural catastrophe and large-loss event data."""
event_name: str
event_type: str # "hurricane", "wildfire", "tornado", "flood", "cyber", "earthquake"
date: str
location: str # State/region affected
estimated_insured_loss: Optional[float] = None # In dollars
estimated_economic_loss: Optional[float] = None
pcs_number: Optional[str] = None # Verisk PCS catastrophe number
affected_lines: List[str] = [] # Lines of business impacted
status: str # "developing", "estimated", "final"
source: str
class CompetitorProduct(BaseModel):
"""Competitor insurance product and pricing intelligence."""
company_name: str
product_name: str
line_of_business: str
target_market: str # "small_commercial", "middle_market", "personal", etc.
key_features: List[str] = []
coverage_highlights: Optional[str] = None
pricing_model: Optional[str] = None # "usage_based", "parametric", "traditional"
distribution: Optional[str] = None # "direct", "agent", "broker", "embedded"
launch_date: Optional[str] = None
states_available: List[str] = []
url: str
Step 2: Scrape State DOI Rate Filings
State insurance departments publish all rate filings; this is the most valuable competitive intelligence in insurance:
import requests
import json
import sqlite3
from datetime import datetime
MANTIS_API_KEY = "your-mantis-api-key"
BASE_URL = "https://api.mantisapi.com/v1"
def scrape_rate_filings(state: str, doi_url: str) -> list[RateFiling]:
"""Scrape rate filings from a state DOI portal or SERFF."""
# Step 1: Capture the filing search results
response = requests.post(
f"{BASE_URL}/scrape",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"url": doi_url,
"render_js": True,
"wait_for": "table, .filing-results, .search-results",
"timeout": 30000
}
)
page_data = response.json()
# Step 2: AI-powered extraction of filing data
extraction = requests.post(
f"{BASE_URL}/extract",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"content": page_data["content"],
"schema": RateFiling.model_json_schema(),
"prompt": f"""Extract all insurance rate filings from this {state} DOI portal.
For each filing, capture:
- Company name and NAIC code
- Line of business (auto, homeowners, commercial, cyber, etc.)
- Filing type (rate, rule, form)
- Requested rate change percentage (look for +X% or -X%)
- Effective date and filing status
- SERFF tracking number if available
Return as a list of filing records. Pay special attention to
rate change percentages – these are the most important data points.""",
"multiple": True
}
)
filings = [RateFiling(**{**f, "state": state}) for f in extraction.json()["data"]]  # pin the known state code, even if extraction also returned one
return filings
# Key state DOI portals – focus on largest premium states first
state_doi_portals = {
"CA": "https://interactive.web.insurance.ca.gov/apex_extprd/f?p=142:1",
"FL": "https://apps.fldfs.com/SRDW/Search/RateFilingSearch.aspx",
"TX": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=TX",
"NY": "https://myportal.dfs.ny.gov/web/guest/rate-applications",
"PA": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=PA",
"IL": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=IL",
"OH": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=OH",
"NJ": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=NJ",
"GA": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=GA",
"NC": "https://filingaccess.serff.com/sfa/search/filingSummary.xhtml?state=NC",
}
all_filings = []
for state, url in state_doi_portals.items():
try:
filings = scrape_rate_filings(state, url)
all_filings.extend(filings)
print(f"✅ {state}: {len(filings)} rate filings captured")
except Exception as e:
print(f"❌ {state}: {e}")
Step 3: Track Carrier Financial Health
Monitor carrier financial strength, loss ratios, and combined ratios from NAIC and AM Best:
def scrape_carrier_financials() -> list[CarrierFinancial]:
"""Scrape carrier financial data from NAIC and AM Best."""
financial_sources = [
{
"url": "https://content.naic.org/cipr-topics/insurance-industry-financial-results",
"prompt": """Extract insurance industry financial results including:
- Direct written premium by line of business
- Loss ratios and combined ratios by line
- Year-over-year premium growth rates
- Policyholder surplus trends
Focus on the most recent year and prior year for comparison."""
},
{
"url": "https://web.ambest.com/ratings-services/best-ratings",
"prompt": """Extract AM Best rating actions including:
- Company name and NAIC code
- Current AM Best rating (A++, A+, A, etc.)
- Rating outlook (stable, positive, negative)
- Any recent upgrades or downgrades
- Financial strength rating rationale"""
}
]
all_financials = []
for source in financial_sources:
response = requests.post(
f"{BASE_URL}/scrape",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={"url": source["url"], "render_js": True}
)
extraction = requests.post(
f"{BASE_URL}/extract",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"content": response.json()["content"],
"schema": CarrierFinancial.model_json_schema(),
"prompt": source["prompt"],
"multiple": True
}
)
financials = [CarrierFinancial(**f) for f in extraction.json()["data"]]
all_financials.extend(financials)
return all_financials
Step 4: Monitor Catastrophe Events & Loss Estimates
Track natural catastrophes, cyber events, and large losses in real-time:
def scrape_catastrophe_events() -> list[CatastropheEvent]:
"""Scrape catastrophe events from insurance industry sources."""
cat_sources = [
{
"url": "https://www.iii.org/fact-statistic/facts-statistics-catastrophes",
"prompt": """Extract recent catastrophe events with insured loss estimates.
For each event, capture: name, type (hurricane, wildfire, tornado, flood),
date, location, insured loss estimate, economic loss estimate,
and which insurance lines are affected."""
},
{
"url": "https://www.ncei.noaa.gov/access/billions/",
"prompt": """Extract billion-dollar weather and climate disaster events.
For each, capture: event name, type, date range, states affected,
total cost estimate, and deaths. Focus on events from the past 12 months."""
},
{
"url": "https://www.artemis.bm/news/",
"prompt": """Extract recent catastrophe loss estimates and reinsurance-relevant events.
Focus on: named storms, wildfires, earthquakes, flooding events,
and any industry loss warranty (ILW) trigger events.
Include estimated insured losses and affected reinsurance layers."""
}
]
all_events = []
for source in cat_sources:
response = requests.post(
f"{BASE_URL}/scrape",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={"url": source["url"], "render_js": True}
)
extraction = requests.post(
f"{BASE_URL}/extract",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"content": response.json()["content"],
"schema": CatastropheEvent.model_json_schema(),
"prompt": source["prompt"],
"multiple": True
}
)
events = [CatastropheEvent(**e) for e in extraction.json()["data"]]
all_events.extend(events)
return all_events
Step 5: Track Competitor Products & InsurTech Launches
Monitor competitor product launches, InsurTech funding, and new market entrants:
def scrape_competitor_products() -> list[CompetitorProduct]:
"""Scrape competitor insurance products and InsurTech launches."""
product_sources = [
{
"url": "https://www.insurancejournal.com/news/national/",
"prompt": """Extract new insurance product announcements including:
- Company name and product name
- Line of business and target market
- Key features and coverage innovations
- Distribution model (direct, agent, embedded)
- States where available
Focus on product launches from the past 30 days."""
},
{
"url": "https://www.insurtech.com/news/",
"prompt": """Extract InsurTech company news including:
- New product launches and partnerships
- Funding rounds and valuations
- Technology innovations (AI underwriting, parametric, usage-based)
- Market expansion announcements"""
},
{
"url": "https://www.carriermanagement.com/news/",
"prompt": """Extract carrier management news including:
- New program launches and appetite changes
- Market entry/exit announcements
- Leadership changes at major carriers
- M&A activity in the insurance space"""
}
]
all_products = []
for source in product_sources:
response = requests.post(
f"{BASE_URL}/scrape",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={"url": source["url"], "render_js": True}
)
extraction = requests.post(
f"{BASE_URL}/extract",
headers={"Authorization": f"Bearer {MANTIS_API_KEY}"},
json={
"content": response.json()["content"],
"schema": CompetitorProduct.model_json_schema(),
"prompt": source["prompt"],
"multiple": True
}
)
products = [CompetitorProduct(**p) for p in extraction.json()["data"]]
all_products.extend(products)
return all_products
Step 6: Store Everything in SQLite
Create a local database for historical tracking and trend analysis:
def init_insurance_db():
"""Initialize SQLite database for insurance intelligence."""
conn = sqlite3.connect("insurance_intel.db")
c = conn.cursor()
c.execute("""CREATE TABLE IF NOT EXISTS rate_filings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
state TEXT, company_name TEXT, naic_code TEXT,
line_of_business TEXT, filing_type TEXT,
serff_tracking TEXT UNIQUE,
rate_change_pct REAL, effective_date TEXT,
status TEXT, premium_impact TEXT,
filing_date TEXT, disposition_date TEXT, url TEXT,
scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
)""")
c.execute("""CREATE TABLE IF NOT EXISTS carrier_financials (
id INTEGER PRIMARY KEY AUTOINCREMENT,
company_name TEXT, naic_code TEXT,
am_best_rating TEXT,
direct_written_premium REAL, net_written_premium REAL,
loss_ratio REAL, combined_ratio REAL,
surplus REAL, year INTEGER,
line_of_business TEXT,
scraped_at TEXT DEFAULT CURRENT_TIMESTAMP,
UNIQUE(naic_code, year, line_of_business)
)""")
c.execute("""CREATE TABLE IF NOT EXISTS catastrophe_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_name TEXT, event_type TEXT,
date TEXT, location TEXT,
estimated_insured_loss REAL,
estimated_economic_loss REAL,
pcs_number TEXT UNIQUE,
affected_lines TEXT, status TEXT, source TEXT,
scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
)""")
c.execute("""CREATE TABLE IF NOT EXISTS competitor_products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
company_name TEXT, product_name TEXT,
line_of_business TEXT, target_market TEXT,
key_features TEXT, coverage_highlights TEXT,
pricing_model TEXT, distribution TEXT,
launch_date TEXT, states_available TEXT, url TEXT UNIQUE,
scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
)""")
conn.commit()
return conn
def store_filings(conn, filings: list[RateFiling]):
"""Store rate filings with deduplication."""
c = conn.cursor()
new_count = 0
for f in filings:
try:
c.execute("""INSERT OR IGNORE INTO rate_filings
(state, company_name, naic_code, line_of_business,
filing_type, serff_tracking, rate_change_pct,
effective_date, status, premium_impact,
filing_date, disposition_date, url)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(f.state, f.company_name, f.naic_code, f.line_of_business,
f.filing_type, f.serff_tracking, f.rate_change_pct,
f.effective_date, f.status, f.premium_impact,
f.filing_date, f.disposition_date, f.url))
if c.rowcount > 0:
new_count += 1
except sqlite3.IntegrityError:
pass
conn.commit()
return new_count
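The UNIQUE constraint on `serff_tracking` is what makes re-running the scraper safe. A self-contained demonstration of the dedup behavior, using an in-memory database and simplified columns:

```python
import sqlite3

def demo_dedup() -> int:
    """Insert the same SERFF tracking number twice; UNIQUE keeps one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE rate_filings (
        serff_tracking TEXT UNIQUE, company_name TEXT, rate_change_pct REAL)""")
    filing = ("ABCD-123456789", "Example Mutual", 12.5)
    for _ in range(2):  # simulate two scraper runs capturing the same filing
        conn.execute("INSERT OR IGNORE INTO rate_filings VALUES (?, ?, ?)", filing)
    total = conn.execute("SELECT COUNT(*) FROM rate_filings").fetchone()[0]
    conn.close()
    return total
```

One caveat: SQLite's UNIQUE allows multiple NULLs, so filings without a SERFF tracking number are never deduplicated by this constraint alone.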
Step 7: Anomaly Detection & AI Analysis
Detect significant rate changes, catastrophe developments, and market shifts, then use GPT-4o to generate underwriting insights:
from openai import OpenAI
client = OpenAI()
def detect_insurance_anomalies(conn) -> list[dict]:
"""Detect anomalies in insurance data."""
c = conn.cursor()
anomalies = []
# 1. Large rate increases (>10%)
c.execute("""
SELECT state, company_name, line_of_business,
rate_change_pct, filing_date, status
FROM rate_filings
WHERE scraped_at > datetime('now', '-24 hours')
AND rate_change_pct > 10
ORDER BY rate_change_pct DESC
""")
for row in c.fetchall():
anomalies.append({
"type": "LARGE_RATE_INCREASE",
"severity": "critical" if row[3] > 25 else "high",
"state": row[0],
"company": row[1],
"line": row[2],
"rate_change_pct": row[3],
"filing_date": row[4],
"status": row[5]
})
# 2. Rate decreases (potential market softening signal)
c.execute("""
SELECT state, company_name, line_of_business,
rate_change_pct, filing_date
FROM rate_filings
WHERE scraped_at > datetime('now', '-24 hours')
AND rate_change_pct < -5
ORDER BY rate_change_pct ASC
""")
for row in c.fetchall():
anomalies.append({
"type": "RATE_DECREASE",
"severity": "important",
"state": row[0],
"company": row[1],
"line": row[2],
"rate_change_pct": row[3],
"filing_date": row[4]
})
# 3. New catastrophe events or updated loss estimates
c.execute("""
SELECT event_name, event_type, location,
estimated_insured_loss, status
FROM catastrophe_events
WHERE scraped_at > datetime('now', '-24 hours')
AND (status = 'developing' OR estimated_insured_loss > 1000000000)
""")
for row in c.fetchall():
anomalies.append({
"type": "CATASTROPHE_EVENT",
"severity": "critical" if (row[3] or 0) > 5e9 else "high",
"event_name": row[0],
"event_type": row[1],
"location": row[2],
"insured_loss": row[3],
"status": row[4]
})
# 4. Carriers running poor combined ratios (>110%) – a downgrade risk signal
c.execute("""
SELECT company_name, am_best_rating, combined_ratio
FROM carrier_financials
WHERE scraped_at > datetime('now', '-7 days')
AND combined_ratio > 110
""")
for row in c.fetchall():
anomalies.append({
"type": "POOR_COMBINED_RATIO",
"severity": "important",
"company": row[0],
"am_best_rating": row[1],
"combined_ratio": row[2]
})
return anomalies
def analyze_insurance_market(anomalies: list[dict], filings: list, events: list) -> str:
"""Use GPT-4o to generate strategic insurance market analysis."""
market_context = {
"anomalies": anomalies,
"filing_summary": {
"total_filings": len(filings),
"avg_rate_change": sum(f.rate_change_pct or 0 for f in filings) / len(filings) if filings else 0,
"max_increase": max((f.rate_change_pct or 0 for f in filings), default=0),
"max_decrease": min((f.rate_change_pct or 0 for f in filings), default=0),
"lines_with_increases": list(set(
f.line_of_business for f in filings if (f.rate_change_pct or 0) > 5
)),
"states_with_hardening": list(set(
f.state for f in filings if (f.rate_change_pct or 0) > 10
))
},
"catastrophe_summary": {
"active_events": len([e for e in events if e.status == "developing"]),
"total_insured_losses": sum(e.estimated_insured_loss or 0 for e in events),
"event_types": list(set(e.event_type for e in events))
}
}
response = client.chat.completions.create(
model="gpt-4o",
messages=[{
"role": "system",
"content": """You are an insurance market analyst AI. Analyze the following
market data and provide:
1. MARKET STATUS: Hard or soft market? Direction of rates by line of business.
2. RATE TRENDS: Which lines are hardening/softening? Which states show the most movement?
3. CATASTROPHE IMPACT: How are recent events affecting pricing and capacity?
4. COMPETITOR SIGNALS: What are rate filings telling us about competitor strategy?
5. UNDERWRITING RECOMMENDATIONS: Where should we grow? Where should we pull back?
6. REINSURANCE IMPLICATIONS: How might cat losses affect treaty renewals?
Be specific with numbers. Distinguish between personal and commercial lines.
Flag anything requiring immediate underwriting action."""
}, {
"role": "user",
"content": f"Insurance market data:\n{json.dumps(market_context, indent=2, default=str)}"
}],
temperature=0.3
)
return response.choices[0].message.content
Step 8: Real-Time Alerts via Slack
Send structured alerts to underwriters, actuaries, and product managers:
SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"  # your Slack incoming webhook
def send_insurance_alert(anomalies: list[dict], analysis: str):
"""Send insurance market alerts via Slack."""
import requests as req
severity_emoji = {
"critical": "🔴",
"high": "🟠",
"important": "🟡",
"minor": "🔵"
}
anomaly_text = ""
for a in sorted(anomalies, key=lambda x: {"critical": 0, "high": 1, "important": 2, "minor": 3}[x["severity"]]):
emoji = severity_emoji[a["severity"]]
if a["type"] == "LARGE_RATE_INCREASE":
anomaly_text += f"{emoji} *RATE INCREASE* – {a['company']} ({a['state']}): "
anomaly_text += f"+{a['rate_change_pct']:.1f}% on {a['line']} ({a['status']})\n"
elif a["type"] == "RATE_DECREASE":
anomaly_text += f"{emoji} *RATE DECREASE* – {a['company']} ({a['state']}): "
anomaly_text += f"{a['rate_change_pct']:.1f}% on {a['line']}\n"
elif a["type"] == "CATASTROPHE_EVENT":
loss_str = f"${a['insured_loss']/1e9:.1f}B" if a['insured_loss'] else "TBD"
anomaly_text += f"{emoji} *CAT EVENT* – {a['event_name']}: "
anomaly_text += f"{a['event_type']} in {a['location']} (est. loss: {loss_str})\n"
elif a["type"] == "POOR_COMBINED_RATIO":
anomaly_text += f"{emoji} *HIGH COMBINED RATIO* – {a['company']}: "
anomaly_text += f"{a['combined_ratio']:.1f}% (rated {a['am_best_rating']})\n"
message = f"""🛡️ *Insurance Market Intelligence Report*
━━━━━━━━━━━━━━━━━━━━━━━━━
*Anomalies Detected: {len(anomalies)}*
{anomaly_text}
━━━━━━━━━━━━━━━━━━━━━━━━━
*AI Analysis:*
{analysis}
_Powered by Mantis WebPerception API – monitoring 50 state DOIs + industry sources_"""
req.post(SLACK_WEBHOOK, json={"text": message})
Step 9: Automated Scheduling
Run the full pipeline on a schedule: twice daily for rate filings, every two hours for catastrophe events, and weekly for competitor scans:
import schedule
import time
def rate_filing_check():
"""Run daily – scrape new rate filings from state DOIs."""
conn = init_insurance_db()
for state, url in state_doi_portals.items():
try:
filings = scrape_rate_filings(state, url)
new = store_filings(conn, filings)
if new > 0:
print(f"📋 {state}: {new} new rate filings")
except Exception as e:
print(f"Error scraping {state}: {e}")
# Detect anomalies and alert
anomalies = detect_insurance_anomalies(conn)
if anomalies:
events = scrape_catastrophe_events()
all_filings = [] # Retrieve from DB for context
analysis = analyze_insurance_market(anomalies, all_filings, events)
send_insurance_alert(anomalies, analysis)
conn.close()
def catastrophe_check():
"""Run every 2 hours – monitor active cat events."""
conn = init_insurance_db()
events = scrape_catastrophe_events()
for e in events:
try:
conn.execute("""INSERT OR REPLACE INTO catastrophe_events
(event_name, event_type, date, location,
estimated_insured_loss, estimated_economic_loss,
pcs_number, affected_lines, status, source)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(e.event_name, e.event_type, e.date, e.location,
e.estimated_insured_loss, e.estimated_economic_loss,
e.pcs_number, json.dumps(e.affected_lines), e.status, e.source))
except sqlite3.IntegrityError:
pass
conn.commit()
# Alert on developing events with large loss estimates
developing = [e for e in events if e.status == "developing"]
if developing:
print(f"🌪️ {len(developing)} developing catastrophe events")
conn.close()
def competitor_scan():
"""Run weekly – scan for new competitor products and InsurTech launches."""
conn = init_insurance_db()
products = scrape_competitor_products()
new_products = 0
for p in products:
try:
conn.execute("""INSERT OR IGNORE INTO competitor_products
(company_name, product_name, line_of_business,
target_market, key_features, coverage_highlights,
pricing_model, distribution, launch_date,
states_available, url)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(p.company_name, p.product_name, p.line_of_business,
p.target_market, json.dumps(p.key_features),
p.coverage_highlights, p.pricing_model, p.distribution,
p.launch_date, json.dumps(p.states_available), p.url))
if conn.execute("SELECT changes()").fetchone()[0] > 0:
new_products += 1
except sqlite3.IntegrityError:
pass
conn.commit()
if new_products > 0:
print(f"🆕 {new_products} new competitor products detected")
conn.close()
# Schedule the pipeline
schedule.every().day.at("07:00").do(rate_filing_check)
schedule.every().day.at("19:00").do(rate_filing_check)
schedule.every(2).hours.do(catastrophe_check)
schedule.every().monday.at("09:00").do(competitor_scan)
print("🛡️ Insurance intelligence pipeline running...")
while True:
schedule.run_pending()
time.sleep(60)
Cost Comparison: Traditional vs. AI Agent Approach
| Platform | Monthly Cost | Data Coverage | Real-Time | AI Analysis |
|---|---|---|---|---|
| Verisk / ISO | $5,000–$30,000 | Comprehensive actuarial | Daily | Basic |
| LexisNexis Risk | $3,000–$20,000 | Claims + risk scoring | Yes | Rules-based |
| AM Best | $2,000–$10,000 | Ratings + financials | Weekly | Manual reports |
| Guidewire / Duck Creek | $10,000–$50,000 | Policy admin + data | Yes | Platform-dependent |
| AI Agent + Mantis | $29–$299 | Customizable | Yes (2-hr cat) | GPT-4o powered |
Use Cases: Who Benefits?
1. Carriers & Underwriters
Monitor competitor rate filings across all 50 states to inform pricing strategy. When State Farm files for a 15% homeowners increase in Florida, your AI agent detects it within hours, not weeks. Track combined ratios across your competitive set to identify carriers under pressure who may exit markets, creating growth opportunities.
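The watchlist idea above boils down to a single query against the `rate_filings` table from Step 6. A sketch (carrier names here are placeholders):

```python
import sqlite3

def competitor_filings(conn: sqlite3.Connection,
                       competitors: list[str],
                       min_change: float = 5.0) -> list[tuple]:
    """Return filings by watched competitors at or above a rate-change floor."""
    placeholders = ",".join("?" for _ in competitors)
    return conn.execute(f"""
        SELECT state, company_name, line_of_business, rate_change_pct
        FROM rate_filings
        WHERE company_name IN ({placeholders})
          AND rate_change_pct >= ?
        ORDER BY rate_change_pct DESC""",
        (*competitors, min_change)).fetchall()
```

Building the `IN (...)` clause from bound parameters, rather than string-formatting the names, keeps the query safe even when company names come from scraped data.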
2. Managing General Agents (MGAs)
MGAs need to stay ahead of market capacity shifts. An AI agent monitoring catastrophe events, reinsurance market signals, and carrier financial health can predict which carriers might restrict appetite, giving you time to secure alternative capacity. Track rate adequacy across your programs by comparing your pricing to state filing trends.
3. InsurTech Startups
InsurTech companies building embedded insurance, parametric products, or usage-based models need competitive intelligence at startup speed. An AI agent scrapes competitor product launches, funding announcements, and distribution partnerships, building a real-time competitive landscape that would cost $50K+ from a consulting firm.
4. Reinsurance Brokers
Track catastrophe loss development in real-time to advise clients on treaty structure and pricing. Monitor primary carrier rate filings to forecast cedant premium growth and loss trends. An AI agent aggregating data from PCS, NOAA, Artemis, and state DOIs provides the intelligence layer that supports treaty placement discussions.
Advanced: Market Cycle Detection
Use historical rate filing data to detect hard/soft market inflection points:
def detect_market_cycle(conn, line_of_business: str) -> dict:
"""Analyze rate filing trends to detect market cycle position."""
c = conn.cursor()
# Get quarterly average rate changes for the past 2 years
c.execute("""
SELECT
strftime('%Y-Q' || ((CAST(strftime('%m', filing_date) AS INTEGER)-1)/3 + 1), filing_date) as quarter,
AVG(rate_change_pct) as avg_change,
COUNT(*) as filing_count,
SUM(CASE WHEN rate_change_pct > 0 THEN 1 ELSE 0 END) as increases,
SUM(CASE WHEN rate_change_pct < 0 THEN 1 ELSE 0 END) as decreases
FROM rate_filings
WHERE line_of_business = ?
AND filing_date > date('now', '-2 years')
AND rate_change_pct IS NOT NULL
GROUP BY quarter
ORDER BY quarter
""", (line_of_business,))
quarterly_data = c.fetchall()
# Use GPT-4o to interpret the cycle
analysis = client.chat.completions.create(
model="gpt-4o",
messages=[{
"role": "system",
"content": """You are an insurance market cycle analyst. Based on the
quarterly rate filing trends, determine:
1. Current market position: HARD, HARDENING, STABLE, SOFTENING, or SOFT
2. Direction: rates accelerating, decelerating, or flat
3. Inflection signals: any signs of cycle turning
4. Forecast: expected rate direction for next 2 quarters
5. Strategic recommendation: grow, hold, or contract in this line"""
}, {
"role": "user",
"content": f"""Rate filing trends for {line_of_business}:
{json.dumps([{
'quarter': q[0], 'avg_change': q[1],
'filings': q[2], 'increases': q[3], 'decreases': q[4]
} for q in quarterly_data], indent=2)}"""
}]
)
return {
"line": line_of_business,
"quarterly_data": quarterly_data,
"analysis": analysis.choices[0].message.content
}
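Before spending an LLM call, a cheap heuristic can pre-flag inflection points: a sign change in the quarter-over-quarter direction of average rate changes. A sketch in plain Python (the flip rule is a simplifying assumption; real cycle analysis would smooth out filing noise first):

```python
def find_inflections(quarterly_avg: list[tuple[str, float]]) -> list[str]:
    """Return quarters where the direction of average rate change flips.

    quarterly_avg: (quarter_label, avg_rate_change_pct) pairs in
    chronological order, e.g. from the quarterly SQL query above.
    """
    inflections = []
    for i in range(2, len(quarterly_avg)):
        prev_delta = quarterly_avg[i - 1][1] - quarterly_avg[i - 2][1]
        curr_delta = quarterly_avg[i][1] - quarterly_avg[i - 1][1]
        if prev_delta * curr_delta < 0:  # direction flipped between quarters
            inflections.append(quarterly_avg[i][0])
    return inflections
```

Feeding only the flagged quarters (plus surrounding context) to GPT-4o keeps the prompt focused on the part of the series that actually changed.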
Compliance & Best Practices
Insurance data scraping comes with specific considerations:
- State DOI filings are public records: Rate filings submitted to state insurance departments are public by law in most states. SERFF (System for Electronic Rates & Forms Filing) is the standard platform and allows public searches.
- NAIC data is partially public: The NAIC publishes aggregate industry statistics freely. Individual company financial statement data requires a subscription; respect those terms.
- AM Best ratings: Published ratings are public, but detailed financial analysis and credit reports are subscription content. Scrape only publicly available rating information.
- Catastrophe data: NOAA, NWS, and FEMA data are public domain. Verisk PCS estimates may have redistribution restrictions; use them as directional intelligence, not for republication.
- Competitor websites: Product descriptions and publicly listed rates are fair game. Agent portals, quoting engines, and login-protected areas should not be scraped without authorization.
- Consumer data: Never scrape individual policyholder information, claims details with PII, or medical records. This violates state privacy laws and potentially HIPAA.
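Part of this diligence can be automated. A sketch that checks a site's robots.txt rules before scraping, using Python's stdlib `urllib.robotparser` (fetching the robots.txt body is left to the caller so the check stays offline-testable; note that respecting robots.txt does not substitute for reading a site's terms of service):

```python
from urllib import robotparser

def allowed_to_fetch(url: str, robots_txt: str,
                     user_agent: str = "insurance-intel-bot") -> bool:
    """Check a site's robots.txt rules before scraping a URL.

    robots_txt is the already-fetched robots.txt body for the site;
    the user_agent string here is a placeholder for your own bot name.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

Gating every `scrape` call behind a check like this makes the compliance posture auditable rather than ad hoc.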
Getting Started
Ready to build your insurance intelligence system? Here's the quick start:
- Get a Mantis API key at mantisapi.com – the free tier includes 100 API calls/month
- Start with one state – pick your largest premium state and scrape its DOI portal for recent rate filings
- Add catastrophe monitoring – NOAA and III provide the most accessible cat event data
- Set up anomaly detection – even simple threshold alerts (rate change >10%) catch the most important signals
- Layer in AI analysis – GPT-4o turns rate filing data into underwriting intelligence
- Scale to all 50 states – once your single-state agent works, SERFF provides a consistent interface across most states
🛡️ Start Monitoring Insurance Markets Today
Build your first insurance intelligence agent in under 30 minutes. Free tier includes 100 API calls/month.
Get Your API Key →
Further Reading
- The Complete Guide to Web Scraping with AI Agents in 2026
- Web Scraping for Financial Data: Track Stocks, Earnings & Market Signals
- Web Scraping for Legal & Compliance: Track Regulations & Court Cases
- Web Scraping for Market Research: Analyze Competitors, Trends & Opportunities
- Web Scraping for Price Monitoring: Build an AI-Powered Price Tracker
- Structured Data Extraction with AI: Extracting Clean Data from Any Page