
How to Scrape LinkedIn Profiles & Jobs in 2026

Extract profile data, job listings, and company information using Python, Node.js, and API-based approaches, with production-ready code.


Why Scrape LinkedIn?

LinkedIn is the world's largest professional network, with over 1 billion members and 67 million company profiles. That professional data powers some of the most valuable business intelligence in B2B.

Whether you're building a recruiting tool, a sales prospecting engine, or an AI research agent, LinkedIn data is foundational for any B2B data strategy.

What Data Can You Extract?

LinkedIn has three main data categories, each with different accessibility levels:

Public Profiles

| Data Point | Availability | Notes |
|---|---|---|
| Full Name | Usually public | Some profiles show first name + last initial only |
| Headline | Public | Current role/tagline |
| Current Company & Title | Public | Most recent experience entry |
| Location | Public | Metro area (e.g., "San Francisco Bay Area") |
| About/Summary | Usually public | May be truncated without login |
| Experience History | Usually public | Companies, titles, dates, descriptions |
| Education | Usually public | Schools, degrees, dates |
| Skills | Partially public | Top skills visible; full list may require login |
| Certifications | Usually public | Issuing org, credential ID |
| Profile Photo URL | Varies | Some users restrict to connections only |

Job Listings

| Data Point | Availability | Notes |
|---|---|---|
| Job Title | Public | Accessible via linkedin.com/jobs |
| Company Name | Public | Links to company page |
| Location | Public | Includes remote/hybrid/on-site labels |
| Job Description | Public | Full text on job detail page |
| Posting Date | Public | Relative ("2 days ago") or absolute |
| Seniority Level | Public | Entry, Mid, Senior, Director, etc. |
| Employment Type | Public | Full-time, Part-time, Contract |
| Applicant Count | Sometimes | "Over 100 applicants" shown on some listings |
| Salary Range | Sometimes | Shown when employer provides it |

Company Pages

| Data Point | Availability | Notes |
|---|---|---|
| Company Name | Public | Official name |
| Industry | Public | Self-reported category |
| Employee Count | Public | Range (e.g., "501-1,000") |
| Headquarters | Public | City, state, country |
| Description | Public | About section |
| Website | Public | Company URL |
| Founded Year | Public | When available |
| Specialties | Public | Self-reported tags |

Method 1: Python + BeautifulSoup

Best for scraping public LinkedIn pages that don't require login, particularly job listings and public profiles. LinkedIn serves some content without JavaScript, making simple HTTP requests viable for basic data extraction.

Install Dependencies

pip install requests beautifulsoup4 lxml

Public Job Listings Scraper

# linkedin_jobs.py
import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/125.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
}

def scrape_linkedin_jobs(keyword: str, location: str = "",
                         pages: int = 3) -> list:
    """Scrape LinkedIn public job listings."""
    jobs = []

    for page in range(pages):
        start = page * 25
        url = (
            f"https://www.linkedin.com/jobs/search/"
            f"?keywords={keyword.replace(' ', '%20')}"
            f"&location={location.replace(' ', '%20')}"
            f"&start={start}"
        )
        resp = requests.get(url, headers=HEADERS, timeout=15)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "lxml")

        for card in soup.select(".base-card"):
            title_el = card.select_one(".base-search-card__title")
            company_el = card.select_one(
                ".base-search-card__subtitle a"
            )
            location_el = card.select_one(
                ".job-search-card__location"
            )
            link_el = card.select_one("a.base-card__full-link")
            date_el = card.select_one("time")

            jobs.append({
                "title": (
                    title_el.text.strip() if title_el else None
                ),
                "company": (
                    company_el.text.strip() if company_el else None
                ),
                "location": (
                    location_el.text.strip()
                    if location_el else None
                ),
                "url": link_el["href"] if link_el else None,
                "posted": (
                    date_el.get("datetime") if date_el else None
                ),
            })

        time.sleep(random.uniform(3, 8))

    return jobs

# Example: search for Python developer jobs
results = scrape_linkedin_jobs(
    "python developer", "San Francisco", pages=2
)
print(f"Found {len(results)} jobs")
print(json.dumps(results[:3], indent=2))

Public Profile Scraper

# linkedin_profile.py
import json
import requests
from bs4 import BeautifulSoup

from linkedin_jobs import HEADERS  # reuse the headers defined above

def scrape_public_profile(profile_url: str) -> dict:
    """Scrape a public LinkedIn profile page."""
    resp = requests.get(profile_url, headers=HEADERS, timeout=15)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "lxml")

    def text(selector):
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None

    # Extract experience entries
    experiences = []
    for exp in soup.select(".experience-item"):
        title = exp.select_one(".experience-item__title")
        company = exp.select_one(".experience-item__subtitle")
        duration = exp.select_one(".experience-item__duration")
        experiences.append({
            "title": title.text.strip() if title else None,
            "company": company.text.strip() if company else None,
            "duration": duration.text.strip() if duration else None,
        })

    # Extract education
    education = []
    for edu in soup.select(".education__list-item"):
        school = edu.select_one(".education__item--school")
        degree = edu.select_one(".education__item--degree")
        education.append({
            "school": school.text.strip() if school else None,
            "degree": degree.text.strip() if degree else None,
        })

    return {
        "name": text(".top-card-layout__title"),
        "headline": text(".top-card-layout__headline"),
        "location": text(".top-card-layout__first-subline"),
        "about": text(".summary .core-section-container__content"),
        "experience": experiences,
        "education": education,
        "url": profile_url,
    }

profile = scrape_public_profile(
    "https://www.linkedin.com/in/example-profile"
)
print(json.dumps(profile, indent=2))
โš ๏ธ Important: Public vs. Authenticated

The examples above only scrape publicly accessible LinkedIn pages; no login is required. LinkedIn shows significantly more data to logged-in users, but scraping while authenticated carries much higher legal risk and more clearly violates LinkedIn's Terms of Service. For production use, stick to public data or use an API.

Method 2: Playwright (Headless Browser)

LinkedIn relies heavily on JavaScript for rendering profile sections, job details, and infinite scroll. Playwright renders the full page, giving you access to dynamically loaded content.

Install

pip install playwright
playwright install chromium

Job Detail Scraper

# playwright_linkedin.py
import asyncio
from playwright.async_api import async_playwright
import json

async def scrape_job_detail(job_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )

        page = await context.new_page()

        # Block images and tracking for speed
        await page.route("**/*.{png,jpg,jpeg,gif,svg}", 
                         lambda route: route.abort())
        await page.route("**/tracking/**",
                         lambda route: route.abort())

        await page.goto(job_url, wait_until="domcontentloaded")
        await page.wait_for_timeout(3000)

        # Click "Show more" to expand description
        try:
            show_more = page.locator(
                "button.show-more-less-html__button"
            )
            if await show_more.count() > 0:
                await show_more.first.click()
                await page.wait_for_timeout(500)
        except Exception:
            pass

        job = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            const criteria = {};
            document.querySelectorAll(
                '.description__job-criteria-item'
            ).forEach(item => {
                const label = item.querySelector(
                    '.description__job-criteria-subheader'
                );
                const value = item.querySelector(
                    '.description__job-criteria-text'
                );
                if (label && value) {
                    criteria[label.textContent.trim()
                        .toLowerCase().replace(/ /g, '_')]
                        = value.textContent.trim();
                }
            });
            return {
                title: text('.top-card-layout__title'),
                company: text('.topcard__org-name-link')
                    || text('.top-card-layout__second-subline a'),
                location: text('.topcard__flavor--bullet'),
                description: text(
                    '.show-more-less-html__markup'
                ),
                posted: text('.posted-time-ago__text'),
                applicants: text('.num-applicants__caption'),
                criteria: criteria,
            };
        }""")

        await browser.close()
        return job

# Scrape a specific job listing
job = asyncio.run(scrape_job_detail(
    "https://www.linkedin.com/jobs/view/1234567890"
))
print(json.dumps(job, indent=2))

Profile Scraper with Infinite Scroll

# playwright_profile.py
async def scrape_profile_full(profile_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent=(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36"
            ),
            viewport={"width": 1920, "height": 1080},
        )
        page = await context.new_page()

        await page.goto(profile_url, wait_until="domcontentloaded")
        await page.wait_for_timeout(2000)

        # Scroll to load lazy sections
        for _ in range(5):
            await page.evaluate(
                "window.scrollBy(0, window.innerHeight)"
            )
            await page.wait_for_timeout(800)

        profile = await page.evaluate("""() => {
            const text = (sel) => {
                const el = document.querySelector(sel);
                return el ? el.textContent.trim() : null;
            };
            const allText = (sel) => {
                return [...document.querySelectorAll(sel)]
                    .map(el => el.textContent.trim())
                    .filter(Boolean);
            };
            return {
                name: text('.top-card-layout__title'),
                headline: text('.top-card-layout__headline'),
                location: text('.top-card-layout__first-subline'),
                about: text(
                    '.summary .core-section-container__content p'
                ),
                experience: allText(
                    '.experience-item .experience-item__title'
                ),
                education: allText(
                    '.education__item--school-name'
                ),
                skills: allText('.skill-categories-card li'),
            };
        }""")

        await browser.close()
        return profile

💡 Pro Tip: Avoid Login Walls

LinkedIn shows an "authwall" modal prompting login after a few page views. Adding ?trk=public_profile to profile URLs and using clean browser contexts (no cookies) can help. Rotating IP addresses between requests also resets LinkedIn's session tracking.
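The URL trick above can be wrapped in a small helper. A sketch (the `public_profile_url` name is hypothetical, and the `?trk=public_profile` behavior is a community workaround, not a documented LinkedIn feature):

```python
# public_url.py -- append the ?trk=public_profile hint without
# clobbering any query parameters already on the URL
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def public_profile_url(url: str) -> str:
    """Return the profile URL with trk=public_profile set (if absent)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.setdefault("trk", "public_profile")  # keep an existing trk value
    return urlunsplit(parts._replace(query=urlencode(query)))

print(public_profile_url("https://www.linkedin.com/in/example-profile"))
# https://www.linkedin.com/in/example-profile?trk=public_profile
```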

Method 3: Node.js + Cheerio

Lightweight and fast, and ideal for scraping LinkedIn job listings from a Node.js backend or serverless function.

Install

npm install cheerio node-fetch

Job Search Scraper

// linkedin-jobs.mjs
import fetch from "node-fetch";
import * as cheerio from "cheerio";

const HEADERS = {
  "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
    "AppleWebKit/537.36 Chrome/125.0.0.0 Safari/537.36",
  "Accept-Language": "en-US,en;q=0.9",
  Accept: "text/html",
};

async function scrapeJobs(keyword, location = "", pages = 3) {
  const jobs = [];

  for (let page = 0; page < pages; page++) {
    const start = page * 25;
    const url =
      `https://www.linkedin.com/jobs/search/` +
      `?keywords=${encodeURIComponent(keyword)}` +
      `&location=${encodeURIComponent(location)}` +
      `&start=${start}`;

    const resp = await fetch(url, { headers: HEADERS });
    const html = await resp.text();
    const $ = cheerio.load(html);

    $(".base-card").each((_, card) => {
      const $card = $(card);
      jobs.push({
        title: $card.find(".base-search-card__title")
          .text().trim() || null,
        company: $card.find(".base-search-card__subtitle a")
          .text().trim() || null,
        location: $card.find(".job-search-card__location")
          .text().trim() || null,
        url: $card.find("a.base-card__full-link")
          .attr("href") || null,
        posted: $card.find("time").attr("datetime") || null,
      });
    });

    // Polite delay
    await new Promise((r) =>
      setTimeout(r, 3000 + Math.random() * 5000)
    );
  }

  return jobs;
}

// Usage
const jobs = await scrapeJobs("machine learning engineer", "New York");
console.log(`Found ${jobs.length} jobs`);
console.log(JSON.stringify(jobs.slice(0, 3), null, 2));

Company Page Scraper

// linkedin-company.mjs
async function scrapeCompany(companySlug) {
  const url =
    `https://www.linkedin.com/company/${companySlug}/about/`;
  const resp = await fetch(url, { headers: HEADERS });
  const html = await resp.text();
  const $ = cheerio.load(html);

  const text = (sel) => $(sel).first().text().trim() || null;

  // Extract detail items
  const details = {};
  $(".core-section-container__content dt").each((i, el) => {
    const key = $(el).text().trim().toLowerCase()
      .replace(/ /g, "_");
    const value = $(el).next("dd").text().trim();
    if (key && value) details[key] = value;
  });

  return {
    name: text(".top-card-layout__title"),
    tagline: text(".top-card-layout__headline"),
    description: text(
      ".core-section-container__content p"
    ),
    ...details,
    url: `https://www.linkedin.com/company/${companySlug}`,
  };
}

const company = await scrapeCompany("google");
console.log(JSON.stringify(company, null, 2));

Method 4: Web Scraping API (Easiest)

The most reliable approach for production LinkedIn scraping. A web scraping API handles proxies, anti-detection, JavaScript rendering, and login-wall bypassing: you just send a URL and get structured data back.

Scraping Profiles with Mantis

# One API call โ€” structured LinkedIn profile data
import requests

resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": "https://www.linkedin.com/in/example-profile",
        "extract": {
            "name": "person's full name",
            "headline": "professional headline",
            "location": "geographic location",
            "about": "about/summary section",
            "current_company": "current employer name",
            "current_title": "current job title",
            "experience": "work experience (array of: company, title, dates, description)",
            "education": "education history (array of: school, degree, dates)",
            "skills": "listed skills (array)",
        },
        "render_js": True,
    },
)

profile = resp.json()
print(profile)

Scraping Job Listings with Mantis

# Bulk job search
resp = requests.post(
    "https://api.mantisapi.com/v1/scrape",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "url": (
            "https://www.linkedin.com/jobs/search/"
            "?keywords=AI+engineer&location=Remote"
        ),
        "extract": {
            "jobs": (
                "array of job listings with: "
                "title, company, location, url, posted_date"
            ),
        },
        "render_js": True,
    },
)

jobs = resp.json().get("jobs", [])
for job in jobs[:5]:
    print(f"{job['title']} at {job['company']} - {job['location']}")

Skip the Login Walls and IP Bans

Mantis handles LinkedIn's anti-bot detection, proxy rotation, authwall bypassing, and JavaScript rendering, so you don't have to.


Node.js with Mantis

// mantis-linkedin.mjs
const resp = await fetch("https://api.mantisapi.com/v1/scrape", {
  method: "POST",
  headers: {
    Authorization: "Bearer YOUR_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://www.linkedin.com/in/example-profile",
    extract: {
      name: "full name",
      headline: "professional headline",
      current_role: "current job title and company",
      experience: "work history (array)",
      skills: "professional skills (array)",
    },
    render_js: true,
  }),
});

const profile = await resp.json();
console.log(profile);

Beating LinkedIn's Anti-Bot Detection

LinkedIn invests heavily in detecting and blocking scrapers. Here's what you're up against:

LinkedIn's Defense Layers

DefenseWhat It DoesCountermeasure
AuthwallPrompts login after a few page viewsClean sessions, ?trk= params, IP rotation
IP Rate LimitingBlocks IPs with too many requestsRotating residential proxies
CAPTCHA ChallengesServes CAPTCHA on suspicious patternsCAPTCHA solving services or API
Account RestrictionsLimits or suspends accounts scraping while logged inDon't use personal accounts; use public pages
Browser FingerprintingDetects headless browsers via JSStealth plugins, real browser profiles
Session TrackingCorrelates requests across pageviewsFresh sessions per request batch
Honeypot LinksInvisible links that only bots followOnly follow visible, user-facing links
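The honeypot countermeasure in the last row can be sketched with BeautifulSoup: collect only links a real user could see, skipping elements hidden via inline styles or `hidden` attributes. This is a heuristic sketch; real pages can also hide links through external CSS, which this check won't catch.

```python
# visible_links.py -- skip honeypot links hidden from real users
from bs4 import BeautifulSoup

HIDDEN_MARKERS = (
    "display:none", "display: none",
    "visibility:hidden", "visibility: hidden",
)

def visible_links(html: str) -> list[str]:
    """Return hrefs of anchors not hidden by inline style/attributes."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        if a.get("hidden") is not None or a.get("aria-hidden") == "true":
            continue  # hidden via HTML attribute
        style = (a.get("style") or "").lower()
        if any(marker in style for marker in HIDDEN_MARKERS):
            continue  # hidden via inline CSS
        links.append(a["href"])
    return links

html = '<a href="/jobs/view/1">Job</a><a href="/trap" style="display:none">x</a>'
print(visible_links(html))  # ['/jobs/view/1']
```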

Essential Anti-Detection Techniques

# linkedin_stealth.py
import random
import time
import requests

PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "Chrome/124.0.0.0 Safari/537.36",
]

def create_linkedin_session():
    """Create a session configured for LinkedIn scraping."""
    session = requests.Session()
    proxy = random.choice(PROXY_POOL)
    session.proxies = {"http": proxy, "https": proxy}
    session.headers.update({
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml",
        "Referer": "https://www.google.com/",
        "DNT": "1",
    })
    return session

def is_authwall(html: str) -> bool:
    """Detect LinkedIn's login prompt."""
    # Generic phrases like "sign in" and "join now" appear in the nav
    # of every public page, so match only authwall-specific markers.
    return any(marker in html.lower() for marker in [
        "authwall", "login-form", "session_key",
    ])

def scrape_with_retry(url: str, max_retries: int = 3) -> str:
    """Scrape with authwall detection and retry."""
    for attempt in range(max_retries):
        session = create_linkedin_session()
        try:
            resp = session.get(url, timeout=15)
            if resp.status_code == 429:
                wait = 30 * (attempt + 1)
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
                continue
            if is_authwall(resp.text):
                print("Authwall detected - rotating proxy")
                continue
            return resp.text
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
        time.sleep(random.uniform(5, 15))
    raise Exception(f"Failed after {max_retries} retries")

def polite_delay():
    """Longer delays for LinkedIn: they're aggressive."""
    time.sleep(random.uniform(5, 20))

💡 Golden Rule for LinkedIn Scraping

Stick to public pages. LinkedIn's public job listings (linkedin.com/jobs/) and public profiles (linkedin.com/in/) are the safest targets. Scraping while logged in dramatically increases the risk of account suspension and legal action.
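One way to enforce this rule mechanically is a guard that rejects any URL outside LinkedIn's public sections. A sketch (the prefix list reflects the paths named above; extend it for your own use case):

```python
# public_guard.py -- allow only LinkedIn paths served without login
from urllib.parse import urlsplit

# Public sections referenced in this guide: jobs, profiles, companies
PUBLIC_PREFIXES = ("/jobs/", "/in/", "/company/")

def is_public_linkedin_url(url: str) -> bool:
    """True if the URL points at a public LinkedIn section."""
    parts = urlsplit(url)
    if not parts.netloc.endswith("linkedin.com"):
        return False
    return parts.path.startswith(PUBLIC_PREFIXES)

print(is_public_linkedin_url("https://www.linkedin.com/in/example-profile"))  # True
print(is_public_linkedin_url("https://www.linkedin.com/feed/"))               # False
```

Calling this guard before every fetch keeps an automated pipeline from wandering behind the login wall by accident.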

LinkedIn Official API vs Scraping

LinkedIn offers several APIs, but none provide the data most scrapers need:

| Feature | LinkedIn API | Web Scraping | Mantis API |
|---|---|---|---|
| Profile Data | Own profile only (authenticated user) | Any public profile | Any public profile |
| Job Listings | Not available via API | Full search + detail pages | Full search + detail pages |
| Company Data | Managed pages only | Any public company page | Any public company page |
| People Search | Not available | Via Google dorks or site search | Via URL-based extraction |
| Rate Limits | Varies (100-500 calls/day typical) | Depends on proxy pool | Based on plan (up to 100K/mo) |
| Setup | App review required (weeks-months) | Immediate (need proxies) | Immediate (API key) |
| Approval | Requires LinkedIn partnership for most endpoints | None | None |
| Cost | Free (very limited) or enterprise pricing | $50-500+/mo (proxies) | $0-299/mo |
| Reliability | High (when approved) | Medium (authwalls, blocks) | High (maintained infrastructure) |
| Legal Risk | None (authorized) | Low for public data (hiQ ruling) | API handles compliance |

💡 The LinkedIn API Gap

LinkedIn's API is designed for marketing and authentication, not data access. You can post on company pages and sign users in, but you cannot search for people, browse job listings, or access other users' profiles. This is why scraping (especially public pages) remains the primary method for LinkedIn data collection.

Method Comparison

| Criteria | Python + BS4 | Playwright | Node.js + Cheerio | Mantis API |
|---|---|---|---|---|
| Setup Time | 5 min | 10 min | 5 min | 2 min |
| JS Rendering | ❌ | ✅ | ❌ | ✅ |
| Authwall Handling | Manual detection | Manual workaround | Manual detection | Built-in |
| Anti-Detection | Basic | Good (with stealth) | Basic | Built-in |
| Speed | Fast | Slow | Fast | Medium |
| Maintenance | High | High | High | None |
| Scale | Low | Low | Medium | High |
| Cost (5K profiles/mo) | $100-300 (proxies) | $200-500 (proxies + compute) | $100-300 (proxies) | $99 (Pro plan) |
| Best For | Job listings | Full profiles | Serverless jobs | Production |
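To compare the cost rows directly, divide by volume. A quick back-of-envelope using the figures from the table above (ranges are the table's estimates, not quotes):

```python
# cost_per_profile.py -- normalize monthly cost to cost per profile
volume = 5_000  # profiles per month, matching the table's cost row

# (low, high) monthly cost in USD, taken from the comparison table
costs = {
    "Python + BS4 (proxies)": (100, 300),
    "Playwright (proxies + compute)": (200, 500),
    "Node.js + Cheerio (proxies)": (100, 300),
    "Mantis API (Pro plan)": (99, 99),
}

for method, (lo, hi) in costs.items():
    print(f"{method}: ${lo / volume:.3f}-${hi / volume:.3f} per profile")
```

At this volume every option lands in the cents-per-profile range; the real differentiators are the maintenance and authwall-handling rows.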

Real-World Use Cases

1. Recruiter Lead Generation Tool

Build a pipeline that finds candidates matching specific criteria (title, skills, location) and enriches them with LinkedIn profile data for outreach.

# recruiter_tool.py
import requests
import json

MANTIS_KEY = "YOUR_API_KEY"

def find_candidates(
    title: str, location: str, skills: list
) -> list:
    """Find candidates matching criteria via LinkedIn."""
    # Use Google to find LinkedIn profiles
    query = (
        f'site:linkedin.com/in/ "{title}" "{location}" '
        + " ".join(f'"{s}"' for s in skills[:3])
    )
    search_resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": f"https://www.google.com/search?q={query}&num=10",
            "extract": {
                "results": (
                    "array of search results with: "
                    "title, url, snippet"
                ),
            },
            "render_js": True,
        },
    )
    results = search_resp.json().get("results", [])

    # Enrich each profile
    candidates = []
    for r in results:
        if "linkedin.com/in/" not in r.get("url", ""):
            continue
        profile_resp = requests.post(
            "https://api.mantisapi.com/v1/scrape",
            headers={
                "Authorization": f"Bearer {MANTIS_KEY}",
                "Content-Type": "application/json",
            },
            json={
                "url": r["url"],
                "extract": {
                    "name": "full name",
                    "headline": "professional headline",
                    "location": "location",
                    "current_company": "current employer",
                    "current_title": "current job title",
                    "experience_years": "total years of experience (estimate)",
                    "skills": "key skills (array, top 10)",
                },
                "render_js": True,
            },
        )
        candidate = profile_resp.json()
        candidate["linkedin_url"] = r["url"]
        candidates.append(candidate)

    return candidates

candidates = find_candidates(
    title="Senior ML Engineer",
    location="San Francisco",
    skills=["PyTorch", "LLM", "MLOps"],
)
print(f"Found {len(candidates)} candidates")
for c in candidates:
    print(f"  {c.get('name')} โ€” {c.get('headline')}")
    print(f"  {c.get('linkedin_url')}\n")

2. Job Market Analyzer

Track hiring trends, salary ranges, and skill demand across industries โ€” useful for career platforms, salary benchmarking tools, and workforce analytics.

# job_analyzer.py
import requests
import json
from collections import Counter
from datetime import datetime

MANTIS_KEY = "YOUR_API_KEY"

def analyze_job_market(
    keyword: str, location: str = ""
) -> dict:
    """Analyze job listings for trends."""
    # Scrape job listings
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                f"https://www.linkedin.com/jobs/search/"
                f"?keywords={keyword}&location={location}"
            ),
            "extract": {
                "jobs": (
                    "array of all visible jobs with: title, "
                    "company, location, seniority_level, "
                    "employment_type, posted_date"
                ),
            },
            "render_js": True,
        },
    )
    jobs = resp.json().get("jobs", [])

    # Analyze trends
    companies = Counter(
        j.get("company", "Unknown") for j in jobs
    )
    locations = Counter(
        j.get("location", "Unknown") for j in jobs
    )
    seniority = Counter(
        j.get("seniority_level", "Unknown") for j in jobs
    )

    return {
        "keyword": keyword,
        "total_listings": len(jobs),
        "top_hiring_companies": dict(companies.most_common(10)),
        "top_locations": dict(locations.most_common(10)),
        "seniority_distribution": dict(seniority),
        "analyzed_at": datetime.utcnow().isoformat(),
    }

report = analyze_job_market("AI engineer", "United States")
print(json.dumps(report, indent=2))

3. AI Agent Talent Sourcing

Give an AI agent the ability to search for professionals and generate outreach recommendations โ€” a core building block for AI-powered recruiting and sales assistants.

# agent_talent.py โ€” LangChain tool for LinkedIn research
from langchain.tools import tool
import requests

MANTIS_KEY = "YOUR_API_KEY"

@tool
def search_linkedin_people(query: str) -> str:
    """Search for professionals on LinkedIn and return
    their profiles with current roles and companies."""
    # Google X-ray search for LinkedIn profiles
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                "https://www.google.com/search?"
                f"q=site:linkedin.com/in/+{query}&num=5"
            ),
            "extract": {
                "profiles": (
                    "array of LinkedIn profiles with: "
                    "name, title, company, linkedin_url"
                ),
            },
            "render_js": True,
        },
    )
    profiles = resp.json().get("profiles", [])

    if not profiles:
        return f"No LinkedIn profiles found for '{query}'"

    result = f"LinkedIn profiles matching '{query}':\n\n"
    for i, p in enumerate(profiles, 1):
        result += (
            f"{i}. {p.get('name', 'N/A')}\n"
            f"   {p.get('title', 'N/A')} at "
            f"{p.get('company', 'N/A')}\n"
            f"   {p.get('linkedin_url', '')}\n\n"
        )
    return result

@tool
def search_linkedin_jobs(query: str) -> str:
    """Search LinkedIn for job openings and return
    listings with company, location, and posting date."""
    resp = requests.post(
        "https://api.mantisapi.com/v1/scrape",
        headers={
            "Authorization": f"Bearer {MANTIS_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "url": (
                "https://www.linkedin.com/jobs/search/"
                f"?keywords={query}"
            ),
            "extract": {
                "jobs": (
                    "array of top 5 jobs with: title, "
                    "company, location, posted_date, url"
                ),
            },
            "render_js": True,
        },
    )
    jobs = resp.json().get("jobs", [])

    if not jobs:
        return f"No jobs found for '{query}'"

    result = f"LinkedIn jobs for '{query}':\n\n"
    for i, j in enumerate(jobs, 1):
        result += (
            f"{i}. {j.get('title', 'N/A')} at "
            f"{j.get('company', 'N/A')}\n"
            f"   ๐Ÿ“ {j.get('location', 'N/A')} | "
            f"Posted: {j.get('posted_date', 'N/A')}\n\n"
        )
    return result

# Use in a LangChain agent
# agent = create_agent(
#     tools=[search_linkedin_people, search_linkedin_jobs]
# )

Is LinkedIn Scraping Legal?

LinkedIn scraping has the most developed legal precedent of any website, thanks to the landmark hiQ Labs v. LinkedIn case, which found that scraping publicly accessible LinkedIn data does not violate the CFAA.

โš–๏ธ Best Practices for LinkedIn Scraping

- Only scrape publicly accessible pages (no login required).
- Respect rate limits; don't hammer LinkedIn's servers.
- Don't store personal data longer than necessary.
- Comply with GDPR/CCPA if applicable.
- Never scrape private messages, connection lists, or data behind authentication.
- Document your legitimate business purpose.
- Consult legal counsel for commercial use.
- Consider a web scraping API that handles compliance for you.
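The retention point ("don't store personal data longer than necessary") can be enforced with a simple purge pass. A sketch assuming each stored record carries a `scraped_at` ISO-8601 timestamp (a field this guide introduces for illustration):

```python
# retention.py -- drop scraped records older than a fixed window
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # pick a window that matches your stated purpose

def purge_stale(records: list[dict]) -> list[dict]:
    """Keep only records scraped within the retention window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    return [
        r for r in records
        if datetime.fromisoformat(r["scraped_at"]) >= cutoff
    ]

records = [
    {"name": "old", "scraped_at": "2020-01-01T00:00:00+00:00"},
    {"name": "new", "scraped_at": datetime.now(timezone.utc).isoformat()},
]
print([r["name"] for r in purge_stale(records)])  # ['new']
```

Run a pass like this on a schedule so stale personal data never lingers in your store by default.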

Production-Ready LinkedIn Data Extraction

Stop fighting authwalls, IP bans, and broken selectors. Mantis extracts structured LinkedIn data with a single API call.


Frequently Asked Questions

Is it legal to scrape LinkedIn?

The hiQ Labs v. LinkedIn ruling (2022) established that scraping publicly accessible LinkedIn data does not violate the CFAA. However, LinkedIn's Terms of Service still prohibit scraping (a contract matter, not criminal). Scraping data behind login walls carries more risk. Consult legal counsel for commercial use.

How do I scrape LinkedIn without getting blocked?

Use rotating residential proxies, add random delays (5-20 seconds), rotate User-Agents, limit to under 100 profiles per IP per day, avoid scraping while logged in, and handle authwall detection. Or use a web scraping API that handles this automatically.
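The per-IP budget mentioned above can be enforced with a small counter. A sketch (the 100/day figure is this guide's guideline, not a published LinkedIn limit):

```python
# proxy_quota.py -- refuse work once a proxy has spent its daily budget
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 100  # profiles per IP per day, per the guideline above

class ProxyQuota:
    def __init__(self):
        self.day = date.today()
        self.counts = defaultdict(int)

    def allow(self, proxy: str) -> bool:
        """Reserve one request slot for this proxy; False if exhausted."""
        if date.today() != self.day:  # new day: reset all counters
            self.day, self.counts = date.today(), defaultdict(int)
        if self.counts[proxy] >= DAILY_LIMIT:
            return False
        self.counts[proxy] += 1
        return True

quota = ProxyQuota()
print(all(quota.allow("proxy1") for _ in range(100)))  # True
print(quota.allow("proxy1"))                           # False
```

Check `quota.allow(proxy)` before each request and rotate to the next proxy in the pool when it returns `False`.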

Can I scrape LinkedIn job listings?

Yes. LinkedIn job listings at linkedin.com/jobs are publicly accessible without login. You can extract job titles, companies, locations, descriptions, and posting dates.

What data can I extract from LinkedIn profiles?

From public profiles: name, headline, current company/title, location, about section, experience history, education, skills, and certifications. Visibility depends on individual privacy settings.

Does LinkedIn have an official API for profile data?

LinkedIn's API only allows access to the authenticated user's own basic profile. There is no API to search for people, browse others' profiles, or access job listings in bulk. The API is designed for marketing and authentication, not data access.

How many LinkedIn profiles can I scrape per day?

Without proxies: 50-80 before blocking. With rotating residential proxies: 500-2,000. With Mantis API: up to 100,000/month on the Scale plan.
