Web Scraper

Extract structured data from websites using BeautifulSoup and requests - turn any webpage into usable data.

When to Use This Skill

- Competitor research - Scrape pricing, features, positioning
- Lead generation - Extract contact info from directories
- Content audit - Pull headings, links, meta data
- Price monitoring - Track competitor pricing changes
- Data collection - Gather research data from multiple sources

What Claude Does vs What You Decide

| Claude Does | You Decide |
| --- | --- |
| Structures analysis frameworks | Strategic priorities |
| Synthesizes market data | Competitive positioning |
| Identifies opportunities | Resource allocation |
| Creates strategic options | Final strategy selection |
| Suggests implementation approaches | Execution decisions |

Dependencies

    pip install beautifulsoup4 requests pandas click lxml

Commands

Scrape Elements

    python scripts/main.py scrape https://example.com --selector "h1,h2,p"
    python scripts/main.py scrape https://example.com --selector ".product-price"

Extract Links

    python scripts/main.py links https://example.com
    python scripts/main.py links https://example.com --internal-only

Extract Emails

    python scripts/main.py emails https://example.com
    python scripts/main.py emails https://example.com --depth 2

Extract Structured Data

    python scripts/main.py structured https://example.com/article --schema article
    python scripts/main.py structured https://example.com/product --schema product

Examples

Example 1: Scrape Competitor Pricing

    python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"

Output:

    Extracted 6 elements
    1. Starter - $29/mo
    2. Pro - $99/mo
    3. Enterprise - Contact us
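
As a rough illustration of what selector-based extraction like Example 1 involves, the sketch below applies the same `.price,.plan-name` selector with BeautifulSoup's `select()`. The sample HTML and the `select_text` helper are hypothetical, and this is not the code in `scripts/main.py`; a real run would first fetch the page with requests.

```python
from bs4 import BeautifulSoup

# Hypothetical pricing markup standing in for a fetched page.
SAMPLE_HTML = """
<div class="plan"><span class="plan-name">Starter</span><span class="price">$29/mo</span></div>
<div class="plan"><span class="plan-name">Pro</span><span class="price">$99/mo</span></div>
<div class="plan"><span class="plan-name">Enterprise</span><span class="price">Contact us</span></div>
"""


def select_text(html: str, selector: str) -> list[str]:
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # A comma-separated selector matches any of its parts.
    return [element.get_text(strip=True) for element in soup.select(selector)]


texts = select_text(SAMPLE_HTML, ".price,.plan-name")
print(f"Extracted {len(texts)} elements")
for index, text in enumerate(texts, start=1):
    print(f"{index}. {text}")
```

Each plan contributes two matches here (its name and its price), which is one plausible reason a count of six elements can correspond to three plans.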

Example 2: Extract Article Content

    python scripts/main.py structured https://blog.example.com/post --schema article

Output: article_data.json

    {
      "title": "How to Scale Your Startup",
      "author": "Jane Doe",
      "date": "2024-01-15",
      "content": "...",
      "word_count": 1523
    }
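
To make the shape of that JSON more concrete, here is a minimal sketch of how an article record with these five fields could be assembled with BeautifulSoup. The sample markup, the selectors (`article h1`, `meta[name="author"]`, `time[datetime]`, `article p`), and the `extract_article` helper are all assumptions for illustration; the actual `article` schema in `scripts/main.py` may locate these fields differently.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical article markup; real pages vary widely in structure.
SAMPLE_ARTICLE = """
<html><head>
<meta name="author" content="Jane Doe">
</head><body>
<article>
<h1>How to Scale Your Startup</h1>
<time datetime="2024-01-15">January 15, 2024</time>
<p>Growth starts with focus.</p>
<p>Hire slowly and measure everything.</p>
</article>
</body></html>
"""


def extract_article(html: str) -> dict:
    """Collect title, author, date, content, and word count from article HTML."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.select_one("article h1")
    author = soup.select_one('meta[name="author"]')
    date = soup.select_one("time[datetime]")
    paragraphs = [p.get_text(" ", strip=True) for p in soup.select("article p")]
    content = "\n\n".join(paragraphs)
    return {
        # Missing fields become None instead of raising an error.
        "title": heading.get_text(strip=True) if heading else None,
        "author": author.get("content") if author else None,
        "date": date.get("datetime") if date else None,
        "content": content,
        "word_count": len(content.split()),
    }


article = extract_article(SAMPLE_ARTICLE)
print(json.dumps(article, indent=2))
```

Counting words by splitting on whitespace is a simple choice; a tool that strips navigation or boilerplate text first would report a different number for the same page.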

CSS Selector Reference

| Selector | Description | Example |
| --- | --- | --- |
| tag | Element type | h1, p, div |
| .class | Class name | .price, .title |
| #id | Element ID | #main-content |
| tag.class | Tag with class | div.product |
| tag[attr] | Has attribute | a[href] |
| parent > child | Direct child | ul > li |
| tag1, tag2 | Multiple | h1, h2, h3 |

Ethical Scraping Guidelines

- Check robots.txt - Respect site's scraping policy
- Rate limit - Don't overload servers (1-2 req/sec)
- Identify yourself - Use descriptive User-Agent
- Cache requests - Don't re-scrape unchanged pages
- Terms of Service - Check if scraping is allowed

Skill Boundaries

What This Skill Does Well

- Structuring strategic analysis
- Identifying market opportunities
- Creating strategic frameworks
- Synthesizing competitive data

What This Skill Cannot Do

- Replace market research
- Guarantee strategic success
- Know proprietary competitor info
- Make executive decisions
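
Three of the ethical scraping guidelines listed earlier (check robots.txt, rate limit, identify yourself) can be sketched with the Python standard library alone. The user agent string, the sample robots.txt content, and the `RateLimiter` class below are hypothetical; this is one possible approach under those assumptions, not a description of how this skill's scripts behave.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical descriptive User-Agent; use a real contact address in practice.
USER_AGENT = "example-research-bot/1.0 (contact@example.com)"

# Hypothetical robots.txt content; a live check would download it from the site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float = 0.5):
        # 0.5 seconds between requests is about 2 requests per second.
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the delay applied."""
        delay = 0.0
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                time.sleep(delay)
        self._last = time.monotonic()
        return delay


parser = build_parser(SAMPLE_ROBOTS)
allowed = parser.can_fetch(USER_AGENT, "https://example.com/pricing")
blocked = parser.can_fetch(USER_AGENT, "https://example.com/private/data")
print(f"/pricing allowed: {allowed}, /private/data allowed: {blocked}")
```

In a scraping loop, calling `wait()` before each request and skipping any URL where `can_fetch` returns `False` covers the first two guidelines; sending `USER_AGENT` in the request headers covers the third.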
