# web-search

> Efficiently search the web and scrape pages using local SearXNG + Crawl4AI services, with natural language responses and optional compression

- Author: danwt
- Repository: danwt/claude-search-skill
- Version: 20260114125439
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/danwt/claude-search-skill
- Web: https://mule.run/skillshub/@@danwt/claude-search-skill~web-search:20260114125439

---

---
name: web-search
description: Efficiently search the web and scrape pages using local SearXNG + Crawl4AI services, with natural language responses and optional compression
---

# Web Search & Scrape (Local Services)

Search the web and scrape page content using locally-running services via a proxy on port 8001.

## Prerequisites

Docker services must be running from `~/Documents/repos/sketch-claude-search-skill`:

```bash
cd ~/Documents/repos/sketch-claude-search-skill && docker compose up -d
```

Services:

- Proxy: http://localhost:8001 (use this)
- SearXNG: http://localhost:8080 (direct)
- Crawl4AI: http://localhost:8000 (direct)

## Search the Web

```bash
curl -s "http://localhost:8001/search?q=QUERY&format=json" | jq '.results[:10] | .[] | {title, url, content}'
```

Replace `QUERY` with URL-encoded search terms. Use `+` for spaces.

Add `&compress=true&instruction=YOUR+INSTRUCTION` to have a cheap LLM process the results.
The instruction is natural language, e.g.:

- `brief summary of key facts`
- `detailed analysis preserving all information`
- `just the URLs and titles`
- `extract only pricing information`

### Search Examples

```bash
# Basic search
curl -s "http://localhost:8001/search?q=rust+async+tutorial&format=json" | jq '.results[:10]'

# Compressed search with natural language instruction
curl -s "http://localhost:8001/search?q=rust+async+tutorial&format=json&compress=true&instruction=summarize+the+top+results+with+links"

# Detailed compression
curl -s "http://localhost:8001/search?q=weather+london&format=json&compress=true&instruction=extract+temperature+and+conditions+for+today+and+tomorrow"

# Page 2 of results
curl -s "http://localhost:8001/search?q=query&format=json&pageno=2" | jq '.results'

# Search specific category (images, news, videos, science, files, it)
curl -s "http://localhost:8001/search?q=query&format=json&categories=science" | jq '.results'

# Time filter (day, week, month, year)
curl -s "http://localhost:8001/search?q=query&format=json&time_range=week" | jq '.results'
```

## Scrape a URL

```bash
curl -s -X POST "http://localhost:8001/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' | jq '.markdown'
```

Add `"compress": true, "instruction": "your instruction"` to the JSON body to have a cheap LLM process the page content.

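When the URL or the instruction comes from a variable rather than a literal, the JSON body for `/crawl` has to be assembled with its quoting intact. The following is a minimal sketch of one way to do that in plain shell. `json_escape` and `crawl_body` are hypothetical helper names introduced here for illustration and are not part of the skill. The escaping covers only backslashes and double quotes, so it assumes single-line input.

```bash
# Hypothetical helper (not part of the skill): escape a string for use
# inside a JSON string literal. Only backslashes and double quotes are
# handled; newlines and other control characters are NOT escaped.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print a JSON body for POST /crawl.
#   $1 = URL to scrape
#   $2 = optional natural language instruction (turns on compression)
# The body runs in a subshell so the variables stay local.
crawl_body() (
  body="{\"url\": \"$(json_escape "$1")\""
  if [ -n "${2:-}" ]; then
    body="${body}, \"compress\": true, \"instruction\": \"$(json_escape "$2")\""
  fi
  printf '%s}\n' "$body"
)

# Usage (requires the proxy from Prerequisites to be running):
#   curl -s -X POST "http://localhost:8001/crawl" \
#     -H "Content-Type: application/json" \
#     -d "$(crawl_body "https://example.com" "brief summary")"
```

If the input may contain newlines or other control characters, building the body with `jq -n --arg` is likely the more robust choice, since `jq` is already used throughout this skill.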
### Scrape Examples

```bash
# Full response with metadata
curl -s -X POST "http://localhost:8001/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' | jq '{markdown, metadata, links}'

# Compressed scrape with brief summary
curl -s -X POST "http://localhost:8001/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "compress": true, "instruction": "brief summary"}'

# Compressed scrape with detailed extraction
curl -s -X POST "http://localhost:8001/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.example.com", "compress": true, "instruction": "extract all API endpoints and their parameters"}'

# Target specific CSS selector
curl -s -X POST "http://localhost:8001/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "css_selector": "article"}' | jq '.markdown'

# Longer timeout for slow sites
curl -s -X POST "http://localhost:8001/crawl" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "timeout": 60}' | jq '.markdown'
```

## Health Check

```bash
curl -s "http://localhost:8001/health" | jq
```

## Troubleshooting

```bash
# Check container status
docker ps | grep -E "searxng|crawl4ai|redis|proxy"

# Restart services
cd ~/Documents/repos/sketch-claude-search-skill && docker compose restart

# View logs
docker logs search-proxy
docker logs searxng
docker logs crawl4ai-service
```

## When to Use This

- Searching for current information beyond your knowledge cutoff
- Getting full content from a URL (not just snippets)
- Researching topics that need multiple sources
- Scraping JavaScript-heavy sites that WebFetch can't handle
- Use compression with specific instructions to control how much detail you get back
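
The search calls in this document expect the query and the instruction to be URL-encoded already, with `+` for spaces. As a rough sketch of how that encoding could be automated, the helpers below build a search URL from plain text. `urlencode` and `search_url` are hypothetical names, not part of the skill, and the encoder is written with ASCII input in mind.

```bash
# Hypothetical helper (not part of the skill): percent-encode a string for
# a query parameter, using "+" for spaces. Written with ASCII input in mind.
urlencode() (
  LC_ALL=C            # treat the input as bytes
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}       # everything after the first character
    c=${s%"$rest"}    # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      ' ')             out=$out+ ;;
      *)               out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
)

# Hypothetical helper: print a proxy search URL.
#   $1 = search terms
#   $2 = optional natural language instruction (turns on compression)
search_url() (
  url="http://localhost:8001/search?q=$(urlencode "$1")&format=json"
  if [ -n "${2:-}" ]; then
    url="${url}&compress=true&instruction=$(urlencode "$2")"
  fi
  printf '%s\n' "$url"
)

# Usage (requires the proxy from Prerequisites to be running):
#   curl -s "$(search_url 'rust async tutorial')" | jq '.results[:10]'
#   curl -s "$(search_url 'weather london' 'brief summary of key facts')"
```

Keeping URL construction separate from the `curl` call makes it easy to print and inspect the URL before sending a request.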