# proweb > Advanced web search + scraping with no API key. Search via DuckDuckGo (instant answers), Bing (scraping), or all sources. Scrape pages for text, links, images, tables. Full HTML parsing with BeautifulSoup. Use for research, content extraction, web automation, and data gathering without Brave API or rate limits. - Author: shumaker-openclawbot - Repository: shumaker-openclawbot/proweb-skill - Version: 20260207083811 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/shumaker-openclawbot/proweb-skill - Web: https://mule.run/skillshub/@@shumaker-openclawbot/proweb-skill~proweb:20260207083811 --- --- name: proweb description: Advanced web search + scraping with no API key. Search via DuckDuckGo (instant answers), Bing (scraping), or all sources. Scrape pages for text, links, images, tables. Full HTML parsing with BeautifulSoup. Use for research, content extraction, web automation, and data gathering without Brave API or rate limits. --- # proweb v2 - Advanced Web Search & Scraping Industrial-strength web search and scraping without API keys. Multiple sources, full content extraction, no rate limits. ## Quick Start ### Search (Default: DuckDuckGo) ```bash proweb "your search query" --count 5 proweb "Python programming" --count 3 proweb "OpenClaw" --source bing --count 10 ``` ### Scrape First Result ```bash proweb "Python" --scrape # Scrape content from first result ``` ### Scrape Specific URL ```bash proweb --scrape-url "https://example.com" --extract all proweb --scrape-url "https://example.com" --extract text proweb --scrape-url "https://example.com" --extract links proweb --scrape-url "https://example.com" --extract images ``` ## Features ### Search Sources | Source | Speed | Depth | Reliability | Notes | |--------|-------|-------|-------------|-------| | **DuckDuckGo** | ⚡ Fast | Medium | Very High | Instant answers + related topics. No rate limits. Preferred. | | **Bing** | ⚡ Fast | Deep | High | Direct scraping. More results than DDG. | | **Google** | ⚡ Fast | Deep | Low* | Rate-limited, often blocked. Use sparingly. | | **All** | Medium | Deep | High | Combines DDG + Bing. Best coverage. | *Google actively blocks scrapers. DDG/Bing are more reliable. ### Scraping Modes When scraping a URL, extract: - **`text`** - Main text content (2000 char limit) - **`links`** - All hyperlinks with text + href (20 link limit) - **`images`** - All images with src, alt, title (10 image limit) - **`tables`** - Tabular data (3 table limit, 10 rows each) - **`all`** - Everything above ### Output Format All responses are JSON: ```json { "query": "Python", "count": 3, "source": "duckduckgo", "results": [ { "title": "Python (programming language)", "url": "https://...", "snippet": "A high-level, general-purpose...", "source": "duckduckgo-related", "scraped_content": "Optional: scraped text from URL" } ] } ``` For URL scraping: ```json { "url": "https://example.com", "title": "Example Domain", "text": "Full text content", "text_length": 142, "links": [{ "text": "Learn more", "href": "https://..." }], "link_count": 1, "images": [{ "src": "https://...", "alt": "description", "title": "title" }], "image_count": 0, "tables": [[[...]]], "table_count": 0 } ``` ## Usage Examples ### Example 1: Research a Topic ```bash proweb "machine learning best practices" --count 5 --source all ``` Returns 5 best results from DDG + Bing, combined and ranked. ### Example 2: Find Documentation ```bash proweb "Python documentation" --scrape ``` Searches for Python docs, then scrapes the first result to extract content preview. ### Example 3: Extract Data from URL ```bash proweb --scrape-url "https://example.com/data" --extract tables ``` Extracts all tables from a specific URL (useful for data gathering). ### Example 4: Gather Links from Page ```bash proweb --scrape-url "https://github.com/openclaw/openclaw" --extract links ``` Extracts all hyperlinks from a page for further analysis. ## Technical Details ### Requirements - Python 3.7+ - `curl` (for HTTP requests) - `beautifulsoup4` (for scraping) - `lxml` (for fast HTML parsing) ### Dependencies Installed ``` requests beautifulsoup4 lxml httpx selenium scrapy playwright ``` ### How It Works 1. **Search Mode:** - DuckDuckGo: Uses public JSON API (no scraping needed) - Bing/Google: Direct HTML scraping with BeautifulSoup - Results deduplicated and ranked 2. **Scrape Mode:** - Fetches HTML via curl with browser-like headers - Parses with BeautifulSoup + lxml - Extracts specified content types - Limits results (text, links, images) to prevent bloat ### Rate Limiting - **None imposed by proweb** — relies on target site limits - DuckDuckGo: Unlimited (API-based) - Bing: Very high (generous rate limits) - Google: Low (actively blocks scrapers) - Recommended: Use DDG or Bing for production ### Performance - Average search: 2-5 seconds - Average scrape: 3-8 seconds (depends on page size) - Timeout: 15 seconds per request ## Limitations - **Google scraping:** Often returns 0 results (rate limited) - **JavaScript-heavy sites:** Can't render JS (use agent-browser for that) - **Login-required pages:** Won't work without auth - **Cloudflare/WAF sites:** May be blocked - **PDF/binary files:** Can't be scraped directly ## When to Use proweb ✅ **Use proweb when:** - You need web search without paying for APIs - You want to extract text, links, images from pages - You're researching or gathering data - You need reliable, fast searches - Rate limits don't apply to your use case ❌ **Don't use proweb for:** - High-volume scraping (use dedicated scrapers) - JavaScript-rendered content (use agent-browser or Playwright) - Bypassing paywalls or login walls - Commercial data harvesting (check ToS) ## Troubleshooting **Empty results on Google?** - Google actively blocks scrapers. Use Bing or DDG instead. - If you need Google: `--source all` will still try. **Timeout errors?** - Network latency. Retry after a few seconds. - Large pages might timeout. Reduce expectations. **Rate limited?** - Bing/DDG rarely rate limit. Google will. - If hammering a site, space out requests. **Missing content when scraping?** - If page is JavaScript-heavy, proweb can't render it. - Use `agent-browser` skill instead. ## Script Location - `scripts/search.py` - Main executable (11KB, lightweight) ## Examples: Command Line ```bash # Search Python stuff python3 search.py "Python" --count 5 # Try all sources python3 search.py "web scraping" --source all --count 10 # Scrape first result python3 search.py "OpenClaw" --scrape # Scrape specific URL for all data python3 search.py --scrape-url "https://docs.python.org" --extract all # Just get links from a page python3 search.py --scrape-url "https://github.com/trending" --extract links # Get table data from URL python3 search.py --scrape-url "https://en.wikipedia.org/wiki/Python_(programming_language)" --extract tables ``` --- **proweb v2: The free, unrestricted web search tool for AI agents. 🌀** Built with BeautifulSoup, curl, and chaos. No API keys. No restrictions. Just data.