# browser-os

> Control browser via BrowserOS MCP - use JavaScript and screenshots together for reliable extraction

- Author: KIrill Enkogu
- Repository: enkogu/pipeline-fullstack-app
- Version: 20260125155958
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/enkogu/pipeline-fullstack-app
- Web: https://mule.run/skillshub/@@enkogu/pipeline-fullstack-app~browser-os:20260125155958

---

---
name: browser-os
description: Control browser via BrowserOS MCP - use JavaScript and screenshots together for reliable extraction
---

# BrowserOS Skill

Browser automation via BrowserOS MCP. Enables navigating websites, extracting content, clicking elements, and executing JavaScript.

**Authentication**: User is already logged in to all services. Do not ask about login - just navigate and proceed.

## REQUIRED: Check Availability First

**ALWAYS start with availability check.** This verifies the helper extension is connected.

```
run_skill("browser-os/check-availability.py", "")
```

Returns:
- `{"success": true, "port": 9129}` → Proceed with your task
- `{"success": false, "error": "..."}` → Report to user, BrowserOS needs manual attention

Auto-recovery is built-in: if helper is disconnected, it will restart BrowserOS and retry.

## Critical: Wait for Page Load

After navigation, **ALWAYS wait for the page to fully load** before extracting content:

```python
# 1. Navigate
run_skill("browser-os/navigate.py", "https://x.com/home")

# 2. Wait and check load status (poll until complete)
run_skill("browser-os/get-load-status.py", "TAB_ID")
# Keep checking until: "DOM Content Loaded: Yes" AND "Page Complete: Yes"

# 3. Only then extract content
run_skill("browser-os/get-page-content.py", "TAB_ID text")
```

**If you skip waiting**, you'll get "No content extracted" or empty results.
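The "keep checking" step above can be sketched as a small polling helper. This is only an illustration under stated assumptions: `run_skill` is passed in as a callable that returns the tool output as a string, and `wait_for_load`, `timeout_s`, and `poll_interval_s` are hypothetical names that are not part of the skill.

```python
import time
from typing import Callable


def wait_for_load(
    run_skill: Callable[[str, str], str],
    tab_id: str,
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll get-load-status.py until both load flags read "Yes".

    Assumes run_skill returns the status text as a plain string.
    Returns True once the page is loaded, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = run_skill("browser-os/get-load-status.py", tab_id)
        if "DOM Content Loaded: Yes" in status and "Page Complete: Yes" in status:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

A caller would extract content only when the helper returns `True`, and otherwise report that the page did not finish loading.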
## Content Extraction Strategy

**Use BOTH JavaScript AND screenshots together for reliable extraction.**

| Method | When to Use |
|--------|-------------|
| **execute-javascript.py** | Primary method - structured data, counts, specific elements |
| **screenshot.py** | Use alongside JS - visual verification, complex layouts, when JS returns empty |
| **get-page-content.py** | Quick text overview, finding general content |

**Best Practice**: After JavaScript extraction, take a screenshot to verify what was captured. If JS returns empty/incomplete, screenshot provides visual context to understand why and what's actually on screen.

### get-page-content.py - Text Extraction

```
run_skill("browser-os/get-page-content.py", "TAB_ID text")
run_skill("browser-os/get-page-content.py", "TAB_ID text-with-links")
```

Returns full page text. Good for reading content, finding information.

### execute-javascript.py - Structured Data Extraction

**Best for extracting lists, arrays, or structured data from pages.**

```
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "JavaScript code here"]')
```

**IMPORTANT:**
- Use JSON array format: `'["TAB_ID", "code"]'`
- JavaScript must be single-line (use semicolons, not newlines)
- Top-level `return` statements are auto-wrapped in IIFE

## CRITICAL: JavaScript Return Value Limitations

**Arrays and objects return `{}` (empty object).** This is a known limitation.
✅ **WORKS** - Primitives and strings:
- `document.querySelectorAll("article").length` → `6`
- `document.querySelector("article").innerText` → `"Tweet content..."`
- `Array.from(...).map(...).join(" ### ")` → `"item1 ### item2 ### item3"`

❌ **FAILS** - Returns `{}`:
- `Array.from(document.querySelectorAll("article")).map(...)` → `{}`
- `[{text: "...", date: "..."}]` → `{}`
- `JSON.stringify([...])` → `{}`

**SOLUTION**: Always join arrays to a string with a separator:

```javascript
Array.from(document.querySelectorAll("article"))
  .slice(0, 10)
  .map(el => el.innerText.slice(0, 300))
  .join(" ### ")
```

Then split the result on the separator when you build your response. (The snippet is shown on multiple lines for readability; collapse it to a single line before passing it to `execute-javascript.py`.)

## Robust Extraction Patterns

**For X.com/Twitter tweets:**

```
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "Array.from(document.querySelectorAll(\\\"article\\\")).slice(0,15).map(el => (el.innerText.slice(0,300))).join(\\\" ||| \\\")"]')
```

**For LinkedIn profiles:**

```
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "Array.from(document.querySelectorAll(\\\"[data-view-name=profile-card]\\\")).slice(0,5).map(el => el.innerText.slice(0,200)).join(\\\" ||| \\\")"]')
```

**AVOID fragile selectors** like `data-testid="tweetText"` - these change frequently. Use semantic selectors like `article`, `section`, or visible text patterns.

## Infinite Scroll Collection

When collecting N items from infinite-scroll feeds:

1. **Check initial count**: `document.querySelectorAll("article").length`
2. **Scroll down**: `scroll-down.py`
3. **Check new count** - if unchanged after 2 scrolls, the feed is exhausted
4. **Extract what's available** - if you have 11 but need 15, use 11 and note the limitation
5. **Max 5 scroll attempts** - don't loop indefinitely

```python
# Pattern for collecting items
# (execute_js and scroll_down are shorthand for the execute-javascript.py
#  and scroll-down.py run_skill calls)
count = execute_js("document.querySelectorAll('article').length")  # e.g., 6
scroll_down()
scroll_down()
new_count = execute_js("document.querySelectorAll('article').length")  # e.g., 11
# If new_count == count after scrolling, stop trying
```

**Tip**: For simple extraction, `get-page-content.py` often suffices. Use JavaScript only when you need structured arrays or specific DOM elements.

### screenshot.py - Visual Analysis

**Use screenshots proactively, not just as fallback.** Screenshots help you:
- Verify what JavaScript actually extracted
- Understand page layout when selectors aren't working
- Identify correct elements to target
- Debug why extraction returned empty results

```
run_skill("browser-os/screenshot.py", '["TAB_ID", "What elements are visible?"]')
run_skill("browser-os/screenshot.py", '["TAB_ID", "List all tweets with their text and dates"]')
run_skill("browser-os/screenshot.py", '["TAB_ID", "Query about the page", "large"]')
```

**When to take screenshots:**
- After JavaScript returns `{}` or empty string
- When you're unsure which CSS selectors to use
- After scrolling to verify new content loaded
- To visually confirm extracted data matches what's on screen

Requires BrowserOS window to be in foreground.
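The scroll-collection rules from the "Infinite Scroll Collection" section (stop after two scrolls with no change, never exceed five scrolls) can be sketched as a bounded loop. This is a rough sketch, not the skill's own code: `scroll_until_enough` and its parameters are hypothetical, `run_skill` is assumed to be a callable returning strings, and the count is assumed to come back as text that parses as an integer.

```python
import json
from typing import Callable


def scroll_until_enough(
    run_skill: Callable[[str, str], str],
    tab_id: str,
    wanted: int,
    selector: str = "article",
    max_scrolls: int = 5,
) -> int:
    """Scroll until `wanted` elements exist, the feed stalls, or the cap is hit.

    Returns the number of matching elements found at the end.
    """
    # json.dumps quotes the selector as a JS string literal and then
    # escapes the whole snippet for the JSON array argument.
    count_js = f"document.querySelectorAll({json.dumps(selector)}).length"
    count_args = json.dumps([tab_id, count_js])

    def count() -> int:
        raw = run_skill("browser-os/execute-javascript.py", count_args)
        # Assumption: the result is a number, possibly wrapped in quotes.
        return int(str(raw).strip().strip('"'))

    current = count()
    stalled = 0  # consecutive scrolls that added nothing
    for _ in range(max_scrolls):
        if current >= wanted or stalled >= 2:
            break
        run_skill("browser-os/scroll-down.py", tab_id)
        new_count = count()
        stalled = stalled + 1 if new_count == current else 0
        current = new_count
    return current
```

If the returned count is below `wanted`, extract what is available and note the shortfall, as step 4 of the list recommends.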
## Complete Tool Reference

### Tab Management

| Tool | Usage | Description |
|------|-------|-------------|
| get-active-tab.py | `""` | Get current tab ID and URL |
| list-tabs.py | `""` | List all open tabs |
| open-tab.py | `"URL"` | Open new tab with URL |
| close-tab.py | `"TAB_ID"` | Close tab by ID |
| switch-tab.py | `"TAB_ID"` | Switch to tab |
| get-load-status.py | `"TAB_ID"` | Check if page loaded |

### Navigation

| Tool | Usage | Description |
|------|-------|-------------|
| navigate.py | `"URL"` or `'["URL", "TAB_ID"]'` | Navigate to URL |

### Content Extraction

| Tool | Usage | Description |
|------|-------|-------------|
| get-page-content.py | `"TAB_ID text"` | Extract page text |
| execute-javascript.py | `'["TAB_ID", "code"]'` | Run JavaScript, get result |
| screenshot.py | `'["TAB_ID", "query"]'` | Screenshot with vision analysis |

### Element Interaction

| Tool | Usage | Description |
|------|-------|-------------|
| get-interactive-elements.py | `"TAB_ID"` | Get clickable elements with nodeIds |
| click-element.py | `"TAB_ID NODE_ID"` | Click element by nodeId |
| type-text.py | `"TAB_ID NODE_ID 'text'"` | Type into input |
| clear-input.py | `"TAB_ID NODE_ID"` | Clear input field |
| scroll-to-element.py | `"TAB_ID NODE_ID"` | Scroll element into view |

### Scrolling

| Tool | Usage | Description |
|------|-------|-------------|
| scroll-down.py | `"TAB_ID"` | Scroll down one viewport |
| scroll-up.py | `"TAB_ID"` | Scroll up one viewport |

### Keyboard

| Tool | Usage | Description |
|------|-------|-------------|
| send-keys.py | `"TAB_ID Enter"` | Send key (Enter, Tab, Escape, etc.) |

### Coordinates

| Tool | Usage | Description |
|------|-------|-------------|
| click-coordinates.py | `"TAB_ID X Y"` | Click at x,y position |
| type-at-coordinates.py | `"TAB_ID X Y 'text'"` | Click and type at position |

### Bookmarks & History

| Tool | Usage | Description |
|------|-------|-------------|
| get-bookmarks.py | `""` or `"FOLDER_ID"` | Get bookmarks |
| create-bookmark.py | `"Title URL"` | Create bookmark |
| remove-bookmark.py | `"BOOKMARK_ID"` | Remove bookmark |
| search-history.py | `"query"` or `"query 50"` | Search browser history |
| get-recent-history.py | `""` or `"50"` | Get recent history |

## Example Workflows

### Extract 10 Tweets from X.com (Using Both JS and Screenshots)

```python
# 1. Check availability
run_skill("browser-os/check-availability.py", "")

# 2. Navigate to X.com home
run_skill("browser-os/navigate.py", "https://x.com/home")

# 3. Wait for load
run_skill("browser-os/get-load-status.py", "TAB_ID")

# 4. First, check element count with JavaScript
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "document.querySelectorAll(\"article\").length.toString()"]')
# Result: "5"

# 5. Take screenshot to see what's actually on screen
run_skill("browser-os/screenshot.py", '["TAB_ID", "List all visible tweets with their text and dates"]')
# Vision API will describe the tweets it sees

# 6. Try JavaScript extraction
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "var s=\"\"; document.querySelectorAll(\"article\").forEach(el => { s += el.innerText.slice(0,300) + \" ### \"; }); s;"]')

# 7. If JS returns empty, use screenshot to extract directly
run_skill("browser-os/screenshot.py", '["TAB_ID", "Extract the text and date of each tweet visible on screen. Return as a list."]')

# 8. Scroll and repeat for more tweets
run_skill("browser-os/scroll-down.py", "TAB_ID")
run_skill("browser-os/screenshot.py", '["TAB_ID", "Extract tweets that are now visible"]')
```

**Key insight**: When JavaScript fails or returns incomplete data, screenshots can extract the same information visually.

### Click a Button and Verify

```python
# 1. Get interactive elements
run_skill("browser-os/get-interactive-elements.py", "TAB_ID")
# Find the nodeId for your target button

# 2. Click the element
run_skill("browser-os/click-element.py", "TAB_ID NODE_ID")

# 3. Wait for page update
run_skill("browser-os/get-load-status.py", "TAB_ID")

# 4. Verify the action worked
run_skill("browser-os/get-page-content.py", "TAB_ID text")
```

## Troubleshooting

### "No content extracted"
- **Cause**: Page hasn't fully loaded
- **Fix**: Wait for `get-load-status.py` to return "Page Complete: Yes"

### "BrowserOS helper service not connected"
- **Cause**: Browser extension not connected to MCP server
- **Fix**: check-availability.py auto-recovers. If persistent, user needs to restart BrowserOS manually

### "Failed to capture screenshot"
- **Cause**: BrowserOS window not in foreground
- **Fix**: Use `osascript -e 'tell application "BrowserOS" to activate'` or use text-based extraction instead

### "Request timed out after 60000ms"
- **Cause**: Page is stuck or helper disconnected
- **Fix**: Re-run check-availability.py to trigger recovery

### JavaScript returns `{}` (empty object)
- **Cause**: Returning arrays/objects directly doesn't work
- **Fix**: Use `.join(" ### ")` to convert array to string, then parse separator in response
- See "CRITICAL: JavaScript Return Value Limitations" section above

### JavaScript returns empty string or no items
- **Cause**: Wrong CSS selectors for the page, or content not loaded yet
- **Fix**:
  1. Use get-page-content.py first to verify page has content
  2. Try simpler selectors (e.g., `article` instead of `[data-testid="tweetText"]`)
  3. Check element count first: `document.querySelectorAll("article").length`
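
For the `{}` fix, one way to avoid hand-escaping quotes is to build the single-line JavaScript and the JSON array argument with `json.dumps`, then split the joined result afterwards. The sketch below is illustrative only: `build_js_args`, `split_items`, and `SEPARATOR` are hypothetical names, and the result is assumed to come back as a plain string.

```python
import json
from typing import List

SEPARATOR = " ### "


def build_js_args(tab_id: str, selector: str, limit: int = 10, max_chars: int = 300) -> str:
    """Build the '["TAB_ID", "code"]' argument for execute-javascript.py.

    The generated JavaScript is single-line and returns one joined string,
    since arrays and objects come back as {}.
    """
    js = (
        f"Array.from(document.querySelectorAll({json.dumps(selector)}))"
        f".slice(0,{limit})"
        f".map(el => el.innerText.slice(0,{max_chars}))"
        f".join({json.dumps(SEPARATOR)})"
    )
    return json.dumps([tab_id, js])


def split_items(raw: str) -> List[str]:
    """Split the joined string back into items, dropping empty pieces."""
    return [part.strip() for part in raw.split(SEPARATOR) if part.strip()]
```

The output of `build_js_args("TAB_ID", "article")` would be passed as the second argument to `run_skill("browser-os/execute-javascript.py", ...)`, and `split_items` applied to whatever string comes back. Item text that itself contains the separator will be split incorrectly, so a less common separator may be preferable for some pages.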