# browser-os

> Control browser via BrowserOS MCP - use JavaScript and screenshots together for reliable extraction

- Author: Kirill Enkogu
- Repository: enkogu/pipeline-fullstack-app
- Version: 20260125155958
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/enkogu/pipeline-fullstack-app
- Web: https://mule.run/skillshub/@@enkogu/pipeline-fullstack-app~browser-os:20260125155958

---

# BrowserOS Skill

Browser automation via BrowserOS MCP. Enables navigating websites, extracting content, clicking elements, and executing JavaScript.

**Authentication**: User is already logged in to all services. Do not ask about login - just navigate and proceed.

## REQUIRED: Check Availability First

**ALWAYS start with an availability check.** It verifies that the helper extension is connected.

```
run_skill("browser-os/check-availability.py", "")
```

Returns:
- `{"success": true, "port": 9129}` → Proceed with your task
- `{"success": false, "error": "..."}` → Report the error to the user; BrowserOS needs manual attention

Auto-recovery is built in: if the helper is disconnected, the availability check restarts BrowserOS and retries.
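
A minimal sketch of acting on that result, assuming `run_skill` hands back the JSON text shown above as a string (the return type is not specified here, and `parse_availability` is a hypothetical helper name):

```python
import json
from typing import Tuple

def parse_availability(raw: str) -> Tuple[bool, str]:
    """Return (ok, detail) from the check-availability output.

    Assumes `raw` is the JSON text shown above. Anything unparseable is
    treated as a failure so the caller can report it to the user.
    """
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return False, f"unparseable response: {raw!r}"
    if isinstance(data, dict) and data.get("success") is True:
        return True, f"port {data.get('port')}"
    if isinstance(data, dict):
        return False, str(data.get("error", "unknown error"))
    return False, str(data)

ok, detail = parse_availability('{"success": true, "port": 9129}')
print(ok, detail)
```

If `ok` is false, pass `detail` on to the user instead of continuing with the task.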

## Critical: Wait for Page Load

After navigation, **ALWAYS wait for the page to fully load** before extracting content:

```python
# 1. Navigate
run_skill("browser-os/navigate.py", "https://x.com/home")

# 2. Wait and check load status (poll until complete)
run_skill("browser-os/get-load-status.py", "TAB_ID")
# Keep checking until: "DOM Content Loaded: Yes" AND "Page Complete: Yes"

# 3. Only then extract content
run_skill("browser-os/get-page-content.py", "TAB_ID text")
```

**If you skip waiting**, you'll get "No content extracted" or empty results.
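
The polling step can be sketched as a small loop. This is an illustration only: `wait_for_load` is a hypothetical helper, `run_skill` is passed in as a callable because the host environment provides it, and the attempt count and delay are assumed values, not documented defaults.

```python
import time
from typing import Callable

def wait_for_load(run_skill: Callable[[str, str], str], tab_id: str,
                  attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll get-load-status.py until both load markers report Yes."""
    for attempt in range(attempts):
        status = str(run_skill("browser-os/get-load-status.py", tab_id))
        if "DOM Content Loaded: Yes" in status and "Page Complete: Yes" in status:
            return True
        # Sleep between polls, but not after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A `False` result means the page never reported complete within the allotted attempts; extracting at that point is likely to return empty content.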

## Content Extraction Strategy

**Use BOTH JavaScript AND screenshots together for reliable extraction.**

| Method | When to Use |
|--------|-------------|
| **execute-javascript.py** | Primary method - structured data, counts, specific elements |
| **screenshot.py** | Use alongside JS - visual verification, complex layouts, when JS returns empty |
| **get-page-content.py** | Quick text overview, finding general content |

**Best Practice**: After JavaScript extraction, take a screenshot to verify what was captured. If JS returns empty or incomplete results, a screenshot provides the visual context to understand why and to see what is actually on screen.

### get-page-content.py - Text Extraction

```
run_skill("browser-os/get-page-content.py", "TAB_ID text")
run_skill("browser-os/get-page-content.py", "TAB_ID text-with-links")
```

Returns full page text. Good for reading content, finding information.

### execute-javascript.py - Structured Data Extraction

**Best for extracting lists, arrays, or structured data from pages.**

```
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "JavaScript code here"]')
```

**IMPORTANT:**
- Use JSON array format: `'["TAB_ID", "code"]'`
- JavaScript must be single-line (use semicolons, not newlines)
- Top-level `return` statements are auto-wrapped in an IIFE
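
One way to satisfy the first two rules without escaping quotes by hand is to let Python's `json` module build the argument. The sketch below uses a hypothetical helper name, `build_js_arg`; collapsing lines with spaces is an assumption that only holds when statements end with semicolons and the snippet contains no `//` comments.

```python
import json

def build_js_arg(tab_id: str, code: str) -> str:
    """Build the '["TAB_ID", "code"]' argument for execute-javascript.py."""
    # Collapse the snippet to a single line, dropping blank lines.
    one_line = " ".join(line.strip() for line in code.splitlines() if line.strip())
    # json.dumps produces the JSON array and escapes embedded double quotes.
    return json.dumps([tab_id, one_line])

arg = build_js_arg("TAB_ID", 'document.querySelectorAll("article").length')
print(arg)
```

The resulting string can be passed as the second argument to `run_skill("browser-os/execute-javascript.py", ...)`.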

## CRITICAL: JavaScript Return Value Limitations

**Arrays and objects return `{}` (empty object).** This is a known limitation.

✅ **WORKS** - Primitives and strings:
- `document.querySelectorAll("article").length` → `6`
- `document.querySelector("article").innerText` → `"Tweet content..."`
- `Array.from(...).map(...).join(" ### ")` → `"item1 ### item2 ### item3"`

❌ **FAILS** - Returns `{}`:
- `Array.from(document.querySelectorAll("article")).map(...)` → `{}`
- `[{text: "...", date: "..."}]` → `{}`
- `JSON.stringify([...])` → `{}`

**SOLUTION**: Always join arrays to a string with a separator (shown on several lines for readability; collapse it to a single line before sending):
```javascript
Array.from(document.querySelectorAll("article"))
  .slice(0, 10)
  .map(el => el.innerText.slice(0, 300))
  .join(" ### ")
```
Then split the returned string on the separator when you process the result.
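
Recovering the items on the Python side might look like the sketch below. `split_joined` is a hypothetical helper; it assumes the result arrives as a plain string and that the separator does not occur inside the extracted text.

```python
from typing import List

def split_joined(result: str, sep: str = " ### ") -> List[str]:
    """Split a separator-joined JavaScript result back into items."""
    if not result or result.strip() in ("", "{}"):
        return []  # nothing extracted, or an array/object came back as {}
    # Split on the bare token so a trimmed trailing separator is still handled.
    token = sep.strip() or sep
    return [item.strip() for item in result.split(token) if item.strip()]

items = split_joined("item1 ### item2 ### item3")
print(items)
```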

## Robust Extraction Patterns

**For X.com/Twitter tweets:**
```
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "Array.from(document.querySelectorAll(\\\"article\\\")).slice(0,15).map(el => (el.innerText.slice(0,300))).join(\\\" ||| \\\")"]')
```

**For LinkedIn profiles:**
```
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "Array.from(document.querySelectorAll(\\\"[data-view-name=profile-card]\\\")).slice(0,5).map(el => el.innerText.slice(0,200)).join(\\\" ||| \\\")"]')
```

**AVOID fragile selectors** like `data-testid="tweetText"` - these change frequently. Use semantic selectors like `article`, `section`, or visible text patterns.
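
The two patterns above differ only in selector, item limit, and character limit, so the expression can be generated rather than retyped. This is a sketch under that observation; `extraction_expr` is a hypothetical helper, not part of the skill.

```python
import json

def extraction_expr(selector: str, limit: int = 15, max_chars: int = 300,
                    sep: str = " ||| ") -> str:
    """Return a single-line JS expression that joins matched elements' text."""
    # json.dumps yields a double-quoted string literal that is also valid JavaScript.
    return (
        f"Array.from(document.querySelectorAll({json.dumps(selector)}))"
        f".slice(0,{int(limit)})"
        f".map(el => el.innerText.slice(0,{int(max_chars)}))"
        f".join({json.dumps(sep)})"
    )

expr = extraction_expr("article")
arg = json.dumps(["TAB_ID", expr])  # argument for execute-javascript.py
print(arg)
```

Switching to another semantic selector such as `section` then only requires changing the first argument.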

## Infinite Scroll Collection

When collecting N items from infinite-scroll feeds:

1. **Check initial count**: `document.querySelectorAll("article").length`
2. **Scroll down**: `scroll-down.py`
3. **Check new count** - if unchanged after 2 scrolls, the feed is exhausted
4. **Extract what's available** - if you have 11 but need 15, use 11 and note the limitation
5. **Max 5 scroll attempts** - don't loop indefinitely

```python
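# Pattern for collecting items (TAB_ID is a placeholder for the real tab ID)
count_js = '["TAB_ID", "document.querySelectorAll(\\\"article\\\").length"]'
count = run_skill("browser-os/execute-javascript.py", count_js)      # e.g., 6
run_skill("browser-os/scroll-down.py", "TAB_ID")
run_skill("browser-os/scroll-down.py", "TAB_ID")
new_count = run_skill("browser-os/execute-javascript.py", count_js)  # e.g., 11
# If new_count == count after both scrolls, the feed is exhausted: stop scrolling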
```
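
The five rules above can be combined into one bounded loop. The sketch below is illustrative: `scroll_until` is a hypothetical helper, `run_skill` is passed in as a callable, and the count is assumed to come back as text such as `"6"`.

```python
import json
from typing import Callable

def scroll_until(run_skill: Callable[[str, str], str], tab_id: str, target: int,
                 selector: str = "article", max_scrolls: int = 5) -> int:
    """Scroll until `target` items exist, the feed stalls, or the cap is hit."""
    count_js = json.dumps(
        [tab_id, f"document.querySelectorAll({json.dumps(selector)}).length"])

    def count() -> int:
        # Assumption: the result is numeric text; anything else counts as 0.
        raw = str(run_skill("browser-os/execute-javascript.py", count_js))
        raw = raw.strip().strip('"')
        return int(raw) if raw.isdigit() else 0

    current, stalled = count(), 0
    for _ in range(max_scrolls):
        if current >= target or stalled >= 2:
            break  # enough items, or two scrolls in a row added nothing
        run_skill("browser-os/scroll-down.py", tab_id)
        new = count()
        stalled = stalled + 1 if new == current else 0
        current = new
    return current
```

The return value is the number of items actually available, which may be lower than `target`; in that case extract what is there and note the shortfall.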

**Tip**: For simple extraction, `get-page-content.py` often suffices. Use JavaScript only when you need structured arrays or specific DOM elements.

### screenshot.py - Visual Analysis

**Use screenshots proactively, not just as fallback.** Screenshots help you:
- Verify what JavaScript actually extracted
- Understand page layout when selectors aren't working
- Identify correct elements to target
- Debug why extraction returned empty results

```
run_skill("browser-os/screenshot.py", '["TAB_ID", "What elements are visible?"]')
run_skill("browser-os/screenshot.py", '["TAB_ID", "List all tweets with their text and dates"]')
run_skill("browser-os/screenshot.py", '["TAB_ID", "Query about the page", "large"]')
```

**When to take screenshots:**
- After JavaScript returns `{}` or empty string
- When you're unsure which CSS selectors to use
- After scrolling to verify new content loaded
- To visually confirm extracted data matches what's on screen

Screenshots require the BrowserOS window to be in the foreground.

## Complete Tool Reference

### Tab Management

| Tool | Usage | Description |
|------|-------|-------------|
| get-active-tab.py | `""` | Get current tab ID and URL |
| list-tabs.py | `""` | List all open tabs |
| open-tab.py | `"URL"` | Open new tab with URL |
| close-tab.py | `"TAB_ID"` | Close tab by ID |
| switch-tab.py | `"TAB_ID"` | Switch to tab |
| get-load-status.py | `"TAB_ID"` | Check if page loaded |

### Navigation

| Tool | Usage | Description |
|------|-------|-------------|
| navigate.py | `"URL"` or `'["URL", "TAB_ID"]'` | Navigate to URL |

### Content Extraction

| Tool | Usage | Description |
|------|-------|-------------|
| get-page-content.py | `"TAB_ID text"` or `"TAB_ID text-with-links"` | Extract page text |
| execute-javascript.py | `'["TAB_ID", "code"]'` | Run JavaScript, get result |
| screenshot.py | `'["TAB_ID", "query"]'` or `'["TAB_ID", "query", "large"]'` | Screenshot with vision analysis |

### Element Interaction

| Tool | Usage | Description |
|------|-------|-------------|
| get-interactive-elements.py | `"TAB_ID"` | Get clickable elements with nodeIds |
| click-element.py | `"TAB_ID NODE_ID"` | Click element by nodeId |
| type-text.py | `"TAB_ID NODE_ID 'text'"` | Type into input |
| clear-input.py | `"TAB_ID NODE_ID"` | Clear input field |
| scroll-to-element.py | `"TAB_ID NODE_ID"` | Scroll element into view |

### Scrolling

| Tool | Usage | Description |
|------|-------|-------------|
| scroll-down.py | `"TAB_ID"` | Scroll down one viewport |
| scroll-up.py | `"TAB_ID"` | Scroll up one viewport |

### Keyboard

| Tool | Usage | Description |
|------|-------|-------------|
| send-keys.py | `"TAB_ID Enter"` | Send key (Enter, Tab, Escape, etc.) |

### Coordinates

| Tool | Usage | Description |
|------|-------|-------------|
| click-coordinates.py | `"TAB_ID X Y"` | Click at x,y position |
| type-at-coordinates.py | `"TAB_ID X Y 'text'"` | Click and type at position |

### Bookmarks & History

| Tool | Usage | Description |
|------|-------|-------------|
| get-bookmarks.py | `""` or `"FOLDER_ID"` | Get bookmarks |
| create-bookmark.py | `"Title URL"` | Create bookmark |
| remove-bookmark.py | `"BOOKMARK_ID"` | Remove bookmark |
| search-history.py | `"query"` or `"query 50"` | Search browser history |
| get-recent-history.py | `""` or `"50"` | Get recent history |

## Example Workflows

### Extract 10 Tweets from X.com (Using Both JS and Screenshots)

```python
# 1. Check availability
run_skill("browser-os/check-availability.py", "")

# 2. Navigate to X.com home
run_skill("browser-os/navigate.py", "https://x.com/home")

# 3. Wait for load
run_skill("browser-os/get-load-status.py", "TAB_ID")

# 4. First, check element count with JavaScript
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "document.querySelectorAll(\\\"article\\\").length.toString()"]')
# Result: "5"

# 5. Take screenshot to see what's actually on screen
run_skill("browser-os/screenshot.py", '["TAB_ID", "List all visible tweets with their text and dates"]')
# Vision API will describe the tweets it sees

# 6. Try JavaScript extraction
run_skill("browser-os/execute-javascript.py", '["TAB_ID", "var s=\\\"\\\"; document.querySelectorAll(\\\"article\\\").forEach(el => { s += el.innerText.slice(0,300) + \\\" ### \\\"; }); s;"]')

# 7. If JS returns empty, use screenshot to extract directly
run_skill("browser-os/screenshot.py", '["TAB_ID", "Extract the text and date of each tweet visible on screen. Return as a list."]')

# 8. Scroll and repeat for more tweets
run_skill("browser-os/scroll-down.py", "TAB_ID")
run_skill("browser-os/screenshot.py", '["TAB_ID", "Extract tweets that are now visible"]')
```

**Key insight**: When JavaScript fails or returns incomplete data, screenshots can extract the same information visually.
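
Steps 6 and 7 amount to a fallback rule, which could be sketched as follows. `extract_with_fallback` is a hypothetical helper, `run_skill` is passed in as a callable, and results are assumed to come back as strings.

```python
import json
from typing import Callable, Tuple

def extract_with_fallback(run_skill: Callable[[str, str], str], tab_id: str,
                          js_code: str, query: str) -> Tuple[str, str]:
    """Try JavaScript first; fall back to a screenshot query if it is empty.

    Returns (method, result), where method is "javascript" or "screenshot".
    """
    raw = run_skill("browser-os/execute-javascript.py",
                    json.dumps([tab_id, js_code]))
    js_result = "" if raw is None else str(raw).strip()
    # "{}" is what arrays and objects come back as, so treat it as empty too.
    if js_result and js_result not in ("{}", '""'):
        return "javascript", js_result
    shot = run_skill("browser-os/screenshot.py", json.dumps([tab_id, query]))
    return "screenshot", str(shot)
```

Returning the method alongside the result lets the caller note which path produced the data.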

### Click a Button and Verify

```python
# 1. Get interactive elements
run_skill("browser-os/get-interactive-elements.py", "TAB_ID")
# Find the nodeId for your target button

# 2. Click the element
run_skill("browser-os/click-element.py", "TAB_ID NODE_ID")

# 3. Wait for page update
run_skill("browser-os/get-load-status.py", "TAB_ID")

# 4. Verify the action worked
run_skill("browser-os/get-page-content.py", "TAB_ID text")
```

## Troubleshooting

### "No content extracted"
- **Cause**: Page hasn't fully loaded
- **Fix**: Poll `get-load-status.py` until it reports both "DOM Content Loaded: Yes" and "Page Complete: Yes"

### "BrowserOS helper service not connected"
- **Cause**: Browser extension not connected to MCP server
- **Fix**: `check-availability.py` attempts auto-recovery. If the error persists, the user needs to restart BrowserOS manually

### "Failed to capture screenshot"
- **Cause**: BrowserOS window not in foreground
- **Fix**: On macOS, bring the window forward with `osascript -e 'tell application "BrowserOS" to activate'`, or use text-based extraction instead

### "Request timed out after 60000ms"
- **Cause**: Page is stuck or helper disconnected
- **Fix**: Re-run check-availability.py to trigger recovery
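
A sketch of that recovery step, assuming the timeout surfaces as text in the returned result (it may instead be raised as an error in some environments). `run_with_recovery` is a hypothetical helper and the single retry is an assumed default.

```python
from typing import Callable

def run_with_recovery(run_skill: Callable[[str, str], str], script: str,
                      arg: str, retries: int = 1) -> str:
    """Re-run check-availability.py and retry when a call reports a timeout."""
    result = str(run_skill(script, arg))
    for _ in range(retries):
        if "timed out" not in result:
            break
        # The availability check triggers the built-in recovery.
        run_skill("browser-os/check-availability.py", "")
        result = str(run_skill(script, arg))
    return result
```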

### JavaScript returns `{}` (empty object)
- **Cause**: Returning arrays/objects directly doesn't work
- **Fix**: Use `.join(" ### ")` to convert the array to a string, then split on the separator when processing the result
- See "CRITICAL: JavaScript Return Value Limitations" section above

### JavaScript returns empty string or no items
- **Cause**: Wrong CSS selectors for the page, or content not loaded yet
- **Fix**:
  1. Use get-page-content.py first to verify page has content
  2. Try simpler selectors (e.g., `article` instead of `[data-testid="tweetText"]`)
  3. Check element count first: `document.querySelectorAll("article").length`