# browser > remotely orchestrate a (chromium) browser to utilize web apps - Author: me - Repository: mikesmullin/browser - Version: 20251223155704 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-07 - Source: https://github.com/mikesmullin/browser - Web: https://mule.run/skillshub/@@mikesmullin/browser~browser:20251223155704 --- --- name: browser description: remotely orchestrate a (chromium) browser to utilize web apps --- # Browser Uses [browser-use](https://github.com/browser-use/browser-use) (via Chrome DevTools Protocol) to browse any website. Remotely orchestrate a browser, to navigate the web. Meaning any task that can be accomplished via a browser. ## Instructions - read `README.md` to understand this project ## QUICK REFERENCE These are CLI commands you may use. | Task | Command | Purpose | |------|---------|---------| | Start Browser Server | `browser server start` | Start background browser server | | Stop Browser Server | `browser server stop` | Stop background server | | Server Status | `browser server status` | Check server status and latest logs summary | | View Live Logs | `browser server logs --follow` | Tail live server logs | | Get Page Status | `browser client status` | Get current page URL and title | | Navigate to URL | `browser client navigate URL` | Navigate browser to specified URL | | Execute JavaScript | `browser client execute JAVASCRIPT` | Execute JavaScript code (arrow function) | | Get DOM Content | `browser client dom [SELECTOR]` | Get HTML content of element (default: body) | | Fill Input Field | `browser client fill SELECTOR VALUE` | Fill form input field with value | | Click Element | `browser client click SELECTOR` | Click on specified element | | Click at Coordinates | `browser client click-at X Y` | Click at specific coordinates | | Visualize Elements | `browser client visualize [--csv]` | Generate Set-of-Marks visualization | | Detect Objects | `browser client detect [--csv]` | Run YOLO object detection | | Segment Objects | `browser client segment` | Run SAM segmentation | ## Examples Sample user prompt: > let's test this project: > - start browser service > - use client scripts to > - navigate to google.com > - type `hamster dance` into search box > - click search button > - read first 3 search results > - summarize them in your response to me > - stop the server Sample agent output (ie. `Gemini 3 Flash`): ``` GitHub Copilot: Created 6 todos Starting: *Start browser service* (1/6) Ran terminal command: browser server start Stdout: Starting browser server... ✅ Server started in background (PID 12345) Logs: /home/user/.browser_agent/server.log Starting: *Navigate to Google* (2/6) Ran terminal command: browser client navigate "https://www.google.com" Stdout: ✅ Navigated to https://www.google.com Starting: *Type search query* (3/6) Ran terminal command: browser client fill "[name='q']" "hamster dance" Stdout: ✅ Filled [name='q'] Starting: *Click search button* (4/6) Ran terminal command: browser client click "[name='btnK']" Stdout: ✅ Clicked [name='btnK'] Starting: *Read search results* (5/6) Ran terminal command: browser client execute "() => Array.from(document.querySelectorAll('#search h3')).slice(0, 3).map(h3 => ({title: h3.innerText, link: h3.closest('a')?.href}))" Stdout: Result: [{"title": "Hampton the Hamster \"The Hamsterdance Song\"", "link": "https://www.youtube.com/watch?v=1qN72LEQnaU"}, ...] Starting: *Stop server* (6/6) Ran terminal command: browser server stop Stdout: ✅ Server stopped (PID 12345) Completed: *Summarize results* The browser service was successfully started, and I performed the requested search for "hamster dance" on Google. Here are the first three search results: 1. **Hampton the Hamster "The Hamsterdance Song"** - [https://www.youtube.com/watch?v=1qN72LEQnaU](https://www.youtube.com/watch?v=1qN72LEQnaU) 2. **Hampton the Hamster - The Hamsterdance Song (Official Video ...** - [https://www.youtube.com/watch?v=6WpMlwVwydo](https://www.youtube.com/watch?v=6WpMlwVwydo) 3. **Hamster Dance - Wikipedia** - [https://en.wikipedia.org/wiki/Hamster_Dance](https://en.wikipedia.org/wiki/Hamster_Dance) ``` ### Visual Grounding Example Sample user prompt: > Find the "Sign in" button on this page using vision and click it. Sample agent output: ```bash # 1. Get visual coordinates of interactive elements (CSV is default) browser client visualize | grep -i "Sign in" # Output: 1194,28,5,Sign in # 2. Click at the identified coordinates browser client click-at 1194 28 ``` **Example Output:** ![Visualized Elements](docs/visualized_example.png) **CSV Representation (truncated):** ```csv 44,30,0,About 99,30,1,Store 1010,28,2,Gmail 1066,28,3,Images 1124,28,4, 1194,28,5,Sign in ``` ### Object Detection Example (YOLOv8) Sample user prompt: > Detect people and bicycles in the current image search results. ```bash browser client detect # Output: # 256,180,0,person # 240,320,1,bicycle # ... ``` **Example Output:** ![Detected Objects](docs/detected_example.png) ### Image Segmentation Example (SAM) Sample user prompt: > Segment the visual regions of the current page and click on the main logo (Segment 0). ```bash # 1. Run segmentation to get IDs and coordinates browser client segment # Output: # 450,300,0,segment # 120,50,1,segment # ... # 2. Click at the coordinates corresponding to ID 0 browser client click-at 450 300 ``` **Example Output:** ![Segmented Regions](docs/segmented_example.png) **How it works:** The `segment` command returns a CSV mapping (`x,y,id,label`). The `id` in the CSV matches the large number shown in the `segmented_*.png` image. An AI agent can look at the image to identify which segment it wants to interact with, find that ID in the CSV, and use the provided `x,y` coordinates for a `click-at` command.