# daedalus-simple

> Quick website extraction and site generation. Use when user wants to crawl, extract, process, or migrate a website. Handles the full flow from URL to running local site.

- Author: Tyler Kim
- Repository: tylertaewook/dispatch
- Version: 20260201162450
- Last Updated: 2026-02-06
- Source: https://github.com/tylertaewook/dispatch
- Web: https://mule.run/skillshub/@@tylertaewook/dispatch~daedalus-simple:20260201162450

---

---
name: daedalus-simple
description: Quick website extraction and site generation. Use when user wants to crawl, extract, process, or migrate a website. Handles the full flow from URL to running local site.
---

# Daedalus Simple

Extract any website and generate a local Next.js site in two steps.

---

## IMPORTANT: Always Run in Background

**Daedalus commands are long-running and MUST be run in background terminal sessions using the PTY tools.**

```bash
# CORRECT: Use pty_spawn for all daedalus commands
pty_spawn: command="daedalus", args=["crawl-and-extract", "--url", "<URL>"], title="Daedalus Crawl"

# Then monitor progress with pty_read
pty_read: id="<session_id>", limit=50

# Check if still running
pty_list
```

**Why?**

- Crawling can take 10-30+ minutes depending on site size
- Extraction phases process pages one-by-one with LLM calls
- Running them synchronously will time out and fail

**Workflow Pattern:**

1. `pty_spawn` the daedalus command
2. Periodically `pty_read` to check progress
3. Use `pty_read` with `pattern="error|ERROR"` to check for issues
4. Wait for the process to exit before starting the next phase
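
The workflow pattern above amounts to a polling loop. Here is a minimal Python sketch of that loop, not the skill's actual implementation. The callables `read_output` and `is_running` are hypothetical stand-ins for `pty_read` and `pty_list`, and the sketch assumes that each read returns only the lines produced since the previous read.

```python
import re
import time
from typing import Callable, List

# Same pattern the workflow suggests passing to pty_read.
ERROR_PATTERN = re.compile(r"error|ERROR")


def wait_for_exit(
    read_output: Callable[[], List[str]],  # hypothetical wrapper around pty_read
    is_running: Callable[[], bool],        # hypothetical wrapper around pty_list
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll a background session until it exits; return the error lines seen."""
    error_lines: List[str] = []
    while True:
        # Check liveness first so the final read happens after the process exits.
        running = is_running()
        for line in read_output():
            if ERROR_PATTERN.search(line):
                error_lines.append(line)
        if not running:
            return error_lines
        sleep(poll_seconds)
```

Checking liveness before reading means output written just before exit is still scanned. Only once this function returns should the next phase be spawned.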

---

## When to Use

Invoke this skill when the user:

- Wants to "crawl" or "extract" a website
- Wants to "process" or "migrate" a site
- Provides a URL and wants to create a site from it
- Asks to "scrape" or "copy" a website's content

---

## Workflow

### Step 1: Crawl and Extract

Spawn the `crawl-and-extract` command in a background PTY session:

```bash
pty_spawn: command="daedalus", args=["crawl-and-extract", "--url", "<URL>"], title="Daedalus Crawl"
```

Monitor progress periodically:

```bash
pty_read: id="<session_id>", limit=50
```

This command:

1. Crawls the website and saves raw HTML
2. Extracts site configuration (name, navigation, footer)
3. Discovers page templates automatically
4. Extracts structured content from all pages

**After completion, summarize results to the user:**

- Number of pages crawled
- Number of templates discovered
- Site name extracted
- Output location

Example summary:

> Extraction complete:
>
> - **Site**: Example Government Agency
> - **Pages**: 47 pages extracted
> - **Templates**: 5 templates discovered
> - **Output**: `/path/to/output-folder`

Then ask: **"Would you like me to generate a local site you can preview?"**

---

### Step 2: Generate Site (if user agrees)

Spawn the `generate-site` command in a background PTY session:

```bash
pty_spawn: command="daedalus", args=["generate-site"], title="Daedalus Generate"
```

Monitor until completion:

```bash
pty_read: id="<session_id>", limit=50
```

This command:

1. Uses AI to generate a Next.js site from extracted data
2. Creates a minimal, editorial-style design
3. Installs dependencies automatically
4. Starts a local dev server

**After completion, provide the clickable link:**

> Site is ready: [http://localhost:4000](http://localhost:4000)

---

## Command Reference

All commands should be run via `pty_spawn` to handle long-running processes:

| Command                                                                       | Purpose                                    |
| ----------------------------------------------------------------------------- | ------------------------------------------ |
| `pty_spawn: command="daedalus", args=["crawl-and-extract", "--url", "<URL>"]` | Crawl site and extract all content         |
| `pty_spawn: command="daedalus", args=["generate-site"]`                       | Generate Next.js site and start dev server |
| `pty_spawn: command="daedalus", args=["generate-site", "--no-server"]`        | Generate files only, don't start server    |
| `pty_spawn: command="daedalus", args=["generate-site", "--port", "<N>"]`      | Use specific starting port                 |

Use `pty_read` and `pty_list` to monitor progress.
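
The argument lists in the table can be built programmatically. The following Python sketch uses only the subcommands and flags listed above. The helper names `crawl_args` and `generate_args` are hypothetical, and whether `--no-server` and `--port` can be combined is an assumption that the source does not confirm.

```python
from typing import List, Optional


def crawl_args(url: str) -> List[str]:
    """Arguments for the crawl-and-extract phase."""
    return ["crawl-and-extract", "--url", url]


def generate_args(port: Optional[int] = None, start_server: bool = True) -> List[str]:
    """Arguments for the generate-site phase.

    Combining --no-server with --port is an untested assumption.
    """
    args = ["generate-site"]
    if not start_server:
        args.append("--no-server")
    if port is not None:
        # PTY args are passed as strings, so the port is stringified.
        args += ["--port", str(port)]
    return args
```

For example, `generate_args(port=5000)` yields the list to pass as `args` when the default port is unavailable.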

---

## Example Conversation Flow

**User**: "Can you extract https://example.gov for me?"

**Agent**:

1. Set the `DAEDALUS_OUTPUT` environment variable (the output directory) if needed
2. Spawn: `pty_spawn: command="daedalus", args=["crawl-and-extract", "--url", "https://example.gov"]`
3. Monitor progress with `pty_read` periodically
4. Wait for completion (check with `pty_list`)
5. Summarize: "Extracted 47 pages across 5 templates from Example Gov"
6. Ask: "Would you like me to generate a local preview site?"

**User**: "Yes"

**Agent**:

1. Spawn: `pty_spawn: command="daedalus", args=["generate-site"]`
2. Monitor with `pty_read` until server starts
3. Respond: "Your site is ready: [http://localhost:4000](http://localhost:4000)"

---

## Output Structure

After both commands complete:

```
$DAEDALUS_OUTPUT/
├── raw/                 # Crawled HTML pages
├── site-config.json     # Extracted site configuration
├── templates.json       # Discovered templates
├── extracted/           # Structured JSON for each page
└── site/                # Generated Next.js site
    ├── app/
    ├── lib/
    ├── package.json
    └── node_modules/
```
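
Given this layout, the post-extraction summary could be assembled roughly as follows. This is a sketch under stated assumptions. It assumes `site-config.json` is an object with a top-level `name` key, `templates.json` holds one entry per template, and `extracted/` contains one `.json` file per page. None of these schema details are confirmed by the source.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def summarize_output(output_dir: Union[str, Path]) -> Dict[str, Any]:
    """Collect the figures reported to the user after crawl-and-extract."""
    root = Path(output_dir)
    site_config = json.loads((root / "site-config.json").read_text(encoding="utf-8"))
    templates = json.loads((root / "templates.json").read_text(encoding="utf-8"))
    # Assumption: one JSON file per extracted page, possibly in subfolders.
    pages = list((root / "extracted").rglob("*.json"))
    return {
        "site": site_config.get("name", "(unknown)"),  # assumed key name
        "pages": len(pages),
        "templates": len(templates),  # assumed: one entry per template
        "output": str(root),
    }
```

Calling `summarize_output(os.environ["DAEDALUS_OUTPUT"])` (after `import os`) would return the four values used in the example summary, provided the assumptions about the file contents hold.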

---

## Troubleshooting

**Crawl returns 0 pages**: The site may block crawlers. Check if the URL is accessible.

**`generate-site` fails**: Ensure extracted data exists. Run `crawl-and-extract` first.

**Port already in use**: Pass `--port` with a different port, for example `--port 5000`.

**AI generation fails**: The command falls back to default templates automatically.
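
For the port conflict case, a free port can be found before spawning `generate-site`. The Python sketch below probes ports by attempting to bind to them, starting from 4000 (the port shown in the examples in this document). The helper names are hypothetical, and this is not necessarily how `daedalus` itself selects a port.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def first_free_port(start: int = 4000, attempts: int = 20) -> int:
    """Return the first bindable port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")
```

The result can then be passed as the `--port` value. The check is best effort, since another process may claim the port between the probe and the server start.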