# writer-agent

> Transform documents into styled article series. Analyze input (md, txt, pdf, docx, pptx, xlsx, html, epub, images, url), extract core ideas, decompose into logical sections, write articles with user-selectable styles (professional, casual, custom), synthesize into organized output. Uses Docling for high-quality document conversion. Handles large documents with hierarchical summarization. Output to docs/generated/.

- Author: shun
- Repository: hoangvantuan/claude-plugin
- Version: 20260206144131
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/hoangvantuan/claude-plugin
- Web: https://mule.run/skillshub/@@hoangvantuan/claude-plugin~writer-agent:20260206144131

---

---
name: writer-agent
description: Transform documents into styled article series. Analyze input (md, txt, pdf, docx, pptx, xlsx, html, epub, images, url), extract core ideas, decompose into logical sections, write articles with user-selectable styles (professional, casual, custom), synthesize into organized output. Uses Docling for high-quality document conversion. Handles large documents with hierarchical summarization. Output to docs/generated/.
disable-model-invocation: true
version: 1.15.0
license: MIT
---

# Writer Agent

Transform documents and URLs into styled article series.

## Quick Reference

| Reference                                                             | Purpose                            |
| --------------------------------------------------------------------- | ---------------------------------- |
| [directory-structure.md](references/directory-structure.md)           | Output folder layout               |
| [decision-trees.md](references/decision-trees.md)                     | Workflow decision guides           |
| [retry-workflow.md](references/retry-workflow.md)                     | Error recovery procedures          |
| [large-doc-processing.md](references/large-doc-processing.md)         | Handling documents >50K words      |
| [article-writer-prompt.md](references/article-writer-prompt.md)       | Subagent prompt templates          |
| [context-extractor-prompt.md](references/context-extractor-prompt.md) | Context extraction template        |
| [context-optimization.md](references/context-optimization.md)         | Context optimization anti-patterns |
| [performance-benchmarks.md](references/performance-benchmarks.md)     | Measured performance test cases    |
| [detail-levels.md](references/detail-levels.md)                       | Output detail level options        |

## Workflow Overview

**Direct Path (<20K words OR <50K words with <=3 articles):**

Main agent writes all articles directly without subagents.

```
Input → Convert → Plan → Write(main) → Synthesize → Verify
  1        1        3         4            5           6
```

**Standard (Tier 1-2, 20K-100K words):**

```
Input → Convert → Analyze → Extract → Write → Synthesize → Verify
  1        1         3         3        4         5           6
```

**Fast Path (Tier 3, >=100K words):**

```
Input → Convert → Plan → Write(parallel) → Synthesize → Verify
  1        1        3          4              5           6
```

## Step 1: Input Handling

Detect input type and convert to markdown.

| Input Type               | Detection               | Action                    |
| ------------------------ | ----------------------- | ------------------------- |
| File (PDF/DOCX/EPUB/etc) | Path + extension        | `wa-convert {path}`       |
| URL                      | `http://` or `https://` | `wa-convert {url}`        |
| Plain text / .txt / .md  | No complex extension    | Rewrite → `wa-paste-text` |

### File/URL Conversion

```bash
.claude/skills/writer-agent/scripts/wa-convert [/path/to/file.pdf or url]
```

**Output**: `docs/generated/{slug}-{timestamp}/input-handling/content.md`

### Plain Text Processing

1. Read content (if file)
2. Rewrite to structured markdown (add headings, preserve content)
3. Propose title
4. Execute:

```bash
echo "{rewritten_content}" | .claude/skills/writer-agent/scripts/wa-paste-text - --title "{title}"
```

### Error Handling

| Error              | Action                         |
| ------------------ | ------------------------------ |
| File not found     | Ask for correct path           |
| Unsupported format | Try Docling, confirm with user |
| URL fetch failed   | Report and stop                |
| Empty content      | Warn, confirm before continue  |
| Encrypted PDF      | Ask for decrypted version      |

## Step 2: Select Style

Use `AskUserQuestion` to confirm output style.

| Style                    | File                          | Voice                           |
| ------------------------ | ----------------------------- | ------------------------------- |
| Professional             | `professional.md`             | Formal, data-driven, 3rd person |
| Casual                   | `casual.md`                   | Friendly, 2nd person (you)      |
| Contemplative            | `contemplative.md`            | Observational, "the writer"     |
| Explanatory              | `explanatory.md`              | Teaching, "we" together         |
| Ethan Mollick            | `ethan-mollick.md`            | Practitioner-explorer, "I"      |
| Andrew Ng                | `andrew-ng.md`                | Educator-democratizer, warm     |
| Thích Nhất Hạnh          | `thich-nhat-hanh.md`          | Mindful, compassionate, "we"    |
| Mindful Educator         | `mindful-educator.md`         | Depth + practice + mindfulness  |
| Thu Giang Nguyễn Duy Cần | `thu-giang-nguyen-duy-can.md` | Scholar, East-West philosophy   |
| Introspective Narrative  | `introspective-narrative.md`  | Personal journey, "I"           |
| Mindful Dialogue         | `mindful-dialogue.md`         | Master-student dialogue         |
| Mindful Storytelling     | `mindful-storytelling.md`     | First person storytelling       |

Style files: `output_styles/{style}.md`

## Step 2.5: Select Detail Level

Use `AskUserQuestion` to confirm output detail level.

| Level         | Ratio  | Description                         |
| ------------- | ------ | ----------------------------------- |
| Concise       | 15-25% | Tóm lược, giữ ý chính               |
| **Standard**  | 30-40% | Cân bằng (Recommended)              |
| Comprehensive | 50-65% | Chi tiết, giữ nhiều ví dụ           |
| Faithful      | 75-90% | Gần như đầy đủ, viết lại theo style |

**Default**: Standard (if user skips or unclear)

### Calculate Target Words (Tham khảo)

**LƯU Ý**: Target words chỉ mang tính tham khảo. PASS/FAIL dựa trên section coverage, không phải word count.

```
target_ratio = midpoint of selected level
total_target = source_words × target_ratio

Per article (reference only):
article_target = (article_source_words / source_words) × total_target
# Word count để định hướng, không bắt buộc đạt chính xác
```

### Understanding Detail Level Parameters

**Two complementary concepts:**

1. **`target_ratio`**: Controls total article length relative to source

   * Standard level: 30-40% (midpoint 35%)

   * This ratio applies to the entire article wordcount

2. **`example_percentage`**: Controls retention of examples within kept content

   * Standard level: 60% of examples

   * This percentage applies only to example sections

**Worked example (Standard level, 35% target ratio, 60% examples):**

* Source section: 5,000 words total

  * Main explanatory content: 4,000 words

  * Examples (10 examples): 1,000 words

* Target article length: 5,000 × 0.35 = 1,750 words

* Keep 60% of examples: 6 examples ≈ 600 words

* Remaining budget for main content: 1,750 - 600 = 1,150 words

* Main content compression: 1,150 / 4,000 = 28.75% (summarized)

**Key insight:** Higher `example_percentage` (60%) than overall `target_ratio` (35%) means examples are preserved more than prose, reflecting their teaching value.

See [detail-levels.md](references/detail-levels.md) for full specification.

## Step 2.6: Tier Reference Table

**Canonical tier definitions** (referenced throughout documentation):

| Tier            | Word Count                     | Strategy                       | Context Approach   | Glossary                    | max\_concurrent |
| --------------- | ------------------------------ | ------------------------------ | ------------------ | --------------------------- | --------------- |
| **Direct Path** | <20K OR (<50K AND ≤3 articles) | Main agent writes all          | N/A (no subagents) | Inline (\~200 words)        | N/A             |
| **Tier 1**      | 20K-50K                        | Subagents read source directly | No context files   | Inline (\~200 words)        | 3               |
| **Tier 2**      | 50K-100K                       | Smart compression              | Context extractors | Separate file (\~600 words) | 3               |
| **Tier 3**      | >=100K                         | Fast Path, minimal overhead    | No context files   | Inline (\~300 words)        | 2               |

**Priority rules:**

* Direct Path conditions are checked FIRST and override tier boundaries

* Documents 20K-50K with ≤3 articles use Direct Path (not Tier 1)

* Only documents failing Direct Path conditions fall through to tier selection

**Key differences:**

* Direct Path: Main agent handles everything (no subagents)

* Tier 1: Lightweight subagents, read source via line ranges

* Tier 2: Context extraction for compression (only tier with separate glossary file)

* Tier 3: Like Tier 1 but larger chunks, more selective glossary, lower concurrency

## Step 3: Analyze

**Goal**: Create analysis artifacts for article generation.

### 3.0 Processing Path Selection

Read `structure.json` → use `direct_path` field (computed by `wa-convert` v1.2+):

```
structure.json → direct_path.eligible?
├─ YES AND direct_path.capacity_ok?
│   └─ DIRECT PATH
│       └─ Skip context extraction
│       └─ Main agent writes ALL articles
│       └─ ~30% faster for small documents
│
├─ YES BUT NOT direct_path.capacity_ok?
│   └─ WARN: direct_path.warning
│   └─ RECOMMEND: Use Tier 1 with subagents instead
│
├─ NO AND tier_recommendation.tier <= 2?
│   └─ STANDARD PATH (3.1-3.5)
│
└─ NO AND tier_recommendation.tier == 3?
    └─ FAST PATH (Tier 3)
```

> **Note**: `direct_path` fields in structure.json (since v1.2) include `eligible`, `capacity_ok`, `capacity_limit`, and `warning`. These are pre-computed based on word count, estimated article count, and detected language. Main agent does NOT need to recalculate these values.

**Examples:**

| Document     | Words | Articles | Path            | Reason                                        |
| ------------ | ----- | -------- | --------------- | --------------------------------------------- |
| Blog post    | 15K   | 5        | Direct          | <20K words (first condition) ✓                |
| Tutorial     | 45K   | 3        | Direct          | <50K AND ≤3 articles (second condition) ✓     |
| Long guide   | 48K   | 3        | Direct → Tier 1 | Exceeds max\_words for mixed (38K) ⚠️         |
| Paper        | 45K   | 4        | Standard        | Fails both conditions (4 > 3) → use subagents |
| Book chapter | 67K   | 8        | Standard        | Tier 2: smart compression                     |
| Full book    | 142K  | 12       | Fast            | Tier 3: reference-based                       |

> **Note**: Direct Path capacity limit depends on language: EN ~44K, VI ~32K, mixed ~38K words. Use `structure.json → language` field for accurate limit.

### 3.1 Structure Scan

> **📖 READ FIRST**: [context-optimization.md](references/context-optimization.md) explains anti-patterns that waste 50%+ context budget. Review before proceeding.

**Quick path** (if `structure.json` exists):

* **ONLY** read `structure.json` for outline, stats, tier recommendation

* **DO NOT** read `content.md` - it wastes context budget

* Skip manual scanning (outline already in JSON)

**Fallback** (if `structure.json` missing):

⚠️ **WARNING**: Fallback mode loses 51% context optimization. Re-run `wa-convert` to generate `structure.json` if possible.

Manual scan using efficient commands:

```bash
# Extract heading structure without reading full file
grep -n "^#" docs/generated/{slug}/input-handling/content.md | head -100

# Or use line-based sampling (first 100 lines for overview)
Read(file_path, offset=1, limit=100)  # Only to extract headings
```

⚠️ **CRITICAL**: Do NOT read full `content.md` during structure scan! For all tiers, subagents will read source content directly when writing articles. Reading it now wastes 90%+ context budget. See [context-optimization.md](references/context-optimization.md) for budget examples and common mistakes.

### 3.1.1 Tier 3 Fast Path (>=100K words)

For very large documents, minimize analysis overhead:

| Action | Detail                                                            |
| ------ | ----------------------------------------------------------------- |
| SKIP   | `_inventory.md`, `_glossary.md`, context files                    |
| CREATE | Minimal `_plan.md` (section-to-article mapping + line ranges)     |
| EMBED  | Key terms (\~300 words) + dependencies inline in subagent prompts |
| SPAWN  | Subagents immediately after `_plan.md` (continuous batching)      |

**Context savings**: \~40% reduction in main agent context.

See [large-doc-processing.md](references/large-doc-processing.md#tier-3-fast-path) for `_plan.md` format, subagent prompt template, and workflow details.

### 3.2 Content Inventory (`analysis/_inventory.md`)

**Skip for Tier 3**: Use `structure.json` outline directly.

Create section registry with IDs:

```markdown
| ID  | Section Title | Line Range | Words | Critical |
| --- | ------------- | ---------- | ----- | -------- |
| S01 | Introduction  | 1-50       | 650   |          |
| S02 | Core Argument | 51-120     | 900   | \*       |
```

**\* Critical criteria** (mark if ANY match):

* Contains >3 direct quotes

* Defines foundational terms

* Core thesis/argument

* Data tables, specs, proofs, code blocks

* Max 30% sections should be critical

### 3.3 Article Plan (`analysis/_plan.md`)

**Check user request first:**

```python
# Priority: User request > Auto-split
if user_specified_article_count:
    # User yêu cầu số bài cụ thể (ví dụ: "chia thành 5 bài")
    target_articles = user_specified_count
    skip_auto_split = True
    # Phân bổ sections đều cho các bài, không chia nhỏ thêm
else:
    # Auto mode: chia thành 3-7 bài, mỗi bài ~10 phút đọc
    target_articles = calculate_optimal_articles(total_words, detail_ratio)
    skip_auto_split = False
```

Group sections into articles (default 3-7, or user-specified count):

```markdown
| #   | Slug  | Title         | Sections      | Est. Words | Reading Time |
| --- | ----- | ------------- | ------------- | ---------- | ------------ |
| 1   | intro | Introduction  | S01, S02      | 2000       | ~13 min      |
| 2   | core  | Core Concepts | S03, S04, S05 | 2500       | ~13-15 min   |
```

**Rules**:

* All sections must be mapped. Coverage check at end.

* Target \~13-15 phút đọc/bài (2000-3000 từ)

* Nếu user chỉ định số bài → tuân theo, không auto-split thêm

**Series Context (QUAN TRỌNG - tạo cùng lúc với plan):**

Khi tạo `_plan.md`, đồng thời xác định:

```markdown
## Series Context

Core message: "{1-2 câu thông điệp cốt lõi}"

| #   | Title | Role        | Reader Journey           |
| --- | ----- | ----------- | ------------------------ |
| 1   | Intro | foundation  | Người đọc bắt đầu hiểu X |
| 2   | Core  | development | Từ X, đi sâu vào Y       |
| 3   | Adv   | climax      | Kết nối Y với Z          |
```

Thông tin này sẽ được embed vào `SERIES_CONTEXT` block trong mỗi subagent prompt (xem [article-writer-prompt.md](references/article-writer-prompt.md#series-context-block)).

### 3.3.1 Article Splitting (Auto)

**Trigger**: After Step 3.3, before Step 3.4. Check each planned article.

**Priority rules:**

1. **User-specified count**: Nếu user yêu cầu số bài cụ thể → tuân theo, KHÔNG auto-split
2. **Auto-split**: Chỉ áp dụng khi user KHÔNG yêu cầu số bài cụ thể

**Key constants:**

* `MAX_OUTPUT_WORDS = 3000` (\~15 min reading time)

* `TARGET_PART_WORDS = 2000` (\~13 min reading time)

* Atomic unit = H2 block (H2 + H3 children). NEVER split within paragraph, H3, or critical section.

**When to split**: `estimated_output = source_words × detail_ratio > MAX_OUTPUT_WORDS`

**Algorithm**: Greedy grouping of H2 blocks, no minimum. See [large-doc-processing.md#article-splitting-strategy](references/large-doc-processing.md#article-splitting-strategy) for full algorithm and validation matrix.

**Validate after split:**

```bash
uv run .claude/skills/writer-agent/scripts/validate_split.py docs/generated/{book}/analysis/_plan.md
```

**Part naming**: `02-core.md` → `02-core-part1.md`, `02-core-part2.md`

**Context bridging**: For Part N > 1, provide prev part topics, last paragraph, key concepts. See [article-writer-prompt.md#multi-part-article-template](references/article-writer-prompt.md#multi-part-article-template).

### 3.4 Shared Context (Inline Glossary)

⚠️ **TIMING**: Execute AFTER Steps 3.1-3.3 complete, BEFORE Step 3.5.

**Strategy by tier:**

```
word_count < 50,000 (Tier 1)?
├─ YES → Extract inline glossary (~200 words) from first ~300 lines
│   └─ Embed in each subagent prompt
│   └─ Skip separate _glossary.md
│   └─ Saves 1 Read call per subagent (~400 words saved)
│
├─ 50,000 <= word_count < 100,000 (Tier 2)?
│   └─ SKIP inline extraction (context extractors in Step 3.5 will create _glossary.md)
│       └─ Article writers (Step 4) will read shared _glossary.md file (~600 words)
│       └─ More comprehensive terminology than inline approach
│
└─ word_count >= 100,000 (Tier 3)?
    └─ Extract inline glossary (~300 words) from first ~500 lines
        └─ Embed in each subagent prompt
        └─ Skip separate _glossary.md
        └─ Larger than Tier 1 due to more technical terminology
```

**Extraction algorithm**: See [context-optimization.md#glossary-extraction-algorithm](references/context-optimization.md#glossary-extraction-algorithm) for detailed process.

**Quick process**:

1. Read content.md first 300-500 lines (tier-dependent)
2. Extract terms using definition patterns
3. Score by importance (frequency + position)
4. Take top N terms until hitting word budget
5. Format: `Term: definition (~20 words each)`

**Tier 3 inline rationale:**

* Avoids 1 Read call per subagent (saves \~400 words/subagent)

* Trade-off: More selective terms (300 vs 1000) but faster execution

* Larger than Tier 1 (300 vs 200 words) because large documents have more technical terminology

* Combined with reading source directly via line ranges = maximum efficiency

**Inline glossary format**:

```markdown
## Terms

Term1: definition (~20 words)
Term2: definition
Term3: definition
```

Article dependencies: Embed 1-2 sentences in prompt, not separate file.

### 3.5 Context Files

**Skip for**:

* Tier 1 (<50K words): Subagents read source directly via line ranges

* Tier 3 (>=100K words): Subagents read source directly via line ranges

* Direct Path (<20K words): Main agent writes directly

**Decision** (see [decision-trees.md#3](references/decision-trees.md#3-context-extraction-strategy) for full tree):

* Tier 1/3 or <20K words: Skip context files (subagents read source directly via line ranges)

* Tier 2 (50K-100K): Spawn context extractor subagents (batch: min(3, article\_count))

* Template: `templates/_context-file-template.md`

Each context file: `analysis/XX-{slug}-context.md`

### 3.6 Quality Gate: Analysis Complete

Before proceeding to Step 4, verify:

* [ ] All sections have IDs (from structure.json or \_inventory.md)

* [ ] Critical sections marked (\* auto-detected in structure.json)
  * **Guideline**: Thường <=30% sections là critical

  * **If >30%**: Tự động ghi nhận trong `_plan.md`, KHÔNG cần user confirmation

    * Document: "High critical ratio: {ratio}% - technical content"

    * Tiếp tục workflow bình thường

  * **If >50%**: Tự động chuyển sang Tier 3 strategy (read source directly)

    * KHÔNG cần STOP hoặc ask user

    * Tier 3 xử lý được high critical ratio vì đọc source trực tiếp

    * Ghi log: "Auto-escalated to Tier 3 due to high critical ratio"

  * **Rationale**: Tự động xử lý thay vì blocking workflow để hỏi user

* [ ] Article plan covers 100% sections

* [ ] For Tier 3: \_plan.md created with line ranges

## Step 4: Write Articles

### 4.0 State Tracking (Recommended)

For resume and retry support, create/update `analysis/_state.json`. Required if retry-workflow is needed (see [retry-workflow.md](references/retry-workflow.md)):

```json
{
  "status": "in_progress",
  "current_step": 4,
  "completed_articles": ["00-overview.md"],
  "pending_articles": ["01-intro.md", "02-core.md"]
}
```

See [retry-workflow.md](references/retry-workflow.md#state-persistence) for details.

For selective re-runs (style change or single article rewrite), see [retry-workflow.md#selective-re-run](references/retry-workflow.md#selective-re-run).

### 4.1 Overview Article (Phase 1)

Write `00-overview.md` in **main context**:

* Requires full series knowledge

* Template: `templates/_overview-template.md`

* Target: 300-400 words (initial)

* Include placeholders for Key Takeaways and Article Index

**Phase 1 content**:

* Hook + Introduction + Why It Matters + What You'll Learn

* Placeholder sections for Điểm chính and Mục lục

### 4.2 Content Articles

**Direct Path** (<20K words): Main agent writes all articles directly.

**Standard/Fast Path**: Spawn subagents for articles 01+:

```
Task tool:
- subagent_type: "general-purpose"
- description: "Write: {title}"
- prompt: [Use references/article-writer-prompt.md]
```

**Multi-Part Articles** (from Step 3.3.1):

For split articles, spawn each part sequentially within the article:

```
# Article 2 was split into 3 parts
1. Spawn 02-core-part1.md
2. Wait for completion → extract context bridge
3. Spawn 02-core-part2.md (with context from part1)
4. Wait → extract context bridge
5. Spawn 02-core-part3.md (with context from part2)

# Other articles can run in parallel
# e.g., 01-intro.md and 03-advanced.md can run while part2 waits
```

**Context bridge extraction:**

```python
def extract_context_bridge(completed_part):
    """Extract context for next part from completed part"""
    article_content = read(completed_part.output_path)
    return {
        'prevPartTopics': extract_h2_titles(article_content),
        'prevPartEnding': get_last_paragraph(article_content, max_words=50),
        'keyConceptsFromPrev': extract_bold_terms(article_content)
    }
```

**Prompt validation (optional, for debugging):**

```bash
echo "{prompt_text}" | uv run .claude/skills/writer-agent/scripts/validate_prompt.py --tier {1|2|3} --stdin
```

Validates all required template variables are present. Exit code 0 = PASS, 1 = missing variables.

**Continuous Batching** (preferred over static batching):

* Tier 1-2: `max_concurrent = 3` (smaller chunks \~3.5K words)

* Tier 3: `max_concurrent = 2` (larger chunks \~10K words)

* Dynamic adjustment: large chunks (>8K) → reduce to 2, all small (<2K) → increase to 5

* On any completion → spawn next immediately (no batch waiting)

* **Benefits**: 25-35% faster than static batching

See [large-doc-processing.md#continuous-batching-vs-static](references/large-doc-processing.md#continuous-batching-vs-static) for full algorithm and [performance-benchmarks.md](references/performance-benchmarks.md) for benchmarks.

**Progress Reporting**:

After each article completes, update TaskUpdate:

* Format: `"Writing articles: {completed}/{total} completed"`

* Example: `"Writing articles: 3/7 completed"`

* Do NOT include time estimates

### 4.3 SoT Pattern (Long Articles)

**When to use** Skeleton-of-Thought: estimated output >2000 words AND >=5 subsections (H3 preferred, fallback to H2).

**Quick decision**: `h3_count >= 5` → SoT. `h3 == 0 AND h2 >= 5` → SoT. Otherwise → standard write.

**Workflow**: Phase 1 (skeleton) → Phase 2 (expand ALL sections parallel) → Phase 3 (merge + transitions)

**Benefits**: 45-50% faster for long articles. See [article-writer-prompt.md#sot-pattern](references/article-writer-prompt.md#sot-pattern) for template and [performance-benchmarks.md#test-case-5](references/performance-benchmarks.md#test-case-5-sot-pattern-long-article) for benchmarks.

**Limitations**: Priority 3 (paragraph breaks) not implemented. Ambiguous structure → default to standard write.

### 4.4 Coverage Tracking

Subagent reports coverage in **return message** (not in article file) using **table format**. See [Step 5.2](#52-coverage-aggregation) for aggregation details.

**IMPORTANT**: PASS/FAIL chỉ dựa trên section coverage, không phải word count. Word count chỉ mang tính thống kê.

**Subagent return format** (2-column, see article-writer-prompt.md):

```markdown
DONE: {filename} | {N} words (stats)
COVERAGE (determines PASS/FAIL):
| Section | Status |
|---------|--------|
| S01 | ✅ quoted |
| S02 ⭐ | ✅ verbatim |
RESULT: PASS # PASS nếu all sections covered
```

**Tiêu chí PASS/FAIL:**

* **PASS**: Tất cả sections được assigned đều được covered (100% section coverage)

* **FAIL**: Có section bị missing hoặc skipped không hợp lệ

* **Word count**: Chỉ thống kê, KHÔNG ảnh hưởng PASS/FAIL

Main agent enriches with "Assigned To" and "Used In" columns → aggregates into `_coverage.md` (4-column format, see [Step 5.2](#52-coverage-aggregation)).

### 4.5 Critical Sections

**\* sections MUST be included verbatim**:

* Never summarize critical sections

* If unable to include fully → flag for review

### 4.6 Quality Gate: Articles Complete

Before proceeding to Step 5, verify:

* [ ] All articles written (check pending list)

* [ ] **Each article has "## Các bài viết trong series" at end** (check `SERIES_LIST: YES` in subagent return)
  * If `SERIES_LIST: NO` → Append series list to article file before continuing

* [ ] Coverage reports collected from all subagents

* [ ] No placeholder text in articles

* [ ] Source verification quotes provided

* [ ] Opening of each article is NOT mechanical ("Trong bài này...")

## Step 5: Synthesize

### 5.1 Update Overview (Phase 2)

Update `00-overview.md` with actual content for placeholder sections:

**Điểm chính** (Key Takeaways):

```markdown
## Điểm chính

1. **[Concept 1]**: [Brief explanation from series]
2. **[Concept 2]**: [Brief explanation from series]
3. **[Concept 3]**: [Brief explanation from series]
```

**Các bài viết trong series** (Series List):

```markdown
## Các bài viết trong series

1. **Tổng quan - Brief description** _(đang xem)_
2. [Article 1 Title](./01-slug.md) - Brief description
3. [Article 2 Title](./02-slug.md) - Brief description
```

**Final overview target**: 400-600 words

### 5.2 Coverage Aggregation

Collect subagent coverage tables → aggregate into `analysis/_coverage.md`

**Process**:

1. Each subagent returns a 2-column COVERAGE TABLE (`| Section | Status |`) in their return message
2. Main agent enriches each row with "Assigned To" and "Used In" columns (from `_plan.md`)
3. Concatenate all enriched tables into single `_coverage.md` file (4-column format)
4. Add summary statistics at bottom

**Column enrichment**: Main agent knows which article each subagent wrote, so it adds:

* `Assigned To`: article filename (from `_plan.md`)

* `Used In`: same as Assigned To (or different if reassigned during writing)

**Coverage file format** (required by `validate_coverage.py`):

```markdown
## Section Coverage Matrix

| Section | Assigned To   | Used In       | Status        |
| ------- | ------------- | ------------- | ------------- |
| S01     | 01-article.md | 01-article.md | ✅ summarized |
| S02 ⭐  | 01-article.md | 01-article.md | ✅ verbatim   |

- Total: {N} | Used: {N} | Missing: {N}
```

**Format rules**:

* Column 1: `S{NN}` with optional `⭐` for critical sections

* Column 4:

  * For used sections: `✅` followed by one of: `used`, `verbatim`, `quoted`, `summarized`

  * For skipped sections: `⚠️ skipped` (requires Notes column with reason)

* Summary line at bottom: `- Total: {N} | Used: {N} | Missing: {N}`

**Edge case examples**:

```markdown
| Section | Assigned To | Used In               | Status        | Notes                      |
| ------- | ----------- | --------------------- | ------------- | -------------------------- |
| S05     | 01-intro.md | 02-core.md            | ✅ summarized | Reassigned during planning |
| S08     | 02-core.md  | 02-core.md, 03-adv.md | ✅ quoted     | Shared across articles     |
| S12     | 03-adv.md   | -                     | ⚠️ skipped    | Redundant with S08         |
```

**Multi-Part Article Coverage:** For split articles, track by part in `_coverage.md`. Each section row should sum to ~100%. See [large-doc-processing.md#coverage-tracking](references/large-doc-processing.md#coverage-tracking) for format and validation rules.

**Edge case rules:**

1. **Reassignment**: Section moved to different article (common when planning adjusts)

   * "Assigned To" shows original plan

   * "Used In" shows actual article that included it

   * Validate coverage in "Used In" article

2. **Shared sections** (one section used in multiple articles):

   * Format: `Used In` = comma-separated list (e.g., `02-core.md, 03-adv.md`)

   * Validation: Each article in the list MUST contain \[Sxx] reference

   * Check both articles include the section (quoted, summarized, or paraphrased)

   * Status reflects how primary article used it

3. **Skipped**: Must document reason (redundant, off-topic, user instruction)

   * Status: `⚠️ skipped` (not `✅`)

   * Notes column required with explicit reason

Run validation:

```bash
uv run .claude/skills/writer-agent/scripts/validate_coverage.py docs/generated/{book}/analysis/_coverage.md
```

## Step 6: Verify

### 6.1 Coverage Check

**Soft target**: Coverage nên đạt >=95% (không bắt buộc retry)

```
Coverage results:
├─ >= 95% → PASS (tiếp tục)
├─ 90-94% → WARNING (ghi nhận, không retry tự động)
│   └─ Chỉ retry nếu user yêu cầu
├─ < 90% → ASK USER
│   └─ Option 1: Accept as-is
│   └─ Option 2: Retry specific articles
│   └─ Option 3: Create supplementary
```

**QUAN TRỌNG**: Không tự động retry để đạt coverage target. Việc retry tốn token và thời gian, thường không cải thiện đáng kể.

### 6.2 Quality Checklist

* [ ] All articles written, reader-ready (no metadata)

* [ ] Overview updated with Key Takeaways and Series List

* [ ] **All articles have "## Các bài viết trong series" at the end** (MANDATORY)

* [ ] All links in series lists verified

* [ ] \_coverage.md reported (>=95% target, >=90% acceptable)

* [ ] Critical \* sections included (verbatim preferred)

* [ ] Warnings logged for any skipped sections

* [ ] **No mechanical openings** ("Trong bài này...", "Bài viết sẽ trình bày...")

* [ ] **No mechanical closings** ("Tóm lại, bài viết đã...", "Trong phần tiếp theo...")

### 6.3 Error Recovery (User-Driven)

| Error               | Action                         | Auto-retry? |
| ------------------- | ------------------------------ | ----------- |
| Subagent timeout    | Report to user, ask what to do | ❌ NO        |
| Missing output      | Log warning, continue          | ❌ NO        |
| Style mismatch      | Report, user decides           | ❌ NO        |
| Content fabrication | Flag for user review           | ❌ NO        |
| Coverage < 90%      | Ask user for decision          | ❌ NO        |

**Nguyên tắc**: Không tự động retry. User có toàn quyền quyết định.

See [retry-workflow.md](references/retry-workflow.md) for user decision flow.

## Content Guidelines

### Source Fidelity

* Use ONLY source material, no fabrication

* **REWRITE all non-critical content in output style voice** - Source defines WHAT to say, Style defines HOW to say it

* DO NOT copy-paste sentences from source (except \* critical sections)

* Maintain original terminology (terms stay the same, but sentences must be rewritten)

* \* Critical sections: verbatim, never summarize

* Non-critical sections: MUST be rewritten in the selected output style's voice, structure, and language patterns

* VERIFY quotes prove source origin, but article content must be rewritten (not copied)

### Writing Quality

**Narrative Coherence:**

* Mỗi bài viết phải có mạch logic riêng, KHÔNG phải tóm tắt tuần tự từng section

* Sections phải nối với nhau bằng bridges (logical hoặc emotional), không phải "Tiếp theo..."

* Draw connections giữa các ý trong bài VÀ với thông điệp cốt lõi của series

**Opening & Closing (quyết định ấn tượng):**

* Opening: Hook compelling (câu hỏi, hình ảnh, khoảnh khắc). TRÁNH: "Trong bài này chúng ta sẽ..."

* Closing: Kết resonant (câu hỏi mở, hình ảnh, lời mời). TRÁNH: "Tóm lại, bài viết đã trình bày..."

* Mechanical phrases BLACKLIST: "Trong phần tiếp theo", "Như đã đề cập ở trên", "Bài viết này sẽ", "Tóm lại"

**Depth vs Breadth:**

* Khi một ý quan trọng: đi SÂU (ví dụ, implications, câu hỏi) thay vì liệt kê

* Khi nhiều ý nhỏ: nhóm lại thành pattern/theme, không liệt kê từng ý riêng lẻ

* Priority: 2-3 key insights explored deeply > 10 points listed superficially

**Reader Engagement:**

* Đặt câu hỏi cho người đọc (rhetorical hoặc reflective)

* Dùng ví dụ cụ thể, relatable thay vì abstract

* Tạo tension/curiosity trước khi giải đáp

* Vary sentence length: xen kẽ câu ngắn và dài

### Formatting

* Link between articles with relative paths

* Track all sections with \[Sxx] IDs

* NO markdown tables in article output - use bullet points instead

* NO diagrams (mermaid, ASCII, flowcharts) - describe in prose or bullets

### Series List (MANDATORY)

* **MỖI bài viết PHẢI có "## Các bài viết trong series" ở cuối** - Thiếu = FAIL

* Mark current article with _(đang xem)_

* Validation: Subagent return format includes `SERIES_LIST: YES/NO`

* Main agent MUST check `SERIES_LIST: YES` trước khi accept article

## Cài đặt thư viện mới

Skill sử dụng virtual environment tại `.claude/skills/writer-agent/scripts/.venv`. Khi cần cài thêm thư viện, **PHẢI activate venv trước**:

```bash
# 1. Activate venv
source .claude/skills/writer-agent/scripts/.venv/bin/activate

# 2. Cài package
uv pip install <package>

# 3. Cập nhật requirements.txt
uv pip freeze > .claude/skills/writer-agent/scripts/requirements.txt
```

**KHÔNG dùng:**

* `uv pip install <package>` khi chưa activate venv → lỗi "No virtual environment found"

* `uv pip install <package> --system` → lỗi "externally managed" (Python Homebrew)

* `uv add <package>` → cần pyproject.toml, skill dùng requirements.txt