# mdcn2en

> Translate Chinese Markdown (.md) files into English while preserving Markdown structure. Use only for Markdown files and when asked to translate Chinese Markdown to English, create a sibling .en.md file, keep YAML front matter untouched, and maintain a term glossary/translation memory.

- Author: Jeff Song
- Repository: JEFFTIMES/skill-building
- Version: 20260205211705
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-08
- Source: https://github.com/JEFFTIMES/skill-building
- Web: https://mule.run/skillshub/@@JEFFTIMES/skill-building~mdcn2en:20260205211705

---

---
name: mdcn2en
description: Translate Chinese Markdown (.md) files into English while preserving Markdown structure. Use only for Markdown files and when asked to translate Chinese Markdown to English, create a sibling .en.md file, keep YAML front matter untouched, and maintain a term glossary/translation memory.
---

# mdcn2en

## Workflow

1) Confirm input .md file path and desired output name (default: `foo.md` -> `foo.en.md`).
2) Run `scripts/extract_cn_blocks.py` to create the skeleton `.en.md` and the blocks file.
   - The skeleton keeps YAML front matter, code blocks, and Markdown structure.
   - Chinese text is replaced by indexed placeholders.
3) Run `scripts/extract_blocks_from_json.py` to list the blocks for translation.
4) Translate the extracted blocks one by one, applying glossary terms where appropriate.
5) Run `scripts/insert_en_blocks.py` to replace placeholders in the skeleton with translations.
6) Append stable new terms to `references/glossary.jsonl` using `scripts/append_glossary.py` in batch.

## Script Usage

### Extract blocks

```bash
python mdcn2en/scripts/extract_cn_blocks.py --input path/to/foo.md
```

This creates:

- `path/to/foo.en.md` with placeholders
- `path/to/foo.en.blocks.json` with extracted blocks

### List blocks for translation

```bash
python mdcn2en/scripts/extract_blocks_from_json.py --input path/to/foo.en.blocks.json
```

### Insert translations

```bash
python mdcn2en/scripts/insert_en_blocks.py --input path/to/foo.en.md --translations path/to/translated.json
```

Translations file formats:

- JSON list: `[{"index": 1, "text": "..."}]`
- JSON dict: `{ "[[CN2EN_BLOCK_0001]]": "..." }`
- JSONL: one object per line with `index` or `placeholder` and `text`

### Append glossary (batch)

```bash
python mdcn2en/scripts/append_glossary.py --input path/to/glossary.json --source "path.md" --context "short snippet"
```

Input formats:

- JSON dict: `{ "术语": "Term", "科研": "research" }` (requires `--source` and `--context`)
- JSON list: `[{"zh": "术语", "en": "Term", "source": "...", "context": "..."}, ...]`

## Resources

- `references/output-conventions.md`: naming rules and file placement.
- `references/ignore-rules.md`: what to preserve and never translate.
- `references/glossary.jsonl`: append-only term base.
- `references/terms.md`: guidance for adding stable glossary entries.
- `scripts/append_glossary.py`: helper to append stable terms to the glossary.
- `scripts/extract_cn_blocks.py`: create a skeleton .en.md file and extract placeholders into a blocks file.
- `scripts/extract_blocks_from_json.py`: list block texts from a blocks JSON file.
- `scripts/insert_en_blocks.py`: insert translated blocks into the skeleton .en.md file.

## Term Base Format

Each line in `references/glossary.jsonl` is a JSON object:

- `zh`: Chinese term
- `en`: English term
- `source`: where the term came from (file path or context)
- `context`: short snippet to clarify meaning
- `added_at`: ISO 8601 timestamp

Append only when the translation is stable and unambiguous.