# download
> Download an arXiv paper by title or ID. Searches Semantic Scholar and converts to markdown.
- Author: aphamm
- Repository: abundance-company/researcher
- Version: 20260123130952
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/abundance-company/researcher
- Web: https://mule.run/skillshub/@@abundance-company/researcher~download:20260123130952
---
---
name: download
description: Download an arXiv paper by title or ID. Searches Semantic Scholar and converts to markdown.
allowed-tools: Bash, Read, Write, Edit
argument-hint: [paper title or arXiv ID, or empty to read from PAPERS.md]
---
# Download arXiv Paper
Download papers from arXiv and convert them to markdown for the research vault.
## Usage
```
/download EgoMimic: Scaling Imitation Learning
/download 2410.24221
/download
```
## Process
### If arguments provided:
Download a single paper:
```bash
cd /Users/pham/Documents/researcher && uv run python src/download.py "$ARGUMENTS"
```
Extract the arXiv ID from the output (look for "Saved: XXXX.XXXXX/paper.md").
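The ID extraction could be sketched as below. This is only an illustration: the helper name `extract_arxiv_id` is hypothetical, and the regex assumes the script prints a line of the form `Saved: <id>/paper.md` with a new-style arXiv ID (four digits, a dot, then four or five digits).

```python
import re
from typing import Optional

# Assumption: the download script prints "Saved: <arxiv_id>/paper.md"
# and the ID is a new-style arXiv identifier such as 2410.24221.
SAVED_LINE = re.compile(r"Saved:\s*(\d{4}\.\d{4,5})/paper\.md")


def extract_arxiv_id(output: str) -> Optional[str]:
    """Return the arXiv ID from the script output, or None if no 'Saved:' line is present."""
    match = SAVED_LINE.search(output)
    return match.group(1) if match else None
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing `Saved:` line counts as a failed download.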
### If no arguments (batch mode):
Read from PAPERS.md and download each queued paper.
1. Read `/Users/pham/Documents/researcher/PAPERS.md`
2. Find lines starting with `- [ ]` (queued papers)
3. For each queued paper:
- Extract the title (between `] ` and ` (arxiv_id)`)
- Run: `cd /Users/pham/Documents/researcher && uv run python src/download.py "<title>"`
- If successful, change `- [ ]` to `- [x]` for that line
4. Write updated PAPERS.md back
**PAPERS.md format (organized by quarter):**
```
## 2026 Winter
- [ ] Paper Title One (2601.12345)
## 2025 Fall
- [x] Already Downloaded Paper (2512.12345)
```
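The batch-mode bookkeeping can be sketched in Python as follows. The function names `queued_papers` and `mark_downloaded` are hypothetical, and the regex assumes every queued line follows the `- [ ] Title (arxiv_id)` shape shown above; it is not taken from `src/download.py`.

```python
import re

# Assumption: a queued entry looks like "- [ ] Paper Title One (2601.12345)".
# The greedy title group lets titles contain their own parentheses.
QUEUED = re.compile(r"^- \[ \] (?P<title>.+) \((?P<arxiv_id>[^()]+)\)\s*$")


def queued_papers(text: str) -> list[tuple[int, str, str]]:
    """Return (line_index, title, arxiv_id) for every queued line, 0-indexed."""
    found = []
    for index, line in enumerate(text.splitlines()):
        match = QUEUED.match(line)
        if match:
            found.append((index, match.group("title"), match.group("arxiv_id")))
    return found


def mark_downloaded(text: str, line_index: int) -> str:
    """Change "- [ ]" to "- [x]" on one line and leave every other line untouched."""
    lines = text.splitlines()
    if lines[line_index].startswith("- [ ]"):
        lines[line_index] = "- [x]" + lines[line_index][len("- [ ]"):]
    # Preserve the trailing newline if the original text had one.
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")
```

Quarter headings and already-checked entries never match the pattern, so they pass through unchanged when the file is written back.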
## Self-Improvement Loop (REQUIRED)
**After EVERY download, you MUST perform this quality check:**
1. **Read the downloaded paper.md** to check for conversion artifacts:
```
Read {arxiv_id}/paper.md
```
2. **Look for these error patterns:**
- "An error in the conversion from LaTeX to XML has occurred here"
- `[conversion error...]` or `(error:...)` markers
- Garbled text, missing content, or obvious corruption
- Orphan parentheses like `( text)` or `(text )`
- Incomplete sentences that suggest missing content
3. **If errors are found:**
- Report what you found to the user
- Read `/Users/pham/Documents/researcher/src/download.py`
- Add a new regex cleanup pattern to handle the specific error
- Fix the current paper.md manually
- Test that future downloads would be cleaned
4. **If no errors found:**
- Report "Paper downloaded and verified clean"
This self-improvement loop ensures the download pipeline gets better over time by learning from conversion artifacts.
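A first pass over `paper.md` for the mechanical patterns in step 2 might look like the sketch below. The pattern table and the `find_artifacts` name are assumptions made for illustration, not the cleanup code in `src/download.py`. Garbled text and incomplete sentences cannot be caught by regex and still need a manual read.

```python
import re

# Hypothetical checks mirroring the error patterns listed in step 2.
ARTIFACT_PATTERNS = {
    "latex-to-xml error": re.compile(
        r"An error in the conversion from LaTeX to XML has occurred here"
    ),
    "conversion error marker": re.compile(
        r"\[conversion error[^\]]*\]|\(error:[^)]*\)"
    ),
    # Whitespace directly inside either parenthesis, e.g. "( text)" or "(text )".
    "orphan parenthesis": re.compile(r"\(\s+[^()]*\)|\([^()]*\s+\)"),
}


def find_artifacts(markdown: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, matched_text) for each hit, with 1-indexed lines."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for label, pattern in ARTIFACT_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits
```

An empty result only means none of these specific patterns appeared, so it complements the read-through rather than replacing it.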
## What Happens
1. Searches Semantic Scholar for the paper
2. Downloads PDF from arXiv via arxiv2md
3. Converts to markdown with citation wikilinks
4. Creates `{arxiv_id}/paper.md` in the vault
5. **Verifies content quality and improves cleanup code if needed**
## After Download
To fully process the paper (summarize, extract concepts, create author notes), run `/analyze {arxiv_id}` or use the analyze skill directly.
## Download: $ARGUMENTS