# download

> Download an arXiv paper by title or ID. Searches Semantic Scholar and converts to markdown.

- Author: aphamm
- Repository: abundance-company/researcher
- Version: 20260123130952
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/abundance-company/researcher
- Web: https://mule.run/skillshub/@@abundance-company/researcher~download:20260123130952

---

---
name: download
description: Download an arXiv paper by title or ID. Searches Semantic Scholar and converts to markdown.
allowed-tools: Bash, Read, Write, Edit
argument-hint: [paper title or arXiv ID, or empty to read from PAPERS.md]
---

# Download arXiv Paper

Download papers from arXiv and convert to markdown format for the research vault.

## Usage

```
/download EgoMimic: Scaling Imitation Learning
/download 2410.24221
/download
```

## Process

### If arguments provided:
Download a single paper:
```bash
cd /Users/pham/Documents/researcher && uv run python src/download.py "$ARGUMENTS"
```

Extract the arXiv ID from the output (look for "Saved: XXXX.XXXXX/paper.md").
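
One way to pull the ID out of the script output is a small regex helper. This is a minimal sketch: the function name `extract_arxiv_id` is hypothetical, and the ID shape (four digits, a dot, four or five digits, optional version suffix) is an assumption based on modern arXiv identifiers rather than anything `src/download.py` guarantees.

```python
import re
from typing import Optional

# Assumed shape of the success line: "Saved: <arxiv_id>/paper.md".
# The ID pattern (NNNN.NNNNN with an optional vN suffix) is an assumption.
SAVED_RE = re.compile(r"Saved:\s*(\d{4}\.\d{4,5}(?:v\d+)?)/paper\.md")


def extract_arxiv_id(output: str) -> Optional[str]:
    """Return the arXiv ID from the download output, or None if no 'Saved:' line is present."""
    match = SAVED_RE.search(output)
    return match.group(1) if match else None
```

If the helper returns `None`, treat the download as failed rather than guessing an ID.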

### If no arguments (batch mode):
Read from PAPERS.md and download each queued paper.

1. Read `/Users/pham/Documents/researcher/PAPERS.md`
2. Find lines starting with `- [ ]` (queued papers)
3. For each queued paper:
   - Extract the title (between `] ` and ` (arxiv_id)`)
   - Run: `cd /Users/pham/Documents/researcher && uv run python src/download.py "<title>"`
   - If successful, change `- [ ]` to `- [x]` for that line
4. Write the updated PAPERS.md back to the same path

**PAPERS.md format (organized by quarter):**
```
## 2026 Winter
- [ ] Paper Title One (2601.12345)

## 2025 Fall
- [x] Already Downloaded Paper (2512.12345)
```
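
The batch steps above can be sketched in Python against this format. This is an illustrative sketch, not the repository's code: the helper names `queued_papers` and `mark_downloaded` are hypothetical, and the regex assumes each queued line ends with the arXiv ID in parentheses, as in the example.

```python
import re

# Assumed queued-line shape: "- [ ] <title> (<arxiv_id>)".
# The greedy title match means the LAST parenthesized group is taken as the ID.
QUEUED_RE = re.compile(r"^- \[ \] (?P<title>.+) \((?P<arxiv_id>[^()]+)\)\s*$")


def queued_papers(text):
    """Return (line_index, title, arxiv_id) for every queued '- [ ]' line."""
    found = []
    for index, line in enumerate(text.splitlines()):
        match = QUEUED_RE.match(line)
        if match:
            found.append((index, match.group("title"), match.group("arxiv_id")))
    return found


def mark_downloaded(text, line_index):
    """Return the text with '- [ ]' changed to '- [x]' on the given line only."""
    lines = text.splitlines()
    if lines[line_index].startswith("- [ ]"):
        lines[line_index] = "- [x]" + lines[line_index][len("- [ ]"):]
    trailing_newline = "\n" if text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline
```

Only call `mark_downloaded` after the download for that line has succeeded, so failed papers stay queued for the next run.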

## Self-Improvement Loop (REQUIRED)

**After EVERY download, you MUST perform this quality check:**

1. **Read the downloaded paper.md** to check for conversion artifacts:
   ```
   Read {arxiv_id}/paper.md
   ```

2. **Look for these error patterns:**
   - "An error in the conversion from LaTeX to XML has occurred here"
   - `[conversion error...]` or `(error:...)` markers
   - Garbled text, missing content, or obvious corruption
   - Orphan parentheses like `( text)` or `(text )`
   - Incomplete sentences that suggest missing content

3. **If errors are found:**
   - Report what you found to the user
   - Read `/Users/pham/Documents/researcher/src/download.py`
   - Add a new regex cleanup pattern to handle the specific error
   - Fix the current paper.md manually
   - Test that the new pattern would clean the same artifact in future downloads

4. **If no errors found:**
   - Report "Paper downloaded and verified clean"

This self-improvement loop ensures the download pipeline gets better over time by learning from conversion artifacts.
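
The mechanical part of the check in step 2 can be approximated with a few regexes. This is a hedged sketch: `ARTIFACT_PATTERNS` and `find_artifacts` are hypothetical names, not part of `src/download.py`, and the patterns cover only the literal markers and orphan parentheses listed above. Garbled text and incomplete sentences still need a human-style read of the file.

```python
import re

# Hypothetical pattern table; each entry mirrors one bullet from the error list.
ARTIFACT_PATTERNS = {
    "latexml_error": re.compile(
        r"An error in the conversion from LaTeX to XML has occurred here"
    ),
    "conversion_marker": re.compile(r"\[conversion error[^\]]*\]|\(error:[^)]*\)"),
    # "( text)" or "(text )": whitespace directly inside either parenthesis.
    "orphan_parenthesis": re.compile(r"\(\s+[^()]*\)|\([^()]*\s+\)"),
}


def find_artifacts(markdown):
    """Return (line_number, pattern_name) for each line that matches a known artifact."""
    hits = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for name, pattern in ARTIFACT_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

An empty result means only that none of the known patterns matched; it does not replace reading the paper for missing or corrupted content.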

## What Happens

1. Searches Semantic Scholar for the paper
2. Downloads PDF from arXiv via arxiv2md
3. Converts to markdown with citation wikilinks
4. Creates `{arxiv_id}/paper.md` in the vault
5. **Verifies content quality and improves cleanup code if needed**

## After Download

To fully process the paper (summarize, extract concepts, create author notes), run `/analyze {arxiv_id}` or use the analyze skill directly.

## Download: $ARGUMENTS