# download

> Download an arXiv paper by title or ID. Searches Semantic Scholar and converts to markdown.

- Author: aphamm
- Repository: abundance-company/researcher
- Version: 20260123130952
- Stars: 1
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/abundance-company/researcher
- Web: https://mule.run/skillshub/@@abundance-company/researcher~download:20260123130952

---

---
name: download
description: Download an arXiv paper by title or ID. Searches Semantic Scholar and converts to markdown.
allowed-tools: Bash, Read, Write, Edit
argument-hint: [paper title or arXiv ID, or empty to read from PAPERS.md]
---

# Download arXiv Paper

Download papers from arXiv and convert them to markdown for the research vault.

## Usage

```
/download EgoMimic: Scaling Imitation Learning
/download 2410.24221
/download
```

## Process

### If arguments provided:

Download a single paper:

```bash
cd /Users/pham/Documents/researcher && uv run python src/download.py "$ARGUMENTS"
```

Extract the arXiv ID from the output (look for "Saved: XXXX.XXXXX/paper.md").

### If no arguments (batch mode):

Read from PAPERS.md and download each queued paper.

1. Read `/Users/pham/Documents/researcher/PAPERS.md`
2. Find lines starting with `- [ ]` (queued papers)
3. For each queued paper:
   - Extract the title (between `] ` and ` (arxiv_id)`)
   - Run the following, with the extracted title as the quoted argument: `cd /Users/pham/Documents/researcher && uv run python src/download.py ""`
   - If successful, change `- [ ]` to `- [x]` for that line
4. Write the updated PAPERS.md back

**PAPERS.md format (organized by quarter):**

```
## 2026 Winter
- [ ] Paper Title One (2601.12345)

## 2025 Fall
- [x] Already Downloaded Paper (2512.12345)
```

## Self-Improvement Loop (REQUIRED)

**After EVERY download, you MUST perform this quality check:**

1. **Read the downloaded paper.md** to check for conversion artifacts:
   ```
   Read {arxiv_id}/paper.md
   ```

2. **Look for these error patterns:**
   - "An error in the conversion from LaTeX to XML has occurred here"
   - `[conversion error...]` or `(error:...)` markers
   - Garbled text, missing content, or obvious corruption
   - Orphan parentheses like `( text)` or `(text )`
   - Incomplete sentences that suggest missing content

3. **If errors are found:**
   - Report what you found to the user
   - Read `/Users/pham/Documents/researcher/src/download.py`
   - Add a new regex cleanup pattern to handle the specific error
   - Fix the current paper.md manually
   - Test that future downloads would be cleaned

4. **If no errors found:**
   - Report "Paper downloaded and verified clean"

This self-improvement loop makes the download pipeline better over time: each conversion artifact found becomes a new cleanup rule.

## What Happens

1. Searches Semantic Scholar for the paper
2. Downloads PDF from arXiv via arxiv2md
3. Converts to markdown with citation wikilinks
4. Creates `{arxiv_id}/paper.md` in the vault
5. **Verifies content quality and improves cleanup code if needed**

## After Download

To fully process the paper (summarize, extract concepts, create author notes), run `/analyze {arxiv_id}` or use the analyze skill directly.

## Download: $ARGUMENTS
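
The literal error patterns listed under the Self-Improvement Loop can be checked mechanically before reading the paper by eye. The following is a minimal sketch, not the code in `src/download.py`: the function name `find_conversion_artifacts`, the pattern labels, and the exact regexes are assumptions made for illustration. Garbled text and incomplete sentences still need a manual read, since no regex here detects them.

```python
import re

# Hypothetical patterns derived from the error list in the quality check.
# Labels and regexes are illustrative, not taken from src/download.py.
ARTIFACT_PATTERNS = [
    ("latex-to-xml error",
     re.compile(r"An error in the conversion from LaTeX to XML has occurred here")),
    ("conversion error marker", re.compile(r"\[conversion error[^\]]*\]")),
    ("error marker", re.compile(r"\(error:[^)]*\)")),
    # Orphan parentheses: whitespace right after "(" or right before ")".
    ("orphan parenthesis",
     re.compile(r"\([ \t]+[^()\n]*\)|\([^()\n]*[ \t]+\)")),
]


def find_conversion_artifacts(markdown_text):
    """Return (line_number, label, matched_text) for each suspected artifact."""
    findings = []
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        for label, pattern in ARTIFACT_PATTERNS:
            for match in pattern.finditer(line):
                findings.append((line_number, label, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "Intro ( see above) fine.\n[conversion error: x] and (error: y)\n"
    for line_number, label, text in find_conversion_artifacts(sample):
        print(f"line {line_number}: {label}: {text}")
```

An empty result only means none of the known literal markers are present; it does not replace the manual check for missing content.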