# research-paper-extractor > Extract text from cardiology research paper PDFs - **FREE, runs locally**. - Author: Shailesh Singh - Repository: drshailesh88/integrated_content_OS - Version: 20260102142516 - Stars: 1 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/drshailesh88/integrated_content_OS - Web: https://mule.run/skillshub/@@drshailesh88/integrated_content_OS~research-paper-extractor:20260102142516 --- # Research Paper Extractor Extract text from cardiology research paper PDFs - **FREE, runs locally**. ## Cost: ZERO - Text extraction: `pdfplumber` (free, local) - Structuring: You ask me (Claude) in this conversation - you're already paying for the subscription **No API calls. No extra costs.** --- ## How It Works ``` STEP 1: Extract text (free, local) python scripts/extract_paper.py trial.pdf --output trial.md STEP 2: Ask Claude (your existing subscription) "Read trial.md and structure this for my content workflow" DONE - No extra cost. ``` --- ## Quick Start ### Install (one time) ```bash pip3 install pdfplumber ``` ### Extract text from PDF ```bash # Save to file python scripts/extract_paper.py paper.pdf --output extracted.md # Just first 5 pages (faster) python scripts/extract_paper.py paper.pdf --pages 5 --output extracted.md ``` ### Then ask Claude Code After extracting, just tell me: > "Read /path/to/extracted.md and give me: > - Study design, population, intervention > - Primary/secondary endpoints with HR, CI, p-values > - Safety data and conclusions > - Content angles for YouTube, Twitter, Newsletter" I'll structure it for your content workflow. --- ## Example Workflow ```bash # 1. Download PDF from NEJM/JACC/Lancet # 2. Extract text python scripts/extract_paper.py ~/Downloads/declare-timi-58.pdf --output declare.md # 3. In Claude Code: # "Read declare.md and structure the trial data. # Give me content angles for my YouTube channel." ``` **Output you'll get from me:** ``` DECLARE-TIMI 58 Summary: Study: RCT, N=17,160, T2DM with CV risk Intervention: Dapagliflozin 10mg vs placebo Duration: 4.2 years median follow-up Primary (MACE): HR 0.93 (0.84-1.03), p=0.17 - Non-inferior, not superior Key Secondary (CV death/HF hosp): HR 0.83 (0.73-0.95), p=0.005 ✓ Content Angles: 🎬 YouTube: "SGLT2 inhibitors: The HF story hidden in a 'negative' trial" 🐦 Twitter: "DECLARE: Primary endpoint NS, but NNT 111 for HF hosp. Bury the lede much?" 📧 Newsletter: "Why 'negative' trials often have positive stories" ``` --- ## Why This Approach? | Approach | Cost | |----------|------| | ❌ Anthropic API per extraction | ~$0.05-0.15 per paper | | ❌ OpenAI API per extraction | ~$0.05-0.20 per paper | | ✅ **This approach** | **$0** - uses your subscription | You're already paying for Claude Code. Use it. --- ## Integration with Your Skills After I structure the data, you can use it with: - `cardiology-trial-editorial` → Write 500-word editorial - `x-post-creator-skill` → Generate tweets with accurate stats - `youtube-script-master` → Script with verified data - `cardiology-newsletter-writer` → Deep dive newsletter --- ## Limitations - Works best with native PDFs (not scanned images) - Very long papers: use `--pages 10` to extract key sections - Tables may need manual review --- *Zero cost. Maximum utility. Uses what you already pay for.*