# literature-search > Search PubMed for RNAi/dsRNA research on target genes or species (utility skill - use anytime) - Author: Hannes Bretschneider - Repository: katalyzeAI/dsrna-designer - Version: 20260120113046 - Stars: 0 - Forks: 0 - Last Updated: 2026-02-06 - Source: https://github.com/katalyzeAI/dsrna-designer - Web: https://mule.run/skillshub/@@katalyzeAI/dsrna-designer~literature-search:20260120113046 --- --- name: literature-search description: Search PubMed for RNAi/dsRNA research on target genes or species (utility skill - use anytime) type: utility --- # Literature Search Skill ## When to Use This Skill This is a **utility skill** - use it at ANY point in the workflow when you need: - Published RNAi/dsRNA studies for a pest species - Evidence supporting gene essentiality - Validation of candidate gene targets - References for the final report ## IMPORTANT: Tool Selection **ALWAYS use PubMed MCP tools for literature searches. DO NOT use WebSearch/Tavily.** | Correct | Incorrect | |---------|-----------| | `pubmed_search_articles` | `WebSearch` | | `pubmed_get_article_metadata` | `WebFetch` on Google Scholar | PubMed provides peer-reviewed, citable scientific literature with structured metadata (PMIDs, DOIs, abstracts). Web search returns unstructured, potentially unreliable results. ## Instructions ### Step 1: Search PubMed Using MCP Server Use the PubMed MCP tools. **For species-wide RNAi research:** ``` pubmed_search_articles query: "{species}" AND (RNAi OR dsRNA OR "RNA interference" OR "gene silencing") max_results: 50 ``` **For specific gene targets:** ``` pubmed_search_articles query: "{gene_name}" AND (RNAi OR dsRNA) AND insect max_results: 20 ``` ### Step 2: Get Article Details For relevant PMIDs, fetch full metadata: ``` pubmed_get_article_metadata pmids: ["PMID1", "PMID2", ...] ``` ### Step 3: Extract Gene Names (CRITICAL) **You MUST extract gene names from each paper's title and abstract.** The `match_essential.py` script relies on the `gene_names` field to give literature support scores to candidate genes. If this field is empty or missing, literature support will be ignored. Look for these gene patterns in titles and abstracts: | Gene | Patterns to Match | |------|-------------------| | vATPase | V-ATPase, vATPase, vha, ATP6V, vacuolar ATPase | | chitin synthase | chitin synthase, ChS, CHS | | acetylcholinesterase | acetylcholinesterase, AChE, Ace | | alpha-tubulin | α-tubulin, alpha-tubulin, TUA | | beta-tubulin | β-tubulin, beta-tubulin, TUB | | ribosomal protein | ribosomal protein, RpS, RpL | | cytochrome P450 | cytochrome P450, CYP, P450 | | ecdysone receptor | ecdysone receptor, EcR | | trehalase | trehalase, TRE | | laccase | laccase, Lac | | aquaporin | aquaporin, AQP | | heat shock protein | heat shock protein, HSP, Hsp | | actin | actin, ACT | | GABA receptor | GABA receptor, Rdl, GABAR | | sodium channel | sodium channel, Nav, para | ### Step 4: Save Results in Required Format **Analysis outputs go in `output/{run}/`, NOT in `data/`.** Write to `output/{run}/literature_search.json`: **REQUIRED FORMAT:** ```json [ { "pmid": "12345678", "doi": "10.1234/example", "title": "RNAi silencing of vATPase in Drosophila suzukii causes mortality", "authors": ["Smith J", "Jones K"], "journal": "Journal of Insect Physiology", "year": "2020", "gene_names": ["vATPase"], "abstract_snippet": "We demonstrate effective gene silencing..." }, { "pmid": "12345679", "title": "Chitin synthase and acetylcholinesterase as RNAi targets", "gene_names": ["chitin synthase", "acetylcholinesterase"], ... } ] ``` **CRITICAL FIELDS:** - `gene_names` - **REQUIRED** - Array of gene names found in title/abstract - `pmid` - PubMed ID - `title` - Article title The downstream script `match_essential.py` checks `paper.get('gene_names', [])` for each paper. If `gene_names` is missing or empty, that paper won't contribute to literature support scores. ### Step 5: Verify Format After saving, verify the format is correct: ```bash jq '.[0:2] | .[] | {pmid, gene_names}' output/{run}/literature_search.json ``` Should show each paper with its extracted gene_names array. ## Alternative: Use parse_pubmed.py Script If you have raw PubMed XML, you can use the bundled script to extract genes: ```bash python dsrna_agent/skills/literature-search/scripts/parse_pubmed.py \ --xml-file /tmp/pubmed_results.xml \ --output output/{run}/literature_search.json ``` This automatically extracts gene names using pattern matching. ## Available MCP Tools | Tool | Purpose | |------|---------| | `pubmed_search_articles` | Search PubMed with query | | `pubmed_get_article_metadata` | Get full article details by PMID | | `pubmed_find_related_articles` | Find similar papers | | `pubmed_get_full_text_article` | Get PMC full text (if available) | ## Notes - Always cite PubMed and include DOIs when reporting findings - If no results for exact species, try related species or genus-level queries - Gene mentions from literature boost candidate scores in the scoring step - **This skill runs AUTONOMOUSLY - no user confirmation needed** - Do NOT ask "Would you like me to search PubMed?" - just search when relevant - Integrate results silently and continue with the workflow