# neoantigen-predictor

> Predict neoantigens that may be recognized by the immune system based on patient HLA typing and tumor mutation data.
>
> Trigger conditions:
> - User provides HLA typing results and mutation data, requesting neoantigen prediction
> - User inquires about tumor immunotherapy-related neoantigen prediction
> - Need to provide T-cell epitope prediction and immunogenicity assessment
> - Input: HLA alleles (HLA-A*02:01, etc.), tumor mutation data (VCF or peptide sequences)
> - Output: Predicted neoantigen list, HLA binding affinity, immunogenicity scores

- Author: Rowtion
- Repository: aipoch/skills-collection
- Version: 20260210095832
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-10
- Source: https://github.com/aipoch/skills-collection
- Web: https://mule.run/skillshub/@@aipoch/skills-collection~neoantigen-predictor:20260210095832

---

---
name: neoantigen-predictor
description: 'Predict neoantigens that may be recognized by the immune system based on patient HLA typing and tumor mutation data. Trigger conditions: - User provides HLA typing results and mutation data, requesting neoantigen prediction - User inquires about tumor immunotherapy-related neoantigen prediction - Need to provide T-cell epitope prediction and immunogenicity assessment - Input: HLA alleles (HLA-A*02:01, etc.), tumor mutation data (VCF or peptide sequences) - Output: Predicted neoantigen list, HLA binding affinity, immunogenicity scores'
version: 1.0.0
category: Bioinfo
tags: []
author: AIPOCH
license: MIT
status: Draft
risk_level: High
skill_type: Hybrid (Tool/Script + Network/API)
owner: AIPOCH
reviewer: ''
last_updated: '2026-02-06'
---

# Neoantigen Predictor

Predicts patient-specific neoantigen candidate peptides with high immunogenicity based on HLA typing and tumor mutation profiles, providing target screening for tumor immunotherapy.
## Function Overview

Neoantigens are variant peptides generated by non-synonymous mutations in tumor cells. They can be presented by the patient's own HLA molecules and recognized by T cells. This tool integrates the following analysis workflows:

1. **Mutant Peptide Generation** - Extract 8-11mer variant peptides that span each mutation site
2. **HLA Binding Prediction** - Predict peptide binding affinity to the patient's HLA molecules
3. **Immunogenicity Assessment** - Assess the potential to elicit an immune response
4. **Priority Ranking** - Composite scoring to shortlist the strongest neoantigen candidates

## Input Format

### HLA Typing Input

| Format | Example | Description |
|------|------|------|
| **Standard Nomenclature** | `HLA-A*02:01` | WHO standard HLA nomenclature |
| **Simplified Nomenclature** | `A0201` | Omits the `HLA-` prefix and the `*` and `:` separators |
| **Multi-alleles** | `HLA-A*02:01,A*11:01,B*07:02` | Multiple alleles separated by commas |

### Mutation Data Input

**VCF Format Example:**

```
#CHROM  POS       ID  REF  ALT  QUAL  FILTER  INFO
chr17   7579472   .   G    A    100   PASS    GENE=TP53;AA=p.R273H
chr13   32915005  .   C    T    100   PASS    GENE=BRCA2;AA=p.S1172L
```

**Table Format:**

| Gene | Chrom | Position | Ref | Alt | Protein_Change |
|------|-------|----------|-----|-----|----------------|
| TP53 | chr17 | 7579472 | G | A | p.R273H |
| BRCA2 | chr13 | 32915005 | C | T | p.S1172L |

**FASTA Format (Variant Peptides):**

```
>TP53_R273H_mut
GSDLWPGYFSH
>TP53_R273H_wt
GSDLWPGYFSR
```

## Usage

### Python API

```python
from scripts.main import NeoantigenPredictor

# Initialize predictor
predictor = NeoantigenPredictor()

# Set patient HLA typing
hla_alleles = ["HLA-A*02:01", "HLA-A*11:01", "HLA-B*07:02"]

# Define mutation data
mutations = [
    {
        "gene": "TP53",
        "chrom": "chr17",
        "pos": 7579472,
        "ref": "G",
        "alt": "A",
        "protein_change": "p.R273H"
    }
]

# Predict neoantigens
results = predictor.predict(
    hla_alleles=hla_alleles,
    mutations=mutations,
    peptide_length=[9, 10],  # 9-10mer peptides
    mhc_method="netmhcpan"   # Use NetMHCpan prediction
)

# Get high-affinity neoantigens (rank percentile below 0.5%)
high_affinity = predictor.filter_by_binding(results, rank_threshold=0.5)
```

### Command Line Usage

```bash
# Basic prediction
python scripts/main.py \
    --hla "HLA-A*02:01,HLA-A*11:01,B*07:02" \
    --vcf mutations.vcf \
    --output neoantigen_results.json

# Use table format input
python scripts/main.py \
    --hla-file hla_genotype.txt \
    --mutations mutations.csv \
    --peptide-length 9,10,11 \
    --rank-cutoff 0.5 \
    --output results.json

# Predict HLA binding for existing variant peptides
python scripts/main.py \
    --hla "A*02:01" \
    --variant-peptides peptides.fasta \
    --wildtype-peptides wt_peptides.fasta \
    --output binding_predictions.csv
```

## Output Format

```json
{
  "patient_hla": ["HLA-A*02:01", "HLA-A*11:01", "HLA-B*07:02"],
  "prediction_method": "NetMHCpan 4.1",
  "total_predictions": 156,
  "strong_binders": 12,
  "neoantigens": [
    {
      "rank": 1,
      "mutation_id": "TP53_R273H",
      "gene": "TP53",
      "chromosome": "chr17",
      "position": 7579472,
      "ref_aa": "R",
      "alt_aa": "H",
      "hla_allele": "HLA-A*02:01",
      "peptide_sequence": "DLWPGYFSH",
      "peptide_length": 9,
      "mutant_position": 9,
      "mhc_binding": {
        "rank_percentile": 0.12,
        "affinity_nM": 34.5,
        "binding_level": "Strong",
        "core_peptide": "DLWPGYFSH",
        "anchor_residues": [2, 9]
      },
      "immunogenicity": {
        "foreignness_score": 0.87,
        "self_similarity": 0.23,
        "amino_acid_change": "R->H",
        "anchor_mutation": true,
        "hydrophobicity_change": -0.45
      },
      "priority_score": 0.92,
      "clinical_relevance": {
        "variant_allele_frequency": 0.42,
        "expression_level": "High",
        "clonality": "Clonal"
      }
    }
  ],
  "summary": {
    "top_candidates": 5,
    "binding_distribution": {
      "strong": 12,
      "weak": 44,
      "non_binder": 100
    }
  }
}
```

## Scoring Algorithms

### MHC Binding Affinity Prediction

Uses the **NetMHCpan 4.1** algorithm to predict peptide binding to HLA molecules:

| Metric | Description | Threshold |
|------|------|------|
| **Rank %** | Percentile rank of the predicted score against a background of random natural peptides | <0.5% = Strong, <2% = Weak |
| **IC50 (nM)** | Predicted half-maximal inhibitory concentration (lower means tighter binding) | <50nM = High, <500nM = Intermediate |
| **Binding Level** | Overall binding strength classification | Strong/Weak/Non-binder |

### Immunogenicity Score

```
Immunogenicity Score = Σ(wi × fi)

Components:
1. Foreignness Score (w=0.30): Difference from wild-type protein
2. Anchor Mutation (w=0.25): Whether mutation is at HLA binding anchor position
3. Self-similarity (w=0.20): Similarity to self-antigen pool (lower is better)
4. Hydrophobicity Change (w=0.15): Magnitude of hydrophobicity change
5. Clonality (w=0.10): Tumor clonality (clonal mutation > subclonal)
```

### Priority Score

```python
# Weight configuration
weights = {
    'mhc_binding': 0.40,     # MHC binding affinity
    'immunogenicity': 0.35,  # Immunogenicity
    'clinical': 0.25         # Clinical relevance (expression, clonality)
}

def priority_score(rank_percentile, immunogenicity_score, clinical_score):
    # rank_percentile is assumed to be scaled to [0, 1], where 0 is the best rank,
    # so that stronger binders contribute a larger term.
    return (
        weights['mhc_binding'] * (1 - rank_percentile)
        + weights['immunogenicity'] * immunogenicity_score
        + weights['clinical'] * clinical_score
    )
```

## HLA Support List

### MHC Class I Molecules

- **HLA-A**: A*01:01, A*02:01, A*02:03, A*02:06, A*03:01, A*11:01, A*23:01, A*24:02, A*26:01, A*30:01, A*30:02, A*31:01, A*32:01, A*33:01, A*68:01, A*68:02
- **HLA-B**: B*07:02, B*08:01, B*15:01, B*27:05, B*35:01, B*40:01, B*44:02, B*44:03, B*51:01, B*53:01, B*57:01, B*58:01
- **HLA-C**: C*03:03, C*04:01, C*05:01, C*06:02, C*07:01, C*07:02, C*08:02, C*12:03, C*14:02, C*15:02

### Mouse MHC (for preclinical research)

- H2-Db, H2-Kb, H2-Kd, H2-Ld

## Technical Difficulty: **HIGH**

⚠️ **AI Autonomous Acceptance Status**: Manual review required

This skill involves complex immunoinformatics calculations:

- MHC binding prediction algorithms (NetMHCpan neural network)
- Peptide sequence processing and variant positioning
- Multi-dimensional immunogenicity assessment
- Large-scale parallel computing optimization
- Tumor genomics data integration

## Data Dependencies

| Data Source | Type | Purpose |
|--------|------|------|
| **NetMHCpan 4.1** | MHC binding prediction | Core prediction algorithm |
| **Ensembl/GENCODE** | Genome annotation | Transcript sequence extraction |
| **UniProt** | Protein sequences | Wild-type reference sequences |
| **IEDB** | Immune epitope data | Immunogenicity assessment reference |
| **TCGA** | Tumor mutation data | Mutation signature analysis |

## Algorithm Limitations

- MHC binding prediction accuracy: ~85% (Rank < 0.5 threshold)
- Immunogenicity prediction requires experimental validation; correlation with observed responses is ~60-70%
- Does not consider HLA molecule expression levels on the cell surface
- Cannot predict immune tolerance or suppressive T cell responses
- The relationship between neoantigen generation and an actual T cell response remains uncertain

## Clinical Application Notes

⚠️ **Important Notice**: This tool is for research purposes only; prediction results should not be the sole basis for clinical decisions.

- All candidate neoantigens require experimental validation (e.g., ELISPOT, tetramer staining)
- Consider the patient's own immune status and treatment history
- Assess potential autoimmune toxicity risks
- Combine with tumor microenvironment immune infiltration status

## References

See `references/` directory:

- NetMHCpan 4.1 algorithm paper (Reynisson et al., 2020)
- Neoantigen prediction best practice guidelines
- Tumor immunotherapy clinical trial design references
- Immunopeptidomics databases

## Dependencies

**Required:**

- Python 3.8+
- biopython (sequence processing)
- pandas, numpy (data analysis)
- requests (API calls)

**Optional (enhanced features):**

- NetMHCpan 4.1 local installation (improved performance)
- samtools (VCF processing)
- matplotlib, seaborn (visualization)

## Core Implementation

Core script: `scripts/main.py`

Key functions:

- `extract_variant_peptides()` - Extract variant peptides from mutation sites
- `predict_mhc_binding()` - MHC binding affinity prediction
- `calculate_foreignness()` - Foreignness/self-similarity assessment
- `score_immunogenicity()` - Comprehensive immunogenicity scoring
- `rank_candidates()` - Multi-criteria candidate ranking

## Validation Status

- **Unit Test Coverage**: 78%
- **Benchmark Validation**: Prediction consistency with published neoantigen datasets
- **Status**: ⏳ Requires experimental validation (prediction results require in vitro/in vivo validation)

## Risk Assessment

| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python scripts with tools | High |
| Network Access | External API calls | High |
| File System Access | Read/write data | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Data handled securely | Medium |

## Security Checklist

- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] API requests use HTTPS only
- [ ] Input validated against allowed patterns
- [ ] API timeout and retry mechanisms implemented
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no internal paths exposed)
- [ ] Dependencies audited
- [ ] No exposure of internal service architecture

## Prerequisites

```bash
# Python dependencies
pip install -r requirements.txt
```

## Evaluation Criteria

### Success Metrics

- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable

### Test Cases

1. **Basic Functionality**: Standard input → Expected output
2. **Edge Case**: Invalid input → Graceful error handling
3. **Performance**: Large dataset → Acceptable processing time

## Lifecycle Status

- **Current Stage**: Draft
- **Next Review Date**: 2026-03-06
- **Known Issues**: None
- **Planned Improvements**:
  - Performance optimization
  - Additional feature support
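
The HLA Typing Input table and step 1 of the Function Overview (mutant peptide generation) describe input handling without showing it. The following is a minimal sketch of how those two steps could look, not the code in `scripts/main.py`. The names `normalize_hla`, `parse_hla_list`, and `mutant_windows` are hypothetical, and the sketch assumes human class I alleles (A, B, C) and a single amino-acid substitution given as a 1-based protein position.

```python
import re
from typing import Iterator, List, Sequence, Tuple

# Hypothetical helpers for illustration; not taken from scripts/main.py.
# Accepts 'A0201', 'A*02:01' or 'HLA-A*02:01' (human class I genes A, B, C only).
_HLA_PATTERN = re.compile(r"^(?:HLA-)?([ABC])\*?(\d{2}):?(\d{2,3})$")


def normalize_hla(allele: str) -> str:
    """Return the allele in standard form, e.g. 'HLA-A*02:01'."""
    match = _HLA_PATTERN.match(allele.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised HLA class I allele: {allele!r}")
    gene, group, protein = match.groups()
    return f"HLA-{gene}*{group}:{protein}"


def parse_hla_list(text: str) -> List[str]:
    """Split a comma-separated allele string and normalise each entry."""
    return [normalize_hla(item) for item in text.split(",") if item.strip()]


def mutant_windows(
    protein: str,
    mut_pos: int,
    alt_aa: str,
    lengths: Sequence[int] = (8, 9, 10, 11),
) -> Iterator[Tuple[str, str, int]]:
    """Yield (mutant_peptide, wildtype_peptide, mutation_position_in_peptide).

    mut_pos is 1-based within `protein`; the yielded position is 1-based
    within the peptide. Only windows that contain the mutated residue are
    produced.
    """
    if not 1 <= mut_pos <= len(protein):
        raise ValueError("mut_pos lies outside the protein sequence")
    idx = mut_pos - 1
    mutant = protein[:idx] + alt_aa + protein[idx + 1:]
    for k in lengths:
        first = max(0, idx - k + 1)        # leftmost start that still covers idx
        last = min(idx, len(protein) - k)  # rightmost start that fits in the protein
        for start in range(first, last + 1):
            yield mutant[start:start + k], protein[start:start + k], idx - start + 1


if __name__ == "__main__":
    print(parse_hla_list("HLA-A*02:01,A*11:01,B*07:02"))
    # Toy sequence, not a real protein.
    for mut, wt, pos in mutant_windows("ACDEFGHIKLMNPQRSTVWY", 10, "W", lengths=(9,)):
        print(mut, wt, pos)
```

Pairing each mutant peptide with its wild-type counterpart keeps the two sequences aligned for later comparison, for example when deriving a foreignness or anchor-mutation feature.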