# virtual-screening

> Molecular docking and virtual screening for drug discovery. Use for screening
compound libraries against protein targets, predicting binding affinities, and
identifying lead candidates.

- Author: huifer
- Repository: huifer/drug-discovery-skills
- Version: 20260104224806
- Stars: 4
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/huifer/drug-discovery-skills
- Web: https://mule.run/skillshub/@@huifer/drug-discovery-skills~virtual-screening:20260104224806

---

---
name: virtual-screening
description: |
  Molecular docking and virtual screening for drug discovery. Use for screening
  compound libraries against protein targets, predicting binding affinities, and
  identifying lead candidates.

  Keywords: docking, virtual screening, molecular docking, binding affinity, lead identification
category: Computational Chemistry
tags: [docking, screening, molecular-modeling, drug-design]
version: 1.0.0
author: Drug Discovery Team
dependencies:
  - rdkit
  - openbabel
  - docking-engine
---

# Virtual Screening Skill

Molecular docking and virtual screening capabilities for lead identification.

## Quick Start

```
/virtual-screening EGFR --library compounds.sdf --top 10
/dock "target.pdb" --ligands "ligands.smi" --engine vina
/screen-kinases --scaffold quinazoline --threshold -7.0
```

## Capabilities

### 1. Molecular Docking

Predict binding poses and affinities for compound-target pairs.

**Supported docking engines**:
- AutoDock Vina - Fast, widely used
- SMINA - Vina derivative with custom scoring
- DiffDock - Deep learning-based
- GNINA - Graph neural network scoring

### 2. Virtual Screening

Screen large compound libraries against targets.

**Library sources**:
- ChEMBL (bioactive compounds)
- PubChem (diverse compounds)
- ZINC (commercially available)
- Enamine (make-on-demand)

### 3. Binding Affinity Prediction

Estimate binding energies using scoring functions.

### 4. Pose Analysis

Analyze binding modes and key interactions.

## Docking Workflow

```
1. Target Preparation
   ├── Retrieve PDB structure
   ├── Remove water/ligands
   ├── Add hydrogens
   └── Define binding site

2. Ligand Preparation
   ├── Generate 3D conformations
   ├── Minimize energy
   └── Generate protonation states

3. Docking
   ├── Set grid box
   ├── Run docking engine
   └── Generate poses

4. Analysis
   ├── Score poses
   ├── Analyze interactions
   └── Rank compounds
```

## Output Structure

### Docking Results

```markdown
# Virtual Screening Results: EGFR Kinase

## Summary
| Metric | Value |
|--------|-------|
| Compounds screened | 10,000 |
| Successful dockings | 9,847 |
| Top hits (≤ -8 kcal/mol) | 47 |
| Processing time | 2.5 hours |

## Top 10 Compounds

| Rank | Compound ID | Affinity (kcal/mol) | LE | LLE | Interactions |
|------|-------------|---------------------|-----|-----|--------------|
| 1 | CHEMBL210 | -10.2 | 0.42 | 6.8 | H-bond: Met793, hinge |
| 2 | CHEMBL456 | -9.8 | 0.38 | 6.5 | H-bond: Met793, Lys745 |
| 3 | ZINC12345 | -9.5 | 0.35 | 6.2 | π-π: Phe723 |

## Binding Mode Analysis (Top Hit)

### Compound: CHEMBL210
**Affinity**: -10.2 kcal/mol
**LE**: 0.42
**LLE**: 6.8

**Key Interactions**:
- H-bond with Met793 (hinge region)
- H-bond with Thr854
- π-π stacking with Phe723
- Hydrophobic pocket: Le718, Val726

## Pharmacophore Features

1. **Hinge binder**: N-heterocycle H-bond donor/acceptor
2. **Gatekeeper interaction**: Small hydrophobic group
3. **Solvent front**: Polar substituent
4. **Back pocket**: Extended hydrophobic moiety

## Recommendations

1. **Synthesis priority**: Top 5 compounds
2. **SAR exploration**: Around quinazoline core
3. **Experimental validation**: SPR, ITC binding assays
```

## Scoring Metrics

| Metric | Formula | Good Range |
|--------|---------|------------|
| Binding affinity | Docking score | ≤ -7 kcal/mol |
| Ligand Efficiency (LE) | Score / Heavy atoms | ≥ 0.3 |
| LLE (LipE) | Score - LogP | ≥ 6 |
| Size | Heavy atom count | 20-40 |

## Running Scripts

```bash
# Virtual screening with Vina
python scripts/virtual_screening.py \
  --target EGFR \
  --library data/compounds.sdf \
  --top 50 \
  --output results.json

# Docking with custom settings
python scripts/docking.py \
  --pdb 1m17.pdb \
  --center_x 10.5 \
  --center_y 20.3 \
  --center_z 15.8 \
  --size_x 20 \
  --size_y 20 \
  --size_z 20

# Rescoring with GNINA
python scripts/rescore.py \
  --poses docked_poses.sdf \
  --model gnina \
  --output rescored.json
```

## Requirements

```bash
# Core dependencies
pip install rdkit meeko

# Docking engines
# AutoDock Vina: http://vina.scripps.edu/
# SMINA: https://github.com/ccsb-scripps/AutoDock-Vina
# GNINA: https://github.com/gnina/gnina

# Optional
pip install prody pymol-open-source
```

## Reference

- See [reference/docking-guide.md](reference/docking-guide.md) for detailed docking protocols
- See [reference/scoring-functions.md](reference/scoring-functions.md) for scoring function details
- See [reference/structure-prep.md](reference/structure-prep.md) for protein preparation

## Best Practices

1. **Prepare structures carefully**: Clean PDB, remove duplicates
2. **Validate docking protocol**: Re-dock co-crystal ligand
3. **Consider multiple poses**: Top 3-5 poses per compound
4. **Use consensus scoring**: Combine multiple scoring functions
5. **Check binding modes**: Visual inspection of top poses
6. **Account for flexibility**: Consider induced fit if needed

## Common Pitfalls

| Pitfall | Solution |
|---------|----------|
| Incorrect binding site | Validate with known inhibitor |
| Poor ligand preparation | Generate multiple conformations |
| Single scoring function | Use consensus scoring |
| Ignoring protein flexibility | Use ensemble docking |
| Overinterpreting scores | Remember scoring is approximate |

## Limitations

- **Scoring accuracy**: ±2 kcal/mol typical error
- **Protein flexibility**: Limited in standard docking
- **Solvent effects**: Often implicit/explicit simplified
- **Binding kinetics**: Not predicted (affinity only)
- **Synthetic accessibility**: Not assessed