# optimizing-queries

> Analyzes and optimizes SQL/NoSQL queries for performance. Use when reviewing query performance, optimizing slow queries, analyzing EXPLAIN output, suggesting indexes, identifying N+1 problems, recommending query rewrites, or improving database access patterns. Supports PostgreSQL, MySQL, SQLite, MongoDB, Redis, DynamoDB, and Elasticsearch.

- Author: galihcitta
- Repository: galihcitta/dotclaudeskills
- Version: 20260112141943
- Stars: 4
- Forks: 0
- Last Updated: 2026-02-07
- Source: https://github.com/galihcitta/dotclaudeskills
- Web: https://mule.run/skillshub/@@galihcitta/dotclaudeskills~optimizing-queries:20260112141943

---

---
name: optimizing-queries
description: Analyzes and optimizes SQL/NoSQL queries for performance. Use when reviewing query performance, optimizing slow queries, analyzing EXPLAIN output, suggesting indexes, identifying N+1 problems, recommending query rewrites, or improving database access patterns. Supports PostgreSQL, MySQL, SQLite, MongoDB, Redis, DynamoDB, and Elasticsearch.
---

# Query Optimization

Analyze database queries and provide actionable optimization recommendations.

## Workflow

Copy and track progress:
```
Query Optimization:
- [ ] Identify database type
- [ ] Analyze query structure
- [ ] Check for red flags
- [ ] Provide recommendations
- [ ] Suggest verification
```

1. **Identify database type** from syntax or user context
2. **Analyze query structure** - parse SELECT, FROM, WHERE, JOIN, ORDER BY, GROUP BY clauses
3. **Check for red flags** (see Quick Reference)
4. **Provide concrete recommendations** with copy-pasteable code
5. **Suggest verification** - EXPLAIN command or metric to confirm

**For complex queries (3+ JOINs or nested subqueries):**
- Decompose into logical units
- Analyze each unit's access pattern
- Synthesize recommendations in priority order

## Analysis Checklist

For each query, evaluate:
- **Index usage**: Missing or inefficient indexes
- **Scan type**: Full scans on large tables
- **Join efficiency**: Join order, type, missing conditions
- **Filtering**: WHERE placement, function usage on columns
- **Projection**: SELECT * vs specific columns
- **Pagination**: OFFSET vs keyset/cursor
- **N+1 detection**: Loop patterns in code context

## Quick Reference

### SQL Red Flags
- `SELECT *` on large tables
- `WHERE function(column)` prevents index use
- `LIKE '%pattern'` (leading wildcard)
- `OR` conditions spanning different columns
- Large `OFFSET` values
- `NOT IN` with nullable subquery
- Missing `LIMIT` on unbounded queries

### NoSQL Red Flags
- MongoDB: COLLSCAN, `$where`, unbounded arrays
- Redis: `KEYS *`, missing TTL, wrong data structure
- DynamoDB: Scan operations, hot partitions
- Elasticsearch: Deep pagination, wildcard prefix

## Examples

**Example 1: Missing Index**

Input:
```sql
SELECT * FROM orders WHERE customer_email = 'user@example.com';
```

Output:
```
## Issue [HIGH]
Full table scan on `orders` - no index on `customer_email`

## Impact
Query time grows linearly with table size; causes lock contention

## Solution
CREATE INDEX idx_orders_customer_email ON orders(customer_email);

## Verification
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_email = 'user@example.com';
-- Should show "Index Scan" instead of "Seq Scan"
```

**Example 2: N+1 Query Pattern**

Input:
```python
users = db.query("SELECT * FROM users")
for user in users:
    orders = db.query(f"SELECT * FROM orders WHERE user_id = {user.id}")
```

Output:
```
## Issue [HIGH]
N+1 query pattern - executes 1 + N queries instead of 1

## Impact
100 users = 101 queries; network latency multiplied

## Solution
SELECT u.*, o.* FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

## Verification
Check query count before/after; should reduce from N+1 to 1
```

**Example 3: When NOT to Optimize**

Input:
```sql
SELECT * FROM config WHERE key = 'app_version';
```

Output:
```
## No Issues Detected

This query on a small lookup table (typically <100 rows) is acceptable.
Adding an index would add write overhead without meaningful read benefit.

## When to Reconsider
- If table grows beyond 10,000 rows
- If query is called >100 times/second
```

**Example 4: Incomplete Information**

Input:
```sql
SELECT * FROM large_table WHERE status = ?;
```

Output:
```
## Issue [MEDIUM]
Preliminary analysis without schema information

## Potential Issues
1. `SELECT *` - may fetch unnecessary columns
2. `status` column - if low cardinality (e.g., 5 values), index may not help

## Information Needed
- Table row count
- Cardinality of `status` column
- Existing indexes: `SHOW INDEX FROM large_table;`
```

**Example 5: MongoDB COLLSCAN**

Input:
```javascript
db.orders.find({ customer_email: "user@example.com" })
```

Output:
```
## Issue [HIGH]
COLLSCAN on `orders` - no index on `customer_email`

## Impact
Query scans entire collection; O(n) performance

## Solution
db.orders.createIndex({ customer_email: 1 })

## Verification
db.orders.find({ customer_email: "..." }).explain("executionStats")
// Should show "stage": "IXSCAN" instead of "COLLSCAN"
```

## Response Format

Structure recommendations as:

```
## Issue [HIGH/MEDIUM/LOW]
[Specific problem identified]

## Impact
[Performance/resource cost]

## Solution
[Concrete fix with code]

## Verification
[EXPLAIN command or metric to confirm]
```

## Confidence Levels

When uncertainty exists, indicate confidence:
- **HIGH**: Pattern clearly matches known anti-pattern
- **MEDIUM**: Likely issue, depends on data distribution/schema
- **LOW**: Potential issue, needs EXPLAIN to confirm

## Response Constraints

1. **Max 5 recommendations per query** - prioritize by impact
2. **Order by severity**: HIGH → MEDIUM → LOW
3. **One primary fix per issue** - mention alternatives briefly
4. **Code must be copy-pasteable** - no placeholders like `<table_name>`
5. **Verification commands must be complete** - include all flags

## Prioritization Matrix

| Issue Type | Impact | Priority |
|------------|--------|----------|
| Missing JOIN index | Query blocks | P0 |
| Full table scan (>100k rows) | Slow response | P0 |
| N+1 pattern | Latency × N | P1 |
| SELECT * | Memory/bandwidth | P1 |
| Missing LIMIT | Resource exhaustion | P1 |
| Large OFFSET | Slow pagination | P2 |
| Suboptimal ORDER BY | Sort overhead | P2 |

## Reference Files

Consult these for detailed patterns (use grep to find specific sections):

- `references/sql-patterns.md` - SQL optimization
  - Grep: `## Index`, `## Query Rewrite`, `## Join`, `## Aggregation`
- `references/nosql-patterns.md` - MongoDB, Redis, DynamoDB, Elasticsearch
  - Grep: `## MongoDB`, `## Redis`, `## DynamoDB`, `## Elasticsearch`
- `references/explain-analysis.md` - Execution plans
  - Grep: `## PostgreSQL`, `## MySQL`, `## MongoDB`
- `references/common-scenarios.md` - Common patterns
  - Grep: `## Slow pagination`, `## N+1`, `## Dashboard`, `## Bulk`
- `references/orm-patterns.md` - ORM fixes
  - Grep: `### Django`, `### SQLAlchemy`, `### ActiveRecord`, `### Prisma`

## Utility Scripts

All scripts use Python standard library only (no pip install required).

Run these for automated analysis:

### Query Analysis
**analyze_query.py** - Detect anti-patterns and score complexity:
```bash
python scripts/analyze_query.py "SELECT * FROM orders WHERE ..."
python scripts/analyze_query.py --file query.sql --json
```

**suggest_index.py** - Generate index recommendations:
```bash
python scripts/suggest_index.py "SELECT ... FROM orders WHERE status = 'pending'"
```

### EXPLAIN & Logs
**parse_explain.py** - Analyze EXPLAIN output:
```bash
python scripts/parse_explain.py --pg explain_output.txt
python scripts/parse_explain.py --mysql explain_output.txt
```

**diff_explain.py** - Compare before/after EXPLAIN outputs:
```bash
python scripts/diff_explain.py --pg before.txt after.txt
python scripts/diff_explain.py --mysql before.txt after.txt --json
```

**parse_slow_log.py** - Analyze slow query logs:
```bash
python scripts/parse_slow_log.py --pg /var/log/postgresql.log --top 20
python scripts/parse_slow_log.py --mysql slow-query.log --json
```

### Schema & Code
**analyze_schema.py** - Find schema optimization opportunities:
```bash
python scripts/analyze_schema.py schema.sql
```

**detect_orm_issues.py** - Find ORM anti-patterns in code:
```bash
python scripts/detect_orm_issues.py app.py
python scripts/detect_orm_issues.py --dir ./models
```

**check_migration.py** - Check migration safety:
```bash
python scripts/check_migration.py --pg migration.sql
python scripts/check_migration.py --mysql migration.sql
```

### Workflow with Scripts

**For query optimization:**
1. `analyze_query.py` → detect issues
2. `suggest_index.py` → generate indexes
3. User runs EXPLAIN (before)
4. User applies optimization
5. User runs EXPLAIN (after)
6. `diff_explain.py` → verify improvement

**For verifying optimization impact:**
```bash
# Save before state
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > before.txt

# Apply optimization (add index, rewrite query, etc.)

# Save after state  
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." > after.txt

# Compare
python scripts/diff_explain.py --pg before.txt after.txt
```

**For codebase audit:**
1. `detect_orm_issues.py --dir ./` → find N+1 patterns
2. `analyze_schema.py` → check schema
3. Prioritize fixes by severity

**For migration review:**
1. `check_migration.py` → safety check
2. Fix critical issues before deployment
3. Use safe alternatives (CONCURRENTLY, etc.)

## EXPLAIN Commands

Provide appropriate commands:

```sql
-- PostgreSQL
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- MySQL
EXPLAIN ANALYZE SELECT ...;

-- SQLite
EXPLAIN QUERY PLAN SELECT ...;
```

```javascript
// MongoDB
db.collection.find({...}).explain("executionStats")
```

## Index Recommendation Format

When suggesting indexes:

```sql
-- PostgreSQL/MySQL
CREATE INDEX [CONCURRENTLY] idx_table_columns 
ON table(col1, col2, col3);
-- Rationale: [why this column order, covering considerations]
```

```javascript
// MongoDB (ESR: Equality, Sort, Range)
db.collection.createIndex({ equality_field: 1, sort_field: -1, range_field: 1 })
```

## Handling Missing Information

When user provides incomplete context:

**No schema provided:**
> "To give precise recommendations, I need the table schema. Could you share:
> 1. Column data types
> 2. Existing indexes
> 3. Approximate row counts
>
> Meanwhile, here's what I can infer from the query..."

**No EXPLAIN output:**
> "Run this command and share the output:
> ```sql
> EXPLAIN (ANALYZE, BUFFERS) <their query>;
> ```
> This will show actual vs estimated rows and scan types."

**Unknown database type:**
> "Which database are you using? (PostgreSQL, MySQL, SQLite, MongoDB, etc.)
> The optimization strategies differ significantly between them."

## Conditional Workflows

**Analyzing a query:**
1. Run `analyze_query.py` first
2. Check for anti-patterns
3. Suggest fixes with examples

**Interpreting EXPLAIN output:**
1. Run `parse_explain.py` with appropriate flag
2. Identify warnings
3. Correlate with query structure

**Full optimization request:**
1. Analyze query → identify issues
2. Suggest indexes → generate CREATE statements
3. Request EXPLAIN → interpret results
4. Provide before/after comparison

**Complex optimization (3+ issues or architectural decisions):**
Use extended reasoning to:
1. Map full query execution path
2. Identify all optimization opportunities
3. Evaluate trade-offs between approaches
4. Prioritize by impact and implementation cost

## Script Error Handling

**If scripts fail:**
1. Check Python version (requires 3.7+)
2. Verify input format (SQL string, valid file path)
3. Fall back to manual analysis using reference files

**Common errors:**
- `FileNotFoundError` → Check file path, use absolute paths
- `json.JSONDecodeError` → Input not valid JSON, check --pg/--mysql flags
- Empty output → Query may be valid, no issues detected

**When scripts are unavailable:**
Perform manual analysis using Quick Reference red flags and reference files.

## Unsupported Databases

For databases not explicitly covered (Oracle, SQL Server, CockroachDB, etc.):

1. Apply general SQL patterns from `references/sql-patterns.md`
2. Note: "Optimization patterns based on PostgreSQL/MySQL; verify syntax for [database]"
3. Recommend user consult database-specific documentation for EXPLAIN syntax and index creation