# filter-group-consistency

> Filter duplicate file groups to keep only those with consistent decisions (all KEEP or all ARCHIVE). Use when processing CSV duplicate file groups and you need to focus on groups with clear, consistent decisions.

- Author: Austin
- Repository: austin183/agent-ollama-projects
- Version: 20260209195648
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-10
- Source: https://github.com/austin183/agent-ollama-projects
- Web: https://mule.run/skillshub/@@austin183/agent-ollama-projects~filter-group-consistency:20260209195648

---

# Filter Group Consistency

## Overview

This skill reads a CSV of duplicate file groups and keeps only the groups in which every entry carries the same `Decision` value. Groups with mixed decisions are dropped, so downstream processing only sees groups with a clear, consistent outcome.

## Workflow

1. Read from `FilteredDuplicateAnalysis.csv` (or similar input)
2. Group entries by Group ID
3. For each group, check whether all decisions are identical, and keep the group's entries only if they are
4. Write output to `FocusedDuplicates.csv` (or specified output file)
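
The steps above can be sketched on in-memory rows, without touching the filesystem. This is a minimal illustration rather than the reference script: the function name `filter_consistent_groups` and the sample rows are hypothetical, and only the `Group ID` and `Decision` column names come from this skill.

```python
from collections import defaultdict


def filter_consistent_groups(rows):
    """Return rows whose group has exactly one distinct Decision value.

    Hypothetical helper for illustration; `rows` is a list of dicts
    shaped like csv.DictReader output.
    """
    # Step 2: group entries by Group ID (insertion order is preserved)
    groups = defaultdict(list)
    for row in rows:
        groups[row["Group ID"]].append(row)

    # Step 3: keep a group only when its set of decisions has one element
    kept = []
    for entries in groups.values():
        if len({entry["Decision"] for entry in entries}) == 1:
            kept.extend(entries)
    return kept


# Made-up sample data
sample_rows = [
    {"Group ID": "1", "Filename": "a.txt", "Decision": "KEEP"},
    {"Group ID": "1", "Filename": "b.txt", "Decision": "KEEP"},
    {"Group ID": "2", "Filename": "c.txt", "Decision": "KEEP"},
    {"Group ID": "2", "Filename": "d.txt", "Decision": "ARCHIVE"},
    {"Group ID": "3", "Filename": "e.txt", "Decision": "ARCHIVE"},
]

kept_rows = filter_consistent_groups(sample_rows)
print([row["Filename"] for row in kept_rows])  # ['a.txt', 'b.txt', 'e.txt']
```

Group 2 is dropped because it mixes KEEP and ARCHIVE; groups 1 and 3 pass through unchanged.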

## Key Learnings

### Simplify Conditional Logic

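Collect a group's decisions into a set and check its length, instead of writing a separate branch for each decision value:

```python
# Good: a set collapses repeated decisions, so one element means the group is consistent
decisions = {entry['Decision'] for entry in entries}
if len(decisions) == 1:
    kept_entries.extend(entries)
```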
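
For contrast, here is the kind of branching that the length check replaces. Both functions are hypothetical illustrations, with the KEEP and ARCHIVE labels taken from the skill description. They agree on the made-up sample groups, but the set-based version does not need to name each decision value.

```python
def consistent_verbose(decisions):
    """Branch-per-value check: must be extended for every new decision label."""
    all_keep = all(d == "KEEP" for d in decisions)
    all_archive = all(d == "ARCHIVE" for d in decisions)
    if all_keep or all_archive:
        return True
    return False


def consistent_simple(decisions):
    """Set-length check: one distinct value means the group is consistent."""
    return len(set(decisions)) == 1


# Made-up decision lists, one per group
samples = [
    ["KEEP", "KEEP"],
    ["ARCHIVE", "ARCHIVE", "ARCHIVE"],
    ["KEEP", "ARCHIVE"],
    ["ARCHIVE"],
]
for decisions in samples:
    assert consistent_verbose(decisions) == consistent_simple(decisions)
    print(decisions, consistent_simple(decisions))
```

The two versions differ for labels other than KEEP and ARCHIVE: a group whose entries all carry some other value (say, a hypothetical `REVIEW`) passes the set-length check. If only KEEP and ARCHIVE groups should survive, an extra membership test on the single remaining value would be needed.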

### Single Entry Auto-Qualification

Groups with a single entry should be automatically included since there's no inconsistency to check.
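
One way to see why single-entry groups qualify: a group with one row yields a decision set of size one, so the same length check admits it. The snippet below is a small sketch with made-up rows and a hypothetical `is_consistent` helper. Under this reading, the explicit single-entry shortcut in the example script is an optional early exit rather than a separate rule.

```python
# Made-up rows for illustration
single = [{"Group ID": "7", "Decision": "ARCHIVE"}]
mixed = [
    {"Group ID": "8", "Decision": "KEEP"},
    {"Group ID": "8", "Decision": "ARCHIVE"},
]


def is_consistent(entries):
    """True when every entry in the group carries the same Decision."""
    return len({entry["Decision"] for entry in entries}) == 1


print(is_consistent(single))  # True
print(is_consistent(mixed))   # False
```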

### Keep Fieldnames Consistent

Ensure the output fieldnames match the input CSV structure:

```python
fieldnames = ['Group ID', 'Folder', 'Filename', 'Size (KB)', 'Match %', 'Decision', 'Rationale']
```
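
Rather than hard-coding the column list, the header can also be read from the input file so that the output always matches it. The sketch below assumes that approach suits your pipeline. It uses `csv.DictReader.fieldnames` from the standard library, with in-memory `io.StringIO` buffers standing in for real files; the sample CSV content is made up.

```python
import csv
import io

# Made-up input standing in for FilteredDuplicateAnalysis.csv
source = io.StringIO(
    "Group ID,Folder,Filename,Size (KB),Match %,Decision,Rationale\n"
    "1,/photos,a.jpg,120,100,KEEP,original\n"
)

reader = csv.DictReader(source)
rows = list(reader)
fieldnames = reader.fieldnames  # header row as a list of column names

# Write with the same columns, in the same order, as the input
target = io.StringIO(newline="")
writer = csv.DictWriter(target, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

print(target.getvalue().splitlines()[0])
```

By default, `csv.DictWriter` raises `ValueError` when a row contains a key that is not in `fieldnames`, so a mismatch between a hard-coded list and the real header fails loudly instead of silently dropping columns.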

## Implementation

See [csv-patterns/SKILL.md](csv-patterns/SKILL.md) for CSV reading/writing patterns and the complete example implementation.

## Complete Example

```python
#!/usr/bin/env python3
"""
Focused Duplicates Script

Filters FilteredDuplicateAnalysis.csv for groups where ALL entries share the same
decision (all KEEP or all ARCHIVE), i.e. fully consistent groups.

Output: FocusedDuplicates.csv in workspace directory
"""

import csv
from pathlib import Path

def main():
    input_file = Path("workspace/FilteredDuplicateAnalysis.csv")
    output_file = Path("workspace/FocusedDuplicates.csv")

    # Read input and group entries by Group ID
    groups = {}

    with open(input_file, 'r', newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            group_id = row['Group ID']
            if group_id not in groups:
                groups[group_id] = []
            groups[group_id].append(row)

    # Filter groups where ALL entries have the same decision
    kept_entries = []

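    for group_id, entries in groups.items():
        # A single-entry group has nothing to disagree with, so keep it as-is
        if len(entries) == 1:
            kept_entries.extend(entries)
            continue

        # A set collapses repeated values; one element means every entry agrees
        decisions = {entry['Decision'] for entry in entries}
        if len(decisions) == 1:
            kept_entries.extend(entries)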

    # Write output
    fieldnames = ['Group ID', 'Folder', 'Filename', 'Size (KB)', 'Match %', 'Decision', 'Rationale']
    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept_entries)

    print(f"Processed {len(groups)} groups")
    print(f"Kept {len(kept_entries)} entries from fully consistent groups")

if __name__ == '__main__':
    main()
```

## Reference Implementation

See `filter-group-consistency-reference.py` for a complete working implementation based on the actual `scripts/focused_duplicates.py`.