# meeting-transcription

> Real-time meeting transcription and summarization with speaker diarization, action item extraction, and hierarchical summaries. Optimized for medical research meetings with MedCPT embeddings.

- Author: Thomas Landry
- Repository: thomas-landry/assistant_scientist
- Version: 20260122145858
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/thomas-landry/assistant_scientist
- Web: https://mule.run/skillshub/@@thomas-landry/assistant_scientist~meeting-transcription:20260122145858

---

---
name: meeting-transcription
description: Real-time meeting transcription and summarization with speaker diarization, action item extraction, and hierarchical summaries. Optimized for medical research meetings with MedCPT embeddings.
execution:
  type: python_function
  module: skills.meeting_transcription.main
  function: run
  async: true
  timeout: 3600
parameters:
  - name: audio_source
    type: string
    description: Path to audio file (MP3, WAV, M4A, MP4) or 'live' for real-time streaming
    required: true
  - name: output_formats
    type: array
    description: Output formats to generate
    required: false
    default: ["markdown"]
    items_type: string
    enum:
      - markdown
      - json
      - notion
  - name: extract_actions
    type: boolean
    description: Extract action items with assignees and deadlines
    required: false
    default: true
  - name: speaker_names
    type: object
    description: Mapping of speaker IDs to names (e.g., {"spk_0": "Thomas"})
    required: false
    default: {}
  - name: use_medical_embeddings
    type: boolean
    description: Use MedCPT embeddings for medical terminology
    required: false
    default: true
  - name: stream
    type: boolean
    description: Enable real-time WebSocket streaming
    required: false
    default: false
  - name: websocket_host
    type: string
    description: WebSocket server host for streaming
    required: false
    default: "localhost"
  - name: websocket_port
    type: integer
    description: WebSocket server port for streaming
    required: false
    default: 8765
output:
  type: object
  description: MeetingResult with transcript, summary, action items, and output files
---

# Meeting Transcription & Summarization

## Overview

This skill transcribes and summarizes meetings in real time, aiming for functionality comparable to Otter.ai. It combines adaptive Whisper model selection, speaker diarization, topic segmentation using MedCPT embeddings, and hierarchical summarization.

**Key Features:**
- Real-time transcription via WebSocket streaming (<500ms latency)
- Adaptive model selection based on available hardware
- Speaker diarization with word-level attribution
- Medical terminology support via MedCPT embeddings
- Action item extraction with assignees and deadlines
- Multi-format export (Markdown, JSON, Notion)

## Quick Start

### Transcribe a Meeting Recording

```python
import asyncio

from skills.meeting_transcription.main import run

# run() is async, so drive it with asyncio.run() when calling from a script
result = asyncio.run(
    run(
        audio_source="/recordings/team_meeting.mp3",
        output_formats=["markdown", "json"],
        extract_actions=True,
        speaker_names={"spk_0": "Thomas", "spk_1": "Youjin"},
        use_medical_embeddings=True,
    )
)
```

### Real-Time Streaming

```python
import asyncio

from skills.meeting_transcription.main import run

# Start WebSocket server for real-time transcription
asyncio.run(
    run(
        audio_source="live",
        stream=True,
        websocket_host="localhost",
        websocket_port=8765,
    )
)
```
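
A streaming client has to cut its audio into short chunks before sending them. The sketch below shows one way to do that with the `chunk_duration_ms: 200` value from the configuration. The audio format (16 kHz, mono, 16-bit PCM) and the helper name `iter_chunks` are assumptions for illustration; this README does not specify the wire format the server expects.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono input
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM samples


def iter_chunks(pcm: bytes, chunk_duration_ms: int = 200) -> Iterator[bytes]:
    """Yield consecutive fixed-duration chunks of raw PCM audio (hypothetical helper)."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_duration_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes
        yield pcm[start:start + chunk_bytes]


one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
chunks = list(iter_chunks(one_second))
print(len(chunks), len(chunks[0]))
```

Under these assumptions each 200 ms chunk is 6400 bytes, so one second of audio yields five chunks.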

### CLI Usage

```bash
# Batch transcription
landry-assistant transcribe --source /recordings/meeting.mp3 --output markdown --extract-actions

# Real-time streaming
landry-assistant transcribe --stream --host localhost --port 8765
```

## Installation Requirements

```bash
pip install faster-whisper whisperx pydub websockets sentence-transformers pydantic pyyaml
```

## Configuration

Edit `skills/meeting-transcription/config.yaml`:

```yaml
transcription:
  auto_model_select: true
  default_model: large-v3
  fallback_chain:
    - large-v3
    - medium
    - small
    - tiny

audio:
  chunk_duration_ms: 200
  vad_threshold: 0.5

diarization:
  enabled: true
  min_speakers: 2
  max_speakers: 10

segmentation:
  use_medical_embeddings: true
  min_segment_minutes: 3
  max_segment_minutes: 30

summarization:
  llm_provider: claude
  executive_length: 3
```
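
To illustrate how `auto_model_select` and `fallback_chain` might interact, here is a minimal sketch that walks the chain and picks the first model that fits in available memory. The memory figures and the `select_model` helper are hypothetical and not taken from this skill's code; the actual selection logic may use different criteria (for example GPU presence or compute type).

```python
from typing import Optional

# Hypothetical memory needs in GB, for illustration only; measure on your own hardware.
APPROX_MEMORY_GB = {"large-v3": 10.0, "medium": 5.0, "small": 2.0, "tiny": 1.0}


def select_model(fallback_chain: list[str], available_gb: float) -> Optional[str]:
    """Return the first model in the chain whose assumed footprint fits."""
    for name in fallback_chain:
        # Unknown model names are treated as never fitting
        if APPROX_MEMORY_GB.get(name, float("inf")) <= available_gb:
            return name
    return None


chain = ["large-v3", "medium", "small", "tiny"]
print(select_model(chain, available_gb=6.0))  # medium
```

Ordering the chain from largest to smallest means the most accurate model that fits is chosen first, and `None` signals that no listed model fits.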

## Output Structure

```python
{
    "session": {
        "id": "ms_2026_01_21_abc123",
        "title": "Q1 Planning Session",
        "duration_seconds": 2820.0,
        "num_speakers": 4,
        "status": "completed"
    },
    "transcript": [...],  # TranscriptSegment[]
    "topics": [...],      # TopicSegment[]
    "summary": {...},     # MeetingSummary
    "action_items": [...],# ActionItem[]
    "output_files": ["/path/to/summary.md", "/path/to/result.json"]
}
```
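
As a small usage sketch, the snippet below reads the `session` block of a result shaped like the structure above and builds a one-line summary. It assumes the result is available as a plain `dict`; if `run` returns a Pydantic model instead, it would first need converting (for example with `model_dump()`). The `describe_session` helper is hypothetical.

```python
def describe_session(result: dict) -> str:
    """Build a one-line summary from the `session` block of a result dict."""
    session = result["session"]
    minutes, seconds = divmod(int(session["duration_seconds"]), 60)
    return (
        f'{session["title"]} ({session["id"]}): '
        f'{minutes}m {seconds:02d}s, {session["num_speakers"]} speakers, '
        f'{session["status"]}'
    )


example = {
    "session": {
        "id": "ms_2026_01_21_abc123",
        "title": "Q1 Planning Session",
        "duration_seconds": 2820.0,
        "num_speakers": 4,
        "status": "completed",
    },
    "output_files": ["/path/to/summary.md", "/path/to/result.json"],
}
print(describe_session(example))
```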

## Architecture

```
meeting-transcription/
├── SKILL.md              # This file
├── main.py               # Entry point
├── config.yaml           # Configuration
├── src/
│   ├── audio/            # Audio loading & streaming
│   ├── transcription/    # Whisper + adaptive selection
│   ├── diarization/      # Speaker attribution
│   ├── processing/       # Segmentation, summarization, extraction
│   ├── outputs/          # Multi-format export
│   └── models/           # Pydantic entities
└── tests/                # Unit & integration tests
```

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Real-time latency | <500ms | WebSocket streaming |
| Batch transcription | 2x audio duration | Varies by model |
| Summary generation | <30s | 1-hour meeting |
| Memory usage | <4GB | CPU-only mode |
| Word accuracy (1 − WER) | >90% | Clear audio |
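
The accuracy target in the last row is most naturally read as word accuracy, that is 1 − WER, where WER (word error rate) is the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below shows that standard calculation; the function name and the example sentences are illustrative and not part of this skill.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate(
    "the patient was given ten milligrams of atorvastatin daily",
    "the patient was given ten milligrams of a statin daily",
)
print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.0%}")
```

Here one substitution and one insertion against nine reference words give a WER of 2/9, which would miss a >90% word accuracy target.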

## Privacy

All processing runs locally by default. Cloud APIs (Deepgram, OpenAI) are available as an optional fallback for improved accuracy.