# meeting-transcription

> Real-time meeting transcription and summarization with speaker diarization, action item extraction, and hierarchical summaries. Optimized for medical research meetings with MedCPT embeddings.

- Author: Thomas Landry
- Repository: thomas-landry/assistant_scientist
- Version: 20260122145858
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/thomas-landry/assistant_scientist
- Web: https://mule.run/skillshub/@@thomas-landry/assistant_scientist~meeting-transcription:20260122145858

---

---
name: meeting-transcription
description: Real-time meeting transcription and summarization with speaker diarization, action item extraction, and hierarchical summaries. Optimized for medical research meetings with MedCPT embeddings.
execution:
  type: python_function
  module: skills.meeting_transcription.main
  function: run
  async: true
  timeout: 3600
parameters:
  - name: audio_source
    type: string
    description: Path to audio file (MP3, WAV, M4A, MP4) or 'live' for real-time streaming
    required: true
  - name: output_formats
    type: array
    description: Output formats to generate
    required: false
    default: ["markdown"]
    items_type: string
    enum:
      - markdown
      - json
      - notion
  - name: extract_actions
    type: boolean
    description: Extract action items with assignees and deadlines
    required: false
    default: true
  - name: speaker_names
    type: object
    description: 'Mapping of speaker IDs to names (e.g., {"spk_0": "Thomas"})'
    required: false
    default: {}
  - name: use_medical_embeddings
    type: boolean
    description: Use MedCPT embeddings for medical terminology
    required: false
    default: true
  - name: stream
    type: boolean
    description: Enable real-time WebSocket streaming
    required: false
    default: false
  - name: websocket_host
    type: string
    description: WebSocket server host for streaming
    required: false
    default: "localhost"
  - name: websocket_port
    type: integer
    description: WebSocket server port for streaming
    required: false
    default: 8765
output:
  type: object
  description: MeetingResult with transcript, summary, action items, and output files
---

# Meeting Transcription & Summarization

## Overview

A real-time meeting transcription and summarization skill with functionality comparable to Otter.ai. It combines adaptive Whisper model selection, speaker diarization, topic segmentation using MedCPT embeddings, and hierarchical summarization.

**Key Features:**

- Real-time transcription via WebSocket streaming (target latency <500 ms)
- Adaptive Whisper model selection based on available hardware
- Speaker diarization with word-level attribution
- Medical terminology support via MedCPT embeddings
- Action item extraction with assignees and deadlines
- Multi-format export (Markdown, JSON, Notion)

## Quick Start

### Transcribe a Meeting Recording

```python
import asyncio

from skills.meeting_transcription.main import run

# run() is a coroutine (the skill is declared with async: true),
# so drive it with asyncio.run() when calling from a regular script.
result = asyncio.run(
    run(
        audio_source="/recordings/team_meeting.mp3",
        output_formats=["markdown", "json"],
        extract_actions=True,
        speaker_names={"spk_0": "Thomas", "spk_1": "Youjin"},
        use_medical_embeddings=True,
    )
)
```

### Real-Time Streaming

```python
import asyncio

from skills.meeting_transcription.main import run

# Start a WebSocket server for real-time transcription
asyncio.run(
    run(
        audio_source="live",
        stream=True,
        websocket_host="localhost",
        websocket_port=8765,
    )
)
```

### CLI Usage

```bash
# Batch transcription
landry-assistant transcribe --source /recordings/meeting.mp3 --output markdown --extract-actions

# Real-time streaming
landry-assistant transcribe --stream --host localhost --port 8765
```

## Installation Requirements

```bash
pip install faster-whisper whisperx pydub websockets sentence-transformers pydantic pyyaml
```

## Configuration

Edit `skills/meeting-transcription/config.yaml`:

```yaml
transcription:
  auto_model_select: true
  default_model: large-v3
  fallback_chain:
    - large-v3
    - medium
    - small
    - tiny

audio:
  chunk_duration_ms: 200
  vad_threshold: 0.5

diarization:
  enabled: true
  min_speakers: 2
  max_speakers: 10

segmentation:
  use_medical_embeddings: true
  min_segment_minutes: 3
  max_segment_minutes: 30

summarization:
  llm_provider: claude
  executive_length: 3
```

## Output Structure

```python
{
    "session": {
        "id": "ms_2026_01_21_abc123",
        "title": "Q1 Planning Session",
        "duration_seconds": 2820.0,
        "num_speakers": 4,
        "status": "completed",
    },
    "transcript": [...],    # list[TranscriptSegment]
    "topics": [...],        # list[TopicSegment]
    "summary": {...},       # MeetingSummary
    "action_items": [...],  # list[ActionItem]
    "output_files": ["/path/to/summary.md", "/path/to/result.json"],
}
```

## Architecture

```
meeting-transcription/
├── SKILL.md              # This file
├── main.py               # Entry point
├── config.yaml           # Configuration
├── src/
│   ├── audio/            # Audio loading & streaming
│   ├── transcription/    # Whisper + adaptive selection
│   ├── diarization/      # Speaker attribution
│   ├── processing/       # Segmentation, summarization, extraction
│   ├── outputs/          # Multi-format export
│   └── models/           # Pydantic entities
└── tests/                # Unit & integration tests
```

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Real-time latency | <500 ms | WebSocket streaming |
| Batch transcription | 2x audio duration | Varies by model |
| Summary generation | <30 s | 1-hour meeting |
| Memory usage | <4 GB | CPU-only mode |
| Word accuracy (1 − WER) | >90% (WER <10%) | Clear audio |

## Privacy

All processing runs locally by default. Cloud APIs (Deepgram, OpenAI) are available as an optional fallback for improved accuracy.
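## Sketch: Walking the Model Fallback Chain

The `transcription.fallback_chain` setting lists Whisper models from most to least accurate. One plausible reading of adaptive selection is "take the first model in the chain that fits the available memory". The sketch below assumes exactly that; `select_model` and its memory-requirement mapping are hypothetical and not the skill's actual API, and detecting available memory is left to the caller.

```python
from typing import Mapping, Sequence


def select_model(
    fallback_chain: Sequence[str],
    available_memory_gb: float,
    required_memory_gb: Mapping[str, float],
) -> str:
    """Return the first model in the chain whose memory need fits.

    Hypothetical helper: models missing from `required_memory_gb`
    are treated as not fitting.
    """
    for model in fallback_chain:
        if required_memory_gb.get(model, float("inf")) <= available_memory_gb:
            return model
    raise RuntimeError(
        f"no model in {list(fallback_chain)} fits in {available_memory_gb} GB"
    )


if __name__ == "__main__":
    chain = ["large-v3", "medium", "small", "tiny"]
    # Placeholder numbers for illustration only, not measured requirements.
    need = {"large-v3": 10.0, "medium": 5.0, "small": 2.0, "tiny": 1.0}
    print(select_model(chain, 4.0, need))
```

Raising instead of silently returning the smallest model makes an undersized machine fail loudly rather than degrade without notice.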
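## Sketch: Streaming Chunk Size

The `audio.chunk_duration_ms: 200` setting fixes how much audio each streaming chunk carries. Assuming mono 16-bit PCM at 16 kHz (the sample rate Whisper models expect), a 200 ms chunk is 3,200 samples, or 6,400 bytes. The helper names below are hypothetical and only illustrate that arithmetic.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumption: mono audio resampled to 16 kHz
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed PCM


def chunk_size_bytes(chunk_duration_ms: int) -> int:
    """Number of bytes in one chunk of the given duration."""
    samples = SAMPLE_RATE_HZ * chunk_duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def iter_chunks(pcm: bytes, chunk_duration_ms: int = 200) -> Iterator[bytes]:
    """Yield fixed-size chunks of raw PCM; the final chunk may be shorter."""
    size = chunk_size_bytes(chunk_duration_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
    chunks = list(iter_chunks(one_second, 200))
    print(chunk_size_bytes(200), len(chunks))  # 6400 5
```

Smaller chunks lower the time before the first partial transcript but give the model less context per call, which is the trade-off behind the <500 ms latency target.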
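## Sketch: Word-Level Speaker Attribution

"Speaker diarization with word-level attribution" means combining two timelines: word timestamps from transcription and speaker turns from diarization. A common approach, assumed here rather than taken from the skill's source, gives each word to the speaker whose turn overlaps it the most, then applies the `speaker_names` mapping. The `Word`, `Turn`, and `attribute_words` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    speaker: str  # diarization label, e.g. "spk_0"
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(
    words: List[Word],
    turns: List[Turn],
    speaker_names: Optional[Dict[str, str]] = None,
) -> List[Tuple[str, str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Unmapped speaker IDs are kept as-is; words that overlap no turn
    are labeled "unknown".
    """
    speaker_names = speaker_names or {}
    labeled = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best, best_overlap = t.speaker, o
        name = speaker_names.get(best, best) if best else "unknown"
        labeled.append((name, w.text))
    return labeled


if __name__ == "__main__":
    words = [Word("Let's", 0.0, 0.3), Word("start", 0.3, 0.7), Word("Agreed", 1.1, 1.5)]
    turns = [Turn("spk_0", 0.0, 1.0), Turn("spk_1", 1.0, 2.0)]
    print(attribute_words(words, turns, {"spk_0": "Thomas", "spk_1": "Youjin"}))
```

Maximum overlap, rather than midpoint lookup, keeps words that straddle a turn boundary with the speaker who said most of them.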
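## Sketch: Embedding-Based Topic Segmentation

The `segmentation` settings bound topic segments to between `min_segment_minutes` and `max_segment_minutes`. The sketch below shows one way such bounds could drive an embedding-based segmenter: it compares adjacent transcript windows by cosine similarity and starts a new topic where similarity drops. This TextTiling-style method, the fixed window length, and the 0.5 threshold are assumptions; the skill's actual algorithm may differ, and real windows would carry MedCPT or other sentence-encoder vectors rather than the toy 2-D vectors used here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def segment_topics(
    window_embeddings: List[Sequence[float]],
    window_minutes: float = 1.0,
    threshold: float = 0.5,
    min_segment_minutes: float = 3,
    max_segment_minutes: float = 30,
) -> List[Tuple[int, int]]:
    """Return (start, end) window-index ranges, end exclusive.

    A boundary is placed before window i when similarity between windows
    i-1 and i falls below `threshold` and the current segment is already
    at least `min_segment_minutes` long, or when the segment has reached
    `max_segment_minutes`. The final segment may be shorter than the minimum.
    """
    n = len(window_embeddings)
    if n == 0:
        return []
    segments, start = [], 0
    for i in range(1, n):
        length = (i - start) * window_minutes
        sim = cosine(window_embeddings[i - 1], window_embeddings[i])
        if (sim < threshold and length >= min_segment_minutes) or length >= max_segment_minutes:
            segments.append((start, i))
            start = i
    segments.append((start, n))
    return segments


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]  # toy embeddings for two distinct topics
    print(segment_topics([a] * 4 + [b] * 4))
```

The minimum length suppresses boundaries caused by brief digressions, while the maximum forces a split in long single-topic stretches so each segment stays small enough to summarize.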
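## Sketch: Measuring Word Error Rate

The accuracy target in Performance Targets reads most naturally as word accuracy, 1 − WER, above 90%. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function below is a minimal, standard implementation for checking transcripts against a reference; it is not the skill's own evaluation code, and it tokenizes on whitespace without normalizing case or punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "we will review the cohort data before the next meeting"
    hyp = "we will review the court data before next meeting"
    wer = word_error_rate(ref, hyp)
    print(f"WER={wer:.2f}, word accuracy={1 - wer:.0%}")  # WER=0.20, word accuracy=80%
```

Here one substitution ("cohort" heard as "court") and one deletion ("the") over ten reference words give a WER of 0.20, so this transcript would miss the >90% accuracy target. Because insertions count as errors, WER can exceed 1.0.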