# gemini-live-api

> Build real-time voice and video applications with Google's Gemini Live API. Use when implementing bidirectional audio/video streaming, voice assistants, conversational AI with interruption handling, or any application requiring low-latency multimodal interaction with Gemini models. Covers WebSocket streaming, voice activity detection (VAD), function calling during conversations, session management/resumption, and ephemeral tokens for secure client-side connections.

- Author: gamepop
- Repository: gamepop/pg-skills
- Version: 20251223165249
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/gamepop/pg-skills
- Web: https://mule.run/skillshub/@@gamepop/pg-skills~gemini-live-api:20251223165249

---

---
name: gemini-live-api
description: Build real-time voice and video applications with Google's Gemini Live API. Use when implementing bidirectional audio/video streaming, voice assistants, conversational AI with interruption handling, or any application requiring low-latency multimodal interaction with Gemini models. Covers WebSocket streaming, voice activity detection (VAD), function calling during conversations, session management/resumption, and ephemeral tokens for secure client-side connections.
---

# Gemini Live API

Real-time bidirectional streaming API for voice/video conversations with Gemini.

## Quick Start

```python
import asyncio

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
config = types.LiveConnectConfig(response_modalities=["AUDIO"])


async def main():
    async with client.aio.live.connect(
        model="gemini-2.5-flash-preview-native-audio-dialog",
        config=config,
    ) as session:
        # Send audio (audio_bytes is a placeholder for raw 16-bit PCM, 16 kHz, mono)
        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )

        # Receive responses (play_audio is a placeholder for your playback function)
        async for response in session.receive():
            if response.data:
                play_audio(response.data)


asyncio.run(main())
```

## Core Patterns

### Audio Chat (Mic + Speaker)

Use `scripts/audio_chat.py` for a complete microphone-to-speaker implementation with PyAudio.

### Text Chat via Live API

Use `scripts/text_chat.py` for text-based streaming conversations.

### Function Calling

Use `scripts/function_calling.py` for tool integration:

```python
config = types.LiveConnectConfig(
    response_modalities=["TEXT"],
    tools=[{
        "function_declarations": [{
            "name": "get_weather",
            "description": "Get weather for location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
            },
        }]
    }],
)
# Handle tool_call in response, send result via session.send_tool_response()
```

### Ephemeral Tokens (Client-Side Auth)

Use `scripts/generate_token.py` for secure browser/mobile connections:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

token = client.auth_tokens.create(config={
    "uses": 1,
    "expire_time": now + timedelta(minutes=30),
    "new_session_expire_time": now + timedelta(minutes=1),
})
# Client uses token.name as API key
```

## Key Configuration

| Setting | Options |
|---------|---------|
| `response_modalities` | `["AUDIO"]` or `["TEXT"]` (not both) |
| Audio input | 16-bit PCM, 16 kHz, mono |
| Audio output | 24 kHz |
| Session limit | 15 min audio-only, 2 min with video |

### Voice Selection

```python
config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(
                voice_name="Puck"  # Aoede, Charon, Fenrir, Kore, Puck
            )
        )
    ),
)
```

### Interruption Handling (VAD)

Voice activity detection is automatic by default. Check `response.server_content.interrupted` to detect when the user has interrupted the model, and stop playing any audio that is still queued.

### Session Resumption

Save `response.session_resumption_update.handle` and pass it to a new session within 2 hours to resume the conversation.

## Resources

- **`scripts/audio_chat.py`** - Full mic/speaker streaming example
- **`scripts/text_chat.py`** - Text-based Live API chat
- **`scripts/function_calling.py`** - Tool/function calling pattern
- **`scripts/generate_token.py`** - Ephemeral token generation
- **`references/api-reference.md`** - Complete configuration options, models, audio specs
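
As a companion to the Session Resumption note above, here is a minimal sketch of client-side bookkeeping for the resumption handle. It uses only the standard library and makes no API calls. `ResumptionHandleStore` and `HANDLE_TTL` are hypothetical names introduced for illustration, not part of the SDK. The sketch assumes the 2-hour window can be measured conservatively from the moment the handle was last saved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Validity window taken from the Session Resumption note (assumption: counted
# from the time the handle was last saved, which errs on the safe side).
HANDLE_TTL = timedelta(hours=2)


@dataclass
class ResumptionHandleStore:
    """Hypothetical helper: keeps the newest handle and the time it was saved."""

    handle: Optional[str] = None
    saved_at: Optional[datetime] = None

    def save(self, handle: Optional[str], now: Optional[datetime] = None) -> None:
        # Ignore empty updates so a usable handle is never overwritten.
        if not handle:
            return
        self.handle = handle
        self.saved_at = now or datetime.now(timezone.utc)

    def get(self, now: Optional[datetime] = None) -> Optional[str]:
        # Return the handle only while it is still inside the validity window.
        if self.handle is None or self.saved_at is None:
            return None
        now = now or datetime.now(timezone.utc)
        if now - self.saved_at >= HANDLE_TTL:
            return None
        return self.handle
```

In a receive loop, one way to use this is to call `store.save(response.session_resumption_update.handle)` whenever `response.session_resumption_update` is present. On reconnect, if `store.get()` returns `None`, start a fresh session instead of trying to resume.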