
tts-broadcast-injection-issues

by sani

Feb 6, 2026
Avoid broadcast speaker injection (injecting the speaker embedding into ALL codec positions) in multi-speaker TTS training. Use when: (1) training codec-based TTS models with multiple speakers, (2) the model generates up to max_new_tokens instead of stopping at EOS, (3) audio duration is excessively long (e.g., 163 seconds for short text). Fix: use single-position injection (position-6), which preserves EOS detection. More speaker conditioning is NOT always better.
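To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It is an illustration only, not the source's training code: the function names, the additive form of the injection, and the toy embedding shapes are all assumptions. The only details taken from the description are the position index 6 and the symptom of generation running to max_new_tokens without an EOS.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]


def inject_broadcast(embeds: Sequence, speaker: Vector) -> Sequence:
    """Pattern to avoid: add the speaker embedding to every position.

    Hypothetical helper; additive injection is an assumption.
    """
    return [[x + s for x, s in zip(row, speaker)] for row in embeds]


def inject_single(embeds: Sequence, speaker: Vector, position: int = 6) -> Sequence:
    """Suggested fix: add the speaker embedding at one position only.

    position=6 follows the description; everything else is hypothetical.
    """
    if not 0 <= position < len(embeds):
        raise IndexError(f"position {position} is outside the sequence")
    out = [row[:] for row in embeds]  # copy so the input is not mutated
    out[position] = [x + s for x, s in zip(out[position], speaker)]
    return out


def changed_positions(before: Sequence, after: Sequence) -> List[int]:
    """Indices whose embedding differs between the two sequences."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


def ran_to_limit_without_eos(token_ids: List[int], eos_id: int, max_new_tokens: int) -> bool:
    """Symptom check: output filled max_new_tokens and never emitted EOS."""
    return len(token_ids) >= max_new_tokens and eos_id not in token_ids


# Toy data: 10 positions, 2-dimensional embeddings (sizes are arbitrary).
seq = [[0.0, 0.0] for _ in range(10)]
spk = [1.0, -1.0]

broadcast = inject_broadcast(seq, spk)
single = inject_single(seq, spk, position=6)

print(changed_positions(seq, broadcast))  # every position is perturbed
print(changed_positions(seq, single))     # only the chosen position is perturbed
```

With broadcast injection every codec position carries the speaker offset, including the positions from which the model has to predict EOS. With single-position injection the rest of the sequence is left untouched. The `ran_to_limit_without_eos` helper is one way to flag the runaway-generation symptom during evaluation.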