# bailian-tts

> Generate human-like speech audio with BaiLian DashScope Qwen TTS (qwen3-tts-flash). Use when converting text to speech, producing voice lines for short drama/news videos, or documenting TTS request/response fields for DashScope.

- Author: cinience
- Repository: aivideo-labs/aivideo-skills
- Version: 20260131174452
- Stars: 0
- Forks: 0
- Last Updated: 2026-02-06
- Source: https://github.com/aivideo-labs/aivideo-skills
- Web: https://mule.run/skillshub/@@aivideo-labs/aivideo-skills~bailian-tts:20260131174452

---

---
name: bailian-tts
description: Generate human-like speech audio with BaiLian DashScope Qwen TTS (qwen3-tts-flash). Use when converting text to speech, producing voice lines for short drama/news videos, or documenting TTS request/response fields for DashScope.
---

Category: provider

# BaiLian Qwen TTS

## Critical model name

Use the recommended model:
- `qwen3-tts-flash`

## Normalized interface (tts.generate)

### Request
- `text` (string, required)
- `voice` (string, required)
- `language_type` (string, optional; default `Auto`)
- `stream` (bool, optional; default false)

### Response
- `audio_url` (string, when stream=false)
- `audio_base64_pcm` (string, when stream=true)
- `sample_rate` (int, 24000)
- `format` (string, wav or pcm depending on mode)

## Quick start (Python + DashScope SDK)

```python
import os
import dashscope

# Beijing region; for Singapore use: https://dashscope-intl.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope.aliyuncs.com/api/v1"

text = "Hello, this is a short voice line."
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-flash",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",
    stream=False,
)

audio_url = response.output.audio.url
print(audio_url)
```

## Streaming notes

- `stream=True` returns Base64-encoded PCM chunks at 24 kHz.
- Decode chunks and play or concatenate to a PCM buffer.
- The response contains `finish_reason == "stop"` when the stream ends.

## Operational guidance

- Keep text short per request (doc limit is 600 characters for qwen3-tts-flash).
- Use `language_type` consistent with the text to improve pronunciation.
- Cache by `(text, voice, language_type)` to avoid repeat costs.

## Output location

- Save generated audio under `output/audio/` by default.

## References

- `references/api_reference.md` for parameter mapping and streaming example.
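
## Example: local helpers for chunking, caching, and PCM output

The streaming notes and operational guidance in this document can be sketched as three small local helpers. This is a minimal illustration, not part of the DashScope SDK: the names `split_text`, `cache_path`, and `pcm_chunks_to_wav` are hypothetical, and the WAV helper assumes the streamed PCM is 16-bit mono, which this document does not state (only the 24000 Hz sample rate comes from the response fields above). None of the helpers call the API.

```python
import base64
import hashlib
import io
import wave
from pathlib import Path

MAX_CHARS = 600                    # per-request limit quoted in this document
SAMPLE_RATE = 24000                # sample rate listed in the response fields
OUTPUT_DIR = Path("output/audio")  # default output location in this document


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers to break at a space; falls back to a hard cut when a single
    run of characters is longer than the limit.
    """
    pieces = []
    remaining = text.strip()
    while len(remaining) > limit:
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space available: hard cut
        pieces.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        pieces.append(remaining)
    return pieces


def cache_path(text: str, voice: str, language_type: str = "Auto",
               ext: str = "wav") -> Path:
    """Return a deterministic file path keyed by (text, voice, language_type)."""
    key = "\x1f".join((text, voice, language_type))  # unit separator avoids collisions
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return OUTPUT_DIR / f"{digest}.{ext}"


def pcm_chunks_to_wav(chunks_b64: list[str],
                      sample_rate: int = SAMPLE_RATE) -> bytes:
    """Decode Base64 PCM chunks and wrap them in a WAV container.

    Assumption: samples are 16-bit mono; adjust if the service differs.
    """
    pcm = b"".join(base64.b64decode(chunk) for chunk in chunks_b64)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # assumed mono
        wav.setsampwidth(2)   # assumed 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

A caller might check `cache_path(text, voice, language_type).exists()` before issuing a request, and only call the API for each piece returned by `split_text` on a cache miss. Hashing the key keeps file names short and filesystem-safe regardless of the input text.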