Gemini 3.1 TTS Launches with Inline Audio Tag Prompting System
Google has launched Gemini 3.1 TTS with a new inline audio tag system that uses square-bracket syntax. Developers can embed style, pace, and vocalization cues directly in the prompt text, for example [screams], [whispers], [slow], [fast], [short pause], [long pause], and [cackles]. Tags must not be placed directly next to each other, but multiple cue types can still be combined inline within a single prompt. Target use cases include language learning tools, interactive podcast applications, and adaptive customer service systems.
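Because the adjacency rule is easy to break when prompts are assembled programmatically, a client-side check can catch it before a request is sent. The following is a minimal sketch under stated assumptions: a tag is any text enclosed in square brackets, and two tags count as "adjacent" when only whitespace separates them. `find_tags` and `check_tag_spacing` are hypothetical helpers for illustration. They are not part of any Google SDK and do not call the Gemini API.

```python
import re

# Assumption: a tag is any run of non-bracket characters inside square brackets,
# e.g. [whispers] or [short pause].
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def find_tags(prompt: str) -> list[str]:
    """Return tag names in the order they appear in the prompt."""
    return [m.group(1) for m in TAG_RE.finditer(prompt)]


def check_tag_spacing(prompt: str) -> list[str]:
    """Hypothetical pre-flight check: flag tags separated only by whitespace.

    Returns a list of problem descriptions; an empty list means no two tags
    were found next to each other under this sketch's definition of adjacency.
    """
    matches = list(TAG_RE.finditer(prompt))
    problems = []
    # Compare each tag with the one that follows it.
    for prev, curr in zip(matches, matches[1:]):
        between = prompt[prev.end():curr.start()]
        if between.strip() == "":
            problems.append(f"adjacent tags [{prev.group(1)}] and [{curr.group(1)}]")
    return problems


# Usage: several cue types combined in one prompt, each separated by text.
ok = "[whispers] Can you hear me? [short pause] Good. [fast] Let's go!"
bad = "[slow][whispers] Quiet now."
print(check_tag_spacing(ok))   # []
print(check_tag_spacing(bad))  # ['adjacent tags [slow] and [whispers]']
```

Treating whitespace-only separation as adjacent is a conservative guess; if the service turns out to accept `[slow] [whispers]`, loosen the `between.strip()` test accordingly.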
Why It Matters
Inline audio tag prompting removes the need for a separate prosody configuration API or a post-processing audio pipeline, bringing expressive TTS control into the same prompt engineering workflow developers already use for text generation. For voice AI applications, this is a meaningful step toward treating speech synthesis as a first-class, prompt-controllable capability rather than a parameter-tuned pipeline.