Async streaming STT and TTS

Reliable media streaming with minimum latency

What is async streaming STT and TTS?

💡 Most AI agents on popular platforms today share a common pitfall: their TTS (text-to-speech) latency depends on the length of the speech, and therefore on the length of the text. The longer the sentence the agent has to speak, the longer it takes to produce the audio, and the more latency the caller observes. The same applies to STT (speech-to-text).

VoIP Number uses async streaming STT and TTS to make latency independent of the length of the input speech (for STT) and the input text (for TTS). By switching from regular TTS to the async version, we reduce TTS latency to the first audio byte from around 500 ms to 170 ms. A similar improvement is achieved for STT.
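To illustrate why streaming makes time to first audio independent of text length, here is a minimal Python sketch. It is not VoIP Number's implementation: `synthesize_word`, `batch_tts`, `streaming_tts`, `time_to_first_audio`, and the `SECONDS_PER_WORD` cost are all hypothetical names and values made up for the example, and the "audio" is just placeholder bytes.

```python
import asyncio
import time
from typing import AsyncIterator

SECONDS_PER_WORD = 0.005  # hypothetical synthesis cost, for illustration only


async def synthesize_word(word: str) -> bytes:
    """Stand-in for a real TTS engine: each word takes a fixed time."""
    await asyncio.sleep(SECONDS_PER_WORD)
    return word.encode("utf-8")  # placeholder for real audio bytes


async def batch_tts(text: str) -> bytes:
    """Regular TTS: audio is returned only after the whole text is done."""
    chunks = [await synthesize_word(w) for w in text.split()]
    return b" ".join(chunks)


async def streaming_tts(text: str) -> AsyncIterator[bytes]:
    """Async streaming TTS: each chunk is yielded as soon as it is ready."""
    for word in text.split():
        yield await synthesize_word(word)


async def time_to_first_audio(text: str) -> tuple[float, float]:
    """Return (batch, streaming) seconds until the first audio is available."""
    start = time.perf_counter()
    await batch_tts(text)  # first byte arrives only with the full result
    batch_latency = time.perf_counter() - start

    start = time.perf_counter()
    stream = streaming_tts(text)
    await stream.__anext__()  # wait for the first chunk only
    streaming_latency = time.perf_counter() - start
    await stream.aclose()
    return batch_latency, streaming_latency


if __name__ == "__main__":
    for n_words in (2, 40):
        text = " ".join(["word"] * n_words)
        batch, streaming = asyncio.run(time_to_first_audio(text))
        print(f"{n_words} words: batch {batch * 1000:.0f} ms, "
              f"streaming {streaming * 1000:.0f} ms")
```

In this toy model the batch latency grows with the number of words, while the streaming latency stays close to the cost of a single chunk. The actual figures depend on the engine and network and will differ from the simulated ones.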
