View Cartesia Narrator Example on GitHub
Check out the complete Cartesia Narrator example in our GitHub repository
Vision Agents requires a Stream account for real-time transport. Most providers offer free tiers to get started.
What You Will Build
- Listen to story topics via voice input using Deepgram
- Generate creative narratives using OpenAI’s GPT-4o-mini
- Speak with a highly expressive, customizable voice using Cartesia’s Sonic 3 TTS
- Use audio markup tags for enhanced speech control (emotions, pauses, emphasis)
- Run on Stream’s low-latency edge network
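The list above describes a three-stage turn: speech-to-text, story generation, then text-to-speech. The following is a minimal, library-independent sketch of that flow, not the Vision Agents API. The `NarratorPipeline` class, the `fake_*` stand-ins, and the `[pause]` marker are all hypothetical names chosen for illustration. In particular, `[pause]` is a placeholder and not Cartesia's actual audio markup syntax. In the real example, the three stages would be backed by Deepgram, OpenAI, and Cartesia respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NarratorPipeline:
    """Hypothetical container wiring the three stages of one narration turn."""

    transcribe: Callable[[bytes], str]   # speech-to-text stage (Deepgram in the example)
    generate: Callable[[str], str]       # story-generation stage (an OpenAI model in the example)
    synthesize: Callable[[str], bytes]   # text-to-speech stage (Cartesia in the example)

    def run_turn(self, audio_in: bytes) -> bytes:
        # 1. Turn the user's spoken request into a text topic.
        topic = self.transcribe(audio_in)
        # 2. Ask the language model for a short narrative on that topic.
        story = self.generate(topic)
        # 3. Convert the narrative (including any markup) into audio.
        return self.synthesize(story)


# Stand-in stages so the sketch runs without network access or API keys.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(topic: str) -> str:
    # "[pause]" is a placeholder marker, NOT real Cartesia markup.
    return f"Once upon a time, there was {topic}. [pause] The end."


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = NarratorPipeline(fake_transcribe, fake_generate, fake_synthesize)
audio_out = pipeline.run_turn(b"a dragon")
```

Keeping each stage behind a plain callable makes it possible to swap a stub for a real provider client one stage at a time while testing the others offline.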
Next Steps
Cartesia Integration
Explore Cartesia’s TTS configuration and audio markup
Live Video Try-On
Real-time virtual try-on with Decart’s Lucy-2 model

