“NextKast built a fully automated AI DJ for our radio station customers using PlayAI Dialog voices. We love how expressive, emotional, and natural the voices sound, and didn’t find anything else close in the market. In radio, keeping your audience engaged is the whole game, and Play’s voices do that.”
Winston Potgieter, Founder, Axis Entertainment
< 320ms latency
Optimized for multi-turn conversations
Dynamic prosody and emotion
On-prem deployable
Create engaging, emotive AI narrations, podcasts, and audiobooks, or power ultra-realistic voice agents. Dialog understands each turn in a conversation and generates speech with the right prosody, pacing, and emotion.
Create engaging contextual conversations between multiple characters
Unlike previous voice AI models, PlayDialog takes a conversation's entire context as input, so narrations and multi-party conversations sound fluid, engaging, and natural, with excellent prosody, pacing, and intonation.
Our industry-leading voice cloning capabilities mean that with PlayDialog, you get a faithful reproduction barely distinguishable from the original. Create narrations, podcasts, and dubbing accurately every time.
Dialog was preferred 3:1 in testing versus the industry's best-known model, winning on emotion, quality, and accuracy. Try it and experience the difference.
import axios from 'axios';
import dotenv from 'dotenv';
dotenv.config();
// Set up headers with your API secret key and user ID
const userId = process.env.PLAYDIALOG_USER_ID;
const secretKey = process.env.PLAYDIALOG_SECRET_KEY;
const headers = {
  'X-USER-ID': userId,
  Authorization: secretKey,
  'Content-Type': 'application/json',
};
// Define the model
const model = 'PlayDialog';
// Define voices for the 2 hosts
// Find all voices here https://docs.play.ai/tts-api-reference/voices
const voice1 = 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json';
const voice2 = 's3://voice-cloning-zero-shot/e040bd1b-f190-4bdb-83f0-75ef85b18f84/original/manifest.json';
// Podcast transcript should be in the format of Host 1: ... Host 2:
const transcript = `
Host 1: Welcome to The Tech Tomorrow Podcast! Today we're diving into the fascinating world of voice AI and what the future holds.
Host 2: And what a topic this is. The technology has come so far from those early days of basic voice commands.
Host 1: Remember when we thought it was revolutionary just to ask our phones to set a timer?
Host 2: Now we're having full conversations with AI that can understand context, emotion, and even cultural nuances. It's incredible.
Host 1: Though it does raise some interesting questions about privacy and ethics. Where do we draw the line?
Host 2: Exactly. The potential benefits for accessibility and education are huge, but we need to be thoughtful about implementation.
Host 1: Well, we'll be exploring all of these aspects today. Stay with us as we break down the future of voice AI.
`;
const payload = {
  model,
  text: transcript,
  voice: voice1,
  voice2: voice2,
  turnPrefix: 'Host 1:',
  turnPrefix2: 'Host 2:',
  outputFormat: 'mp3',
};
// Send the POST request to trigger podcast generation
const response = await axios.post('https://api.play.ai/api/v1/tts/', payload, { headers });
// Get the job ID to check the status
const jobId = response.data.id;
if (!jobId) {
  throw new Error('Job ID not returned by API');
}
// Use the job ID to check completion status
const url = `https://api.play.ai/api/v1/tts/${jobId}`;
// Delay between status checks, in milliseconds
const delayMs = 2000;
// Keep checking until status is COMPLETED.
// Longer transcripts take more time to complete.
let podcastAudio = null;
while (!podcastAudio) {
  const statusResponse = await axios.get(url, { headers });
  const status = statusResponse.data.output?.status;
  console.log(status);
  if (status === 'COMPLETED') {
    // Once completed, audio URL will be available
    podcastAudio = statusResponse.data.output.url;
  } else {
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
console.log('Podcast audio URL:', podcastAudio);
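The sample above expects every turn in the transcript to start with one of the two turn prefixes ('Host 1:' or 'Host 2:'). If your script lives in structured data rather than a hand-written string, a small helper along these lines could assemble it. This is only a sketch: `buildTranscript`, the `speaker` keys, and the one-turn-per-line layout are illustrative assumptions, not part of the PlayAI API.

```javascript
// Hypothetical helper (not part of the PlayAI API): builds a transcript string
// from an ordered list of turns, one turn per line, each starting with the
// prefix assigned to its speaker.
function buildTranscript(turns, prefixes) {
  return turns
    .map(({ speaker, text }) => {
      const prefix = prefixes[speaker];
      if (!prefix) {
        throw new Error(`Unknown speaker: ${speaker}`);
      }
      // Collapse line breaks and repeated whitespace so each turn stays on one line.
      return `${prefix} ${text.replace(/\s+/g, ' ').trim()}`;
    })
    .join('\n');
}

// The prefixes must match the turnPrefix and turnPrefix2 values sent in the payload.
const prefixes = { host1: 'Host 1:', host2: 'Host 2:' };
const turns = [
  { speaker: 'host1', text: 'Welcome to the show!' },
  { speaker: 'host2', text: 'Glad to be here.' },
];

const exampleTranscript = buildTranscript(turns, prefixes);
console.log(exampleTranscript);
```

The resulting string can be passed as the `text` field of the payload in place of the hand-written transcript. Throwing on an unknown speaker catches typos before a request is sent.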
PlayDialog is easy to use and is available through our API and on platforms like Fal. It also supports WebSockets and streaming from LLMs.
PlayAI's models go where you need them, including on-prem for the highest-security applications.
Dialog is GDPR, SOC 2 Type II, and ISO 27001 compliant. All models are available on request on cloud platforms or on-prem for the most demanding enterprise applications.
Play's TTS voice models lead the industry in voice quality, prosody and intonation.
Time to first audio as low as 320ms, and lower with on-prem deployment.
Voice AI generation and customization, all supported by easy-to-use APIs.
Dialog is fine-tuned to accurately generate acronyms and numerical sequences (e.g. phone and credit card numbers).
English, Spanish, and Arabic fully supported; 25+ languages under development.
All models are GDPR, ISO 27001, and SOC 2 Type II compliant. On-prem also available.
If you have an enterprise use case in mind, we'd love to hear from you.