Dialog: The world's most expressive voice AI model

Dialog is a highly expressive, natural-sounding voice AI model ideal for narrations, audiobooks, podcasts, and voice agents, where accurate and engaging conversational tone, prosody, and emotion are required.

“NextKast built a fully automated AI DJ for our radio station customers using PlayAI Dialog voices. We love how expressive, emotional, and natural the voices sound, and didn’t find anything else close in the market. In radio, keeping your audience engaged is the whole game, and Play’s voices do that.”

Winston Potgieter, Founder, Axis Entertainment

< 320ms latency

Optimized for multi-turn conversations

Dynamic prosody and emotion

On-prem deployable

Hear Dialog in action

Create engaging and emotive AI narrations, podcasts, and audiobooks, or power ultra-realistic voice agents. Dialog understands each turn in a conversation and generates speech with the right prosody, pacing, and emotion.

AI podcast between hosts

Generate entire AI podcasts with any voices

Conversation between characters

Create engaging contextual conversations between multiple characters

Engaging narration

Generate rich dramatic narrative content

Dramatic dialogs for a scene

Prompt and direct to generate dramatic deliveries

Dialog Uses a Conversation's Entire Context

Unlike previous voice AI models, PlayDialog takes a conversation's entire context as input, so narrations and multi-party conversations sound fluid, engaging, and natural, with excellent prosody, pacing, and intonation.

Voice cloning benchmark

Dialog Delivers Best in Class Voice Cloning

Our industry-leading voice cloning capabilities mean that with PlayDialog, you get a faithful reproduction barely distinguishable from the original. Create narrations, podcasts, and dubbing accurately every time.

Dialog is preferred 3:1 over the industry-leading model

Dialog was preferred 3:1 in testing versus the industry's best-known model, winning on emotion, quality, and accuracy. Try it and experience the difference.

PlayAI Dialog vs Competing Model

Generate spoken audio from input text

  import axios from 'axios';
  import dotenv from 'dotenv';
  
  dotenv.config();
  
  // Set up headers with your API secret key and user ID
  const userId = process.env.PLAYDIALOG_USER_ID;
  const secretKey = process.env.PLAYDIALOG_SECRET_KEY;
  
  const headers = {
    'X-USER-ID': userId,
    Authorization: secretKey,
    'Content-Type': 'application/json',
  };
  
  // Define the model
  const model = 'PlayDialog';
  
  // Define voices for the 2 hosts
  // Find all voices here https://docs.play.ai/tts-api-reference/voices
  const voice1 = 's3://voice-cloning-zero-shot/baf1ef41-36b6-428c-9bdf-50ba54682bd8/original/manifest.json';
  const voice2 = 's3://voice-cloning-zero-shot/e040bd1b-f190-4bdb-83f0-75ef85b18f84/original/manifest.json';
  
  // Podcast transcript should be in the format of Host 1: ... Host 2:
  const transcript = `
  Host 1: Welcome to The Tech Tomorrow Podcast! Today we're diving into the fascinating world of voice AI and what the future holds.
  Host 2: And what a topic this is. The technology has come so far from those early days of basic voice commands.
  Host 1: Remember when we thought it was revolutionary just to ask our phones to set a timer?
  Host 2: Now we're having full conversations with AI that can understand context, emotion, and even cultural nuances. It's incredible.
  Host 1: Though it does raise some interesting questions about privacy and ethics. Where do we draw the line?
  Host 2: Exactly. The potential benefits for accessibility and education are huge, but we need to be thoughtful about implementation.
  Host 1: Well, we'll be exploring all of these aspects today. Stay with us as we break down the future of voice AI.
  `;
  
  const payload = {
    model,
    text: transcript,
    voice: voice1,
    voice2: voice2,
    turnPrefix: 'Host 1:',
    turnPrefix2: 'Host 2:',
    outputFormat: 'mp3',
  };
  
  // Send the POST request to trigger podcast generation
  const response = await axios.post('https://api.play.ai/api/v1/tts/', payload, { headers });
  
  // Get the job ID to check the status
  const jobId = response.data.id;
  
  if (!jobId) {
    throw new Error('Job ID not returned by API');
  }
  
  // Use the job ID to check completion status
  const url = `https://api.play.ai/api/v1/tts/${jobId}`;
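  const pollIntervalMs = 2000; // setTimeout expects milliseconds
  const maxAttempts = 150; // Arbitrary cap (about 5 minutes) so the loop cannot spin forever
  
  // Keep checking until status is COMPLETED or the attempt cap is reached.
  // Longer transcripts take more time to complete.
  let podcastAudio = null;
  for (let attempt = 0; attempt < maxAttempts && !podcastAudio; attempt++) {
    const statusResponse = await axios.get(url, { headers });
    const status = statusResponse.data.output?.status;
    console.log(status);
  
    if (status === 'COMPLETED') {
      // Once completed, audio URL will be available
      podcastAudio = statusResponse.data.output.url;
    } else {
      await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
    }
  }
  
  if (!podcastAudio) {
    throw new Error('Timed out waiting for podcast generation');
  }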
  
  console.log('Podcast audio URL:', podcastAudio);
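
The sample expects the transcript as alternating lines that start with each host's turn prefix. If your script lives in structured data instead, a small helper can produce that layout. The sketch below is illustrative only: `buildTranscript` and the `{ speaker, text }` turn shape are hypothetical names and not part of the PlayDialog API, and the default prefixes simply mirror the `turnPrefix` values used in the sample.

```javascript
// Hypothetical helper (not part of the PlayDialog API): formats an array of
// turns into "Host 1: ..." / "Host 2: ..." lines, one turn per line.
function buildTranscript(turns, prefixes = ['Host 1:', 'Host 2:']) {
  return turns
    .map(({ speaker, text }) => {
      const prefix = prefixes[speaker];
      if (prefix === undefined) {
        throw new RangeError(`Unknown speaker index: ${speaker}`);
      }
      // Collapse internal whitespace and newlines so each turn stays on one line.
      return `${prefix} ${text.replace(/\s+/g, ' ').trim()}`;
    })
    .join('\n');
}

const demoTranscript = buildTranscript([
  { speaker: 0, text: 'Welcome to The Tech Tomorrow Podcast!' },
  { speaker: 1, text: 'And what a topic this is.' },
]);
console.log(demoTranscript);
```

The resulting string can be passed as the `text` field of the payload, provided the prefixes match the `turnPrefix` and `turnPrefix2` values you send.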
  
  

It's Easy to Code

PlayDialog is easy to use and is available through our API and on platforms like Fal. It also supports WebSockets and streaming from LLMs.
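
When streaming from an LLM, text usually arrives as small fragments, so one common approach is to buffer fragments into whole sentences before handing them to a speech model. The sketch below shows only that buffering step and makes no PlayDialog calls: `sentencesFrom`, `fakeLlmStream`, and `collectSentences` are hypothetical names, and the sentence-boundary rule (terminal punctuation followed by whitespace) is a simplifying assumption that will mis-split abbreviations such as "e.g.".

```javascript
// Hypothetical helper: groups streamed text fragments into sentences.
// A sentence is emitted once its closing punctuation is followed by
// whitespace, so a number like "3.14" is not split in the middle.
async function* sentencesFrom(tokenStream) {
  let buffer = '';
  for await (const token of tokenStream) {
    buffer += token;
    let match;
    // Emit every complete sentence currently sitting in the buffer.
    while ((match = buffer.match(/^[\s\S]*?[.!?]+(?=\s)/)) !== null) {
      yield match[0].trim();
      buffer = buffer.slice(match[0].length);
    }
  }
  // Flush whatever is left when the stream ends.
  const rest = buffer.trim();
  if (rest) yield rest;
}

// Stand-in for an LLM token stream, used only for this demonstration.
async function* fakeLlmStream() {
  for (const t of ['Hello', ' there', '.', ' How', ' are', ' you', '?', ' Fine']) {
    yield t;
  }
}

// Collects all sentences into an array so they are easy to inspect.
async function collectSentences(stream) {
  const out = [];
  for await (const sentence of sentencesFrom(stream)) out.push(sentence);
  return out;
}

collectSentences(fakeLlmStream()).then((sentences) => console.log(sentences));
```

In a real integration, each yielded sentence would be sent to whichever streaming endpoint you use; consult the API documentation for the actual message format.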

Need it on prem? No problem

PlayAI's models go where you need them, including on-prem for the highest-security applications.

Enterprise Certifications

Dialog is Enterprise ready

Dialog is GDPR, SOC 2 Type II, and ISO 27001 compliant. All models are available on request on cloud platforms or on-prem for the most demanding enterprise applications.

Key Features

Lifelike voices

Play's TTS voice models lead the industry in voice quality, prosody and intonation.

Low latency

Time to first audio as low as 320ms, and lower with an on-prem deployment.

Easy to use

Voice AI generation and customization are all supported by easy-to-use APIs.

Accuracy

Dialog is fine-tuned to ensure accurate generation of acronyms and numerical sequences such as phone and credit card numbers.

Multilingual

English, Spanish, Arabic fully supported; 25+ languages under development

Security

All models are GDPR, ISO 27001, and SOC 2 Type II compliant. On-prem is also available.

Want to Talk to Our Team?

If you have an enterprise use case in mind, we'd love to hear from you.