ElevenLabs Text-to-Speech JavaScript (Node.js) API Guide

High-quality text-to-speech (TTS) services are essential for applications that deliver dynamic audio content, whether for audiobooks, chatbots, or podcasts. ElevenLabs offers a text-to-speech API with AI-driven voice synthesis that can produce lifelike audio. In this tutorial, I’ll walk you through setting up the ElevenLabs Text-to-Speech API from JavaScript (Node.js) and show you how to generate speech from text in your app. By the end, you’ll know how to integrate this API into your workflow.

Prerequisites: This guide assumes you have a basic understanding of JavaScript, Node.js, and working with REST APIs.

Step 1: Understanding ElevenLabs API

The ElevenLabs API generates speech from text using AI voices. You choose the voice with voice_id (passed in the request URL), pick the synthesis model with model_id, and tune the sound with voice_settings such as stability, similarity_boost, and use_speaker_boost. This makes it a good fit for applications where the text is not known in advance and the audio has to be produced on demand.

Key Concepts:

voice_id: This identifier determines which AI voice is used for the text-to-speech conversion. It is part of the endpoint URL, not the JSON body.
model_id: Specifies the model to use for speech synthesis (e.g., eleven_multilingual_v2 for quality, or a turbo model such as eleven_turbo_v2_5 for faster results).
voice_settings: Optional parameters such as stability, similarity_boost, and use_speaker_boost that fine-tune the output voice.
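
To make these concepts concrete, here is a minimal sketch of how the three pieces might fit together in a single request. The helper name buildTtsRequest is hypothetical, and the field names (stability, use_speaker_boost) and the default model ID are assumptions based on the public API reference, so check them against the current docs before relying on them.

```javascript
// Hypothetical helper: assembles the URL and JSON body for a TTS request.
// Assumption: voice_id goes in the URL path; model_id and voice_settings go in the body.
const buildTtsRequest = (text, voiceId, options = {}) => {
    const { modelId = 'eleven_multilingual_v2', voiceSettings } = options;
    const body = { text, model_id: modelId };
    if (voiceSettings) {
        body.voice_settings = voiceSettings; // optional, only sent when provided
    }
    return {
        url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
        body
    };
};

// 'VOICE_ID' is a placeholder, not a real voice
const request = buildTtsRequest('Hello there', 'VOICE_ID', {
    voiceSettings: { stability: 0.5, similarity_boost: 0.75, use_speaker_boost: true }
});
console.log(request.url);
console.log(JSON.stringify(request.body, null, 2));
```

Keeping the request description separate from the HTTP call makes it easy to inspect or log what will be sent before spending any quota.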

Step 2: Setup Your Environment

1. Get an API Key

First, you’ll need to sign up at elevenlabs.io and generate your API key. The key is sent in the xi-api-key header of every API request, so treat it like a password and keep it out of source control.
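
One common way to keep the key out of your code is to read it from an environment variable. The sketch below assumes a variable named ELEVENLABS_API_KEY; that name and the getApiKey helper are conventions chosen for this guide, not something the API requires.

```javascript
// Hypothetical helper: reads the key from an environment-like object.
// ELEVENLABS_API_KEY is an assumed variable name, not one the API mandates.
const getApiKey = (env = process.env) => {
    const key = (env.ELEVENLABS_API_KEY || '').trim();
    if (!key) {
        throw new Error('Set the ELEVENLABS_API_KEY environment variable first');
    }
    return key;
};

// Example with an explicit object so the snippet runs without a real key
const headers = { 'xi-api-key': getApiKey({ ELEVENLABS_API_KEY: 'demo-key' }) };
console.log(Object.keys(headers));
```

On macOS or Linux you could then start a script with the variable set inline, for example ELEVENLABS_API_KEY=your-key node your-script.js.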

2. Install Node.js and Create a New Project

If you don’t have Node.js installed, download and install it from nodejs.org. Then create a new Node.js project by running:

mkdir elevenlabs-tts
cd elevenlabs-tts
npm init -y

3. Install Required Dependencies

For this tutorial, we’ll use axios to handle API requests. Run the following command to install it:

npm install axios

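4. Install the ElevenLabs JavaScript SDK (Optional)

ElevenLabs maintains an open-source JavaScript SDK on GitHub that wraps their API. The rest of this tutorial calls the API directly with axios, so this step is optional. If you want the SDK, install it via npm (the official package is published under the @elevenlabs scope; confirm the exact name in the repository README):

npm install @elevenlabs/elevenlabs-js

You can check out their GitHub repository for more info: ElevenLabs JS SDK (elevenlabs/elevenlabs-js).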

Step 3: Writing Your First Text-to-Speech Script

Now that everything is set up, let’s write a short Node.js script that converts text to speech using the ElevenLabs API.

1. Basic API Request Setup

const fs = require('fs');
const axios = require('axios');

// Your API key from ElevenLabs (read from the environment when available)
const API_KEY = process.env.ELEVENLABS_API_KEY || 'YOUR_API_KEY';

const generateSpeech = async (text, voice_id) => {
    try {
        const response = await axios.post(
            // The voice ID is part of the URL path, not the request body
            `https://api.elevenlabs.io/v1/text-to-speech/${voice_id}`,
            {
                text,
                model_id: 'eleven_multilingual_v2', // A model ID, not a voice ID
                voice_settings: {
                    stability: 0.5, // Lower is more expressive, higher is more consistent
                    similarity_boost: 0.75, // Adjust this value based on preferences
                    use_speaker_boost: true // Can be true or false based on voice requirements
                }
            },
            {
                headers: {
                    'xi-api-key': API_KEY,
                    'Content-Type': 'application/json',
                    'Accept': 'audio/mpeg'
                },
                responseType: 'arraybuffer' // Receive the audio as raw bytes
            }
        );

        // Save the audio file to disk
        fs.writeFileSync('output.mp3', response.data);
        console.log('Audio file saved as output.mp3');
    } catch (error) {
        // Log the message only; the full axios error object is very verbose
        console.error('Error generating speech:', error.message);
    }
};

const text = "Hello! This is a sample text being converted to speech using ElevenLabs.";
const voice_id = 'INSERT_VOICE_ID_HERE'; // Replace with the actual voice ID
generateSpeech(text, voice_id);

Explanation of the Code:

We use axios to send a POST request to ElevenLabs’ TTS endpoint, with the voice_id as the last segment of the URL.
In the body, we send the text along with model_id and voice_settings.
The response body is the audio itself (MPEG format), which we save to a file (output.mp3).
The headers carry your ElevenLabs API key (xi-api-key) and declare the request Content-Type as JSON.
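
When a request fails, the error body arrives as raw bytes too, because the request asked for an arraybuffer response. The sketch below shows one way to turn such an error into a readable message. The describeApiError helper is hypothetical, the status hints rely only on general HTTP semantics, and the simulated error body is made up for illustration rather than copied from a real API response.

```javascript
// Hypothetical helper: turns an axios-style error into a readable message.
// Assumption: error.response.data holds raw bytes because the request used
// responseType 'arraybuffer'.
const describeApiError = (error) => {
    if (!error.response) {
        return `Network or setup problem: ${error.message}`;
    }
    const { status, data } = error.response;
    const body = Buffer.from(data).toString('utf8');
    const hints = {
        401: 'check the xi-api-key header',
        429: 'too many requests, retry after a pause'
    };
    const hint = hints[status] ? ` (${hints[status]})` : '';
    return `HTTP ${status}${hint}: ${body}`;
};

// Simulated error object so the snippet runs without calling the API
const fakeError = {
    message: 'Request failed with status code 401',
    response: { status: 401, data: Buffer.from('{"detail":"invalid key"}') }
};
console.log(describeApiError(fakeError));
```

In the script, you could call this helper from the catch block and pass its result to console.error.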

2. Working with Voice Settings

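You can tweak voice_settings by changing stability (higher values give a more consistent, less expressive delivery), similarity_boost (how closely the output should adhere to the original voice), and use_speaker_boost (pushes the output closer to the original speaker, which can add a little latency). The two numeric settings take values between 0 and 1.

"voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.85,
    "use_speaker_boost": false
}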
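
If these values come from user input, it can help to normalize them before sending the request. The following is a minimal sketch under the assumption that the numeric settings must lie between 0 and 1; the helper names and the fallback values are choices made for this guide, not documented defaults.

```javascript
// Hypothetical helper: keeps a numeric setting inside the assumed 0..1 range.
const clamp01 = (value) => Math.min(1, Math.max(0, value));

// Hypothetical helper: fills in missing settings and clamps out-of-range ones.
const normalizeVoiceSettings = (settings = {}) => ({
    stability: clamp01(settings.stability ?? 0.5),                // assumed starting value
    similarity_boost: clamp01(settings.similarity_boost ?? 0.75), // value used earlier in this guide
    use_speaker_boost: Boolean(settings.use_speaker_boost ?? true)
});

// 1.4 is out of range and gets clamped to 1
console.log(normalizeVoiceSettings({ similarity_boost: 1.4, use_speaker_boost: false }));
```

The result can be passed directly as the voice_settings object in the request body.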

3. Handling Latency

If you’re dealing with real-time applications like chatbots, you need to consider latency. Setting model_id to a turbo model (e.g., eleven_turbo_v2_5) speeds up generation but might slightly reduce the audio quality.
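
To decide whether the trade-off is worth it, measure it. The sketch below pairs a hypothetical model picker with a small timing wrapper. The model IDs are assumptions that you should confirm against the current ElevenLabs model list, and fakeSynthesis is a stand-in for the real API call so that the snippet runs offline.

```javascript
// Assumed model IDs; confirm the current names in the ElevenLabs model list.
const MODELS = {
    quality: 'eleven_multilingual_v2',
    lowLatency: 'eleven_turbo_v2_5'
};

// Hypothetical helper: real-time features get the faster model.
const pickModelId = (useCase) =>
    ['chatbot', 'voice-assistant'].includes(useCase) ? MODELS.lowLatency : MODELS.quality;

// Hypothetical helper: measures how long any async call takes, in milliseconds.
const timed = async (fn) => {
    const start = performance.now();
    const result = await fn();
    return { result, ms: performance.now() - start };
};

// Stand-in for a real API call so the snippet runs offline
const fakeSynthesis = () => new Promise((resolve) => setTimeout(() => resolve('audio'), 50));

timed(fakeSynthesis).then(({ ms }) => {
    console.log(`model=${pickModelId('chatbot')} took ${ms.toFixed(0)} ms`);
});
```

Swapping fakeSynthesis for the real request lets you compare models on your own text and network conditions.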

Step 4: Testing the Output

Run the script with node (for example, node tts.js if you saved it under that name). You should see the following in your terminal:

Audio file saved as output.mp3

You can then listen to the generated speech by playing output.mp3 in your preferred media player.

Step 5: Next Steps

Once you’re familiar with the basics, you can explore more advanced topics:

Dynamic Voices: Query the API for available voices and let users select one dynamically.
Enhanced Voice Customization: Use voice_settings to adjust stability, similarity_boost, and use_speaker_boost for more natural-sounding output.
Streaming: Rather than saving the audio file locally, you can stream the audio directly in your application, which is particularly useful for chatbots or voice assistants.
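
As a starting point for the first and last ideas, here is a rough sketch that lists voices and streams audio chunk by chunk. It uses Node’s built-in fetch (Node 18 or later) instead of axios so that it runs without extra packages. It assumes that GET /v1/voices returns an object with a voices array and that the streaming endpoint is the regular one with /stream appended. All helper names, the sample voice list, and the ELEVENLABS_API_KEY variable are hypothetical.

```javascript
const fs = require('fs');

// Hypothetical helper: picks a voice_id by display name from a voice list.
// Assumption: each entry looks like { voice_id, name, ... }.
const findVoiceId = (voices, name) => {
    const match = voices.find((v) => v.name.toLowerCase() === name.toLowerCase());
    return match ? match.voice_id : null;
};

// Assumption: the streaming variant of the endpoint appends /stream to the URL.
const streamUrl = (voiceId) =>
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`;

const listVoices = async (apiKey) => {
    const response = await fetch('https://api.elevenlabs.io/v1/voices', {
        headers: { 'xi-api-key': apiKey }
    });
    if (!response.ok) throw new Error(`Voice list failed with HTTP ${response.status}`);
    return (await response.json()).voices;
};

const streamSpeech = async (apiKey, voiceId, text, onChunk) => {
    const response = await fetch(streamUrl(voiceId), {
        method: 'POST',
        headers: { 'xi-api-key': apiKey, 'Content-Type': 'application/json' },
        body: JSON.stringify({ text, model_id: 'eleven_turbo_v2_5' }) // assumed model ID
    });
    if (!response.ok) throw new Error(`Streaming failed with HTTP ${response.status}`);
    for await (const chunk of response.body) {
        onChunk(chunk); // each chunk is a Uint8Array of audio bytes
    }
};

if (process.env.ELEVENLABS_API_KEY) {
    // Live run: only happens when a key is configured
    (async () => {
        const apiKey = process.env.ELEVENLABS_API_KEY;
        const voices = await listVoices(apiKey);
        const out = fs.createWriteStream('streamed.mp3');
        await streamSpeech(apiKey, voices[0].voice_id, 'Streaming test.', (chunk) => out.write(chunk));
        out.end();
        console.log('Streamed audio written to streamed.mp3');
    })().catch((err) => console.error(err.message));
} else {
    // Offline demo with a made-up voice list
    const sampleVoices = [{ voice_id: 'abc123', name: 'Demo Voice' }];
    console.log(findVoiceId(sampleVoices, 'demo voice'), streamUrl('abc123'));
}
```

In a chatbot you would hand each chunk to an audio player or an HTTP response instead of a file, so playback can begin before the full clip has been generated.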

Step 6: Using ElevenLabs with Python

If you’re also working with Python, ElevenLabs offers Python support as well. You can check out the Python SDK or use raw HTTP requests to make API calls in a similar fashion to the JavaScript approach. For more details, visit their docs.

Conclusion

This tutorial showed you how to get started with the ElevenLabs Text-to-Speech API using JavaScript. With the ability to synthesize high-quality voices, this API is a powerful tool for projects that need AI-driven audio, whether you’re building chatbots, audiobooks, or any other application that needs TTS capabilities.

Check out the full API documentation at docs.elevenlabs.io for more details on advanced features and customization options.

Does ElevenLabs have an API?

Yes, ElevenLabs offers a text-to-speech API that developers can use to convert text into high-quality, lifelike AI voices. The API supports various customization options, including choosing different voices (voice_id), models (model_id), and parameters like similarity_boost to adjust the voice output. You can use it for applications like chatbots, audiobooks, and podcasts.

Is there a free text-to-speech API?

ElevenLabs offers a free tier with limited usage. With this free tier, you can access the text-to-speech API and experiment with generating speech, but it’s subject to usage limits like character count or daily requests. For larger-scale or commercial applications, you may need to upgrade to a paid plan.
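
If you want to check how much quota is left before sending a request, a sketch like the following may help. It assumes that GET /v1/user/subscription returns character_count (used so far) and character_limit (the quota); verify those field names in the API reference. The helper names and the sample numbers are made up for illustration and are not real plan limits.

```javascript
// Hypothetical helper: how many characters are left in the current period.
// Assumption: the subscription object has character_count and character_limit.
const remainingCharacters = (subscription) =>
    Math.max(0, subscription.character_limit - subscription.character_count);

// Hypothetical helper: would this text fit into the remaining quota?
const canSynthesize = (subscription, text) =>
    text.length <= remainingCharacters(subscription);

// Not called here; needs a real API key and Node 18+ for built-in fetch
const fetchSubscription = async (apiKey) => {
    const response = await fetch('https://api.elevenlabs.io/v1/user/subscription', {
        headers: { 'xi-api-key': apiKey }
    });
    if (!response.ok) throw new Error(`Subscription lookup failed with HTTP ${response.status}`);
    return response.json();
};

// Made-up numbers for illustration
const sampleSubscription = { character_count: 9950, character_limit: 10000 };
console.log(remainingCharacters(sampleSubscription)); // 50
console.log(canSynthesize(sampleSubscription, 'Hello!')); // true
```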

How to convert text to speech using JavaScript?

You can convert text to speech in JavaScript by calling a text-to-speech API, like the one offered by ElevenLabs. Here’s how:
1. Set up a Node.js environment and install an HTTP client such as axios, or use the ElevenLabs SDK.
2. Make a POST request to ElevenLabs’ text-to-speech endpoint, providing your API key, the text, and voice parameters.
3. Process the response, which contains the audio data (MP3 in this guide), by saving it to a file or playing it.