Cantai API

Integrate professional AI singing into your applications

Transform Your Applications

🎮

Singing Games

Create karaoke games, rhythm games, or music education apps with real-time vocal synthesis

🎵

Music Production

Build music streaming apps, vocal synthesis services, or voice customization platforms

🎓

Education

Build interactive music theory tools, ear training apps, or vocal harmony visualizers

🎙️

Content Creation

Generate singing vocals for podcasts, videos, or interactive media projects

Universal Input Support

Work with your preferred music formats

Standard Notation

  • MIDI Files
  • ABC Notation Files
  • LilyPond Files
  • MusicXML Files

Developer Friendly

  • JSON Score Format
  • YAML Configuration
  • CANTAI Format Files
  • Plain Text Lyrics

Advanced Features

  • Batch Processing
  • Voice Cloning
  • Style Transfer

Voice Output Formats

  • WAV Audio Files
  • Opus Format
  • OGG Format
  • MP3 Format

Quick Start

song.yaml
song:
  notes: ["C4", "E4", "G4"]
  lyrics: ["Let's", "play", "now"]
  tempo: 120
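fetch.js
// Same score as song.yaml, as a JavaScript object
const songData = {
  song: { notes: ['C4', 'E4', 'G4'], lyrics: ["Let's", 'play', 'now'], tempo: 120 }
};
// Top-level await needs an ES module (.mjs or "type": "module")
const response = await fetch('https://cantai.app/v1/synthesize', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(songData)
});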
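curl
# --data-binary keeps the newlines YAML depends on (-d would strip them).
# Assumes the endpoint accepts YAML request bodies.
curl -X POST https://cantai.app/v1/synthesize \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/yaml" \
  --data-binary @song.yaml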

High Performance

Process multiple voices simultaneously with our multi-threaded rendering engine for real-time synthesis and batch processing

🎤

Voice Types

  • High Soprano (Operatic)
  • Mezzo-Soprano
  • Tenor & Bass
  • Classical, Opera & Gospel Choir
  • Children's Choir
  • Custom Models (On Request)

☁️

Cloud Storage

Automatic file handling with secure cloud storage integration and version control for all your voice synthesis projects

Core Endpoints

POST /api/upload

Upload and normalize music files in any supported notation format

Supported Formats:

  • MIDI (.mid)
  • MusicXML (.xml, .mxl)
  • ABC Notation (.abc)
  • JSON-based notation (.json)
  • Gibber (.gibber)

Example response:

{
  "id": "123abc",
  "detectedFormat": "midi",
  "status": "uploaded"
}
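
As a rough sketch of how a client might call this endpoint, the snippet below checks the file extension against the Supported Formats list and then posts the file. The host (taken from the Quick Start), the Bearer header, the multipart encoding, and the form field name "file" are all assumptions, not confirmed details of the API; the function names are illustrative only.

```javascript
// Extensions taken from the Supported Formats list for POST /api/upload.
const SUPPORTED_UPLOAD_EXTENSIONS = ['.mid', '.xml', '.mxl', '.abc', '.json', '.gibber'];

// True when the file name ends in one of the listed extensions.
function isSupportedUpload(fileName) {
  const lower = fileName.toLowerCase();
  return SUPPORTED_UPLOAD_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// ASSUMPTIONS: host from the Quick Start, Bearer auth, and a multipart
// body with a field named "file". None of these are confirmed by the docs.
async function uploadScore(fileBytes, fileName, apiKey) {
  if (!isSupportedUpload(fileName)) {
    throw new Error(`Unsupported file type: ${fileName}`);
  }
  const form = new FormData();
  form.append('file', new Blob([fileBytes]), fileName);
  const response = await fetch('https://cantai.app/api/upload', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  // Expected shape per the sample response: { id, detectedFormat, status }
  return response.json();
}
```

Checking the extension locally avoids a round trip for files the endpoint would presumably reject anyway.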

POST /api/render

Convert notation into high-quality AI vocals with customizable parameters

Parameters:

  • id - Upload ID from previous step
  • outputFormat - WAV, MP3, or Opus
  • voiceProfile - Voice type selection
  • reverb - Enable/disable reverb processing
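
The parameters above could be assembled into a request along these lines. This is a sketch under assumptions: the docs do not say whether the body is JSON, what casing outputFormat expects, or which voiceProfile values are valid, so the JSON encoding, the lowercase format names, and the "tenor" value in the usage note are placeholders.

```javascript
// Formats listed for outputFormat; lowercase spelling is an assumption.
const RENDER_OUTPUT_FORMATS = ['wav', 'mp3', 'opus'];

// Validate the documented parameters and return a plain request body.
function buildRenderBody({ id, outputFormat = 'wav', voiceProfile, reverb = false }) {
  if (!id) {
    throw new Error('id from POST /api/upload is required');
  }
  const format = String(outputFormat).toLowerCase();
  if (!RENDER_OUTPUT_FORMATS.includes(format)) {
    throw new Error(`Unsupported outputFormat: ${outputFormat}`);
  }
  return { id, outputFormat: format, voiceProfile, reverb: Boolean(reverb) };
}

// ASSUMPTIONS: host from the Quick Start, Bearer auth, JSON request body.
async function requestRender(params, apiKey) {
  const response = await fetch('https://cantai.app/api/render', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(buildRenderBody(params)),
  });
  if (!response.ok) {
    throw new Error(`Render request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, `buildRenderBody({ id: '123abc', outputFormat: 'MP3', voiceProfile: 'tenor', reverb: true })` normalizes the format to `mp3` and rejects anything outside the three listed formats before a request is sent.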

POST /api/transcribe

Convert audio recordings into musical notation

Features:

  • WAV/MP3/Opus input support
  • MIDI/MusicXML output formats
  • Automatic note detection
  • Rhythm analysis

GET /api/status/{id}

Check processing status of rendering or transcription tasks

Status Values:

  • queued - In processing queue
  • processing - Currently being processed
  • completed - Ready for download
  • failed - Error occurred
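
A client would typically poll this endpoint until the task reaches completed or failed. The sketch below maps the four status values to a next step and loops with a delay. The polling interval, the attempt limit, the host, the Bearer header, and the assumption that the response JSON carries a "status" field are illustrative choices, not documented behavior.

```javascript
// Map a documented status value to what the client should do next.
function nextStep(status) {
  switch (status) {
    case 'queued':
    case 'processing':
      return 'wait';
    case 'completed':
      return 'download';
    case 'failed':
      return 'abort';
    default:
      throw new Error(`Unexpected status: ${status}`);
  }
}

// Poll until the task finishes. `getStatus` is any async function that
// resolves to a status string, so the HTTP call can be swapped out.
// Resolves to true when completed, false when failed.
async function waitForTask(getStatus, { intervalMs = 2000, maxAttempts = 150 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const step = nextStep(await getStatus());
    if (step === 'download') return true;
    if (step === 'abort') return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Timed out waiting for task');
}

// ASSUMPTIONS: host from the Quick Start, Bearer auth, and a JSON
// response with a "status" field.
function statusFetcher(id, apiKey) {
  return async () => {
    const response = await fetch(
      `https://cantai.app/api/status/${encodeURIComponent(id)}`,
      { headers: { Authorization: `Bearer ${apiKey}` } },
    );
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`);
    }
    return (await response.json()).status;
  };
}
```

Usage would look like `await waitForTask(statusFetcher('123abc', 'YOUR_API_KEY'))`, followed by a call to the download endpoint when it resolves to `true`.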

GET /api/download/{id}

Download processed audio or notation files

Returns:

  • Audio files (WAV/MP3/Opus)
  • Notation files (MIDI/MusicXML)

Simple, Usage-Based Pricing

Pay-As-You-Go

$0.167/min
  • AWS GPU costs ~$0.10/min
  • No Monthly Minimum
  • ~40% Gross Margin (~67% Markup on GPU Cost)
  • Basic Support
Includes free trial

Enterprise License

$10,000/month
  • ~50,000 Minutes Included
  • $0.20/min Additional Usage
  • SLA & Dedicated Support
  • Custom Integration

All plans include standard voice models, API access, and basic audio formats

Prices shown in USD. Additional fees may apply for high-priority processing or specialized voice models.
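
To compare the two plans for a given workload, the listed figures can be turned into a small estimator. This is a sketch only: it treats the "~50,000" included minutes as exactly 50,000 and ignores the free trial and any additional fees for high-priority processing or specialized voice models. The function names are illustrative.

```javascript
// Figures from the pricing section (USD).
const PAYG_RATE_PER_MIN = 0.167;
const ENTERPRISE_MONTHLY_FEE = 10000;
const ENTERPRISE_INCLUDED_MINUTES = 50000; // listed as "~50,000"
const ENTERPRISE_OVERAGE_PER_MIN = 0.2;

// Round to whole cents to hide floating-point noise.
const roundToCents = (usd) => Math.round(usd * 100) / 100;

// Pay-As-You-Go: a flat per-minute rate with no monthly minimum.
function payAsYouGoCost(minutes) {
  return roundToCents(minutes * PAYG_RATE_PER_MIN);
}

// Enterprise: monthly fee plus overage beyond the included minutes.
function enterpriseCost(minutes) {
  const extraMinutes = Math.max(0, minutes - ENTERPRISE_INCLUDED_MINUTES);
  return roundToCents(ENTERPRISE_MONTHLY_FEE + extraMinutes * ENTERPRISE_OVERAGE_PER_MIN);
}
```

At 50,000 minutes this gives $8,350 for Pay-As-You-Go against $10,000 for Enterprise, so at list prices the Enterprise plan is justified by the SLA, dedicated support, and custom integration rather than by per-minute cost.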

Have Questions?

Join our Discord community to learn more about the API

Join #api-info on Discord

⚠️ API details and supported formats may change without notice before our Spring 2025 launch date. Check back for updates.
