Karaoke-Maker API: AI-Powered Karaoke Solutions

Transform any song into a karaoke track with our AI-powered vocal extraction and pitch correction technology.

Input

string

Input audio file (MP3/WAV)

string

Vocal-removed input audio file (MP3/WAV), used to generate the instrumental version

string

ASS subtitle file (must contain word-by-word effects)

string

Video resolution (e.g., 1280x720)

boolean

Whether to render the audio visualization (when disabled, neither waves nor spectrum is rendered)

boolean

Whether to force the spectrum visualization when visualization is enabled; otherwise the viz_type setting is followed

string

Visualization type: waves (waveform, faster) / spectrum (frequency spectrum, slower)

integer

Visualization area height (in pixels)

string

Position: top / bottom / center

string

Color in waves mode (0xRRGGBB or color name); color scheme in spectrum mode (rainbow/moreland/viridis etc.)

number

Visualization layer opacity 0-1

string

Optional: Directory containing required Chinese fonts (TTF/OTF) for ASS font matching

integer

Output frame rate (reducing this value can significantly speed up processing, e.g., 24)

string

x264 encoding preset (ultrafast/superfast/veryfast/faster/fast/medium...)

integer

x264 CRF value (a larger value gives a smaller file and faster encoding at lower visual quality; the usual range is 18-32, with 26-30 recommended for speed)

integer

FFmpeg thread count (0 means auto-detect)

integer

Filter graph parallel thread count (0 lets FFmpeg choose automatically)

integer

Visualization layer refresh frame rate (0 means follow the fps value)

string

Optional: Comma-separated list of image paths (jpg/png/webp etc.)

number

Display duration per image (in seconds), only effective when image list is provided

string

Optional: Comma-separated list of video paths (mp4/mov etc.)

boolean

Whether to loop playback when video duration is shorter than audio duration

string

Optional: Title displayed on the first frame of the video

string

Optional: Author displayed on the first frame of the video
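
Several of the input options above interact: the force-spectrum flag overrides viz_type, a visualization refresh rate of 0 falls back to fps, the image and video lists arrive as comma-separated strings, and a short background video may need to loop. The Python sketch below shows one way a client could normalize these values before submitting a request. Apart from viz_type and fps, which the descriptions mention, every name here (render_viz, force_spectrum, viz_fps, and the helper functions) is hypothetical. The mapping onto FFmpeg flags is likewise an assumption about how the encoding options might be passed along, not documented behavior of the service.

```python
import math
import re
from typing import List, Optional, Tuple


def parse_resolution(value: str) -> Tuple[int, int]:
    """Parse a 'WIDTHxHEIGHT' string such as '1280x720' into two integers."""
    match = re.fullmatch(r"(\d+)x(\d+)", value.strip().lower())
    if not match:
        raise ValueError(f"expected WIDTHxHEIGHT, got {value!r}")
    width, height = int(match.group(1)), int(match.group(2))
    if width == 0 or height == 0:
        raise ValueError("width and height must be positive")
    return width, height


def resolve_viz_type(render_viz: bool, force_spectrum: bool, viz_type: str) -> Optional[str]:
    """Return the visualization to draw, or None when rendering is disabled."""
    if not render_viz:
        return None  # neither waves nor spectrum is rendered
    if force_spectrum:
        return "spectrum"  # the force flag wins over viz_type
    if viz_type not in ("waves", "spectrum"):
        raise ValueError(f"unknown viz_type: {viz_type!r}")
    return viz_type


def resolve_viz_fps(viz_fps: int, fps: int) -> int:
    """A refresh rate of 0 means 'follow the output fps'."""
    return fps if viz_fps == 0 else viz_fps


def split_paths(value: str) -> List[str]:
    """Split a comma-separated path list, dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def loops_needed(audio_seconds: float, video_seconds: float) -> int:
    """How many times a background video must play to cover the audio."""
    if video_seconds <= 0:
        raise ValueError("video duration must be positive")
    return max(1, math.ceil(audio_seconds / video_seconds))


def encoder_args(fps: int, preset: str, crf: int, threads: int, filter_threads: int) -> List[str]:
    """Assumed mapping of the encoding options onto standard FFmpeg flags."""
    args = [
        "-r", str(fps),
        "-c:v", "libx264",
        "-preset", preset,
        "-crf", str(crf),
        "-threads", str(threads),  # 0 lets FFmpeg auto-detect
    ]
    if filter_threads > 0:
        # Omit the flag at 0 so FFmpeg keeps its own default.
        args += ["-filter_complex_threads", str(filter_threads)]
    return args
```

For example, parse_resolution("1280x720") yields the pair (1280, 720), and resolve_viz_type(True, True, "waves") yields "spectrum" because the force flag takes precedence.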

Output

Powerful Features for Karaoke Apps

High-Quality Vocal Extraction

Advanced AI algorithms that precisely separate vocals from any audio track.

Real-Time Pitch Correction

Automatic pitch detection and correction for perfect harmonies.

Multi-Language Support

Trained on diverse datasets to support lyrics and phonemes from multiple languages.

Fast Processing

Optimized inference engine delivers results in seconds.

Everything You Need

Build karaoke apps faster

Vocal Separation

Extract or remove vocals with AI precision.

Pitch Correction

Automatic pitch detection and correction.

Lyrics Sync

Synchronize lyrics with timing data.
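
The ASS subtitle input must contain word-by-word effects, so synchronized lyrics need per-word timing in the subtitle file. As a minimal sketch, assuming those effects are the standard ASS karaoke tags (\k with a duration in centiseconds), the Python below turns a list of word timings into one Dialogue line. The function names and the (text, start, end) input shape are illustrative and not part of the Karaoke-Maker API.

```python
from typing import List, Tuple


def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    total_cs = round(seconds * 100)
    hours, rem = divmod(total_cs, 360000)
    minutes, rem = divmod(rem, 6000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def karaoke_dialogue(words: List[Tuple[str, float, float]], style: str = "Default") -> str:
    """Build one ASS Dialogue line from (text, start, end) word timings in seconds.

    Words must be sorted by start time. Silent gaps between words become
    empty \\k syllables so later words stay aligned with the audio.
    """
    if not words:
        raise ValueError("at least one word is required")
    line_start, line_end = words[0][1], words[-1][2]
    parts = []
    cursor = line_start
    for text, start, end in words:
        gap_cs = round((start - cursor) * 100)
        if gap_cs > 0:
            parts.append(f"{{\\k{gap_cs}}}")  # pause before this word
        parts.append(f"{{\\k{round((end - start) * 100)}}}{text} ")
        cursor = end
    body = "".join(parts).rstrip()
    # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return f"Dialogue: 0,{ass_time(line_start)},{ass_time(line_end)},{style},,0,0,0,,{body}"


line = karaoke_dialogue([("Hello", 1.0, 1.5), ("world", 1.5, 2.2)])
print(line)
```

Each \k value is the time the word stays highlighted, so the values on a line (gaps included) add up to the duration of the line.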

Batch Processing

Process multiple tracks simultaneously.

How to Get Started

Start in minutes

Ready to Build Amazing Karaoke Experiences?

Start today with our free tier.

What Our Users Say

Join thousands of developers

Karaoke-Maker API transformed our app. Vocal extraction quality is amazing.

Alex Chen

CTO, SingAlong App

The pitch correction feature is incredibly accurate.

Sarah Kim

Product Manager, MusicFlow

Fast processing and reliable uptime.

Mike Johnson

Lead Developer, PartyBox

Frequently Asked Questions

Everything you need to know