Karaoke-Maker API: AI-Powered Karaoke Solutions

Transform any song into a karaoke track with our AI-powered vocal extraction and pitch correction technology.

Input

Audio File (string)

Input audio file (MP3/WAV)

No Vocal Audio File (string)

Vocal-removed audio file (MP3/WAV), used to generate the instrumental version

ASS File (string)

ASS subtitle file (must contain word-by-word effects)

Resolution (string)

Video resolution (e.g., 1280x720)

Enable Visualization (boolean)

Whether to render the audio visualization (when disabled, neither waves nor spectrum is rendered)

Enable Spectrum (boolean)

Whether to force the spectrum visualization when visualization is enabled; when false, the viz_type setting is followed

Viz Type (string)

Visualization type: waves (waveform, faster) / spectrum (frequency spectrum, slower)

Viz Height (integer)

Visualization area height (in pixels)

Viz Position (string)

Visualization position: top / bottom / center

Viz Color (string)

Color in waves mode (0xRRGGBB or color name); color scheme in spectrum mode (rainbow/moreland/viridis etc.)

Viz Opacity (number)

Visualization layer opacity, from 0 to 1

Fonts Dir (string)

Optional: Directory containing required Chinese fonts (TTF/OTF) for ASS font matching

FPS (integer)

Output frame rate (lowering this value, e.g., to 24, can significantly speed up processing)

x264 Preset (string)

x264 encoding preset (ultrafast/superfast/veryfast/faster/fast/medium...)

CRF (integer)

x264 CRF value. A larger value means a smaller file and faster encoding, at lower quality. The usual range is 18-32; 26-30 is recommended when speed matters.

Threads (integer)

FFmpeg thread count (0 means auto-detect)

Filter Threads (integer)

Filter graph parallel thread count (0 lets FFmpeg choose automatically)

Viz FPS (integer)

Visualization layer refresh frame rate (0 means follow the fps value)

Image Files (string)

Optional: Comma-separated list of image paths (jpg/png/webp etc.)

Image Duration (number)

Display duration per image (in seconds); only takes effect when an image list is provided

Video Files (string)

Optional: Comma-separated list of video paths (mp4/mov etc.)

Video Loop (boolean)

Whether to loop the video when it is shorter than the audio

Title (string)

Optional: Title displayed on the first frame of the video

Author (string)

Optional: Author displayed on the first frame of the video
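
The input parameters above interact in a few ways that are easier to see in code. The following Python sketch is illustrative only: the snake_case key names are assumed from the display labels (only viz_type is named in the descriptions above), the default values are placeholders rather than documented defaults, and the KaraokeRequest class is hypothetical. It builds a subset of the inputs, checks the documented value constraints, and resolves the effective visualization mode and frame rate.

```python
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class KaraokeRequest:
    """Hypothetical container for a subset of the documented inputs."""

    audio_file: str
    ass_file: str
    # All defaults below are illustrative assumptions, not documented values.
    resolution: str = "1280x720"
    enable_visualization: bool = True
    enable_spectrum: bool = False
    viz_type: str = "waves"
    viz_position: str = "bottom"
    viz_opacity: float = 0.8
    fps: int = 24
    viz_fps: int = 0

    def validate(self) -> None:
        """Check the value constraints stated in the parameter descriptions."""
        if not re.fullmatch(r"[1-9]\d*x[1-9]\d*", self.resolution):
            raise ValueError("resolution must look like 1280x720")
        if self.viz_type not in ("waves", "spectrum"):
            raise ValueError("viz_type must be 'waves' or 'spectrum'")
        if self.viz_position not in ("top", "bottom", "center"):
            raise ValueError("viz_position must be top, bottom, or center")
        if not 0 <= self.viz_opacity <= 1:
            raise ValueError("viz_opacity must be between 0 and 1")
        if self.fps <= 0 or self.viz_fps < 0:
            raise ValueError("fps must be positive and viz_fps non-negative")

    def effective_viz(self) -> Optional[str]:
        """No visualization when disabled; spectrum when forced; else viz_type."""
        if not self.enable_visualization:
            return None
        return "spectrum" if self.enable_spectrum else self.viz_type

    def effective_viz_fps(self) -> int:
        """A viz_fps of 0 means the visualization follows the output fps."""
        return self.viz_fps or self.fps

    def to_payload(self) -> dict:
        """Validate, then return the inputs as a plain dictionary."""
        self.validate()
        return asdict(self)


if __name__ == "__main__":
    request = KaraokeRequest(audio_file="song.mp3", ass_file="lyrics.ass")
    print(request.to_payload())
    print(request.effective_viz(), request.effective_viz_fps())
```

How the resulting dictionary is submitted (endpoint, authentication, file upload) is not described on this page, so the sketch stops at building and validating the inputs.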

Output

Powerful Features for Karaoke Apps

High-Quality Vocal Extraction

Advanced AI algorithms that precisely separate vocals from any audio track.

Real-Time Pitch Correction

Automatic pitch detection and correction for perfect harmonies.

Multi-Language Support

Trained on diverse datasets to support lyrics and phonemes from multiple languages.

Fast Processing

Optimized inference engine delivers results in seconds.

Everything You Need

Build karaoke apps faster

Vocal Separation

Extract or remove vocals with AI precision.

Pitch Correction

Automatic pitch detection and correction.

Lyrics Sync

Synchronize lyrics with timing data.

Batch Processing

Process multiple tracks simultaneously.

How to Get Started

Start in minutes

Ready to Build Amazing Karaoke Experiences?

Start today with our free tier.

What Our Users Say

Join thousands of developers

Karaoke-Maker API transformed our app. Vocal extraction quality is amazing.

Alex Chen

CTO, SingAlong App

The pitch correction feature is incredibly accurate.

Sarah Kim

Product Manager, MusicFlow

Fast processing and reliable uptime.

Mike Johnson

Lead Developer, PartyBox

Frequently Asked Questions

Everything you need to know