Voice Model Creation Guide

Creating your own voice model on Kits.ai is easy. For the best possible results, build a high-quality dataset using the tips below. If you need additional support, join the Kits.ai Discord or get in touch with us.

Here’s what you need.

At least 15 minutes in total (the more audio, the better) of dry (no effects), monophonic (one note at a time) vocals.

No reverb, delay, chorus, or instrumentals
No harmonies, layering, double-tracking, or stereo effects
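If you want to confirm that a set of files adds up to the 15-minute minimum before uploading, a short script can total the durations. This is a minimal sketch, assuming your vocals are already PCM .wav files and using only Python's standard `wave` module; the function names and the `MIN_SECONDS` constant are illustrative and not part of Kits.ai.

```python
import wave

# 15 minutes, the minimum total length recommended in this guide.
MIN_SECONDS = 15 * 60


def wav_duration_seconds(path):
    """Return the length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_seconds(paths):
    """Sum the lengths of several WAV files."""
    return sum(wav_duration_seconds(p) for p in paths)


def has_enough_audio(paths, minimum=MIN_SECONDS):
    """True when the files together reach the minimum length."""
    return total_duration_seconds(paths) >= minimum
```

For example, `has_enough_audio(["take1.wav", "take2.wav"])` (hypothetical file names) returns `True` once the two takes together reach 15 minutes.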

Bad Vocals

Stereo, reverb, delay

SG_acapella_wet.wav

Good Vocals

Mono, clean tone, low noise

SG_acapella_example.wav

For best results, create separate models for distinct vocal styles (singing vs. rapping, etc.).

Getting your file(s) ready.

Export your files with no silence and consistent volume as a 16-bit lossless audio file (.wav preferred).

Before: silence, inconsistent volume levels

After: truncated silence, consistent volume
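One simple way to bring several takes to a similar level is peak normalization, sketched below. This is an assumption about how you might even out volume, not a method prescribed by Kits.ai: it scales 16-bit samples so the loudest one reaches a chosen peak, and it does not even out loud and quiet passages within a single take. The function name and the `target_peak` default are illustrative.

```python
import array


def peak_normalize(samples, target_peak=29000):
    """Scale 16-bit samples so the loudest one reaches target_peak.

    `samples` is any sequence of signed 16-bit integers.
    Returns a new array; the input is left unchanged.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Pure silence (or empty input): nothing to scale.
        return array.array("h", samples)
    gain = target_peak / peak
    # Clamp to the valid 16-bit range after scaling.
    return array.array(
        "h",
        (max(-32768, min(32767, round(s * gain))) for s in samples),
    )
```

Applying the same `target_peak` to every file gives them matching peak levels across the dataset.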

Once you’ve compiled your vocals, the next step is to prepare your files for training:

Remove any extra silence (we recommend doing this automatically with Audacity)
Export as true mono (rather than stereo with equal L + R channels)
Export as 16-bit .wav (there is no per-file length requirement: one 15-minute file or fifteen 1-minute files both work)
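The format requirements above can be checked before upload. The following is a minimal sketch using Python's standard `wave` and `array` modules; `check_wav` is a hypothetical helper, not a Kits.ai tool. It flags files that are not 16-bit, files that are not mono, and stereo files whose left and right channels are identical.

```python
import array
import wave


def check_wav(path):
    """Return a list of problems found in a WAV file (empty if none).

    Note: wave.open only reads PCM WAV files and raises wave.Error
    for anything else.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        frames = wav.readframes(wav.getnframes())

    if sample_width != 2:
        problems.append(f"expected 16-bit audio, found {sample_width * 8}-bit")
    if channels != 1:
        problems.append(f"expected 1 channel, found {channels}")
    if channels == 2 and sample_width == 2:
        # Samples are interleaved L, R, L, R, ...
        samples = array.array("h", frames)
        if samples[0::2] == samples[1::2]:
            problems.append(
                "left and right channels are identical: "
                "stereo file with mono content"
            )
    return problems
```

An empty list means the file passed all three checks.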

How to convert to mono and remove silence with Audacity

example.mov
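If you prefer a script to Audacity, the same two steps can be approximated in Python. This is a rough sketch under stated assumptions, not a reimplementation of Audacity's tools: it handles 16-bit PCM samples only, averages channels to get mono, and shortens any quiet stretch longer than `max_silence_ms`. The function names, the `threshold` of 500, and the timing defaults are illustrative values you would need to tune by ear.

```python
import array
import sys
import wave


def downmix_to_mono(samples, channels):
    """Average interleaved 16-bit samples across channels."""
    if channels == 1:
        return array.array("h", samples)
    mono = array.array("h")
    for i in range(0, len(samples) - channels + 1, channels):
        mono.append(sum(samples[i:i + channels]) // channels)
    return mono


def truncate_silence(samples, rate, threshold=500,
                     chunk_ms=20, max_silence_ms=200):
    """Shorten quiet stretches to at most max_silence_ms.

    A chunk counts as quiet when its peak is below `threshold`.
    """
    chunk = max(1, rate * chunk_ms // 1000)
    max_quiet_chunks = max_silence_ms // chunk_ms
    out = array.array("h")
    quiet_run = 0
    for start in range(0, len(samples), chunk):
        block = samples[start:start + chunk]
        if max(abs(s) for s in block) < threshold:
            quiet_run += 1
            if quiet_run > max_quiet_chunks:
                continue  # drop silence beyond the allowed length
        else:
            quiet_run = 0
        out.extend(block)
    return out


def write_mono_16bit(path, samples, rate):
    """Write samples as a true mono, 16-bit PCM WAV file."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV data is little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())
```

Listen to the result before training: a threshold that is too high will cut into quiet breaths and word endings.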


Advanced Pre-Processing Tips