Creating your own voice model on is easy. To get the best possible voice model, build a high-quality dataset using the tips below. If you need additional support, join the Discord or get in touch with us.

Here’s what you need.

At least 15 minutes of vocals in total (more is better), recorded dry (no effects) and monophonic (one note at a time).

Bad Vocals

Stereo, reverb, delay


Good Vocals

Mono, clean tone, low noise


For best results, create a separate model for each distinct vocal style (for example, singing vs. rapping).
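A quick way to confirm you have reached the 15-minute mark is to add up your clip lengths. The sketch below uses Python's standard-library `wave` module. It assumes your clips are uncompressed PCM `.wav` files in a folder named `dataset`, which is a hypothetical layout, so adjust the path to match yours.

```python
import wave
from pathlib import Path

TARGET_MINUTES = 15  # minimum total length recommended above


def wav_duration_seconds(path):
    """Return the duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_minutes(paths):
    """Sum the durations of several WAV files and return minutes."""
    return sum(wav_duration_seconds(p) for p in paths) / 60


if __name__ == "__main__":
    folder = Path("dataset")  # hypothetical folder holding the training clips
    clips = sorted(folder.glob("*.wav")) if folder.is_dir() else []
    minutes = total_minutes(clips)
    status = "OK" if minutes >= TARGET_MINUTES else "add more audio"
    print(f"{len(clips)} files, {minutes:.1f} min total: {status}")
```

Because it reads only the file headers, this check is fast even on large datasets.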

Getting your file(s) ready.

Export each file as 16-bit lossless audio (.wav preferred), with silence removed and volume kept consistent.
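Before uploading, you can check each exported file's header for these settings. This is a minimal sketch using Python's standard `wave` module. It verifies 16-bit samples and a single (mono) channel, and it reports any file that `wave` cannot open (the module reads only uncompressed PCM WAV) as a problem. It cannot tell whether the vocals themselves are dry or low in noise.

```python
import sys
import wave


def check_wav_format(path):
    """Return a list of problems with a WAV file; an empty list means it passed.

    Only header properties are checked: 16-bit samples and one channel.
    """
    try:
        wav = wave.open(str(path), "rb")
    except (wave.Error, EOFError) as exc:
        # The stdlib reader only handles uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]
    problems = []
    with wav:
        if wav.getsampwidth() != 2:  # width is in bytes; 2 bytes = 16-bit
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
    return problems


if __name__ == "__main__":
    for name in sys.argv[1:]:
        issues = check_wav_format(name)
        print(name, "OK" if not issues else "; ".join(issues))
```

Saved under a hypothetical name such as `check_wav.py`, it could be run as `python check_wav.py dataset/*.wav`.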

Before: silence, inconsistent volume levels


After: truncated silence, consistent volume
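The before/after cleanup can also be approximated in code. This sketch is not the Audacity workflow referenced in this guide. It splits a 16-bit mono WAV file into 10 ms windows and shortens every run of near-silent windows to at most `max_pause_ms`. It then applies peak normalization, scaling the audio so the loudest sample reaches `target_peak` of full scale. The `threshold`, `max_pause_ms`, and `target_peak` defaults are illustrative assumptions, not values from this guide. Peak normalization evens out peak levels across files, which is not the same as matching perceived loudness.

```python
import sys
import wave
from array import array


def trim_and_normalize(src, dst, threshold=500, max_pause_ms=100, target_peak=0.9):
    """Truncate silent gaps and peak-normalize a 16-bit mono PCM WAV file.

    threshold:    absolute sample value (out of 32767) at or below which a
                  10 ms window counts as silent (illustrative default)
    max_pause_ms: longest stretch kept from each run of silent windows
    target_peak:  loudest sample after scaling, as a fraction of full scale
    Returns the number of samples written to dst.
    """
    with wave.open(str(src), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono WAV file")
        rate = wav.getframerate()
        samples = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    win = max(1, rate // 100)          # samples per 10 ms window
    keep_windows = max_pause_ms // 10  # silent windows kept per silent run

    kept = array("h")
    silent_run = 0
    for start in range(0, len(samples), win):
        chunk = samples[start:start + win]
        if max(abs(s) for s in chunk) > threshold:
            silent_run = 0
            kept.extend(chunk)
        else:
            silent_run += 1
            if silent_run <= keep_windows:  # keep only a short pause
                kept.extend(chunk)

    peak = max((abs(s) for s in kept), default=0)
    if peak <= threshold:
        raise ValueError("no audio above the silence threshold")

    # Peak normalization: scale so the loudest sample hits target_peak.
    gain = target_peak * 32767 / peak
    out = array("h", (max(-32768, min(32767, round(s * gain))) for s in kept))

    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(dst), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(out.tobytes())
    return len(out)
```

Pure-Python loops are slow on long recordings. The design keeps the example dependency-free, and a numerical library would be the usual choice for speed.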


Once you’ve compiled your vocals, the next step is to prepare your files for training:

How to convert to mono and remove silence with Audacity
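If you prefer a script to the Audacity guide, the mono conversion step can be sketched in Python. This is an alternative, not the Audacity method itself. It averages the left and right channels of a 16-bit stereo PCM WAV file using only the standard library. Averaging is one common downmix choice. It does not remove reverb or delay already present in the recording, and silence removal remains a separate step.

```python
import sys
import wave
from array import array


def stereo_to_mono(src, dst):
    """Downmix a 16-bit stereo WAV file to mono by averaging both channels.

    Returns the number of mono samples written to dst.
    """
    with wave.open(str(src), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = wav.getframerate()
        frames = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        frames.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved L, R, L, R, ...; average each pair.
    left, right = frames[0::2], frames[1::2]
    mono = array("h", ((l + r) // 2 for l, r in zip(left, right)))

    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(str(dst), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(mono.tobytes())
    return len(mono)
```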

Advanced Pre-Processing Tips