Special tokens to control pausing?
The model appears to randomly ignore periods or commas and the speech sounds a bit odd. Are there special pause characters or something to get it to stop and take a breath?
The model was trained on neither commas nor periods, so they are filtered out of the input text by a normalisation step in the pre-processing: https://github.com/huggingface/transformers/blob/910faa3e1f1c566b23a0318f78f5caf5bda8d3b2/src/transformers/models/vits/tokenization_vits.py#L127
Using hyphens is indeed the best option here
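To illustrate why commas and periods have no effect while hyphens can, here is a minimal sketch of vocabulary-based filtering. It is not the transformers source: `ASSUMED_VOCAB`, `filter_to_vocab`, and `commas_to_hyphens` are hypothetical names, and the vocabulary (lowercase letters, space, apostrophe, hyphen) is an assumption made for the example rather than the model's actual character set.

```python
# Simplified sketch of vocabulary-based text filtering.
# NOT the transformers implementation; all names here are hypothetical.
import string

# ASSUMPTION: an illustrative vocabulary with no commas or periods.
ASSUMED_VOCAB = set(string.ascii_lowercase) | {" ", "'", "-"}


def filter_to_vocab(text: str, vocab: set = ASSUMED_VOCAB) -> str:
    """Lowercase the text and drop every character that is not in the vocabulary."""
    return "".join(ch for ch in text.lower() if ch in vocab)


def commas_to_hyphens(text: str) -> str:
    """Workaround from this thread: swap commas and periods for hyphens."""
    return text.replace(",", " -").replace(".", " -")


sentence = "Hello, world. How are you?"

# Punctuation outside the vocabulary disappears before the model sees it.
plain = filter_to_vocab(sentence)

# Hyphens are in the assumed vocabulary, so they survive the filter.
hyphenated = filter_to_vocab(commas_to_hyphens(sentence))

print(plain)
print(hyphenated)
```

To check which characters the real model keeps, inspect `tokenizer.get_vocab()` on the tokenizer loaded with `AutoTokenizer.from_pretrained("facebook/mms-tts-eng")` instead of relying on the assumed set above.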
Will this model be retrained with commas? Even with hyphens, it still sounds very odd, only slightly better than with commas. Besides, I found that some words are pronounced incorrectly; you can test this with "library".
I've long since given up on this model. Microsoft's new TTS model is impeccable and produces more natural speech anyway.