A comparison of multi-style DNN-based TTS approaches using small datasets

Siniša Suzić; Tijana Delić; Vladimir Jovanović; Milan Sečujski; Darko Pekar; Vlado Delić

doi:10.1051/matecconf/201816103005

MATEC Web Conf., 161 (2018) 03005

Open Access

Issue: MATEC Web Conf. Volume 161, 2018, 13th International Scientific-Technical Conference on Electromechanics and Robotics “Zavalishin’s Readings” - 2018
Article Number: 03005
Number of page(s): 6
Section: Robotics and Automation
DOI: https://doi.org/10.1051/matecconf/201816103005
Published online: 18 April 2018

Siniša Suzić¹*, Tijana Delić¹, Vladimir Jovanović², Milan Sečujski¹, Darko Pekar² and Vlado Delić¹

¹ Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia
² AlfaNum Speech Technologies, 21000 Novi Sad, Serbia

* Corresponding author: sinisa.suzic@uns.ac.rs

Abstract

Studies have shown that people perceive interaction with computers, robots and media in the same way as they perceive social communication with other people. For that reason, it is critical for a high-quality text-to-speech (TTS) system to sound as human-like as possible. However, a major obstacle to creating expressive TTS voices is that the amount of style-specific speech available for training such a system is often insufficient. This paper compares different approaches to multi-style TTS, focusing on cases where only a small dataset per style is available. The approaches described were originally proposed for efficient modelling of multiple speakers with a limited amount of data per speaker. Among the approaches compared, the one based on style codes emerged as the best, regardless of the target speech style.
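The abstract does not specify how style codes enter the network. One common reading, borrowed from speaker-code methods in multi-speaker DNN-based TTS, is to append a one-hot code identifying the target style to every frame of the linguistic input of a single shared acoustic model. The following is a minimal sketch of that idea under stated assumptions: the style inventory (`STYLES`), the helper names, the layer sizes, the tanh activations and the random untrained weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical style inventory; the abstract does not list the styles used.
STYLES = ["neutral", "happy", "sad"]


def style_code(style: str) -> np.ndarray:
    """Return a one-hot style code (an assumed encoding) for the given style."""
    code = np.zeros(len(STYLES), dtype=np.float32)
    code[STYLES.index(style)] = 1.0
    return code


def condition_on_style(linguistic: np.ndarray, style: str) -> np.ndarray:
    """Append the same style code to every frame of a (frames, dims) feature matrix."""
    code = style_code(style)
    tiled = np.tile(code, (linguistic.shape[0], 1))
    return np.concatenate([linguistic, tiled], axis=1)


def feedforward(x: np.ndarray, weights: list, biases: list) -> np.ndarray:
    """Toy DNN: tanh hidden layers, then a linear layer predicting acoustic features."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ w + b)
    return x @ weights[-1] + biases[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, ling_dim, hidden, acoustic_dim = 5, 10, 16, 4
    # Random stand-in for per-frame linguistic features (hypothetical sizes).
    x = condition_on_style(rng.standard_normal((frames, ling_dim)), "happy")
    dims = [x.shape[1], hidden, hidden, acoustic_dim]
    weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(b) for b in dims[1:]]
    y = feedforward(x, weights, biases)
    print(x.shape, y.shape)  # (5, 13) (5, 4)
```

In this scheme every style shares all network weights and differs only in its input code, so speech from all styles contributes to training the shared layers; this is one plausible reason such an approach could suit small per-style datasets, though the abstract itself does not state the reason.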

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (http://creativecommons.org/licenses/by/4.0/).
