Faculty Scholarship

Comparison of Multi-Scale Speaker Vectors and S-Vectors for Zero-Shot Speech Synthesis

Abstract

We compare a novel speaker encoder model, called Multi-Scale Speaker (MSS) Vectors, with the state-of-the-art s-vectors model for zero-shot speech synthesis. The s-vectors model is built on a modified transformer self-attention network. The MSS vectors model extends the s-vectors model with a multi-scale approach. Results demonstrate that our model produces more natural and similar-sounding synthesized speech for unseen speakers in a zero-shot speech synthesis system.

Department(s)

Computer Science

Document Type

Conference Proceeding

DOI

10.1109/ISM55400.2022.00055

Keywords

speaker adaptation, speaker embedding, speaker encoder, text to speech

Publication Date

1-1-2022

Recommended Citation

Cory, Tristin and Iqbal, Razib, "Comparison of Multi-Scale Speaker Vectors and S-Vectors for Zero-Shot Speech Synthesis" (2022). Faculty Scholarship. 793.
https://bearworks.missouristate.edu/articles00/793

Journal Title

Proceedings - 2022 IEEE International Symposium on Multimedia, ISM 2022

