SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit
Depositor | dc.contributor | Yamagishi, Junichi | |
Funder | dc.contributor.other | EPSRC - Engineering and Physical Sciences Research Council | en_UK |
Funder | dc.contributor.other | The Royal Society of Edinburgh | en_UK |
Funder | dc.contributor.other | Japan Science & Technology Agency (JST). Core Research for Evolutionary Science and Technology (CREST) | en_UK |
Data Creator | dc.creator | Veaux, Christophe | |
Data Creator | dc.creator | Yamagishi, Junichi | |
Data Creator | dc.creator | MacDonald, Kirsten | |
Date Accessioned | dc.date.accessioned | 2017-04-04T09:21:53Z | |
Date Available | dc.date.available | 2017-04-04T09:21:53Z | |
Citation | dc.identifier.citation | Veaux, Christophe; Yamagishi, Junichi; MacDonald, Kirsten. (2017). CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, [sound]. University of Edinburgh. The Centre for Speech Technology Research (CSTR). https://doi.org/10.7488/ds/1994. | en |
Persistent Identifier | dc.identifier.uri | https://hdl.handle.net/10283/2651 | |
Persistent Identifier | dc.identifier.uri | https://doi.org/10.7488/ds/1994 | |
Dataset Description (abstract) | dc.description.abstract | ## This item has been replaced by the one which can be found at https://doi.org/10.7488/ds/2645 ## This CSTR VCTK Corpus (Centre for Speech Technology Voice Cloning Toolkit) includes speech data uttered by 109 native speakers of English with various accents. 96 kHz versions of the recordings are available at https://doi.org/10.7488/ds/2101. Each speaker reads out about 400 sentences, most of which were selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent. The newspaper texts were taken from The Herald (Glasgow), with permission from Herald & Times Group. Each speaker reads a different set of the newspaper sentences, where each set was selected using a greedy algorithm designed to maximise the contextual and phonetic coverage. The Rainbow Passage and elicitation paragraph are the same for all speakers. The Rainbow Passage can be found in the International Dialects of English Archive: (http://web.ku.edu/~idea/readings/rainbow.htm). The elicitation paragraph is identical to the one used for the speech accent archive (http://accent.gmu.edu). The details of the speech accent archive can be found at http://www.ualberta.ca/~aacl2009/PDFs/WeinbergerKunath2009AACL.pdf . All speech data were recorded using an identical recording setup: an omni-directional head-mounted microphone (DPA 4035), a 96 kHz sampling frequency at 24 bits, in a hemi-anechoic chamber of the University of Edinburgh. All recordings were converted to 16 bits, downsampled to 48 kHz based on SPTK, and manually end-pointed. This corpus was recorded for the purpose of building HMM-based text-to-speech synthesis systems, especially for speaker-adaptive HMM-based speech synthesis using average voice models trained on multiple speakers and speaker adaptation technologies. 
The file was previously available on the CSTR website, and was referenced in the Google DeepMind work on WaveNet: https://arxiv.org/pdf/1609.03499.pdf . Please note that, while text files containing transcripts of the speech are provided in the '/txt' folder for 108 of the 109 recordings, the 'p315' text was lost due to a hard disk error. | en_UK |
Language | dc.language.iso | eng | en_UK |
Publisher | dc.publisher | University of Edinburgh. The Centre for Speech Technology Research (CSTR) | en_UK |
Relation (Is Version Of) | dc.relation.isversionof | http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html | en_UK |
Relation (Is Referenced By) | dc.relation.isreferencedby | https://arxiv.org/pdf/1609.03499.pdf | en_UK |
Relation (Is Referenced By) | dc.relation.isreferencedby | van den Oord, A. et al., "WaveNet: A Generative Model for Raw Audio", arXiv:1609.03499v2 [cs.SD], 19 Sep 2016 | en_UK |
Relation (Is Referenced By) | dc.relation.isreferencedby | "WaveNet: A Generative Model for Raw Audio" https://deepmind.com/blog/wavenet-generative-model-raw-audio/ | en_UK |
Supersedes | dc.relation.replaces | https://doi.org/10.7488/ds/1495 | en_UK |
Superseded By | dc.relation.isreplacedby | https://doi.org/10.7488/ds/2645 | |
Rights | dc.rights | Creative Commons Attribution 4.0 International Public License | en |
Source | dc.source | The Rainbow Passage which the speakers read out can be found in the International Dialects of English Archive: (http://web.ku.edu/~idea/readings/rainbow.htm). | en_UK |
Source | dc.source | The elicitation paragraph which the speakers read out is identical to the one used for the speech accent archive (http://accent.gmu.edu). | en_UK |
Subject | dc.subject | speech synthesis | en_UK |
Subject | dc.subject | HMM | en_UK |
Subject Classification | dc.subject.classification | Mathematical and Computer Sciences::Speech and Natural Language Processing | en_UK |
Title | dc.title | SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit | en_UK |
Type | dc.type | sound | en_UK |
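The abstract states that each speaker's newspaper sentences were chosen with a greedy algorithm that maximises contextual and phonetic coverage, but this record does not give the details. The following is only a minimal sketch of greedy coverage selection in general. The names `coverage_units` and `greedy_select` are hypothetical, and letter bigrams are an assumed stand-in for the phonetic and contextual units actually used for the corpus.

```python
def coverage_units(sentence):
    """Return the set of units a sentence covers.

    Assumption: letter bigrams stand in for the phonetic/contextual
    units of the real selection procedure, which this record does not describe.
    """
    text = "".join(ch for ch in sentence.lower() if ch.isalpha() or ch == " ")
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(sentences, budget):
    """Pick up to `budget` sentences, each time taking the sentence
    that adds the most units not yet covered."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < budget:
        best = max(remaining, key=lambda s: len(coverage_units(s) - covered))
        gain = coverage_units(best) - covered
        if not gain:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered
```

Calling `greedy_select(candidate_sentences, 400)` on a pool of newspaper sentences would return a set of roughly the per-speaker size mentioned in the abstract, under the assumptions above.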
zip file MD5 Checksum:
20d3b2a5d79e2224a8c82e592b99ef36
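A downloaded copy of the zip file can be checked against the MD5 value above. The sketch below uses Python's standard `hashlib` module; the archive filename `VCTK-Corpus.zip` is an assumption, since this record does not name the file, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 value listed in this record for the zip file.
EXPECTED_MD5 = "20d3b2a5d79e2224a8c82e592b99ef36"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that a large archive does not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("VCTK-Corpus.zip")  # assumed filename, adjust as needed
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download the archive first")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.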
This item appears in the following Collection(s)
- Centre for Speech Technology Research (CSTR) research projects
- VCTK (Voice Cloning Toolkit)