{"title":"MaRePhoR — An open access machine-readable phonetic dictionary for Romanian","authors":"Stefan-Adrian Toma, Adriana Stan, Mihai-Lica Pura, Traian Barsan","doi":"10.1109/SPED.2017.7990435","DOIUrl":null,"url":null,"abstract":"This paper introduces a novel open access resource, the machine-readable phonetic dictionary for Romanian — MaRePhoR. It contains over 70,000 word entries, and their manually performed phonetic transcription. The paper describes the dictionary format and statistics, as well as an initial use of the phonetic transcription entries by building a grapheme to phoneme converter based on decision trees. Various training strategies were tested enabling the correct selection of a final setup for our predictor. The best results showed that using the dictionary as training data, an accuracy of over 99% can be achieved.","PeriodicalId":345314,"journal":{"name":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","volume":"420 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SPED.2017.7990435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
This paper introduces a novel open access resource, the machine-readable phonetic dictionary for Romanian — MaRePhoR. It contains over 70,000 word entries, and their manually performed phonetic transcription. The paper describes the dictionary format and statistics, as well as an initial use of the phonetic transcription entries by building a grapheme to phoneme converter based on decision trees. Various training strategies were tested enabling the correct selection of a final setup for our predictor. The best results showed that using the dictionary as training data, an accuracy of over 99% can be achieved.