Building a Persian-English OMProDat Database Read by Persian Speakers
Mortaza Taheri-Ardali, D. Hirst
Speech Prosody 2022 (published 2022-05-23)
DOI: 10.21437/speechprosody.2022-90
Citation count: 0
Abstract
OMProDat is an open multilingual prosodic database that aims to collect, archive and distribute recordings and annotations of directly comparable data from different languages. As part of the OMProDat project, this paper describes the creation of a bilingual Persian-English prosodic database read by native speakers of Persian. The collection contains 40 continuous, thematically connected paragraphs of five sentences each, originally created during the European SAM project. It was recorded by ten speakers of standard Persian (five male, five female), all from monolingual families. The Persian texts were romanised and transcribed phonetically using SAMPA, an ASCII-based phonetic alphabet. TextGrid annotations will be obtained semi-automatically from the recordings and their orthographic transcriptions using the SPPAS alignment software, and the Momel and INTSINT algorithms will be used to provide prosodic annotation of the corpus. The resulting data will allow us to compare the speakers' production of Persian (L1) and English (L2). A cross-linguistic comparison with the other languages in OMProDat will also be straightforward.
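TextGrid is the annotation file format of the Praat program, and SPPAS can save its alignments as TextGrids. The Python below is a minimal sketch of what such a file looks like. It assumes a single word-level interval tier and uses a hypothetical helper, `textgrid_from_intervals`, which is not part of SPPAS or Praat. It only illustrates Praat's long text format; in the corpus itself the time-aligned labels come from SPPAS.

```python
from typing import List, Tuple

# (start time in s, end time in s, label); hypothetical representation
Interval = Tuple[float, float, str]


def _quote(label: str) -> str:
    # Praat escapes a double quote inside a string by doubling it.
    return '"' + label.replace('"', '""') + '"'


def textgrid_from_intervals(tier_name: str, intervals: List[Interval]) -> str:
    """Serialise one IntervalTier as a Praat long-format TextGrid string."""
    if not intervals:
        raise ValueError("need at least one interval")
    for s, e, _ in intervals:
        if e <= s:
            raise ValueError(f"empty or reversed interval {s}-{e}")
    # An IntervalTier must tile the time axis with no gaps or overlaps.
    for (_, e0, _), (s1, _, _) in zip(intervals, intervals[1:]):
        if abs(e0 - s1) > 1e-9:
            raise ValueError(f"intervals not contiguous at {e0} / {s1}")
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {_quote(tier_name)}",
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    # One block per interval, numbered from 1 as Praat expects.
    for i, (s, e, label) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {s}",
            f"            xmax = {e}",
            f"            text = {_quote(label)}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative labels only; an empty label marks a silent interval here.
    words = [(0.0, 0.35, ""), (0.35, 0.8, "the"), (0.8, 1.1, "cat"), (1.1, 1.4, "")]
    print(textgrid_from_intervals("words", words))
```

The contiguity check is included because Praat's interval tiers must cover the whole time span with adjacent intervals. Point tiers, such as one holding Momel target points, use a different structure that this sketch does not cover.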