Auto-Landmark: Acoustic Landmark Dataset and Open-Source Toolkit for Landmark Extraction
Xiangyu Zhang, Daijiao Liu, Tianyi Xiao, Cihan Xiao, Tuende Szalay, Mostafa Shahin, Beena Ahmed, Julien Epps
arXiv - EE - Audio and Speech Processing, 2024-09-12 (arXiv:2409.07969)
Citations: 0
Abstract
In the speech signal, acoustic landmarks identify times when the acoustic
manifestations of the linguistically motivated distinctive features are most
salient. Acoustic landmarks have been widely applied in various domains,
including speech recognition, speech-based depression detection, clinical analysis of
speech abnormalities, and the detection of disordered speech. However, no
currently available dataset provides precise timing information for landmarks,
even though such timing has been shown to be crucial for downstream
applications involving landmarks. In this paper, we selected the most useful
acoustic landmarks identified in previous research and annotated the TIMIT
dataset with them, using a combination of phoneme boundary information and
manual inspection.
Moreover, previous landmark extraction tools were neither open source nor
benchmarked. To address this, we developed an open-source, Python-based
landmark extraction tool and established a series of landmark detection
baselines. The dataset with precise landmark timing information, the landmark
extraction tool, and the baselines are the first of their kind and are designed
to support a wide variety of future research.
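The annotation approach described above starts from phoneme boundary information. As a rough illustration of that first pass, the sketch below proposes candidate landmark times from TIMIT's time-aligned phone transcriptions. It is a minimal sketch under stated assumptions, not the authors' toolkit: the landmark names (`burst`, `vowel_onset`, `vowel_offset`), the phone classes, and the boundary-to-landmark rules are hypothetical, and the helpers `parse_phn` and `candidate_landmarks` are invented for this example. It assumes only the standard TIMIT `.PHN` layout (one `start end label` line per phone, with sample indices at 16 kHz). The paper's actual landmark inventory and its manual-inspection step are not reproduced here.

```python
"""Hypothetical sketch: propose candidate landmark times from TIMIT .PHN boundaries.

Assumptions (not taken from the paper): the landmark names, phone classes and
boundary rules below are illustrative only. Real annotation would still need
manual inspection of the signal.
"""
from __future__ import annotations

from dataclasses import dataclass

SAMPLE_RATE = 16_000  # TIMIT audio is sampled at 16 kHz

# TIMIT stop-release labels (the closures are bcl, dcl, gcl, pcl, tcl, kcl).
STOP_RELEASES = {"b", "d", "g", "p", "t", "k"}
# TIMIT vowel labels.
VOWELS = {"iy", "ih", "eh", "ey", "ae", "aa", "aw", "ay", "ah", "ao",
          "oy", "ow", "uh", "uw", "ux", "er", "ax", "ix", "axr", "ax-h"}


@dataclass
class Phone:
    start: int  # first sample of the phone
    end: int    # sample where the phone ends
    label: str  # TIMIT phone label


@dataclass
class Landmark:
    time_s: float  # candidate landmark time in seconds
    kind: str      # hypothetical landmark type name


def parse_phn(text: str) -> list[Phone]:
    """Parse the contents of a TIMIT .PHN file ("start end label" per line)."""
    phones = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, label = line.split()
        phones.append(Phone(int(start), int(end), label))
    return phones


def candidate_landmarks(phones: list[Phone]) -> list[Landmark]:
    """Map phone boundaries to candidate landmarks using hypothetical rules."""
    landmarks = []
    for prev, cur in zip(phones, phones[1:]):
        t = cur.start / SAMPLE_RATE  # boundary time in seconds
        if cur.label in STOP_RELEASES:
            # Start of a stop release: candidate burst landmark.
            landmarks.append(Landmark(t, "burst"))
        if cur.label in VOWELS and prev.label not in VOWELS:
            # Entering a vowel: candidate onset landmark.
            landmarks.append(Landmark(t, "vowel_onset"))
        if prev.label in VOWELS and cur.label not in VOWELS:
            # Leaving a vowel: candidate offset landmark.
            landmarks.append(Landmark(t, "vowel_offset"))
    return landmarks


if __name__ == "__main__":
    # Made-up toy transcription of "pot"; not real TIMIT data.
    toy = "0 1600 h#\n1600 2400 pcl\n2400 2800 p\n2800 5600 aa\n5600 6400 tcl\n6400 6720 t\n6720 8000 h#\n"
    for lm in candidate_landmarks(parse_phn(toy)):
        print(f"{lm.time_s:.3f}s {lm.kind}")
```

On the made-up toy input, this proposes a burst at 0.150 s, a vowel onset at 0.175 s, a vowel offset at 0.350 s and a second burst at 0.400 s. Phone boundaries only approximate where a landmark is acoustically most salient, so candidates like these would still need the manual inspection the abstract describes before they could serve as reference labels.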