Praditor: A DBSCAN-based automation for speech onset detection.

IF 3.9 | Zone 2, Psychology | JCR Q1, PSYCHOLOGY, EXPERIMENTAL

Behavior Research Methods Pub Date : 2025-08-04 DOI:10.3758/s13428-025-02776-2

Zhengyuan Liu, Xinqi Yu, Wing Chung Hu, Yunxiao Ma, Ruiming Wang, Haoyun Zhang

Behavior Research Methods, vol. 57, no. 9, p. 247.

Citations: 0

Abstract

Speech onset time (SOT) serves as a critical parameter in speech production research, marking the transition from background noise to the start of the speech signal. While manual annotation remains the gold standard for identifying SOT, its labor-intensive nature can result in considerable fatigue, thereby jeopardizing the accuracy of the annotation. Here, we present Praditor, a semi-automatic speech onset detection tool, leveraging a combination of algorithms consisting of density-based spatial clustering of applications with noise (DBSCAN) and first-derivative thresholding. Praditor offers a user-friendly experience across major platforms, including Windows and macOS, eliminating the need for complex setup procedures and offering a GUI that facilitates the tuning procedure. Furthermore, Praditor is capable of processing both multiple-onset and single-onset audio files regardless of language, and generates a TextGrid file for subsequent verification. To assess the accuracy of Praditor, we compared time difference (TD) scores and executed a linear regression analysis between manual and automatic annotations. Results showed that Praditor was highly accurate in both Mandarin and English datasets, as about 90% of the annotations fell within the range of ±20 ms, with corpus-level tuning achieving slightly lower but acceptable accuracy with respect to file-level tuning. This semi-automatic method is expected to offer a general solution for speech onset annotation in a language-independent manner, catering to not only experienced programmers but also users with little to no prior experience. Praditor is openly available on its official GitHub repository (https://github.com/Paradeluxe/Praditor).
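The evaluation described in the abstract (TD scores, the share of annotations within ±20 ms, and a linear regression between manual and automatic onsets) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names are hypothetical, the sign convention (automatic minus manual) is an assumption, and the onset times are made up purely for illustration.

```python
from statistics import mean


def time_differences(manual_ms, automatic_ms):
    """Return TD scores in ms, assumed here to be automatic minus manual."""
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("annotation lists must have the same length")
    return [a - m for m, a in zip(manual_ms, automatic_ms)]


def proportion_within(td_scores, tolerance_ms=20.0):
    """Fraction of TD scores whose absolute value is within the tolerance."""
    return sum(abs(td) <= tolerance_ms for td in td_scores) / len(td_scores)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up onset times in ms, for illustration only (not data from the paper).
manual = [512.0, 640.0, 705.0, 830.0, 990.0]
automatic = [518.0, 633.0, 731.0, 826.0, 1001.0]

td = time_differences(manual, automatic)
share = proportion_within(td)
slope, intercept = fit_line(manual, automatic)

print(f"TD scores (ms): {td}")
print(f"Share within ±20 ms: {share:.0%}")
print(f"Regression: automatic = {slope:.3f} * manual + {intercept:.1f}")
```

A slope close to 1 and an intercept close to 0 would indicate that automatic onsets track manual ones without systematic drift or offset; the ±20 ms tolerance mirrors the range quoted in the abstract.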


Source journal

Behavior Research Methods Multiple-

CiteScore: 10.30

Self-citation rate: 9.30%

Articles published: 266

About the journal: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.