{"title":"使打字成绩单的人工评分成为过去:赫尔曼评论(2025)。","authors":"Hans Rutger Bosker","doi":"10.1080/2050571X.2025.2514395","DOIUrl":null,"url":null,"abstract":"<p><p>Coding the accuracy of typed transcripts from experiments testing speech intelligibility is an arduous endeavour. A recent study in this journal [Herrmann, B. 2025. Leveraging natural language processing models to automate speech-intelligibility scoring. <i>Speech, Language and Hearing, 28</i>(1)] presents a novel approach for automating the scoring of such listener transcripts, leveraging Natural Language Processing (NLP) models. It involves the calculation of the semantic similarity between transcripts and target sentences using high-dimensional vectors, generated by such NLP models as ADA2, GPT2, BERT, and USE. This approach demonstrates exceptional accuracy, with negligible underestimation of intelligibility scores (by about 2-4%), numerically outperforming simpler computational tools like Autoscore and TSR. The method uniquely relies on semantic representations generated by large language models. At the same time, these models also form the Achilles heel of the technique: the transparency, accessibility, data security, ethical framework, and cost of the selected model directly impact the suitability of the NLP-based scoring method. Hence, working with such models can raise serious risks regarding the reproducibility of scientific findings. This in turn emphasises the need for fair, ethical, and evidence-based open source models. 
With such models, Herrmann's new tool represents a valuable addition to the speech scientist's toolbox.</p>","PeriodicalId":43000,"journal":{"name":"Speech Language and Hearing","volume":"28 1","pages":"2514395"},"PeriodicalIF":0.9000,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312738/pdf/","citationCount":"0","resultStr":"{\"title\":\"Making manual scoring of typed transcripts a thing of the past: a commentary on Herrmann (2025).\",\"authors\":\"Hans Rutger Bosker\",\"doi\":\"10.1080/2050571X.2025.2514395\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Coding the accuracy of typed transcripts from experiments testing speech intelligibility is an arduous endeavour. A recent study in this journal [Herrmann, B. 2025. Leveraging natural language processing models to automate speech-intelligibility scoring. <i>Speech, Language and Hearing, 28</i>(1)] presents a novel approach for automating the scoring of such listener transcripts, leveraging Natural Language Processing (NLP) models. It involves the calculation of the semantic similarity between transcripts and target sentences using high-dimensional vectors, generated by such NLP models as ADA2, GPT2, BERT, and USE. This approach demonstrates exceptional accuracy, with negligible underestimation of intelligibility scores (by about 2-4%), numerically outperforming simpler computational tools like Autoscore and TSR. The method uniquely relies on semantic representations generated by large language models. At the same time, these models also form the Achilles heel of the technique: the transparency, accessibility, data security, ethical framework, and cost of the selected model directly impact the suitability of the NLP-based scoring method. Hence, working with such models can raise serious risks regarding the reproducibility of scientific findings. 
This in turn emphasises the need for fair, ethical, and evidence-based open source models. With such models, Herrmann's new tool represents a valuable addition to the speech scientist's toolbox.</p>\",\"PeriodicalId\":43000,\"journal\":{\"name\":\"Speech Language and Hearing\",\"volume\":\"28 1\",\"pages\":\"2514395\"},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2025-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312738/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Speech Language and Hearing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/2050571X.2025.2514395\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q3\",\"JCRName\":\"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Speech Language and Hearing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/2050571X.2025.2514395","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q3","JCRName":"AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY","Score":null,"Total":0}
Coding the accuracy of typed transcripts from experiments testing speech intelligibility is an arduous endeavour. A recent study in this journal [Herrmann, B. 2025. Leveraging natural language processing models to automate speech-intelligibility scoring. Speech, Language and Hearing, 28(1)] presents a novel approach for automating the scoring of such listener transcripts, leveraging Natural Language Processing (NLP) models. It computes the semantic similarity between transcripts and target sentences using high-dimensional vectors generated by NLP models such as ADA2, GPT2, BERT, and USE. This approach demonstrates exceptional accuracy, with negligible underestimation of intelligibility scores (by about 2-4%), numerically outperforming simpler computational tools like Autoscore and TSR. The method uniquely relies on semantic representations generated by large language models. At the same time, these models also form the Achilles' heel of the technique: the transparency, accessibility, data security, ethical framework, and cost of the selected model directly impact the suitability of the NLP-based scoring method. Hence, working with such models poses serious risks to the reproducibility of scientific findings. This in turn emphasises the need for fair, ethical, and evidence-based open-source models. With such models, Herrmann's new tool represents a valuable addition to the speech scientist's toolbox.
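The core idea described above — comparing a listener's transcript to the target sentence via the similarity of their embedding vectors — can be sketched in a few lines. This is a minimal illustration, not Herrmann's implementation: the three-dimensional vectors below are toy stand-ins for the high-dimensional embeddings a model like BERT or USE would produce, and the 0.8 decision threshold is an assumed, illustrative value.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def score_transcript(transcript_vec, target_vec, threshold=0.8):
    """Mark a transcript correct if its embedding lies close enough
    to the target sentence's embedding (threshold is illustrative)."""
    return cosine_similarity(transcript_vec, target_vec) >= threshold

# Toy stand-ins for embeddings of a target sentence, a near-verbatim
# transcript, and an unrelated response.
target = [0.9, 0.1, 0.2]
close = [0.85, 0.15, 0.25]
far = [0.1, 0.9, 0.3]

print(score_transcript(close, target))  # high similarity -> True
print(score_transcript(far, target))    # low similarity  -> False
```

In practice the vectors would come from an embedding model, and the similarity score (rather than a hard threshold) could be averaged over trials to estimate intelligibility — which is where the choice of model, and its transparency and cost, directly shapes the results.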