An interpretable method for automated classification of spoken transcripts and written text.

IF 2.3 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Mattias Wahde, Marco L Della Vedova, Marco Virgolin, Minerva Suvanto
Journal: Evolutionary Intelligence, pp. 1-13
Published: 2023-05-04
Publication type: Journal Article
DOI: 10.1007/s12065-023-00851-1 (https://doi.org/10.1007/s12065-023-00851-1)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10157555/pdf/
Citations: 0

Abstract



We investigate the differences between spoken language (in the form of radio show transcripts) and written language (Wikipedia articles) in the context of text classification. We present a novel, interpretable method for text classification, involving a linear classifier using a large set of n-gram features, and apply it to a newly generated data set with sentences originating either from spoken transcripts or written text. Our classifier reaches an accuracy less than 0.02 below that of a commonly used classifier (DistilBERT) based on deep neural networks (DNNs). Moreover, our classifier has an integrated measure of confidence, for assessing the reliability of a given classification. An online tool is provided for demonstrating our classifier, particularly its interpretable nature, which is a crucial feature in classification tasks involving high-stakes decision-making. We also study the capability of DistilBERT to carry out fill-in-the-blank tasks in either spoken or written text, and find it to perform similarly in both cases. Our main conclusion is that, with careful improvements, the performance gap between classical methods and DNN-based methods may be reduced significantly, such that the choice of classification method comes down to the need (if any) for interpretability.
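The abstract describes a linear classifier over n-gram features whose decisions can be inspected. The following is a minimal sketch of that general idea only, not the authors' method: it assumes a plain perceptron update, word uni- and bigrams, and four invented toy sentences. The function names (`ngrams`, `train`, `explain`) and all data are hypothetical.

```python
from collections import defaultdict


def ngrams(sentence, max_n=2):
    """Return all word n-grams of length 1..max_n as space-joined strings."""
    tokens = sentence.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def train(examples, epochs=10, max_n=2):
    """Perceptron training (an assumption, not the paper's procedure).

    examples: list of (sentence, label), label +1 = spoken, -1 = written.
    Returns a dict mapping each n-gram to its learned weight.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence, label in examples:
            feats = ngrams(sentence, max_n)
            score = sum(weights[f] for f in feats)
            if label * score <= 0:  # wrong or undecided: nudge the weights
                for f in feats:
                    weights[f] += label
    return dict(weights)


def explain(sentence, weights, max_n=2):
    """Classify a sentence and return the per-n-gram contributions."""
    contributions = [(f, weights.get(f, 0.0)) for f in ngrams(sentence, max_n)]
    score = sum(w for _, w in contributions)
    label = "spoken" if score > 0 else "written"
    return label, score, contributions


# Invented toy data, for illustration only.
toy_data = [
    ("well you know i think that is right", +1),
    ("the city is located in the northern region", -1),
    ("uh yeah i mean it was great", +1),
    ("it was founded in the nineteenth century", -1),
]
weights = train(toy_data)

label, score, contributions = explain("you know i mean it", weights)
print(label, score)
for feature, weight in contributions:
    if weight != 0.0:
        print(f"  {feature!r}: {weight:+.1f}")
```

Because the score is a plain sum of per-n-gram weights, each decision can be traced to the n-grams that pushed it toward one class. The magnitude of the score gives only a crude indication of how decisive a classification is; it is not the integrated confidence measure mentioned in the abstract.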

Source journal: Evolutionary Intelligence (Computer Science, Artificial Intelligence)
CiteScore: 6.80
Self-citation rate: 0.00%
Articles published: 108
About the journal: This journal provides an international forum for the timely publication and dissemination of foundational and applied research in the domain of Evolutionary Intelligence. The spectrum of emerging fields in contemporary artificial intelligence, including Big Data, Deep Learning, and Computational Neuroscience, bridged with evolutionary computing and other population-based search methods, constitutes the flag of the Evolutionary Intelligence journal. Topics of interest for Evolutionary Intelligence refer to different aspects of evolutionary models of computation empowered with intelligence-based approaches, including but not limited to architectures, model optimization and tuning, machine learning algorithms, life-inspired adaptive algorithms, swarm-oriented strategies, high-performance computing, and massive data processing, with applications to domains such as computer vision, image processing, simulation, robotics, computational finance, media, internet of things, medicine, bioinformatics, smart cities, and similar. Surveys outlining the state of the art in specific subfields and applications are welcome.