Qualitative Analysis of Semantic Language Models

Ancient Manuscripts in Digital Culture. Pub Date: 2019-05-14. DOI: 10.1163/9789004399297_007

Thibault Clérice, M. Munson


Abstract

The task of automatically extracting semantic information from raw textual data is an increasingly important topic in computational linguistics and has begun to make its way into non-linguistic humanities research.[1] That this task has been accepted as an important one in computational linguistics is shown by its appearance in the standard textbooks and handbooks of the field, such as Manning and Schuetze, Foundations of Statistical Natural Language Processing,[2] and Jurafsky and Martin, Speech and Language Processing.[3] According to the Association for Computational Linguistics Wiki,[4] 25 published experiments since 1997 have used the TOEFL (Test of English as a Foreign Language) standardized synonym questions to test the performance of algorithmic extraction of semantic information, with scores ranging from 20% to 100% accuracy.

The question addressed by this paper, however, is not whether semantic information can be automatically extracted from textual data. The studies listed in the preceding paragraph have already proven this. Nor is the paper an attempt to find the best algorithm for doing so. Instead, it aims to make this widely used and accepted task more useful outside of purely linguistic studies by considering how one can qualitatively assess the results returned by such algorithms. That is, it aims to move the assessment of the results returned by semantic extraction algorithms closer to the actual hermeneutical tasks carried out in, for example, the historical, cultural, or theological interpretation of texts. We believe that this critical projection of algorithmic results back onto the
