Use of Computer and Corpus Tools in the Research of a 19th Century German-Language Manuscript Book of Notes and Extracts

Journal of Linguistics/Jazykovedný casopis. Pub Date: 2023-06-01. DOI: 10.2478/jazcas-2023-0046

Martin Braxatoris, Anita Braxatorisová

Volume: 67 1. Pages: 287-300. Publication type: Journal Article.

Citations: 0

Abstract


Use of Computer and Corpus Tools in the Research of a 19th Century German-Language Manuscript Book of Notes and Extracts

The study explores the possibilities of using computer and corpus tools in the interpretation of texts of the genre of book of notes and extracts; these are documents consisting of extracts and modified excerpts from contemporary press and literature, records of the author’s own thoughts, etc. Samuel Ferjenčík’s manuscript is a German-language document by a Slovak author intended for private use; cited or adapted passages are usually given without any reference to the source. The paper introduces the problems of automatic identification of the source base, which relate to the application of OCR and content similarity detection tools. It discusses the results of text matching, which revealed several manipulations of source texts, especially substitutions, indicating attitudes and priority problems in the author’s thought-world. It further interprets the results of the use of the Sketch Engine corpus manager tools by which the frequency of occurrence of key terms and their collocability were investigated, paying special attention to substituted words. The paper is an example of the application of computer and corpus-linguistics methods to the interpretation of literary texts, which is represented by a number of current studies in the field of digital humanities. The proposed approaches are applicable to research on other books of notes and extracts, topical in the context of research trends related to egodocuments, as well as to textual research on monuments of other genres.
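As an illustration of the kind of text matching the abstract describes, the following minimal sketch aligns a source passage with a manuscript passage word by word and lists the spans where wording was replaced. It is not the authors' tooling: the function name `find_substitutions` is hypothetical, the two German sentences are invented for demonstration, and Python's standard `difflib` stands in for whatever similarity detection software the study actually used.

```python
from difflib import SequenceMatcher


def find_substitutions(source: str, manuscript: str) -> list[tuple[str, str]]:
    """Return (source_wording, manuscript_wording) pairs for replaced spans."""
    # Whitespace tokenization is a simplifying assumption; real OCR output
    # would need normalization of spelling, punctuation, and line breaks.
    src_tokens = source.split()
    ms_tokens = manuscript.split()
    matcher = SequenceMatcher(a=src_tokens, b=ms_tokens, autojunk=False)

    substitutions = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" marks a span present in both texts but worded differently;
        # pure insertions and deletions are ignored in this sketch.
        if tag == "replace":
            substitutions.append(
                (" ".join(src_tokens[i1:i2]), " ".join(ms_tokens[j1:j2]))
            )
    return substitutions


# Invented example sentences, not taken from the manuscript or its sources.
source = "Die Freiheit des Volkes ist heilig"
manuscript = "Die Freiheit der Nation ist heilig"
print(find_substitutions(source, manuscript))
```

The substituted words found this way could then be looked up in a corpus manager to examine their frequency and collocations, which is the second step the abstract mentions.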


Source journal

Journal of Linguistics/Jazykovedný casopis

Self-citation rate

0.00%

Articles published