Extracting and Tagging Unstructured Citation of a Hebrew Religious Document

Dror Mughaz, Yaakov HaCohen-Kerner, D. Gabbay
Proceedings of the 2019 InSITE Conference. Published 2019-06-06. DOI: 10.28945/4345 (https://doi.org/10.28945/4345)
Citations: 1

Abstract

Aim/Purpose: Finding and tagging citations in ancient Hebrew religious documents. These documents have no structured citations and no bibliography.

Background: We look for common patterns within Hebrew religious texts.

Methodology: We developed a method that goes over the texts and extracts sentences containing the names of three famous authors. Within these sentences we find common ways of addressing those three authors, and with these patterns we find references to various other authors.

Contribution: This type of text is rich in citations and references to authors, but because the references have no fixed structure, it is very difficult for a computer to identify them automatically. We hope that the method we have developed will make it easier for a computer to identify references and even turn them into hyperlinks.

Findings: We provide an algorithm for the problem of unstructured citations in old Hebrew plain text. The algorithm found many citations, but it missed some types of citations.

Impact on Society: When the computer recognizes references, it will be able to build (at least partially) a bibliography that currently does not exist in such texts at all. Over time, more and more ancient texts are being scanned with OCR. This method can make people's access to and understanding of these texts much easier.

Future Research: After we identify the references, we plan to automatically create a bibliography for these texts and even transform those references into hyperlinks.
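
The two-step idea in the Methodology (learn how a few known authors are addressed, then reuse those phrasings to spot other authors) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the two-word context window, the frequency threshold, the naive sentence splitter, and the English toy text with placeholder names `AuthorA` to `AuthorE` are all assumptions made for illustration. Real Hebrew text would need proper tokenization and handling of prefixes and abbreviations.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption; real texts need more care)."""
    return [s.strip() for s in re.split(r"[.!?]\s*", text) if s.strip()]


def learn_patterns(sentences, seed_names, window=2, min_count=2):
    """Count the `window` words preceding each seed name; keep frequent contexts."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for name in seed_names:
            name_words = name.split()
            n = len(name_words)
            for i in range(window, len(words) - n + 1):
                if words[i:i + n] == name_words:
                    context = " ".join(w.lower() for w in words[i - window:i])
                    counts[context] += 1
    return [pattern for pattern, c in counts.items() if c >= min_count]


def find_candidates(sentences, patterns, seed_names, name_len=1):
    """Return the word(s) following a learned pattern, excluding the seed names."""
    found = set()
    for sentence in sentences:
        words = sentence.split()
        for pattern in patterns:
            p = pattern.split()
            for i in range(len(words) - len(p) - name_len + 1):
                if [w.lower() for w in words[i:i + len(p)]] == p:
                    start = i + len(p)
                    candidate = " ".join(words[start:start + name_len])
                    if candidate not in seed_names:
                        found.add(candidate)
    return found


# Toy data: three hypothetical seed authors and two unknown ones.
text = (
    "As wrote AuthorA in his book. So ruled AuthorB on this matter. "
    "As wrote AuthorC about the law. So ruled AuthorA as well. "
    "And as wrote AuthorD in another place. But so ruled AuthorE later."
)
seeds = {"AuthorA", "AuthorB", "AuthorC"}

sentences = split_sentences(text)
patterns = learn_patterns(sentences, seeds)
candidates = find_candidates(sentences, patterns, seeds)

print(sorted(patterns))    # ['as wrote', 'so ruled']
print(sorted(candidates))  # ['AuthorD', 'AuthorE']
```

In this sketch a pattern is just the pair of words before a name, so it only catches citations phrased like the seed examples. That is consistent with the Findings: a pattern-based approach can find many citations while still missing citation types that the seed authors never appear in.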