Exploring Sacred Texts: Leveraging Computer Science for Dataset Similarity Analysis in Religious Studies

Muhammad Raffiudin
Engineering Headway. Journal article, published 2024-04-16.
DOI: 10.4028/p-ke3xms (https://doi.org/10.4028/p-ke3xms)
Citations: 0

Abstract

Studying the Quran and the Hadith side by side helps show that the two are the principal sources of Islamic knowledge and law. Many well-known preachers and scholars have debated the similarities between these scriptures. Technology offers an alternative way to approach the question. There are at least two broad approaches to determining text similarity, the vector space model and semantic similarity, each of which defines a similarity or distance between texts. The similarity between words is often represented by the similarity between the concepts associated with those words. This paper presents a method for measuring semantic similarity between sentences drawn from the two datasets, based on the semantic relations between word senses in different synsets, using WordNet path similarity and Wu-Palmer similarity. The method was evaluated and reached acceptable accuracy. Both path similarity and Wu-Palmer similarity successfully identify similar sentence pairs, but their accuracy differs slightly: Wu-Palmer similarity outperforms path similarity when matching sentences between the Sahih International translation of the Quran and a translation of An-Nawawi's Forty Hadith. Looking ahead, the results might be improved by applying term weighting such as term frequency-inverse document frequency (TF-IDF), combining the results of several WordNet similarity steps, using vector space models, and using optimal matching methods.
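The two WordNet measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tiny is-a taxonomy `PARENT` is a hypothetical stand-in for WordNet, the function names are invented for illustration, and the best-match averaging used to lift word scores to sentence scores is an assumption, since the abstract does not say how sentences are aggregated. Path similarity is taken as 1 / (1 + shortest path length), and Wu-Palmer similarity as twice the depth of the deepest common ancestor divided by the sum of the two concept depths.

```python
# Hypothetical toy is-a taxonomy standing in for WordNet: child -> parent.
PARENT = {
    "entity": None,
    "act": "entity",
    "worship": "act",
    "prayer": "worship",
    "fasting": "worship",
    "charity": "act",
    "person": "entity",
    "believer": "person",
    "traveler": "person",
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def deepest_common_ancestor(a, b):
    """Walk upward from b and return the first node that is also above a."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest is-a path between a and b)."""
    common = deepest_common_ancestor(a, b)
    edges = (depth(a) - depth(common)) + (depth(b) - depth(common))
    return 1.0 / (1.0 + edges)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(common ancestor) / (depth(a) + depth(b))."""
    common = deepest_common_ancestor(a, b)
    return 2.0 * depth(common) / (depth(a) + depth(b))


def sentence_similarity(words_a, words_b, word_sim):
    """Assumed aggregation: average each word's best match, in both directions."""
    def directed(src, dst):
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))


if __name__ == "__main__":
    s1 = ["prayer", "believer"]
    s2 = ["fasting", "traveler"]
    print("path:", sentence_similarity(s1, s2, path_similarity))
    print("wup: ", sentence_similarity(s1, s2, wup_similarity))
```

In this toy taxonomy the two measures disagree on how close sibling concepts are: Wu-Palmer rewards a deep shared ancestor, while path similarity only counts edges. That difference is one plausible reason the two measures could yield slightly different accuracy on the same sentence pairs.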