Exploring Sacred Texts: Leveraging Computer Science for Dataset Similarity Analysis in Religious Studies

Engineering Headway | Pub Date: 2024-04-16 | DOI: 10.4028/p-ke3xms

Muhammad Raffiudin


Citations: 0

Abstract

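Studying the Quran and the Hadith side by side highlights their role as the two fundamental sources of Islamic knowledge and law. Many prominent preachers and scholars have debated the similarities between these two scriptures, and computational methods offer an alternative way to examine the question. There are at least two broad approaches to measuring text similarity, the vector space model and semantic similarity, each of which defines either a similarity or a distance between texts. The similarity between two words is often represented by the similarity between the concepts associated with them. This paper presents a method for identifying the semantic similarity between sentences from the two datasets, based on the semantic relations between word senses in different WordNet synsets and scored with WordNet path similarity and Wu-Palmer similarity. An evaluation shows that the method achieves acceptable accuracy. Both path similarity and Wu-Palmer similarity successfully identify similarity between sentence pairs, but their accuracy differs slightly: Wu-Palmer similarity outperforms path similarity when matching sentences between the Sahih International translation of the Quran and the An-Nawawi Forty Hadith Translation. Future work may improve these results by applying weighting factors such as term frequency-inverse document frequency (TF-IDF), combining the results of several WordNet similarity steps, using vector space models, and adopting optimal matching methods.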
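The abstract names the two word-level measures but does not give their formulas or the sentence-level aggregation. The following is a minimal sketch under stated assumptions, not the paper's implementation. It uses a small hypothetical is-a hierarchy in place of WordNet: concept names such as `quran`, `hadith` and `prayer` are illustrative and are not real synset IDs. It also assumes each word has already been mapped to a single sense. It applies the textbook definitions, path similarity = 1 / (shortest path length + 1) and Wu-Palmer = 2 · depth(LCS) / (depth(c1) + depth(c2)). The sentence score (each concept's best match in the other sentence, averaged in both directions) is one common aggregation and is an assumption here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy is-a hierarchy standing in for WordNet noun synsets.
# Each concept maps to its single parent (hypernym); "entity" is the root.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "abstraction": "entity",
    "communication": "abstraction",
    "text": "communication",
    "scripture": "text",
    "quran": "scripture",
    "hadith": "text",
    "act": "abstraction",
    "worship": "act",
    "prayer": "worship",
    "physical_entity": "entity",
    "person": "physical_entity",
    "prophet": "person",
}


def ancestors(concept: str) -> List[str]:
    """Return the chain from a concept up to the root, concept first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(concept: str) -> int:
    """Node depth with the root counted as 1 (avoids a zero denominator)."""
    return len(ancestors(concept))


def lowest_common_subsumer(c1: str, c2: str) -> str:
    """Deepest ancestor shared by both concepts (the LCS)."""
    above_c2 = set(ancestors(c2))
    for node in ancestors(c1):  # walks upward, so the first hit is the deepest
        if node in above_c2:
            return node
    raise ValueError(f"{c1!r} and {c2!r} share no ancestor")


def path_similarity(c1: str, c2: str) -> float:
    """1 / (shortest is-a path length + 1); equals 1.0 for identical concepts."""
    lcs_depth = depth(lowest_common_subsumer(c1, c2))
    edges = (depth(c1) - lcs_depth) + (depth(c2) - lcs_depth)
    return 1.0 / (edges + 1)


def wu_palmer_similarity(c1: str, c2: str) -> float:
    """2 * depth(LCS) / (depth(c1) + depth(c2))."""
    lcs_depth = depth(lowest_common_subsumer(c1, c2))
    return 2.0 * lcs_depth / (depth(c1) + depth(c2))


WordSim = Callable[[str, str], float]


def sentence_similarity(s1: List[str], s2: List[str], word_sim: WordSim) -> float:
    """Average each concept's best match in the other sentence, in both directions."""
    if not s1 or not s2:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        return sum(max(word_sim(x, y) for y in b) for x in a) / len(a)

    return (directed(s1, s2) + directed(s2, s1)) / 2


# Hypothetical sentences, already reduced to disambiguated concepts.
s1 = ["quran", "prayer"]
s2 = ["hadith", "worship"]
print(f"path: {sentence_similarity(s1, s2, path_similarity):.3f}")  # path: 0.375
print(f"wu-palmer: {sentence_similarity(s1, s2, wu_palmer_similarity):.3f}")  # wu-palmer: 0.808
```

On real WordNet data, NLTK's `Synset.path_similarity` and `Synset.wup_similarity` compute these scores directly. They also handle synsets with several hypernyms, and because their depth conventions differ slightly, their values need not match this single-parent toy exactly.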




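The abstract also lists the vector space model as the other main approach to text similarity and TF-IDF weighting as a direction for future work. The sketch below illustrates that alternative in a hedged, minimal form. It builds TF-IDF vectors for tokenized sentences and compares them by cosine similarity. It assumes lowercased, whitespace-tokenized input. It uses raw term counts and the smoothed IDF, ln((1 + n) / (1 + df)) + 1, that scikit-learn's `TfidfVectorizer` applies by default. The three sentences are hypothetical placeholders, not drawn from either dataset.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Weight each term by its raw count times a smoothed inverse document frequency."""
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tokenized sentences (placeholders, not from either dataset).
sentences = [
    "establish prayer and give charity".split(),
    "prayer and charity are pillars".split(),
    "the merchant sold cloth".split(),
]
vecs = tfidf_vectors(sentences)
print(f"{cosine(vecs[0], vecs[1]):.3f}")  # three shared terms -> 0.465
print(f"{cosine(vecs[0], vecs[2]):.3f}")  # no shared terms -> 0.000
```

Unlike the WordNet measures, this representation gives credit only for exact term overlap, so synonyms score zero. That limitation is why combining TF-IDF weights with word-sense similarity is a natural next step.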

