A semantic-based question answering system for Indonesian translation of Quran
S. Putra, Ria Hari Gusmita, K. Hulliyah, H. Sukmana
Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, published 2016-11-28
DOI: 10.1145/3011141.3011219 (https://doi.org/10.1145/3011141.3011219)
Citations: 14
Abstract
This paper presents the development of a semantic-based question answering system (QAS) for the Indonesian Translation of the Quran (ITQ). The research is motivated by the shortcomings of previously built QASs, which were caused by keyword-based retrieval. Instead of keeping that retrieval method, we shifted to a semantic approach in which retrieval is performed using a semantic similarity measure. To do so, we built an ontology of the ITQ to obtain the concepts as well as the verses in which they appear. The QAS supports three factoid question types: Who, Where, and When. Furthermore, a weighted vector is generated for each concept belonging to the respective expected answer type (also called the named entity group), i.e., Person, Location, and Time, in order to feed the semantic interpreter applied to the user question. From the 222 concepts defined in the ontology, we clustered 77, 24, and 6 concepts into Person, Location, and Time, respectively. Because ITQ texts have some distinctive characteristics, we developed our own modules to handle them, including inverted index generation and named entity recognition. Answer extraction is performed by applying feature extraction to score the answer candidates. The system is evaluated on two question-and-answer data sets: the first measures the effectiveness of the semantic approach compared with keyword-based retrieval, and the second assesses system performance with respect to the appearance of concepts in the ITQ.
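The retrieval idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the verse annotations, the concept vectors, their weights, and the function names are all hypothetical, and cosine similarity is assumed here as one possible semantic similarity measure, since the abstract does not name the one actually used. Only the mapping of Who, Where, and When to Person, Location, and Time comes from the abstract.

```python
import math
from collections import defaultdict

# Question type -> expected answer type (named entity group), as in the abstract.
QUESTION_TYPE = {"who": "Person", "where": "Location", "when": "Time"}

# Hypothetical toy annotations: verse id -> concepts found in that verse.
VERSE_CONCEPTS = {
    "12:4": ["Yusuf"],
    "28:29": ["Musa", "Mount Tur"],
}

# Hypothetical concept groups and weighted term vectors (weights are invented).
CONCEPT_GROUP = {"Yusuf": "Person", "Musa": "Person", "Mount Tur": "Location"}
CONCEPT_VECTORS = {
    "Yusuf": {"nabi": 0.9, "mimpi": 0.8, "mesir": 0.6},
    "Musa": {"nabi": 0.9, "tongkat": 0.7, "firaun": 0.8},
    "Mount Tur": {"gunung": 0.9, "api": 0.5},
}


def build_inverted_index(verse_concepts):
    """Map each concept to the set of verse ids in which it appears."""
    index = defaultdict(set)
    for verse_id, concepts in verse_concepts.items():
        for concept in concepts:
            index[concept].add(verse_id)
    return index


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


INDEX = build_inverted_index(VERSE_CONCEPTS)


def answer(question_word, terms):
    """Return (best concept, verses) for the expected answer type, or None."""
    expected = QUESTION_TYPE[question_word.lower()]
    query = {term: 1.0 for term in terms}  # unweighted bag of question terms
    candidates = [
        (cosine(query, CONCEPT_VECTORS[concept]), concept)
        for concept, group in CONCEPT_GROUP.items()
        if group == expected
    ]
    if not candidates:
        return None
    score, best = max(candidates)
    if score == 0.0:  # no term overlap with any candidate concept
        return None
    return best, sorted(INDEX.get(best, ()))


print(answer("who", ["mimpi", "mesir"]))
print(answer("where", ["gunung"]))
```

Filtering candidates by expected answer type before scoring keeps a Who question from returning a Location concept, which is the role the abstract assigns to the named entity groups.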