An Information Retrieval Approach to Finding Similar Questions in Question-Answering of Indonesian Government e-Procurement Services using TF*IDF and LSI Model

S. Bahri, S. Sumpeno, S. M. S. Nugroho
2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE), published 2018-07-01. DOI: https://doi.org/10.1109/ICITEED.2018.8534856

Abstract

In the Indonesian government's e-procurement application, one important problem is finding the questions in the archive of a question answering (QA) service that are relevant to the question asked by the user. A common method for finding relevant documents is to represent text documents in a vector space model (VSM). The relevance between a query and the documents can then be calculated using document similarity theory, by comparing the angle between each document vector and the query vector (derived from the user's question), where the query is represented as the same kind of vector as the documents. The most widely used vector space model algorithms are Term Frequency * Inverse Document Frequency (TF*IDF) and Latent Semantic Indexing (LSI); however, each model has its own limitations. To address this problem, this paper proposes a hybrid model that combines TF*IDF and LSI to mitigate some limitations of both. The experimental results show that the proposed model (P@1 = 0.67) outperforms the standalone TF*IDF model (P@1 = 0.27) and the standalone LSI model (P@1 = 0.4).
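
To make the vector space idea concrete, the following is a minimal sketch of TF*IDF weighting, cosine (angle-based) similarity between a query and archived questions, and the P@1 metric. It is not the authors' implementation: the sample archive, the smoothed IDF formula, the whitespace tokenizer, and all function names are assumptions chosen for illustration. The LSI step and the hybrid combination are not shown, because the abstract does not specify how they are built.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def build_idf(docs_tokens):
    # Smoothed inverse document frequency (one common variant, assumed here).
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf_vector(tokens, idf):
    # Sparse vector as a dict; terms outside the archive vocabulary are dropped.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    # Cosine of the angle between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_archive(query, idf, doc_vecs):
    # Return (document index, score) pairs, most similar first.
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(q_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def precision_at_1(top_ids, relevant_ids):
    # Fraction of queries whose top-ranked document is the relevant one.
    hits = sum(1 for top, rel in zip(top_ids, relevant_ids) if top == rel)
    return hits / len(relevant_ids)

# Hypothetical archive of previously asked questions (not from the paper).
archive = [
    "how do I register as a vendor",
    "how to reset my account password",
    "where can I download the tender documents",
]
docs_tokens = [tokenize(d) for d in archive]
idf = build_idf(docs_tokens)
doc_vecs = [tfidf_vector(tokens, idf) for tokens in docs_tokens]

queries = ["forgot password how to reset", "download tender documents"]
relevant = [1, 2]  # hypothetical ground-truth index for each query
top_ids = [rank_archive(q, idf, doc_vecs)[0][0] for q in queries]
print(top_ids, precision_at_1(top_ids, relevant))
```

Because this sketch matches only exact terms, a query that paraphrases an archived question with different words would score zero against it. This is the kind of limitation that a latent-space model such as LSI is usually brought in to address.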