Aggregate Functions in Categorical Data Skyline Search (CDSS) for Multi-keyword Document Search

Mardiah Mardiah, Annisa Annisa, S. N. Neyman
Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika. Journal article, published 2023-04-10. DOI: 10.23917/khif.v9i1.18127 (https://doi.org/10.23917/khif.v9i1.18127)

Abstract

A literature review is the first step in starting research and in gaining a deep understanding of a research interest. However, finding literature relevant to that interest is difficult and time-consuming. A skyline query is a method that can be used for filtering: it returns the objects that are not dominated by any other object. An object p is said to dominate an object q if p is at least as good as q on all attributes and strictly better than q on at least one attribute. Categorical Data Skyline Search (CDSS) is an algorithm that can filter skyline objects in categorical data types such as documents. CDSS uses the Extended Distance Wu and Palmer (DEWP) measure to calculate the distance between the user query and document keywords. Document keywords and user queries are represented as nodes in the ACM CCS ontology, and each document is assumed to be represented by a single keyword. This study aims to use the CDSS algorithm to search for skyline documents represented by more than one keyword by adding an aggregate function (average, minimum, or maximum) to the CDSS algorithm, specifically in the DEWP calculation. The study uses thesis documents from the IPB University computer science department. Document keywords are extracted using the Term Frequency-Inverse Document Frequency (TF-IDF) method. The collected keywords are mapped onto a mixed ontology tree based on the Association for Computing Machinery Computing Classification System 2012 (ACM CCS 2012) and the Computer Science Ontology (CSO), which serve as ontology standards in computer science. The skyline query algorithm used to determine skyline documents is Block Nested Loop (BNL). The evaluation uses the skyline ratio of each aggregate function in CDSS. Based on the ratio value, CDSS using the maximum DEWP yields the most relevant skyline results compared with the average DEWP and the minimum DEWP.
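To make the pipeline in the abstract concrete, the following Python sketch shows how the three aggregate functions, the dominance test, and a Block Nested Loop scan could fit together. It is an illustration under stated assumptions, not the authors' implementation: it assumes one skyline dimension per query term, that the value on each dimension is the aggregate of the distances from that query term to every keyword of the document, and that a smaller distance is better. The distances are made-up toy numbers standing in for DEWP values, which are not computed here, and all function and variable names are hypothetical.

```python
from statistics import mean

# The three aggregate functions named in the abstract.
AGGREGATES = {"average": mean, "minimum": min, "maximum": max}


def aggregate_point(per_term_distances, how):
    """Collapse per-keyword distances into one value per query term.

    per_term_distances[i] holds the distances from query term i to each
    keyword of one document (assumed layout, not taken from the paper).
    """
    agg = AGGREGATES[how]
    return tuple(agg(distances) for distances in per_term_distances)


def dominates(p, q):
    """p dominates q: no worse on every dimension, strictly better on one.

    Smaller distance is treated as better (assumption).
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def bnl_skyline(points):
    """Block Nested Loop skyline over a dict of name -> distance tuple."""
    window = {}
    for name, p in points.items():
        # Discard the candidate if something in the window dominates it.
        if any(dominates(w, p) for w in window.values()):
            continue
        # Otherwise evict window entries that the candidate dominates.
        window = {n: w for n, w in window.items() if not dominates(p, w)}
        window[name] = p
    return window


def skyline_for(docs, how):
    """Aggregate every document with `how`, then run the BNL scan."""
    points = {name: aggregate_point(d, how) for name, d in docs.items()}
    return bnl_skyline(points)


# Toy input: two query terms, two keywords per document.
# These numbers are illustrative placeholders, not DEWP values from the study.
docs = {
    "doc_a": [[0.2, 0.6], [0.4, 0.8]],
    "doc_b": [[0.3, 0.3], [0.5, 0.5]],
    "doc_c": [[0.7, 0.9], [0.6, 0.9]],
}

for how in ("average", "minimum", "maximum"):
    print(how, sorted(skyline_for(docs, how)))
```

On this toy input the minimum aggregate selects doc_a, while the average and maximum aggregates select doc_b, which shows that the choice of aggregate can change the skyline. The toy result says nothing about which aggregate is most relevant in the study's evaluation.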