{"title":"分类数据天际线搜索(CDSS)在多关键词文档搜索中的聚合功能","authors":"Mardiah Mardiah, Annisa Annisa, S. N. Neyman","doi":"10.23917/khif.v9i1.18127","DOIUrl":null,"url":null,"abstract":"- Literature review is the first step in starting research for a deep understanding of the research interest. However, finding literature relevant to research interests is difficult and takes time. Skyline query is a method that can be used for filtering. An object p is said to dominate object q if p equals q on all of its attributes, and p is at least better than q on one attribute. Categorical Data Skyline Search (CDSS) is an algorithm that can filter skyline objects in categorical data types such as documents. CDSS uses Extended Distance Wu and Palmer (DEWP) to calculate the distance between the user query and document keywords. The document keywords and user queries are represented as nodes in the ACM CCS ontology, and documents are assumed to be represented by a single keyword. This study aims to use the CDSS algorithm to search for skyline documents represented by more than one keyword by adding an aggregate function (average, minimum, maximum) to the CDSS algorithm, especially in calculating DEWP. This study used the thesis documents from the IPB University computer science department. Document keywords will be extracted using the Term Frequency-Inverse Term Frequency (TF-IDF) method. The collected keywords will be mapped in a mixed ontology tree that refers to the Association of Computing Machinery Computing Classification System 2012 (ACM CCS 2012) and Computer Science Ontology (CSO) as ontology standards in computer science. The skyline query algorithm for determining skyline documents is Block Nested Loop (BNL). The evaluation method uses the skyline ratio of each aggregate function in the CDSS. Based on the ratio value, CDSS using the maximum DEWP has the most relevant skyline results compared to the average DEWP and minimum DEWP.","PeriodicalId":326094,"journal":{"name":"Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika","volume":"173 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Aggregate Functions in Categorical Data Skyline Search (CDSS) for Multi-keyword Document Search\",\"authors\":\"Mardiah Mardiah, Annisa Annisa, S. N. Neyman\",\"doi\":\"10.23917/khif.v9i1.18127\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"- Literature review is the first step in starting research for a deep understanding of the research interest. However, finding literature relevant to research interests is difficult and takes time. Skyline query is a method that can be used for filtering. An object p is said to dominate object q if p equals q on all of its attributes, and p is at least better than q on one attribute. Categorical Data Skyline Search (CDSS) is an algorithm that can filter skyline objects in categorical data types such as documents. CDSS uses Extended Distance Wu and Palmer (DEWP) to calculate the distance between the user query and document keywords. The document keywords and user queries are represented as nodes in the ACM CCS ontology, and documents are assumed to be represented by a single keyword. This study aims to use the CDSS algorithm to search for skyline documents represented by more than one keyword by adding an aggregate function (average, minimum, maximum) to the CDSS algorithm, especially in calculating DEWP. This study used the thesis documents from the IPB University computer science department. Document keywords will be extracted using the Term Frequency-Inverse Term Frequency (TF-IDF) method. The collected keywords will be mapped in a mixed ontology tree that refers to the Association of Computing Machinery Computing Classification System 2012 (ACM CCS 2012) and Computer Science Ontology (CSO) as ontology standards in computer science. The skyline query algorithm for determining skyline documents is Block Nested Loop (BNL). The evaluation method uses the skyline ratio of each aggregate function in the CDSS. Based on the ratio value, CDSS using the maximum DEWP has the most relevant skyline results compared to the average DEWP and minimum DEWP.\",\"PeriodicalId\":326094,\"journal\":{\"name\":\"Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika\",\"volume\":\"173 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23917/khif.v9i1.18127\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Khazanah Informatika : Jurnal Ilmu Komputer dan Informatika","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23917/khif.v9i1.18127","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Aggregate Functions in Categorical Data Skyline Search (CDSS) for Multi-keyword Document Search
- Literature review is the first step in starting research for a deep understanding of the research interest. However, finding literature relevant to research interests is difficult and takes time. Skyline query is a method that can be used for filtering. An object p is said to dominate object q if p equals q on all of its attributes, and p is at least better than q on one attribute. Categorical Data Skyline Search (CDSS) is an algorithm that can filter skyline objects in categorical data types such as documents. CDSS uses Extended Distance Wu and Palmer (DEWP) to calculate the distance between the user query and document keywords. The document keywords and user queries are represented as nodes in the ACM CCS ontology, and documents are assumed to be represented by a single keyword. This study aims to use the CDSS algorithm to search for skyline documents represented by more than one keyword by adding an aggregate function (average, minimum, maximum) to the CDSS algorithm, especially in calculating DEWP. This study used the thesis documents from the IPB University computer science department. Document keywords will be extracted using the Term Frequency-Inverse Term Frequency (TF-IDF) method. The collected keywords will be mapped in a mixed ontology tree that refers to the Association of Computing Machinery Computing Classification System 2012 (ACM CCS 2012) and Computer Science Ontology (CSO) as ontology standards in computer science. The skyline query algorithm for determining skyline documents is Block Nested Loop (BNL). The evaluation method uses the skyline ratio of each aggregate function in the CDSS. Based on the ratio value, CDSS using the maximum DEWP has the most relevant skyline results compared to the average DEWP and minimum DEWP.