Graph-based concept discovery in multi relational data

Y. Kavurucu, Alev Mutlu, T. Ensari
DOI: 10.1109/CONFLUENCE.2016.7508128
Published in: 2016 6th International Conference - Cloud System and Big Data Engineering (Confluence), 11 July 2016
Citations: 2

Abstract

Developments in technology, especially in computer science, have created the need to store data in a wide variety of areas. This need gave rise to databases, in which data is stored in a useful form and logically integrated into one or more files according to the relations among the data. An important issue is extracting knowledge from these databases, which hold data in a useful and complete form; this process is called data mining. The main objective of data mining is to extract implicit and useful knowledge from huge masses of data that, at first glance, appear meaningless. Multi-relational databases store data in multiple tables (relations), and the relationships between those tables are also stored as tables (relations) in the database. The most effective and best-known approaches to Multi-Relational Data Mining (MRDM) are based on Inductive Logic Programming (ILP), which combines concepts from inductive learning and logic programming. From this perspective, the main purpose of MRDM is to extract implicit, non-trivial knowledge from relational databases using ILP approaches and techniques. In graph-based approaches, data is represented as graph structures and graph mining techniques are used for knowledge discovery.
Concept discovery in multi-relational data mining aims to find relational rules that best describe one relation, called the target relation, in terms of the other relations in the database, called background knowledge. This study presents a graph-based concept discovery method, G-CDS (Graph-based Concept Discovery System). G-CDS combines techniques from substructure-based and path-finding-based approaches, so it can be considered a hybrid method.
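To make the notions of target relation and background knowledge concrete, the following Python sketch scores one candidate relational rule on a hypothetical toy database (the classic ILP "daughter" example, not a dataset from this study). The relation names, facts, and helper functions (`unify`, `satisfiable`, `covers`, `coverage`) are illustrative assumptions. The sketch only shows what a rule describing a target relation in terms of background relations means and how its coverage of positive and negative examples might be counted; it does not show how G-CDS searches for rules.

```python
# Hypothetical toy database (classic ILP "daughter" example), not data from the paper.
# Background knowledge: facts of the other relations, keyed by relation name.
background = {
    "parent": {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")},
    "female": {("ann",), ("mary",), ("eve",)},
}

# Target relation daughter(X, Y): labelled positive and negative examples.
positives = [("mary", "ann"), ("eve", "tom")]
negatives = [("tom", "ann"), ("eve", "ann")]

# Candidate rule: daughter(X, Y) :- parent(Y, X), female(X).
# All rule arguments are variables; body literals are (relation, argument tuple).
head = ("X", "Y")
body = [("parent", ("Y", "X")), ("female", ("X",))]


def unify(args, fact, binding):
    """Extend `binding` so each variable in args maps to the fact's constant,
    or return None if a variable is already bound to a different constant."""
    extended = dict(binding)
    for var, const in zip(args, fact):
        if extended.setdefault(var, const) != const:
            return None
    return extended


def satisfiable(body, facts, binding):
    """True if some extension of `binding` turns every body literal into a known fact."""
    if not body:
        return True
    (rel, args), rest = body[0], body[1:]
    for fact in facts.get(rel, ()):
        if len(fact) != len(args):
            continue
        extended = unify(args, fact, binding)
        # Backtrack over alternative facts if the remaining literals fail.
        if extended is not None and satisfiable(rest, facts, extended):
            return True
    return False


def covers(head, body, facts, example):
    """A rule covers an example if its body holds with the head bound to the example."""
    return satisfiable(body, facts, dict(zip(head, example)))


def coverage(examples):
    """Number of examples covered by the candidate rule."""
    return sum(covers(head, body, background, ex) for ex in examples)


print(coverage(positives), coverage(negatives))  # 2 0
```

A good concept descriptor covers many positive examples and few negative ones; here the candidate rule covers both positive examples and neither negative one.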
For each target relation and its related background knowledge, both initially stored in a relational database, G-CDS generates disconnected graph structures and uses them to guide the generation of a summary graph. The summary graph is then traversed to find concept descriptors. Experiments conducted on datasets from different learning problems show that G-CDS is capable of learning definitions of target relations across these problems.
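The idea of building separate graphs per example, merging them into a summary graph, and traversing it can be loosely illustrated with the Python sketch below. It is not the G-CDS algorithm. It assumes binary relations only, treats constants as nodes and facts as labelled edges, stands in for each example's separate graph with the set of relation-label paths linking that example's two arguments (a relational path-finding idea), and builds the "summary" by counting how many positive examples support each path pattern. All facts, function names, and the `min_support` threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy background facts (binary relations only), not data from the paper.
facts = [
    ("parent", "ann", "mary"),
    ("parent", "ann", "tom"),
    ("parent", "tom", "eve"),
    ("likes", "mary", "ann"),
]
# Positive examples of the target relation daughter(X, Y).
positives = [("mary", "ann"), ("eve", "tom")]


def build_graph(facts):
    """Constants become nodes; each fact becomes a labelled edge usable in both
    directions ("rel" forward, "rel^-1" backward)."""
    adj = defaultdict(list)
    for rel, a, b in facts:
        adj[a].append((rel, b))
        adj[b].append((rel + "^-1", a))
    return adj


def example_paths(adj, src, dst, max_len=2):
    """Edge-label sequences of simple paths (length <= max_len) from src to dst.
    This per-example result stands in for one example's separate graph."""
    found = set()
    stack = [(src, (), {src})]
    while stack:
        node, labels, seen = stack.pop()
        if node == dst:
            found.add(labels)
            continue
        if len(labels) < max_len:
            for rel, nxt in adj[node]:
                if nxt not in seen:
                    stack.append((nxt, labels + (rel,), seen | {nxt}))
    return found


def summary_graph(adj, examples, max_len=2):
    """Merge the per-example path sets into one support count per label sequence."""
    support = Counter()
    for x, y in examples:
        support.update(example_paths(adj, x, y, max_len))
    return support


def concept_descriptors(support, n_examples, min_support=1.0):
    """Keep label sequences whose support ratio reaches min_support."""
    return sorted(p for p, c in support.items() if c / n_examples >= min_support)


adj = build_graph(facts)
support = summary_graph(adj, positives)
print(concept_descriptors(support, len(positives)))  # [('parent^-1',)]
```

The surviving pattern `('parent^-1',)` reads as the rule daughter(X, Y) :- parent(Y, X), while the `('likes',)` path links only one example and is pruned at `min_support=1.0`. This sketch ignores negative examples and unary relations, which a practical concept discovery system would also need in order to score and refine descriptors.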