Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Metaheuristics

JCR: Q2 (Computer Science)
Haishan Liu, Dejing Dou, Hao Wang
Journal on Data Semantics, vol. 1, no. 2, pp. 133-145, published 2012-08-01
DOI: 10.1007/s13740-012-0010-0 (https://doi.org/10.1007/s13740-012-0010-0)
Full text (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478963/pdf/nihms406312.pdf
Citations: 0

Abstract

In this paper, we present a data mining approach to the challenges of matching heterogeneous datasets. In particular, we propose solutions to two problems that arise when integrating results from different scientific studies. The first problem, attribute matching, involves discovering correspondences among distinct numeric features (attributes) used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves discovering matchings between patterns (clusters) across datasets. We treat these two problems jointly as a multi-objective optimization problem. We describe a multi-objective metaheuristic algorithm for finding the optimal solution and compare it with a genetic algorithm. We demonstrate the utility of this approach in a series of experiments on synthetic and realistic datasets designed to simulate heterogeneous data from different sources.
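The abstract does not define the objectives or the metaheuristic, so what follows is only a toy sketch of what treating attribute matching and cluster matching jointly as a multi-objective problem can look like. It assumes each dataset is summarized by a matrix of cluster centroids (rows = clusters, columns = numeric attributes), encodes a candidate solution as a pair of permutations (an attribute matching and a cluster matching), and scores it with two hypothetical objectives: `f1`, which depends on both matchings at once, and `f2`, which depends on the attribute matching alone. The helper names (`permute`, `objectives`, `dominates`, `pareto_front`) are illustrative, and exhaustive enumeration stands in for the paper's metaheuristic search.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = clusters, columns = attributes
Perm = Tuple[int, ...]


def permute(src: Matrix, attr_perm: Perm, clus_perm: Perm) -> Matrix:
    """Build a target matrix in which source cluster i appears as target
    cluster clus_perm[i] and source attribute j as target attribute attr_perm[j]."""
    k, m = len(src), len(src[0])
    tgt = [[0.0] * m for _ in range(k)]
    for i in range(k):
        for j in range(m):
            tgt[clus_perm[i]][attr_perm[j]] = src[i][j]
    return tgt


def objectives(src: Matrix, tgt: Matrix, attr_perm: Perm,
               clus_perm: Perm) -> Tuple[float, float]:
    """Two HYPOTHETICAL objectives to minimize (not the paper's definitions).
    f1 couples both matchings: squared distance between matched centroids.
    f2 uses only the attribute matching: squared gap between attribute means."""
    k, m = len(src), len(src[0])
    f1 = sum((src[i][j] - tgt[clus_perm[i]][attr_perm[j]]) ** 2
             for i in range(k) for j in range(m))
    src_means = [sum(row[j] for row in src) / k for j in range(m)]
    tgt_means = [sum(row[j] for row in tgt) / k for j in range(m)]
    f2 = sum((src_means[j] - tgt_means[attr_perm[j]]) ** 2 for j in range(m))
    return f1, f2


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b under minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(src: Matrix, tgt: Matrix):
    """Score every (attribute matching, cluster matching) pair exhaustively and
    keep the non-dominated ones. Feasible only for tiny inputs (m! * k! pairs)."""
    k, m = len(src), len(src[0])
    scored = [((ap, cp), objectives(src, tgt, ap, cp))
              for ap in permutations(range(m))
              for cp in permutations(range(k))]
    return [(sol, f) for sol, f in scored
            if not any(dominates(g, f) for _, g in scored)]


# Usage: the target is the source with its attributes and clusters shuffled.
src = [[1.0, 10.0, 100.0],
       [2.0, 20.0, 300.0]]
tgt = permute(src, attr_perm=(2, 0, 1), clus_perm=(1, 0))
front = pareto_front(src, tgt)
print(front)  # [(((2, 0, 1), (1, 0)), (0.0, 0.0))]
```

Because enumeration grows as m! * k! for m attributes and k clusters, it is only viable for toy inputs; a metaheuristic, or the genetic algorithm the abstract compares against, would instead sample this joint space. On noisy data the front can contain several non-dominated trade-offs rather than the single exact match recovered here.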

Source journal: Journal on Data Semantics (COMPUTER SCIENCE, INFORMATION SYSTEMS)
About the journal: The Journal on Data Semantics (JoDS) provides an international, high-quality publication venue for researchers working on issues related to information semantics. Its target domain ranges from theories supporting the formal definition of semantic content to innovative domain-specific applications of semantic knowledge, covering work on conceptual modeling, databases, the Semantic Web, information systems, workflow and process modeling, ontologies, business intelligence, interoperability, mobile information services, data warehousing, knowledge representation and reasoning, and artificial intelligence. Topics of relevance include, but are not limited to: conceptualization, knowledge representation and reasoning; conceptual data, process, workflow, and event modeling; provenance, evolution, and change management; context and context-dependent representations and processing; multi-model and multi-paradigm approaches; mappings, transformations, reverse engineering, and semantic elicitation; semantic interoperability, semantic mediators, and metadata management; ontology models and languages, and ontology-driven applications; ontology, schema, data, and process integration, reconciliation, and alignment; Web semantics and semi-structured data; integrity description and handling; semantics in data mining and knowledge extraction; semantics in business intelligence, analytics, and data visualization; spatial, temporal, multimedia, and multimodal semantics; semantic mobility data and services for mobile users; and supporting tools and applications of semantic-driven approaches.