Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Metaheuristics

Q2 Computer Science

Journal on Data Semantics, Pub Date: 2012-08-01, DOI: 10.1007/s13740-012-0010-0

Haishan Liu, Dejing Dou, Hao Wang

Vol. 1, No. 2, pp. 133-145. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478963/pdf/nihms406312.pdf

Citation count: 0

Abstract



In this paper, we present a data mining approach to address challenges in the matching of heterogeneous datasets. In particular, we propose solutions to two problems that arise in integrating information from different results of scientific research. The first problem, attribute matching, involves discovery of correspondences among distinct numeric features (attributes) that are used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves discovery of matchings between patterns (clusters) across datasets. We treat both of these problems together as a multi-objective optimization problem. A multi-objective metaheuristic algorithm is described to find the optimal solution and compared with the genetic algorithm. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets that are designed to simulate heterogeneous data from different sources.
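The abstract frames attribute matching and cluster matching as objectives of a single multi-objective optimization problem. The paper's actual objective functions and metaheuristic are not given here, so the sketch below only illustrates the generic idea of Pareto dominance that multi-objective methods rely on. It assumes that each candidate solution (a joint attribute matching plus cluster matching) has already been scored by two hypothetical costs, both to be minimized; the names `dominates` and `pareto_front` and the toy values are illustrative, not taken from the paper.

```python
from typing import List, Tuple

# Objective vector for one candidate solution, e.g. a hypothetical
# (attribute-matching cost, cluster-matching cost). All objectives are
# assumed to be minimized.
Objectives = Tuple[float, ...]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if `a` Pareto-dominates `b`: no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates.

    A point never dominates itself (it is not strictly better anywhere),
    so no self-exclusion check is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Toy objective values for five hypothetical candidate solutions.
    candidates = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.3, 0.95)]
    print(pareto_front(candidates))  # [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1)]
```

Instead of collapsing both costs into one weighted score, a Pareto-based search keeps every trade-off between the two matching tasks that cannot be improved on one objective without worsening the other.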


Source journal

Journal on Data Semantics (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Self-citation rate: 0.00%

Journal description: The Journal on Data Semantics (JoDS) provides an international high-quality publication venue for researchers whose themes cover issues related to information semantics. Its target domain ranges from theories supporting the formal definition of semantic content to innovative domain-specific applications of semantic knowledge, thus covering work done on conceptual modeling, databases, Semantic Web, information systems, workflow and process modeling, ontologies, business intelligence, interoperability, mobile information services, data warehousing, knowledge representation and reasoning, and artificial intelligence. Topics of relevance to this journal include (but are not limited to): conceptualization, knowledge representation and reasoning; conceptual data, process, workflow, and event modeling; provenance, evolution and change management; context and context-dependent representations and processing; multi-model and multi-paradigm approaches; mappings, transformations, reverse engineering and semantic elicitation; semantic interoperability, semantic mediators and metadata management; ontology models and languages, ontology-driven applications; ontology, schema, data and process integration, reconciliation and alignment; Web semantics and semi-structured data; integrity description and handling; semantics in data mining and knowledge extraction; semantics in business intelligence, analytics and data visualization; spatial, temporal, multimedia and multimodal semantics; semantic mobility data and services for mobile users; supporting tools and applications of semantic-driven approaches.