Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Metaheuristics
Authors: Haishan Liu, Dejing Dou, Hao Wang
Journal on Data Semantics, vol. 1, no. 2, pp. 133-145. Published 2012-08-01. Journal Article.
DOI: https://doi.org/10.1007/s13740-012-0010-0
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3478963/pdf/nihms406312.pdf
In this paper, we present a data mining approach to address challenges in the matching of heterogeneous datasets. In particular, we propose solutions to two problems that arise in integrating information from different results of scientific research. The first problem, attribute matching, involves the discovery of correspondences among distinct numeric features (attributes) used to characterize datasets that have been collected and analyzed in different research labs. The second problem, cluster matching, involves the discovery of matchings between patterns (clusters) across datasets. We treat both problems together as a single multi-objective optimization problem. We describe a multi-objective metaheuristic algorithm for finding the optimal solution and compare it with a genetic algorithm. The utility of this approach is demonstrated in a series of experiments using synthetic and realistic datasets designed to simulate heterogeneous data from different sources.
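To make the multi-objective framing concrete, here is a minimal sketch of Pareto (non-dominated) filtering over candidate solutions scored on two costs. It is not the algorithm from the paper: the abstract does not give the objective functions or the search procedure, so the candidate labels, the cost values, and the helper names `dominates` and `pareto_front` are all assumptions made for illustration. It only shows why treating attribute matching and cluster matching jointly tends to yield a set of trade-off solutions rather than a single best one.

```python
from typing import List, Tuple

# Hypothetical candidates: (label, attribute_matching_cost, cluster_matching_cost).
# The cost values are invented for illustration; lower is better for both.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both costs and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [
    ("A", 0.20, 0.90),
    ("B", 0.40, 0.40),
    ("C", 0.90, 0.10),
    ("D", 0.50, 0.50),  # dominated by B
    ("E", 0.30, 0.95),  # dominated by A
]

front = pareto_front(candidates)
print([c[0] for c in front])  # → ['A', 'B', 'C']
```

A metaheuristic search would generate and refine such candidates iteratively; the filtering step above only illustrates how two competing objectives are compared without collapsing them into one score.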
About the journal:
The Journal on Data Semantics (JoDS) provides an international high-quality publication venue for researchers whose themes cover issues related to information semantics. Its target domain ranges from theories supporting the formal definition of semantic content to innovative domain-specific applications of semantic knowledge, thus covering work done on conceptual modeling, databases, Semantic Web, information systems, workflow and process modeling, ontologies, business intelligence, interoperability, mobile information services, data warehousing, knowledge representation and reasoning, and artificial intelligence. Topics of relevance to this journal include (but are not limited to): conceptualization, knowledge representation and reasoning; conceptual data, process, workflow, and event modeling; provenance, evolution and change management; context and context-dependent representations and processing; multi-model and multi-paradigm approaches; mappings, transformations, reverse engineering and semantic elicitation; semantic interoperability, semantic mediators and metadata management; ontology models and languages, ontology-driven applications; ontology, schema, data and process integration, reconciliation and alignment; Web semantics and semi-structured data; integrity description and handling; semantics in data mining and knowledge extraction; semantics in business intelligence, analytics and data visualization; spatial, temporal, multimedia and multimodal semantics; semantic mobility data and services for mobile users; supporting tools and applications of semantic-driven approaches.