Comparing datasets by attribute alignment

2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM). Pub Date: 2014-12-01. DOI: 10.1109/CIDM.2014.7008148

Jakub Smíd, Roman Neruda


Citations: 2

Abstract


The metalearning approach to the model selection problem, which exploits the idea that algorithms perform similarly on similar datasets, requires a suitable metric on the dataset space. One common approach compares datasets based on a fixed number of features describing each dataset as a whole. Information about individual attributes is usually aggregated, taken for the most relevant attributes only, or omitted altogether. In this paper, we propose an approach that aligns the complete sets of attributes of the datasets, allowing for different numbers of attributes. Given a distance between two attributes, one can find the alignment that minimizes the sum of the distances between aligned attributes. We present two methods that are able to find such an alignment. They differ in computational complexity and in the assumptions they make about the supplied attribute distance function. Experiments were performed using the proposed methods, and the results were compared with a baseline algorithm.
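The alignment objective described in the abstract can be illustrated with a small Python sketch. This is not either of the two methods from the paper, which the abstract does not specify. It is a brute-force search that tries every one-to-one assignment and keeps the one with the smallest sum of distances. The function name `align_attributes`, the `unmatched_cost` penalty for attributes left without a partner, and the use of a single number to describe each attribute are all assumptions made for illustration.

```python
from itertools import permutations


def align_attributes(xs, ys, dist, unmatched_cost=0.0):
    """Brute-force alignment of two attribute lists (illustrative only).

    xs, ys: sequences of attribute descriptions (any objects).
    dist: user-supplied distance, called as dist(x, y).
    unmatched_cost: hypothetical penalty per attribute left unaligned
        when the lists differ in length (an assumption, not from the paper).
    Returns (total_cost, pairs), where pairs holds (index_in_xs, index_in_ys).
    """
    if len(xs) > len(ys):
        # Solve the mirrored problem, then flip the index pairs back.
        total, pairs = align_attributes(
            ys, xs, lambda b, a: dist(a, b), unmatched_cost
        )
        return total, [(i, j) for j, i in pairs]

    best_cost = float("inf")
    best_pairs = []
    # Each tuple assigns xs[i] to a distinct attribute ys[chosen[i]].
    for chosen in permutations(range(len(ys)), len(xs)):
        cost = sum(dist(xs[i], ys[j]) for i, j in enumerate(chosen))
        if cost < best_cost:
            best_cost = cost
            best_pairs = list(enumerate(chosen))

    # Attributes of the larger dataset that found no partner pay the penalty.
    total = best_cost + unmatched_cost * (len(ys) - len(xs))
    return total, best_pairs


if __name__ == "__main__":
    # Each attribute is summarized by one made-up number for this example.
    small = [0.1, 0.9]
    large = [1.0, 0.0, 0.5]
    cost, pairs = align_attributes(small, large, lambda a, b: abs(a - b))
    print(cost, pairs)
```

The search is factorial in the number of attributes, so it is only practical for very small datasets. With an arbitrary distance function the same objective is an instance of the assignment problem, which algorithms such as the Hungarian method solve in polynomial time.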
