训练与输入选择和测试(TWIST)算法:机器学习模式识别性能的重大进展

M. Buscema, Marco Breda, W. Lodwick
{"title":"训练与输入选择和测试(TWIST)算法:机器学习模式识别性能的重大进展","authors":"M. Buscema, Marco Breda, W. Lodwick","doi":"10.4236/JILSA.2013.51004","DOIUrl":null,"url":null,"abstract":"This article shows the efficacy of TWIST, a methodology for the design of training and testing data subsets extracted from given dataset associated with a problem to be solved via ANNs. The methodology we present is embedded in algorithms and actualized in computer software. Our methodology as implemented in software is compared to the current standard methods of random cross validation: 10-Fold CV, random split into two subsets and the more advanced T&T. For each strategy, 13 learning machines, representing different families of the main algorithms, have been trained and tested. All algorithms were implemented using the well-known WEKA software package. On one hand a falsification test with randomly distributed dependent variable has been used to show how T&T and TWIST behaves as the other two strategies: when there is no information available on the datasets they are equivalent. On the other hand, using the real Statlog (Heart) dataset, a strong difference in accuracy is experimentally proved. Our results show that TWIST is superior to current methods. Pairs of subsets with similar probability density functions are generated, without coding noise, according to an optimal strategy that extracts the most useful information for pattern classification.","PeriodicalId":69452,"journal":{"name":"智能学习系统与应用(英文)","volume":"05 1","pages":"29-38"},"PeriodicalIF":0.0000,"publicationDate":"2013-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":"{\"title\":\"Training with Input Selection and Testing (TWIST) Algorithm: A Significant Advance in Pattern Recognition Performance of Machine Learning\",\"authors\":\"M. Buscema, Marco Breda, W. Lodwick\",\"doi\":\"10.4236/JILSA.2013.51004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article shows the efficacy of TWIST, a methodology for the design of training and testing data subsets extracted from given dataset associated with a problem to be solved via ANNs. The methodology we present is embedded in algorithms and actualized in computer software. Our methodology as implemented in software is compared to the current standard methods of random cross validation: 10-Fold CV, random split into two subsets and the more advanced T&T. For each strategy, 13 learning machines, representing different families of the main algorithms, have been trained and tested. All algorithms were implemented using the well-known WEKA software package. On one hand a falsification test with randomly distributed dependent variable has been used to show how T&T and TWIST behaves as the other two strategies: when there is no information available on the datasets they are equivalent. On the other hand, using the real Statlog (Heart) dataset, a strong difference in accuracy is experimentally proved. Our results show that TWIST is superior to current methods. Pairs of subsets with similar probability density functions are generated, without coding noise, according to an optimal strategy that extracts the most useful information for pattern classification.\",\"PeriodicalId\":69452,\"journal\":{\"name\":\"智能学习系统与应用(英文)\",\"volume\":\"05 1\",\"pages\":\"29-38\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"40\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"智能学习系统与应用(英文)\",\"FirstCategoryId\":\"1093\",\"ListUrlMain\":\"https://doi.org/10.4236/JILSA.2013.51004\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"智能学习系统与应用(英文)","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.4236/JILSA.2013.51004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40

摘要

本文展示了TWIST的有效性,TWIST是从给定数据集中提取的训练和测试数据子集的设计方法,该数据集与通过人工神经网络解决的问题相关。我们提出的方法嵌入在算法中,并在计算机软件中实现。我们在软件中实现的方法与目前随机交叉验证的标准方法进行了比较:10倍CV,随机分成两个子集和更先进的T&T。对于每种策略,代表不同主要算法族的13台学习机器都经过了训练和测试。所有算法均使用知名的WEKA软件包实现。一方面,随机分布因变量的证伪检验已用于显示T&T和TWIST如何表现为其他两个策略:当数据集上没有可用信息时,它们是等效的。另一方面,使用真实的Statlog (Heart)数据集,实验证明了准确性的巨大差异。我们的结果表明,TWIST优于现有的方法。根据提取最有用的模式分类信息的最优策略,生成具有相似概率密度函数的子集对,且没有编码噪声。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Training with Input Selection and Testing (TWIST) Algorithm: A Significant Advance in Pattern Recognition Performance of Machine Learning
This article shows the efficacy of TWIST, a methodology for the design of training and testing data subsets extracted from given dataset associated with a problem to be solved via ANNs. The methodology we present is embedded in algorithms and actualized in computer software. Our methodology as implemented in software is compared to the current standard methods of random cross validation: 10-Fold CV, random split into two subsets and the more advanced T&T. For each strategy, 13 learning machines, representing different families of the main algorithms, have been trained and tested. All algorithms were implemented using the well-known WEKA software package. On one hand a falsification test with randomly distributed dependent variable has been used to show how T&T and TWIST behaves as the other two strategies: when there is no information available on the datasets they are equivalent. On the other hand, using the real Statlog (Heart) dataset, a strong difference in accuracy is experimentally proved. Our results show that TWIST is superior to current methods. Pairs of subsets with similar probability density functions are generated, without coding noise, according to an optimal strategy that extracts the most useful information for pattern classification.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
135
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信