Dataset threshold for the performance estimators in supervised machine learning experiments

2009 International Conference for Internet Technology and Secured Transactions, (ICITST). Pub Date: 2009-11-01. DOI: 10.1109/ICITST.2009.5402500

Zanifa Omary, F. Mtenzi


Citations: 3

Abstract

Establishing a dataset threshold is one of the first steps when comparing the performance of machine learning algorithms. It involves using datasets of different sample sizes, defined by the number of attributes and the number of instances available in each dataset. Currently, no limit has been set to help those unfamiliar with machine learning experiments categorise these datasets as either small or large based on these two factors. In this paper, we perform experiments to establish such a dataset threshold. The established threshold will help experimenters unfamiliar with supervised machine learning to categorise datasets by their number of instances and attributes and then choose an appropriate performance estimation method. The experiments will use four different datasets from the UCI Machine Learning Repository and two performance estimators. The performance of the methods will be measured using the F1-score.
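The abstract does not name the two performance estimators or the classifiers used. As a minimal sketch under assumed choices, the Python below compares two commonly used estimators, a single holdout split and k-fold cross-validation, and scores each with the binary F1-score, F1 = 2PR/(P + R) = 2TP/(2TP + FP + FN), where P is precision and R is recall. The 1-nearest-neighbour classifier and all function names (`f1_score`, `one_nn_predict`, `holdout_f1`, `kfold_f1`) are hypothetical illustrations, not the authors' method.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score for the `positive` class, computed as 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # Convention: F1 is 0.0 when there are no positives and none are predicted.
    return 2 * tp / denom if denom else 0.0


def one_nn_predict(train_X, train_y, test_X):
    """Hypothetical stand-in classifier: 1-nearest neighbour, squared Euclidean distance."""
    preds = []
    for x in test_X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, tx)) for tx in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def holdout_f1(X, y, test_fraction=1 / 3, seed=0):
    """Holdout estimator: one random train/test split, F1 on the held-out part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(X) * test_fraction))
    if n_test >= len(X):
        raise ValueError("test_fraction leaves no training instances")
    test, train = idx[:n_test], idx[n_test:]
    preds = one_nn_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
    return f1_score([y[i] for i in test], preds)


def kfold_f1(X, y, k=10, seed=0):
    """k-fold cross-validation: every instance is tested once; F1 over pooled predictions."""
    if not 2 <= k <= len(X):
        raise ValueError("k must be between 2 and the number of instances")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    y_true, y_pred = [], []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        y_pred += one_nn_predict([X[j] for j in train], [y[j] for j in train], [X[j] for j in test])
        y_true += [y[j] for j in test]
    return f1_score(y_true, y_pred)


if __name__ == "__main__":
    # Synthetic two-class data (not UCI data): two Gaussian clusters in 2-D.
    rng = random.Random(42)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
    X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
    y = [0] * 30 + [1] * 30
    print("holdout F1:", holdout_f1(X, y))
    print("10-fold CV F1:", kfold_f1(X, y, k=10))
```

Pooling the out-of-fold predictions before computing F1, rather than averaging per-fold scores, keeps a fold that happens to contain no positive instances from dragging the estimate towards zero; this is a design choice of the sketch, not something the abstract specifies.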

