{"title":"有监督机器学习实验中性能估计器的数据集阈值","authors":"Zanifa Omary, F. Mtenzi","doi":"10.1109/ICITST.2009.5402500","DOIUrl":null,"url":null,"abstract":"The establishment of dataset threshold is one among the first steps when comparing the performance of machine learning algorithms. It involves the use of different datasets with different sample sizes in relation to the number of attributes and the number of instances available in the dataset. Currently, there is no limit which has been set for those who are unfamiliar with machine learning experiments on the categorisation of these datasets, as either small or large, based on the two factors. In this paper we perform experiments in order to establish dataset threshold. The established dataset threshold will help unfamiliar supervised machine learning experimenters to categorize datasets based on the number of instances and attributes and then choose the appropriate performance estimation method. The experiments will involve the use of four different datasets from UCI machine learning repository and two performance estimators. The performance of the methods will be measured using f1-score.","PeriodicalId":251169,"journal":{"name":"2009 International Conference for Internet Technology and Secured Transactions, (ICITST)","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Dataset threshold for the performance estimators in supervised machine learning experiments\",\"authors\":\"Zanifa Omary, F. Mtenzi\",\"doi\":\"10.1109/ICITST.2009.5402500\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The establishment of dataset threshold is one among the first steps when comparing the performance of machine learning algorithms. It involves the use of different datasets with different sample sizes in relation to the number of attributes and the number of instances available in the dataset. Currently, there is no limit which has been set for those who are unfamiliar with machine learning experiments on the categorisation of these datasets, as either small or large, based on the two factors. In this paper we perform experiments in order to establish dataset threshold. The established dataset threshold will help unfamiliar supervised machine learning experimenters to categorize datasets based on the number of instances and attributes and then choose the appropriate performance estimation method. The experiments will involve the use of four different datasets from UCI machine learning repository and two performance estimators. The performance of the methods will be measured using f1-score.\",\"PeriodicalId\":251169,\"journal\":{\"name\":\"2009 International Conference for Internet Technology and Secured Transactions, (ICITST)\",\"volume\":\"2010 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 International Conference for Internet Technology and Secured Transactions, (ICITST)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICITST.2009.5402500\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 International Conference for Internet Technology and Secured Transactions, (ICITST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICITST.2009.5402500","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Dataset threshold for the performance estimators in supervised machine learning experiments
The establishment of dataset threshold is one among the first steps when comparing the performance of machine learning algorithms. It involves the use of different datasets with different sample sizes in relation to the number of attributes and the number of instances available in the dataset. Currently, there is no limit which has been set for those who are unfamiliar with machine learning experiments on the categorisation of these datasets, as either small or large, based on the two factors. In this paper we perform experiments in order to establish dataset threshold. The established dataset threshold will help unfamiliar supervised machine learning experimenters to categorize datasets based on the number of instances and attributes and then choose the appropriate performance estimation method. The experiments will involve the use of four different datasets from UCI machine learning repository and two performance estimators. The performance of the methods will be measured using f1-score.