{"title":"鲁棒交叉验证的源感知分区","authors":"Ozsel Kilinc, Ismail Uysal","doi":"10.1109/ICMLA.2015.216","DOIUrl":null,"url":null,"abstract":"One of the most critical components of engineering a machine learning algorithm for a live application is robust performance assessment prior to its implementation. Cross-validation is used to forecast a specific algorithm's classification or prediction accuracy on new input data given a finite dataset for training and testing the algorithm. Two most well known cross-validation techniques, random subsampling (RSS) and K-fold, are used to generalize the assessment results of machine learning algorithms in a non-exhaustive random manner. In this work we first show that for an inertia based activity recognition problem where data is collected from different users of a wrist-worn wireless accelerometer, random partitioning of the data, regardless of cross-validation technique, results in statistically similar average accuracies for a standard feed-forward neural network classifier. We propose a novel source-aware partitioning technique where samples from specific users are completely left out of the training/validation sets in rotation. The average error for the proposed cross-validation method is significantly higher with lower standard variation, which is a major indicator of cross-validation robustness. Approximately 30% increase in average error rate implies that source-aware cross validation could be a better indication of live algorithm performance where test data statistics would be significantly different than training data due to source (or user)-sensitive nature of process data.","PeriodicalId":288427,"journal":{"name":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Source-Aware Partitioning for Robust Cross-Validation\",\"authors\":\"Ozsel Kilinc, Ismail Uysal\",\"doi\":\"10.1109/ICMLA.2015.216\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"One of the most critical components of engineering a machine learning algorithm for a live application is robust performance assessment prior to its implementation. Cross-validation is used to forecast a specific algorithm's classification or prediction accuracy on new input data given a finite dataset for training and testing the algorithm. Two most well known cross-validation techniques, random subsampling (RSS) and K-fold, are used to generalize the assessment results of machine learning algorithms in a non-exhaustive random manner. In this work we first show that for an inertia based activity recognition problem where data is collected from different users of a wrist-worn wireless accelerometer, random partitioning of the data, regardless of cross-validation technique, results in statistically similar average accuracies for a standard feed-forward neural network classifier. We propose a novel source-aware partitioning technique where samples from specific users are completely left out of the training/validation sets in rotation. The average error for the proposed cross-validation method is significantly higher with lower standard variation, which is a major indicator of cross-validation robustness. 
Approximately 30% increase in average error rate implies that source-aware cross validation could be a better indication of live algorithm performance where test data statistics would be significantly different than training data due to source (or user)-sensitive nature of process data.\",\"PeriodicalId\":288427,\"journal\":{\"name\":\"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2015.216\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2015.216","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Source-Aware Partitioning for Robust Cross-Validation
One of the most critical components of engineering a machine learning algorithm for a live application is robust performance assessment prior to its deployment. Cross-validation is used to forecast a specific algorithm's classification or prediction accuracy on new input data, given a finite dataset for training and testing the algorithm. The two most well-known cross-validation techniques, random subsampling (RSS) and K-fold, generalize the assessment results of machine learning algorithms in a non-exhaustive random manner. In this work we first show that for an inertia-based activity recognition problem, where data is collected from different users of a wrist-worn wireless accelerometer, random partitioning of the data, regardless of cross-validation technique, results in statistically similar average accuracies for a standard feed-forward neural network classifier. We propose a novel source-aware partitioning technique in which samples from specific users are completely left out of the training/validation sets in rotation. The average error for the proposed cross-validation method is significantly higher, with a lower standard deviation, which is a major indicator of cross-validation robustness. The approximately 30% increase in average error rate implies that source-aware cross-validation could be a better indicator of live algorithm performance, where test data statistics would differ significantly from training data statistics due to the source- (or user-)sensitive nature of the process data.
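To make the contrast concrete, here is a minimal sketch of the two partitioning strategies using scikit-learn's off-the-shelf splitters: `KFold` for random partitioning and `LeaveOneGroupOut` as a stand-in for the source-aware rotation described in the abstract. The data, network size, and splitter choice are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch contrasting random K-fold with source-aware
# (leave-one-user-out) partitioning, assuming scikit-learn is available.
# X, y, and the per-sample user IDs are synthetic placeholders, not the
# paper's wrist-worn accelerometer data.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_users, samples_per_user = 5, 40
X = rng.normal(size=(n_users * samples_per_user, 8))      # fake features
y = rng.integers(0, 2, size=n_users * samples_per_user)   # fake labels
users = np.repeat(np.arange(n_users), samples_per_user)   # source ID per sample

def cv_error(splits):
    """Mean test error of a feed-forward network over the given splits."""
    errors = []
    for train_idx, test_idx in splits:
        clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)
        clf.fit(X[train_idx], y[train_idx])
        errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(errors))

# Random partitioning: every user's samples can appear in both the
# training and test sets of a fold.
random_err = cv_error(KFold(n_splits=5, shuffle=True, random_state=0).split(X))

# Source-aware partitioning: each fold leaves one user's samples out of
# training entirely and tests only on that unseen user.
source_aware_err = cv_error(LeaveOneGroupOut().split(X, y, groups=users))

print(f"random K-fold error: {random_err:.3f}")
print(f"source-aware error:  {source_aware_err:.3f}")
```

On real source-sensitive data (unlike the random placeholders above), the source-aware splits would be expected to show the higher, more pessimistic error the abstract reports, since each test fold comes from a user the classifier never saw during training.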