M. Bohanec (Salvirt Ltd., Dunajska cesta, Ljubljana, Slovenia), M. K. Borstnar, M. Robnik-Sikonja
Estimation of minimum sample size for identification of the most important features: a case study providing a qualitative B2B sales data set
Croatian Operational Research Review, pp. 515-524. Published 2017-12-06. DOI: 10.17535/CRORR.2017.0033
Citations: 4
Abstract
An important task in machine learning is to reduce data set dimensionality, which lowers computational load and data collection costs and makes models easier for humans to understand and interpret. We introduce an operational guideline for determining the minimum number of instances sufficient to identify the correct ranks of the features with the highest impact. We conduct tests on qualitative B2B sales forecasting data. The results show that a relatively small subset of instances is sufficient to identify the most important features when their exact rank order is not important.
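The abstract does not spell out the procedure, so the following is only a rough sketch of how such a guideline might be put into practice, not the authors' method. It assumes information gain as the importance measure for qualitative (categorical) features, repeated random subsampling, and a set-overlap criterion that ignores rank order within the top k. All function names, the toy data generator, and the 0.9 agreement threshold are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, j):
    """Information gain of categorical feature j with respect to the labels."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[j], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    gains = [information_gain(rows, labels, j) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k]


def min_sample_size(rows, labels, k, sizes, repeats=30, agreement=0.9, seed=0):
    """Smallest candidate size whose random subsamples recover, on average,
    at least `agreement` of the full-data top-k feature set (order ignored).
    Returns None if no candidate size reaches the threshold."""
    rng = random.Random(seed)
    reference = set(top_k_features(rows, labels, k))
    for size in sorted(sizes):
        overlaps = []
        for _ in range(repeats):
            idx = rng.sample(range(len(rows)), size)
            sub_rows = [rows[i] for i in idx]
            sub_labels = [labels[i] for i in idx]
            found = set(top_k_features(sub_rows, sub_labels, k))
            overlaps.append(len(found & reference) / k)
        if sum(overlaps) / repeats >= agreement:
            return size
    return None


def make_toy_data(n, seed=0):
    """Hypothetical toy data: features 0 and 1 drive the label (with 10%
    label noise); the remaining four features are pure noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = (rng.choice("ab"), rng.choice("xy")) + tuple(
            rng.choice("pqr") for _ in range(4)
        )
        won = row[0] == "a" and row[1] == "x"
        if rng.random() < 0.1:
            won = not won
        rows.append(row)
        labels.append("won" if won else "lost")
    return rows, labels


rows, labels = make_toy_data(1000, seed=1)
print(min_sample_size(rows, labels, k=2, sizes=[20, 50, 100, 200, 400]))
```

Comparing top-k sets rather than ordered lists mirrors the abstract's case in which rank is not important; requiring the exact order to match would be a stricter criterion and would typically call for more instances.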
About the journal:
Croatian Operational Research Review (CRORR) is a journal that publishes original scientific papers in the area of operational research (OR). Its purpose is to publish papers on various aspects of OR, with the aim of presenting scientific ideas that contribute to both the theoretical development and the practical application of OR. The scope of the journal covers the following subject areas: linear and non-linear programming, integer programming, combinatorial and discrete optimization, multi-objective programming, stochastic models and optimization, scheduling, macroeconomics, economic theory, game theory, statistics and econometrics, marketing and data analysis, information and decision support systems, banking, finance, insurance, environment, energy, health, neural networks and fuzzy systems, control theory, simulation, practical OR and applications. The audience includes both researchers and practitioners in operations research, applied mathematics, statistics, econometrics, intelligent methods, simulation, and the other areas listed above. The journal has an international editorial board of more than 30 editors, all university professors, from Croatia, Slovenia, USA, Italy, Germany, Austria and other countries.