{"title":"Novel split quality measures for stratified multilabel cross validation with application to large and sparse gene ontology datasets","authors":"Henri Tiittanen, L. Holm, Petri Toronen","doi":"10.3934/aci.222003","DOIUrl":null,"url":null,"abstract":"Multilabel learning is an important topic in machine learning research. Evaluating models in multilabel settings requires specific cross validation methods designed for multilabel data. In this article, we show that the most widely used cross validation split quality measure does not behave adequately with multilabel data that has strong class imbalance. We present improved measures and an algorithm, optisplit, for optimizing cross validations splits. Extensive comparison of various types of cross validation methods shows that optisplit produces more even cross validation splits than the existing methods and it is among the fastest methods with good splitting performance.","PeriodicalId":414924,"journal":{"name":"Applied Computing and Intelligence","volume":"75 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing and Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3934/aci.222003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Multilabel learning is an important topic in machine learning research. Evaluating models in multilabel settings requires specific cross validation methods designed for multilabel data. In this article, we show that the most widely used cross validation split quality measure does not behave adequately with multilabel data that has strong class imbalance. We present improved measures and an algorithm, optisplit, for optimizing cross validations splits. Extensive comparison of various types of cross validation methods shows that optisplit produces more even cross validation splits than the existing methods and it is among the fastest methods with good splitting performance.