Nurjahan Begum, Bing Hu, T. Rakthanmanon, Eamonn J. Keogh
Towards a minimum description length based stopping criterion for semi-supervised time series classification
2013 IEEE 14th International Conference on Information Reuse & Integration (IRI)
Published: 2013-10-24
DOI: https://doi.org/10.1109/IRI.2013.6642490
Citations: 21
Abstract
In the last decade, the plunging costs of sensors and storage have made it possible to obtain vast amounts of medical telemetry. However, for this data to be useful, it must be annotated. This annotation, which requires the attention of medical experts, is very expensive and time-consuming, and remains the critical bottleneck in medical analysis. Semi-supervised learning is an obvious way to mitigate the need for human labor; however, most such algorithms are designed for intrinsically discrete objects and do not work well in this domain, which requires the ability to deal with real-valued objects arriving in a streaming fashion. In this work we make two contributions. First, we demonstrate that in many cases just a handful of human-annotated examples are sufficient to perform accurate classification. Second, we devise a novel parameter-free stopping criterion for semi-supervised learning. We evaluate our work with a comprehensive set of experiments on diverse medical data sources, including electrocardiograms. Our experimental results show that our approach can construct accurate classifiers even when given only a single annotated instance.