Dong Ren, Long Ma, Yanqing Zhang, Rajshekhar Sunderraman, P. Fox, A. Laird, J. Turner, Matthew D. Turner
Title: Online biomedical publication classification using Multi-Instance Multi-Label algorithms with feature reduction
Venue: 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)
Published: 2015-07-06
DOI: 10.1109/ICCI-CC.2015.7259391 (https://doi.org/10.1109/ICCI-CC.2015.7259391)
Citations: 8
Abstract
Text annotation, the assignment of metadata to documents, requires significant time and effort when performed by humans. A variety of text mining methods have been used to automate this process, many of them based on either keyword extraction or word counts. However, when keywords are used as text classification features, two challenges commonly arise. First, (1) the number of training instances is often much smaller than the number of extracted features, and this high dimensionality degrades classification performance. Second, (2) documents must be assigned multiple, non-exclusive labels (multi-label classification), which makes the task more complicated than single-label classification. As an example, we use a set of expertly labeled documents from the human functional neuroimaging literature and apply a Multi-Instance Multi-Label (MIML) classification algorithm to the problem. To address (1), we apply a feature reduction approach to lower the feature dimension. To address (2), we use an MIML algorithm called MIMLfast to perform the multi-label classification.
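The two challenges named in the abstract can be illustrated with a minimal sketch. The abstract does not say which feature reduction method the authors use, so the document-frequency filter below is only a stand-in chosen for illustration. The toy corpus, the labels, and every function name are hypothetical. MIMLfast itself is not implemented here, and the multi-instance (bag) aspect is omitted; the sketch only shows how a keyword vocabulary can exceed the number of training documents and how non-exclusive labels can be encoded as binary indicators.

```python
from collections import Counter

# Hypothetical toy corpus: each document carries a set of non-exclusive labels.
docs = [
    ("working memory task activates prefrontal cortex", {"memory", "fmri"}),
    ("visual attention task activates parietal cortex", {"attention", "fmri"}),
    ("memory and attention in healthy adults", {"memory", "attention"}),
]

def reduce_vocabulary(texts, min_df=2):
    """Keep terms that occur in at least min_df documents.

    Assumption: a document-frequency filter stands in for the paper's
    unspecified feature reduction step.
    """
    df = Counter()
    for text in texts:
        df.update(set(text.split()))
    return sorted(term for term, count in df.items() if count >= min_df)

def vectorize(text, vocab):
    """Binary bag-of-words vector over the given vocabulary."""
    tokens = set(text.split())
    return [1 if term in tokens else 0 for term in vocab]

def label_matrix(label_sets):
    """Encode non-exclusive labels as one binary indicator column per label."""
    labels = sorted(set().union(*label_sets))
    rows = [[1 if name in s else 0 for name in labels] for s in label_sets]
    return labels, rows

texts = [text for text, _ in docs]
full_vocab = reduce_vocabulary(texts, min_df=1)  # every term: more features than documents
vocab = reduce_vocabulary(texts, min_df=2)       # reduced feature set
X = [vectorize(text, vocab) for text in texts]
labels, Y = label_matrix([label_set for _, label_set in docs])

print(len(texts), "documents,", len(full_vocab), "raw features,", len(vocab), "kept")
print(labels, Y)
```

Each row of `Y` may contain several ones, which is what separates the multi-label setting from single-label classification. A multi-label learner such as MIMLfast would be trained on data of this general shape, with each document further split into a bag of instances.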