Online biomedical publication classification using Multi-Instance Multi-Label algorithms with feature reduction

Dong Ren, Long Ma, Yanqing Zhang, Rajshekhar Sunderraman, P. Fox, A. Laird, J. Turner, Matthew D. Turner
Published in: 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC)
Publication date: 2015-07-06
DOI: 10.1109/ICCI-CC.2015.7259391 (https://doi.org/10.1109/ICCI-CC.2015.7259391)
Citations: 8

Abstract

Text annotation, the assignment of metadata to documents, requires significant time and effort when performed by humans. A variety of text mining methods have been used to automate this process, many of them based on keyword extraction or word counts. However, when keywords are used as text classification features, two challenges commonly arise. (1) The number of training instances is often much smaller than the number of extracted features, which affects text classification performance. (2) Documents must be assigned multiple, non-exclusive labels (multi-label classification), which makes the task more complicated than single-label classification. As an example, we use a set of expertly labeled documents from the human functional neuroimaging literature and apply a Multi-Instance Multi-Label (MIML) classification algorithm to the problem. To address (1), we apply a feature reduction approach to lower the feature dimension. To address (2), we use an MIML algorithm called MIMLfast to perform the multi-label classification.
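
The two challenges named in the abstract can be illustrated with a minimal sketch. The abstract does not say which feature reduction method the authors use, so the example below assumes a simple document-frequency filter as a stand-in. The toy corpus, the label names, and every function name are hypothetical, and MIMLfast itself is not implemented here. The sketch only shows how a keyword vocabulary can exceed the number of training documents, how filtering shrinks it, and how non-exclusive labels can be encoded as a binary indicator matrix.

```python
from collections import Counter

# Toy corpus (hypothetical, not the paper's dataset): each document
# carries a set of non-exclusive labels.
docs = [
    ("fmri study of working memory in adults", {"fMRI", "memory"}),
    ("pet imaging of language processing", {"PET", "language"}),
    ("fmri study of language and memory", {"fMRI", "language", "memory"}),
]


def tokenize(text):
    """Lowercase whitespace tokenizer, kept deliberately simple."""
    return text.lower().split()


def build_vocabulary(texts):
    """Sorted list of every distinct token across the texts."""
    return sorted({tok for t in texts for tok in tokenize(t)})


def reduce_by_document_frequency(texts, min_df=2):
    """Assumed stand-in for feature reduction: keep tokens that occur
    in at least `min_df` documents."""
    df = Counter()
    for t in texts:
        df.update(set(tokenize(t)))  # count each token once per document
    return sorted(tok for tok, n in df.items() if n >= min_df)


def vectorize(text, vocabulary):
    """Word-count vector of `text` over the given vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[tok] for tok in vocabulary]


def label_matrix(label_sets):
    """Binary indicator matrix: one row per document, one column per label."""
    labels = sorted(set().union(*label_sets))
    return labels, [[int(l in s) for l in labels] for s in label_sets]


texts = [t for t, _ in docs]
full_vocab = build_vocabulary(texts)
reduced_vocab = reduce_by_document_frequency(texts, min_df=2)
X = [vectorize(t, reduced_vocab) for t in texts]
labels, Y = label_matrix([s for _, s in docs])

print(len(texts), "documents,", len(full_vocab), "features before reduction")
print(len(reduced_vocab), "features after reduction:", reduced_vocab)
print("labels:", labels)
print("Y:", Y)
```

Even in this three-document toy corpus the full vocabulary is larger than the number of documents, which is the situation described in point (1). Each row of `Y` may contain several ones, which is what distinguishes the multi-label setting in point (2) from single-label classification.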