半结构化文档的语义内核

S. Aseervatham, E. Viennet, Younès Bennani
{"title":"半结构化文档的语义内核","authors":"S. Aseervatham, E. Viennet, Younès Bennani","doi":"10.1109/ICDM.2007.23","DOIUrl":null,"url":null,"abstract":"Natural Language Processing has emerged as an active field of research in the machine learning community. Several methods based on statistical information have been proposed. However, with the linguistic complexity of the texts, semantic-based approaches have been investigated. In this paper, we propose a Semantic Kernel for semi- structured biomedical documents. The semantic meanings of words are extracted using the UMLS framework. The kernel, with a SVM classifier, has been applied to a text categorization task on a medical corpus of free text documents. The results have shown that the Semantic Kernel outperforms the Linear Kernel and the Naive Bayes classifier. Moreover, this kernel was ranked in the top ten of the best algorithms among 44 classification methods at the 2007 CMC Medical NLP International Challenge.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"A Semantic Kernel for Semi-structured DocumentS\",\"authors\":\"S. Aseervatham, E. Viennet, Younès Bennani\",\"doi\":\"10.1109/ICDM.2007.23\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Natural Language Processing has emerged as an active field of research in the machine learning community. Several methods based on statistical information have been proposed. However, with the linguistic complexity of the texts, semantic-based approaches have been investigated. In this paper, we propose a Semantic Kernel for semi- structured biomedical documents. The semantic meanings of words are extracted using the UMLS framework. The kernel, with a SVM classifier, has been applied to a text categorization task on a medical corpus of free text documents. The results have shown that the Semantic Kernel outperforms the Linear Kernel and the Naive Bayes classifier. Moreover, this kernel was ranked in the top ten of the best algorithms among 44 classification methods at the 2007 CMC Medical NLP International Challenge.\",\"PeriodicalId\":233758,\"journal\":{\"name\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh IEEE International Conference on Data Mining (ICDM 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2007.23\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2007.23","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

自然语言处理已经成为机器学习社区中一个活跃的研究领域。提出了几种基于统计信息的方法。然而,由于文本的语言复杂性,基于语义的方法一直在研究。本文提出了一种半结构化生物医学文档的语义核。使用UMLS框架提取单词的语义。该核与支持向量机分类器一起应用于医学自由文本文档语料库的文本分类任务。结果表明,语义核分类器优于线性核分类器和朴素贝叶斯分类器。此外,该内核在2007年CMC医学NLP国际挑战赛的44种分类方法中名列最佳算法前十名。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Semantic Kernel for Semi-structured DocumentS
Natural Language Processing has emerged as an active field of research in the machine learning community. Several methods based on statistical information have been proposed. However, with the linguistic complexity of the texts, semantic-based approaches have been investigated. In this paper, we propose a Semantic Kernel for semi- structured biomedical documents. The semantic meanings of words are extracted using the UMLS framework. The kernel, with a SVM classifier, has been applied to a text categorization task on a medical corpus of free text documents. The results have shown that the Semantic Kernel outperforms the Linear Kernel and the Naive Bayes classifier. Moreover, this kernel was ranked in the top ten of the best algorithms among 44 classification methods at the 2007 CMC Medical NLP International Challenge.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信