Predicting medical subject headings based on abstract similarity and citations to MEDLINE records

Adam K. Kehoe, Vetle I. Torvik
Published in: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL), 2016-06-19
DOI: https://doi.org/10.1145/2910896.2910920
Citations: 1

Abstract

We describe a classifier-enhanced nearest neighbor approach to assigning Medical Subject Headings (MeSH®) to unlabeled documents using a combination of abstract similarities and direct citations to labeled MEDLINE records. The approach frames the classification problem by decomposing it into sets of siblings in the MeSH hierarchy (e.g., training a classifier for predicting “Heterocyclic Compounds, 2-Ring” vs. other “Heterocyclic Compounds”). Preliminary experiments on a small but diverse set of MeSH terms show the highest performance when abstracts and citations are used together, rather than either alone, and are coupled with a non-naive classifier: 90+% precision and recall with 10-fold cross-validation. NLM's Medical Text Indexer (MTI) tool achieves similar overall performance but varies more across the terms tested. For example, MTI performs better on “Heterocyclic Compounds, 2-Ring”, while our approach performs better on “Alzheimer Disease” and “Neuroimaging”. Our approach can be applied broadly to documents with abstracts that are similar to (or cite) MEDLINE abstracts, which would help with linking and searching across bibliographic databases beyond MEDLINE.
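The abstract does not specify the similarity measure, the features, or the classifier, so the following is only a minimal sketch of the general idea under stated assumptions: bag-of-words cosine similarity stands in for abstract similarity, the two evidence sources are summarized as label shares among the k most similar abstracts and among the cited records, and a fixed weighted vote stands in for the trained classifier. The record IDs, toy abstracts, weights, function names, and the "Other Heterocyclic Compounds" label (a placeholder for the remaining siblings) are all hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased token counts (an assumed stand-in for the real similarity model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def sibling_features(doc_text, cited_ids, labeled, siblings, k=2):
    """Per sibling term: (share of the k most similar abstracts, share of cited records) carrying it."""
    query = bag_of_words(doc_text)
    ranked = sorted(
        labeled,
        key=lambda rid: cosine(query, bag_of_words(labeled[rid]["abstract"])),
        reverse=True,
    )
    neighbors = ranked[:k]
    # Citations to records outside the labeled set carry no label evidence.
    cited = [rid for rid in cited_ids if rid in labeled]

    def share(ids, term):
        return sum(term in labeled[r]["mesh"] for r in ids) / len(ids) if ids else 0.0

    return {t: (share(neighbors, t), share(cited, t)) for t in siblings}

def predict_sibling(features, w_abstract=0.5, w_citation=0.5):
    """Weighted vote over both evidence sources (assumed stand-in for a trained classifier)."""
    return max(features, key=lambda t: w_abstract * features[t][0] + w_citation * features[t][1])

# Hypothetical labeled records; real input would be MEDLINE records with MeSH terms.
LABELED = {
    "r1": {"abstract": "synthesis of indole and quinoline derivatives",
           "mesh": {"Heterocyclic Compounds, 2-Ring"}},
    "r2": {"abstract": "benzimidazole and indole scaffolds as kinase inhibitors",
           "mesh": {"Heterocyclic Compounds, 2-Ring"}},
    "r3": {"abstract": "pyridine and imidazole ligands in catalysis",
           "mesh": {"Other Heterocyclic Compounds"}},
    "r4": {"abstract": "furan and pyrrole polymerization kinetics",
           "mesh": {"Other Heterocyclic Compounds"}},
}
SIBLINGS = ["Heterocyclic Compounds, 2-Ring", "Other Heterocyclic Compounds"]

features = sibling_features(
    "novel indole derivatives as kinase inhibitors",
    cited_ids=["r2", "r9"],  # "r9" is not in the labeled set and is ignored
    labeled=LABELED,
    siblings=SIBLINGS,
)
prediction = predict_sibling(features)
print(prediction)
```

In a fuller version, the per-sibling feature pairs would be the inputs to a classifier trained and evaluated with cross-validation, rather than combined with fixed weights.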