Unsupervised Medical Subject Heading Assignment Using Output Label Co-occurrence Statistics and Semantic Predications

Natural language processing and information systems : ... International Conference on Applications of Natural Language to Information Systems, NLDB ... revised papers. International Conference on Applications of Natural Language to Info...
Pub Date: 2013-06-01
DOI: 10.1007/978-3-642-38824-8_15

Ramakanth Kavuluru, Zhenghao He



Abstract



Librarians at the National Library of Medicine (NLM) tag each biomedical abstract to be indexed by their PubMed information system with terms from the Medical Subject Headings (MeSH) terminology. The MeSH terminology has over 26,000 terms, and indexers look at each article's full text to assign the set of terms most suitable for indexing it. Several recent automated attempts focused on using the article title and abstract text to identify MeSH terms for the corresponding article. Most of these approaches used supervised machine learning techniques that rely on already indexed articles and their corresponding MeSH terms. In this paper, we present a novel unsupervised approach using named entity recognition, relationship extraction, and output label co-occurrence frequencies of MeSH term pairs from the existing set of 22 million articles already indexed with MeSH terms by librarians at NLM. The main goal of our study is to gauge the potential of output label co-occurrence statistics and relationships extracted from free text in unsupervised indexing approaches. In biomedical domains especially, output label co-occurrences are generally easier to obtain than training data involving document and label set pairs, owing to the sensitive nature of textual documents containing protected health information. Our methods achieve a micro F-score that is comparable to those obtained using supervised machine learning techniques with training data consisting of document and label set pairs. Baseline comparisons reveal strong prospects for further research in exploiting label co-occurrences and relationships extracted from free text in recommending terms for indexing biomedical articles.
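The abstract does not spell out how the co-occurrence frequencies or the micro F-score are computed, so the following is only a minimal sketch of the two generic quantities it names, not the authors' pipeline. The function names `pair_cooccurrence` and `micro_f1` and the toy label sets are assumptions made for illustration. Note that the first function needs only the label sets, not the documents themselves, which is the property the abstract highlights.

```python
from collections import Counter
from itertools import combinations


def pair_cooccurrence(label_sets):
    """Count how often each unordered pair of labels is assigned together."""
    counts = Counter()
    for labels in label_sets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) merge
        for pair in combinations(sorted(set(labels)), 2):
            counts[pair] += 1
    return counts


def micro_f1(gold_sets, predicted_sets):
    """Micro-averaged F1: pool true/false positives over all documents first."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sets, predicted_sets):
        gold, pred = set(gold), set(pred)
        tp += len(gold & pred)   # terms both assigned and predicted
        fp += len(pred - gold)   # predicted but not assigned
        fn += len(gold - pred)   # assigned but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy label sets (hypothetical, not taken from the paper's data)
indexed = [
    {"Humans", "Neoplasms", "Mutation"},
    {"Humans", "Neoplasms"},
    {"Humans", "Mutation"},
]
counts = pair_cooccurrence(indexed)
score = micro_f1([{"Humans", "Neoplasms"}], [{"Humans", "Mutation"}])
```

Micro-averaging pools counts across all articles before computing the score, so articles with many assigned terms weigh more than they would under a per-article average.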
