MeSH Indexing Using the Biomedical Citation Network
William Gasper, P. Chundi, D. Ghersi
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2020-09-21
DOI: https://doi.org/10.1145/3388440.3412466
Citations: 2
Abstract
PubMed contains over 30 million biomedical literature citations and is an invaluable resource for researchers, medical professionals, students, and curious individuals. Search and retrieval are significantly enhanced by PubMed's Medical Subject Headings (MeSH) indexing process, which still requires a substantial manual component. Traditional machine learning methods are difficult to apply effectively to large-scale semantic indexing problems, and this difficulty has impeded complete automation of MeSH indexing. PubMed citations are particularly challenging to index: documents are often indexed with a dozen or more terms, and most terms occur extremely infrequently in the document set. This work examines the biomedical literature citation network and the MeSH vocabulary for viable signal that might benefit the indexing process. Simple predictive models using features generated from the biomedical literature citation network proved useful and effective in recommending MeSH terms for document indexing. A neural network performed comparably to the simple models in raw performance but produced qualitatively different term recommendations.
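The abstract does not say which citation-network features or models were used, so the following is only a minimal sketch of one plausible feature of this kind: scoring each candidate MeSH term by the fraction of a document's citation neighbors that are already indexed with it. The function name `recommend_terms`, the document IDs, and the term assignments are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def recommend_terms(doc_id, citations, mesh_terms, top_k=3):
    """Rank MeSH terms by the share of citation neighbors indexed with them.

    citations:  dict mapping a document ID to a list of neighbor IDs
                (documents it cites or is cited by) -- assumed input shape.
    mesh_terms: dict mapping a document ID to its known MeSH terms.
    Returns up to top_k (term, score) pairs, highest score first.
    """
    neighbors = citations.get(doc_id, [])
    if not neighbors:
        return []

    counts = Counter()
    for neighbor in neighbors:
        # Count each term at most once per neighbor.
        counts.update(set(mesh_terms.get(neighbor, ())))

    scored = [(term, count / len(neighbors)) for term, count in counts.items()]
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy data (hypothetical): doc0 is unindexed, its three neighbors are indexed.
citations = {"doc0": ["doc1", "doc2", "doc3"]}
mesh_terms = {
    "doc1": ["Humans", "Neoplasms"],
    "doc2": ["Humans", "Neoplasms", "Mice"],
    "doc3": ["Humans"],
}

for term, score in recommend_terms("doc0", citations, mesh_terms):
    print(f"{term}: {score:.2f}")
```

A neighbor-frequency score like this needs no training and handles rare terms naturally, since a term only has to appear among a document's neighbors rather than often enough in the whole corpus to be learned. Whether it resembles the models evaluated in the paper is an assumption.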