Soumya Gayen, Deepak Gupta, Russell F Loane, Nicholas C Ide, Dina Demner-Fushman
{"title":"Effects of Porting Essie Tokenization and Normalization to Solr.","authors":"Soumya Gayen, Deepak Gupta, Russell F Loane, Nicholas C Ide, Dina Demner-Fushman","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Search for information is now an integral part of healthcare. Searches are enabled by search engines whose objective is to efficiently retrieve the relevant information for the user query. When it comes to retrieving biomedical text and literature, Essie search engine developed at the National Library of Medicine (NLM) performs exceptionally well. However, Essie is a software system developed for NLM that has ceased development and support. On the other hand, Solr is a popular opensource enterprise search engine used by many of the world's largest internet sites, offering continuous developments and improvements along with the state-of-the-art features. In this paper, we present our approach to porting the key features of Essie and developing custom components to be used in Solr. We demonstrate the effectiveness of the added components on three benchmark biomedical datasets. The custom components may aid the community in improving search methods for biomedical text retrieval.</p>","PeriodicalId":72180,"journal":{"name":"AMIA ... Annual Symposium proceedings. AMIA Symposium","volume":"2023 ","pages":"369-378"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10785910/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AMIA ... Annual Symposium proceedings. AMIA Symposium","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Search for information is now an integral part of healthcare. Searches are enabled by search engines whose objective is to efficiently retrieve the relevant information for the user query. When it comes to retrieving biomedical text and literature, Essie search engine developed at the National Library of Medicine (NLM) performs exceptionally well. However, Essie is a software system developed for NLM that has ceased development and support. On the other hand, Solr is a popular opensource enterprise search engine used by many of the world's largest internet sites, offering continuous developments and improvements along with the state-of-the-art features. In this paper, we present our approach to porting the key features of Essie and developing custom components to be used in Solr. We demonstrate the effectiveness of the added components on three benchmark biomedical datasets. The custom components may aid the community in improving search methods for biomedical text retrieval.