Title: ReadFast: Optimizing structural search relevance for big biomedical text
Authors: M. Gubanov, A. Pyayt
Venue: 2013 IEEE 14th International Conference on Information Reuse & Integration (IRI)
Published: 2013-08-01
DOI: 10.1109/IRI.2013.6642540 (https://doi.org/10.1109/IRI.2013.6642540)
Citations: 5
Abstract
While finding needed information on the Web remains a critical problem, it is arguably much less pressing now than it was over a decade ago, when the Web was emerging. Back then it was much more difficult to find a Web resource of interest, because search engines were in their infancy: their indices covered a much smaller portion of the Web, and their page-ranking algorithms were embryonic. Web search is still far from perfect, but it has come a long way and has become an everyday “go-to” resource for millions of people. By contrast, access to textual information is not even close to what Web-search algorithms offer today. In fact, it does not differ much from what was available a decade ago: keyword search (exact substring match) is often the only way to find a needle in a haystack in most modern word processors and text-corpus search engines. Here we demonstrate ReadFast, a system capable of extracting certain structure from any natural-language text corpus and using it to provide more relevant search results than keyword search for specific classes of queries. Our evaluation showed a significant relevance gain (20-30%) on two large biomedical text corpora.