ReadFast: Optimizing structural search relevance for big biomedical text

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI). Pub Date: 2013-08-01. DOI: 10.1109/IRI.2013.6642540

M. Gubanov, A. Pyayt

Citations: 5

Abstract

While finding needed information on the Web remains a critical problem, it is arguably much less pressing today than it was over a decade ago, when the Web was emerging. Back then it was much more difficult to find a Web resource of interest, because search engines were in their infancy: their indices covered a much smaller portion of the Web, and their page-ranking algorithms were embryonic. Web search is still far from perfect, but it has come a long way and has become an everyday “go-to” resource for millions of people. By contrast, access to textual information is not even close to what Web-search algorithms offer today. In fact, it does not differ much from what was available a decade ago: keyword search (exact substring match) is often the only way to find a needle in a haystack in most modern word processors and text-corpus search engines. Here we demonstrate ReadFast, a system capable of extracting certain structure from any natural-language text corpus and using it to provide more relevant search results than keyword search for specific classes of queries. Our evaluation showed a significant relevance gain (20-30%) on two large biomedical text corpora.
