ReadFast: Optimizing structural search relevance for big biomedical text

M. Gubanov, A. Pyayt
{"title":"ReadFast: Optimizing structural search relevance for big biomedical text","authors":"M. Gubanov, A. Pyayt","doi":"10.1109/IRI.2013.6642540","DOIUrl":null,"url":null,"abstract":"While the problem to find needed information on the Web is critical, it is arguably much less pressing nowadays than it was over a decade ago when the Web was emerging. Back then it was much more difficult to find a Web resource of interest, because the search engines were in their infancy covering much lesser portion of the Web by their indices, armed with embryonic page ranking algorithms. Now, Web-search is by far not perfect yet, but definitely went a long way to become an everyday “go-to” resource for millions of people. By contrast, access to textual information is not even close to what Web-search algorithms offer today. In fact, it does not differ much from what everyone had a decade ago. That is keyword-search (exact substring match) is often the only way to find needle in a haystack in most modern word processors and text corpora search engines. Here we demonstrate ReadFast - a system, capable to extract certain structure from any natural language text corpus and use it to provide more relevant search results than keyword-search for specific classes of queries. Our evaluation justified significant relevance gain (20-30%) for two large Biomedical text corpora.","PeriodicalId":418492,"journal":{"name":"2013 IEEE 14th International Conference on Information Reuse & Integration (IRI)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 14th International Conference on Information Reuse & Integration (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI.2013.6642540","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

While the problem to find needed information on the Web is critical, it is arguably much less pressing nowadays than it was over a decade ago when the Web was emerging. Back then it was much more difficult to find a Web resource of interest, because the search engines were in their infancy covering much lesser portion of the Web by their indices, armed with embryonic page ranking algorithms. Now, Web-search is by far not perfect yet, but definitely went a long way to become an everyday “go-to” resource for millions of people. By contrast, access to textual information is not even close to what Web-search algorithms offer today. In fact, it does not differ much from what everyone had a decade ago. That is keyword-search (exact substring match) is often the only way to find needle in a haystack in most modern word processors and text corpora search engines. Here we demonstrate ReadFast - a system, capable to extract certain structure from any natural language text corpus and use it to provide more relevant search results than keyword-search for specific classes of queries. Our evaluation justified significant relevance gain (20-30%) for two large Biomedical text corpora.
ReadFast:优化大型生物医学文本的结构搜索相关性
虽然在Web上找到所需信息的问题是至关重要的,但与十多年前Web刚刚出现时相比,现在这个问题已经不那么紧迫了。那时要找到感兴趣的网络资源要困难得多,因为搜索引擎还处于起步阶段,它们的索引覆盖了网络的一小部分,并配备了萌芽的页面排名算法。现在,网络搜索到目前为止还不完美,但它已经成为数百万人的日常“首选”资源。相比之下,对文本信息的访问甚至与今天的网络搜索算法所提供的还差得远。事实上,它与十年前的情况没有太大区别。也就是说,在大多数现代文字处理器和文本语料库搜索引擎中,关键字搜索(精确的子字符串匹配)通常是大海捞针的唯一方法。这里我们展示ReadFast——一个能够从任何自然语言文本语料库中提取特定结构的系统,并使用它为特定查询类别提供比关键字搜索更相关的搜索结果。我们的评估证明了两个大型生物医学文本语料库的显著相关性增益(20-30%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信