ReadFast: Optimizing structural search relevance for big biomedical text

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI). Pub Date: 2013-08-01. DOI: 10.1109/IRI.2013.6642540

M. Gubanov, A. Pyayt


Citations: 5

Abstract

Finding needed information on the Web remains a critical problem, but it is arguably much less pressing now than it was over a decade ago, when the Web was emerging. Back then it was much harder to find a Web resource of interest: search engines were in their infancy, their indices covered a much smaller portion of the Web, and their page-ranking algorithms were embryonic. Web search is still far from perfect, but it has come a long way and is now an everyday “go-to” resource for millions of people. By contrast, access to textual information is not even close to what Web-search algorithms offer today; in fact, it differs little from what was available a decade ago. Keyword search (exact substring match) is often the only way to find a needle in a haystack in most modern word processors and text-corpus search engines. Here we demonstrate ReadFast, a system capable of extracting certain structure from any natural-language text corpus and using it to provide more relevant search results than keyword search for specific classes of queries. Our evaluation showed a significant relevance gain (20-30%) on two large biomedical text corpora.

