Constructing query-biased summaries: a comparison of human and system generated snippets

L. L. Bando, Falk Scholer, A. Turpin
{"title":"Constructing query-biased summaries: a comparison of human and system generated snippets","authors":"L. L. Bando, Falk Scholer, A. Turpin","doi":"10.1145/1840784.1840813","DOIUrl":null,"url":null,"abstract":"Modern search engines display a summary for each ranked document that is returned in response to a query. These summaries typically include a snippet -- a collection of text fragments from the underlying document -- that has some relation to the query that is being answered.\n In this study we investigate how 10 humans construct snippets: participants first generate their own natural language snippet, and then separately extract a snippet by choosing text fragments, for four queries related to two documents. By mapping their generated snippets back to text fragments in the source document using eye tracking data, we observe that participants extract these same pieces of text around 73% of the time when creating their extractive snippets.\n In comparison, we notice that automated approaches for extracting snippets only use these same fragments 10% of the time. However, when the automated methods are evaluated using a position-independent bag-of-words approach, as typically used in the research literature for evaluating snippets, they are scored much more highly, seemingly extracting the \"correct\" text 24% of the time.\n In addition to demonstrating this large scope for improvement in snippet generation algorithms with our novel methodology, we also offer a series of observations on the behaviour of participants as they constructed their snippets.","PeriodicalId":413481,"journal":{"name":"International Conference on Information Interaction in Context","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Information Interaction in Context","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1840784.1840813","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

Modern search engines display a summary for each ranked document that is returned in response to a query. These summaries typically include a snippet -- a collection of text fragments from the underlying document -- that has some relation to the query that is being answered. In this study we investigate how 10 humans construct snippets: participants first generate their own natural language snippet, and then separately extract a snippet by choosing text fragments, for four queries related to two documents. By mapping their generated snippets back to text fragments in the source document using eye tracking data, we observe that participants extract these same pieces of text around 73% of the time when creating their extractive snippets. In comparison, we notice that automated approaches for extracting snippets only use these same fragments 10% of the time. However, when the automated methods are evaluated using a position-independent bag-of-words approach, as typically used in the research literature for evaluating snippets, they are scored much more highly, seemingly extracting the "correct" text 24% of the time. In addition to demonstrating this large scope for improvement in snippet generation algorithms with our novel methodology, we also offer a series of observations on the behaviour of participants as they constructed their snippets.
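The position-independent bag-of-words evaluation mentioned above ignores where matched words occur, which is how automated snippets can appear "correct" 24% of the time while sharing only 10% of the humans' actual extracted fragments. As a rough illustration only, the sketch below computes an order-insensitive token-recall overlap between a system snippet and a human reference; the function name and the exact recall formulation are assumptions, not the paper's specific scoring method.

# Minimal sketch of a position-independent bag-of-words overlap score.
# Assumes a simple token-recall formulation; the paper does not specify
# its exact variant here, so treat this as illustrative only.
from collections import Counter

def bag_of_words_overlap(system_snippet: str, reference_snippet: str) -> float:
    """Fraction of reference tokens matched by the system snippet,
    ignoring word order and position entirely."""
    sys_counts = Counter(system_snippet.lower().split())
    ref_counts = Counter(reference_snippet.lower().split())
    matched = sum(min(sys_counts[tok], count) for tok, count in ref_counts.items())
    return matched / max(sum(ref_counts.values()), 1)

if __name__ == "__main__":
    ref = "participants extract text fragments from the source document"
    sys_snip = "the source document fragments participants extract"
    # Scores 0.75 despite the word order being scrambled.
    print(f"overlap = {bag_of_words_overlap(sys_snip, ref):.2f}")

Because such a score discards position and contiguity, a snippet assembled from scattered words can match a reference without containing any of its contiguous text fragments, which is consistent with the gap the authors report between bag-of-words scores and fragment-level agreement.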