Identifying commented passages of documents using implicit hyperlinks

UK Conference on Hypertext. Pub Date: 2006-08-22. DOI: 10.1145/1149941.1149960

Jean-Yves Delort


Citations: 19

Abstract


This paper addresses the issue of automatically selecting passages of blog posts using readers' comments. The problem is difficult because: (i) the textual content of blogs is often noisy, (ii) comments do not always target passages of the posts, and (iii) comments are not equally useful for identifying important passages. We have developed a system for selecting commented passages which takes as input blog posts and their comments and delivers, for each post, the sentences of the post which are the most commented and/or the most discussed. Our approach combines three steps to identify commented passages of a post. The first step is to remove the complexity of processing the contents of posts and comments using heuristics adapted to the language of the blog. The second step is to find useful comments and assign them a degree of relevance using a model automatically built and validated by an expert. The third step is to identify important passages using relevant comments. We conducted two experiments to evaluate the usefulness and the effectiveness of our approach. The first study shows that, in only 50% of the posts, the most commented sentence elicited by our approach corresponds to the post extract generated using generic summarization. In the second study, human participants confirmed that, in practice, selected passages are frequently commented passages.
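The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's system: the abstract does not specify the heuristics, the relevance model, or the scoring, so everything here is an assumption made for illustration. The stopword list, the word-overlap relevance measure, the `min_relevance` threshold, and all function names are hypothetical stand-ins. In particular, the paper's relevance model is built automatically and validated by an expert, whereas this sketch substitutes a simple vocabulary-overlap ratio.

```python
import re

# Hypothetical, tiny stopword list; a real system would use heuristics
# adapted to the language of the blog.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "this", "that", "i"}


def tokens(text):
    """Step 1 (simplified): lowercase, keep word-like tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(post):
    """Naive sentence splitter on terminal punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]


def comment_relevance(comment, post_vocab):
    """Step 2 (stand-in model): share of comment tokens that also occur in the post."""
    toks = tokens(comment)
    if not toks:
        return 0.0
    return sum(t in post_vocab for t in toks) / len(toks)


def most_commented_sentences(post, comments, top_k=1, min_relevance=0.2):
    """Step 3: score each sentence by relevance-weighted overlap with comments."""
    sentences = split_sentences(post)
    sent_tokens = [set(tokens(s)) for s in sentences]
    post_vocab = set().union(*sent_tokens)
    scores = [0.0] * len(sentences)
    for comment in comments:
        rel = comment_relevance(comment, post_vocab)
        if rel < min_relevance:
            continue  # treat low-relevance comments as not useful
        comment_toks = set(tokens(comment))
        for i, st in enumerate(sent_tokens):
            if st:
                scores[i] += rel * len(st & comment_toks) / len(st)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in ranked[:top_k]]


post = ("The new phone has a great camera. "
        "Battery life is disappointing. "
        "Shipping was fast.")
comments = [
    "Totally agree, the battery life is awful.",
    "My battery barely lasts a day.",
    "lol first!",
]
top_sentence, top_score = most_commented_sentences(post, comments)[0]
print(top_sentence)
```

In this toy input, the comment that shares no vocabulary with the post falls below the threshold and is ignored, which mirrors the abstract's point that comments are not equally useful for identifying important passages.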
