Generating Stories From Archived Collections

Proceedings of the 2017 ACM on Web Science Conference
Pub Date: 2017-06-25
DOI: 10.1145/3091478.3091508

Yasmin AlNoamany, Michele C. Weigle, Michael L. Nelson


Citations: 23

Abstract

With the extensive growth of the Web, multiple Web archiving initiatives have been started to archive different aspects of the Web. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, "storytelling" is becoming a popular technique in social media for selecting Web resources to support a particular narrative or "story". We address the problem of understanding archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate "storytelling" social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users already are familiar with, such as Storify. Inspired by the Turing Test, we evaluate the stories automatically generated by the DSA framework against a ground truth dataset of hand-crafted stories, generated by expert archivists from Archive-It collections. Using Amazon's Mechanical Turk, we found that the stories automatically generated by DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.
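The select-and-order step described in the abstract can be illustrated with a minimal sketch. It is not the DSA implementation: the `Memento` record, the `quality` score (word count as a stand-in for page evaluation), and the rule of taking one page per contiguous slice are all assumptions made for illustration. The sketch only shows the general shape of choosing a few representative captures from a collection and returning them in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Memento:
    """One archived capture of a page (hypothetical record layout)."""
    url: str
    captured: datetime
    text: str


def quality(m: Memento) -> float:
    """Placeholder score (assumption): more words means a better candidate."""
    return float(len(m.text.split()))


def build_story(mementos: list[Memento], k: int) -> list[Memento]:
    """Pick up to k captures, one per contiguous slice, oldest first."""
    if k <= 0 or not mementos:
        return []
    # Sort by capture time so that slices are contiguous periods.
    ordered = sorted(mementos, key=lambda m: m.captured)
    n = len(ordered)
    k = min(k, n)  # never ask for more pages than exist
    story = []
    for i in range(k):
        # Slice i covers a roughly equal share of the sorted captures.
        chunk = ordered[i * n // k:(i + 1) * n // k]
        # Keep the highest-scoring capture from this slice.
        story.append(max(chunk, key=quality))
    return story
```

Because each slice is taken from the time-sorted list, the selected pages come out in chronological order without a second sort. A real system would replace `quality` with whatever evaluation criteria the collection calls for.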
