Creating order from the mess: web archive derivative datasets and notebooks

Impact factor: 0.8; Zone 3 (Sociology); JCR category: HUMANITIES, MULTIDISCIPLINARY
Nick Ruest, Samantha Fritz, Ian Milligan
Archives and Records: The Journal of the Archives and Records Association, vol. 43 (1), pp. 316-331. Published 2022-09-02.
DOI: 10.1080/23257962.2022.2100336
Journal article. Citations: 2

Abstract

For a quarter-century, memory institutions have been preserving web-based content. These web archives have been collected and stored in ARC and WARC (W/ARC) file formats and will form a basis for contemporary histories. Yet these formats present significant challenges to researchers who wish to access and use web archival data, primarily because of the nature of collecting, storing, and providing access to these multifaceted digital objects. In other words, web archives are messy. Applying traditional archival methods of description to born-digital collections is complicated by issues of provenance, original order, and scale. However, we believe that archival description offers a practical starting point for thinking about access. This paper argues that a robust finding aid must extend beyond basic collection-level description to allow for more meaningful interactions with web archives. As such, we propose reimagining the traditional finding-aid model as a three-level mode of description that includes computational methods, the generation of derivative datasets, and interactive code-rich notebooks. Together, these three components expand access to and use of web archives.
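To make the idea of a derivative dataset more concrete, here is a minimal Python sketch. It is not the authors' implementation or toolkit. It assumes that capture records have already been read out of W/ARC files (for example, with a WARC-reading library) as hypothetical `(url, mime_type)` pairs. It then reduces those records to a small domain-frequency table that a researcher could open in a spreadsheet or notebook. The function names `domain_frequency` and `to_csv` and the sample URLs are illustrative only.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def domain_frequency(records):
    """Count HTML captures per host.

    `records` is assumed (hypothetically) to be an iterable of
    (url, mime_type) pairs already extracted from W/ARC files.
    """
    counts = Counter()
    for url, mime_type in records:
        # Keep only HTML pages; this filter is an illustrative choice.
        if not mime_type.startswith("text/html"):
            continue
        host = urlsplit(url).hostname or ""
        # Group www.example.org with example.org.
        if host.startswith("www."):
            host = host[4:]
        counts[host] += 1
    # Sorted by descending count: the derivative dataset itself.
    return counts.most_common()


def to_csv(rows):
    """Serialise (domain, count) rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["domain", "count"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical captures standing in for records read from W/ARC files.
    sample = [
        ("https://www.example.org/", "text/html; charset=utf-8"),
        ("https://example.org/about", "text/html"),
        ("https://news.example.com/story", "text/html"),
        ("https://example.org/logo.png", "image/png"),
    ]
    print(to_csv(domain_frequency(sample)), end="")
```

The key design point is the separation of concerns. Reading W/ARC files happens once. The output is a small, flat table, which is far easier to describe, share, and load into a notebook than the original container files.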
Source journal metrics: CiteScore 0.90; self-citation rate 0.00%; articles published: 45