Cloud computing data capsules for non-consumptive use of texts

Jiaan Zeng, Guangchen Ruan, Alexander Crowell, A. Prakash, Beth Plale
Scientific Cloud Computing, published 2014-06-23
DOI: 10.1145/2608029.2608031 (https://doi.org/10.1145/2608029.2608031)
Citations: 40

Abstract

As digital data sources grow in number and size, they present an opportunity for computational investigation through text mining, natural language processing (NLP), and other text analysis techniques. In this paper we propose a virtual machine (VM) framework and methodology for non-consumptive text analysis. Under a remote VM model, the VM is configured with software and tooling for text analysis. When the analysis is complete, the VM is wiped and its resources are released for other users to share. Our approach extends the VM by turning it into a data capsule that prevents leakage of copyrighted content in the event that the VM is compromised. The HathiTrust Research Center Data Capsules has seen early use against the HathiTrust repository of digitized books from university libraries nationwide.
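The lifecycle described in the abstract (configure a VM, run the analysis, then wipe it and release its resources) can be illustrated with a minimal sketch. This is not the HathiTrust Research Center implementation: `Capsule`, `ResourcePool`, `data_capsule`, and `word_counts` are hypothetical names, and the "VM" is a plain in-memory object. The sketch also does not model how a data capsule prevents leakage, since the abstract does not detail that mechanism. It only shows one way to guarantee that the wipe-and-release step runs even when the analysis fails.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Hypothetical in-memory stand-in for a remote analysis VM."""
    owner: str
    tools: list = field(default_factory=list)
    scratch: dict = field(default_factory=dict)
    wiped: bool = False


class ResourcePool:
    """Hypothetical pool of VM slots shared among users."""

    def __init__(self, slots):
        self.free = slots

    def acquire(self):
        if self.free == 0:
            raise RuntimeError("no free VM slots")
        self.free -= 1

    def release(self):
        self.free += 1


@contextmanager
def data_capsule(pool, owner, tools):
    """Provision a capsule, hand it to the caller, then wipe and release it."""
    pool.acquire()
    capsule = Capsule(owner=owner, tools=list(tools))
    try:
        yield capsule
    finally:
        # Runs on success and on failure: discard state, free the slot.
        capsule.scratch.clear()
        capsule.tools.clear()
        capsule.wiped = True
        pool.release()


def word_counts(capsule, texts):
    """Toy analysis: return aggregate word counts, not the texts themselves."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    capsule.scratch["counts"] = counts
    return dict(counts)


pool = ResourcePool(slots=1)
with data_capsule(pool, "researcher", ["tokenizer"]) as cap:
    result = word_counts(cap, ["The whale", "the sea"])
```

After the `with` block exits, the capsule's scratch space and tool list are empty and the slot is back in the pool, while `result` holds only derived counts. Using a context manager here is a design choice of this sketch; it keeps cleanup tied to the scope of the analysis instead of relying on the user to trigger it.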