{"title":"The Archival Acid Test: Evaluating archive performance on advanced HTML and JavaScript","authors":"Mat Kelly, Michael L. Nelson, Michele C. Weigle","doi":"10.1109/JCDL.2014.6970146","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970146","url":null,"abstract":"When preserving web pages, archival crawlers sometimes produce a result that varies from what an end-user expects. To quantitatively evaluate the degree to which an archival crawler is capable of comprehensively reproducing a web page from the live web into the archives, the crawlers' capabilities must be evaluated. In this paper, we propose a set of metrics to evaluate the capability of archival crawlers and other preservation tools using the Acid Test concept. For a variety of web preservation tools, we examine previous captures within web archives and note the features that produce incomplete or unexpected results. From there, we design the test to produce a quantitative measure of how well each tool performs its task.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"95 1","pages":"25-28"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90523481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PageRank-based Word Sense Induction within Web Search Results Clustering","authors":"Jose G. Moreno, G. Dias","doi":"10.1109/JCDL.2014.6970227","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970227","url":null,"abstract":"Word Sense Induction is an open problem in Natural Language Processing. Many recent works have been addressing this problem with a wide spectrum of strategies based on content analysis. In this paper, we present a sense induction strategy exclusively based on link analysis over the Web. In particular, we explore the idea that the main different senses of a given word share similar linking properties and can be found by performing clustering with link-based similarity metrics. The evaluation results show that PageRank-based sense induction achieves interesting results when compared to state-of-the-art content-based algorithms in the context of Web Search Results Clustering.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"145 1","pages":"465-466"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89086956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowd-sourcing Web knowledge for metadata extraction","authors":"Zhaohui Wu, W. Huang, Chen Liang, C. Lee Giles","doi":"10.1109/JCDL.2014.6970160","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970160","url":null,"abstract":"We explore a new metadata extraction framework that requires no human annotators, using ground truth harvested from the Web. A new training sample is selected based not only on its uncertainty and representativeness in the unlabeled pool but also on its availability and credibility in Web knowledge bases. We construct a dataset of 4329 books with valid metadata and evaluate our approach using 5 Web book databases as oracles. Empirical results demonstrate its effectiveness and efficiency.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"135 1","pages":"141-144"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86424825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The value of risk management for data management in science and engineering","authors":"Filipe Ferreira, Ricardo Vieira, J. Borbinha","doi":"10.1109/JCDL.2014.6970214","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970214","url":null,"abstract":"An established concept to address data management challenges in science and engineering is the Data Management Plans. However, we claim that in some complex scenarios the actual principles for Data Management Plans might not be enough, especially when Risk Management becomes relevant. Therefore, we propose a method, based on the ISO 31000, for science and engineering projects to create a Risk Management Plan that can complement the Data Management Plan. The validation of this proposal is presented in the real case of an engineering laboratory.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"29 6 1","pages":"439-440"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81444292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human and machine error analysis on dependency parsing of ancient Greek texts","authors":"Saeed Majidi, G. Crane","doi":"10.1109/JCDL.2014.6970171","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970171","url":null,"abstract":"Automatically generated metadata from large collections is an essential component of digital libraries. It is beginning to emerge as fundamental to the study of languages. Morphosyntactic annotation captures the form of individual words and their function. Nonetheless, automated syntactic analysis is still imperfect and human annotators can be significantly more accurate. On the other hand, human work is expensive and even humans find some constructions difficult to annotate correctly. Comparing the performance of human annotators with that of an automatic parser is thus important for exploring how the two methods can best be combined. In the present study, we compare the frequency of the different types of errors made by student annotators with those made by different dependency parsers when annotating ancient Greek. With a few exceptions, the frequency of the different types of errors was similar for human and machine. The significance of these results is briefly discussed.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"4 1","pages":"221-224"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81664669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research networks in data repositories","authors":"Mark R. Costa, Jian Qin, Jun Wang","doi":"10.1109/JCDL.2014.6970197","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970197","url":null,"abstract":"This paper reports our ongoing work investigating the structural features of scientific collaboration based on metadata collected from a scientific data repository (SDR). The background literature is reviewed in support of our claim that metadata collected from SDRs offer a complementary data source to traditional publication metadata collected from digital libraries. Methodological considerations are discussed in association with using metadata from SDRs, including author name disambiguation and data parsing. Initial findings show that the network has some unique macro-level structural features while also being in agreement with existing network theories. Challenges due to inconsistent metadata quality control procedures are also discussed in an attempt to reinforce claims that metadata should be designed to support both domain specific retrieval and evaluation and assessment needs.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"83 1","pages":"403-406"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84428949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mink: Integrating the live and archived web viewing experience using web browsers and memento","authors":"Mat Kelly, Michael L. Nelson, Michele C. Weigle","doi":"10.1109/JCDL.2014.6970229","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970229","url":null,"abstract":"We describe Mink, a new web browser extension that provides a different model for integration of the live and archived web. While a user browses the live web, Mink actively queries the archives and reports other instances of the page in the archives without requiring active querying by the user. Further, by querying the archives dynamically and asynchronously, a user can view the extent to which the currently viewed page on the live web has been archived and proactively submit a request to various archives using an overlay on the live web page and a simple interface.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"42 1","pages":"469-470"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86737864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing Digital Preservation Strategy: Developing content collection profiles at the British Library","authors":"M. Day, A. MacDonald, M. Pennock, Akiko Kimura","doi":"10.1109/JCDL.2014.6970145","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970145","url":null,"abstract":"The British Library is increasingly a digital library. Through both digitization and acquisition, it has built up significant collections of digital content covering a very wide range of content types. Most recently, the extension of legal deposit provisions to non-print works in 2013 has meant that it - working in conjunction with the other UK legal deposit libraries - has begun to collect new categories of digital content, including periodic harvests of the UK Web domain. In order to support this, the Library has also invested heavily in developing scalable infrastructures for the acquisition, storage and management of large amounts of digital content. The British Library Digital Preservation Strategy, 2013-2016 is focused on the embedding of digital sustainability as an organizational principle across the Library and to help manage preservation risks and challenges across all digital collection content lifecycles. This practice paper describes work being undertaken by the Digital Preservation Team at the British Library to develop content profiles of high-level digital collections that will support the implementation of the strategy, in particular for the capture of long-term preservation requirements.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"8 1","pages":"21-24"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85788003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The anatomy of a search and mining system for digital humanities","authors":"Martyn Harris, M. Levene, Dell Zhang, D. Levene","doi":"10.1109/JCDL.2014.6970163","DOIUrl":"https://doi.org/10.1109/JCDL.2014.6970163","url":null,"abstract":"Samtla (Search And Mining Tools with Linguistic Analysis) is an online integrated research environment designed in collaboration with historians and linguists to facilitate the study of digitised texts written in any language. It currently supports the research of two corpora: the Genizah collection held by the Taylor-Schechter Genizah Research Unit in Cambridge University, and a collection of Aramaic incantation texts from late antiquity. In contrast to standard search engines and text mining systems that rely on the bag-of-words representation of text, Samtla provides the retrieval and discovery of fuzzy text patterns/motifs (aka “formulae” to historians), which is achieved through applying a character-based n-gram statistical language model built on top of a powerful generalised suffix tree data structure. This paper briefly describes the major components of Samtla and their underlying techniques.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"11 1","pages":"165-168"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80280288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explore the stacks: A system for exploration in large digital libraries","authors":"M. Hall","doi":"10.5555/2740769.2740845","DOIUrl":"https://doi.org/10.5555/2740769.2740845","url":null,"abstract":"Providing access to large digital library collections to novice users requires novel interfaces that are not built around the concept of search, as novice users frequently struggle to formulate appropriate queries. This paper presents the “Explore the Stacks” system, which provides a novel, browsing-focused interface for exploring digital library collections that is applicable to Big Data scale digital libraries. The system is demonstrated using a collection of approximately one million book illustrations provided by the British Library.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"1 1","pages":"417-418"},"PeriodicalIF":0.0,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79898587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}