Yusra Shakeel, Rand Alchokr, J. Krüger, G. Saake, Thomas Leich
{"title":"Are Altmetrics Proxies or Complements to Citations for Assessing Impact in Computer Science?","authors":"Yusra Shakeel, Rand Alchokr, J. Krüger, G. Saake, Thomas Leich","doi":"10.1109/JCDL52503.2021.00037","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00037","url":null,"abstract":"Altmetrics represent an alternative to established citation-based metrics to measure the scientific impact of a publication. For instance, they cover social-media platforms (e.g., Twitter, YouTube) to elicit how individuals outside of the scientific community interact with publications. Still, it is somewhat unclear to what extent Altmetrics are a valuable addition to existing metrics, or may represent only proxies without additional value. In this paper, we present our current steps towards understanding this problem in more detail. To this end, we describe and discuss the results of an initial correlation study that revealed significant positive correlations of different strengths between four categories of Altmetrics and citations. We elaborate on potential causes for, and the impact of, these correlations to define steps for future research aimed at understanding the value of Altmetrics.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115563404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Beck, M. Schubotz, V. Stange, Norman Meuschke, Bela Gipp
{"title":"Recognize, Annotate, and Visualize Parallel Content Structures in XML Documents","authors":"Marco Beck, M. Schubotz, V. Stange, Norman Meuschke, Bela Gipp","doi":"10.1109/JCDL52503.2021.00078","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00078","url":null,"abstract":"We present a four-phase parallel approach for capturing, annotating, and visualizing parallel structures in XML documents. We designed a highlighting strategy that first decomposes XML documents into various data streams, including plain text, formulae, and images. Second, those streams are processed with external algorithms and tools optimized for specific tasks, such as analyzing similarities or differences in the respective formats. Third, we compute comparison metadata such as annotations and highlighting marks. Fourth, the position information is concatenated based on the computed positions in the original XML document. The resulting comparison can then be visualized or processed further while keeping the reference to the source documents intact. While our algorithm has been developed for visualizing similarities as part of plagiarism detection tasks, we expect that many applications will benefit from a well-designed and integrative method that separates addressing the match locations from inserting highlight marks. For example, our algorithm can also add comments in XML-unaware plaintext editors. Our approach also handles the edge cases of overlapping matches and multiple matches.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130115677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Bennett, W. Sutherland, Yubing Tian, Megan Finn, Amelia Acker
{"title":"Pathways to Data: From Plans to Datasets","authors":"A. Bennett, W. Sutherland, Yubing Tian, Megan Finn, Amelia Acker","doi":"10.1109/JCDL52503.2021.00077","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00077","url":null,"abstract":"What is the relationship between Data Management Plans (DMPs), DMP guidance documents, and the reality of end-of-project data preservation and access? In this short paper we report on some preliminary findings of a 3-year investigation into the impact of DMPs on federally funded science in the United States. We investigated a small sample of publicly accessible DMPs (N=14) published using DMPTool. We found that while DMPs followed the National Science Foundation's guidelines, the pathways to the resulting research data are often obscure, vague, or not obvious. We define two “data pathways” as the search tactics and strategies deployed in order to find datasets.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125023487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Banerjee, Debarshi Kumar Sanyal, S. Chattopadhyay, Plaban Kumar Bhowmick, P. Das
{"title":"Automatic Recognition of Learning Resource Category in a Digital Library","authors":"S. Banerjee, Debarshi Kumar Sanyal, S. Chattopadhyay, Plaban Kumar Bhowmick, P. Das","doi":"10.1109/JCDL52503.2021.00039","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00039","url":null,"abstract":"Digital libraries generally need to process a large volume of diverse document types. The collection and tagging of metadata is a long, error-prone, workforce-consuming task. We are attempting to build an automatic metadata extractor for digital libraries. In this work, we present the Heterogeneous Learning Resources (HLR) dataset for document image classification. The individual learning resource is first decomposed into its constituent document images (sheets) which are then passed through an OCR tool to obtain the textual representation. The document image and its textual content are classified with state-of-the-art classifiers. Finally, the labels of the constituent document images are used to predict the label of the overall document.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"92 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123573409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ming Jiang, Yuerong Hu, Glen Worthey, Ryan Dubnicek, T. Underwood, J. S. Downie
{"title":"Evaluating BERT's Encoding of Intrinsic Semantic Features of OCR'd Digital Library Collections","authors":"Ming Jiang, Yuerong Hu, Glen Worthey, Ryan Dubnicek, T. Underwood, J. S. Downie","doi":"10.1109/JCDL52503.2021.00045","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00045","url":null,"abstract":"The uncertainty caused by optical character recognition (OCR) noise has been a primary barrier for digital libraries (DL) to promote their curated datasets for research purposes, particularly when the datasets are fed into advanced language models with less transparency. To shed some light on this issue, this study evaluates the impacts of OCR noise on BERT models for encoding the intrinsic semantic features of OCR'd texts. Specifically, we encoded chapterwise paired OCR'd texts and their cleaned counterparts extracted from books in six domains using BERT pre-trained and fine-tuned models respectively. Given the encoded text features, we further calculated the cosine similarity between any two chapters and used normalized discounted cumulative gain (NDCG) [1] to measure BERT variants' capabilities to preserve narrative coherence and semantic relevance among texts. Our empirical results show that (1) BERT embeddings can encode and preserve texts' intrinsic semantic features (i.e., relevance and coherence); and (2) such capabilities are comparatively robust against OCR noise. This should help alleviate some DL users' concerns regarding applying contextualized word embeddings to encode chapter-level or even document-level OCR'd text information, which helps promote scholarly use of DL collections. Our research also demonstrates how texts' intrinsic semantic features can be used for evaluating the impacts of OCR noise on advanced language models, which is an underdeveloped and promising direction for future work.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124542539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workshop on the Future of Digital Libraries","authors":"G. Buchanan, Dana Mckay, D. Bainbridge","doi":"10.1109/JCDL52503.2021.00083","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00083","url":null,"abstract":"This workshop will examine the future landscape of digital library research and practice. While conventional facilities of digital libraries, for example indexation, search and browsing of collections of static texts are well understood, there is growing demand for a richer range of content including dynamic data streams, linking of heterogeneous content and automated analysis. We aim to uncover the common agenda for the features of future digital libraries, and the corresponding challenges for research and practice. Contributions from both theory and practice, and from technologists, information scientists and researchers in human information behaviour will all contribute to this workshop.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134190568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peter J. Cobb, Esther M. W. Woo, Nicol F. C. Pan, V. Lou, Xiao Hu, Michael Cheng, Jesse Xiao
{"title":"Sharing the Past: the Library as Digital Co-Design Space for Intergenerational Heritage Preservation","authors":"Peter J. Cobb, Esther M. W. Woo, Nicol F. C. Pan, V. Lou, Xiao Hu, Michael Cheng, Jesse Xiao","doi":"10.1109/JCDL52503.2021.00051","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00051","url":null,"abstract":"This poster presents the “Intergenerational Participatory Co-design Project,” an interdisciplinary initiative at the University of Hong Kong for facilitating collaboration among different age groups to design digital historic preservation. This project reimagines the global challenge of aging as an opportunity to enhance cultural heritage when older and younger members of society share their unique knowledge and perspectives. Over the course of the 2019–2020 academic year, four mixed-age groups co-designed a variety of innovative digital products to support the preservation and appreciation of Hong Kong's historic culture. The guiding principle of the project was to engage the participants as co-creators of both their own learning outcomes and learning processes. The participants also had opportunities to develop skills with new technologies for documenting, preserving, and presenting cultural heritage. The University of Hong Kong Libraries served as the central space (both physically and virtually) for facilitating these activities, in partnership with the University's Sau Po Centre on Ageing, the Common Core program, and the Faculty of Education. This project can serve as a model for how libraries can support local communities to digitally embrace an aging society for enhancing cultural heritage.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131665288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jodi Schneider, A. Waard, Wolf-Tilo Balke, Xiaoguang Wang, Ningyuan Song, Bolin Hua, Yuanxi Fu
{"title":"Digital Infrastructures for Scholarly Content Objects","authors":"Jodi Schneider, A. Waard, Wolf-Tilo Balke, Xiaoguang Wang, Ningyuan Song, Bolin Hua, Yuanxi Fu","doi":"10.1109/JCDL52503.2021.00069","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00069","url":null,"abstract":"As digital libraries make the dissemination of research publications easier, they also enable the propagation of invalid or unreliable knowledge. Examples of relevant problems include: retraction and inadvertent citation and reuse of retracted papers [1], [2]; propagation of errors in literature and scientific databases [3], [4]; non-reproducible papers; known domain-specific issues such as cell line contamination [5]; bias in research datasets and publications [6]–[8]; systematic reviews that arrive at different conclusions about the same question at the same time [9], [10]. The digital environment facilitates broad interdisciplinary reuse beyond the originating scientific community; thus, marking known problems and tracing the impact on dependent and follow-on works is particularly important (but still under-addressed). Further, context-specific information inside a paper may not be immediately reusable when extracted by automated processes, leading to apparent contradictions [11]. Current mitigating approaches use the underlying reasoning for information retrieval [12], [13], develop new infrastructures analyzing the reasoning [14]–[16] or certainty [17] of statements, or use visualization to highlight possible discrepancies [10], [15].","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115748439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abigail Mabe, Michael L. Nelson, Michele C. Weigle
{"title":"Extending Chromium: Memento-Aware Browser","authors":"Abigail Mabe, Michael L. Nelson, Michele C. Weigle","doi":"10.1109/JCDL52503.2021.00046","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00046","url":null,"abstract":"Users rely on their web browser to provide information about the websites they are visiting, such as the security state of the web page they are viewing. Current browsers do not differentiate between the live Web and the past Web. If a user loads an archived web page, known as a memento, they have to rely on user interface (UI) elements within the page itself to inform them that the page they are viewing is not the live Web. Memento-awareness extends beyond recognizing a page that has already been archived. The browser should give users the ability to easily archive live web pages as they are browsing. This report presents a proof-of-concept browser that is memento-aware and is created by extending Google's open-source web browser Chromium.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"90 1 Pt 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129164867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cell Block HTML: Towards Spreadsheet-based Text-Mining for the Masses","authors":"B. Wheeler, D. Bainbridge","doi":"10.1109/JCDL52503.2021.00041","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00041","url":null,"abstract":"This article details a technical advancement in the core ability of spreadsheets to natively handle forms of rich text, such as HTML. We establish the context for the work, and specify the criteria we needed to meet so that the expansion of spreadsheet computation to handle sophisticated forms of text analysis (comparable to that of numeric calculation) remained within the purview of regular users. Implementation details are provided, along with an example illustrating the application of an LDA-based text-mining technique to perform topic modeling.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125929892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}