2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL): Latest Papers

Diachronic Analysis of German Parliamentary Proceedings: Ideological Shifts through the Lens of Political Biases
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-08-13 DOI: 10.1109/JCDL52503.2021.00017
Tobias Walter, Celina Kirschner, Steffen Eger, Goran Glavaš, Anne Lauscher, Simone Paolo Ponzetto
Abstract: We analyze bias in historical corpora as encoded in diachronic distributional semantic models by focusing on two specific forms of bias, namely a political (i.e., anti-communism) and racist (i.e., antisemitism) one. For this, we use a new corpus of German parliamentary proceedings, Deuparl, spanning the period 1867–2020. We complement this analysis of historical biases in diachronic word embeddings with a novel measure of bias on the basis of term co-occurrences and graph-based label propagation. The results of our bias measurements align with commonly perceived historical trends of antisemitic and anticommunist biases in German politics in different time periods, thus indicating the viability of analyzing historical bias trends using semantic spaces induced from historical corpora.
Citations: 10
COMPARE: A Taxonomy and Dataset of Comparison Discussions in Peer Reviews
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-08-09 DOI: 10.1109/JCDL52503.2021.00068
Shruti Singh, M. Singh, Pawan Goyal
Abstract: Comparing research papers is a conventional method to demonstrate progress in experimental research. We present COMPARE, a taxonomy and a dataset of comparison discussions in peer reviews of research papers in the domain of experimental deep learning. From a thorough observation of a large set of review sentences, we build a taxonomy of categories in comparison discussions and present a detailed annotation scheme to analyze them. Overall, we annotate 117 reviews covering 1,800 sentences. We experiment with various methods to identify comparison sentences in peer reviews and report a maximum F1 score of 0.49. We also pretrain two language models specifically on ML, NLP, and CV paper abstracts and reviews to learn informative representations of peer reviews. The annotated dataset and the pretrained models are available at https://github.com/shruti-singh/COMPARE.
Citations: 4
Profiling Web Archival Voids for Memento Routing
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-08-06 DOI: 10.1109/JCDL52503.2021.00027
Sawood Alam, Michele C. Weigle, Michael L. Nelson
Abstract: Prior work on web archive profiling was focused on Archival Holdings to describe what is present in an archive. This work defines and explores Archival Voids to establish a means to represent portions of URI spaces that are not present in a web archive. Archival Holdings and Archival Voids profiles can work independently or as complements to each other to maximize the Accuracy of Memento Aggregators. We discuss various sources of truth that can be used to create Archival Voids profiles. We use access logs from Arquivo.pt to create various Archival Voids profiles and analyze them against our MemGator access logs for evaluation. We find that we could have avoided more than 8% of additional False Positives on top of the 60% Accuracy we got from profiling Archival Holdings in our prior work, if Arquivo.pt were to provide an Archival Voids profile based on URIs that were requested hundreds of times and never returned any success responses.
Citations: 3
Garbage, Glitter, or Gold: Assigning Multi-Dimensional Quality Scores to Social Media Seeds for Web Archive Collections
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-07-06 DOI: 10.1109/JCDL52503.2021.00020
Alexander C. Nwala, Michele C. Weigle, Michael L. Nelson
Abstract: From popular uprisings to pandemics, the Web is an essential source consulted by scientists and historians for reconstructing and studying past events. Unfortunately, the Web is plagued by link rot and content drift (reference rot), which cause important Web resources to disappear. Web archive collections help reduce the costly effects of reference rot by saving Web resources that chronicle important stories and events before they disappear. These collections often begin with URLs called seeds, hand-selected by experts or scraped from social media posts. The quality of social media content varies widely; therefore, we propose a framework for assigning multidimensional quality scores to social media seeds for Web archive collections about stories and events. We leveraged contributions from social media research for attributing quality to social media content and users based on credibility, reputation, and influence. We combined these with additional contributions from Web archive research that emphasizes the importance of considering geographical and temporal constraints when selecting seeds. Next, we developed the Quality Proxies (QP) framework, which assigns seeds extracted from social media a quality score across 10 major dimensions: popularity, geographical, temporal, subject-expert, retrievability, relevance, reputation, and scarcity. We instantiated the framework and showed that seeds can be scored across multiple QP classes that map to different policies for ranking seeds, such as prioritizing seeds from local news, reputable and/or popular sources, etc. The QP framework is extensible and robust; seeds can be scored when a subset of the QP dimensions is absent. Most importantly, scores assigned by Quality Proxies are explainable, providing the opportunity to critique them. Our results showed that Quality Proxies resulted in the selection of quality seeds with increased precision (by ≈0.13) when novelty is and is not prioritized. These contributions provide an explainable score applicable to rank and select quality seeds for Web archive collections and other domains that select seeds from social media.
Citations: 2
Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-07-01 DOI: 10.1109/JCDL52503.2021.00066
Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, E. Fox
Abstract: Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research trends. Automatic metadata extraction is important to build scalable digital library search engines. Most existing methods, such as GROBID, CERMINE, and ParsCit, are designed for born-digital documents, so they often fail to extract metadata from scanned documents such as ETDs. Traditional sequence tagging methods mainly rely on text-based features. In this paper, we propose a conditional random field (CRF) model that combines text-based and visual features. To verify the robustness of our model, we extended an existing corpus and created a new ground truth corpus consisting of 500 ETD cover pages with human-validated metadata. Our experiments show that CRF with visual features outperformed both a heuristic baseline and a CRF model with only text-based features. The proposed model achieved an F1 measure of 81.3% to 96% on seven metadata fields. The data and source code are publicly available on Google Drive (https://tinyurl.com/y8kxzwrp) and in a GitHub repository (https://github.com/lamps-lab/ETDMiner/tree/master/etd_crf), respectively.
Citations: 7
GraphConfRec: A Graph Neural Network-Based Conference Recommender System
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-06-23 DOI: 10.1109/JCDL52503.2021.00021
Andreea Iana, Heiko Paulheim
Abstract: In today's academic publishing model, especially in Computer Science, conferences commonly constitute the main platforms for releasing the latest peer-reviewed advancements in their respective fields. However, choosing a suitable academic venue for publishing one's research can be a challenging task considering the plethora of available conferences, particularly for those at the start of their academic careers, or for those seeking to publish outside of their usual domain. In this paper, we propose GraphConfRec, a conference recommender system which combines SciGraph and graph neural networks to infer suggestions based not only on title and abstract, but also on coauthorship and citation relationships. GraphConfRec achieves a recall@10 of up to 0.580 and a MAP of up to 0.336 with a graph attention network-based recommendation model. A user study with 25 subjects supports the positive results.
Citations: 6
ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-06-23 DOI: 10.1109/JCDL52503.2021.00030
S. Kahu, William A. Ingram, E. Fox, Jian Wu
Abstract: We focus on electronic theses and dissertations (ETDs), aiming to improve access and expand their utility, since more than 6 million are publicly available, and they constitute an important corpus to aid research and education across disciplines. The corpus is growing as new born-digital documents are included, and since millions of older theses and dissertations have been converted to digital form to be disseminated electronically in institutional repositories. In ETDs, as with other scholarly works, figures and tables can communicate a large amount of information in a concise way. Although methods have been proposed for extracting figures and tables from born-digital PDFs, they do not work well with scanned ETDs. Our assessment of state-of-the-art figure extraction systems is that they do not function well on scanned PDFs because they have only been trained on born-digital documents. To address this limitation, we present ScanBank, a new dataset containing 10 thousand scanned page images, manually labeled as to the presence of the 3.3 thousand figures or tables found therein. We use this dataset to train a deep neural network model based on YOLOv5 to accurately extract figures and tables from scanned ETDs. We pose and answer important research questions aimed at finding better methods for figure extraction from scanned documents. One of these concerns the value, for training, of data augmentation techniques applied to born-digital documents, which are used to train models better suited to figure extraction from scanned documents. To the best of our knowledge, ScanBank is the first manually annotated dataset for figure and table extraction for scanned ETDs. A YOLOv5-based model, trained on ScanBank, outperforms existing comparable open-source and freely available baseline methods by a considerable margin.
Citations: 10
TweetPap: A Dataset to Study the Social Media Discourse of Scientific Papers
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-06-14 DOI: 10.1109/JCDL52503.2021.00055
Naman Jain, M. Singh
Abstract: Nowadays, researchers have moved to platforms like Twitter to spread information about their ideas and empirical evidence. Recent studies have shown that social media affects the scientific impact of a paper. However, these studies only utilize tweet counts to represent Twitter activity. In this paper, we propose TweetPap, a large-scale dataset that introduces temporal information on citations and tweets, along with the metadata of the tweets, to quantify and understand the discourse of scientific papers on social media. The dataset is publicly available at https://github.com/lingo-iitgn/TweetPap.
Citations: 1
ConSTR: A Contextual Search Term Recommender
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-06-08 DOI: 10.1109/JCDL52503.2021.00042
Thomas Kramer, Zeljko Carevic, Dwaipayan Roy, Claus-Peter Klas, Philipp Mayr
Abstract: In this demo paper, we present ConSTR, a novel Contextual Search Term Recommender that utilises the user's interaction context for search term recommendation and literature retrieval. ConSTR integrates a two-layered recommendation interface: the first layer suggests terms with respect to a user's current search term, and the second layer suggests terms based on the user's previous search activities (interaction context). For the demonstration, ConSTR is built on the arXiv, an academic repository consisting of 1.8 million documents.
Citations: 0
MexPub: Deep Transfer Learning for Metadata Extraction from German Publications
2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL) Pub Date : 2021-06-04 DOI: 10.1109/JCDL52503.2021.00076
Zeyd Boukhers, Nada Beili, Timo Hartmann, Prantik Goswami, Muhammad Arslan Zafar
Abstract: In contrast to most English scientific publications, which follow standard and simple layouts, the order, content, position, and size of metadata in German publications vary greatly among publications. This variety makes traditional NLP methods fail to accurately extract metadata from these publications. In this paper, we present a method that extracts metadata from PDF documents with different layouts and styles by viewing the document as an image. We used Mask R-CNN, which is trained on the COCO dataset and fine-tuned on the PubLayNet dataset, which consists of 200K PDF snapshots with five basic classes (e.g., text, figure). We further fine-tuned the model on our proposed synthetic dataset consisting of 30K article snapshots to extract nine patterns (e.g., author, title). Our synthetic dataset is generated using content in both German and English and a finite set of challenging templates obtained from German publications. Our method achieved an average accuracy of around 90%, which validates its capability to accurately extract metadata from a variety of PDF documents with challenging templates.
Citations: 5