{"title":"Comparing Personalized PageRank and Activation Spreading in Wikipedia Diagram-Based Search","authors":"Hisham Benotman, D. Maier","doi":"10.1109/JCDL52503.2021.00016","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00016","url":null,"abstract":"Diagram Navigation (DN) is based on using existing diagrams for a domain as maps to navigate and query a collection from different perspectives. With a relatively small number of manual connections, such as ones between diagram concepts and related documents, a domain expert can integrate their perspective of a domain (depicted in a diagram) into the navigation system of a collection. DN utilizes the abundance of internal connections in a collection, such as Wikipedia hyperlinks to access the entire collection. In a Diagram-to-Content (D2C) query, an end user selects a diagram concept to retrieve a ranked list of related collection documents. In a Content-to-Diagram (C2D) query, DN highlights related concepts in a diagram based on document(s) selected by the user. To increase D2C ranking performance, we study and tune Personalized PageRank and an energy-spreading algorithm. We report key differences in how the algorithms rank D2C queries. We show that the tested algorithms are affected differently by Wikipedia graph structures, such as categories and hyperlinks from article templates. 
We also show that diagrams not only can provide overviews, but they also positively bias the ranking of D2C queries.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124763576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weak Supervision for Scientific Document Relevance Tagging","authors":"Drahomira Herrmannova, Chathika Gunaratne, V. Walker, Andrew A. Rooney, Robert M. Patton, Mary Wolfe, Charles Schmitt","doi":"10.1109/JCDL52503.2021.00060","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00060","url":null,"abstract":"Developing training data for predicting the relevance of research articles to scientific concepts is a resource-intensive process, and existing datasets are only available for limited subject domains. In this work, we investigate the possibility of weakly supervised data generation for developing relevance models. We approach this by generating document, query, and label triples in an automated manner and by using this data to create a training set for a classification model. Published documents were sampled from an open access repository, and the concepts appearing in these documents were used as queries. We use the location of occurrence of each query concept within a document to determine the relevance label. We find that a classification model trained on this synthetic data can learn to tag documents according to their relevance to a query surprisingly well, providing an 11% f-score improvement over a model trained on ground truth data.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130594413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Contemporary Relevance of Past News","authors":"M. Sato, A. Jatowt, Yijun Duan, Ricardo Campos, Masatoshi Yoshikawa","doi":"10.1109/JCDL52503.2021.00019","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00019","url":null,"abstract":"Our society generates massive amounts of digital data, significant portion of which is being archived and made accessible to the public for the current and future use. In addition, historical born-analog documents are being increasingly digitized and included in document archives which are available online. Professionals who use document archives tend to know what they wish to search for. Yet, if the results are to be useful and attractive for ordinary users they need to contain content which is interesting and familiar. However, the state-of-the-art retrieval methods for document archives basically apply same techniques as search engines for synchronic document collections. In this paper, we introduce a novel concept of estimating the relation of archival documents to the present times, called contemporary relevance. Contemporary relevance can be used for improving access to archival document collections so that users have higher probability of finding interesting or useful content. We then propose an effective method for computing contemporary relevance degrees of news articles using Learning to Rank with a range of diverse features, and we successfully test it on the New York Times Annotated document collection. 
Our proposal offers a novel paradigm of information access to archival document collections by incorporating the context of contemporary time.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"95 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116359706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TASSY—A Text Annotation Survey System","authors":"K. Sinha, Norman Meuschke, Bela Gipp","doi":"10.1109/JCDL52503.2021.00052","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00052","url":null,"abstract":"We present a free and open-source tool for creating web-based surveys that include text annotation tasks. Existing tools offer either text annotation or survey functionality but not both. Combining the two input types is particularly relevant for investigating a reader's perception of a text which also depends on the reader's background, such as age, gender, and education. Our tool caters primarily to the needs of researchers in the Library and Information Sciences, the Social Sciences, and the Humanities who apply Content Analysis to investigate, e.g., media bias, political communication, or fake news.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133756993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Fairness in Paper Recommendation","authors":"Reem Alsaffar, Susan Gauch, Mohammed Alqahtani, Omar Salman","doi":"10.1109/JCDL52503.2021.00050","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00050","url":null,"abstract":"Although many conferences use double-blind reviewing to increase fairness, studies show that bias still occurs. Our research focuses on developing fair algorithms that correct for these biases and select papers from a more demographically diverse group of authors. To increase author diversity and achieve demographic parity, we use multidimensional author profiles with Boolean feature values, i.e., gender, ethnicity, career stage, university rank, and geolocation. Based on these profiles, we present two algorithms that explicitly consider demographic diversity and paper quality during paper recommendation. To evaluate our approaches, we compare the resulting set of conference papers with the actual accepted papers in the conference, measuring the diversity gain, utility savings, and F-measure for each method. Our best method, Multi-Faceted Diversity, produces a set of papers whose authors achieve 95% similarity to the demographics of the pool across multiple dimensions, increasing the selected papers' authors by 46% with only a 2.48% drop in utility. 
Tasks within academia, such as conference papers, journal papers, grant and proposal reviews, could benefit from applying this approach.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128330464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing Feature-based Similarity for Research Paper Recommendation","authors":"Corinna Breitinger, Harald Reiterer","doi":"10.1109/JCDL52503.2021.00033","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00033","url":null,"abstract":"Research paper recommender systems are widely used by academics to discover and explore the most relevant publications on a topic. While existing recommendation interfaces present researchers with a ranked list of publications based on a global relevance score, they fail to visualize the full range of non-textual features uniquely present in academic publications: citations, figures, charts, or images, and mathematical formulae or expressions. Especially for STEM literature, examining such non-textual features efficiently can provide utility to researchers interested in answering specialized research questions or information needs. If research paper search and recommender systems are to consider the similarity of such features as one facet of a content-based similarity assessment for academic literature, new methods for visualizing these non-textual features are needed. In this paper, we review the state-of-the-art in visualizing feature-based similarity in documents. We subsequently propose a set of user-customizable visualization approaches tailored to STEM literature and the research paper recommendation context. 
Results from a study with 10 expert users show that the interactive visualization interface we propose for the exploration of non-textual features in publications can effectively address specialized information retrieval tasks, which cannot be addressed by existing research paper search or recommendation interfaces.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"415 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122992934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourced Linked Data Question Answering with AQUACOLD","authors":"Nick Collis, Ingo Frommholz","doi":"10.1109/JCDL52503.2021.00043","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00043","url":null,"abstract":"There is a need for Question Answering (QA) to return accurate answers to complex natural language questions over Linked Data, improving the accessibility of Linked Data (LD) search by abstracting the complexity of SPARQL whilst retaining its expressiveness. This work presents AQUACOLD, a LD QA system which harnesses the power of crowdsourcing to meet this need.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121397924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Methodology to Bring Out Typical Users Interactions in Digital Libraries","authors":"Marwa Trabelsi, Cyrille Suire, Jacques Morcos, R. Champagnat","doi":"10.1109/JCDL52503.2021.00013","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00013","url":null,"abstract":"With the growing amount of digital publications, digital libraries (DLs) attract a variety of users for diverse tasks. A practical need to investigate how users interact with digital library (DL) portals is greatly increasing. Modeling users' interaction in DLs is interestingly required in order to optimize the use of different DL functionalities and to ease the accessibility to stored resources. The aim of this work is to take advantage of Process Mining techniques to model DL user's journeys. To the best of our knowledge, no other research work applied PM to real DLs users journeys. Discovered models can therefore be used in forthcoming work to present a set of recommendations to DL users. However, the large number of generated logs leads to complicated models that are not generic for all users and do not allow achieving all their objectives. For this reason, we propose in this paper a new methodology of grouping users' interactions prior to modeling. We compare our proposed approach to two state-of-the-art methods over a synthetic resource manually annotated used for validation and a real-life user interaction history (event logs) provided by the national library of France. 
The experimental part shows that our method outperforms existing methods in both clustering and modeling users over the synthetic dataset and generates interesting models on real-world data.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117274775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Surfacing Collective Harms in Privacy Sensitive Data","authors":"Nicholas M. Weber","doi":"10.1109/JCDL52503.2021.00032","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00032","url":null,"abstract":"Privacy protections for human subject data are often focused on reducing individual harms that result from improper disclosure of personally identifiable information. However, in a networked environment where information infrastructures enable rapid sharing and linking of different datasets there exist numerous harms which abstract to group or collective levels. In this paper we discuss how privacy protections aimed at individual harms, as opposed to collective or group harms, results in an incompatible notion of privacy protections for social science research that synthesizes multiple data sources. Using the framework of Contextual Integrity we present empirical scenarios drawn from 17 in-depth interviews with researchers conducting synthetic research using one or more privacy sensitive data sources. We use these scenarios to identify ways that digital infrastructure providers can help social scientists manage collective harms over time through specific, targeted privacy engineering of supporting research infrastructures and data curation.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114438096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HistoInformatics2021: The 6th International Workshop on Computational History","authors":"Yasunobu Sumikawa, Ryohei Ikejiri, A. Doucet, Eva Pfanzelter, Mohammed Hasanuzzaman, G. Dias, Ian Milligan, A. Jatowt","doi":"10.1109/JCDL52503.2021.00072","DOIUrl":"https://doi.org/10.1109/JCDL52503.2021.00072","url":null,"abstract":"This paper discusses HistoInformatics2021 workshop (the 6th International Workshop on Computational History) held in conjunction with the JCDL2021 conference. This is the 6th installment of the workshop series devoted to the interaction between Computer Science and History. This interdisciplinary initiative is a response to the growing popularity of Digital Humanities, particularly in historical research, and an increased tendency to apply algorithms and computer techniques for fostering and facilitating new research methods and tools in the Humanities.","PeriodicalId":112400,"journal":{"name":"2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126656138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}