{"title":"Methods for generation, recommendation, exploration and analysis of scholarly publications","authors":"Gianmaria Silvello, Oscar Corcho, Paolo Manghi","doi":"10.1007/s00799-024-00409-1","DOIUrl":"https://doi.org/10.1007/s00799-024-00409-1","url":null,"abstract":"<p>In the shifting landscape of sharing knowledge, it is no longer only about writing papers. After a paper is written, what comes next is an integral part of the process. This special issue delves into the transformative landscape of scholarly communication, exploring novel methodologies and technologies reshaping how scholarly content is generated, recommended, explored and analysed. Indeed, the contemporary perspective on scholarly publication recognizes the centrality of post-publication activities. The criticality of refining and scrutinizing manuscripts has gained prominence, surpassing the act of dissemination. The emphasis has shifted from publication to ensuring visibility and comprehension of the conveyed content. The papers compiled in this special issue scrutinize these evolving dynamics. They delve into the intricacies of post-processing and close examination of manuscripts, acknowledging the impact of these aspects. The overarching objective is to stimulate scholarly discussions on the evolving nature of communication in academia.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"18 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142220013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing free reference extraction pipelines","authors":"Tobias Backes, Anastasiia Iurshina, Muhammad Ahsan Shahid, Philipp Mayr","doi":"10.1007/s00799-024-00404-6","DOIUrl":"https://doi.org/10.1007/s00799-024-00404-6","url":null,"abstract":"<p>In this paper, we compare the performance of several popular pre-trained reference extraction and segmentation toolkits combined in different pipeline configurations on three different datasets. The extraction is end-to-end, i.e. the input is PDF documents, and the output is parsed reference objects. The evaluation is for reference strings and individual fields in the reference objects using alignment by identical fields and close-to-identical values. Our results show that Grobid and AnyStyle perform best of all compared tools, although one may want to use them in combination. Our work is meant to serve as a reference for researchers interested in applying out-of-the-box reference extraction and -parsing tools, for example, as a preprocessing step to a more complex research question. Our detailed results on different datasets with results for individual parsed fields will allow them to focus on aspects that are particularly important to them.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"46 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141509679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital detection of play characters’ relationships in Shakespeare’s plays: extended cross-correlation analysis of the character appearance frequencies","authors":"Miyuki Yamada, Yuichi Murai, Ichiro Kumagai","doi":"10.1007/s00799-024-00401-9","DOIUrl":"https://doi.org/10.1007/s00799-024-00401-9","url":null,"abstract":"<p>We propose a method for visualizing literary works that quantitatively extracts the mutual relationships among play characters from the narrative of a storyline. The method first determines the cross-correlation of the appearance frequencies in the time domain between two play characters, which is calculated for all pairs of characters in each narrative. We also calculate the correlation among three play characters to find unique triangular relationships. Then we create a graphical representation of the relationships using node-link representations based on a physical potential model. The method is suitable for dramas, as demonstrated for ten famous Shakespeare plays. The resulting visualizations show good agreement with the conventional understanding of each play and also provide new insight into Shakespearean criticism.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"12 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Book recommendation system: reviewing different techniques and approaches","authors":"P. Devika, A. Milton","doi":"10.1007/s00799-024-00403-7","DOIUrl":"https://doi.org/10.1007/s00799-024-00403-7","url":null,"abstract":"<p>E-reading has become more popular, increasing the number of book readers. With online book reading websites, it is much simpler to read any book at any time by simply typing its name into a search engine. These websites offer users a free reading platform with an unlimited number of choices without exceeding any rights. However, statistics reveal that reading is dwindling, particularly among young people. In this survey, we present several existing approaches used to design book recommendation systems from 2012 to 2023. Different types of datasets used to extract information about books and users are discussed in terms of features, source and usage. Six categories of book recommendation techniques are identified and discussed, laying the groundwork for future study in this area. The issues related to book recommendation systems are also briefly discussed. We discuss the performance analysis of various research works on book recommendation systems. We also highlight the research concerns and future scope for improving the performance of book recommender systems. 
We hope these findings will help researchers explore book recommender systems further.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"64 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140926722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structured abstract generator (SAG) model: analysis of IMRAD structure of articles and its effect on extractive summarization","authors":"Ayşe Esra Özkan Çelik, Umut Al","doi":"10.1007/s00799-024-00402-8","DOIUrl":"https://doi.org/10.1007/s00799-024-00402-8","url":null,"abstract":"<p>An abstract is the most crucial element that may convince readers to read the complete text of a scientific publication. However, studies show that in terms of organization, readability, and style, abstracts are also among the most troublesome parts of the pertinent manuscript. The ultimate goal of this article is to produce better understandable abstracts with automatic methods that will contribute to scientific communication in Turkish. We propose a summarization system based on extractive techniques combining general features that have been shown to be beneficial for Turkish. To construct the data set for this aim, a sample of 421 peer-reviewed Turkish articles in the field of librarianship and information science was developed. First, the structure of the full-texts, and their readability in comparison with author abstracts, were examined for text quality evaluation. A content-based evaluation of the system outputs was then carried out. System outputs, in cases of using and ignoring structural features of full-texts, were compared. Structured outputs outperformed classical outputs in terms of content and text quality. Each output group has better readability levels than their original abstracts. Additionally, it was discovered that higher-quality outputs are correlated with more structured full-texts, highlighting the importance of structural writing. Finally, it was determined that our system can facilitate the scholarly communication process as an auxiliary tool for authors and editors. 
Findings also indicate the significance of structural writing for better scholarly communication.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"27 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140926494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building datasets to support information extraction and structure parsing from electronic theses and dissertations","authors":"William A. Ingram, Jian Wu, Sampanna Yashwant Kahu, Javaid Akbar Manzoor, Bipasha Banerjee, Aman Ahuja, Muntabir Hasan Choudhury, Lamia Salsabil, Winston Shields, Edward A. Fox","doi":"10.1007/s00799-024-00395-4","DOIUrl":"https://doi.org/10.1007/s00799-024-00395-4","url":null,"abstract":"<p>Despite the millions of electronic theses and dissertations (ETDs) publicly available online, digital library services for ETDs have not evolved past simple search and browse at the metadata level. We need better digital library services that allow users to discover and explore the content buried in these long documents. Recent advances in machine learning have shown promising results for decomposing documents into their constituent parts, but these models and techniques require data for training and evaluation. In this article, we present high-quality datasets to train, evaluate, and compare machine learning methods in tasks that are specifically suited to identify and extract key elements of ETD documents. We explain how we construct the datasets by manually labeling the data or by deriving labeled data through synthetic processes. We demonstrate how our datasets can be used to develop downstream applications and to evaluate, retrain, or fine-tune pre-trained machine learning models. 
We describe our ongoing work to compile benchmark datasets and exploit machine learning techniques to build intelligent digital libraries for ETDs.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"83 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140926607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robots still outnumber humans in web archives in 2019, but less than in 2015 and 2012","authors":"Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle","doi":"10.1007/s00799-024-00397-2","DOIUrl":"https://doi.org/10.1007/s00799-024-00397-2","url":null,"abstract":"<p>The significance of the web and the crucial role of web archives in its preservation highlight the necessity of understanding how users, both human and robot, access web archive content, and how best to satisfy these disparate needs of both types of users. To identify robots and humans in web archives and analyze their respective access patterns, we used the Internet Archive’s (IA) Wayback Machine access logs from 2012, 2015, and 2019, as well as Arquivo.pt’s (Portuguese Web Archive) access logs from 2019. We identified user sessions in the access logs and classified those sessions as human or robot based on their browsing behavior. To better understand how users navigate through the web archives, we evaluated these sessions to discover user access patterns. Based on the two archives and between the three years of IA access logs (2012 vs. 2015 vs. 2019), we present a comparison of detected robots vs. humans and their user access patterns and temporal preferences. The total number of robots detected in IA 2012 (91% of requests) and IA 2015 (88% of requests) is greater than in IA 2019 (70% of requests). Robots account for 98% of requests in Arquivo.pt (2019). We found that the robots are almost entirely limited to “Dip” and “Skim” access patterns in IA 2012 and 2015, but exhibit all the patterns and their combinations in IA 2019. 
Both humans and robots show a preference for web pages archived in the near past.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"10 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140074149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stance prediction with a relevance attribute to political issues in comparing the opinions of citizens and city councilors","authors":"Ko Senoo, Yohei Seki, Wakako Kashino, Atsushi Keyaki, Noriko Kando","doi":"10.1007/s00799-024-00396-3","DOIUrl":"https://doi.org/10.1007/s00799-024-00396-3","url":null,"abstract":"<p>This study focuses on a method for differentiating between the stance of citizens and city councilors on political issues (i.e., in favor or against) and attempts to compare the arguments of both sides. We created a dataset by annotating citizen tweets and city council minutes with labels for four attributes: stance, usefulness, regional dependence, and relevance. We then fine-tuned a pretrained large language model using this dataset to assign the attribute labels to a large quantity of unlabeled data automatically. We introduced multitask learning to train each attribute jointly with relevance to identify the clues by focusing on those sentences that were relevant to the political issues. Our prediction models are based on T5, a large language model suitable for multitask learning. We compared the results from our system with those that used BERT or RoBERTa. Our experimental results showed that the macro-F1-scores for stance were improved by 1.8% for citizen tweets and 1.7% for city council minutes with multitask learning. 
Using the fine-tuned model to analyze real opinion gaps, we found that although the vaccination regime was positively evaluated by city councilors in Fukuoka city, it was not rated very highly by citizens.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"73 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139979688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards privacy-aware exploration of archived personal emails","authors":"Zoe Bartliff, Yunhyong Kim, Frank Hopfgartner","doi":"10.1007/s00799-024-00394-5","DOIUrl":"https://doi.org/10.1007/s00799-024-00394-5","url":null,"abstract":"<p>This paper examines how privacy measures, such as anonymisation and aggregation processes for email collections, can affect the perceived usefulness of email visualisations for research, especially in the humanities and social sciences. The work is intended to inform archivists and data managers who are faced with the challenge of accessioning and reviewing increasingly sizeable and complex personal digital collections. The research in this paper provides a focused user study to investigate the usefulness of data visualisation as a mediator between privacy-aware management of data and maximisation of research value of data. The research is carried out with researchers and archivists with vested interest in using, making sense of, and/or archiving the data to derive meaningful results. Participants tend to perceive email visualisations as useful, with an average rating of 4.281 (out of 7) for all the visualisations in the study, with above average ratings for mountain graphs and word trees. 
The study shows that while participants voice a strong desire for information identifying individuals in email data, they perceive visualisations as almost equally useful for their research and/or work when aggregation is employed in addition to anonymisation.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"79 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139921521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting the untapped functional potential of Memento aggregators beyond aggregation","authors":"Mat Kelly","doi":"10.1007/s00799-023-00391-0","DOIUrl":"https://doi.org/10.1007/s00799-023-00391-0","url":null,"abstract":"<p>Web archives capture, retain, and present historical versions of web pages. Viewing web archives often amounts to a user visiting the Wayback Machine homepage, typing in a URL, then choosing a date and time significant of the capture. Other web archives also capture the web and use Memento as an interoperable point of querying their captures. Memento aggregators are web accessible software packages that allow clients to send requests for past web pages to a single endpoint source that then relays that request to a set of web archives. Though few deployed aggregator instances exist that exhibit this aggregation trait, they all, for the most part, align to a model of serving a request for a URI of an original resource (URI-R) to a client by first querying then aggregating the results of the responses from a collection of web archives. This single tier querying need not be the logical flow of an aggregator, so long as a user can still utilize the aggregator from a single URL. In this paper, we discuss theoretical aggregation models of web archives. We first describe the status quo as the conventional behavior exhibited by an aggregator. We then build on prior work to describe a multi-tiered, structured querying model that may be exhibited by an aggregator. We highlight some potential issues and high-level optimization to ensure efficient aggregation while also extending on the state-of-the-art of memento aggregation. Part of our contribution is the extension of an open-source, user-deployable Memento aggregator to exhibit the capability described in this paper. We also extend a browser extension that typically consults an aggregator to have the ability to aggregate itself rather than needing to consult an external service. 
A purely client-side, browser-based Memento aggregator is novel to this work.</p>","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"4 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2024-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139582920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}