{"title":"DETEXA: declarative extensible text exploration and analysis through SQL.","authors":"Yannis Foufoulas, Eleni Zacharia, Harry Dimitropoulos, Natalia Manola, Yannis Ioannidis","doi":"10.1007/s00799-023-00358-1","DOIUrl":"10.1007/s00799-023-00358-1","url":null,"abstract":"Metadata enrichment through text mining techniques is becoming one of the most significant tasks in digital libraries. Due to the exponential increase of open access publications, several new challenges have emerged. Raw data are usually big, unstructured, and come from heterogeneous data sources. In this paper, we introduce a text analysis framework implemented in extended SQL that exploits the scalability characteristics of modern database management systems. The purpose of this framework is to provide the opportunity to build performant end-to-end text mining pipelines which include data harvesting, cleaning, processing, and text analysis at once. SQL is selected due to its declarative nature which offers fast experimentation and the ability to build APIs so that domain experts can edit text mining workflows via easy-to-use graphical interfaces. Our experimental analysis demonstrates that the proposed framework is very effective and achieves significant speedup, up to three times faster, in common use cases compared to other popular approaches.","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":" ","pages":"1-13"},"PeriodicalIF":1.6,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10170051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9688563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting answer acceptability for question-answering system","authors":"P. Roy","doi":"10.1007/s00799-023-00357-2","DOIUrl":"https://doi.org/10.1007/s00799-023-00357-2","url":null,"abstract":"","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"7 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88891860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep author name disambiguation using DBLP data","authors":"Zeyd Boukhers, Nagaraj Bahubali Asundi","doi":"10.1007/s00799-023-00361-6","DOIUrl":"https://doi.org/10.1007/s00799-023-00361-6","url":null,"abstract":"In the academic world, the number of scientists grows every year and so does the number of authors sharing the same names. Consequently, it is challenging to assign newly published papers to their respective authors. Therefore, author name ambiguity is considered a critical open problem in digital libraries. This paper proposes an author name disambiguation approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use data collected from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last names and same first name initials. The author within each group is identified by capturing the relation with his/her co-authors and area of research, represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136314547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards automated meta-review generation via an NLP/ML pipeline in different stages of the scholarly peer review process","authors":"Asheesh Kumar, Tirthankar Ghosal, Saprativa Bhattacharjee, Asif Ekbal","doi":"10.1007/s00799-023-00359-0","DOIUrl":"https://doi.org/10.1007/s00799-023-00359-0","url":null,"abstract":"","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"36 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85022473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A discovery system for narrative query graphs: entity-interaction-aware document retrieval.","authors":"Hermann Kroll, Jan Pirklbauer, Jan-Christoph Kalo, Morris Kunz, Johannes Ruthmann, Wolf-Tilo Balke","doi":"10.1007/s00799-023-00356-3","DOIUrl":"10.1007/s00799-023-00356-3","url":null,"abstract":"Finding relevant publications in the scientific domain can be quite tedious: Accessing large-scale document collections often means to formulate an initial keyword-based query followed by many refinements to retrieve a sufficiently complete, yet manageable set of documents to satisfy one's information need. Since keyword-based search limits researchers to formulating their information needs as a set of unconnected keywords, retrieval systems try to guess each user's intent. In contrast, distilling short narratives of the searchers' information needs into simple, yet precise entity-interaction graph patterns provides all information needed for a precise search. As an additional benefit, such graph patterns may also feature variable nodes to flexibly allow for different substitutions of entities taking a specified role. An evaluation over the PubMed document collection quantifies the gains in precision for our novel entity-interaction-aware search. Moreover, we perform expert interviews and a questionnaire to verify the usefulness of our system in practice. This paper extends our previous work by giving a comprehensive overview about the discovery system to realize narrative query graph retrieval.","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":" ","pages":"1-22"},"PeriodicalIF":1.5,"publicationDate":"2023-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10123011/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10092914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate nearest neighbor for long document relationship labeling in digital libraries","authors":"Peter Organisciak, Benjamin M. Schmidt, M. Durward","doi":"10.1007/s00799-023-00354-5","DOIUrl":"https://doi.org/10.1007/s00799-023-00354-5","url":null,"abstract":"","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"455 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77031104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Referencing behaviours across disciplines: publication types and common metadata for defining bibliographic references","authors":"Erika Alves dos Santos, S. Peroni, M. L. Mucheroni","doi":"10.1007/s00799-023-00351-8","DOIUrl":"https://doi.org/10.1007/s00799-023-00351-8","url":null,"abstract":"","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"1 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80516342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scientific document processing: challenges for modern learning methods.","authors":"Abhinav Ramesh Kashyap, Yajing Yang, Min-Yen Kan","doi":"10.1007/s00799-023-00352-7","DOIUrl":"10.1007/s00799-023-00352-7","url":null,"abstract":"Neural network models enjoy success on language tasks related to Web documents, including news and Wikipedia articles. However, the characteristics of scientific publications pose specific challenges that have yet to be satisfactorily addressed: the discourse structure of scientific documents crucial in scholarly document processing (SDP) tasks, the interconnected nature of scientific documents, and their multimodal nature. We survey modern neural network learning methods that tackle these challenges: those that can model discourse structure and their interconnectivity and use their multimodal nature. We also highlight efforts to collect large-scale datasets and tools developed to enable effective deep learning deployment for SDP. We conclude with a discussion on upcoming trends and recommend future directions for pursuing neural natural language processing approaches for SDP.","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":" ","pages":"1-27"},"PeriodicalIF":1.5,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036973/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9770420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The digitization of historical astrophysical literature with highly localized figures and figure captions","authors":"Jill P. Naiman, Peter K. G. Williams, Alyssa Goodman","doi":"10.1007/s00799-023-00350-9","DOIUrl":"https://doi.org/10.1007/s00799-023-00350-9","url":null,"abstract":"Scientific articles published prior to the “age of digitization” in the late 1990s contain figures which are “trapped” within their scanned pages. While progress to extract figures and their captions has been made, there is currently no robust method for this process. We present a YOLO-based method for use on scanned pages, after they have been processed with Optical character recognition (OCR), which uses both grayscale and OCR features. We focus our efforts on translating the intersection-over-union (IOU) metric from the field of object detection to document layout analysis and quantify “high localization” levels as an IOU of 0.9. When applied to the astrophysics literature holdings of the NASA astrophysics data system, we find F1 scores of 90.9% (92.2%) for figures (figure captions) with the IOU cut-off of 0.9 which is a significant improvement over other state-of-the-art methods.","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136173999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond translation: engaging with foreign languages in a digital library","authors":"G. Crane, Alison Babeu, Lisa M. Cerrato, Amelia Parrish, Carolina Penagos, Farnoosh Shamsian, James Tauber, Jake Wegner","doi":"10.1007/s00799-023-00349-2","DOIUrl":"https://doi.org/10.1007/s00799-023-00349-2","url":null,"abstract":"","PeriodicalId":44974,"journal":{"name":"International Journal on Digital Libraries","volume":"21 1","pages":"163 - 176"},"PeriodicalIF":1.5,"publicationDate":"2023-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77985407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}