arXiv - CS - Digital Libraries: Latest Papers

Automating the Identification of High-Value Datasets in Open Government Data Portals
arXiv - CS - Digital Libraries Pub Date : 2024-06-15 DOI: arxiv-2406.10541
Alfonso Quarati, Anastasija Nikiforova
Abstract: Recognized for fostering innovation and transparency, driving economic growth, enhancing public services, supporting research, empowering citizens, and promoting environmental sustainability, High-Value Datasets (HVD) play a crucial role in the broader Open Government Data (OGD) movement. However, identifying HVD presents a resource-intensive and complex challenge due to the nuanced nature of data value. Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest derived from data usage statistics, thereby minimizing the need for human intervention. The proposed method involves extracting download data, analyzing metrics to identify high-value categories, and comparing HVD datasets across different portals. This automated process provides valuable insights into trends in dataset usage, reflecting citizens' needs and preferences. The effectiveness of our approach is demonstrated through its application to a sample of US OGD city portals. The practical implications of this study include contributing to the understanding of HVD at both local and national levels. By providing a systematic and efficient means of identifying HVD, our approach aims to inform open governance initiatives and practices, aiding OGD portal managers and public authorities in their efforts to optimize data dissemination and utilization.
Citations: 0
An open dataset of article processing charges from six large scholarly publishers (2019-2023)
arXiv - CS - Digital Libraries Pub Date : 2024-06-12 DOI: arxiv-2406.08356
Leigh-Ann Butler, Madelaine Hare, Nina Schönfelder, Eric Schares, Juan Pablo Alperin, Stefanie Haustein
Abstract: This paper introduces a dataset of article processing charges (APCs) produced from the price lists of six large scholarly publishers - Elsevier, Frontiers, PLOS, MDPI, Springer Nature and Wiley - between 2019 and 2023. APC price lists were downloaded from publisher websites each year as well as via Wayback Machine snapshots to retrieve fees per journal per year. The dataset includes journal metadata, APC collection method, and annual APC price list information in several currencies (USD, EUR, GBP, CHF, JPY, CAD) for 8,712 unique journals and 36,618 journal-year combinations. The dataset was generated to allow for more precise analysis of APCs and can support library collection development and scientometric analysis estimating APCs paid in gold and hybrid OA journals.
Citations: 0
Which topics are best represented by science maps? An analysis of clustering effectiveness for citation and text similarity networks
arXiv - CS - Digital Libraries Pub Date : 2024-06-10 DOI: arxiv-2406.06454
Juan Pablo Bascur, Suzan Verberne, Nees Jan van Eck, Ludo Waltman
Abstract: A science map of topics is a visualization that shows topics identified algorithmically based on the bibliographic metadata of scientific publications. In practice not all topics are well represented in a science map. We analyzed how effectively different topics are represented in science maps created by clustering biomedical publications. To achieve this, we investigated which topic categories, obtained from MeSH terms, are better represented in science maps based on citation or text similarity networks. To evaluate the clustering effectiveness of topics, we determined the extent to which documents belonging to the same topic are grouped together in the same cluster. We found that the best and worst represented topic categories are the same for citation and text similarity networks. The best represented topic categories are diseases, psychology, anatomy, organisms and the techniques and equipment used for diagnostics and therapy, while the worst represented topic categories are natural science fields, geographical entities, information sciences and health care and occupations. Furthermore, for the diseases and organisms topic categories and for science maps with smaller clusters, we found that topics tend to be better represented in citation similarity networks than in text similarity networks.
Citations: 0
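One way to read the evaluation idea in the abstract above (how far documents of the same topic end up in the same cluster) is as a per-topic concentration score. The sketch below is only an illustration under that assumption: the function name `topic_concentration`, the toy document IDs, and the choice of "share of a topic's documents in its largest cluster" are hypothetical and may differ from the metric the authors actually use.

```python
from collections import Counter, defaultdict


def topic_concentration(doc_topics, doc_clusters):
    """For each topic, return the share of its documents that sit in the
    single cluster holding most of them (1.0 means all in one cluster)."""
    clusters_by_topic = defaultdict(Counter)
    for doc_id, topics in doc_topics.items():
        cluster = doc_clusters.get(doc_id)
        if cluster is None:
            continue  # skip documents that were never assigned a cluster
        for topic in topics:
            clusters_by_topic[topic][cluster] += 1
    return {
        topic: max(counts.values()) / sum(counts.values())
        for topic, counts in clusters_by_topic.items()
    }


# Hypothetical toy data: topic labels per document and a cluster assignment.
doc_topics = {
    "d1": ["diseases"],
    "d2": ["diseases"],
    "d3": ["diseases", "geography"],
    "d4": ["geography"],
}
doc_clusters = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}

scores = topic_concentration(doc_topics, doc_clusters)
```

In this toy example every "diseases" document lands in cluster 0, while the two "geography" documents are split across two clusters, so the first topic scores higher than the second.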
Coconut Libtool: Bridging Textual Analysis Gaps for Non-Programmers
arXiv - CS - Digital Libraries Pub Date : 2024-06-10 DOI: arxiv-2406.05949
Faizhal Arif Santosa, Manika Lamba, Crissandra George, J. Stephen Downie
Abstract: In the era of big and ubiquitous data, professionals and students alike are finding themselves needing to perform a number of textual analysis tasks. Historically, the general lack of statistical expertise and programming skills has stopped many with humanities or social sciences backgrounds from performing and fully benefiting from such analyses. Thus, we introduce Coconut Libtool (www.coconut-libtool.com/), an open-source, web-based application that utilizes state-of-the-art natural language processing (NLP) technologies. Coconut Libtool analyzes text data from customized files and bibliographic databases such as Web of Science, Scopus, and Lens. Users can verify which functions can be performed with the data they have. Coconut Libtool deploys multiple algorithmic NLP techniques at the backend, including topic modeling (LDA, Biterm, and BERTopic algorithms), network graph visualization, keyword lemmatization, and sunburst visualization. Coconut Libtool is the people-first web application designed to be used by professionals, researchers, and students in the information sciences, digital humanities, and computational social sciences domains to promote transparency, reproducibility, accessibility, reciprocity, and responsibility in research practices.
Citations: 0
The Impact of AI on Academic Research and Publishing
arXiv - CS - Digital Libraries Pub Date : 2024-06-10 DOI: arxiv-2406.06009
Brady Lund, Manika Lamba, Sang Hoo Oh
Abstract: Generative artificial intelligence (AI) technologies like ChatGPT have significantly impacted academic writing and publishing through their ability to generate content at levels comparable to or surpassing human writers. Through a review of recent interdisciplinary literature, this paper examines ethical considerations surrounding the integration of AI into academia, focusing on the potential for this technology to be used for scholarly misconduct and necessary oversight when using it for writing, editing, and reviewing of scholarly papers. The findings highlight the need for collaborative approaches to AI usage among publishers, editors, reviewers, and authors to ensure that this technology is used ethically and productively.
Citations: 0
Text Analysis of ETDs in ProQuest Dissertations and Theses (PQDT) Global (2016-2018)
arXiv - CS - Digital Libraries Pub Date : 2024-06-10 DOI: arxiv-2406.06076
Manika Lamba
Abstract: The information explosion in the form of ETDs poses the challenge of management and extraction of appropriate knowledge for decision-making. Thus, the present study puts forward a solution to the above problem by applying topic mining and prediction modeling tools to 263 ETDs submitted to the PQDT Global database during 2016-18 in the field of library science. This study was divided into two phases. The first phase determined the core topics from the ETDs using Topic-Modeling-Tool (TMT), which was based on latent Dirichlet allocation (LDA), whereas the second phase employed prediction analysis using the RapidMiner platform to annotate future research articles on the basis of the modeled topics. The core topics (tags) for the studied period were found to be book history, school librarian, public library, communicative ecology, and informatics, followed by text network and trend analysis on the high-probability co-occurring words. Lastly, a prediction model using a Support Vector Machine (SVM) classifier was created in order to accurately predict the placement of future ETDs to be submitted to PQDT Global under the five modeled topics (a to e). The test dataset performed perfectly against the training dataset for the predictive model.
Citations: 0
Automatically detecting scientific political science texts from a large general document index
arXiv - CS - Digital Libraries Pub Date : 2024-06-05 DOI: arxiv-2406.03067
Nina Smirnova
Abstract: This technical report outlines the filtering approach applied to the collection of the Bielefeld Academic Search Engine (BASE) data to extract articles from the political science domain. We combined hard and soft filters to address entries with different available metadata, e.g. title, abstract or keywords. The hard filter is a weighted keyword-based filter approach. The soft filter uses a multilingual BERT-based classification model, trained to detect scientific articles from the political science domain. We evaluated both approaches using an annotated dataset, consisting of scientific articles from different scientific domains. The weighted keyword-based approach achieved the highest total accuracy of 0.88. The multilingual BERT-based classification model was fine-tuned using a dataset of 14,178 abstracts from scientific articles and reached the highest total accuracy of 0.98.
Citations: 0
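The "hard filter" mentioned in the abstract above is described only as a weighted keyword-based approach, so the following is a minimal sketch of what such a filter could look like, not the report's implementation. The keyword list, the weights, the threshold, and the function names `keyword_score` and `passes_hard_filter` are all hypothetical; records are assumed to be dictionaries whose `title`, `abstract`, and `keywords` fields are strings or missing.

```python
import re

# Hypothetical keyword weights; the report's actual keyword list is not given.
KEYWORD_WEIGHTS = {
    "election": 2.0,
    "parliament": 2.0,
    "democracy": 1.5,
    "policy": 1.0,
}
THRESHOLD = 3.0  # hypothetical cut-off score


def keyword_score(text, weights=KEYWORD_WEIGHTS):
    """Sum the weights of every keyword occurrence in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def passes_hard_filter(record, threshold=THRESHOLD):
    """Score whichever metadata fields are present and compare to the cut-off."""
    fields = ("title", "abstract", "keywords")
    text = " ".join(record.get(field) or "" for field in fields)
    return keyword_score(text) >= threshold


# Hypothetical records with different available metadata.
political = {"title": "Election policy in parliament", "abstract": None}
unrelated = {"title": "Protein folding dynamics"}
```

Joining whichever fields are available is one simple way to handle entries with uneven metadata; entries that a keyword score cannot decide would then be candidates for the BERT-based soft filter the abstract describes.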
Promotional Language and the Adoption of Innovative Ideas in Science
arXiv - CS - Digital Libraries Pub Date : 2024-06-04 DOI: arxiv-2406.02798
Hao Peng, Huilian Sophie Qiu, Henrik Barslund Fosse, Brian Uzzi
Abstract: How are the merits of innovative ideas communicated in science? Here we conduct semantic analyses of grant application success with a focus on scientific promotional language, which has been growing in frequency in many contexts and purportedly may convey an innovative idea's originality and significance. Our analysis attempts to surmount limitations of prior studies by examining the full text of tens of thousands of both funded and unfunded grants from three leading public and private funding agencies: the NIH, the NSF, and the Novo Nordisk Foundation, one of the world's largest private science foundations. We find a robust association between promotional language and the support and adoption of innovative ideas by funders and other scientists. First, the percentage of promotional language in a grant proposal is associated with up to a doubling of the grant's probability of being funded. Second, a grant's promotional language reflects its intrinsic level of innovativeness. Third, the percentage of promotional language predicts the expected citation and productivity impact of publications that are supported by funded grants. Lastly, a computer-assisted experiment that manipulates the promotional language in our data demonstrates how promotional language can communicate the merit of ideas through cognitive activation. With the incidence of promotional language in science steeply rising, and the pivotal role of grants in converting promising and aspirational ideas into solutions, our analysis provides empirical evidence that promotional language is associated with effectively communicating the merits of innovative scientific ideas.
Citations: 0
OpenDataLab: Empowering General Artificial Intelligence with Open Datasets
arXiv - CS - Digital Libraries Pub Date : 2024-06-04 DOI: arxiv-2407.13773
Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin
Abstract: The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of data sources hinder efficient data utilization. The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications. To address these challenges, this paper introduces OpenDataLab, a platform designed to bridge the gap between diverse data sources and the need for unified data processing. OpenDataLab integrates a wide range of open-source AI datasets and enhances data acquisition efficiency through intelligent querying and high-speed downloading services. The platform employs a next-generation AI Data Set Description Language (DSDL), which standardizes the representation of multimodal and multi-format data, improving interoperability and reusability. Additionally, OpenDataLab optimizes data processing through tools that complement DSDL. By integrating data with unified data descriptions and smart data toolchains, OpenDataLab can improve data preparation efficiency by 30%. We anticipate that OpenDataLab will significantly boost artificial general intelligence (AGI) research and facilitate advancements in related AI fields. For more detailed information, please visit the platform's official website: https://opendatalab.com.
Citations: 0
Twitter should now be referred to as X: How academics, journals and publishers need to make the nomenclatural transition
arXiv - CS - Digital Libraries Pub Date : 2024-05-31 DOI: arxiv-2405.20670
Jaime A. Teixeira da Silva, Serhii Nazarovets
Abstract: Here, we note how academics, journals and publishers should no longer refer to the social media platform Twitter as such, rather as X. Relying on Google Scholar, we found 16 examples of papers published in the last months of 2023 - essentially during the transition period between Twitter and X - that used Twitter and X, but in different ways. Unlike that transition period in which the binary Twitter/X could have been used in academic papers, we suggest that papers should no longer refer to Twitter as Twitter, but only as X, except for historical studies about that social media platform, because such use would be factually incorrect.
Citations: 0