Data Science Journal: Latest Articles

Development of a Job Advertisement Analysis for Assessing Data Science Competencies
Data Science Journal, published 2023-01-01. DOI: 10.5334/dsj-2023-033
Jan Vogt, Thilo Voigt, Annika Nowak, Jan M. Pawlowski
Citations: 0
A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression
Data Science Journal, published 2023-01-01. DOI: 10.5334/dsj-2023-042
Yannick Gerstorfer, Max Hahn-Klimroth, Lena Krieg
Abstract: In many studies, we want to determine the influence of certain features on a dependent variable. More specifically, we are interested in the strength of the influence (i.e., is the feature relevant?) and, if so, how the feature influences the dependent variable. Recently, data-driven approaches such as random forest regression have found their way into applications (Boulesteix et al. 2012). These models allow researchers to derive measures of feature importance directly, which are a natural indicator of the strength of the influence. For the relevant features, the correlation or rank correlation between the feature and the dependent variable has typically been used to determine the nature of the influence. More recent methods, some of which can also measure interactions between features, are based on a modeling approach. In particular, when machine learning models are used, SHAP scores are a recent and prominent method for determining these trends (Lundberg et al. 2017). In this paper, we introduce a novel notion of feature importance based on the well-studied Gram-Schmidt decorrelation method. Furthermore, we propose two estimators for identifying trends in the data using random forest regression, the so-called absolute and relative traversal rates. We empirically compare the properties of our estimators with those of well-established estimators on a variety of synthetic and real-world datasets.
Citations: 1
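The abstract above builds on Gram-Schmidt decorrelation. As a rough illustration of that underlying step only (not the authors' feature-importance measure or their traversal-rate estimators, which the abstract does not specify), the sketch below applies classical Gram-Schmidt to centered feature columns so that each column keeps only the part not linearly explained by earlier columns. The helper names `center`, `dot` and `gram_schmidt_decorrelate` are hypothetical, and the data are made up for illustration.

```python
def center(column):
    """Subtract the column mean so that orthogonality means zero sample covariance."""
    mean = sum(column) / len(column)
    return [x - mean for x in column]


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt_decorrelate(columns, eps=1e-12):
    """Classical Gram-Schmidt over centered feature columns (hypothetical helper).

    Column j of the result is feature j with its linear projection onto every
    earlier decorrelated column removed, so the returned columns are pairwise
    uncorrelated. The result depends on the order in which features are given.
    """
    decorrelated = []
    for column in columns:
        residual = center(column)
        for basis in decorrelated:
            norm_sq = dot(basis, basis)
            if norm_sq > eps:  # skip columns that were already fully explained
                coef = dot(residual, basis) / norm_sq
                residual = [r - coef * b for r, b in zip(residual, basis)]
        decorrelated.append(residual)
    return decorrelated


# Illustrative data: x2 is strongly, but not perfectly, correlated with x1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 9.0]
q1, q2 = gram_schmidt_decorrelate([x1, x2])
print(abs(dot(q1, q2)) < 1e-9)  # True: the decorrelated columns are orthogonal
```

Because each residual is taken relative to the columns before it, reordering the features changes which feature is credited with the shared variance; any importance score built on this step inherits that order dependence.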
Ontology-Driven Semantic Enrichment Framework for Open Data Value Creation
Data Science Journal, published 2023-01-01. DOI: 10.5334/dsj-2023-040
Oarabile Sebubi, Irina Zlotnikova, Hlomani Hlomani
Citations: 0
A Framework for Data-Driven Solutions with COVID-19 Illustrations
Data Science Journal, published 2021-11-18. DOI: 10.5334/dsj-2021-036
Kassim S. Mwitondi, Raed A. Said
Abstract: Data-driven solutions have long been keenly sought after as tools for driving the world’s fast-changing business environment, with business leaders seeking to enhance decision-making processes within their organisations. In the current era of Big Data, applications of data tools to global, regional and national challenges have steadily grown in almost all fields across the globe. However, working in silos has continued to impede research progress, creating knowledge gaps and challenges across geographical borders, jurisdictions, sectors and fields. There are many examples of the challenges the world faces in tackling global issues, including the complex interactions of the 17 Sustainable Development Goals (SDGs) and the spatio-temporal variations in the impact of the ongoing COVID-19 pandemic. Both challenges can be seen as non-orthogonal and strongly correlated, and both require an interdisciplinary approach. We present a generic framework for filling such gaps, based on two data-driven algorithms that combine data, machine learning and interdisciplinarity to bridge societal knowledge gaps. The novelty of the algorithms derives from their robust built-in mechanics for handling data randomness. Animation applications on structured COVID-19-related data obtained from the European Centre for Disease Prevention and Control (ECDC) and the UK Office for National Statistics show great potential for decision-support systems. Predictive findings are based on unstructured data: a large COVID-19 X-ray dataset of 3,181 image files obtained from GitHub and Kaggle. Our results show consistent performance across samples, resonating with cross-disciplinary discussions on novel paths for data-driven interdisciplinary research.
© 2021, Ubiquity Press. All rights reserved.
Citations: 2
Application Profile for Machine-Actionable Data Management Plans
Data Science Journal, published 2021-10-26. DOI: 10.5334/dsj-2021-032
Tomasz Miksa, P. Walk, Peter Neish, Simon Oblasser, Hollydawn Murray, Tom Renner, Marie-Christine Jacquemot-Perbal, João Cardoso, T. Kvamme, M. Praetzellis, M. Suchánek, Rob W.W. Hooft, Benjamin Faure, H. Moa, A. Hasan, Sarah Jones
Abstract: This paper presents the application profile for machine-actionable data management plans, which allows information from traditional data management plans to be expressed in a machine-actionable way. We describe the methodology and research conducted to define the application profile. We also discuss design decisions made during its development and present systems that have adopted it. The application profile was developed in an open, consensus-driven manner within the DMP Common Standards Working Group of the Research Data Alliance and is its official recommendation.
Citations: 5
Do I-PASS for FAIR? Measuring the FAIR-ness of Research Organizations
Data Science Journal, published 2021-10-07. DOI: 10.5334/dsj-2021-030
J. Ringersma, M. Miedema
Citations: 0
Open Access and Data Sharing of Nucleotide Sequence Data
Data Science Journal, published 2021-09-15. DOI: 10.5334/dsj-2021-028
Masanori Arita
Abstract: Open access, free access, and the public domain are different concepts. The International Nucleotide Sequence Database Collaboration (INSDC) permanently guarantees free and unrestricted access to nucleotide sequence data for all researchers, irrespective of nationality or affiliation. However, recent virus information is primarily distributed via the restricted-access repository known as the Global Initiative on Sharing Avian Flu Data (GISAID), supported by the World Health Organization. As compensation for the restriction, GISAID needs to meet its initial goal of benefit-sharing among countries and to curb ongoing vaccine diplomacy campaigns.
Citations: 4
Research Data Management Challenges in Citizen Science Projects and Recommendations for Library Support Services. A Scoping Review and Case Study
Data Science Journal, published 2021-08-18. DOI: 10.5334/dsj-2021-025
J. S. Hansen, Signe Gadegaard, Karsten Kryger Hansen, Asger Væring Larsen, S. Møller, Gertrud Stougård Thomsen, Katrine Flindt Holmstrand
Abstract: Citizen science (CS) projects are part of a new era of data aggregation and harmonisation that facilitates interconnections between different datasets. Increasing the value and reuse of CS data has received growing attention with the appearance of the FAIR principles and systematic research data management (RDM) practices, which are often promoted by university libraries. However, RDM initiatives in CS appear diversified, and whether CS projects have special needs in terms of RDM is unclear. Therefore, the aim of this article is, first, to identify RDM challenges for CS projects and, second, to discuss how university libraries may help address such challenges. A scoping review and a case study of Danish CS projects were performed to identify RDM challenges. Forty-eight articles were selected for data extraction, and four academic project leaders were interviewed about RDM practices in their CS projects. The challenges and recommendations identified in the review and case study are often not specific to CS. However, finding CS data, engaging specific populations, attributing volunteers and handling sensitive data, including health data, are some of the challenges requiring special attention from CS project managers. Scientific requirements or national practices do not always encompass the nature of CS projects.
Based on the identified challenges, it is recommended that university libraries focus their services on 1) identifying legal and ethical issues that project managers should be aware of in their projects, 2) elaborating these issues in Terms of Participation that also specify data handling and sharing to the citizen scientist, and 3) motivating project managers to adopt good data handling practices. Adhering to the FAIR principles and good RDM practices in CS projects will continuously secure contextualisation and data quality. High data quality increases the value and reuse of the data and, therefore, the empowerment of the citizen scientists.
Citations: 5
On the Application of Principal Component Analysis to Classification Problems
Data Science Journal, published 2021-08-18. DOI: 10.5334/dsj-2021-026
Jianwei Zheng, C. Rakovski
Abstract: Principal Component Analysis (PCA) is a commonly used technique that uses the correlation structure of the original variables to reduce the dimensionality of the data. This reduction is achieved by considering only the first few principal components in a subsequent analysis. The usual inclusion criterion is that the proportion of total variance explained by the retained principal components exceeds a predetermined threshold. We show that in certain classification problems, even an extremely high inclusion threshold can negatively impact classification accuracy. The omission of small-variance principal components can severely diminish the performance of the models. We noticed this phenomenon in classification analyses of high-dimensional ECG data, where the most common classification methods lost between 1% and 6% of accuracy even when using a 99% inclusion threshold. However, as our numerical example shows, this issue can occur even in low-dimensional data with a simple correlation structure. We conclude that the exclusion of any principal components should be carefully investigated.
Citations: 1
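The inclusion criterion described in the abstract above can be made concrete with a small toy example. The sketch below is not the authors' ECG analysis or their numerical example; it assumes a made-up two-class dataset, uses the population covariance and the closed-form eigenvalues of a symmetric 2x2 matrix, and the function names are hypothetical. It shows a 99% cumulative-variance threshold keeping only the first component, which here carries no class information.

```python
import math


def covariance_eigenvalues_2d(points):
    """Eigenvalues (largest first) of the 2x2 population covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / n        # var(x)
    c = sum((y - my) ** 2 for _, y in points) / n        # var(y)
    b = sum((x - mx) * (y - my) for x, y in points) / n  # cov(x, y)
    # Closed form for the symmetric 2x2 matrix [[a, b], [b, c]].
    mid = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return [mid + radius, mid - radius]


def components_for_threshold(variances, threshold):
    """Smallest k whose leading k components reach `threshold` of total variance.

    `variances` must be sorted in decreasing order, as PCA eigenvalues usually are.
    """
    total = sum(variances)
    cumulative = 0.0
    for k, v in enumerate(variances, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(variances)


# Toy two-class data (hypothetical): x spreads widely but carries no label
# information, while the tiny y offset (+0.1 vs -0.1) separates class A from B.
points = [(5.0, 0.1), (-5.0, 0.1), (5.0, -0.1), (-5.0, -0.1)]
labels = ["A", "A", "B", "B"]

eigenvalues = covariance_eigenvalues_2d(points)  # roughly [25.0, 0.01]
k = components_for_threshold(eigenvalues, 0.99)

# Here cov(x, y) = 0 and var(x) > var(y), so PC1 is the x axis itself and the
# retained scores are just the x values, which are identical for both classes.
scores_a = sorted(x for (x, _), lab in zip(points, labels) if lab == "A")
scores_b = sorted(x for (x, _), lab in zip(points, labels) if lab == "B")
print(k, scores_a == scores_b)  # 1 True
```

The first component explains about 25 / 25.01 ≈ 99.96% of the variance, so the threshold discards the second component, and that discarded low-variance direction is exactly the one that distinguishes the classes.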
SASSCAL WebSAPI: A Web Scraping Application Programming Interface to Support Access to SASSCAL’s Weather Data
Data Science Journal, published 2021-07-28. DOI: 10.5334/dsj-2021-024
Tsaone Swaabow Thapelo, M. Namoshe, O. Matsebe, T. Motshegwa, Mary-Jane M. Bopape
Abstract: The Southern African Science Service Centre for Climate and Land Management (SASSCAL) was initiated to support regional weather monitoring and climate research in Southern Africa. As a result, several Automatic Weather Stations (AWSs) were deployed to provide numerical weather data within the collaborating countries. However, access to the SASSCAL weather data is limited to a number of records that are retrieved via a series of clicks. Currently, end users cannot efficiently extract the desired weather values, so the data are not fully utilised. This work contributes an open-source Web Scraping Application Programming Interface (WebSAPI) delivered through an interactive dashboard. The objective is to extend the functionality of the SASSCAL Weathernet for data extraction, statistical data analysis and visualisation. The SASSCAL WebSAPI was developed using the R statistical environment. It applies web scraping and data wrangling techniques to support access to SASSCAL weather data. The WebSAPI reduces both the risk of human error and the researcher’s effort in generating the desired datasets.
The proposed framework for the SASSCAL WebSAPI can be adapted to other weather data banks, taking into consideration the legality and ethics of the toolkit.
Citations: 1