Data Science Journal: Latest Articles, Page 3

Development of a Job Advertisement Analysis for Assessing Data Science Competencies

Data Science Journal Pub Date : 2023-01-01 DOI: 10.5334/dsj-2023-033

Jan Vogt, Thilo Voigt, Annika Nowak, Jan M. Pawlowski

Citations: 0

A Notion of Feature Importance by Decorrelation and Detection of Trends by Random Forest Regression

Data Science Journal Pub Date : 2023-01-01 DOI: 10.5334/dsj-2023-042

Yannick Gerstorfer, Max Hahn-Klimroth, Lena Krieg

Abstract: In many studies, we want to determine the influence of certain features on a dependent variable. More specifically, we are interested in the strength of the influence – i.e., is the feature relevant? And, if so, how the feature influences the dependent variable. Recently, data-driven approaches such as random forest regression have found their way into applications (Boulesteix et al. 2012). These models allow researchers to directly derive measures of feature importance, which are a natural indicator of the strength of the influence. For the relevant features, the correlation or rank correlation between the feature and the dependent variable has typically been used to determine the nature of the influence. More recent methods, some of which can also measure interactions between features, are based on a modeling approach. In particular, when machine learning models are used, SHAP scores are a recent and prominent method to determine these trends (Lundberg et al. 2017). In this paper, we introduce a novel notion of feature importance based on the well-studied Gram-Schmidt decorrelation method. Furthermore, we propose two estimators for identifying trends in the data using random forest regression, the so-called absolute and relative traversal rate. We empirically compare the properties of our estimators with those of well-established estimators on a variety of synthetic and real-world datasets.

Citations: 1

Ontology-Driven Semantic Enrichment Framework for Open Data Value Creation

Data Science Journal Pub Date : 2023-01-01 DOI: 10.5334/dsj-2023-040

Oarabile Sebubi, Irina Zlotnikova, Hlomani Hlomani

Citations: 0

A Framework for Data-Driven Solutions with COVID-19 Illustrations

Data Science Journal Pub Date : 2021-11-18 DOI: 10.5334/dsj-2021-036

Kassim S. Mwitondi, Raed A. Said

Abstract: Data-driven solutions have long been keenly sought after as tools for driving the world’s fast changing business environment, with business leaders seeking to enhance decision making processes within their organisations. In the current era of Big Data, applications of data tools in addressing global, regional and national challenges have steadily grown in almost all fields across the globe. However, working in silos has continued to impede research progress, creating knowledge gaps and challenges across geographical borders, legislations, sectors and fields. There are many examples of the challenges the world faces in tackling global issues, including the complex interactions of the 17 Sustainable Development Goals (SDG) and the spatio-temporal variations of the impact of the on-going COVID-19 pandemic. Both challenges can be seen as non-orthogonal, strongly correlated and requiring an interdisciplinary approach to address. We present a generic framework for filling such gaps, based on two data-driven algorithms that combine data, machine learning and interdisciplinarity to bridge societal knowledge gaps. The novelty of the algorithms derives from their robust built-in mechanics for handling data randomness. Animation applications on structured COVID-19 related data obtained from the European Centre for Disease Prevention and Control (ECDC) and the UK Office of National Statistics exhibit great potential for decision-support systems. Predictive findings are based on unstructured data (a large COVID-19 X-Ray data set of 3181 image files) obtained from GitHub and Kaggle. Our results exhibit consistent performance across samples, resonating with cross-disciplinary discussions on novel paths for data-driven interdisciplinary research.

© 2021, Ubiquity Press. All rights reserved.

Citations: 2

Application Profile for Machine-Actionable Data Management Plans

Data Science Journal Pub Date : 2021-10-26 DOI: 10.5334/dsj-2021-032

Tomasz Miksa, P. Walk, Peter Neish, Simon Oblasser, Hollydawn Murray, Tom Renner, Marie-Christine Jacquemot-Perbal, João Cardoso, T. Kvamme, M. Praetzellis, M. Suchánek, Rob W.W. Hooft, Benjamin Faure, H. Moa, A. Hasan, Sarah Jones

Citations: 5

Do I-PASS for FAIR? Measuring the FAIR-ness of Research Organizations

Data Science Journal Pub Date : 2021-10-07 DOI: 10.5334/dsj-2021-030

J. Ringersma, M. Miedema

Citations: 0

Open Access and Data Sharing of Nucleotide Sequence Data

Data Science Journal Pub Date : 2021-09-15 DOI: 10.5334/dsj-2021-028

Masanori Arita

Citations: 4

Research Data Management Challenges in Citizen Science Projects and Recommendations for Library Support Services. A Scoping Review and Case Study

Data Science Journal Pub Date : 2021-08-18 DOI: 10.5334/dsj-2021-025

J. S. Hansen, Signe Gadegaard, Karsten Kryger Hansen, Asger Væring Larsen, S. Møller, Gertrud Stougård Thomsen, Katrine Flindt Holmstrand

Abstract: Citizen science (CS) projects are part of a new era of data aggregation and harmonisation that facilitates interconnections between different datasets. Increasing the value and reuse of CS data has received growing attention with the appearance of the FAIR principles and systematic research data management (RDM) practices, which are often promoted by university libraries. However, RDM initiatives in CS appear diversified, and whether CS has special needs in terms of RDM is unclear. Therefore, the aim of this article is firstly to identify RDM challenges for CS projects and secondly to discuss how university libraries may support any such challenges. A scoping review and a case study of Danish CS projects were performed to identify RDM challenges. 48 articles were selected for data extraction. Four academic project leaders were interviewed about RDM practices in their CS projects. Challenges and recommendations identified in the review and case study are often not specific to CS. However, finding CS data, engaging specific populations, attributing volunteers and handling sensitive data including health data are some of the challenges requiring special attention by CS project managers. Scientific requirements or national practices do not always encompass the nature of CS projects. Based on the identified challenges, it is recommended that university libraries focus their services on 1) identifying legal and ethical issues that the project managers should be aware of in their projects, 2) elaborating these issues in a Terms of Participation that also specifies data handling and sharing to the citizen scientist, and 3) motivating the project manager to adopt good data handling practices. Adhering to the FAIR principles and good RDM practices in CS projects will continuously secure contextualisation and data quality. High data quality increases the value and reuse of the data and, therefore, the empowerment of the citizen scientists.

Citations: 5

On the Application of Principal Component Analysis to Classification Problems

Data Science Journal Pub Date : 2021-08-18 DOI: 10.5334/dsj-2021-026

Jianwei Zheng, C. Rakovski

Citations: 1

SASSCAL WebSAPI: A Web Scraping Application Programming Interface to Support Access to SASSCAL’s Weather Data

Data Science Journal Pub Date : 2021-07-28 DOI: 10.5334/dsj-2021-024

Tsaone Swaabow Thapelo, M. Namoshe, O. Matsebe, T. Motshegwa, Mary-Jane M. Bopape

Abstract: The Southern African Science Service Centre for Climate and Land Management (SASSCAL) was initiated to support regional weather monitoring and climate research in Southern Africa. As a result, several Automatic Weather Stations (AWSs) were implemented to provide numerical weather data within the collaborating countries. Meanwhile, access to the SASSCAL weather data is limited to a number of records that are achieved via a series of clicks. Currently, end users cannot efficaciously extract the desired weather values. Thus, the data is not fully utilised by end users. This work contributes with an open source Web Scraping Application Programming Interface (WebSAPI) through an interactive dashboard. The objective is to extend functionalities of the SASSCAL Weathernet for: data extraction, statistical data analysis and visualisation. The SASSCAL WebSAPI was developed using the R statistical environment. It deploys web scraping and data wrangling techniques to support access to SASSCAL weather data. This WebSAPI reduces the risk of human error, and the researcher’s effort of generating desired data sets. The proposed framework for the SASSCAL WebSAPI can be modified for other weather data banks while taking into consideration the legality and ethics of the toolkit.

Citations: 1