Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

A survey on datasets for fairness‐aware machine learning
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-10-01 DOI: 10.1002/widm.1452
Tai Le Quy, Arjun Roy, Vasileios Iosifidis, Eirini Ntoutsi
Abstract: As decision‐making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data‐driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness‐aware ML solutions have been proposed which involve fairness‐related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real‐world datasets used for fairness‐aware ML. We focus on tabular data as the most common data representation for fairness‐aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis.
Citations: 116
Detecting communities using social network analysis in online learning environments: Systematic literature review
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-09-25 DOI: 10.1002/widm.1431
Sahar Yassine, S. Kadry, M. Sicilia
Abstract: Uncovering community structure has made a significant advancement in explaining, analyzing, and forecasting behaviors and dynamics of networks related to different fields in sociology, criminology, biology, medicine, communication, economics, and academia. Detecting and clustering communities is a powerful step toward identifying the structural properties and the behavioral patterns in social networks. Recently, online learning has been progressively adopted by a lot of educational practices which raise many questions about assessing the learners' engagement, collaboration, and behaviors in the new emerging learning communities. This systematic literature review aims to assess the use of community detection techniques in analyzing the network's structure in online learning environments. It provides a comprehensive overview of the existing research that adopted those techniques with identifying the educational objectives behind their application as well as suggesting possible future research directions. Our analysis covered 65 studies that found in the literature and applied different community discovery techniques on various types of online learning environments to analyze their users' interactions patterns. Our review revealed the potential of this field in improving educational practices and decisions and in utilizing the massive amount of data generated from interacting with those environments. Finally, we highlighted the need to include automated community discovery techniques in online learning environments to facilitate and enhance their use as well as we stressed on the urge for further advance research to uncover a lot of hidden opportunities.
Citations: 11
Overview of accurate coresets
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-09-16 DOI: 10.1002/widm.1429
Ibrahim Jubran, Alaa Maalouf, D. Feldman
Abstract: A coreset of an input set is its small summarization, such that solving a problem on the coreset as its input, provably yields the same result as solving the same problem on the original (full) set, for a given family of problems (models/classifiers/loss functions). Coresets have been suggested for many fundamental problems, for example, in machine/deep learning, computer vision, databases, and theoretical computer science. This introductory paper was written following requests regarding the many inconsistent coreset definitions, lack of source code, the required deep theoretical background from different fields, and the dense papers that make it hard for beginners to apply and develop coresets. The article provides folklore, classic, and simple results including step‐by‐step proofs and figures, for the simplest (accurate) coresets. Nevertheless, we did not find most of their constructions in the literature. Moreover, we expect that putting them together in a retrospective context would help the reader to grasp current results that usually generalize these fundamental observations. Experts might appreciate the unified notation and comparison table for existing results. Open source code is provided for all presented algorithms, to demonstrate their usage, and to support the readers who are more familiar with programming than mathematics.
Citations: 3
Mining text from natural scene and video images: A survey
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-08-24 DOI: 10.1002/widm.1428
P. Shivakumara, Alireza Alaei, U. Pal
Abstract: In computer terminology, mining is considered as extracting meaningful information or knowledge from a large amount of data/information using computers. The meaningful information can be extracted from normal text, and images obtained from different resources, such as natural scene images, video, and documents by deriving semantics from text and content of the images. Although there are many pieces of work on text/data mining and several survey/review papers are published in the literature, to the best of our knowledge there is no survey paper on mining textual information from the natural scene, video, and document images considering word spotting techniques. In this article, we, therefore, provide a comprehensive review of both the non‐spotting and spotting based mining techniques. The mining approaches are categorized as feature, learning and hybrid‐based methods to analyze the strengths and limitations of the models of each category. In addition, it also discusses the usefulness of the methods according to different situations and applications. Furthermore, based on the review of different mining approaches, this article identifies the limitations of the existing methods and suggests new applications and future directions to continue the research in multiple directions. We believe such a review article will be useful to the researchers to quickly become familiar with the state‐of‐the‐art information and progresses made toward mining textual information from natural scene and video images.
Citations: 3
Critical insights into modern hyperspectral image applications through deep learning
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-07-21 DOI: 10.1002/widm.1426
Garima Jaiswal, Aruna Sharma, S. Yadav
Abstract: Hyperspectral imaging has shown tremendous growth over the past three decades. Hyperspectral imaging was evolved through remote sensing. Along, with the technological enhancements hyperspectral imaging has outgrown, conquering over other various application areas. In addition to it, data enriched data cubes with abundant spectral and spatial information works as perk for capturing, analyzing, reviewing, and interpreting results from data. This review concentrates on emerging application areas of hyperspectral imaging. Emerging application areas are selected in ways where there is a vast scope for future enhancements by exploiting cutting edge technology, that is, deep learning. Applications of hyperspectral imaging techniques in some selected areas (remote sensing, document forgery, history and archaeology conservation, surveillance and security, machine vision for fruit quality inspection, medical imaging) are focused. The review pivots around the publicly available datasets and features used domain wise. This review can act as a baseline for deep learning and machine vision experts, historical geographers, and scholars by providing them a view of how hyperspectral imaging is implemented in multiple domains along with future research prospects.
Citations: 19
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-07-13 DOI: 10.1002/widm.1484
B. Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, M. Becker, A. Boulesteix, Difan Deng, M. Lindauer
Abstract: Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time‐consuming and irreproducible manual process of trial‐and‐error to find well‐performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
Citations: 113
Explainable artificial intelligence: an analytical review
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-07-12 DOI: 10.1002/widm.1424
P. Angelov, E. Soares, Richard Jiang, Nicholas I. Arnold, Peter M. Atkinson
Abstract: This paper provides a brief analytical review of the current state‐of‐the‐art in relation to the explainability of artificial intelligence in the context of recent advances in machine learning and deep learning. The paper starts with a brief historical introduction and a taxonomy, and formulates the main challenges in terms of explainability building on the recently formulated National Institute of Standards four principles of explainability. Recently published methods related to the topic are then critically reviewed and analyzed. Finally, future directions for research are suggested.
Citations: 208
A survey on machine learning based light curve analysis for variable astronomical sources
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-07-04 DOI: 10.1002/widm.1425
Ce Yu, Kun Li, Yanxia Zhang, Jian Xiao, Chenzhou Cui, Yihan Tao, Shanjian Tang, Chao Sun, Chongke Bi
Abstract: The improvement of observation capabilities has expanded the scale of new data available for time domain astronomy research, and the accumulation of observational data continues to accelerate. However, traditional data analysis methods are difficult to fully tap the potential scientific value of all data. Therefore, in the current and future research on light curve analysis, it is inevitable to use artificial intelligence (AI) technology to assist in data analysis in order to obtain as many candidates as possible with scientific research goals. This survey reviews important developments in light curve analysis over the past years, summarizes the basic concepts in machine learning and their applications in light curve analysis and concludes perspectives and challenges for light curve analysis in the near future. The full exploration of light curves of variable celestial objects relies heavily on new techniques derived from promotion of machine learning and deep learning in the astronomical big data era.
Citations: 3
Trending machine learning models in cyber‐physical building environment: A survey
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-06-29 DOI: 10.1002/widm.1422
Zahid Hasan, Nirmalya Roy
Abstract: Electricity usage of buildings (including offices, malls, and residential apartments) represents a significant portion of a nation's energy expenditure and carbon footprint. In the United States, the buildings' appliances consume 72% of the total produced electricity approximately. In this regard, cyber‐physical system (CPS) researchers have put forth associated research questions to reduce cyber‐physical building environment energy consumption by minimizing the energy dissipation while securing occupants' comfort. Some of the questions in CPS building include finding the optimal HVAC control, monitoring appliances' energy usage, detecting insulation problems, estimating the occupants' number and activities, managing thermal comfort, intelligently interacting with the smart grid. Various machine learning (ML) applications have been studied in recent CPS researches to improve building energy efficiency by addressing these questions. In this paper, we comprehensively review and report on the contemporary applications of ML algorithms such as deep learning, transfer learning, active learning, reinforcement learning, and other emerging techniques that propose and envision to address the above challenges in the CPS building environment. Finally, we conclude this article by discussing diverse existing open questions and prospective future directions in the CPS building environment research.
Citations: 9
Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results
IF 7.8, Zone 2 (Computer Science)
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery Pub Date : 2021-06-04 DOI: 10.1002/widm.1441
Chris Niessl, M. Herrmann, Chiara Wiedemann, Giuseppe Casalicchio, Anne-Laure Boulesteix (affiliations as listed: Institute for Medical Information Processing, Biometry, Epidemiology, LMU Munich, Germany; Department of Statistics)
Abstract: In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness for this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices lead to biased interpretations of benchmark results and to over‐optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.
Citations: 13