{"title":"Short‐term photovoltaic power forecasting with adaptive stochastic configuration network ensemble","authors":"Xifeng Guo, Xinlu Wang, Yanshuang Ao, Wei Dai, Ye Gao","doi":"10.1002/widm.1477","DOIUrl":"https://doi.org/10.1002/widm.1477","url":null,"abstract":"The volatility and intermittency of solar energy seriously restrict the development of the photovoltaic (PV) industry. Accurate forecasting of short‐term PV power generation is essential for the optimal balance and dispatch of power plants in the smart grid. This article presents a machine learning approach for analyzing the volt‐ampere characteristics of PV data and the factors that influence them. A correlation analysis is employed to discover some hidden characteristic variables. Then, an adaptive ensemble method with stochastic configuration networks as base models (AE‐SCN) is proposed to construct the PV prediction model, which integrates bagging and adaptive weighted data fusion algorithms. Compared with the original SCN, an SCN ensemble (SCNE), a random vector functional‐link network (RVFLN), a linear regression model, a random forest model, and an autoregressive integrated moving average (ARIMA) model, AE‐SCN performs favorably in terms of prediction accuracy.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"32 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74289413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the application of machine learning in astronomy and astrophysics: A text‐mining‐based scientometric analysis","authors":"J. Rodríguez, I. Rodríguez-Rodríguez, Wai Lok Woo","doi":"10.1002/widm.1476","DOIUrl":"https://doi.org/10.1002/widm.1476","url":null,"abstract":"Since the beginning of the 21st century, the fields of astronomy and astrophysics have experienced significant growth at observational and computational levels, leading to the acquisition of increasingly huge volumes of data. In order to process this vast quantity of information, artificial intelligence (AI) techniques are being combined with data mining to detect patterns with the aim of modeling, classifying or predicting the behavior of certain astronomical phenomena or objects. Parallel to the exponential development of the aforementioned techniques, the scientific output related to the application of AI and machine learning (ML) in astronomy and astrophysics has also experienced considerable growth in recent years. Therefore, the increasingly abundant articles make it difficult to monitor this field in terms of which research topics are the most prolific or novel, or which countries or authors are leading them. In this article, a text‐mining‐based scientometric analysis of scientific documents published over the last three decades on the application of AI and ML in the fields of astronomy and astrophysics is presented. The VOSviewer software and data from the Web of Science (WoS) are used to elucidate the evolution of publications in this research field, their distribution by country (including co‐authorship), the most relevant topics addressed, and the most cited elements and most significant co‐citations according to publication source and authorship. 
The obtained results demonstrate how application of AI/ML to the fields of astronomy/astrophysics represents an established and rapidly growing field of research that is crucial to obtaining scientific understanding of the universe.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"198 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76999307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey on artificial intelligence in histopathology image analysis","authors":"M. Abdelsamea, Usama Zidan, Zakaria Senousy, M. Gaber, E. Rakha, Mohammad Ilyas","doi":"10.1002/widm.1474","DOIUrl":"https://doi.org/10.1002/widm.1474","url":null,"abstract":"The increasing adoption of whole slide image (WSI) technology in histopathology has dramatically transformed pathologists' workflow and allowed the use of computer systems in histopathology analysis. Extensive research in artificial intelligence (AI) has made substantial progress, resulting in efficient, effective, and robust algorithms for several applications, including cancer diagnosis, prognosis, and treatment. These algorithms offer highly accurate predictions but lack transparency, understandability, and actionability. Thus, explainable artificial intelligence (XAI) techniques are needed not only to understand the mechanism behind the decisions made by AI methods and increase user trust but also to broaden the use of AI algorithms in the clinical setting. Drawing on a survey of over 150 papers, we explore different AI algorithms that have been applied and contributed to the histopathology image analysis workflow. We first address the workflow of the histopathological process. We present an overview of various learning‐based, XAI, and actionable techniques relevant to deep learning methods in histopathological imaging. 
We also address the evaluation of XAI methods and the need to ensure their reliability in the field.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"35 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83485803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open source intelligence extraction for terrorism‐related information: A review","authors":"Megha Chaudhary, D. Bansal","doi":"10.1002/widm.1473","DOIUrl":"https://doi.org/10.1002/widm.1473","url":null,"abstract":"In the current era, in which a large part of the world's population makes extensive use of the internet and social media, terrorists have found an opportunity to execute their plans. These platforms give them a suitable medium to reach their targets, spread propaganda, disseminate training content, operate virtually, and further their goals. To restrain such activities, terrorism‐related information on the internet needs to be analyzed so that it can inform appropriate counterterrorism measures. Open Source Intelligence (OSINT), an emerging discipline that extracts intelligence from publicly accessible sources of information on the internet, offers a fitting solution to this problem. The process of OSINT extraction is broadly observed to comprise three phases: (i) Data Acquisition, (ii) Data Enrichment, and (iii) Knowledge Inference. In the context of terrorism, researchers have made notable contributions across these three phases. However, a comprehensive review that organizes these research contributions into an integrated workflow of intelligence extraction has not been found. The paper presents the most current review of OSINT, reflecting how the various state‐of‐the‐art tools and techniques can be applied in extracting terrorism‐related textual information from publicly accessible sources. Various data mining and text analysis‐based techniques, namely natural language processing, machine learning, and deep learning, have been reviewed to extract and evaluate textual data. 
Additionally, towards the end of the paper, we discuss challenges and gaps observed in different phases of OSINT extraction.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"14 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79776571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corporate investment prediction using a weighted temporal graph neural network","authors":"Jianing Li, X. Yao","doi":"10.1002/widm.1472","DOIUrl":"https://doi.org/10.1002/widm.1472","url":null,"abstract":"Corporate investment is an important part of corporate financial decision‐making and affects the future profit and value of the corporation. Predicting corporate investment is of great significance for capital market investors seeking to understand the future operation and development of a corporation. Many researchers have studied independent prediction methods. However, individual firms imitate each other's investment in the actual decision‐making process. This phenomenon of investment convergence indicates investment correlation among individual firms, which is ignored in these existing methods. In this article, we first identify key variables in multivariate sequences using a two‐way fixed effects model that we designed for precise corporate network construction. Then, we propose a weighted temporal graph neural network (WTGNN) for graph learning and investment prediction over the corporate network. WTGNN improves the graph convolution capability by weighted sampling with attention and multivariate time series aggregation. We conducted extensive experiments using real‐world financial reporting data. 
The results show that WTGNN can achieve excellent graph learning performance and outperforms existing methods in the investment prediction task.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"38 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76978723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A geometric framework for outlier detection in high‐dimensional data","authors":"M. Herrmann, Florian Pfisterer, F. Scheipl","doi":"10.1002/widm.1491","DOIUrl":"https://doi.org/10.1002/widm.1491","url":null,"abstract":"Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework which exploits the metric structure of a data set. Our approach rests on the manifold assumption, that is, that the observed, nominally high‐dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high dimensional data. We also suggest a novel, mathematically precise and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high‐dimensional data, but the framework we propose is completely general and we include image and graph data applications. 
Our results show that the outlier structure of high‐dimensional and non‐tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"3 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89413359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Epidemiological challenges in pandemic coronavirus disease (COVID-19): Role of artificial intelligence.","authors":"Abhijit Dasgupta, Abhisek Bakshi, Srijani Mukherjee, Kuntal Das, Soumyajeet Talukdar, Pratyayee Chatterjee, Sagnik Mondal, Puspita Das, Subhrojit Ghosh, Archisman Som, Pritha Roy, Rima Kundu, Akash Sarkar, Arnab Biswas, Karnelia Paul, Sujit Basak, Krishnendu Manna, Chinmay Saha, Satinath Mukhopadhyay, Nitai P Bhattacharyya, Rajat K De","doi":"10.1002/widm.1462","DOIUrl":"https://doi.org/10.1002/widm.1462","url":null,"abstract":"The world is now experiencing a major health calamity due to the coronavirus disease (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus clade 2. The foremost challenge facing the scientific community is to explore the growth and transmission capability of the virus. Use of artificial intelligence (AI), such as deep learning, in (i) rapid disease detection from x-ray or computed tomography (CT) or high-resolution CT (HRCT) images, (ii) accurate prediction of the epidemic patterns and their saturation throughout the globe, (iii) forecasting the disease and psychological impact on the population from social networking data, and (iv) prediction of drug-protein interactions for repurposing the drugs, has attracted much attention. In the present study, we describe the role of various AI-based technologies for rapid and efficient detection from CT images complementing quantitative real-time polymerase chain reaction and immunodiagnostic assays. AI-based technologies to anticipate the current pandemic pattern, prevent the spread of disease, and detect face masks are also discussed. We inspect how the virus transmits depending on different factors. We investigate the deep learning technique to assess the affinity of the most probable drugs to treat COVID-19. 
This article is categorized under: Application Areas > Health Care; Algorithmic Development > Biological Data Mining; Technologies > Machine Learning.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"12 4","pages":"e1462"},"PeriodicalIF":7.8,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9350133/pdf/WIDM-12-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10603683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy protection in smart meters using homomorphic encryption: An overview","authors":"Zita Abreu, Lucas Pereira","doi":"10.1002/widm.1469","DOIUrl":"https://doi.org/10.1002/widm.1469","url":null,"abstract":"This article presents an overview of the literature on privacy protection in smart meters with a particular focus on homomorphic encryption (HE). First, we introduce the concept of smart meters, the context in which they are deployed, and the main concerns and objections inherent to their use. Next, an overview of privacy protection is presented, emphasizing the need to safeguard the privacy of smart‐meter users by identifying, describing, and comparing the main approaches that seek to address this problem. Then, two privacy protection approaches based on HE are presented in more detail, along with two possible application scenarios. Finally, the article concludes with a brief overview of the unsolved challenges in HE and the most promising future research directions.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"115 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89177868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data mining in predictive maintenance systems: A taxonomy and systematic review","authors":"Aurora Esteban, A. Zafra, Sebastián Ventura","doi":"10.1002/widm.1471","DOIUrl":"https://doi.org/10.1002/widm.1471","url":null,"abstract":"Predictive maintenance is a field of study whose main objective is to optimize the timing and type of maintenance to perform on various industrial systems. This aim involves maximizing the availability time of the monitored system and minimizing the number of resources used in maintenance. Predictive maintenance is currently undergoing a revolution thanks to advances in industrial systems monitoring within the Industry 4.0 paradigm. Likewise, advances in artificial intelligence and data mining allow the processing of a great amount of data to provide more accurate and advanced predictive models. In this context, many actors have become interested in predictive maintenance research, making it one of the most active areas of research in computing, where academia and industry converge. The objective of this paper is to conduct a systematic literature review that provides an overview of the current state of research concerning predictive maintenance from a data mining perspective. The review presents a first taxonomy covering the different phases of any data mining process used to solve a predictive maintenance problem, relating predictive maintenance tasks to the main data mining tasks that address them. 
Finally, the paper presents significant challenges and future research directions in terms of the potential of data mining applied to predictive maintenance.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"12 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88443029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Taxonomy of machine learning paradigms: A data‐centric perspective","authors":"F. Emmert-Streib, M. Dehmer","doi":"10.1002/widm.1470","DOIUrl":"https://doi.org/10.1002/widm.1470","url":null,"abstract":"Machine learning is a field composed of various pillars. Traditionally, supervised learning (SL), unsupervised learning (UL), and reinforcement learning (RL) are the dominating learning paradigms that have inspired the field since the 1950s. Based on these, thousands of different methods have been developed during the last seven decades and are used in nearly all application domains. However, other learning paradigms have recently been gaining momentum, and they complement and extend the above learning paradigms significantly. These are multi‐label learning (MLL), semi‐supervised learning (SSL), one‐class classification (OCC), positive‐unlabeled learning (PUL), transfer learning (TL), multi‐task learning (MTL), and one‐shot learning (OSL). The purpose of this article is a systematic discussion of these modern learning paradigms and their connection to the traditional ones. We discuss each of the learning paradigms formally by defining key constituents and paying particular attention to the data requirements, which allows an easy connection to applications. That is, we adopt a data‐driven perspective. This perspective will also allow a systematic identification of relations between the individual learning paradigms in the form of a learning‐paradigm graph (LP‐graph). 
Overall, the LP‐graph establishes a taxonomy among 10 different learning paradigms.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"1 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85662535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}