Akash Ghosh, Raghav Jain, Anubhav Jhangra, Sriparna Saha, Adam Jatowt
{"title":"A Survey on Medical Document Summarization: From Machine Learning Techniques to Large Language Models","authors":"Akash Ghosh, Raghav Jain, Anubhav Jhangra, Sriparna Saha, Adam Jatowt","doi":"10.1002/widm.70045","DOIUrl":"https://doi.org/10.1002/widm.70045","url":null,"abstract":"The widespread adoption of the Internet has transformed healthcare by enabling the digital storage, sharing, and management of medical documents. This shift has improved information access, enhanced patient care, and opened new avenues for research and innovation. As the volume of medical data available to clinicians and patients continues to grow, the need for effective summarization methods becomes increasingly critical. Recent breakthroughs in deep learning—particularly the emergence of Large Language Models (LLMs)—have further accelerated progress in this area. This paper provides a comprehensive survey of current techniques and emerging trends in medical document summarization.","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145247627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edward Hengzhou Yan, Feng Guo, Baolong Zhang, Muhammad Rehan, Delei Wang, Zhicheng Xu, Chi Ho Wong, Long Teng, Wai Sze Yip, Suet To
{"title":"Exploring the Application of the Internet of Things in Precision Machining by Comparative Text Mining","authors":"Edward Hengzhou Yan, Feng Guo, Baolong Zhang, Muhammad Rehan, Delei Wang, Zhicheng Xu, Chi Ho Wong, Long Teng, Wai Sze Yip, Suet To","doi":"10.1002/widm.70042","DOIUrl":"https://doi.org/10.1002/widm.70042","url":null,"abstract":"Precision machining, manufacturing components with superior surface quality and dimensional accuracy, increasingly leverages Internet of Things (IoT) technologies. This study employs a novel comparative text mining approach by systematically integrating tree maps, word clouds, keyword network analysis, and Pearson correlation to identify critical linkages between IoT and precision machining. By analyzing a scientific research database (2019–2023), this study highlights IoT's core competencies in enhancing precision machining, including real‐time monitoring, predictive maintenance, and data‐driven optimization. Furthermore, this study proposes actionable strategies, including neural network‐based cyber production systems, blockchain‐integrated IIoT platforms, and machine learning‐driven predictive models, for precision machining. 
These recommendations empower academia and industry to harness IoT to improve product quality and reduce costs in precision machining. This article is categorized under: Algorithmic Development > Text Mining; Fundamental Concepts of Data and Knowledge > Knowledge Representation; Technologies > Data Preprocessing","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144995182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Review of Unlabeled and Imbalanced Data Challenges in Machine Learning: Strategies and Solutions","authors":"Neethu M S, Vinod Chandra S S","doi":"10.1002/widm.70043","DOIUrl":"https://doi.org/10.1002/widm.70043","url":null,"abstract":"Machine learning models often face significant challenges while dealing with imbalanced and unlabeled datasets. Addressing these issues is resource‐intensive, requiring comprehensive strategies to navigate their individual complexities and compounded effects. This article explores the dual challenges imposed by class imbalance and the absence of labeled data, along with their individual complexities and combined effects on the performance of the model. This study addresses approaches for handling the imbalance problem in datasets, such as data‐level, algorithm‐level, and deep learning methods. The survey also examines hybrid methodologies that integrate these strategies to tackle the compounded issues effectively. Emerging techniques like Bayesian graph‐based learning, uncertainty‐guided semi‐supervised learning, and self‐supervised approaches are also considered for their potential to address the scalability, noise filtering, and generalization challenges associated with imbalanced and unlabeled datasets. It identified persistent gaps, such as the lack of robust evaluation metrics and the underutilization of dynamic feature extraction techniques, suggesting solutions with advanced machine learning approaches. 
Additionally, the need for adaptive techniques, such as dynamic class weighting and data‐driven filtering mechanisms, is highlighted to address limitations and improve the scalability of machine learning models in real‐world applications. This article is categorized under: Technologies > Machine Learning; Technologies > Classification; Technologies > Artificial Intelligence","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144910693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hamed Zamanian, Ahmad Shalbaf, Maryam Parvizi, Roohallah Alizadehsani, Ru‐San Tan, U. Rajendra Acharya
{"title":"Automated Detection of Non‐Alcoholic Fatty Liver Disease Using Histopathological Images: A Systematic Review","authors":"Hamed Zamanian, Ahmad Shalbaf, Maryam Parvizi, Roohallah Alizadehsani, Ru‐San Tan, U. Rajendra Acharya","doi":"10.1002/widm.70044","DOIUrl":"https://doi.org/10.1002/widm.70044","url":null,"abstract":"The global rise in fatty liver diseases is alarming. Traditional diagnostic methods include ultrasound, CT scans, MRI, and liver biopsies, the latter being the gold standard for diagnosis and treatment. Recent advancements in artificial intelligence (AI) have enhanced liver biopsy accuracy, improving treatment outcomes. This study investigates how various AI techniques aid histopathologists, gastroenterologists, and liver specialists in diagnosing and assessing liver damage due to abnormal fat accumulation. We conducted a systematic review of AI applications in evaluating fatty liver diseases, particularly through histopathological image analysis. Our search encompassed five scientific databases: PubMed Central, ACM Digital Library, IEEE Xplore, Scopus, and Google Scholar. We focused on peer‐reviewed articles, conference papers, theses, and book chapters, adhering to specific terminology. The data synthesis followed the PRISMA guidelines, comparing literature based on four key indices and their annual distribution. We evaluated 37 studies utilizing histopathological imaging for the diagnosis of non‐alcoholic fatty liver disease and non‐alcoholic steatohepatitis, including related conditions, metabolic dysfunction‐associated fatty liver disease and metabolic dysfunction‐associated steatohepatitis. The review summarized the performance of various algorithms and explored the distribution of machine learning efforts. Given the complexity of histopathological images, AI algorithms can effectively stratify liver samples affected by fat. 
Our findings indicate that AI's diagnostic performance closely matches traditional pathological interpretations, offering reliable results for clinical applications. This article is categorized under: Application Areas > Health Care; Technologies > Machine Learning; Technologies > Artificial Intelligence","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144915643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Survey of Argument Mining in the Educational Domain: Techniques, Applications, and Future Directions","authors":"David Eduardo Pereira, Daniela Thuaslar Simão Gomes, Larissa Lucena Vasconcelos, Claudio Elizio Calazans Campelo","doi":"10.1002/widm.70041","DOIUrl":"https://doi.org/10.1002/widm.70041","url":null,"abstract":"The application of argument mining (AM) in the educational domain is a tool for identifying text structures that express an argument. AM can help evaluate the quality of students' assignments, generate insights into their perspectives, and understand their stance on certain topics. This article examines various aspects of AM in education, including techniques, models, approaches, data representation, language resources, and target artifacts. The findings suggest that AM can enhance learning and teaching processes. However, the study highlights gaps in the literature, particularly in exploring educational artifacts like debates and a lack of research on AM in languages other than English. 
This paper calls for further research to improve educational outcomes through AM in the educational domain. This article is categorized under: Application Areas > Education and Learning; Technologies > Artificial Intelligence; Technologies > Machine Learning","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144900517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Security in the Connected World","authors":"Durba Chatterjee, Shuvodip Maitra, Nimish Mishra, Shubhi Shukla, Debdeep Mukhopadhyay","doi":"10.1002/widm.70034","DOIUrl":"https://doi.org/10.1002/widm.70034","url":null,"abstract":"The rapid proliferation of the Internet of Things (IoT) has integrated billions of smart devices into our daily lives, generating and exchanging vast amounts of critical data. While this connectivity offers significant benefits, it also introduces numerous security vulnerabilities. Addressing these vulnerabilities requires a comprehensive approach to hardware security, one that evaluates the interplay of various attacks and countermeasures to protect these systems. This article provides an extensive overview of hardware security strategies and explores contemporary attacks threatening connected systems. We begin by presenting state‐of‐the‐art side‐channel and fault attacks targeting embedded systems, emphasizing the wide range of IoT targets such as smart home devices, medical implants, industrial control systems, and automotive components. Next, we examine hardware‐based security primitives such as physically unclonable functions (PUFs) and physically related functions (PReFs), which have emerged as promising solutions for establishing a hardware root‐of‐trust in lightweight, resource‐constrained devices. These primitives provide robust alternatives to secure storage of cryptographic keys, essential for protecting the diverse array of IoT devices. Further, we discuss trusted architectures, hardware Trojans, and physical assurance mechanisms, highlighting their roles in enhancing security across different IoT environments. We conclude by exploring the expanse of machine learning‐assisted attacks, which present new and intriguing challenges across all the aforementioned security domains. 
This article aims to offer valuable insights into the current challenges and future directions of research in hardware security, particularly pertaining to the varied and expanding landscape of IoT devices. This article is categorized under: Technologies > Internet of Things; Technologies > Machine Learning; Commercial, Legal, and Ethical Issues > Security and Privacy","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"179 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Evolution of Feature Extraction Methods in Brain–Computer Interfaces (BCIs): A Systematic Review of Research Progress and Future Trends","authors":"Shweta Thakur, Samriti Thakur, Aryan Rana, Pankaj Kumar, Kranti Kumar, Chien‐Ming Chen","doi":"10.1002/widm.70040","DOIUrl":"https://doi.org/10.1002/widm.70040","url":null,"abstract":"Brain–computer interfaces (BCIs) have emerged as transformative tools, enabling direct communication between the brain and external devices, particularly for individuals with neuromuscular disabilities. This paper provides a comprehensive analysis of feature extraction (FE) methods across all major signal processing domains and various types of BCIs, addressing a significant gap in existing reviews and surveys that often focus exclusively on EEG‐based systems. Also, a detailed comparative analysis of FE techniques, highlighting their formulas, advantages, limitations, and practical applications, is provided. The study not only reviews state‐of‐the‐art methods but also evaluates recent research, identifying trends and gaps in the field. Key insights reveal a growing foundation for invasive BCI research, which, while currently limited, shows promise for future advancements. Based on this analysis, we identify and discuss open challenges such as inter‐subject variability, real‐time processing demands, integration of multiple modalities, and user training and adaptation. Additionally, we examine pressing concerns related to security, privacy, and the transferability of models. 
By addressing these challenges, this paper aims to guide the development of robust, efficient, and inclusive BCI systems, paving the way for cutting‐edge innovations and real‐world applications. This article is categorized under: Technologies > Machine Learning; Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"746 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144850845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A State‐Of‐The‐Art Survey of Remote Photoplethysmography for Contactless Health Parameters Sensing","authors":"Shadman Sakib, Zahid Hasan, Nirmalya Roy","doi":"10.1002/widm.70039","DOIUrl":"https://doi.org/10.1002/widm.70039","url":null,"abstract":"Remote photoplethysmography (rPPG) has emerged as a vital technology for remote healthcare, offering non‐invasive and accessible health monitoring through off‐the‐shelf standard video cameras. rPPG facilitates the assessment of key health indicators like heart rate (HR), respiratory rate (RR), and blood oxygen saturation (SpO₂) from video data, providing advantages in early disease diagnosis and routine health assessments. Recognizing its potential, researchers from multiple fields have substantially progressed rPPG by establishing a strong theoretical basis for signal acquisition and developing signal processing and data‐driven algorithms for rPPG extraction. While most rPPG reviews primarily focus on HR signal extraction methods, our research provides an overview of the potential scope of rPPG. We systematically organize research on rPPG signal acquisition and extraction techniques and provide a critical review of recent rPPG advancements in diverse health parameter estimation. Besides providing a thorough HR estimation review, we incorporate the extraction of derivative signals such as RR and SpO₂ from rPPG data, including their applications and limitations. We also highlight the adaptation of Machine Learning (ML), Deep Learning (DL), and Computer Vision (CV) techniques with rPPG technologies, and accumulate available critical rPPG resources like datasets, codes, and tutorials. Finally, we identify challenges and research gaps, such as motion artifacts, varying lighting conditions, and differences in skin tone. We aim to uplift advancements in rPPG systems by outlining future research directions. 
Our comprehensive review aims to support the development of robust and safe applications by advancing the field of contactless health parameter sensing. This article is categorized under: Application Areas > Health Care; Technologies > Machine Learning; Fundamental Concepts of Data and Knowledge > Human Centricity and User Interaction","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta‐Heuristic Optimization for the Multi‐Classification of Chronic Disease: A Review With Machine Learning Perspectives","authors":"Akansha Singh, Nupur Prakash, Anurag Jain","doi":"10.1002/widm.70030","DOIUrl":"https://doi.org/10.1002/widm.70030","url":null,"abstract":"Chronic diseases (CDs) present a global health challenge due to their complex, overlapping symptoms and the limitations of traditional diagnostic methods. Artificial intelligence (AI)‐based techniques, particularly Machine Learning (ML) and Meta‐Heuristic Optimization (MHO) algorithms, have emerged as powerful tools for addressing these challenges. This review examines ML and MHO‐based approaches for the multi‐classification of CDs, highlighting how MHO enhances ML frameworks by addressing key limitations such as class imbalance and suboptimal feature selection. Despite these advancements, MHO‐based methods face challenges, including computational complexity and algorithmic biases, which require further research. 
By critically analyzing existing studies and identifying gaps, this paper provides a foundation for developing more robust and efficient diagnostic models for CDs. This article is categorized under: Application Areas > Health Care; Technologies > Machine Learning; Technologies > Prediction","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"148 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144747364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Guide to Machine Learning Epistemic Ignorance, Hidden Paradoxes, and Other Tensions","authors":"M. Z. Naser","doi":"10.1002/widm.70038","DOIUrl":"https://doi.org/10.1002/widm.70038","url":null,"abstract":"Machine learning (ML) has rapidly scaled in capacity and complexity, yet blind spots persist beneath its high performance façade. In order to shed more light on this argument, this paper presents a curated catalogue of 175 unconventional concepts, each capturing a paradox, tension, or overlooked risk in modern ML practice. Through nine themes spanning data quality, model architecture and training, interpretability and explainability, fairness and bias, model behavior and limitations, evaluation and metrics, multimodal and system integration, practical and societal implications, and causal reasoning, we provide conceptual definitions, illustrative examples, and actionable mitigation strategies. This review equips practitioners and researchers with a structured taxonomy for diagnosing and preempting the brittle edges of modern ML systems and offers a paradox detection and remediation framework (PDRF) to anticipate limitations, design more thoughtful evaluation protocols, and develop ML systems that balance predictive power with epistemic transparency. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Computational Intelligence","PeriodicalId":501013,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144693602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}