ADEPT: An advanced data exploration and processing tool for clinical data insights
Lingyu Shao, Jiarui Wang, Lin Li, Yujie Liu, Zhaoqing Liu, Boming Song, Shuyan Li
Computer Methods and Programs in Biomedicine, Vol. 268, Article 108860 (published 2025-05-17). DOI: 10.1016/j.cmpb.2025.108860
Citations: 0
Abstract
Background and objective
The rapid growth of clinical data creates challenges in analysis and interpretation for medical professionals. To address these issues, we developed the Advanced Data Exploration and Processing Tool (ADEPT), integrating data preprocessing, modeling, visualization, and statistical reporting to resolve common problems like inaccurate terminology, outliers, and missing values.
Methods
ADEPT incorporates advanced preprocessing, including standardizing numerical values, detecting outliers via Isolation Forest and DBSCAN, and filling missing data with KNN and MissForest. Tokenized text features are processed through keyword-based classification and K-means clustering. Five machine learning models—Gradient Boosting Machine, Random Forest, Extreme Gradient Boosting, Logistic Regression, and Support Vector Machine—are combined with a dynamic voting mechanism. Performance was assessed using precision, sensitivity, and specificity.
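The numeric preprocessing steps named above (standardization, then KNN imputation) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not ADEPT's implementation. It assumes a small table given as lists with `None` marking missing values, uses population z-scores, and fills each gap with the mean of the k nearest rows under a simplified Euclidean distance over co-observed columns. The function names `standardize` and `knn_impute` are hypothetical, and MissForest is not reproduced. In practice, scikit-learn's `StandardScaler` and `KNNImputer` cover these two steps.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def standardize(rows: List[Row]) -> List[Row]:
    """Z-score each column using only its observed (non-None) entries."""
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if r[j] is not None]
        mean = sum(vals) / len(vals)
        # Population standard deviation; fall back to 1.0 for constant columns.
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        for r in out:
            if r[j] is not None:
                r[j] = (r[j] - mean) / std
    return out


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing cell with the mean of that column over the k nearest rows.

    Distance is a root-mean-square difference over the columns observed in
    both rows (a simplified nan-Euclidean distance). Only rows that have the
    target column observed are eligible donors.
    """
    def dist(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"column {j} has no observed values to impute from")
            filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled
```

For example, `knn_impute([[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]], k=2)` fills the gap with 15.0, the mean of the two rows whose first column is closest to 3.0.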
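Outlier detection with DBSCAN treats points that belong to no dense cluster (DBSCAN's "noise") as outliers. The sketch below assumes already-standardized numeric rows and uses illustrative `eps` and `min_samples` values; the abstract does not give ADEPT's actual parameters. The Isolation Forest detector it also uses is omitted here. The usual library route is scikit-learn's `DBSCAN`, which labels noise points `-1`, and `IsolationForest`. The function name `dbscan_outliers` is hypothetical.

```python
import math
from typing import List, Sequence


def dbscan_outliers(points: List[Sequence[float]], eps: float, min_samples: int) -> List[bool]:
    """Return True for each point DBSCAN would label as noise (treated as an outlier)."""
    n = len(points)
    # eps-neighbourhoods; each point counts as its own neighbour, as in standard DBSCAN.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_samples for nb in neighbors]
    in_cluster = [False] * n
    for i in range(n):
        if not core[i] or in_cluster[i]:
            continue
        # Grow a cluster outward from this core point.
        stack = [i]
        in_cluster[i] = True
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if not in_cluster[q]:
                    in_cluster[q] = True
                    stack.append(q)
    # Anything never reached from a core point is noise.
    return [not c for c in in_cluster]
```

Only the noise/non-noise split matters for outlier flagging, so the sketch does not track which cluster each point joins.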
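Keyword-based classification and K-means clustering of tokenized text can be illustrated with the standard library alone. Every detail here is an assumption for illustration, not taken from the paper. The `KEYWORDS` dictionary and its categories are hypothetical. Tokens are lowercase alphabetic runs. Documents are clustered as raw bag-of-words count vectors using Lloyd's algorithm from a seeded random start.

```python
import math
import random
import re
from typing import Dict, List

# Hypothetical keyword map; the paper's actual categories and keywords are not given.
KEYWORDS: Dict[str, List[str]] = {
    "cardiac": ["chest", "palpitation", "arrhythmia"],
    "respiratory": ["cough", "dyspnea", "wheeze"],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_class(text: str) -> str:
    """Pick the category with the most keyword hits; 'other' if nothing matches."""
    tokens = tokenize(text)
    scores = {cat: sum(tokens.count(w) for w in words) for cat, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def bag_of_words(texts: List[str]) -> List[List[float]]:
    """Turn each text into a vector of token counts over a shared, sorted vocabulary."""
    vocab = sorted({t for text in texts for t in tokenize(text)})
    return [[float(tokenize(text).count(w)) for w in vocab] for text in texts]


def kmeans(vectors: List[List[float]], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's K-means; returns a cluster index for each vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In this sketch the rule-based labels and the unsupervised clusters are independent outputs. How ADEPT combines the two is not described in the abstract.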
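The abstract does not define the dynamic voting mechanism, so the following is only one plausible reading, not the authors' method. It is a weighted soft vote in which each model's weight is recomputed from its accuracy above chance on a validation set, so weak models fade out of the ensemble. The model keys (for example `gbm`, `rf`, `xgb`, `lr`, `svm`), the 0.5 threshold, and the weighting rule are all assumptions. A fixed-weight counterpart exists in scikit-learn as `VotingClassifier(estimators=..., voting="soft", weights=...)`.

```python
from typing import Dict, List


def accuracy(probs: List[float], labels: List[int], threshold: float = 0.5) -> float:
    """Fraction of samples where thresholded probability matches the 0/1 label."""
    return sum((p >= threshold) == bool(y) for p, y in zip(probs, labels)) / len(labels)


def dynamic_weights(val_probs: Dict[str, List[float]], val_labels: List[int]) -> Dict[str, float]:
    """Weight each model by its validation accuracy above chance, normalised to sum to 1.

    Models at or below chance (accuracy <= 0.5) get zero weight.
    """
    raw = {name: max(accuracy(p, val_labels) - 0.5, 0.0) for name, p in val_probs.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 1 / len(raw) for name in raw}  # fall back to equal weights
    return {name: w / total for name, w in raw.items()}


def soft_vote(test_probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of each model's positive-class probability, per sample."""
    n = len(next(iter(test_probs.values())))
    return [sum(weights[m] * test_probs[m][i] for m in test_probs) for i in range(n)]
```

Recomputing the weights whenever new validation data arrive is what makes the vote "dynamic" in this sketch.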
Results
ADEPT demonstrated substantial performance improvements, with the Area Under the Curve (AUC) increasing by over 14%. Key results include enhanced precision, sensitivity, and specificity, validating the tool's ability to extract valuable insights from complex datasets.
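The metrics reported above follow directly from the binary confusion matrix. AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half. The following is a minimal sketch assuming 0/1 labels; the function names are hypothetical. In practice, scikit-learn's `precision_score`, `recall_score`, and `roc_auc_score` are the standard tools.

```python
from typing import List, Tuple


def confusion_metrics(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (precision, sensitivity, specificity) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall / true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    return precision, sensitivity, specificity


def auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` returns 0.75, because three of the four positive/negative pairs are ranked correctly. The pairwise loop is quadratic, which is fine for illustration but not for large cohorts.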
Conclusions
ADEPT offers a comprehensive solution for automated clinical data analysis, combining rigorous preprocessing with advanced modeling. Its dynamic voting mechanism and integrated tools enhance accuracy and interpretability, addressing critical challenges in clinical data management and decision-making.
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.