Sheng Zhong, Jane Zhang, Jenny Jiao, Hongjian Zhu, Yunzhao Xing, Li Wang
Journal of Biopharmaceutical Statistics, pages 1-14. Published 2024-06-11. DOI: 10.1080/10543406.2024.2364722 (https://doi.org/10.1080/10543406.2024.2364722)
A machine learning case study to predict rare clinical event of interest: imbalanced data, interpretability, and practical considerations.
Accurate prediction of a rare and clinically important event following study treatment is crucial in drug development. For instance, the rarity of an adverse event is often commensurate with the seriousness of its medical consequences, and delayed detection of a rare adverse event can pose significant or even life-threatening health risks to patients. In this machine learning case study, we use an example originating from a real clinical trial setting to demonstrate how to define and solve the rare clinical event prediction problem using machine learning in the pharmaceutical industry. The unique contributions of this work include a proposed six-step investigation framework that facilitates communication with non-technical stakeholders, and the interpretation of model performance in terms of practical consequences for patient screening in a future clinical trial. In terms of machine learning methodology, to split the data into training and test sets, we adapt the rare-event stratified split approach (from scikit-learn) to additionally account for group splitting, so that multiple records of the same patient are kept together. To handle the imbalanced data that rare events produce in model training, a cost-sensitive learning approach is employed to give more weight to the minority class, and precision together with recall, rather than the raw accuracy rate, is used to capture prediction performance. Finally, we demonstrate how to apply state-of-the-art SHAP values to identify important risk factors and improve model interpretability.
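The splitting, weighting, and metric choices described in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the toy data (40 patients with 3 records each, 4 of whom experience the event), the 25% test fraction, and the "balanced" weighting formula n_samples / (n_classes * class_count) are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_group_split(records, test_fraction=0.25, seed=0):
    """Split (patient_id, label) records so that no patient appears in both
    sets and event-positive patients are spread proportionally across them."""
    # Assumption: a patient is "positive" if any of their records has the event.
    patient_label = defaultdict(int)
    for pid, label in records:
        patient_label[pid] = max(patient_label[pid], label)

    rng = random.Random(seed)
    test_patients = set()
    for stratum in (0, 1):
        pids = sorted(p for p, l in patient_label.items() if l == stratum)
        rng.shuffle(pids)
        n_test = max(1, round(len(pids) * test_fraction)) if pids else 0
        test_patients.update(pids[:n_test])

    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


def balanced_class_weights(labels):
    """Cost-sensitive weights: n_samples / (n_classes * class_count)."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    return {c: len(labels) / (len(counts) * k) for c, k in counts.items()}


def precision_recall_accuracy(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels, 1 = event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, correct / len(pairs)


# Hypothetical toy data: patients 0-3 have the event at their third visit.
records = [(pid, 1 if pid < 4 and visit == 2 else 0)
           for pid in range(40) for visit in range(3)]

train, test = stratified_group_split(records)
weights = balanced_class_weights([label for _, label in train])

# A trivial model that never predicts the event looks accurate but is useless.
y_test = [label for _, label in test]
precision, recall, accuracy = precision_recall_accuracy(y_test, [0] * len(y_test))
print(weights, precision, recall, round(accuracy, 3))
```

In this toy setup the never-positive model scores high accuracy while its recall is zero, which is the reason precision and recall are the more informative pair for rare events. For real work, scikit-learn's `StratifiedGroupKFold` and the `class_weight` option of its estimators cover the same ground.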
About the journal:
The Journal of Biopharmaceutical Statistics, a rapid publication journal, discusses quality applications of statistics in biopharmaceutical research and development. Now publishing six times per year, it includes expositions of statistical methodology with immediate applicability to biopharmaceutical research in the form of full-length and short manuscripts, review articles, selected/invited conference papers, short articles, and letters to the editor. Addressing timely and provocative topics important to the biostatistical profession, the journal covers:
Drug, device, and biological research and development;
Drug screening and drug design;
Assessment of pharmacological activity;
Pharmaceutical formulation and scale-up;
Preclinical safety assessment;
Bioavailability, bioequivalence, and pharmacokinetics;
Phase I, II, and III clinical development including complex innovative designs;
Premarket approval assessment of clinical safety;
Postmarketing surveillance;
Big data and artificial intelligence and applications.