Tyler N. Meyer, Olga Andreeva, Roger D. Weiss, Wei Ding, Iris Shen, Changning Wang, Ping Chen, Tewodros Mulugeta Dagnew
Neuroscience Informatics, volume 5, issue 4, Article 100225. Published 2025-08-14. DOI: 10.1016/j.neuri.2025.100225. https://www.sciencedirect.com/science/article/pii/S2772528625000408
Enhancing neuromolecular imaging classification in low-data regimes with generative machine learning: A case study in HDAC PET/MR imaging of alcohol use disorder
Introduction
Positron Emission Tomography (PET) is a vital modality for investigating brain-related disorders. However, data scarcity, especially for novel molecular targets such as neuroepigenetic enzymes, combined with difficult-to-recruit patient populations, limits the development of machine learning (ML) models. Our primary objective is to enhance single-subject classification of neuromolecular imaging data and to facilitate biomarker discovery. We demonstrate our approach using histone deacetylase (HDAC) PET/MR imaging in Alcohol Use Disorder (AUD).
Methods
We propose the Catalysis Training pipeline, a framework that augments real imaging data with high-quality synthetic data generated by a Wasserstein Conditional Generative Adversarial Network (WCGAN). Using [11C]Martinostat PET/MR imaging, we extracted 1-D tabular features of standardized uptake value ratio (SUVR), representing HDAC enzyme expression density across eight cingulate subregions. These features were used to train and test ML classifiers, including Support Vector Machine (SVM), XGBoost, and Random Forest, under leave-one-out cross-validation.
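One detail matters when synthetic data are combined with leave-one-out cross-validation: if the generator or the classifier sees the held-out subject, that subject leaks into its own prediction. The abstract does not say how this was handled, so the following is a minimal Python sketch under stated assumptions, not the authors' pipeline. A class-wise Gaussian jitter (synth_samples) stands in for the WCGAN, a nearest-centroid rule stands in for SVM, XGBoost, and Random Forest, and every name and number below is hypothetical.

```python
import random
from statistics import mean


def synth_samples(rows, n_new, noise, rng):
    """Placeholder generator (NOT a WCGAN): jitter real rows of one class
    with Gaussian noise to produce n_new synthetic rows."""
    out = []
    for _ in range(n_new):
        base = rng.choice(rows)
        out.append([v + rng.gauss(0.0, noise) for v in base])
    return out


def centroid(rows):
    """Feature-wise mean of a list of equal-length rows."""
    return [mean(col) for col in zip(*rows)]


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def loocv_accuracy(X, y, n_synth=0, noise=0.05, seed=0):
    """Leave-one-out accuracy; synthetic rows are built from the training
    rows of each fold only, so the held-out subject is never used."""
    rng = random.Random(seed)
    correct = 0
    for i in range(len(X)):
        # Group the training rows (everything except subject i) by class.
        by_class = {}
        for j, (x, label) in enumerate(zip(X, y)):
            if j != i:
                by_class.setdefault(label, []).append(x)
        # Augment each class with synthetic rows, then fit the stand-in model.
        centroids = {}
        for label, rows in by_class.items():
            augmented = rows + synth_samples(rows, n_synth, noise, rng)
            centroids[label] = centroid(augmented)
        if predict(centroids, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def make_toy(n_per_class=6, n_features=8, seed=1):
    """Toy data with eight features per row; unrelated to the study's data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 1.0), (1, 1.2)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.05) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy()
    print("real only:       ", loocv_accuracy(X, y, n_synth=0))
    print("real + synthetic:", loocv_accuracy(X, y, n_synth=20))
```

The toy numbers are not meant to reproduce the reported accuracies. The point of the sketch is the fold structure: augmentation happens inside the loop, after the held-out subject has been removed.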
Results
Integrating synthetic data into the training process improved classification accuracy significantly: +26 percentage points for XGBoost and Random Forest (from 59% to 85%) and +18 percentage points for SVM (from 70% to 88%). Synthetic samples improved model generalizability. Key hemispheric and subregional cingulate HDAC patterns were also identified as potential biomarkers.
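The gains quoted above are differences between two accuracies, so they are percentage points rather than relative improvements. A short Python sketch of that arithmetic follows. It uses only the figures reported in the Results, and the helper name accuracy_gains is hypothetical.

```python
def accuracy_gains(before_pct, after_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = after_pct - before_pct
    relative = 100.0 * points / before_pct
    return points, relative


# Accuracies (percent) as reported: without and with synthetic training data.
reported = {
    "XGBoost / Random Forest": (59, 85),
    "SVM": (70, 88),
}

for name, (before, after) in reported.items():
    points, relative = accuracy_gains(before, after)
    print(f"{name}: +{points} points, {relative:.1f}% relative")
# prints:
# XGBoost / Random Forest: +26 points, 44.1% relative
# SVM: +18 points, 25.7% relative
```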
Conclusion
Our results demonstrate that generative AI can help overcome data scarcity in low-data-regime neuroimaging applications. Catalysis Training provides a scalable strategy to enhance ML-driven biomarker discovery and disease classification, especially for rare or difficult-to-study disorders such as AUD. Clinically, cingulate HDAC expression measured by [11C]Martinostat PET/MR shows promise as an objective biomarker for AUD, complementing DSM-based diagnosis and informing novel treatment strategies.
Neuroscience Informatics subject areas: Surgery, Radiology and Imaging, Information Systems, Neurology, Artificial Intelligence, Computer Science Applications, Signal Processing, Critical Care and Intensive Care Medicine, Health Informatics, Clinical Neurology, Pathology and Medical Technology