Breaking data barriers in medical diagnosis with MSDGD framework based on Gaussian Diffusion Generation
Authors: Fengwei Jia, Fengyuan Jia, Huale Li, Shuhan Qi, Hongli Zhu
Journal: Information Processing & Management, vol. 62, no. 4, Article 104130 (Journal Article; impact factor 6.9; JCR Q1, Computer Science, Information Systems)
Published: 2025-03-22
DOI: 10.1016/j.ipm.2025.104130
URL: https://www.sciencedirect.com/science/article/pii/S030645732500072X
Domain knowledge gaps and the scarcity of data on rare medical conditions make it difficult to use Medical Service Data (MSD) effectively, which compromises research quality and leads to uninformed medical decision-making. To address these limitations, we propose MSDGD, a novel Gaussian diffusion-based MSD generation framework. Its embedding modules include the Parents-Children Numerical encoding scheme (ParentChild), which encodes column-pair interactions; the Random Column Rearrangement algorithm (RamCol), which uncovers hidden multi-column relationships; and a Spatial Dimensional Transformation strategy (DimTrans) for optimal multi-row feature extraction. We also develop a new UNet model (MSDUNet) and a column relationship predictive process to improve MSDGD optimization. We evaluate MSDGD on five metrics in three sub-experiments, using (1) a Simulation Table without Column Relationship, (2) a Simulation Table with Column Relationship, and (3) a complex relationship table derived from the real-world U.S. Chronic Disease Indicators datasets, which contains more than 40 columns and 8x4 categories. The experimental results indicate that MSDGD has high potential for improving medical service quality: it achieves a 97.97% reduction in error rate and supports research by generating dependable and diverse medical service data.
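For readers unfamiliar with Gaussian diffusion, the noising step that such generators learn to reverse can be sketched as follows. This is a minimal illustration of the standard forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and not the authors' implementation: the linear noise schedule, the function names, and the example row are all assumptions made for illustration, and ParentChild, RamCol, DimTrans, and MSDUNet are not modeled here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(row, t, alpha_bars, rng):
    """Noise one numerically encoded row to diffusion step t (0-indexed)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + noise_scale * rng.gauss(0.0, 1.0) for x in row]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Hypothetical record whose columns have already been mapped to numbers.
row = [0.25, -1.0, 0.5, 1.0]
slightly_noisy = q_sample(row, 0, alpha_bars, rng)      # early step: close to the data
almost_pure_noise = q_sample(row, 999, alpha_bars, rng)  # final step: close to N(0, 1)
```

A generative model is then trained to undo these steps, so that sampling starts from pure Gaussian noise and ends at a synthetic row. How MSDGD encodes columns and parameterizes the reverse process is described in the paper itself, not in this sketch.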
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.