{"title":"G-Mix:一个面向平坦最小值的广义混合学习框架","authors":"Xingyu Li;Bo Tang","doi":"10.1109/TAI.2025.3529816","DOIUrl":null,"url":null,"abstract":"Deep neural networks (DNNs) have demonstrated promising results in various complex tasks. However, such DNN models face challenges related to over-parameterization, particularly in scenarios where training data are scarce. In response to these challenges and to improve the generalization capabilities of DNNs, the Mixup technique has emerged, which effectively addresses the limitations posed by over-parameterization. Nevertheless, it still produces suboptimal outcomes. Inspired by the successful sharpness-aware minimization (SAM) method, which establishes a connection between the sharpness of the training loss landscape and model generalization, we propose a new learning framework called Generalized-Mixup, which combines the strengths of Mixup and SAM for training DNN models. The theoretical analysis provided demonstrates how the developed G-Mix framework enhances generalization. Additionally, to further optimize DNN performance with the G-Mix framework, we introduce two novel algorithms: Binary G-Mix (BG-Mix) and Decomposed G-Mix (DG-Mix). These algorithms partition the training data into two subsets based on the sharpness-sensitivity of each example to address the issue of “manifold intrusion” in Mixup. Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models, achieving state-of-the-art performance.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 7","pages":"1870-1883"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"G-Mix: A Generalized Mixup Learning Framework Toward Flat Minima\",\"authors\":\"Xingyu Li;Bo Tang\",\"doi\":\"10.1109/TAI.2025.3529816\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks (DNNs) have demonstrated promising results in various complex tasks. However, such DNN models face challenges related to over-parameterization, particularly in scenarios where training data are scarce. In response to these challenges and to improve the generalization capabilities of DNNs, the Mixup technique has emerged, which effectively addresses the limitations posed by over-parameterization. Nevertheless, it still produces suboptimal outcomes. Inspired by the successful sharpness-aware minimization (SAM) method, which establishes a connection between the sharpness of the training loss landscape and model generalization, we propose a new learning framework called Generalized-Mixup, which combines the strengths of Mixup and SAM for training DNN models. The theoretical analysis provided demonstrates how the developed G-Mix framework enhances generalization. Additionally, to further optimize DNN performance with the G-Mix framework, we introduce two novel algorithms: Binary G-Mix (BG-Mix) and Decomposed G-Mix (DG-Mix). These algorithms partition the training data into two subsets based on the sharpness-sensitivity of each example to address the issue of “manifold intrusion” in Mixup. 
Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models, achieving state-of-the-art performance.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 7\",\"pages\":\"1870-1883\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10839570/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10839570/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
G-Mix: A Generalized Mixup Learning Framework Toward Flat Minima
Deep neural networks (DNNs) have demonstrated promising results in various complex tasks. However, such models are prone to overfitting due to over-parameterization, particularly when training data are scarce. To address this challenge and improve the generalization of DNNs, the Mixup technique was introduced; it mitigates the limitations posed by over-parameterization but still yields suboptimal results. Inspired by the sharpness-aware minimization (SAM) method, which links the sharpness of the training loss landscape to model generalization, we propose a new learning framework called Generalized Mixup (G-Mix) that combines the strengths of Mixup and SAM for training DNN models. Our theoretical analysis demonstrates how the G-Mix framework improves generalization. To further optimize DNN performance within the G-Mix framework, we introduce two novel algorithms: Binary G-Mix (BG-Mix) and Decomposed G-Mix (DG-Mix). These algorithms partition the training data into two subsets based on the sharpness-sensitivity of each example, addressing the "manifold intrusion" issue in Mixup. Both theoretical analysis and experimental results show that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models, achieving state-of-the-art performance.
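
The core idea described above, combining Mixup's example interpolation with SAM's two-step, sharpness-aware weight update, can be illustrated with a minimal PyTorch-style training step. This is an illustrative sketch only, not the authors' G-Mix, BG-Mix, or DG-Mix implementation; the function names, the tiny model, and the hyperparameters (the Beta coefficient alpha and the perturbation radius rho) are assumptions chosen for the example.

# Minimal sketch (assumed, not the paper's released code): one training step
# that applies a SAM-style two-step update to a Mixup-interpolated batch.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mixup_batch(x, y, alpha=0.2):
    # Interpolate the batch with a shuffled copy of itself using lam ~ Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    return x_mix, y, y[perm], lam


def mixup_loss(logits, y_a, y_b, lam):
    # Convex combination of the losses on the two original label sets.
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)


def sam_mixup_step(model, optimizer, x, y, rho=0.05, alpha=0.2):
    # (1) gradient of the Mixup loss at the current weights,
    # (2) ascend to the worst-case perturbation in an L2 ball of radius rho,
    # (3) gradient at the perturbed weights, (4) restore weights and step.
    x_mix, y_a, y_b, lam = mixup_batch(x, y, alpha)

    loss = mixup_loss(model(x_mix), y_a, y_b, lam)
    loss.backward()

    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm  # SAM ascent step along the normalized gradient
            p.add_(e)
            eps.append(e)

    optimizer.zero_grad()
    loss_perturbed = mixup_loss(model(x_mix), y_a, y_b, lam)
    loss_perturbed.backward()

    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)  # restore original weights before applying the update
    optimizer.step()
    optimizer.zero_grad()
    return loss_perturbed.item()


if __name__ == "__main__":
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    x = torch.randn(32, 1, 28, 28)
    y = torch.randint(0, 10, (32,))
    print(sam_mixup_step(model, opt, x, y))

In the paper's BG-Mix and DG-Mix variants, the batch would additionally be partitioned into two subsets according to each example's sharpness-sensitivity; that partitioning is not reproduced in this sketch.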