Training Green AI Models Using Elite Samples
Mohammed Alswaitti; Roberto Verdecchia; Grégoire Danoy; Pascal Bouvry; Johnatan E. Pecero
IEEE Transactions on Sustainable Computing, vol. 10, no. 5, pp. 858-872. Published 2025-02-21. DOI: 10.1109/TSUSC.2025.3544430
https://ieeexplore.ieee.org/document/10897883/
The substantial growth in AI model training has considerable environmental implications and calls for energy-efficient, sustainable AI practices. On the one hand, data-centric approaches show great potential for training energy-efficient AI models. On the other hand, instance selection methods have shown that AI models can be trained on minimised training sets with negligible performance degradation. Despite growing interest in both topics, the impact of data-centric training set selection on energy efficiency remains unexplored to date. This paper presents an evolutionary-based sampling framework aimed at three goals. The first is (i) identifying elite training samples tailored to each dataset-model pair. The second is (ii) comparing the resulting model performance and energy efficiency against typical model training practice. The third is (iii) investigating the feasibility of the framework for fostering sustainable model training practices. To evaluate the framework, we conducted an empirical experiment with 8 commonly used AI classification models and 25 publicly available datasets. The results show that when training on only 10% of the samples (the elite samples), model performance can improve by 50% and energy consumption can drop by 98% compared to common training practice. In essence, this study establishes a new benchmark for AI researchers and practitioners interested in improving the environmental sustainability of AI model training via data-centric approaches.
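The abstract does not specify the framework's encoding, genetic operators, or fitness function. The following is therefore only a minimal sketch of what evolutionary instance selection can look like in general. It evolves fixed-size subsets holding 10% of the training indices using a simple genetic algorithm. Everything here is an assumption and not the authors' method:

* The names `evolve_subset` and `nn_accuracy` are hypothetical.
* A 1-nearest-neighbour classifier stands in for the 8 models studied.
* Fitness is plain validation accuracy.
* Energy is not measured at all.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical illustration of evolutionary instance selection;
# NOT the paper's implementation (operators and fitness are assumed).


def nn_accuracy(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[int],
    subset: Sequence[int],
    val_X: Sequence[Sequence[float]],
    val_y: Sequence[int],
) -> float:
    """Validation accuracy of a 1-NN classifier that only sees `subset` rows."""
    correct = 0
    for x, y in zip(val_X, val_y):
        # Nearest selected training sample by squared Euclidean distance.
        nearest = min(
            subset,
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
        )
        correct += int(train_y[nearest] == y)
    return correct / len(val_y)


def evolve_subset(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[int],
    val_X: Sequence[Sequence[float]],
    val_y: Sequence[int],
    fraction: float = 0.1,
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.3,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Genetic search over fixed-size subsets of training indices (assumed encoding)."""
    rng = random.Random(seed)
    n = len(train_X)
    k = max(1, round(fraction * n))  # subset size, e.g. 10% of the training set

    def fitness(ind: List[int]) -> float:
        return nn_accuracy(train_X, train_y, ind, val_X, val_y)

    def crossover(a: List[int], b: List[int]) -> List[int]:
        # Child draws k distinct indices from the union of both parents.
        return rng.sample(sorted(set(a) | set(b)), k)

    def mutate(ind: List[int]) -> List[int]:
        # With some probability, swap one selected index for an unselected one.
        ind = list(ind)
        if rng.random() < mutation_rate:
            chosen = set(ind)
            outside = [i for i in range(n) if i not in chosen]
            if outside:
                ind[rng.randrange(k)] = rng.choice(outside)
        return ind

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    n_elite = max(2, pop_size // 5)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:n_elite]  # elitism: the best subsets survive unchanged
        children = [list(e) for e in elite]
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b)))
        population = children

    best = max(population, key=fitness)
    return sorted(best), fitness(best)
```

Elitism carries the best subsets into every new generation, so the best validation accuracy never decreases from one generation to the next. The paper's actual operators, population size, model-specific fitness, and stopping criterion may all differ from these choices.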