General information metrics for improving AI model training efficiency

Impact factor: 13.9; Zone 2, Computer Science; JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jianfeng Xu, Congcong Liu, Xiaoying Tan, Xiaojie Zhu, Anpeng Wu, Huan Wan, Weijun Kong, Chun Li, Hu Xu, Kun Kuang, Fei Wu
Artificial Intelligence Review, vol. 58, no. 9. Published 2025-07-02. DOI: 10.1007/s10462-025-11281-z
Article page: https://link.springer.com/article/10.1007/s10462-025-11281-z
Citations: 0

Abstract

To address two factors that significantly drive up AI training costs, the growing size of model training data and the lack of a universal data selection methodology, this paper presents the General Information Metrics Evaluation (GIME) method. GIME uses general information metrics from Objective Information Theory (OIT), including volume, delay, scope, granularity, variety, duration, sampling rate, aggregation, coverage, distortion, and mismatch, to optimize the selection of training datasets. Experiments across diverse domains, including click-through rate (CTR) prediction, civil case prediction, and weather forecasting, show that GIME preserves model performance while substantially reducing training time and cost. In addition, applying GIME within the Judicial AI Program reduced total model training expenses by 39.56%, underscoring its potential to support efficient and sustainable AI development.
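The abstract names the OIT metrics but gives neither their formal definitions nor GIME's selection procedure. Purely to illustrate the general idea of metric-guided data selection, the sketch below scores candidate training subsets on two stand-in quantities, a volume proxy (share of the full dataset kept) and a coverage proxy (share of label classes present), combines them with weights, and picks the smallest subset whose score clears a threshold. The names `Candidate`, `volume_proxy`, `coverage_proxy` and `select_subset`, the metric formulas, the weights and the threshold are all hypothetical assumptions; this is not the paper's method.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for two of the OIT metrics named in the abstract.
# These formulas are illustrative assumptions, NOT the definitions used by GIME.


@dataclass
class Candidate:
    name: str
    labels: list  # label of each training example in this subset


def volume_proxy(c: Candidate, full_size: int) -> float:
    """Fraction of the full dataset's examples kept in this subset."""
    return len(c.labels) / full_size


def coverage_proxy(c: Candidate, all_classes: set) -> float:
    """Fraction of label classes that appear at least once in this subset."""
    return len(set(c.labels) & all_classes) / len(all_classes)


def select_subset(candidates, full_size, all_classes,
                  w_volume=0.3, w_coverage=0.7, threshold=0.6):
    """Return (name, size, score) of the smallest subset meeting the threshold.

    The weights and threshold are hypothetical tuning knobs.
    Returns None if no candidate qualifies.
    """
    eligible = []
    for c in candidates:
        score = (w_volume * volume_proxy(c, full_size)
                 + w_coverage * coverage_proxy(c, all_classes))
        if score >= threshold:
            eligible.append((len(c.labels), c.name, score))
    if not eligible:
        return None
    # Under this toy cost model, fewer examples means lower training cost.
    size, name, score = min(eligible)
    return name, size, round(score, 3)


if __name__ == "__main__":
    full = ["a"] * 50 + ["b"] * 30 + ["c"] * 20  # 100 examples, 3 classes
    classes = set(full)
    candidates = [
        Candidate("full", full),
        Candidate("half_stratified", ["a"] * 25 + ["b"] * 15 + ["c"] * 10),
        Candidate("tiny_a_only", ["a"] * 10),
    ]
    print(select_subset(candidates, len(full), classes))
```

With these toy inputs the stratified half-size subset is chosen: it keeps every class, so its score stays well above the threshold, while the single-class subset is rejected for poor coverage. A real selector would replace the two proxies with properly defined metrics and a validated cost model.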

Source journal
Artificial Intelligence Review (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months
About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.