LLMOverTab: Tabular data augmentation with language model-driven oversampling

IF 7.5 | Zone 1, Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2024-11-28 DOI:10.1016/j.eswa.2024.125852

Tokimasa Isomura, Ryotaro Shimizu, Goto Masayuki

Volume 264, Article 125852

https://www.sciencedirect.com/science/article/pii/S0957417424027192

Citations: 0

Abstract

In recent years, Large Language Models (LLMs) have seen significant advancements, attracting attention for their applications in various fields. These models have shown promising results in handling tabular data, especially when datasets are limited, by leveraging pre-trained knowledge. However, their effectiveness in addressing imbalanced tabular data is less explored. To bridge this gap, our study introduces LLMOverTab, a novel approach that uses LLMs for oversampling imbalanced tabular data. We conducted comprehensive experiments on diverse tabular datasets to assess the effectiveness of LLMOverTab, demonstrating its potential to improve the handling of imbalanced data. The study also explores the application of LLMOverTab in zero-shot and few-shot learning contexts, providing insights into its adaptability. Additionally, we analyze the oversampled data and reflect on the quality of the generated samples. Our research not only showcases the utility of LLMOverTab in managing imbalanced tabular data but also opens new avenues for applying language models to various tabular data tasks. This study adds to the growing interest in applying LLMs across task domains and provides new perspectives on their use with structured tabular data.
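
The abstract does not describe how LLMOverTab presents rows to the model, so the following is only a generic sketch of one common way to prepare few-shot oversampling prompts from tabular data, not the authors' method. The "col is value" serialization and the function names (`serialize_row`, `parse_row`, `minority_label`, `build_prompt`) are all assumptions made for illustration. No model is called; the sketch covers only prompt construction and parsing of a generated line.

```python
from collections import Counter


def serialize_row(row):
    """Turn one table row (a dict) into a 'col is value' sentence."""
    return ", ".join(f"{col} is {val}" for col, val in row.items())


def parse_row(text):
    """Inverse of serialize_row; every value comes back as a string.

    Assumes values contain no ', ' separator (a limitation of this sketch).
    """
    row = {}
    for part in text.split(", "):
        col, sep, val = part.partition(" is ")
        if not sep:
            raise ValueError(f"cannot parse field: {part!r}")
        row[col.strip()] = val.strip()
    return row


def minority_label(rows, label_col):
    """Return the least frequent class label in the table."""
    counts = Counter(r[label_col] for r in rows)
    return min(counts, key=counts.get)


def build_prompt(rows, label_col, n_shots=2):
    """Build a few-shot prompt from up to n_shots minority-class rows."""
    target = minority_label(rows, label_col)
    shots = [r for r in rows if r[label_col] == target][:n_shots]
    header = (
        f"Each line describes one table row with {label_col} = {target}. "
        "Write one more row in the same format."
    )
    return "\n".join([header] + [serialize_row(r) for r in shots])


# Hypothetical toy table: 'yes' is the minority class.
rows = [
    {"age": 30, "income": 50, "label": "no"},
    {"age": 41, "income": 72, "label": "no"},
    {"age": 25, "income": 90, "label": "yes"},
]
prompt = build_prompt(rows, "label")
```

A line returned by a language model for this prompt could be passed to `parse_row` and appended to the training set after validating column names and value ranges. Setting `n_shots=0` leaves only the instruction header, which corresponds loosely to a zero-shot setting.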

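Source journal

Expert Systems with Applications (Engineering & Technology: Electrical and Electronic Engineering)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months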

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.