Developing an Accounting Virtual Assistant Through Supervised Fine-Tuning (SFT) of a Small Language Model (SLM)

Impact Factor 3.7 | JCR Q1 | Economics, Econometrics and Finance

Intelligent Systems in Accounting, Finance and Management, Volume 32, Issue 3 | Pub Date: 2025-07-28 | DOI: 10.1002/isaf.70011

Mario Zupan


Citations: 0

Abstract



This paper explores the development of an in-house accounting bot: an artificial intelligence (AI) assistant capable of generating internally structured double-entry bookkeeping posting schemes. It describes the processes of curating a suitable dataset and of selecting and fine-tuning a seven-billion-parameter language model, categorized as a small language model (SLM). (SLMs typically refer to models with fewer than 10 billion parameters, whereas medium-sized models often have 14B parameters and large-scale models exceed 70B.) A human-evaluated benchmark is also presented to assess model performance. To achieve efficient supervised fine-tuning (SFT), low-rank adaptation (LoRA) was employed, significantly reducing memory requirements by training only a small set of parameters while maintaining model expressiveness. Backpropagation was further optimized using Unsloth, a high-performance training framework designed for efficient video memory usage and flash attention mechanisms, which accelerates adaptation and reduces memory overhead. The model whose layers were updated is QwenCoder2.5. It was selected on the presumption that it could learn to generate and examine the bookkeeping patterns produced by an accounting information system (AIS) over a 17-year history. This proof of concept aims to support researchers and practitioners exploring the integration of generative AI in accounting by providing insights into both the benefits and challenges of AI-driven automation in bookkeeping tasks. The study demonstrates how an SLM can be fine-tuned on a proprietary dataset of journal posting schemes to assist accountants, auditors, and financial analysts while also facilitating synthetic data generation. Challenges related to AI, data preprocessing, fine-tuning optimization, and evaluation methodology are introduced and examined.
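The memory saving that the abstract attributes to LoRA can be illustrated with simple arithmetic. The sketch below is not the paper's code. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in is adapted by two trainable factors of rank r. The class name `LoraShape` and the layer sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraShape:
    """One adapted weight matrix W (d_out x d_in) with LoRA rank r."""

    d_out: int
    d_in: int
    rank: int

    def full_params(self) -> int:
        # Parameters that full fine-tuning of W would update.
        return self.d_out * self.d_in

    def lora_params(self) -> int:
        # LoRA freezes W and trains B (d_out x r) and A (r x d_in).
        return self.rank * (self.d_out + self.d_in)

    def trainable_fraction(self) -> float:
        # Share of this matrix's parameters that LoRA actually trains.
        return self.lora_params() / self.full_params()


# Hypothetical sizes for illustration only; not taken from the paper.
layer = LoraShape(d_out=4096, d_in=4096, rank=16)
print(layer.full_params())                  # 16777216
print(layer.lora_params())                  # 131072
print(f"{layer.trainable_fraction():.2%}")  # 0.78%
```

Under these assumed sizes, fewer than 1% of the matrix's parameters receive gradients and optimizer state, which is where most of the memory reduction comes from.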


Source journal

Intelligent Systems in Accounting, Finance and Management (Economics, Econometrics and Finance: Finance)

CiteScore

6.00

Self-citation rate

0.00%

Number of articles published

About the journal: Intelligent Systems in Accounting, Finance and Management is a quarterly international journal that publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing and management. The journal is also concerned with related emerging technologies, including big data, business intelligence, social media and other technologies. It encourages the development of novel technologies and the embedding of new and existing technologies into applications of real, practical value. Implementation issues are therefore of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is its use of two groups of reviewers: those who specialize in intelligent systems work and those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.