Title: Developing an Accounting Virtual Assistant Through Supervised Fine-Tuning (SFT) of a Small Language Model (SLM)
Author: Mario Zupan
DOI: 10.1002/isaf.70011
Journal: Intelligent Systems in Accounting, Finance and Management, Vol. 32, No. 3
Publication date: 2025-07-28 (Journal Article)
Impact factor: 3.7 (JCR Q1, Economics, Econometrics and Finance)
Article page: https://onlinelibrary.wiley.com/doi/10.1002/isaf.70011
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/isaf.70011
Citations: 0
Abstract
The development of an in-house accounting bot, an artificial intelligence (AI) assistant capable of generating internally structured double-entry bookkeeping posting schemes, is explored in this paper. The paper describes the process of curating a suitable dataset and of selecting and fine-tuning a seven-billion-parameter language model, categorized as a small language model (SLM); SLMs typically have fewer than 10 billion parameters, whereas medium-sized models often have around 14 billion and large-scale models exceed 70 billion. A human-evaluated benchmark is also presented to assess model performance. To achieve efficient supervised fine-tuning (SFT), low-rank adaptation (LoRA) was employed, which significantly reduces memory requirements by training only a small set of additional parameters while maintaining model expressiveness. Training was further optimized with Unsloth, a high-performance framework designed for efficient GPU memory usage that incorporates flash-attention mechanisms, accelerating adaptation and reducing memory overhead. The model whose layers were updated, QwenCoder2.5, was selected on the presumption that it could learn to generate and examine the bookkeeping patterns produced by an accounting information system (AIS) over a 17-year history. This proof of concept aims to support researchers and practitioners exploring the integration of generative AI in accounting by providing insights into both the benefits and challenges of AI-driven automation of bookkeeping tasks. The study demonstrates how an SLM can be fine-tuned on a proprietary dataset of journal posting schemes to assist accountants, auditors, and financial analysts, while also facilitating synthetic data generation. Challenges related to AI, data preprocessing, fine-tuning optimization, and evaluation methodology are introduced and examined.
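The memory savings that the abstract attributes to LoRA come from training two small low-rank factors B and A instead of updating a full weight matrix W, with the adapted weight being W + BA. A minimal NumPy sketch of this idea follows; the dimensions and rank are illustrative only, not those of the paper's seven-billion-parameter model.

```python
import numpy as np

# Illustrative dimensions (hypothetical; real 7B-model layers differ).
d_out, d_in, rank = 4096, 4096, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight

# LoRA factors: A is small random, B starts at zero so the adapter
# is a no-op before training (W + B @ A == W at initialization).
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def adapted_forward(x):
    """Forward pass through the frozen weight plus the low-rank update."""
    return W @ x + B @ (A @ x)

full_params = W.size              # parameters a full update would train
lora_params = A.size + B.size     # parameters LoRA actually trains
print(f"trainable params: {lora_params:,} vs full update: {full_params:,}")
print(f"reduction factor: {full_params / lora_params:.0f}x")
```

At these dimensions the low-rank factors hold 131,072 values against 16,777,216 in the full matrix, a 128x reduction; this is the mechanism by which LoRA keeps memory requirements low while leaving the pretrained weights untouched.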
About the journal:
Intelligent Systems in Accounting, Finance and Management is a quarterly international journal which publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing and management. In addition, the journal is also concerned with related emerging technologies, including big data, business intelligence, social media and other technologies. It encourages the development of novel technologies, and the embedding of new and existing technologies into applications of real, practical value. Implementation issues are therefore of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is the use of two groups of reviewers: those who specialize in intelligent systems work, and those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.