Combination of machine learning and data envelopment analysis to measure the efficiency of the Tax Service Office.

IF 3.5; CAS Zone 4 (Computer Science); JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

PeerJ Computer Science; Pub date: 2025-02-17; eCollection date: 2025-01-01; DOI: 10.7717/peerj-cs.2672

Shofinurdin Soffan, Arif Bramantoro, Ahmad A Alzahrani

Volume 11, article e2672.

Citations: 0

Abstract

The Tax Service Office, a division of the Directorate General of Taxes, is responsible for providing taxation services to the public and collecting taxes. Achieving tax targets efficiently with the available resources is crucial. Data envelopment analysis (DEA) is commonly used to assess the performance efficiency of decision-making units (DMUs). However, DEA generally assumes that the DMUs being compared are homogeneous, and ensuring this often requires machine learning clustering techniques. In this study, we propose a three-stage approach (clustering, DEA, and regression) to measure the efficiency of all tax service office units. Real datasets from Indonesian tax service offices were used while maintaining strict confidentiality. Unlike previous studies that clustered on both input and output variables, we cluster on input variables only, as this leads to more objective efficiency values when the results from each cluster are combined. Fuzzy c-means clustering produced three clusters with a silhouette score of 0.304 and a Davies-Bouldin index of 1.119, demonstrating its effectiveness. Of the 352 DMUs, 225 (approximately 64%) were identified as efficient by the DEA calculations. We also propose a regression model that estimates the efficiency of a DMU when planning a new office, given the planned values of its input and output variables. Optimizing the multilayer perceptron with a genetic algorithm reduced the mean squared error from 0.0144 to 0.0035, a reduction of about 75.75%. Based on these findings, about 64% of Indonesian tax service offices operate efficiently, a significant improvement over a previous study in which only about 18% of offices were considered efficient. The main contribution of this research is a comprehensive framework for evaluating and predicting tax office efficiency, providing valuable insights for improving performance.
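The first two stages described above (cluster on inputs only, then run DEA within each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes fuzzy c-means with fuzzifier m = 2 on standardised inputs, hard assignment of each DMU to its highest-membership cluster, and the input-oriented CCR (constant returns to scale) envelopment model solved with SciPy's `linprog`. The abstract does not state the DEA orientation, returns-to-scale assumption, or the actual variables, and the data below are random placeholders rather than the confidential tax-office data. The regression stage (a genetic-algorithm-tuned multilayer perceptron) is omitted.

```python
import numpy as np
from scipy.optimize import linprog


def fuzzy_c_means(X, c=3, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships U with shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each DMU's memberships sum to 1
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means of the points
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance from every point to every center, floored to avoid division by zero
        d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def ccr_input_efficiency(X, Y):
    """Input-oriented CCR (constant returns to scale) DEA score for every DMU.

    Assumption: the abstract does not say which DEA model the authors used.
    """
    n, p = X.shape
    q = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables z = [theta, lambda_1, ..., lambda_n]; minimise theta
        cost = np.zeros(n + 1)
        cost[0] = 1.0
        # Inputs:  sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o][:, None], X.T])
        # Outputs: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((q, 1)), -Y.T])
        res = linprog(
            cost,
            A_ub=np.vstack([A_in, A_out]),
            b_ub=np.concatenate([np.zeros(p), -Y[o]]),
            bounds=[(None, None)] + [(0, None)] * n,
            method="highs",
        )
        scores[o] = res.x[0]
    return scores


def cluster_then_dea(X, Y, c=3, seed=0):
    """Stage 1: cluster on inputs only; stage 2: DEA within each cluster."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise inputs before clustering
    _, U = fuzzy_c_means(Xs, c=c, seed=seed)
    labels = U.argmax(axis=1)  # assumption: hard assignment by highest membership
    scores = np.empty(len(X))
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        # Each DMU is benchmarked only against peers in its own cluster
        scores[idx] = ccr_input_efficiency(X[idx], Y[idx])
    return labels, scores


# Hypothetical random data standing in for the confidential tax-office dataset
rng = np.random.default_rng(42)
X = rng.uniform(1, 10, size=(30, 2))  # two made-up inputs per DMU
Y = rng.uniform(1, 10, size=(30, 1))  # one made-up output per DMU
labels, scores = cluster_then_dea(X, Y)
efficient = np.isclose(scores, 1.0, atol=1e-6)
print(f"{efficient.sum()} of {len(scores)} DMUs efficient")
```

Because the CCR model always places at least one DMU on the frontier, every cluster contains at least one DMU with a score of 1, which is why combining per-cluster results can raise the overall share of efficient units. On real data, the silhouette and Davies-Bouldin values quoted in the abstract could be computed on the standardised inputs and the resulting labels with scikit-learn's `silhouette_score` and `davies_bouldin_score`.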

Source journal: PeerJ Computer Science (Computer Science - General Computer Science)

CiteScore: 6.10
Self-citation rate: 5.30%
Articles published: 332
Review time: 10 weeks

Journal description: PeerJ Computer Science is a new open access journal covering all subject areas of computer science, backed by a prestigious advisory board and more than 300 academic editors.