Improved Regression Analysis with Ensemble Pipeline Approach for Applications Across Multiple Domains

IF 2 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Artificial Intelligence)

ACM Transactions on Asian and Low-Resource Language Information Processing | Pub Date: 2024-02-08 | DOI: 10.1145/3645110

Debajyoty Banik, Rahul Paul, Rajkumar Singh Rathore, Rutvij H. Jhaveri

Citations: 0

Abstract

In this research, we introduce two new machine learning regression methods: the Ensemble Average and the Pipelined Model. Both aim to improve on traditional regression analysis for predictive tasks, and we evaluate them on three datasets (Kaggle House Price, Boston House Price, and California Housing) using several performance metrics. On all three datasets, our models outperform existing methods in accuracy, reliability, and robustness. The Pipelined Model, in particular, is notable for its ability to combine predictions from multiple models, leading to higher accuracy and impressive scalability. This scalability allows the models to be applied in diverse fields such as technology, finance, and healthcare. Furthermore, the models can be adapted for real-time and streaming data analysis, making them valuable for applications such as fraud detection, stock market prediction, and IoT sensor data analysis. Enhancements to the models also make them suitable for big data applications, including large datasets and distributed computing environments. We acknowledge some limitations of our models when they are used in practical scenarios, including potential data biases, specific assumptions, increased complexity, and challenges related to interpretability. Nevertheless, these innovations advance predictive modeling, and our comprehensive evaluation underscores their potential to provide increased accuracy and reliability across a wide range of applications. The source code can be found at https://huggingface.co/DebajyotyBanik/Ensemble-Pipelined-Regression/tree/main.
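
The abstract does not say how either method is constructed, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation (for that, see the repository linked above). It assumes the Ensemble Average is the plain mean of base-model predictions and treats the Pipelined Model as a stacking-style second stage fitted on those predictions. The synthetic data, the two single-feature base models, and all function names are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    d = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(d)] for i in range(d)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(d)]
    return solve(A, b)

def predict(w, X):
    """Apply weights w (intercept first) to each row of X."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def mse(y, preds):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y)

# Hypothetical synthetic data: two features, linear target plus noise.
rng = random.Random(0)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
y = [3.0 * a + 0.5 * b + rng.gauss(0, 1) for a, b in X]

# Two deliberately weak base models, each seeing only one feature.
X_a = [[row[0]] for row in X]
X_b = [[row[1]] for row in X]
p_a = predict(fit_linear(X_a, y), X_a)
p_b = predict(fit_linear(X_b, y), X_b)

# Assumed "Ensemble Average": unweighted mean of the base predictions.
avg_pred = [(a + b) / 2 for a, b in zip(p_a, p_b)]

# Assumed "Pipelined Model": a second-stage regressor fitted on the base
# predictions. For brevity it is fitted on in-sample predictions; a real
# setup would use out-of-fold predictions and a held-out test set.
stage2_inputs = list(zip(p_a, p_b))
w_meta = fit_linear(stage2_inputs, y)
pipe_pred = predict(w_meta, stage2_inputs)

print("ensemble average MSE:", mse(y, avg_pred))
print("pipelined (stacked) MSE:", mse(y, pipe_pred))
```

In this sketch the unweighted average is one particular choice of second-stage weights, so the fitted second stage can never have a higher training error than the average. That says nothing about held-out performance, and nothing about the results reported in the paper.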

Source journal

ACM Transactions on Asian and Low-Resource Language Information Processing (Computer Science: General Computer Science)

CiteScore: 3.60
Self-citation rate: 15.00%
Articles published: 241

About the journal: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high-quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:

- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.

Papers that deal in theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.