On-Line Big Data Processing Using Python Libraries for Multiple Linear Regression in Complex Environment

D. Jovanoska, Gjorgji Mancheski
DOI: 10.46541/978-86-7233-406-7_228 (https://doi.org/10.46541/978-86-7233-406-7_228)
Journal: 战略管理 (Strategic Management), vol. 5, no. 1
Published: 2022-06-07
Citations: 0

Abstract

Big Data is one of the most significant, and least visible, consequences of the development of technology and the Internet. The data generated by today's globally connected world is growing at an exponential rate, and it is a real "gold mine" for users who know how to interpret it correctly and make successful decisions based on it. Data analysis and processing is one of the most important components of a big data system, and in this branch of data science the most popular language is Python, which offers a large number of actively maintained libraries and development environments. Most importantly for legal entities and individuals, almost all of the libraries and functions available for Python come with free licenses, open source code, and maintained, high-quality technical documentation, which saves each company significant money and time. This paper examines the possibility of building a multiple regression model in Python from large amounts of data provided by two market retailers, in order to present the model and assess its predictive power. Because the number of variables is large, several models were built and compared. The comparison shows that Python is a good tool that can be used repeatedly to select different variants and evaluate the resulting model. A graphical interface can be built for the model, which would make it much more acceptable to end users; it can be placed on an Internet server or a modern cloud platform and offered to users on demand; and its results can be embedded in end-user interfaces. Models built this way (with dynamic data extraction) can be used in BI and machine learning processes.
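The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the libraries or variables used, so NumPy is assumed here, the data is synthetic, and the column names (`price`, `promo`, `unrelated`, `sales`) and helper functions are hypothetical. The sketch fits several multiple linear regression models on different variable subsets and compares their predictive power (R² on held-out rows).

```python
import numpy as np


def fit_ols(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; intercept comes first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r_squared(X, y, coef):
    """Coefficient of determination of the fitted model on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for retail data (assumption: not the paper's dataset).
rng = np.random.default_rng(0)
n = 1000
price = rng.uniform(1, 10, n)
promo = rng.integers(0, 2, n).astype(float)
unrelated = rng.normal(size=n)  # predictor with no real effect on sales
sales = 50 - 3 * price + 8 * promo + rng.normal(scale=2, size=n)

features = {"price": price, "promo": promo, "unrelated": unrelated}
train, test = slice(0, 800), slice(800, None)

# Candidate models: each one uses a different subset of the variables.
candidates = [
    ("price",),
    ("price", "promo"),
    ("price", "promo", "unrelated"),
]

coefs, scores = {}, {}
for names in candidates:
    X = np.column_stack([features[k] for k in names])
    coefs[names] = fit_ols(X[train], sales[train])
    scores[names] = r_squared(X[test], sales[test], coefs[names])

for names in candidates:
    print(names, "held-out R^2 =", round(scores[names], 3))
```

Scoring on held-out rows rather than on the training rows keeps the comparison from rewarding a model simply for having more variables. With real retailer data, the synthetic arrays would be replaced by columns extracted from the data source.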