A study of variance and its utility in machine learning

Q3 Mathematics

International Journal of Sensors, Wireless Communications and Control Pub Date : 2022-06-17 DOI:10.2174/2210327912666220617153359

K. G. Sharma, Yashpal Singh


Citations: 1

Abstract

With inexpensive storage devices and data sensors widely available, collecting and storing data is now simpler than ever. Sources of this data include biotechnology, pharmacy, business, online marketing websites, Twitter, Facebook, and blogs. Understanding the data is crucial today because every activity, private or public, from hospitals to megamarts, benefits from it. However, the explosive growth in data volume is making it almost impossible to analyze the data manually. In 2022, we are creating 2.5 quintillion bytes of data per day. One quintillion bytes is one billion gigabytes. Approximately 90% of all data was created in the last two years. Naturally, an automatic technique for analyzing the data is now a necessity. Data mining is therefore performed with the help of machine learning tools to analyze and understand the data. Data mining and machine learning depend heavily on statistical tools and techniques, which is why machine learning is sometimes called "statistical learning". Many machine learning techniques exist in the literature, and improving them is a continuous process because no model is perfect. This paper examines the influence of variance, a statistical concept, on various machine learning approaches and tries to understand how this concept can be used to improve performance.
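
As a point of reference for the statistical concept named in the abstract, the following is a minimal sketch of how variance is computed in its standard textbook sense. It is not taken from the paper: the function names and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def population_variance(xs):
    """Mean squared deviation from the mean, dividing by n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Unbiased estimate of variance, dividing by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical data, for illustration only (mean is 5.0).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_variance(data))         # 4.0
print(f"{sample_variance(data):.3f}")    # 4.571
```

The two functions differ only in the divisor. The sample form is the usual choice when the data are a subset drawn from a larger population, which is the typical situation for a training set in machine learning.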


Source journal

International Journal of Sensors, Wireless Communications and Control
Category: Engineering - Electrical and Electronic Engineering

CiteScore: 2.20

Self-citation rate: 0.00%


About the journal: International Journal of Sensors, Wireless Communications and Control publishes timely research articles, full-length and mini reviews, and communications on these three strongly related areas, with emphasis on networked control systems whose sensors are interconnected via wireless communication networks. The emergence of high-speed wireless network technologies allows a cluster of devices to be linked together economically to form a distributed system. Wireless communication is playing an increasingly important role in such distributed systems. Transmitting sensor measurements and control commands over wireless links allows rapid deployment, flexible installation, and fully mobile operation, and avoids the problem of cable wear and tear in industrial automation, healthcare and environmental assessment. Wireless networked systems have raised and continue to raise fundamental challenges in science, engineering and industrial applications; hence, new modelling techniques, problem formulations and solutions are required.