Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data

Eng. Pub Date: 2024-02-29. DOI: 10.3390/eng5010021

Styliani I. Kampezidou, Archana Tikayat Ray, Anirudh Prabhakara Bhat, Olivia J. Pinon Fischer, D. Mavris

Citations: 1

Abstract

This paper offers a comprehensive examination of the process involved in developing and automating supervised end-to-end machine learning workflows for forecasting and classification purposes. It offers a complete overview of the components (i.e., feature engineering and model selection), principles (i.e., bias–variance decomposition, model complexity, overfitting, model sensitivity to feature assumptions and scaling, and output interpretability), models (i.e., neural networks and regression models), methods (i.e., cross-validation and data augmentation), metrics (i.e., Mean Squared Error and F1-score) and tools that rule most supervised learning applications with numerical and categorical data, as well as their integration, automation, and deployment. The end goal and contribution of this paper is the education and guidance of the non-AI expert academic community regarding complete and rigorous machine learning workflows and data science practices, from problem scoping to design and state-of-the-art automation tools, including basic principles and reasoning in the choice of methods. The paper delves into the critical stages of supervised machine learning workflow development, many of which are often omitted by researchers, and covers foundational concepts essential for understanding and optimizing a functional machine learning workflow, thereby offering a holistic view of task-specific application development for applied researchers who are non-AI experts. This paper may be of significant value to academic researchers developing and prototyping machine learning workflows for their own research or as customer-tailored solutions for government and industry partners.
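As a rough illustration of two of the metrics and one of the methods named in the abstract (Mean Squared Error, F1-score, and cross-validation), the following is a minimal Python sketch using only the standard library. It is not taken from the paper: the function names `mean_squared_error`, `f1_score`, and `k_fold_indices`, the `positive` parameter, and the choice of contiguous, unshuffled folds are all assumptions made for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds.

    Assumption for this sketch: no shuffling or stratification is applied.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size
```

In a cross-validation loop, a model would be fitted on each `train` index list and scored on the matching `validation` list with one of the metrics above, and the per-fold scores would then be averaged. In practice an established library implementation is usually preferable to hand-written versions like these.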

Source journal

Eng

CiteScore

2.10

Self-citation rate

0.00%

Number of articles published