Best Practices in Supervised Machine Learning: A Tutorial for Psychologists

IF 13.4 · CAS Zone 1 (Psychology) · JCR Q1 PSYCHOLOGY

Advances in Methods and Practices in Psychological Science · Pub Date: 2023-07-01 · DOI: 10.1177/25152459231162559

F. Pargent, Ramona Schoedel, Clemens Stachl

Citations: 4

Abstract

Supervised machine learning (ML) is becoming an influential analytical method in psychology and other social sciences. However, theoretical ML concepts and predictive-modeling techniques are not yet widely taught in psychology programs. This tutorial is intended to provide an intuitive but thorough primer and introduction to supervised ML for psychologists in four consecutive modules. After introducing the basic terminology and mindset of supervised ML, in Module 1, we cover how to use resampling methods to evaluate the performance of ML models (bias-variance trade-off, performance measures, k-fold cross-validation). In Module 2, we introduce the nonlinear random forest, a type of ML model that is particularly user-friendly and well suited to predicting psychological outcomes. Module 3 is about performing empirical benchmark experiments (comparing the performance of several ML models on multiple data sets). Finally, in Module 4, we discuss the interpretation of ML models, including permutation variable importance measures, effect plots (partial-dependence plots, individual conditional-expectation profiles), and the concept of model fairness. Throughout the tutorial, intuitive descriptions of theoretical concepts are provided, with as few mathematical formulas as possible, and followed by code examples using the mlr3 and companion packages in R. Key practical-analysis steps are demonstrated on the publicly available PhoneStudy data set (N = 624), which includes more than 1,800 variables from smartphone sensing to predict Big Five personality trait scores. The article contains a checklist to be used as a reminder of important elements when performing, reporting, or reviewing ML analyses in psychology. Additional examples and more advanced concepts are demonstrated in online materials (https://osf.io/9273g/).
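The abstract names two core ideas: k-fold cross-validation for estimating out-of-sample performance (Module 1) and comparing learners against each other (Module 3). The paper implements these with mlr3 in R. What follows is only a minimal, language-agnostic sketch in plain Python, not the authors' code or the mlr3 API, and it rests on the following assumptions:

- simulated data rather than the PhoneStudy data
- a single predictor
- mean squared error (MSE) as the performance measure
- two illustrative learners: a featureless baseline that predicts the training mean, and ordinary least-squares regression

All function names are hypothetical.

```python
import random
import statistics

# Hypothetical helper names; this is NOT the mlr3 API used in the paper.


def kfold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [idx[i::k] for i in range(k)]


def fit_featureless(x, y):
    """Baseline learner: always predict the training-set mean of y."""
    mean_y = statistics.fmean(y)
    return lambda x_new: mean_y


def fit_linear(x, y):
    """Simple linear regression y = a + b*x fitted by ordinary least squares."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda x_new: a + b * x_new


def cross_validate(fit, x, y, k=5, seed=0):
    """Return the mean test-fold MSE of learner `fit` over k folds."""
    folds = kfold_indices(len(y), k, seed)
    fold_mse = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        # The model is fitted on the training folds only ...
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        # ... and scored on the held-out fold it never saw.
        errors = [(y[i] - model(x[i])) ** 2 for i in test_idx]
        fold_mse.append(statistics.fmean(errors))
    return statistics.fmean(fold_mse)


# Simulated (hypothetical) data: y depends linearly on x plus Gaussian noise.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]

# Benchmark both learners with the same 10-fold split (same seed).
for name, learner in [("featureless", fit_featureless), ("linear", fit_linear)]:
    print(name, round(cross_validate(learner, x, y, k=10), 3))
```

Every observation falls into exactly one test fold, so each prediction used for scoring comes from a model that was never trained on that observation. Comparing against the featureless baseline on identical folds shows whether a model predicts better than simply guessing the mean, which is the reference point a benchmark needs.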

Source journal

Advances in Methods and Practices in Psychological Science Multiple-

CiteScore: 21.20

Self-citation rate: 0.70%


Journal description: In 2021, Advances in Methods and Practices in Psychological Science (AMPPS) transitioned to become an open access journal. The journal publishes innovative developments in research methods, practices, and conduct in psychological science. It covers a wide range of areas and topics and encourages the integration of methodological and analytical questions. AMPPS aims to bring the latest methodological advances to researchers from various disciplines, including those who are not methodological experts. It therefore seeks submissions that are accessible to readers with different research interests and that represent the diverse research trends in psychological science. AMPPS welcomes articles that communicate advances in methods, practices, and metascience, as well as empirical scientific best practices. It also encourages tutorials, commentaries, and simulation studies on new techniques and research tools. The journal also aims to publish papers that bring advances from specialized subfields to a broader audience. Finally, AMPPS accepts Registered Replication Reports, which replicate important findings from previously published studies. The move to open access is intended to increase accessibility and to promote the dissemination of new developments in research methods and practices in psychological science.