Everything has its price: Foundations of cost-sensitive machine learning and its application in psychology
Authors: Philipp Sterner, David Goretzko, Florian Pargent
Journal: Psychological Methods, pp. 112-127
Published: 2025-02-01 (Epub 2023-08-10)
DOI: 10.1037/met0000586 (https://doi.org/10.1037/met0000586)
Impact factor: 7.8; JCR: Q1 (Psychology, Multidisciplinary)
Citations: 0
Abstract
Psychology has seen an increase in the use of machine learning (ML) methods. In many applications, observations are classified into one of two groups (binary classification). Off-the-shelf classification algorithms assume that the costs of the two kinds of misclassification (false positive and false negative) are equal. Because this assumption is often unreasonable (e.g., in clinical psychology), cost-sensitive machine learning (CSL) methods can take different cost ratios into account. We present the mathematical foundations and introduce a taxonomy of the most commonly used CSL methods, before demonstrating their application and usefulness on psychological data, namely the drug consumption data set (N = 1,885) from the University of California, Irvine ML Repository. In our example, all demonstrated CSL methods noticeably reduced mean misclassification costs compared to regular ML algorithms. We discuss the necessity for researchers to perform small benchmarks of CSL methods for their own practical application. To this end, our open materials provide R code demonstrating how CSL methods can be applied within the mlr3 framework (https://osf.io/cvks7/). (PsycInfo Database Record (c) 2025 APA, all rights reserved).
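To make the idea of unequal misclassification costs concrete, here is a minimal Python sketch of one widely used cost-sensitive technique: moving the classification threshold. It is an illustration under stated assumptions, not the authors' R/mlr3 code. The cost values, the toy labels, and the predicted probabilities are hypothetical, and correct classifications are assumed to cost nothing. Under that assumption, predicting "positive" has expected cost (1 - p) * cost_fp and predicting "negative" has expected cost p * cost_fn, so the cost-minimizing rule is to predict positive whenever p >= cost_fp / (cost_fp + cost_fn).

```python
from typing import Sequence


def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold that minimizes expected cost.

    Assumes correct classifications cost 0 and both error costs are positive.
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("misclassification costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(probs: Sequence[float], threshold: float) -> list[int]:
    """Predict class 1 when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def mean_cost(y_true: Sequence[int], y_pred: Sequence[int],
              cost_fp: float, cost_fn: float) -> float:
    """Mean misclassification cost over all observations."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 0 and pred == 1:
            total += cost_fp   # false positive
        elif truth == 1 and pred == 0:
            total += cost_fn   # false negative
    return total / len(y_true)


# Hypothetical toy data: true labels and a model's predicted P(class = 1).
y_true = [1, 0, 1, 0, 1, 0]
probs = [0.40, 0.30, 0.80, 0.10, 0.25, 0.60]

# Hypothetical costs: a false negative is four times as costly as a false positive.
COST_FP, COST_FN = 1.0, 4.0

default_pred = classify(probs, 0.5)               # cost-insensitive default
cs_threshold = cost_threshold(COST_FP, COST_FN)   # 1 / (1 + 4) = 0.2
cs_pred = classify(probs, cs_threshold)

default_cost = mean_cost(y_true, default_pred, COST_FP, COST_FN)
cs_cost = mean_cost(y_true, cs_pred, COST_FP, COST_FN)

print(f"default threshold 0.50: mean cost = {default_cost:.3f}")
print(f"cost-sensitive threshold {cs_threshold:.2f}: mean cost = {cs_cost:.3f}")
```

On this toy data, the lowered threshold trades two costly false negatives for one additional cheap false positive, which lowers the mean cost. Whether thresholding or another CSL method works best on real data is an empirical question, which is why the abstract recommends small benchmarks.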
About the journal:
Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.