Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models

IF 2.2 · Region 2 (Engineering & Technology) · Q2 ENGINEERING, INDUSTRIAL

Journal of Quality Technology Pub Date : 2021-09-21 DOI:10.1080/00224065.2021.1977101

Bing Si


Citations: 62

Abstract

Predictive models aim to guess, that is, predict, the values of a variable of interest based on other variables. They have been used throughout human history, and many statistical models for prediction have been developed during the last century. This book covers methods for exploring predictive models at both the instance level and the dataset level. It is a valuable addition to Chapman & Hall/CRC’s Data Science Series. Together with the other books published in the series, it offers a unique perspective on applied data science to guide practitioners who are interested in exploring, explaining, and examining data in real-world applications with both R and Python. Predictive models are an important component in the big picture of machine learning and data science approaches and require standard analytical steps such as model specification, model estimation, and model fit diagnosis. Most published books in this field focus on how to use these statistical methods to make predictions for different types of datasets, but lack tools for model exploration and, in particular, model explanation (obtaining insights from model-based predictions) and model examination (evaluating model performance and understanding its weaknesses). In contrast, this book is a novel effort that provides a deep understanding of all the steps, with extensive validation and justification methods, leading to better and faster interpretable data analysis.

The book is well organized in three parts. It starts with an overview of basic concepts in Chapters 1-4 and then presents instance-level exploration and dataset-level exploration in Chapters 5-13 and Chapters 14-20, respectively. The overview introduces essential knowledge of the model development process, software installation, and how to fit classic predictive models using software. The instance-level part covers methods that help readers better understand “how a model yields a prediction for a particular single observation” for predictive models with either a small or a large number of explanatory variables. The last part, on dataset-level exploration, discusses “how do the model predictions perform overall, for an entire set of observations?”

Although a basic understanding of programming languages would be beneficial, the coding in this book is designed to be self-contained and friendly to readers without a programming background. Readers are, however, expected to have some knowledge of different types of data science models, such as logistic regression, support vector machines, and gradient boosting, and to understand which kinds of research questions each model can address. For example, given a research question that aims to predict patient survival (yes/no) after surgery from other variables, e.g., age, symptoms, and medical history, the reader should be able to identify that the dependent variable of interest, survival, is binary, and then consider a logistic regression model as a natural starting point for predictive modeling. Overall, the book is a suitable reference for data science practitioners who want to learn exploratory data analysis for predictive models and its applications using R or Python.


Source journal

Journal of Quality Technology (Management Science - Engineering: Industrial)

CiteScore

5.20

Self-citation rate

4.00%

Articles published

Review time

>12 weeks

Journal description: The objective of Journal of Quality Technology is to contribute to the technical advancement of the field of quality technology by publishing papers that emphasize the practical applicability of new techniques, instructive examples of the operation of existing techniques, and results of historical research. Expository, review, and tutorial papers are also acceptable if they are written in a style suitable for practicing engineers.