Lucy Moctezuma Tan, Lorena Benitez, Florentine van Nouhuijs, Faye Orcales, Allen Kim, Ross Campbell, Megumi Fuse, Pleuni S Pennings
bioRxiv - Scientific Communication and Education. Published 2023-12-20. DOI: https://doi.org/10.1101/2023.12.19.572463
Using a decision tree to predict COVID case numbers: a tutorial for beginners
Machine learning (ML) makes it possible to analyze large volumes of data and is an important tool in biomedical research. The use of ML methods can lead to improvements in diagnosis, treatment, and prevention of diseases. During the COVID pandemic, ML methods were used for predictions at the patient and community levels. Given the ubiquity of ML, it is important that future doctors, researchers and teachers get acquainted with ML and its contributions to research. Our goal is to make it easier for students and their professors to learn about ML. The learning module we present here is based on a small but relevant COVID dataset, videos, annotated code and the use of cloud computing platforms. The benefit of cloud computing platforms is that students do not have to set up a coding environment on their computer. This saves time and is also an important democratization factor, allowing students to use old or borrowed computers (e.g., from a library), tablets or Chromebooks. As a result, this will benefit colleges geared toward underserved populations with limited computing infrastructure. We developed a beginner-friendly module focused on learning the basics of decision trees by applying them to COVID tabular data. It introduces students to basic terminology used in supervised ML and its relevance to research. The module includes two Python notebooks with pre-written code, one with practice exercises and another with its solutions. Our experience with biology students at San Francisco State University suggests that the material increases interest in ML.
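To make the decision-tree idea concrete, here is a minimal sketch of the smallest possible tree: a single split (a "stump") fitted to a tiny table. It is not the module's notebook code or dataset. The feature (percent of a county vaccinated), the target (COVID cases per 100,000 residents), all the numbers, and the helper names `fit_stump` and `predict` are hypothetical, chosen only for illustration. It shows supervised learning in miniature: the model learns a rule from labeled rows, then applies that rule to new inputs.

```python
# Illustrative only: made-up numbers, not the tutorial's dataset or notebooks.
from statistics import mean

# Each row: (feature, target). Hypothetical feature: percent of a county
# vaccinated; hypothetical target: COVID cases per 100,000 residents.
rows = [(30, 900), (35, 850), (40, 800), (60, 400), (65, 350), (70, 300)]


def squared_error(values):
    """Sum of squared differences from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(data):
    """Find the single threshold that best splits the targets (a depth-1 tree)."""
    xs = sorted({x for x, _ in data})
    best = None
    # Candidate thresholds are midpoints between consecutive feature values.
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump, x):
    """Follow the one decision: go left if x <= threshold, else right."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


stump = fit_stump(rows)
print("learned rule (threshold, left mean, right mean):", stump)
print("prediction for 33% vaccinated:", predict(stump, 33))
print("prediction for 68% vaccinated:", predict(stump, 68))
```

A full decision tree repeats this split search inside each resulting group until a stopping rule is met. In practice a library such as scikit-learn would typically do this work; the stump is written by hand here only to show what a single split does.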