Supervised Machine Learning Classifies Inflammatory Bowel Disease Patients by Subtype Using Whole Exome Sequencing Data.

IF 8.3 · Zone 2 (Medicine) · JCR Q1 (GASTROENTEROLOGY & HEPATOLOGY)
Imogen S Stafford, James J Ashton, Enrico Mossotto, Guo Cheng, Robert Mark Beattie, Sarah Ennis
Journal of Crohns & Colitis, pp. 1672-1680. Published 2023-11-08. Journal Article.
DOI: 10.1093/ecco-jcc/jjad084 (https://doi.org/10.1093/ecco-jcc/jjad084)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10637043/pdf/

Abstract

Background: Inflammatory bowel disease [IBD] is a chronic inflammatory disorder with two main subtypes: Crohn's disease [CD] and ulcerative colitis [UC]. Prompt subtype diagnosis enables the correct treatment to be administered. Using genomic data, we aimed to assess whether machine learning [ML] can classify patients according to IBD subtype.

Methods: Whole exome sequencing [WES] from paediatric/adult IBD patients was processed using an in-house bioinformatics pipeline. These data were condensed into the per-gene, per-individual genomic burden score, GenePy. Data were split into training and testing datasets [80/20]. Feature selection with a linear support vector classifier, and hyperparameter tuning with Bayesian Optimisation, were performed [training data]. The supervised ML method random forest was utilised to classify patients as CD or UC, using three panels: 1] all available genes; 2] autoimmune genes; 3] 'IBD' genes. ML results were assessed using area under the receiver operating characteristics curve [AUROC], sensitivity, and specificity on the testing dataset.
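The three evaluation measures named in the Methods can be illustrated with a minimal, standard-library sketch. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of CD as the positive class (label 1) are all assumptions made for illustration.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    A score >= threshold is called positive (threshold is an assumption)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy held-out set: 1 = CD (assumed positive class), 0 = UC.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # hypothetical classifier outputs

toy_auroc = auroc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores)
```

In this toy set, 7 of the 9 positive-negative pairs are ranked correctly, so the AUROC is 7/9; AUROC does not depend on a threshold, whereas sensitivity and specificity do.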

Results: A total of 906 patients were included in analysis [600 CD, 306 UC]. Training data included 488 patients, balanced according to the minority class of UC. The autoimmune gene panel generated the best performing ML model [AUROC = 0.68], outperforming an IBD gene panel [AUROC = 0.61]. NOD2 was the top gene for discriminating CD and UC, regardless of the gene panel used. Lack of variation in genes with high GenePy scores in CD patients was the best classifier of a diagnosis of UC.
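The reported training size is consistent with simple arithmetic: taking 80% of the 306 UC patients and rounding down gives 244, and matching that number of CD patients gives 2 × 244 = 488. The sketch below shows one way such balancing by random undersampling could look. The abstract does not describe the exact procedure, so the rounding rule, the function name `balance_by_undersampling`, the random seed, and the placeholder patient IDs are assumptions.

```python
import random


def balance_by_undersampling(samples, labels, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # Draw n_min members without replacement from each class.
        for sample in rng.sample(members, n_min):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Cohort sizes from the abstract; 80% training share, rounded down (assumption).
n_cd, n_uc = 600, 306
train_cd = n_cd * 80 // 100
train_uc = n_uc * 80 // 100

# Placeholder patient IDs standing in for per-patient GenePy score vectors.
ids = [f"CD_{i}" for i in range(train_cd)] + [f"UC_{i}" for i in range(train_uc)]
classes = ["CD"] * train_cd + ["UC"] * train_uc

balanced = balance_by_undersampling(ids, classes)
n_balanced_uc = sum(1 for _, label in balanced if label == "UC")
n_balanced_cd = sum(1 for _, label in balanced if label == "CD")
```

Undersampling discards majority-class (CD) patients rather than duplicating minority-class ones, which keeps every training example real at the cost of a smaller training set.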

Discussion: We demonstrate promising classification of patients by subtype using random forest and WES data. Focusing on specific subgroups of patients, with larger datasets, may result in better classification.

Source journal
Journal of Crohns & Colitis (Medicine: Gastroenterology & Hepatology)
CiteScore: 15.50
Self-citation rate: 7.50%
Articles published: 1048
Review time: 1 month
About the journal: Journal of Crohns and Colitis is concerned with the dissemination of knowledge on clinical, basic science and innovative methods related to inflammatory bowel diseases. The journal publishes original articles, review papers, editorials, leading articles, viewpoints, case reports, innovative methods and letters to the editor.