Identification of Hürthle cell cancers: solving a clinical challenge with genomic sequencing and a trio of machine learning algorithms.

Yangyang Hao, Quan-Yang Duh, Richard T Kloos, Joshua Babiarz, R Mack Harrell, S Thomas Traweek, Su Yeon Kim, Grazyna Fedorowicz, P Sean Walsh, Peter M Sadow, Jing Huang, Giulia C Kennedy
Journal: BMC Systems Biology. Published: 2019-04-05. DOI: 10.1186/s12918-019-0693-z
Citations: 24

Abstract

Background: Identification of Hürthle cell cancers by non-operative fine-needle aspiration biopsy (FNAB) of thyroid nodules is challenging. As a result, non-cancerous Hürthle lesions have conventionally been distinguished from Hürthle cell cancers by histopathological examination of tissue following surgical resection. Reliance on histopathological evaluation requires patients to undergo surgery to obtain a diagnosis, even though most lesions are non-cancerous. It is highly desirable to avoid surgery and to classify nodules accurately as benign or malignant from FNAB preoperatively. In our first-generation algorithm, the Gene Expression Classifier (GEC), we achieved this goal by applying machine learning (ML) to gene expression features. The classifier is sensitive but not specific, due in part to the presence of non-neoplastic benign Hürthle cells in many FNAB samples.

Results: We sought to overcome this low-specificity limitation by expanding the ML feature set using next-generation whole-transcriptome RNA sequencing, and called the improved algorithm the Genomic Sequencing Classifier (GSC). Hürthle cell identification leverages mitochondrial expression, and we developed novel feature extraction mechanisms to measure chromosome-level and genome-level loss of heterozygosity (LOH) for the algorithm. Additionally, we developed a multi-layered system of cascading classifiers to sequentially triage Hürthle cell-containing FNAB by: (1) presence of Hürthle cells, (2) presence of neoplastic Hürthle cells, and (3) presence of benign Hürthle cells. The final Hürthle cell Index uses 1048 nuclear and mitochondrial genes, and the Hürthle cell Neoplasm Index uses LOH features as well as 2041 genes. Both indices are Support Vector Machine (SVM) based. The third classifier, the GSC Benign/Suspicious classifier, uses 1115 core genes and is an ensemble classifier incorporating 12 individual models.
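
The cascade described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `linear_scorer`, `Stage`, `triage` and `STAGES`, the toy three-feature weights, the zero thresholds, and the stop-at-first-negative routing are hypothetical and are not the published GSC decision logic. Each toy scorer only has the form of a linear SVM decision value; the real indices are trained on thousands of genes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Scorer = Callable[[Sequence[float]], float]


def linear_scorer(weights: Sequence[float], bias: float) -> Scorer:
    """Return f(x) = w . x + b, the form of a linear SVM decision value."""
    def score(x: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(weights, x)) + bias
    return score


@dataclass(frozen=True)
class Stage:
    """One classifier in the cascade (hypothetical structure)."""
    name: str
    scorer: Scorer
    threshold: float = 0.0

    def is_positive(self, x: Sequence[float]) -> bool:
        return self.scorer(x) >= self.threshold


def triage(x: Sequence[float], stages: Sequence[Stage]) -> str:
    """Run the stages in order and stop at the first negative one.

    This routing is an assumed reading of "sequentially triage",
    not the routing used by the authors.
    """
    for stage in stages:
        if not stage.is_positive(x):
            return f"negative at: {stage.name}"
    return "positive at every stage"


# Toy stages: each one looks at a single made-up feature.
STAGES = [
    Stage("Hurthle cell index", linear_scorer([1.0, 0.0, 0.0], -0.5)),
    Stage("Hurthle neoplasm index", linear_scorer([0.0, 1.0, 0.0], -0.5)),
    Stage("benign/suspicious classifier", linear_scorer([0.0, 0.0, 1.0], -0.5)),
]

if __name__ == "__main__":
    print(triage([1.0, 1.0, 1.0], STAGES))
    print(triage([1.0, 0.0, 1.0], STAGES))
```

Keeping each stage as a separate object with its own scorer and threshold mirrors the layered design in the abstract: a stage can be retrained or re-thresholded without touching the others.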

Conclusions: The accurate algorithmic depiction of this complex biological system among Hürthle subtypes dramatically improves classification performance: specificity among Hürthle cell neoplasms increases from 11.8% with the GEC to 58.8% with the GSC, while sensitivity is maintained at 89%.
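
For readers less familiar with the two metrics, the short Python sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The counts are hypothetical: the abstract reports only percentages, and the numbers here (17 benign and 9 malignant cases) were chosen because they are consistent with those percentages, not because they are taken from the text.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of malignant cases that the classifier flags as suspicious."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of benign cases that the classifier calls benign."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, assumed only for illustration.
gec_spec = specificity(true_neg=2, false_pos=15)   # 2 of 17 benign called benign
gsc_spec = specificity(true_neg=10, false_pos=7)   # 10 of 17 benign called benign
sens = sensitivity(true_pos=8, false_neg=1)        # 8 of 9 malignant detected

if __name__ == "__main__":
    print(f"GEC specificity: {100 * gec_spec:.1f}%")
    print(f"GSC specificity: {100 * gsc_spec:.1f}%")
    print(f"Sensitivity:     {100 * sens:.0f}%")
```

Under these assumed counts, the gain in specificity corresponds to eight additional benign cases being correctly called benign, with no change in the number of malignant cases detected.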

