Tailored chemotherapy: Innovative deep-learning model customizing chemotherapy for high-grade serous ovarian carcinoma

IF 7.9 · Zone 1 (Medicine) · JCR Q1: MEDICINE, RESEARCH & EXPERIMENTAL

Clinical and Translational Medicine Pub Date : 2024-09-07 DOI:10.1002/ctm2.1774

Se Ik Kim, Sangick Park, Eunyong Ahn, Jeunhui Kim, HyunA Jo, Juwon Lee, Untack Cho, Maria Lee, Cheol Lee, Danny N. Dhanasekaran, Taejin Ahn, Yong Sang Song





Dear Editor,

This study presents a novel RNA-seq-based deep-learning model for predicting resistance to platinum-based chemotherapy in high-grade serous ovarian carcinoma (HGSOC), with the aim of personalizing chemotherapy and improving patient outcomes. By leveraging diverse ovarian tissue transcriptome datasets and deep ensemble learning, the model identified chemo-resistant HGSOC patients after initial platinum-based chemotherapy with performance that prioritized sensitivity (sensitivity 100%, specificity 54.1% and area under the curve [AUC] 0.85). This may inform treatment strategies and enhance clinical reliability.

HGSOC remains a significant health burden with high mortality worldwide and is often diagnosed late because screening is ineffective.¹ Furthermore, despite extensive surgery and chemotherapy, chemo-resistance remains a major challenge of platinum-based therapy in HGSOC, so accurate prediction methods are needed to improve patient outcomes and guide treatment decisions. Predicting sensitivity to platinum-based therapy is the first step toward personalized medicine for HGSOC, as it may allow targeted agents to be incorporated.² Genetic profiles show potential for predicting resistance to platinum-based chemotherapy in HGSOC and can supplement inadequate clinicopathologic data.³ Yet reliance on genomic data alone faces challenges due to tumour heterogeneity.⁴ Epigenetic factors and DNA methylation patterns offer promise in predicting chemotherapy response, and RNA-seq data aid in predicting chemo-resistance, but with the small numbers of samples studied, clinical applicability requires further validation.⁵ Differences in gene expression among racial groups in HGSOC also confound accurate prediction of survival outcomes.⁶

Here, we adopt strategic approaches to extract universal chemo-resistance traits from public data covering diverse ethnic backgrounds, aiming for accurate prediction with a small sample size. We used RNA-seq data from fresh-frozen primary ovarian cancer tissue from The Cancer Genome Atlas (TCGA), Seoul National University Hospital (SNUH) and the dataset of Patch et al. (Patch).⁷ TCGA, whose patients are mostly Caucasian, comprises 208 HGSOC patients (chemo-resistant group: 149, chemo-sensitive group: 59). Patch comprises 40 (24, 16) Australian HGSOC patients. SNUH comprises 86 (14, 72) Korean HGSOC patients, to whom the same resistance criteria (no recurrence within 6 months) were applied after initial platinum-based chemotherapy. No significant differences were observed in age, CA-125 levels, or FIGO stage between chemo-resistant and chemo-sensitive cases (Table S1).

The study proceeded through three phases: data preprocessing, gene selection, and deep learning (Figure 1).

We aligned TCGA and SNUH FASTQ files to GRCh38 using HISAT2.0 and derived gene expression values in transcripts per million (TPM). For Patch, only TPM data were available. After combining the TCGA, SNUH and Patch TPM data by Ensembl ID, we filtered out lowly expressed genes, leaving 14 902 Ensembl IDs. Each dataset was split 2:1 into training and test sets while keeping the balance between chemo-resistant and chemo-sensitive cases (Figure 2).
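The class-balanced 2:1 split described above can be sketched as a stratified split. This is a minimal illustration in plain Python, not the authors' code: the function name `stratified_split`, the label strings, the sample IDs and the random seed are assumptions. Only the SNUH class counts (14 resistant, 72 sensitive) are taken from the letter.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample IDs into train/test sets, keeping the class ratio in both.

    labels: dict mapping sample ID -> class label (hypothetical format).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, label in labels.items():
        by_class[label].append(sample_id)

    train, test = [], []
    for label in sorted(by_class):
        ids = sorted(by_class[label])      # fixed order before shuffling
        rng.shuffle(ids)
        n_train = round(len(ids) * train_fraction)
        train.extend(ids[:n_train])
        test.extend(ids[n_train:])
    return train, test


# Toy labels using the SNUH class counts reported in the letter.
labels = {f"R{i:02d}": "resistant" for i in range(14)}
labels.update({f"S{i:02d}": "sensitive" for i in range(72)})

train_ids, test_ids = stratified_split(labels)
n_resistant_train = sum(labels[s] == "resistant" for s in train_ids)
```

With these counts the sketch puts 57 samples in the training set (9 resistant, 48 sensitive) and 29 in the test set, so both sets keep roughly the original class ratio.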

For gene selection, we used two strategies. The first aimed to capture the features most concordant across all datasets. Student's t-tests were conducted for each gene within each dataset, and genes with a p-value < .05 were selected. The intersection of the per-dataset lists yielded four genes (tier1).
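The intersection step of the tier1 strategy can be sketched as follows. The sketch assumes the per-gene p-values have already been computed for each dataset (for example with SciPy's `scipy.stats.ttest_ind`). The function name `tier1_genes`, the gene names and the p-values are made up for illustration and do not come from the letter.

```python
def tier1_genes(pvalues_by_dataset, alpha=0.05):
    """Return the genes whose p-value is below alpha in every dataset.

    pvalues_by_dataset: dict of dataset name -> {gene: p-value}
    (hypothetical input format).
    """
    significant_sets = [
        {gene for gene, p in pvalues.items() if p < alpha}
        for pvalues in pvalues_by_dataset.values()
    ]
    if not significant_sets:
        return set()
    return set.intersection(*significant_sets)


# Made-up p-values for three placeholder genes.
toy_pvalues = {
    "TCGA": {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.03},
    "SNUH": {"GENE_A": 0.04, "GENE_B": 0.01, "GENE_C": 0.30},
    "Patch": {"GENE_A": 0.02, "GENE_B": 0.03, "GENE_C": 0.01},
}
shared = tier1_genes(toy_pvalues)
```

Here only `GENE_A` is significant in all three toy datasets, so it is the only gene retained. Requiring significance in every dataset is what makes this tier conservative.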

The second strategy involved identifying genes differentially expressed within each dataset. Across 100 bagging trials, each with a balanced number of chemo-resistant and chemo-sensitive samples, genes were selected if they were significant by Mann-Whitney U-test (p-value < .05) in more than 80 trials. This yielded 27 genes (4, 7 and 16 genes from TCGA, SNUH and Patch, respectively) (tier2). Combining these with the four tier1 genes resulted in 31 genes for predicting chemo-resistance (Tables S1).
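The tier2 procedure can be sketched as repeated balanced subsampling followed by a per-gene Mann-Whitney U-test. This is an illustration under stated assumptions, not the authors' implementation. Each trial here subsamples both classes without replacement to the size of the smaller class, and the p-value uses the tie-corrected normal approximation (a library routine such as `scipy.stats.mannwhitneyu` would be a natural substitute). All function names, the data layout and the toy data are hypothetical.

```python
import random
from statistics import NormalDist


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value, normal approximation with tie correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2            # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                            # all values identical
        return 1.0
    z = (u - n1 * n2 / 2) / var_u ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def tier2_genes(expr, labels, n_trials=100, min_hits=80, alpha=0.05, seed=0):
    """expr: {gene: {sample: value}}; labels: {sample: 1 (resistant) or 0 (sensitive)}."""
    rng = random.Random(seed)
    resistant = sorted(s for s, lab in labels.items() if lab == 1)
    sensitive = sorted(s for s, lab in labels.items() if lab == 0)
    k = min(len(resistant), len(sensitive))   # balanced class size per trial
    hits = dict.fromkeys(expr, 0)
    for _ in range(n_trials):
        r = rng.sample(resistant, k)
        s = rng.sample(sensitive, k)
        for gene, values in expr.items():
            p = mann_whitney_p([values[i] for i in r], [values[i] for i in s])
            if p < alpha:
                hits[gene] += 1
    # "significant in over 80 trials" is read as strictly more than min_hits
    return {gene for gene, h in hits.items() if h > min_hits}


# Toy data: GENE_A separates the classes perfectly, GENE_B is constant.
labels = {f"R{i}": 1 for i in range(10)}
labels.update({f"S{i}": 0 for i in range(30)})
expr = {"GENE_A": {}, "GENE_B": {}}
for idx, (sample, lab) in enumerate(labels.items()):
    expr["GENE_A"][sample] = 100.0 + idx if lab == 1 else float(idx)
    expr["GENE_B"][sample] = 5.0

selected = tier2_genes(expr, labels)
```

In the toy data `GENE_A` is significant in every trial and `GENE_B` in none, so only `GENE_A` passes the 80-trial threshold. Counting hits across trials favors genes whose signal is stable under resampling rather than driven by a few samples.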

TCGA training samples were split into five class-balanced folds. For each fold, models were trained with 2160 hyperparameter combinations using the Adam optimizer and binary cross-entropy loss (Table S4). The models from each fold were then applied to the SNUH training data, and the best-performing model per fold was selected (Table S5). The outputs of the five selected models were averaged to predict chemo-resistance.
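The select-then-average step can be sketched as below. This is a simplified illustration, not the authors' network code: models are stand-in Python callables that return a resistance probability, and the `score` function stands in for whatever performance measure was computed on the SNUH training data. The names `build_ensemble` and `score_on_validation` are hypothetical, and the toy example uses two folds rather than five.

```python
from statistics import mean


def build_ensemble(candidates_by_fold, score):
    """Keep the best-scoring candidate of each fold and average their outputs.

    candidates_by_fold: list (one entry per fold) of lists of callables,
                        each mapping a sample to a probability.
    score:              callable mapping a model to a number (higher is better).
    """
    selected = [max(candidates, key=score) for candidates in candidates_by_fold]

    def predict(sample):
        return mean(model(sample) for model in selected)

    return predict


# Toy setup: two folds with two constant "models" each, and one
# validation sample whose true label is 1 (resistant).
candidates_by_fold = [
    [lambda x: 0.2, lambda x: 0.8],
    [lambda x: 0.6, lambda x: 0.4],
]


def score_on_validation(model):
    # Higher is better: negative absolute error against the true label 1.
    return -abs(model(None) - 1.0)


ensemble = build_ensemble(candidates_by_fold, score_on_validation)
probability = ensemble(None)
```

In this toy case the models returning 0.8 and 0.6 are selected, and the ensemble output is their average. Averaging models trained on different folds tends to reduce the variance of any single model, which matters when training data are scarce.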

The deep ensemble model achieved AUCs of 0.721 and 0.85 for TCGA and SNUH, with sensitivities of 0.75 and 1.0 and specificities of 0.68 and 0.541, respectively. Another model built on 16 previously reported genes yielded AUCs of 0.716 and 0.717 for TCGA and SNUH, with sensitivities of 0.75 and 0.6 and specificities of 0.62 and 0.458, respectively. Our 31 selected genes thus outperformed the previously reported genes when the same method and data were used. In addition, the model built on the 31 genes showed a significantly higher AUC than models built on the same number of randomly drawn genes (Figure S1). These findings indicate that both the larger number of genes and their potential biological relevance contribute to the improved performance.
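For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity and AUC are computed from true labels and predicted scores. The labels and scores are invented toy values, and the 0.5 decision threshold is an assumption, since the letter does not state the threshold used. Chemo-resistant cases are treated as the positive class.

```python
def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity with label 1 (resistant) as the positive class."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 2 resistant and 3 sensitive patients.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.3, 0.2]

sens, spec = sensitivity_specificity(y_true, scores)
area = auc(y_true, scores)
```

In the toy example every resistant case is caught (sensitivity 1.0) at the cost of one false positive (specificity 2/3), with an AUC of 5/6. This mirrors the trade-off in the reported SNUH results, where full sensitivity comes with moderate specificity.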

Visualizing the information held in the last layer of our deep ensemble model shows consistent chemo-resistance classification across the TCGA and SNUH datasets, despite the differences in ethnic composition between them (Figure S2).

The 31 identified genes show 100% sensitivity in the Korean (SNUH) cohort. Among them, the network of the four tier1 genes highlights pathways such as “Cell Cycle: G1/S Checkpoint Regulation” and “DNA Methylation and Transcriptional Repression Signaling.” Key genes include TP53, E2F1, E2F4, HDAC1, HDAC2 and MYC1 (Figure S3A and Table S6). TP53 mutations induce chemotherapy resistance, which makes p53 complexes a candidate therapeutic target.⁹ E2F predicts chemoresistance, and histone deacetylases are under study in ovarian cancer trials. MYC1 is upregulated in chemo-resistant ovarian cancer cells. Among the functions of the 31 genes, the “Ribonucleotide Reductase Signaling Pathway” stands out. It includes TP53, E2F, CDK4 and CREB1, suggesting functional coupling between tier1 and tier2 genes (Figure S3B and Table S7). Targeting this pathway restores chemo-sensitivity in chemo-resistant ovarian cancer: CDK4 inhibition effectively restores chemo-sensitivity in vivo, while inhibiting CREB1 phosphorylation sensitizes chemo-resistant cells to platinum, which is crucial for preventing tumor recurrence.¹⁰

This study developed a deep ensemble model to predict chemoresistance in HGSOC patients. To compensate for the limited sample size of HGSOC patient data, we combined publicly available data with newly collected samples. By combining features common to all datasets with features specific to each data source, we identified 31 genes for predicting chemo-resistance in this population. These genes achieved 100% sensitivity, 54.1% specificity and an AUC of 0.85 in the validation dataset and have documented roles in ovarian cancer chemo-resistance. The approach may be useful for building prediction models from a limited sample size in conjunction with public resources. In particular, the identified genes and prediction models merit further research to understand their biological significance and their application in other ovarian cancer studies with limited sample sizes.

Se Ik Kim, Sangick Park and Eunyong Ahn contributed equally to this work. Se Ik Kim, Taejin Ahn and Yong Sang Song designed the study; Sangick Park and Jeunhui Kim analyzed the data and developed the prediction model; Se Ik Kim, Sangick Park, HyunA Jo, Juwon Lee, Untack Cho, Maria Lee, Cheol Lee and Danny N. Dhanasekaran collected pathological and clinical data; Cheol Lee reviewed and confirmed the pathological condition; Maria Lee, Cheol Lee, Danny N. Dhanasekaran and TP provided suggestions for the manuscript analysis results; Se Ik Kim and Sangick Park wrote the first draft of the manuscript; Eunyong Ahn reviewed the manuscript and wrote the final version of the manuscript; Taejin Ahn and Yong Sang Song supervised the research; all authors reviewed the manuscript, and approved the final report.

The authors declare no conflict of interest.

This work was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), Republic of Korea (No. HI16C2037) and by the National Research Foundation of Korea (NRF-2019R1C1C1008185, 2022R1F1A1073939).

This study was approved by the Institutional Review Board of SNUH (No. H-1807-037-956). We conducted this study in accordance with the Declaration of Helsinki. All patients in the SNUH cohort provided written informed consent and donated their cancer tissues for scientific purposes.


Source journal: Clinical and Translational Medicine

CiteScore: 15.90
Self-citation rate: 1.90%
Articles published: 450
Review time: 4 weeks

About the journal: Clinical and Translational Medicine (CTM) is an international, peer-reviewed, open-access journal dedicated to accelerating the translation of preclinical research into clinical applications and fostering communication between basic and clinical scientists. It highlights the clinical potential and application of various fields including biotechnologies, biomaterials, bioengineering, biomarkers, molecular medicine, omics science, bioinformatics, immunology, molecular imaging, drug discovery, regulation, and health policy. With a focus on the bench-to-bedside approach, CTM prioritizes studies and clinical observations that generate hypotheses relevant to patients and diseases, guiding investigations in cellular and molecular medicine. The journal encourages submissions from clinicians, researchers, policymakers, and industry professionals.