E. Crowgey, Pankaj Vats, Karl R. Franke, G. Burnett, Ankit Sethia, T. Harkins, T. Druley
{"title":"Abstract 165: Enhanced processing of genomic sequencing data for pediatric cancers: GPUs and machine learning techniques for variant detection","authors":"E. Crowgey, Pankaj Vats, Karl R. Franke, G. Burnett, Ankit Sethia, T. Harkins, T. Druley","doi":"10.1158/1538-7445.AM2021-165","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-165","url":null,"abstract":"","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90481225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kenneth B. Thomas, Y. Mou, C. Magnan, T. Gyuris, E. Shinbrot, Fernando Díaz, Steven Lau-Rivera, Segun Jung, V. Funari, L. Weiss
{"title":"Abstract 240: Gene fusion calling from RNA panel sequencing data: An ensemble learning approach","authors":"Kenneth B. Thomas, Y. Mou, C. Magnan, T. Gyuris, E. Shinbrot, Fernando Díaz, Steven Lau-Rivera, Segun Jung, V. Funari, L. Weiss","doi":"10.1158/1538-7445.AM2021-240","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-240","url":null,"abstract":"Introduction: Our goal is to improve gene fusion detection via RNA sequencing by combining multiple fusion callers through machine learning techniques. Background: Gene Fusion events are important drivers of malignancy. RNA sequencing (RNAseq) methods for detection of fusions have the advantage that multiple markers can be targeted at one time. Unlike DNA methods, in which it is challenging to capture fusion breakpoints, in RNA methods fusions are readily identified through chimeric transcripts. While many fusion calling algorithms exist for use on RNAseq data, sensitive fusion callers, needed for samples of low tumor content, often present high false positive rates - a result of aligning chimeric transcripts. Further, there currently is no single feature in NGS data that can be used to filter out false positive fusion calls. In order to achieve higher accuracy in fusion calls than can be achieved using individual fusion callers, we have weighted and combined the results of multiple fusion callers by systematic and objective means: an ensemble learning approach based on random forest models. Our method selects from data generated by three independent fusion callers supplemented by metrics obtained from in-house methods. It presents a metric that can be immediately interpreted as the probability that a candidate fusion call is a true fusion call. Methods: Random forest models were generated by use of the randomForest package in R, with tuning by the R caret package. Training data sets consisted of a balanced set of 394 fusion calls from clinical samples of solid tumors. 
For training, fusion calls with at least 10 supporting reads were deemed true or false based on manual review via IGV, and orthogonal methods including PCR with Sanger sequencing and the commercial Archer™ fusion CTL and Sarcoma panels. We present the results of training on data from the three well-known fusion callers Arriba, STAR-Fusion, and FusionCatcher, together with additional data from an in-house developed junction counting method, and fusion membership in a list of known fusions (a “white list”). Models were validated by 10-fold cross-validation. Results: In performance evaluations, false positive and false negative calls were presumed false based on orthogonal determinations. On that basis, our current best model has an accuracy of 94.9% (sensitivity 93.4%, specificity 96.7%). Currently, High Confidence fusion calls (calls with probability score greater than 70%) are the most common positive calls. These have been confirmed with 100% success. Conclusion: We have successfully integrated multiple fusion callers by means of random forest models. Our current model is validated for use on our solid tumor fusion calling pipeline. Citation Format: Kenneth B. Thomas, Yanglong Mou, Christophe Magnan, Tibor Gyuris, Eve Shinbrot, Fernando Lopez Diaz, Steven Lau-Rivera, Segun Jung, Vincent Funari, Lawrence M. Weiss. 
Gene fusion calling from RNA panel sequencing data: An ensemble lear","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78213570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim
{"title":"Abstract 183: End-to-end training of convolutional network for breast cancer detection in two-view mammography","authors":"D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim","doi":"10.1158/1538-7445.AM2021-183","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-183","url":null,"abstract":"Background: Early computer-aided detection systems for mammography have failed to improve the performance of radiologists. With the remarkable success of deep learning, some recent studies have described computer systems with similar or even superior performance to that of human experts. Among them, Shen et al. (Nature Sci. Rep., 2019) present a promising “end-to-end” training approach. Instead of training a convolutional net with whole mammograms, they first train a “patch classifier” that recognizes lesions in small subimages. Then, they generalize the patch classifier to a “whole image classifier” using the property of fully convolutional networks and the end-to-end approach. Using this strategy, the authors obtained a per-image AUC of 0.87 [0.84, 0.90] on the CBIS-DDSM dataset. Standard mammography consists of two views for each breast: bilateral craniocaudal (CC) and mediolateral oblique (MLO). The algorithm proposed by Shen et al. processes only single-view mammography. We extend their work, presenting the end-to-end training of a convolutional net for two-view mammography. Methods: First, we reproduced Shen et al.'s work, using the CBIS-DDSM dataset. We trained a ResNet50-based net for classifying patches with 224x224 pixels using segmented lesions. Then, the weights of the patch classifier were transferred to the whole image single-view classifier, obtained by removing the dense layers from the patch classifier and stacking one ResNet block at the top. This single-view classifier was trained using full images from the same dataset. 
Trying to replicate Shen et al.'s work, we obtained an AUC of 0.8524±0.0560, less than the 0.87 reported in the original paper. We attribute this worsening to the fact that we are using only 2260 images with two views, instead of the 2478 images used in the original work. Finally, we built the two-view classifier that receives CC and MLO views as input. This classifier contains two copies of the patch classifier, loaded with the weights from the single-view classifier. The features extracted by the two patch classifiers are concatenated and submitted to the ResNet block. The two-view classifier is trained end-to-end using full images, refining all its weights, including those inside the two patch classifiers. Results: The two-view classifier yielded an AUC of 0.9199±0.0623 in 5-fold cross-validation when classifying mammograms as malignant or non-malignant, using a single model and no test-time data augmentation. This is better than both Shen et al.'s AUC (0.87) and our single-view AUC (0.85). Zhang et al. (Plos One, 2020) present another two-view algorithm (without end-to-end training) with an AUC of 0.95. However, that work cannot be compared directly with ours, as it was tested on a different set of images. Conclusions: We presented end-to-end training of a convolutional net for two-view mammography. Our system's AUC was 0.92, better than the 0.87 obtained by the previous single-view system. Citation Format: Daniel G. 
Petrini, C","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76684620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Claire J. Guo, Mary Saltarelli, S. Lambert, H. Fang, Chun Zhang
{"title":"Abstract 153: Development of a workflow to handle the quality control and analysis of Olink protein biomarker data in early phase oncology clinical trials","authors":"Claire J. Guo, Mary Saltarelli, S. Lambert, H. Fang, Chun Zhang","doi":"10.1158/1538-7445.AM2021-153","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-153","url":null,"abstract":"","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73607351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arpad M. Danos, Wan-Hsin Lin, J. Saliba, Angshumoy Roy, A. Church, Shruti Rao, D. Ritter, Kilannin Krysiak, A. Wagner, Erica K. Barnell, Lana M. Sheta, Adam C. Coffman, S. Kiwala, Joshua F. McMichael, L. Corson, Kevin E. Fisher, H. Williams, Matthew C. Hiemenz, K. Janeway, J. Ji, Kesserwan A. Chimene, L. Fuqua, L. Dyer, Huiling Xu, Jeffrey Jean, L. Satgunaseelan, Liying Zhang, T. Laetsch, D. Parsons, Ryan J. Schmidt, L. Schriml, K. Sund, S. Kulkarni, Subha Madhavan, Xinjie Xu, R. Kanagal-Shamana, M. Harris, Y. Akkari, Nurit Paz Yacov, P. Terraf, M. Griffith, O. Griffith, G. Raca
{"title":"Abstract 210: Advancing knowledgebase representation of pediatric cancer variants through ClinGen/CIViC collaboration","authors":"Arpad M. Danos, Wan-Hsin Lin, J. Saliba, Angshumoy Roy, A. Church, Shruti Rao, D. Ritter, Kilannin Krysiak, A. Wagner, Erica K. Barnell, Lana M. Sheta, Adam C. Coffman, S. Kiwala, Joshua F. McMichael, L. Corson, Kevin E. Fisher, H. Williams, Matthew C. Hiemenz, K. Janeway, J. Ji, Kesserwan A. Chimene, L. Fuqua, L. Dyer, Huiling Xu, Jeffrey Jean, L. Satgunaseelan, Liying Zhang, T. Laetsch, D. Parsons, Ryan J. Schmidt, L. Schriml, K. Sund, S. Kulkarni, Subha Madhavan, Xinjie Xu, R. Kanagal-Shamana, M. Harris, Y. Akkari, Nurit Paz Yacov, P. Terraf, M. Griffith, O. Griffith, G. Raca","doi":"10.1158/1538-7445.AM2021-210","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-210","url":null,"abstract":"Childhood cancers are driven by unique profiles of somatic genetic alterations, with a significant contribution from predisposing germline variants. Understanding the genomic landscape of pediatric cancers is complicated by their rarity, the heterogeneity of variation within a given disease, and the complex forms of structural variation they contain. Variants in childhood disease may differ from those in adult versions of the same cancer type, or may have different clinical significance. Currently, pediatric variants are underrepresented in cancer variant databases, and an urgent need exists for their publicly available expert curation. To address this, the Pediatric Cancer Taskforce (PCT) was formed within the Clinical Genome Resource (ClinGen) Somatic Cancer Clinical Domain Working Group (CDWG) (https://www.clinicalgenome.org/working-groups/somatic/). The PCT is a multi-institutional group of 39 members with broad experience in childhood cancer and variant curation, whose work consists of standardization and classification of genetic variants in pediatric cancers. 
The CIViC knowledgebase (www.civicdb.org) is a freely available resource for Clinical Interpretation of Variants in Cancer, which leverages public curation and expert moderation to address the problem of annotating the large volume of clinically actionable cancer variants. PCT curators work together with PCT expert members and the CIViC team on variant curation, and have submitted over 230 Evidence Items and over 10 Assertions to CIViC. To further address issues specific to pediatric curation, the PCT is working with CIViC to develop new pediatric-specific CIViC features and modifications of the data model that will aid in pediatric curation. A pediatric user interface, as well as representation of large scale structural and copy number variation are being developed for version two of CIViC, expected to be released in 1-2 years, which will enable curation of a new class of structural variants often encountered in pediatric cancer. A novel standard operating procedure for childhood cancer curation in CIViC is being developed by PCT experts, curators and the CIViC team. This SOP will cover topics including curation of structural variants, as well as pediatric-specific variant tiering guidelines which take into account the sparse nature of evidence in pediatric cases. A companion resource, CIViCmine (http://bionlp.bcgsc.ca/civicmine/), will be further developed to incorporate pediatric data. These and other joint efforts of the PCT and CIViC will significantly enhance pediatric variant representation for public use, to support the care of children with cancer. Citation Format: Arpad Danos, Wan-Hsin Lin, Jason Saliba, Angshumoy Roy, Alanna J. Church, Shruti Rao, Deborah Ritter, Kilannin Krysiak, Alex Wagner, Erica Barnell, Lana Sheta, Adam Coffman, Susanna Kiwala, Joshua F. McMichael, Laura Corson, Kevin Fisher, Heather E. Williams, Matthew Hiemenz, Katherine A. 
Janeway, Jianling Ji, Kess","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84487140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Abstract 225: Computational analysis of 5-fluorouracil antitumor activity in colon cancer using a mechanistic pharmacokinetic/pharmacodynamic model","authors":"Chenhui Ma, A. Almasan, Evren Gurkan-Cavusoglu","doi":"10.1158/1538-7445.AM2021-225","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-225","url":null,"abstract":"","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72859961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Parikh, O. Elemento, Neel S. Madhukar, Coryandar Gilvary
{"title":"Abstract 220: Identifying novel oncology targets and positioning existing targets through the prediction of cancer dependencies","authors":"M. Parikh, O. Elemento, Neel S. Madhukar, Coryandar Gilvary","doi":"10.1158/1538-7445.AM2021-220","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-220","url":null,"abstract":"","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83278925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianjiong Gao, T. Mazor, Ino de Bruijn, Adam Abeshouse, Diana Baiceanu, Ziya Erkoç, Benjamin E. Gross, David M Higgins, P. Jagannathan, Karthik Kalletla, P. Kumari, Ritika Kundra, Xiang Li, James Lindsay, Aaron Lisman, Pieter Lukasse, Divya Madala, Ramyasree Madupuri, Angelica Ochoa, Oleguer Plantalech, Joyce Quach, Sander Y. A. Rodenburg, Anusha Satravada, F. Schaeffer, R. Sheridan, Lucas Sikina, S. O. Sumer, Yichao Sun, P. van Dijk, P. van Nierop, Avery Wang, Manda Wilson, Hongxin Zhang, Gaofei Zhao, Sjoerd van Hagen, K. van Bochove, U. Dogrusoz, Allison P. Heath, A. Resnick, Trevor J Pugh, C. Sander, E. Cerami, N. Schultz
{"title":"Abstract 207: The cBioPortal for Cancer Genomics","authors":"Jianjiong Gao, T. Mazor, Ino de Bruijn, Adam Abeshouse, Diana Baiceanu, Ziya Erkoç, Benjamin E. Gross, David M Higgins, P. Jagannathan, Karthik Kalletla, P. Kumari, Ritika Kundra, Xiang Li, James Lindsay, Aaron Lisman, Pieter Lukasse, Divya Madala, Ramyasree Madupuri, Angelica Ochoa, Oleguer Plantalech, Joyce Quach, Sander Y. A. Rodenburg, Anusha Satravada, F. Schaeffer, R. Sheridan, Lucas Sikina, S. O. Sumer, Yichao Sun, P. van Dijk, P. van Nierop, Avery Wang, Manda Wilson, Hongxin Zhang, Gaofei Zhao, Sjoerd van Hagen, K. van Bochove, U. Dogrusoz, Allison P. Heath, A. Resnick, Trevor J Pugh, C. Sander, E. Cerami, N. Schultz","doi":"10.1158/1538-7445.am2021-207","DOIUrl":"https://doi.org/10.1158/1538-7445.am2021-207","url":null,"abstract":"207: The cBioPortal for Cancer Genomics Author & Article Information Cancer Res (2021) 81 (13_Supplement): 207. https://doi.org/10.1158/1538-7445.AM2021-207","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84333756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Mehrabadi, S. Malikić, Kerrie L. Marie, Eva Pérez-Guijarro, Erfan Sadeqi Azer, Howard H. Yang, Can Kızılkale, Charli Gruen, Huaitian Liu, C. Marcelus, A. Buluç, Funda Ergün, M. Lee, G. Merlino, Chi-Ping Day, S. C. Sahinalp
{"title":"Abstract LB019: Trisicell: Scalable Tumor Phylogeny Reconstruction and Validation Reveals Developmental Origin and Therapeutic Impact of Intratumoral Heterogeneity","authors":"F. Mehrabadi, S. Malikić, Kerrie L. Marie, Eva Pérez-Guijarro, Erfan Sadeqi Azer, Howard H. Yang, Can Kızılkale, Charli Gruen, Huaitian Liu, C. Marcelus, A. Buluç, Funda Ergün, M. Lee, G. Merlino, Chi-Ping Day, S. C. Sahinalp","doi":"10.1158/1538-7445.AM2021-LB019","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-LB019","url":null,"abstract":"Emerging sets of single-cell sequencing data make it appealing to apply existing tumor phylogeny reconstruction methods to analyze associated intratumor heterogeneity. Unfortunately, tumor phylogeny inference is an NP-hard problem and existing principled methods typically fail to scale up to handle the thousands of cells and mutations observed in emerging single-cell data sets. Even though there are greedy heuristics to build hierarchical clusterings of cells and mutations, they suffer from well-documented issues in accuracy. Additionally, even when “optimal” solutions are feasible, existing approaches only provide a single “most likely” tree to depict the evolutionary processes that may result in an observed collection of cells and mutations. To make matters worse, the vast majority of single-cell sequencing data sets are transcriptomic and, as a result, suffer from considerable variation in coverage across mutational loci. In this paper, we introduce Trisicell, a computational toolkit for scalable tumor phylogeny reconstruction and validation from single-cell genomic, exomic or transcriptomic sequencing data. 
Trisicell has three components: (i) Trisicell-DnC, a new tumor phylogeny reconstruction method from genotype matrices derived from single-cell data, (ii) Trisicell-ConT, a new algorithm for constructing the consensus for two or more tumor phylogenies, which may be built through the use of different data types on the same set of cells, or built through the use of different methods on the same data, and (iii) Trisicell-PF, a new partition function method for assessing the likelihood of any user-defined subtree/set of cells to be seeded by a given set of mutations in the phylogeny. Collectively, these tools provide means of identifying and validating robust portions of a tumor phylogeny, offering the ability to focus on the most important (sub)clones and the genomic alterations that seed the associated clonal expansion. We applied Trisicell to a panel of clonal sublines derived from single cells of a parental mouse melanoma model on which we performed both whole exome and whole transcriptome sequencing. The tumor phylogenies of the clonal sublines built on exomic and transcriptomic mutations by Trisicell-DnC were shown by Trisicell-ConT to be highly similar, and the subtrees comprised of phenotypically similar clonal sublines were shown by Trisicell-PF to be strongly associated with their seeding mutations. In addition, we applied Trisicell to single-cell whole transcriptome sequencing data from a tumor derived from the same parental melanoma cell line, which was subjected to anti-CTLA-4 immunotherapy. The phylogenies generated from both studies featured distinct subtrees, strongly associated with phenotypes including cell differentiation status, tumor growth and therapeutic response. 
These results suggest that Trisicell can be used for scalable tumor phylogeny reconstruction and validation through both single-cell and clonal-subline sequencing data,","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89145061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim
{"title":"Abstract 181: High-accuracy breast cancer detection in mammography using EfficientNet and end-to-end training","authors":"D. Petrini, C. Shimizu, G. Valente, Guilherme Folgueira, Guilherme Apolinario Silva Novaes, M. H. Katayama, P. Serio, R. A. Roela, T. Tucunduva, M. A. K. Folgueira, Hae Yong Kim","doi":"10.1158/1538-7445.AM2021-181","DOIUrl":"https://doi.org/10.1158/1538-7445.AM2021-181","url":null,"abstract":"Background: Breast cancer (BC) is the second most common cancer among women. BC screening is usually based on mammography interpreted by radiologists. Recently, some researchers have used deep learning to automatically diagnose BC in mammography and so assist radiologists. The progress of BC detection algorithms can be measured by their performance on public datasets. The CBIS-DDSM is a widely used public dataset composed of scanned mammograms, equally divided into malignant and non-malignant (benign) images. Each image is accompanied by the segmentation of the lesion. Shen et al. (Nature Sci. Rep., 2019) presented a BC detection algorithm using an “end-to-end” approach to train deep neural networks. In this algorithm, a patch classifier is first trained to classify local image patches. The patch classifier's weights are then used to initialize the whole image classifier, which is refined using datasets with the cancer status of the whole image. They achieved an AUC of 0.87 [0.84, 0.90] in classifying CBIS-DDSM images, using their best single-model, single-view breast classifier. They used ResNet (He et al., CVPR 2016) as the basis of their algorithm. Our hypothesis was that replacing the old ResNet with the modern EfficientNet (Tan et al., arXiv 2019) and MobileNetV2 (Sandler et al., CVPR 2018) would result in greater accuracy. Methods: We tested many different models and concluded that the best model is obtained using EfficientNet-B4 as the base model, with a MobileNetV2 block at the top, followed by a dense layer with two output categories. 
We trained the patch classifier using 52,528 patches with 224x224 pixels extracted from CBIS-DDSM. From each image, we extracted 20 patches: 10 patches containing the lesion and 10 from the background (without lesion). The patch classifier weights were then used to initialize the whole image classifier, which was trained using the end-to-end approach with CBIS-DDSM images resized to 1152x896 pixels, with data augmentation. The training was performed using a step learning rate of 1e-4 for the first 20 epochs, then 1e-5 for the remaining 10, and a batch size of 4, using 10-fold cross-validation. We used 81% of the dataset for training, 9% for validation and 10% for testing. Results: We obtained an AUC of 0.8963±0.06, using a single-model, single-view classifier and without test-time data augmentation. Conclusions: Using EfficientNet and MobileNetV2 as the basis of the BC detection algorithm (instead of ResNet), we obtained an improvement in classifying CBIS-DDSM images into malignant/non-malignant: the AUC increased from 0.87 to 0.896. Our AUC is also larger than those reported in other recent papers, such as Shu et al. (IEEE Trans Med. Image, 2020), who achieved an AUC of 0.838 on the same CBIS-DDSM dataset. Citation Format: Daniel G. Petrini, Carlos Shimizu, Gabriel V. Valente, Guilherme Folgueira, Guilherme A. Novaes, Maria L. Katayama, Pedro Serio, Rosimeire A. Roela, Tatiana C. Tucunduva, Maria Aparecida A. 
Folgu","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":"85 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86986647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}