Giang T. T. Nguyen, Le Due Hoang, Q. Nguyen, T. Nguyen, Hien Dang, Duc-Hau Le
An investigation of cancer cell line-based drug response prediction methods on patient data
2020 12th International Conference on Knowledge and Systems Engineering (KSE), published 2020-11-12
DOI: 10.1109/KSE50997.2020.9287633
Abstract
The most significant goal of precision medicine is to identify the right treatment for each patient based on their molecular profile. Several large projects have released large amounts of -omics and drug response data, both for human cell lines (e.g., GDSC and CCLE) and for patients (e.g., GEO). Using these datasets, computational methods are increasingly applied to predict not only untested drug responses in cell lines but also drug responses in patients. Such approaches build drug response prediction models on cell line data and then apply the learned models to predict drug response in patients. This strategy also helps to bridge the gap between models trained on cell lines and their clinical application. However, the datasets are highly heterogeneous in the array technologies used, the drug response measurements, and other respects, which leads to inconsistent results across computational methods on different datasets. Therefore, in this study, we assessed seven machine learning models built on cell line datasets and then applied them to patient datasets. Experimental results show that models built on pan-cancer cell lines do not work well on every cancer-specific patient dataset. In addition, larger patient datasets are suggested for measuring the prediction performance of each method correctly.
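The transfer workflow described above (train on cell lines, apply to patients, score against clinical response) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' pipeline: the per-dataset, per-gene z-scoring is one simple way to reduce platform heterogeneity; k-nearest-neighbor regression on log IC50 is a stand-in model, not necessarily one of the seven assessed in the study; and the names `zscore_per_gene`, `knn_predict`, and `auc`, as well as the toy data, are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_per_gene(samples):
    """Standardize each gene (column) to mean 0 and SD 1 within ONE dataset.

    Calling this separately on the cell line and patient matrices is an assumed
    harmonization step that removes platform-specific shifts and scales.
    Constant genes (SD 0) are mapped to 0 instead of dividing by zero.
    """
    n_genes = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_genes)]
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(row[j] - m) / s for j, (m, s) in enumerate(stats)] for row in samples]


def knn_predict(train_x, train_y, query, k=3):
    """Predict a drug response as the mean response of the k nearest training samples."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    return mean(y for _, y in ranked[:k])


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney pairwise statistic (label 1 = responder); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy (hypothetical) data with 2 genes; the patient platform measures the same
# biology on a shifted and rescaled scale (gene1 = 10 + 2*x, gene2 = 3 + 2*x).
cell_x = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
cell_log_ic50 = [4.0, 3.0, 2.0, 1.0]  # lower IC50 = more sensitive
patient_x = [[10.0, 5.0], [12.0, 3.0], [14.0, 5.0], [16.0, 3.0]]
patient_response = [0, 0, 1, 1]  # 1 = clinical responder

# Train on cell lines, then apply the learned model to patients.
cell_z = zscore_per_gene(cell_x)
patient_z = zscore_per_gene(patient_x)
predicted_ic50 = [knn_predict(cell_z, cell_log_ic50, p, k=1) for p in patient_z]

# A lower predicted IC50 should indicate a likelier responder, so negate it as the score.
patient_auc = auc([-v for v in predicted_ic50], patient_response)
print(predicted_ic50, patient_auc)
```

Without the per-dataset standardization, the raw patient values would sit far from every cell line and the neighbor search would be driven by the platform offset rather than by biology. Real cell line and patient matrices would also need to be restricted to a shared gene set first. With only a handful of patients, a single misranked pair shifts the AUC substantially, which is one reason larger patient datasets give more trustworthy performance estimates.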