Gözde Ayse Tataroglu, G. Ozbulak, Kazim Kivanç Eren
2020 28th Signal Processing and Communications Applications Conference (SIU), published 2020-10-05. DOI: 10.1109/SIU49456.2020.9302443 (https://doi.org/10.1109/SIU49456.2020.9302443)
Determination of the Genetic Variant Reliability Using SHAP Approach
Analyzing genetic variants is important for detecting the diseases associated with them, and identifying changes in a variant is important for accurate diagnosis and appropriate solutions. One of the biggest problems in variant classification is the reliability of the data sets used as input to the classification models. This study proposes a machine-learning-based system design that determines the reliability of variants before they are introduced to a variant scoring model, with the aim of providing more reliable training data for variant scoring systems. The SHapley Additive exPlanations (SHAP) method was used to determine the most influential features. In experiments on ClinVar, one of the data sets in which this problem is observed, classifiers for detecting contradictory (conflicting) cases were built with Support Vector Machines (SVMs) and gradient-boosted trees (XGBoost). For this detection task, the 157 features were reduced to 41 for the SVM model and to 13 for the XGBoost model, and the results were very close to the performance obtained with all features.
Keywords: Variant Conflicting Detection, SHAP, Machine Learning Interpretability, SVM, XGBoost.
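The feature-reduction idea in the abstract (rank features by their Shapley attributions, then keep only the top-ranked ones) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: it computes exact Shapley values by enumerating feature subsets instead of using the SHAP library, and `toy_model`, the four unnamed features, the all-zero baseline, and the three samples are hypothetical stand-ins for a trained SVM or XGBoost classifier and the ClinVar data.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of sample x, relative to baseline.

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is only
    practical for toy problems.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                without_i = [x[j] if j in members else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def rank_features(model, samples, baseline):
    """Rank feature indices by mean absolute Shapley value over the samples."""
    n = len(baseline)
    totals = [0.0] * n
    for x in samples:
        for i, value in enumerate(shapley_values(model, x, baseline)):
            totals[i] += abs(value)
    importance = [t / len(samples) for t in totals]
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return order, importance


def toy_model(features):
    """Hypothetical scoring function; the fourth feature has no effect."""
    a, b, c, d = features
    return 3.0 * a + 0.5 * b + 2.0 * a * c + 0.0 * d


# Hypothetical data: three samples with four features each.
samples = [[1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 1.0, 3.0], [0.0, 2.0, 2.0, 1.0]]
baseline = [0.0, 0.0, 0.0, 0.0]

order, importance = rank_features(toy_model, samples, baseline)
top_k = order[:2]  # keep the two most influential features
print("ranking:", order)
print("selected:", top_k)
```

In a realistic setting the reduced feature set would then be used to retrain the classifier, and its performance compared with the model trained on all features, which is the comparison the abstract reports for 41 (SVM) and 13 (XGBoost) of the 157 features.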