Author Matching Classification on a Highly Imbalanced Bibliographic Data using Cost-Sensitive Deep Neural Network
Firdaus, Suci Dwi Lestari, S. Nurmaini, R. F. Malik, M. N. Rachmatullah, Annisa Darmawahyuni, Ade Iriani Sapitri, Mohammad El Qiliqsandy
2021 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), 2021-10-28
DOI: 10.1109/ICIMCIS53775.2021.9699331
Abstract
One of the stages before author matching classification is combining the data, and the resulting dataset is highly imbalanced between authors who match and authors who do not. This paper presents a method for handling this severe class imbalance in author matching classification. The method uses a Cost-Sensitive Deep Neural Network (CSDNN), which assigns costs that vary with the type of misclassification. We use cosine similarity as the text feature similarity measure and Digital Bibliography & Library Project (DBLP) data as the dataset. The method achieves a specificity of 0.99, precision of 0.95, recall of 0.96, F1-score of 0.96, and accuracy of 0.99.
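The abstract does not give the network architecture, the cost values, or how the text features are tokenized, so the following Python sketch only illustrates the three ingredients it names: cosine similarity as a text feature, a loss whose penalty depends on the type of misclassification, and the reported evaluation metrics. The whitespace bag-of-words tokenization, the function names, and the cost values `cost_fn=5.0` and `cost_fp=1.0` are assumptions made for illustration, not details from the paper.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors of two strings.

    Whitespace tokenization is an assumption; the paper's features may differ.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty string shares no terms with anything
    return dot / (norm_a * norm_b)


def cost_sensitive_loss(y_true, p_match, cost_fn=5.0, cost_fp=1.0):
    """Mean binary cross-entropy with a separate cost per error type.

    cost_fn scales the penalty on true matches (label 1) scored low,
    cost_fp scales the penalty on non-matches (label 0) scored high.
    Both default values are hypothetical.
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_match):
        p = min(max(p, eps), 1.0 - eps)
        total -= cost_fn * y * math.log(p) + cost_fp * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def evaluate(y_true, y_pred):
    """Specificity, precision, recall, F1-score, and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, len(pairs)),
    }
```

With these hypothetical costs, a true match scored at probability 0.5 contributes five times the loss of a non-match scored at 0.5, which is how a cost-sensitive objective can push a model to pay attention to the minority class. On a highly imbalanced dataset, accuracy and specificity can be high even for a weak classifier, so precision, recall, and F1-score on the matching class are the more informative of the five metrics.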