A novel framework with ComMAND: A combined method for author name disambiguation
Authors: Natan S. Rodrigues, Célia G. Ralha
Journal: Information Processing & Management, vol. 63, no. 1, Article 104304 (Impact Factor 7.4, JCR Q1, Computer Science, Information Systems)
Publication date: 2025-07-23
DOI: 10.1016/j.ipm.2025.104304
URL: https://www.sciencedirect.com/science/article/pii/S0306457325002456
Citations: 0
Abstract
Author Name Disambiguation (AND) in digital bibliographic repositories is a persistent challenge: homonyms and synonyms compromise information retrieval and database integrity. This work presents a novel framework with a Combined Method for Author Name Disambiguation (ComMAND) that integrates transfer learning with SciBERT, a Graph Convolutional Network (GCN), and Graph-enhanced Hierarchical Agglomerative Clustering (GHAC) to improve AND performance. The framework includes a Graphical User Interface (GUI) that lets users load datasets, execute AND tasks, and visualize results without programming knowledge. By semantically analyzing document content and leveraging graph-based relationships, the approach achieves higher precision in identifying unique authors. Experimental results on AMiner-12, AMiner-18, and DBLP validate the effectiveness of the framework. On the DBLP dataset, which contains a large number of ambiguous name references (679), the framework achieves the highest F1 (0.869) and K-metric (0.972) among the compared methods, with improvements of 1.1% to 33.6% over the baseline works. These findings highlight the effectiveness of combining machine learning, graph-based techniques, and clustering for large-scale AND tasks.
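The abstract reports F1 and K-metric scores but does not define them. As a rough illustration only, the sketch below computes both for a toy clustering, assuming the definitions commonly used in the AND literature: pairwise F1 over all document pairs, and the K-metric as the geometric mean of average cluster purity (ACP) and average author purity (AAP). The function names, labels, and toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pairwise_f1(true_labels, pred_labels):
    """Pairwise precision, recall and F1 over all pairs of documents."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # pair correctly placed in the same cluster
        elif same_pred:
            fp += 1  # pair merged, but written by different authors
        elif same_true:
            fn += 1  # pair split, but written by the same author
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_metric(true_labels, pred_labels):
    """K = sqrt(ACP * AAP), assuming the usual purity-based definitions."""
    n = len(true_labels)
    pred_sizes = Counter(pred_labels)   # size of each predicted cluster
    true_sizes = Counter(true_labels)   # size of each true author group
    overlap = Counter(zip(pred_labels, true_labels))  # n_ij counts
    acp = sum(c * c / pred_sizes[p] for (p, _), c in overlap.items()) / n
    aap = sum(c * c / true_sizes[t] for (_, t), c in overlap.items()) / n
    return sqrt(acp * aap)


# Toy data: five papers under one ambiguous name, two real authors.
true_authors = ["a", "a", "a", "b", "b"]
predicted = [0, 0, 1, 1, 1]

precision, recall, f1 = pairwise_f1(true_authors, predicted)
k = k_metric(true_authors, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} K={k:.3f}")
```

In this toy case one paper by author "a" is wrongly grouped with author "b", so both scores fall below 1.0; a perfect clustering yields 1.0 for both.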
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.