{"title":"Adaptive hyperparameter optimization for author name disambiguation","authors":"Shuo Lu, Yong Zhou","doi":"10.1002/asi.24996","DOIUrl":null,"url":null,"abstract":"<p>In the process of author name disambiguation (AND), varying characteristics and noise of different blocks significantly impact disambiguation performance. In this paper, we propose a block-based adaptive hyperparameter optimization method that assigns optimal hyperparameters to each block without altering the original AND model structure. Based on this, a random forest model is trained using the optimized results to fit the relationship between the block's data features and its optimal hyperparameters, thereby enabling the prediction of hyperparameters for new blocks. Empirical studies on 6 state-of-the-art AND algorithms, 11 public datasets, and a manually labeled dataset of China's information and communication technology (ICT) industry patents demonstrate that the proposed method significantly outperforms the original algorithms across multiple standard performance evaluation metrics (Cluster F1/Pairwise F1, B-Cubed F1, and K metrics). The results of the random forest regression indicate that the selected 16 features effectively predict the optimal hyperparameters. Further analysis reveals a power-law relationship between relative block size and both relative performance and relative optimized performance across all datasets and evaluation metrics, and the relative performance improvement of the adaptive hyperparameter optimization algorithm is particularly significant for smaller blocks. 
These findings provide theoretical support and practical guidance for the development of AND algorithms.</p>","PeriodicalId":48810,"journal":{"name":"Journal of the Association for Information Science and Technology","volume":"76 8","pages":"1082-1104"},"PeriodicalIF":4.3000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Association for Information Science and Technology","FirstCategoryId":"91","ListUrlMain":"https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24996","RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
In author name disambiguation (AND), the varying characteristics and noise levels of different blocks significantly affect disambiguation performance. In this paper, we propose a block-based adaptive hyperparameter optimization method that assigns optimal hyperparameters to each block without altering the original AND model structure. Building on this, a random forest model is trained on the optimized results to fit the relationship between a block's data features and its optimal hyperparameters, enabling hyperparameter prediction for new blocks. Empirical studies on six state-of-the-art AND algorithms, eleven public datasets, and a manually labeled dataset of China's information and communication technology (ICT) industry patents demonstrate that the proposed method significantly outperforms the original algorithms across multiple standard evaluation metrics (Cluster F1/Pairwise F1, B-Cubed F1, and K metrics). The random forest regression results indicate that the 16 selected features effectively predict the optimal hyperparameters. Further analysis reveals a power-law relationship between relative block size and both relative performance and relative optimized performance across all datasets and evaluation metrics, with the relative performance improvement of the adaptive hyperparameter optimization algorithm being particularly pronounced for smaller blocks. These findings provide theoretical support and practical guidance for the development of AND algorithms.
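The second stage the abstract describes, fitting a random forest to map block-level features to per-block optimal hyperparameters and then predicting hyperparameters for unseen blocks, can be sketched as follows. This is a minimal illustration with synthetic data: the feature values, the single-target setup, and the interpretation of the hyperparameter as a clustering threshold are assumptions for the sketch, not the paper's actual 16 features or models.

```python
# Sketch: predict a block's optimal hyperparameter from block-level features.
# Synthetic data stands in for the paper's real blocks and tuning results.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# One row per block; 16 hypothetical block-level features
# (e.g., block size, mean pairwise similarity, noise estimates).
X_blocks = rng.random((200, 16))

# Target: the hyperparameter value found optimal for each block during
# per-block optimization (here synthesized to depend on the first
# feature, a stand-in for relative block size).
y_optimal = 0.5 + 0.4 * X_blocks[:, 0] + 0.01 * rng.standard_normal(200)

# Fit the random forest regressor on (block features -> optimal hyperparameter).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_blocks, y_optimal)

# For a new, unseen block, predict its hyperparameter directly
# instead of re-running the per-block optimization.
new_block = rng.random((1, 16))
predicted_hp = model.predict(new_block)
print(predicted_hp)
```

For multiple hyperparameters per block, `y_optimal` would become a 2-D array and `RandomForestRegressor` handles the multi-output case directly; feature importances from the fitted forest would also indicate which block characteristics drive the optimal settings.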
About the journal:
The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes.
The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.