Adaptive hyperparameter optimization for author name disambiguation

IF 4.3 · CAS Quartile 2 (Management) · JCR Q2, Computer Science, Information Systems
Shuo Lu, Yong Zhou
DOI: 10.1002/asi.24996
Journal: Journal of the Association for Information Science and Technology, 76(8), 1082–1104
Published: 2025-03-12 (Journal Article)
URL: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24996
Citations: 0

Abstract

In the process of author name disambiguation (AND), the varying characteristics and noise of different blocks significantly impact disambiguation performance. In this paper, we propose a block-based adaptive hyperparameter optimization method that assigns optimal hyperparameters to each block without altering the original AND model structure. Based on this, a random forest model is trained on the optimized results to fit the relationship between a block's data features and its optimal hyperparameters, thereby enabling the prediction of hyperparameters for new blocks. Empirical studies on six state-of-the-art AND algorithms, 11 public datasets, and a manually labeled dataset of China's information and communication technology (ICT) industry patents demonstrate that the proposed method significantly outperforms the original algorithms across multiple standard performance evaluation metrics (Cluster F1/Pairwise F1, B-Cubed F1, and K metrics). The results of the random forest regression indicate that the 16 selected features effectively predict the optimal hyperparameters. Further analysis reveals a power-law relationship between relative block size and both relative performance and relative optimized performance across all datasets and evaluation metrics; the relative performance improvement of the adaptive hyperparameter optimization algorithm is particularly pronounced for smaller blocks. These findings provide theoretical support and practical guidance for the development of AND algorithms.
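The core idea described in the abstract — fit a regressor from per-block features to per-block tuned hyperparameters, then predict settings for unseen blocks — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the feature dimensionality (16) matches the abstract, but the synthetic data, the use of scikit-learn's `RandomForestRegressor`, and the two hypothetical hyperparameters (`similarity_threshold`, `min_cluster_size`) are assumptions for the sketch.

```python
# Illustrative sketch (not the authors' implementation): train a random forest
# that maps per-block features to hyperparameters found optimal by a prior
# per-block search, then predict hyperparameters for a new block.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Per-block features, e.g. block size, co-author overlap, venue entropy
# (16 features, mirroring the abstract; values simulated here).
n_blocks, n_features = 200, 16
X = rng.random((n_blocks, n_features))

# Targets: the hyperparameters tuned per block by an earlier optimization
# pass, simulated here as a noisy function of the block features.
y = np.column_stack([
    0.5 + 0.3 * X[:, 0] + 0.05 * rng.standard_normal(n_blocks),  # similarity_threshold
    2.0 + 4.0 * X[:, 1] + 0.20 * rng.standard_normal(n_blocks),  # min_cluster_size
])

# Multi-output random forest regression: one model predicts both targets.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict hyperparameters for a new, unseen block before running the
# (unchanged) AND model on it.
new_block = rng.random((1, n_features))
threshold, min_size = model.predict(new_block)[0]
print(f"predicted similarity_threshold={threshold:.2f}, min_cluster_size={min_size:.1f}")
```

Because a forest prediction is an average over training targets, the predicted values stay within the range seen during the tuning pass, which makes this a conservative way to transfer hyperparameters to new blocks.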


Source journal metrics
CiteScore: 8.30
Self-citation rate: 8.60%
Annual articles: 115
期刊介绍: The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes. The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.