Adaptive hyperparameter optimization for author name disambiguation

IF 4.3 · CAS Quartile 2 (Management) · JCR Q2, Computer Science, Information Systems
Shuo Lu, Yong Zhou
DOI: 10.1002/asi.24996
Journal: Journal of the Association for Information Science and Technology, 76(8), 1082–1104
Published: 2025-03-12 (Journal Article)
URL: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24996
Citations: 0

Abstract

In the process of author name disambiguation (AND), the varying characteristics and noise of different blocks significantly impact disambiguation performance. In this paper, we propose a block-based adaptive hyperparameter optimization method that assigns optimal hyperparameters to each block without altering the original AND model structure. Based on this, a random forest model is trained on the optimized results to fit the relationship between a block's data features and its optimal hyperparameters, thereby enabling the prediction of hyperparameters for new blocks. Empirical studies on six state-of-the-art AND algorithms, 11 public datasets, and a manually labeled dataset of China's information and communication technology (ICT) industry patents demonstrate that the proposed method significantly outperforms the original algorithms across multiple standard performance evaluation metrics (Cluster F1/Pairwise F1, B-Cubed F1, and K metrics). The results of the random forest regression indicate that the 16 selected features effectively predict the optimal hyperparameters. Further analysis reveals a power-law relationship between relative block size and both relative performance and relative optimized performance across all datasets and evaluation metrics; the relative performance improvement of the adaptive hyperparameter optimization algorithm is particularly pronounced for smaller blocks. These findings provide theoretical support and practical guidance for the development of AND algorithms.
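The core idea described in the abstract — fit a regressor from per-block features to per-block tuned hyperparameters, then predict settings for unseen blocks — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the feature dimensionality (16) matches the abstract, but the synthetic data, the use of scikit-learn's `RandomForestRegressor`, and the two hypothetical hyperparameters (`similarity_threshold`, `min_cluster_size`) are assumptions for the sketch.

```python
# Illustrative sketch (not the authors' implementation): train a random forest
# that maps per-block features to hyperparameters found optimal by a prior
# per-block search, then predict hyperparameters for a new block.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Per-block features, e.g. block size, co-author overlap, venue entropy
# (16 features, mirroring the abstract; values simulated here).
n_blocks, n_features = 200, 16
X = rng.random((n_blocks, n_features))

# Targets: the hyperparameters tuned per block by an earlier optimization
# pass, simulated here as a noisy function of the block features.
y = np.column_stack([
    0.5 + 0.3 * X[:, 0] + 0.05 * rng.standard_normal(n_blocks),  # similarity_threshold
    2.0 + 4.0 * X[:, 1] + 0.20 * rng.standard_normal(n_blocks),  # min_cluster_size
])

# Multi-output random forest regression: one model predicts both targets.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Predict hyperparameters for a new, unseen block before running the
# (unchanged) AND model on it.
new_block = rng.random((1, n_features))
threshold, min_size = model.predict(new_block)[0]
print(f"predicted similarity_threshold={threshold:.2f}, min_cluster_size={min_size:.1f}")
```

Because a forest prediction is an average over training targets, the predicted values stay within the range seen during the tuning pass, which makes this a conservative way to transfer hyperparameters to new blocks.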


Source journal metrics
CiteScore: 8.30
Self-citation rate: 8.60%
Annual articles: 115
期刊介绍: The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes. The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.