Title: LUCID: Author name disambiguation using graph Structural Clustering
Authors: I. Hussain, S. Asghar
Venue: 2017 Intelligent Systems Conference (IntelliSys)
Published: 2017-09-01
DOI: 10.1109/INTELLISYS.2017.8324326
URL: https://doi.org/10.1109/INTELLISYS.2017.8324326
Citations: 2
Abstract
LUCID: Author name disambiguation using graph Structural Clustering
Author name ambiguity arises in two situations: when multiple authors share the same name, or when one author writes her name in multiple ways. The former is called a homonym and the latter a synonym. Disambiguating these authors is non-trivial because a citation data set carries only a limited amount of information. This paper proposes a graph structural clustering algorithm, “LUCID: Author Name Disambiguation using Graph Structural Clustering”, which disambiguates authors using a community detection algorithm and graph operations. In the first phase, LUCID preprocesses the data set and creates blocks of ambiguous authors. In the second phase, a coauthor graph is built and “SCAN: A Structural Clustering Algorithm for Networks” is applied to detect hubs, outliers, and clusters of nodes (author communities). A hub node that intersects with many clusters is treated as a homonym and resolved by splitting across that node. Finally, synonyms are disambiguated using the proposed hybrid similarity function. LUCID is evaluated on a real data set from Arnetminer. The results show that LUCID performs better overall than the baseline methods, achieving 97% pairwise precision, 74% pairwise recall, and 82% pairwise F1.
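The abstract reports pairwise precision, recall, and F1 without spelling out how they are computed. The sketch below uses the usual pair-counting definitions, which are assumed here rather than taken from the paper: a pair of records counts as predicted when the algorithm puts both in the same cluster, and as correct when both also belong to the same real author. The function names and the toy labels are hypothetical and are not Arnetminer data.

```python
from itertools import combinations


def same_cluster_pairs(labels):
    """Return the set of record-index pairs (i, j), i < j, sharing a label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_scores(true_labels, predicted_labels):
    """Compute pairwise precision, recall, and F1 for one block of records."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    true_pairs = same_cluster_pairs(true_labels)
    predicted_pairs = same_cluster_pairs(predicted_labels)
    correct = len(true_pairs & predicted_pairs)

    # Guard against empty pair sets (e.g. every record in its own cluster).
    precision = correct / len(predicted_pairs) if predicted_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical block: five records under one ambiguous name, two real authors.
true_labels = ["A", "A", "A", "B", "B"]
# The predicted clustering splits author A into two clusters.
predicted_labels = [0, 0, 1, 2, 2]

precision, recall, f1 = pairwise_scores(true_labels, predicted_labels)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy block every predicted pair is correct, so precision is perfect, but splitting one author into two clusters misses half of the true pairs. That matches the pattern in the reported figures, where precision is well above recall.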