{"title":"Cloud-Based Name Disambiguation Algorithm","authors":"Yang Juan, He Hua, Wu Bin","doi":"10.1109/ISME.2010.33","DOIUrl":null,"url":null,"abstract":"In Scientific Collaboration Networks, the phenomenon that one author name corresponds to many author entities is very common. Traditional algorithms for name disambiguation performed inefficiently in dealing with massive data. This paper presents a parallel algorithm for solving the name disambiguation problem: first merge authors with same names and similar author information, then divide the scientific collaboration networks into author communities, authors with same name in one community is supposed as one entity with great possibility. The algorithm is based on the Cloud-Computing platform, and has the ability to deal with massive data. In our experiment, the algorithm efficiently processed massive data and achieved an average f-score of 0.93.","PeriodicalId":348878,"journal":{"name":"2010 International Conference of Information Science and Management Engineering","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 International Conference of Information Science and Management Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISME.2010.33","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In Scientific Collaboration Networks, the phenomenon that one author name corresponds to many author entities is very common. Traditional algorithms for name disambiguation performed inefficiently in dealing with massive data. This paper presents a parallel algorithm for solving the name disambiguation problem: first merge authors with same names and similar author information, then divide the scientific collaboration networks into author communities, authors with same name in one community is supposed as one entity with great possibility. The algorithm is based on the Cloud-Computing platform, and has the ability to deal with massive data. In our experiment, the algorithm efficiently processed massive data and achieved an average f-score of 0.93.