An Unsupervised Framework for Author-Paper Linking in Bibliographic Retrieval System
Authors: Xin Ding, Hui Zhang, Xiaoyu Guo
Venue: 2018 14th International Conference on Semantics, Knowledge and Grids (SKG)
Published: 2018-09-01
DOI: 10.1109/SKG.2018.00028 (https://doi.org/10.1109/SKG.2018.00028)
Author name ambiguity can significantly reduce the accuracy of a bibliographic retrieval system, especially when an author name is used as a search keyword. In this paper, we propose an unsupervised approach that addresses the name ambiguity problem by linking papers to their corresponding authors based on the clustering results of word embeddings. Each cluster represents a collection of words from a certain research area. The papers and authors to be disambiguated are then each assigned a probability of belonging to each research area. We feed those probabilities, together with some metadata of the papers and authors, as features into a graphical model and perform collective inference. Experiments show that our entirely unsupervised method performs well on a Chinese bibliographic retrieval system, even with a large amount of noise in its database.
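The first two steps described in the abstract (clustering word embeddings into research areas, then scoring a paper or author against each area) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used or how the probabilities are computed, so the k-means routine, the normalized-count scoring, the toy 2-D vectors, and all function names below are assumptions made for illustration, not the authors' method. The graphical model and collective inference step is not covered.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def area_distribution(words, word_to_cluster, k):
    """Share of a text's known words that falls into each cluster (assumed scoring)."""
    counts = Counter(word_to_cluster[w] for w in words if w in word_to_cluster)
    total = sum(counts.values())
    if total == 0:
        return [1.0 / k] * k  # no known words: fall back to a uniform guess
    return [counts[c] / total for c in range(k)]


# Toy 2-D "embeddings", invented for this example only.
embeddings = {
    "gradient": (0.9, 0.1),
    "genome": (0.1, 0.9),
    "network": (1.0, 0.2),
    "protein": (0.0, 1.0),
    "kernel": (0.8, 0.0),
    "enzyme": (0.2, 0.8),
}

vocab = list(embeddings)
labels = kmeans([embeddings[w] for w in vocab], k=2)
word_to_cluster = dict(zip(vocab, labels))

paper_words = ["gradient", "network", "kernel", "genome"]
author_words = ["protein", "enzyme"]

paper_dist = area_distribution(paper_words, word_to_cluster, k=2)
author_dist = area_distribution(author_words, word_to_cluster, k=2)
print(paper_dist, author_dist)
```

On this toy data the paper leans towards the first cluster and the author towards the second, so the two area distributions disagree. In a fuller pipeline, such distributions would be only one of several features used when deciding whether to link a paper to an author.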