Deep Name Disambiguation by Combining BERT and Orthogonal Constrained Non-negative Matrix Factorization
Yangchen Huang, Licai Wang, Zhonglin Liu
2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS), published 2021-11-07. DOI: 10.1109/CCIS53392.2021.9754675
Every day we search the Internet for information, and people's names are among the most popular queries. However, because names are ambiguous, the returned pages mix different people who share the same name, and even non-person entities. Moreover, the ranking algorithm may place pages about a well-known person, who is mentioned more frequently, at the top, burying information about others with the same name. Name disambiguation addresses these two issues by extracting discriminative features from the context and grouping the returned pages by the entity they refer to. Existing methods, however, are limited by complicated hand-crafted features and clustering procedures, and by a cluster number that must be fixed in advance from experience. In this work, we learn semantic representations of person-name reference items with the pre-trained language model BERT combined with a triplet loss, and then group the learned features with an orthogonality-constrained non-negative matrix factorization (NMF) algorithm. To select the number of clusters automatically, we use the Silhouette Coefficient. Experiments on the WePS benchmark datasets show that our method outperforms other state-of-the-art name disambiguation methods.
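The triplet loss mentioned in the abstract pulls an anchor representation toward a positive (another reference to the same person) and pushes it away from a negative (a reference to a different person) by at least a margin. The following is a minimal sketch, assuming squared Euclidean distance and a margin of 1.0; the abstract states neither, and `triplet_loss` and `batch_triplet_loss` are hypothetical names. In the paper the vectors would be BERT representations; here they are plain lists of floats.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    Assumption: squared Euclidean distance and margin=1.0 are illustrative choices.
    The loss is zero once the negative is at least `margin` farther than the positive.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]],
                       margin: float = 1.0) -> float:
    """Mean triplet loss over (anchor, positive, negative) tuples."""
    if not triplets:
        raise ValueError("need at least one triplet")
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)


# Usage: the first triplet already satisfies the margin, the second does not.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0]))  # 0.0
print(triplet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]))  # 4.0
```

Only triplets that violate the margin contribute to the loss, so during training they are the ones that reshape the embedding space; well-separated triplets are ignored.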
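The clustering step factorizes a nonnegative feature matrix under an orthogonality constraint, so that each item loads mainly on one factor and the factor with the largest weight can serve as its cluster. The abstract does not give the paper's update rules, so this sketch swaps in one common multiplicative-update scheme for one-sided orthogonal NMF, X ≈ F Gᵀ with Gᵀ G pushed toward I, written in plain Python without NumPy. It assumes the input is already nonnegative, with one column per item (BERT features can be negative, and how the paper handles that is not stated); `orthogonal_nmf` and `assign_clusters` are hypothetical names.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense product a @ b for small list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def orthogonal_nmf(X: Matrix, k: int, iters: int = 300, seed: int = 0,
                   eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Approximate nonnegative X (m features x n items) as F @ G^T with G^T G near I.

    F is m x k (one column per cluster prototype); G is n x k (one row per item).
    Assumption: this update scheme is illustrative, not the paper's exact algorithm.
    """
    if any(v < 0 for row in X for v in row):
        raise ValueError("NMF requires a nonnegative input matrix")
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    F = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    G = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    for _ in range(iters):
        # F <- F * (X G) / (F G^T G): the usual multiplicative NMF update.
        XG = matmul(X, G)
        FGtG = matmul(F, matmul(transpose(G), G))
        F = [[F[i][c] * XG[i][c] / (FGtG[i][c] + eps) for c in range(k)]
             for i in range(m)]
        # G <- G * sqrt((X^T F) / (G G^T X^T F)): the G G^T factor comes from the
        # G^T G = I constraint and encourages each row of G to load on one factor.
        XtF = matmul(Xt, F)
        GGtXtF = matmul(G, matmul(transpose(G), XtF))
        G = [[G[j][c] * (XtF[j][c] / (GGtXtF[j][c] + eps)) ** 0.5 for c in range(k)]
             for j in range(n)]
    return F, G


def assign_clusters(G: Matrix) -> List[int]:
    """Hard assignment: each item joins the factor with its largest weight in G."""
    return [max(range(len(row)), key=row.__getitem__) for row in G]


# Usage: items 0-2 use features 0-1, items 3-5 use features 2-3.
X = [[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
     [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
F, G = orthogonal_nmf(X, k=2)
print(assign_clusters(G))
```

Because the labels come from the argmax of each row of G, they are defined only up to permutation: on this toy matrix the two blocks of items should end up in different clusters, but which block is called 0 depends on the random initialization.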
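Instead of fixing the cluster number from experience, the abstract selects it with the Silhouette Coefficient. A minimal sketch of that selection loop follows, assuming Euclidean distance and a generic `cluster_fn(points, k)` callback that stands in for whatever clustering produces labels for a given k (in the paper, the constrained NMF); `silhouette_score`, `select_num_clusters`, and `cluster_fn` are hypothetical names here. Each candidate k is scored by the mean silhouette, and the highest-scoring k is kept.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points.

    a(i): mean distance from point i to the other members of its own cluster.
    b(i): smallest mean distance from point i to the members of another cluster.
    Points in singleton clusters get s(i) = 0 by convention.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def select_num_clusters(
    points: List[Point],
    candidates: List[int],
    cluster_fn: Callable[[List[Point], int], List[int]],
) -> Tuple[int, Dict[int, float]]:
    """Cluster once per candidate k and return the k with the highest mean silhouette."""
    scores: Dict[int, float] = {}
    for k in candidates:
        labels = cluster_fn(points, k)
        if len(set(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        scores[k] = silhouette_score(points, labels)
    if not scores:
        raise ValueError("no candidate produced at least two clusters")
    best_k = max(scores, key=lambda k: scores[k])
    return best_k, scores


# Usage with a stand-in clustering that returns precomputed labels per k.
pts = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
lookup = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
best_k, scores = select_num_clusters(pts, [2, 3], lambda p, k: lookup[k])
print(best_k)  # 3
```

The silhouette compares within-cluster distances with distances to the nearest other cluster, so it tends to peak at the k whose clusters are compact and well separated; that is what lets it replace a hand-picked cluster number.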