A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation

Fei Wang, Yi Yang, Zhaocai Ma, Lian Li
Published in: 2013 International Conference on Information Science and Cloud Computing Companion
Publication date: 2013-12-07
DOI: 10.1109/ISCC-C.2013.33 (https://doi.org/10.1109/ISCC-C.2013.33)
Citations: 1

Abstract

To address name ambiguity and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to one category; this stage is simply document clustering based on the similarity of OLs. In the second stage, the clustered documents serve as a new data source from which novel features, such as co-author names, are extracted, and these newly extracted features are used to cluster the documents further. We also propose a method that resolves name ambiguity by constructing social networks from the relationships among co-authors. In the third stage, Web pages are further clustered using a content-based hierarchical agglomerative clustering (HAC) algorithm, and useful content, including titles, abstracts, and keywords (TAKs), is analyzed to disambiguate the names. Experimental results show that the three-stage clustering algorithm effectively improves the performance of person name disambiguation.
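
The three stages described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the similarity measure, thresholds, or linkage criterion, so Jaccard overlap, single-link merging, the threshold values, and every name used here (`Doc`, `jaccard`, `merge_clusters`, `three_stage`) are hypothetical. The co-author social network of stage two is approximated by merging clusters that share at least one co-author.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Doc:
    """One document mentioning the ambiguous name (hypothetical structure)."""
    name: str
    ols: set = field(default_factory=set)        # organizations and locations
    coauthors: set = field(default_factory=set)  # co-author names
    taks: set = field(default_factory=set)       # title/abstract/keyword terms


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_clusters(clusters, feature, threshold):
    """Single-link agglomeration: repeatedly merge the most similar pair of
    clusters until no pair reaches the threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = max(jaccard(feature(x), feature(y))
                      for x in clusters[i] for y in clusters[j])
            if sim > best:
                best, pair = sim, (i, j)
        if pair is None or best < threshold:
            break
        i, j = pair  # i < j, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))
    return clusters


def three_stage(docs, ol_threshold=0.5, tak_threshold=0.3):
    """Apply the three stages in order; thresholds are illustrative guesses."""
    clusters = [[d] for d in docs]
    # Stage 1: cluster on organization/location similarity.
    clusters = merge_clusters(clusters, lambda d: d.ols, ol_threshold)
    # Stage 2: link clusters that share any co-author (tiny positive threshold).
    clusters = merge_clusters(clusters, lambda d: d.coauthors, 1e-9)
    # Stage 3: content-based HAC on title/abstract/keyword terms.
    clusters = merge_clusters(clusters, lambda d: d.taks, tak_threshold)
    return clusters


# Toy data with placeholder values, invented purely for illustration.
docs = [
    Doc("a", {"Univ A", "City A"}, {"P"}, {"clustering", "name", "disambiguation"}),
    Doc("b", {"Univ A", "City A"}, {"Q"}, {"grid"}),
    Doc("c", {"Univ B"}, {"Q"}, {"cloud"}),
    Doc("d", {"Univ C"}, {"R"}, {"clustering", "name", "disambiguation", "web"}),
    Doc("e", {"Univ D"}, {"S"}, {"protein"}),
]
result = three_stage(docs)
print([[d.name for d in cluster] for cluster in result])
```

On this toy input, stage one groups `a` and `b` by identical OLs, stage two attaches `c` through the shared co-author `Q`, and stage three attaches `d` through overlapping TAK terms, while `e` stays separate. Running the stages in sequence means each later stage only ever merges clusters formed earlier, so an overly loose threshold in stage one cannot be undone afterwards.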