ToFM: Topic-specific Facet Mining by Facet Propagation within Clusters

Hongxuan Li, Bifan Wei, Jun Liu, Zhaotong Guo, Jingchao Qi, Bei Wu, Yong Liu, Yuanyuan Shi
Venue: 2021 IEEE International Conference on Big Knowledge (ICBK)
DOI: 10.1109/ICKG52313.2021.00060
Published: 2021-12-01

Abstract

Mining the facets of topics is an essential task for information retrieval, information extraction, and knowledge base construction. For topics in courses, there are three challenges: different topics have different facets, facet labels rarely appear in the topic description text, and not all topics have enough textual information to mine facets from. In this paper, we propose a weakly supervised algorithm for topic-specific facet mining (ToFM for short), based on our finding that similar topics in a cluster have similar facet sets. For example, the topics Binary Search Tree, Suffix Tree, and AVL Tree in the Tree cluster share facets such as example, insertion, deletion, and traversal. ToFM first splits the topics in a domain into several topic clusters based on the topic description text. It then extracts initial facet sets for all topics from the corresponding Wikipedia article pages. Finally, it performs a normalized facet propagation within each topic cluster to obtain the final facet set of every topic. We evaluate ToFM on six real-world datasets, and the experimental results show that it outperforms existing facet mining algorithms.
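
The propagation step can be illustrated with a minimal sketch. The abstract does not specify how the propagation is normalized, so the rule below is an assumption: a facet's support is the fraction of topics in a cluster whose initial set contains it, and facets at or above a threshold are shared with every topic in that cluster. The function name `propagate_facets`, the parameter `min_support`, and the toy input are hypothetical and are not taken from the paper.

```python
from collections import Counter


def propagate_facets(clusters, initial_facets, min_support=0.5):
    """Share well-supported facets among the topics of each cluster.

    Assumed rule (not the paper's): support(facet) is the fraction of
    topics in the cluster whose initial facet set contains the facet.
    """
    final = {}
    for topics in clusters.values():
        # Count how many topics in this cluster carry each facet.
        counts = Counter(
            facet
            for topic in topics
            for facet in initial_facets.get(topic, set())
        )
        # Normalize by cluster size and keep facets above the threshold.
        shared = {
            facet
            for facet, count in counts.items()
            if count / len(topics) >= min_support
        }
        # Each topic keeps its own facets and gains the shared ones.
        for topic in topics:
            final[topic] = set(initial_facets.get(topic, set())) | shared
    return final


# Hypothetical toy input: one cluster, with a sparsely described topic.
clusters = {"Tree": ["Binary Search Tree", "Suffix Tree", "AVL Tree"]}
initial_facets = {
    "Binary Search Tree": {"example", "insertion", "deletion", "traversal"},
    "Suffix Tree": {"example"},
    "AVL Tree": {"example", "insertion", "deletion"},
}
final_facets = propagate_facets(clusters, initial_facets)
print(sorted(final_facets["Suffix Tree"]))
```

In this toy input, Suffix Tree starts with a single facet and gains insertion and deletion from its cluster, while traversal, which only one of the three topics has, falls below the assumed threshold and is not propagated.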