Selecting Article Segment Titles Based on Keyphrase Features and Semantic Relatedness

Yuming Guo, M. Iwaihara
2018 7th International Congress on Advanced Applied Informatics (IIAI-AAI), July 2018. DOI: 10.1109/IIAI-AAI.2018.00034
Citations: 0

Abstract

Nowadays people can find almost any kind of information they want on the Internet. In most cases, however, users are unwilling to spend much time browsing long passages of text to find the segment they are looking for. Existing work on topic labeling performs well on document categorization but is inadequate at the granularity of detailed content. We therefore propose a method for selecting titles for segments in long documents. We analyze the characteristics of high-quality titles for article segments in terms of the semantic relatedness between the target segment and related articles, as well as other segments. We then revise three previously proposed features. We improve the phraseness feature so that it assigns appropriate scores to long titles, and we combine the SimPF and Embedding-vector features to improve efficiency and rationality. We use Wikipedia articles for experimental evaluation: many Wikipedia article segments have manually assigned titles, while many articles lack detailed segment titles. We evaluate scoring functions by hiding the original segment titles and measuring, through precision@K, where they are ranked. Through rigorous evaluation, we identify the optimal combination of features.
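The abstract describes ranking candidate segment titles by a combination of features and judging each scoring function by where the hidden original title lands, measured with precision@K. The Python sketch below illustrates that evaluation loop under several assumptions that are not from the paper: feature values are taken as precomputed, the feature names `phraseness`, `simpf` and `embedding` and the weights are hypothetical, the weighted linear combination is our own choice rather than the authors' confirmed scoring function, and precision@K is read as the fraction of segments whose hidden original title appears among the top K ranked candidates.

```python
from typing import Dict, List, Sequence

# Hypothetical feature names; the abstract does not define how phraseness,
# SimPF or the embedding-vector feature are actually computed.
Features = Dict[str, float]


def combined_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted linear combination of one candidate's feature scores (assumed form)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_candidates(candidates: Dict[str, Features], weights: Dict[str, float]) -> List[str]:
    """Return candidate titles sorted from highest to lowest combined score."""
    return sorted(
        candidates,
        key=lambda title: combined_score(candidates[title], weights),
        reverse=True,
    )


def precision_at_k(rankings: Sequence[List[str]], hidden_titles: Sequence[str], k: int) -> float:
    """Fraction of segments whose hidden original title is ranked within the top k."""
    if not hidden_titles:
        return 0.0
    hits = sum(1 for ranking, gold in zip(rankings, hidden_titles) if gold in ranking[:k])
    return hits / len(hidden_titles)


if __name__ == "__main__":
    # Illustrative weights and toy feature values only; not results from the paper.
    weights = {"phraseness": 0.2, "simpf": 0.4, "embedding": 0.4}
    segments = [
        ({"Early life": {"phraseness": 0.9, "simpf": 0.8, "embedding": 0.7},
          "Career": {"phraseness": 0.8, "simpf": 0.3, "embedding": 0.4}}, "Early life"),
        ({"Reception": {"phraseness": 0.7, "simpf": 0.2, "embedding": 0.3},
          "Critical response": {"phraseness": 0.6, "simpf": 0.6, "embedding": 0.5}}, "Reception"),
    ]
    rankings = [rank_candidates(cands, weights) for cands, _ in segments]
    gold = [title for _, title in segments]
    print(precision_at_k(rankings, gold, 1))  # → 0.5
    print(precision_at_k(rankings, gold, 2))  # → 1.0
```

Sweeping the weight dictionary over a grid and keeping the setting with the highest precision@K would be one simple way to search for a feature combination; the abstract does not state how the authors arrived at their optimum.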