Summarizing long scientific documents through hierarchical structure extraction

Grishma Sharma, Deepak Sharma, M. Sasikumar
Natural Language Processing Journal, Volume 8, Article 100080. Published 2024-05-29.
DOI: 10.1016/j.nlp.2024.100080
Source: https://www.sciencedirect.com/science/article/pii/S2949719124000281
Citations: 0

Abstract

Keeping up with the latest advances in academia has become increasingly difficult as the number of scientific publications grows rapidly. Text summarization addresses this challenge by distilling a paper's essential contributions into a concise summary. Although scientific documents are highly structured, current summarization techniques often overlook this structural information. Our proposed method addresses this gap with an unsupervised, extractive, user-preference-based, hierarchical, iterative graph-based ranking algorithm for summarizing long scientific documents. Unlike existing approaches, it leverages the inherent structure of scientific texts to generate diverse summaries tailored to user preferences. To assess the effectiveness of our approach, we evaluated it on two long-document datasets: ScisummNet and a custom dataset of papers from established journals and conferences, with human-extracted sentences as gold summaries. Both automatic evaluation with ROUGE scores and human evaluation show that our method outperforms other well-known unsupervised algorithms. These results underscore the value of structural information in text summarization, which enables more effective and customizable solutions.
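The abstract names the ingredients of the approach (graph-based ranking, use of document structure, user preferences) but does not give the algorithm itself. The following is only a minimal sketch of how those ingredients could fit together, not the authors' method. It assumes a TextRank-style power iteration over bag-of-words cosine similarity within each section, and a hypothetical `preferences` mapping from section name to weight; all function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style iteration on a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    # Edge weights are pairwise cosine similarities; no self-loops.
    w = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(sections, preferences, budget):
    """Select `budget` sentences, weighting each section by user preference.

    `sections` maps section name -> list of sentences (document order).
    `preferences` maps section name -> weight (assumed; default 1.0).
    """
    candidates = []  # (weighted score, document position, sentence)
    position = 0
    for name, sentences in sections.items():
        scores = rank_sentences(sentences)
        top = max(scores, default=0.0)
        weight = preferences.get(name, 1.0)
        for sentence, score in zip(sentences, scores):
            # Normalize within the section so sections are comparable.
            weighted = weight * score / top if top else 0.0
            candidates.append((weighted, position, sentence))
            position += 1
    best = sorted(candidates, key=lambda c: (-c[0], c[1]))[:budget]
    # Return the chosen sentences in their original document order.
    return [sentence for _, _, sentence in sorted(best, key=lambda c: c[1])]


sections = {
    "Introduction": [
        "Research output grows quickly every year.",
        "Readers need short summaries of research.",
    ],
    "Method": [
        "We rank sentences with a graph.",
        "The graph links similar sentences.",
        "Cats sleep often.",
    ],
}
summary = summarize(sections, {"Introduction": 0.0, "Method": 1.0}, budget=2)
print(summary)
```

Ranking inside each section first and only then applying the preference weight is one simple way to make the summary follow the document's structure; a reader who cares mainly about the method can raise that section's weight without re-running the ranking.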
