BESKlus : BERT Extractive Summarization with K-Means Clustering in Scientific Paper

Feliks Victor Parningotan Samosir, Hapnes Toba, M. Ayub
{"title":"基于k -均值聚类的科学论文BERT提取摘要","authors":"Feliks Victor Parningotan Samosir, Hapnes Toba, M. Ayub","doi":"10.28932/jutisi.v8i1.4474","DOIUrl":null,"url":null,"abstract":"This study aims to propose methods and models for extractive text summarization with contextual embedding. To build this model, a combination of traditional machine learning algorithms such as K-Means Clustering and the latest BERT-based architectures such as Sentence-BERT (SBERT) is carried out. The contextual embedding process will be carried out at the sentence level by SBERT. Embedded sentences will be clustered and the distance calculated from the centroid. The top sentences from each cluster will be used as summary candidates. The dataset used in this study is a collection of scientific journals from NeurIPS. Performance evaluation carried out with ROUGE-L gave a result of 15.52% and a BERTScore of 85.55%. This result surpasses several previous models such as PyTextRank and BERT Extractive Summarizer. The results of these measurements prove that the use of contextual embedding is very good if applied to extractive text summarization which is generally done at the sentence level.","PeriodicalId":185279,"journal":{"name":"Jurnal Teknik Informatika dan Sistem Informasi","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BESKlus : BERT Extractive Summarization with K-Means Clustering in Scientific Paper\",\"authors\":\"Feliks Victor Parningotan Samosir, Hapnes Toba, M. Ayub\",\"doi\":\"10.28932/jutisi.v8i1.4474\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study aims to propose methods and models for extractive text summarization with contextual embedding. To build this model, a combination of traditional machine learning algorithms such as K-Means Clustering and the latest BERT-based architectures such as Sentence-BERT (SBERT) is carried out. The contextual embedding process will be carried out at the sentence level by SBERT. Embedded sentences will be clustered and the distance calculated from the centroid. The top sentences from each cluster will be used as summary candidates. The dataset used in this study is a collection of scientific journals from NeurIPS. Performance evaluation carried out with ROUGE-L gave a result of 15.52% and a BERTScore of 85.55%. This result surpasses several previous models such as PyTextRank and BERT Extractive Summarizer. 
The results of these measurements prove that the use of contextual embedding is very good if applied to extractive text summarization which is generally done at the sentence level.\",\"PeriodicalId\":185279,\"journal\":{\"name\":\"Jurnal Teknik Informatika dan Sistem Informasi\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Jurnal Teknik Informatika dan Sistem Informasi\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.28932/jutisi.v8i1.4474\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jurnal Teknik Informatika dan Sistem Informasi","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.28932/jutisi.v8i1.4474","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

This study proposes methods and models for extractive text summarization with contextual embeddings. The model combines a traditional machine learning algorithm, K-Means Clustering, with a recent BERT-based architecture, Sentence-BERT (SBERT). Contextual embedding is performed at the sentence level by SBERT; the embedded sentences are then clustered, and each sentence's distance from its cluster centroid is computed. The sentences closest to each centroid are taken as summary candidates. The dataset used in this study is a collection of scientific papers from NeurIPS. Evaluation yields a ROUGE-L score of 15.52% and a BERTScore of 85.55%, surpassing earlier models such as PyTextRank and BERT Extractive Summarizer. These results show that contextual embeddings are well suited to extractive text summarization, which is generally performed at the sentence level.
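
The embed-cluster-select pipeline described in the abstract can be sketched in a few lines. The following is a minimal illustration, assuming the sentence-transformers and scikit-learn packages; the checkpoint name (all-MiniLM-L6-v2) and the default cluster count are illustrative assumptions, not the paper's reported configuration.

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def besklus_summarize(sentences, num_clusters=3):
    """Return one representative sentence per cluster, in document order."""
    # Step 1: contextual embedding at the sentence level with SBERT.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint
    embeddings = model.encode(sentences)

    # Step 2: cluster the sentence embeddings with K-Means.
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(embeddings)

    # Step 3: in each cluster, keep the sentence nearest its centroid
    # as a summary candidate.
    picked = []
    for k in range(num_clusters):
        members = np.where(labels == k)[0]
        dists = np.linalg.norm(
            embeddings[members] - kmeans.cluster_centers_[k], axis=1)
        picked.append(members[np.argmin(dists)])

    # Reassemble the candidates in their original document order.
    return [sentences[i] for i in sorted(picked)]

Selecting the sentence nearest each centroid keeps the summary extractive: every output sentence is copied verbatim from the source document rather than generated.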
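The reported metrics could be computed with off-the-shelf scorers. Below is a hedged sketch, assuming the rouge-score and bert-score packages and reusing the besklus_summarize helper above; the reference string is a placeholder for a gold summary, not data from the paper.

from rouge_score import rouge_scorer
from bert_score import score as bertscore

sentences = ["First sentence of the paper.", "Second sentence.", "Third."]
candidate = " ".join(besklus_summarize(sentences, num_clusters=2))
reference = "A gold-standard summary goes here."  # placeholder

# ROUGE-L compares the longest common subsequence of candidate and reference.
rouge = rouge_scorer.RougeScorer(["rougeL"]).score(reference, candidate)
print(f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.4f}")

# BERTScore compares the two texts in contextual embedding space.
P, R, F1 = bertscore([candidate], [reference], lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")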