{"title":"在自动tweet流摘要的背景下,探索由神经语言模型生成的无监督文本表示","authors":"Alexis Dusart, Karen Pinel-Sauvagnat, Gilles Hubert","doi":"10.1016/j.osnem.2023.100272","DOIUrl":null,"url":null,"abstract":"<div><p><span>Users are often overwhelmed by the amount of information generated on online social networks<span> and media (OSNEM), in particular Twitter, during particular events. Summarizing the information streams would help them be informed in a reasonable time. In parallel, recent state of the art in summarization has a special focus on deep neural models and pre-trained </span></span>language models.</p><p>In this context, we aim at (i) evaluating different pre-trained language model (PLM) to represent microblogs<span> (i.e., tweets), and (ii) to identify the most suitable ones in a summarization context, as well as (iii) to see how neural models can be used knowing the issue of input size limitation of such models. For this purpose, we divided the problem into 3 questions and made experiments on 3 different datasets. Using a simple greedy algorithm<span>, we first compared several pre-trained models for single tweet representation. We then evaluated the quality of the average representation of the stream and sought to use it as a starting point for a neural approach. First results show the interest of using USE and Sentence-BERT representations for tweet stream summarization, as well as the great potential of using the average representation of the stream.</span></span></p></div>","PeriodicalId":52228,"journal":{"name":"Online Social Networks and Media","volume":"37 ","pages":"Article 100272"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring unsupervised textual representations generated by neural language models in the context of automatic tweet stream summarization\",\"authors\":\"Alexis Dusart, Karen Pinel-Sauvagnat, Gilles Hubert\",\"doi\":\"10.1016/j.osnem.2023.100272\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p><span>Users are often overwhelmed by the amount of information generated on online social networks<span> and media (OSNEM), in particular Twitter, during particular events. Summarizing the information streams would help them be informed in a reasonable time. In parallel, recent state of the art in summarization has a special focus on deep neural models and pre-trained </span></span>language models.</p><p>In this context, we aim at (i) evaluating different pre-trained language model (PLM) to represent microblogs<span> (i.e., tweets), and (ii) to identify the most suitable ones in a summarization context, as well as (iii) to see how neural models can be used knowing the issue of input size limitation of such models. For this purpose, we divided the problem into 3 questions and made experiments on 3 different datasets. Using a simple greedy algorithm<span>, we first compared several pre-trained models for single tweet representation. We then evaluated the quality of the average representation of the stream and sought to use it as a starting point for a neural approach. 
First results show the interest of using USE and Sentence-BERT representations for tweet stream summarization, as well as the great potential of using the average representation of the stream.</span></span></p></div>\",\"PeriodicalId\":52228,\"journal\":{\"name\":\"Online Social Networks and Media\",\"volume\":\"37 \",\"pages\":\"Article 100272\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Online Social Networks and Media\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2468696423000319\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Social Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Online Social Networks and Media","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2468696423000319","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Social Sciences","Score":null,"Total":0}
Exploring unsupervised textual representations generated by neural language models in the context of automatic tweet stream summarization
Users are often overwhelmed by the amount of information generated on online social networks and media (OSNEM), in particular Twitter, during major events. Summarizing these information streams would help users stay informed in a reasonable amount of time. In parallel, recent state-of-the-art work in summarization focuses heavily on deep neural models and pre-trained language models.
In this context, we aim to (i) evaluate different pre-trained language models (PLMs) for representing microblogs (i.e., tweets), (ii) identify the most suitable ones in a summarization context, and (iii) examine how neural models can be used given their input-size limitations. For this purpose, we divided the problem into three questions and conducted experiments on three different datasets. Using a simple greedy algorithm, we first compared several pre-trained models for single-tweet representation. We then evaluated the quality of the average representation of the stream and sought to use it as a starting point for a neural approach. First results show the benefit of using USE and Sentence-BERT representations for tweet stream summarization, as well as the strong potential of the average representation of the stream.
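The abstract does not give implementation details, but a minimal sketch of the kind of greedy, embedding-based selection it describes might look like the following. It uses the sentence-transformers library for Sentence-BERT encoding; the model name, redundancy threshold, and summary budget are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: greedy tweet-stream summarization with a pre-trained
# sentence encoder. Model choice, threshold, and budget are
# assumptions for illustration, not the paper's configuration.
import numpy as np
from sentence_transformers import SentenceTransformer

def greedy_summarize(tweets, budget=5, redundancy_threshold=0.8):
    """Greedily pick tweets closest to the stream's average embedding,
    skipping candidates too similar to tweets already selected."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # any Sentence-BERT model
    embeddings = model.encode(tweets, normalize_embeddings=True)

    # Average representation of the stream, re-normalized so that
    # dot products with tweet embeddings are cosine similarities.
    centroid = embeddings.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    # Rank tweets by similarity to the stream centroid, best first.
    order = np.argsort(embeddings @ centroid)[::-1]

    summary_idx = []
    for i in order:
        if len(summary_idx) >= budget:
            break
        # Redundancy check: skip tweets too close to a chosen one.
        if any(embeddings[i] @ embeddings[j] > redundancy_threshold
               for j in summary_idx):
            continue
        summary_idx.append(i)
    return [tweets[i] for i in sorted(summary_idx)]

if __name__ == "__main__":
    stream = [
        "Earthquake reported near the coast this morning.",
        "Magnitude 6.1 quake hits coastal region, officials say.",
        "Rescue teams are being deployed to the affected area.",
        "My cat just knocked over my coffee again.",
        "Power outages reported in several coastal towns.",
    ]
    for tweet in greedy_summarize(stream, budget=3):
        print("-", tweet)
```

Selecting against the stream centroid reflects the abstract's finding that the average representation of the stream is a useful signal; the redundancy threshold is one common way to keep a greedy extractive summary from repeating near-duplicate tweets.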