建立神经网络模型，识别和纠正夸大肥胖相关科学发现的新闻标题

Journal of data and information science (Warsaw, Poland) Pub Date : 2023-06-01 DOI:10.2478/jdis-2023-0014

R. An, Quinlan Batcheller, Junjie Wang, Yuyi Yang

{"title":"建立神经网络模型，识别和纠正夸大肥胖相关科学发现的新闻标题","authors":"R. An, Quinlan Batcheller, Junjie Wang, Yuyi Yang","doi":"10.2478/jdis-2023-0014","DOIUrl":null,"url":null,"abstract":"Abstract Purpose Media exaggerations of health research may confuse readers’ understanding, erode public trust in science and medicine, and cause disease mismanagement. This study built artificial intelligence (AI) models to automatically identify and correct news headlines exaggerating obesity-related research findings. Design/methodology/approach We searched popular digital media outlets to collect 523 headlines exaggerating obesity-related research findings. The reasons for exaggerations include: inferring causality from observational studies, inferring human outcomes from animal research, inferring distant/end outcomes (e.g., obesity) from immediate/intermediate outcomes (e.g., calorie intake), and generalizing findings to the population from a subgroup or convenience sample. Each headline was paired with the title and abstract of the peer-reviewed journal publication covered by the news article. We drafted an exaggeration-free counterpart for each original headline and fined-tuned a BERT model to differentiate between them. We further fine-tuned three generative language models—BART, PEGASUS, and T5 to autogenerate exaggeration-free headlines based on a journal publication’s title and abstract. Model performance was evaluated using the ROUGE metrics by comparing model-generated headlines with journal publication titles. Findings The fine-tuned BERT model achieved 92.5% accuracy in differentiating between exaggeration-free and original headlines. Baseline ROUGE scores averaged 0.311 for ROUGE-1, 0.113 for ROUGE-2, 0.253 for ROUGE-L, and 0.253 ROUGE-Lsum. PEGASUS, T5, and BART all outperformed the baseline. The best-performing BART model attained 0.447 for ROUGE-1, 0.221 for ROUGE-2, 0.402 for ROUGE-L, and 0.402 for ROUGE-Lsum. Originality/value This study demonstrated the feasibility of leveraging AI to automatically identify and correct news headlines exaggerating obesity-related research findings.","PeriodicalId":92237,"journal":{"name":"Journal of data and information science (Warsaw, Poland)","volume":"8 1","pages":"88 - 97"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Build neural network models to identify and correct news headlines exaggerating obesity-related scientific findings\",\"authors\":\"R. An, Quinlan Batcheller, Junjie Wang, Yuyi Yang\",\"doi\":\"10.2478/jdis-2023-0014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Purpose Media exaggerations of health research may confuse readers’ understanding, erode public trust in science and medicine, and cause disease mismanagement. This study built artificial intelligence (AI) models to automatically identify and correct news headlines exaggerating obesity-related research findings. Design/methodology/approach We searched popular digital media outlets to collect 523 headlines exaggerating obesity-related research findings. The reasons for exaggerations include: inferring causality from observational studies, inferring human outcomes from animal research, inferring distant/end outcomes (e.g., obesity) from immediate/intermediate outcomes (e.g., calorie intake), and generalizing findings to the population from a subgroup or convenience sample. Each headline was paired with the title and abstract of the peer-reviewed journal publication covered by the news article. We drafted an exaggeration-free counterpart for each original headline and fined-tuned a BERT model to differentiate between them. We further fine-tuned three generative language models—BART, PEGASUS, and T5 to autogenerate exaggeration-free headlines based on a journal publication’s title and abstract. Model performance was evaluated using the ROUGE metrics by comparing model-generated headlines with journal publication titles. Findings The fine-tuned BERT model achieved 92.5% accuracy in differentiating between exaggeration-free and original headlines. Baseline ROUGE scores averaged 0.311 for ROUGE-1, 0.113 for ROUGE-2, 0.253 for ROUGE-L, and 0.253 ROUGE-Lsum. PEGASUS, T5, and BART all outperformed the baseline. The best-performing BART model attained 0.447 for ROUGE-1, 0.221 for ROUGE-2, 0.402 for ROUGE-L, and 0.402 for ROUGE-Lsum. Originality/value This study demonstrated the feasibility of leveraging AI to automatically identify and correct news headlines exaggerating obesity-related research findings.\",\"PeriodicalId\":92237,\"journal\":{\"name\":\"Journal of data and information science (Warsaw, Poland)\",\"volume\":\"8 1\",\"pages\":\"88 - 97\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of data and information science (Warsaw, Poland)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/jdis-2023-0014\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of data and information science (Warsaw, Poland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/jdis-2023-0014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

摘要目的媒体对健康研究的夸大可能会混淆读者的理解，侵蚀公众对科学和医学的信任，并导致疾病管理不善。该研究建立了人工智能(AI)模型，自动识别和纠正夸大肥胖相关研究结果的新闻标题。设计/方法/方法我们搜索了流行的数字媒体，收集了523个夸大肥胖相关研究结果的标题。夸大的原因包括:从观察性研究推断因果关系，从动物研究推断人类结果，从直接/中期结果(如卡路里摄入量)推断遥远/最终结果(如肥胖)，以及从亚组或方便样本中将结果推广到人群。每个标题都与新闻文章所涉及的同行评审期刊出版物的标题和摘要配对。我们为每个原始标题起草了一个没有夸张的对应标题，并对BERT模型进行了微调，以区分它们。我们进一步微调了三个生成语言模型——bart、PEGASUS和T5，以根据期刊出版物的标题和摘要自动生成无夸张的标题。通过比较模型生成的标题和期刊出版标题，使用ROUGE度量来评估模型的性能。结果改进后的BERT模型对无夸张标题和原创标题的区分准确率达到92.5%。ROUGE-1的基线评分平均为0.311,ROUGE-2为0.113,ROUGE- l为0.253,ROUGE- lsum为0.253。PEGASUS、T5和BART的表现都优于基线。表现最好的BART模型为ROUGE-1 0.447, ROUGE-2 0.221, ROUGE-L 0.402, ROUGE-Lsum 0.402。独创性/价值本研究证明了利用人工智能自动识别和纠正夸大肥胖相关研究结果的新闻标题的可行性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Build neural network models to identify and correct news headlines exaggerating obesity-related scientific findings

Abstract Purpose Media exaggerations of health research may confuse readers’ understanding, erode public trust in science and medicine, and cause disease mismanagement. This study built artificial intelligence (AI) models to automatically identify and correct news headlines exaggerating obesity-related research findings. Design/methodology/approach We searched popular digital media outlets to collect 523 headlines exaggerating obesity-related research findings. The reasons for exaggerations include: inferring causality from observational studies, inferring human outcomes from animal research, inferring distant/end outcomes (e.g., obesity) from immediate/intermediate outcomes (e.g., calorie intake), and generalizing findings to the population from a subgroup or convenience sample. Each headline was paired with the title and abstract of the peer-reviewed journal publication covered by the news article. We drafted an exaggeration-free counterpart for each original headline and fined-tuned a BERT model to differentiate between them. We further fine-tuned three generative language models—BART, PEGASUS, and T5 to autogenerate exaggeration-free headlines based on a journal publication’s title and abstract. Model performance was evaluated using the ROUGE metrics by comparing model-generated headlines with journal publication titles. Findings The fine-tuned BERT model achieved 92.5% accuracy in differentiating between exaggeration-free and original headlines. Baseline ROUGE scores averaged 0.311 for ROUGE-1, 0.113 for ROUGE-2, 0.253 for ROUGE-L, and 0.253 ROUGE-Lsum. PEGASUS, T5, and BART all outperformed the baseline. The best-performing BART model attained 0.447 for ROUGE-1, 0.221 for ROUGE-2, 0.402 for ROUGE-L, and 0.402 for ROUGE-Lsum. Originality/value This study demonstrated the feasibility of leveraging AI to automatically identify and correct news headlines exaggerating obesity-related research findings.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of data and information science (Warsaw, Poland)

自引率

0.00%

发文量