Build neural network models to identify and correct news headlines exaggerating obesity-related scientific findings
Authors: R. An, Quinlan Batcheller, Junjie Wang, Yuyi Yang
Journal: Journal of Data and Information Science (Warsaw, Poland), volume 8 1, pages 88-97
Publication date: 2023-06-01
Publication type: Journal Article
DOI: 10.2478/jdis-2023-0014 (https://doi.org/10.2478/jdis-2023-0014)
Citation count: 0
Abstract
Purpose: Media exaggerations of health research may confuse readers, erode public trust in science and medicine, and cause disease mismanagement. This study built artificial intelligence (AI) models to automatically identify and correct news headlines that exaggerate obesity-related research findings.

Design/methodology/approach: We searched popular digital media outlets and collected 523 headlines exaggerating obesity-related research findings. The reasons for exaggeration include inferring causality from observational studies, inferring human outcomes from animal research, inferring distant/end outcomes (e.g., obesity) from immediate/intermediate outcomes (e.g., calorie intake), and generalizing findings from a subgroup or convenience sample to the population. Each headline was paired with the title and abstract of the peer-reviewed journal publication covered by the news article. We drafted an exaggeration-free counterpart for each original headline and fine-tuned a BERT model to differentiate between the two. We further fine-tuned three generative language models (BART, PEGASUS, and T5) to autogenerate exaggeration-free headlines from a journal publication’s title and abstract. Model performance was evaluated with the ROUGE metrics by comparing model-generated headlines with journal publication titles.

Findings: The fine-tuned BERT model achieved 92.5% accuracy in differentiating between exaggeration-free and original headlines. Baseline ROUGE scores averaged 0.311 for ROUGE-1, 0.113 for ROUGE-2, 0.253 for ROUGE-L, and 0.253 for ROUGE-Lsum. PEGASUS, T5, and BART all outperformed the baseline. The best-performing BART model attained 0.447 for ROUGE-1, 0.221 for ROUGE-2, 0.402 for ROUGE-L, and 0.402 for ROUGE-Lsum.

Originality/value: This study demonstrated the feasibility of leveraging AI to automatically identify and correct news headlines exaggerating obesity-related research findings.
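
To make the ROUGE comparison between a generated headline and a journal title concrete, the following is a minimal Python sketch of ROUGE-1, ROUGE-2, and ROUGE-L as F1 scores. It is not the authors' evaluation code: the abstract does not say which ROUGE implementation, tokenizer, or stemming option was used, so the lowercase whitespace tokenization, the choice of F1, and the example headline and title below are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    # Harmonic mean of precision and recall; 0.0 when there is no overlap.
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    # Clipped n-gram overlap between candidate and reference.
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Length of the longest common subsequence (row-by-row dynamic programming).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    # ROUGE-L based on the longest common subsequence of tokens.
    cand, ref = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical generated headline and journal title, not taken from the study's data.
headline = "sugary drinks linked to weight gain in adults"
title = "sugary drink intake and weight gain in adults"

print(rouge_n(headline, title, 1))  # 5 shared unigrams out of 8 and 8
print(rouge_n(headline, title, 2))  # 3 shared bigrams out of 7 and 7
print(rouge_l(headline, title))     # longest common subsequence of 5 tokens
```

ROUGE-Lsum applies the ROUGE-L computation sentence by sentence over multi-line summaries, so for single-line texts such as headlines and titles it generally coincides with ROUGE-L. That is consistent with the identical ROUGE-L and ROUGE-Lsum values reported above.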