Sentence-Aligned Simplification of Biomedical Abstracts
Brian Ondov, Dina Demner-Fushman
Artificial Intelligence in Medicine (AIME), LNCS vol. 14844, pp. 322-333
DOI: 10.1007/978-3-031-66538-7_32
Published: 2024 (Epub 2024-07-25)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377474/pdf/
Citations: 0
Abstract
The availability of biomedical abstracts in online databases could improve health literacy and drive more informed choices. However, the technical language of these documents makes them inaccessible to healthcare consumers, causing disengagement, frustration, and potential misuse. In this work, we explore adapting foundation language models to the Plain Language Adaptation of Biomedical Abstracts benchmark. This task is challenging because it requires sentence-by-sentence simplifications, yet entire abstracts must also be simplified cohesively. We present a sentence-wise autoregressive approach and report experiments with this technique in both zero-shot and fine-tuned settings, using both proprietary and open-source models. We also introduce a stochastic regularization technique to encourage recovery from source-copying during autoregressive inference. Our best-performing model achieves a 32-point increase in SARI and a 6-point increase in BERTScore over the reported state-of-the-art. This also surpasses the performance of recent open-domain and biomedical sentence simplification models on this task. Further, in manual evaluation, our models achieve factual accuracy comparable to that of humans, with simplicity approaching human level. Abstracts simplified by these models could unlock a massive source of health information while retaining clear provenance for each statement, enhancing trustworthiness.
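The abstract's two key ideas, sentence-wise autoregressive simplification (each sentence is simplified with the previously simplified sentences as context, so the abstract stays cohesive) and stochastic regularization against source-copying (during training, a context sentence is occasionally replaced by its unsimplified source so the model learns to recover), can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: `simplify_sentence` is a hypothetical stand-in for the actual language-model call, and `copy_prob` is an assumed name for the regularization rate.

```python
import random

# Hypothetical jargon lexicon standing in for a real simplification model.
JARGON = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
}

def simplify_sentence(sentence, context):
    """Stand-in for a foundation-model call that would condition on
    `context` (the previously simplified sentences)."""
    out = sentence
    for term, plain in JARGON.items():
        out = out.replace(term, plain)
    return out

def simplify_abstract(sentences, copy_prob=0.0, rng=None):
    """Sentence-wise autoregressive simplification of a whole abstract.

    With copy_prob > 0 (a training-time sketch of the paper's stochastic
    regularization, as we read it), one context sentence is sometimes
    swapped for its unsimplified source, encouraging the model to recover
    from source-copying rather than propagate it.
    """
    rng = rng or random.Random(0)
    simplified = []
    for sent in sentences:
        context = list(simplified)
        if copy_prob and context and rng.random() < copy_prob:
            j = rng.randrange(len(context))
            context[j] = sentences[j]  # inject a source copy into context
        simplified.append(simplify_sentence(sent, context))
    return simplified
```

At inference time `copy_prob` would be 0; the point of the training-time noise is precisely that the inference-time loop above can drift into copying, and the model should continue simplifying regardless.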