Simplifying Scholarly Abstracts for Accessible Digital Libraries

Haining Wang, Jason Clark
{"title":"Simplifying Scholarly Abstracts for Accessible Digital Libraries","authors":"Haining Wang, Jason Clark","doi":"arxiv-2408.03899","DOIUrl":null,"url":null,"abstract":"Standing at the forefront of knowledge dissemination, digital libraries\ncurate vast collections of scientific literature. However, these scholarly\nwritings are often laden with jargon and tailored for domain experts rather\nthan the general public. As librarians, we strive to offer services to a\ndiverse audience, including those with lower reading levels. To extend our\nservices beyond mere access, we propose fine-tuning a language model to rewrite\nscholarly abstracts into more comprehensible versions, thereby making scholarly\nliterature more accessible when requested. We began by introducing a corpus\nspecifically designed for training models to simplify scholarly abstracts. This\ncorpus consists of over three thousand pairs of abstracts and significance\nstatements from diverse disciplines. We then fine-tuned four language models\nusing this corpus. The outputs from the models were subsequently examined both\nquantitatively for accessibility and semantic coherence, and qualitatively for\nlanguage quality, faithfulness, and completeness. Our findings show that the\nresulting models can improve readability by over three grade levels, while\nmaintaining fidelity to the original content. Although commercial\nstate-of-the-art models still hold an edge, our models are much more compact,\ncan be deployed locally in an affordable manner, and alleviate the privacy\nconcerns associated with using commercial models. 
We envision this work as a\nstep toward more inclusive and accessible libraries, improving our services for\nyoung readers and those without a college degree.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"192 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03899","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Standing at the forefront of knowledge dissemination, digital libraries curate vast collections of scientific literature. However, these scholarly writings are often laden with jargon and tailored for domain experts rather than the general public. As librarians, we strive to offer services to a diverse audience, including those with lower reading levels. To extend our services beyond mere access, we propose fine-tuning a language model to rewrite scholarly abstracts into more comprehensible versions, thereby making scholarly literature more accessible when requested. We began by introducing a corpus specifically designed for training models to simplify scholarly abstracts. This corpus consists of over three thousand pairs of abstracts and significance statements from diverse disciplines. We then fine-tuned four language models using this corpus. The outputs from the models were subsequently examined both quantitatively for accessibility and semantic coherence, and qualitatively for language quality, faithfulness, and completeness. Our findings show that the resulting models can improve readability by over three grade levels, while maintaining fidelity to the original content. Although commercial state-of-the-art models still hold an edge, our models are much more compact, can be deployed locally in an affordable manner, and alleviate the privacy concerns associated with using commercial models. We envision this work as a step toward more inclusive and accessible libraries, improving our services for young readers and those without a college degree.
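The abstract reports that the fine-tuned models "improve readability by over three grade levels" but does not name the metric used. A common choice for grade-level scoring is the Flesch-Kincaid grade formula; the sketch below is a minimal, self-contained implementation (the syllable counter is a crude vowel-group heuristic, and the sample texts are illustrative, not drawn from the paper's corpus).

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels.
    # Every word is credited with at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Illustrative comparison: jargon-heavy vs. simplified phrasing.
complex_text = ("Digital libraries curate heterogeneous "
                "multidisciplinary scholarly literature.")
simple_text = "Libraries keep many science papers. We make them easy to read."
```

A drop of a few points in this score corresponds to the "grade levels" of improvement the authors describe; production evaluations would typically use an established library rather than this heuristic.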