{"title":"简化学术论文摘要,打造无障碍数字图书馆","authors":"Haining Wang, Jason Clark","doi":"arxiv-2408.03899","DOIUrl":null,"url":null,"abstract":"Standing at the forefront of knowledge dissemination, digital libraries\ncurate vast collections of scientific literature. However, these scholarly\nwritings are often laden with jargon and tailored for domain experts rather\nthan the general public. As librarians, we strive to offer services to a\ndiverse audience, including those with lower reading levels. To extend our\nservices beyond mere access, we propose fine-tuning a language model to rewrite\nscholarly abstracts into more comprehensible versions, thereby making scholarly\nliterature more accessible when requested. We began by introducing a corpus\nspecifically designed for training models to simplify scholarly abstracts. This\ncorpus consists of over three thousand pairs of abstracts and significance\nstatements from diverse disciplines. We then fine-tuned four language models\nusing this corpus. The outputs from the models were subsequently examined both\nquantitatively for accessibility and semantic coherence, and qualitatively for\nlanguage quality, faithfulness, and completeness. Our findings show that the\nresulting models can improve readability by over three grade levels, while\nmaintaining fidelity to the original content. Although commercial\nstate-of-the-art models still hold an edge, our models are much more compact,\ncan be deployed locally in an affordable manner, and alleviate the privacy\nconcerns associated with using commercial models. We envision this work as a\nstep toward more inclusive and accessible libraries, improving our services for\nyoung readers and those without a college degree.","PeriodicalId":501285,"journal":{"name":"arXiv - CS - Digital Libraries","volume":"192 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Simplifying Scholarly Abstracts for Accessible Digital Libraries\",\"authors\":\"Haining Wang, Jason Clark\",\"doi\":\"arxiv-2408.03899\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Standing at the forefront of knowledge dissemination, digital libraries\\ncurate vast collections of scientific literature. However, these scholarly\\nwritings are often laden with jargon and tailored for domain experts rather\\nthan the general public. As librarians, we strive to offer services to a\\ndiverse audience, including those with lower reading levels. To extend our\\nservices beyond mere access, we propose fine-tuning a language model to rewrite\\nscholarly abstracts into more comprehensible versions, thereby making scholarly\\nliterature more accessible when requested. We began by introducing a corpus\\nspecifically designed for training models to simplify scholarly abstracts. This\\ncorpus consists of over three thousand pairs of abstracts and significance\\nstatements from diverse disciplines. We then fine-tuned four language models\\nusing this corpus. The outputs from the models were subsequently examined both\\nquantitatively for accessibility and semantic coherence, and qualitatively for\\nlanguage quality, faithfulness, and completeness. Our findings show that the\\nresulting models can improve readability by over three grade levels, while\\nmaintaining fidelity to the original content. 
Although commercial\\nstate-of-the-art models still hold an edge, our models are much more compact,\\ncan be deployed locally in an affordable manner, and alleviate the privacy\\nconcerns associated with using commercial models. We envision this work as a\\nstep toward more inclusive and accessible libraries, improving our services for\\nyoung readers and those without a college degree.\",\"PeriodicalId\":501285,\"journal\":{\"name\":\"arXiv - CS - Digital Libraries\",\"volume\":\"192 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Digital Libraries\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.03899\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03899","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Simplifying Scholarly Abstracts for Accessible Digital Libraries
Standing at the forefront of knowledge dissemination, digital libraries
curate vast collections of scientific literature. However, these scholarly
writings are often laden with jargon and tailored for domain experts rather
than the general public. As librarians, we strive to offer services to a
diverse audience, including those with lower reading levels. To extend our
services beyond mere access, we propose fine-tuning a language model to rewrite
scholarly abstracts into more comprehensible versions, thereby making scholarly
literature more accessible when requested. We began by introducing a corpus
specifically designed for training models to simplify scholarly abstracts. This
corpus consists of over three thousand pairs of abstracts and significance
statements from diverse disciplines. We then fine-tuned four language models
using this corpus.
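As a rough illustration only (the abstract does not specify the model families, data format, or hyperparameters used), the sketch below shows how a pair-wise corpus of abstracts and plain-language significance statements might be used to fine-tune a compact sequence-to-sequence model with Hugging Face Transformers. The file name, base checkpoint, and training settings are assumptions, not the paper's configuration.

```python
# Illustrative sketch only: the paper's actual models, corpus format, and
# hyperparameters are not given in the abstract. The JSONL file name, the
# base checkpoint, and all settings below are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "google/flan-t5-base"  # hypothetical compact base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical corpus: one {"abstract": ..., "significance": ...} pair per line.
data = load_dataset("json", data_files="abstract_significance_pairs.jsonl")["train"]
data = data.train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # The scholarly abstract is the input; the plain-language significance
    # statement serves as the simplification target.
    inputs = tokenizer(batch["abstract"], max_length=1024, truncation=True)
    targets = tokenizer(text_target=batch["significance"], max_length=512, truncation=True)
    inputs["labels"] = targets["input_ids"]
    return inputs

tokenized = data.map(preprocess, batched=True,
                     remove_columns=data["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="simplifier",
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    predict_with_generate=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```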
The outputs from the models were subsequently examined both quantitatively for
accessibility and semantic coherence, and qualitatively for language quality,
faithfulness, and completeness.
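The abstract does not name the specific automatic measures. A common readability proxy is the Flesch-Kincaid grade level, and cosine similarity between sentence embeddings is a common proxy for semantic preservation; the sketch below, under those assumptions, shows one way such quantitative checks could be wired up. The embedding model named here is a stand-in.

```python
# Illustrative metrics only: Flesch-Kincaid grade level and embedding cosine
# similarity are assumed stand-ins for the readability and coherence measures,
# which the abstract does not name.
import textstat
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical embedding model

def evaluate_simplification(original: str, simplified: str) -> dict:
    # Readability: a lower grade level means the text is easier to read,
    # so a positive drop indicates improved accessibility.
    grade_drop = (textstat.flesch_kincaid_grade(original)
                  - textstat.flesch_kincaid_grade(simplified))
    # Semantic coherence: cosine similarity between embeddings of the two texts.
    emb = embedder.encode([original, simplified], convert_to_tensor=True)
    similarity = util.cos_sim(emb[0], emb[1]).item()
    return {"grade_level_drop": grade_drop, "semantic_similarity": similarity}

# A drop of three or more grade levels with high similarity would match the
# behaviour reported in the abstract.
print(evaluate_simplification(open("abstract.txt").read(),
                              open("simplified.txt").read()))
```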
Our findings show that the resulting models can improve readability by over
three grade levels, while maintaining fidelity to the original content.
Although commercial state-of-the-art models still hold an edge, our models are
much more compact, can be deployed locally in an affordable manner, and
alleviate the privacy concerns associated with using commercial models.
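Local deployment is what makes the privacy argument concrete: the text being simplified never leaves the institution's own hardware. A minimal sketch of how a locally hosted fine-tuned model might serve on-demand simplification follows; the model directory and generation settings are assumptions.

```python
# Minimal local-inference sketch; the model directory and generation settings
# are assumptions, not the paper's configuration.
from transformers import pipeline

# Load the locally fine-tuned model (e.g., the "simplifier" output directory
# from the training sketch above); no text is sent to an external service.
simplifier = pipeline("text2text-generation", model="simplifier")

def simplify(abstract: str) -> str:
    # Generate a plain-language version of a scholarly abstract on request.
    return simplifier(abstract, max_new_tokens=256, num_beams=4)[0]["generated_text"]
```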
We envision this work as a step toward more inclusive and accessible
libraries, improving our services for young readers and those without a
college degree.