{"title":"<i>LitAI</i>: Enhancing Multimodal Literature Understanding and Mining with Generative AI.","authors":"Gowtham Medisetti, Zacchaeus Compson, Heng Fan, Huaxiao Yang, Yunhe Feng","doi":"10.1109/mipr62202.2024.00080","DOIUrl":null,"url":null,"abstract":"<p><p>Information processing and retrieval in literature are critical for advancing scientific research and knowledge discovery. The inherent multimodality and diverse literature formats, including text, tables, and figures, present significant challenges in literature information retrieval. This paper introduces <i>LitAI</i>, a novel approach that employs readily available generative AI tools to enhance multimodal information retrieval from literature documents. By integrating tools such as optical character recognition (OCR) with generative AI services, <i>LitAI</i> facilitates the retrieval of text, tables, and figures from PDF documents. We have developed specific prompts that leverage in-context learning and prompt engineering within Generative AI to achieve precise information extraction. Our empirical evaluations, conducted on datasets from the ecological and biological sciences, demonstrate the superiority of our approach over several established baselines including Tesseract-OCR and GPT-4. The implementation of <i>LitAI</i> is accessible at https://github.com/ResponsibleAILab/LitAI.</p>","PeriodicalId":520274,"journal":{"name":"Proceedings. IEEE Conference on Multimedia Information Processing and Retrieval","volume":"2024 ","pages":"471-476"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526646/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE Conference on Multimedia Information Processing and Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mipr62202.2024.00080","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/15 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Information processing and retrieval in literature are critical for advancing scientific research and knowledge discovery. The inherent multimodality and diverse literature formats, including text, tables, and figures, present significant challenges in literature information retrieval. This paper introduces LitAI, a novel approach that employs readily available generative AI tools to enhance multimodal information retrieval from literature documents. By integrating tools such as optical character recognition (OCR) with generative AI services, LitAI facilitates the retrieval of text, tables, and figures from PDF documents. We have developed specific prompts that leverage in-context learning and prompt engineering within Generative AI to achieve precise information extraction. Our empirical evaluations, conducted on datasets from the ecological and biological sciences, demonstrate the superiority of our approach over several established baselines including Tesseract-OCR and GPT-4. The implementation of LitAI is accessible at https://github.com/ResponsibleAILab/LitAI.