Annotating Materials Science Text: A Semi-automated Approach for Crafting Outputs with Gemini Pro
Authors: Hasan M. Sayeed, Trupti Mohanty, Taylor D. Sparks
Journal: Integrating Materials and Manufacturing Innovation (impact factor 2.4; JCR Q3, Engineering, Manufacturing)
Published: 2024-05-13
DOI: https://doi.org/10.1007/s40192-024-00356-4
Citation count: 0
Abstract
Recent advancements in large language models (LLMs) have paved the way for automated information extraction in the materials science domain. However, fine-tuning these models, crucial for effective machine learning pipelines in materials science, is hindered by a lack of pre-annotated data. Manual annotation, a laborious process, exacerbates the challenge. To address this, we introduce a tailored semi-automated annotation process, using Google’s Gemini Pro language model. Our approach focuses on two key tasks: extracting information in structured JSON format and generating abstractive summaries from materials science texts. The collaborative process, a symbiotic effort between human annotators and the LLM, driven by structured prompts and user-guided examples, enhances the annotation quality and augments the LLM’s capacity to comprehend materials science intricacies. Importantly, it streamlines human annotation efforts by leveraging the LLM’s proficient starting point.
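The human-in-the-loop workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names in `REQUIRED_KEYS`, the prompt wording, and the helper names `build_prompt` and `draft_annotation` are hypothetical, and the model is passed in as a plain callable rather than through any specific Gemini client call. The sketch assembles a structured prompt from user-guided examples, asks the model for a draft, and checks whether the draft is well-formed JSON before a human annotator corrects it.

```python
import json
from typing import Callable

# Hypothetical schema: these field names are illustrative, not taken from the paper.
REQUIRED_KEYS = ("material", "property", "value")


def build_prompt(text: str, examples: list[tuple[str, dict]]) -> str:
    """Assemble a structured few-shot prompt from user-guided examples."""
    parts = ["Extract the fields " + ", ".join(REQUIRED_KEYS) + " as a JSON object."]
    for source, annotation in examples:
        parts.append("Text: " + source)
        parts.append("JSON: " + json.dumps(annotation))
    parts.append("Text: " + text)
    parts.append("JSON:")
    return "\n".join(parts)


def draft_annotation(
    text: str,
    examples: list[tuple[str, dict]],
    llm: Callable[[str], str],
) -> dict:
    """Request a draft annotation and record whether it is well-formed.

    `llm` is any function mapping a prompt string to a response string;
    wiring it to a real model is left to the caller.
    """
    raw = llm(build_prompt(text, examples))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Unparseable output: the annotator starts from the raw text instead.
        return {"annotation": None, "valid": False, "raw": raw}
    if not isinstance(parsed, dict):
        return {"annotation": None, "valid": False, "raw": raw}
    missing = [key for key in REQUIRED_KEYS if key not in parsed]
    return {"annotation": parsed, "valid": not missing, "raw": raw}
```

In this sketch the draft is only a starting point: a human annotator still reviews each result, and corrected annotations can be appended to `examples` so that later prompts carry better guidance.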
About the journal:
The journal will publish: Research that supports building a model-based definition of materials and processes that is compatible with model-based engineering design processes and multidisciplinary design optimization; Descriptions of novel experimental or computational tools or data analysis techniques, and their application, that are to be used for ICME; Best practices in verification and validation of computational tools, sensitivity analysis, uncertainty quantification, and data management, as well as standards and protocols for software integration and exchange of data; In-depth descriptions of data, databases, and database tools; Detailed case studies on efforts, and their impact, that integrate experiment and computation to solve an enduring engineering problem in materials and manufacturing.