{"title":"注释材料科学文本:使用 Gemini Pro 制作输出结果的半自动方法","authors":"Hasan M. Sayeed, Trupti Mohanty, Taylor D. Sparks","doi":"10.1007/s40192-024-00356-4","DOIUrl":null,"url":null,"abstract":"<p>Recent advancements in large language models (LLMs) have paved the way for automated information extraction in the materials science domain. However, fine-tuning these models, crucial for effective machine learning pipelines in materials science, is hindered by a lack of pre-annotated data. Manual annotation, a laborious process, exacerbates the challenge. To address this, we introduce a tailored semi-automated annotation process, using Google’s Gemini Pro language model. Our approach focuses on two key tasks: extracting information in structured JSON format and generating abstractive summaries from materials science texts. The collaborative process, a symbiotic effort between human annotators and the LLM, driven by structured prompts and user-guided examples, enhances the annotation quality and augments the LLM’s capacity to comprehend materials science intricacies. Importantly, it streamlines human annotation efforts by leveraging the LLM’s proficient starting point.</p>","PeriodicalId":13604,"journal":{"name":"Integrating Materials and Manufacturing Innovation","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Annotating Materials Science Text: A Semi-automated Approach for Crafting Outputs with Gemini Pro\",\"authors\":\"Hasan M. Sayeed, Trupti Mohanty, Taylor D. Sparks\",\"doi\":\"10.1007/s40192-024-00356-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Recent advancements in large language models (LLMs) have paved the way for automated information extraction in the materials science domain. However, fine-tuning these models, crucial for effective machine learning pipelines in materials science, is hindered by a lack of pre-annotated data. Manual annotation, a laborious process, exacerbates the challenge. To address this, we introduce a tailored semi-automated annotation process, using Google’s Gemini Pro language model. Our approach focuses on two key tasks: extracting information in structured JSON format and generating abstractive summaries from materials science texts. The collaborative process, a symbiotic effort between human annotators and the LLM, driven by structured prompts and user-guided examples, enhances the annotation quality and augments the LLM’s capacity to comprehend materials science intricacies. Importantly, it streamlines human annotation efforts by leveraging the LLM’s proficient starting point.</p>\",\"PeriodicalId\":13604,\"journal\":{\"name\":\"Integrating Materials and Manufacturing Innovation\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-05-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Integrating Materials and Manufacturing Innovation\",\"FirstCategoryId\":\"88\",\"ListUrlMain\":\"https://doi.org/10.1007/s40192-024-00356-4\",\"RegionNum\":3,\"RegionCategory\":\"材料科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, MANUFACTURING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Integrating Materials and Manufacturing Innovation","FirstCategoryId":"88","ListUrlMain":"https://doi.org/10.1007/s40192-024-00356-4","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, MANUFACTURING","Score":null,"Total":0}
Annotating Materials Science Text: A Semi-automated Approach for Crafting Outputs with Gemini Pro
Recent advancements in large language models (LLMs) have paved the way for automated information extraction in the materials science domain. However, fine-tuning these models, crucial for effective machine learning pipelines in materials science, is hindered by a lack of pre-annotated data. Manual annotation, a laborious process, exacerbates the challenge. To address this, we introduce a tailored semi-automated annotation process, using Google’s Gemini Pro language model. Our approach focuses on two key tasks: extracting information in structured JSON format and generating abstractive summaries from materials science texts. The collaborative process, a symbiotic effort between human annotators and the LLM, driven by structured prompts and user-guided examples, enhances the annotation quality and augments the LLM’s capacity to comprehend materials science intricacies. Importantly, it streamlines human annotation efforts by leveraging the LLM’s proficient starting point.
期刊介绍:
The journal will publish: Research that supports building a model-based definition of materials and processes that is compatible with model-based engineering design processes and multidisciplinary design optimization; Descriptions of novel experimental or computational tools or data analysis techniques, and their application, that are to be used for ICME; Best practices in verification and validation of computational tools, sensitivity analysis, uncertainty quantification, and data management, as well as standards and protocols for software integration and exchange of data; In-depth descriptions of data, databases, and database tools; Detailed case studies on efforts, and their impact, that integrate experiment and computation to solve an enduring engineering problem in materials and manufacturing.