Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis

Proceedings of the AAAI Symposium Series. Publication date: 2024-05-20. DOI: 10.1609/aaaiss.v3i1.31205

Sagar Srinivas Sakhinana, Geethan Sannidhi, Venkataramana Runkana

Abstract

We present a novel framework for analyzing and interpreting electron microscopy images in semiconductor manufacturing using vision-language instruction tuning. The framework takes a teacher-student approach: pretrained multimodal large language models such as GPT-4 generate instruction-following data for zero-shot visual question answering (VQA) and classification tasks, and this data is used to customize smaller multimodal models (SMMs) for microscopy image analysis. The result is an instruction-tuned language-and-vision assistant. The framework combines knowledge engineering with machine learning to transfer domain-specific expertise from larger to smaller multimodal models within this specialized field, greatly reducing the need for extensive human labeling. Our study presents a secure, cost-effective, and customizable approach for analyzing microscopy images, addressing the challenges of adopting proprietary models in semiconductor manufacturing.
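The teacher-student data generation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the record format, prompts, or model interface, so everything here is an assumption for illustration: the `InstructionRecord` fields, the `build_instruction_data` helper, and the `stub_teacher` function (a placeholder standing in for a call to a large multimodal model) are all hypothetical names, not the authors' implementation.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List


@dataclass
class InstructionRecord:
    """One instruction-following example (hypothetical schema)."""
    image_path: str
    instruction: str
    response: str


def build_instruction_data(
    image_paths: List[str],
    questions: List[str],
    teacher: Callable[[str, str], str],
) -> List[InstructionRecord]:
    """Ask the teacher every question about every image and collect the answers."""
    records = []
    for path in image_paths:
        for question in questions:
            answer = teacher(path, question)
            records.append(InstructionRecord(path, question, answer))
    return records


def stub_teacher(image_path: str, question: str) -> str:
    """Placeholder teacher; a real pipeline would query a large multimodal model here."""
    return f"[teacher answer to {question!r} for {image_path}]"


def to_jsonl(records: List[InstructionRecord]) -> str:
    """Serialize records as JSON Lines, a common format for fine-tuning data."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


if __name__ == "__main__":
    data = build_instruction_data(
        ["micrograph_001.png", "micrograph_002.png"],  # hypothetical file names
        ["What nanomaterial category does this image show?"],
        stub_teacher,
    )
    print(to_jsonl(data))
```

The resulting records would then serve as supervised fine-tuning data for the smaller student model; that training step is omitted because the abstract gives no details about it.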
