Multimodal Machine Learning with Large Language Embedding Model for Polymer Property Prediction

Tianren Zhang* and Dai-Bei Yang
Chemistry of Materials, 2025, 37 (18), 7002–7013. Published 2025-08-31. DOI: 10.1021/acs.chemmater.5c00940

Contemporary large language models (LLMs), such as GPT-4 and Llama, have harnessed extensive computational power and diverse text corpora to achieve remarkable proficiency in interpreting and generating domain-specific content, including in materials science. To leverage the domain knowledge embedded within these models, we propose a simple yet effective multimodal architecture, PolyLLMem. By integrating text embeddings from Llama 3 with molecular structure embeddings from Uni-Mol, PolyLLMem enables accurate prediction of polymer properties. Low-Rank Adaptation (LoRA) layers were integrated into the model during the property prediction stage to adapt the Llama 3 and Uni-Mol embeddings to our limited polymer data set, thereby enhancing their chemical relevance for polymer SMILES representation. This balanced fusion of fine-tuned textual and structural information enables PolyLLMem to robustly predict a variety of polymer properties despite the scarcity of training data. Its performance is comparable to, and in some cases exceeds, that of graph-based or transformer-based models that typically require pretraining on millions of polymer samples. These findings demonstrate that LLMs, such as Llama, can effectively capture chemical information encoded in polymer SMILES (PSMILES), and underscore the efficacy of multimodal fusion of LLM embeddings and molecular structure embeddings in overcoming data scarcity and accelerating the discovery of advanced polymeric materials.
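The core idea, projecting a frozen text embedding and a frozen structure embedding through LoRA-adapted linear maps, then fusing them for property regression, can be illustrated with a minimal sketch. All dimensions, the LoRA rank, and the concatenation-plus-linear-head design below are illustrative assumptions, not details taken from the paper; random vectors stand in for the real Llama 3 and Uni-Mol encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings standing in for the real encoders: a Llama 3
# text embedding of a PSMILES string and a Uni-Mol structure embedding.
# The sizes (4096 and 512) are illustrative only.
text_emb = rng.standard_normal(4096)
struct_emb = rng.standard_normal(512)

def lora_linear(x, W, A, B, alpha=8):
    """Frozen linear map W plus a trainable low-rank update (B @ A),
    scaled by alpha/r as in standard LoRA. A: (r, d_in), B: (d_out, r)."""
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

d_proj = 256  # shared projection width (assumption)
r = 8         # LoRA rank (assumption)

# Frozen projection weights plus low-rank factors for each modality.
# B starts at zero, so the adapters initially leave the projections unchanged.
W_t = rng.standard_normal((d_proj, text_emb.size)) / np.sqrt(text_emb.size)
A_t = rng.standard_normal((r, text_emb.size)) * 0.01
B_t = np.zeros((d_proj, r))

W_s = rng.standard_normal((d_proj, struct_emb.size)) / np.sqrt(struct_emb.size)
A_s = rng.standard_normal((r, struct_emb.size)) * 0.01
B_s = np.zeros((d_proj, r))

# Adapt each embedding, fuse by concatenation, and regress a scalar property.
h_text = lora_linear(text_emb, W_t, A_t, B_t)
h_struct = lora_linear(struct_emb, W_s, A_s, B_s)
fused = np.concatenate([h_text, h_struct])  # (2 * d_proj,)

w_head = rng.standard_normal(fused.size) / np.sqrt(fused.size)
prediction = float(w_head @ fused)  # e.g., a glass-transition temperature
print(fused.shape, prediction)
```

In training, only the low-rank factors (A, B) and the head would be updated, which is what lets a small polymer data set adapt embeddings from models pretrained at a much larger scale.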
About the journal:
The journal Chemistry of Materials focuses on publishing original research at the intersection of materials science and chemistry. The studies published in the journal involve chemistry as a prominent component and explore topics such as the design, synthesis, characterization, processing, understanding, and application of functional or potentially functional materials. The journal covers various areas of interest, including inorganic and organic solid-state chemistry, nanomaterials, biomaterials, thin films and polymers, and composite/hybrid materials. The journal particularly seeks papers that highlight the creation or development of innovative materials with novel optical, electrical, magnetic, catalytic, or mechanical properties. It is essential that manuscripts on these topics have a primary focus on the chemistry of materials and represent a significant advancement compared to prior research. Before external reviews are sought, submitted manuscripts undergo a review process by a minimum of two editors to ensure their appropriateness for the journal and the presence of sufficient evidence of a significant advance that will be of broad interest to the materials chemistry community.