Advancing materials science through next-generation machine learning

IF 12.2 · Zone 2 (Materials Science) · JCR Q1 (MATERIALS SCIENCE, MULTIDISCIPLINARY)
Rohit Unni, Mingyuan Zhou, Peter R. Wiecha, Yuebing Zheng
DOI: 10.1016/j.cossms.2024.101157
Journal: Current Opinion in Solid State & Materials Science, volume 30, Article 101157
Publication date: 2024-04-03
Publication type: Journal Article
Open access: no
URL: https://www.sciencedirect.com/science/article/pii/S1359028624000238
Citations: 0

Abstract

For over a decade, machine learning (ML) models have been making strides in computer vision and natural language processing (NLP), demonstrating high proficiency in specialized tasks. The emergence of large-scale language and generative image models, such as ChatGPT and Stable Diffusion, has significantly broadened the accessibility and application scope of these technologies. Traditional predictive models are typically constrained to mapping input data to numerical values or predefined categories, limiting their usefulness beyond their designated tasks. In contrast, contemporary models employ representation learning and generative modeling, enabling them to extract and encode key insights from a wide variety of data sources and decode them to create novel responses for desired goals. They can interpret queries phrased in natural language to deduce the intended output. In parallel, the application of ML techniques in materials science has advanced considerably, particularly in areas like inverse design, material prediction, and atomic modeling. Despite these advancements, the current models are overly specialized, hindering their potential to supplant established industrial processes. Materials science, therefore, necessitates the creation of a comprehensive, versatile model capable of interpreting human-readable inputs, intuiting a wide range of possible search directions, and delivering precise solutions. To realize such a model, the field must adopt cutting-edge representation, generative, and foundation model techniques tailored to materials science. A pivotal component in this endeavor is the establishment of an extensive, centralized dataset encompassing a broad spectrum of research topics. This dataset could be assembled by crowdsourcing global research contributions and developing models to extract data from existing literature and represent them in a homogeneous format. A massive dataset can be used to train a central model that learns the underlying physics of the target areas, which can then be connected to a variety of specialized downstream tasks. Ultimately, the envisioned model would empower users to intuitively pose queries for a wide array of desired outcomes. It would facilitate the search for existing data that closely matches the sought-after solutions and leverage its understanding of physics and material-behavior relationships to innovate new solutions when pre-existing ones fall short.

Source journal
Current Opinion in Solid State & Materials Science (category: Engineering and Technology, Materials Science: Multidisciplinary)
CiteScore: 21.10
Self-citation rate: 3.60%
Articles published: 41
Review time: 47 days
Journal overview: Current Opinion in Solid State & Materials Science aims to provide a snapshot of the latest research and advances in materials science. It publishes six issues per year, each containing reviews covering exciting and developing areas of materials science. Each issue comprises 2-3 sections of reviews commissioned by international researchers who are experts in their fields. The journal gives materials scientists the opportunity to stay informed about current developments in their own and related areas of research, and promotes cross-fertilization of ideas across an increasingly interdisciplinary field.