Digital discovery: latest articles

Learning on compressed molecular representations
IF 6.2
Digital discovery Pub Date : 2024-11-04 DOI: 10.1039/D4DD00162A
Jan Weinreich and Daniel Probst
Abstract: Last year, a preprint gained notoriety, proposing that a k-nearest neighbour classifier is able to outperform large language models using compressed text as input and normalised compression distance (NCD) as a metric. In chemistry and biochemistry, molecules are often represented as strings, such as SMILES for small molecules or single-letter amino acid sequences for proteins. Here, we extend the previously introduced approach with support for regression and multitask classification and subsequently apply it to the prediction of molecular properties and protein–ligand binding affinities. We further propose converting numerical descriptors into string representations, enabling the integration of text input with domain-informed numerical descriptors. Finally, we show that the method can achieve performance competitive with chemical fingerprint- and GNN-based methodologies in general, and perform better than comparable methods on quantum chemistry and protein–ligand binding affinity prediction tasks.
Volume 1, pages 84-92. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00162a?page=search
Citations: 0
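The compression-based approach summarised in the abstract of "Learning on compressed molecular representations" can be illustrated with a minimal Python sketch. It uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as compressor, the function names, and the simple majority vote and mean used for classification and regression are assumptions made for illustration, not the authors' implementation.

```python
import zlib
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the zlib-compressed string (compressor choice is an assumption)."""
    return len(zlib.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalised compression distance between two strings, e.g. two SMILES."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def nearest_neighbours(query: str, train: list, k: int) -> list:
    """Return the k (string, label) pairs closest to the query under NCD."""
    return sorted(train, key=lambda item: ncd(query, item[0]))[:k]


def knn_classify(query: str, train: list, k: int = 3):
    """Majority vote over the k nearest training strings."""
    votes = Counter(label for _, label in nearest_neighbours(query, train, k))
    return votes.most_common(1)[0][0]


def knn_regress(query: str, train: list, k: int = 3) -> float:
    """Unweighted mean of the target values of the k nearest training strings."""
    nearest = nearest_neighbours(query, train, k)
    return sum(value for _, value in nearest) / len(nearest)
```

Here `train` is a list of `(string, label)` pairs; the same distance function serves both tasks, which is what makes extending the classifier to regression straightforward in this sketch.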
ArcaNN: automated enhanced sampling generation of training sets for chemically reactive machine learning interatomic potentials
IF 6.2
Digital discovery Pub Date : 2024-10-30 DOI: 10.1039/D4DD00209A
Rolf David, Miguel de la Puente, Axel Gomez, Olaia Anton, Guillaume Stirnemann and Damien Laage
Abstract: The emergence of artificial intelligence is profoundly impacting computational chemistry, particularly through machine-learning interatomic potentials (MLIPs). Unlike traditional potential energy surface representations, MLIPs overcome the conventional computational scaling limitations by offering an effective combination of accuracy and efficiency for calculating atomic energies and forces to be used in molecular simulations. These MLIPs have significantly enhanced molecular simulations across various applications, including large-scale simulations of materials, interfaces, chemical reactions, and beyond. Despite these advances, the construction of training datasets, a critical component for the accuracy of MLIPs, has not received proportional attention, especially in the context of chemical reactivity, which depends on rare barrier-crossing events that are not easily included in the datasets. Here we address this gap by introducing ArcaNN, a comprehensive framework designed for generating training datasets for reactive MLIPs. ArcaNN employs a concurrent learning approach combined with advanced sampling techniques to ensure an accurate representation of high-energy geometries. The framework integrates automated processes for iterative training, exploration, new configuration selection, and energy and force labeling, all while ensuring reproducibility and documentation. We demonstrate ArcaNN's capabilities through two paradigm reactions: a nucleophilic substitution and a Diels–Alder reaction. These examples showcase its effectiveness, the uniformly low error of the resulting MLIP everywhere along the chemical reaction coordinate, and its potential for broad applications in reactive molecular dynamics. Finally, we provide guidelines for assessing the quality of MLIPs in reactive systems.
Volume 1, pages 54-72. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11563209/pdf/
Citations: 0
Navigating the Maize: cyclic and conditional computational graphs for molecular simulation
IF 6.2
Digital discovery Pub Date : 2024-10-28 DOI: 10.1039/D4DD00288A
Thomas Löhr, Michele Assante, Michael Dodds, Lili Cao, Mikhail Kabeshov, Jon-Paul Janet, Marco Klähn and Ola Engkvist
Abstract: Many computational chemistry and molecular simulation workflows can be expressed as graphs. This abstraction is useful to modularize and potentially reuse existing components, as well as provide parallelization and ease reproducibility. Existing tools represent the computation as a directed acyclic graph (DAG), thus allowing efficient execution by parallelization of concurrent branches. These systems can, however, generally not express cyclic and conditional workflows. We therefore developed Maize, a workflow manager for cyclic and conditional graphs based on the principles of flow-based programming. By running each node of the graph concurrently in separate processes and allowing communication at any time through dedicated inter-node channels, arbitrary graph structures can be executed. We demonstrate the effectiveness of the tool on a dynamic active learning task in computational drug design, involving the use of a small molecule generative model and an associated scoring system, and on a reactivity prediction pipeline using quantum-chemistry and semiempirical approaches.
Volume 12, pages 2551-2559. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00288a?page=search
Citations: 0
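The flow-based idea described in the Maize abstract (nodes that run concurrently and exchange data through channels, so that cycles and conditional branches become possible) can be sketched with the Python standard library alone. This sketch does not use Maize's API: the node functions, channel names, and the toy "increment until a target is reached" task are hypothetical, and threads stand in for the separate processes mentioned in the abstract to keep the example short.

```python
import queue
import threading


def refine_node(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Increment each incoming value and pass it on; stop on a None sentinel."""
    while True:
        value = inbox.get()
        if value is None:
            break
        outbox.put(value + 1)


def check_node(inbox: queue.Queue, back_edge: queue.Queue,
               results: queue.Queue, target: int) -> None:
    """Conditional branch: send the value back around the cycle until it reaches the target."""
    while True:
        value = inbox.get()
        if value >= target:
            results.put(value)      # exit branch of the conditional
            back_edge.put(None)     # tell the refine node to shut down
            break
        back_edge.put(value)        # cyclic edge back to the refine node


def run_graph(start: int, target: int) -> int:
    """Wire the two nodes into a cycle, feed in a start value, and return the result."""
    to_refine, to_check, results = queue.Queue(), queue.Queue(), queue.Queue()
    nodes = [
        threading.Thread(target=refine_node, args=(to_refine, to_check)),
        threading.Thread(target=check_node,
                         args=(to_check, to_refine, results, target)),
    ]
    for node in nodes:
        node.start()
    to_refine.put(start)
    for node in nodes:
        node.join()
    return results.get()
```

A DAG scheduler could not express this loop directly, because the edge from the checking node back to the refining node closes a cycle; with always-running nodes and channels, the loop is just another connection.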
Discrete and mixed-variable experimental design with surrogate-based approach
IF 6.2
Digital discovery Pub Date : 2024-10-28 DOI: 10.1039/D4DD00113C
Mengjia Zhu, Austin Mroz, Lingfeng Gui, Kim E. Jelfs, Alberto Bemporad, Ehecatl Antonio del Río Chanona and Ye Seol Lee
Abstract: Experimental design plays an important role in efficiently acquiring informative data for system characterization and deriving robust conclusions under resource limitations. Recent advancements in high-throughput experimentation coupled with machine learning have notably improved experimental procedures. While Bayesian optimization (BO) has undeniably revolutionized the landscape of optimization in experimental design, especially in the chemical domain, it is important to recognize the role of other surrogate-based approaches in conventional chemistry optimization problems. This is particularly relevant for chemical problems involving mixed-variable design space with mixed-variable physical constraints, where conventional BO approaches struggle to obtain feasible samples during the acquisition step while maintaining exploration capability. In this paper, we demonstrate that integrating mixed-integer optimization strategies is one way to address these challenges effectively. Specifically, we propose the utilization of mixed-integer surrogates and acquisition functions, methods that offer inherent compatibility with problems with discrete and mixed-variable design space. This work focuses on piecewise affine surrogate-based optimization (PWAS), a surrogate model capable of handling medium-sized mixed-variable problems (up to around 100 variables after encoding) subject to known linear constraints. We demonstrate the effectiveness of this approach in optimizing experimental planning through three case studies. By benchmarking PWAS against state-of-the-art optimization algorithms, including genetic algorithms and BO variants, we offer insights into the practical applicability of mixed-integer surrogates, with emphasis on problems subject to known discrete/mixed-variable linear constraints.
Volume 12, pages 2589-2606. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00113c?page=search
Citations: 0
Agent-based learning of materials datasets from the scientific literature
IF 6.2
Digital discovery Pub Date : 2024-10-28 DOI: 10.1039/D4DD00252K
Mehrad Ansari and Seyed Mohamad Moosavi
Abstract: Advancements in machine learning and artificial intelligence are transforming the discovery of materials. While the vast corpus of scientific literature presents a valuable and rich resource of experimental data that can be used for training machine learning models, the availability and accessibility of these data remains a bottleneck. Accessing these data by manual dataset creation is limited due to issues in maintaining quality and consistency, scalability limitations, and the risk of human error and bias. Therefore, in this work, we develop a chemist AI agent, powered by large language models (LLMs), to overcome these limitations by autonomously creating structured datasets from natural language text, ranging from sentences and paragraphs to extensive scientific research articles, and extracting guidelines for designing materials with desired properties. Our chemist AI agent, Eunomia, can plan and execute actions by leveraging the existing knowledge from decades of scientific research articles, scientists, the Internet and other tools altogether. We benchmark the performance of our approach in three different information extraction tasks with various levels of complexity, including solid-state impurity doping, metal–organic framework (MOF) chemical formula, and property relationships. Our results demonstrate that our zero-shot agent, with the appropriate tools, is capable of attaining performance that is either superior or comparable to the state-of-the-art fine-tuned material information extraction methods. This approach simplifies compilation of machine learning-ready datasets for the discovery of various materials, and makes advanced natural language processing tools considerably more accessible to novice users. The methodology in this work is developed as open-source software on https://github.com/AI4ChemS/Eunomia.
Volume 12, pages 2607-2617. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00252k?page=search
Citations: 0
Unsupervised learning and pattern recognition in alloy design
IF 6.2
Digital discovery Pub Date : 2024-10-28 DOI: 10.1039/D4DD00282B
Ninad Bhat, Nick Birbilis and Amanda S. Barnard
Abstract: Machine learning has the potential to revolutionise alloy design by uncovering useful patterns in complex datasets and supplementing human expertise and experience. This review examines the role of unsupervised learning methods, including clustering, dimensionality reduction, and manifold learning, in the context of alloy design. While the use of unsupervised learning in alloy design is still in its early stages, these techniques offer new ways to analyse high-dimensional alloy data, uncovering structures and relationships that are difficult to detect with traditional methods. Using unsupervised learning, researchers can identify specific groups within alloy data sets that are not simple partitions based on metal compositions, which can help optimise and develop new alloys with customised properties. Incorporating these data-driven methods into alloy design speeds up the discovery process and reveals new connections that were not previously understood, significantly contributing to innovation in materials science. This review outlines the key scientific progress and future possibilities for using unsupervised machine learning in alloy design.
Volume 12, pages 2396-2416. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00282b?page=search
Citations: 0
Combining Hammett σ constants for Δ-machine learning and catalyst discovery
IF 6.2
Digital discovery Pub Date : 2024-10-23 DOI: 10.1039/D4DD00228H
V. Diana Rakotonirina, Marco Bragato, Stefan Heinen and O. Anatole von Lilienfeld
Abstract: We study the applicability of the Hammett-inspired product (HIP) Ansatz to model relative substrate binding within homogeneous organometallic catalysis, assigning σ and ρ to ligands and metals, respectively. Implementing an additive combination (c) rule for obtaining σ constants for any ligand pair combination results in a cHIP model that enhances data efficiency in computational ligand tuning. We show its usefulness (i) as a baseline for Δ-machine learning (ML), and (ii) to identify novel catalyst candidates via volcano plots. After testing the combination rule on Hammett constants previously published in the literature, we have generated numerical evidence for the Suzuki–Miyaura (SM) C–C cross-coupling reaction using two synthetic datasets of metallic catalysts (including the group 10 and group 11 metals Ni, Pd, Pt and Cu, Ag, Au, as well as 96 ligands such as N-heterocyclic carbenes, phosphines, or pyridines). When used as a baseline, Δ-ML prediction errors of relative binding decrease systematically with training set size and reach chemical accuracy (∼1 kcal mol⁻¹) for 20k training instances. Employing the individual ligand constants obtained from cHIP, we report relative substrate binding for a novel dataset consisting of 720 catalysts (not part of training data), of which 145 fall into the most promising range on the volcano plot accounting for oxidative addition, transmetalation, and reductive elimination steps. Multiple Ni-based catalysts, e.g. Aphos-Ni-P(t-Bu)₃, are included among these promising candidates, potentially offering dramatic cost savings in experimental applications.
Volume 12, pages 2487-2496. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00228h?page=search
Citations: 0
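One possible reading of the cHIP baseline and its use in Δ-machine learning, as described in the Hammett abstract above, is sketched below in Python. The product form ρ·σ with an additive σ for a ligand pair follows the abstract's wording; everything else is an assumption for illustration. In particular, the "model" here is only the mean residual of the baseline, a deliberately trivial stand-in for a real regressor, and all function names and numbers are hypothetical.

```python
def chip_baseline(rho_metal: float, sigma_a: float, sigma_b: float) -> float:
    """Product Ansatz with additive ligand combination: rho * (sigma_a + sigma_b)."""
    return rho_metal * (sigma_a + sigma_b)


def fit_mean_residual(baselines: list, references: list) -> float:
    """Stand-in for the ML step: learn the average error of the baseline.

    A real delta-ML model would learn a residual that depends on the catalyst;
    a constant offset is used here only to keep the sketch self-contained.
    """
    residuals = [ref - base for base, ref in zip(baselines, references)]
    return sum(residuals) / len(residuals)


def delta_predict(baseline: float, correction: float) -> float:
    """Delta-ML prediction: cheap baseline plus learned correction."""
    return baseline + correction


# Hypothetical numbers, chosen only to show the data flow.
base = chip_baseline(rho_metal=2.0, sigma_a=0.25, sigma_b=0.5)
offset = fit_mean_residual([1.0, 2.0], [1.5, 2.5])
prediction = delta_predict(base, offset)
```

The design point is that the learned part only has to capture what the baseline misses, which is why a physically motivated baseline can reduce the amount of training data needed.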
From text to test: AI-generated control software for materials science instruments
IF 6.2
Digital discovery Pub Date : 2024-10-23 DOI: 10.1039/D4DD00143E
Davi Fébba, Kingsley Egbo, William A. Callahan and Andriy Zakutayev
Abstract: Large language models (LLMs) are one of the AI technologies that are transforming the landscape of chemistry and materials science. Recent examples of LLM-accelerated experimental research include virtual assistants for parsing synthesis recipes from the literature, or using the extracted knowledge to guide synthesis and characterization. However, these AI-driven materials advances are limited to a few laboratories with existing automated instruments and control software, whereas the rest of materials science research remains highly manual. AI-crafted control code for automating scientific instruments would democratize and further accelerate materials research advances, but reports of such AI applications remain scarce. The goal of this manuscript is to demonstrate how to swiftly establish a Python-based control module for a scientific measurement instrument solely through interactions with ChatGPT-4. Through a series of test and correction cycles, we achieved successful management of a common Keithley 2400 electrical source measure unit instrument with minimal human-corrected code, and discussed lessons learned from this development approach for scientific software. Additionally, a user-friendly graphical user interface (GUI) was created, effectively linking all instrument controls to interactive screen elements, and text prompts as well as JSON templates for interaction with ChatGPT are provided for this and other instruments. Finally, we integrated this AI-crafted instrument control software with a high-performance stochastic optimization algorithm to facilitate rapid and automated extraction of electronic device parameters related to semiconductor charge transport mechanisms from current–voltage (IV) measurement data. This integration resulted in a comprehensive open-source toolkit for semiconductor device characterization and analysis using IV curve measurements. We demonstrate the application of these tools by acquiring, analyzing and parameterizing IV data from a Pt/Cr₂O₃:Mg/β-Ga₂O₃ heterojunction diode, a novel stack for high-power and high-temperature electronic devices. This approach underscores the powerful synergy between LLMs and the development of instruments for scientific inquiry, showcasing a path to further accelerate research progress towards synthesis and characterization in materials science.
Volume 1, pages 35-45. PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00143e?page=search
Citations: 0
Accelerating metal–organic framework discovery via synthesisability prediction: the MFD evaluation method for one-class classification models
IF 6.2
Digital discovery Pub Date : 2024-10-22 DOI: 10.1039/D4DD00161C
Chi Zhang, Dmytro Antypov, Matthew J. Rosseinsky and Matthew S. Dyer
Abstract: Machine learning has found wide application in the materials field, particularly in discovering structure–property relationships. However, its potential in predicting synthetic accessibility of materials remains relatively unexplored due to the lack of negative data. In this study, we employ several one-class classification (OCC) approaches to accelerate the development of novel metal–organic framework materials by predicting their synthesisability. The evaluation of OCC model performance poses challenges, as traditional evaluation metrics are not applicable when dealing with a single type of data. To overcome this limitation, we introduce a quantitative approach, the maximum fraction difference (MFD) method, to assess and compare model performance, as well as determine optimal thresholds for effectively distinguishing between positives and negatives. A DeepSVDD model with superior predictive capability is proposed. By combining assessment of synthetic viability with porosity prediction models, a list of 3453 unreported combinations is generated and characterised by predictions of high synthesisability and large pore size. The MFD methodology proposed in this study is intended to provide an effective complementary assessment method for addressing the inherent challenges in evaluating OCC models. The research process, developed models, and predicted results of this study are aimed at helping prioritisation of materials for synthesis.
Volume 12, pages 2509-2522. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00161c?page=search
Citations: 0
Machine learning for analyzing atomic force microscopy (AFM) images generated from polymer blends
IF 6.2
Digital discovery Pub Date : 2024-10-21 DOI: 10.1039/D4DD00215F
Aanish Paruchuri, Yunfei Wang, Xiaodan Gu and Arthi Jayaraman
Abstract: In this paper, we present a new machine learning (ML) workflow with unsupervised learning techniques to identify domains within atomic force microscopy (AFM) images obtained from polymer films. The goal of the workflow is to (i) identify the spatial location of two types of polymer domains with little to no manual intervention (Task 1) and (ii) calculate the domain size distributions, which in turn can help qualify the phase separated state of the material as macrophase or microphase ordered/disordered domains (Task 2). We briefly review existing approaches used in other fields (computer vision and signal processing) that can be applicable to the above tasks frequently encountered in the field of polymer science and engineering. We then test these approaches from computer vision and signal processing on the AFM image dataset to identify the strengths and limitations of each of these approaches for our first task. For our first domain segmentation task, we found that the workflow using discrete Fourier transform (DFT) or discrete cosine transform (DCT) with variance statistics as the feature works the best. The popular ResNet50 deep learning approach from the computer vision field performed worse in the domain segmentation task for our AFM images than the DFT and DCT based workflows. For the second task, for each of the 144 input AFM images, we then used the existing PoreSpy Python package to calculate the domain size distribution from the output of that image from the DFT-based workflow. The information and open-source codes we share in this paper can serve as a guide for researchers in the fields of polymers and soft materials who need ML modeling and workflows for automated analyses of AFM images from polymer samples that may have crystalline/amorphous domains, sharp/rough interfaces between domains, or micro- or macro-phase separated domains.
Volume 12, pages 2533-2550. PDF: https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00215f?page=search
Citations: 0