“Twisting” the data: a universal machine-learning approach to classify single-molecule curves and beyond

Impact factor 6.2, JCR Q1 (Chemistry, Multidisciplinary)

Digital discovery, Pub Date: 2025-09-15, DOI: 10.1039/D5DD00207A

C. Roldán-Piñero, M. Teresa González, Pablo M. Olmos, Linda A. Zotti and Edmund Leary

Digital discovery, vol. 10, pp. 3043-3052. Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00207a

Citations: 0

Abstract

We present a new automated supervised procedure trained to classify both conductance-voltage (G(V)) curves and conductance-distance (G(z)) traces generated in single-molecule junctions to a high degree of confidence. Compared to unsupervised methods, our approach, involving a convolutional neural network (CNN), is vastly superior as it allows core shapes to be recognised by ignoring differences in scale and is relatively insensitive to conductance jumps. A key aspect is the transformation of curves into a spiral image map, which allows us to separate various fundamental G(V) and G(z) shapes from datasets containing tens of thousands of curves. Moreover, by using transfer learning, training requires little input data compared to other approaches. This is extremely advantageous as it reduces training time by many orders of magnitude and means the model can be trained on user-selected shapes, including rare types. This contrasts with arbitrary class-assignment, instead basing classification on a sound physical understanding of the system. Furthermore, as there is no minimum class population requirement, our method can be used to find rare events with a high degree of confidence. As an example, we used our procedure to find, with a minimum 66% confidence level, a class of G(V) curves which are parabolic at low bias but flat at high bias. Such curves make up just 4% of the total, and would be very difficult to detect cleanly with unsupervised methods. This gives insights into the electron transport behaviour at high-bias because we can now easily quantify the types of curves present. Thanks to its universality, this opens up new possibilities in general signal processing and the identification of rare and important events.
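The abstract does not say how the spiral image map is built, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes an Archimedean spiral, a 64 x 64 grid, and min-max normalisation as the way to ignore differences in scale. The names `spiral_image` and `accept`, and reading the 66% figure as a cut-off on the network's top class probability, are likewise assumptions made for illustration.

```python
import numpy as np


def spiral_image(curve, size=64, turns=4.0):
    """Paint a 1D curve along an Archimedean spiral on a size x size grid.

    Hypothetical mapping: the spiral type, grid size, and number of turns
    are illustrative choices, not values taken from the paper.
    """
    y = np.asarray(curve, dtype=float)
    span = y.max() - y.min()
    # Min-max normalisation removes the absolute scale of the curve,
    # so only its shape determines the pixel intensities.
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)

    # Position along the curve (0 to 1) sets both the angle and the radius.
    t = np.linspace(0.0, 1.0, y.size)
    theta = 2.0 * np.pi * turns * t
    centre = (size - 1) / 2.0
    radius = t * centre

    cols = np.rint(centre + radius * np.cos(theta)).astype(int)
    rows = np.rint(centre + radius * np.sin(theta)).astype(int)

    image = np.zeros((size, size))
    # Where several samples fall on one pixel, keep the largest value.
    np.maximum.at(image, (rows, cols), y)
    return image


def accept(probabilities, threshold=0.66):
    """Return the top class per curve, or -1 if its probability is below threshold.

    Assumption: the confidence level is the classifier's top class probability.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    labels = probabilities.argmax(axis=1)
    labels[probabilities.max(axis=1) < threshold] = -1
    return labels
```

In a sketch like this, the resulting images would be passed to a pretrained CNN whose final layers are retrained on a small set of user-labelled shapes, and `accept` would then discard curves whose top class probability falls below the chosen level.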


Source journal

Digital discovery

CiteScore

2.80

Self-citation rate

0.00%

Articles published