“Twisting” the data: a universal machine-learning approach to classify single-molecule curves and beyond
C. Roldán-Piñero, M. Teresa González, Pablo M. Olmos, Linda A. Zotti and Edmund Leary
Digital Discovery, volume 10, pages 3043-3052. Published 2025-09-15. DOI: 10.1039/D5DD00207A
https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00207a
Citations: 0
Abstract
We present a new automated, supervised procedure that classifies both conductance-voltage (G(V)) curves and conductance-distance (G(z)) traces generated in single-molecule junctions to a high degree of confidence. Compared to unsupervised methods, our approach, which uses a convolutional neural network (CNN), is vastly superior as it allows core shapes to be recognised by ignoring differences in scale and is relatively insensitive to conductance jumps. A key aspect is the transformation of curves into a spiral image map, which allows us to separate various fundamental G(V) and G(z) shapes from datasets containing tens of thousands of curves. Moreover, thanks to transfer learning, training requires little input data compared to other approaches. This is extremely advantageous as it reduces training time by many orders of magnitude and means the model can be trained on user-selected shapes, including rare types. Classification is therefore based on a sound physical understanding of the system rather than on arbitrary class assignment. Furthermore, as there is no minimum class population requirement, our method can be used to find rare events with a high degree of confidence. As an example, we used our procedure to find, with a minimum 66% confidence level, a class of G(V) curves which are parabolic at low bias but flat at high bias. Such curves make up just 4% of the total, and would be very difficult to detect cleanly with unsupervised methods. This gives insight into electron transport behaviour at high bias because we can now easily quantify the types of curves present. Thanks to its universality, the method opens up new possibilities in general signal processing and the identification of rare and important events.
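
The abstract does not say how the spiral image map is built, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes an Archimedean spiral, min-max normalisation of the curve, and a small square image; the function name `spiral_image` and the parameters `size` and `turns` are hypothetical. It shows how a 1D curve could be laid out as a 2D image that a CNN can take as input, and why normalising first makes the result independent of the curve's absolute scale.

```python
import math


def spiral_image(curve, size=32, turns=4.0):
    """Map a 1D curve onto a size x size grid along an Archimedean spiral.

    Hypothetical illustration only: the spiral type, image size and
    normalisation are assumptions, not the published method.
    """
    if not curve:
        raise ValueError("curve must contain at least one point")

    lo, hi = min(curve), max(curve)
    span = hi - lo
    # Min-max normalisation removes differences in absolute scale.
    # A constant curve is mapped to all zeros.
    norm = [(v - lo) / span if span > 0 else 0.0 for v in curve]

    image = [[0.0] * size for _ in range(size)]
    centre = (size - 1) / 2.0
    n = len(norm)
    for i, value in enumerate(norm):
        t = i / (n - 1) if n > 1 else 0.0  # position along the curve, 0..1
        theta = 2.0 * math.pi * turns * t  # angle grows linearly with t
        radius = centre * t                # radius grows linearly with t
        col = int(round(centre + radius * math.cos(theta)))
        row = int(round(centre + radius * math.sin(theta)))
        # Clamp to the grid to guard against floating-point edge cases.
        col = min(max(col, 0), size - 1)
        row = min(max(row, 0), size - 1)
        # Keep the largest value if several samples land in one pixel.
        image[row][col] = max(image[row][col], value)
    return image


# Example: a parabolic curve and a rescaled, offset copy of it.
parabola = [x * x for x in range(-10, 11)]
rescaled = [5 * v + 3 for v in parabola]
image_a = spiral_image(parabola, size=16)
image_b = spiral_image(rescaled, size=16)
```

In this sketch the two images come out the same because each sample's pixel position depends only on its index along the curve, and the normalisation cancels the offset and scale factor. A real pipeline would likely use a larger image and interpolate between samples so the spiral has no gaps.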