{"title":"Accelerating scientific progress with preprints","authors":"","doi":"10.1038/s43588-024-00641-4","DOIUrl":"10.1038/s43588-024-00641-4","url":null,"abstract":"We recognize the importance of preprint posting in communicating research findings and encourage our authors to make use of this service.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"311-311"},"PeriodicalIF":0.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s43588-024-00641-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141176767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outsourcing eureka moments to artificial intelligence","authors":"Martijn Meeter","doi":"10.1038/s43588-024-00633-4","DOIUrl":"10.1038/s43588-024-00633-4","url":null,"abstract":"A two-stage learning algorithm is proposed to directly uncover the symbolic representation of rules for skill acquisition from large-scale training log data.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"314-315"},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141099980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete latent embeddings illuminate cellular diversity in single-cell epigenomics","authors":"Zhi Wei","doi":"10.1038/s43588-024-00634-3","DOIUrl":"10.1038/s43588-024-00634-3","url":null,"abstract":"CASTLE, a deep learning approach, extracts interpretable discrete representations from single-cell chromatin accessibility data, enabling accurate cell type identification, effective data integration, and quantitative insights into gene regulatory mechanisms.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"316-317"},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141099637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated discovery of symbolic laws governing skill acquisition from naturally occurring data","authors":"Sannyuya Liu, Qing Li, Xiaoxuan Shen, Jianwen Sun, Zongkai Yang","doi":"10.1038/s43588-024-00629-0","DOIUrl":"10.1038/s43588-024-00629-0","url":null,"abstract":"Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and an algorithmic explosion in searching. A deep learning model is initially employed to determine the learner’s cognitive state and assess the feature importance. Symbolic regression algorithms are then used to parse the neural network model into algebraic equations. Experimental results show that the algorithm can accurately restore preset laws within a noise range in continuous feedback settings. When applied to Lumosity training data, the method outperforms traditional and recent models in fitness terms. The study reveals two new forms of skill acquisition laws and reaffirms some previous findings. This paper introduces an algorithm to uncover laws of skill acquisition from naturally occurring data. By combining deep learning and symbolic regression, it accurately identifies cognitive states and extracts algebraic equations.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"334-345"},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing semiconductor materials and devices in the post-Moore era by tackling computational challenges with data-driven strategies","authors":"Jiahao Xie, Yansong Zhou, Muhammad Faizan, Zewei Li, Tianshu Li, Yuhao Fu, Xinjiang Wang, Lijun Zhang","doi":"10.1038/s43588-024-00632-5","DOIUrl":"10.1038/s43588-024-00632-5","url":null,"abstract":"In the post-Moore’s law era, the progress of electronics relies on discovering superior semiconductor materials and optimizing device fabrication. Computational methods, augmented by emerging data-driven strategies, offer a promising alternative to the traditional trial-and-error approach. In this Perspective, we highlight data-driven computational frameworks for enhancing semiconductor discovery and device development by elaborating on their advances in exploring the materials design space, predicting semiconductor properties and optimizing device fabrication, with a concluding discussion on the challenges and opportunities in these areas. Discovering improved semiconductor materials is essential for optimal device fabrication. In this Perspective, data-driven computational frameworks for semiconductor discovery and device development are discussed, including the challenges and opportunities moving forward.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"322-333"},"PeriodicalIF":0.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141087004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shuffling haplotypes to share reference panels for imputation","authors":"","doi":"10.1038/s43588-024-00640-5","DOIUrl":"10.1038/s43588-024-00640-5","url":null,"abstract":"We present a method to alleviate re-identification risks behind sharing haplotype reference panels for imputation. In an anonymized reference panel, one might try to infer the genomes’ phenotypes to re-identify their owner. Our method protects against such attacks by shuffling the reference panel’s genomes while maintaining imputation accuracy.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"320-321"},"PeriodicalIF":0.0,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141082932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A resampling-based approach to share reference panels","authors":"Théo Cavinato, Simone Rubinacci, Anna-Sapfo Malaspinas, Olivier Delaneau","doi":"10.1038/s43588-024-00630-7","DOIUrl":"10.1038/s43588-024-00630-7","url":null,"abstract":"For many genome-wide association studies, imputing genotypes from a haplotype reference panel is a necessary step. Over the past 15 years, reference panels have become larger and more diverse, leading to improvements in imputation accuracy. However, the latest generation of reference panels is subject to restrictions on data sharing due to concerns about privacy, limiting their usefulness for genotype imputation. In this context, here we propose RESHAPE, a method that employs a recombination Poisson process on a reference panel to simulate the genomes of hypothetical descendants after multiple generations. This data transformation helps to protect against re-identification threats and preserves data attributes, such as linkage disequilibrium patterns and, to some degree, identity-by-descent sharing, allowing for genotype imputation. Our experiments on gold-standard datasets show that simulated descendants up to eight generations can serve as reference panels without substantially reducing genotype imputation accuracy. The authors develop the tool RESHAPE to share reference panels in a safer way. The genome–phenome links in reference panels can generate re-identification threats and RESHAPE breaks these links by shuffling haplotypes while preserving imputation accuracy.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"360-366"},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s43588-024-00630-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multidimensional dataset for structure-based machine learning","authors":"Matthew Holcomb, Stefano Forli","doi":"10.1038/s43588-024-00631-6","DOIUrl":"10.1038/s43588-024-00631-6","url":null,"abstract":"MISATO, a dataset for structure-based drug discovery, combines quantum mechanics property data and molecular dynamics simulations on ~20,000 protein–ligand structures; it substantially extends the amount of data available to the community and holds potential for advancing work in drug discovery.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"318-319"},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery","authors":"Till Siebenmorgen, Filipe Menezes, Sabrina Benassou, Erinc Merdivan, Kieran Didi, André Santos Dias Mourão, Radosław Kitel, Pietro Liò, Stefan Kesselheim, Marie Piraud, Fabian J. Theis, Michael Sattler, Grzegorz M. Popowicz","doi":"10.1038/s43588-024-00627-2","DOIUrl":"10.1038/s43588-024-00627-2","url":null,"abstract":"Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule–ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein–ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein–ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models. MISATO is a database for structure-based drug discovery that combines quantum mechanics data with molecular dynamics simulations on ~20,000 protein–ligand structures. The artificial intelligence models included provide an easy entry point for the machine learning and drug discovery communities.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"367-378"},"PeriodicalIF":0.0,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s43588-024-00627-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140905262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discrete latent embedding of single-cell chromatin accessibility sequencing data for uncovering cell heterogeneity","authors":"Xuejian Cui, Xiaoyang Chen, Zhen Li, Zijing Gao, Shengquan Chen, Rui Jiang","doi":"10.1038/s43588-024-00625-4","DOIUrl":"10.1038/s43588-024-00625-4","url":null,"abstract":"Single-cell epigenomic data have been growing continuously at an unprecedented pace, but their characteristics such as high dimensionality and sparsity pose substantial challenges to downstream analysis. Although deep learning models, especially variational autoencoders, have been widely used to capture low-dimensional feature embeddings, the prevalent Gaussian assumption somewhat disagrees with real data, and these models tend to struggle to incorporate reference information from abundant cell atlases. Here we propose CASTLE, a deep generative model based on the vector-quantized variational autoencoder framework to extract discrete latent embeddings that interpretably characterize single-cell chromatin accessibility sequencing data. We validate the performance and robustness of CASTLE for accurate cell-type identification and reasonable visualization compared with state-of-the-art methods. We demonstrate the advantages of CASTLE for effective incorporation of existing massive reference datasets in a weakly supervised or supervised manner. We further demonstrate CASTLE’s capacity for intuitively distilling cell-type-specific feature spectra that unveil cell heterogeneity and biological implications quantitatively. A method based on a vector-quantized variational autoencoder, called CASTLE, can interpretably extract discrete latent embeddings and quantitatively generate the cell-type-specific feature spectrum for single-cell chromatin accessibility sequencing data.","PeriodicalId":74246,"journal":{"name":"Nature computational science","volume":"4 5","pages":"346-359"},"PeriodicalIF":0.0,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140905261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}