UmamiPredict: machine learning model to predict umami taste of molecules and peptides.
Pavit Singh, Mansi Goel, Devansh Garg, Aaditya Bhargav, Ganesh Bagler
Molecular Diversity (Journal Article), published 2025-10-04. DOI: 10.1007/s11030-025-11371-8
Journal impact factor: 3.8; JCR category: Chemistry, Applied (Q2)
Citations: 0
Abstract
Umami, recognized as the fifth basic taste, is primarily induced by specific amino acids and nucleotides, such as L-glutamate and inosinate, which interact with specialized taste receptors. Traditional foods such as soy sauce, cheese, and fermented Asian products are rich in umami flavor. Despite extensive research into the biological mechanisms of umami perception, computational methods for predicting umami taste from molecular structure remain underdeveloped, owing to a lack of suitable datasets and inadequate molecular feature representations. This study uses machine learning to build a computational model that classifies peptides and small molecules as umami or non-umami, addressing these gaps through comprehensive feature extraction and model integration. We curated a balanced dataset of 868 compounds (439 umami and 429 non-umami) and extracted a rich set of molecular descriptors representing their physicochemical and structural properties. Ensemble models, including LightGBM, XGBoost, and ExtraTrees, demonstrated high predictive accuracy across the different datasets. Notably, the random forest classifier achieved an accuracy of 92.13% on the peptide-only dataset, while the linear discriminant analysis and ExtraTrees classifiers each reached 98.84% on the small-molecule dataset. On the combined dataset, LightGBM achieved the highest accuracy, 96.55%, highlighting the effectiveness of integrating peptide and small-molecule data for umami prediction. A user-friendly web server, UmamiPredict (https://cosylab.iiitd.edu.in/umami/), lets users predict the umami taste of molecules and peptides from their SMILES representations.
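The dataset is nearly balanced (439 umami vs 429 non-umami). A classifier that always predicts "umami" would therefore already score roughly 50.6%, and the reported accuracies are best read against that floor. The minimal sketch below computes accuracy and the majority-class baseline from the class counts stated above. It is an illustration only, not the authors' evaluation code. The function names `accuracy` and `majority_baseline` are hypothetical. The sketch also assumes that the evaluation data keep roughly the full dataset's class ratio, which the abstract does not specify.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


# Class counts of the curated dataset, as stated in the abstract.
counts = {"umami": 439, "non-umami": 429}
baseline = majority_baseline(counts)

# Best reported accuracy on the combined dataset (LightGBM), from the abstract.
# Assumption: the evaluation set has about the same class ratio as the full dataset.
reported_lightgbm = 0.9655

print(f"compounds: {sum(counts.values())}")
print(f"majority-class baseline: {baseline:.2%}")
print(f"LightGBM margin over baseline: {(reported_lightgbm - baseline) * 100:.2f} percentage points")
```

Running it prints a baseline of 50.58% and a margin of 45.97 percentage points. Under the stated assumption, the combined-dataset result is therefore far above what class imbalance alone could explain.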
About the journal:
Molecular Diversity is a new forum for the rapid publication of refereed papers describing the development, application, and theory of molecular diversity and combinatorial chemistry in basic and applied research and in drug discovery. The journal publishes short and full papers, perspectives, news, and reviews covering all aspects of generating molecular diversity, screening diverse compounds against alternative targets of all types (biological, biophysical, technological), analyzing the results, and applying them across scientific disciplines and approaches, including:
combinatorial chemistry and parallel synthesis;
small molecule libraries;
microwave synthesis;
flow synthesis;
fluorous synthesis;
diversity-oriented synthesis (DOS);
nanoreactors;
click chemistry;
multiplex technologies;
fragment- and ligand-based design;
structure/function/SAR;
computational chemistry and molecular design;
chemoinformatics;
screening techniques and screening interfaces;
analytical and purification methods;
robotics, automation and miniaturization;
targeted libraries;
display libraries;
peptides and peptoids;
proteins;
oligonucleotides;
carbohydrates;
natural diversity;
new methods of library formulation and deconvolution;
directed evolution, origin of life and recombination;
search techniques, landscapes, random chemistry and more.