UmamiPredict: machine learning model to predict umami taste of molecules and peptides.

IF 3.8; Zone 2 (Chemistry); JCR Q2 (CHEMISTRY, APPLIED)

Molecular Diversity. Pub Date: 2025-10-04. DOI: 10.1007/s11030-025-11371-8

Pavit Singh, Mansi Goel, Devansh Garg, Aaditya Bhargav, Ganesh Bagler



Abstract

Umami, recognized as the fifth basic taste, is primarily induced by specific amino acids and nucleotides, such as L-glutamate and inosinate, which interact with specialized taste receptors. Traditional foods like soy sauce, cheese, and fermented Asian products are rich in umami flavor. Despite extensive research into the biological mechanisms of umami perception, computational methods for predicting umami taste from molecular structure remain underdeveloped because datasets are scarce and existing molecular feature representations are inadequate. This study introduces a machine learning model that classifies peptides and small molecules as umami or non-umami, addressing these gaps through comprehensive feature extraction and model integration. We curated a balanced dataset of 868 compounds (439 umami and 429 non-umami) and extracted a rich set of molecular descriptors representing their physicochemical and structural properties. Ensemble models, including LightGBM, XGBoost, and ExtraTrees, demonstrated high predictive accuracy across the datasets. Notably, the random forest classifier achieved an accuracy of 92.13% on the peptide-only dataset, while the linear discriminant analysis and ExtraTrees classifiers attained an accuracy of 98.84% on the small-molecule dataset. On the combined dataset, LightGBM achieved the highest accuracy of 96.55%, highlighting the effectiveness of integrating peptide and small-molecule data for umami prediction. A user-friendly web server, UmamiPredict (https://cosylab.iiitd.edu.in/umami/), lets users predict the umami taste of molecules and peptides from their SMILES representations.
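To put the reported accuracies in context, the short sketch below works out the class balance of the curated dataset and the accuracy a trivial majority-class predictor would reach on it. The counts and the 96.55% figure come from the abstract; the function name `majority_baseline` is illustrative, and the comparison assumes the evaluation split has roughly the same class balance as the full dataset, which the abstract does not state.

```python
# Class counts reported in the abstract.
N_UMAMI = 439
N_NON_UMAMI = 429

# LightGBM accuracy on the combined dataset, as reported in the abstract.
REPORTED_COMBINED_ACCURACY = 0.9655


def majority_baseline(n_pos: int, n_neg: int) -> float:
    """Accuracy of a classifier that always predicts the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


total = N_UMAMI + N_NON_UMAMI
baseline = majority_baseline(N_UMAMI, N_NON_UMAMI)
# Assumption: the evaluation split mirrors the full dataset's class balance.
margin = REPORTED_COMBINED_ACCURACY - baseline

print(f"total compounds: {total}")                    # 868
print(f"majority-class baseline: {baseline:.4f}")     # 0.5058
print(f"margin over baseline: {margin:.4f}")          # 0.4597
```

Because the two classes are nearly equal in size, plain accuracy is a reasonable headline metric here: a model cannot score much above 50% simply by favoring one class.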


Source journal: Molecular Diversity (Chemistry; Chemistry, Multidisciplinary)

CiteScore: 7.30

Self-citation rate: 7.90%

Articles published: 219

Review time: 2.7 months

Journal description: Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including: combinatorial chemistry and parallel synthesis; small molecule libraries; microwave synthesis; flow synthesis; fluorous synthesis; diversity oriented synthesis (DOS); nanoreactors; click chemistry; multiplex technologies; fragment- and ligand-based design; structure/function/SAR; computational chemistry and molecular design; chemoinformatics; screening techniques and screening interfaces; analytical and purification methods; robotics, automation and miniaturization; targeted libraries; display libraries; peptides and peptoids; proteins; oligonucleotides; carbohydrates; natural diversity; new methods of library formulation and deconvolution; directed evolution, origin of life and recombination; search techniques, landscapes, random chemistry and more.