A comparative study of machine learning models on molecular fingerprints for odor decoding.

Impact factor 6.2 · CAS Zone 2, Chemistry · JCR Q1 (Chemistry, Multidisciplinary)
Jinyoung Suh, Yeonju Hong, Chunho Park
Communications Chemistry, vol. 8, no. 1, article 278. Published 2025-09-25.
DOI: https://doi.org/10.1038/s42004-025-01651-7
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462479/pdf/
Citations: 0

Abstract

Understanding how molecular structure relates to odor perception is a longstanding problem, with important implications for fragrance development and sensory science. In this study, we present an advanced comparative analysis of machine learning approaches for predicting fragrance odors, examining both individual descriptor-based models and integrated frameworks. Using a curated dataset of 8681 compounds from ten expert sources, we benchmark functional group fingerprints, classical molecular descriptors, and Morgan structural fingerprints across Random Forest, eXtreme Gradient Boosting, and Light Gradient Boosting Machine. The Morgan-fingerprint-based XGBoost model achieves the highest discrimination (AUROC 0.828, AUPRC 0.237), outperforming descriptor-based models. Our findings highlight the superior representational capacity of molecular fingerprints to capture olfactory cues, not only achieving high predictive performance but also revealing a continuous, interpretable scent space that aligns with perceptual and chemical relationships. This paves the way for data-driven research into olfactory mechanisms, alongside the next generation of in silico odor prediction.
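
The abstract reports an AUROC of 0.828 next to an AUPRC of 0.237. One common reason for a gap of this kind is label rarity: the chance level for AUROC is 0.5, while the chance level for AUPRC is the fraction of positive examples. The sketch below illustrates this on made-up toy data; it is not the authors' code or dataset, and the function names `auroc` and `average_precision` are illustrative only. The average-precision routine assumes there are no tied scores.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k where a positive appears.

    Simplification (assumption): scores are distinct, so no tie handling.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


# Toy data (hypothetical): 20 compounds, one carries the odor label,
# and the model ranks it 4th from the top.
scores = [(20 - i) / 20 for i in range(20)]
labels = [1 if i == 3 else 0 for i in range(20)]

print(round(auroc(labels, scores), 3))              # 0.842
print(round(average_precision(labels, scores), 3))  # 0.25
```

In this toy case the positive prevalence is 1/20 = 0.05, so an average precision of 0.25 is five times the chance level even though it looks small beside the AUROC.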


Source journal
Communications Chemistry (Chemistry: General Chemistry)
CiteScore: 7.70
Self-citation rate: 1.70%
Articles published: 146
Review time: 13 weeks
About the journal: Communications Chemistry is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the chemical sciences. Research papers published by the journal represent significant advances bringing new chemical insight to a specialized area of research. We also aim to provide a community forum for issues of importance to all chemists, regardless of sub-discipline.