SitoshnaPred: Learning phytochemical descriptors to elucidate Ayurvedic herbal potency
Authors: Ashish Panghalia, Vikram Singh
Journal of Ethnopharmacology, volume 355, Article 120620 (journal article, published 2025-09-18)
DOI: 10.1016/j.jep.2025.120620
URL: https://www.sciencedirect.com/science/article/pii/S0378874125013121
Impact factor: 5.4; JCR Q1 (Chemistry, Medicinal)
Abstract
Ethnopharmacological relevance
In Ayurveda, the potency of herbs is traditionally classified into three categories: Sīta (cold), Uṣṇa (hot), and Unuṣṇa (neutral). In this study, a gold-standard dataset of 627 herbs with their potency classes was compiled from classical literature-based textbooks. On the premise that herbal potencies must be associated with the herbs' phytochemical constituents, a machine learning framework was developed using molecular descriptors of the phytochemicals. The study is intended to help characterize the molecular basis of the traditional knowledge of herbal potencies and to support the future development of artificial intelligence-guided technologies for predicting the potency and other features of traditional herbs.
Objectives
This study aims to examine the cold and hot nature of Ayurvedic herbs by analyzing their constituent phytochemicals and to develop an ensemble learning-based binary classifier.
Methods
We developed a standard dataset of 627 herbs that are commonly used in Ayurvedic formulations and are well classified by potency into the Sīta (cold), Uṣṇa (hot), and Unuṣṇa (neutral) categories. Since only 7 herbs were associated with the Unuṣṇa category, this class was not used in further studies. First, a dataset comprising 13,534 phytochemicals associated with 454 herbs was developed; no associations could be retrieved for the remaining herbs. Next, the distribution of phytochemicals across the Sīta and Uṣṇa herbs was studied, and 1613 two-dimensional and 213 three-dimensional molecular descriptors were calculated for each phytochemical. After reducing the dimensionality of the dataset while retaining 95 % of its variance, binary and ternary SitoshnaPred classifiers were developed using the LightGBM algorithm. SHAP (SHapley Additive exPlanations) analysis and loadings analysis were conducted to interpret the models’ predictions and to identify the most influential molecular descriptors contributing to classification performance.
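The dimensionality-reduction step described in the Methods can be illustrated with a minimal sketch. The abstract does not say how the principal components were computed or whether the descriptors were scaled first, so the standardization step, the function name `components_for_variance`, and the synthetic data below are assumptions made for illustration; only the 95 % variance threshold comes from the text.

```python
import numpy as np

def components_for_variance(X, target=0.95):
    """Return (k, scores, cumulative): the smallest number of principal
    components whose cumulative explained variance reaches `target`,
    the data projected onto those components, and the cumulative curve."""
    X = np.asarray(X, dtype=float)
    # Standardize each descriptor column (an assumption, not stated in the
    # abstract); constant columns are guarded against division by zero.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # SVD of the standardized matrix: rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative ratio reaches the target, as a count.
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(S))
    scores = Z @ Vt[:k].T
    return k, scores, cumulative

# Hypothetical stand-in for a phytochemical-by-descriptor matrix:
# 200 "molecules", 20 correlated "descriptors" driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 20))

k, scores, cumulative = components_for_variance(X, target=0.95)
print(f"components kept: {k}, variance retained: {cumulative[k - 1]:.3f}")
```

In the study, the analogous step is reported to keep 92 principal components out of up to 1826 computed descriptors (1613 two-dimensional plus 213 three-dimensional); the projected scores would then serve as input features for a gradient-boosting classifier such as LightGBM.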
Results
A total of 7193 phytochemicals were identified in herbs of the Sīta group and 9116 in herbs of the Uṣṇa group. The LightGBM-based SitoshnaPred models were developed using the top 92 principal components, which together account for 95 % of the total variance. The binary classification model achieved an accuracy of 94.01 % and an AUC of 98.41 %, whereas the ternary classification model attained an accuracy of 79.84 % and an AUC of 98.10 %. SHAP analysis, combined with loadings analysis, was used to interpret the models’ predictions and to identify the most influential molecular descriptors contributing to classification performance.
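The two reported metrics measure different things, which is why a model can pair a high AUC with a noticeably lower accuracy. The following is a minimal sketch of both for the binary case, using toy labels and scores that are invented for illustration (1 for Uṣṇa and 0 for Sīta is a hypothetical encoding, and the 0.5 threshold is an assumption). The abstract does not state how the AUC of the ternary model was averaged across classes, so that case is not covered here.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def roc_auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = Uṣṇa (hot), 0 = Sīta (cold).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(f"accuracy = {acc:.4f}, AUC = {auc:.4f}")
```

Accuracy depends on the chosen decision threshold, while AUC depends only on how well the scores rank positives above negatives, so the toy example yields an AUC (8/9) higher than its accuracy (4/6).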
Conclusion
Molecular descriptors of the constituent phytochemicals carry strong signals of the herbal potency characteristics.
About the journal:
The Journal of Ethnopharmacology is dedicated to the exchange of information and understandings about people's use of plants, fungi, animals, microorganisms and minerals and their biological and pharmacological effects based on the principles established through international conventions. Early people, confronted with illness and disease, discovered a wealth of useful therapeutic agents in the plant and animal kingdoms. The empirical knowledge of these medicinal substances and their toxic potential was passed on by oral tradition and sometimes recorded in herbals and other texts on materia medica. Many valuable drugs of today (e.g., atropine, ephedrine, tubocurarine, digoxin, reserpine) came into use through the study of indigenous remedies. Chemists continue to use plant-derived drugs (e.g., morphine, taxol, physostigmine, quinidine, emetine) as prototypes in their attempts to develop more effective and less toxic medicinals.