Text mining-based profiling of chemical environments in protein–ligand binding assays across analytical techniques

IF 3.8 · Zone 2, Chemistry · JCR Q2, AUTOMATION & CONTROL SYSTEMS
Erdem Önal, Zeynep Kalaycıoğlu
Chemometrics and Intelligent Laboratory Systems, Volume 271, Article 105659
Published: 2026-04-15 (Epub 2026-02-05)
DOI: 10.1016/j.chemolab.2026.105659
https://www.sciencedirect.com/science/article/pii/S0169743926000328
Citations: 0

Abstract

Protein–ligand binding studies are critical in drug discovery and development because they reveal the molecular interactions that underlie biological function, disease mechanisms, and therapeutic effects. This study evaluates whether combining text mining with cheminformatics can expose trends in protein–ligand binding studies across six widely used analytical techniques. Using an open-source Python platform (SCOPE), we analyzed over 33,000 scientific articles and more than 1.3 million chemical entities. The resulting data were visualized as two-dimensional hexbin plots of hydrophobicity (log P) against molecular weight (Da) for each technique. Rather than focusing solely on ligands, the study characterizes the overall chemical environments associated with protein–ligand binding assays, including solvents, buffers, and supporting agents. By analyzing the physicochemical properties of compounds reported for each technique, we highlight how method-specific preferences shape experimental design. The analysis integrates unsupervised K-means clustering, multivariate principal component analysis (PCA), and nonparametric statistical testing to compare technique-associated chemical spaces quantitatively. The study also offers a perspective on methodologies and historical trends in protein–ligand binding research, and it is positioned as a data-driven, method-centric literature analysis rather than a traditional narrative review.
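The abstract does not say which nonparametric test was used, so the following is only a minimal sketch of one common choice, the Kruskal-Wallis H test, applied to log P values grouped by technique. The technique labels and the numbers are hypothetical placeholders, not data from the paper, and the function names are illustrative rather than part of SCOPE.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict of label -> list of samples."""
    pooled = list(chain.from_iterable(groups.values()))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    offset = 0
    for samples in groups.values():
        size = len(samples)
        rank_sum = sum(ranks[offset:offset + size])
        total += rank_sum ** 2 / size
        offset += size
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical log P values for three placeholder techniques (not real data).
logp_by_technique = {
    "technique_A": [-1.2, 0.3, 0.8, 1.5],
    "technique_B": [1.9, 2.4, 3.1, 3.6],
    "technique_C": [0.1, 1.1, 2.0, 2.7],
}
h_stat = kruskal_wallis_h(logp_by_technique)
```

A larger H indicates that the groups' rank distributions differ more; turning it into a p-value requires a chi-squared distribution with (number of groups − 1) degrees of freedom, which a statistics library would normally supply.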
Source journal metrics
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics: 1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.). 2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered. 3) Development of new software that provides novel tools or truly advances the use of chemometrical methods. 4) Well characterized data sets to test performance for the new methods and software. The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.