LigExtract: Large-scale Automated Identification of Ligands from Protein Structures in the Protein Data Bank
Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes
Genomics, Proteomics & Bioinformatics. Published 2025-02-28. DOI: 10.1093/gpbjnl/qzaf018 (https://doi.org/10.1093/gpbjnl/qzaf018)
Abstract
The Protein Data Bank (PDB) is an ever-growing database of 3D macromolecular structures that has become a crucial resource for drug discovery. Exploring complexed proteins and accessing their ligands helps researchers understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools for large-scale ligand identification do not address many of the more complex ways in which ligands are stored and represented in PDB structures. LigExtract was therefore developed specifically for the large-scale processing of PDB structures and the identification of their ligands. It is a fully open-source tool that provides end-to-end processing: the user simply supplies a list of UniProt IDs, and LigExtract returns a list of ligands, an individual PDB file for each ligand, a PDB file of the protein chains engaged with the ligand, and a series of log files. The log files report the decisions made during ligand extraction and flag additional scenarios that may need to be considered in any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is available on GitHub (https://github.com/comp-medchem/LigExtract).
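To make the problem concrete, the sketch below shows the simplest baseline for finding ligand candidates in a PDB-format file: grouping HETATM records by residue name, chain and residue number. It is an illustration only and is not LigExtract's method; the function name `group_hetero_residues` and the solvent exclusion list are hypothetical. The column positions follow the fixed-width PDB format. A baseline of this kind is what breaks down on the more complex ligand representations the abstract refers to.

```python
from collections import defaultdict

# Hypothetical, deliberately short exclusion list (assumption for this sketch).
# A real pipeline would need a much more careful treatment of solvent and ions.
SOLVENT_RESNAMES = {"HOH", "WAT", "DOD"}


def group_hetero_residues(pdb_text):
    """Group HETATM atom names by (residue name, chain ID, residue number).

    Naive baseline only: it ignores alternate locations, insertion codes,
    multi-residue ligands and ligands that are not stored as HETATM records.
    """
    groups = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue  # skip protein ATOM records and all other record types
        # Fixed-width PDB columns (1-based): atom name 13-16,
        # residue name 18-20, chain ID 22, residue number 23-26.
        atom_name = line[12:16].strip()
        resname = line[17:20].strip()
        chain = line[21]
        resseq = int(line[22:26])
        if resname in SOLVENT_RESNAMES:
            continue  # drop water molecules
        groups[(resname, chain, resseq)].append(atom_name)
    return dict(groups)
```

Each key in the returned dictionary is one ligand candidate. Deciding which candidates are genuine ligands, and which protein chains they engage, is the harder part that a dedicated tool has to handle.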