Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes
{"title":"LigExtract:在蛋白质数据库中从蛋白质结构中大规模自动识别配体。","authors":"Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes","doi":"10.1093/gpbjnl/qzaf018","DOIUrl":null,"url":null,"abstract":"<p><p>The Protein Data Bank is an ever-growing database of 3D macromolecular structures that has become a crucial resource for the drug discovery process. Exploring complexed proteins and accessing the ligands in these proteins is paramount to help researchers understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools to perform large-scale ligand identification do not address many of the more complex ways in which ligands are stored and represented in PDB structures. Therefore, a new tool called LigExtract was specifically developed for the large-scale processing of PDB structures and the identification of their ligands. This is a fully open-source tool available to the scientific community, designed to provide end-to-end processing whereby the user simply provides a list of UniProt IDs and LigExtract returns a list of ligands, their individual PDB files, a PDB file of the protein chains engaged with the ligand and a series of log files that inform the user of the decisions made during the ligand extraction process as well as potential flagging of additional scenarios that might have to be considered during any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is available, open-source, on GitHub (https://github.com/comp-medchem/LigExtract).</p>","PeriodicalId":94020,"journal":{"name":"Genomics, proteomics & bioinformatics","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LigExtract: Large-scale Automated Identification of Ligands from Protein Structures in the Protein Data Bank.\",\"authors\":\"Natália Aniceto, Nuno Martinho, Ismael Rufino, Rita C Guedes\",\"doi\":\"10.1093/gpbjnl/qzaf018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The Protein Data Bank is an ever-growing database of 3D macromolecular structures that has become a crucial resource for the drug discovery process. Exploring complexed proteins and accessing the ligands in these proteins is paramount to help researchers understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools to perform large-scale ligand identification do not address many of the more complex ways in which ligands are stored and represented in PDB structures. Therefore, a new tool called LigExtract was specifically developed for the large-scale processing of PDB structures and the identification of their ligands. This is a fully open-source tool available to the scientific community, designed to provide end-to-end processing whereby the user simply provides a list of UniProt IDs and LigExtract returns a list of ligands, their individual PDB files, a PDB file of the protein chains engaged with the ligand and a series of log files that inform the user of the decisions made during the ligand extraction process as well as potential flagging of additional scenarios that might have to be considered during any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is available, open-source, on GitHub (https://github.com/comp-medchem/LigExtract).</p>\",\"PeriodicalId\":94020,\"journal\":{\"name\":\"Genomics, proteomics & bioinformatics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Genomics, proteomics & bioinformatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/gpbjnl/qzaf018\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genomics, proteomics & bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/gpbjnl/qzaf018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LigExtract: Large-scale Automated Identification of Ligands from Protein Structures in the Protein Data Bank.
The Protein Data Bank is an ever-growing database of 3D macromolecular structures that has become a crucial resource for the drug discovery process. Exploring complexed proteins and accessing the ligands in these proteins is paramount to help researchers understand biological processes and design new compounds of pharmaceutical interest. However, currently available tools to perform large-scale ligand identification do not address many of the more complex ways in which ligands are stored and represented in PDB structures. Therefore, a new tool called LigExtract was specifically developed for the large-scale processing of PDB structures and the identification of their ligands. This is a fully open-source tool available to the scientific community, designed to provide end-to-end processing whereby the user simply provides a list of UniProt IDs and LigExtract returns a list of ligands, their individual PDB files, a PDB file of the protein chains engaged with the ligand and a series of log files that inform the user of the decisions made during the ligand extraction process as well as potential flagging of additional scenarios that might have to be considered during any follow-up use of the processed files (e.g., ligands covalently bound to the protein). LigExtract is available, open-source, on GitHub (https://github.com/comp-medchem/LigExtract).