Paulo Ricardo Viviurka do Carmo, Ricardo Marcacini, Marilia Valli, João Victor Silva-Silva, Leonardo Luiz Gomes Ferreira, Alan Cesar Pilon, Vanderlan da Silva Bolzani, Adriano D Andricopulo, Edgard Marx
Future Drug Discovery, published 2023-06-01. DOI: 10.4155/fdd-2023-0007
Development of a novel chemoinformatic tool for natural product databases
Aim: This study aimed to develop a chemoinformatic tool for extracting natural product information from the academic literature.
Materials & methods: Machine learning graph embeddings were used to extract knowledge from a knowledge graph that connects properties, molecular data and BERTopic topics.
Results: Metapath2Vec performed best at extracting compound names, and its performance improved across the evaluation stages. Embedding Propagation on Heterogeneous Networks achieved the best performance in extracting bioactivity information. Metapath2Vec also excelled at extracting species information, while DeepWalk and Node2Vec performed well in one evaluation stage for species location extraction. Embedding Propagation on Heterogeneous Networks improved performance consistently and achieved the best overall scores. Unsupervised embeddings extracted knowledge effectively, with different methods excelling in different scenarios.
Conclusion: This research establishes a foundation for knowledge-extraction frameworks, supporting sustainable resource use.
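The abstract compares random-walk embedding methods (DeepWalk, Node2Vec, Metapath2Vec) on a heterogeneous knowledge graph. The following is a minimal sketch, not the authors' pipeline. It shows how such methods generate the node sequences that a skip-gram model then learns from. DeepWalk steps to any neighbour uniformly. Metapath2Vec restricts each step to the node type prescribed by a metapath. Node2Vec, not shown, biases the walk with its return and in-out parameters p and q. The node types (`paper`, `compound`, `species`, `topic`), the toy node IDs, the metapath and all helper names are hypothetical. The skip-gram training step itself (for example with a word2vec implementation) is omitted.

```python
import random
from collections import defaultdict


class HeteroGraph:
    """Minimal undirected heterogeneous graph: every node carries a type label."""

    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(list)

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v):
        # Undirected edge: store both directions.
        self.adj[u].append(v)
        self.adj[v].append(u)


def deepwalk_walk(graph, start, length, rng):
    """Uniform random walk (DeepWalk-style): next node drawn from all neighbours."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.adj[walk[-1]]
        if not neighbours:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbours))
    return walk


def metapath_walk(graph, start, metapath, length, rng):
    """Metapath-guided walk (Metapath2Vec-style).

    `metapath` is a symmetric type sequence such as ["paper", "compound", "paper"].
    Its first and last types must match so the pattern can be repeated. At each
    step the walker may only move to neighbours of the next type in the pattern.
    """
    assert metapath[0] == metapath[-1], "metapath must start and end with the same type"
    assert graph.node_type[start] == metapath[0], "start node must match metapath[0]"
    cycle = metapath[:-1]  # drop the repeated endpoint so the pattern tiles
    walk = [start]
    while len(walk) < length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in graph.adj[walk[-1]] if graph.node_type[n] == wanted]
        if not candidates:
            break  # no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk, window):
    """(center, context) pairs that a skip-gram model would be trained on."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Toy knowledge graph with hypothetical nodes: papers linked to compounds,
# a species and a BERTopic-style topic node.
g = HeteroGraph()
for node, ntype in [("p1", "paper"), ("p2", "paper"), ("c1", "compound"),
                    ("c2", "compound"), ("s1", "species"), ("t1", "topic")]:
    g.add_node(node, ntype)
for u, v in [("p1", "c1"), ("p2", "c1"), ("p2", "c2"),
             ("p1", "s1"), ("p1", "t1"), ("p2", "t1")]:
    g.add_edge(u, v)

rng = random.Random(0)
walk = metapath_walk(g, "p1", ["paper", "compound", "paper"], length=7, rng=rng)
print(walk)
print(skipgram_pairs(walk, window=2)[:4])
```

Because every step in the metapath walk must match the next type in the pattern, the walk alternates paper and compound nodes. Papers that share compounds therefore land in each other's skip-gram context windows. A uniform DeepWalk walk would also wander through species and topic nodes.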