GOGCN: using deep learning to support insertion of new concepts into gene ontology

4th International Conference on Information Science, Electrical and Automation Engineering | Pub Date: 2023-08-10 | DOI: 10.1117/12.2689526

Cheng Chen, Lingyun Luo


Citations: 0

Abstract



Many biomedical ontologies are updated regularly and change over time. Each new release of an ontology updates its data, fixing errors in the previous version and adding many new concepts to keep pace with developments in the domain. Inserting new concepts into their proper positions in a terminology is a challenging problem in the automatic enrichment of ontologies. In the past, new concepts were always created by domain experts, who then ran a traditional classifier or inserted the new concepts in the proper place manually. More recently, methods based on machine learning (ML) have been proposed to help terminology researchers develop and maintain ontologies. We propose a new approach that requires only the concept name and uses a learning method in which a Graph Convolutional Network (GCN) aggregates sub-string neighbor information. We chose a Bidirectional Long Short-Term Memory (Bi-LSTM) network as the classifier for the prediction task. We first tested this method within the January 2020 release of the Gene Ontology (GO) and achieved an average precision of 89.68% and an F1 score of 0.9081 on the task of predicting direct IS-A links. We then compared the January 2020 release with the March 2022 release and predicted the links related to new concepts, obtaining an average accuracy of 0.6996.
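To make the idea of aggregating sub-string neighbor information more concrete, the following is a minimal sketch of a single GCN propagation step using only the Python standard library. It is not the authors' implementation. The abstract does not define how sub-string neighbors are built, so the rule used here (two concepts are neighbors when one name is contained in the other), the toy concept names, the one-hot features, and the identity weight matrix are all assumptions made for illustration. The propagation rule is the commonly used symmetric normalization, ReLU(D^-1/2 (A + I) D^-1/2 H W).

```python
import math


def substring_adjacency(names):
    """Assumed neighbor rule: link two concepts if one name contains the other."""
    n = len(names)
    return [
        [
            1.0 if i != j and (names[i] in names[j] or names[j] in names[i]) else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the weights, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy concept names chosen for illustration only.
names = ["transport", "ion transport", "calcium ion transport", "cell adhesion"]
adj = substring_adjacency(names)
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
# One-hot features and an untrained identity weight matrix (both hypothetical).
out = gcn_layer(adj, identity, identity)
for name, row in zip(names, out):
    print(name, [round(v, 3) for v in row])
```

In this toy graph the three "transport" names are mutual neighbors, so each ends up with an equal mix of the three feature vectors, while "cell adhesion" has no sub-string neighbors and keeps only its own features. In a trained model the weight matrix would be learned, and the resulting node representations would feed a downstream classifier such as the Bi-LSTM mentioned in the abstract.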
