GOGCN: using deep learning to support insertion of new concepts into gene ontology

Cheng Chen, Lingyun Luo
4th International Conference on Information Science, Electrical and Automation Engineering
Published: 2023-08-10
DOI: 10.1117/12.2689526 (https://doi.org/10.1117/12.2689526)
Citations: 0

Abstract

Many biomedical ontologies are updated regularly and change over time. Each new release of an ontology fixes errors in the previous version and adds new concepts to keep pace with developments in the domain. Inserting new concepts into their proper positions in a terminology is a challenging problem in the automatic enrichment of ontologies. In the past, new concepts were always created by domain experts, who then ran a traditional classifier or worked manually to insert each concept in its proper place. More recently, methods based on machine learning (ML) have been proposed to help terminology researchers develop and maintain ontologies. We propose a new approach that requires only the concept name and uses a Graph Convolutional Network (GCN) to aggregate information from sub-string neighbors. We chose a Bidirectional Long Short-Term Memory (Bi-LSTM) network as the classifier for the prediction task. We first tested this method within the January 2020 release of the Gene Ontology (GO) and achieved an average precision of 89.68% and an F1 score of 0.9081 on the task of predicting direct IS-A links. We then compared the January 2020 release with the March 2022 release and predicted the links related to new concepts, obtaining an average accuracy of 0.6996.
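
The abstract does not spell out how sub-string neighbors are defined or how the GCN is configured, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that two concepts are neighbors when one name is a sub-string of the other, and it applies one standard graph-convolution step, ReLU(D^-1/2 (A + I) D^-1/2 H W). The helper names `substring_adjacency` and `gcn_layer`, the example concept names, the random feature vectors, and the layer sizes are all hypothetical.

```python
import numpy as np


def substring_adjacency(names):
    """Link two concepts when one name is a sub-string of the other (assumed rule)."""
    n = len(names)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and names[i] in names[j]:
                adj[i, j] = adj[j, i] = 1.0  # keep the graph undirected
    return adj


def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])   # self-loops keep each node's own features
    deg = a_hat.sum(axis=1)              # degree is at least 1 because of self-loops
    d_inv_sqrt = np.diag(deg ** -0.5)
    propagated = d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights
    return np.maximum(propagated, 0.0)   # ReLU


# Illustrative concept names; the feature vectors stand in for name embeddings.
names = ["transport", "ion transport", "metal ion transport", "cell cycle"]
adj = substring_adjacency(names)

rng = np.random.default_rng(0)
features = rng.normal(size=(len(names), 8))  # hypothetical 8-dim input features
weights = rng.normal(size=(8, 4))            # hypothetical 4-dim output layer

embeddings = gcn_layer(adj, features, weights)
print(embeddings.shape)
```

In a pipeline like the one the abstract describes, embeddings of this kind would be passed to a downstream classifier (a Bi-LSTM in the paper) that decides whether a candidate IS-A link holds. Plain sub-string matching is a simplification here; matching on whole words would avoid spurious neighbors.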