Predicting implicit concept embeddings for singular relationship discovery replication of closed literature-based discovery.

Frontiers in Research Metrics and Analytics, vol. 10, article 1509502. Pub Date: 2025-03-05; eCollection Date: 2025-01-01. DOI: 10.3389/frma.2025.1509502
Clint Cuffy, Bridget T McInnes

Abstract

Objective: Literature-based discovery (LBD) identifies new knowledge by leveraging existing literature. It exploits implicit relationships to build bridges between isolated, non-interacting sets of literature. LBD has been used to facilitate drug repurposing and new drug discovery and to study adverse event reactions. Within the last decade, LBD systems have shifted from statistical methods toward deep learning (DL) to analyze the semantic space between non-interacting literatures. Recent works use knowledge graphs (KGs) to represent explicit relationships, frame LBD as a knowledge graph completion (KGC) task, and use DL to generate implicit relationships. However, these systems require the researcher to have domain-expert knowledge when submitting relevant queries for novel hypothesis discovery.

Methods: We propose a novel approach that identifies all implicit hypotheses for a researcher's search query, expediting the knowledge discovery process. We reframe the KGC task as predicting the embeddings of interconnecting vertices within the graph. We train our model with a similarity learning objective and compare its predictions against all known vertices in the graph to determine the likelihood of an implicit relationship (i.e., a connecting edge). We also explore three approaches to representing edge connections between vertices in the KG: average, concatenation, and Hadamard product. Lastly, we explore input representation scaling as a way to introduce inductive biases and expedite model convergence.
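The Methods paragraph names three ways to combine two vertex embeddings into an edge representation and describes ranking every known vertex against a predicted embedding. The sketch below illustrates those two steps in plain Python under stated assumptions: embeddings are small dense vectors, cosine similarity is the similarity measure, and the loss `1 - cos(predicted, target)` stands in for the similarity learning objective, which the abstract does not specify. The names `combine`, `cosine_similarity`, `similarity_loss`, and `rank_vertices`, along with the toy vectors, are hypothetical and not taken from the authors' code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
METHODS = ("average", "concatenation", "hadamard")


def combine(a: Sequence[float], c: Sequence[float], method: str) -> Vector:
    """Build one model input from two vertex embeddings (e.g. terms A and C).

    The three options mirror those named in the abstract; everything else
    about them here is an assumption of this sketch.
    """
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if method == "concatenation":
        return list(a) + list(c)  # width becomes len(a) + len(c)
    if len(a) != len(c):
        raise ValueError("average and hadamard need equal-length vectors")
    if method == "average":
        return [(x + y) / 2.0 for x, y in zip(a, c)]
    return [x * y for x, y in zip(a, c)]  # hadamard: element-wise product


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similarity_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Assumed objective: 0 when the prediction points exactly along the target."""
    return 1.0 - cosine_similarity(predicted, target)


def rank_vertices(
    predicted: Sequence[float], vertices: Dict[str, Vector]
) -> List[Tuple[str, float]]:
    """Score every known vertex against the predicted embedding, best first."""
    scores = [(name, cosine_similarity(predicted, emb)) for name, emb in vertices.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; values are made up for illustration.
    vertices = {
        "term_a": [1.0, 0.0, 0.0],
        "term_b": [0.0, 1.0, 0.0],
        "term_c": [0.0, 0.0, 1.0],
    }
    model_input = combine(vertices["term_a"], vertices["term_c"], "average")
    print(model_input)  # [0.5, 0.0, 0.5]
    predicted = [0.1, 0.9, 0.0]  # stand-in for a trained model's output
    print(rank_vertices(predicted, vertices)[0][0])  # term_b
```

Concatenation keeps track of which embedding came from which input at the cost of doubling the input width, while average and Hadamard keep the original dimension; which variant works best is an empirical question the paper examines.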

Results: We evaluate our method by replicating five known discoveries within the Hallmarks of Cancer (HOC) datasets and compare it against two existing works. We find no significant difference in reported ranks or model convergence rate between scaled and unscaled input representations. Compared with previous works, our method achieves the best performance on two of the five datasets and comparable performance on the remaining three. We further analyze our results with statistical significance testing to demonstrate the efficacy of our method.
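The Results paragraph compares methods by the ranks they report for replicated discoveries. Assuming (the abstract does not say) that a reported rank is the 1-based position of the known linking vertex after all candidate vertices are sorted by similarity to the predicted embedding, the evaluation step reduces to a lookup like the one below. The function name `rank_of_target` and its pessimistic tie-breaking rule are hypothetical choices for this sketch.

```python
from typing import Dict


def rank_of_target(scores: Dict[str, float], target: str) -> int:
    """Return the 1-based rank of `target` among all scored vertices.

    Assumed convention: a higher score is better, and ties are broken
    pessimistically (any other vertex tied with the target counts as above it).
    """
    if target not in scores:
        raise KeyError(f"{target!r} is not a scored vertex")
    target_score = scores[target]
    above_or_tied = sum(
        1 for name, score in scores.items() if name != target and score >= target_score
    )
    return above_or_tied + 1


if __name__ == "__main__":
    # Hypothetical similarity scores for three candidate linking terms.
    example = {"term_x": 0.91, "term_y": 0.75, "term_z": 0.40}
    print(rank_of_target(example, "term_y"))  # 2
```

A lower rank means the known discovery surfaced closer to the top of the list a researcher would inspect, which is why rank is a natural metric for replication studies of this kind.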

Conclusion: We found that our similarity-based learning objective predicts linking vertex embeddings for single-relationship closed discovery replication. Our method also provides a ranked list of linking vertices between a set of inputs, which reduces researcher burden and allows further exploration of the generated hypotheses.
