Graph-Based Bidirectional Transformer Decision Threshold Adjustment Algorithm for Class-Imbalanced Molecular Data.

IF 2.3 | Zone 4 (Chemistry) | JCR Q3 | Chemistry, Multidisciplinary
Nicole Hayes, Ekaterina Merkurjev, Guo-Wei Wei
Journal of Computational Biophysics and Chemistry, vol. 23, no. 10, pp. 1339-1358. Published 2024-12-01 (Epub 2024-09-19). DOI: https://doi.org/10.1142/s2737416524500479. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467357/pdf/

Abstract

Data sets with imbalanced class sizes, where one class size is much smaller than that of others, occur exceedingly often in many applications, including those with biological foundations, such as disease diagnosis and drug discovery. Therefore, it is extremely important to be able to identify data elements of classes of various sizes, as a failure to do so can result in heavy costs. Nonetheless, many data classification procedures do not perform well on imbalanced data sets as they often fail to detect elements belonging to underrepresented classes. In this work, we propose the BTDT-MBO algorithm, incorporating Merriman-Bence-Osher (MBO) approaches and a bidirectional transformer, as well as distance correlation and decision threshold adjustments, for data classification tasks on highly imbalanced molecular data sets, where the sizes of the classes vary greatly. The proposed technique not only integrates adjustments in the classification threshold for the MBO algorithm in order to help deal with the class imbalance, but also uses a bidirectional transformer procedure based on an attention mechanism for self-supervised learning. In addition, the model implements distance correlation as a weight function for the similarity graph-based framework on which the adjusted MBO algorithm operates. The proposed method is validated using six molecular data sets and compared to other related techniques. The computational experiments show that the proposed technique is superior to competing approaches even in the case of a high class imbalance ratio.
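
The abstract names three ingredients without giving formulas: a similarity graph weighted by distance correlation, an MBO scheme on that graph, and an adjusted decision threshold. The following is a minimal Python sketch of how the graph and threshold pieces could fit together in a binary setting. It is not the authors' implementation. The function names (`distance_correlation`, `similarity_graph`, `mbo_binary`), the parameters `dt`, `mu`, and `n_iter`, the choice to use the raw distance correlation between feature vectors as the edge weight, and the way the threshold is applied are all assumptions made for illustration. The bidirectional transformer that produces the features is omitted, so `X` stands in for any fixed-length molecular feature matrix.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D arrays (in [0, 1])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])  # pairwise distances within y
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0


def similarity_graph(X):
    """Dense weight matrix with W[i, j] = dCor(X[i], X[j]) and zero diagonal.

    Assumption: the distance correlation itself is used as the edge weight.
    """
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            W[i, j] = W[j, i] = distance_correlation(X[i], X[j])
    return W


def mbo_binary(W, labeled_idx, labels, threshold=0.5, dt=0.1, mu=10.0, n_iter=20):
    """Binary MBO-style iteration: graph diffusion, then thresholding.

    A threshold below 0.5 makes it easier for a node to be assigned to
    class 1, which is taken here to be the minority class (an assumption).
    """
    n = W.shape[0]
    lap = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian
    diffuse = np.linalg.inv(np.eye(n) + dt * lap)  # implicit diffusion step
    mask = np.zeros(n)
    mask[labeled_idx] = 1.0
    target = np.zeros(n)
    target[labeled_idx] = labels
    u = np.full(n, 0.5)                            # unlabeled nodes start undecided
    u[labeled_idx] = labels
    for _ in range(n_iter):
        # Diffuse, pulling labeled nodes toward their known labels.
        u = diffuse @ (u - dt * mu * mask * (u - target))
        # Threshold with the adjustable cutoff, then re-impose known labels.
        u = (u >= threshold).astype(float)
        u[labeled_idx] = labels
    return u.astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(-1.0, 1.0, 8)
    majority = t + 0.1 * rng.normal(size=(10, 8))     # 10 samples, linear pattern
    minority = t**2 + 0.1 * rng.normal(size=(2, 8))   # 2 samples, quadratic pattern
    X = np.vstack([majority, minority])
    W = similarity_graph(X)
    for tau in (0.5, 0.3):
        pred = mbo_binary(W, labeled_idx=[0, 11], labels=[0, 1], threshold=tau)
        print(f"threshold={tau}: predicted minority count = {pred.sum()}")
```

Distance correlation is unchanged by shifting or rescaling a feature vector, so the toy classes above differ in pattern rather than in mean. Lowering `threshold` can only keep or increase the number of nodes assigned to class 1 in this sketch, which is the sense in which the cutoff counteracts the imbalance.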

Source journal metrics: CiteScore 3.60; self-citation rate 9.10%; articles published: 62