Efficient Classification of Hallmark of Cancer Using Embedding-Based Support Vector Machine for Multilabel Text

IF 2 | Region 4 (Computer Science) | JCR Q3: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Shikha Verma, Aditi Sharan, Nidhi Malik
DOI: 10.1007/s00354-024-00248-3 (https://doi.org/10.1007/s00354-024-00248-3)
Journal: New Generation Computing, vol. 42, no. 1
Published: 2024-03-12 (Journal Article)
Citations: 0

Abstract

The hallmarks of cancer are biological capabilities of tumor cells that help medical experts understand how these cells develop and identify them at different stages of the disease. The hallmarks of cancer classification is a widely accepted framework that characterizes these fundamental capabilities. It is based on the work of Hanahan and Weinberg, who identified 10 hallmark capabilities that collectively enable the development and progression of cancer. The framework provides a comprehensive basis for understanding the biology of cancer development and progression, and it helps researchers identify the key molecular and cellular pathways involved in the disease, which can inform the development of new diagnostic tools and therapies. Multi-label classification assigns a set of labels to each sample under study. This paper builds an improved model by combining biomedical domain-specific embeddings of all the extracted biomedical features with a machine learning model. Domain-specific embeddings add semantics to the vector representation of the text. Specifically, the study uses BioWordVec and MeSH embeddings to try to improve the efficacy of multi-label classification relative to other state-of-the-art methods. In the experiments, the proposed model, trained with a Support Vector Machine (SVM), showed a significant improvement in performance. The paper also examines label correlation, both through a case study conducted with medical domain experts and through analysis with the proposed model.
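The general idea described in the abstract (embed the text, then train an SVM for multi-label prediction) can be illustrated with a minimal sketch. It is not the authors' pipeline, and several parts are assumptions: the 2-D `TOY_VECTORS` are hypothetical stand-ins for BioWordVec or MeSH embeddings, documents are embedded by simple averaging, multi-label output is handled by binary relevance (one independent linear SVM per label, which the abstract does not specify), and a small pure-Python subgradient solver replaces a library SVM. The two label names are taken from the Hanahan and Weinberg hallmarks; the training sentences are invented for illustration.

```python
# Toy 2-D word vectors: hypothetical stand-ins for BioWordVec / MeSH embeddings.
TOY_VECTORS = {
    "angiogenesis": [1.0, 0.0],
    "vessel": [0.5, 0.0],
    "apoptosis": [0.0, 1.0],
    "caspase": [0.0, 0.5],
}
DIM = 2


def embed(text):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, epochs=2000, lr=0.01, lam=0.001):
    """Batch subgradient descent on lam/2*|w|^2 + sum of hinge losses; y in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # hinge loss is active for this sample
                grad_w = [g - yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_binary_relevance(texts, label_sets, labels):
    """Fit one independent linear SVM per label (binary relevance)."""
    X = [embed(t) for t in texts]
    return {
        lab: train_linear_svm(X, [1 if lab in s else -1 for s in label_sets])
        for lab in labels
    }


def predict(models, text):
    """Return every label whose SVM decision value is positive."""
    x = embed(text)
    return {lab for lab, (w, b) in models.items() if dot(w, x) + b > 0}


# Invented toy corpus; each document may carry more than one hallmark label.
LABELS = ["inducing angiogenesis", "resisting cell death"]
train_texts = [
    "tumor angiogenesis increased",
    "cells underwent apoptosis",
    "angiogenesis and apoptosis were altered",
]
train_labels = [
    {"inducing angiogenesis"},
    {"resisting cell death"},
    {"inducing angiogenesis", "resisting cell death"},
]
models = train_binary_relevance(train_texts, train_labels, LABELS)
print(predict(models, "new vessel angiogenesis"))
```

On this toy data the unseen sentence should receive only the angiogenesis label, while the third training sentence keeps both labels. Binary relevance treats labels independently, so it does not by itself capture the label correlation that the paper studies; in practice one would also load pretrained embeddings and use an established SVM implementation rather than this hand-written solver.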

Source journal
New Generation Computing
Category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 5.90
Self-citation rate: 15.40%
Articles published: 47
Review time: >12 weeks
Journal description: The journal is specially intended to support the development of new computational and cognitive paradigms stemming from the cross-fertilization of various research fields. These fields include, but are not limited to, programming (logic, constraint, functional, object-oriented), distributed/parallel computing, knowledge-based systems, agent-oriented systems, and cognitive aspects of human embodied knowledge. It also encourages theoretical and/or practical papers concerning all types of learning, knowledge discovery, evolutionary mechanisms, human cognition and learning, and emergent systems that can lead to key technologies enabling us to build more complex and intelligent systems. The editorial board hopes that New Generation Computing will work as a catalyst among active researchers with broad interests by ensuring a smooth publication process.