Efficient Classification of Hallmark of Cancer Using Embedding-Based Support Vector Machine for Multilabel Text

IF 2.0 | Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

New Generation Computing Pub Date : 2024-03-12 DOI:10.1007/s00354-024-00248-3

Shikha Verma, Aditi Sharan, Nidhi Malik



Abstract

The hallmarks of cancer are biological capabilities of tumor cells that help medical experts understand how these cells develop and how to identify them at different stages of the disease. The hallmarks of cancer classification is a widely accepted framework, based on the work of Hanahan and Weinberg, who identified 10 hallmark capabilities that collectively enable the development and progression of cancer. It helps researchers identify the key molecular and cellular pathways involved in the disease, which can inform the development of new diagnostic tools and therapies. Multi-label classification assigns a set of labels to each sample under study. This paper builds an improved model by combining biomedical domain-specific embeddings for all extracted biomedical features and feeding them to a machine learning model. Domain-specific embeddings add semantics to the vector representation of the text. Specifically, the study uses BioWordVec and MeSH embeddings to try to improve the efficacy of multi-label classification relative to other state-of-the-art methods. In the experiments, the model, trained with a Support Vector Machine (SVM), showed a significant improvement in performance. The paper also examines label correlation, both through a case study conducted with medical domain experts and through analysis with the proposed model.
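The abstract does not spell out the implementation, so the following is only a minimal sketch of the general idea under stated assumptions: sentences are represented by averaged vectors from two embedding tables, the two averages are concatenated, and one linear SVM is trained per label (binary relevance, which is an assumption here, not necessarily the authors' setup). The tiny `WORD_VECS` and `MESH_VECS` tables, the label names, and all function names are hypothetical stand-ins; real BioWordVec and MeSH vectors would be loaded from pretrained files, and a library SVM would normally replace the hand-written trainer.

```python
import random
from collections import Counter
from itertools import combinations

# Toy stand-ins for pretrained tables (values are made up for illustration).
WORD_VECS = {  # hypothetical "BioWordVec-like" word vectors, dimension 2
    "tumor": [0.9, 0.1], "growth": [0.8, 0.2],
    "vessel": [0.1, 0.9], "angiogenesis": [0.2, 0.8],
}
MESH_VECS = {  # hypothetical "MeSH-like" concept vectors, dimension 2
    "tumor": [1.0, 0.0], "angiogenesis": [0.0, 1.0],
}

def mean_vector(tokens, table, dim):
    """Average the vectors of tokens present in `table`; zeros if none match."""
    hits = [table[t] for t in tokens if t in table]
    if not hits:
        return [0.0] * dim
    return [sum(col) / len(hits) for col in zip(*hits)]

def featurize(text):
    """Concatenate the averaged vectors from both tables into one feature vector."""
    tokens = text.lower().split()
    return mean_vector(tokens, WORD_VECS, 2) + mean_vector(tokens, MESH_VECS, 2)

def train_linear_svm(X, y, epochs=200, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; y holds +1/-1."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1 - eta * lam) for wj in w]  # shrink from the L2 penalty
            if margin < 1:  # sample violates the margin: step toward it
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def fit_binary_relevance(texts, label_sets):
    """Train one independent linear SVM per label."""
    labels = sorted(set().union(*label_sets))
    X = [featurize(t) for t in texts]
    return {lab: train_linear_svm(X, [1 if lab in s else -1 for s in label_sets])
            for lab in labels}

def predict(models, text):
    """Return every label whose SVM scores the text above zero."""
    x = featurize(text)
    return {lab for lab, (w, b) in models.items()
            if sum(wj * xj for wj, xj in zip(w, x)) + b > 0}

def label_cooccurrence(label_sets):
    """Count how often each pair of labels is assigned to the same sample."""
    counts = Counter()
    for s in label_sets:
        counts.update(combinations(sorted(s), 2))
    return counts

# Illustrative toy corpus; label names are placeholders, not the paper's labels.
TEXTS = ["tumor growth", "vessel angiogenesis", "tumor angiogenesis growth"]
LABEL_SETS = [{"proliferation"}, {"angiogenesis"}, {"proliferation", "angiogenesis"}]
MODELS = fit_binary_relevance(TEXTS, LABEL_SETS)
```

Calling `predict(MODELS, "tumor growth")` returns a set of label names, and `label_cooccurrence(LABEL_SETS)` gives raw pair counts, a crude first look at label correlation. Binary relevance ignores such correlations during training, which is one reason a separate correlation analysis like the one the abstract mentions can be informative.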


Source journal

New Generation Computing (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 5.90

Self-citation rate: 15.40%

Review time: >12 weeks

About the journal: The journal is specially intended to support the development of new computational and cognitive paradigms stemming from the cross-fertilization of various research fields. These fields include, but are not limited to, programming (logic, constraint, functional, object-oriented), distributed/parallel computing, knowledge-based systems, agent-oriented systems, and cognitive aspects of human embodied knowledge. It also encourages theoretical and/or practical papers concerning all types of learning, knowledge discovery, evolutionary mechanisms, human cognition and learning, and emergent systems that can lead to key technologies enabling us to build more complex and intelligent systems. The editorial board hopes that New Generation Computing will work as a catalyst among active researchers with broad interests by ensuring a smooth publication process.