Audio-based deep learning classification of laryngeal pathologies with detection of precancerous and cancerous lesions using Gammatone Cepstral coefficients

Biomedical Engineering Advances, Volume 11, Article 100211. Publication date: 2026-06-01 (Epub: 2026-01-19). DOI: 10.1016/j.bea.2026.100211

Julia Zofia Tomaszewska, Wojciech Kukwa, Apostolos Georgakis

Abstract

Introduction

Despite extensive research on audio-based voice pathology detection, current literature lacks clear and consistent evidence identifying acoustic features capable of reliably discriminating precancerous and cancerous laryngeal lesions, particularly when analysed using continuous speech signals.

Problem statement

The performance of audio-based laryngeal pathology classification systems on continuous speech remains significantly underreported, and commonly used Mel-Frequency Cepstral Coefficients (MFCCs) may be suboptimal for capturing pathology-related acoustic characteristics.
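The two feature families differ mainly in how their filterbanks warp the frequency axis. MFCCs place triangular filters uniformly on the Mel scale, whereas GTCCs use gammatone filters spaced uniformly on the equivalent rectangular bandwidth (ERB) rate scale. The sketch below compares the two mappings. It assumes the common 2595·log10(1 + f/700) Mel variant and the Glasberg and Moore ERB approximations, because the abstract does not say which variants were used. With these formulas, a filterbank spanning 0 to 8 kHz puts roughly 47% of its channels below 1 kHz on the ERB-rate scale, compared with roughly 35% on the Mel scale.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel scale, common 2595*log10(1 + f/700) variant (an assumed choice)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_erb_rate(f_hz: float) -> float:
    """Glasberg & Moore (1990) ERB-rate scale (number of ERBs below f_hz)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_bandwidth(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centred at f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def fraction_below(f_split: float, f_max: float, scale) -> float:
    """Share of a 0..f_max filterbank, spaced uniformly on `scale`, lying below f_split."""
    return scale(f_split) / scale(f_max)


for f in (100, 500, 1000, 4000):
    print(f"{f:5d} Hz  mel={hz_to_mel(f):7.1f}  erb-rate={hz_to_erb_rate(f):5.2f}  "
          f"ERB={erb_bandwidth(f):6.1f} Hz")
```

The helper `fraction_below` is hypothetical and serves only to quantify the difference in low-frequency resolution between the two warpings.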

Objectives

This study investigates the hypothesis that continuous speech audio signals analysed with Gammatone Cepstral Coefficients (GTCCs) enable accurate and precise detection of laryngeal pathologies, with a specific focus on precancerous and cancerous lesions.
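A typical GTCC pipeline has four steps: filter the signal through a bank of gammatone filters, compute short-time energy per channel, apply log compression, and take a DCT to obtain cepstral coefficients. The following is a minimal NumPy sketch of that generic pipeline, not the authors' implementation. The settings are all assumptions, since the abstract does not report them: 32 filters, 13 coefficients, 25 ms frames with a 10 ms hop, a 50 Hz lower edge, 4th-order gammatone filters with b = 1.019, and a 100 ms truncated impulse response.

```python
import numpy as np


def erb_bandwidth(fc):
    """Glasberg & Moore (1990) equivalent rectangular bandwidth, in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def erb_spaced_centres(f_low, f_high, n):
    """n centre frequencies spaced uniformly on the ERB-rate scale."""
    rate_low = 21.4 * np.log10(1.0 + 0.00437 * f_low)
    rate_high = 21.4 * np.log10(1.0 + 0.00437 * f_high)
    rates = np.linspace(rate_low, rate_high, n)
    return (10.0 ** (rates / 21.4) - 1.0) / 0.00437


def gammatone_ir(fc, fs, duration=0.1, order=4, b=1.019):
    """Truncated gammatone impulse response t^(n-1) e^(-2*pi*b*ERB*t) cos(2*pi*fc*t), unit energy."""
    t = np.arange(int(round(duration * fs))) / fs
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb_bandwidth(fc) * t)
    g = envelope * np.cos(2.0 * np.pi * fc * t)
    return g / np.linalg.norm(g)


def fft_filter(x, h):
    """Linear convolution of x with h via zero-padded FFTs, cut to len(x) (causal output)."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()
    y = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)
    return y[: len(x)]


def dct_ii_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II basis of size n_in."""
    k = np.arange(n_out)[:, None]
    m = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * m + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat


def gtcc(signal, fs, n_filters=32, n_coeffs=13, frame_len=0.025, hop=0.010, f_low=50.0):
    """Return a (n_coeffs, n_frames) GTCC matrix for a mono signal."""
    x = np.asarray(signal, dtype=float)
    centres = erb_spaced_centres(f_low, 0.9 * fs / 2, n_filters)
    # 1) gammatone filterbank: one band-limited signal per channel
    bands = np.stack([fft_filter(x, gammatone_ir(fc, fs)) for fc in centres])
    # 2) short-time mean energy per channel and frame
    flen, fhop = int(round(frame_len * fs)), int(round(hop * fs))
    n_frames = 1 + (len(x) - flen) // fhop
    energies = np.stack(
        [np.mean(bands[:, i * fhop : i * fhop + flen] ** 2, axis=1) for i in range(n_frames)],
        axis=1,
    )
    # 3) log compression, then 4) DCT across channels to obtain cepstral coefficients
    return dct_ii_matrix(n_coeffs, n_filters) @ np.log(energies + 1e-10)
```

For example, `gtcc(x, 16000)` applied to a 1 s recording sampled at 16 kHz returns a 13 × 98 matrix, with one column per 10 ms hop. All helper names here are hypothetical.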

Methods

An audio-based classification system employing GTCCs for feature extraction and a one-dimensional Convolutional Neural Network (CNN) for classification is proposed. The system considers three classes: precancerous and cancerous lesions, neuromuscular disorders, and healthy cases. Performance was evaluated using two datasets: a custom speech dataset collected for this research and the Saarbruecken Voice Database (SVD).
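The abstract names a one-dimensional CNN but does not describe its layers. The untrained NumPy forward pass below only shows how a 1D CNN can consume a GTCC matrix, treating coefficients as input channels and frames as the time axis, and produce probabilities over the three classes. The layer count, kernel sizes, channel widths, max pooling, global average pooling, and He initialisation are all hypothetical. Training (for example, with a cross-entropy loss) is omitted.

```python
import numpy as np

# Class order is an illustrative assumption
CLASSES = ("precancerous/cancerous", "neuromuscular", "healthy")


def conv1d(x, w, b):
    """Valid 1D cross-correlation. x: (in_ch, T), w: (out_ch, in_ch, k) -> (out_ch, T-k+1)."""
    k = w.shape[2]
    t_out = x.shape[1] - k + 1
    windows = np.stack([x[:, t : t + k] for t in range(t_out)])  # (t_out, in_ch, k)
    return np.einsum("tik,oik->ot", windows, w) + b[:, None]


def relu(z):
    return np.maximum(z, 0.0)


def max_pool1d(x, size=2):
    """Non-overlapping max pooling along time; trailing frames that do not fill a window are dropped."""
    t = x.shape[1] // size * size
    return x[:, :t].reshape(x.shape[0], -1, size).max(axis=2)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, in_ch=13, n_classes=3):
    """Hypothetical two-block architecture with He-normal weights."""
    def he(shape, fan_in):
        return rng.standard_normal(shape) * np.sqrt(2.0 / fan_in)
    return {
        "w1": he((32, in_ch, 5), in_ch * 5), "b1": np.zeros(32),
        "w2": he((64, 32, 3), 32 * 3), "b2": np.zeros(64),
        "wd": he((n_classes, 64), 64), "bd": np.zeros(n_classes),
    }


def forward(params, feats):
    """feats: (n_coeffs, n_frames) GTCC matrix -> class probabilities."""
    h = max_pool1d(relu(conv1d(feats, params["w1"], params["b1"])))
    h = max_pool1d(relu(conv1d(h, params["w2"], params["b2"])))
    pooled = h.mean(axis=1)  # global average pooling over time
    return softmax(params["wd"] @ pooled + params["bd"])
```

Global average pooling makes the classifier independent of recording length. This is one reasonable design choice for variable-length continuous speech, but the paper does not confirm it.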

Results

GTCCs derived from speech signals delivered higher classification accuracy than the widely used MFCCs. On the custom dataset, the proposed method achieved an average classification accuracy of 85.04% ± 1.23, compared with 63.22% ± 1.62 using MFCCs. On SVD, GTCCs achieved 73.93% ± 1.42, compared with 60.36% ± 2.44 for MFCCs. The statistical significance of these differences was confirmed with a t-test at a 1% significance level.
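The reported gaps are 21.82 percentage points on the custom dataset and 13.57 points on SVD. The abstract does not state the number of runs averaged, whether ± denotes a standard deviation, or which t-test variant was applied. The sketch below assumes independent runs and treats ± as a sample standard deviation. Under those assumptions, it computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from summary statistics alone. `N_RUNS = 10` is purely hypothetical, so the printed t values depend entirely on that assumption and are not the paper's statistics.

```python
import math


def welch_t_from_stats(m1, s1, n1, m2, s2, n2):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


N_RUNS = 10  # hypothetical: the abstract does not report the number of runs
REPORTED = {  # (GTCC mean, GTCC sd, MFCC mean, MFCC sd), accuracies in %
    "custom": (85.04, 1.23, 63.22, 1.62),
    "SVD": (73.93, 1.42, 60.36, 2.44),
}
for name, (g_m, g_s, m_m, m_s) in REPORTED.items():
    t, df = welch_t_from_stats(g_m, g_s, N_RUNS, m_m, m_s, N_RUNS)
    print(f"{name}: gap = {g_m - m_m:.2f} pp, t = {t:.1f}, df = {df:.1f}")
```

Each resulting t would then be compared with the two-sided critical value at α = 0.01 for its degrees of freedom. If the GTCC and MFCC systems were evaluated on the same folds, a paired t-test would be the more appropriate choice.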

Conclusions

The results demonstrate that GTCCs extracted from continuous speech signals provide a robust and effective representation for audio-based laryngeal pathology classification, highlighting their potential for use in automated pre-screening systems targeting precancerous and cancerous voice disorders.
