Antonio Saverio Valente, Teresa Angela Trunfio, Marco Aiello, Dario Baldi, Marilena Baldi, Silvio Imbò, Mario Alessandro Russo, Carlo Cavaliere, Monica Franzese
Text mining approach for feature extraction and cartilage disease grade classification using knee MRI radiology reports
Computational and Structural Biotechnology Journal, volume 24, pages 622-629. Published 2024-10-05. DOI: 10.1016/j.csbj.2024.10.003
https://www.sciencedirect.com/science/article/pii/S2001037024003234
Citations: 0
Abstract
Magnetic resonance imaging (MRI) radiology reporting can be improved by exploiting structured, semantically labelled data that can be fed to artificial intelligence (AI) tools. AI-based tools that assist radiology reporting can automatically identify cartilage grading in textual MRI reports, thus supporting clinicians' decisions regarding medical imaging utilisation, diagnosis and treatment. In this study, we extracted information (clinical findings, observations, anatomical regions, etc.) and classified knee cartilage degradation from medical reports using transfer-learning techniques applied to the Bidirectional Encoder Representations from Transformers (BERT) model and its variants, pre-trained on an Italian-language corpus. To this end, we used a dataset of 750 knee MRI reports written by three radiologists, who also contributed to the manual annotation process needed for the text classification (TC) and named entity recognition (NER) tasks. The dataset was obtained from an internal database of the IRCCS SYNLAB SDN. Seventy percent of the dataset was used for training, 10% for validation and 20% for testing. The best-performing configurations for the NER and TC tasks were based on the pre-trained BERT model. The macro F1-scores obtained with the NER and TC models are 0.89 and 0.81, respectively. The accuracies calculated on the test set for the two tasks are 0.96 and 0.99, respectively.
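The 70/10/20 split and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the report identifiers, the label names `grade_0` and `grade_3`, and the toy predictions are all hypothetical, and the helper functions are written from the standard definitions of accuracy and macro F1. The toy labels are deliberately imbalanced to show one common way accuracy can sit well above macro F1; whether that explains the figures in the abstract is an assumption, since the abstract does not report the class distribution.

```python
import random


def split_70_10_20(items, seed=0):
    """Shuffle items and split them into train/validation/test (70/10/20)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical report identifiers standing in for the 750 reports.
reports = [f"report_{i}" for i in range(750)]
train, val, test = split_70_10_20(reports)
print(len(train), len(val), len(test))  # 525 75 150

# Hypothetical, imbalanced toy labels: 95 common cases, 5 rare cases.
y_true = ["grade_0"] * 95 + ["grade_3"] * 5
# The toy classifier gets every common case right but only 2 of 5 rare ones.
y_pred = ["grade_0"] * 95 + ["grade_3"] * 2 + ["grade_0"] * 3

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} macro_f1={mf1:.2f}")
```

Because macro F1 weights every class equally, the three missed rare cases pull it down far more than they reduce accuracy, which is dominated by the majority class.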
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a pre-requisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology