Enhanced taxonomic identification of fusulinid fossils through image–text integration using transformer

Impact factor 4.2; Zone 2, Earth Sciences; JCR Q1, Computer Science, Interdisciplinary Applications
Journal: Computers & Geosciences
Publication type: Journal Article
Published: 2024-08-17
DOI: 10.1016/j.cageo.2024.105701
URL: https://www.sciencedirect.com/science/article/pii/S0098300424001845
Citations: 0

Abstract

Accurate taxonomic identification of fusulinid fossils has significant scientific value in palaeontology, palaeoecology, and palaeogeography. However, imbalanced image samples bias a model toward learning features from categories with many samples while neglecting categories with few, which greatly reduces the prediction accuracy of fusulinid fossil identification. Moreover, textual descriptions of fusulinid fossils contain rich feature information. We collected and created an order fusulinid multimodal (OFM) dataset for this research and propose a transformer-based multimodal integration framework (TMIF) that uses deep learning for fusulinid fossil identification. Compared with traditional neural networks, the transformer can model global dependencies between features at different locations. TMIF comprises an image branch and a text branch, each dedicated to extracting features from its modality, and a pivotal cross-modal integration module that lets visual features learn from textual semantic features to obtain a more comprehensive feature representation. Experimental evaluation on the OFM dataset shows that TMIF achieves a prediction accuracy of 81.7%, a 2.8% improvement over the image-only method. Further comparative analyses across multiple networks indicate that TMIF performs best among the compared networks at taxonomic identification of fusulinid fossils with imbalanced samples.
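
The abstract does not specify how the cross-modal integration module is built. As an illustration only, the sketch below shows one common way to let visual features attend to textual features: single-head cross-attention in which image tokens supply the queries and text tokens supply the keys and values. The function names, the feature width `d`, the token counts, and the residual connection are all assumptions made for this example, not details taken from TMIF.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(image_tokens, text_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image tokens query the text tokens.

    image_tokens: (n_img, d) visual features
    text_tokens:  (n_txt, d) textual features
    Returns fused visual features (n_img, d) and attention weights (n_img, n_txt).
    """
    q = image_tokens @ w_q  # queries come from the image branch
    k = text_tokens @ w_k   # keys come from the text branch
    v = text_tokens @ w_v   # values come from the text branch
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each image token attends over all text tokens
    fused = image_tokens + weights @ v  # residual keeps the original visual signal (assumption)
    return fused, weights


# Hypothetical sizes: 4 image patches, 6 description tokens, feature width 8.
rng = np.random.default_rng(0)
d = 8
image_tokens = rng.normal(size=(4, d))
text_tokens = rng.normal(size=(6, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = cross_attention(image_tokens, text_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (4, 8) (4, 6)
```

Each row of `weights` sums to 1, so every fused visual token is the original token plus a weighted mixture of projected text tokens. A full model would typically use learned, multi-head projections and feed the fused features to a classifier.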

Source journal
Computers & Geosciences (Earth sciences; geosciences, multidisciplinary)
CiteScore: 9.30
Self-citation rate: 6.80%
Articles published: 164
Review time: 3.4 months
Journal description: Computers & Geosciences publishes high-impact, original research at the interface between computer science and the geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.