DeepSeek-Based Multi-dimensional Augmentation of Short and Highly Domain-Specific Textual Inquiries for Aquaculture Question-Answering Framework

Impact factor 2.2; Zone 3, Agricultural and Forestry Sciences; JCR Q2, Fisheries
Liming Shao, Hong Yu, Wei Huang, Huiyuan Zhao, Lixin Zhang, Jing Song
Aquaculture International, vol. 33, no. 4. Published 2025-04-14. DOI: 10.1007/s10499-025-01948-3
https://link.springer.com/article/10.1007/s10499-025-01948-3
Citations: 0

Abstract

High-quality data are essential for accurate and timely decision-making in disease prevention and control within aquaculture question-answering (QA) frameworks. However, textual data reflecting conversational question–answer exchanges between fish farmers and domain experts remain scarce, hindering progress in training and building such systems. To address this gap, we introduce a multi-dimensional augmentation approach that leverages DeepSeek to generate high-quality augmented data tailored to aquaculture, concentrating on the question side. Our method aims to provide high-quality synthesized data so that aquaculture question-answering frameworks can be trained to comprehensively understand the key information in short, free-form, conversational inquiries and to infer the intent behind the questions. We employ a multi-task BERT framework to assess the reliability and diversity of these augmented samples, ensuring they preserve core semantics while expanding domain-specific data availability. We benchmark our approach against ChatGPT o1, and our experimental results demonstrate that DeepSeek achieves better performance. Specifically, for domain-specific key information (i.e., entity) recognition, it attains an accuracy of 92.08%, precision of 92.3%, recall of 92.05%, and an F1 score of 91.78%; for intent classification, the model reaches 91.67% accuracy, 93.48% precision, 91.67% recall, and an 89.68% F1 score. Notably, DeepSeek surpasses ChatGPT o1 in intent classification and remains competitive in key entity recognition. Furthermore, the augmented samples exhibit robust domain reliability (cosine similarity < 0.474619) and high diversity (Distinct-1 = 0.9776; Self-BLEU = 0.0106). These results demonstrate the efficacy of DeepSeek-based multi-dimensional text augmentation in improving data consistency and coverage for aquaculture professionals engaged in disease management. Our method places particular emphasis on enhancing the quality and comprehensiveness of user questions, thereby laying a stronger foundation for subsequent answer generation and overall knowledge improvement in QA frameworks.
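
As a rough illustration of the diversity and similarity measures named in the abstract, the following Python sketch computes Distinct-1 and a cosine similarity over augmented questions. It is not the authors' evaluation code: the whitespace tokenizer, the bag-of-words vectors used for cosine similarity, and the sample questions are all assumptions made for illustration, and the abstract does not say which tokenizer or embedding the paper uses. Self-BLEU is omitted here.

```python
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, not the paper's tokenizer)."""
    return text.lower().split()


def distinct_n(texts, n=1):
    """Distinct-n: number of unique n-grams divided by total n-grams over all texts."""
    ngrams = []
    for text in texts:
        tokens = tokenize(text)
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors (an assumed representation)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Hypothetical sample data: one original inquiry and two augmented variants.
original = "why are my shrimp dying after rain"
augmented = [
    "what causes shrimp deaths following heavy rainfall",
    "my shrimp keep dying once it rains, why",
]

print("Distinct-1:", distinct_n(augmented, n=1))
for text in augmented:
    print("cosine vs. original:", round(cosine_similarity(original, text), 3))
```

A Distinct-1 value close to 1 means that few tokens repeat across the augmented questions, which is how the reported 0.9776 can be read. A sentence-embedding model would normally replace the bag-of-words vectors when semantic rather than lexical similarity is of interest.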

Source journal: Aquaculture International (Agricultural and Forestry Sciences, Fisheries)
CiteScore: 5.10
Self-citation rate: 6.90%
Articles published: 204
Review time: 1.0 months
Journal description: Aquaculture International is an international journal publishing original research papers, short communications, technical notes and review papers on all aspects of aquaculture. The Journal covers topics such as the biology, physiology, pathology and genetics of cultured fish, crustaceans, molluscs and plants, especially new species; water quality of supply systems, fluctuations in water quality within farms and the environmental impacts of aquacultural operations; nutrition, feeding and stocking practices, especially as they affect the health and growth rates of cultured species; sustainable production techniques; bioengineering studies on the design and management of offshore and land-based systems; the improvement of quality and marketing of farmed products; sociological and societal impacts of aquaculture, and more. This is the official Journal of the European Aquaculture Society.