Liming Shao, Hong Yu, Wei Huang, Huiyuan Zhao, Lixin Zhang, Jing Song
DeepSeek-Based Multi-dimensional Augmentation of Short and Highly Domain-Specific Textual Inquires for Aquaculture Question-Answering Framework
Aquaculture International, Volume 33, Issue 4 (published 2025-04-14)
DOI: 10.1007/s10499-025-01948-3
Abstract
High-quality data are essential for accurate and timely decision-making in disease prevention and control within aquaculture question-answering (QA) frameworks. However, textual data reflecting conversational question–answer exchanges between fish farmers and domain experts remain scarce, which hinders the training and development of such systems. To address this gap, we introduce a multi-dimensional augmentation approach that leverages DeepSeek to generate high-quality augmented data tailored to aquaculture, focusing on the question side. Our method aims to provide high-quality synthesized data for training aquaculture QA frameworks to extract key information from short, free-form, conversational inquiries and to infer the intent behind them. We employ a multi-task BERT framework to assess the reliability and diversity of these augmented samples, ensuring that they preserve core semantics while expanding the availability of domain-specific data. We benchmark our approach against ChatGPT o1, and our experimental results show that DeepSeek performs better overall. Specifically, for domain-specific key information (i.e., entity) recognition, it attains an accuracy of 92.08%, a precision of 92.3%, a recall of 92.05%, and an F1 score of 91.78%; for intent classification, the model reaches 91.67% accuracy, 93.48% precision, 91.67% recall, and an 89.68% F1 score. Notably, DeepSeek surpasses ChatGPT o1 in intent classification and remains competitive in key entity recognition. Furthermore, the augmented samples exhibit robust domain reliability (cosine similarity < 0.474619) and high diversity (Distinct-1 = 0.9776; Self-BLEU = 0.0106). These results demonstrate the efficacy of DeepSeek-based multi-dimensional text augmentation in improving data consistency and coverage for aquaculture professionals engaged in disease management. Our method places particular emphasis on enhancing the quality and comprehensiveness of user questions, thereby laying a stronger foundation for subsequent answer generation and overall knowledge improvement in QA frameworks.
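The abstract reports diversity as Distinct-1 and Self-BLEU without defining them. The sketch below shows the commonly used definitions: Distinct-n is the number of unique n-grams divided by the total number of n-grams across all samples (higher means more varied wording), and Self-BLEU is the mean BLEU score of each sample scored against all the others (lower means samples are less alike). It is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes questions are already tokenized into lists of strings, uses n-grams up to order 4, and applies add-one smoothing to every n-gram order. The paper's tokenizer, n-gram order, and smoothing are not given here, so absolute values from this sketch are not directly comparable to the reported figures. All function names are hypothetical.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(samples, n=1):
    """Distinct-n: unique n-grams divided by total n-grams over all samples."""
    all_grams = [g for tokens in samples for g in ngrams(tokens, n)]
    return len(set(all_grams)) / len(all_grams) if all_grams else 0.0


def sentence_bleu(hypothesis, references, max_n=4):
    """Sentence-level BLEU: clipped n-gram precision plus a brevity penalty.

    Assumption: add-one smoothing on every n-gram order, so short questions
    with no higher-order matches do not collapse to exactly zero.
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        # Highest count of each n-gram in any single reference (used for clipping).
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision_sum += math.log((clipped + 1) / (total + 1))
    # Brevity penalty against the reference length closest to the hypothesis length.
    c = len(hypothesis)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def self_bleu(samples, max_n=4):
    """Self-BLEU: mean BLEU of each sample scored against all other samples."""
    if len(samples) < 2:
        raise ValueError("Self-BLEU needs at least two samples")
    scores = []
    for i, hyp in enumerate(samples):
        refs = samples[:i] + samples[i + 1:]
        scores.append(sentence_bleu(hyp, refs, max_n))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical augmented questions, tokenized by whitespace for illustration.
    questions = [
        "why are my shrimp not eating",
        "what causes white spots on grass carp",
        "how do I lower ammonia in the pond",
    ]
    tokenized = [q.split() for q in questions]
    print("Distinct-1:", distinct_n(tokenized, 1))
    print("Self-BLEU:", self_bleu(tokenized))
```

A Distinct-1 close to 1 together with a Self-BLEU close to 0, as reported in the abstract, indicates that the augmented questions rarely reuse the same words or phrasings.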
About the journal:
Aquaculture International is an international journal publishing original research papers, short communications, technical notes and review papers on all aspects of aquaculture.
The Journal covers topics such as the biology, physiology, pathology and genetics of cultured fish, crustaceans, molluscs and plants, especially new species; water quality of supply systems, fluctuations in water quality within farms and the environmental impacts of aquacultural operations; nutrition, feeding and stocking practices, especially as they affect the health and growth rates of cultured species; sustainable production techniques; bioengineering studies on the design and management of offshore and land-based systems; the improvement of quality and marketing of farmed products; sociological and societal impacts of aquaculture, and more.
This is the official Journal of the European Aquaculture Society.