Annotating scientific uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches

IF 3.5 · Zone 2 (Management) · Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Informetrics · Pub Date: 2025-04-03 · DOI: 10.1016/j.joi.2025.101661

Panggih Kusuma Ningrum, Philipp Mayr, Nina Smirnova, Iana Atanassova

Journal of Informetrics, Volume 19, Issue 2, Article 101661. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1751157725000252

Citations: 0

Abstract

We present UnScientify, a system designed to detect scientific uncertainty in scholarly full text. The system utilizes a weakly supervised technique to identify verbally expressed uncertainty in scientific texts and their authorial references. The core methodology of UnScientify is based on a multi-faceted pipeline that integrates span pattern matching, complex sentence analysis and author reference checking. This approach streamlines the labeling and annotation processes essential for identifying scientific uncertainty, covering a variety of uncertainty expression types to support diverse applications including information retrieval, text mining and scientific document processing. The evaluation results highlight the trade-offs between modern large language models (LLMs) and the UnScientify system. UnScientify, which employs more traditional techniques, achieved superior performance in the scientific uncertainty detection task, attaining an accuracy score of 0.808. This finding underscores the continued relevance and efficiency of UnScientify's simple rule-based and pattern-matching strategy for this specific application. The results demonstrate that in scenarios where resource efficiency, interpretability, and domain-specific adaptability are critical, traditional methods can still offer significant advantages.
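As a rough illustration of two of the pipeline stages named in the abstract (span pattern matching and author reference checking), the following Python sketch flags uncertainty cues in a sentence and guesses whether the uncertainty belongs to the authors or to a cited work. The cue list, the reference heuristics, and the names `CUE_RE`, `SELF_RE`, `CITED_RE`, `Annotation` and `annotate` are all assumptions made for illustration; they are not UnScientify's actual patterns or code, and the complex sentence analysis stage is not modeled here.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical uncertainty cues, for illustration only.
CUE_PATTERNS = [
    r"\bmay\b",
    r"\bmight\b",
    r"\bpossibl[ey]\b",
    r"\bsuggests?\b",
    r"\bremains? unclear\b",
    r"\bunknown\b",
]
CUE_RE = re.compile("|".join(CUE_PATTERNS), re.IGNORECASE)

# Crude heuristics (assumed): first-person markers point to the authors,
# citation markers such as "[12]" or "(Smith et al., 2020)" point to cited work.
SELF_RE = re.compile(r"\b(?:we|our|this (?:study|paper|work))\b", re.IGNORECASE)
CITED_RE = re.compile(r"\[\d+\]|\([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}\)")


@dataclass
class Annotation:
    sentence: str
    uncertain: bool
    cues: List[str]
    reference: str  # "author", "cited", "unspecified", or "n/a"


def annotate(sentence: str) -> Annotation:
    """Label one sentence with matched cue spans and a guessed reference."""
    cues = [m.group(0).lower() for m in CUE_RE.finditer(sentence)]
    if not cues:
        reference = "n/a"          # no uncertainty cue found
    elif CITED_RE.search(sentence):
        reference = "cited"        # a citation marker is present
    elif SELF_RE.search(sentence):
        reference = "author"       # first-person marker is present
    else:
        reference = "unspecified"
    return Annotation(sentence, bool(cues), cues, reference)


if __name__ == "__main__":
    for s in [
        "Our results suggest that the effect may be small.",
        "The mechanism remains unclear (Smith et al., 2020).",
        "The sample was heated to 300 K.",
    ]:
        print(annotate(s))
```

A rule set of this kind is easy to inspect and extend, which is the interpretability advantage the abstract attributes to pattern-based methods; a real system would need far richer patterns and sentence-level analysis than this sketch provides.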


Source journal

Journal of Informetrics (Social Sciences: Library and Information Sciences)

CiteScore: 6.40

Self-citation rate: 16.20%


Journal description: Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.