语料库感知:高级文本和话语探索的综合工具

IF 2.1
Antonio Moreno-Ortiz
{"title":"语料库感知:高级文本和话语探索的综合工具","authors":"Antonio Moreno-Ortiz","doi":"10.1016/j.acorp.2025.100145","DOIUrl":null,"url":null,"abstract":"<div><div><em>Corpus Sense</em> is a web application with a focus on content and discourse analysis designed to facilitate the exploration, analysis and visualization of linguistic corpora that incorporates some advanced functionalities not available in existing software. The tool enables users to obtain useful insights with minimal effort by combining quantitative, qualitative and AI-powered features. It is designed for small to medium-sized corpora (currently up to 2.5 million tokens), permits online corpus sharing, and offers unique functionalities, such as NLP-based keyword extraction, named entity recognition, semantic search and advanced topic modelling with LLM-generated interpretable labels. The application’s interface is simple and intuitive, in an effort to make it accessible to a wide range of user profiles. This paper provides a comprehensive overview of the application’s development, architecture and applications in corpus linguistics and discourse analysis research. This description is complemented by a discussion of the integration of novel NLP-based and AI-assisted tools with traditional corpus analysis methods.</div></div>","PeriodicalId":72254,"journal":{"name":"Applied Corpus Linguistics","volume":"5 3","pages":"Article 100145"},"PeriodicalIF":2.1000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Corpus sense: A comprehensive tool for advanced text and discourse exploration\",\"authors\":\"Antonio Moreno-Ortiz\",\"doi\":\"10.1016/j.acorp.2025.100145\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div><em>Corpus Sense</em> is a web application with a focus on content and discourse analysis designed to facilitate the exploration, analysis and visualization of linguistic corpora that incorporates some advanced functionalities not available in existing software. The tool enables users to obtain useful insights with minimal effort by combining quantitative, qualitative and AI-powered features. It is designed for small to medium-sized corpora (currently up to 2.5 million tokens), permits online corpus sharing, and offers unique functionalities, such as NLP-based keyword extraction, named entity recognition, semantic search and advanced topic modelling with LLM-generated interpretable labels. The application’s interface is simple and intuitive, in an effort to make it accessible to a wide range of user profiles. This paper provides a comprehensive overview of the application’s development, architecture and applications in corpus linguistics and discourse analysis research. This description is complemented by a discussion of the integration of novel NLP-based and AI-assisted tools with traditional corpus analysis methods.</div></div>\",\"PeriodicalId\":72254,\"journal\":{\"name\":\"Applied Corpus Linguistics\",\"volume\":\"5 3\",\"pages\":\"Article 100145\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Corpus Linguistics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666799125000280\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Corpus Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666799125000280","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

语料库感知是一个专注于内容和话语分析的web应用程序,旨在促进语言语料库的探索,分析和可视化,其中包含一些现有软件无法提供的高级功能。该工具通过结合定量、定性和人工智能功能,使用户能够以最小的努力获得有用的见解。它专为中小型语料库(目前多达250万个令牌)而设计,允许在线语料库共享,并提供独特的功能,例如基于nlp的关键字提取,命名实体识别,语义搜索和高级主题建模与llm生成的可解释标签。该应用程序的界面简单直观,旨在使其能够被广泛的用户配置文件访问。本文对该应用程序的开发、结构及其在语料库语言学和语篇分析研究中的应用进行了综述。本文还讨论了基于nlp和ai辅助的新型工具与传统语料库分析方法的集成。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Corpus sense: A comprehensive tool for advanced text and discourse exploration
Corpus Sense is a web application with a focus on content and discourse analysis designed to facilitate the exploration, analysis and visualization of linguistic corpora that incorporates some advanced functionalities not available in existing software. The tool enables users to obtain useful insights with minimal effort by combining quantitative, qualitative and AI-powered features. It is designed for small to medium-sized corpora (currently up to 2.5 million tokens), permits online corpus sharing, and offers unique functionalities, such as NLP-based keyword extraction, named entity recognition, semantic search and advanced topic modelling with LLM-generated interpretable labels. The application’s interface is simple and intuitive, in an effort to make it accessible to a wide range of user profiles. This paper provides a comprehensive overview of the application’s development, architecture and applications in corpus linguistics and discourse analysis research. This description is complemented by a discussion of the integration of novel NLP-based and AI-assisted tools with traditional corpus analysis methods.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Applied Corpus Linguistics
Applied Corpus Linguistics Linguistics and Language
CiteScore
1.30
自引率
0.00%
发文量
0
审稿时长
70 days
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信