Title: Corpus Sense: A comprehensive tool for advanced text and discourse exploration
Author: Antonio Moreno-Ortiz
Journal: Applied Corpus Linguistics 5(3), Article 100145
Published: 2025-08-13
DOI: 10.1016/j.acorp.2025.100145
URL: https://www.sciencedirect.com/science/article/pii/S2666799125000280
Citations: 0
Abstract
Corpus Sense is a web application for content and discourse analysis, designed to facilitate the exploration, analysis and visualization of linguistic corpora. It incorporates some advanced functionalities not available in existing software. The tool enables users to obtain useful insights with minimal effort by combining quantitative, qualitative and AI-powered features. It is designed for small to medium-sized corpora (currently up to 2.5 million tokens), permits online corpus sharing, and offers unique functionalities, such as NLP-based keyword extraction, named entity recognition, semantic search and advanced topic modelling with LLM-generated interpretable labels. The application’s interface is simple and intuitive, to make it accessible to a wide range of user profiles. This paper provides a comprehensive overview of the application’s development, architecture and applications in corpus linguistics and discourse analysis research. This overview is complemented by a discussion of how novel NLP-based and AI-assisted tools can be integrated with traditional corpus analysis methods.