TextRegress: A Python package for advanced regression analysis on long-form text data
Jinhang Jiang, Ben Liu, Weiyao Peng, Karthik Srinivasan
Software Impacts, vol. 24, Article 100760. Published 2025-05-05. DOI: 10.1016/j.simpa.2025.100760
https://www.sciencedirect.com/science/article/pii/S266596382500020X
Abstract
TextRegress is an open-source Python package that leverages state-of-the-art deep learning techniques to perform regression analysis on long-form text data. Departing from conventional text mining tools that are confined to classification, sentiment, or readability metrics, TextRegress provides a unified framework for conducting predictive modeling of continuous outcomes. By integrating advanced encoding methods – including transformer-based embeddings, TF-IDF, and pre-trained Hugging Face models – with a robust PyTorch Lightning backend, TextRegress efficiently processes long texts through automatic chunking and dynamic feature integration. Its flexible architecture and customizable training paradigms empower researchers and practitioners across diverse domains to deploy sophisticated regression models, fostering reproducibility and accelerating innovation in text analytics.
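To make the idea of chunking long texts concrete, here is a minimal sketch in plain Python. It is not the TextRegress API: the helper names `chunk_words` and `mean_pool`, the word-based windows, the overlap parameter, and mean pooling as the aggregation step are all assumptions chosen for illustration. The abstract does not specify how the package splits documents or combines chunk-level features.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into overlapping word windows (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average chunk-level feature vectors into one document-level vector."""
    if not vectors:
        raise ValueError("no vectors to pool")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Usage: chunk a document, encode each chunk, pool into one vector.
# The "encoder" is a toy stand-in (word count, character count) for
# whatever embedding model would be used in practice.
document = "a b c d e f g h"
chunks = chunk_words(document, chunk_size=5, overlap=2)
features = [[float(len(c.split())), float(len(c))] for c in chunks]
doc_vector = mean_pool(features)
```

In a real pipeline the pooled vector (or the sequence of chunk vectors) would feed a regression head that predicts the continuous outcome; the overlap keeps context that would otherwise be cut at chunk boundaries.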