Big Data Research, Pub Date: 2025-05-23, DOI: 10.1016/j.bdr.2025.100537
Alessandro Magrini
{"title":"Bankruptcy risk prediction: A new approach based on compositional analysis of financial statements","authors":"Alessandro Magrini","doi":"10.1016/j.bdr.2025.100537","DOIUrl":"10.1016/j.bdr.2025.100537","url":null,"abstract":"<div><div>The development of models for bankruptcy risk prediction has gained much attention in recent years due to the great availability of financial statement data. Most existing predictive models rely on financial ratios, which are performance-based measures expressing the relative magnitude of two accounting items. Despite the popularity of financial ratios, their use is notoriously accompanied by serious practical drawbacks, like the occurrence of outliers and redundancy, making data preprocessing necessary to avoid computational problems and obtain a good predictive accuracy. Isometric log ratios can potentially overcome these problems because they are designed to represent compositional data efficiently and have a logarithmic form that limits the occurrence of outliers. However, although they are not novel in the analysis of financial statements, no study has ever employed them to predict bankruptcy. In this article, we show the effectiveness of isometric log ratios to detect bankruptcy events in a sample of 138,720 Italian firms (127,420 active and 11,300 bankrupted) belonging to different industries and with different size and age. For this purpose, we use logistic regression with adaptive LASSO regularization and random forests to construct several predictive models featuring either financial ratios or isometric log ratios, and combining different horizons and lag structures. The results show that a set of 8 isometric log ratios provides, without preprocessing, almost the same predictive accuracy as a selection of 16 financial ratios that requires dropping 3.6% of the data. 
Also, the adaptive LASSO regularization reveals that redundancy for isometric log ratios is always below 20%, and in some cases near 0%, while it ranges from 12.5% to 46.9% for financial ratios. The predictive accuracy of models based on logistic regression is in line with and even higher than the one reported by recent studies, and random forests achieve a gain in the area under the Receiver Operating Characteristic (ROC) curve ranging between two and three percentage points.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100537"},"PeriodicalIF":3.5,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144138321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research, Pub Date: 2025-05-23, DOI: 10.1016/j.bdr.2025.100543
Chiara Ardito, Roberto Leombruni, Giuseppe Costa, Angelo d’Errico
{"title":"Mortality and risk of cardiovascular diseases by age at retirement in three Italian cohorts","authors":"Chiara Ardito , Roberto Leombruni , Giuseppe Costa , Angelo d’Errico","doi":"10.1016/j.bdr.2025.100543","DOIUrl":"10.1016/j.bdr.2025.100543","url":null,"abstract":"<div><div>The relationship between age at retirement and subsequent physical health still appears contradictory in the literature, with more recent studies suggesting possible adverse health effects linked to employment at later ages. The aim of this study was to assess the long-term risk of overall mortality and incidence of cardiovascular diseases (CVDs) associated with age at retirement in three large Italian cohorts using both survey and administrative data.</div><div>The risk of mortality and CVDs associated with age at retirement, treated as a continuous variable, was assessed separately by gender using age-adjusted Cox models, further controlled for chronic morbidity, education, socioeconomic and previous working characteristics. In another analysis, age at retirement was treated as a dichotomous variable: in a set of analyses with age at retirement from 52 to 65 years, the incidence of the health outcomes among subjects who retired after a certain age was compared with that among those who retired up to that age.</div><div>Higher age at retirement was associated with significantly higher mortality among men in the three cohorts, while among women the association was not significant, although in the same direction as for men. The risk of CVDs was also significantly associated with higher age at retirement in all the datasets among men, and in two of them among women. The set of analyses on dichotomized age at retirement confirmed the results based on continuous age at retirement for both genders. 
Several robustness analyses, including an instrumental variable (IV) Poisson approach, confirm the validity of results for men, whereas female results were less stable and robust.</div><div>Policy makers should be aware of the risk for public health of policies that increase retirement age.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100543"},"PeriodicalIF":3.5,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144205025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research, Pub Date: 2025-05-19, DOI: 10.1016/j.bdr.2025.100540
Xiaoyu Zhang, Ye Pan, Lilan Tu
{"title":"The influence of China's exchange rate market on the Belt and Road trade market: Based on temporal two-layer networks","authors":"Xiaoyu Zhang , Ye Pan , Lilan Tu","doi":"10.1016/j.bdr.2025.100540","DOIUrl":"10.1016/j.bdr.2025.100540","url":null,"abstract":"<div><div>From 2010 to 2023, this research utilizes daily closing exchange rate data for countries participating in the Belt and Road Initiative (BRI) as well as China’s import and export volumes with these countries. Taking the renminbi (RMB) as the base currency and the other BRI currencies as quote currencies, we employ the Autoregressive Distributed Lag (ARDL) model to propose an algorithm for constructing a temporal two-layer network, resulting in the exchange-rate-trade network composed of 14 subnetworks. Through an analysis of the network’s topological structure, we observe that 2013 marks a significant turning point, after which the network transitions from a decentralized to a more centralized form. To assess the annual impact of China’s exchange rate and trade from 2010 to 2023, we introduce a comprehensive index for identifying key nodes within the network. 
Our findings based on this index indicate that: (1) Lebanon, Kyrgyzstan, and other diverse countries and regions emerge as key nodes, demonstrating China’s close economic ties with these countries and reflecting the substantial influence of RMB internationalization; and (2) compared with other years, China’s exchange rate market exerts notably stronger influence on the trade market in 2018, 2021, 2022, and 2023.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100540"},"PeriodicalIF":3.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144147764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multiple-group hidden Markov model for multi-source data. Cross-country differences in employment mobility in the presence of measurement error","authors":"Roberta Varriale , Mauricio Garnier-Villarreal , Dimitris Pavlopoulos , Danila Filipponi","doi":"10.1016/j.bdr.2025.100527","DOIUrl":"10.1016/j.bdr.2025.100527","url":null,"abstract":"<div><div>In this paper, we develop a multigroup hidden Markov model to tackle the issue of measurement error in multi-source data from different countries. We focus, in particular, on the measurement of employment mobility in the Netherlands and Italy using linked data from the Labour Force Survey and administrative sources. The measurement-error correction we apply reconciles differences between data sources and shows that cross-country differences in employment mobility are smaller than originally thought. Error-corrected estimates indicate that mobility from temporary to permanent employment has become, over time, larger in Italy than in the Netherlands, while mobility from non-employment to temporary employment has steadily been higher in the Netherlands than in Italy.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100527"},"PeriodicalIF":3.5,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144116532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research, Pub Date: 2025-05-17, DOI: 10.1016/j.bdr.2025.100535
Yunting Liu, Yirong Huang
{"title":"A multimodal deep learning framework for constructing a market sentiment index from stock news","authors":"Yunting Liu, Yirong Huang","doi":"10.1016/j.bdr.2025.100535","DOIUrl":"10.1016/j.bdr.2025.100535","url":null,"abstract":"<div><div>Unimodal sentiment analysis often fails to capture the complexity of financial sentiment. This paper proposes a multimodal deep learning framework that integrates text, audio, and image data from CCTV news videos on TikTok to construct a multimodal sentiment indicator for the Chinese stock market. Empirical results show that multimodal fusion enhances sentiment analysis, with text outperforming audio and image modalities. The indicator correlates weakly with stock returns but significantly with market volatility, aligns with seasonal sentiment patterns, and reflects significant events like COVID-19. Additionally, weekly sentiment trends indicate the lowest sentiment on Thursdays and the highest on Fridays. This study advances financial sentiment analysis by demonstrating the efficacy of multimodal indicators in capturing market sentiment and informing volatility forecasts.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100535"},"PeriodicalIF":3.5,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144147766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The narrative on tourism sustainability in Italian news: A text mining approach","authors":"Carla Galluccio , Paola Beccherle , Alessandra Petrucci","doi":"10.1016/j.bdr.2025.100541","DOIUrl":"10.1016/j.bdr.2025.100541","url":null,"abstract":"<div><div>Tourism sustainability is a complex and multidimensional construct, for which there is no shared definition in the literature. Consequently, there is no standard method for its measurement, and the adoption of sustainable practices often falls short of reached goals. Therefore, contributing to the definition of the concept of sustainable tourism is essential, both for policymakers and academics. In this vein, news media data can represent a key element through which to understand the debate about tourism sustainability. This research aims to exploit the potential of news texts to explore how sustainable tourism is conceived within specific cultural contexts. Focusing on the case study of Italy, we analysed how the concept of tourism sustainability is represented in Italian newspapers, extracting the topics discussed in relation to this theme. From a methodological point of view, we employed a network-based approach for topic extraction. 
Our study contributes to the literature on tourism sustainability by proposing an innovative method for extracting information from unstructured data sources, such as textual data, providing policymakers with insights about the narrative around this topic.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100541"},"PeriodicalIF":3.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research, Pub Date: 2025-05-16, DOI: 10.1016/j.bdr.2025.100533
Alessio Bumbea, Andrea Mazzitelli, Giuseppe Espa, Alessandro Rinaldi
{"title":"Bipartite graph partitioning and spatial bootstrapping methods: A case study of innovative startups","authors":"Alessio Bumbea , Andrea Mazzitelli , Giuseppe Espa , Alessandro Rinaldi","doi":"10.1016/j.bdr.2025.100533","DOIUrl":"10.1016/j.bdr.2025.100533","url":null,"abstract":"<div><div>Innovative startups are the source of innovation and technological development; therefore, understanding their behavior can help better recognize the business organization's direction. This paper introduces a new method for clustering innovative startups using bipartite graph partitioning combined with spatial bootstrapping, improving clusters' accuracy and interpretability. Recent advancements in clustering techniques have introduced ensemble or consensus clustering methods, which aim to merge multiple clustering results into a superior outcome. A key challenge in this field is effectively integrating diverse clusters, and one promising solution involves utilizing graph formalism and partitioning strategies. By leveraging advanced graph partitioning techniques, we transform the task of partitioning the ensemble graph into a community detection problem. Our methodological approach improves the traditional method of bipartite graphs used in cluster ensembles by implementing the state of the art biLouvain algorithm. We also focused on techniques that could be used to increase the interpretability of the clusters themselves and how they can be used to obtain insightful information from the data. 
The proposed methodology was applied to a dataset of technologically advanced new businesses, located in the Lombardy region and recorded as innovative startups in the special section of the Italian Chambers of Commerce's Business Register.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100533"},"PeriodicalIF":3.5,"publicationDate":"2025-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144090526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research, Pub Date: 2025-05-15, DOI: 10.1016/j.bdr.2025.100542
Domenica Fioredistella Iezzi, Roberto Monte
{"title":"E-word of mouth in sales volume forecasting: Toyota Camry case study","authors":"Domenica Fioredistella Iezzi , Roberto Monte","doi":"10.1016/j.bdr.2025.100542","DOIUrl":"10.1016/j.bdr.2025.100542","url":null,"abstract":"<div><div>In recent years, electronic word of mouth has become a significant factor in purchasing decisions, with consumers' sentiments playing a crucial role in shaping the sales of products and services.</div><div>This paper introduces a novel approach to sales forecasting that addresses consumers' sentiments toward goods or services by combining the sales volume time series with a quantitative proxy of the unobservable true sentiment. Numerous studies have explored various methods to capture sentiment and accurately predict sales. We have integrated an estimated sentiment signal, variously built via lexicon-based, machine-learning, and deep-learning approaches, into a multivariate autoregressive state space (MARSS) model. We have tested our model on a dataset of 163,000 tweets about the Toyota Camry, covering the period from June 2009 to December 2022 and sales volumes in the US market over the same timeframe.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100542"},"PeriodicalIF":3.5,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144106944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research, Pub Date: 2025-05-15, DOI: 10.1016/j.bdr.2025.100539
Cristian Usala, Isabella Sulis, Mariano Porcu
{"title":"Exploring the impact of high schools, socioeconomic factors, and degree programs on higher education success in Italy","authors":"Cristian Usala, Isabella Sulis, Mariano Porcu","doi":"10.1016/j.bdr.2025.100539","DOIUrl":"10.1016/j.bdr.2025.100539","url":null,"abstract":"<div><div>This study investigates the determinants of tertiary education success in Italy, focusing on students' outcomes between the first and second years. We use population data of students enrolled between 2015 and 2019, integrating information on high school environments and degree program characteristics. This rich dataset has been exploited with a two-step approach: the first step defines indicators for high school quality and degree program difficulty; the second estimates a multinomial logit to assess the determinants of students' probability of being classified as regulars, churners, at risk of dropout, and dropouts. Data regarding the 2019 cohort have been further investigated by exploiting the additional information on students' socioeconomic backgrounds and schools' self-assessed effectiveness evaluations. Results indicate that students' high school backgrounds, socioeconomic conditions, and post-graduation prospects in terms of net wages and occupation rates of graduates in the chosen degree program significantly influence academic success and students' academic persistence. 
Overall, the results offer a comprehensive view of the determinants of university success, with specific patterns observed across the different student categories.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100539"},"PeriodicalIF":3.5,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144134568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Business digitalization in Italy: A comprehensive analysis using supplementary fuzzy set approach","authors":"Ilaria Benedetti, Federico Crescenzi, Tiziana Laureti, Niccolò Salvini","doi":"10.1016/j.bdr.2025.100538","DOIUrl":"10.1016/j.bdr.2025.100538","url":null,"abstract":"<div><div>In an era where digital technologies such as AI, cloud computing and IoT are reshaping global business dynamics, the digital transformation of enterprises has become a pivotal factor for maintaining competitive advantage. This paper provides an in-depth analysis of the digitalization process among Italian firms, leveraging data from the ISTAT ICT survey. Using a fuzzy set approach, we develop a refined index to measure technological deprivation across multiple dimensions, providing a detailed understanding of how digitalization is adopted at the firm level. The results indicate a moderate level of technological development among firms. The dimension related to online sales emerges as the most underdeveloped, highlighting it as a critical area for improvement for Italian companies and underscoring the need for targeted policy interventions to bridge these digital gaps. Moreover, the analysis reveals significant disparities across sectors, geographic areas, and firm sizes, with smaller enterprises and those in certain regions exhibiting lower levels of digital adoption. 
Our study underscores the utility of the fuzzy set methodology for analyzing high-dimensional big data and provides actionable insights for enhancing digital adoption among firms in Italy.</div></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"41 ","pages":"Article 100538"},"PeriodicalIF":3.5,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144106841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}