{"title":"Irregular sleep pattern identification and analysis from social media dataset using hybrid deep learning based attention mechanism","authors":"Mohammed Jawwadul Islam , Mohammad Fahad Al Rafi , Pranto Podder , Aysha Siddika , Moumy Kabir , Euna Mehnaz Khan , Najmul Islam , Saddam Mukta","doi":"10.1016/j.dim.2025.100104","DOIUrl":"10.1016/j.dim.2025.100104","url":null,"abstract":"<div><div>Following an irregular bedtime routine and having different amounts of sleep each night might increase a person's risk of obesity, cardiovascular problems, high blood pressure, insulin levels, and other metabolic problems. Meanwhile, social media platforms have gained popularity among users for sharing their interests, thoughts, and opinions. Researchers have been able to mine the text data generated on these platforms to investigate and understand users' behaviors and habits. In this paper, we examine a total of 2,468,697 tweets to identify users' irregular sleeping patterns (ISP) using psycholinguistic and contextual features from their tweets. We conduct a linguistic analysis to understand the factors influencing users' psychological behavior and word use patterns, and find a correlation with their irregular sleeping patterns. We observe that users who have irregular sleeping patterns largely use words from the anger, anxiety, death, and future categories in their tweets. In contrast, users with regular sleeping patterns tend to use words from the positive emotions, family, and other categories in their tweets. Building upon our findings, we develop a hybrid prediction model that predicts users' irregular sleeping patterns from psycholinguistic features with an accuracy of 91%. 
We examine the application of social media data for the early identification of irregular sleep patterns and their related mental and psychological concerns while investigating design prospects for future health technologies to enhance the monitoring and support of healthy sleep behavior.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 1","pages":"Article 100104"},"PeriodicalIF":0.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145529210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aspect-based sentiment evolution and its correlation with review rounds in multi-round peer reviews: A deep learning approach","authors":"Ruxue Han , Haomin Zhou , Jiangtao Zhong , Chengzhi Zhang","doi":"10.1016/j.dim.2025.100105","DOIUrl":"10.1016/j.dim.2025.100105","url":null,"abstract":"<div><div>Mining sentiment information from the textual content of peer review comments offers valuable insights into the scientific evaluation process. However, previous studies are often constrained by coarse-grained analysis and the lack of differentiation across review rounds. Notably, the dynamic shifts in reviewers' focus and sentiment tendencies throughout multiple review stages remain underexplored. To address this gap, the present study investigates the distribution and evolution of aspect-level sentiments and examines their correlation with the number of review rounds. We begin by segmenting the multi-round review comments of 11,063 accepted papers from Nature Communications and identifying fine-grained review aspect clusters. A manually annotated corpus of approximately 5000 review sentences is then constructed. Using this dataset, we train a series of deep learning-based aspect sentiment classification models. Among them, the LCF-BERT-CDM model achieves the best performance, with a Macro-F<sub>1</sub> score of 82.65 %. Subsequent statistical analysis reveals a consistent trend: as the number of review rounds increases, the proportion of positive sentiments rises, while negative sentiments decline. Correlation analysis further indicates that aspect sentiment scores are negatively associated with the total number of review rounds. 
Key aspects exhibiting stronger correlations include “experiments”, “research significance” and “result analysis”.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 1","pages":"Article 100105"},"PeriodicalIF":0.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145529209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The influence of multidisciplinary mega-journals on the Journal Impact Factor: Discipline, country/region, category, JIF quartile, and journal","authors":"Jing Li , Dengsheng Wu , Xinxin Chen","doi":"10.1016/j.dim.2025.100112","DOIUrl":"10.1016/j.dim.2025.100112","url":null,"abstract":"<div><div>This study examines the significant impact of mega-journals (MJs) on the scholarly evaluation system, particularly their citation contributions and the increased impact factor scores of influenced entities. By analyzing seven MJs across multiple dimensions, including discipline, country/region, Web of Science (WoS) category, and JIF quartile, we used the Generalized Impact Factor (GIF) as a proxy for the Journal Impact Factor (JIF) and developed the Contribution to Impact Factor (CIF) metric to quantify MJs' contributions. Our findings indicate that MJs can increase the GIF of scholarly entities by up to 7.79 %, even in countries/regions with millions of citations. Notably, Computer Science, Information Systems received 14.21 % of its citations from MJs, the highest among 241 fields studied. These results highlight the potential for MJs to inflate citation-based metrics, posing challenges to the academic evaluation system. We recommend adaptive strategies to mitigate JIF inflation or the development of alternative metrics to ensure a fair evaluation system.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 1","pages":"Article 100112"},"PeriodicalIF":0.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145476083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How online health information exposure and fear of missing out drive Cyberchondria? The dual-stimulus effect","authors":"Lin Li , Wei Chen , Gaohui Cao","doi":"10.1016/j.dim.2025.100111","DOIUrl":"10.1016/j.dim.2025.100111","url":null,"abstract":"<div><div>The rise of cyberchondria has become a troubling side effect of the digital age, drawing concern due to its negative psychological impact. This study investigates the link between excessive social media use for health information and the development of cyberchondria. The research focuses on environmental and emotional stimuli, with social media communication overload as the organism, to examine the mechanism of cyberchondria. The findings suggest that increasing engagement with online health resources is associated with a reduction in information and communication overload. Conversely, heightened levels of fear of missing out can exacerbate these overloads. As information and communication overload escalate, so does cyberchondria. The significance of our findings lies in our expansion of the stimulus-organism-response (SOR) model through the assessment of these factors in relation to the development of cyberchondria.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 1","pages":"Article 100111"},"PeriodicalIF":0.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145476084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolution analysis of technological topics value potential and diffusion ability based on a three-tier network","authors":"Yuepeng Li, Ziming Zeng, Qingqing Li, Shouqiang Sun, Yu Liu","doi":"10.1016/j.dim.2025.100109","DOIUrl":"10.1016/j.dim.2025.100109","url":null,"abstract":"<div><div>To address the insufficient attention to the technological value potential and diffusion ability of topics in current evolution analysis, this study employs patent data from 2014 to 2023 in the domains of speech and image recognition. A tripartite \"keywords-topics-documents\" network is constructed using the BERTopic model for evaluation analysis. The evolution patterns of technological value potential and diffusion ability are investigated through the analysis of keyword associations and patent literature related to technical topics. By examining the evolution trajectories of technical topics and integrating value potential and diffusion ability analyses—based on keyword weights calculated using TextRank and patent citation frequencies—this research reveals a trend of cross-fusion in speech and image recognition topics. This trend is characterized by the incorporation of deep learning and multimodal recognition technologies. The value potential of technological topics exhibits an initial decline followed by a subsequent rise, while the diffusion ability demonstrates a continuous downward trend. 
This study provides intellectual support for technological forecasting and patent analytics.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 1","pages":"Article 100109"},"PeriodicalIF":0.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145529206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining personal data Sovereignty: An ontologically-based framework facilitating subject privacy control","authors":"Vijon Baraku , Edon Ramadani , Iraklis Paraskakis , Simeon Veloudis , Poonam Yadav","doi":"10.1016/j.dim.2025.100108","DOIUrl":"10.1016/j.dim.2025.100108","url":null,"abstract":"<div><div>This paper presents the implementation and evaluation of the Data Capsule framework, a novel approach for achieving personal data sovereignty. Our framework uses formal knowledge representation both to understand the context of personal data collection across heterogeneous systems and to define comprehensive usage policies, from access control to monetisation opportunities. As organisations increasingly collect and process personal data, individuals continue to lack effective mechanisms to control how their information is processed and/or shared across heterogeneous systems. We tackle this problem with two key contributions: (1) an ontology-based federation system that allows for seamless federation of personal data across databases using schema.org as a semantic foundation, and (2) a semantically driven dynamic usage control mechanism that allows individuals to define and enforce granular access rules. 
Our implementation demonstrates that effective personal data sovereignty can be achieved and serves as a foundation for future systems contributing to the empowerment of individuals in the digital economy.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"10 1","pages":"Article 100108"},"PeriodicalIF":0.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145529208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised topic labeling and opportunity model of social media data for enhancing automotive product design processes","authors":"Broto Widya Hartanto, Subagyo, I.G.B. Budi Dharma","doi":"10.1016/j.dim.2025.100103","DOIUrl":"10.1016/j.dim.2025.100103","url":null,"abstract":"<div><div>This study introduces a hybrid method combining topic modeling and an opportunity model to generate novel ideas for automobile product design improvements. Furthermore, it proposes a novel unsupervised topic labeling procedure to address the limitations in current topic modeling interpretations, which are often not fully unsupervised. The procedure comprised automatic generation of labels that directly support opportunity modeling and facilitate product design development. To achieve the stated objectives, data was collected from user comments on YouTube car reviews and analyzed using various algorithms and part-of-speech rules, finding that Non-Negative Matrix Factorization with noun-adjective combinations proved most effective in generating comprehensible topic labels and capturing emotional expressions. The results revealed six underserved labels, one served right, and two overserved categories for new vehicle design improvements, providing valuable insights into user experiences. 
The insights provided in this context are expected to contribute to the potential improvement of vehicle attribute designs, thereby enhancing the efficiency of the entire design process.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"9 4","pages":"Article 100103"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating explainability in language classification models: A unified framework incorporating feature attribution methods and key factors affecting faithfulness","authors":"Tahereh Dehdarirad","doi":"10.1016/j.dim.2025.100101","DOIUrl":"10.1016/j.dim.2025.100101","url":null,"abstract":"<div><div>This paper presents a unified framework for evaluating explainability methods in language classification models, integrating feature attribution and interaction approaches while considering key factors impacting faithfulness: model architecture, dataset characteristics, and evidence type. The faithfulness of SHAP, LIME, and Integrated Gradients (IG) was examined across positive, negative, and all evidence types by comparing classical models (Logistic Regression and Random Forest) with transformer models (RoBERTa and DistilBERT). In classical models, SHAP and LIME generally provide faithful explanations for positive and all evidence types, with SHAP and Random Forest best handling negative evidence.</div><div>For transformer models, the faithfulness of LIME and SHAP varies by model, dataset, and evidence type. LIME performs consistently well in complex models like RoBERTa and DistilBERT, while SHAP excels with positive evidence across datasets and is most effective in RoBERTa for negative evidence. For all evidence, SHAP also shows broader applicability across evidence types, whereas LIME is suited to specific datasets, such as Brexit, especially in RoBERTa and DistilBERT. For longer texts, IG and SHAP outperform LIME, with SHAP excelling in complex architectures like RoBERTa. 
When using IG, RoBERTa provides slightly more faithful explanations than DistilBERT for positive evidence, though only DistilBERT aligns with expected trends for negative evidence.</div><div>Feature interaction analyses using Shapley Taylor Interaction (STI) and Archipelago reveal that RoBERTa consistently provides more cohesive explanations than DistilBERT across datasets, especially with Archipelago. STI-based models produce more interpretable, human-relevant phrases, achieving higher relevance ratings, especially when evaluated within full text contexts.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"9 4","pages":"Article 100101"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepFake video detection: Insights into model generalisation — A Systematic review","authors":"Ramcharan Ramanaharan, Deepani B. Guruge, Johnson I. Agbinya","doi":"10.1016/j.dim.2025.100099","DOIUrl":"10.1016/j.dim.2025.100099","url":null,"abstract":"<div><div>Deep learning generative models have progressed to a stage where distinguishing fake images and videos has become difficult, posing risks to personal integrity, potentially leading to social instability, and disrupting government functioning. Existing reviews have mainly focused on the approaches used to detect DeepFakes and the datasets used for those approaches. However, challenges persist when attempting to generalise detection techniques to previously unseen datasets. The purpose of this systematic review is to explore state-of-the-art frameworks for DeepFake detection and provide readers with an understanding of the strengths and weaknesses of current approaches, as well as the generalisability of existing detection techniques. The study indicates that generalising DeepFake detection remains a challenge that requires further research. Moreover, 46.3% of the selected publications agreed that DeepFake detection techniques could be generalised to identify various types of DeepFakes. A key limitation in achieving generalisation is the tendency of models to overfit to available datasets, reducing their effectiveness in adapting to new or unseen types of DeepFakes. This review emphasises the need for the development of extensive and diverse datasets that more accurately reflect the wide range of DeepFake manipulations encountered in real-world applications. 
Lastly, the paper explores potential advancements that could pave the way to the next generation of solutions against DeepFakes.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"9 4","pages":"Article 100099"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145468770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving structural learning in Bayesian networks: Stationarity analysis for algorithm choice","authors":"German Cuaya-Simbro, Manolo Tellez Meneses, Elías Ruiz Hernández","doi":"10.1016/j.dim.2025.100097","DOIUrl":"10.1016/j.dim.2025.100097","url":null,"abstract":"<div><div>Structural learning in Bayesian networks is crucial for accurate modeling of complex systems. However, the performance of structural learning algorithms is significantly influenced by data characteristics. This study investigates the impact of data stationarity on the performance of structural learning algorithms and proposes a measure for selecting the most appropriate algorithm based on stationarity analysis. We compared the performance of various algorithms on both stationary and non-stationary datasets, using the KPSS test to assess stationarity. Our findings indicate that Max-Min Hill Climbing (MMHC) is particularly effective for stationary data, while Hill Climbing performs better for non-stationary data. These results highlight the importance of tailoring algorithm selection to data characteristics and provide practical guidelines for researchers and practitioners. Future research could explore the development of more adaptive algorithms and delve deeper into the relationship between data stationarity and algorithm performance.</div></div>","PeriodicalId":72769,"journal":{"name":"Data and information management","volume":"9 4","pages":"Article 100097"},"PeriodicalIF":0.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145468768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}