Journal of Computational Science: Latest Articles

Enhancing multi-omics data classification with relative expression analysis and decision trees
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 84, Article 102460. Pub Date: 2024-11-18. DOI: 10.1016/j.jocs.2024.102460
Marcin Czajkowski, Krzysztof Jurczuk, Marek Kretowski
Abstract: This study introduces the Relative Multi-test Classification Tree (RMCT), a novel classification method tailored for multi-omics data analysis. The RMCT method combines the interpretative power of decision trees with the analytical precision of Relative eXpression Analysis (RXA) to address the complex task of examining biomedical data derived from diverse high-throughput technologies. The proposed RMCT approach discerns patterns within and across omics layers, yielding an accurate and interpretable classifier. In each internal node of RMCT, we create a multi-test, a group of Top-Scoring-Pair tests that capture the ordering relationships among features from various omics. Multi-tests are optimized for maximal reduction of Gini impurity while ensuring consistency in decision-making. We address computational challenges with advanced GPU parallelization, remarkably improving RMCT’s time performance. Through experimental validation on diverse multi-omics datasets, RMCT has demonstrated superior performance compared to traditional tree-based solutions, particularly in terms of accuracy and clarity of predictions. This method effectively reveals intricate interactions and relationships within multi-omics data, marking it as a useful addition to bioinformatics and biomedicine. This work represents a thorough extension of our preliminary research, which was initially presented at the twenty-third edition of the International Conference on Computational Science (ICCS). It expands the initial concept of integrating decision trees with RXA for multi-omics data classification, deepening the analytical methodologies, further optimizing the GPU computing, and broadening the experimental validation.
Citations: 0
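The RMCT abstract describes splits built from Top-Scoring-Pair tests and scored by Gini impurity reduction. The sketch below illustrates that idea for a single pairwise test only; it is not the authors' multi-test construction or their GPU implementation. The function names (`gini`, `tsp_test`, `gini_reduction`, `best_pair`) and the rule "feature i is smaller than feature j" are assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def tsp_test(sample, i, j):
    """Pairwise ordering test: is feature i smaller than feature j in this sample?"""
    return sample[i] < sample[j]


def gini_reduction(samples, labels, i, j):
    """Impurity reduction obtained by splitting the samples on tsp_test(., i, j)."""
    left = [y for x, y in zip(samples, labels) if tsp_test(x, i, j)]
    right = [y for x, y in zip(samples, labels) if not tsp_test(x, i, j)]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def best_pair(samples, labels):
    """Exhaustively pick the feature pair whose ordering test reduces impurity most."""
    n_features = len(samples[0])
    pairs = [(i, j) for i in range(n_features) for j in range(i + 1, n_features)]
    return max(pairs, key=lambda p: gini_reduction(samples, labels, p[0], p[1]))
```

Because the test depends only on the ordering of two features within one sample, it is unaffected by monotone rescaling of that sample, which is one reason relative expression tests are considered interpretable. The exhaustive pair search is quadratic in the number of features, which is consistent with the abstract's emphasis on parallelization.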
Identifying influential nodes in complex networks through the k-shell index and neighborhood information
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 84, Article 102473. Pub Date: 2024-11-15. DOI: 10.1016/j.jocs.2024.102473
Shima Esfandiari, Mohammad Reza Moosavi
Abstract: Identifying influential nodes is crucial in network science for controlling diseases, sharing information, and viral marketing. Current methods for finding vital spreaders have problems with accuracy, resolution, or time complexity. To address these limitations, this paper presents a hybrid approach called the Bubble Method (BM). First, the BM assumes a bubble with a radius of two surrounding each node. Then, it extracts various attributes from inside and near the surface of the bubble. These attributes are the k-shell index, k-shell diversity, and the distances of nodes within the bubble from the central node. We compared our method to 12 recent ones, including the Hybrid Global Structure model (HGSM) and Generalized Degree Decomposition (GDD), using the Susceptible–Infectious–Recovered (SIR) model to test its effectiveness. The results show the BM outperforms other methods in terms of accuracy, correctness, and resolution. Its low computational complexity renders it highly suitable for analyzing large-scale networks.
Citations: 0
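Two ingredients named in the Bubble Method abstract, the k-shell index and the set of nodes within distance two of a center, can be sketched as follows. This is a minimal illustration assuming an undirected graph stored as a dict of neighbor sets; the function names `k_shell` and `bubble` are hypothetical, and how the paper combines these attributes into a score is not reproduced here.

```python
from collections import deque


def k_shell(adj):
    """Return {node: k-shell index} by repeatedly peeling off low-degree nodes."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The current shell level never decreases during peeling.
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled via another path
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        stack.append(u)
    return shell


def bubble(adj, center, radius=2):
    """Return {node: distance} for all nodes within `radius` hops of `center` (BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the bubble surface
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist
```

For a triangle `a-b-c` with a pendant node `d` attached to `a`, the pendant node lands in shell 1 and the triangle in shell 2, while the bubble around `d` reaches every node.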
EVADyR: A new dynamic resampling algorithm for auto-tuning noisy High Performance Computing systems
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 84, Article 102468. Pub Date: 2024-11-14. DOI: 10.1016/j.jocs.2024.102468
Sophie Robert-Hayek, Soraya Zertal, Philippe Couvée
Abstract: Black-box auto-tuning methods have been proven to be efficient for tuning configurable computer hardware, including those encountered within the High Performance Computing (HPC) ecosystem. However, because of the shared nature of HPC clusters and the complexity of the software and hardware stacks, the measurement of the performance function can be tainted by noise during the tuning process, which can reduce and sometimes prevent the benefit of the tuning approach. A usual choice for performing the tuning in spite of this interference is to add a resampling step at each iteration to reduce uncertainty, but this approach can be time-consuming and must be done carefully. In this paper, we propose a new resampling and filtering algorithm called EVADyR (Efficient Value Aware Dynamic Resampling). Compared to the state of the art, it finds a better exploration versus exploitation trade-off by resampling only promising configurations and increases the level of confidence around the suggested solution as the tuning process advances. This algorithm was able to efficiently tune two I/O accelerators highly sensitive to interference, in two different scenarios. Compared to Standard Error Dynamic Resampling (SEDR), a state-of-the-art noise reduction strategy, we show that EVADyR is able to reduce the distance to the optimum by 93.5% and 24.7% for the two I/O accelerators respectively, as well as speed up the experiments by 45.8% and 58.1%, because fewer iterations are needed to reach the found optimum. Our results prove the importance of using noise reduction strategies when tuning systems running in production.
Citations: 0
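The resampling step mentioned in the EVADyR abstract can be illustrated with a generic stopping rule based on the standard error of the mean. The sketch below is neither EVADyR nor a faithful SEDR implementation; it only shows the general idea of re-measuring a noisy configuration until the estimate is tight enough. The function name `resample_until_confident` and the parameters `threshold`, `min_samples`, and `max_samples` are assumptions.

```python
import math
import statistics


def resample_until_confident(measure, threshold, min_samples=2, max_samples=30):
    """Re-measure a noisy configuration until the standard error is small enough.

    `measure` is a zero-argument callable returning one noisy performance value.
    Returns (mean of the samples, number of samples taken).
    """
    if min_samples < 2:
        raise ValueError("at least two samples are needed for a standard deviation")
    samples = [measure() for _ in range(min_samples)]
    while len(samples) < max_samples:
        standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
        if standard_error <= threshold:
            break
        samples.append(measure())
    return statistics.mean(samples), len(samples)
```

A fixed rule like this spends the same effort on every configuration. The abstract's point is that restricting resampling to promising configurations, and tightening confidence as tuning advances, can save iterations.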
Explosive synchronization in interacting star networks
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102469. Pub Date: 2024-11-14. DOI: 10.1016/j.jocs.2024.102469
Ruby Varshney, Anjuman Ara Khatun, Haider Hasan Jafri
Abstract: We study the transition to phase synchronization in an ensemble of Stuart–Landau oscillators interacting on a star network. We observe that by introducing frequency-weighted coupling and timescale variations in the dynamics of nodes, the system exhibits a first-order explosive transition to phase synchrony. Further, we extend this study to understand the nature of synchronization in the case of two coupled star networks. If the coupled star networks are identical, we observe that with increasing inter-star coupling strength, the hysteresis width initially increases, reaches a maximum value, then decreases before saturating. If the interacting star networks are non-identical, we observe that the transition to the coherent state is preceded by the occurrence of intermittent in-phase and anti-phase synchrony for small inter-star coupling. However, for large values of coupling strengths, we observe that the intermittent state disappears and the hysteresis width changes as in coupled identical star networks. We characterize these transitions by plotting the Lyapunov exponents for the system and the master stability function.
Citations: 0
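Phase synchrony of the kind discussed in this abstract is commonly quantified with the Kuramoto order parameter, r = |(1/N) Σ exp(iθ_k)|, which is 1 for identical phases and near 0 for evenly spread phases. Whether the paper uses exactly this measure is an assumption; the sketch below only shows how such a quantity is computed from a list of phases, and the function name `order_parameter` is hypothetical.

```python
import cmath


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1] for a list of phases in radians."""
    if not phases:
        raise ValueError("phases must be non-empty")
    mean_vector = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(mean_vector)
```

In studies of explosive synchronization, a quantity like r is typically tracked while the coupling strength is swept up and then down. A first-order transition shows up as an abrupt jump in r, and differing jump points in the two sweeps define the hysteresis width mentioned in the abstract.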
A new fourth-order compact finite difference method for solving Lane-Emden-Fowler type singular boundary value problems
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102474. Pub Date: 2024-11-12. DOI: 10.1016/j.jocs.2024.102474
Nirupam Sahoo, Randhir Singh, Ankur Kanaujiya, Carlo Cattani
Abstract: We develop a novel fourth-order compact finite difference scheme to solve nonlinear singular ordinary differential equations. Such problems occur in many fields of science and engineering, such as studying the equilibrium of an isothermal gas sphere, reaction–diffusion in a spherical permeable catalyst, etc. These problems are challenging to solve because of their singularity or nonlinearity. With our proposed method, we can easily solve these complex problems without removing or modifying the singularity. To construct the new fourth-order compact difference method, we first create a uniform mesh within the solution domain and develop a compact finite difference scheme. This scheme approximates the derivatives at the boundary nodal points to handle the problem’s singularity effectively. Employing a matrix analysis approach, we discuss the convergence of the method. To demonstrate its efficacy, we apply our approach to solve various real-life problems from the literature. The new method offers high-order accuracy with minimal grid points and provides better numerical results than the nonstandard finite difference method and exponential compact finite difference method.
Citations: 0
Physics-informed neural networks and higher-order high-resolution methods for resolving discontinuities and shocks: A comprehensive study
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102466. Pub Date: 2024-11-12. DOI: 10.1016/j.jocs.2024.102466
Arun Govind Neelan, G. Sai Krishna, Vinoth Paramanantham
Abstract: Addressing discontinuities in fluid flow problems is inherently difficult, especially when shocks arise due to the nonlinear nature of the flow. While handling discontinuities is a well-established practice in computational fluid dynamics (CFD), it remains a major challenge when applying physics-informed neural networks (PINNs). In this study, we compare the shock-resolving capabilities of traditional CFD methods with those of PINNs, highlighting the advantages of the latter. Our findings show that PINNs exhibit less dissipative behavior compared to conventional techniques. We evaluated the performance of both PINNs and traditional methods on linear and nonlinear test cases, demonstrating that PINNs offer superior shock-resolving properties. Notably, PINNs can accurately resolve inviscid shocks with just three grid points, whereas traditional methods require at least seven points. This suggests that PINNs are more effective at resolving shocks and discontinuities when using the same grid for both PINN and CFD simulations. However, it is important to note that PINNs, in this context, are computationally more expensive than traditional methods on a given grid.
Citations: 0
A model to address the cold-start in peer recommendation by using k-means clustering and sentence embedding
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102465. Pub Date: 2024-11-12. DOI: 10.1016/j.jocs.2024.102465
Deepika Shukla, C. Ravindranath Chowdary
Abstract: In academia, research collaboration plays a vital role in enhancing the research quality and enriching the academic profile of the authors. Recommending appropriate collaborators from a vast scholarly database, particularly for newcomers, poses a challenging cold-start problem. This study addresses a cold-start problem in peer recommendation, considering a dynamic coauthorship graph as a network structure of academic collaborators. As the coauthorship graph is quite large and complex, an efficient indexing method is essential for speeding up the initial search of similar coauthors. The study introduces an efficient Global Inverted List (GIL) for indexing research areas and active authors in the coauthorship network. An attribute-based search and filtering mechanism is proposed to identify relevant collaborators, followed by the application of k-means clustering and doc2vec metrics to rank and select top recommendations. A cold user is associated with attributes that identify coauthors with similar research interests. For each attribute of the cold user, the model searches the associated authors from the GIL. Further, two filtering approaches are applied to refine the retrieved author list. The first ensures that the authors have a significant presence in the specified research areas, whereas the second one helps avoid recommending authors with only superficial connections to the cold user. The model creates a feature matrix of filtered authors using the publication features of authors. The k-means clustering applied to the feature matrix generates k clusters, among which the model selects only those containing seed nodes for further processing. Selected clusters are ranked using doc2vec metrics, with the top-ranked cluster providing the final recommendation. The model recommends the top L members of the selected cluster, where L is the length of the recommendations provided to the new user. Our extensive experiments show the efficacy of the proposed model.
Citations: 0
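The first stage described in this abstract, looking up authors by research area through an inverted list and then filtering the candidates, can be sketched as below. This is an illustrative simplification, not the paper's GIL: the data layout, the function names `build_inverted_list` and `candidate_authors`, and the `min_matches` filter are all assumptions, and the later k-means and doc2vec ranking stages are omitted.

```python
from collections import Counter, defaultdict


def build_inverted_list(author_areas):
    """Map each research area to the set of authors active in it."""
    index = defaultdict(set)
    for author, areas in author_areas.items():
        for area in areas:
            index[area].add(author)
    return index


def candidate_authors(index, cold_user_areas, min_matches=1):
    """Return authors sharing at least `min_matches` areas with the cold user."""
    hits = Counter()
    for area in cold_user_areas:
        # .get avoids creating empty entries for unknown areas.
        for author in index.get(area, ()):
            hits[author] += 1
    return sorted(author for author, n in hits.items() if n >= min_matches)
```

Each lookup touches only the authors listed under the queried areas rather than the whole coauthorship graph, which is the usual motivation for an inverted index when the initial search must be fast.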
Quantitative assessment and dynamic characteristic measurement of regional resilience: From the perspective of post-earthquakes effects
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102461. Pub Date: 2024-11-10. DOI: 10.1016/j.jocs.2024.102461
Suyue Han, Bin Liu, Jun Shu, Zuli He, Xinyu Xia, Ke Pan, Hourui Ren
Abstract: Strong geological disasters have caused persistent losses in society, economy, and ecological environments. Given the unique geographical settings of the stricken areas, their resilience is prone to damage or even loss. Comprehensive risk assessment of natural disasters is the core content and important foundation for building regional resilience. Therefore, conducting dynamic characteristics analysis of resilience in mountainous disaster areas impacted by strong earthquake geological disasters is vital for ensuring the region's high-quality and sustainable development. This article takes the 51 areas stricken by the Wenchuan earthquake as its research object, using social, economic, and ecological environmental data collected from 2008 to 2020. First, a regional resilience assessment system based on "socio-economic-ecological environment" was established, considering the long-term and spatial heterogeneity of geological disasters. Second, the regional resilience assessment model was constructed using a combined spectral clustering, genetic algorithm, and improved entropy weight method. Third, the dynamic characteristics of regional resilience were quantitatively analyzed from two aspects: change velocity state and change rate trend. Finally, based on the regional resilience characteristics, differentiated resilience enhancement strategies were proposed. The results revealed that: (1) From a geological disaster standpoint, the risk in post-earthquake disaster areas exhibited a strikingly rapid decline, with the spatial distribution of geological disaster risk being notably higher in the central areas and diminishing towards the peripheries. (2) Overall, the regional resilience of the 51 stricken areas showed a "V-shaped" trend, with a significant upturn since 2012. (3) From the perspective of dynamic characteristics, more counties (cities) presented an upward trend. (4) The 51 stricken areas were divided into the "benchmarking type", the "declination type", the "backward type", and the "potential type". In conclusion, the current study enhances the technical framework for evaluating regional resilience and provides technical support for the construction of resilient cities.
Citations: 0
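The abstract mentions an improved entropy weight method for building the assessment model. As background, the classic (unimproved) entropy weight method assigns larger weights to indicators whose values vary more across regions. The sketch below implements only that textbook version under stated assumptions: all indicator values are positive, there are at least two regions, and the function name `entropy_weights` is hypothetical. The paper's improvements and its coupling with spectral clustering and a genetic algorithm are not reproduced.

```python
import math


def entropy_weights(matrix):
    """Classic entropy weights for a regions-by-indicators matrix of positive values."""
    n_regions = len(matrix)
    n_indicators = len(matrix[0])
    divergences = []
    for j in range(n_indicators):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [value / total for value in column]
        # Normalized Shannon entropy in [0, 1]; 1 means the indicator is uniform.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n_regions)
        divergences.append(1.0 - entropy)
    total_divergence = sum(divergences)
    if total_divergence == 0:
        # Every indicator is uniform across regions: fall back to equal weights.
        return [1.0 / n_indicators] * n_indicators
    return [d / total_divergence for d in divergences]
```

An indicator that takes the same value in every region carries no discriminating information and receives zero weight, so the composite score is driven by the indicators that actually differ between regions.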
A method for filling missing values in multivariate sequence bidirectional recurrent neural networks based on feature correlations
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102472. Pub Date: 2024-11-08. DOI: 10.1016/j.jocs.2024.102472
Xiaoying Pan, Hao Wang, Mingzhu Lei, Tong Ju, Lin Bai
Abstract: Multivariate real-life time series data often contain missing values. These missing values often affect subsequent prediction tasks. Traditional imputation methods generally consider only some of the characteristics of multivariate time series data. This can easily lead to inaccurate filling results. In this paper, a feature correlation-based bidirectional recurrent network (BRNN-FR) is proposed to solve the problem of missing values in multivariate sequence data. First, this method involves the design of a bidirectional prediction network based on time intervals and the use of forward and reverse time series information between data points to obtain the characteristics of data changes with time to the greatest extent. Second, considering the correlation between features, a combined feature selection strategy based on the Pearson correlation coefficient and mutual information is proposed, and a multiple regression model is established to predict between features. Finally, a bidirectional network ensemble filling algorithm based on the relationships between features is established to predict missing values. Comprehensive experiments on four public datasets show that the mean absolute error (MAE), root mean square error (RMSE) and maximum R2 value (R2_score) of the BRNN-FR algorithm in the direct imputation test are better than those of the other comparison methods in most cases. BRNN-FR also achieved a better area under the curve (AUC) in the indirect comparison experiment, a binary classification of in-hospital death after filling the medical dataset. In regression experiments that use the imputed AIR air quality dataset and the ETTH1 power transformer temperature dataset to predict average values over the next 3 and 6 hours, BRNN-FR obtains most of the best results.
Citations: 0
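The feature selection step in this abstract relies in part on the Pearson correlation coefficient. The sketch below shows a minimal version of that one ingredient: computing the coefficient and keeping features whose absolute correlation with a target feature exceeds a threshold. The function names `pearson` and `correlated_features`, the dict-of-columns layout, and the threshold value are assumptions; the mutual-information part of the paper's combined strategy and the recurrent network itself are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlated_features(columns, target, threshold=0.8):
    """Names of features whose |correlation| with `target` is at least `threshold`."""
    return [
        name
        for name, values in columns.items()
        if name != target and abs(pearson(values, columns[target])) >= threshold
    ]
```

Using the absolute value keeps strongly negatively correlated features, which are just as useful as positively correlated ones when one feature is regressed on others to fill a gap.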
Applications of Text Mining techniques to extract meaningful information from gastroenterology medical reports
IF 3.1, Zone 3 (Computer Science)
Journal of Computational Science, Volume 83, Article 102458. Pub Date: 2024-11-05. DOI: 10.1016/j.jocs.2024.102458
Rosarina Vallelunga, Ileana Scarpino, Maria Chiara Martinis, Francesco Luzza, Chiara Zucco
Abstract: Text mining techniques, particularly topic modeling, can be used for the automatic extraction of information from medical reports. The ability to autonomously analyze texts and identify topics within them can provide meaningful clinical insights that support physicians in diagnostic settings and enhance the characterization of intestinal diseases, leading to more efficient and automated systems.
This study evaluates the effectiveness of Latent Dirichlet Allocation (LDA) and BERTopic in modeling topics from colonoscopy reports related to Crohn’s Disease, Ulcerative Colitis, and Polyps. We compared these models in terms of their ability to identify clinically relevant topics, their influence on the performance of machine learning classifiers trained on the derived topic features, and their scalability.
Our analysis, based on average results across five iterations of train-test splits, showed that BERTopic generally outperformed LDA in clustering metrics, achieving Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Purity scores of 0.5637, 0.5953, and 0.8447, respectively, compared to LDA’s scores of 0.5349, 0.5254, and 0.8149. Additionally, classifiers trained on BERTopic-derived features exhibited improved predictive accuracy and F1-scores, with Logistic Regression reaching a mean accuracy of 0.8464 and a mean F1-score of 0.8507, compared to 0.8319 and 0.8351 for LDA-based features. Despite BERTopic’s overall superior performance, LDA demonstrated greater stability and interpretability, making it a viable option in scenarios where computational efficiency is a priority.
Citations: 0
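Of the three clustering metrics reported in this abstract, Purity is the simplest to compute by hand: each cluster is credited with its most frequent true label, and the credited counts are divided by the total number of documents. The sketch below follows that standard definition; the function name `purity` and the example labels are illustrative and not taken from the paper's data.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_ids):
    """Fraction of items that carry the majority true label of their cluster."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    label_counts = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        label_counts[cluster][label] += 1
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)
```

Purity rises trivially as the number of clusters grows (one cluster per document gives 1.0), which is one reason to report it alongside chance-corrected or normalized metrics such as ARI and NMI, as the abstract does.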