Journal of Big Data: Latest Articles

A proposed hybrid framework to improve the accuracy of customer churn prediction in telecom industry
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-09, DOI: 10.1186/s40537-024-00922-9
Shimaa Ouf, Kholoud T. Mahmoud, Manal A. Abdel-Fattah
Abstract: In the telecom sector, predicting customer churn has increased in importance in recent years. Developing a robust and accurate churn prediction model takes time, but it is crucial. Early churn prediction avoids revenue loss and improves customer retention. Telecom companies must therefore identify customers who are likely to churn before they leave. Researchers have applied a variety of machine-learning approaches to reveal the hidden relationships between different features. A key aspect of churn prediction is the accuracy level, which affects the learning model's performance. This study aims to clarify several aspects of customer churn prediction accuracy and investigate the performance of state-of-the-art techniques. However, no previous research has investigated performance using a hybrid framework that combines the advantages of suitable data preprocessing, ensemble learning, and resampling techniques. The study introduces a hybrid framework that improves the accuracy of customer churn prediction in the telecom industry. The framework integrates the XGBoost classifier with the hybrid resampling method SMOTE-ENN and applies effective data preprocessing techniques. The proposed framework is used in two experiments with three telecom datasets. This study determines which features most strongly influence customer churn, introduces the impact of data balancing, compares the classifiers' performance before and after data balancing, and examines a speed-accuracy trade-off in hybrid classifiers. Several metrics, including accuracy, precision, recall, F1-score, and the ROC curve, are used to analyze the results. All evaluation criteria are used to identify the most effective experiment. In terms of accuracy, the hybrid framework applied to balanced data outperformed the classifier applied alone to imbalanced data. In addition, the results of the proposed hybrid framework are compared with previous studies on the same datasets; on all three datasets, the proposed framework outperformed these recent works.
Citations: 0
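The abstract above names accuracy, precision, recall, and F1-score as evaluation metrics and contrasts balanced with imbalanced data. The following is a minimal sketch of how those four metrics are computed for binary labels. The function name `churn_metrics` and the toy labels are assumptions made for illustration; they are not taken from the paper or its datasets.

```python
def churn_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = churn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-churners correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed churners

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up labels: 8 non-churners and 2 churners (an imbalanced set).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_stay = [0] * 10                       # predicts "no churn" for everyone
some_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # hypothetical classifier output

baseline = churn_metrics(y_true, always_stay)
model = churn_metrics(y_true, some_model)
print(baseline)
print(model)
```

On this toy set the trivial "always stay" predictor still reaches high accuracy while its recall is zero, which is one reason accuracy is usually read alongside precision, recall, and F1 when classes are imbalanced.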
Hybrid topic modeling method based on Dirichlet multinomial mixture and fuzzy match algorithm for short text clustering
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-09, DOI: 10.1186/s40537-024-00930-9
Mutasem K. Alsmadi, Malek Alzaqebah, Sana Jawarneh, Ibrahim ALmarashdeh, Mohammed Azmi Al-Betar, Maram Alwohaibi, Noha A. Al-Mulla, Eman AE Ahmed, Ahmad AL Smadi
Abstract: Topic modeling methods have proved effective for inferring latent topics from short texts. Dealing with short texts is helpful for many real-world applications yet challenging, owing to the sparse terms in the text and the high-dimensional representation. Most topic modeling methods require the number of topics to be defined in advance. Similarly, methods based on the Dirichlet Multinomial Mixture (DMM) require the maximum possible number of topics to be set before execution, which is hard to determine because of topic uncertainty and the considerable noise in the dataset. Hence, this paper introduces a new approach called the Topic Clustering algorithm based on Levenshtein Distance (TCLD). TCLD combines DMM models and the fuzzy matching algorithm to address two key challenges in topic modeling: (a) the outlier problem in topic modeling methods, and (b) the problem of determining the optimal number of topics. TCLD uses the initial clustered topics generated by DMM models and then evaluates the semantic relationships between documents using Levenshtein Distance. Subsequently, it determines whether to keep the document in the same cluster, relocate it to another cluster, or mark it as an outlier. The results demonstrate the efficiency of the proposed approach across six English benchmark datasets, in comparison to seven topic modeling approaches, with 83% improvement in purity and 67% enhancement in Normalized Mutual Information (NMI) across all datasets. The proposed method was also applied to collected Arabic tweets, and the results showed that only 12% of the Arabic short texts were incorrectly clustered, according to human inspection.
Citations: 0
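The abstract above describes a keep, relocate, or mark-as-outlier decision driven by Levenshtein Distance. The sketch below shows the standard Levenshtein edit distance together with one possible way such a decision could be made. The normalised similarity, the mean-similarity rule, the `threshold` value, and the names `similarity` and `reassign` are all assumptions for illustration; the abstract does not specify how TCLD actually scores or reassigns documents.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def reassign(doc, clusters, threshold=0.5):
    """Hypothetical rule: pick the cluster with the highest mean similarity to doc,
    or return None (outlier) when no cluster reaches the threshold."""
    best_label, best_score = None, 0.0
    for label, members in clusters.items():
        others = [m for m in members if m != doc]
        if not others:
            continue
        score = sum(similarity(doc, m) for m in others) / len(others)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Toy clusters standing in for an initial DMM assignment (made-up texts).
clusters = {
    "sports": ["football match today", "football match tonight"],
    "tech": ["new phone released", "new phone release"],
}
print(levenshtein("kitten", "sitting"))
print(reassign("football match tonite", clusters))
print(reassign("zzzzqqqq xxxx", clusters))
```

Character-level edit distance captures surface similarity only, so a real pipeline would likely need tokenisation and a tuned threshold; both are left out of this sketch.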
DEMFFA: a multi-strategy modified Fennec Fox algorithm with mixed improved differential evolutionary variation strategies
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-08, DOI: 10.1186/s40537-024-00917-6
Gang Hu, Keke Song, Xiuxiu Li, Yi Wang
Abstract: The Fennec Fox algorithm (FFA) is a new meta-heuristic algorithm that is primarily inspired by the Fennec fox's ability to dig and escape from wild predators. Compared with other classical algorithms, FFA shows strong competitiveness. The “No free lunch” theorem implies that an algorithm performs differently on different problems; for example, when solving high-dimensional or more complex applications, FFA tends to fall into local optima and converge slowly. To address this, this paper proposes an improved Fennec Fox algorithm, DEMFFA, which adds sine chaotic mapping, formula factor adjustment, Cauchy operator mutation, and differential evolution mutation strategies. Firstly, a sine chaotic mapping strategy is added in the initialization stage to make the population distribution more uniform, thus speeding up convergence. Secondly, to speed up convergence further, the factors in the first-stage position-update formula are adjusted. Finally, to prevent the algorithm from settling into a local optimum too early and to expand the search space of the population, the Cauchy operator mutation strategy and the differential evolution mutation strategy are added after the first and second stages of the original algorithm update. To verify the performance of the proposed DEMFFA, qualitative analysis is carried out on different test sets, and the proposed algorithm is compared with the original FFA, other classical algorithms, improved algorithms, and newly proposed algorithms on three different test sets. A qualitative analysis is also carried out on CEC2020. In addition, DEMFFA is applied to 10 practical engineering design problems and a complex 24-bar truss topology optimization problem, and the results show that DEMFFA has the potential to solve complex problems.
Citations: 0
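The abstract above names three building blocks: chaotic-map initialization, Cauchy mutation, and differential evolution mutation. The sketch below shows textbook forms of each, assuming the sine map x ← a·sin(πx), the DE/rand/1 mutation v = x_r1 + F·(x_r2 − x_r3), and a Cauchy perturbation sampled through the inverse CDF. These specific formulas, the parameter defaults, and all function names are assumptions for illustration; the abstract does not state which variants DEMFFA uses or how they are wired into FFA.

```python
import math
import random


def sine_map_population(size, dim, lower, upper, seed_value=0.7, a=1.0):
    """Fill a population using the sine chaotic map x <- a * sin(pi * x).
    For x in [0, 1] and a in (0, 1] the iterates stay inside [0, 1]."""
    x = seed_value
    population = []
    for _ in range(size):
        individual = []
        for _ in range(dim):
            x = a * math.sin(math.pi * x)
            individual.append(lower + x * (upper - lower))  # scale to the search box
        population.append(individual)
    return population


def de_rand_1(population, index, f=0.5, rng=random):
    """DE/rand/1 mutation: combine three distinct individuals other than `index`."""
    candidates = [i for i in range(len(population)) if i != index]
    r1, r2, r3 = rng.sample(candidates, 3)
    return [population[r1][d] + f * (population[r2][d] - population[r3][d])
            for d in range(len(population[index]))]


def cauchy_mutation(individual, scale=0.1, rng=random):
    """Add heavy-tailed Cauchy noise, sampled as tan(pi * (u - 0.5)) with u ~ U[0, 1)."""
    return [x + scale * math.tan(math.pi * (rng.random() - 0.5))
            for x in individual]


def clip(individual, lower, upper):
    """Project a candidate back into the box [lower, upper]."""
    return [min(max(x, lower), upper) for x in individual]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = sine_map_population(size=6, dim=3, lower=-5.0, upper=5.0)
mutant = clip(de_rand_1(population, index=0, rng=rng), -5.0, 5.0)
jittered = clip(cauchy_mutation(population[0], rng=rng), -5.0, 5.0)
print(mutant)
print(jittered)
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is the usual motivation for using it to escape local optima; clipping keeps those jumps inside the search bounds.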
Establishment of an automatic diagnosis system for corneal endothelium diseases using artificial intelligence
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-08, DOI: 10.1186/s40537-024-00913-w
Jing-hao Qu, Xiao-ran Qin, Zi-jun Xie, Jia-he Qian, Yang Zhang, Xiao-nan Sun, Yu-zhao Sun, Rong-mei Peng, Ge-ge Xiao, Jing Lin, Xiao-yan Bian, Tie-hong Chen, Yan Cheng, Shao-feng Gu, Hai-kun Wang, Jing Hong
Abstract:
Purpose: To use artificial intelligence to establish an automatic diagnosis system for corneal endothelium diseases (CEDs).
Methods: We develop an automatic system for detecting multiple common CEDs involving an enhanced compact convolutional transformer (ECCT). Specifically, we introduce a cross-head relative position encoding scheme into a standard self-attention module to capture contextual information among different regions and employ a token-attention feed-forward network to place greater focus on valuable abnormal regions.
Results: A total of 2723 images from CED patients are used to train our system. It achieves an accuracy of 89.53%, and the area under the receiver operating characteristic curve (AUC) is 0.958 (95% CI 0.943–0.971) on images from multiple centres.
Conclusions: Our system is the first artificial intelligence-based system for diagnosing CEDs worldwide. Images can be uploaded to a specified website, and automatic diagnoses can be obtained; this system can be particularly helpful under pandemic conditions, such as those seen during the recent COVID-19 pandemic.
Citations: 0
Evaluation is key: a survey on evaluation measures for synthetic time series
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-07, DOI: 10.1186/s40537-024-00924-7
Michael Stenger, Robert Leppich, Ian Foster, Samuel Kounev, André Bauer
Abstract: Synthetic data generation describes the process of learning the underlying distribution of a given real dataset in a model, which is, in turn, sampled to produce new data objects still adhering to the original distribution. This approach often finds application where circumstances limit the availability or usability of real-world datasets, for instance, in health care due to privacy concerns. While image synthesis has received much attention in the past, time series are key for many practical (e.g., industrial) applications. To date, numerous different generative models and measures to evaluate time series syntheses have been proposed. However, regarding the defining features of high-quality synthetic time series and how to quantify quality, no consensus has yet been reached among researchers. Hence, we propose a comprehensive survey on evaluation measures for time series generation to assist users in evaluating synthetic time series. For one, we provide brief descriptions or, where applicable, precise definitions. Further, we order the measures in a taxonomy and examine applicability and usage. To assist in the selection of the most appropriate measures, we provide a concise guide for fast lookup. Notably, our findings reveal a lack of a universally accepted approach for an evaluation procedure, including the selection of appropriate measures. We believe this situation hinders progress and may even erode evaluation standards to a “do as you like” approach to synthetic data evaluation. Therefore, this survey is a preliminary step to advance the field of synthetic data evaluation.
Citations: 0
Assessing the current landscape of AI and sustainability literature: identifying key trends, addressing gaps and challenges
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-06, DOI: 10.1186/s40537-024-00912-x
Shailesh Tripathi, Nadine Bachmann, Manuel Brunner, Ziad Rizk, Herbert Jodlbauer
Abstract: The United Nations’ 17 Sustainable Development Goals stress the importance of global and local efforts to address inequalities and implement sustainability. Addressing complex, interconnected sustainability challenges requires a systematic, interdisciplinary approach, where technology, AI, and data-driven methods offer potential solutions for optimizing resources, integrating different aspects of sustainability, and supporting informed decision-making. Sustainability research spans various local, regional, and global challenges, emphasizing the need to identify emerging areas and gaps where AI and data-driven models play a crucial role. The study performs a comprehensive literature survey and scientometric and semantic analyses, categorizes data-driven methods for sustainability problems, and discusses the sustainable use of AI and big data. The outcomes of the analyses highlight the importance of collaborative and inclusive research that bridges regional differences, the interconnection of AI, technology, and sustainability topics, and the major research themes related to sustainability. The study further emphasizes the significance of developing hybrid approaches combining AI, data-driven techniques, and expert knowledge for multi-level, multi-dimensional decision-making. Furthermore, the study recognizes the necessity of addressing ethical concerns and ensuring the sustainable use of AI and big data in sustainability research.
Citations: 0
Amharic spoken digits recognition using convolutional neural network
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-04, DOI: 10.1186/s40537-024-00910-z
Tewodros Alemu Ayall, Changjun Zhou, Huawen Liu, Getnet Mezgebu Brhanemeskel, Solomon Teferra Abate, Michael Adjeisah
Abstract: Spoken digits recognition (SDR) is a type of supervised automatic speech recognition, which is required in various human–machine interaction applications. It is utilized in phone-based services like dialing systems, certain bank operations, airline reservation systems, and price extraction. However, the design of SDR is a challenging task that requires the development of labeled audio data, the proper choice of feature extraction method, and the development of the best performing model. Although several works exist for various languages, such as English, Arabic, and Urdu, there is no Amharic spoken digits dataset (AmSDD) with which to build an Amharic spoken digits recognition (AmSDR) model for the Amharic language, which is the official working language of the government of Ethiopia. Therefore, in this study, we developed a new AmSDD that contains 12,000 utterances of the digits 0 (Zaero) to 9 (zet’enyi), recorded from 120 volunteer speakers of different age groups, genders, and dialects who repeated each digit ten times. Mel frequency cepstral coefficients (MFCCs) and Mel-Spectrogram feature extraction methods were used to extract trainable features from the speech signal. We conducted different experiments on the development of the AmSDR model using the AmSDD and classical supervised learning algorithms such as Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Random Forest (RF) as the baseline. To further improve the recognition performance of AmSDR, we propose a three-layer Convolutional Neural Network (CNN) architecture with batch normalization. The results of our experiments show that the proposed CNN model outperforms the baseline algorithms and scores an accuracy of 99% and 98% using MFCCs and Mel-Spectrogram features, respectively.
Citations: 0
XAI-driven knowledge distillation of large language models for efficient deployment on low-resource devices
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-04, DOI: 10.1186/s40537-024-00928-3
Riccardo Cantini, Alessio Orsino, Domenico Talia
Abstract: Large Language Models (LLMs) are characterized by their inherent memory inefficiency and compute-intensive nature, making them impractical to run on low-resource devices and hindering their applicability in edge AI contexts. To address this issue, Knowledge Distillation approaches have been adopted to transfer knowledge from a complex model, referred to as the teacher, to a more compact, computationally efficient one, known as the student. The aim is to retain the performance of the original model while substantially reducing computational requirements. However, traditional knowledge distillation methods may struggle to effectively transfer crucial explainable knowledge from an LLM teacher to the student, potentially leading to explanation inconsistencies and decreased performance. This paper presents DiXtill, a method based on a novel approach to distilling knowledge from LLMs into lightweight neural architectures. The main idea is to leverage local explanations provided by an eXplainable Artificial Intelligence (XAI) method to guide the cross-architecture distillation of a teacher LLM into a self-explainable student, specifically a bi-directional LSTM network. Experimental results show that our XAI-driven distillation method allows the teacher explanations to be effectively transferred to the student, resulting in better agreement compared to classical distillation methods, thus enhancing the student interpretability. Furthermore, it enables the student to achieve comparable performance to the teacher LLM while also delivering a significantly higher compression ratio and speedup compared to other techniques such as post-training quantization and pruning, which paves the way for more efficient and sustainable edge AI applications.
Citations: 0
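The abstract above contrasts DiXtill with classical distillation methods. For orientation, the sketch below shows the classical teacher-student objective: a weighted sum of the student's cross-entropy on the true label and the KL divergence between temperature-softened teacher and student distributions. It does not implement DiXtill's XAI-guided term, which the abstract does not specify; the function names, the `temperature` and `alpha` defaults, and the toy logits are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Classical distillation loss:
    alpha * cross-entropy(student, label)
    + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard = -math.log(softmax(student_logits)[label])
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Made-up logits for a 3-class problem whose true class is 0.
teacher = [4.0, 1.0, 0.0]
agreeing_student = [4.0, 1.0, 0.0]
disagreeing_student = [0.0, 1.0, 4.0]

loss_agree = distillation_loss(agreeing_student, teacher, label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, label=0)
print(loss_agree, loss_disagree)
```

The T² factor is the conventional rescaling that keeps the soft-target gradient magnitude comparable across temperatures.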
High-performance computing in healthcare: an automatic literature analysis perspective
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-02, DOI: 10.1186/s40537-024-00929-2
Jieyi Li, Shuai Wang, Stevan Rudinac, Anwar Osseyran
Abstract: The adoption of high-performance computing (HPC) in healthcare has gained significant attention in recent years, driving advancements in medical research and clinical practice. Exploring the literature on HPC implementation in healthcare is valuable for decision-makers as it provides insights into potential areas for further investigation and investment. However, manually analyzing the vast number of scholarly articles is a challenging and time-consuming task. Fortunately, topic modeling techniques offer the capacity to process extensive volumes of scientific literature, identifying key trends within the field. This paper presents an automatic literature analysis framework based on a state-of-the-art vector-based topic modeling algorithm with multiple embedding techniques, unveiling the research trends surrounding HPC utilization in healthcare. The proposed pipeline consists of four phases: paper extraction, data preprocessing, topic modeling and outlier detection, followed by visualization. It enables the automatic extraction of meaningful topics, exploration of their interrelationships, and identification of emerging research directions in an intuitive manner. The findings highlight the transition of HPC adoption in healthcare from traditional numerical simulation and surgical visualization to emerging topics such as drug discovery, AI-driven medical image analysis, and genomic analysis, as well as correlations and interdisciplinary connections among application domains.
Citations: 0
Computational 3D topographic microscopy from terabytes of data per sample
IF 8.1, Zone 2, Computer Science
Journal of Big Data, Pub Date: 2024-05-02, DOI: 10.1186/s40537-024-00901-0
Kevin C. Zhou, Mark Harfouche, Maxwell Zheng, Joakim Jönsson, Kyung Chul Lee, Kanghyun Kim, Ron Appel, Paul Reamey, Thomas Doman, Veton Saliu, Gregor Horstmeyer, Seung Ah Lee, Roarke Horstmeyer
Abstract: We present a large-scale computational 3D topographic microscope that enables 6-gigapixel profilometric 3D imaging at micron-scale resolution across >110 cm² areas over multi-millimeter axial ranges. Our computational microscope, termed STARCAM (Scanning Topographic All-in-focus Reconstruction with a Computational Array Microscope), features a parallelized, 54-camera architecture with 3-axis translation to capture, for each sample of interest, a multi-dimensional, 2.1-terabyte (TB) dataset, consisting of a total of 224,640 9.4-megapixel images. We developed a self-supervised neural network-based algorithm for 3D reconstruction and stitching that jointly estimates an all-in-focus photometric composite and 3D height map across the entire field of view, using multi-view stereo information and image sharpness as a focal metric. The memory-efficient, compressed differentiable representation offered by the neural network effectively enables joint participation of the entire multi-TB dataset during the reconstruction process. Validation experiments on gauge blocks demonstrate a profilometric precision and accuracy of 10 µm or better. To demonstrate the broad utility of our new computational microscope, we applied STARCAM to a variety of decimeter-scale objects, with applications ranging from cultural heritage to industrial inspection.
Citations: 0