{"title":"Evolutionary computation-based self-supervised learning for image processing: a big data-driven approach to feature extraction and fusion for multispectral object detection","authors":"Xiaoyang Shen, Haibin Li, Achyut Shankar, Wattana Viriyasitavat, Vinay Chamola","doi":"10.1186/s40537-024-00988-5","DOIUrl":"https://doi.org/10.1186/s40537-024-00988-5","url":null,"abstract":"<p>Image object recognition and detection technologies are widely used in many scenarios. In recent years, big data has become increasingly abundant, and big data-driven artificial intelligence models have attracted more and more attention. Evolutionary computation has also provided a powerful driving force for the optimization and improvement of deep learning models. In this paper, we propose an image object detection method based on self-supervised and data-driven learning. Unlike other methods, our approach stands out for its innovative use of multispectral data fusion and evolutionary computation for model optimization. Specifically, our method uniquely combines visible light images and infrared images to detect and identify image targets. Firstly, we utilize a self-supervised learning method and the AutoEncoder model to perform high-dimensional feature extraction on the two types of images. Secondly, we fuse the extracted features from the visible light and infrared images to detect and identify objects. Thirdly, we introduce a model parameter optimization method using evolutionary learning algorithms to enhance model performance. 
Validation on public datasets shows that our method achieves comparable or superior performance to existing methods.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"6 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A model for investment type recommender system based on the potential investors based on investors and experts feedback using ANFIS and MNN","authors":"Asefeh Asemi, Adeleh Asemi, Andrea Ko","doi":"10.1186/s40537-024-00965-y","DOIUrl":"https://doi.org/10.1186/s40537-024-00965-y","url":null,"abstract":"<p>This article presents an investment recommender system based on an Adaptive Neuro-Fuzzy Inference System (ANFIS) and pre-trained weights from a Multimodal Neural Network (MNN). The model is designed to support the investment process for the customers and takes into consideration seven factors to implement the proposed investment system model through the customer or potential investor data set. The system takes input from a web-based questionnaire that collects data on investors' preferences and investment goals. The data is then preprocessed and clustered using ETL tools, JMP, MATLAB, and Python. The ANFIS-based recommender system is designed with three inputs and one output and trained using a hybrid approach over three epochs with 188 data pairs and 18 fuzzy rules. The system's performance is evaluated using metrics such as RMSE, accuracy, precision, recall, and F1-score. The system is also designed to incorporate expert feedback and opinions from investors to customize and improve investment recommendations. 
The article concludes that the proposed ANFIS-based investment recommender system is effective and accurate in generating investment recommendations that meet investors' preferences and goals.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"9 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inhibitory neuron links the causal relationship from air pollution to psychiatric disorders: a large multi-omics analysis","authors":"Xisong Liang, Jie Wen, Chunrun Qu, Nan Zhang, Ziyu Dai, Hao Zhang, Peng Luo, Ming Meng, Zhixiong Liu, Fan Fan, Quan Cheng","doi":"10.1186/s40537-024-00960-3","DOIUrl":"https://doi.org/10.1186/s40537-024-00960-3","url":null,"abstract":"<p>Psychiatric disorders are severe health challenges that exert a heavy public burden. Air pollution has been widely reported as related to psychiatric disorder risk, but their causal association and pathological mechanism remain unclear. Herein, we systematically investigated large genome-wide association studies (6 cohorts with 1,357,645 samples), single-cell RNA (26 samples with 157,488 cells), and bulk-RNAseq (1595 samples) datasets to reveal the genetic causality and biological link between four air pollutants and nine psychiatric disorders. As a result, we identified ten positive genetic correlations between air pollution and psychiatric disorders. Moreover, PM2.5 and NO<sub>2</sub> presented significant causal effects on schizophrenia risk, which remained robust after adjustment for potential confounders. In addition, transcriptome-wide association studies identified the shared genes between PM2.5/NO<sub>2</sub> and schizophrenia. We then discovered a schizophrenia-derived inhibitory neuron subtype with highly expressed shared genes and abnormal synaptic and metabolic pathways by scRNA analyses and confirmed their abnormal level and correlations with the shared genes in schizophrenia patients in a large RNA-seq cohort. Overall, we discovered robust genetic causality between PM2.5, NO<sub>2</sub>, and schizophrenia and identified an abnormal inhibitory neuron subtype that links schizophrenia pathology and PM2.5/NO<sub>2</sub> exposure. 
These discoveries highlight the schizophrenia risk under air pollutant exposure and provide novel mechanistic insights into schizophrenia pathology, contributing to pollutant-related schizophrenia risk control and the development of therapeutic strategies.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"58 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling the impact of BDA-AI on sustainable innovation ambidexterity and environmental performance","authors":"Chin-Tsu Chen, Asif Khan, Shih-Chih Chen","doi":"10.1186/s40537-024-00995-6","DOIUrl":"https://doi.org/10.1186/s40537-024-00995-6","url":null,"abstract":"<p>Data has evolved into one of the principal resources for contemporary businesses. Moreover, corporations have undergone digitalization; consequently, their supply chains generate substantial amounts of data. The theoretical framework of this investigation was built on novel concepts like big data analytics—artificial intelligence (BDA-AI) and supply chain ambidexterity’s (SCA) direct impacts on sustainable supply chain management (SSCM) and indirect impacts on sustainable innovation ambidexterity (SIA) and environmental performance (EP). This study selected employees of manufacturing industries as respondents for environmental performance, sustainable supply chain management, big data analytics, artificial intelligence, and supply chain ambidexterity. The results from this study show that BDA-AI and SCA significantly affect SSCM. SSCM has significant associations with SIA and EP. Finally, SIA has a significant impact on EP. According to the results indicating the indirect impacts, BDA-AI has significant indirect relationships with SIA and EP by having SSCM as the mediating variable. Furthermore, SCA has significant indirect associations with SIA and EP, with SSCM as the mediating variable. Additionally, both BDA-AI and SCA have significant indirect associations with EP, while SIA and SSCM are mediating variables. Finally, SSCM has an indirect association with EP while having SIA as a mediating variable. The findings of this paper provide several theoretical contributions to the research in sustainability and big data analytics artificial intelligence field. 
Furthermore, based on the suggested framework, this study offers a number of practical implications for decision-makers to improve significantly in the supply chain and BDA-AI. For instance, this paper provides significant insight for logistics and supply chain managers, supporting them in implementing BDA-AI solutions to help SSCM and enhance EP.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"13 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing oil palm segmentation model with GAN-based augmentation","authors":"Qi Bin Kwong, Yee Thung Kon, Wan Rusydiah W. Rusik, Mohd Nor Azizi Shabudin, Shahirah Shazana A. Rahman, Harikrishna Kulaveerasingam, David Ross Appleton","doi":"10.1186/s40537-024-00990-x","DOIUrl":"https://doi.org/10.1186/s40537-024-00990-x","url":null,"abstract":"<p>In digital agriculture, accurate crop detection is fundamental to developing automated systems for efficient plantation management. For oil palm, the main challenge lies in developing robust models that perform well in different environmental conditions. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (< 5 years old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were then inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles and subsequently used to make the synthetic tiles more realistic. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%. The GAN-based model achieved comparable results, with precision and recall values of 98.5% and 98.6%. On challenge dataset 1, consisting of older palms (> 5 years old), both models also achieved similar accuracies, with the baseline model achieving precision and recall of 93.1% and 99.4%, and the GAN-based model achieving 95.7% and 99.4%. On challenge dataset 2, consisting of storm-affected palms, the baseline model achieved a precision of 100% but a recall of only 13%. The GAN-based model achieved a significantly better result, with precision and recall values of 98.7% and 95.3%. 
This result demonstrates that images generated by GANs have the potential to enhance the accuracies of palm detection models.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"25 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI sees beyond humans: automated diagnosis of myopia based on peripheral refraction map using interpretable deep learning","authors":"Yong Tang, Zhenghua Lin, Linjing Zhou, Weijia Wang, Longbo Wen, Yongli Zhou, Zongyuan Ge, Zhao Chen, Weiwei Dai, Zhikuan Yang, He Tang, Weizhong Lan","doi":"10.1186/s40537-024-00989-4","DOIUrl":"https://doi.org/10.1186/s40537-024-00989-4","url":null,"abstract":"<p>The question of whether artificial intelligence (AI) can surpass human capabilities is crucial in the application of AI in clinical medicine. To explore this, an interpretable deep learning (DL) model was developed to assess myopia status using retinal refraction maps obtained with a novel peripheral refractor. The DL model demonstrated promising performance, achieving an AUC of 0.9074 (95% CI 0.83–0.97), an accuracy of 0.8140 (95% CI 0.70–0.93), a sensitivity of 0.7500 (95% CI 0.51–0.90), and a specificity of 0.8519 (95% CI 0.68–0.94). Grad-CAM analysis provided interpretable visualization of the attention of DL model and revealed that the DL model utilized information from the central retina, similar to human readers. Additionally, the model considered information from vertical regions across the central retina, which human readers had overlooked. 
This finding suggests that AI can indeed surpass human capabilities, bolstering our confidence in the use of AI in clinical practice, especially in new scenarios where prior human knowledge is limited.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"23 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient microservices offloading for cost optimization in diverse MEC cloud networks","authors":"Abdul Rasheed Mahesar, Xiaoping Li, Dileep Kumar Sajnani","doi":"10.1186/s40537-024-00975-w","DOIUrl":"https://doi.org/10.1186/s40537-024-00975-w","url":null,"abstract":"<p>In recent years, mobile applications have proliferated across domains such as E-banking, Augmented Reality, E-Transportation, and E-Healthcare. These applications are often built using microservices, an architectural style where the application is composed of independently deployable services focusing on specific functionalities. Mobile devices cannot process these microservices locally, so traditionally, cloud-based frameworks using cost-efficient Virtual Machines (VMs) and edge servers have been used to offload these tasks. However, cloud frameworks suffer from extended boot times and high transmission overhead, while edge servers have limited computational resources. To overcome these challenges, this study introduces a Microservices Container-Based Mobile Edge Cloud Computing (MCBMEC) environment and proposes an innovative framework, Optimization Task Scheduling and Computational Offloading with Cost Awareness (OTSCOCA). This framework addresses Resource Matching, Task Sequencing, and Task Scheduling to enhance server utilization, reduce service latency, and improve service bootup times. Empirical results validate the efficacy of MCBMEC and OTSCOCA, demonstrating significant improvements in server efficiency, reduced service latency, faster service bootup times, and notable cost savings. 
These outcomes underscore the pivotal role of these methodologies in advancing mobile edge computing applications amidst the challenges of edge server limitations and traditional cloud-based approaches.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"1 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting startup success using two bias-free machine learning: resolving data imbalance using generative adversarial networks","authors":"Jungryeol Park, Saesol Choi, Yituo Feng","doi":"10.1186/s40537-024-00993-8","DOIUrl":"https://doi.org/10.1186/s40537-024-00993-8","url":null,"abstract":"<p>The success of newly established companies holds significant implications for community development and economic growth. However, startups often grapple with heightened vulnerability to market volatility, which can lead to early-stage failures. This study aims to predict startup success by addressing biases in existing predictive models. Previous research has examined external factors such as market dynamics and internal elements like founder characteristics. While such efforts have contributed to understanding success mechanisms, challenges persist, including predictor and learning data biases. This study proposes a novel approach by constructing independent variables using early-stage information, incorporating founder attributes, and mitigating class imbalance through generative adversarial networks (GAN). Our proposed model aims to enhance investment decision-making efficiency and effectiveness, offering a valuable decision support system for various venture capital funds.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"4 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CTGAN-ENN: a tabular GAN-based hybrid sampling method for imbalanced and overlapped data in customer churn prediction","authors":"I Nyoman Mahayasa Adiputra, Paweena Wanchai","doi":"10.1186/s40537-024-00982-x","DOIUrl":"https://doi.org/10.1186/s40537-024-00982-x","url":null,"abstract":"<p>Class imbalance is one of many problems of customer churn datasets. Another common problem is class overlap, where instances from different classes are similar. The prediction task of customer churn becomes more challenging when there is class overlap in the training data. In this research, we propose a hybrid method based on tabular GANs, called CTGAN-ENN, to address class overlap and imbalanced data in customer churn datasets. We used five different customer churn datasets from an open platform. CTGAN is a tabular GAN-based oversampling method that addresses class imbalance but suffers from class overlap. We combined CTGAN with the ENN under-sampling technique to overcome the class overlap. CTGAN-ENN reduced the number of class overlaps for each feature in all datasets. We investigated how effective CTGAN-ENN is with each machine learning technique. Based on our experiments, CTGAN-ENN achieved satisfactory results in KNN, GBM, XGB and LGB machine learning performance for customer churn predictions. We compared CTGAN-ENN with common over-sampling and hybrid sampling methods, and CTGAN-ENN outperformed other sampling methods and algorithm-level methods with cost-sensitive learning in several machine learning algorithms. We provide a time consumption comparison between CTGAN and CTGAN-ENN. CTGAN-ENN achieved less time consumption than CTGAN. 
Our research work provides a new framework to handle customer churn prediction problems with several types of imbalanced datasets and can be useful in real-world data from customer churn prediction.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"78 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cartographies of warfare in the Indian subcontinent: Contextualizing archaeological and historical analysis through big data approaches","authors":"Monica L. Smith, Connor Newton","doi":"10.1186/s40537-024-00962-1","DOIUrl":"https://doi.org/10.1186/s40537-024-00962-1","url":null,"abstract":"<p>Some of the most notable human behavioral palimpsests result from warfare and its durable traces in the form of defensive architecture and strategic infrastructure. For premodern periods, this architecture is often understudied at the large scale, resulting in a lack of appreciation for the enormity of the costs and impacts of military spending over the course of human history. In this article, we compare the information gleaned from the study of the fortified cities of the Early Historic period of the Indian subcontinent (c. 3rd century BCE to 4th century CE) with the precolonial medieval era (9-17th centuries CE). Utilizing in-depth archaeological and historical studies along with local sightings and citizen-science blogs to create a comprehensive data set and map series in a “big-data” approach that makes use of heterogeneous data sets and presence-absence criteria, we discuss how the architecture of warfare shifted from an emphasis on urban defense in the Early Historic period to an emphasis on territorial offense and defense in the medieval period. 
Many medieval fortifications are known from only local reports and have minimal identifying information but can still be studied in the aggregate using a least-shared denominator approach to quantification and mapping.</p>","PeriodicalId":15158,"journal":{"name":"Journal of Big Data","volume":"14 1","pages":""},"PeriodicalIF":8.1,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142186358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}