Big Data Research Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100346
Xinhong Zhang, Boyan Zhang, Binjie Wang, Fan Zhang
{"title":"Automatic Prediction of T2/T3 Staging of Rectal Cancer Based on Radiomics and Machine Learning","authors":"Xinhong Zhang , Boyan Zhang , Binjie Wang , Fan Zhang","doi":"10.1016/j.bdr.2022.100346","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100346","url":null,"abstract":"<div><p><span>The staging of rectal cancer is very important for determining treatment plans. This study investigated the relationship between imaging features and rectal cancer staging, so that the stage of rectal cancer can be automatically predicted from the imaging features. A total of 81 patients with T2 or T3 stage rectal cancer from April 2018 to March 2019 were included. First, the tumor was labeled by a radiologist to outline the ROI (region of interest) in the high-resolution MRI images. Then the ROI was segmented by an FCNN model and a MedicalNet model. Second, features of the ROI were extracted by a radiomics method. Third, the key features were screened out from a large number of features. Finally, a </span>machine learning<span><span> model was trained to predict rectal cancer stage. Two machine learning tools, back-projected neural network (BPNN) and </span>support vector machine (SVM), were used for the T2/T3 staging prediction of rectal cancer. The accuracy of our methods was 88.2%∼90.5% on the testing dataset, with a confidence interval of 95%; the sensitivity was 90.8%∼91.2% and the specificity was 85.9%∼87.6%, all of which were better than the traditional method. The BPNN method, with an area under the curve (AUC) of 0.81 ± 0.01, had better prediction performance than the SVM method (AUC = 0.75 ± 0.03). 
Some of the radiomics features have a significant relationship with the T2/T3 stage of rectal cancer, so it is possible to effectively predict the T2/T3 stage of rectal cancer using the selected radiomics features and machine learning methods.</span></p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100346"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89991697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100358
Zhiqiang Liu, Xuanhua Shi, Hai Jin
{"title":"Data-Efficient Performance Modeling for Configurable Big Data Frameworks by Reducing Information Overlap Between Training Examples","authors":"Zhiqiang Liu, Xuanhua Shi, Hai Jin","doi":"10.1016/j.bdr.2022.100358","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100358","url":null,"abstract":"<div><p><span>To support the various analysis applications of big data<span>, big data processing<span> frameworks are designed to be highly configurable. However, for common users, it is difficult to tailor the configurable frameworks to achieve optimal performance for every application. Recently, many automatic tuning methods have been proposed to configure these frameworks. In detail, these methods first build a performance prediction model by sampling configurations randomly and measuring the corresponding performance. Then, they conduct a heuristic search in the </span></span></span>configuration space based on the performance prediction model. For most frameworks, it is too expensive to build the performance model, since doing so requires measuring the performance of a large number of configurations, which causes too much data collection overhead. In this paper, we propose a novel data-efficient method to build the performance model with little impact on prediction accuracy. Compared to the traditional methods, the proposed method can reduce the overhead of data collection because it can train the performance model with far fewer training examples. Specifically, the proposed method actively samples the important examples according to the dynamic requirement of the performance model during iterative model updating. Hence, it can make full use of the collected informative data and train the performance model with far fewer training examples. To sample the important training examples, we employ several virtual performance models to estimate the importance of all candidate configurations efficiently. 
Experimental results show that our method needs fewer training examples than traditional methods with little impact on prediction accuracy.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100358"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91599168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100335
Gerold Hoelzl, Sebastian Soller, Matthias Kranz
{"title":"Detecting Seasonal Dependencies in Production Lines for Forecast Optimization","authors":"Gerold Hoelzl , Sebastian Soller , Matthias Kranz","doi":"10.1016/j.bdr.2022.100335","DOIUrl":"10.1016/j.bdr.2022.100335","url":null,"abstract":"<div><p>Huge amounts of data are produced inside an industrial production plant every minute. This data is becoming more accessible thanks to greater network and computing capabilities. This poses an opportunity to apply methods in real time to support the reliability of production machines. In theory, every time series that is currently monitored for a breach of thresholds can be extended with a forecast method. Classical approaches, such as ARIMA and Exponential Smoothing, can be used for forecasting. To describe the signal and boost the forecast results, we use a clustering method to group each unknown data stream into a seasonality class. These seasonality classes can be used for insight into intra- and inter-group behaviour between machines and add causality to factory-wide correlations. We collected 10000 multi-day segments from multiple identical and different machines. We manually labelled the data segments for their seasonality pattern to compare and explain the clustering results. The classes obtained through clustering are used to adapt each individual forecast model for every machine. 
For the forecast method we could show improved results by selecting the correct seasonality for each data stream.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100335"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86009235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100344
Kang Xu, Xiaoqiu Lu, Yuan-fang Li, Tongtong Wu, Guilin Qi, Ning Ye, Dong Wang, Zheng Zhou
{"title":"Neural Topic Modeling with Deep Mutual Information Estimation","authors":"Kang Xu , Xiaoqiu Lu , Yuan-fang Li , Tongtong Wu , Guilin Qi , Ning Ye , Dong Wang , Zheng Zhou","doi":"10.1016/j.bdr.2022.100344","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100344","url":null,"abstract":"<div><p><span><span>The emerging neural topic models make topic modeling more easily adaptable and extendable in unsupervised text mining. However, the existing neural topic models have difficulty retaining representative information of the documents within the learnt topic representation. Fortunately, Deep Mutual Information Estimation (DMIE), which maximizes the mutual information between input data and the hidden representations to learn a good representation of the input data, provides a new paradigm for neural topic modeling. In this paper, we propose a neural topic model which incorporates deep mutual information estimation, i.e., Neural Topic Modeling with Deep Mutual Information Estimation (NTM-DMIE). NTM-DMIE is a neural network method for topic learning which maximizes the mutual information between the input documents and their latent topic representation. To learn a robust topic representation, we incorporate a </span>discriminator to distinguish negative examples from positive examples via adversarial learning. Moreover, we use both global and local mutual information to preserve the rich information of the input documents in the topic representation. We evaluate NTM-DMIE on several metrics, including accuracy of </span>text clustering with the topic representation, topic uniqueness, and topic coherence. 
The experimental results show that NTM-DMIE outperforms the existing methods on all the metrics on the four datasets.</p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100344"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91599232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-28 DOI: 10.1016/j.bdr.2022.100347
Sabri Allani, Richard Chbeir, Khouloud Salameh, Elio Mansour, Philippe Arnould
{"title":"A Multi-Objective Clustering for Better Data Management in Connected Environment","authors":"Sabri Allani , Richard Chbeir , Khouloud Salameh , Elio Mansour , Philippe Arnould","doi":"10.1016/j.bdr.2022.100347","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100347","url":null,"abstract":"<div><p>Over the past decade, the rapid increase in connected devices has enabled the emergence of new digital ecosystems<span> to provide new opportunities for monitoring and managing systems to optimize overall performance. With these connected environments, data collection and management become increasingly challenging. A significant number of works in the literature have addressed data collection and management based on different contexts (e.g., mobile ad hoc, Peer-2-Peer, and IoT<span> networks). Today, a wired network uses all of these protocols simultaneously, thus highlighting the need to build a standard data collection and management framework that considers all potential user preferences. For this purpose, multi-objective clustering has been utilized as a promising solution to ensure the stability of connected devices during the collection and management of data. In this paper, we introduce a new multi-objective clustering (MOC) technique based on various criteria for cluster construction and head selection in connected environments. More precisely, the proposed solution is based on hypergraphs to represent the connected environment and clusters according to similarities between heterogeneous devices. Then, a cross-sectional hypergraph algorithm is applied to select the cluster heads. 
Experiments conducted show that our solution outperforms the pioneering literature methods in terms of performance and effectiveness.</span></span></p></div>","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"30 ","pages":"Article 100347"},"PeriodicalIF":3.3,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91599166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-01 DOI: 10.1016/j.bdr.2022.100354
Fabrizio Maturo, A. Porreca
{"title":"Augmented Functional Analysis of Variance (A-fANOVA): Theory and Application to Google Trends for Detecting Differences in Abortion Drugs Queries","authors":"Fabrizio Maturo, A. Porreca","doi":"10.1016/j.bdr.2022.100354","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100354","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"22 1","pages":"100354"},"PeriodicalIF":3.3,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73890706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-01 DOI: 10.1016/j.bdr.2022.100358
Zhiqiang Liu, Xuanhua Shi, Haici Jin
{"title":"Data-Efficient Performance Modeling for Configurable Big Data Frameworks by Reducing Information Overlap Between Training Examples","authors":"Zhiqiang Liu, Xuanhua Shi, Haici Jin","doi":"10.1016/j.bdr.2022.100358","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100358","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"95 1","pages":"100358"},"PeriodicalIF":3.3,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84506965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big Data Research Pub Date: 2022-11-01 DOI: 10.1016/j.bdr.2022.100356
Xiulin Zheng, Peipei Li, Xindong Wu
{"title":"Data Stream Classification Based on Extreme Learning Machine: A Review","authors":"Xiulin Zheng, Peipei Li, Xindong Wu","doi":"10.1016/j.bdr.2022.100356","DOIUrl":"https://doi.org/10.1016/j.bdr.2022.100356","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"60 4 1","pages":"100356"},"PeriodicalIF":3.3,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90118856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linked Open Government Data to Predict and Explain House Prices: The Case of Scottish Statistics Portal","authors":"Areti Karamanou, E. Kalampokis, K. Tarabanis","doi":"10.2139/ssrn.4123599","DOIUrl":"https://doi.org/10.2139/ssrn.4123599","url":null,"abstract":"","PeriodicalId":56017,"journal":{"name":"Big Data Research","volume":"99 1","pages":"100355"},"PeriodicalIF":3.3,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80955177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}