{"title":"Deep Convolutional Network Based Machine Intelligence Model for Satellite Cloud Image Classification","authors":"Kalyan Kumar Jena;Sourav Kumar Bhoi;Soumya Ranjan Nayak;Ranjit Panigrahi;Akash Kumar Bhoi","doi":"10.26599/BDMA.2021.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020017","url":null,"abstract":"As a huge number of satellites revolve around the earth, a great probability exists to observe and determine the change phenomena on the earth through the analysis of satellite images on a real-time basis. Therefore, classifying satellite images plays strong assistance in remote sensing communities for predicting tropical cyclones. In this article, a classification approach is proposed using Deep Convolutional Neural Network (DCNN), comprising numerous layers, which extract the features through a downsampling process for classifying satellite cloud images. DCNN is trained marvelously on cloud images with an impressive amount of prediction accuracy. Delivery time decreases for testing images, whereas prediction accuracy increases using an appropriate deep convolutional network with a huge number of training dataset instances. The satellite images are taken from the Meteorological & Oceanographic Satellite Data Archival Centre, the organization is responsible for availing satellite cloud images of India and its subcontinent. 
The proposed cloud image classification shows 94% prediction accuracy with the DCNN framework.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"32-43"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962954.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67846975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method for Bio-Sequence Analysis Algorithm Development Based on the PAR Platform","authors":"Haipeng Shi;Huan Chen;Qinghong Yang;Jun Wang;Haihe Shi","doi":"10.26599/BDMA.2022.9020030","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020030","url":null,"abstract":"The problems of biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous solving algorithms are used for these problems, and complex similarities and differences exist among these algorithms for the same problem, causing difficulty for researchers to select the appropriate one. To address this situation, combined with the formal partition-and-recur method, component technology, domain engineering, and generic programming, the paper presents a method for the development of a family of biological sequence analysis algorithms. It designs highly trustworthy reusable domain algorithm components and further assembles them to generate specific biological sequence analysis algorithms. The experiment of the development of a dynamic programming based LCS algorithm family shows the proposed method enables the improvement of the reliability, understandability, and development efficiency of particular algorithms.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"11-20"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962956.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicted Mean Vote of Subway Car Environment Based on Machine Learning","authors":"Kangkang Huang;Shihua Lu;Xinjun Li;Ke Feng;Weiwei Chen;Yi Xia","doi":"10.26599/BDMA.2022.9020028","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020028","url":null,"abstract":"The thermal comfort of passengers in the carriage cannot be ignored. Thus, this research aims to establish a prediction model for the thermal comfort of the internal environment of a subway car and find the optimal input combination in establishing the prediction model of the predicted mean vote (PMV) index. Data-driven modeling utilizes data from experiments and questionnaires conducted in Nanjing Metro. Support vector machine (SVM), decision tree (DT), random forest (RF), and logistic regression (LR) were used to build four models. This research aims to select the most appropriate input variables for the predictive model. All possible combinations of 11 input variables were used to determine the most accurate model, with variable selection for each model comprising 102 350 iterations. In the PMV prediction, the RF model was the best when using the correlation coefficients square (R2) as the evaluation indicator (R2: 0.7680, mean squared error (MSE): 0.2868). The variables include clothing temperature (CT), convective heat transfer coefficient between the surface of the human body and the environment (CHTC), black bulb temperature (BBT), and thermal resistance of clothes (TROC). The RF model with MSE as the evaluation index also had the highest accuracy (R2: 0.7676, MSE: 0.2836). The variables include clothing surface area coefficient (CSAC), CT, BBT, and air velocity (AV). 
The results show that the RF model can efficiently predict the PMV of the subway car environment.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"92-105"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962959.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Segment Routing: Toward Load Balancing with Limited Control Overheads","authors":"Shu Yang;Ruiyu Chen;Laizhong Cui;Xiaolei Chang","doi":"10.26599/BDMA.2022.9020018","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020018","url":null,"abstract":"Segment routing has been a novel architecture for traffic engineering in recent years. However, segment routing brings control overheads, i.e., additional packet headers should be inserted. The overheads can greatly reduce the forwarding efficiency for a large network, when segment headers become too long. To achieve the best of two targets, we propose the intelligent routing scheme for traffic engineering (IRTE), which can achieve load balancing with limited control overheads. To achieve optimal performance, we first formulate the problem as a mapping problem that maps different flows to key diversion points. Second, we prove the problem is nondeterministic polynomial (NP)-hard by reducing it to a k-dense subgraph problem. To solve this problem, we develop an ant colony optimization algorithm as improved ant colony optimization (IACO), which is widely used in network optimization problems. We also design the load balancing algorithm with diversion routing (LBA-DR), and analyze its theoretical performance. 
Finally, we evaluate the IRTE in different real-world topologies, and the results show that the IRTE outperforms traditional algorithms, e.g., the maximum bandwidth is 24.6% lower than that of traditional algorithms when evaluating on BellCanada topology.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"55-71"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09963625.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering","authors":"Samin Poudel;Marwan Bikdash","doi":"10.26599/BDMA.2022.9020024","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020024","url":null,"abstract":"We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters like the fraction of users dropped (FUD) and the fraction of items dropped (FID). We seek to investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when using the SVD collaborative filtering (CF) algorithm. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical symmetrical SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models came out to be multi-linear in terms of odds ratios of dropping a user (or an item) vs. not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds where their interaction term has a zero coefficient. Most importantly, the models are constant in the sense that they are written in closed-form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460 000 subsampled rating matrices were then simulated and subjected to the singular value decomposition (SVD) CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. 
The models were constant and significant across all 3 datasets.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"72-84"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09963626.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67846982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Satellite Image Classification Using a Hybrid Manta Ray Foraging Optimization Neural Network","authors":"Amit Kumar Rai;Nirupama Mandal;Krishna Kant Singh;Ivan Izonin","doi":"10.26599/BDMA.2022.9020027","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020027","url":null,"abstract":"A semi-supervised image classification method for satellite images is proposed in this paper. The satellite images contain enormous data that can be used in various applications. The analysis of the data is a tedious task due to the amount of data and the heterogeneity of the data. Thus, in this paper, a Radial Basis Function Neural Network (RBFNN) trained using Manta Ray Foraging Optimization algorithm (MRFO) is proposed. RBFNN is a three-layer network comprising input, hidden, and output layers that can process large amounts of data. The trained network can discover hidden data patterns in unseen data. The learning algorithm and seed selection play a vital role in the performance of the network. The seed selection is done using the spectral indices to further improve the performance of the network. The manta ray foraging optimization algorithm is inspired by the intelligent behaviour of manta rays. It emulates three unique foraging behaviours, namely chain, cyclone, and somersault foraging. The satellite images contain an enormous amount of data and thus require exploration of a large search space. The spiral movement of the MRFO algorithm enables it to explore large search spaces effectively. The proposed method is applied on pre- and post-flooding Landsat 8 Operational Land Imager (OLI) images of the New Brunswick area. The method was applied to identify and classify the land cover changes in the area induced by flooding. The images are classified using the proposed method and a change map is developed using post-classification comparison. The change map shows that a large amount of agricultural area was washed away due to flooding. 
The measurement of the affected area in square kilometres is also performed for mitigation activities. The results show that, after flooding, the area covered by water increased whereas the vegetated area decreased. The performance of the proposed method is compared with existing state-of-the-art methods.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"44-54"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962957.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FingerDTA: A Fingerprint-Embedding Framework for Drug-Target Binding Affinity Prediction","authors":"Xuekai Zhu;Juan Liu;Jian Zhang;Zhihui Yang;Feng Yang;Xiaolei Zhang","doi":"10.26599/BDMA.2022.9020005","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020005","url":null,"abstract":"Many efforts have been exerted toward screening potential drugs for targets, and conducting wet experiments remains a laborious and time-consuming approach. Artificial intelligence methods, such as Convolutional Neural Network (CNN), are widely used to facilitate new drug discovery. Owing to the structural limitations of CNN, features extracted from this method are local patterns that lack global information. However, global information extracted from the whole sequence and local patterns extracted from the special domain can influence the drug-target affinity. A fusion of global information and local patterns can construct neural network calculations closer to actual biological processes. This paper proposes a Fingerprint-embedding framework for Drug-Target binding Affinity prediction (FingerDTA), which uses CNN to extract local patterns and utilizes fingerprints to characterize global information. These fingerprints are generated on the basis of the whole sequence of drugs or targets. Furthermore, FingerDTA achieves comparable performance on Davis and KIBA data sets. In the case study of screening potential drugs for the spike protein of the coronavirus disease 2019 (COVID-19), 7 of the top 10 drugs have been confirmed as potential drugs in the literature. Ultimately, the docking experiment demonstrates that FingerDTA can find novel drug candidates for targets. 
All codes are available at http://lanproxy.biodwhu.cn:9099/mszjaas/FingerDTA.git.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"1-10"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09963624.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WTASR: Wavelet Transformer for Automatic Speech Recognition of Indian Languages","authors":"Tripti Choudhary;Vishal Goyal;Atul Bansal","doi":"10.26599/BDMA.2022.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020017","url":null,"abstract":"Automatic speech recognition systems are developed for translating the speech signals into the corresponding text representation. This translation is used in a variety of applications like voice enabled commands, assistive devices and bots, etc. There is a significant lack of efficient technology for Indian languages. In this paper, a wavelet transformer for automatic speech recognition (WTASR) of Indian languages is proposed. The speech signals suffer from the problem of high and low frequency over different times due to variation in speech of the speaker. Thus, wavelets enable the network to analyze the signal at multiple scales. The wavelet decomposition of the signal is fed in the network for generating the text. The transformer network comprises an encoder-decoder system for speech translation. The model is trained on an Indian language dataset for translation of speech into corresponding text. The proposed method is compared with other state-of-the-art methods. 
The results show that the proposed WTASR has a low word error rate and can be used for effective speech recognition for Indian language.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"85-91"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962811.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human","authors":"Xin Liu;Yaping Lu;Liang Wang;Wei Geng;Xinyi Shi;Xiao Zhang","doi":"10.26599/BDMA.2022.9020031","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020031","url":null,"abstract":"The identification of hepatitis C virus (HCV) virus-human protein interactions will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating studies based on computational methods. In this study, we proposed a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict the interactions among HCV and human proteins. In particular, PSSM was used to characterize each protein, and two-dimensional principal component analysis (2DPCA) was then adopted for feature extraction of PSSM. Finally, rotation forest (RF) was used to implement classification. The results of various ablation experiments show that on independent datasets, the accuracy and area under curve (AUC) value of RF-PSSM can reach 93.74% and 94.29%, respectively, outperforming almost all cutting-edge research. 
In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"6 1","pages":"21-31"},"PeriodicalIF":13.6,"publicationDate":"2022-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9962810/09962955.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68007734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendation System with Biclustering","authors":"Jianjun Sun;Yu Zhang","doi":"10.26599/BDMA.2022.9020012","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020012","url":null,"abstract":"The massive growth of online commercial data has raised the request for an automatic recommender system to benefit both users and merchants. One of the most frequently used recommendation methods is collaborative filtering, but its accuracy is limited by the sparsity of the rating dataset. Most existing collaborative filtering methods consider all features when calculating user/item similarity and ignore much local information. In collaborative filtering, selecting neighbors and determining users' similarities are the most important parts. For the selection of better neighbors, this study proposes a novel biclustering method based on modified fuzzy adaptive resonance theory. To reflect the similarity between users, a new measure that considers the effect of the number of users' common items is proposed. Specifically, the proposed novel biclustering method is first adopted to obtain local similarity and local prediction. Second, item-based collaborative filtering is used to generate global predictions. Finally, the two resultant predictions are fused to obtain a final one. 
Experiment results demonstrate that the proposed method outperforms state-of-the-art models in terms of several aspects on three benchmark datasets.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 4","pages":"282-293"},"PeriodicalIF":13.6,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9832761/09832768.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68067882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}