{"title":"Identification of Proteins and Genes Associated with Hedgehog Signaling Pathway Involved in Neoplasm Formation Using Text-Mining Approach","authors":"","doi":"10.26599/BDMA.2023.9020007","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020007","url":null,"abstract":"Analysis of molecular mechanisms that lead to the development of various types of tumors is essential for biology and medicine, because it may help to find new therapeutic opportunities for cancer treatment and cure, including personalized treatment approaches. One of the pathways known to be important for the development of neoplastic diseases and pathological processes is the Hedgehog signaling pathway, which normally controls human embryonic development. Systematic accumulation of various types of biological data, including interactions between proteins, regulation of gene transcription, and results of proteomics and metabolomics experiments, allows the application of computational analysis of these big data for identification of key molecular mechanisms of certain diseases and pathologies and promising therapeutic targets. The aim of this study is to develop a computational approach for revealing associations between human proteins and genes interacting with the Hedgehog pathway components, as well as for identifying their roles in the development of various types of tumors. We automatically collect sets of abstract texts from the NCBI PubMed bibliographic database. For recognition of the Hedgehog pathway proteins and genes and neoplastic diseases, we use a dictionary-based named entity recognition approach, while for all other proteins and genes a machine learning method is used. For association extraction, we develop a set of semantic rules. We complete the results of the text analysis with the gene set enrichment analysis.
The identified key pathways that may influence the Hedgehog pathway and their roles in tumor development are then verified using the information in the literature.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"25-93"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10373000","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limits of Depth: Over-Smoothing and Over-Squashing in GNNs","authors":"Aafaq Mohi ud din;Shaima Qureshi","doi":"10.26599/BDMA.2023.9020019","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020019","url":null,"abstract":"Graph Neural Networks (GNNs) have become a widely used tool for learning and analyzing data on graph structures, largely due to their ability to preserve graph structure and properties via graph representation learning. However, the effect of depth on the performance of GNNs, particularly isotropic and anisotropic models, remains an active area of research. This study presents a comprehensive exploration of the impact of depth on GNNs, with a focus on the phenomena of over-smoothing and the bottleneck effect in deep graph neural networks. Our research investigates the tradeoff between depth and performance, revealing that increasing depth can lead to over-smoothing and a decrease in performance due to the bottleneck effect. We also examine the impact of node degrees on classification accuracy, finding that nodes with low degrees can pose challenges for accurate classification. Our experiments use several benchmark datasets and a range of evaluation metrics to compare isotropic and anisotropic GNNs of varying depths, and also explore the scalability of these models.
Our findings provide valuable insights into the design of deep GNNs and offer potential avenues for future research to improve their performance.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"205-216"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372997","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PURP: A Scalable System for Predicting Short-Term Urban Traffic Flow Based on License Plate Recognition Data","authors":"Shan Zhang;Qinkai Jiang;Hao Li;Bin Cao;Jing Fan","doi":"10.26599/BDMA.2023.9020017","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020017","url":null,"abstract":"Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real-time, consequently helping them avoid congestion and accidents to a certain extent. However, the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency, which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure. To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining, herein, we propose a scalable system for Predicting short-term URban traffic flow in real-time based on license Plate recognition data (PURP). First, to ensure prediction accuracy, PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition (LPR) data as effective characteristics. Subsequently, to utilize the recent data without retraining the model online, PURP uses the nonparametric method k-Nearest Neighbor (namely KNN) as the prediction framework because the KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming model retraining online. 
The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"171-187"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372996","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gender-Based Analysis of User Reactions to Facebook Posts","authors":"Yassine El Moudene;Jaafar Idrais;Rida El Abassi;Abderrahim Sabour","doi":"10.26599/BDMA.2023.9020005","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020005","url":null,"abstract":"Online Social Networks (OSNs) are based on the sharing of different types of information and on various interactions (comments, reactions, and sharing). One of these important actions is the emotional reaction to the content. The diversity of reaction types available on Facebook (namely FB) enables users to express their feelings, and its traceability creates and enriches the users' emotional identity in the virtual world. This paper is based on the analysis of 119875012 FB reactions (Like, Love, Haha, Wow, Sad, Angry, Thankful, and Pride) made at multiple levels (publications, comments, and sub-comments) to study and classify the users' emotional behavior, visualize the distribution of different types of reactions, and analyze the gender impact on emotion generation. All of these can be achieved by addressing these research questions: who reacts the most? Which emotion is the most expressed?","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"75-86"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372951","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on Event Tracking in Social Media Data Streams","authors":"Zixuan Han;Leilei Shi;Lu Liu;Liang Jiang;Jiawei Fang;Fanyuan Lin;Jinjuan Zhang;John Panneerselvam;Nick Antonopoulos","doi":"10.26599/BDMA.2023.9020021","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020021","url":null,"abstract":"Social networks are an inevitable part of our daily life, where an unprecedented amount of complex data corresponding to a diverse range of applications is generated. As such, it is imperative to conduct research on social events and patterns from the perspectives of conventional sociology to optimize services that originate from social networks. Event tracking in social networks finds various applications, such as network security and societal governance, which involves analyzing data generated by user groups on social networks in real time. Moreover, as deep learning techniques continue to advance and make important breakthroughs in various fields, researchers are using this technology to progressively optimize the effectiveness of Event Detection (ED) and tracking algorithms. In this regard, this paper presents an in-depth, comprehensive review of the concept and methods involved in ED and tracking in social networks. We introduce mainstream event tracking methods, which involve three primary technical steps: ED, event propagation, and event evolution. We then introduce benchmark datasets and evaluation metrics for ED and tracking, which allow comparative analysis of the performance of mainstream methods.
Finally, we present a comprehensive analysis of the main research findings and existing limitations in this field, as well as future research prospects and challenges.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"217-243"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372955","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Scale Feature Fusion Model for Bridge Appearance Defect Detection","authors":"Rong Pang;Yan Yang;Aiguo Huang;Yan Liu;Peng Zhang;Guangwu Tang","doi":"10.26599/BDMA.2022.9020048","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020048","url":null,"abstract":"Although the Faster Region-based Convolutional Neural Network (Faster R-CNN) model has obvious advantages in defect recognition, it still cannot overcome challenging problems, such as time-consuming training, small targets, irregular shapes, and strong noise interference in bridge defect detection. To deal with these issues, this paper proposes a novel Multi-scale Feature Fusion (MFF) model for bridge appearance defect detection. First, the Faster R-CNN model adopts Region Of Interest (ROI) pooling, which omits the edge information of the target area, resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects. Therefore, this paper proposes an MFF based on regional feature Aggregation (MFF-A), which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area. Second, the Faster R-CNN model is insensitive to small targets, irregular shapes, and strong noises in bridge defect detection, which results in a long training time and low recognition accuracy. Accordingly, a novel Lightweight MFF (namely MFF-L) model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed, which fuses multi-scale features to shorten the training time and improve recognition accuracy.
Finally, the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"1-11"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372954","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cell Consistency Evaluation Method Based on Multiple Unsupervised Learning Algorithms","authors":"Jiang Chang;Xianglong Gu;Jieyun Wu;Debu Zhang","doi":"10.26599/BDMA.2023.9010003","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9010003","url":null,"abstract":"Unsupervised learning algorithms can effectively solve sample imbalance. To address battery consistency anomalies in new energy vehicles, we adopt a variety of unsupervised learning algorithms to evaluate and predict the battery consistency of three vehicles using charging fragment data from actual operating conditions. We extract battery-related features, such as the mean of maximum difference, standard deviation, and entropy of batteries and then apply principal component analysis to reduce the dimensionality and record the amount of preserved information. We then build models through a collection of unsupervised learning algorithms for the anomaly detection of cell consistency faults. We also determine whether unsupervised and supervised learning algorithms can address the battery consistency problem and document the parameter tuning process. In addition, we compare the prediction effectiveness of charging and discharging features modeled individually and in combination, determine the choice of charging and discharging features to be modeled in combination, and visualize the multidimensional data for fault detection. Experimental results show that the unsupervised learning algorithm is effective in visualizing and predicting vehicle core conformance faults, and can accurately predict faults in real time. The “distance-boxplot” algorithm shows the best performance with a prediction accuracy of 80%, a recall rate of 100%, and an F1 of 0.89. 
The proposed approach can be applied to monitor battery consistency faults in real time and reduce the possibility of disasters arising from consistency faults.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"42-54"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372956","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a High-Performance Graph Storage on Top of Tree-Structured Key-Value Stores","authors":"Heng Lin;Zhiyong Wang;Shipeng Qi;Xiaowei Zhu;Chuntao Hong;Wenguang Chen;Yingwei Luo","doi":"10.26599/BDMA.2023.9020015","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020015","url":null,"abstract":"Graph databases have gained widespread adoption in various industries and have been utilized in a range of applications, including financial risk assessment, commodity recommendation, and data lineage tracking. While the principles and design of these databases have been the subject of some investigation, there remains a lack of comprehensive examination of aspects such as storage layout, query language, and deployment. The present study focuses on the design and implementation of graph storage layout, with a particular emphasis on tree-structured key-value stores. We also examine different design choices in the graph storage layer and present our findings through the development of TuGraph, a highly efficient single-machine graph database that significantly outperforms well-known Graph DataBase Management Systems (GDBMSs). Additionally, TuGraph demonstrates superior performance in the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) interactive benchmark.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"156-170"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372995","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Based CNN Fusion Model for Emotion Recognition During Walking Using Discrete Wavelet Transform on EEG and Inertial Signals","authors":"Yan Zhao;Ming Guo;Xiangyong Chen;Jianqiang Sun;Jianlong Qiu","doi":"10.26599/BDMA.2023.9020018","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020018","url":null,"abstract":"Walking as a unique biometric tool conveys important information for emotion recognition. Individuals in different emotional states exhibit distinct walking patterns. For this purpose, this paper proposes a novel approach to recognizing emotion during walking using electroencephalogram (EEG) and inertial signals. Accurate recognition of emotion is achieved by training in an end-to-end deep learning fashion and taking into account multi-modal fusion. Subjects wear virtual reality head-mounted display (VR-HMD) equipment to immerse in strong emotions during walking. The VR environment shows excellent imitation and experience ability, which plays an important role in awakening and changing emotions. In addition, the multi-modal signals acquired from EEG and inertial sensors are separately represented as virtual emotion images by discrete wavelet transform (DWT). These serve as input to the attention-based convolutional neural network (CNN) fusion model. The designed network structure is simple and lightweight while integrating the channel attention mechanism to extract and enhance features. To effectively improve the performance of the recognition system, the proposed decision fusion algorithm combines the Critic method and a majority voting strategy to determine the weight values that affect the final decision results. An investigation is made on the effect of diverse mother wavelet types and wavelet decomposition levels on model performance, which indicates that the 2.2-order reverse biorthogonal (rbio2.2) wavelet with two-level decomposition has the best recognition performance.
Comparative experiment results show that the proposed method outperforms other existing state-of-the-art works with an accuracy of 98.73%.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"188-204"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372999","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Call for Papers: Special Issue on Big Data Computing for Cyber Physical Social Intelligence","authors":"","doi":"10.26599/BDMA.2023.9020031","DOIUrl":"https://doi.org/10.26599/BDMA.2023.9020031","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"7 1","pages":"245-245"},"PeriodicalIF":0.0,"publicationDate":"2023-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10372960","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139041278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}