{"title":"Massive Parallel Alignment of RNA-seq Reads in Serverless Computing","authors":"Pietro Cinaglia, J. L. Vázquez-Poletti, M. Cannataro","doi":"10.3390/bdcc7020098","DOIUrl":"https://doi.org/10.3390/bdcc7020098","url":null,"abstract":"In recent years, the use of Cloud infrastructures for data processing has proven useful, with a computing potential that is not affected by the limitations of a local infrastructure. In this context, Serverless computing is the fastest-growing Cloud service model due to its auto-scaling methodologies, reliability, and fault tolerance. We present a solution based on an in-house Serverless infrastructure, which is able to perform large-scale RNA-seq data analysis focused on the mapping of sequencing reads to a reference genome. The main contribution was bringing the computation of genomic data into serverless computing, focusing on RNA-seq read-mapping to a reference genome, as this is the most time-consuming task for some pipelines. The proposed solution handles massive parallel instances to maximize the efficiency in terms of running time. We evaluated the performance of our solution by performing two main tests, both based on the mapping of RNA-seq reads to Human GRCh38. Our experiments demonstrated a reduction in running time of 79.838%, 90.079%, and 96.382%, compared to the local environments with 16, 8, and 4 virtual cores, respectively. 
Furthermore, serverless limitations were investigated.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49668088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SQL and NoSQL Database Software Architecture Performance Analysis and Assessments—A Systematic Literature Review","authors":"Wisal Khan, Teerath Kumar, Cheng Zhang, Kislay Raj, Arunabha M. Roy, Bin Luo","doi":"10.3390/bdcc7020097","DOIUrl":"https://doi.org/10.3390/bdcc7020097","url":null,"abstract":"A competent software architecture plays a crucial role in the difficult task of big data processing for SQL and NoSQL databases. SQL databases were created to organize data and allow for vertical expansion. NoSQL databases, on the other hand, support horizontal scalability and can efficiently process large amounts of unstructured data. Organizational needs determine which paradigm is appropriate, yet selecting the best option is not always easy. Differences in database design are what set SQL and NoSQL databases apart. Each NoSQL database type also consistently employs a mixed-model approach. Therefore, it is challenging for cloud users to transfer their data among different cloud storage services (CSPs). There are several different paradigms being monitored by the various cloud platforms (IaaS, PaaS, SaaS, and DBaaS). The purpose of this SLR is to examine the articles that address cloud data portability and interoperability, as well as the software architectures of SQL and NoSQL databases. Numerous studies comparing the capabilities of SQL and NoSQL databases, particularly Oracle RDBMS and NoSQL Document Database (MongoDB), in terms of scale, performance, availability, consistency, and sharding, were presented as part of the state of the art. 
Research indicates that NoSQL databases, with their specifically tailored structures, may be the best option for big data analytics, while SQL databases are best suited for online transaction processing (OLTP) purposes.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135288679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Proposal for a Virtual Shopping Assistant for People with Vision Problems Applying Artificial Intelligence Techniques","authors":"W. Villegas-Ch., Rodrigo Amores-Falconi, Eduardo Coronel-Silva","doi":"10.3390/bdcc7020096","DOIUrl":"https://doi.org/10.3390/bdcc7020096","url":null,"abstract":"Accessibility is an increasingly important topic for Ecommerce, especially for individuals with vision problems. To improve their online experience, the design of a voice assistant has been proposed to allow these individuals to browse and shop online more quickly and efficiently. This voice assistant forms an intelligent system that can understand and respond to users’ voice commands. The design considers the visual limitations of the users, such as difficulty reading information on the screen or identifying images. The voice assistant provides detailed product descriptions and ideas in a clear, easy-to-understand voice. In addition, the voice assistant has a series of additional features to improve the shopping experience. For example, the assistant can provide product recommendations based on the user’s previous purchases and information about special promotions and discounts. The main goal of this design is to create an accessible and inclusive online shopping experience for the visually impaired. 
The voice assistant is based on a conversational user interface, allowing users to easily navigate an eCommerce website, search for products, and make purchases.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46300721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual Reality-Based Digital Twins: A Case Study on Pharmaceutical Cannabis","authors":"Orestis Spyrou, W. Hurst, C. Verdouw","doi":"10.3390/bdcc7020095","DOIUrl":"https://doi.org/10.3390/bdcc7020095","url":null,"abstract":"Digital Twins are digital equivalents of real-life objects. They allow producers to act immediately in case of (expected) deviations and to simulate effects of interventions based on real-life data. Digital Twin and eXtended Reality technologies (including Augmented Reality, Mixed Reality and Virtual Reality technologies), when coupled, are promising solutions to address the challenges of highly regulated crop production, namely the complexity of modern production environments for pharmaceutical cannabis, which are growing constantly as a result of legislative changes. Cannabis farms not only have to meet very high quality standards and regulatory requirements but also have to deal with high production and market uncertainties, including energy considerations. Thus, the main contributions of the research include an architecture design for eXtended-Reality-based Digital Twins for pharmaceutical cannabis production and a proof of concept, which was demonstrated at the Wageningen University Digital Twins conference. A convenience sampling method was used to recruit 30 participants who provided feedback on the application. 
The findings indicate that, despite 70% being unfamiliar with the concept, 80% of the participants were positive regarding the innovation and creativity.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48678556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Artificial Intelligence for Fraudulent Banking Operations Recognition","authors":"Bohdan Mytnyk, Oleksandr Tkachyk, Nataliya Shakhovska, Solomiia Fedushko, Yuriy Syerov","doi":"10.3390/bdcc7020093","DOIUrl":"https://doi.org/10.3390/bdcc7020093","url":null,"abstract":"This study considers the task of applying artificial intelligence to recognize bank fraud. In recent years, due to the COVID-19 pandemic, bank fraud has become even more common due to the massive transition of many operations to online platforms and the creation of many charitable funds that criminals can use to deceive users. The present work focuses on machine learning algorithms as a tool well suited for analyzing and recognizing online banking transactions. The study’s scientific novelty is the development of machine learning models for identifying fraudulent banking transactions and techniques for preprocessing bank data for further comparison and selection of the best results. This paper also details various methods for improving detection accuracy, i.e., handling highly imbalanced datasets, feature transformation, and feature engineering. The proposed model, which is based on an artificial neural network, effectively improves the accuracy of fraudulent transaction detection. The results of the different algorithms are visualized, and the logistic regression algorithm performs the best, with an output AUC value of approximately 0.946. The stacked generalization shows a better AUC of 0.954. 
The recognition of banking fraud using artificial intelligence algorithms is a topical issue in our digital society.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135572682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Improved Pattern Sequence-Based Energy Load Forecast Algorithm Based on Self-Organizing Maps and Artificial Neural Networks","authors":"D. Criado-Ramón, L. Ruiz, M. Pegalajar","doi":"10.3390/bdcc7020092","DOIUrl":"https://doi.org/10.3390/bdcc7020092","url":null,"abstract":"Pattern sequence-based models are a type of forecasting algorithm that utilizes clustering and other techniques to produce easily interpretable predictions faster than traditional machine learning models. This research focuses on their application in energy demand forecasting and introduces two significant contributions to the field. Firstly, this study evaluates the use of pattern sequence-based models with large datasets. Unlike previous works that use only one dataset or multiple datasets with less than two years of data, this work evaluates the models in three different public datasets, each containing eleven years of data. Secondly, we propose a new pattern sequence-based algorithm that uses a genetic algorithm to optimize the number of clusters alongside all other hyperparameters of the forecasting method, instead of using the Cluster Validity Indices (CVIs) commonly used in previous proposals. 
The results indicate that neural networks provide more accurate results than any pattern sequence-based algorithm and that our proposed algorithm outperforms other pattern sequence-based algorithms, albeit with a longer training time.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47740495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognizing Similar Musical Instruments with YOLO Models","authors":"Christine Dewi, Abbott Po Shun Chen, Henoch Juli Christanto","doi":"10.3390/bdcc7020094","DOIUrl":"https://doi.org/10.3390/bdcc7020094","url":null,"abstract":"Researchers in the fields of machine learning and artificial intelligence have recently begun to focus their attention on object recognition. One of the biggest obstacles in image recognition through computer vision is the detection and identification of similar items. Identifying similar musical instruments can be approached as a classification problem, where the goal is to train a machine learning model to classify instruments based on their features and shape. Cellos, clarinets, erhus, guitars, saxophones, trumpets, French horns, harps, recorders, bassoons, and violins were all classified in this investigation. Many different musical instruments have a similar size, shape, and sound. In addition, humans can identify very similar items with remarkable ease, whereas this remains a challenging task for computers. For this study, we used YOLOv7 to identify pairs of musical instruments that are most similar to one another. Next, we compared and evaluated the results from YOLOv7 with those from YOLOv5. Furthermore, the results of our tests allowed us to enhance the performance in terms of detecting similar musical instruments. 
Moreover, with an average accuracy of 86.7%, YOLOv7 outperformed previous approaches and other research results.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43087739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain-Based Double-Layer Byzantine Fault Tolerance for Scalability Enhancement for Building Information Modeling Information Exchange","authors":"Widya Nita Suliyanti, Riri Fitri Sari","doi":"10.3390/bdcc7020090","DOIUrl":"https://doi.org/10.3390/bdcc7020090","url":null,"abstract":"Practical Byzantine Fault Tolerance (PBFT) is a consensus algorithm deployed in a consortium blockchain that connects a group of related participants. This type of blockchain suits the implementation of the Building Information Modeling (BIM) information exchange with few participants. However, when many more participants are involved in the BIM information exchange, the PBFT algorithm, which inherently requires intensive communications among participating nodes, has limitations in terms of scalability and performance. The proposed solution for a multi-layer BFT hypothesizes that multi-layer BFT reduces communication complexity. However, having more layers will introduce more latency. Therefore, in this paper, Double-Layer Byzantine Fault Tolerance (DLBFT) is proposed to improve the blockchain scalability and performance of BIM information exchange. This study shows that a double-layer network structure of nodes can be built in which each node on the first layer connects to and forms a group with several nodes on the second layer. This network runs the Byzantine Fault Tolerance algorithm to reach a consensus. Instead of having one node send messages to all the nodes in the peer-to-peer network, one node only sends messages to a limited number of nodes on Layer 1 and up to three nodes in each corresponding group in Layer 2 in a hierarchical network. The DLBFT algorithm has been shown to reduce the required number of messages exchanged among nodes by 84% and the time to reach a consensus by 70%, thus improving blockchain scalability. 
Further research is required if more than one party is involved in multi-BIM projects.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42259504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Cell Cleavage Timings from Time-Lapse Videos of Human Embryos","authors":"Akriti Sharma, Ayaz Z. Ansari, R. Kakulavarapu, M. Stensen, M. Riegler, H. Hammer","doi":"10.3390/bdcc7020091","DOIUrl":"https://doi.org/10.3390/bdcc7020091","url":null,"abstract":"Assisted reproductive technology is used for treating infertility, and its success relies on the quality and viability of embryos chosen for uterine transfer. Currently, embryologists manually assess embryo development, including the time duration between the cell cleavages. This paper introduces a machine learning methodology for automating the computations for the start of cell cleavage stages, in hours post insemination, in time-lapse videos. The methodology detects embryo cells in video frames and predicts the frame with the onset of the cell cleavage stage. Next, the methodology reads hours post insemination from the frame using optical character recognition. Unlike traditional embryo cell detection techniques, our suggested approach eliminates the need for extra image processing tasks such as locating embryos or removing extracellular material (fragmentation). The methodology accurately predicts cell cleavage stages up to five cells. The methodology was also able to detect the morphological structures of later cell cleavage stages, such as morula and blastocyst. 
It takes about one minute for the methodology to annotate the times of all the cell cleavages in a time-lapse video.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49264161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding the Time-Period-Based Most Frequent Path from Trajectory–Topology","authors":"Jianing Ding, Xin Jin, Zhiheng Li","doi":"10.3390/bdcc7020088","DOIUrl":"https://doi.org/10.3390/bdcc7020088","url":null,"abstract":"The Time-Period-Based Most Frequent Path (TPMFP) problem has been a hot topic in traffic studies for many years. The TPMFP problem involves finding the most frequent path between two locations by observing the travelling behaviors of drivers in a specific time period. However, previous researchers over-simplified the road network, and as a result the transfer costs at intersections were ignored. To address this problem more elegantly, we built up an urban topology model consisting of Intersection Vertices and Connection Vertices. Specifically, we split the Intersection Vertices to eliminate the influence of transfer cost on finding TPMFP and generate Trajectory–Topology from GPS records data. In addition, we further leveraged the Footmark Graph method to find the TPMFP. Finally, we conducted extensive experiments using a real-world dataset containing over eight million GPS records. Compared to the current state-of-the-art method, our proposed approach can find a more reasonable MFP in approximately 10% of cases during off-peak hours and 40% of cases during peak hours.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" ","pages":""},"PeriodicalIF":3.7,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44219195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}