Eric Austin, Shraddha Makwana, Amine Trabelsi, Christine Largeron, Osmar R Zaïane
{"title":"Uncovering Flat and Hierarchical Topics by Community Discovery on Word Co-occurrence Network.","authors":"Eric Austin, Shraddha Makwana, Amine Trabelsi, Christine Largeron, Osmar R Zaïane","doi":"10.1007/s41019-023-00239-2","DOIUrl":"10.1007/s41019-023-00239-2","url":null,"abstract":"Topic modeling aims to discover latent themes in collections of text documents. It has various applications across fields such as sociology, opinion analysis, and media studies. In such areas, it is essential to have easily interpretable, diverse, and coherent topics. An efficient topic modeling technique should accurately identify flat and hierarchical topics, especially useful in disciplines where topics can be logically arranged into a tree format. In this paper, we propose Community Topic, a novel algorithm that exploits word co-occurrence networks to mine communities and produces topics. We also evaluate the proposed approach using several metrics and compare it with usual baselines, confirming its good performances. Community Topic enables quick identification of flat topics and topic hierarchy, facilitating the on-demand exploration of sub- and super-topics. It also obtains good results on datasets in different languages.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"9 1","pages":"41-61"},"PeriodicalIF":4.2,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10980674/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140337633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Forkan, Yongjin Kang, Felip Martí, Abhik Banerjee, Chris McCarthy, Hadi Ghaderi, Breno Costa, Anas Dawod, Dimitrios Georgakopolous, P. Jayaraman
{"title":"AIoT-CitySense: AI and IoT-Driven City-Scale Sensing for Roadside Infrastructure Maintenance","authors":"A. Forkan, Yongjin Kang, Felip Martí, Abhik Banerjee, Chris McCarthy, Hadi Ghaderi, Breno Costa, Anas Dawod, Dimitrios Georgakopolous, P. Jayaraman","doi":"10.1007/s41019-023-00236-5","DOIUrl":"https://doi.org/10.1007/s41019-023-00236-5","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":" 34","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138962482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anomaly Detection with Sub-Extreme Values: Health Provider Billing","authors":"Rob Muspratt, Musa Mammadov","doi":"10.1007/s41019-023-00234-7","DOIUrl":"https://doi.org/10.1007/s41019-023-00234-7","url":null,"abstract":"","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"90 1","pages":""},"PeriodicalIF":4.2,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139211057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sharon Torao Pingi, Duoyi Zhang, Md Abul Bashar, Richi Nayak
{"title":"Joint Representation Learning with Generative Adversarial Imputation Network for Improved Classification of Longitudinal Data","authors":"Sharon Torao Pingi, Duoyi Zhang, Md Abul Bashar, Richi Nayak","doi":"10.1007/s41019-023-00232-9","DOIUrl":"https://doi.org/10.1007/s41019-023-00232-9","url":null,"abstract":"Generative adversarial networks (GANs) have demonstrated their effectiveness in generating temporal data to fill in missing values, enhancing the classification performance of time series data. Longitudinal datasets encompass multivariate time series data with additional static features that contribute to sample variability over time. These datasets often encounter missing values due to factors such as irregular sampling. However, existing GAN-based imputation methods that address this type of data missingness often overlook the impact of static features on temporal observations and classification outcomes. This paper presents a novel method, fusion-aided imputer-classifier GAN (FaIC-GAN), tailored for longitudinal data classification. FaIC-GAN simultaneously leverages partially observed temporal data and static features to enhance imputation and classification learning. We present four multimodal fusion strategies that effectively extract correlated information from both static and temporal modalities. Our extensive experiments reveal that FaIC-GAN successfully exploits partially observed temporal data and static features, resulting in improved classification accuracy compared to unimodal models. Our post-additive and attention-based multimodal fusion approaches within the FaIC-GAN model consistently rank among the top three methods for classification.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135996121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Reinduction-Based Approach for Efficient High Utility Itemset Mining from Incremental Datasets","authors":"Pushp Sra, Satish Chand","doi":"10.1007/s41019-023-00229-4","DOIUrl":"https://doi.org/10.1007/s41019-023-00229-4","url":null,"abstract":"High utility itemset mining is a crucial research area that focuses on identifying combinations of itemsets from databases that possess a utility value higher than a user-specified threshold. However, most existing algorithms assume that the databases are static, which is not realistic for real-life datasets that are continuously growing with new data. Furthermore, existing algorithms only rely on the utility value to identify relevant itemsets, leading to even the earliest occurring combinations being produced as output. Although some mining algorithms adopt a support-based approach to account for itemset frequency, they do not consider the temporal nature of itemsets. To address these challenges, this paper proposes the Scented Utility Miner (SUM) algorithm that uses a reinduction strategy to track the recency of itemset occurrence and mine itemsets from incremental databases. The paper provides a novel approach for mining high utility itemsets from dynamic databases and presents several experiments that demonstrate the effectiveness of the proposed approach.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135244606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shanna Zhong, Jiahui Wang, Kun Yue, Liang Duan, Zhengbao Sun, Yan Fang
{"title":"Few-Shot Relation Prediction of Knowledge Graph via Convolutional Neural Network with Self-Attention","authors":"Shanna Zhong, Jiahui Wang, Kun Yue, Liang Duan, Zhengbao Sun, Yan Fang","doi":"10.1007/s41019-023-00230-x","DOIUrl":"https://doi.org/10.1007/s41019-023-00230-x","url":null,"abstract":"Knowledge graph (KG) has become the vital resource for various applications like question answering and recommendation system. However, several relations in KG only have few observed triples, which makes it necessary to develop the method for few-shot relation prediction. In this paper, we propose the Convolutional Neural Network with Self-Attention Relation Prediction (CARP) model to predict new facts with few observed triples. First, to learn the relation property features, we build a feature encoder by using the convolutional neural network with self-attention from the few observed triples rather than background knowledge. Then, by incorporating the learned features, we give an embedding network to learn the representation of incomplete triples. Finally, we give the loss function and training algorithm of our CARP model. Experimental results on three real-world datasets show that our proposed method improves Hits@10 by 48% on average over the state-of-the-art competitors.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136309292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Keywords Search in Temporal Social Networks","authors":"Youming Ge, Zitong Chen, Yubao Liu","doi":"10.1007/s41019-023-00218-7","DOIUrl":"https://doi.org/10.1007/s41019-023-00218-7","url":null,"abstract":"With the increasing of requirements from many aspects, various queries and analyses arise focusing on social network. Time is a common and necessary dimension in various types of social networks. Social networks with time information are called temporal social networks, in which time information can be the time when a user sends message to another user. Keywords search in temporal social networks consists of finding relationships between a group users that has a set of query labels and is valid within the query time interval. It provides assistance in social network analysis, classification of social network users, community detection, etc. However, the existing methods have limitations in solving temporal social network keyword search problems. We propose a basic algorithm, the discrete timestamp algorithm, with the intention of turning the problem into a traditional keyword search on social networks. We also propose an approximative algorithm based on the discrete timestamp algorithm, but it still suffers from the traditional algorithms’ low efficiency. To further improve the performance, we propose a new algorithm based on dynamic programming to solve the keyword search in temporal social network. The main idea is to extend a vertex into a solution by edge-growth operation and tree-merger operation. We also propose two powerful pruning techniques to reduce the intermediate results during the extension. Additionally, all of the algorithms we proposed are capable of handling a variety of ranking functions, and all of them can be made to conform to top-N keyword querying. The efficiency and effectiveness of the proposed algorithms are verified through extensive empirical studies.","PeriodicalId":52220,"journal":{"name":"Data Science and Engineering","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136193060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}