{"title":"An Optimized Sanitization Approach for Minable Data Publication","authors":"Fan Yang;Xiaofeng Liao","doi":"10.26599/BDMA.2022.9020007","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020007","url":null,"abstract":"Minable data publication is ubiquitous since it is beneficial to sharing/trading data among commercial companies and further facilitates the development of data-driven tasks. Unfortunately, the minable data publication is often implemented by publishers with limited privacy concerns such that the published dataset is minable by malicious entities. It prohibits minable data publication since the published data may contain sensitive information. Thus, it is urgently demanded to present some approaches and technologies for reducing the privacy leakage risks. To this end, in this paper, we propose an optimized sanitization approach for minable data publication (named as SA-MDP). SA-MDP supports association rules mining function while providing privacy protection for specific rules. In SA-MDP, we consider the trade-off between the data utility and the data privacy in the minable data publication problem. To address this problem, SA-MDP designs a customized particle swarm optimization (PSO) algorithm, where the optimization objective is determined by both the data utility and the data privacy. Specifically, we take advantage of PSO to produce new particles, which is achieved by random mutation or learning from the best particle. Hence, SA-MDP can avoid the solutions being trapped into local optima. Besides, we design a proper fitness function to guide the particles to run towards the optimal solution. Additionally, we present a preprocessing method before the evolution process of the customized PSO algorithm to improve the convergence rate. Finally, the proposed SA-MDP approach is performed and verified over several datasets. The experimental results have demonstrated the effectiveness and efficiency of SA-MDP.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 3","pages":"257-269"},"PeriodicalIF":13.6,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9793354/09793357.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68010341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Review Towards Big Data Analytics in Social Media","authors":"Md. Saifur Rahman;Hassan Reza","doi":"10.26599/BDMA.2022.9020009","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020009","url":null,"abstract":"The recent advancement in internet 2.0 creates a scope to connect people worldwide using society 2.0 and web 2.0 technologies. This new era allows the consumer to directly connect with other individuals, business corporations, and the government. People are open to sharing opinions, views, and ideas on any topic in different formats out loud. This creates the opportunity to make the “Big Social Data” handy by implementing machine learning approaches and social data analytics. This study offers an overview of recent works in social media, data science, and machine learning to gain a wide perspective on social media big data analytics. We explain why social media data are significant elements of the improved data-driven decision-making process. We propose and build the “Sunflower Model of Big Data” to define big data and bring it up to date with technology by combining 5 V's and 10 Bigs. We discover the top ten social data analytics to work in the domain of social media platforms. A comprehensive list of relevant statistical/machine learning methods to implement each of these big data analytics is discussed in this work. “Text Analytics” is the most used analytics in social data analysis to date. We create a taxonomy on social media analytics to meet the need and provide a clear understanding. Tools, techniques, and supporting data type are also discussed in this research work. As a result, researchers will have an easier time deciding which social data analytics would best suit their needs.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 3","pages":"228-244"},"PeriodicalIF":13.6,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9793354/09793356.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68010343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Intelligence Quotient Using Stylometry and Machine Learning Techniques: A Review","authors":"Glory O. Adebayo;Roman V. Yampolskiy","doi":"10.26599/BDMA.2022.9020002","DOIUrl":"https://doi.org/10.26599/BDMA.2022.9020002","url":null,"abstract":"The task of trying to quantify a person's intelligence has been a goal of psychologists for over a century. The area of estimating IQ using stylometry has been a developing area of research and the effectiveness of using machine learning in stylometry analysis for the estimation of IQ has been demonstrated in literature whose conclusions suggest that using a large dataset could improve the quality of estimation. The unavailability of large datasets in this area of research has led to very few publications in IQ estimation from written text. In this paper, we review studies that have been done in IQ estimation and also that have been done in author profiling using stylometry and we conclude that based on the success of IQ estimation and author profiling with stylometry, a study on IQ estimation from written text using stylometry will yield good results if the right dataset is used.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 3","pages":"163-191"},"PeriodicalIF":13.6,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9793354/09793359.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"68010345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"News topic detection based on capsule semantic graph","authors":"Shuang Yang;Yan Tang","doi":"10.26599/BDMA.2021.9020023","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020023","url":null,"abstract":"Most news topic detection methods use word-based methods, which easily ignore the relationship among words and have semantic sparsity, resulting in low topic detection accuracy. In addition, the current mainstream probability methods and graph analysis methods for topic detection have high time complexity. For these reasons, we present a news topic detection model on the basis of capsule semantic graph (CSG). The keywords that appear in each text at the same time are modeled as a keyword graph, which is divided into multiple subgraphs through community detection. Each subgraph contains a group of closely related keywords. The graph is used as the vertex of CSG. The semantic relationship among the vertices is obtained by calculating the similarity of the average word vector of each vertex. At the same time, the news text is clustered using the incremental clustering method, where each text uses CSG; that is, the similarity among texts is calculated by the graph kernel. The relationship between vertices and edges is also considered when calculating the similarity. Experimental results on three standard datasets show that CSG can obtain higher precision, recall, and F1 values than several latest methods. Experimental results on large-scale news datasets reveal that the time complexity of CSG is lower than that of probabilistic methods and other graph analysis methods.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 2","pages":"98-109"},"PeriodicalIF":13.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9691293/09691297.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67994283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning in nuclear industry: A survey","authors":"Chenwei Tang;Caiyang Yu;Yi Gao;Jianming Chen;Jiaming Yang;Jiuling Lang;Chuan Liu;Ling Zhong;Zhenan He;Jiancheng Lv","doi":"10.26599/BDMA.2021.9020027","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020027","url":null,"abstract":"As a high-tech strategic emerging comprehensive industry, the nuclear industry is committed to the research, production, and processing of nuclear fuel, as well as the development and utilization of nuclear energy. Nowadays, the nuclear industry has made remarkable progress in the application fields of nuclear weapons, nuclear power, nuclear medical treatment, radiation processing, and so on. With the development of artificial intelligence and the proposal of \"Industry 4.0\", more and more artificial intelligence technologies are introduced into the nuclear industry chain to improve production efficiency, reduce operation cost, improve operation safety, and realize risk avoidance. Meanwhile, deep learning, as an important technology of artificial intelligence, has made amazing progress in theoretical and applied research in the nuclear industry, which vigorously promotes the development of informatization, digitization, and intelligence of the nuclear industry. In this paper, we first simply comb and analyze the intelligent demand scenarios in the whole industrial chain of the nuclear industry. Then, we discuss the data types involved in the nuclear industry chain. After that, we investigate the research status of deep learning in the application fields corresponding to different data types in the nuclear industry. Finally, we discuss the limitation and unique challenges of deep learning in the nuclear industry and the future direction of the intelligent nuclear industry.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 2","pages":"140-160"},"PeriodicalIF":13.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9691293/09691301.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67834075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding social relationships with person-pair relations","authors":"Hang Zhao;Haicheng Chen;Leilai Li;Hai Wan","doi":"10.26599/BDMA.2021.9020022","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020022","url":null,"abstract":"Social relationship understanding infers existing social relationships among individuals in a given scenario, which has been demonstrated to have a wide range of practical value in reality. However, existing methods infer the social relationship of each person pair in isolation, without considering the context-aware information for person pairs in the same scenario. The context-aware information for person pairs exists extensively in reality, that is, the social relationships of different person pairs in a simple scenario are always related to each other. For instance, if most of the person pairs in a simple scenario have the same social relationship, \"friends\", then the other pairs have a high probability of being \"friends\" or other similar coarse-level relationships, such as \"intimate\". This context-aware information should thus be considered in social relationship understanding. Therefore, this paper proposes a novel end-to-end trainable Person-Pair Relation Network (PPRN), which is a GRU-based graph inference network, to first extract the visual and position information as the person-pair feature information, then enable it to transfer on a fully-connected social graph, and finally utilizes different aggregators to collect different kinds of person-pair information. Unlike existing methods, the method—with its message passing mechanism in the graph model—can infer the social relationship of each person-pair in a joint way (i.e., not in isolation). Extensive experiments on People In Social Context (PISC)- and People In Photo Album (PIPA)-relation datasets show the superiority of our method compared to other methods.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 2","pages":"120-129"},"PeriodicalIF":13.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9691293/09691299.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67994284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A mini-review of machine learning in big data analytics: Applications, challenges, and prospects","authors":"Isaac Kofi Nti;Juanita Ahia Quarcoo;Justice Aning;Godfred Kusi Fosu","doi":"10.26599/BDMA.2021.9020028","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020028","url":null,"abstract":"The availability of digital technology in the hands of every citizenry worldwide makes an available unprecedented massive amount of data. The capability to process these gigantic amounts of data in real-time with Big Data Analytics (BDA) tools and Machine Learning (ML) algorithms carries many paybacks. However, the high number of free BDA tools, platforms, and data mining tools makes it challenging to select the appropriate one for the right task. This paper presents a comprehensive mini-literature review of ML in BDA, using a keyword search; a total of 1512 published articles was identified. The articles were screened to 140 based on the study proposed novel taxonomy. The study outcome shows that deep neural networks (15%), support vector machines (15%), artificial neural networks (14%), decision trees (12%), and ensemble learning techniques (11%) are widely applied in BDA. The related applications fields, challenges, and most importantly the openings for future research, are detailed.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 2","pages":"81-97"},"PeriodicalIF":13.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9691293/09691296.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67994391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel influence maximization algorithm for a competitive environment based on social media data analytics","authors":"Jie Tong;Leilei Shi;Lu Liu;John Panneerselvam;Zixuan Han","doi":"10.26599/BDMA.2021.9020024","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020024","url":null,"abstract":"Online social networks are increasingly connecting people around the world. Influence maximization is a key area of research in online social networks, which identifies influential users during information dissemination. Most of the existing influence maximization methods only consider the transmission of a single channel, but real-world networks mostly include multiple channels of information transmission with competitive relationships. The problem of influence maximization in an environment involves selecting the seed node set for certain competitive information, so that it can avoid the influence of other information, and ultimately affect the largest set of nodes in the network. In this paper, the influence calculation of nodes is achieved according to the local community discovery algorithm, which is based on community dispersion and the characteristics of dynamic community structure. Furthermore, considering two various competitive information dissemination cases as an example, a solution is designed for self-interested information based on the assumption that the seed node set of competitive information is known, and a novel influence maximization algorithm of node avoidance based on user interest is proposed. Experiments conducted based on real-world Twitter dataset demonstrates the efficiency of our proposed algorithm in terms of accuracy and time against notable influence maximization algorithms.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 2","pages":"130-139"},"PeriodicalIF":13.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9691293/09691300.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67834095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MAGAN: Unsupervised low-light image enhancement guided by mixed-attention","authors":"Renjun Wang;Bin Jiang;Chao Yang;Qiao Li;Bolin Zhang","doi":"10.26599/BDMA.2021.9020020","DOIUrl":"https://doi.org/10.26599/BDMA.2021.9020020","url":null,"abstract":"Most learning-based low-light image enhancement methods typically suffer from two problems. First, they require a large amount of paired data for training, which are difficult to acquire in most cases. Second, in the process of enhancement, image noise is difficult to be removed and may even be amplified. In other words, performing denoising and illumination enhancement at the same time is difficult. As an alternative to supervised learning strategies that use a large amount of paired data, as presented in previous work, this paper presents a mixed-attention guided generative adversarial network called MAGAN for low-light image enhancement in a fully unsupervised fashion. We introduce a mixed-attention module layer, which can model the relationship between each pixel and feature of the image. In this way, our network can enhance a low-light image and remove its noise simultaneously. In addition, we conduct extensive experiments on paired and no-reference datasets to show the superiority of our method in enhancing low-light images.","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"5 2","pages":"110-119"},"PeriodicalIF":13.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8254253/9691293/09691298.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"67834094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Call for Papers: Special Issue on Role & Impact of Advance Technologies AI, ML, and Big Data in Business and Society","authors":"","doi":"10.26599/bdma.2022.9020020","DOIUrl":"https://doi.org/10.26599/bdma.2022.9020020","url":null,"abstract":"","PeriodicalId":52355,"journal":{"name":"Big Data Mining and Analytics","volume":"30 1","pages":""},"PeriodicalIF":13.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"69029454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}