{"title":"Robustness verification of k-nearest neighbors by abstract interpretation","authors":"Nicolò Fassina, Francesco Ranzato, Marco Zanella","doi":"10.1007/s10115-024-02108-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02108-4","url":null,"abstract":"<p>We study the certification of stability properties, such as robustness and individual fairness, of the <i>k</i>-nearest neighbor algorithm (<i>k</i>NN). Our approach leverages abstract interpretation, a well-established program analysis technique that has been proven successful in verifying several machine learning algorithms, notably, neural networks, decision trees, and support vector machines. In this work, we put forward an abstract interpretation-based framework for designing a sound approximate version of the <i>k</i>NN algorithm, which is instantiated to the interval and zonotope abstractions for approximating the range of numerical features. We show how this abstraction-based method can be used for stability, robustness, and individual fairness certification of <i>k</i>NN. Our certification technique has been implemented and experimentally evaluated on several benchmark datasets. These experimental results show that our tool can formally prove the stability of <i>k</i>NN classifiers in a precise and efficient way, thus expanding the range of machine learning models amenable to robustness certification.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"2015 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140799331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BotCL: a social bot detection model based on graph contrastive learning","authors":"Yan Li, Zhenyu Li, Daofu Gong, Qian Hu, Haoyu Lu","doi":"10.1007/s10115-024-02116-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02116-4","url":null,"abstract":"<p>The proliferation of social bots on social networks presents significant challenges to network security due to their malicious activities. While graph neural network models have shown promise in detecting social bots, acquiring a large number of high-quality labeled accounts remains challenging, impacting bot detection performance. To address this issue, we introduce BotCL, a social bot detection model that employs contrastive learning through data augmentation. Initially, we build a directed graph based on following/follower relationships, utilizing semantic, attribute, and structural features of accounts as initial node features. We then simulate account behaviors within the social network and apply two data augmentation techniques to generate multiple views of the directed graph. Subsequently, we encode the generated views using relational graph convolutional networks, achieving maximum homogeneity in node representations by minimizing the contrastive loss. Finally, node labels are predicted using Softmax. The proposed method augments data based on its distribution, showcasing robustness to noise. 
Extensive experimental results on Cresci-2015, Twibot-20, and Twibot-22 datasets demonstrate that our approach surpasses the state-of-the-art methods in terms of performance.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"15 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140806391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing trust and privacy in distributed networks: a comprehensive survey on blockchain-based federated learning","authors":"Ji Liu, Chunlu Chen, Yu Li, Lin Sun, Yulun Song, Jingbo Zhou, Bo Jing, Dejing Dou","doi":"10.1007/s10115-024-02117-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02117-3","url":null,"abstract":"<p>While centralized servers pose a risk of being a single point of failure, decentralized approaches like blockchain offer a compelling solution by implementing a consensus mechanism among multiple entities. Merging distributed computing with cryptographic techniques, decentralized technologies introduce a novel computing paradigm. Blockchain ensures secure, transparent, and tamper-proof data management by validating and recording transactions via consensus across network nodes. Federated Learning (FL), as a distributed machine learning framework, enables participants to collaboratively train models while safeguarding data privacy by avoiding direct raw data exchange. Despite the growing interest in decentralized methods, their application in FL remains underexplored. This paper presents a thorough investigation into blockchain-based FL (BCFL), spotlighting the synergy between blockchain’s security features and FL’s privacy-preserving model training capabilities. First, we present the taxonomy of BCFL from three aspects, including decentralized, separate networks, and reputation-based architectures. Then, we summarize the general architecture of BCFL systems, providing a comprehensive perspective on FL architectures informed by blockchain. Afterward, we analyze the application of BCFL in healthcare, IoT, and other privacy-sensitive areas. 
Finally, we identify future research directions of BCFL.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"26 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140799329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forecasting financial market structure from network features using machine learning","authors":"Douglas Castilho, Thársis T. P. Souza, Soong Moon Kang, João Gama, André C. P. L. F. de Carvalho","doi":"10.1007/s10115-024-02095-6","DOIUrl":"https://doi.org/10.1007/s10115-024-02095-6","url":null,"abstract":"<p>We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. To this end, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure, namely Dynamic Asset Graph, Dynamic Minimal Spanning Tree and Dynamic Threshold Networks. Experimental results show that the proposed model can forecast market structure with high predictive performance, with up to 40% improvement over a time-invariant correlation-based benchmark. Non-pair-wise correlation features proved to be important compared to traditionally used pair-wise correlation measures for all markets studied, particularly in the long-term forecasting of stock market structure. Evidence is provided for stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100 and NIFTY50 market indices. 
Findings can be useful to improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"44 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140799367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CHEKG: a collaborative and hybrid methodology for engineering modular and fair domain-specific knowledge graphs","authors":"Sotiris Angelis, Efthymia Moraitou, George Caridakis, Konstantinos Kotis","doi":"10.1007/s10115-024-02110-w","DOIUrl":"https://doi.org/10.1007/s10115-024-02110-w","url":null,"abstract":"<p>Ontologies constitute the semantic model of Knowledge Graphs (KGs). This structural association indicates the potential existence of methodological analogies in the development of ontologies and KGs. The deployment of fully and well-defined methodologies for KG development based on existing ontology engineering methodologies (OEMs) has been suggested and efficiently applied. However, most of the modern/recent OEMs may not include tasks that (i) empower knowledge workers and domain experts to closely collaborate with ontology engineers and KG specialists for the development and maintenance of KGs, (ii) satisfy special requirements of KG development, such as (a) ensuring modularity and agility of KGs, (b) assessing and mitigating bias at schema and data levels. Toward this aim, the paper presents a methodology for the Collaborative and Hybrid Engineering of Knowledge Graphs (CHEKG), which constitutes a hybrid (schema-centric/top-down and data-driven/bottom-up), collaborative, agile, and iterative approach for developing modular and fair domain-specific KGs. CHEKG contributes to all phases of the KG engineering lifecycle: from the specification of a KG to its exploitation, evaluation, and refinement. The CHEKG methodology is based on the main phases of the extended Human-Centered Collaborative Ontology Engineering Methodology (ext-HCOME), while it adjusts and expands the individual processes and tasks of each phase according to the specialized requirements of KG development. 
Apart from the presentation of the methodology per se, the paper presents recent work regarding the deployment and evaluation of the CHEKG methodology for the engineering of semantic trajectories as KGs generated from unmanned aerial vehicles (UAVs) data during real cultural heritage documentation scenarios.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"20 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An academic recommender system on large citation data based on clustering, graph modeling and deep learning","authors":"Vaios Stergiopoulos, Michael Vassilakopoulos, Eleni Tousidou, Antonio Corral","doi":"10.1007/s10115-024-02094-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02094-7","url":null,"abstract":"<p>Recommendation (recommender) systems (RS) have played a significant role in both research and industry in recent years. In the area of academia, there is a need to help researchers discover the most appropriate and relevant scientific information through recommendations. Nevertheless, we argue that there is a major gap between academic state-of-the-art RS and real-world problems. In this paper, we present a novel multi-staged RS based on clustering, graph modeling and deep learning that manages to run on a full dataset (scientific digital library) on the order of millions of users and items (papers). We run several tests (experiments/evaluation) as a means to find the best approach regarding the tuning of our system; so, we present and compare three versions of our RS regarding recall and NDCG metrics. The results show that a multi-staged RS that utilizes a variety of techniques and algorithms is able to face real-world problems and large academic datasets. In this way, we suggest a way to close or minimize the gap between research and industry value RS.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"207 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring aspect-based sentiment analysis: an in-depth review of current methods and prospects for advancement","authors":"Irfan Ali Kandhro, Fayyaz Ali, Mueen Uddin, Asadullah Kehar, Selvakumar Manickam","doi":"10.1007/s10115-024-02104-8","DOIUrl":"https://doi.org/10.1007/s10115-024-02104-8","url":null,"abstract":"<p>Aspect-based sentiment analysis (ABSA) is a natural language processing technique that seeks to recognize and extract the sentiment connected to various qualities or aspects of a specific good, service, or entity. It entails dissecting a text into its component pieces, determining the elements or aspects being examined, and then examining the attitude stated about each feature or aspect. The main objective of this research is to present a comprehensive understanding of aspect-based sentiment analysis (ABSA), such as its potential, ongoing trends and advancements, structure, practical applications, real-world implementation, and open issues. The current sentiment analysis aims to enhance granularity at the aspect level with two main objectives, including extracting aspects and polarity sentiment classification. Three main methods are designed for aspect extractions: pattern-based, machine learning and deep learning. These methods can capture both syntactic and semantic features of text without relying heavily on high-level feature engineering, which was a requirement in earlier approaches. Despite bringing traditional surveys, a comprehensive survey of the procedure for carrying out this task and the applications of ABSA are also included in this article. To fully comprehend each strategy's benefits and drawbacks, it is evaluated, compared, and investigated. 
Finally, the open challenges of ABSA are reviewed to identify future directions.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"100 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach for fuzzy group decision making and consensus measure with hesitant judgments of experts","authors":"Chao Huang, Xiaoyue Wu, Mingwei Lin, Zeshui Xu","doi":"10.1007/s10115-024-02098-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02098-3","url":null,"abstract":"<p>In some actual decision-making problems, experts may be hesitant to judge the performances of alternatives, which leads to experts providing decision matrices with incomplete information. However, most existing estimation methods for incomplete information in group decision-making (GDM) neglect the hesitant judgments of experts, possibly making the group decision outcomes unreasonable. Considering the hesitation degrees of experts in decision judgments, an approach is proposed based on the triangular intuitionistic fuzzy numbers (TIFNs) and TODIM (interactive and multiple criteria decision-making) method for GDM and consensus measure. First, TIFNs are applied to handle incomplete information due to the hesitant judgments of experts. Second, considering the risk attitudes of experts, a decision-making model is proposed to rank alternatives for GDM with incomplete information. Subsequently, based on measuring the concordance between solutions, a consensus model is presented to measure the group’s and individual’s consensus degrees. Finally, an illustrative example is presented to show the detailed implementation procedure of the proposed approach. The comparisons with some existing estimation methods verify the effectiveness of the proposed approach for handling incomplete information. 
The impacts and necessities of experts’ hesitation degrees are discussed by a sensitivity analysis.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"7 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protecting the privacy of social network data using graph correction","authors":"Amir Dehaki Toroghi, Javad Hamidzadeh","doi":"10.1007/s10115-024-02115-5","DOIUrl":"https://doi.org/10.1007/s10115-024-02115-5","url":null,"abstract":"<p>Today, the rapid development of online social networks, as well as low costs, easy communication, and quick access with minimal facilities have made social networks an attractive and very influential phenomenon among people. The users of these networks tend to share their sensitive and private information with friends and acquaintances. This has caused the data of these networks to become a very important source of information about users, their interests, feelings, and activities. Analyzing this information can be very useful in predicting the behavior of users in dealing with various issues. But publishing this data for data mining can violate the privacy of users. As a result, data privacy protection of social networks has become an important and attractive research topic. In this context, various algorithms have been proposed, all of which meet privacy requirements by making changes in the information as well as the graph structure. But due to high processing costs and long execution times, these algorithms are not very appropriate for anonymizing big data. In this research, we improved the speed of data anonymization by using the number factorization technique to select and delete the best edges in the graph correction stage. We also used the chaotic krill herd algorithm to add edges, and considering the effect of all edges together on the structure of the graph, we selected edges and added them to the graph so that it preserved the graph’s utility. 
The evaluation results on real-world datasets show the efficiency of the proposed algorithm, in comparison with state-of-the-art methods, in reducing the execution time and maintaining the utility of the anonymized graph.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"28 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fuzzy rough set-based horse herd optimization algorithm for map reduce framework for customer behavior data","authors":"D. Sudha, M. Krishnamurthy","doi":"10.1007/s10115-024-02105-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02105-7","url":null,"abstract":"<p>A large number of association rules often minimizes the reliability of data mining results; hence, a dimensionality reduction technique is crucial for data analysis. When analyzing massive datasets, existing models take more time to scan the entire database because they discover unnecessary items and transactions that are not necessary for data analysis. For this purpose, the Fuzzy Rough Set-based Horse Herd Optimization (FRS-HHO) algorithm is proposed to be integrated with the Map Reduce algorithm to minimize query retrieval time and improve performance. The HHO algorithm minimizes the number of unnecessary items and transactions with minimal support value from the dataset to maximize fitness based on multiple objectives such as support, confidence, interestingness, and lift to evaluate the quality of association rules. The feature value of each item in the population is obtained by a Map Reduce-based fitness function to generate optimal frequent itemsets with minimum time. The Horse Herd Optimization (HHO) is employed to solve the high-dimensional optimization problems. The proposed FRS-HHO approach takes less time to execute for dimensions and has a space complexity of 38% for a total of 10 k transactions. Also, the FRS-HHO approach offers a speedup rate of 17% and a 12% decrease in input–output communication cost when compared to other approaches. 
The proposed FRS-HHO model enhances performance in terms of execution time, space complexity, and speed.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"35 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}