Machine learning and knowledge extraction: latest articles, page 10

Detection of Temporal Shifts in Semantics Using Local Graph Clustering

Machine learning and knowledge extraction Pub Date : 2023-01-13 DOI: 10.3390/make5010008

N. Hwang, S. Chatterjee, Yanming Di, Sharmodeep Bhattacharyya

Abstract: Many changes in our digital corpus have been brought about by the interplay between rapid advances in digital communication and the current environment characterized by pandemics, political polarization, and social unrest. One such change is the pace with which new words enter the mass vocabulary and the frequency at which meanings, perceptions, and interpretations of existing expressions change. The current state-of-the-art algorithms do not allow for an intuitive and rigorous detection of these changes in word meanings over time. We propose a dynamic graph-theoretic approach to inferring the semantics of words and phrases (“terms”) and detecting temporal shifts. Our approach represents each term as a stochastic time-evolving set of contextual words and is a count-based distributional semantic model in nature. We use local clustering techniques to assess the structural changes in a given word’s contextual words. We demonstrate the efficacy of our method by investigating the changes in the semantics of the phrase “Chinavirus”. We conclude that the term took on a much more pejorative meaning when the White House used the term in the second half of March 2020, although the effect appears to have been temporary. We make both the dataset and the code used to generate this paper’s results available.

Pages: 128-143

Citations: 1
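
The abstract above describes a count-based distributional model in which a term is represented by the set of words that occur around it, tracked over time. A minimal sketch of that general idea follows. It is not the authors' method: the local graph clustering step is omitted, and the function names, the fixed token window, the whitespace tokenizer, and the Jaccard comparison are all assumptions made for illustration.

```python
from collections import Counter


def context_counts(sentences, term, window=2):
    """Count the words that appear within `window` tokens of `term`.

    Hypothetical helper: whitespace tokenization and a fixed window
    are simplifying assumptions, not details taken from the paper.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo, hi = max(0, i - window), i + window + 1
            for j in range(lo, min(hi, len(tokens))):
                if j != i:  # skip the term itself
                    counts[tokens[j]] += 1
    return counts


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy corpora for two time periods (invented data, illustration only).
early = ["the bank approved the loan", "the bank raised interest rates"]
late = ["we sat on the river bank", "the bank of the river flooded"]

early_ctx = set(context_counts(early, "bank"))
late_ctx = set(context_counts(late, "bank"))
shift = jaccard_distance(early_ctx, late_ctx)
```

A larger `shift` value means the two periods share fewer contextual words. A real analysis would need far more text per period and some handling of very frequent words such as "the".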

E2H Distance-Weighted Minimum Reference Set for Numerical and Categorical Mixture Data and a Bayesian Swap Feature Selection Algorithm

Machine learning and knowledge extraction Pub Date : 2023-01-11 DOI: 10.3390/make5010007

Yuto Omae, Masaya Mori

Abstract: Generally, when developing classification models using supervised learning methods (e.g., support vector machine, neural network, and decision tree), feature selection, as a pre-processing step, is essential to reduce calculation costs and improve the generalization scores. In this regard, the minimum reference set (MRS), which is a feature selection algorithm, can be used. The original MRS considers a feature subset as effective if it leads to the correct classification of all samples by using the 1-nearest neighbor algorithm based on small samples. However, the original MRS is only applicable to numerical features, and the distances between different classes cannot be considered. Therefore, herein, we propose a novel feature subset evaluation algorithm, referred to as the “E2H distance-weighted MRS,” which can be used for a mixture of numerical and categorical features and considers the distances between different classes in the evaluation. Moreover, a Bayesian swap feature selection algorithm, which is used to identify an effective feature subset, is also proposed. The effectiveness of the proposed methods is verified based on experiments conducted using artificially generated data comprising a mixture of numerical and categorical features.

Pages: 109-127

Citations: 1
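
The abstract above says the original MRS treats a feature subset as effective when a 1-nearest neighbor classifier built on a small set of samples labels every sample correctly. The check can be sketched as below for numerical features only. This is an illustration under stated assumptions, not the paper's algorithm: the function name `classifies_all`, the Euclidean distance, and the hand-picked reference indices are hypothetical, and the E2H distance weighting and the Bayesian swap search are not shown.

```python
import math


def classifies_all(samples, labels, reference_idx, features):
    """Return True if 1-NN over the reference samples, restricted to
    the feature indices in `features`, labels every sample correctly.

    Assumption: plain Euclidean distance on numerical features.
    """
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    for x, y in zip(samples, labels):
        # Index of the reference sample closest to x on the chosen features.
        nearest = min(reference_idx, key=lambda r: dist(x, samples[r]))
        if labels[nearest] != y:
            return False
    return True


# Toy data (invented): feature 0 separates the classes, feature 1 does not.
samples = [(0.0, 5.0), (0.2, 1.0), (1.0, 4.8), (1.1, 0.9)]
labels = [0, 0, 1, 1]
reference_idx = [0, 2]  # one hypothetical reference sample per class

good_subset = classifies_all(samples, labels, reference_idx, features=[0])
bad_subset = classifies_all(samples, labels, reference_idx, features=[1])
```

Here `good_subset` is `True` and `bad_subset` is `False`, so under this criterion only feature 0 would be kept for the toy data.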

XAIR: A Systematic Metareview of Explainable AI (XAI) Aligned to the Software Development Process

Machine learning and knowledge extraction Pub Date : 2023-01-11 DOI: 10.3390/make5010006

Tobias Clement, Nils Kemmerzell, Mohamed Abdelaal, M. Amberg

Abstract: Currently, explainability represents a major barrier that Artificial Intelligence (AI) is facing in regard to its practical implementation in various application domains. To combat the lack of understanding of AI-based systems, Explainable AI (XAI) aims to make black-box AI models more transparent and comprehensible for humans. Fortunately, plenty of XAI methods have been introduced to tackle the explainability problem from different perspectives. However, due to the vast search space, it is challenging for ML practitioners and data scientists to start with the development of XAI software and to optimally select the most suitable XAI methods. To tackle this challenge, we introduce XAIR, a novel systematic metareview of the most promising XAI methods and tools. XAIR differentiates itself from existing reviews by aligning its results to the five steps of the software development process, including requirement analysis, design, implementation, evaluation, and deployment. Through this mapping, we aim to create a better understanding of the individual steps of developing XAI software and to foster the creation of real-world AI applications that incorporate explainability. Finally, we conclude by highlighting new directions for future research.

Pages: 78-108

Citations: 11

Learning Sentence-Level Representations with Predictive Coding

Machine learning and knowledge extraction Pub Date : 2023-01-09 DOI: 10.3390/make5010005

Vladimir Araujo, M. Moens, Álvaro Soto

Abstract: Learning sentence representations is an essential and challenging topic in the deep learning and natural language processing communities. Recent methods pre-train big models on a massive text corpus, focusing mainly on learning the representation of contextualized words. As a result, these models cannot generate informative sentence embeddings since they do not explicitly exploit the structure and discourse relationships existing in contiguous sentences. Drawing inspiration from human language processing, this work explores how to improve sentence-level representations of pre-trained models by borrowing ideas from predictive coding theory. Specifically, we extend BERT-style models with bottom-up and top-down computation to predict future sentences in latent space at each intermediate layer in the networks. We conduct extensive experimentation with various benchmarks for the English and Spanish languages, designed to assess sentence- and discourse-level representations and pragmatics-focused assessments. Our results show that our approach improves sentence representations consistently for both languages. Furthermore, the experiments also indicate that our models capture discourse and pragmatics knowledge. In addition, to validate the proposed method, we carried out an ablation study and a qualitative study with which we verified that the predictive mechanism helps to improve the quality of the representations.

Pages: 59-77

Citations: 1

IPPT4KRL: Iterative Post-Processing Transfer for Knowledge Representation Learning

Machine learning and knowledge extraction Pub Date : 2023-01-06 DOI: 10.3390/make5010004

Weihang Zhang, O. Șerban, Jiahao Sun, Yike Guo

Abstract: Knowledge Graphs (KGs), a structural way to model human knowledge, have been a critical component of many artificial intelligence applications. Many KG-based tasks are built using knowledge representation learning, which embeds KG entities and relations into a low-dimensional semantic space. However, the quality of representation learning is often limited by the heterogeneity and sparsity of real-world KGs. Multi-KG representation learning, which utilizes KGs from different sources collaboratively, presents one promising solution. In this paper, we propose a simple, but effective iterative method that post-processes pre-trained knowledge graph embedding (IPPT4KRL) on individual KGs to maximize the knowledge transfer from another KG when a small portion of alignment information is introduced. Specifically, additional triples are iteratively included in the post-processing based on their adjacencies to the cross-KG alignments to refine the pre-trained embedding space of individual KGs. We also provide the benchmarking results of existing multi-KG representation learning methods on several generated and well-known datasets. The empirical results of the link prediction task on these datasets show that the proposed IPPT4KRL method achieved comparable and even superior results when compared against more complex methods in multi-KG representation learning.

Pages: 43-58

Citations: 0

Detecting Arabic Cyberbullying Tweets Using Machine Learning

Machine learning and knowledge extraction Pub Date : 2023-01-05 DOI: 10.3390/make5010003

Alanoud Mohammed Alduailaj, A. Belghith

Citations: 8

Machine Learning and Knowledge Extraction: 7th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2023, Benevento, Italy, August 29 – September 1, 2023, Proceedings

Machine learning and knowledge extraction Pub Date : 2023-01-01 DOI: 10.1007/978-3-031-40837-3

Citations: 0

Skew Class-balanced Re-weighting for Unbiased Scene Graph Generation

Machine learning and knowledge extraction Pub Date : 2023-01-01 DOI: 10.3390/make5010018

Haeyong Kang, C. D. Yoo

Citations: 4

Synthetic Data Generation for Visual Detection of Flattened PET Bottles

Machine learning and knowledge extraction Pub Date : 2022-12-29 DOI: 10.3390/make5010002

Vitālijs Feščenko, Jānis Ārents, R. Kadikis

Citations: 2

Multimodal AutoML via Representation Evolution

Machine learning and knowledge extraction Pub Date : 2022-12-23 DOI: 10.3390/make5010001

Blaž Škrlj, Matej Bevec, Nadine Lavrac

Abstract: With the increasing amounts of available data, learning simultaneously from different types of inputs is becoming necessary to obtain robust and well-performing models. With the advent of representation learning in recent years, lower-dimensional vector-based representations have become available for both images and texts, while automating simultaneous learning from multiple modalities remains a challenging problem. This paper presents an AutoML (automated machine learning) approach to automated machine learning model configuration identification for data composed of two modalities: texts and images. The approach is based on the idea of representation evolution, the process of automatically amplifying heterogeneous representations across several modalities, optimized jointly with a collection of fast, well-regularized linear models. The proposed approach is benchmarked against 11 unimodal and multimodal (texts and images) approaches on four real-life benchmark datasets from different domains. It achieves competitive performance with minimal human effort and low computing requirements, enabling learning from multiple modalities in an automated manner for a wider community of researchers.

Pages: 1-13

Citations: 0