AI Open | Pub Date: 2025-01-01 | DOI: 10.1016/j.aiopen.2025.01.001
Rui Hao, Linmei Hu, Weijian Qi, Qingliu Wu, Yirui Zhang, Liqiang Nie
{"title":"ChatLLM network: More brains, more intelligence","authors":"Rui Hao , Linmei Hu , Weijian Qi , Qingliu Wu , Yirui Zhang , Liqiang Nie","doi":"10.1016/j.aiopen.2025.01.001","DOIUrl":"10.1016/j.aiopen.2025.01.001","url":null,"abstract":"<div><div>Dialogue-based language models mark a huge milestone in the field of artificial intelligence, by their impressive ability to interact with users, as well as a series of challenging tasks prompted by customized instructions. However, the prevalent large-scale dialogue-based language models like ChatGPT still have room for improvement, such as unstable responses to questions and the inability to think cooperatively like humans. Considering the ability of dialogue-based language models in conversation and their inherent randomness in thinking, we propose ChatLLM network that allows multiple dialogue-based language models to interact, provide feedback, and think together. We design a network of ChatLLMs, consisting multiple layers of language models. Specifically, individual instances of language model may possess distinct perspectives towards the same problem, and by consolidating these diverse viewpoints via a separate language model, the ChatLLM network system can conduct decision-making more objectively and comprehensively. In addition, a language-based feedback mechanism comparable to backpropagation is devised to update the outputs of the language models within the network. This stratified system of interaction can be analogized to the relationship between leaders and employees in a social organization, where collective decision-making often yields superior judgments or resolutions. Experiments on datasets demonstrate that our network attains significant improvements in problem-solving, leading to observable progress amongst each member.</div></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"6 ","pages":"Pages 45-52"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2025-01-01 | DOI: 10.1016/j.aiopen.2025.01.003
Md Shofiqul Islam, Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Humayan Kabir Rana, Md. Saifur Rahman, Md. Munirul Hasan, AKM Azad, Ibrahim Abdullah, Mohammad Ali Moni
{"title":"Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection","authors":"Md Shofiqul Islam , Khondokar Fida Hasan , Hasibul Hossain Shajeeb , Humayan Kabir Rana , Md. Saifur Rahman , Md. Munirul Hasan , AKM Azad , Ibrahim Abdullah , Mohammad Ali Moni","doi":"10.1016/j.aiopen.2025.01.003","DOIUrl":"10.1016/j.aiopen.2025.01.003","url":null,"abstract":"<div><div>This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to scientific understanding and knowledge of the multimodal application of DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (ie, cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy of 99.97% for COVID-19 image data and 93.73% for speech data (i.e., cough). However, the BiGRU model demonstrated superior performance in COVID-19 text classification with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.</div></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"6 ","pages":"Pages 12-44"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143134048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2025-01-01 | DOI: 10.1016/j.aiopen.2025.01.002
Xinrong Zhang, Shengding Hu, Weilin Zhao, Huadong Wang, Xu Han, Chaoqun He, Guoyang Zeng, Zhiyuan Liu, Maosong Sun
{"title":"Optimal RoPE extension via Bayesian Optimization for training-free length generalization","authors":"Xinrong Zhang , Shengding Hu , Weilin Zhao , Huadong Wang , Xu Han , Chaoqun He , Guoyang Zeng , Zhiyuan Liu , Maosong Sun","doi":"10.1016/j.aiopen.2025.01.002","DOIUrl":"10.1016/j.aiopen.2025.01.002","url":null,"abstract":"<div><div>Transformers are designed to process input of variable length without resource constraints. However, their performance significantly deteriorates when the input surpasses a threshold slightly larger than the pre-training context window. This limitation on the effective context window confines the application of Transformer-based large language models (LLMs) that have been the subject of great anticipation. Consequently, the generalization of pre-trained LLMs to handle varying input lengths becomes a pivotal and formidable challenge. Previous research has endeavored to address this challenge by modifying the Rotary Position Embedding (RoPE), the primary factor responsible for disparities in handling different input lengths. These efforts have provided valuable insights, while they often lack a deep understanding of the root causes of performance degradation and rely heavily on manual parameter tuning. In response to these issues, we conduct a comprehensive analysis and identify two primary causes behind the performance drop: global distribution mismatch and local resolution degradation. In light of these challenges, we introduce an Optimal RoPE (ORoPE) extension using Bayesian Optimization (BO), which alleviates the need for additional model training. Our experiments demonstrate the efficacy of our approach, outperforming baselines by up to 21.9%, 32.1%, and 41.2% at evaluation lengths of 8K, 16K, and 32K, respectively. We will release all code and data when this paper is published.</div></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"6 ","pages":"Pages 1-11"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143134370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2024-01-01 | DOI: 10.1016/j.aiopen.2023.11.001
Yachuan Liu, Jiaqi Ma, Paramveer Dhillon, Qiaozhu Mei
{"title":"PM2.5 forecasting under distribution shift: A graph learning approach","authors":"Yachuan Liu , Jiaqi Ma , Paramveer Dhillon , Qiaozhu Mei","doi":"10.1016/j.aiopen.2023.11.001","DOIUrl":"10.1016/j.aiopen.2023.11.001","url":null,"abstract":"<div><p>We present a new benchmark task for graph-based machine learning, aiming to predict future air quality (PM2.5 concentration) observed by a geographically distributed network of environmental sensors. While prior work has successfully applied Graph Neural Networks (GNNs) on a wide family of spatio-temporal prediction tasks, the new benchmark task introduced here brings a technical challenge that has been less studied in the context of graph-based spatio-temporal learning: distribution shift across a long period of time. An important goal of this paper is to understand the behavior of spatio-temporal GNNs under distribution shift. We conduct a comprehensive comparative study of both graph-based and non-graph-based machine learning models under two data split methods, one results in distribution shift and one does not. Our empirical results suggest that GNN models tend to suffer more from distribution shift compared to non-graph-based models, which calls for special attention when deploying spatio-temporal GNNs in practice.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 23-29"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651023000220/pdfft?md5=cec5103867bd9723b31ac8d2aeadf3e7&pid=1-s2.0-S2666651023000220-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139013251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MindLLM: Lightweight large language model pre-training, evaluation and domain application","authors":"Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Yang Gao, Heyan Huang","doi":"10.1016/j.aiopen.2024.08.001","DOIUrl":"10.1016/j.aiopen.2024.08.001","url":null,"abstract":"<div><p>Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by developing increasingly large-scale models, there could be another branch to develop lightweight custom models that better serve certain domains, taking into account the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of experiences accrued during large model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights are hopefully valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 1-26"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651024000111/pdfft?md5=5c01070780bb0f7ea417c3293322b19c&pid=1-s2.0-S2666651024000111-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141992619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2024-01-01 | DOI: 10.1016/j.aiopen.2023.10.005
Qi Zhang, Cheng Yang, Chuan Shi
{"title":"Adaptive negative representations for graph contrastive learning","authors":"Qi Zhang, Cheng Yang, Chuan Shi","doi":"10.1016/j.aiopen.2023.10.005","DOIUrl":"10.1016/j.aiopen.2023.10.005","url":null,"abstract":"<div><p>Graph contrastive learning (GCL) has emerged as a promising paradigm for learning graph representations. Recently, the idea of hard negatives is introduced to GCL, which can provide more challenging self-supervised objectives and alleviate over-fitting issues. These methods use different graphs in the same mini-batch as negative examples, and assign larger weights to true hard negative ones. However, the influence of such weighting strategies is limited in practice, since a small mini-batch may not contain any challenging enough negative examples. In this paper, we aim to offer a more flexible solution to affect the hardness of negatives by directly manipulating the representations of negatives. By assuming that (1) good negative representations should not deviate far from the representations of real graph samples, and (2) the computation process of graph encoder may introduce biases to graph representations, we first design a negative representation generator (NRG) which (1) employs real graphs as prototypes to perturb, and (2) introduces parameterized perturbations through the feed-forward computation of the graph encoder to match the biases. Then we design a generation loss to train the parameters in NRG and adaptively generate negative representations for more challenging contrastive objectives. Experiments on eight benchmark datasets show that our proposed framework ANGCL has 1.6% relative improvement over the best baseline, and can be successfully integrated with three types of graph augmentations. Ablation studies and hyper-parameter experiments further demonstrate the effectiveness of ANGCL.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 79-86"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651023000219/pdfft?md5=b0c3c461206c9fd2fcce93a0a80db1a1&pid=1-s2.0-S2666651023000219-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138992756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2024-01-01 | DOI: 10.1016/j.aiopen.2024.06.001
G. Viera-López, J.J. Morgado-Vega, A. Reyes, E. Altshuler, Yudivián Almeida-Cruz, Giorgio Manganini
{"title":"Improving trajectory classification through Kramers–Moyal coefficients","authors":"G. Viera-López , J.J. Morgado-Vega , A. Reyes , E. Altshuler , Yudivián Almeida-Cruz , Giorgio Manganini","doi":"10.1016/j.aiopen.2024.06.001","DOIUrl":"10.1016/j.aiopen.2024.06.001","url":null,"abstract":"<div><p>Trajectory classification focuses on predicting the class or category of a moving object based on its observed movement over time. The classification of trajectory data using classical approaches can be challenging due to the arbitrary and relatively long length of some trajectories. To overcome this, trajectories are often mapped into vector representations that aim to encode their most significant features and for a fixed number of dimensions. Here we propose a novel vector representation for trajectories that combines previously employed features with new ones derived from the computation of the Kramers–Moyal coefficients (KMC). Due to KMC originating from a Taylor expansion that progressively encapsulates more information about a stochastic process, their potential to be effective in trajectory classification is a logical anticipation. We evaluated our representation using different classifiers and several benchmark datasets previously used for trajectory classification. With the addition of features extracted from KMCs, our results indicate a reliable increase in classification accuracy and F1 score of around 4% across all datasets and models used for evaluation. Moreover, we observed an increase in accuracy of up to 20% and an increase in F1 score of up to 23% in some scenarios.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 87-93"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S266665102400010X/pdfft?md5=1530eab784a46e13da719255a80cd3e1&pid=1-s2.0-S266665102400010X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141715791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2024-01-01 | DOI: 10.1016/j.aiopen.2024.10.002
Adikarige Randil Sanjeewa Madanayake, Kyungmi Lee, Ickjai Lee
{"title":"Mining contacts from spatio-temporal trajectories","authors":"Adikarige Randil Sanjeewa Madanayake, Kyungmi Lee, Ickjai Lee","doi":"10.1016/j.aiopen.2024.10.002","DOIUrl":"10.1016/j.aiopen.2024.10.002","url":null,"abstract":"<div><div>Contact mining is discovering objects in close proximity in their movements in order to reveal possible interactions, infections, collisions or contacts. This process can be significantly beneficial in a spread of an infectious disease situation to identify potential victims from a known infected human or animal, especially when the victims are asymptomatic. Movements of objects are captured by spatio-temporal trajectories represented by a series of geospatial locations and corresponding timestamps. A large amount of spatio-temporal trajectory data is being gathered by various location acquiring sensor devices by tracking movement behaviours of people, animals, vehicles and natural events. Trajectory data mining techniques have been proposed to discover useful patterns to understand the behaviours of spatio-temporal trajectories. One unexplored pattern is to identify contacts of targeted trajectory in spatio-temporal trajectories, which is defined as contact mining. The aim of this study is to investigate contact mining from spatio-temporal trajectories. The approach will be initiated by preprocessing spatio-temporal data and then by investigating a robust contact mining framework to efficiently and effectively mine contacts of a trajectory of interest from a given set of trajectories. Experimental results demonstrate the efficiency, effectiveness and scalability of our approach. In addition, parameter sensitivity analysis reveals the robustness and insensitivity of our framework.</div></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 197-207"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing neural network classification using fractional-order activation functions","authors":"Meshach Kumar , Utkal Mehta , Giansalvo Cirrincione","doi":"10.1016/j.aiopen.2023.12.003","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.12.003","url":null,"abstract":"<div><p>In this paper, a series of novel activation functions is presented, which is derived using the improved Riemann–Liouville conformable fractional derivative (<span><math><msup><mrow></mrow><mrow><mi>R</mi><mi>L</mi></mrow></msup></math></span>CFD). This study investigates the use of fractional activation functions in Multilayer Perceptron (MLP) models and their impact on the performance of classification tasks, verified using the IRIS, MNIST and FMNIST datasets. Fractional activation functions introduce a non-integer power exponent, allowing for improved capturing of complex patterns and representations. The experiment compares MLP models employing fractional activation functions, such as fractional sigmoid, hyperbolic tangent and rectified linear units, against traditional models using standard activation functions, their improved versions and existing fractional functions. The numerical studies have confirmed the theoretical observations mentioned in the paper. The findings highlight the potential usage of new functions as a valuable tool in deep learning in classification. The study suggests incorporating fractional activation functions in MLP architectures can lead to superior accuracy and robustness.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 10-22"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S266665102300030X/pdfft?md5=2be839945dd6c63499655950e9809539&pid=1-s2.0-S266665102300030X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139090006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open | Pub Date: 2024-01-01 | DOI: 10.1016/j.aiopen.2024.01.005
Long Ding, Chunping Ouyang, Yongbin Liu, Zhihua Tao, Yaping Wan, Zheng Gao
{"title":"Few-shot Named Entity Recognition via encoder and class intervention","authors":"Long Ding , Chunping Ouyang , Yongbin Liu , Zhihua Tao , Yaping Wan , Zheng Gao","doi":"10.1016/j.aiopen.2024.01.005","DOIUrl":"10.1016/j.aiopen.2024.01.005","url":null,"abstract":"<div><p>In the real world, the large and complex nature of text increases the difficulty of tagging and results in a limited amount of tagged text. Few-shot Named Entity Recognition(NER) only uses a small amount of annotation data to identify and classify entities. It avoids the above problems. Few-shot learning methods usually use prior knowledge to achieve good results. However, prior knowledge may become a confounding factor affecting the relation between sample features and real labels. This problem leads to bias and difficulty accurately capturing class. To solve this problem, a new model, Few-shot Named Entity Recognition via Encoder and Class Intervention, is proposed based on causality. We show that we can steer the model to manufacture interventions on encoder and class, and reduce the interference of confounding factors. Specifically, while cross-sample attention perturbation is used in the encoder layer, a practical causal relation between feature and classification label is developed in the class layer. This way is an attempt of causal methodology in the Few-shot Named Entity Recognition task, which improves the discrimination ability of the NER classifier. Experimental results demonstrate that our model outperforms baseline models in both 5-way and 10-way on two NER datasets.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"5 ","pages":"Pages 39-45"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651024000068/pdfft?md5=737ba44f6bb38a965193bee8501a6eb7&pid=1-s2.0-S2666651024000068-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139884960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}