AI Open. Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.009
Wenlong Fang, Yongbin Liu, Chunping Ouyang, Lin Ren, Jiale Li, Yaping Wan
{"title":"Joint span and token framework for few-shot named entity recognition","authors":"Wenlong Fang, Yongbin Liu, Chunping Ouyang, Lin Ren, Jiale Li, Yaping Wan","doi":"10.1016/j.aiopen.2023.08.009","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.009","url":null,"abstract":"<div><p>Few-shot Named Entity Recognition (NER) is a challenging task that involves identifying new entity types using a limited number of labeled instances for training. Currently, the majority of Few-shot NER methods are based on span, which pay more attention to the boundary information of the spans as candidate entities and the entity-level information. However, these methods often overlook token-level semantic information, which can limit their effectiveness. To address this issue, we propose a novel Joint Span and Token (<strong>JST</strong>) framework that integrates both the boundary information of an entity and the semantic information of each token that comprises an entity. The <strong>JST</strong> framework employs span features to extract the boundary features of the entity and token features to extract the semantic features of each token. Additionally, to reduce the negative impact of the Other class, we introduce a method to separate named entities from the Other class in semantic space, which helps to improve the distinction between entities and the Other class. In addition, we used GPT to do data augmentation on the support sentences, generating similar sentences to the original ones. These sentences increase the diversity of the sample and the reliability of our model. Our experimental results on the Few-NERD<span><sup>1</sup></span> and SNIPS<span><sup>2</sup></span> datasets demonstrate that our model outperforms existing methods in terms of performance.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 111-119"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49710707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.010
Zeyuan Yang, Zonghan Yang, Yichen Liu, Peng Li, Yang Liu
{"title":"Restricted orthogonal gradient projection for continual learning","authors":"Zeyuan Yang , Zonghan Yang , Yichen Liu , Peng Li , Yang Liu","doi":"10.1016/j.aiopen.2023.08.010","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.010","url":null,"abstract":"<div><p>Continual learning aims to avoid catastrophic forgetting and effectively leverage learned experiences to master new knowledge. Existing gradient projection approaches impose hard constraints on the optimization space for new tasks to minimize interference, which simultaneously hinders forward knowledge transfer. To address this issue, recent methods reuse frozen parameters with a growing network, resulting in high computational costs. Thus, it remains a challenge whether we can improve forward knowledge transfer for gradient projection approaches <em>using a fixed network architecture</em>. In this work, we propose the Restricted Orthogonal Gradient prOjection (ROGO) framework. The basic idea is to adopt a restricted orthogonal constraint allowing parameters optimized in the direction oblique to the whole frozen space to facilitate forward knowledge transfer while consolidating previous knowledge. Our framework requires neither data buffers nor extra parameters. Extensive experiments have demonstrated the superiority of our framework over several strong baselines. We also provide theoretical guarantees for our relaxing strategy.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 98-110"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49732819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.10.001
Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang
{"title":"Multi-grained hypergraph interest modeling for conversational recommendation","authors":"Chenzhan Shang , Yupeng Hou , Wayne Xin Zhao , Yaliang Li , Jing Zhang","doi":"10.1016/j.aiopen.2023.10.001","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.10.001","url":null,"abstract":"<div><p>Conversational recommender system (CRS) interacts with users through multi-turn dialogues in natural language, which aims to provide high-quality recommendations for user’s instant information need. Although great efforts have been made to develop effective CRS, most of them still focus on the contextual information from the current dialogue, usually suffering from the data scarcity issue. Therefore, we consider leveraging historical dialogue data to enrich the limited contexts of the current dialogue session.</p><p>In this paper, we propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data from different perspectives. As the core idea, we employ <em>hypergraph</em> to represent complicated semantic relations underlying historical dialogues. In our approach, we first employ the hypergraph structure to model users’ historical dialogue sessions and form a <em>session-based hypergraph</em>, which captures <em>coarse-grained, session-level</em> relations. Second, to alleviate the issue of data scarcity, we use an external knowledge graph and construct a <em>knowledge-based hypergraph</em> considering <em>fine-grained, entity-level</em> semantics. We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS. Extensive experiments on two benchmarks <span>ReDial</span> and <span>TG-ReDial</span> validate the effectiveness of our approach on both recommendation and conversation tasks. Code is available at: <span>https://github.com/RUCAIBox/MHIM</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 154-164"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651023000177/pdfft?md5=845c75e23c419b9a9e76d0939d4efddc&pid=1-s2.0-S2666651023000177-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92131677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.011
Wanjun Zhong, Yifan Gao, Ning Ding, Zhiyuan Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan
{"title":"Improving task generalization via unified schema prompt","authors":"Wanjun Zhong , Yifan Gao , Ning Ding , Zhiyuan Liu , Ming Zhou , Jiahai Wang , Jian Yin , Nan Duan","doi":"10.1016/j.aiopen.2023.08.011","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.011","url":null,"abstract":"<div><p>Task generalization has been a long-standing challenge in Natural Language Processing (NLP). Recent research attempts to improve the task generalization ability of pre-trained language models by mapping NLP tasks into human-readable prompted forms. However, these approaches require laborious and inflexible manual collection of prompts, and different prompts on the same downstream task may receive unstable performance. We propose Unified Schema Prompt, a flexible and extensible prompting method, which automatically customizes the learnable prompts for each task according to the task input schema. It models the shared knowledge between tasks, while keeping the characteristics of different task schema, and thus enhances task generalization ability. The schema prompt takes the explicit data structure of each task to formulate prompts so that little human effort is involved. To test the task generalization ability of schema prompt at scale, we conduct schema prompt-based multitask pre-training on a wide variety of general NLP tasks. The framework achieves strong zero-shot and few-shot generalization performance on 16 unseen downstream tasks from 8 task types (e.g., QA, NLI, etc.). Furthermore, comprehensive analyses demonstrate the effectiveness of each component in the schema prompt, its flexibility in task compositionality, and its ability to improve performance under a full-data fine-tuning setting.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 120-129"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49710709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.08.007
Zhijie Deng, Yinpeng Dong, Jun Zhu
{"title":"Batch virtual adversarial training for graph convolutional networks","authors":"Zhijie Deng , Yinpeng Dong , Jun Zhu","doi":"10.1016/j.aiopen.2023.08.007","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.08.007","url":null,"abstract":"<div><p>We present batch virtual adversarial training (BVAT), a novel regularization method for graph convolutional networks (GCNs). BVAT addresses the issue that GCNs do not ensure the smoothness of the model’s output distribution against local perturbations around the input node features. We propose two algorithms, sampling-based BVAT and optimization-based BVAT, which promote the output smoothness of GCN classifiers based on the generated virtual adversarial perturbations for either a subset of independent nodes or all nodes via an elaborate optimization process. Extensive experiments on three citation network datasets <em>Cora</em>, <em>Citeseer</em> and <em>Pubmed</em> and a knowledge graph dataset <em>Nell</em> validate the efficacy of the proposed method in semi-supervised node classification tasks.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 73-79"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49761369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2023-01-01. DOI: 10.1016/j.aiopen.2023.01.001
Rishabh Misra, Prahal Arora
{"title":"Sarcasm detection using news headlines dataset","authors":"Rishabh Misra , Prahal Arora","doi":"10.1016/j.aiopen.2023.01.001","DOIUrl":"https://doi.org/10.1016/j.aiopen.2023.01.001","url":null,"abstract":"<div><p>Sarcasm has been an elusive concept for humans. Due to interesting linguistic properties, sarcasm detection has gained traction of the Natural Language Processing (NLP) research community in the past few years. However, the task of predicting sarcasm in a text remains a difficult one for machines as well, and there are limited insights into what makes a sentence sarcastic. Past studies in sarcasm detection either use large scale datasets collected using tag-based supervision or small scale manually annotated datasets. The former category of datasets are noisy in terms of labels and language, whereas the latter category of datasets do not have enough instances to train deep learning models reliably despite having high-quality labels. To overcome these shortcomings, we introduce a high-quality and relatively larger-scale dataset which is a collection of news headlines from a sarcastic news website and a real news website. We describe the unique aspects of our dataset and compare its various characteristics with other benchmark datasets in sarcasm detection domain. Furthermore, we produce insights into what constitute as sarcasm in a text using a Hybrid Neural Network architecture. First released in 2019, we dedicate a section on how the NLP research community has extensively relied upon our contributions to push the state of the art further in the sarcasm detection domain. Lastly, we make the dataset as well as framework implementation publicly available to facilitate continued research in this domain.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"4 ","pages":"Pages 13-18"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49732927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2022-12-01. DOI: 10.1016/j.aiopen.2022.11.006
Qinkai Zheng, Xiao Xia, Kun Zhang, E. Kharlamov, Yuxiao Dong
{"title":"On the distribution alignment of propagation in graph neural networks","authors":"Qinkai Zheng, Xiao Xia, Kun Zhang, E. Kharlamov, Yuxiao Dong","doi":"10.1016/j.aiopen.2022.11.006","DOIUrl":"https://doi.org/10.1016/j.aiopen.2022.11.006","url":null,"abstract":"","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"12 1","pages":"218-228"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81955757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI Open. Pub Date: 2022-10-01. DOI: 10.2139/ssrn.4245175
Tao Wei, Yonghong Tian, Yaowei Wang, Yun Liang, C. Chen
{"title":"Optimized separable convolution: Yet another efficient convolution operator","authors":"Tao Wei, Yonghong Tian, Yaowei Wang, Yun Liang, C. Chen","doi":"10.2139/ssrn.4245175","DOIUrl":"https://doi.org/10.2139/ssrn.4245175","url":null,"abstract":"The convolution operation is the most critical component in recent surge of deep learning research. Conventional 2D convolution needs O ( C 2 K 2 ) parameters to represent, where C is the channel size and K is the kernel size. The amount of parameters has become really costly considering that these parameters increased tremendously recently to meet the needs of demanding applications. Among various implementations of the convolution, separable convolution has been proven to be more efficient in reducing the model size. For example, depth separable convolution reduces the complexity to O ( C · ( C + K 2 )) while spatial separable convolution reduces the complexity to O ( C 2 K ) . However, these are considered ad hoc designs which cannot ensure that they can in general achieve optimal separation. In this research, we propose a novel and principled operator called optimized separable convolution by optimal design for the internal number of groups and kernel sizes for general separable convolutions can achieve the complexity of O ( C 32 K ) . When the restriction in the number of separated convolutions can be lifted, an even lower complexity at O ( C · log( CK 2 )) can be achieved. Experimental results demonstrate that the proposed optimized separable convolution is able to achieve an improved performance in terms of accuracy-#Params trade-offs over both conventional, depth-wise, and depth/spatial separable convolutions.","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"40 1","pages":"162-171"},"PeriodicalIF":0.0,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85236498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}