CAILIE 1.0: A dataset for Challenge of AI in Law - Information Extraction V1.0
Yu Cao, Yuanyuan Sun, Ce Xu, Chunnan Li, Jinming Du, Hongfei Lin
AI Open, vol. 3 (2022), pp. 208-212. Published 2022-01-01. DOI: 10.1016/j.aiopen.2022.12.002
A survey on heterogeneous information network based recommender systems: Concepts, methods, applications and resources
Jiawei Liu, Chuan Shi, Cheng Yang, Zhiyuan Lu, Philip S. Yu
AI Open, vol. 3 (2022), pp. 40-57. Published 2022-01-01. DOI: 10.1016/j.aiopen.2022.03.002

Abstract: As an important way to alleviate information overload, a recommender system aims to filter out irrelevant information and provide users with items they may be interested in. In recent years, a growing number of works have introduced auxiliary information into recommender systems to alleviate the data sparsity and cold-start problems. Among them, heterogeneous information network (HIN)-based recommender systems provide a unified approach to fusing various kinds of auxiliary information, which can be combined with mainstream recommendation algorithms to effectively enhance both the performance and the interpretability of models; they have therefore been applied to many kinds of recommendation tasks. This paper provides a comprehensive and systematic survey of HIN-based recommender systems covering four aspects: concepts, methods, applications, and resources. Specifically, we first introduce the concepts related to recommender systems, heterogeneous information networks, and HIN-based recommendation. Second, we present more than 70 methods categorized by model or application scenario, and describe representative methods symbolically. Third, we summarize the benchmark datasets and open-source code. Finally, we discuss several potential research directions and conclude the survey.
Data augmentation approaches in natural language processing: A survey
Bohan Li, Yutai Hou, Wanxiang Che
AI Open, vol. 3 (2022), pp. 71-90. Published 2022-01-01. DOI: 10.1016/j.aiopen.2022.03.001

Abstract: As an effective strategy, data augmentation (DA) alleviates data-scarcity scenarios in which deep learning techniques may fail. It was widely applied in computer vision before being introduced to natural language processing, where it likewise achieves improvements in many tasks. One of the main goals of DA methods is to improve the diversity of the training data, thereby helping the model generalize better to unseen testing data. In this survey, we frame DA methods into three categories based on the diversity of the augmented data: paraphrasing, noising, and sampling. We analyze DA methods in detail according to these categories, introduce their applications in NLP tasks along with the attendant challenges, and provide useful resources in Appendix A.
PTR: Prompt Tuning with Rules for Text Classification
Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, Maosong Sun
AI Open, vol. 3 (2022), pp. 182-192. Published 2022-01-01. DOI: 10.1016/j.aiopen.2022.11.003

Abstract: Recently, prompt tuning has been widely applied to stimulate the rich knowledge in pre-trained language models (PLMs) to serve NLP tasks. Although prompt tuning has achieved promising results on some few-class classification tasks, such as sentiment classification and natural language inference, manually designing prompts is cumbersome, and generating prompts automatically is also difficult and time-consuming. Obtaining effective prompts for complex many-class classification tasks therefore remains a challenge. In this paper, we propose to encode the prior knowledge of a classification task into rules, design a sub-prompt for each rule, and finally combine the sub-prompts to handle the task. We name this Prompt Tuning method with Rules "PTR". Compared with existing prompt-based methods, PTR achieves a good trade-off between effectiveness and efficiency in building prompts. We conduct experiments on three many-class classification tasks: relation classification, entity typing, and intent classification. The results show that PTR outperforms both vanilla and prompt tuning baselines, indicating the effectiveness of utilizing rules for prompt tuning. The source code of PTR is available at https://github.com/thunlp/PTR.
StackVAE-G: An efficient and interpretable model for time series anomaly detection
Wenkai Li, Wenbo Hu, Ting Chen, Ning Chen, Cheng Feng
AI Open, pp. 101-110. Published 2021-05-18. DOI: 10.1016/j.aiopen.2022.07.001
Heterogeneous graph knowledge enhanced stock market prediction
Kai Xiong, Xiao Ding, Li Du, Ting Liu, Bing Qin
AI Open, vol. 2 (2021), pp. 168-174. Published 2021-01-01. DOI: 10.1016/j.aiopen.2021.09.001

Abstract: We focus on stock market prediction from financial text, which contains information that can influence stock market movements. Previous works mainly exploit a single semantic unit of the text, such as words, events, or sentences, to predict the market's tendency. However, the interaction of information at different granularities within financial text can supplement contextual knowledge and help select predictive information, thereby improving stock market prediction. To exploit this, we propose constructing a heterogeneous graph whose nodes carry information of different granularities drawn from the financial text, and we present a novel heterogeneous neural network to aggregate the multi-grained information. Experimental results demonstrate that our approach outperforms the baselines.
{"title":"A review of deep learning in question answering over knowledge bases","authors":"Chen Zhang , Yuxuan Lai , Yansong Feng , Dongyan Zhao","doi":"10.1016/j.aiopen.2021.12.001","DOIUrl":"10.1016/j.aiopen.2021.12.001","url":null,"abstract":"<div><p>Question answering over knowledge bases (KBQA) is a challenging task in natural language processing. It requires machines to answer natural language questions based on large-scale knowledge bases. Recent years have witnessed remarkable success of neural network models on many natural language processing tasks, including KBQA. In this paper, we first review the recent advances of deep learning methods on solving simple questions in two streams, the information extraction style and semantic parsing style. We then introduce how to extend the neural architectures to answer more complex questions with iteration and decomposition techniques, and summarize current research challenges.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 205-215"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666651021000292/pdfft?md5=eb6c1b2ea9296d53ba86dfc7d7ce5213&pid=1-s2.0-S2666651021000292-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74007285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discrete and continuous representations and processing in deep learning: Looking forward
Ruben Cartuyvels, Graham Spinks, Marie-Francine Moens
AI Open, vol. 2 (2021), pp. 143-159. Published 2021-01-01. DOI: 10.1016/j.aiopen.2021.07.002

Abstract: Discrete and continuous representations of content (e.g., of language or images) have distinct properties that are worth exploring for machine understanding of, and reasoning with, that content. This position paper puts forward our opinion on the roles of discrete and continuous representations, and of their processing, in the field of deep learning. Current neural network models compute with continuous-valued data, compressing information into dense, distributed embeddings. In stark contrast, humans communicate with language using discrete symbols. Such symbols represent a compressed version of the world that derives its meaning from shared contextual information. In addition, human reasoning involves symbol manipulation at the cognitive level, which facilitates abstract reasoning, the composition of knowledge and understanding, generalization, and efficient learning. Motivated by these insights, we argue that combining discrete and continuous representations, and their processing, will be essential to building systems that exhibit a general form of intelligence. We suggest and discuss several avenues by which current neural networks could be improved through the inclusion of discrete elements, so as to combine the advantages of both types of representation.
{"title":"CPM: A large-scale generative Chinese Pre-trained language model","authors":"Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Maosong Sun","doi":"10.1016/j.aiopen.2021.07.001","DOIUrl":"10.1016/j.aiopen.2021.07.001","url":null,"abstract":"<div><p>Pre-trained Language Models (PLMs) have proven to be beneficial for various downstream NLP tasks. Recently, GPT-3, with 175 billion parameters and 570 GB training data, drew a lot of attention due to the capacity of few-shot (even zero-shot) learning. However, applying GPT-3 to address Chinese NLP tasks is still challenging, as the training corpus of GPT-3 is primarily English, and the parameters are not publicly available. In this technical report, we release the Chinese Pre-trained Language Model (CPM) with generative pre-training on large-scale Chinese training data. To the best of our knowledge, CPM, with 2.6 billion parameters and 100 GB Chinese training data, is the largest Chinese pre-trained language model, which could facilitate several downstream Chinese NLP tasks, such as conversation, essay generation, cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many NLP tasks in the settings of few-shot (even zero-shot) learning. The code and parameters are available at <span>https://github.com/TsinghuaAI/CPM</span><svg><path></path></svg>.</p></div>","PeriodicalId":100068,"journal":{"name":"AI Open","volume":"2 ","pages":"Pages 93-99"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/j.aiopen.2021.07.001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90523293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}