Meishan Zhang, Gongyao Jiang, Shuang Liu, Jing Chen, Min Zhang
{"title":"LLM–Assisted Data Augmentation for Chinese Dialogue–Level Dependency Parsing","authors":"Meishan Zhang, Gongyao Jiang, Shuang Liu, Jing Chen, Min Zhang","doi":"10.1162/coli_a_00515","DOIUrl":null,"url":null,"abstract":"Dialogue–level dependency parsing, despite its growing academic interest, often encounters underperformance issues due to resource shortages. A potential solution to this challenge is data augmentation. In recent years, large language models (LLMs) have demonstrated strong capabilities in generation which can facilitate data augmentation greatly. In this study, we focus on Chinese dialogue–level dependency parsing, presenting three simple and effective strategies with LLM to augment the original training instances, namely word–level, syntax–level and discourse–level augmentations, respectively. These strategies enable LLMs to either preserve or modify dependency structures, thereby assuring accuracy while increasing the diversity of instances at different levels. We conduct experiments on the benchmark dataset released by Jiang et al. (2023) to validate our approach. Results show that our method can greatly boost the parsing performance in various settings, particularly in dependencies among elementary discourse units (EDUs). Lastly, we provide in–depth analysis to show the key points of our data augmentation strategies.","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"72 1","pages":""},"PeriodicalIF":9.3000,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Linguistics","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/coli_a_00515","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Dialogue–level dependency parsing, despite its growing academic interest, often encounters underperformance issues due to resource shortages. A potential solution to this challenge is data augmentation. In recent years, large language models (LLMs) have demonstrated strong capabilities in generation which can facilitate data augmentation greatly. In this study, we focus on Chinese dialogue–level dependency parsing, presenting three simple and effective strategies with LLM to augment the original training instances, namely word–level, syntax–level and discourse–level augmentations, respectively. These strategies enable LLMs to either preserve or modify dependency structures, thereby assuring accuracy while increasing the diversity of instances at different levels. We conduct experiments on the benchmark dataset released by Jiang et al. (2023) to validate our approach. Results show that our method can greatly boost the parsing performance in various settings, particularly in dependencies among elementary discourse units (EDUs). Lastly, we provide in–depth analysis to show the key points of our data augmentation strategies.
期刊介绍:
Computational Linguistics is the longest-running publication devoted exclusively to the computational and mathematical properties of language and the design and analysis of natural language processing systems. This highly regarded quarterly offers university and industry linguists, computational linguists, artificial intelligence and machine learning investigators, cognitive scientists, speech specialists, and philosophers the latest information about the computational aspects of all the facets of research on language.