Yifei Ge, Azragul Yusup, Degang Chen, Hongliang Mao, Yingjie Cao
Published in: 2022 9th International Conference on Dependable Systems and Their Applications (DSA), 2022-08-01. DOI: https://doi.org/10.1109/DSA56465.2022.00130
UGDA: Data Augmentation Methods for Uyghur Language Named Entity Recognition
Data augmentation can improve model generalization and is widely used to alleviate overfitting under low-resource or class-imbalanced conditions. However, the noise introduced by traditional data augmentation methods can make named entity recognition (NER) models sensitive and fragile. To address this problem, this paper proposes UGDA, a data augmentation method for Uyghur named entity recognition that refines traditional augmentation methods to improve the quality of the generated samples. Experiments on a self-constructed Uyghur dataset show that the method improves F1 by 2.97% over the $\text{BIGRU}+\text{CRF}$ baseline and by 1.81% over the $\text{CINO}+\text{CRF}$ baseline, and that the generated augmented samples are also applicable to pre-trained models. These results indicate that the proposed method generates diverse, information-rich augmented data and improves performance on the Uyghur named entity recognition task.
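The abstract does not describe how UGDA works, so the following is only a minimal sketch of one commonly used "traditional" augmentation for NER, same-type mention replacement over BIO-tagged tokens, and not the authors' method. The function names (`extract_spans`, `replace_mentions`), the lexicon structure, and the English placeholder tokens are all assumptions made for illustration. It shows why labels must be regenerated together with the tokens: replacing a mention with one of a different length without rewriting the tags is one way augmentation can introduce label noise.

```python
import random
from typing import Dict, List, Tuple

# A sentence is a list of (token, BIO tag) pairs. Hypothetical format,
# not taken from the paper's dataset.
Sentence = List[Tuple[str, str]]


def extract_spans(sentence: Sentence) -> List[Tuple[int, int, str]]:
    """Return (start, end, entity_type) for each entity span; end is exclusive."""
    spans = []
    start, ent_type = None, None
    for i, (_, tag) in enumerate(sentence):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, ent_type))
            start, ent_type = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == ent_type:
            continue  # still inside the current span
        else:
            if start is not None:
                spans.append((start, i, ent_type))
            start, ent_type = None, None
    if start is not None:
        spans.append((start, len(sentence), ent_type))
    return spans


def replace_mentions(
    sentence: Sentence,
    lexicon: Dict[str, List[List[str]]],
    rng: random.Random,
) -> Sentence:
    """Swap each entity mention for a random same-type mention from the lexicon.

    Tags are regenerated for the new mention, so a replacement of a different
    length still yields a consistent BIO sequence.
    """
    out: Sentence = []
    cursor = 0
    for start, end, ent_type in extract_spans(sentence):
        out.extend(sentence[cursor:start])  # copy tokens before the span
        candidates = lexicon.get(ent_type)
        if candidates:
            mention = rng.choice(candidates)
            for j, token in enumerate(mention):
                prefix = "B-" if j == 0 else "I-"
                out.append((token, prefix + ent_type))
        else:
            out.extend(sentence[start:end])  # no candidates: keep the original
        cursor = end
    out.extend(sentence[cursor:])  # copy tokens after the last span
    return out


if __name__ == "__main__":
    # Placeholder data for illustration only.
    example = [("Alim", "B-PER"), ("lives", "O"), ("in", "O"),
               ("New", "B-LOC"), ("York", "I-LOC")]
    mentions = {"PER": [["Aygul", "Tursun"]], "LOC": [["Urumqi"]]}
    print(replace_mentions(example, mentions, random.Random(0)))
```

In practice the lexicon would typically be collected from the training set's own entity mentions, and each augmented sentence would be added to the training data alongside the original.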