{"title":"Assessment of an annotation method for the detection of Spanish argumentative, non-argumentative, and their components","authors":"Yudi Guzmán-Monteza","doi":"10.1016/j.teler.2023.100068","DOIUrl":null,"url":null,"abstract":"<div><p>There are many annotation methods for the English language based on adapting an argumentation model to the study domain. However, to the best of our knowledge, there are no annotation methods for detecting argumentative content in Spanish, not only because of the complexity of identifying evidence but also because of the lack of data available for this task. This research aims to present and evaluate an annotation method consisting of an adapted argumentation model, an annotation guide, and an annotation process based on Twitter data analysis. The Inter-Annotator Agreement (IAA) study achieves a Fleiss Kappa of 0.63 for Argument/Non-Argument tagging, 0.35 for Argument Component tagging, and 0.53 for Non-Argument Component tagging, while the best Cohen's kappa (k) indices achieved were 0.73, 0.52, and 0.75, respectively. The assessment of the results highlights the need to include linguistic segmentation rules for the second annotation task; it is crucial to use discourse markers for claim and evidence detection. For the first annotation task, it was determined that if the prevalence index and the bias index are both very low, the prevalence index predominates over the bias index because k increases (0.52<=<em>k</em><=0.72); likewise, for the third annotation task, when the observed agreement index is almost perfect (0.92), the value of k increases (<em>k</em>=0.75) despite a high prevalence index and a low bias index. The annotated corpus with a Fleiss Kappa >= 0.60, agreement and disagreement tables, and confusion-matrix code are available on the Mendeley Data repository.</p></div>","PeriodicalId":101213,"journal":{"name":"Telematics and Informatics Reports","volume":"11 ","pages":"Article 100068"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Telematics and Informatics Reports","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772503023000282","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
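The abstract's interplay between Cohen's kappa, observed agreement, and the prevalence and bias indices can be illustrated with a two-rater 2x2 agreement table. The sketch below is not from the paper; it follows the standard definitions of the prevalence index (|a-d|/n) and bias index (|b-c|/n) from Byrt, Bishop & Carlin (1993), and the function name is illustrative:

```python
def agreement_stats(a, b, c, d):
    """Cohen's kappa plus prevalence and bias indices for a 2x2 table.

    a: both annotators say "Argument"      b: annotator 1 only
    c: annotator 2 only                    d: both say "Non-Argument"
    """
    n = a + b + c + d
    # Observed agreement: proportion of items both annotators labeled alike.
    p_o = (a + d) / n
    # Chance agreement from the annotators' marginal label frequencies.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    # Prevalence index: imbalance between the two agreement cells.
    prevalence = abs(a - d) / n
    # Bias index: how differently the two annotators use the labels.
    bias = abs(b - c) / n
    return kappa, p_o, prevalence, bias

# Balanced example: high observed agreement, zero prevalence and bias.
kappa, p_o, prevalence, bias = agreement_stats(46, 4, 4, 46)
print(kappa, p_o, prevalence, bias)  # 0.84, 0.92, 0.0, 0.0
```

With hypothetical skewed counts (e.g. `agreement_stats(88, 4, 4, 4)`) the same observed agreement of 0.92 yields a much lower kappa, which is the prevalence effect the abstract alludes to.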