Ghezal Ahmad Jan Zia, Ahmad Zia Sharifi, Fazl Ahmad Amini, Niaz Mohammad Ramaki
8th International Conference on Soft Computing, Artificial Intelligence and Applications, published 2019-06-29. DOI: 10.5121/CSIT.2019.90708 (https://doi.org/10.5121/CSIT.2019.90708)
A Comparative Mention-Pair Models for Coreference Resolution in DARI Language for Information Extraction
Coreference resolution plays an important role in Information Extraction. This paper investigates two strategies based on a mention-pair resolver that uses a Decision Tree classifier on structured and unstructured datasets, targeting coreference resolution in the Dari language. The strategies are (1) training separate models, each specialized in particular categories (e.g., lexical, syntactic, and semantic) and types of mentions (e.g., pronouns, proper nouns), and (2) using a structured dataset with a machine learning library that is designed to classify numerical values. Moreover, these modifications and comparative models describe the contribution of the comprehensive set of factors involved in resolving coreference in text. Specifically, we developed the first Dari corpus ('DariCoref') based on the OntoNotes and WikiCoref schemes. Both strategies produced state-of-the-art F-scores.
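
To make the mention-pair idea concrete, the following is a minimal sketch rather than the authors' implementation. It shows how pairs of mentions might be encoded as numeric feature vectors (lexical, syntactic, semantic) and linked closest-first. Everything named here is an assumption for illustration: the `Mention` fields, the four features, the English toy mentions, and `toy_tree`, which is a hand-written stand-in for a trained Decision Tree, not a learned model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    index: int      # position of the mention in the document
    text: str
    kind: str       # "proper", "pronoun" or "nominal" (hypothetical tag set)
    gender: str     # "f", "m" or "unknown"
    sentence: int   # sentence number the mention occurs in


def pair_features(antecedent, anaphor):
    """Encode one (antecedent, anaphor) pair as a numeric feature vector."""
    same_gender = (antecedent.gender == anaphor.gender
                   and anaphor.gender != "unknown")
    return [
        int(antecedent.text.lower() == anaphor.text.lower()),  # lexical: string match
        int(anaphor.kind == "pronoun"),                        # syntactic: mention type
        int(same_gender),                                      # semantic: gender agreement
        anaphor.sentence - antecedent.sentence,                # distance in sentences
    ]


def toy_tree(features):
    """Hand-written stand-in for a trained decision tree (illustrative only)."""
    string_match, is_pronoun, gender_agree, distance = features
    if string_match:
        return True
    if is_pronoun:
        return bool(gender_agree) and distance <= 2
    return False


def resolve(mentions, classify=toy_tree):
    """Closest-first linking: each mention joins its nearest positive antecedent."""
    cluster_of = {}
    next_id = 0
    for j, anaphor in enumerate(mentions):
        for antecedent in reversed(mentions[:j]):
            if classify(pair_features(antecedent, anaphor)):
                cluster_of[anaphor.index] = cluster_of[antecedent.index]
                break
        else:
            # No antecedent accepted: the mention starts a new entity.
            cluster_of[anaphor.index] = next_id
            next_id += 1
    return cluster_of


# Toy English mentions (not taken from DariCoref).
MENTIONS = [
    Mention(0, "Maryam", "proper", "f", 0),
    Mention(1, "Kabul", "proper", "unknown", 0),
    Mention(2, "she", "pronoun", "f", 1),
    Mention(3, "Ahmad", "proper", "m", 1),
    Mention(4, "Maryam", "proper", "f", 2),
]

if __name__ == "__main__":
    print(resolve(MENTIONS))
```

In this sketch, `resolve` takes the classifier as a parameter, so a trained model's predict function could be passed in place of `toy_tree`. Strategy (1) could be approximated by selecting a different classifier depending on `anaphor.kind`, although the abstract does not specify how the specialized models are combined.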