{"title":"基于语料库不可知模型的引文检测与分类","authors":"Sean Papay, Sebastian Padó","doi":"10.26615/978-954-452-056-4_103","DOIUrl":null,"url":null,"abstract":"The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorpo- rate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state-of-the-art on the corpora when using established feature sets and (b) shows reasonable performance even when us- ing solely word forms, which makes it applicable for non-standard (i.e., historical) corpora.","PeriodicalId":284493,"journal":{"name":"Recent Advances in Natural Language Processing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Quotation Detection and Classification with a Corpus-Agnostic Model\",\"authors\":\"Sean Papay, Sebastian Padó\",\"doi\":\"10.26615/978-954-452-056-4_103\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorpo- rate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state-of-the-art on the corpora when using established feature sets and (b) shows reasonable performance even when us- ing solely word forms, which makes it applicable for non-standard (i.e., historical) corpora.\",\"PeriodicalId\":284493,\"journal\":{\"name\":\"Recent Advances in Natural Language Processing\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Recent Advances in Natural Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.26615/978-954-452-056-4_103\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recent Advances in Natural Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26615/978-954-452-056-4_103","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Quotation Detection and Classification with a Corpus-Agnostic Model
The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorpo- rate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state-of-the-art on the corpora when using established feature sets and (b) shows reasonable performance even when us- ing solely word forms, which makes it applicable for non-standard (i.e., historical) corpora.