Text coherence new method using word2vec sentence vectors and most likely n-grams
Authors: Mohamad Abdolahi Kharazmi, M. Kharazmi
Published in: 2017 3rd Iranian Conference on Intelligent Systems and Signal Processing (ICSPIS), 2017-12-01
DOI: 10.1109/ICSPIS.2017.8311598 (https://doi.org/10.1109/ICSPIS.2017.8311598)
Citations: 7
Abstract
Evaluating discourse coherence remains a challenging task across Natural Language Processing subfields. Most proposed approaches focus on feature engineering, relying on sophisticated features to capture the logical, syntactic, or semantic relationships between the sentences of a text. This paper investigates the automatic evaluation of text coherence. We introduce a fully automatic, rich statistical model of local and global coherence that uses the word2vec approach to assess the coherence of a document. Our modeling approach relies on numerical vectors derived from the word2vec algorithm applied to a very large collection of texts. We combine the word2vec vectors and most likely n-grams with cohesive LD-n-grams perplexity to assess the coherence and topic integrity of a document. We present experimental results that assess the model's predictive power and indicate that it does not depend on the language or its semantic concepts, so it can be applied to any language. Our model achieves state-of-the-art performance in coherence evaluation and the order discrimination task on two datasets widely used by previous methods.
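As a rough illustration of the sentence-vector idea mentioned in the abstract, the following minimal sketch scores a text by the mean cosine similarity between adjacent sentence vectors. It is not the authors' model: the toy embeddings, the averaging of word vectors into a sentence vector, and the names `TOY_VECTORS`, `sentence_vector`, and `local_coherence` are all assumptions made for the example, and the n-gram perplexity component is left out entirely.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only. A real setup
# would load vectors trained with word2vec on a large corpus.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.05, 0.15, 0.85],
}


def sentence_vector(sentence, vectors=TOY_VECTORS):
    """Average the vectors of known words; return None if no word is known."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def local_coherence(sentences, vectors=TOY_VECTORS):
    """Mean cosine similarity between adjacent sentence vectors."""
    vecs = [sentence_vector(s, vectors) for s in sentences]
    vecs = [v for v in vecs if v is not None]  # skip sentences with no known word
    if len(vecs) < 2:
        return 0.0
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    ordered = ["the cat is a pet", "the dog is a pet", "the stock market fell"]
    shuffled = ["the cat is a pet", "the stock market fell", "the dog is a pet"]
    # Order discrimination in miniature: compare the score of the original
    # sentence order with the score of a permuted order.
    print(local_coherence(ordered), local_coherence(shuffled))
```

Comparing the score of an original text with that of a permuted copy mirrors, in a very simplified form, the order discrimination task named in the abstract.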