A Hybrid Approach to Paraphrase Detection
Phuc H. Duong, Hien T. Nguyen, H. Duong, K. Ngo, D. Ngo
2018 5th NAFOSTED Conference on Information and Computer Science (NICS), November 2018
DOI: 10.1109/NICS.2018.8606845
Citations: 8
Abstract
In this paper, we present a hybrid approach to paraphrase detection that combines feature engineering with neural network methods. First, we represent the words and entities in a given sentence by their pre-trained vectors. These vectors are then encoded by a bidirectional long short-term memory (BiLSTM) network. The resulting output matrix is fed into an attention network to obtain an attention vector. The final sentence representation is the product of the output matrix and the attention vector, that is, an attention-weighted sum of the BiLSTM hidden states. We conduct experiments on the Microsoft Research Paraphrase Corpus (MSRP), a widely used benchmark for paraphrase detection. The experimental results show that our approach achieves competitive results.
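To make the pooling step concrete, here is a minimal pure-Python sketch. It assumes the BiLSTM output matrix H (one row per token) has already been computed, and it assumes the simplest scoring form e_t = w · tanh(h_t) with a single learned vector w. The abstract does not specify the attention function, so the function names (`attention_weights`, `sentence_vector`, `cosine`) and the toy numbers are hypothetical. The cosine comparison of two sentence vectors only illustrates how a pair could be compared; it is not necessarily the classifier used in the paper.

```python
import math
from typing import List

Matrix = List[List[float]]  # n rows (tokens) x k columns (BiLSTM hidden size)
Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(H: Matrix, w: Vector) -> Vector:
    """Score each token state h_t and normalise the scores with softmax.

    ASSUMPTION: the score is e_t = w . tanh(h_t); the paper's exact
    attention network (e.g. an extra projection matrix) is not specified.
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h_t)) for h_t in H]
    return softmax(scores)


def sentence_vector(H: Matrix, a: Vector) -> Vector:
    """Compute H^T a: the attention-weighted sum of the token states."""
    k = len(H[0])
    return [sum(a[t] * H[t][j] for t in range(len(H))) for j in range(k)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; a hypothetical way to compare two sentences."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)


if __name__ == "__main__":
    # Hypothetical BiLSTM outputs: sentence 1 has 3 tokens, sentence 2 has 2.
    H1 = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.4, 0.2], [0.0, 0.3, 0.2, -0.1]]
    H2 = [[0.2, -0.1, 0.3, 0.1], [0.4, 0.0, -0.3, 0.2]]
    w = [0.7, -0.3, 0.5, 0.2]  # hypothetical learned scoring vector
    r1 = sentence_vector(H1, attention_weights(H1, w))
    r2 = sentence_vector(H2, attention_weights(H2, w))
    print("similarity:", round(cosine(r1, r2), 3))
```

Because the weights come from a softmax, they are positive and sum to 1, so the sentence vector is a convex combination of the token states and has the same dimension regardless of sentence length.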