Measuring authorship legitimacy by statistical linguistic modelling

2018 International Conference on Advancements in Computational Sciences (ICACS) | Pub Date: 2018-02-01 | DOI: 10.1109/ICACS.2018.8333276

Ghazanfar Hussain, A. Husnain, Rida Zahra, S. M. U. Din


Citations: 0

Abstract

Smart text spinning and paid content writing have jeopardized authorship identity in literary spheres. Authors frequently outsource their work to freelance writers or forge a new piece of writing by using text spinners. These activities seemingly go unnoticed by readers. In this paper, we propose a way of uncovering true authorship by sampling a statistical model of writing features. We acquire a dataset of original work from a group of authors and perform feature vector analysis to formulate a profile for every author. The profile includes normalized lexical and grammatical components derived from the sample space of the dataset. Based upon those features, when a new piece of writing is supplied, the algorithm extracts the relevant components, assigns associative weights, and classifies the writing with respect to the author. The algorithm adjusts the weights automatically for swift convergence and precise classification. So far our system achieves 100% accuracy for texts above a certain word count. We have tested it on various text models, spun texts, and plagiarised content, and the performance of our algorithm has been very promising. It can be of great help to academia and professional publishing houses.
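The profile-and-classify idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not list the features, the weighting scheme, or the classifier, so everything below is an assumption made for illustration. The feature set (mean word length, type-token ratio, mean sentence length, comma rate, function-word frequencies), the names `extract_features`, `build_profile`, and `classify`, and the nearest-profile rule with fixed weights are all hypothetical stand-ins.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: the abstract does not name the actual features.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "but")


def extract_features(text):
    """Return a normalized feature vector for one text sample."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)
    features = [
        sum(len(w) for w in words) / n_words / 10.0,         # mean word length, scaled
        len(counts) / n_words,                                # type-token ratio
        min(n_words / max(len(sentences), 1) / 50.0, 1.0),    # mean sentence length, scaled
        text.count(",") / n_words,                            # comma rate (crude grammatical cue)
    ]
    # Relative frequency of each function word (lexical cues).
    features.extend(counts[w] / n_words for w in FUNCTION_WORDS)
    return features


def build_profile(samples):
    """Average the feature vectors of an author's known samples."""
    vectors = [extract_features(s) for s in samples]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(text, profiles, weights=None):
    """Return the author whose profile is closest in weighted Euclidean distance."""
    vec = extract_features(text)
    # Assumption: fixed uniform weights; the paper's adaptive weighting is not described.
    weights = weights or [1.0] * len(vec)

    def distance(profile):
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, vec, profile)))

    return min(profiles, key=lambda author: distance(profiles[author]))
```

A caller would build one profile per author from known samples, for example `profiles = {"alice": build_profile(alice_texts), "bob": build_profile(bob_texts)}`, and then call `classify(new_text, profiles)`. The weights are held fixed here; a system like the one described would presumably tune them on labelled data, which this sketch leaves out.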
