Authorship Verification via Linear Correlation Methods of n-gram and Syntax Metrics

Jared Nelson, Mohammad Shekaramiz
Published in: 2022 Intermountain Engineering, Technology and Computing (IETC), 2022-05-01
DOI: 10.1109/ietc54973.2022.9796736 (https://doi.org/10.1109/ietc54973.2022.9796736)

Abstract

This research evaluates the accuracy of two methods of authorship prediction, syntactic analysis and n-gram analysis, and explores their potential uses. The proposed algorithm measures n-grams and counts adjectives, adverbs, verbs, nouns, and punctuation, along with sentence length, in the training data, then normalizes each metric. It compares the metrics of the training samples with those of the testing samples and predicts authorship from the correlation they share on each metric. The strength of the correlation between the testing and training data determines how much weight each metric carries in the decision. For example, if a metric shows a correlation approaching 100% positive, that metric is assigned the maximum weight in the decision; conversely, a 100% negative correlation receives the minimum weight. This new method of authorship verification holds promise for future innovation in fraud protection, the study of historical documents, and maintaining integrity within academia.
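The correlation-to-weight idea in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the correlation formula, the n-gram type, the normalization, or the decision rule. The sketch assumes Pearson correlation, character bigrams, sum-to-one normalization, a linear map from correlation r to a weight (r + 1) / 2, an equal-weight average across metrics, and a hypothetical threshold of 0.75. Part-of-speech counts and sentence length are left out because they would need a tagger and a sentence splitter.

```python
import math
from collections import Counter


def normalize(counts):
    """Scale raw counts so they sum to 1 (assumed normalization)."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if undefined."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return max(-1.0, min(1.0, cov / (sx * sy)))


def correlation_weight(r):
    """Assumed linear map: r = +1 gives weight 1.0, r = -1 gives 0.0."""
    return (r + 1) / 2


def char_ngrams(text, n):
    """All overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def punctuation_counts(text, marks=".,;:!?"):
    """Count of each punctuation mark, in a fixed order."""
    return [text.count(m) for m in marks]


def verify(train_text, test_text, n=2, threshold=0.75):
    """Return (score, same_author) from per-metric correlation weights."""
    train_grams = Counter(char_ngrams(train_text, n))
    test_grams = Counter(char_ngrams(test_text, n))
    vocab = sorted(set(train_grams) | set(test_grams))
    metrics = {
        "ngram": ([train_grams[g] for g in vocab],
                  [test_grams[g] for g in vocab]),
        "punctuation": (punctuation_counts(train_text),
                        punctuation_counts(test_text)),
    }
    weights = {
        name: correlation_weight(pearson(normalize(a), normalize(b)))
        for name, (a, b) in metrics.items()
    }
    # Equal-weight average of the metric weights (an assumption).
    score = sum(weights.values()) / len(weights)
    return score, score >= threshold
```

Calling `verify(known_text, disputed_text)` returns a score between 0 and 1 together with a boolean decision. In practice the threshold would have to be tuned on labeled samples, and the remaining metrics from the abstract would be added as further entries in `metrics`.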