Authorship Verification via Linear Correlation Methods of n-gram and Syntax Metrics

2022 Intermountain Engineering, Technology and Computing (IETC). Pub Date: 2022-05-01. DOI: 10.1109/ietc54973.2022.9796736

Jared Nelson, Mohammad Shekaramiz

Authorship Verification via Linear Correlation Methods of n-gram and Syntax Metrics

This research evaluates the accuracy of two methods of authorship prediction, syntactical analysis and n-grams, and explores their potential uses. The proposed algorithm measures n-grams and counts adjectives, adverbs, verbs, nouns, and punctuation, along with sentence length, from the training data, then normalizes each metric. It compares the metrics of training samples with those of testing samples and predicts authorship from the correlation they share for each metric. The strength of the correlation between the testing and training data carries significant weight in the decision-making process. For example, if the analysis of one metric approaches 100% positive correlation, that metric is assigned the maximum weight in the decision. Conversely, a 100% negative correlation receives the minimum value. This new method of authorship validation holds promise for future innovation in fraud protection, the study of historical documents, and maintaining integrity within academia.
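
The correlation-to-weight idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the feature definitions or the weighting formula, so everything below is an assumption made for illustration. In particular, the sketch assumes character bigrams as the n-gram metric, a fixed set of punctuation marks, Pearson correlation as the linear correlation, and a linear map from r in [-1, 1] to a weight in [0, 1]. Part-of-speech counts and sentence length are omitted because they would need a tagger and a sentence splitter. All function names are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?'\"-()"  # assumed set of marks; the abstract does not list them


def char_ngram_profile(text, n=2):
    """Normalized character n-gram frequencies (assumed n-gram metric)."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def punctuation_profile(text):
    """Normalized frequency of each punctuation mark."""
    marks = Counter(ch for ch in text if ch in PUNCTUATION)
    total = sum(marks.values())
    return {m: c / total for m, c in marks.items()} if total else {}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def profile_correlation(p, q):
    """Correlate two frequency profiles over the union of their keys."""
    keys = sorted(set(p) | set(q))
    return pearson([p.get(k, 0.0) for k in keys], [q.get(k, 0.0) for k in keys])


def correlation_weight(r):
    """Assumed linear map: r = +1 gives the maximum weight 1.0, r = -1 gives 0.0."""
    return (r + 1.0) / 2.0


def same_author_score(train_text, test_text):
    """Average the per-metric weights into one score in [0, 1]."""
    metrics = [char_ngram_profile, punctuation_profile]
    weights = [
        correlation_weight(profile_correlation(m(train_text), m(test_text)))
        for m in metrics
    ]
    return sum(weights) / len(weights)
```

A caller would compare `same_author_score(known_text, questioned_text)` against a threshold chosen on held-out data; the equal averaging of metrics and any particular threshold are likewise assumptions, not details taken from the paper.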
