Authorship classification techniques: Bridging textual domains and languages

IF 1 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal on Information Technologies and Security, Pub Date: 2024-03-01, DOI: 10.59035/ukbe1226

Arta Misini, A. Kadriu, Ercan Canhasi

Citations: 0

Abstract

Authorship classification analyzes an author's prior work to identify their writing style, a unique trait of each language and individual author. This research aims to conduct a thorough comparative analysis of various methods for classifying authorship. The study leverages two corpora: AAALitCorpus of Albanian literary texts and CCAT10 of English columns. We evaluate model-generated features across different configurations. The richness of the features and the breadth of the analysis provide a significant understanding of the problem, setting a new standard for comprehensive linguistic investigations across multiple languages. The study indicates that machine learning algorithms accurately discern authorial writing styles, highlighting the complexities of classifying authorship in a cross-linguistic context.
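To make the general idea concrete, the following is a minimal sketch of one classic approach to authorship classification: building a character n-gram profile per author and assigning a new text to the most similar profile by cosine similarity. This is an illustrative assumption, not the method or feature set evaluated in the study; the function names (`char_ngrams`, `train_profiles`, `predict_author`) and the toy samples are hypothetical and are not drawn from AAALitCorpus or CCAT10.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(samples, n=3):
    """Build one aggregated n-gram profile per author from (author, text) pairs."""
    profiles = {}
    for author, text in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(text, n))
    return profiles


def predict_author(profiles, text, n=3):
    """Return the author whose profile is most similar to the given text."""
    query = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


# Toy, made-up samples (hypothetical; not taken from either corpus).
samples = [
    ("author_a", "the sea was calm and the sea was grey and the sea was wide"),
    ("author_b", "markets rallied sharply; investors cheered, markets climbed"),
]
profiles = train_profiles(samples)
print(predict_author(profiles, "the sea was cold"))
```

Character n-grams are used here because they need no language-specific tokenizer, which is convenient when the same pipeline has to run on both Albanian and English text. A realistic experiment would use far more text per author and a held-out evaluation split.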


