Authorship attribution for textual data on online social networks
Ritu Banga, Pulkit Mehndiratta
2017 Tenth International Conference on Contemporary Computing (IC3), 2017-08-01
DOI: 10.1109/IC3.2017.8284311 (https://doi.org/10.1109/IC3.2017.8284311)
Citations: 8
Abstract
Authorship attribution (AA) is the process of determining which author, among a list of suspected authors, wrote a particular document. The problem has been studied for the last six decades, dating back to when handwritten documents had to be matched to their genuine author. With advances in technology and the rise of cybercrime and other unlawful activities, AA has become of foremost importance for tracing the author behind online messages. Over many years, research has attributed authorship on the basis of writing style, since every author shows distinctive traits when writing a document. This paper presents a comparative study of various machine learning approaches on different feature sets for authorship attribution of short texts. A Twitter dataset of 10 prolific authors is used for the comparison, with varying sample sizes and various combinations of feature sets. The results reflect the significance and impact of combining features when inferring different stylometric features. The different approaches are compared on the basis of their accuracy and precision values.
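As a rough illustration of the kind of pipeline the abstract describes, the sketch below extracts a few stylometric features from short texts, attributes each text to the nearest author profile, and scores the result with accuracy and precision. The abstract does not name the paper's features or classifiers, so everything here is an assumption chosen for brevity: the five features, the nearest-centroid rule, and all function names are hypothetical and are not the authors' method.

```python
import string
from collections import defaultdict


def stylometric_features(text):
    """Map a short text to a small vector of illustrative style features.

    The feature choice is an assumption; the paper's feature sets are not
    listed in the abstract.
    """
    words = text.split()
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                  # mean word length
        sum(c in string.punctuation for c in text) / n_chars,  # punctuation rate
        sum(c.isupper() for c in text) / n_chars,              # uppercase rate
        sum(c.isdigit() for c in text) / n_chars,              # digit rate
        len({w.lower() for w in words}) / n_words,             # type-token ratio
    ]


def train_centroids(samples):
    """Build one mean feature vector per author from (author, text) pairs."""
    grouped = defaultdict(list)
    for author, text in samples:
        grouped[author].append(stylometric_features(text))
    return {
        author: [sum(col) / len(col) for col in zip(*vectors)]
        for author, vectors in grouped.items()
    }


def predict(centroids, text):
    """Attribute text to the author with the nearest centroid.

    Uses squared Euclidean distance on unscaled features, a stand-in for
    the machine learning approaches compared in the paper.
    """
    feats = stylometric_features(text)
    return min(
        centroids,
        key=lambda a: sum((x - y) ** 2 for x, y in zip(feats, centroids[a])),
    )


def accuracy(y_true, y_pred):
    """Fraction of texts attributed to the correct author."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, author):
    """Of the texts attributed to `author`, the fraction truly theirs."""
    attributed = [t for t, p in zip(y_true, y_pred) if p == author]
    if not attributed:
        return 0.0
    return sum(t == author for t in attributed) / len(attributed)
```

To compare feature combinations in the spirit of the study, one could rerun training and scoring with different subsets of the feature vector and tabulate accuracy and per-author precision for each subset. The features are left unscaled here, so mean word length dominates the distance; a real comparison would normalize them first.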