The Linguistic Properties of Award-winning Annual Reports
Jacqueline Gagnon, S. Young, P. Alves
Research Methods & Methodology in Accounting eJournal, published 2020-04-14
DOI: 10.2139/ssrn.3575679 (https://doi.org/10.2139/ssrn.3575679)
Abstract
We develop and test a model of high-quality annual report discourse. The model is trained and evaluated on reports published between 2007 and 2018 by London Stock Exchange-listed firms shortlisted for an award by corporate reporting experts. We use methods from computational linguistics to identify an initial set of 19 features that distinguish quality according to what management say (i.e., content) and how they say it (i.e., language structure). We supplement these features with popular bag-of-words proxies drawn from extant research (document length, reading ease, net tone, forward-looking content, and uncertainty). Stepwise regression yields a parsimonious quality model comprising 10 features. These features suggest that higher-quality reports contain more strategy-related commentary, focus less on growth, and use more accessible language that promotes cognitive processing (evidenced by more relevancy markers, greater connectivity, more exclusive forms of language, and fewer grammatical words). The model predicts over 70% of shortlisting cases in out-of-sample tests and outperforms a baseline model comprising popular bag-of-words features.
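To make the bag-of-words baseline concrete, the following is a minimal sketch of how four of the proxies named in the abstract (document length, net tone, uncertainty, and forward-looking content) might be computed from raw text. The word lists, the function names, and the net-tone formula (positive minus negative counts, scaled by their sum) are illustrative assumptions and are not taken from the paper, which would typically rely on much larger published dictionaries. Reading ease is omitted because it requires syllable counting.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists for illustration only;
# they are NOT the dictionaries used in the paper.
POSITIVE = {"strong", "improved", "growth", "success"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}
UNCERTAIN = {"may", "might", "could", "uncertain"}
FORWARD = {"will", "expect", "plan", "outlook"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words_features(text):
    """Return simple bag-of-words proxies for one document."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n = len(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    unc = sum(counts[w] for w in UNCERTAIN)
    fwd = sum(counts[w] for w in FORWARD)
    return {
        # Document length in word tokens.
        "length": n,
        # Assumed net-tone definition: (pos - neg) / (pos + neg).
        "net_tone": (pos - neg) / (pos + neg) if pos + neg else 0.0,
        # Share of tokens that appear in each word list.
        "uncertainty": unc / n if n else 0.0,
        "forward_looking": fwd / n if n else 0.0,
    }


if __name__ == "__main__":
    sample = "We expect strong growth but margins may decline."
    print(bag_of_words_features(sample))
```

Features of this kind ignore word order and sentence structure, which is presumably why the abstract contrasts them with features that capture content and language structure.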