Automatic Detection and Visualization of Information Structure in English
J. Blake, Evgeny Pyshkin, Šimon Pavlík
Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval, published 2022-12-16
DOI: 10.1145/3582768.3582784 (https://doi.org/10.1145/3582768.3582784)
Citation count: 1
Abstract
This paper describes the design and development of an online tool that identifies and visualizes information structure in user-submitted texts written in English. Non-native users of English find it difficult to distinguish between structures that are marked and unmarked. Markedness is evaluated based on acceptability and frequency of a sequence of word tokens. Marked sentences stand out as being unnatural to native speakers, but few native speakers can explain why. Information structure can, however, frequently explain markedness. The tool detects the three principles of information structure: information focus, information flow and end weight. Information focus explains the sequence of elements within sentences. Information flow explains the sequence of elements within paragraphs. End weight explains the relative position of phrases and clauses within a sentence. Through exposure to these principles in context, this tool aims to help writers of English understand which structural language features may be judged as marked.
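The abstract states that markedness is evaluated from the acceptability and frequency of a sequence of word tokens. As a rough illustration of the frequency half of that idea only, the sketch below counts adjacent token pairs (bigrams) in a tiny reference corpus and reports what fraction of a sentence's bigrams never occur there. This is not the tool's actual method: the corpus, the function names, the whitespace tokenization and the choice of bigrams are all assumptions made for the example.

```python
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent token pairs in a token sequence."""
    return list(zip(tokens, tokens[1:]))


def build_bigram_counts(reference_sentences):
    """Count bigrams over a reference corpus of plain-text sentences."""
    counts = Counter()
    for sentence in reference_sentences:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        counts.update(bigrams(sentence.lower().split()))
    return counts


def unseen_bigram_ratio(sentence, counts):
    """Fraction of the sentence's bigrams that never occur in the reference corpus."""
    pairs = bigrams(sentence.lower().split())
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if counts[pair] == 0)
    return unseen / len(pairs)


# Toy reference corpus, invented purely for illustration (assumption).
REFERENCE = [
    "she gave the book to him",
    "he gave the book to her",
    "she gave him the book",
]

counts = build_bigram_counts(REFERENCE)

# A conventional word order versus a reordered variant of the same words.
unmarked = unseen_bigram_ratio("she gave the book to him", counts)
marked = unseen_bigram_ratio("to him the book she gave", counts)

print(f"unmarked ratio: {unmarked:.2f}")
print(f"marked ratio:   {marked:.2f}")
```

In this toy setting the reordered sentence receives the higher ratio because it contains a token pair absent from the reference sentences. A realistic estimate would need a large corpus and longer token sequences, and frequency alone says nothing about acceptability, which the abstract names as a separate criterion.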