From Sentences to Scope Relations and Backward

International Workshop on Natural Language Processing and Cognitive Science. Pub Date: 2016-12-06. DOI: 10.5220/0003017401000111

Gábor Alberti, Márton Károly, J. Kleiber


Citations: 4

Abstract


From Sentences to Scope Relations and Backward

As we strive for sophisticated machine translation and reliable information extraction, we have launched a subproject aimed at revealing reference and information structure in (Hungarian) declarative sentences. The crucial part of information extraction is a procedure whose input is a sentence and whose output is an information structure, which is in practice a set of possible operator scope orders (acceptance). A similar procedure forms the first half of machine translation as well: we need the information structure of the source-language sentence. The opposite procedure then follows (generation), whose input is an information structure and whose output is an intoned word sequence, that is, a sentence in the target language. We can base the procedure of acceptance (in the above sense) upon that of generation, owing to the reversibility of Prolog mechanisms. And as our approach to grammar is “totally lexicalist”, the lexical description of verbs is responsible for the order and intonation of words in the generated sentence.

1 Generating and Accepting Hungarian Sentences

As we strive for a sophisticated level of machine translation and reliable information extraction, we have launched a subproject aimed at revealing reference and information structure in declarative sentences. We are primarily working with data from Hungarian, which is known to be a language with a very rich and explicit information structure (consisting of different types of topics, quantifiers and foci) [1], [2], [3], [4] and an equally explicit system of four degrees of referentiality [5], [6], [7], [8], including the indefinite specific degree [9], [10].
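The idea of basing acceptance on generation can be illustrated outside Prolog by a generate-and-test loop. The following is only a minimal sketch under loud assumptions: the operator labels, the `generate` and `accept` functions, and the rule that surface order mirrors scope order are all hypothetical toy choices made for illustration, not the authors' grammar or implementation.

```python
from itertools import permutations

# Hypothetical toy inventory: operator labels only, no real Hungarian lexicon.
OPERATORS = ("topic", "quantifier", "focus")


def generate(scope_order):
    """Toy generation: turn a scope order into an 'intoned word sequence'.

    Assumption for illustration only: surface order mirrors scope order,
    and the focus is the only element carrying focus stress.
    """
    return tuple(
        (op, "FOCUS-STRESSED" if op == "focus" else "STRESSED")
        for op in scope_order
    )


def accept(sentence):
    """Toy acceptance built on generation: return every scope order
    whose generated sentence is identical to the input."""
    target = tuple(sentence)
    return [order for order in permutations(OPERATORS)
            if generate(order) == target]
```

Here `accept` never parses anything itself; it reuses `generate` and filters its outputs, which is the sense in which one direction is derived from the other. In Prolog the same effect comes from running one relation with different arguments left unbound, rather than from explicit enumeration.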
The kind of input we consider is an ordered set of (Hungarian) words, each furnished with one of four stress marks (“unstressed” / “STRESSED” / “FOCUS-STRESSED” / “↑CONTRASTIVELY STRESSED↓”). Our program decides whether they constitute a well-formed sentence at all, with arguments of appropriate degrees of referentiality and a possible information structure, and delivers these semantic data, including the possible scope orders of topics, quantifiers and foci. We call this direction acceptance. We also try to “accept” sequences of words without stress marks: in this case the first step is furnishing them with all possible intonation patterns.

The opposite direction, which can be called generation, starts from a further kind of input, an information structure, and outputs an intoned sentence. Generation is based upon the rich lexical description of tensed verbs pertaining to the sentence-internal arrangement and checking of their arguments. In harmony with our “totally lexicalist” approach to grammar [11] (which can be regarded as a successor of Hudson’s [12] Word Grammar or Karttunen’s [13] Radical Lexicalism, and as a formal execution of cognitive ideas similar to those of Croft’s [14] Radical Construction Grammar), special intra-lexical generator rules are responsible for the development of the intricate pre-verbal operator zone of sentences.

Alberti G., Károly M. and Kleiber J. From Sentences to Scope Relations and Backward. DOI: 10.5220/0003017401000111. In Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science (ICEIS 2010). ISBN: 978-989-8425-13-3. Copyright © 2010 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
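The first step of accepting a word sequence without stress marks, furnishing it with all possible intonation patterns, amounts to a Cartesian product over the four stress marks. A minimal sketch follows, assuming that every word may in principle carry any mark; the function name `intonation_patterns` and the placeholder words are hypothetical, and a real system would presumably prune most candidates early.

```python
from itertools import product

# The four stress marks named in the text.
STRESS_MARKS = (
    "unstressed",
    "STRESSED",
    "FOCUS-STRESSED",
    "CONTRASTIVELY STRESSED",
)


def intonation_patterns(words):
    """Yield every assignment of one stress mark to each word.

    Assumption: marks are assigned independently per word, so a sequence
    of n words has 4**n candidate patterns.
    """
    for marks in product(STRESS_MARKS, repeat=len(words)):
        yield list(zip(words, marks))


# Placeholder tokens stand in for Hungarian words.
candidates = list(intonation_patterns(["w1", "w2", "w3"]))
print(len(candidates))  # 4**3 = 64 candidate patterns
```

The exponential growth in the number of candidates is why an unmarked input is the harder case: each pattern has to be passed to acceptance separately.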
In what follows, Section 2 provides a review of the relevant linguistic phenomena, then Section 3 elucidates what we mean by “accepting” potential sentences with or without stress marks and “generating” sentences; finally we discuss implementation, our theoretical and practical work in progress driven by computational aims.

2 Referentiality Requirements and Information Structure

Hungarian, similar to English in this respect, has an indefinite article (egy ‘a(n)’) and a definite article (a(z) ‘the’) to distinguish different degrees of referentiality. This fact seems to suggest two degrees of referentiality, but a closer look at more complex facts (in English, in Hungarian and even in Finnish, which lacks articles) shows that there are (at least) three degrees of positive referentiality in the semantic background of Universal Grammar (see example series (1-5) below), besides the lack of referentiality as a fourth degree, which occurs in Hungarian even in the case of countable nouns, as will be shown in (7) below [8]:

Table 1. The four degrees of referentiality (and their expression in Hungarian).

Degree of referentiality                   Expression in Hungarian
non-referential                            ∅ (bare singular)
referential, non-specific                  egy ‘a(n)’
referential, specific, non-definite        egy ‘a(n)’
referential, specific, definite            a(z) ‘the’

The indefinite article is claimed to express specific reference (“its referent is a subset of a set of referents already in the domain of discourse” [10]) in the English sentence (1e) below, in contrast to the one in the there construction (1b):

Example 1. Degrees of referentiality in English: three (positive) degrees.

a. *There is cock in the kitchen.
b. There is a cock in the kitchen. 〈+ref, –spec〉
c. *There is the cock in the kitchen.
d. *Cock is in the kitchen.
e. A cock is in the kitchen. 〈+spec, –def〉
f. The cock is in the kitchen. 〈+def〉

Even without articles, Finnish can differentiate the three degrees of positive referentiality, by means of word order (〈–spec〉: (2a) vs. 〈+spec〉: (2b-c)) and number agreement (〈–def〉: (2b) vs. 〈+def〉: (2c)).
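The feature bundles of Table 1 and the judgments in Example 1 can be checked mechanically. The sketch below is an illustration under stated assumptions: the licensing rules (the there construction admits only 〈+ref, –spec〉, the subject-initial construction admits only 〈+spec〉) are read off the six judgments in Example 1 rather than taken from the authors' grammar, the mapping of English articles to degrees is assumed by analogy with the Hungarian column, and all names (`Degree`, `LICENSED`, `readings`) are hypothetical.

```python
from typing import NamedTuple


class Degree(NamedTuple):
    name: str
    ref: bool        # +ref / -ref
    spec: bool       # +spec / -spec
    definite: bool   # +def / -def
    hungarian: str   # expression in Hungarian (Table 1)
    english: str     # assumed English counterpart ("" = bare noun)


DEGREES = (
    Degree("non-referential", False, False, False, "∅ (bare singular)", ""),
    Degree("non-specific", True, False, False, "egy", "a"),
    Degree("specific non-definite", True, True, False, "egy", "a"),
    Degree("definite", True, True, True, "a(z)", "the"),
)

# Assumed licensing conditions, inferred from the stars in Example 1.
LICENSED = {
    "there": lambda d: d.ref and not d.spec,  # (1a-c)
    "subject": lambda d: d.spec,              # (1d-f)
}


def readings(article, construction):
    """Return the degrees an English article can express in a construction.

    An empty list corresponds to a starred example.
    """
    return [d.name for d in DEGREES
            if d.english == article and LICENSED[construction](d)]


for construction in ("there", "subject"):
    for article in ("", "a", "the"):
        print(construction, repr(article), readings(article, construction))
```

The point of the encoding is that the same article `a` (like Hungarian `egy`) surfaces for two different degrees, so the degree has to be recovered from the construction rather than from the article alone.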
