Text segmentation in Polish
Author: Pawel P. Mazur
Venue: 5th International Conference on Intelligent Systems Design and Applications (ISDA'05)
Publication date: 2005-09-08
DOI: 10.1109/ISDA.2005.89 (https://doi.org/10.1109/ISDA.2005.89)
Citation count: 5
Abstract
This paper points out the importance of text segmentation in natural language engineering and in artificial intelligence systems. It shows that in Polish every punctuation mark that can end a sentence also serves other functions within sentences. In this context, various approaches to sentence boundary disambiguation are presented. Text tokenization is analysed with the features of Polish taken into account. Based on this analysis, a direction for empirical research on the segmentation of Polish texts is outlined. A list of Polish abbreviations that are spelled the same as some common words is also presented.
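
To make the ambiguity concrete, here is a minimal Python sketch of one simple rule-based approach to sentence boundary disambiguation. It is not the paper's method. The function name `split_sentences`, the boundary rule, and the tiny abbreviation set are all assumptions made for illustration; the abbreviations were not taken from the list presented in the paper.

```python
import re

# Hypothetical, illustrative abbreviation set (NOT the paper's list).
# "ul" is both an abbreviation ("ul." = ulica, street) and a common
# noun ("ul" = beehive), which is the kind of clash the abstract mentions.
ABBREVIATIONS = {"np", "ul", "prof", "godz", "tzw"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and an
# uppercase letter (including Polish diacritics).
_CANDIDATE = re.compile(r"[.!?](?=\s+[A-ZĄĆĘŁŃÓŚŹŻ])")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text at candidate boundaries, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in _CANDIDATE.finditer(text):
        if match.group() == ".":
            # Look at the word immediately before the period.
            words = text[start:match.start()].split()
            last_word = words[-1].lower() if words else ""
            if last_word in abbreviations:
                continue  # assume the period belongs to an abbreviation
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


# The period after "ul" is kept inside the first sentence.
print(split_sentences("Mieszkam przy ul. Długiej. To blisko."))

# Here "ul" is the noun "beehive" and really ends the sentence, but the
# lookup cannot tell the difference, so the two sentences stay merged.
print(split_sentences("W ogrodzie stoi ul. Pszczoły latają."))
```

The second call shows the limitation: a plain abbreviation lookup cannot separate "ul." (street) from the noun "ul" at the end of a sentence, so abbreviations that share their spelling with common words need additional context to disambiguate.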