{"title":"Building a corpus of Italian Web forums","authors":"Silvia Petri, M. Tavosanis","doi":"10.21248/jlcl.24.2009.116","DOIUrl":"https://doi.org/10.21248/jlcl.24.2009.116","url":null,"abstract":"","PeriodicalId":137584,"journal":{"name":"Journal for Language Technology and Computational Linguistics","volume":"2 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120987137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Centroid-based Approach for Genre Categorization of Web Pages","authors":"Chaker Jebari","doi":"10.21248/jlcl.24.2009.114","DOIUrl":"https://doi.org/10.21248/jlcl.24.2009.114","url":null,"abstract":"In this paper we propose a new centroid-based approach for genre catego rization of web pages. Our approach constructs genre centroids using a set of genre-labeled web pages, called training web pages. The obtained cen troids will be used to classify new web pages. The aim of our approach is to provide a flexible, incremental, refined and combined categorization, which is more suitable for automatic web genre identification. Our approach is flexible because it assigns a web page to all predefined genres with a confi dence score; it is incremental because it classifies web pages one by one; it is refined because each web page either refines the centroids or is discarded as noisy page; finally, our approach combines three dierent feature sets, i.e. URL addresses, logical structure and hypertext structure. The experiments conducted on two known corpora show that our approach is very fast and outperforms other approaches.","PeriodicalId":137584,"journal":{"name":"Journal for Language Technology and Computational Linguistics","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127176699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Brief Survey of Text Mining","authors":"A. Hotho, A. Nürnberger, G. Paass","doi":"10.21248/jlcl.20.2005.68","DOIUrl":"https://doi.org/10.21248/jlcl.20.2005.68","url":null,"abstract":"The enormous amount of information stored in unstructured texts cannot simply be used for further processing by computers, which typically handle text as simple sequences of character strings. Therefore, specific (pre-)processing methods and algorithms are required in order to extract useful patterns. Text mining refers generally to the process of extracting interesting information and knowledge from unstructured text. In this article, we discuss text mining as a young and interdisciplinary field in the intersection of the related areas information retrieval, machine learning, statistics, computational linguistics and especially data mining. We describe the main analysis tasks preprocessing, classification, clustering, information extraction and visualization. In addition, we briefly discuss a number of successful applications of text mining.","PeriodicalId":137584,"journal":{"name":"Journal for Language Technology and Computational Linguistics","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129204433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GermaNet Synsets as Selectional Preferences in Semantic Verb Clustering","authors":"S. Schulte im Walde","doi":"10.21248/jlcl.19.2004.58","DOIUrl":"https://doi.org/10.21248/jlcl.19.2004.58","url":null,"abstract":"","PeriodicalId":137584,"journal":{"name":"Journal for Language Technology and Computational Linguistics","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132075560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rezension von Roland Hausser: Foundations of Computational Linguistics. Man-Machine Communication in Natural Language","authors":"W. Lenders","doi":"10.21248/jlcl.16.1999.14","DOIUrl":"https://doi.org/10.21248/jlcl.16.1999.14","url":null,"abstract":"","PeriodicalId":137584,"journal":{"name":"Journal for Language Technology and Computational Linguistics","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126379997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}