A focused crawler for Romanian words discovery
Ionut-Gabriel Radu, Traian Rebedea
2014 RoEduNet Conference 13th Edition: Networking in Education and Research Joint Event RENAM 8th Conference, 2014-09-03
DOI: 10.1109/ROEDUNET-RENAM.2014.6955323 (https://doi.org/10.1109/ROEDUNET-RENAM.2014.6955323)
All natural languages change over time, and as the Web becomes more prevalent it has become a major source for identifying language evolution. Although these changes affect every linguistic branch, from phonetics, lexicon and grammar to semantics and pragmatics, we focus only on discovering potential new words that have entered the Romanian lexicon, or alternative forms (lexicalizations) that are frequently used. In this paper we describe the architecture of a system that models the rate of Romanian vocabulary growth based on statistics gathered by a focused web crawler. To validate the proposed system, the paper also presents the main new words it identified in online texts written in Romanian.
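
The abstract does not say which statistics the crawler gathers, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a candidate new word is a token absent from a reference lexicon that appears often enough and on enough distinct pages. The function names, the thresholds `min_count` and `min_pages`, and the tiny lexicon in the usage note are all hypothetical.

```python
import re
from collections import Counter

# Runs of Unicode letters only (keeps Romanian letters such as ă, â, î, ș, ț).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def candidate_new_words(pages, lexicon, min_count=2, min_pages=2):
    """Return out-of-lexicon tokens that are frequent across crawled pages.

    pages     -- iterable of page texts (assumed to be already fetched)
    lexicon   -- set of known lowercase word forms (hypothetical reference list)
    min_count -- assumed threshold on total occurrences
    min_pages -- assumed threshold on the number of distinct pages
    """
    total_count = Counter()  # occurrences over all pages
    page_count = Counter()   # number of pages containing the token
    for page in pages:
        unknown = [t for t in tokenize(page) if t not in lexicon]
        total_count.update(unknown)
        page_count.update(set(unknown))
    candidates = [
        word for word in total_count
        if total_count[word] >= min_count and page_count[word] >= min_pages
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(candidates, key=lambda w: (-total_count[w], w))
```

For example, with the made-up pages `["Am dat un like pe pagina.", "Un like și un share.", "Am dat share."]` and the toy lexicon `{"am", "dat", "un", "pe", "pagina", "și"}`, the function would flag `like` and `share` as candidates. Requiring a minimum number of distinct pages is one simple way to keep a typo on a single page from being counted as a new word.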