Nickolay Shamraev, Alexander Batalshchikov, M. Zulkarneev, S. Repalov, Anna Shirokova
{"title":"Weighted finite-state transducer approach to German compound words reconstruction for Speech Recognition","authors":"Nickolay Shamraev, Alexander Batalshchikov, M. Zulkarneev, S. Repalov, Anna Shirokova","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382976","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382976","url":null,"abstract":"An approach is proposed for German Large Vocabulary Speech Recognition, dealing with the problem of compound words, based on unsupervised word decomposition for German words and a probabilistic method for combining the words using finite state transducers. The basic idea of the method is to train n-gram language model on the texts where compound words are substituted by their parts plus concatenation symbol. Thus, the context information is taken into account for the compound words and is used in the process of recombination to find most probable variant for recognition result. The advantage of this approach is the improvement of the word recognition accuracy and a more precise recombination of compound words.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126571427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Balandin, M. Buzdalov, Tatiana Lando, Lidia Pivovarova, S. Popova, Dmitry Ustalov, J. Zizka
{"title":"Preface of AINL-ISMW FRUCT conference proceedings","authors":"S. Balandin, M. Buzdalov, Tatiana Lando, Lidia Pivovarova, S. Popova, Dmitry Ustalov, J. Zizka","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382971","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382971","url":null,"abstract":"We welcome you to the Artificial Intelligence and Natural Language & Information Extraction, Social Media and Web Search (AINL-ISMW) FRUCT Conference. This is the first time when these conferences and international school are organized together in the beautiful city of Saint-Petersburg. All events of the conference are hosted on the ground of Saint-Petersburg State University and ITMO University, which are both known as regional leaders in IT and ICT, with long history and strong scientific schools, as well as strong traditions of cooperation with universities all around the globe.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130837553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of sentence similarity measures for Russian paraphrase identification","authors":"Ekaterina V. Pronoza, E. Yagunova","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382973","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382973","url":null,"abstract":"In this paper we analyze and compare different types of sentence similarity measures applied to the problem of sentential paraphrase identification. We work with Russian, and all the experiments are conducted on the Russian paraphrase corpus we have collected from the news headlines (and are collecting at the moment). Apart from the similarity measures, we also analyze the corpus itself. As a result of the research we disprove the supposition that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases. We also come up with the recommendations for the application of different similarity measures to identifying paraphrases derived from the news texts.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130932556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Morpho-syntactic parsing based on neural networks and corpus data","authors":"R. Rybka, A. Sboev, I. Moloshnikov, D. Gudovskikh","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382975","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382975","url":null,"abstract":"This article presents methods for constructing a morpho-syntactic parsing procedure based on corpus dataset analysis. It contains 1) a method to eliminate morphological ambiguities using existing morphological parsers and then convert the parsing results into the format of the language corpus used; 2) a method of selecting parameters for syntactic parsing and assessing the parsing accuracy achievable with the data of the corpus used; 3) a method of parsing sentences on the basis of neural network algorithms and a selected set of parameters in the format of the corpus used. The study is based on sentences with unambiguous morpho-syntactic marking from the Russian National Corpus.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127017480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An information retrieval system for technology analysis and forecasting","authors":"N. Nikitinsky, Dmitry Ustalov, Sergey Shashev","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382969","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382969","url":null,"abstract":"Expert evaluation of grant proposals and research projects is often facilitated by specialized decision support systems, which analyze research and industry trends in a large domain-dependent text corpus. Although production-grade technological forecasting systems exist for English, Russian patent databases and citation indexes have been developed in isolation from the global ones. This complicates technology analysis and forecasting in research conducted in Russia. In this paper, we present a scientific information retrieval system designed for the Russian language. The system uses patents, research papers and government contracts to facilitate the expert evaluation process by providing the experts with relevant documents. Comparison of our system with a popular baseline shows promising results.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126954517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revealing potential changes of significant terms in streams of textual data written in natural languages using windowing and text mining","authors":"J. Zizka, F. Dařena","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382982","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382982","url":null,"abstract":"The presented research deals with analyzing continuous streams of textual data written in natural languages. One of the problems is revealing possible significant concept changes in Internet blogs, discussions, etc., together with discovering what represents such data, if it is more-or-less topically invariable or changing, and what kind of change occurred. A real-world textual dataset is analyzed using text-mining with automatically generated decision trees to find significant words that affect correct assignment of document labels (classes) and can be used for detecting noticeable changes. The changes and their detection are here modeled by an assorted gradual mixture of two languages and the change degree is measured by cosine, Euclidean, and Jaccard distance (similarity), which provide qualitatively the same result. The monitoring procedure is based on analyzing successively adjacent couples of data-windows in the stream using the comparison of the current and its previous window, both represented by their lists of relevant features expressed in words. The presented results demonstrate that the suggested method provides reliable results.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121441521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software-to-hardware tester for the STP-ISS transport protocol verification","authors":"V. Olenev, I. Lavrovskaya, N. Chumakova","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382970","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382970","url":null,"abstract":"Implementation of conformance testers for the communication protocols is an important task, which is being solved in the majority of industrial companies that develop the communication equipment. Current article gives a description of such kind of tester, which is developed to test the on-board devices that work in conformance to the STP-ISS transport protocol standard and SpaceWire networking standard. We give a brief description of the possible solutions for hardware testing; provide the description of STP-ISS protocol. Then we report on implementation of the Software-to-Hardware STP-ISS tester and fields of its application.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130655540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of the new REST API for open source LBS-platform Geo2Tag","authors":"M. Zaslavskiy, D. Mouromtsev","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382981","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382981","url":null,"abstract":"The article describes current state of Geo2Tag LBS platform project and new API version implementation. The platform was improved by following challenges: data visualization, extended datetime processing, social network integration and background calculations support. These challenges were justified by review of most important tendencies for geocontext applications and LBS platforms. Recommendations were fully implemented in API. Also the article contains description of new version implementation. As an example Open Data import API and specific plugin for Open Karelia system was implemented. This extension allowed performing geocontext markup of complex spatiotemporal data inside the platform.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126733777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing synset relations with Genus-Species-Match","authors":"Dmitry Ustalov","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382980","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382980","url":null,"abstract":"Enabling a domain-specific lexical resource is useful for improving the performance of a natural language processing system. However, such resources may be represented in the form of glossaries: terms provided with their sense definitions. Although integrating such domain-specific glossaries into more sophisticated general-purpose resources like thesauri is highly topical, it is complicated by the ambiguity of the individual terms. This paper presents Genus-Species-Match, a crowdsourcing workflow for matching noisy pairs of synsets representing hyponymic/hypernymic relations. The system demonstrates an F1 score of 80% in an experiment conducted on an online labor marketplace using the EMERCOM glossary and the Yet Another RussNet sense inventory.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"137 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131362586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Khritankov, P. Botov, Nikolay S. Surovenko, S. V. Tsarkov, Dmitriy V. Viuchnov, Yuri V. Chekhovich
{"title":"Discovering text reuse in large collections of documents: A study of theses in history sciences","authors":"A. Khritankov, P. Botov, Nikolay S. Surovenko, S. V. Tsarkov, Dmitriy V. Viuchnov, Yuri V. Chekhovich","doi":"10.1109/AINL-ISMW-FRUCT.2015.7382965","DOIUrl":"https://doi.org/10.1109/AINL-ISMW-FRUCT.2015.7382965","url":null,"abstract":"In this paper we investigate graphs of text reuse cases in scientific degree theses in history sciences (07.xx.xx of Russian Higher Attestation Committee topic codes). Using algorithmic and statistical methods we discovered groups of highly connected theses with large amount of text reuse between them. In addition we located works compiled from several other theses and point out sources of reuse.","PeriodicalId":122232,"journal":{"name":"2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)","volume":"317 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123684556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}