L. Cabral, R. Lins, R. Mello, F. Freitas, B. T. Ávila, S. Simske, M. Riss
{"title":"A platform for language independent summarization","authors":"L. Cabral, R. Lins, R. Mello, F. Freitas, B. T. Ávila, S. Simske, M. Riss","doi":"10.1145/2644866.2644890","DOIUrl":null,"url":null,"abstract":"The text data available on the Internet is not only huge in volume, but also in diversity of subject, quality and idiom. Such factors make it infeasible to efficiently scavenge useful information from it. Automatic text summarization is a possible solution for efficiently addressing such a problem, because it aims to sieve the relevant information in documents by creating shorter versions of the text. However, most of the techniques and tools available for automatic text summarization are designed only for the English language, which is a severe restriction. There are multilingual platforms that support, at most, 2 languages. This paper proposes a language independent summarization platform that provides corpus acquisition, language classification, translation and text summarization for 25 different languages.","PeriodicalId":91385,"journal":{"name":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","volume":"43 1","pages":"203-206"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Symposium on Document Engineering. ACM Symposium on Document Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2644866.2644890","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
The text data available on the Internet is not only huge in volume, but also in diversity of subject, quality and idiom. Such factors make it infeasible to efficiently scavenge useful information from it. Automatic text summarization is a possible solution for efficiently addressing such a problem, because it aims to sieve the relevant information in documents by creating shorter versions of the text. However, most of the techniques and tools available for automatic text summarization are designed only for the English language, which is a severe restriction. There are multilingual platforms that support, at most, 2 languages. This paper proposes a language independent summarization platform that provides corpus acquisition, language classification, translation and text summarization for 25 different languages.