Learning statistics from raw text documents
Nour El Houda Ben Chaabene, Maha Mallek
2018 5th International Conference on Control, Decision and Information Technologies (CoDIT)
Published: 2018-04-10
DOI: 10.1109/CoDIT.2018.8394841 (https://doi.org/10.1109/CoDIT.2018.8394841)
Citations: 0
Abstract
Statistics are still the best tool for analyzing political, economic and social phenomena. Among other things, they support projections and forecasts that assist in decision-making. Today's information society and the era of “Big Data” have facilitated access to information. However, most of the available data is unstructured and, as a result, is not readily usable by IT tools; this is particularly true of statistical data. In the context of a research project whose objective is to extract statistical information from the results of a Web search, it is imperative to recognize the statistical variables dealt with and their associated values. One of the essential stages is the assignment, to a given variable, of the different instances corresponding to it. In this work, we propose an approach for identifying these statistical data in order to represent them as structured data that are easy to process by computer.