P. Dumrong, J. Gould, G. Lee, L. Nicholson, K. Gao, P. Beling, M. Blume, J. Robinson
The quantification of unstructured information and its use in predictive modeling
IEEE Systems and Information Engineering Design Symposium, 2003. Published 2003-04-24. DOI: 10.1109/SIEDS.2003.158028 (https://doi.org/10.1109/SIEDS.2003.158028)
Citation count: 1
Abstract
Managing text-based information is crucial when extracting value from documents, and one way to extract that value is to assign numerical values to the text-based (unstructured) information. This research studied the quantification of unstructured text and its forecasting power. To examine how unstructured information relates to predictive models, the Beige Books were used to investigate and predict changes in the U.S. economy. The Beige Books describe current economic conditions and discuss fluctuations in real gross domestic product (GDP). To quantify the unstructured text, the direct scoring algorithm (DSA) was proposed. It uses the keywords in a document and their subjectively determined numerical weights to score individual sentences. Statistical analyses were then conducted to determine which sections of the Beige Books contributed the most significant information to the prediction of GDP. Using the significant sections, a linear regression model was constructed to predict future GDP growth. The adjusted R² values of the DSA model were compared with those obtained from an economic expert's scoring of the same documents. The comparison showed that the DSA model using the Beige Book contributed significantly to the prediction of GDP and explained a similar amount of variance to the scores created by the economic expert. In addition, the DSA model was compared with a structured predictive model to further demonstrate the significance of text-based information.
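The keyword-weighted sentence scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the weights, the averaging of sentence scores into a document score, and the names `KEYWORD_WEIGHTS`, `score_sentence`, `score_document`, and `adjusted_r2` are all assumptions made for illustration, since the abstract does not specify them. The `adjusted_r2` helper uses the standard textbook definition of adjusted R².

```python
import re

# Hypothetical keywords and weights, chosen only for illustration.
# The abstract says the weights are subjectively determined but does not list them.
KEYWORD_WEIGHTS = {
    "strong": 1.0,
    "growth": 1.0,
    "improved": 0.5,
    "weak": -1.0,
    "declined": -1.0,
    "slowed": -0.5,
}


def score_sentence(sentence, weights=KEYWORD_WEIGHTS):
    """Sum the weights of every keyword occurrence in one sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def score_document(text, weights=KEYWORD_WEIGHTS):
    """Average the sentence scores of a text (an assumed aggregation rule).

    Sentences are split naively after '.', '!' or '?', so abbreviations
    such as "U.S." would be split incorrectly by this sketch.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    return sum(score_sentence(s, weights) for s in sentences) / len(sentences)


def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


if __name__ == "__main__":
    sample = "Retail sales showed strong growth. Manufacturing activity declined."
    print(score_document(sample))
```

In a setup like this, per-section document scores would serve as the predictors in a linear regression on GDP growth, and `adjusted_r2` would allow models with different numbers of predictors to be compared on an equal footing.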