A Novel Approach of Feature Vector Design for Financial Information Extraction Using Supervised Learning
M. Dadhich, James G. Lewis
2016 3rd International Conference on Soft Computing & Machine Intelligence (ISCMI), published 2016-11-01
DOI: 10.1109/ISCMI.2016.50 (https://doi.org/10.1109/ISCMI.2016.50)
Abstract
Extracting financial information from large financial reports is a tedious task. This paper describes page-wise feature generation and the application of learning algorithms to identify financial information (balance sheets, cash flow statements, and income statements) in Form 10-K filings or company annual reports. Balance sheets, cash flow statements, and income statements have some inherent structure and can be treated as semi-structured information. The approach selects unigrams and bigrams based on frequency of occurrence and expert advice, generates page-wise features, and applies learning models to identify patterns of specific financial information. Several supervised learning models are applied, yielding very high accuracy (greater than 99%).
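The feature-generation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `ngrams`, `select_terms`, `page_features`), the `top_k` cutoff, the pooling of unigrams and bigrams into one frequency ranking, and the use of raw counts as feature values are all assumptions made for illustration. The abstract does not specify how terms are weighted or which learning models are used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a page and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def select_terms(pages, top_k=5, expert_terms=()):
    """Select terms: expert-supplied terms first, then the top_k most
    frequent n-grams (unigrams and bigrams pooled) across all pages.
    The pooling and the top_k cutoff are illustrative assumptions."""
    counts = Counter()
    for page in pages:
        tokens = tokenize(page)
        counts.update(ngrams(tokens, 1))
        counts.update(ngrams(tokens, 2))
    frequent = [term for term, _ in counts.most_common(top_k)]
    # dict.fromkeys keeps first-seen order and drops duplicates.
    return list(dict.fromkeys(list(expert_terms) + frequent))


def page_features(page, terms):
    """Build one feature vector per page: the count of each selected term."""
    tokens = tokenize(page)
    counts = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))
    return [counts[term] for term in terms]


# Hypothetical two-page example; real input would be the pages of a 10-K.
pages = [
    "Total assets and total liabilities",
    "Net cash provided by operating activities",
]
terms = select_terms(pages, top_k=1, expert_terms=("net cash",))
vectors = [page_features(page, terms) for page in pages]
```

Each entry in `vectors` is a fixed-length numeric vector for one page, so the list, paired with page labels such as "balance sheet" or "other", could be passed to any supervised classifier.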