Predicting Secondary Equity Offerings (SEOs) Using Machine Learning

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). Pub Date: 2018-12-01. DOI: 10.1109/ICMLA.2018.00198

Linlin Cui, Jianhua Chen, Wentao Wu

Pages: 1219-1224


Abstract

This paper explores the application of machine learning techniques in finance to predict whether a publicly traded firm will issue a Seasoned Equity Offering (SEO) by analyzing the firm's 10-Q filings with the Securities and Exchange Commission (SEC). Specifically, using the information content of the Management's Discussion and Analysis (MD&A) section of 10-Q filings, we train five algorithms: Logistic Regression (LR), Support Vector Classification (SVC), Multinomial Naïve Bayes (NB), Artificial Neural Network (ANN), and Random Forest (RF). Two types of features are considered: unigrams and phrases. Term frequency-inverse document frequency (TF-IDF) scores serve as the independent variables in these models. Experimental results show that phrases-only models improve accuracy by 0-2% over unigrams-only models for LR, NB, and RF, while for SVC the accuracy of the phrases-only model is close to that of the unigrams-only model. The unigrams-only SVC model achieves the highest accuracy of all tested classifiers, 74.53%. Precision across all models ranges from 60% to 75%, and recall ranges from 55% to 85%. Further, we tune the parameters of one linear model (LR) and one non-linear model (RF) to see how they affect model performance. Finally, we use RF to identify the most important features for prediction and find that "merger" is the most important feature in both the unigrams-only and phrases-only models. We conclude that text mining of SEC filings could be an effective tool for predicting important corporate events such as SEOs.
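The TF-IDF features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how phrases are extracted or which TF-IDF variant is used, so this sketch assumes word bigrams as a stand-in for phrases, a plain `log(N / df)` inverse document frequency without smoothing, and a naive whitespace tokenizer. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it on whitespace (deliberately naive)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams of a token sequence, each joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute one {term: score} dict per document for n-gram features.

    tf  = occurrences of the term in the document / terms in the document
    idf = log(N / df), where df counts the documents containing the term
    (an assumed, unsmoothed variant; libraries often use smoothed forms).
    """
    term_lists = [ngrams(tokenize(d), n) for d in docs]
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))  # count each term once per document
    n_docs = len(docs)
    scores = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # guard against documents shorter than n
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical toy snippets standing in for MD&A text; not real filings.
docs = [
    "we completed the merger and may raise additional capital",
    "net revenue increased and operating expenses decreased",
    "we may raise additional capital to fund the merger",
]

unigram_scores = tfidf(docs, n=1)  # unigrams-only feature set
phrase_scores = tfidf(docs, n=2)   # bigrams as an assumed proxy for phrases
```

Each dictionary plays the role of one row of the feature matrix; a classifier such as LR or RF would be trained on these scores after mapping terms to fixed column indices.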

