Keyu Chen, Benjamin Cosgro, Oretha Domfeh, Alex Stern, Gizem Korkmaz, Neil Alexander Kattampallil
{"title":"利用Google BERT检测和衡量新闻文章中讨论的创新","authors":"Keyu Chen, Benjamin Cosgro, Oretha Domfeh, Alex Stern, Gizem Korkmaz, Neil Alexander Kattampallil","doi":"10.1109/SIEDS52267.2021.9483744","DOIUrl":null,"url":null,"abstract":"In this paper, we leverage non-survey data (i.e., news articles), natural language processing (NLP), and deep learning methods to detect and measure innovation, ultimately enriching innovation surveys. Our dataset is composed of 1.9M news articles published between 2013 and 2018 acquired from Dow Jones Data, News, and Analytics. We use Bidirectional Encoder Representation from Transformers (BERT), a neural network-based technique for NLP pre-training developed by Google. Our methods involve: (i) utilizing Google’s BERT as a binary classifier to identify articles that mention innovation, (ii) developing BERT’s named-entity recognition algorithm to extract company names from these articles, (iii) leveraging BERT’s question and answering capabilities to extract company and product names. As a result, we obtain innovation indicators, i.e., company innovations in the pharmaceutical sector.","PeriodicalId":426747,"journal":{"name":"2021 Systems and Information Engineering Design Symposium (SIEDS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Leveraging Google BERT to Detect and Measure Innovation Discussed in News Articles\",\"authors\":\"Keyu Chen, Benjamin Cosgro, Oretha Domfeh, Alex Stern, Gizem Korkmaz, Neil Alexander Kattampallil\",\"doi\":\"10.1109/SIEDS52267.2021.9483744\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we leverage non-survey data (i.e., news articles), natural language processing (NLP), and deep learning methods to detect and measure innovation, ultimately enriching innovation surveys. Our dataset is composed of 1.9M news articles published between 2013 and 2018 acquired from Dow Jones Data, News, and Analytics. We use Bidirectional Encoder Representation from Transformers (BERT), a neural network-based technique for NLP pre-training developed by Google. Our methods involve: (i) utilizing Google’s BERT as a binary classifier to identify articles that mention innovation, (ii) developing BERT’s named-entity recognition algorithm to extract company names from these articles, (iii) leveraging BERT’s question and answering capabilities to extract company and product names. As a result, we obtain innovation indicators, i.e., company innovations in the pharmaceutical sector.\",\"PeriodicalId\":426747,\"journal\":{\"name\":\"2021 Systems and Information Engineering Design Symposium (SIEDS)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 Systems and Information Engineering Design Symposium (SIEDS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SIEDS52267.2021.9483744\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Systems and Information Engineering Design Symposium (SIEDS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIEDS52267.2021.9483744","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Leveraging Google BERT to Detect and Measure Innovation Discussed in News Articles
In this paper, we leverage non-survey data (i.e., news articles), natural language processing (NLP), and deep learning methods to detect and measure innovation, ultimately enriching innovation surveys. Our dataset is composed of 1.9M news articles published between 2013 and 2018 acquired from Dow Jones Data, News, and Analytics. We use Bidirectional Encoder Representation from Transformers (BERT), a neural network-based technique for NLP pre-training developed by Google. Our methods involve: (i) utilizing Google’s BERT as a binary classifier to identify articles that mention innovation, (ii) developing BERT’s named-entity recognition algorithm to extract company names from these articles, (iii) leveraging BERT’s question and answering capabilities to extract company and product names. As a result, we obtain innovation indicators, i.e., company innovations in the pharmaceutical sector.