Knowledge discovery in scientific databases using text mining and social network analysis
A. Jalalimanesh
2012 IEEE Conference on Control, Systems & Industrial Informatics, 2012-09-01
DOI: 10.1109/CCSII.2012.6470471 (https://doi.org/10.1109/CCSII.2012.6470471)
Citations: 6
Abstract
This paper introduces a novel methodology for extracting core concepts from a text corpus. The methodology combines text mining and social network analysis. In the text mining phase, keywords are extracted by tokenizing the text, removing stop words and generating N-grams. The network analysis phase consists of co-word occurrence extraction, network representation of the linked terms and calculation of centrality measures. We applied the methodology to a text corpus of 650 thesis titles in the domain of industrial engineering. Interpreting the resulting enriched networks gave us valuable knowledge about the content of the corpus.
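
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the stop-word list, the function names (`extract_terms`, `co_word_network`, `degree_centrality`), the choice of unigrams plus bigrams, and the use of degree centrality are all assumptions made for illustration, since the abstract does not say which N-gram sizes or centrality measure were used.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Assumed, deliberately tiny stop-word list; a real one would be much larger.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "using", "with"}


def extract_terms(title, max_n=2):
    """Text mining phase: tokenize, drop stop words, generate N-grams up to max_n."""
    tokens = [t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS]
    terms = set()
    for n in range(1, max_n + 1):
        # N-grams are built after stop-word removal, following the order in the abstract.
        for i in range(len(tokens) - n + 1):
            terms.add(" ".join(tokens[i:i + n]))
    return terms


def co_word_network(titles, max_n=2):
    """Count how many titles each unordered pair of terms co-occurs in."""
    edges = Counter()
    for title in titles:
        for a, b in combinations(sorted(extract_terms(title, max_n)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: number of neighbors divided by (nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {term: 0.0 for term in neighbors}
    return {term: len(adj) / (n - 1) for term, adj in neighbors.items()}


if __name__ == "__main__":
    # Hypothetical thesis titles, not taken from the paper's corpus.
    titles = [
        "Supply chain optimization using genetic algorithm",
        "Scheduling in supply chain networks",
        "Genetic algorithm for job shop scheduling",
    ]
    centrality = degree_centrality(co_word_network(titles))
    for term, score in sorted(centrality.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{term}: {score:.2f}")
```

In this sketch a unigram is always linked to any bigram that contains it when both come from the same title, so a fuller version would probably filter such trivial links or weight edges by co-occurrence count before ranking terms.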