Clustering of Business Organisations based on Textual Data - An LDA Topic Modeling Approach
Authors: Ferenc Tolner, M. Takács, G. Eigner, Balázs Barta
Published in: 2021 IEEE 21st International Symposium on Computational Intelligence and Informatics (CINTI), 2021-11-18
DOI: 10.1109/CINTI53070.2021.9668337
Textual data offers a new perspective and considerable potential as an additional source of information for analysing and segmenting business organisations. Statistical “hard data” is often too general or even misleading and may be affected by several exogenous and endogenous factors, while questionnaire- or survey-based “soft data” is rarely available and can be biased by the interviewee's position in the organisation or by their personal orientation. Besides these information sources, business organisations, educational and research institutions, etc. often provide textual data about themselves, which can further contribute to understanding the investigated population. In this paper, topic modeling of 51 Central European business, educational and research organisations has been performed using Latent Dirichlet Allocation (LDA). The investigated organisations took part in an online survey in which their textual organisational descriptions were collected together with basic geographical and industry-related data. Based on the results, the stakeholders were grouped, and an LDA-based methodology was tested in order to further support the cluster-forming efforts of business and other types of organisations within the Central European region.
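The general idea of grouping organisations by the dominant LDA topic of their descriptions can be illustrated with a minimal sketch. The abstract does not state which LDA implementation, preprocessing, or hyperparameters the authors used, so everything below is an assumption made for illustration: the collapsed Gibbs sampler, the `fit_lda` helper, the values of `alpha`, `beta` and `n_topics`, and the toy `descriptions`, which are invented and are not the survey data.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    Hypothetical helper for illustration only. Returns the per-document
    topic proportions, the topic-word count table, and the vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta)
                    / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed topic proportions for each document.
    doc_topic = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(ndk, docs)
    ]
    return doc_topic, nkw, vocab


# Invented toy descriptions; NOT the organisations surveyed in the paper.
descriptions = [
    "software development cloud data analytics",
    "cloud software platform data services",
    "university research education students",
    "research institute education laboratory students",
]
n_topics = 2  # assumed value for the toy example
docs = [text.split() for text in descriptions]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics)

# Group each organisation by its most probable topic.
groups = [max(range(n_topics), key=row.__getitem__) for row in doc_topic]
for text, group in zip(descriptions, groups):
    print(group, text)
```

Assigning each description to its most probable topic is only one simple way to derive groups; the topic proportions in `doc_topic` could equally be passed to a separate clustering step.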