Clustering of Business Organisations based on Textual Data - An LDA Topic Modeling Approach

2021 IEEE 21st International Symposium on Computational Intelligence and Informatics (CINTI), Pub Date: 2021-11-18, DOI: 10.1109/CINTI53070.2021.9668337

Ferenc Tolner, M. Takács, G. Eigner, Balázs Barta


Citations: 1

Abstract

Textual data offers a new perspective and considerable potential as an additional source of information for analysing and segmenting business organisations. Statistical “hard data” is often too general or even misleading and may be affected by several exogenous and endogenous factors, while questionnaire- or survey-based “soft data” is rarely available and can be biased by the interviewee's position in the organisation or by their personal orientation. Besides these information sources, business organisations, educational and research institutions, and similar bodies often publish textual data about themselves, which can further contribute to understanding the investigated population. In this paper, topic modeling of 51 Central European business, educational and research organisations is performed using Latent Dirichlet Allocation (LDA). The investigated organisations took part in an online survey in which their textual organisational descriptions were collected together with basic geographical and industry-related data. Based on the results, the stakeholders were grouped, and an LDA-based methodology was tested in order to further support the cluster-forming efforts of business and other types of organisations within the Central European region.
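The abstract does not say which LDA implementation, inference scheme, number of topics, or preprocessing was used, so the following is only a minimal sketch of the general idea: fit LDA to short organisational descriptions and group organisations by their dominant topic. It assumes collapsed Gibbs sampling, two topics, and symmetric priors `alpha` and `beta`. The function names `fit_lda` and `group_by_dominant_topic` and the toy descriptions are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed, normalised topic proportions for each document.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta


def group_by_dominant_topic(theta):
    """Map each topic index to the documents whose largest proportion is that topic."""
    groups = defaultdict(list)
    for d, row in enumerate(theta):
        groups[max(range(len(row)), key=row.__getitem__)].append(d)
    return dict(groups)


# Invented toy descriptions (already tokenised); not data from the survey.
descriptions = [
    "software development cloud services software".split(),
    "cloud software platform development".split(),
    "university research education students".split(),
    "research institute education science".split(),
]

theta = fit_lda(descriptions, n_topics=2)
groups = group_by_dominant_topic(theta)
print(groups)
```

Assigning each organisation to its single dominant topic is the simplest grouping rule; the rows of `theta` could instead be passed to a separate clustering step if softer groupings are wanted.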

