Clustering of Business Organisations based on Textual Data - An LDA Topic Modeling Approach

Ferenc Tolner, M. Takács, G. Eigner, Balázs Barta
Published in: 2021 IEEE 21st International Symposium on Computational Intelligence and Informatics (CINTI)
Publication date: 2021-11-18
DOI: 10.1109/CINTI53070.2021.9668337 (https://doi.org/10.1109/CINTI53070.2021.9668337)
Citations: 1

Abstract

Textual data offers a new perspective and considerable potential as an additional source of information for analysing and segmenting business organisations. Statistical “hard data” is often too general or even misleading and may be affected by several exogenous and endogenous factors, while questionnaire- or survey-based “soft data” is rarely available and can be biased by the interviewee's position in the organisation or by their personal orientation. Besides these information sources, business organisations, educational and research institutions, and similar bodies often publish textual data about themselves, which can further contribute to understanding the investigated population. In this paper, topic modeling of 51 Central European business, educational and research organisations was performed using Latent Dirichlet Allocation (LDA). The investigated organisations took part in an online survey in which their textual organisational descriptions were collected together with basic geographical and industry-related data. Based on the results, the stakeholders were grouped, and an LDA-based methodology was tested to further support cluster-forming efforts of business and other types of organisations within the Central European region.
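The abstract does not say how the LDA model was implemented or how the grouping was derived from it. As a rough illustration only, the sketch below fits a small LDA model by collapsed Gibbs sampling in plain Python and then groups organisations by their dominant topic. The organisation names, descriptions, stop-word list, hyperparameters (`alpha`, `beta`, `n_topics`) and the dominant-topic grouping rule are all assumptions made for this example, not details from the paper.

```python
import random
from collections import defaultdict

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"and", "for", "of", "the", "in", "a", "to"}

def tokenize(text):
    """Lower-case, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic shares."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic_n = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word_n = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_n = [0] * n_topics                                 # tokens per topic
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic_n[d][k] += 1
            topic_word_n[k][word_id[w]] += 1
            topic_n[k] += 1
        assign.append(row)
    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                doc_topic_n[d][k] -= 1
                topic_word_n[k][wid] -= 1
                topic_n[k] -= 1
                weights = [
                    (doc_topic_n[d][t] + alpha)
                    * (topic_word_n[t][wid] + beta)
                    / (topic_n[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic_n[d][k] += 1
                topic_word_n[k][wid] += 1
                topic_n[k] += 1
    # Smoothed topic shares per document; each row sums to 1.
    return [
        [(doc_topic_n[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

def group_by_dominant_topic(names, doc_topic):
    """Assign each organisation to the topic with the largest share."""
    groups = defaultdict(list)
    for name, theta in zip(names, doc_topic):
        groups[max(range(len(theta)), key=theta.__getitem__)].append(name)
    return dict(groups)

# Invented toy descriptions standing in for the survey texts.
descriptions = {
    "Org A": "software development and cloud services for industry",
    "Org B": "university research in machine learning and software",
    "Org C": "manufacturing of automotive parts and metal components",
    "Org D": "metal processing and automotive supply manufacturing",
}
names = list(descriptions)
docs = [tokenize(descriptions[n]) for n in names]
doc_topic = fit_lda(docs, n_topics=2)
groups = group_by_dominant_topic(names, doc_topic)
print(groups)
```

With only a handful of short texts the topic assignments are unstable, so the printed grouping should be read as a demonstration of the mechanics. On a real corpus a maintained library implementation would normally be preferable to this hand-written sampler.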