Mining and visualizing robust maximal association rules on highly variable textual data in entrepreneurship
Frédéric Simard, J. St-Pierre, Ismaïl Biskri
Proceedings of the 8th International Conference on Management of Digital EcoSystems, 2016-11-01
DOI: https://doi.org/10.1145/3012071.3012097
Searching for reliable information in textual data with a highly heterogeneous vocabulary is difficult. The task at hand was to study a collection of transcripts of think-aloud experiments conducted with entrepreneurs from different backgrounds. This diversity of backgrounds translates into high variability in the vocabulary found in the transcripts. To reduce this variability while investigating textual databases with the association rule method presented by Agrawal et al. [1], a novel approach is presented that uses synonyms to standardize the data before association rules are applied. Moreover, because association rule retrieval techniques produce large datasets and because these statistical objects express relationships between items, a method for analyzing the discovered associations as a network is also presented. This enables the use of graph theory and network science, two mature, related fields whose methods can lead to interesting and nontrivial discoveries.
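The pipeline described in the abstract (synonym standardization, then association rules, then a network view of the rules) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the synonym table `SYNONYMS`, the toy transcripts, the thresholds, and the helper names `standardize`, `pair_rules` and `rules_to_graph` are hypothetical and do not come from the paper. The sketch also mines ordinary single-item rules by support and confidence, not the maximal association rules named in the title.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical synonym table: each variant maps to one canonical term.
SYNONYMS = {
    "client": "customer",
    "buyer": "customer",
    "funds": "money",
    "capital": "money",
}


def standardize(words):
    """Map every word to its canonical synonym and return the set of terms."""
    return frozenset(SYNONYMS.get(w, w) for w in words)


def pair_rules(transactions, min_support, min_confidence):
    """Mine rules x -> y between single items.

    Returns a list of (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for t in transactions:
        for item in t:
            item_count[item] += 1
        for a, b in combinations(sorted(t), 2):
            pair_count[(a, b)] += 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions; confidence depends on the antecedent.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


def rules_to_graph(rules):
    """Build a directed adjacency map: antecedent -> {consequent: confidence}."""
    graph = defaultdict(dict)
    for x, y, _support, confidence in rules:
        graph[x][y] = confidence
    return dict(graph)


# Toy transcripts, invented for illustration only.
raw = [
    ["client", "money", "risk"],
    ["buyer", "capital"],
    ["customer", "funds", "market"],
    ["risk", "market"],
]

# Without standardization, every synonym counts as a distinct item.
raw_rules = pair_rules([frozenset(t) for t in raw], 0.5, 0.8)

# With standardization, variants collapse onto shared canonical terms.
transactions = [standardize(t) for t in raw]
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.8)
graph = rules_to_graph(rules)
```

On this toy input the unstandardized transcripts yield no rule above the thresholds, while the standardized ones yield a two-way association between `customer` and `money`. The adjacency map is only a stand-in for a graph structure; a dedicated graph library could take its place if centrality or community measures were needed.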