FODEX -- Towards Generic Data Extraction from Web Forums
Authors: Sebastian Pretzsch, Klemens Muthmann, A. Schill
Venue: 2012 26th International Conference on Advanced Information Networking and Applications Workshops
Published: 2012-03-26
DOI: 10.1109/WAINA.2012.134 (https://doi.org/10.1109/WAINA.2012.134)
The web is a large source of valuable data. Today, this data is provided not only by professional publishers but by everyone, in the form of user-generated content. A large part of such content is located in web forums. As platforms for sharing knowledge, they are easily accessible to everyone. However, their sheer number makes it hard to find discussions on a specific topic. Automatic systems can filter this content and point to relevant information. Unfortunately, the content is presented in a human-readable layout and is not intended to be processed by automatic systems. Therefore, it is necessary to separate the content of a web forum discussion from its layout before doing any further information mining. This paper presents FODEX, a system for automatic forum data extraction. It extracts data from any forum and maps it to a unified data schema.
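The abstract does not say how FODEX performs the extraction, so the following is only a minimal sketch of the general idea of separating content from layout and mapping it to one schema. It is not the authors' method: FODEX is described as automatic, whereas this sketch relies on a hand-written mapping per forum. The `Post` schema, the `PostExtractor` class, the `extract_posts` function, and all CSS class names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Post:
    """Hypothetical unified schema: one record per forum post."""
    author: str
    text: str


class PostExtractor(HTMLParser):
    """Collects text from elements whose CSS class appears in a per-forum mapping."""

    def __init__(self, class_to_field):
        super().__init__()
        # Assumed hand-written mapping, e.g. {"uname": "author", "msg": "text"}.
        self.class_to_field = class_to_field
        self.current_field = None
        self.record = {}
        self.posts = []

    def handle_starttag(self, tag, attrs):
        # Layout-only elements have no mapped class, so current_field becomes None.
        css_class = dict(attrs).get("class")
        self.current_field = self.class_to_field.get(css_class)

    def handle_endtag(self, tag):
        self.current_field = None

    def handle_data(self, data):
        value = data.strip()
        if self.current_field and value:
            self.record[self.current_field] = value
            # Emit a post once both schema fields have been seen.
            if {"author", "text"} <= self.record.keys():
                self.posts.append(Post(**self.record))
                self.record = {}


def extract_posts(html, class_to_field):
    """Parse one forum page and return its posts in the unified schema."""
    parser = PostExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    return parser.posts


# Two made-up forums with different layouts yield the same record type.
forum_a = '<div class="post"><span class="uname">alice</span><p class="msg">Hello</p></div>'
forum_b = '<tr><td class="poster">bob</td><td class="body">Hi there</td></tr>'

posts_a = extract_posts(forum_a, {"uname": "author", "msg": "text"})
posts_b = extract_posts(forum_b, {"poster": "author", "body": "text"})
```

Once both pages are reduced to `Post` records, downstream filtering or mining code no longer needs to know which forum a post came from.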