Leandro Ordoñez-Ante, Thomas Vanhove, Gregory van Seghbroeck, T. Wauters, F. Turck
2016 11th International Conference for Internet Technology and Secured Transactions (ICITST), December 2016. DOI: 10.1109/ICITST.2016.7856676
Interactive querying and data visualization for abuse detection in social network sites
Big Data technologies have traditionally operated in an offline setting, collecting large batches of information on clusters of commodity machines and performing complex, time-consuming computations over them. While frameworks following this approach have served most big data analysis applications well over the last decade, other use cases have recently emerged that impose demanding latency requirements and call for real-time data processing, querying and visualization. This is the case for applications that aim to detect threatening behavior on social network platforms, where timely action is required to avoid adverse consequences. Accordingly, increasing attention has been drawn to online data processing systems that claim to address the limitations of batch-oriented frameworks. This paper reports work in progress on distributed data processing for enabling low-latency querying over big data sets. Two software architectures are discussed for addressing the problem, and an experimental evaluation of a proof-of-concept implementation shows how an approach based on query pre-processing and stateful distributed stream computation can meet the requirements for supporting interactive querying on large and continuously generated data.
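The abstract does not detail either architecture or the query model, so the following Python sketch is an illustration only. It contrasts batch-style recomputation with the general idea of query pre-processing over stateful, key-partitioned stream computation. The event schema (`Event`, `flagged`), the example query (count of flagged messages per user), and every class and function name are hypothetical assumptions, not the paper's design. A real deployment would keep this keyed state inside a distributed stream processor rather than in in-process Python objects.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    # Hypothetical event schema: one message posted on the platform.
    user: str
    flagged: bool  # assumed to be set by some upstream abuse classifier


def batch_query(history: Iterable[Event]) -> Dict[str, int]:
    """Batch style: rescan the entire history every time the query is asked."""
    counts: Dict[str, int] = defaultdict(int)
    for e in history:
        if e.flagged:
            counts[e.user] += 1
    return dict(counts)


class PreprocessedQuery:
    """Streaming style: the query is registered up front and its result is
    kept as state that each incoming event updates incrementally."""

    def __init__(self, key: Callable[[Event], str], predicate: Callable[[Event], bool]):
        self.key = key
        self.predicate = predicate
        self.state: Dict[str, int] = defaultdict(int)

    def on_event(self, e: Event) -> None:
        # Constant work per event instead of a full-history scan per query.
        if self.predicate(e):
            self.state[self.key(e)] += 1

    def answer(self) -> Dict[str, int]:
        # Answering is a read of the maintained state; no rescan is needed.
        return dict(self.state)


class PartitionedQuery:
    """Hypothetical stand-in for distributed keyed state: events are routed
    to workers by a hash of the grouping key, so each worker owns a disjoint
    subset of keys."""

    def __init__(self, n_workers: int, key: Callable[[Event], str],
                 predicate: Callable[[Event], bool]):
        self.key = key
        self.workers: List[PreprocessedQuery] = [
            PreprocessedQuery(key, predicate) for _ in range(n_workers)
        ]

    def on_event(self, e: Event) -> None:
        # crc32 gives a deterministic partition for a given key.
        idx = zlib.crc32(self.key(e).encode("utf-8")) % len(self.workers)
        self.workers[idx].on_event(e)

    def answer(self) -> Dict[str, int]:
        merged: Dict[str, int] = {}
        for w in self.workers:
            merged.update(w.answer())  # safe: keys are disjoint across workers
        return merged


if __name__ == "__main__":
    stream = [Event("alice", True), Event("bob", False),
              Event("alice", True), Event("carol", True)]
    q = PartitionedQuery(4, key=lambda e: e.user, predicate=lambda e: e.flagged)
    for e in stream:
        q.on_event(e)
    # Both approaches agree; only the cost of answering differs.
    print(q.answer() == batch_query(stream))
```

Under these assumptions, answering a registered query no longer costs a scan of the full history, because the work has been moved to ingestion time. The trade-off is that only queries known in advance can be served directly from the maintained state.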