{"title":"Effective topic modeling for email","authors":"Hiep Hong, Teng-Sheng Moh","doi":"10.1109/HPCSim.2015.7237060","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237060","url":null,"abstract":"Emails have been increasingly popular and have become an indispensible tool for communication and document exchange. Because of its convenience, people use emails every day at work, at school, and for personal matters. Consequently, the number of emails people receive daily keeps on increasing, causing them to spend more time organizing the emails. People often need to classify and move email into folders so that they can go back and read them later. Most email client tools available today allow the users to filter and organize emails by defining rules on how to handle incoming emails. However, this manual process requires users to know their expected emails very well, and to make good use of these tools users need to understand how filtering rules work and how to apply them correctly. In reality, most users do not know what their incoming emails will be. The work described in this paper aims to take the burden of organizing emails away from users by using the Latent Dirichlet Allocation (LDA) [10] to automatically extract topics from emails and group them into folders of common topics. Experiments have shown that the proposed method is able to correctly group emails in appropriate topics with 77% accuracy.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129000049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward a fully parallel multigrid in time algorithm in PETSc environment: A case study in ocean models","authors":"L. Carracciuolo, L. D’Amore, Valeria Mele","doi":"10.1109/HPCSim.2015.7237098","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237098","url":null,"abstract":"We consider linear systems that arise from the discretization of evolutionary models. Typically, solution algorithms are based on a time-stepping approach, solving for one time step after the other. Parallelism is limited to the spatial dimension only. Because time is sequential in nature, the idea of simultaneously solving along time steps is not intuitive. One approach to achieve parallelism in time direction is MGRIT algorithm [7], based on multigrid reduction (MGR) techniques. Here we refer to this approach as MGR-1D. Other kind of approach is the space-time multigrid, where time is simply another dimension in the grid. Analougsly, we refer to this approach as MGR-4D. In this work, motivated by the need of maximizing the availability of new algorithms to climate science, we propose a new parallel approach that mixes both the MGR-1D idea and classical space multigrid methods. We refer to it as the MGR3D+1 approach. Moreover, we discuss their implementation in the high performance scientific library PETSc, as starting point to develope more efficient and scalable algorithms in ocean models.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129371258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the run-time cost of distributed-memory communications generated using the polyhedral model","authors":"Ana Moreton-Fernandez, Arturo González-Escribano, D. Ferraris","doi":"10.1109/HPCSim.2015.7237034","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237034","url":null,"abstract":"The polyhedral model can be used to automatically generate distributed-memory communications for affine nested loops. Recently, new communication schemes that reduce the communication volume have been presented. In this paper we study the extra computational effort introduced at run-time by the code generated to manage the communication details across distributed processes. We focus on the most sophisticated communication scheme so far introduced (the FOP scheme). We present an asymptotic cost study of the FOP scheme in terms of two main run-time parameters: The problem size, and the number of processors. Based on this study, we identify scalability limitations in current implementations of these techniques, and propose a simple implementation alternative to eliminate one of them. Experimental results are presented, showing the potential impact on performance of these implementation limitations when using these codes in large parallel systems.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129521300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NoC-centric partitioning and reconfiguration technologies for the efficient sharing of multi-core programmable accelerators","authors":"Marco Balboni, D. Bertozzi","doi":"10.1109/HPCSim.2015.7237107","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237107","url":null,"abstract":"Today, multi- and many-core architectures are gaining momentum as a potential source of hardware acceleration, bringing to new challenges for system designers related to both system virtualization and runtime testing. My research activity tackles these challenges exploiting and optimizing the capabilities of reconfiguring the routing function at runtime.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134189524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How advanced cloud technologies can impact and change HPC environments for simulation","authors":"M. Mancini, G. Aloisio","doi":"10.1109/HPCSim.2015.7237116","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237116","url":null,"abstract":"In the last years, most enterprises and IT organizations have adopted virtualization and cloud computing solutions to achieve features such as flexibility, elasticity, fault tolerance, high availability and reliability for their computational, storage and networking resource infrastructures. Moreover, recent advances in Linux containers [1] and the emergence of technologies as Docker [2] are revolutionizing the way of developing and deploying web and large scale distributed applications.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123686872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting co-scheduling for upcoming ExaScale systems","authors":"Stefan Lankes","doi":"10.1109/HPCSim.2015.7237117","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237117","url":null,"abstract":"Future generation supercomputers will be a hundred times faster than today's leaders of the Top 500 while reaching the exascale mark. It is predicted that this performance gain in terms of CPU power will be achieved by a shift in the ratio of compute nodes to cores per node. The amount of nodes will not grow significantly compared to today's systems, instead they will be built by using many-core CPUs holding more than hundreds of cores resulting in a widening gap between compute power and I/O performance [1]. Four key challenges of future exascale systems have been identified by previous studies that must be coped with when designing them: energy and power, memory and storage, concurrency and locality, and resiliency [2].","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126779255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracing long running applications: A case study using Gromacs","authors":"M. Wagner, J. Doleschal, A. Knüpfer","doi":"10.1109/HPCSim.2015.7237031","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237031","url":null,"abstract":"Performance analysis is inevitable to develop applications that utilize the enormous capabilities of current HPC systems. While many recent tool studies focused on large scales, performance analysis of long-running applications has not been paid much attention. This paper investigates challenges that arise from monitoring long-running real-life applications, in particular, the disruptive bias of intermediate memory buffer flushes in the measurement environment. We propose a concept for an in-memory event tracing that completely avoids intermediate memory buffer flushes. We evaluate to which extent such an in-memory event tracing workflow helps overcoming the critical properties, such as resulting trace size, application slow down, and measurement bias. We utilize a prototype implementation, based on Score-P and OTF2, with the molecular dynamics packages Gromacs, an application currently infeasible to monitor in a full production run.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129770087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A runtime/memory trade-off of the continous Ziggurat method on GPUs","authors":"C. Riesinger, T. Neckel","doi":"10.1109/HPCSim.2015.7237018","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237018","url":null,"abstract":"Pseudo random number generators are intensively used in many computational applications, e.g. the treatment of Uncertainty Quantification problems. For this reason, the optimization of such generators for various hardware architectures is of big interest. We present a runtime/memory trade-off for the popular Ziggurat method with focus on GPUs. Such a trade-off means that the runtime of pseudo random number generation can be reduced by investing more memory and vice versa. Especially GPUs benefit from this approach since it reduces warp divergence which occurs for rejection methods such as the Ziggurat method. To our knowledge, such a trade-off for the Ziggurat method has never been investigated before for GPUs. It is shown that this approach makes the Ziggurat method competitive against well established normal pseudo random number generators on GPUs. Optimal implementations and grid configurations are given for different GPU architectures.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128849301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active learning for support vector regression in radiation shielding design","authors":"Paulina Duckic, Krešimir Trontl, M. Matijević","doi":"10.1109/HPCSim.2015.7237055","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237055","url":null,"abstract":"Recently a novel approach based on support vector regression technique has been proposed and tested for the estimation of multi layer buildup factors for gamma ray shielding calculations, while for neutron shielding calculations some initial analyses have been conducted. During the development of the model a number of questions regarding possible application of active learning measures have been raised. In this paper general applicability of the active learning measures on the problem, in particular data transfer method used in the investigation, and testing of the active procedure are discussed.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116872436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In search of the best MPI-OpenMP distribution for optimum Intel-MIC cluster performance","authors":"G. Utrera, Marisa Gil, X. Martorell","doi":"10.1109/HPCSim.2015.7237072","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237072","url":null,"abstract":"Applications for HPC platforms are mainly based on hybrid programming models: MPI for communication and OpenMP for task and fork-join parallelism to exploit shared memory communication inside a node. On the basis of this scheme, much research has been carried out to improve performance. Some examples are: the overlap of communication and computation, or the increase of speedup and bandwidth on new network fabrics (i.e. Infiniband and 10GB or 40GB ethernet). Henceforth, as far as computation and communication are concerned, the HPC platforms will be heterogeneous with high-speed networks. And, in this context, an important issue is to decide how to distribute the workload among all the nodes in order to balance the application execution as well as choosing the most appropriate programming model to exploit parallelism inside the node. In this paper we propose a mechanism to balance dynamically the work distribution among the heterogeneous components of an heterogeneous cluster based on their performance characteristics. For our evaluations we run the miniFE mini-application of the Mantevo suite benchmark, in a heterogeneous Intel MIC cluster. Experimental results show that making an effort to choose the appropriate number of threads can improve performance significantly over choosing the maximum available number of cores in the Intel MIC.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124393442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}