{"title":"Keynote: Internet of mobile things: Challenges and opportunities","authors":"K. Nahrstedt","doi":"10.1145/2628071.2635931","DOIUrl":"https://doi.org/10.1145/2628071.2635931","url":null,"abstract":"The Internet of Things (IoT) concept has been around for some time and applications such as transportation, health-care, education, travel, smart grid, retail, are and will be major benefactors of this concept. However, only recently, due to technological advances in sensor devices and rich wireless connectivity, Internet of Things at scale is becoming reality. For example, Cisco's Internet of Things Group predicts over 50 billion connected sensory devices by 2020. In this talk, we will discuss the Internet of Mobile Things (IoMT) since several game-changing technological advances happened on ‘mobile things’ such as mobile phones, trains, and cars, where rich sets of sensors, connected via diverse sets of wireless Internet technologies, are changing and influencing how people communicate, move, and download and distribute information. In this space, challenges come from the needs to determine (1) contextual information such as location, duration of contact, density of devices, utilizing networked sensory information; (2) higher level knowledge such as users' activity detection, mood detection, applications usage pattern detection and user interactions on ‘mobile things’, utilizing contextual information; and (3) adaptive and real-time parallel and distributed architectures that integrate context, activity, mood, usage patterns into mobile application services on mobile ‘things’. Solving these challenges will provide enormous opportunities to improve the utility of mobile ‘things’, optimizing scarce resources on mobile ‘things’ such as energy, memory, and bandwidth.","PeriodicalId":263670,"journal":{"name":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115389208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deepak Majeti, Kuldeep S. Meel, R. Barik, Vivek Sarkar
{"title":"ADHA: Automatic data layout framework for heterogeneous architectures","authors":"Deepak Majeti, Kuldeep S. Meel, R. Barik, Vivek Sarkar","doi":"10.1145/2628071.2628122","DOIUrl":"https://doi.org/10.1145/2628071.2628122","url":null,"abstract":"Data layouts play a crucial role in determining the performance of a given application running on a given architecture. Existing parallel programming frameworks for both multicore and heterogeneous systems leave the onus of selecting a data layout to the programmer. Therefore, shifting the burden of data layout selection to optimizing compilers can greatly enhance programmer productivity and application performance. In this work, we introduce ADHA: a two-level hierarchal formulation of the data layout problem for modern heterogeneous architectures. We have created a reference implementation of ADHA in the Heterogeneous Habanero-C (H2C) parallel programming system. ADHA shows significant performance benefits of up to 6.92× compared to manually specified layouts for two benchmark programs running on a CPU+GPU heterogeneous platform.","PeriodicalId":263670,"journal":{"name":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116460378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sreepathi Pai, R. Govindarajan, M. J. Thazhuthaveetil
{"title":"Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels","authors":"Sreepathi Pai, R. Govindarajan, M. J. Thazhuthaveetil","doi":"10.1145/2628071.2628117","DOIUrl":"https://doi.org/10.1145/2628071.2628117","url":null,"abstract":"Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive Shortest Remaining Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime of GPU kernels, we show that such an estimate of the runtime can be easily obtained using online profiling and exploiting a simple observation on GPU kernels' grid structure. Specifically, we propose a novel Structural Runtime Predictor. Using a simple Staircase model of GPU kernel execution, we show that the runtime of a kernel can be predicted by profiling only the first few thread blocks. We evaluate an online predictor based on this model on benchmarks from ERCBench, and find that it can estimate the actual runtime reasonably well after the execution of only a single thread block. Next, we design a thread block scheduler that is both concurrent kernel-aware and uses this predictor. We implement the Shortest Remaining Time First (SRTF) policy and evaluate it on two-program workloads from ER-CBench. SRTF improves STP by 1.18× and ANTT by 2.25× over FIFO. When compared to MPMax, a state-of-the-art resource allocation policy for concurrent kernels, SRTF improves STP by 1.16× and ANTT by 1.3×. To improve fairness, we also propose SRTF/Adaptive which controls resource usage of concurrently executing kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12×, ANTT by 2.23× and Fairness by 2.95× compared to FIFO. Overall, our implementation of SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an oracle optimal scheduling policy), bridging 49% of the gap between FIFO and SJF.","PeriodicalId":263670,"journal":{"name":"2014 23rd International Conference on Parallel Architecture and Compilation (PACT)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133446230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}