Janghaeng Lee, Haicheng Wu, M. Ravichandran, Nathan Clark
{"title":"线程裁剪:动态地将线程编织在一起,以实现高效、自适应的并行应用程序","authors":"Janghaeng Lee, Haicheng Wu, M. Ravichandran, Nathan Clark","doi":"10.1145/1815961.1815996","DOIUrl":null,"url":null,"abstract":"Extracting performance from modern parallel architectures requires that applications be divided into many different threads of execution. Unfortunately selecting the appropriate number of threads for an application is a daunting task. Having too many threads can quickly saturate shared resources, such as cache capacity or memory bandwidth, thus degrading performance. On the other hand, having too few threads makes inefficient use of the resources available. Beyond static resource assignment, the program inputs and dynamic system state (e.g., what other applications are executing in the system) can have a significant impact on the right number of threads to use for a particular application. To address this problem we present the Thread Tailor, a dynamic system that automatically adjusts the number of threads in an application to optimize system efficiency. The Thread Tailor leverages offline analysis to estimate what type of threads will exist at runtime and the communication patterns between them. Using this information Thread Tailor dynamically combines threads to better suit the needs of the target system. Thread Tailor adjusts not only to the architecture, but also other applications in the system, and this paper demonstrates that this type of adjustment can lead to significantly better use of thread-level parallelism in real-world architectures.","PeriodicalId":132033,"journal":{"name":"Proceedings of the 37th annual international symposium on Computer architecture","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"94","resultStr":"{\"title\":\"Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications\",\"authors\":\"Janghaeng Lee, Haicheng Wu, M. Ravichandran, Nathan Clark\",\"doi\":\"10.1145/1815961.1815996\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Extracting performance from modern parallel architectures requires that applications be divided into many different threads of execution. Unfortunately selecting the appropriate number of threads for an application is a daunting task. Having too many threads can quickly saturate shared resources, such as cache capacity or memory bandwidth, thus degrading performance. On the other hand, having too few threads makes inefficient use of the resources available. Beyond static resource assignment, the program inputs and dynamic system state (e.g., what other applications are executing in the system) can have a significant impact on the right number of threads to use for a particular application. To address this problem we present the Thread Tailor, a dynamic system that automatically adjusts the number of threads in an application to optimize system efficiency. The Thread Tailor leverages offline analysis to estimate what type of threads will exist at runtime and the communication patterns between them. Using this information Thread Tailor dynamically combines threads to better suit the needs of the target system. Thread Tailor adjusts not only to the architecture, but also other applications in the system, and this paper demonstrates that this type of adjustment can lead to significantly better use of thread-level parallelism in real-world architectures.\",\"PeriodicalId\":132033,\"journal\":{\"name\":\"Proceedings of the 37th annual international symposium on Computer architecture\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"94\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 37th annual international symposium on Computer architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1815961.1815996\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th annual international symposium on Computer architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1815961.1815996","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications
Extracting performance from modern parallel architectures requires that applications be divided into many different threads of execution. Unfortunately selecting the appropriate number of threads for an application is a daunting task. Having too many threads can quickly saturate shared resources, such as cache capacity or memory bandwidth, thus degrading performance. On the other hand, having too few threads makes inefficient use of the resources available. Beyond static resource assignment, the program inputs and dynamic system state (e.g., what other applications are executing in the system) can have a significant impact on the right number of threads to use for a particular application. To address this problem we present the Thread Tailor, a dynamic system that automatically adjusts the number of threads in an application to optimize system efficiency. The Thread Tailor leverages offline analysis to estimate what type of threads will exist at runtime and the communication patterns between them. Using this information Thread Tailor dynamically combines threads to better suit the needs of the target system. Thread Tailor adjusts not only to the architecture, but also other applications in the system, and this paper demonstrates that this type of adjustment can lead to significantly better use of thread-level parallelism in real-world architectures.