Histoire & mesure: Latest Publications

Global Figures: Tools for Observing and Governing Agricultural Markets (1880s-1940s)
Histoire & mesure. Pub Date: 2023-06-30. DOI: 10.4000/histoiremesure.18973
Federico D’Onofrio, Niccolò Mignemi
Citations: 0
Acceleration of Software Transactional Memory through Hardware Clock
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613912
E. Atoofian
Abstract: Transactional Memory (TM) has gained momentum mainly due to its ability to provide synchronization transparency in parallel programs. In transactional applications, accesses to shared data structures are handled by the TM layer with no intervention by the programmer. Time-based software transactional memories (STMs) exploit a global clock to validate transactional data. Unfortunately, the clock becomes a bottleneck, especially in programs with a large number of concurrent transactions. In this paper, we exploit hardware support to implement the global clock. The hardware clock is implemented on the processor chip and enables bottleneck-free transactional memory run-time systems. Our evaluation using the Gem5 simulator shows that the hardware clock is effective and reduces the execution time of STAMP benchmarks by up to 62%.
Pages: 41-47. Citations: 0
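For context, the C++ sketch below illustrates the global-clock validation scheme that time-based STMs such as the one discussed above rely on: transactions snapshot a shared version clock at start and validate each read against the version stamp of the location. All names here (global_clock, VersionedCell, validated_read) are invented for this illustration; the paper's contribution is to move the clock into hardware so that the atomic increment on commit no longer serializes transactions.

```cpp
// Minimal illustration of a time-based STM global clock (TL2-style).
// Names and structure are illustrative, not taken from the paper.
#include <atomic>
#include <cstdint>
#include <iostream>

// The shared version clock. In the paper's proposal this counter would be
// a hardware clock on the processor chip instead of a contended atomic.
std::atomic<uint64_t> global_clock{0};

struct VersionedCell {
    std::atomic<uint64_t> version{0};  // write timestamp of the last commit
    std::atomic<int> value{0};
};

// A transactional read is consistent if the cell's version is stable and
// not newer than the snapshot taken when the transaction started.
bool validated_read(const VersionedCell& cell, uint64_t start_time, int& out) {
    uint64_t v1 = cell.version.load(std::memory_order_acquire);
    out = cell.value.load(std::memory_order_acquire);
    uint64_t v2 = cell.version.load(std::memory_order_acquire);
    return v1 == v2 && v1 <= start_time;   // otherwise the transaction must abort
}

// Commit of a single-cell write: advance the global clock and stamp the cell.
// The fetch_add below is the serialization point that a hardware clock removes.
// A full STM would also lock the location during commit; omitted for brevity,
// so the demo in main() runs single-threaded.
void commit_write(VersionedCell& cell, int new_value) {
    uint64_t write_time = global_clock.fetch_add(1, std::memory_order_acq_rel) + 1;
    cell.value.store(new_value, std::memory_order_release);
    cell.version.store(write_time, std::memory_order_release);
}

int main() {
    VersionedCell cell;
    uint64_t start = global_clock.load();   // transaction start snapshot
    commit_write(cell, 42);                 // a concurrent writer commits
    int v;
    bool ok = validated_read(cell, start, v);
    std::cout << "read valid: " << std::boolalpha << ok << "\n";  // false: abort and retry
}
```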
A Low-resource and Scalable Strategy for Segment Partitioning of Many-core Nano Networks
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613915
V. Catania, Andrea Mineo, Salvatore Monteleone, Davide Patti
Abstract: In this work we introduce the design and implementation of DiSR, a distributed approach to topology discovery and defect mapping in a nanoscale network-on-chip scenario. We first describe the conceptual elements and the execution model of DiSR, showing how the open-source Nanoxim platform has been used to evaluate the proposed approach in terms of the node coverage and scalability achieved when establishing a segment partitioning. Next, in order to demonstrate the feasibility of the proposed strategy in the context of limited node resources, we propose both a schematic and a gate-level hardware implementation of the required control logic and storage. Results show a relatively acceptable impact, ranging from 10 to about 20% of the 10,000-transistor budget available for each node.
Pages: 17-24. Citations: 2
A Closed Loop Control based Power Manager for WiNoC Architectures
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613914
Mohd Shahrizal Rusli, Andrea Mineo, M. Palesi, G. Ascia, V. Catania, M. N. Marsono
Abstract: In modern CMOS technologies, integration density continues to increase, while limitations due to wire interconnects become a bottleneck, especially in multi-hop intra-chip communications. Emerging architectures such as Wireless Networks-on-Chip (WiNoC) are candidate solutions for dealing with the communication latency issues that affect many-core architectures. In a WiNoC, metallic wires are replaced with long-range radio interconnections. Unfortunately, the energy consumed by the RF transceiver (i.e., the main building block of a WiNoC), and in particular by its transmitter, accounts for a significant fraction of the overall communication energy. Current WiNoC proposals use the same transmitting power for each transmitter regardless of the physical location of the receiver antenna. This paper proposes a closed-loop control mechanism that, based on the bit error rate observed by the receivers, selectively reconfigures the transmitters by calibrating their transmitting power. Preliminary results show the effectiveness of the proposed technique, which saves up to 40% of energy with less than 2% performance degradation.
Pages: 60-63. Citations: 4
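As a rough illustration of the closed-loop idea in the abstract above, the C++ sketch below nudges each transmitter's power level up when the receiver-reported bit error rate exceeds a target and down when there is ample margin. The thresholds, the step policy, and all names (Link, calibrate, kTargetBer) are assumptions for this sketch, not the paper's controller.

```cpp
// Illustrative closed-loop transmit-power calibration for a wireless link.
// Step policy, thresholds, and names are assumptions, not the paper's design.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Link {
    int power_level;      // discrete transmitter power setting
    double observed_ber;  // bit-error rate reported by the receiver
};

constexpr int    kMinPower  = 0;
constexpr int    kMaxPower  = 15;
constexpr double kTargetBer = 1e-9;   // acceptable error rate
constexpr double kSlackBer  = 1e-11;  // well below target: power can be reduced

// One control iteration: raise power on noisy links, lower it on links with
// ample margin, leave the rest untouched.
void calibrate(std::vector<Link>& links) {
    for (Link& l : links) {
        if (l.observed_ber > kTargetBer)
            l.power_level = std::min(l.power_level + 1, kMaxPower);
        else if (l.observed_ber < kSlackBer)
            l.power_level = std::max(l.power_level - 1, kMinPower);
    }
}

int main() {
    std::vector<Link> links = {{8, 1e-7}, {8, 1e-13}, {8, 5e-10}};
    calibrate(links);
    for (const Link& l : links)
        std::printf("power=%d ber=%g\n", l.power_level, l.observed_ber);
}
```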
On the Relevance of Architectural Awareness for Efficient Fork/Join Support on Cluster-Based Manycores
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613911
H. Al-Khalissi, Mladen Berekovic, A. Marongiu
Abstract: Several recent manycores leverage a hierarchical design, where small to medium numbers of cores are grouped inside clusters and enjoy low-latency, high-bandwidth local communication through fast L1 scratchpad memories. Several clusters can be interconnected through a network-on-chip (NoC), which ensures system scalability but introduces non-uniform memory access (NUMA) effects: the cost of accessing a specific memory location depends on the physical path that the corresponding transactions traverse. These peculiarities of the HW clearly must be carefully taken into account when designing support for programming models. In this paper we study how architectural awareness is key to supporting efficient and streamlined fork/join primitives. We compare hierarchical fork/join operations to "flat" ones, where there is no notion of the hierarchical interconnection system, considering two real-world manycores: the Intel SCC and STMicroelectronics STHORM.
Pages: 9-16. Citations: 2
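To make the flat-versus-hierarchical distinction in the abstract above concrete, the sketch below contrasts a flat fork/join, where one master spawns every worker, with a cluster-aware one, where the master only contacts one local master per cluster. It uses std::thread purely for shape; the paper targets bare-metal runtimes on the Intel SCC and STHORM, so the cluster sizes and the work() body are placeholders.

```cpp
// Shape of flat vs. hierarchical (cluster-aware) fork/join, illustrated with
// std::thread. Cluster sizes and the work() body are placeholders.
#include <cstdio>
#include <thread>
#include <vector>

constexpr int kClusters          = 4;
constexpr int kWorkersPerCluster = 4;

void work(int cluster, int local_id) {
    std::printf("cluster %d, worker %d\n", cluster, local_id);
}

// Flat fork/join: one master spawns every worker directly, so all wake-up
// and join traffic crosses the global interconnect.
void flat_fork_join() {
    std::vector<std::thread> workers;
    for (int c = 0; c < kClusters; ++c)
        for (int w = 0; w < kWorkersPerCluster; ++w)
            workers.emplace_back(work, c, w);
    for (auto& t : workers) t.join();
}

// Hierarchical fork/join: the master only contacts one local master per
// cluster; each local master forks and joins its own workers, keeping most
// synchronization traffic inside the cluster.
void hierarchical_fork_join() {
    std::vector<std::thread> cluster_masters;
    for (int c = 0; c < kClusters; ++c) {
        cluster_masters.emplace_back([c] {
            std::vector<std::thread> locals;
            for (int w = 0; w < kWorkersPerCluster; ++w)
                locals.emplace_back(work, c, w);
            for (auto& t : locals) t.join();
        });
    }
    for (auto& t : cluster_masters) t.join();
}

int main() {
    flat_fork_join();
    hierarchical_fork_join();
}
```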
A Virtualization Framework for IOMMU-less Many-Core Accelerators
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613910
Christian Pinto, A. Marongiu, L. Benini
Abstract: Modern high-end embedded systems are designed as sophisticated systems-on-chip (SoC) composed of a virtualization-ready multi-core processor (the host) coupled to programmable manycore accelerators (PMCA). To tackle the increased programming complexity, the roadmap of several major industrial players envisions dedicated HW support to allow the host and the PMCA to communicate via shared virtual memory. I/O memory management units (IOMMU) and other HW are required to allow coherent virtual memory sharing. Currently no embedded heterogeneous SoC exists that provides such support, and it is unclear whether the required HW will fit the tight area and energy budgets of such designs. However, providing the abstraction of a shared memory is very relevant for simplifying the programming of heterogeneous SoCs, as are techniques to extend virtualization support to the manycore. We present in this work a software infrastructure which enables such support in the absence of dedicated HW. The proposed mechanism is based on standard Linux KVM and relies on (transparent) memory copies to resolve virtual-to-physical address translation. We describe an implementation for a real heterogeneous SoC and provide a detailed analysis of the cost of our SW-only solution.
Pages: 33-40. Citations: 1
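The key mechanism in the abstract above is resolving virtual-to-physical translation through transparent memory copies rather than an IOMMU. The hedged sketch below shows the general copy-in/copy-out shape of such an offload path; the Accelerator type and its interface are invented for illustration and do not reflect the paper's KVM-based implementation.

```cpp
// Copy-in/copy-out offload wrapper illustrating shared-memory emulation
// without an IOMMU: the runtime copies data between the host's virtual
// buffers and a contiguous, accelerator-visible staging area.
// The Accelerator interface below is invented for this sketch.
#include <cstdio>
#include <cstring>
#include <vector>

struct Accelerator {
    // Stand-in for a physically addressed scratch memory on the device.
    std::vector<int> device_mem;
    void run_kernel(size_t n) {             // toy kernel: double every element
        for (size_t i = 0; i < n; ++i) device_mem[i] *= 2;
    }
};

// Offload: copy the host buffer in, execute, copy the result back.
// The two copies take the place of IOMMU address translation.
void offload(Accelerator& acc, std::vector<int>& host_buf) {
    acc.device_mem.resize(host_buf.size());
    std::memcpy(acc.device_mem.data(), host_buf.data(),
                host_buf.size() * sizeof(int));          // copy-in
    acc.run_kernel(host_buf.size());
    std::memcpy(host_buf.data(), acc.device_mem.data(),
                host_buf.size() * sizeof(int));          // copy-out
}

int main() {
    Accelerator acc;
    std::vector<int> data = {1, 2, 3, 4};
    offload(acc, data);
    for (int v : data) std::printf("%d ", v);   // prints: 2 4 6 8
    std::printf("\n");
}
```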
A GALS Router for Asynchronous Network-on-Chip
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613918
Pooria M. Yaghini, Ashkan Eghbal, N. Bagherzadeh
Abstract: A scalable asynchronous NoC router with lower power consumption and latency compared to a synchronous design is introduced in this article. It employs GALS interfaces (synchronous-to-asynchronous and asynchronous-to-synchronous), imposing negligible area overhead to handle the metastability issue. It is synthesized with the help of the Persia tool, resulting in 23,165 transistors. Power consumption and delay have been evaluated by means of the H-Spice toolset in 90nm manufacturing technology. According to the experimental results, the proposed asynchronous design consumes less power than the synchronous scheme by removing clock signals. The area overhead of the asynchronous design is reported to be 36% higher than that of the synchronous one.
Pages: 52-55. Citations: 7
Adaptive Compute-phase Prediction and Thread Prioritization to Mitigate Memory Access Latency
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613919
Ismail Akturk, Özcan Özturk
Abstract: The full potential of chip multiprocessors remains unexploited due to the thread-oblivious memory access schedulers used in off-chip main memory controllers. This is especially pronounced in embedded systems due to limitations in memory. We propose an adaptive compute-phase prediction and thread prioritization algorithm for memory access scheduling in embedded chip multiprocessors. The proposed algorithm efficiently categorizes threads based on execution characteristics and provides fine-grained prioritization that differentiates threads and prioritizes their memory access requests accordingly. Threads in the compute phase are prioritized over threads in the memory phase. Furthermore, threads in the compute phase are prioritized among themselves based on their potential to make more progress in their execution. Compared to the prior works First-Ready First-Come First-Serve (FR-FCFS) and Compute-phase Prediction with Writeback-Refresh Overlap (CP-WO), the proposed algorithm reduces the execution time of the generated workloads by up to 23.6% and 12.9%, respectively.
Pages: 48-51. Citations: 0
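A minimal sketch of the priority order described above: requests from threads predicted to be in a compute phase are served before those from memory-phase threads, compute-phase threads are ordered among themselves by an estimated progress potential, and arrival order breaks ties. The phase predictor and the progress metric themselves are placeholders here, not the paper's algorithm.

```cpp
// Request ordering illustrating compute-phase-first scheduling with a
// progress-based tiebreak; the phase predictor and progress metric are
// placeholders, not the paper's algorithm.
#include <algorithm>
#include <cstdio>
#include <vector>

struct MemRequest {
    int  thread_id;
    bool in_compute_phase;  // output of the phase predictor
    int  progress_score;    // estimated potential to make further progress
    long arrival_time;      // for the first-come first-serve fallback
};

bool higher_priority(const MemRequest& a, const MemRequest& b) {
    if (a.in_compute_phase != b.in_compute_phase)
        return a.in_compute_phase;                  // compute-phase threads served first
    if (a.in_compute_phase && a.progress_score != b.progress_score)
        return a.progress_score > b.progress_score; // more promising thread first
    return a.arrival_time < b.arrival_time;         // otherwise first-come first-serve
}

int main() {
    std::vector<MemRequest> queue = {
        {0, false, 0, 10}, {1, true, 3, 12}, {2, true, 7, 11}, {3, false, 0, 9}};
    std::sort(queue.begin(), queue.end(), higher_priority);
    for (const auto& r : queue)
        std::printf("serve thread %d\n", r.thread_id);  // order: 2, 1, 3, 0
}
```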
A Stream Buffer Mechanism for Pervasive Splitting Transformations on Polyhedral Process Networks
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613913
P. Meloni, Giuseppe Tuveri, L. Raffo, Igor Loi, Francesco Conti
Abstract: In modern MPSoC architectures, programming to effectively exploit all the available resources becomes very challenging. Polyhedral Process Networks (PPN) are a known model of computation that represents a suitable solution for the systematic mapping of parallel applications onto multiprocessor architectures. Previous work has shown that a given PPN program specification can be further analyzed and optimized in order to meet the desired performance requirements. In this paper we present an online process splitting transformation that does not need a re-design of the communication patterns in the network structure of the application. The novelty of our approach is that, differently from other compile-time approaches, the proposed transformation technique can be applied at run time and followed, if needed, by the backward transformation. Using an FPGA-based MPSoC shared-memory platform, we present an evaluation of the achievable performance improvements. We also discuss the overhead caused by the introduction of the run-time transformation support.
Pages: 25-32. Citations: 2
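As a loose illustration of process splitting in a process network, the sketch below distributes a producer's tokens round-robin over the FIFOs of N split copies and merges their outputs back in the same order, so the downstream process observes the original stream. The buffer layout and names are assumptions and do not reproduce the paper's stream buffer mechanism.

```cpp
// Round-robin split/merge of a token stream, illustrating the idea behind
// splitting one process of a process network into N parallel copies while
// preserving token order. FIFO organization and naming are illustrative only.
#include <cstdio>
#include <deque>
#include <vector>

using Token = int;

struct SplitBuffer {
    explicit SplitBuffer(size_t copies) : fifos(copies) {}

    // Producer side: distribute tokens round-robin over the copies' FIFOs.
    void push(Token t) {
        fifos[next_in].push_back(t);
        next_in = (next_in + 1) % fifos.size();
    }

    // Consumer side: merge results back in the same round-robin order, so the
    // downstream process observes the original stream order.
    bool pop(Token& out) {
        if (fifos[next_out].empty()) return false;
        out = fifos[next_out].front();
        fifos[next_out].pop_front();
        next_out = (next_out + 1) % fifos.size();
        return true;
    }

    std::vector<std::deque<Token>> fifos;
    size_t next_in = 0, next_out = 0;
};

int main() {
    SplitBuffer buf(3);                         // process split into 3 copies
    for (int t = 0; t < 7; ++t) buf.push(t);
    Token t;
    while (buf.pop(t)) std::printf("%d ", t);   // prints: 0 1 2 3 4 5 6
    std::printf("\n");
}
```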
Adaptive Cache Bypass and Insertion for Many-core Accelerators
Histoire & mesure. Pub Date: 2014-06-15. DOI: 10.1145/2613908.2613909
Xuhao Chen, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Zhiying Wang, Wen-mei W. Hwu
Abstract: Many-core accelerators, e.g. GPUs, are widely used for accelerating general-purpose compute kernels. With the SIMT execution model, GPUs can hide memory latency through massive multithreading for many regular applications. To support more applications with irregular memory access patterns, a cache hierarchy is introduced into the GPU architecture to capture input data sharing and mitigate the effect of irregular accesses. However, GPU caches suffer from poor efficiency due to severe contention, which makes it difficult to adopt heuristic management policies, and also limits system performance and energy efficiency. We propose an adaptive cache management policy specifically for many-core accelerators. The tag array of the L2 cache is enhanced with extra bits to track memory access history, and thus the locality information is captured and provided to the L1 cache as heuristics to guide its run-time bypass and insertion decisions. By preventing un-reused data from polluting the cache and alleviating contention, cache efficiency is significantly improved. As a result, system performance is improved by 31% on average for cache-sensitive benchmarks, compared to the baseline GPU architecture.
Pages: 1-8. Citations: 25
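A minimal sketch of reuse-history-guided bypassing in the spirit of the abstract above: a small table records whether blocks have been re-referenced, and blocks with no observed reuse are bypassed rather than inserted. The table below stands in for the extra bits the paper adds to the L2 tag array; sizes, thresholds, and names are illustrative assumptions.

```cpp
// Reuse-history-guided cache bypass: blocks whose past behaviour shows no
// reuse are not inserted into the cache. The map below stands in for the
// extra per-tag bits described in the abstract; thresholds are assumptions.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct ReusePredictor {
    // Small saturating counter per block address: low values = no reuse seen.
    std::unordered_map<uint64_t, int> history;

    // Record an access: a second touch of the same block counts as reuse.
    void record_access(uint64_t block_addr) {
        int& c = history[block_addr];
        c = std::min(c + 1, 3);
    }

    // Called on a miss: insert only if the block has shown reuse before.
    bool should_bypass(uint64_t block_addr) const {
        auto it = history.find(block_addr);
        return it == history.end() || it->second < 2;
    }
};

int main() {
    ReusePredictor pred;
    uint64_t streaming = 0x1000, reused = 0x2000;
    pred.record_access(streaming);                   // touched once, never again
    pred.record_access(reused);
    pred.record_access(reused);                      // re-referenced: worth caching
    std::printf("bypass streaming block: %d\n", pred.should_bypass(streaming)); // 1
    std::printf("bypass reused block:    %d\n", pred.should_bypass(reused));    // 0
}
```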