2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Papers

GPUCalorie: Floorplan Estimation for GPU Thermal Evaluation
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00034
M. Chow, Ali Jahanshahi, Ana Beltrán, S. Tan, Daniel Wong
{"title":"GPUCalorie: Floorplan Estimation for GPU Thermal Evaluation","authors":"M. Chow, Ali Jahanshahi, Ana Beltrán, S. Tan, Daniel Wong","doi":"10.1109/ispass55109.2022.00034","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00034","url":null,"abstract":"GPUs are massively parallel architecture that consume significant power, which lead to high thermal output. Thermal constraints of GPUs are one of the major limitations in high performance, mobile and embedded applications. However, accurate thermal modeling tools for GPUs are lacking for researchers. We identify that limiting factors to further research are the absence of GPU floorplans necessary for thermal modeling, validated thermal trends, and outdated component-level power models. To this end, we present GPUCalorie, a thermal modeling methodology using specialized infrared thermography setup for measuring and validating thermal behaviors of real GPUs. We validate a floorplan of Nvidia’s GTX1050 identified through our infrared thermography setup. We validate the GPUCalorie identified floorplan against a real GTX1050 GPU, showing 10% error for the thermal map.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133788304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Microarchitectural Performance Evaluation of AV1 Video Encoding Workloads
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00038
Steffen Jensen, Jaekyu Lee, Dam Sunwoo, Matthew Horsnell, L. John
{"title":"Microarchitectural Performance Evaluation of AV1 Video Encoding Workloads","authors":"Steffen Jensen, Jaekyu Lee, Dam Sunwoo, Matthew Horsnell, L. John","doi":"10.1109/ISPASS55109.2022.00038","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00038","url":null,"abstract":"Video encoding/decoding is an extremely relevant workload in our society today. Videos account for a significant percentage of the world’s network traffic, which is expected only to be growing. Thus, it is important to understand these workloads to optimize the hardware to handle them better. This paper explores the reasons for the large runtimes taken by AV1 encoding workloads. We discover that the runtime of the SVT-AV1 encoder is significantly higher than other encoders because it requires a larger number of instructions to encode the same video, rather than any significant microarchitectural inefficiencies. We also compare the thread scaling of SVT-AV1 against other codecs and observe that SVT-AV1 contains the highest degree of parallelism of the tested encoders.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133459066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Left-shifter: A pre-silicon framework for usage model based performance verification of the PCIe interface in server processor system on chips
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00009
Tessil Thomas, B. Venkatasubramanian, Dinesh Sthapit, Christopher Gray, Atresh Gummadavelly, J. Bergeron, Pankaj Mehta, Prabu Thangamuthu
{"title":"Left-shifter: A pre-silicon framework for usage model based performance verification of the PCIe interface in server processor system on chips","authors":"Tessil Thomas, B. Venkatasubramanian, Dinesh Sthapit, Christopher Gray, Atresh Gummadavelly, J. Bergeron, Pankaj Mehta, Prabu Thangamuthu","doi":"10.1109/ISPASS55109.2022.00009","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00009","url":null,"abstract":"Input/Output (IO) peripherals like storage devices and network interface cards play a significant role in determining the end user visible performance of many server applications. In addition, many server applications depend on accelerators to achieve the desired performance levels. PCIe is the de-facto standard used for connecting IO peripherals and accelerators to server processor System On Chips (SoC). Therefore, it is important to verify that PCIe interface(s) of a server processor SoC allows full utilization of the available PCIe link bandwidth with reasonable transaction latencies for PCIe traffic patterns corresponding to the most common ways in which PCIe IO devices and accelerators are used by applications. Currently, to the best of our knowledge, such IO and accelerator usage model based PCIe interface performance verification can only be done after the manufactured SoC is available (i.e., in post-silicon). Unfortunately, doing such verification in post-silicon means that if any serious performance issues are found, the SoC developer is forced to invest in costly rectification and remanufacturing of the SoC. In this paper, we introduce an emulation-based framework that enables a “shift-left” of usage model based PCIe interface performance verification from post-silicon to pre-silicon. In contrast to the current post-silicon-based approach, our framework offers a low cost, fast turnaround method to identify and fix PCIe related performance issues prior to manufacturing the chip.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115650103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Address Translation Conscious Caching and Prefetching for High Performance Cache Hierarchy
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00044
Vasudha, Biswabandan Panda
{"title":"Address Translation Conscious Caching and Prefetching for High Performance Cache Hierarchy","authors":"Vasudha, Biswabandan Panda","doi":"10.1109/ispass55109.2022.00044","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00044","url":null,"abstract":"Performance of Translation Lookaside Buffers (TLBs) and on-chip caches plays a crucial role in delivering high-performance for memory-intensive applications with irregular memory accesses. Our observations show that, on average, an L2 TLB (STLB) miss for address translation can stall the head of the reorder buffer (ROB) for a maximum of 50 cycles. The corresponding data request, also called as the replay load can stall the head of the ROB for more than 200 cycles. We show that current state-of-the-art mid-level (L2C) and last-level cache (LLC) replacement policies do not treat cache block with address translations and replay data access differently. As a result these policies fail to reduce ROB stalls because of translation and replay data access misses. To improve the performance further on top of high-performing cache replacement policies, we propose address translation and replay data access conscious cache replacement policies at L2C and LLC. Our enhancements help in reducing ROB stalls due to STLB misses by 28.76%. We also find that cache blocks storing replay loads are dead (no reuse after insertion), and cache replacement policies alone cannot mitigate the ROB stalls caused by replay data accesses. Hence, we propose an address translation hit triggered hardware prefetcher that brings replay data on an address translation hit at the L2C and LLC. This enhancement reduces ROB stalls due to replay data accesses by 18.5%. For a group of memory-intensive benchmarks with high STLB misses, our enhancements improve performance by 5.1% (reducing ROB stall cycles by 46.7%) and as high as 10.6%, on top of state-of-the-art cache replacement policies that are highly competitive. Our enhancements do not incur any additional storage overhead. However, we need additional flags from the page-table-walker into the cache hierarchy.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123532160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00039
Mark Horeni, Pooria Taheri, Po-An Tsai, A. Parashar, J. Emer, S. Joshi
{"title":"Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization","authors":"Mark Horeni, Pooria Taheri, Po-An Tsai, A. Parashar, J. Emer, S. Joshi","doi":"10.1109/ispass55109.2022.00039","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00039","url":null,"abstract":"Finding high-quality mappings of Deep Neural Network (DNN) models onto tensor accelerators is critical for efficiency. State-of-the-art mapping exploration tools use remainderless (i.e., perfect) factorization to allocate hardware resources, through tiling the tensors, based on factors of tensor dimensions. This limits the size of the search space, (i.e., mapspace), but can lead to low resource utilization. We introduce a new mapspace, Ruby, that adds remainders (i.e., imperfect factorization) to expand the mapspace with high-quality mappings for user-defined architectures. This expansion allows us to allocate resources more precisely by generating tile sizes that better conform to hardware resources. However, this mapspace expansion also incurs an increase in the number of unique mappings. Consequently, this paper studies the trade-off between Ruby’s mapspace expansion and mapping quality. We propose Ruby-S (Spatial) to only employ imperfect factorization towards improved parallelism. Ruby-S incurs a moderate mapspace expansion while reducing energy-delay product (EDP) up to 50% when implementing ResNet-50 on an Eyeriss-like architecture with an average improvement of 20%. For the most part, this improvement can be attributed to higher compute utilization. EDP on a Simba-like architecture improves up to 40% with an average of 10%. For DeepBench workloads Ruby-S yields improvements of up to 45% with an average improvement of 10% on an Eyeriss-like architecture. Ruby-S is robust to accelerator configurations and improves EDP by 20% on average, with a maximum improvement of 55% when implementing ResNet-50 on different accelerator configurations. Ruby-S mappings form a new Pareto frontier, improving the performance of previous configurations by an average of 30% and 20% for ResNet-50 and DeepBench workloads respectively.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124851655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
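Note on the entry above: the abstract's contrast between perfect (remainderless) and imperfect factorization can be illustrated with a toy, one-dimensional model. The sketch below is not Ruby's mapper or any tool's API; the function names, the single-dimension setup, and the example sizes (a dimension of 100 spread over 16 processing elements) are all assumptions chosen for illustration.

```python
import math

def best_perfect_factor(dim: int, pes: int) -> int:
    """Largest spatial factor that divides dim exactly and fits in pes PEs."""
    return max(f for f in range(1, pes + 1) if dim % f == 0)

def spatial_mapping(dim: int, factor: int, pes: int):
    """Return (temporal steps, average PE utilization) for a spatial factor.

    An imperfect factor leaves a remainder, so the last step is only
    partially filled; utilization averages over all steps.
    """
    steps = math.ceil(dim / factor)
    utilization = dim / (steps * pes)
    return steps, utilization

if __name__ == "__main__":
    dim, pes = 100, 16  # hypothetical tensor dimension and PE count
    perfect = best_perfect_factor(dim, pes)
    print("perfect factor:", perfect, spatial_mapping(dim, perfect, pes))
    print("imperfect factor:", pes, spatial_mapping(dim, pes, pes))
```

In this toy case the perfect factorization is limited to 10 of the 16 PEs, while allowing a remainder lets all 16 be used in every step but the last, which is the kind of utilization gain the abstract attributes to imperfect factorization.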
Profiling an Architectural Simulator
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00032
Nedasadat Taheri, Alexander Manely, Ahmni R. Pang, Mohammad Alian
{"title":"Profiling an Architectural Simulator","authors":"Nedasadat Taheri, Alexander Manely, Ahmni R. Pang, Mohammad Alian","doi":"10.1109/ISPASS55109.2022.00032","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00032","url":null,"abstract":"In this work we set out to answer the following two questions: (1) where are the bottlenecks in a state-of-the-art architectural simulator? (2) How much can we make architectural simulations run faster by tuning simple system configuration? We choose gem5 as the representative architectural simulator, run several simulations with various configurations, perform a detailed Top-Down analysis of the gem5 source code, and tune system settings for running simulations more efficiently.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128276916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Cross-Level Characterization of Program Behavior : (Extended Poster Abstract)
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00036
Li Tang, S. Pakin
{"title":"Cross-Level Characterization of Program Behavior : (Extended Poster Abstract)","authors":"Li Tang, S. Pakin","doi":"10.1109/ISPASS55109.2022.00036","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00036","url":null,"abstract":"Program behavior can be defined as a collection of executions [1]. Program behavior strongly relates to actual program performance but can be complicated to be characterized and analyzed. Characterization is important as it helps better understand program behavior by measuring various operations a program performs. There are many existing techniques [2]–[7] for program characterization, which operate at different levels of instrumentation: source code, intermediate representation (IR), instruction set architecture (ISA), and CPU microarchitecture. Each of these levels provides different capabilities and limitations. In this paper, we introduce Cross-Level Characterization (CLC), an analysis of similarities and differences in resource counts as measured at each level of instrumentation during a program’s transformation from source code through execution on a specific microarchitecture.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129639374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Performance Analysis and Optimization with Little’s Law
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00002
Sanyam Mehta
{"title":"Performance Analysis and Optimization with Little’s Law","authors":"Sanyam Mehta","doi":"10.1109/ispass55109.2022.00002","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00002","url":null,"abstract":"Performance tools are the bridge between processor architecture and a user. However, with the increasingly complex processor architectures, it is becoming increasingly difficult for the users to comprehend the information generated by the performance tools to help diagnose and fix the performance bottlenecks. In addition, the performance tools are themselves limited in many cases. Finally, there is wide variability in the kind of performance counters provided by the different processor vendors, making performance tools unportable across emerging architectures. In this work, we propose to solve these problems by accurately computing a portable and easily comprehensible performance metric - the (Memory-Level Parallelism) MLP of an application. The observed MLP when seen as a fraction of peak theoretical MLP supported by the host processor provides important guidance on the applicability of various popular program optimizations. Six case studies on three different processors each with a different memory technology show that our metric is both effective in program analysis and provides useful guidance on program optimization.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125852147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
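Note on the entry above: Little's Law states that the average number of items in a system equals the arrival rate multiplied by the average time each item spends in the system. Applied to memory traffic, that gives an estimate of memory-level parallelism (MLP) as outstanding cache lines = (bandwidth / line size) × latency. The sketch below is a generic illustration of that relation, not the paper's measurement procedure; the 64-byte line size, the bandwidth and latency figures, and the peak MLP value are hypothetical.

```python
def observed_mlp(bandwidth_bytes_per_s: float,
                 latency_s: float,
                 line_bytes: int = 64) -> float:
    """Little's Law: concurrency = throughput x latency.

    Throughput is expressed in cache-line requests per second, so the
    result is the average number of memory requests in flight.
    """
    requests_per_s = bandwidth_bytes_per_s / line_bytes
    return requests_per_s * latency_s

def mlp_fraction(observed: float, peak: float) -> float:
    """Observed MLP as a fraction of the peak MLP the hardware supports."""
    return observed / peak

if __name__ == "__main__":
    # Hypothetical figures: 12.8 GB/s achieved bandwidth, 100 ns latency.
    mlp = observed_mlp(12.8e9, 100e-9)
    # Hypothetical peak of 40 outstanding lines (an assumed hardware limit).
    print(f"observed MLP ~ {mlp:.1f}, fraction of peak ~ {mlp_fraction(mlp, 40):.2f}")
```

With these assumed numbers about 20 lines are in flight on average, half of the assumed peak, which is the kind of ratio the abstract describes as guidance for choosing optimizations.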
The Indigo Program-Verification Microbenchmark Suite of Irregular Parallel Code Patterns
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ispass55109.2022.00003
Yiqiang Liu, Noushin Azami, Corbin Walters, Martin Burtscher
{"title":"The Indigo Program-Verification Microbenchmark Suite of Irregular Parallel Code Patterns","authors":"Yiqiang Liu, Noushin Azami, Corbin Walters, Martin Burtscher","doi":"10.1109/ispass55109.2022.00003","DOIUrl":"https://doi.org/10.1109/ispass55109.2022.00003","url":null,"abstract":"Irregular programs are found in many domains and tend to exhibit input-dependent control flow and memory accesses. This paper introduces the Indigo suite of important irregular parallel code patterns for testing verification and other tools. We studied many irregular CPU and GPU programs and extracted the key code patterns. Then, we methodically built variations of these patterns to alter the control-flow and memory-access behavior and/or introduce bugs, yielding the thousands of OpenMP and CUDA microbenchmarks in the suite. Indigo includes a set of generators to systematically create an unbounded number of inputs for each microbenchmark, which is essential to exercise the wide range of possible behaviors of input-dependent codes. To manage the millions of code and input combinations, Indigo provides the flexibility to generate user-defined subsets of the suite. Experiments with a subset of buggy and bug-free codes illustrate that irregular programs pose a significant challenge to both static and dynamic program verification tools. Moreover, such tools can perform quite differently across code patterns that contain the same bug.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"681 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122974833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
TILE-SIM: A Systematic Approach to Systolic Array-based Accelerator Evaluation
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date : 2022-05-01 DOI: 10.1109/ISPASS55109.2022.00016
Yuhang Li, M. Wen, Jiawei Fei, Junzhong Shen, Yasong Cao
{"title":"TILE-SIM: A Systematic Approach to Systolic Array-based Accelerator Evaluation","authors":"Yuhang Li, M. Wen, Jiawei Fei, Junzhong Shen, Yasong Cao","doi":"10.1109/ISPASS55109.2022.00016","DOIUrl":"https://doi.org/10.1109/ISPASS55109.2022.00016","url":null,"abstract":"The systolic array provides extremely high efficiency for running matrix multiplication, and is one of the mainstream architectures of today’s deep learning accelerators. In order to develop efficient accelerators, people usually employ simulators to make design trade-offs. However, current simulators suffer from coarse-grained modeling methods and ideal assumptions, which limits their ability of describing structural characteristics of systolic arrays. In addition, they do not support the exploration of microarchitecture. This paper presents TILE-SIM, a computing-centric systematic method for evaluating systolic array accelerators by using an event-driven method. TILE-SIM can obtain accurate results and provide the best mapping scheme for different workload due to its fine-grained modeling technique and deny of ideal assumption. Experimental results show that TILE-SIM plays a significant role in design trade-offs and outperforms state-of-the-art simulators, with an accuracy of more than 95%.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131229503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0