{"title":"GRAPHIC: Gather and Process Harmoniously in the Cache With High Parallelism and Flexibility","authors":"Yiming Chen;Mingyen Lee;Guohao Dai;Mufeng Zhou;Nagadastagiri Challapalle;Tianyi Wang;Yao Yu;Yongpan Liu;Yu Wang;Huazhong Yang;Vijaykrishnan Narayanan;Xueqing Li","doi":"10.1109/TETC.2023.3290683","DOIUrl":"10.1109/TETC.2023.3290683","url":null,"abstract":"In-memory computing (IMC) has been proposed to overcome the von Neumann bottleneck in data-intensive applications. However, existing IMC solutions cannot achieve both high parallelism and high flexibility, which limits their application in more general scenarios. A MAC crossbar is highly parallel, but its functionality is limited to matrix-vector multiplication; logic-in-memory (LiM), another IMC method, is more flexible in supporting different logic functions but has low parallelism. To improve LiM parallelism, we investigate how the single-instruction, multiple-data (SIMD) instruction set of conventional CPUs could help expand the number of LiM operands processed in one cycle. The biggest challenge is the inefficiency of handling non-continuous data in parallel, owing to the SIMD limitations of (i) continuous address, (ii) limited cache bandwidth, and (iii) large full-resolution parallel computing overheads. This article presents GRAPHIC, the first reported in-memory SIMD architecture that solves the parallelism and irregular data access challenges in applying SIMD to LiM. GRAPHIC exploits content-addressable memory (CAM) and row-wise-accessible SRAM. By providing in-situ, fully parallel, and low-overhead operations for address search and cache read-compute-and-update, GRAPHIC accomplishes high-efficiency gather and aggregation with high parallelism, high energy efficiency, low latency, and low area overheads. 
Experiments in both continuous data access and irregular data pattern applications show an average speedup of 5x over iso-area AVX-like LiM, and 3-5x over the emerging CAM-based accelerators of CAPE and GaaS-X in advanced techniques.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"84-96"},"PeriodicalIF":5.9,"publicationDate":"2023-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New Construction of Balanced Codes Based on Weights of Data for DNA Storage","authors":"Xiaozhou Lu;Sunghwan Kim","doi":"10.1109/TETC.2023.3293477","DOIUrl":"10.1109/TETC.2023.3293477","url":null,"abstract":"As maintaining a proper balanced GC content is crucial for minimizing errors in DNA storage, constructing GC-balanced DNA codes has become an important research topic. In this article, we propose a novel code construction method based on the weight distribution of the data, which enables us to construct GC-balanced DNA codes. Additionally, we introduce a specific encoding process for both balanced and imbalanced data parts. One of the key differences between the proposed codes and existing codes is that the parity lengths of the proposed codes are variable depending on the data parts, while the parity lengths of existing codes remain fixed. To evaluate the effectiveness of the proposed codes, we compare their average parity lengths to those of existing codes. Our results demonstrate that the proposed codes have significantly shorter average parity lengths for DNA sequences with appropriate GC contents.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"11 4","pages":"973-984"},"PeriodicalIF":5.9,"publicationDate":"2023-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CRAM-Based Acceleration for Intermittent Computing of Parallelizable Tasks","authors":"Khakim Akhunov;Kasım Sinan Yıldırım","doi":"10.1109/TETC.2023.3293426","DOIUrl":"10.1109/TETC.2023.3293426","url":null,"abstract":"There is an emerging requirement for performing data-intensive parallel computations, e.g., machine-learning inference, locally on batteryless sensors. These devices are resource-constrained and operate intermittently due to the irregular energy availability in the environment. Intermittent execution might lead to several side effects that might prevent the correct execution of computational tasks. Even though recent studies proposed methods to cope with these side effects and execute these tasks correctly, they overlooked the efficient intermittent execution of parallelizable data-intensive machine-learning tasks. In this article, we present PiMCo—a novel programmable CRAM-based in-memory coprocessor that exploits the Processing In-Memory (PIM) paradigm and facilitates the power-failure resilient execution of parallelizable computational loads. Contrary to existing PIM solutions for intermittent computing, PiMCo promotes better programmability to accelerate a variety of parallelizable tasks. 
Our performance evaluation demonstrates that PiMCo improves the performance of existing low-power accelerators for intermittent computing by up to 8× and energy efficiency by up to 150×.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"48-59"},"PeriodicalIF":5.9,"publicationDate":"2023-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3DL-PIM: A Look-Up Table Oriented Programmable Processing in Memory Architecture Based on the 3-D Stacked Memory for Data-Intensive Applications","authors":"Purab Ranjan Sutradhar;Sathwika Bavikadi;Sai Manoj Pudukotai Dinakarrao;Mark A. Indovina;Amlan Ganguly","doi":"10.1109/TETC.2023.3293140","DOIUrl":"10.1109/TETC.2023.3293140","url":null,"abstract":"Memory-centric computing systems have demonstrated superior performance and efficiency in memory-intensive applications compared to state-of-the-art CPUs and GPUs. 3-D stacked DRAM architectures unlock higher I/O data bandwidth than the traditional 2-D memory architecture and therefore are better suited for incorporating memory-centric processors. However, merely integrating high-precision ALUs in the 3-D stacked memory does not ensure an optimized design since such a design can only achieve a limited utilization of the internal bandwidth of a memory chip and limited operational parallelization. To address this, we propose 3DL-PIM, a 3-D stacked memory-based Processing in Memory (PIM) architecture that locates a plurality of Look-up Table (LUT)-based low-footprint Processing Elements (PE) within the memory banks in order to achieve high parallel computing performance by maximizing data-bandwidth utilization. Instead of relying on the traditional logic-based ALUs, the PEs are formed by clustering a group of programmable LUTs and therefore can be programmed on-the-fly to perform various logic/arithmetic operations. 
Our simulations show that 3DL-PIM can achieve up to 2.6× higher processing performance and up to 2.65× higher area efficiency compared to a state-of-the-art 3-D stacked memory-based accelerator.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"60-72"},"PeriodicalIF":5.9,"publicationDate":"2023-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Graph-Incorporated Latent Factor Analysis Model for High-Dimensional and Sparse Data","authors":"Di Wu;Yi He;Xin Luo","doi":"10.1109/TETC.2023.3292866","DOIUrl":"10.1109/TETC.2023.3292866","url":null,"abstract":"A high-dimensional and sparse (HiDS) matrix is frequently encountered in Big Data-related applications such as e-commerce systems or wireless sensor networks. It is of great significance to perform highly accurate representation learning on an HiDS matrix due to the strong demand for extracting latent knowledge from it. Latent factor analysis (LFA), which represents an HiDS matrix by learning low-rank embeddings based on its observed entries only, is one of the most effective and efficient approaches to this issue. However, most existing LFA-based models directly perform such embeddings on an HiDS matrix without exploiting its hidden graph structures, resulting in accuracy loss. To address this issue, this paper proposes a graph-incorporated latent factor analysis (GLFA) model. It adopts two-fold ideas: 1) a graph is constructed for identifying the hidden high-order interaction (HOI) among nodes described by an HiDS matrix, and 2) a recurrent LFA structure is carefully designed with the incorporation of HOI, thereby improving the representation learning ability of a resultant model. 
Experimental results on three real-world datasets demonstrate that GLFA outperforms six state-of-the-art models in predicting the missing data of an HiDS matrix, which evidently supports its strong representation learning ability to HiDS data.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"11 4","pages":"907-917"},"PeriodicalIF":5.9,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PISA: A Non-Volatile Processing-in-Sensor Accelerator for Imaging Systems","authors":"Shaahin Angizi;Sepehr Tabrizchi;David Z. Pan;Arman Roohi","doi":"10.1109/TETC.2023.3292251","DOIUrl":"10.1109/TETC.2023.3292251","url":null,"abstract":"This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkably reduces the power consumption of data conversion and transmission to an off-chip processor. The design is completed with a bit-wise near-sensor in-memory computing unit to process the remaining network layers. Once the object is detected, PISA switches to typical sensing mode to capture the image for a fine-grained convolution using only a near-sensor processing unit. Our circuit-to-application co-simulation results on a BWNN acceleration demonstrate minor accuracy degradation on various image datasets in coarse-grained evaluation compared to baseline BWNN models, while PISA achieves a frame rate of 1000 and efficiency of ~1.74 TOp/s/W. 
Lastly, PISA substantially reduces data conversion and transmission energy by ~84% compared to a baseline.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"11 4","pages":"962-972"},"PeriodicalIF":5.9,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FINISH: Efficient and Scalable NMF-Based Federated Learning for Detecting Malware Activities","authors":"Yu-Wei Chang;Hong-Yen Chen;Chansu Han;Tomohiro Morikawa;Takeshi Takahashi;Tsung-Nan Lin","doi":"10.1109/TETC.2023.3292924","DOIUrl":"10.1109/TETC.2023.3292924","url":null,"abstract":"5G networks with the vast number of devices pose security threats. Manual analysis of such extensive security data is complex. Dark-NMF can detect malware activities by monitoring unused IP address space, i.e., the darknet. However, the challenges of cooperative training for Dark-NMF are immense computational complexity with Big Data, communication overhead, and privacy concern with darknet sensor IP addresses. Darknet sensors can observe multivariate time series of packets from the same hosts, represented as intersecting columns in different data matrices. Previous works do not consider intersecting columns, losing a host's semantics because they do not aggregate the host's time series. To solve these problems, we proposed a federated IoT malware detection NMF for intersecting source hosts (FINISH) algorithm for offloading computing tasks to 5G multiaccess edge computing (MEC). The experiments show that FINISH is scalable to a data size with a shorter computational time and has a lower false positive detection performance than Dark-NMF. The comparison results demonstrate that FINISH has better computation and communication efficiency than related works and a short communication time, taking only 1/10 the execution time in a simulated 5G MEC. 
The experimental results can provide substantial insights into developing federated cybersecurity in the future.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"11 4","pages":"934-949"},"PeriodicalIF":5.9,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Maximizing Social Influence With Minimum Information Alteration","authors":"Guan Wang;Weihua Li;Quan Bai;Edmund M-K Lai","doi":"10.1109/TETC.2023.3292384","DOIUrl":"10.1109/TETC.2023.3292384","url":null,"abstract":"With the rapid advancement of the Internet and social platforms, how to maximize the influence across popular online social networks has attracted great attention from both researchers and practitioners. Almost all the existing influence diffusion models assume that influence remains constant in the process of information spreading. However, in the real world, people tend to alter information by attaching opinions or modifying the contents before spreading it. Namely, the meaning and idea of a message normally mutate in the process of influence diffusion. In this article, we investigate how to maximize the influence in online social platforms with a key consideration of suppressing the information alteration in the diffusion cascading process. We leverage deep learning models and knowledge graphs to present users’ personalised behaviours, i.e., actions after receiving a message. Furthermore, we investigate the information alteration in the process of influence diffusion. A novel seed selection algorithm is proposed to maximize the social influence without causing significant information alteration. 
Experimental results explicitly show the rationale of the proposed user behaviours deep learning model architecture and demonstrate the novel seeding algorithm's outstanding performance in both maximizing influence and retaining the influence originality.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 2","pages":"419-431"},"PeriodicalIF":5.9,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edgeless-GNN: Unsupervised Representation Learning for Edgeless Nodes","authors":"Yong-Min Shin;Cong Tran;Won-Yong Shin;Xin Cao","doi":"10.1109/TETC.2023.3292240","DOIUrl":"10.1109/TETC.2023.3292240","url":null,"abstract":"We study the problem of embedding edgeless nodes such as users who newly enter the underlying network, while using graph neural networks (GNNs) widely studied for effective representation learning of graphs. Our study is motivated by the fact that GNNs cannot be straightforwardly adopted for our problem since message passing to such edgeless nodes having no connections is impossible. To tackle this challenge, we propose Edgeless-GNN, a novel inductive framework that enables GNNs to generate node embeddings even for edgeless nodes through unsupervised learning. Specifically, we start by constructing a proxy graph based on the similarity of node attributes as the GNN's computation graph defined by the underlying network. The known network structure is used to train model parameters, whereas a topology-aware loss function is established such that our model judiciously learns the network structure by encoding positive, negative, and second-order relations between nodes. For the edgeless nodes, we inductively infer embeddings by expanding the computation graph. 
By evaluating the performance of various downstream machine learning tasks, we empirically demonstrate that Edgeless-GNN exhibits (a) superiority over state-of-the-art inductive network embedding methods for edgeless nodes, (b) effectiveness of our topology-aware loss function, (c) robustness to incomplete node attributes, and (d) a linear scaling with the graph size.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"12 1","pages":"150-162"},"PeriodicalIF":5.9,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44628489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resource Allocation Optimization by Quantum Computing for Shared Use of Standalone IRS","authors":"Takahiro Ohyama;Yuichi Kawamoto;Nei Kato","doi":"10.1109/TETC.2023.3292355","DOIUrl":"10.1109/TETC.2023.3292355","url":null,"abstract":"Intelligent reflecting surfaces (IRSs) have attracted attention as a technology that can considerably improve the energy utilization efficiency of sixth-generation (6G) mobile communication systems. IRSs enable control of propagation characteristics by adjusting the phase shift of each reflective element. However, designing the phase shift requires the acquisition of channel information for each reflective element, which is impractical from an overhead perspective. In addition, for multiple wireless network operators to share an IRS for communication, new infrastructure facilities and operational costs are required at each operator's end to control the IRS in a coordinated manner. Herein, we propose a wireless communication system using standalone IRSs to solve these problems. The standalone IRSs cover a wide area by periodically switching phase shifts, and each operator allocates radio resources according to their phase-shift switching. Furthermore, we derive a quadratic unconstrained binary optimization equation for the proposed system to optimize radio resource allocation using quantum computing. 
The results of computer simulations indicate that the proposed system and method can be used to achieve efficient communication in 6G mobile communication systems.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"11 4","pages":"950-961"},"PeriodicalIF":5.9,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62528843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}