Latest Articles

## A Novel Rule Mapping on TCAM for Power Efficient Packet Classification

Packet Classification is the enabling function performed in commodity switches for providing various services such as access control, intrusion... (more)

## Improving Test and Diagnosis Efficiency through Ensemble Reduction and Learning

Machine learning is a powerful lever for developing, improving, and optimizing test methodologies to cope with the demand from the advanced nodes.... (more)

## Revealing Cluster Hierarchy in Gate-level ICs Using Block Diagrams and Cluster Estimates of Circuit Embeddings

Contemporary integrated circuits (ICs) are increasingly being constructed using intellectual... (more)

## Stress-Induced Performance Shifts in 3D DRAMs

3D-stacked DRAMs can significantly increase cell density and bandwidth while also lowering power consumption. However, 3D structures experience significant thermomechanical stress due to the differential rate of contraction of the constituent materials, which have different coefficients of thermal expansion. This impacts circuit performance. This... (more)

## Exploring the Role of Large Centralised Caches in Thermal Efficient Chip Design

In the era of short channel length, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers engineering modern... (more)

## Reducing DRAM Refresh Rate Using Retention Time Aware Universal Hashing Redundancy Repair

As the device capacity of Dynamic Random Access Memory (DRAM) increases, refresh operation becomes a significant contributory factor toward total... (more)

## Time-Multiplexed FPGA Overlay Architectures: A Survey

This article presents a comprehensive survey of time-multiplexed (TM) FPGA overlays from the research literature. These overlays are categorized based on their implementation into two groups: processor-based overlays, whose implementation follows that of conventional silicon-based microprocessors, and CGRA-like overlays, with either an array of... (more)

## Energy Efficient Chip-to-Chip Wireless Interconnection for Heterogeneous Architectures

Heterogeneous multichip architectures have gained significant interest in high-performance computing clusters to cater to a wide range of... (more)

##### NEWS

Call for Special Issue on Machine Learning for CAD: This Special Issue focuses on machine learning methods for all aspects of CAD for VLSI and electronic system design. Deadline for submission is June 15th 2019.

ACM TODAES new page limit policy: Manuscripts must be formatted in the ACM Transactions format; a 35-page limit applies to the final paper. Rare exceptions are possible if recommended by the reviewers and approved by the Editorial Board.

ORCID is a community-based effort to create a global registry of unique researcher identifiers for the purpose of ensuring proper attribution of works to their creators. When you submit a manuscript for review, you will be presented with the opportunity to register for your ORCID.

Welcome ACM Associate Editors

##### Forthcoming Articles
Approximate Data Reuse-Based Accelerator Design for Embedded Processor

Due to the increasing diversity and complexity of applications in embedded systems, accelerator designs that trade off area/energy efficiency against design productivity are becoming an increasingly crucial issue. Targeting applications in the category of Recognition, Mining, and Synthesis (RMS), this study proposes a novel accelerator design that achieves a good trade-off between efficiency and design productivity (or reusability) by introducing a new computing paradigm called "approximate computing (AC)." Leveraging the facts that frequently executed parts of applications (i.e., hotspots) are conventionally the target of acceleration and that RMS applications are error-tolerant and often take similar input data repeatedly, our proposed accelerator reuses previous computational results of similar enough data to reduce computations. The proposed accelerator is composed of a simple controller and a dedicated memory that stores limited sets of previous input data with the corresponding computational results in a hotspot. Therefore, this accelerator can be applied to different and/or multiple hotspots/applications with only a small extension of the controller, achieving an efficient accelerator design and resolving the design-productivity issue. We conducted quantitative evaluations using a representative RMS application (image compression) to demonstrate the effectiveness of our method over conventional ones with precise computing. Moreover, we provide important findings on parameter exploration for our accelerator design, offering wider applicability of our accelerator to other applications.
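The reuse idea in this abstract can be illustrated with a minimal software sketch. It is not the authors' accelerator: the class name `ApproxReuseTable`, the distance metric (largest element-wise absolute difference), the threshold value, and the FIFO replacement policy are all assumptions made for illustration.

```python
from collections import deque


class ApproxReuseTable:
    """Hypothetical model of a small memory of (input, result) pairs."""

    def __init__(self, kernel, capacity=4, threshold=2.0):
        self.kernel = kernel            # exact hotspot computation
        self.threshold = threshold      # "similar enough" bound (assumption)
        self.entries = deque(maxlen=capacity)  # FIFO replacement (assumption)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _distance(a, b):
        # Largest element-wise absolute difference between two inputs.
        return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

    def compute(self, data):
        # Reuse a stored result when a previous input is close enough.
        for stored_input, stored_result in self.entries:
            if (len(stored_input) == len(data)
                    and self._distance(stored_input, data) <= self.threshold):
                self.hits += 1
                return stored_result
        # Otherwise run the exact kernel and remember the result.
        self.misses += 1
        result = self.kernel(data)
        self.entries.append((tuple(data), result))
        return result


def block_sum(block):
    # Stand-in for an expensive hotspot; a real one would be far costlier.
    return sum(block)


table = ApproxReuseTable(block_sum, capacity=4, threshold=2.0)
first = table.compute([10, 20, 30])    # miss: computed exactly
second = table.compute([11, 19, 31])   # hit: reuses the stored result
third = table.compute([50, 60, 70])    # miss: too far from stored inputs
```

The second call returns the result stored for the first input rather than the exact sum, which is the accuracy-for-computation trade the abstract describes. The threshold and table capacity correspond loosely to the kind of parameters the abstract says were explored.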

IP Protection and Supply Chain Security through Logic Obfuscation: A Systematic Overview

The globalization of the semiconductor supply chain introduces ever-increasing security and privacy risks. Two major concerns are IP theft through reverse engineering and malicious modification of the design. The latter concern in part relies on successful reverse engineering of the design as well. IC camouflaging and logic locking are two research techniques that can thwart reverse engineering by end-users or foundries. However, developing low-overhead locking/camouflaging schemes that can resist the ever-evolving state-of-the-art attacks has been a research challenge for several years. This article provides a comprehensive review of the state of the art in locking/camouflaging techniques. We start by defining a systematic threat model for these techniques and discuss how various real-world scenarios relate to each threat model. We then discuss the evolution of generic algorithmic attacks under each threat model, leading to the strongest existing attacks. The paper then systematizes defences, discussing attacks that are more specific to certain kinds of locking/camouflaging. In conclusion, the paper discusses open problems and future directions.

Impact of Electrostatic Coupling on Monolithic 3D-enabled Network on Chip

Monolithic 3D integration (M3D) improves the performance and energy efficiency of 3D ICs over conventional TSV-based counterparts. The smaller dimensions of monolithic inter-tier vias (MIVs) offer high-density integration, the flexibility to partition logic blocks across multiple tiers, and significantly reduced total wire length, which together enable high performance and energy efficiency. However, the performance of M3D ICs degrades due to electrostatic coupling when the inter-layer dielectric (ILD) thickness between two adjacent tiers is less than 50nm. In this work, we evaluate the performance of an M3D-enabled Network-on-Chip (NoC) architecture in the presence of electrostatic coupling. Electrostatic coupling induces significant delay and energy overheads for the multi-tier NoC routers. This in turn results in considerable performance degradation if the NoC design methodology does not incorporate the effects of electrostatic coupling. We demonstrate that electrostatic coupling degrades the energy-delay product (EDP) of an M3D NoC by 18.1%, averaged over eight different applications from the SPLASH-2 and PARSEC benchmark suites. As a countermeasure, we advocate the adoption of an electrostatic coupling-aware M3D NoC design methodology. Experimental results show that the coupling-aware M3D NoC reduces the performance penalty by significantly lowering the number of multi-tier routers.

Smart-Hop Arbitration Request Propagation: Avoiding Quadratic Arbitration Complexity and False Negatives in SMART NoCs

SMART-based NoC designs achieve ultra-low latencies by enabling flits to traverse multiple hops within a single clock cycle. Notwithstanding the clear performance benefits, SMART-based NoCs suffer from several shortcomings: each router must arbitrate among a quadratic number of requests, which leads to high costs; each router independently makes its own arbitration decisions, which leads to a problem called false negatives that causes throughput loss. In this paper, we propose a new SMART-based NoC design called SHARP that overcomes these shortcomings. Our evaluation demonstrates that SHARP increases throughput by up to 19% and average link utilization by up to 24% by avoiding false negatives. By avoiding quadratic arbitration, our evaluation further demonstrates that SHARP reduces the wiring and area overhead significantly.

Cut Optimization for Redundant Via Insertion in Self-Aligned Double Patterning

A redundant via (RV) helps prevent via defects and leads to a yield increase. RV insertion in self-aligned double patterning (SADP) is more challenging because cut optimization has to be considered together with it. In the SADP process, the 1-D metal line is divided into signal wires and dummy wires by line-end cuts; if an RV is inserted, signal wires are extended to connect to the RV, and an additional cut, called an RV cut, is introduced to make space for the extension. Since RV cuts and line-end cuts reside in the same mask, design rules between those cuts have to be honored, which leads to a cut optimization problem. In this paper, we address the problem of integrated RV insertion and cut optimization. We show that the problem can be formulated as an integer linear program (ILP). A heuristic algorithm is presented for practical application: the locations where an RV can be inserted, called RV candidates, are identified, and candidates that cause electrical shorts or conflicts with nearby line-end cuts are dropped; RV insertion is then performed; finally, a line-end cut is assigned to each gap, considering the positions of RV cuts, with a graph partitioning technique applied to reduce the runtime. Experimental results demonstrate that 75% of vias receive RVs with an 8% increase in total wire length, which is only slightly worse than the optimal result obtained by ILP.

JAMS-SG: A Framework for Jitter-Aware Message Scheduling for Time-Triggered Automotive Networks

Time-triggered automotive networks use time-triggered protocols (FlexRay, TT Ethernet, etc.) for periodic message transmissions that often originate from safety- and time-critical applications. One of the major challenges with time-triggered transmissions is jitter, which is the unpredictable delay-induced deviation from the actual periodicity of a message. Failure to account for jitter can be catastrophic in time-sensitive systems, such as automotive platforms. In this article, we propose a novel scheduling framework (JAMS-SG) that satisfies timing constraints during message delivery for both jitter-affected time-triggered messages and high-priority event-triggered messages in automotive networks. At design time, JAMS-SG performs jitter-aware frame packing (packing of multiple signals from Electronic Control Units (ECUs) into messages) and schedule synthesis with a hybrid heuristic. At runtime, a Multi-Level Feedback Queue (MLFQ) handles jitter-affected time-triggered messages and high-priority event-triggered messages, which are scheduled using a runtime scheduler. Our simulation results, based on messages and network traffic data from a real vehicle, indicate that JAMS-SG is highly scalable and outperforms the best-known prior work in the area in the presence of jitter.

Efficient Cache Reconfiguration using Machine Learning in NoC-based Many-Core CMPs

Dynamic cache reconfiguration (DCR) is an effective technique to optimize energy consumption in many-core architectures. While early work on DCR has shown promising energy saving opportunities, prior techniques are not suitable for many-core architectures since they do not consider the interactions and tight coupling between memory, caches and network-on-chip (NoC) traffic. In this paper, we propose an efficient cache reconfiguration framework in NoC-based many-core architectures. The proposed work makes three major contributions. First, we model a distributed directory based many-core architecture similar to Intel Xeon Phi architecture. Next, we propose an efficient cache reconfiguration framework that considers all significant components, including NoC, caches and main memory. Finally, we propose a machine learning based framework that can reduce the exploration time by an order of magnitude with negligible loss in accuracy. Our experimental results demonstrate 18.5% energy savings on average compared to base cache configuration.

Modeling and Simulation of Dynamic Applications using Scenario-Aware Dataflow

The trade-off between analyzability and expressiveness is a key factor when choosing a suitable dataflow model of computation (MoC) for designing, modeling and simulating applications considering a formal base. A large number of techniques and analysis tools exist for static dataflow models, such as synchronous dataflow. However, they cannot express dynamic behavior required for more dynamic applications in signal streaming or to model runtime reconfigurable systems. On the other hand, dynamic dataflow models like Kahn process networks sacrifice analyzability for expressiveness. Scenario-aware dataflow (SADF) is an excellent trade-off providing sufficient expressiveness for dynamic systems, while still giving access to powerful analysis methods. In spite of an increasing interest in SADF methods, there is a lack of formally-defined functional models for describing and simulating SADF systems. This paper overcomes the current situation by introducing a functional model for the SADF MoC, as well as a set of abstract operations for simulating it. We present the first modeling and simulation environment for SADF so far and demonstrate its capabilities through a comprehensive tutorial-style example of a RISC processor described as an SADF application, as well as the modeling of an MPEG-4 simple profile decoder. Finally, we discuss the potential of our formal model as a frontend for formal system design flows regarding dynamic applications.

Architectural Design of Flow-based Microfluidic Biochips for Multi-Target Dilution of Biochemical Fluids

Microfluidic technologies enable replacement of time-consuming and complex steps of biochemical laboratory protocols with a tiny chip. Sample preparation (i.e., dilution or mixing of fluids) is one of the primary tasks of any bioprotocol. In real-life applications where several assays need to be executed for different diagnostic purposes, the same sample fluid is often required with different target concentration factors (CFs). Although several multi-target dilution algorithms have been developed for digital microfluidic (DMF) biochips, they are not efficient for implementation with continuous-flow based microfluidic (CMF) chips, which are preferred in the laboratories. In this paper, we present a multi-target dilution algorithm (MTDA) for CMF biochips, which, to the best of our knowledge, is the first of its kind. We design a flow-based rotary mixer with a suitable number of segments depending on the target-CF profile, error-tolerance, and optimization criteria. In order to schedule several intermediate fluid-mixing tasks, we develop a multi-target scheduling algorithm (MTSA) aiming to minimize the usage of storage units, while producing dilutions with multiple CFs. Furthermore, we propose a storage architecture for efficiently loading (storing) intermediate fluids from (to) the storage units.
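As background for the dilution task this abstract describes, the sketch below shows only the basic arithmetic of mixing equal-volume segments: the resulting concentration factor is the average of the segment CFs. It does not reproduce the MTDA or MTSA algorithms; the function name `mix`, the four-segment mixer, and the chosen targets are assumptions made for illustration.

```python
from fractions import Fraction


def mix(segment_cfs):
    """CF obtained by mixing equal-volume segments with the given CFs."""
    if not segment_cfs:
        raise ValueError("a mixer needs at least one filled segment")
    # Equal volumes, so the mixture's CF is the plain average.
    return sum(segment_cfs, Fraction(0)) / len(segment_cfs)


SAMPLE = Fraction(1)   # undiluted sample fluid
BUFFER = Fraction(0)   # neutral buffer fluid

# Hypothetical four-segment mixer: one part sample, three parts buffer.
quarter = mix([SAMPLE, BUFFER, BUFFER, BUFFER])

# Reusing the intermediate fluid: three segments at CF 1/4 plus one of buffer.
three_sixteenths = mix([quarter, quarter, quarter, BUFFER])
```

Because the second target is built from the first, the intermediate fluid has to be held somewhere between mixing steps, which is the kind of storage demand the abstract's scheduling and storage architecture address.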

Investigating the Impact of Image Content On the Energy Efficiency of Hardware Accelerated Digital Spatial Filters

Battery-operated low-power portable computing devices are becoming an inseparable part of human daily life. One of the major goals is to achieve the longest battery life in such a device. Additionally, the need for performance in processing multimedia content is ever increasing. Processing image and video content consumes more power than other applications. A common approach to improving energy efficiency is to implement the computationally intensive functions as digital hardware accelerators. Spatial filtering is one of the most commonly used methods of digital image processing. As per Fourier theory, an image can be considered as a two-dimensional signal that is composed of spatially extended two-dimensional sinusoidal patterns called gratings. Spatial frequency theory states that sinusoidal gratings can be characterised by their spatial frequency, phase, amplitude and orientation. This paper presents results from our investigation into the impact of these characteristics of a digital image on the energy efficiency of hardware accelerated spatial filters employed to process the same image. Two greyscale images, each of size 128x128 pixels and comprising two-dimensional sinusoidal gratings at the maximum spatial frequency of 64 cycles per image, orientated at 0 and 90 degrees respectively, were processed in a hardware implemented Gaussian smoothing filter. The energy efficiency of the filter was compared with the baseline energy efficiency of processing a featureless plain black image. The results show that the energy efficiency of the filter drops to 12.5% when the gratings are orientated at 0 degrees, whereas it rises to 72.38% at 90 degrees.
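The test images described in this abstract can be approximated with a short sketch. It only constructs the gratings and applies a software smoothing step; it does not model hardware energy. The orientation convention (0 degrees meaning intensity varies along x), the 8-bit scaling, and the 3x3 kernel are assumptions, since the abstract does not specify them.

```python
import math

SIZE = 128     # image width and height in pixels (from the abstract)
CYCLES = 64    # cycles per image (from the abstract)

# 3x3 Gaussian approximation; weights sum to 16 (assumed kernel).
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]


def grating(size, cycles, orientation_deg):
    """Greyscale sinusoidal grating with 8-bit pixel values."""
    theta = math.radians(orientation_deg)
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            # Distance along the direction in which intensity varies.
            position = x * math.cos(theta) + y * math.sin(theta)
            value = 0.5 + 0.5 * math.cos(2 * math.pi * cycles * position / size)
            row.append(round(255 * value))
        image.append(row)
    return image


def smooth_pixel(image, x, y):
    """Gaussian-weighted average around an interior pixel (x, y)."""
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += KERNEL[dy + 1][dx + 1] * image[y + dy][x + dx]
    return total / 16


grating_0 = grating(SIZE, CYCLES, 0)     # intensity alternates along x
grating_90 = grating(SIZE, CYCLES, 90)   # intensity alternates along y
smoothed_value = smooth_pixel(grating_0, 10, 10)
```

At 64 cycles across 128 pixels, one cycle spans two pixels, so the sampled grating alternates between full white and full black from one pixel to the next. Under the assumed kernel, smoothing such a pattern yields a uniform mid-grey in the image interior.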
