Latest Articles

## A Novel Rule Mapping on TCAM for Power Efficient Packet Classification

Packet Classification is the enabling function performed in commodity switches for providing various services such as access control, intrusion... (more)

## Improving Test and Diagnosis Efficiency through Ensemble Reduction and Learning

Machine learning is a powerful lever for developing, improving, and optimizing test methodologies to cope with the demand from the advanced nodes.... (more)

## Revealing Cluster Hierarchy in Gate-level ICs Using Block Diagrams and Cluster Estimates of Circuit Embeddings

Contemporary integrated circuits (ICs) are increasingly being constructed using intellectual... (more)

## Stress-Induced Performance Shifts in 3D DRAMs

3D-stacked DRAMs can significantly increase cell density and bandwidth while also lowering power consumption. However, 3D structures experience significant thermomechanical stress due to the differential rate of contraction of the constituent materials, which have different coefficients of thermal expansion. This impacts circuit performance. This... (more)

## Exploring the Role of Large Centralised Caches in Thermal Efficient Chip Design

In the era of short channel length, Dynamic Thermal Management (DTM) has become a challenging task for the architects and designers engineering modern... (more)

## Reducing DRAM Refresh Rate Using Retention Time Aware Universal Hashing Redundancy Repair

As the device capacity of Dynamic Random Access Memory (DRAM) increases, refresh operation becomes a significant contributory factor toward total... (more)

## Time-Multiplexed FPGA Overlay Architectures: A Survey

This article presents a comprehensive survey of time-multiplexed (TM) FPGA overlays from the research literature. These overlays are categorized based on their implementation into two groups: processor-based overlays, as their implementation follows that of conventional silicon-based microprocessors, and; CGRA-like overlays, with either an array of... (more)

## Energy Efficient Chip-to-Chip Wireless Interconnection for Heterogeneous Architectures

Heterogeneous multichip architectures have gained significant interest in high-performance computing clusters to cater to a wide range of... (more)

## Approximate Data Reuse-based Accelerator Design for Embedded Processor

Due to increasing diversity and complexity of applications in embedded systems, accelerator designs trading-off area/energy-efficiency and... (more)

## Modeling and Simulation of Dynamic Applications Using Scenario-Aware Dataflow

The tradeoff between analyzability and expressiveness is a key factor when choosing a suitable dataflow model of computation (MoC) for designing,... (more)

##### NEWS

ACM TODAES new page limit policy: Manuscripts must be formatted in the ACM Transactions format; a 35-page limit applies to the final paper. Rare exceptions are possible if recommended by the reviewers and approved by the Editorial Board.

ORCID is a community-based effort to create a global registry of unique researcher identifiers for the purpose of ensuring proper attribution of works to their creators. When you submit a manuscript for review, you will be presented with the opportunity to register for your ORCID.

Welcome ACM Associate Editors

##### Forthcoming Articles
Optimization of Threshold Logic Networks with Node Merging and Wire Replacement

In this paper, we present an optimization method for threshold logic networks (TLNs) based on observability don?t care-based node merging. To reduce gate count in a TLN, it iteratively merges two gates that are functionally equivalent or whose differences are never observed at the primary outputs. Furthermore, it is able to identify redundant wires and replace wires for removing more gates. Basically, the proposed method is primarily adapted from an ATPG-based node-merging approach which works for conventional Boolean logic networks. To extend the approach for TLNs, we develop a method for computing the mandatory assignments of a stuck-at fault test on a threshold gate and a method for conducting logic implication in a TLN. Additionally, to achieve a better optimization quality, we integrate the proposed method with other optimization methods. The experimental results show that the overall optimization method can save an average of approximately 4.7% threshold gates for a set of TLNs which are generated by using the latest TLN synthesis method. The experimental results also demonstrate the efficiency of the optimization method.

IP Protection and Supply Chain Security through Logic Obfuscation: A Systematic Overview

The globalization of the semiconductor supply chain introduces ever-increasing security and privacy risks. Two major concerns are IP theft through reverse engineering and malicious modification of the design. The latter concern in part relies on successful reverse engineering of the design as well. IC camouflaging and logic locking are two research techniques that can thwart reverse engineering by end-users or foundries. However, developing low overhead locking/camouflaging schemes that can resist the ever-evolving state-of-the-art attacks has been a research challenge for several years. This article provides a comprehensive review of the state-of-art with respect to locking/camouflaging techniques. We start by defining a systematic threat model for these techniques and discuss how various real-world scenarios relate to each threat model. We then discuss the evolution of generic algorithmic attacks under each threat model leading to the strongest existing attacks. The paper then systematizes defences, discussing attacks that are more specific to certain kinds of locking/camouflaging. In conclusion the paper discusses open problems and future directions.

Two-Sided Net Untangling With Internal Detours for Single-Layer Bus Routing

In this paper, firstly, the concept of using internal detours on untangled nets can be introduced into two-sided net untangling. Furthermore, given a set of 2-pin nets inside a single bus, based on the one-sided untangling results with internal detours on untangled nets, an efficient algorithm uses a minimal set of internal detours to guarantee that the crossing conditions of the given nets inside a single bus can be eliminated in one initial two-sided untangling result without considering the capacity constraint between two adjacent pins inside two pin-rows. Finally, based on one initial two-sided untangling result, an iterative rip-up-and-reassign algorithm is proposed to eliminate the possible capacity violations with satisfying the necessary capacity constraints in two-sided net untangling. Compared with Yan's algorithms in one-sided net untangling with internal detours for 12 tested examples with different capacity constraints, the experimental results show that our proposed two-sided untangling algorithm can use the benefit of more routing space behind two pin-rows to eliminates 89.5% of the used internal detours in increasing less CPU time for the 12 tested examples on the average. Compared with two-sided net untangling without internal detour for 12 tested examples with different capacity constraints, the experimental results show that our proposed two-sided untangling algorithm can successfully untangle all the twisted nets inside a single bus by using a minimal set of internal detours on the untangled nets for the 12 tested examples in reasonable time.

Automatic Stage-form Circuit Reduction for Multistage Opamp Design Equation Generation

An automatic stage-form circuit reduction method for multistage operational amplifiers (opamps) is proposed in this paper. It is purposed for automatic generation of design equations in multistage opamp design. It is a fully symbolic method without using any numerical numbers; thus it is a significant improvement over the prior methods that must rely on numerical reference numbers. Hence, the design equations so generated can serve the purpose of intrinsic circuit characterization independent of specific biasing and sizing conditions. Following the design equation generation procedure developed by this work, circuit-level frequency characteristics including analytical poles and zeros can be automatically generated and used as the constraint equations in sizing. Examples are provided to demonstrate that the proposed method can to applied to generate readable analytical reduced models and readable design equations for use in the design of multistage opamps.

Smart-Hop Arbitration Request Propagation: Avoiding Quadratic Arbitration Complexity and False Negatives in SMART NoCs

SMART-based NoC designs achieve ultra-low latencies by enabling flits to traverse multiple hops within a single clock cycle. Notwithstanding the clear performance benefits, SMART-based NoCs suffer from several shortcomings: each router must arbitrate among a quadratic number of requests, which leads to high costs; each router independently makes its own arbitration decisions, which leads to a problem called false negatives that causes throughput loss. In this paper, we propose a new SMART-based NoC design called SHARP that overcomes these shortcomings. Our evaluation demonstrates that SHARP increases throughput by up to 19% and average link utilization by up to 24% by avoiding false negatives. By avoiding quadratic arbitration, our evaluation further demonstrates that SHARP reduces the wiring and area overhead significantly.

Making Aging Useful by Recycling Aging-Induced Clock Skew

Device aging, which causes significant loss on circuit performance and lifetime, has been a primary factor in reliability degradation of nanoscale designs. In this paper, we propose to take advantage of aging-induced clock skews (i.e., make them useful for aging tolerance) by manipulating and recycling these time-varying skews to compensate for the performance degradation of logic networks. The goal is to assign achievable/reasonable aging-induced clock skews in a circuit, such that its effective performance degradation due to aging can be tolerated, that is, the lifespan can be maximized. On average, 25.04% aging tolerance can be achieved with insignificant design overhead. Moreover, we employ Vth assignment on clock buffers to further tolerate the aging-induced degradation of logic networks. When Vth assignment is applied on top of aforementioned aging manipulation, the average aging tolerance can be enhanced to 35.96%.

Hidden in Plaintext: An Obfuscation-based Countermeasure against FPGA Bitstream Tampering Attacks

Field Programmable Gate Arrays (FPGAs) have become an attractive choice for diverse applications due to their reconfigurability and unique security features. However, designs mapped to FPGAs are prone to malicious modifications or tampering of critical functions. Besides, targeted modifications have demonstrably compromised FPGA implementations of various cryptographic primitives. Existing security measures based on encryption and authentication can be bypassed using their side-channel vulnerabilities to execute bitstream tampering attacks. Furthermore, numerous resource-constrained applications are now equipped with low-end FPGAs which may not support power-hungry cryptographic solutions. In this paper, we propose a novel obfuscation-based approach to achieve strong resistance against both random and targeted pre-configuration tampering of critical functions in an FPGA design. Our solution first identifies the unique structural and functional features that separate the critical function from the rest of the design using a machine learning guided framework. The selected features are eliminated by applying appropriate obfuscation techniques, many of which take advantage of ?FPGA dark silicon?? unused lookup table resources, to mask the critical functions. Furthermore, following the same obfuscation principle, a redundancy-based technique is proposed to thwart targeted, rule-based, and random tampering. We have developed a complete methodology and custom software tool?ow that integrates with commercial tools. By applying the masking technique on a design containing AES, we show the e?ectiveness of the proposed framework in hiding the critical S-Box function. We implement the redundancy integrated solution in various cryptographic designs to analyze the overhead. In order to protect 16.2% critical component of a design, the proposed approach incurs an average area overhead of only 2.4% over similar redundancy-based approaches, while achieving strong security.

Security-Aware Routing and Scheduling for Control Applications on Ethernet TSN Networks

Today, it is common knowledge, in the cyber-physical systems domain, that the tight interaction between the cyber and physical elements provides the possibility of substantially improving the performance of these systems that is otherwise impossible. On the downside, however, this tight interaction with cyber elements makes it easier for an adversary to compromise the safety of the system. This becomes particularly important since such systems typically comprise several critical physical components, e.g., adaptive cruise control or engine control that allow deep intervention in the driving of a vehicle. As a result, it is important to ensure not only the reliability of such systems, e.g., in terms of schedulability and stability of control plants, but also resilience to adversarial attacks. In this article, we propose a security-aware methodology for routing and scheduling for control applications in Ethernet networks. The goal is to maximize the resilience of control applications within these networked control systems to malicious interference, while guaranteeing the stability of all control plants, despite the stringent resource constraints in such cyber-physical systems. Our experimental evaluations demonstrate that careful optimization of available resources can significantly improve the resilience of these networked control systems to attacks.

Runtime Stress Estimation for 3D IC Reliability Management Using Artificial Neural Network

Heat dissipation and the related thermal-mechanical stress problems are the major obstacles in the development of the three dimensional integrated circuit (3D IC). Reliability management techniques can be used to alleviate such problems and enhance the reliability of 3D IC. However, it is difficult to obtain the time varying stress information at runtime, which limits the effectiveness of the reliability management. In this article, we propose a fast stress estimation method for runtime reliability management using artificial neural network (ANN). The new method builds ANN based stress model by training offline using temperature and stress data. The ANN stress model is then used to estimate the important stress information, such as the maximum stress around each TSV, for reliability management at runtime. Since there are a variety of potential ANN structures to choose from for the ANN stress model, we analyze and test three ANN based stress models with three major types of ANNs in this work: the normal ANN based stress model, the ANN stress model with hand-crafted feature extraction, and the convolutional neural network (CNN) based stress model. The structures of each ANN stress model and the functions of these structures in 3D IC stress estimation are demonstrated and explained. The new runtime stress estimation method is tested using the three ANN stress models with different layer configurations. Experiments show that the new method is able to estimate important stress information at extremely fast speed with good accuracy for runtime 3D IC reliability enhancement. Although all three ANN stress models show acceptable capabilities in runtime stress estimation, the CNN based stress model achieves the best performance considering both stress estimation accuracy and computing overhead.

Architectural Design of Flow-based Microfluidic Biochips for Multi-Target Dilution of Biochemical Fluids

Microfluidic technologies enable replacement of time consuming and complex steps of biochemical laboratory protocols with a tiny chip. Sample preparation (i. e., dilution or mixing of fluids) is one of the primary tasks of any bioprotocol. In real-life applications where several assays need to be executed for different diagnostic purposes, the same sample fluid is often required with different target concentration factors (CFs). Although several multi-target dilution algorithms have been developed for digital microfluidic ({\em DMF}) biochips, they are not efficient for implementation with continuous-flow based microfluidic ({\em CMF}) chips, which are preferred in the laboratories. In this paper, we present a multi-target dilution algorithm ({\em MTDA}) for {\em CMF} biochips, which, to the best of our knowledge, is the first-of-its-kind. We design a flow-based rotary mixer with a suitable number of segments depending on the target-$CF$ profile, error-tolerance, and optimization criteria. In order to schedule several intermediate fluid-mixing tasks, we develop a multi-target scheduling algorithm ({\em MTSA}) aiming to minimize the usage of storage units, while producing dilutions with multiple $CF$s. Furthermore, we propose a storage architecture for efficiently loading (storing) of intermediate fluids from (to) the storage units.

Investigating the Impact of Image Content On the Energy Efficiency of Hardware Accelerated Digital Spatial Filters

Battery operated low-power portable computing devices are becoming an inseparable part of human daily life. One of the major goals is to achieve the longest battery life in such a device. Additionally, the need for performance in processing multimedia content is ever increasing. Processing image and video content consume more power than other applications. A common approach to improving energy efficiency is to implement the computationally intensive functions as digital hardware accelerators. Spatial filtering is one of the most commonly used methods of digital image processing. As per the Fourier theory, an image can be considered as a two-dimensional signal that is composed of spatially extended two-dimensional sinusoidal patterns called gratings. Spatial frequency theory states that sinusoidal gratings can be characterised by its spatial frequency, phase, amplitude and orientation. This paper presents results from our investigation into assessing the impact of these characteristics of a digital image on the energy efficiency of hardware accelerated spatial filters employed to process the same image. Two greyscale images each of size 128x128 pixels comprising of two-dimensional sinusoidal gratings at maximum spatial frequency of 64 cycles per image orientated at 0 and 90 degrees respectively, were processed in a hardware implemented Gaussian smoothing filter. The energy efficiency of the filter was compared with the baseline energy efficiency of processing a featureless plain black image. The results show that energy efficiency of the filter drops to 12.5% when the gratings are orientated at 0 degrees whilst rises to 72.38% at 90 degrees.

Real-Time Scheduling of DAG tasks with arbitrary deadlines