Latest Articles

## Energy-Efficient and Quality-Assured Approximate Computing Framework Using a Co-Training Method

Approximate computing is a promising design paradigm that introduces a new...

## Efficient Cache Reconfiguration Using Machine Learning in NoC-Based Many-Core CMPs

Dynamic cache reconfiguration (DCR) is an effective technique to optimize energy consumption in many-core architectures. While early work on DCR has...

## Cut Optimization for Redundant Via Insertion in Self-Aligned Double Patterning

Redundant via (RV) insertion helps prevent via defects and hence leads to yield enhancement. However, RV insertion in self-aligned double patterning...

## Impact of Electrostatic Coupling on Monolithic 3D-enabled Network on Chip

Monolithic-3D-integration (M3D) improves the performance and energy efficiency of 3D ICs over conventional through-silicon-vias-based counterparts....

## JAMS-SG: A Framework for Jitter-Aware Message Scheduling for Time-Triggered Automotive Networks

Time-triggered automotive networks use time-triggered protocols (FlexRay, TTEthernet, etc.) for periodic message transmissions that often originate from safety and time-critical applications. One of the major challenges with time-triggered transmissions is jitter, which is the unpredictable delay-induced deviation from the actual periodicity of a...

## Smart-Hop Arbitration Request Propagation: Avoiding Quadratic Arbitration Complexity and False Negatives in SMART NoCs

SMART-based NoC designs achieve ultra-low latencies by enabling flits to traverse multiple hops within a single clock cycle. Notwithstanding the clear performance benefits, SMART-based NoCs suffer from several shortcomings: each router must arbitrate among a quadratic number of requests, which leads to high costs; each router independently makes...

## IP Protection and Supply Chain Security through Logic Obfuscation: A Systematic Overview

The globalization of the semiconductor supply chain introduces ever-increasing security and privacy risks. Two major concerns are IP theft through...

Real-time and embedded systems are shifting from single-core to multi-core processors, on which the software must be parallelized to fully utilize the...

## Optimization of Threshold Logic Networks with Node Merging and Wire Replacement

In this article, we present an optimization method for threshold logic networks (TLNs) based on observability don’t-care-based node merging....

## Two-sided Net Untangling with Internal Detours for Single-layer Bus Routing

It is known that one-sided net untangling can be used to untangle the twisted nets inside a bus for single-layer bus routing. However, limited space...

## Runtime Stress Estimation for Three-dimensional IC Reliability Management Using Artificial Neural Network

Heat dissipation and the related thermal-mechanical stress problems are the major obstacles in the...

##### NEWS

ACM TODAES new page limit policy: Manuscripts must be formatted in the ACM Transactions format; a 35-page limit applies to the final paper. Rare exceptions are possible if recommended by the reviewers and approved by the Editorial Board.

ORCID is a community-based effort to create a global registry of unique researcher identifiers for the purpose of ensuring proper attribution of works to their creators. When you submit a manuscript for review, you will be presented with the opportunity to register for your ORCID.

Welcome ACM Associate Editors

##### Forthcoming Articles
Analog/RF Post-silicon Tuning via Bayesian Optimization

Tunable analog/RF circuits have emerged as a promising technique to address the significant performance uncertainties caused by process variations. To optimize these tunable circuits after fabrication, most existing post-silicon programming methods are developed by using real-valued performance metrics. However, when measuring a performance of interest on silicon, it is often substantially more expensive to obtain a real-valued measurement than a binary testing outcome (i.e., pass or fail). In this paper, we propose a Gaussian Process Classification model to capture the binary performance metrics of tunable analog/RF circuits. Based on this model, post-silicon programming is cast into an optimization problem that can be solved by a novel Bayesian optimization algorithm. Moreover, measurement noise is further incorporated into our proposed post-silicon programming in order to produce a robust circuit. Two circuit examples demonstrate that the proposed approach can efficiently program tunable circuits with binary performance metrics, whereas conventional methods are not applicable.

Harnessing the Granularity of Micro-Electrode-Dot-Array Architectures for Optimizing Droplet Routing in Biochips

In this paper, we consider the problem of droplet routing for Microelectrode-Dot-Array (MEDA) biochips. MEDA biochips today provide a host of useful features for droplet movement by making it possible to manoeuvre droplets at a much finer granularity and with significantly increased flexibility. More precisely, MEDA biochips support more degrees of freedom in navigation and volumetric manipulation, such as diagonal movement, droplet reshaping, and fractional-level split-and-merge. This helps improve the routing of droplets on microfluidic grids, in particular when the space available on the grid is limited or blocked by obstacles. In this work, we discuss how these improved capabilities can be utilized in the realization of the desired routes on those biochips. To this end, we introduce a routing method that utilizes satisfiability solvers and guarantees the generation of optimal solutions. This significantly improves the state of the art, since previously proposed solutions either (1) relied on heuristics and, hence, were not able to guarantee the optimum or (2) only considered a subset of the MEDA features. The solution proposed in this work includes a formulation of all MEDA features which, as illustrated by examples, allows for the determination of routing solutions with smaller completion times. Experimental evaluations confirm these findings.

Search-Space Decomposition for System-Level Design Space Exploration of Embedded Systems

The development of large-scale many-core platforms and the rising complexity of embedded applications have led to a significant increase in the number of implementation possibilities for a single application. Furthermore, rising demands on safe, energy-efficient, or real-time capable application execution make the problem of determining feasible implementations that are optimal with respect to such design objectives even more of a challenge. State-of-the-art Design Space Exploration (DSE) techniques demonstrably suffer from the vast and sparse search spaces posed by modern embedded systems, emphasizing the need for novel design methodologies in this field. Based on the idea of reducing problem complexity by a suitable decomposition of the system specification, the work at hand proposes a portfolio of dynamic decomposition mechanisms that automatically decompose any system specification based on a short pre-exploration of the complete system. We present a two-phase approach consisting of a set of novel data extraction and representation techniques combined with a selection of filtering operations that extract a decomposed system specification based on information gathered during pre-exploration. The proposed decomposition procedure can seamlessly be integrated in any DSE flow, constituting a flexible extension for existing DSEs. We illustrate the efficiency of the proposed decomposition portfolio applied to state-of-the-art DSE for many-core systems as well as networked embedded systems from the automotive domain. Experimental results show significant increases in optimization quality of up to 87% within constant DSE time compared to existing approaches.

LBNoC-Design of low-latency router architecture with Lookahead Bypass for Network-on-chip using FPGA

An FPGA-based NoC using a low-latency router with a look-ahead bypass (LBNoC) has been designed. The proposed design targets optimized area with improved network performance. Techniques such as single-cycle router bypass, parallel virtual channel and switch allocation, and combined virtual cut-through and wormhole switching have been employed in the design of the LBNoC router. The LBNoC router is parameterizable, with the network topology, traffic patterns, routing algorithms, buffer depth, buffer width, number of VCs, and I/O ports being configurable. A table-based routing algorithm has been employed to support the design of custom topologies. The input buffer modules of the NoC router have been mapped onto the FPGA BRAM hard blocks to utilize resources efficiently. The LBNoC architecture consumes 4.5% and 27.1% fewer hardware resources than the ProNoC and CONNECT NoC architectures, respectively. The average packet latency of the LBNoC architecture is 30% and 15% lower than that of the CONNECT and ProNoC architectures, respectively. The LBNoC architecture is 1.15× and 1.18× faster than the ProNoC and CONNECT NoC frameworks, respectively.

Hardware Trojan Mitigation in Pipelined MPSoCs

Multiprocessor Systems-on-Chip (MPSoCs) have become necessary due to the billions of transistors available to the designer, the need for fast design turnaround times, and the power wall. Thus, present embedded systems are designed with MPSoCs, and one possible way MPSoCs can be realized is through Pipelined MPSoC (PMPSoC) architectures, which are used in applications from video surveillance to cryptosystems. Hardware Trojans (HTs) on PMPSoCs are a significant concern due to the damage caused by their stealth. An adversary could use HTs to extract secret information (data leakage), to modify functionality/data (functional modification), or to make PMPSoCs deny service. In this paper, we present PMPGuard, a mechanism that (1) detects the presence of hardware Trojans in Third Party Intellectual Property (3PIP) cores of PMPSoCs by continuous monitoring and testing, and (2) recovers the system by switching the infected processor core with another one. We designed, implemented, and tested the system on a commercial cycle-accurate multiprocessor simulation environment. Compared to state-of-the-art system-level techniques that use Triple Modular Redundancy (TMR) and therefore incur at least 3× area and power overheads, our proposed system incurs about 2× area and 1.5× power overheads, without any adverse impact on throughput.

Making Aging Useful by Recycling Aging-Induced Clock Skew

Device aging, which causes significant loss in circuit performance and lifetime, has been a primary factor in reliability degradation of nanoscale designs. In this paper, we propose to take advantage of aging-induced clock skews (i.e., make them useful for aging tolerance) by manipulating and recycling these time-varying skews to compensate for the performance degradation of logic networks. The goal is to assign achievable/reasonable aging-induced clock skews in a circuit, such that its effective performance degradation due to aging can be tolerated, that is, the lifespan can be maximized. On average, 25.04% aging tolerance can be achieved with insignificant design overhead. Moreover, we employ Vth assignment on clock buffers to further tolerate the aging-induced degradation of logic networks. When Vth assignment is applied on top of the aforementioned aging manipulation, the average aging tolerance can be enhanced to 35.96%.

Hidden in Plaintext: An Obfuscation-based Countermeasure against FPGA Bitstream Tampering Attacks

Field Programmable Gate Arrays (FPGAs) have become an attractive choice for diverse applications due to their reconfigurability and unique security features. However, designs mapped to FPGAs are prone to malicious modifications or tampering of critical functions. Besides, targeted modifications have demonstrably compromised FPGA implementations of various cryptographic primitives. Existing security measures based on encryption and authentication can be bypassed using their side-channel vulnerabilities to execute bitstream tampering attacks. Furthermore, numerous resource-constrained applications are now equipped with low-end FPGAs, which may not support power-hungry cryptographic solutions. In this paper, we propose a novel obfuscation-based approach to achieve strong resistance against both random and targeted pre-configuration tampering of critical functions in an FPGA design. Our solution first identifies the unique structural and functional features that separate the critical function from the rest of the design using a machine learning guided framework. The selected features are eliminated by applying appropriate obfuscation techniques, many of which take advantage of "FPGA dark silicon" - unused lookup table resources, to mask the critical functions. Furthermore, following the same obfuscation principle, a redundancy-based technique is proposed to thwart targeted, rule-based, and random tampering. We have developed a complete methodology and custom software toolflow that integrates with commercial tools. By applying the masking technique on a design containing AES, we show the effectiveness of the proposed framework in hiding the critical S-Box function. We implement the redundancy integrated solution in various cryptographic designs to analyze the overhead. In order to protect 16.2% critical component of a design, the proposed approach incurs an average area overhead of only 2.4% over similar redundancy-based approaches, while achieving strong security.

Security-Aware Routing and Scheduling for Control Applications on Ethernet TSN Networks

In the cyber-physical systems domain, it is well known that the tight interaction between the cyber and physical elements makes possible substantial performance improvements that would otherwise be unattainable. On the downside, however, this tight interaction with cyber elements makes it easier for an adversary to compromise the safety of the system. This becomes particularly important since such systems typically comprise several critical physical components, e.g., adaptive cruise control or engine control, that allow deep intervention in the driving of a vehicle. As a result, it is important to ensure not only the reliability of such systems, e.g., in terms of schedulability and stability of control plants, but also resilience to adversarial attacks. In this article, we propose a security-aware methodology for routing and scheduling for control applications in Ethernet networks. The goal is to maximize the resilience of control applications within these networked control systems to malicious interference, while guaranteeing the stability of all control plants, despite the stringent resource constraints in such cyber-physical systems. Our experimental evaluations demonstrate that careful optimization of available resources can significantly improve the resilience of these networked control systems to attacks.

Hierarchical Ensemble Reduction and Learning for Resource-Constrained Computing

Generic tree ensembles (such as Random Forest, RF) rely on a substantial number of individual models to attain desirable performance. The cost of maintaining a large ensemble could become prohibitive in applications where computing resources are stringent. In this work, a hierarchical ensemble reduction and learning framework is proposed. Experiments show our method consistently outperforms RF in terms of both accuracy and retained ensemble size. In other words, ensemble reduction is achieved with enhancement in accuracy rather than degradation. The method can be executed efficiently, with up to >590X time reduction compared with a recent ensemble reduction work. We also developed Boolean logic encoding techniques to directly tackle multiclass problems. Moreover, our framework bridges the gap between software-based ensemble methods and hardware computing in the IoT era. We developed a novel conversion paradigm that supports the automatic deployment of >500 trees on a chip. Our proposed method reduces power consumption and overall area utilization by >21.5% and >62%, respectively, compared with RF. The hierarchical approach provides rich opportunities to balance computation (training and response time), hardware resources (memory and energy), and accuracy.

An Implication-Based Test Scheme for Both Diagnosis and Concurrent Error Detection Applications

This paper describes a diagnosis-aware hybrid concurrent error detection (DAH-CED) scheme that can facilitate both off-line and on-line test applications. By using the proposed scheme, not only the probability of detecting errors (on-line) but also the diagnosability of the target circuit (off-line) can be significantly enhanced. The proposed scheme combines the implication-based method with the parity check method. In particular, novel algorithms are developed to identify specific implications for proactively enhancing the diagnosability for the modeled faults. Furthermore, a reduction algorithm is presented to minimize the number of employed implications while guaranteeing no loss in the probability of detecting errors or in diagnosability. To the best of our knowledge, this issue is not addressed in the literature. In order to validate the proposed scheme, not only stuck-at faults but also transition faults are considered to simulate the timing-related errors. The experimental results on nine ITC'99 benchmark circuits show that the diagnosability for stuck-at (transition) faults is enhanced by 6.88% (7.78%) by applying the proposed scheme. As for the probability of detecting errors, 97.73% (97.10%) is achieved for errors caused by stuck-at (transition) faults. Moreover, only 3.11% of implications are needed.

Lithography Hotspot Detection with FFT-Based Feature Extraction and Imbalanced Learning Rate

With the increasing gap between transistor feature size and lithography manufacturing capability, the detection of lithography hotspots becomes a key stage of the physical verification flow to enhance manufacturing yield. Although machine learning approaches are distinguished for their high detection efficiency, they still suffer from problems such as large-scale layouts and class imbalance. In this paper, we develop a high-performance hotspot detection model based on machine learning. In the proposed model, we first apply an FFT-based feature extraction method, which can compress a large-scale layout to a multi-dimensional representation of much smaller size while preserving the discriminative layout pattern information, to improve the detection efficiency. Second, to address the class imbalance problem, we propose a new technique called Imbalanced Learning Rate (ILR) and embed it into the CNN model to further reduce false alarms without accuracy decay. Compared with the results of current state-of-the-art approaches on ICCAD 2012 Contest benchmarks, our proposed model can achieve better solutions in many evaluation metrics, including the official metrics.
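The idea of compressing a layout clip into a small frequency-domain representation can be illustrated with a minimal sketch. It is not the authors' method: it assumes the clip is a small binary grid, computes a direct 2D discrete Fourier transform (the same coefficients an FFT would produce, only more slowly), and keeps the magnitudes of the lowest `k × k` frequencies. The function names `dft2` and `low_freq_features` and the choice of retained coefficients are hypothetical.

```python
import cmath


def dft2(grid):
    """Direct 2D DFT of a rectangular grid (list of equal-length rows)."""
    rows, cols = len(grid), len(grid[0])
    spectrum = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = -2 * cmath.pi * (u * x / rows + v * y / cols)
                    acc += grid[x][y] * cmath.exp(1j * angle)
            spectrum[u][v] = acc
    return spectrum


def low_freq_features(grid, k):
    """Assumed compression step: keep magnitudes of the k x k lowest frequencies."""
    spectrum = dft2(grid)
    return [abs(spectrum[u][v]) for u in range(k) for v in range(k)]


# A uniform 4x4 clip has all its energy in the DC term.
clip = [[1, 1, 1, 1] for _ in range(4)]
features = low_freq_features(clip, 2)  # 16 pixels reduced to 4 features
```

In practice a library FFT would replace the quadruple loop; the direct transform is used here only to keep the sketch free of dependencies.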

Bio-chemical Assay Locking to Thwart Bio-IP Theft

Microfluidic technologies are now entering a phase of rapid commercialization and deployment. One indicator of this is the recent FDA approval of the Baebies SEEKER, a digital microfluidic platform for medical diagnostics [4]. The chemicals, materials, and biochemical protocols required to realize a modern microfluidic system are becoming increasingly sophisticated and complex, making the task of designing such a system impractical for a single organization. It is expected that the manufacture of microfluidic systems will begin to adopt a horizontal supply chain, where the holders of intellectual property (IP) that dictate a biochip's functionality send their designs to a third-party foundry for fabrication [1]. Such an approach mirrors the manufacturing model established by the semiconductor industry. An undesirable side-effect of this manufacturing model is the potential for untrusted third-parties, who in the course of performing their intended duties, also steal IP or alter designs to modify the functionality of the end product. It is critical that designers of microfluidic systems prevent IP theft not only to prevent financial losses, but also to preserve the trust of end users. Grey market devices fabricated with lower quality may not perform to the same standard as authentic devices, which may lead to faulty operation. Given that microfluidic systems are commonly employed in mission-critical applications, this would lead to a severe erosion in trust.

Architectural Design of Flow-based Microfluidic Biochips for Multi-Target Dilution of Biochemical Fluids

Microfluidic technologies enable replacement of time-consuming and complex steps of biochemical laboratory protocols with a tiny chip. Sample preparation (i.e., dilution or mixing of fluids) is one of the primary tasks of any bioprotocol. In real-life applications where several assays need to be executed for different diagnostic purposes, the same sample fluid is often required with different target concentration factors (CFs). Although several multi-target dilution algorithms have been developed for digital microfluidic (DMF) biochips, they are not efficient for implementation with continuous-flow based microfluidic (CMF) chips, which are preferred in the laboratories. In this paper, we present a multi-target dilution algorithm (MTDA) for CMF biochips, which, to the best of our knowledge, is the first of its kind. We design a flow-based rotary mixer with a suitable number of segments depending on the target-CF profile, error tolerance, and optimization criteria. In order to schedule several intermediate fluid-mixing tasks, we develop a multi-target scheduling algorithm (MTSA) aiming to minimize the usage of storage units, while producing dilutions with multiple CFs. Furthermore, we propose a storage architecture for efficient loading (storing) of intermediate fluids from (to) the storage units.
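The arithmetic behind concentration factors in a segmented mixer can be sketched briefly. This is an illustrative model, not the MTDA or MTSA from the abstract: it assumes every mixer segment holds the same volume, so the CF after mixing is the mean of the segment CFs, with raw sample at CF = 1 and buffer at CF = 0. The function names `mix` and `fill_for_target` are hypothetical.

```python
from fractions import Fraction


def mix(segment_cfs):
    """CF after mixing equal-volume segments: the mean of the segment CFs."""
    if not segment_cfs:
        raise ValueError("mixer needs at least one segment")
    return sum(segment_cfs, Fraction(0)) / len(segment_cfs)


def fill_for_target(target, n_segments):
    """One-step approximation (assumed strategy): fill k of n_segments with
    raw sample (CF = 1) and the rest with buffer (CF = 0), choosing k so that
    k / n_segments is closest to the target CF."""
    k = round(target * n_segments)
    return k, Fraction(k, n_segments)


# One sample segment and three buffer segments in a 4-segment mixer.
first = mix([Fraction(1), 0, 0, 0])    # CF = 1/4
# Reuse two segments of that intermediate fluid, topped up with buffer.
second = mix([first, first, 0, 0])     # CF = 1/8
```

The second step shows why intermediate fluids need storage: reusing a CF = 1/4 dilution reaches 1/8 in one more mixing step, which is the kind of sharing a multi-target schedule can exploit.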

Memristive Crossbar Mapping for Neuromorphic Computing Systems on 3D IC

In recent years, neuromorphic computing systems based on memristive crossbars have provided a promising solution to enable acceleration of neural networks. However, most of the neural networks used in realistic applications are sparse. If such a sparse neural network is directly implemented on a single memristive crossbar, it results in an inefficient hardware realization. In this work, we propose E3D-FNC, an enhanced 3D floorplanning framework for neuromorphic computing systems, in which the neuron clustering and the layer assignment are considered interactively. First, in each iteration, hierarchical clustering partitions neurons into a set of clusters under the guidance of the proposed distance metric. The optimal number of clusters is determined by the L-method. Then matrix re-ordering is proposed to re-arrange the columns of the weight matrix in each cluster. As a result, the transformed connection matrix can be easily mapped into a set of crossbars with high utilization. Next, since the clustering results will inversely affect the floorplan, we perform the floorplanning of neurons and crossbars again. All the proposed methodologies are embedded in an iterative framework to improve the quality of NCS design. Finally, a 3D floorplan of neuromorphic computing systems is generated. Experimental results show that E3D-FNC can achieve highly hardware-efficient designs compared to the state of the art.