Latest Articles

## Probabilistic Evaluation of Hardware Security Vulnerabilities

Various design techniques can be applied to implement the finite state machine (FSM) functions in order to optimize timing, performance, power, and to...

## A Hardware-Efficient Block Matching Algorithm and Its Hardware Design for Variable Block Size Motion Estimation in Ultra-High-Definition Video Encoding

Variable block size motion estimation has contributed greatly to achieving an optimal interframe encoding, but involves high computational complexity and huge memory access, which is the most critical...

## Reducing Writebacks Through In-Cache Displacement

Non-Volatile Memory (NVM) technology is a promising solution to fulfill the ever-growing need for...

## Performance-Aware Test Scheduling for Diagnosing Coexistent Channel Faults in Topology-Agnostic Networks-on-Chip

High-performance multiprocessor SoCs used in practice require a complex network-on-chip (NoC) as...

## Writeback-Aware LLC Management for PCM-Based Main Memory Systems

With the increase in the number of data-intensive applications in today's workloads, DRAM-based main memories are struggling to satisfy the...

## Formal Modeling and Verification of a Victim DRAM Cache

The emerging die-stacking technology enables DRAM to be used as a cache to break the “Memory Wall” problem. Recent studies have...

## Design Automation for Dilution of a Fluid Using Programmable Microfluidic Device-Based Biochips

Microfluidic lab-on-a-chip has emerged as a new technology for implementing biochemical protocols on...

## Integrated Latch Placement and Cloning for Timing Optimization

This article presents an algorithm for integrated timing-driven latch placement and cloning. Given a circuit placement, the proposed algorithm...

## Incomplete Tests for Undetectable Faults to Improve Test Set Quality

The presence of undetectable faults in a set of target faults implies that tests, which may be important for detecting defects, are missing from the...

## Integrated Approach of Airgap Insertion for Circuit Timing Optimization

Airgap technology enables air to be introduced into the inter-metal dielectric (IMD). Airgaps between certain wires reduce coupling capacitance due to the...

## Enhancing Speculative Execution With Selective Approximate Computing

Speculative execution is an optimization technique used in modern processors by which predicted instructions are executed in advance with an objective...

##### NEWS

ACM TODAES new page limit policy: Manuscripts must be formatted in the ACM Transactions format; a 35-page limit applies to the final paper. Rare exceptions are possible if recommended by the reviewers and approved by the Editorial Board.

ORCID is a community-based effort to create a global registry of unique researcher identifiers for the purpose of ensuring proper attribution of works to their creators. When you submit a manuscript for review, you will be presented with the opportunity to register for your ORCID.

Welcome ACM Associate Editors

##### Forthcoming Articles
Data-driven Anomaly Detection with Timing Features for Embedded Systems

Malware is a serious threat to network-connected embedded systems, as evidenced by the continued and rapid growth of such devices, commonly referred to as the Internet of Things. Their ubiquitous use in critical applications requires robust protection to ensure user safety and privacy. That protection must be applied to all system aspects, extending beyond protecting the network and external interfaces. Anomaly detection is one of the last lines of defence against malware, and data-driven approaches, which require the least domain knowledge, are popular. However, embedded systems, particularly edge devices, face several challenges in applying data-driven anomaly detection, including the unpredictability of malware, limited tolerance to long data collection windows, and limited computing/energy resources. In this paper, we use subcomponent timing information of software execution, including intrinsic software execution, instruction cache misses, and data cache misses, as features to detect anomalies at runtime based on ranges, multidimensional Euclidean distance, and classification. Detection methods based on lumped timing ranges are also evaluated and compared. We design several hardware detectors implementing these data-driven detection methods, which non-intrusively measure the lumped/subcomponent timing of all system/function calls of the embedded application. We evaluate the area, power, and detection latency of the presented detector designs. Experimental results demonstrate that the subcomponent timing model provides sufficient features to achieve high detection accuracy with low false positive rates using a one-class support vector machine, even when sophisticated mimicry malware is considered.
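
To make the range-based and Euclidean-distance detection ideas concrete, here is a minimal software sketch, assuming each system/function call yields a small vector of timing features. The class name `TimingDetector`, the three-feature layout, and the `radius` threshold are hypothetical illustrations; the abstract describes hardware detectors and a one-class SVM, neither of which is modeled here.

```python
import math
from typing import List, Sequence


class TimingDetector:
    """Toy software model of range- and distance-based anomaly detection.

    Each sample is a hypothetical per-call feature vector, e.g.
    (intrinsic cycles, I-cache miss cycles, D-cache miss cycles).
    """

    def __init__(self, benign: List[Sequence[float]], radius: float):
        if not benign:
            raise ValueError("need at least one benign sample")
        self.benign = [tuple(s) for s in benign]
        dims = len(self.benign[0])
        # Per-feature [min, max] bounds observed in benign training data.
        self.lo = [min(s[i] for s in self.benign) for i in range(dims)]
        self.hi = [max(s[i] for s in self.benign) for i in range(dims)]
        # Largest distance to the nearest benign sample still deemed normal.
        self.radius = radius

    def range_anomaly(self, sample: Sequence[float]) -> bool:
        """Flag the call if any feature lies outside its benign range."""
        return any(not lo <= x <= hi for x, lo, hi in zip(sample, self.lo, self.hi))

    def distance_anomaly(self, sample: Sequence[float]) -> bool:
        """Flag the call if it is farther than `radius` from every benign sample."""
        nearest = min(math.dist(sample, b) for b in self.benign)
        return nearest > self.radius


benign = [(100, 10, 20), (110, 12, 22), (105, 11, 25)]
det = TimingDetector(benign, radius=5.0)
print(det.range_anomaly((180, 11, 21)))     # True: intrinsic time out of range
print(det.range_anomaly((100, 12, 25)))     # False: each feature is in range...
print(det.distance_anomaly((100, 12, 25)))  # True: ...but the combination is unusual
```

The last two calls show why a multidimensional distance can catch anomalies that per-feature ranges miss: every feature of `(100, 12, 25)` is individually plausible, yet no benign call combined them that way.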

Share-n-Learn: A Framework for Sharing Activity Recognition Models in Wearable Systems with Context-Varying Sensors

Wearable sensors utilize machine learning algorithms to infer important events, such as the behavioral routine and health status of their end-users, from time-series sensor data. A major obstacle to large-scale use of these systems is that the machine learning algorithms cannot be shared among users or reused in contexts different from the setting in which the training data were collected. As a result, the algorithms need to be retrained from scratch in new sensor-contexts, such as when the on-body location of the wearable sensor changes or when the system is used by a new user. The retraining process places a significant burden on end-users and system designers to collect and label large amounts of training sensor data. In this article, we challenge the current algorithm training paradigm and introduce Share-n-Learn to automatically detect and learn physical sensor-contexts from a repository of shared expert models without collecting any new labeled training data. Share-n-Learn enables system designers and end-users to seamlessly share and reuse machine learning algorithms that are trained under different contexts and data collection settings. We develop algorithms to autonomously identify sensor-contexts and propose a gating function to automatically activate the most accurate machine learning model among the set of shared expert models. We assess the performance of Share-n-Learn for activity recognition when a dynamic sensor constantly migrates from one body location to another. Our analysis, based on real data collected with human subjects on three datasets, demonstrates that Share-n-Learn achieves, on average, 68.4% accuracy in detecting physical activities with context-varying wearables.

A Novel Resistive Memory based Process-In-Memory Architecture for Efficient Logic and Add Operations

The coming era of big data revives the Processing-In-Memory (PIM) architecture as a way to relieve the memory wall problem that plagues modern computing systems. However, most existing PIM designs merely place computing units closer to memory rather than fully integrating the two, owing to their incompatible CMOS manufacturing processes. Fortunately, the emerging Resistive-RAM (ReRAM) offers a way out of this dilemma, thanks to its inherent capability to both store data and compute using the same device. In this paper, we propose a ReRAM memory structure with efficient PIM capability for both logic and add operations. It first leverages non-linearity to suppress sneak current and thus sustains high memory density. Using a differential bit cell, it also enables efficient processing of arbitrary logic functions in the same memory cells with non-destructive operations. Then, a novel PIM adder is proposed, which customizes a sneak current path as the carry chain for fast carry propagation and significantly improves adder performance. In experiments, the proposed PIM demonstrates higher efficiency in both computing area and performance for logic and addition, which greatly increases the applicability of ReRAM PIM for future computable architectures.

Comparing Platform-Aware Control Design Flows For Composable and Predictable TDM-Based Execution Platforms

We compare three platform-aware feedback control design flows that are tailored for a composable and predictable Time Division Multiplexing (TDM)-based execution platform. The platform allows for independent execution of multiple applications. Using precise timing knowledge of the platform execution, we accurately characterise the execution of the control application (i.e., sensing, computing, and actuating operations) to design efficient feedback controllers with high control performance in terms of settling time. The design flows are derived for Single-Rate (SR) and Multi-Rate (MR) sampling schemes. We show the applicability of the design flows based on three design considerations and their trade-offs: control performance, resource utilisation, and actuation signals. The design flows are validated by means of MATLAB and Hardware-In-the-Loop (HIL) simulations for a motion control application.

Thermal-aware 3D Symmetrical Buffered Clock Tree Synthesis

The semiconductor industry has accepted three-dimensional integrated circuits (3D ICs) as a possible solution to speed and power management problems. In addition, 3D ICs have recently demonstrated great potential for reducing wire length and increasing chip density. However, the growing density of chips such as through-silicon via (TSV)-based 3D ICs has led to higher on-chip temperatures and location-dependent temperature gradients. As a result, TSV-based 3D clock tree synthesis (CTS) suffers from thermal problems that lead to large clock skew. We propose a novel 3D symmetrical buffered clock tree synthesis method that considers thermal variation. First, a 3D abstract tree topology based on nearest neighbor selection with median cost (3D-NNM) is constructed by pairing sinks that have similar power consumption. Second, the layer assignment of internal nodes is determined so that TSVs are distributed uniformly. Third, in a thermal-aware 3D deferred merging embedding (DME) step, the exact TSV locations are determined, and wire routing and buffer insertion are performed after a grid-based thermal profile is obtained. The proposed method is verified in a 45nm process technology using a predictive technology model (PTM) with HSPICE, and is evaluated on the IBM and ISPD09 benchmarks with no blockages. Experimental results show an average clock skew reduction of 18% compared to existing thermal-aware 3D CTS. The thermal-aware 3D symmetrical buffered clock tree synthesis presented in this work is therefore effective at improving circuit reliability.

DCW: A Reactive and Predictable Programming Framework for LET-based Distributed Real-time Systems

Real-time systems continuously interact with the physical environment and often have to satisfy stringent timing constraints imposed by these interactions. Such systems must provide two main properties: reactivity and predictability. Reactivity allows the system to continuously react to a non-deterministic external environment, while predictability guarantees the deterministic execution of the safety-critical parts of applications. However, as software complexity increases, traditional approaches to developing real-time systems make temporal behaviors difficult to infer, especially when the system must handle non-deterministic aperiodic events from the physical environment. In this paper, we propose a reactive and predictable programming framework, Distributed Clockwerk (DCW), for distributed real-time systems. DCW introduces the Servant, a non-preemptible execution entity, to implement periodic tasks based on the Logical Execution Time (LET) model. Furthermore, a joint scheduling policy based on the slack stealing algorithm is proposed to handle aperiodic events efficiently without violating hard timing constraints. To further support predictable communication among distributed nodes, DCW implements the Time-Triggered Controller Area Network (TTCAN) to avoid collisions when accessing the shared communication medium. Moreover, DCW provides a set of programming APIs for defining the timing and functional behaviors of concurrent tasks. An example application is implemented to illustrate the DCW design flow. The evaluation results demonstrate that our proposal improves both periodic and aperiodic reactivity compared with existing work, and that the implemented DCW ensures system predictability with extremely low overheads.
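
The core of the LET model is that a task reads its inputs at release and its outputs become visible exactly one LET interval later, however long the computation actually takes. The sketch below simulates only that rule on a discrete tick clock; `LetTask`, `simulate`, and the counter example are hypothetical, and DCW's Servants, slack-stealing scheduler, and TTCAN communication are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LetTask:
    name: str
    period: int               # release period, in ticks
    let: int                  # logical execution time, in ticks
    src: str                  # variable read at each release
    dst: str                  # variable written at release + let
    fn: Callable[[int], int]  # the task's computation


def simulate(tasks: List[LetTask], init: Dict[str, int], horizon: int) -> List[Dict[str, int]]:
    """Return the shared-variable state seen at each tick under LET semantics."""
    state = dict(init)
    pending: List[Tuple[int, str, int]] = []  # (publish tick, variable, value)
    trace: List[Dict[str, int]] = []
    for t in range(horizon):
        # 1. Outputs become visible exactly when their LET interval ends.
        for when, var, value in pending:
            if when == t:
                state[var] = value
        pending = [p for p in pending if p[0] != t]
        # 2. Released tasks read their inputs now; results are buffered, so the
        #    actual computation time never changes when other tasks see them.
        for task in tasks:
            if t % task.period == 0:
                pending.append((t + task.let, task.dst, task.fn(state[task.src])))
        trace.append(dict(state))
    return trace


counter = LetTask("counter", period=4, let=4, src="count", dst="count", fn=lambda x: x + 1)
print([s["count"] for s in simulate([counter], {"count": 0}, 9)])
# [0, 0, 0, 0, 1, 1, 1, 1, 2]
```

Each increment appears only at ticks 4 and 8, the ends of the LET intervals, which is what makes the observable timing deterministic.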

SSA-AC: Static Significance Analysis for Approximate Computing

Recently, reducing energy consumption in digital systems has been the subject of a number of ongoing studies. One of the most researched approaches is “Approximate Computing (AC).” AC is a new computing paradigm in both hardware and software design that aims to achieve energy-efficient digital systems. Although a variety of AC techniques have been studied so far, the main question, “how (in which section) can a program or a circuit be approximated?”, has not yet been answered. This work addresses the issue by developing a software framework, SSA-AC (Static Significance Analysis for Approximate Computing), that analyzes the target application program and guides designers in identifying the parts of the program to which approximation can or cannot be applied. SSA-AC statically analyzes the significance of variables in the precise version of the program and thus needs no trial-and-error evaluation or specific test data. Experimental results show that SSA-AC can extract the significance ranking of inputs/variables to be approximated in a much shorter time than existing statistical approaches, which are inevitably data-dependent.

Enabling IC Traceability via Blockchain Pegged to Embedded PUF

Globalization of the IC supply chain has increased the risk of counterfeit, tampered, and re-packaged chips in the market. Counterfeit electronics pose a security risk in safety-critical applications such as avionics, SCADA systems, and defense. They also affect the reputation of legitimate suppliers and cause financial losses. Hence, it has become necessary to develop traceability solutions that ensure the integrity of the supply chain, from fabrication to the end of product life, and allow a customer to verify the provenance of a device or system. In this paper, we present an IC traceability solution based on blockchain. A blockchain is a public, immutable database that maintains a continuously growing list of data records secured against tampering and revision. Over the lifetime of an IC, all ownership transfer information is recorded and archived in a blockchain. This safe, verifiable method prevents any party from altering or challenging the legitimacy of the information being exchanged. However, a chain of sales records is not enough to ensure the provenance of an IC; a clone-proof method is also needed to securely bind the identity of an IC to the blockchain information. We therefore propose a method of IC supply chain traceability via blockchain pegged to an embedded physically unclonable function (PUF). The blockchain provides the ownership transfer record, while the PUF provides a unique identification for an IC, allowing it to be linked uniquely to a blockchain. Our proposed solution automates hardware and software protocols using a blockchain-powered Smart Contract that allows supply chain participants to authenticate, track, trace, analyze, and provision chips throughout their entire life cycle.
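
The idea of pegging an ownership record to a PUF can be illustrated with a toy, single-chip hash chain, assuming the chip's PUF response is already stable (error-corrected) so that its hash can serve as the chip ID. `OwnershipLedger` and `sha256_hex` are hypothetical names; there is no distributed consensus or Smart Contract here, only the hash linking and identity check that make tampering and cloning detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _block_hash(body: dict) -> str:
    # Canonical JSON so the same fields always hash identically.
    return sha256_hex(json.dumps(body, sort_keys=True).encode())


@dataclass
class OwnershipLedger:
    """Toy append-only hash chain of ownership transfers for one chip."""

    chip_id: str  # assumed: hash of the chip's enrolled, error-corrected PUF response
    blocks: List[dict] = field(default_factory=list)

    def transfer(self, seller: str, buyer: str) -> dict:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"chip_id": self.chip_id, "from": seller, "to": buyer, "prev": prev}
        block = dict(body, hash=_block_hash(body))
        self.blocks.append(block)
        return block

    def verify(self, puf_response: bytes) -> bool:
        """Check that the physical chip matches the record and every link is intact."""
        if sha256_hex(puf_response) != self.chip_id:
            return False  # the chip in hand is not the one the record describes
        prev = "0" * 64
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev"] != prev or block["chip_id"] != self.chip_id:
                return False
            if _block_hash(body) != block["hash"]:
                return False  # a record was altered after it was appended
            prev = block["hash"]
        return True


response = b"\x5a\x13\xc7\x02"  # stand-in for a measured PUF response
ledger = OwnershipLedger(chip_id=sha256_hex(response))
ledger.transfer("foundry", "distributor")
ledger.transfer("distributor", "integrator")
print(ledger.verify(response))            # True
ledger.blocks[0]["to"] = "counterfeiter"  # rewrite history
print(ledger.verify(response))            # False
```

The sales record alone would not catch a cloned chip presenting a copied label; the `puf_response` check is what ties the record to the physical device.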

An Optimized Cost Flow Algorithm to Spread Cells in Detailed Placement

Placement is an important and challenging step in VLSI physical design. The placement solution can significantly impact timing and routability. In sub-nanometric technology nodes, several restrictions have been imposed on placement solutions. These restrictions make designing an optimized and legal solution very hard. Achieving optimized placement solutions is especially challenging in regions with high-density utilization. The quality of the placement solution can significantly impact the final circuit implementation. In this work, we present a cell spreading algorithm to move cells out of high-density utilization regions. Our algorithm opens up new space in regions with high cell concentration. This space can then be exploited by detailed placement algorithms to further optimize the placement solution. The objective of our technique is to reduce area density utilization while considering cell displacement and circuit delay. The proposed algorithm aims to obtain a uniform distribution of cells in the placement area while having minimal effects on delay. To achieve this goal, it uses branch-and-cut and network flow techniques. Experimental results on industrial and academic circuits show that our proposed algorithm can reduce circuit delay (by up to 25%), cell displacement (by up to 17 µm), dynamic power consumption (by up to 5.3%), and leakage power (by up to 15%).

Adaptive Test for RF/Analog Circuit Using Higher Order Correlations Among Measurements

As process variations increase and devices become more diverse in their behavior, using the same test list for all devices is increasingly inefficient. Methodologies that adapt the test sequence with respect to the lot, the wafer, or even a device's own behavior help contain test cost while maintaining test quality. In adaptive test selection approaches, the initial test list, a set of tests applied to all devices to learn information, plays a crucial role in the quality of the outcome. Most adaptive test approaches select this initial list based on the fail probability of each test individually. Such a selection approach does not take into account the correlations that exist among measurements and can therefore lead to the selection of correlated tests. In this work, we propose a new adaptive test algorithm that includes a mathematical model for initial test ordering that takes correlations among measurements into account. The proposed method can be integrated within an existing test flow, running in the background to improve not only test quality but also test time. Experimental results using four distinct industry circuits and large amounts of measurement data show that the proposed technique considerably outperforms prior approaches.
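
Why correlations matter for the initial list can be shown with a simple greedy heuristic, offered as a hypothetical stand-in rather than the paper's mathematical model: each candidate's fail probability is discounted by its strongest absolute correlation with tests already chosen. The function name `greedy_initial_order`, the scoring rule, and the example numbers are all illustrative assumptions.

```python
from typing import List, Sequence


def greedy_initial_order(fail_prob: Sequence[float],
                         corr: Sequence[Sequence[float]],
                         k: int) -> List[int]:
    """Greedily pick up to k tests, trading fail probability against redundancy.

    score(t) = fail_prob[t] * (1 - max |corr[t][s]| over already-selected s)
    """
    selected: List[int] = []
    remaining = set(range(len(fail_prob)))
    while remaining and len(selected) < k:
        def score(t: int) -> float:
            redundancy = max((abs(corr[t][s]) for s in selected), default=0.0)
            return fail_prob[t] * (1.0 - redundancy)
        # sorted() makes ties resolve to the lowest test index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


fail_prob = [0.05, 0.04, 0.02]
corr = [[1.0, 0.95, 0.10],
        [0.95, 1.0, 0.00],
        [0.10, 0.00, 1.0]]
print(greedy_initial_order(fail_prob, corr, 2))  # [0, 2]
```

Ranking by fail probability alone would pick tests 0 and 1, whose measurements are highly correlated (coefficient 0.95); the discounted score prefers test 2, which carries more independent information about the device.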

Electronics Supply Chain Integrity Enabled by Blockchain

Electronic systems are ubiquitous today, playing an irreplaceable role in our personal lives as well as in critical infrastructures such as the power grid, satellite communication, and public transportation. In the past few decades, the security of software running on these systems has received significant attention. However, hardware has been assumed to be trustworthy and reliable “by default” without really analyzing the vulnerabilities in the electronics supply chain. With the rapid globalization of the semiconductor industry, it has become challenging to ensure the integrity and security of hardware. In this paper, we discuss the integrity concerns associated with a globalized electronics supply chain. More specifically, we divide the supply chain into six distinct entities: IP owner/foundry (OCM), distributor, assembler, integrator, end user, and electronics recycler, and analyze the vulnerabilities and threats associated with each stage. To address the concerns of supply chain integrity, we propose a blockchain-based certificate authority framework that can be used to manage critical chip information such as electronic chip identification (ECID), chip grade, transaction time, etc. The decentralized nature of the proposed framework can mitigate most threats to the electronics supply chain, such as recycling, remarking, cloning, and overproduction.

CAD-Base: An Attack Vector into the Electronics Supply Chain

Fabless semiconductor companies design systems-on-chip (SoCs) using third-party intellectual property (IP) cores and fabricate them in offshore, potentially untrustworthy foundries. Owing to the globally distributed electronics supply chain, security has emerged as a serious concern. In this paper, we explore electronics computer-aided design (CAD) software as a threat vector that an attacker can exploit to introduce vulnerabilities into the SoC. We show that all electronics CAD tools (high-level synthesis, logic synthesis, physical design, verification, test, and post-silicon validation) are potential threat vectors to different degrees. We have demonstrated CAD-based attacks on processors, including the commercial ARM Cortex-M0 processor [1].

Augmenting Operating Systems with OpenCL Accelerators

Heterogeneous computing leverages more than one kind of processor to boost the performance of user-space applications through heterogeneous programming languages such as OpenCL. While some work has been done to accelerate the computations required by Linux kernel software, existing solutions are either application-specific or tightly coupled with specific computing platforms, and cannot support general-purpose in-kernel acceleration using different types of processors. In this article, a general-purpose software framework called KOCL (Kernel acceleration with OpenCL) is proposed to tackle this problem. KOCL exposes a set of high-level programming interfaces that let Linux kernel module developers offload compute-intensive tasks to different hardware accelerators without managing and coordinating the platform-specific computing and memory resources. The programming effort is simplified by the developed platform management and memory models, which provide a systematic means of managing heterogeneous hardware resources. In addition, KOCL offers one-copy and zero-copy data buffering schemes, so that offloaded tasks deliver high performance on platforms with different memory architectures. We have developed a prototype system to accelerate Network-Attached Storage server applications. Significant performance improvements are achieved with three different types of accelerators: a multicore processor, an integrated GPU, and a discrete GPU. We believe that KOCL is useful for the design of embedded appliances to evaluate the performance of design alternatives.

