ACM Transactions on Design Automation of Electronic Systems (TODAES)

Latest Articles

A Cross-level Verification Methodology for Digital IPs Augmented with Embedded Timing Monitors

Smart systems are characterized by the integration in a single device of multi-domain subsystems of...

Thermal-aware 3D Symmetrical Buffered Clock Tree Synthesis

The semiconductor industry has accepted three-dimensional integrated circuits (3D ICs) as a possible solution to address speed and power management...

Compilation of Dataflow Applications for Multi-Cores using Adaptive Multi-Objective Optimization

State-of-the-art system synthesis techniques employ meta-heuristic optimization techniques for Design Space Exploration (DSE) to tailor application...

Augmenting Operating Systems with OpenCL Accelerators

Heterogeneous computing leverages more than one kind of processor to boost the performance of user-space applications with the heterogeneous...

Electronics Supply Chain Integrity Enabled by Blockchain

Electronic systems are ubiquitous today, playing an irreplaceable role in our personal lives, as well as in critical infrastructures such as power...

Comparing Platform-aware Control Design Flows for Composable and Predictable TDM-based Execution Platforms

We compare three platform-aware feedback control design flows that are tailored for a composable and...

Data-driven Anomaly Detection with Timing Features for Embedded Systems

Malware is a serious threat to network-connected embedded systems, as evidenced by the continued and rapid growth of such devices, commonly referred...

SSA-AC: Static Significance Analysis for Approximate Computing

Recently, the quest to reduce energy consumption in digital systems has been the subject of a number of ongoing studies. One of the most researched focuses is approximate computing (AC). AC is a new computing paradigm in both hardware and software design that aims to achieve energy-efficient digital systems. Although a variety of AC techniques have...

An Optimized Cost Flow Algorithm to Spread Cells in Detailed Placement

Placement is an important and challenging step in VLSI physical design. The placement solution can significantly impact timing and routability. In...

Enabling IC Traceability via Blockchain Pegged to Embedded PUF

Globalization of the IC supply chain has increased the risk of counterfeit, tampered, and re-packaged chips in the market. Counterfeit electronics poses a...


ACM TODAES new page limit policy: Manuscripts must be formatted in the ACM Transactions format; a 35-page limit applies to the final paper. Rare exceptions are possible if recommended by the reviewers and approved by the Editorial Board.

ORCID is a community-based effort to create a global registry of unique researcher identifiers for the purpose of ensuring proper attribution of works to their creators. When you submit a manuscript for review, you will be presented with the opportunity to register for your ORCID.

Welcome ACM Associate Editors

Forthcoming Articles
Stress-Induced Performance Shifts in 3D DRAMs

3D-stacked DRAMs can significantly increase cell density and bandwidth while also lowering power consumption. However, 3D structures experience significant thermomechanical stress due to the differential rate of contraction of the constituent materials, which have different coefficients of thermal expansion. This impacts circuit performance. This paper develops a procedure that performs a performance analysis of 3D DRAMs, capturing the impact of both layout-aware stress and layout-independent stress on parameters such as latency, leakage power, refresh power, area, and bus delay. The approach first proposes a semianalytical stress analysis method for the entire 3D DRAM structure, capturing the stress induced by TSVs, micro bumps, package bumps, and warpage. Next, this stress is translated to variations in device mobility and threshold voltage, after which analytical models for latency, leakage power, and refresh power are derived. Finally, a complete analysis of performance variations is performed for various 3D DRAM layout configurations to assess the impact of layout-dependent stress. We explore the use of alternative flexible package substrate options to mitigate the performance impact of stress. Specifically, we explore the use of an alternative bendable package substrate made of polyimide to reduce warpage-induced stress and show that it reduces stress-induced variations, and improves the performance metrics for stacked 3D DRAMs.

Compiler-Assisted and Profiling-Based Analysis for Fast and Efficient STT-MRAM On-Chip Cache Design

Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising candidate for large on-chip memories as a zero-leakage, high-density, and non-volatile alternative to the present SRAM technology. Since memories are the dominating component of a System-on-Chip, the overall performance of the system is highly dependent on those memories. Nevertheless, the high write energy and latency of the emerging STT-MRAM are the most challenging design issues in a modern computing system. By relaxing the non-volatility of these devices, it is possible to reduce the write energy and latency costs, at the expense of reducing the retention time, which in turn may lead to loss of data. In this paper, we propose a hybrid STT-MRAM design for caches with different retention capabilities. Then, based on the application requirements (i.e., execution time and memory access rate), program data layout is re-arranged at compilation time to achieve a fast and energy-efficient hybrid STT-MRAM on-chip memory design with no reliability degradation. The application requirements have been defined at function granularity based on profiling and static code analysis, which estimate the required retention time and memory access rate, respectively. Experimental results show that the proposed hybrid STT-MRAM cache, combined with profiling-based and compiler-level analysis for the data re-arranging, reduces the write energy per access by 49.7% on average. At system level, overall static and dynamic energy of the cache are reduced by 8.1% and 44%, respectively, and system performance improves by up to 8.1%.

Layout Resynthesis by Applying Design-for-Manufacturability Guidelines to Avoid Low-Coverage Areas of a Cell-Based Design

Design-for-manufacturability (DFM) guidelines are recommended layout design practices intended to capture layout features that are difficult to manufacture correctly. Avoiding such features prevents the occurrence of potential systematic defects. Layout features that result in DFM guideline violations may not be avoided completely due to the design constraints of chip area, performance and power consumption. A framework for translating DFM guideline violations into potential systematic defects, and faults, was described earlier. In a cell-based design, the translated faults may be internal or external to cells. In this article we focus on undetectable faults that are external to cells. Using a resynthesis procedure that makes fine changes to the layout while maintaining the design constraints, we target areas of the design where large numbers of external faults related to DFM guideline violations are undetectable. By eliminating the corresponding DFM guideline violations, we ensure that the circuit does not suffer from low-coverage areas that may result in detectable systematic defects escaping detection, but failing the circuit in the field. The layout resynthesis procedure is applied to benchmark circuits and logic blocks of the OpenSPARC T1 microprocessor. Experimental results indicate that the improvement in the coverage of potential systematic defects is significant.

Improving Test and Diagnosis Efficiency through Ensemble Reduction and Learning

Machine learning is a powerful lever for developing, improving, and optimizing test methodologies to cope with the demand from the advanced nodes. Ensemble methods are a particular learning paradigm that uses multiple models to boost performance. In this work, ensemble reduction and learning are explored for integrated circuit test and diagnosis. For testing, the proposed method is able to reduce the number of system-level tests without incurring a substantial increase in defect escapes or yield losses. Significant cost from test execution and set-up preparation can thereby be saved. Experiments are performed on two designs of commercially fabricated chips, for an overall population of more than 264,000 chips. The results demonstrate that our method is able to reduce the number of tests by 29.27% and 21.74% for the two chips, respectively, at the cost of very low defect escapes. For failure diagnosis, the framework is able to predict an adequate amount of test data necessary for accurate failure diagnosis. Experiments performed on five standard benchmarks demonstrate that our method outperforms a state-of-the-art work in terms of data-volume reduction. The proposed ensemble-based methodology creates opportunities for improving test and diagnosis efficiency.

MEMS-IC Robustness Optimization Considering Electrical and Mechanical Design and Process Parameters

MEMS-based sensor circuits are traditionally designed separately using CAD tools specific to each energy domain (electrical and mechanical). The paper presents a complete approach for combined MEMS-IC robustness optimization. Advanced methods for robustness analysis and optimization considering design, operating and process parameters, developed for integrated circuits, are transferred to MEMS-IC systems. Both electrical and mechanical design and process parameters are included in the optimization. The methodology is exemplified on two demonstrator examples: a MEMS microphone and a MEMS accelerometer, each with an integrated readout circuit. A successful optimization requires the simultaneous inclusion of design parameters and process tolerances from both energy domains. To save CPU time, a reduced-order, circuit-level model is used for the MEMS part and this model is created only when necessary. To integrate the generation of the simplified model into the optimization flow, a simulation-in-a-loop flow based on commercial tools for both the electrical and the mechanical domain has been implemented.

DCW: A Reactive and Predictable Programming Framework for LET-based Distributed Real-time Systems

Real-time systems continuously interact with the physical environment and often have to satisfy stringent timing constraints imposed by their interactions. Those systems involve two main properties: reactivity and predictability. Reactivity allows the system to continuously react to a non-deterministic external environment, while predictability guarantees the deterministic execution of safety-critical parts of applications. However, with the increase in software complexity, traditional approaches to developing real-time systems make temporal behaviors difficult to infer, especially when the system is required to address non-deterministic aperiodic events from the physical environment. In this paper, we propose a reactive and predictable programming framework, Distributed Clockwerk (DCW), for distributed real-time systems. DCW introduces the Servant, which is a non-preemptible execution entity, to implement periodic tasks based on the Logical Execution Time (LET) model. Furthermore, a joint schedule policy, based on the slack stealing algorithm, is proposed to efficiently address aperiodic events without violating hard timing constraints. To further support predictable communication among distributed nodes, DCW implements the Time-Triggered Controller Area Network (TTCAN) to avoid collisions while accessing the shared communication medium. Moreover, a programming framework is implemented to provide a set of programming APIs for defining timing and functional behaviors of concurrent tasks. An example is further implemented to illustrate the DCW design flow. The evaluation results demonstrate that our proposal can improve both periodic and aperiodic reactivity compared with existing work, and the implemented DCW can also ensure system predictability by achieving extremely low overheads.

Fault Tolerance Technique Offlining Faulty Blocks by Heap Memory Management

On Chip Reconfigurable CMOS Analog Circuit Design and Automation Against Aging Phenomena: Sense and React

Adaptive Test for RF/Analog Circuit Using Higher Order Correlations Among Measurements

As process variations increase and devices get more diverse in their behavior, using the same test list for all devices is increasingly inefficient. Methodologies that adapt the test sequence with respect to the lot, the wafer, or even the device's own behavior help contain the test cost while maintaining test quality. In adaptive test selection approaches, the initial test list, a set of tests that are applied to all devices to learn information, plays a crucial role in the quality outcome. Most adaptive test approaches select this initial list based on the fail probability of each test individually. Such a selection approach does not take into account the correlations that exist among various measurements and potentially will lead to the selection of correlated tests. In this work, we propose a new adaptive test algorithm that includes a mathematical model for initial test ordering that takes correlations among measurements into account. The proposed method can be integrated within an existing test flow running in the background to improve not only the test quality but also the test time. Experimental results using four distinct industry circuits and large amounts of measurement data show that the proposed technique outperforms prior approaches considerably.

A Novel Rule Mapping on TCAM for Power Efficient Packet Classification

Packet classification is the enabling function performed in commodity switches for providing various services like access control, intrusion detection, load balancing, and so on. Ternary Content Addressable Memories (TCAMs) are the de facto standard for performing packet classification at high speeds. However, TCAMs are costly in terms of both price and power consumption, forcing switch vendors to invest considerable effort in power management. Hence, power-efficient solutions for TCAM-based packet classification are highly relevant even today. In this paper, we propose a novel rule placement algorithm based on the presence of unique field values within the rule databases. We evaluate the total search space that must be inspected under the traditional placement approach and under the proposed placement approach, which is based on the information content within the fields. Simulation results showed an average reduction of 30.55% in the search space with the proposed placement approach, resulting in an average reduction of 18.85% in per-search energy over TCAM. With typical TCAM clock speeds ranging from 200 to 400 MHz, this reduction in the per-search energy maps to a huge reduction in the total energy consumed by TCAM-based network switches. The proposed solution is plug-and-play, requiring only minimal preprocessing within the Network Processing Unit (NPU) of the switches and edge routers.

Cross-point Resistive Memory: Nonideal Properties and Solutions

Emerging computational resistive memory is a promising candidate to overcome DRAM challenges and the memory wall bottleneck. However, its cell-level and array-level nonideal properties significantly degrade the reliability, performance, accuracy, and energy efficiency during memory access and analog computation. Cell-level nonidealities include nonlinearity, asymmetry, variability, etc. Array-level nonidealities include interconnect resistance, parasitic capacitance, sneak path, etc. This review summarizes solutions that can mitigate the impact of these nonideal properties. Firstly, we introduce several typical resistive memory devices with a focus on their switching modes and characteristics. Secondly, we review resistive memory cells and memory array structures, including 1T1R, 1R, 1S1R, 1TnR, and CMOL. We also give an overview of 3D cross-point arrays and their structural properties. Thirdly, we analyze the impact of cell-level and array-level nonideal properties during memory access and analog arithmetic operation with a focus on dot product operation and matrix-vector multiplication. Fourthly, we discuss how to mitigate these nonideal properties by static physical and geometric parameter optimization and dynamic runtime optimization from the viewpoint of cell-array interaction-and-codesign. Dynamic runtime operation schemes include line connection, voltage bias, logical-to-physical mapping, state partition, read reference setting, and switching mode reconfiguration. We also highlight challenges on multilevel cell cross-point arrays and 3D cross-point arrays during these operations. Finally, we survey peripheral circuit design considerations. We also portray a unified reconfigurable computational memory architecture.

Revealing Cluster Hierarchy in Gate-level ICs Using Block Diagrams and Cluster Estimates of Circuit Embeddings

Contemporary integrated circuits (ICs) are increasingly being constructed using intellectual property blocks (IPs) obtained from third parties in a globalized supply chain. The increased vulnerability to adversarial changes during this untrusted supply chain raises concerns about the integrity of the end product. The difference in the levels of abstraction between the initial specification and the final available circuit design poses a challenge for analyzing the final circuit for malicious insertions. Reverse engineering presents one way to help reduce the difficulty of circuit analysis and inspection. In this work, we provide a framework that, given (i) a gate-level netlist of a design and (ii) a block diagram for the design with relative sizes of the blocks, outputs a matching between the partitions of the circuit and blocks in the block diagram. We first compute a geometric embedding for each node in the circuit, and then apply a clustering algorithm on the embedding features to obtain circuit partitions. Each partition is then mapped to the high-level blocks in the block diagram. These partitions can then be further analyzed for malicious insertions with much reduced complexity in comparison with the full chip. We tested our algorithm on different designs with varying sizes to evaluate the efficacy of the algorithm, including the multi-core processor OpenSPARC T1, and showed that we can successfully match over 90% of gates to their corresponding blocks.
